Household Types, Combinatorial Problems and Pure Maths

.........................................................

In some of the work I’m currently doing on households derived from the Southwark patient register, I wanted to go beyond quantifying how many people live in a household (a rudimentary household size) and look at the composition of a household, and hence what type of household it represents. To do this I looked at how types of household are generally reported in the UK Census, in European statistics, in social research on the life course, and in the health literature itself. I found that although complex household typologies do exist, there is a general set of likely household forms: as expected, these revolve around the single, co-habiting, family, single-parent and extended-family models. As I have data on individuals, I first decided to classify individuals into 5 broad categories that seem important in the literature and then look at the composition of these categories within households. The categories were:

1) Dependent Children (<18 yrs old)

2) Adult Male (18-65 yrs old)

3) Adult Female (18-60 yrs old)

4) Male Pensioner (65+ yrs old)

5) Female Pensioner (60+ yrs old)

Evidence suggests that these represent the coarsest categories that could usefully represent significant periods within the life course, as well as being relevant to changes in health status. In a sense, different types of household structure can be described by different combinations of these person classes for different household sizes.
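As a rough illustration of the classification step, here is a minimal sketch (not my original script). The function names, the 'M'/'F' sex codes and the handling of the boundary ages are all assumptions made for the example: the categories above overlap at 65 for men and 60 for women, and here I have simply assumed that the boundary age counts as a pensioner.

```python
def classify_person(age, sex):
    """Return one of the five person classes for an age in years and a sex code.

    ASSUMPTION: sex is coded 'M' or 'F', and the boundary ages (65 for men,
    60 for women) are treated as pensioners.
    """
    if age < 18:
        return "Child"
    if sex == "M":
        return "Male Pensioner" if age >= 65 else "Adult Male"
    if sex == "F":
        return "Female Pensioner" if age >= 60 else "Adult Female"
    raise ValueError("sex must be 'M' or 'F'")


def household_composition(members):
    """Describe a household as a sorted tuple of person classes.

    members is an iterable of (age, sex) pairs. Sorting makes the result
    independent of the order in which people are listed.
    """
    return tuple(sorted(classify_person(age, sex) for age, sex in members))


# A hypothetical two person household: a 34 year old woman and a 5 year old boy.
print(household_composition([(34, "F"), (5, "M")]))
```

Representing a household as a sorted tuple means that two households with the same mix of people always compare equal, which makes counting household types straightforward.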

I decided to test this by calculating all the possible combinations of these 5 classes for a 2 person household and then looking at their uptake in the actual household data I had derived from the Southwark patient register. It turned out that for a two person household there were 15 different ways in which you could combine the 5 person classes in order to create a unique household:

Children Only (Parents Unregistered); Single Parent Male and Child; Co-Habiting Men; Single Parent Female and Child; Single Parent Male Pensioner and Child; Co-Habiting Man and Woman; Co-Habiting Man and Male Pensioner; Co-Habiting Women; Single Parent Female Pensioner and Child; Cohabiting Woman and Male Pensioner; Cohabiting Man and Female Pensioner; Cohabiting Male Pensioners; Cohabiting Woman and Female Pensioner; Cohabiting Male and Female Pensioner; Cohabiting Female Pensioners.
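The 15 types can be checked by enumerating every unordered pair of person classes, allowing repeats. This is only a sketch using the standard library; the class labels and the function name are my own shorthand for the example.

```python
from itertools import combinations_with_replacement

# Shorthand labels for the five person classes (names assumed for illustration).
PERSON_CLASSES = [
    "Child",
    "Adult Male",
    "Adult Female",
    "Male Pensioner",
    "Female Pensioner",
]


def household_types(size, classes=PERSON_CLASSES):
    """List every unordered combination of person classes for a household size."""
    return list(combinations_with_replacement(classes, size))


two_person = household_types(2)
for household in two_person:
    print(" + ".join(household))
print(len(two_person), "possible two person household types")
```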

Using this typology of 15 possible household types, I extracted the two person households from the larger dataset and wrote a Python script to classify these households. The result for 27,124 households was as follows:

What this graph seems to demonstrate is that roughly half of all 2 person households consist of a man and a woman (either adult or pensioner) cohabiting, and roughly a further 22% consist of same-sex cohabitation. Among two person households in this dataset, single parents make up only around 15% of households, with a single female parent (adult or pensioner) and a child accounting for almost 13%. All other groups make up only around 13% of households, but crucially the only category in which no households were found was the adult man cohabiting with a male pensioner. Indeed, many of the smaller categories can be interpreted as having inherently important social roles, for instance the adult woman looking after a male or female pensioner.

Essentially, the terrain of household type was a lot more nuanced and tricky than I’d at first thought, made even more so by my realisation that as household size increases, the number of possible combinations of the person types within a household increases dramatically. I wrote a Python script to calculate the number of possible different sets of people for the household sizes 1 to 10:
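A minimal sketch of such a count (not the original script) is below. It relies on the standard result that the number of ways to choose n items from k classes, with repeats allowed and order ignored, is the binomial coefficient C(n + k - 1, n). The function name is made up for the example, and math.comb needs Python 3.8 or later.

```python
from math import comb


def count_household_types(size, n_classes=5):
    """Number of distinct unordered households of a given size.

    Uses the 'combinations with repetition' count C(size + n_classes - 1, size).
    """
    return comb(size + n_classes - 1, size)


counts = [count_household_types(size) for size in range(1, 11)]
for size, count in zip(range(1, 11), counts):
    print(size, count)
# sizes 1..10 give: 5, 15, 35, 70, 126, 210, 330, 495, 715, 1001
```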

This presents a difficult situation, even for reasonably small households. Counting sets like this is a problem in “combinatorial mathematics”, or “combinatorics”. I decided to see what I could learn about this distribution, so I looked for patterns in the sequence, as you are taught in pre-GCSE maths, and soon found that the sequence had a constant fourth difference:

This constant fourth difference indicates that the sequence can be described by a quartic function, whose form was then easy to calculate:
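The difference table and a quartic can be checked with a few lines of Python. This is a sketch under the assumption that the counts are the combinations-with-repetition numbers C(n + 4, 4); expanding that binomial coefficient gives (n^4 + 10n^3 + 35n^2 + 50n + 24) / 24, which is the closed form tested here. The function names are hypothetical.

```python
from math import comb

# Number of household types for sizes 1 to 10, assuming 5 person classes.
counts = [comb(n + 4, 4) for n in range(1, 11)]


def differences(sequence):
    """Return the differences between consecutive terms of a sequence."""
    return [b - a for a, b in zip(sequence, sequence[1:])]


# Take differences four times over.
fourth = counts
for _ in range(4):
    fourth = differences(fourth)
print(fourth)  # a constant sequence indicates a quartic


def quartic(n):
    """Expanded form of C(n + 4, 4); the numerator is always divisible by 24."""
    return (n**4 + 10 * n**3 + 35 * n**2 + 50 * n + 24) // 24


print(all(quartic(n) == counts[n - 1] for n in range(1, 11)))
```

A constant fourth difference of 1 corresponds to a leading coefficient of 1/4! = 1/24, which matches the expanded form.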

Sadly not one of those classically beautiful equations.

This all leads to the issue of how I now classify households: clearly the number of possible sets makes anything above around 4 people per household fairly intractable. I’ll experiment with 3 person households and see whether I can account for most household types with a few set patterns, and then look at households that fall outside of this remit.

Interesting nonetheless; I hadn’t expected to be doing much of this kind of maths!

Computing the geometric median in Python

.........................................................

I noticed in a beta of ArcGIS 10 (then called 9.4) that there was a ‘new’ option for computing a Geometric Median which didn’t exist in my copy of ArcGIS 9.3. This is an interesting concept. As in 1d statistics, the mean of a 2d point pattern (the mean centre) is easy to calculate, being the average of all the X coords and the average of all the Y coords. From stats we know that the mean and median of a distribution will coincide if the data is perfectly normally distributed; however, real-world data usually only approximates a normal distribution, leading to a mean value that differs from the midpoint, or median. For a skewed distribution on the plane, therefore, the mean is not necessarily the best representation of the ‘centre’ of the data, so we may wish to calculate the median; doing so will also give us a good idea of the direction of the skew of the point pattern we are investigating. In calculating the median of a 2d point pattern we can express the problem as a need to:

minimise the sum of distances from all points in a distribution to a centre.

Thus it is reasonably clear that we are dealing with an ‘optimisation problem’, something that I have experimented with before in work I conducted using the ‘transportation problem’, a classic linear programming problem.

In terms of application, I thought that finding the median of a distribution of people around a service would be a useful, albeit basic, indication of whether all people were making a similar trip to a service, or whether there were other effects at work (this would be evidenced by a median centre that was not close to the actual service location). I thought I would be able to code the optimisation routine in Python using pre-existing insight. Notably, the Wikipedia page on this details the Weiszfeld Algorithm as the acknowledged computational solution to the geometric median problem. It takes the form:
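In the usual textbook notation, the Weiszfeld update is written as below. The symbols are my own labels for this post: the x_i are the n data points, y_k is the current candidate centre, and the double bars denote Euclidean distance.

```latex
y_{k+1} = \frac{\displaystyle\sum_{i=1}^{n} \frac{x_i}{\lVert x_i - y_k \rVert}}
               {\displaystyle\sum_{i=1}^{n} \frac{1}{\lVert x_i - y_k \rVert}}
```

In words, each new candidate is a weighted average of the data points, with each point weighted by the inverse of its distance from the current candidate.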

However, actually writing the algorithm proved somewhat tough. Essentially the answer is to start with a candidate point (I started with the mean centre) and calculate the objective function, in this case the sum of the Euclidean distances of all points from the candidate centre. Then pass the candidate point through the Weiszfeld Algorithm and reassess the objective function; once the objective function converges, a median has been found. There is no guarantee that the iteration reaches the optimal median though (it can stall if a candidate lands exactly on a data point), and depending on the data there may be more than 1 optimal solution (for example, when all the points lie on a line). Below is a solution for some of my data (the data has been randomly offset by 75m to preserve anonymity) on patient registrations to a doctor.
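The procedure just described can be sketched in plain Python as follows. This is an illustrative version rather than my linked script: the function names, the tolerance and the iteration cap are assumptions, and a data point that coincides exactly with the current candidate is simply skipped, which is a simplification rather than a rigorous treatment of that case.

```python
import math


def mean_centre(points):
    """Average of the x coords and of the y coords."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def total_distance(centre, points):
    """Objective function: sum of Euclidean distances to the candidate centre."""
    cx, cy = centre
    return sum(math.hypot(x - cx, y - cy) for x, y in points)


def geometric_median(points, tol=1e-9, max_iter=1000):
    """Iterate the Weiszfeld update from the mean centre until the objective settles."""
    centre = mean_centre(points)
    objective = total_distance(centre, points)
    for _ in range(max_iter):
        wx = wy = wsum = 0.0
        for x, y in points:
            d = math.hypot(x - centre[0], y - centre[1])
            if d == 0.0:
                continue  # simplification: skip a point sitting on the candidate
            wx += x / d
            wy += y / d
            wsum += 1.0 / d
        if wsum == 0.0:
            break  # every point coincides with the candidate
        centre = (wx / wsum, wy / wsum)
        new_objective = total_distance(centre, points)
        converged = abs(objective - new_objective) < tol
        objective = new_objective
        if converged:
            break
    return centre, objective


# Made-up points with one outlier pulling the mean centre north-east.
skewed = [(0, 0), (1, 0), (0, 1), (10, 10)]
print("mean centre:  ", mean_centre(skewed))
print("median centre:", geometric_median(skewed)[0])
```

With the outlier included, the mean centre is dragged towards it while the median centre stays close to the main cluster, which is the kind of difference I was hoping to detect.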

Here we can see that the mean and median centres are slightly different, suggesting that the patient population is skewed slightly northwards, most likely as a result of discontinuous urban infrastructure.

The scatterplot was achieved using the matplotlib Python plotting library. This was just a test, but I imagine more complex outputs can be achieved reasonably easily.

Notably, this technique uses Euclidean distance, which in a dense urban environment may be misleading. I note that there is a relatively simple implementation of the Dijkstra algorithm for shortest paths in Python, and I am curious whether this could be integrated to find a geometric median on the network. I suspect that it may be unworkable due to computational time constraints, although for smaller problems it might be ok.

Naturally there are algorithms that can calculate a solution to the above for p-medians (i.e. several service centres in a population, commonly known as location-allocation). It is something that Paul Densham at UCL has worked on, and his code is making a return to service in ArcGIS version 10. I’m looking forward to seeing it, as it is a very difficult problem to solve (and in fact already has been ‘solved’), and not one I intend to investigate!

My code for the geometric median is here.

Distribution of Household Occupancy in Southwark

.........................................................

I’ve been doing some more analysis on the Southwark GP patient register at the household level. After a fair amount of cleaning and interpretation I’ve arrived at the following distribution of households.

There are a number of interesting things to say about this data, not least in the section that I’ve marked ‘larger social groupings’, which seems to suggest a possible migrant social network effect: the larger household groupings tend to be of minority ethnic groups, including Nigerians and other Africans, Hispanics and South-East Asians, who are perhaps using cross-country social ties as help in getting established when first arriving in the UK. However, visually the shape of the distribution of household occupancy is very distinctive, and is actually very close to an exponential. Here I’ve taken the log of the frequency of occurrence and plotted the best-fit line through the plot:

This linear trend means that the model log(y) = -0.1635x + 4.602 is a good predictor of the number of Households we can expect to exist in Southwark for a given value of x, or occupancy.
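To show how a model of this kind is fitted and used, here is a small sketch in plain Python. Two things are assumptions on my part: that the log is base 10, and the function names. The frequencies in the example are synthetic, generated from the quoted model itself purely so the fit can be checked; they are not the Southwark data.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_households(occupancy, slope=-0.1635, intercept=4.602):
    """Predicted number of households at a given occupancy.

    ASSUMPTION: the model uses a base-10 log.
    """
    return 10 ** (slope * occupancy + intercept)


# Synthetic frequencies generated from the model (NOT the real register data).
occupancies = list(range(1, 11))
frequencies = [predict_households(x) for x in occupancies]

# Fitting a straight line to log10(frequency) recovers the coefficients.
slope, intercept = fit_line(occupancies, [math.log10(f) for f in frequencies])
print(round(slope, 4), round(intercept, 3))
```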

It is not entirely clear, however, why this is the case. Firstly, it may just be an artefact of the data: of the matching process that has occurred between the patient register and OS AddressLayer2, of the way that GPs encode patient addresses in the first place, or of the fact that the patient register is only a sample of the total population of Southwark, i.e. those people who register with a doctor. Secondly, it may simply be a reflection of the structure of the built environment in Southwark, i.e. what kind of housing is actually available. However, the distribution is also subject to the choices of individuals or groups.

Currently, I am in the process of disaggregating the above characteristics and looking at trends by different population groups.

Review of Elementary Statistics for Geographers - Burt et al.

.........................................................

A review I authored of Burt, Barber and Rigby’s “Elementary Statistics for Geographers”, third edition, has made it into the Journal of the Royal Statistical Society Series A: Statistics in Society. The book is a truly excellent collection of statistical methods themed explicitly for use by geographers and spatial scientists; moreover, the explanation and presentation are superb. This has become a core book for myself and my colleague James Cheshire as we continue along the route of our PhD studies. I have said much the same thing in my review, accessible here.

Jenks’ Natural Breaks Algorithm in Python

.........................................................

The Jenks Optimal, or Jenks’ Natural Breaks, Algorithm is a common method for classifying data presented in a choropleth map. It aims to present a series of break values that best represent the actual breaks observed in the data, as opposed to some arbitrary classificatory scheme (e.g. equal interval); in this way the actual clustering of data values is preserved (subject to the arbitrary specification of k classes). The method was originally published in George Jenks’ (1977) Optimal Data Classification for Choropleth Maps and reportedly represented the culmination of 15 years of research on the topic, being primarily derived from Walter Fisher’s work ‘On grouping for maximum homogeneity’. The algorithm aims to create k classes so that the variance within groups is minimised; as such it is a problem of numerical optimisation.
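To make the optimisation concrete, here is a compact sketch that finds the k classes minimising the total within-class sum of squared deviations by dynamic programming over the sorted values. It is not the script linked from this post, just one straightforward way of solving the same objective; the function name is made up, and the run time grows with k times the square of the number of values, so it is only suitable for small arrays.

```python
def jenks_classes(values, k):
    """Split values into k contiguous classes with minimal within-class variance."""
    data = sorted(values)
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of values")

    # Prefix sums give the squared deviations of any slice in constant time.
    sums, squares = [0.0], [0.0]
    for v in data:
        sums.append(sums[-1] + v)
        squares.append(squares[-1] + v * v)

    def ssd(i, j):
        """Sum of squared deviations from the mean for data[i:j]."""
        total = sums[j] - sums[i]
        return (squares[j] - squares[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: best total for the first j values split into c classes.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    split = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    split[c][j] = i

    # Walk back through the recorded split points to recover the classes.
    classes, j = [], n
    for c in range(k, 0, -1):
        i = split[c][j]
        classes.append(data[i:j])
        j = i
    classes.reverse()
    return classes


# Made-up values with three obvious clusters.
print(jenks_classes([1, 2, 3, 10, 11, 12, 50, 51], 3))
```

The break values for a map legend are then just the largest (or smallest) value in each class.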

A paper by Michael Coulson (1987) entitled In The Matter Of Class Intervals For Choropleth Maps: With Particular Reference To The Work Of George F Jenks details a method that Jenks apparently authored, but never published, to assess how well a chosen number of classes fits the data. The method, Goodness of Variance Fit (GVF), works by taking the difference between the squared deviations from the array mean (SDAM) and the squared deviations from the class means (SDCM), and dividing by the SDAM. Thus:

GVF = (SDAM - SDCM) / SDAM

However, it is likely this was never published because the GVF improves as the number of classes increases until, at the point where there are as many classes as data points, the GVF reaches unity. Nonetheless, I have included a rudimentary example for calculating this statistic. In reality, this method is used to generalise data into a few classes for visualisation, so you are unlikely to be using more than 7 (+/- 2) classes; the number of classes can be loosely assigned by looking at the distribution histogram, but often this is difficult.
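The GVF calculation itself is short. The sketch below is a standalone illustration with made-up values and hypothetical function names; it takes the classes as a list of lists and will fail with a division by zero if every value in the array is identical.

```python
def sum_squared_deviations(values):
    """Sum of squared deviations of the values from their own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def goodness_of_variance_fit(classes):
    """GVF = (SDAM - SDCM) / SDAM for a list of classes (lists of values)."""
    all_values = [v for group in classes for v in group]
    sdam = sum_squared_deviations(all_values)  # deviations from the array mean
    sdcm = sum(sum_squared_deviations(group) for group in classes)  # from class means
    return (sdam - sdcm) / sdam


# Made-up values, classified three ways.
print(goodness_of_variance_fit([[1, 2, 3, 10, 11, 12, 50, 51]]))          # one class
print(goodness_of_variance_fit([[1, 2, 3], [10, 11, 12], [50, 51]]))      # three classes
print(goodness_of_variance_fit([[1], [2], [3], [10], [11], [12], [50], [51]]))  # one per value
```

The three calls show the behaviour described above: a single class gives a GVF of zero, and one class per data point gives a GVF of one, so the statistic on its own always favours more classes.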

The script is here.

Acknowledgement: The initial script I used for the Python conversion can be found (in JAVA and Fortran) here: https://stat.ethz.ch/pipermail/r-sig-geo/2006-March/000811.html

Review of Rethinking Maps by Dodge, Kitchin and Perkins in EPB

.........................................................

This month has seen the publication of my review of “Rethinking Maps: New Frontiers in Cartographic Theory”, edited by Martin Dodge, Rob Kitchin and Chris Perkins, in Environment and Planning B.

The review begins thus:

This collection of essays marks a milestone of scholarship in critical cartography, a discourse most notably augered by the seminal work of John B Harley collected in The New Nature of Maps (2001). This collection moves forward from Harley and provides a timely summation and spur for future research in maps and mapping. In the final chapter of this edited book, a chapter subtitled “A manifesto for map studies”, Martin Dodge, Chris Perkins, and Rob Kitchin make clear that: “It is, we would argue, a stimulating time for mapping scholarship with many challenges and opportunities opening up: no single epistemological position now dominates interpretation” (page 229).

For more see the full review. Sorry if you aren’t a subscriber to the journal, I suspect I can’t post the full text though.

A proof of the first chapter, courtesy of Martin Dodge, is available here.

UK OAC map in Python

.........................................................

Here is a quick confirmation that you can use Python to draw very detailed maps. Using the previously specified method I was unable to get Python to draw all UK OAs due to their great number (c.220,000) and high complexity (c.50,000,000 vertices). Additionally, I was unable to use the generalised OA boundaries for the UK from UKBorders as they contain topological errors that the shapefile reader cannot deal with; ArcGIS is obviously a bit clever in how it handles bad topologies. So I extracted all the vertices and fed them into shapely polygons, and visualised them in the same way but without reading shapefiles directly into Python, and was able to output this:

This method has had an impact on the speed of computation, as it can take roughly 25 minutes to output this map. The map looks pretty good, aside from a slightly odd polygon in the Bristol Channel. Nevertheless, coupled with the operations that shapely and other geo-libraries can do, this is an increasing indication of the maturity of GIS on a variety of platforms. Oh, and it’s all free!

‘Locally led’ NHS Service changes dubious

.........................................................

Since coming to government, new Conservative Health Secretary Andrew Lansley has sought to fulfil the pledge he made to put an end to local restructurings of NHS service delivery by authorities higher up the NHS hierarchy. Ostensibly he believes that local decision-making will have a better overall effect on the quality of outcomes for patients and hence lead to a better health service. Specifically he wants to provide GPs with an opportunity to work with community leaders and their local authorities to steer local services. The core elements actually do not differ greatly from the outgoing Labour policies, particularly with respect to patient choice; however, I will argue that there is a clear danger in engaging to too great an extent with a purely ‘local’ approach. In general there seems to be something of a misconception in Government, particularly in the provision of local services (e.g. schools), that local approaches are somehow ‘better’.

Firstly, let us consider something that the Government seems to do without fail, something that I, as a Geographer, find to be a grave sin of omission: the apparently indiscriminate use of spatial qualifiers without so much as an explanation of their meaning. The terms ‘local’ and ‘community’ are spectacularly misleading without qualification, and yet they are often used because people seem to think they understand what is meant by them. Everyone considers themselves part of a community, and local to a service, but will these personal feelings about their socio-spatial connections actually translate to the ability to input on healthcare decision making? My investigation of access and registration of patients to GPs in Southwark has shown that a) primary care is a very location based service, and without fail each doctor exhibits a characteristic distance decay function that describes the pattern of registration with a GP, subject to some socio-economic criteria, but also that b) patients overlap to a large extent in a densely-populated urban context, the suggestion being that activity-spaces (e.g. retail areas, workplaces and schools) have a distorting effect on patterns of registration for some people. To this end I suggest that a ‘community’ can be defined independently for individual GPs based upon the patterns of patient uptake unique to that service, although there may be some strong correlations with the residential, workplace, educational etc. communities that overlap it (of course, for some GPs the profile of the registered community may be greatly divergent from that of the observed local community, defined by proximity to the GP). The following map is an example of this kind of complexity:

Here it is clear that any definition of locality or community based upon an arbitrary areal basis yields groups of people who could be registered to as many as 29 different Southwark GPs in only a very small area. This is in fact a very good, simple illustration of patient choice in action. There are a lot of questions to ask Mr Lansley about how he views ‘local’ or ‘community’, and whether he is willing to enshrine that definition in policy before we actually consent to doing anything with provision of services.

Further still, I have claimed that GPs are very much location based services, and they are: over a certain distance (in Southwark this is about 6-10km) no one is registered with a GP, choosing instead a closer service. In many ways this was constrained by the pre-existing system of ‘catchment areas’; however, these were set to be removed by the end of the year in the quest for patient choice. The potential for registration is thus opened up to people using doctors near their place of work (for instance) rather than near their home. Should these people have a say in provision of services in an area in which they do not live? They are part of the GP’s ‘community’ but not of the residential one. A good illustration of this is actually the polyclinic system. Southwark is geared up to introduce 3 polyclinics: one which already exists as a large GP-led health centre in the centre of the borough, and two in the north connected to hospitals. The biggest difficulty faced at the moment is in estimating the daytime population (i.e. transient workforce) of the Southbank in order to account for likely polyclinic usage: a huge number of people who do not live in Southwark but will likely have some part of their healthcare provided for by Southwark PCT.

It is also unclear what Mr Lansley refers to when he talks about ‘top-down’: is it the Strategic Health Authorities and the DoH itself? It cannot be the PCTs, as Mr Lansley claims that the new criteria will have the support of ‘GP commissioners’ and it is the PCTs that actually do the commissioning. Further, the idea of GPs working with local authorities is largely the same as GPs working with PCTs now, as PCTs and LAs are generally coterminous.

Whilst it is pleasing to see a politician quoting the need for an evidence based approach to restructuring, it is unclear what evidence he might base GP quality on. The current payment method (QoF) is based on GP reporting of pre-specified target outcomes to a centralised authority; surely GPs will simply follow these directives in order to bring in as much money as possible. Indeed, it is strongly recommended that these stats not be used as measures of GP quality, as they are by-and-large patchy in what they cover and include little demographic data. Indeed, had the previous government not already cut the NHS IT initiative that would have made reporting of outcomes actually feasible nationally, the new government would have no doubt cut it anyway.

The final worry I have is one of equity, something upon which the NHS is founded: the provision of a fair service to those that need it, free at the point of service. Surely such an atomistic approach to healthcare provision as Mr Lansley seems to specify is liable to deepen the perceived ‘social gradient’ in health care, as without a careful (top-down) hand, the GPs and communities best equipped to play an active role in orchestrating GP services will get increasingly better provision; these are most likely to be in the wealthier areas of the country. There needs to be at least some form of national accountability for a national health service.

More Thematic Maps in Python – shapely and descartes

.........................................................

Thanks to Sean Gillies for commenting on my last post; he put me onto a couple of Python packages that he’s been involved in creating that allow you to do some really excellent geospatial things. The shapely package is a great implementation of a lot of spatial analyses that you can do on projected (i.e. flattened) datasets, including topological operations and a full set of object types. The descartes package allows better integration of matplotlib with spatial data, particularly in terms of not having to use the “fill” plotting function repeatedly, instead creating a more efficient set of “patches” which can then be added to the figure plot. The overall impression I got from descartes is that it wasn’t spectacularly different from the method detailed in my previous post, but it gives you more control and stability over the map plotting process: whereas using raw matplotlib you are inclined to hope that the map outputs correctly (it all seems a bit up to chance), using descartes you have a more robust and easily manipulable output.

In order to test this I rewrote my previous thematic map script to: firstly convert the shapefile geometries into shapely polygons, and secondly to pass those shapely polygons to descartes and draw a map plot using descartes-matplotlib. The only slightly odd piece of functionality that I found was that you can’t pass the shapely polygon object a list of shapely points in order to create the polygon, rather you have to pass a list of x,y tuples – much less satisfying!

Nonetheless, the changes were easy to implement and, starting from the previous script as given, basically comprise:

from shapely.geometry import Polygon

# Vertices of the ith geometry as read by shpUtils in the previous script
# (single-part geometry assumed, hence parts[0])
vertices = shpRecords[i]['shp_data']['parts'][0]['points']

# Polygon wants a list of (x, y) tuples rather than shapely points
points = [(float(v['x']), float(v['y'])) for v in vertices]
polygon = Polygon(points)

The above method creates a simple polygon without holes; shapely can accommodate holes if need be though. Having created the shapely polygons, all that remains is to create a patch.

from descartes import PolygonPatch

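# colour and line settings are passed as ordinary matplotlib patch keywords
# (the values here are only examples)
patch = PolygonPatch(polygon, fc='#66A61E', ec='0.7', lw=0.1)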

Then you simply add the patch to the matplotlib figure you have already created so:

from matplotlib import pyplot

fig = pyplot.figure(1, figsize=(10, 10), dpi=300)   # create 10x10 inch figure
ax = fig.add_subplot(111)    # add the map frame (single plot)

# here you create all the polygons and patches

ax.add_patch(patch)   # simply add the patch to the subplot
# set plot vars; for a whole map use the extent of all the polygons
xmin, ymin, xmax, ymax = polygon.bounds
ax.set_xlim(xmin, xmax)
ax.set_ylim(ymin, ymax)
ax.set_aspect(1)

pyplot.show()

Using these basics I was able to create a basic OAC map using Welsh OAs as an example:



A Thematic Map in Python

.........................................................

I thought I would explore the possibility of creating thematic maps using Python; this post documents my initial attempt. The output is hence rather basic, but encouraging. The primary reason that I wanted to test the mapping potential of Python is to allow for some basic automated map production, in order to quickly visually assess the geographical patterns contained within large data sets. This is something that I am at a loss to do in ESRI’s ArcGIS, although that might change in ArcGIS 10. For fans of R, I know it can be done there; however, R is too tricky for me! My colleague James Cheshire explains the method in R here.

The first hurdle in map making is getting the data in. For this I used the shapefile reader that Zachary Forest Johnson put together for his excellent blog ‘IndieMaps.com’. This allowed me to read in any of my masses of pre-existing Shapefile format datafiles, and indeed to use the Python scripting functionality in ArcGIS to perform spatial operations and then output a map quickly, without the hassle of dealing with ArcGIS layouts.

Once you have downloaded the shapefile reader, it is easily implemented using:

import shpUtils   #imports the shapefile reader
#Load a shapefile into an object called shpRecords
# (a leading backslash would make '\f' a form-feed escape, so give a plain path)
shpRecords = shpUtils.loadShapefile('filename.shp')

This is undoubtedly simple. What you then have is a (slightly) complex object which contains all of the shapefile data nested as lists and dictionaries. In order to get my head round this I spent some time investigating it; a standard shapefile that contains areal geographies (e.g. UK Output Areas) will have a similar set-up to this:

  • The first list (shpRecords[i]) holds one entry per complete geometry, which corresponds to one row in the attribute table. Thus a shapefile containing a single polygon has 1 row in the attribute table and 1 list entry (list index 0) in Python.
  • The second dictionary (shpRecords[i]['key']) records two branches, reporting either the ‘dbf_data’ from the attribute table, or the ‘shp_data’ from the .shp file describing the underlying geometry.
  • Choosing the ‘dbf_data’ key (shpRecords[i]['dbf_data']) allows you to see the attributes recorded column-by-column for each row (and hence each geometry) in the attribute table. Thus shpRecords[i]['dbf_data']['name'] will return the attribute value for the field ‘name’ for the ith geometry in the shapefile.
  • Choosing the ‘shp_data’ key (shpRecords[i]['shp_data']) allows you to access the various components of the shapefile’s geometry. In the case of a polyline/polygon you get dictionary items ‘ymax’, ‘ymin’, ‘xmax’, ‘xmin’, ‘numpoints’, ‘numparts’ and ‘parts’. Clearly the first 6 items are properties of the ith geometry you are querying, so it allows you to form a bounding box, get the number of vertices in the line/polygon, and draw separate lines/polygons if the shapefile is setup to have spatially discontinuous shapes for each row.
  • The thing we are most interested in is the ‘parts’ dictionary key, as this contains all the coordinates for the particular geometry being considered, this is accessed as: shpRecords[i]['shp_data']['parts']. The next list (shpRecords[i]['shp_data']['parts'][j]) thus allows you to distinguish between parts in a multipart file. i.e. the jth part of the ith geometry.
  • Having come this far, each part holds a ‘points’ list with one entry per vertex, and each entry is a final dictionary that simply offers us ‘x’ or ‘y’. Thus the x-coordinate of the kth vertex of the jth part of the ith geometry is accessed by: shpRecords[i]['shp_data']['parts'][j]['points'][k]['x'], simple!
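To make the nesting easier to picture, here is a tiny hand-built object with the same shape as the one described in the list above. It is a mock for illustration only: the values are invented, it does not come from shpUtils, and the ‘points’ level is taken from the way the plotting script in this post indexes the data.

```python
# A mock of the nested structure for a shapefile holding one square polygon.
shp_records = [
    {
        'dbf_data': {'name': 'Example area', 'supergroup': 7.0},
        'shp_data': {
            'xmin': 0.0, 'xmax': 1.0, 'ymin': 0.0, 'ymax': 1.0,
            'numpoints': 5, 'numparts': 1,
            'parts': [
                {'points': [
                    {'x': 0.0, 'y': 0.0},
                    {'x': 1.0, 'y': 0.0},
                    {'x': 1.0, 'y': 1.0},
                    {'x': 0.0, 'y': 1.0},
                    {'x': 0.0, 'y': 0.0},  # closing vertex repeats the first
                ]},
            ],
        },
    },
]

i, j, k = 0, 0, 2  # geometry index, part index, vertex index

# Attribute table value for the ith geometry.
name = shp_records[i]['dbf_data']['name']

# Bounding box of the ith geometry.
geometry = shp_records[i]['shp_data']
bbox = (geometry['xmin'], geometry['ymin'], geometry['xmax'], geometry['ymax'])

# Coordinates of the kth vertex of the jth part.
vertex = geometry['parts'][j]['points'][k]
print(name, bbox, (vertex['x'], vertex['y']))
```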

I have been using matplotlib, a Python library for scientific visualisation, a lot recently, and have found it a very simple and powerful resource, so I thought I’d see if it could be made to draw a map.

Firstly import the pyplot element which does all the figure drawing:

import matplotlib.pyplot as plt

Now let’s use the “fill” component of matplotlib to draw all the geometries in a shapefile; my shapefile is Output Areas in Southwark. Firstly we need to loop through each geometry, and then draw a polygon using all the points contained within each geometry. I omitted a loop for multipart geometries as my shapefile has none; however, this would be very easy if the data did have multiple parts: simply add a loop in the middle!

for i in range(len(shpRecords)):
    # x and y are empty lists to be populated with the coords of each geometry.
    x = []
    y = []
    # The parts list is [0] as it is singlepart.
    points = shpRecords[i]['shp_data']['parts'][0]['points']
    for j in range(len(points)):
        # j runs over the vertices in the ith geometry.

        # get x and y coordinates.
        tempx = float(points[j]['x'])
        tempy = float(points[j]['y'])
        x.append(tempx)
        y.append(tempy)  # Populate the lists

    # Creates a polygon in matplotlib for each geometry in the shapefile
    plt.fill(x, y)

plt.axis('equal')
# This sets the x and y axes as equal intervals.
# NB this script will only work for projected data, for geographical
# coordinate systems get ready to do some maths

plt.show()  # Draws the map!
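For data that does have multipart geometries, the extra loop might look something like the sketch below. The helper name is hypothetical and the two-part record is mock data, built to match the structure used in the script; each pair of lists it yields could be handed to plt.fill in place of the x and y lists.

```python
def iter_rings(shp_records):
    """Yield (xs, ys) coordinate lists for every part of every geometry."""
    for record in shp_records:
        for part in record['shp_data']['parts']:  # the loop 'in the middle'
            xs = [float(point['x']) for point in part['points']]
            ys = [float(point['y']) for point in part['points']]
            yield xs, ys


# Mock record with two spatially separate parts (e.g. a mainland and an island).
mock_records = [
    {'shp_data': {'parts': [
        {'points': [{'x': 0, 'y': 0}, {'x': 2, 'y': 0}, {'x': 2, 'y': 2}, {'x': 0, 'y': 0}]},
        {'points': [{'x': 5, 'y': 5}, {'x': 6, 'y': 5}, {'x': 6, 'y': 6}, {'x': 5, 'y': 5}]},
    ]}},
]

rings = list(iter_rings(mock_records))
for xs, ys in rings:
    print(xs, ys)
```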

This is the simplest form of the script; it will simply draw the shapefile with each area filled with an arbitrary colour. This is not that useful, but it is easy to create a thematic map of categorical data, so let’s investigate a way of doing that. I’ve got data for the Output Area Classification, which is a clustering of areas by social characteristics. I know that there are 7 supergroups in the classification, named numerically, so before all the processing of the shapefile I can create a dictionary of colour choices for each group. I’m using hexadecimal colours that I got from Cynthia Brewer’s website for a ‘qualitative’ 7 class classification. The dictionary looks like this:

oacSGroups = {'1':'#A6761D','2':'#E6AB02','3':'#66A61E','4':'#E7298A',\
'5':'#7570B3','6':'#D95F02','7': '#1B9E77'}

Thus the key ‘1’ returns the associated hex colour; this can be linked to the ‘dbf_data’ key in the shapefile. In the plt.fill() component I simply have to specify the colour choice, so we alter the line in the above script to read:

plt.fill(x,y,fc = oacSGroups[str(int(shpRecords[i]['dbf_data']['supergroup']))]\
,ec = '0.7',lw=0.1)

‘fc’ is the ‘face colour’ (the fill): we are asking Python to make the colour equal to the value in the oacSGroups dictionary where the key is the value contained in the attribute table for the ith row in the ‘supergroup’ field. Thus if the ith row had a ‘supergroup’ value of ‘7’ the face colour would be set to ‘#1B9E77’. ‘ec’ is ‘edge colour’ and ‘lw’ is line width; here I have set the values to display fine, light grey lines.
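The str(int(...)) conversion is there because the attribute value may come back as a number such as 7.0 while the dictionary keys are strings. A slightly more defensive version of the lookup could be wrapped in a small helper like the one below; the helper name and the grey fallback colour are my own additions for the example.

```python
OAC_COLOURS = {'1': '#A6761D', '2': '#E6AB02', '3': '#66A61E', '4': '#E7298A',
               '5': '#7570B3', '6': '#D95F02', '7': '#1B9E77'}


def supergroup_colour(value, default='#CCCCCC'):
    """Map an attribute value such as 7, 7.0 or '7' to its hex colour.

    Values that are missing or not a known supergroup get the default
    colour (an assumed light grey) instead of raising an error.
    """
    try:
        key = str(int(float(value)))
    except (TypeError, ValueError):
        return default
    return OAC_COLOURS.get(key, default)


print(supergroup_colour(7.0))   # a float from the attribute table
print(supergroup_colour('3'))   # a string works too
print(supergroup_colour(None))  # missing data falls back to grey
```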

Finally, as basic a map as this will turn out to be, we wouldn’t be anywhere without a legend. The following is a very basic, wholly manual way to add a legend to the map:

p1 = plt.Rectangle((0, 0), 1, 1, fc="#A6761D")
p2 = plt.Rectangle((0, 0), 1, 1, fc="#E6AB02")
p3 = plt.Rectangle((0, 0), 1, 1, fc="#66A61E")
p4 = plt.Rectangle((0, 0), 1, 1, fc="#E7298A")
p5 = plt.Rectangle((0, 0), 1, 1, fc="#7570B3")
p6 = plt.Rectangle((0, 0), 1, 1, fc="#D95F02")
p7 = plt.Rectangle((0, 0), 1, 1, fc="#1B9E77")

plt.legend([p1,p2,p3,p4,p5,p6,p7], ["Super Group 1","Super Group 2",\
"Super Group 3","Super Group 4","Super Group 5","Super Group 6","Super Group 7"], loc = 4)

This simply creates 7 rectangular patches which don’t appear on the plotted output, but instead are passed to the legend creator. Each rectangle has the appropriate colour to match the mapped representation, and a label; the rectangles and labels are passed to the legend as two ordered lists. The ‘loc’ tag sets where the legend will appear; 4 denotes the bottom right corner. The ‘title’ tag allows you to add a title to the legend as a string.

An example output looks something like this:

This took a couple of seconds to produce, and accounts for 846 individual geometries, which actually have quite a number of vertices.

I’ll update the blog should I find new methods to visualise spatial data in python.
