A Thematic Map in Python


I though I would explore the possibility of creating thematic maps using Python, this post documents my initial attempt. The output is hence rather basic, but encouraging. The primary reason that I wanted to test the mapping potential of python is to allow for some basic automated map production in order to quickly visually assess the geographical patterns contained within large data sets. This is something that I am at a loss to do in ESRI’s ArcGIS, although that might change in ArcGIS 10. For fans of R I know it can be done there, however R is too tricky for me! My colleague James Cheshire explains the method in R here.

The first hurdle in map making is getting the data in, for this I used the shapefile reader that Zachary Forest Johnson put together for his excellent blog ‘IndieMaps.com‘. This allowed me read in any of my masses of pre-existing Shapefile format datafiles, and indeed use the python scripting functionality in ArcGIS to perform spatial operations and then output a map quickly and without the hassle of dealing with ArcGIS layouts.

Once you have download the shapefile reader, it is easily implemented using:

import shpUtils   #imports the shapefile reader
#Load a shapefile into an object called shpRecords
shpRecords = shpUtils.loadShapefile('\filename.shp')

This is undoubtedly simple, what you then have is a (slightly) complex object which contians all of the shapefile data nested as lists and dictionaries. In order to get my head round this I spent some time investigating it, a standard shapefile that contains areal geographies (i.e. UK Output Areas) will have a similar set up to this:

  • The first list (shpRecords[i]) records the number of complete geometries, this corresponds to the number of rows in the attribute table. Thus a single polygon has 1 row in the attribute table and 1 list (list index 0) in Python.
  • The second dictionary (shpRecords[i]['key']) records two branches, reporting either the ‘dbf_data’ from the attribute table, or the ‘shp_data’ from the .shp file describing the underlying geometry.
  • Choosing the ‘dbf_data’ key (shpRecords[i]['dbf_data']) allows you to see the attributes recorded column-by-column for each row (and hence each geometry) in the attribute table. Thus shpRecords[i]['dbf_data']['name'] will return the attribute value for the field ‘name’ for the ith geometry in the shapefile.
  • Choosing the ‘shp_data’ key (shpRecords[i]['shp_data']) allows you to access the various components of the shapefile’s geometry. In the case of a polyline/polygon you get dictionary items ‘ymax’, ‘ymin’, ‘xmax’, ‘xmin’, ‘numpoints’, ‘numparts’ and ‘parts’. Clearly the first 6 items are properties of the ith geometry you are querying, so it allows you to form a bounding box, get the number of vertices in the line/polygon, and draw separate lines/polygons if the shapefile is setup to have spatially discontinuous shapes for each row.
  • The thing we are most interested in is the ‘parts’ dictionary key, as this contains all the coordinates for the particular geometry being considered, this is accessed as: shpRecords[i]['shp_data']['parts']. The next list (shpRecords[i]['shp_data']['parts'][j]) thus allows you to distinguish between parts in a multipart file. i.e. the jth part of the ith geometry.
  • Having come this far, one final dictionary allows us to see the coordinates themselves, this dictionary simply offers us ‘x’ or ‘y’. Thus finding the x-coordinate of the ith geometry and jth part is accessed by: shpRecords[i]['shp_data']['parts'][j]['x'] – simple!

I have been using matplotlib – a python library for scientific visualisation a lot recent, and have found it a very simple and powerful resource, so I thought I’d see if it could be made to draw a map.

Firstly import the pyplot element which does all the figure drawing:

import matplotlib.pyplot as plt

Now lets use the “fill” component of matplotlib to draw all the geometries in a shapefile – my shapefile is Output Areas in Southwark. Firstly we need to loop through each geometry, and then draw a polygon using all the points contained within each geometry. I omitted a loop for multipart geometries as my shapefile has none, however this would be very easy if the data did have multiple parts- simply add a loop in the middle!

for i in range(0,len(shpRecords)):
 # x and y are empty lists to be populated with the coords of each geometry.
 x = []
 y = []
 for j in range(0,len(shpRecords[i]['shp_data']['parts'][0]['points'])):
  # This is the number of vertices in the ith geometry.
  # The parts list is [0] as it is singlepart.

  # get x and y coordinates.
  tempx = float(shpRecords[i]['shp_data']['parts'][0]['points'][j]['x'])
  tempy = float(shpRecords[i]['shp_data']['parts'][0]['points'][j]['y'])
  y.append(tempy) # Populate the lists  

 # Creates a polygon in matplotlib for each geometry in the shapefile

# This sets the x and y axes as equal intervals.
# NB this script will only work for projected data, for geographical
# coordinate systems get ready to do some maths  

plt.show() # Draws the map!

This is the simplest form of the script, it will simply draw the shapefile with each area filled a random colour. This is not that useful, but it is easy to create a thematic maps of categorical data, so let investigate a way of doing that. I’ve got data for the Output Area Classification, which is a clustering of areas by social characteristics, I know that there are 7 supergroups in the classification, named numerically, so before all the processing of the shapefile I can create a dictionary of colour choices for each group. I’m using hexadecimal colours that I got from Cynthia Brewer’s website for a ‘qualitative’ 7 class classification. The dictionary looks like this:

oacSGroups = {'1':'#A6761D','2':'#E6AB02','3':'#66A61E','4':'#E7298A',\
'5':'#7570B3','6':'#D95F02','7': '#1B9E77'}

Thus the key ’1′ returns the associated hex colour, this can be linked to the ‘dbf_data’ key in the shapefile. In the plt.fill() component I simply have to specify the colour choice, thus we alter the line in the above script to read:

plt.fill(x,y,fc = oacSGroups[str(int(shpRecords[i]['dbf_data']['supergroup']))]\
,ec = '0.7',lw=0.1)

‘fc’ is the ‘foreground colour’ we are asking python to make the colour equal to the value in the oacSGroups dictionary where the key is the value contained in the attribute table for the ith row in the ‘supergroup’ field. Thus if the ith row had a ‘supergroup’ value of ’7′ that foreground colour would be set to ‘#1B9E77′. ‘ec’ is ‘edge colour’ and ‘lw’ is linewidth, here I have set the values to display fine, light grey lines.

Finally, as basic a map as this will turn out to be, we wouldn’t be anywhere without a legend. The following a a very basic, wholy manual way to add a legend to the map:

p1 = plt.Rectangle((0, 0), 1, 1, fc="#A6761D")
p2 = plt.Rectangle((0, 0), 1, 1, fc="#E6AB02")
p3 = plt.Rectangle((0, 0), 1, 1, fc="#66A61E")
p4 = plt.Rectangle((0, 0), 1, 1, fc="#E7298A")
p5 = plt.Rectangle((0, 0), 1, 1, fc="#7570B3")
p6 = plt.Rectangle((0, 0), 1, 1, fc="#D95F02")
p7 = plt.Rectangle((0, 0), 1, 1, fc="#1B9E77")

plt.legend([p1,p2,p3,p4,p5,p6,p7], ["Super Group 1","Super Group 2",\
"Super Group 3","Super Group 4","Super Group 5","Super Group 6","Super Group 7"], loc = 4)

This simply creates 7 rectangular plots which don’t appear on the plotted output, but instead are passed to the legend creator, each rectangle has the appropriate colour to match the mapped representation, and a label, shown int he legend as two ordered lists. The ‘loc’ tag allows the setting of where the legend will appear, 4 denotes the bottom right corner. the tag ‘title’ allows you to add a title to the legend as a string.

An example output looks something like this:

This took a couple of seconds to produce, and accounts for 846 individual geometries, which actually have quite a number of vertices.

I’ll update the blog should I find new methods to visualise spatial data in python.

~ End Article and Begin Conversation ~

~ Now It's Your Turn ~

Feel free to use <strong>, <em>, and <a href="">


The Blogroll

Search this Site