<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Volunteered Geographic Information &#187; python</title>
	<atom:link href="http://danieljlewis.org/tag/python/feed/" rel="self" type="application/rss+xml" />
	<link>http://danieljlewis.org</link>
	<description>A Geography/GIS blog by Daniel J Lewis</description>
	<lastBuildDate>Tue, 20 Dec 2011 17:15:30 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Weighted Mean Direction Surfaces in Python</title>
		<link>http://danieljlewis.org/2011/08/31/weighted-mean-direction-surfaces-in-python/</link>
		<comments>http://danieljlewis.org/2011/08/31/weighted-mean-direction-surfaces-in-python/#comments</comments>
		<pubDate>Wed, 31 Aug 2011 13:18:18 +0000</pubDate>
		<dc:creator>Daniel Lewis</dc:creator>
				<category><![CDATA[GIS]]></category>
		<category><![CDATA[Modeling]]></category>
		<category><![CDATA[Representation]]></category>
		<category><![CDATA[Southwark]]></category>
		<category><![CDATA[Brunsdon]]></category>
		<category><![CDATA[Charlton]]></category>
		<category><![CDATA[circular statistics]]></category>
		<category><![CDATA[mean direction]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[weighting]]></category>

		<guid isPermaLink="false">http://danieljlewis.org.blogs.splintdev.geog.ucl.ac.uk/?p=537</guid>
		<description><![CDATA[I work a lot with flows and spatial interactions, one thing that I&#8217;ve wanted to do for a while is compute a mean flow direction surface. Unfortunately, arithmetic means don&#8217;t work for angular data, this is because it cannot account for the circular nature of the distribution of angular measurements. For instance the angles 5 [...]]]></description>
			<content:encoded><![CDATA[
<p>I work a lot with flows and spatial interactions, and one thing that I&#8217;ve wanted to do for a while is compute a mean flow direction surface. Unfortunately, arithmetic means don&#8217;t work for angular data, because they cannot account for the circular nature of angular measurements. For instance, the angles 5 degrees and 355 degrees are separated by only 10 degrees, but their arithmetic mean is 180 degrees. That is way off: it should be 0 degrees!</p>
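<p>To make the 5/355 example concrete, here is a minimal sketch that compares the arithmetic mean with a circular mean. It uses only Python&#8217;s standard math and cmath modules, and borrows the complex-number idea described later in this post; the function name circular_mean_deg is my own, for illustration only.</p>

```python
import cmath
import math

def circular_mean_deg(angles_deg):
    """Mean direction, in degrees, of a list of angles given in degrees."""
    # Represent each angle as a unit vector in the complex plane and add them up.
    total = sum(cmath.exp(1j * math.radians(a)) for a in angles_deg)
    # The argument (phase) of the resultant vector is the mean direction.
    return math.degrees(cmath.phase(total))

angles = [5.0, 355.0]
arithmetic = sum(angles) / len(angles)  # the misleading arithmetic mean
circular = circular_mean_deg(angles)    # very close to 0 degrees
```

<p>The arithmetic mean points in exactly the wrong direction, whereas the circular mean lands (up to floating point error) on 0 degrees.</p>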
<p>Luckily, <a title="Local trend Statistics for Direction Data" href="http://leicester.academia.edu/ChrisBrunsdon/Papers/534394/Local_trend_statistics_for_directional_data--A_moving_window_approach">Brunsdon and Charlton</a> have published on this very subject, so I took it upon myself to implement a weighted circular mean function in Python. The key obstacle was learning about complex numbers, about which I had known nothing at all up until this point!</p>
<p>The first thing to do is calculate the angle between a set of candidate points (such as people) and a set of services (such as Medical Centres). This is simple enough to do using the atan2 function from Python&#8217;s math module, and would look something like:</p>
<pre>import math</pre>
<pre>math.atan2((y2-y1),(x2-x1))</pre>
<p>Here the pair (x1,y1) is the location of the candidate point, and (x2,y2) the location of the allocated service for that candidate point. The line linking these two points defines a flow from a candidate point to a service, and vice versa.</p>
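<p>As a small worked example of the angle calculation, the sketch below wraps atan2 in a function and applies it to a made-up candidate point and service. The coordinates and the name flow_angle are hypothetical.</p>

```python
import math

def flow_angle(x1, y1, x2, y2):
    """Angle in radians of the flow from (x1, y1) to (x2, y2), measured from east."""
    return math.atan2(y2 - y1, x2 - x1)

# Hypothetical projected coordinates: the service lies due north of the candidate.
candidate = (530000.0, 180000.0)
service = (530000.0, 180500.0)

theta = flow_angle(candidate[0], candidate[1], service[0], service[1])
theta_deg = math.degrees(theta)  # 90 degrees, i.e. north in this convention
```

<p>Note that atan2 returns radians in the range -pi to pi, measured anticlockwise from east, which is why north comes out as 90 degrees.</p>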
<p>Having calculated all of the angles, I used ArcGIS to create an output grid, at the extent of the candidate points, using the &#8220;fishnet&#8221; function which creates a vector grid of prespecified dimensions.</p>
<p>The beauty of Brunsdon and Charlton&#8217;s method is that it uses a local method of approximation. This means that for each cell in the output grid, a mean direction can be calculated based upon the values of nearby points, and applying a weighting allows more distant points to have less of an effect on the mean direction.</p>
<p>Firstly, I read all the candidate points into a KDTree structure, which allows me to search for local points. At the same time I also create an array of the angles for those candidate points.</p>
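<pre>from scipy.spatial import cKDTree
import numpy as np

# treepoints: (n, 2) array of candidate point coordinates
tree = cKDTree(treepoints)
# testpoint: a (1, 2) array holding the coordinates of one grid cell
res, idx = tree.query(testpoint, k=300000, eps=0, p=2, distance_upper_bound=100)
# Missing neighbours come back with distance inf; keep only the real ones
res = res[0][np.where(res[0] &lt; np.inf)[0]]
idx = idx[0][:len(res)]</pre>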
<p>The tree takes a numpy array of coordinate pairs, and the query method returns an array of distances to points (res) and their index values in the original array of coordinates (idx). The testpoint is a cell in the vector grid; 300000 is k, the number of nearest neighbours to find, which I have simply set arbitrarily high in the context of my dataset; 0 is the tolerance for approximate nearest neighbours, so here I&#8217;ve specified an exact search; 2 indicates the use of Euclidean distance; and 100 is the distance threshold, so neighbours won&#8217;t be returned if they are further than 100 metres away. Points over 100m away are returned with the distance value Inf, so the penultimate line simply shortens the array to just those distances which are less than infinity (i.e. less than 100m away).</p>
<p>The next step is to actually compute the mean direction, which requires a special approach using complex numbers. Brunsdon and Charlton show that a direction can be stated as a complex number <em>z</em> in which <em>z = exp(iθ)</em>, which is equivalent to <em>z = cos(θ) + i sin(θ)</em>, where <em>i</em> is the imaginary unit. We can restate our directions in Python using:</p>
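<pre>import numpy as np

# angles holds one angle (in radians) per candidate point, in tree order
thetas = angles[idx]
cThetas = []
for i in range(0,len(thetas)):
    # cos(theta) + i sin(theta), i.e. exp(i * theta)
    cThetas.append(complex(np.cos(thetas[i]),np.sin(thetas[i])))
cThetas = np.array(cThetas)</pre>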
<p>Here, the complex function allows the complex number representing an angle to be stored in a list, which I convert (lazily) to a numpy array. The first term, thetas, is using the idx array from the cKDTree to cleverly index the relevant angle records from the angles array which stores all the angle values in the order of entries for the cKDTree.</p>
<p>Next a temporary variable is created which calculates the mean direction:</p>
<pre>temp = np.sum(cThetas)/np.absolute(np.sum(cThetas))
MeanDir = np.angle(temp, deg = True)</pre>
<p>The mean direction is given by the argument (Arg) of the resultant complex number. NumPy implements this with the np.angle function, where deg = True returns the angle in degrees, and False in radians.</p>
<p>So far this is the unweighted mean, aggregating directional observations within a 100m disk (see also: uniform disk smoothing). To introduce weighting we must first define a weighting scheme. I&#8217;ve used the one suggested by Brunsdon and Charlton, which is Gaussian, and might look a bit like this:</p>
<pre>def gaussW(dists,band):
    # Gaussian distance decay: exp(-d^2 / (2 * band^2)) for each distance d
    dists = np.asarray(dists, dtype=float)
    return np.exp(-1.0 * np.power(dists,2)/(2.0*np.power(float(band),2)))

weight = gaussW(res,100)</pre>
<p>Quite simply, I pass the distance array res to the gaussW function and it gives me back an array of weights for that ordering of distances. Using this I can redo the mean direction thus:</p>
<pre>temp = np.sum(weight*cThetas)/np.absolute(np.sum(weight*cThetas))
MeanDir = np.angle(temp, deg = True)</pre>
<p>There you have it! Attached is the script I used. Obviously, Brunsdon and Charlton implement a variance and a couple of visualisation devices, but these should be simple enough to implement now!</p>
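<p>As a rough sketch of how a variance might be added, the block below pulls the weighting and mean direction steps into two functions and also returns a circular variance. I am assuming the common definition of circular variance as one minus the (weighted) mean resultant length, which may not match Brunsdon and Charlton&#8217;s formulation exactly. The function names and the input values are hypothetical.</p>

```python
import numpy as np

def gauss_weights(dists, band):
    """Gaussian distance-decay weights, equivalent to the gaussW function above."""
    dists = np.asarray(dists, dtype=float)
    return np.exp(-dists ** 2 / (2.0 * float(band) ** 2))

def weighted_direction_stats(thetas, weights):
    """Return (mean direction in degrees, circular variance) for angles in radians."""
    weights = np.asarray(weights, dtype=float)
    z = np.exp(1j * np.asarray(thetas, dtype=float))  # unit vectors exp(i * theta)
    resultant = np.sum(weights * z)
    mean_dir = np.angle(resultant, deg=True)
    # Assumed definition: variance = 1 - weighted mean resultant length
    r_bar = np.abs(resultant) / np.sum(weights)
    return mean_dir, 1.0 - r_bar

# Hypothetical flows heading roughly east, at 10, 50 and 90 metres from the cell
thetas = np.radians([10.0, 350.0, 0.0])
dists = np.array([10.0, 50.0, 90.0])
mean_dir, circ_var = weighted_direction_stats(thetas, gauss_weights(dists, 100))
```

<p>Taking np.angle of the resultant directly gives the same direction as normalising it first, and avoids a division by zero when the weighted vectors cancel out exactly. A variance near 0 means the nearby flows agree closely; a value near 1 means they point in all directions.</p>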
<p>I created an output for flows of patients to GPs in Southwark, visualised using one of ESRI&#8217;s circular/direction colour ramps from <a title="Mapping Resources" href="http://mappingcenter.esri.com/index.cfm?fa=arcgisResources.gateway">colour ramp pack 2</a>. Not sure how best to visualise the legend at this point though. NB. 90 is north, -90 is South, 0/-0 is East and 180/-180 is West. The map is visualised to show the 4 cardinal directions, but the output is in fact continuous.</p>
<p style="text-align: left"><a href="http://danieljlewis.org/files/2011/08/MeanDirectionFlows.png"><img class="aligncenter size-large wp-image-538" src="http://danieljlewis.org/files/2011/08/MeanDirectionFlows-724x1024.png" alt="" width="434" height="614" /></a>My example script is <a href="http://danieljlewis.org/files/2011/08/meanDirection.txt">here. </a> Note that I am using dbfpy to read and write to shapefile DBF tables directly.</p>
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2011/08/31/weighted-mean-direction-surfaces-in-python/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Finnish Municipalities: A case for zone design?</title>
		<link>http://danieljlewis.org/2010/11/30/finnish-municipalities-a-case-for-zone-design/</link>
		<comments>http://danieljlewis.org/2010/11/30/finnish-municipalities-a-case-for-zone-design/#comments</comments>
		<pubDate>Tue, 30 Nov 2010 20:05:39 +0000</pubDate>
		<dc:creator>Daniel Lewis</dc:creator>
				<category><![CDATA[Geography]]></category>
		<category><![CDATA[GIS]]></category>
		<category><![CDATA[Modeling]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[finland]]></category>
		<category><![CDATA[Maps]]></category>
		<category><![CDATA[municipal]]></category>
		<category><![CDATA[pysal]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[zone design]]></category>

		<guid isPermaLink="false">http://danieljlewis.org.blogs.splintdev.geog.ucl.ac.uk/?p=438</guid>
		<description><![CDATA[In Finland, municipalities are incredibly powerful; like local authorities in the UK, municipalities are responsible for local administration, but they also levy an income tax and are responsible for providing most public services. Municipalities were founded on the assumption of equality, which forms the basis for the reform considerations currently ongoing in the Finnish government. [...]]]></description>
			<content:encoded><![CDATA[
<p>In Finland, municipalities are incredibly powerful; like local authorities in the UK, municipalities are responsible for local administration, but they also levy an income tax and are responsible for providing most public services. Municipalities were founded on the assumption of equality, which forms the basis for the reform considerations currently ongoing in the Finnish government. The fact is, municipalities simply aren&#8217;t equal: they vary wildly in population size and area. Numbering 342, they are roughly as numerous as UK local authorities, despite Finland itself having around 1/11th of the UK&#8217;s population. The population skew is greatly emphasised by the presence of cities such as Helsinki which cross municipal boundaries, and more remote municipalities whose geographical extent was set out by horse-and-cart distance. It is therefore understandable that the Finnish government would be interested in the possibility of mergers to reduce the number of municipalities, and create a system in which municipalities serve a similar number, or at least a threshold, population. It is believed that this would make administration more efficient, as services can be centralised to a greater degree, and small municipalities which already share services can formalise this.</p>
<p>Some mergers have already taken place, and there are governmental incentives for merging. However, taking the model specified by the Association of Finnish Local and Regional Authorities (<em>Kuntaliitto</em>), that municipalities should be reformed such that they have a base population of 20-30,000 people, we can apply automated zone design scenarios to test the &#8216;what if&#8217; aspect of creating a new Finnish municipal system based on preserving different characteristics.</p>
<p>The zone design tool that I use to test a few basic scenarios is the regionalization library in pySAL, a spatial analysis module for Python. This implements the max-P algorithm for spatially constrained clustering subject to a similarity matrix and a threshold value. I specified the research design so that I was testing for the optimal new aggregation of the pre-existing municipalities and tested 3 different scenarios: 1) No similarity measure (all municipalities assumed equal, but for population), 2) Similarity based on municipal tax regime, and 3) Similarity based upon municipal tax regime and % non-Finnish speakers, which accounts culturally for the Sami people of Lapland and ethnically Swedish Finns.</p>
<p>The regionalisation requires that you create a contiguity matrix for the zones. I arbitrarily chose the queen case, and added bespoke contiguity for Finnish islands based upon proximity; this is easy to do in GeoDa, which outputs a .gal file that you can read into Python. Then all you really need to do is the following:</p>
<pre>import pysal
import numpy as np #required as pysal uses numpy arrays

#Read in your population and similarity data in some way,
#I tend to create a python list from a csv.

#convert the population and similarity data into
#numpy arrays, from lists called pop and sim
pop = np.asarray(pop)
sim = np.asarray(sim)

# Read in your precomputed Weights matrix
w = pysal.open("...\\QueenWeights.gal").read()

# Create an (optional) array of 1s to represent equality
# (replace sim in Maxp function call)
nosim = np.ones((342,1))

# Run solutions for maxp algorithms with specific parameters
r= pysal.Maxp(w,sim,floor = 20000,floor_variable = pop,initial=100)

# Write r.regions to outfile to get regional assignments,
#this can be joined to shp in ArcGIS.
</pre>
<p>The maxP algorithm works by first randomly creating a set of possible zoning configurations, then it chooses the current optimum and seeks to refine it using a computationally expensive zone-swapping method. Optimality in this case is defined by minimising dissimilarity whilst obeying the threshold population constraint. The <a title="pySAL Documentation" href="http://www.pysal.org/users/tutorials/region.html" target="_blank">API reference</a> for regionalisation in pysal is very good.</p>
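<p>To illustrate what &#8220;minimising dissimilarity whilst obeying the threshold population constraint&#8221; means, here is a toy sketch that scores one candidate zoning. It is not pysal&#8217;s implementation: the pairwise absolute difference used as the dissimilarity measure, the function name check_solution and all of the data are my own assumptions for illustration.</p>

```python
from itertools import combinations

def check_solution(regions, pop, sim, floor):
    """Return (feasible, dissimilarity) for one candidate zoning.

    regions: list of lists of municipality indices
    pop, sim: population and similarity value per municipality
    floor: minimum population each region must reach
    """
    # Constraint: every region must reach the population threshold.
    feasible = all(sum(pop[i] for i in region) >= floor for region in regions)
    # Objective (assumed): sum of pairwise differences within each region.
    dissimilarity = sum(
        abs(sim[i] - sim[j])
        for region in regions
        for i, j in combinations(region, 2)
    )
    return feasible, dissimilarity

# Hypothetical data for five municipalities
pop = [12000, 9000, 25000, 8000, 14000]
sim = [18.5, 19.0, 20.0, 18.0, 18.25]  # e.g. a municipal tax rate
regions = [[0, 1], [2], [3, 4]]

feasible, dissimilarity = check_solution(regions, pop, sim, floor=20000)
```

<p>A zone-swapping step would then move a municipality between neighbouring regions and keep the change only if the zoning stays feasible and the dissimilarity falls.</p>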
<p>Here are some of the results I produced for this basic approach:</p>
<div id="attachment_453" class="wp-caption aligncenter" style="width: 271px"><a href="http://danieljlewis.org/files/2010/11/NoH20000.png"><img class="size-large wp-image-453 " src="http://danieljlewis.org/files/2010/11/NoH20000-725x1024.png" alt="" width="261" height="368" /></a><p class="wp-caption-text">Zone Design - Pop &gt; 20,000 with assumption of Municipal Equality</p></div>
<div id="attachment_454" class="wp-caption aligncenter" style="width: 271px"><a href="http://danieljlewis.org/files/2010/11/TaxLang20000.png"><img class="size-large wp-image-454 " src="http://danieljlewis.org/files/2010/11/TaxLang20000-725x1024.png" alt="" width="261" height="368" /></a><p class="wp-caption-text">Zone Design - Pop &gt; 20,000 with similarity of tax regime and language</p></div>
<p>Unlike some of the more advanced zone-design algorithms, pySAL doesn&#8217;t yet provide a way of preserving or optimising area shape characteristics, so you can get sliver-like polygons forming. Nonetheless it presents an interesting insight and a set of functions from which a more advanced/bespoke algorithm could be built. As it turns out, the islands that I arbitrarily allocated to the contiguity matrix are actually quite a contentious topic and, given their strategic significance to Sweden, are in fact neutral territories which would be untouched by any redesign of municipal structure.</p>
<p>As ever, local knowledge is important, and for the economists at <a title="VATT" href="http://www.vatt.fi/en/" target="_blank">VATT</a> this is a real task to undertake. They won&#8217;t be doing anything quite so crude, but they do have a curious spatial problem to deal with.</p>
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2010/11/30/finnish-municipalities-a-case-for-zone-design/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ArcGIS 10 &#8211; Field Calculator and Python</title>
		<link>http://danieljlewis.org/2010/10/11/arcgis-10-field-calculator-and-python/</link>
		<comments>http://danieljlewis.org/2010/10/11/arcgis-10-field-calculator-and-python/#comments</comments>
		<pubDate>Mon, 11 Oct 2010 17:10:57 +0000</pubDate>
		<dc:creator>Daniel Lewis</dc:creator>
				<category><![CDATA[GIS]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[arcgis]]></category>
		<category><![CDATA[esri]]></category>
		<category><![CDATA[field calculations]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://danieljlewis.org/?p=422</guid>
		<description><![CDATA[Python has been more tightly integrated in the new release of ArcGIS 10, allowing scripting to occur directly through a Python process without even opening up ArcMap. Admittedly this was available before, but now everything is more tightly coupled and a lot cleaner in it&#8217;s implementation. However, what has really interested, and indeed confused me [...]]]></description>
			<content:encoded><![CDATA[
<p>Python has been more tightly integrated in the new release of ArcGIS 10, allowing scripting to occur directly through a Python process without even opening up ArcMap. Admittedly this was available before, but now everything is more tightly coupled and a lot cleaner in its implementation. However, what has really interested, and indeed confused, me of late is how to use Python in the &#8216;field calculator&#8217;.</p>
<p>Field Calculator is a really useful tool. When you are looking at an attribute table for a shapefile in ArcGIS and you want to derive a value for each object in the file based on a function, you can input the function into the field calculator and it will work it out for you row by row. Sometimes the value you want to derive is a bit more complicated than simple arithmetic and you need to write a script. Previously you could do this in VBA, but I always found it limited and confusing; now, however, you can do it in Python &#8211; much simpler!</p>
<p>There are a few pitfalls to using Python in ArcGIS field calculator, and so I&#8217;m going to specify how to write simple field calculator Python scripts in ArcGIS from my early experience.</p>
<p>Firstly, for Python in field calculator the way to do it seems to be to write a Python function, and then call it for each row. In addition to this, because you are writing a function you have to give it the relevant parameters (i.e. fields) with which to do the computation. Finally, and annoyingly, you have to write your function in a little box, and use a consistent indentation standard as Python requires (1 space works best, as room is limited).</p>
<p>Here is a basic recipe for achieving field calculations in ArcGIS using Python. Obviously this is overly simplistic as you do not need a script to do this calculation, but it serves as an introductory example.</p>
<p style="text-align: center"><a href="http://danieljlewis.org/files/2010/10/FieldCalculator10.png"></a><a href="http://danieljlewis.org/files/2010/10/FieldCalculator101.png"><img class="aligncenter size-full wp-image-427" src="http://danieljlewis.org/files/2010/10/FieldCalculator101.png" alt="" width="493" height="472" /></a></p>
<p>1) Name a function and parameterise it with the fields to base the calculation on. Do this in the lower box.</p>
<p style="padding-left: 30px">In the image you can see I&#8217;ve input: density( !sum_pop!, !Area!) This means send the values in the fields called sum_pop and Area to the function called density.</p>
<p>2) Define the function you are calling in the larger upper box.</p>
<p style="padding-left: 30px">You define a function in Python using the &#8220;def&#8221; command. In the image I have defined the &#8220;density&#8221; function by writing the line: def density( pop,area):</p>
<p style="padding-left: 30px">This function definition means: define a function called density which takes the parameters pop and area. The parameters could be called anything, but it is useful to call them something that makes sense for use in the function. These parameters are variable names that the function uses to identify the fields you have passed the function when you called it, as in 1).</p>
<p style="padding-left: 30px">Normally you&#8217;d do some sort of calculation within the function, however this example is so simple that all we need to do is &#8220;return&#8221; a value to the function call. This function is the density defined as population over area: pop/area.</p>
<p>Looking at the field calculator I have found that you are limited to the basic, math and datetime modules in python, without the ability to import other modules. You can however define several functions and call them from within your main function.</p>
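<p>As a small illustration of defining several functions and using the math module, the sketch below calls a density helper from within a second function. The log_density function is hypothetical, and it assumes that both population and area are greater than zero for every row.</p>

```python
import math

def density(pop, area):
    # Helper function: population per unit area
    return float(pop) / float(area)

def log_density(pop, area):
    # Main function: calls the helper, then applies a base-10 logarithm.
    # Assumes pop and area are both positive, otherwise math.log10 raises an error.
    return math.log10(density(pop, area))
```

<p>In the lower box you would then call the main function only, for example log_density( !sum_pop!, !Area!).</p>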
<p>For details on the basic syntax of using python, this site is particularly good: http://www.tutorialspoint.com/python/index.htm</p>
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2010/10/11/arcgis-10-field-calculator-and-python/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Drawing maps with Python</title>
		<link>http://danieljlewis.org/2010/09/15/drawing-maps-with-python/</link>
		<comments>http://danieljlewis.org/2010/09/15/drawing-maps-with-python/#comments</comments>
		<pubDate>Wed, 15 Sep 2010 22:23:08 +0000</pubDate>
		<dc:creator>Daniel Lewis</dc:creator>
				<category><![CDATA[Cartography]]></category>
		<category><![CDATA[mapping]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[shapefiles]]></category>
		<category><![CDATA[visualisation]]></category>

		<guid isPermaLink="false">http://danieljlewis.org.blogs.splintdev.geog.ucl.ac.uk/?p=403</guid>
		<description><![CDATA[There&#8217;s an increasing amount of useful packages that allow for spatial analysis in python. Having said that, actually drawing a map remains relatively tricky, here I am sharing a few of the methods that I have come up with recently to help in this area. Firstly, let&#8217;s consider the basic set of prerequisites that you [...]]]></description>
			<content:encoded><![CDATA[
<p>There&#8217;s an increasing number of useful packages that allow for spatial analysis in Python. Having said that, actually drawing a map remains relatively tricky, so here I am sharing a few of the methods that I have come up with recently to help in this area. Firstly, let&#8217;s consider the basic set of prerequisites that you should have installed to do some useful things in Python.</p>
<p><a title="NumPy" href="http://numpy.scipy.org/" target="_blank">Numpy</a> and <a title="SciPy" href="http://www.scipy.org/" target="_blank">Scipy</a> &#8211; easy_install Numpy / easy_install Scipy</p>
<p>Numeric Python and Scientific Python vastly extend the scientific programming capabilities of Python. Numpy adds the array() object which, for numeric matters, is far superior to the standard Python List, as well as numerous mathematical methods. SciPy then builds on these to provide interfaces to numerous mathematical techniques. I have made use of SciPy&#8217;s clustering routines, as well as elements that allow Multi Dimensional Scaling.</p>
<p><a title="matplotlib" href="http://matplotlib.sourceforge.net/" target="_blank">matplotlib</a> &#8211; easy_install matplotlib</p>
<p>Standard Python plotting library, with excellent graphic data visualisation capabilities built in and an excellent set of examples and tutorials. We can leverage the &#8216;patch&#8217; plotting capabilities for creating maps.</p>
<p><a title="Descartes package" href="http://pypi.python.org/pypi/descartes/1.0" target="_blank">descartes</a> &#8211; easy_install descartes</p>
<p>This is the matplotlib lever that allows us to draw geojson-like geometries on a matplotlib canvas effectively.</p>
<p><a title="gdal" href="http://www.gdal.org/" target="_blank">gdal</a> and <a title="OGR" href="http://www.gdal.org/ogr/index.html" target="_blank">ogr</a> &#8211; install <a title="gdal downloads" href="http://trac.osgeo.org/gdal/wiki/DownloadingGdalBinaries" target="_blank">gdal</a>, then easy_install gdal</p>
<p>This is a C/C++ library, accessible from Python, which makes reading and writing many different spatial data formats easy, including shapefiles, KML, GML etc. GDAL works specifically with raster data, and OGR with vectors. Actually setting this up can be tricky: for a start you need to make sure you have a C compiler installed, and that all the paths work. Pretty easy on a Mac (and I guess Linux), but a nightmare on Windows. Be prepared to consult forums.</p>
<p><a title="PySAL home" href="http://www.pysal.org/" target="_blank">PySAL</a> &#8211; <a title="PySAL downloads" href="http://code.google.com/p/pysal/downloads/list" target="_blank">download only as ongoing project.</a></p>
<p>This is a superb spatial analysis library that is currently under development. It provides a shapefile reader if gdal is out of the question, as well as numerous other modules. I&#8217;ve been using the ESDA (exploratory spatial data analysis) module a lot as it includes a series of useful map classification functions.</p>
<p>Now, the first attempt we will make to draw a map in Python will be a simple one. This will involve using data that is already in a projected coordinate system, so no need for coordinate transformations and we will simply visualise one aspect of the data using different classes. The model we will follow to create the map is a standard approach as such:</p>
<ol>
<li>Read in data, in this example as a shapefile</li>
<li>Parse attributes to create the boundary values for classification</li>
<li>Draw geometries, styling as per the previously defined boundary values</li>
<li>Add a Legend, North Arrow, Scale Bar and add any required text</li>
<li>Display or save the map.</li>
</ol>
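<p>Steps 2 and 3 of the list above can be sketched without any mapping libraries. The block below derives simple quantile class boundaries from an attribute and then picks a fill colour for each feature. It is only a stand-in for a proper map classification routine such as those in PySAL; the function names, the population values and the three-colour ramp are all hypothetical.</p>

```python
import bisect

def quantile_breaks(values, n_classes):
    """Upper boundary value of each class, using simple quantiles."""
    ordered = sorted(values)
    breaks = []
    for c in range(1, n_classes + 1):
        # Index of the last observation that falls in class c
        idx = (c * len(ordered)) // n_classes - 1
        breaks.append(ordered[idx])
    return breaks

def classify(value, breaks):
    """Index of the first class whose upper boundary is not below the value."""
    return bisect.bisect_left(breaks, value)

# Hypothetical attribute values, one per polygon
populations = [150, 220, 310, 180, 275, 400]
breaks = quantile_breaks(populations, 3)

# Hypothetical light-to-dark colour ramp, one colour per class
palette = ["#fee8c8", "#fdbb84", "#e34a33"]
classes = [classify(p, breaks) for p in populations]
colours = [palette[c] for c in classes]
```

<p>Each entry in colours could then be passed as the fill colour when the corresponding polygon is drawn as a matplotlib patch, and the breaks reused as the labels in the legend.</p>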
<p style="text-align: left">The attached file demonstrates the drawing of the image below:<a href="http://danieljlewis.org/files/2010/09/plot1.png"></a></p>
<p style="text-align: left"><a href="http://danieljlewis.org/files/2010/09/plot11.png"><img class="aligncenter size-full wp-image-410" src="http://danieljlewis.org/files/2010/09/plot11.png" alt="" width="542" height="409" /></a></p>
<p style="text-align: left">In creating the above map there are a few elements to remember that add to the challenge. Firstly, the north arrow is an image that I made and simply loaded into the map; secondly, the scale bar is simply an appropriately scaled square patch with some explanatory text; and thirdly, the class colours come from <a title="Colour Brewer" href="http://colorbrewer2.org/" target="_blank">colour brewer</a>. For the map I&#8217;m using the freely available population data from london.data.gov.uk and district boundaries from the OS Boundary Line open data release. These are both subject to Crown Copyright of course, but I provide them rather than force you to expressly download them. I&#8217;ve also included the north arrow image that I made in <a title="Processing" href="http://processing.org/" target="_blank">Processing</a>. I last used the population data shapefile to <a title="Processing Cartograms" href="http://danieljlewis.org/2010/03/16/london-population-cartograms-in-processing/" target="_blank">visualise population cartograms in Processing</a>.</p>
<p style="text-align: left">Resources:</p>
<p style="text-align: left"><a href="http://danieljlewis.org/files/2010/09/basicpythonmap.pdf">Python Script</a></p>
<p style="text-align: left"><a href="http://danieljlewis.org/files/2010/09/ShapefileNorthArrow.zip">Shapefile NorthArrow</a></p>
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2010/09/15/drawing-maps-with-python/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>k-nearest neighbour weights using PySAL</title>
		<link>http://danieljlewis.org/2010/08/27/k-nearest-neighbour-weights-using-pysal/</link>
		<comments>http://danieljlewis.org/2010/08/27/k-nearest-neighbour-weights-using-pysal/#comments</comments>
		<pubDate>Fri, 27 Aug 2010 15:13:01 +0000</pubDate>
		<dc:creator>Daniel Lewis</dc:creator>
				<category><![CDATA[PhD Work]]></category>
		<category><![CDATA[pysal]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[weights]]></category>

		<guid isPermaLink="false">http://danieljlewis.org.blogs.splintdev.geog.ucl.ac.uk/?p=397</guid>
		<description><![CDATA[I found a nice little time saving device when testing different numbers of nearest neighbour weights in the excellent PySAL library in python, so I thought I&#8217;d share it. Basically I wanted to test a number of different values of k when choosing a weighting scheme for spatial smoothing using nearest neighbours, but I didn&#8217;t [...]]]></description>
			<content:encoded><![CDATA[
<p>I found a nice little time-saving trick when testing different numbers of nearest neighbours for weights in the excellent <a title="PySAL home" href="http://geodacenter.asu.edu/pysal" target="_blank">PySAL</a> library in Python, so I thought I&#8217;d share it.</p>
<p>Basically, I wanted to test a number of different values of k when choosing a weighting scheme for spatial smoothing using nearest neighbours, but I didn&#8217;t want to have to continually recalculate the weights matrix for each value of k. Here&#8217;s what I did:</p>
<ol>
<li>Calculate a k-nearest-neighbour table for a high value of k; this should be at least as large as the maximum value of k you anticipate testing.</li>
<li>Store this table in a database. The database is useful because, for large values of k and a large set of data, you create k x n rows. Databases are optimised to store and query this amount of data in a way that text files aren&#8217;t!</li>
<li>Create the weights matrix in PySAL from first principles: grab all the data from the database and order it by the &#8216;from&#8217; neighbour id, and then by the weights of the &#8216;to&#8217; neighbours. Create the weights matrix as specified in the <a title="PySAL Weights Docs" href="http://pysal.org/library/weights/weights.html" target="_blank">PySAL docs on weighting</a>, but only use as many observations as you want to test by slicing the input lists.</li>
</ol>
<p>Here is my code showing how this works:</p>
<pre>import _mysql    #Library that connects to Mysql
from pysal import W    #Weights part of pysal

# Important - Database connection!
db = _mysql.connect(host="localhost",user="root",passwd="",db="data")

# Now Create a Spatial Weights Object

# These first 4 lines pull in the weights data from my MySQL database
# and store it as a list of tuples called 'resultWeight'
getWeights = "select * from `spatialweight` order by `polyID`,`weight`"
db.query(getWeights)
r = db.store_result()
resultWeight = r.fetch_row(maxrows=0)

nList = []    # Empty list for neighbours
wList = []    # Empty list for weights
neighbours = {}    # Empty dictionary to hold neighbours
weights = {}    # Empty dictionary to hold weights

# Now iterate through the results to form dictionaries with lists of
# neighbours and weights for each relevant point in the dataset

K_MAX = 50     # k used when the stored table was built (rows per point)
K_TEST = 20    # value of k to test on this run; must not exceed K_MAX

for recNum, row in enumerate(resultWeight, 1):
    # Collect neighbour ids and weights until a full block of K_MAX rows
    # has been read for the current point.
    nList.append(int(row[1]))
    wList.append(float(row[2]))

    if recNum % K_MAX == 0:
        # Keep only the first K_TEST neighbours (and weights) for this point.
        neighbours[int(row[0])] = nList[:K_TEST]
        weights[int(row[0])] = wList[:K_TEST]
        # Reset nList and wList for the next point in the weights data.
        nList = []
        wList = []

# Now simply create the weights matrix for the value of k specified.
w = W(neighbours,weights)

# Set the order of the matrix for use later.
# n = number of observations in the dataset (assuming point IDs are
# sequential integers starting at 1).
n = len(neighbours)
if not w.id_order_set:
    w.id_order = list(range(1, n + 1))    # IDs 1..n inclusive
</pre>
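<p>The slicing idea can also be tried without a database. The following is a minimal sketch, assuming the rows are already sorted by &#8216;from&#8217; id and then weight, with exactly k_max rows per point; the function name knn_dicts and the toy rows are made up for illustration and are not part of PySAL.</p>

```python
def knn_dicts(rows, k_max, k):
    """Build neighbour and weight dictionaries for a chosen k.

    rows  : list of (from_id, to_id, weight) tuples, sorted by from_id and
            then by weight, with exactly k_max rows for every from_id
    k_max : the k used when the stored table was built
    k     : the number of neighbours to keep (must not exceed k_max)
    """
    if k > k_max:
        raise ValueError("k cannot exceed k_max")
    neighbours = {}
    weights = {}
    # Step through the table one point (one block of k_max rows) at a time.
    for start in range(0, len(rows), k_max):
        block = rows[start:start + k]      # first k rows of this point's block
        from_id = int(rows[start][0])
        neighbours[from_id] = [int(r[1]) for r in block]
        weights[from_id] = [float(r[2]) for r in block]
    return neighbours, weights


# Toy table (invented values): 2 points with k_max = 3 neighbours each.
rows = [
    (1, 2, 0.5), (1, 3, 0.7), (1, 4, 0.9),
    (2, 1, 0.5), (2, 3, 0.6), (2, 4, 0.8),
]

# One stored table serves every value of k up to k_max.
for k in (1, 2, 3):
    neighbours, weights = knn_dicts(rows, k_max=3, k=k)
    print(k, neighbours)
```

<p>The two dictionaries returned for each k could then be handed to W(neighbours, weights) exactly as in the script above.</p>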
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2010/08/27/k-nearest-neighbour-weights-using-pysal/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Computing the geometric median in Python</title>
		<link>http://danieljlewis.org/2010/07/09/computing-the-geometric-median-in-python/</link>
		<comments>http://danieljlewis.org/2010/07/09/computing-the-geometric-median-in-python/#comments</comments>
		<pubDate>Fri, 09 Jul 2010 10:17:56 +0000</pubDate>
		<dc:creator>Daniel Lewis</dc:creator>
				<category><![CDATA[Modeling]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[allocation]]></category>
		<category><![CDATA[dijkstra]]></category>
		<category><![CDATA[geometric]]></category>
		<category><![CDATA[location]]></category>
		<category><![CDATA[matplotlib]]></category>
		<category><![CDATA[median]]></category>
		<category><![CDATA[optimisation]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[service]]></category>

		<guid isPermaLink="false">http://danieljlewis.org/?p=362</guid>
		<description><![CDATA[I noticed in a beta of ArcGIS 10 (then called 9.4) that there was a &#8216;new&#8217; option for computing a Geometric Median which didn&#8217;t exist in my copy of ArcGIS 9.3. This is an interesting concept, as in 1d statistics, the geometric (2d) mean is easy to calculate, being the average of all the X [...]]]></description>
			<content:encoded><![CDATA[
<p>I noticed in a beta of ArcGIS 10 (then called 9.4) that there was a &#8216;new&#8217; option for computing a Geometric Median which didn&#8217;t exist in my copy of ArcGIS 9.3. This is an interesting concept: as with the mean in 1d statistics, the mean centre of a 2d point pattern is easy to calculate, being the average of all the X coords and all the Y coords. From stats we know that the mean and median of a distribution will coincide if the data is perfectly normally distributed; however, real-world data usually only approximates a normal distribution, leading to a mean that differs from the midpoint, or median. For a skewed distribution on the plane, the mean is therefore not necessarily the best representation of the &#8216;centre&#8217; of the data, so we may wish to calculate the median instead; doing so also gives a good idea of the direction of the skew of the point pattern we are investigating. Calculating the median of a 2d point pattern can be expressed as a need to:</p>
<p><em> minimise the sum of distances from all points in a distribution to a centre.</em></p>
<p>Thus it is reasonably clear that we are dealing with an &#8216;optimisation problem&#8217;, something that I have experimented with before in work I conducted using the &#8216;transportation problem&#8217;, a classic linear programming problem.</p>
<p>In terms of application, I thought that finding the median of a distribution of people around a service would be a useful, albeit basic, indication of whether all people were making a similar trip to a service, or whether there were other effects at work (this would be evidenced by a median centre that was not close to the actual service location). I thought I would be able to code the optimisation routine in Python using pre-existing insight. Notably, the <a title="Geometric Median" href="http://en.wikipedia.org/wiki/Geometric_median" target="_blank">Wikipedia page</a> on this details the Weiszfeld algorithm as the acknowledged computational solution to the geometric median problem. It takes the form:</p>
<p><a href="http://danieljlewis.org/files/2010/07/weiszfeld.png"><img class="aligncenter size-full wp-image-363" title="weiszfeld" src="http://danieljlewis.org/files/2010/07/weiszfeld.png" alt="" width="368" height="61" /></a>However, actually writing the algorithm proved somewhat tough. Essentially the answer is to start with a candidate point (I started with the mean centre) and calculate the objective function &#8211; in this case the sum of the Euclidean distances of all points from the candidate centre. Then pass the candidate point through the Weiszfeld algorithm and reassess the objective function; once the objective function converges, a median has been found. Because the objective is convex, the point it converges to is the optimal median; the caveats are that the iteration can stall if a candidate lands exactly on a data point, and that the median can be non-unique only when all the points lie on a single line. Below is a solution for some of my data (the data has been randomly offset by 75m to preserve anonymity) on patient registrations to a doctor.</p>
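<p>The iteration described above can be sketched in a few lines of plain Python. This is not the script linked at the end of the post, just a minimal illustration; the function names, the tolerance and the choice to skip a data point that coincides with the candidate are all my assumptions.</p>

```python
import math


def total_distance(points, centre):
    """Objective function: sum of Euclidean distances to the centre."""
    return sum(math.hypot(x - centre[0], y - centre[1]) for x, y in points)


def geometric_median(points, tol=1e-7, max_iter=1000):
    """Approximate the geometric median of 2d points by Weiszfeld iteration."""
    # Start from the mean centre.
    n = float(len(points))
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    for _ in range(max_iter):
        num_x = num_y = denom = 0.0
        for x, y in points:
            d = math.hypot(x - cx, y - cy)
            if d == 0.0:
                # The candidate sits on a data point: skip it to avoid
                # dividing by zero (a simplification, not a full treatment).
                continue
            num_x += x / d
            num_y += y / d
            denom += 1.0 / d
        if denom == 0.0:
            break    # every point coincides with the candidate
        new_cx, new_cy = num_x / denom, num_y / denom
        shift = math.hypot(new_cx - cx, new_cy - cy)
        cx, cy = new_cx, new_cy
        if tol > shift:
            break    # the candidate has stopped moving
    return cx, cy


# Invented, deliberately skewed point pattern.
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
mean_centre = (sum(p[0] for p in pts) / 4.0, sum(p[1] for p in pts) / 4.0)
median_centre = geometric_median(pts)
print(mean_centre, total_distance(pts, mean_centre))
print(median_centre, total_distance(pts, median_centre))
```

<p>In this toy pattern the single far-away point drags the mean centre towards it, while the median centre stays close to the cluster of three.</p>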
<p style="text-align: center"><a href="http://danieljlewis.org/files/2010/07/geomedian.png"><img class="aligncenter size-large wp-image-365" title="geomedian" src="http://danieljlewis.org/files/2010/07/geomedian-1024x742.png" alt="" width="574" height="415" /></a>Here we can see that the mean and median centres are slightly different, suggesting that the patient population is skewed slightly northwards, most likely as a result of discontinuous urban infrastructure.</p>
<p style="text-align: left">The scatterplot was achieved using the <a title="MatPlotLib @ Sourceforge" href="http://matplotlib.sourceforge.net/index.html" target="_blank">matplotlib</a> Python plotting library. This was just a test, but I imagine more complex outputs can be achieved reasonably easily.</p>
<p style="text-align: left">Notably, this technique uses Euclidean distance, which may be misleading in a dense urban environment. I note that there is a relatively simple implementation of the <a title="Python Dijkstra" href="http://code.activestate.com/recipes/119466-dijkstras-algorithm-for-shortest-paths/" target="_blank">Dijkstra algorithm for shortest paths in Python</a>, and I am curious whether this could be integrated to find a geometric median on the network. I suspect that it may be unworkable for large problems due to computational time constraints, although for smaller problems it might be ok.</p>
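<p style="text-align: left">As a rough idea of what a network version might look like, here is a small sketch that restricts the candidate centres to network nodes and picks the node with the lowest total shortest-path distance to the demand nodes. It does not use the recipe linked above; the Dijkstra routine is a plain heapq version, and the graph, node names and edge lengths are invented.</p>

```python
import heapq


def dijkstra(graph, source):
    """Shortest-path distance from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue    # stale queue entry, a shorter route was found already
        for nbr, length in graph[node].items():
            nd = d + length
            if dist.get(nbr, float("inf")) > nd:
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def network_median(graph, demand_nodes):
    """Node with the smallest total shortest-path distance to the demand nodes.

    Assumes an undirected graph, so distances are the same in both directions.
    """
    totals = dict.fromkeys(graph, 0.0)
    for origin in demand_nodes:
        dist = dijkstra(graph, origin)
        for node in graph:
            totals[node] += dist.get(node, float("inf"))
    return min(totals, key=totals.get)


# Invented street network: node -> {neighbouring node: edge length}.
graph = {
    'A': {'B': 1.0, 'C': 4.0},
    'B': {'A': 1.0, 'C': 1.0, 'D': 5.0},
    'C': {'A': 4.0, 'B': 1.0, 'D': 1.0},
    'D': {'B': 5.0, 'C': 1.0},
}
print(network_median(graph, ['A', 'B', 'D']))
```

<p style="text-align: left">This runs one shortest-path search per demand node, which is where the computational cost would pile up for a large patient list on a detailed street network.</p>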
<p style="text-align: left">Naturally there are algorithms that can calculate a solution to the above for <em>p</em>-medians (i.e. several service centres in a population- commonly known as location-allocation), it is something that <a title="Paul Densham" href="http://www.geog.ucl.ac.uk/~pdensham/s_t_paper.html" target="_blank">Paul Densham</a> at UCL has worked on, and his code is making a return to service in ArcGIS version 10. I&#8217;m looking forward to seeing it, as it is a very difficult problem to solve (and in fact already has been &#8216;solved&#8217;), and not one I intend to investigate!</p>
<p style="text-align: left">My code for the geometric median is <a href="http://danieljlewis.org/files/2010/07/geomedian.pdf">here.</a></p>
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2010/07/09/computing-the-geometric-median-in-python/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>UK OAC map in Python</title>
		<link>http://danieljlewis.org/2010/06/02/uk-oac-map-in-python/</link>
		<comments>http://danieljlewis.org/2010/06/02/uk-oac-map-in-python/#comments</comments>
		<pubDate>Wed, 02 Jun 2010 11:05:57 +0000</pubDate>
		<dc:creator>Daniel Lewis</dc:creator>
				<category><![CDATA[Cartography]]></category>
		<category><![CDATA[GIS]]></category>
		<category><![CDATA[Representation]]></category>
		<category><![CDATA[map]]></category>
		<category><![CDATA[OAC]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[shapely]]></category>
		<category><![CDATA[UK]]></category>

		<guid isPermaLink="false">http://danieljlewis.org/?p=336</guid>
		<description><![CDATA[Here is a quick confirmation that you can use Python to draw very detailed maps; using the previously specified method I was unable to get python to draw all UK OAs due to their great number (c.220,000) and high complexity (c.50,000,000) vertices. Additionally I was unable to use the generalised OA boundaries for the UK [...]]]></description>
			<content:encoded><![CDATA[
<p>Here is a quick confirmation that you can use Python to draw very detailed maps. Using the previously specified method I was unable to get Python to draw all UK OAs due to their great number (c.220,000) and high complexity (c.50,000,000 vertices). Additionally I was unable to use the generalised OA boundaries for the UK from UKBorders as they contain topological errors that the shapefile reader cannot deal with. ArcGIS is obviously a bit clever in how it handles bad topologies. So I extracted all the vertices and fed them into shapely polygons, and visualised them in the same way, but without reading shapefiles directly into Python, and was able to output this:</p>
<p style="text-align: left"><a href="http://danieljlewis.org/files/2010/06/UKOAC.png"><img class="aligncenter size-large wp-image-337" title="UKOAC" src="http://danieljlewis.org/files/2010/06/UKOAC-640x1024.png" alt="" width="576" height="922" /></a>This method has had an impact on the speed of computation, as it can take roughly 25 minutes to output this map. The map looks pretty good, aside from a slightly odd polygon in the Bristol Channel. Nevertheless, coupled with the operations that shapely, and other geo-libraries, can do, this is an increasing indication of the maturity of GIS on a variety of platforms. Oh, and it&#8217;s all free!</p>
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2010/06/02/uk-oac-map-in-python/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>More Thematic Maps in Python &#8211; shapely and descartes</title>
		<link>http://danieljlewis.org/2010/05/27/more-thematic-maps-in-python-shapely-and-descartes/</link>
		<comments>http://danieljlewis.org/2010/05/27/more-thematic-maps-in-python-shapely-and-descartes/#comments</comments>
		<pubDate>Thu, 27 May 2010 16:58:14 +0000</pubDate>
		<dc:creator>Daniel Lewis</dc:creator>
				<category><![CDATA[Representation]]></category>
		<category><![CDATA[descartes]]></category>
		<category><![CDATA[matplotlib]]></category>
		<category><![CDATA[OAC]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[shapely]]></category>
		<category><![CDATA[Wales]]></category>

		<guid isPermaLink="false">http://danieljlewis.org/?p=326</guid>
		<description><![CDATA[Thanks to Sean Gillies for commenting on my last post, he put me onto a couple of Python packages that he&#8217;s been involved in creating that allow you to do some really excellent geospatial things. The shapely package is a great implementation of a lot of spatial analyses that you can do on projected (i.e. [...]]]></description>
			<content:encoded><![CDATA[
<p>Thanks to <a title="Sean Gillies Homepage" href="http://sgillies.net/" target="_blank">Sean Gillies</a> for commenting on my last post; he put me onto a couple of Python packages that he&#8217;s been involved in creating that allow you to do some really excellent geospatial things. The <a title="shapely" href="http://trac.gispython.org/lab/wiki/Shapely" target="_blank">shapely</a> package is a great implementation of a lot of spatial analyses that you can do on projected (i.e. flattened) datasets, including topological operations and a full set of object types. The <a title="Descartes package" href="http://pypi.python.org/pypi/descartes/1.0" target="_blank">descartes</a> package allows better integration of matplotlib with spatial data, particularly in terms of not having to use the &#8220;fill&#8221; plotting function repeatedly, but creating a more efficient set of &#8220;patches&#8221; which can then be added to the figure plot. The overall impression I got from descartes is that it wasn&#8217;t spectacularly different from the method detailed in my previous post, but it gives you more control and stability over the map plotting process. Whereas using raw matplotlib you are inclined to hope that the map outputs correctly (it all seems a bit up to chance), using descartes you have a more robust and easily manipulable output.</p>
<p>In order to test this I rewrote my previous thematic map script to: firstly convert the shapefile geometries into shapely polygons, and secondly to pass those shapely polygons to descartes and draw a map plot using descartes-matplotlib. The only slightly odd piece of functionality that I found was that you can&#8217;t pass the shapely polygon object a list of shapely points in order to create the polygon, rather you have to pass a list of x,y tuples &#8211; much less satisfying!</p>
<p>Nonetheless, the changes were easy to implement; relative to the previous script they basically amount to:</p>
<pre>from shapely.geometry import Polygon

points = []
for i in range(0,<em>number of points in shapefile</em>):
 tempx = float(<em>x coord of point in shapefile polygon</em>)
 tempy = float(<em>y coord of point in shapefile polygon</em>)

 points.append((tempx,tempy))
polygon = Polygon(points)
</pre>
<p>The above method creates a simple polygon without holes; shapely can accommodate holes if need be, though. Having created the shapely polygons, all that remains is to create a patch.</p>
<pre>from descartes import PolygonPatch

patch = PolygonPatch(polygon, <em>plus colour and line considerations</em>)
</pre>
<p>Then you simply add the patch to the matplotlib figure you have already created so:</p>
<pre>from matplotlib import pyplot

fig = pyplot.figure(1, figsize = [10,10], dpi = 300)   #create 10x10 figure
ax = fig.add_subplot(111)    #Add the map frame (single plot)

# here you create all the polygons and patches

ax.add_patch(patch)   # simply add the patch to the subplot
# set plot vars
ax.set_xlim(<em>get xmin and xmax values from data</em>)
ax.set_ylim(<em>get ymin and ymax values from data</em>)
ax.set_aspect(1)

pyplot.show()
</pre>
<p>Using these basics I was able to create a basic OAC map using Welsh OAs as an example:</p>
<p style="text-align: center"><a href="http://danieljlewis.org/files/2010/05/WalesOAC1.png"><img class="aligncenter size-full wp-image-328" title="WalesOAC" src="http://danieljlewis.org/files/2010/05/WalesOAC1.png" alt="" width="520" height="545" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2010/05/27/more-thematic-maps-in-python-shapely-and-descartes/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>A Thematic Map in Python</title>
		<link>http://danieljlewis.org/2010/05/25/a-thematic-map-in-python/</link>
		<comments>http://danieljlewis.org/2010/05/25/a-thematic-map-in-python/#comments</comments>
		<pubDate>Tue, 25 May 2010 19:08:09 +0000</pubDate>
		<dc:creator>Daniel Lewis</dc:creator>
				<category><![CDATA[Representation]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[automated]]></category>
		<category><![CDATA[categorical]]></category>
		<category><![CDATA[Maps]]></category>
		<category><![CDATA[matplotlib]]></category>
		<category><![CDATA[OAC]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[shapefile]]></category>

		<guid isPermaLink="false">http://danieljlewis.org/?p=309</guid>
		<description><![CDATA[I though I would explore the possibility of creating thematic maps using Python, this post documents my initial attempt. The output is hence rather basic, but encouraging. The primary reason that I wanted to test the mapping potential of python is to allow for some basic automated map production in order to quickly visually assess [...]]]></description>
			<content:encoded><![CDATA[
<p>I thought I would explore the possibility of creating thematic maps using Python; this post documents my initial attempt. The output is hence rather basic, but encouraging. The primary reason that I wanted to test the mapping potential of Python is to allow for some basic automated map production in order to quickly visually assess the geographical patterns contained within large data sets. This is something that I am at a loss to do in ESRI&#8217;s ArcGIS, although that might change in ArcGIS 10. For fans of R, I know it can be done there; however, R is too tricky for me! My colleague James Cheshire explains the method in R <a title="Making Maps in R" href="http://spatialanalysis.co.uk/2010/01/13/making-maps-with-r/" target="_blank">here.</a></p>
<p>The first hurdle in map making is getting the data in. For this I used the <a title="Shapefile Reader" href="http://indiemaps.com/blog/2008/03/easy-shapefile-loading-in-python/" target="_blank">shapefile reader</a> that <a title="Indiemaps Home" href="http://indiemaps.com/" target="_blank">Zachary Forest Johnson</a> put together for his excellent blog &#8216;<a title="Indiemaps Blog" href="http://indiemaps.com/blog">IndieMaps.com</a>&#8217;. This allowed me to read in any of my masses of pre-existing Shapefile format datafiles, and indeed to use the Python scripting functionality in ArcGIS to perform spatial operations and then output a map quickly and without the hassle of dealing with ArcGIS layouts.</p>
<p>Once you have downloaded the shapefile reader, it is easily implemented using:</p>
<pre>import shpUtils   #imports the shapefile reader
#Load a shapefile into an object called shpRecords
shpRecords = shpUtils.loadShapefile('filename.shp')</pre>
<p>This is undoubtedly simple; what you then have is a (slightly) complex object which contains all of the shapefile data nested as lists and dictionaries. In order to get my head round this I spent some time investigating it. A standard shapefile that contains areal geographies (e.g. UK Output Areas) will have a similar set-up to this:</p>
<ul>
<li>The first list (shpRecords[i]) records the number of complete geometries, this corresponds to the number of rows in the attribute table. Thus a single polygon has 1 row in the attribute table and 1 list (list index 0) in Python.</li>
<li>The second dictionary (shpRecords[i]['key']) records two branches, reporting either the &#8216;dbf_data&#8217; from the attribute table, or the &#8216;shp_data&#8217; from the .shp file describing the underlying geometry.</li>
<li>Choosing the &#8216;dbf_data&#8217; key (shpRecords[i]['dbf_data']) allows you to see the attributes recorded column-by-column for each row (and hence each geometry) in the attribute table. Thus shpRecords[i]['dbf_data']['name'] will return the attribute value for the field &#8216;name&#8217; for the <em>i</em>th geometry in the shapefile.</li>
<li>Choosing the &#8216;shp_data&#8217; key (shpRecords[i]['shp_data']) allows you to access the various components of the shapefile&#8217;s geometry. In the case of a polyline/polygon you get dictionary items &#8216;ymax&#8217;, &#8216;ymin&#8217;, &#8216;xmax&#8217;, &#8216;xmin&#8217;, &#8216;numpoints&#8217;, &#8216;numparts&#8217; and &#8216;parts&#8217;. Clearly the first 6 items are properties of the <em>i</em>th geometry you are querying, so it allows you to form a bounding box, get the number of vertices in the line/polygon, and draw separate lines/polygons if the shapefile is setup to have spatially discontinuous shapes for each row.</li>
<li>The thing we are most interested in is the &#8216;parts&#8217; dictionary key, as this contains all the coordinates for the particular geometry being considered; it is accessed as shpRecords[i]['shp_data']['parts']. The next list (shpRecords[i]['shp_data']['parts'][j]) allows you to distinguish between parts in a multipart file, i.e. the <em>j</em>th part of the <em>i</em>th geometry. Each part holds a &#8216;points&#8217; list containing its vertices.</li>
<li>Having come this far, one final dictionary allows us to see the coordinates themselves; this dictionary simply offers us &#8216;x&#8217; or &#8216;y&#8217;. Thus the x-coordinate of the <em>k</em>th vertex in the <em>j</em>th part of the <em>i</em>th geometry is accessed by: shpRecords[i]['shp_data']['parts'][j]['points'][k]['x'] &#8211; simple!</li>
</ul>
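<p>To make the nesting concrete, here is a small hand-built stand-in for the object the reader returns. The values are invented, and the &#8216;points&#8217; level follows the way the plotting loop further down this post indexes vertices, so treat it as a sketch of the layout rather than real reader output. The helper overall_bounds is my own name, not part of the shapefile reader.</p>

```python
# Invented records laid out like the structure described above.
shpRecords = [
    {
        'dbf_data': {'name': 'Area A', 'supergroup': 3},
        'shp_data': {
            'xmin': 0.0, 'xmax': 2.0, 'ymin': 0.0, 'ymax': 1.0,
            'numparts': 1, 'numpoints': 4,
            'parts': [
                {'points': [{'x': 0.0, 'y': 0.0}, {'x': 2.0, 'y': 0.0},
                            {'x': 2.0, 'y': 1.0}, {'x': 0.0, 'y': 0.0}]},
            ],
        },
    },
    {
        'dbf_data': {'name': 'Area B', 'supergroup': 7},
        'shp_data': {
            'xmin': 2.0, 'xmax': 5.0, 'ymin': -1.0, 'ymax': 3.0,
            'numparts': 1, 'numpoints': 4,
            'parts': [
                {'points': [{'x': 2.0, 'y': -1.0}, {'x': 5.0, 'y': -1.0},
                            {'x': 5.0, 'y': 3.0}, {'x': 2.0, 'y': -1.0}]},
            ],
        },
    },
]


def overall_bounds(records):
    """Bounding box (xmin, ymin, xmax, ymax) covering every geometry."""
    shp = [rec['shp_data'] for rec in records]
    return (min(s['xmin'] for s in shp), min(s['ymin'] for s in shp),
            max(s['xmax'] for s in shp), max(s['ymax'] for s in shp))


first = shpRecords[0]
print(first['dbf_data']['name'])                          # attribute value
print(first['shp_data']['parts'][0]['points'][1]['x'])    # one x-coordinate
print(overall_bounds(shpRecords))                         # extent of the map
```

<p>The per-geometry &#8216;xmin&#8217;, &#8216;xmax&#8217;, &#8216;ymin&#8217; and &#8216;ymax&#8217; values are enough to work out the extent of the whole map without touching a single vertex.</p>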
<p>I have been using <a title="MatPlotLib @ Sourceforge" href="http://matplotlib.sourceforge.net/" target="_blank">matplotlib</a> &#8211; a Python library for scientific visualisation &#8211; a lot recently, and have found it a very simple and powerful resource, so I thought I&#8217;d see if it could be made to draw a map.</p>
<p>Firstly import the pyplot element which does all the figure drawing:</p>
<pre>import matplotlib.pyplot as plt
</pre>
<p>Now let&#8217;s use the &#8220;fill&#8221; component of matplotlib to draw all the geometries in a shapefile &#8211; my shapefile is Output Areas in Southwark. Firstly we need to loop through each geometry, and then draw a polygon using all the points contained within each geometry. I omitted a loop for multipart geometries as my shapefile has none; however, this would be very easy if the data did have multiple parts &#8211; simply add a loop in the middle!</p>
<pre>for i in range(0,len(shpRecords)):
 # x and y are empty lists to be populated with the coords of each geometry.
 x = []
 y = []
 for j in range(0,len(shpRecords[i]['shp_data']['parts'][0]['points'])):
  # This is the number of vertices in the ith geometry.
  # The parts list is [0] as it is singlepart.

  # get x and y coordinates.
  tempx = float(shpRecords[i]['shp_data']['parts'][0]['points'][j]['x'])
  tempy = float(shpRecords[i]['shp_data']['parts'][0]['points'][j]['y'])
  x.append(tempx)
  y.append(tempy) # Populate the lists  

 # Creates a polygon in matplotlib for each geometry in the shapefile
 plt.fill(x,y)

plt.axis('equal')
# This sets the x and y axes as equal intervals.
# NB this script will only work for projected data, for geographical
# coordinate systems get ready to do some maths  

plt.show() # Draws the map!</pre>
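<p>For data that does have multipart geometries, the extra loop in the middle might look something like the sketch below. It assumes the same nested layout as the script above; the generator name part_coords and the two-triangle example record are invented, and the plotting call is left as a comment so the snippet runs without matplotlib.</p>

```python
def part_coords(records):
    """Yield (record index, x list, y list) for every part of every geometry."""
    for i, rec in enumerate(records):
        # The extra loop: visit each part rather than only part [0].
        for part in rec['shp_data']['parts']:
            x = [float(pt['x']) for pt in part['points']]
            y = [float(pt['y']) for pt in part['points']]
            yield i, x, y


# Invented example: a single geometry made of two separate triangles.
shpRecords = [
    {'dbf_data': {'supergroup': 2},
     'shp_data': {'parts': [
         {'points': [{'x': 0, 'y': 0}, {'x': 1, 'y': 0}, {'x': 0, 'y': 1}]},
         {'points': [{'x': 5, 'y': 5}, {'x': 6, 'y': 5}, {'x': 5, 'y': 6}]},
     ]}},
]

for i, x, y in part_coords(shpRecords):
    print(i, x, y)    # in the map script this line would be plt.fill(x, y)
```

<p>Because the record index comes back with every part, all the parts of one geometry can still be given the same colour from the attribute table.</p>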
<p>This is the simplest form of the script; it will simply draw the shapefile with each area filled a random colour. This is not that useful, but it is easy to create thematic maps of categorical data, so let&#8217;s investigate a way of doing that. I&#8217;ve got data for the Output Area Classification, which is a clustering of areas by social characteristics. I know that there are 7 supergroups in the classification, named numerically, so before all the processing of the shapefile I can create a dictionary of colour choices for each group. I&#8217;m using hexadecimal colours that I got from <a title="Colour Brewer" href="http://colorbrewer2.org/" target="_blank">Cynthia Brewer&#8217;s</a> website for a &#8216;qualitative&#8217; 7 class classification. The dictionary looks like this:</p>
<pre>oacSGroups = {'1':'#A6761D','2':'#E6AB02','3':'#66A61E','4':'#E7298A',\
'5':'#7570B3','6':'#D95F02','7': '#1B9E77'}
</pre>
<p>Thus the key &#8216;1&#8217; returns the associated hex colour; this can be linked to the &#8216;dbf_data&#8217; key in the shapefile. In the plt.fill() call I simply have to specify the colour choice, so we alter the line in the above script to read:</p>
<pre>plt.fill(x,y,fc = oacSGroups[str(int(shpRecords[i]['dbf_data']['supergroup']))]\
,ec = '0.7',lw=0.1)
</pre>
<p>&#8216;fc&#8217; is the &#8216;face colour&#8217;, i.e. the fill; we are asking Python to make the colour equal to the value in the oacSGroups dictionary where the key is the value contained in the attribute table for the <em>i</em>th row in the &#8216;supergroup&#8217; field. Thus if the <em>i</em>th row had a &#8216;supergroup&#8217; value of &#8216;7&#8217;, the face colour would be set to &#8216;#1B9E77&#8217;. &#8216;ec&#8217; is &#8216;edge colour&#8217; and &#8216;lw&#8217; is line width; here I have set the values to display fine, light grey lines.</p>
<p>Finally, as basic a map as this will turn out to be, we wouldn&#8217;t be anywhere without a legend. The following is a very basic, wholly manual way to add a legend to the map:</p>
<pre>p1 = plt.Rectangle((0, 0), 1, 1, fc="#A6761D")
p2 = plt.Rectangle((0, 0), 1, 1, fc="#E6AB02")
p3 = plt.Rectangle((0, 0), 1, 1, fc="#66A61E")
p4 = plt.Rectangle((0, 0), 1, 1, fc="#E7298A")
p5 = plt.Rectangle((0, 0), 1, 1, fc="#7570B3")
p6 = plt.Rectangle((0, 0), 1, 1, fc="#D95F02")
p7 = plt.Rectangle((0, 0), 1, 1, fc="#1B9E77")

plt.legend([p1,p2,p3,p4,p5,p6,p7], ["Super Group 1","Super Group 2",\
"Super Group 3","Super Group 4","Super Group 5","Super Group 6","Super Group 7"], loc = 4)
</pre>
<p>This simply creates 7 rectangles which don&#8217;t appear on the plotted output, but instead are passed to the legend creator. Each rectangle has the appropriate colour to match the mapped representation, and a label; the two are passed to the legend as two ordered lists. The &#8216;loc&#8217; keyword sets where the legend will appear; 4 denotes the bottom right corner. The &#8216;title&#8217; keyword allows you to add a title to the legend as a string.</p>
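<p>The manual version works, but the same legend could be generated from the colour dictionary so the two can&#8217;t drift apart. This is only a sketch: legend_entries and add_legend are names I have made up, and matplotlib is imported inside add_legend so that the first helper can be tried on its own.</p>

```python
oacSGroups = {'1': '#A6761D', '2': '#E6AB02', '3': '#66A61E', '4': '#E7298A',
              '5': '#7570B3', '6': '#D95F02', '7': '#1B9E77'}


def legend_entries(colours):
    """Return (colour, label) pairs ordered by supergroup number."""
    keys = sorted(colours, key=int)
    return [(colours[k], "Super Group " + k) for k in keys]


def add_legend(colours, title=None):
    """Add the legend to the current matplotlib plot."""
    import matplotlib.pyplot as plt    # only needed when actually plotting
    entries = legend_entries(colours)
    # One off-plot rectangle per class, coloured to match the map.
    handles = [plt.Rectangle((0, 0), 1, 1, fc=colour) for colour, _ in entries]
    labels = [label for _, label in entries]
    plt.legend(handles, labels, loc=4, title=title)


for colour, label in legend_entries(oacSGroups):
    print(label, colour)
```

<p>Calling add_legend(oacSGroups, title="OAC Supergroup") just before plt.show() would then take the place of the seven Rectangle lines.</p>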
<p style="text-align: left">An example output looks something like this:</p>
<p style="text-align: left"><a href="http://danieljlewis.org/files/2010/05/OACPythonMap1.png"><img class="aligncenter size-full wp-image-323" title="OACPythonMap" src="http://danieljlewis.org/files/2010/05/OACPythonMap1.png" alt="" width="564" height="650" /></a>This took a couple of seconds to produce, and accounts for 846 individual geometries, which actually have quite a number of vertices.</p>
<p style="text-align: left">I&#8217;ll update the blog should I find new methods to visualise spatial data in python.</p>
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2010/05/25/a-thematic-map-in-python/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Address Geocoding with Fuzzy String Matching</title>
		<link>http://danieljlewis.org/2009/12/23/address-geocoding-with-fuzzy-string-matching/</link>
		<comments>http://danieljlewis.org/2009/12/23/address-geocoding-with-fuzzy-string-matching/#comments</comments>
		<pubDate>Wed, 23 Dec 2009 17:14:54 +0000</pubDate>
		<dc:creator>Daniel Lewis</dc:creator>
				<category><![CDATA[Health GIS]]></category>
		<category><![CDATA[Modeling]]></category>
		<category><![CDATA[address]]></category>
		<category><![CDATA[edit]]></category>
		<category><![CDATA[geocode]]></category>
		<category><![CDATA[levenshtein]]></category>
		<category><![CDATA[match]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[ratio]]></category>

		<guid isPermaLink="false">http://danieljlewis.org/?p=135</guid>
		<description><![CDATA[Recently I obtained a portion of address layer 2 for Southwark and surrounding boroughs in order to georeference my patient data by household. Currently I have a postcode match with over 99% match between known postcodes and patient reported postcodes. Being able to locate patients by their address, rather than their postcode will allow me [...]]]></description>
			<content:encoded><![CDATA[
<p>Recently I obtained a portion of address layer 2 for Southwark and surrounding boroughs in order to georeference my patient data by household. Currently I have a postcode match rate of over 99% between known postcodes and patient-reported postcodes. Being able to locate patients by their address, rather than their postcode, will allow me to begin to think about patients in terms of &#8216;households&#8217; rather than, as I have been doing, as (somewhat atomised) individuals. A recent experiment I conducted on the choice characteristics of patients, which was an evolution of my CASA working paper, showed that seeing people as individuals was faulty logic in the context of primary care uptake. Reason tells me that families and people living in the same household are more likely to go to the same GP than to different ones, due to factors that might include social network effects. In order to test this I need to address geocode the patient list data to a finer resolution than postcodes: households.</p>
<p>Addresslayer2 is a rich Ordnance Survey dataset that pinpoints the location of houses, commercial buildings and features of the built environment (such as post boxes). I have an extract of the national dataset covering Lambeth, Lewisham and Southwark, which comprises over 400,000 points of interest, each given an explicit location in space.</p>
<p>The difficulty lies in matching the reported patient address with the record in addresslayer2 in order to derive a location for each patient. Patients that overlay each other in space can then be aggregated to a &#8216;household&#8217; for that location. Often the addresses of patients living in the same house are subtly different, so I cannot simply group the list by the addresses given; such a method also says little about the unique location of each household. Thus I have chosen to address geocode all the patients first and then derive household information from the spatial component of the data.</p>
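<p>As a rough illustration of that aggregation step, the sketch below groups patients by the coordinate pair they were matched to. The function name group_households, the tuple layout (patient id, easting, northing) and the sample values are all hypothetical, and it assumes each household ends up on its own point.</p>

```python
from collections import defaultdict


def group_households(geocoded_patients):
    """Group patient ids by the coordinate pair they were matched to."""
    households = defaultdict(list)
    for patient_id, easting, northing in geocoded_patients:
        # Patients sharing exactly the same point fall into one household.
        households[(easting, northing)].append(patient_id)
    return dict(households)


# Made-up records: two patients on one point, one patient on another.
patients = [
    ("p1", 532000, 177000),
    ("p2", 532000, 177000),
    ("p3", 533100, 176500),
]
print(group_households(patients))
```

<p>Here p1 and p2 share a point and so form one household, while p3 is a household of one.</p>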
<p>Initially I tried using pre-existing geocoding software in both ArcGIS and Manifold GIS, but neither was able to provide a satisfactory result. So I set about doing it myself using the Python programming language.</p>
<p>One of the main things that has helped me so far is Fuzzy String Matching using Levenshtein Distance. It is rare that a recorded address will exactly match (i.e. match by equality, x == y) an address in addresslayer2. Often differences between two address strings amount to very little, such as capitalisation, abbreviations, punctuation etc. So a fuzzy match can be obtained by comparing the similarity of 2 strings.</p>
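<p>Some of those trivial differences can be removed before any fuzzy comparison takes place. The following is a minimal sketch, not the code used for the results here: the function name normalise and the small abbreviation table are assumptions made for illustration, and a real table would need to be built from the data.</p>

```python
import re

# Hypothetical abbreviation table, for illustration only.
ABBREVIATIONS = {"rd": "road", "st": "street", "ave": "avenue"}


def normalise(address):
    """Lower-case, strip punctuation and expand a few abbreviations."""
    # Replace anything that is not a letter, digit or space with a space.
    cleaned = re.sub(r"[^a-z0-9 ]", " ", address.lower())
    # split() with no arguments also collapses runs of whitespace.
    words = [ABBREVIATIONS.get(word, word) for word in cleaned.split()]
    return " ".join(words)


print(normalise("12, Camberwell Rd."))
```

<p>With this table both "12, Camberwell Rd." and "12 CAMBERWELL ROAD" become "12 camberwell road". Abbreviations are ambiguous ("st" may also mean "saint"), so a lookup like this can only ever be a first pass.</p>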
<p>The Levenshtein distance computes a value that represents the number of edits (by way of insertion, deletion or substitution of characters) required to turn one string into another. I am using the following algorithm written in python to calculate this value:</p>
<pre><span>def</span> levenshtein<span>(</span>s1, s2<span>)</span>:
    # Make s1 the longer string so that each row is as short as possible.
    <span>if</span> <span>len</span><span>(</span>s1<span>)</span> <span>&lt;</span> <span>len</span><span>(</span>s2<span>)</span>:
        <span>return</span> levenshtein<span>(</span>s2, s1<span>)</span>
    <span>if</span> <span>not</span> s1:
        <span>return</span> <span>len</span><span>(</span>s2<span>)</span>

    # Distances from the empty prefix of s1 to every prefix of s2.
    previous_row = <span>range</span><span>(</span><span>len</span><span>(</span>s2<span>)</span> + 1<span>)</span>
    <span>for</span> i, c1 <span>in</span> <span>enumerate</span><span>(</span>s1<span>)</span>:
        current_row = <span>[</span>i + 1<span>]</span>
        <span>for</span> j, c2 <span>in</span> <span>enumerate</span><span>(</span>s2<span>)</span>:
            # previous_row and current_row are one entry longer than s2, hence j + 1.
            insertions = previous_row<span>[</span>j + 1<span>]</span> + 1
            deletions = current_row<span>[</span>j<span>]</span> + 1
            substitutions = previous_row<span>[</span>j<span>]</span> + <span>(</span>c1 <span>!</span>= c2<span>)</span>
            current_row.<span>append</span><span>(</span><span>min</span><span>(</span>insertions, deletions, substitutions<span>)</span><span>)</span>
        previous_row = current_row

    <span>return</span> previous_row<span>[</span>-<span>1</span><span>]</span></pre>
<p><span>This algorithm is available, in this form and in many other languages, from <a title="Levenshtein Distance - Wikibooks" href="http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance" target="_blank">wikibooks</a>.</span></p>
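<p>To see how the rows are built up, here is a small sketch that keeps every row of the table instead of only the latest one. The name levenshtein_rows is hypothetical and not part of the Wikibooks code; the arithmetic inside the loop follows the same recurrence.</p>

```python
def levenshtein_rows(s1, s2):
    """Return every row of the edit-distance table for s1 against s2."""
    # Row 0: turning the empty string into each prefix of s2.
    previous_row = list(range(len(s2) + 1))
    rows = [previous_row]
    for i, c1 in enumerate(s1):
        current_row = [i + 1]
        for j, c2 in enumerate(s2):
            insertions = previous_row[j + 1] + 1
            deletions = current_row[j] + 1
            substitutions = previous_row[j] + (c1 != c2)
            current_row.append(min(insertions, deletions, substitutions))
        rows.append(current_row)
        previous_row = current_row
    return rows


for row in levenshtein_rows("road", "rd"):
    print(row)
```

<p>For "road" against "rd" the rows are [0, 1, 2], [1, 0, 1], [2, 1, 1], [3, 2, 2] and [4, 3, 2]. The last value of the last row, 2, is the edit distance: delete the "o" and the "a".</p>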
<p><span>For 2 strings, calling the levenshtein function returns an integer which represents the number of edits required to make the 2 strings the same, however this value is meaningless without reference to the length of the string in the first place &#8211; 5 edits on a string 5 characters long suggests a completely different string, whereas 5 edits on a string 50 characters long suggests that 90% of the string was the same and that changes were actually minimal. Thus to get a relative ratio from the levenshtein distance I use the following code:</span></p>
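<pre><span>def ratio(s1,s2):
    # Edit distance scaled by the longer string: 0.0 means identical, 1.0 means entirely different.
    longest = max(len(s1),len(s2))
    if longest == 0:
        return 0.0  # two empty strings are identical
    edit = levenshtein(s1,s2)
    output = float(edit)/longest
    return output
</span></pre>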
<p>This is a simple comparison that I found <a title="Ratio from Levenshtein Distance" href="http://tickett.net/dedupe/index.php/Levenshtein_Distance" target="_blank">here</a>, and it is implemented in a Python Levenshtein library <a title="Google Code PyLevenshtein" href="http://code.google.com/p/pylevenshtein/" target="_blank">here</a>. This library is written in C and hence should be faster than the pure Python implementation; unfortunately I couldn&#8217;t get it to compile properly in Windows 7. It works really well in Linux though!</p>
<p>Having established the levenshtein and ratio functions, it is simply a case of matching candidate strings with known address strings. I&#8217;m using a threshold for similarity, so a string has to be similar to a specified degree before it can be considered a match. Since the ratio is 0 for identical strings and grows as they diverge, a lower value means a closer match.</p>
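<p>The matching step could be sketched roughly as follows. This is an illustration under stated assumptions rather than the code behind the figures in this post: best_match and its default threshold of 0.2 are hypothetical, and the levenshtein and ratio functions are repeated in compact form only so that the block runs on its own.</p>

```python
def levenshtein(s1, s2):
    """Edit distance between two strings (same recurrence as in the post)."""
    if len(s2) > len(s1):
        return levenshtein(s2, s1)
    previous_row = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1):
        current_row = [i + 1]
        for j, c2 in enumerate(s2):
            insertions = previous_row[j + 1] + 1
            deletions = current_row[j] + 1
            substitutions = previous_row[j] + (c1 != c2)
            current_row.append(min(insertions, deletions, substitutions))
        previous_row = current_row
    return previous_row[-1]


def ratio(s1, s2):
    """Edit distance divided by the longer length; 0.0 means identical."""
    longest = max(len(s1), len(s2))
    return float(levenshtein(s1, s2)) / longest if longest else 0.0


def best_match(candidate, known_addresses, threshold=0.2):
    """Return (address, ratio) for the closest known address, or None."""
    scored = [(ratio(candidate, address), address) for address in known_addresses]
    if not scored:
        return None
    # The smallest ratio is the closest match.
    score, address = min(scored)
    if score > threshold:
        return None  # nothing is similar enough to count as a match
    return address, score


known = ["12 camberwell road", "14 camberwell road", "12 peckham road"]
print(best_match("12 camberwel road", known))
print(best_match("1 high street", known))
```

<p>The misspelt "12 camberwel road" is one edit away from "12 camberwell road" and is matched to it, while "1 high street" is too far from every known address and returns None. Comparing each candidate against every known address is slow for a dataset of this size, so in practice the candidates would probably need narrowing first, for example by postcode.</p>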
<p>Using this algorithm in conjunction with some other basic string operations (strip(), isdigit(), isalnum(), find()/replace() etc.) gives me a match rate of around 90%. This is reasonable, but because the dataset of patients is very large it still leaves around 30,000 people unmatched. My next move will be to start subsetting addresses and matching elements of them, and checking which pieces are not matched. This is particularly important with students and people living in social housing, where a lot of the address information given is specific to particular subsets of social housing estates, but given to me as a single string. Disaggregating this data will allow me to match each bit with individual fields in addresslayer2 relevant to social housing.</p>
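<p>As an example of the kind of subsetting meant here, the sketch below uses strip(), isdigit() and isalnum() to separate a leading house number from the rest of an address, so that the two parts can be compared separately. The function name split_address is hypothetical, and the rule that a house number is a leading token starting with a digit is an assumption that will not hold for every address.</p>

```python
def split_address(address):
    """Split an address string into a house number and the remaining text."""
    parts = address.strip().split()
    number = ""
    if parts:
        # Drop trailing punctuation so that "12," is read as "12".
        first = parts[0].rstrip(",.")
        # A leading token such as "12" or "12a" is treated as the house number.
        if first[:1].isdigit() and first.isalnum():
            number = first.lower()
            parts = parts[1:]
    return number, " ".join(parts).lower()


print(split_address("  12A Camberwell Road "))
print(split_address("Flat 3, Dickens House"))
```

<p>The first call gives ("12a", "camberwell road"). The second has no leading number, so the whole string is returned as the remainder; addresses of that kind are exactly the ones that need further disaggregation.</p>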
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2009/12/23/address-geocoding-with-fuzzy-string-matching/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

