<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Volunteered Geographic Information &#187; Uncategorized</title>
	<atom:link href="http://danieljlewis.org/category/uncategorized/feed/" rel="self" type="application/rss+xml" />
	<link>http://danieljlewis.org</link>
	<description>A Geography/GIS blog by Daniel J Lewis</description>
	<lastBuildDate>Tue, 20 Dec 2011 17:15:30 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>ArcGIS 10 &#8211; Field Calculator and Python</title>
		<link>http://danieljlewis.org/2010/10/11/arcgis-10-field-calculator-and-python/</link>
		<comments>http://danieljlewis.org/2010/10/11/arcgis-10-field-calculator-and-python/#comments</comments>
		<pubDate>Mon, 11 Oct 2010 17:10:57 +0000</pubDate>
		<dc:creator>Daniel Lewis</dc:creator>
				<category><![CDATA[GIS]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[arcgis]]></category>
		<category><![CDATA[esri]]></category>
		<category><![CDATA[field calculations]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://danieljlewis.org/?p=422</guid>
		<description><![CDATA[Python has been more tightly integrated in the new release of ArcGIS 10, allowing scripting to occur directly through a Python process without even opening up ArcMap. Admittedly this was available before, but now everything is more tightly coupled and a lot cleaner in its implementation. However, what has really interested, and indeed confused me [...]]]></description>
			<content:encoded><![CDATA[
<p>Python has been more tightly integrated in the new release of ArcGIS 10, allowing scripting to occur directly through a Python process without even opening up ArcMap. Admittedly this was available before, but now everything is more tightly coupled and a lot cleaner in its implementation. However, what has really interested, and indeed confused, me of late is how to use Python in the &#8216;field calculator&#8217;.</p>
<p>Field Calculator is a really useful tool. When you are looking at the attribute table of a shapefile in ArcGIS and want to derive a value for each object in the file, you can type a function into the Field Calculator and it will work the value out for you row by row. Sometimes the value you want to derive is more complicated than simple arithmetic and you need to write a script. Previously you could do this in VBA, but I always found it limited and confusing; now, however, you can do it in Python, which is much simpler!</p>
<p>There are a few pitfalls to using Python in the ArcGIS Field Calculator, so I&#8217;m going to outline how to write simple Field Calculator Python scripts in ArcGIS, based on my early experience.</p>
<p>Firstly, the way to use Python in the Field Calculator seems to be to write a Python function and then call it for each row. Because you are writing a function, you also have to give it the relevant parameters (i.e. fields) with which to do the computation. Finally, and annoyingly, you have to write your function in a little box and use a consistent indentation standard, as Python requires (1 space works best, given how little room there is).</p>
<p>Here is a basic recipe for achieving field calculations in ArcGIS using Python. Obviously this is overly simplistic, as you do not need a script for this calculation, but it serves as an introductory example.</p>
<p style="text-align: center"><a href="http://danieljlewis.org/files/2010/10/FieldCalculator101.png"><img class="aligncenter size-full wp-image-427" src="http://danieljlewis.org/files/2010/10/FieldCalculator101.png" alt="" width="493" height="472" /></a></p>
<p>1) Name a function and parameterise it with the fields to base the calculation on. Do this in the lower box.</p>
<p style="padding-left: 30px">In the image you can see I&#8217;ve input: density( !sum_pop!, !Area!). This means: send the values in the fields called sum_pop and Area to the function called density.</p>
<p>2) Define the function you are calling in the larger upper box.</p>
<p style="padding-left: 30px">You define a function in Python using the &#8220;def&#8221; keyword. In the image I have defined the &#8220;density&#8221; function by writing the line: def density( pop,area):</p>
<p style="padding-left: 30px">This function definition means: define a function called density which takes the parameters pop and area. The parameters could be called anything, but it is useful to call them something that makes sense for use in the function. These parameters are variable names that the function uses to identify the fields you have passed the function when you called it, as in 1).</p>
<p style="padding-left: 30px">Normally you&#8217;d do some sort of calculation within the function; however, this example is so simple that all we need to do is &#8220;return&#8221; a value to the function call. Here that value is the density, defined as population over area: pop/area.</p>
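<p>Written out in full, the two boxes might contain something like the sketch below. It is a reconstruction for illustration, not a copy of the dialog in the screenshot. The float() call is my own assumption: ArcGIS 10 uses Python 2, where dividing one integer by another discards the fractional part, so two integer fields could silently produce truncated densities. The rows at the end are made-up values that imitate, outside ArcGIS, the way the Field Calculator calls the function once per row.</p>
<pre># Upper box: the function definition
def density(pop, area):
    # float() guards against Python 2 integer division when both fields are integers
    return float(pop) / area

# Lower box: the expression that calls it once per row
#   density( !sum_pop!, !Area!)

# Outside ArcGIS: imitate the row-by-row calls with hypothetical rows
rows = [{'sum_pop': 250, 'Area': 2.0}, {'sum_pop': 9, 'Area': 2}]
for row in rows:
    print(density(row['sum_pop'], row['Area']))  # 125.0, then 4.5</pre>
<p>A row with an area of zero would still raise a ZeroDivisionError, so real data may need a guard.</p>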
<p>Looking at the Field Calculator, I have found that you are limited to the basic, math and datetime modules in Python, without the ability to import other modules. You can, however, define several functions and call them from within your main function.</p>
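<p>As a hedged illustration of splitting the work across functions, the sketch below calls a hypothetical helper, safe_density, from a main function, log_density, which also uses the math module. The lower box would then read log_density( !sum_pop!, !Area!). Both names are my own. Returning None for a zero or missing density assumes that the Field Calculator writes None as a null value, so it is worth testing on a copy of the data first. The import line may be redundant if math is already available, but it should do no harm.</p>
<pre>import math

def safe_density(pop, area):
    # Helper: people per unit area, or None when the area is not positive
    if not area > 0:
        return None
    return float(pop) / area

def log_density(pop, area):
    # Main function: base-10 logarithm of the density (2.0 for 100 per unit area)
    d = safe_density(pop, area)
    if d is None or not d > 0:
        return None  # assumption: stored as a null value by the Field Calculator
    return math.log10(d)</pre>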
<p>For details on the basic syntax of Python, this site is particularly good: <a href="http://www.tutorialspoint.com/python/index.htm">http://www.tutorialspoint.com/python/index.htm</a></p>
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2010/10/11/arcgis-10-field-calculator-and-python/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Computing the geometric median in Python</title>
		<link>http://danieljlewis.org/2010/07/09/computing-the-geometric-median-in-python/</link>
		<comments>http://danieljlewis.org/2010/07/09/computing-the-geometric-median-in-python/#comments</comments>
		<pubDate>Fri, 09 Jul 2010 10:17:56 +0000</pubDate>
		<dc:creator>Daniel Lewis</dc:creator>
				<category><![CDATA[Modeling]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[allocation]]></category>
		<category><![CDATA[dijkstra]]></category>
		<category><![CDATA[geometric]]></category>
		<category><![CDATA[location]]></category>
		<category><![CDATA[matplotlib]]></category>
		<category><![CDATA[median]]></category>
		<category><![CDATA[optimisation]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[service]]></category>

		<guid isPermaLink="false">http://danieljlewis.org/?p=362</guid>
		<description><![CDATA[I noticed in a beta of ArcGIS 10 (then called 9.4) that there was a &#8216;new&#8217; option for computing a Geometric Median which didn&#8217;t exist in my copy of ArcGIS 9.3. This is an interesting concept, as in 1d statistics, the geometric (2d) mean is easy to calculate, being the average of all the X [...]]]></description>
			<content:encoded><![CDATA[
<p>I noticed in a beta of ArcGIS 10 (then called 9.4) that there was a &#8216;new&#8217; option for computing a Geometric Median, which didn&#8217;t exist in my copy of ArcGIS 9.3. This is an interesting concept. Just as in 1d statistics, the geometric (2d) mean is easy to calculate, being the average of all the X coordinates and the average of all the Y coordinates. From statistics we know that the mean and median of a distribution will coincide if the data are perfectly normally distributed; in the real world, however, data usually only approximate a normal distribution, leading to a mean that differs from the midpoint, or median. Therefore, for a skewed distribution on the plane, the mean is not necessarily the best representation of the &#8216;centre&#8217; of the data, and we may wish to calculate the median instead. Doing so will also give us a good idea of the direction of the skew of the point pattern we are investigating. In calculating the median of a 2d point pattern, we can express the problem as a need to:</p>
<p><em>minimise the sum of the Euclidean distances (not the squared distances, whose minimum is simply the mean centre) from all points in a distribution to a centre.</em></p>
<p>Thus it is reasonably clear that we are dealing with an &#8216;optimisation problem&#8217;, something that I have experimented with before in work I conducted using the &#8216;transportation problem&#8217;, a classic linear programming problem.</p>
<p>In terms of application, I thought that finding the median of a distribution of people around a service would be a useful, albeit basic, indication of whether all people were making a similar trip to a service, or whether there were other effects at work (this would be evidenced by a median centre that was not close to the actual service location). I thought I would be able to code the optimisation routine in Python using pre-existing insight. Notably, the <a title="Geometric Median" href="http://en.wikipedia.org/wiki/Geometric_median" target="_blank">Wikipedia page</a> on this topic details the Weiszfeld algorithm as the acknowledged computational solution to the geometric median problem; it takes the form:</p>
<p><a href="http://danieljlewis.org/files/2010/07/weiszfeld.png"><img class="aligncenter size-full wp-image-363" title="weiszfeld" src="http://danieljlewis.org/files/2010/07/weiszfeld.png" alt="" width="368" height="61" /></a>However, actually writing the algorithm proved somewhat tough. Essentially, the approach is to start with a candidate point (I started with the mean centre) and calculate the objective function, in this case the sum of the Euclidean distances of all points from the candidate centre. Then pass the candidate point through the Weiszfeld algorithm and reassess the objective function; once the objective function converges, a median has been found. Because this objective function is convex, any minimum the iteration settles on is the global minimum. The basic algorithm can, however, get stuck if a candidate lands exactly on a data point, and if all of the points lie on a straight line there may be more than one optimal solution. Below is a solution for some of my data on patient registrations to a doctor (the data has been randomly offset by 75m to preserve anonymity).</p>
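<p>The iteration just described can be sketched in plain Python as follows. This is a minimal version under my own assumptions, not the code linked at the end of this post. Points are (x, y) tuples in projected units such as metres, and the search starts from the mean centre. Each step replaces the candidate with the average of all the points weighted by the inverse of their distance from it, which is the Weiszfeld update. The loop stops once the objective function (the sum of Euclidean distances) falls by no more than tol. A point lying exactly on the candidate is simply skipped. That avoids the division by zero, but it is a crude workaround rather than a complete treatment of that case.</p>
<pre>import math

def total_distance(points, centre):
    # Objective function: sum of Euclidean distances from each point to the centre
    cx, cy = centre
    return sum(math.hypot(x - cx, y - cy) for x, y in points)

def geometric_median(points, tol=1e-9, max_iter=1000):
    n = float(len(points))
    # Start from the mean centre
    candidate = (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)
    cost = total_distance(points, candidate)
    for _ in range(max_iter):
        sum_x = sum_y = sum_w = 0.0
        for x, y in points:
            d = math.hypot(x - candidate[0], y - candidate[1])
            if d == 0:
                continue  # crude workaround: skip a point the candidate sits on
            # Weiszfeld update: weight each point by 1 / distance
            sum_x += x / d
            sum_y += y / d
            sum_w += 1.0 / d
        if sum_w == 0:
            break  # every point coincides with the candidate
        new_candidate = (sum_x / sum_w, sum_y / sum_w)
        new_cost = total_distance(points, new_candidate)
        if tol >= cost - new_cost:
            # Converged: keep whichever of the last two candidates scores better
            return new_candidate if cost >= new_cost else candidate
        candidate, cost = new_candidate, new_cost
    return candidate

# Hypothetical example: four points around (0.5, 0.5) plus one outlier
pts = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10)]
print(geometric_median(pts))</pre>
<p>To check the result quickly, compare total_distance for the returned point with total_distance for the mean centre. The median should never score worse.</p>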
<p style="text-align: center"><a href="http://danieljlewis.org/files/2010/07/geomedian.png"><img class="aligncenter size-large wp-image-365" title="geomedian" src="http://danieljlewis.org/files/2010/07/geomedian-1024x742.png" alt="" width="574" height="415" /></a>Here we can see that the mean and median centres are slightly different, suggesting that the patient population is skewed slightly northwards, most likely as a result of discontinuous urban infrastructure.</p>
<p style="text-align: left">The scatterplot was achieved using the <a title="MatPlotLib @ Sourceforge" href="http://matplotlib.sourceforge.net/index.html" target="_blank">matplotlib</a> Python plotting library. This was just a test, but I imagine more complex outputs can be achieved reasonably easily.</p>
<p style="text-align: left">Notably, this technique uses Euclidean distance, which may be misleading in a dense urban environment. I note that there is a relatively simple implementation of the <a title="Python Dijkstra" href="http://code.activestate.com/recipes/119466-dijkstras-algorithm-for-shortest-paths/" target="_blank">Dijkstra algorithm for shortest paths in Python</a>, and I am curious whether it could be integrated to find a geometric median on the network. I suspect that this may be unworkable due to computational time constraints, although for smaller problems it might be OK.</p>
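<p style="text-align: left">A minimal sketch of that idea follows. It rests on assumptions of my own rather than anything taken from the linked recipe. The street network is a hypothetical undirected graph, stored as a dictionary mapping each node to a dictionary of {neighbour: edge length}. Patients are counted at network nodes, and node names are all of one comparable type (e.g. strings). Only nodes are tried as candidate centres, which is known to be enough for a single median on a network when demand sits at the nodes (a result due to Hakimi). The shortest paths use a standard priority-queue Dijkstra built on Python&#8217;s heapq module. Running one Dijkstra search per candidate node is where the computational cost comes from: it grows quickly with the size of the network.</p>
<pre>import heapq

def dijkstra(graph, source):
    # Shortest network distance from source to every reachable node
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist[node]:
            continue  # stale queue entry: a shorter route was already found
        for neighbour, length in graph[node].items():
            new_d = d + length
            if dist.get(neighbour, float('inf')) > new_d:
                dist[neighbour] = new_d
                heapq.heappush(queue, (new_d, neighbour))
    return dist

def network_median(graph, demand):
    # Node minimising the demand-weighted sum of shortest-path distances.
    # demand maps node -> number of patients at that node (hypothetical data).
    best_node, best_cost = None, float('inf')
    for candidate in graph:
        dist = dijkstra(graph, candidate)
        cost = sum(count * dist.get(node, float('inf')) for node, count in demand.items())
        if best_cost > cost:
            best_node, best_cost = candidate, cost
    return best_node, best_cost

# Hypothetical street segment A-B-C-D with unit-length edges
streets = {'A': {'B': 1}, 'B': {'A': 1, 'C': 1}, 'C': {'B': 1, 'D': 1}, 'D': {'C': 1}}
patients = {'A': 1, 'B': 1, 'C': 1, 'D': 10}
print(network_median(streets, patients))  # ('D', 6.0)</pre>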
<p style="text-align: left">Naturally there are algorithms that can calculate a solution to the above for <em>p</em>-medians (i.e. several service centres in a population, commonly known as location-allocation). It is something that <a title="Paul Densham" href="http://www.geog.ucl.ac.uk/~pdensham/s_t_paper.html" target="_blank">Paul Densham</a> at UCL has worked on, and his code is making a return to service in ArcGIS version 10. I&#8217;m looking forward to seeing it, as it is a very difficult problem to solve (and in fact already has been &#8216;solved&#8217;), and not one I intend to investigate!</p>
<p style="text-align: left">My code for the geometric median is <a href="http://danieljlewis.org/files/2010/07/geomedian.pdf">here.</a></p>
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2010/07/09/computing-the-geometric-median-in-python/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>&#8216;Locally led&#8217; NHS Service changes dubious</title>
		<link>http://danieljlewis.org/2010/06/01/locally-led-nhs-service-changes-dubious/</link>
		<comments>http://danieljlewis.org/2010/06/01/locally-led-nhs-service-changes-dubious/#comments</comments>
		<pubDate>Tue, 01 Jun 2010 14:17:28 +0000</pubDate>
		<dc:creator>Daniel Lewis</dc:creator>
				<category><![CDATA[Health Geography]]></category>
		<category><![CDATA[Southwark]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[community]]></category>
		<category><![CDATA[health]]></category>
		<category><![CDATA[Lansley]]></category>
		<category><![CDATA[local]]></category>
		<category><![CDATA[provision]]></category>
		<category><![CDATA[service]]></category>

		<guid isPermaLink="false">http://danieljlewis.org/?p=330</guid>
		<description><![CDATA[Since coming to government, new Conservative Health Secretary Andrew Lansley has sought to fulfil the pledge he made to put an end to local restructurings of NHS service delivery by authorities higher up the NHS hierarchy. Ostensibly he believes that local decision-making will have a better overall effect on the quality of outcomes for patients [...]]]></description>
			<content:encoded><![CDATA[
<p>Since coming to government, new Conservative Health Secretary Andrew Lansley has sought to fulfil the pledge he made to put an end to local restructurings of NHS service delivery by authorities higher up the NHS hierarchy. Ostensibly he believes that local decision-making will have a better overall effect on the quality of outcomes for patients and hence lead to a better health service. Specifically, he wants to provide GPs with an opportunity to work with community leaders and their local authorities to steer local services. The core elements actually do not differ greatly from the outgoing Labour policies, particularly with respect to patient choice. However, I will argue that there is a clear danger in engaging to too great an extent with a purely &#8216;local&#8217; approach. In general there seems to be something of a misconception in Government, particularly in the provision of local services (e.g. schools), that local approaches are somehow &#8216;better&#8217;.</p>
<p>Firstly, let us consider something that the Government seems to do without fail, something that I, as a Geographer, find to be a grave sin of omission: the apparently indiscriminate use of spatial qualifiers without so much as an explanation of their meaning. The terms &#8216;local&#8217; and &#8216;community&#8217; are spectacularly misleading without qualification. Yet they are often used because people seem to think they understand what is meant by them &#8211; everyone considers themselves part of a community, and local to a service &#8211; but will these personal feelings about their socio-spatial connections actually translate into the ability to have input into healthcare decision making? My investigation of access and registration of patients to GPs in Southwark has shown two things. First, a) primary care is a very location-based service, and without fail each doctor exhibits a characteristic distance decay function describing the pattern of registration with that GP, subject to some socio-economic criteria. Second, b) patients overlap to a large extent in a densely populated urban context, the suggestion being that activity spaces (e.g. retail areas, workplaces and schools) have a distorting effect on patterns of registration for some people. To this end I suggest that a &#8216;community&#8217; can be defined independently for individual GPs, based upon the patterns of patient uptake unique to that service. There may still be some strong correlations with the residential, workplace, educational and other communities that overlap it (of course, for some GPs the profile of their registered community may diverge greatly from their observed local community, defined by proximity to the GP). The following map is an example of this kind of complexity:</p>
<p style="text-align: left"><a href="http://danieljlewis.org/files/2010/06/GPRegSwk.jpg"><img class="aligncenter size-full wp-image-332" title="GPRegSwk" src="http://danieljlewis.org/files/2010/06/GPRegSwk.jpg" alt="" width="420" height="705" /></a></p>
<p style="text-align: left">Here it is clear that any definition of locality or community based upon an arbitrary areal basis yields groups of people who could be registered to as many as 29 different Southwark GPs in only a very small area. This is in fact a very good, simple illustration of patient choice in action. There are a lot of questions to ask Mr Lansley about how he views &#8216;local&#8217; or &#8216;community&#8217;, and whether he is willing to enshrine that definition in policy before we actually consent to doing anything with the provision of services.</p>
<p style="text-align: left">Further still, I have claimed that GPs are very much location-based services, and they are: beyond a certain distance (in Southwark this is about 6 to 10 km) no one is registered with a GP, choosing instead a closer service. In many ways this was constrained by the pre-existing system of &#8216;catchment areas&#8217;. However, these were set to be removed by the end of the year in the quest for patient choice, opening up the potential for people to register with doctors near their place of work (for instance) rather than near their home. Should these people then have a say in the provision of services in an area in which they do not live? They are part of the GP&#8217;s &#8216;community&#8217; but not of the residential one. A good illustration of this is actually the polyclinic system. Southwark is geared up to introduce 3 polyclinics: one which already exists as a large GP-led health centre in the centre of the borough, and two in the north connected to hospitals. The biggest difficulty faced at the moment is in estimating the daytime population (i.e. transient workforce) of the Southbank in order to account for likely polyclinic usage &#8211; a huge number of people who do not live in Southwark but will likely have some part of their healthcare provided by Southwark PCT.</p>
<p style="text-align: left">It is also unclear what Mr Lansley refers to when he talks about &#8216;top-down&#8217;: is it the Strategic Health Authorities and the DoH itself? It cannot be the PCTs, as Mr Lansley claims that the new criteria will have the support of &#8216;GP commissioners&#8217;, and it is the PCTs that actually do the commissioning. Further, the idea of GPs working with local authorities is largely the same as GPs working with PCTs now, as PCTs and LAs are generally coterminous.</p>
<p style="text-align: left">Whilst it is pleasing to see a politician quoting the need for an evidence-based approach to restructuring, it is unclear what evidence he might base GP quality on. The current payment method, the Quality and Outcomes Framework (QOF), is based on GP reporting of pre-specified target outcomes to a centralised authority, and surely GPs will simply follow these directives in order to bring in as much money as possible. In fact, it is strongly recommended that these statistics are not used as measures of GP quality, as they are by and large patchy in what they cover and include little demographic data. Indeed, had the previous government not already cut the NHS IT initiative that would have made reporting of outcomes actually feasible nationally, the new government would no doubt have cut it anyway.</p>
<p style="text-align: left">The final worry I have is one of equity, something upon which the NHS is founded &#8211; the provision of a fair service, contingent on need, that is free at the point of service. Surely such an atomistic approach to healthcare provision as Mr Lansley seems to specify is liable to deepen the perceived &#8216;social gradient&#8217; in health care. Without a careful (top-down) hand, the GPs and communities best equipped to play an active role in orchestrating GP services will get increasingly better provision, and these are most likely to be in the wealthier areas of the country. There needs to be at least some form of national accountability for a national health service.</p>
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2010/06/01/locally-led-nhs-service-changes-dubious/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>A Thematic Map in Python</title>
		<link>http://danieljlewis.org/2010/05/25/a-thematic-map-in-python/</link>
		<comments>http://danieljlewis.org/2010/05/25/a-thematic-map-in-python/#comments</comments>
		<pubDate>Tue, 25 May 2010 19:08:09 +0000</pubDate>
		<dc:creator>Daniel Lewis</dc:creator>
				<category><![CDATA[Representation]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[automated]]></category>
		<category><![CDATA[categorical]]></category>
		<category><![CDATA[Maps]]></category>
		<category><![CDATA[matplotlib]]></category>
		<category><![CDATA[OAC]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[shapefile]]></category>

		<guid isPermaLink="false">http://danieljlewis.org/?p=309</guid>
		<description><![CDATA[I thought I would explore the possibility of creating thematic maps using Python; this post documents my initial attempt. The output is hence rather basic, but encouraging. The primary reason that I wanted to test the mapping potential of Python is to allow for some basic automated map production in order to quickly visually assess [...]]]></description>
			<content:encoded><![CDATA[
<p>I thought I would explore the possibility of creating thematic maps using Python; this post documents my initial attempt. The output is hence rather basic, but encouraging. The primary reason that I wanted to test the mapping potential of Python is to allow for some basic automated map production, so that I can quickly assess the geographical patterns contained within large data sets visually. This is something that I am at a loss to do in ESRI&#8217;s ArcGIS, although that might change in ArcGIS 10. For fans of R, I know it can be done there; however, R is too tricky for me! My colleague James Cheshire explains the method in R <a title="Making Maps in R" href="http://spatialanalysis.co.uk/2010/01/13/making-maps-with-r/" target="_blank">here.</a></p>
<p>The first hurdle in map making is getting the data in. For this I used the <a title="Shapefile Reader" href="http://indiemaps.com/blog/2008/03/easy-shapefile-loading-in-python/" target="_blank">shapefile reader</a> that <a title="Indiemaps Home" href="http://indiemaps.com/" target="_blank">Zachary Forest Johnson</a> put together for his excellent blog &#8216;<a title="Indiemaps Blog" href="http://indiemaps.com/blog">IndieMaps.com</a>&#8217;. This allowed me to read in any of my masses of pre-existing Shapefile format data files. I could also use the Python scripting functionality in ArcGIS to perform spatial operations and then output a map quickly, without the hassle of dealing with ArcGIS layouts.</p>
<p>Once you have downloaded the shapefile reader, it is easily used:</p>
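<pre>import shpUtils   # imports the shapefile reader
# Load a shapefile into an object called shpRecords.
# Use a plain relative path or a raw string (r'C:\path\filename.shp'):
# in '\filename.shp' the '\f' would be read as a form-feed character.
shpRecords = shpUtils.loadShapefile('filename.shp')</pre>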
<p>This is undoubtedly simple. What you then have is a (slightly) complex object which contains all of the shapefile data nested as lists and dictionaries. In order to get my head round this I spent some time investigating it. A standard shapefile that contains areal geographies (e.g. UK Output Areas) will have a set-up similar to this:</p>
<ul>
<li>The outer list (shpRecords) holds one entry per complete geometry, so its length corresponds to the number of rows in the attribute table; shpRecords[i] is the <em>i</em>th geometry. Thus a shapefile containing a single polygon has 1 row in the attribute table and 1 list entry (index 0) in Python.</li>
<li>Each entry is a dictionary (shpRecords[i]['key']) with two branches, holding either the &#8216;dbf_data&#8217; from the attribute table or the &#8216;shp_data&#8217; from the .shp file describing the underlying geometry.</li>
<li>Choosing the &#8216;dbf_data&#8217; key (shpRecords[i]['dbf_data']) allows you to see the attributes recorded column-by-column for each row (and hence each geometry) in the attribute table. Thus shpRecords[i]['dbf_data']['name'] will return the attribute value for the field &#8216;name&#8217; for the <em>i</em>th geometry in the shapefile.</li>
<li>Choosing the &#8216;shp_data&#8217; key (shpRecords[i]['shp_data']) allows you to access the various components of the shapefile&#8217;s geometry. In the case of a polyline/polygon you get dictionary items &#8216;ymax&#8217;, &#8216;ymin&#8217;, &#8216;xmax&#8217;, &#8216;xmin&#8217;, &#8216;numpoints&#8217;, &#8216;numparts&#8217; and &#8216;parts&#8217;. Clearly the first 6 items are properties of the <em>i</em>th geometry you are querying: they let you form a bounding box and get the number of vertices and parts in the line/polygon. &#8216;parts&#8217; then lets you draw separate lines/polygons if the shapefile is set up to have spatially discontinuous shapes for each row.</li>
<li>The thing we are most interested in is the &#8216;parts&#8217; key, as this contains all the coordinates for the geometry being considered; it is accessed as shpRecords[i]['shp_data']['parts']. This is a list, and shpRecords[i]['shp_data']['parts'][j] lets you distinguish between parts in a multipart geometry, i.e. the <em>j</em>th part of the <em>i</em>th geometry.</li>
<li>Having come this far, each part holds a &#8216;points&#8217; list of vertices, and each vertex is a final dictionary offering &#8216;x&#8217; or &#8216;y&#8217;. Thus the x-coordinate of the <em>k</em>th vertex in the <em>j</em>th part of the <em>i</em>th geometry is accessed by shpRecords[i]['shp_data']['parts'][j]['points'][k]['x'] &#8211; simple!</li>
</ul>
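<p>To make the nesting concrete, here is a small hypothetical record built by hand to mimic the layout described in the list above. It was not produced by shpUtils, so the real reader may differ in detail; for example, attribute values may arrive as strings. The helper functions vertex and bounding_box are my own names. The first walks the full access path down to a single vertex. The second recomputes the bounding box from every part&#8217;s vertices, so you can compare it with the stored &#8216;xmin&#8217;, &#8216;ymin&#8217;, &#8216;xmax&#8217; and &#8216;ymax&#8217; values.</p>
<pre># A hypothetical record, built by hand to mimic the nesting described above
shpRecords = [{
    'dbf_data': {'name': 'Example area', 'supergroup': 3.0},
    'shp_data': {
        'xmin': 0.0, 'ymin': 0.0, 'xmax': 2.0, 'ymax': 1.0,
        'numparts': 1, 'numpoints': 5,
        'parts': [{'points': [{'x': 0.0, 'y': 0.0}, {'x': 2.0, 'y': 0.0},
                              {'x': 2.0, 'y': 1.0}, {'x': 0.0, 'y': 1.0},
                              {'x': 0.0, 'y': 0.0}]}],
    },
}]

def vertex(records, i, j, k):
    # (x, y) of the kth vertex in the jth part of the ith geometry
    point = records[i]['shp_data']['parts'][j]['points'][k]
    return float(point['x']), float(point['y'])

def bounding_box(records, i):
    # Recompute (xmin, ymin, xmax, ymax) from the vertices of every part
    points = [p for part in records[i]['shp_data']['parts'] for p in part['points']]
    xs = [float(p['x']) for p in points]
    ys = [float(p['y']) for p in points]
    return min(xs), min(ys), max(xs), max(ys)

print(shpRecords[0]['dbf_data']['name'])  # Example area
print(vertex(shpRecords, 0, 0, 1))        # (2.0, 0.0)
print(bounding_box(shpRecords, 0))        # (0.0, 0.0, 2.0, 1.0)</pre>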
<p>I have been using <a title="MatPlotLib @ Sourceforge" href="http://matplotlib.sourceforge.net/" target="_blank">matplotlib</a> &#8211; a Python library for scientific visualisation &#8211; a lot recently, and have found it a very simple and powerful resource, so I thought I&#8217;d see if it could be made to draw a map.</p>
<p>Firstly, import the pyplot module, which does all the figure drawing:</p>
<pre>import matplotlib.pyplot as plt
</pre>
<p>Now let&#8217;s use the &#8220;fill&#8221; function of matplotlib to draw all the geometries in a shapefile; my shapefile is Output Areas in Southwark. Firstly we need to loop through each geometry, and then draw a polygon using all the points contained within each geometry. I omitted a loop for multipart geometries as my shapefile has none; however, this would be very easy to add if the data did have multiple parts: simply add a loop in the middle!</p>
<pre>for i in range(len(shpRecords)):
    # x and y are empty lists to be populated with the coords of each geometry.
    x = []
    y = []
    # Loop over the vertices of the ith geometry.
    # The parts list index is [0] as the data are singlepart.
    for point in shpRecords[i]['shp_data']['parts'][0]['points']:
        # Get the x and y coordinates and populate the lists.
        x.append(float(point['x']))
        y.append(float(point['y']))

    # Creates a polygon in matplotlib for each geometry in the shapefile
    plt.fill(x, y)

plt.axis('equal')
# This sets the x and y axes as equal intervals.
# NB this script will only work for projected data; for geographical
# coordinate systems get ready to do some maths.

plt.show() # Draws the map!</pre>
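<p>For shapefiles that do contain multipart geometries, the extra loop mentioned above might look like the sketch below. It assumes the record layout described earlier. The function name draw_geometries and its optional colour argument are hypothetical. Pyplot is passed in as an argument rather than imported, which keeps the function easy to test. One caveat: in the shapefile format a polygon&#8217;s holes are stored as additional rings. If the reader returns these as extra parts, this sketch will fill a hole rather than cut it out.</p>
<pre>def draw_geometries(plt, shp_records, colour=None):
    """Fill every part of every geometry as a separate polygon.

    plt: the matplotlib.pyplot module (passed in rather than imported here).
    colour: optional function taking a record and returning a face colour, so
    that all parts of one geometry share a colour; without it, matplotlib picks
    a colour from its default cycle for each part.
    """
    for record in shp_records:
        fill_options = {'fc': colour(record)} if colour is not None else {}
        # The extra loop in the middle: one pass per part of the geometry
        for part in record['shp_data']['parts']:
            x = [float(point['x']) for point in part['points']]
            y = [float(point['y']) for point in part['points']]
            plt.fill(x, y, **fill_options)

# Possible usage, with shpRecords loaded by shpUtils as above:
# import matplotlib.pyplot as plt
# draw_geometries(plt, shpRecords)
# plt.axis('equal')
# plt.show()</pre>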
<p>This is the simplest form of the script: it will simply draw the shapefile with each area filled with a colour from matplotlib&#8217;s default colour cycle. This is not that useful, but it is easy to create a thematic map of categorical data, so let&#8217;s investigate a way of doing that. I&#8217;ve got data for the Output Area Classification, which is a clustering of areas by social characteristics. I know that there are 7 supergroups in the classification, named numerically, so before all the processing of the shapefile I can create a dictionary of colour choices for each group. I&#8217;m using hexadecimal colours that I got from <a title="Colour Brewer" href="http://colorbrewer2.org/" target="_blank">Cynthia Brewer&#8217;s</a> website for a &#8216;qualitative&#8217; 7-class classification. The dictionary looks like this:</p>
<pre>oacSGroups = {'1':'#A6761D','2':'#E6AB02','3':'#66A61E','4':'#E7298A',\
'5':'#7570B3','6':'#D95F02','7': '#1B9E77'}
</pre>
<p>Thus the key &#8216;1&#8217; returns the associated hex colour, and this can be linked to the &#8216;dbf_data&#8217; key in the shapefile. In the plt.fill() call I simply have to specify the colour choice, so we alter the line in the above script to read:</p>
<pre>plt.fill(x, y, fc=oacSGroups[str(int(shpRecords[i]['dbf_data']['supergroup']))],
         ec='0.7', lw=0.1)
</pre>
<p>&#8216;fc&#8217; is the &#8216;foreground colour&#8217; we are asking python to make the colour equal to the value in the oacSGroups dictionary where the key is the value contained in the attribute table for the <em>i</em>th row in the &#8216;supergroup&#8217; field. Thus if the <em>i</em>th row had a &#8216;supergroup&#8217; value of &#8217;7&#8242; that foreground colour would be set to &#8216;#1B9E77&#8242;. &#8216;ec&#8217; is &#8216;edge colour&#8217; and &#8216;lw&#8217; is linewidth, here I have set the values to display fine, light grey lines.</p>
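<p>The str(int(...)) conversion is needed because the dictionary keys are strings, while the attribute values may come back from the .dbf file as numbers. A slightly more forgiving lookup is sketched below. The function name supergroup_colour and the light grey used for missing or unexpected values are my own choices. It also accepts strings such as &#8216;7.0&#8217;, which int() alone would reject.</p>
<pre>oacSGroups = {'1': '#A6761D', '2': '#E6AB02', '3': '#66A61E', '4': '#E7298A',
              '5': '#7570B3', '6': '#D95F02', '7': '#1B9E77'}

def supergroup_colour(value, colours=oacSGroups, missing='#CCCCCC'):
    # Normalise 7, 7.0, '7' or '7.0' to the dictionary key '7'
    try:
        key = str(int(float(value)))
    except (TypeError, ValueError):
        return missing  # hypothetical grey for blank or unreadable values
    return colours.get(key, missing)

# In the drawing loop the fill line would then become:
# plt.fill(x, y, fc=supergroup_colour(shpRecords[i]['dbf_data']['supergroup']),
#          ec='0.7', lw=0.1)</pre>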
<p>Finally, as basic a map as this will turn out to be, we wouldn&#8217;t be anywhere without a legend. The following is a very basic, wholly manual way to add a legend to the map:</p>
<pre>p1 = plt.Rectangle((0, 0), 1, 1, fc="#A6761D")
p2 = plt.Rectangle((0, 0), 1, 1, fc="#E6AB02")
p3 = plt.Rectangle((0, 0), 1, 1, fc="#66A61E")
p4 = plt.Rectangle((0, 0), 1, 1, fc="#E7298A")
p5 = plt.Rectangle((0, 0), 1, 1, fc="#7570B3")
p6 = plt.Rectangle((0, 0), 1, 1, fc="#D95F02")
p7 = plt.Rectangle((0, 0), 1, 1, fc="#1B9E77")

plt.legend([p1, p2, p3, p4, p5, p6, p7],
           ["Super Group 1", "Super Group 2", "Super Group 3", "Super Group 4",
            "Super Group 5", "Super Group 6", "Super Group 7"], loc=4)
</pre>
<p>This simply creates 7 rectangle patches. They don&#8217;t appear on the plotted output because they are never added to the axes; instead they are passed to the legend creator. Each rectangle has the appropriate colour to match the mapped representation, and the rectangles and their labels are passed to plt.legend() as two ordered lists. The &#8216;loc&#8217; keyword sets where the legend will appear; 4 denotes the lower right corner. The &#8216;title&#8217; keyword lets you add a title to the legend as a string.</p>
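<p>The colours already live in the oacSGroups dictionary, so the seven hand-written rectangles could instead be generated in a loop. This keeps the legend in step with the map if the palette changes. The following is a minimal sketch that assumes matplotlib is installed. The function names legend_entries and add_legend are hypothetical, and ordering the entries by numeric supergroup code is my assumption about the desired legend order.</p>

```python
def legend_entries(palette):
    """Return (colour, label) pairs ordered by numeric supergroup code."""
    return [(palette[key], "Super Group " + key)
            for key in sorted(palette, key=int)]


def add_legend(palette, loc=4, title=None):
    """Draw a legend on the current matplotlib axes from a colour dictionary."""
    # Imported lazily so legend_entries() can be used without matplotlib.
    import matplotlib.pyplot as plt

    entries = legend_entries(palette)
    # Proxy rectangles: never added to the axes, only handed to the legend.
    handles = [plt.Rectangle((0, 0), 1, 1, fc=colour) for colour, _ in entries]
    labels = [label for _, label in entries]
    return plt.legend(handles, labels, loc=loc, title=title)
```

<p>Calling add_legend(oacSGroups) after the polygons are drawn and before plt.show() should produce the same seven-entry legend as the manual version.</p>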
<p style="text-align: left">An example output looks something like this:</p>
<p style="text-align: left"><a href="http://danieljlewis.org/files/2010/05/OACPythonMap1.png"><img class="aligncenter size-full wp-image-323" title="OACPythonMap" src="http://danieljlewis.org/files/2010/05/OACPythonMap1.png" alt="" width="564" height="650" /></a>This took a couple of seconds to produce and draws 846 individual geometries, which have quite a large number of vertices.</p>
<p style="text-align: left">I&#8217;ll update the blog should I find new methods to visualise spatial data in python.</p>
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2010/05/25/a-thematic-map-in-python/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Some Surname-based Rank-Size thoughts</title>
		<link>http://danieljlewis.org/2010/03/05/some-surname-based-rank-size-thoughts/</link>
		<comments>http://danieljlewis.org/2010/03/05/some-surname-based-rank-size-thoughts/#comments</comments>
		<pubDate>Fri, 05 Mar 2010 14:24:27 +0000</pubDate>
		<dc:creator>Daniel Lewis</dc:creator>
				<category><![CDATA[Southwark]]></category>
		<category><![CDATA[Thoughts]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[log]]></category>
		<category><![CDATA[power law]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[rank-size]]></category>
		<category><![CDATA[surnames]]></category>
		<category><![CDATA[zipt]]></category>

		<guid isPermaLink="false">http://danieljlewis.org/?p=249</guid>
		<description><![CDATA[Yesterday Professor Mike Batty introduced me to the rank-size rule, an idea popularised by George Kingsley Zipf that relates the frequency of an observed phenomenon to the phenomenon&#8217;s rank in its group. This is best illustrated by city sizes: essentially Zipf shows that for every really large city, there exist [...]]]></description>
			<content:encoded><![CDATA[
<p>Yesterday <a title="Mike Batty" href="http://www.casa.ucl.ac.uk/people/MikesPage.htm" target="_blank">Professor Mike Batty</a> introduced me to the rank-size rule, an idea popularised by <a title="Zipf - Wikipedia" href="http://en.wikipedia.org/wiki/George_Kingsley_Zipf" target="_blank">George Kingsley Zipf</a> that relates the frequency of an observed phenomenon to the phenomenon&#8217;s rank in its group. This is best illustrated by city sizes: essentially Zipf shows that for every really large city, there exist many smaller ones. However, these smaller cities aren&#8217;t just a bit smaller than the large city, they are considerably smaller. In fact, the decline in city size from the biggest cities to the smallest can be described by a power law, which can be represented as:</p>
<p style="text-align: left"><a href="http://danieljlewis.org/files/2010/03/CodeCogsEqn2.gif"><img class="aligncenter size-full wp-image-250" title="CodeCogsEqn(2)" src="http://danieljlewis.org/files/2010/03/CodeCogsEqn2.gif" alt="" width="85" height="49" /></a></p>
<p style="text-align: left">Where <em>P<sub>n</sub></em> is the frequency of occurrence of the phenomenon ranked <em>n</em>th, and the exponent <em>alpha</em> is usually roughly equal to 1.</p>
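<p style="text-align: left">This assumes the equation pictured above takes the usual Zipf form, with the frequency of the top-ranked item, P<sub>1</sub>, acting as the scale constant. Taking logs of both sides then gives a straight line in log <em>n</em> with slope minus <em>alpha</em>. This is the property relied on when the plots are linearised further down.</p>

```latex
P_n = P_1 \, n^{-\alpha}
\qquad\Longrightarrow\qquad
\log P_n = \log P_1 - \alpha \log n
```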
<p style="text-align: left">With <em>alpha</em> equal to 1, the power law produces a plot where the 2nd-ranked item is 1/2 the size of the 1st, the 3rd is 1/3 the size of the 1st, and so on. This pattern can be seen in a plot of surname frequency against rank in Southwark.</p>
<div id="attachment_251" class="wp-caption aligncenter" style="width: 548px"><a href="http://danieljlewis.org/files/2010/03/Rplot3.png"><img class="size-full wp-image-251" title="Rplot3" src="http://danieljlewis.org/files/2010/03/Rplot3.png" alt="" width="538" height="537" /></a><p class="wp-caption-text">Plot of surname frequency against rank in Southwark for all observed surnames (using R)</p></div>
<p style="text-align: left">It is clear from the graph that there are very few surnames which are popular and many which are relatively rare. Another interesting characteristic of power laws, such as the relationship between surname frequency and rank, is that they are self-similar: if we examine any portion of the curve we should get the same curve, albeit at a different scale.</p>
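<p style="text-align: left">Self-similarity follows directly from the power-law form, again assuming P<sub>n</sub> = P<sub>1</sub> n<sup>-alpha</sup>. The ratio between the frequencies at ranks <em>kn</em> and <em>n</em> depends only on the scale factor <em>k</em>, not on where along the curve you look. For instance, with <em>alpha</em> equal to 1, doubling the rank halves the frequency whether you start at rank 1 or at rank 300.</p>

```latex
\frac{P_{kn}}{P_n} = \frac{P_1 \, (kn)^{-\alpha}}{P_1 \, n^{-\alpha}} = k^{-\alpha}
```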
<div id="attachment_255" class="wp-caption aligncenter" style="width: 548px"><a href="http://danieljlewis.org/files/2010/03/RPlot5.png"><img class="size-full wp-image-255 " title="RPlot5" src="http://danieljlewis.org/files/2010/03/RPlot5.png" alt="" width="538" height="537" /></a><p class="wp-caption-text">Plot of surname frequency for ranks 300 to 6000</p></div>
<p style="text-align: left">It is clear from the above graph that a subset of the full data also gives what looks like a power law relationship. We can attempt to linearise this relationship by taking the logs of both frequency and rank:</p>
<p style="text-align: left"><a href="http://danieljlewis.org/files/2010/03/Rplot1.png"><img class="aligncenter size-full wp-image-256" title="Rplot1" src="http://danieljlewis.org/files/2010/03/Rplot1.png" alt="" width="538" height="537" /></a>The fact that the line is not straight indicates that the relationship is not a true power law. The long tail is accentuated by the stepped line. Frequencies are integers, so as surnames become increasingly rare, many of them share the same frequency and the points cluster into steps. In the rank-size distribution of cities, the characteristic fall-off in the long tail when linearised like this indicates that city size distributions are really log-normal; however, this is not the case for surnames. If we exclude some of the long tail, the relationship can look a bit more linear, as this plot demonstrates:</p>
<p style="text-align: center"><a href="http://danieljlewis.org/files/2010/03/Rplot2.png"><img class="aligncenter size-full wp-image-257" title="Rplot2" src="http://danieljlewis.org/files/2010/03/Rplot2.png" alt="" width="538" height="537" /></a></p>
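<p style="text-align: left">The plots above were made in R, but the same steps are easy to sketch in Python. First count each surname, then rank the counts from most to least frequent, and finally fit a straight line to log frequency against log rank. The negated slope of that line estimates <em>alpha</em>. This is a minimal sketch using ordinary least squares on the log-log points. That approach is a common quick check, not a rigorous test for a power law. The function names and the optional rank window for trimming the long tail are my own illustrative choices, not the method used to produce the figures.</p>

```python
import math
from collections import Counter


def rank_size(names):
    """Return name frequencies sorted from most to least common.

    Position i in the list (0-based) is rank i + 1. Tied frequencies are
    given consecutive ranks, which is what produces the stepped tail.
    """
    return sorted(Counter(names).values(), reverse=True)


def fit_alpha(freqs, min_rank=1, max_rank=None):
    """Estimate alpha by least squares on log(frequency) against log(rank).

    min_rank and max_rank (inclusive, 1-based) restrict the fit to part of
    the curve, e.g. to exclude some of the long tail.
    """
    if max_rank is None:
        max_rank = len(freqs)
    points = [(math.log(rank), math.log(freqs[rank - 1]))
              for rank in range(min_rank, max_rank + 1)]
    n = len(points)
    if n < 2:
        raise ValueError("need at least two ranks to fit a line")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    # log P_n = log P_1 - alpha * log n, so alpha is minus the slope.
    return -(sxy / sxx)
```

<p style="text-align: left">For example, fit_alpha(rank_size(surnames), min_rank=300, max_rank=6000) would fit the rank 300 to 6000 window shown in the second plot. Here surnames stands for a hypothetical list of surname strings.</p>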
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2010/03/05/some-surname-based-rank-size-thoughts/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Non-Quantitative GIS &#8211; A thought</title>
		<link>http://danieljlewis.org/2009/12/11/non-quantitative-gis-a-thought/</link>
		<comments>http://danieljlewis.org/2009/12/11/non-quantitative-gis-a-thought/#comments</comments>
		<pubDate>Fri, 11 Dec 2009 14:29:40 +0000</pubDate>
		<dc:creator>Daniel Lewis</dc:creator>
				<category><![CDATA[Critical GIS]]></category>
		<category><![CDATA[GIS]]></category>
		<category><![CDATA[Thoughts]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[mixed-methods]]></category>
		<category><![CDATA[non-quantitative]]></category>
		<category><![CDATA[Pavlovskaya]]></category>
		<category><![CDATA[qualitative]]></category>
		<category><![CDATA[quantitative]]></category>

		<guid isPermaLink="false">http://danieljlewis.org/?p=107</guid>
		<description><![CDATA[I&#8217;ve been reading through a recent book entitled &#8220;Qualitative GIS: A Mixed Methods Approach&#8221;, edited by Meghan Cope and Sarah Elwood, and hopefully I&#8217;ll post a full review of it soon. In the meantime, however, I want to think about one of its chapters, by Marianna Pavlovskaya, specifically her discussion of whether or not GIS is [...]]]></description>
			<content:encoded><![CDATA[
<p>I&#8217;ve been reading through a recent book entitled &#8220;Qualitative GIS: A Mixed Methods Approach&#8221;, edited by Meghan Cope and Sarah Elwood, and hopefully I&#8217;ll post a full review of it soon. In the meantime, however, I want to think about one of its chapters, by Marianna Pavlovskaya, specifically her discussion of whether or not GIS is inherently a quantitative tool. I tend to agree with Marianna on one front: most GIS users rarely engage directly with the quantitative aspects of GIS. Instead, using GIS becomes a matter of spatial reasoning with overlays, logical tessellations of geometry and the impact of visual depiction. On the other hand, though, I offer a photo of the books currently on my desk:</p>
<div id="attachment_110" class="wp-caption aligncenter" style="width: 277px"><a href="http://danieljlewis.org/files/2009/12/DSC00322.JPG"><img class="size-medium wp-image-110" title="DSC00322" src="http://danieljlewis.org/files/2009/12/DSC00322-267x300.jpg" alt="Figure 1: Some books on my shelf" width="267" height="300" /></a><p class="wp-caption-text">Figure 1: Some books on my shelf</p></div>
<p>I think there is a serious point here: GIS is a tool, and people are happy to use tools to get the jobs they need done. I trust an Allen key to undo parts of my bicycle without breaking them, and to tighten them to a safe level. I don&#8217;t, however, need to know exactly how the tool is made; I just trust that I know how to use it. The same is true of GIS. Functions such as overlays are tools (in fact this is the exact terminology that ESRI use in ArcGIS), and you use tools to analyse maps in a non-quantitative way because you reason about and understand their usage. I don&#8217;t specifically need to know how the overlay works in a mechanical sense, just that it does. It is generally the case that such tools, even simple ones, have a basis in quantitative fields, often mathematics; an overlay, for instance, is a topological operator, deriving from topology as a branch of mathematics.</p>
<p>Now, in a GIS such tools are built by solving problems in &#8216;computational geometry&#8217;. I agree with Pavlovskaya again that &#8216;computerisation is not quantification&#8217;, except that in this case that is exactly what it is. There are numerous instances in GIS, such as point pattern analyses, topological functions and distance decay, in which it is clear that GIS is a quantitative tool with a quantitative development.</p>
<p>So is Pavlovskaya wrong then? Well, no. There are different kinds of users in GIS, and for the most part users aren&#8217;t looking &#8216;under the bonnet&#8217;: to them a GIS is a set of tools, the use of which leads them to construct (make) GIS as a non-quantitative thing. What seems strange is that, in looking for an opening for non-quantitative GIS from this perspective, the current way of constructing GIS quantitatively had to be challenged, or seen as wrong and somehow misguided. In my eyes the interesting thing that comes out of this discussion is the idea of abstraction.</p>
<p>In a sense quantitative, or technical, ways of seeing are increasingly being abstracted: bundled and enclosed into something you can think of as a &#8216;black box&#8217;. This encourages engagement, because the most common user experience then becomes a qualitative one, and for most people qualitative experiences are more intuitive. Generally we already have the skills to deal with such experiences, whereas we may need to learn the skills to engage quantitatively with a complex system such as GIS. GIS packages have the effect of making previously difficult quantitative functions much more accessible. Even a measure of straight-line distance is a quantitative function (on a flat map, Pythagoras&#8217; theorem), and this can be done in Google Earth and Google Maps, which are probably the most used GIS in the world (if not also the most accessible). I think the question we have to ask is: what are the trade-offs in making quantitative functions more accessible, in having them reconstructed as qualitative tools? The obvious answer is that when things are conceived as &#8216;black boxes&#8217; they are exactly that: we have no idea of what is going on inside them. Thus, qualitatively, we have to decide whether or not that matters; in fact, for qualitative GIS it may not be that important, as the world is re-seen with &#8216;fuzzy&#8217; characteristics.</p>
<p>References</p>
<p>Cope, M. and Elwood, S. (eds) (2009) Qualitative GIS: A Mixed Methods Approach. Sage, London.</p>
<p>(Much of Pavlovskaya&#8217;s chapter also appears in the following paper.)</p>
<p>Pavlovskaya, M. (2006) &#8216;Theorizing with GIS: A Tool for Critical Geographies?&#8217; Environment and Planning A 38(11), 2003&#8211;2020.</p>
<p>Addendum</p>
<p>Muki Haklay often blogs about usability and critical GIS, amongst numerous other topics of interest, on his blog <a title="Muki Haklay's Blog" href="http://povesham.wordpress.com/" target="_blank">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2009/12/11/non-quantitative-gis-a-thought/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

