<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Volunteered Geographic Information</title>
	<atom:link href="http://danieljlewis.org/feed/" rel="self" type="application/rss+xml" />
	<link>http://danieljlewis.org</link>
	<description>A Geography/GIS blog by Daniel J Lewis</description>
	<lastBuildDate>Fri, 16 Jul 2010 17:44:09 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Hospital Outpatients in Southwark 08/09</title>
		<link>http://danieljlewis.org/2010/07/16/hospital-outpatients-in-southwark-0809/</link>
		<comments>http://danieljlewis.org/2010/07/16/hospital-outpatients-in-southwark-0809/#comments</comments>
		<pubDate>Fri, 16 Jul 2010 17:44:09 +0000</pubDate>
		<dc:creator>Dan</dc:creator>
				<category><![CDATA[Health GIS]]></category>
		<category><![CDATA[Health Geography]]></category>
		<category><![CDATA[Southwark]]></category>
		<category><![CDATA[admissions]]></category>
		<category><![CDATA[HES]]></category>
		<category><![CDATA[ONS]]></category>
		<category><![CDATA[population]]></category>

		<guid isPermaLink="false">http://danieljlewis.org/?p=380</guid>
		<description><![CDATA[Amongst other things, I&#8217;m beginning to tap into a data source I have acquired for my research known as Hospital Episode Statistics (HES). These are datasets which record the particulars of hospital service by patients. Generally they have a bit of a learning curve, and require the gathering of several additional datasets in order to [...]]]></description>
			<content:encoded><![CDATA[<p>Amongst other things, I&#8217;m beginning to tap into a data source I have acquired for my research known as Hospital Episode Statistics (HES). These are datasets which record the particulars of hospital service use by patients. Generally they have a bit of a learning curve, and require the gathering of several additional datasets in order to make them useful. Having gathered all this data and put it all into a MySQL database, I decided to conduct a basic analysis, using my study site of Southwark as a guinea pig. Essentially I wanted to know whether more people from Southwark were using hospitals for outpatient appointments than we would expect from national (England) figures. There are many reasons why any given area might use health care services at a greater or lesser rate than other areas, but for the moment I simply wanted to see whether there were any interesting patterns.</p>
<p>In the HES data it is simple to calculate the total number of people using outpatient care; what is more complex is deriving an expected score from the national data. I went about it in the following way:</p>
<p>Firstly, I took the ONS experimental population projections from mid-2008 and calculated the number of people in each Southwark LSOA, and at the national (England) level, for each of the available age bands by sex. The population projection age bands are quite coarse, giving totals for 5 population groups: 0-15, 16-29, 30-44, 45-64 (men) or 45-59 (women), and 65+ (men) or 60+ (women). This isn&#8217;t ideal, but the age bands do roughly correlate with the different groups of mortality causes in the Grim Reaper&#8217;s road map (Shaw, Thomas, Smith and Dorling, 2008). Then I calculated the admission totals for all of the age-sex bands nationally (England); with this I could create a ratio of admissions to population nationally. By applying this ratio to the Southwark LSOA population projections I could create an expected number of admissions per area. Finally, it is simply a case of dividing the observed admissions by the expected and multiplying by 100 to get a score.</p>
<p>I mapped the results as follows. A score of 100 suggests that the area is not different from the national picture, whereas a value higher than 100 suggests that the area has more people using hospitals than we would expect, and a value lower than 100 suggests the converse.</p>
<p style="text-align: left"><a href="http://danieljlewis.org/files/2010/07/Outpatient0809a.jpg"><img class="aligncenter size-large wp-image-384" title="Outpatient0809a" src="http://danieljlewis.org/files/2010/07/Outpatient0809a-724x1024.jpg" alt="" width="579" height="819" /></a>In the case of Southwark, the pattern follows one often observed in my work on the borough: the Bankside areas and the southern part of the borough, together with the north-eastern former docklands area, have levels of admissions that are equivalent to, or lower than, what we would expect nationally, whereas the central areas have admission numbers higher than the national level.</p>
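<p>As a rough illustration of the steps above, here is a minimal Python sketch of the observed/expected score. Every figure in it is made up for the example (three age-sex bands rather than the full set, and invented national and LSOA totals), and the function names are hypothetical; it is not the MySQL workflow used for the map.</p>

```python
# Indirect standardisation sketch. All figures below are made up for
# illustration; they are not HES or ONS values.

# Hypothetical national (England) totals per age-sex band.
national_population = {"M 0-15": 5000000, "M 16-29": 4800000, "F 60+": 6000000}
national_admissions = {"M 0-15": 1000000, "M 16-29": 720000, "F 60+": 3000000}

# Hypothetical population of one Southwark LSOA in the same bands.
lsoa_population = {"M 0-15": 150, "M 16-29": 200, "F 60+": 100}
lsoa_observed = 120  # hypothetical observed outpatient count for the LSOA


def expected_admissions(area_pop, nat_adm, nat_pop):
    """Apply national admissions-per-person ratios to a local population."""
    return sum(area_pop[band] * nat_adm[band] / nat_pop[band] for band in area_pop)


def standardised_score(observed, expected):
    """Observed over expected, scaled so that 100 matches the national picture."""
    return 100.0 * observed / expected


expected = expected_admissions(lsoa_population, national_admissions, national_population)
score = standardised_score(lsoa_observed, expected)
print(round(expected, 1), round(score, 1))
```

<p>With these invented numbers the LSOA would be expected to generate 110 admissions, so 120 observed admissions give a score a little above 100.</p>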
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2010/07/16/hospital-outpatients-in-southwark-0809/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Household Types, Combinatorial Problems and Pure Maths</title>
		<link>http://danieljlewis.org/2010/07/15/household-types-combinatorial-problems-and-pure-maths/</link>
		<comments>http://danieljlewis.org/2010/07/15/household-types-combinatorial-problems-and-pure-maths/#comments</comments>
		<pubDate>Thu, 15 Jul 2010 18:17:07 +0000</pubDate>
		<dc:creator>Dan</dc:creator>
				<category><![CDATA[Modeling]]></category>
		<category><![CDATA[Thoughts]]></category>
		<category><![CDATA[classification]]></category>
		<category><![CDATA[combinatorics]]></category>
		<category><![CDATA[functions]]></category>
		<category><![CDATA[households]]></category>

		<guid isPermaLink="false">http://danieljlewis.org/?p=369</guid>
		<description><![CDATA[In some of the work I&#8217;m currently doing looking at households as derived from the Southwark patient register I wanted to go beyond a quantification of how many people lived in a household &#8211; a rudimentary household size, to looking at the composition of a household and hence what type of household it represented. In [...]]]></description>
			<content:encoded><![CDATA[<p>In some of the work I&#8217;m currently doing looking at households as derived from the Southwark patient register I wanted to go beyond a quantification of how many people lived in a household &#8211; a rudimentary household size, to looking at the composition of a household and hence what type of household it represented. In order to do this I looked at how types of household were generally reported in the UK Census, in European statistics, and in terms of social research on the life course, as well as in health literature itself. In terms of defining households, I found that although complex household typologies do exist, there is a general set of likely household forms: as expected these revolve around the single, co-habiting, family, single parent and extended family models. As I have data on individuals I first decided to classify individuals into 5 broad categories that seem important in the literature and then look at the composition of these categories within households. The categories were:</p>
<p>1) Dependent Children (&lt;18 yrs old)</p>
<p>2) Adult Male (18-65 yrs old)</p>
<p>3) Adult Female (18-60 yrs old)</p>
<p>4) Male Pensioner (65+ yrs old)</p>
<p>5) Female Pensioner (60+ yrs old)</p>
<p>Evidence suggests that these represent the coarsest categories that could usefully represent significant periods within the life course, as well as being relevant to changes in health status. In a sense, different types of household structure can be described by different combinations of these person classes for different household sizes.</p>
<p>I decided to test this by calculating all the possible combinations of these 5 classes for a 2 person household and then looking at their uptake in the actual household data I had derived from the Southwark patient register. It turned out that for a two person household there were 15 different ways in which you could combine the 5 person classes in order to create a unique household:</p>
<p><em>Children Only (Parents Unregistered); Single Parent Male and Child; Co-Habiting Men; Single Parent Female and Child; Single Parent Male Pensioner and Child; Co-Habiting Man and Woman; Co-Habiting Man and Male Pensioner; Co-Habiting Women; Single Parent Female Pensioner and Child; Cohabiting Woman and Male Pensioner; Cohabiting Man and Female Pensioner; Cohabiting Male Pensioners; Cohabiting Woman and Female Pensioner; Cohabiting Male and Female Pensioner; Cohabiting Female Pensioners.</em></p>
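<p>The count of 15 can be checked with a few lines of Python. This is only a sketch, not the classification script mentioned in this post: the class labels are shortened and the function name is hypothetical. A household is treated as an unordered selection of person classes in which a class may appear more than once.</p>

```python
from itertools import combinations_with_replacement

# The five person classes listed above (labels shortened for the example).
PERSON_CLASSES = ["child", "man", "woman", "male pensioner", "female pensioner"]


def household_types(size):
    """All unordered combinations of person classes for a household of `size`."""
    return list(combinations_with_replacement(PERSON_CLASSES, size))


two_person = household_types(2)
print(len(two_person))  # 15
for household in two_person:
    print(" + ".join(household))
```

<p>Because order is ignored, a man and a woman count once rather than twice, which is why the total is 15 and not 25.</p>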
<p>Using this typology of 15 possible household types, I extracted the two person households from the larger dataset and wrote a Python script to classify these households. The result for 27,124 households was as follows:</p>
<p style="text-align: left"><a href="http://danieljlewis.org/files/2010/07/2personHHtype.png"><img class="aligncenter size-full wp-image-370" title="2personHHtype" src="http://danieljlewis.org/files/2010/07/2personHHtype.png" alt="" width="594" height="330" /></a>What this graph seems to demonstrate is that roughly half of all 2 person households consist of a man and a woman (either adult or pensioner) cohabiting, and roughly a further 22% consist of same sex cohabitation. In this dataset of two person households, single parents make up only around 15% of households, almost 13% being a single female parent (adult or pensioner) and a child. All other groups only make up around 13% of households, but crucially the only category in which no households were found to exist was the adult man cohabiting with a male pensioner category. Indeed many of the smaller categories can be interpreted as having inherently important social roles, the adult woman looking after a male or female pensioner for instance.</p>
<p style="text-align: left">Essentially, the terrain of household type was a lot more nuanced and tricky than I&#8217;d at first thought, made even more so by my realisation that as household size increases, the number of possible combinations of the person types within a household increases dramatically. I wrote a Python script to calculate the number of possible different sets of people for the household sizes 1 to 10:</p>
<p style="text-align: left"><a href="http://danieljlewis.org/files/2010/07/possibles.png"><img class="aligncenter size-full wp-image-373" title="possibles" src="http://danieljlewis.org/files/2010/07/possibles.png" alt="" width="564" height="334" /></a>This presents a difficult situation, even for reasonably small households. Counting arrangements like this is the territory of &#8220;combinatorial mathematics&#8221; or &#8220;<a title="Wiki - combinatorics" href="http://en.wikipedia.org/wiki/Combinatorics" target="_blank">combinatorics</a>&#8221;. I decided to see what I could learn about this distribution, so I looked for patterns in the sequence, as you are taught in pre-GCSE maths, and soon found that the sequence had a constant fourth difference:</p>
<p style="text-align: left"><a href="http://danieljlewis.org/files/2010/07/difference-table.png"><img class="aligncenter size-full wp-image-375" title="difference table" src="http://danieljlewis.org/files/2010/07/difference-table.png" alt="" width="622" height="226" /></a>This constant fourth difference indicated that the sequence can be explained by a quartic function, whose form was then easy to calculate:</p>
<p style="text-align: left"><a href="http://danieljlewis.org/files/2010/07/CodeCogsEqn4.gif"><img class="aligncenter size-full wp-image-376" title="CodeCogsEqn(4)" src="http://danieljlewis.org/files/2010/07/CodeCogsEqn4.gif" alt="" width="556" height="22" /></a></p>
<p style="text-align: left">Sadly not one of those classically beautiful equations.</p>
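<p style="text-align: left">For anyone wanting to reproduce the sequence, here is a minimal sketch, assuming a household is an unordered selection of the 5 person classes with repeats allowed. Under that assumption the count for a household of size n is the binomial coefficient C(n + 4, 4), which expands to a quartic in n. The function names are hypothetical, and the expanded form below may arrange its terms differently from the equation pictured above.</p>

```python
from math import comb

N_CLASSES = 5  # the five person classes used in this post


def n_household_types(size, n_classes=N_CLASSES):
    """Number of unordered selections of `size` people from `n_classes` classes."""
    return comb(size + n_classes - 1, n_classes - 1)


def differences(seq):
    """Successive differences of a sequence."""
    return [b - a for a, b in zip(seq, seq[1:])]


def quartic(n):
    """Expanded form of (n + 1)(n + 2)(n + 3)(n + 4) / 24."""
    return (n**4 + 10 * n**3 + 35 * n**2 + 50 * n + 24) // 24


counts = [n_household_types(n) for n in range(1, 11)]
print(counts)

# Difference the sequence four times; the final row is constant.
row = counts
for _ in range(4):
    row = differences(row)
print(row)

# The closed form agrees with the direct count for sizes 1 to 10.
print(all(quartic(n) == n_household_types(n) for n in range(1, 11)))
```

<p style="text-align: left">The counts run from 5 types for a single person household to 1001 for a household of ten, which is why anything beyond a handful of people quickly becomes unwieldy to classify by hand.</p>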
<p style="text-align: left">This all leads to the issue of how I now classify households; clearly the number of possible sets makes anything above around 4 people per household fairly intractable. I&#8217;ll experiment with 3 person households and see whether I can account for most household types with a few set patterns and then look at households that fall outside of this remit.</p>
<p style="text-align: left">Interesting none the less, I hadn&#8217;t expected to be doing much of this kind of maths!</p>
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2010/07/15/household-types-combinatorial-problems-and-pure-maths/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Computing the geometric median in Python</title>
		<link>http://danieljlewis.org/2010/07/09/computing-the-geometric-median-in-python/</link>
		<comments>http://danieljlewis.org/2010/07/09/computing-the-geometric-median-in-python/#comments</comments>
		<pubDate>Fri, 09 Jul 2010 10:17:56 +0000</pubDate>
		<dc:creator>Dan</dc:creator>
				<category><![CDATA[Modeling]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[allocation]]></category>
		<category><![CDATA[dijkstra]]></category>
		<category><![CDATA[geometric]]></category>
		<category><![CDATA[location]]></category>
		<category><![CDATA[matplotlib]]></category>
		<category><![CDATA[median]]></category>
		<category><![CDATA[optimisation]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[service]]></category>

		<guid isPermaLink="false">http://danieljlewis.org/?p=362</guid>
		<description><![CDATA[I noticed in a beta of ArcGIS 10 (then called 9.4) that there was a &#8216;new&#8217; option for computing a Geometric Median which didn&#8217;t exist in my copy of ArcGIS 9.3. This is an interesting concept. As with the mean in 1d statistics, the mean centre of a 2d point pattern is easy to calculate, being the average of all the X [...]]]></description>
			<content:encoded><![CDATA[<p>I noticed in a beta of ArcGIS 10 (then called 9.4) that there was a &#8216;new&#8217; option for computing a Geometric Median which didn&#8217;t exist in my copy of ArcGIS 9.3. This is an interesting concept. As with the mean in 1d statistics, the mean centre of a 2d point pattern is easy to calculate, being the average of all the X coords and all the Y coords. From stats we know that the Mean and Median value of a distribution will coincide if the data is perfectly normally distributed; however in the real world data usually will only approximate a normal distribution, leading to a mean value that is different from the midpoint, or median. Therefore for a skewed distribution on the plane, we encounter a situation in which the mean is not necessarily the best representation of the &#8216;centre&#8217; of the data, thus we may wish to calculate the median; doing so will also give us a good idea of the direction of the skew of the point pattern we are investigating. In calculating the median of a 2d point pattern we can express the problem as a need to:</p>
<p><em> minimise the sum of distances from all points in a distribution to a centre.</em></p>
<p>Thus it is reasonably clear that we are dealing with an &#8216;optimisation problem&#8217;, something that I have experimented with before in work I conducted using the &#8216;transportation problem&#8217;, a classic linear programming problem.</p>
<p>In terms of application, I thought that finding the median of a distribution of people around a service would be a useful, albeit basic, indication of whether all people were making a similar trip to a service, or whether there were other effects at work (this would be evidenced by a median centre that was not close to the actual service location). I thought I would be able to code the optimisation routine in Python using pre-existing insight. Notably, the <a title="Geometric Median" href="http://en.wikipedia.org/wiki/Geometric_median" target="_blank">wikipedia page</a> on this details the Weiszfeld Algorithm as the acknowledged computational solution to the geometric median problem; it takes the form:</p>
<p><a href="http://danieljlewis.org/files/2010/07/weiszfeld.png"><img class="aligncenter size-full wp-image-363" title="weiszfeld" src="http://danieljlewis.org/files/2010/07/weiszfeld.png" alt="" width="368" height="61" /></a>However, actually writing the algorithm proved somewhat tough. Essentially the answer is to start with a candidate point (I started with the mean centre) and calculate the objective function &#8211; in this case the sum of the Euclidean distances of all points from the candidate centre. Then pass the candidate point through the Weiszfeld Algorithm and reassess the objective function; once the objective function converges, a median has been found. Because the sum of distances is convex, the median is unique unless all the points lie on a single line, although the basic iteration can stall if a candidate lands exactly on a data point. Below is a solution for some of my data (the data has been randomly offset by 75m to preserve anonymity) on patient registrations to a doctor.</p>
<p style="text-align: center"><a href="http://danieljlewis.org/files/2010/07/geomedian.png"><img class="aligncenter size-large wp-image-365" title="geomedian" src="http://danieljlewis.org/files/2010/07/geomedian-1024x742.png" alt="" width="574" height="415" /></a>Here we can see that the mean and median centres are slightly different, suggesting that the patient population is skewed slightly northwards, most likely as a result of discontinuous urban infrastructure.</p>
<p style="text-align: left">The scatterplot was achieved using the <a title="MatPlotLib @ Sourceforge" href="http://matplotlib.sourceforge.net/index.html" target="_blank">matplotlib</a> Python plotting library. This was just a test, but I imagine more complex outputs can be achieved reasonably easily.</p>
<p style="text-align: left">Notably, this technique uses Euclidean distance, which in a dense urban environment may be misleading. I note that there is a relatively simple implementation of the <a title="Python Dijkstra" href="http://code.activestate.com/recipes/119466-dijkstras-algorithm-for-shortest-paths/" target="_blank">Dijkstra algorithm for shortest paths in Python</a>, and I am curious whether this could be integrated to find a geometric median on the network. I suspect that it may be unworkable for large problems due to computation time, although for smaller problems it might be ok.</p>
<p style="text-align: left">Naturally there are algorithms that can calculate a solution to the above for <em>p</em>-medians (i.e. several service centres in a population, commonly known as location-allocation). It is something that <a title="Paul Densham" href="http://www.geog.ucl.ac.uk/~pdensham/s_t_paper.html" target="_blank">Paul Densham</a> at UCL has worked on, and his code is making a return to service in ArcGIS version 10. I&#8217;m looking forward to seeing it, as it is a very difficult problem to solve (and in fact already has been &#8216;solved&#8217;), and not one I intend to investigate!</p>
<p style="text-align: left">My code for the geometric median is <a href="http://danieljlewis.org/files/2010/07/geomedian.pdf">here.</a></p>
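<p style="text-align: left">Separately from the linked script, the iteration described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the code used for the figure: the function names, tolerance and iteration cap are my own choices, and a candidate that coincides with a data point is handled by simply skipping that point, which is a simplification.</p>

```python
from math import hypot


def mean_centre(points):
    """Average of the x coordinates and of the y coordinates."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def total_distance(centre, points):
    """Objective function: sum of Euclidean distances to the centre."""
    cx, cy = centre
    return sum(hypot(x - cx, y - cy) for x, y in points)


def weiszfeld_step(centre, points):
    """One Weiszfeld update: a mean of the points weighted by inverse distance."""
    cx, cy = centre
    sum_w = sum_wx = sum_wy = 0.0
    for x, y in points:
        d = hypot(x - cx, y - cy)
        if d == 0.0:
            continue  # simplification: skip a coincident point (avoids 1/0)
        w = 1.0 / d
        sum_w += w
        sum_wx += w * x
        sum_wy += w * y
    if sum_w == 0.0:
        return centre  # every point sits on the candidate
    return (sum_wx / sum_w, sum_wy / sum_w)


def geometric_median(points, tol=1e-9, max_iter=1000):
    """Start at the mean centre and iterate until the objective stops changing."""
    centre = mean_centre(points)
    objective = total_distance(centre, points)
    for _ in range(max_iter):
        candidate = weiszfeld_step(centre, points)
        new_objective = total_distance(candidate, points)
        change = abs(objective - new_objective)
        centre, objective = candidate, new_objective
        if tol >= change:
            break
    return centre


# Hypothetical points: three clustered near the origin and one outlier.
points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (10.0, 10.0)]
mean = mean_centre(points)
median = geometric_median(points)
print(mean, median)
```

<p style="text-align: left">For these four made-up points the mean centre is (3, 3), pulled towards the outlier, whereas the median should settle close to (1, 1), where the diagonals of the quadrilateral cross.</p>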
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2010/07/09/computing-the-geometric-median-in-python/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Distribution of Household Occupancy in Southwark</title>
		<link>http://danieljlewis.org/2010/06/09/distribution-of-household-occupancy-in-southwark/</link>
		<comments>http://danieljlewis.org/2010/06/09/distribution-of-household-occupancy-in-southwark/#comments</comments>
		<pubDate>Wed, 09 Jun 2010 14:19:05 +0000</pubDate>
		<dc:creator>Dan</dc:creator>
				<category><![CDATA[Geography]]></category>
		<category><![CDATA[Southwark]]></category>
		<category><![CDATA[distribution]]></category>
		<category><![CDATA[exponential decay]]></category>
		<category><![CDATA[households]]></category>
		<category><![CDATA[log]]></category>
		<category><![CDATA[social]]></category>

		<guid isPermaLink="false">http://danieljlewis.org/?p=355</guid>
		<description><![CDATA[I&#8217;ve been doing some more analysis on the Southwark GP patient register at the household level. After a fair amount of cleaning and interpretation I&#8217;ve arrived at the following distribution of households.

There are a number of interesting things to say about this data, not least in the section that I&#8217;ve marked &#8216;larger social groupings&#8217; as [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been doing some more analysis on the Southwark GP patient register at the household level. After a fair amount of cleaning and interpretation I&#8217;ve arrived at the following distribution of households.</p>
<p style="text-align: center"><a href="http://danieljlewis.org/files/2010/06/HHDistAnnotate.png"><img class="aligncenter size-full wp-image-356" title="HHDistAnnotate" src="http://danieljlewis.org/files/2010/06/HHDistAnnotate.png" alt="" width="578" height="380" /></a></p>
<p style="text-align: left">There are a number of interesting things to say about this data, not least in the section that I&#8217;ve marked &#8216;larger social groupings&#8217;. It seems to suggest a possible migrant social network effect, as the larger household groupings tend to be of minority ethnic groups, including Nigerians and other Africans, Hispanics and South-East Asians who are perhaps using cross-country social ties as help in getting established when first arriving in the UK. However, visually the shape of the distribution of household occupancy is very distinctive, and actually is very close to an exponential. Here I&#8217;ve taken the log of frequency of occurrence and plotted the best-fit line through the plot:</p>
<p style="text-align: left"><a href="http://danieljlewis.org/files/2010/06/LogHHDist.png"><img class="aligncenter size-large wp-image-358" title="LogHHDist" src="http://danieljlewis.org/files/2010/06/LogHHDist-1024x682.png" alt="" width="574" height="382" /></a>This linear trend means that the model <strong>log(y) = -0.1635x + 4.602 </strong>is a good predictor of the number of Households we can expect to exist in Southwark for a given value of x, or occupancy.</p>
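<p style="text-align: left">As a sketch of how a fit like this can be produced, the snippet below runs an ordinary least-squares line through log frequency against occupancy. Two assumptions are mine and not stated in this post: that the log is base 10, and that a plain least-squares fit was used. The frequencies are synthetic values generated from the quoted model purely so the example runs; they are not the Southwark counts, and the function names are hypothetical.</p>

```python
from math import log10


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_households(occupancy, slope, intercept):
    """Undo the (assumed base-10) log to get a household count."""
    return 10 ** (slope * occupancy + intercept)


# Synthetic frequencies generated from the quoted model (not the real data).
occupancies = list(range(1, 11))
frequencies = [10 ** (-0.1635 * x + 4.602) for x in occupancies]

slope, intercept = fit_line(occupancies, [log10(f) for f in frequencies])
print(round(slope, 4), round(intercept, 3))
print(round(predict_households(2, slope, intercept)))
```

<p style="text-align: left">If the log is indeed base 10, the slope implies that each additional occupant multiplies the expected number of households by about 0.69.</p>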
<p style="text-align: left">It is not entirely clear, however, why this is the case. Firstly, it may just be an artifact of the data, either of the matching process that has occurred between the patient register and OS AddressLayer2, the way that GPs encode patient addresses in the first place, or the fact that the patient register is only a sample of the total population of Southwark, i.e. those people who register with a doctor. Secondly, it may simply be a reflection of the structure of the built environment in Southwark &#8211; i.e. what kind of housing is actually available. However, the distribution is also subject to the choices of individuals or groups.</p>
<p style="text-align: left">Currently, I am in the process of disaggregating the above characteristics and looking at trends by different population groups.</p>
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2010/06/09/distribution-of-household-occupancy-in-southwark/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Review of Elementary Statistics for Geographers - Burt et al.</title>
		<link>http://danieljlewis.org/2010/06/07/review-of-elementary-statistics-for-geographers-bert-et-al/</link>
		<comments>http://danieljlewis.org/2010/06/07/review-of-elementary-statistics-for-geographers-bert-et-al/#comments</comments>
		<pubDate>Mon, 07 Jun 2010 16:07:13 +0000</pubDate>
		<dc:creator>Dan</dc:creator>
				<category><![CDATA[Geography]]></category>
		<category><![CDATA[review]]></category>
		<category><![CDATA[RSS]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://danieljlewis.org/?p=351</guid>
		<description><![CDATA[A review I authored of Burt, Barber and Rigby&#8217;s &#8220;Elementary Statistics for Geographers&#8221; third edition, has made it into the Journal of the Royal Statistical Society Series A: Statistics in Society. The book is a truly excellent collection of statistical methods themed explicitly for use by geographers and spatial scientists, moreover the explanation and presentation [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: left"><a href="http://danieljlewis.org/files/2010/06/geogstats.jpg"><img class="size-full wp-image-352 alignleft" title="geogstats" src="http://danieljlewis.org/files/2010/06/geogstats.jpg" alt="" width="185" height="240" /></a>A review I authored of Burt, Barber and Rigby&#8217;s &#8220;<em>Elementary Statistics for Geographers</em>&#8221; third edition has made it into the Journal of the Royal Statistical Society Series A: Statistics in Society. The book is a truly excellent collection of statistical methods themed explicitly for use by geographers and spatial scientists; moreover, the explanation and presentation are superb. This has become a core book for me and my colleague <a title="James' blog" href="http://spatialanalysis.co.uk/" target="_blank">James Cheshire</a> as we continue along the route of our PhD studies. I have said much the same thing in my review, accessible <a title="RSS A: Statistics in Society" href="http://www3.interscience.wiley.com/journal/123305751/abstract" target="_blank">here.</a></p>
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2010/06/07/review-of-elementary-statistics-for-geographers-bert-et-al/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Jenks&#8217; Natural Breaks Algorithm in Python</title>
		<link>http://danieljlewis.org/2010/06/07/jenks-natural-breaks-algorithm-in-python/</link>
		<comments>http://danieljlewis.org/2010/06/07/jenks-natural-breaks-algorithm-in-python/#comments</comments>
		<pubDate>Mon, 07 Jun 2010 15:53:27 +0000</pubDate>
		<dc:creator>Dan</dc:creator>
				<category><![CDATA[Modeling]]></category>
		<category><![CDATA[choropleth]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[Jenks]]></category>
		<category><![CDATA[optimisation]]></category>
		<category><![CDATA[visualisation]]></category>

		<guid isPermaLink="false">http://danieljlewis.org/?p=347</guid>
		<description><![CDATA[The Jenks Optimal, or Jenks&#8217; Natural Breaks, Algorithm is a common method for classifying data presented in a choropleth map. It aims to present a series of break values that best represent the actual breaks observed in the data as opposed to some arbitrary classificatory scheme (i.e. equal interval), in this way the actual clustering [...]]]></description>
			<content:encoded><![CDATA[<p>The Jenks Optimal, or Jenks&#8217; Natural Breaks, Algorithm is a common method for classifying data presented in a choropleth map. It aims to present a series of break values that best represent the actual breaks observed in the data, as opposed to some arbitrary classificatory scheme (e.g. equal interval); in this way the actual clustering of data values is preserved (subject to the arbitrary specification of <em>k </em>classes). The method was originally published in George Jenks&#8217; (1977) <em>Optimal Data Classification for Choropleth Maps</em> and reportedly represented the culmination of 15 years of research on the topic, the method being primarily derived from Walter Fisher&#8217;s work &#8216;<em>On grouping for maximum homogeneity</em>&#8217;. The algorithm aims to create <em>k </em>classes so that the variance within groups is minimised; as such it is a problem of numerical optimisation.</p>
<p>A paper by Michael Coulson (1987) entitled <em>In The Matter Of Class Intervals For Choropleth Maps: With Particular Reference To The Work Of George F Jenks </em>details a method that Jenks apparently authored, but never published, to assess how good the chosen number of classes is. The method, Goodness of Variance Fit (GVF), works by taking the difference between the sum of squared deviations from the array mean (SDAM) and the sum of squared deviations from the class means (SDCM), and dividing by the SDAM. Thus:</p>
<p style="text-align: center">GVF = (SDAM &#8211; SDCM)/SDAM</p>
<p style="text-align: left">However, it is likely this was never published because the GVF improves as the number of classes increases until, when there are as many classes as data points, the GVF reaches unity. Nonetheless, I have included a rudimentary example for calculating this statistic. In reality, this method is used to generalise data into a few classes for visualisation, so you are unlikely to be using more than 7 (+/- 2) classes; the number of classes can be loosely assigned by looking at the distribution histogram, but often this is difficult.</p>
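<p style="text-align: left">To make the formula concrete, here is a minimal sketch of the GVF calculation for data that has already been split into classes. It is separate from the linked script and does not find the breaks itself; the function names and the data values are made up for the example.</p>

```python
def squared_deviations(values):
    """Sum of squared deviations of `values` from their own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def goodness_of_variance_fit(classes):
    """GVF = (SDAM - SDCM) / SDAM for a list of classes (each a list of values)."""
    all_values = [v for cls in classes for v in cls]
    sdam = squared_deviations(all_values)  # deviations from the array mean
    sdcm = sum(squared_deviations(cls) for cls in classes)  # from class means
    return (sdam - sdcm) / sdam


# Hypothetical data values, grouped two different ways.
two_classes = [[1, 2, 3], [10, 11, 12]]
one_per_value = [[1], [2], [3], [10], [11], [12]]

gvf_two = goodness_of_variance_fit(two_classes)
gvf_all = goodness_of_variance_fit(one_per_value)
print(round(gvf_two, 3), gvf_all)
```

<p style="text-align: left">The second grouping gives every value its own class, so the SDCM is zero and the GVF is exactly 1, which illustrates why the statistic on its own cannot tell you how many classes to use.</p>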
<p style="text-align: left">The script is <a href="http://danieljlewis.org/files/2010/06/Jenks.pdf">here.</a></p>
<p style="text-align: left">Acknowledgement: The initial script I used for the Python conversion can be found (in JAVA and Fortran) here: https://stat.ethz.ch/pipermail/r-sig-geo/2006-March/000811.html</p>
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2010/06/07/jenks-natural-breaks-algorithm-in-python/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Review of Rethinking Maps by Dodge, Kitchin and Perkins in EPB</title>
		<link>http://danieljlewis.org/2010/06/03/review-of-rethinking-maps-by-dodge-kitchin-and-perkins-in-epb/</link>
		<comments>http://danieljlewis.org/2010/06/03/review-of-rethinking-maps-by-dodge-kitchin-and-perkins-in-epb/#comments</comments>
		<pubDate>Thu, 03 Jun 2010 15:13:30 +0000</pubDate>
		<dc:creator>Dan</dc:creator>
				<category><![CDATA[Cartography]]></category>
		<category><![CDATA[Critical GIS]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Dodge]]></category>
		<category><![CDATA[EPB]]></category>
		<category><![CDATA[Kitchin]]></category>
		<category><![CDATA[Perkins]]></category>
		<category><![CDATA[rethinking maps]]></category>
		<category><![CDATA[review]]></category>

		<guid isPermaLink="false">http://danieljlewis.org/?p=340</guid>
		<description><![CDATA[This month has seen the publication of my review of &#8220;Rethinking Maps: New Frontiers in Cartographic Theory&#8221;, edited by Martin Dodge, Rob Kitchin and Chris Perkins, in Environment and Planning B.
The review begins thus:
This collection of essays marks a milestone of scholarship in critical cartography, a discourse most notably augered by the seminal work of [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://danieljlewis.org/files/2010/06/rethinkingmaps.png"><img class="size-large wp-image-341 alignleft" title="rethinkingmaps" src="http://danieljlewis.org/files/2010/06/rethinkingmaps-656x1024.png" alt="" width="236" height="368" /></a>This month has seen the publication of my review of &#8220;Rethinking Maps: New Frontiers in Cartographic Theory&#8221;, edited by Martin Dodge, Rob Kitchin and Chris Perkins, in Environment and Planning B.</p>
<p>The review begins thus:</p>
<address>This collection of essays marks a milestone of scholarship in critical cartography, a discourse most notably augered by the seminal work of John B Harley collected in The New Nature of Maps (2001). This collection moves forward from Harley and provides a timely summation and spur for future research in maps and mapping. In the final chapter of this edited book, a chapter subtitled &#8220;A manifesto for map studies&#8221;, Martin Dodge, Chris Perkins, and Rob Kitchin make clear that: &#8220;It is, we would argue, a stimulating time for mapping scholarship with many challenges and opportunities opening up: no single epistemological position now dominates interpretation&#8221; (page 229).</address>
<address> </address>
<p>For more see the <a title="EPB Reviews" href="http://www.envplan.com/abstract.cgi?id=b3703rvw" target="_blank">full review</a>. Apologies if you aren&#8217;t a subscriber to the journal; I suspect I can&#8217;t post the full text here.</p>
<p>A proof of the first chapter, courtesy of Martin Dodge, is available <a title="Chapter1" href="http://personalpages.manchester.ac.uk/staff/m.dodge/rethinking_maps_paper_pageproofs.pdf">here.</a></p>
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2010/06/03/review-of-rethinking-maps-by-dodge-kitchin-and-perkins-in-epb/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>UK OAC map in Python</title>
		<link>http://danieljlewis.org/2010/06/02/uk-oac-map-in-python/</link>
		<comments>http://danieljlewis.org/2010/06/02/uk-oac-map-in-python/#comments</comments>
		<pubDate>Wed, 02 Jun 2010 11:05:57 +0000</pubDate>
		<dc:creator>Dan</dc:creator>
				<category><![CDATA[Cartography]]></category>
		<category><![CDATA[GIS]]></category>
		<category><![CDATA[Representation]]></category>
		<category><![CDATA[map]]></category>
		<category><![CDATA[OAC]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[shapely]]></category>
		<category><![CDATA[UK]]></category>

		<guid isPermaLink="false">http://danieljlewis.org/?p=336</guid>
		<description><![CDATA[Here is a quick confirmation that you can use Python to draw very detailed maps; using the previously specified method I was unable to get Python to draw all UK OAs due to their great number (c.220,000) and high complexity (c.50,000,000 vertices). Additionally I was unable to use the generalised OA boundaries for the UK [...]]]></description>
			<content:encoded><![CDATA[<p>Here is a quick confirmation that you can use Python to draw very detailed maps; using the previously specified method I was unable to get Python to draw all UK OAs due to their great number (c.220,000) and high complexity (c.50,000,000 vertices). Additionally I was unable to use the generalised OA boundaries for the UK from UKBorders as they contain topological errors that the shapefile reader cannot deal with. ArcGIS is evidently more tolerant in how it handles bad topologies. So I extracted all the vertices, fed them into shapely polygons and visualised them in the same way, without reading the shapefiles directly into Python, and was able to output this:</p>
<p style="text-align: left"><a href="http://danieljlewis.org/files/2010/06/UKOAC.png"><img class="aligncenter size-large wp-image-337" title="UKOAC" src="http://danieljlewis.org/files/2010/06/UKOAC-640x1024.png" alt="" width="576" height="922" /></a>This method has had an impact on the speed of computation, as it can take roughly 25 minutes to output this map. The map looks pretty good, aside from a slightly odd polygon in the Bristol Channel. Nevertheless, coupled with the operations that shapely and other geo-libraries can perform, this is an increasing indication of the maturity of GIS on a variety of platforms. Oh, and it&#8217;s all free!</p>
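<p>As a rough illustration of the vertex-extraction step, here is a minimal sketch that groups exported vertices into one coordinate list per OA using only the standard library. The CSV layout, the column names (oa_code, x, y) and the sample codes are assumptions made for this example, not the format I actually exported.</p>
<pre>import csv
import io
from collections import OrderedDict

# Hypothetical export: one row per vertex, in drawing order.
# Column names and OA codes are placeholders for this sketch.
SAMPLE = """oa_code,x,y
OA1,0.0,0.0
OA1,10.0,0.0
OA1,10.0,10.0
OA2,20.0,20.0
OA2,30.0,20.0
OA2,30.0,30.0
"""

def rings_from_vertices(handle):
    """Group vertex rows into one list of (x, y) tuples per OA code."""
    rings = OrderedDict()
    for row in csv.DictReader(handle):
        xy = (float(row["x"]), float(row["y"]))
        rings.setdefault(row["oa_code"], []).append(xy)
    return rings

rings = rings_from_vertices(io.StringIO(SAMPLE))
for code, coords in rings.items():
    # Each coordinate list can then be handed to shapely as Polygon(coords)
    print(code, len(coords))
</pre>
<p>Note that this sketch assumes a single ring per OA; multi-part OAs (islands, for instance) would need a part identifier in the export as well.</p>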
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2010/06/02/uk-oac-map-in-python/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>&#8216;Locally led&#8217; NHS Service changes dubious</title>
		<link>http://danieljlewis.org/2010/06/01/locally-led-nhs-service-changes-dubious/</link>
		<comments>http://danieljlewis.org/2010/06/01/locally-led-nhs-service-changes-dubious/#comments</comments>
		<pubDate>Tue, 01 Jun 2010 14:17:28 +0000</pubDate>
		<dc:creator>Dan</dc:creator>
				<category><![CDATA[Health Geography]]></category>
		<category><![CDATA[Southwark]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[community]]></category>
		<category><![CDATA[health]]></category>
		<category><![CDATA[Lansley]]></category>
		<category><![CDATA[local]]></category>
		<category><![CDATA[provision]]></category>
		<category><![CDATA[service]]></category>

		<guid isPermaLink="false">http://danieljlewis.org/?p=330</guid>
		<description><![CDATA[Since coming to government, new Conservative Health Secretary Andrew Lansley has sought to fulfil the pledge he made to put an end to local restructurings of NHS service delivery by authorities higher up the NHS hierarchy. Ostensibly he believes that local decision-making will have a better overall effect on the quality of outcomes for patients [...]]]></description>
			<content:encoded><![CDATA[<p>Since coming to government, new Conservative Health Secretary Andrew Lansley has sought to fulfil the pledge he made to put an end to local restructurings of NHS service delivery by authorities higher up the NHS hierarchy. Ostensibly he believes that local decision-making will have a better overall effect on the quality of outcomes for patients and hence lead to a better health service. Specifically he wants to provide GPs with an opportunity to work with community leaders and their local authorities to steer local services. The core elements actually do not differ greatly from the outgoing Labour policies, particularly with respect to patient choice; however, I will argue that there is a clear danger in relying too heavily on a purely &#8216;local&#8217; approach. In general there seems to be something of a misconception in Government, particularly in the provision of local services (e.g. schools), that local approaches are somehow &#8216;better&#8217;.</p>
<p>Firstly, let us consider something that the Government seems to do without fail, something that I, as a Geographer, find to be a grave sin of omission. That is the apparently indiscriminate use of spatial qualifiers without so much as an explanation of their meaning. The terms &#8216;local&#8217; and &#8216;community&#8217; are spectacularly misleading without qualification, and yet they are often used because people seem to think they understand what is meant by them &#8211; everyone considers themselves part of a community, and local to a service &#8211; but will these personal feelings about their socio-spatial connections actually translate into the ability to have input on healthcare decision making? My investigation of access and registration of patients to GPs in Southwark has shown that a) primary care is a very location-based service, and without fail each doctor exhibits a characteristic distance decay function that describes the pattern of registration with a GP, subject to some socio-economic criteria, but also that b) patients overlap to a large extent in a densely-populated urban context, the suggestion being that activity-spaces (i.e. retail areas, workplaces and schools) have a distorting effect on patterns of registration for some people. To this end I suggest that a &#8216;community&#8217; can be defined independently for individual GPs based upon the patterns of patient uptake unique to that service, although there may be some strong correlations with the residential, workplace, educational etc. communities that overlap it (of course, for some GPs the profile of the registered community may diverge greatly from that of the observed local community, defined by proximity to the GP). The following map is an example of this kind of complexity:</p>
<p style="text-align: left"><a href="http://danieljlewis.org/files/2010/06/GPRegSwk.jpg"><img class="aligncenter size-full wp-image-332" title="GPRegSwk" src="http://danieljlewis.org/files/2010/06/GPRegSwk.jpg" alt="" width="420" height="705" /></a></p>
<p style="text-align: left">Here it is clear that any definition of locality or community based upon an arbitrary areal basis yields groups of people who could be registered to as many as 29 different Southwark GPs in only a very small area. This is in fact a very good, simple illustration of patient choice in action. There are a lot of questions to ask Mr Lansley about how he views &#8216;local&#8217; or &#8216;community&#8217;, and whether he is willing to enshrine that definition in policy before we actually consent to doing anything with provision of services.</p>
<p style="text-align: left">Further still, I have claimed that GPs are very much location-based services, and they are: beyond a certain distance (in Southwark this is about 6&#8211;10km) no one is registered with a given GP, choosing instead a closer service. In many ways this was constrained by the pre-existing system of &#8216;catchment areas&#8217;; however, these were set to be removed by the end of the year in the quest for patient choice, which opens up registration to people using doctors near their place of work (for instance) rather than near their home. Should these people then have a say in the provision of services in an area in which they do not live? They are part of the GP&#8217;s &#8216;community&#8217; but not of the residential one. A good illustration of this is actually the polyclinic system. Southwark is geared up to introduce 3 polyclinics: one which already exists as a large GP-led health centre in the centre of the borough, and two in the north connected to hospitals. The biggest difficulty faced at the moment is in estimating the daytime population (i.e. transient workforce) of the Southbank in order to account for likely polyclinic usage &#8211; a huge number of people who do not live in Southwark but will likely have some part of their healthcare provided by Southwark PCT.</p>
<p style="text-align: left">It is also unclear what Mr Lansley refers to when he talks about &#8216;top-down&#8217;: is it the Strategic Health Authorities and the DoH itself? It cannot be the PCTs, as Mr Lansley claims that the new criteria will have the support of &#8216;GP commissioners&#8217; and it is the PCTs that actually do the commissioning. Further, the idea of GPs working with local authorities is largely the same as GPs working with PCTs now, as PCTs and LAs are generally coterminous.</p>
<p style="text-align: left">Whilst it is pleasing to see a politician quoting the need for an evidence-based approach to restructuring, it is unclear what evidence he might base GP quality on. The current payment method (QoF) is based on GP reporting of pre-specified target outcomes to a centralised authority; surely GPs will simply follow these directives in order to bring in as much money as possible. Indeed, it is strongly recommended that these stats are not used as measures of GP quality, as they are by-and-large patchy in what they cover and include little demographic data. Moreover, had the previous government not already cut the NHS IT initiative that would have made reporting of outcomes actually feasible nationally, the new government would no doubt have cut it anyway.</p>
<p style="text-align: left">The final worry I have is one of equity, something upon which the NHS is founded &#8211; the provision of a fair service contingent on need, free at the point of service. Surely such an atomistic approach to healthcare provision as Mr Lansley seems to specify is liable to deepen the perceived &#8216;social gradient&#8217; in health care, as without a careful (top-down) hand, the GPs and communities best-equipped to play an active role in orchestrating GP services will get increasingly better provision, and these are most likely to be in the wealthier areas of the country. There needs to be at least some form of national accountability for a national health service.</p>
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2010/06/01/locally-led-nhs-service-changes-dubious/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>More Thematic Maps in Python &#8211; shapely and descartes</title>
		<link>http://danieljlewis.org/2010/05/27/more-thematic-maps-in-python-shapely-and-descartes/</link>
		<comments>http://danieljlewis.org/2010/05/27/more-thematic-maps-in-python-shapely-and-descartes/#comments</comments>
		<pubDate>Thu, 27 May 2010 16:58:14 +0000</pubDate>
		<dc:creator>Dan</dc:creator>
				<category><![CDATA[Representation]]></category>
		<category><![CDATA[descartes]]></category>
		<category><![CDATA[matplotlib]]></category>
		<category><![CDATA[OAC]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[shapely]]></category>
		<category><![CDATA[Wales]]></category>

		<guid isPermaLink="false">http://danieljlewis.org/?p=326</guid>
		<description><![CDATA[Thanks to Sean Gillies for commenting on my last post, he put me onto a couple of Python packages that he&#8217;s been involved in creating that allow you to do some really excellent geospatial things. The shapely package is a great implementation of a lot of spatial analyses that you can do on projected (i.e. [...]]]></description>
			<content:encoded><![CDATA[<p>Thanks to <a title="Sean Gillies Homepage" href="http://sgillies.net/" target="_blank">Sean Gillies</a> for commenting on my last post; he put me onto a couple of Python packages that he&#8217;s been involved in creating that allow you to do some really excellent geospatial things. The <a title="shapely" href="http://trac.gispython.org/lab/wiki/Shapely" target="_blank">shapely</a> package is a great implementation of a lot of spatial analyses that you can do on projected (i.e. flattened) datasets, including topological operations and a full set of object types. The <a title="Descartes package" href="http://pypi.python.org/pypi/descartes/1.0" target="_blank">descartes</a> package allows better integration of matplotlib with spatial data, particularly in that you no longer have to call the &#8220;fill&#8221; plotting function repeatedly, but instead create a more efficient set of &#8220;patches&#8221; which can then be added to the figure plot. The overall impression I got from descartes is that it wasn&#8217;t spectacularly different from the method detailed in my previous post, but it gives you more control and stability over the map plotting process; whereas using raw matplotlib you are inclined to hope that the map outputs correctly (it all seems a bit up to chance), using descartes you have a more robust and easily manipulable output.</p>
<p>In order to test this I rewrote my previous thematic map script to: firstly convert the shapefile geometries into shapely polygons, and secondly to pass those shapely polygons to descartes and draw a map plot using descartes-matplotlib. The only slightly odd piece of functionality that I found was that you can&#8217;t pass the shapely polygon object a list of shapely points in order to create the polygon, rather you have to pass a list of x,y tuples &#8211; much less satisfying!</p>
<p>Nonetheless, the changes were easy to implement, and with the previous script as given basically include:</p>
<pre>from shapely.geometry import Polygon

points = []
for i in range(0,<em>number of points in shapefile</em>):
 tempx = float(<em>x coord of point in shapefile polygon</em>)
 tempy = float(<em>y coord of point in shapefile polygon</em>)

 points.append((tempx,tempy))
polygon = Polygon(points)
</pre>
<p>The above method creates a simple polygon without holes; shapely can accommodate holes if need be though. Having created the shapely polygons, all that remains is to create a patch.</p>
<pre>from descartes import PolygonPatch

patch = PolygonPatch(polygon, <em>plus colour and line considerations</em>)
</pre>
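<p>The colour and line considerations can be kept in one place by building the keyword arguments separately. The following is only a sketch: the palette values are placeholders rather than an official OAC colour scheme, and patch_style is a hypothetical helper name. The keys fc, ec and linewidth are standard matplotlib patch arguments, which PolygonPatch passes through.</p>
<pre># Hypothetical palette: one fill colour per OAC supergroup (1 to 7).
# The hex values are placeholders, not an official OAC colour scheme.
SUPERGROUP_COLOURS = {
    1: "#1f78b4",
    2: "#33a02c",
    3: "#e31a1c",
    4: "#ff7f00",
    5: "#6a3d9a",
    6: "#b15928",
    7: "#a6cee3",
}
MISSING_COLOUR = "#cccccc"  # used when an OA has no classification

def patch_style(supergroup, outline=False):
    """Return matplotlib patch keyword arguments for one OA polygon."""
    fill = SUPERGROUP_COLOURS.get(supergroup, MISSING_COLOUR)
    return {
        "fc": fill,
        # Matching the edge to the fill hides boundaries on dense maps
        "ec": "#000000" if outline else fill,
        "linewidth": 0.2,
    }

style = patch_style(3)
# With descartes installed this would be used as:
#   patch = PolygonPatch(polygon, **style)
print(style)
</pre>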
<p>Then you simply add the patch to the matplotlib figure you have already created so:</p>
<pre>from matplotlib import pyplot

fig = pyplot.figure(1, figsize = [10,10], dpi = 300)   #create 10x10 figure
ax = fig.add_subplot(111)    #Add the map frame (single plot)

# here you create all the polygons and patches

ax.add_patch(patch)   # simply add the patch to the subplot
# set plot vars
ax.set_xlim(<em>get xmin and xmax values from data</em>)
ax.set_ylim(<em>get ymin and ymax values from data</em>)
ax.set_aspect(1)

pyplot.show()
</pre>
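<p>For the axis limits, one possible approach is to take the minimum and maximum coordinates across all the polygon rings. The helper below is a sketch with a hypothetical name (map_extent) and made-up sample coordinates; shapely geometries also expose a bounds attribute, (minx, miny, maxx, maxy), which could be used per polygon instead.</p>
<pre>def map_extent(rings, margin=0.02):
    """Return ((xmin, xmax), (ymin, ymax)) padded by a fractional margin."""
    xs = [x for ring in rings for x, _ in ring]
    ys = [y for ring in rings for _, y in ring]
    xmin, xmax = min(xs), max(xs)
    ymin, ymax = min(ys), max(ys)
    pad_x = (xmax - xmin) * margin
    pad_y = (ymax - ymin) * margin
    return (xmin - pad_x, xmax + pad_x), (ymin - pad_y, ymax + pad_y)

# Two made-up rings standing in for the coordinate lists of real polygons
rings = [
    [(0.0, 0.0), (100.0, 0.0), (100.0, 50.0)],
    [(200.0, 100.0), (300.0, 100.0), (300.0, 400.0)],
]
xlim, ylim = map_extent(rings, margin=0.0)
# These tuples would then be unpacked into the axes:
#   ax.set_xlim(*xlim)
#   ax.set_ylim(*ylim)
print(xlim, ylim)
</pre>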
<p>Using these basics I was able to create a basic OAC map using Welsh OAs as an example:</p>
<p style="text-align: center"><a href="http://danieljlewis.org/files/2010/05/WalesOAC1.png"><img class="aligncenter size-full wp-image-328" title="WalesOAC" src="http://danieljlewis.org/files/2010/05/WalesOAC1.png" alt="" width="520" height="545" /></a></p>
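<p>As a footnote on polygons with holes, which I mentioned but did not show: shapely&#8217;s Polygon takes the outer boundary as its first argument and a sequence of inner rings as its second. A minimal sketch with made-up coordinates, assuming shapely is installed:</p>
<pre>from shapely.geometry import Polygon

# Outer boundary (shell) and one inner ring (hole), as lists of x,y tuples
shell = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
hole = [(4.0, 4.0), (6.0, 4.0), (6.0, 6.0), (4.0, 6.0)]

# The second argument is a sequence of holes, so a single hole still goes in a list
polygon = Polygon(shell, [hole])

print(polygon.area)            # shell area minus hole area
print(len(polygon.interiors))  # number of holes
</pre>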
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2010/05/27/more-thematic-maps-in-python-shapely-and-descartes/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
