<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Volunteered Geographic Information &#187; Southwark</title>
	<atom:link href="http://danieljlewis.org/category/southwark/feed/" rel="self" type="application/rss+xml" />
	<link>http://danieljlewis.org</link>
	<description>A Geography/GIS blog by Daniel J Lewis</description>
	<lastBuildDate>Tue, 20 Dec 2011 17:15:30 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Weighted Mean Direction Surfaces in Python</title>
		<link>http://danieljlewis.org/2011/08/31/weighted-mean-direction-surfaces-in-python/</link>
		<comments>http://danieljlewis.org/2011/08/31/weighted-mean-direction-surfaces-in-python/#comments</comments>
		<pubDate>Wed, 31 Aug 2011 13:18:18 +0000</pubDate>
		<dc:creator>Daniel Lewis</dc:creator>
				<category><![CDATA[GIS]]></category>
		<category><![CDATA[Modeling]]></category>
		<category><![CDATA[Representation]]></category>
		<category><![CDATA[Southwark]]></category>
		<category><![CDATA[Brunsdon]]></category>
		<category><![CDATA[Charlton]]></category>
		<category><![CDATA[circular statistics]]></category>
		<category><![CDATA[mean direction]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[weighting]]></category>

		<guid isPermaLink="false">http://danieljlewis.org.blogs.splintdev.geog.ucl.ac.uk/?p=537</guid>
		<description><![CDATA[I work a lot with flows and spatial interactions, one thing that I&#8217;ve wanted to do for a while is compute a mean flow direction surface. Unfortunately, arithmetic means don&#8217;t work for angular data, this is because it cannot account for the circular nature of the distribution of angular measurements. For instance the angles 5 [...]]]></description>
			<content:encoded><![CDATA[
<p>I work a lot with flows and spatial interactions, and one thing that I&#8217;ve wanted to do for a while is compute a mean flow direction surface. Unfortunately, arithmetic means don&#8217;t work for angular data, because they cannot account for the circular nature of angular measurements. For instance, the angles 5 degrees and 355 degrees are separated by only 10 degrees, but their arithmetic mean is 180 degrees, which is way off: it should be 0 degrees!</p>
<p>Luckily, <a title="Local trend Statistics for Direction Data" href="http://leicester.academia.edu/ChrisBrunsdon/Papers/534394/Local_trend_statistics_for_directional_data--A_moving_window_approach">Brunsdon and Charlton</a> have published on this very subject, so I took it upon myself to implement a weighted circular mean function in Python. The key obstacle was learning about complex numbers, about which, up until this point, I knew nothing at all!</p>
<p>The first thing to do is calculate the angle between a set of candidate points (such as people) and a set of services (such as Medical Centres). This is simple enough to do using Python&#8217;s math module, and would look something like:</p>
<pre>import math

# Angle in radians, anticlockwise from east, of the flow from (x1, y1) to (x2, y2)
theta = math.atan2((y2 - y1), (x2 - x1))</pre>
<p>Here the pair (x1,y1) is the location of the candidate point, and (x2,y2) is the location of the allocated service for that candidate point. The line linking these two points defines a flow from a candidate point to a service, and vice versa.</p>
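<p>As a minimal sketch of this step, the snippet below wraps the atan2 call in a small helper and applies it to a few flows. The helper name flow_angle and the coordinates are made up for illustration and do not come from the original script.</p>

```python
import math

def flow_angle(origin, destination):
    """Angle in radians, anticlockwise from east, of the flow from origin to destination."""
    x1, y1 = origin
    x2, y2 = destination
    return math.atan2(y2 - y1, x2 - x1)

# Hypothetical coordinates in metres: (candidate point, allocated service)
flows = [
    ((0.0, 0.0), (100.0, 0.0)),     # service due east of the candidate
    ((0.0, 0.0), (0.0, 50.0)),      # service due north
    ((10.0, 10.0), (-40.0, 10.0)),  # service due west
]
angles_deg = [math.degrees(flow_angle(o, d)) for o, d in flows]
print(angles_deg)
```

<p>With this convention east is 0 degrees, north is 90 degrees and west is 180 degrees, which matches the orientation used for the map at the end of the post.</p>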
<p>Having calculated all of the angles, I used ArcGIS to create an output grid, at the extent of the candidate points, using the &#8220;fishnet&#8221; function which creates a vector grid of prespecified dimensions.</p>
<p>The beauty of Brunsdon and Charlton&#8217;s method is that it uses a local method of approximation. This means that for each cell in the output grid, a mean direction can be calculated based upon the values of nearby points, and applying a weighting allows more distant points to have less of an effect on the mean direction.</p>
<p>Firstly, I read all the candidate points into a KDTree structure, which allows me to search for local points; at the same time I also create an array of the angles for those candidate points.</p>
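<pre>from scipy.spatial import cKDTree
import numpy as np

# treepoints: (n, 2) array of candidate coordinates; testpoint: (1, 2) array for one grid cell
tree = cKDTree(treepoints)
res, idx = tree.query(testpoint, k=300000, eps=0, p=2, distance_upper_bound=100)
# Neighbours beyond the threshold come back with distance inf; keep only the finite ones
res = res[0][np.isfinite(res[0])]
idx = idx[0][:len(res)]</pre>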
<p>The tree takes a numpy array of coordinate pairs, and the query method returns an array of distances to points (res) and their index values in the original array of coordinates (idx). The testpoint is a cell in the vector grid; 300000 is k, the number of nearest neighbours to find, which I have simply set arbitrarily high in the context of my dataset; 0 is the tolerance for approximate nearest neighbours, so here I&#8217;ve specified an exact search; 2 indicates the use of Euclidean distance; and 100 is the distance threshold, so neighbours won&#8217;t be returned if they are further than 100 metres away. Points over 100m away are returned with the distance value Inf, so the penultimate line simply shortens the array to just those values which are less than 100m away (i.e. less than infinity).</p>
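<p>An alternative to setting k arbitrarily high is to ask the tree directly for everything inside the search radius. The sketch below assumes SciPy&#8217;s query_ball_point method and uses made-up coordinates. Unlike query, it returns only indices, so the distances are computed separately and then sorted.</p>

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical candidate points (metres) and one grid-cell centre
treepoints = np.array([[0.0, 0.0], [30.0, 40.0], [48.0, 64.0], [500.0, 500.0]])
testpoint = np.array([0.0, 0.0])

tree = cKDTree(treepoints)
# Indices of every point within 100 m of the cell (returned in no particular order)
idx = np.array(tree.query_ball_point(testpoint, r=100.0), dtype=int)
# Distances are not returned by this method, so compute them for the selected points
res = np.linalg.norm(treepoints[idx] - testpoint, axis=1)
# Sort nearest-first so res and idx line up as they would after a query call
order = np.argsort(res)
idx, res = idx[order], res[order]
print(idx, res)
```

<p>This avoids padding the result with Inf values, at the cost of one extra distance calculation per cell.</p>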
<p>The next step is to actually compute the mean direction, which requires a special approach using complex numbers. Brunsdon and Charlton show that a direction can be stated as a complex number <em>z</em> in which <em>z = exp(iθ)</em>, which is equivalent to <em>z = cos(θ) + i sin(θ)</em>, in which <em>i</em> is the imaginary unit. We can restate our directions in Python using:</p>
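<pre>thetas = angles[idx]  # angles (in radians) of the neighbouring points
cThetas = []
for i in range(len(thetas)):
    # cos(theta) + i sin(theta): the direction as a unit complex number
    cThetas.append(complex(np.cos(thetas[i]), np.sin(thetas[i])))
cThetas = np.array(cThetas)</pre>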
<p>Here, the complex function builds the complex number representing each angle, which is stored in a list that I convert (lazily) to a numpy array. The first term, thetas, uses the idx array from the cKDTree to pick out the relevant records from the angles array, which stores all the angle values in the same order as the points entered into the cKDTree.</p>
<p>Next a temporary variable is created which calculates the mean direction:</p>
<pre>temp = np.sum(cThetas)/np.absolute(np.sum(cThetas))
MeanDir = np.angle(temp, deg = True)</pre>
<p>The mean direction is given by the argument (Arg) of the resultant complex number. NumPy implements this with the np.angle function, where deg = True returns the angle in degrees, and deg = False in radians.</p>
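<p>As a quick check of the approach, here is a small standard-library sketch applied to the 5 and 355 degree example from the start of the post. It uses cmath rather than numpy, and the helper name mean_direction is hypothetical. Dividing by the length of the resultant is skipped because it does not change the argument.</p>

```python
import cmath
import math

def mean_direction(angles_deg):
    """Circular mean of angles given in degrees, returned in degrees."""
    # Sum the unit vectors exp(i*theta); the argument of the sum is the mean direction
    resultant = sum(cmath.exp(1j * math.radians(a)) for a in angles_deg)
    return math.degrees(cmath.phase(resultant))

angles = [5.0, 355.0]
print(sum(angles) / len(angles))  # arithmetic mean: 180.0
print(mean_direction(angles))     # circular mean: approximately 0
```

<p>Note that if the unit vectors cancel out exactly (for example 0 and 180 degrees) the resultant has no length and the mean direction is undefined.</p>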
<p>So far this is the unweighted mean, aggregating directional observations within a 100m disk (see also: uniform disk smoothing). To introduce weighting we must first define a weighting scheme. I&#8217;ve used the one suggested by Brunsdon and Charlton, which is Gaussian, and might look a bit like this:</p>
<pre>def gaussW(dists, band):
    # Gaussian weights exp(-d^2 / (2 * band^2)) for an array of distances
    out = np.zeros(dists.shape)
    for i in range(len(out)):
        temp = np.power(dists[i], 2) / (2.0 * np.power(float(band), 2))
        out[i] = np.exp(-1.0 * temp)
    return out

weight = gaussW(res, 100)</pre>
<p>Quite simply, I pass the distance array res to the gaussW function and it gives me back an array of weights for that ordering of distances. Using this I can redo the mean direction thus:</p>
<pre>temp = np.sum(weight*cThetas)/np.absolute(np.sum(weight*cThetas))
MeanDir = np.angle(temp, deg = True)</pre>
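<p>To see what the weighting does, the following standard-library sketch compares two neighbours at very different distances. The function names and the input values are hypothetical, and the kernel is the same Gaussian form as gaussW above.</p>

```python
import cmath
import math

def gauss_weight(dist, band):
    """Gaussian kernel weight exp(-d^2 / (2 * band^2))."""
    return math.exp(-(dist ** 2) / (2.0 * band ** 2))

def weighted_mean_direction(angles_rad, dists, band):
    """Weighted circular mean, in degrees, of angles given in radians."""
    resultant = sum(gauss_weight(d, band) * cmath.exp(1j * a)
                    for a, d in zip(angles_rad, dists))
    return math.degrees(cmath.phase(resultant))

# Hypothetical neighbours: one flow heading east right next to the cell,
# and one heading north 90 m away
angles_rad = [0.0, math.pi / 2]
dists = [1.0, 90.0]
weighted = weighted_mean_direction(angles_rad, dists, band=100.0)
print(weighted)
```

<p>The unweighted mean of these two directions would be 45 degrees; with the weighting, the result is pulled towards the nearer, east-heading flow.</p>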
<p>There you have it! Attached is the script I used. Obviously, Brunsdon and Charlton implement a variance and a couple of visualisation devices, but these should be simple enough to implement now!</p>
<p>I created an output for flows of patients to GPs in Southwark, visualised using one of ESRI&#8217;s circular/direction colour ramps from <a title="Mapping Resources" href="http://mappingcenter.esri.com/index.cfm?fa=arcgisResources.gateway">colour ramp pack 2</a>. I&#8217;m not sure how best to visualise the legend at this point though. NB: 90 is North, -90 is South, 0/-0 is East and 180/-180 is West. The map is visualised to show the 4 cardinal directions, but the output is in fact continuous.</p>
<p style="text-align: left"><a href="http://danieljlewis.org/files/2011/08/MeanDirectionFlows.png"><img class="aligncenter size-large wp-image-538" src="http://danieljlewis.org/files/2011/08/MeanDirectionFlows-724x1024.png" alt="" width="434" height="614" /></a>My example script is <a href="http://danieljlewis.org/files/2011/08/meanDirection.txt">here. </a> Note that I am using dbfpy to read and write to shapefile DBF tables directly.</p>
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2011/08/31/weighted-mean-direction-surfaces-in-python/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>A Spatial Approach to Location Quotients</title>
		<link>http://danieljlewis.org/2011/06/17/a-spatial-approach-to-location-quotients-2/</link>
		<comments>http://danieljlewis.org/2011/06/17/a-spatial-approach-to-location-quotients-2/#comments</comments>
		<pubDate>Fri, 17 Jun 2011 14:46:21 +0000</pubDate>
		<dc:creator>Daniel Lewis</dc:creator>
				<category><![CDATA[Geography]]></category>
		<category><![CDATA[GIS]]></category>
		<category><![CDATA[Representation]]></category>
		<category><![CDATA[Southwark]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[density]]></category>
		<category><![CDATA[KDE]]></category>
		<category><![CDATA[Location Quotient]]></category>

		<guid isPermaLink="false">http://danieljlewis.org.blogs.splintdev.geog.ucl.ac.uk/?p=529</guid>
		<description><![CDATA[The intent of this post is not simply to uncover where the highest density of people belonging to a particular ethnic group are, but rather to use the ‘location quotient’ (LQ) technique to compare the ethnic density in any one area to the overall ethnic density in Southwark, thus providing a relative insight into where [...]]]></description>
			<content:encoded><![CDATA[
<p>The intent of this post is not simply to uncover where the density of people belonging to a particular ethnic group is highest, but rather to use the ‘location quotient’ (LQ) technique to compare the ethnic density in any one area to the overall ethnic density in Southwark, thus providing a relative insight into where the density of particular groups is higher than, lower than or the same as expected.</p>
<p>Location Quotients tend to work with areal units, characterising different areas relative to a larger region and providing a basic insight into where functions are clustered. Because the Southwark patient register data is address geocoded, we would lose some spatial information if we chose to aggregate the data, not to mention the question of which areal aggregation is best. More info on how to create location quotients is <a title="Wikipedia with LQs" href="http://en.wikipedia.org/wiki/Economic_base_analysis">here</a>.</p>
<p>A Location Quotient has 3 possible interpretations. If it is around 1 then the ethnic population in that area is at the level we would expect given what we observe nationally. If the LQ is less than 1 then that area has a smaller population of a particular ethnic group than we would expect based upon national figures. Finally, if the LQ value is over 1 this suggests a concentration of the ethnic group in the area which is greater than we would expect given nationally observed levels. An LQ is quite simply a rate-ratio.</p>
<p>Instead of the standard areal approach, the maps here use a density estimation approach in which disaggregate point data is transformed into a representation of the continuous density function of the point distribution. The LQ can then be computed for each cell based on the density of that cell with respect to the total density of the surface. This creates a smoothed LQ surface which is readily interpretable in the same manner as above. The Kernel Density Estimation used to create the ethnic and total population density surfaces should be parameterised in the same way; these examples use a 250m bandwidth and a 25m cell size, which is largely empirically redundant, based on the input dataset’s spatial resolution, but creates a more aesthetically appealing mapped representation. Naturally, the procedure works well for clustered data, in Southwark’s case for the African and Muslim groups.</p>
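<p>A minimal sketch of the cell-wise calculation, assuming the usual rate-ratio form of the LQ (the local share of the group divided by its share over the whole surface). The function name and the density values are hypothetical, and the two lists stand in for the flattened ethnic and total population density rasters.</p>

```python
def location_quotients(group_density, total_density):
    """Cell-wise LQ: local share of the group divided by its share over the whole surface."""
    overall_share = sum(group_density) / sum(total_density)
    lqs = []
    for g, t in zip(group_density, total_density):
        # Cells with no population have no defined LQ
        lqs.append((g / t) / overall_share if t else None)
    return lqs

# Hypothetical kernel density values for four cells
group = [2.0, 6.0, 0.0, 2.0]
total = [10.0, 10.0, 0.0, 20.0]
lq_surface = location_quotients(group, total)
print(lq_surface)
```

<p>Values above 1 mark cells where the group is more concentrated than across the surface as a whole, and values below 1 mark cells where it is less concentrated.</p>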
<p style="text-align: center"><a href="http://danieljlewis.org/files/2011/06/AfricanLQ.png"><img class="aligncenter size-large wp-image-530" src="http://danieljlewis.org/files/2011/06/AfricanLQ-724x1024.png" alt="" width="463" height="655" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2011/06/17/a-spatial-approach-to-location-quotients-2/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Mapping Spatial Entropy in Southwark</title>
		<link>http://danieljlewis.org/2011/02/03/mapping-spatial-entropy-in-southwark/</link>
		<comments>http://danieljlewis.org/2011/02/03/mapping-spatial-entropy-in-southwark/#comments</comments>
		<pubDate>Thu, 03 Feb 2011 01:32:16 +0000</pubDate>
		<dc:creator>Daniel Lewis</dc:creator>
				<category><![CDATA[GIS]]></category>
		<category><![CDATA[Modeling]]></category>
		<category><![CDATA[Southwark]]></category>
		<category><![CDATA[ethnicity]]></category>
		<category><![CDATA[mapping]]></category>
		<category><![CDATA[rasters]]></category>
		<category><![CDATA[segreagtion]]></category>
		<category><![CDATA[spatially weighted entropy]]></category>

		<guid isPermaLink="false">http://danieljlewis.org.blogs.splintdev.geog.ucl.ac.uk/?p=496</guid>
		<description><![CDATA[I&#8217;ve been doing a bit of work recently on segregation with Pablo Mateos, and having gone through the motions with aspatial indices of segregation (the classics): dissimilarity, exposure and so on, I decided to investigate the more explicitly spatial ones. Taking a lead from Reardon and O&#8217;Sullivan&#8217;s (2004) paper &#8220;Measures of Spatial Segregation&#8221; in sociological methodology, I [...]]]></description>
			<content:encoded><![CDATA[
<p>I&#8217;ve been doing a bit of work recently on segregation with <a title="Pablo Mateos" href="http://www.geog.ucl.ac.uk/about-the-department/people/academics/pablo-mateos" target="_blank">Pablo Mateos</a>, and having gone through the motions with the classic aspatial indices of segregation (dissimilarity, exposure and so on), I decided to investigate the more explicitly spatial ones. Taking a lead from Reardon and O&#8217;Sullivan&#8217;s (2004) paper &#8220;Measures of Spatial Segregation&#8221; in <em>Sociological Methodology</em>, I got in touch with <a title="David O'Sullivan" href="http://web.env.auckland.ac.nz/people_profiles/osullivan_d/" target="_blank">David O&#8217;Sullivan</a>, and he and his student Seong-Yun Hong helped me with the implementation of some spatial measures of segregation. This post specifically concerns spatially weighted entropy, a measure of population diversity. Reardon and O&#8217;Sullivan define spatially weighted entropy as:</p>
<p style="text-align: center"><a href="http://danieljlewis.org/files/2011/02/SpatiallyWeightedEntropy.gif"><img class="aligncenter size-full wp-image-497" src="http://danieljlewis.org/files/2011/02/SpatiallyWeightedEntropy.gif" alt="" width="350" height="94" /></a></p>
<p style="text-align: left">This equation describes the &#8216;entropy&#8217;, derived from Shannon&#8217;s information theory, for each grid cell in the image (below), in which each cell value results from the entropy computed for a 1km &#8216;neighbourhood&#8217; <em>p</em> around each cell (essentially a circular buffer). The ethnic group in question is given by &#8216;m&#8217; (with the pi term representing the proportion of a given group in a given neighbourhood) and relates to ethnic groups defined from the Southwark patient register using Onomap. The groups defined are: African, East Asian and Pacific, European, Muslim, South Asian, British, Eastern European, Hispanic, and Unclassified or Other. The Onomap software is able to apply this classification by looking at the forename and surname combination of patients registered to use Southwark GPs, or patients living in Southwark but using GPs outside of Southwark. The cells in the image relate directly to the residential locations of patients, who were geocoded to their household using the Ordnance Survey&#8217;s Address Layer 2 product; empty cells are therefore areas within which no recorded patients were found, such as parks and transport infrastructure. As the data underlying this is from patient registrations with GPs, we have to accept that the data is likely to be partial, with potentially systematic biases in who has registered: young men, and people from countries where GPs do not exist as a method of primary care, may have been omitted.</p>
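<p>As a rough sketch of the calculation for a single cell, the snippet below assumes the usual Shannon form, with logarithms taken to the base of the number of groups so that values run from 0 (one group only) to 1 (all groups equally present). The function name and the counts are hypothetical, and the counts stand in for whatever proximity-weighted totals fall within the 1km neighbourhood.</p>

```python
import math

def local_entropy(group_counts):
    """Shannon entropy of group proportions, using log base M (the number of groups)."""
    total = sum(group_counts)
    n_groups = len(group_counts)
    entropy = 0.0
    for count in group_counts:
        if count:  # a group with zero share contributes nothing
            p = count / total
            entropy -= p * math.log(p, n_groups)
    return entropy

# Hypothetical counts for the nine groups around two different cells
mixed = [10, 10, 10, 10, 10, 10, 10, 10, 10]
dominated = [82, 4, 4, 2, 2, 2, 2, 1, 1]
print(local_entropy(mixed), local_entropy(dominated))
```

<p>The evenly mixed neighbourhood scores close to 1, while the neighbourhood dominated by one group scores much lower.</p>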
<p style="text-align: center"><a href="http://danieljlewis.org/files/2011/02/SwkEntropyMap.png"><img class="aligncenter size-large wp-image-500" src="http://danieljlewis.org/files/2011/02/SwkEntropyMap-791x1024.png" alt="" width="475" height="614" /></a></p>
<p>In the image, higher values of entropy indicate greater diversity of population by ethnic group. The resultant image is unsurprising in terms of Southwark, with the Dulwich Village area showing as the least diverse place, home as it is to more affluent, generally &#8216;British&#8217; groups. Likewise, historical factors regarding access to housing have shaped the lower entropy scores in the middle of the borough, home to African populations, and in the North East, home to the British working classes who were rehoused from the now more African areas in the middle of the borough. Finally, the greater Waterloo-Elephant and Castle region in the north-west shows up as the ethnic melting pot of the borough.</p>
<p>In the image above, the 1km neighbourhood defined in the spatially weighted entropy score has a smoothing effect. I experimented with smaller values for the neighbourhood size, and found that the resultant output did not change dramatically from that obtained above. At the end of the day, the selection of neighbourhood size is largely arbitrary and will depend on sociocultural factors of the area and its people. Similarly, as there is no data for the regions outside of Southwark, we are more uncertain of the values at the edges than in the middle of the borough, as we are only sampling from within Southwark itself. Nonetheless, this representation of Southwark goes somewhat beyond what is possible using the commonly used output zones defined by the census.</p>
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2011/02/03/mapping-spatial-entropy-in-southwark/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Hospital Outpatients in Southwark 08/09</title>
		<link>http://danieljlewis.org/2010/07/16/hospital-outpatients-in-southwark-0809/</link>
		<comments>http://danieljlewis.org/2010/07/16/hospital-outpatients-in-southwark-0809/#comments</comments>
		<pubDate>Fri, 16 Jul 2010 17:44:09 +0000</pubDate>
		<dc:creator>Daniel Lewis</dc:creator>
				<category><![CDATA[Health Geography]]></category>
		<category><![CDATA[Health GIS]]></category>
		<category><![CDATA[Southwark]]></category>
		<category><![CDATA[admissions]]></category>
		<category><![CDATA[HES]]></category>
		<category><![CDATA[ONS]]></category>
		<category><![CDATA[population]]></category>

		<guid isPermaLink="false">http://danieljlewis.org/?p=380</guid>
		<description><![CDATA[Amongst other things, I&#8217;m beginning to tap into a data source I have acquired for my research known as Hospital Episode Statistics (HES). These are datasets which record the particulars of hospital service by patients. Generally they have a bit of a learning curve, and require the gathering of several additional datasets in order to [...]]]></description>
			<content:encoded><![CDATA[
<p>Amongst other things, I&#8217;m beginning to tap into a data source I have acquired for my research known as Hospital Episode Statistics (HES). These are datasets which record the particulars of hospital service use by patients. Generally they have a bit of a learning curve, and require the gathering of several additional datasets in order to make them useful. Having gathered all this data and put it all within a MySQL database, I decided to conduct a basic analysis, using my study site of Southwark as a guinea pig. Essentially I wanted to know whether more people from Southwark were using hospitals for outpatient appointments than we would expect from national (England) figures. There are many reasons why any given area might be using health care services at a greater or lesser rate than other areas, but for the moment I simply wanted to see whether there were any interesting patterns.</p>
<p>In the HES data it is simple to calculate the total number of people using outpatient care; what is more complex is deriving an expected score from the national data. I went about it in the following way:</p>
<p>Firstly, I took the ONS experimental population projections from mid-2008 and calculated the number of people in each Southwark LSOA, and at the national (England) level, for each of the available age bands by men and women. The population projection age bands are quite coarse, giving totals for 5 population groups: 0-15, 16-29, 30-44, 45-64 (for men) or 45-59 (for women), and 65+ (for men) or 60+ (for women). This isn&#8217;t ideal, but the age bands do roughly correlate with the different groups of mortality causes in the Grim Reaper&#8217;s road map (Shaw, Thomas, Smith and Dorling, 2008). Then I calculated the admission totals for all of the age-sex bands nationally (England), from which I could create a ratio of admissions to population nationally. By applying this ratio to the Southwark LSOA population projections I could create an expected number of admissions per area. Finally, it is simply a case of dividing the observed admissions by the expected and multiplying by 100 to get a score.</p>
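<p>The steps above amount to an observed-over-expected ratio, which can be sketched as follows. The band labels, counts and the function name are all made up for illustration; the real calculation would use every age-sex band and the actual HES and ONS figures.</p>

```python
def standardised_ratio(observed, local_pop, national_rates):
    """Observed admissions as a percentage of those expected from national rates."""
    expected = sum(local_pop[band] * national_rates[band] for band in local_pop)
    return 100.0 * observed / expected

# Hypothetical national figures per age-sex band: (admissions, population)
national = {"M 0-15": (200, 1000), "F 0-15": (150, 1000), "M 65+": (600, 1000)}
national_rates = {band: adm / pop for band, (adm, pop) in national.items()}

# Hypothetical population by band and observed admissions for one LSOA
lsoa_pop = {"M 0-15": 100, "F 0-15": 100, "M 65+": 50}
score = standardised_ratio(78, lsoa_pop, national_rates)
print(score)
```

<p>In this made-up case 65 admissions would be expected, so 78 observed admissions gives a score of about 120, meaning the area has roughly a fifth more admissions than the national rates would predict.</p>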
<p>I mapped the results as follows: a score of 100 suggests that the area is not different from the national picture, whereas a value higher than 100 suggests that the area has more people using hospitals than we would expect, and a value lower than 100 suggests the converse.</p>
<p style="text-align: left"><a href="http://danieljlewis.org/files/2010/07/Outpatient0809a.jpg"><img class="aligncenter size-large wp-image-384" title="Outpatient0809a" src="http://danieljlewis.org/files/2010/07/Outpatient0809a-724x1024.jpg" alt="" width="579" height="819" /></a>In the case of Southwark, the pattern seems to follow those that are often observed in my work on the borough, in that the Bankside areas and the southern part of the borough, along with the north-eastern former docklands area, have levels of admissions that are equivalent to, or lower than, what we would expect nationally, whereas the central areas have admission numbers higher than the national level.</p>
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2010/07/16/hospital-outpatients-in-southwark-0809/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Distribution of Household Occupancy in Southwark</title>
		<link>http://danieljlewis.org/2010/06/09/distribution-of-household-occupancy-in-southwark/</link>
		<comments>http://danieljlewis.org/2010/06/09/distribution-of-household-occupancy-in-southwark/#comments</comments>
		<pubDate>Wed, 09 Jun 2010 14:19:05 +0000</pubDate>
		<dc:creator>Daniel Lewis</dc:creator>
				<category><![CDATA[Geography]]></category>
		<category><![CDATA[Southwark]]></category>
		<category><![CDATA[distribution]]></category>
		<category><![CDATA[exponential decay]]></category>
		<category><![CDATA[households]]></category>
		<category><![CDATA[log]]></category>
		<category><![CDATA[social]]></category>

		<guid isPermaLink="false">http://danieljlewis.org/?p=355</guid>
		<description><![CDATA[I&#8217;ve been doing some more analysis on the Southwark GP patient register at the household level. After a fair amount of cleaning and interpretation I&#8217;ve arrived at the following distribution of households. There are a number of interesting things to say about this data, not least in the section that I&#8217;ve marked &#8216;larger social groupings&#8217; [...]]]></description>
			<content:encoded><![CDATA[
<p>I&#8217;ve been doing some more analysis on the Southwark GP patient register at the household level. After a fair amount of cleaning and interpretation I&#8217;ve arrived at the following distribution of households.</p>
<p style="text-align: center"><a href="http://danieljlewis.org/files/2010/06/HHDistAnnotate.png"><img class="aligncenter size-full wp-image-356" title="HHDistAnnotate" src="http://danieljlewis.org/files/2010/06/HHDistAnnotate.png" alt="" width="578" height="380" /></a></p>
<p style="text-align: left">There are a number of interesting things to say about this data, not least in the section that I&#8217;ve marked &#8216;larger social groupings&#8217;, as it seems to suggest a possible migrant social network effect: the larger household groupings tend to be of minority ethnic groups, including Nigerians and other Africans, Hispanics and South-East Asians, who are perhaps using cross-country social ties as help in getting established when first arriving in the UK. However, visually the shape of the distribution of household occupancy is very distinctive, and is actually very close to an exponential. Here I&#8217;ve taken the log of the frequency of occurrence and plotted the best-fit line through the plot:</p>
<p style="text-align: left"><a href="http://danieljlewis.org/files/2010/06/LogHHDist.png"><img class="aligncenter size-large wp-image-358" title="LogHHDist" src="http://danieljlewis.org/files/2010/06/LogHHDist-1024x682.png" alt="" width="574" height="382" /></a>This linear trend means that the model <strong>log(y) = -0.1635x + 4.602 </strong>is a good predictor of the number of Households we can expect to exist in Southwark for a given value of x, or occupancy.</p>
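<p style="text-align: left">To make the fitted model concrete, the sketch below turns it back into a predicted household count. The base of the logarithm is not stated above, so base 10 is purely an assumption here, exposed as a parameter; the function name is also hypothetical.</p>

```python
def predicted_households(occupancy, base=10.0):
    """Households predicted by log(y) = -0.1635x + 4.602 for occupancy x.

    The base of the logarithm is an assumption (10 by default), not something
    stated alongside the fitted model.
    """
    return base ** (-0.1635 * occupancy + 4.602)

for x in (1, 2, 5, 10):
    print(x, round(predicted_households(x)))
```

<p style="text-align: left">Whatever the base, the form of the model means that each additional occupant multiplies the expected number of households by a constant factor, which is what exponential decay looks like.</p>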
<p style="text-align: left">It is not entirely clear, however, why this is the case. Firstly, it may just be an artifact of the data: of the matching process that has occurred between the patient register and OS AddressLayer2, of the way that GPs encode patient addresses in the first place, or of the fact that the patient register is only a sample of the total population of Southwark, i.e. those people who register with a doctor. Secondly, it may simply be a reflection of the structure of the built environment in Southwark, i.e. what kind of housing is actually available. However, the distribution is also subject to the choices of individuals or groups.</p>
<p style="text-align: left">Currently, I am in the process of disaggregating the above characteristics and looking at trends by different population groups.</p>
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2010/06/09/distribution-of-household-occupancy-in-southwark/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>&#8216;Locally led&#8217; NHS Service changes dubious</title>
		<link>http://danieljlewis.org/2010/06/01/locally-led-nhs-service-changes-dubious/</link>
		<comments>http://danieljlewis.org/2010/06/01/locally-led-nhs-service-changes-dubious/#comments</comments>
		<pubDate>Tue, 01 Jun 2010 14:17:28 +0000</pubDate>
		<dc:creator>Daniel Lewis</dc:creator>
				<category><![CDATA[Health Geography]]></category>
		<category><![CDATA[Southwark]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[community]]></category>
		<category><![CDATA[health]]></category>
		<category><![CDATA[Lansley]]></category>
		<category><![CDATA[local]]></category>
		<category><![CDATA[provision]]></category>
		<category><![CDATA[service]]></category>

		<guid isPermaLink="false">http://danieljlewis.org/?p=330</guid>
		<description><![CDATA[Since coming to government, new Conservative Health Secretary Andrew Lansley has sought to fulfil the pledge he made to put an end to local restructurings of NHS service delivery by authorities higher up the NHS hierarchy. Ostensibly he believes that local decision-making will have a better overall effect on the quality of outcomes for patients [...]]]></description>
			<content:encoded><![CDATA[
<p>Since coming to government, new Conservative Health Secretary Andrew Lansley has sought to fulfil the pledge he made to put an end to local restructurings of NHS service delivery by authorities higher up the NHS hierarchy. Ostensibly he believes that local decision-making will have a better overall effect on the quality of outcomes for patients and hence lead to a better health service. Specifically he wants to provide GPs with an opportunity to work with community leaders and their local authorities to steer local services. The core elements actually do not differ greatly from the outgoing Labour policies, particularly with respect to patient choice; however, I will argue that there is a clear danger in engaging to too great an extent with a purely &#8216;local&#8217; approach. In general there seems to be something of a misconception in Government, particularly in the provision of local services (e.g. schools), that local approaches are somehow &#8216;better&#8217;.</p>
<p>Firstly, let us consider something that the Government seems to do without fail, something that I, as a Geographer, find to be a grave sin of omission: the apparently indiscriminate use of spatial qualifiers without so much as an explanation of their meaning. The terms &#8216;local&#8217; and &#8216;community&#8217; are spectacularly misleading without qualification, and yet they are often used because people seem to think they understand what is meant by them &#8211; everyone considers themselves part of a community, and local to a service &#8211; but will these personal feelings about their socio-spatial connections actually translate into the ability to have input into healthcare decision making? My investigation of access and registration of patients to GPs in Southwark has shown that a) primary care is a very location-based service, and without fail each doctor exhibits a characteristic distance decay function that describes the pattern of registration with a GP, subject to some socio-economic criteria, but also that b) patients overlap to a large extent in a densely-populated urban context, the suggestion being that activity-spaces (i.e. retail areas, workplaces and schools) have a distorting effect on patterns of registration for some people. To this end I suggest that a &#8216;community&#8217; can be defined independently for individual GPs based upon the patterns of patient uptake unique to that service, although there may be some strong correlations with the residential, workplace, educational etc. communities that overlap it (of course for some GPs the profile of the registered community may be greatly divergent from that of the observed local community, defined by proximity to the GP). The following map is an example of this kind of complexity:</p>
<p style="text-align: left"><a href="http://danieljlewis.org/files/2010/06/GPRegSwk.jpg"><img class="aligncenter size-full wp-image-332" title="GPRegSwk" src="http://danieljlewis.org/files/2010/06/GPRegSwk.jpg" alt="" width="420" height="705" /></a></p>
<p style="text-align: left">Here it is clear that any definition of locality or community based upon an arbitrary areal basis yields groups of people who could be registered to as many as 29 different Southwark GPs in only a very small area. This is in fact a very good, simple illustration of patient choice in action. There are a lot of questions to ask Mr Lansley about how he views &#8216;local&#8217; or &#8216;community&#8217;, and whether he is willing to enshrine that definition in policy before we actually consent to doing anything with provision of services.</p>
<p style="text-align: left">Further still, I have claimed that GPs are very much location-based services, and they are: beyond a certain distance (in Southwark this is about 6&#8211;10 km) no one is registered with a GP, choosing instead a closer service. In many ways this was constrained by the pre-existing system of &#8216;catchment areas&#8217;; however, these were set to be removed by the end of the year in the quest for patient choice. The potential for registration is thus opened up to people using doctors near their place of work (for instance) rather than near their home. Should these people have a say in the provision of services in an area in which they do not live? They are part of the GP&#8217;s &#8216;community&#8217; but not of the residential one. A good illustration of this is the polyclinic system. Southwark is geared up to introduce 3 polyclinics: one which already exists as a large GP-led health centre in the centre of the borough, and two in the north connected to hospitals. The biggest difficulty faced at the moment is in estimating the daytime population (i.e. transient workforce) of the Southbank in order to account for likely polyclinic usage &#8211; a huge number of people who do not live in Southwark but will likely have some part of their healthcare provided by Southwark PCT.</p>
<p style="text-align: left">It is also unclear what Mr Lansley refers to when he talks about &#8216;top-down&#8217;: is it the Strategic Health Authorities and the DoH itself? It cannot be the PCTs, as Mr Lansley claims that the new criteria will have the support of &#8216;GP commissioners&#8217; and it is the PCTs that actually do the commissioning. Further, the idea of GPs working with local authorities is largely the same as GPs working with PCTs now, as PCTs and LAs are generally coterminous.</p>
<p style="text-align: left">Whilst it is pleasing to see a politician quoting the need for an evidence-based approach to restructuring, it is unclear what evidence he might base GP quality on. The current payment method (QoF) is based on GP reporting of pre-specified target outcomes to a centralised authority; surely GPs will simply follow these directives in order to bring in as much money as possible. Indeed, it is strongly recommended that these stats not be used as measures of GP quality, as they are by-and-large patchy in what they cover and include little demographic data. Moreover, had the previous government not already cut the NHS IT initiative that would have made reporting of outcomes actually feasible nationally, the new government would no doubt have cut it anyway.</p>
<p style="text-align: left">The final worry I have is one of equity, something upon which the NHS is founded &#8211; the provision of a fair service contingent on need, free at the point of service. Surely such an atomistic approach to healthcare provision as Mr Lansley seems to specify is liable to deepen the perceived &#8216;social gradient&#8217; in health care: without a careful (top-down) hand, the GPs and communities best-equipped to play an active role in orchestrating GP services, most likely those in the wealthier areas of the country, will get increasingly better provision. There needs to be at least some form of national accountability for a national health service.</p>
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2010/06/01/locally-led-nhs-service-changes-dubious/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Gridded Population of Southwark</title>
		<link>http://danieljlewis.org/2010/04/16/gridded-population-of-southwark/</link>
		<comments>http://danieljlewis.org/2010/04/16/gridded-population-of-southwark/#comments</comments>
		<pubDate>Fri, 16 Apr 2010 18:28:01 +0000</pubDate>
		<dc:creator>Daniel Lewis</dc:creator>
				<category><![CDATA[Cartography]]></category>
		<category><![CDATA[Representation]]></category>
		<category><![CDATA[Southwark]]></category>
		<category><![CDATA[grid]]></category>
		<category><![CDATA[Patient Register]]></category>
		<category><![CDATA[population]]></category>

		<guid isPermaLink="false">http://danieljlewis.org/?p=280</guid>
		<description><![CDATA[One of the best things about having address-geocoded an entire population dataset is that you can finally get away from non-uniform areal representation (OAs, Postcodes) and present something that is uniformly disaggregate. Academics such as David Martin have long expounded the value of gridded representation of population data as it is regular and hence spatial [...]]]></description>
			<content:encoded><![CDATA[
<p>One of the best things about having address-geocoded an entire population dataset is that you can finally get away from non-uniform areal representation (OAs, Postcodes) and present something that is uniformly disaggregate. Academics such as <a title="Prof David Martin" href="http://www.soton.ac.uk/geography/staff_profiles/academic/djm1.html" target="_blank">David Martin</a> have long expounded the value of gridded representation of population data as it is regular and hence spatially unbiased. In fact the <a title="Population 24/7" href="http://www.soton.ac.uk/geography/research/phew/pop247/index.html" target="_blank">current work</a> his group is doing is really interesting stuff, looking at daytime (as opposed to residential) population.</p>
<p>Anyhow, using the address-geocoded patient register for Southwark I was able to create a population density visualisation on a 100m x 100m grid that still preserves patient anonymity to an appropriate level. Of course there are some issues with Patient Registers, notably that they are only complete for people that register with a GP. Nonetheless, they provide a uniquely fine-grained view of population in Southwark without resorting to the statistical uncertainty of a smoothing surface-based density estimation, or an irregular, space-filling administrative/postal areal unit solution.</p>
<p style="text-align: center"><a href="http://danieljlewis.org/files/2010/04/GridPopn.jpg"><img class="aligncenter size-large wp-image-281" title="GridPopn" src="http://danieljlewis.org/files/2010/04/GridPopn-724x1023.jpg" alt="" width="579" height="818" /></a></p>
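<p>As a rough illustration of the gridding step, here is a minimal Python sketch that counts points per 100m x 100m cell. It is not the workflow used to make the map above: the function name <code>grid_counts</code> and the coordinates are made up for the example, and it assumes each patient has already been geocoded to a projected easting/northing in metres.</p>

```python
from collections import Counter


def grid_counts(points, cell_size=100.0):
    """Count points per square grid cell; keys are (column, row) cell indices."""
    counts = Counter()
    for easting, northing in points:
        # Floor division assigns each point to the cell containing it.
        cell = (int(easting // cell_size), int(northing // cell_size))
        counts[cell] += 1
    return counts


# Made-up coordinates in metres, purely for illustration.
points = [
    (532010.0, 179950.0),
    (532090.0, 179999.0),
    (532100.0, 180000.0),
    (532150.0, 180020.0),
]
counts = grid_counts(points)
for cell, total in sorted(counts.items()):
    print(cell, total)
```

<p>Each cell count could then be mapped directly, or divided by the cell area to give a density.</p>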
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2010/04/16/gridded-population-of-southwark/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Some Surname-based Rank-Size thoughts</title>
		<link>http://danieljlewis.org/2010/03/05/some-surname-based-rank-size-thoughts/</link>
		<comments>http://danieljlewis.org/2010/03/05/some-surname-based-rank-size-thoughts/#comments</comments>
		<pubDate>Fri, 05 Mar 2010 14:24:27 +0000</pubDate>
		<dc:creator>Daniel Lewis</dc:creator>
				<category><![CDATA[Southwark]]></category>
		<category><![CDATA[Thoughts]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[log]]></category>
		<category><![CDATA[power law]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[rank-size]]></category>
		<category><![CDATA[surnames]]></category>
		<category><![CDATA[zipt]]></category>

		<guid isPermaLink="false">http://danieljlewis.org/?p=249</guid>
		<description><![CDATA[Yesterday Professor Mike Batty introduced me to the rank-size rule, an idea popularised by George Kingsley Zipf as the relationship between the frequency of an observed phenomenon against the phenomenon&#8217;s rank in its group. This is best exemplified by the example of city sizes: essentially Zipf shows that for every really large city, there exist [...]]]></description>
			<content:encoded><![CDATA[
<p>Yesterday <a title="Mike Batty" href="http://www.casa.ucl.ac.uk/people/MikesPage.htm" target="_blank">Professor Mike Batty</a> introduced me to the rank-size rule, an idea popularised by <a title="Zipf - Wikipedia" href="http://en.wikipedia.org/wiki/George_Kingsley_Zipf" target="_blank">George Kingsley Zipf</a> as the relationship between the frequency of an observed phenomenon and the phenomenon&#8217;s rank in its group. This is best exemplified by city sizes: essentially Zipf shows that for every really large city there exist many smaller ones; however, these smaller cities aren&#8217;t just a bit smaller than the large city, they are considerably smaller. In fact the difference in city size from the biggest cities to the smallest can be explained by a power law, which can be represented as:</p>
<p style="text-align: left"><a href="http://danieljlewis.org/files/2010/03/CodeCogsEqn2.gif"><img class="aligncenter size-full wp-image-250" title="CodeCogsEqn(2)" src="http://danieljlewis.org/files/2010/03/CodeCogsEqn2.gif" alt="" width="85" height="49" /></a></p>
<p style="text-align: left">Where Pn is the frequency of occurrence of the phenomenon ranked nth, and the exponent <em>alpha</em> is usually roughly equal to 1.</p>
<p style="text-align: left">With <em>alpha</em> equal to 1, the power law thus produces a plot where the 2nd item is 1/2 the size of the 1st, the 3rd item is 1/3 the size of the 1st, and so on. This can be illustrated by a plot of surname frequency in Southwark by rank.</p>
<div id="attachment_251" class="wp-caption aligncenter" style="width: 548px"><a href="http://danieljlewis.org/files/2010/03/Rplot3.png"><img class="size-full wp-image-251" title="Rplot3" src="http://danieljlewis.org/files/2010/03/Rplot3.png" alt="" width="538" height="537" /></a><p class="wp-caption-text">Plot of Surname Frequency against Rank in Southwark for all observed surnames (using R)</p></div>
<p style="text-align: left">It is clear from the graph that there are very few surnames which are popular and many which are rare. Another interesting characteristic of a power law, such as the relationship between surname frequency and rank, is that it is self-similar: if we examine any portion of the curve we should get the same curve, albeit at a different scale.</p>
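<p style="text-align: left">To make the halves-and-thirds pattern concrete, here is a small Python sketch. It assumes the rank-size rule takes the form Pn = P1 / n^alpha, which is my reading of the description above; the function name <code>rank_size</code> and the top frequency of 1200 are hypothetical.</p>

```python
def rank_size(p1, n_ranks, alpha=1.0):
    """Expected frequencies for ranks 1..n_ranks, assuming P_n = P_1 / n**alpha."""
    return [p1 / (n ** alpha) for n in range(1, n_ranks + 1)]


# Hypothetical frequency for the top-ranked item.
expected = rank_size(1200, 4)
print(expected)  # [1200.0, 600.0, 400.0, 300.0]
```

<p style="text-align: left">With <em>alpha</em> set to 1 the second value is half the first and the third is a third of it; a larger exponent makes the fall-off steeper.</p>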
<p style="text-align: left">
<div id="attachment_255" class="wp-caption aligncenter" style="width: 548px"><a href="http://danieljlewis.org/files/2010/03/RPlot5.png"><img class="size-full wp-image-255 " title="RPlot5" src="http://danieljlewis.org/files/2010/03/RPlot5.png" alt="" width="538" height="537" /></a><p class="wp-caption-text">Plot of Surname frequency for Rank 300 - 6000</p></div>
<p style="text-align: left">It is clear from the above graph that a subset of the full data gives a power law relationship. We can attempt to linearise this relationship by taking the log of the frequency and rank:</p>
<p style="text-align: left"><a href="http://danieljlewis.org/files/2010/03/Rplot1.png"><img class="aligncenter size-full wp-image-256" title="Rplot1" src="http://danieljlewis.org/files/2010/03/Rplot1.png" alt="" width="538" height="537" /></a>The fact that the line is not straight indicates that the relationship is not a true power law. The long tail is accentuated by the stepped line: frequencies are integers, so as we get to increasingly rare surnames the ranks tend to cluster. In the rank-size distribution of cities, the characteristic fall in the long tail when linearised like this indicates that city size distributions are really log-normal; however, this is not the case for surnames. If we exclude some of the long tail, the relationship can look a bit more linear, as this plot demonstrates:</p>
<p style="text-align: center"><a href="http://danieljlewis.org/files/2010/03/Rplot2.png"><img class="aligncenter size-full wp-image-257" title="Rplot2" src="http://danieljlewis.org/files/2010/03/Rplot2.png" alt="" width="538" height="537" /></a></p>
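<p style="text-align: left">The log-log linearisation also gives a quick way to estimate the exponent: if frequency really follows a power law of rank, the slope of log(frequency) against log(rank) is minus <em>alpha</em>. The sketch below is an ordinary least-squares fit in plain Python, not the R code behind the plots above; <code>fit_power_law</code> is a hypothetical helper, and the data are synthetic rather than the Southwark surnames.</p>

```python
import math


def fit_power_law(frequencies):
    """Least-squares fit of log(frequency) on log(rank).

    Returns (alpha, intercept), where alpha is minus the fitted slope.
    Ties are given consecutive ranks, as in a simple sorted ranking.
    """
    ordered = sorted(frequencies, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log(freq) for freq in ordered]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, intercept


# Synthetic frequencies that follow an exact power law with alpha = 1.
synthetic = [1000.0 / rank for rank in range(1, 51)]
alpha, intercept = fit_power_law(synthetic)
print(round(alpha, 6), round(math.exp(intercept), 3))
```

<p style="text-align: left">On real surname counts the stepped long tail would pull the fitted slope around, which is one reason to compare fits with and without the tail.</p>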
<p style="text-align: left">
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2010/03/05/some-surname-based-rank-size-thoughts/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Analysis of Surnames from Southwark Patient Register</title>
		<link>http://danieljlewis.org/2010/03/03/analysis-of-surnames-from-southwark-patient-register/</link>
		<comments>http://danieljlewis.org/2010/03/03/analysis-of-surnames-from-southwark-patient-register/#comments</comments>
		<pubDate>Wed, 03 Mar 2010 15:32:51 +0000</pubDate>
		<dc:creator>Daniel Lewis</dc:creator>
				<category><![CDATA[Southwark]]></category>
		<category><![CDATA[Thoughts]]></category>
		<category><![CDATA[James Cheshire]]></category>
		<category><![CDATA[population]]></category>
		<category><![CDATA[surnames]]></category>
		<category><![CDATA[top 20]]></category>

		<guid isPermaLink="false">http://danieljlewis.org/?p=243</guid>
		<description><![CDATA[My colleague James Cheshire&#8217;s research deals with understanding and classifying spatial patterns in surnames. He has been able to show, through various techniques, that there exists in the UK a regional geography of surnames. This in mind, I thought I&#8217;d interrogate my database of NHS patient registrations for Southwark and see what was going on [...]]]></description>
			<content:encoded><![CDATA[
<p>My colleague <a title="JC's Blog" href="http://spatialanalysis.co.uk/" target="_blank">James Cheshire&#8217;s</a> research deals with understanding and classifying spatial patterns in surnames. He has been able to show, through various techniques, that there exists in the UK a regional geography of surnames. With this in mind, I thought I&#8217;d interrogate my database of NHS patient registrations for Southwark and see what was going on in surname terms there. This first table shows the top 20 most popular surnames in Southwark, ranked by occurrence.</p>
<div id="attachment_247" class="wp-caption aligncenter" style="width: 430px"><a href="http://danieljlewis.org/files/2010/03/Top20namesSouthwark.png"><img class="size-full wp-image-247" title="Top20namesSouthwark" src="http://danieljlewis.org/files/2010/03/Top20namesSouthwark.png" alt="" width="420" height="421" /></a><p class="wp-caption-text">Figure 1: Top 20 Surnames in Southwark, by occurrence.</p></div>
<p>Unsurprisingly perhaps, the top places are dominated by surnames native to the UK, classically Smith, Williams, Jones etc. However, in line with Southwark&#8217;s reputation as a diverse borough and in light of its high in-migration figures, it is also clear that some of these top 20 surnames are connected to in-migration: Kamara, Ahmed, Ali, Patel and Khan are all surnames that are increasingly associated with a previous period of migration to the UK. Interestingly the Vietnamese population is very small, less than 1% of the population of Southwark, but around 23% of these have the surname &#8216;Nguyen&#8217;. The ethnicity of the surnames is derived from <a title="Onomap" href="http://www.onomap.org/" target="_blank">Onomap</a>.</p>
<p>The frequency distribution of Southwark surnames looks like this:</p>
<div id="attachment_246" class="wp-caption aligncenter" style="width: 584px"><a href="http://danieljlewis.org/files/2010/03/SurnameFreq.png"><img class="size-large wp-image-246" title="SurnameFreq" src="http://danieljlewis.org/files/2010/03/SurnameFreq-1024x416.png" alt="" width="574" height="233" /></a><p class="wp-caption-text">Figure 2: Surname Frequency Distribution for Southwark, 2009</p></div>
<p style="text-align: left">Note the characteristic long tail: there are a huge number of unique, or almost unique, surnames, and considerably fewer surnames which are possessed by a large number of people. Such a distribution seems to obey a <a title="Wiki Power Law" href="http://en.wikipedia.org/wiki/Power_law" target="_blank">power law</a> of some sort.</p>
<p style="text-align: left">We can dig deeper into this phenomenon by looking at the number of surnames that comprise a given percentage of the population:</p>
<div id="attachment_245" class="wp-caption aligncenter" style="width: 530px"><a href="http://danieljlewis.org/files/2010/03/PopSurnametablegraph.png"><img class="size-full wp-image-245" title="PopSurnametablegraph" src="http://danieljlewis.org/files/2010/03/PopSurnametablegraph.png" alt="" width="520" height="213" /></a><p class="wp-caption-text">Figure 3: Surnames comprising given percentages of the Southwark Population</p></div>
<p style="text-align: left">As we can see from the above figure, only 56 names account for 10% of the Southwark population, yet in total there are 88,124 distinct surnames in Southwark. Again there is a characteristic decay to the curve.</p>
<p style="text-align: left">Finally, let us consider just the characteristics of the long tail of the distribution:</p>
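<p style="text-align: left">The count of surnames needed to cover a given share of the population can be sketched in a few lines of Python. This is an illustrative reconstruction rather than the query I ran against the patient register; <code>names_needed</code> is a hypothetical helper and the toy list of surnames is made up.</p>

```python
from collections import Counter


def names_needed(surnames, share):
    """Smallest number of most-common surnames covering at least `share` of people."""
    counts = Counter(surnames)
    target = share * len(surnames)
    covered = 0
    # Walk the surnames from most to least common, accumulating people covered.
    for needed, (_, count) in enumerate(counts.most_common(), start=1):
        covered += count
        if covered >= target:
            return needed
    return len(counts)


# Toy register of 10 people, one surname per person.
register = ["Smith"] * 5 + ["Jones"] * 3 + ["Khan"] + ["Nguyen"]
for share in (0.5, 0.8, 1.0):
    print(share, names_needed(register, share))
```

<p style="text-align: left">Run over a full register, the same loop gives the kind of table shown in figure 3.</p>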
<div id="attachment_244" class="wp-caption aligncenter" style="width: 560px"><a href="http://danieljlewis.org/files/2010/03/longtailsurnamegraphtable.png"><img class="size-full wp-image-244" title="longtailsurnamegraphtable" src="http://danieljlewis.org/files/2010/03/longtailsurnamegraphtable.png" alt="" width="550" height="221" /></a><p class="wp-caption-text">Figure 4: Focus on the long-tail - percentage population for given surname frequencies.</p></div>
<p style="text-align: left">From figure 4 it is clear that almost 25% of the Southwark population have a surname that is shared by fewer than 11 people; indeed just over 16% of the Southwark population have a surname unique to the Southwark patient register. The shape of the curve in figure 4 demonstrates the effect of the long tail seen in figure 2.</p>
<p style="text-align: left">For more information on surnames research check out <a title="JC's Blog" href="http://spatialanalysis.co.uk/" target="_blank">James Cheshire&#8217;s blog</a>, <a title="JC's WP 149" href="http://www.casa.ucl.ac.uk/publications/workingPaperDetail.asp?ID=149" target="_blank">working paper</a> or <a title="Pablo's WP 116" href="http://www.casa.ucl.ac.uk/publications/workingPaperDetail.asp?ID=116" target="_blank">Pablo Mateos&#8217; working paper</a>.</p>
<p style="text-align: left">
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2010/03/03/analysis-of-surnames-from-southwark-patient-register/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Southwark Households &#8211; A Preliminary</title>
		<link>http://danieljlewis.org/2010/02/12/southwark-households-a-preliminary/</link>
		<comments>http://danieljlewis.org/2010/02/12/southwark-households-a-preliminary/#comments</comments>
		<pubDate>Fri, 12 Feb 2010 19:28:55 +0000</pubDate>
		<dc:creator>Daniel Lewis</dc:creator>
				<category><![CDATA[Health GIS]]></category>
		<category><![CDATA[Modeling]]></category>
		<category><![CDATA[PhD Work]]></category>
		<category><![CDATA[Southwark]]></category>
		<category><![CDATA[address matching]]></category>
		<category><![CDATA[geocoding]]></category>
		<category><![CDATA[households]]></category>
		<category><![CDATA[occupancy]]></category>

		<guid isPermaLink="false">http://danieljlewis.org/?p=186</guid>
		<description><![CDATA[I&#8217;ve spent a chunk of time recently address geocoding the Southwark PCT patient register to Ordnance Survey Address Layer 2 data. What this means is that I can start identifying and (later) classifying households, this will allow me to ask questions about how different households approach healthcare. More broadly it allows me an insight into [...]]]></description>
			<content:encoded><![CDATA[
<p>I&#8217;ve spent a chunk of time recently address geocoding the Southwark PCT patient register to Ordnance Survey Address Layer 2 data. What this means is that I can start identifying and (later) classifying households, this will allow me to ask questions about how different households approach healthcare. More broadly it allows me an insight into the demographic character of Southwark.</p>
<p>The data actually extends past the Southwark boundary, as people in Lambeth, Lewisham, Bromley and Croydon also to some extent use Southwark primary healthcare services (GPs). This means that although Southwark&#8217;s population is only c.300,000, the dataset I&#8217;m using covers just over 340,000 people. There is naturally some uncertainty in the data, which results from the two datasets used. On the one hand, addresses recorded in the Southwark patient register are not all necessarily complete; for example, there is sometimes a failure to record which particular subdivision of a house someone lives in, or which flat in a larger block of social housing. On the other hand, the Address Layer 2 data, although very rich, is not necessarily complete either. This could be due to the presence of unacknowledged subdivisions in residential housing, and although most social housing estates seem well documented, some commercial developments are not necessarily registered beyond the building level. Similarly, there are a number of instances of social institutions, such as the Salvation Army and St. Mungo&#8217;s, or marinas and dormitories, having a single registered address for a high number of residents. This may have the effect of skewing the data slightly. With this in mind I created the following graph from the dataset, showing the number of households against the number of inhabitants per household:</p>
<p style="text-align: left"><a href="http://danieljlewis.org/files/2010/02/households.png"><img class="aligncenter size-full wp-image-187" title="households" src="http://danieljlewis.org/files/2010/02/households.png" alt="" width="512" height="318" /></a>This shows that there is still a major trend for single-person households, but equally that around a quarter of all households are co-habited. The long tail in the graph (which I have truncated here) is caused by a few special cases, some examples of which are acknowledged in the previous paragraph. The average household size of 3.10 is itself higher than the <a title="Housing focus 2001 census" href="http://www.statistics.gov.uk/census2001/profiles/commentaries/housing.asp" target="_blank">UK average household size</a> reported after the 2001 census, which was 2.36; at the time the borough of Newham in East London had the highest household occupancy rate at 2.64. Of course there are any number of reasons why these data are not comparable. To start with, the census took place 8 years before the Southwark dataset was created. The uncertainty in the Southwark dataset is also higher, as it was not created with the primary purpose of successfully locating all patients (more often than not patients go to the Doctor and not vice-versa), whereas the census is distributed at a household level to each individual. The Southwark dataset also includes particularly transient communities which are missed by the census, such as the homeless, who don&#8217;t have a fixed address (and hence may be using shelter or hostel addresses) but still require medical treatment at times.</p>
<p style="text-align: left">Nevertheless, an interesting first look. The next steps will involve evaluating and validating the dataset to the best of my ability and then moving on to look at ways of examining and classifying household structure.</p>
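<p style="text-align: left">For anyone curious about the mechanics, the household counts behind a graph like the one above can be sketched in Python as follows. This is a simplified illustration, not my actual processing: it assumes every patient has been matched to exactly one address identifier and treats each address as one household, and the function names and identifiers are made up.</p>

```python
from collections import Counter


def household_sizes(address_ids):
    """Map each address identifier to the number of people registered there."""
    return Counter(address_ids)


def size_distribution(address_ids):
    """Number of households observed for each household size."""
    return Counter(household_sizes(address_ids).values())


def mean_household_size(address_ids):
    """People per occupied address."""
    return len(address_ids) / len(household_sizes(address_ids))


# One matched address identifier per patient (made-up values).
patients = ["A1", "A1", "A2", "A3", "A3", "A3", "A4"]
print(sorted(size_distribution(patients).items()))
print(mean_household_size(patients))
```

<p style="text-align: left">Institutional addresses with many residents would show up here as very large &#8216;households&#8217;, which is exactly the long tail discussed above.</p>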
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2010/02/12/southwark-households-a-preliminary/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

