<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Volunteered Geographic Information &#187; log</title>
	<atom:link href="http://danieljlewis.org/tag/log/feed/" rel="self" type="application/rss+xml" />
	<link>http://danieljlewis.org</link>
	<description>A Geography/GIS blog by Daniel J Lewis</description>
	<lastBuildDate>Tue, 20 Dec 2011 17:15:30 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.4-alpha-20124</generator>
		<item>
		<title>Distribution of Household Occupancy in Southwark</title>
		<link>http://danieljlewis.org/2010/06/09/distribution-of-household-occupancy-in-southwark/</link>
		<comments>http://danieljlewis.org/2010/06/09/distribution-of-household-occupancy-in-southwark/#comments</comments>
		<pubDate>Wed, 09 Jun 2010 14:19:05 +0000</pubDate>
		<dc:creator>Daniel Lewis</dc:creator>
				<category><![CDATA[Geography]]></category>
		<category><![CDATA[Southwark]]></category>
		<category><![CDATA[distribution]]></category>
		<category><![CDATA[exponential decay]]></category>
		<category><![CDATA[households]]></category>
		<category><![CDATA[log]]></category>
		<category><![CDATA[social]]></category>

		<guid isPermaLink="false">http://danieljlewis.org/?p=355</guid>
		<description><![CDATA[I&#8217;ve been doing some more analysis on the Southwark GP patient register at the household level. After a fair amount of cleaning and interpretation I&#8217;ve arrived at the following distribution of households. There are a number of interesting things to say about this data, not least in the section that I&#8217;ve marked &#8216;larger social groupings&#8217; [...]]]></description>
			<content:encoded><![CDATA[
<p>I&#8217;ve been doing some more analysis on the Southwark GP patient register at the household level. After a fair amount of cleaning and interpretation I&#8217;ve arrived at the following distribution of households.</p>
<p style="text-align: center"><a href="http://danieljlewis.org/files/2010/06/HHDistAnnotate.png"><img class="aligncenter size-full wp-image-356" title="HHDistAnnotate" src="http://danieljlewis.org/files/2010/06/HHDistAnnotate.png" alt="" width="578" height="380" /></a></p>
<p style="text-align: left">There are a number of interesting things to say about this data, not least the section I&#8217;ve marked &#8216;larger social groupings&#8217;. It seems to suggest a possible migrant social network effect: the larger household groupings tend to belong to minority ethnic groups, including Nigerians and other Africans, Hispanics and South-East Asians, who are perhaps using cross-country social ties to help them get established when they first arrive in the UK. Visually, however, the shape of the distribution of household occupancy is very distinctive, and is in fact very close to an exponential. Here I&#8217;ve taken the log of the frequency of occurrence and plotted the best-fit line through it:</p>
<p style="text-align: left"><a href="http://danieljlewis.org/files/2010/06/LogHHDist.png"><img class="aligncenter size-large wp-image-358" title="LogHHDist" src="http://danieljlewis.org/files/2010/06/LogHHDist-1024x682.png" alt="" width="574" height="382" /></a>This linear trend means that the model <strong>log(y) = -0.1635x + 4.602</strong> is a good predictor of the number of households (y) we can expect to find in Southwark for a given occupancy (x).</p>
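<p style="text-align: left">A minimal Python sketch of the model above, for illustration only. It assumes natural logarithms (the base is not stated), and the helper names <code>predicted_households</code> and <code>fit_log_linear</code> are hypothetical. Fitting a straight line to the log of the counts is equivalent to fitting an exponential to the raw counts; under the natural-log assumption, each additional occupant multiplies the expected number of households by e<sup>-0.1635</sup>, roughly 0.85.</p>

```python
import math

# Coefficients quoted in the post for log(y) = SLOPE * x + INTERCEPT.
# Assumption: the logarithm is natural; pass base=10 if it was log10.
SLOPE = -0.1635
INTERCEPT = 4.602


def predicted_households(occupancy, base=math.e):
    """Expected number of households with the given occupancy x."""
    return base ** (SLOPE * occupancy + INTERCEPT)


def fit_log_linear(xs, counts):
    """Least-squares fit of ln(count) against x; returns (slope, intercept).

    Occupancies with a count of zero must be dropped first,
    because the log of zero is undefined.
    """
    logs = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(logs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, logs))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical check: counts generated from the model itself
# (not the Southwark data) should give back the same coefficients.
occupancies = list(range(1, 16))
synthetic = [predicted_households(x) for x in occupancies]
print(fit_log_linear(occupancies, synthetic))
```

<p style="text-align: left">With real register data the recovered slope and intercept would of course differ from run to run of the cleaning process; the synthetic check only confirms that the fitting step is wired correctly.</p>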
<p style="text-align: left">It is not entirely clear, however, why this should be the case. Firstly, it may just be an artifact of the data: of the matching process between the patient register and OS AddressLayer2, of the way GPs encode patient addresses in the first place, or of the fact that the patient register is only a sample of Southwark&#8217;s total population, i.e. those people who register with a doctor. Secondly, it may simply reflect the structure of the built environment in Southwark, i.e. what kind of housing is actually available. However, the distribution is also subject to the choices of individuals or groups.</p>
<p style="text-align: left">I am currently disaggregating the above characteristics and looking at trends by population group.</p>
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2010/06/09/distribution-of-household-occupancy-in-southwark/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Some Surname-based Rank-Size thoughts</title>
		<link>http://danieljlewis.org/2010/03/05/some-surname-based-rank-size-thoughts/</link>
		<comments>http://danieljlewis.org/2010/03/05/some-surname-based-rank-size-thoughts/#comments</comments>
		<pubDate>Fri, 05 Mar 2010 14:24:27 +0000</pubDate>
		<dc:creator>Daniel Lewis</dc:creator>
				<category><![CDATA[Southwark]]></category>
		<category><![CDATA[Thoughts]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[log]]></category>
		<category><![CDATA[power law]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[rank-size]]></category>
		<category><![CDATA[surnames]]></category>
		<category><![CDATA[zipf]]></category>

		<guid isPermaLink="false">http://danieljlewis.org/?p=249</guid>
		<description><![CDATA[Yesterday Professor Mike Batty introduced me to the rank-size rule, an idea popularised by George Kingsley Zipf, which relates the frequency of an observed phenomenon to the phenomenon&#8217;s rank within its group. This is best illustrated by city sizes: essentially Zipf shows that for every really large city, there exist [...]]]></description>
			<content:encoded><![CDATA[
<p>Yesterday <a title="Mike Batty" href="http://www.casa.ucl.ac.uk/people/MikesPage.htm" target="_blank">Professor Mike Batty</a> introduced me to the rank-size rule, an idea popularised by <a title="Zipf - Wikipedia" href="http://en.wikipedia.org/wiki/George_Kingsley_Zipf" target="_blank">George Kingsley Zipf</a>, which relates the frequency of an observed phenomenon to the phenomenon&#8217;s rank within its group. This is best illustrated by city sizes: essentially Zipf shows that for every really large city there exist many smaller ones. However, these smaller cities aren&#8217;t just a bit smaller than the large city, they are considerably smaller; in fact, the difference in size from the biggest cities to the smallest can be described by a power law, which can be represented as:</p>
<p style="text-align: left"><a href="http://danieljlewis.org/files/2010/03/CodeCogsEqn2.gif"><img class="aligncenter size-full wp-image-250" title="CodeCogsEqn(2)" src="http://danieljlewis.org/files/2010/03/CodeCogsEqn2.gif" alt="" width="85" height="49" /></a></p>
<p style="text-align: left">where P<sub>n</sub> is the frequency of occurrence of the phenomenon ranked nth, and the exponent <em>alpha</em> is usually roughly equal to 1.</p>
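<p style="text-align: left">To make the rule concrete, here is a small Python sketch. It assumes the common form of the rank-size rule, P<sub>n</sub> = P<sub>1</sub> / n<sup>alpha</sup>, where P<sub>1</sub> is the frequency of the top-ranked item; the function name <code>rank_size</code> and the top frequency of 1000 are hypothetical.</p>

```python
def rank_size(p1, n, alpha=1.0):
    """Frequency predicted by the rank-size rule for the item ranked n.

    p1 is the frequency of the top-ranked item; alpha is the exponent,
    usually roughly equal to 1.
    """
    return p1 / n ** alpha


# With alpha = 1, the 2nd item is 1/2 the size of the 1st,
# the 3rd is 1/3 the size of the 1st, and so on.
top = 1000  # hypothetical frequency of the most common item
for rank in range(1, 6):
    print(rank, rank_size(top, rank))
```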
<p style="text-align: left">With <em>alpha</em> equal to 1, the power law thus produces a plot where the 2nd item is 1/2 the size of the 1st, the 3rd item is 1/3 the size of the 1st, and so on. This can be seen in a plot of surname frequency by rank in Southwark:</p>
<div id="attachment_251" class="wp-caption aligncenter" style="width: 548px"><a href="http://danieljlewis.org/files/2010/03/Rplot3.png"><img class="size-full wp-image-251" title="Rplot3" src="http://danieljlewis.org/files/2010/03/Rplot3.png" alt="" width="538" height="537" /></a><p class="wp-caption-text">Plot of surname frequency against rank in Southwark for all observed surnames (using R)</p></div>
<p style="text-align: left">It is clear from the graph that there are very few surnames which are popular and many which are relatively rare. Another interesting characteristic of a power law, such as the relationship between surname frequency and rank, is that it is self-similar: if we examine any portion of the curve we should get the same curve, albeit at a different scale.</p>
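<p style="text-align: left">Self-similarity can be checked numerically: for a pure power law the ratio P<sub>kn</sub> / P<sub>n</sub> equals k<sup>-alpha</sup> whatever the starting rank n, so zooming in on any portion of the curve only rescales it. The sketch below assumes the form P<sub>n</sub> = P<sub>1</sub> / n<sup>alpha</sup>; the function names and the value P<sub>1</sub> = 1000 are hypothetical.</p>

```python
def power_law(n, p1=1000.0, alpha=1.0):
    """Rank-size frequency P_n = p1 / n**alpha (hypothetical parameters)."""
    return p1 / n ** alpha


def scale_ratio(n, k, alpha=1.0):
    """Ratio P_(k*n) / P_n: how much the curve shrinks when rank is multiplied by k."""
    return power_law(k * n, alpha=alpha) / power_law(n, alpha=alpha)


# Doubling the rank halves the frequency (alpha = 1) wherever we start,
# so a window over ranks 10-20 has the same shape as one over 300-600.
for start in (1, 10, 300):
    print(start, scale_ratio(start, 2))
```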
<div id="attachment_255" class="wp-caption aligncenter" style="width: 548px"><a href="http://danieljlewis.org/files/2010/03/RPlot5.png"><img class="size-full wp-image-255 " title="RPlot5" src="http://danieljlewis.org/files/2010/03/RPlot5.png" alt="" width="538" height="537" /></a><p class="wp-caption-text">Plot of surname frequency for ranks 300 to 6000</p></div>
<p style="text-align: left">The graph above shows that a subset of the full data also follows a power-law relationship. We can attempt to linearise this relationship by taking the log of both frequency and rank:</p>
<p style="text-align: left"><a href="http://danieljlewis.org/files/2010/03/Rplot1.png"><img class="aligncenter size-full wp-image-256" title="Rplot1" src="http://danieljlewis.org/files/2010/03/Rplot1.png" alt="" width="538" height="537" /></a>The fact that the line is not straight indicates that the relationship is not a true power law. The long tail is accentuated by the stepped line: frequencies are integers, so as surnames become increasingly rare, many of them share the same frequency and their ranks cluster into flat steps. In the rank-size distribution of cities, the characteristic fall in the long tail when linearised like this indicates that city size distributions are really log-normal; however, this is not the case for surnames. If we exclude some of the long tail, the relationship looks a bit more linear, as this plot demonstrates:</p>
<p style="text-align: center"><a href="http://danieljlewis.org/files/2010/03/Rplot2.png"><img class="aligncenter size-full wp-image-257" title="Rplot2" src="http://danieljlewis.org/files/2010/03/Rplot2.png" alt="" width="538" height="537" /></a></p>
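<p style="text-align: left">A minimal Python sketch of the steps above, assuming a plain list of surname strings as input; it is an illustration in Python rather than the R code used for the plots, and the function names and toy list are hypothetical, not the Southwark register. It ranks surnames by frequency, takes logs of rank and frequency, and fits a least-squares line whose slope estimates -alpha. The optional <code>max_rank</code> argument drops the long tail, as in the last plot. Surnames sharing the same frequency simply receive consecutive ranks, which is what produces the stepped tail.</p>

```python
import math
from collections import Counter


def rank_frequency(surnames):
    """Return surname frequencies sorted from most to least common (rank 1 first)."""
    return [count for _, count in Counter(surnames).most_common()]


def fit_loglog(frequencies, max_rank=None):
    """Least-squares slope and intercept of log(frequency) on log(rank).

    For a power law P_n = P_1 / n**alpha the slope is -alpha.
    max_rank, if given, excludes the long tail beyond that rank.
    """
    if max_rank is not None:
        frequencies = frequencies[:max_rank]
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(f) for f in frequencies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy input: counts built to follow 60 / rank exactly for ranks 1-6.
toy = (["Smith"] * 60 + ["Jones"] * 30 + ["Brown"] * 20
       + ["Taylor"] * 15 + ["Wilson"] * 12 + ["Evans"] * 10)
freqs = rank_frequency(toy)
slope, intercept = fit_loglog(freqs)
print(freqs, round(-slope, 3))  # estimated alpha
```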
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2010/03/05/some-surname-based-rank-size-thoughts/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

