<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Volunteered Geographic Information &#187; R</title>
	<atom:link href="http://danieljlewis.org/tag/r/feed/" rel="self" type="application/rss+xml" />
	<link>http://danieljlewis.org</link>
	<description>A Geography/GIS blog by Daniel J Lewis</description>
	<lastBuildDate>Tue, 20 Dec 2011 17:15:30 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Some Surname-based Rank-Size thoughts</title>
		<link>http://danieljlewis.org/2010/03/05/some-surname-based-rank-size-thoughts/</link>
		<comments>http://danieljlewis.org/2010/03/05/some-surname-based-rank-size-thoughts/#comments</comments>
		<pubDate>Fri, 05 Mar 2010 14:24:27 +0000</pubDate>
		<dc:creator>Daniel Lewis</dc:creator>
				<category><![CDATA[Southwark]]></category>
		<category><![CDATA[Thoughts]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[log]]></category>
		<category><![CDATA[power law]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[rank-size]]></category>
		<category><![CDATA[surnames]]></category>
		<category><![CDATA[zipf]]></category>

		<guid isPermaLink="false">http://danieljlewis.org/?p=249</guid>
		<description><![CDATA[Yesterday Professor Mike Batty introduced me to the rank-size rule, an idea popularised by George Kingsley Zipf as the relationship between the frequency of an observed phenomenon and the phenomenon&#8217;s rank in its group. This is best exemplified by city sizes: essentially Zipf shows that for every really large city, there exist [...]]]></description>
			<content:encoded><![CDATA[
<p>Yesterday <a title="Mike Batty" href="http://www.casa.ucl.ac.uk/people/MikesPage.htm" target="_blank">Professor Mike Batty</a> introduced me to the rank-size rule, an idea popularised by <a title="Zipf - Wikipedia" href="http://en.wikipedia.org/wiki/George_Kingsley_Zipf" target="_blank">George Kingsley Zipf</a> as the relationship between the frequency of an observed phenomenon and the phenomenon&#8217;s rank in its group. This is best exemplified by city sizes: essentially Zipf shows that for every really large city, there exist many smaller ones. These smaller cities aren&#8217;t just a bit smaller than the large city; they are considerably smaller. In fact, the difference in city size from the biggest cities to the smallest can be described by a power law, which can be represented as:</p>
<p style="text-align: left"><a href="http://danieljlewis.org/files/2010/03/CodeCogsEqn2.gif"><img class="aligncenter size-full wp-image-250" title="CodeCogsEqn(2)" src="http://danieljlewis.org/files/2010/03/CodeCogsEqn2.gif" alt="" width="85" height="49" /></a></p>
<p style="text-align: left">Where P<sub>n</sub> is the frequency of occurrence of a phenomenon ranked <em>n</em>th, and the exponent <em>alpha</em> is usually roughly equal to 1.</p>
<p style="text-align: left">When <em>alpha</em> is 1, the power law thus produces a plot where the 2nd item is 1/2 the size of the 1st, the 3rd item is 1/3 the size of the 1st, and so on. This can be seen in a plot of surname frequency in Southwark by rank.</p>
<div id="attachment_251" class="wp-caption aligncenter" style="width: 548px"><a href="http://danieljlewis.org/files/2010/03/Rplot3.png"><img class="size-full wp-image-251" title="Rplot3" src="http://danieljlewis.org/files/2010/03/Rplot3.png" alt="" width="538" height="537" /></a><p class="wp-caption-text">Plot of Surname Frequency against Rank in Southwark for all observed surnames (using R)</p></div>
<p style="text-align: left">It is clear from the graph that very few surnames are popular and many are relatively rare. Another interesting characteristic of a power law, such as the relationship between surname frequency and rank, is that it is self-similar: if we examine any portion of the curve we should get the same curve, albeit at a different scale.</p>
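<p style="text-align: left">The rank-size rule and its self-similarity can be checked with a few lines of arithmetic. This is only a sketch, written in Python rather than R; the function name <code>rank_size</code> and the top frequency of 1200 are made up for illustration and are not taken from the Southwark data.</p>

```python
def rank_size(top_frequency, rank, alpha=1.0):
    """Frequency predicted for a given rank by the rank-size rule."""
    return top_frequency / rank ** alpha


top = 1200  # hypothetical count for the most common surname

# With alpha = 1 the 2nd item is 1/2 of the 1st, the 3rd is 1/3, and so on.
predicted = [rank_size(top, n) for n in range(1, 6)]

# Self-similarity: doubling the rank halves the frequency, wherever we start.
ratios = [rank_size(top, 2 * k) / rank_size(top, k) for k in (1, 10, 100)]

print(predicted)
print(ratios)
```

<p style="text-align: left">The ratios are identical at ranks 1, 10 and 100, which is why any portion of the curve has the same shape as the whole.</p>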
<p style="text-align: left">
<div id="attachment_255" class="wp-caption aligncenter" style="width: 548px"><a href="http://danieljlewis.org/files/2010/03/RPlot5.png"><img class="size-full wp-image-255 " title="RPlot5" src="http://danieljlewis.org/files/2010/03/RPlot5.png" alt="" width="538" height="537" /></a><p class="wp-caption-text">Plot of Surname frequency for Rank 300 - 6000</p></div>
<p style="text-align: left">It is clear from the above graph that a subset of the full data also shows a power law relationship. We can attempt to linearise this relationship by taking the logarithm of both the frequency and the rank:</p>
<p style="text-align: left"><a href="http://danieljlewis.org/files/2010/03/Rplot1.png"><img class="aligncenter size-full wp-image-256" title="Rplot1" src="http://danieljlewis.org/files/2010/03/Rplot1.png" alt="" width="538" height="537" /></a>The fact that the line is not straight indicates that the relationship is not a true power law. The long tail is accentuated by the stepped line: frequencies are integers, so as we reach increasingly rare surnames the ranks tend to cluster at the same frequency. In the rank-size distribution of cities, the characteristic fall in the long tail when linearised like this indicates that city size distributions are really log-normal; however, this is not the case for surnames. If we exclude some of the long tail, the relationship can look a bit more linear, as this plot demonstrates:</p>
<p style="text-align: center"><a href="http://danieljlewis.org/files/2010/03/Rplot2.png"><img class="aligncenter size-full wp-image-257" title="Rplot2" src="http://danieljlewis.org/files/2010/03/Rplot2.png" alt="" width="538" height="537" /></a></p>
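<p style="text-align: left">One rough way to judge how linear the log-log relationship is would be to fit a straight line by least squares to log frequency against log rank and read the exponent off the slope. The sketch below is in Python rather than R and runs on synthetic data, not the Southwark surnames; the function name <code>fit_power_law</code> and its <code>max_rank</code> argument (used to drop part of the long tail) are assumptions made for illustration. A least-squares fit on log-log axes is a quick diagnostic, not a rigorous test for a power law.</p>

```python
import math


def fit_power_law(frequencies, max_rank=None):
    """Least-squares fit of log(frequency) against log(rank).

    Returns (alpha, intercept), where alpha is the negated slope.
    max_rank optionally keeps only the top-ranked items.
    """
    ordered = sorted(frequencies, reverse=True)[:max_rank]
    xs = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log(freq) for freq in ordered]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return -slope, mean_y - slope * mean_x


# Synthetic frequencies that follow an exact power law with alpha = 1.
synthetic = [1000 / n for n in range(1, 201)]

alpha, intercept = fit_power_law(synthetic)
alpha_head, _ = fit_power_law(synthetic, max_rank=50)
print(alpha, math.exp(intercept), alpha_head)
```

<p style="text-align: left">On real integer counts the tied frequencies in the tail pull the fit away from a straight line, so comparing the estimate with and without <code>max_rank</code> gives a feel for how much the tail matters.</p>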
<p style="text-align: left">
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2010/03/05/some-surname-based-rank-size-thoughts/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>LPSolve in R for Transportation Problems</title>
		<link>http://danieljlewis.org/2009/06/24/lpsolve-in-r-for-transportation-problems/</link>
		<comments>http://danieljlewis.org/2009/06/24/lpsolve-in-r-for-transportation-problems/#comments</comments>
		<pubDate>Wed, 24 Jun 2009 13:30:52 +0000</pubDate>
		<dc:creator>Daniel Lewis</dc:creator>
				<category><![CDATA[PhD Work]]></category>
		<category><![CDATA[doctors]]></category>
		<category><![CDATA[GIS]]></category>
		<category><![CDATA[Health GIS]]></category>
		<category><![CDATA[LPSolve]]></category>
		<category><![CDATA[map]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Southwark]]></category>
		<category><![CDATA[transportation problem]]></category>

		<guid isPermaLink="false">http://danieljlewis.org/?p=20</guid>
		<description><![CDATA[Recently I&#8217;ve been doing some work involving the transportation problem. The Transportation Problem is an allocation optimisation problem that requires the optimal assignment of demand, in my case patients by Output Area, to known, fixed, supply points, in my case doctors&#8217; surgeries (General Practices). Rather than use a Euclidean or Manhattan metric to model distance [...]]]></description>
			<content:encoded><![CDATA[
<p>Recently I&#8217;ve been doing some work involving the transportation problem. The Transportation Problem is an allocation optimisation problem that requires the optimal assignment of demand, in my case patients by Output Area, to known, fixed, supply points, in my case doctors&#8217; surgeries (General Practices). Rather than use a Euclidean or Manhattan metric to model distance from the demand site to the supply site, I have used public transport travel times from TfL.</p>
<p>Initially this seemed a difficult task, and early attempts only provided partial, or non-optimal, solutions. However, once I had found the linear programming functionality in R through the lpSolve package, it became very easy to create a model with the constraints I wanted and get a solution very quickly. Key to the success of the R package was the ability to set the constraints I needed: crucially, integer constraints so that people were not subdivided, and constraints on the number of patients doctors could take.</p>
<p>Mapping the outcomes in ArcGIS was straightforward thanks to R&#8217;s built-in CSV export functionality.</p>
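<p>To make the shape of the problem concrete, here is a toy capacitated allocation in Python. It is not the lpSolve model described above: it swaps in a different technique, successive shortest augmenting paths on a min-cost flow network, which also returns whole-number allocations when demands and capacities are integers. The function name <code>solve_transportation</code> and all the numbers are hypothetical.</p>

```python
def solve_transportation(demand, capacity, cost):
    """Allocate integer demand to capacitated sites at minimum total cost.

    Uses successive shortest augmenting paths (a min-cost flow method).
    Returns (allocation, unallocated, total_cost).
    """
    n_areas, n_sites = len(demand), len(capacity)
    source, sink = n_areas + n_sites, n_areas + n_sites + 1
    n_nodes = n_areas + n_sites + 2
    graph = [[] for _ in range(n_nodes)]  # edge ids leaving each node
    edges = []                            # [target, residual, unit cost]
    link = {}                             # (area, site) to forward edge id

    def add_edge(u, v, cap, unit_cost):
        # A forward edge gets an even id; its reverse gets the next odd id.
        graph[u].append(len(edges))
        edges.append([v, cap, unit_cost])
        graph[v].append(len(edges))
        edges.append([u, 0, -unit_cost])

    for i, d in enumerate(demand):
        add_edge(source, i, d, 0)
        for j in range(n_sites):
            link[(i, j)] = len(edges)
            add_edge(i, n_areas + j, d, cost[i][j])
    for j, c in enumerate(capacity):
        add_edge(n_areas + j, sink, c, 0)

    inf = float("inf")
    while True:
        # Bellman-Ford: cheapest residual path from the source to each node.
        dist = [inf] * n_nodes
        prev_edge = [None] * n_nodes
        dist[source] = 0
        for _ in range(n_nodes - 1):
            changed = False
            for u in range(n_nodes):
                if dist[u] == inf:
                    continue
                for e in graph[u]:
                    v, residual, unit_cost = edges[e]
                    if residual > 0 and dist[v] > dist[u] + unit_cost:
                        dist[v] = dist[u] + unit_cost
                        prev_edge[v] = e
                        changed = True
            if not changed:
                break
        if prev_edge[sink] is None:
            break  # every site is full or all demand is placed

        # Walk back along the path, find its bottleneck, then push flow.
        path, v = [], sink
        while v != source:
            path.append(prev_edge[v])
            v = edges[prev_edge[v] ^ 1][0]
        push = min(edges[e][1] for e in path)
        for e in path:
            edges[e][1] -= push
            edges[e ^ 1][1] += push

    # Flow on a forward edge equals the residual of its reverse edge.
    allocation = [[edges[link[(i, j)] ^ 1][1] for j in range(n_sites)]
                  for i in range(n_areas)]
    unallocated = [d - sum(row) for d, row in zip(demand, allocation)]
    total_cost = sum(allocation[i][j] * cost[i][j]
                     for i in range(n_areas) for j in range(n_sites))
    return allocation, unallocated, total_cost


# Hypothetical toy data: patients per area, list sizes per practice,
# and travel times in minutes (rows are areas, columns are practices).
demand = [30, 20, 25]
capacity = [40, 30]
travel_time = [[10, 25], [15, 12], [20, 18]]

allocation, unallocated, total_cost = solve_transportation(
    demand, capacity, travel_time)
print(allocation, unallocated, total_cost)
```

<p>Total demand here (75) exceeds total capacity (70), so five patients in the third area are left unallocated. Raising the capacities until they cover all demand gives the uncapacitated behaviour, where every patient is placed.</p>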
<p>Here is an example of the output.</p>
<p><img class="size-large wp-image-21 alignnone" src="http://danieljlewis.org/files/2009/06/catchment06gpconstraint-724x1024.jpg" alt="catchment06gpconstraint" width="347" height="491" /></p>
<p>The legend denotes the 44 physical practices in the London borough of Southwark; some doctors share a site, so these practices were agglomerated. The grey areas represent unallocated demand caused by capping the size of the General Practices. In another version of the model I set the GPs up as uncapacitated so that all the demand would be satisfied. This model uses data from 2006; I also have data from 2009, for which I will run the model too.</p>
<p>Some earlier work I did on this is available in the Proceedings of GISRUK &#8217;09.</p>
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2009/06/24/lpsolve-in-r-for-transportation-problems/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

