<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Volunteered Geographic Information &#187; surnames</title>
	<atom:link href="http://danieljlewis.org/tag/surnames/feed/" rel="self" type="application/rss+xml" />
	<link>http://danieljlewis.org</link>
	<description>A Geography/GIS blog by Daniel J Lewis</description>
	<lastBuildDate>Tue, 20 Dec 2011 17:15:30 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Some Surname-based Rank-Size thoughts</title>
		<link>http://danieljlewis.org/2010/03/05/some-surname-based-rank-size-thoughts/</link>
		<comments>http://danieljlewis.org/2010/03/05/some-surname-based-rank-size-thoughts/#comments</comments>
		<pubDate>Fri, 05 Mar 2010 14:24:27 +0000</pubDate>
		<dc:creator>Daniel Lewis</dc:creator>
				<category><![CDATA[Southwark]]></category>
		<category><![CDATA[Thoughts]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[log]]></category>
		<category><![CDATA[power law]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[rank-size]]></category>
		<category><![CDATA[surnames]]></category>
		<category><![CDATA[zipt]]></category>

		<guid isPermaLink="false">http://danieljlewis.org/?p=249</guid>
		<description><![CDATA[Yesterday Professor Mike Batty introduced me to the rank-size rule, an idea popularised by George Kingsley Zipf as the relationship between the frequency of an observed phenomenon against the phenomenon&#8217;s rank in its group. This is best exemplified by the example of city sizes: essentially Zipf shows that for every really large city, there exist [...]]]></description>
			<content:encoded><![CDATA[
<p>Yesterday <a title="Mike Batty" href="http://www.casa.ucl.ac.uk/people/MikesPage.htm" target="_blank">Professor Mike Batty</a> introduced me to the rank-size rule, an idea popularised by <a title="Zipf - Wikipedia" href="http://en.wikipedia.org/wiki/George_Kingsley_Zipf" target="_blank">George Kingsley Zipf</a>, which describes the relationship between the frequency of an observed phenomenon and that phenomenon&#8217;s rank in its group. City sizes are the classic example: essentially Zipf shows that for every really large city there exist many smaller ones. These smaller cities aren&#8217;t just a bit smaller than the large city, they are considerably smaller. In fact, the difference in city size from the biggest cities to the smallest can be explained by a power law, which can be represented as:</p>
<p style="text-align: left"><a href="http://danieljlewis.org/files/2010/03/CodeCogsEqn2.gif"><img class="aligncenter size-full wp-image-250" title="CodeCogsEqn(2)" src="http://danieljlewis.org/files/2010/03/CodeCogsEqn2.gif" alt="" width="85" height="49" /></a></p>
<p style="text-align: left">Where Pn is the frequency of occurrence of the phenomenon ranked nth, and the exponent <em>alpha</em> is usually roughly equal to 1.</p>
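<p style="text-align: left">As a rough illustration, the rule can be sketched in a few lines of Python. This assumes the relationship takes the form Pn = P1 / n^alpha, which is consistent with the definitions above; the function name rank_size and the top frequency of 1200 are made up for the example and are not taken from the Southwark data.</p>

```python
def rank_size(top_frequency, n_ranks, alpha=1.0):
    """Expected frequency at each rank, assuming P_n = P_1 / n**alpha."""
    return [top_frequency / rank ** alpha for rank in range(1, n_ranks + 1)]


# Hypothetical top frequency of 1200; not a figure from the patient register.
expected = rank_size(1200, 4)
print(expected)  # prints [1200.0, 600.0, 400.0, 300.0]
```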
<p style="text-align: left">With alpha equal to 1, the power law thus produces a plot where the 2nd item is 1/2 the size of the 1st, the 3rd item is 1/3 the size of the 1st, and so on. This can be illustrated by a plot of surname frequency in Southwark by rank.</p>
<div id="attachment_251" class="wp-caption aligncenter" style="width: 548px"><a href="http://danieljlewis.org/files/2010/03/Rplot3.png"><img class="size-full wp-image-251" title="Rplot3" src="http://danieljlewis.org/files/2010/03/Rplot3.png" alt="" width="538" height="537" /></a><p class="wp-caption-text">Plot of Surname Frequency against Rank in Southwark for all observed surnames (using R)</p></div>
<p style="text-align: left">It is clear from the graph that there are very few surnames which are popular and many which are rare or unique. Another interesting characteristic of power laws, such as the relationship between surname frequency and rank, is that they are self-similar: if we examine any portion of the curve we should get the same curve, albeit at a different scale.</p>
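<p style="text-align: left">One way to see this self-similarity is that, under an exact power law, doubling the rank always divides the frequency by the same factor, wherever on the curve we start. The following is a minimal Python sketch of that property; the function power_law and its scale of 1000 are hypothetical and chosen only for illustration.</p>

```python
def power_law(rank, scale=1000.0, alpha=1.0):
    """Frequency at a given rank under an idealised power law."""
    return scale / rank ** alpha


# Ratio of the frequency at rank 2n to the frequency at rank n,
# evaluated at very different points along the curve.
ratios = [power_law(2 * n) / power_law(n) for n in (1, 10, 100, 1000)]
print(ratios)  # prints [0.5, 0.5, 0.5, 0.5]
```

<p style="text-align: left">With alpha equal to 1 the ratio is one half at every starting rank, which is why any portion of the curve has the same shape as the whole.</p>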
<p style="text-align: left">
<div id="attachment_255" class="wp-caption aligncenter" style="width: 548px"><a href="http://danieljlewis.org/files/2010/03/RPlot5.png"><img class="size-full wp-image-255 " title="RPlot5" src="http://danieljlewis.org/files/2010/03/RPlot5.png" alt="" width="538" height="537" /></a><p class="wp-caption-text">Plot of Surname frequency for Rank 300 - 6000</p></div>
<p style="text-align: left">It is clear from the above graph that a subset of the full data also appears to follow a power law relationship. We can attempt to linearise this relationship by taking the logs of both frequency and rank:</p>
<p style="text-align: left"><a href="http://danieljlewis.org/files/2010/03/Rplot1.png"><img class="aligncenter size-full wp-image-256" title="Rplot1" src="http://danieljlewis.org/files/2010/03/Rplot1.png" alt="" width="538" height="537" /></a>The fact that the line is not straight indicates that the relationship is not a true power law. The long tail is accentuated by the stepped line: frequencies are integers, so as we reach increasingly rare surnames many names share the same frequency and the ranks cluster. In the rank-size distribution of cities, the characteristic fall in the long tail when linearised like this indicates that city size distributions are really log-normal; however, this is not the case for surnames. If we exclude some of the long tail, the relationship can look a bit more linear, as this plot demonstrates:</p>
<p style="text-align: center"><a href="http://danieljlewis.org/files/2010/03/Rplot2.png"><img class="aligncenter size-full wp-image-257" title="Rplot2" src="http://danieljlewis.org/files/2010/03/Rplot2.png" alt="" width="538" height="537" /></a></p>
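<p style="text-align: left">The linearisation can also be checked numerically. The sketch below, in Python rather than the R used for these plots, fits a straight line to log frequency against log rank by ordinary least squares and reads the exponent off the slope. It is run here on synthetic frequencies that follow an exact power law, not on the Southwark data, and the name fit_loglog_slope is made up for the example. A straight-line fit of this kind is only a rough diagnostic, not a formal test of whether data follow a power law.</p>

```python
import math


def fit_loglog_slope(frequencies):
    """Least-squares slope and intercept of log(frequency) against log(rank)."""
    ranked = sorted(frequencies, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(freq) for freq in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic frequencies following an exact power law with alpha = 1.
synthetic = [1000.0 / rank for rank in range(1, 51)]
slope, intercept = fit_loglog_slope(synthetic)
alpha_estimate = -slope  # the slope of the log-log line is minus alpha
print(round(alpha_estimate, 6))
```

<p style="text-align: left">On real frequencies the stepped long tail would pull the fitted slope away from the value seen in the upper ranks, which is one reason to compare fits with and without the tail.</p>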
<p style="text-align: left">
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2010/03/05/some-surname-based-rank-size-thoughts/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Analysis of Surnames from Southwark Patient Register</title>
		<link>http://danieljlewis.org/2010/03/03/analysis-of-surnames-from-southwark-patient-register/</link>
		<comments>http://danieljlewis.org/2010/03/03/analysis-of-surnames-from-southwark-patient-register/#comments</comments>
		<pubDate>Wed, 03 Mar 2010 15:32:51 +0000</pubDate>
		<dc:creator>Daniel Lewis</dc:creator>
				<category><![CDATA[Southwark]]></category>
		<category><![CDATA[Thoughts]]></category>
		<category><![CDATA[James Cheshire]]></category>
		<category><![CDATA[population]]></category>
		<category><![CDATA[surnames]]></category>
		<category><![CDATA[top 20]]></category>

		<guid isPermaLink="false">http://danieljlewis.org/?p=243</guid>
		<description><![CDATA[My colleague James Cheshire&#8217;s research deals with understanding and classifying spatial patterns in surnames. He has been able to show, through various techniques, that there exists in the UK a regional geography of surnames. With this in mind, I thought I&#8217;d interrogate my database of NHS patient registrations for Southwark and see what was going on [...]]]></description>
			<content:encoded><![CDATA[
<p>My colleague <a title="JC's Blog" href="http://spatialanalysis.co.uk/" target="_blank">James Cheshire&#8217;s</a> research deals with understanding and classifying spatial patterns in surnames. He has been able to show, through various techniques, that there exists in the UK a regional geography of surnames. With this in mind, I thought I&#8217;d interrogate my database of NHS patient registrations for Southwark and see what was going on in surname terms there. This first table shows the 20 most popular surnames in Southwark, ranked by occurrence.</p>
<div id="attachment_247" class="wp-caption aligncenter" style="width: 430px"><a href="http://danieljlewis.org/files/2010/03/Top20namesSouthwark.png"><img class="size-full wp-image-247" title="Top20namesSouthwark" src="http://danieljlewis.org/files/2010/03/Top20namesSouthwark.png" alt="" width="420" height="421" /></a><p class="wp-caption-text">Figure 1: Top 20 Surnames in Southwark, by occurrence.</p></div>
<p>Unsurprisingly perhaps, the top places are dominated by surnames native to the UK, classically Smith, Williams, Jones etc. However, in line with Southwark&#8217;s reputation as a diverse borough and in light of its high in-migration figures, it is also clear that some of these top 20 surnames are connected to migrant populations: Kamara, Ahmed, Ali, Patel and Khan are all surnames associated with earlier periods of migration to the UK. Interestingly the Vietnamese population is very small, less than 1% of the population of Southwark, but around 23% of these have the surname &#8216;Nguyen&#8217;. The ethnicity of the surnames is derived from <a title="Onomap" href="http://www.onomap.org/" target="_blank">Onomap</a>.</p>
<p>The frequency distribution of Southwark surnames looks like this:</p>
<div id="attachment_246" class="wp-caption aligncenter" style="width: 584px"><a href="http://danieljlewis.org/files/2010/03/SurnameFreq.png"><img class="size-large wp-image-246" title="SurnameFreq" src="http://danieljlewis.org/files/2010/03/SurnameFreq-1024x416.png" alt="" width="574" height="233" /></a><p class="wp-caption-text">Figure 2: Surname Frequency Distribution for Southwark, 2009</p></div>
<p style="text-align: left">Note the characteristic long tail: there are a huge number of unique, or almost unique, surnames, and considerably fewer surnames which are possessed by a large number of people. Such a distribution seems to obey a <a title="Wiki Power Law" href="http://en.wikipedia.org/wiki/Power_law" target="_blank">power law</a> of some sort.</p>
<p style="text-align: left">We can dig deeper into this phenomenon by looking at the number of surnames that comprise a given percentage of the population:</p>
<div id="attachment_245" class="wp-caption aligncenter" style="width: 530px"><a href="http://danieljlewis.org/files/2010/03/PopSurnametablegraph.png"><img class="size-full wp-image-245" title="PopSurnametablegraph" src="http://danieljlewis.org/files/2010/03/PopSurnametablegraph.png" alt="" width="520" height="213" /></a><p class="wp-caption-text">Figure 3: Surnames comprising given percentages of the Southwark Population</p></div>
<p style="text-align: left">As we can see from the above figure, only 56 names account for 10% of the Southwark population, yet in total there are 88,124 distinct surnames in Southwark. Again there is a characteristic decay to the curve.</p>
<p style="text-align: left">Finally, let us consider just the characteristics of the long tail of the distribution:</p>
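<p style="text-align: left">A figure like &#8220;56 names account for 10% of the population&#8221; comes from walking down the surnames from most to least common until the running total reaches the target share. The Python sketch below shows one way this could be computed; the function names_needed and the toy counts are hypothetical and are not drawn from the patient register.</p>

```python
from collections import Counter


def names_needed(surname_counts, percent):
    """Smallest number of most-common surnames covering `percent` of people."""
    total = sum(surname_counts.values())
    covered = 0
    ranked = surname_counts.most_common()  # most frequent surname first
    for n_names, (_, count) in enumerate(ranked, start=1):
        covered += count
        # Integer comparison avoids floating-point rounding at the threshold.
        if covered * 100 >= percent * total:
            return n_names
    return len(surname_counts)


# Toy register of 10 people; the counts are invented for illustration.
toy_counts = Counter({"Smith": 4, "Jones": 3, "Khan": 2, "Nguyen": 1})
print(names_needed(toy_counts, 50))  # prints 2
```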
<div id="attachment_244" class="wp-caption aligncenter" style="width: 560px"><a href="http://danieljlewis.org/files/2010/03/longtailsurnamegraphtable.png"><img class="size-full wp-image-244" title="longtailsurnamegraphtable" src="http://danieljlewis.org/files/2010/03/longtailsurnamegraphtable.png" alt="" width="550" height="221" /></a><p class="wp-caption-text">Figure 4: Focus on the long-tail - percentage population for given surname frequencies.</p></div>
<p style="text-align: left">From figure 4 it is clear that almost 25% of the Southwark population have a surname that is shared by fewer than 11 people; indeed, just over 16% of the Southwark population have a surname unique to the Southwark patient register. The shape of the curve in figure 4 demonstrates the effect of the long tail seen in figure 2.</p>
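<p style="text-align: left">The long-tail percentages can be computed in a similar spirit: add up the bearers of every surname held by no more than a given number of people and divide by the total population. This is a minimal Python sketch with invented counts; tail_share is a hypothetical name, and a threshold of 10 would correspond to the &#8220;fewer than 11 people&#8221; case above.</p>

```python
from collections import Counter


def tail_share(surname_counts, max_bearers):
    """Percentage of people whose surname has at most `max_bearers` bearers."""
    total = sum(surname_counts.values())
    in_tail = sum(
        count for count in surname_counts.values() if max_bearers >= count
    )
    return 100.0 * in_tail / total


# Toy register of 10 people; the counts are invented for illustration.
toy_counts = Counter({"Smith": 6, "Jones": 2, "Okafor": 1, "Nguyen": 1})
print(tail_share(toy_counts, 1))  # prints 20.0 (people with a unique surname)
print(tail_share(toy_counts, 2))  # prints 40.0
```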
<p style="text-align: left">For more information on surnames research check out <a title="JC's Blog" href="http://spatialanalysis.co.uk/" target="_blank">James Cheshire&#8217;s blog</a>, <a title="JC's WP 149" href="http://www.casa.ucl.ac.uk/publications/workingPaperDetail.asp?ID=149" target="_blank">working paper</a> or <a title="Pablo's WP 116" href="http://www.casa.ucl.ac.uk/publications/workingPaperDetail.asp?ID=116" target="_blank">Pablo Mateos&#8217; working paper</a>.</p>
<p style="text-align: left">
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2010/03/03/analysis-of-surnames-from-southwark-patient-register/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

