<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Volunteered Geographic Information &#187; OAC</title>
	<atom:link href="http://danieljlewis.org/tag/oac/feed/" rel="self" type="application/rss+xml" />
	<link>http://danieljlewis.org</link>
	<description>A Geography/GIS blog by Daniel J Lewis</description>
	<lastBuildDate>Tue, 20 Dec 2011 17:15:30 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>UK OAC map in Python</title>
		<link>http://danieljlewis.org/2010/06/02/uk-oac-map-in-python/</link>
		<comments>http://danieljlewis.org/2010/06/02/uk-oac-map-in-python/#comments</comments>
		<pubDate>Wed, 02 Jun 2010 11:05:57 +0000</pubDate>
		<dc:creator>Daniel Lewis</dc:creator>
				<category><![CDATA[Cartography]]></category>
		<category><![CDATA[GIS]]></category>
		<category><![CDATA[Representation]]></category>
		<category><![CDATA[map]]></category>
		<category><![CDATA[OAC]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[shapely]]></category>
		<category><![CDATA[UK]]></category>

		<guid isPermaLink="false">http://danieljlewis.org/?p=336</guid>
		<description><![CDATA[Here is a quick confirmation that you can use Python to draw very detailed maps; using the previously specified method I was unable to get Python to draw all UK OAs due to their great number (c.220,000) and high complexity (c.50,000,000 vertices). Additionally I was unable to use the generalised OA boundaries for the UK [...]]]></description>
			<content:encoded><![CDATA[
<p>Here is a quick confirmation that you can use Python to draw very detailed maps. Using the previously specified method I was unable to get Python to draw all UK OAs, due to their great number (c.220,000) and high complexity (c.50,000,000 vertices). Additionally, I was unable to use the generalised OA boundaries for the UK from UKBorders, as they contain topological errors that the shapefile reader cannot deal with; ArcGIS is evidently more tolerant of bad topologies. So I extracted all the vertices and fed them into shapely polygons, rather than reading the shapefiles directly into Python, and visualised them in the same way as before. This produced the following:</p>
<p style="text-align: left"><a href="http://danieljlewis.org/files/2010/06/UKOAC.png"><img class="aligncenter size-large wp-image-337" title="UKOAC" src="http://danieljlewis.org/files/2010/06/UKOAC-640x1024.png" alt="" width="576" height="922" /></a>This method has had an impact on the speed of computation: it takes roughly 25 minutes to output this map. The map looks pretty good, aside from a slightly odd polygon in the Bristol Channel. Nevertheless, coupled with the operations that shapely and other geo-libraries can perform, this is an increasing indication of the maturity of GIS on a variety of platforms. Oh, and it&#8217;s all free!</p>
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2010/06/02/uk-oac-map-in-python/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>More Thematic Maps in Python &#8211; shapely and descartes</title>
		<link>http://danieljlewis.org/2010/05/27/more-thematic-maps-in-python-shapely-and-descartes/</link>
		<comments>http://danieljlewis.org/2010/05/27/more-thematic-maps-in-python-shapely-and-descartes/#comments</comments>
		<pubDate>Thu, 27 May 2010 16:58:14 +0000</pubDate>
		<dc:creator>Daniel Lewis</dc:creator>
				<category><![CDATA[Representation]]></category>
		<category><![CDATA[descartes]]></category>
		<category><![CDATA[matplotlib]]></category>
		<category><![CDATA[OAC]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[shapely]]></category>
		<category><![CDATA[Wales]]></category>

		<guid isPermaLink="false">http://danieljlewis.org/?p=326</guid>
		<description><![CDATA[Thanks to Sean Gillies for commenting on my last post, he put me onto a couple of Python packages that he&#8217;s been involved in creating that allow you to do some really excellent geospatial things. The shapely package is a great implementation of a lot of spatial analyses that you can do on projected (i.e. [...]]]></description>
			<content:encoded><![CDATA[
<p>Thanks to <a title="Sean Gillies Homepage" href="http://sgillies.net/" target="_blank">Sean Gillies</a> for commenting on my last post; he put me onto a couple of Python packages that he&#8217;s been involved in creating, which allow you to do some really excellent geospatial things. The <a title="shapely" href="http://trac.gispython.org/lab/wiki/Shapely" target="_blank">shapely</a> package is a great implementation of a lot of spatial analyses that you can do on projected (i.e. flattened) datasets, including topological operations and a full set of object types. The <a title="Descartes package" href="http://pypi.python.org/pypi/descartes/1.0" target="_blank">descartes</a> package allows better integration of matplotlib with spatial data: rather than calling the &#8220;fill&#8221; plotting function repeatedly, you create a more efficient set of &#8220;patches&#8221; which can then be added to the figure plot. The overall impression I got from descartes is that it wasn&#8217;t spectacularly different from the method detailed in my previous post, but it gives you more control and stability over the map plotting process. With raw matplotlib you are inclined to hope that the map outputs correctly (it all seems a bit up to chance), whereas with descartes you have a more robust and easily manipulable output.</p>
<p>In order to test this I rewrote my previous thematic map script to: firstly convert the shapefile geometries into shapely polygons, and secondly to pass those shapely polygons to descartes and draw a map plot using descartes-matplotlib. The only slightly odd piece of functionality that I found was that you can&#8217;t pass the shapely polygon object a list of shapely points in order to create the polygon, rather you have to pass a list of x,y tuples &#8211; much less satisfying!</p>
<p>Nonetheless, the changes were easy to implement. Starting from the previous script, they basically consist of:</p>
<pre>from shapely.geometry import Polygon

points = []
for i in range(0,<em>number of points in shapefile</em>):
 tempx = float(<em>x coord of point in shapefile polygon</em>)
 tempy = float(<em>y coord of point in shapefile polygon</em>)

 points.append((tempx,tempy))
polygon = Polygon(points)
</pre>
<p>The above method creates a simple polygon without holes, although shapely can accommodate holes if need be (its Polygon constructor takes an optional list of interior rings). Having created the shapely polygons, all that remains is to create a patch.</p>
<pre>from descartes import PolygonPatch

patch = PolygonPatch(polygon, <em>plus colour and line considerations</em>)
</pre>
<p>Then you simply add the patch to the matplotlib figure you have already created, like so:</p>
<pre>from matplotlib import pyplot

fig = pyplot.figure(1, figsize = [10,10], dpi = 300)   #create 10x10 figure
ax = fig.add_subplot(111)    #Add the map frame (single plot)

# here you create all the polygons and patches

ax.add_patch(patch)   # simply add the patch to the subplot
# set plot vars
ax.set_xlim(<em>get xmin and xmax values from data</em>)
ax.set_ylim(<em>get ymin and ymax values from data</em>)
ax.set_aspect(1)

pyplot.show()
</pre>
<p>Using these basics I was able to create a basic OAC map using Welsh OAs as an example:</p>
<p style="text-align: center"><a href="http://danieljlewis.org/files/2010/05/WalesOAC1.png"><img class="aligncenter size-full wp-image-328" title="WalesOAC" src="http://danieljlewis.org/files/2010/05/WalesOAC1.png" alt="" width="520" height="545" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2010/05/27/more-thematic-maps-in-python-shapely-and-descartes/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>A Thematic Map in Python</title>
		<link>http://danieljlewis.org/2010/05/25/a-thematic-map-in-python/</link>
		<comments>http://danieljlewis.org/2010/05/25/a-thematic-map-in-python/#comments</comments>
		<pubDate>Tue, 25 May 2010 19:08:09 +0000</pubDate>
		<dc:creator>Daniel Lewis</dc:creator>
				<category><![CDATA[Representation]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[automated]]></category>
		<category><![CDATA[categorical]]></category>
		<category><![CDATA[Maps]]></category>
		<category><![CDATA[matplotlib]]></category>
		<category><![CDATA[OAC]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[shapefile]]></category>

		<guid isPermaLink="false">http://danieljlewis.org/?p=309</guid>
		<description><![CDATA[I thought I would explore the possibility of creating thematic maps using Python, this post documents my initial attempt. The output is hence rather basic, but encouraging. The primary reason that I wanted to test the mapping potential of python is to allow for some basic automated map production in order to quickly visually assess [...]]]></description>
			<content:encoded><![CDATA[
<p>I thought I would explore the possibility of creating thematic maps using Python; this post documents my initial attempt. The output is hence rather basic, but encouraging. The primary reason that I wanted to test the mapping potential of Python is to allow for some basic automated map production, in order to quickly visually assess the geographical patterns contained within large data sets. This is something that I am at a loss to do in ESRI&#8217;s ArcGIS, although that might change in ArcGIS 10. For fans of R, I know it can be done there; however, R is too tricky for me! My colleague James Cheshire explains the method in R <a title="Making Maps in R" href="http://spatialanalysis.co.uk/2010/01/13/making-maps-with-r/" target="_blank">here.</a></p>
<p>The first hurdle in map making is getting the data in; for this I used the <a title="Shapefile Reader" href="http://indiemaps.com/blog/2008/03/easy-shapefile-loading-in-python/" target="_blank">shapefile reader</a> that <a title="Indiemaps Home" href="http://indiemaps.com/" target="_blank">Zachary Forest Johnson</a> put together for his excellent blog &#8216;<a title="Indiemaps Blog" href="http://indiemaps.com/blog">IndieMaps.com</a>&#8217;. This allowed me to read in any of my masses of pre-existing Shapefile format data files, and indeed to use the Python scripting functionality in ArcGIS to perform spatial operations and then output a map quickly, without the hassle of dealing with ArcGIS layouts.</p>
<p>Once you have downloaded the shapefile reader, it is easily implemented using:</p>
<pre>import shpUtils   #imports the shapefile reader
#Load a shapefile into an object called shpRecords
# NB a leading backslash would be read by Python as the escape '\f'
shpRecords = shpUtils.loadShapefile('filename.shp')</pre>
<p>This is undoubtedly simple. What you then have is a (slightly) complex object which contains all of the shapefile data nested as lists and dictionaries. In order to get my head round this I spent some time investigating it; a standard shapefile that contains areal geographies (e.g. UK Output Areas) will have a similar set-up to this:</p>
<ul>
<li>The first list (shpRecords[i]) holds one entry per complete geometry, which corresponds to the number of rows in the attribute table. Thus a shapefile containing a single polygon has 1 row in the attribute table and 1 list entry (list index 0) in Python.</li>
<li>The second dictionary (shpRecords[i]['key']) records two branches, reporting either the &#8216;dbf_data&#8217; from the attribute table, or the &#8216;shp_data&#8217; from the .shp file describing the underlying geometry.</li>
<li>Choosing the &#8216;dbf_data&#8217; key (shpRecords[i]['dbf_data']) allows you to see the attributes recorded column-by-column for each row (and hence each geometry) in the attribute table. Thus shpRecords[i]['dbf_data']['name'] will return the attribute value for the field &#8216;name&#8217; for the <em>i</em>th geometry in the shapefile.</li>
<li>Choosing the &#8216;shp_data&#8217; key (shpRecords[i]['shp_data']) allows you to access the various components of the shapefile&#8217;s geometry. In the case of a polyline/polygon you get dictionary items &#8216;ymax&#8217;, &#8216;ymin&#8217;, &#8216;xmax&#8217;, &#8216;xmin&#8217;, &#8216;numpoints&#8217;, &#8216;numparts&#8217; and &#8216;parts&#8217;. Clearly the first 6 items are properties of the <em>i</em>th geometry you are querying, so it allows you to form a bounding box, get the number of vertices in the line/polygon, and draw separate lines/polygons if the shapefile is setup to have spatially discontinuous shapes for each row.</li>
<li>The thing we are most interested in is the &#8216;parts&#8217; dictionary key, as this contains all the coordinates for the particular geometry being considered; it is accessed as shpRecords[i]['shp_data']['parts']. This is itself a list, so shpRecords[i]['shp_data']['parts'][j] distinguishes between the parts of a multipart geometry, i.e. the <em>j</em>th part of the <em>i</em>th geometry.</li>
<li>Having come this far, each part holds a &#8216;points&#8217; list with one entry per vertex, and each entry is a final dictionary that simply offers us &#8216;x&#8217; or &#8216;y&#8217;. Thus the x-coordinate of the <em>k</em>th vertex in the <em>j</em>th part of the <em>i</em>th geometry is accessed by: shpRecords[i]['shp_data']['parts'][j]['points'][k]['x'] &#8211; simple!</li>
</ul>
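<p>To make the nesting concrete, here is a minimal sketch that walks a hand-built stand-in for the object the reader returns, so it runs without a shapefile. The layout follows the plotting script later in this post (parts, then points, then x and y); the field names &#8216;name&#8217; and &#8216;supergroup&#8217; and the helper functions part_coords and bounding_box are illustrative assumptions, not part of shpUtils.</p>

```python
# A hand-built stand-in for what shpUtils.loadShapefile() returns, so the
# example runs without a shapefile. The attribute field names 'name' and
# 'supergroup' are assumptions made for illustration.
shpRecords = [
    {
        'dbf_data': {'name': 'OA one', 'supergroup': 7.0},
        'shp_data': {
            'xmin': 0.0, 'xmax': 2.0, 'ymin': 0.0, 'ymax': 1.0,
            'numparts': 1, 'numpoints': 4,
            'parts': [
                {'points': [{'x': 0.0, 'y': 0.0}, {'x': 2.0, 'y': 0.0},
                            {'x': 2.0, 'y': 1.0}, {'x': 0.0, 'y': 0.0}]},
            ],
        },
    },
]


def part_coords(records, i, j):
    """Return the (x, y) tuples of the jth part of the ith geometry."""
    points = records[i]['shp_data']['parts'][j]['points']
    return [(float(p['x']), float(p['y'])) for p in points]


def bounding_box(records, i):
    """Return (xmin, ymin, xmax, ymax) for the ith geometry."""
    shp = records[i]['shp_data']
    return (shp['xmin'], shp['ymin'], shp['xmax'], shp['ymax'])


print(shpRecords[0]['dbf_data']['name'])   # attribute value for row 0
print(bounding_box(shpRecords, 0))         # extent of geometry 0
print(part_coords(shpRecords, 0, 0))       # vertices of its first part
```

<p>Returning plain (x, y) tuples is a deliberate choice here, since that is also the form a shapely Polygon expects.</p>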
<p>I have been using <a title="MatPlotLib @ Sourceforge" href="http://matplotlib.sourceforge.net/" target="_blank">matplotlib</a> &#8211; a Python library for scientific visualisation &#8211; a lot recently, and have found it a very simple and powerful resource, so I thought I&#8217;d see if it could be made to draw a map.</p>
<p>Firstly import the pyplot element which does all the figure drawing:</p>
<pre>import matplotlib.pyplot as plt
</pre>
<p>Now let&#8217;s use the &#8220;fill&#8221; component of matplotlib to draw all the geometries in a shapefile &#8211; my shapefile is Output Areas in Southwark. Firstly we need to loop through each geometry, and then draw a polygon using all the points contained within each geometry. I omitted a loop for multipart geometries as my shapefile has none; however, this would be very easy if the data did have multiple parts &#8211; simply add a loop in the middle!</p>
<pre>for i in range(0,len(shpRecords)):
 # x and y are empty lists to be populated with the coords of each geometry.
 x = []
 y = []
 for j in range(0,len(shpRecords[i]['shp_data']['parts'][0]['points'])):
  # This is the number of vertices in the ith geometry.
  # The parts list is [0] as it is singlepart.

  # get x and y coordinates.
  tempx = float(shpRecords[i]['shp_data']['parts'][0]['points'][j]['x'])
  tempy = float(shpRecords[i]['shp_data']['parts'][0]['points'][j]['y'])
  x.append(tempx)
  y.append(tempy) # Populate the lists  

 # Creates a polygon in matplotlib for each geometry in the shapefile
 plt.fill(x,y)

plt.axis('equal')
# This sets the x and y axes as equal intervals.
# NB this script will only work for projected data, for geographical
# coordinate systems get ready to do some maths  

plt.show() # Draws the map!</pre>
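<p>For data that does have multipart geometries, the extra loop might look something like the sketch below. It assumes the same nested layout as the script above; the helper name rings_for_geometry and the two-part test record are made up for illustration. Note that a shapefile can also store holes as parts, so filling every part separately, as here, would paint over any holes.</p>

```python
def rings_for_geometry(record):
    """Return one (xs, ys) pair of coordinate lists per part of a geometry."""
    rings = []
    for part in record['shp_data']['parts']:
        xs = [float(p['x']) for p in part['points']]
        ys = [float(p['y']) for p in part['points']]
        rings.append((xs, ys))
    return rings


# Hypothetical two-part geometry (for example, a mainland and an island).
record = {
    'shp_data': {
        'numparts': 2,
        'parts': [
            {'points': [{'x': 0, 'y': 0}, {'x': 1, 'y': 0},
                        {'x': 1, 'y': 1}, {'x': 0, 'y': 0}]},
            {'points': [{'x': 5, 'y': 5}, {'x': 6, 'y': 5},
                        {'x': 6, 'y': 6}, {'x': 5, 'y': 5}]},
        ],
    },
}

rings = rings_for_geometry(record)
for xs, ys in rings:
    # In the plotting script this is where plt.fill(xs, ys) would go.
    print(len(xs), "vertices")
```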
<p>This is the simplest form of the script; it will simply draw the shapefile with each area filled in an arbitrary colour. This is not that useful, but it is easy to create thematic maps of categorical data, so let&#8217;s investigate a way of doing that. I&#8217;ve got data for the Output Area Classification, which is a clustering of areas by social characteristics. I know that there are 7 supergroups in the classification, numbered 1 to 7, so before all the processing of the shapefile I can create a dictionary of colour choices for each group. I&#8217;m using hexadecimal colours that I got from <a title="Colour Brewer" href="http://colorbrewer2.org/" target="_blank">Cynthia Brewer&#8217;s</a> website for a &#8216;qualitative&#8217; 7 class classification. The dictionary looks like this:</p>
<pre>oacSGroups = {'1':'#A6761D','2':'#E6AB02','3':'#66A61E','4':'#E7298A',\
'5':'#7570B3','6':'#D95F02','7': '#1B9E77'}
</pre>
<p>Thus the key &#8216;1&#8217; returns the associated hex colour, and this can be linked to the &#8216;supergroup&#8217; value held under the &#8216;dbf_data&#8217; key for each geometry. In the plt.fill() call I simply have to specify the colour choice, so we alter that line in the above script to read:</p>
<pre>plt.fill(x,y,fc = oacSGroups[str(int(shpRecords[i]['dbf_data']['supergroup']))]\
,ec = '0.7',lw=0.1)
</pre>
<p>&#8216;fc&#8217; is the face colour (the fill): we are asking Python to set it to the value in the oacSGroups dictionary whose key is the value contained in the &#8216;supergroup&#8217; field of the attribute table for the <em>i</em>th row. Thus if the <em>i</em>th row had a &#8216;supergroup&#8217; value of &#8216;7&#8217;, the face colour would be set to &#8216;#1B9E77&#8217;. &#8216;ec&#8217; is the edge colour and &#8216;lw&#8217; is the line width; here I have set the values to display fine, light grey lines.</p>
<p>Finally, as basic a map as this will turn out to be, we wouldn&#8217;t be anywhere without a legend. The following is a very basic, wholly manual way to add a legend to the map:</p>
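<p>The lookup in that line can be pulled out into a small helper, which also makes it easier to cope with rows whose class is missing or unexpected. This is only a sketch: the function name supergroup_colour and the grey fallback colour are assumptions for illustration.</p>

```python
oacSGroups = {'1': '#A6761D', '2': '#E6AB02', '3': '#66A61E', '4': '#E7298A',
              '5': '#7570B3', '6': '#D95F02', '7': '#1B9E77'}


def supergroup_colour(value, palette=oacSGroups, default='#CCCCCC'):
    """Map an attribute value such as 7, 7.0 or '7' to a hex colour.

    Values that cannot be read as a whole number, or that have no entry
    in the palette, fall back to the default (an assumed light grey).
    """
    try:
        key = str(int(float(value)))
    except (TypeError, ValueError, OverflowError):
        return default
    return palette.get(key, default)


print(supergroup_colour(7.0))    # the colour stored under key '7'
print(supergroup_colour(None))   # falls back to the default grey
```

<p>In the plotting loop this would be used as plt.fill(x, y, fc=supergroup_colour(shpRecords[i]['dbf_data']['supergroup']), ec='0.7', lw=0.1).</p>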
<pre>p1 = plt.Rectangle((0, 0), 1, 1, fc="#A6761D")
p2 = plt.Rectangle((0, 0), 1, 1, fc="#E6AB02")
p3 = plt.Rectangle((0, 0), 1, 1, fc="#66A61E")
p4 = plt.Rectangle((0, 0), 1, 1, fc="#E7298A")
p5 = plt.Rectangle((0, 0), 1, 1, fc="#7570B3")
p6 = plt.Rectangle((0, 0), 1, 1, fc="#D95F02")
p7 = plt.Rectangle((0, 0), 1, 1, fc="#1B9E77")

plt.legend([p1,p2,p3,p4,p5,p6,p7], ["Super Group 1","Super Group 2",\
"Super Group 3","Super Group 4","Super Group 5","Super Group 6","Super Group 7"], loc = 4)
</pre>
<p>This simply creates 7 rectangles which don&#8217;t appear on the plotted output, but instead are passed to the legend creator. Each rectangle has the appropriate colour to match the mapped representation, and a label; the rectangles and labels are passed to the legend as two ordered lists. The &#8216;loc&#8217; argument sets where the legend will appear; 4 denotes the bottom right corner. The &#8216;title&#8217; argument allows you to add a title to the legend as a string.</p>
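<p>A less manual variant builds the same rectangles and labels from the colour dictionary, so the legend cannot drift out of step with the map. This is a sketch rather than the code used for the map in this post; the helper names legend_entries and add_legend are assumptions, and matplotlib is only imported when add_legend is actually called.</p>

```python
oacSGroups = {'1': '#A6761D', '2': '#E6AB02', '3': '#66A61E', '4': '#E7298A',
              '5': '#7570B3', '6': '#D95F02', '7': '#1B9E77'}


def legend_entries(palette):
    """Return (colours, labels) ordered by supergroup number."""
    keys = sorted(palette, key=int)
    colours = [palette[k] for k in keys]
    labels = ["Super Group " + k for k in keys]
    return colours, labels


def add_legend(ax, palette, title=None):
    """Attach a legend of coloured proxy rectangles to a matplotlib Axes."""
    from matplotlib.patches import Rectangle  # imported lazily, on use only
    colours, labels = legend_entries(palette)
    # The rectangles are never added to the plot; they only feed the legend.
    proxies = [Rectangle((0, 0), 1, 1, fc=c) for c in colours]
    return ax.legend(proxies, labels, loc=4, title=title)


colours, labels = legend_entries(oacSGroups)
print(list(zip(labels, colours)))
```

<p>With the plot from earlier in this post, add_legend(plt.gca(), oacSGroups, title="OAC") would stand in for the seven Rectangle lines and the plt.legend() call.</p>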
<p style="text-align: left">An example output looks something like this:</p>
<p style="text-align: left"><a href="http://danieljlewis.org/files/2010/05/OACPythonMap1.png"><img class="aligncenter size-full wp-image-323" title="OACPythonMap" src="http://danieljlewis.org/files/2010/05/OACPythonMap1.png" alt="" width="564" height="650" /></a>This took a couple of seconds to produce, and accounts for 846 individual geometries, which actually have quite a number of vertices.</p>
<p style="text-align: left">I&#8217;ll update the blog should I find new methods to visualise spatial data in python.</p>
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2010/05/25/a-thematic-map-in-python/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Multi Dimensional Scaling of Southwark OAC data</title>
		<link>http://danieljlewis.org/2010/04/28/multi-dimensional-scaling-of-southwark-oac-data/</link>
		<comments>http://danieljlewis.org/2010/04/28/multi-dimensional-scaling-of-southwark-oac-data/#comments</comments>
		<pubDate>Wed, 28 Apr 2010 16:34:47 +0000</pubDate>
		<dc:creator>Daniel Lewis</dc:creator>
				<category><![CDATA[Representation]]></category>
		<category><![CDATA[geodemographics]]></category>
		<category><![CDATA[LOAC]]></category>
		<category><![CDATA[MDS]]></category>
		<category><![CDATA[OAC]]></category>
		<category><![CDATA[Southwark]]></category>
		<category><![CDATA[uncertainty]]></category>

		<guid isPermaLink="false">http://danieljlewis.org/?p=283</guid>
		<description><![CDATA[Geodemographic classifications are funny things, they report a view of the world which suggests that areas can be split into groups within which all areas share the same or similar characteristics. This is not an inherently bad thing, for large scale analyses it can be a very useful way of simplifying a diverse array of [...]]]></description>
			<content:encoded><![CDATA[
<p style="text-align: center">
<p>Geodemographic classifications are funny things: they report a view of the world which suggests that areas can be split into groups within which all areas share the same or similar characteristics. This is not an inherently bad thing; for large scale analyses it can be a very useful way of simplifying a diverse array of variables into something that characterises the underlying patterns in the distribution of data. However, for smaller scale analyses I am increasingly finding that non-bespoke geodemographics are limited. I attempted to demonstrate this on a national scale by looking at the entropy scores for each OA in the UK with respect to distance from all supergroup cluster centres <a title="Entropy Scores for OAC Supergroups" href="http://danieljlewis.org/2010/02/10/oac-quality-using-entropy-scores/" target="_blank">(here)</a>. <a title="Dr Pete Fischer - Leicester" href="http://www.le.ac.uk/gg/staff/academic_fisher.html" target="_blank">Pete Fisher</a> presented some very clever work in this vein at the recent GISRUK 2010 conference; he used fuzzy classification strategies to account for the likelihood that each OA does not fit exactly into any particular grouping, and that different OAs fit differently into the same group. <a title="Aidan Slingsby - City" href="http://www.soi.city.ac.uk/~sbbb717/" target="_blank">Aidan Slingsby</a> at City also showed this very nicely visually with his &#8216;OAC Explorer&#8217;. Proceedings from the conference can be found <a title="GISRUK Proceedings 2010" href="http://eprints.ucl.ac.uk/19284/" target="_blank">here</a>.</p>
<p>With this in mind, I wanted to test the variability of the data in Southwark, my study site, with respect to OAC. OAC paints a very flat picture of the population of Southwark, as shown below, and has led to me using LOAC, a London-specific variant of OAC created by Jacob Petersen, a previous research student at UCL, and available as a layer on the <a title="London Profiler" href="http://www.londonprofiler.org/" target="_blank">London Profiler</a>. Using OAC, Southwark is primarily &#8216;multicultural&#8217;; there is, however, more variability in the LOAC classification, as is evident.</p>
<p style="text-align: center">
<div id="attachment_284" class="wp-caption aligncenter" style="width: 560px"><a href="http://danieljlewis.org/files/2010/04/OACandLOACSwk.png"><img class="size-full wp-image-284 " title="OACandLOACSwk" src="http://danieljlewis.org/files/2010/04/OACandLOACSwk.png" alt="" width="550" height="378" /></a><p class="wp-caption-text">OAC and LOAC classifications for Southwark.</p></div>
<p>Inspired by some of <a title="JC's Academic Context" href="http://spatialanalysis.co.uk/surnames/" target="_blank">James Cheshire&#8217;s great work with surnames</a>, I employed a method called &#8220;Multi Dimensional Scaling&#8221; or MDS. Multi Dimensional Scaling is great for exploring similarities and dissimilarities in data: rather than clustering data, as in the creation of OAC, it reorders it so that similar datapoints have similar values. One of its great advantages is that it allows for the scaling of data that has many dimensions, such as the 41 OAC variables, into fewer dimensions representative of those 41, which can subsequently be visualised. Traditional approaches in geography have used MDS to scale many dimensions into 2, using these 2 to adjust spatial coordinates to &#8216;blow apart&#8217; maps, moving places that are similar closer together and places that are dissimilar further apart. Such representations challenge the validity of Tobler&#8217;s 1st law &#8211; near things are more similar than distant things. In this case, however, I don&#8217;t want to blow up Southwark, so I follow Cheshire&#8217;s lead in using the scaling to specify a colour for each area, in which similar colours indicate similar areas in terms of OAC variables and different colours represent different areas. I experimented with both greyscale and RGB colour scales for this representation. Firstly though, a note on how I got there:</p>
<ol>
<li>Download the OAC variables from CASWEB, using the &#8216;recipe&#8217; specified by <a title="Vickers Working Paper OAC" href="http://eprints.whiterose.ac.uk/5003/1/05-2.pdf" target="_blank">Vickers et al (2005)</a>.</li>
<li>Standardise all the variables &#8211; I used Z-scores without really checking for normality, although in reality checking would be preferable &#8211; Vickers suggests some other methods of standardisation.</li>
<li>Compute a distance matrix for the MDS. This means calculating the dissimilarity of each pair of OAs; given n OAs this leads to an n x n matrix, a size that can rapidly become unmanageable beyond local scales. I used &#8216;Canberra distance&#8217; (an arbitrary choice) to compute the matrix, which is given by: <a href="http://danieljlewis.org/files/2010/04/CodeCogsEqn3.gif"><img class="aligncenter size-full wp-image-287" title="CodeCogsEqn(3)" src="http://danieljlewis.org/files/2010/04/CodeCogsEqn3.gif" alt="" width="171" height="61" /></a>where i relates to the value of the first object in a pair and j the second, and k denotes the variable in question.</li>
<li>This matrix is then input into an MDS solver. As a Python fan I used the fantastic code written using NumPy by <a title="MDS Python Script" href="http://code.google.com/p/pyrouette/source/browse/alg/mds.py" target="_blank">Jeremy Stober</a>, although I added to it to do all the standardisation, distance matrix creation etc. as part of a logical process.</li>
<li>Specifying the number of output dimensions (I used 1 and 3) allows you to reduce the large distance matrix into a vector (1d) or matrix (3d) of values, these can then be scaled between 0 and 255 to be converted into digital numbers for visual display. Thanks to James Cheshire for the ArcGIS script to assign RGB values in Arc.</li>
</ol>
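<p>The steps either side of the MDS solver are simple enough to sketch in plain Python. The following is not the script used for the maps below: the function names are made up, the Z-scores use the population standard deviation (an assumption), and the Canberra distance is written in its usual form, the sum over variables k of |x_ik - x_jk| / (|x_ik| + |x_jk|), with 0/0 terms counted as 0. The solver itself is left out.</p>

```python
import math


def zscore_columns(rows):
    """Standardise each column to mean 0 and (population) std deviation 1."""
    n = len(rows)
    out_cols = []
    for col in zip(*rows):
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        # A constant column carries no information; map it to zeros.
        out_cols.append([(v - mean) / sd if sd else 0.0 for v in col])
    return [list(r) for r in zip(*out_cols)]


def canberra(a, b):
    """Canberra distance between two equal-length sequences of numbers."""
    total = 0.0
    for p, q in zip(a, b):
        denom = abs(p) + abs(q)
        if denom:  # a 0/0 term is conventionally counted as 0
            total += abs(p - q) / denom
    return total


def distance_matrix(rows):
    """Full n x n matrix of pairwise Canberra distances."""
    return [[canberra(a, b) for b in rows] for a in rows]


def to_digital_numbers(values):
    """Linearly rescale one MDS output dimension to integers in 0..255."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    return [int(round(255 * (v - lo) / (hi - lo))) for v in values]


# Three made-up areas described by two made-up variables.
areas = [[1.0, 10.0], [3.0, 30.0], [2.0, 50.0]]
d = distance_matrix(zscore_columns(areas))
print(d)
# A 1-d MDS output (made-up values here) rescaled for greyscale display.
print(to_digital_numbers([0.0, 1.0, 5.0]))
```

<p>For a 3-d output, to_digital_numbers would be applied to each dimension in turn to give the R, G and B values.</p>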
<p>The results I got from this preliminary exploration were as follows:</p>
<p style="text-align: left">
<div id="attachment_289" class="wp-caption aligncenter" style="width: 516px"><a href="http://danieljlewis.org/files/2010/04/OACswkBW.jpg"><img class="size-large wp-image-289 " title="OACswkBW" src="http://danieljlewis.org/files/2010/04/OACswkBW-723x1024.jpg" alt="" width="506" height="717" /></a><p class="wp-caption-text">MDS Scaling of OAC variables into Greyscale Representation.</p></div>
<p>This is a very interesting way of looking at the OAC data, as the comfortable uniformity of the seven classes has been lost; instead we can see trends and similarities, but also a fair amount of discontinuity and noise. In the black and white representation a spectrum is presented in which very dark and very light colours are the most dissimilar, slowly converging through the spectrum. The resultant mapping clearly displays areas of similarity: the more affluent southern tip of Southwark, the Southbank region in the north of the borough, and the former docklands in the north-west. Counterpoint to these areas is the middle band of Southwark, represented by darker hues and roughly aligned with known areas of deprivation characterised by high levels of social housing, higher levels of non-white residents, lower levels of educational attainment, poorer health etc. What is clear, though, is that the picture is not uniform as suggested by OAC, and that there exist notable pockets of difference, possibly interpretable as gentrification, particularly around parks. There is also evidence for some fairly notable discontinuities in demographic structure which aren&#8217;t immediately obvious in the OAC classification.</p>
<p>I also mapped an MDS output for 3 dimensions onto an RGB colour scale, as below:</p>
<div id="attachment_292" class="wp-caption aligncenter" style="width: 540px"><a href="http://danieljlewis.org/files/2010/04/OACSwkRGB.jpg"><img class="size-full wp-image-292 " title="OACSwkRGB" src="http://danieljlewis.org/files/2010/04/OACSwkRGB.jpg" alt="OAC" width="530" height="750" /></a><p class="wp-caption-text">MDS Scaling of OAC variables into Colour Representation.</p></div>
<p style="text-align: left">The colour representation should be a more nuanced reading of the similarities and differences, although it is immediately more challenging to interpret. One of the interesting factors is how the southern area of Southwark, most characterised by a blue/purple colour, has now been distanced from the Southbank and former docklands areas, suggesting they are more distinguishably different than previously. The previously dark area is now a pinkish hue, again suggesting a uniformity in that area; however, it is flecked with a variety of colours, suggesting that deviations in demographics amongst the areas of high deprivation are not similar to each other, but distinct enclaves each with their own specific character.</p>
<p style="text-align: left">This constituted a preliminary study; time permitting, I will continue to investigate interesting methods such as this. It is, however, a computationally intensive process, and a treatment of, for example, the whole UK in this manner is out of the question. Nevertheless, I may update it at a different scale in the future.</p>
<p style="text-align: left">Acknowledgment: Boundaries Crown Copyright 2010 Ordnance Survey. A UKBorders/JISC Supplied Service. Data from CASWeb.</p>
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2010/04/28/multi-dimensional-scaling-of-southwark-oac-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>OAC quality using entropy scores</title>
		<link>http://danieljlewis.org/2010/02/10/oac-quality-using-entropy-scores/</link>
		<comments>http://danieljlewis.org/2010/02/10/oac-quality-using-entropy-scores/#comments</comments>
		<pubDate>Wed, 10 Feb 2010 20:10:26 +0000</pubDate>
		<dc:creator>Daniel Lewis</dc:creator>
				<category><![CDATA[GIS]]></category>
		<category><![CDATA[Modeling]]></category>
		<category><![CDATA[Thoughts]]></category>
		<category><![CDATA[classification]]></category>
		<category><![CDATA[entropy]]></category>
		<category><![CDATA[OAC]]></category>
		<category><![CDATA[Quality]]></category>
		<category><![CDATA[sparseness]]></category>

		<guid isPermaLink="false">http://danieljlewis.org/?p=177</guid>
		<description><![CDATA[The following map shows an entropy score by Great British Output Areas based on each OA&#8217;s &#8216;distance&#8217; from each OAC supergroup cluster centre. Essentially I&#8217;m attempting to measure whether any given OA fits discretely into its cluster assignment or not. I&#8217;m using the cluster distance data from the University of Sheffield OAC datasite. To get [...]]]></description>
			<content:encoded><![CDATA[<div class="tweetmeme_button" style="float: right; margin-left: 10px;">
		</div>
<p>The following map shows an entropy score by Great British Output Areas based on each OA&#8217;s &#8216;distance&#8217; from each OAC supergroup cluster centre. Essentially I&#8217;m attempting to measure whether any given OA fits discretely into its cluster assignment or not. I&#8217;m using the cluster distance data from the <a title="UoSheffield OAC data" href="http://www.sasi.group.shef.ac.uk/area_classification/index.html" target="_blank">University of Sheffield OAC datasite</a>. To get a sense of fit I&#8217;m using entropy scores, given by the following equation:</p>
<p><a href="http://danieljlewis.org/files/2010/02/Entropy.png"><img class="aligncenter size-full wp-image-178" title="Entropy" src="http://danieljlewis.org/files/2010/02/Entropy.png" alt="" width="244" height="92" /></a>Where p<sub>i</sub> is the distance of a given OA to a given supergroup cluster centre, relative to its distances to the other centres. Essentially this is a measure of evenness. In terms of OAC we&#8217;d like the results to be less even, as this would suggest that one distance to centre is much smaller than the others, indicating a good cluster assignment; OAs that are more even are indicative of OAs which don&#8217;t fit as well into a single OAC class. In the map below a lower entropy score indicates less evenness and hence a more discrete assignment of OAC class.</p>
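<p>To make the calculation concrete, here is a minimal sketch of an entropy score for a single OA. It assumes the equation is the usual Shannon entropy, that each p<sub>i</sub> is a distance divided by the sum of that OA&#8217;s distances to all centres, and that the score is optionally divided by its maximum so it lies between 0 and 1. The function name, the normalisation and the example distances are illustrative assumptions rather than the exact procedure used for the map.</p>

```python
import math


def entropy_score(distances, normalise=True):
    """Shannon entropy of one OA's distances to the cluster centres.

    distances holds one non-negative value per supergroup centre (hypothetical input).
    """
    total = sum(distances)
    shares = [d / total for d in distances]
    # Terms with a zero share contribute nothing to the sum.
    score = -sum(p * math.log(p) for p in shares if p > 0)
    if normalise:
        # The maximum entropy for n categories is log(n), reached when all shares are equal.
        score /= math.log(len(distances))
    return score


# Made-up distances for two OAs across the seven supergroups.
even_oa = [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]      # equally far from every centre
discrete_oa = [0.2, 3.0, 3.5, 4.0, 3.2, 3.8, 4.1]  # much closer to one centre

even_score = entropy_score(even_oa)
discrete_score = entropy_score(discrete_oa)
```

<p>The OA that is equally far from every centre scores the maximum, while the OA sitting close to one centre scores lower, which matches the reading of the map: lower entropy suggests a more discrete assignment.</p>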
<p style="text-align: center"><a href="http://danieljlewis.org/files/2010/02/OACEntropy.jpg"><img class="aligncenter size-large wp-image-179" title="OACEntropy" src="http://danieljlewis.org/files/2010/02/OACEntropy-724x1023.jpg" alt="" width="579" height="818" /></a></p>
<p style="text-align: left">The pattern that seems to emerge is that urban areas, such as London, and extremely remote areas, such as the Highlands of Scotland, do not fit the classification so well. I quickly tested this conclusion by summarising the entropy scores by the <a title="Guide Rural Urban Classification 2004" href="http://www.defra.gov.uk/evidence/statistics/rural/documents/rural-defn/Rural_Urban_Introductory_Guide.pdf" target="_blank">Rural and Urban Classification 2004</a> from the ONS.</p>
<p style="text-align: left"><a href="http://danieljlewis.org/files/2010/02/UrbanRuralEntropy.png"><img class="aligncenter size-full wp-image-180" title="UrbanRuralEntropy" src="http://danieljlewis.org/files/2010/02/UrbanRuralEntropy.png" alt="" width="448" height="313" /></a>This graph seems to confirm the visual reading of the map to some extent: the fit is worst for Urban areas, better for Town and Fringe, best for Villages and slightly worse again for Hamlets and Isolated Dwellings. This graph was created only from data pertaining to OAs in England and Wales though, as Scotland has a different classification, as is its wont. Including Scottish OAs might lift the value for Hamlets though, as Scotland in general has more remote areas than England and Wales. I&#8217;ve also created a graph for the Rural and Urban Classification 2004 using the combination classification that takes into account &#8216;sparseness&#8217; as well. Ostensibly sparsity relates to the number of households in the surrounding 30km of a grid, which has been aggregated to OA level. From this a distinction between sparse and less sparse is created. I&#8217;ve got no idea what this means and it seems useless and confusing; however, it does back up the earlier point, for what it&#8217;s worth:</p>
<p style="text-align: left"><a href="http://danieljlewis.org/files/2010/02/UrbanRuralEntropy2.png"><img class="aligncenter size-full wp-image-181" title="UrbanRuralEntropy2" src="http://danieljlewis.org/files/2010/02/UrbanRuralEntropy2.png" alt="" width="506" height="342" /></a>Areas that are &#8216;sparse&#8217; seem to be less well classified than areas that are &#8216;less sparse&#8217; &#8211; I&#8217;ve no idea what that means though. Nevertheless the pattern is much the same.</p>
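<p style="text-align: left">The summaries in the two graphs amount to averaging the entropy score within each settlement class. A minimal sketch of that step follows; the function name <code>mean_by_class</code> and the handful of scores are invented purely to show the mechanics and are not values from the actual data.</p>

```python
from collections import defaultdict


def mean_by_class(records):
    """Average a score within each class.

    records is an iterable of (class_label, score) pairs, one per OA (hypothetical input).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for class_label, score in records:
        sums[class_label] += score
        counts[class_label] += 1
    return {label: sums[label] / counts[label] for label in sums}


# Invented entropy scores tagged with a rural/urban class, for illustration only.
records = [
    ("Urban", 0.98),
    ("Urban", 0.96),
    ("Town and Fringe", 0.95),
    ("Village", 0.93),
    ("Village", 0.91),
]
class_means = mean_by_class(records)
```

<p style="text-align: left">The same function works for the combined classification by using labels that also carry the sparse or less sparse flag.</p>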
<p style="text-align: left">Essentially OAC works better if you&#8217;re not classifying extremely urban or extremely rural areas. I think someone should look at the rural urban classification though, or at least write a sensible description of what is actually meant by sparse or less sparse &#8211; a less sparse urban area? I wish I knew what that meant!</p>
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2010/02/10/oac-quality-using-entropy-scores/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>New Role as Secretary of OAC User Group</title>
		<link>http://danieljlewis.org/2009/11/03/new-role-as-secretary-of-oac-user-group/</link>
		<comments>http://danieljlewis.org/2009/11/03/new-role-as-secretary-of-oac-user-group/#comments</comments>
		<pubDate>Tue, 03 Nov 2009 12:17:02 +0000</pubDate>
		<dc:creator>Daniel Lewis</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[OAC]]></category>
		<category><![CDATA[OACUG]]></category>
		<category><![CDATA[QMRG]]></category>
		<category><![CDATA[research groups]]></category>
		<category><![CDATA[RGS]]></category>
		<category><![CDATA[RSS]]></category>

		<guid isPermaLink="false">http://danieljlewis.org/?p=74</guid>
		<description><![CDATA[I have recently been made secretary of the Output Area Classification User Group (OACUG), a research group based at the Royal Statistical Society (RSS). This is a second role to complement my position as Postgraduate Representative of the Quantitative Methods Research Group (QMRG) at the Royal Geographical Society (RGS). My colleague Alex Singleton has also [...]]]></description>
			<content:encoded><![CDATA[<div class="tweetmeme_button" style="float: right; margin-left: 10px;">
		</div>
<p>I have recently been made <strong>secretary</strong> of the <strong>Output Area Classification User Group</strong> (OACUG), a research group based at the Royal Statistical Society (RSS). This is a second role to complement my position as <strong>Postgraduate Representative</strong> of the <strong>Quantitative Methods Research Group</strong> (QMRG) at the Royal Geographical Society (RGS). My colleague <a title="Dr. Alex Singleton" href="http://www.alex-singleton.com/" target="_blank">Alex Singleton</a> has also been made chair of the OACUG.</p>
<p>The Output Area Classification (OAC) is a geodemographic classification, meaning that it is a system for characterising the type of people that live in a given area based upon their demographic data from the 2001 census. There are 3 hierarchical levels to OAC, consisting of 7 supergroups at the top, 21 groups at the middle and 52 subgroups at the bottom level. The intent of OAC is to provide a simple and general measure of &#8216;neighbourhood type&#8217;. The scale at which OAC is reported is the Output Area (OA), which is an areal unit that contains roughly 125 households.</p>
<p>A good visualisation of OAC exists on the <a title="London Profiler" href="http://www.londonprofiler.org/" target="_blank">London Profiler</a> site.</p>
<p>Further information at the <a title="OACUG" href="http://areaclassification.org.uk/" target="_blank">OACUG</a> and <a title="QMRG" href="http://qmrg.org.uk/" target="_blank">QMRG</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2009/11/03/new-role-as-secretary-of-oac-user-group/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Understanding your citizens, customers and communities using OAC</title>
		<link>http://danieljlewis.org/2009/07/13/understanding-your-citizens-customers-and-communities-using-oac/</link>
		<comments>http://danieljlewis.org/2009/07/13/understanding-your-citizens-customers-and-communities-using-oac/#comments</comments>
		<pubDate>Mon, 13 Jul 2009 21:42:46 +0000</pubDate>
		<dc:creator>Daniel Lewis</dc:creator>
				<category><![CDATA[Lecture]]></category>
		<category><![CDATA[communities]]></category>
		<category><![CDATA[geodemographics]]></category>
		<category><![CDATA[local government]]></category>
		<category><![CDATA[neighbourhoods]]></category>
		<category><![CDATA[OAC]]></category>
		<category><![CDATA[places]]></category>

		<guid isPermaLink="false">http://danieljlewis.org/?p=50</guid>
		<description><![CDATA[On Friday (10/07) I attended a workshop on the usage of the Output Area Classification (OAC), aimed at local government and the public sector. I have experimented with a number of geodemographic classifications, both commercial (Experian&#8217;s Mosaic, CACI&#8217;s ACORN and Health ACORN) and non-commercial (OAC and Petersen et al&#8217;s LOAC (2007)) and was interested to [...]]]></description>
			<content:encoded><![CDATA[<div class="tweetmeme_button" style="float: right; margin-left: 10px;">
		</div>
<p>On Friday (10/07) I attended a workshop on the usage of the Output Area Classification (OAC), aimed at local government and the public sector. I have experimented with a number of geodemographic classifications, both commercial (Experian&#8217;s Mosaic, CACI&#8217;s ACORN and Health ACORN) and non-commercial (OAC and Petersen et al&#8217;s LOAC (2007)) and was interested to see the experienced line up presenting.</p>
<p>Tim Allen of the LGA introduced why customer insight is increasingly important; in the public sector this translates to understanding the situation and needs of your constituents, and directing spending accordingly. This introduction to the potential of OAC in the public sector pushed its merits for assessing neighbourhoods, communities and places. This trinity was oft mentioned; however, I am suspicious that they were being conflated to mean similar things when in fact OAC is only realistically a window into neighbourhood/areal characteristics.</p>
<p>Second up was Dan Vickers of Sheffield University, who developed OAC for the Office for National Statistics and went on to describe some of the characteristics of OAC and the decisions that went into its construction.</p>
<p>The remaining sessions focused on how OAC can be used to examine datasets and find trends. Martin Callingham, visiting Professor at the University of London, showed its use in profiling populations, essentially tagging locational data with the appropriate group and comparing the result to the national average. In doing this he gave a lot of examples, but went on to describe OAC as fundamentally being about &#8216;place&#8217;, something I&#8217;m not convinced is true.</p>
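<p>For readers unfamiliar with profiling, the comparison to the national average is often expressed as an index where 100 means a group is represented exactly as expected. The sketch below shows one way this could be computed. The function name <code>profile_index</code>, the group labels and the national shares are hypothetical values chosen for the example, not figures from the talk or from OAC itself.</p>

```python
from collections import Counter


def profile_index(tagged_records, national_shares):
    """Index each group's share of the tagged records against its national share.

    tagged_records is a list of group labels, one per geocoded record (hypothetical input).
    national_shares maps each group label to its share of the national base.
    An index of 100 means the group appears exactly as often as the base would suggest.
    """
    counts = Counter(tagged_records)
    total = len(tagged_records)
    return {
        group: 100.0 * (counts[group] / total) / share
        for group, share in national_shares.items()
    }


# Four records tagged with made-up group labels, and a made-up national base.
tagged = ["A", "A", "B", "C"]
national = {"A": 0.25, "B": 0.25, "C": 0.5}
index = profile_index(tagged, national)
```

<p>Here group A is over-represented (index 200), group B is as expected (100) and group C is under-represented (50). Note that this says something about the kinds of area the records fall in, not about the individual places themselves.</p>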
<p>Likewise John Fisher of Local Futures described how OAC can tell &#8220;stories of Britain&#8221;, which brings the work of Doreen Massey to mind: the idea of place being the composite of &#8220;stories so far&#8221;. However, this is a post-structuralist view of place-construction which isn&#8217;t reflected in the way OAC is cast, with strongly defined boundaries, absolute assignments and statistical relevance. Fisher brought further elements of government into the OAC agenda, referencing &#8216;place-shaping&#8217; and &#8216;total place&#8217;, sustainable communities and localism. These are all elements in which OAC may have a role, but a role that needs careful shaping and consideration, not broad geodemographic strokes.</p>
<p>Keith Dugmore, of Demographic Decisions, was more measured in his talk, which was quite interesting and revealed a geodemographic approach to assessing sample surveys. This is achieved through OAC-coded surveys such as the British Household Panel Survey or the Expenditure and Food Survey, which allow small area estimates to be made for local areas based on the results of these surveys. The key failing picked up was a lack of confidence intervals, but in reality these are easily added.</p>
<p>Michael Willmott wrapped things up with a forward-looking commentary on the trajectory of OAC and social research. Unfortunately, given the apparent inability to distinguish neighbourhoods, communities and places in some of the talks, and amongst some of the participants, it may have been overly optimistic.</p>
<p>I do think that geodemographic classifications have a role to play; it is simply that there needs to be a greater understanding of how they are best interpreted.</p>
]]></content:encoded>
			<wfw:commentRss>http://danieljlewis.org/2009/07/13/understanding-your-citizens-customers-and-communities-using-oac/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

