Geodemographic classifications are funny things. They present a view of the world in which areas can be split into groups, within which all areas share the same or similar characteristics. This is not an inherently bad thing: for large-scale analyses it can be a very useful way of simplifying a diverse array of variables into something that characterises the underlying patterns in the distribution of the data. For smaller-scale analyses, however, I am increasingly finding that non-bespoke geodemographics are limited. I attempted to demonstrate this on a national scale by looking at the entropy scores for each Output Area (OA) in the UK with respect to distance from all supergroup cluster centres (here). Pete Fischer presented some very clever work in this vein at the recent GISRUK 2010 conference: he used fuzzy classification strategies to account for the likelihood that each OA does not fit exactly into any particular grouping, and that different OAs fit differently into the same group. Aidan Slingsby at City also showed this very nicely visually with his ‘OAC Explorer’. Proceedings from the conference can be found here.
With this in mind, I wanted to test the variability of the data in Southwark, my study site, with respect to OAC. OAC paints a very flat picture of the population of Southwark, as shown below, and this had led me to use LOAC, a London-specific variant of OAC created by Jacob Petersen, a previous research student at UCL, and available as a layer on the London Profiler. Under OAC, Southwark is primarily ‘multicultural’; there is, as is evident, more variability in the LOAC classification.
Inspired by some of James Cheshire’s great work with surnames, I employed a method called “Multi-Dimensional Scaling”, or MDS. MDS is great for exploring similarities and dissimilarities in data: rather than clustering the data, as in the creation of OAC, it reorders it so that similar data points have similar values. One of its great advantages is that it allows data with many dimensions, such as the 41 OAC variables, to be scaled into fewer dimensions that are representative of those 41, and these can subsequently be visualised. Traditional approaches in geography have used MDS to scale many dimensions into 2, using these 2 to adjust spatial coordinates and ‘blow apart’ maps, moving similar places closer together and dissimilar places further apart. Such representations challenge the validity of Tobler’s 1st law: near things are more similar than distant things. In this case, however, I don’t want to blow up Southwark, so I follow Cheshire’s lead in using the scaling to specify a colour for each area, so that similar colours indicate areas that are similar in terms of the OAC variables and different colours indicate areas that differ. I experimented with both greyscale and RGB colour scales for this representation. Firstly though, a note on how I got there:
- Download the OAC variables from CASWeb, using the ‘recipe’ specified by Vickers et al. (2005).
- Standardise all the variables. I used z-scores without really checking for normality, although checking first would have been preferable; Vickers suggests some other methods of standardisation.
- Compute a distance matrix for the MDS. This means calculating how dissimilar each pair of OAs is; given n OAs this leads to an n x n matrix, a size that can rapidly become unmanageable beyond local scales. I used ‘Canberra distance’ (an arbitrary choice) to compute the matrix, which is given by:
$$d(i, j) = \sum_{k} \frac{|x_{ik} - x_{jk}|}{|x_{ik}| + |x_{jk}|}$$
where x is the standardised value, i relates to the first object in a pair and j the second, and k denotes the variable in question.
- This matrix is then input into an MDS solver. As a Python fan I used the fantastic code written using NumPy by Jeremy Stober, although I added to it so that the standardisation, distance matrix creation and so on all happen as part of one logical process.
- Specifying the number of output dimensions (I used 1 and 3) allows you to reduce the large distance matrix to a vector (1D) or a three-column matrix (3D) of values. These can then be scaled between 0 and 255 to be converted into digital numbers for visual display. Thanks to James Cheshire for the ArcGIS script to assign RGB values in Arc.
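The steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the code actually used for the maps: it assumes classical (Torgerson) MDS, which may differ from the solver mentioned above, and the function names (`zscore`, `canberra_matrix`, `classical_mds`, `to_digital_numbers`) and the random stand-in data are made up for the example.

```python
import numpy as np

def zscore(X):
    """Standardise each column to mean 0 and standard deviation 1."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns become all zeros instead of dividing by zero
    return (X - X.mean(axis=0)) / std

def canberra_matrix(Z):
    """n x n matrix of Canberra distances between the rows of Z."""
    num = np.abs(Z[:, None, :] - Z[None, :, :])
    den = np.abs(Z[:, None, :]) + np.abs(Z[None, :, :])
    # A term whose two values are both zero contributes 0 to the sum.
    terms = np.divide(num, den, out=np.zeros_like(num), where=den != 0)
    return terms.sum(axis=2)

def classical_mds(D, n_dims):
    """Classical MDS: embed a distance matrix D in n_dims dimensions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centring matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(B)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_dims]
    top = np.clip(eigvals[order], 0, None)    # guard against small negative values
    return eigvecs[:, order] * np.sqrt(top)

def to_digital_numbers(Y):
    """Rescale each column independently to integers in 0..255."""
    Y = np.asarray(Y, dtype=float)
    lo, hi = Y.min(axis=0), Y.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return np.rint(255 * (Y - lo) / span).astype(np.uint8)

# Stand-in data: 50 hypothetical areas with 41 variables each.
X = np.random.default_rng(0).normal(size=(50, 41))
D = canberra_matrix(zscore(X))
grey = to_digital_numbers(classical_mds(D, 1))   # one value per area
rgb = to_digital_numbers(classical_mds(D, 3))    # three values per area
```

Note that `canberra_matrix` builds an n x n x 41 intermediate array, which is fine for a borough but is one more reason the approach does not scale far beyond local studies.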
The results I got from this preliminary exploration were as follows:
This is a very interesting way of looking at the OAC data, as the comfortable uniformity of the seven classes has been lost. Instead we can see trends and similarities, but also a fair amount of discontinuity and noise. The black and white representation presents a spectrum in which very dark and very light shades are the most dissimilar, with areas becoming more alike as the shades converge. The resultant mapping clearly displays areas of similarity: the more affluent southern tip of Southwark, the Southbank region in the north of the borough, and the former docklands in the north-west. In contrast to these areas is the middle band of Southwark, represented by darker hues and roughly aligned with known areas of deprivation characterised by high levels of social housing, higher levels of non-white residents, lower levels of educational attainment, poorer health and so on. What is clear, though, is that the picture is not as uniform as OAC suggests, and that there exist notable pockets of difference, possibly interpretable as gentrification, particularly around parks. There is also evidence of some fairly notable discontinuities in demographic structure which aren’t immediately obvious in the OAC classification.
I also mapped an MDS output for 3 dimensions onto an RGB colour scale, as below:
The colour representation should give a more nuanced reading of the similarities and differences, although it is immediately more challenging to interpret. One of the interesting features is how the southern area of Southwark, mostly characterised by a blue/purple colour, has now been distanced from the Southbank and former docklands areas, suggesting that they are more distinct from one another than the greyscale map implied. The previously dark area is now a pinkish hue, again suggesting a degree of uniformity in that area. However, it is flecked with a variety of colours, suggesting that the areas which deviate from the general pattern of high deprivation are not similar to each other, but are distinct enclaves, each with its own specific character.
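For anyone without the ArcGIS script to hand, the mapping from three MDS dimensions to a colour can be sketched roughly as below. The function name `mds_to_hex` and the tiny example array are hypothetical, and the assignment of dimensions to the red, green and blue channels is simply an assumption of the sketch.

```python
import numpy as np

def mds_to_hex(Y):
    """Map an (n, 3) array of MDS coordinates to '#rrggbb' colour strings."""
    Y = np.asarray(Y, dtype=float)
    lo, hi = Y.min(axis=0), Y.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)      # avoid dividing by zero
    rgb = np.rint(255 * (Y - lo) / span).astype(int)
    # tolist() converts to plain Python ints before formatting as hex.
    return ["#{:02x}{:02x}{:02x}".format(r, g, b) for r, g, b in rgb.tolist()]

# Three hypothetical areas: the first and last sit at opposite ends of every axis.
colours = mds_to_hex([[0.0, 0.0, 0.0],
                      [1.0, 2.0, 4.0],
                      [2.0, 4.0, 8.0]])
```

Because each dimension is stretched to the full 0 to 255 range on its own, a dimension that explains very little of the variation gets as much visual weight as the first one. That is worth bearing in mind when reading small flecks of colour as meaningful differences.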
This was a preliminary study; time permitting, I will continue to investigate interesting methods such as this. It is, however, a computationally intensive process, and treating the whole of the UK in this manner, for example, is out of the question. Nevertheless, I may revisit it at a different scale in the future.
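To give a rough sense of why the national scale is out of reach, here is a back-of-the-envelope estimate of the memory needed just to hold a dense n x n distance matrix of 8-byte floats. The area counts are assumptions for illustration (a round 1,000 for a borough-sized study and a round 220,000 for something UK-sized), not figures taken from the data used here.

```python
def distance_matrix_gib(n_areas, bytes_per_value=8):
    """Memory needed for a dense n x n matrix, in GiB."""
    return n_areas ** 2 * bytes_per_value / 2 ** 30

# Both counts are assumed round numbers, used only for illustration.
for label, n in [("borough-sized", 1_000), ("UK-sized", 220_000)]:
    print(f"{label}: {n:,} areas -> {distance_matrix_gib(n):,.2f} GiB")
```

Under these assumptions the borough-sized matrix needs only a few megabytes, while the UK-sized one runs to hundreds of gigabytes before the eigendecomposition has even started.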
Acknowledgment: Boundaries Crown Copyright 2010 Ordnance Survey. A UKBorders/JISC Supplied Service. Data from CASWeb.


