
OAC quality using entropy scores

The following map shows an entropy score for each Output Area (OA) in Great Britain, based on each OA’s ‘distance’ from each OAC supergroup cluster centre. Essentially I’m attempting to measure whether any given OA fits discretely into its cluster assignment or not. I’m using the cluster distance data from the University of Sheffield OAC data site. To get a sense of fit I’m using entropy scores, given by the following equation:
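Taking this to be the standard Shannon entropy over the k supergroup cluster centres (the natural logarithm and the normalisation of distances by their sum are assumptions here, as are the symbols H, d_i and k), the score for one OA would be:

```latex
\[
H = -\sum_{i=1}^{k} p_i \ln p_i ,
\qquad
p_i = \frac{d_i}{\sum_{j=1}^{k} d_j}
\]
```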

Where p_i is the distance of a given OA to a given supergroup cluster centre as a proportion of its distances to all the centres. Essentially this is a measure of evenness. In terms of OAC we’d like the results to be less even, as this would suggest that one distance to centre is much smaller than the others, indicating a good cluster assignment. OAs that are more even don’t fit as well into a single OAC class. In the map below a lower entropy score indicates less evenness and hence a more discrete assignment of OAC class.
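As a rough illustration of the evenness idea, here is a minimal Python sketch. It assumes each distance is turned into a share of the OA's total distance to all centres and that the natural logarithm is used; the function name `entropy_score` and the example distances are made up for illustration and are not taken from the OAC data.

```python
import math


def entropy_score(distances):
    """Shannon entropy of one OA's distances to the cluster centres.

    Assumption: each distance is converted to a share of the total
    distance, and the natural logarithm is used.
    """
    total = sum(distances)
    if total <= 0:
        raise ValueError("distances must sum to a positive value")
    shares = [d / total for d in distances]
    # Zero shares contribute nothing to the sum, so they are skipped.
    return -sum(p * math.log(p) for p in shares if p > 0)


# Hypothetical distances: one OA equally far from every centre,
# and one OA that sits very close to a single centre.
even_oa = entropy_score([1.0, 1.0, 1.0, 1.0])
discrete_oa = entropy_score([0.1, 5.0, 5.0, 5.0])
```

With these made-up numbers the evenly placed OA reaches the maximum possible score for four centres (the log of 4), while the OA that sits close to one centre scores lower, which is the direction described above.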

The pattern that seems to emerge is that urban areas, such as London, and extremely remote areas, such as the Highlands of Scotland, do not fit the classification so well. I quickly tested this conclusion by summarising the entropy scores by the Rural and Urban Classification 2004 from the ONS.

This graph seems to confirm the visual reading of the map to some extent: the fit is worst for Urban areas, better for Town and Fringe, best for Villages and slightly worse again for Hamlets and Isolated Dwellings. This graph was created only from data for OAs in England and Wales though, as Scotland has a different classification, as is its wont. Including Scottish OAs might lift the value for Hamlets, as Scotland has more remote areas than England and Wales in general. I’ve also created a graph for the Rural and Urban Classification 2004 using the combined classification that takes ‘sparseness’ into account as well. Ostensibly sparsity relates to the number of households in the surrounding 30km of a grid, which has been aggregated to OA level. From this a distinction between sparse and less sparse is created. I’ve got no idea what this means and it seems useless and confusing; however, it does back up the earlier point, for what it’s worth:

Areas that are ‘sparse’ seem to be less well classified than areas that are ‘less sparse’, although I’ve no idea what that means. Nevertheless the pattern is much the same.
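The summaries by classification could be produced along these lines. This is only a sketch: it assumes the summary statistic is the mean entropy per class, and the function name `mean_entropy_by_class` and the sample values are invented for illustration, not results from the data.

```python
from collections import defaultdict
from statistics import mean


def mean_entropy_by_class(records):
    """Average the entropy scores within each classification label.

    records: iterable of (class_label, entropy_score) pairs, one per OA.
    Assumption: the mean is the summary statistic of interest.
    """
    groups = defaultdict(list)
    for label, score in records:
        groups[label].append(score)
    return {label: mean(scores) for label, scores in groups.items()}


# Made-up scores purely to show the shape of the input and output.
sample = [
    ("Urban", 1.90),
    ("Urban", 1.80),
    ("Village", 1.50),
    ("Village", 1.70),
]
summary = mean_entropy_by_class(sample)
```

The same function would work for the combined classification by using labels that include the sparse or less sparse distinction.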

Essentially OAC works better if you’re not classifying extremely urban or extremely rural areas. I think someone should look at the rural urban classification though, or at least write a sensible description of what is actually meant by sparse or less sparse. A less sparse urban area? I wish I knew what that meant!

Categories: GIS, Modeling, Thoughts.


