Generalising OS MasterMap Buildings

.........................................................

The purpose of map generalisation is to represent spatial data in a way that makes it possible to view the data effectively at scales smaller than that for which it was originally intended. The Ordnance Survey’s MasterMap product provides data at an incredibly fine level of spatial resolution, ideally viewed at a scale of approximately 1:1000. When you are reliant on MasterMap but need to create a map of a wider area, you are faced with the challenge of generalising the data so that it can still be readily understood. This means reducing the complexity of components of the spatial data, for instance smoothing wiggly lines or transforming complicated polygons into simpler ones, as well as aggregating an abundance of small features into larger ones. Such interventions are necessary because it becomes increasingly difficult to resolve fine detail as map scale decreases, so complex shapes appear messy and disordered when visualised at smaller scales than those for which they were intended.

The scale of a map gives us an insight into the types of objects that it is possible to resolve: at a scale of 1:1000 a physical distance of 100cm represents 1km, while at 1:10000 and 1:100000 the same 1km is covered by 10cm and 1cm respectively. If we conservatively suggest that we can resolve features that are 5mm across, then at 1:1000, 1:10000 and 1:100000 the smallest real-world objects that can be represented are 5m, 50m and 500m respectively. These distances equate to real-world objects such as large cars and trucks (c. 5m in length) and Olympic-sized swimming pools and office buildings (50m), whilst 500m is twice the span of Tower Bridge. Evidently, there are significant differences in what constitutes appropriate detail at each of these scales.

I’ve been dealing with one such problem recently, involving the representation of MasterMap building outlines at a scale of 1:10000, somewhat smaller than the 1:1000ish that the data was intended for. In order to create an effective map I needed to generalise the building outlines; unfortunately I don’t have access to ESRI’s ArcGIS “simplify building” tool due to licensing restrictions, so I had to come up with another solution. Initially I attempted the classic line generalisation procedure, the Douglas-Peucker algorithm, which simplifies by reducing the number of points in a curve subject to some pre-specified threshold value. However, buildings are strong geometric shapes, often rectangular and orthogonal, and an algorithm such as Douglas-Peucker can disrupt the geometric regularity of building outlines, removing corners and so on. What is required is a polygon simplification algorithm that preserves orthogonality, but I couldn’t find anything that did this whilst being accessible, so instead I came up with a procedure that approximates a generalisation of the building polygons by other means. The image below reveals the result, which I think is successful enough to use.

In the image above, A is the raw data and B is the generalised data. I experimented with a few approaches, but the one I assessed as being the best was to fit a minimum-area enclosing rectangle around each building polygon, and subsequently buffer the result to close any small gaps, dissolving the buffers in order to further reduce the complexity. I then removed the particularly small buildings. The generalisation is more in evidence in the image below, in which A and B are the same as before.
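
For anyone without ArcGIS wanting to try something similar, a rough sketch of the same idea using Shapely follows; this is not the exact workflow I used, and the buffer distance and minimum area are illustrative values only:

from shapely.ops import unary_union

def generalise_buildings(buildings, buffer_dist=5.0, min_area=25.0):
    # buildings: list of Shapely polygons; distances/areas in map units (metres here)
    # 1. replace each outline with its minimum-area enclosing (rotated) rectangle
    rects = [b.minimum_rotated_rectangle for b in buildings]
    # 2. buffer to close small gaps and dissolve the overlapping results
    dissolved = unary_union([r.buffer(buffer_dist) for r in rects])
    # 3. drop the particularly small pieces
    parts = dissolved.geoms if hasattr(dissolved, "geoms") else [dissolved]
    return [p for p in parts if p.area >= min_area]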

I am reasonably pleased with the result, which was achieved after a little trial and error. Whilst technical approaches to orthogonal simplification exist, I can’t imagine them being much more effective at this scale, although perhaps at smaller scales they would be more appropriate, as they can create meaningful aggregations of buildings based upon characteristics such as nearest-neighbour distance.


Thomas Pynchon’s Entropy

.........................................................

James Clerk Maxwell, creator of "Maxwell's Demon"

The concept of entropy has arisen periodically during the course of my PhD study, both as the thermodynamic analogue operationalised by Alan Wilson in his ‘family’ of spatial interaction models, and as Shannon’s entropy, formalised spatially by several scholars including Mike Batty. Alan Wilson’s models of urban systems work because maximising an entropy function allows the ‘most probable state of a system’ to be realised; as in thermodynamics, if you start with an initial condition in which the state of the system is unbalanced, or disorganised, and iteratively maximise entropy until an equilibrium is reached, that equilibrium will represent the most probable state. Information entropy, pioneered by Claude Shannon, tells a similar story that we might think of in terms of unpredictability, or uncertainty, in the transmission of information; in practice, information theory in geography has been used to characterise ‘evenness’ in the observed distribution of phenomena.

Entropy is generally regarded as something of a wily concept, frequently managing to avoid clear-cut explanations of what it represents, and acting at times as a mysterious quantification of uncertainty. It is with some delight, then, that it should crop up far outside of the scientific, or at least social-scientific, arena, within the work of Thomas Pynchon, the famous American novelist and acknowledged recluse. Although I believe it is a recurring theme in his work, I encountered Pynchon’s entropy in the context of his 1966 novel “The Crying of Lot 49”. This was a curious moment of coincidence, as I had only recently discovered from Peter Baudains that some work on ‘complexity science’ he had been involved with was supported by the English novelist Giles Foden, famous for his 1998 novel “The Last King of Scotland”. Such coincidences aside, the aim of this post is simply to consider how Pynchon invokes entropy, and what he means by it.

The Crying of Lot 49 is concerned with the story of Oedipa Maas as she struggles to come to terms with the practicalities of executing the estate of her deceased tycoon ex-boyfriend Pierce Inverarity. The task that Oedipa faces is complicated by the apparent realisation that she may be entwined in an historic, and ongoing, global conspiracy between two postal companies: Thurn and Taxis, and Tristero. Pynchon’s direction leads Oedipa through a set of increasingly confusing circumstances, which seem to point towards society behaving in an increasingly unpredictable way, echoing the entropic state of a system as inherently disorganised. Oedipa begins to behave like the ordering, maximising function in a spatial interaction model, attempting to sort and seek out a most probable understanding of just what is happening as the novel unfolds. The core question pertains to the success of Oedipa’s efforts, and is largely unresolved at the book’s ending; simply put: can we ever overcome the uncertainty of life?

Tellingly, Pynchon makes reference to “Maxwell’s Demon”, a philosophical device that can supposedly overcome entropy. The idea behind Maxwell’s Demon is that there exists some “finite being” (as Maxwell put it) to order the disparate elements of a distribution; in thermodynamics this would mean creating an artificial separation between hot and cold particles, thus avoiding the thermodynamic equilibrium of the second law. It remains unclear whether Maxwell’s Demon could in fact violate the second law of thermodynamics.

 


Weighted Mean Direction Surfaces in Python

.........................................................

I work a lot with flows and spatial interactions, and one thing that I’ve wanted to do for a while is compute a mean flow direction surface. Unfortunately, arithmetic means don’t work for angular data, because they cannot account for the circular nature of the distribution of angular measurements. For instance, the angles 5 degrees and 355 degrees are separated by only 10 degrees, but their arithmetic mean is 180 degrees, which is way off: it should be 0 degrees!

Luckily, Brunsdon and Charlton have published on this very subject, so I took it upon myself to implement a weighted circular mean function in Python. The key obstacle was learning about complex numbers, about which, up until this point, I knew nothing at all!

The first thing to do is calculate the angle between a set of candidate points (such as people) and a set of services (such as medical centres). This is simple enough to do using Python’s math module, and would look something like:

import math
# angle (in radians) of the flow from the candidate point (x1,y1) to its service (x2,y2)
math.atan2((y2-y1),(x2-x1))

In which the pair (x1,y1) is the location of the candidate point, and (x2,y2) the location of the allocated service for that candidate point. The line linking these two points defines a flow from a candidate point to a service, and vice versa.
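
If the coordinate pairs are held in numpy arrays, the whole set of flow angles can be computed in one go; a small sketch with made-up coordinates:

import numpy as np

# hypothetical coordinate arrays, one entry per flow
x1, y1 = np.array([0.0, 10.0]), np.array([0.0, 0.0])   # candidate points
x2, y2 = np.array([5.0, 10.0]), np.array([5.0, 20.0])  # allocated services
angles = np.arctan2(y2 - y1, x2 - x1)  # one angle (radians) per flow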

Having calculated all of the angles, I used ArcGIS to create an output grid, at the extent of the candidate points, using the “fishnet” function which creates a vector grid of prespecified dimensions.

The beauty of Brunsdon and Charlton’s method is that it uses a local method of approximation: for each cell in the output grid, a mean direction can be calculated based upon the values of nearby points, and applying a weighting allows more distant points to have less of an effect on the mean direction.

Firstly, I read all the candidate points into a KDTree structure, which allows me to search for local points; at the same time I also create an array of the angles for those candidate points.

from scipy.spatial import cKDTree
import numpy as np

# build the tree from a numpy array of candidate-point coordinate pairs
tree = cKDTree(treepoints)
# query: k=300000 neighbours, eps=0 (exact), p=2 (Euclidean), within 100m
res, idx = tree.query(testpoint,300000,0,2,100)
# keep only real matches; neighbours beyond 100m come back with distance Inf
res = res[0][np.where(res[0] < np.Inf)[0]]
idx = idx[0][:len(res)]  # trim the index array to match

The tree takes a numpy array of coordinate pairs, and the query method returns an array of distances to points (res) and their index values in the original array of coordinates (idx). The testpoint is a cell in the vector grid; 300000 is k, the number of nearest neighbours to find, which I have simply set arbitrarily high in the context of my dataset; 0 is the eps parameter for approximate nearest neighbours, here specifying an exact search; 2 indicates the use of Euclidean distance; and 100 is the distance threshold, so neighbours won’t be returned if they are further than 100 metres away. The penultimate line simply shortens the array to just those values which are less than 100m away (i.e. less than infinity), since points over 100m away are returned with the value Inf.

The next step is to actually compute the mean direction, which requires a special approach using complex numbers. Brunsdon and Charlton show that a direction can be stated as a complex number z, in which z = exp(iθ), or equivalently z = cos(θ) + i sin(θ), where i is the imaginary unit. We can restate our directions in Python using:

import cmath  # not strictly used below; numpy's cos/sin do the work

# restate each angle theta as the unit complex number cos(theta) + i*sin(theta)
thetas = angles[idx]
cThetas = []
for i in xrange(0,len(thetas)):
    cThetas.append(complex(np.cos(thetas[i]),np.sin(thetas[i])))
cThetas = np.array(cThetas)

Here, the complex function allows the complex number representing each angle to be stored in a list, which I convert (lazily) to a numpy array. The first term, thetas, uses the idx array from the cKDTree to index the relevant angle records from the angles array, which stores all the angle values in the same order as the entries in the cKDTree.

Next a temporary variable is created which calculates the mean direction:

# resultant of the unit vectors, normalised to unit length
temp = np.sum(cThetas)/np.absolute(np.sum(cThetas))
MeanDir = np.angle(temp, deg = True)  # its argument is the mean direction

The mean direction is given by the argument (Arg) of the resultant complex number, Python implements this with the np.angle function, where deg = True returns the angle in degrees, and False in radians.

So far this is the unweighted mean, aggregating directional observations within a 100m disk (see also: uniform disk smoothing). To introduce weighting we must first define a weighting scheme; I’ve used the one suggested by Brunsdon and Charlton, which is Gaussian, and might look a bit like this:

def gaussW(dists,band):
    # Gaussian kernel weights: exp(-d^2 / (2 * band^2)) for each distance d
    out = np.zeros(dists.shape)
    for i in xrange(0,len(out)):
        temp = np.power(dists[i],2)/(2.0*np.power(float(band),2))
        out[i] = np.exp(-1.0 * temp)
    return out

weight = gaussW(res,100)

Quite simply, I pass the distance array res to the gaussW function and it gives me back an array of weights for that ordering of distances. Using this I can redo the mean direction thus:

# as before, but each unit vector is scaled by its Gaussian weight
temp = np.sum(weight*cThetas)/np.absolute(np.sum(weight*cThetas))
MeanDir = np.angle(temp, deg = True)

There you have it! Attached is the script I used. Obviously, Brunsdon and Charlton also implement a variance measure and a couple of visualisation devices, but these should be simple enough to implement now!
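
For what it’s worth, a sketch of how the weighted circular variance might look, continuing with the weight and cThetas arrays from above and assuming the usual definition of one minus the mean resultant length (this is my reading, not code from the attached script):

# weighted mean resultant length: modulus of the weighted vector sum over the weight total
R = np.absolute(np.sum(weight * cThetas)) / np.sum(weight)
CircVar = 1.0 - R  # 0 when all flows align, approaching 1 as directions cancel out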

I created an output for flows of patients to GPs in Southwark, visualised using one of ESRI’s circular/direction colour ramps from colour ramp pack 2. I’m not sure how best to visualise the legend at this point though. NB: 90 is north, -90 is south, 0/-0 is east and 180/-180 is west, which follows from atan2 measuring angles anticlockwise from the positive x-axis (east). The map is visualised to show the 4 cardinal directions, but the output is in fact continuous.

My example script is here. Note that I am using dbfpy to read and write to shapefile DBF tables directly.


A Spatial Approach to Location Quotients

.........................................................

The intent of this post is not simply to uncover where the highest density of people belonging to a particular ethnic group is, but rather to use the ‘location quotient’ (LQ) technique to compare the ethnic density in any one area to the overall ethnic density in Southwark, thus providing a relative insight into where particular groups are more, less or as dense as expected.

Location Quotients tend to work with areal units, characterising different areas relative to a larger region and providing a basic insight into where functions are clustered. Because the Southwark patient register data is address geocoded, we would be losing some spatial information if we chose to aggregate the data, not to mention the question of which areal aggregation is best. More info on how to create location quotients here.

A Location Quotient has three possible interpretations: if it is around 1 then the ethnic population in that area is at the level we would expect given what we observe across the reference region (here, Southwark as a whole); if the LQ is less than 1 then that area has a smaller population of a particular ethnic group than we would expect; and if the LQ value is over 1 this suggests a concentration of the ethnic group in the area which is greater than we would expect. An LQ is quite simply a rate-ratio.
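
Written out as that rate-ratio, in my notation for area i and group g:

$$\mathrm{LQ}_{ig} = \frac{x_{ig}/t_i}{X_g/T}$$

where x_ig is the count of group g in area i, t_i is the total population of area i, and X_g and T are the corresponding totals for the reference region as a whole.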

Instead of the standard areal approach, the maps here use a density estimation approach in which disaggregate point data is transformed into a representation of the continuous density function of the point distribution. The LQ can then be computed for each cell based on the density of that cell with respect to the total density of the surface. This creates a smoothed LQ surface which is readily interpretable in the same manner as above. The Kernel Density Estimation used to create the ethnic and total population density surfaces should be parameterised in the same way; these examples use a 250m bandwidth and a 25m cell size, which is finer than the input dataset’s spatial resolution really justifies, but creates a more aesthetically appealing mapped representation. Naturally, the procedure works well for clustered data, in Southwark’s case for the African and Muslim groups.
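
Assuming the two KDE outputs have been read into numpy arrays on an identical grid, the cell-by-cell calculation is then just the rate-ratio from above; a minimal sketch (function and variable names are mine):

import numpy as np

def lq_surface(group_density, total_density):
    # location quotient per cell: the group's share of that cell relative to
    # the group's share of the whole surface; inputs are same-shaped arrays
    group_share = group_density / np.nansum(group_density)
    total_share = total_density / np.nansum(total_density)
    with np.errstate(divide="ignore", invalid="ignore"):
        lq = group_share / total_share
    return np.where(total_share > 0, lq, np.nan)  # mask cells with no population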


Network Population Density for Southwark

.........................................................

Using the excellent SANET extension for ArcGIS 9.3 I was able to take some of my data for Southwark that I had geocoded to address level, and estimate the population density using the OS MasterMap ITN product. The procedure is essentially a Kernel Density Estimation that takes place on a given network rather than across 2D space, which effectively controls for the effect of spatial structure, such as urban form, on data that relates to residential locations. The estimation is made for c.300,000 people in Southwark on a network with around 30,000 road segments, so it is to be expected that the calculation takes several hours to run. The KDE process is parameterised in much the same way as the straightforward density estimation procedures in the ArcGIS Spatial Analyst toolbox: bandwidth and cell size are specified, although in this case cell size relates to the length of the segments into which the network has to be cut in order to represent the output. Additionally, SANET allows you to control how you handle road intersections, using either a continuous or discontinuous approach to the bifurcation; I arbitrarily chose the continuous approach, essentially meaning that the density estimation can turn corners. A straightforward representation can be made in 2D as below.

The interesting aspect of this image, which is obscured in 2D smoothed representations, is the relative usage of different streets: clearly visible are the residential streets as distinct from the more commercial area on Southwark’s Bankside and along major roads, as well as the effect of open space and water features in reducing network density (i.e. where only one side of a road has residences on it). I’ve attempted to explore this further by using ArcScene’s 3D visualisation capabilities, but the complexity of the data makes this an incredibly arduous process. The results I was able to obtain before ArcScene simply crashed are below.

In this example, Southwark is presented in a kind of 2.5D perspective in which the streets have been extruded so that their height represents the population density at that point. I’ve included some contextual elements: the Thames, parks, wooded areas, and other water features. Whether or not this image is in any way an improvement over a simple 2D representation is open to debate, but the selections below do present an interesting cross-section of the data.


‘Compactness’ in Zoning: the circle as the ideal.

.........................................................

I saw a thought-provoking presentation recently, given by Wenwen Li of the University of California, Santa Barbara. The talk was a wide-ranging insight into cyberinfrastructure, its uses for geospatial information, and some of the computational techniques that underpinned the project. One element of the project involved zone design for the greater Los Angeles region, and involved the implementation of an algorithm intended to aggregate small areal units into larger zones whilst meeting a number of conditions, principal among these being ‘compactness’. The output looked very much like a single hierarchy of Christaller hexagons, and this got me thinking about the nature of space and compactness.

From: http://watd.wuthering-heights.co.uk/mainpages/sustainability.html

Christaller’s hexagons are the defining illustration of ‘central place theory’, a geographical abstraction that idealises settlement patterns based upon an underlying space which is assumed to be isotropic. The assumption of spatial isotropy is the big leap in this model: it assumes that the ‘friction of distance’ from any given point increases at an equal rate whichever way you go from that point. Clearly such a suggestion is not applicable to Los Angeles, where huge freeways and interchanges can make adjacent parcels of land remote neighbours, and increase the connection between advantageously placed non-adjacent sites. Surely a city characterised by sprawl, ribbon development and segregated communities should be modelled differently? Why then do many of our zoning algorithms favour compact ‘circular’ shapes, very much in the Christaller mould, and why do we reject uncompact areal features as ugly slivers? In short, how did the circle come to be the ideal shape of a zone in regional studies? Certainly, it is easier, both implementationally and conceptually, to model circles than to consider optimising a zone system over an n-by-n similarity matrix pertaining to variables which may be important to aggregating any set of areal units. However, as we explore more and more the complex systems defined by cities and regions, surely there is a need to start integrating a more realistic anisotropic view of space, one in which the friction of distance from any given point in any given direction is defined by the underlying demography, built environment and/or infrastructure.

One such attempt at this, AMOEBA (A Multidirectional Optimum Ecotope-Based Algorithm), developed by Aldstadt and Getis, is worth noting. In this algorithm, zones are defined via the Getis-Ord Gi* statistic, a local statistic for identifying clustering; zones are thus defined by local conditions, which are free to vary anisotropically across space, rather than by a predefined preference for circles. Spectacularly, this algorithm is implemented in the superb clusterpy Python module for spatially constrained clustering.
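
As an aside, the Gi* statistic itself is easy to experiment with using the present-day PySAL family of packages rather than clusterpy; a hedged sketch, with hypothetical file and field names:

import numpy as np
import libpysal
from esda.getisord import G_Local

# hypothetical inputs: a shapefile of areal units with a numeric attribute "POP"
w = libpysal.weights.Queen.from_shapefile("units.shp")
f = libpysal.io.open("units.dbf")
y = np.array(f.by_col("POP"))
gi_star = G_Local(y, w, star=True)  # Gi* includes the focal unit in its own sum
hotspots = gi_star.Zs > 1.96        # z-scores; large positive values flag local clusters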


Mapping Spatial Entropy in Southwark

.........................................................

I’ve been doing a bit of work recently on segregation with Pablo Mateos, and having gone through the motions with the aspatial indices of segregation (the classics): dissimilarity, exposure and so on, I decided to investigate the more explicitly spatial ones. Taking a lead from Reardon and O’Sullivan’s (2004) paper “Measures of Spatial Segregation” in Sociological Methodology, I got in touch with David O’Sullivan, and he and his student Seong-Yun Hong helped me with the implementation of some spatial measures of segregation. This post specifically concerns spatially weighted entropy, a measure of population diversity. Reardon and O’Sullivan define spatially weighted entropy as:
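
In my notation, and as best I can restate it from the description that follows, the entropy of the neighbourhood around a location p is:

$$\tilde{E}_p = -\sum_{m} \tilde{\pi}_{pm}\,\ln \tilde{\pi}_{pm}$$

where $\tilde{\pi}_{pm}$ is the proportion of group m in the neighbourhood around p.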

This equation describes the ‘entropy’, derived from Shannon’s information theory, for each grid cell in the image below, in which each cell value results from the entropy computed for a 1km ‘neighbourhood’ p around each cell (essentially a circular buffer). The ethnic group in question is given by m (with π representing the proportion of a given group in a given neighbourhood) and relates to ethnic groups defined from the Southwark patient register using Onomap; the groups defined are: African, East Asian and Pacific, European, Muslim, South Asian, British, Eastern European, Hispanic, and Unclassified or Other. The Onomap software is able to apply this classification by looking at the forename and surname combination of patients registered to use Southwark GPs, or patients living in Southwark but using GPs outside of Southwark.

The cells in the image relate directly to the residential locations of patients, who were geocoded to their household using the Ordnance Survey’s Address Layer 2 product; empty cells are therefore areas within which no recorded patients were found, such as parks and transport infrastructure. As the data underlying this is from patient registrations with GPs, we have to accept that it is likely to be partial, with potentially systematic biases in who has registered: young men, and people from countries where GPs as a method of primary care do not exist, may have been omitted.
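
Computationally, the per-neighbourhood value is straightforward once the group proportions are known; a minimal sketch in numpy (function and variable names are mine, not from the actual implementation):

import numpy as np

def neighbourhood_entropy(counts):
    # Shannon entropy of one neighbourhood, given counts per ethnic group
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()
    p = p[p > 0]  # 0 * log(0) is treated as 0
    return -np.sum(p * np.log(p))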

In the image, higher values of entropy indicate greater diversity of population by ethnic group. The resultant image is unsurprising in terms of Southwark, with the Dulwich Village area showing as the least diverse place, home as it is to more affluent, generally ‘British’ groups. Likewise, historical factors regarding access to housing have shaped the lower entropy scores in the middle of the borough, home to African populations, and in the north-east, home to the British working classes who were rehoused from the now more African areas in the middle of the borough. Finally, the greater Waterloo and Elephant and Castle region in the north-west shows up as the ethnic melting pot of the borough.

In the image above, the 1km neighbourhood defined in the spatially weighted entropy score has a smoothing effect. I experimented with smaller values for the neighbourhood size, and found that the resultant output did not change dramatically from that obtained above. At the end of the day, the selection of neighbourhood size is largely arbitrary and will depend on sociocultural factors of the area and its people. Similarly, as there is no data for the regions outside of Southwark, we are more uncertain of the values at the edges than in the middle of the borough, because we are only sampling from within Southwark itself. Nonetheless, this representation of Southwark goes somewhat beyond what is possible using the commonly used output zones defined by the census.


Spatial Design for GP Consortia?

.........................................................

The government is set to release a bill detailing how it expects the proposed GP consortia to work. GP consortia, groups of GPs working together, are set to replace the current structure of Primary Care Trusts (PCTs) and Strategic Health Authorities (SHAs) as the mechanism through which primary healthcare is provided to the public and services are commissioned. Recently, the planned wholesale changes to the NHS have come under sustained attack from the media, professional bodies and MPs; meanwhile the plans for GP consortia have moved into a trial phase in which different setups are being tested for their effectiveness. The trial consortia demonstrate the extent to which the plans represent a completely new venture, with a broad spectrum of possibilities being tested in terms of consortia templates, from a ‘consortium’ of a mere 3 GP practices to a vast group of 83 GP practices. There seems to be little reasoning behind how consortia are allowed to form at the moment, so I saw an interesting opportunity to consider the ‘GP Consortia Problem’ as a geographic question. This matters because the NHS is mandated to provide an equitable and universal service, and an unfettered ability for GPs to ‘consort’ may well lead to increasing inequities in healthcare provision.

I see the ‘GP Consortia Problem’ as solvable through a zone-design approach. To do this, I identify contiguity between all English GPs and employ spatially constrained clustering. The following assumptions are made:

  • Distance is important: GP consortia should be space-covering, without holes or islands, therefore a ‘neighbour’ approach to contiguity is advocated using graphs.
  • As a preliminary test, GPs are considered to be equal, although there is scope in the future to develop measures of dissimilarity and homogeneity which would provide better, or more appropriate, solutions to the GP Consortia problem.
  • Based on the trials, I assume that consortia must consist of at least 35 GPs, the average number of GPs per consortium in the trial phase.

I have used two approaches to creating contiguity amongst the English GP practices, both of them graph-theoretic concepts based upon geometric analyses: the Delaunay triangulation and the Gabriel graph. The Gabriel graph is a subgraph of the Delaunay triangulation, and as such is sparser than the Delaunay graph. The two graphs are defined as:

  • Delaunay Triangulation - for a set of nodes (GP practices), the Delaunay triangulation is the set of triangles whose three vertices lie on the perimeter of a circle (the circumcircle) that contains no other points, considered over all such triples of points.
  • Gabriel Graph - two nodes are connected if the circle whose diameter is the segment between them contains no other points, considered over all pairs of points.

In this sense, both the Delaunay triangulation and the Gabriel graph are nearest-proximity measures. Having obtained the graphs, the differences can be seen below. Note that both graphs have been constrained to the English boundary.

Having created the ‘contiguity’ graphs, I wrote a short Python script to extract the relationships between GPs and write the output as a ‘.gal’ file for use with PySAL. I utilised the PySAL regionalisation module to compute the consortia solutions; I have used this previously on my blog, so I won’t go into detail on it. I parameterised the solution using the contiguity matrices created, assuming equality amongst GP practices, and looking for groups of at least 35 GPs. The regionalisations were then joined to a special areal geography I created for visualisation, which is simply the Voronoi diagram of the English GPs clipped to the English boundary. The results are below:
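
Out of interest, a sketch of what such a script might look like using scipy’s Delaunay triangulation, writing the adjacency out in the simple GAL layout of an id/neighbour-count line followed by the neighbour ids (the file names and point array are hypothetical, and the header line may need adjusting for your PySAL version):

import numpy as np
from scipy.spatial import Delaunay

points = np.loadtxt("gp_practices.csv", delimiter=",")  # hypothetical x,y per GP practice
tri = Delaunay(points)

# collect neighbours: two practices are contiguous if they share a triangle edge
neighbours = {i: set() for i in range(len(points))}
for a, b, c in tri.simplices:
    for i, j in ((a, b), (b, c), (a, c)):
        neighbours[i].add(j)
        neighbours[j].add(i)

# write a GAL-style contiguity file: "id n_neighbours" then the neighbour ids
with open("gp_delaunay.gal", "w") as f:
    f.write("%d\n" % len(points))
    for i in sorted(neighbours):
        f.write("%d %d\n" % (i, len(neighbours[i])))
        f.write(" ".join(str(j) for j in sorted(neighbours[i])) + "\n")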

In these results it is notable that the Gabriel graph gives a cleaner result; the density of the Delaunay-based contiguity matrix means that its result is subject to some sliver-like polygons in the regionalisation, and ‘spikier’ regions in general.

Of course, this is just a test, but it does point at the potential to create a rationalised system of GP consortia. Naturally, the biggest issue with these maps is that they only establish an areal depiction of consortia, one that is largely irrelevant, because the actual service areas of GPs tend to overlap and extend beyond any given GP’s Voronoi-defined footprint. The geography of patients therefore requires a subsequent treatment once a geography of consortia has been established, and only in the interaction of the two can issues pertaining to equity be understood.


Frozen Britain and No Central Heating?

.........................................................

I liked Ben Hennig’s population cartogram of the UK under snow, but I thought it could perhaps show something a little more serious than simply where the people are. To do this I went to the UK Census 2001 (I know, an old data source, but the only thing I was aware of that could help me) and downloaded a dataset of counts by area (LSOA) of households without central heating. Using these counts as a base population, I created the cartogram below.

Whilst very similar to Ben’s cartogram, there are some differences: notably Scotland is not as prominent as in Ben’s. Perhaps the higher frequency of harsh winters in Scotland has made central heating a necessity, which also seems to be true of the far north of England. Likewise, Wales shrinks away everywhere apart from Cardiff, which shows a notable bulge of people without central heating. It is clear, however, that the people most affected by a lack of central heating are those living in the south and middle of England in large population centres such as London; perhaps complacency about cold weather, a stock of substandard housing, or high levels of deprivation have caused this. Needless to say, it is likely to be these people who disproportionately feel the cold this winter.


Representing Populations: a Spatial Ecology

.........................................................

A subtitle to this post might also be: are we all being misled by the New York Times? In stating this I am referring to the recent maps released by the New York Times looking at ethnic distributions from the US Census Bureau’s American Community Survey.

The most immediate thing we can learn about this project is that it is a spatial ecology, that is, an examination of the spatial patterning of a phenomenon, here ethnicity, at a given level of spatial aggregation, in this case “every city, every block”. This much is apparent both when you drag the mouse across the geography of America and the Census areas are highlighted, and when you zoom in and navigate from the Census tract level to the Census block level, a finer-scale areal aggregation.

On the one hand, what has been achieved in this map is tremendous, and the use of dot density mapping allows for a singular look at multivariate data. The sheer level of residential segregation in the US also makes the dot density approach a very persuasive cartographic representation. However, first let us consider what the dot density approach is.

First and foremost, it is important to note that the dot density approach does not represent the real-world locations of individuals; far from it, dot density maps are simply another way of drawing a choropleth map. Choropleth maps show data aggregated into predefined areas (e.g. Census Blocks) and thematically colour these areas based upon some classification of the share of the mapped phenomenon that each area has. In a dot density map, each dot represents an observation, or a number of observations, that occurs within an area, and each dot is then randomly positioned within that area. This means that phenomena are not drawn strictly where they were sampled, which can (in increasingly large areas) lead to increasingly large uncertainties and misrepresentations. A higher number of dots within an area indicates a greater number of observations, with density described by the relative spacing of the dots in each area: smaller spacings indicate higher density.
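
To make the mechanics concrete, here is a toy version of that random placement step using Shapely; it is only an illustration of the principle, not the New York Times’ actual implementation:

import random
from shapely.geometry import Point

def dot_density(polygon, n_dots):
    # scatter n_dots random points inside a polygon by rejection sampling
    minx, miny, maxx, maxy = polygon.bounds
    dots = []
    while len(dots) < n_dots:
        p = Point(random.uniform(minx, maxx), random.uniform(miny, maxy))
        if polygon.contains(p):  # anywhere in the area qualifies, hence the problem
            dots.append(p)
    return dots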

Herein lies the difficulty – most ways of dividing up territory, and census delineations in particular, use a space covering approach. This continuous, spatially extensive way of dividing up land means that all land areas, even areas that have no people living in them, are potentially subject to the random placement of a dot, in the image below this is shown by the placement of dots in water bodies. Dot density can be logically unsound, particularly when two adjoining census blocks have significantly different population densities, shown by the representation of apparently hard ‘edges’ at areal boundaries as in the image below.

One solution that could mitigate the issues of representing areal data using dot density maps would be to apply dasymetric mapping. Dasymetric mapping is a method of reallocating a population recorded on a continuous areal basis to a set of areas which better represent where people actually are. To do this, more information than simply population counts is usually required, such as land-use classifications or delineations of developed areas. In reallocating population counts from an areal unit created on a continuous basis to one which aims at a more realistic placing of people in space, the volume of people per area is preserved; this means that you will never end up with more or fewer people than you started with. David Martin has, in the UK, been responsible for some notable dasymetric outputs with regard to the UK Census, and provides a software tool, SurfaceBuilder, here.
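
The volume-preserving bookkeeping at the heart of this is simple; a toy sketch, assuming we already have a boolean grid marking the developed cells of a single census block:

import numpy as np

def dasymetric_reallocate(block_population, developed_mask):
    # spread a block's population over its developed cells only, preserving the total
    developed_mask = np.asarray(developed_mask, dtype=bool)
    out = np.zeros(developed_mask.shape)
    n_cells = developed_mask.sum()
    if n_cells > 0:
        out[developed_mask] = block_population / float(n_cells)
    return out  # out.sum() equals block_population, so no people are gained or lost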

The overarching goal of dasymetric mapping is to circumvent the ecological fallacy, which manifests itself in the issues I have suggested exist in the dot density mapping of the US. Whilst dasymetric mapping would resolve some issues, dot density would still be subject to some mislocation of data, which largely stems from the conflicting ontology of representing areal data, such as a population count by census area, as a series of points within that area; it is too easy for the viewer to interpret the points as having some level of significance above and beyond the areal container within which they sit. It is therefore useful that the New York Times mapping also provides an option to look solely at thematic choropleths, classified by colouring the areas for each individual ethnicity. In this representation the viewer cannot confer the same kind of absolute interpretation upon the meaning or location of points, as they may do for dot density representations.

