The Jenks Optimal, or Jenks’ Natural Breaks, algorithm is a common method for classifying the data presented in a choropleth map. It aims to find a series of break values that best represent the actual breaks observed in the data, as opposed to some arbitrary classification scheme (e.g. equal interval). In this way the actual clustering of data values is preserved (subject to the arbitrary specification of k classes). The method was originally published in George Jenks’ (1977) Optimal Data Classification for Choropleth Maps and reportedly represented the culmination of 15 years’ research on the topic. It derives primarily from Walter Fisher’s work ‘On grouping for maximum homogeneity’. The algorithm aims to create k classes such that the variance within classes is minimised; as such, it is a problem of numerical optimisation.
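To make the optimisation concrete, here is a minimal sketch of one way to find the k classes with the smallest total within-class sum of squared deviations, using dynamic programming over the sorted values. It is an illustration only, not the script referred to in this post or Jenks' original code; the function name `natural_breaks` and its helpers are made up for this example, and the O(k·n²) loop is only practical for small datasets.

```python
def natural_breaks(values, k):
    """Split the sorted values into k contiguous classes that minimise the
    total within-class sum of squared deviations (hypothetical helper)."""
    data = sorted(values)
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of values")

    # Prefix sums of x and x^2 give the squared deviations of any slice in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(data):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def ssd(i, j):
        """Sum of squared deviations from the mean of data[i:j]."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: smallest total SSD when the first j values form c classes.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    # start[c][j]: index at which the last of those c classes begins.
    start = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    start[c][j] = i

    # Walk back through the table to recover the classes, last class first.
    classes = []
    j = n
    for c in range(k, 0, -1):
        i = start[c][j]
        classes.append(data[i:j])
        j = i
    classes.reverse()
    return classes


print(natural_breaks([50, 1, 11, 2, 52, 10, 3, 51, 12], 3))
# [[1, 2, 3], [10, 11, 12], [50, 51, 52]]
```

The break values for the map legend are then simply the largest value in each class.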
A paper by Michael Coulson (1987) entitled In The Matter Of Class Intervals For Choropleth Maps: With Particular Reference To The Work Of George F Jenks details a method that Jenks apparently authored, but never published, for assessing how well a chosen number of classes fits the data. This method, the Goodness of Variance Fit (GVF), takes the difference between the sum of squared deviations from the array mean (SDAM) and the sum of squared deviations from the class means (SDCM), and divides it by the SDAM. Thus:
GVF = (SDAM – SDCM)/SDAM
However, it is likely that this was never published because the GVF improves as the number of classes increases, until, at the point where there are as many classes as data points, the GVF reaches unity. Nonetheless, I have included a rudimentary example for calculating this statistic. In practice, the method is used to generalise data into a few classes for visualisation, so you are unlikely to be using more than 7 (+/- 2) classes. The number of classes can be loosely chosen by looking at the distribution histogram, but this is often difficult.
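To make the formula concrete, here is a minimal sketch, separate from the linked script, that computes the GVF for data that has already been split into classes. The function name `goodness_of_variance_fit` and the list-of-lists input are assumptions made for this illustration; it also assumes every class is non-empty and that the values are not all identical (otherwise SDAM is zero and the ratio is undefined).

```python
def goodness_of_variance_fit(classes):
    """GVF = (SDAM - SDCM) / SDAM for a list of non-empty classes
    (hypothetical helper, each class being a list of numbers)."""
    values = [x for cls in classes for x in cls]

    # SDAM: squared deviations of every value from the overall (array) mean.
    array_mean = sum(values) / len(values)
    sdam = sum((x - array_mean) ** 2 for x in values)
    if sdam == 0:
        raise ValueError("all values are identical, so the GVF is undefined")

    # SDCM: squared deviations of every value from its own class mean.
    sdcm = 0.0
    for cls in classes:
        class_mean = sum(cls) / len(cls)
        sdcm += sum((x - class_mean) ** 2 for x in cls)

    return (sdam - sdcm) / sdam


one_class = [[1, 2, 3, 10, 11, 12, 50, 51, 52]]
three_classes = [[1, 2, 3], [10, 11, 12], [50, 51, 52]]
nine_classes = [[x] for x in one_class[0]]

print(goodness_of_variance_fit(one_class))      # 0.0
print(goodness_of_variance_fit(three_classes))  # about 0.9985
print(goodness_of_variance_fit(nine_classes))   # 1.0
```

The three calls show the behaviour described above: a single class explains none of the variance, and one class per data point pushes the GVF to unity, so a high GVF on its own does not tell you that the number of classes is sensible.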
The script is here.
Acknowledgement: The initial script I used for the Python conversion can be found (in Java and Fortran) here: https://stat.ethz.ch/pipermail/r-sig-geo/2006-March/000811.html