Pump Neighborhoods




John Snow published two versions of the cholera map. The first, which appeared in On The Mode Of Communication Of Cholera (Snow 1855a), is the more famous. The second, which appeared in the Report On The Cholera Outbreak In The Parish Of St. James, Westminster, During The Autumn Of 1854 (Snow 1855b), is the more important. What makes it so is that Snow adds a graphical annotation that outlines the neighborhood around the Broad Street pump, the set of addresses that he contends is most likely to use the pump:

By identifying the pump’s neighborhood, Snow sets limits on where we should and where we should not find fatalities. Ideally, this would help support his claims that cholera is a waterborne disease and that the Broad Street pump is the source of the outbreak. Looking at the second map Snow writes: “it will be observed that the deaths either very much diminish, or cease altogether, at every point where it becomes decidedly nearer to send to another pump than to the one in Broad street” (Snow 1855b, 109).

To help assess whether the map supports Snow’s arguments, I provide functions that allow you to analyze and visualize two flavors of pump neighborhoods: Voronoi tessellation, which is based on the Euclidean distance between pumps, and walking distance, which is based on the paths travelled along the network of roads. In either case, the guiding principle is the same. All else being equal, people will choose the closest pump.

Voronoi tessellation

Cliff and Haggett (1988) appear to be the first to use Voronoi tessellation1 to compute pump neighborhoods. In their digitization of Snow’s map, Dodson and Tobler (1992) include coordinates for 13 Voronoi cells. These are available in HistData::Snow.polygons. To replicate that effort, I use deldir::deldir(). With the exception of the border between the neighborhoods of the Market Place and the Adam and Eve Court pumps (pumps #1 and #2), I find that Dodson and Tobler’s computations are identical to those produced by the ‘deldir’ package.
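A point lies in a pump's Voronoi cell exactly when that pump is its nearest pump by Euclidean distance, so cell membership can be checked without constructing the cells. The following is a minimal base R sketch of that idea, not the package's implementation. The coordinates and the helper nearestPump() are hypothetical and exist only for illustration.

```r
# Hypothetical coordinates, for illustration only (not the package's data).
pumps <- data.frame(id = 1:3, x = c(0, 10, 5), y = c(0, 0, 8))
cases <- data.frame(x = c(1, 9, 5, 4), y = c(1, 2, 6, 1))

# Return the ID of the pump with the smallest Euclidean distance to (x, y).
nearestPump <- function(x, y, pumps) {
  d <- sqrt((pumps$x - x)^2 + (pumps$y - y)^2)
  pumps$id[which.min(d)]
}

# Assign each case to its nearest pump, i.e., to its Voronoi cell.
cases$pump.id <- mapply(nearestPump, cases$x, cases$y,
  MoreArgs = list(pumps = pumps))

# Tabulate cases per pump, keeping pumps with zero cases.
counts <- table(factor(cases$pump.id, levels = pumps$id))
counts
```

Tabulating against the full set of pump IDs keeps empty neighborhoods in the result, which matters when a pump has no nearby cases.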

To explore the data using this approach, you can use neighborhoodVoronoi() to create scenarios with different sets of neighborhoods based on the pumps you select. The figure below plots the 321 fatality “addresses” and the Voronoi cells for the 13 pumps in the original map.


The next figure plots the same data but excludes the Broad Street pump from consideration.


In either case, the numerical results can be summarized using the print() method. Note that “Pearson” is the difference between “Count” and “Expected”, divided by the square root of “Expected”:

# print(neighborhoodVoronoi()) or
neighborhoodVoronoi()
##    pump.id Count Percent  Expected    Pearson
## 1        1     0    0.00 19.491634 -4.4149330
## 2        2     1    0.31  6.234668 -2.0964402
## 3        3    10    3.12 13.983773 -1.0653256
## 4        4    13    4.05 30.413400 -3.1575562
## 5        5     3    0.93 26.463696 -4.5611166
## 6        6    39   12.15 39.860757 -0.1363352
## 7        7   182   56.70 27.189136 29.6895576
## 8        8    12    3.74 22.112172 -2.1504470
## 9        9    17    5.30 15.531287  0.3726776
## 10      10    38   11.84 18.976903  4.3668529
## 11      11     2    0.62 24.627339 -4.5595790
## 12      12     2    0.62 29.671129 -5.0799548
## 13      13     2    0.62 46.444106 -6.5215206
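The “Pearson” column can be reproduced from the “Count” and “Expected” columns alone. The sketch below copies those two columns from the printed output; the variable names count, expected and pearson are illustrative and are not part of the package.

```r
# Values copied from the Count and Expected columns printed above.
count <- c(0, 1, 10, 13, 3, 39, 182, 12, 17, 38, 2, 2, 2)
expected <- c(19.491634, 6.234668, 13.983773, 30.413400, 26.463696,
  39.860757, 27.189136, 22.112172, 15.531287, 18.976903, 24.627339,
  29.671129, 46.444106)

# Pearson residual: (observed - expected) / sqrt(expected).
pearson <- (count - expected) / sqrt(expected)
round(pearson, 2)
```

A large positive residual, as for pump 7 (Broad Street), indicates many more fatalities in that cell than expected; negative residuals indicate fewer.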

Walking distance

The obvious criticism of using Voronoi tessellation to analyze Snow’s map is that the neighborhoods it describes are based solely on the Euclidean distance between water pumps. Roads and buildings don’t matter. In this version of the world, people walk to water pumps in perfectly straight lines rather than along the twists and turns of paths created by roads and streets.

Not only is this unrealistic, it’s also contrary to how Snow thought about the problem. Snow’s graphical annotation appears to be based on a computation of walking distance. He writes: “The inner dotted line on the map shews [sic] the various points which have been found by careful measurement to be at an equal distance by the nearest road from the pump in Broad Street and the surrounding pumps …” (Snow 1855b, 109).

While the details of his computations seem to be lost to history, I replicate and extend his efforts by writing functions that allow you to compute and visualize pump neighborhoods based on walking distance.2 My implementation works by transforming the roads on the map into a network graph and turning the computation of walking distance into a graph theory problem. For each case (observed or simulated), I compute the shortest path, weighted by the length of roads, to the nearest pump. Then, by drawing the unique paths for all cases, a pump’s neighborhood emerges:


The summary results are:

# print(neighborhoodWalking()) or
neighborhoodWalking()
##   3   4   5   6   7   8   9  10  11  12 
##  12   6   1  44 189  14  32  20   2   1
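The shortest-path step described above can be illustrated on a toy network. The sketch below uses Dijkstra's algorithm, one standard way to find shortest paths on a graph weighted by road length, written in base R. It is not the package's code: the road segments, node names and the helper shortestDistances() are all hypothetical.

```r
# Hypothetical road network: each row is a road segment joining two nodes,
# weighted by its length. Names and values are illustrative only.
roads <- data.frame(
  from = c("case", "a", "a", "b", "c"),
  to = c("a", "b", "c", "pumpA", "pumpB"),
  length = c(2, 5, 1, 1, 3),
  stringsAsFactors = FALSE
)

# Dijkstra's algorithm on an undirected graph stored as an edge list.
# Returns the shortest distance from `source` to every node.
shortestDistances <- function(edges, source) {
  nodes <- unique(c(edges$from, edges$to))
  dist <- setNames(rep(Inf, length(nodes)), nodes)
  dist[source] <- 0
  unvisited <- nodes
  while (length(unvisited) > 0) {
    # Visit the closest node that has not yet been finalized.
    current <- unvisited[which.min(dist[unvisited])]
    unvisited <- setdiff(unvisited, current)
    # Relax every segment touching the current node, in either direction.
    for (i in which(edges$from == current | edges$to == current)) {
      neighbor <- if (edges$from[i] == current) edges$to[i] else edges$from[i]
      candidate <- dist[current] + edges$length[i]
      if (candidate < dist[neighbor]) dist[neighbor] <- candidate
    }
  }
  dist
}

# Walking distance from the case to every node, then the nearest pump.
d <- shortestDistances(roads, "case")
pump.nodes <- c("pumpA", "pumpB")
nearest <- pump.nodes[which.min(d[pump.nodes])]
nearest
```

Repeating this for every case and grouping cases by their nearest pump gives the walking-distance analogue of a Voronoi cell.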

“Expected” walking neighborhoods

To get a sense of the full extent of a walking neighborhood, I apply the approach above to “expected” or simulated data. Using sp::spsample() and sp::Polygon(), I place 20,000 regularly spaced points, which lie approximately 6 meters apart, across the map and essentially compute the shortest path from each point to the nearest pump.3

I visualize the results in two ways. In the first, I identify neighborhoods by coloring roads.4

plot(neighborhoodWalking(case.set = "expected"))

In the second, I identify neighborhoods by coloring regions using points or polygons.5 The points approach, shown below, is faster and more robust.

plot(neighborhoodWalking(case.set = "expected"), type = "area.points")

The main virtue of the polygon approach is that it better lends itself to building graphs at different scales. Details about the implementation of neighborhood polygons are found in this vignette’s lab notes, which are available online and in the package’s GitHub repository.

streetNameLocator("marshall street", zoom = TRUE, highlight = FALSE,
  add.title = FALSE, radius = 0.5)