Recreating the 2010 Rwanda DHS using GridSample

Nick W Ruktanonchai, Dana R Thomson

2018-08-07

Here we use the GridSample package (https://doi.org/10.1186/s12942-017-0098-4) to replicate the sample design of the 2010 Rwanda DHS. Further details are available at: <www.dhsprogram.com/pubs/pdf/FR259/FR259.pdf>.

Parameters

The 2010 Rwanda DHS sampled 12,540 households from 492 PSUs comprising rural villages and urban neighborhoods (30). The sample was stratified by Rwanda’s 30 districts, urban areas were oversampled by adding 12 PSUs in Kigali’s three districts, and 26 households were sampled from each urban and rural PSU. The average village in Rwanda had 610 occupants according to the sample frame.

Therefore, the corresponding parameters for the gs_sample algorithm are

cfg_hh_per_stratum <- 416 #
cfg_hh_per_urban <- 26 # households from each urban PSU
cfg_hh_per_rural <- 26 # households from each rural PSU
cfg_pop_per_psu <- 610 # limit PSU size to the average village size

Data preparation

We start with two files, a population raster and a shapefile RWAshp defining the various strata in the census. Here, the strata shapefile consists of the 30 districts. The population raster can be downloaded from <www.worldpop.org.uk>. For the purposes of this example, we will use the population raster that estimates population in Rwanda for 2010, adjusted using census estimates, where each grid-cell represents the population per pixel. The corresponding raster when downloaded from the WorldPop website is named RWA_ppp_v2b_2010_UNadj.tif.

First, we read in the relevant datasets, and rasterize the strata layer using a field that contains a unique identifier for each polygon.

library(gridsample)
library(raster)
## Warning: package 'raster' was built under R version 3.4.4
## Loading required package: sp
## Warning: package 'sp' was built under R version 3.4.4
population_raster <- raster(system.file("extdata", "RWA_ppp_2010_adj_v2.tif",
  package="gridsample"))
plot(population_raster)

data(RWAshp)
strata_raster <- rasterize(RWAshp,population_raster,field = "STL.2")
plot(strata_raster)

We need a raster that defines urban and rural areas as well. Here, we’ll define urban areas by determining the population cell density associated with the official proportion of people in an urban area. The 2012 Rwanda census found 16% of people to live in urban areas, so we’ll change the most dense cells as urban, until 16% of the population is defined as being in urban areas.

total_pop <- cellStats(population_raster, stat = "sum")
urban_pop_value <- total_pop * .16
pop_df <- data.frame(index = 1:length(population_raster[]),
  pop = population_raster[])
pop_df <- pop_df[!is.na(pop_df$pop), ]
pop_df <- pop_df[order(pop_df$pop,decreasing = T), ]
pop_df$cumulative_pop <- cumsum(pop_df$pop)
pop_df$urban <- 0
pop_df$urban[which(pop_df$cumulative_pop <= urban_pop_value)] <- 1
urban_raster <- population_raster >= min(subset(pop_df,urban == 1)$pop)
plot(urban_raster)

Using gs_sample

Now that we have the population, strata, and urbanization rasters, we are ready to use the gridsample algorithm. We exclude any grid cells with a population less than 0.01 to prevent the algorithm from creating overly large sampling areas (cfg_min_pop_per_cell = 0.01), also restricting the total PSU size to less than 10km^2 (cfg_max_psu_size = 10). Finally, because we wanted the sample to be representative of both urban and rural areas, we allowed the algorithm to oversample rural or urban areas by setting cfg_sample_rururb = TRUE. We save the output shapefile to a temporary directory.

psu_polygons <- gs_sample(
  population_raster = population_raster,
  strata_raster = strata_raster,
  urban_raster = urban_raster,
  cfg_desired_cell_size = NA,
  cfg_hh_per_stratum = 416,
  cfg_hh_per_urban = 26,
  cfg_hh_per_rural = 26,
  cfg_min_pop_per_cell = 0.01,
  cfg_max_psu_size = 10,
  cfg_pop_per_psu = 610,
  cfg_sample_rururb = TRUE,
  cfg_sample_spatial = FALSE,
  cfg_sample_spatial_scale = 100,
  output_path = tempdir(),
  sample_name = "rwanda_psu"
)
## [1] "Beginning gs_sample() Process:"
## [1] "Randomly Stratifying PSU Cell IDs:"
## [1] "Processing stratum ID 6 ..."
## [1] "Processing stratum ID 1 ..."
## [1] "Processing stratum ID 4 ..."
## [1] "Processing stratum ID 2 ..."
## [1] "Processing stratum ID 3 ..."
## [1] "Processing stratum ID 5 ..."
## [1] "Processing stratum ID 7 ..."
## [1] "Processing stratum ID 8 ..."
## [1] "Checking urban and rural domains meet stratum HH sample size"
## [1] "Reassignment for stratum 6 of 1 cells processed"
## [1] "Begin Second Stage Sampling, Growing PSUs if requested:"
## 
##      PLEASE NOTE:  The components "delsgs" and "summary" of the
##  object returned by deldir() are now DATA FRAMES rather than
##  matrices (as they were prior to release 0.0-18).
##  See help("deldir").
##  
##      PLEASE NOTE: The process that deldir() uses for determining
##  duplicated points has changed from that used in version
##  0.0-9 of this package (and previously). See help("deldir").
## [1] "PSUs to process: 128"
## [1] "Active PSUs:"
##   [1] 110  83 127  46  61  93  51 109  12  32  59 114  27 116  55  64   3
##  [18]  50  58 106   7  54  23  21  96  19 101 104  72 115  20 102 123   8
##  [35]  84  75  62  49  63  30  44  25 122 120  36  43  91  68  14  85  22
##  [52] 126  13  65  77  67  42  89 119 124  15  10  11  97  39  98  28  78
##  [69]   5  18   2 125   6  76  95  38 117  35  87  26  88  69  24 121  56
##  [86]  31  74  52  86 112  53  81  45 128  34 107  29  33 111  40  60 113
## [103]  92  80  90  48  57 118  47  17   1  41 105   9  37  99  73 108  82
## [120]  71  70  16 100  94 103  66   4  79
## [1] "Active PSU Population Sums:"
##   [1]  1144.6572  1306.7236  3424.1336  1199.3570  1160.2891  3161.4045
##   [7]  8484.4252  1232.6077  1032.3489   860.4852   955.7482   415.9963
##  [13]  1315.1844  1320.9459   848.6110  1473.1985  1091.2181   979.1264
##  [19]   959.1565  1319.7061  1568.2858  2152.9234  1028.0767  1128.2486
##  [25] 21556.3594   613.1150  1059.5182  1300.3492  1393.4698  1508.1392
##  [31]   131.3457  2043.6701  1704.5570  1028.6012  2095.1550  2064.2653
##  [37]  1501.9663  1029.0230   851.2141  2175.0904 10245.2136  1364.6390
##  [43]   259.5918  2494.7397   987.5451 47911.0375  1599.3976 17843.5810
##  [49]  1658.7823  4947.8855  1956.4188   607.0674  1473.3615  1547.9205
##  [55]   650.6719  2224.9154  1006.0417  2138.3901  1964.6719  1180.9378
##  [61]  1031.0676   521.2857   405.3794  1292.4436  1313.3385  1093.5068
##  [67]   452.1606  4930.7139  1308.3001  1162.5517   962.5251  1120.7299
##  [73]  5203.7464  1480.8351  1100.8640   708.2816  4699.4580  2682.2158
##  [79]   488.4512   948.5000  1034.0282  1624.3811  1273.5639   947.1859
##  [85]  2408.5114  1249.9432  1331.9261  1846.3818  1210.3671  1200.2283
##  [91]   901.4919   674.9285  1938.3210   947.4876   885.8436   440.3320
##  [97]  1382.3109   918.2573  2271.3246   964.0099 10901.2984 12736.4476
## [103]   598.6255  1366.0017  1109.0397  1389.2465   848.9205  1970.9421
## [109]   903.6887  1095.1059  1100.1390   918.5493  1760.1123  1919.0557
## [115]   783.2866   664.7677  1470.7730   933.0224   718.6760   535.4312
## [121]  4722.8411  3274.3940   506.9392   537.9949  1084.5628  1480.0378
## [127]  1131.8322 22994.9311
## [1] "Active PSU Cell Count:"
##   [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [36] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [71] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [106] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1] "Processing 114 ..."
## [1] "... cells added: 1"
## [1] "... population added: 282.767237246037"
## [1] "... population total: 698.763534963131"
## [1] "Processing 20 ..."
## [1] "... cells added: 5"
## [1] "... population added: 491.487710639834"
## [1] "... population total: 622.833435609937"
## [1] "Processing 122 ..."
## [1] "... cells added: 1"
## [1] "... population added: 631.574843214435"
## [1] "... population total: 891.166687278653"
## [1] "Processing 126 ..."
## [1] "... cells added: 1"
## [1] "... population added: 470.618171453476"
## [1] "... population total: 1077.68557143211"
## [1] "Processing 10 ..."
## [1] "... cells added: 1"
## [1] "... population added: 644.818376779556"
## [1] "... population total: 1166.10408675671"
## [1] "Processing 11 ..."
## [1] "... cells added: 1"
## [1] "... population added: 299.59882748127"
## [1] "... population total: 704.978253483772"
## [1] "Processing 28 ..."
## [1] "... cells added: 1"
## [1] "... population added: 581.901154398918"
## [1] "... population total: 1034.06173014641"
## [1] "Processing 87 ..."
## [1] "... cells added: 1"
## [1] "... population added: 219.939943730831"
## [1] "... population total: 708.391180574894"
## [1] "Processing 107 ..."
## [1] "... cells added: 1"
## [1] "... population added: 565.452972531319"
## [1] "... population total: 1005.78496551514"
## [1] "Processing 92 ..."
## [1] "... cells added: 1"
## [1] "... population added: 890.719133973122"
## [1] "... population total: 1489.34467315674"
## [1] "Processing 71 ..."
## [1] "... cells added: 1"
## [1] "... population added: 550.031454861164"
## [1] "... population total: 1085.46263051033"
## [1] "Processing 100 ..."
## [1] "... cells added: 1"
## [1] "... population added: 574.137712478638"
## [1] "... population total: 1081.07694292068"
## [1] "Processing 94 ..."
## [1] "... cells added: 1"
## [1] "... population added: 854.525423288345"
## [1] "... population total: 1392.52030861378"
## [1] "Converting PSU Cells to Polygons and Summarizing Metrics:"
plot(psu_polygons)