```
library(redist)
library(igraph)
#>
#> Attaching package: 'igraph'
#> The following objects are masked from 'package:stats':
#>
#> decompose, spectrum
#> The following object is masked from 'package:base':
#>
#> union
library(spdep)
#> Warning: package 'spdep' was built under R version 3.6.2
#> Loading required package: sp
#> Warning: package 'sp' was built under R version 3.6.2
#> Loading required package: spData
#> Warning: package 'spData' was built under R version 3.6.2
#> To access larger datasets in this package, install the spDataLarge
#> package with: `install.packages('spDataLarge',
#> repos='https://nowosad.github.io/drat/', type='source')`
#> Loading required package: sf
#> Warning: package 'sf' was built under R version 3.6.2
#> Linking to GEOS 3.7.2, GDAL 2.4.2, PROJ 5.2.0
library(coda)
set.seed(1)
```

The `redist`

package is designed to allow for replicable redistricting simulations. The package comes loaded with data for simple testing and with functions to simulate redistricting and to run diagnostics on the redistricting plans created. These data form the basis for small-scale validations of sampling methods in Automated Redistricting Simulation Using Markov Chain Monte Carlo. For larger scale validation, see The Essential Role of Empirical Validation in Legislative Redistricting Simulation, which has additional methods which will be added to this package later in 2020.

- Loading redist
- Included Data
- Adjacency-Based Redistricting
- Redistricting with MCMC
- Redistricting with MCMC using MPI
- Using Multiple Chains
- Random Seed Growth
- Segregation Index
- Diagnostic Plots

redist can be installed either from CRAN, for the stable version with:

```
install.packages('redist')
```

or from GitHub, which is updated more often:

```
devtools::install_github(repo = 'kosukeimai/redist', ref = 'master')
```

This package is often used with a set of other packages:
`sf`

and `sp`

are useful for working with shapefiles and can be loaded with:

```
library(sf)
library(sp)
```

For additional functions for working with the shapefiles, using `spdep`

is recommended. This is also used with creating lists of which precincts are adjacent to other precincts. It can be loaded with:

```
library(spdep)
```

For plotting maps and adjacency graphs, `ggplot2`

and `igraph`

are useful. The former allows for making ggplot maps, when used with `sf`

. These may be loaded with:

```
library(igraph)
library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 3.6.2
```

Additionally useful data manipulation tools for working with objects from the package are `dplyr`

and `magrittr`

.

```
library(dplyr)
#> Warning: package 'dplyr' was built under R version 3.6.2
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:igraph':
#>
#> as_data_frame, groups, union
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(magrittr)
```

The package contains four datasets. The first three are `algdat.p10`

, `algdat.p20`

, and `algdat.pfull`

which contain data for 25 continuous precincts within Florida when partitioned into three districts. Respectively, these are those plans which fall within 10% population parity, 20% population parity, and all possible partitions. These are loaded in with the package and can be loaded as follows:

```
data("algdat.p10")
data("algdat.p20")
data("algdat.pfull")
```

Each algdat object is a list with five objects:

```
names(algdat.p10)
#> [1] "adjlist" "cdmat" "precinct.data"
#> [4] "segregation.index" "distancemat"
names(algdat.p20)
#> [1] "adjlist" "cdmat" "precinct.data"
#> [4] "segregation.index" "distancemat"
names(algdat.pfull)
#> [1] "adjlist" "cdmat" "precinct.data"
#> [4] "segregation.index" "distancemat"
```

adjlist - contains 25 adjacency lists as nb objects from package

`spdep`

cdmat - contains all possible congressional districts under the population constraint for that object as a matrix

precinct.data - data frame with a row for each precinct with five columns: pop, demvote, repvote, blackpop, hispanicpop

- pop - the total population of the district
- demvote - the number of votes cast for Obama in the 2008 presidential election in that precinct
- repvote - the number of votes cast for John McCain in the 2008 presidental election in that precinct
- blackpop - the number of black individuals in that precinct
- hispanicpop - the number of hispanic individuals in that precinct

segregation.index - the dissimilarity index of segregation for each plan (The Dimensions of Residential Segregation, Massey & Denton 1987).

distancemat - a symmetric matrix with the squared distance between two precincts as its entries

The fourth data set is the sf dataframe, containing the shapes themselves for creating these objects. It can be called with:

```
data("fl25")
```

From the head of the data, we can see this is a typical dataframe with a geometry column appended. See the website for sf for more information about this type of object.

```
head(fl25)
#> Simple feature collection with 6 features and 11 fields
#> geometry type: POLYGON
#> dimension: XY
#> bbox: xmin: -81.85582 ymin: 26.4483 xmax: -81.56292 ymax: 26.7171
#> CRS: NA
#> geoid10 pop vap obama mccain TotPop BlackPop HispPop VAP BlackVAP
#> 2942 2519.0_0 15993 12379 647 1522 15993 863 1980 12379 582
#> 2561 2402.0_0 7799 6084 266 226 7799 2417 1600 6084 1601
#> 2581 2520.0_0 6347 5698 758 1846 6347 97 304 5698 78
#> 2594 2532.0_0 3808 3134 424 828 3808 41 180 3134 27
#> 2687 2438.0_0 4982 4173 668 842 4982 316 837 4173 199
#> 2732 2521.0_0 8907 7587 644 935 8907 195 1147 7587 153
#> HispVAP geometry
#> 2942 1292 POLYGON ((-81.78102 26.6274...
#> 2561 1133 POLYGON ((-81.80102 26.6341...
#> 2581 240 POLYGON ((-81.79772 26.5093...
#> 2594 134 POLYGON ((-81.79842 26.5474...
#> 2687 545 POLYGON ((-81.75042 26.7038...
#> 2732 860 POLYGON ((-81.66622 26.4657...
```

Each column contains useful information for typical redistricting questions:

`geoid10`

is a unique identifier.`pop`

is the population of the precinct.`vap`

is the voter age population.`obama`

is the vote count for Obama in the 2012 presidential election.`mccain`

is the vote count for McCain in the 2012 presidential election.`TotPop`

is identical to`pop`

.`BlackPop`

is the number of black residents within the precinct.`HispPop`

is the number of Hispanic residents within the precinct.`VAP`

is identical to`vap`

.`BlackVAP`

is the number of voter age black residents within the precinct.`HispVAP`

is the number of voter age hispanic residents within the precinct.`geometry`

contains the sf geometry.

We begin with a simple example of thinking about adjacency-based redistricting, using a small piece of Florida, named fl25. It can be loaded with the data command as follows. For more information on the data, see the data section

```
data("fl25")
```

This Florida subset contains the shapefiles for the districts, which can be plotted with sf.

```
fl25 %>% ggplot(aes(fill = pop)) +
geom_sf()
```

In this map, the color represents the population, where darker colors are smaller populations. The 25 shapes outlined above are the precincts that we work with. A difficulty in redistricting is that the area of the precinct does not necessarily tell us anything about the population of the precinct. As such, we work with adjacency lists for the precincts.

If we number the districts from 1 to 25, we say that two districts are connected if they share a side, which is referred to as rook contiguity.

If we arbitrarily number the districts from the plot above, we can pick a small subset of them.

```
fl25$id <- 1:25
fl25[c(18,19,23:25),] %>% ggplot() +
geom_sf() +
geom_sf_label(aes(label = id))
```

Then, the 25th precinct by this numbering is connected to 19, 23, and 24. So, the adjacency list would be 19,23,24. 18 would not be on this list, as it does not share a portion of a side with precinct 25.

To make an adjacency list, we may use the packages `spdep`

with function `poly2nb`

. We use this function with `queen = FALSE`

, as we want the rooks adjacency, not the queens adjacency, which would allow for if only corners touch, like on a chess board.

```
adjlist <- poly2nb(pl = fl25, queen = FALSE)
```

We can further verify the above comments by looking at the 25th element of the adjacency list, which corresponds to the 25th district.

```
adjlist[[25]]
#> [1] 5 22 24
```

Using igraph, we can easily plot the whole set of adjacencies:

```
plot(graph_from_adj_list(adjlist, mode = 'total'))
```

While these are numbered in 1:25, the back end in C++ requires 0 indexing for efficiency, so we sink the adjacency list to be 0:24.

```
for(i in 1:25){
adjlist[[i]] <- adjlist[[i]]-1
}
```

Thus, everything is the same up to naming.

```
adjlist[[25]]
#> [1] 4 21 23
```

Each algorithm within the package maintains geographically contiguous districts. The `MCMC`

and `enumerate`

algorithms consider the underlying graphical structure of the districts, where contiguity is represented by the edges in the above graph.

To begin running the MCMC algorithm, we have to provide some basic data. We provide the adjacency list to `adjobj`

, a vector of populations to `popvec`

, the number of districts as 3 to `ndists`

, and choose to run 10000 simulations with `nsims`

. We can also save the output with `savename`

.

```
alg_mcmc <- redist.mcmc(adjobj = algdat.pfull$adjlist,
popvec = algdat.pfull$precinct.data$pop,
ndists = 3,
nsims = 10000,
savename = "redist.mcmc")
#>
#> ====================
#> redist.mcmc(): Automated Redistricting Simulation Using
#> Markov Chain Monte Carlo
#>
#> Preprocessing data.
#>
#>
#> ====================
#> Using redist.rsg() to generate starting values.
#>
#> 10 percent done.
#> Metropolis acceptance ratio: 0.962963
#>
#> 20 percent done.
#> Metropolis acceptance ratio: 0.964982
#>
#> 30 percent done.
#> Metropolis acceptance ratio: 0.962654
#>
#> 40 percent done.
#> Metropolis acceptance ratio: 0.95949
#>
#> 50 percent done.
#> Metropolis acceptance ratio: 0.962993
#>
#> 60 percent done.
#> Metropolis acceptance ratio: 0.962827
#>
#> 70 percent done.
#> Metropolis acceptance ratio: 0.963138
#>
#> 80 percent done.
#> Metropolis acceptance ratio: 0.96237
#>
#> 90 percent done.
#> Metropolis acceptance ratio: 0.962551
#>
#> 100 percent done.
#> Metropolis acceptance ratio: 0.962796
```

Note that we do not need to specify `ndists`

when we supply a different argument, `initcds`

, which is the set of districts to initialize to. If we do not specific `initcds`

, the RSG function is run to initialize districts. When tuning parameters, the `initcds`

argument may be useful for ensuring diverse starting positions for different chains, though this is not typically necessary.

We can specify `initcds`

, for example, as the first column of the full enumeration in `algdat.pfull`

.

```
initcds <- algdat.pfull$cdmat[,1]
```

```
alg_mcmc <- redist.mcmc(adjobj = algdat.pfull$adjlist,
popvec = algdat.pfull$precinct.data$pop,
initcds = initcds,
nsims = 10000,
savename = "redist.mcmc")
#>
#> ====================
#> redist.mcmc(): Automated Redistricting Simulation Using
#> Markov Chain Monte Carlo
#>
#> Preprocessing data.
#>
#> 10 percent done.
#> Metropolis acceptance ratio: 0.958959
#>
#> 20 percent done.
#> Metropolis acceptance ratio: 0.962481
#>
#> 30 percent done.
#> Metropolis acceptance ratio: 0.960654
#>
#> 40 percent done.
#> Metropolis acceptance ratio: 0.963241
#>
#> 50 percent done.
#> Metropolis acceptance ratio: 0.962993
#>
#> 60 percent done.
#> Metropolis acceptance ratio: 0.961327
#>
#> 70 percent done.
#> Metropolis acceptance ratio: 0.961566
#>
#> 80 percent done.
#> Metropolis acceptance ratio: 0.960995
#>
#> 90 percent done.
#> Metropolis acceptance ratio: 0.961662
#>
#> 100 percent done.
#> Metropolis acceptance ratio: 0.961096
```

Once we've run the algorithm, the output is of class `redist`

. As `savename`

was provided, there is an Rdata file created in the working directory with a copy of the output.

```
class(alg_mcmc)
#> [1] "redist"
names(alg_mcmc)
#> [1] "partitions" "distance_parity" "distance_original"
#> [4] "mhdecisions" "mhprob" "pparam"
#> [7] "beta_sequence" "energy_psi" "constraint_pop"
#> [10] "constraint_compact" "constraint_segregation" "constraint_similar"
#> [13] "constraint_countysplit" "boundary_partitions" "boundaryratio"
```

For simple runs of the algorithm, the most important pieces of the output are `partitions`

and `distance_parity`

. The `partitions`

object is a matrix with `ndist`

rows and `nsims`

columns. Each column is one redistricting plan, where the numbers go from 0 to n-1, with each number represents the district assignment.

```
alg_mcmc$partitions[,1]
#> [1] 0 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2
```

The `distance_parity`

output provides the population parity of the districts as an array with `nsims`

entries.

```
alg_mcmc$distance_parity[1]
#> [1] 1.538696
```

The additional outputs are useful for implementing the algorithm with constraints.

The running of this function is the same as in redist.mcmc, but it requires Rmpi installed.

```
library(Rmpi)
redist.mcmc.mpi(adjobj = algdat.pfull$adjlist,
popvec = algdat.pfull$precinct.data$pop,
nsims = 10000,
ndists = 3,
savename = "redist.mcmc.mpi")
```

Outputs and usage are the same as in the MCMC Section, but MPI will allow for much faster computation.

When running larger redistricting analyses, one important step is to run multiple chains of the MCMC algorithm. This will also allow us to diagnose convergence better, using the Gelman-Rubin plot, as seen in the section on Diagnostic Plots.

On Windows and in smaller capacities, it is useful to run the algorithm within an lapply loop. First, we set up the seed for replicability and decide on the number of chains and simulations.

```
RNGkind(kind = "L'Ecuyer-CMRG")
set.seed(1)
nchains <- 4
nsims <- 10000
```

```
mcmc_chains <- lapply(1:nchains, function(x){
redist.mcmc(adjobj = algdat.pfull$adjlist,
popvec = algdat.pfull$precinct.data$pop,
nsims = nsims,
ndists = 3)
})
#>
#> ====================
#> redist.mcmc(): Automated Redistricting Simulation Using
#> Markov Chain Monte Carlo
#>
#> Preprocessing data.
#>
#>
#> ====================
#> Using redist.rsg() to generate starting values.
#>
#> 10 percent done.
#> Metropolis acceptance ratio: 0.966967
#>
#> 20 percent done.
#> Metropolis acceptance ratio: 0.966483
#>
#> 30 percent done.
#> Metropolis acceptance ratio: 0.964321
#>
#> 40 percent done.
#> Metropolis acceptance ratio: 0.964491
#>
#> 50 percent done.
#> Metropolis acceptance ratio: 0.964993
#>
#> 60 percent done.
#> Metropolis acceptance ratio: 0.964827
#>
#> 70 percent done.
#> Metropolis acceptance ratio: 0.963709
#>
#> 80 percent done.
#> Metropolis acceptance ratio: 0.96362
#>
#> 90 percent done.
#> Metropolis acceptance ratio: 0.964996
#>
#> 100 percent done.
#> Metropolis acceptance ratio: 0.964796
#>
#>
#> ====================
#> redist.mcmc(): Automated Redistricting Simulation Using
#> Markov Chain Monte Carlo
#>
#> Preprocessing data.
#>
#>
#> ====================
#> Using redist.rsg() to generate starting values.
#>
#> 10 percent done.
#> Metropolis acceptance ratio: 0.96997
#>
#> 20 percent done.
#> Metropolis acceptance ratio: 0.964982
#>
#> 30 percent done.
#> Metropolis acceptance ratio: 0.969323
#>
#> 40 percent done.
#> Metropolis acceptance ratio: 0.965491
#>
#> 50 percent done.
#> Metropolis acceptance ratio: 0.965993
#>
#> 60 percent done.
#> Metropolis acceptance ratio: 0.963994
#>
#> 70 percent done.
#> Metropolis acceptance ratio: 0.963852
#>
#> 80 percent done.
#> Metropolis acceptance ratio: 0.96362
#>
#> 90 percent done.
#> Metropolis acceptance ratio: 0.964774
#>
#> 100 percent done.
#> Metropolis acceptance ratio: 0.964896
#>
#>
#> ====================
#> redist.mcmc(): Automated Redistricting Simulation Using
#> Markov Chain Monte Carlo
#>
#> Preprocessing data.
#>
#>
#> ====================
#> Using redist.rsg() to generate starting values.
#>
#> 10 percent done.
#> Metropolis acceptance ratio: 0.974975
#>
#> 20 percent done.
#> Metropolis acceptance ratio: 0.964982
#>
#> 30 percent done.
#> Metropolis acceptance ratio: 0.964988
#>
#> 40 percent done.
#> Metropolis acceptance ratio: 0.964241
#>
#> 50 percent done.
#> Metropolis acceptance ratio: 0.964393
#>
#> 60 percent done.
#> Metropolis acceptance ratio: 0.965494
#>
#> 70 percent done.
#> Metropolis acceptance ratio: 0.965424
#>
#> 80 percent done.
#> Metropolis acceptance ratio: 0.964996
#>
#> 90 percent done.
#> Metropolis acceptance ratio: 0.964774
#>
#> 100 percent done.
#> Metropolis acceptance ratio: 0.965097
#>
#>
#> ====================
#> redist.mcmc(): Automated Redistricting Simulation Using
#> Markov Chain Monte Carlo
#>
#> Preprocessing data.
#>
#>
#> ====================
#> Using redist.rsg() to generate starting values.
#>
#> 10 percent done.
#> Metropolis acceptance ratio: 0.951952
#>
#> 20 percent done.
#> Metropolis acceptance ratio: 0.956478
#>
#> 30 percent done.
#> Metropolis acceptance ratio: 0.95932
#>
#> 40 percent done.
#> Metropolis acceptance ratio: 0.95999
#>
#> 50 percent done.
#> Metropolis acceptance ratio: 0.960992
#>
#> 60 percent done.
#> Metropolis acceptance ratio: 0.960327
#>
#> 70 percent done.
#> Metropolis acceptance ratio: 0.961423
#>
#> 80 percent done.
#> Metropolis acceptance ratio: 0.96162
#>
#> 90 percent done.
#> Metropolis acceptance ratio: 0.961218
#>
#> 100 percent done.
#> Metropolis acceptance ratio: 0.961896
```

In unix-based systems, this can be run considerably faster by running this in parallel.

```
mcmc_chains <- parallel::mclapply(1:nchains, function(x){
redist.mcmc(adjobj = algdat.pfull$adjlist,
popvec = algdat.pfull$precinct.data$pop,
nsims = nsims,
ndists = 3)
}, mc.set.seed = 1, mc.cores = parallel::detectCores())
```

This package also contains an implementation of Chen and Rodden's non-compact Random Seed and Grow redistricting algorithm from their paper Unintentional Gerrymandering: Political Geography and Electoral Biases in Legislatures.

The adjacency list, via `adj.list`

, is created as in Adjacency-Based Redistricting, `population`

is the population of the districts, `ndists`

is the number of districts, and `thresh`

is the allowed population parity. A `thresh`

of `0.05`

allows for 5\% population parity. In easier redistricting questions, a small `maxiter`

is typically sufficient, but in large maps, values may be closer to `50,000`

to ensure that a solution is found.

```
rsg <- redist.rsg(adj.list = algdat.pfull$adjlist,
population = algdat.pfull$precinct.data$pop,
ndists = 3,
thresh = 0.05,
maxiter = 5000)
#>
#> ====================
#> redist.rsg(): Automated Redistricting Starts
#>
#>
#> 3 districts built using 25 precincts in 0 seconds...
```

The output of `redist.rsg`

is a list with three objects. `district_membership`

provides a numeric array with indices for which district each precinct belongs to.

```
rsg$district_membership
#> [1] 2 1 2 2 1 2 1 2 0 0 0 0 0 0 0 1 1 1 1 1 1 1 2 1 1
```

`district_list`

will provide an alternate formulation of this with `ndists`

arrays, indicating which precincts are in each district.

```
rsg$district_list
#> [[1]]
#> [1] 9 11 14 13 10 8 12
#>
#> [[2]]
#> [1] 15 19 18 17 23 24 16 21 1 4 6 20
#>
#> [[3]]
#> [1] 22 2 0 7 5 3
```

Corresponding to this, `district_pop`

will give the district population for each of the `ndists`

districts.

```
rsg$district_pop
#> [1] 56227 60047 58769
```

To evaluate redistricting plans, the `redist`

package comes with the `redist.segcalc`

function. This computes the segregation index introduced in Massey and Denton 1987.

It takes three arguments, `algout`

, which is a `redist`

object from the various `mcmc`

function or `rsg`

function, `grouppop`

which is the population of the group being used for comparison, and `fullpop`

is the total population of the precinct.

```
seg <- redist.segcalc(algout = alg_mcmc,
grouppop = algdat.pfull$precinct.data$blackpop,
fullpop = algdat.pfull$precinct.data$pop)
```

This returns a numeric vector with the an entry for each district provided.

When using the MCMC algorithms, there are various useful diagnostic plots. The `redist.diagplot`

function creates familiar plots by converting numeric entries into `mcmc`

objects to use with `coda`

.

The first four plots take a single index, such as one created by `redist.segcalc`

.

- Trace Plot

```
redist.diagplot(seg, plot = "trace")
```

- Autocorrelation Plot

```
redist.diagplot(seg, plot = "autocorr")
```

- Density Plot

```
redist.diagplot(seg, plot = "densplot")
```

- Mean Plot

```
redist.diagplot(seg, plot = "mean")
```

The final plot requires at least two chains.

```
seg_chains <- lapply(1:nchains,
function(i){redist.segcalc(algout = mcmc_chains[[i]],
grouppop = algdat.pfull$precinct.data$blackpop,
fullpop = algdat.pfull$precinct.data$pop)})
redist.diagplot(seg_chains, plot = 'gelmanrubin')
```

Beware that this diagnostic will often fail to converge if there are insufficient iterations. If an error is thrown and no plot is created, try increasing `nsims`

.