Overview of the epitrix package

Thibaut Jombart

2018-01-24

epitrix implements small helper functions usefull in infectious disease modelling and epidemics analysis. This vignette provides a quick overview of the package’s features.

What does it do?

The main features of the package include:

Fitting a gamma distribution to delay data

In this example, we simulate data which replicate the serial interval (SI), i.e. the delays between primary and secondary symptom onsets, in Ebola Virus Disease (EVD). We start by converting previously estimates of the mean and standard deviation of the SI (WHO Ebola Response Team (2014) NEJM 371:1481–1495) to the parameters of a Gamma distribution:

library(epitrix)

mu <- 15.3 # mean in days days
sigma <- 9.3 # standard deviation in days
cv <- mu/sigma # coefficient of variation
cv
#> [1] 1.645161
param <- gamma_mucv2shapescale(mu, cv) # convertion to Gamma parameters
param
#> $shape
#> [1] 0.3694733
#> 
#> $scale
#> [1] 41.4103

The shape and scale are parameters of a Gamma distribution we can use to generate delays. However, delays are typically reported per days, which implies a discretisation (from continuous time to discrete numbers). We use the package distcrete to achieve this discretisation. It generates a list of functions, including one to simulate data ($r), which we use to simulate 500 delays:


si <- distcrete::distcrete("gamma", interval = 1,
               shape = param$shape,
               scale = param$scale, w = 0)
si
#> A discrete distribution
#>   name: gamma
#>   parameters:
#>     shape: 0.369473279507882
#>     scale: 41.4103017689906
set.seed(1)
x <- si$r(500)
head(x, 10)
#>  [1]  0  2  7 46  0 43 62 12 10  0
hist(x, col = "grey", border = "white",
     xlab = "Days between primary and secondary onset",
     main = "Simulated serial intervals")

x contains simulated data, for illustrative purpose. In practice, one would use real data from an ongoing outbreaks. Now we use fit_disc_gamma to estimate the parameters of a dicretised Gamma distribution from the data:


si_fit <- fit_disc_gamma(x)
si_fit
#> $mu
#> [1] 15.01017
#> 
#> $cv
#> [1] 1.656824
#> 
#> $sd
#> [1] 24.8692
#> 
#> $ll
#> [1] -1689.073
#> 
#> $converged
#> [1] TRUE
#> 
#> $distribution
#> A discrete distribution
#>   name: gamma
#>   parameters:
#>     shape: 0.364290092380669
#>     scale: 41.2038799853595

Converting a growth rate (r) to a reproduction number (R0)

The package incidence can fit a log-linear model to incidence curves (function fit), which produces a growth rate (r). This growth rate can in turn be translated into a basic reproduction number (R0) using r2R0. We illustrate this using simulated Ebola data from the outbreaks package, and using the serial interval from the previous example:


library(outbreaks)
library(incidence)
i <- incidence(ebola_sim$linelist$date_of_onset)
i
#> <incidence object>
#> [5888 cases from days 2014-04-07 to 2015-04-30]
#> 
#> $counts: matrix with 389 rows and 1 columns
#> $n: 5888 cases in total
#> $dates: 389 dates marking the left-side of bins
#> $interval: 1 day
#> $timespan: 389 days
f <- fit(i[1:150]) # fit on first 150 days
#> Warning in fit(i[1:150]): 22 dates with incidence of 0 ignored for fitting
plot(i[1:200], fit = f, color = "#9fc2fc")


r2R0(f$info$r, si$d(1:100))
#> [1] 1.358887
r2R0(f$info$r.conf, si$d(1:100))
#>         2.5 %   97.5 %
#> [1,] 1.328372 1.388925

In addition, we can also use the function lm2R0_sample to generate samples of R0 values compatible with a model fit:


R0_val <- lm2R0_sample(f$lm, si$d(1:100), n = 100)
head(R0_val)
#> [1] 1.360925 1.357800 1.360150 1.367461 1.352716 1.352790
hist(R0_val, col = "grey", border = "white")

Anonymising data

hash_names can be used to generate hashed labels from linelist data. Based on pre-defined fields, it will generate anonymous labels. This system has the following desirable features:


first_name <- c("Jane", "Joe", "Raoul", "Raoul")
last_name <- c("Doe", "Smith", "Dupont", "Dupond")
age <- c(25, 69, 36, 36)

## detailed output by default
hash_names(first_name, last_name, age)
#>           label hash_short                                     hash
#> 1     janedoe25     274be0 274be0e34366ab2b8798adb31ad161ea8441c410
#> 2    joesmith69     4c20ad 4c20ad4911e4941eaa3c12023440da406eaf3353
#> 3 raouldupont36     4573ee 4573ee25d7f395d94a89c2fea71bfed27933b6d9
#> 4 raouldupond36     d0e0f4 d0e0f421e546c7902c30faefe58be41aa4ed01d3

## short labels for practical use
hash_names(first_name, last_name, age,
           size = 8, full = FALSE)
#> [1] "274be0e3" "4c20ad49" "4573ee25" "d0e0f421"