# Rnets Vignette

## Introduction

The Rnets package provides a mostly automated analysis pipeline for translating antimicrobial resistance (AMR) data from bacterial populations into network models. Representing resistance relationships as networks greatly facilitates visualization of the data and allows novel analytic approaches. Currently, the package focuses on mapping relationships between phenotypic resistances, e.g., minimum inhibitory concentrations (MICs), but methods to incorporate genetic information into the analysis are in development.

## Under the Hood

The network models created by this package are probabilistic graphical models (PGMs), more specifically Markov random fields (MRFs). MRFs are undirected graphical models in which the vertex set is defined by a set of random variables and the edge set is defined by non-zero partial correlations between variables in the dataset. Sparse MRFs of AMR data are network models of resistance relationships, referred to as ‘rnets’ for brevity.

Sparsity is a concept in network models describing how many or how few links, or edges, exist between the units, or vertices, of the network. A network in which all possible edges exist is referred to as dense. Sparsity is an appealing characteristic because sparse networks are much easier to interpret than completely dense ones. An MRF may be made sparse by shrinking trivially small partial correlations to 0. Several approaches to this problem exist; here we employ the graphical least absolute shrinkage and selection operator (LASSO) via the glasso function from the eponymous package maintained by R. Tibshirani[1]. This method applies an L1 penalty to the inverse covariance matrix to increase its sparsity. Higher L1 values lead to fewer edges and a sparser network, and the graph is empty when $$L_1 \geq \max(|\sigma_{ij}|)$$, i.e., there are no edges and all the variables appear to be conditionally independent.
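The empty-graph condition can be checked directly: the largest off-diagonal element of the correlation matrix is the smallest L1 value at which every edge is removed. A minimal base-R sketch, using a made-up 3×3 correlation matrix rather than real AMR data:

```r
# Hypothetical correlation matrix for 3 resistances (illustration only)
Sigma <- matrix(c(1.00, 0.30, 0.10,
                  0.30, 1.00, 0.05,
                  0.10, 0.05, 1.00), nrow = 3)

# Largest off-diagonal correlation; any L1 penalty at or above
# this value yields an empty graph (no edges)
L1_max <- max(abs(Sigma[upper.tri(Sigma)]))
L1_max  # 0.3
```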

The analysis pipeline can be summarized as follows:

$\Large D_{n \times k} \underset{cor}{\rightarrow} \Sigma_{k \times k} \underset{glasso}{\rightarrow} \Theta_{k \times k} \underset{std'ize}{\rightarrow} \Omega_{k \times k} \underset{igraph}{\rightarrow} R(V, E)$

where:

- $$D$$ is the data matrix with $$n$$ observations (single isolates) over $$k$$ variables (resistances tested).
- $$\Sigma$$ is the empirical correlation/covariance matrix for the $$k$$ variables in $$D$$.
- $$\Theta$$ is the penalized precision matrix.
- $$\Omega$$ is the partial correlation matrix, estimated as $$\omega_{ij}=\frac{-\theta_{ij}}{\sqrt{\theta_{ii} \theta_{jj}}}, \; i \neq j$$.
- $$R$$ is the network defined by the two sets:
    - the vertex set $$V$$, with $$|V| = k$$ vertices, and
    - the edge set $$E$$, with $$|E| = m$$ edges.
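The standardization step from $$\Theta$$ to $$\Omega$$ can be sketched in base R. The precision matrix below is a made-up positive-definite example, not output from glasso:

```r
# Hypothetical 3x3 precision (inverse covariance) matrix
Theta <- matrix(c( 2.0, -0.8,  0.0,
                  -0.8,  2.5, -0.5,
                   0.0, -0.5,  1.5), nrow = 3)

# omega_ij = -theta_ij / sqrt(theta_ii * theta_jj)
d <- sqrt(diag(Theta))
Omega <- -Theta / outer(d, d)
diag(Omega) <- 1          # diagonal set to 1 by convention
round(Omega, 3)
```

A zero in the off-diagonal of $$\Theta$$ (here between variables 1 and 3) stays zero in $$\Omega$$, so those vertices share no edge.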

## Important Functions & Methods

### Rnet()

The Rnet function is the core of the package. It takes a dataframe containing the AMR data, the L1 penalty, and other options, and produces an rnet object (the specific class varies depending on how the “Stratify” argument is specified) that contains the processed network and associated attributes.

The example dataset NARMS_EC_DATA is included in the package and contains a subset of AMR data from E. coli isolates collected by the FDA & USDA as part of the National Antimicrobial Resistance Monitoring System[2].

```r
# Define the set of antimicrobials to include in the Rnet
ABX_LIST <- c('AMC', 'AXO', 'TIO', 'CIP', 'TET', 'STR', 'GEN', 'CHL')

# Estimate the Rnet
EC_all_Rnet <- Rnets::Rnet(Data = NARMS_EC_DATA, L1 = 0.25, V_set = ABX_LIST)

# View the results
summary(EC_all_Rnet)
##
## Basic R-net
##
##  Sample: 14418 isolates, 8 vertices
##
##      L1: 0.25
##   Edges: 4
## Density: 14.3 %
##
## Vertex set: AMC AXO TIO CIP TET STR GEN CHL
##
##   Edge Set:
##    V1  V2 Omega
## 1 AMC AXO 0.195
## 2 AMC TIO 0.230
## 3 AXO TIO 0.218
## 4 GEN STR 0.151
```
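As a sanity check on the reported density: an undirected network on 8 vertices has choose(8, 2) = 28 possible edges, so 4 realized edges correspond to the 14.3% shown in the summary.

```r
k <- 8   # vertices (antimicrobials)
m <- 4   # edges retained at L1 = 0.25

possible_edges <- choose(k, 2)             # 28 possible undirected edges
density_pct <- round(100 * m / possible_edges, 1)
density_pct  # 14.3
```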

### L1Selection()

Several methods have been proposed to select the ‘appropriate’ L1 penalty, represented by $$\lambda$$, to induce sparsity in MRFs. In general, $$\lambda$$ should be high enough to remove trivially small partial correlations while leaving intact the stronger partial correlations that are presumably driven by genetic associations. L1Selection implements the StARS method described by Liu, Roeder, and Wasserman (2010)[3]. Briefly, this method estimates MRFs from multiple subsets sampled without replacement from the empirical data over a range of $$\lambda$$ values. Individual edges/partial correlations from the subset-derived MRFs are evaluated for stability (defined in terms of the variability of the proportion of subsets in which they appear), and a score D is assigned to each tested value of $$\lambda$$ based on the sum of the edge instabilities over all subsets at the respective penalty. The suggested $$\lambda$$ value is the lowest value for which D falls below some threshold, typically 0.05. The goal is to find the densest network that is also stable across most data subsets.
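The per-edge instability can be sketched in base R. This follows the StARS paper’s formulation, in which an edge appearing in a proportion p of the subset networks contributes $$2p(1-p)$$, which is maximized at p = 0.5 (the edge appears in half the subsets) and zero when p is 0 or 1 (the edge is always absent or always present). The proportions below are made up for illustration:

```r
# Hypothetical proportions of subsets in which each of 4 edges appears
edge_props <- c(0.98, 0.50, 0.03, 1.00)

# Instability of each edge: 2 * p * (1 - p)
instability <- 2 * edge_props * (1 - edge_props)

# Total instability score D for this lambda: mean over candidate edges
D <- mean(instability)
D  # ~0.149; an unstable lambda, well above the usual 0.05 threshold
```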

This function defaults to a subsample size n_b of half the dataset, but smaller subsamples are typically appropriate. Liu, Roeder, and Wasserman suggest $$n_b = \lfloor 10\sqrt{n} \rfloor$$.
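For the 14,418-isolate example dataset, that rule of thumb works out to:

```r
n <- 14418                  # isolates in the example dataset
n_b <- floor(10 * sqrt(n))  # suggested subsample size
n_b  # 1200
```

The n_b = 1500 used in the call below is of the same order, and is much smaller than the default of n/2 = 7209.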

```r
EC_all_L1Selection <- L1Selection(
    Data = NARMS_EC_DATA,
    L1_set = seq(0.05, 0.50, 0.05),
    n_b = 1500,
    V_set = ABX_LIST
)
## [1] "L1Selection completed 1001 loops over 100 subsets in 183.76 seconds"

round(EC_all_L1Selection@StARS_D, 4)
##   0.05    0.1   0.15    0.2   0.25    0.3   0.35    0.4   0.45    0.5
## 0.1989 0.0658 0.0250 0.0367 0.0147 0.0000 0.0000 0.0014 0.0110 0.0000
```

Generic summary and print methods for class(x) = 'rnet.L1.set' are still in development. Currently, stability scores are accessed directly via x@StARS_D.

Given these results, the suggested regularization penalty would be $$\lambda$$ = 0.15, the lowest tested value for which StARS_D falls below 0.05 (StARS_D > 0.05 at $$\lambda$$ = 0.10).
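The same selection rule can be applied programmatically. The named vector below re-enters the first few reported StARS_D values by hand, so the snippet is self-contained:

```r
# Reported stability scores, indexed by L1 value (from the output above)
StARS_D <- c(`0.05` = 0.1989, `0.1`  = 0.0658, `0.15` = 0.0250,
             `0.2`  = 0.0367, `0.25` = 0.0147)

# Lowest L1 whose score falls at or below the 0.05 threshold
suggested_L1 <- as.numeric(names(StARS_D)[which(StARS_D <= 0.05)[1]])
suggested_L1  # 0.15
```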

NOTE: The resampling approach can be time consuming for large datasets, i.e., datasets with many observations or many variables.

1. Friedman, Hastie & Tibshirani. “Sparse inverse covariance estimation with the graphical lasso.” Biostatistics (2007)

3. Liu, Roeder & Wasserman. “Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models.” Advances in Neural Information Processing Systems 23 (2010)