q-SIP

Nick Youngblut

2018-05-14


Introduction

q-SIP method workflow

Quantitative stable isotope probing (q-SIP) attempts to deal with the compositional problems of HTS-SIP data by transforming relative abundances with qPCR counts of total gene copies.

More details can be found at:

Hungate, Bruce A., Rebecca L. Mau, Egbert Schwartz, J. Gregory Caporaso, Paul Dijkstra, Natasja van Gestel, Benjamin J. Koch, et al. 2015. “Quantitative Microbial Ecology Through Stable Isotope Probing.” Applied and Environmental Microbiology, August, AEM.02280–15.

HTSSIP can both simulate q-SIP datasets and run the q-SIP analysis (with parallel computation of bootstrap confidence intervals).

Dataset

First, let’s load some packages including HTSSIP.

library(dplyr)
library(ggplot2)
library(HTSSIP)

OK. We’re going to be using 2 data files:

We’ll be using the dataset that we simulated in the HTSSIP_sim vignette.

The phyloseq object is similar to the dataset in the other vignettes.

# HTS-SIP data
physeq_rep3
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 6 taxa and 144 samples ]
## sample_data() Sample Data:       [ 144 samples by 5 sample variables ]

The associated qPCR data is, in this case a list of length = 2.

# qPCR data (list object)
physeq_rep3_qPCR %>% names
## [1] "raw"     "summary"

Really, only the summary table is needed for the next step. You could just use the summary data.frame object and that would be fine. However, you should have just 1 qPCR value per sample. So, if you have ‘technical’ qPCR replicates (eg., triplicate qPCR reactions per DNA sample), then you need to summarize those replicates to get one value (eg., the mean of all replicate values).

physeq_rep3_qPCR$raw %>% head(n=3)
##   Buoyant_density IS_CONTROL qPCR_tech_rep1 qPCR_tech_rep2 qPCR_tech_rep3
## 1        1.668185       TRUE      374460160      326925692      191209173
## 2        1.680254       TRUE      418974851      846274648     1057133579
## 3        1.679431       TRUE      456070748      431437845      307250348
##                    Sample     Gradient Fraction Treatment Replicate
## 1 12C-Con_rep1_1.668185_1 12C-Con_rep1        1   12C-Con         1
## 2 12C-Con_rep1_1.680254_2 12C-Con_rep1        2   12C-Con         1
## 3 12C-Con_rep1_1.679431_3 12C-Con_rep1        3   12C-Con         1
physeq_rep3_qPCR$summary %>% head(n=3)
##   IS_CONTROL                  Sample Buoyant_density qPCR_tech_rep_mean
## 1      FALSE 13C-Glu_rep1_1.671712_2        1.671712           46598172
## 2      FALSE 13C-Glu_rep1_1.671722_1        1.671722           42299449
## 3      FALSE 13C-Glu_rep1_1.680311_3        1.680311           53536519
##   qPCR_tech_rep_sd     Gradient Fraction Treatment Replicate
## 1          9055833 13C-Glu_rep1        2   13C-Glu         1
## 2         16369298 13C-Glu_rep1        1   13C-Glu         1
## 3         18265667 13C-Glu_rep1        3   13C-Glu         1

q-SIP

Abundance count transformation

First, we’ll transform the OTU counts. The following function will do the following:

phyloseq::otu_table(physeq_rep3) %>% .[1:5, 1:5]
## OTU Table:          [5 taxa and 5 samples]
##                      taxa are rows
##       12C-Con_rep1_1.668185_1 12C-Con_rep1_1.680254_2
## OTU.1                      57                     925
## OTU.2                       0                       1
## OTU.3                      19                     492
## OTU.4                       0                       0
## OTU.5                       0                       0
##       12C-Con_rep1_1.679431_3 12C-Con_rep1_1.682305_4
## OTU.1                    5809                   14645
## OTU.2                      73                    1077
## OTU.3                    5078                   21603
## OTU.4                       0                       0
## OTU.5                       3                      43
##       12C-Con_rep1_1.689408_5
## OTU.1                   15089
## OTU.2                    7499
## OTU.3                   36426
## OTU.4                      11
## OTU.5                    1017
physeq_rep3_t = OTU_qPCR_trans(physeq_rep3, physeq_rep3_qPCR)
phyloseq::otu_table(physeq_rep3_t) %>% .[1:5, 1:5]
## OTU Table:          [5 taxa and 5 samples]
##                      taxa are rows
##       12C-Con_rep1_1.668185_1 12C-Con_rep1_1.680254_2
## OTU.1               223148756               504984567
## OTU.2                       0                  545929
## OTU.3                74382919               268597197
## OTU.4                       0                       0
## OTU.5                       0                       0
##       12C-Con_rep1_1.679431_3 12C-Con_rep1_1.682305_4
## OTU.1               207913325               252808193
## OTU.2                 2612786                18591630
## OTU.3               181749675               372920136
## OTU.4                       0                       0
## OTU.5                  107375                  742284
##       12C-Con_rep1_1.689408_5
## OTU.1               327552736
## OTU.2               162788652
## OTU.3               790737356
## OTU.4                  238789
## OTU.5                22077085

BD shift (Z) and atom fraction excess (A)

With the transformed OTU abundance counts, let’s calculate the BD shift (Z) and atom fraction excess (A) for each OTU. We’ll be comparing controls (designated by the control_expr option) and the labeled treatment (all samples not identified by the control_expr expression). In this case the control_expr is set to Treatment=="12C-Con", which will select all samples where the Treatment column in the phyloseq sample_data table equals 12C-Con.

The treatment_rep option designates which column in the sample_data table defines the replicate gradients (eg., 12C-Con_rep1, 12C-Con_rep2, etc).

# The 'Treatment' column designates treatment vs control
# The 'Replicate' column designates treatment replicates
physeq_rep3 %>% sample_data %>% dplyr::select(Treatment, Replicate) %>% distinct
##   Treatment Replicate
## 1   12C-Con         1
## 2   12C-Con         2
## 3   12C-Con         3
## 4   13C-Glu         1
## 5   13C-Glu         2
## 6   13C-Glu         3
atomX = qSIP_atom_excess(physeq_rep3_t,
                         control_expr='Treatment=="12C-Con"',
                         treatment_rep='Replicate')
atomX %>% names
## [1] "W" "A"

The output is a list of data.frames. The W table lists the weighted mean BD for each OTU in each gradient. This can be considered a sort-of ‘raw data’ table. The real meat of the output is the A table, which lists (among other things) the BD shift (Z) and atom fraction excess (A) as defined by Hungate et al., (2015).

Note: a more complicated expression can be used to designate control samples. For example: ‘Treatment == “12C-Con” & Day == 1’

Bootstrap confidence intervals

OK. The last step is to calculate atom % excess confidence intervals (CI). This is necessary for getting some sort of indication on how accurate the atom fraction excess estimations are. Also, Hungate et al., (2015) uses the CIs to call incorporators, in which an OTU was considered an incorporator if the CI was greater than, and did not span, zero.

We’ll use 20 bootstrap replicates for this tutorial, but I recommend 100 for a real analysis. Bootstrap replicates can be run in parallel (see the function documentation).

df_atomX_boot = qSIP_bootstrap(atomX, n_boot=20)
df_atomX_boot %>% head
## # A tibble: 6 x 11
##     OTU     Wlab   Wlight          Z        Gi   Mlight Mheavymax     Mlab
##   <chr>    <dbl>    <dbl>      <dbl>     <dbl>    <dbl>     <dbl>    <dbl>
## 1 OTU.1 1.721357 1.699179 0.02217875 0.6361399 308.0065  317.6638 312.0268
## 2 OTU.2 1.706713 1.694977 0.01173536 0.5858306 307.9816  317.6640 310.1139
## 3 OTU.3 1.731520 1.696813 0.03470613 0.6078170 307.9925  317.6639 314.2921
## 4 OTU.4 1.730946 1.704892 0.02605382 0.7045621 308.0405  317.6636 312.7479
## 5 OTU.5 1.729513 1.705316 0.02419670 0.7096354 308.0430  317.6636 312.4138
## 6 OTU.6 1.727713 1.699375 0.02833859 0.6384897 308.0077  317.6638 313.1440
## # ... with 3 more variables: A <dbl>, A_CI_low <dbl>, A_CI_high <dbl>

Let’s use the same threshold as Hungate et al., (2015) for defining incorporators.

CI_threshold = 0
df_atomX_boot = df_atomX_boot %>%
  mutate(Incorporator = A_CI_low > CI_threshold,
                OTU = reorder(OTU, -A))

How many incorporators?

n_incorp = df_atomX_boot %>%
  filter(Incorporator == TRUE) %>%
  nrow
cat('Number of incorporators:', n_incorp, '\n')
## Number of incorporators: 6

OK, let’s plot the results.

ggplot(df_atomX_boot, aes(OTU, A, ymin=A_CI_low, ymax=A_CI_high, color=Incorporator)) +
  geom_pointrange(size=0.25) +
  geom_linerange() +
  geom_hline(yintercept=0, linetype='dashed', alpha=0.5) +
  labs(x='OTU', y='Atom fraction excess') +
  theme_bw() +
  theme(
    axis.text.x = element_blank()
  )

Session info

sessionInfo()
## R version 3.4.3 (2017-11-30)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.3 LTS
## 
## Matrix products: default
## BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
## LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] bindrcpp_0.2    phyloseq_1.20.0 HTSSIP_1.4.0    ggplot2_2.2.1  
## [5] tidyr_0.7.2     dplyr_0.7.4    
## 
## loaded via a namespace (and not attached):
##   [1] nlme_3.1-131               bitops_1.0-6              
##   [3] matrixStats_0.52.2         bit64_0.9-7               
##   [5] doParallel_1.0.11          RColorBrewer_1.1-2        
##   [7] rprojroot_1.2              GenomeInfoDb_1.12.3       
##   [9] tools_3.4.3                backports_1.1.1           
##  [11] R6_2.2.2                   coenocliner_0.2-2         
##  [13] vegan_2.4-4                rpart_4.1-11              
##  [15] Hmisc_4.0-3                DBI_0.7                   
##  [17] lazyeval_0.2.0             BiocGenerics_0.22.1       
##  [19] mgcv_1.8-22                colorspace_1.3-2          
##  [21] permute_0.9-4              ade4_1.7-8                
##  [23] nnet_7.3-12                tidyselect_0.2.2          
##  [25] gridExtra_2.3              DESeq2_1.16.1             
##  [27] bit_1.1-12                 compiler_3.4.3            
##  [29] Biobase_2.36.2             htmlTable_1.9             
##  [31] DelayedArray_0.2.7         labeling_0.3              
##  [33] scales_0.5.0               checkmate_1.8.5           
##  [35] genefilter_1.58.1          stringr_1.2.0             
##  [37] digest_0.6.12              foreign_0.8-69            
##  [39] rmarkdown_1.6              XVector_0.16.0            
##  [41] base64enc_0.1-3            pkgconfig_2.0.1           
##  [43] htmltools_0.3.6            htmlwidgets_0.9           
##  [45] rlang_0.1.2                RSQLite_2.0               
##  [47] bindr_0.1                  jsonlite_1.5              
##  [49] BiocParallel_1.10.1        acepack_1.4.1             
##  [51] RCurl_1.95-4.8             magrittr_1.5              
##  [53] GenomeInfoDbData_0.99.0    Formula_1.2-2             
##  [55] biomformat_1.4.0           Matrix_1.2-11             
##  [57] Rcpp_0.12.16               munsell_0.4.3             
##  [59] S4Vectors_0.14.7           ape_4.1                   
##  [61] stringi_1.1.5              yaml_2.1.14               
##  [63] MASS_7.3-47                SummarizedExperiment_1.6.5
##  [65] zlibbioc_1.22.0            rhdf5_2.20.0              
##  [67] plyr_1.8.4                 blob_1.1.0                
##  [69] grid_3.4.3                 parallel_3.4.3            
##  [71] lattice_0.20-35            Biostrings_2.44.2         
##  [73] splines_3.4.3              multtest_2.32.0           
##  [75] annotate_1.54.0            locfit_1.5-9.1            
##  [77] knitr_1.17                 igraph_1.1.2              
##  [79] GenomicRanges_1.28.6       geneplotter_1.54.0        
##  [81] reshape2_1.4.2             codetools_0.2-15          
##  [83] stats4_3.4.3               XML_3.98-1.9              
##  [85] glue_1.1.1                 evaluate_0.10.1           
##  [87] latticeExtra_0.6-28        data.table_1.10.4-2       
##  [89] foreach_1.4.3              gtable_0.2.0              
##  [91] purrr_0.2.4                assertthat_0.2.0          
##  [93] xtable_1.8-2               survival_2.41-3           
##  [95] tibble_1.3.4               iterators_1.0.8           
##  [97] memoise_1.1.0              AnnotationDbi_1.38.2      
##  [99] IRanges_2.10.5             cluster_2.0.6