Quantifying isotope incorporation

Nick Youngblut

2018-01-23


Introduction

There are two prevaling methods for using HTS-SIP data to estimate the amount of isotope that each OTU incorporated:

In this vignette, we are going to show how to run both analyses and also compare the results a bit.

Dataset

First, let’s load some packages including HTSSIP.

library(dplyr)
library(ggplot2)
library(HTSSIP)

OK. We’re going to be using 2 data files:

We’ll be using the dataset that we simulated in the HTSSIP_sim vignette.

The phyloseq object is similar to the dataset in the other vignettes.

# HTS-SIP data
physeq_rep3
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 6 taxa and 144 samples ]
## sample_data() Sample Data:       [ 144 samples by 5 sample variables ]

The associated qPCR data is a list of length = 2.

# qPCR data (list object)
physeq_rep3_qPCR %>% names
## [1] "raw"     "summary"

For the analyses in this vignette, we only need the ‘summary’ table.

# qPCR data (list object)
physeq_rep3_qPCR_sum = physeq_rep3_qPCR$summary
physeq_rep3_qPCR_sum %>% head(n=4)
##   IS_CONTROL                  Sample Buoyant_density qPCR_tech_rep_mean
## 1      FALSE 13C-Glu_rep1_1.671712_2        1.671712           46598172
## 2      FALSE 13C-Glu_rep1_1.671722_1        1.671722           42299449
## 3      FALSE 13C-Glu_rep1_1.680311_3        1.680311           53536519
## 4      FALSE 13C-Glu_rep1_1.683540_4        1.683540           51449609
##   qPCR_tech_rep_sd     Gradient Fraction Treatment Replicate
## 1          9055833 13C-Glu_rep1        2   13C-Glu         1
## 2         16369298 13C-Glu_rep1        1   13C-Glu         1
## 3         18265667 13C-Glu_rep1        3   13C-Glu         1
## 4          3796576 13C-Glu_rep1        4   13C-Glu         1

q-SIP

OK. Let’s quantify isotope incorporation witht the q-SIP method.

# transforming OTU counts
physeq_rep3_t = OTU_qPCR_trans(physeq_rep3, physeq_rep3_qPCR_sum)

# calculating atom fraction excess
atomX = qSIP_atom_excess(physeq_rep3_t,
                         control_expr='Treatment=="12C-Con"',
                         treatment_rep='Replicate')
atomX %>% names
## [1] "W" "A"

The resulting list object contains 2 data.frames. We are interested in the ‘A’ table, which contains estimated BD shifts (Z) and atom fraction excess (A).

atomX$A %>% head(n=4)
## # A tibble: 4 x 9
##     OTU     Wlab   Wlight          Z        Gi   Mlight Mheavymax     Mlab
##   <chr>    <dbl>    <dbl>      <dbl>     <dbl>    <dbl>     <dbl>    <dbl>
## 1 OTU.1 1.721357 1.699179 0.02217875 0.6361399 308.0065  317.6638 312.0268
## 2 OTU.2 1.706713 1.694977 0.01173536 0.5858306 307.9816  317.6640 310.1139
## 3 OTU.3 1.731520 1.696813 0.03470613 0.6078170 307.9925  317.6639 314.2921
## 4 OTU.4 1.730946 1.704892 0.02605382 0.7045621 308.0405  317.6636 312.7479
## # ... with 1 more variables: A <dbl>

Next, let’s calculate bootstrap confidence intervales for the atom fraction excess estimations.

# calculating bootstrapped CI values
df_atomX_boot = qSIP_bootstrap(atomX, n_boot=100)
df_atomX_boot %>% head(n=4)
## # A tibble: 4 x 11
##     OTU     Wlab   Wlight          Z        Gi   Mlight Mheavymax     Mlab
##   <chr>    <dbl>    <dbl>      <dbl>     <dbl>    <dbl>     <dbl>    <dbl>
## 1 OTU.1 1.721357 1.699179 0.02217875 0.6361399 308.0065  317.6638 312.0268
## 2 OTU.2 1.706713 1.694977 0.01173536 0.5858306 307.9816  317.6640 310.1139
## 3 OTU.3 1.731520 1.696813 0.03470613 0.6078170 307.9925  317.6639 314.2921
## 4 OTU.4 1.730946 1.704892 0.02605382 0.7045621 308.0405  317.6636 312.7479
## # ... with 3 more variables: A <dbl>, A_CI_low <dbl>, A_CI_high <dbl>

delta_BD

Now for delta_BD. The setup is easier because we are not using qPCR data, just relative abundances from 16S rRNA sequence data.

df_dBD = delta_BD(physeq_rep3, control_expr='Treatment=="12C-Con"')
df_dBD %>% head(n=4)
## # A tibble: 4 x 4
##     OTU CM_control CM_treatment   delta_BD
##   <chr>      <dbl>        <dbl>      <dbl>
## 1 OTU.1   1.694039     1.720067 0.02602843
## 2 OTU.2   1.690518     1.703567 0.01304837
## 3 OTU.3   1.686954     1.743139 0.05618485
## 4 OTU.4   1.705739     1.726582 0.02084359

Comparing results

Let’s plot the data and compare all of the results. First, let’s join all of the data into one table for plotting. We’ll also format it for plotting.

# checking & joining data 
stopifnot(nrow(df_atomX_boot) == nrow(df_dBD))
df_j = dplyr::inner_join(df_atomX_boot, df_dBD, c('OTU'='OTU'))
stopifnot(nrow(df_atomX_boot) == nrow(df_j))

# formatting data for plotting
df_j = df_j %>%
  dplyr::mutate(OTU = reorder(OTU, -delta_BD))

OK. Time to plot!

# plotting BD shift (Z)
ggplot(df_j, aes(OTU)) +
  geom_point(aes(y=Z), color='blue') +
  geom_point(aes(y=delta_BD), color='red') +
  geom_hline(yintercept=0, linetype='dashed', alpha=0.5) +
  labs(x='OTU', y='BD shift (Z)') +
  theme_bw() +
  theme(
    axis.text.x = element_blank()
  )

In the figure, red points are delta_BD and blue points are q-SIP. It’s easy to see that delta_BD is a lot more variable than q-SIP. This is likely due to a high influence of compositional data artifacts on delta_BD versus q-SIP.

Let’s make a boxplot to show the difference in estimation variance between the two methods.

# plotting BD shift (Z): boxplots

## formatting the table
df_j_g = df_j %>%
  dplyr::select(OTU, Z, delta_BD) %>%
  tidyr::gather(Method, BD_shift, Z, delta_BD) %>%
  mutate(Method = ifelse(Method == 'Z', 'q-SIP', 'delta-BD'))

## plotting 
ggplot(df_j_g, aes(Method, BD_shift)) +
  geom_boxplot() +
  geom_hline(yintercept=0, linetype='dashed', alpha=0.5) +
  labs(x='Method', y='BD shift (Z)') +
  theme_bw() 

The boxplot helps to summarize how much more variance delta_BD produces versus q-SIP.

Session info

sessionInfo()
## R version 3.4.3 (2017-11-30)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.3 LTS
## 
## Matrix products: default
## BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
## LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] bindrcpp_0.2    phyloseq_1.22.3 HTSSIP_1.3.2    ggplot2_2.2.1  
## [5] tidyr_0.7.2     dplyr_0.7.4    
## 
## loaded via a namespace (and not attached):
##   [1] nlme_3.1-131               bitops_1.0-6              
##   [3] matrixStats_0.52.2         bit64_0.9-7               
##   [5] doParallel_1.0.11          RColorBrewer_1.1-2        
##   [7] rprojroot_1.2              GenomeInfoDb_1.12.3       
##   [9] tools_3.4.3                backports_1.1.1           
##  [11] R6_2.2.2                   coenocliner_0.2-2         
##  [13] vegan_2.4-4                rpart_4.1-11              
##  [15] Hmisc_4.0-3                DBI_0.7                   
##  [17] lazyeval_0.2.1             BiocGenerics_0.22.1       
##  [19] mgcv_1.8-22                colorspace_1.3-2          
##  [21] permute_0.9-4              ade4_1.7-8                
##  [23] nnet_7.3-12                tidyselect_0.2.2          
##  [25] gridExtra_2.3              DESeq2_1.16.1             
##  [27] bit_1.1-12                 compiler_3.4.3            
##  [29] Biobase_2.36.2             htmlTable_1.9             
##  [31] DelayedArray_0.2.7         labeling_0.3              
##  [33] scales_0.5.0               checkmate_1.8.5           
##  [35] genefilter_1.58.1          stringr_1.2.0             
##  [37] digest_0.6.12              foreign_0.8-69            
##  [39] rmarkdown_1.6              XVector_0.16.0            
##  [41] base64enc_0.1-3            pkgconfig_2.0.1           
##  [43] htmltools_0.3.6            htmlwidgets_0.9           
##  [45] rlang_0.1.2                RSQLite_2.0               
##  [47] bindr_0.1                  jsonlite_1.5              
##  [49] BiocParallel_1.10.1        acepack_1.4.1             
##  [51] RCurl_1.95-4.8             magrittr_1.5              
##  [53] GenomeInfoDbData_0.99.0    Formula_1.2-2             
##  [55] biomformat_1.4.0           Matrix_1.2-11             
##  [57] Rcpp_0.12.14               munsell_0.4.3             
##  [59] S4Vectors_0.14.7           ape_5.0                   
##  [61] stringi_1.1.5              yaml_2.1.14               
##  [63] MASS_7.3-47                SummarizedExperiment_1.6.5
##  [65] zlibbioc_1.22.0            rhdf5_2.20.0              
##  [67] plyr_1.8.4                 blob_1.1.0                
##  [69] grid_3.4.3                 parallel_3.4.3            
##  [71] lattice_0.20-35            Biostrings_2.44.2         
##  [73] splines_3.4.3              multtest_2.32.0           
##  [75] annotate_1.54.0            locfit_1.5-9.1            
##  [77] knitr_1.17                 igraph_1.1.2              
##  [79] GenomicRanges_1.28.6       geneplotter_1.54.0        
##  [81] reshape2_1.4.2             codetools_0.2-15          
##  [83] stats4_3.4.3               XML_3.98-1.9              
##  [85] glue_1.1.1                 evaluate_0.10.1           
##  [87] latticeExtra_0.6-28        data.table_1.10.4-2       
##  [89] foreach_1.4.3              gtable_0.2.0              
##  [91] purrr_0.2.4                assertthat_0.2.0          
##  [93] xtable_1.8-2               survival_2.41-3           
##  [95] tibble_1.3.4               iterators_1.0.8           
##  [97] memoise_1.1.0              AnnotationDbi_1.38.2      
##  [99] IRanges_2.10.5             cluster_2.0.6