Beta diversity ordinations

Nick Youngblut

2018-05-14


Beta diversity ordinations

Dataset

First, let’s load some packages including HTSSIP.

library(dplyr)
library(ggplot2)
library(HTSSIP)

Also let’s get an overview of the phyloseq object that we’re going to use.

physeq_S2D2
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 1072 taxa and 139 samples ]
## sample_data() Sample Data:       [ 139 samples by 17 sample variables ]
## tax_table()   Taxonomy Table:    [ 1072 taxa by 8 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 1072 tips and 1071 internal nodes ]

Parsing the dataset

See HTSSIP introduction vignette for a description on why dataset parsing is needed.

Let’s the parameters for parsing the dataset into individual treatment-control comparisons.

params = get_treatment_params(physeq_S2D2, c('Substrate', 'Day'))
params
##   Substrate Day
## 1   12C-Con  14
## 2   13C-Cel   3
## 3   13C-Cel  14
## 4   13C-Glu  14
## 5   12C-Con   3
## 6   13C-Glu   3

We just need the parameters for each treatment, so let’s filter out the controls. In this case, the controls are all ‘12C-Con’.

params = dplyr::filter(params, Substrate!='12C-Con')
params
##   Substrate Day
## 1   13C-Cel   3
## 2   13C-Cel  14
## 3   13C-Glu  14
## 4   13C-Glu   3

Now, we will use an expression that will subset the phyloseq object into the comparisons that we want to make.

ex is the expression that will be used for pruning the phyloseq object

ex = "(Substrate=='12C-Con' & Day=='${Day}') | (Substrate=='${Substrate}' & Day == '${Day}')"
physeq_S2D2_l = phyloseq_subset(physeq_S2D2, params, ex)
physeq_S2D2_l
## $`(Substrate=='12C-Con' & Day=='3') | (Substrate=='13C-Cel' & Day == '3')`
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 1072 taxa and 46 samples ]
## sample_data() Sample Data:       [ 46 samples by 17 sample variables ]
## tax_table()   Taxonomy Table:    [ 1072 taxa by 8 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 1072 tips and 1071 internal nodes ]
## 
## $`(Substrate=='12C-Con' & Day=='14') | (Substrate=='13C-Cel' & Day == '14')`
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 1072 taxa and 46 samples ]
## sample_data() Sample Data:       [ 46 samples by 17 sample variables ]
## tax_table()   Taxonomy Table:    [ 1072 taxa by 8 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 1072 tips and 1071 internal nodes ]
## 
## $`(Substrate=='12C-Con' & Day=='14') | (Substrate=='13C-Glu' & Day == '14')`
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 1072 taxa and 47 samples ]
## sample_data() Sample Data:       [ 47 samples by 17 sample variables ]
## tax_table()   Taxonomy Table:    [ 1072 taxa by 8 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 1072 tips and 1071 internal nodes ]
## 
## $`(Substrate=='12C-Con' & Day=='3') | (Substrate=='13C-Glu' & Day == '3')`
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 1072 taxa and 46 samples ]
## sample_data() Sample Data:       [ 46 samples by 17 sample variables ]
## tax_table()   Taxonomy Table:    [ 1072 taxa by 8 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 1072 tips and 1071 internal nodes ]

Calculating ordinations

Now, let’s actually make ordinations of each treatment compared to the control. This will return a data.frame object for plotting.

# running in parallel
doParallel::registerDoParallel(2)
physeq_S2D2_l_df = SIP_betaDiv_ord(physeq_S2D2_l, parallel=TRUE)
## Run 0 stress 0.06000133 
## Run 1 stress 0.06000079 
## ... New best solution
## ... Procrustes: rmse 4.76635e-05  max resid 0.0001183245 
## ... Similar to previous best
## Run 2 stress 0.05999878 
## ... New best solution
## ... Procrustes: rmse 0.0007673419  max resid 0.002174231 
## ... Similar to previous best
## Run 3 stress 0.06000025 
## ... Procrustes: rmse 0.0002583214  max resid 0.0007257033 
## ... Similar to previous best
## Run 4 stress 0.06000132 
## ... Procrustes: rmse 0.0008298318  max resid 0.002359842 
## ... Similar to previous best
## Run 5 stress 0.05999855 
## ... New best solution
## ... Procrustes: rmse 0.0002404095  max resid 0.0006772252 
## ... Similar to previous best
## Run 6 stress 0.06000119 
## ... Procrustes: rmse 0.0005954045  max resid 0.001672996 
## ... Similar to previous best
## Run 7 stress 0.06000162 
## ... Procrustes: rmse 0.0007033992  max resid 0.001983281 
## ... Similar to previous best
## Run 8 stress 0.05999905 
## ... Procrustes: rmse 0.0002689876  max resid 0.0007601238 
## ... Similar to previous best
## Run 9 stress 0.06000217 
## ... Procrustes: rmse 0.000758332  max resid 0.002137149 
## ... Similar to previous best
## Run 10 stress 0.06000222 
## ... Procrustes: rmse 0.000682585  max resid 0.00194309 
## ... Similar to previous best
## Run 11 stress 0.06000121 
## ... Procrustes: rmse 0.0006338837  max resid 0.001786944 
## ... Similar to previous best
## Run 12 stress 0.06000015 
## ... Procrustes: rmse 0.0005002517  max resid 0.001413552 
## ... Similar to previous best
## Run 13 stress 0.06000224 
## ... Procrustes: rmse 0.0007031324  max resid 0.00196629 
## ... Similar to previous best
## Run 14 stress 0.05999863 
## ... Procrustes: rmse 0.0001768579  max resid 0.0005147077 
## ... Similar to previous best
## Run 15 stress 0.06000198 
## ... Procrustes: rmse 0.0007180126  max resid 0.00202838 
## ... Similar to previous best
## Run 16 stress 0.06000269 
## ... Procrustes: rmse 0.0007187226  max resid 0.002040055 
## ... Similar to previous best
## Run 17 stress 0.06000053 
## ... Procrustes: rmse 0.0005480856  max resid 0.001536443 
## ... Similar to previous best
## Run 18 stress 0.06000346 
## ... Procrustes: rmse 0.0008048583  max resid 0.002260405 
## ... Similar to previous best
## Run 19 stress 0.2366197 
## Run 20 stress 0.05999882 
## ... Procrustes: rmse 0.0002499372  max resid 0.0007488858 
## ... Similar to previous best
## *** Solution reached
## Run 0 stress 0.07196424 
## Run 1 stress 0.1319375 
## Run 2 stress 0.129617 
## Run 3 stress 0.1010428 
## Run 4 stress 0.07206155 
## ... Procrustes: rmse 0.005243278  max resid 0.02280419 
## Run 5 stress 0.07196366 
## ... New best solution
## ... Procrustes: rmse 0.0001232941  max resid 0.0002892083 
## ... Similar to previous best
## Run 6 stress 0.1283992 
## Run 7 stress 0.1329558 
## Run 8 stress 0.1374138 
## Run 9 stress 0.1318414 
## Run 10 stress 0.1361472 
## Run 11 stress 0.1199625 
## Run 12 stress 0.130733 
## Run 13 stress 0.07393864 
## Run 14 stress 0.1172225 
## Run 15 stress 0.07196433 
## ... Procrustes: rmse 0.0001492686  max resid 0.0003513592 
## ... Similar to previous best
## Run 16 stress 0.07196383 
## ... Procrustes: rmse 5.555946e-05  max resid 0.0001380408 
## ... Similar to previous best
## Run 17 stress 0.07457581 
## Run 18 stress 0.07404662 
## Run 19 stress 0.1186507 
## Run 20 stress 0.07196427 
## ... Procrustes: rmse 0.000372391  max resid 0.001150812 
## ... Similar to previous best
## *** Solution reached
## Run 0 stress 0.07255501 
## Run 1 stress 0.07255526 
## ... Procrustes: rmse 0.0001154799  max resid 0.0004017811 
## ... Similar to previous best
## Run 2 stress 0.07255521 
## ... Procrustes: rmse 9.374883e-05  max resid 0.0003199492 
## ... Similar to previous best
## Run 3 stress 0.07255501 
## ... Procrustes: rmse 8.016805e-06  max resid 2.850354e-05 
## ... Similar to previous best
## Run 4 stress 0.07255505 
## ... Procrustes: rmse 3.009653e-05  max resid 0.0001016472 
## ... Similar to previous best
## Run 5 stress 0.07255502 
## ... Procrustes: rmse 2.06275e-05  max resid 7.843908e-05 
## ... Similar to previous best
## Run 6 stress 0.07255504 
## ... Procrustes: rmse 4.134275e-05  max resid 0.0001448403 
## ... Similar to previous best
## Run 7 stress 0.07255501 
## ... Procrustes: rmse 7.638422e-06  max resid 4.579753e-05 
## ... Similar to previous best
## Run 8 stress 0.07255533 
## ... Procrustes: rmse 0.0001229787  max resid 0.0004466971 
## ... Similar to previous best
## Run 9 stress 0.07255507 
## ... Procrustes: rmse 1.723695e-05  max resid 4.729386e-05 
## ... Similar to previous best
## Run 10 stress 0.07255507 
## ... Procrustes: rmse 5.510571e-05  max resid 0.0001901477 
## ... Similar to previous best
## Run 11 stress 0.07255501 
## ... Procrustes: rmse 4.899759e-06  max resid 1.788823e-05 
## ... Similar to previous best
## Run 12 stress 0.07255504 
## ... Procrustes: rmse 3.884768e-05  max resid 0.0001316485 
## ... Similar to previous best
## Run 13 stress 0.07255501 
## ... Procrustes: rmse 3.915431e-06  max resid 1.688773e-05 
## ... Similar to previous best
## Run 14 stress 0.07255534 
## ... Procrustes: rmse 0.0001334698  max resid 0.000458489 
## ... Similar to previous best
## Run 15 stress 0.07255505 
## ... Procrustes: rmse 2.318446e-05  max resid 7.457831e-05 
## ... Similar to previous best
## Run 16 stress 0.07255504 
## ... Procrustes: rmse 3.816817e-05  max resid 0.0001420784 
## ... Similar to previous best
## Run 17 stress 0.07255501 
## ... Procrustes: rmse 1.240037e-05  max resid 4.174521e-05 
## ... Similar to previous best
## Run 18 stress 0.07255507 
## ... Procrustes: rmse 4.521679e-05  max resid 0.0001653047 
## ... Similar to previous best
## Run 19 stress 0.07255509 
## ... Procrustes: rmse 6.734897e-05  max resid 0.0002079885 
## ... Similar to previous best
## Run 20 stress 0.07255515 
## ... Procrustes: rmse 8.612619e-05  max resid 0.0003172871 
## ... Similar to previous best
## *** Solution reached
## Run 0 stress 0.1080816 
## Run 1 stress 0.1013612 
## ... New best solution
## ... Procrustes: rmse 0.03458739  max resid 0.1340111 
## Run 2 stress 0.1080834 
## Run 3 stress 0.1013618 
## ... Procrustes: rmse 0.001106331  max resid 0.004059074 
## ... Similar to previous best
## Run 4 stress 0.1080811 
## Run 5 stress 0.1080837 
## Run 6 stress 0.101361 
## ... New best solution
## ... Procrustes: rmse 0.0009579509  max resid 0.003513939 
## ... Similar to previous best
## Run 7 stress 0.1013639 
## ... Procrustes: rmse 0.001302598  max resid 0.004787408 
## ... Similar to previous best
## Run 8 stress 0.1080829 
## Run 9 stress 0.1013646 
## ... Procrustes: rmse 0.001364516  max resid 0.005018659 
## ... Similar to previous best
## Run 10 stress 0.1080837 
## Run 11 stress 0.108083 
## Run 12 stress 0.1080817 
## Run 13 stress 0.1013601 
## ... New best solution
## ... Procrustes: rmse 0.0007068489  max resid 0.002555171 
## ... Similar to previous best
## Run 14 stress 0.1080826 
## Run 15 stress 0.1080859 
## Run 16 stress 0.1080818 
## Run 17 stress 0.1013634 
## ... Procrustes: rmse 0.0005336088  max resid 0.001998047 
## ... Similar to previous best
## Run 18 stress 0.1080832 
## Run 19 stress 0.1450325 
## Run 20 stress 0.1013608 
## ... Procrustes: rmse 0.0001440085  max resid 0.0005593333 
## ... Similar to previous best
## *** Solution reached
physeq_S2D2_l_df %>% head(n=3)
##                                                           phyloseq_subset
## 1 (Substrate=='12C-Con' & Day=='3') | (Substrate=='13C-Cel' & Day == '3')
## 2 (Substrate=='12C-Con' & Day=='3') | (Substrate=='13C-Cel' & Day == '3')
## 3 (Substrate=='12C-Con' & Day=='3') | (Substrate=='13C-Cel' & Day == '3')
##         NMDS1        NMDS2          X.Sample primer_number fwd_barcode
## 1 -0.15456305 -0.035393733 13C-Cel.D3.R1_F21             7    GGATATCT
## 2 -0.20505799 -0.101701050 13C-Cel.D3.R1_F20             6    CGTGAGTG
## 3 -0.02321259 -0.002126514 13C-Cel.D3.R1_F15             1    ATCGTACG
##   rev_barcode Substrate Day Microcosm_replicate Fraction Buoyant_density
## 1    CGAGAGTT   13C-Cel   3                   1       21        1.705640
## 2    CGAGAGTT   13C-Cel   3                   1       20        1.706186
## 3    CGAGAGTT   13C-Cel   3                   1       15        1.725310
##   Sample_type
## 1     unknown
## 2     unknown
## 3     unknown
##                                                                                   library
## 1 /var/seq_data/fullCyc/MiSeq_16SrRNA/515f-806r/lib1-7rs/../150721_lib4/run1/metadata.txt
## 2 /var/seq_data/fullCyc/MiSeq_16SrRNA/515f-806r/lib1-7rs/../150721_lib4/run1/metadata.txt
## 3 /var/seq_data/fullCyc/MiSeq_16SrRNA/515f-806r/lib1-7rs/../150721_lib4/run1/metadata.txt
##   Exp_type Sample_location Sample_date Sample_treatment
## 1      SIP              NA          NA               NA
## 2      SIP              NA          NA               NA
## 3      SIP              NA          NA               NA
##   Sample_subtreatment core_dataset
## 1                  NA         TRUE
## 2                  NA         TRUE
## 3                  NA         TRUE

Each specific phyloseq subset (treatment-control comparison) is delimited with the “phyloseq_subset” column.

physeq_S2D2_l_df %>% .$phyloseq_subset %>% unique
## [1] (Substrate=='12C-Con' & Day=='3') | (Substrate=='13C-Cel' & Day == '3')  
## [2] (Substrate=='12C-Con' & Day=='14') | (Substrate=='13C-Cel' & Day == '14')
## [3] (Substrate=='12C-Con' & Day=='14') | (Substrate=='13C-Glu' & Day == '14')
## [4] (Substrate=='12C-Con' & Day=='3') | (Substrate=='13C-Glu' & Day == '3')  
## 4 Levels: (Substrate=='12C-Con' & Day=='3') | (Substrate=='13C-Cel' & Day == '3') ...

For clarity, I’m going edit these long strings to make them more readable.

physeq_S2D2_l_df = physeq_S2D2_l_df %>%
  dplyr::mutate(phyloseq_subset = gsub(' \\| ', '\n', phyloseq_subset),
                phyloseq_subset = gsub('\'3\'', '\'03\'', phyloseq_subset))
physeq_S2D2_l_df %>% .$phyloseq_subset %>% unique
## [1] "(Substrate=='12C-Con' & Day=='03')\n(Substrate=='13C-Cel' & Day == '03')"
## [2] "(Substrate=='12C-Con' & Day=='14')\n(Substrate=='13C-Cel' & Day == '14')"
## [3] "(Substrate=='12C-Con' & Day=='14')\n(Substrate=='13C-Glu' & Day == '14')"
## [4] "(Substrate=='12C-Con' & Day=='03')\n(Substrate=='13C-Glu' & Day == '03')"

OK, let’s plot the data!

phyloseq_ord_plot(physeq_S2D2_l_df)

As you can see, the ‘heavy’ gradient fraction ‘communities’ for the labeled-treatments tend to diverge from the unlabeled gradient fraction communities, but the amount of divergence in dependent on substrate and time point.

Session info

sessionInfo()
## R version 3.4.3 (2017-11-30)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.3 LTS
## 
## Matrix products: default
## BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
## LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] bindrcpp_0.2    phyloseq_1.20.0 HTSSIP_1.4.0    ggplot2_2.2.1  
## [5] tidyr_0.7.2     dplyr_0.7.4    
## 
## loaded via a namespace (and not attached):
##   [1] nlme_3.1-131               bitops_1.0-6              
##   [3] matrixStats_0.52.2         bit64_0.9-7               
##   [5] doParallel_1.0.11          RColorBrewer_1.1-2        
##   [7] rprojroot_1.2              GenomeInfoDb_1.12.3       
##   [9] tools_3.4.3                backports_1.1.1           
##  [11] R6_2.2.2                   coenocliner_0.2-2         
##  [13] vegan_2.4-4                rpart_4.1-11              
##  [15] Hmisc_4.0-3                DBI_0.7                   
##  [17] lazyeval_0.2.0             BiocGenerics_0.22.1       
##  [19] mgcv_1.8-22                colorspace_1.3-2          
##  [21] permute_0.9-4              ade4_1.7-8                
##  [23] nnet_7.3-12                tidyselect_0.2.2          
##  [25] gridExtra_2.3              DESeq2_1.16.1             
##  [27] bit_1.1-12                 compiler_3.4.3            
##  [29] Biobase_2.36.2             htmlTable_1.9             
##  [31] DelayedArray_0.2.7         labeling_0.3              
##  [33] scales_0.5.0               checkmate_1.8.5           
##  [35] genefilter_1.58.1          stringr_1.2.0             
##  [37] digest_0.6.12              foreign_0.8-69            
##  [39] rmarkdown_1.6              XVector_0.16.0            
##  [41] base64enc_0.1-3            pkgconfig_2.0.1           
##  [43] htmltools_0.3.6            htmlwidgets_0.9           
##  [45] rlang_0.1.2                RSQLite_2.0               
##  [47] bindr_0.1                  jsonlite_1.5              
##  [49] BiocParallel_1.10.1        acepack_1.4.1             
##  [51] RCurl_1.95-4.8             magrittr_1.5              
##  [53] GenomeInfoDbData_0.99.0    Formula_1.2-2             
##  [55] biomformat_1.4.0           Matrix_1.2-11             
##  [57] Rcpp_0.12.16               munsell_0.4.3             
##  [59] S4Vectors_0.14.7           ape_4.1                   
##  [61] stringi_1.1.5              yaml_2.1.14               
##  [63] MASS_7.3-47                SummarizedExperiment_1.6.5
##  [65] zlibbioc_1.22.0            rhdf5_2.20.0              
##  [67] plyr_1.8.4                 blob_1.1.0                
##  [69] grid_3.4.3                 parallel_3.4.3            
##  [71] lattice_0.20-35            Biostrings_2.44.2         
##  [73] splines_3.4.3              multtest_2.32.0           
##  [75] annotate_1.54.0            locfit_1.5-9.1            
##  [77] knitr_1.17                 igraph_1.1.2              
##  [79] GenomicRanges_1.28.6       geneplotter_1.54.0        
##  [81] reshape2_1.4.2             codetools_0.2-15          
##  [83] stats4_3.4.3               XML_3.98-1.9              
##  [85] glue_1.1.1                 evaluate_0.10.1           
##  [87] latticeExtra_0.6-28        data.table_1.10.4-2       
##  [89] foreach_1.4.3              gtable_0.2.0              
##  [91] purrr_0.2.4                assertthat_0.2.0          
##  [93] xtable_1.8-2               survival_2.41-3           
##  [95] tibble_1.3.4               iterators_1.0.8           
##  [97] memoise_1.1.0              AnnotationDbi_1.38.2      
##  [99] IRanges_2.10.5             cluster_2.0.6