Beta diversity ordinations

Nick Youngblut

2018-01-23


Beta diversity ordinations

Dataset

First, let’s load some packages including HTSSIP.

library(dplyr)
library(ggplot2)
library(HTSSIP)

Also let’s get an overview of the phyloseq object that we’re going to use.

physeq_S2D2
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 10001 taxa and 139 samples ]
## sample_data() Sample Data:       [ 139 samples by 17 sample variables ]
## tax_table()   Taxonomy Table:    [ 10001 taxa by 8 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 10001 tips and 10000 internal nodes ]

Parsing the dataset

See HTSSIP introduction vignette for a description on why dataset parsing is needed.

Let’s the parameters for parsing the dataset into individual treatment-control comparisons.

params = get_treatment_params(physeq_S2D2, c('Substrate', 'Day'))
params
##   Substrate Day
## 1   12C-Con  14
## 2   13C-Cel   3
## 3   13C-Cel  14
## 4   13C-Glu  14
## 5   12C-Con   3
## 6   13C-Glu   3

We just need the parameters for each treatment, so let’s filter out the controls. In this case, the controls are all ‘12C-Con’.

params = dplyr::filter(params, Substrate!='12C-Con')
params
##   Substrate Day
## 1   13C-Cel   3
## 2   13C-Cel  14
## 3   13C-Glu  14
## 4   13C-Glu   3

Now, we will use an expression that will subset the phyloseq object into the comparisons that we want to make.

ex is the expression that will be used for pruning the phyloseq object

ex = "(Substrate=='12C-Con' & Day=='${Day}') | (Substrate=='${Substrate}' & Day == '${Day}')"
physeq_S2D2_l = phyloseq_subset(physeq_S2D2, params, ex)
physeq_S2D2_l
## $`(Substrate=='12C-Con' & Day=='3') | (Substrate=='13C-Cel' & Day == '3')`
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 10001 taxa and 46 samples ]
## sample_data() Sample Data:       [ 46 samples by 17 sample variables ]
## tax_table()   Taxonomy Table:    [ 10001 taxa by 8 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 10001 tips and 10000 internal nodes ]
## 
## $`(Substrate=='12C-Con' & Day=='14') | (Substrate=='13C-Cel' & Day == '14')`
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 10001 taxa and 46 samples ]
## sample_data() Sample Data:       [ 46 samples by 17 sample variables ]
## tax_table()   Taxonomy Table:    [ 10001 taxa by 8 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 10001 tips and 10000 internal nodes ]
## 
## $`(Substrate=='12C-Con' & Day=='14') | (Substrate=='13C-Glu' & Day == '14')`
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 10001 taxa and 47 samples ]
## sample_data() Sample Data:       [ 47 samples by 17 sample variables ]
## tax_table()   Taxonomy Table:    [ 10001 taxa by 8 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 10001 tips and 10000 internal nodes ]
## 
## $`(Substrate=='12C-Con' & Day=='3') | (Substrate=='13C-Glu' & Day == '3')`
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 10001 taxa and 46 samples ]
## sample_data() Sample Data:       [ 46 samples by 17 sample variables ]
## tax_table()   Taxonomy Table:    [ 10001 taxa by 8 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 10001 tips and 10000 internal nodes ]

Calculating ordinations

Now, let’s actually make ordinations of each treatment compared to the control. This will return a data.frame object for plotting.

# running in parallel
doParallel::registerDoParallel(2)
physeq_S2D2_l_df = SIP_betaDiv_ord(physeq_S2D2_l, parallel=TRUE)
## Run 0 stress 0.06040942 
## Run 1 stress 0.06041009 
## ... Procrustes: rmse 0.001591321  max resid 0.003679888 
## ... Similar to previous best
## Run 2 stress 0.06044065 
## ... Procrustes: rmse 0.001961676  max resid 0.007216207 
## ... Similar to previous best
## Run 3 stress 0.06043887 
## ... Procrustes: rmse 0.001772249  max resid 0.007024563 
## ... Similar to previous best
## Run 4 stress 0.06040952 
## ... Procrustes: rmse 1.904161e-05  max resid 5.731359e-05 
## ... Similar to previous best
## Run 5 stress 0.06040672 
## ... New best solution
## ... Procrustes: rmse 0.0004204482  max resid 0.0009632369 
## ... Similar to previous best
## Run 6 stress 0.06043839 
## ... Procrustes: rmse 0.001413797  max resid 0.006642425 
## ... Similar to previous best
## Run 7 stress 0.06040883 
## ... Procrustes: rmse 0.0003283913  max resid 0.0007500169 
## ... Similar to previous best
## Run 8 stress 0.06044267 
## ... Procrustes: rmse 0.001700692  max resid 0.006971731 
## ... Similar to previous best
## Run 9 stress 0.06040887 
## ... Procrustes: rmse 0.001026885  max resid 0.002384574 
## ... Similar to previous best
## Run 10 stress 0.06040628 
## ... New best solution
## ... Procrustes: rmse 0.0001565431  max resid 0.0003643002 
## ... Similar to previous best
## Run 11 stress 0.06040922 
## ... Procrustes: rmse 0.0009197138  max resid 0.002132683 
## ... Similar to previous best
## Run 12 stress 0.06040885 
## ... Procrustes: rmse 0.0008041917  max resid 0.00186744 
## ... Similar to previous best
## Run 13 stress 0.06043994 
## ... Procrustes: rmse 0.001461327  max resid 0.006708933 
## ... Similar to previous best
## Run 14 stress 0.06044054 
## ... Procrustes: rmse 0.001175608  max resid 0.005509123 
## ... Similar to previous best
## Run 15 stress 0.06041031 
## ... Procrustes: rmse 0.001036863  max resid 0.002404426 
## ... Similar to previous best
## Run 16 stress 0.06040782 
## ... Procrustes: rmse 0.0003685387  max resid 0.0008517008 
## ... Similar to previous best
## Run 17 stress 0.06040737 
## ... Procrustes: rmse 0.0006540678  max resid 0.001516885 
## ... Similar to previous best
## Run 18 stress 0.06043745 
## ... Procrustes: rmse 0.001186312  max resid 0.006307928 
## ... Similar to previous best
## Run 19 stress 0.06040777 
## ... Procrustes: rmse 0.0003618882  max resid 0.0008357161 
## ... Similar to previous best
## Run 20 stress 0.06043864 
## ... Procrustes: rmse 0.001096475  max resid 0.005712407 
## ... Similar to previous best
## *** Solution reached
## Run 0 stress 0.07096544 
## Run 1 stress 0.1140309 
## Run 2 stress 0.1132383 
## Run 3 stress 0.09511952 
## Run 4 stress 0.07105488 
## ... Procrustes: rmse 0.00691477  max resid 0.02491228 
## Run 5 stress 0.07440186 
## Run 6 stress 0.1225752 
## Run 7 stress 0.1276908 
## Run 8 stress 0.1313183 
## Run 9 stress 0.1255872 
## Run 10 stress 0.1296816 
## Run 11 stress 0.1176397 
## Run 12 stress 0.1239674 
## Run 13 stress 0.07448963 
## Run 14 stress 0.1134581 
## Run 15 stress 0.0710617 
## ... Procrustes: rmse 0.007370344  max resid 0.02463504 
## Run 16 stress 0.07096509 
## ... New best solution
## ... Procrustes: rmse 0.0001910312  max resid 0.0004383753 
## ... Similar to previous best
## Run 17 stress 0.07440227 
## Run 18 stress 0.07375728 
## Run 19 stress 0.1138365 
## Run 20 stress 0.0744901 
## *** Solution reached
## Run 0 stress 0.06929728 
## Run 1 stress 0.06929748 
## ... Procrustes: rmse 0.0001163495  max resid 0.000402083 
## ... Similar to previous best
## Run 2 stress 0.06929732 
## ... Procrustes: rmse 4.328741e-05  max resid 0.0001473437 
## ... Similar to previous best
## Run 3 stress 0.06929744 
## ... Procrustes: rmse 9.480249e-05  max resid 0.0003077851 
## ... Similar to previous best
## Run 4 stress 0.06929728 
## ... Procrustes: rmse 1.42411e-05  max resid 5.855222e-05 
## ... Similar to previous best
## Run 5 stress 0.0692973 
## ... Procrustes: rmse 3.282031e-05  max resid 0.0001122745 
## ... Similar to previous best
## Run 6 stress 0.06929735 
## ... Procrustes: rmse 7.334294e-05  max resid 0.000253137 
## ... Similar to previous best
## Run 7 stress 0.06929728 
## ... Procrustes: rmse 2.243925e-05  max resid 7.539774e-05 
## ... Similar to previous best
## Run 8 stress 0.06929733 
## ... Procrustes: rmse 2.078073e-05  max resid 8.540961e-05 
## ... Similar to previous best
## Run 9 stress 0.06929749 
## ... Procrustes: rmse 0.0001130701  max resid 0.0003872094 
## ... Similar to previous best
## Run 10 stress 0.06929733 
## ... Procrustes: rmse 4.934065e-05  max resid 0.0001686528 
## ... Similar to previous best
## Run 11 stress 0.06929728 
## ... Procrustes: rmse 9.864056e-06  max resid 2.792237e-05 
## ... Similar to previous best
## Run 12 stress 0.06929729 
## ... Procrustes: rmse 2.377935e-05  max resid 8.600386e-05 
## ... Similar to previous best
## Run 13 stress 0.06929728 
## ... Procrustes: rmse 1.817124e-05  max resid 5.884765e-05 
## ... Similar to previous best
## Run 14 stress 0.06929729 
## ... Procrustes: rmse 1.698076e-05  max resid 5.763711e-05 
## ... Similar to previous best
## Run 15 stress 0.06929728 
## ... Procrustes: rmse 6.688849e-06  max resid 1.77267e-05 
## ... Similar to previous best
## Run 16 stress 0.06929766 
## ... Procrustes: rmse 0.0001334399  max resid 0.000463657 
## ... Similar to previous best
## Run 17 stress 0.06929759 
## ... Procrustes: rmse 0.0001339259  max resid 0.0004629197 
## ... Similar to previous best
## Run 18 stress 0.06929729 
## ... Procrustes: rmse 2.250169e-05  max resid 7.706371e-05 
## ... Similar to previous best
## Run 19 stress 0.06929728 
## ... New best solution
## ... Procrustes: rmse 4.928991e-06  max resid 1.347154e-05 
## ... Similar to previous best
## Run 20 stress 0.06929749 
## ... Procrustes: rmse 9.827476e-05  max resid 0.0003346146 
## ... Similar to previous best
## *** Solution reached
## Run 0 stress 0.09758868 
## Run 1 stress 0.09758734 
## ... New best solution
## ... Procrustes: rmse 0.0008053643  max resid 0.003160243 
## ... Similar to previous best
## Run 2 stress 0.0995672 
## Run 3 stress 0.09719132 
## ... New best solution
## ... Procrustes: rmse 0.01477239  max resid 0.05036903 
## Run 4 stress 0.09758869 
## ... Procrustes: rmse 0.01520689  max resid 0.05100202 
## Run 5 stress 0.09956517 
## Run 6 stress 0.09719266 
## ... Procrustes: rmse 0.0003025296  max resid 0.001140011 
## ... Similar to previous best
## Run 7 stress 0.09956371 
## Run 8 stress 0.0975873 
## ... Procrustes: rmse 0.01507281  max resid 0.05074366 
## Run 9 stress 0.09758681 
## ... Procrustes: rmse 0.01485927  max resid 0.05034136 
## Run 10 stress 0.09956644 
## Run 11 stress 0.0975883 
## ... Procrustes: rmse 0.01518974  max resid 0.05097043 
## Run 12 stress 0.09956744 
## Run 13 stress 0.09719186 
## ... Procrustes: rmse 0.0004673669  max resid 0.001945022 
## ... Similar to previous best
## Run 14 stress 0.09956531 
## Run 15 stress 0.09956642 
## Run 16 stress 0.09956802 
## Run 17 stress 0.09719128 
## ... New best solution
## ... Procrustes: rmse 0.0003118372  max resid 0.00132854 
## ... Similar to previous best
## Run 18 stress 0.09956689 
## Run 19 stress 0.1486752 
## Run 20 stress 0.09758678 
## ... Procrustes: rmse 0.01474656  max resid 0.05014059 
## *** Solution reached
physeq_S2D2_l_df %>% head(n=3)
##                                                           phyloseq_subset
## 1 (Substrate=='12C-Con' & Day=='3') | (Substrate=='13C-Cel' & Day == '3')
## 2 (Substrate=='12C-Con' & Day=='3') | (Substrate=='13C-Cel' & Day == '3')
## 3 (Substrate=='12C-Con' & Day=='3') | (Substrate=='13C-Cel' & Day == '3')
##         NMDS1        NMDS2          X.Sample primer_number fwd_barcode
## 1 -0.15986664 -0.031915930 13C-Cel.D3.R1_F21             7    GGATATCT
## 2 -0.20823544 -0.090867367 13C-Cel.D3.R1_F20             6    CGTGAGTG
## 3 -0.01887917 -0.006094586 13C-Cel.D3.R1_F15             1    ATCGTACG
##   rev_barcode Substrate Day Microcosm_replicate Fraction Buoyant_density
## 1    CGAGAGTT   13C-Cel   3                   1       21        1.705640
## 2    CGAGAGTT   13C-Cel   3                   1       20        1.706186
## 3    CGAGAGTT   13C-Cel   3                   1       15        1.725310
##   Sample_type
## 1     unknown
## 2     unknown
## 3     unknown
##                                                                                   library
## 1 /var/seq_data/fullCyc/MiSeq_16SrRNA/515f-806r/lib1-7rs/../150721_lib4/run1/metadata.txt
## 2 /var/seq_data/fullCyc/MiSeq_16SrRNA/515f-806r/lib1-7rs/../150721_lib4/run1/metadata.txt
## 3 /var/seq_data/fullCyc/MiSeq_16SrRNA/515f-806r/lib1-7rs/../150721_lib4/run1/metadata.txt
##   Exp_type Sample_location Sample_date Sample_treatment
## 1      SIP              NA          NA               NA
## 2      SIP              NA          NA               NA
## 3      SIP              NA          NA               NA
##   Sample_subtreatment core_dataset
## 1                  NA         TRUE
## 2                  NA         TRUE
## 3                  NA         TRUE

Each specific phyloseq subset (treatment-control comparison) is delimited with the “phyloseq_subset” column.

physeq_S2D2_l_df %>% .$phyloseq_subset %>% unique
## [1] (Substrate=='12C-Con' & Day=='3') | (Substrate=='13C-Cel' & Day == '3')  
## [2] (Substrate=='12C-Con' & Day=='14') | (Substrate=='13C-Cel' & Day == '14')
## [3] (Substrate=='12C-Con' & Day=='14') | (Substrate=='13C-Glu' & Day == '14')
## [4] (Substrate=='12C-Con' & Day=='3') | (Substrate=='13C-Glu' & Day == '3')  
## 4 Levels: (Substrate=='12C-Con' & Day=='3') | (Substrate=='13C-Cel' & Day == '3') ...

For clarity, I’m going edit these long strings to make them more readable.

physeq_S2D2_l_df = physeq_S2D2_l_df %>%
  dplyr::mutate(phyloseq_subset = gsub(' \\| ', '\n', phyloseq_subset),
                phyloseq_subset = gsub('\'3\'', '\'03\'', phyloseq_subset))
physeq_S2D2_l_df %>% .$phyloseq_subset %>% unique
## [1] "(Substrate=='12C-Con' & Day=='03')\n(Substrate=='13C-Cel' & Day == '03')"
## [2] "(Substrate=='12C-Con' & Day=='14')\n(Substrate=='13C-Cel' & Day == '14')"
## [3] "(Substrate=='12C-Con' & Day=='14')\n(Substrate=='13C-Glu' & Day == '14')"
## [4] "(Substrate=='12C-Con' & Day=='03')\n(Substrate=='13C-Glu' & Day == '03')"

OK, let’s plot the data!

phyloseq_ord_plot(physeq_S2D2_l_df)

As you can see, the ‘heavy’ gradient fraction ‘communities’ for the labeled-treatments tend to diverge from the unlabeled gradient fraction communities, but the amount of divergence in dependent on substrate and time point.

Session info

sessionInfo()
## R version 3.4.3 (2017-11-30)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.3 LTS
## 
## Matrix products: default
## BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
## LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] bindrcpp_0.2    phyloseq_1.22.3 HTSSIP_1.3.2    ggplot2_2.2.1  
## [5] tidyr_0.7.2     dplyr_0.7.4    
## 
## loaded via a namespace (and not attached):
##   [1] nlme_3.1-131               bitops_1.0-6              
##   [3] matrixStats_0.52.2         bit64_0.9-7               
##   [5] doParallel_1.0.11          RColorBrewer_1.1-2        
##   [7] rprojroot_1.2              GenomeInfoDb_1.12.3       
##   [9] tools_3.4.3                backports_1.1.1           
##  [11] R6_2.2.2                   coenocliner_0.2-2         
##  [13] vegan_2.4-4                rpart_4.1-11              
##  [15] Hmisc_4.0-3                DBI_0.7                   
##  [17] lazyeval_0.2.1             BiocGenerics_0.22.1       
##  [19] mgcv_1.8-22                colorspace_1.3-2          
##  [21] permute_0.9-4              ade4_1.7-8                
##  [23] nnet_7.3-12                tidyselect_0.2.2          
##  [25] gridExtra_2.3              DESeq2_1.16.1             
##  [27] bit_1.1-12                 compiler_3.4.3            
##  [29] Biobase_2.36.2             htmlTable_1.9             
##  [31] DelayedArray_0.2.7         labeling_0.3              
##  [33] scales_0.5.0               checkmate_1.8.5           
##  [35] genefilter_1.58.1          stringr_1.2.0             
##  [37] digest_0.6.12              foreign_0.8-69            
##  [39] rmarkdown_1.6              XVector_0.16.0            
##  [41] base64enc_0.1-3            pkgconfig_2.0.1           
##  [43] htmltools_0.3.6            htmlwidgets_0.9           
##  [45] rlang_0.1.2                RSQLite_2.0               
##  [47] bindr_0.1                  jsonlite_1.5              
##  [49] BiocParallel_1.10.1        acepack_1.4.1             
##  [51] RCurl_1.95-4.8             magrittr_1.5              
##  [53] GenomeInfoDbData_0.99.0    Formula_1.2-2             
##  [55] biomformat_1.4.0           Matrix_1.2-11             
##  [57] Rcpp_0.12.14               munsell_0.4.3             
##  [59] S4Vectors_0.14.7           ape_5.0                   
##  [61] stringi_1.1.5              yaml_2.1.14               
##  [63] MASS_7.3-47                SummarizedExperiment_1.6.5
##  [65] zlibbioc_1.22.0            rhdf5_2.20.0              
##  [67] plyr_1.8.4                 blob_1.1.0                
##  [69] grid_3.4.3                 parallel_3.4.3            
##  [71] lattice_0.20-35            Biostrings_2.44.2         
##  [73] splines_3.4.3              multtest_2.32.0           
##  [75] annotate_1.54.0            locfit_1.5-9.1            
##  [77] knitr_1.17                 igraph_1.1.2              
##  [79] GenomicRanges_1.28.6       geneplotter_1.54.0        
##  [81] reshape2_1.4.2             codetools_0.2-15          
##  [83] stats4_3.4.3               XML_3.98-1.9              
##  [85] glue_1.1.1                 evaluate_0.10.1           
##  [87] latticeExtra_0.6-28        data.table_1.10.4-2       
##  [89] foreach_1.4.3              gtable_0.2.0              
##  [91] purrr_0.2.4                assertthat_0.2.0          
##  [93] xtable_1.8-2               survival_2.41-3           
##  [95] tibble_1.3.4               iterators_1.0.8           
##  [97] memoise_1.1.0              AnnotationDbi_1.38.2      
##  [99] IRanges_2.10.5             cluster_2.0.6