# Moderating Differential Proportionality

## Introduction

This vignette reviews the $$\theta$$ types and their variant definitions. It also discusses how to calculate an $$F$$-statistic from $$\theta_d$$. In presenting this, we provide details on how to moderate the $$F$$-statistic using voom from the limma package. To keep this vignette tractable yet reproducible, we use a subset of the iris data set as an example.

```r
library(propr)
data(iris)
```
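As a rough illustration of how an $$F$$-statistic follows from $$\theta_d$$, the sketch below uses base R only. It assumes that, for a single feature pair, $$\theta_d$$ is the pooled within-group sum of squares of the log-ratio divided by its total sum of squares, in which case the two-group $$F$$-statistic is $$(N - 2)(1 - \theta_d)/\theta_d$$. The choice of species and of the feature pair is arbitrary and made only for this example.

```r
# Two-group subset of iris (an arbitrary choice for illustration)
keep <- iris$Species %in% c("setosa", "versicolor")
x <- iris[keep, "Sepal.Length"]
y <- iris[keep, "Petal.Length"]
group <- droplevels(iris$Species[keep])

# Log-ratio of one feature pair
lr <- log(x / y)
N <- length(lr)

# theta_d as within-group over total sum of squares (assumed definition)
ss_within <- sum(tapply(lr, group, function(v) sum((v - mean(v))^2)))
ss_total <- sum((lr - mean(lr))^2)
theta_d <- ss_within / ss_total

# Unmoderated F-statistic recovered from theta_d for two groups
Fstat <- (N - 2) * (1 - theta_d) / theta_d

# Cross-check against a one-way ANOVA on the same log-ratio
F_anova <- anova(lm(lr ~ group))[1, "F value"]
all.equal(Fstat, F_anova)
```

Under this assumption, a small $$\theta_d$$ (little within-group variance relative to the total) corresponds to a large $$F$$-statistic.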

## Moderating the F-statistic

Borrowing again from the limma package, we offer a way to calculate a moderated $$F$$-statistic for differential proportionality analysis. The principle behind moderation states that it is possible to “borrow information between genes” (i.e., via a Bayesian hierarchical model) to improve the power of statistical hypothesis testing in the setting of small sample sizes (Smyth 2004). This technique was first developed for measuring the differential expression (DE) of (normally distributed) microarray data, but was subsequently extended to RNA-Seq count data through the use of precision weights (Law 2014). Precision weights, like those described above, can model mean-variance trends in count data to facilitate the analysis of counts as if they were normally distributed (Law 2014).
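To make “borrowing information” slightly more concrete, the hierarchical model of Smyth (2004) replaces each gene's sample variance $$s_g^2$$ (with $$d_g$$ residual degrees of freedom) by a posterior variance that is shrunk toward a prior variance $$s_0^2$$ (with prior degrees of freedom $$d_0$$), where both prior quantities are estimated from all genes at once. The notation below follows Smyth (2004) and is not otherwise used in this vignette:

$$\tilde{s}_g^2 = \frac{d_0 s_0^2 + d_g s_g^2}{d_0 + d_g}$$

A moderated statistic uses $$\tilde{s}_g^2$$ in place of $$s_g^2$$, and its denominator degrees of freedom increase from $$d_g$$ to $$d_0 + d_g$$, which is where the gain in power for small sample sizes comes from.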

To calculate a moderated $$F$$-statistic from $$\theta_d$$, we fit the data to an empirical Bayes model (via limma::eBayes) with underlying mean-variance modeling (via limma::voom). Conventionally, for per-gene (i.e., DE) analysis (and also for VLR weighting), moderation and modeling are done for individual genes. However, for per-ratio (i.e., $$\theta_d$$) analysis, moderation and modeling are done for gene ratios. To apply the per-gene moderation to ratios, we must select a suitable reference, $$z$$, which is used for a kind of normalization of the data. The hierarchical model is then calculated after this normalization is performed. As a consequence, the moderation of the $$F$$-statistic depends on the chosen reference (although the unmoderated $$F$$-statistic does not).
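For orientation, the conventional per-gene workflow that this procedure builds on can be sketched with limma directly. This is not the per-ratio computation performed inside updateF; it only shows how limma::voom and limma::eBayes fit together. The simulated counts, sample names, and group labels are assumptions made for the example.

```r
library(limma)

set.seed(1)
# Hypothetical counts: 100 genes (rows) by 6 samples (columns)
counts <- matrix(rnbinom(600, mu = 50, size = 5), nrow = 100,
                 dimnames = list(paste0("gene", 1:100), paste0("s", 1:6)))
group <- factor(rep(c("A", "B"), each = 3))
design <- model.matrix(~ group)

# Model the mean-variance trend and compute precision weights
v <- voom(counts, design, plot = FALSE)

# Fit per-gene linear models, then moderate the variances (empirical Bayes)
fit <- eBayes(lmFit(v, design))

# For a two-group comparison, the squared moderated t-statistic of the
# group coefficient equals the moderated F-statistic on 1 numerator df
F_mod <- fit$t[, "groupB"]^2
head(F_mod)
```

Squaring the moderated $$t$$-statistic is used here because the `F` component returned by limma::eBayes tests all coefficients of the fit jointly, including the intercept.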

By default, the reference for the $$i$$-th composition (i.e., sample vector) is the geometric mean of the composition itself. Using this reference, our normalization becomes the corresponding log-ratio transformation (i.e., the clr transformation in the default case). However, limma::voom adds pseudo-counts to the normalized counts, and for these to have a similar influence as if applied to the original counts, we must scale our log-ratio transformed data up again. We achieve this by multiplying (the exponential of) the log-ratio transformed data by the mean of the reference over all $$N$$ samples (i.e., the constant $$\sum_{i=1}^{N} \mathbf{z}_i/N$$). It is this product that is used to obtain the hierarchical model parameters. In contrast, the weights themselves are calculated directly from the original counts.
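The normalization and up-scaling described above can be sketched in base R as follows. The small count matrix is hypothetical, and the code only mirrors the description in the text; it is not taken from the propr source.

```r
# Hypothetical count matrix: 4 samples (rows) by 3 features (columns)
counts <- matrix(c(10, 20, 30,
                    5, 50, 45,
                    8, 16, 64,
                   12, 24, 12),
                 nrow = 4, byrow = TRUE)

# Reference z_i: geometric mean of each sample (the clr reference)
z <- exp(rowMeans(log(counts)))

# clr transformation: log of each count relative to its sample's reference
clr <- log(counts) - log(z)

# Up-scale (the exponential of) the clr data by the mean of the reference
scaled <- exp(clr) * mean(z)
```

After this step every sample has the same geometric mean, namely the mean of the reference, so the values stay on a count-like scale before pseudo-counts are added.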

The updateF function calculates the moderated $$F$$-statistic if the argument moderated equals TRUE. Meanwhile, the ivar argument defines the arbitrary feature set to use as the reference. By default, the updateF function uses the geometric mean of all features as the reference (analogous to the clr transformation), although the user may specify any reference as described in ?updateF. This function appends the “Fstat” and “theta_mod” columns to the @theta slot. Note that while per-ratio modeling is used to moderate the $$F$$-statistic, per-gene modeling is still used to calculate a weighted VLR. Although limma::voom is used in both scenarios, it remains possible to moderate the $$F$$-statistic without weighting VLR (and vice versa).

```r
# Moderate the F-statistic for each object, using the clr reference
pd.nn <- updateF(pd.nn, moderated = TRUE, ivar = "clr")
pd.wn <- updateF(pd.wn, moderated = TRUE, ivar = "clr")
pd.na <- updateF(pd.na, moderated = TRUE, ivar = "clr")
pd.wa <- updateF(pd.wa, moderated = TRUE, ivar = "clr")
```

We refer the reader to Erb et al. 2017 for an elaboration of $$F$$-statistic moderation.

## References

1. Erb, Ionas, Thomas Quinn, David Lovell, and Cedric Notredame. “Differential Proportionality - A Normalization-Free Approach To Differential Gene Expression.” bioRxiv, May 5, 2017, 134536. http://dx.doi.org/10.1101/134536.

2. Law, Charity W., Yunshun Chen, Wei Shi, and Gordon K. Smyth. “Voom: Precision Weights Unlock Linear Model Analysis Tools for RNA-Seq Read Counts.” Genome Biology 15 (January 3, 2014): R29. https://doi.org/10.1186/gb-2014-15-2-r29.

3. Smyth, Gordon K. “Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments.” Statistical Applications in Genetics and Molecular Biology 3 (2004): Article3. https://doi.org/10.2202/1544-6115.1027.