ZIBR (Zero-Inflated Beta Random Effect model)

R-CMD-check codecov Published in Bioinformatics


The longitudinal microbiome compositional data are highly skewed, bounded in [0,1), and often sparse with many zeros. In addition, the observations from repeated measures are correlated. We propose a two-part zero-inflated Beta regression model with random effects (ZIBR) for testing the association between microbial abundance and clinical covariates for longitudinal microbiome data. The model includes a logistic component to model presence/absence of the microbe in samples and a Beta component to model non-zero microbial abundance and each component includes a random effect to take into account the correlation among repeated measurements on the same subject.

The details of the statistical model are as follows:

The ZIBR model combines the logistic regression and Beta regression in one model. Each regression part includes random effects to account for correlations acorss time points. We call these two regressions in ZIBR model as logistic component and Beta component. These two components model two different aspects of the data. The logistic component models presence/absence of the microbe and Beta component models non-zero microbial abundance.

Accordingly, we can test three biologically relevant null hypotheses:
- H0: α_j = 0. This is to test the coefficients in the logistic component, if the covariates are associated with the bacterial taxon by affecting its presence or absence;
- H0: β_j = 0. This is to test the coefficients in the Beta component, if the taxon is associated with the covariates by showing different abundances;
- H0: α_j = 0 and β_j = 0 for each covariate X_j and Z_j. This is to joinly test the coefficients in both logistic and Beta components, if the covariates affect the taxon both in terms of presence/absence and its non-zero abundance.


You can install our ZIBR package from CRAN


Or get the dev version from GitHub


Basic Usage

zibr_fit <- zibr(logistic_cov=logistic_cov,beta_cov=beta_cov,Y=Y,subject_ind=subject_ind,time_ind=time_ind)

The ordering of the samples in the above matrix or vectors must be consistent.

The zibr function will return the following results



Missing values

If there are missing values in certain time points, they can be imputed as following: 1. Calculate the mean or median of values from previous time point(s) and later time points(s). Use such values to replace the missing values. 2. Group the time point with missing values with other time points. For example, if you have T1, T2, T3 and T4 and T1 has missing values, you can group T1 and T2 as one time point.

After the missing values are imputed, the data can be fed into ZIBR.

Other issues

If you have other problems with the package or features/fixes to suggest, please open an issue on the GitHub issues page.


Eric Z. Chen and Hongzhe Li (2016). A two-part mixed effect model for analyzing longitudinal microbiome data. Bioinformatics. Link


Maintained by Charlie Bushman (ctbushman@gmail.com) and the Penn CHOP Microbiome Program.