Here we document what model objects may be used with emmeans, and some special features of some of them that may be accessed by passing additional arguments through ref_grid
or emmeans()
.
Certain objects are affected by optional arguments to functions that construct emmGrid
objects, including ref_grid()
, emmeans()
, emtrends()
, and emmip()
. When “arguments” are mentioned in the subsequent quick reference and object-by-object documentation, we are talking about arguments in these constructors.
Additional models can be supported by writing appropriate recover_data
and emm_basis
methods. See the package documentation for extending-emmeans
and vignette("extending")
for details.
Here is an alphabetical list of model classes that are supported, and the arguments that apply. Detailed documentation follows, with objects grouped by the code in the “Group” column. Scroll down or follow the links to those groups for more information.
Object.class | Package | Group | Arguments / notes |
---|---|---|---|
aov | stats | A | |
aovList | stats | V | Best with balanced designs, orthogonal coding |
betareg | betareg | B | mode = c("link", "precision", "phi.link", |
"variance", "quantile") |
|||
carbayes | CARBayes | S | data is required |
clm | ordinal | O | mode = c("latent", "linear.predictor", "cum.prob", |
"exc.prob", "prob", "mean.class", "scale") |
|||
clmm | ordinal | O | Like clm but no "scale" mode |
coxme | coxme | G | |
coxph | survival | G | |
gam | gam | G | nboot = 800 |
gamlss | gamlss | H | what = c("mu", "sigma", "nu", "tau") |
gee | gee | E | vcov.method = c("naive", "robust") |
geeglm | geepack | E | vcov.method = c("vbeta", "vbeta.naiv", "vbeta.j1s", |
"vbeta.fij", "robust", "naive") or a matrix |
|||
geese | geepack | E | Like geeglm |
glm | stats | G | |
glm.nb | MASS | G | Requires data argument |
glmmadmb | glmmADMB | G | |
glmerMod | lme4 | G | |
glmmPQL | MASS | G | inherits lm support |
gls | nlme | G | |
hurdle | pscl | C | mode = c("response", "count", "zero", "prob0"), |
lin.pred = c(FALSE, TRUE) |
|||
lm | stats | A | Several other classes inherit from this and may be supported |
lme | nlme | A | sigmaAdjust = c(TRUE, FALSE) |
lmerMod | lme4 | L | lmer.df = c("kenward-roger", "satterthwaite", "asymptotic") , |
pbkrtest.limit = 3000 , disable.pbkrtest = FALSE . |
|||
emm_options(lmer.df =, pbkrtest.limit =, disable.pbkrtest =) |
|||
manova | stats | M | mult.name , mult.levs |
maov | stats | M | mult.name , mult.levs |
mcmc | mcmc | S | May require formula , data |
MCMCglmm | MCMCglmm | M,S | mult.name , mult.levs |
data is required |
|||
mixed | afex | P | Supported in afex package |
mlm | stats | M | mult.name , mult.levs |
multinom | nnet | N | mode = c("prob", "latent") |
Always include response in specs for emmeans() |
|||
nauf | P | Supported in nauf package | |
nlme | nlme | A | Supports fixed part. Requires param |
polr | MASS | O | mode = c("latent", "linear.predictor", "cum.prob", |
"exc.prob", "prob", "mean.class") |
|||
rlm | MASS | A | inherits lm support |
rms | rms | O | mode = ("middle", "latent", "linear.predictor", |
"cum.prob", "exc.prob", "prob", "mean.class") |
|||
rsm | rsm | P | Supported in rsm package |
stanreg | rstanarm | S | Args for stanreg_ xxx similar to those for xxx |
survreg | survival | A | |
zeroinfl | pscl | C | mode = c("response", "count", "zero", "prob0") , |
lin.pred = c(FALSE, TRUE) |
Models in this group, such as lm
, do not have unusual features that need special support; hence no extra arguments are needed.
The additional mode
argument for betareg
objects has possible values of "response"
, "link"
, "precision"
, "phi.link"
, "variance"
, and "quantile"
, which have the same meaning as the type
argument in predict.betareg
– with the addition that "phi.link"
is like "link"
, but for the precision portion of the model. When mode = "quantile"
is specified, the additional argument quantile
(a numeric scalar or vector) specifies which quantile(s) to compute; the default is 0.5 (the median). Also in "quantile"
mode, an additional variable quantile
is added to the reference grid, and its levels are the values supplied.
Two optional arguments – mode
and lin.pred
– are provided. The mode
argument has possible values "response"
(the default), "count"
, "zero"
, or "prob0"
. lin.pred
is logical and defaults to FALSE
.
With lin.pred = FALSE
, the results are comparable to those returned by predict(..., type = "response")
, predict(..., type = "count")
, predict(..., type = "zero")
, or predict(..., type = "prob")[, 1]
. See the documentation for predict.hurdle
and predict.zeroinfl
.
The option lin.pred = TRUE
only applies to mode = "count"
and mode = "zero"
. The results returned are on the linear-predictor scale, with the same transformation as the link function in that part of the model. The predictions for a reference grid with mode = "count"
, lin.pred = TRUE
, and type = "response"
will be the same as those obtained with lin.pred = FALSE
and mode = "count"
; however, any EMMs derived from these grids will be different, because the averaging is done on the log-count scale and the actual count scale, respectively – thereby producing geometric means versus arithmetic means of the predictions.
If the vcov.
argument is used (see details in the documentation for ref_grid
), it must yield a matrix of the same size as would be obtained using vcov.hurdle
or vcov.zeroinfl
with its model
argument set to ("full", "count", "zero")
in respective correspondence with mode
of ("mean", "count", "zero")
. If vcov.
is a function, it must support the model
argument.
These models all have more than one covariance estimate available, and it may be selected by supplying a string as the vcov.method
argument. It is partially matched with the available choices shown in the quick reference. In geese
and geeglm
, the aliases "robust"
(for "vbeta"
) and "naive"
(for "vbeta.naiv"
are also accepted.
If a matrix or function is supplied as vcov.method
, it is interpreted as a vcov.
specification as described for ...
in the documentation for ref_grid
.
Models in this group receive only standard support as in Group A, but typically the tests and confidence intervals are asymptotic. Thus the df
column for tabular results will be Inf
.
In the case of gam::gam
objects, there is an optional nboot
argument that sets the number of bootstrap replications used to estimate the variances and covariances of the smoothing portions of the model.
gamlss
modelsThe what
argument has possible values of "mu"
(default), "sigma"
, "nu"
, or "tau"
depending on which part of the model you want results for. Currently, there is no support when the selected part of the model contains a smoothing method like pb()
.
lmerMod
modelsThere is an optional lmer.df
argument that defaults to get_EMM_option("lmer.df")
(which in turn defaults to "kenward-roger"
). The possible values are "kenward-roger"
, "satterthwaite"
, and "asymptotic"
(these are partially matched and case-insensitive). With "kenward-roger"
, d.f. are obtained using code from the pbkrtest package, if installed. With "satterthwaite"
, d.f. are obtained using code from the lmerTest package, if installed. With "asymptotic"
, or if the needed package is not installed, d.f. are set to Inf
. (For backward compatibility, the user may specify mode
in lieu of lmer.df
.)
A by-product of the Kenward-Roger method is that the covariance matrix is adjusted using pbkrtest::vcovAdj()
. This can require considerable computation; so to avoid that overhead, the user should opt for the Satterthwaite or asymptotic method; or, for backward compatibility, may disable the use of pbkrtest via emm_options(disable.pbkrtest = TRUE)
(this does not disable the pbkrtest package entirely, just its use in emmeans). The computation time required depends roughly on the number of observations, N, in the design matrix (because a major part of the computation involves inverting an N x N matrix). Thus, pbkrtest is automatically disabled if N exceeds the value of get_emm_option("pbkrtest.limit")
, for which the factory default is 3000.
Similarly to the above, the disable.lmerTest
and lmerTest.limit
options affect whether Satterthwaite methods can be implemented.
The df
argument may be used to specify some other degrees of freedom. Note that if df
and method = "satterthwaite"
are both specified, the covariance matrix is adjusted but the K-R degrees of freedom are not used.
Finally, note that a user-specified covariance matrix (via the vcov.
argument) will also disable the Kenward-Roger method; in that case, the Satterthwaite method is used in place of Kenward-Roger.
When there is a multivariate response, the different responses are treated as if they were levels of a factor – named rep.meas
by default. The mult.name
argument may be used to change this name. The mult.levs
argument may specify a named list of one or more sets of levels. If this has more than one element, then the multivariate levels are expressed as combinations of the named factor levels via the function base::expand.grid
.
The reference grid includes a pseudo-factor with the same name and levels as the multinomial response. There is an optional mode
argument which should match "prob"
or "latent"
. With mode = "prob"
, the reference-grid predictions consist of the estimated multinomial probabilities. The "latent"
mode returns the linear predictor, recentered so that it averages to zero over the levels of the response variable (similar to sum-to-zero contrasts). Thus each latent variable can be regarded as the log probability at that level minus the average log probability over all levels.
There are two optional arguments: mode
and rescale
(which defaults to c(0, 1)
).
Please note that, because the probabilities sum to 1 (and the latent values sum to 0) over the multivariate-response levels, all sensible results from emmeans()
must involve that response as one of the factors. For example, if resp
is a response with k levels, emmeans(model, ~ resp | trt)
will yield the estimated multinomial distribution for each trt
; but emmeans(model, ~ trt)
will just yield the average probability of 1/k for each trt
.
The reference grid for ordinal models will include all variables that appear in the main model as well as those in the scale
or nominal
models (if provided). There are two optional arguments: mode
(a character string) and rescale
(which defaults to c(0, 1)
). mode
should match one of "latent"
(the default), "linear.predictor"
, "cum.prob"
, "exc.prob"
, "prob"
, "mean.class"
, or "scale"
– see the quick reference and note which are supported.
With mode = "latent"
, the reference-grid predictions are made on the scale of the latent variable implied by the model. The scale and location of this latent variable are arbitrary, and may be altered via rescale
. The predictions are multiplied by rescale[2]
, then added to rescale[1]
. Keep in mind that the scaling is related to the link function used in the model; for example, changing from a probit link to a logistic link will inflate the latent values by around \(\pi/\sqrt{3}\), all other things being equal. rescale
has no effect for other values of mode
.
With mode = "linear.predictor"
, mode = "cum.prob"
, and mode = "exc.prob"
, the boundaries between categories (i.e., thresholds) in the ordinal response are included in the reference grid as a pseudo-factor named cut
. The reference-grid predictions are then of the cumulative probabilities at each threshold (for mode = "cum.prob"
), exceedance probabilities (one minus cumulative probabilities, for mode = "exc.prob"
), or the link function thereof (for mode = "linear.predictor"
).
With mode = "prob"
, a pseudo-factor with the same name as the model’s response variable is created, and the grid predictions are of the probabilities of each class of the ordinal response. With "mean.class"
, the returned results are means of the ordinal response, interpreted as a numeric value from 1 to the number of classes, using the "prob"
results as the estimated probability distribution for each case.
With mode = "scale"
, and the fitted object incorporates a scale model, EMMs are obtained for the factors in the scale model (with a log response) instead of the response model. The grid is constructed using only the factors in the scale model.
Any grid point that is non-estimable by either the location or the scale model (if present) is set to NA
, and any EMMs involving such a grid point will also be non-estimable. A consequence of this is that if there is a rank-deficient scale
model, then all latent responses become non-estimable because the predictions are made using the average log-scale estimate.
rms
models have an additional mode
. With mode = "middle"
(this is the default), the middle intercept is used, comparable to the default for rms::Predict()
. This is quite similar in concept to mode = "latent"
, where all intercepts are averaged together.
Models in this group have their emmeans support provided by the package that implements the model-fitting procedure. Users should refer to the package documentation for details on emmeans support.
Models fitted using MCMC methods contain a sample from the posterior distribution of fixed-effect coefficients. In some cases (e.g., results of MCMCpack::MCMCregress()
and MCMCpack::MCMCpoisson()
), the object may include a "call"
attribute that emmeans()
can use to reconstruct the data and obtain a basis for the EMMs. If not, a formula
and data
argument are provided that may help produce the right results. In addition, the contrasts
specifications are not necessarily recoverable from the object, so the system default must match what was actually used in fitting the model.
The summary.emmGrid()
, test.emmGrid()
, etc. methods provide frequentist analyses of the results based on the posterior means and covariances. However, an as.mcmc()
method is provided that creates an mcmc
object that can be summarized or plotted using the coda package. It provides a posterior sample of EMMs for the given reference grid, based on the posterior sample of the fixed effects from the model object.
aovList
objectsSupport for these objects is limited. To avoid strong biases in the predictions, the contrasts
attribute of all factors should be of a type that sums to zero – for example, "contr.sum"
, "contr.poly"
, or "contr.helmert"
but not "contr.treatment"
. Only intra-block estimates of covariances are used. That is, if a factor appears in more than one error stratum, only the covariance structure from its lowest stratum is used in estimating standard errors. Degrees of freedom are obtained using the Satterthwaite method. In general, aovList
support is best with balanced designs, with due caution in the use of contrasts. If a vcov.
argument is supplied, it must yield a single covariance matrix for the unique fixed effects (not a set of them for each error stratum). In that case, the degrees of freedom are set to NA
.