Comparisons and contrasts in emmeans

Russ Lenth

2018-07-17

Contents

This vignette covers techniques for comparing EMMs at levels of a factor predictor, and other related analyses.

  1. Pairwise comparisons
  2. Other contrasts
  3. Formula interface
  4. Custom contrasts and linear functions
  5. Special behavior with log transformations
  6. Interaction contrasts (see “interactions” vignette)

Index of all vignette topics

Pairwise comparisons

The most common follow-up analysis for models having factors as predictors is to compare the EMMs with one another. This may be done simply via the pairs() method for emmGrid objects. In the code below, we obtain the EMMs for source for the pigs data, and then compare the sources pairwise.

pigs.lm <- lm(log(conc) ~ source + factor(percent), data = pigs)
pigs.emm.s <- emmeans(pigs.lm, "source")
pairs(pigs.emm.s)
##  contrast      estimate         SE df t.ratio p.value
##  fish - soy  -0.2727678 0.05293450 23  -5.153  0.0001
##  fish - skim -0.4022777 0.05415929 23  -7.428  <.0001
##  soy - skim  -0.1295098 0.05304280 23  -2.442  0.0570
## 
## Results are averaged over the levels of: percent 
## Results are given on the log (not the response) scale. 
## P value adjustment: tukey method for comparing a family of 3 estimates

In its out-of-the-box configuration, pairs() sets two defaults for summary(): adjust = "tukey" (multiplicity adjustment), and infer = c(FALSE, TRUE) (test statistics, not confidence intervals). You may override these, of course, by calling summary() on the result with different values for these.

In the example above, EMMs for later factor levels are subtracted from those for earlier levels; if you want the comparisons to go in the other direction, use pairs(pigs.emm.s, reverse = TRUE). Also, in multi-factor situations, you may specify by factor(s) to perform the comparisons separately at the levels of those factors.

Comparisons may be summarized graphically via the comparisons argument in plot.emm():

plot(pigs.emm.s, comparisons = TRUE)

The blue bars are confidence intervals for the EMMs, and the red arrows are for the comparisons among them. If an arrow from one mean overlaps an arrow from another group, the difference is not significant, based on the adjust setting (which defaults to "tukey"). (Note: Don’t ever use confidence intervals for EMMs to perform comparisons; they can be very misleading.)

Another way to depict comparisons is by compact-letter displays:

CLD(pigs.emm.s)
##  source   emmean         SE df lower.CL upper.CL .group
##  fish   3.394492 0.03668122 23 3.318612 3.470373  1    
##  soy    3.667260 0.03744798 23 3.589793 3.744727   2   
##  skim   3.796770 0.03938283 23 3.715300 3.878240   2   
## 
## Results are averaged over the levels of: percent 
## Results are given on the log (not the response) scale. 
## Confidence level used: 0.95 
## P value adjustment: tukey method for comparing a family of 3 estimates 
## significance level used: alpha = 0.05

Two EMMs sharing one or more grouping symbols are not significantly different. I really don’t recommend this method, though, as it imposes a stark difference between P values slightly less and slightly more than alpha.

Note: The multcomp package provides a very similar function named cld(). A corresponding method is provided for emmGrid objects; thus, if package:multcomp is attached, you may call cld() and obtain the same results as with CLD().

Back to Contents

Other contrasts

Pairwise comparisons are an example of linear functions of EMMs. You may use coef() to see the coefficients of these linear functions:

coef(pairs(pigs.emm.s))
##      source c.1 c.2 c.3
## fish   fish   1   1   0
## soy     soy  -1   0   1
## skim   skim   0  -1  -1

The pairwise comparisons correspond to columns of the above results. For example, the first pairwise comparison, fish - soy, gives coefficients of 1, -1, and 0 to fish, soy, and skim, respectively. In cases, such as this one, where each column of coefficients sums to zero, the linear functions are termed contrasts

The contrast() function provides for general contrasts (and linear functions, as well) of factor levels. Its second argument, method, is used to specify what method is to be used. In this section we describe the built-in ones, where we simply provide the name of the built-in method. Consider, for example, the factor percent in the model pigs.lm . It is treated as a factor in the model, but it corresponds to equally-spaced values of a numeric variable. In such cases, users often want to compute orthogonal polynomial contrasts:

pigs.emm.p <- emmeans(pigs.lm, "percent")
ply <- contrast(pigs.emm.p, "poly")
ply
##  contrast     estimate         SE df t.ratio p.value
##  linear     0.93739228 0.21055597 23   4.452  0.0002
##  quadratic -0.09710425 0.08832632 23  -1.099  0.2830
##  cubic      0.18631573 0.18773559 23   0.992  0.3313
## 
## Results are averaged over the levels of: source 
## Results are given on the log (not the response) scale.
coef(ply)
##    percent c.1 c.2 c.3
## 9        9  -3   1  -1
## 12      12  -1  -1   3
## 15      15   1  -1  -3
## 18      18   3   1   1

We obtain tests for the linear, quadratic, and cubic trends. The coefficients are those that can be found in tables in many experimental-design texts. It is important to understand that the estimated linear contrast is not the slope of a line fitted to the data. It is simply a contrast having coefficients that increase linearly. It does test the linear trend, however.

There are a number of other named contrast methods, for example "trt.vs.ctrl", "eff", and "consec". The "pairwise" and "revpairwise" methods in contrast() are the same as Pairs() and pairs(..., reverse = TRUE). See help(“contrast-methods”) for details.

Back to Contents

Formula interface

If you already know what contrasts you will want before calling emmeans(), a quick way to get them is to specify the method as the left-hand side of the formula in its second argument. For example, with the oranges dataset provided in the package,

org.aov <- aov(sales1 ~ day + Error(store), data = oranges,
               contrasts = list(day = "contr.sum"))
org.emml <- emmeans(org.aov, consec ~ day)
org.emml
## $emmeans
##  day    emmean       SE    df lower.CL upper.CL
##  1    7.872750 2.772162 29.24 2.205099 13.54040
##  2    7.100600 2.772162 29.24 1.432949 12.76825
##  3   13.758600 2.772162 29.24 8.090949 19.42625
##  4    8.042467 2.772162 29.24 2.374815 13.71012
##  5   12.924600 2.772162 29.24 7.256949 18.59225
##  6   11.603650 2.772162 29.24 5.935999 17.27130
## 
## Confidence level used: 0.95 
## 
## $contrasts
##  contrast  estimate       SE df t.ratio p.value
##  2 - 1    -0.772150 3.776875 25  -0.204  0.9997
##  3 - 2     6.658000 3.776875 25   1.763  0.3246
##  4 - 3    -5.716133 3.776875 25  -1.513  0.4681
##  5 - 4     4.882133 3.776875 25   1.293  0.6125
##  6 - 5    -1.320950 3.776875 25  -0.350  0.9965
## 
## P value adjustment: mvt method for 5 tests

The contrasts shown are the day-to-day changes.

This two-sided formula technique is quite convenient, but it can also create confusion. For one thing, the result is not an emmGrid object anymore; it is a list of emmGrid objects, called an emm_list. You may need to be cognizant of that if you are to do further contrasts or other analyzes. For example if you want "eff" contrasts as well, you need to do contrast(org.emml[[1]], "eff") or contrast(org.emml, "eff", which = 1).

Another issue is that it may be unclear which part of the results is affected by certain options. For example, if you were to add adjust = "bonf" to the org.emm call above, would the Bonferroni adjustment be applied to the EMMs, or to the contrasts? (See the documentation if interested; but the best practice is to avoid such dilemmas.)

Back to Contents

Custom contrasts and linear functions

The user may write a custom contrast function for use in contrast(). What’s needed is a function having the desired name with ".emmc" appended, that generates the needed coefficients as a list or data frame. The function should take a vector of levels as its first argument, and any optional parameters as additional arguments. For example, suppose we want to compare every third level of a treatment. The following function provides for this:

skip_comp.emmc <- function(levels, skip = 1, reverse = FALSE) {
    if((k <- length(levels)) < skip + 1)
        stop("Need at least ", skip + 1, " levels")
    coef <- data.frame()
    coef <- as.data.frame(lapply(seq_len(k - skip - 1), function(i) {
        sgn <- ifelse(reverse, -1, 1)
        sgn * c(rep(0, i - 1), 1, rep(0, skip), -1, rep(0, k - i - skip - 1))
    }))
    names(coef) <- sapply(coef, function(x)
        paste(which(x == 1), "-", which(x == -1)))
    attr(coef, "adjust") = "fdr"   # default adjustment method
    coef
}

To test it, try 5 levels:

skip_comp.emmc(1:5)
##   1 - 3 2 - 4 3 - 5
## 1     1     0     0
## 2     0     1     0
## 3    -1     0     1
## 4     0    -1     0
## 5     0     0    -1
skip_comp.emmc(1:5, skip = 0, reverse = TRUE)
##   2 - 1 3 - 2 4 - 3 5 - 4
## 1    -1     0     0     0
## 2     1    -1     0     0
## 3     0     1    -1     0
## 4     0     0     1    -1
## 5     0     0     0     1

(The latter is the same as "consec" contrasts.) Now try it with the oranges example we had previously:

contrast(org.emml[[1]], "skip_comp", skip = 2, reverse = TRUE)
##  contrast   estimate       SE df t.ratio p.value
##  4 - 1     0.1697167 3.776875 25   0.045  0.9645
##  5 - 2     5.8240000 3.776875 25   1.542  0.4069
##  6 - 3    -2.1549500 3.776875 25  -0.571  0.8601
## 
## P value adjustment: fdr method for 3 tests

The contrast() function may in fact be used to compute arbitrary linear functions of EMMs. Suppose for some reason we want to estimate the quantities \(\lambda_1 = \mu_1+2\mu_2-7\) and \(\lambda_2 = 3\mu_2-2\mu_3+1\), where the \(\mu_j\) are the population values of the source EMMs in the pigs example. This may be done by providing the coefficients in a list, and the added constants in the offset argument:

LF <- contrast(pigs.emm.s, 
               list(lambda1 = c(1, 2, 0), lambda2 = c(0, 3, -2)),
               offset = c(-7, 1))
confint(LF, adjust = "bonferroni")
##  contrast estimate         SE df lower.CL upper.CL
##  lambda1  3.729013 0.08274372 23 3.530604 3.927422
##  lambda2  4.408241 0.13411290 23 4.086655 4.729827
## 
## Results are averaged over the levels of: percent 
## Confidence level used: 0.95 
## Conf-level adjustment: bonferroni method for 2 estimates

Back to Contents

Special properties of log (and logit) transformations

Suppose we obtain EMMs for a model having a response transformation or link function. In most cases, when we compute contrasts of those EMMs, there is no natural way to express those contrasts on anything other than the transformed scale. For example, in a model fitted using glm() with the gamma() family, the default link function is the inverse. Predictions on such a model are estimates of \(1/\mu_j\) for various \(j\). Comparisons of predictions will be estimates of \(1/\mu_j - 1/\mu_{k}\) for \(j \ne k\). There is no natural way to back-transform these differences to some other interpretable scale.

However, logs are an exception, in that \(\log\mu_j - \log\mu_k = \log(\mu_j/\mu_k)\). Accordingly, when contrast() (or pairs()) notices that the response is on the log scale, it back-transforms contrasts to ratios when results are to be of response type. For example:

pairs(pigs.emm.s, type = "lp")
##  contrast      estimate         SE df t.ratio p.value
##  fish - soy  -0.2727678 0.05293450 23  -5.153  0.0001
##  fish - skim -0.4022777 0.05415929 23  -7.428  <.0001
##  soy - skim  -0.1295098 0.05304280 23  -2.442  0.0570
## 
## Results are averaged over the levels of: percent 
## Results are given on the log (not the response) scale. 
## P value adjustment: tukey method for comparing a family of 3 estimates
pairs(pigs.emm.s, type = "response")
##  contrast        ratio         SE df t.ratio p.value
##  fish / soy  0.7612695 0.04029742 23  -5.153  0.0001
##  fish / skim 0.6687950 0.03622146 23  -7.428  <.0001
##  soy / skim  0.8785259 0.04659947 23  -2.442  0.0570
## 
## Results are averaged over the levels of: percent 
## P value adjustment: tukey method for comparing a family of 3 estimates 
## Tests are performed on the log scale

As is true of EMM summaries with type = "response", the tests and confidence intervals are done before back-transforming. The ratios estimated here are actually ratios of geometric means. In general, a model with a log response is in fact a model for relative effects of any of its linear predictors, and this back-transformation to ratios goes hand-in-hand with that.

In generalized linear models, this behaviors will occur in two common cases: Poisson or count regression, for which the usual link is the log; and logistic regression, because logits are logs of odds ratios.

Back to Contents

Index of all vignette topics