Models in which predictors interact seem to create a lot of confusion concerning what kinds of *post hoc* methods should be used. It is hoped that this vignette will be helpful in shedding some light on how to use the **emmeans** package effectively in such situations.

As an example for this topic, consider the `auto.noise`

dataset included with the package. This is a balanced 3x2x2 experiment with three replications. The response – noise level – is evaluated with different sizes of cars, types of anti-pollution filters, on each side of the car being measured.^{1}

Let’s fit a model and obtain the ANOVA table:

```
noise.lm <- lm(noise ~ size * type * side, data = auto.noise)
anova(noise.lm)
## Analysis of Variance Table
##
## Response: noise
## Df Sum Sq Mean Sq F value Pr(>F)
## size 2 26051.4 13025.7 893.1905 < 2.2e-16
## type 1 1056.3 1056.3 72.4286 1.038e-08
## side 1 0.7 0.7 0.0476 0.8291042
## size:type 2 804.2 402.1 27.5714 6.048e-07
## size:side 2 1293.1 646.5 44.3333 8.730e-09
## type:side 1 17.4 17.4 1.1905 0.2860667
## size:type:side 2 301.4 150.7 10.3333 0.0005791
## Residuals 24 350.0 14.6
```

There are statistically strong 2- and 3-way interactions.

One mistake that a lot of people seem to make is to proceed too hastily to estimating marginal means (even in the face of all these interactions!). They would go straight to analyses like this:

```
emmeans(noise.lm, pairwise ~ size)
## NOTE: Results may be misleading due to involvement in interactions
## $emmeans
## size emmean SE df lower.CL upper.CL
## S 824.1667 1.102396 24 821.8914 826.4419
## M 833.7500 1.102396 24 831.4748 836.0252
## L 772.5000 1.102396 24 770.2248 774.7752
##
## Results are averaged over the levels of: type, side
## Confidence level used: 0.95
##
## $contrasts
## contrast estimate SE df t.ratio p.value
## S - M -9.583333 1.559024 24 -6.147 <.0001
## S - L 51.666667 1.559024 24 33.140 <.0001
## M - L 61.250000 1.559024 24 39.287 <.0001
##
## Results are averaged over the levels of: type, side
## P value adjustment: tukey method for comparing a family of 3 estimates
```

The hasty analyst would thus conclude that the noise level is significantly higher for medium-sized cars than for small or large ones.

But as is seen in the message before the output, `emmeans()`

valiantly tries to warn you that it may not be a good idea to average over factors that interact with the factor of interest. It isn’t always a bad idea to do this, but sometimes it definitely is.

What about this time? I think a good first step is always to try to visualize the nature of the interactions before doing any statistical comparisons. The following plot helps.

`emmip(noise.lm, type ~ size | side)`

Examining this plot, we see that the “medium” mean is not always higher; so the marginal means, and the way they compare, does not represent what is always the case. Moreover, what is evident in the plot is that the peak for medium-size cars occurs for only one of the two filter types. So it seems more useful to do the comparisons of size separately for each filter type. This is easily done, simply by conditioning on `type`

:

```
emm_s.t <- emmeans(noise.lm, pairwise ~ size | type)
## NOTE: Results may be misleading due to involvement in interactions
emm_s.t
## $emmeans
## type = Std:
## size emmean SE df lower.CL upper.CL
## S 825.8333 1.559024 24 822.6157 829.0510
## M 845.8333 1.559024 24 842.6157 849.0510
## L 775.0000 1.559024 24 771.7823 778.2177
##
## type = Octel:
## size emmean SE df lower.CL upper.CL
## S 822.5000 1.559024 24 819.2823 825.7177
## M 821.6667 1.559024 24 818.4490 824.8843
## L 770.0000 1.559024 24 766.7823 773.2177
##
## Results are averaged over the levels of: side
## Confidence level used: 0.95
##
## $contrasts
## type = Std:
## contrast estimate SE df t.ratio p.value
## S - M -20.0000000 2.204793 24 -9.071 <.0001
## S - L 50.8333333 2.204793 24 23.056 <.0001
## M - L 70.8333333 2.204793 24 32.127 <.0001
##
## type = Octel:
## contrast estimate SE df t.ratio p.value
## S - M 0.8333333 2.204793 24 0.378 0.9245
## S - L 52.5000000 2.204793 24 23.812 <.0001
## M - L 51.6666667 2.204793 24 23.434 <.0001
##
## Results are averaged over the levels of: side
## P value adjustment: tukey method for comparing a family of 3 estimates
```

Not too surprisingly, the statistical comparisons are all different for standard filters, but with Octel filters, there isn’t much of a difference between small and medium size.

For comparing the levels of other factors, similar judgments must be made. It may help to construct other interaction plots with the factors in different roles. In my opinion, almost all meaningful statistical analysis should be grounded in evaluating *both* the statistical and practical significance of the estimated effects. Those who put all their attention on *P* values and asterisks are ignoring the fact that these don’t measure the sizes of the effects on a practical scale.^{2} An effect can be practically negligible but statistically significant, or vice versa, depending on sample size and error variance. Failure to describe what is actually going on in the data is a failure to do an adequate analysis. Use lots of plots, and think.

An interaction contrast is a contrast of contrasts. For instance, in the auto-noise example, we may want to obtain the linear and quadratic contrasts of `size`

separately for each `type`

, and compare them. Here are estimates of those contrasts:

```
contrast(emm_s.t[[1]], "poly") ## 'by = "type"' already in previous result
## type = Std:
## contrast estimate SE df t.ratio p.value
## linear -50.83333 2.204793 24 -23.056 <.0001
## quadratic -90.83333 3.818813 24 -23.786 <.0001
##
## type = Octel:
## contrast estimate SE df t.ratio p.value
## linear -52.50000 2.204793 24 -23.812 <.0001
## quadratic -50.83333 3.818813 24 -13.311 <.0001
##
## Results are averaged over the levels of: side
```

The comparison of these contrasts may be done using the `interaction`

argument in `contrast()`

as follows:

```
IC_st <- contrast(emm_s.t[[1]], interaction = c("poly", "consec"))
IC_st
## type = Std:
## size_poly estimate SE df t.ratio p.value
## linear -50.83333 2.204793 24 -23.056 <.0001
## quadratic -90.83333 3.818813 24 -23.786 <.0001
##
## type = Octel:
## size_poly estimate SE df t.ratio p.value
## linear -52.50000 2.204793 24 -23.812 <.0001
## quadratic -50.83333 3.818813 24 -13.311 <.0001
##
## Results are averaged over the levels of: side
```

The practical meaning of this is that there isn’t a statistical difference in the linear trends, but the quadratic trend for Octel is greater than for standard filter types. (Both quadratic trends are negative, so in fact it is the standard filters that have more pronounced *downward* curvature, as is seen in the plot.) In case you need to understand more clearly what contrasts are being estimated, the `coef()`

method helps:

```
coef(IC_st)
## size type c.1 c.2 c.3 c.4
## 1 S Std -1 1 0 0
## 2 M Std 0 -2 0 0
## 3 L Std 1 1 0 0
## 4 S Octel 0 0 -1 1
## 5 M Octel 0 0 0 -2
## 6 L Octel 0 0 1 1
```

Note that the 4th through 6th contrast coefficients are the negatives of the 1st through 3rd – thus a comparison of two contrasts.

By the way, “type III” tests of interaction effects can be obtained via interaction contrasts:

```
test(IC_st, joint = TRUE)
## type df1 df2 F.ratio p.value
## Std 2 24 548.667 <.0001
## Octel 2 24 372.095 <.0001
```

This result is exactly the same as the *F* test of `size:type`

in the `anova`

output.

The three-way interaction may be explored via interaction contrasts too:

```
contrast(emmeans(noise.lm, ~ size*type*side),
interaction = c("poly", "consec", "consec"))
## size_poly type_consec side_consec estimate SE df t.ratio p.value
## linear Octel - Std R - L -26.66667 6.236096 24 -4.276 0.0003
## quadratic Octel - Std R - L -16.66667 10.801234 24 -1.543 0.1359
```

One interpretation of this is that the comparison by `type`

of the linear contrasts for `size`

is different on the left side than on the right side; but the comparison of that comparison of the quadratic contrasts, not so much. Refer again to the plot, and this can be discerned as a comparison of the interaction in the left panel versus the interaction in the right panel.

Finally, **emmeans** provides a `joint_tests()`

function that obtains and tests the interaction contrasts for all effects in the model and compiles them in one Type-III-ANOVA-like table:

```
joint_tests(noise.lm)
## model term df1 df2 F.ratio p.value
## size 2 24 893.190 <.0001
## type 1 24 72.429 <.0001
## side 1 24 0.048 0.8291
## size:type 2 24 27.571 <.0001
## size:side 2 24 44.333 <.0001
## type:side 1 24 1.190 0.2861
## size:type:side 2 24 10.333 0.0006
```

You may even add `by`

variable(s) to obtain separate ANOVA tables for the remaining factors:

```
joint_tests(noise.lm, by = "side")
## side = L:
## model term df1 df2 F.ratio p.value
## size 2 24 651.714 <.0001
## type 1 24 46.095 <.0001
## size:type 2 24 23.524 <.0001
##
## side = R:
## model term df1 df2 F.ratio p.value
## size 2 24 285.810 <.0001
## type 1 24 27.524 <.0001
## size:type 2 24 14.381 0.0001
```

Consider the `oranges`

dataset included with the package. These data concern the sales of two varieties of oranges. The prices (`price1`

and `price2`

) were experimentally varied in different stores and different days, and the responses `sales1`

and `sales2`

were observed. Let’s consider three multivariate models for these data, with additive effects for days and stores, and different levels of fitting on the prices:

```
org.quad <- lm(cbind(sales1, sales2) ~ poly(price1, price2, degree = 2)
+ day + store, data = oranges)
org.int <- lm(cbind(sales1, sales2) ~ price1 * price2 + day + store, data = oranges)
org.add <- lm(cbind(sales1, sales2) ~ price1 + price2 + day + store, data = oranges)
```

Being a multivariate model, **emmeans** methods will distinguish the responses as if they were levels of a factor, which we will name “variety”. An interesting way to view these models is to look at how they predict sales of each variety at each observed values of the prices:

`emmip(org.quad, price2 ~ price1 | variety, mult.name = "variety", cov.reduce = FALSE)`

The trends portrayed here are quite sensible: In the left panel, as we increase the price of variety 1, sales of that variety will tend to decrease – and the decrease will be faster when the other variety of oranges is low-priced. In the right panel, as price of variety 1 increases, sales of variety 2 will increase when it is low-priced, but could decrease also at high prices because oranges in general are just too expensive. A plot like this for `org.int`

will be similar but all the curves will be straight lines; and the one for `plot.add`

will have all lines parallel. In all models, though, there are implied `price1:variety`

and `price2:variety`

interactions, because we have different regression coefficients for the two responses.

Which model should we use? They are nested models, so they can be compared by `anova()`

:

```
anova(org.quad, org.int, org.add)
## Analysis of Variance Table
##
## Model 1: cbind(sales1, sales2) ~ poly(price1, price2, degree = 2) + day +
## store
## Model 2: cbind(sales1, sales2) ~ price1 * price2 + day + store
## Model 3: cbind(sales1, sales2) ~ price1 + price2 + day + store
## Res.Df Df Gen.var. Pillai approx F num Df den Df Pr(>F)
## 1 20 22.798
## 2 22 2 21.543 0.074438 0.38658 4 40 0.8169
## 3 23 1 23.133 0.218004 2.64840 2 19 0.0967
```

It seems like the full-quadratic model has little advantage over the interaction model. Strict .05-significance people would, I suppose, settle on the additive model. I like `org.int`

, but what follows could be done with any of them.

To summarize and test the results compactly, it makes sense to obtain estimates of a representative trend in each of the left and right panels, and perhaps to compare them. In turn, that can be done by obtaining the slope of the curve (or line) at the average value of `price2`

. The `emtrends()`

function is designed for exactly this kind of purpose. It uses a difference quotient to estimate the slope of a line fitted to a given variable. It works just like `emmeans()`

except for requiring the variable to use in the difference quotient. Using the `org.int`

model:

```
emtrends(org.int, pairwise ~ variety, var = "price1", mult.name = "variety")
## $emtrends
## variety price1.trend SE df lower.CL upper.CL
## sales1 -0.7489343 0.1713333 22 -1.104258 -0.3936107
## sales2 0.1376322 0.2140952 22 -0.306374 0.5816384
##
## Results are averaged over the levels of: day, store
## Confidence level used: 0.95
##
## $contrasts
## contrast estimate SE df t.ratio p.value
## sales1 - sales2 -0.8865665 0.2402569 22 -3.69 0.0013
##
## Results are averaged over the levels of: day, store
```

From this, we can say that, starting with `price1`

and `price2`

both at their average values, we expect `sales1`

to decrease by about .75 per unit increase in `price1`

(statistically significant since the confidence interval excludes zero); meanwhile, a slight increase (but not significant) of `sales2`

may occur. Marginally, the first variety has a .89 disadvantage relative to sales of the second variety.

Other analyses (not shown) with `price2`

set at a higher value will reduce these effects, while setting `price2`

lower will exaggerate all these effects. If the same analysis is done with the quadratic model, the the trends are curved, and so the results will depend somewhat on the setting for `price1`

. The graph above gives an indication of the nature of those changes.

Similar results hold when we analyze the trends for `price2`

:

```
emtrends(org.int, pairwise ~ variety, var = "price2", mult.name = "variety")
## $emtrends
## variety price2.trend SE df lower.CL upper.CL
## sales1 0.1718250 0.1023334 22 -0.04040137 0.3840514
## sales2 -0.7447521 0.1278740 22 -1.00994656 -0.4795577
##
## Results are averaged over the levels of: day, store
## Confidence level used: 0.95
##
## $contrasts
## contrast estimate SE df t.ratio p.value
## sales1 - sales2 0.9165772 0.1434998 22 6.387 <.0001
##
## Results are averaged over the levels of: day, store
```

At the averages, increasing the price of variety 2 has the effect of decreasing sales of variety 2 while slightly increasing sales of variety 1 – a marginal difference of about .92.

Interaction, by nature, make things more complicated. One must resist pressures and inclinations to try to produce simple bottom-line conclusions. Interactions require more work and more patience; they require presenting more cases – more than are presented in the examples in this vignette – in order to provide a complete picture.

I sure wish I could ask some questions about how how these data were collected; for example, are these independent experimental runs, or are some cars measured more than once? The model is based on the independence assumption, but I have my doubts.↩

You may have noticed that there are no asterisks in the ANOVA table in this vignette. I habitually opt out of star-gazing via

`options(show.signif.stars = FALSE)`

↩