Transformations and link functions in emmeans

Russ Lenth

2018-01-09

Contents

This vignette covers the intricacies of transformations and link functions in emmeans.

  1. Overview
  2. Re-gridding
  3. Link functions
  4. Both a response transformation and a link
  5. Special transformations
  6. Specifying a transformation after the fact
  7. Faking a log transformation

Overview

Consider the same example with the pigs dataset that is used in many of these vignettes:

pigs.lm <- lm(log(conc) ~ source + factor(percent), data = pigs)

This model has two factors, source and percent (coerced to a factor), as predictors; and log-transformed conc as the response. Here we obtain the EMMs for source, examine its structure, and finally produce a summary, including a test against a null value of log(35):

pigs.emm.s <- emmeans(pigs.lm, "source")
str(pigs.emm.s)
## 'emmGrid' object with variables:
##     source = fish, soy, skim
## Transformation: "log"

summary(pigs.emm.s, infer = TRUE, null = log(35))
##  source   emmean         SE df lower.CL upper.CL     null t.ratio p.value
##  fish   3.394492 0.03668122 23 3.318612 3.470373 3.555348  -4.385  0.0002
##  soy    3.667260 0.03744798 23 3.589793 3.744727 3.555348   2.988  0.0066
##  skim   3.796770 0.03938283 23 3.715300 3.878240 3.555348   6.130  <.0001
## 
## Results are averaged over the levels of: percent 
## Results are given on the log (not the response) scale. 
## Confidence level used: 0.95

Now suppose that we want the EMMs expressed on the same scale as conc. This can be done by adding type = "response" to the summary() call:

summary(pigs.emm.s, infer = TRUE, null = log(35), type = "response")
##  source response       SE df lower.CL upper.CL null t.ratio p.value
##  fish   29.79952 1.093083 23 27.62197 32.14874   35  -4.385  0.0002
##  soy    39.14451 1.465883 23 36.22658 42.29747   35   2.988  0.0066
##  skim   44.55704 1.754782 23 41.07093 48.33905   35   6.130  <.0001
## 
## Results are averaged over the levels of: percent 
## Confidence level used: 0.95 
## Intervals are back-transformed from the log scale 
## Tests are performed on the log scale

Timing is everything

Dealing with transformations in emmeans is somewhat complex, due to the large number of possibilities. But the key is understanding what happens, when. These results come from a sequence of steps. Here is what happens (and doesn’t happen) at each step:

  1. The reference grid is constructed for the log(conc) model. The fact that a log transformation is used is recorded, but nothing else is done with that information.
  2. The predictions on the reference grid are averaged over the four percent levels, for each source, to obtain the EMMs for source – still on the log(conc) scale.
  3. The standard errors and confidence intervals for these EMMs are computed – still on the log(conc) scale.
  4. Only now do we do back-transformation…
    1. The EMMs are back-transformed to the conc scale.
    2. The endpoints of the confidence intervals are back-transformed.
    3. The t tests and P values are left as-is.
    4. The standard errors are converted to the conc scale using the delta method. These SEs were not used in constructing the tests and confidence intervals.
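
The sequence of steps above can be checked by hand. The following is a minimal sketch in base R, not the code emmeans itself runs: it copies the rounded emmean, SE, and df values from the fish row of the log-scale summary, so the results should agree with the printed tables only to within rounding.

```r
# Values copied (rounded) from the "fish" row of the log-scale summary
emmean <- 3.394492
SE     <- 0.03668122
df     <- 23

# Step 3: test and confidence interval, computed on the log(conc) scale
t.ratio <- (emmean - log(35)) / SE
p.value <- 2 * pt(-abs(t.ratio), df)
ci.log  <- emmean + c(-1, 1) * qt(0.975, df) * SE

# Step 4: back-transform the estimate and the interval endpoints
response <- exp(emmean)
ci.resp  <- exp(ci.log)

# Delta method: SE on the conc scale is |d exp(x)/dx| * SE = exp(emmean) * SE
SE.resp <- response * SE

round(c(response = response, SE = SE.resp,
        lower.CL = ci.resp[1], upper.CL = ci.resp[2],
        t.ratio = t.ratio, p.value = p.value), 4)
```

Note that SE.resp is computed last and never enters the interval or the test, which is why the t ratio and P value are the same on both scales.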

The model is our best guide

This choice of timing is based on the idea that the model is right. In particular, the fact that the response is transformed suggests that the transformed scale is the best scale to be working with. In addition, the model specifies that the effects of source and percent are linear on the transformed scale; inasmuch as marginal averaging to obtain EMMs is a linear operation, that averaging is best done on the transformed scale. For those two good reasons, back-transforming to the response scale is delayed until the very end by default.

Re-gridding

As well-advised as it is, some users may not want the default timing of things. The tool for changing when back-transformation is performed is the regrid() function – which, with default settings of its arguments, back-transforms an emmGrid object and adjusts everything in it appropriately. For example:

str(regrid(pigs.emm.s))
## 'emmGrid' object with variables:
##     source = fish, soy, skim

summary(regrid(pigs.emm.s), infer = TRUE, null = 35)
##  source response       SE df lower.CL upper.CL null t.ratio p.value
##  fish   29.79952 1.093083 23 27.53831 32.06074   35  -4.758  0.0001
##  soy    39.14451 1.465883 23 36.11210 42.17692   35   2.827  0.0096
##  skim   44.55704 1.754782 23 40.92699 48.18708   35   5.446  <.0001
## 
## Results are averaged over the levels of: percent 
## Confidence level used: 0.95

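Notice that the structure no longer includes the transformation. That’s because it is no longer relevant; the reference grid is on the conc scale, and how we got there is now forgotten. Compare this summary() result with the preceding one, and note the following:

  1. It no longer has annotations concerning transformations.
  2. The estimates and SEs are identical.
  3. The confidence intervals, t ratios, and P values are not identical. This is because, this time, the SEs are used in constructing the confidence intervals and tests, and the null value is 35 on the conc scale.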
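
The response-scale intervals and tests in the regrid() summary can be verified with a few lines of arithmetic. This is a sketch in base R using the rounded response and SE values printed for the fish row; it is an illustration of the calculation, not emmeans internals.

```r
# Values copied (rounded) from the "fish" row of the regrid() summary
response <- 29.79952
SE       <- 1.093083
df       <- 23

# After regrid(), the SE on the conc scale is used directly
t.ratio <- (response - 35) / SE
ci      <- response + c(-1, 1) * qt(0.975, df) * SE

round(c(lower.CL = ci[1], upper.CL = ci[2], t.ratio = t.ratio), 3)
```

The interval is now symmetric about the estimate, unlike the back-transformed interval in the earlier summary.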

Understood, right? But think carefully about how these EMMs were obtained. They are back-transformed from pigs.emm.s, in which the marginal averaging was done on the log scale. If we want to back-transform before doing the averaging, we need to call regrid() after the reference grid is constructed but before the averaging takes place:

pigs.rg <- ref_grid(pigs.lm)
pigs.remm.s <- emmeans(regrid(pigs.rg), "source")
summary(pigs.remm.s, infer = TRUE, null = 35)
##  source response       SE df lower.CL upper.CL null t.ratio p.value
##  fish   29.97478 1.096051 23 27.70743 32.24214   35  -4.585  0.0001
##  soy    39.37473 1.494655 23 36.28280 42.46666   35   2.927  0.0076
##  skim   44.81909 1.789918 23 41.11636 48.52182   35   5.486  <.0001
## 
## Results are averaged over the levels of: percent 
## Confidence level used: 0.95

These results all differ from either of the previous two summaries – again, because the averaging is done on the conc scale rather than the log(conc) scale.
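
Why the order matters can be seen with a toy calculation. The four log-scale predictions below are hypothetical numbers chosen for illustration (they are not taken from the pigs model); the sketch only shows that averaging and back-transforming do not commute.

```r
# Hypothetical log-scale predictions for one source at four percent levels
log.pred <- c(3.2, 3.4, 3.5, 3.6)

# Average on the log scale, then back-transform (a geometric mean)
avg.then.back <- exp(mean(log.pred))

# Back-transform first, then average (an arithmetic mean)
back.then.avg <- mean(exp(log.pred))

c(avg.then.back = avg.then.back, back.then.avg = back.then.avg)
```

Because exp() is convex, the arithmetic mean is never smaller than the geometric mean, which is consistent with the slightly larger estimates in the last summary.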

Note: For those who want to routinely back-transform before averaging, the transform argument in ref_grid() simplifies this. The first two steps above could have been done more easily as follows:

pigs.remm.s <- emmeans(pigs.lm, "source", transform = "response")

But don’t get transform and type confused. The transform argument is passed to regrid() after the reference grid is constructed, whereas the type argument is simply remembered and used by summary(). So a similar-looking call:

emmeans(pigs.lm, "source", type = "response")

will compute the results we have seen for pigs.emm.s – back-transformed after averaging on the log scale.

Remember again: When it comes to transformations, timing is everything.

Special transformations

The make.tran() function provides several special transformations and sets things up so they can be handled in emmeans with relative ease. (See help("make.tran", "emmeans") for descriptions of what is available.) make.tran() works much like stats::make.link() in that it returns a list of functions linkfun(), linkinv(), etc. that serve in managing results on a transformed scale. The difference is that most transformations with make.tran() require additional arguments.

To use this capability in emmeans(), it is best to first obtain the make.tran() result, and then to use it as the enclosing environment for fitting the model, with linkfun as the transformation. For example, suppose we want to use the response transformation \(\log(y + \frac12)\). Then proceed like this:

tran <- make.tran("genlog", 1/2)
my.model <- with(tran, 
    lmer(linkfun(yield) ~ treatment + (1|Block), data = mydata))

Subsequent calls to ref_grid(), emmeans(), regrid(), etc. will then be able to access the transformation information correctly.
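
To make the idea of a list of transformation functions concrete, here is a hand-built stand-in for \(\log(y + \frac12)\) in base R. The object genlog.half is hypothetical and is not the value returned by make.tran(); it only mimics the linkfun, linkinv, and mu.eta components that stats::make.link() objects carry, and shows how with() lets an expression find linkfun.

```r
# Hypothetical stand-in for a transformation object (NOT make.tran() output)
genlog.half <- list(
  linkfun = function(mu)  log(mu + 0.5),   # response -> transformed scale
  linkinv = function(eta) exp(eta) - 0.5,  # transformed -> response scale
  mu.eta  = function(eta) exp(eta),        # d(mu)/d(eta), for the delta method
  name    = "log(mu + 0.5)"
)

y    <- c(0, 1.5, 9.5)
eta  <- genlog.half$linkfun(y)
back <- genlog.half$linkinv(eta)
all.equal(back, y)   # the round trip recovers the original values

# with() evaluates the expression with the list's components in scope,
# which is how linkfun() is found in the model-fitting call above
with(genlog.half, linkfun(1.5))
```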

The help page for make.tran() has an example like this using a Box-Cox transformation.

Specifying a transformation after the fact

It is not at all uncommon to fit a model using statements like the following:

mydata <- transform(mydata, logy.5 = log(yield + .5))
my.model <- lmer(logy.5 ~ treatment + (1|Block), data = mydata)

In this case, there is no way for ref_grid() to figure out that a response transformation was used. What can be done is to update the reference grid with the required information:

my.rg <- update(ref_grid(my.model), tran = make.tran("genlog", .5))

Subsequently, use my.rg in place of my.model in any emmeans() analyses, and the transformation information will be there.

For standard transformations (those in stats::make.link()), just give the name of the transformation; e.g.,

model.rg <- update(ref_grid(model), tran = "sqrt")
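
For reference, the pieces that a standard transformation name stands for can be inspected directly with stats::make.link(). This short base-R sketch uses "sqrt" only as an example; it shows the link object, not what emmeans does with it.

```r
# Look up the standard square-root link by name
lnk <- stats::make.link("sqrt")

lnk$linkfun(9)   # response -> transformed scale: sqrt(9)
lnk$linkinv(3)   # transformed -> response scale: 3^2
lnk$mu.eta(3)    # derivative d(mu)/d(eta) = 2 * eta
```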

Faking a log transformation

The regrid() function makes it possible to fake a log transformation of the response. Why would you want to do this? So that you can make comparisons using ratios instead of differences.

Consider the pigs example once again, but suppose we had fitted a model with a square-root transformation instead of a log:

pigroot.lm <- lm(sqrt(conc) ~ source + factor(percent), data = pigs)
piglog.emm.s <- regrid(emmeans(pigroot.lm, "source"), transform = "log")
confint(piglog.emm.s, type = "response")
##  source response       SE df lower.CL upper.CL
##  fish   29.84382 1.316416 23 27.24115 32.69514
##  soy    39.24300 1.541103 23 36.18104 42.56408
##  skim   44.99895 1.735523 23 41.54824 48.73626
## 
## Results are averaged over the levels of: percent 
## Confidence level used: 0.95 
## Intervals are back-transformed from the log scale
pairs(piglog.emm.s, type = "response")
##  contrast        ratio         SE df t.ratio p.value
##  fish / soy  0.7604877 0.04535114 23  -4.591  0.0004
##  fish / skim 0.6632114 0.03910379 23  -6.965  <.0001
##  soy / skim  0.8720869 0.04685060 23  -2.548  0.0457
## 
## Results are averaged over the levels of: percent 
## P value adjustment: tukey method for comparing a family of 3 estimates 
## Tests are performed on the log scale

These results are not identical to, but are very similar to, the back-transformed confidence intervals above for the EMMs and the pairwise ratios in the “comparisons” vignette, where the fitted model actually used a log response.
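
The reason a log scale yields ratios is that a difference of logs back-transforms to a quotient. The sketch below, in base R, uses the rounded response values printed by confint() above, so it should match the pairs() output only to within rounding.

```r
# Back-transformed EMMs copied (rounded) from the confint() output above
resp <- c(fish = 29.84382, soy = 39.24300, skim = 44.99895)

# Pairwise differences on the (faked) log scale ...
log.emm  <- log(resp)
log.diff <- c("fish / soy"  = log.emm[["fish"]] - log.emm[["soy"]],
              "fish / skim" = log.emm[["fish"]] - log.emm[["skim"]],
              "soy / skim"  = log.emm[["soy"]]  - log.emm[["skim"]])

# ... back-transform to ratios, since exp(log(a) - log(b)) = a / b
ratio <- exp(log.diff)
round(ratio, 5)
```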
