We present a package for estimation of cis-eQTL effect sizes, using a new model called ACME which respects biological understanding of cis-eQTL action. The model, fully presented and validated in (Palowitch et al. 2017), involves an additive effect of allele count and multiplicative component random noise (hence “ACME”: Additive-Contribution, Multiplicative-Error), and is defined as

\[y_i = \log(\beta_0 + \beta_1 s_i) + Z_i^T \gamma + \epsilon_i\]

where

- \(y_i\) – log-transformed normalized gene read count from sample \(i\)
- \(s_i\) – minor allele count for the SNP of sample \(i\)
- \(Z_i\) – the \(p \times 1\) vector of covarites for the sample \(i\)
- \(\beta_0\) – unknown “baseline” mean of raw expression
- \(\beta_1\) – unknown linear contribution of \(s_i\) on the
**raw**expression - \(\gamma\) – unknown \(p \times 1\) covariate coefficient
- \(\epsilon_i\) – noise

We estimate the model using a fast iterative algorithm.

The algorithm estimates the model which is nonlinear only with respect to \(\eta = \beta_1 / \beta_0\)

\[y_i = \log(1 + s_i \eta) + \log(\beta_0) + Z_i^T \gamma + \epsilon_i\]

ACMEeqtl can be installed with the following command.

`install.packages("ACMEeqtl")`

ACMEeqtl package provides functions for analysis of a single gene-SNP pair as well as fast parallel testing of all local gene-SNP pairs.

`library(ACMEeqtl)`

First we generate sample gene expression, SNP allele counts, and a set of covariates.

```
# Model parameters
beta0 = 10000
beta1 = 50000
# Data dimensions
nsample = 1000
ncvrt = 19
### Data generation
### Zero average covariates
cvrt = matrix(rnorm(nsample * ncvrt), nsample, ncvrt)
cvrt = t(t(cvrt) - colMeans(cvrt))
# Generate SNPs
s = rbinom(n = nsample, size = 2, prob = 0.2)
# Generate log-normalized expression
y = log(beta0 + beta1 * s) +
cvrt %*% rnorm(ncvrt) +
rnorm(nsample)
```

We provide two equivalent functions for model estimation.

`effectSizeEstimationR`

– fully coded in R`effectSizeEstimationC`

– faster version with core coded in C.

```
z1 = effectSizeEstimationR(s, y, cvrt)
z2 = effectSizeEstimationC(s, y, cvrt)
pander(rbind(z1,z2))
```

beta0 | beta1 | nits | SSE | SST | Ftest | eta | SE_eta | |
---|---|---|---|---|---|---|---|---|

z1 |
9744 | 52302 | 6 | 907 | 1764 | 926 | 5.37 | 0.401 |

z2 |
9744 | 52302 | 6 | 907 | 1764 | 926 | 5.37 | 0.401 |

Variables `z1`

, `z2`

show that the estimation was done in 6 iterations, with estimated parameters

- \(\hat\beta_0\) = 9743.5 (true parameter is 10000)
- \(\hat\beta_1\) = 52302.2 (true parameter is 50000)

First we generate a eQTL dataset in filematrix format (see filematrix package).

```
tempdirectory = tempdir()
z = create_artificial_data(
nsample = 100,
ngene = 100,
nsnp = 501,
ncvrt = 1,
minMAF = 0.2,
saveDir = tempdirectory,
returnData = FALSE,
savefmat = TRUE,
savetxt = FALSE,
verbose = FALSE)
```

In this example, we use 2 CPU cores (threads) for testing of all gene-SNP pairs within 100,000 bp.

```
multithreadACME(
genefm = "gene",
snpsfm = "snps",
glocfm = "gene_loc",
slocfm = "snps_loc",
cvrtfm = "cvrt",
acmefm = "ACME",
cisdist = 1.5e+06,
threads = 2,
workdir = file.path(tempdirectory, "filematrices"),
verbose = FALSE)
```

Now the filematrix `ACME`

holds estimations for all local gene-SNP pairs.

```
fm = fm.open(file.path(tempdirectory, "filematrices", "ACME"))
TenResults = fm[,1:10]
rownames(TenResults) = rownames(fm)
close(fm)
pander(t(TenResults))
```

geneid | snp_id | beta0 | beta1 | nits | SSE | SST | F | eta | SE |
---|---|---|---|---|---|---|---|---|---|

1 | 1 | 92.7 | -26.9 | 8 | 104 | 114 | 9.51 | -0.29 | 0.0602 |

1 | 2 | 82.7 | -16.9 | 6 | 112 | 114 | 2.17 | -0.204 | 0.109 |

1 | 3 | 72.7 | -2.84 | 3 | 114 | 114 | 0.0595 | -0.0391 | 0.155 |

1 | 4 | 61.7 | 12.8 | 5 | 113 | 114 | 1.13 | 0.207 | 0.223 |

2 | 4 | 99.1 | -30.6 | 8 | 103 | 109 | 5.89 | -0.309 | 0.079 |

2 | 5 | 82.5 | -10.5 | 4 | 108 | 109 | 0.941 | -0.127 | 0.114 |

2 | 6 | 86.9 | -18.1 | 5 | 106 | 109 | 2.7 | -0.209 | 0.0993 |

2 | 7 | 79.1 | -7.04 | 4 | 109 | 109 | 0.339 | -0.089 | 0.14 |

2 | 8 | 76.3 | -3.47 | 4 | 109 | 109 | 0.0999 | -0.0455 | 0.138 |

2 | 9 | 66.3 | 12.2 | 4 | 108 | 109 | 1.11 | 0.184 | 0.199 |

Now we can estimate multi-SNP ACME models for each gene:

```
multisnpACME(
genefm = "gene",
snpsfm = "snps",
glocfm = "gene_loc",
slocfm = "snps_loc",
cvrtfm = "cvrt",
acmefm = "ACME",
workdir = file.path(tempdirectory, "filematrices"),
genecap = Inf,
verbose = FALSE)
```

Now the filematrix `ACME_multiSNP`

holds estimations for all multi-SNP models.

```
fm = fm.open(file.path(tempdirectory, "filematrices", "ACME_multiSNP"))
TenResults = fm[,1:10]
rownames(TenResults) = rownames(fm)
close(fm)
pander(t(TenResults))
```

geneid | snp_id | beta0 | betas | forward_adjR2 |
---|---|---|---|---|

1 | 1 | 91.3 | -29 | 0.0799 |

1 | 4 | 91.3 | 13.1 | 0.0843 |

2 | 4 | 120 | -36.6 | 0.0475 |

2 | 5 | 120 | -11.9 | 0.0505 |

3 | 10 | 104 | 21.6 | 0.00589 |

4 | 19 | 40.8 | 48.1 | 0.0669 |

4 | 17 | 40.8 | 21.3 | 0.0877 |

4 | 18 | 40.8 | 15.7 | 0.0939 |

5 | 21 | 35.7 | 126 | 0.198 |

5 | 23 | 35.7 | 51.4 | 0.243 |

Note that each multi-SNP model will contain at least one SNP, even if that initial SNP was not significant under the single-SNP models. This initial SNP will be the one with the highest adjusted-R\(^2\) value among the single-SNP models. However, after the initial SNP, further SNPs are added only if the combined model’s adjusted-R\(^2\) is greater than that from the previous combined model.

Palowitch, John, Andrey Shabalin, Yi-Hui Zhou, Andrew B Nobel, and Fred A Wright. 2017. “Estimation of Cis-eQTL Effect Sizes Using a Log of Linear Model.” *Biometrics*. Wiley Online Library.