This package has two parts:

- The first part provides tools to manipulate formulas.
- The second part provides functions to evaluate and check the marginal impacts of a linear model.

Variables in R's linear formula/model can take different forms:

- Model variables, the terms that appear directly in the formula, separated by the '+' sign.
- Raw variables, the underlying variables used.
- Coefficient variables, the coefficient names; note that unevaluated formulas don't have these variables.
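The three kinds of names can be illustrated with base R alone. The sketch below is not part of linear.tools; it uses the built-in `mtcars` data and a formula made up for illustration, and only shows where each kind of name comes from.

```r
# Hypothetical formula on the built-in mtcars data (an assumption for illustration).
f <- log(mpg) ~ I(wt^2) + factor(cyl) + wt + hp + wt:hp

# Model variables: the term labels of the formula.
model_vars <- attr(terms(f), "term.labels")

# Raw variables: the underlying column names used on the right-hand side.
raw_vars <- all.vars(f[[3]])

# Coefficient variables: available only after the formula is evaluated on data.
fit <- lm(f, data = mtcars)
coeff_vars <- setdiff(names(coef(fit)), "(Intercept)")

print(model_vars)
print(raw_vars)
print(coeff_vars)
```

The factor term `factor(cyl)` is a single model variable but expands into several coefficient names, which is why the three views can differ in length.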

`get_x(formula/model,'model')`

```
data = ggplot2::diamonds
diamond_lm = lm(log(price)~ I(carat^ 2) + cut + carat + table + carat:table, data)
```

At first sight, the linear model above contains 5 variables:

- I(carat^ 2)
- cut
- carat
- table
- carat:table

In linear.tools we call them *model* variables and can access them using the function `get_x(.,'model')`:

`get_x(diamond_lm,'model')`

`## [1] "I(carat^2)" "cut" "carat" "table" "carat:table"`

Note that the original formula contains a redundant space in 'I(carat^ 2)'; `get_x(.,'model')` removes it.

`get_x(formula/model,'raw')`

Sometimes you want to get the underlying raw variables used in the formula, which are

- carat (the underlying variable for I(carat^ 2))
- cut
- carat
- table

In linear.tools we call them *raw* variables and can access them using the function `get_x(.,'raw')`:

`get_x(diamond_lm,'raw')`

`## [1] "carat" "cut" "table"`

`get_model_pair(., data, 'raw')` shows the linkage between model variables and raw variables: it returns a list whose names are the model variables and whose elements are their corresponding raw variables.

`get_model_pair(diamond_lm, data, 'raw')`

```
## $`I(carat^2)`
## [1] "carat"
##
## $cut
## [1] "cut"
##
## $carat
## [1] "carat"
##
## $table
## [1] "table"
##
## $`carat:table`
## [1] "carat" "table"
```
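A similar model-to-raw mapping can be approximated in base R by parsing each term label and collecting the variable names inside it. This is only a sketch of the idea under that assumption, not the implementation of `get_model_pair`; the formula and the `mtcars` columns are illustrative.

```r
# Hypothetical formula on mtcars columns (an assumption for illustration).
f <- log(mpg) ~ I(wt^2) + factor(cyl) + wt + hp + wt:hp
model_vars <- attr(terms(f), "term.labels")

# For each model variable, parse its label and list the raw variables it uses.
model_to_raw <- lapply(
  setNames(model_vars, model_vars),
  function(label) all.vars(parse(text = label)[[1]])
)

print(model_to_raw)
```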

`get_x(model,'coeff')`

Sometimes you want the coefficient names of the model:

`get_x(diamond_lm,'coeff')`

```
## [1] "I(carat^2)" "cut.L" "cut.Q" "cut.C" "cut^4"
## [6] "carat" "table" "carat:table"
```

You may also want to see how 'model' variables are linked with 'coeff' variables: `get_model_pair(., data, 'coeff')` returns a list whose names are the model variables and whose elements are their corresponding coeff variables.

`get_model_pair(diamond_lm, data, 'coeff')`

```
## $`I(carat^2)`
## [1] "I(carat^2)"
##
## $cut
## [1] "cut.L" "cut.Q" "cut.C" "cut^4"
##
## $carat
## [1] "carat"
##
## $table
## [1] "table"
##
## $`carat:table`
## [1] "carat:table"
```
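In base R, the link between model terms and coefficient names is recorded in the `assign` attribute of the model matrix, which stores, for every column, the index of the term it came from (0 for the intercept). The sketch below uses that attribute on an illustrative `mtcars` model; whether linear.tools uses the same mechanism internally is not stated here.

```r
# Hypothetical model on mtcars (an assumption for illustration).
fit <- lm(mpg ~ factor(cyl) + wt + hp + wt:hp, data = mtcars)

mm <- model.matrix(fit)
term_labels <- attr(terms(fit), "term.labels")
term_index <- attr(mm, "assign")   # 0 marks the intercept column

# Group the non-intercept coefficient names by the term that produced them.
is_term <- term_index > 0
model_to_coeff <- split(
  colnames(mm)[is_term],
  factor(term_labels[term_index[is_term]], levels = term_labels)
)

print(model_to_coeff)
```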

`get_x_all(model)`

The `get_x_all()` function will return a data.frame showing all the model variables and their corresponding raw & coefficient variables.

`get_x_all(model = diamond_lm)`

```
## raw model coeff n_raw_in_model
## 1 carat I(carat^2) I(carat^2) 1
## 2 cut cut cut.L 1
## 3 cut cut cut.Q 1
## 4 cut cut cut.C 1
## 5 cut cut cut^4 1
## 6 carat carat carat 1
## 7 table table table 1
## 8 carat carat:table carat:table 2
## 9 table carat:table carat:table 2
```

`get_y(formula/model)`

`get_y(diamond_lm,'raw')`

`## [1] "price"`

`get_y(diamond_lm,'model')`

`## [1] "log(price)"`
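The same distinction for the response can be seen in base R by looking at the left-hand side of a formula. This is a minimal sketch on an illustrative formula, not the implementation of `get_y`.

```r
# Hypothetical formula (an assumption for illustration).
f <- log(mpg) ~ wt + hp

y_model <- deparse(f[[2]])   # the response as written: "log(mpg)"
y_raw <- all.vars(f[[2]])    # the underlying variable: "mpg"

print(y_model)
print(y_raw)
```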

Contrasts are how categorical variables show up in coefficients.

When R evaluates categorical variables in a linear model, it transforms them into sets of 'contrasts' using a particular contrast coding scheme. See UCLA idre for details.

For example, for the categorical variable 'cut' in the above model, we can get its contrasts through the function `get_contrast`:

```
# get_contrast will return a list with each element as the contrasts of a categorical variable in the model
get_contrast(diamond_lm)
```

```
## $cut
## [1] "cut.L" "cut.Q" "cut.C" "cut^4"
```

You can also return the contrast method.

`get_contrast(diamond_lm, return_method = T)`

```
## $cut
## contr.poly
```
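The `.L`, `.Q`, `.C`, `^4` suffixes above are the column names that base R's `contr.poly` assigns to a five-level ordered factor. The sketch below shows where such names and the contrast method can be found on a fitted `lm` object; the small data frame is made up for illustration, and it assumes R's default `options("contrasts")`.

```r
# Made-up data with a three-level ordered factor (an assumption for illustration).
d <- data.frame(
  grade = factor(rep(c("low", "mid", "high"), times = 4),
                 levels = c("low", "mid", "high"), ordered = TRUE),
  y = c(1.0, 2.1, 3.9, 1.2, 1.9, 4.1, 0.9, 2.0, 4.0, 1.1, 2.2, 3.8)
)
fit <- lm(y ~ grade, data = d)

coef_names <- names(coef(fit))          # contrast columns appear as coefficient names
contrast_method <- fit$contrasts$grade  # name of the contrast function used
five_level_names <- colnames(contr.poly(5))

print(coef_names)
print(contrast_method)
print(five_level_names)
```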

In the formula `y ~ a + I(a^2) + b`, we define the 'marginal effect' of `a` on `y` as: holding `b` fixed, how a change in `a` affects the value of `y`. Note that the marginal effect here is not simply the coefficient of `a` or of `I(a^2)`, nor their sum.
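To make the definition concrete, the base-R sketch below simulates data for such a formula, holds `b` fixed at its mean, and traces the predicted `y` over a grid of `a`. The data, grid, and coefficients are assumptions chosen for illustration; the point is that the change in predicted `y` can turn negative even though the coefficient of `a` is positive.

```r
# Simulated data (an assumption for illustration): y peaks around a = 2.
set.seed(1)
d <- data.frame(a = runif(200, 0, 4), b = rnorm(200))
d$y <- 1 + 2 * d$a - 0.5 * d$a^2 + 3 * d$b + rnorm(200, sd = 0.1)

fit <- lm(y ~ a + I(a^2) + b, data = d)

# Vary a over a grid while holding b fixed at its mean.
grid <- data.frame(a = seq(0, 4, by = 0.5), b = mean(d$b))
pred <- predict(fit, newdata = grid)

# Change in predicted y between consecutive grid values of a.
marginal <- diff(pred)

print(round(marginal, 2))
print(coef(fit)[["a"]])
```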

`effect`

We provide an easy tool to show the marginal effect and check its monotonicity. The example below evaluates how the `carat` of a diamond affects its `price` in a particular model.

```
# more carats, higher price.
diamond_lm3 = lm(price~ carat + I(carat^2) + I(carat^3) , ggplot2::diamonds) # a GLM
test1 = effect(model = diamond_lm3, focus_var_raw = c('carat'), focus_value =list(carat = seq(0.5,1,0.1)))
```

`test1$Monoton_Increase`

`## [1] TRUE`

You can see that the model does a good job of capturing the monotonically increasing relation between `carat` and `price` when `carat` ranges from 0.5 to 1 (`$Monoton_Increase` is `TRUE`).

PS: A more interesting case: if you interact `carat` with the categorical variable `cut`, you can examine the marginal effects of `carat` under different categories of `cut`:

```
test_interaction = effect(model = lm(price~ carat*cut + I(carat^2)*cut, ggplot2::diamonds),
                          focus_var_raw = c('carat','cut'),
                          focus_value = list(carat = seq(0.5,1,0.1)))
```

However, when we let `carat` range from 0.5 to 6 in the model `diamond_lm3`, the model fails to produce a monotonically increasing relation: in the model below, once carat is larger than approximately 3, the higher the carat, the lower the price!

`test2 = effect(model = diamond_lm3, focus_var_raw = c('carat'), focus_value =list(carat = seq(0.5,6,0.1)))`

`test2$Monoton_Increase`

`## [1] FALSE`
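The two results can be checked by hand from the fitted cubic. The sketch below plugs in the coefficients reported for `diamond_lm3` in the initial-model printout later in this document and tests whether the predicted price rises along each grid. The helper names `price_hat` and `is_increasing` are made up for illustration; this is not how `effect` is implemented.

```r
# Coefficients of diamond_lm3 as printed in this document's initial-model output:
# intercept, carat, I(carat^2), I(carat^3).
b <- c(-198.3337, 812.3639, 5813.2637, -1308.8438)

# Predicted price as a function of carat (hypothetical helper).
price_hat <- function(carat) b[1] + b[2] * carat + b[3] * carat^2 + b[4] * carat^3

# TRUE when predictions strictly increase along the grid (hypothetical helper).
is_increasing <- function(values) all(diff(price_hat(values)) > 0)

short_range <- is_increasing(seq(0.5, 1, 0.1))
long_range <- is_increasing(seq(0.5, 6, 0.1))

# Where the derivative b2 + 2*b3*c + 3*b4*c^2 crosses zero for positive carat.
turning_point <- max(Re(polyroot(c(b[2], 2 * b[3], 3 * b[4]))))

print(short_range)
print(long_range)
print(turning_point)
```

The turning point lands slightly above 3 carats, which matches the observation that the fitted price starts falling once carat exceeds roughly 3.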

When a model has a wrong marginal effect, we can use the function `deleting_wrongeffect` to delete a model variable that potentially causes the wrong marginal impacts and then re-estimate the model. This function can keep doing this until the correct marginal impacts are found.

The example below will

- first test the marginal effect of `carat` on `price`, which is supposed to be monotonically increasing.
- then, as it finds an incorrect marginal effect, delete the rightmost model variable that contains `carat`, and recheck the marginal effect.
- keep doing the same thing until the marginal effect is correct, or all model variables containing `carat` are deleted.
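The loop described in these steps can be sketched in base R as follows. This is not the implementation of `deleting_wrongeffect`; the helpers `is_increasing` and `drop_until_increasing` and the simulated data are assumptions made for illustration, and only the rightmost-first deletion order described above is covered.

```r
# Simulated data (an assumption for illustration): y rises with x up to 3, then falls.
set.seed(1)
d <- data.frame(x = seq(0, 4, by = 0.05))
d$y <- -(d$x - 3)^2 + rnorm(nrow(d), sd = 0.1)

# TRUE when predictions strictly increase along `values` of the raw variable `var`.
is_increasing <- function(fit, var, values) {
  newdata <- setNames(data.frame(values), var)
  all(diff(predict(fit, newdata = newdata)) > 0)
}

# Repeatedly drop the rightmost term containing `var` and refit,
# until the effect is increasing or no such term is left.
drop_until_increasing <- function(fit, var, values, data) {
  repeat {
    if (is_increasing(fit, var, values)) break
    labels <- attr(terms(fit), "term.labels")
    has_var <- vapply(labels,
                      function(l) var %in% all.vars(parse(text = l)[[1]]),
                      logical(1))
    if (!any(has_var)) break
    keep <- labels[-max(which(has_var))]
    if (length(keep) == 0) keep <- "1"   # fall back to an intercept-only model
    fit <- lm(reformulate(keep, response = formula(fit)[[2]]), data = data)
  }
  fit
}

start <- lm(y ~ x + I(x^2), data = d)
final <- drop_until_increasing(start, "x", seq(0.5, 4, by = 0.5), d)
print(formula(final))
```

In this toy case the quadratic term is dropped and the remaining linear fit is increasing, so the loop stops after one deletion. The package's own call for the diamonds model follows.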

```
model_correct_effect = deleting_wrongeffect(model = diamond_lm3,
                                            focus_var_raw = 'carat',
                                            focus_value = list(carat=seq(0.5,6,0.1)),
                                            data = ggplot2::diamonds,
                                            PRINT = TRUE, STOP = FALSE, PLOT = TRUE,
                                            Reverse = FALSE)
```

```
##
## initial model:
## Estimate Pr(>|t|)
## (Intercept) -198.3337 3.930283e-11
## carat 812.3639 1.540245e-19
## I(carat^2) 5813.2637 0.000000e+00
## I(carat^3) -1308.8438 0.000000e+00
##
##
## check raw var: carat
## check model var: carat, I(carat^2), I(carat^3)
## Correct Monotonicity is supposed to be: Increasing
```