Introduction to olsrr

Introduction

The olsrr package provides following tools for teaching and learning OLS regression using R:

This document is a quickstart guide to the tools offered by olsrr. Other vignettes provide more details on specific topics:

Regression

ols_regress(mpg ~ disp + hp + wt + qsec, data = mtcars)
##                         Model Summary                          
## --------------------------------------------------------------
## R                       0.914       RMSE                2.622 
## R-Squared               0.835       Coef. Var          13.051 
## Adj. R-Squared          0.811       MSE                 6.875 
## Pred R-Squared          0.771       MAE                 1.858 
## --------------------------------------------------------------
##  RMSE: Root Mean Square Error 
##  MSE: Mean Square Error 
##  MAE: Mean Absolute Error 
## 
##                                ANOVA                                 
## --------------------------------------------------------------------
##                 Sum of                                              
##                Squares        DF    Mean Square      F         Sig. 
## --------------------------------------------------------------------
## Regression     940.412         4        235.103    34.195    0.0000 
## Residual       185.635        27          6.875                     
## Total         1126.047        31                                    
## --------------------------------------------------------------------
## 
##                                   Parameter Estimates                                    
## ----------------------------------------------------------------------------------------
##       model      Beta    Std. Error    Std. Beta      t        Sig      lower     upper 
## ----------------------------------------------------------------------------------------
## (Intercept)    27.330         8.639                  3.164    0.004     9.604    45.055 
##        disp     0.003         0.011        0.055     0.248    0.806    -0.019     0.025 
##          hp    -0.019         0.016       -0.212    -1.196    0.242    -0.051     0.013 
##          wt    -4.609         1.266       -0.748    -3.641    0.001    -7.206    -2.012 
##        qsec     0.544         0.466        0.161     1.166    0.254    -0.413     1.501 
## ----------------------------------------------------------------------------------------

In the presence of interaction terms in the model, the predictors are scaled and centered before computing the standardized betas. ols_regress() will detect interaction terms automatically but in case you have created a new variable instead of using the inline function *, you can indicate the presence of interaction terms by setting iterm to TRUE.

Residual vs Fitted Values Plot

Plot to detect non-linearity, unequal error variances, and outliers.

model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
ols_plot_resid_fit(model)

DFBETAs Panel

DFBETAs measure the difference in each parameter estimate with and without the influential observation. dfbetas_panel creates plots to detect influential observations using DFBETAs.

model <- lm(mpg ~ disp + hp + wt, data = mtcars)
ols_plot_dfbetas(model)

Residual Fit Spread Plot

Plot to detect non-linearity, influential observations and outliers.

model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
ols_plot_resid_fit_spread(model)

Breusch Pagan Test

Breusch Pagan test is used to test for herteroskedasticity (non-constant error variance). It tests whether the variance of the errors from a regression is dependent on the values of the independent variables. It is a \(\chi^{2}\) test.

model <- lm(mpg ~ disp + hp + wt + drat, data = mtcars)
ols_test_breusch_pagan(model)
## 
##  Breusch Pagan Test for Heteroskedasticity
##  -----------------------------------------
##  Ho: the variance is constant            
##  Ha: the variance is not constant        
## 
##              Data               
##  -------------------------------
##  Response : mpg 
##  Variables: fitted values of mpg 
## 
##        Test Summary         
##  ---------------------------
##  DF            =    1 
##  Chi2          =    1.429672 
##  Prob > Chi2   =    0.231818

Collinearity Diagnostics

model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
ols_coll_diag(model)
## Tolerance and Variance Inflation Factor
## ---------------------------------------
## # A tibble: 4 x 3
##   Variables Tolerance   VIF
##   <chr>         <dbl> <dbl>
## 1 disp          0.125  7.99
## 2 hp            0.194  5.17
## 3 wt            0.145  6.92
## 4 qsec          0.319  3.13
## 
## 
## Eigenvalue and Condition Index
## ------------------------------
##    Eigenvalue Condition Index   intercept        disp          hp
## 1 4.721487187        1.000000 0.000123237 0.001132468 0.001413094
## 2 0.216562203        4.669260 0.002617424 0.036811051 0.027751289
## 3 0.050416837        9.677242 0.001656551 0.120881424 0.392366164
## 4 0.010104757       21.616057 0.025805998 0.777260487 0.059594623
## 5 0.001429017       57.480524 0.969796790 0.063914571 0.518874831
##             wt         qsec
## 1 0.0005253393 0.0001277169
## 2 0.0002096014 0.0046789491
## 3 0.0377028008 0.0001952599
## 4 0.7017528428 0.0024577686
## 5 0.2598094157 0.9925403056

Stepwise Regression

Build regression model from a set of candidate predictor variables by entering and removing predictors based on p values, in a stepwise manner until there is no variable left to enter or remove any more.

Variable Selection

# stepwise regression
model <- lm(y ~ ., data = surgical)
ols_step_both_p(model)
## Stepwise Selection Method   
## ---------------------------
## 
## Candidate Terms: 
## 
## 1. bcs 
## 2. pindex 
## 3. enzyme_test 
## 4. liver_test 
## 5. age 
## 6. gender 
## 7. alc_mod 
## 8. alc_heavy 
## 
## We are selecting variables based on p value...
## 
## Variables Entered/Removed: 
## 
## - liver_test added 
## - alc_heavy added 
## - enzyme_test added 
## - pindex added 
## - bcs added 
## 
## No more variables to be added/removed.
## 
## 
## Final Model Output 
## ------------------
## 
##                           Model Summary                           
## -----------------------------------------------------------------
## R                       0.884       RMSE                 195.454 
## R-Squared               0.781       Coef. Var             27.839 
## Adj. R-Squared          0.758       MSE                38202.426 
## Pred R-Squared          0.700       MAE                  137.656 
## -----------------------------------------------------------------
##  RMSE: Root Mean Square Error 
##  MSE: Mean Square Error 
##  MAE: Mean Absolute Error 
## 
##                                  ANOVA                                  
## -----------------------------------------------------------------------
##                    Sum of                                              
##                   Squares        DF    Mean Square      F         Sig. 
## -----------------------------------------------------------------------
## Regression    6535804.090         5    1307160.818    34.217    0.0000 
## Residual      1833716.447        48      38202.426                     
## Total         8369520.537        53                                    
## -----------------------------------------------------------------------
## 
##                                       Parameter Estimates                                        
## ------------------------------------------------------------------------------------------------
##       model         Beta    Std. Error    Std. Beta      t        Sig         lower       upper 
## ------------------------------------------------------------------------------------------------
## (Intercept)    -1178.330       208.682                 -5.647    0.000    -1597.914    -758.746 
##  liver_test       58.064        40.144        0.156     1.446    0.155      -22.652     138.779 
##   alc_heavy      317.848        71.634        0.314     4.437    0.000      173.818     461.878 
## enzyme_test        9.748         1.656        0.521     5.887    0.000        6.419      13.077 
##      pindex        8.924         1.808        0.380     4.935    0.000        5.288      12.559 
##         bcs       59.864        23.060        0.241     2.596    0.012       13.498     106.230 
## ------------------------------------------------------------------------------------------------
## 
##                                 Stepwise Selection Summary                                 
## ------------------------------------------------------------------------------------------
##                         Added/                   Adj.                                         
## Step     Variable      Removed     R-Square    R-Square     C(p)        AIC         RMSE      
## ------------------------------------------------------------------------------------------
##    1    liver_test     addition       0.455       0.444    62.5120    771.8753    296.2992    
##    2     alc_heavy     addition       0.567       0.550    41.3680    761.4394    266.6484    
##    3    enzyme_test    addition       0.659       0.639    24.3380    750.5089    238.9145    
##    4      pindex       addition       0.750       0.730     7.5370    735.7146    206.5835    
##    5        bcs        addition       0.781       0.758     3.1920    730.6204    195.4544    
## ------------------------------------------------------------------------------------------

Plot

model <- lm(y ~ ., data = surgical)
k <- ols_step_both_p(model)
## Stepwise Selection Method   
## ---------------------------
## 
## Candidate Terms: 
## 
## 1. bcs 
## 2. pindex 
## 3. enzyme_test 
## 4. liver_test 
## 5. age 
## 6. gender 
## 7. alc_mod 
## 8. alc_heavy 
## 
## We are selecting variables based on p value...
## 
## Variables Entered/Removed: 
## 
## - liver_test added 
## - alc_heavy added 
## - enzyme_test added 
## - pindex added 
## - bcs added 
## 
## No more variables to be added/removed.
## 
## 
## Final Model Output 
## ------------------
## 
##                           Model Summary                           
## -----------------------------------------------------------------
## R                       0.884       RMSE                 195.454 
## R-Squared               0.781       Coef. Var             27.839 
## Adj. R-Squared          0.758       MSE                38202.426 
## Pred R-Squared          0.700       MAE                  137.656 
## -----------------------------------------------------------------
##  RMSE: Root Mean Square Error 
##  MSE: Mean Square Error 
##  MAE: Mean Absolute Error 
## 
##                                  ANOVA                                  
## -----------------------------------------------------------------------
##                    Sum of                                              
##                   Squares        DF    Mean Square      F         Sig. 
## -----------------------------------------------------------------------
## Regression    6535804.090         5    1307160.818    34.217    0.0000 
## Residual      1833716.447        48      38202.426                     
## Total         8369520.537        53                                    
## -----------------------------------------------------------------------
## 
##                                       Parameter Estimates                                        
## ------------------------------------------------------------------------------------------------
##       model         Beta    Std. Error    Std. Beta      t        Sig         lower       upper 
## ------------------------------------------------------------------------------------------------
## (Intercept)    -1178.330       208.682                 -5.647    0.000    -1597.914    -758.746 
##  liver_test       58.064        40.144        0.156     1.446    0.155      -22.652     138.779 
##   alc_heavy      317.848        71.634        0.314     4.437    0.000      173.818     461.878 
## enzyme_test        9.748         1.656        0.521     5.887    0.000        6.419      13.077 
##      pindex        8.924         1.808        0.380     4.935    0.000        5.288      12.559 
##         bcs       59.864        23.060        0.241     2.596    0.012       13.498     106.230 
## ------------------------------------------------------------------------------------------------
plot(k)

Stepwise AIC Backward Regression

Build regression model from a set of candidate predictor variables by removing predictors based on Akaike Information Criteria, in a stepwise manner until there is no variable left to remove any more.

Variable Selection

# stepwise aic backward regression
model <- lm(y ~ ., data = surgical)
k <- ols_step_backward_aic(model)
## Backward Elimination Method 
## ---------------------------
## 
## Candidate Terms: 
## 
## 1 . bcs 
## 2 . pindex 
## 3 . enzyme_test 
## 4 . liver_test 
## 5 . age 
## 6 . gender 
## 7 . alc_mod 
## 8 . alc_heavy 
## 
## 
## Variables Removed: 
## 
## - alc_mod 
## - gender 
## - age 
## 
## No more variables to be removed.
k
## 
## 
##                         Backward Elimination Summary                         
## ---------------------------------------------------------------------------
## Variable        AIC          RSS          Sum Sq        R-Sq      Adj. R-Sq 
## ---------------------------------------------------------------------------
## Full Model    736.390    1825905.713    6543614.824    0.78184      0.74305 
## alc_mod       734.407    1826477.828    6543042.709    0.78177      0.74856 
## gender        732.494    1829435.617    6540084.920    0.78142      0.75351 
## age           730.620    1833716.447    6535804.090    0.78091      0.75808 
## ---------------------------------------------------------------------------

Plot

model <- lm(y ~ ., data = surgical)
k <- ols_step_backward_aic(model)
## Backward Elimination Method 
## ---------------------------
## 
## Candidate Terms: 
## 
## 1 . bcs 
## 2 . pindex 
## 3 . enzyme_test 
## 4 . liver_test 
## 5 . age 
## 6 . gender 
## 7 . alc_mod 
## 8 . alc_heavy 
## 
## 
## Variables Removed: 
## 
## - alc_mod 
## - gender 
## - age 
## 
## No more variables to be removed.
plot(k)