Abstract

Psycho is an R package that aims at providing tools for psychologists, neuropsychologists and neuroscientists to transform statistical outputs into something readable that can be, almost directly, copied and pasted into a report. It also implements various functions useful in psychological science, such as correlation matrices, assessment plot creation or normalization. The package revolves around the `psychobject`. The package's main functions return this type, and the `analyze()` function transforms other R objects into psychobjects. Four functions can then be applied to a psychobject: `summary()`, `print()`, `plot()` and `values()`. Contrary to many other packages whose goal is to produce statistical analyses, `psycho` aims at filling the gap between statistical R outputs and statistical report writing, with a focus on APA formatting guidelines, to enhance the standardization of results reporting. Complex outputs, such as those of Bayesian and frequentist mixed models, are automatically transformed into readable text, tables, and plots that illustrate the effects. Thus, the results can easily be incorporated into shareable reports and publications, promoting data exploration, saving time and preventing errors for better, reproducible science.

```
# Do this once (uncomment if needed)
# install.packages("devtools")
# library(devtools)
# devtools::install_github("neuropsychology/psycho.R")

# Load psycho (at the beginning of every script)
library(psycho)
```

The package mainly revolves around the `psychobject`. Main functions from the package return this type, and the `analyze()` function transforms other R objects into psychobjects. Four functions can then be applied to a psychobject: `summary()`, `print()`, `plot()` and `values()`.

It is possible to quickly run a correlation analysis on a dataframe with the flexible and powerful `correlation()` function.

```
library(psycho)

df <- iris

cor <- psycho::correlation(df,
                           type = "full",
                           method = "pearson",
                           adjust = "none")

summary(cor)
```

| | Sepal.Length | Sepal.Width | Petal.Length |
|---|---|---|---|
| Sepal.Length | | | |
| Sepal.Width | -0.12 | | |
| Petal.Length | 0.87*** | -0.43*** | |
| Petal.Width | 0.82*** | -0.37*** | 0.96*** |

You can save this correlation matrix using `write.csv(print(cor), "correlation_table.csv")`. This is very useful to *copy/paste* it from Excel into a paper or a report :)
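For reference, such a starred matrix pairs `cor()` coefficients with `cor.test()` p-values. Here is a minimal base-R sketch of that idea (illustrative only, not psycho's actual implementation, which does much more):

```r
# Minimal base-R sketch of a starred correlation matrix
# (illustrative only; psycho::correlation() handles adjustment, types, etc.)
starred_cor <- function(df) {
  vars <- names(df)
  n <- length(vars)
  out <- matrix("", n, n, dimnames = list(vars, vars))
  for (i in 2:n) {
    for (j in 1:(i - 1)) {
      test <- cor.test(df[[i]], df[[j]])  # Pearson correlation test
      p <- test$p.value
      stars <- if (p < .001) "***" else if (p < .01) "**" else if (p < .05) "*" else ""
      out[i, j] <- paste0(round(test$estimate, 2), stars)
    }
  }
  out
}

starred_cor(iris[, 1:4])
```

Running this on the four numeric columns of `iris` reproduces the lower triangle shown above (e.g. "0.82***" for Petal.Width and Sepal.Length).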

You can also draw a quick visualization:

`plot(cor)`

`correlation()` offers the possibility to run partial or semi-partial correlations, as well as printing them pairwise.
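As a point of reference, partial correlations can also be obtained in base R from the inverse of the correlation matrix (a generic sketch, not psycho's code, and without the p-value machinery):

```r
# Partial correlations from the precision (inverse correlation) matrix:
# pcor_ij = -prec_ij / sqrt(prec_ii * prec_jj)
# Generic base-R sketch; psycho::correlation() also adjusts and formats p-values
R <- cor(iris[, 1:4])
P <- -cov2cor(solve(R))  # negate the scaled inverse to get partial correlations
diag(P) <- 1

round(P["Sepal.Length", "Sepal.Width"], 2)  # ≈ 0.63, as in the summary below
```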

```
library(psycho)

df <- iris

pcor <- psycho::correlation(df,
                            type = "partial",
                            method = "pearson",
                            adjust = "bonferroni")

summary(pcor)
```

```
Sepal.Length Sepal.Width Petal.Length
Sepal.Length
Sepal.Width 0.63***
Petal.Length 0.72*** -0.62***
Petal.Width -0.34*** 0.35*** 0.87***
```

You can also have access to the individual correlations as follows:

`print(pcor)`

```
Pearson Partial Correlation (p value correction: bonferroni):
- Sepal.Length - Sepal.Width: Results of the Pearson correlation showed a significant and strong positive association between Sepal.Length and Sepal.Width (r(148) = 0.63, p < .001***).
- Sepal.Length - Petal.Length: Results of the Pearson correlation showed a significant and strong positive association between Sepal.Length and Petal.Length (r(148) = 0.72, p < .001***).
- Sepal.Width - Petal.Length: Results of the Pearson correlation showed a significant and strong negative association between Sepal.Width and Petal.Length (r(148) = -0.62, p < .001***).
- Sepal.Length - Petal.Width: Results of the Pearson correlation showed a significant and moderate negative association between Sepal.Length and Petal.Width (r(148) = -0.34, p < .001***).
- Sepal.Width - Petal.Width: Results of the Pearson correlation showed a significant and moderate positive association between Sepal.Width and Petal.Width (r(148) = 0.35, p < .001***).
- Petal.Length - Petal.Width: Results of the Pearson correlation showed a significant and strong positive association between Petal.Length and Petal.Width (r(148) = 0.87, p < .001***).
```

The `standardize()` function allows you to easily scale and center all numeric variables of a dataframe. It is similar to the base function `scale()`, but presents some advantages: it is tidyverse-friendly, data-type friendly (*i.e.*, it does not transform the dataframe into a matrix) and can handle dataframes with categorical data.
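The core behaviour can be approximated in base R by scaling only the numeric columns while leaving factors untouched (a simplified sketch of what `standardize()` does, without its extra options):

```r
# Scale and center numeric columns only, leaving factors untouched
# (simplified base-R sketch; psycho::standardize() offers more options)
standardize_df <- function(df) {
  df[] <- lapply(df, function(x) if (is.numeric(x)) as.numeric(scale(x)) else x)
  df  # still a dataframe, not a matrix
}

z <- standardize_df(iris)
summary(z$Sepal.Length)  # mean 0, sd 1; z$Species is still a factor
```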

```
library(psycho)
library(tidyverse)

iris %>%
  select(Species, Sepal.Length, Petal.Length) %>%
  psycho::standardize() %>%
  summary()
```

```
Species Sepal.Length Petal.Length
setosa :50 Min. :-1.86378 Min. :-1.5623
versicolor:50 1st Qu.:-0.89767 1st Qu.:-1.2225
virginica :50 Median :-0.05233 Median : 0.3354
Mean : 0.00000 Mean : 0.0000
3rd Qu.: 0.67225 3rd Qu.: 0.7602
Max. : 2.48370 Max. : 1.7799
```

This function is useful in clinical activity. It is sometimes necessary to show the patient, their family or other members of staff a visual representation of their score. The `assess()` function also computes the percentile and the Z-score, often needed for neuropsychological reports.

```
library(psycho)
results <- psycho::assess(124, mean=100, sd=15)
# Print it
print(results)
```

`The participant (score = 124) is positioned at 1.6 standard deviations from the mean (M = 100, SD = 15). The participant's score is greater than 94.35 % of the general population.`
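For reference, the Z-score and percentile reported here follow the textbook formulas; the computation can be sketched in base R with the normal CDF (the package's exact percentile may be derived slightly differently, e.g. from a simulated distribution):

```r
# Z-score and normal-CDF percentile for a score of 124 (M = 100, SD = 15)
score <- 124
z <- (score - 100) / 15        # 1.6 standard deviations above the mean
percentile <- pnorm(z) * 100   # share of the general population below the score

c(z = z, percentile = round(percentile, 2))  # ≈ 94.5%, close to the figure above
```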

```
# Plot it
plot(results)
```

It is also possible to customize the plot a bit (see the documentation for available parameters).

```
library(psycho)
results <- psycho::assess(85, mean=100, sd=15, linecolor = "orange", fillcolor = "#4CAF50")
# Plot it
plot(results)
```

In most cases, the `plot()` function returns a ggplot object, which means it remains quite flexible. Here's an example.

```
library(psycho)
library(ggplot2)

# Let's create a correlation plot
p <- plot(psycho::correlation(iris))

# Custom theme and colours
p <- p +
  scale_fill_gradientn(colors = c("#4CAF50", "#FFEB3B", "#FF5722")) +
  ylab("Variables\n") +
  labs(fill = "r") +
  theme(plot.background = element_rect(fill = "#607D8B"),
        axis.title.y = element_text(size = 20, angle = 90, colour = "white"),
        axis.text = element_text(size = 15, colour = "white"),
        legend.title = element_text(size = 20, colour = "white"),
        legend.text = element_text(size = 15, colour = "white"),
        title = element_text(size = 16, colour = "white"))
p
```

The `n_factors()` function is useful before running a principal component (PCA) or factor (FA) analysis. As many statistical methods exist for that purpose, this function gathers them together and gives an overview of the most frequent result. It also draws a nice plot with the eigenvalues and the proportion of explained variance.

```
library(tidyverse)
library(psycho)

results <- attitude %>%
  select_if(is.numeric) %>%
  psycho::n_factors()

# Get a summary
summary(results)
```

| n.Factors | n.Methods | Eigenvalues | Exp.Variance | Cum.Variance |
|---|---|---|---|---|
| 1 | 5 | 3.7163758 | 0.5309108 | 0.5309108 |
| 2 | 3 | 1.1409219 | 0.1629888 | 0.6938997 |
| 3 | 1 | 0.8471915 | 0.1210274 | 0.8149270 |
| 4 | 0 | 0.6128697 | 0.0875528 | 0.9024798 |
| 5 | 0 | 0.3236728 | 0.0462390 | 0.9487188 |
| 6 | 0 | 0.2185306 | 0.0312187 | 0.9799375 |
| 7 | 0 | 0.1404378 | 0.0200625 | 1.0000000 |
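The eigenvalue and explained-variance columns above can be reproduced in base R from the dataset's correlation matrix (a sketch of the underlying arithmetic, not psycho's code):

```r
# Eigenvalues and explained variance of the attitude dataset's correlation matrix
ev <- eigen(cor(attitude))$values
exp_var <- ev / sum(ev)  # each eigenvalue's share of total variance

# Kaiser criterion: keep components with eigenvalue > 1 (here, the first two)
round(cbind(Eigenvalues = ev,
            Exp.Variance = exp_var,
            Cum.Variance = cumsum(exp_var)), 4)
```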

We can also extract the final result (the optimal number of factors) for each method:

| Method | n_optimal |
|---|---|
| Optimal Coordinates | 1 |
| Acceleration Factor | 1 |
| Parallel Analysis | 1 |
| Eigenvalues (Kaiser Criterion) | 2 |
| Velicer MAP | 1 |
| BIC | 2 |
| Sample Size Adjusted BIC | 3 |
| VSS Complexity 1 | 1 |
| VSS Complexity 2 | 2 |

And, of course, plot it :)

`plot(results)`

This is possibly the most important function of the `psycho` package. Its goal is to transform complex outputs of complex statistical routines into something readable, interpretable, and formatted. It is designed to work with frequentist and Bayesian mixed models, a central statistical method in psychological science.

Let’s start by creating a dataframe similar to those found in psychological science.

```
set.seed(666)

df <- data.frame(Participant = as.factor(rep(1:25, each = 4)),
                 Item = rep_len(c("i1", "i2", "i3", "i4"), 100),
                 Condition = rep_len(c("A", "B", "A", "B", "B"), 20),
                 Error = as.factor(sample(c(0, 1), 100, replace = TRUE)),
                 RT = rnorm(100, 30, .2),
                 Stress = runif(100, 3, 5))

# Standardize the numeric variables
df <- psycho::standardize(df)

# Take a look at the first rows
head(df)
```

| Participant | Item | Condition | Error | RT | Stress |
|---|---|---|---|---|---|
| 1 | i1 | A | 1 | 0.2610666 | -1.5032539 |
| 1 | i2 | B | 0 | 1.2180393 | -1.3381086 |
| 1 | i3 | A | 1 | -0.6122813 | -1.6359922 |
| 1 | i4 | B | 0 | -0.5209097 | -0.1893384 |
| 2 | i1 | B | 0 | -0.4227089 | -0.2383872 |
| 2 | i2 | A | 1 | -0.7291277 | 1.1834562 |

This dataframe contains the data of 25 participants (labelled from 1 to 25). Each participant saw 4 different items (i1-i4) in two conditions (A and B). For each item, we measured whether the response was correct (Error), the reaction time (RT) and the stress associated with the trial.

In order to investigate the effect of the condition on the reaction time (RT), the traditional, ancient and obsolete routine is to 1) compute the mean for each participant, and 2) run an ANOVA.

```
# Format data
df_for_anova <- df %>%
  dplyr::group_by(Participant, Condition) %>%
  dplyr::summarise(RT = mean(RT))

# Run the ANOVA
anova <- aov(RT ~ Condition + Error(Participant), df_for_anova)
summary(anova)
```

```
Error: Participant
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 24 15.13 0.6304
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
Condition 1 0.725 0.7254 1.102 0.304
Residuals 24 15.793 0.6580
```

As we can see, the effect of the condition is not significant (unsurprisingly, as the data were generated randomly). One of the many flaws of this approach is that we lose information about intra-individual and item-related variability.
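This loss of information can be illustrated with a toy base-R example (hypothetical data generated here, separate from the `df` above):

```r
# Aggregating to participant means discards within-participant variability
set.seed(666)
rt <- rnorm(100, 30, 0.2)           # 25 participants x 4 trials
participant <- rep(1:25, each = 4)
participant_means <- tapply(rt, participant, mean)

var(rt)                 # total variability across all trials
var(participant_means)  # much smaller: the within-participant spread is gone
```

The ANOVA on means only "sees" the second, shrunken source of variance, which is one motivation for the mixed-model approach below.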

The use of the mixed-modelling framework allows us to add the items as random factors.

```
library(lme4)
fit <- lme4::lmer(RT ~ Condition + (1|Participant) + (1|Item), data=df)
# Traditional output
summary(fit)
```

```
Linear mixed model fit by REML ['lmerMod']
Formula: RT ~ Condition + (1 | Participant) + (1 | Item)
Data: df
REML criterion at convergence: 282.3
Scaled residuals:
Min 1Q Median 3Q Max
-2.8414 -0.6905 -0.0882 0.7010 2.4840
Random effects:
Groups Name Variance Std.Dev.
Participant (Intercept) 0.0000 0.0000
Item (Intercept) 0.1036 0.3219
Residual 0.9251 0.9618
Number of obs: 100, groups: Participant, 25; Item, 4
Fixed effects:
Estimate Std. Error t value
(Intercept) 0.09239 0.22142 0.417
ConditionB -0.15398 0.19633 -0.784
Correlation of Fixed Effects:
(Intr)
ConditionB -0.532
```

As the output is a bit messy, the `analyze()` function will munge it into something nicely formatted.

```
results <- psycho::analyze(fit)
# We can extract a formatted summary table
summary(results, round = 2)
```

| | Effect_Size | Coef | SE | t | df | Coef.std | SE.std | p |
|---|---|---|---|---|---|---|---|---|
| (Intercept) | Very Small | 0.09 | 0.22 | 0.42 | 5.81 | 0.00 | 0.0 | 0.69 |
| ConditionB | Very Small | -0.15 | 0.20 | -0.78 | 95.00 | -0.08 | 0.1 | 0.43 |

We can also print it in a text format!

`print(results)`

```
The overall model predicting ... successfully converged and explained 10.57% of the variance of the outcome (the conditional R2). The variance explained by the fixed effects was 0.56% (the marginal R2) and that explained by the random effects was 10.01%.
- The effect of (Intercept) was [NOT] significant (beta = 0.092, SE = 0.22, t(5.81) = 0.42, p > .1) and can be considered as very small (std. beta = 0, std. SE = 0).
- The effect of ConditionB was [NOT] significant (beta = -0.15, SE = 0.20, t(95.00) = -0.78, p > .1) and can be considered as very small (std. beta = -0.076, std. SE = 0.097).
```

However, as the frequentist framework has been criticized, it is often advised to switch to the Bayesian framework. Unfortunately, the interpretation of these models remains unfamiliar to regular psychologists. But stay calm, because `analyze()` handles this for you.

```
library(rstanarm)

fit <- rstanarm::stan_lmer(RT ~ Condition + (1|Participant) + (1|Item), data=df)

# Formatted output
results <- psycho::analyze(fit, effsize=TRUE)
summary(results, round=2)
```

| Variable | MPE | Median | MAD | Mean | SD | 95_CI_lower | 95_CI_higher | Very_Large | Large | Medium | Small | Very_Small | Opposite |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| (Intercept) | 64.18 | 0.09 | 0.26 | 0.10 | 0.34 | -0.57 | 0.72 | 0 | 0.02 | 0.05 | 0.26 | 0.31 | 0.36 |
| ConditionB | 79.50 | -0.16 | 0.19 | -0.16 | 0.19 | -0.52 | 0.23 | 0 | 0.00 | 0.04 | 0.38 | 0.38 | 0.20 |

`print(results)`

```
We fitted a Markov Chain Monte Carlo [type] model to predict [Y] with [X] (formula = RT ~ Condition + (1 | Participant) + (1 | Item)). Priors were set as follows: [INSERT INFO ABOUT PRIORS].
- Concerning the effect of (Intercept), there is a probability of 64.18% that its coefficient is between 0 and 3.05 (Median = 0.092, MAD = 0.26, 95% CI [-0.57, 0.72], MPE = 64.18%).
- Based on Cohen (1988) recommendations, there is a probability of 0.48% that this effect size is very large, 1.98% that this effect size is large, 5.30% that this effect size is medium, 25.50% that this effect size is small, 30.93% that this effect is very small and 35.83% that it has an opposite direction (between 0 and 3.2e-05).
- Concerning the effect of ConditionB, there is a probability of 79.50% that its coefficient is between -0.87 and 0 (Median = -0.16, MAD = 0.19, 95% CI [-0.52, 0.23], MPE = 79.50%).
- Based on Cohen (1988) recommendations, there is a probability of 0% that this effect size is very large, 0.050% that this effect size is large, 3.52% that this effect size is medium, 38.12% that this effect size is small, 37.80% that this effect is very small and 20.50% that it has an opposite direction (between 0 and 0.52).
```

We can also plot the effects:

`plot(results)`

Obviously, you need to learn more about Bayesian analyses before running them. You can find more information in rstanarm's vignettes.
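For the curious, the MPE reported above is simply the share of the posterior lying on the median's side of zero. This can be sketched in base R with simulated draws (a normal stand-in for real MCMC samples, using ConditionB's rough posterior location and spread):

```r
# Illustrating the MPE: proportion of posterior mass sharing the median's sign
# Simulated normal draws stand in for actual MCMC posterior samples
set.seed(42)
posterior <- rnorm(1e5, mean = -0.16, sd = 0.19)  # roughly ConditionB's posterior

mpe <- mean(posterior < 0) * 100  # median is negative, so count mass below zero
round(mpe, 1)  # close to the 79.5% reported for ConditionB
```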