LikertMakeR synthesises and correlates Likert-scale and related rating-scale data. You decide the mean and standard deviation, and (optionally) the correlations among vectors, and the package will generate data with those same predefined properties.
The package generates a column of values that simulate the same properties as a rating scale. If multiple columns are generated, then you can use LikertMakeR to rearrange the values so that the new variables are correlated as closely as possible to a user-defined correlation matrix.
The package should be useful for teaching in the Social Sciences, and for scholars who wish to “replicate” rating-scale data for further analysis and visualisation when only summary statistics have been reported.
I was prompted to write the functions in LikertMakeR after reviewing too many journal article submissions where authors presented questionnaire results with only means and standard deviations (often only the means), with no understanding of the real distributions. Hopefully, this tool will help researchers, teachers, and other reviewers to think more carefully about rating-scale distributions and the effects of variance, boundaries, and the number of items in a scale.
A Likert scale is the mean, or sum, of several ordinal rating scales. These items are bipolar (usually “agree-disagree”) responses to propositions that are determined to be moderately to highly correlated and that capture various facets of a construct.
Rating scales, such as Likert scales, are not continuous or unbounded.
For example, a 5-point Likert scale that is constructed with, say, five items (questions) will have a summed range of between 5 (all rated ‘1’) and 25 (all rated ‘5’) with all integers in between, and the mean range will be ‘1’ to ‘5’ with intervals of 1/5=0.20. A 7-point Likert scale constructed from eight items will have a summed range between 8 (all rated ‘1’) and 56 (all rated ‘7’) with all integers in between, and the mean range will be ‘1’ to ‘7’ with intervals of 1/8=0.125.
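The arithmetic above can be checked in a few lines of base R. `scale_values()` is a hypothetical helper written only for this illustration (it is not part of LikertMakeR): it lists every attainable summed score and the corresponding item mean for `k` items rated from `lo` to `hi`. The last lines show a scale score computed as the row sum or row mean of some made-up item responses.

```r
# Possible values of a Likert scale built from k items, each rated on an
# integer scale from lo to hi (hypothetical helper; base R only)
scale_values <- function(lo, hi, k) {
  sums  <- seq(k * lo, k * hi)   # every attainable summed score
  means <- sums / k              # the same scores expressed as item means
  list(sums = sums, means = means, step = 1 / k)
}

five_by_five <- scale_values(lo = 1, hi = 5, k = 5)
range(five_by_five$sums)    # 5 25
head(five_by_five$means)    # 1.0 1.2 1.4 1.6 1.8 2.0

seven_by_eight <- scale_values(lo = 1, hi = 7, k = 8)
range(seven_by_eight$sums)  # 8 56
seven_by_eight$step         # 0.125

# A scale score is the row sum (or row mean) of the item responses;
# three made-up respondents answering five 5-point items
items <- rbind(
  c(1, 2, 2, 3, 1),
  c(5, 5, 4, 5, 5),
  c(3, 3, 3, 3, 3)
)
rowSums(items)   # 9 24 15
rowMeans(items)  # 1.8 4.8 3.0
```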
Rating-scale boundaries define minima and maxima for any scale values. If the mean is close to one boundary, then data points will gather more closely to that boundary and the data will be skewed.
Download and install LikertMakeR from GitHub.
```r
library(devtools)
install_github("WinzarH/LikertMakeR")

# load the package
library(LikertMakeR)
```
To synthesise a rating scale with LikertMakeR, the user must input the following parameters:
n: sample size
mean: desired mean
sd: desired standard deviation
lowerbound: desired lower bound
upperbound: desired upper bound
items: number of items making up the scale (default = 1)
seed: optional seed for reproducibility
LikertMakeR offers two functions for synthesising a rating scale: lfast() and lexact().
```r
## a four-item, five-point Likert scale
x <- lfast(
  n = 512,
  mean = 2.0,
  sd = 1.0,
  lowerbound = 1,
  upperbound = 5,
  items = 4
)

## an 11-point likelihood-of-purchase scale
x <- lfast(256, 2, 2, 0, 10)
```
lexact() attempts to produce a vector with exact first and second moments. It uses the Differential Evolution algorithm in the DEoptim package to find appropriate values within the desired constraints. The DEoptim package is described in Mullen, Ardia, Gil, Windover, & Cline (2011) doi:10.18637/jss.v040.i06.
If the request is feasible, lexact() should produce data with moments that are correct to two decimal places. Infeasible cases occur when the requested standard deviation is too large for the combination of mean, number of items, and scale boundaries.
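One quick feasibility check, necessary but not sufficient, uses the Bhatia-Davis inequality: any data confined to [lowerbound, upperbound] with mean m has a population variance of at most (m - lowerbound)(upperbound - m). The helper `max_sd()` below is a hypothetical sketch, not a LikertMakeR function. The number of items does not enter the bound itself, although the discreteness it implies can still make values just under the bound hard to hit exactly.

```r
# Upper bound on the standard deviation of any data confined to
# [lowerbound, upperbound] with a given mean (Bhatia-Davis inequality).
# Hypothetical helper; a necessary condition only, since it ignores the
# discrete steps of a rating scale.
max_sd <- function(mean, lowerbound, upperbound, n = Inf) {
  pop_var <- (mean - lowerbound) * (upperbound - mean)
  # sd() in R divides by n - 1, which inflates the variance by n / (n - 1)
  corr <- if (is.finite(n)) n / (n - 1) else 1
  sqrt(pop_var * corr)
}

max_sd(2.5, 1, 5)    # about 1.94: sd = 1.0 is comfortably feasible
max_sd(2.0, 0, 10)   # 4: sd = 1.8 is below the bound
max_sd(1.2, 1, 5)    # about 0.87: sd = 1.0 would be infeasible
```

The bound is reached only when every observation sits on one of the two boundaries, which is why requests near it push the data into an extreme, two-spike shape.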
```r
x <- lexact(
  n = 64,
  mean = 2.5,
  sd = 1.0,
  lowerbound = 1,
  upperbound = 5,
  items = 4
)
#>
#> ***** summary of DEoptim object *****
#> best member : 6 12 18 14 11 12 9 12 5 11 19 6 12 8 6 4 14 8 17 11 12 9 19 12 12 6 12 6 6 13 4 12 12 7 7 10 13 9 5 12 4 4 9 8 6 9 14 8 5 5 8 17 8 11 18 12 7 12 12 17 6 10 6 11
#> best value : 0
#> after : 32 generations
#> fn evaluated : 21120 times
#> *************************************
```
```r
x <- lexact(64, 2, 1.8, 0, 10)
#>
#> ***** summary of DEoptim object *****
#> best member : 0 6 1 1 2 0 1 1 3 1 3 3 1 5 4 1 4 1 0 4 1 5 3 4 1 0 2 1 2 1 2 0 0 0 1 6 4 0 0 0 2 1 0 2 3 1 6 1 0 2 1 6 3 1 3 1 2 1 2 1 5 2 1 6
#> best value : 0.0028
#> after : 640 generations
#> fn evaluated : 410240 times
#> *************************************
```
LikertMakeR offers another function, lcor(), which rearranges the values in the columns of a data frame so that they are correlated at a specified level. It does not change the values; it swaps their positions within each column, so univariate statistics do not change, but correlations with other columns do.
lcor() systematically selects pairs of values in a column and swaps their places, and checks to see if this swap improves the correlation matrix. If the revised data-frame produces a correlation matrix closer to the target correlation matrix, then the swap is retained. Otherwise, the values are returned to their original places. This process is iterated across each column.
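A minimal base-R sketch of the greedy swap-and-keep procedure described above. It is not lcor()'s actual code: `swap_correlate()`, its arguments, the sum-of-squared-differences loss, and the random (rather than systematic) choice of row pairs are all assumptions made for illustration.

```r
# Hypothetical sketch of a swap-and-keep search; NOT lcor()'s source code.
swap_correlate <- function(data, target, passes = 5, seed = 1) {
  set.seed(seed)
  m <- as.matrix(data)
  # distance between the current correlation matrix and the target
  loss <- function(mat) sum((cor(mat) - target)^2)
  best <- loss(m)
  n <- nrow(m)
  for (pass in seq_len(passes)) {
    for (col in seq_len(ncol(m))) {
      for (step in seq_len(n)) {
        ij <- sample.int(n, 2)            # pick two distinct rows at random
        m[ij, col] <- m[rev(ij), col]     # swap their values in this column
        new <- loss(m)
        if (new < best) {
          best <- new                     # keep a swap that improves the fit
        } else {
          m[ij, col] <- m[rev(ij), col]   # otherwise swap back
        }
      }
    }
  }
  as.data.frame(m)
}

# usage: two independent 5-point items, pushed towards r = 0.6
set.seed(42)
dat <- data.frame(
  a = sample(1:5, 100, replace = TRUE),
  b = sample(1:5, 100, replace = TRUE)
)
tgt <- matrix(c(1, 0.6, 0.6, 1), nrow = 2)
res <- swap_correlate(dat, tgt)
round(cor(res), 2)
```

Because each swap only exchanges values within one column, every column keeps exactly the same set of values (and hence the same mean and standard deviation); only the correlations move.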
To create the desired correlated data, the user must supply two inputs:
data: a starter data frame of rating scales
target: the target correlation matrix
Let’s generate some data: three 5-point Likert scales, each with five items.
```r
## describe a target correlation matrix
tgt3 <- matrix(
  c(
    1.00, 0.80, 0.75,
    0.80, 1.00, 0.90,
    0.75, 0.90, 1.00
  ),
  nrow = 3
)
```
So now we have a data-frame with desired first and second moments, and a target correlation matrix.
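Before calling lcor(), it may be worth confirming that the target is a valid correlation matrix: square, symmetric, with ones on the diagonal, entries in [-1, 1], and positive definite. A set of pairwise correlations that cannot coexist cannot be realised by any data, however the values are swapped. `is_valid_cor()` is a hypothetical helper, not part of LikertMakeR, and requiring strict positive definiteness is slightly stricter than mathematically necessary.

```r
# Sanity checks on a target correlation matrix (hypothetical helper)
is_valid_cor <- function(target, tol = 1e-8) {
  is.matrix(target) &&
    nrow(target) == ncol(target) &&
    isSymmetric(target) &&
    all(abs(diag(target) - 1) < tol) &&
    all(abs(target) <= 1) &&
    min(eigen(target, symmetric = TRUE, only.values = TRUE)$values) > tol
}

tgt3 <- matrix(
  c(1.00, 0.80, 0.75,
    0.80, 1.00, 0.90,
    0.75, 0.90, 1.00),
  nrow = 3
)
is_valid_cor(tgt3)   # TRUE

# impossible pattern: x1 and x2 both strongly positive with x3,
# yet strongly negative with each other
bad <- matrix(c(1, -0.9, 0.9, -0.9, 1, 0.9, 0.9, 0.9, 1), nrow = 3)
is_valid_cor(bad)    # FALSE
```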
```r
## apply lcor function
new3 <- lcor(mydat3, tgt3)
```
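The result is a new data frame with correlations close to our desired correlation matrix (shown rounded to two decimal places). Most pairs match the target at this precision, but the correlation between x2 and x3 reaches 0.85 rather than the requested 0.90: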
```
#>      x1   x2   x3
#> x1 1.00 0.80 0.75
#> x2 0.80 1.00 0.85
#> x3 0.75 0.85 1.00
```
LikertMakeR is intended for synthesising & correlating rating-scale data with means, standard deviations, and correlations as close as possible to predefined parameters. If you don’t need your data to be close to exact, then other options may be faster or more flexible.
Different approaches include:
sampling from a truncated normal distribution. Data are sampled from a normal distribution, truncated to suit the rating-scale boundaries, and rounded to the discrete values we see in rating scales. See Heinz (2021) for an excellent, short example using the following packages:
See also the rLikert() function from the responsesR package, Lalovic (2021), for an approach using optimal discretization and a skew-normal distribution.
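A base-R sketch of the truncate-and-round idea, for illustration only: `sim_truncnorm_likert()` is a hypothetical helper, and it truncates by discarding normal draws that would round to a value outside the scale, then rounds the survivors. Heinz (2021) uses dedicated packages rather than this code.

```r
# Hypothetical sketch: truncated normal draws, rounded to scale points
sim_truncnorm_likert <- function(n, mean, sd, lowerbound, upperbound) {
  out <- numeric(0)
  while (length(out) < n) {
    draws <- rnorm(n, mean, sd)
    # truncation: discard draws that would round outside the scale
    keep <- draws > lowerbound - 0.5 & draws < upperbound + 0.5
    out <- c(out, draws[keep])
  }
  round(out[seq_len(n)])   # discretise to whole scale points
}

set.seed(123)
y <- sim_truncnorm_likert(n = 200, mean = 2, sd = 1.2,
                          lowerbound = 1, upperbound = 5)
table(y)
c(mean = mean(y), sd = sd(y))   # generally differs from 2 and 1.2
```

Truncation changes the moments: with a mean near the bottom of the scale, more of the lower tail is cut away than the upper tail, so the sample mean tends to land above the normal's mean parameter. This is one reason such approaches are approximate rather than exact.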
sampling with a predetermined probability distribution
```r
n <- 128
sample(1:5, n,
  replace = TRUE,
  prob = c(0.1, 0.2, 0.4, 0.2, 0.1)
)
```
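With this approach you fix the response probabilities rather than the moments. As a small illustration in base R, the probabilities above imply a population mean of 3 and a standard deviation of sqrt(1.2), about 1.095; any single sample of 128 draws will only approximate these values.

```r
# Population moments implied by a fixed probability distribution
values <- 1:5
prob   <- c(0.1, 0.2, 0.4, 0.2, 0.1)
mu    <- sum(values * prob)                   # 3
sigma <- sqrt(sum((values - mu)^2 * prob))    # sqrt(1.2), about 1.095
c(mean = mu, sd = sigma)

# one sample: close to, but not exactly, the population moments
set.seed(2024)
z <- sample(values, 128, replace = TRUE, prob = prob)
c(mean = mean(z), sd = sd(z))
```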
Marginal model specification as in Touloumis (2016) and Grønneberg et al. (2022) using:
Grønneberg, S., Foldnes, N., & Marcoulides, K. M. (2022). covsim: An R Package for Simulating Non-Normal Data for Structural Equation Models Using Copulas. Journal of Statistical Software, 102(1), 1–45. doi:10.18637/jss.v102.i03
Heinz, A. (2021). Simulating Correlated Likert-Scale Data In R: 3 Simple Steps [blog post]. https://glaswasser.github.io/simulating-correlated-likert-scale-data/
Lalovic, M. (2021). responsesR: Simulate Likert scale item responses. https://github.com/markolalovic/responsesR
Mullen, K. M., Ardia, D., Gil, D. L., Windover, D., & Cline, J. (2011). DEoptim: An R Package for Global Optimization by Differential Evolution. Journal of Statistical Software, 40(6), 1–26. doi:10.18637/jss.v040.i06
Touloumis, A. (2016). Simulating Correlated Binary and Multinomial Responses under Marginal Model Specification: The SimCorMultRes Package. The R Journal, 8(2), 79–91. doi:10.32614/RJ-2016-034