Weighted regression-based norming

Sebastian Gary, Wolfgang Lenhard, Alexandra Lenhard


Automatic post-stratification through weighted regression-based norming in cNORM

Representativeness of the norm sample is essential for the estimation of valid norm scores. To achieve this, random sampling is usually applied. But even if there are no systematic biases in data collection, the resulting sample might deviate from the population composition. The cNORM R package offers functionality to integrate sampling weights into the norming process and, therefore, to reduce negative effects of non-representative norm samples on the quality of the norm scores. For this purpose, raking (also known as iterative proportional fitting) has been integrated into cNORM. It allows post-stratifying the norm sample with respect to one or more stratification variables (SVs), given the population marginals of these SVs.

Problem of non-representative norm samples

Non-representative norm samples, i.e., norm samples that do not represent the target population with respect to one or more relevant stratification variables (Kruskal & Mosteller, 1979), can reduce the quality of the norm scores of a test. This is especially true for SVs that influence a person’s true latent ability. For example, neglecting parents’ educational background when estimating norm scores for an intelligence test for children may result in a systematic over- or underestimation of norm scores and, therefore, of a child’s true intelligence (Hernandez et al., 2017). Norm scores often serve as the basis for far-reaching decisions, such as school placement or the diagnosis of learning disabilities (Gary & Lenhard, 2021; Lenhard, Lenhard & Gary, 2019; Lenhard & Lenhard, 2021). Biased norm scores can therefore ultimately disadvantage the individuals being examined. Countermeasures such as sample weighting are needed to reduce the effects of non-representative norm samples.

Post-stratification through iterative proportional fitting

Raking, also called iterative proportional fitting, is a post-stratification approach that aims to enhance sample representativeness with respect to two or more stratification variables. For this purpose, a sample weight is computed for every case in the norm sample. The weight is based on the ratio between the proportion of the corresponding stratum in the target population and its proportion in the actual norm sample (Lumley, 2011). The procedure can be described as an iterative post-stratification with respect to one variable at a time.

For example, assume a target population consisting of 49% female and 51% male persons, while the norm sample contains 45% female and 55% male subjects. To enhance the representativeness of the norm sample with respect to the SV sex (female/male), every female case would be weighted with \(w_{female}=\frac{49\%}{45\%}\approx 1.09\). Every male case would be weighted with \(w_{male}=\frac{51\%}{55\%}\approx 0.93\).

To stratify a norm sample with respect to two or more variables, for example sex (female/male) and education (low/medium/high), this adjustment is applied iteratively to the marginals of one variable at a time. If the weights are first adjusted with respect to sex, they are adjusted with respect to education in the second step. After the second step, the weights no longer necessarily reproduce the population proportions of sex. They are therefore readjusted to sex in the third step, to education in the fourth step, and so on until the raking weights converge. The resulting weighted norm sample then represents the target population with respect to the marginal proportions of the SVs used. Each case is assigned a weight such that the weighted proportions of the strata in the norm sample match the composition of the target population.
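The raking procedure described above can be sketched in a few lines of base R. This is a minimal illustration under our own assumptions, not cNORM's internal implementation:

- The function name ‘rake_weights()’ is hypothetical.
- The margins use the var/level/prop layout described in this vignette.
- Convergence is judged by the largest change of any weight over a full cycle.

The worked example reproduces the weights of the sex example.

```r
# Minimal raking (iterative proportional fitting) sketch in base R.
# Hypothetical function; assumes 'margins' has columns var, level, prop.
# This is an illustration, not cNORM's internal implementation.
rake_weights <- function(data, margins, tol = 1e-10, max_iter = 100) {
  w <- rep(1, nrow(data))
  vars <- unique(as.character(margins$var))
  for (iter in seq_len(max_iter)) {
    w_old <- w
    # One post-stratification step per stratification variable
    for (v in vars) {
      m <- margins[margins$var == v, ]
      total <- sum(w)
      w_new <- w
      for (i in seq_len(nrow(m))) {
        idx <- data[[v]] == m$level[i]
        share <- sum(w[idx]) / total               # current weighted sample share
        w_new[idx] <- w[idx] * m$prop[i] / share   # population / sample ratio
      }
      w <- w_new
    }
    # Stop once a full cycle over all variables no longer changes the weights
    if (max(abs(w - w_old)) < tol) break
  }
  w
}

# Worked example from the text: 45% female / 55% male in the sample,
# 49% female / 51% male in the population
sample_data <- data.frame(sex = rep(c("female", "male"), times = c(45, 55)))
sex_margins <- data.frame(var = "sex", level = c("female", "male"),
                          prop = c(0.49, 0.51))
w <- rake_weights(sample_data, sex_margins)
round(c(female = w[1], male = w[100]), 2)  # female 1.09, male 0.93
```

With a single variable, one pass is sufficient. With several associated variables, the cycle is repeated until the weights stabilize.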

Integration of raking weights in regression-based norming in cNORM

The integration of raking weights in cNORM is accomplished in three steps.

Step 1: Computation and standardization of raking weights

Raking weights are computed from the proportions of the SVs in the target population and in the actual norm sample. Afterwards, the raking weights are standardized by dividing every weight by the smallest raking weight. The smallest weight is thus set to 1.0, while the ratios between the weights remain unchanged. Consequently, all weights are at least 1.0, and cases from underrepresented strata receive larger weights than cases from overrepresented strata.

To compute the weights, provide a data frame with three columns that specifies the population marginals:

- The first column contains the name of the stratification variable.
- The second column contains the factor level of that variable.
- The third column contains the proportion of that level in the target population.

The function ‘computeWeights()’ retrieves the weights. The original data and the marginals are passed as function parameters.
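The standardization step can be illustrated with a small base R sketch. The helper name ‘standardize_weights()’ and the weight values are hypothetical and serve only to show that dividing by the minimum preserves the ratios between weights. It also leaves weighted proportions unchanged.

```r
# Hypothetical helper: standardize raking weights so the smallest equals 1.
# Dividing by a constant leaves all ratios between weights unchanged.
standardize_weights <- function(w) {
  w / min(w)
}

raw_weights <- c(0.93, 1.09, 0.93, 1.09, 1.20)  # illustrative values only
std_weights <- standardize_weights(raw_weights)
std_weights  # smallest weight is now 1, all others are larger

# Weighted share of the first two cases before and after standardizing
c(before = sum(raw_weights[1:2]) / sum(raw_weights),
  after  = sum(std_weights[1:2]) / sum(std_weights))
```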

Step 2: Weighted percentile estimation

Second, the norm sample is ranked using weighted percentiles based on the raking weights. This step marks the start of the actual regression-based norming approach. It is applied automatically by the ‘cnorm()’ function as soon as weights are specified.
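To make weighted ranking more concrete, the following base R sketch computes weighted percentile ranks within each group. It uses one common weighted analogue of the midpoint rule: the weight of all lower scores plus half the weight of tied scores, divided by the total weight of the group. It then converts the percentiles to T scores. The function names are hypothetical, and cNORM's exact ranking formula may differ.

```r
# Hypothetical sketch: weighted percentile ranks (midpoint rule).
# cNORM's exact ranking formula may differ from this illustration.
weighted_percentile <- function(raw, w) {
  total <- sum(w)
  vapply(seq_along(raw), function(i) {
    below <- sum(w[raw < raw[i]])   # weight of all lower raw scores
    tied  <- sum(w[raw == raw[i]])  # weight of tied raw scores (incl. case i)
    (below + 0.5 * tied) / total
  }, numeric(1))
}

# Rank separately within each (age) group
weighted_rank_by_group <- function(raw, group, w) {
  p <- numeric(length(raw))
  for (g in unique(group)) {
    idx <- group == g
    p[idx] <- weighted_percentile(raw[idx], w[idx])
  }
  p
}

raw   <- c(10, 20, 30, 12, 18, 25)
group <- c(1, 1, 1, 2, 2, 2)
w     <- c(1, 1, 2, 1, 1.5, 1)
p <- weighted_rank_by_group(raw, group, w)
t_scores <- 50 + 10 * qnorm(p)  # percentile -> T score (M = 50, SD = 10)
```

With equal weights, the formula reduces to the familiar (rank - 0.5) / n. The larger weight of the third case in group 1 moves the other cases' percentiles downward.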

Step 3: Regression-based norming with standardized raking weights

Finally, the standardized raking weights are used in a weighted best-subset regression to obtain an adequate norm model. The former steps can be seen as data preparation. The computation of the regression-based norm model is the actual norming process, since the resulting regression model maps the achieved raw scores to the assigned norm scores. Using the standardized raking weights in the weighted regression should reduce overfitting of the regression model to overrepresented data points. This third step is also applied automatically by the ‘cnorm()’ function.
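To illustrate how case weights enter a regression, the sketch below uses base R's ‘lm()’ with its ‘weights’ argument. This is an ordinary weighted least-squares fit on made-up data, not cNORM's best-subset search over powers of location and age. It shows that a case with weight k contributes to the coefficient estimates like k identical unweighted cases.

```r
# Weighted least squares with base R's lm() on illustrative data.
# A case with weight k contributes like k identical unweighted cases.
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.3)
w <- c(1, 1, 2, 1, 3)

fit_weighted <- lm(y ~ x, weights = w)

# Same data with every case repeated w times, fitted without weights
idx <- rep(seq_along(x), times = w)
fit_expanded <- lm(y[idx] ~ x[idx])

# Both fits yield identical intercept and slope
cbind(weighted = coef(fit_weighted), expanded = coef(fit_expanded))
```

Standardized raking weights give overrepresented cases relatively less influence on the fitted model than underrepresented cases.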


In the following, the use of raking weights in regression-based norming with cNORM is illustrated in detail. The example is based on a non-representative norm sample of the German version of the Peabody Picture Vocabulary Test (PPVT-IV).

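# Load cNORM, which includes the PPVT example data set
library(cNORM)

# Assign data to object norm.data and show the first rows
norm.data <- ppvt
head(norm.data)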
#>      age sex migration region raw    group
#> 1 2.5971   1         0   west 120 3.160655
#> 2 2.5993   1         0   west  67 3.160655
#> 3 2.6241   1         0   west  23 3.160655
#> 4 2.8622   1         0  south  50 3.160655
#> 5 2.8764   1         0  south  44 3.160655
#> 6 2.9308   1         0   west  55 3.160655

For the post-stratification, we need population marginals for the relevant stratification variables as a data frame, with each level of each stratification variable in a row. The data frame must contain the names of the SVs (column 1), the single levels (column 2) and the corresponding proportion in the target population (column 3).

# Generate population marginals
marginals <- data.frame(var = c("sex", "sex", "migration", "migration"),
                        level = c(1, 2, 0, 1),
                        prop = c(0.51, 0.49, 0.65, 0.35))
marginals
#>         var level prop
#> 1       sex     1 0.51
#> 2       sex     2 0.49
#> 3 migration     0 0.65
#> 4 migration     1 0.35
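Before computing weights, it can help to check the marginals for obvious mistakes. The following base R sketch uses a hypothetical helper, ‘check_marginals()’, which is not part of cNORM. It checks whether the proportions of each SV sum to 1 and whether every level occurring in the data has a population proportion. The toy data set stands in for the norm sample.

```r
# Hypothetical helper (not part of cNORM): sanity checks for population
# marginals in the var/level/prop layout before computing raking weights.
check_marginals <- function(data, margins, tol = 1e-6) {
  sums <- tapply(margins$prop, as.character(margins$var), sum)
  bad <- names(sums)[abs(sums - 1) > tol]
  if (length(bad) > 0)
    stop("Proportions do not sum to 1 for: ", paste(bad, collapse = ", "))
  for (v in names(sums)) {
    if (!v %in% names(data)) stop("Variable not found in data: ", v)
    # every level occurring in the sample needs a population proportion
    missing <- setdiff(unique(data[[v]]), margins$level[margins$var == v])
    if (length(missing) > 0)
      stop("No population proportion for level(s) of ", v, ": ",
           paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}

marginals <- data.frame(var = c("sex", "sex", "migration", "migration"),
                        level = c(1, 2, 0, 1),
                        prop = c(0.51, 0.49, 0.65, 0.35))
toy <- data.frame(sex = c(1, 2, 2, 1), migration = c(0, 0, 1, 1))
check_marginals(toy, marginals)  # passes silently
```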

To calculate the raking weights, cNORM’s ‘computeWeights()’ function is used, with the norm sample data and the population marginals as function parameters.

weights <- computeWeights(data = norm.data, population.margins = marginals)
#> Raking converged normally after 3 iterations.
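A simple way to verify the result is to compare the weighted proportions of each SV with the population marginals. The helper ‘weighted_shares()’ below is hypothetical. The toy data and weights correspond to a 55/45 sample raked to a 51/49 population. Applied to the PPVT example, ‘weighted_shares(norm.data, weights, "sex")’ would be expected to return values close to 0.51 and 0.49 if raking converged.

```r
# Hypothetical helper: weighted proportion of each level of one variable.
# After raking has converged, these shares should reproduce the population
# marginals; standardizing the weights does not change them.
weighted_shares <- function(data, weights, var) {
  tapply(weights, data[[var]], sum) / sum(weights)
}

toy <- data.frame(sex = rep(c(1, 2), times = c(55, 45)))
# raking weights for a 55/45 sample and a 51/49 population (sex = 1 / 2)
w <- ifelse(toy$sex == 1, 0.51 / 0.55, 0.49 / 0.45)
weighted_shares(toy, w, "sex")
weighted_shares(toy, w / min(w), "sex")  # same shares after standardizing
```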

Calling the ‘cnorm()’ function with the raking weights passed via the parameter ‘weights’ starts the initial weighted ranking and the actual norming process.

norm.model <- cnorm(raw = norm.data$raw, group = norm.data$group,
                    weights = weights)

The resulting model contains six terms and reaches a raw score RMSE of 3.60335.

#> Final solution: 6 terms
#> R-Square Adj. = 0.990042
#> Final regression model: raw ~ L2 + L1A1 + L1A2 + L2A1 + L2A3 + L4A1
#> Regression function: raw ~ -92.16983481 + (0.0195782225*L2) + (1.327109958*L1A1) + (-0.03851643094*L1A2) + (-0.01236535515*L2A1) + (1.320696396e-05*L2A3) + (3.158314092e-07*L4A1)
#> Raw Score RMSE = 3.60335
#> Post stratification was applied. The weights range from 1 to 1.415 (m = 1.116, sd = 0.182).

Moreover, the percentile plot reveals no signs of model violations, such as intersecting percentile curves. The model reaches a high adjusted R² (0.990) with only a few terms.

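# Model fit (adjusted R2) as a function of the number of terms
plot(norm.model, "subset")
# Manifest (observed) versus fitted norm scores
plot(norm.model, "norm")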

Caveats and recommendations for use

We extensively simulated biased distributions and assessed whether our approach can mitigate the effects of unrepresentative samples. cNORM itself already corrects for several types of sampling error, namely deviations in specific age groups and unbalanced joint probabilities of stratification variables (as long as the marginals are preserved). Weighted continuous norming also works very well in most, but not all, use cases. Please note the following: