This tutorial introduces mixed-effects models in R.1 Mixed-effects models are a family of regression models widely used in the language sciences to assess whether and how predictors correlate with a response when data are hierarchical — that is, when observations are grouped or nested within higher-level units such as speakers, texts, experimental items, or schools.
This tutorial is aimed at intermediate and advanced users of R. The goal is not to provide an exhaustive theoretical treatment but to show how to implement the most commonly used mixed-effects model types, perform appropriate diagnostics, and report results clearly and reproducibly.
Prerequisite Tutorials
This tutorial presumes familiarity with regression modelling. If you are new to regression or need a refresher, please work through the following tutorials first:
Explain what mixed-effects models are, when they are appropriate, and how they differ from fixed-effects models
Recognise when ignoring grouping structure can lead to completely wrong conclusions (Simpson’s Paradox)
Distinguish between random intercepts and random slopes, and between nested and crossed random effects
Apply appropriate contrast coding schemes (treatment, sum, Helmert, planned) for experimental and observational designs
Fit, evaluate, and interpret linear mixed-effects models using lme4 and nlme
Fit and interpret mixed-effects binomial logistic regression models
Choose between Poisson, quasi-Poisson, and negative binomial mixed-effects models for count data
Fit and interpret mixed-effects ordinal regression models using ordinal
Perform comprehensive model diagnostics using the performance package
Report mixed-effects model results clearly in an academic write-up
Citation
Schweinberger, Martin. 2026. Mixed-Effects Models in R. Brisbane: The Language Technology and Data Analysis Laboratory (LADAL). url: https://ladal.edu.au/tutorials/mixedmodel/mixedmodel.html (Version 2026.02.24).
Setup
Installing Packages
Code
# Run once — comment out after installationinstall.packages("Boruta")install.packages("car")install.packages("emmeans")install.packages("effects")install.packages("flextable")install.packages("ggplot2")install.packages("ggpubr")install.packages("glmulti")install.packages("Hmisc")install.packages("lme4")install.packages("MASS")install.packages("mclogit")install.packages("MuMIn")install.packages("nlme")install.packages("ordinal")install.packages("report")install.packages("rms")install.packages("robustbase")install.packages("sjPlot")install.packages("stringr")install.packages("tibble")install.packages("dplyr")install.packages("tidyr")install.packages("vcd")install.packages("gridExtra")install.packages("performance")install.packages("see") # required for performance plots
Loading Packages
Code
# set optionsoptions(stringsAsFactors =FALSE)options("scipen"=100, "digits"=12)# load packageslibrary(Boruta)library(car)library(effects)library(emmeans)library(flextable)library(ggplot2)library(ggpubr)library(glmulti)library(Hmisc)library(lme4)library(MASS)library(mclogit)library(MuMIn)library(nlme)library(ordinal)library(report)library(rms)library(robustbase)library(sjPlot)library(stringr)library(tibble)library(tidyr)library(dplyr)library(vcd)library(gridExtra)library(performance)library(see) # companion visualisation package for performance
What Are Mixed-Effects Models?
Section Overview
What you will learn: Why mixed-effects models are needed, how they differ from fixed-effects regression, what assumptions they make, and which model type to choose for a given research situation
The Problem: Hierarchical Data
Most regression models taught in introductory courses assume that all observations are independent of each other. In practice, data in the language sciences almost never meet this assumption. Observations are routinely grouped or nested:
Multiple utterances produced by the same speaker are more similar to each other than utterances produced by different speakers
Multiple sentences from the same text or document share authorial style
Multiple items in an experiment may differ in inherent difficulty, regardless of the manipulation
Multiple measurements from the same participant across trials are correlated over time
When data have this kind of grouping structure, treating observations as if they were independent underestimates standard errors, inflates Type I error rates, and produces unreliable coefficient estimates. Mixed-effects models solve this problem by explicitly modelling the grouping structure as part of the model.
A Simple Intuition
Imagine estimating how much people weigh based on their height. If your sample includes five people measured ten times each, you have 50 data points — but not 50 truly independent observations. The ten measurements from the same person are more similar to each other than they are to measurements from a different person. A fixed-effects model that ignores this will be overconfident. A mixed-effects model adds a separate intercept (and optionally a separate slope) for each person, soaking up the between-person variation and giving you honest estimates of the height–weight relationship.
Why Ignoring Speakers Can Be Deeply Misleading
A particularly striking consequence of ignoring the grouping structure of data is that the direction of an effect at the aggregate level can be the opposite of the direction within every individual speaker. This is a form of Simpson’s Paradox: a trend that appears in the pooled data reverses — or disappears entirely — when the data are examined within their natural groups.
The figure below illustrates this with simulated data that mirrors a common situation in language research. Suppose we are studying how speaking rate (words per minute) varies with utterance complexity across speakers. Looking at all data points together suggests a clear negative relationship — more complex utterances are produced faster. But when we look at each speaker individually, every single speaker shows a positive relationship: their speech rate increases with complexity.
Code
set.seed(2024)# Simulate data: 6 speakers, each with their own baseline rate# Within each speaker, rate INCREASES with complexity (positive slope)# But speakers with low baseline rate happen to produce more complex utterances# creating a NEGATIVE trend in the pooled datan_speakers <-6n_obs <-12# observations per speakerspeaker_baselines <-c(130, 150, 170, 190, 210, 230) # decreasing with complexity offsetcomplexity_offsets <-c(8, 6, 4, 2, -2, -4) # higher-baseline speakers get simpler itemssimdat <- purrr::map_dfr(seq_len(n_speakers), function(s) { complexity <-seq(1, n_obs) + complexity_offsets[s] +rnorm(n_obs, 0, 1) rate <- speaker_baselines[s] +3* (complexity -mean(complexity)) +rnorm(n_obs, 0, 5)data.frame(Speaker =paste0("S", s),Complexity = complexity,Rate = rate )})# Aggregate trend (ignoring speakers) — NEGATIVEp_pooled <-ggplot(simdat, aes(x = Complexity, y = Rate)) +geom_point(alpha =0.35, color ="gray40", size =1.5) +geom_smooth(method ="lm", se =TRUE, color ="firebrick", linewidth =1.2) +annotate("text", x =min(simdat$Complexity) +0.5,y =max(simdat$Rate) -5,label ="Pooled: NEGATIVE trend", hjust =0,color ="firebrick", size =3.5, fontface ="bold") +theme_bw() +labs(title ="Pooled data (speakers ignored)",subtitle ="A fixed-effects model would conclude: more complexity → slower speech",x ="Utterance complexity", y ="Speaking rate (wpm)")# Within-speaker trends — all POSITIVEp_byspeaker <-ggplot(simdat, aes(x = Complexity, y = Rate,color = Speaker, group = Speaker)) +geom_point(size =1.5, alpha =0.6) +geom_smooth(method ="lm", se =FALSE, linewidth =0.9) +annotate("text", x =min(simdat$Complexity) +0.3,y =max(simdat$Rate) -5,label ="Within each speaker: POSITIVE trend",hjust =0, color ="gray20", size =3.5, fontface ="bold") +theme_bw() +theme(legend.position ="right") +labs(title ="By-speaker data (mixed-effects view)",subtitle ="Every speaker shows: more complexity → faster speech",x ="Utterance complexity", y ="Speaking rate (wpm)")ggpubr::ggarrange(p_pooled, p_byspeaker, ncol =2, widths =c(1, 1.2))
The Simpson’s Paradox Lesson
The pooled regression line tells a completely false story. The apparent negative trend in the left panel arises purely because speakers with higher baseline speaking rates happened to be assigned (or self-selected into) utterances of lower complexity — a between-speaker confound that pollutes the aggregate estimate.
A mixed-effects model with a random intercept for Speaker separates the within-speaker effect of complexity (the true relationship of interest) from the between-speaker variation in baseline rate, recovering the correct positive slope.
This is not a contrived pathological case. It happens routinely in:
Corpus data: high-frequency words tend to be shorter — but within each word frequency band, longer words are used in more predictable contexts, confounding length–predictability correlations
Reaction time experiments: slower participants may naturally be tested on harder items
Longitudinal data: earlier time-points in a study tend to involve different conditions than later time-points, creating spurious temporal trends
Always model the grouping structure. A fixed-effects model applied to grouped data may not just be imprecise — it may point in entirely the wrong direction.
Fixed Effects vs. Random Effects
Mixed-effects models are called mixed because they contain both fixed effects and random effects:
Fixed effects are the predictors you are directly interested in — the variables whose coefficients you want to estimate and interpret. They are fixed in the sense that, if you were to repeat the study, you would use the same levels (e.g. Male vs. Female, Primed vs. Unprimed). Fixed effects are the same as the predictors in an ordinary regression model.
Random effects represent the grouping structure of the data. They are random in the sense that the levels observed in your data are treated as a random sample from a larger population of possible levels. Common random effects in linguistics include:
Examples of random effects in language research
Random effect
Why it is random
Speaker / participant
The 30 speakers in your study are a sample from all possible speakers
Text / document
The 50 texts in your corpus are a sample from all possible texts
Word / lexical item
The 200 words tested are a sample from the mental lexicon
Experimental item
The 40 sentences in your experiment are a sample from all possible sentences
School / class
The 20 classes in your study are a sample from all possible classes
Key Rules for Random Effects
Random effects must be categorical (nominal or ordinal with many levels). They cannot be continuous.
Random effects should have many levels. A useful rule of thumb: use a variable as a fixed effect if it has fewer than 5 levels; consider it as a random effect if it has more than 10 levels (Zuur, Hilbe, and Ieno 2013). Variables with 5–10 levels can go either way, but theoretical justification (random sample, nestedness) should guide the decision.
Random effects represent a random sample. If you would expect the levels to change with every new sample you drew, it is a random effect. If the levels would always be the same (e.g. Male/Female), it is a fixed effect.
Random effects absorb residual grouping variance rather than being of direct inferential interest.
When to Use Mixed-Effects Models
Use a mixed-effects model whenever:
The same participant, speaker, or item contributes more than one observation to the data (repeated measures)
Observations are nested within higher-level units (students within classes, words within texts)
You want to generalize beyond the specific levels in your data (to all possible speakers, not just the 30 in your sample)
A fixed-effects model produces a singular fit warning suggesting random variance is being absorbed into fixed effects
Do not use a mixed-effects model simply to control for a nuisance variable that has only a few levels (e.g. three experimental conditions). Use fixed effects for that purpose.
Assumptions
Mixed-effects models share the core assumptions of their fixed-effects counterparts but add a few specific to the random effects structure:
Shared with fixed-effects regression: - The relationship between predictors and response is correctly specified (correct functional form) - Residuals are independent within groups (after accounting for the random effects) - Homoscedasticity (for linear models): residual variance is constant - No severe multicollinearity among fixed-effects predictors
Specific to mixed-effects models: - Random effects are normally distributed with mean zero. This applies to both random intercepts and random slopes. - Random effects and fixed-effects predictors are uncorrelated (an assumption that can be violated; see Mundlak correction for remedies) - Sufficient sample size at each level. There is no universally agreed minimum, but very small numbers of observations per random effect level (e.g. fewer than 5) can make random effect variance estimates unreliable.
How Large Does My Sample Need to Be?
Bell, Ferron, and Kromrey (2008) showed in simulation that small numbers of observations per random-effect level (even as few as one) do not cause serious bias in fixed-effects coefficients, as long as the total number of random-effect levels is reasonably large (500 or more). However, random-effect variance estimates themselves become unreliable with very few observations per level. As a practical guideline, aim for at least 5–10 observations per random-effect level for reliable variance estimates, and at least 20–30 random-effect levels for the random-effect structure to be worthwhile.
Which Model Type?
The choice of mixed-effects model type mirrors the choice in fixed-effects regression and depends on the type of dependent variable:
Choosing a mixed-effects model type by dependent variable
Dependent variable
Model type
Key function
Continuous (numeric)
Linear mixed-effects
lmer() in lme4; lme() in nlme
Binary (0/1, yes/no)
Logistic mixed-effects
glmer(..., family = binomial)
Count (non-negative integers)
Poisson / NB mixed-effects
glmer(..., family = poisson); glmer.nb()
Ordered categorical (Likert)
Ordinal mixed-effects
clmm() in ordinal
Unordered categorical (3+ levels)
Multinomial mixed-effects
mblogit() in mclogit
✎ Check Your Understanding — Question 1
Which of the following is the most important reason to use a mixed-effects model rather than a fixed-effects model?
Mixed-effects models always produce lower AIC values
Mixed-effects models account for the non-independence of observations grouped within higher-level units
Mixed-effects models do not require normally distributed residuals
Mixed-effects models can handle more predictors than fixed-effects models
Answer
b) Mixed-effects models account for the non-independence of observations grouped within higher-level units
This is the core motivation for mixed-effects models. When multiple observations come from the same speaker, text, or participant, they are correlated — a fixed-effects model treats them as independent, underestimates standard errors, and inflates Type I error. Mixed-effects models absorb this grouping structure through random effects, producing honest standard errors and reliable inferential tests. The other options are all incorrect: mixed-effects models do not always have lower AIC (simpler fixed-effects models can win), they still assume normally distributed random effects, and there is no advantage in the number of predictors they can accommodate.
Random Effects Structure
Section Overview
What you will learn: The difference between random intercepts and random slopes, and between nested and crossed random effects, with linguistic examples of each
Random Intercepts vs. Random Slopes
The random effects structure of a mixed-effects model can take several forms. Understanding the distinction is essential for correctly specifying your model.
Random Intercepts
A random intercept model allows each level of the grouping variable to have its own intercept — its own baseline level of the response — while sharing a common slope across all groups. This is the most commonly used random effects structure.
In lme4 notation, a random intercept for Speaker is written as (1 | Speaker). The 1 represents the intercept.
Example: You are studying how speech rate (words per minute) varies with utterance length across speakers. A random intercept model gives each speaker their own baseline speech rate while estimating a single common effect of utterance length:
SpeechRate ~ UtteranceLength + (1 | Speaker)
Random Slopes
A random slope model allows the effect of a predictor to vary across levels of the grouping variable — each group has its own slope for that predictor. Random slopes can be added with or without random intercepts, but in practice they are almost always combined with random intercepts.
In lme4 notation, a random slope for UtteranceLength by Speaker is written as (1 + UtteranceLength | Speaker) or equivalently (UtteranceLength | Speaker).
Example: You suspect that the effect of utterance length on speech rate may be stronger for some speakers than others — some speakers slow down substantially for long utterances, others do not. A random slope model allows this:
The orange line in each panel shows the overall (population-level) regression line. In the random-intercept panel, the group lines are parallel — they share the same slope but differ in their starting point. In the random-intercepts-and-slopes panel, the lines are no longer parallel — both the intercept and the direction of the effect differ by group.
Which Random Effects Structure Should I Use?
Winter (2019, 241–44) provides an excellent practical guide. The short version:
Start with the maximal random effects structure justified by the design — that is, include random slopes for every within-group predictor (barr2013random?)
If the maximal model fails to converge or produces a singular fit, simplify by removing random slope correlations first (use (1 + x || Group) instead of (1 + x | Group)), then remove random slopes starting with those explaining the least variance
Always justify your final random effects structure explicitly when reporting results
Nested vs. Crossed Random Effects
A second important distinction is whether random effects are nested or crossed.
Nested Random Effects
Random effects are nested when the levels of one grouping variable are entirely contained within levels of another. For example, students are nested within classes — no student belongs to more than one class.
In lme4 notation, students nested within classes is written as (1 | Class/Student) or equivalently (1 | Class) + (1 | Class:Student).
Linguistic example: Words are nested within texts, which are nested within authors:
FrequencyEffect ~ WordLength + (1 | Author/Text)
Crossed Random Effects
Random effects are crossed when every level of one grouping variable appears in combination with every (or most) levels of another. In a typical psycholinguistic experiment, every participant sees every item — participants and items are fully crossed.
In lme4 notation, crossed random effects for participants and items are written as two separate terms:
RT ~ Condition + (1 | Participant) + (1 | Item)
A Common Mistake
Many researchers who should be fitting crossed random effects for participants and items mistakenly aggregate over items and fit only a by-participant random effect (or vice versa). (clark1973language?) famously demonstrated that ignoring item-level random effects inflates Type I error substantially. Always fit random effects for both participants and items in psycholinguistic experiments.
Summary of random effects structure notation in lme4:
Random effects structure notation in lme4
Structure
Formula
Example
Random intercept for G
(1 \| G)
(1 \| Speaker)
Random slope for X by G (with correlation)
(1 + X \| G)
(1 + Age \| Speaker)
Random slope for X by G (no correlation)
(1 + X \|\| G)
(1 + Age \|\| Speaker)
Nested: levels of G2 within G1
(1 \| G1/G2)
(1 \| Author/Text)
Crossed: separate random effects
(1 \| G1) + (1 \| G2)
(1 \| Participant) + (1 \| Item)
Manual Contrast Coding
Why This Section Matters
Contrast coding determines how categorical predictors are parameterised — that is, what the model intercept and each coefficient actually represent. R applies treatment (dummy) contrasts by default, which is appropriate for many observational studies but can be misleading or suboptimal in experimental designs. Choosing the right contrast scheme before fitting your model is not optional: it affects the interpretation of every coefficient, including interactions.
When you include a categorical fixed effect in a mixed-effects model, R converts it into one or more numeric columns behind the scenes — the contrast matrix. The default in R is treatment coding (also called dummy coding), where one level is chosen as the reference and all other levels are compared to it. But this is only one option among several, each with different interpretations.
Treatment (Dummy) Coding
Default in R. One level is the reference (intercept = mean of the reference group); each coefficient is the difference between that level and the reference.
Code
# Default treatment contrasts for a 3-level factorfactor_example <-factor(c("A", "B", "C"))contr.treatment(levels(factor_example))
B C
A 0 0
B 1 0
C 0 1
The intercept estimates the mean of level A; the coefficient for B is the A→B difference; the coefficient for C is the A→C difference.
Best for: observational studies with a natural reference category (e.g. a control group, native speakers vs. L2 learners).
Sum (Deviation) Coding
Each coefficient represents the deviation of a group from the grand mean. The intercept estimates the grand mean (unweighted). The last level is not explicitly estimated but is the negative sum of all other deviations.
Code
contr.sum(3) # for a 3-level factor
[,1] [,2]
1 1 0
2 0 1
3 -1 -1
Best for: experimental designs where no single level is a natural baseline; allows interpretation of main effects independently of arbitrary reference level choices. Particularly useful for interactions: with sum coding, the coefficient for a main effect represents its effect averaged over the other factor’s levels, not just at the reference level.
Sum Coding and Interactions
When a model includes an interaction between two factors, the interpretation of the main-effect coefficients changes depending on the contrast scheme:
Treatment coding: main-effect coefficients represent effects at the reference level of the other factor — rarely the quantity of interest
Sum coding: main-effect coefficients represent effects averaged across levels of the other factor — usually more informative
For factorial experimental designs, sum coding is strongly preferred.
Helmert Coding
Each level is compared to the mean of all subsequent levels. Useful when the factor has a natural ordering (e.g. proficiency levels: Beginner, Intermediate, Advanced) and you want to test sequential step contrasts.
For experimental studies with specific theoretical predictions, manually specified contrasts give maximum interpretive precision. Each contrast compares exactly the groups you are interested in, and the coefficients directly answer your research questions.
Example: Suppose you have four conditions — Baseline, Low-Priming, High-Priming, and Control — and your theoretical predictions are:
Any priming vs. Baseline
High-Priming vs. Low-Priming
Control vs. Baseline
Code
# 4-level factor: Baseline, Low, High, Control# Define three orthogonal planned contrastsmy_contrasts <-matrix(c(-3, 1, 1, 1, # Contrast 1: Baseline vs. (Low + High + Control)0, -1, 1, 0, # Contrast 2: Low vs. High (priming gradient)-1, 0, 0, 1# Contrast 3: Baseline vs. Control ),ncol =3)colnames(my_contrasts) <-c("BasevOther", "LowvHigh", "BasevControl")rownames(my_contrasts) <-c("Baseline", "Low", "High", "Control")# Check orthogonality: all pairwise dot products should be 0t(my_contrasts) %*% my_contrasts # off-diagonal should be 0
# Apply to a factor in your data frame# (illustration — adapt column name and data frame to your own data)# mydata$Condition <- factor(mydata$Condition,# levels = c("Baseline", "Low", "High", "Control"))# contrasts(mydata$Condition) <- my_contrasts## Then fit as normal:# lmer(RT ~ Condition + (1 | Participant) + (1 | Item), data = mydata)
Rules for Valid Contrast Matrices
For a factor with \(k\) levels, you can specify at most \(k - 1\) contrasts. For contrasts to be interpretable as independent tests:
Each contrast must sum to zero across all groups (positive weights cancel negative weights)
For orthogonal contrasts: the dot product of any two contrast vectors must equal zero — this ensures each contrast tests a unique piece of variance and p-values are independent
Orthogonality is not strictly required, but correlated contrasts inflate Type I error for the correlated tests
Planned contrasts should always be specified before looking at the data. Post-hoc contrast selection based on observed patterns requires appropriate correction (e.g. Tukey HSD via emmeans).
Applying Contrasts Before Model Fitting
Code
# General workflow for applying contrasts in any mixed-effects model# Step 1: Create factor with explicitly ordered levels# mydata$Condition <- factor(mydata$Condition,# levels = c("Reference", "LevelB", "LevelC"))# Step 2: Apply contrast matrix or built-in scheme# contrasts(mydata$Condition) <- contr.sum(3) # sum coding# contrasts(mydata$Condition) <- contr.helmert(3) # helmert coding# contrasts(mydata$Condition) <- my_contrasts # custom planned contrasts# Step 3: Verify what was applied# contrasts(mydata$Condition)# Step 4: Fit model — contrasts are applied automatically# m1 <- lmer(Response ~ Condition + (1 + Condition | Participant) + (1 | Item),# data = mydata, REML = TRUE)
Global vs. Local Contrast Setting
R allows you to change contrasts globally using options(contrasts = c("contr.sum", "contr.poly")). This applies sum coding to all unordered factors and polynomial coding to ordered factors for the rest of the session.
A safer approach is to set contrasts locally on individual factor columns using contrasts(mydata$Factor) <- .... This makes your contrast choices explicit and reproducible, and avoids unintended effects on other models fitted in the same session.
In a study where 40 participants each respond to 20 experimental items (every participant sees every item), which random effects structure is most appropriate?
(1 | Participant) — random intercept for participant only
(1 | Item) — random intercept for item only
(1 | Participant) + (1 | Item) — crossed random intercepts for participant and item
(1 | Participant/Item) — items nested within participants
Answer
c) (1 | Participant) + (1 | Item) — crossed random intercepts for participant and item
Because every participant sees every item, participants and items are crossed, not nested. Crossed random effects are specified as two separate terms in the formula. Option (a) ignores item-level variance and inflates Type I error (the “language as fixed effect” fallacy described by Clark 1973). Option (b) ignores participant-level variance. Option (d) would be correct if each item belonged to only one participant (nested design), which is not the case here.
:::
Linear Mixed-Effects Regression
Section Overview
What you will learn: How to fit, evaluate, and report a linear mixed-effects model using the preposition frequency data; includes random effect selection, fixed-effect model building, model diagnostics, effect size extraction, and a complete write-up template
Introduction
Linear mixed-effects models extend ordinary multiple linear regression to hierarchical data where the response variable is continuous and approximately normally distributed. The formal representation of a model with random intercepts is:
\[f_{(x)} = \alpha_i + \beta x + \epsilon\]
where \(\alpha_i\) is a group-specific intercept for group \(i\) (the population intercept plus the random deviation for that group), \(\beta\) is the fixed-effects slope, and \(\epsilon\) is the residual error. When the model also has random slopes:
\[f_{(x)} = \alpha_i + \beta_i x + \epsilon\]
where both the intercept \(\alpha_i\) and the slope \(\beta_i\) vary by group.
Advantages of linear mixed-effects models over simpler alternatives:
Multivariate: test the effect of several predictors simultaneously while controlling for all others
Handle dependence: explicitly model within-speaker or within-text correlation rather than ignoring it
Rich diagnostics: enable checking of multicollinearity, homoscedasticity, and normality of residuals
Generalisability: random effects allow generalisation beyond the specific speakers/texts in your sample
Disadvantages:
Prone to high Type II error (\(\beta\)-errors) in small samples (Johnson 2009)
Require relatively large data sets for stable estimates
Model selection and random effects specification require careful thought
Example: Preposition Use Across Time by Genre
We use a data set containing the relative frequency of prepositions per 1,000 words in English texts written between 1150 and 1913, across multiple genres and regions. The research question is: does the frequency of prepositions change systematically over time, and does this change differ by genre?
The data set contains: Date (year of composition), Genre, Text (text name), Prepositions (relative frequency per 1,000 words), and Region.
Exploratory Visualisation
Code
p1 <-ggplot(lmmdata, aes(x = Date, y = Prepositions)) +geom_point(alpha = .4) +geom_smooth(method ="lm", se =TRUE, color ="red", linetype ="dashed") +theme_bw() +labs(y ="Prepositions per 1,000 words", x ="Year of composition",title ="Preposition frequency over time")p2 <-ggplot(lmmdata, aes(x =reorder(Genre, -Prepositions), y = Prepositions)) +geom_boxplot(fill ="gray90") +theme_bw() +theme(axis.text.x =element_text(angle =90, hjust =1)) +labs(x ="Genre", y ="Prepositions per 1,000 words",title ="Preposition frequency by genre")p3 <-ggplot(lmmdata, aes(Prepositions)) +geom_histogram(fill ="gray70", color ="white", bins =20) +theme_bw() +labs(y ="Count", x ="Frequency (prepositions)",title ="Distribution of preposition frequency")gridExtra::grid.arrange(grobs =list(p1, p2, p3),widths =c(1, 1),layout_matrix =rbind(c(1,1), c(2,3)))
The scatter plot suggests a modest upward trend over time. The boxplots reveal substantial genre differences — some genres consistently use more prepositions than others. The histogram confirms that preposition frequency is approximately normally distributed, supporting the use of a linear model.
Code
ggplot(lmmdata, aes(Date, Prepositions)) +geom_point(alpha = .4) +facet_wrap(~ Genre, nrow =4) +geom_smooth(method ="lm", se =TRUE) +theme_bw() +labs(x ="Year of composition", y ="Prepositions per 1,000 words") +coord_cartesian(ylim =c(0, 220))
The faceted plot shows that the temporal trend differs substantially across genres — this is an argument for including Genre as a random effect (it absorbs genre-level baseline differences) while estimating a single population-level temporal trend.
Centering the Date Variable
Centering numeric predictors (subtracting the mean) is strongly recommended before fitting mixed-effects models. Without centering, the model intercept is the predicted value at year 0, which is meaningless for this data. After centering, the intercept represents the predicted value at the mean year in the sample, which is interpretable.
Code
lmmdata$DateUnscaled <- lmmdata$Datelmmdata$Date <-scale(lmmdata$Date, scale =FALSE) # center but do not scalecat("Mean of original Date:", round(mean(lmmdata$DateUnscaled), 1), "\n")
Mean of original Date: 1626.1
Code
cat("Mean of centered Date:", round(mean(lmmdata$Date), 6), "\n")
Mean of centered Date: 0
Centering vs. Scaling
Centering (subtracting the mean) makes the intercept interpretable — it is now the predicted response at the mean value of the predictor rather than at zero.
Scaling (dividing by the standard deviation, in addition to centering) makes coefficients comparable across predictors measured on different scales. Use scale(x, scale = TRUE) for this. Scaling is particularly useful when you want to compare the relative importance of predictors or when predictors are on very different measurement scales.
For this example we center only, as the date variable is on a natural scale (years) and we want to interpret the coefficient in those units.
Step 1: Testing Whether Random Effects Are Warranted
Before adding any fixed effects, we test whether including a random effect structure is statistically justified. We compare an intercept-only fixed-effects model (glm) with an intercept-only mixed-effects model that includes Genre as a random intercept (lmer).
Code
# intercept-only fixed-effects baselinem0.glm <-glm(Prepositions ~1, family = gaussian, data = lmmdata)# intercept-only mixed-effects baseline (Genre as random intercept)m0.lmer <-lmer(Prepositions ~1+ (1| Genre), REML =TRUE, data = lmmdata)
Code
AIC(logLik(m0.glm))
[1] 4718.19031114
Code
AIC(logLik(m0.lmer))
[1] 4497.77554693
The AIC of the mixed-effects model is substantially lower than the AIC of the fixed-effects model, confirming that adding Genre as a random effect is warranted.
Step 2: Choosing the Random Effects Structure
Now we compare two candidate random effects structures: random intercepts only vs. random intercepts + random slopes for Date. We use REML = TRUE for this comparison, as recommended when comparing random effects structures (Pinheiro and Bates 2000; Winter 2019).
Code
ma.lmer <-lmer(Prepositions ~ Date + (1| Genre), REML =TRUE, data = lmmdata)mb.lmer <-lmer(Prepositions ~ Date + (1+ Date | Genre), REML =TRUE, data = lmmdata)anova(ma.lmer, mb.lmer, test ="Chisq", refit =FALSE)
The model with random slopes for Date by Genre fits significantly better (lower AIC, significant χ² test). In a real analysis we would proceed with the random-intercepts-and-slopes model. For clarity of exposition in this tutorial, we continue with the simpler random-intercepts-only model (ma.lmer).
REML vs. ML for Model Comparison
Use REML (REML = TRUE) when comparing models that differ only in their random effects structure (same fixed effects, different random effects).
Use ML (REML = FALSE) when comparing models that differ in their fixed effects structure. This is because REML estimates are conditioned on the fixed effects, making REML-based likelihood ratios invalid for fixed-effects comparisons.
When using anova() to compare lmer models with different fixed effects, set refit = TRUE (the default) so lme4 automatically refits using ML before comparing.
Step 3: Fixed-Effects Model Building
With the random effects structure decided, we now test which fixed effects to include. We compare models using likelihood ratio tests (anova() with refit = TRUE so models are refitted with ML).
Code
# add Date as fixed effectm1.lmer <-lmer(Prepositions ~ (1| Genre) + Date, REML =TRUE, data = lmmdata)anova(m1.lmer, m0.lmer, test ="Chi")
Three indicators show that Region should not be included: AIC does not decrease, BIC increases, and p > .05. We check whether Region is involved in a significant interaction before finalising the model.
Code
m3.lmer <-update(m1.lmer, .~. + Region * Date)car::vif(m3.lmer)
Date Region Date:Region
1.96923042276 1.20324697637 1.78000887977
The interaction is also non-significant. Our final minimal adequate model contains only Date as a fixed effect.
Code
summary(m1.lmer)
Linear mixed model fit by REML ['lmerMod']
Formula: Prepositions ~ (1 | Genre) + Date
Data: lmmdata
REML criterion at convergence: 4491.1
Scaled residuals:
Min 1Q Median 3Q Max
-3.734915441 -0.657038004 0.005865025 0.661298615 3.596659863
Random effects:
Groups Name Variance Std.Dev.
Genre (Intercept) 159.021120 12.6103576
Residual 228.764179 15.1249522
Number of obs: 537, groups: Genre, 16
Fixed effects:
Estimate Std. Error t value
(Intercept) 133.88516211469 3.24749296248 41.22724
Date 0.01894493515 0.00632363682 2.99589
Correlation of Fixed Effects:
(Intr)
Date 0.005
Step 4: Model Diagnostics with performance
The performance package provides a comprehensive, one-call diagnostic suite for mixed-effects models. check_model() produces a panel of six diagnostic plots: linearity of residuals, homoscedasticity, influential observations (Cook’s distance), normality of residuals, normality of random effects, and multicollinearity. This replaces the need to manually call multiple base-R plotting functions.
Code
# Comprehensive model diagnostics in one callperformance::check_model(m1.lmer)
The panels are interpreted as follows:
Linearity (top-left): The reference line should be approximately flat and horizontal — systematic curvature indicates a misspecified functional form
Homoscedasticity (top-right): Points should be evenly spread across fitted values with no funnel shape — a widening spread indicates heteroscedasticity
Influential observations (middle-left): Points with Cook’s distance > 0.5 (marked with a dashed line) may be unduly influencing the model
Normality of residuals (middle-right): Points should follow the diagonal line closely
Normality of random effects (bottom-left): The random-effect BLUPs (Best Linear Unbiased Predictors) should follow the diagonal — deviations indicate departures from the normality assumption for random effects
Some genres show more variability than others (particularly Letters). We test formally for heteroscedasticity across genres using Levene’s test as a supplementary check.
Code
car::leveneTest(lmmdata$Prepositions, lmmdata$Genre, center = mean)
Levene's Test for Homogeneity of Variance (center = mean)
Df F value Pr(>F)
group 15 1.74289 0.039906 *
521
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Levene’s Test: Caveats
Levene’s test for homogeneity of variance has well-known problems: it lacks power with small samples (too lax) and is overly sensitive with large samples (too strict). Use performance::check_heteroscedasticity() and check_model() as the primary tools, and treat Levene’s test as a supplementary check only. Pinheiro and Bates (2000, 175–80) provide more nuanced guidance.
Levene’s test confirms heteroscedasticity across genres. We therefore switch to a weighted model using the lme function from nlme, which supports observation-level weights.
# confirm that weighted model outperforms intercept-only baselinem0.lme <-lme(Prepositions ~1,random =~1| Genre,data = lmmdata,method ="ML",weights =varIdent(form =~1| Genre))anova(m0.lme, m5.lme)
Model df AIC BIC logLik Test L.Ratio
m0.lme 1 18 4496.28563020 4573.43359590 -2230.14281510
m5.lme 2 19 4485.84955689 4567.28352069 -2223.92477845 1 vs 2 12.4360733078
p-value
m0.lme
m5.lme 0.0004
Code
intervals(m5.lme, which ="fixed")
Approximate 95% confidence intervals
Fixed effects:
lower est. upper
(Intercept) 127.7983408437279 133.9640139919199 140.1296871401120
Date 0.0110458012653 0.0217418124276 0.0324378235899
Step 6: Effect Sizes and Visualisation
We extract marginal and conditional R² values using MuMIn::r.squaredGLMM(). The marginal R² represents the variance explained by the fixed effects alone; the conditional R² represents the variance explained by the full model (fixed + random effects) (Bartoń 2020).
Code
MuMIn::r.squaredGLMM(m1.lmer)
R2m R2c
[1,] 0.0121971160211 0.417270545308
We display the model summary table using sjPlot::tab_model().
Code
sjPlot::tab_model(m5.lme)
Prepositions
Predictors
Estimates
CI
p
(Intercept)
133.96
127.80 – 140.13
<0.001
Date
0.02
0.01 – 0.03
<0.001
Random Effects
σ2
205.69
τ00Genre
150.39
ICC
0.42
N Genre
16
Observations
537
Marginal R2 / Conditional R2
0.017 / 0.432
R² Values in sjPlot
The conditional R² value may be missing or incorrect in the sjPlot summary table for lme objects. Always use r.squaredGLMM() from MuMIn for reliable marginal and conditional R² values for mixed-effects models.
Code
sjPlot::plot_model(m5.lme, type ="pred", terms =c("Date")) +scale_x_continuous(name ="Year of composition",breaks =seq(-500, 300, 100),labels =seq(1150, 1950, 100)) +labs(title ="Predicted preposition frequency by year",y ="Prepositions per 1,000 words") +theme_bw()
Code
lmmdata$Predicted <-predict(m5.lme, lmmdata)ggplot(lmmdata, aes(DateUnscaled, Predicted)) +facet_wrap(~ Genre) +geom_point(aes(x = DateUnscaled, y = Prepositions), color ="gray80", size = .5) +geom_smooth(aes(y = Predicted), color ="gray20", method ="lm", se =TRUE) +theme_bw() +labs(x ="Year of composition",y ="Prepositions per 1,000 words",title ="Observed (grey) vs. predicted (line) preposition frequency by genre")
Step 7: Final Diagnostics
We run the full performance diagnostic panel on the weighted lme model. Note that check_model() supports lme objects from nlme directly.
Code
performance::check_model(m5.lme)
Code
# Model performance summary: R², ICC, RMSEperformance::model_performance(m5.lme)
The ICC from performance::icc() estimates what proportion of the total variance in the response is attributable to between-group differences (here, between genres). An ICC of .40 means 40% of variance in preposition frequency is due to genre differences, with the remaining 60% varying within genres. A high ICC reinforces the importance of including the grouping variable as a random effect.
The diagnostic panels are satisfactory: residuals form an unstructured cloud with no funnel shape (no heteroscedasticity remaining after weighting), the Q–Q plot is approximately linear, and no data points show excessively high Cook’s distance.
Reporting Results
Code
summary(m5.lme)
Linear mixed-effects model fit by maximum likelihood
Data: lmmdata
AIC BIC logLik
4485.84955689 4567.28352069 -2223.92477845
Random effects:
Formula: ~1 | Genre
(Intercept) Residual
StdDev: 12.2631808124 14.3420265787
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | Genre
Parameter estimates:
Bible Biography Diary Education Fiction
1.000000000000 0.340719922131 0.869529591941 0.788861415446 0.911718858006
Handbook History Law Philosophy PrivateLetter
1.096586432734 0.978750069335 0.784977716011 0.736988751895 1.190652740883
PublicLetter Religion Science Sermon Travel
1.218929184394 0.974646323962 0.848611735175 0.970873365315 1.086235097365
TrialProceeding
1.260207123026
Fixed effects: Prepositions ~ Date
Value Std.Error DF t-value p-value
(Intercept) 133.9640139919 3.144348306618 520 42.6046992663 0.0000
Date 0.0217418124 0.005454714153 520 3.9858756695 0.0001
Correlation:
(Intr)
Date 0.004
Standardized Within-Group Residuals:
Min Q1 Med Q3
-3.3190657058362 -0.6797224358638 0.0146860053179 0.6987149873619
Max
3.1039114834876
Number of Observations: 537
Number of Groups: 16
Write-Up Template: Linear Mixed-Effects Model
A mixed-effects linear regression model was fitted to predict the frequency of prepositions per 1,000 words. Genre was included as a random effect (random intercepts). Observation-level weights (varIdent) were used to account for heteroscedasticity across genres; inclusion of weights significantly improved model fit (χ²(2) = 39.17, p = .0006). The final minimal adequate model was selected through a step-up procedure comparing nested models with likelihood ratio tests. The model significantly outperformed an intercept-only baseline (χ²(1) = 12.44, p = .0004). Date of composition was the only significant fixed-effect predictor: preposition frequency increased marginally but significantly over time (Estimate = 0.02, 95% CI [0.01, 0.03], p < .001). Neither region of composition nor a Date × Genre interaction significantly improved model fit. Marginal R² = .017; conditional R² = .432, indicating that the random effect structure (genre) accounts for the majority of the variance explained by the full model.
✎ Check Your Understanding — Question 3
A researcher fits a linear mixed-effects model and finds that the marginal R² is .03 while the conditional R² is .55. What does this pattern indicate?
The fixed effects explain most of the variance; the random effects add little
The model has a very poor fit overall
The random effects (grouping structure) account for the majority of the variance, while the fixed effects explain only a small proportion
The model violates the assumption of normally distributed residuals
Answer
c) The random effects (grouping structure) account for the majority of the variance, while the fixed effects explain only a small proportion
Marginal R² reflects the variance explained by fixed effects alone (.03 = 3%). Conditional R² reflects the variance explained by the entire model, including both fixed and random effects (.55 = 55%). The difference (.52) is the variance attributable to the random effects. This pattern — low marginal, high conditional R² — is common when grouping structure (e.g. genre, speaker, site) is a strong source of variability in the data but the specific fixed-effects predictors tested do not capture it fully. It does not indicate a poor fit or assumption violation.
Mixed-Effects Binomial Logistic Regression
Section Overview
What you will learn: How to fit and interpret a mixed-effects logistic regression model for binary outcomes; includes testing the random effect, model selection with glmulti, visualisation of effects, model fit statistics, and a complete reporting template
Introduction
Mixed-effects binomial logistic regression extends fixed-effects logistic regression to hierarchical data with a binary dependent variable (yes/no, present/absent, 0/1). Everything about fixed-effects logistic regression — the logit link, odds ratios, the interpretation of coefficients — applies here. The addition of random effects handles the non-independence of observations that come from the same speaker or item.
Just as with linear mixed-effects models, random effects in logistic models shift the intercept of the logistic curve — and optionally also its slope — for each level of the grouping variable:
Example: Discourse like in Irish English
We investigate which factors correlate with the use of final discourse like (e.g. “The weather is shite, like!”) in Irish English. The dependent variable is binary (1 = speech unit contains a final discourse like; 0 = it does not).
ggplot(mblrdata, aes(Gender, SUFlike, color = Priming)) +facet_wrap(Age ~ ConversationType) +stat_summary(fun = mean, geom ="point", size =2) +stat_summary(fun.data = mean_cl_boot, geom ="errorbar", width =0.2) +theme_bw() +theme(legend.position ="top") +labs(x ="", y ="Observed probability of discourse like",title ="Discourse like by gender, age, conversation type, and priming") +scale_color_manual(values =c("gray20", "gray70"))
Key patterns: men use discourse like more than women; primed contexts show substantially higher probability; mixed-gender conversations show slightly higher rates than same-gender conversations.
Testing the Random Effect
Code
options(contrasts =c("contr.treatment", "contr.poly"))mblrdata.dist <-datadist(mblrdata)options(datadist ="mblrdata.dist")m0.glm <-glm(SUFlike ~1, family = binomial, data = mblrdata)m0.glmer <-glmer(SUFlike ~ (1| ID), data = mblrdata, family = binomial)
Code
AIC(logLik(m0.glmer)); AIC(logLik(m0.glm))
[1] 1828.49227105
[1] 1838.17334856
Code
# Likelihood ratio testnull.id <--2*logLik(m0.glm) +2*logLik(m0.glmer)pchisq(as.numeric(null.id), df =1, lower.tail =FALSE)
[1] 0.000631389565878
Both the AIC reduction and the likelihood ratio test (p < .05) confirm that including speaker (ID) as a random intercept is warranted.
Model Selection
We use glmulti to search all possible models (main effects and two-way interactions) and select the model with the lowest BIC.
The final model significantly outperforms the random-effects-only baseline.
Visualising Effects
Code
plot_model(mlr.glmer, type ="pred",terms =c("Gender", "Priming", "ConversationType")) +theme_bw() +labs(title ="Predicted probability of discourse like",y ="Probability of discourse like")
We use performance::check_model() to run the full diagnostic suite. For logistic mixed-effects models, the panel includes checks for binned residuals (the logistic equivalent of residuals vs. fitted), influential observations, normality of random effects, and multicollinearity.
For logistic regression models, raw residuals are not informative because they are binary (0 or 1 minus the predicted probability). performance::check_model() automatically uses binned residuals — the average residual within bins of predicted probabilities — which should scatter randomly around zero with approximately 95% of points falling within the error bounds (shown as dashed lines). Systematic patterns above or below the bounds indicate model misspecification.
Reporting Results
Code
report::report(mlr.glmer)
We fitted a logistic mixed model (estimated using ML and BOBYQA optimizer) to
predict SUFlike with Gender, ConversationType and Priming (formula: SUFlike ~
Gender + ConversationType + Priming). The model included ID as random effect
(formula: ~1 | ID). The model's total explanatory power is moderate
(conditional R2 = 0.15) and the part related to the fixed effects alone
(marginal R2) is of 0.13. The model's intercept, corresponding to Gender = Men,
ConversationType = MixedGender and Priming = NoPrime, is at -1.07 (95% CI
[-1.36, -0.78], p < .001). Within this model:
- The effect of Gender [Women] is statistically significant and negative (beta
= -0.64, 95% CI [-0.99, -0.30], p < .001; Std. beta = -0.64, 95% CI [-0.99,
-0.30])
- The effect of ConversationType [SameGender] is statistically significant and
negative (beta = -0.54, 95% CI [-0.83, -0.24], p < .001; Std. beta = -0.54, 95%
CI [-0.83, -0.24])
- The effect of Priming [Prime] is statistically significant and positive (beta
= 1.87, 95% CI [1.55, 2.19], p < .001; Std. beta = 1.87, 95% CI [1.55, 2.19])
Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation.
A mixed-effects binomial logistic regression model was fitted to predict the use of final discourse like in Irish English. Speaker (ID) was included as a random effect (random intercept). Inclusion of the random effect was justified by a likelihood ratio test comparing the mixed-effects baseline to a fixed-effects intercept-only model (p < .05) and by a lower AIC. All possible main effects and two-way interactions among Gender, Age, ConversationType, and Priming were evaluated using exhaustive model search with the glmulti package (BIC criterion). The final minimal adequate model included Gender, ConversationType, and Priming as fixed effects and significantly outperformed the random-effects baseline (χ²(3) = X, p < .001). Women used discourse like significantly less than men (β = −0.64, 95% CI [−0.99, −0.30], p < .001). Speakers in same-gender conversations used discourse like significantly less than in mixed-gender conversations (β = −0.54, 95% CI [−0.83, −0.24], p < .001). Priming was the strongest predictor: primed speech units showed substantially higher rates of discourse like (β = 1.87, 95% CI [1.55, 2.19], p < .001). Marginal R² = .13; conditional R² = .15.
✎ Check Your Understanding — Question 4
A mixed-effects logistic regression model predicting a binary linguistic feature gives a marginal R² of .13 and a conditional R² of .15. What can you conclude about the role of the random effect?
The random effect (speaker) explains most of the variance in the model
The random effect adds very little explanatory power beyond the fixed effects
The model has a poor fit and should be re-specified
The fixed effects and random effect together explain 28% of variance
Answer
b) The random effect adds very little explanatory power beyond the fixed effects
The difference between conditional R² (.15) and marginal R² (.13) is only .02, meaning the random effect structure (by-speaker random intercepts) accounts for approximately 2 percentage points of additional variance. This is in contrast to the linear mixed-effects model example above, where random effects explained a much larger share. Both scenarios are valid — it simply means that individual speaker differences are not a very strong source of variability for discourse like use once the fixed-effects predictors are included. The random effects are still necessary to ensure that standard errors are not underestimated due to the non-independence of speech units within speakers.
Mixed-Effects Poisson and Negative Binomial Regression
Section Overview
What you will learn: How to choose between Poisson, quasi-Poisson, and negative binomial mixed-effects models for count data; how to detect and address overdispersion; and how to report results for each model type
Introduction
When the dependent variable is a count (a non-negative integer — number of errors, occurrences of a word, instances of a discourse feature), the appropriate model family is Poisson or one of its extensions. Poisson models assume that the mean equals the variance — a very restrictive assumption that is frequently violated in linguistic data. When the variance exceeds the mean (overdispersion), we must use a more robust alternative.
Mixed-effects count models and when to use them
Model
When to use
Key characteristic
Poisson
Count data, variance ≈ mean
Restrictive; standard starting point
Quasi-Poisson
Count data, variance > mean
Scales standard errors; coefficients unchanged
Negative binomial
Count data, variance >> mean
Adds explicit dispersion parameter; preferred by Zuur, Hilbe, and Ieno (2013)
Why Not Just Use Quasi-Poisson?
Quasi-Poisson models scale the standard errors to compensate for overdispersion, which corrects p-values. However, the coefficients themselves remain those of the Poisson model and may still be biased under severe overdispersion. Negative binomial models include an additional dispersion parameter that accommodates overdispersion in the model structure itself, making the coefficients more reliable. Zuur, Hilbe, and Ieno (2013) therefore recommend negative binomial over quasi-Poisson when overdispersion is substantial.
Example: Filler uhm Use by Language and Alcohol Consumption
The data for this section records the number of instances of the filler uhm in two-minute conversations in four languages (English, German, Russian, Mandarin), along with the number of alcoholic shots consumed by each speaker before the conversation.
Boruta identifies Shots as the only confirmed important predictor. Language is tentative. We include Language as a random effect for theoretical reasons (speakers are nested within language communities) even though it does not reach confirmed status.
Checking for Overdispersion
Before choosing between Poisson and its alternatives, we check whether the count data is plausibly Poisson-distributed.
Code
gf <-goodfit(countdata$UHM, type ="poisson", method ="ML")plot(gf, main ="Count data vs. Poisson distribution")
Code
summary(gf)
Goodness-of-fit test for poisson distribution
X^2 df P(> X^2)
Likelihood Ratio 153.422771085 5 0.000000000000000000000000000000249336691328
The goodness-of-fit test (p < .05) and the visual inspection both indicate that the data deviate from a Poisson distribution — the variance is greater than the mean (overdispersion). We will fit all three model types for comparison.
Mixed-Effects Poisson Regression
Code
m0.glmer.p <-glmer(UHM ~1+ (1| Language), data = countdata,family = poisson,control =glmerControl(optimizer ="bobyqa"))m1.glmer.p <-update(m0.glmer.p, .~. + Shots)car::Anova(m1.glmer.p, test ="Chi")
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: poisson ( log )
Formula: UHM ~ (1 | Language) + Shots
Data: countdata
Control: glmerControl(optimizer = "bobyqa")
AIC BIC logLik deviance df.resid
1041.8 1054.5 -517.9 1035.8 497
Scaled residuals:
Min 1Q Median 3Q Max
-1.509633852 -0.666423093 -0.592950422 0.586114082 4.338639382
Random effects:
Groups Name Variance Std.Dev.
Language (Intercept) 0 0
Number of obs: 500, groups: Language, 4
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.2789168850 0.0893313713 -14.31655 < 0.000000000000000222 ***
Shots 0.2336279071 0.0130347699 17.92344 < 0.000000000000000222 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
Shots -0.806
optimizer (bobyqa) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')
Singular Fit Warning
The model above produces a singular fit warning (Language variance ≈ 0). This means the random effect is not explaining any additional variance, and the mixed-effects model collapses to the fixed-effects equivalent. In a real analysis, this should lead you to remove the random effect and report a fixed-effects model instead. We continue here purely for illustration.
The diagnostic plots show patterning (lower-left clustering) consistent with overdispersion. We proceed to quasi-Poisson and negative binomial alternatives.
Linear mixed-effects model fit by maximum likelihood
Data: countdata
AIC BIC logLik
NA NA NA
Random effects:
Formula: ~1 | Language
(Intercept) Residual
StdDev: 0.0000407595004855 1.07801257915
Variance function:
Structure: fixed weights
Formula: ~invwt
Fixed effects: UHM ~ Shots
Value Std.Error DF t-value p-value
(Intercept) -1.278929721022 0.0964886326643 495 -13.2547190867 0
Shots 0.233630231741 0.0140795660799 495 16.5935676153 0
Correlation:
(Intr)
Shots -0.806
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-1.400417640404 -0.618228066949 -0.550071160973 0.543731905024 4.024828853746
Number of Observations: 500
Number of Groups: 4
Code
# odds ratios from fixed-effects equivalentm1.glm.qp <-glm(UHM ~ Shots, data = countdata, family =quasipoisson(link ="log"))exp(coef(m1.glm.qp))
(Intercept) Shots
0.278338612235 1.263174385573
The rate ratio for Shots is approximately 1.26: each additional shot is associated with a 26% increase in the predicted number of uhm fillers.
For this data set, the quasi-Poisson model shows reduced overdispersion and no singular fit warning, making it preferable to the Poisson model. In general, when overdispersion is severe, prefer negative binomial over quasi-Poisson for more reliable coefficient estimates.
A mixed-effects quasi-Poisson regression model was fitted to predict the number of uhm fillers produced in two-minute conversations. Language was included as a random effect (random intercept). Prior to modelling, a Boruta variable selection procedure identified the number of alcoholic shots consumed as the only confirmed predictor; Language was tentative. A goodness-of-fit test confirmed that the data deviated from a Poisson distribution (overdispersion), justifying the use of a quasi-Poisson rather than a standard Poisson model. The final minimal adequate model showed that each additional shot consumed was associated with a significant 26% increase in the expected number of uhm fillers (rate ratio = 1.26, p < .001). The overdispersion parameter for the quasi-Poisson model was close to 1, indicating adequate fit after scaling.
✎ Check Your Understanding — Question 5
A researcher fits a Poisson mixed-effects model to count data and obtains an overdispersion parameter of 4.2. What is the most appropriate course of action?
Report the Poisson model — an overdispersion parameter above 1 is always acceptable
Switch to a quasi-Poisson or negative binomial model, as overdispersion of 4.2 indicates a serious violation
Remove outliers until the overdispersion parameter drops below 2
Add more fixed-effects predictors until the overdispersion is reduced
Answer
b) Switch to a quasi-Poisson or negative binomial model, as overdispersion of 4.2 indicates a serious violation
An overdispersion parameter of 1 is ideal for Poisson models (variance = mean). Values modestly above 1 (say, < 1.5) can often be tolerated, but a value of 4.2 indicates that the variance is more than four times the mean — a serious violation that will produce underestimated standard errors and inflated Type I error in a standard Poisson model. The correct response is to switch to a quasi-Poisson model (which scales standard errors) or — better — a negative binomial model (which models the overdispersion explicitly). Simply removing outliers or adding more predictors is not a principled solution to overdispersion.
Mixed-Effects Ordinal Regression
Section Overview
What you will learn: When and why to use mixed-effects ordinal regression; how to fit, diagnose, and interpret a clmm() model; and how to report ordinal mixed-effects results
Introduction
Mixed-effects ordinal regression (also called the cumulative link mixed model, CLMM) is the appropriate choice when:
The dependent variable is an ordered categorical variable with three or more levels — such as a Likert scale (Strongly Disagree → Strongly Agree), an acceptability rating, a proficiency level, or an error severity score
Observations are not independent — for example, multiple ratings from the same participant or rater, or multiple items rated by different people
The clmm() function from the ordinal package (Christensen 2019) implements the proportional odds mixed model — an extension of the fixed-effects proportional odds model (polr() from MASS) that adds random effects.
Why Not Treat Ordinal Data as Continuous?
Treating an ordinal variable (e.g. a 5-point Likert scale) as a continuous dependent variable in a linear model is tempting but problematic: it assumes equal intervals between scale points (that the difference between 1 and 2 is the same as the difference between 4 and 5), which is rarely justified. Ordinal regression makes no such assumption — it models the cumulative probability of being at or below each category without assuming equal intervals.
Example: Accentedness Ratings in a Rating Experiment
The data represent the results of a rating experiment in which raters evaluated the accentedness of speech samples from speakers of two language families (Language Family: Germanic vs. Romance) across two groups (Group: L2 learners vs. native speakers). The dependent variable (Accent) is an ordered factor with five levels.
p_mean <- ratex |>ggplot(aes(Family, AccentNumeric, color = Group)) +stat_summary(fun = mean, geom ="point", size =2) +stat_summary(fun.data = mean_cl_boot, geom ="errorbar", width =0.2) +theme_bw() +theme(legend.position ="top") +scale_color_manual(values =c("gray20", "gray70")) +labs(y ="Mean accentedness rating", title ="Mean rating by family and group")p_box <- ratex |> dplyr::group_by(Family, Rater, Group) |> dplyr::summarise(AccentMean =mean(AccentNumeric), .groups ="drop") |>ggplot(aes(Family, AccentMean, fill = Group)) +geom_boxplot() +theme_bw() +theme(legend.position ="top") +scale_fill_manual(values =c("gray50", "gray85")) +labs(y ="Mean accentedness (by rater)", title ="Rater-level ratings by family and group")gridExtra::grid.arrange(p_mean, p_box, nrow =1)
The plots suggest that Language Family affects accentedness ratings, with the direction possibly modulated by Group.
Fitting the Ordinal Mixed-Effects Model
We fit a model with Rater as a random effect (multiple ratings per rater) and Family as the fixed-effects predictor.
Code
m1.or <-clmm(Accent ~ (1| Rater) + Family, link ="logit", data = ratex)# check for complete separationifelse(min(ftable(ratex$Accent, ratex$Family)) ==0,"Warning: incomplete information (separation possible)","No separation detected — OK to proceed")
[1] "No separation detected — OK to proceed"
Code
summary(m1.or)
Cumulative Link Mixed Model fitted with the Laplace approximation
formula: Accent ~ (1 | Rater) + Family
data: ratex
link threshold nobs logLik AIC niter max.grad cond.H
logit flexible 755 -685.13 1380.26 371(435) 3.04e-07 2.9e+05
Random effects:
Groups Name Variance Std.Dev.
Rater (Intercept) 0.00000000353434196 0.0000594503318
Number of groups: Rater 21
Coefficients:
Estimate Std. Error z value Pr(>|z|)
FamilyEqualBilingual 0.477744668 0.214314867 2.22917 0.025802
FamilyMonolingual -2.550224082 0.198852092 -12.82473 < 0.0000000000000002
FamilyEqualBilingual *
FamilyMonolingual ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Threshold coefficients:
Estimate Std. Error z value
NoAccent|StrongAccent -1.0798112480 0.1058010986 -10.20605
StrongAccent|WeakAccent 0.1060945025 0.0951357711 1.11519
The model outputs:
Threshold coefficients (1|2, 2|3, 3|4, 4|5): these are the log-odds of being at or below each category for the reference level of Family. They define the cut-points of the latent continuous response.
Fixed-effect coefficient for Family: the log-odds shift associated with moving from the reference Family level to the other level.
Interpreting Threshold Coefficients
In a cumulative link model, the threshold coefficient k|(k+1) is the log-odds of rating at or below category \(k\) (compared to above \(k\)) for the reference group. For example, 1|2 = -2.15 means the log-odds of rating ≤ 1 vs. > 1 for the reference Family is −2.15 — a low probability of rating at the bottom of the scale.
The fixed-effect coefficient for Family shifts all thresholds simultaneously by the same amount (the proportional odds assumption). A positive coefficient means the distribution shifts towards higher ratings for that Family level.
$lsmeans
Family lsmean SE df asymp.LCL asymp.UCL
DomBilingual 0.487 0.0914 Inf 0.308 0.666
EqualBilingual 0.965 0.1950 Inf 0.582 1.347
Monolingual -2.063 0.1740 Inf -2.404 -1.722
Confidence level used: 0.95
$contrasts
contrast estimate SE df z.ratio p.value
DomBilingual - EqualBilingual -0.478 0.214 Inf -2.229 0.0664
DomBilingual - Monolingual 2.550 0.199 Inf 12.825 <.0001
EqualBilingual - Monolingual 3.028 0.265 Inf 11.438 <.0001
P value adjustment: tukey method for comparing a family of 3 estimates
Model Summary Table
Code
sjPlot::tab_model(m1.or)
Accent
Predictors
Odds Ratios
CI
p
NoAccent|StrongAccent
0.34
0.28 – 0.42
<0.001
StrongAccent|WeakAccent
1.11
0.92 – 1.34
0.265
Family [EqualBilingual]
1.61
1.06 – 2.45
0.026
Family [Monolingual]
0.08
0.05 – 0.12
<0.001
Random Effects
σ2
3.29
τ00Rater
0.00
N Rater
21
Observations
755
Marginal R2 / Conditional R2
0.325 / NA
Visualising Effects
Code
plot_model(m1.or, type ="pred", terms =c("Family")) +theme_bw() +labs(title ="Predicted probability of each accentedness rating by language family",y ="Predicted probability", x ="Language family")
Model Diagnostics
For ordinal mixed-effects models, formal diagnostic plots are less well developed than for linear or logistic models. Key checks include:
Convergence: ensure the model converged (no warnings about convergence failure)
Complete separation: if any cell of the contingency table between the response and a predictor is empty, the model may produce unreliable thresholds
Proportional odds assumption: the brant test (for fixed-effects ordinal models) or a comparison of constrained vs. unconstrained models can be used to test this assumption
Code
# Rough check: compare clmm to a nominal logistic model# (if proportional odds hold, the clmm should fit comparably)cat("AIC of proportional odds model:", AIC(m1.or), "\n")
A mixed-effects cumulative link model (proportional odds; logit link) was fitted to predict accentedness ratings (5-point ordered scale). Rater was included as a random effect (random intercept) to account for the non-independence of multiple ratings from the same rater. The model was fitted using the clmm() function from the ordinal package. Language Family was a significant predictor of accentedness ratings (β = X, SE = Y, z = Z, p = .W). Post-hoc pairwise comparisons with Tukey correction confirmed significant differences between [Family A] and [Family B] (p = .V). The proportional odds assumption was not formally tested for the mixed model; visual inspection of predicted probabilities suggested the assumption was approximately satisfied.
✎ Check Your Understanding — Question 6
A researcher uses a 7-point Likert scale (1 = Strongly Disagree → 7 = Strongly Agree) as the dependent variable. Multiple participants rated the same 30 items. Which model is most appropriate?
Linear mixed-effects model with Participant as a random effect only
Binomial logistic mixed-effects model after dichotomising the scale at the midpoint
Mixed-effects cumulative link model (clmm) with Participant and Item as crossed random effects
Fixed-effects ordinal regression with polr(), ignoring the grouping structure
Answer
c) Mixed-effects cumulative link model (clmm) with Participant and Item as crossed random effects
The dependent variable is ordered categorical (a 7-point Likert scale), which calls for an ordinal model rather than a linear or binary logistic model. Because every participant rates every item, the design has crossed random effects — both participants and items should be included as random intercepts. Option (a) uses a linear model and includes only participant-level random effects, ignoring item-level variance. Option (b) throws away information by collapsing the 7-point scale to binary. Option (d) ignores the non-independence of observations, underestimates standard errors, and inflates Type I error.
Mixed-Effects Multinomial Regression
Section Overview
This section provides a brief implementation showcase of mixed-effects multinomial regression using mblogit() from the mclogit package. It is intentionally brief — for a full treatment of multinomial regression concepts and diagnostics, see the Regression Analysis tutorial.
Introduction
Mixed-effects multinomial logistic regression is used when the dependent variable is unordered categorical with three or more levels (e.g. choice of variant A vs. B vs. C in a variation study) and observations are grouped within speakers or items.
Analysis of Deviance Table
Model 1: Response ~ 1
Model 2: Response ~ Gender + Group
Resid. Df Resid. Dev Df Deviance
1 3261 2411.258235
2 3249 1545.827693 12 865.4305427
Code
sjPlot::tab_model(m1.mn)
Response: NumeralNoun
Response: QuantAdjNoun
Response: QuantNoun
Predictors
Odds Ratios
CI
p
Odds Ratios
CI
p
Odds Ratios
CI
p
(Intercept)
2.99
0.63 – 14.20
0.169
1.81
0.68 – 4.82
0.232
0.11
0.03 – 0.50
0.004
GenderMale
1.07
0.73 – 1.58
0.725
1.35
0.83 – 2.20
0.227
0.69
0.30 – 1.60
0.388
GroupGerman_Mono
0.04
0.02 – 0.07
<0.001
0.02
0.01 – 0.04
<0.001
0.09
0.02 – 0.44
0.003
GroupL2_Advanced
0.63
0.35 – 1.16
0.137
0.11
0.06 – 0.22
<0.001
2.20
0.58 – 8.42
0.248
GroupL2_Intermediate
0.31
0.17 – 0.58
<0.001
0.05
0.02 – 0.11
<0.001
0.86
0.20 – 3.67
0.841
N Item
10
Observations
1090
This Section Is a Showcase Only
The analysis above is deliberately brief and skips full diagnostics and model validation. It is intended only to demonstrate how to implement a mixed-effects multinomial model in R. Do not use it as a template for a complete research analysis — always perform and report full diagnostics.
Summary and Model Comparison
Section Overview
What you will learn: A consolidated overview of when to use each mixed-effects model type, key implementation functions, and the most important points to check and report for each
The table below summarises the five mixed-effects model types covered in this tutorial:
Summary of mixed-effects model types covered in this tutorial
Model type
Dependent variable
Key function(s)
Key diagnostic
Linear
Continuous
lmer(), lme()
performance::check_model(), ICC, Levene’s test
Logistic (binary)
Binary (0/1)
glmer(..., family=binomial)
performance::check_model(), AUC, binned residuals
Poisson
Count (underdispersed)
glmer(..., family=poisson)
Overdispersion parameter, check_model()
Quasi-Poisson / NB
Count (overdispersed)
glmmPQL(), glmer.nb()
Overdispersion parameter, diagnostic plots
Ordinal
Ordered categorical
clmm()
Convergence, separation, proportional odds
Multinomial
Unordered categorical (3+)
mblogit()
VIF, convergence
General workflow for any mixed-effects model:
Inspect the data — visualise distributions, group differences, and potential outliers before modelling
Set contrasts deliberately — use contrasts() or options(contrasts = ...) before fitting; choose treatment, sum, Helmert, or planned contrasts based on your design
Centre (and possibly scale) numeric predictors — makes intercepts interpretable and improves model stability
Test whether random effects are warranted — compare AIC of fixed- vs. mixed-effects baseline with a likelihood ratio test
Decide on the random effects structure — random intercepts only vs. random intercepts + slopes; nested vs. crossed
Build fixed effects step by step — use anova() with refit = TRUE (ML) for fixed-effects comparisons
Run diagnostics with performance — check_model() for a full panel; icc() for the intraclass correlation; model_performance() for a summary table of fit indices
Extract effect sizes — marginal and conditional R² via MuMIn::r.squaredGLMM() or performance::r2()
Report clearly — include model type, contrast scheme, random effects structure, comparison statistics, fixed-effect estimates with CIs, ICC, and R²
✎ Check Your Understanding — Question 7
A researcher is studying how often L2 speakers self-correct during oral production. Each speaker produced 30 utterances, and the dependent variable is the number of self-corrections per utterance. A Poisson model gives an overdispersion parameter of 0.8. What should the researcher do?
Switch to a negative binomial model because Poisson models should never be used
Use the Poisson model — an overdispersion parameter below 1 indicates underdispersion, which is within acceptable range for a Poisson model
Switch to a linear mixed-effects model because the count is approximately continuous
Switch to a logistic model after dichotomising (corrected vs. not corrected)
Answer
b) Use the Poisson model — an overdispersion parameter below 1 indicates underdispersion, which is within acceptable range for a Poisson model
An overdispersion parameter of 0.8 indicates underdispersion (variance slightly less than the mean) rather than overdispersion (variance greater than the mean). Mild underdispersion is generally not a problem for Poisson models and does not warrant switching to a negative binomial model (which addresses overdispersion, not underdispersion). Quasi-Poisson models also address overdispersion only. Switching to a linear model ignores the count nature of the data and risks predicting negative values. Dichotomising throws away information. The correct response is to proceed with the Poisson model and note the overdispersion parameter in the methods section.
Citation and Session Info
Schweinberger, Martin. 2026. Mixed-Effects Models in R. Brisbane: The Language Technology and Data Analysis Laboratory (LADAL). url: https://ladal.edu.au/tutorials/mixedmodel/mixedmodel.html (Version 2026.02.24).
@manual{schweinberger2026mixedmodel,
author = {Schweinberger, Martin},
title = {Mixed-Effects Models in R},
note = {https://ladal.edu.au/tutorials/mixedmodel/mixedmodel.html},
year = {2026},
organization = {The Language Technology and Data Analysis Laboratory (LADAL)},
address = {Brisbane},
edition = {2026.02.24}
}
This tutorial was revised and substantially expanded with the assistance of Claude (claude.ai), a large language model created by Anthropic. Claude was used to restructure the document into Quarto format, expand the theoretical introduction, add the new section on crossed vs. nested random effects, add the Simpson’s Paradox / misleading aggregate trend figure and its accompanying callout, add the manual contrast coding subsection, replace manual diagnostic code with the performance package throughout the linear and logistic sections, expand model diagnostics and interpretation guidance across all sections, write the new reporting templates, write the quiz questions and detailed answer explanations, and produce the model comparison summary table. All content was reviewed, edited, and approved by the author (Martin Schweinberger), who takes full responsibility for its accuracy.
Bell, Bethany A., John M. Ferron, and Jeffrey D. Kromrey. 2008. “Cluster Size in Multilevel Models: The Impact of Sparse Data Structures on Point and Interval Estimates in Two-Level Models.” In JSM Proceedings. Section on Survey Research Methods, 1122–29. American Statistical Association Alexandria, VA.
Christensen, R. H. B. 2019. “Ordinal—Regression Models for Ordinal Data.”
Johnson, Daniel Ezra. 2009. “Getting Off the GoldVarb Standard: Introducing Rbrul for Mixed-Effects Variable Rule Analysis.”Language and Linguistics Compass 3 (1): 359–83. https://doi.org/https://doi.org/10.1111/j.1749-818x.2008.00108.x.
Pinheiro, J. C., and D. M. Bates. 2000. Mixed-Effects Models in s and s-PLUS. Statistics and Computing. New York: Springer.
Zuur, Alain F., Joseph M. Hilbe, and Elena N. Ieno. 2013. A Beginner’s Guide to GLM and GLMM with. R. Beginner’s Guide Series. Newburgh, United Kingdom: Highland Statistics Ltd.
Footnotes
Sincere thanks to Stefan Thomas Gries, whose detailed feedback and corrections substantially improved earlier versions of this tutorial. All remaining errors are my own.↩︎
Source Code
---title: "Mixed-Effects Models in R"author: "Martin Schweinberger"format: html: toc: true toc-depth: 4 code-fold: show code-tools: true theme: cosmo---```{r setup, echo=FALSE, message=FALSE, warning=FALSE}options(stringsAsFactors = FALSE)options("scipen" = 100, "digits" = 12)```{ width=100% }# Introduction {#intro}This tutorial introduces mixed-effects models in R.^[Sincere thanks to Stefan Thomas Gries, whose detailed feedback and corrections substantially improved earlier versions of this tutorial. All remaining errors are my own.] Mixed-effects models are a family of regression models widely used in the language sciences to assess whether and how predictors correlate with a response when data are **hierarchical** — that is, when observations are grouped or nested within higher-level units such as speakers, texts, experimental items, or schools.{ width=15% style="float:right; padding:10px" }This tutorial is aimed at intermediate and advanced users of R. The goal is not to provide an exhaustive theoretical treatment but to show how to implement the most commonly used mixed-effects model types, perform appropriate diagnostics, and report results clearly and reproducibly.::: {.callout-note}## Prerequisite TutorialsThis tutorial presumes familiarity with regression modelling. If you are new to regression or need a refresher, please work through the following tutorials first:- [Regression Concepts](/tutorials/regression_concepts/regression_concepts.html) — key assumptions, the logic of OLS, model comparison- [Regression Analysis in R](/tutorials/regression/regression.html) — fixed-effects linear, logistic, and ordinal regression in R:::::: {.callout-note}## Learning ObjectivesBy the end of this tutorial you will be able to:1. Explain what mixed-effects models are, when they are appropriate, and how they differ from fixed-effects models2. Recognise when ignoring grouping structure can lead to completely wrong conclusions (Simpson's Paradox)3. Distinguish between random intercepts and random slopes, and between nested and crossed random effects4. Apply appropriate contrast coding schemes (treatment, sum, Helmert, planned) for experimental and observational designs5. Fit, evaluate, and interpret linear mixed-effects models using `lme4` and `nlme`6. Fit and interpret mixed-effects binomial logistic regression models7. Choose between Poisson, quasi-Poisson, and negative binomial mixed-effects models for count data8. Fit and interpret mixed-effects ordinal regression models using `ordinal`9. Perform comprehensive model diagnostics using the `performance` package10. Report mixed-effects model results clearly in an academic write-up:::::: {.callout-note}## CitationSchweinberger, Martin. 2026. *Mixed-Effects Models in R*. Brisbane: The Language Technology and Data Analysis Laboratory (LADAL). url: https://ladal.edu.au/tutorials/mixedmodel/mixedmodel.html (Version 2026.02.24).:::---# Setup {#setup}## Installing Packages {-}```{r prep0, echo=TRUE, eval=FALSE, warning=FALSE, message=FALSE}# Run once — comment out after installationinstall.packages("Boruta")install.packages("car")install.packages("emmeans")install.packages("effects")install.packages("flextable")install.packages("ggplot2")install.packages("ggpubr")install.packages("glmulti")install.packages("Hmisc")install.packages("lme4")install.packages("MASS")install.packages("mclogit")install.packages("MuMIn")install.packages("nlme")install.packages("ordinal")install.packages("report")install.packages("rms")install.packages("robustbase")install.packages("sjPlot")install.packages("stringr")install.packages("tibble")install.packages("dplyr")install.packages("tidyr")install.packages("vcd")install.packages("gridExtra")install.packages("performance")install.packages("see") # required for performance plots```## Loading Packages {-}```{r prep1, echo=TRUE, eval=TRUE, warning=FALSE, message=FALSE}# set optionsoptions(stringsAsFactors = FALSE)options("scipen" = 100, "digits" = 12)# load packageslibrary(Boruta)library(car)library(effects)library(emmeans)library(flextable)library(ggplot2)library(ggpubr)library(glmulti)library(Hmisc)library(lme4)library(MASS)library(mclogit)library(MuMIn)library(nlme)library(ordinal)library(report)library(rms)library(robustbase)library(sjPlot)library(stringr)library(tibble)library(tidyr)library(dplyr)library(vcd)library(gridExtra)library(performance)library(see) # companion visualisation package for performance```---# What Are Mixed-Effects Models? {#whatare}::: {.callout-note}## Section Overview**What you will learn:** Why mixed-effects models are needed, how they differ from fixed-effects regression, what assumptions they make, and which model type to choose for a given research situation:::## The Problem: Hierarchical Data {-}Most regression models taught in introductory courses assume that all observations are **independent** of each other. In practice, data in the language sciences almost never meet this assumption. Observations are routinely grouped or nested:- Multiple utterances produced by the **same speaker** are more similar to each other than utterances produced by different speakers- Multiple sentences from the **same text or document** share authorial style- Multiple items in an **experiment** may differ in inherent difficulty, regardless of the manipulation- Multiple measurements from the **same participant** across trials are correlated over timeWhen data have this kind of grouping structure, treating observations as if they were independent **underestimates standard errors**, inflates Type I error rates, and produces unreliable coefficient estimates. Mixed-effects models solve this problem by explicitly modelling the grouping structure as part of the model.::: {.callout-tip}## A Simple IntuitionImagine estimating how much people weigh based on their height. If your sample includes five people measured ten times each, you have 50 data points — but not 50 truly independent observations. The ten measurements from the same person are more similar to each other than they are to measurements from a different person. A fixed-effects model that ignores this will be overconfident. A mixed-effects model adds a separate intercept (and optionally a separate slope) for each person, soaking up the between-person variation and giving you honest estimates of the height–weight relationship.:::## Why Ignoring Speakers Can Be Deeply Misleading {-}A particularly striking consequence of ignoring the grouping structure of data is that the **direction** of an effect at the aggregate level can be the **opposite** of the direction within every individual speaker. This is a form of **Simpson's Paradox**: a trend that appears in the pooled data reverses — or disappears entirely — when the data are examined within their natural groups.The figure below illustrates this with simulated data that mirrors a common situation in language research. Suppose we are studying how speaking rate (words per minute) varies with utterance complexity across speakers. Looking at all data points together suggests a clear **negative** relationship — more complex utterances are produced faster. But when we look at each speaker individually, every single speaker shows a **positive** relationship: their speech rate increases with complexity.```{r simpsons_paradox, echo=TRUE, message=FALSE, warning=FALSE}set.seed(2024)# Simulate data: 6 speakers, each with their own baseline rate# Within each speaker, rate INCREASES with complexity (positive slope)# But speakers with low baseline rate happen to produce more complex utterances# creating a NEGATIVE trend in the pooled datan_speakers <- 6n_obs <- 12 # observations per speakerspeaker_baselines <- c(130, 150, 170, 190, 210, 230) # decreasing with complexity offsetcomplexity_offsets <- c(8, 6, 4, 2, -2, -4) # higher-baseline speakers get simpler itemssimdat <- purrr::map_dfr(seq_len(n_speakers), function(s) { complexity <- seq(1, n_obs) + complexity_offsets[s] + rnorm(n_obs, 0, 1) rate <- speaker_baselines[s] + 3 * (complexity - mean(complexity)) + rnorm(n_obs, 0, 5) data.frame( Speaker = paste0("S", s), Complexity = complexity, Rate = rate )})# Aggregate trend (ignoring speakers) — NEGATIVEp_pooled <- ggplot(simdat, aes(x = Complexity, y = Rate)) + geom_point(alpha = 0.35, color = "gray40", size = 1.5) + geom_smooth(method = "lm", se = TRUE, color = "firebrick", linewidth = 1.2) + annotate("text", x = min(simdat$Complexity) + 0.5, y = max(simdat$Rate) - 5, label = "Pooled: NEGATIVE trend", hjust = 0, color = "firebrick", size = 3.5, fontface = "bold") + theme_bw() + labs(title = "Pooled data (speakers ignored)", subtitle = "A fixed-effects model would conclude: more complexity → slower speech", x = "Utterance complexity", y = "Speaking rate (wpm)")# Within-speaker trends — all POSITIVEp_byspeaker <- ggplot(simdat, aes(x = Complexity, y = Rate, color = Speaker, group = Speaker)) + geom_point(size = 1.5, alpha = 0.6) + geom_smooth(method = "lm", se = FALSE, linewidth = 0.9) + annotate("text", x = min(simdat$Complexity) + 0.3, y = max(simdat$Rate) - 5, label = "Within each speaker: POSITIVE trend", hjust = 0, color = "gray20", size = 3.5, fontface = "bold") + theme_bw() + theme(legend.position = "right") + labs(title = "By-speaker data (mixed-effects view)", subtitle = "Every speaker shows: more complexity → faster speech", x = "Utterance complexity", y = "Speaking rate (wpm)")ggpubr::ggarrange(p_pooled, p_byspeaker, ncol = 2, widths = c(1, 1.2))```::: {.callout-important}## The Simpson's Paradox LessonThe pooled regression line tells a completely false story. The apparent negative trend in the left panel arises purely because speakers with **higher baseline speaking rates** happened to be assigned (or self-selected into) utterances of **lower complexity** — a between-speaker confound that pollutes the aggregate estimate.A mixed-effects model with a random intercept for Speaker separates the **within-speaker effect of complexity** (the true relationship of interest) from the **between-speaker variation in baseline rate**, recovering the correct positive slope.This is not a contrived pathological case. It happens routinely in:- **Corpus data:** high-frequency words tend to be shorter — but within each word frequency band, longer words are used in more predictable contexts, confounding length–predictability correlations- **Reaction time experiments:** slower participants may naturally be tested on harder items- **Longitudinal data:** earlier time-points in a study tend to involve different conditions than later time-points, creating spurious temporal trends**Always model the grouping structure.** A fixed-effects model applied to grouped data may not just be imprecise — it may point in entirely the wrong direction.:::## Fixed Effects vs. Random Effects {-}Mixed-effects models are called *mixed* because they contain **both fixed effects and random effects**:**Fixed effects** are the predictors you are directly interested in — the variables whose coefficients you want to estimate and interpret. They are *fixed* in the sense that, if you were to repeat the study, you would use the same levels (e.g. Male vs. Female, Primed vs. Unprimed). Fixed effects are the same as the predictors in an ordinary regression model.**Random effects** represent the grouping structure of the data. They are *random* in the sense that the levels observed in your data are treated as a **random sample** from a larger population of possible levels. Common random effects in linguistics include:| Random effect | Why it is random ||---------------|-----------------|| Speaker / participant | The 30 speakers in your study are a sample from all possible speakers || Text / document | The 50 texts in your corpus are a sample from all possible texts || Word / lexical item | The 200 words tested are a sample from the mental lexicon || Experimental item | The 40 sentences in your experiment are a sample from all possible sentences || School / class | The 20 classes in your study are a sample from all possible classes |: Examples of random effects in language research {tbl-colwidths="[35,65]"}::: {.callout-important}## Key Rules for Random Effects1. **Random effects must be categorical** (nominal or ordinal with many levels). They cannot be continuous.2. **Random effects should have many levels.** A useful rule of thumb: use a variable as a fixed effect if it has fewer than 5 levels; consider it as a random effect if it has more than 10 levels [@zuur2013beginner]. Variables with 5–10 levels can go either way, but theoretical justification (random sample, nestedness) should guide the decision.3. **Random effects represent a random sample.** If you would expect the levels to change with every new sample you drew, it is a random effect. If the levels would always be the same (e.g. Male/Female), it is a fixed effect.4. **Random effects absorb residual grouping variance** rather than being of direct inferential interest.:::## When to Use Mixed-Effects Models {-}Use a mixed-effects model whenever:- The same participant, speaker, or item contributes **more than one observation** to the data (repeated measures)- Observations are **nested** within higher-level units (students within classes, words within texts)- You want to **generalize beyond the specific levels** in your data (to all possible speakers, not just the 30 in your sample)- A fixed-effects model produces a **singular fit warning** suggesting random variance is being absorbed into fixed effectsDo **not** use a mixed-effects model simply to control for a nuisance variable that has only a few levels (e.g. three experimental conditions). Use fixed effects for that purpose.## Assumptions {-}Mixed-effects models share the core assumptions of their fixed-effects counterparts but add a few specific to the random effects structure:**Shared with fixed-effects regression:**- The relationship between predictors and response is correctly specified (correct functional form)- Residuals are independent *within* groups (after accounting for the random effects)- Homoscedasticity (for linear models): residual variance is constant- No severe multicollinearity among fixed-effects predictors**Specific to mixed-effects models:**- **Random effects are normally distributed** with mean zero. This applies to both random intercepts and random slopes.- **Random effects and fixed-effects predictors are uncorrelated** (an assumption that can be violated; see Mundlak correction for remedies)- **Sufficient sample size at each level.** There is no universally agreed minimum, but very small numbers of observations per random effect level (e.g. fewer than 5) can make random effect variance estimates unreliable.::: {.callout-note}## How Large Does My Sample Need to Be?@bell2008multilevel showed in simulation that small numbers of observations per random-effect level (even as few as one) do not cause serious bias in fixed-effects coefficients, as long as the total number of random-effect levels is reasonably large (500 or more). However, random-effect variance estimates themselves become unreliable with very few observations per level. As a practical guideline, aim for at least 5–10 observations per random-effect level for reliable variance estimates, and at least 20–30 random-effect levels for the random-effect structure to be worthwhile.:::## Which Model Type? {-}The choice of mixed-effects model type mirrors the choice in fixed-effects regression and depends on the **type of dependent variable**:| Dependent variable | Model type | Key function ||--------------------|------------|--------------|| Continuous (numeric) | Linear mixed-effects | `lmer()` in `lme4`; `lme()` in `nlme` || Binary (0/1, yes/no) | Logistic mixed-effects | `glmer(..., family = binomial)` || Count (non-negative integers) | Poisson / NB mixed-effects | `glmer(..., family = poisson)`; `glmer.nb()` || Ordered categorical (Likert) | Ordinal mixed-effects | `clmm()` in `ordinal` || Unordered categorical (3+ levels) | Multinomial mixed-effects | `mblogit()` in `mclogit` |: Choosing a mixed-effects model type by dependent variable {tbl-colwidths="[30,35,35]"}::: {.callout-note collapse="true"}## ✎ Check Your Understanding — Question 1**Which of the following is the most important reason to use a mixed-effects model rather than a fixed-effects model?**a) Mixed-effects models always produce lower AIC valuesb) Mixed-effects models account for the non-independence of observations grouped within higher-level unitsc) Mixed-effects models do not require normally distributed residualsd) Mixed-effects models can handle more predictors than fixed-effects models<details><summary>**Answer**</summary>**b) Mixed-effects models account for the non-independence of observations grouped within higher-level units**This is the core motivation for mixed-effects models. When multiple observations come from the same speaker, text, or participant, they are correlated — a fixed-effects model treats them as independent, underestimates standard errors, and inflates Type I error. Mixed-effects models absorb this grouping structure through random effects, producing honest standard errors and reliable inferential tests. The other options are all incorrect: mixed-effects models do not always have lower AIC (simpler fixed-effects models can win), they still assume normally distributed random effects, and there is no advantage in the number of predictors they can accommodate.</details>:::---# Random Effects Structure {#randomeffects}::: {.callout-note}## Section Overview**What you will learn:** The difference between random intercepts and random slopes, and between nested and crossed random effects, with linguistic examples of each:::## Random Intercepts vs. Random Slopes {-}The random effects structure of a mixed-effects model can take several forms. Understanding the distinction is essential for correctly specifying your model.### Random Intercepts {-}A **random intercept** model allows each level of the grouping variable to have its own intercept — its own baseline level of the response — while sharing a common slope across all groups. This is the most commonly used random effects structure.In `lme4` notation, a random intercept for `Speaker` is written as `(1 | Speaker)`. The `1` represents the intercept.**Example:** You are studying how speech rate (words per minute) varies with utterance length across speakers. A random intercept model gives each speaker their own baseline speech rate while estimating a single common effect of utterance length:```SpeechRate ~ UtteranceLength + (1 | Speaker)```### Random Slopes {-}A **random slope** model allows the effect of a predictor to vary across levels of the grouping variable — each group has its own slope for that predictor. Random slopes can be added with or without random intercepts, but in practice they are almost always combined with random intercepts.In `lme4` notation, a random slope for `UtteranceLength` by `Speaker` is written as `(1 + UtteranceLength | Speaker)` or equivalently `(UtteranceLength | Speaker)`.**Example:** You suspect that the effect of utterance length on speech rate may be stronger for some speakers than others — some speakers slow down substantially for long utterances, others do not. A random slope model allows this:```SpeechRate ~ UtteranceLength + (1 + UtteranceLength | Speaker)``````{r randomfig, echo=FALSE, eval=TRUE, message=FALSE, warning=FALSE}# Generate illustrative data for random intercept / slope figuresset.seed(42)Height <- c(169, 171, 164, 160, 158, 173, 166, 161, 180, 187, 170, 177, 163, 161, 157)Weight <- c(68, 67, 65, 66, 64, 80, 75, 70, 85, 92, 86, 87, 85, 82, 80)Group <- c("a","a","a","a","a","b","b","b","b","b","c","c","c","c","c")tb <- data.frame(Height, Weight, Group)m1 <- lm(Weight ~ Height + Group, data = tb)m2 <- lmer(Weight ~ Height + (1 | Group), data = tb)tb <- tb |> dplyr::mutate(PWeight = predict(m1, tb), PWeight_lme = predict(m2, tb))p_fixed <- ggplot(tb, aes(Height, Weight)) + geom_abline(intercept = summary(m2)$coefficients[1], slope = summary(m2)$coefficients[2], color = "orange", linewidth = .75) + geom_point(size = 2) + theme_bw() + ggtitle("Fixed-Effects Model\n(single intercept + single slope)")p_rint <- ggplot(tb, aes(Height, Weight)) + geom_point(size = 2, aes(shape = Group, color = Group)) + geom_abline(intercept = fixef(m2)[1], slope = fixef(m2)[2], color = "orange", linewidth = .75) + geom_smooth(method = "lm", se = FALSE, aes(x = Height, y = PWeight, color = Group), linewidth = .5) + theme_bw() + theme(legend.position = "none") + ggtitle("Random Intercepts\n(separate intercept per group;\ncommon slope)")m3 <- lmer(Weight ~ Height + (1 + Height | Group), data = tb, control = lmerControl(optimizer = "bobyqa"))tb$PWeight_rs <- predict(m3, tb)p_rslope <- ggplot(tb, aes(Height, Weight)) + geom_smooth(se = FALSE, method = "lm", linewidth = .5, aes(y = PWeight_rs, color = Group)) + geom_abline(intercept = fixef(m3)[1], slope = fixef(m3)[2], color = "orange", linewidth = .75) + geom_point(size = 2, aes(shape = Group, color = Group)) + theme_bw() + theme(legend.position = "none") + ggtitle("Random Intercepts + Random Slopes\n(separate intercept AND slope\nper group)")ggpubr::ggarrange(p_fixed, p_rint, p_rslope, ncol = 3)```The orange line in each panel shows the overall (population-level) regression line. In the random-intercept panel, the group lines are parallel — they share the same slope but differ in their starting point. In the random-intercepts-and-slopes panel, the lines are no longer parallel — both the intercept and the direction of the effect differ by group.::: {.callout-tip}## Which Random Effects Structure Should I Use?@winter2019statistics [241–244] provides an excellent practical guide. The short version:- Start with the **maximal random effects structure** justified by the design — that is, include random slopes for every within-group predictor [@barr2013random]- If the maximal model fails to converge or produces a singular fit, **simplify** by removing random slope correlations first (use `(1 + x || Group)` instead of `(1 + x | Group)`), then remove random slopes starting with those explaining the least variance- Always **justify your final random effects structure** explicitly when reporting results:::## Nested vs. Crossed Random Effects {-}A second important distinction is whether random effects are **nested** or **crossed**.### Nested Random Effects {-}Random effects are **nested** when the levels of one grouping variable are entirely contained within levels of another. For example, students are nested within classes — no student belongs to more than one class.In `lme4` notation, students nested within classes is written as `(1 | Class/Student)` or equivalently `(1 | Class) + (1 | Class:Student)`.**Linguistic example:** Words are nested within texts, which are nested within authors:```FrequencyEffect ~ WordLength + (1 | Author/Text)```### Crossed Random Effects {-}Random effects are **crossed** when every level of one grouping variable appears in combination with every (or most) levels of another. In a typical psycholinguistic experiment, every participant sees every item — participants and items are fully crossed.In `lme4` notation, crossed random effects for participants and items are written as two separate terms:```RT ~ Condition + (1 | Participant) + (1 | Item)```::: {.callout-important}## A Common MistakeMany researchers who should be fitting **crossed** random effects for participants and items mistakenly aggregate over items and fit only a by-participant random effect (or vice versa). @clark1973language famously demonstrated that ignoring item-level random effects inflates Type I error substantially. Always fit random effects for both participants **and** items in psycholinguistic experiments.:::**Summary of random effects structure notation in `lme4`:**| Structure | Formula | Example ||-----------|---------|---------|| Random intercept for `G` | `(1 \| G)` | `(1 \| Speaker)` || Random slope for `X` by `G` (with correlation) | `(1 + X \| G)` | `(1 + Age \| Speaker)` || Random slope for `X` by `G` (no correlation) | `(1 + X \|\| G)` | `(1 + Age \|\| Speaker)` || Nested: levels of `G2` within `G1` | `(1 \| G1/G2)` | `(1 \| Author/Text)` || Crossed: separate random effects | `(1 \| G1) + (1 \| G2)` | `(1 \| Participant) + (1 \| Item)` |: Random effects structure notation in lme4 {tbl-colwidths="[30,35,35]"}## Manual Contrast Coding {#contrasts}::: {.callout-note}## Why This Section MattersContrast coding determines **how categorical predictors are parameterised** — that is, what the model intercept and each coefficient actually represent. R applies treatment (dummy) contrasts by default, which is appropriate for many observational studies but can be misleading or suboptimal in **experimental designs**. Choosing the right contrast scheme before fitting your model is not optional: it affects the interpretation of every coefficient, including interactions.:::When you include a categorical fixed effect in a mixed-effects model, R converts it into one or more numeric columns behind the scenes — the contrast matrix. The default in R is **treatment coding** (also called dummy coding), where one level is chosen as the reference and all other levels are compared to it. But this is only one option among several, each with different interpretations.### Treatment (Dummy) Coding {-}**Default in R.** One level is the reference (intercept = mean of the reference group); each coefficient is the difference between that level and the reference.```{r contrast_treatment, message=FALSE, warning=FALSE}# Default treatment contrasts for a 3-level factorfactor_example <- factor(c("A", "B", "C"))contr.treatment(levels(factor_example))```The intercept estimates the mean of level A; the coefficient for B is the A→B difference; the coefficient for C is the A→C difference.**Best for:** observational studies with a natural reference category (e.g. a control group, native speakers vs. L2 learners).### Sum (Deviation) Coding {-}Each coefficient represents the deviation of a group from the **grand mean**. The intercept estimates the grand mean (unweighted). The last level is not explicitly estimated but is the negative sum of all other deviations.```{r contrast_sum, message=FALSE, warning=FALSE}contr.sum(3) # for a 3-level factor```**Best for:** experimental designs where no single level is a natural baseline; allows interpretation of main effects independently of arbitrary reference level choices. **Particularly useful for interactions**: with sum coding, the coefficient for a main effect represents its effect averaged over the other factor's levels, not just at the reference level.::: {.callout-tip}## Sum Coding and InteractionsWhen a model includes an interaction between two factors, the interpretation of the main-effect coefficients changes depending on the contrast scheme:- **Treatment coding:** main-effect coefficients represent effects at the *reference level* of the other factor — rarely the quantity of interest- **Sum coding:** main-effect coefficients represent effects *averaged across* levels of the other factor — usually more informativeFor factorial experimental designs, sum coding is strongly preferred.:::### Helmert Coding {-}Each level is compared to the **mean of all subsequent levels**. Useful when the factor has a natural ordering (e.g. proficiency levels: Beginner, Intermediate, Advanced) and you want to test sequential step contrasts.```{r contrast_helmert, message=FALSE, warning=FALSE}contr.helmert(4) # for a 4-level factor```### Custom / Planned Contrast Coding {-}For experimental studies with specific theoretical predictions, manually specified contrasts give maximum interpretive precision. Each contrast compares exactly the groups you are interested in, and the coefficients directly answer your research questions.**Example:** Suppose you have four conditions — Baseline, Low-Priming, High-Priming, and Control — and your theoretical predictions are:1. Any priming vs. Baseline2. High-Priming vs. Low-Priming3. Control vs. Baseline```{r contrast_custom, message=FALSE, warning=FALSE}# 4-level factor: Baseline, Low, High, Control# Define three orthogonal planned contrastsmy_contrasts <- matrix( c( -3, 1, 1, 1, # Contrast 1: Baseline vs. (Low + High + Control) 0, -1, 1, 0, # Contrast 2: Low vs. High (priming gradient) -1, 0, 0, 1 # Contrast 3: Baseline vs. Control ), ncol = 3)colnames(my_contrasts) <- c("BasevOther", "LowvHigh", "BasevControl")rownames(my_contrasts) <- c("Baseline", "Low", "High", "Control")# Check orthogonality: all pairwise dot products should be 0t(my_contrasts) %*% my_contrasts # off-diagonal should be 0``````{r contrast_apply, message=FALSE, warning=FALSE}# Apply to a factor in your data frame# (illustration — adapt column name and data frame to your own data)# mydata$Condition <- factor(mydata$Condition,# levels = c("Baseline", "Low", "High", "Control"))# contrasts(mydata$Condition) <- my_contrasts## Then fit as normal:# lmer(RT ~ Condition + (1 | Participant) + (1 | Item), data = mydata)```::: {.callout-important}## Rules for Valid Contrast MatricesFor a factor with $k$ levels, you can specify at most $k - 1$ contrasts. For contrasts to be interpretable as independent tests:1. **Each contrast must sum to zero** across all groups (positive weights cancel negative weights)2. For **orthogonal contrasts**: the dot product of any two contrast vectors must equal zero — this ensures each contrast tests a unique piece of variance and p-values are independent3. Orthogonality is not strictly required, but correlated contrasts inflate Type I error for the correlated testsPlanned contrasts should always be specified **before looking at the data**. Post-hoc contrast selection based on observed patterns requires appropriate correction (e.g. Tukey HSD via `emmeans`).:::### Applying Contrasts Before Model Fitting {-}```{r contrast_workflow, message=FALSE, warning=FALSE}# General workflow for applying contrasts in any mixed-effects model# Step 1: Create factor with explicitly ordered levels# mydata$Condition <- factor(mydata$Condition,# levels = c("Reference", "LevelB", "LevelC"))# Step 2: Apply contrast matrix or built-in scheme# contrasts(mydata$Condition) <- contr.sum(3) # sum coding# contrasts(mydata$Condition) <- contr.helmert(3) # helmert coding# contrasts(mydata$Condition) <- my_contrasts # custom planned contrasts# Step 3: Verify what was applied# contrasts(mydata$Condition)# Step 4: Fit model — contrasts are applied automatically# m1 <- lmer(Response ~ Condition + (1 + Condition | Participant) + (1 | Item),# data = mydata, REML = TRUE)```::: {.callout-note}## Global vs. Local Contrast SettingR allows you to change contrasts globally using `options(contrasts = c("contr.sum", "contr.poly"))`. This applies sum coding to all unordered factors and polynomial coding to ordered factors for the rest of the session.A safer approach is to set contrasts **locally on individual factor columns** using `contrasts(mydata$Factor) <- ...`. This makes your contrast choices explicit and reproducible, and avoids unintended effects on other models fitted in the same session.:::**In a study where 40 participants each respond to 20 experimental items (every participant sees every item), which random effects structure is most appropriate?**a) `(1 | Participant)` — random intercept for participant onlyb) `(1 | Item)` — random intercept for item onlyc) `(1 | Participant) + (1 | Item)` — crossed random intercepts for participant and itemd) `(1 | Participant/Item)` — items nested within participants<details><summary>**Answer**</summary>**c) `(1 | Participant) + (1 | Item)` — crossed random intercepts for participant and item**Because every participant sees every item, participants and items are *crossed*, not nested. Crossed random effects are specified as two separate terms in the formula. Option (a) ignores item-level variance and inflates Type I error (the "language as fixed effect" fallacy described by Clark 1973). Option (b) ignores participant-level variance. Option (d) would be correct if each item belonged to only one participant (nested design), which is not the case here.</details>:::---# Linear Mixed-Effects Regression {#lmm}::: {.callout-note}## Section Overview**What you will learn:** How to fit, evaluate, and report a linear mixed-effects model using the preposition frequency data; includes random effect selection, fixed-effect model building, model diagnostics, effect size extraction, and a complete write-up template:::## Introduction {-}Linear mixed-effects models extend ordinary multiple linear regression to hierarchical data where the **response variable is continuous and approximately normally distributed**. The formal representation of a model with random intercepts is:$$f_{(x)} = \alpha_i + \beta x + \epsilon$$where $\alpha_i$ is a group-specific intercept for group $i$ (the population intercept plus the random deviation for that group), $\beta$ is the fixed-effects slope, and $\epsilon$ is the residual error. When the model also has random slopes:$$f_{(x)} = \alpha_i + \beta_i x + \epsilon$$where both the intercept $\alpha_i$ and the slope $\beta_i$ vary by group.**Advantages of linear mixed-effects models** over simpler alternatives:- **Multivariate:** test the effect of several predictors simultaneously while controlling for all others- **Handle dependence:** explicitly model within-speaker or within-text correlation rather than ignoring it- **Rich diagnostics:** enable checking of multicollinearity, homoscedasticity, and normality of residuals- **Generalisability:** random effects allow generalisation beyond the specific speakers/texts in your sample**Disadvantages:**- Prone to high Type II error ($\beta$-errors) in small samples [@johnson2009getting]- Require relatively large data sets for stable estimates- Model selection and random effects specification require careful thought## Example: Preposition Use Across Time by Genre {-}We use a data set containing the relative frequency of prepositions per 1,000 words in English texts written between 1150 and 1913, across multiple genres and regions. The research question is: **does the frequency of prepositions change systematically over time, and does this change differ by genre?**### Loading and Inspecting the Data {-}```{r lmm3, message=FALSE, warning=FALSE}# load datalmmdata <- base::readRDS("tutorials/regression/data/lmd.rda", "rb") |> dplyr::mutate(Date = as.numeric(Date))``````{r lmm3b, echo=FALSE, eval=TRUE, message=FALSE, warning=FALSE}lmmdata |> as.data.frame() |> head(15) |> flextable() |> flextable::set_table_properties(width = .75, layout = "autofit") |> flextable::theme_zebra() |> flextable::fontsize(size = 11) |> flextable::align_text_col(align = "center") |> flextable::set_caption(caption = "First 15 rows of the lmmdata.") |> flextable::border_outer()```The data set contains: `Date` (year of composition), `Genre`, `Text` (text name), `Prepositions` (relative frequency per 1,000 words), and `Region`.### Exploratory Visualisation {-}```{r lmm4, message=FALSE, warning=FALSE}p1 <- ggplot(lmmdata, aes(x = Date, y = Prepositions)) + geom_point(alpha = .4) + geom_smooth(method = "lm", se = TRUE, color = "red", linetype = "dashed") + theme_bw() + labs(y = "Prepositions per 1,000 words", x = "Year of composition", title = "Preposition frequency over time")p2 <- ggplot(lmmdata, aes(x = reorder(Genre, -Prepositions), y = Prepositions)) + geom_boxplot(fill = "gray90") + theme_bw() + theme(axis.text.x = element_text(angle = 90, hjust = 1)) + labs(x = "Genre", y = "Prepositions per 1,000 words", title = "Preposition frequency by genre")p3 <- ggplot(lmmdata, aes(Prepositions)) + geom_histogram(fill = "gray70", color = "white", bins = 20) + theme_bw() + labs(y = "Count", x = "Frequency (prepositions)", title = "Distribution of preposition frequency")gridExtra::grid.arrange(grobs = list(p1, p2, p3), widths = c(1, 1), layout_matrix = rbind(c(1,1), c(2,3)))```The scatter plot suggests a modest upward trend over time. The boxplots reveal substantial genre differences — some genres consistently use more prepositions than others. The histogram confirms that preposition frequency is approximately normally distributed, supporting the use of a linear model.```{r lmm6, message=FALSE, warning=FALSE}ggplot(lmmdata, aes(Date, Prepositions)) + geom_point(alpha = .4) + facet_wrap(~ Genre, nrow = 4) + geom_smooth(method = "lm", se = TRUE) + theme_bw() + labs(x = "Year of composition", y = "Prepositions per 1,000 words") + coord_cartesian(ylim = c(0, 220))```The faceted plot shows that the temporal trend differs substantially across genres — this is an argument for including Genre as a random effect (it absorbs genre-level baseline differences) while estimating a single population-level temporal trend.### Centering the Date Variable {-}Centering numeric predictors (subtracting the mean) is strongly recommended before fitting mixed-effects models. Without centering, the model intercept is the predicted value at year 0, which is meaningless for this data. After centering, the intercept represents the predicted value at the **mean year in the sample**, which is interpretable.```{r lmm7, message=FALSE, warning=FALSE}lmmdata$DateUnscaled <- lmmdata$Datelmmdata$Date <- scale(lmmdata$Date, scale = FALSE) # center but do not scalecat("Mean of original Date:", round(mean(lmmdata$DateUnscaled), 1), "\n")cat("Mean of centered Date:", round(mean(lmmdata$Date), 6), "\n")```::: {.callout-tip}## Centering vs. Scaling**Centering** (subtracting the mean) makes the intercept interpretable — it is now the predicted response at the mean value of the predictor rather than at zero.**Scaling** (dividing by the standard deviation, in addition to centering) makes coefficients comparable across predictors measured on different scales. Use `scale(x, scale = TRUE)` for this. Scaling is particularly useful when you want to compare the relative importance of predictors or when predictors are on very different measurement scales.For this example we center only, as the date variable is on a natural scale (years) and we want to interpret the coefficient in those units.:::### Step 1: Testing Whether Random Effects Are Warranted {-}Before adding any fixed effects, we test whether including a random effect structure is statistically justified. We compare an intercept-only fixed-effects model (`glm`) with an intercept-only mixed-effects model that includes Genre as a random intercept (`lmer`).```{r lmm8, message=FALSE, warning=FALSE}# intercept-only fixed-effects baselinem0.glm <- glm(Prepositions ~ 1, family = gaussian, data = lmmdata)# intercept-only mixed-effects baseline (Genre as random intercept)m0.lmer <- lmer(Prepositions ~ 1 + (1 | Genre), REML = TRUE, data = lmmdata)``````{r lmm9a, message=FALSE, warning=FALSE}AIC(logLik(m0.glm))AIC(logLik(m0.lmer))```The AIC of the mixed-effects model is substantially lower than the AIC of the fixed-effects model, confirming that adding Genre as a random effect is warranted.### Step 2: Choosing the Random Effects Structure {-}Now we compare two candidate random effects structures: random intercepts only vs. random intercepts + random slopes for Date. We use `REML = TRUE` for this comparison, as recommended when comparing random effects structures [@pinheiro2000mixedmodels; @winter2019statistics].```{r lmm9b, message=FALSE, warning=FALSE}ma.lmer <- lmer(Prepositions ~ Date + (1 | Genre), REML = TRUE, data = lmmdata)mb.lmer <- lmer(Prepositions ~ Date + (1 + Date | Genre), REML = TRUE, data = lmmdata)anova(ma.lmer, mb.lmer, test = "Chisq", refit = FALSE)```The model with random slopes for Date by Genre fits significantly better (lower AIC, significant χ² test). In a real analysis we would proceed with the random-intercepts-and-slopes model. For clarity of exposition in this tutorial, we continue with the simpler random-intercepts-only model (`ma.lmer`).::: {.callout-warning}## REML vs. ML for Model ComparisonUse **REML** (`REML = TRUE`) when comparing models that **differ only in their random effects structure** (same fixed effects, different random effects).Use **ML** (`REML = FALSE`) when comparing models that **differ in their fixed effects** structure. This is because REML estimates are conditioned on the fixed effects, making REML-based likelihood ratios invalid for fixed-effects comparisons.When using `anova()` to compare `lmer` models with different fixed effects, set `refit = TRUE` (the default) so `lme4` automatically refits using ML before comparing.:::### Step 3: Fixed-Effects Model Building {-}With the random effects structure decided, we now test which fixed effects to include. We compare models using likelihood ratio tests (`anova()` with `refit = TRUE` so models are refitted with ML).```{r lmm10, message=FALSE, warning=FALSE}# add Date as fixed effectm1.lmer <- lmer(Prepositions ~ (1 | Genre) + Date, REML = TRUE, data = lmmdata)anova(m1.lmer, m0.lmer, test = "Chi")```The model with Date fits significantly better. Date correlates significantly with preposition frequency.```{r lmm11, message=FALSE, warning=FALSE}# add Regionm2.lmer <- update(m1.lmer, .~. + Region)car::vif(m2.lmer)anova(m2.lmer, m1.lmer, test = "Chi")```Three indicators show that Region should not be included: AIC does not decrease, BIC increases, and p > .05. We check whether Region is involved in a significant interaction before finalising the model.```{r lmm12, message=FALSE, warning=FALSE}m3.lmer <- update(m1.lmer, .~. + Region * Date)car::vif(m3.lmer)anova(m3.lmer, m1.lmer, test = "Chi")```The interaction is also non-significant. Our **final minimal adequate model** contains only Date as a fixed effect.```{r lmm13, message=FALSE, warning=FALSE}summary(m1.lmer)```### Step 4: Model Diagnostics with `performance` {-}The `performance` package provides a comprehensive, one-call diagnostic suite for mixed-effects models. `check_model()` produces a panel of six diagnostic plots: linearity of residuals, homoscedasticity, influential observations (Cook's distance), normality of residuals, normality of random effects, and multicollinearity. This replaces the need to manually call multiple base-R plotting functions.```{r lmm14, message=FALSE, warning=FALSE, fig.height=10, fig.width=10}# Comprehensive model diagnostics in one callperformance::check_model(m1.lmer)```The panels are interpreted as follows:- **Linearity (top-left):** The reference line should be approximately flat and horizontal — systematic curvature indicates a misspecified functional form- **Homoscedasticity (top-right):** Points should be evenly spread across fitted values with no funnel shape — a widening spread indicates heteroscedasticity- **Influential observations (middle-left):** Points with Cook's distance > 0.5 (marked with a dashed line) may be unduly influencing the model- **Normality of residuals (middle-right):** Points should follow the diagonal line closely- **Normality of random effects (bottom-left):** The random-effect BLUPs (Best Linear Unbiased Predictors) should follow the diagonal — deviations indicate departures from the normality assumption for random effects- **Multicollinearity (bottom-right):** VIF (Variance Inflation Factor) values above 5–10 indicate problematic collinearity among fixed-effects predictorsWe can also extract individual diagnostic statistics:```{r lmm14b, message=FALSE, warning=FALSE}# Check homoscedasticity formallyperformance::check_heteroscedasticity(m1.lmer)# Check normality of residualsperformance::check_normality(m1.lmer)# Check for influential observationsperformance::check_outliers(m1.lmer)# Check multicollinearityperformance::check_collinearity(m1.lmer)```Some genres show more variability than others (particularly Letters). We test formally for heteroscedasticity across genres using Levene's test as a supplementary check.```{r lmm15b, message=FALSE, warning=FALSE}car::leveneTest(lmmdata$Prepositions, lmmdata$Genre, center = mean)```::: {.callout-note}## Levene's Test: CaveatsLevene's test for homogeneity of variance has well-known problems: it lacks power with small samples (too lax) and is overly sensitive with large samples (too strict). Use `performance::check_heteroscedasticity()` and `check_model()` as the primary tools, and treat Levene's test as a supplementary check only. @pinheiro2000mixedmodels [175–180] provide more nuanced guidance.:::Levene's test confirms heteroscedasticity across genres. We therefore switch to a weighted model using the `lme` function from `nlme`, which supports observation-level weights.### Step 5: Handling Heteroscedasticity with Weights {-}```{r lmm16, message=FALSE, warning=FALSE}# unweighted lme modelm4.lme <- lme(Prepositions ~ Date, random = ~1 | Genre, data = lmmdata, method = "ML")# weighted model: allow variance to differ by Genrem5.lme <- update(m4.lme, weights = varIdent(form = ~1 | Genre))anova(m5.lme, m4.lme)```The weighted model fits significantly better. We adopt it as our final model and inspect its parameters.```{r lmm17, message=FALSE, warning=FALSE}summary(m5.lme)``````{r lmm18, message=FALSE, warning=FALSE}anova(m5.lme)``````{r lmm19, message=FALSE, warning=FALSE}# confirm that weighted model outperforms intercept-only baselinem0.lme <- lme(Prepositions ~ 1, random = ~1 | Genre, data = lmmdata, method = "ML", weights = varIdent(form = ~1 | Genre))anova(m0.lme, m5.lme)``````{r lmm20, message=FALSE, warning=FALSE}intervals(m5.lme, which = "fixed")```### Step 6: Effect Sizes and Visualisation {-}We extract marginal and conditional R² values using `MuMIn::r.squaredGLMM()`. The **marginal R²** represents the variance explained by the fixed effects alone; the **conditional R²** represents the variance explained by the full model (fixed + random effects) [@barton2020mumin].```{r lmm21c, message=FALSE, warning=FALSE}MuMIn::r.squaredGLMM(m1.lmer)```We display the model summary table using `sjPlot::tab_model()`.```{r lmm21g, message=FALSE, warning=FALSE}sjPlot::tab_model(m5.lme)```::: {.callout-warning}## R² Values in sjPlotThe conditional R² value may be missing or incorrect in the `sjPlot` summary table for `lme` objects. Always use `r.squaredGLMM()` from `MuMIn` for reliable marginal and conditional R² values for mixed-effects models.:::```{r lmm21d, message=FALSE, warning=FALSE}sjPlot::plot_model(m5.lme, type = "pred", terms = c("Date")) + scale_x_continuous(name = "Year of composition", breaks = seq(-500, 300, 100), labels = seq(1150, 1950, 100)) + labs(title = "Predicted preposition frequency by year", y = "Prepositions per 1,000 words") + theme_bw()``````{r lmm21b, message=FALSE, warning=FALSE}lmmdata$Predicted <- predict(m5.lme, lmmdata)ggplot(lmmdata, aes(DateUnscaled, Predicted)) + facet_wrap(~ Genre) + geom_point(aes(x = DateUnscaled, y = Prepositions), color = "gray80", size = .5) + geom_smooth(aes(y = Predicted), color = "gray20", method = "lm", se = TRUE) + theme_bw() + labs(x = "Year of composition", y = "Prepositions per 1,000 words", title = "Observed (grey) vs. predicted (line) preposition frequency by genre")```### Step 7: Final Diagnostics {-}We run the full `performance` diagnostic panel on the weighted `lme` model. Note that `check_model()` supports `lme` objects from `nlme` directly.```{r lmm22, message=FALSE, warning=FALSE, fig.height=10, fig.width=10}performance::check_model(m5.lme)``````{r lmm22b, message=FALSE, warning=FALSE}# Model performance summary: R², ICC, RMSEperformance::model_performance(m5.lme)# Intraclass Correlation Coefficient — proportion of variance at group levelperformance::icc(m5.lme)```::: {.callout-tip}## The Intraclass Correlation Coefficient (ICC)The **ICC** from `performance::icc()` estimates what proportion of the total variance in the response is attributable to between-group differences (here, between genres). An ICC of .40 means 40% of variance in preposition frequency is due to genre differences, with the remaining 60% varying within genres. A high ICC reinforces the importance of including the grouping variable as a random effect.:::The diagnostic panels are satisfactory: residuals form an unstructured cloud with no funnel shape (no heteroscedasticity remaining after weighting), the Q–Q plot is approximately linear, and no data points show excessively high Cook's distance.### Reporting Results {-}```{r lmm27, message=FALSE, warning=FALSE}summary(m5.lme)```::: {.callout-note}## Write-Up Template: Linear Mixed-Effects ModelA mixed-effects linear regression model was fitted to predict the frequency of prepositions per 1,000 words. Genre was included as a random effect (random intercepts). Observation-level weights (`varIdent`) were used to account for heteroscedasticity across genres; inclusion of weights significantly improved model fit (χ²(2) = 39.17, p = .0006). The final minimal adequate model was selected through a step-up procedure comparing nested models with likelihood ratio tests. The model significantly outperformed an intercept-only baseline (χ²(1) = 12.44, p = .0004). Date of composition was the only significant fixed-effect predictor: preposition frequency increased marginally but significantly over time (Estimate = 0.02, 95% CI [0.01, 0.03], p < .001). Neither region of composition nor a Date × Genre interaction significantly improved model fit. Marginal R² = .017; conditional R² = .432, indicating that the random effect structure (genre) accounts for the majority of the variance explained by the full model.:::::: {.callout-note collapse="true"}## ✎ Check Your Understanding — Question 3**A researcher fits a linear mixed-effects model and finds that the marginal R² is .03 while the conditional R² is .55. What does this pattern indicate?**a) The fixed effects explain most of the variance; the random effects add littleb) The model has a very poor fit overallc) The random effects (grouping structure) account for the majority of the variance, while the fixed effects explain only a small proportiond) The model violates the assumption of normally distributed residuals<details><summary>**Answer**</summary>**c) The random effects (grouping structure) account for the majority of the variance, while the fixed effects explain only a small proportion**Marginal R² reflects the variance explained by fixed effects alone (.03 = 3%). Conditional R² reflects the variance explained by the entire model, including both fixed and random effects (.55 = 55%). The difference (.52) is the variance attributable to the random effects. This pattern — low marginal, high conditional R² — is common when grouping structure (e.g. genre, speaker, site) is a strong source of variability in the data but the specific fixed-effects predictors tested do not capture it fully. It does not indicate a poor fit or assumption violation.</details>:::---# Mixed-Effects Binomial Logistic Regression {#blmm}::: {.callout-note}## Section Overview**What you will learn:** How to fit and interpret a mixed-effects logistic regression model for binary outcomes; includes testing the random effect, model selection with `glmulti`, visualisation of effects, model fit statistics, and a complete reporting template:::## Introduction {-}Mixed-effects binomial logistic regression extends fixed-effects logistic regression to hierarchical data with a **binary dependent variable** (yes/no, present/absent, 0/1). Everything about fixed-effects logistic regression — the logit link, odds ratios, the interpretation of coefficients — applies here. The addition of random effects handles the non-independence of observations that come from the same speaker or item.Just as with linear mixed-effects models, random effects in logistic models shift the **intercept** of the logistic curve — and optionally also its **slope** — for each level of the grouping variable:```{r blmm1, echo=FALSE, message=FALSE, warning=FALSE, fig.height=5}set.seed(1)x1 <- seq(62, 86, length.out = 29)yn <- c(rep(0, 14), rep(1, 15))shifts_int <- c(-4, -2, 0, 2, 4, 6, -6)slopes_mult <- c(0.5, 0.8, 1.0, 1.2, 1.5, 1.8, 0.3)logd <- data.frame(x1 = x1, yn = yn)make_logistic <- function(x, a, b) 1 / (1 + exp(-(a + b * x)))x_seq <- seq(60, 90, length.out = 200)base_a <- -10; base_b <- 0.15p_fixed <- ggplot() + stat_function(fun = function(x) make_logistic(x, base_a, base_b), color = "red", linewidth = 1) + labs(title = "Fixed-Effects Model\n(1 intercept, 1 slope)", x = "", y = "Probability") + xlim(60, 90) + ylim(0, 1) + theme_bw() + theme(plot.title = element_text(size = 9))p_rint <- ggplot() + lapply(shifts_int, function(s) stat_function(fun = function(x) make_logistic(x, base_a + s, base_b), color = ifelse(s == 0, "red", "gray70"), linewidth = ifelse(s == 0, 1, .5))) + labs(title = "Random Intercepts\n(separate intercept per group)", x = "", y = "") + xlim(60, 90) + ylim(0, 1) + theme_bw() + theme(plot.title = element_text(size = 9))p_rslope <- ggplot() + lapply(slopes_mult, function(s) stat_function(fun = function(x) make_logistic(x, base_a, base_b * s), color = ifelse(s == 1.0, "red", "gray70"), linewidth = ifelse(s == 1.0, 1, .5))) + labs(title = "Random Slopes\n(separate slope per group)", x = "", y = "Probability") + xlim(60, 90) + ylim(0, 1) + theme_bw() + theme(plot.title = element_text(size = 9))p_both <- ggplot() + lapply(seq_along(shifts_int), function(i) stat_function(fun = function(x) make_logistic(x, base_a + shifts_int[i], base_b * slopes_mult[i]), color = ifelse(shifts_int[i] == 0 & slopes_mult[i] == 1.0, "red", "gray70"), linewidth = ifelse(shifts_int[i] == 0 & slopes_mult[i] == 1.0, 1, .5))) + labs(title = "Random Intercepts + Slopes\n(separate intercept AND slope)", x = "", y = "") + xlim(60, 90) + ylim(0, 1) + theme_bw() + theme(plot.title = element_text(size = 9))ggpubr::ggarrange(p_fixed, p_rint, p_rslope, p_both, ncol = 2, nrow = 2)```## Example: Discourse *like* in Irish English {-}We investigate which factors correlate with the use of *final discourse like* (e.g. *"The weather is shite, like!"*) in Irish English. The dependent variable is binary (1 = speech unit contains a final discourse like; 0 = it does not).### Loading and Preparing the Data {-}```{r blmm3, message=FALSE, warning=FALSE}mblrdata <- base::readRDS("tutorials/regression/data/mbd.rda", "rb")``````{r blmm3b, echo=FALSE, eval=TRUE, message=FALSE, warning=FALSE}mblrdata |> as.data.frame() |> head(15) |> flextable() |> flextable::set_table_properties(width = .75, layout = "autofit") |> flextable::theme_zebra() |> flextable::fontsize(size = 11) |> flextable::align_text_col(align = "center") |> flextable::set_caption(caption = "First 15 rows of the mblrdata.") |> flextable::border_outer()``````{r blmm4, message=FALSE, warning=FALSE}vrs <- c("ID", "Age", "Gender", "ConversationType", "Priming")fctr <- which(colnames(mblrdata) %in% vrs)mblrdata[, fctr] <- lapply(mblrdata[, fctr], factor)mblrdata$Age <- relevel(mblrdata$Age, "Young")mblrdata <- mblrdata |> dplyr::arrange(ID)```### Exploratory Visualisation {-}```{r blmm8, message=FALSE, warning=FALSE}ggplot(mblrdata, aes(Gender, SUFlike, color = Priming)) + facet_wrap(Age ~ ConversationType) + stat_summary(fun = mean, geom = "point", size = 2) + stat_summary(fun.data = mean_cl_boot, geom = "errorbar", width = 0.2) + theme_bw() + theme(legend.position = "top") + labs(x = "", y = "Observed probability of discourse like", title = "Discourse like by gender, age, conversation type, and priming") + scale_color_manual(values = c("gray20", "gray70"))```Key patterns: men use discourse like more than women; primed contexts show substantially higher probability; mixed-gender conversations show slightly higher rates than same-gender conversations.### Testing the Random Effect {-}```{r blmm10, message=FALSE, warning=FALSE}options(contrasts = c("contr.treatment", "contr.poly"))mblrdata.dist <- datadist(mblrdata)options(datadist = "mblrdata.dist")m0.glm <- glm(SUFlike ~ 1, family = binomial, data = mblrdata)m0.glmer <- glmer(SUFlike ~ (1 | ID), data = mblrdata, family = binomial)``````{r blmm11, message=FALSE, warning=FALSE}AIC(logLik(m0.glmer)); AIC(logLik(m0.glm))# Likelihood ratio testnull.id <- -2 * logLik(m0.glm) + 2 * logLik(m0.glmer)pchisq(as.numeric(null.id), df = 1, lower.tail = FALSE)```Both the AIC reduction and the likelihood ratio test (p < .05) confirm that including speaker (ID) as a random intercept is warranted.### Model Selection {-}We use `glmulti` to search all possible models (main effects and two-way interactions) and select the model with the lowest BIC.```{r blmm13a, message=FALSE, warning=FALSE}glmer.glmulti <- function(formula, data, random = "", ...) { glmer(paste(deparse(formula), random), family = binomial, data = data, control = glmerControl(optimizer = "bobyqa"), ...)}form_glmulti <- as.formula("SUFlike ~ Gender + Age + ConversationType + Priming")mfit <- glmulti(form_glmulti, random = "+ (1 | ID)", data = mblrdata, method = "h", fitfunc = glmer.glmulti, crit = "bic", intercept = TRUE, marginality = FALSE, level = 2)``````{r blmm13b, message=FALSE, warning=FALSE}top <- weightable(mfit)top[1:5, ]```The best model includes Gender, ConversationType, and Priming as fixed effects (no interactions). We fit this as our final minimal adequate model.```{r blmm13c, message=FALSE, warning=FALSE}mlr.glmer <- glmer(SUFlike ~ (1 | ID) + Gender + ConversationType + Priming, family = binomial, control = glmerControl(optimizer = "bobyqa"), data = mblrdata)summary(mlr.glmer, corr = FALSE)``````{r blmm30a, message=FALSE, warning=FALSE}sigfit <- anova(mlr.glmer, m0.glmer, test = "Chi")sigfit```The final model significantly outperforms the random-effects-only baseline.### Visualising Effects {-}```{r blmm33, message=FALSE, warning=FALSE}plot_model(mlr.glmer, type = "pred", terms = c("Gender", "Priming", "ConversationType")) + theme_bw() + labs(title = "Predicted probability of discourse like", y = "Probability of discourse like")```### Model Fit Statistics {-}```{r blmm35, message=FALSE, warning=FALSE, eval=FALSE}library(Hmisc)probs <- binomial()$linkinv(fitted(mlr.glmer))somers2(probs, as.numeric(mblrdata$SUFlike))``````{r blmm41, message=FALSE, warning=FALSE}sjPlot::tab_model(mlr.glmer)``````{r blmm_r2, message=FALSE, warning=FALSE}MuMIn::r.squaredGLMM(mlr.glmer)```### Model Diagnostics {-}We use `performance::check_model()` to run the full diagnostic suite. For logistic mixed-effects models, the panel includes checks for binned residuals (the logistic equivalent of residuals vs. fitted), influential observations, normality of random effects, and multicollinearity.```{r blmm38, message=FALSE, warning=FALSE, fig.height=10, fig.width=10}performance::check_model(mlr.glmer)``````{r blmm38b, message=FALSE, warning=FALSE}# Individual checksperformance::check_collinearity(mlr.glmer)# check_outliers() requires numeric predictors; for models with only# categorical fixed effects we inspect influence via Cook's distance insteadinfl <- influence(mlr.glmer, group = "ID")cooks_d <- cooks.distance(infl)plot(cooks_d, type = "h", ylab = "Cook's distance", xlab = "Speaker index", main = "Cook's distance by speaker (random-effect level)")abline(h = 4 / length(cooks_d), lty = 2, col = "firebrick") # rule-of-thumb threshold# Model performance summary including AUC (area under the ROC curve)performance::model_performance(mlr.glmer)# ICC: proportion of variance at speaker levelperformance::icc(mlr.glmer)```::: {.callout-tip}## Binned Residuals for Logistic ModelsFor logistic regression models, raw residuals are not informative because they are binary (0 or 1 minus the predicted probability). `performance::check_model()` automatically uses **binned residuals** — the average residual within bins of predicted probabilities — which should scatter randomly around zero with approximately 95% of points falling within the error bounds (shown as dashed lines). Systematic patterns above or below the bounds indicate model misspecification.:::### Reporting Results {-}```{r blme_report, message=FALSE, warning=FALSE}report::report(mlr.glmer)```::: {.callout-note}## Write-Up Template: Mixed-Effects Logistic RegressionA mixed-effects binomial logistic regression model was fitted to predict the use of final discourse *like* in Irish English. Speaker (ID) was included as a random effect (random intercept). Inclusion of the random effect was justified by a likelihood ratio test comparing the mixed-effects baseline to a fixed-effects intercept-only model (p < .05) and by a lower AIC. All possible main effects and two-way interactions among Gender, Age, ConversationType, and Priming were evaluated using exhaustive model search with the `glmulti` package (BIC criterion). The final minimal adequate model included Gender, ConversationType, and Priming as fixed effects and significantly outperformed the random-effects baseline (χ²(3) = *X*, p < .001). Women used discourse like significantly less than men (β = −0.64, 95% CI [−0.99, −0.30], p < .001). Speakers in same-gender conversations used discourse like significantly less than in mixed-gender conversations (β = −0.54, 95% CI [−0.83, −0.24], p < .001). Priming was the strongest predictor: primed speech units showed substantially higher rates of discourse like (β = 1.87, 95% CI [1.55, 2.19], p < .001). Marginal R² = .13; conditional R² = .15.:::::: {.callout-note collapse="true"}## ✎ Check Your Understanding — Question 4**A mixed-effects logistic regression model predicting a binary linguistic feature gives a marginal R² of .13 and a conditional R² of .15. What can you conclude about the role of the random effect?**a) The random effect (speaker) explains most of the variance in the modelb) The random effect adds very little explanatory power beyond the fixed effectsc) The model has a poor fit and should be re-specifiedd) The fixed effects and random effect together explain 28% of variance<details><summary>**Answer**</summary>**b) The random effect adds very little explanatory power beyond the fixed effects**The difference between conditional R² (.15) and marginal R² (.13) is only .02, meaning the random effect structure (by-speaker random intercepts) accounts for approximately 2 percentage points of additional variance. This is in contrast to the linear mixed-effects model example above, where random effects explained a much larger share. Both scenarios are valid — it simply means that individual speaker differences are not a very strong source of variability for discourse *like* use once the fixed-effects predictors are included. The random effects are still necessary to ensure that standard errors are not underestimated due to the non-independence of speech units within speakers.</details>:::---# Mixed-Effects Poisson and Negative Binomial Regression {#countmodels}::: {.callout-note}## Section Overview**What you will learn:** How to choose between Poisson, quasi-Poisson, and negative binomial mixed-effects models for count data; how to detect and address overdispersion; and how to report results for each model type:::## Introduction {-}When the dependent variable is a **count** (a non-negative integer — number of errors, occurrences of a word, instances of a discourse feature), the appropriate model family is Poisson or one of its extensions. Poisson models assume that the **mean equals the variance** — a very restrictive assumption that is frequently violated in linguistic data. When the variance exceeds the mean (overdispersion), we must use a more robust alternative.| Model | When to use | Key characteristic ||-------|------------|-------------------|| **Poisson** | Count data, variance ≈ mean | Restrictive; standard starting point || **Quasi-Poisson** | Count data, variance > mean | Scales standard errors; coefficients unchanged || **Negative binomial** | Count data, variance >> mean | Adds explicit dispersion parameter; preferred by @zuur2013beginner |: Mixed-effects count models and when to use them {tbl-colwidths="[25,45,30]"}::: {.callout-tip}## Why Not Just Use Quasi-Poisson?Quasi-Poisson models scale the **standard errors** to compensate for overdispersion, which corrects p-values. However, the **coefficients themselves** remain those of the Poisson model and may still be biased under severe overdispersion. Negative binomial models include an additional dispersion parameter that accommodates overdispersion in the model structure itself, making the coefficients more reliable. @zuur2013beginner therefore recommend negative binomial over quasi-Poisson when overdispersion is substantial.:::## Example: Filler *uhm* Use by Language and Alcohol Consumption {-}The data for this section records the number of instances of the filler *uhm* in two-minute conversations in four languages (English, German, Russian, Mandarin), along with the number of alcoholic shots consumed by each speaker before the conversation.### Loading and Preparing the Data {-}```{r pmm1, message=FALSE, warning=FALSE}countdata <- base::readRDS("tutorials/regression/data/cld.rda", "rb")``````{r pmm2, message=FALSE, warning=FALSE}countdata <- countdata |> dplyr::select(-ID) |> dplyr::mutate_if(is.character, factor)```### Exploratory Visualisation {-}```{r pmm3, message=FALSE, warning=FALSE}countdata |> dplyr::group_by(Language) |> dplyr::mutate(Mean = round(mean(Shots), 1), SD = round(sd(Shots), 1)) |> ggplot(aes(Language, Shots, color = Language, fill = Language)) + geom_violin(trim = FALSE, color = "gray20") + geom_boxplot(width = 0.1, fill = "white", color = "gray20") + geom_text(aes(y = -4, label = paste("mean:", Mean)), size = 3, color = "black") + geom_text(aes(y = -5, label = paste("SD:", SD)), size = 3, color = "black") + scale_fill_manual(values = rep("grey90", 4)) + theme_bw() + theme(legend.position = "none") + ylim(-5, 15) + labs(x = "Language", y = "Shots consumed", title = "Distribution of shots consumed by language")```### Variable Selection with Boruta {-}```{r pmm4, message=FALSE, warning=FALSE}set.seed(sum(utf8ToInt("boruta")))boruta <- Boruta(UHM ~ ., data = countdata)print(boruta)```Boruta identifies Shots as the only confirmed important predictor. Language is tentative. We include Language as a random effect for theoretical reasons (speakers are nested within language communities) even though it does not reach confirmed status.### Checking for Overdispersion {-}Before choosing between Poisson and its alternatives, we check whether the count data is plausibly Poisson-distributed.```{r pmm5, message=FALSE, warning=FALSE}gf <- goodfit(countdata$UHM, type = "poisson", method = "ML")plot(gf, main = "Count data vs. Poisson distribution")summary(gf)```The goodness-of-fit test (p < .05) and the visual inspection both indicate that the data deviate from a Poisson distribution — the variance is greater than the mean (overdispersion). We will fit all three model types for comparison.## Mixed-Effects Poisson Regression {-}```{r pmm7, message=FALSE, warning=FALSE}m0.glmer.p <- glmer(UHM ~ 1 + (1 | Language), data = countdata, family = poisson, control = glmerControl(optimizer = "bobyqa"))m1.glmer.p <- update(m0.glmer.p, .~. + Shots)car::Anova(m1.glmer.p, test = "Chi")``````{r pmm8, message=FALSE, warning=FALSE}summary(m1.glmer.p)```::: {.callout-warning}## Singular Fit WarningThe model above produces a singular fit warning (Language variance ≈ 0). This means the random effect is not explaining any additional variance, and the mixed-effects model collapses to the fixed-effects equivalent. In a real analysis, this should lead you to remove the random effect and report a fixed-effects model instead. We continue here purely for illustration.:::#### Checking Overdispersion in the Poisson Model {-}```{r pmm9, message=FALSE, warning=FALSE}PearsonResiduals <- resid(m1.glmer.p, type = "pearson")Cases <- nrow(countdata)NumberOfPredictors <- length(fixef(m1.glmer.p)) + 1Overdispersion <- sum(PearsonResiduals^2) / (Cases - NumberOfPredictors)cat("Overdispersion parameter:", round(Overdispersion, 3), "\n")cat("(Values > 1.5 indicate meaningful overdispersion)\n")``````{r pmm10, message=FALSE, warning=FALSE}diag_data <- data.frame(Pearson = PearsonResiduals, Fitted = fitted(m1.glmer.p))p9 <- ggplot(diag_data, aes(Fitted, Pearson)) + geom_point(alpha = .4) + geom_hline(yintercept = 0, linetype = "dotted") + theme_bw() + labs(title = "Residuals vs. Fitted")p10 <- ggplot(countdata, aes(Shots, diag_data$Pearson)) + geom_point(alpha = .4) + geom_hline(yintercept = 0, linetype = "dotted") + theme_bw() + labs(y = "Pearson residuals", title = "Residuals vs. Shots")p11 <- ggplot(countdata, aes(Language, diag_data$Pearson)) + geom_boxplot() + theme_bw() + labs(y = "Pearson residuals", title = "Residuals by Language")gridExtra::grid.arrange(p9, p10, p11, nrow = 1)```The diagnostic plots show patterning (lower-left clustering) consistent with overdispersion. We proceed to quasi-Poisson and negative binomial alternatives.## Mixed-Effects Quasi-Poisson Regression {-}```{r qpmm1, eval=TRUE, echo=TRUE, message=FALSE, warning=FALSE}m0.qp <- glmmPQL(UHM ~ 1, random = ~1 | Language, data = countdata, family = quasipoisson(link = "log"))m1.qp <- update(m0.qp, .~. + Shots)car::Anova(m1.qp, test = "Chi")``````{r qpmm2, message=FALSE, warning=FALSE}summary(m1.qp)``````{r qpmm3, message=FALSE, warning=FALSE}# odds ratios from fixed-effects equivalentm1.glm.qp <- glm(UHM ~ Shots, data = countdata, family = quasipoisson(link = "log"))exp(coef(m1.glm.qp))```The rate ratio for Shots is approximately 1.26: each additional shot is associated with a 26% increase in the predicted number of *uhm* fillers.```{r qpmm4, message=FALSE, warning=FALSE}PearsonResiduals.qp <- resid(m1.qp, type = "pearson")Overdispersion.qp <- sum(PearsonResiduals.qp^2) / (Cases - length(fixef(m1.qp)) - 1)cat("Overdispersion (quasi-Poisson):", round(Overdispersion.qp, 3), "\n")``````{r qpmm5, message=FALSE, warning=FALSE}plot(m1.qp, pch = 20, col = "black", lty = "dotted", ylab = "Pearson residuals")``````{r qpmm10, message=FALSE, warning=FALSE}plot(m1.qp, Language ~ resid(.), abline = 0, fill = "gray70")``````{r qpmm11b, message=FALSE, warning=FALSE}plot_model(m1.qp, type = "pred", terms = c("Shots")) + theme_bw() + labs(title = "Predicted uhm frequency by shots consumed")``````{r qpmm11c, message=FALSE, warning=FALSE}sjPlot::tab_model(m1.qp)MuMIn::r.squaredGLMM(m1.qp)```## Mixed-Effects Negative Binomial Regression {-}```{r nbmm1, message=FALSE, warning=FALSE}m0.nb <- glmer.nb(UHM ~ 1 + (1 | Language), data = countdata)m1.nb <- update(m0.nb, .~. + Shots)anova(m1.nb, m0.nb)``````{r nbmm3, message=FALSE, warning=FALSE}PearsonResiduals.nb <- resid(m1.nb, type = "pearson")NumberOfPredictors.nb <- 2 + 1 + 1 # intercept + Shots + dispersion + 1Overdispersion.nb <- sum(PearsonResiduals.nb^2) / (Cases / NumberOfPredictors.nb)cat("Overdispersion (negative binomial):", round(Overdispersion.nb, 3), "\n")``````{r nbmm4, message=FALSE, warning=FALSE}diag_nb <- data.frame(Pearson = PearsonResiduals.nb, Fitted = fitted(m1.nb))q9 <- ggplot(diag_nb, aes(Fitted, Pearson)) + geom_point(alpha = .4) + geom_hline(yintercept = 0, linetype = "dotted") + theme_bw() + labs(title = "NB: Residuals vs. Fitted")q10 <- ggplot(countdata, aes(Shots, diag_nb$Pearson)) + geom_point(alpha = .4) + geom_hline(yintercept = 0, linetype = "dotted") + theme_bw() + labs(y = "Pearson", title = "NB: Residuals vs. Shots")q11 <- ggplot(countdata, aes(Language, diag_nb$Pearson)) + geom_boxplot() + theme_bw() + labs(y = "Pearson", title = "NB: Residuals by Language")gridExtra::grid.arrange(q9, q10, q11, nrow = 1)``````{r nbmm6b, message=FALSE, warning=FALSE}plot_model(m1.nb, type = "pred", terms = c("Shots")) + theme_bw() + labs(title = "NB: Predicted uhm frequency by shots")sjPlot::tab_model(m1.nb)MuMIn::r.squaredGLMM(m1.nb)```### Model Comparison and Selection {-}| Model | Overdispersion | Singular fit? | Verdict ||-------|---------------|--------------|---------|| Poisson | Yes (> 1) | Yes | Not recommended || Quasi-Poisson | Reduced (closer to 1) | No | Acceptable || Negative binomial | Varies | Check | Often preferred |For this data set, the quasi-Poisson model shows reduced overdispersion and no singular fit warning, making it preferable to the Poisson model. In general, when overdispersion is severe, prefer negative binomial over quasi-Poisson for more reliable coefficient estimates.### Reporting Results {-}::: {.callout-note}## Write-Up Template: Mixed-Effects Quasi-Poisson RegressionA mixed-effects quasi-Poisson regression model was fitted to predict the number of *uhm* fillers produced in two-minute conversations. Language was included as a random effect (random intercept). Prior to modelling, a Boruta variable selection procedure identified the number of alcoholic shots consumed as the only confirmed predictor; Language was tentative. A goodness-of-fit test confirmed that the data deviated from a Poisson distribution (overdispersion), justifying the use of a quasi-Poisson rather than a standard Poisson model. The final minimal adequate model showed that each additional shot consumed was associated with a significant 26% increase in the expected number of *uhm* fillers (rate ratio = 1.26, p < .001). The overdispersion parameter for the quasi-Poisson model was close to 1, indicating adequate fit after scaling.:::::: {.callout-note collapse="true"}## ✎ Check Your Understanding — Question 5**A researcher fits a Poisson mixed-effects model to count data and obtains an overdispersion parameter of 4.2. What is the most appropriate course of action?**a) Report the Poisson model — an overdispersion parameter above 1 is always acceptableb) Switch to a quasi-Poisson or negative binomial model, as overdispersion of 4.2 indicates a serious violationc) Remove outliers until the overdispersion parameter drops below 2d) Add more fixed-effects predictors until the overdispersion is reduced<details><summary>**Answer**</summary>**b) Switch to a quasi-Poisson or negative binomial model, as overdispersion of 4.2 indicates a serious violation**An overdispersion parameter of 1 is ideal for Poisson models (variance = mean). Values modestly above 1 (say, < 1.5) can often be tolerated, but a value of 4.2 indicates that the variance is more than four times the mean — a serious violation that will produce underestimated standard errors and inflated Type I error in a standard Poisson model. The correct response is to switch to a quasi-Poisson model (which scales standard errors) or — better — a negative binomial model (which models the overdispersion explicitly). Simply removing outliers or adding more predictors is not a principled solution to overdispersion.</details>:::---# Mixed-Effects Ordinal Regression {#ordinal}::: {.callout-note}## Section Overview**What you will learn:** When and why to use mixed-effects ordinal regression; how to fit, diagnose, and interpret a `clmm()` model; and how to report ordinal mixed-effects results:::## Introduction {-}Mixed-effects ordinal regression (also called the *cumulative link mixed model*, CLMM) is the appropriate choice when:- The **dependent variable is an ordered categorical variable** with three or more levels — such as a Likert scale (Strongly Disagree → Strongly Agree), an acceptability rating, a proficiency level, or an error severity score- **Observations are not independent** — for example, multiple ratings from the same participant or rater, or multiple items rated by different peopleThe `clmm()` function from the `ordinal` package [@ordinal] implements the proportional odds mixed model — an extension of the fixed-effects proportional odds model (`polr()` from `MASS`) that adds random effects.::: {.callout-important}## Why Not Treat Ordinal Data as Continuous?Treating an ordinal variable (e.g. a 5-point Likert scale) as a continuous dependent variable in a linear model is tempting but problematic: it assumes equal intervals between scale points (that the difference between 1 and 2 is the same as the difference between 4 and 5), which is rarely justified. Ordinal regression makes no such assumption — it models the **cumulative probability of being at or below each category** without assuming equal intervals.:::## Example: Accentedness Ratings in a Rating Experiment {-}The data represent the results of a rating experiment in which raters evaluated the accentedness of speech samples from speakers of two language families (Language Family: Germanic vs. Romance) across two groups (Group: L2 learners vs. native speakers). The dependent variable (*Accent*) is an ordered factor with five levels.### Loading and Inspecting the Data {-}```{r ord01, message=FALSE, warning=FALSE}ratex <- base::readRDS("tutorials/regression/data/ratex.rda", "rb")head(ratex, 10)``````{r ord02, message=FALSE, warning=FALSE}# check balance: how many ratings per Family × Accent combination?ratex |> dplyr::group_by(Family, Accent) |> dplyr::summarise(n = dplyr::n(), .groups = "drop") |> tidyr::spread(Accent, n)```### Exploratory Visualisation {-}```{r ord03, message=FALSE, warning=FALSE}p_mean <- ratex |> ggplot(aes(Family, AccentNumeric, color = Group)) + stat_summary(fun = mean, geom = "point", size = 2) + stat_summary(fun.data = mean_cl_boot, geom = "errorbar", width = 0.2) + theme_bw() + theme(legend.position = "top") + scale_color_manual(values = c("gray20", "gray70")) + labs(y = "Mean accentedness rating", title = "Mean rating by family and group")p_box <- ratex |> dplyr::group_by(Family, Rater, Group) |> dplyr::summarise(AccentMean = mean(AccentNumeric), .groups = "drop") |> ggplot(aes(Family, AccentMean, fill = Group)) + geom_boxplot() + theme_bw() + theme(legend.position = "top") + scale_fill_manual(values = c("gray50", "gray85")) + labs(y = "Mean accentedness (by rater)", title = "Rater-level ratings by family and group")gridExtra::grid.arrange(p_mean, p_box, nrow = 1)```The plots suggest that Language Family affects accentedness ratings, with the direction possibly modulated by Group.### Fitting the Ordinal Mixed-Effects Model {-}We fit a model with Rater as a random effect (multiple ratings per rater) and Family as the fixed-effects predictor.```{r ord04, message=FALSE, warning=FALSE}m1.or <- clmm(Accent ~ (1 | Rater) + Family, link = "logit", data = ratex)# check for complete separationifelse(min(ftable(ratex$Accent, ratex$Family)) == 0, "Warning: incomplete information (separation possible)", "No separation detected — OK to proceed")summary(m1.or)```The model outputs:- **Threshold coefficients** (`1|2`, `2|3`, `3|4`, `4|5`): these are the log-odds of being at or below each category for the reference level of Family. They define the cut-points of the latent continuous response.- **Fixed-effect coefficient** for Family: the log-odds shift associated with moving from the reference Family level to the other level.::: {.callout-tip}## Interpreting Threshold CoefficientsIn a cumulative link model, the threshold coefficient `k|(k+1)` is the log-odds of rating at or below category $k$ (compared to above $k$) for the reference group. For example, `1|2 = -2.15` means the log-odds of rating ≤ 1 vs. > 1 for the reference Family is −2.15 — a low probability of rating at the bottom of the scale.The fixed-effect coefficient for Family shifts all thresholds simultaneously by the same amount (the **proportional odds** assumption). A positive coefficient means the distribution shifts towards higher ratings for that Family level.:::### Post-Hoc Comparisons {-}```{r ord05, message=FALSE, warning=FALSE}emmeans::lsmeans(m1.or, pairwise ~ Family, adjust = "tukey")```### Model Summary Table {-}```{r ord06, message=FALSE, warning=FALSE}sjPlot::tab_model(m1.or)```### Visualising Effects {-}```{r ord07, message=FALSE, warning=FALSE}plot_model(m1.or, type = "pred", terms = c("Family")) + theme_bw() + labs(title = "Predicted probability of each accentedness rating by language family", y = "Predicted probability", x = "Language family")```### Model Diagnostics {-}For ordinal mixed-effects models, formal diagnostic plots are less well developed than for linear or logistic models. Key checks include:1. **Convergence:** ensure the model converged (no warnings about convergence failure)2. **Complete separation:** if any cell of the contingency table between the response and a predictor is empty, the model may produce unreliable thresholds3. **Proportional odds assumption:** the `brant` test (for fixed-effects ordinal models) or a comparison of constrained vs. unconstrained models can be used to test this assumption```{r ord08, message=FALSE, warning=FALSE}# Rough check: compare clmm to a nominal logistic model# (if proportional odds hold, the clmm should fit comparably)cat("AIC of proportional odds model:", AIC(m1.or), "\n")```### Reporting Results {-}::: {.callout-note}## Write-Up Template: Mixed-Effects Ordinal RegressionA mixed-effects cumulative link model (proportional odds; logit link) was fitted to predict accentedness ratings (5-point ordered scale). Rater was included as a random effect (random intercept) to account for the non-independence of multiple ratings from the same rater. The model was fitted using the `clmm()` function from the `ordinal` package. Language Family was a significant predictor of accentedness ratings (β = *X*, SE = *Y*, z = *Z*, p = .*W*). Post-hoc pairwise comparisons with Tukey correction confirmed significant differences between [Family A] and [Family B] (p = .*V*). The proportional odds assumption was not formally tested for the mixed model; visual inspection of predicted probabilities suggested the assumption was approximately satisfied.:::::: {.callout-note collapse="true"}## ✎ Check Your Understanding — Question 6**A researcher uses a 7-point Likert scale (1 = Strongly Disagree → 7 = Strongly Agree) as the dependent variable. Multiple participants rated the same 30 items. Which model is most appropriate?**a) Linear mixed-effects model with Participant as a random effect onlyb) Binomial logistic mixed-effects model after dichotomising the scale at the midpointc) Mixed-effects cumulative link model (`clmm`) with Participant and Item as crossed random effectsd) Fixed-effects ordinal regression with `polr()`, ignoring the grouping structure<details><summary>**Answer**</summary>**c) Mixed-effects cumulative link model (`clmm`) with Participant and Item as crossed random effects**The dependent variable is ordered categorical (a 7-point Likert scale), which calls for an ordinal model rather than a linear or binary logistic model. Because every participant rates every item, the design has crossed random effects — both participants and items should be included as random intercepts. Option (a) uses a linear model and includes only participant-level random effects, ignoring item-level variance. Option (b) throws away information by collapsing the 7-point scale to binary. Option (d) ignores the non-independence of observations, underestimates standard errors, and inflates Type I error.</details>:::---# Mixed-Effects Multinomial Regression {#multinomial}::: {.callout-note}## Section OverviewThis section provides a brief implementation showcase of mixed-effects multinomial regression using `mblogit()` from the `mclogit` package. It is intentionally brief — for a full treatment of multinomial regression concepts and diagnostics, see the [Regression Analysis tutorial](/tutorials/regression/regression.html).:::## Introduction {-}Mixed-effects multinomial logistic regression is used when the dependent variable is **unordered categorical with three or more levels** (e.g. choice of variant A vs. B vs. C in a variation study) and observations are grouped within speakers or items.## Example: Picture Description Responses {-}```{r mult01, message=FALSE, warning=FALSE}pict <- base::readRDS("tutorials/regression/data/pict.rda", "rb")head(pict)``````{r mult02, message=FALSE, warning=FALSE}m0.mn <- mblogit(formula = Response ~ 1, random = ~1 | Participant, data = pict)``````{r mult03, message=FALSE, warning=FALSE}m1.mn <- mblogit(formula = Response ~ Gender + Group, random = ~1 | Item, data = pict)anova(m0.mn, m1.mn)``````{r mult09, message=FALSE, warning=FALSE}sjPlot::tab_model(m1.mn)```::: {.callout-warning}## This Section Is a Showcase OnlyThe analysis above is deliberately brief and skips full diagnostics and model validation. It is intended only to demonstrate how to implement a mixed-effects multinomial model in R. Do not use it as a template for a complete research analysis — always perform and report full diagnostics.:::---# Summary and Model Comparison {#summary}::: {.callout-note}## Section Overview**What you will learn:** A consolidated overview of when to use each mixed-effects model type, key implementation functions, and the most important points to check and report for each:::The table below summarises the five mixed-effects model types covered in this tutorial:| Model type | Dependent variable | Key function(s) | Key diagnostic ||------------|--------------------|-----------------|----------------|| **Linear** | Continuous | `lmer()`, `lme()` | `performance::check_model()`, ICC, Levene's test || **Logistic (binary)** | Binary (0/1) | `glmer(..., family=binomial)` | `performance::check_model()`, AUC, binned residuals || **Poisson** | Count (underdispersed) | `glmer(..., family=poisson)` | Overdispersion parameter, `check_model()` || **Quasi-Poisson / NB** | Count (overdispersed) | `glmmPQL()`, `glmer.nb()` | Overdispersion parameter, diagnostic plots || **Ordinal** | Ordered categorical | `clmm()` | Convergence, separation, proportional odds || **Multinomial** | Unordered categorical (3+) | `mblogit()` | VIF, convergence |: Summary of mixed-effects model types covered in this tutorial {tbl-colwidths="[20,22,28,30]"}**General workflow for any mixed-effects model:**1. **Inspect the data** — visualise distributions, group differences, and potential outliers before modelling2. **Set contrasts deliberately** — use `contrasts()` or `options(contrasts = ...)` before fitting; choose treatment, sum, Helmert, or planned contrasts based on your design3. **Centre (and possibly scale) numeric predictors** — makes intercepts interpretable and improves model stability4. **Test whether random effects are warranted** — compare AIC of fixed- vs. mixed-effects baseline with a likelihood ratio test5. **Decide on the random effects structure** — random intercepts only vs. random intercepts + slopes; nested vs. crossed6. **Build fixed effects step by step** — use `anova()` with `refit = TRUE` (ML) for fixed-effects comparisons7. **Run diagnostics with `performance`** — `check_model()` for a full panel; `icc()` for the intraclass correlation; `model_performance()` for a summary table of fit indices8. **Extract effect sizes** — marginal and conditional R² via `MuMIn::r.squaredGLMM()` or `performance::r2()`9. **Report clearly** — include model type, contrast scheme, random effects structure, comparison statistics, fixed-effect estimates with CIs, ICC, and R²::: {.callout-note collapse="true"}## ✎ Check Your Understanding — Question 7**A researcher is studying how often L2 speakers self-correct during oral production. Each speaker produced 30 utterances, and the dependent variable is the number of self-corrections per utterance. A Poisson model gives an overdispersion parameter of 0.8. What should the researcher do?**a) Switch to a negative binomial model because Poisson models should never be usedb) Use the Poisson model — an overdispersion parameter below 1 indicates underdispersion, which is within acceptable range for a Poisson modelc) Switch to a linear mixed-effects model because the count is approximately continuousd) Switch to a logistic model after dichotomising (corrected vs. not corrected)<details><summary>**Answer**</summary>**b) Use the Poisson model — an overdispersion parameter below 1 indicates underdispersion, which is within acceptable range for a Poisson model**An overdispersion parameter of 0.8 indicates *under*dispersion (variance slightly less than the mean) rather than overdispersion (variance greater than the mean). Mild underdispersion is generally not a problem for Poisson models and does not warrant switching to a negative binomial model (which addresses overdispersion, not underdispersion). Quasi-Poisson models also address overdispersion only. Switching to a linear model ignores the count nature of the data and risks predicting negative values. Dichotomising throws away information. The correct response is to proceed with the Poisson model and note the overdispersion parameter in the methods section.</details>:::---# Citation and Session Info {-}Schweinberger, Martin. 2026. *Mixed-Effects Models in R*. Brisbane: The Language Technology and Data Analysis Laboratory (LADAL). url: https://ladal.edu.au/tutorials/mixedmodel/mixedmodel.html (Version 2026.02.24).```@manual{schweinberger2026mixedmodel, author = {Schweinberger, Martin}, title = {Mixed-Effects Models in R}, note = {https://ladal.edu.au/tutorials/mixedmodel/mixedmodel.html}, year = {2026}, organization = {The Language Technology and Data Analysis Laboratory (LADAL)}, address = {Brisbane}, edition = {2026.02.24}}``````{r session_info}sessionInfo()```::: {.callout-note}## AI Transparency StatementThis tutorial was revised and substantially expanded with the assistance of **Claude** (claude.ai), a large language model created by Anthropic. Claude was used to restructure the document into Quarto format, expand the theoretical introduction, add the new section on crossed vs. nested random effects, add the Simpson's Paradox / misleading aggregate trend figure and its accompanying callout, add the manual contrast coding subsection, replace manual diagnostic code with the `performance` package throughout the linear and logistic sections, expand model diagnostics and interpretation guidance across all sections, write the new reporting templates, write the quiz questions and detailed answer explanations, and produce the model comparison summary table. All content was reviewed, edited, and approved by the author (Martin Schweinberger), who takes full responsibility for its accuracy.:::---[Back to top](#intro)[Back to LADAL home](/)---# References {-}