This tutorial introduces Structural Equation Modelling (SEM) — a powerful and flexible family of multivariate statistical techniques that allows researchers to simultaneously model multiple relationships among variables, account for measurement error, and test theories about constructs that cannot be directly observed. Where simple linear regression models a single outcome from one or more predictors, SEM can model entire systems of relationships, including situations where the same variable acts as both a predictor and an outcome, and where some of the most important variables in a theory are not directly measurable at all.
SEM is particularly well suited to the language sciences. Much of what linguists and applied linguists care about — language anxiety, motivation, metalinguistic awareness, communicative competence, reading ability — cannot be captured in a single measurement. These are latent constructs: theoretical entities that we infer indirectly from a set of observable indicators such as questionnaire items or test scores. SEM provides a principled framework for doing exactly this, and then for examining how these latent constructs relate to one another and to observable outcomes.
This tutorial is aimed at beginners with no prior exposure to SEM. You do not need to have studied factor analysis or path analysis before, though familiarity with basic regression is helpful. The goal is to build conceptual understanding from the ground up and to equip you with the practical skills to fit, evaluate, and report SEM models in R using the lavaan package.
Prerequisite Tutorials
Before working through this tutorial, we recommend familiarity with the following:
# load packages
library(lavaan)    # SEM and CFA estimation
library(semPlot)   # path diagram visualisation
library(semTools)  # reliability and model comparison tools
library(psych)     # descriptive statistics and correlation matrices
library(dplyr)     # data manipulation
library(ggplot2)   # data visualisation
library(tidyr)     # data reshaping
library(flextable) # formatted tables
library(checkdown) # interactive quiz questions
The Dataset
Throughout this tutorial we use a simulated dataset inspired by research on second-language (L2) writing. The data represent 300 university students who completed a battery of questionnaire scales and an academic writing task. The dataset includes:
Language Anxiety (anx1–anx3): three Likert-scale items measuring the degree to which students feel anxious when writing in their L2 (higher = more anxious)
Writing Self-Efficacy (eff1–eff3): three items measuring students’ confidence in their L2 writing ability (higher = greater self-efficacy)
Motivation (mot1–mot3): three items measuring students’ intrinsic motivation to improve their L2 writing (higher = more motivated)
Writing Score (writing_score): a holistic score (0–100) assigned by trained raters to an in-class academic writing task
Because the data are simulated in R, no external file is needed — you can reproduce the entire analysis from the code below.
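The exact simulation code is reproduced later in the tutorial; as a minimal sketch of how a dataset with this structure can be generated, the following creates three correlated latent constructs, nine indicators, and a writing score. The loadings and path coefficients here are illustrative assumptions, not the exact values used to generate the tutorial data.

```r
set.seed(123)
n <- 300

anx <- rnorm(n)                                     # latent Language Anxiety
eff <- rnorm(n)                                     # latent Writing Self-Efficacy
mot <- 0.5 * eff + rnorm(n, sd = sqrt(1 - 0.5^2))   # Motivation depends on Self-Efficacy

# each observed item = loading * latent + unique error (assumed loadings)
make_items <- function(latent, loadings = c(0.8, 0.75, 0.75)) {
  sapply(loadings, function(l) l * latent + rnorm(length(latent), sd = sqrt(1 - l^2)))
}

semdata <- data.frame(make_items(anx), make_items(eff), make_items(mot))
names(semdata) <- c("anx1", "anx2", "anx3", "eff1", "eff2", "eff3",
                    "mot1", "mot2", "mot3")
semdata$writing_score <- 70 - 5 * anx + 10 * eff + 4 * mot + rnorm(n, sd = 5)
```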
Section Overview
What you’ll learn: The core ideas underpinning SEM — latent variables, measurement error, path diagrams, and the two-component structure of a full SEM
Why it matters: SEM notation and vocabulary are quite different from ordinary regression. Building a solid conceptual foundation before fitting models prevents common misinterpretations.
Observed vs. Latent Variables
A fundamental distinction in SEM is between variables you can observe directly and those you cannot.
Observed (manifest) variables are things you actually measure and record: a Likert-scale item, a test score, a reaction time, a corpus frequency count. They appear as columns in your dataset.
Latent variables are theoretical constructs that you cannot measure directly. Language anxiety, motivation, and writing self-efficacy are classic examples from applied linguistics. No single questionnaire item perfectly captures any of these constructs — each item is merely a fallible indicator. Latent variables are never columns in your dataset; instead, they are modelled as common causes of their observed indicators.
This distinction matters because measurement error is unavoidable whenever we use observed items to represent theoretical constructs. If we ignore this error — for example, by averaging questionnaire items and treating the result as if it were the true construct — we introduce bias into our estimates of relationships. SEM addresses this explicitly: it partitions the variance in each observed indicator into a part explained by the underlying latent variable and a part attributed to unique error (random noise plus any systematic variance not shared with the other indicators).
The Two Building Blocks of SEM
A full structural equation model is composed of two sub-models:
Measurement model (technical name: Confirmatory Factor Analysis, CFA): specifies which observed items are indicators of which latent variables, how strongly each item loads onto its construct, and how much unique error each item has.
Structural model (technical name: path model): specifies the directional relationships among latent variables (and between latent variables and observed outcomes); these regression-like paths encode the theoretical predictions.
In the standard two-step approach to SEM (Anderson and Gerbing 1988), researchers first establish an adequate measurement model (Step 1) before testing the structural paths of theoretical interest (Step 2). This tutorial follows this workflow: we build and evaluate a CFA in Section 3 and then add structural paths in Section 5.
Path Diagrams
SEM models are almost always communicated visually through path diagrams. The notation is standardised:
Symbol                          Represents
Rectangle                       Observed (manifest) variable
Oval / ellipse                  Latent variable
Single-headed arrow (→)         Directional path (a regression-type effect)
Double-headed curved arrow (↔)  Covariance or correlation
Small arrow into rectangle      Residual / unique error for that indicator
Small arrow into oval           Disturbance (residual error for an endogenous latent variable)
In a measurement model, ovals point to rectangles: the latent construct is hypothesised to cause variation in its observed indicators. In a structural model, ovals point to other ovals, encoding directional theoretical predictions among constructs.
SEM is a Confirmatory, Theory-Driven Method
Unlike Exploratory Factor Analysis (EFA), which discovers factor structure empirically from the data, SEM requires the researcher to specify the model in advance based on theory. Every path in the diagram — every arrow that is included or excluded — reflects a theoretical decision. A good model fit indicates that the specified model is consistent with the data; it does not prove the model is the only correct one. Alternative models that fit equally well are always possible (this is the problem of equivalent models). Always ground your SEM specifications in theory, not post-hoc data exploration.
A Conceptual Map of Our Example
Our theoretical model for the L2 writing dataset can be described as follows:
Language Anxiety, Writing Self-Efficacy, and Motivation are latent constructs, each measured by three questionnaire items.
We expect Self-Efficacy and Anxiety to have opposite effects on Writing Score: greater self-efficacy should improve performance; greater anxiety should impair it.
Self-Efficacy is also expected to influence Motivation (students who feel more capable tend to be more motivated), and Motivation may in turn have a positive effect on Writing Score. This indirect path constitutes a mediation hypothesis.
This conceptual model drives all the analytic choices that follow.
Descriptive Statistics and Correlations
Section Overview
What you’ll learn: How to examine the observed variables before fitting any model
Key steps: Descriptive statistics, distribution checks, inter-item correlations
Before fitting any SEM, it is good practice to examine the distributions and inter-relationships of your observed variables. Severe non-normality or implausible correlations can signal problems that need to be addressed before modelling.
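The descriptive summary reported below can be obtained with psych::describe(); a minimal sketch, assuming the simulated data are stored in a data frame named semdata:

```r
# Descriptive statistics for the nine questionnaire items
semdata %>%
  dplyr::select(-writing_score) %>%
  psych::describe() %>%
  as.data.frame() %>%
  dplyr::select(mean, sd, min, max, skew, kurtosis) %>%
  round(2)
```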
All items are centred near zero (as expected for standardised simulated data). Skewness values are within the acceptable range of [−1, +1] for all items, meaning that the normality assumption required for maximum likelihood estimation in lavaan is not substantially violated.
Correlation Matrix
A correlation matrix helps us verify that items within the same scale correlate with each other (convergent evidence) and that items from different scales correlate less strongly (discriminant evidence).
Code
cor_mat <- cor(semdata %>% dplyr::select(-writing_score)) %>%
  round(2)

cor_mat %>%
  as.data.frame() %>%
  tibble::rownames_to_column("Variable") %>%
  flextable() %>%
  flextable::set_table_properties(width = .99, layout = "autofit") %>%
  flextable::theme_zebra() %>%
  flextable::fontsize(size = 10) %>%
  flextable::fontsize(size = 10, part = "header") %>%
  flextable::align_text_col(align = "center") %>%
  flextable::set_caption(caption = "Pearson correlation matrix for the nine questionnaire items.") %>%
  flextable::border_outer()
Pearson correlation matrix for the nine questionnaire items.

Variable  anx1   anx2   anx3   eff1   eff2   eff3   mot1   mot2   mot3
anx1      1.00   0.56   0.57   0.03  -0.04   0.00   0.01  -0.03  -0.01
anx2      0.56   1.00   0.56   0.03  -0.07   0.00   0.03  -0.01  -0.01
anx3      0.57   0.56   1.00   0.01  -0.04  -0.01   0.05  -0.02   0.07
eff1      0.03   0.03   0.01   1.00   0.60   0.58   0.30   0.36   0.32
eff2     -0.04  -0.07  -0.04   0.60   1.00   0.53   0.24   0.31   0.27
eff3      0.00   0.00  -0.01   0.58   0.53   1.00   0.23   0.29   0.22
mot1      0.01   0.03   0.05   0.30   0.24   0.23   1.00   0.53   0.50
mot2     -0.03  -0.01  -0.02   0.36   0.31   0.29   0.53   1.00   0.56
mot3     -0.01  -0.01   0.07   0.32   0.27   0.22   0.50   0.56   1.00
Code
# Visualise correlations as a heatmap
cor_long <- cor_mat %>%
  as.data.frame() %>%
  tibble::rownames_to_column("Var1") %>%
  tidyr::pivot_longer(-Var1, names_to = "Var2", values_to = "r")

ggplot(cor_long, aes(x = Var1, y = Var2, fill = r)) +
  geom_tile(color = "white") +
  geom_text(aes(label = round(r, 2)), size = 3.2) +
  scale_fill_gradient2(low = "tomato", mid = "white", high = "steelblue",
                       midpoint = 0, limits = c(-1, 1), name = "r") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        panel.grid = element_blank()) +
  labs(title = "Correlation heatmap: nine questionnaire items",
       x = "", y = "")
The heatmap confirms the expected pattern: items within each scale (e.g., anx1–anx3) correlate strongly with each other and more weakly with items from the other scales. The efficacy and motivation items show moderate cross-scale correlations, consistent with our theoretical expectation that the two constructs are related.
Confirmatory Factor Analysis (CFA)
Section Overview
What you’ll learn: How to specify, fit, and evaluate a measurement model using CFA in lavaan
Key concepts: Factor loadings, model fit indices, reliability, convergent and discriminant validity
Why CFA before SEM: The measurement model must be established before structural paths are meaningful. If your indicators do not adequately reflect the intended latent constructs, the structural estimates will be uninterpretable.
What is Confirmatory Factor Analysis?
Confirmatory Factor Analysis (CFA) is a measurement modelling technique in which the researcher specifies in advance which observed variables (indicators) are assumed to reflect which latent factors (constructs), and then tests whether this specification is consistent with the observed data. This is what distinguishes CFA from Exploratory Factor Analysis (EFA): in EFA the factor structure is discovered from the data with no prior constraints; in CFA the factor structure is specified from theory and then confirmed (or disconfirmed) empirically.
In our example, we hypothesise three latent factors:
Anxiety (ANX), indicated by anx1, anx2, anx3
Self-Efficacy (EFF), indicated by eff1, eff2, eff3
Motivation (MOT), indicated by mot1, mot2, mot3
Specifying a CFA Model in lavaan
The lavaan package uses a simple, readable model syntax. The key operator for defining a measurement model is =~, which is read as “is measured by” or “is indicated by”:
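Using this operator, our three-factor measurement model can be written as a single model string (the variable name cfa_model is our choice):

```r
# Three-factor measurement model: each latent variable (left of =~)
# is measured by its three questionnaire items
cfa_model <- '
  ANX =~ anx1 + anx2 + anx3   # Anxiety
  EFF =~ eff1 + eff2 + eff3   # Self-Efficacy
  MOT =~ mot1 + mot2 + mot3   # Motivation
'
```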
By default, lavaan fixes the first indicator loading to 1.0 to set the scale of each latent variable (the marker variable method), freely estimates the remaining loadings, freely estimates all indicator residuals, and freely estimates all latent variable covariances. You can change these defaults using arguments to cfa() or sem().
Fitting the CFA Model
We fit the model using lavaan::cfa(). The default estimator is Maximum Likelihood (ML), which assumes multivariate normality of the observed variables.
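The call itself is a one-liner, assuming the measurement model string is stored in a variable named cfa_model:

```r
# Fit the three-factor CFA with maximum likelihood estimation
cfa_fit <- lavaan::cfa(cfa_model, data = semdata)

# Full output: fit statistics, loadings, covariances, variances
summary(cfa_fit, fit.measures = TRUE, standardized = TRUE)
```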
lavaan 0.6-21 ended normally after 29 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 21
Number of observations 300
Model Test User Model:
Test statistic 12.806
Degrees of freedom 24
P-value (Chi-square) 0.969
Model Test Baseline Model:
Test statistic 856.110
Degrees of freedom 36
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 1.000
Tucker-Lewis Index (TLI) 1.020
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -3379.871
Loglikelihood unrestricted model (H1) -3373.468
Akaike (AIC) 6801.742
Bayesian (BIC) 6879.521
Sample-size adjusted Bayesian (SABIC) 6812.922
Root Mean Square Error of Approximation:
RMSEA 0.000
90 Percent confidence interval - lower 0.000
90 Percent confidence interval - upper 0.000
P-value H_0: RMSEA <= 0.050 1.000
P-value H_0: RMSEA >= 0.080 0.000
Standardized Root Mean Square Residual:
SRMR 0.023
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
ANX =~
anx1 1.000 0.770 0.754
anx2 0.937 0.090 10.447 0.000 0.721 0.742
anx3 0.965 0.092 10.479 0.000 0.743 0.757
EFF =~
eff1 1.000 0.833 0.824
eff2 0.923 0.082 11.253 0.000 0.768 0.733
eff3 0.825 0.075 10.992 0.000 0.687 0.706
MOT =~
mot1 1.000 0.664 0.677
mot2 1.134 0.116 9.768 0.000 0.753 0.782
mot3 1.031 0.108 9.571 0.000 0.685 0.716
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
ANX ~~
EFF -0.004 0.046 -0.095 0.924 -0.007 -0.007
MOT 0.003 0.038 0.093 0.926 0.007 0.007
EFF ~~
MOT 0.291 0.049 5.901 0.000 0.526 0.526
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.anx1 0.449 0.059 7.658 0.000 0.449 0.431
.anx2 0.424 0.053 7.990 0.000 0.424 0.449
.anx3 0.412 0.054 7.583 0.000 0.412 0.427
.eff1 0.328 0.054 6.081 0.000 0.328 0.321
.eff2 0.508 0.058 8.695 0.000 0.508 0.463
.eff3 0.475 0.051 9.277 0.000 0.475 0.502
.mot1 0.521 0.056 9.266 0.000 0.521 0.542
.mot2 0.359 0.054 6.701 0.000 0.359 0.388
.mot3 0.447 0.053 8.464 0.000 0.447 0.488
ANX 0.592 0.089 6.629 0.000 1.000 1.000
EFF 0.693 0.092 7.551 0.000 1.000 1.000
MOT 0.441 0.076 5.835 0.000 1.000 1.000
This output contains three major sections: model fit information, factor loadings (both unstandardised and standardised), and latent variable covariances. We work through each in the sections below.
Interpreting Factor Loadings
Factor loadings express how strongly each indicator is related to its underlying latent variable. In the standardised solution (column Std.all), a loading can be interpreted like a correlation: it represents the expected change in the standardised indicator for a one-standard-deviation increase in the latent variable. Standardised loadings above 0.50 are generally considered acceptable; loadings above 0.70 are considered strong (Hair et al. 2019).
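Because standardised loadings behave like correlations, squaring one gives the proportion of an indicator's variance explained by its factor, and the remainder is the unique error. For example, using the anx1 loading from the CFA output:

```r
loading <- 0.754             # standardised loading of anx1 on ANX
round(loading^2, 3)          # variance explained by the factor: 0.569
round(1 - loading^2, 3)      # residual (unique) variance: 0.431
```

Note that 0.431 matches the standardised residual variance reported for .anx1 in the CFA output.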
We extract and display the standardised loadings:
Code
loadings_df <- lavaan::standardizedsolution(cfa_fit) %>%
  dplyr::filter(op == "=~") %>%
  dplyr::select(Latent = lhs, Indicator = rhs,
                Std_Loading = est.std, SE = se,
                z = z, p = pvalue) %>%
  dplyr::mutate(across(where(is.numeric), ~ round(.x, 3)))

loadings_df %>%
  flextable() %>%
  flextable::set_table_properties(width = .85, layout = "autofit") %>%
  flextable::theme_zebra() %>%
  flextable::fontsize(size = 11) %>%
  flextable::fontsize(size = 11, part = "header") %>%
  flextable::align_text_col(align = "left") %>%
  flextable::set_caption(caption = "Standardised CFA factor loadings with standard errors and significance tests.") %>%
  flextable::border_outer()
Standardised CFA factor loadings with standard errors and significance tests.

Latent  Indicator  Std_Loading  SE     z       p
ANX     anx1       0.754        0.038  19.668  0
ANX     anx2       0.742        0.039  19.193  0
ANX     anx3       0.757        0.038  19.772  0
EFF     eff1       0.824        0.033  24.642  0
EFF     eff2       0.733        0.037  19.838  0
EFF     eff3       0.706        0.038  18.458  0
MOT     mot1       0.677        0.042  16.071  0
MOT     mot2       0.782        0.038  20.464  0
MOT     mot3       0.716        0.041  17.666  0
All standardised loadings exceed 0.50 (the lowest is 0.677 for mot1), confirming that each indicator is a meaningful reflection of its intended latent construct.
Model Fit Assessment
Fitting a CFA does not automatically produce a good model. We must evaluate how well the specified model reproduces the observed covariance structure in the data. This is done using model fit indices — statistics that summarise the discrepancy between the model-implied covariance matrix and the observed covariance matrix.
Model Fit Indices: What They Mean and What Cut-offs to Use
No single fit index is sufficient. Report a combination of the following:
Index  Full name                                What it measures                                   Acceptable                 Good
χ²     Chi-square test                          Overall model misfit (sensitive to N)              p > .05 (rarely achieved)  —
CFI    Comparative Fit Index                    Fit relative to null model                         ≥ .90                      ≥ .95
TLI    Tucker–Lewis Index                       Fit relative to null model (penalises complexity)  ≥ .90                      ≥ .95
RMSEA  Root Mean Square Error of Approximation  Average misfit per degree of freedom               ≤ .08                      ≤ .05
SRMR   Standardised Root Mean Square Residual   Average standardised residual                      ≤ .08                      ≤ .05
Cut-offs are from Hu and Bentler (1999). Note that these are guidelines, not hard thresholds. Model fit must always be evaluated in the context of model complexity and sample size.
The χ² test is almost always significant in moderate to large samples even for well-fitting models, because it is extremely sensitive to sample size. It is therefore standard practice to rely on the incremental and approximate fit indices (CFI, TLI, RMSEA, SRMR) rather than on χ² alone.
We extract the key fit indices:
Code
fit_indices <- lavaan::fitMeasures(cfa_fit,
                                   c("chisq", "df", "pvalue",
                                     "cfi", "tli",
                                     "rmsea", "rmsea.ci.lower", "rmsea.ci.upper",
                                     "srmr")) %>%
  round(3)

fit_df <- data.frame(
  Index = c("χ²", "df", "p (χ²)", "CFI", "TLI",
            "RMSEA", "RMSEA 90% CI lower", "RMSEA 90% CI upper", "SRMR"),
  Value = as.numeric(fit_indices),
  Threshold = c("—", "—", "> .05", "≥ .95", "≥ .95",
                "≤ .05", "—", "—", "≤ .05")
)

fit_df %>%
  flextable() %>%
  flextable::set_table_properties(width = .70, layout = "autofit") %>%
  flextable::theme_zebra() %>%
  flextable::fontsize(size = 11) %>%
  flextable::fontsize(size = 11, part = "header") %>%
  flextable::align_text_col(align = "left") %>%
  flextable::set_caption(caption = "CFA model fit indices with recommended thresholds.") %>%
  flextable::border_outer()
CFA model fit indices with recommended thresholds.

Index               Value   Threshold
χ²                  12.806  —
df                  24.000  —
p (χ²)              0.969   > .05
CFI                 1.000   ≥ .95
TLI                 1.020   ≥ .95
RMSEA               0.000   ≤ .05
RMSEA 90% CI lower  0.000   —
RMSEA 90% CI upper  0.000   —
SRMR                0.023   ≤ .05
Internal Consistency Reliability
Beyond model fit, we assess whether each scale is internally consistent — that is, whether the indicators of each latent variable reliably hang together. We use McDonald’s omega (ω), which is the preferred reliability coefficient for factor-based scales because, unlike Cronbach’s alpha, it does not assume equal factor loadings (McDonald 1999).
Values of ω ≥ .70 are generally considered acceptable for research purposes; ω ≥ .80 is considered good (Nunnally 1978).
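With a fitted CFA object, omega can be computed from semTools. A minimal sketch; note that the function name depends on the semTools version (compRelSEM() is the current recommended function, while older versions used reliability()):

```r
# Composite reliability (McDonald's omega) for each latent variable
semTools::compRelSEM(cfa_fit)

# In older versions of semTools:
# semTools::reliability(cfa_fit)
```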
Visualising the Measurement Model
The semPlot package produces path diagrams directly from a fitted lavaan object.
Code
semPlot::semPaths(
  cfa_fit,
  what = "std",          # show standardised estimates
  layout = "tree",
  rotation = 2,
  edge.label.cex = 0.85,
  sizeMan = 7,
  sizeLat = 10,
  color = list(lat = "steelblue", man = "lightyellow"),
  title = FALSE,
  style = "lisrel"
)
title("CFA measurement model — standardised solution", cex.main = 1)
Each oval represents a latent variable; each rectangle an observed indicator. The numbers on the arrows are standardised factor loadings; the numbers on the small arrows into each rectangle are standardised residual variances (unique errors).
Knowledge Check: CFA
Q1. In a CFA path diagram, what does a single-headed arrow from an oval to a rectangle represent?
Q2. A CFA model returns CFI = .88 and RMSEA = .09. What is the most appropriate conclusion?
Q3. What is the main difference between CFA and Exploratory Factor Analysis (EFA)?
Full Structural Equation Model
Section Overview
What you’ll learn: How to extend a CFA measurement model by adding directional structural paths between latent variables and outcomes
Once we are satisfied with the measurement model, we add the structural paths — the directional hypotheses about how the latent variables relate to each other and to the writing score outcome. Our theoretical model predicts:
Anxiety → Writing Score (negative effect: more anxious students perform worse)
Self-Efficacy → Writing Score (positive effect)
Self-Efficacy → Motivation (positive effect: more efficacious students are more motivated)
Motivation → Writing Score (positive effect)
Path (3) combined with path (4) constitutes an indirect effect of Self-Efficacy on Writing Score through Motivation — a mediation hypothesis examined in Section 6.
Specifying the Full SEM
In lavaan, structural paths are specified using the ~ operator, which is read as “is regressed on”:
Outcome ~ Predictor
We combine the measurement model (from the CFA) with the structural paths in a single model string:
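The combined specification can be written as follows (we store it in a variable we call sem_model):

```r
sem_model <- '
  # --- Measurement model (identical to the CFA) ---
  ANX =~ anx1 + anx2 + anx3
  EFF =~ eff1 + eff2 + eff3
  MOT =~ mot1 + mot2 + mot3

  # --- Structural paths ---
  MOT ~ EFF                          # Self-Efficacy -> Motivation
  writing_score ~ ANX + EFF + MOT    # direct effects on the writing outcome
'
```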
Exogenous variables have no incoming arrows (they are only predictors, never outcomes). In our model, ANX and EFF are exogenous latent variables.
Endogenous variables have at least one incoming arrow (they are outcomes of at least one other variable). MOT and writing_score are endogenous.
Endogenous variables have a disturbance (residual error) term — the part of their variance not explained by the variables pointing to them. lavaan estimates disturbances automatically.
Fitting the Full SEM
We fit the full SEM using lavaan::sem(). The syntax is identical to cfa() but with the full model specification:
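Assuming the combined model string is stored in sem_model, the fitting call is:

```r
# Fit the full structural equation model
sem_fit <- lavaan::sem(sem_model, data = semdata)
summary(sem_fit, fit.measures = TRUE, standardized = TRUE)
```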
lavaan 0.6-21 ended normally after 56 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 24
Number of observations 300
Model Test User Model:
Test statistic 16.790
Degrees of freedom 31
P-value (Chi-square) 0.982
Model Test Baseline Model:
Test statistic 1250.761
Degrees of freedom 45
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 1.000
Tucker-Lewis Index (TLI) 1.017
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -4417.659
Loglikelihood unrestricted model (H1) -4409.264
Akaike (AIC) 8883.317
Bayesian (BIC) 8972.208
Sample-size adjusted Bayesian (SABIC) 8896.094
Root Mean Square Error of Approximation:
RMSEA 0.000
90 Percent confidence interval - lower 0.000
90 Percent confidence interval - upper 0.000
P-value H_0: RMSEA <= 0.050 1.000
P-value H_0: RMSEA >= 0.080 0.000
Standardized Root Mean Square Residual:
SRMR 0.023
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
ANX =~
anx1 1.000 0.765 0.749
anx2 0.942 0.084 11.160 0.000 0.721 0.742
anx3 0.978 0.086 11.339 0.000 0.748 0.762
EFF =~
eff1 1.000 0.809 0.801
eff2 0.973 0.072 13.596 0.000 0.787 0.751
eff3 0.859 0.067 12.790 0.000 0.695 0.714
MOT =~
mot1 1.000 0.672 0.685
mot2 1.087 0.106 10.269 0.000 0.730 0.759
mot3 1.044 0.103 10.096 0.000 0.702 0.733
Regressions:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
MOT ~
EFF 0.435 0.065 6.695 0.000 0.524 0.524
writing_score ~
ANX -7.257 0.795 -9.126 0.000 -5.550 -0.376
EFF 12.811 1.001 12.801 0.000 10.365 0.702
MOT 5.220 1.060 4.926 0.000 3.507 0.238
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
ANX ~~
EFF -0.006 0.044 -0.145 0.885 -0.010 -0.010
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.anx1 0.457 0.054 8.528 0.000 0.457 0.439
.anx2 0.424 0.049 8.699 0.000 0.424 0.450
.anx3 0.404 0.049 8.228 0.000 0.404 0.420
.eff1 0.367 0.040 9.051 0.000 0.367 0.359
.eff2 0.478 0.048 10.006 0.000 0.478 0.435
.eff3 0.464 0.044 10.480 0.000 0.464 0.490
.mot1 0.511 0.054 9.499 0.000 0.511 0.531
.mot2 0.393 0.049 7.982 0.000 0.393 0.424
.mot3 0.423 0.049 8.586 0.000 0.423 0.462
.writing_score 28.004 5.613 4.989 0.000 28.004 0.128
ANX 0.585 0.086 6.836 0.000 1.000 1.000
EFF 0.655 0.082 7.938 0.000 1.000 1.000
.MOT 0.328 0.058 5.651 0.000 0.726 0.726
Structural Path Estimates
We extract the structural paths for a clean summary table:
Code
sem_paths_df <- lavaan::standardizedsolution(sem_fit) %>%
  dplyr::filter(op == "~") %>%
  dplyr::select(Outcome = lhs, Predictor = rhs,
                Std_Estimate = est.std, SE = se,
                z = z, p = pvalue) %>%
  dplyr::mutate(
    across(where(is.numeric), ~ round(.x, 3)),
    Sig = dplyr::case_when(
      p < .001 ~ "***",
      p < .01  ~ "**",
      p < .05  ~ "*",
      TRUE     ~ ""
    )
  )

sem_paths_df %>%
  flextable() %>%
  flextable::set_table_properties(width = .90, layout = "autofit") %>%
  flextable::theme_zebra() %>%
  flextable::fontsize(size = 11) %>%
  flextable::fontsize(size = 11, part = "header") %>%
  flextable::align_text_col(align = "left") %>%
  flextable::set_caption(caption = "Standardised structural path coefficients from the full SEM.") %>%
  flextable::border_outer()
Standardised structural path coefficients from the full SEM.

Outcome        Predictor  Std_Estimate  SE     z        p  Sig
MOT            EFF         0.524        0.058    8.989  0  ***
writing_score  ANX        -0.376        0.038  -10.003  0  ***
writing_score  EFF         0.702        0.041   17.174  0  ***
writing_score  MOT         0.238        0.045    5.234  0  ***
Standardised path coefficients can be interpreted similarly to standardised regression coefficients (β): they indicate the expected change in the outcome (in standard deviation units) for a one-standard-deviation increase in the predictor, holding all other predictors constant.
Knowledge Check: Full SEM
Q1. In the lavaan model syntax, what does the ~ operator specify?
Q2. A standardised structural path coefficient of β = −0.42 (p < .001) from Anxiety to Writing Score means:
Mediation Analysis
Section Overview
What you’ll learn: How to test mediation hypotheses — indirect effects of one variable on another via a third — within an SEM framework
Key concepts: Direct effects, indirect effects, total effects, bootstrapped confidence intervals
What is Mediation?
Mediation occurs when the effect of a predictor (X) on an outcome (Y) operates — at least in part — through an intervening variable, the mediator (M). Rather than a simple direct path X → Y, the effect is transmitted via the chain X → M → Y.
In our example, the theoretical mediation hypothesis is:
Self-Efficacy (EFF) influences Writing Score both directly and indirectly by increasing Motivation (MOT), which in turn improves Writing Score.
This decomposes the total effect of Self-Efficacy on Writing Score into:
A direct effect: EFF → writing_score
An indirect effect (via Motivation): EFF → MOT → writing_score
The total effect = direct + indirect
Specifying Mediation in lavaan
lavaan uses labels to name individual paths, which can then be combined using the := operator to define new parameters such as indirect and total effects. Labels are assigned by prefixing a path coefficient with a name followed by *:
Code
mediation_model <- '
  # --- Measurement model ---
  ANX =~ anx1 + anx2 + anx3
  EFF =~ eff1 + eff2 + eff3
  MOT =~ mot1 + mot2 + mot3

  # --- Structural paths (labelled for mediation) ---
  MOT ~ a * EFF                    # path a: EFF -> MOT
  writing_score ~ b * MOT          # path b: MOT -> writing_score
  writing_score ~ c * EFF + ANX    # path c: direct effect EFF -> writing_score

  # --- Defined parameters ---
  indirect := a * b                # indirect effect of EFF via MOT
  total := c + (a * b)             # total effect of EFF on writing_score
'
Bootstrapped Confidence Intervals for Indirect Effects
Indirect effects are the product of two path coefficients (a × b). Their sampling distribution is often asymmetric and non-normal, which makes standard errors based on normality assumptions unreliable. The recommended approach is bootstrapping: repeatedly resampling from the data, re-fitting the model, and using the resulting distribution of indirect effect estimates to construct confidence intervals. If the 95% bootstrapped CI does not contain zero, the indirect effect is considered statistically significant.
Code
set.seed(42)
med_fit <- lavaan::sem(mediation_model,
                       data = semdata,
                       estimator = "ML",
                       se = "bootstrap",
                       bootstrap = 1000)

# Extract direct, indirect, and total effects
med_effects <- lavaan::parameterEstimates(med_fit, boot.ci.type = "bca.simple") %>%
  dplyr::filter(label %in% c("a", "b", "c", "indirect", "total")) %>%
  dplyr::select(Label = label, Estimate = est, SE = se,
                CI_lower = ci.lower, CI_upper = ci.upper, p = pvalue) %>%
  dplyr::mutate(across(where(is.numeric), ~ round(.x, 3)))

med_effects %>%
  flextable() %>%
  flextable::set_table_properties(width = .85, layout = "autofit") %>%
  flextable::theme_zebra() %>%
  flextable::fontsize(size = 11) %>%
  flextable::fontsize(size = 11, part = "header") %>%
  flextable::align_text_col(align = "left") %>%
  flextable::set_caption(caption = "Direct, indirect (mediated), and total effects with bootstrapped 95% CIs (1000 resamples).") %>%
  flextable::border_outer()
Direct, indirect (mediated), and total effects with bootstrapped 95% CIs (1000 resamples).

Label     Estimate  SE     CI_lower  CI_upper  p
a          0.435    0.061   0.330     0.573    0
b          5.220    1.136   2.970     7.599    0
c         12.811    1.170  10.722    15.377    0
indirect   2.270    0.516   1.386     3.423    0
total     15.080    1.072  13.180    17.404    0
Interpreting Mediation Results
To interpret mediation, we examine the following:
Path a (EFF → MOT): Is Self-Efficacy a significant predictor of Motivation?
Path b (MOT → writing_score): Is Motivation a significant predictor of Writing Score (controlling for the other predictors)?
Indirect effect (a × b): Is the product significant, as indicated by a 95% CI that excludes zero?
Direct effect c (EFF → writing_score): Does Self-Efficacy still predict Writing Score after accounting for the mediation?
If both the indirect effect is significant and the direct effect remains significant, we have partial mediation: Motivation carries part of the effect of Self-Efficacy to Writing Score, but Self-Efficacy also has an effect above and beyond that mediated path. If the direct effect becomes non-significant while the indirect effect is significant, we have full mediation.
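Our results match the partial mediation pattern: both the indirect effect and the direct effect are significant. The share of the total effect carried by the mediated path can be computed directly from the bootstrapped estimates:

```r
# Values from the bootstrapped output above
indirect <- 2.270            # a * b
total    <- 15.080           # c + (a * b)

round(indirect / total, 3)   # proportion mediated: 0.151
```

That is, roughly 15% of Self-Efficacy's total effect on Writing Score is transmitted through Motivation.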
A Note on Causal Language
Mediation analysis is often discussed in causal terms (“X causes Y through M”). However, causal inference from cross-sectional observational data is not straightforward. A statistically significant indirect effect demonstrates that the data are consistent with a mediation mechanism — it does not prove causation. To make stronger causal claims, researchers need longitudinal designs, experimental manipulation of the mediator, or other causal identification strategies.
Knowledge Check: Mediation
Q1. What is the indirect effect in a mediation model?
Q2. Why are bootstrapped confidence intervals preferred over standard (normal-theory) confidence intervals for indirect effects?
Model Comparison and Modification
Section Overview
What you’ll learn: How to compare alternative SEM specifications using formal tests and fit indices, and how to use modification indices responsibly
Key concepts: Nested models, likelihood ratio (chi-square difference) test, AIC/BIC, modification indices
Why Compare Models?
In practice, researchers often have competing theoretical models — alternative specifications that make different predictions about which paths should be present or absent. SEM provides tools for formally comparing such models. There are two situations:
Nested models: Model A is a special case of Model B (Model A is Model B with one or more paths fixed to zero). These can be compared with a chi-square difference test (Δχ²).
Non-nested models: Neither model is a special case of the other. These are compared using information criteria (AIC, BIC): lower values indicate better fit, penalised for model complexity.
Comparing a Constrained Model
Suppose a reviewer argues that the direct path from Self-Efficacy to Writing Score is unnecessary and that all of Self-Efficacy’s influence on Writing Score is mediated through Motivation. We test this by fitting a constrained model with the direct EFF → writing_score path removed, and comparing it to the full model.
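A sketch of this comparison, assuming sem_fit is the fitted full model: the constrained model simply drops EFF from the writing_score equation, and lavaan::lavTestLRT() performs the χ² difference test for nested models.

```r
constrained_model <- '
  ANX =~ anx1 + anx2 + anx3
  EFF =~ eff1 + eff2 + eff3
  MOT =~ mot1 + mot2 + mot3

  MOT ~ EFF
  writing_score ~ ANX + MOT   # direct EFF -> writing_score path removed
'
constrained_fit <- lavaan::sem(constrained_model, data = semdata)

# Chi-square difference test (full vs constrained model)
lavaan::lavTestLRT(sem_fit, constrained_fit)
```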
A significant Δχ² (p < .05) indicates that the constrained model fits significantly worse — that is, removing the direct path causes a significant deterioration in fit, providing evidence that the direct path contributes meaningfully and should be retained.
We also compare AIC and BIC:
aic_bic <- data.frame(
  Model = c("Full model (with direct EFF path)",
            "Constrained model (no direct EFF path)"),
  AIC = round(c(AIC(sem_fit), AIC(constrained_fit)), 1),
  BIC = round(c(BIC(sem_fit), BIC(constrained_fit)), 1)
)
aic_bic %>%
  flextable() %>%
  flextable::set_table_properties(width = .80, layout = "autofit") %>%
  flextable::theme_zebra() %>%
  flextable::fontsize(size = 11) %>%
  flextable::fontsize(size = 11, part = "header") %>%
  flextable::align_text_col(align = "left") %>%
  flextable::set_caption(caption = "Model comparison: AIC and BIC for the full and constrained models.") %>%
  flextable::border_outer()
Model                                   | AIC     | BIC
Full model (with direct EFF path)       | 8,883.3 | 8,972.2
Constrained model (no direct EFF path)  | 9,003.1 | 9,088.3
The preferred model has the lower AIC (and lower BIC, which applies a stronger penalty for model complexity). A difference of more than 10 in BIC is generally considered strong evidence in favour of the model with the lower value.
Modification Indices
If a model fits poorly, modification indices (MIs) can help diagnose which additional paths or covariances would most improve fit. Each MI indicates how much the overall model χ² would decrease if a particular currently-fixed parameter were freed (estimated).
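In lavaan, modification indices can be requested with modindices(); sorting and filtering keeps the output manageable. A sketch, assuming sem_fit is the fitted model from earlier:

```r
library(lavaan)

# show only MIs above 10, largest first; the 'epc' column gives the
# expected parameter change if that constraint were freed
modindices(sem_fit, sort = TRUE, minimum.value = 10)
```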
Modification indices are a double-edged sword. They are useful for diagnosing systematic misfit (e.g., correlated residuals between items that share method variance). However, acting on every high MI and re-fitting the model is a form of capitalising on chance: the revised model will fit the current sample better but may not generalise.
Rules of thumb for responsible use of MIs:
Theory first: only free a parameter if there is a substantive, theoretically defensible reason to do so.
One at a time: modify one parameter, re-fit, re-inspect — do not free multiple parameters simultaneously.
Cross-validate: if sample size permits, split the data and use one half to explore modifications and the other to confirm them.
Report transparently: if modifications were made post-hoc, report this explicitly and distinguish the revised model from the originally hypothesised model.
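The cross-validation step can be sketched with a simple random split. The names dat and sem_model are assumptions carried over from earlier; set a seed for reproducibility:

```r
library(lavaan)

set.seed(42)
n <- nrow(dat)
explore_idx <- sample(n, size = floor(n / 2))

explore_half <- dat[explore_idx, ]   # used for consulting MIs
confirm_half <- dat[-explore_idx, ]  # held out to confirm revisions

fit_explore <- sem(sem_model, data = explore_half)
modindices(fit_explore, sort = TRUE, minimum.value = 10)

# after a theory-guided revision, confirm on the held-out half:
# fit_confirm <- sem(revised_model, data = confirm_half)
```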
Knowledge Check: Model Comparison
Q1. What does a significant chi-square difference test (Δχ²) between two nested models indicate?
Q2. A modification index of 24.5 suggests adding a cross-loading of anx2 onto the EFF factor. Should you add this path?
Reporting Standards
Reporting SEM results clearly and completely is as important as the analysis itself. This section summarises conventions for reporting SEM in linguistics and applied linguistics research. At a minimum, report the following.
Model specification and estimation
The full theoretical rationale for the hypothesised model
Which variables are latent vs. observed; which indicators load onto which factors
Software and estimator used (e.g., “Models were estimated in R using the lavaan package [version X] with Maximum Likelihood estimation”)
Measurement model (CFA)
Standardised factor loadings for all indicators (with SEs and significance)
All model fit indices: χ²(df), CFI, TLI, RMSEA (with 90% CI), SRMR
Scale reliabilities (McDonald’s ω or Cronbach’s α)
Structural model
Standardised path coefficients (with SEs and significance)
R² for all endogenous variables
Model fit indices
Mediation (if applicable)
Labelled paths (a, b, c/c’), indirect effect, total effect
Bootstrapped confidence intervals (state N of resamples)
Whether partial or full mediation was found
Model comparisons (if applicable)
Δχ², Δdf, p-value for nested comparisons
AIC/BIC for non-nested comparisons
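Most of the quantities in this checklist can be pulled from a fitted lavaan object in a few calls (sem_fit is an assumed name for the fitted structural model):

```r
library(lavaan)

# fit indices for the reporting paragraph
fitMeasures(sem_fit, c("chisq", "df", "pvalue", "cfi", "tli",
                       "rmsea", "rmsea.ci.lower", "rmsea.ci.upper", "srmr"))

# standardised loadings and path coefficients with SEs and p-values
standardizedSolution(sem_fit)

# R-squared for all endogenous variables
inspect(sem_fit, "r2")
```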
Model Reporting Paragraphs
CFA
A three-factor measurement model was specified a priori based on the theoretical framework, with Language Anxiety (ANX), Writing Self-Efficacy (EFF), and Motivation (MOT) each indicated by three Likert-scale items (nine indicators in total). The model was estimated using Maximum Likelihood in R (lavaan). Model fit was excellent: χ²(df) = X.XX, CFI = .97, TLI = .96, RMSEA = .04 [90% CI: .01, .07], SRMR = .04. All standardised factor loadings were significant and exceeded 0.70 (range: .71–.82), and McDonald’s ω exceeded .80 for all three scales, indicating good reliability. The measurement model was retained for subsequent structural analysis.
Full SEM
The structural model specified directional effects of Language Anxiety and Writing Self-Efficacy on Writing Score, and an effect of Self-Efficacy on Motivation. Model fit was acceptable: χ²(df) = X.XX, CFI = .96, TLI = .95, RMSEA = .05 [90% CI: .02, .07], SRMR = .05. Writing Self-Efficacy was a significant positive predictor of both Motivation (β = .55, SE = .07, p < .001) and Writing Score (β = .47, SE = .08, p < .001). Language Anxiety was a significant negative predictor of Writing Score (β = −.38, SE = .07, p < .001). Motivation significantly predicted Writing Score (β = .21, SE = .07, p = .003). Together, the predictors explained 58% of the variance in Writing Score.
Mediation
To test whether the effect of Writing Self-Efficacy on Writing Score was partially mediated by Motivation, we re-estimated the model with labelled paths and requested 1000 bootstrap resamples for inference on the indirect effect. The indirect effect of Self-Efficacy on Writing Score via Motivation was significant (unstandardised b = X.XX, 95% BCa CI [X.XX, X.XX]), indicating that part of the positive effect of self-efficacy on writing performance operates through increased motivation. The direct effect of Self-Efficacy on Writing Score remained significant after accounting for this indirect path, supporting partial mediation.
Quick Reference: SEM Workflow
Step                             | Action                                                                                                    | Key R function(s)
1. Theoretical specification     | Draw path diagram; specify which indicators load onto which factors and which structural paths are hypothesised | —
2. Descriptive checks            | Examine distributions (skewness, kurtosis), correlations; check for multivariate outliers                 | psych::describe(); cor()
3. Confirmatory Factor Analysis  | Fit measurement model with lavaan::cfa()                                                                  | lavaan::cfa()
4. Evaluate measurement fit      | Inspect CFI, TLI, RMSEA, SRMR against recommended thresholds                                              | lavaan::fitMeasures()
5. Assess reliability            | Compute McDonald's ω with semTools::reliability()                                                         | semTools::reliability()
6. Full SEM                      | Add structural paths; fit with lavaan::sem()                                                              | lavaan::sem()
7. Mediation (if applicable)     | Label paths; define indirect/total effects with ':='; use se = 'bootstrap'                                | lavaan::sem(se = 'bootstrap')
8. Model comparison              | Use lavTestLRT() for nested models; AIC/BIC for non-nested; consult MIs with theory                       | lavaan::lavTestLRT(); AIC(); modindices()
9. Report                        | Report all fit indices, standardised loadings, path coefficients, R², and effect CIs                      | —
Schweinberger, Martin. 2026. Structural Equation Modelling in R. Brisbane: The University of Queensland. url: https://ladal.edu.au/tutorials/sem/sem.html (Version 2026.02.23).
@manual{schweinberger2026sem,
author = {Schweinberger, Martin},
title = {Structural Equation Modelling in R},
note = {tutorials/sem/sem.html},
year = {2026},
organization = {The University of Queensland, School of Languages and Cultures},
address = {Brisbane},
edition = {2026.02.23}
}
Anderson, James C., and David W. Gerbing. 1988. “Structural Equation Modeling in Practice: A Review and Recommended Two-Step Approach.” Psychological Bulletin 103 (3): 411–23.
Hair, Joseph F., William C. Black, Barry J. Babin, and Rolph E. Anderson. 2019. Multivariate Data Analysis. 8th ed. Andover: Cengage.
Hu, Li-tze, and Peter M. Bentler. 1999. “Cutoff Criteria for Fit Indexes in Covariance Structure Analysis: Conventional Criteria Versus New Alternatives.” Structural Equation Modeling: A Multidisciplinary Journal 6 (1): 1–55.
Jackson, Dennis L., J. Arthur Gillaspy, and Rebecca Purc-Stephenson. 2009. “Reporting Practices in Confirmatory Factor Analysis: An Overview and Some Recommendations.” Psychological Methods 14 (1): 6–23.
Kline, Rex B. 2023. Principles and Practice of Structural Equation Modeling. 5th ed. New York: Guilford Press.
McDonald, Roderick P. 1999. Test Theory: A Unified Treatment. Mahwah, NJ: Lawrence Erlbaum Associates.
Nunnally, Jum C. 1978. Psychometric Theory. 2nd ed. New York: McGraw-Hill.
AI Transparency Statement
This tutorial was developed with the assistance of Claude (claude.ai), a large language model created by Anthropic. Claude was used to help draft the tutorial text, structure the instructional content, generate the R code examples, and write the checkdown quiz questions and feedback strings. All content was reviewed, edited, and approved by the author (Martin Schweinberger), who takes full responsibility for the accuracy and pedagogical appropriateness of the material. The use of AI assistance is disclosed here in the interest of transparency and in accordance with emerging best practices for AI-assisted academic content creation.