Keyness and Keyword Analysis in R

Author

Martin Schweinberger

Introduction

This tutorial introduces keyness and keyword analysis — a set of corpus-linguistic methods for identifying words that are statistically characteristic of one text or corpus when compared to another. Keywords play a pivotal role in text analysis, serving as distinctive terms that hold particular significance within a given text, context, or collection. These words stand out due to their heightened frequency in a specific text or context, setting them apart from their occurrence in another. In essence, keywords are linguistic markers that encapsulate the essence or topical focus of a document or dataset. The process of identifying keywords involves a methodology akin to the one employed for detecting collocations using kwics: we compare the use of a particular word in a target corpus A against its use in a reference corpus B. By discerning the frequency disparities, we gain valuable insights into the salient terms that contribute significantly to the unique character and thematic emphasis of a given text or context.1

This tutorial is aimed at beginners and intermediate users of R and showcases how to extract keywords from textual data and analyse them using R. The aim is not to provide a fully-fledged analysis but rather to show and exemplify selected useful methods associated with keyness and keyword analysis.

Prerequisite Tutorials

To be able to follow this tutorial, we suggest you check out and familiarise yourself with the content of the following R Basics tutorials:

Familiarity with basic frequency analysis and with the concept of statistical significance testing will be particularly helpful for understanding the keyness statistics introduced in this tutorial.


Learning Objectives

By the end of this tutorial you will be able to:

  1. Explain what a keyword is and how keyness analysis differs from simple frequency analysis
  2. Describe the dimensions of keyness proposed by Egbert and Biber (2019) and Sønning (2023) — frequency vs. dispersion, and target-intrinsic vs. comparative
  3. Construct the 2×2 contingency table that underlies all keyness statistics
  4. Compute a comprehensive suite of keyness measures in R — G², χ², phi, MI, PMI, Log Odds Ratio, Rate Ratio, Rate Difference, Difference Coefficient, Odds Ratio, DeltaP, and Signed DKL
  5. Apply Fisher’s Exact Test and Bonferroni correction to assess and control for statistical significance
  6. Visualise keyword results using dot plots, bar plots, and comparison word clouds
  7. Interpret types (overrepresented words) and antitypes (underrepresented words) substantively
  8. Report keyword analyses in accordance with best-practice conventions in corpus linguistics


What This Tutorial Covers
  1. Dimensions of keyness — frequency vs. dispersion, discernibility vs. distinctiveness, target-intrinsic vs. comparative approaches
  2. The 2×2 contingency table — the logical and mathematical foundation of keyword identification
  3. Computing keyness statistics — G², χ², phi, MI, PMI, Log Odds Ratio, Rate Ratio, Rate Difference, Difference Coefficient, Odds Ratio, DeltaP, Signed DKL
  4. Significance testing — Fisher’s Exact Test and Bonferroni correction for multiple comparisons
  5. Visualising keywords — dot plots, bar plots, and comparison word clouds
  6. Reporting standards — what to report, model paragraphs, and a quick-reference checklist

Preparation and Session Set-up

This tutorial is based on R. If you have not installed R or are new to it, you will find an introduction to and more information on how to use R here. For this tutorial we need to install certain packages so that the scripts shown below run without errors. Before turning to the code below, please install the packages by running the code in this section. If you have already installed the packages mentioned below, you can skip ahead and ignore this section. To install the necessary packages, simply run the following code; installing all of the libraries may take some time (between 1 and 5 minutes), so do not worry if it does not finish immediately.

Code
# set options
options(stringsAsFactors = FALSE)  # do not auto-convert strings to factors (default since R 4.0)
options(scipen = 999)              # suppress scientific notation
options(max.print = 1000)          # limit the amount of console output
# install packages
install.packages("checkdown")
install.packages("flextable")
install.packages("Matrix")
install.packages("quanteda")
install.packages("quanteda.textplots")
install.packages("dplyr")
install.packages("stringr")
install.packages("tidyr")
install.packages("tm")
install.packages("ggplot2")

Next, we load the packages.

Code
# load packages
library(checkdown)             # interactive quiz questions
library(flextable)             # formatted tables
library(Matrix)                # sparse matrix support
library(quanteda)              # corpus and tokenisation tools
library(quanteda.textplots)    # word clouds and text visualisations
library(dplyr)                 # data manipulation
library(stringr)               # string processing
library(tidyr)                 # data reshaping
library(tm)                    # stopword lists
library(ggplot2)               # data visualisation

Interactive Keyword Tool

KEYWORD TOOL

Click here to open a notebook-based tool that calculates keyness statistics and allows you to download the results.


How can you detect keywords — words that are characteristic of a text or a collection of texts?


This tutorial aims to show how you can answer this question.


Keywords

Section Overview

What you’ll learn: What keywords are, why they matter, and how keyword identification relates to frequency analysis

Key concepts: Target corpus, reference corpus, typicalness, keyword, antitype

Why it matters: Understanding the logic of keyness is essential before computing any statistics — knowing what a keyword is tells you how to choose the right measure and how to interpret the results

Keywords play a central role in corpus linguistics and computational text analysis. In everyday language, the word keyword may mean simply an important or central word in a document. In corpus linguistics, however, the term has a more precise, comparative meaning: a keyword is a word whose frequency — or whose distribution — in a target corpus is statistically unusual compared to a reference corpus (Scott 1997; Stubbs 2010).

This comparative logic is fundamental. Consider the word whale: it will be extremely frequent in a corpus of whaling narratives (such as Melville’s Moby Dick) but far less common in dystopian fiction. Its relative excess in the whaling corpus is what makes it a keyword there — not its raw frequency per se, but its frequency relative to a baseline. The reference corpus serves as that baseline, providing an estimate of how often we would expect a given word to appear in text generally, against which we assess whether its occurrence in the target corpus is surprising.

The identification of keywords is used across a wide range of applications in linguistics and beyond, including:

  • Stylistic analysis — characterising an author’s distinctive vocabulary relative to contemporaries or a general corpus
  • Genre analysis — identifying what makes a genre lexically distinctive
  • Diachronic studies — tracking which words become more or less characteristic of a variety over time
  • Discourse analysis — revealing vocabulary associated with a particular social group or ideological position
  • Language pedagogy — identifying vocabulary that is key to a specific academic field or register
The Reference Corpus Matters

The reference corpus is not a neutral backdrop — it shapes every keyword that emerges from the analysis. A study comparing academic writing to news prose will produce very different keywords than one comparing the same academic texts to spoken conversation. Always report what your reference corpus is, justify why it is the appropriate baseline for your research question, and interpret all keywords in light of that choice.


Dimensions of Keyness

Section Overview

What you’ll learn: The theoretical framework for understanding different types of keyness — frequency-based vs. dispersion-based, and target-intrinsic vs. comparative

Key references: Egbert and Biber (2019); Sønning (2023)

Why it matters: Not all keyness measures capture the same property of language. Understanding the dimensions of keyness helps you choose the measure that best reflects your research question.

Before turning to the practicalities of computing keyness, it is worth considering what typicalness — the theoretical goal of keyness analysis — actually means. This question has received renewed attention in recent methodological work (Sønning 2023).

Keyness analysis identifies typical items in a discourse domain, where typicalness traditionally relates to frequency of occurrence: the emphasis is on items used more frequently in the target corpus compared to a reference corpus. Egbert and Biber (2019) expanded this notion by highlighting two distinct criteria for typicalness: content-distinctiveness and content-generalizability.

  • Content-distinctiveness refers to an item’s association with the domain and its topical relevance — how much more (or less) it is used in the target than in a reference corpus.
  • Content-generalizability pertains to an item’s widespread usage across various texts within the target domain — whether the word surfaces broadly or is concentrated in just a handful of documents.

These criteria bridge traditional keyness approaches with broader linguistic perspectives, emphasising both the distinctiveness and the generalizability of key items within a corpus.

Following Sønning (2023), we can adopt Egbert and Biber's (2019) keyness criteria and distinguish between frequency-oriented and dispersion-oriented approaches to assessing keyness. We can also distinguish between keyness features that are assessed relative to the target variety only (target-intrinsic) and those that emerge only from a comparison to a reference variety (comparative). This four-way classification, detailed in the table below, links methodological choices to the linguistic meaning conveyed by quantitative measures:

| Analysis | Frequency-oriented | Dispersion-oriented |
|---|---|---|
| Target variety in isolation | Discernibility of item in the target variety | Generality across texts in the target variety |
| Comparison to reference variety | Distinctiveness relative to the reference variety | Comparative generality relative to the reference variety |

The second key aspect of keyness involves an item’s dispersion across texts in the target domain, indicating its widespread use. A typical item should appear evenly across various texts within the target domain, reflecting its generality. This breadth of usage can be compared to its occurrence in the reference domain — termed comparative generality. Therefore, a key item should exhibit greater prevalence across target texts compared to those in the reference domain.

In this tutorial we focus primarily on the frequency-comparative quadrant: identifying words that are significantly more (or less) frequent in the target corpus than in the reference corpus. This is by far the most commonly implemented approach in corpus-linguistic research and the one found in tools such as AntConc, WordSmith Tools, and Sketch Engine. Dispersion-based approaches are an important complementary perspective but are beyond the scope of this introductory tutorial.

Exercises: Dimensions of Keyness

Q1. A word appears 800 times in a target corpus of 200,000 tokens, but it also appears very frequently in the reference corpus in proportion to its size. Is this word necessarily a keyword?






Q2. What is the difference between content-distinctiveness and content-generalizability as described by Egbert & Biber (2019)?






Identifying Keywords

Section Overview

What you’ll learn: The logical and mathematical structure of keyword identification — how the 2×2 contingency table works and what information it captures

Key concepts: O11, O12, O21, O22, expected frequencies, null hypothesis, types, antitypes

Why it matters: Every keyness statistic — from G² to MI to the Log Odds Ratio — is computed from this same table. Understanding it is the key to understanding all measures.

Here, we focus on a frequency-based approach that assesses distinctiveness relative to the reference variety. To identify these keywords, we follow the procedure used to identify collocations using kwics — the idea is essentially identical: we compare the use of a word in a target corpus A to its use in a reference corpus B.

To determine if a token is a keyword — whether it occurs significantly more frequently in a target corpus compared to a reference corpus — we use the following information arranged in a 2×2 contingency table:

  • O11 = Number of times wordx occurs in the target corpus
  • O12 = Number of times wordx occurs in the reference corpus (without target corpus)
  • O21 = Number of times other words occur in the target corpus
  • O22 = Number of times other words occur in the reference corpus
|  | Target corpus | Reference corpus | Row total |
|---|---|---|---|
| token | O11 | O12 | = R1 |
| other tokens | O21 | O22 | = R2 |
| Column total | = C1 | = C2 | = N |

From these observed counts we compute expected frequencies — the counts we would expect if wordx were distributed in exact proportion to the sizes of the two corpora (i.e., the null hypothesis of no keyness):

\[E_{11} = \frac{R_1 \times C_1}{N}, \quad E_{12} = \frac{R_1 \times C_2}{N}\]

If the observed O11 substantially exceeds E11, the word appears more often in the target than chance would predict: it is a candidate keyword, also called a type. If O11 is substantially below E11, the word is underrepresented in the target: it is an antitype — a keyword of the reference corpus.
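To make this concrete, here is a minimal sketch in base R, using invented counts, that fills in the contingency cells for a single word and derives the expected frequency E11:

```r
# invented counts: a word occurring 90 times in a 100,000-token target
# corpus and 30 times in a 200,000-token reference corpus
O11 <- 90; O12 <- 30            # word in target / in reference
C1 <- 100000; C2 <- 200000      # column totals (corpus sizes)
O21 <- C1 - O11                 # other tokens in target
O22 <- C2 - O12                 # other tokens in reference
N  <- C1 + C2                   # total tokens
R1 <- O11 + O12                 # row total for the word

# expected frequency of the word in the target under the null hypothesis
E11 <- (R1 * C1) / N            # 120 * 100000 / 300000 = 40

# O11 (90) clearly exceeds E11 (40): the word is a candidate type
ifelse(O11 > E11, "type", "antitype")
```

With these invented counts the word occurs more than twice as often in the target as expected, so it would be flagged as a candidate keyword; whether that difference is statistically reliable is then assessed with keyness statistics.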

Types and Antitypes

Both directions of keyness are substantively informative:

  • A type is a word used significantly more in the target corpus than expected — it characterises the target.
  • An antitype is a word used significantly less in the target corpus than expected — it characterises the reference corpus, or equivalently, is avoided in the target.

Antitypes can reveal what a text or genre systematically avoids saying, which is often as theoretically meaningful as what it uses abundantly. For example, if we compare political speeches to news reporting, words significantly avoided in speeches (antitypes) can illuminate strategic communicative choices.


Data: Two Literary Texts

We begin by loading two texts: text1 is our target and text2 is our reference.

Code
# load data
text1 <- base::readRDS("tutorials/key/data/orwell.rda") %>%
  paste0(collapse = " ")
text2 <- base::readRDS("tutorials/key/data/melville.rda") %>%
  paste0(collapse = " ")

We inspect the first 200 characters of each text to confirm what we are working with:


1984 George Orwell Part 1, Chapter 1 It was a bright cold day in April, and the clocks were striking thirteen. Winston Smith, his chin nuzzled into his breast in an effort to escape the vile wind, sli

As you can see, text1 is George Orwell’s Nineteen Eighty-Four.


MOBY-DICK; or, THE WHALE. By Herman Melville CHAPTER 1. Loomings. Call me Ishmael. Some years ago—never mind how long precisely—having little or no money in my purse, and nothing particular to interes

This excerpt shows that text2 is Herman Melville’s Moby Dick. These two novels are chosen because they are stylistically and thematically very different — one a mid-twentieth-century dystopian political novel, the other a nineteenth-century nautical adventure — which produces clear and interpretable keywords, making them ideal for illustrative purposes.


Computing Keyness Statistics

Section Overview

What you’ll learn: How to tokenise two texts, build frequency and contingency tables, and calculate a comprehensive suite of keyness measures in R — step by step

Key statistics computed: G² (log-likelihood), χ², phi, MI, PMI, Log Odds Ratio, Rate Ratio, Rate Difference, Difference Coefficient, Odds Ratio, DeltaP, Signed DKL, LLR

Why it matters: Building the analysis from scratch means you understand exactly what each step does and can adapt it to your own corpora and research questions

After loading the two texts, we create a frequency table of the first text (the target).

Code
text1_words <- text1 %>%
  # remove non-word characters
  stringr::str_remove_all("[^[:alpha:] ]") %>%
  # convert to lower case
  tolower() %>%
  # tokenize the corpus files
  quanteda::tokens(
    remove_punct   = TRUE,
    remove_symbols = TRUE,
    remove_numbers = TRUE
  ) %>%
  # unlist the tokens to create a data frame
  unlist() %>%
  as.data.frame() %>%
  # rename the column to 'token'
  dplyr::rename(token = 1) %>%
  # group by 'token' and count occurrences
  dplyr::group_by(token) %>%
  dplyr::summarise(n = n()) %>%
  # add column stating where the frequency list is 'from'
  dplyr::mutate(type = "text1")

Now, we create a frequency table for the second text (the reference).

Code
text2_words <- text2 %>%
  # remove non-word characters
  stringr::str_remove_all("[^[:alpha:] ]") %>%
  # convert to lower case
  tolower() %>%
  # tokenize the corpus files
  quanteda::tokens(
    remove_punct   = TRUE,
    remove_symbols = TRUE,
    remove_numbers = TRUE
  ) %>%
  # unlist the tokens to create a data frame
  unlist() %>%
  as.data.frame() %>%
  # rename the column to 'token'
  dplyr::rename(token = 1) %>%
  # group by 'token' and count occurrences
  dplyr::group_by(token) %>%
  dplyr::summarise(n = n()) %>%
  # add column stating where the frequency list is 'from'
  dplyr::mutate(type = "text2")

Next, we combine the two frequency tables. We use a left join so that every word from the target corpus appears in the combined table, with a zero count assigned to words that do not occur in the reference corpus. Note that a left join drops words attested only in the reference corpus; if you also want to identify antitypes among such words, use dplyr::full_join() instead.

Code
texts_df <- dplyr::left_join(text1_words, text2_words, by = c("token")) %>%
  # rename columns and select relevant columns
  dplyr::rename(
    text1 = n.x,
    text2 = n.y
  ) %>%
  dplyr::select(-type.x, -type.y) %>%
  # replace NA values with 0
  tidyr::replace_na(list(text1 = 0, text2 = 0))

| token | text1 | text2 |
|---|---|---|
| a | 2,390 | 4,536 |
| aaronson | 8 | 0 |
| aback | 2 | 2 |
| abandon | 3 | 3 |
| abandoned | 4 | 7 |
| abashed | 1 | 2 |
| abbreviated | 1 | 0 |
| abiding | 1 | 1 |
| ability | 1 | 1 |
| abject | 3 | 0 |

We now calculate the observed and expected frequencies as well as the row and column totals needed to fill the 2×2 contingency table for each word.

Code
texts_df %>%
  dplyr::mutate(
    text1 = as.numeric(text1),
    text2 = as.numeric(text2)
  ) %>%
  dplyr::mutate(
    C1 = sum(text1),
    C2 = sum(text2),
    N  = C1 + C2
  ) %>%
  dplyr::rowwise() %>%
  dplyr::mutate(
    R1  = text1 + text2,
    R2  = N - R1,
    O11 = text1,
    O11 = ifelse(O11 == 0, O11 + 0.1, O11),  # small offset to avoid log(0)
    O12 = R1 - O11,
    O21 = C1 - O11,
    O22 = C2 - O12
  ) %>%
  dplyr::mutate(
    E11 = (R1 * C1) / N,
    E12 = (R1 * C2) / N,
    E21 = (R2 * C1) / N,
    E22 = (R2 * C2) / N
  ) %>%
  dplyr::select(-text1, -text2) -> stats_tb2

| token | C1 | C2 | N | R1 | R2 | O11 | O12 | O21 | O22 | E11 | E12 | E21 | E22 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| a | 94,677 | 169,163 | 263,840 | 6,926 | 256,914 | 2,390 | 4,536 | 92,287 | 164,627 | 2,485.3430185 | 4,440.6569815 | 92,191.66 | 164,722.3 |
| aaronson | 94,677 | 169,163 | 263,840 | 8 | 263,832 | 8 | 0 | 94,669 | 169,163 | 2.8707398 | 5.1292602 | 94,674.13 | 169,157.9 |
| aback | 94,677 | 169,163 | 263,840 | 4 | 263,836 | 2 | 2 | 94,675 | 169,161 | 1.4353699 | 2.5646301 | 94,675.56 | 169,160.4 |
| abandon | 94,677 | 169,163 | 263,840 | 6 | 263,834 | 3 | 3 | 94,674 | 169,160 | 2.1530549 | 3.8469451 | 94,674.85 | 169,159.2 |
| abandoned | 94,677 | 169,163 | 263,840 | 11 | 263,829 | 4 | 7 | 94,673 | 169,156 | 3.9472673 | 7.0527327 | 94,673.05 | 169,155.9 |
| abashed | 94,677 | 169,163 | 263,840 | 3 | 263,837 | 1 | 2 | 94,676 | 169,161 | 1.0765274 | 1.9234726 | 94,675.92 | 169,161.1 |
| abbreviated | 94,677 | 169,163 | 263,840 | 1 | 263,839 | 1 | 0 | 94,676 | 169,163 | 0.3588425 | 0.6411575 | 94,676.64 | 169,162.4 |
| abiding | 94,677 | 169,163 | 263,840 | 2 | 263,838 | 1 | 1 | 94,676 | 169,162 | 0.7176850 | 1.2823150 | 94,676.28 | 169,161.7 |
| ability | 94,677 | 169,163 | 263,840 | 2 | 263,838 | 1 | 1 | 94,676 | 169,162 | 0.7176850 | 1.2823150 | 94,676.28 | 169,161.7 |
| abject | 94,677 | 169,163 | 263,840 | 3 | 263,837 | 3 | 0 | 94,674 | 169,163 | 1.0765274 | 1.9234726 | 94,675.92 | 169,161.1 |

We can now calculate the keyness measures. Each statistic is described in detail in Section 7 below.

Code
stats_tb2 %>%
  # determine number of rows (for Bonferroni correction)
  dplyr::mutate(Rws = nrow(.)) %>%
  # work row-wise
  dplyr::rowwise() %>%
  # calculate Fisher's Exact Test p-value
  dplyr::mutate(p = fisher.test(matrix(
    c(O11, O12, O21, O22), ncol = 2, byrow = TRUE
  ))$p.value) %>%
  # relative frequencies per thousand words
  dplyr::mutate(
    ptw_target = O11 / C1 * 1000,
    ptw_ref    = O12 / C2 * 1000
  ) %>%
  # chi-square statistic
  dplyr::mutate(
    X2 = (O11 - E11)^2 / E11 +
         (O12 - E12)^2 / E12 +
         (O21 - E21)^2 / E21 +
         (O22 - E22)^2 / E22
  ) %>%
  # extract keyness measures
  dplyr::mutate(
    phi              = sqrt(X2 / N),
    MI               = log2(O11 / E11),
    t.score          = (O11 - E11) / sqrt(O11),
    PMI              = log2((O11 / N) / (((O11 + O12) / N) * ((O11 + O21) / N))),
    DeltaP           = (O11 / R1) - (O21 / R2),
    LogOddsRatio     = log(((O11 + 0.5) * (O22 + 0.5)) / ((O12 + 0.5) * (O21 + 0.5))),
    G2               = 2 * (
      (O11 + 0.001) * log((O11 + 0.001) / E11) +
      (O12 + 0.001) * log((O12 + 0.001) / E12) +
       O21           * log(O21 / E21) +
       O22           * log(O22 / E22)
    ),
    # traditional keyness measures
    RateRatio             = ((O11 + 0.001) / (C1 * 1000)) / ((O12 + 0.001) / (C2 * 1000)),
    RateDifference        = (O11 / (C1 * 1000)) - (O12 / (C2 * 1000)),
    DifferenceCoefficient = RateDifference / sum((O11 / (C1 * 1000)), (O12 / (C2 * 1000))),
    OddsRatio             = ((O11 + 0.5) * (O22 + 0.5)) / ((O12 + 0.5) * (O21 + 0.5)),
    LLR                   = 2 * (O11 * (log(O11 / E11))),
    RDF                   = abs((O11 / C1) - (O12 / C2)),
    PDiff                 = abs(ptw_target - ptw_ref) / ((ptw_target + ptw_ref) / 2) * 100,
    SignedDKL             = sum(
      ifelse(O11 > 0, O11 * log(O11 / ((O11 + O12) / 2)), 0) -
      ifelse(O12 > 0, O12 * log(O12 / ((O11 + O12) / 2)), 0)
    )
  ) %>%
  # determine Bonferroni-corrected significance
  # (multiply p by the number of tests performed)
  dplyr::mutate(Sig_corrected = dplyr::case_when(
    p * Rws >  .05  ~ "n.s.",
    p * Rws >  .01  ~ "p < .05*",
    p * Rws >  .001 ~ "p < .01**",
    p * Rws <= .001 ~ "p < .001***",
    TRUE            ~ "N.A."
  )) %>%
  # round p-value, classify direction, sign phi and G2 for antitypes
  dplyr::mutate(
    p    = round(p, 5),
    type = ifelse(E11 > O11, "antitype", "type"),
    phi  = ifelse(E11 > O11, -phi, phi),
    G2   = ifelse(E11 > O11, -G2,  G2)
  ) %>%
  # filter out non-significant results
  dplyr::filter(Sig_corrected != "n.s.") %>%
  # arrange by G2 (strongest types first)
  dplyr::arrange(-G2) %>%
  # remove superfluous columns
  dplyr::select(-any_of(c(
    "TermCoocFreq", "AllFreq", "NRows", "R1", "R2",
    "C1", "C2", "E12", "E21", "E22", "upp", "low",
    "op", "t.score", "z.score", "Rws"
  ))) %>%
  dplyr::relocate(any_of(c(
    "token", "type", "Sig_corrected", "O11", "O12",
    "ptw_target", "ptw_ref", "G2", "RDF", "RateRatio",
    "RateDifference", "DifferenceCoefficient", "LLR", "SignedDKL",
    "PDiff", "LogOddsRatio", "MI", "PMI", "phi", "X2",
    "OddsRatio", "DeltaP", "p", "E11", "O21", "O22"
  ))) -> assoc_tb3

| token | type | Sig_corrected | O11 | O12 | ptw_target | ptw_ref | G2 | RDF | RateRatio | RateDifference | DifferenceCoefficient | LLR | SignedDKL | PDiff | LogOddsRatio | MI | PMI | phi | X2 | OddsRatio | DeltaP | p | E11 | O21 | O22 | N |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| winston | type | p < .001*** | 440 | 0 | 4.6473800 | 0.00000000 | 903.1799 | 0.0046473800 | 786,166.536360 | 0.0000046473800 | 1.0000000 | 901.8871 | 304.98476 | 200.00000 | 7.3661051 | 1.4785774 | -1.478577 | 0.05463223 | 787.4780 | 1,581.462194 | 0.6422285 | 0 | 157.89069 | 94,237 | 169,163 | 263,840 |
| was | type | p < .001*** | 2,146 | 1,618 | 22.6665399 | 9.56473933 | 703.9743 | 0.0131018006 | 2.369802 | 0.0000131018006 | 0.4064933 | 1,987.1753 | 526.25807 | 81.29867 | 0.8760446 | 0.6679609 | -2.289194 | 0.05299452 | 740.9733 | 2.401383 | 0.2143537 | 0 | 1,350.68310 | 92,531 | 167,545 | 263,840 |
| had | type | p < .001*** | 1,268 | 765 | 13.3929043 | 4.52226551 | 591.3677 | 0.0088706388 | 2.961546 | 0.0000088706388 | 0.4951468 | 1,401.9010 | 497.77101 | 99.02936 | 1.0944013 | 0.7975219 | -2.159633 | 0.04865982 | 624.7146 | 2.987394 | 0.2669231 | 0 | 729.52676 | 93,409 | 168,398 | 263,840 |
| party | type | p < .001*** | 250 | 9 | 2.6405568 | 0.05320312 | 442.6871 | 0.0025873537 | 49.626297 | 0.0000025873537 | 0.9604990 | 494.7523 | 188.44312 | 192.09980 | 3.8551473 | 1.4275534 | -1.529601 | 0.03962998 | 414.3699 | 47.235573 | 0.6070044 | 0 | 92.94020 | 94,427 | 169,154 | 263,840 |
| he | type | p < .001*** | 1,889 | 1,729 | 19.9520475 | 10.22091119 | 406.2806 | 0.0097311363 | 1.952081 | 0.0000097311363 | 0.3225118 | 1,416.7423 | 159.94782 | 64.50237 | 0.6787494 | 0.5410077 | -2.416147 | 0.04013583 | 425.0158 | 1.971411 | 0.1655392 | 0 | 1,298.29209 | 92,788 | 167,434 | 263,840 |
| obrien | type | p < .001*** | 178 | 0 | 1.8800765 | 0.00000000 | 365.0499 | 0.0018800765 | 318,041.162722 | 0.0000018800765 | 1.0000000 | 364.8543 | 123.38020 | 200.00000 | 6.4600069 | 1.4785774 | -1.478577 | 0.03473095 | 318.2541 | 639.065492 | 0.6415904 | 0 | 63.87396 | 94,499 | 169,163 | 263,840 |
| she | type | p < .001*** | 378 | 110 | 3.9925219 | 0.65026040 | 352.4095 | 0.0033422615 | 6.139842 | 0.0000033422615 | 0.7198833 | 581.7046 | 253.09608 | 143.97666 | 1.8149399 | 1.1100825 | -1.847072 | 0.03731102 | 367.2949 | 6.140707 | 0.4165181 | 0 | 175.11513 | 94,299 | 169,053 | 263,840 |
| you | type | p < .001*** | 950 | 851 | 10.0341160 | 5.03065091 | 214.0251 | 0.0050034651 | 1.994596 | 0.0000050034651 | 0.3321303 | 731.9492 | 98.95010 | 66.42605 | 0.6954194 | 0.5557786 | -2.401376 | 0.02914781 | 224.1572 | 2.004550 | 0.1698013 | 0 | 646.27531 | 93,727 | 168,312 | 263,840 |
| could | type | p < .001*** | 378 | 211 | 3.9925219 | 1.24731768 | 194.3144 | 0.0027452043 | 3.200880 | 0.0000027452043 | 0.5239100 | 439.4929 | 164.70635 | 104.78200 | 1.1651328 | 0.8386960 | -2.118459 | 0.02790018 | 205.3783 | 3.206349 | 0.2835562 | 0 | 211.35822 | 94,299 | 168,952 | 263,840 |
| telescreen | type | p < .001*** | 90 | 0 | 0.9506005 | 0.00000000 | 184.5139 | 0.0009506005 | 160,808.212797 | 0.0000009506005 | 1.0000000 | 184.4769 | 62.38325 | 200.00000 | 5.7798374 | 1.4785774 | -1.478577 | 0.02469195 | 160.8613 | 323.706552 | 0.6413763 | 0 | 32.29582 | 94,587 | 169,163 | 263,840 |

The table above shows the keywords for text1, which is George Orwell’s Nineteen Eighty-Four. The table starts with token (word type), followed by type, which indicates whether the token is a keyword in the target data (type) or a keyword in the reference data (antitype). Next is the Bonferroni-corrected significance (Sig_corrected), which accounts for repeated testing. This is followed by O11 (observed frequency of the token in the target corpus), and then by the various keyness statistics, which are explained in detail in the next section.

Exercises: Computing Keyness

Q1. In the keyword contingency table, what does O11 represent?






Q2. Why is a small offset (e.g., +0.1) added to zero-count cells before calculating keyness statistics?






Q3. What does it mean for a word to be an antitype in a keyword analysis?






Keyness Measures Explained

Section Overview

What you’ll learn: What each keyness statistic measures conceptually, its mathematical formula, and when it is most appropriate to use

Key measures: G², χ², phi, MI, PMI, Log Odds Ratio, Rate Ratio, Rate Difference, Difference Coefficient, Odds Ratio, DeltaP, Signed DKL

Why it matters: Different keyness measures capture different aspects of the relationship between a word and a corpus. Knowing what each one does allows you to make principled choices and report results accurately.

This section explains each of the statistics produced by the code above. Understanding these measures allows you to choose the most appropriate one for your research question and to interpret results correctly. Together, they quantify the strength, direction, and statistical significance of a token's attraction to the target corpus as opposed to the reference corpus.

Delta P (ΔP)

Delta P is a measure of association that indicates the difference in conditional probabilities. It measures the strength and direction of the association between a word and corpus membership:

\[\Delta P(A|B) = P(A|B) - P(A|\neg B) \quad \Rightarrow \quad \Delta P = \frac{O_{11}}{R_1} - \frac{O_{21}}{R_2}\]

Where \(P(A|B)\) is the probability of A given B, and \(P(A|\neg B)\) is the probability of A given not-B. Delta P ranges from −1 to +1 and is increasingly recommended in corpus-linguistic work (Gries 2013).
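As a quick sketch with invented counts (90 target vs. 30 reference occurrences in corpora of 100,000 and 200,000 tokens), Delta P can be computed directly from the contingency cells:

```r
# invented cell counts for one word
O11 <- 90; O12 <- 30
O21 <- 100000 - O11; O22 <- 200000 - O12
R1 <- O11 + O12; R2 <- O21 + O22

# Delta P: P(target | word) - P(target | other words)
DeltaP <- (O11 / R1) - (O21 / R2)
round(DeltaP, 3)   # 0.417: positive association with the target
```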

Log Odds Ratio

The Log Odds Ratio measures the strength of association between a word and the target corpus. It is the natural logarithm of the odds ratio and provides a symmetric measure. The +0.5 offsets (Haldane–Anscombe correction) handle zero-count cells:

\[\text{LogOR} = \log\!\left(\frac{(O_{11} + 0.5)(O_{22} + 0.5)}{(O_{12} + 0.5)(O_{21} + 0.5)}\right)\]

Positive values indicate overrepresentation in the target; negative values indicate underrepresentation. The Log Odds Ratio is particularly attractive because it is symmetric, interpretable as an effect size, and amenable to confidence interval construction.
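Since the Log Odds Ratio is amenable to confidence intervals, here is a hedged sketch with invented counts; the standard error used is the usual approximation based on the reciprocals of the (corrected) cell counts:

```r
# invented cell counts with Haldane-Anscombe correction (+0.5 per cell)
n11 <- 90 + 0.5;    n12 <- 30 + 0.5
n21 <- 99910 + 0.5; n22 <- 199970 + 0.5

logOR <- log((n11 * n22) / (n12 * n21))
se    <- sqrt(1/n11 + 1/n12 + 1/n21 + 1/n22)  # approximate standard error
ci95  <- logOR + c(-1.96, 1.96) * se          # approximate 95% confidence interval

round(logOR, 2)   # 1.78: the word is overrepresented in the target
```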

Mutual Information (MI)

Mutual Information quantifies the amount of information obtained about corpus membership through knowing the word. It measures mutual dependence between the word and the corpus:

\[MI = \log_2\!\left(\frac{O_{11}}{E_{11}}\right)\]

MI is highly sensitive to low-frequency items: a word appearing only once or twice in the target but never in the reference will receive an extremely high MI score. It therefore tends to favour rare, highly specific words over more general but robustly frequent keywords. Use MI with a minimum frequency filter.
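This low-frequency sensitivity is easy to demonstrate with a small sketch (invented corpus sizes): a hapax legomenon attested only in the target receives exactly the same MI score as a word with hundreds of target-exclusive occurrences.

```r
# invented sizes: 100,000-token target within a 300,000-token combined corpus
N <- 300000; C1 <- 100000

mi <- function(O11, R1) {
  E11 <- (R1 * C1) / N   # expected frequency in the target
  log2(O11 / E11)
}

mi(O11 = 1,   R1 = 1)    # a single target-only occurrence: log2(3) ~ 1.58
mi(O11 = 300, R1 = 300)  # 300 target-only occurrences: identical score
```

Because MI only looks at the ratio of observed to expected frequency, it cannot distinguish the two cases; a minimum frequency threshold (or an accompanying significance test) is therefore essential.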

Pointwise Mutual Information (PMI)

Pointwise Mutual Information measures the association between the specific word and the target corpus as point-events:

\[\text{PMI}(w, \text{target}) = \log_2\!\left(\frac{P(w, \text{target})}{P(w) \cdot P(\text{target})}\right)\]

Like MI, PMI is sensitive to low-frequency words, though in slightly different ways depending on the implementation. Both MI and PMI are better used as ranking or ordering metrics than as standalone significance tests.

Phi (φ) Coefficient

The phi coefficient is a scale-free effect size for the association between a word and corpus membership:

\[\phi = \sqrt{\frac{\chi^2}{N}}\]

Phi ranges from 0 (no association) to 1 (perfect association) and is signed here to indicate direction (positive = type, negative = antitype). Because phi normalises the χ² statistic by sample size, it is well suited to comparing keyness strength across words or studies.

Chi-Square (χ²)

Pearson’s chi-square tests the independence of the word’s distribution from corpus membership:

\[\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}\]

It shares the same distributional logic as G² but is less robust when expected cell frequencies fall below 5 — which is common for rare words in large corpora. For most corpus-linguistic keyness applications, G² is preferred over χ².
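As a sanity check, the hand-computed statistic can be compared against base R's chisq.test() (a sketch with invented counts; correct = FALSE disables the Yates continuity correction so the two values match):

```r
# invented 2x2 contingency table: rows = token/other, cols = target/reference
tab <- matrix(c(90, 30, 99910, 199970), nrow = 2, byrow = TRUE)

# expected counts from the marginal totals
E <- outer(rowSums(tab), colSums(tab)) / sum(tab)

X2_manual  <- sum((tab - E)^2 / E)
X2_builtin <- unname(chisq.test(tab, correct = FALSE)$statistic)

all.equal(X2_manual, X2_builtin)   # TRUE
```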

Likelihood Ratio (G²)

The log-likelihood ratio statistic (G²) is the most widely recommended keyness measure in corpus linguistics (Dunning 1993). It compares how much better the data fit a model where the word has different rates in the two corpora versus a model assuming a single pooled rate:

\[G^2 = 2 \sum_{i} O_i \log\!\left(\frac{O_i}{E_i}\right)\]

G² follows an approximate chi-square distribution, making significance assessment straightforward. Unlike Pearson’s χ², G² performs well even when expected cell frequencies are low, making it more robust for rare words.
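A minimal sketch (invented counts) that computes G² for a single word and obtains a p-value from the chi-square distribution with one degree of freedom:

```r
# invented cell counts and marginals
O11 <- 90; O12 <- 30; O21 <- 99910; O22 <- 199970
N  <- O11 + O12 + O21 + O22
R1 <- O11 + O12; R2 <- O21 + O22
C1 <- O11 + O21; C2 <- O12 + O22

O <- c(O11, O12, O21, O22)                     # observed cell counts
E <- c(R1 * C1, R1 * C2, R2 * C1, R2 * C2) / N # expected under the null

G2 <- 2 * sum(O * log(O / E))
p  <- pchisq(G2, df = 1, lower.tail = FALSE)
```

For this configuration G² comes out at roughly 87, far beyond the conventional significance threshold of 3.84 (α = .05, df = 1).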

Rate Ratio

The Rate Ratio compares the rate of events between two groups — here, the per-thousand-word frequencies in the target and reference corpora:

\[\text{Rate Ratio} = \frac{(O_{11} / C_1) \times 1000}{(O_{12} / C_2) \times 1000}\]

A Rate Ratio of 3.0 means the word appears three times more frequently per thousand words in the target than in the reference. It is intuitive and easy to communicate to non-specialist audiences. A small offset (+0.001) avoids division by zero for words absent from the reference.

Rate Difference

The Rate Difference measures the absolute difference in per-thousand-word event rates:

\[\text{Rate Difference} = \frac{O_{11}}{C_1} \times 1000 - \frac{O_{12}}{C_2} \times 1000\]

While the Rate Ratio is relative (multiplicative), the Rate Difference is absolute (additive). Both capture useful but distinct aspects of how usage rates differ.

Difference Coefficient

The Difference Coefficient (also known as the Difference Score) normalises the Rate Difference by the sum of the two rates:

\[D = \frac{\text{Rate}_1 - \text{Rate}_2}{\text{Rate}_1 + \text{Rate}_2}\]

This produces a bounded measure in [−1, +1], making it easier to compare across words with very different base frequencies.
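All three rate-based measures fall out of the same per-thousand-word rates; the counts and corpus sizes below are invented for illustration:

```r
O11 <- 34; C1 <- 96634     # hypothetical target count and target corpus size
O12 <- 4;  C2 <- 218004    # hypothetical reference count and reference corpus size

ptw_target <- O11 / C1 * 1000                   # per-thousand-word rates
ptw_ref    <- O12 / C2 * 1000

rate_ratio <- ptw_target / (ptw_ref + 0.001)    # small offset guards against /0
rate_diff  <- ptw_target - ptw_ref              # absolute (additive) difference
diff_coef  <- (ptw_target - ptw_ref) /
              (ptw_target + ptw_ref)            # bounded in [-1, +1]
```

For a word absent from the target, diff_coef hits its lower bound of −1; for one absent from the reference, its upper bound of +1.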

Odds Ratio

The (unlogged) Odds Ratio quantifies the strength of association between the word and corpus membership:

\[\text{OR} = \frac{(O_{11} + 0.5)(O_{22} + 0.5)}{(O_{12} + 0.5)(O_{21} + 0.5)}\]

Values above 1 indicate overrepresentation in the target; values below 1 indicate underrepresentation. The log transformation (Log Odds Ratio, above) is usually preferred because it is symmetric around zero.
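With the Haldane correction built into the formula, the (log) odds ratio is a one-liner; the cell counts below are again hypothetical:

```r
# hypothetical 2x2 cell counts
O11 <- 34; O12 <- 4; O21 <- 96600; O22 <- 218000

# Haldane-Anscombe correction: add 0.5 to every cell to avoid zero cells
OR <- ((O11 + 0.5) * (O22 + 0.5)) / ((O12 + 0.5) * (O21 + 0.5))

# log transformation: symmetric around 0, so swapping the two
# corpora simply flips the sign
log_OR <- log(OR)
```

An OR of, say, 17 and an OR of 1/17 describe equally strong associations in opposite directions, but only on the log scale (±log 17) is that symmetry visible at a glance.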

Log-Likelihood Ratio (LLR)

The LLR as implemented here is a simplified form that retains only the target word’s own cell from the full G² statistic:

\[\text{LLR} = 2 \times O_{11} \times \log\!\left(\frac{O_{11}}{E_{11}}\right)\]

Because it omits the contributions of the remaining three cells, it is best read as a ranking heuristic that approximates the full G² rather than as a standalone test statistic.

Relative Difference (RDF) and PDiff

RDF is the absolute difference in relative frequencies (proportions) between the two corpora:

\[\text{RDF} = \left|\frac{O_{11}}{C_1} - \frac{O_{12}}{C_2}\right|\]

PDiff expresses this as a percentage of the mean per-thousand-word rate:

\[\text{PDiff} = \frac{|\text{ptw\_target} - \text{ptw\_ref}|}{(\text{ptw\_target} + \text{ptw\_ref}) / 2} \times 100\]

Both measures are intuitive but do not account for corpus size differences or provide statistical significance on their own.

Signed DKL

The Signed Kullback–Leibler divergence measures the information-theoretic distance between the word’s distribution in the two corpora:

\[\text{SignedDKL} = \sum\!\left[O_{11} \log\!\frac{O_{11}}{(O_{11}+O_{12})/2} - O_{12} \log\!\frac{O_{12}}{(O_{11}+O_{12})/2}\right]\]

It is signed to indicate direction (positive = more frequent in target; negative = more frequent in reference).

Significance and Multiple Testing

All keyness statistics above measure association strength, but to determine whether a keyword is statistically significant we need a hypothesis test. The code uses Fisher’s Exact Test, which computes the exact probability of observing a contingency table as extreme as the one observed under the null hypothesis of no association. This is more reliable than the asymptotic chi-square approximation, especially for words with small expected counts.
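In R, the exact test is available through the built-in fisher.test(); the 2×2 counts below are hypothetical:

```r
# hypothetical 2x2 table: word vs. other words, target vs. reference corpus
m <- matrix(c(34, 4, 96600, 218000), nrow = 2, byrow = TRUE,
            dimnames = list(c("word", "other"), c("target", "reference")))

ft <- fisher.test(m)
ft$p.value    # exact p-value under the null of no association
ft$estimate   # conditional maximum-likelihood odds-ratio estimate
```

Unlike the chi-square approximation, this p-value remains valid even when expected cell counts are tiny, which is exactly the situation for rare words.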

Bonferroni Correction for Multiple Testing

When testing thousands of words simultaneously, some will appear significant purely by chance. If we test 10,000 words at α = .05, we expect roughly 500 false positives even if no word is truly a keyword. The Bonferroni correction addresses this by dividing the significance threshold by the number of tests performed: αcorrected = α / k, where k is the number of word types tested.

In the output table, the corrected significance tiers are:

| Label | Meaning |
|---|---|
| p < .001*** | p ≤ .001 / k — very strong evidence against H₀ |
| p < .01** | p ≤ .01 / k |
| p < .05* | p ≤ .05 / k |
| n.s. | Not significant after Bonferroni correction — excluded from results |

The Bonferroni correction is conservative (it increases the risk of false negatives alongside reducing false positives). An alternative that controls the False Discovery Rate (FDR) rather than the family-wise error rate is the Benjamini–Hochberg procedure, which offers more statistical power at the cost of allowing a small proportion of false positives among the significant results.
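Both corrections are available through base R’s p.adjust(); the raw p-values below are invented to show how the two procedures differ:

```r
# hypothetical raw p-values for 5 tested word types
pvals <- c(2e-6, 0.0004, 0.008, 0.03, 0.2)

p_bonf <- p.adjust(pvals, method = "bonferroni")  # family-wise error control
p_bh   <- p.adjust(pvals, method = "BH")          # false discovery rate control

sum(p_bonf < 0.05)   # keywords retained under Bonferroni
sum(p_bh   < 0.05)   # BH typically retains at least as many
```

Because the BH-adjusted p-values are never larger than the Bonferroni-adjusted ones, BH always retains at least as many keywords at a given threshold, illustrating its greater power.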

Exercises: Keyness Measures

Q1. Why might Mutual Information (MI) not be the best default measure for identifying keywords in a large corpus?






Q2. G² = 45.3 (p < .001, Bonferroni-corrected). What does this tell us?






Q3. A Rate Ratio of 0.15 for a word in a keyword analysis of text1 vs. text2 means:






Visualising Keywords

Section Overview

What you’ll learn: How to create and interpret three complementary visualisations of keyword results — dot plots, bar plots, and comparison word clouds

Why visualisation matters: A table with thousands of rows of keyness statistics is difficult to scan; visualisations make patterns immediately communicable and allow you to identify the most important results at a glance

Dot Plot

We can visualise the keyness strengths in a dot plot as shown in the code below. Sorting by G² in descending order and selecting the top 20 types gives us the words most strongly characteristic of Orwell’s Nineteen Eighty-Four.

Code
assoc_tb3 %>%
  dplyr::filter(type == "type") %>%
  dplyr::arrange(-G2) %>%
  head(20) %>%
  ggplot(aes(x = reorder(token, G2, mean), y = G2)) +
  geom_point(color = "steelblue", size = 3) +
  geom_segment(aes(xend = token, y = 0, yend = G2),
               color = "steelblue", linewidth = 0.7) +
  coord_flip() +
  theme_bw() +
  theme(panel.grid.minor = element_blank()) +
  labs(
    title    = "Top 20 keywords of Orwell's Nineteen Eighty-Four",
    subtitle = "Compared to Melville's Moby Dick | sorted by G² (log-likelihood)",
    x = "Token", y = "Keyness (G²)"
  )

The dot plot shows that words like party, winston, telescreen, and thought are among the most distinctive terms in Nineteen Eighty-Four — words that encapsulate the novel’s preoccupation with totalitarian control, surveillance, and political conformity.

Bar Plot

Another option is to visualise keyness as a bar plot that simultaneously shows the top keywords for each text. We display the 12 strongest types (keywords of text1) and 12 strongest antitypes (keywords of text2) in a single panel, making the contrasting vocabularies of the two novels immediately apparent.

Code
# get top 12 keywords for text1 (types)
top <- assoc_tb3 %>%
  dplyr::ungroup() %>%
  dplyr::filter(type == "type") %>%
  dplyr::slice_head(n = 12)

# get top 12 keywords for text2 (antitypes of text1)
bot <- assoc_tb3 %>%
  dplyr::ungroup() %>%
  dplyr::filter(type == "antitype") %>%
  dplyr::slice_tail(n = 12)

# combine and plot
rbind(top, bot) %>%
  ggplot(aes(x = reorder(token, G2, mean), y = G2,
             label = round(G2, 1), fill = type)) +
  geom_bar(stat = "identity") +
  geom_text(aes(
    y     = ifelse(G2 > 0, G2 - max(abs(G2)) * 0.04,
                           G2 + max(abs(G2)) * 0.04),
    label = round(G2, 1)
  ), color = "white", size = 3) +
  coord_flip() +
  theme_bw() +
  theme(legend.position = "none",
        panel.grid.minor = element_blank()) +
  scale_fill_manual(values = c("antitype" = "orange", "type" = "steelblue")) +
  labs(
    title    = "Top keywords (blue) and antitypes (orange)",
    subtitle = "Target: Orwell's Nineteen Eighty-Four | Reference: Melville's Moby Dick",
    x = "Keyword", y = "Keyness (G²)"
  )

Bars extending to the right (blue) show the strongest keywords of Nineteen Eighty-Four; bars extending to the left (orange) show words characteristic of Moby Dick that are underrepresented in Orwell. The contrast is striking: Melville’s distinctive vocabulary (whale, ship, sea, ahab) reflects the nautical world of the novel, while Orwell’s keywords (party, winston, telescreen) evoke the dystopian political landscape of Nineteen Eighty-Four.

Comparative Word Clouds

Another visualisation, the comparison cloud, is helpful in discerning disparities between texts. Its drawback relative to the more rigorous methods introduced above is that comparison clouds rely on a very basic method for identifying distinctive words. Nonetheless, they are very useful visualisation tools during the initial steps of an analysis.

In a first step, we generate a corpus object from the texts and create a variable with the author name.

Code
corp_dom <- quanteda::corpus(c(text1, text2))
quanteda::docvars(corp_dom, "Author") <- c("Orwell", "Melville")

Now, we can remove so-called stopwords (non-lexical function words) and punctuation and generate the comparison cloud.

Code
# create a comparison word cloud for a corpus
corp_dom %>%
  # tokenize the corpus, removing punctuation, symbols, and numbers
  quanteda::tokens(
    remove_punct   = TRUE,
    remove_symbols = TRUE,
    remove_numbers = TRUE
  ) %>%
  # remove English stopwords
  quanteda::tokens_remove(stopwords("english")) %>%
  # create a Document-Feature Matrix (DFM)
  quanteda::dfm() %>%
  # group the DFM by the 'Author' column
  quanteda::dfm_group(groups = corp_dom$Author) %>%
  # trim the DFM, keeping terms that occur at least 10 times
  quanteda::dfm_trim(min_termfreq = 10, verbose = FALSE) %>%
  # generate a comparison word cloud
  quanteda.textplots::textplot_wordcloud(
    comparison = TRUE,
    color      = c("darkgray", "orange"),
    max_words  = 150
  )

Interpreting Comparison Word Clouds Cautiously

Comparison word clouds use a simplified keyness algorithm that does not apply multiple testing correction and does not distinguish between statistical significance and visual prominence. They should be used for exploration or illustration rather than as the primary or sole evidence for research claims. Always accompany word clouds with the full statistical keyword table, and report statistics (G², phi, etc.) for any keywords you discuss substantively.

Exercises: Visualising Keywords

Q1. In the bar plot of keywords and antitypes, what does a bar extending to the left (negative G²) represent?






Q2. Why are comparison word clouds considered a less rigorous method of keyword identification than the statistical approach demonstrated earlier?






Reporting Standards

Reporting keyword analyses clearly and completely is as important as conducting them correctly. This section summarises conventions for reporting keyness analyses in corpus linguistics and adjacent fields.


General Principles

What to Report in a Keyword Analysis

Following best practice in corpus linguistics (Hunston 2002; Gries 2013):

Corpus description

  • Describe both the target and reference corpora: their source, composition, size in tokens, and any relevant metadata (e.g., time period, genre, sampling frame)
  • State all preprocessing steps: tokenisation method, case normalisation, stopword removal, lemmatisation
  • Justify the choice of reference corpus relative to the specific research question

Statistical choices

  • Name the keyness measure(s) used and cite a methodological reference (e.g., G²: Dunning 1993)
  • State the significance test used (Fisher’s Exact Test or asymptotic chi-square approximation)
  • State whether and how you corrected for multiple testing (e.g., Bonferroni correction: αcorrected = .05 / k)
  • Report any minimum frequency thresholds applied before ranking

Results

  • Report the keyness statistic (G²), the Bonferroni-corrected significance level, and at least one effect size (phi, Log Odds Ratio, or Rate Ratio) for each keyword discussed in detail
  • Report both types and antitypes if they are relevant to the research question
  • Provide a full keyword table in the paper (or as supplementary material if space is constrained)
  • Interpret keywords substantively — connect them to the theoretical or linguistic claims of the study

Model Reporting Paragraph

To identify the lexical characteristics of Orwell’s Nineteen Eighty-Four relative to Melville’s Moby Dick, a keyword analysis was conducted using the log-likelihood statistic (G²; Dunning 1993). Fisher’s Exact Test was used to assess statistical significance, with a Bonferroni correction applied to control for multiple comparisons across all word types tested (αcorrected = .05 / k). Only words reaching the corrected threshold of p < .001 are reported. Effect sizes are reported as phi (φ). The strongest keywords of Nineteen Eighty-Four included party (G² = [X], φ = [X], p < .001), winston (G² = [X], φ = [X], p < .001), and telescreen (G² = [X], φ = [X], p < .001), reflecting the novel’s preoccupation with political control and surveillance. Prominent antitypes — words significantly underrepresented in Nineteen Eighty-Four relative to Moby Dick — included whale and ship, consistent with the nautical thematic focus of the reference text.


Quick Reference: Keyness Measures

| Measure | Strengths | Use with caution when |
|---|---|---|
| G² (Log-Likelihood) | Robust for rare words; best general-purpose keyness test; widely used | Large N inflates significance — always pair with an effect size such as phi |
| χ² (Chi-Square) | Widely known; same distributional logic as G² | Expected cell frequencies < 5 (use G² instead) |
| Phi (φ) | Scale-free effect size; comparable across words and studies; not N-inflated | Used alone — does not test statistical significance |
| MI (Mutual Information) | Highlights highly specific, narrowly targeted words | No frequency filter applied — strongly favours hapax legomena |
| PMI | Interpretable in information-theoretic terms | No frequency filter applied — also favours rare words |
| Log Odds Ratio | Symmetric; amenable to CIs; recommended effect size for keyness | Zero cells exist without Haldane correction (+0.5 offset needed) |
| Rate Ratio | Intuitive; easy to communicate to non-specialist audiences | Base rates in the two corpora differ greatly |
| Rate Difference | Shows absolute magnitude of frequency difference | Comparing across words with very different base frequencies |
| Difference Coefficient | Bounded [−1, +1]; accounts for base rate differences | Both rates are near zero (arithmetic instability) |
| Odds Ratio | Familiar from epidemiology; simple ratio | Asymmetric on raw scale — log transformation preferred |
| DeltaP (ΔP) | Bounded [−1, +1]; grounded in conditional probability | Less commonly reported; reviewers may be unfamiliar with it |
| Signed DKL | Information-theoretic; sensitive to distributional divergence | Implementation details vary across software — document formula used |


Reporting Checklist

| Reporting item | Required |
|---|---|
| Target corpus described (source, size in tokens, composition) | Yes |
| Reference corpus described and choice justified relative to research question | Yes |
| All preprocessing steps reported (tokenisation, case, stopwords, lemmatisation) | Yes |
| Keyness measure named and a methodological reference cited | Yes |
| Significance test specified (Fisher's Exact Test or chi-square p-value) | Yes |
| Multiple testing correction applied and reported (Bonferroni or FDR) | Yes |
| Minimum frequency threshold stated (if applied before ranking) | Recommended |
| Both types and antitypes considered and discussed where relevant | Recommended |
| Effect size reported alongside G² (phi, Log Odds Ratio, or Rate Ratio) | Yes |
| Full keyword table provided or referenced as supplementary material | Yes |
| Keywords interpreted substantively in relation to the research question | Yes |


Citation & Session Info

Schweinberger, Martin. 2026. Keyness and Keyword Analysis in R. Brisbane: The University of Queensland. url: https://ladal.edu.au/tutorials/key/key.html (Version 2026.02.24).

@manual{schweinberger2026key,
  author       = {Schweinberger, Martin},
  title        = {Keyness and Keyword Analysis in R},
  note         = {tutorials/key/key.html},
  year         = {2026},
  organization = {The University of Queensland, School of Languages and Cultures},
  address      = {Brisbane},
  edition      = {2026.02.24}
}
Code
sessionInfo()
R version 4.4.2 (2024-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26100)

Matrix products: default


locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

time zone: Australia/Brisbane
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
 [1] ggplot2_3.5.1             sna_2.8                  
 [3] network_1.19.0            statnet.common_4.11.0    
 [5] tm_0.7-16                 NLP_0.3-2                
 [7] stringr_1.5.1             dplyr_1.1.4              
 [9] quanteda.textplots_0.95   quanteda.textstats_0.97.2
[11] quanteda_4.2.0            Matrix_1.7-2             
[13] flextable_0.9.7          

loaded via a namespace (and not attached):
 [1] fastmatch_1.1-6         gtable_0.3.6            xfun_0.51              
 [4] htmlwidgets_1.6.4       lattice_0.22-6          vctrs_0.6.5            
 [7] tools_4.4.2             generics_0.1.3          parallel_4.4.2         
[10] klippy_0.0.0.9500       tibble_3.2.1            pkgconfig_2.0.3        
[13] data.table_1.17.0       assertthat_0.2.1        uuid_1.2-1             
[16] lifecycle_1.0.4         farver_2.1.2            compiler_4.4.2         
[19] textshaping_1.0.0       munsell_0.5.1           codetools_0.2-20       
[22] fontquiver_0.2.1        fontLiberation_0.1.0    htmltools_0.5.8.1      
[25] yaml_2.3.10             tidyr_1.3.1             pillar_1.10.1          
[28] openssl_2.3.2           fontBitstreamVera_0.1.1 stopwords_2.3          
[31] tidyselect_1.2.1        zip_2.3.2               digest_0.6.37          
[34] stringi_1.8.4           slam_0.1-55             purrr_1.0.4            
[37] labeling_0.4.3          fastmap_1.2.0           grid_4.4.2             
[40] colorspace_2.1-1        cli_3.6.4               magrittr_2.0.3         
[43] withr_3.0.2             gdtools_0.4.1           scales_1.3.0           
[46] rmarkdown_2.29          officer_0.6.7           askpass_1.2.1          
[49] ragg_1.3.3              coda_0.19-4.1           evaluate_1.0.3         
[52] knitr_1.49              rlang_1.1.5             Rcpp_1.0.14            
[55] nsyllable_1.0.1         glue_1.8.0              xml2_1.3.6             
[58] renv_1.1.1              rstudioapi_0.17.1       jsonlite_1.9.0         
[61] R6_2.6.1                systemfonts_1.2.1      



References

Dunning, Ted. 1993. “Accurate Methods for the Statistics of Surprise and Coincidence.” Computational Linguistics 19 (1): 61–74.
Egbert, Jesse, and Douglas Biber. 2019. “Incorporating Text Dispersion into Keyword Analyses.” Corpora 14 (1): 77–104.
Gries, Stefan Th. 2013. Statistics for Linguistics with R: A Practical Introduction. 2nd ed. Berlin: De Gruyter Mouton.
Hunston, Susan. 2002. Corpora in Applied Linguistics. Cambridge: Cambridge University Press.
Scott, Mike. 1997. “PC Analysis of Key Words — and Key Key Words.” System 25 (2): 233–45.
Sønning, Lukas. 2023. “Keyword Analysis in Corpus Linguistics: Rethinking the Foundations.” Corpora 18 (2): 1–31.
Stubbs, Michael. 2010. “Three Concepts of Keywords.” In Keyness in Texts, edited by Marina Bondi and Mike Scott, 1–42. Amsterdam: John Benjamins.

AI Transparency Statement

This tutorial was developed with the assistance of Claude (claude.ai), a large language model created by Anthropic. Claude was used to help revise, expand, and improve the tutorial text; add and structure new sections (learning objectives, prerequisite tutorials, section overview callout boxes, detailed explanations of each keyness measure, the types/antitypes callout, reporting guidelines, and quick-reference tables); write the checkdown multiple-choice exercises with detailed right/wrong feedback; and refine the ggplot2 visualisations. All original code and analysis logic from the draft tutorial have been preserved and integrated. All content was reviewed, edited, and approved by the author (Martin Schweinberger), who takes full responsibility for the accuracy, completeness, and pedagogical appropriateness of the material. The use of AI assistance is disclosed here in the interest of transparency and in accordance with emerging best practices for AI-assisted academic content creation.

Footnotes

  1. I am extremely grateful to Joseph Flanagan, who provided very helpful feedback and pointed out errors in previous versions of this tutorial. All remaining errors are, of course, my own.↩︎