Courses

Structured learning pathways for computational and quantitative methods in language research

LADAL Courses are curated sequences of tutorials, readings, and practical exercises for learners progressing from foundational knowledge to advanced skills. All courses are free, open, and built around reproducible R workflows. Whether you are a complete beginner or an experienced analyst, there is a pathway here for you.

By following a LADAL course, you will develop practical skills in R — data management, visualisation, statistics, and text analytics — and learn to apply them to real research questions in linguistics, the humanities, and the social sciences.


All Courses at a Glance

Short Course — 6–10 tutorials, self-paced, one focused topic
Long Course — 12-week semester programme with weekly lectures, tutorials, and readings
Introduction to Language Technology
👥 Linguists and humanities students · 📚 6 tutorials · Level: Beginner

A conceptual and practical first introduction to language technology — from text processing and regex to OCR and NLP overview.

Introduction to Corpus Linguistics
👥 Linguistics students and language teachers · 📚 7 tutorials · Level: Beginner

Core corpus methods — concordancing, collocations, keyness, and frequency analysis — using R and reproducible workflows.

Introduction to Text Analysis
👥 Humanities and social science students · 📚 7 tutorials · Level: Beginner

From text processing basics to topic modelling, sentiment analysis, and network analysis of text collections.

Data Visualisation for Linguists
👥 Linguists and language researchers · 📚 6 tutorials · Level: Introductory

Publication-quality visualisation with ggplot2 — histograms, scatter plots, maps, Likert charts, and more.

Introduction to Statistics
👥 Humanities and social science researchers · 📚 7 tutorials · Level: Beginner

Statistical literacy from the ground up — descriptive statistics, hypothesis testing, t-tests, chi-square, and simple regression.

Introduction to Learner Corpus Research
👥 Applied linguists and SLA researchers · 📚 7 tutorials · Level: Introductory

Learner corpus methods — frequency comparison, collocations, lexical diversity, readability, and error analysis with ICLE and LOCNESS.

Natural Language Processing with R
👥 Computational linguists and data scientists · 📚 7 tutorials · Level: Intermediate

NLP pipeline in R — preprocessing, TF-IDF, classification, NER, dependency parsing, and introduction to word embeddings.

Introduction to Digital Humanities with R
👥 Humanities researchers and students · 📅 12 weeks · No background required · Level: Foundational

Full semester course: DH methods from data literacy and text processing through corpus analysis, topic modelling, networks, and mapping.

Introduction to Corpus Linguistics and Text Analysis with R
👥 Linguistics and applied linguistics students · 📅 12 weeks · No background required · Level: Foundational

Corpus construction through concordancing, collocations, keywords, topic modelling, sentiment analysis, and network analysis.

Introduction to Statistics in the Humanities and Social Sciences
👥 Students and researchers, all disciplines · 📅 12 weeks · No background required · Level: Foundational

From probability and descriptive statistics through regression and mixed-effects modelling, using R throughout.

Advanced Statistics in the Humanities and Social Sciences
👥 Researchers with prior statistics knowledge · 📅 12 weeks · Basic stats + R required · Level: Advanced

Multivariate modelling, classification trees, random forests, clustering, correspondence analysis, and survey data analysis.

Short Courses

Self-Paced Short Courses

6–10 tutorials per course. Work through them in order at your own pace — no enrolment needed. Ideal for researchers building a specific skill quickly, or instructors embedding a focused module in a larger course.

Introduction to Language Technology

A first introduction to language technology — what it is, what it can do, and how to get started

6 tutorials · No background required · Free
👥 Audience: Anyone curious about how computers process and analyse language
🎯 Aim: Conceptual and practical first introduction — from text processing and regex to OCR and NLP overview

Language technology encompasses the computational tools and methods used to analyse, generate, and interact with human language. This short course introduces learners to the landscape of language technology with hands-on practice in R. By the end, learners will understand the key methods and be equipped to explore more specialised pathways.

A conceptual map of language technology and its applications in linguistics and the humanities
Practical experience loading, cleaning, and exploring text data in R
Familiarity with regular expressions as a foundation for all text-analytic work
Hands-on experience with OCR for converting PDFs and scanned documents to text
An understanding of how corpus tools and NLP pipelines are constructed
1. Introduction to Text Analysis

What text analysis is, how it relates to corpus linguistics and NLP, and key concepts: corpus, token, type, frequency, and concordance.

2. Getting Started with R

Installing packages, loading data, working with vectors and data frames, and writing simple functions in R and RStudio.

3. Loading and Saving Data

Importing text from plain text files, CSV, Excel, and web URLs — and saving results for later use.

4. String Processing

Pattern matching, substitution, splitting, and the core string operations (using stringr) that underpin all text analysis.

5. Regular Expressions

Character classes, quantifiers, anchors, and look-arounds with worked linguistic examples — the pattern language for searching and transforming text.

6. Converting PDFs to Text

Extracting machine-readable text with pdftools (digital PDFs) and tesseract (scanned documents), including post-OCR spell-checking.
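As a taste of the pattern language covered in the Regular Expressions tutorial, here is a minimal base-R sketch (the words are invented for illustration, not course data) using anchors, character classes, capture groups, and quantifiers:

```r
# Illustrative regex patterns in base R (not taken from the course materials)
words <- c("walked", "walking", "walks", "talked", "jump")

# Anchor ($) plus a literal suffix: find past-tense "-ed" forms
past <- grep("ed$", words, value = TRUE)   # "walked" "talked"

# Capture group plus alternation: strip common verbal suffixes
stems <- sub("(ed|ing|s)$", "", words)     # "walk" "walk" "walk" "talk" "jump"

# Character class plus quantifier: words of five or more letters
long_words <- grep("^[a-z]{5,}$", words, value = TRUE)
```

The same patterns work unchanged inside stringr's `str_detect()` and `str_replace()`, which the String Processing tutorial introduces.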

Introduction to Corpus Linguistics

Core corpus methods — concordancing, collocations, keyness — using R and reproducible workflows

7 tutorials · No background required · Free
👥 Audience: Linguistics students; language teachers; researchers new to corpus methods
🎯 Aim: Introduce concordancing, collocations, and keyness with hands-on R practice

Corpus linguistics uses large, principled collections of authentic text to investigate patterns of language use. This short course takes learners from a conceptual introduction through hands-on practice with the most widely used corpus methods, culminating in a case-study showcase integrating all techniques into a full corpus-based analysis.

What a corpus is and how corpus-based research differs from introspective approaches
Practical skills in frequency analysis, concordancing, collocation, and keyword extraction using R
Ability to design, conduct, and report a reproducible corpus-based study
Familiarity with key R packages: quanteda, tidytext, and related tools
1. Introduction to Text Analysis

Key concepts — corpus, concordance, collocation, keyword, frequency — used throughout the course.

2. Getting Started with R

First introduction to R and RStudio. Focus on the first four sections (up to Working with Tables).

3. String Processing

Essential string manipulation: pattern matching, substitution, tokenisation preparation, and whitespace management.

4. Concordancing (Keywords-in-Context)

KWIC concordance search and display in R — sorting, filtering, and interpreting concordance output.

5. Collocation and N-gram Analysis

Statistically significant collocations and n-gram sequences — PMI, log-likelihood, t-score, and visualisation.

6. Keyness and Keyword Analysis

Comparing two corpora to identify words that are statistically more or less frequent — the foundation of contrastive corpus analysis.

7. Corpus Linguistics with R

Capstone showcase: complete case studies integrating concordancing, frequency analysis, collocations, and keyness.
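To give a flavour of the concordancing methods in Tutorial 4, here is a minimal keyword-in-context function sketched in base R (dedicated packages such as quanteda provide far more capable `kwic()` implementations — this is only a toy version):

```r
# Minimal KWIC sketch: show each hit with a few words of left/right context
kwic_simple <- function(text, keyword, window = 3) {
  tokens <- unlist(strsplit(tolower(text), "[^a-z']+"))
  tokens <- tokens[tokens != ""]
  hits <- which(tokens == tolower(keyword))
  vapply(hits, function(i) {
    left  <- if (i > 1) tokens[max(1, i - window):(i - 1)] else character(0)
    right <- if (i < length(tokens)) tokens[(i + 1):min(length(tokens), i + window)] else character(0)
    sprintf("%s [%s] %s", paste(left, collapse = " "), tokens[i], paste(right, collapse = " "))
  }, character(1))
}

kwic_simple("The cat sat on the mat while the dog slept", "the")
```

Sorting and filtering such lines by their left or right context is where the real analytic work of concordancing begins.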

Introduction to Text Analysis

From text processing basics to topic modelling, sentiment analysis, and network analysis

7 tutorials · No background required · Free
👥 Audience: Humanities and social science students; researchers wanting computational approaches to text
🎯 Aim: Build practical R text analysis skills from cleaning and processing through to advanced methods

Text analysis uses computational methods to extract patterns, topics, sentiment, and relational structure from large collections of text. This course builds from foundational R skills through to topic modelling, sentiment analysis, and network analysis. By the end, learners will be able to apply a range of text-analytic methods to their own research texts.

An understanding of the major families of computational text analysis and their research applications
Practical R skills for cleaning, processing, and analysing text data
Hands-on experience with topic modelling, sentiment analysis, and network analysis
Ability to select the most appropriate method for a given research question
1. Introduction to Text Analysis

Overview of the field, key concepts, and situating text analysis within computational humanities research.

2. Getting Started with R

First introduction to R and RStudio. Focus on the first four sections.

3. String Processing

Core string manipulation skills for preparing raw text for analysis.

4. Practical Overview of Text Analytics Methods

Frequency analysis, TF-IDF, and basic classification workflows using R.

5. Topic Modelling

Latent Dirichlet Allocation (LDA) for discovering thematic structure in document collections — theory and R implementation.

6. Sentiment Analysis

Lexicon-based and machine-learning approaches to opinion and emotion extraction, including dictionary methods and valence shifting.

7. Network Analysis

Representing relational structure in textual and social data — node and edge construction, centrality measures, and visualisation.
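As a flavour of the lexicon-based approach in the Sentiment Analysis tutorial, here is a toy base-R scorer — the word lists are made up for illustration; real analyses use curated lexicons such as AFINN or the NRC emotion lexicon:

```r
# Toy lexicon-based sentiment: positive matches minus negative matches
positive <- c("good", "great", "excellent", "happy")
negative <- c("bad", "poor", "terrible", "sad")

score_sentiment <- function(text) {
  tokens <- unlist(strsplit(tolower(text), "[^a-z']+"))
  sum(tokens %in% positive) - sum(tokens %in% negative)
}

score_sentiment("a great film with a happy ending")   # 2
score_sentiment("a sad story, poorly told")           # -1
```

Note what the toy version misses — "poorly" scores zero because only exact matches count; handling negation, morphology, and valence shifters is exactly what the tutorial covers.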

Data Visualisation for Linguists

Publication-quality visualisation with ggplot2 — from frequency plots to maps

6 tutorials · Basic R helpful · Free
👥 Audience: Linguists and language researchers who want to communicate findings more effectively
🎯 Aim: Principled, publication-quality data visualisation with ggplot2 and linguistic data

Effective visualisation is one of the most transferable skills in quantitative research. This course builds from visualisation principles through the mechanics of ggplot2, covering the graph types most commonly needed in linguistics: frequency distributions, scatter plots, heat maps, geographic maps, and interactive visualisations. Special attention is given to colour accessibility, annotations, and formatting for publication.

A principled understanding of what makes a graph effective or misleading
Practical ggplot2 skills: geoms, scales, facets, themes, and annotations
Publication-quality static and interactive visualisations from linguistic data
Confidence choosing the right graph type for the right data and research question
1. Getting Started with R

Introduction to R with a focus on data structures and workflow needed for visualisation.

2. Introduction to Data Visualisation

Visualisation philosophy, perceptual principles, grammar of graphics, and when to use which chart type.

3. Descriptive Statistics

Summary statistics — means, medians, distributions, variance — that underpin most visualisations of linguistic data.

4. Data Visualisation with R

In-depth ggplot2: histograms, density plots, box plots, bar charts, scatter plots, and line graphs with worked linguistic examples.

5. Visualising and Analysing Survey Data

Cumulative density plots, diverging stacked bar charts, and Likert scale visualisation for questionnaire data.

6. Maps and Spatial Visualisation

Dialect maps, distribution maps, and choropleth maps of linguistic data using ggplot2 and sf.
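A small sketch of the kind of ggplot2 graph the course builds towards — the frequency figures are invented for illustration, and the snippet assumes the ggplot2 package is installed:

```r
library(ggplot2)

# Invented per-million-word frequencies for two modal verbs across genres
freqs <- data.frame(
  genre = rep(c("fiction", "news", "academic"), each = 2),
  word  = rep(c("shall", "will"), times = 3),
  pmw   = c(12, 310, 4, 520, 25, 280)
)

# Grouped bar chart with a clean theme, ready for further styling
p <- ggplot(freqs, aes(x = genre, y = pmw, fill = word)) +
  geom_col(position = "dodge") +
  labs(x = NULL, y = "Frequency per million words") +
  theme_minimal()
```

The grammar-of-graphics layering shown here — data, aesthetics, geoms, labels, theme — is the backbone of every plot type the course covers.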

Introduction to Statistics in the Humanities and Social Sciences

Statistical literacy and practical quantitative skills from the ground up, using R throughout

7 tutorials · No background required · Free
👥 Audience: Humanities and social science students and researchers with little or no prior statistics knowledge
🎯 Aim: Build statistical literacy from first principles through inferential testing in R

This course provides a conceptual and practical introduction to statistics for researchers whose background is in the humanities or social sciences. It begins with the philosophical foundations of quantitative reasoning and builds through descriptive statistics, visualisation, and inferential testing. By the end, learners will be able to conduct and interpret basic statistical analyses and communicate their results clearly.

Solid conceptual understanding of statistical thinking, probability, and hypothesis testing
Practical R skills for summarising, tabulating, visualising, and testing data
Ability to select, apply, and interpret t-tests, chi-square, correlation, and simple regression
Confidence reading and critically evaluating quantitative results in published research
1. Introduction to Quantitative Reasoning

Scientific thinking, the logic of hypothesis testing, and the role of quantitative methods in humanities and social science research.

2. Basic Concepts in Quantitative Research

Variables, data types, sampling, populations, reliability, and validity.

3. Getting Started with R

Introduction to R and RStudio. Focus on the first four sections.

4. Handling Tables in R

Importing, cleaning, reshaping, and summarising data frames using dplyr and tidyr.

5. Descriptive Statistics

Means, medians, standard deviations, distributions, and frequency tables in R.

6. Introduction to Data Visualisation

Visualisation principles and hands-on practice creating and customising graphs in R.

7. Basic Inferential Statistics

Hypothesis testing, p-values, confidence intervals, t-tests, chi-square, correlation, and simple linear regression with R exercises.
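The flavour of Tutorial 7 in a few lines of base R: a two-sample t-test on simulated data (the numbers are invented; `t.test()` is base R's built-in implementation):

```r
# Simulated reaction times (ms) for two conditions -- invented data
set.seed(123)
condition_a <- rnorm(30, mean = 550, sd = 40)
condition_b <- rnorm(30, mean = 590, sd = 40)

# Two-sample t-test: do the group means differ?
result <- t.test(condition_a, condition_b)
result$p.value    # probability of a difference this large under the null hypothesis
result$conf.int   # 95% confidence interval for the mean difference
```

Interpreting the p-value alongside the confidence interval and an effect size — rather than the p-value alone — is a recurring theme of the course.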

Introduction to Learner Corpus Research

Corpus methods for studying learner language — from frequency comparison to error analysis

7 tutorials · Basic corpus linguistics helpful · Free
👥 Audience: Applied linguists; SLA researchers; language teachers and test developers
🎯 Aim: Introduce LCR methods from corpus construction through to lexical diversity, readability, and error analysis

Learner corpus research uses collections of authentic language produced by second-language learners to investigate the structure, development, and distinctiveness of interlanguage. This course covers the major analytical methods — concordancing, frequency comparison, collocation, POS tagging, lexical diversity, and error analysis — using the ICLE and LOCNESS corpora as running examples.

What learner corpora are and how they differ from native-speaker corpora
Skills for comparing learner and native-speaker language quantitatively using R
Experience with lexical diversity measures, readability scores, and spelling error detection
Ability to design and interpret a basic learner corpus study in the context of SLA theory
1. Introduction to Text Analysis

Key concepts — corpus, frequency, concordance, collocation — underpinning learner corpus research.

2. Getting Started with R

Data structures and workflow for corpus analysis.

3. String Processing

Cleaning, normalising, splitting, and extracting character patterns from raw learner corpus texts.

4. Concordancing (Keywords-in-Context)

Extracting and inspecting KWIC concordances from learner texts to investigate how learners use specific words or constructions.

5. Collocation and N-gram Analysis

Comparing collocational patterns between learner and native-speaker corpora for studying collocational competence and L1 transfer.

6. Analysing Learner Language with R

Frequency comparison, POS tagging, lexical diversity, readability scores, and spelling error detection with ICLE and LOCNESS examples.

7. Keyness and Keyword Analysis

Words systematically over- or under-used by learners relative to native-speaker norms — one of the most informative methods in LCR.
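A taste of the lexical diversity measures in Tutorial 6: the simplest one, the type-token ratio, fits in a few lines of base R (more robust measures such as MATTR or MTLD correct for text length, since raw TTR falls as texts grow longer):

```r
# Type-token ratio: unique word forms (types) divided by running words (tokens)
ttr <- function(text) {
  tokens <- unlist(strsplit(tolower(text), "[^a-z']+"))
  tokens <- tokens[tokens != ""]
  length(unique(tokens)) / length(tokens)
}

ttr("the cat and the dog and the bird")   # 5 types / 8 tokens = 0.625
```

Comparing such scores between learner and native-speaker essays of comparable length is a standard first step in quantifying vocabulary richness.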

Natural Language Processing with R

Text preprocessing, feature extraction, classification, NER, and transformer-based representations

7 tutorials · Intermediate R required · Free
👥 Audience: Computational linguists; data scientists working with language data
🎯 Prerequisite: Intermediate R skills; basic familiarity with descriptive statistics and simple regression

NLP builds on corpus and statistical methods to develop computational pipelines for understanding and generating language at scale. This course introduces the NLP workflow in R using real linguistic datasets, progressing from text preprocessing and feature engineering to supervised classification, topic models, and an introduction to working with large language model embeddings and APIs.

Clear understanding of the NLP pipeline from raw text to structured, analysable representations
Practical preprocessing skills: tokenisation, stopword removal, stemming, and lemmatisation
Experience building document-feature matrices and applying TF-IDF weighting
Hands-on practice with text classification, NER, and dependency parsing
Introduction to word embeddings and transformer-based representations
1. Introduction to Text Analysis

Situating NLP within corpus linguistics and computational linguistics.

2. String Processing

Foundation string manipulation — essential for all preprocessing steps in NLP pipelines.

3. Regular Expressions

Regex as the primary pattern-matching tool in text preprocessing and feature extraction.

4. Practical Overview of Text Analytics Methods

Document-feature matrices, TF-IDF, and basic classification workflows in R.

5. Topic Modelling

Probabilistic topic models as an unsupervised NLP method for discovering thematic structure.

6. Analysing Learner Language with R

POS tagging with udpipe, sequence analysis, and lexical diversity measures — key NLP tasks applied to real corpus data.

7. Network Analysis

Representing relational structure in language data — semantic networks, co-occurrence graphs, and social networks of linguistic interaction.
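To illustrate the TF-IDF weighting introduced in Tutorial 4, a compact base-R sketch on three toy "documents" (pre-tokenised for brevity; packages such as quanteda or tidytext do this at scale):

```r
# Three toy documents, already tokenised
docs <- list(
  d1 = c("language", "corpus", "analysis"),
  d2 = c("language", "statistics"),
  d3 = c("language", "corpus")
)

vocab <- sort(unique(unlist(docs)))

# Term frequency: proportion of each document made up of each term
tf <- sapply(docs, function(d) table(factor(d, levels = vocab)) / length(d))

# Inverse document frequency: penalise terms that occur in every document
df_count <- rowSums(tf > 0)
idf <- log(length(docs) / df_count)

tfidf <- tf * idf
tfidf["language", ]   # 0 in every column: "language" occurs in all documents
```

The zero weight for "language" is the point of the method: a term that appears everywhere cannot help distinguish one document from another.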


Long Courses

Semester-Length Long Courses

Structured as 12-week programmes with weekly lectures, LADAL tutorials, and recommended readings. Designed to scaffold a full university course — or for motivated independent learners who want a thorough grounding in a field.

Introduction to Digital Humanities with R

Computational methods for humanistic inquiry — from data literacy through corpus analysis, networks, and mapping

12 weeks · No background required · Free
👥 Audience: Literature, history, cultural studies, linguistics, media studies students and researchers
🕐 Structure: 1h lecture + 1.5h tutorial per week
🎯 Aim: Design, conduct, and communicate a reproducible computational analysis of a humanities dataset

Digital humanities applies computational methods to humanistic inquiry: analysing large literary corpora, mapping cultural data geographically, tracing discourse patterns across historical archives, or modelling networks of social interaction. This 12-week course introduces students to the core DH toolkit through R, with weekly tutorials grounded in real humanities datasets. No prior programming experience is assumed.

Week 1: What Is Digital Humanities?

Overview of digital humanities — history, debates, and current landscape; relationship to corpus linguistics, text analysis, and data science; what counts as DH research.

  • Burdick et al. (2012). Digital humanities. MIT Press, Ch. 1
  • Drucker (2021). The digital humanities coursebook. Routledge, Ch. 1
Week 2: Reproducible Research and Data Management

Why reproducibility matters in DH; introduction to R and RStudio; file organisation, project workflows, and version control basics.

  • Flanagan, J. (2025). Reproducibility, replicability, robustness, and generalizability in corpus linguistics. International Journal of Corpus Linguistics. doi:10.1075/ijcl.24113.fla
Week 3: Getting Started with R

R syntax, data types, vectors, and data frames; the tidyverse ecosystem; reading and writing data.

  • Wickham & Grolemund (2016). R for data science. Ch. 1–3. r4ds.had.co.nz
Week 4: Working with Text Data

How text is represented computationally; encoding, tokenisation, and the document-feature matrix; from raw text to structured data.

  • Jockers, M. L. (2014). Text analysis with R for students of literature. Springer, Ch. 1–3
Week 5: Building and Exploring Digital Corpora

What is a corpus? Sampling principles, metadata, corpus design for humanities research; downloading and preparing digital texts.

  • Biber, Conrad & Reppen (1998). Corpus linguistics. Cambridge University Press, Ch. 1–2
Week 6: Frequency Analysis and Visualisation

Zipf's law and frequency distributions; word counts, type-token ratios, and dispersion; principles of effective visualisation for humanities data.

  • Jockers (2014), Ch. 4–5
Week 7: Concordancing, Collocations, and Keywords

Searching corpora; KWIC concordances and their interpretation; collocation and association measures; keyness and corpus comparison.

  • Baker, P. (2006). Using corpora in discourse analysis. Continuum, Ch. 3–4
Week 8: Topic Modelling and Thematic Analysis

Latent Dirichlet Allocation (LDA); interpreting topics; applications in literary and historical research; limitations and critical perspectives.

  • Blei, D. M. (2012). Probabilistic topic models. Communications of the ACM, 55(4), 77–84
  • Maier et al. (2021). Applying LDA topic modeling in communication research. In Computational methods for communication science (pp. 13–38). Routledge
Week 9: Sentiment Analysis and Opinion Mining

Lexicon-based and machine learning approaches to sentiment; subjectivity, valence, and emotion; applications in literary and media studies.

  • Liu, B. (2012). Sentiment analysis and opinion mining. Ch. 1–2
Week 10: Network Analysis for Humanities Research

Graphs and networks as representations of humanistic data; character networks in fiction; citation and social networks; centrality and community detection.

  • Moretti, F. (2011). Network theory, plot analysis. New Left Review, 68, 80–102
Week 11: Maps, Space, and Geographic Visualisation

Spatial thinking in digital humanities; mapping literary geography, dialect distribution, and historical events; choropleth maps and point maps in R.

  • Drucker (2021), Ch. 8
Week 12: Project Workshop and Critical Reflections

Critical DH — bias in corpora and algorithms, data ethics, representation, and positionality; communicating DH research; the future of digital humanities.

Student project presentations and peer feedback.

📚 Core Reading List
  • Baker, P. (2006). Using corpora in discourse analysis. Continuum.
  • Biber, D., Conrad, S., & Reppen, R. (1998). Corpus linguistics. Cambridge University Press.
  • Burdick, A., et al. (2012). Digital humanities. MIT Press.
  • Drucker, J. (2021). The digital humanities coursebook. Routledge.
  • Flanagan, J. (2025). Reproducibility in corpus linguistics. International Journal of Corpus Linguistics. doi:10.1075/ijcl.24113.fla
  • Jockers, M. L. (2014). Text analysis with R for students of literature. Springer.
  • Liu, B. (2012). Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies, 5(1), 1–167.
  • Wickham, H., & Grolemund, G. (2016). R for data science. O'Reilly. r4ds.had.co.nz

Introduction to Corpus Linguistics and Text Analysis with R

Corpus construction through concordancing, keywords, topic modelling, and network analysis

12 weeks · No background required · Free
👥 Audience: Students in linguistics, applied linguistics, translation, communication, and literary studies
🕐 Structure: 1h lecture + 1.5h tutorial per week
🎯 Aim: Introduce corpus-based methods and hands-on text analysis in R
Week 1: Introduction to Corpus Linguistics and Text Analytics

What is corpus linguistics? Key concepts, history, and applications; corpus vs. introspective and experimental methods; overview of the course.

  • McEnery & Hardie (2012). Corpus linguistics: Method, theory and practice. CUP, Ch. 1–2
Week 2: Working with Digital Data and Reproducibility

Principles of reproducible research; introduction to R Notebooks; file management and workflow.

Week 3: Getting Started with R

R and RStudio; installing packages; basic syntax; workflow setup.

  • Wickham & Grolemund (2016), Ch. 1–3
Week 4: Corpus Compilation and Preparation

Types of corpora; sampling principles and representativeness; metadata and annotation; legal and ethical issues in corpus construction.

  • Biber, Conrad & Reppen (1998), Ch. 1–2
Week 5: Frequency and Dispersion

Counting words and n-grams; Zipf's law; normalised frequencies; dispersion measures and why they matter; type-token ratio.

  • McEnery & Hardie (2012), Ch. 3
  • Gries (2024). Frequency, dispersion, association, and keyness. Ch. 1–2
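The normalised frequencies discussed in Week 5 reduce to one line of arithmetic; a base-R sketch with invented counts:

```r
# Invented raw counts of a word in two corpora of different sizes
raw_freq    <- c(corpusA = 150, corpusB = 420)
corpus_size <- c(corpusA = 1.2e6, corpusB = 4.8e6)   # sizes in tokens

# Per-million-word normalisation makes the counts comparable
pmw <- raw_freq / corpus_size * 1e6
pmw   # corpusA 125.0, corpusB 87.5
```

Despite the lower raw count, corpus A uses the word more often relative to its size — the kind of reversal that makes normalisation (and, beyond it, dispersion measures) essential.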
Week 6: Concordancing and KWIC

Searching corpora; concordance displays and their interpretation; sorting and filtering; from examples to patterns.

  • Baker (2006), Ch. 3
Week 7: Collocations and N-grams

Association measures (MI, t-score, log-likelihood, Dice); phraseology and formulaic sequences; n-gram extraction and analysis.

  • Gries (2024), Ch. 2
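Week 7's association measures can be previewed with a hand-calculated mutual information (MI) score from invented corpus counts:

```r
# Invented counts for one candidate collocation
N    <- 1e6    # corpus size in tokens
f_xy <- 120    # bigram frequency (word1 followed by word2)
f_x  <- 3000   # frequency of word1
f_y  <- 5000   # frequency of word2

# MI: log2 of the observed joint probability over the expected-if-independent
mi <- log2((f_xy / N) / ((f_x / N) * (f_y / N)))
mi   # 3: the pair co-occurs 2^3 = 8 times more often than chance would predict
```

Log-likelihood, t-score, and Dice answer the same question with different sensitivities to frequency, which is why the week compares several measures rather than settling on one.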
Week 8: Keywords and Keyness

Reference corpora and keyness; log-likelihood and log ratio as keyness measures; interpretation and applications in discourse analysis.

  • Gries (2024), Ch. 3
Week 9: Advanced Text Analytics I — Topic Modelling

Unsupervised text classification; LDA and its assumptions; interpreting and validating topic models; applications in linguistics and discourse analysis.

  • Maier et al. (2021). Applying LDA topic modeling in communication research (pp. 13–38)
Week 10: Advanced Text Analytics II — Sentiment and Network Analysis

Sentiment lexicons; opinion mining; co-occurrence networks and semantic networks from corpus data.

  • Liu (2012), Ch. 1–2
Week 11: Case Studies in Corpus Linguistics

Corpus-based studies of grammar, lexis, and discourse; from method to interpretation; writing up corpus research.

  • Baker (2006), Ch. 7
Week 12: Project Workshop and Presentations

Ethics in corpus research; future directions; communicating corpus findings to non-specialist audiences.

Student project work.

📚 Core Reading List
  • Baker, P. (2006). Using corpora in discourse analysis. Continuum.
  • Biber, D., Conrad, S., & Reppen, R. (1998). Corpus linguistics. Cambridge University Press.
  • Flanagan, J. (2025). Reproducibility in corpus linguistics. International Journal of Corpus Linguistics. doi:10.1075/ijcl.24113.fla
  • Gries, S. T. (2024). Frequency, dispersion, association, and keyness (Studies in Corpus Linguistics, Vol. 115). John Benjamins.
  • Liu, B. (2012). Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies, 5(1), 1–167.
  • McEnery, T., & Hardie, A. (2012). Corpus linguistics: Method, theory and practice. Cambridge University Press.
  • Wickham, H., & Grolemund, G. (2016). R for data science. r4ds.had.co.nz

Introduction to Statistics in the Humanities and Social Sciences

Probability and descriptive statistics through regression and mixed-effects modelling, using R throughout

12 weeks · No background required · Free
👥 Audience: Students and researchers in linguistics, psychology, education, sociology, and related fields
🕐 Structure: 1h lecture + 1.5h tutorial per week
🎯 Aim: Practical and conceptual foundation in quantitative methods, no prior knowledge assumed
Week 1: Introduction to Quantitative Research

The role of quantitative methods in humanities and social sciences; an overview of statistical thinking; the research cycle; types of research questions.

  • Field, Miles & Field (2012). Discovering statistics using R. Ch. 1
  • Baayen (2008). Analyzing linguistic data. Ch. 1
Week 2: Basic Concepts in Quantitative Research

Data types and measurement scales; variables, operationalisation, and construct validity; sampling and representativeness; reliability and validity.

  • Gries (2013). Statistics for linguists. Ch. 1–2
Week 3: Getting Started with R — Part 1

Introduction to R and RStudio; installing and loading packages; basic syntax and data structures; the tidyverse ecosystem.

  • Wickham & Grolemund (2016), Ch. 1–3
Week 4: Loading and Handling Data

Importing datasets from CSV, Excel, and text files; data cleaning and transformation; working with factors and missing values.

  • Baayen (2008), Ch. 2
Week 5: R Basics for Statistical Analysis

Vectors, factors, data frames, indexing, and subsetting; writing functions; applying operations across groups with dplyr.

  • Field, Miles & Field (2012), Ch. 2–3
Week 6: Descriptive Statistics

Measures of central tendency and dispersion; frequency distributions; skewness and kurtosis; the normal distribution; summarising grouped data.

  • Baayen (2008), Ch. 3
  • Winter (2019). Statistics for linguists. Ch. 2
Week 7: Visualising Data

Principles of effective visualisation; histograms, box plots, scatter plots, and bar charts; ggplot2 grammar of graphics.

  • Wickham & Grolemund (2016), Ch. 14
Week 8: Hypothesis Testing and Power Analysis

The logic of null hypothesis significance testing; t-tests, ANOVA; p-values and their interpretation; effect sizes; statistical power and sample size planning.

  • Field, Miles & Field (2012), Ch. 4
  • Gries (2013), Ch. 3
Week 9: Correlation and Simple Regression

Pearson and Spearman correlation; simple linear regression; interpreting intercepts and slopes; assumptions and diagnostics.

  • Baayen (2008), Ch. 4
Week 10: Multiple Regression and Model Diagnostics

Multiple regression; multicollinearity; residual analysis; model comparison with AIC/BIC; stepwise and theory-driven model building.

  • Winter (2019), Ch. 5
Week 11: Logistic Regression

Binary and ordinal outcomes; logistic regression model fitting and interpretation; odds ratios and predicted probabilities; the proportional odds model.

  • Baayen (2008), Ch. 5
  • Winter (2019), Ch. 6
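Week 11's logistic regression fits naturally with base R's `glm()`; a sketch on simulated data (all values invented for illustration):

```r
# Simulated binary outcome: was a vowel phonetically reduced?
set.seed(1)
n <- 200
frequency <- rnorm(n)                                      # standardised word frequency
reduced   <- rbinom(n, 1, plogis(-0.5 + 1.2 * frequency))  # more frequent -> more reduction

# Fit the model and express the slope as an odds ratio
m <- glm(reduced ~ frequency, family = binomial)
exp(coef(m)["frequency"])   # change in the odds of reduction per unit of frequency
```

Translating log-odds coefficients into odds ratios and predicted probabilities — rather than reporting the raw coefficients — is the interpretive skill this week practises.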
Week 12: Mixed-Effects Models

Why mixed effects? Random intercepts and random slopes; by-participant and by-item random effects; fitting and interpreting mixed models with lme4.

  • Gries (2013), Ch. 6
  • Field, Miles & Field (2012), Ch. 12
📚 Core Reading List
  • Baayen, R. H. (2008). Analyzing linguistic data. Cambridge University Press.
  • Field, A., Miles, J., & Field, Z. (2012). Discovering statistics using R. Sage.
  • Gries, S. T. (2013). Statistics for linguists. De Gruyter Mouton.
  • Wickham, H., & Grolemund, G. (2016). R for data science. O'Reilly. r4ds.had.co.nz
  • Winter, B. (2019). Statistics for linguists: An introduction using R. Routledge.

Advanced Statistics in the Humanities and Social Sciences

Multivariate modelling, classification, clustering, and survey data analysis using R

12 weeks · Basic stats + R required · Free
👥 Audience: Students and researchers with prior knowledge of basic statistics
🕐 Structure: 1h lecture + 1.5h tutorial per week
🎯 Prerequisite: Familiarity with t-tests, regression, and intermediate R skills
Week 1: Advanced Data Management and Reproducible Workflows

Organising complex datasets; reproducibility in advanced research; scripting and automating analysis pipelines; version control with Git.

  • Flanagan (2025), Ch. 1
Week 2: Review of Descriptive and Inferential Statistics

Quick review of key concepts: distributions, t-tests, correlations, confidence intervals, effect sizes, and power.

  • Field, Miles & Field (2012), Ch. 1–4
Week 3: Advanced Regression — Multiple and Hierarchical Models

Multiple regression; interaction terms; hierarchical (nested) models; mixed-effects models with random intercepts and slopes.

  • Baayen (2008), Ch. 4–5
  • Winter (2019), Ch. 5–6
Week 4: Logistic Regression and Generalised Linear Models

Binary and multinomial outcomes; model fitting and interpretation; goodness-of-fit; GLMs as a unified framework.

  • Winter (2019), Ch. 6
Week 5: Classification — Decision Trees

Decision trees; recursive partitioning; overfitting and pruning; interpreting tree outputs; applications in linguistic classification problems.

  • Gries (2013), Ch. 6
Week 6: Classification — Random Forests and Ensemble Methods

Ensemble learning; bagging and boosting; random forests; variable importance; improving prediction accuracy and generalisability.

  • James, Witten, Hastie & Tibshirani (2021). An introduction to statistical learning. Ch. 8
Week 7: Clustering and Correspondence Analysis

Unsupervised classification; k-means and hierarchical clustering; choosing the number of clusters; correspondence analysis for categorical data.

  • Gries (2013), Ch. 7
Week 8: Survey and Questionnaire Data Analysis I

Preparing survey data; dealing with missing values; Likert scales and their properties; descriptive analysis and visualisation of survey items.

  • Field, Miles & Field (2012), Ch. 10
  • Baayen (2008), Ch. 6
Week 9: Survey and Questionnaire Data Analysis II

Reliability (Cronbach's α, McDonald's ω); factor analysis and scale validation; cross-tabulations and chi-square; ordinal regression for Likert outcomes.

  • Field, Miles & Field (2012), Ch. 11
Week 10: Dimension Reduction and Multivariate Techniques

Principal Component Analysis (PCA); multidimensional scaling (MDS); detecting latent variables; applications to linguistic and social science data.

  • Gries (2013), Ch. 8
Week 11: Model Evaluation, Diagnostics, and Advanced Visualisation

Residual analysis and outlier detection; model comparison and selection criteria (AIC, BIC, cross-validation); visualisation for multivariate data.

  • Winter (2019), Ch. 7
Week 12: Applications and Student Mini-Projects

Integrating advanced methods into humanities and social science research; ethical considerations; communicating complex statistical results; reproducibility revisited.

Student project work applying classification, clustering, and survey analysis to real datasets.

  • Baayen (2008), Ch. 7
  • Field, Miles & Field (2012), Ch. 12
📚 Core Reading List
  • Baayen, R. H. (2008). Analyzing linguistic data. Cambridge University Press.
  • Field, A., Miles, J., & Field, Z. (2012). Discovering statistics using R. Sage.
  • Flanagan, J. (2025). Reproducibility in corpus linguistics. International Journal of Corpus Linguistics. doi:10.1075/ijcl.24113.fla
  • Gries, S. T. (2013). Statistics for linguists. De Gruyter Mouton.
  • James, G., Witten, D., Hastie, T., & Tibshirani, R. (2021). An introduction to statistical learning. Springer.
  • Winter, B. (2019). Statistics for linguists: An introduction using R. Routledge.

Ready to start learning?

All LADAL courses are free, self-paced, and built around reproducible R workflows. No enrolment required — just dive in.

