Courses

Structured learning pathways for computational and quantitative methods in language research

LADAL Courses are curated sequences of tutorials, readings, and practical exercises for learners progressing from foundational knowledge to advanced skills. All courses are free, open, and built around reproducible R workflows. Whether you are a complete beginner or an experienced analyst, there is a pathway here for you.

By following a LADAL course, you will develop practical skills in R — data management, visualisation, statistics, and text analytics — and learn to apply them to real research questions in linguistics, the humanities, and the social sciences.


All Courses at a Glance

Short Course — 6–10 tutorials, self-paced, one focused topic
Long Course — 12-week semester programme with weekly lectures, tutorials, and readings
Introduction to Language Technology
👥 Linguists and humanities students · 📚 6 tutorials · Level: Beginner

A conceptual and practical first introduction to language technology — from text processing and regex to OCR and NLP overview.

Introduction to Corpus Linguistics
👥 Linguistics students and language teachers · 📚 7 tutorials · Level: Beginner

Core corpus methods — concordancing, collocations, keyness, and frequency analysis — using R and reproducible workflows.

Introduction to Text Analysis
👥 Humanities and social science students · 📚 7 tutorials · Level: Beginner

From text processing basics to topic modelling, sentiment analysis, and network analysis of text collections.

Data Visualisation for Linguists
👥 Linguists and language researchers · 📚 6 tutorials · Level: Introductory

Publication-quality visualisation with ggplot2 — histograms, scatter plots, maps, Likert charts, and more.

Introduction to Statistics
👥 Humanities and social science researchers · 📚 7 tutorials · Level: Beginner

Statistical literacy from the ground up — descriptive statistics, hypothesis testing, t-tests, chi-square, and simple regression.

Introduction to Learner Corpus Research
👥 Applied linguists and SLA researchers · 📚 7 tutorials · Level: Introductory

Learner corpus methods — frequency comparison, collocations, lexical diversity, readability, and error analysis with ICLE and LOCNESS.

Natural Language Processing with R
👥 Computational linguists and data scientists · 📚 7 tutorials · Level: Intermediate

NLP pipeline in R — preprocessing, TF-IDF, classification, NER, dependency parsing, and introduction to word embeddings.

Introduction to Digital Humanities with R
👥 Humanities researchers and students · 📅 12 weeks · No background required · Level: Foundational

Full semester course: DH methods from data literacy and text processing through corpus analysis, topic modelling, networks, and mapping.

Introduction to Corpus Linguistics and Text Analysis with R
👥 Linguistics and applied linguistics students · 📅 12 weeks · No background required · Level: Foundational

Corpus construction through concordancing, collocations, keywords, topic modelling, sentiment analysis, and network analysis.

Introduction to Statistics in the Humanities and Social Sciences
👥 Students and researchers, all disciplines · 📅 12 weeks · No background required · Level: Foundational

From probability and descriptive statistics through regression and mixed-effects modelling, using R throughout.

Advanced Statistics in the Humanities and Social Sciences
👥 Researchers with prior statistics knowledge · 📅 12 weeks · Basic stats + R required · Level: Advanced

Multivariate modelling, classification trees, random forests, clustering, correspondence analysis, and survey data analysis.

Short Courses

Self-Paced Short Courses

6–10 tutorials per course. Work through them in order at your own pace — no enrolment needed. Ideal for researchers building a specific skill quickly, or instructors embedding a focused module in a larger course.

Introduction to Language Technology

A first introduction to language technology — what it is, what it can do, and how to get started

6 tutorials · No background required · Free
👥 Audience: Anyone curious about how computers process and analyse language
🎯 Aim: Conceptual and practical first introduction — from text processing and regex to OCR and NLP overview

Language technology encompasses the computational tools and methods used to analyse, generate, and interact with human language. This short course introduces learners to the landscape of language technology with hands-on practice in R. By the end, learners will understand the key methods and be equipped to explore more specialised pathways.

A conceptual map of language technology and its applications in linguistics and the humanities
Practical experience loading, cleaning, and exploring text data in R
Familiarity with regular expressions as a foundation for all text-analytic work
Hands-on experience with OCR for converting PDFs and scanned documents to text
An understanding of how corpus tools and NLP pipelines are constructed
1. Introduction to Text Analysis

What text analysis is, how it relates to corpus linguistics and NLP, and key concepts: corpus, token, type, frequency, and concordance.

2. Getting Started with R

Installing packages, loading data, working with vectors and data frames, and writing simple functions in R and RStudio.

3. Loading and Saving Data

Importing text from plain text files, CSV, Excel, and web URLs — and saving results for later use.

4. String Processing

Pattern matching, substitution, splitting, and the core string operations (using stringr) that underpin all text analysis.

5. Regular Expressions

Character classes, quantifiers, anchors, and look-arounds with worked linguistic examples — the pattern language for searching and transforming text.

6. Converting PDFs to Text

Extracting machine-readable text with pdftools (digital PDFs) and tesseract (scanned documents), including post-OCR spell-checking.
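As a taste of the pattern language covered in the Regular Expressions tutorial, here is a minimal base-R sketch (the words are invented for illustration, not course data) using anchors, character classes, capture groups, and quantifiers:

```r
# Illustrative regex patterns in base R (not taken from the course materials)
words <- c("walked", "walking", "walks", "talked", "jump")

# Anchor ($) plus a literal suffix: find past-tense "-ed" forms
past <- grep("ed$", words, value = TRUE)   # "walked" "talked"

# Capture group plus alternation: strip common verbal suffixes
stems <- sub("(ed|ing|s)$", "", words)     # "walk" "walk" "walk" "talk" "jump"

# Character class plus quantifier: words of five or more letters
long_words <- grep("^[a-z]{5,}$", words, value = TRUE)
```

The same patterns work unchanged inside stringr's `str_detect()` and `str_replace()`, which the String Processing tutorial introduces.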

Introduction to Corpus Linguistics

Core corpus methods — concordancing, collocations, keyness — using R and reproducible workflows

7 tutorials · No background required · Free
👥 Audience: Linguistics students; language teachers; researchers new to corpus methods
🎯 Aim: Introduce concordancing, collocations, and keyness with hands-on R practice

Corpus linguistics uses large, principled collections of authentic text to investigate patterns of language use. This short course takes learners from a conceptual introduction through hands-on practice with the most widely used corpus methods, culminating in a case-study showcase integrating all techniques into a full corpus-based analysis.

What a corpus is and how corpus-based research differs from introspective approaches
Practical skills in frequency analysis, concordancing, collocation, and keyword extraction using R
Ability to design, conduct, and report a reproducible corpus-based study
Familiarity with key R packages: quanteda, tidytext, and related tools
1. Introduction to Text Analysis

Key concepts — corpus, concordance, collocation, keyword, frequency — used throughout the course.

2. Getting Started with R

First introduction to R and RStudio. Focus on the first four sections (up to Working with Tables).

3. String Processing

Essential string manipulation: pattern matching, substitution, tokenisation preparation, and whitespace management.

4. Concordancing (Keywords-in-Context)

KWIC concordance search and display in R — sorting, filtering, and interpreting concordance output.

5. Collocation and N-gram Analysis

Statistically significant collocations and n-gram sequences — PMI, log-likelihood, t-score, and visualisation.

6. Keyness and Keyword Analysis

Comparing two corpora to identify words that are statistically more or less frequent — the foundation of contrastive corpus analysis.

7. Corpus Linguistics with R

Capstone showcase: complete case studies integrating concordancing, frequency analysis, collocations, and keyness.
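To give a flavour of the concordancing methods in Tutorial 4, here is a minimal keyword-in-context function sketched in base R (dedicated packages such as quanteda provide far more capable `kwic()` implementations — this is only a toy version):

```r
# Minimal KWIC sketch: show each hit with a few words of left/right context
kwic_simple <- function(text, keyword, window = 3) {
  tokens <- unlist(strsplit(tolower(text), "[^a-z']+"))
  tokens <- tokens[tokens != ""]
  hits <- which(tokens == tolower(keyword))
  vapply(hits, function(i) {
    left  <- if (i > 1) tokens[max(1, i - window):(i - 1)] else character(0)
    right <- if (i < length(tokens)) tokens[(i + 1):min(length(tokens), i + window)] else character(0)
    sprintf("%s [%s] %s", paste(left, collapse = " "), tokens[i], paste(right, collapse = " "))
  }, character(1))
}

kwic_simple("The cat sat on the mat while the dog slept", "the")
```

Sorting and filtering such lines by their left or right context is where the real analytic work of concordancing begins.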

Introduction to Text Analysis

From text processing basics to topic modelling, sentiment analysis, and network analysis

7 tutorials · No background required · Free
👥 Audience: Humanities and social science students; researchers wanting computational approaches to text
🎯 Aim: Build practical R text analysis skills from cleaning and processing through to advanced methods

Text analysis uses computational methods to extract patterns, topics, sentiment, and relational structure from large collections of text. This course builds from foundational R skills through to topic modelling, sentiment analysis, and network analysis. By the end, learners will be able to apply a range of text-analytic methods to their own research texts.

An understanding of the major families of computational text analysis and their research applications
Practical R skills for cleaning, processing, and analysing text data
Hands-on experience with topic modelling, sentiment analysis, and network analysis
Ability to select the most appropriate method for a given research question
1. Introduction to Text Analysis

Overview of the field, key concepts, and situating text analysis within computational humanities research.

2. Getting Started with R

First introduction to R and RStudio. Focus on the first four sections.

3. String Processing

Core string manipulation skills for preparing raw text for analysis.

4. Practical Overview of Text Analytics Methods

Frequency analysis, TF-IDF, and basic classification workflows using R.

5. Topic Modelling

Latent Dirichlet Allocation (LDA) for discovering thematic structure in document collections — theory and R implementation.

6. Sentiment Analysis

Lexicon-based and machine-learning approaches to opinion and emotion extraction, including dictionary methods and valence shifting.

7. Network Analysis

Representing relational structure in textual and social data — node and edge construction, centrality measures, and visualisation.
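As a flavour of the lexicon-based approach in the Sentiment Analysis tutorial, here is a toy base-R scorer — the word lists are made up for illustration; real analyses use curated lexicons such as AFINN or the NRC emotion lexicon:

```r
# Toy lexicon-based sentiment: positive matches minus negative matches
positive <- c("good", "great", "excellent", "happy")
negative <- c("bad", "poor", "terrible", "sad")

score_sentiment <- function(text) {
  tokens <- unlist(strsplit(tolower(text), "[^a-z']+"))
  sum(tokens %in% positive) - sum(tokens %in% negative)
}

score_sentiment("a great film with a happy ending")   # 2
score_sentiment("a sad story, poorly told")           # -1
```

Note what the toy version misses — "poorly" scores zero because only exact matches count; handling negation, morphology, and valence shifters is exactly what the tutorial covers.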

Data Visualisation for Linguists

Publication-quality visualisation with ggplot2 — from frequency plots to maps

6 tutorials · Basic R helpful · Free
👥 Audience: Linguists and language researchers who want to communicate findings more effectively
🎯 Aim: Principled, publication-quality data visualisation with ggplot2 and linguistic data

Effective visualisation is one of the most transferable skills in quantitative research. This course builds from visualisation principles through the mechanics of ggplot2, covering the graph types most commonly needed in linguistics: frequency distributions, scatter plots, heat maps, geographic maps, and interactive visualisations. Special attention is given to colour accessibility, annotations, and formatting for publication.

A principled understanding of what makes a graph effective or misleading
Practical ggplot2 skills: geoms, scales, facets, themes, and annotations
Publication-quality static and interactive visualisations from linguistic data
Confidence choosing the right graph type for the right data and research question
1. Getting Started with R

Introduction to R with a focus on data structures and workflow needed for visualisation.

2. Introduction to Data Visualisation

Visualisation philosophy, perceptual principles, grammar of graphics, and when to use which chart type.

3. Descriptive Statistics

Summary statistics — means, medians, distributions, variance — that underpin most visualisations of linguistic data.

4. Data Visualisation with R

In-depth ggplot2: histograms, density plots, box plots, bar charts, scatter plots, and line graphs with worked linguistic examples.

5. Visualising and Analysing Survey Data

Cumulative density plots, diverging stacked bar charts, and Likert scale visualisation for questionnaire data.

6. Maps and Spatial Visualisation

Dialect maps, distribution maps, and choropleth maps of linguistic data using ggplot2 and sf.
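A small sketch of the kind of ggplot2 graph the course builds towards — the frequency figures are invented for illustration, and the snippet assumes the ggplot2 package is installed:

```r
library(ggplot2)

# Invented per-million-word frequencies for two modal verbs across genres
freqs <- data.frame(
  genre = rep(c("fiction", "news", "academic"), each = 2),
  word  = rep(c("shall", "will"), times = 3),
  pmw   = c(12, 310, 4, 520, 25, 280)
)

# Grouped bar chart with a clean theme, ready for further styling
p <- ggplot(freqs, aes(x = genre, y = pmw, fill = word)) +
  geom_col(position = "dodge") +
  labs(x = NULL, y = "Frequency per million words") +
  theme_minimal()
```

The grammar-of-graphics layering shown here — data, aesthetics, geoms, labels, theme — is the backbone of every plot type the course covers.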

Introduction to Statistics in the Humanities and Social Sciences

Statistical literacy and practical quantitative skills from the ground up, using R throughout

7 tutorials · No background required · Free
👥 Audience: Humanities and social science students and researchers with little or no prior statistics knowledge
🎯 Aim: Build statistical literacy from first principles through inferential testing in R

This course provides a conceptual and practical introduction to statistics for researchers whose background is in the humanities or social sciences. It begins with the philosophical foundations of quantitative reasoning and builds through descriptive statistics, visualisation, and inferential testing. By the end, learners will be able to conduct and interpret basic statistical analyses and communicate their results clearly.

Solid conceptual understanding of statistical thinking, probability, and hypothesis testing
Practical R skills for summarising, tabulating, visualising, and testing data
Ability to select, apply, and interpret t-tests, chi-square, correlation, and simple regression
Confidence reading and critically evaluating quantitative results in published research
1. Introduction to Quantitative Reasoning

Scientific thinking, the logic of hypothesis testing, and the role of quantitative methods in humanities and social science research.

2. Basic Concepts in Quantitative Research

Variables, data types, sampling, populations, reliability, and validity.

3. Getting Started with R

Introduction to R and RStudio. Focus on the first four sections.

4. Handling Tables in R

Importing, cleaning, reshaping, and summarising data frames using dplyr and tidyr.

5. Descriptive Statistics

Means, medians, standard deviations, distributions, and frequency tables in R.

6. Introduction to Data Visualisation

Visualisation principles and hands-on practice creating and customising graphs in R.

7. Basic Inferential Statistics

Hypothesis testing, p-values, confidence intervals, t-tests, chi-square, correlation, and simple linear regression with R exercises.
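The flavour of Tutorial 7 in a few lines of base R: a two-sample t-test on simulated data (the numbers are invented; `t.test()` is base R's built-in implementation):

```r
# Simulated reaction times (ms) for two conditions -- invented data
set.seed(123)
condition_a <- rnorm(30, mean = 550, sd = 40)
condition_b <- rnorm(30, mean = 590, sd = 40)

# Two-sample t-test: do the group means differ?
result <- t.test(condition_a, condition_b)
result$p.value    # probability of a difference this large under the null hypothesis
result$conf.int   # 95% confidence interval for the mean difference
```

Interpreting the p-value alongside the confidence interval and an effect size — rather than the p-value alone — is a recurring theme of the course.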

Introduction to Learner Corpus Research

Corpus methods for studying learner language — from frequency comparison to error analysis

7 tutorials · Basic corpus linguistics helpful · Free
👥 Audience: Applied linguists; SLA researchers; language teachers and test developers
🎯 Aim: Introduce LCR methods from corpus construction through to lexical diversity, readability, and error analysis

Learner corpus research uses collections of authentic language produced by second-language learners to investigate the structure, development, and distinctiveness of interlanguage. This course covers the major analytical methods — concordancing, frequency comparison, collocation, POS tagging, lexical diversity, and error analysis — using the ICLE and LOCNESS corpora as running examples.

What learner corpora are and how they differ from native-speaker corpora
Skills for comparing learner and native-speaker language quantitatively using R
Experience with lexical diversity measures, readability scores, and spelling error detection
Ability to design and interpret a basic learner corpus study in the context of SLA theory
1. Introduction to Text Analysis

Key concepts — corpus, frequency, concordance, collocation — underpinning learner corpus research.

2. Getting Started with R

Data structures and workflow for corpus analysis.

3. String Processing

Cleaning, normalising, splitting, and extracting character patterns from raw learner corpus texts.

4. Concordancing (Keywords-in-Context)

Extracting and inspecting KWIC concordances from learner texts to investigate how learners use specific words or constructions.

5. Collocation and N-gram Analysis

Comparing collocational patterns between learner and native-speaker corpora for studying collocational competence and L1 transfer.

6. Analysing Learner Language with R

Frequency comparison, POS tagging, lexical diversity, readability scores, and spelling error detection with ICLE and LOCNESS examples.

7. Keyness and Keyword Analysis

Words systematically over- or under-used by learners relative to native-speaker norms — one of the most informative methods in LCR.
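A taste of the lexical diversity measures in Tutorial 6: the simplest one, the type-token ratio, fits in a few lines of base R (more robust measures such as MATTR or MTLD correct for text length, since raw TTR falls as texts grow longer):

```r
# Type-token ratio: unique word forms (types) divided by running words (tokens)
ttr <- function(text) {
  tokens <- unlist(strsplit(tolower(text), "[^a-z']+"))
  tokens <- tokens[tokens != ""]
  length(unique(tokens)) / length(tokens)
}

ttr("the cat and the dog and the bird")   # 5 types / 8 tokens = 0.625
```

Comparing such scores between learner and native-speaker essays of comparable length is a standard first step in quantifying vocabulary richness.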

Natural Language Processing with R

Text preprocessing, feature extraction, classification, NER, and transformer-based representations

7 tutorials · Intermediate R required · Free
👥 Audience: Computational linguists; data scientists working with language data
🎯 Prerequisite: Intermediate R skills; basic familiarity with descriptive statistics and simple regression

NLP builds on corpus and statistical methods to develop computational pipelines for understanding and generating language at scale. This course introduces the NLP workflow in R using real linguistic datasets, progressing from text preprocessing and feature engineering to supervised classification, topic models, and an introduction to working with large language model embeddings and APIs.

Clear understanding of the NLP pipeline from raw text to structured, analysable representations
Practical preprocessing skills: tokenisation, stopword removal, stemming, and lemmatisation
Experience building document-feature matrices and applying TF-IDF weighting
Hands-on practice with text classification, NER, and dependency parsing
Introduction to word embeddings and transformer-based representations
1. Introduction to Text Analysis

Situating NLP within corpus linguistics and computational linguistics.

2. String Processing

Foundation string manipulation — essential for all preprocessing steps in NLP pipelines.

3. Regular Expressions

Regex as the primary pattern-matching tool in text preprocessing and feature extraction.

4. Practical Overview of Text Analytics Methods

Document-feature matrices, TF-IDF, and basic classification workflows in R.

5. Topic Modelling

Probabilistic topic models as an unsupervised NLP method for discovering thematic structure.

6. Analysing Learner Language with R

POS tagging with udpipe, sequence analysis, and lexical diversity measures — key NLP tasks applied to real corpus data.

7. Network Analysis

Representing relational structure in language data — semantic networks, co-occurrence graphs, and social networks of linguistic interaction.
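To illustrate the TF-IDF weighting introduced in Tutorial 4, a compact base-R sketch on three toy "documents" (pre-tokenised for brevity; packages such as quanteda or tidytext do this at scale):

```r
# Three toy documents, already tokenised
docs <- list(
  d1 = c("language", "corpus", "analysis"),
  d2 = c("language", "statistics"),
  d3 = c("language", "corpus")
)

vocab <- sort(unique(unlist(docs)))

# Term frequency: proportion of each document made up of each term
tf <- sapply(docs, function(d) table(factor(d, levels = vocab)) / length(d))

# Inverse document frequency: penalise terms that occur in every document
df_count <- rowSums(tf > 0)
idf <- log(length(docs) / df_count)

tfidf <- tf * idf
tfidf["language", ]   # 0 in every column: "language" occurs in all documents
```

The zero weight for "language" is the point of the method: a term that appears everywhere cannot help distinguish one document from another.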


Long Courses

Semester-Length Long Courses

Structured as 12-week programmes with weekly lectures, LADAL tutorials, and recommended readings. Designed to scaffold a full university course — or for motivated independent learners who want a thorough grounding in a field.

Introduction to Digital Humanities with R

Computational methods for humanistic inquiry — from data literacy through corpus analysis, networks, and mapping

12 weeks · No background required · Free
👥 Audience: Literature, history, cultural studies, linguistics, media studies students and researchers
🕐 Structure: 1h lecture + 1.5h tutorial per week
🎯 Aim: Design, conduct, and communicate a reproducible computational analysis of a humanities dataset

Digital humanities applies computational methods to humanistic inquiry: analysing large literary corpora, mapping cultural data geographically, tracing discourse patterns across historical archives, or modelling networks of social interaction. This 12-week course introduces students to the core DH toolkit through R, with weekly tutorials grounded in real humanities datasets. No prior programming experience is assumed.

Week 1: What Is Digital Humanities?

Overview of digital humanities — history, debates, and current landscape; relationship to corpus linguistics, text analysis, and data science; what counts as DH research.

  • Burdick et al. (2012). Digital humanities. MIT Press, Ch. 1
  • Drucker (2021). The digital humanities coursebook. Routledge, Ch. 1
Week 2: Reproducible Research and Data Management

Why reproducibility matters in DH; introduction to R and RStudio; file organisation, project workflows, and version control basics.

  • Flanagan, J. (2025). Reproducibility, replicability, robustness, and generalizability in corpus linguistics. International Journal of Corpus Linguistics. doi:10.1075/ijcl.24113.fla
Week 3: Getting Started with R

R syntax, data types, vectors, and data frames; the tidyverse ecosystem; reading and writing data.

  • Wickham & Grolemund (2016). R for data science. Ch. 1–3. r4ds.had.co.nz
Week 4: Working with Text Data

How text is represented computationally; encoding, tokenisation, and the document-feature matrix; from raw text to structured data.

  • Jockers, M. L. (2014). Text analysis with R for students of literature. Springer, Ch. 1–3
Week 5: Building and Exploring Digital Corpora

What is a corpus? Sampling principles, metadata, corpus design for humanities research; downloading and preparing digital texts.

  • Biber, Conrad & Reppen (1998). Corpus linguistics. Cambridge University Press, Ch. 1–2
Week 6: Frequency Analysis and Visualisation

Zipf's law and frequency distributions; word counts, type-token ratios, and dispersion; principles of effective visualisation for humanities data.

  • Jockers (2014), Ch. 4–5
Week 7: Concordancing, Collocations, and Keywords

Searching corpora; KWIC concordances and their interpretation; collocation and association measures; keyness and corpus comparison.

  • Baker, P. (2006). Using corpora in discourse analysis. Continuum, Ch. 3–4
Week 8: Topic Modelling and Thematic Analysis

Latent Dirichlet Allocation (LDA); interpreting topics; applications in literary and historical research; limitations and critical perspectives.

  • Blei, D. M. (2012). Probabilistic topic models. Communications of the ACM, 55(4), 77–84
  • Maier et al. (2021). Applying LDA topic modeling in communication research. In Computational methods for communication science (pp. 13–38). Routledge
Week 9: Sentiment Analysis and Opinion Mining

Lexicon-based and machine learning approaches to sentiment; subjectivity, valence, and emotion; applications in literary and media studies.

  • Liu, B. (2012). Sentiment analysis and opinion mining. Ch. 1–2
Week 10: Network Analysis for Humanities Research

Graphs and networks as representations of humanistic data; character networks in fiction; citation and social networks; centrality and community detection.

  • Moretti, F. (2011). Network theory, plot analysis. New Left Review, 68, 80–102
Week 11: Maps, Space, and Geographic Visualisation

Spatial thinking in digital humanities; mapping literary geography, dialect distribution, and historical events; choropleth maps and point maps in R.

  • Drucker (2021), Ch. 8
Week 12: Project Workshop and Critical Reflections

Critical DH — bias in corpora and algorithms, data ethics, representation, and positionality; communicating DH research; the future of digital humanities.

Student project presentations and peer feedback.

📚 Core Reading List
  • Baker, P. (2006). Using corpora in discourse analysis. Continuum.
  • Biber, D., Conrad, S., & Reppen, R. (1998). Corpus linguistics. Cambridge University Press.
  • Burdick, A., et al. (2012). Digital humanities. MIT Press.
  • Drucker, J. (2021). The digital humanities coursebook. Routledge.
  • Flanagan, J. (2025). Reproducibility in corpus linguistics. International Journal of Corpus Linguistics. doi:10.1075/ijcl.24113.fla
  • Jockers, M. L. (2014). Text analysis with R for students of literature. Springer.
  • Liu, B. (2012). Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies, 5(1), 1–167.
  • Wickham, H., & Grolemund, G. (2016). R for data science. O'Reilly. r4ds.had.co.nz

Introduction to Corpus Linguistics and Text Analysis with R

Corpus construction through concordancing, keywords, topic modelling, and network analysis

12 weeks · No background required · Free
👥 Audience: Students in linguistics, applied linguistics, translation, communication, and literary studies
🕐 Structure: 1h lecture + 1.5h tutorial per week
🎯 Aim: Introduce corpus-based methods and hands-on text analysis in R
Week 1: Introduction to Corpus Linguistics and Text Analytics

What is corpus linguistics? Key concepts, history, and applications; corpus vs. introspective and experimental methods; overview of the course.

  • McEnery & Hardie (2012). Corpus linguistics: Method, theory and practice. CUP, Ch. 1–2
Week 2: Working with Digital Data and Reproducibility

Principles of reproducible research; introduction to R Notebooks; file management and workflow.

Week 3: Getting Started with R

R and RStudio; installing packages; basic syntax; workflow setup.

  • Wickham & Grolemund (2016), Ch. 1–3
Week 4: Corpus Compilation and Preparation

Types of corpora; sampling principles and representativeness; metadata and annotation; legal and ethical issues in corpus construction.

  • Biber, Conrad & Reppen (1998), Ch. 1–2
Week 5: Frequency and Dispersion

Counting words and n-grams; Zipf's law; normalised frequencies; dispersion measures and why they matter; type-token ratio.

  • McEnery & Hardie (2012), Ch. 3
  • Gries (2024). Frequency, dispersion, association, and keyness. Ch. 1–2
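The normalised frequencies discussed in Week 5 reduce to one line of arithmetic; a base-R sketch with invented counts:

```r
# Invented raw counts of a word in two corpora of different sizes
raw_freq    <- c(corpusA = 150, corpusB = 420)
corpus_size <- c(corpusA = 1.2e6, corpusB = 4.8e6)   # sizes in tokens

# Per-million-word normalisation makes the counts comparable
pmw <- raw_freq / corpus_size * 1e6
pmw   # corpusA 125.0, corpusB 87.5
```

Despite the lower raw count, corpus A uses the word more often relative to its size — the kind of reversal that makes normalisation (and, beyond it, dispersion measures) essential.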
Week 6: Concordancing and KWIC

Searching corpora; concordance displays and their interpretation; sorting and filtering; from examples to patterns.

  • Baker (2006), Ch. 3
Week 7: Collocations and N-grams

Association measures (MI, t-score, log-likelihood, Dice); phraseology and formulaic sequences; n-gram extraction and analysis.

  • Gries (2024), Ch. 2
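Week 7's association measures can be previewed with a hand-calculated mutual information (MI) score from invented corpus counts:

```r
# Invented counts for one candidate collocation
N    <- 1e6    # corpus size in tokens
f_xy <- 120    # bigram frequency (word1 followed by word2)
f_x  <- 3000   # frequency of word1
f_y  <- 5000   # frequency of word2

# MI: log2 of the observed joint probability over the expected-if-independent
mi <- log2((f_xy / N) / ((f_x / N) * (f_y / N)))
mi   # 3: the pair co-occurs 2^3 = 8 times more often than chance would predict
```

Log-likelihood, t-score, and Dice answer the same question with different sensitivities to frequency, which is why the week compares several measures rather than settling on one.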
Week 8: Keywords and Keyness

Reference corpora and keyness; log-likelihood and log ratio as keyness measures; interpretation and applications in discourse analysis.

  • Gries (2024), Ch. 3
Week 9: Advanced Text Analytics I — Topic Modelling

Unsupervised text classification; LDA and its assumptions; interpreting and validating topic models; applications in linguistics and discourse analysis.

  • Maier et al. (2021). Applying LDA topic modeling in communication research (pp. 13–38)
Week 10: Advanced Text Analytics II — Sentiment and Network Analysis

Sentiment lexicons; opinion mining; co-occurrence networks and semantic networks from corpus data.

  • Liu (2012), Ch. 1–2
Week 11: Case Studies in Corpus Linguistics

Corpus-based studies of grammar, lexis, and discourse; from method to interpretation; writing up corpus research.

  • Baker (2006), Ch. 7
Week 12: Project Workshop and Presentations

Ethics in corpus research; future directions; communicating corpus findings to non-specialist audiences.

Student project work.

📚 Core Reading List
  • Baker, P. (2006). Using corpora in discourse analysis. Continuum.
  • Biber, D., Conrad, S., & Reppen, R. (1998). Corpus linguistics. Cambridge University Press.
  • Flanagan, J. (2025). Reproducibility in corpus linguistics. International Journal of Corpus Linguistics. doi:10.1075/ijcl.24113.fla
  • Gries, S. T. (2024). Frequency, dispersion, association, and keyness (Studies in Corpus Linguistics, Vol. 115). John Benjamins.
  • Liu, B. (2012). Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies, 5(1), 1–167.
  • McEnery, T., & Hardie, A. (2012). Corpus linguistics: Method, theory and practice. Cambridge University Press.
  • Wickham, H., & Grolemund, G. (2016). R for data science. r4ds.had.co.nz

Introduction to Statistics in the Humanities and Social Sciences

Probability and descriptive statistics through regression and mixed-effects modelling, using R throughout

12 weeks · No background required · Free
👥 Audience: Students and researchers in linguistics, psychology, education, sociology, and related fields
🕐 Structure: 1h lecture + 1.5h tutorial per week
🎯 Aim: Practical and conceptual foundation in quantitative methods, no prior knowledge assumed
Week 1: Introduction to Quantitative Research

The role of quantitative methods in humanities and social sciences; an overview of statistical thinking; the research cycle; types of research questions.

  • Field, Miles & Field (2012). Discovering statistics using R. Ch. 1
  • Baayen (2008). Analyzing linguistic data. Ch. 1
Week 2: Basic Concepts in Quantitative Research

Data types and measurement scales; variables, operationalisation, and construct validity; sampling and representativeness; reliability and validity.

  • Gries (2013). Statistics for linguists. Ch. 1–2
Week 3: Getting Started with R — Part 1

Introduction to R and RStudio; installing and loading packages; basic syntax and data structures; the tidyverse ecosystem.

  • Wickham & Grolemund (2016), Ch. 1–3
Week 4: Loading and Handling Data

Importing datasets from CSV, Excel, and text files; data cleaning and transformation; working with factors and missing values.

  • Baayen (2008), Ch. 2
Week 5: R Basics for Statistical Analysis

Vectors, factors, data frames, indexing, and subsetting; writing functions; applying operations across groups with dplyr.

  • Field, Miles & Field (2012), Ch. 2–3
Week 6: Descriptive Statistics

Measures of central tendency and dispersion; frequency distributions; skewness and kurtosis; the normal distribution; summarising grouped data.

  • Baayen (2008), Ch. 3
  • Winter (2019). Statistics for linguists. Ch. 2
Week 7: Visualising Data

Principles of effective visualisation; histograms, box plots, scatter plots, and bar charts; ggplot2 grammar of graphics.

  • Wickham & Grolemund (2016), Ch. 14
Week 8: Hypothesis Testing and Power Analysis

The logic of null hypothesis significance testing; t-tests, ANOVA; p-values and their interpretation; effect sizes; statistical power and sample size planning.

  • Field, Miles & Field (2012), Ch. 4
  • Gries (2013), Ch. 3
Week 9: Correlation and Simple Regression

Pearson and Spearman correlation; simple linear regression; interpreting intercepts and slopes; assumptions and diagnostics.

  • Baayen (2008), Ch. 4
Week 10: Multiple Regression and Model Diagnostics

Multiple regression; multicollinearity; residual analysis; model comparison with AIC/BIC; stepwise and theory-driven model building.

  • Winter (2019), Ch. 5
Week 11: Logistic Regression

Binary and ordinal outcomes; logistic regression model fitting and interpretation; odds ratios and predicted probabilities; the proportional odds model.

  • Baayen (2008), Ch. 5
  • Winter (2019), Ch. 6
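Week 11's logistic regression fits naturally with base R's `glm()`; a sketch on simulated data (all values invented for illustration):

```r
# Simulated binary outcome: was a vowel phonetically reduced?
set.seed(1)
n <- 200
frequency <- rnorm(n)                                      # standardised word frequency
reduced   <- rbinom(n, 1, plogis(-0.5 + 1.2 * frequency))  # more frequent -> more reduction

# Fit the model and express the slope as an odds ratio
m <- glm(reduced ~ frequency, family = binomial)
exp(coef(m)["frequency"])   # change in the odds of reduction per unit of frequency
```

Translating log-odds coefficients into odds ratios and predicted probabilities — rather than reporting the raw coefficients — is the interpretive skill this week practises.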
Week 12: Mixed-Effects Models

Why mixed effects? Random intercepts and random slopes; by-participant and by-item random effects; fitting and interpreting mixed models with lme4.

  • Gries (2013), Ch. 6
  • Field, Miles & Field (2012), Ch. 12
📚 Core Reading List
  • Baayen, R. H. (2008). Analyzing linguistic data. Cambridge University Press.
  • Field, A., Miles, J., & Field, Z. (2012). Discovering statistics using R. Sage.
  • Gries, S. T. (2013). Statistics for linguists. De Gruyter Mouton.
  • Wickham, H., & Grolemund, G. (2016). R for data science. O'Reilly. r4ds.had.co.nz
  • Winter, B. (2019). Statistics for linguists: An introduction using R. Routledge.

Advanced Statistics in the Humanities and Social Sciences

Multivariate modelling, classification, clustering, and survey data analysis using R

12 weeks · Basic stats + R required · Free
👥 Audience: Students and researchers with prior knowledge of basic statistics
🕐 Structure: 1h lecture + 1.5h tutorial per week
🎯 Prerequisite: Familiarity with t-tests, regression, and intermediate R skills
Week 1: Advanced Data Management and Reproducible Workflows

Organising complex datasets; reproducibility in advanced research; scripting and automating analysis pipelines; version control with Git.

  • Flanagan (2025), Ch. 1
Week 2: Review of Descriptive and Inferential Statistics

Quick review of key concepts: distributions, t-tests, correlations, confidence intervals, effect sizes, and power.

  • Field, Miles & Field (2012), Ch. 1–4
Week 3: Advanced Regression — Multiple and Hierarchical Models

Multiple regression; interaction terms; hierarchical (nested) models; mixed-effects models with random intercepts and slopes.

  • Baayen (2008), Ch. 4–5
  • Winter (2019), Ch. 5–6
Week 4: Logistic Regression and Generalised Linear Models

Binary and multinomial outcomes; model fitting and interpretation; goodness-of-fit; GLMs as a unified framework.

  • Winter (2019), Ch. 6
Week 5: Classification — Decision Trees

Decision trees; recursive partitioning; overfitting and pruning; interpreting tree outputs; applications in linguistic classification problems.

  • Gries (2013), Ch. 6
Week 6: Classification — Random Forests and Ensemble Methods

Ensemble learning; bagging and boosting; random forests; variable importance; improving prediction accuracy and generalisability.

  • James, Witten, Hastie & Tibshirani (2021). An introduction to statistical learning. Ch. 8
Week 7: Clustering and Correspondence Analysis

Unsupervised classification; k-means and hierarchical clustering; choosing the number of clusters; correspondence analysis for categorical data.

  • Gries (2013), Ch. 7
Week 8: Survey and Questionnaire Data Analysis I

Preparing survey data; dealing with missing values; Likert scales and their properties; descriptive analysis and visualisation of survey items.

  • Field, Miles & Field (2012), Ch. 10
  • Baayen (2008), Ch. 6
Week 9: Survey and Questionnaire Data Analysis II

Reliability (Cronbach's α, McDonald's ω); factor analysis and scale validation; cross-tabulations and chi-square; ordinal regression for Likert outcomes.

  • Field, Miles & Field (2012), Ch. 11
Week 10: Dimension Reduction and Multivariate Techniques

Principal Component Analysis (PCA); multidimensional scaling (MDS); detecting latent variables; applications to linguistic and social science data.

  • Gries (2013), Ch. 8
Week 11: Model Evaluation, Diagnostics, and Advanced Visualisation

Residual analysis and outlier detection; model comparison and selection criteria (AIC, BIC, cross-validation); visualisation for multivariate data.

  • Winter (2019), Ch. 7
Week 12: Applications and Student Mini-Projects

Integrating advanced methods into humanities and social science research; ethical considerations; communicating complex statistical results; reproducibility revisited.

Student project work applying classification, clustering, and survey analysis to real datasets.

  • Baayen (2008), Ch. 7
  • Field, Miles & Field (2012), Ch. 12
📚 Core Reading List
  • Baayen, R. H. (2008). Analyzing linguistic data. Cambridge University Press.
  • Field, A., Miles, J., & Field, Z. (2012). Discovering statistics using R. Sage.
  • Flanagan, J. (2025). Reproducibility in corpus linguistics. International Journal of Corpus Linguistics. doi:10.1075/ijcl.24113.fla
  • Gries, S. T. (2013). Statistics for linguists. De Gruyter Mouton.
  • James, G., Witten, D., Hastie, T., & Tibshirani, R. (2021). An introduction to statistical learning. Springer.
  • Winter, B. (2019). Statistics for linguists: An introduction using R. Routledge.

Ready to start learning?

All LADAL courses are free, self-paced, and built around reproducible R workflows. No enrolment required — just dive in.

