COURSES

Introduction

Welcome to LADAL Courses. This page provides structured guidance for students and researchers interested in applying computational and quantitative methods to the study of language and the humanities.

What are LADAL Courses?

LADAL Courses are curated sequences of topics, tutorials, and readings designed to help learners progress systematically from foundational knowledge to more advanced skills. They come in two formats: Short Courses, which are designed for independent study and consist of 6-10 tutorials, and Long Courses, which are structured as 12-week, semester-length courses. Each course combines conceptual background, hands-on practice, and reproducible workflows in R, drawing on resources from the Language Technology and Data Analysis Laboratory (LADAL).

By following a LADAL course, learners can:

  • Understand the key concepts and principles behind quantitative research and text analysis.
  • Develop practical skills in R, including data management, visualization, statistics, and text analytics.
  • Apply statistical and computational methods to real-world research questions in linguistics, humanities, and social sciences.
  • Build reproducible and transparent research workflows that support robust and reliable analyses.

Learners can follow the pathways sequentially or focus on specific topics of interest, depending on their background and research goals. These pathways are suitable for undergraduate and postgraduate students, as well as researchers seeking to enhance their computational and statistical skills.

Short Courses

Our short courses are designed for independent learners and can be worked through at your own pace. They cover some of LADAL’s most popular topics and consist of 6-10 tutorials, organised in a logical sequence.

Introduction to Corpus Linguistics

This short course provides an introduction to key methods in corpus linguistics using R. Aimed at complete beginners, it introduces the computational analysis of language with R and RStudio, as well as key corpus linguistic methods such as concordancing, collocation analysis, and keyword analysis. It combines theory, hands-on R tutorials, and case studies in language analysis.

1. Introduction to Text Analysis

In the first tutorial, you will be introduced to Text Analysis and related fields, including Corpus Linguistics. This tutorial also defines some key concepts that you will need for the rest of this course, including corpus, concordancing, collocations, and more!

2. Getting Started with R

This second tutorial is your first taste of the programming language, R, which will be used for the rest of the tutorials in this course. It specifically focuses on R for analysing language data, but it offers valuable information for anyone who wants to get started with R. For the purposes of this course, focus on the first four sections (up to Working with Tables).

3. String Processing

This tutorial introduces string processing, which is an essential part of performing any kind of computational analysis on language data. The basic techniques introduced here will be used in all subsequent tutorials in this course.

4. Concordancing (keywords-in-context)

This tutorial introduces our first corpus linguistic method: how to find words or phrases in text and display concordances, or keyword-in-context (KWIC) displays, with R.

5. Collocation and N-gram Analysis

This tutorial introduces collocation analysis and the identification of N-grams with R, and shows how to extract and visualise semantic links between words.

6. Keyness and Keyword Analysis

This tutorial introduces keyness analysis and the identification of keywords with R, and shows how to visualise keywords.

7. Corpus Linguistics with R

In this final tutorial, you will see how the techniques and methods you have learned so far can be put together. It presents different case studies or use cases that highlight how to do full corpus-based analyses by implementing procedures shown in other LADAL tutorials.

Introduction to Text Analysis

This course provides an introduction to key methods in text analysis using R. It is aimed at beginners and takes you from introductory overviews of text analysis and text analytic techniques through to advanced analytic methods such as topic modelling, sentiment analysis, and network analysis, with practical lessons in R along the way.

1. Introduction to Text Analysis

This tutorial introduces Text Analysis, i.e. computer-based analysis of language data or the (semi-)automated extraction of information from text. It also defines a number of key concepts and terms that you will need for the rest of this course.

2. Getting Started with R

This second tutorial is your first taste of the programming language, R, which will be used for the rest of the tutorials in this course. It specifically focuses on R for analysing language data, but it offers valuable information for anyone who wants to get started with R. For the purposes of this course, focus on the first four sections (up to Working with Tables).

3. String Processing

This tutorial introduces string processing, which is an essential part of performing any kind of computational analysis on language data. The basic techniques introduced here will be used in all subsequent tutorials in this course.

4. Practical Overview of Selected Text Analytics Methods

This tutorial showcases some common text analysis methods and serves as a more practical introduction to text analytics. You will get to try out running some of these methods using the R knowledge you’ve picked up in the last two tutorials.

5. Topic Modeling

This tutorial introduces topic modelling using R. Topic modelling is a text analytic technique that aims to uncover latent topics in a collection of documents. The tutorial covers both theory and practical application in R.

6. Sentiment Analysis

This tutorial introduces sentiment analysis, which is a text analytic technique that extracts opinion or emotion from texts. This tutorial covers both theory and practical application of sentiment analysis using R.

7. Network Analysis

This last tutorial introduces network analysis using R. Network analysis is a method for analysing and visualising relationships between entities (for example, words that co-occur or characters that interact) and can be applied to many types of data. As with the previous tutorials, it offers theoretical background and practical exercises using R.

Introduction to Statistics in the Humanities and Social Sciences

This course provides a simple introduction to statistics and quantitative research. It is aimed at learners with a background in the humanities and social sciences who have little to no prior knowledge of statistics. It covers basic concepts in science and quantitative research, an introduction to working with R and RStudio, basic statistical concepts and methods, and how to perform statistical tests and visualise results in R.

1. Introduction to Quantitative Reasoning

This tutorial takes a philosophical or history-of-ideas approach and introduces the logical and cognitive underpinnings of the scientific method. It provides a simple and engaging introduction to science and quantitative reasoning.

2. Basic Concepts in Quantitative Research

This tutorial introduces basic concepts in statistics and quantitative research. The concepts introduced and defined here will be used throughout the rest of this course.

3. Getting Started with R

This tutorial is your first taste of the programming language, R, which will be used for the rest of the tutorials in this course. It specifically focuses on R for analysing language data, but it offers valuable information for anyone who wants to get started with R. For the purposes of this course, focus on the first four sections (up to Working with Tables). You will cover uses of R more specific to this course in subsequent tutorials.

4. Handling Tables in R

This tutorial shows how to work with tables and how to tabulate data in R. Tabular data is very common, so this will help you become more familiar with handling and processing data in R.

5. Descriptive Statistics

This tutorial introduces descriptive statistics, which summarise and visualise data. It covers both theoretical explanations of various statistical concepts and practical exercises in R. This will be your first chance to apply statistical methods using what you have learned in R.

6. Introduction to Data Viz

This tutorial introduces basic data visualisation using R. It includes a discussion of different data visualisation philosophies and some practical exercises creating and modifying graphs.

7. Basic Inferential Statistics

This tutorial introduces basic inferential statistics, which use sample data to draw conclusions that go beyond the data at hand. The inferential procedures covered in this tutorial allow you to analyse relationships between variables and test hypotheses. This final tutorial provides both theoretical explanations and practical exercises using what you have learned in R.

Long Courses

These courses are structured as 12-week, semester-long courses. They are a great scaffold for creating university courses. If you are an independent learner, feel free to follow along with the tutorials and readings, and refer to each week’s lecture topic as a helpful overview of that week’s focus.

Each long course is organized as a series of weekly sessions with:

  • Lecture topics outlining the main concepts and learning objectives.
  • LADAL tutorials providing step-by-step, hands-on exercises.
  • Recommended readings to reinforce and expand the theoretical and methodological foundations.

Introduction to Corpus Linguistics and Text Analysis with R

This pathway introduces students to corpus-based methods and practical approaches to text analysis. Students learn to compile, manage, and explore corpora, and gain hands-on experience with both traditional corpus methods (e.g., concordancing, collocations, keyness analysis) and contemporary text analytics techniques such as sentiment analysis and topic modelling.

Audience: Students in linguistics, applied linguistics, translation, communication, and literary studies who wish to develop practical corpus analysis skills for investigating patterns of language use.

Aim: Introduce corpus-based methods for linguistic analysis and hands-on text analysis with R.

Structure: 12 weekly sessions (1h lecture + 1.5h tutorial)

Week 1: Introduction to Corpus Linguistics & Text Analytics
  • LADAL content to be used: Tutorial Introduction to Text Analysis
  • Additional Resources:
    McEnery, T., & Hardie, A. (2012). Corpus linguistics: Method, theory and practice. Cambridge University Press, Ch. 1–2
Week 2: Working with Digital Data & Reproducibility
  • LADAL content to be used: Tutorials Reproducible Research and Creating R Notebooks
  • Additional Resources:
    Flanagan, J. (2025). Reproducibility, replicability, robustness, and generalizability in corpus linguistics. International Journal of Corpus Linguistics. Advance online publication. https://doi.org/10.1075/ijcl.24113.fla
Week 3: Getting Started with R
Week 4: Corpus Compilation & Preparation
  • Lecture: Types of corpora, sampling, metadata
  • LADAL content to be used: Tutorial Downloading Texts from Project Gutenberg
  • Additional Resources:
    Biber, D., Conrad, S., & Reppen, R. (1998). Corpus linguistics: Investigating language structure and use. Cambridge University Press, Ch. 1–2
Week 5: Frequency & Dispersion
  • Lecture: Counting words, Zipf’s law, dispersion measures
  • LADAL content to be used: “Handling Tables in R”
  • Additional Resources:
    McEnery, T., & Hardie, A. (2012). Ch. 3
Week 6: Concordancing & KWIC
  • Lecture: Searching corpora, concordances, interpretation
  • LADAL content to be used: “Concordancing with R”
  • Additional Resources:
    Baker, P. (2006). Using corpora in discourse analysis. Ch. 3
Week 7: Collocations & N-grams
  • Lecture: Association measures and phraseology
  • LADAL content to be used: “Collocation and N-gram Analysis”
  • Additional Resources:
    Gries, S. T. (2024). Frequency, dispersion, association, and keyness. Ch. 2
Week 8: Keywords & Keyness
  • LADAL content to be used: “Keyness and Keyword Analysis”
  • Additional Resources:
    Gries, S. T. (2024). Ch. 3
Week 9: Advanced Text Analytics I: Topic Modelling
  • Lecture: Latent topics, probabilistic models
  • LADAL content to be used: “Topic Modelling”
  • Additional Resources:
    Maier, D., et al. (2021). Applying LDA topic modeling in communication research. pp. 13–38
Week 10: Advanced Text Analytics II: Sentiment & Network Analysis
  • Lecture: Social dimensions of corpus analysis
  • LADAL content to be used: “Sentiment Analysis with R” and “Network Analysis”
  • Additional Resources:
    Liu, B. (2012). Sentiment analysis and opinion mining. Ch. 1–2
Week 11: Case Studies in Corpus Linguistics
  • Lecture: Corpus applications in linguistics & discourse studies
  • LADAL content to be used: “Corpus Linguistics with R”
  • Additional Resources:
    Baker, P. (2006). Ch. 7
Week 12: Project Workshop & Presentations
  • Lecture: Ethics, future directions, wrap-up
  • Tutorial: Student project work
Core Readings
  • Baker, P. (2006). Using corpora in discourse analysis. Continuum.

  • Biber, D., Conrad, S., & Reppen, R. (1998). Corpus linguistics: Investigating language structure and use. Cambridge University Press.

  • Flanagan, J. (2025). Reproducibility, replicability, robustness, and generalizability in corpus linguistics. International Journal of Corpus Linguistics. Advance online publication. https://doi.org/10.1075/ijcl.24113.fla

  • Gries, S. T. (2024). Frequency, dispersion, association, and keyness (Studies in Corpus Linguistics, Vol. 115). John Benjamins.

  • Liu, B. (2012). Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies, 5(1), 1–167.

  • Maier, D., Waldherr, A., Miltner, P., Wiedemann, G., Niekler, A., Keinert, A., … & Adam, S. (2021). Applying LDA topic modeling in communication research: Toward a valid and reliable methodology. In Computational methods for communication science (pp. 13–38). Routledge.

  • McEnery, T., & Hardie, A. (2012). Corpus linguistics: Method, theory and practice. Cambridge University Press.

  • Wickham, H., & Grolemund, G. (2016). R for data science: Import, tidy, transform, visualize, and model data. O’Reilly Media. https://r4ds.had.co.nz

Introduction to Statistics in the Humanities and Social Sciences

This pathway builds students’ understanding of quantitative reasoning and statistical analysis tailored to the humanities and social sciences. Starting with the foundations of quantitative research and descriptive statistics, the course progresses through hypothesis testing and power analysis to regression models (including logistic regression) and mixed-effects modelling, with an emphasis on interpreting and applying statistics to real-world research problems.

Audience: Students and researchers in linguistics, psychology, education, sociology, and related fields who want to develop statistical literacy and the ability to conduct and critically evaluate quantitative studies.

Aim: Provide students in the humanities and social sciences with a practical and conceptual foundation in quantitative methods, emphasizing the application of statistics using R.
Students will learn to summarize, visualize, model, and interpret data relevant to their research questions, while developing reproducible workflows.

Structure: 12 weekly sessions (1h lecture + 1.5h tutorial)

Week 1: Introduction to Quantitative Research
  • Lecture: Role of quantitative methods in humanities and social sciences; overview of statistical thinking
  • LADAL content to be used: Introduction to Quantitative Reasoning
  • Additional Readings:
    Field, A., Miles, J., & Field, Z. (2012). Discovering statistics using R, Ch. 1
    Baayen, R. H. (2008). Analyzing linguistic data, Ch. 1
Week 2: Basic Concepts in Quantitative Research
  • Lecture: Data types, variables, sampling, reliability, and validity
  • LADAL content to be used: Basic Concepts in Quantitative Research
  • Additional Readings:
    Gries, S. T. (2013). Statistics for linguistics with R, Ch. 1–2
Week 3: Getting Started with R – Part 1
  • Lecture: Introduction to R and RStudio; installing packages, basic syntax, workflow setup
  • LADAL content to be used: Getting Started
  • Additional Readings:
    Wickham, H., & Grolemund, G. (2016). R for data science, Ch. 1–3
Week 4: Getting Started with R – Part 2: Loading and Handling Data
  • Lecture: Importing datasets, data cleaning, and working with tables
  • LADAL content to be used: Loading and Saving Data; Handling Tables in R
  • Additional Readings:
    Baayen, R. H. (2008), Ch. 2
Week 5: R Basics for Statistical Analysis
  • Lecture: Vectors, factors, data frames, indexing, and subsetting
  • LADAL content to be used: Getting Started; Loading Data
  • Additional Readings:
    Field, Miles & Field (2012), Ch. 2–3
Week 6: Descriptive Statistics
  • Lecture: Summarizing data: means, medians, variances, standard deviations, distributions
  • LADAL content to be used: Descriptive Statistics
  • Additional Readings:
    Baayen (2008), Ch. 3
    Winter, B. (2019). Statistics for linguists, Ch. 2
Week 7: Visualizing Data
  • Lecture: Principles of effective visualization, histograms, boxplots, scatterplots
  • LADAL content to be used: Data Visualisation with R
  • Additional Readings:
    Wickham & Grolemund (2016), Ch. 14
Week 8: Hypothesis Testing and Power Analysis
  • Lecture: t-tests, ANOVA, p-values, confidence intervals, and power
  • LADAL content to be used: Basic Inferential Statistics
  • Additional Readings:
    Field, Miles & Field (2012), Ch. 4
    Gries (2013), Ch. 3
Week 9: Correlation & Regression Analysis
  • Lecture: Pearson correlation, simple linear regression, interpreting coefficients
  • LADAL content to be used: Regression Analysis
  • Additional Readings:
    Baayen (2008), Ch. 4
Week 10: Advanced Regression: Multiple Regression & Diagnostics
  • Lecture: Multiple regression, multicollinearity, residual analysis, model fit
  • LADAL content to be used: Regression Analysis
  • Additional Readings:
    Winter (2019), Ch. 5
Week 11: Logistic Regression
  • Lecture: Binary outcomes, model fitting
  • LADAL content to be used: Regression Analysis
  • Additional Readings:
    Baayen (2008), Ch. 5
    Winter (2019), Ch. 6
Week 12: Mixed Effects Models
  • Lecture: Introduction to hierarchical models
  • LADAL content to be used: Mixed-Effects Models
  • Tutorial: Student mini-projects applying learned methods to real datasets
  • Additional Readings:
    Gries (2013), Ch. 6
    Field, Miles & Field (2012), Ch. 12
Core Readings
  • Field, A., Miles, J., & Field, Z. (2012). Discovering statistics using R. Sage.
  • Baayen, R. H. (2008). Analyzing linguistic data: A practical introduction to statistics using R. Cambridge University Press.
  • Gries, S. T. (2013). Statistics for linguistics with R: A practical introduction (2nd ed.). De Gruyter Mouton.
  • Winter, B. (2019). Statistics for linguists: An introduction using R. Routledge.

Advanced Statistics in the Humanities and Social Sciences

This pathway develops advanced statistical skills for research in the humanities and social sciences, focusing on multivariate modeling, classification, clustering, and the analysis of survey/questionnaire data. Students gain hands-on experience with decision trees, random forests, clustering methods, factor analysis, and model diagnostics, all applied using R and reproducible workflows.

Audience: Students and researchers with prior knowledge of basic statistics in the humanities/social sciences who wish to apply advanced quantitative methods to research problems and large datasets.

Aim: Develop advanced skills in quantitative methods for humanities and social sciences, including multivariate modeling, classification, clustering, and analysis of survey/questionnaire data using R.
The course emphasizes practical application, data visualization, and reproducible workflows using LADAL tutorials.

Structure: 12 weekly sessions (1h lecture + 1.5h tutorial)

Week 1: Advanced Data Management & Reproducible Workflows
  • Lecture: Organizing complex datasets, reproducibility, scripting for analysis pipelines
  • LADAL content to be used: Reproducible Research; Creating R Notebooks
  • Additional Readings: Flanagan (2025)
Week 2: Review of Descriptive and Inferential Statistics
  • Lecture: Quick review of key concepts: distributions, t-tests, correlations, confidence intervals
  • LADAL content to be used: Descriptive Statistics; Basic Inferential Statistics
  • Additional Readings: Field, Miles & Field (2012), Ch. 1–4
Week 3: Advanced Regression Models I – Multiple and Hierarchical Regression
  • Lecture: Multiple regression, interaction terms, hierarchical/mixed-effects models
  • LADAL content to be used: Regression Analysis; Mixed-Effects Models
  • Additional Readings: Baayen (2008), Ch. 4–5; Winter (2019), Ch. 5–6
Week 4: Logistic Regression & Generalized Linear Models
  • Lecture: Binary and multinomial outcomes, model fitting, interpretation, goodness-of-fit
  • LADAL content to be used: Regression Analysis
  • Additional Readings: Winter (2019), Ch. 6
Week 5: Classification Methods – Decision Trees
  • Lecture: Decision trees, recursive partitioning, interpreting tree outputs
  • LADAL content to be used: Tree-Based Models
  • Additional Readings: Gries (2013), Ch. 6
Week 6: Classification Methods – Random Forests & Ensemble Methods
  • Lecture: Ensemble learning, bagging, random forests, improving prediction accuracy
  • LADAL content to be used: Tree-Based Models
  • Additional Readings: James, Witten, Hastie & Tibshirani (2021), Ch. 8
Week 7: Clustering & Correspondence Analysis
  • Lecture: Unsupervised classification, k-means, hierarchical clustering, correspondence analysis for categorical data
  • LADAL content to be used: Cluster and Correspondence Analysis
  • Additional Readings: Gries (2013), Ch. 7
Week 8: Questionnaire and Survey Data Analysis I – Data Cleaning & Descriptive Analysis
  • Lecture: Preparing survey data, dealing with missing values, summarizing Likert-scale items
  • LADAL content to be used: Visualising and Analysing Questionnaire and Survey Data
  • Additional Readings: Field, Miles & Field (2012), Ch. 10; Baayen (2008), Ch. 6
Week 9: Questionnaire and Survey Data Analysis II – Advanced Techniques
  • Lecture: Factor analysis, reliability measures (Cronbach’s alpha), cross-tabulations, chi-square tests
  • LADAL content to be used: Visualising and Analysing Questionnaire and Survey Data
  • Additional Readings: Field, Miles & Field (2012), Ch. 11
Week 10: Dimension Reduction & Multivariate Techniques
  • Lecture: Principal Component Analysis (PCA), multidimensional scaling, detecting latent variables
  • LADAL content to be used: Dimension Reduction Methods
  • Additional Readings: Gries (2013), Ch. 8
Week 11: Model Evaluation, Diagnostics, and Advanced Visualizations
  • Lecture: Residual analysis, detecting outliers, model comparison, visualization techniques for multivariate data
  • LADAL content to be used: Data Visualisation with R; Regression Analysis
  • Additional Readings: Winter (2019), Ch. 7
Week 12: Applications & Student Mini-Projects
  • Lecture: Integrating advanced methods into humanities/social science research projects; ethical considerations; reproducibility
  • Tutorial: Student project work applying classification, clustering, and survey analysis
  • Additional Readings: Baayen (2008), Ch. 7; Field, Miles & Field (2012), Ch. 12
Core Readings
  • Field, A., Miles, J., & Field, Z. (2012). Discovering statistics using R. Sage.
  • Baayen, R. H. (2008). Analyzing linguistic data: A practical introduction to statistics using R. Cambridge University Press.
  • Gries, S. T. (2013). Statistics for linguistics with R: A practical introduction (2nd ed.). De Gruyter Mouton.
  • Winter, B. (2019). Statistics for linguists: An introduction using R. Routledge.
  • James, G., Witten, D., Hastie, T., & Tibshirani, R. (2021). An introduction to statistical learning: With applications in R (2nd ed.). Springer.
