COURSES

Introduction
Welcome to LADAL Courses. This page provides structured guidance for students and researchers interested in applying computational and quantitative methods to the study of language and the humanities.
What are LADAL Courses?
LADAL Courses are curated sequences of topics, tutorials, and readings designed to help learners progress systematically from foundational knowledge to more advanced skills. They are split into Short Courses, which are designed for independent study and consist of 6-10 tutorials, and Long Courses, which are structured as full semester-length courses. Each course combines conceptual background, hands-on practice, and reproducible workflows using R, integrating resources from the Language Technology and Data Analysis Laboratory (LADAL).
By following a LADAL course, learners can:
- Understand the key concepts and principles behind quantitative research and text analysis.
- Develop practical skills in R, including data management, visualization, statistics, and text analytics.
- Apply statistical and computational methods to real-world research questions in linguistics, humanities, and social sciences.
- Build reproducible and transparent research workflows, ensuring robust and reliable analyses.
Learners can follow the pathways sequentially or focus on specific topics of interest, depending on their background and research goals. These pathways are suitable for undergraduate and postgraduate students, as well as researchers seeking to enhance their computational and statistical skills.
Short Courses
Our short courses are designed for independent learners and can be worked through at your own pace. They cover some of LADAL’s most popular topics and consist of 6-10 tutorials, organised in a logical sequence.
Introduction to Corpus Linguistics
This short course provides an introduction to key methods in corpus linguistics using R. Aimed at complete beginners, it will introduce you to computational analysis of language, using R and RStudio, and key corpus linguistic methods such as concordancing, collocations, and keyword analysis. It includes a combination of theory, hands-on R tutorials, and case studies in language analysis.
1. Introduction to Text Analysis
In the first tutorial, you will be introduced to Text Analysis and related fields, including Corpus Linguistics. This tutorial also defines some key concepts that you will need for the rest of this course, including corpus, concordancing, collocations, and more!
2. Getting Started with R
This second tutorial is your first taste of the R programming language, which will be used for the rest of the tutorials in this course. It focuses on R for analysing language data, but it offers valuable information for anyone who wants to get started with R. For the purposes of this course, focus on the first four sections (up to Working with Tables).
3. String Processing
This tutorial introduces string processing, which is an essential part of any computational analysis of language data. The basic techniques introduced here will be used in all subsequent tutorials in this course.
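As a rough illustration of what string processing involves, the sketch below cleans and tokenises a toy sentence using base R only. The example sentence and the variable names are assumptions made for illustration; they are not taken from the LADAL tutorial itself.

```r
# Toy sentence standing in for real language data (illustrative assumption)
text <- "The quick brown fox jumps over the lazy dog. The dog sleeps!"

# Lower-case the text and strip punctuation
clean <- gsub("[[:punct:]]", "", tolower(text))

# Split on whitespace to obtain a vector of word tokens
tokens <- strsplit(clean, "\\s+")[[1]]

# Count running words (tokens) and distinct words (types)
n_tokens <- length(tokens)
n_types <- length(unique(tokens))

# Retrieve all tokens that begin with "d"
d_words <- grep("^d", tokens, value = TRUE)
```

Here `n_tokens` is 12 and `n_types` is 9, because "the" and "dog" occur more than once.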
4. Concordancing (keywords-in-context)
This tutorial introduces our first corpus linguistic method: how to find words or phrases in text and display concordances, or keyword-in-context (KWIC) displays, with R.
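To give a feel for what a KWIC display contains, here is a minimal base R sketch. The function `kwic_sketch()` is a hypothetical helper written for this page, not the function used in the tutorial, and the sentence is made up.

```r
# Hypothetical helper: list each hit of `keyword` with `window` tokens
# of context on either side
kwic_sketch <- function(tokens, keyword, window = 3) {
  n <- length(tokens)
  hits <- which(tokens == keyword)
  left <- vapply(hits, function(i) {
    if (i > 1) paste(tokens[max(1, i - window):(i - 1)], collapse = " ") else ""
  }, character(1))
  right <- vapply(hits, function(i) {
    if (i < n) paste(tokens[(i + 1):min(n, i + window)], collapse = " ") else ""
  }, character(1))
  data.frame(left = left, keyword = rep(keyword, length(hits)), right = right,
             stringsAsFactors = FALSE)
}

# Made-up example text, tokenised on single spaces
tokens <- strsplit("the cat sat on the mat and the dog sat on the rug", " ")[[1]]
concordance <- kwic_sketch(tokens, "sat", window = 2)
```

`concordance` has one row per occurrence of "sat", with the preceding and following two words in the `left` and `right` columns.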
5. Collocation and N-gram Analysis
This tutorial introduces collocation analysis and identifying N-grams with R and shows how to extract and visualise semantic links between words.
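The simplest form of N-gram extraction can be sketched in base R as follows. The sentence is invented, and the sketch only counts bigrams; it does not compute the association measures that a full collocation analysis would use.

```r
# Made-up example text, tokenised on single spaces
tokens <- strsplit("the cat sat on the mat and the cat slept", " ")[[1]]

# Pair each token with its successor to form bigrams
bigrams <- paste(head(tokens, -1), tail(tokens, -1))

# Tabulate the bigrams and sort by frequency, most frequent first
bigram_freq <- sort(table(bigrams), decreasing = TRUE)
```

In this toy example "the cat" is the only bigram that occurs twice, so it appears first in `bigram_freq`.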
6. Keyness and Keyword Analysis
This tutorial introduces keyness analysis and identifying keywords with R and shows how to visualise keywords.
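One widely used keyness statistic is the log-likelihood ratio, which compares how often a word occurs in a target corpus and in a reference corpus. The sketch below is an assumption about how such a score could be computed in base R; the function name, argument names, and frequencies are illustrative, and the tutorial may use different measures.

```r
# Log-likelihood keyness for a single word (illustrative sketch)
#   a, b: frequency of the word in the target / reference corpus
#   c, d: total number of tokens in the target / reference corpus
log_likelihood <- function(a, b, c, d) {
  # Expected frequencies if the word were equally common in both corpora
  e1 <- c * (a + b) / (c + d)
  e2 <- d * (a + b) / (c + d)
  # A zero observed frequency contributes nothing to the sum
  term <- function(obs, exp) if (obs > 0) obs * log(obs / exp) else 0
  2 * (term(a, e1) + term(b, e2))
}

# Invented counts: 30 hits in a 1,000-token target corpus,
# 10 hits in a 1,000-token reference corpus
ll <- log_likelihood(30, 10, 1000, 1000)
```

Values above 3.84 correspond to p < 0.05 for a chi-squared distribution with one degree of freedom, so this invented word would count as key in the target corpus.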
7. Corpus Linguistics with R
In this final tutorial, you will see how the techniques and methods you have learned so far can be put together. It presents case studies that show how to carry out full corpus-based analyses by implementing procedures from other LADAL tutorials.
Introduction to Text Analysis
This course provides an introduction to key methods in text analysis using R. It is aimed at beginners, and takes you from introductory overviews of text analysis and text analytic techniques through to advanced analytic methods such as topic modelling, sentiment analysis and network analysis, with practical lessons in R along the way.
1. Introduction to Text Analysis
This tutorial introduces Text Analysis, i.e. computer-based analysis of language data or the (semi-)automated extraction of information from text. It also defines a number of key concepts and terms that you will need for the rest of this course.
2. Getting Started with R
This second tutorial is your first taste of the R programming language, which will be used for the rest of the tutorials in this course. It focuses on R for analysing language data, but it offers valuable information for anyone who wants to get started with R. For the purposes of this course, focus on the first four sections (up to Working with Tables).
3. String Processing
This tutorial introduces string processing, which is an essential part of any computational analysis of language data. The basic techniques introduced here will be used in all subsequent tutorials in this course.
4. Practical Overview of Selected Text Analytics Methods
This tutorial showcases some common text analysis methods and serves as a more practical introduction to text analytics. You will try out some of these methods using the R skills you picked up in the previous two tutorials.
5. Topic Modelling
This tutorial introduces topic modelling using R. Topic modelling is a text analytic technique that aims to uncover topics in documents. The tutorial covers both theory and practical application in R.
6. Sentiment Analysis
This tutorial introduces sentiment analysis, a text analytic technique that extracts opinion or emotion from texts. It covers both the theory and the practical application of sentiment analysis using R.
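A very simple, lexicon-based form of sentiment analysis can be sketched in base R as shown below. The six-word lexicon and the function `score_sentiment()` are made up for illustration; real analyses rely on published sentiment lexicons and more careful preprocessing.

```r
# Tiny made-up lexicon: +1 for positive words, -1 for negative words
lexicon <- c(good = 1, great = 1, happy = 1, bad = -1, awful = -1, sad = -1)

# Hypothetical helper: sum the lexicon scores of all words in a text
score_sentiment <- function(text, lexicon) {
  # Lower-case, strip punctuation, and split into word tokens
  tokens <- strsplit(gsub("[[:punct:]]", "", tolower(text)), "\\s+")[[1]]
  # Keep only tokens that appear in the lexicon and add up their scores
  sum(lexicon[tokens[tokens %in% names(lexicon)]])
}

positive <- score_sentiment("A good, happy day!", lexicon)
mixed <- score_sentiment("The food was great, but the service was awful.", lexicon)
```

The first text scores 2, while the second scores 0 because its one positive and one negative word cancel out, which illustrates a known limitation of simple word counting.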
7. Network Analysis
This last tutorial introduces network analysis using R. Network analysis is a visualisation method that can be used to represent various types of data. As with the previous tutorials, it offers theoretical background and practical exercises using R.
Long Courses
Long courses are structured as 12-week, semester-long sequences and make a good scaffold for building university courses. If you are an independent learner, feel free to follow along with the tutorials and readings, and use each week’s lecture topic as an overview of that week’s focus.
Each long course is organized as a series of weekly sessions with:
- Lecture topics outlining the main concepts and learning objectives.
- LADAL tutorials providing step-by-step, hands-on exercises.
- Recommended readings to reinforce and expand the theoretical and methodological foundations.
Introduction to Corpus Linguistics and Text Analysis with R
This pathway introduces students to corpus-based methods and practical approaches to text analysis. Students learn to compile, manage, and explore corpora, and gain hands-on experience with both traditional corpus methods (e.g., concordancing, collocations, keyness analysis) and contemporary text analytics techniques such as sentiment analysis and topic modelling.
Audience: Students in linguistics, applied linguistics, translation, communication, and literary studies who wish to develop practical corpus analysis skills for investigating patterns of language use.
Aim: Introduce corpus-based methods for linguistic analysis and hands-on text analysis with R.
Structure: 12 weekly sessions (1h lecture + 1.5h tutorial)
Week 1: Introduction to Corpus Linguistics & Text Analytics
- LADAL content to be used: “Introduction to Text Analysis”
- Additional Resources:
McEnery, T., & Hardie, A. (2012). Corpus linguistics: Method, theory and practice. Cambridge University Press, Ch. 1–2
Week 2: Working with Digital Data & Reproducibility
- LADAL content to be used: “Reproducible Research” and “Creating R Notebooks”
- Additional Resources:
Flanagan, J. (2025). Reproducibility, replicability, robustness, and generalizability in corpus linguistics. International Journal of Corpus Linguistics. Advance online publication. https://doi.org/10.1075/ijcl.24113.fla
Week 3: Getting Started with R
- LADAL content to be used: “Why R for Corpus Linguistics”, “Getting Started”, and “Loading and Saving Data”
- Additional Resources:
Wickham, H., & Grolemund, G. (2016). R for data science. Ch. 1–3. https://r4ds.had.co.nz
Week 4: Corpus Compilation & Preparation
- Lecture: Types of corpora, sampling, metadata
- LADAL content to be used: “Downloading Texts from Project Gutenberg”
- Additional Resources:
Biber, D., Conrad, S., & Reppen, R. (1998). Corpus linguistics: Investigating language structure and use. Cambridge University Press, Ch. 1–2
Week 5: Frequency & Dispersion
- Lecture: Counting words, Zipf’s law, dispersion measures
- LADAL content to be used: “Handling Tables in R”
- Additional Resources:
McEnery, T., & Hardie, A. (2012). Ch. 3
Week 6: Concordancing & KWIC
- Lecture: Searching corpora, concordances, interpretation
- LADAL content to be used: “Concordancing with R”
- Additional Resources:
Baker, P. (2006). Using corpora in discourse analysis. Ch. 3
Week 7: Collocations & N-grams
- Lecture: Association measures and phraseology
- LADAL content to be used: “Collocation and N-gram Analysis”
- Additional Resources:
Gries, S. T. (2024). Frequency, dispersion, association, and keyness. Ch. 2
Week 8: Keywords & Keyness
- LADAL content to be used: “Keyness and Keyword Analysis”
- Additional Resources:
Gries, S. T. (2024). Ch. 3
Week 9: Advanced Text Analytics I: Topic Modelling
- Lecture: Latent topics, probabilistic models
- LADAL content to be used: “Topic Modelling”
- Additional Resources:
Maier, D., et al. (2021). Applying LDA topic modeling in communication research. pp. 13–38
Week 10: Advanced Text Analytics II: Sentiment & Network Analysis
- Lecture: Social dimensions of corpus analysis
- LADAL content to be used: “Sentiment Analysis with R” and “Network Analysis”
- Additional Resources:
Liu, B. (2012). Sentiment analysis and opinion mining. Ch. 1–2
Week 11: Case Studies in Corpus Linguistics
- Lecture: Corpus applications in linguistics & discourse studies
- LADAL content to be used: “Corpus Linguistics with R”
- Additional Resources:
Baker, P. (2006). Ch. 7
Week 12: Project Workshop & Presentations
- Lecture: Ethics, future directions, wrap-up
- Tutorial: Student project work
Core Readings
Baker, P. (2006). Using corpora in discourse analysis. Continuum.
Biber, D., Conrad, S., & Reppen, R. (1998). Corpus linguistics: Investigating language structure and use. Cambridge University Press.
Flanagan, J. (2025). Reproducibility, replicability, robustness, and generalizability in corpus linguistics. International Journal of Corpus Linguistics. Advance online publication. https://doi.org/10.1075/ijcl.24113.fla
Gries, S. T. (2024). Frequency, dispersion, association, and keyness (Studies in Corpus Linguistics, Vol. 115). John Benjamins.
Liu, B. (2012). Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies, 5(1), 1–167.
Maier, D., Waldherr, A., Miltner, P., Wiedemann, G., Niekler, A., Keinert, A., … & Adam, S. (2021). Applying LDA topic modeling in communication research: Toward a valid and reliable methodology. In Computational methods for communication science (pp. 13–38). Routledge.
McEnery, T., & Hardie, A. (2012). Corpus linguistics: Method, theory and practice. Cambridge University Press.
Wickham, H., & Grolemund, G. (2016). R for data science: Import, tidy, transform, visualize, and model data. O’Reilly Media. https://r4ds.had.co.nz