TUTORIALS

Welcome to LADAL tutorials. This page contains all tutorials provided by LADAL, organised in seven sections. The first section, Data Science Basics, offers some useful background reading which introduces key concepts and best practices in digital and quantitative research. The rest of the sections mainly consist of more practical, follow-along tutorials. The first of these, R Basics, introduces the programming language, R, which is used for all other LADAL tutorials and tools. Next is Data Visualisation, which will show you how to create visual representations of data (such as graphs and tables) using R. The Statistics section covers statistical methods with R, from simple overviews of descriptive and basic inferential statistics through to specific statistical models that may be useful when working with text and language data. The Text Analytics section introduces a range of computational methods for analysing text, covering both the theoretical background of text analytics and practical, hands-on lessons. Next, we have a Case Studies section, which offers more specific examples of the various kinds of research you can do with the methods shown in LADAL tutorials. Finally, the How-Tos section includes some brief tutorials on accessing or converting data which you could then use to apply the methods taught in other tutorials.

Generally, we recommend starting with R Basics, as the content covered in this section will be assumed knowledge for all subsequent sections. Once you are familiar with R and RStudio, you can move on to Data Visualisation, Statistics, and/or Text Analytics, referring to the Data Science Basics section as needed. If you are completely new to computational methods and you’d like to explore whether LADAL will be helpful for your work, you might like to first have a quick look through Introduction to Text Analysis and some of our Case Studies.

Data Science Basics

This section is an introduction to digital and quantitative research, covering best practices for working with digital data, key principles behind reproducible research, and the basic building blocks of science and quantitative analysis. These tutorials provide the theoretical background for the practical tutorials in other sections, and you can refer back to them as needed for your particular goals.

Working with Computers

This tutorial provides advice and general tips on how to keep your computer clean and running smoothly, how to organise files and folders, and how to store your data safely and in an orderly way.

Introduction to Data Management

This tutorial introduces basic data management techniques and measures to keep your folder system clean and tidy.

Reproducible Research

This tutorial introduces options to make your research and workflows efficient and reproducible.

Introduction to Quantitative Reasoning

This tutorial takes a philosophical or history-of-ideas approach and introduces the logical and cognitive underpinnings of the scientific method.

Basic Concepts in Quantitative Research

This tutorial introduces basic concepts of data analysis and quantitative research.

R Basics

This section introduces the programming language R, which is the basis for all other LADAL tutorials and tools. The content covered here will be assumed knowledge for all subsequent sections, so we recommend starting here if you are not already familiar with R. These tutorials are designed to be worked through in order, before moving on to the other section(s) that you are interested in.

Why R?

This page explains our reasons for focusing (almost exclusively) on R in LADAL.

Getting started

This tutorial shows how to get started with R. It focuses specifically on R for analysing language data, but it offers valuable information for anyone who is new to R.

Loading and saving data

This tutorial shows how you can load and save different types of data when working with R.

String Processing

This tutorial introduces string processing and how it can be used when working with language data.

Regular Expressions

This tutorial introduces regular expressions and how they can be used when working with language data.

Handling tables in R

This tutorial shows how to work with tables and how to tabulate data in R.

Data Visualisation

This section introduces some simple principles of data visualisation and shows you how to create visual representations of your data using R. The tutorials in this section require some familiarity with R and RStudio, so you should be comfortable with the content in R Basics before proceeding here.

Introduction to Data Viz

This tutorial introduces data visualisation using R and shows how to modify different types of visualisations in the ggplot framework in R.

Data Visualisation with R

This tutorial introduces different types of data visualisation and how to prepare your data for different plot types.

Showcase: How to create a typological map

This tutorial shows how to create typological maps in R with leaflet.

Statistics

This section covers various statistical methods and how to apply them using R. The following tutorials require some familiarity with R and RStudio, so you should be comfortable with the content in R Basics before proceeding here. Some tutorials in this section also assume familiarity with Data Visualisation using R, so we also recommend completing that section to get the most out of these tutorials. If you are looking for a simple conceptual introduction to statistics and quantitative analysis, you may find it helpful to start with Introduction to Quantitative Reasoning and Basic Concepts in Quantitative Research before proceeding here.

The tutorials in this section do not necessarily have to be completed in order. We recommend starting with Descriptive Statistics and Basic Inferential Statistics as an introduction, and then moving on to the more in-depth tutorials that are relevant to you.

Descriptive Statistics

This tutorial focuses on how to describe and summarise data in R.

Basic Inferential Statistics

This tutorial introduces basic inferential procedures for null-hypothesis significance testing.

Regression Analysis

This tutorial introduces regression analyses (also called regression modelling) using R. Regression models are among the most widely used quantitative methods in the language sciences to assess if and how predictors (variables or interactions between variables) correlate with a certain response.

Mixed-Effects Models

This tutorial introduces mixed-effects modelling in R. Mixed-effects models are widely used in the language sciences to assess if and how predictors correlate with a certain response when the data are hierarchically structured.

Tree-Based Models

This tutorial focuses on tree-based models and their implementation in R.

Cluster and Correspondence Analysis

This tutorial introduces classification and clustering using R. Cluster analyses fall within the domain of classification methods which are used to find groups or patterns in data or to predict group membership.

Introduction to Lexical Similarity

This tutorial introduces text similarity, i.e. how close or similar two pieces of text are, either in their use of words or characters (lexical similarity) or in their meaning (semantic similarity).

Semantic Vector Space Models

This tutorial introduces Semantic Vector Space Model (SVM) modelling in R. SVMs are used to find groups or patterns in data or to predict group membership.

Dimension Reduction Methods

This tutorial introduces selected dimension reduction methods (Principal Component Analysis, Factor Analysis, and Multidimensional Scaling), which make it possible to detect and evaluate the structures (called components, latent variables, or factors) that underlie observed variables.

Power Analysis

This tutorial introduces power analysis using R. Power analysis is a method primarily used to determine the appropriate sample size for empirical studies.

Text Analytics

This section introduces text analysis using R and covers various text analytics methods. These tutorials require some familiarity with R and RStudio, so you should be comfortable with the content in the R Basics section before proceeding here.

The tutorials in this section do not necessarily need to be completed in order. Feel free to skip ahead to the tutorial that is relevant to your work if you know exactly what you’re looking for; otherwise, we recommend starting with Introduction to Text Analysis for some of the theoretical background and relevant terms and concepts, and then moving on to Practical Overview of Selected Text Analytics Methods to get an idea of the kinds of methods you can apply. At this point, you can move on to the more in-depth tutorials that are relevant to you.

Introduction to Text Analysis

This tutorial introduces Text Analysis, i.e. computer-based analysis of language data or the (semi-)automated extraction of information from text.

Practical Overview of Selected Text Analytics Methods

This tutorial showcases some basic but useful methods for text analysis and serves as a practical overview of, or introduction to, text analytics and distant reading.

Concordancing (keywords-in-context)

This tutorial shows how to find words or phrases in text and display them as concordances, so-called keyword-in-context (KWIC) displays, with R.

Collocation and N-gram Analysis

This tutorial introduces collocation analysis and identifying N-grams with R and shows how to extract and visualise semantic links between words.

Keyness and Keyword Analysis

This tutorial introduces keyness analysis and identifying keywords with R and shows how to visualise keywords.

Network Analysis

This tutorial introduces network analysis using R. Network analysis is a method for visualisation that can be used to represent various types of data.

Topic Modeling

This tutorial introduces topic modelling using R.

Sentiment Analysis

This tutorial introduces sentiment analysis (SA) and shows how to perform SA in R.

Tagging and Parsing

This tutorial introduces part-of-speech tagging and syntactic parsing using R.

Automated Text Summarisation

This tutorial shows how to summarise texts automatically using R by extracting the most prototypical sentences.

Spell Checking

This tutorial shows how to implement and use spell checking in R when working with text data.

Case Studies

This section offers some more specific examples of the kinds of research and analyses you can do with the methods taught in LADAL tutorials. These tutorials require some familiarity with R and RStudio, so you should be comfortable with the content in the R Basics section before proceeding here.

Classifying American Speeches

This tutorial shows how to perform document classification using R. It was created by Gerold Schneider and Max Lauber for the Australian Text Analytics Platform (ATAP).

Corpus Linguistics with R

This tutorial presents different case studies or use cases that highlight how to do corpus-based analyses by implementing procedures shown in other LADAL tutorials.

Analysing learner language using R

This tutorial focuses on learner language and how to analyse differences between learners and L1 speakers of English using R.

Lexicography and Creating Dictionaries with R

This tutorial introduces lexicography with R and shows how to create dictionaries and find synonyms by determining semantic similarity.

Visualising and Analysing Questionnaire and Survey Data

This tutorial offers some advice on what to consider when creating surveys and questionnaires, provides tips on visualising survey data, and exemplifies how survey and questionnaire data can be analysed.

Creating Vowel Charts in R

This tutorial exemplifies how to create a vowel chart with Praat and R.

Computational Literary Stylistics with R

This tutorial focuses on computational literary stylistics (also called digital literary stylistics) and shows how fictional texts can be analysed using computational means.

Reinforcement Learning and Text Summarisation in R

This tutorial introduces the concept of Reinforcement Learning (RL) and shows how it can be applied in the domain of Natural Language Processing (NLP) and linguistics.

How-Tos

This section includes quick tutorials showing you how to access and/or convert data which could then be analysed using methods taught in other LADAL tutorials.

Converting PDFs to txt

This tutorial shows how to extract text from one or more PDF files using optical character recognition (OCR) and then save the text(s) as txt files on your computer.

Creating R Notebooks with Markdown

This tutorial shows how to create R Notebooks, using markdown for formatting, to document your analyses and workflow so that your research project is transparent and reproducible.

Creating free online ebooks with bookdown

This tutorial shows how to create free online ebooks with bookdown that are launched from GitHub. Such ebooks can serve as course books for students, provide additional information about your analyses, or simply let you write and publish free online books.

Creating interactive Jupyter notebooks

This tutorial shows how to create interactive Jupyter notebooks that can be launched from GitHub. Such notebooks are ideal for teaching, sharing reproducible research, building tutorials, or developing live documentation that includes code, visualisations, and narrative text.

Downloading Texts from Project Gutenberg

This tutorial shows how to download and clean works from the Project Gutenberg archive using R. Project Gutenberg is a database which contains roughly 60,000 texts for which the US copyright has expired.
