TUTORIALS
Welcome to LADAL tutorials. This page contains all tutorials provided by LADAL, organised in seven sections. The first section, Data Science Basics, offers some useful background reading which introduces key concepts and best practices in digital and quantitative research. The rest of the sections mainly consist of more practical, follow-along tutorials. The first of these, R Basics, introduces the programming language, R, which is used for all other LADAL tutorials and tools. Next is Data Visualisation, which will show you how to create visual representations of data (such as graphs and tables) using R. The Statistics section covers statistical methods with R, from simple overviews of descriptive and basic inferential statistics through to specific statistical models that may be useful when working with text and language data. The Text Analytics section introduces a range of computational methods for analysing text, covering both the theoretical background of text analytics and practical, hands-on lessons. Next, we have a Case Studies section, which offers more specific examples of the various kinds of research you can do with the methods shown in LADAL tutorials. Finally, the How-Tos section includes some brief tutorials on accessing or converting data which you could then use to apply the methods taught in other tutorials.
Generally, we recommend starting with R Basics, as the content covered in this section will be assumed knowledge for all subsequent sections. Once you are familiar with R and RStudio, you can move on to Data Visualisation, Statistics, and/or Text Analytics, referring to the Data Science Basics section as needed. If you are completely new to computational methods and you’d like to explore whether LADAL will be helpful for your work, you might like to first have a quick look through Introduction to Text Analysis and some of our Case Studies.
Data Science Basics
This section is an introduction to digital and quantitative research, covering best practices for working with digital data, key principles behind reproducible research, and the basic building blocks of science and quantitative analysis. These tutorials provide useful theoretical background for the practical tutorials in other sections, and can be supplemented as needed for your particular goals.
This tutorial provides advice and general tips on how to keep your computer clean and running smoothly, how to organise files and folders, and how to store your data safely and in an orderly way.
Data Management and Reproducibility
This tutorial introduces basic data management techniques, version control measures, and issues relating to reproducible research.
Introduction to Quantitative Reasoning
This tutorial takes a philosophical or history-of-ideas approach and introduces the logical and cognitive underpinnings of the scientific method.
Basic Concepts in Quantitative Research
This tutorial introduces basic concepts of data analysis and quantitative research.
R Basics
This section introduces the programming language R, which is the basis for all other LADAL tutorials and tools. The content covered here will be assumed knowledge for all subsequent sections, so we recommend starting here if you are not already familiar with R. These tutorials are designed to be worked through in order, before moving on to the other section(s) that you are interested in.
This site provides our reasoning for focusing (almost exclusively) on R in LADAL.
This tutorial shows how to get started with R. It focuses specifically on R for analysing language data, but it offers valuable information for anyone who wants to get started with R.
This tutorial shows how you can load and save different types of data when working with R.
This tutorial introduces string processing and how it can be used when working with language data.
This tutorial introduces regular expressions and how they can be used when working with language data.
This tutorial shows how to work with tables and how to tabulate data in R.
Data Visualisation
This section introduces some simple principles of data visualisation and shows you how to create visual representations of your data using R. The tutorials in this section require some familiarity with R and RStudio, so you should be comfortable with the content in R Basics before proceeding here.
This tutorial introduces data visualisation using R and shows how to modify different types of visualisations in the ggplot framework in R.
This tutorial introduces different types of data visualisation and how to prepare your data for different plot types.
Showcase: How to create a typological map
This tutorial shows how to create typological maps in R with leaflet.
Statistics
This section covers various statistical methods and how to apply them using R. The following tutorials require some familiarity with R and RStudio, so you should be comfortable with the content in R Basics before proceeding here. Some tutorials in this section also assume familiarity with Data Visualisation using R, so we also recommend completing that section to get the most out of these tutorials. If you are looking for a simple conceptual introduction to statistics and quantitative analysis, you may find it helpful to start with Introduction to Quantitative Reasoning and Basic Concepts in Quantitative Research before proceeding here.
The tutorials in this section do not necessarily have to be completed in order. We recommend starting with Descriptive Statistics and Basic Inferential Statistics as an introduction, and then moving on to the more in-depth tutorials that are relevant to you.
This tutorial focuses on how to describe and summarise data in R.
This tutorial introduces basic inferential procedures for null-hypothesis testing.
This tutorial introduces regression analyses (also called regression modelling) using R. Regression models are among the most widely used quantitative methods in the language sciences to assess if and how predictors (variables or interactions between variables) correlate with a certain response.
This tutorial introduces mixed-effects modelling in R. Mixed models are widely used in the language sciences to assess if and how predictors correlate with a certain response when the data are hierarchical.
This tutorial focuses on tree-based models and their implementation in R.
Cluster and Correspondence Analysis
This tutorial introduces classification and clustering using R. Cluster analyses fall within the domain of classification methods which are used to find groups or patterns in data or to predict group membership.
Introduction to Lexical Similarity
This tutorial introduces Text Similarity, i.e. how close or similar two pieces of text are with respect to either their use of words or characters (lexical similarity) or in terms of meaning (semantic similarity).
This tutorial introduces Semantic Vector Space (SVM) modelling using R. SVMs are used to find groups or patterns in data or to predict group membership.
This tutorial introduces selected dimension reduction methods (Principal Component Analysis, Factor Analysis, and Multidimensional Scaling), which allow you to detect and evaluate structures, called components, latent variables, or factors, underlying observed variables.
This tutorial introduces power analysis using R. Power analysis is a method primarily used to determine the appropriate sample size for empirical studies.
Text Analytics
This section introduces text analysis using R and covers various text analytics methods. These tutorials require some familiarity with R and RStudio, so you should be comfortable with the content in the R Basics section before proceeding here.
The tutorials in this section do not necessarily need to be completed in order. Feel free to skip ahead to the tutorial that is relevant to your work if you know exactly what you’re looking for; otherwise, we recommend starting with Introduction to Text Analysis for some of the theoretical background and relevant terms and concepts, and then moving on to Practical Overview of Selected Text Analytics Methods to get an idea of the kinds of methods you can apply. At this point, you can move on to the more in-depth tutorials that are relevant to you.
This tutorial introduces Text Analysis, i.e. computer-based analysis of language data or the (semi-)automated extraction of information from text.
Practical Overview of Selected Text Analytics Methods
This tutorial showcases some basic but useful methods for text analysis and serves as a practical overview of, or introduction to, text analytics and distant reading.
Concordancing (keywords-in-context)
This tutorial shows how to find words or phrases in text and display them as concordances, so-called keyword-in-context (KWIC) displays, with R.
Collocation and N-gram Analysis
This tutorial introduces collocation analysis and identifying N-grams with R and shows how to extract and visualise semantic links between words.
This tutorial introduces keyness analysis and identifying keywords with R and shows how to visualise keywords.
This tutorial introduces network analysis using R. Network analysis is a method for visualisation that can be used to represent various types of data.
This tutorial introduces topic modelling using R.
This tutorial introduces sentiment analysis (SA) and shows how to perform SA in R.
This tutorial introduces part-of-speech tagging and syntactic parsing using R.
This tutorial shows how to summarise texts automatically using R by extracting the most prototypical sentences.
This tutorial shows how to implement and use spell checking in R when working with text data.
Case Studies
This section offers some more specific examples of the kinds of research and analyses you can do with the methods taught in LADAL tutorials. These tutorials require some familiarity with R and RStudio, so you should be comfortable with the content in the R Basics section before proceeding here.
This tutorial shows how to perform document classification using R. It was created by Gerold Schneider and Max Lauber for the Australian Text Analytics Platform (ATAP).
This section presents different case studies or use cases that highlight how to do corpus-based analyses by implementing procedures shown in other LADAL tutorials.
Analysing learner language using R
This tutorial focuses on learner language and how to analyse differences between learners and L1 speakers of English using R.
Lexicography and Creating Dictionaries with R
This tutorial introduces lexicography with R and shows how to use R to create dictionaries and find synonyms through determining semantic similarity in R.
Visualising and Analysing Questionnaire and Survey Data
This tutorial offers some advice on what to consider when creating surveys and questionnaires, provides tips on visualising survey data, and exemplifies how survey and questionnaire data can be analysed.
This tutorial exemplifies how to create a vowel chart with Praat and R.
Computational Literary Stylistics with R
This tutorial focuses on computational literary stylistics (also called digital literary stylistics) and shows how fictional texts can be analysed using computational means.
Reinforcement Learning and Text Summarisation in R
This tutorial introduces the concept of Reinforcement Learning (RL), and how it can be applied in the domain of Natural Language Processing (NLP) and linguistics.
How-Tos
This section includes quick tutorials showing you how to access and/or convert data which could then be analysed using methods taught in other LADAL tutorials.
This tutorial shows how to extract text from one or more PDF files using optical character recognition (OCR) and then save the text(s) as txt files on your computer.
Downloading Texts from Project Gutenberg
This tutorial shows how to download and clean works from the Project Gutenberg archive using R. Project Gutenberg is a database which contains roughly 60,000 texts for which the US copyright has expired.