The LADAL Webinar Series 2022 consists of six webinars (online presentations) by speakers with backgrounds in linguistics, data science, or computational humanities, and covers topics related to the computational handling of language data. All recordings of the webinar series are available on the LADAL YouTube channel.
Details about upcoming and past webinars that are part of the LADAL Webinar Series 2022 can be found below.
All events were announced on Twitter (@slcladal), via the UQ School of Languages and Cultures, and via our collaborators, so please follow us if you would like to keep up with the activities at LADAL. Below are links to recordings of past webinars on our YouTube channel.
Below you will find the details of the webinars, including abstracts, bioblurbs of the speakers, and additional resources. Please note that we have only included confirmed webinars (online presentations) at the moment, so more will be added once they are confirmed. Stay tuned and check this space if you want to find out more.
July 4, 2022, 8 pm (Brisbane, 12 pm Berlin, 11 am Birmingham)
Analyzing Longitudinal Data
Zoom link: https://uqz.zoom.us/j/86849442143
This seminar offers a (very) brief introduction to longitudinal data and their analysis, focusing on regression. To start with, we will look at longitudinal data and different designs for collecting such data. We will then look at some empirical observations and why they occur before turning our attention to simple linear regression and why it is generally not appropriate for the analysis of such data. Understanding these points is very important but often neglected in work with longitudinal data. We will then, very briefly, introduce two methods for appropriately analysing such data: (i) mixed models and (ii) Generalised Estimating Equations.
Dimitrios is a biostatistician and the head of the Research Methods Groups at the Queensland University of Technology. He is also the President of the QLD branch of the Statistical Society of Australia and a member of the Accreditation committee of the same society. He holds the highest professional accreditation of the American Statistical Association and the Royal Statistical Society, and has studied animal science, quantitative genetics, and digital biology. His main job is to help researchers do better research, and he enjoys working in multidisciplinary teams.
November 7, 2022, 8pm (Brisbane, 11am Berlin, 10am Birmingham)
Bayesian generalized linear mixed models with brms
Zoom link: https://uqz.zoom.us/j/86849442143
Linguistics is undergoing a rapid shift away from significance tests towards approaches emphasizing parameter estimation, such as linear mixed effects models. Alongside this shift, another revolution is underway: away from using p-values as part of a “null ritual” (Gigerenzer, 2004) towards Bayesian models. Both shifts can be handled nicely with the ‘brms’ package (Bürkner, 2017). After briefly reviewing why we shouldn’t blindly follow the “null ritual” of significance testing, I will demonstrate how easy it is to fit quite complex models using this package. I will also talk about how mixed models are used in different subfields of linguistics (Winter & Grice, 2021), and why established practices such as dropping random slopes for non-converging models are a further reason to go Bayesian. Finally, I will briefly touch on issues relating to prior specification, especially the importance of weakly informative priors to prevent overfitting.
Bodo Winter is a Senior Lecturer at the Department of Linguistics at the University of Birmingham, a UKRI Future Leaders Fellow, a Fellow of the Institute for Interdisciplinary Data Science and AI, and Editor-in-Chief of the journal Language and Cognition. Dr. Winter received his PhD in Cognitive and Information Sciences from the University of California, Merced. His research focuses on multimodality, sound symbolism, gesture, and metaphor.
December 5, 2022, 8pm (Brisbane, 11am Berlin, 12pm Helsinki)
Found in Translation - What can we learn from translations about languages and human communication
Zoom link: https://uqz.zoom.us/j/86849442143
The goal of language technology is to create computational models that can understand and generate language the way humans do. One strategy is to learn such communication abilities from real-world data and in that way somewhat resemble humans and their capability of picking up language skills through practical experience. However, the crucial question is what kind of experience is needed and what kind of tasks have to be practiced to build an understanding of human language signals. We are currently running a project that studies the use of translations, which form natural semantic mirrors of original texts in other languages, as a means of providing information about the underlying latent meaning that corresponds to the observable language string. The big question is what kind of abstractions can be learned from this cross-lingual signal and how much they reflect our knowledge about linguistic properties on various levels. Part of this question is how much language diversity can be used to push abstraction levels even further. In this talk I will present some of our results and try to connect this kind of neural “black-box” NLP with questions in general linguistics and cognition.
Jörg Tiedemann is professor of language technology at the Department of Digital Humanities at the University of Helsinki. He received his PhD in computational linguistics for work on bitext alignment and machine translation from Uppsala University before moving to the University of Groningen for 5 years of post-doctoral research on question answering and information extraction. His main research interests are connected with massively multilingual data sets and data-driven natural language processing and he currently runs an ERC-funded project on representation learning and natural language understanding.
The Spread of Lexical Innovation is Constrained by Cultural Patterns
This talk was recorded March 7, 2022, as part of the LADAL Webinar Series 2022.
Recording link: https://youtu.be/sW70Y6XDiRA
In this talk I discuss the results of three studies on the geographical diffusion of lexical innovation, all of which are based on the analysis of multibillion-word corpora of Twitter data collected between 2013 and 2014. In the first study, I track the spread of new words across the US. In the second study, I zoom in and look at how these same words spread out across New York City. In the third study, I consider how lexical items from Multicultural London English have diffused across the UK. In all cases, I show that the spread of lexical innovation is constrained not only by physical distance and population density, as predicted by the Wave and Gravity Models, but also by cultural patterns and boundaries.
Jack Grieve is Professor of Corpus Linguistics at the University of Birmingham and Turing Fellow at the Alan Turing Institute. His research involves analysing large corpora of natural language to understand language variation and change. He is especially interested in grammatical and lexical variation in the English language across time, space and communicative context, as well as developing methods for quantitative linguistic analysis. Jack also conducts research on authorship analysis and sometimes consults on casework as a forensic linguist. You can get in touch with Jack at email@example.com or via his Twitter handle @JWGrieve.
The Archive as Subject rather than Source: a Roadmap to Constructing and Disseminating a Digital Archive
This talk was recorded May 9, 2022, as part of the LADAL Webinar Series 2022.
Recording link: https://youtu.be/VZnV3pI8wyw
Archives are valuable resources in historical media and communication research. Unfortunately, many collections are hard to find and access, and have not been properly digitised and described. In this paper we argue that computational methods are instrumental in engaging with and digitising archive materials, even though these methods introduce problems of their own. More specifically, we articulate critical thinking on the concept of ‘the archive’ with a discussion of the key decisions and practical hurdles encountered in the construction and digitisation of an archive. This is illustrated by a mid-sized project based on records of communications between the Pentagon and CIA Entertainment Liaison Offices (ELO) and audiovisual production companies around the world, which are key in revealing the influence of the US military on the entertainment industry. However, rather than discussing every step in the project in minute detail, we maximise the relevance for a broad readership by distilling a generic roadmap and key recommendations that cover all stages of strategic decision-making in archive construction and digitisation, as well as demonstrating the required generic technical implementations.
Cedric Courtois is a senior lecturer in the School of Communication and Arts in the Faculty of Humanities and Social Sciences at the University of Queensland. He is both an audience researcher and a methodologist. His research interests include algorithmic impact in digital culture and data science applications in (digital) media and communication research (including text mining and image processing).
Martin Schweinberger and Michael Haugh gave a presentation on ATAP and LADAL at the Griffith Digital Humanities Webinar Series on Friday, May 20, 2022. For more information about this webinar series and the Griffith Centre for Social and Cultural Research see here.