Keep up to date with the current developments at LADAL!
Below you will find information on and links to the latest developments at LADAL such as updates to the LADAL website, upcoming workshops and presentations, planned events, and links to resources.
We are currently looking for user stories (also known as testimonials) to see and show what people use LADAL resources for. If you have used LADAL resources - be it by simply copying some code, attending a workshop, learning about a method using a tutorial, or in any other way - we would be extremely grateful if you would send us your user story!
To submit your user story, simply write up a paragraph describing how you have used LADAL resources and what you have used them for and send it to firstname.lastname@example.org. We really appreciate any feedback from you about this!
The second talk in the LADAL Webinar Series 2022 is by Cedric Courtois about using archives as source and as study object. Cedric is a senior lecturer in the School of Communication and Arts in the Faculty of Humanities and Social Sciences at the University of Queensland. Before joining UQ, Cedric worked, among other places, at the Hans Bredow Institute (HBI), which is now also the Leibniz Institute for Media Research and a collaborating partner of LADAL. He is both an audience researcher and a methodologist. His research interests include algorithmic impact in digital culture and data science applications in (digital) media and communication research (including text mining and image processing).
The LADAL Webinar Series 2022 starts off with a talk by Jack Grieve about cultural and regional constraints on the spread of linguistic innovations. Jack is Professor of Corpus Linguistics at the University of Birmingham and Turing Fellow at the Alan Turing Institute. His research involves analysing large corpora of natural language to understand language variation and change. He is especially interested in grammatical and lexical variation in the English language across time, space and communicative context, as well as developing methods for quantitative linguistic analysis. Jack also conducts research on authorship analysis and sometimes consults on casework as a forensic linguist.
Ben Foley has joined LADAL and we are more than thrilled about this! Ben was the project manager of CoEDL’s Transcription Acceleration Project (TAP) and he has specialized in speech recognition and the development of user-friendly speech recognition tools. Also, Ben’s previous experience with Aboriginal and Torres Strait Islander language resource development has resulted in apps and websites galore.
LADAL is now officially part of the Australian Text Analytics Platform (ATAP). The aim of the Australian Text Analytics Platform (ATAP) is to provide researchers with a toolset that is more powerful and customisable than those contained in the standard packages, while being accessible to a large number of researchers who do not have strong coding skills. A key outcome of the project will be the development of an integrated notebooks-based platform for processing and mining text data.
ATAP is funded by the Australian Research Data Commons (ARDC) Platforms Program and is led by Michael. Martin is part of the steering committee, is a Chief Investigator (CI), and chairs a User Group.
Gregor Wiedemann, who is co-directing the Media Research Methods Lab (MRML) at the Leibniz Institute for Media Research │ Hans Bredow Institute (HBI), has agreed to collaborate with LADAL. We are extremely happy about this here at LADAL given that the MRML is designed as a method-oriented lab that focuses on linking established social science methods (surveys, observations, content analysis, experiments) with new digital methods from the field of computational social science (e.g. automated content analysis, network analysis, log data analysis, experience sampling) across topics and disciplines.
LADAL Opening Webinar Series 2021
FINALLY! We are really happy to announce that LADAL will have its OFFICIAL OPENING EVENT!
The LADAL Opening event will consist of weekly presentations from eminent figures in linguistics, data science, and computational humanities and will cover a wide range of topics related to LADAL-relevant issues!
The first event of the LADAL Opening is a presentation by Stefan Th. Gries on MuPADRF (Multifactorial Prediction and Deviation Analysis Using Regression/Random Forests) on June 3, 2021, 5pm Brisbane time (9am CET). The event will take place on Zoom (the Zoom link will be announced here, on Twitter, and via our collaborators). Click HERE for more information about the weekly LADAL Opening presentations.
Stefan Th. Gries affiliates with LADAL!
We are really proud and feel both honored and privileged that Stefan Th. Gries has agreed to contribute to LADAL! Stefan has played an outstanding role in promoting statistical and computational skills in the language sciences. Stefan’s textbooks Statistics for Linguistics with R – A Practical Introduction and Quantitative Corpus Linguistics with R: A Practical Introduction, as well as his research output and bootcamps, have had a tremendous influence on many linguists working with empirical data!
UQ’s summer research program supports LADAL: four junior academics will assist LADAL, acquire new skills, and produce new materials. A very warm welcome to the summer scholars! We hope that they get as much as possible out of the program and produce great new materials!
VARIENG at the University of Helsinki has agreed to affiliate with LADAL! VARIENG is a perfect affiliate due to the similar outlook and closely aligned aims of VARIENG and LADAL. In addition, we are super happy to have VARIENG as an affiliate institution because of their extremely high scientific merit!
Erich Round - director of the Ancient Languages Lab and world-renowned phylogenetics expert as well as recipient of the British Academy Global Professorship - is now officially a contributor to LADAL! His expertise in R and phylogenetics fantastically complements our skill set here and we are beyond thrilled to have him on board!
Also - and on an equally enthusiastic note - Peter Crosthwaite, the foremost proponent of Data-Driven Learning in Australia, has agreed to be a LADAL affiliate! Peter is not only a fantastic second language acquisition scholar but also has probably one of the best overviews of existing software in this domain and is amazingly well versed in finding the right applications to do awesome research!
Welcome on board!
Laurence Anthony - the royal highness of the AntConc empire - has agreed to be an affiliate of LADAL! We are so glad to have him on board as Laurence is not only tech-savvy like few others in Corpus Linguistics but also overall wholesome and a fantastic promoter of computation in HASS research!
Stephane Guillou has agreed to be a contributor and affiliate of LADAL! That is really fantastic not only because Stephane is all-around awesome and a true R wiz but also because Stephane is directing the upskilling efforts in R, Python, and Git at the UQ library and thus brings along a fantastic skill set!
Monika Bednarek, who is running the Sydney Corpus Lab at the University of Sydney, has agreed to be an affiliate member of LADAL. This is perfect for LADAL given Monika’s expertise and excellent research in Corpus Linguistics as well as the close alignment of the Sydney Corpus Lab with LADAL!
The news that LADAL exists has reached the other side of the globe: Martin was invited by Mikko Laitinen from the University of Eastern Finland to give a guest lecture about his experiences in establishing LADAL in the context of an event about developing support infrastructures for computational social sciences and humanities research.
We have decided to include interactive exercises in our tutorials and are currently looking into different options for achieving this. At the moment, Binder appears to be a viable pathway forward.
We are delighted to announce that Katy McHugh, Stephen Clark, and Restuadi Restuadi have joined the LADAL team!
Katy, Stephen, and Restuadi will be involved in restructuring, professionalizing, and revamping the LADAL webpage. We would like to extend our warmest welcome to them and express our gratitude to the School of Languages and Cultures at UQ for providing the funding for the RA positions.
The LADAL team organizes workshops and LADAL members present their research or information relevant to LADAL at conferences. Below are links to upcoming events (conferences/workshops/presentations) and presentations containing information about LADAL or research based on LADAL.
The LADAL Opening event signifies the official kick-off for LADAL. Originally this kick-off was planned for June 2020 as a 5-day conference with an invited speaker (Stefan Gries), workshops on data science, and social events. Unfortunately, this kick-off had to be postponed due to COVID-19 and will be held as an online event.
HERE you will find updates and the current state of plans relating to the LADAL opening.
Speaker: Martin Schweinberger
Date: 20–24 May 2020
Presentation at ICAME 41 (41st Meeting of the International Computer Archive of Modern and Medieval English). Heidelberg, Germany.
Abstract: This paper addresses issues relating to best practices in Data Management and Data Analysis in Corpus Linguistics (CL) and offers guidelines for compiling, storing, handling, and analysing data according to best practices which guarantee transparency and high quality in CL.
Open Data and Best Practices in Data Science are increasingly attracting attention as a result of the so-called Replication Crisis (RC), an ongoing methodological crisis that began in the early 2010s and primarily affects parts of the social and life sciences (Diener & Biswas-Diener 2019). The RC has contributed to the loss of trust that the Humanities and Social Sciences have experienced over the past two decades (Yong 2018). While a discussion about Best Practices in CL has recently begun (Berez-Kroeker et al. 2018), more attention has to be placed on the causes of the RC and the lessons that can be learnt from it.
CL is somewhat disconnected from current developments in Data Science due to a lack of communication and a lack of awareness of existing resources. This talk aims to raise awareness in CL about existing resources and problematic practices that are still common in CL, and it proposes solutions that are easily implemented and can guarantee transparency, replicability, and high quality of research outputs in CL.
The solutions that this talk focuses on encompass:
being aware of and following the FAIR principles (Findable, Accessible, Interoperable, and Reusable) in data management;
the recognition of corpora as research outputs, which allows corpora to be uniquely indexed (DOIs) and thereby enables corpus compilers to profit from making corpora accessible, as these can be cited like other publications, which increases citation scores and visibility;
the use of Git to share code and data, which is an easy way to share resources free of charge by utilizing existing research infrastructure;
the use of R Notebooks to document analyses and make them available to the community and reviewers to enable full replicability and reproducibility;
making use of documentation and policy protocols in departments, schools and institutes to ease onboarding procedures and prevent data loss and corruption.
The talk thus offers relevant information for authors as well as editors and publishers to enable replication, avoid “bad” research practices, and increase the quality of research.
Berez-Kroeker, A. L., L. Gawne, S. S. Kung, B. F. Kelly, T. Heston, G. Holton, P. Pulsifer, D. I. Beaver, S. Chelliah, S. Dubinsky, et al. (2018). Reproducible research in linguistics: A position statement on data citation and attribution in our field. Linguistics 56(1), 1–18.
Diener, Edward and Biswas-Diener, Robert (2019). The Replication Crisis in Psychology. NOBA Project.
Yong, Ed (2018). Psychology’s Replication Crisis Is Running Out of Excuses. Another big project has found that only half of studies can be repeated. And this time, the usual explanations fall flat. The Atlantic.
Date: 30 October 2019
Speaker: Michael Haugh & Martin Schweinberger
Presentation at the Australian Research Data Commons (ARDC): The Australian eResearch Skilled Workforce Summit. Sydney, Australia, 29-30/7/2019.
Abstract: This presentation introduces the Language Technology and Data Analysis Laboratory (LADAL), and discusses the implications of our experiences to date in establishing it for broader efforts to develop researcher capacity in the digital humanities.
The LADAL is a school-based support infrastructure for digital humanities researchers. It aims to assist staff and postgraduate students within the UQ School of Languages and Cultures to learn how to use data analytics, digital research tools, and other forms of technology to enhance their existing research programs, as well as offer pathways to new research possibilities. It complements the more generic resources and training in digital humanities methods offered by libraries (e.g. the Digital Scholars Hub at UQ) with the more specialised training/support in particular digital research methods and technologies that are required by researchers working on specific languages and cultures.
The LADAL consists of a specialist computing lab for language-based computational and experimental work (the Computational and Experimental Workshop) and an online virtual lab. With respect to web-based materials, the LADAL website (https://slcladal.github.io/index.html) offers self-guided study materials and hands-on tutorials on topics relating to digital tools, computational methods for data extraction and processing, data visualization, statistical analyses of language data, and provides links to further resources and short descriptions of digital tools relevant for digital HASS research.
In addition, the LADAL offers face-to-face consultations and specialized workshops. UQ researchers are encouraged to contact LADAL staff for advice and guidance on matters relating to digital research tools, data visualization, various statistical procedures, and text analytics.
Staff feedback during face-to-face consultations and workshop attendance confirms there is substantial demand for the kind of digital humanities infrastructure offered by LADAL. It also suggests that support and training for researchers in the digital humanities should be conceptualized on a continuum from more generic through to more localized support.
Date: 2 April 2019
Speaker: Martin Schweinberger
Presentation at the Centre of Excellence for the Dynamics of Language (CoEDL) Corpus Workshop. Melbourne, Australia, 2–3/4/2019.
Below are links to additional resources, workshops, and presentations.
Getting started with R for (absolute) beginners: This workshop focused on why you should use R, what you can do with R, and how you can use it for your data analysis.
Statistics – Analyzing Survey and Questionnaire Data: This workshop introduced basic visualizations and statistical tests for analyzing survey and questionnaire data.
Happy Computer, Happy Me!: This workshop showed ways to keep your computer happy and your data clean by providing simple tips and tricks for computer maintenance that keep your computer running at optimum speed and reliability.