This tutorial introduces how to extract concordances and keyword-in-context (KWIC) displays with R.
This tutorial is aimed at beginners and intermediate users of R and showcases how to extract keywords and key phrases from textual data and how to process the resulting concordances using R. The aim is not to provide a fully-fledged analysis but rather to show and exemplify selected useful methods associated with concordancing.
The entire R Notebook for the tutorial can be downloaded here.
If you want to render the R Notebook on your machine, i.e. knitting the
document to html or a pdf, you need to make sure that you have R and
RStudio installed and you also need to download the bibliography
file and store it in the same folder where you store the
Rmd file.
Click
this link to open an interactive version of this tutorial on
MyBinder.org.
This interactive Jupyter notebook allows
you to execute code yourself and you can also change and edit the
notebook, e.g. you can change code and upload your own data.
In the language sciences, concordancing refers to the extraction of words from a given text or texts (Lindquist 2009, 5). Commonly, concordances are displayed in the form of keyword-in-context displays (KWICs) where the search term is shown in context, i.e. with preceding and following words. Concordancing is central to analyses of text and it often represents the first step in more sophisticated analyses of language data (Stefanowitsch 2020). Concordances play such a key role in the language sciences because they are extremely valuable for understanding how a word or phrase is used, how often it is used, and in which contexts it is used. As concordances allow us to analyze the context in which a word or phrase occurs and provide frequency information about word use, they also enable us to analyze collocations or the collocational profiles of words and phrases (Stefanowitsch 2020, 50–51). Finally, extracting examples from concordances is a very common procedure.
Concordances in AntConc.
There are various very good software packages that can be used to create concordances - both for offline use (e.g. AntConc (Anthony 2004), SketchEngine (Kilgarriff et al. 2004), MONOCONC (Barlow 1999), and ParaConc (Barlow 2002)) and online use (see e.g. here).
In addition, many available corpora such as the BYU corpora can be accessed via a web interface that has in-built concordancing functions.
Online concordances extracted from the COCA corpus that is part of the BYU corpora.
While these packages are very user-friendly, offer various additional functionalities, and almost everyone who is engaged in analyzing language has used concordance software, they all suffer from shortcomings that render R a viable alternative. Such issues include that these applications
are black boxes: researchers do not have full control over them and cannot inspect what exactly the software is doing
are not open source
hinder replication, because replication is more time-consuming than with analyses based on notebooks
are commonly not free of charge or have other restrictions on use (a notable exception is AntConc)
R represents an alternative to ready-made concordancing applications because it:
is extremely flexible and enables researchers to perform their entire analysis in a single environment
allows full transparency and documentation as analyses can be based on Notebooks
offers version control measures (this means that the specific versions of the involved software are traceable)
makes research more replicable as entire analyses can be reproduced by simply running the Notebooks that the research is based on
Especially the fact that R enables full transparency and replicability is relevant given the ongoing Replication Crisis (Yong 2018; Aschwanden 2018; Diener and Biswas-Diener 2019; Velasco 2019; McRae 2018). The Replication Crisis is an ongoing methodological crisis primarily affecting parts of the social and life sciences that began in the early 2010s (see also Fanelli 2009). Replication is important so that other researchers, or the public for that matter, can see or, indeed, reproduce exactly what you have done. Fortunately, R allows you to document your entire workflow as you can store everything you do in what is called a script or a notebook (in fact, this document was originally an R notebook). If someone is then interested in how you conducted your analysis, you can simply share this notebook or the script you have written with that person.
Preparation and session set up
This tutorial is based on R. If you have not installed R or are new to it, you will find an introduction to it and more information on how to use R here. For this tutorial, we need to install certain packages from an R library so that the scripts shown below are executed without errors. Before turning to the code below, please install the packages by running the code below this paragraph. If you have already installed the packages mentioned below, you can skip ahead and ignore this section. To install the necessary packages, simply run the following code - it may take some time (between 1 and 5 minutes), so do not worry if it takes a while.
# install packages
install.packages("quanteda")
install.packages("dplyr")
install.packages("stringr")
install.packages("flextable")
# install klippy for copy-to-clipboard button in code chunks
install.packages("remotes")
remotes::install_github("rlesur/klippy")
Now that we have installed the packages, we activate them as shown below.
# activate packages
library(quanteda)
library(dplyr)
library(stringr)
library(flextable)
# activate klippy for copy-to-clipboard button
klippy::klippy()
Once you have installed R and RStudio and also initiated the session by executing the code shown above, you are good to go.
For this tutorial, we will use Lewis Carroll’s Alice’s Adventures in Wonderland. You can use the code below to load this text into R (but you have to have access to the internet to do so).
text <- base::readRDS(url("https://slcladal.github.io/data/alice.rda", "rb"))
. |
Alice’s Adventures in Wonderland |
by Lewis Carroll |
CHAPTER I. |
Down the Rabbit-Hole |
Alice was beginning to get very tired of sitting by her sister on the |
bank, and of having nothing to do: once or twice she had peeped into |
the book her sister was reading, but it had no pictures or |
conversations in it, “and what is the use of a book,” thought Alice |
“without pictures or conversations?” |
So she was considering in her own mind (as well as she could, for the |
The table above shows that the example text requires formatting so that we can use it. Therefore, we collapse it into a single object (or text) and remove superfluous white spaces.
text <- text %>%
# collapse lines into a single text
paste0(collapse = " ") %>%
# remove superfluous white spaces
str_squish()
. |
Alice’s Adventures in Wonderland by Lewis Carroll CHAPTER I. Down the Rabbit-Hole Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, “and what is the use of a book,” thought Alice “without pictures or conversations?” So she was considering in her own mind (as well as she could, for the hot day made her feel very sleepy and stupid), whether the pleasure of making a daisy-chain would be worth the trouble of getting up and picking the daisies, when suddenly a White Rabbit with pink eyes ran close by her. There was nothing so _very_ remarkable in that; nor did Alice think it so _very_ much out of the way to hear the Rabbit say to itself, “Oh dear! Oh dear! I shall be late!” (when she thought it over afterwards, it occurred to her that she ought to have wondered at this, but at the time it all seemed quite natural); but when |
The result confirms that the entire text is now combined into a single character object.
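We can quickly verify this: length should return 1 (a single element) and nchar the number of characters in that element. This is a minimal sanity check (the exact character count depends on the version of the text you downloaded).
# check that the text consists of a single element
length(text)
# check how many characters this element contains
nchar(text)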
Now that we have loaded the data, we can easily extract concordances using the kwic function from the quanteda package. The kwic function takes the text (x) and the search pattern (pattern) as its main arguments, but it also allows the specification of the context window, i.e. how many words/elements are shown to the left and right of the keyword (we will go over this later on).
kwic_alice <- kwic(
# define text
text,
# define search pattern
pattern = "Alice")
docname | from | to | pre | keyword | post | pattern |
text1 | 14 | 14 | I . Down the Rabbit-Hole | Alice | was beginning to get very | Alice |
text1 | 73 | 73 | a book , " thought | Alice | " without pictures or conversations | Alice |
text1 | 153 | 153 | in that ; nor did | Alice | think it so _very_ much | Alice |
text1 | 239 | 239 | and then hurried on , | Alice | started to her feet , | Alice |
text1 | 309 | 309 | In another moment down went | Alice | after it , never once | Alice |
text1 | 348 | 348 | down , so suddenly that | Alice | had not a moment to | Alice |
text1 | 531 | 531 | " Well ! " thought | Alice | to herself , " after | Alice |
text1 | 657 | 657 | for , you see , | Alice | had learnt several things of | Alice |
text1 | 731 | 731 | got to ? " ( | Alice | had no idea what Latitude | Alice |
text1 | 924 | 924 | else to do , so | Alice | soon began talking again . | Alice |
You will see that you get a warning stating that you should use tokens before extracting concordances. This can be done as shown below. Also, we can specify the package from which we want to use a function by adding the package name plus :: before the function (see below).
kwic_alice <- quanteda::kwic(
# define and tokenize text
quanteda::tokens(text),
# define search pattern
pattern = "alice")
docname | from | to | pre | keyword | post | pattern |
text1 | 14 | 14 | I . Down the Rabbit-Hole | Alice | was beginning to get very | alice |
text1 | 73 | 73 | a book , " thought | Alice | " without pictures or conversations | alice |
text1 | 153 | 153 | in that ; nor did | Alice | think it so _very_ much | alice |
text1 | 239 | 239 | and then hurried on , | Alice | started to her feet , | alice |
text1 | 309 | 309 | In another moment down went | Alice | after it , never once | alice |
text1 | 348 | 348 | down , so suddenly that | Alice | had not a moment to | alice |
text1 | 531 | 531 | " Well ! " thought | Alice | to herself , " after | alice |
text1 | 657 | 657 | for , you see , | Alice | had learnt several things of | alice |
text1 | 731 | 731 | got to ? " ( | Alice | had no idea what Latitude | alice |
text1 | 924 | 924 | else to do , so | Alice | soon began talking again . | alice |
We can easily extract the frequency of the search term (alice) using the nrow or the length functions, which provide the number of rows of a table (nrow) or the length of a vector (length).
nrow(kwic_alice)
## [1] 386
length(kwic_alice$keyword)
## [1] 386
The results show that there are 386 instances of the search term (alice), but we can also find out how often different variants (lower case versus upper case) of the search term were found using the table function. This is especially useful when searches involve many different search terms (while it is, admittedly, less useful in the present example; a sketch with two search terms follows below).
table(kwic_alice$keyword)
##
## Alice
## 386
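To illustrate why the table function is helpful when several search terms are involved, the following sketch searches for two patterns at once (quanteda's kwic accepts a character vector of patterns) and then tabulates how often each retrieved keyword occurs. The second search term (rabbit) is chosen for illustration only.
# search for two terms at once
kwic_multi <- quanteda::kwic(quanteda::tokens(text),
                             pattern = c("alice", "rabbit"))
# tabulate the frequencies of the retrieved keywords
table(kwic_multi$keyword)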
To get a better understanding of the use of a word, it is often useful to extract more context. This is easily done by increasing the size of the context window. To do this, we specify the window argument of the kwic function. In the example below, we set the context window size to 10 words/elements rather than using the default (which is 5 words/elements).
kwic_alice_longer <- kwic(
# define text
text,
# define search pattern
pattern = "alice",
# define context window size
window = 10)
docname | from | to | pre | keyword | post | pattern |
text1 | 14 | 14 | Wonderland by Lewis Carroll CHAPTER I . Down the Rabbit-Hole | Alice | was beginning to get very tired of sitting by her | alice |
text1 | 73 | 73 | what is the use of a book , " thought | Alice | " without pictures or conversations ? " So she was | alice |
text1 | 153 | 153 | was nothing so _very_ remarkable in that ; nor did | Alice | think it so _very_ much out of the way to | alice |
text1 | 239 | 239 | and looked at it , and then hurried on , | Alice | started to her feet , for it flashed across her | alice |
text1 | 309 | 309 | rabbit-hole under the hedge . In another moment down went | Alice | after it , never once considering how in the world | alice |
text1 | 348 | 348 | , and then dipped suddenly down , so suddenly that | Alice | had not a moment to think about stopping herself before | alice |
text1 | 531 | 531 | she fell past it . " Well ! " thought | Alice | to herself , " after such a fall as this | alice |
text1 | 657 | 657 | I think - " ( for , you see , | Alice | had learnt several things of this sort in her lessons | alice |
text1 | 731 | 731 | what Latitude or Longitude I've got to ? " ( | Alice | had no idea what Latitude was , or Longitude either | alice |
text1 | 924 | 924 | down . There was nothing else to do , so | Alice | soon began talking again . " Dinah'll miss me very | alice |
EXERCISE TIME!
kwic_confused <- kwic(x = text, pattern = "confused")
## Warning: 'kwic.character()' is deprecated. Use 'tokens()' first.
# inspect
kwic_alice %>%
as.data.frame() %>%
head(10)
## docname from to pre keyword
## 1 text1 14 14 I . Down the Rabbit-Hole Alice
## 2 text1 73 73 a book , " thought Alice
## 3 text1 153 153 in that ; nor did Alice
## 4 text1 239 239 and then hurried on , Alice
## 5 text1 309 309 In another moment down went Alice
## 6 text1 348 348 down , so suddenly that Alice
## 7 text1 531 531 " Well ! " thought Alice
## 8 text1 657 657 for , you see , Alice
## 9 text1 731 731 got to ? " ( Alice
## 10 text1 924 924 else to do , so Alice
## post pattern
## 1 was beginning to get very alice
## 2 " without pictures or conversations alice
## 3 think it so _very_ much alice
## 4 started to her feet , alice
## 5 after it , never once alice
## 6 had not a moment to alice
## 7 to herself , " after alice
## 8 had learnt several things of alice
## 9 had no idea what Latitude alice
## 10 soon began talking again . alice
kwic(x = text, pattern = "wondering") %>%
as.data.frame() %>%
nrow()
## Warning: 'kwic.character()' is deprecated. Use 'tokens()' first.
## [1] 7
kwic_strange <- kwic(x = text, pattern = "strange")
## Warning: 'kwic.character()' is deprecated. Use 'tokens()' first.
# inspect
kwic_strange %>%
as.data.frame() %>%
head(5)
## docname from to pre keyword
## 1 text1 3547 3547 her voice sounded hoarse and strange
## 2 text1 13273 13273 , that it felt quite strange
## 3 text1 33247 33247 remember them , all these strange
## 4 text1 33458 33458 her became alive with the strange
## 5 text1 33784 33784 and eager with many a strange
## post pattern
## 1 , and the words did strange
## 2 at first ; but she strange
## 3 Adventures of hers that you strange
## 4 creatures of her little sister's strange
## 5 tale , perhaps even with strange
While extracting single words is very common, you may want to extract more than just one word. To extract phrases, all you need to do is to specify that the pattern you are looking for is a phrase, as shown below.
kwic_pooralice <- kwic(text, pattern = phrase("poor alice"))
docname | from | to | pre | keyword | post | pattern |
text1 | 1,555 | 1,556 | go through , " thought | poor Alice | , " it would be | poor alice |
text1 | 2,144 | 2,145 | ; but , alas for | poor Alice | ! when she got to | poor alice |
text1 | 2,346 | 2,347 | use now , " thought | poor Alice | , " to pretend to | poor alice |
text1 | 2,901 | 2,902 | to the garden door . | Poor Alice | ! It was as much | poor alice |
text1 | 3,624 | 3,625 | right words , " said | poor Alice | , and her eyes filled | poor alice |
text1 | 6,926 | 6,927 | mean it ! " pleaded | poor Alice | . " But you're so | poor alice |
text1 | 7,340 | 7,341 | more ! " And here | poor Alice | began to cry again , | poor alice |
text1 | 8,299 | 8,300 | at home , " thought | poor Alice | , " when one wasn't | poor alice |
text1 | 11,910 | 11,911 | to it ! " pleaded | poor Alice | in a piteous tone . | poor alice |
text1 | 19,287 | 19,288 | " This answer so confused | poor Alice | , that she let the | poor alice |
You may also want to extract more or less fixed patterns rather than exact words or phrases. To search for patterns that allow variation rather than specific, exactly-defined words, you need to include regular expressions in your search pattern.
EXERCISE TIME!
kwic_thehatter <- kwic(x = text, pattern = phrase("the hatter"))
## Warning: 'kwic.character()' is deprecated. Use 'tokens()' first.
# inspect
kwic_thehatter %>%
as.data.frame() %>%
head(10)
## docname from to pre keyword
## 1 text1 16710 16711 wish I'd gone to see the Hatter
## 2 text1 16741 16742 and the March Hare and the Hatter
## 3 text1 16993 16994 wants cutting , " said the Hatter
## 4 text1 17039 17040 it's very rude . " The Hatter
## 5 text1 17187 17188 a bit ! " said the Hatter
## 6 text1 17312 17313 with you , " said the Hatter
## 7 text1 17347 17348 , which wasn't much . The Hatter
## 8 text1 17425 17426 days wrong ! " sighed the Hatter
## 9 text1 17476 17477 in as well , " the Hatter
## 10 text1 17591 17592 should it ? " muttered the Hatter
## post pattern
## 1 instead ! " CHAPTER VII the hatter
## 2 were having tea at it the hatter
## 3 . He had been looking the hatter
## 4 opened his eyes very wide the hatter
## 5 . " You might just the hatter
## 6 , and here the conversation the hatter
## 7 was the first to break the hatter
## 8 . " I told you the hatter
## 9 grumbled : " you shouldn't the hatter
## 10 . " Does _your_ watch the hatter
kwic_thehatter %>%
as.data.frame() %>%
nrow()
## [1] 51
kwic_thecat <- kwic(x = text, pattern = phrase("the cat"))
## Warning: 'kwic.character()' is deprecated. Use 'tokens()' first.
# inspect
kwic_thecat %>%
as.data.frame() %>%
head(5)
## docname from to pre keyword post
## 1 text1 946 947 ! " ( Dinah was the cat . ) " I hope
## 2 text1 15756 15757 a few yards off . The Cat only grinned when it saw
## 3 text1 15881 15882 get to , " said the Cat . " I don't much
## 4 text1 15907 15908 you go , " said the Cat . " - so long
## 5 text1 15937 15938 do that , " said the Cat , " if you only
## pattern
## 1 the cat
## 2 the cat
## 3 the cat
## 4 the cat
## 5 the cat
Regular expressions allow you to search for abstract patterns rather than concrete words or phrases, which gives you enormous flexibility in what you can retrieve. A regular expression (in short also called regex or regexp) is a special sequence of characters that describes a pattern. You can think of regular expressions as very powerful combinations of wildcards or as wildcards on steroids. For example, the sequence [a-z]{1,3} is a regular expression that stands for one to three lower case characters; if you searched for this regular expression, you would get, for instance, is, a, an, of, the, my, our, and many other short words as results.
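As a minimal sketch of this, we can apply the str_extract_all function from the stringr package to a made-up sentence (the word edges \\b, which are explained further below, ensure that we match entire short words rather than parts of longer words).
# extract all words consisting of one to three lower case characters
stringr::str_extract_all("the cat sat on my mat", "\\b[a-z]{1,3}\\b")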
There are three basic types of regular expressions:
regular expressions that stand for individual symbols and determine frequencies
regular expressions that stand for classes of symbols
regular expressions that stand for structural properties
The regular expressions below show the first type of regular expressions, i.e. regular expressions that stand for individual symbols and determine frequencies.
RegEx Symbol/Sequence | Explanation | Example |
? | The preceding item is optional and will be matched at most once | walk[a-z]? = walk, walks |
* | The preceding item will be matched zero or more times | walk[a-z]* = walk, walks, walked, walking |
+ | The preceding item will be matched one or more times | walk[a-z]+ = walks, walked, walking |
{n} | The preceding item is matched exactly n times | walk[a-z]{2} = walked |
{n,} | The preceding item is matched n or more times | walk[a-z]{2,} = walked, walking |
{n,m} | The preceding item is matched at least n times, but not more than m times | walk[a-z]{2,3} = walked, walking |
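The following sketch illustrates some of these quantifiers on a small made-up vector of word forms (str_subset from the stringr package returns the elements that match; the anchors ^ and $, which are explained further below, ensure that the whole word form is matched).
# example word forms
words <- c("walk", "walks", "walked", "walking")
# ? : at most one additional lower case character (walk, walks)
stringr::str_subset(words, "^walk[a-z]?$")
# + : one or more additional lower case characters (walks, walked, walking)
stringr::str_subset(words, "^walk[a-z]+$")
# {2,} : two or more additional lower case characters (walked, walking)
stringr::str_subset(words, "^walk[a-z]{2,}$")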
The regular expressions below show the second type of regular expressions, i.e. regular expressions that stand for classes of symbols.
RegEx Symbol/Sequence | Explanation |
[ab] | lower case a and b |
[AB] | upper case a and b |
[12] | digits 1 and 2 |
[:digit:] | digits: 0 1 2 3 4 5 6 7 8 9 |
[:lower:] | lower case characters: a–z |
[:upper:] | upper case characters: A–Z |
[:alpha:] | alphabetic characters: a–z and A–Z |
[:alnum:] | digits and alphabetic characters |
[:punct:] | punctuation characters: . , ; etc. |
[:graph:] | graphical characters: [:alnum:] and [:punct:] |
[:blank:] | blank characters: Space and tab |
[:space:] | space characters: Space, tab, newline, and other space characters |
[:print:] | printable characters: [:alnum:], [:punct:] and [:space:] |
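The brief sketch below shows two of these classes in action (the example strings are made up; we again use functions from the stringr package).
# remove all punctuation characters from a string
stringr::str_remove_all("Oh dear! Oh dear! I shall be late!", "[:punct:]")
# extract all sequences of digits from a string
stringr::str_extract_all("CHAPTER 1, page 23", "[:digit:]+")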
The regular expressions that denote classes of symbols are enclosed in [: and :]. The last type of regular expressions, i.e. regular expressions that stand for structural properties, are shown below.
RegEx Symbol/Sequence | Explanation |
\\w | Word characters: [[:alnum:]_] |
\\W | No word characters: [^[:alnum:]_] |
\\s | Space characters: [[:blank:]] |
\\S | No space characters: [^[:blank:]] |
\\d | Digits: [[:digit:]] |
\\D | No digits: [^[:digit:]] |
\\b | Word edge |
\\B | No word edge |
< | Word beginning |
> | Word end |
^ | Beginning of a string |
$ | End of a string |
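To see the effect of the word edge \\b, which we will use in the search below, consider this minimal sketch: \\balice only matches where alice begins at a word edge, so it matches alice but not malice.
# \\balice matches in "alice" but not in "malice"
stringr::str_detect(c("alice", "malice"), "\\balice")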
To include regular expressions in your KWIC searches, you include them in your search pattern and set the argument valuetype to "regex". The search pattern "\\balic.*|\\bhatt.*" retrieves elements that contain alic and hatt followed by any characters and where the a in alic and the h in hatt are at a word boundary, i.e. where they are the first letters of a word. Hence, our search would not retrieve words like malice or shatter. The | is an operator (like +, -, or *) that stands for or.
# define search patterns
patterns <- c("\\balic.*|\\bhatt.*")
kwic_regex <- kwic(
# define text
text,
# define search pattern
patterns,
# define valuetype
valuetype = "regex")
docname | from | to | pre | keyword | post | pattern |
text1 | 1 | 1 | Alice's | Adventures in Wonderland by Lewis | \balic.*|\bhatt.* | |
text1 | 14 | 14 | I . Down the Rabbit-Hole | Alice | was beginning to get very | \balic.*|\bhatt.* |
text1 | 73 | 73 | a book , " thought | Alice | " without pictures or conversations | \balic.*|\bhatt.* |
text1 | 153 | 153 | in that ; nor did | Alice | think it so _very_ much | \balic.*|\bhatt.* |
text1 | 239 | 239 | and then hurried on , | Alice | started to her feet , | \balic.*|\bhatt.* |
text1 | 309 | 309 | In another moment down went | Alice | after it , never once | \balic.*|\bhatt.* |
text1 | 348 | 348 | down , so suddenly that | Alice | had not a moment to | \balic.*|\bhatt.* |
text1 | 531 | 531 | " Well ! " thought | Alice | to herself , " after | \balic.*|\bhatt.* |
text1 | 657 | 657 | for , you see , | Alice | had learnt several things of | \balic.*|\bhatt.* |
text1 | 731 | 731 | got to ? " ( | Alice | had no idea what Latitude | \balic.*|\bhatt.* |
EXERCISE TIME!
kwic_exu <- kwic(x = text, pattern = ".*exu.*", valuetype = "regex")
## Warning: 'kwic.character()' is deprecated. Use 'tokens()' first.
# inspect
kwic_exu %>%
as.data.frame() %>%
head(10)
## [1] docname from to pre keyword post pattern
## <0 rows> (or 0-length row.names)
kwic(x = text, pattern = "\\bpit.*", valuetype = "regex") %>%
as.data.frame() %>%
nrow()
## Warning: 'kwic.character()' is deprecated. Use 'tokens()' first.
## [1] 5
kwic(x = text, pattern = "ption\\b", valuetype = "regex") %>%
as.data.frame() %>%
head(5)
## Warning: 'kwic.character()' is deprecated. Use 'tokens()' first.
## docname from to pre keyword
## 1 text1 5823 5823 adjourn , for the immediate adoption
## post pattern
## 1 of more energetic remedies - ption\\b
Quite often, we only want to retrieve patterns if they occur in a certain context. For instance, we might be interested in instances of alice, but only if the preceding word is poor or little. Such conditional concordances could be extracted using regular expressions, but they are easier to retrieve by piping. Piping is done using the %>% function from the dplyr package and the piping sequence can be translated as and then. We can thus filter those concordances in which the keyword is preceded by poor or little using the filter function from the dplyr package. Note that the $ stands for the end of a string, so that poor$ means that poor is the last element in the string preceding the keyword.
kwic_pipe <- kwic(x = text, pattern = "alice") %>%
dplyr::filter(stringr::str_detect(pre, "poor$|little$"))
docname | from | to | pre | keyword | post | pattern |
text1 | 1,556 | 1,556 | through , " thought poor | Alice | , " it would be | alice |
text1 | 1,739 | 1,739 | " but the wise little | Alice | was not going to do | alice |
text1 | 2,145 | 2,145 | but , alas for poor | Alice | ! when she got to | alice |
text1 | 2,347 | 2,347 | now , " thought poor | Alice | , " to pretend to | alice |
text1 | 3,625 | 3,625 | words , " said poor | Alice | , and her eyes filled | alice |
text1 | 6,927 | 6,927 | it ! " pleaded poor | Alice | . " But you're so | alice |
text1 | 7,341 | 7,341 | ! " And here poor | Alice | began to cry again , | alice |
text1 | 8,300 | 8,300 | home , " thought poor | Alice | , " when one wasn't | alice |
text1 | 11,911 | 11,911 | it ! " pleaded poor | Alice | in a piteous tone . | alice |
text1 | 19,288 | 19,288 | This answer so confused poor | Alice | , that she let the | alice |
Piping is a very useful technique and it is very frequently used in R - not only in the context of text processing but in all data science related domains.
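To make the and then reading concrete, the two lines below are equivalent: the first nests the function calls, the second expresses the same sequence of operations as a pipe (a toy example).
# nested function calls: first tolower, then nchar
nchar(tolower("Alice"))
# the same operations expressed as a pipe
"Alice" %>% tolower() %>% nchar()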
When inspecting concordances, it is useful to re-order the concordances so that they do not appear in the order in which they appeared in the text or texts, but according to their context. To reorder concordances, we can use the arrange function from the dplyr package, which takes the column according to which we want to re-arrange the data as its main argument. In the example below, we extract all instances of alice and then arrange the instances according to the content of the post column in alphabetical order.
kwic_ordered <- kwic(x = text, pattern = "alice") %>%
dplyr::arrange(post)
docname | from | to | pre | keyword | post | pattern |
text1 | 31,131 | 31,131 | voice , the name " | Alice | ! " CHAPTER XII . | alice |
text1 | 8,497 | 8,497 | " Oh , you foolish | Alice | ! " she answered herself | alice |
text1 | 7,808 | 7,808 | happen : " ' Miss | Alice | ! Come here directly , | alice |
text1 | 2,902 | 2,902 | the garden door . Poor | Alice | ! It was as much | alice |
text1 | 2,145 | 2,145 | but , alas for poor | Alice | ! when she got to | alice |
text1 | 73 | 73 | a book , " thought | Alice | " without pictures or conversations | alice |
text1 | 2,620 | 2,620 | and curiouser ! " cried | Alice | ( she was so much | alice |
text1 | 26,047 | 26,047 | I haven't , " said | Alice | ) - " and perhaps | alice |
text1 | 2,959 | 2,959 | of yourself , " said | Alice | , " a great girl | alice |
text1 | 2,424 | 2,424 | eat it , " said | Alice | , " and if it | alice |
Arranging concordances according to alphabetical properties may, however, not be the most useful option. A more useful option may be to arrange concordances according to the frequency of co-occurring terms or collocates. In order to do this, we need to extract the co-occurring words and calculate their frequency. We can do this by combining the mutate, group_by, and n() functions from the dplyr package with the str_remove_all function from the stringr package. Then, we arrange the concordances by the frequency of the collocates in descending order (that is why we put a - in the arrange function). In order to do this, we need to
create a new variable or column which represents the word that co-occurs with, or, as in the example below, immediately follows the search term. In the example below, we use the mutate function to create a new column called post_word. We then use the str_remove_all function to remove everything except for the word that immediately follows the search term (we simply remove everything from the first white space onwards).
group the data by the word that immediately follows the search term.
create a new column called post_word_freq which represents the frequencies of all the words that immediately follow the search term.
arrange the concordances by the frequency of the collocates in descending order.
kwic_ordered_coll <- kwic(
# define text
x = text,
# define search pattern
pattern = "alice") %>%
# extract word following the keyword
dplyr::mutate(post_word = str_remove_all(post, " .*")) %>%
# group following words
dplyr::group_by(post_word) %>%
# extract frequencies of the following words
dplyr::mutate(post_word_freq = n()) %>%
# arrange/order by the frequency of the following word
dplyr::arrange(-post_word_freq)
docname | from | to | pre | keyword | post | pattern | post_word | post_word_freq |
text1 | 348 | 348 | down , so suddenly that | Alice | had not a moment to | alice | had | 2 |
text1 | 657 | 657 | for , you see , | Alice | had learnt several things of | alice | had | 2 |
text1 | 14 | 14 | I . Down the Rabbit-Hole | Alice | was beginning to get very | alice | was | 1 |
text1 | 73 | 73 | a book , " thought | Alice | " without pictures or conversations | alice | " | 1 |
text1 | 153 | 153 | in that ; nor did | Alice | think it so _very_ much | alice | think | 1 |
text1 | 239 | 239 | and then hurried on , | Alice | started to her feet , | alice | started | 1 |
text1 | 309 | 309 | In another moment down went | Alice | after it , never once | alice | after | 1 |
text1 | 531 | 531 | " Well ! " thought | Alice | to herself , " after | alice | to | 1 |
We could add more columns according to which to arrange the concordance, following the same schema. For example, we could add another column that represents the frequency of words that immediately precede the search term and then arrange according to this column (a sketch of this is shown below).
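A minimal sketch of what this could look like, mirroring the code above (the column names pre_word and pre_word_freq are made up for this example; removing everything up to the last white space in the pre column leaves only the word that immediately precedes the keyword).
kwic_preceding <- kwic(
  # define text
  x = text,
  # define search pattern
  pattern = "alice") %>%
  # extract word preceding the keyword
  dplyr::mutate(pre_word = stringr::str_remove_all(pre, ".* ")) %>%
  # group by preceding words
  dplyr::group_by(pre_word) %>%
  # extract frequencies of the preceding words
  dplyr::mutate(pre_word_freq = n()) %>%
  # arrange/order by the frequency of the preceding word
  dplyr::arrange(-pre_word_freq)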
In this section, we will extract the three words following the keyword (alice) and organize the concordances by the frequencies of the following words. We begin by inspecting the first 6 lines of the concordance of alice.
head(kwic_alice)
## Keyword-in-context with 6 matches.
## [text1, 14] I. Down the Rabbit-Hole | Alice |
## [text1, 73] a book," thought | Alice |
## [text1, 153] in that; nor did | Alice |
## [text1, 239] and then hurried on, | Alice |
## [text1, 309] In another moment down went | Alice |
## [text1, 348] down, so suddenly that | Alice |
##
## was beginning to get very
## " without pictures or conversations
## think it so _very_ much
## started to her feet,
## after it, never once
## had not a moment to
Next, we take the concordances and create a clean post column that is all in lower case and that does not contain any punctuation.
kwic_alice %>%
# convert to data frame
as.data.frame() %>%
# create new CleanPost
dplyr::mutate(CleanPost = stringr::str_remove_all(post, "[:punct:]"),
CleanPost = stringr::str_squish(CleanPost),
CleanPost = tolower(CleanPost))-> kwic_alice_following
# inspect
head(kwic_alice_following)
## docname from to pre keyword
## 1 text1 14 14 I . Down the Rabbit-Hole Alice
## 2 text1 73 73 a book , " thought Alice
## 3 text1 153 153 in that ; nor did Alice
## 4 text1 239 239 and then hurried on , Alice
## 5 text1 309 309 In another moment down went Alice
## 6 text1 348 348 down , so suddenly that Alice
## post pattern CleanPost
## 1 was beginning to get very alice was beginning to get very
## 2 " without pictures or conversations alice without pictures or conversations
## 3 think it so _very_ much alice think it so very much
## 4 started to her feet , alice started to her feet
## 5 after it , never once alice after it never once
## 6 had not a moment to alice had not a moment to
In a next step, we extract the 1st, 2nd, and 3rd words following the keyword.
kwic_alice_following %>%
# extract first element after keyword
dplyr::mutate(FirstWord = stringr::str_remove_all(CleanPost, " .*")) %>%
# extract second element after keyword
dplyr::mutate(SecWord = stringr::str_remove(CleanPost, ".*? "),
SecWord = stringr::str_remove_all(SecWord, " .*")) %>%
# extract third element after keyword
dplyr::mutate(ThirdWord = stringr::str_remove(CleanPost, ".*? "),
ThirdWord = stringr::str_remove(ThirdWord, ".*? "),
ThirdWord = stringr::str_remove_all(ThirdWord, " .*")) -> kwic_alice_following
# inspect
head(kwic_alice_following)
## docname from to pre keyword
## 1 text1 14 14 I . Down the Rabbit-Hole Alice
## 2 text1 73 73 a book , " thought Alice
## 3 text1 153 153 in that ; nor did Alice
## 4 text1 239 239 and then hurried on , Alice
## 5 text1 309 309 In another moment down went Alice
## 6 text1 348 348 down , so suddenly that Alice
## post pattern CleanPost
## 1 was beginning to get very alice was beginning to get very
## 2 " without pictures or conversations alice without pictures or conversations
## 3 think it so _very_ much alice think it so very much
## 4 started to her feet , alice started to her feet
## 5 after it , never once alice after it never once
## 6 had not a moment to alice had not a moment to
## FirstWord SecWord ThirdWord
## 1 was beginning to
## 2 without pictures or
## 3 think it so
## 4 started to her
## 5 after it never
## 6 had not a
Next, we calculate the frequencies of the subsequent words and order in descending order from the 1st to the 3rd word following the keyword.
kwic_alice_following %>%
# calculate frequency of following words
# 1st word
dplyr::group_by(FirstWord) %>%
dplyr::mutate(FreqW1 = n()) %>%
# 2nd word
dplyr::group_by(SecWord) %>%
dplyr::mutate(FreqW2 = n()) %>%
# 3rd word
dplyr::group_by(ThirdWord) %>%
dplyr::mutate(FreqW3 = n()) %>%
# ungroup
dplyr::ungroup() %>%
# arrange by following words
dplyr::arrange(-FreqW1, -FreqW2, -FreqW3) -> kwic_alice_following
# inspect results
head(kwic_alice_following, 10)
## # A tibble: 10 × 14
## docname from to pre keyword post pattern Clean…¹ First…² SecWord
## <chr> <int> <int> <chr> <chr> <chr> <fct> <chr> <chr> <chr>
## 1 text1 15840 15840 "so far , … Alice ", a… alice and sh… and she
## 2 text1 20942 20942 "be behead… Alice ", a… alice and sh… and she
## 3 text1 25847 25847 "quite a n… Alice ", a… alice and sh… and she
## 4 text1 33229 33229 "curious d… Alice ", a… alice and sh… and she
## 5 text1 33350 33350 ", and thi… Alice "and… alice and al… and all
## 6 text1 16498 16498 "said pig … Alice "; \… alice and i … and i
## 7 text1 3625 3625 "words , \… Alice ", a… alice and he… and her
## 8 text1 1692 1692 "here befo… Alice ", )… alice and ro… and round
## 9 text1 25955 25955 "eyes . He… Alice ", a… alice and tr… and tried
## 10 text1 6573 6573 "you know … Alice ", \… alice and wh… and why
## # … with 4 more variables: ThirdWord <chr>, FreqW1 <int>, FreqW2 <int>,
## # FreqW3 <int>, and abbreviated variable names ¹CleanPost, ²FirstWord
The results now show the concordance arranged by the frequency of the words following the keyword.
As many analyses use transcripts as their primary data and because transcripts have features that require additional processing, we will now perform concordancing based on transcripts. As a first step, we load five example transcripts that represent the first five files from the Irish component of the International Corpus of English.
# define corpus files
files <- paste("https://slcladal.github.io/data/ICEIrelandSample/S1A-00", 1:5, ".txt", sep = "")
# load corpus files
transcripts <- sapply(files, function(x){
x <- readLines(x)
})
. |
<S1A-001 Riding> |
<I> |
<S1A-001$A> <#> Well how did the riding go tonight |
<S1A-001$B> <#> It was good so it was <#> Just I I couldn't believe that she was going to let me jump <,> that was only the fourth time you know <#> It was great <&> laughter </&> |
<S1A-001$A> <#> What did you call your horse |
<S1A-001$B> <#> I can't remember <#> Oh Mary 's Town <,> oh |
<S1A-001$A> <#> And how did Mabel do |
<S1A-001$B> <#> Did you not see her whenever she was going over the jumps <#> There was one time her horse refused and it refused three times <#> And then <,> she got it round and she just lined it up straight and she just kicked it and she hit it with the whip <,> and over it went the last time you know <#> And Stephanie told her she was very determined and very well-ridden <&> laughter </&> because it had refused the other times you know <#> But Stephanie wouldn't let her give up on it <#> She made her keep coming back and keep coming back <,> until <,> it jumped it you know <#> It was good |
<S1A-001$A> <#> Yeah I 'm not so sure her jumping 's improving that much <#> She uh <,> seemed to be holding the reins very tight |
The first lines shown above let us know that, after the header (<S1A-001 Riding>) and the symbol which indicates the start of the transcript (<I>), each utterance is preceded by a sequence which indicates the section, file, and speaker (e.g. <S1A-001$A>). The first utterance is thus uttered by speaker A in file 001 of section S1A. In addition, there are several sequences that provide meta-linguistic information and indicate the beginning of a speech unit (<#>), pauses (<,>), and laughter (<&> laughter </&>).
To perform the concordancing, we need to change the format of the transcripts because the kwic function only works on character, corpus, and tokens objects - in their present form, the transcripts represent a list which contains vectors of strings. To change the format, we collapse the individual utterances into a single character vector for each transcript.
transcripts_collapsed <- sapply(files, function(x){
# read-in text
x <- readLines(x)
# paste all lines together
x <- paste0(x, collapse = " ")
# remove superfluous white spaces
x <- str_squish(x)
})
. |
<S1A-001 Riding> <I> <S1A-001$A> <#> Well how did the riding go tonight <S1A-001$B> <#> It was good so it was <#> Just I I couldn't believe that she was going to let me jump <,> that was only the fourth time you know <#> It was great <&> laughter </&> <S1A-001$A> <#> What did you call your horse <S1A-001$B> <#> I can't remember <#> Oh Mary 's Town <,> oh <S1A-001$A> <#> And how did Mabel do <S1A-001$B> <#> Did you not see her whenever she was going over the jumps <#> There was one time her horse |
<S1A-002 Dinner chat 1> <I> <S1A-002$A> <#> He 's been married for three years and is now <{> <[> getting divorced </[> <S1A-002$B> <#> <[> No no </[> </{> he 's got married last year and he 's getting <{> <[> divorced </[> <S1A-002$A> <#> <[> He 's now </[> </{> getting divorced <S1A-002$C> <#> Just right <S1A-002$D> <#> A wee girl of her age like <S1A-002$E> <#> Well there was a guy <S1A-002$C> <#> How long did she try it for <#> An hour a a year <S1A-002$B> <#> Mhm <{> <[> mhm </[> <S1A-002$E |
<S1A-003 Dinner chat 2> <I> <S1A-003$A> <#> I <.> wa </.> I want to go to Peru but uh <S1A-003$B> <#> Do you <S1A-003$A> <#> Oh aye <S1A-003$B> <#> I 'd love to go to Peru <S1A-003$A> <#> I want I want to go up the Machu Picchu before it falls off the edge of the mountain <S1A-003$B> <#> Lima 's supposed to be a bit dodgy <S1A-003$A> <#> Mm <S1A-003$B> <#> Bet it would be <S1A-003$B> <#> Mm <S1A-003$A> <#> But I I just I I would like <,> Machu Picchu is collapsing <S1A-003$B> <#> I don't know wh |
<S1A-004 Nursing home 1> <I> <S1A-004$A> <#> Honest to God <,> I think the young ones <#> Sure they 're flying on Monday in I think it 's Shannon <#> This is from Texas <S1A-004$B> <#> This English girl <S1A-004$A> <#> The youngest one <,> the dentist <,> she 's married to the dentist <#> Herself and her husband <,> three children and she 's six months pregnant <S1A-004$C> <#> Oh God <S1A-004$B> <#> And where are they going <S1A-004$A> <#> Coming to Dublin to the mother <{> <[> or <unclear> 3 sy |
<S1A-005 Masons> <I> <S1A-005$A> <#> Right shall we risk another beer or shall we try and <,> <{> <[> ride the bikes down there or do something like that </[> <S1A-005$B> <#> <[> Well <,> what about the </[> </{> provisions <#> What time <{> <[> <unclear> 4 sylls </unclear> </[> <S1A-005$C> <#> <[> Is is your </[> </{> man coming here <S1A-005$B> <#> <{> <[> Yeah </[> <S1A-005$A> <#> <[> He said </[> </{> he would meet us here <S1A-005$B> <#> Just the boat 's arriving you know a few minutes ' wa |
We can now extract the concordances.
kwic_trans <- quanteda::kwic(
# tokenize transcripts
quanteda::tokens(transcripts_collapsed),
# define search pattern
pattern = phrase("you know"))
docname | from | to | pre | keyword | post | pattern |
https://slcladal.github.io/data/ICEIrelandSample/S1A-001.txt | 62 | 63 | was only the fourth time | you know | < # > It was | you know |
https://slcladal.github.io/data/ICEIrelandSample/S1A-001.txt | 204 | 205 | it went the last time | you know | < # > And Stephanie | you know |
https://slcladal.github.io/data/ICEIrelandSample/S1A-001.txt | 235 | 236 | had refused the other times | you know | < # > But Stephanie | you know |
https://slcladal.github.io/data/ICEIrelandSample/S1A-001.txt | 272 | 273 | , > it jumped it | you know | < # > It was | you know |
https://slcladal.github.io/data/ICEIrelandSample/S1A-001.txt | 602 | 603 | that one < , > | you know | and starting anew fresh < | you know |
https://slcladal.github.io/data/ICEIrelandSample/S1A-001.txt | 665 | 666 | { > < [ > | you know | < / [ > < | you know |
https://slcladal.github.io/data/ICEIrelandSample/S1A-001.txt | 736 | 737 | > We didn't discuss it | you know | < S1A-001 $ A > | you know |
https://slcladal.github.io/data/ICEIrelandSample/S1A-001.txt | 922 | 923 | on Tuesday < , > | you know | < # > But I | you know |
https://slcladal.github.io/data/ICEIrelandSample/S1A-001.txt | 1,126 | 1,127 | that she could take her | you know | the wee shoulder bag she | you know |
https://slcladal.github.io/data/ICEIrelandSample/S1A-001.txt | 1,257 | 1,258 | around < , > uhm | you know | their timetable and < , | you know |
The results show that each non-alphanumeric character is counted as a single word, which reduces the context of the keyword substantially. Also, the docname column contains the full path to the data, which makes it hard to parse the content of the table. To address the first issue, we specify a tokenizer that does not disrupt the annotation too much. In addition, we clean the docname column and extract only the file name. Lastly, we expand the context window to 10 so that we have a better understanding of the context in which the phrase was used.
kwic_trans <- quanteda::kwic(
# tokenize transcripts
quanteda::tokens(transcripts_collapsed, what = "fasterword"),
# define search
pattern = phrase("you know"),
# extend context
window = 10) %>%
# clean docnames
dplyr::mutate(docname = str_replace_all(docname, ".*/([A-Z][0-9][A-Z]-[0-9]{1,3}).txt", "\\1"))
docname | from | to | pre | keyword | post | pattern |
S1A-001 | 42 | 43 | let me jump <,> that was only the fourth time | you know | <#> It was great <&> laughter </&> <S1A-001$A> <#> What | you know |
S1A-001 | 140 | 141 | the whip <,> and over it went the last time | you know | <#> And Stephanie told her she was very determined and | you know |
S1A-001 | 164 | 165 | <&> laughter </&> because it had refused the other times | you know | <#> But Stephanie wouldn't let her give up on it | you know |
S1A-001 | 193 | 194 | and keep coming back <,> until <,> it jumped it | you know | <#> It was good <S1A-001$A> <#> Yeah I 'm not | you know |
S1A-001 | 402 | 403 | 'd be far better waiting <,> for that one <,> | you know | and starting anew fresh <S1A-001$A> <#> Yeah but I mean | you know |
S1A-001 | 443 | 444 | the best goes top of the league <,> <{> <[> | you know | </[> <S1A-001$A> <#> <[> So </[> </{> it 's like | you know |
S1A-001 | 484 | 485 | I 'm not sure now <#> We didn't discuss it | you know | <S1A-001$A> <#> Well it sounds like more money <S1A-001$B> <#> | you know |
Extending the context can also be used to identify the speaker who has uttered the search pattern that we are interested in. We will do just that, as this is a common task in linguistic analyses.
To extract speakers, we need to follow these steps:
Create normal concordances of the pattern that we are interested in.
Generate concordances of the pattern that we are interested in with a substantially enlarged context window size.
Extract the speakers from the enlarged context windows.
Add the speakers to the normal concordances using the mutate function from the dplyr package.
kwic_normal <- quanteda::kwic(
# tokenize transcripts
quanteda::tokens(transcripts_collapsed, what = "fasterword"),
# define search
pattern = phrase("you know")) %>%
as.data.frame()
kwic_speaker <- quanteda::kwic(
# tokenize transcripts
quanteda::tokens(transcripts_collapsed, what = "fasterword"),
# define search
pattern = phrase("you know"),
# extend search window
window = 500) %>%
# convert to data frame
as.data.frame() %>%
# extract speaker (comes after $ and before >)
dplyr::mutate(speaker = stringr::str_replace_all(pre, ".*\\$(.*?)>.*", "\\1")) %>%
# extract speaker
dplyr::pull(speaker)
# add speaker to normal kwic
kwic_combined <- kwic_normal %>%
# add speaker
dplyr::mutate(speaker = kwic_speaker) %>%
# simplify docname
dplyr::mutate(docname = stringr::str_replace_all(docname, ".*/([A-Z][0-9][A-Z]-[0-9]{1,3}).txt", "\\1")) %>%
# remove superfluous columns
dplyr::select(-to, -from, -pattern)
docname | pre | keyword | post | speaker |
S1A-001 | was only the fourth time | you know | <#> It was great <&> | B |
S1A-001 | it went the last time | you know | <#> And Stephanie told her | B |
S1A-001 | had refused the other times | you know | <#> But Stephanie wouldn't let | B |
S1A-001 | until <,> it jumped it | you know | <#> It was good <S1A-001$A> | B |
S1A-001 | <,> for that one <,> | you know | and starting anew fresh <S1A-001$A> | B |
S1A-001 | the league <,> <{> <[> | you know | </[> <S1A-001$A> <#> <[> So | B |
S1A-001 | <#> We didn't discuss it | you know | <S1A-001$A> <#> Well it sounds | B |
S1A-001 | her lesson on Tuesday <,> | you know | <#> But I was keeping | B |
S1A-001 | that she could take her | you know | the wee shoulder bag she | B |
S1A-001 | show them around <,> uhm | you know | their timetable and <,> give | B |
The resulting table shows that we have successfully extracted the speakers (identified by the letters in the speaker column) and cleaned the file names (in the docname column).
As R represents a fully-fledged programming environment, we can, of course, also write our own customized concordance function. The code below shows how you could go about doing so. Note, however, that this function only works if you enter more than a single file.
mykwic <- function(txts, pattern, context) {
# activate packages
require(stringr)
# keep only those texts that contain the search pattern
txts <- txts[stringr::str_detect(txts, pattern)]
conc <- sapply(txts, function(x) {
# determine length of text
lngth <- as.vector(unlist(nchar(x)))
# determine position of hits
idx <- str_locate_all(x, pattern)
idx <- idx[[1]]
# return NA if the current text does not contain any hits
ifelse(nrow(idx) >= 1, idx <- idx, return(NA))
# define start position of hit
token.start <- idx[,1]
# define end position of hit
token.end <- idx[,2]
# define start position of preceding context
pre.start <- ifelse(token.start-context < 1, 1, token.start-context)
# define end position of preceding context
pre.end <- token.start-1
# define start position of subsequent context
post.start <- token.end+1
# define end position of subsequent context
post.end <- ifelse(token.end+context > lngth, lngth, token.end+context)
# extract the texts defined by the positions
PreceedingContext <- substring(x, pre.start, pre.end)
Token <- substring(x, token.start, token.end)
SubsequentContext <- substring(x, post.start, post.end)
Id <- 1:length(Token)
conc <- cbind(Id, PreceedingContext, Token, SubsequentContext)
# return concordance
return(conc)
})
concdf <- do.call(rbind, conc) %>%
as.data.frame()
return(concdf)
}
We can now check whether this function works by searching for the sequence you know in the transcripts that we loaded earlier. One difference between the kwic function provided by the quanteda package and the customized concordance function used here is that the kwic function uses the number of words to define the context window, while the mykwic function uses the number of characters or symbols instead (which is why we use a notably higher number to define the context window).
kwic_youknow <- mykwic(transcripts_collapsed, "you know", 50)
Id | PreceedingContext | Token | SubsequentContext |
1 | to let me jump <,> that was only the fourth time | you know | <#> It was great <&> laughter </&> <S1A-001$A> <# |
2 | with the whip <,> and over it went the last time | you know | <#> And Stephanie told her she was very determine |
3 | ghter </&> because it had refused the other times | you know | <#> But Stephanie wouldn't let her give up on it |
4 | k and keep coming back <,> until <,> it jumped it | you know | <#> It was good <S1A-001$A> <#> Yeah I 'm not so |
5 | she 'd be far better waiting <,> for that one <,> | you know | and starting anew fresh <S1A-001$A> <#> Yeah but |
6 | er 's the best goes top of the league <,> <{> <[> | you know | </[> <S1A-001$A> <#> <[> So </[> </{> it 's like |
As this concordance function only works for more than one text, we split the text into chapters and assign each section a name.
# split text into chapters
text_split <- text %>%
stringr::str_squish() %>%
stringr::str_split("[CHAPTER]{7,7} [XVI]{1,7}\\. ") %>%
unlist()
text_split <- text_split[which(nchar(text_split) > 2000)]
# add names
names(text_split) <- paste0("text", 1:length(text_split))
# inspect data
nchar(text_split)
## text1 text2 text3 text4 text5 text6 text7 text8 text9 text10 text11
## 11331 10888 9137 13830 11767 13730 12564 13585 12527 11287 10292
## text12
## 11518
Now that we have named elements, we can search for the pattern poor alice. We may also need to clean the concordance, as some sections may not contain any instances of the search pattern. To clean the data, we keep only the columns Id, PreceedingContext, Token, and SubsequentContext and then remove all rows where information is missing (a sketch of this cleaning step follows the concordance table below).
mykwic_pooralice <- mykwic(text_split, "poor Alice", 50)
Id | PreceedingContext | Token | SubsequentContext |
1 | ; “and even if my head would go through,” thought | poor Alice | , “it would be of very little use without my shoul |
2 | d on going into the garden at once; but, alas for | poor Alice | ! when she got to the door, she found she had forg |
3 | to be two people. “But it’s no use now,” thought | poor Alice | , “to pretend to be two people! Why, there’s hardl |
1 | !” “I’m sure those are not the right words,” said | poor Alice | , and her eyes filled with tears again as she went |
1 | lking such nonsense!” “I didn’t mean it!” pleaded | poor Alice | . “But you’re so easily offended, you know!” The M |
2 | onder if I shall ever see you any more!” And here | poor Alice | began to cry again, for she felt very lonely and |
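Should the concordance contain rows without matches (rows filled with NA) or superfluous columns, the cleaning described above could be implemented as shown in the following sketch (using the columns that mykwic returns).
# remove rows without matches and keep only the relevant columns
mykwic_pooralice_clean <- mykwic_pooralice %>%
  dplyr::filter(!is.na(Token)) %>%
  dplyr::select(Id, PreceedingContext, Token, SubsequentContext)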
You can go ahead and modify the customized concordance function to suit your needs.
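For example, because str_detect and str_locate_all both accept stringr pattern modifiers, one simple extension is to make searches case-insensitive without changing the function body at all (a sketch).
# case-insensitive search using a stringr regex modifier
mykwic_pooralice_ci <- mykwic(text_split,
                              stringr::regex("poor alice", ignore_case = TRUE),
                              50)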
Schweinberger, Martin. 2022. Concordancing with R. Brisbane: The University of Queensland. url: https://ladal.edu.au/kwics.html (Version 2022.11.15).
@manual{schweinberger2022kwics,
author = {Schweinberger, Martin},
title = {Concordancing with R},
note = {https://ladal.edu.au/kwics.html},
year = {2022},
organization = {The University of Queensland, Australia. School of Languages and Cultures},
address = {Brisbane},
edition = {2022.11.15}
}
sessionInfo()
## R version 4.2.1 RC (2022-06-17 r82510 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19043)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=German_Germany.utf8 LC_CTYPE=German_Germany.utf8
## [3] LC_MONETARY=German_Germany.utf8 LC_NUMERIC=C
## [5] LC_TIME=German_Germany.utf8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] flextable_0.8.2 stringr_1.4.1 dplyr_1.0.10 quanteda_3.2.2
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.1.2 xfun_0.32 bslib_0.4.0 purrr_0.3.4
## [5] lattice_0.20-45 vctrs_0.4.1 generics_0.1.3 htmltools_0.5.3
## [9] yaml_2.3.5 base64enc_0.1-3 utf8_1.2.2 rlang_1.0.4
## [13] jquerylib_0.1.4 pillar_1.8.1 glue_1.6.2 DBI_1.1.3
## [17] gdtools_0.2.4 uuid_1.1-0 lifecycle_1.0.1 zip_2.2.0
## [21] evaluate_0.16 knitr_1.40 fastmap_1.1.0 fansi_1.0.3
## [25] highr_0.9 Rcpp_1.0.9 cachem_1.0.6 RcppParallel_5.1.5
## [29] jsonlite_1.8.0 systemfonts_1.0.4 fastmatch_1.1-3 stopwords_2.3
## [33] digest_0.6.29 stringi_1.7.8 grid_4.2.1 cli_3.3.0
## [37] tools_4.2.1 magrittr_2.0.3 sass_0.4.2 klippy_0.0.0.9500
## [41] tibble_3.1.8 pkgconfig_2.0.3 ellipsis_0.3.2 Matrix_1.5-1
## [45] data.table_1.14.2 xml2_1.3.3 assertthat_0.2.1 rmarkdown_2.16
## [49] officer_0.4.4 rstudioapi_0.14 R6_2.5.1 compiler_4.2.1