Topic Modelling of Charles Dickens’ Novels

Author

Gerold Schneider & Max Lauber

Introduction

Code in This Tutorial Is Not Auto-Executed

The topic modelling steps in this tutorial — particularly the udpipe annotation and the stm() model fitting — are computationally intensive and are not run automatically during knitting. Work through the tutorial interactively in RStudio. The annotation step can take 10–20 minutes; each model fitting step takes 1–5 minutes. A caching strategy is included to avoid repeating the annotation step.

This tutorial shows how to perform topic modelling using R. An enormous amount of text is archived all over the world, and more is probably produced on any given day than a single person could ever hope to read. What are linguists and other language-oriented scholars going to do with this embarrassment of riches? If automated language methods are the answer, and you have yet to acquaint yourself with topic modelling, welcome.

Topic modelling helps us engage with large quantities of text by identifying co-occurrence patterns, which, when done right, can yield new perspectives on a set of texts. It is useful to think of it as “a lens that allows researchers working on a problem to view a relevant textual corpus in a different light and at a different scale” (Mohr and Bogdanov 2013, 560).

Prerequisite Tutorials

Before working through this tutorial, we recommend familiarity with:

Learning Objectives

By the end of this tutorial you will be able to:

  1. Construct a corpus from Project Gutenberg using the gutenbergr package
  2. Use udpipe to POS-tag a corpus and remove proper nouns
  3. Pre-process text for topic modelling: tokenisation, lowercasing, punctuation removal, stopword removal, and chunking
  4. Build a document-term matrix using quanteda
  5. Fit structural topic models using the stm package
  6. Iteratively improve a topic model by adjusting chunk size, number of topics, and document-frequency thresholds
  7. Interpret topic model output in the context of a research question

Citation

Schneider, Gerold, Max Lauber & Martin Schweinberger. 2026. Topic Modelling of Charles Dickens’ Novels. Brisbane: The Language Technology and Data Analysis Laboratory (LADAL). url: https://ladal.edu.au/tutorials/topmod/topmod.html (Version 2026.05.01).

Original tutorial by Gerold Schneider & Max Lauber (2022), created for the Australian Text Analytics Platform (ATAP). Adapted for LADAL by Martin Schweinberger.


Motivation

Topic modelling lets us engage with large corpora by identifying co-occurrence patterns. The methodology is one implementation of the Firthian hypothesis that “you shall know a word by the company it keeps” (Firth 1957, 11). Words that frequently appear in similar contexts are often representative of the same topic. To arrive at a meaningful model, we need to prepare the text: removing proper names, filtering stopwords, and chunking the text into pseudo-documents of a suitable size.
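To make the idea of co-occurrence concrete, here is a minimal base-R sketch on four invented pseudo-documents. The toy texts and all object names are assumptions made for illustration. A real topic model estimates probabilistic topics rather than raw counts, but the raw counts show the kind of signal it picks up.

```r
# Four invented pseudo-documents (toy data, not taken from the novels)
docs <- c(
  d1 = "poor workhouse hunger child",
  d2 = "hunger poor street child",
  d3 = "fog river street night",
  d4 = "river night fog boat"
)

# Split each document into words and collect the vocabulary
tokens <- strsplit(docs, " ", fixed = TRUE)
vocab  <- sort(unique(unlist(tokens)))

# Document-term incidence matrix: 1 if the word occurs in the document
dtm <- t(vapply(tokens,
                function(tk) as.integer(vocab %in% tk),
                integer(length(vocab))))
colnames(dtm) <- vocab

# Word-by-word co-occurrence: number of documents shared by each word pair
cooc <- crossprod(dtm)

cooc["poor", "hunger"]  # the two words share documents d1 and d2
cooc["poor", "fog"]     # the two words never share a document
```

Words such as poor and hunger share documents while poor and fog do not, which is the pattern a topic model would group into two separate topics.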

Research questions

Charles Dickens is famous for his social criticism, particularly his treatment of poverty and his vision of including the poor in society (Mahlberg 2013). He is also celebrated for his literary realism. This gives us two research questions:

  1. Can we use topic modelling to bring Dickens’ social criticism to the fore, without the heavy lifting of actually reading his books?
  2. Can we use topic modelling to explore the rich imagery that Dickens constructs with his literary realism?

To explore these questions, we construct a small corpus of eight Dickens novels:

  • A Christmas Carol
  • A Tale of Two Cities
  • The Pickwick Papers
  • Oliver Twist
  • David Copperfield
  • Hard Times
  • Nicholas Nickleby
  • Great Expectations

Setup

Installing packages

Code
install.packages(c(
  "gutenbergr",          # download texts from Project Gutenberg
  "quanteda",            # corpus management and DFM construction
  "quanteda.textmodels", # text models built on quanteda
  "tidytext",            # tidy text mining tools
  "stm",                 # Structural Topic Model
  "dplyr",               # data manipulation
  "purrr",               # functional iteration
  "udpipe",              # POS tagging (pure R, no Python needed)
  "checkdown"            # interactive exercises
))

Loading packages

Code
library(gutenbergr)
library(quanteda)
library(quanteda.textmodels)
library(tidytext)
library(stm)
library(dplyr)
library(purrr)
library(udpipe)
library(checkdown)

No Python Required

The original version of this tutorial used spacyr for POS tagging, which requires a Python installation and the spaCy library. This version replaces spacyr with udpipe, a pure-R POS tagger that requires no external dependencies. The results are comparable for our purposes (identifying and removing proper nouns), and the installation is much simpler.


Data: Downloading the Dickens Corpus

Section Overview

What you will learn: How to use the gutenbergr package to find and download public-domain texts from Project Gutenberg; how to handle mirror failures with a robust download helper; and how to collapse line-by-line text into a per-book format suitable for processing

Dickens is convenient to work with because his novels are old enough to be part of the public domain. This means the eight novels can be downloaded entirely legally from Project Gutenberg.

First, we check which of Dickens’ texts are available on Gutenberg:

Code
dickens <- gutenbergr::gutenberg_works(author == "Dickens, Charles")
print(dickens, n = nrow(dickens))
# A tibble: 54 × 8
   gutenberg_id title    author gutenberg_author_id language gutenberg_bookshelf
          <int> <chr>    <chr>                <int> <chr>    <chr>              
 1           46 "A Chri… Dicke…                  37 en       "Children's Litera…
 2          564 "The My… Dicke…                  37 en       "Mystery Fiction"  
 3          580 "The Pi… Dicke…                  37 en       "Best Books Ever L…
 4          699 "A Chil… Dicke…                  37 en       "Children's Histor…
 5          700 "The Ol… Dicke…                  37 en       ""                 
 6          730 "Oliver… Dicke…                  37 en       ""                 
 7          766 "David … Dicke…                  37 en       "Harvard Classics" 
 8          821 "Dombey… Dicke…                  37 en       ""                 
 9          917 "Barnab… Dicke…                  37 en       "Historical Fictio…
10          963 "Little… Dicke…                  37 en       ""                 
11          967 "Nichol… Dicke…                  37 en       ""                 
12          968 "Martin… Dicke…                  37 en       "Best Books Ever L…
13         1023 "Bleak … Dicke…                  37 en       ""                 
14         1392 "The Se… Dicke…                  37 en       ""                 
15         1394 "The Ho… Dicke…                  37 en       ""                 
16         1406 "The Pe… Dicke…                  37 en       ""                 
17         1407 "A Mess… Dicke…                  37 en       ""                 
18         1413 "Tom Ti… Dicke…                  37 en       ""                 
19         1414 "Somebo… Dicke…                  37 en       ""                 
20         1415 "Doctor… Dicke…                  37 en       ""                 
21         1416 "Mrs. L… Dicke…                  37 en       ""                 
22         1419 "Mugby … Dicke…                  37 en       ""                 
23         1421 "Mrs. L… Dicke…                  37 en       ""                 
24         1422 "Going … Dicke…                  37 en       ""                 
25         1423 "No Tho… Dicke…                  37 en       ""                 
26         1465 "The Wr… Dicke…                  37 en       ""                 
27        15618 "The Lo… Dicke…                  37 en       ""                 
28        19337 "A Chri… Dicke…                  37 en       "Children's Litera…
29        20795 "The Cr… Dicke…                  37 en       "Children's Litera…
30        23344 "The Ma… Dicke…                  37 en       "Children's Pictur…
31        23452 "The Tr… Dicke…                  37 en       "Children's Pictur…
32        23765 "Captai… Dicke…                  37 en       "Children's Pictur…
33        25852 "The Le… Dicke…                  37 en       ""                 
34        25853 "The Le… Dicke…                  37 en       ""                 
35        25854 "The Le… Dicke…                  37 en       ""                 
36        25985 "Bardel… Dicke…                  37 en       ""                 
37        30127 "Tales … Dicke…                  37 en       ""                 
38        30368 "A Chri… Dicke…                  37 en       ""                 
39        32241 "Dicken… Dicke…                  37 en       ""                 
40        35536 "The Po… Dicke…                  37 en       ""                 
41        37121 "Charle… Dicke…                  37 en       ""                 
42        37581 "The Cr… Dicke…                  37 en       ""                 
43        40723 "The Ba… Dicke…                  37 en       ""                 
44        40729 "\"Old … Dicke…                  37 en       ""                 
45        41739 "A Chri… Dicke…                  37 en       ""                 
46        41894 "Christ… Dicke…                  37 en       ""                 
47        42232 "A Chil… Dicke…                  37 en       ""                 
48        43111 "The Pe… Dicke…                  37 en       ""                 
49        43207 "Scenes… Dicke…                  37 en       ""                 
50        46675 "Oliver… Dicke…                  37 en       ""                 
51        47534 "The Po… Dicke…                  37 en       ""                 
52        47535 "The Po… Dicke…                  37 en       ""                 
53        49125 "Storie… Dicke…                  37 en       ""                 
54        52125 "Nell a… Dicke…                  37 en       ""                 
# ℹ 2 more variables: rights <chr>, has_text <lgl>

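We use the following IDs for the eight novels. Note that 98, 786, and 1400 do not appear in the listing above; IDs missing from the filtered gutenberg_works() output can be looked up on the Project Gutenberg website or in the gutenbergr::gutenberg_metadata table: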

  • 46 = A Christmas Carol
  • 98 = A Tale of Two Cities
  • 580 = The Pickwick Papers
  • 730 = Oliver Twist
  • 766 = David Copperfield
  • 786 = Hard Times
  • 967 = Nicholas Nickleby
  • 1400 = Great Expectations

Robust download helper

Project Gutenberg’s primary servers are sometimes unavailable or rate-limited. The helper function below tries a list of mirrors in order and falls back to a direct cache URL if all mirrors fail:

Code
gutenberg_safe <- function(id,
                           meta_fields    = "title",
                           title_fallback = NA_character_) {
  mirrors <- c(
    "http://mirrors.xmission.com/gutenberg/",
    "http://gutenberg.pglaf.org/",
    "https://gutenberg.readingroo.ms/",
    "http://gutenberg.nabasny.com/"
  )
  result <- NULL

  # Step 1: try each mirror via gutenbergr
  for (m in mirrors) {
    tryCatch({
      dl <- gutenbergr::gutenberg_download(id,
                                           meta_fields = meta_fields,
                                           mirror      = m)
      if (!is.null(dl) && nrow(dl) > 0) {
        message("Downloaded ID ", id, " via mirror: ", m)
        result <- dl
        break
      }
    }, error   = function(e) NULL,
       warning = function(w) NULL)
  }

  # Step 2: fall back to direct cache URL if all mirrors failed
  if (is.null(result) || nrow(result) == 0) {
    message("All mirrors failed for ID ", id, " — trying direct cache URL")
    cache_url <- paste0(
      "https://www.gutenberg.org/cache/epub/", id, "/pg", id, ".txt"
    )
    tryCatch({
      lines <- readLines(url(cache_url), warn = FALSE, encoding = "UTF-8")
      if (is.na(title_fallback)) {
        title_fallback <- gutenbergr::gutenberg_metadata |>
          dplyr::filter(gutenberg_id == id) |>
          dplyr::pull(title) |>
          dplyr::first()
      }
      result <- data.frame(
        gutenberg_id     = id,
        text             = lines,
        title            = title_fallback,
        stringsAsFactors = FALSE
      )
      message("Downloaded ID ", id, " via direct cache URL (",
              nrow(result), " lines)")
    }, error = function(e) {
      stop("Could not download ID ", id, ": ", conditionMessage(e))
    })
  }
  result
}
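The control flow of gutenberg_safe() can be tried out without any network access. The sketch below is a simplified stand-in, not the helper itself: first_success() and the three fake sources are hypothetical names invented for illustration, and each "source" is just a function that either fails, returns nothing, or returns some lines.

```r
# Try each source function in turn and return the first non-empty result
first_success <- function(sources) {
  for (nm in names(sources)) {
    # An error in one source is swallowed so that the next one is tried
    res <- tryCatch(sources[[nm]](), error = function(e) NULL)
    if (!is.null(res) && length(res) > 0) {
      return(list(source = nm, value = res))
    }
  }
  stop("All sources failed")
}

# Hypothetical stand-ins for two failing mirrors and a working fallback
fake_sources <- list(
  mirror_a  = function() stop("connection timed out"),
  mirror_b  = function() character(0),
  cache_url = function() c("line one", "line two")
)

out <- first_success(fake_sources)
out$source  # name of the first source that delivered text
```

As in gutenberg_safe(), an empty result counts as a failure, and an error is only raised once every option has been exhausted.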

Downloading the eight novels

We download each novel individually using gutenberg_safe() and combine the results:

Code
list_dickens <- c(46, 98, 580, 730, 766, 786, 967, 1400)

dickens_corpus <- purrr::map_dfr(
  list_dickens,
  ~ gutenberg_safe(.x, meta_fields = "title")
)

The download returns one row per line per work. For processing, we collapse each novel into a single row:

Code
dickens_corpus <- dickens_corpus |>
  dplyr::filter(text != "") |>
  dplyr::group_by(title) |>
  dplyr::summarise(text = paste0(text, collapse = " "), .groups = "drop")

head(dickens_corpus)
# A tibble: 6 × 2
  title                                                        text             
  <chr>                                                        <chr>            
1 A Christmas Carol in Prose; Being a Ghost Story of Christmas "The Project Gut…
2 A Tale of Two Cities                                         "A TALE OF TWO C…
3 David Copperfield                                            "The Project Gut…
4 Great Expectations                                           "The Project Gut…
5 Hard Times                                                   "The Project Gut…
6 Nicholas Nickleby                                            "THE LIFE AND AD…
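Note that the rows above are ordered alphabetically by title, not in download order, because grouping sorts the keys. The same collapse step can be sketched on toy data in base R; the two titles and the invented lines below are placeholders, and tapply() stands in for the group_by() and summarise() calls used above.

```r
# Toy line-by-line download: one row per line, in download order
lines_df <- data.frame(
  title = c("Oliver Twist", "Oliver Twist", "Oliver Twist",
            "Hard Times", "Hard Times"),
  text  = c("first line", "", "second line", "line a", "line b"),
  stringsAsFactors = FALSE
)

# Drop blank lines, then paste each title's lines into one string
nonblank  <- lines_df[lines_df$text != "", ]
collapsed <- tapply(nonblank$text, nonblank$title, paste, collapse = " ")

collapsed_df <- data.frame(
  title = names(collapsed),
  text  = as.vector(collapsed),
  stringsAsFactors = FALSE
)

collapsed_df$title  # sorted alphabetically, so "Hard Times" now comes first
```

The row order matters later on, because the annotation step numbers the documents in the order in which they appear in this table.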

We now have one row per novel — eight rows in total:

Code
table(dickens_corpus$title)

A Christmas Carol in Prose; Being a Ghost Story of Christmas 
                                                           1 
                                        A Tale of Two Cities 
                                                           1 
                                           David Copperfield 
                                                           1 
                                          Great Expectations 
                                                           1 
                                                  Hard Times 
                                                           1 
                                           Nicholas Nickleby 
                                                           1 
                                                Oliver Twist 
                                                           1 
                                         The Pickwick Papers 
                                                           1 

Pre-Processing, Part 1: POS Tagging and Proper Noun Removal

Section Overview

What you will learn: Why proper nouns are problematic for topic modelling of novels; how to download and use a udpipe model for POS tagging; how to remove proper nouns from the tagged corpus; and why we accept some tagging inaccuracies rather than making manual corrections

Why remove proper nouns?

Topic modelling rests on the Firthian hypothesis introduced above: words that frequently occur in similar contexts tend to represent the same topic. One of the most common techniques is latent Dirichlet allocation (LDA), which

derives word clusters using a generative statistical process that begins by assuming that each document in a collection of documents is constructed from a mix of some set of possible topics. The model then assigns high probabilities to words and sets of words that tend to co-occur in multiple contexts across the corpus. (Jockers 2013, 51:123)

The algorithm we use is STM (Structural Topic Model), which is very similar to LDA but is additionally capable of processing metadata about texts. See Roberts, Stewart, and Tingley (2019) for technical details.

In the context of novels, proper nouns are mostly names of characters and locations — and these names tend to co-occur mostly within individual novels. This means the model latches onto distinct proper nouns, which display far more consistent co-occurrence patterns than any other feature, crowding out patterns that are actually helpful for answering our research questions.

Downloading and loading the udpipe model

The udpipe_download_model() function downloads approximately 17 MB on the first run and returns an object whose $file_model element contains the full path. Setting overwrite = FALSE means the download is skipped on every subsequent run:

Code
meng_model <- udpipe::udpipe_download_model(
  language  = "english-ewt",
  overwrite = FALSE
)
m_eng <- udpipe::udpipe_load_model(file = meng_model$file_model)

Fully Portable

Using meng_model$file_model rather than a hardcoded path means this code works on any machine regardless of operating system or working directory.

Annotating the corpus

The annotation step POS-tags every token — approximately two million words across the eight novels. This takes 10–20 minutes depending on your hardware. We use a cache-on-disk strategy so the annotation only runs once:

Code
cache_path <- "tutorials/topmod/data/dickens_parsed.rds"

if (file.exists(cache_path)) {
  dickens_parsed <- readRDS(cache_path)
  message("Loaded parsed corpus from cache.")
} else {
  message("Annotating corpus — this may take 10–20 minutes...")
  dickens_parsed <- as.data.frame(
    udpipe::udpipe_annotate(m_eng, x = dickens_corpus$text)
  )
  dir.create(dirname(cache_path), recursive = TRUE, showWarnings = FALSE)
  saveRDS(dickens_parsed, cache_path)
  message("Annotation complete. Saved to cache.")
}

Let’s inspect the structure:

Code
str(dickens_parsed)
'data.frame':   2015496 obs. of  14 variables:
 $ doc_id       : chr  "doc1" "doc1" "doc1" "doc1" ...
 $ paragraph_id : int  1 1 1 1 1 1 1 1 1 1 ...
 $ sentence_id  : int  1 1 1 1 1 1 1 1 1 1 ...
 $ sentence     : chr  "The Project Gutenberg eBook of A Christmas Carol in Prose; Being a Ghost Story of Christmas This eBook is for t"| __truncated__ "The Project Gutenberg eBook of A Christmas Carol in Prose; Being a Ghost Story of Christmas This eBook is for t"| __truncated__ "The Project Gutenberg eBook of A Christmas Carol in Prose; Being a Ghost Story of Christmas This eBook is for t"| __truncated__ "The Project Gutenberg eBook of A Christmas Carol in Prose; Being a Ghost Story of Christmas This eBook is for t"| __truncated__ ...
 $ token_id     : chr  "1" "2" "3" "4" ...
 $ token        : chr  "The" "Project" "Gutenberg" "eBook" ...
 $ lemma        : chr  "the" "project" "Gutenberg" "eBook" ...
 $ upos         : chr  "DET" "PROPN" "PROPN" "PROPN" ...
 $ xpos         : chr  "DT" "NNP" "NNP" "NNP" ...
 $ feats        : chr  "Definite=Def|PronType=Art" "Number=Sing" "Number=Sing" "Number=Sing" ...
 $ head_token_id: chr  "4" "4" "4" "15" ...
 $ dep_rel      : chr  "det" "compound" "compound" "nsubj" ...
 $ deps         : chr  NA NA NA NA ...
 $ misc         : chr  NA NA NA NA ...

We now have a data frame with one row per token, with columns including the token itself (token), its lemma, and its part-of-speech tag (upos). Roughly two million rows for eight novels sounds about right.
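Because udpipe_annotate() was called without a doc_id argument, the documents are labelled doc1, doc2, and so on, in the order of dickens_corpus$text. If you want the titles back, a small lookup table is enough. The sketch below uses toy stand-ins (toy_corpus, toy_parsed, and doc_lookup are illustrative names, not objects created elsewhere in this tutorial).

```r
# Toy stand-in for dickens_corpus (the real object also has a text column)
toy_corpus <- data.frame(
  title = c("A Christmas Carol", "A Tale of Two Cities", "David Copperfield"),
  stringsAsFactors = FALSE
)

# udpipe labels unnamed inputs "doc1", "doc2", ... in input order
doc_lookup <- data.frame(
  doc_id = paste0("doc", seq_len(nrow(toy_corpus))),
  title  = toy_corpus$title,
  stringsAsFactors = FALSE
)

# Toy stand-in for a few rows of the annotated data frame
toy_parsed <- data.frame(
  doc_id = c("doc1", "doc1", "doc3"),
  token  = c("Ghost", "Story", "born"),
  stringsAsFactors = FALSE
)

# Attach the title to every token row via its doc_id
toy_parsed$title <- doc_lookup$title[match(toy_parsed$doc_id, doc_lookup$doc_id)]
toy_parsed
```

Alternatively, passing doc_id = dickens_corpus$title to udpipe_annotate() would label the rows by title directly, at the cost of re-running the annotation.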

Code
head(dickens_parsed, n = 15)
   doc_id paragraph_id sentence_id
1    doc1            1           1
2    doc1            1           1
3    doc1            1           1
4    doc1            1           1
5    doc1            1           1
6    doc1            1           1
7    doc1            1           1
8    doc1            1           1
9    doc1            1           1
10   doc1            1           1
11   doc1            1           1
12   doc1            1           1
13   doc1            1           1
14   doc1            1           1
15   doc1            1           1
                                                                                                                                                                                                                                                 sentence
1  The Project Gutenberg eBook of A Christmas Carol in Prose; Being a Ghost Story of Christmas This eBook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever.
2  The Project Gutenberg eBook of A Christmas Carol in Prose; Being a Ghost Story of Christmas This eBook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever.
3  The Project Gutenberg eBook of A Christmas Carol in Prose; Being a Ghost Story of Christmas This eBook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever.
4  The Project Gutenberg eBook of A Christmas Carol in Prose; Being a Ghost Story of Christmas This eBook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever.
5  The Project Gutenberg eBook of A Christmas Carol in Prose; Being a Ghost Story of Christmas This eBook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever.
6  The Project Gutenberg eBook of A Christmas Carol in Prose; Being a Ghost Story of Christmas This eBook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever.
7  The Project Gutenberg eBook of A Christmas Carol in Prose; Being a Ghost Story of Christmas This eBook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever.
8  The Project Gutenberg eBook of A Christmas Carol in Prose; Being a Ghost Story of Christmas This eBook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever.
9  The Project Gutenberg eBook of A Christmas Carol in Prose; Being a Ghost Story of Christmas This eBook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever.
10 The Project Gutenberg eBook of A Christmas Carol in Prose; Being a Ghost Story of Christmas This eBook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever.
11 The Project Gutenberg eBook of A Christmas Carol in Prose; Being a Ghost Story of Christmas This eBook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever.
12 The Project Gutenberg eBook of A Christmas Carol in Prose; Being a Ghost Story of Christmas This eBook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever.
13 The Project Gutenberg eBook of A Christmas Carol in Prose; Being a Ghost Story of Christmas This eBook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever.
14 The Project Gutenberg eBook of A Christmas Carol in Prose; Being a Ghost Story of Christmas This eBook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever.
15 The Project Gutenberg eBook of A Christmas Carol in Prose; Being a Ghost Story of Christmas This eBook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever.
   token_id     token     lemma  upos xpos                     feats
1         1       The       the   DET   DT Definite=Def|PronType=Art
2         2   Project   project PROPN  NNP               Number=Sing
3         3 Gutenberg Gutenberg PROPN  NNP               Number=Sing
4         4     eBook     eBook PROPN  NNP               Number=Sing
5         5        of        of   ADP   IN                      <NA>
6         6         A         a PROPN  NNP               Number=Sing
7         7 Christmas Christmas PROPN  NNP               Number=Sing
8         8     Carol     Carol PROPN  NNP               Number=Sing
9         9        in        in   ADP   IN                      <NA>
10       10     Prose     Prose PROPN  NNP               Number=Sing
11       11         ;         ; PUNCT    ,                      <NA>
12       12     Being        be   AUX  VBG              VerbForm=Ger
13       13         a         a   DET   DT Definite=Ind|PronType=Art
14       14     Ghost     Ghost  NOUN   NN               Number=Sing
15       15     Story     story  NOUN   NN               Number=Sing
   head_token_id  dep_rel deps          misc
1              4      det <NA>          <NA>
2              4 compound <NA>          <NA>
3              4 compound <NA>          <NA>
4             15    nsubj <NA>          <NA>
5              8     case <NA>          <NA>
6              8 compound <NA>          <NA>
7              8 compound <NA>          <NA>
8              4     nmod <NA>          <NA>
9             10     case <NA>          <NA>
10             8     nmod <NA> SpaceAfter=No
11             4    punct <NA>          <NA>
12            15      cop <NA>          <NA>
13            15      det <NA>          <NA>
14            15 compound <NA>          <NA>
15            23    nsubj <NA>          <NA>

Words from the title (e.g. Christmas, Carol) may be tagged as proper nouns because they are capitalised, even though they are not personal names here. This kind of noise is acceptable — at the scale of two million tokens, occasional mislabellings have negligible impact on the final model.

Removing proper nouns

We remove all tokens tagged as PROPN (proper noun):

Code
no_prop_dickens <- dickens_parsed[dickens_parsed$upos != "PROPN", ]
nrow(no_prop_dickens)
[1] 1934146

Tokens tagged as proper nouns make up only about 4% of the corpus (81,350 of 2,015,496 tokens). A quick check confirms the removal worked:

Code
head(no_prop_dickens, n = 15)
   doc_id paragraph_id sentence_id
1    doc1            1           1
5    doc1            1           1
9    doc1            1           1
11   doc1            1           1
12   doc1            1           1
13   doc1            1           1
14   doc1            1           1
15   doc1            1           1
16   doc1            1           1
20   doc1            1           1
21   doc1            1           1
22   doc1            1           1
23   doc1            1           1
24   doc1            1           1
25   doc1            1           1
                                                                                                                                                                                                                                                 sentence
1  The Project Gutenberg eBook of A Christmas Carol in Prose; Being a Ghost Story of Christmas This eBook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever.
5  The Project Gutenberg eBook of A Christmas Carol in Prose; Being a Ghost Story of Christmas This eBook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever.
9  The Project Gutenberg eBook of A Christmas Carol in Prose; Being a Ghost Story of Christmas This eBook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever.
11 The Project Gutenberg eBook of A Christmas Carol in Prose; Being a Ghost Story of Christmas This eBook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever.
12 The Project Gutenberg eBook of A Christmas Carol in Prose; Being a Ghost Story of Christmas This eBook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever.
13 The Project Gutenberg eBook of A Christmas Carol in Prose; Being a Ghost Story of Christmas This eBook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever.
14 The Project Gutenberg eBook of A Christmas Carol in Prose; Being a Ghost Story of Christmas This eBook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever.
15 The Project Gutenberg eBook of A Christmas Carol in Prose; Being a Ghost Story of Christmas This eBook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever.
16 The Project Gutenberg eBook of A Christmas Carol in Prose; Being a Ghost Story of Christmas This eBook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever.
20 The Project Gutenberg eBook of A Christmas Carol in Prose; Being a Ghost Story of Christmas This eBook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever.
21 The Project Gutenberg eBook of A Christmas Carol in Prose; Being a Ghost Story of Christmas This eBook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever.
22 The Project Gutenberg eBook of A Christmas Carol in Prose; Being a Ghost Story of Christmas This eBook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever.
23 The Project Gutenberg eBook of A Christmas Carol in Prose; Being a Ghost Story of Christmas This eBook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever.
24 The Project Gutenberg eBook of A Christmas Carol in Prose; Being a Ghost Story of Christmas This eBook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever.
25 The Project Gutenberg eBook of A Christmas Carol in Prose; Being a Ghost Story of Christmas This eBook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever.
   token_id  token  lemma  upos xpos
1         1    The    the   DET   DT
5         5     of     of   ADP   IN
9         9     in     in   ADP   IN
11       11      ;      ; PUNCT    ,
12       12  Being     be   AUX  VBG
13       13      a      a   DET   DT
14       14  Ghost  Ghost  NOUN   NN
15       15  Story  story  NOUN   NN
16       16     of     of   ADP   IN
20       20     is     be   AUX  VBZ
21       21    for    for   ADP   IN
22       22    the    the   DET   DT
23       23    use    use  NOUN   NN
24       24     of     of   ADP   IN
25       25 anyone anyone  PRON   NN
                                                   feats head_token_id
1                              Definite=Def|PronType=Art             4
5                                                   <NA>             8
9                                                   <NA>            10
11                                                  <NA>             4
12                                          VerbForm=Ger            15
13                             Definite=Ind|PronType=Art            15
14                                           Number=Sing            15
15                                           Number=Sing            23
16                                                  <NA>            18
20 Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin            23
21                                                  <NA>            23
22                             Definite=Def|PronType=Art            23
23                                           Number=Sing             0
24                                                  <NA>            26
25                                           Number=Sing            26
     dep_rel deps misc
1        det <NA> <NA>
5       case <NA> <NA>
9       case <NA> <NA>
11     punct <NA> <NA>
12       cop <NA> <NA>
13       det <NA> <NA>
14  compound <NA> <NA>
15     nsubj <NA> <NA>
16      case <NA> <NA>
20       cop <NA> <NA>
21      case <NA> <NA>
22       det <NA> <NA>
23      root <NA> <NA>
24      case <NA> <NA>
25 nmod:poss <NA> <NA>

Pre-Processing, Part 2: Tokenisation, Normalisation, and Chunking

Section Overview

What you will learn: How to convert the annotated data frame to a quanteda tokens object; how to lowercase, remove punctuation, and remove stopwords; and how to chunk the resulting token stream into pseudo-documents of a controlled size

Tokenising

The udpipe output is a plain data frame, not a list or spacyr object, so as.tokens() requires a conversion step. We first use split() to group the tokens back into a named list by document ID, then pass that list to as.tokens():

Code
# Split token column by document ID to get a named list, then coerce to tokens
toks_list <- split(no_prop_dickens$token, no_prop_dickens$doc_id)
toks      <- quanteda::as.tokens(toks_list)
head(toks, n = 15)
Tokens consisting of 8 documents.
doc1 :
 [1] "The"   "of"    "in"    ";"     "Being" "a"     "Ghost" "Story" "of"   
[10] "is"    "for"   "the"  
[ ... and 38,225 more ]

doc2 :
 [1] "A"          "TALE"       "OF"         "TWO"        "CITIES"    
 [6] "A"          "STORY"      "OF"         "THE"        "FRENCH"    
[11] "REVOLUTION" "By"        
[ ... and 162,767 more ]

doc3 :
 [1] "The"      "of"       "This"     "is"       "for"      "the"     
 [7] "use"      "of"       "anyone"   "anywhere" "in"       "the"     
[ ... and 430,836 more ]

doc4 :
 [1] "The"          "of"           "Great"        "Expectations" "This"        
 [6] "is"           "for"          "the"          "use"          "of"          
[11] "anyone"       "anywhere"    
[ ... and 223,489 more ]

doc5 :
 [1] "The"      "of"       "This"     "is"       "for"      "the"     
 [7] "use"      "of"       "anyone"   "anywhere" "in"       "the"     
[ ... and 127,684 more ]

doc6 :
 [1] "THE"        "LIFE"       "AND"        "ADVENTURES" "OF"        
 [6] ","          "containing" "a"          "Faithful"   "Account"   
[11] "of"         "the"       
[ ... and 389,237 more ]

[ reached max_ndoc ... 2 more documents ]

The tokens object now contains one document per novel (eight in total), with proper nouns removed.
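
To make the conversion step concrete, here is a minimal sketch on a made-up annotation table. The data frame toy_annotation and its contents are hypothetical; only split() and quanteda::as.tokens() are taken from the step above.

```r
library(quanteda)

# Hypothetical stand-in for the udpipe output: one row per token
toy_annotation <- data.frame(
  doc_id = c("doc1", "doc1", "doc1", "doc2", "doc2"),
  token  = c("A", "ghost", "story", "Great", "expectations"),
  stringsAsFactors = FALSE
)

# split() groups the token column by document ID into a named list
toy_list <- split(toy_annotation$token, toy_annotation$doc_id)

# as.tokens() coerces the named list into a quanteda tokens object
toy_toks <- as.tokens(toy_list)

ndoc(toy_toks)      # number of documents in the tokens object
as.list(toy_toks)   # tokens per document
```

One thing to keep in mind: split() orders its groups by the sorted document IDs, so with ten or more documents an ID such as doc10 would sort before doc2. With eight novels this does not arise.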

Lowercasing

For the model to correctly identify semantically identical features such as Ghostly and ghostly, we convert all tokens to lowercase:

Code
toks <- quanteda::tokens_tolower(toks)
head(toks, n = 15)
Tokens consisting of 8 documents.
doc1 :
 [1] "the"   "of"    "in"    ";"     "being" "a"     "ghost" "story" "of"   
[10] "is"    "for"   "the"  
[ ... and 38,225 more ]

doc2 :
 [1] "a"          "tale"       "of"         "two"        "cities"    
 [6] "a"          "story"      "of"         "the"        "french"    
[11] "revolution" "by"        
[ ... and 162,767 more ]

doc3 :
 [1] "the"      "of"       "this"     "is"       "for"      "the"     
 [7] "use"      "of"       "anyone"   "anywhere" "in"       "the"     
[ ... and 430,836 more ]

doc4 :
 [1] "the"          "of"           "great"        "expectations" "this"        
 [6] "is"           "for"          "the"          "use"          "of"          
[11] "anyone"       "anywhere"    
[ ... and 223,489 more ]

doc5 :
 [1] "the"      "of"       "this"     "is"       "for"      "the"     
 [7] "use"      "of"       "anyone"   "anywhere" "in"       "the"     
[ ... and 127,684 more ]

doc6 :
 [1] "the"        "life"       "and"        "adventures" "of"        
 [6] ","          "containing" "a"          "faithful"   "account"   
[11] "of"         "the"       
[ ... and 389,237 more ]

[ reached max_ndoc ... 2 more documents ]

Removing punctuation

We pass the tokens object through tokens() again, this time with remove_punct = TRUE:

Code
toks <- quanteda::tokens(toks, remove_punct = TRUE)
head(toks, n = 15)
Tokens consisting of 8 documents.
doc1 :
 [1] "the"   "of"    "in"    "being" "a"     "ghost" "story" "of"    "is"   
[10] "for"   "the"   "use"  
[ ... and 31,087 more ]

doc2 :
 [1] "a"          "tale"       "of"         "two"        "cities"    
 [6] "a"          "story"      "of"         "the"        "french"    
[11] "revolution" "by"        
[ ... and 133,563 more ]

doc3 :
 [1] "the"      "of"       "this"     "is"       "for"      "the"     
 [7] "use"      "of"       "anyone"   "anywhere" "in"       "the"     
[ ... and 354,628 more ]

doc4 :
 [1] "the"          "of"           "great"        "expectations" "this"        
 [6] "is"           "for"          "the"          "use"          "of"          
[11] "anyone"       "anywhere"    
[ ... and 184,940 more ]

doc5 :
 [1] "the"      "of"       "this"     "is"       "for"      "the"     
 [7] "use"      "of"       "anyone"   "anywhere" "in"       "the"     
[ ... and 104,093 more ]

doc6 :
 [1] "the"        "life"       "and"        "adventures" "of"        
 [6] "containing" "a"          "faithful"   "account"    "of"        
[11] "the"        "fortunes"  
[ ... and 316,256 more ]

[ reached max_ndoc ... 2 more documents ]

Compared with the previous output, standalone punctuation tokens such as ; and , have disappeared.

Removing stopwords

Stopwords are very common function words (the, in, a, etc.) that appear across virtually every page of every book. Their high frequency and low semantic content would dominate co-occurrence patterns and make topic interpretation very difficult. We remove the standard English stopword list bundled with quanteda:

Code
toks <- quanteda::tokens_remove(toks,
                                pattern = quanteda::stopwords("english"))
head(toks, n = 15)
Tokens consisting of 8 documents.
doc1 :
 [1] "ghost"        "story"        "use"          "anyone"       "anywhere"    
 [6] "parts"        "world"        "cost"         "almost"       "restrictions"
[11] "whatsoever"   "may"         
[ ... and 14,720 more ]

doc2 :
 [1] "tale"       "two"        "cities"     "story"      "french"    
 [6] "revolution" "--"         "recalled"   "ii"         "mail"      
[11] "chapter"    "iii"       
[ ... and 60,425 more ]

doc3 :
 [1] "use"          "anyone"       "anywhere"     "parts"        "world"       
 [6] "cost"         "almost"       "restrictions" "whatsoever"   "may"         
[11] "copy"         "give"        
[ ... and 156,918 more ]

doc4 :
 [1] "great"        "expectations" "use"          "anyone"       "anywhere"    
 [6] "parts"        "world"        "cost"         "almost"       "restrictions"
[11] "whatsoever"   "may"         
[ ... and 80,745 more ]

doc5 :
 [1] "use"          "anyone"       "anywhere"     "parts"        "world"       
 [6] "cost"         "almost"       "restrictions" "whatsoever"   "may"         
[11] "copy"         "give"        
[ ... and 48,318 more ]

doc6 :
 [1] "life"         "adventures"   "containing"   "faithful"     "account"     
 [6] "fortunes"     "misfortunes"  "uprisings"    "downfallings" "'s"          
[11] "story"        "begun"       
[ ... and 151,542 more ]

[ reached max_ndoc ... 2 more documents ]

The remaining tokens are the content words most likely to be informative for topic modelling.
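
The three normalisation steps can be tried out together on a single toy document. This is a minimal sketch: the token vector is invented for illustration, while the function calls are the same ones used above.

```r
library(quanteda)

# Hypothetical mini-document containing case variants, punctuation and stopwords
toy <- as.tokens(list(
  d1 = c("The", "Ghost", ";", "was", "Ghostly", ",", "and",
         "the", "ghostly", "story", "ended", ".")
))

toy <- tokens_tolower(toy)                                 # "Ghostly" and "ghostly" become one feature
toy <- tokens(toy, remove_punct = TRUE)                    # drop punctuation tokens
toy <- tokens_remove(toy, pattern = stopwords("english"))  # drop function words

as.list(toy)[["d1"]]
```

Only content words should remain, and the two spellings of ghostly are now counted as the same feature.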

Chunking

Sentences are too short to contain meaningful co-occurrence patterns. With only eight novels, dividing by book would give us too few documents. We therefore create pseudo-documents by splitting the token stream into chunks of a fixed word count.

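The heuristic we start from is that roughly two pages of a novel should contain words relating to a similar theme. At approximately 500 words per page, we begin with chunks of 1,000 words. Because stopwords and punctuation have already been removed, each 1,000-token chunk in fact spans somewhat more than two pages of the original text.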

First, we unlist the tokens object into a single flat vector:

Code
list_toks <- unlist(toks, use.names = FALSE)
head(list_toks, n = 15)
 [1] "ghost"        "story"        "use"          "anyone"       "anywhere"    
 [6] "parts"        "world"        "cost"         "almost"       "restrictions"
[11] "whatsoever"   "may"          "copy"         "give"         "away"        

Now we define the chunk size and split the vector into pseudo-documents:

Code
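chunk          <- 1000                # pseudo-document size in tokens
n              <- length(list_toks)   # total number of tokens in the corpus
# Chunk index for every token: 1 repeated 1,000 times, then 2, and so on,
# truncated to n so that the final, shorter chunk is handled correctly
r              <- rep(seq_len(ceiling(n / chunk)), each = chunk)[seq_len(n)]
chunky_dickens <- split(list_toks, r)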

The object r assigns each token a chunk number. Checking its distribution and the total number of chunks:

Code
table(r) |> head(5)
r
   1    2    3    4    5 
1000 1000 1000 1000 1000 
Code
length(chunky_dickens)
[1] 736
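
The same chunking logic is easier to follow on a small scale. The sketch below uses a hypothetical vector of ten tokens and a chunk size of four; all object names are illustrative.

```r
# Hypothetical token vector: 10 tokens, to be split into chunks of 4
toy_tokens <- c("ghost", "story", "use", "anyone", "anywhere",
                "parts", "world", "cost", "almost", "restrictions")
toy_chunk  <- 4
toy_n      <- length(toy_tokens)

# One chunk index per token: 1 1 1 1 2 2 2 2 3 3
toy_r <- rep(seq_len(ceiling(toy_n / toy_chunk)), each = toy_chunk)[seq_len(toy_n)]

# split() groups the tokens by chunk index
toy_chunks <- split(toy_tokens, toy_r)

lengths(toy_chunks)   # chunk sizes: 4, 4, 2
```

The last chunk simply receives whatever tokens are left over. By the same reasoning, 736 chunks of 1,000 tokens imply a corpus of between 735,001 and 736,000 tokens after pre-processing.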

We get 736 pseudo-documents of 1,000 tokens each (the final chunk is shorter). We convert them to a quanteda tokens object and then to a document-feature matrix (DFM):

Code
chunky_toks <- quanteda::tokens(chunky_dickens)
dtm         <- quanteda::dfm(chunky_toks)
dtm
Document-feature matrix of: 736 documents, 28,448 features (97.76% sparse) and 0 docvars.
    features
docs ghost story use anyone anywhere parts world cost almost restrictions
   1     5     5   1      1        1     1     3    1      1            1
   2     0     0   0      0        0     0     0    1      0            0
   3    24     1   0      0        1     0     1    0      1            0
   4    15     0   0      0        0     1     2    0      0            0
   5     7     0   1      0        0     0     1    0      0            0
   6    11     0   0      0        0     0     4    1      0            0
[ reached max_ndoc ... 730 more documents, reached max_nfeat ... 28,438 more features ]
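
Because unlist() concatenates all eight novels into one stream, a chunk can straddle the boundary between two novels. If that matters for your research question, one possible alternative is quanteda's tokens_chunk(), which splits each document separately. The sketch below is not the approach used in this tutorial, and the toy documents are hypothetical.

```r
library(quanteda)

# Two hypothetical "novels" of different lengths
toy_novels <- as.tokens(list(
  novel_a = c("ghost", "story", "merry", "nephew", "uncle"),
  novel_b = c("prison", "doctor", "village")
))

# Chunk each document separately into pieces of at most 2 tokens
toy_chunked <- tokens_chunk(toy_novels, size = 2)

ndoc(toy_chunked)     # total number of chunks across both documents
ntoken(toy_chunked)   # tokens per chunk
```

With this approach every novel ends in its own shorter chunk, so the number of pseudo-documents would differ slightly from the 736 obtained above.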

Exploring the Document-Term Matrix

Section Overview

What you will learn: How to compute document frequencies; how to inspect the most and least frequent features; and what the distribution of document frequencies tells us about the trimming decisions we will make

Before fitting any models, it is worth examining the feature frequencies. The docfreq() function counts how many pseudo-documents each feature appears in:

Code
doc_freq <- quanteda::docfreq(dtm)
head(doc_freq, n = 20)
       ghost        story          use       anyone     anywhere        parts 
          58          152          212           66           82           55 
       world         cost       almost restrictions   whatsoever          may 
         371           57          338           11           21          561 
        copy         give         away       re-use        terms     included 
          45          450          628           10          138           28 
       ebook       online 
          12           15 
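
Document frequency is easy to confuse with term frequency. The following base R sketch computes both by hand on three hypothetical chunks; it illustrates the quantity that docfreq() reports and is not how quanteda computes it internally.

```r
# Three hypothetical pseudo-documents
toy_docs <- list(
  chunk1 = c("ghost", "ghost", "story", "world"),
  chunk2 = c("ghost", "world", "world"),
  chunk3 = c("story", "prison")
)

# Term frequency: total number of occurrences across all chunks
toy_term_freq <- table(unlist(toy_docs))

# Document frequency: number of chunks in which a feature occurs at least once
toy_doc_freq <- table(unlist(lapply(toy_docs, unique)))

as.integer(toy_term_freq["ghost"])   # 3 occurrences in total
as.integer(toy_doc_freq["ghost"])    # found in 2 of the 3 chunks
```

A word repeated many times within a single chunk therefore has a high term frequency but a document frequency of only 1.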

The twenty-five most frequent features:

Code
head(sort(doc_freq, decreasing = TRUE), n = 25)
   one   said   upon   time    now little   know   well   good    say   come 
   729    727    721    707    704    701    694    679    678    678    675 
  made   much    see   head    man  never    way  great   hand   like    two 
   674    672    668    666    665    665    664    662    657    655    648 
   old  first  think 
   645    636    636 

The most widespread feature, one, appears in 729 of the 736 pseudo-documents, closely followed by said (727), which points to a great deal of speech and dialogue. Several words related to time (time, now, never, first, old) appear in the top 25, which may reflect the novels’ concern with character development over time. We are also seeing the rewards of our pre-processing: had we not removed punctuation and stopwords, this list would be dominated by the, a, commas, and full stops.

The twenty-five least frequent features:

Code
tail(sort(doc_freq, decreasing = TRUE), n = 25)
            unfatherly             undeniably                  vages 
                     1                      1                      1 
             ev'rythin              sitivated           ungracefully 
                     1                      1                      1 
         unequivocally               absented                   _has 
                     1                      1                      1 
                _taken            accompanies            dissentions 
                     1                      1                      1 
           enlargement                  papas                wardles 
                     1                      1                      1 
               winkles               trundles                   owls 
                     1                      1                      1 
           chroniclers custom--unquestionably            conversable 
                     1                      1                      1 
         unconquerable                retains             juvenility 
                     1                      1                      1 
               idolise 
                     1 

Every feature in this list appears in only one pseudo-document. Some are processing artefacts, such as _has and _taken (which retain the underscores that Project Gutenberg uses to mark italics) or the fused custom--unquestionably; others are rare genuine words. More interestingly, non-standard spellings such as ev'rythin (everything) and sitivated (situated) give us a first flavour of Dickens’ literary realism: his use of non-standard spelling to render the speech of dialect speakers.
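
A quick way to get a feel for such rare features is to separate likely artefacts from genuine words. The sketch below works on a small hypothetical vector that borrows a few feature names and counts from the output above; the pattern used to flag artefacts is a rough heuristic and an assumption of this example.

```r
# Hypothetical excerpt of a document-frequency vector
toy_counts <- setNames(
  c(727, 58, 1, 1, 1, 1, 1),
  c("said", "ghost", "_has", "_taken", "custom--unquestionably",
    "sitivated", "idolise")
)

# Features that occur in exactly one pseudo-document
singletons <- names(toy_counts)[toy_counts == 1]

# Heuristic: a leading underscore or a double hyphen suggests a processing artefact
artefacts <- singletons[grepl("^_|--", singletons)]

artefacts                 # likely artefacts
mean(toy_counts == 1)     # share of features found in only one pseudo-document
```

Applied to the full doc_freq vector, the same two lines would show how much of the vocabulary is made up of one-off features.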


Topic Modelling

Section Overview

What you will learn: How to trim the DFM to remove very rare and very frequent features; how to fit a Structural Topic Model with stm(); and how to iteratively improve the model by adjusting the number of topics, document-frequency thresholds, and chunk size

Reproducibility via set.seed()

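All stm() calls in this tutorial are immediately preceded by set.seed(42). This fixes the state of R’s random number generator so that repeated runs produce the same topics. The models here use stm’s default spectral initialisation (visible in the log as “Beginning Spectral Initialization”), which is designed to be deterministic, so the seed mainly acts as a safeguard. With other initialisation types, such as init.type = "Random" or init.type = "LDA", an unseeded run starts from a random state and produces somewhat different topics on each execution.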
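
The effect of set.seed() can be checked in base R without fitting a model. This is a minimal sketch using runif() as a stand-in for any computation that draws random numbers.

```r
# Same seed, same random numbers
set.seed(42)
first_run  <- runif(3)

set.seed(42)
second_run <- runif(3)

identical(first_run, second_run)   # TRUE: the draws are reproduced exactly

# Without resetting the seed, the generator simply continues
third_run <- runif(3)
identical(first_run, third_run)    # FALSE: different draws
```

This is why the seed is set immediately before each stm() call rather than once at the top of the script: any intervening code that uses random numbers would otherwise shift the generator's state.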

Trimming

Before fitting the first model, we trim the DFM to remove features that are too rare or too frequent to be informative:

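  • min_termfreq = 2: keep only features that occur at least twice in the corpus
  • min_docfreq = 0.0001: minimum proportion of pseudo-documents a feature must appear in (effectively no lower bound with 736 pseudo-documents)
  • max_docfreq = 0.15: maximum proportion of pseudo-documents a feature may appear in; features found in more than 15% of pseudo-documents are removed
  • docfreq_type = "prop": interpret the two document-frequency thresholds as proportions rather than counts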
Code
dtm_trimmed <- quanteda::dfm_trim(
  dtm,
  min_termfreq = 2,
  min_docfreq  = 0.0001,
  max_docfreq  = 0.15,
  docfreq_type = "prop"
)

Even after removing stopwords, some content words are so frequent across all documents that they would dominate every topic. Trimming is a necessary complement to the earlier stopword removal.
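
To see what the proportional thresholds mean in practice, they can be converted into document counts. The sketch below uses the 736 pseudo-documents and a handful of document frequencies reported earlier. It mimics only the document-frequency part of the filter, assumes that features lying within the bounds (inclusive) are kept, and is not a reimplementation of dfm_trim().

```r
n_docs   <- 736      # number of pseudo-documents
min_prop <- 0.0001   # min_docfreq as a proportion
max_prop <- 0.15     # max_docfreq as a proportion

min_docs <- min_prop * n_docs   # 0.0736: any feature present in one document passes
max_docs <- max_prop * n_docs   # 110.4: features in 111 or more documents are dropped

# Document frequencies taken from the head(doc_freq) output
sample_doc_freq <- c(ghost = 58, story = 152, use = 212,
                     anyone = 66, world = 371, restrictions = 11)

kept <- names(sample_doc_freq)[sample_doc_freq >= min_docs &
                               sample_doc_freq <= max_docs]
kept
```

On this sample, ghost, anyone and restrictions survive, while story, use and world are removed as too widespread. The upper threshold is therefore doing almost all of the work here, and it is the first value to revisit if the resulting topics look too idiosyncratic.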


First Model: 10 Topics, 1,000-Word Chunks

The first model is always a starting point. We fit a structural topic model with K = 10 topics:

Code
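# K sets the number of topics; max.em.its caps the number of EM iterations.
# With the defaults verbose = TRUE and reportevery = 5, stm() prints progress
# for every iteration and the current top words every fifth iteration.
set.seed(42)
stmOut_1 <- stm::stm(documents = dtm_trimmed,
                     K          = 10,
                     max.em.its = 200)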
Beginning Spectral Initialization 
     Calculating the gram matrix...
     Using only 10000 most frequent terms during initialization...
     Finding anchor words...
    ..........
     Recovering initialization...
    ....................................................................................................
Initialization complete.
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 1 (approx. per word bound = -9.453) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 2 (approx. per word bound = -8.720, relative change = 7.755e-02) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 3 (approx. per word bound = -8.665, relative change = 6.301e-03) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 4 (approx. per word bound = -8.653, relative change = 1.360e-03) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 5 (approx. per word bound = -8.648, relative change = 6.857e-04) 
Topic 1: fat, ghost, merry, dodger, squeers 
 Topic 2: baron, lord, bottle, crummles, manager 
 Topic 3: uncle, collector, squeers, guard, nephew 
 Topic 4: doctor, charles, brothers, ned, agnes 
 Topic 5: wery, 'em, wot, wos, magistrate 
 Topic 6: traddles, agnes, --’, loved, waiter 
 Topic 7: ’am, ’ve, ’em, sikes, beadle 
 Topic 8: prisoner, prison, doctor, madame, village 
 Topic 9: project, works, electronic, foundation, copyright 
 Topic 10: ’em, thee, fur, thou, ’ly 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 6 (approx. per word bound = -8.643, relative change = 4.656e-04) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 7 (approx. per word bound = -8.640, relative change = 3.552e-04) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 8 (approx. per word bound = -8.638, relative change = 2.733e-04) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 9 (approx. per word bound = -8.636, relative change = 2.190e-04) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 10 (approx. per word bound = -8.635, relative change = 1.881e-04) 
Topic 1: fat, ghost, merry, dodger, nephew 
 Topic 2: baron, manager, lord, bottle, crummles 
 Topic 3: uncle, squeers, collector, guard, nephew 
 Topic 4: doctor, brothers, charles, ned, oliver 
 Topic 5: wery, 'em, wot, wos, magistrate 
 Topic 6: traddles, agnes, --’, loved, waiter 
 Topic 7: ’am, ’ve, ’em, sikes, —i 
 Topic 8: prisoner, prison, doctor, village, madame 
 Topic 9: project, works, electronic, foundation, copyright 
 Topic 10: ’ly, ’em, fur, thee, thou 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 11 (approx. per word bound = -8.633, relative change = 1.663e-04) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 12 (approx. per word bound = -8.632, relative change = 1.475e-04) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 13 (approx. per word bound = -8.631, relative change = 1.369e-04) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 14 (approx. per word bound = -8.630, relative change = 1.276e-04) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 15 (approx. per word bound = -8.629, relative change = 1.186e-04) 
Topic 1: fat, ghost, spinster, merry, dodger 
 Topic 2: baron, manager, bottle, lord, crummles 
 Topic 3: uncle, squeers, collector, guard, chaise 
 Topic 4: doctor, brothers, charles, ned, oliver 
 Topic 5: wery, 'em, wot, wos, magistrate 
 Topic 6: traddles, agnes, --’, loved, waiter 
 Topic 7: ’am, ’ve, ’em, —i, sikes 
 Topic 8: prisoner, prison, madame, doctor, village 
 Topic 9: project, works, electronic, foundation, copyright 
 Topic 10: ’ly, ’em, fur, thee, thou 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 16 (approx. per word bound = -8.628, relative change = 1.060e-04) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 17 (approx. per word bound = -8.627, relative change = 9.940e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 18 (approx. per word bound = -8.626, relative change = 9.149e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 19 (approx. per word bound = -8.625, relative change = 8.179e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 20 (approx. per word bound = -8.625, relative change = 6.999e-05) 
Topic 1: fat, ghost, spinster, merry, nephew 
 Topic 2: baron, manager, bottle, lord, crummles 
 Topic 3: uncle, squeers, collector, guard, chaise 
 Topic 4: doctor, brothers, charles, oliver, ned 
 Topic 5: wery, 'em, wot, wos, magistrate 
 Topic 6: traddles, agnes, --’, loved, waiter 
 Topic 7: ’am, ’em, —i, ’ve, sikes 
 Topic 8: prisoner, prison, sea, madame, crowd 
 Topic 9: project, works, electronic, foundation, copyright 
 Topic 10: ’ly, ’em, fur, thee, thou 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 21 (approx. per word bound = -8.624, relative change = 6.402e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 22 (approx. per word bound = -8.624, relative change = 6.110e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 23 (approx. per word bound = -8.623, relative change = 5.593e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 24 (approx. per word bound = -8.623, relative change = 5.336e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 25 (approx. per word bound = -8.622, relative change = 4.752e-05) 
Topic 1: fat, ghost, spinster, merry, nephew 
 Topic 2: baron, bottle, manager, lord, crummles 
 Topic 3: uncle, squeers, collector, guard, chaise 
 Topic 4: doctor, brothers, charles, oliver, ned 
 Topic 5: wery, 'em, wos, wot, magistrate 
 Topic 6: traddles, agnes, --’, loved, waiter 
 Topic 7: ’am, ’em, sikes, ’ve, —i 
 Topic 8: prisoner, prison, sea, madame, crowd 
 Topic 9: project, works, electronic, foundation, copyright 
 Topic 10: ’ly, ’em, thee, fur, thou 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 26 (approx. per word bound = -8.622, relative change = 4.276e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 27 (approx. per word bound = -8.621, relative change = 4.187e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 28 (approx. per word bound = -8.621, relative change = 4.053e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 29 (approx. per word bound = -8.621, relative change = 4.147e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 30 (approx. per word bound = -8.620, relative change = 4.311e-05) 
Topic 1: fat, ghost, spinster, merry, nephew 
 Topic 2: baron, bottle, manager, lord, crummles 
 Topic 3: uncle, squeers, collector, guard, chaise 
 Topic 4: doctor, brothers, charles, oliver, ned 
 Topic 5: wery, 'em, wos, wot, magistrate 
 Topic 6: traddles, agnes, --’, loved, waiter 
 Topic 7: ’am, ’em, sikes, ’ve, —i 
 Topic 8: prisoner, prison, sea, madame, crowd 
 Topic 9: project, works, electronic, foundation, copyright 
 Topic 10: ’ly, ’em, thee, fur, thou 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 31 (approx. per word bound = -8.620, relative change = 4.298e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 32 (approx. per word bound = -8.620, relative change = 3.975e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 33 (approx. per word bound = -8.619, relative change = 3.879e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 34 (approx. per word bound = -8.619, relative change = 3.651e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 35 (approx. per word bound = -8.619, relative change = 3.419e-05) 
Topic 1: fat, ghost, spinster, merry, nephew 
 Topic 2: baron, bottle, manager, lord, crummles 
 Topic 3: uncle, squeers, collector, guard, chaise 
 Topic 4: doctor, brothers, charles, oliver, ned 
 Topic 5: wery, 'em, wos, wot, magistrate 
 Topic 6: traddles, agnes, --’, loved, waiter 
 Topic 7: ’am, ’em, ’ve, sikes, beadle 
 Topic 8: prisoner, prison, sea, madame, crowd 
 Topic 9: project, works, electronic, foundation, copyright 
 Topic 10: ’ly, ’em, thee, fur, thou 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 36 (approx. per word bound = -8.618, relative change = 3.291e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 37 (approx. per word bound = -8.618, relative change = 2.807e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 38 (approx. per word bound = -8.618, relative change = 2.587e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 39 (approx. per word bound = -8.618, relative change = 2.527e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 40 (approx. per word bound = -8.618, relative change = 2.609e-05) 
Topic 1: fat, ghost, spinster, merry, nephew 
 Topic 2: baron, bottle, manager, lord, crummles 
 Topic 3: uncle, squeers, collector, guard, chaise 
 Topic 4: doctor, brothers, charles, oliver, ned 
 Topic 5: wery, 'em, wos, wot, ere 
 Topic 6: traddles, agnes, --’, loved, waiter 
 Topic 7: ’am, ’em, ’ve, beadle, —i 
 Topic 8: prisoner, prison, sea, madame, crowd 
 Topic 9: project, works, electronic, foundation, copyright 
 Topic 10: ’ly, ’em, thee, fur, thou 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 41 (approx. per word bound = -8.617, relative change = 2.577e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 42 (approx. per word bound = -8.617, relative change = 2.629e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 43 (approx. per word bound = -8.617, relative change = 2.549e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 44 (approx. per word bound = -8.617, relative change = 2.443e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 45 (approx. per word bound = -8.616, relative change = 2.270e-05) 
Topic 1: fat, ghost, spinster, merry, nephew 
 Topic 2: baron, bottle, manager, lord, crummles 
 Topic 3: uncle, squeers, collector, guard, chaise 
 Topic 4: doctor, brothers, charles, oliver, ned 
 Topic 5: wery, 'em, wos, wot, magistrate 
 Topic 6: traddles, agnes, --’, loved, waiter 
 Topic 7: ’am, ’em, ’ve, beadle, —i 
 Topic 8: prisoner, prison, sea, madame, crowd 
 Topic 9: project, works, electronic, foundation, copyright 
 Topic 10: ’ly, ’em, thee, fur, thou 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 46 (approx. per word bound = -8.616, relative change = 2.197e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 47 (approx. per word bound = -8.616, relative change = 2.092e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 48 (approx. per word bound = -8.616, relative change = 2.054e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 49 (approx. per word bound = -8.616, relative change = 2.174e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 50 (approx. per word bound = -8.616, relative change = 2.124e-05) 
Topic 1: fat, ghost, spinster, merry, nephew 
 Topic 2: baron, bottle, manager, lord, crummles 
 Topic 3: uncle, squeers, collector, guard, chaise 
 Topic 4: doctor, brothers, charles, oliver, ned 
 Topic 5: wery, 'em, wos, wot, magistrate 
 Topic 6: traddles, agnes, --’, loved, waiter 
 Topic 7: ’am, ’em, beadle, ’ve, —i 
 Topic 8: prisoner, prison, sea, madame, crowd 
 Topic 9: project, works, electronic, foundation, copyright 
 Topic 10: ’ly, ’em, thee, fur, thou 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 51 (approx. per word bound = -8.615, relative change = 2.017e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 52 (approx. per word bound = -8.615, relative change = 1.966e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 53 (approx. per word bound = -8.615, relative change = 2.155e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 54 (approx. per word bound = -8.615, relative change = 2.443e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 55 (approx. per word bound = -8.615, relative change = 2.496e-05) 
Topic 1: fat, ghost, spinster, merry, nephew 
 Topic 2: sikes, baron, bottle, manager, lord 
 Topic 3: uncle, squeers, collector, guard, chaise 
 Topic 4: doctor, brothers, charles, oliver, marriage 
 Topic 5: wery, 'em, wos, wot, magistrate 
 Topic 6: traddles, agnes, --’, loved, waiter 
 Topic 7: ’am, ’em, beadle, ’ve, —i 
 Topic 8: prisoner, prison, sea, madame, crowd 
 Topic 9: project, works, electronic, foundation, copyright 
 Topic 10: ’ly, ’em, thee, fur, thou 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 56 (approx. per word bound = -8.614, relative change = 2.428e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 57 (approx. per word bound = -8.614, relative change = 2.379e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 58 (approx. per word bound = -8.614, relative change = 2.322e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 59 (approx. per word bound = -8.614, relative change = 2.256e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 60 (approx. per word bound = -8.614, relative change = 2.099e-05) 
Topic 1: fat, ghost, spinster, merry, nephew 
 Topic 2: sikes, baron, bottle, oliver, manager 
 Topic 3: uncle, squeers, collector, guard, chaise 
 Topic 4: doctor, brothers, charles, oliver, marriage 
 Topic 5: wery, 'em, wos, wot, magistrate 
 Topic 6: traddles, agnes, --’, loved, waiter 
 Topic 7: ’am, ’em, beadle, —i, ’ve 
 Topic 8: prisoner, prison, sea, madame, crowd 
 Topic 9: project, works, electronic, foundation, copyright 
 Topic 10: ’ly, ’em, thee, fur, thou 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 61 (approx. per word bound = -8.613, relative change = 2.163e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 62 (approx. per word bound = -8.613, relative change = 2.219e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 63 (approx. per word bound = -8.613, relative change = 1.984e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 64 (approx. per word bound = -8.613, relative change = 1.748e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 65 (approx. per word bound = -8.613, relative change = 1.820e-05) 
Topic 1: fat, ghost, spinster, merry, nephew 
 Topic 2: sikes, baron, oliver, bottle, manager 
 Topic 3: uncle, squeers, collector, guard, chaise 
 Topic 4: doctor, brothers, charles, oliver, marriage 
 Topic 5: wery, 'em, wos, wot, magistrate 
 Topic 6: traddles, agnes, --’, loved, waiter 
 Topic 7: ’am, ’em, beadle, —i, ’ve 
 Topic 8: prisoner, prison, sea, madame, crowd 
 Topic 9: project, works, electronic, foundation, copyright 
 Topic 10: ’ly, ’em, thee, fur, thou 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 66 (approx. per word bound = -8.613, relative change = 1.932e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 67 (approx. per word bound = -8.612, relative change = 1.972e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 68 (approx. per word bound = -8.612, relative change = 2.050e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 69 (approx. per word bound = -8.612, relative change = 2.000e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 70 (approx. per word bound = -8.612, relative change = 1.696e-05) 
Topic 1: fat, ghost, spinster, merry, nephew 
 Topic 2: sikes, baron, oliver, bottle, manager 
 Topic 3: uncle, squeers, collector, guard, chaise 
 Topic 4: doctor, brothers, charles, oliver, marriage 
 Topic 5: wery, 'em, wos, wot, magistrate 
 Topic 6: traddles, agnes, --’, loved, waiter 
 Topic 7: ’am, ’em, beadle, —i, ’ve 
 Topic 8: prisoner, prison, sea, madame, crowd 
 Topic 9: project, works, electronic, foundation, copyright 
 Topic 10: ’ly, ’em, thee, fur, thou 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 71 (approx. per word bound = -8.612, relative change = 1.680e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 72 (approx. per word bound = -8.612, relative change = 1.608e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 73 (approx. per word bound = -8.612, relative change = 1.705e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 74 (approx. per word bound = -8.611, relative change = 1.723e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 75 (approx. per word bound = -8.611, relative change = 1.681e-05) 
Topic 1: fat, ghost, spinster, merry, nephew 
 Topic 2: sikes, oliver, baron, bottle, lord 
 Topic 3: uncle, squeers, collector, guard, 'am 
 Topic 4: doctor, brothers, charles, oliver, marriage 
 Topic 5: wery, 'em, wos, wot, magistrate 
 Topic 6: traddles, agnes, --’, loved, doctor 
 Topic 7: ’am, ’em, beadle, —i, ’ve 
 Topic 8: prisoner, prison, sea, madame, crowd 
 Topic 9: project, works, electronic, foundation, copyright 
 Topic 10: ’ly, ’em, thee, fur, thou 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 76 (approx. per word bound = -8.611, relative change = 1.440e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 77 (approx. per word bound = -8.611, relative change = 1.222e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 78 (approx. per word bound = -8.611, relative change = 1.119e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 79 (approx. per word bound = -8.611, relative change = 1.114e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 80 (approx. per word bound = -8.611, relative change = 1.351e-05) 
Topic 1: fat, ghost, spinster, merry, nephew 
 Topic 2: sikes, oliver, baron, bottle, lord 
 Topic 3: uncle, squeers, collector, guard, chaise 
 Topic 4: doctor, brothers, charles, oliver, marriage 
 Topic 5: wery, 'em, wos, wot, magistrate 
 Topic 6: traddles, agnes, --’, loved, doctor 
 Topic 7: ’am, ’em, beadle, —i, ’ve 
 Topic 8: prisoner, prison, sea, madame, crowd 
 Topic 9: project, works, electronic, foundation, copyright 
 Topic 10: ’ly, ’em, thee, fur, thou 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 81 (approx. per word bound = -8.611, relative change = 1.861e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 82 (approx. per word bound = -8.610, relative change = 1.264e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 83 (approx. per word bound = -8.610, relative change = 1.321e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 84 (approx. per word bound = -8.610, relative change = 1.403e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 85 (approx. per word bound = -8.610, relative change = 1.442e-05) 
Topic 1: fat, ghost, spinster, merry, nephew 
 Topic 2: sikes, oliver, baron, bottle, lord 
 Topic 3: uncle, squeers, collector, guard, chaise 
 Topic 4: doctor, brothers, charles, oliver, marriage 
 Topic 5: wery, 'em, wos, wot, magistrate 
 Topic 6: traddles, agnes, --’, loved, doctor 
 Topic 7: ’am, ’em, —i, beadle, pip 
 Topic 8: prisoner, prison, sea, madame, crowd 
 Topic 9: project, works, electronic, foundation, copyright 
 Topic 10: ’ly, ’em, thee, fur, thou 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 86 (approx. per word bound = -8.610, relative change = 1.463e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 87 (approx. per word bound = -8.610, relative change = 1.542e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 88 (approx. per word bound = -8.610, relative change = 1.518e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 89 (approx. per word bound = -8.610, relative change = 1.382e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 90 (approx. per word bound = -8.609, relative change = 1.311e-05) 
Topic 1: fat, ghost, spinster, merry, nephew 
 Topic 2: sikes, oliver, baron, bottle, lord 
 Topic 3: uncle, squeers, collector, guard, chaise 
 Topic 4: doctor, brothers, charles, oliver, marriage 
 Topic 5: wery, 'em, wos, wot, 'n 
 Topic 6: traddles, agnes, --’, loved, doctor 
 Topic 7: ’am, ’em, —i, beadle, pip 
 Topic 8: prisoner, prison, sea, madame, crowd 
 Topic 9: project, works, electronic, foundation, copyright 
 Topic 10: ’ly, ’em, thee, fur, thou 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 91 (approx. per word bound = -8.609, relative change = 1.252e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 92 (approx. per word bound = -8.609, relative change = 1.203e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 93 (approx. per word bound = -8.609, relative change = 1.135e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 94 (approx. per word bound = -8.609, relative change = 1.112e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 95 (approx. per word bound = -8.609, relative change = 1.083e-05) 
Topic 1: fat, ghost, spinster, merry, nephew 
 Topic 2: sikes, oliver, baron, bottle, yer 
 Topic 3: uncle, squeers, collector, guard, chaise 
 Topic 4: doctor, brothers, charles, oliver, marriage 
 Topic 5: wery, 'em, wos, wot, 'n 
 Topic 6: traddles, agnes, --’, loved, doctor 
 Topic 7: ’am, ’em, —i, beadle, pip 
 Topic 8: prisoner, prison, sea, madame, crowd 
 Topic 9: project, works, electronic, foundation, copyright 
 Topic 10: ’ly, ’em, thee, fur, thou 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 96 (approx. per word bound = -8.609, relative change = 1.019e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Model Converged 

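K = 10 sets the number of topics to ten. max.em.its = 200 caps the number of expectation-maximisation (EM) iterations; the run above never reached that cap, because it reported “Model Converged” shortly after iteration 96.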
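To make the stopping behaviour in the log more concrete, here is a minimal base-R sketch of a convergence check. It assumes that the “relative change” in the log is computed as (new - old) / |old| and that fitting stops once this value falls below a tolerance of 1e-05 (the default of stm's emtol argument). Both points are assumptions about the package internals rather than something stated in this tutorial, and the helper names relative_change and has_converged are hypothetical.

```r
# Relative change between two successive bounds: (new - old) / |old|.
# Assumed form of the quantity the log reports as "relative change".
relative_change <- function(old, new) (new - old) / abs(old)

# Assumed stopping rule: converged once the change drops below emtol.
has_converged <- function(change, emtol = 1e-5) change < emtol

# Relative changes copied from the log above for iterations 93 to 96
logged <- c(`93` = 1.135e-05, `94` = 1.112e-05,
            `95` = 1.083e-05, `96` = 1.019e-05)

# All four values are still above the tolerance, so fitting continued
print(has_converged(logged))

# Hypothetical pair of unrounded bounds, for illustration only:
# a change this small would count as converged under the assumed rule
print(has_converged(relative_change(old = -8.6090, new = -8.60892)))
```

The bounds printed in the log are rounded to three decimals, so they cannot be used to recompute the reported relative changes exactly; the second call therefore uses made-up values.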

Printing the raw model object returns a one-line summary:

Code
stmOut_1
A topic model with 10 topics, 736 documents and a 18511 word dictionary.

We plot the top ten keywords for each topic:

Code
plot(stmOut_1, n = 10)

This first model already gives some indication that the topic modelling is working. We see one topic containing contractions (’ll, ’em, ’re, ’ve), reflecting informal speech settings. Another contains alternate spellings (wery, wot, ai, wos), pointing to dialogue written in dialect. Both topics are directly relevant to our research questions: the representation of informal and dialectal speech is a hallmark of Dickens’ literary realism, and his willingness to give voice to dialect speakers reflects his progressive attitudes towards poverty.

Beyond these, a topic containing prisoner and prison, and another with grave and earth, show some thematic coherence. However, many topics are still difficult to interpret, which is what we would expect from a first-pass model with coarse chunks and only ten topics.
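The contraction and dialect topics noted above can also be screened for programmatically. The following base-R sketch counts how many of a topic's top keywords begin with an apostrophe. The keyword lists are copied from the last progress report in the log (iteration 95); the helper count_clipped and the apostrophe heuristic are hypothetical illustrations, not part of the stm package or of the authors' workflow.

```r
# Top five keywords for four topics, copied from the iteration-95 report
top_words <- list(
  `Topic 5`  = c("wery", "'em", "wos", "wot", "'n"),
  `Topic 7`  = c("’am", "’em", "—i", "beadle", "pip"),
  `Topic 8`  = c("prisoner", "prison", "sea", "madame", "crowd"),
  `Topic 10` = c("’ly", "’em", "thee", "fur", "thou")
)

# Count keywords whose first character is a straight or curly apostrophe
# (a rough, hypothetical signal for clipped or contracted forms)
count_clipped <- function(words) {
  sum(substr(words, 1, 1) %in% c("'", "’"))
}

clipped <- vapply(top_words, count_clipped, integer(1))
print(clipped)
```

Both apostrophe variants are checked because the corpus mixes straight and curly forms, as the keyword lists show. A count like this only points to candidate topics; whether a topic really reflects informal or dialectal speech still has to be judged by reading the keywords and the underlying passages.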


Second Model: 20 Topics

With too few topics, the model may be unable to show co-occurrences at the required granularity, because distinct patterns are forced into the same topic. We therefore give it space for 20 topics:

Code
set.seed(42)
stmOut_2 <- stm::stm(documents = dtm_trimmed,
                     K          = 20,
                     max.em.its = 200)
Beginning Spectral Initialization 
     Calculating the gram matrix...
     Using only 10000 most frequent terms during initialization...
     Finding anchor words...
    ....................
     Recovering initialization...
    ....................................................................................................
Initialization complete.
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 1 (approx. per word bound = -9.376) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 2 (approx. per word bound = -8.503, relative change = 9.314e-02) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 3 (approx. per word bound = -8.432, relative change = 8.291e-03) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 4 (approx. per word bound = -8.411, relative change = 2.488e-03) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 5 (approx. per word bound = -8.404, relative change = 9.160e-04) 
Topic 1: merry, coachman, guard, nephew, attachment 
 Topic 2: baron, bottle, pipe, lord, punch 
 Topic 3: uncle, ’ly, widow, collector, 'am 
 Topic 4: oliver, merry, relations, exists, o'clock 
 Topic 5: magistrate, crowd, officer, attorney, pounds 
 Topic 6: traddles, ’ly, baby, cart, yard 
 Topic 7: ’am, sparsit, ith, ’th, wath 
 Topic 8: madame, spy, knitting, vengeance, wot 
 Topic 9: project, works, electronic, foundation, copyright 
 Topic 10: ’em, ’am, thee, thou, fur 
 Topic 11: doctor, sikes, oliver, yer, ’ve 
 Topic 12: squeers, occasions, ‘“, respectable, attended 
 Topic 13: prisoner, prison, judge, jury, sea 
 Topic 14: fat, brothers, charles, squeers, nicholas 
 Topic 15: wery, 'em, wos, wot, 'n 
 Topic 16: 'am, cook, squeers, ma'am, female 
 Topic 17: manager, pip, stage, pipe, river 
 Topic 18: ghost, sisters, trees, loved, goblin 
 Topic 19: traddles, agnes, --’, loved, papa 
 Topic 20: roads, village, mender, fountain, carriage 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 6 (approx. per word bound = -8.400, relative change = 4.880e-04) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 7 (approx. per word bound = -8.397, relative change = 3.299e-04) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 8 (approx. per word bound = -8.395, relative change = 2.169e-04) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 9 (approx. per word bound = -8.394, relative change = 1.543e-04) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 10 (approx. per word bound = -8.393, relative change = 1.377e-04) 
Topic 1: fat, horses, 'em, coachman, chaise 
 Topic 2: baron, bottle, pipe, lord, punch 
 Topic 3: uncle, widow, ’ly, 'am, collector 
 Topic 4: oliver, relations, merry, o'clock, servants 
 Topic 5: magistrate, crowd, officer, attorney, pounds 
 Topic 6: traddles, ’ly, baby, cart, waiter 
 Topic 7: ’am, sparsit, ith, ’th, wath 
 Topic 8: madame, spy, knitting, vengeance, wot 
 Topic 9: project, works, electronic, foundation, copyright 
 Topic 10: ’em, thee, ’am, thou, fur 
 Topic 11: doctor, sikes, oliver, yer, ’ve 
 Topic 12: squeers, occasions, ‘“, respectable, attended 
 Topic 13: prisoner, prison, jury, sea, judge 
 Topic 14: fat, brothers, charles, squeers, ,' 
 Topic 15: wery, 'em, wos, wot, 'n 
 Topic 16: 'am, cook, squeers, ma'am, female 
 Topic 17: manager, pip, stage, sergeant, river 
 Topic 18: ghost, sisters, trees, loved, goblin 
 Topic 19: traddles, agnes, --’, loved, papa 
 Topic 20: roads, village, mender, fountain, carriage 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 11 (approx. per word bound = -8.392, relative change = 1.278e-04) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 12 (approx. per word bound = -8.390, relative change = 1.244e-04) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 13 (approx. per word bound = -8.389, relative change = 1.189e-04) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 14 (approx. per word bound = -8.389, relative change = 9.105e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 15 (approx. per word bound = -8.388, relative change = 8.176e-05) 
Topic 1: chaise, fat, 'em, horses, coachman 
 Topic 2: baron, bottle, pipe, lord, punch 
 Topic 3: uncle, widow, 'am, collector, niece 
 Topic 4: oliver, beadle, relations, merry, o'clock 
 Topic 5: magistrate, crowd, officer, attorney, pounds 
 Topic 6: traddles, ’ly, baby, waiter, cart 
 Topic 7: ’am, sparsit, ith, ’th, wath 
 Topic 8: madame, spy, knitting, vengeance, wot 
 Topic 9: project, works, electronic, foundation, copyright 
 Topic 10: ’em, ’ly, thee, thou, ’am 
 Topic 11: doctor, sikes, oliver, yer, ’ve 
 Topic 12: squeers, occasions, ‘“, respectable, boat 
 Topic 13: prisoner, prison, sea, jury, judge 
 Topic 14: fat, brothers, charles, squeers, ,' 
 Topic 15: wery, 'em, wos, wot, 'n 
 Topic 16: 'am, cook, squeers, ma'am, female 
 Topic 17: manager, pip, stage, sergeant, pipe 
 Topic 18: ghost, sisters, trees, goblin, loved 
 Topic 19: traddles, agnes, --’, loved, papa 
 Topic 20: roads, village, mender, fountain, carriage 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 16 (approx. per word bound = -8.387, relative change = 7.978e-05) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 17 (approx. per word bound = -8.387, relative change = 7.201e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 18 (approx. per word bound = -8.386, relative change = 6.866e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 19 (approx. per word bound = -8.386, relative change = 6.872e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 20 (approx. per word bound = -8.385, relative change = 6.280e-05) 
Topic 1: chaise, 'em, fat, horses, coachman 
 Topic 2: baron, bottle, pipe, lord, punch 
 Topic 3: uncle, widow, 'am, collector, niece 
 Topic 4: beadle, oliver, relations, societies, merry 
 Topic 5: magistrate, crowd, officer, attorney, judge 
 Topic 6: traddles, ’ly, waiter, baby, cart 
 Topic 7: ’am, sparsit, ith, ’th, wath 
 Topic 8: madame, spy, knitting, vengeance, wot 
 Topic 9: project, works, electronic, foundation, copyright 
 Topic 10: ’ly, ’em, thee, thou, ’am 
 Topic 11: doctor, sikes, oliver, yer, ’ve 
 Topic 12: squeers, occasions, ‘“, respectable, boat 
 Topic 13: prisoner, prison, sea, jury, judge 
 Topic 14: fat, brothers, charles, squeers, ,' 
 Topic 15: wery, wos, 'em, wot, 'n 
 Topic 16: 'am, cook, squeers, ma'am, female 
 Topic 17: manager, pip, stage, convict, sergeant 
 Topic 18: ghost, sisters, trees, goblin, loved 
 Topic 19: traddles, agnes, --’, loved, papa 
 Topic 20: roads, village, mender, fountain, carriage 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 21 (approx. per word bound = -8.385, relative change = 5.646e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 22 (approx. per word bound = -8.384, relative change = 4.600e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 23 (approx. per word bound = -8.384, relative change = 4.296e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 24 (approx. per word bound = -8.383, relative change = 4.224e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 25 (approx. per word bound = -8.383, relative change = 4.107e-05) 
Topic 1: chaise, horses, 'em, fat, coachman 
 Topic 2: baron, bottle, pipe, lord, punch 
 Topic 3: uncle, widow, 'am, collector, niece 
 Topic 4: societies, learned, relations, oliver, fat 
 Topic 5: magistrate, crowd, officer, attorney, judge 
 Topic 6: traddles, ’ly, waiter, baby, cart 
 Topic 7: ’am, sparsit, ith, ’th, wath 
 Topic 8: madame, spy, knitting, vengeance, wot 
 Topic 9: project, works, electronic, foundation, copyright 
 Topic 10: ’ly, ’em, thee, thou, ’am 
 Topic 11: doctor, sikes, oliver, yer, ’ve 
 Topic 12: squeers, occasions, ‘“, respectable, boat 
 Topic 13: prisoner, prison, sea, jury, prisoners 
 Topic 14: fat, brothers, charles, ,', squeers 
 Topic 15: wery, wos, wot, 'em, 'n 
 Topic 16: 'am, cook, squeers, ma'am, female 
 Topic 17: manager, pip, stage, convict, sergeant 
 Topic 18: ghost, sisters, trees, goblin, monks 
 Topic 19: traddles, agnes, --’, loved, papa 
 Topic 20: roads, village, mender, fountain, carriage 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 26 (approx. per word bound = -8.383, relative change = 4.044e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 27 (approx. per word bound = -8.383, relative change = 3.549e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 28 (approx. per word bound = -8.382, relative change = 3.021e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 29 (approx. per word bound = -8.382, relative change = 2.846e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 30 (approx. per word bound = -8.382, relative change = 2.589e-05) 
Topic 1: chaise, horses, 'em, fat, coachman 
 Topic 2: baron, bottle, lord, pipe, punch 
 Topic 3: uncle, widow, 'am, collector, niece 
 Topic 4: learned, societies, fat, relations, oliver 
 Topic 5: magistrate, crowd, officer, attorney, judge 
 Topic 6: traddles, ’ly, waiter, baby, cart 
 Topic 7: ’am, sparsit, ith, ’th, wath 
 Topic 8: madame, spy, knitting, vengeance, wot 
 Topic 9: project, works, electronic, foundation, copyright 
 Topic 10: ’ly, ’em, thee, thou, ’am 
 Topic 11: doctor, sikes, oliver, yer, ’ve 
 Topic 12: squeers, occasions, ‘“, respectable, boat 
 Topic 13: prisoner, prison, sea, jury, prisoners 
 Topic 14: fat, brothers, charles, ,', squeers 
 Topic 15: wery, wos, wot, 'em, 'n 
 Topic 16: 'am, cook, squeers, ma'am, female 
 Topic 17: manager, pip, stage, convict, sergeant 
 Topic 18: ghost, sisters, trees, goblin, monks 
 Topic 19: traddles, agnes, --’, loved, papa 
 Topic 20: roads, village, mender, fountain, carriage 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 31 (approx. per word bound = -8.382, relative change = 2.610e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 32 (approx. per word bound = -8.381, relative change = 2.592e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 33 (approx. per word bound = -8.381, relative change = 2.818e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 34 (approx. per word bound = -8.381, relative change = 3.050e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 35 (approx. per word bound = -8.381, relative change = 2.496e-05) 
Topic 1: chaise, horses, fat, 'em, coachman 
 Topic 2: baron, bottle, pipe, lord, punch 
 Topic 3: uncle, widow, 'am, collector, niece 
 Topic 4: learned, societies, fat, oliver, relations 
 Topic 5: magistrate, crowd, officer, attorney, judge 
 Topic 6: traddles, ’ly, waiter, baby, cart 
 Topic 7: ’am, sparsit, ith, ’th, wath 
 Topic 8: madame, spy, knitting, vengeance, wot 
 Topic 9: project, works, electronic, foundation, copyright 
 Topic 10: ’ly, ’em, thee, thou, ’am 
 Topic 11: doctor, sikes, oliver, yer, ’ve 
 Topic 12: squeers, ‘“, occasions, boat, respectable 
 Topic 13: prisoner, prison, sea, jury, prisoners 
 Topic 14: fat, brothers, charles, ,', nicholas 
 Topic 15: wery, wos, wot, 'em, 'n 
 Topic 16: 'am, cook, squeers, ma'am, female 
 Topic 17: manager, pip, stage, convict, sergeant 
 Topic 18: ghost, sisters, trees, goblin, monks 
 Topic 19: traddles, agnes, --’, loved, papa 
 Topic 20: roads, village, mender, fountain, carriage 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 36 (approx. per word bound = -8.380, relative change = 2.355e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 37 (approx. per word bound = -8.380, relative change = 2.551e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 38 (approx. per word bound = -8.380, relative change = 2.546e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 39 (approx. per word bound = -8.380, relative change = 2.337e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 40 (approx. per word bound = -8.380, relative change = 2.418e-05) 
Topic 1: chaise, horses, fat, 'em, coachman 
 Topic 2: baron, bottle, undertaker, pipe, lord 
 Topic 3: uncle, widow, 'am, collector, niece 
 Topic 4: learned, societies, fat, board, oliver 
 Topic 5: magistrate, crowd, officer, attorney, judge 
 Topic 6: traddles, ’ly, waiter, baby, cart 
 Topic 7: ’am, sparsit, ith, ’th, wath 
 Topic 8: madame, spy, knitting, vengeance, wot 
 Topic 9: project, works, electronic, foundation, copyright 
 Topic 10: ’ly, ’em, thee, thou, ’am 
 Topic 11: doctor, sikes, oliver, yer, ’ve 
 Topic 12: squeers, ‘“, occasions, boat, respectable 
 Topic 13: prisoner, prison, sea, jury, prisoners 
 Topic 14: fat, brothers, charles, ,', nicholas 
 Topic 15: wery, wos, wot, 'em, 'n 
 Topic 16: 'am, cook, squeers, ma'am, female 
 Topic 17: manager, pip, stage, convict, sergeant 
 Topic 18: ghost, sisters, trees, goblin, monks 
 Topic 19: traddles, agnes, --’, loved, umble 
 Topic 20: roads, village, mender, fountain, carriage 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 41 (approx. per word bound = -8.379, relative change = 2.642e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 42 (approx. per word bound = -8.379, relative change = 2.887e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 43 (approx. per word bound = -8.379, relative change = 2.899e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 44 (approx. per word bound = -8.379, relative change = 2.762e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 45 (approx. per word bound = -8.379, relative change = 2.236e-05) 
Topic 1: chaise, horses, fat, 'em, coachman 
 Topic 2: baron, bottle, undertaker, pipe, lord 
 Topic 3: uncle, widow, 'am, collector, niece 
 Topic 4: learned, societies, board, fat, oliver 
 Topic 5: magistrate, crowd, officer, attorney, judge 
 Topic 6: traddles, ’ly, waiter, baby, cart 
 Topic 7: ’am, sparsit, ith, ’th, wath 
 Topic 8: madame, spy, knitting, vengeance, wot 
 Topic 9: project, works, electronic, foundation, copyright 
 Topic 10: ’ly, ’em, thee, thou, ’am 
 Topic 11: doctor, sikes, oliver, yer, ’ve 
 Topic 12: squeers, ‘“, occasions, boat, respectable 
 Topic 13: prisoner, prison, sea, jury, prisoners 
 Topic 14: fat, brothers, charles, ,', nicholas 
 Topic 15: wery, wos, wot, 'em, 'n 
 Topic 16: 'am, cook, squeers, ma'am, female 
 Topic 17: manager, pip, stage, convict, sergeant 
 Topic 18: ghost, sisters, trees, goblin, monks 
 Topic 19: traddles, agnes, --’, loved, umble 
 Topic 20: roads, village, mender, fountain, carriage 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 46 (approx. per word bound = -8.378, relative change = 1.849e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 47 (approx. per word bound = -8.378, relative change = 1.597e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 48 (approx. per word bound = -8.378, relative change = 1.722e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 49 (approx. per word bound = -8.378, relative change = 1.637e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 50 (approx. per word bound = -8.378, relative change = 1.637e-05) 
Topic 1: chaise, horses, fat, 'em, hostler 
 Topic 2: baron, bottle, undertaker, pipe, lord 
 Topic 3: uncle, widow, 'am, collector, niece 
 Topic 4: learned, societies, board, oliver, fat 
 Topic 5: magistrate, crowd, officer, attorney, judge 
 Topic 6: traddles, ’ly, waiter, baby, cart 
 Topic 7: ’am, sparsit, ith, ’th, wath 
 Topic 8: madame, spy, knitting, vengeance, wot 
 Topic 9: project, works, electronic, foundation, copyright 
 Topic 10: ’ly, ’em, thee, thou, ’am 
 Topic 11: doctor, sikes, oliver, yer, ’ve 
 Topic 12: squeers, ‘“, occasions, boat, respectable 
 Topic 13: prisoner, prison, sea, jury, prisoners 
 Topic 14: fat, brothers, charles, ,', nicholas 
 Topic 15: wery, wos, wot, 'em, 'n 
 Topic 16: 'am, cook, squeers, ma'am, female 
 Topic 17: manager, pip, stage, convict, sergeant 
 Topic 18: ghost, sisters, trees, goblin, monks 
 Topic 19: traddles, agnes, --’, loved, umble 
 Topic 20: roads, village, mender, fountain, carriage 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 51 (approx. per word bound = -8.378, relative change = 1.646e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 52 (approx. per word bound = -8.378, relative change = 1.708e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 53 (approx. per word bound = -8.377, relative change = 1.535e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 54 (approx. per word bound = -8.377, relative change = 1.259e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 55 (approx. per word bound = -8.377, relative change = 1.115e-05) 
Topic 1: chaise, horses, fat, 'em, hostler 
 Topic 2: baron, bottle, undertaker, pipe, lord 
 Topic 3: uncle, widow, 'am, collector, niece 
 Topic 4: learned, societies, board, beadle, oliver 
 Topic 5: magistrate, crowd, officer, attorney, judge 
 Topic 6: traddles, ’ly, waiter, baby, cart 
 Topic 7: ’am, sparsit, ith, ’th, wath 
 Topic 8: madame, spy, knitting, vengeance, wot 
 Topic 9: project, works, electronic, foundation, copyright 
 Topic 10: ’ly, ’em, thee, thou, ’am 
 Topic 11: doctor, sikes, oliver, yer, ’ve 
 Topic 12: squeers, ‘“, occasions, boat, respectable 
 Topic 13: prisoner, prison, sea, jury, prisoners 
 Topic 14: fat, brothers, charles, ,', nicholas 
 Topic 15: wery, wos, wot, 'em, 'n 
 Topic 16: 'am, cook, squeers, ma'am, female 
 Topic 17: manager, pip, convict, stage, sergeant 
 Topic 18: ghost, sisters, trees, goblin, monks 
 Topic 19: traddles, agnes, --’, loved, umble 
 Topic 20: roads, village, mender, fountain, carriage 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 56 (approx. per word bound = -8.377, relative change = 1.220e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 57 (approx. per word bound = -8.377, relative change = 1.253e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 58 (approx. per word bound = -8.377, relative change = 1.602e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 59 (approx. per word bound = -8.377, relative change = 1.436e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 60 (approx. per word bound = -8.377, relative change = 1.245e-05) 
Topic 1: chaise, horses, fat, 'em, hostler 
 Topic 2: baron, bottle, undertaker, pipe, lord 
 Topic 3: uncle, widow, 'am, collector, niece 
 Topic 4: beadle, board, oliver, learned, societies 
 Topic 5: magistrate, crowd, officer, attorney, judge 
 Topic 6: traddles, ’ly, baby, waiter, cart 
 Topic 7: ’am, sparsit, ith, ’th, wath 
 Topic 8: madame, spy, knitting, vengeance, wot 
 Topic 9: project, works, electronic, foundation, copyright 
 Topic 10: ’ly, ’em, thee, thou, ’am 
 Topic 11: doctor, sikes, oliver, yer, ’ve 
 Topic 12: squeers, ‘“, occasions, boat, respectable 
 Topic 13: prisoner, prison, sea, jury, prisoners 
 Topic 14: fat, brothers, charles, ,', nicholas 
 Topic 15: wery, wos, wot, 'em, 'n 
 Topic 16: 'am, cook, squeers, ma'am, madman 
 Topic 17: manager, pip, convict, stage, sergeant 
 Topic 18: ghost, sisters, trees, goblin, monks 
 Topic 19: traddles, agnes, --’, loved, umble 
 Topic 20: roads, village, mender, fountain, carriage 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 61 (approx. per word bound = -8.376, relative change = 1.693e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 62 (approx. per word bound = -8.376, relative change = 1.617e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 63 (approx. per word bound = -8.376, relative change = 1.814e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 64 (approx. per word bound = -8.376, relative change = 1.747e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 65 (approx. per word bound = -8.376, relative change = 1.686e-05) 
Topic 1: chaise, horses, 'em, fat, hostler 
 Topic 2: baron, bottle, undertaker, pipe, lord 
 Topic 3: uncle, widow, 'am, collector, niece 
 Topic 4: beadle, oliver, board, learned, societies 
 Topic 5: magistrate, crowd, officer, attorney, judge 
 Topic 6: traddles, ’ly, baby, waiter, cart 
 Topic 7: ’am, sparsit, ith, ’th, whelp 
 Topic 8: madame, spy, knitting, vengeance, wot 
 Topic 9: project, works, electronic, foundation, copyright 
 Topic 10: ’ly, ’em, thee, thou, ’am 
 Topic 11: doctor, sikes, oliver, yer, ’ve 
 Topic 12: squeers, ‘“, occasions, respectable, boat 
 Topic 13: prisoner, prison, sea, jury, prisoners 
 Topic 14: fat, brothers, charles, ,', nicholas 
 Topic 15: wery, wos, wot, 'n, 'em 
 Topic 16: 'am, cook, ma'am, squeers, madman 
 Topic 17: manager, pip, convict, stage, sergeant 
 Topic 18: ghost, sisters, trees, goblin, monks 
 Topic 19: traddles, agnes, --’, loved, umble 
 Topic 20: roads, village, mender, fountain, carriage 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 66 (approx. per word bound = -8.376, relative change = 1.680e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 67 (approx. per word bound = -8.376, relative change = 1.686e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 68 (approx. per word bound = -8.375, relative change = 1.655e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 69 (approx. per word bound = -8.375, relative change = 1.435e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 70 (approx. per word bound = -8.375, relative change = 1.149e-05) 
Topic 1: chaise, horses, 'em, fat, hostler 
 Topic 2: baron, bottle, undertaker, pipe, lord 
 Topic 3: uncle, widow, 'am, collector, niece 
 Topic 4: beadle, oliver, board, learned, societies 
 Topic 5: magistrate, crowd, officer, attorney, judge 
 Topic 6: traddles, ’ly, baby, waiter, cart 
 Topic 7: ’am, sparsit, ith, ’th, whelp 
 Topic 8: madame, spy, knitting, vengeance, wot 
 Topic 9: project, works, electronic, foundation, copyright 
 Topic 10: ’ly, ’em, thee, thou, ’am 
 Topic 11: doctor, sikes, oliver, yer, ’ve 
 Topic 12: squeers, ‘“, occasions, respectable, boat 
 Topic 13: prisoner, prison, sea, jury, prisoners 
 Topic 14: fat, brothers, charles, ,', nicholas 
 Topic 15: wery, wos, wot, 'n, ere 
 Topic 16: 'am, cook, ma'am, squeers, madman 
 Topic 17: manager, pip, convict, stage, sergeant 
 Topic 18: ghost, sisters, trees, goblin, monks 
 Topic 19: traddles, agnes, --’, loved, umble 
 Topic 20: roads, village, mender, fountain, carriage 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 71 (approx. per word bound = -8.375, relative change = 1.174e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 72 (approx. per word bound = -8.375, relative change = 1.220e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 73 (approx. per word bound = -8.375, relative change = 1.089e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 74 (approx. per word bound = -8.375, relative change = 1.041e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 75 (approx. per word bound = -8.375, relative change = 1.072e-05) 
Topic 1: chaise, horses, 'em, fat, hostler 
 Topic 2: baron, bottle, undertaker, pipe, lord 
 Topic 3: uncle, widow, 'am, collector, niece 
 Topic 4: beadle, oliver, board, learned, societies 
 Topic 5: magistrate, crowd, officer, attorney, pounds 
 Topic 6: traddles, ’ly, baby, waiter, cart 
 Topic 7: ’am, sparsit, ith, ’th, whelp 
 Topic 8: madame, spy, knitting, vengeance, wot 
 Topic 9: project, works, electronic, foundation, copyright 
 Topic 10: ’ly, ’em, thee, thou, ’am 
 Topic 11: doctor, sikes, oliver, yer, ’ve 
 Topic 12: squeers, ‘“, occasions, respectable, boat 
 Topic 13: prisoner, prison, sea, jury, prisoners 
 Topic 14: fat, brothers, charles, ,', nicholas 
 Topic 15: wery, wos, wot, 'n, ere 
 Topic 16: 'am, cook, ma'am, squeers, madman 
 Topic 17: manager, pip, convict, stage, sergeant 
 Topic 18: ghost, sisters, trees, goblin, monks 
 Topic 19: traddles, agnes, --’, loved, umble 
 Topic 20: roads, village, mender, fountain, carriage 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Model Converged 
Code
plot(stmOut_2, n = 10)

This is already a clear improvement. With ten words displayed per topic (n = 10), one topic pertains to water, with words like boat, river, wind, tide, sea and marshes. Another reflects a young man’s education, featuring boys, school, schoolmaster, son and desk. A topic pointing towards Dickens’ interest in justice contains prisoner, prison, jury, court, citizen and witness, while a closely related topic reflects the institutional dimension with magistrate, office, clerk, attorney, officer and judge. A topic pointing towards the otherworldly contains beneath, grave, goblin, churchyard, earth, church and wind, which captures the ambiance Dickens conjures at the opening of Great Expectations. We also now have several distinct topics capturing contractions and alternate spellings, such as ’em, thee and wery.

Several topics nevertheless remain hard to interpret. Still, this single adjustment, increasing K from 10 to 20, has taken us a meaningful step toward interpretable content analysis.
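
Reading word lists is one way to judge a change in K. Two numeric diagnostics offered by stm, semantic coherence and exclusivity, can complement it. The following is a minimal sketch, not part of the original workflow: so that it runs on its own, it uses the poliblog5k sample data shipped with stm rather than the Dickens corpus, and topic_quality() is a hypothetical helper introduced only for illustration.

```r
library(stm)

# Sample data shipped with stm (NOT the Dickens corpus); this loads
# poliblog5k.docs, poliblog5k.voc and poliblog5k.meta
data(poliblog5k)

# Keep the first 500 posts so the sketch runs quickly; prepDocuments()
# re-indexes the vocabulary after subsetting
prep <- prepDocuments(poliblog5k.docs[1:500], poliblog5k.voc,
                      poliblog5k.meta[1:500, ], verbose = FALSE)

# Hypothetical helper: fit a model with k topics and return the mean
# semantic coherence and mean exclusivity across its topics
topic_quality <- function(k) {
  fit <- stm(prep$documents, prep$vocab, K = k,
             max.em.its = 20, init.type = "Spectral", verbose = FALSE)
  c(K           = k,
    coherence   = mean(semanticCoherence(fit, prep$documents, M = 10)),
    exclusivity = mean(exclusivity(fit, M = 10)))
}

# One row per candidate K
quality <- rbind(topic_quality(10), topic_quality(20))
print(quality)
```

Higher values are better on both measures, but they often pull in opposite directions as K grows, so the numbers are best read alongside the word lists rather than instead of them. To apply the same idea to the models in this tutorial, the fitted objects (for example stmOut_2) and their documents in stm format would take the place of the sample data.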


Third Model: More Rare Words

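We can amplify the role of rare words by tightening the upper document-frequency threshold to 15%. With docfreq_type = "prop", max_docfreq = 0.15 removes every term that occurs in more than 15% of the documents, so the model has to build its topics from less widespread vocabulary.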
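
To see what an upper document-frequency threshold does in isolation, here is a minimal sketch on a four-document toy corpus. The toy texts and the 60% threshold are assumptions chosen for illustration and are not taken from the Dickens data.

```r
library(quanteda)

# Toy corpus (illustrative only, not the Dickens data)
toy <- c(d1 = "the boat on the river",
         d2 = "the prison and the jury",
         d3 = "the boat and the tide",
         d4 = "the goblin in the churchyard")
toy_dfm <- dfm(tokens(toy))

# Share of documents in which each term occurs
doc_share <- docfreq(toy_dfm) / ndoc(toy_dfm)
print(sort(doc_share, decreasing = TRUE))

# Drop every term found in more than 60% of the documents
toy_trimmed <- dfm_trim(toy_dfm, max_docfreq = 0.6, docfreq_type = "prop")

# Terms removed by the threshold
removed <- setdiff(featnames(toy_dfm), featnames(toy_trimmed))
print(removed)
```

Only "the", which occurs in all four toy documents, exceeds the threshold and is dropped. Terms such as "boat", found in half of the documents, survive. The call that follows applies the same kind of trimming to the tutorial's full document-term matrix.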

Code
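# Keep terms that occur at least twice overall and appear in
# 0.5% to 15% of the documents
dtm_trimmed_rare <- quanteda::dfm_trim(
  dtm,
  min_termfreq = 2,
  min_docfreq  = 0.005,
  max_docfreq  = 0.15,
  docfreq_type = "prop"
)

# Fix the seed so the run can be reproduced
set.seed(42)
# Fit a 20-topic model, allowing at most 200 EM iterations
stmOut_3 <- stm::stm(documents  = dtm_trimmed_rare,
                     K          = 20,
                     max.em.its = 200)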
Beginning Spectral Initialization 
     Calculating the gram matrix...
     Using only 10000 most frequent terms during initialization...
     Finding anchor words...
    ....................
     Recovering initialization...
    ....................................................................................................
Initialization complete.
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 1 (approx. per word bound = -8.892) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 2 (approx. per word bound = -8.380, relative change = 5.749e-02) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 3 (approx. per word bound = -8.317, relative change = 7.599e-03) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 4 (approx. per word bound = -8.296, relative change = 2.505e-03) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 5 (approx. per word bound = -8.285, relative change = 1.309e-03) 
Topic 1: steady, attended, nephew, faithful, hats 
 Topic 2: baron, bottle, lord, sisters, pipe 
 Topic 3: misery, doctor, scenes, brief, poverty 
 Topic 4: project, works, electronic, foundation, copyright 
 Topic 5: ghost, scrooge, merry, pointed, passes 
 Topic 6: ’am, sparsit, ith, ma’am, ’th 
 Topic 7: prisoner, judge, prison, court, jury 
 Topic 8: ’em, thou, thee, fur, tis 
 Topic 9: fat, magistrate, 'am, officer, crowd 
 Topic 10: wery, wos, wot, 'em, 'n 
 Topic 11: pip, sikes, ’ve, forge, ’em 
 Topic 12: doctor, agnes, baby, mama, miss 
 Topic 13: boat, ’ly, sea, river, tide 
 Topic 14: manager, crummles, squeers, stage, collector 
 Topic 15: charles, brothers, ned, fat, nicholas 
 Topic 16: oliver, yer, bottle, dodger, squeers 
 Topic 17: traddles, agnes, --’, loved, pounds 
 Topic 18: uncle, guard, horses, niece, collector 
 Topic 19: madame, village, spy, roads, mender 
 Topic 20: umble, agnes, copperfield, heep, ‘“ 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 6 (approx. per word bound = -8.278, relative change = 7.990e-04) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 7 (approx. per word bound = -8.274, relative change = 5.222e-04) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 8 (approx. per word bound = -8.271, relative change = 3.637e-04) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 9 (approx. per word bound = -8.269, relative change = 2.748e-04) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 10 (approx. per word bound = -8.267, relative change = 2.293e-04) 
Topic 1: nephew, clerk, yard, steady, attended 
 Topic 2: baron, bottle, sisters, lord, punch 
 Topic 3: monks, intelligence, misery, tonight, promise 
 Topic 4: project, works, electronic, foundation, copyright 
 Topic 5: ghost, scrooge, merry, pointed, reading 
 Topic 6: ’am, sparsit, ith, ma’am, ’em 
 Topic 7: prisoner, judge, prison, court, jury 
 Topic 8: thee, thou, ’em, fur, tis 
 Topic 9: fat, 'am, magistrate, officer, spinster 
 Topic 10: wery, wos, wot, 'n, 'em 
 Topic 11: pip, sikes, forge, ’ve, ’em 
 Topic 12: doctor, agnes, baby, miss, mama 
 Topic 13: ’ly, boat, sea, river, tide 
 Topic 14: manager, squeers, crummles, stage, collector 
 Topic 15: brothers, charles, ned, fat, nicholas 
 Topic 16: oliver, yer, dodger, sikes, bottle 
 Topic 17: traddles, agnes, --’, papa, pounds 
 Topic 18: uncle, guard, horses, collector, mail 
 Topic 19: madame, village, spy, roads, mender 
 Topic 20: umble, agnes, copperfield, heep, ‘“ 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 11 (approx. per word bound = -8.265, relative change = 2.051e-04) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 12 (approx. per word bound = -8.264, relative change = 1.836e-04) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 13 (approx. per word bound = -8.262, relative change = 1.655e-04) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 14 (approx. per word bound = -8.261, relative change = 1.733e-04) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 15 (approx. per word bound = -8.260, relative change = 1.721e-04) 
Topic 1: nephew, clerk, yard, uncle, ’ly 
 Topic 2: baron, bottle, lord, sisters, punch 
 Topic 3: monks, intelligence, tonight, misery, prison 
 Topic 4: project, works, electronic, foundation, copyright 
 Topic 5: ghost, scrooge, merry, pointed, reading 
 Topic 6: ’am, sparsit, ith, ma’am, ’em 
 Topic 7: prisoner, judge, prison, court, jury 
 Topic 8: thee, thou, ’em, fur, tis 
 Topic 9: fat, 'am, magistrate, officer, spinster 
 Topic 10: wery, wos, wot, 'n, 'em 
 Topic 11: pip, forge, ’ve, convict, marshes 
 Topic 12: doctor, agnes, baby, miss, mama 
 Topic 13: ’ly, boat, sea, river, tide 
 Topic 14: manager, crummles, squeers, stage, collector 
 Topic 15: brothers, charles, ned, fat, nicholas 
 Topic 16: sikes, oliver, yer, dodger, ’ve 
 Topic 17: traddles, agnes, --’, papa, pounds 
 Topic 18: uncle, guard, horses, chaise, mail 
 Topic 19: madame, village, spy, roads, mender 
 Topic 20: umble, agnes, copperfield, heep, ‘“ 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 16 (approx. per word bound = -8.258, relative change = 1.538e-04) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 17 (approx. per word bound = -8.257, relative change = 1.416e-04) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 18 (approx. per word bound = -8.256, relative change = 1.326e-04) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 19 (approx. per word bound = -8.255, relative change = 1.220e-04) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 20 (approx. per word bound = -8.254, relative change = 1.020e-04) 
Topic 1: nephew, clerk, yard, uncle, ’ly 
 Topic 2: baron, lord, bottle, wititterly, sisters 
 Topic 3: monks, intelligence, misery, tonight, oliver 
 Topic 4: project, works, electronic, foundation, copyright 
 Topic 5: ghost, scrooge, merry, pointed, reading 
 Topic 6: ’am, sparsit, ith, ma’am, ’em 
 Topic 7: prisoner, judge, prison, jury, court 
 Topic 8: thee, thou, ’em, fur, tis 
 Topic 9: fat, 'am, magistrate, officer, spinster 
 Topic 10: wery, wos, wot, 'n, 'em 
 Topic 11: pip, forge, convict, marshes, chap 
 Topic 12: doctor, agnes, baby, miss, cousin 
 Topic 13: ’ly, boat, sea, river, tide 
 Topic 14: manager, crummles, squeers, stage, collector 
 Topic 15: brothers, charles, ned, fat, nicholas 
 Topic 16: sikes, oliver, yer, dodger, ’ve 
 Topic 17: traddles, agnes, --’, papa, waiter 
 Topic 18: uncle, guard, horses, chaise, mail 
 Topic 19: madame, village, spy, roads, mender 
 Topic 20: umble, agnes, copperfield, heep, ‘“ 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 21 (approx. per word bound = -8.253, relative change = 9.149e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 22 (approx. per word bound = -8.253, relative change = 8.596e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 23 (approx. per word bound = -8.252, relative change = 8.224e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 24 (approx. per word bound = -8.251, relative change = 8.567e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 25 (approx. per word bound = -8.251, relative change = 7.979e-05) 
Topic 1: nephew, clerk, yard, uncle, ’ly 
 Topic 2: baron, lord, wititterly, bottle, sisters 
 Topic 3: monks, intelligence, misery, tonight, oliver 
 Topic 4: project, works, electronic, foundation, copyright 
 Topic 5: ghost, scrooge, pointed, merry, contrary 
 Topic 6: ’am, sparsit, ith, ma’am, ’em 
 Topic 7: prisoner, judge, prison, jury, court 
 Topic 8: thee, thou, ’em, fur, tis 
 Topic 9: fat, 'am, magistrate, spinster, officer 
 Topic 10: wery, wos, wot, 'n, ere 
 Topic 11: pip, forge, convict, marshes, chap 
 Topic 12: doctor, agnes, baby, miss, cousin 
 Topic 13: boat, ’ly, sea, river, tide 
 Topic 14: manager, crummles, squeers, stage, collector 
 Topic 15: brothers, charles, ned, fat, nicholas 
 Topic 16: sikes, oliver, yer, dodger, ’ve 
 Topic 17: traddles, agnes, --’, papa, waiter 
 Topic 18: uncle, guard, horses, chaise, mail 
 Topic 19: madame, spy, village, roads, mender 
 Topic 20: umble, agnes, copperfield, heep, ‘“ 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 26 (approx. per word bound = -8.250, relative change = 7.852e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 27 (approx. per word bound = -8.249, relative change = 7.503e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 28 (approx. per word bound = -8.249, relative change = 6.837e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 29 (approx. per word bound = -8.248, relative change = 6.246e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 30 (approx. per word bound = -8.248, relative change = 5.591e-05) 
Topic 1: nephew, clerk, yard, ’ly, uncle 
 Topic 2: baron, lord, wititterly, bottle, sisters 
 Topic 3: monks, intelligence, misery, tonight, oliver 
 Topic 4: project, works, electronic, foundation, copyright 
 Topic 5: ghost, scrooge, pointed, contrary, merry 
 Topic 6: ’am, sparsit, ith, ma’am, ’em 
 Topic 7: prisoner, judge, prison, jury, court 
 Topic 8: thee, thou, ’em, fur, tis 
 Topic 9: fat, 'am, magistrate, spinster, officer 
 Topic 10: wery, wos, wot, 'n, ere 
 Topic 11: pip, forge, convict, marshes, chap 
 Topic 12: doctor, agnes, baby, miss, cousin 
 Topic 13: boat, ’ly, sea, river, tide 
 Topic 14: manager, crummles, squeers, stage, collector 
 Topic 15: brothers, charles, ned, nicholas, fat 
 Topic 16: sikes, oliver, yer, dodger, ’ve 
 Topic 17: traddles, agnes, --’, papa, waiter 
 Topic 18: uncle, guard, horses, mail, chaise 
 Topic 19: madame, spy, village, roads, mender 
 Topic 20: agnes, umble, copperfield, heep, ’ve 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 31 (approx. per word bound = -8.247, relative change = 5.575e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 32 (approx. per word bound = -8.247, relative change = 5.382e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 33 (approx. per word bound = -8.247, relative change = 5.074e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 34 (approx. per word bound = -8.246, relative change = 4.962e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 35 (approx. per word bound = -8.246, relative change = 4.750e-05) 
Topic 1: nephew, clerk, yard, ’ly, uncle 
 Topic 2: baron, lord, wititterly, bottle, sisters 
 Topic 3: monks, intelligence, misery, tonight, oliver 
 Topic 4: project, works, electronic, foundation, copyright 
 Topic 5: ghost, scrooge, contrary, pointed, merry 
 Topic 6: ’am, sparsit, ith, ’em, ma’am 
 Topic 7: prisoner, judge, prison, jury, court 
 Topic 8: thee, thou, fur, ’em, tis 
 Topic 9: fat, 'am, magistrate, spinster, crowd 
 Topic 10: wery, wos, wot, 'n, ere 
 Topic 11: pip, forge, convict, marshes, chap 
 Topic 12: doctor, agnes, baby, miss, cousin 
 Topic 13: boat, ’ly, sea, river, tide 
 Topic 14: manager, crummles, squeers, stage, collector 
 Topic 15: brothers, charles, ned, nicholas, fat 
 Topic 16: sikes, oliver, yer, dodger, ’ve 
 Topic 17: traddles, agnes, --’, waiter, papa 
 Topic 18: uncle, guard, horses, mail, chaise 
 Topic 19: madame, spy, village, roads, mender 
 Topic 20: agnes, umble, copperfield, heep, ’ve 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 36 (approx. per word bound = -8.245, relative change = 4.512e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 37 (approx. per word bound = -8.245, relative change = 4.330e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 38 (approx. per word bound = -8.245, relative change = 4.475e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 39 (approx. per word bound = -8.244, relative change = 4.656e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 40 (approx. per word bound = -8.244, relative change = 4.235e-05) 
Topic 1: nephew, clerk, yard, ’ly, uncle 
 Topic 2: baron, lord, wititterly, bottle, sisters 
 Topic 3: monks, intelligence, misery, tonight, oliver 
 Topic 4: project, works, electronic, foundation, copyright 
 Topic 5: ghost, scrooge, contrary, pointed, merry 
 Topic 6: ’am, sparsit, ith, ’em, ma’am 
 Topic 7: prisoner, judge, prison, jury, court 
 Topic 8: thee, thou, fur, tis, ’em 
 Topic 9: fat, 'am, magistrate, spinster, crowd 
 Topic 10: wery, wos, wot, 'n, ere 
 Topic 11: pip, forge, convict, marshes, chap 
 Topic 12: doctor, agnes, baby, loved, miss 
 Topic 13: boat, ’ly, sea, river, tide 
 Topic 14: manager, crummles, squeers, stage, collector 
 Topic 15: brothers, charles, ned, nicholas, fat 
 Topic 16: oliver, sikes, yer, ’em, ’ve 
 Topic 17: traddles, --’, agnes, waiter, papa 
 Topic 18: uncle, guard, horses, mail, chaise 
 Topic 19: madame, spy, village, roads, mender 
 Topic 20: agnes, umble, heep, copperfield, ’ve 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 41 (approx. per word bound = -8.244, relative change = 3.791e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 42 (approx. per word bound = -8.243, relative change = 3.585e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 43 (approx. per word bound = -8.243, relative change = 3.486e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 44 (approx. per word bound = -8.243, relative change = 3.458e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 45 (approx. per word bound = -8.242, relative change = 3.632e-05) 
Topic 1: nephew, clerk, yard, ’ly, uncle 
 Topic 2: baron, lord, wititterly, bottle, sisters 
 Topic 3: monks, intelligence, misery, tonight, oliver 
 Topic 4: project, works, electronic, foundation, copyright 
 Topic 5: ghost, scrooge, contrary, pointed, merry 
 Topic 6: ’am, sparsit, ith, ’em, ma’am 
 Topic 7: prisoner, judge, prison, jury, court 
 Topic 8: thee, thou, fur, tis, ’em 
 Topic 9: fat, 'am, magistrate, spinster, crowd 
 Topic 10: wery, wos, wot, 'n, ere 
 Topic 11: pip, forge, convict, marshes, chap 
 Topic 12: doctor, agnes, baby, loved, cousin 
 Topic 13: boat, ’ly, sea, river, tide 
 Topic 14: manager, squeers, crummles, stage, collector 
 Topic 15: brothers, charles, ned, nicholas, fat 
 Topic 16: oliver, sikes, yer, ’em, ’ve 
 Topic 17: traddles, --’, agnes, waiter, papa 
 Topic 18: uncle, guard, horses, chaise, mail 
 Topic 19: madame, spy, village, roads, mender 
 Topic 20: agnes, umble, heep, copperfield, ’ve 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 46 (approx. per word bound = -8.242, relative change = 3.539e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 47 (approx. per word bound = -8.242, relative change = 3.385e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 48 (approx. per word bound = -8.242, relative change = 3.214e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 49 (approx. per word bound = -8.241, relative change = 2.977e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 50 (approx. per word bound = -8.241, relative change = 2.831e-05) 
Topic 1: nephew, clerk, yard, ’ly, uncle 
 Topic 2: baron, lord, bottle, wititterly, sisters 
 Topic 3: monks, intelligence, misery, tonight, oliver 
 Topic 4: project, works, electronic, foundation, copyright 
 Topic 5: ghost, scrooge, contrary, pointed, merry 
 Topic 6: ’am, sparsit, ith, ’em, ma’am 
 Topic 7: prisoner, judge, prison, jury, court 
 Topic 8: thee, thou, fur, tis, ’em 
 Topic 9: fat, 'am, magistrate, spinster, crowd 
 Topic 10: wery, wos, wot, 'n, ere 
 Topic 11: pip, forge, convict, marshes, chap 
 Topic 12: doctor, agnes, baby, loved, cousin 
 Topic 13: boat, ’ly, sea, river, tide 
 Topic 14: manager, squeers, crummles, stage, collector 
 Topic 15: brothers, charles, ned, nicholas, squeers 
 Topic 16: oliver, sikes, yer, ’em, beadle 
 Topic 17: traddles, --’, agnes, waiter, papa 
 Topic 18: uncle, guard, horses, chaise, mail 
 Topic 19: madame, spy, village, roads, mender 
 Topic 20: agnes, umble, heep, copperfield, ’ve 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 51 (approx. per word bound = -8.241, relative change = 2.890e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 52 (approx. per word bound = -8.241, relative change = 3.033e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 53 (approx. per word bound = -8.240, relative change = 2.978e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 54 (approx. per word bound = -8.240, relative change = 3.221e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 55 (approx. per word bound = -8.240, relative change = 3.348e-05) 
Topic 1: nephew, clerk, yard, ’ly, uncle 
 Topic 2: baron, lord, bottle, wititterly, sisters 
 Topic 3: monks, intelligence, misery, oliver, tonight 
 Topic 4: project, works, electronic, foundation, copyright 
 Topic 5: ghost, scrooge, contrary, pointed, merry 
 Topic 6: ’am, sparsit, ith, ’em, ma’am 
 Topic 7: prisoner, judge, prison, jury, court 
 Topic 8: thee, thou, fur, tis, ’em 
 Topic 9: fat, 'am, magistrate, spinster, crowd 
 Topic 10: wery, wos, wot, 'n, ere 
 Topic 11: pip, forge, convict, marshes, chap 
 Topic 12: doctor, agnes, baby, loved, cousin 
 Topic 13: boat, ’ly, sea, river, tide 
 Topic 14: manager, squeers, crummles, stage, collector 
 Topic 15: brothers, charles, ned, nicholas, squeers 
 Topic 16: oliver, sikes, yer, ’em, beadle 
 Topic 17: traddles, --’, waiter, agnes, commons 
 Topic 18: uncle, guard, horses, chaise, mail 
 Topic 19: madame, spy, village, roads, mender 
 Topic 20: agnes, umble, heep, copperfield, ’ve 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 56 (approx. per word bound = -8.240, relative change = 3.042e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 57 (approx. per word bound = -8.239, relative change = 2.914e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 58 (approx. per word bound = -8.239, relative change = 3.068e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 59 (approx. per word bound = -8.239, relative change = 3.075e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 60 (approx. per word bound = -8.239, relative change = 3.085e-05) 
Topic 1: nephew, clerk, yard, ’ly, uncle 
 Topic 2: baron, lord, bottle, wititterly, sisters 
 Topic 3: monks, intelligence, misery, oliver, tonight 
 Topic 4: project, works, electronic, foundation, copyright 
 Topic 5: ghost, scrooge, contrary, pointed, merry 
 Topic 6: ’am, sparsit, ith, ’em, ma’am 
 Topic 7: prisoner, judge, prison, jury, court 
 Topic 8: thee, thou, fur, tis, ’em 
 Topic 9: fat, 'am, magistrate, spinster, crowd 
 Topic 10: wery, wos, wot, 'n, ere 
 Topic 11: pip, forge, convict, marshes, chap 
 Topic 12: doctor, agnes, baby, loved, cousin 
 Topic 13: boat, ’ly, sea, river, tide 
 Topic 14: manager, squeers, crummles, stage, collector 
 Topic 15: brothers, charles, ned, nicholas, squeers 
 Topic 16: oliver, sikes, yer, ’em, beadle 
 Topic 17: traddles, --’, waiter, agnes, commons 
 Topic 18: uncle, guard, horses, chaise, mail 
 Topic 19: madame, spy, village, roads, mender 
 Topic 20: agnes, umble, copperfield, heep, ’ve 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 61 (approx. per word bound = -8.238, relative change = 3.291e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 62 (approx. per word bound = -8.238, relative change = 3.219e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 63 (approx. per word bound = -8.238, relative change = 2.863e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 64 (approx. per word bound = -8.238, relative change = 2.664e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 65 (approx. per word bound = -8.237, relative change = 2.482e-05) 
Topic 1: nephew, clerk, yard, ’ly, uncle 
 Topic 2: baron, bottle, lord, wititterly, sisters 
 Topic 3: monks, intelligence, misery, oliver, tonight 
 Topic 4: project, works, electronic, foundation, copyright 
 Topic 5: ghost, scrooge, contrary, pointed, merry 
 Topic 6: ’am, sparsit, ith, ’em, ma’am 
 Topic 7: prisoner, judge, prison, jury, court 
 Topic 8: thee, thou, fur, tis, ’em 
 Topic 9: fat, 'am, magistrate, spinster, crowd 
 Topic 10: wery, wos, wot, 'n, ere 
 Topic 11: pip, forge, convict, marshes, chap 
 Topic 12: doctor, agnes, baby, loved, cousin 
 Topic 13: boat, ’ly, sea, river, tide 
 Topic 14: manager, squeers, crummles, stage, collector 
 Topic 15: brothers, charles, ned, nicholas, squeers 
 Topic 16: oliver, sikes, yer, beadle, ’em 
 Topic 17: traddles, --’, waiter, agnes, commons 
 Topic 18: uncle, guard, horses, chaise, mail 
 Topic 19: madame, spy, village, roads, mender 
 Topic 20: agnes, umble, heep, copperfield, ’ve 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 66 (approx. per word bound = -8.237, relative change = 2.373e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 67 (approx. per word bound = -8.237, relative change = 2.662e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 68 (approx. per word bound = -8.237, relative change = 2.520e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 69 (approx. per word bound = -8.237, relative change = 2.332e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 70 (approx. per word bound = -8.236, relative change = 2.273e-05) 
Topic 1: nephew, clerk, yard, ’ly, uncle 
 Topic 2: baron, bottle, lord, wititterly, sisters 
 Topic 3: monks, intelligence, misery, oliver, tonight 
 Topic 4: project, works, electronic, foundation, copyright 
 Topic 5: ghost, scrooge, contrary, pointed, merry 
 Topic 6: ’am, sparsit, ith, ’em, ma’am 
 Topic 7: prisoner, judge, prison, jury, court 
 Topic 8: thee, thou, tis, fur, ’em 
 Topic 9: fat, 'am, magistrate, spinster, crowd 
 Topic 10: wery, wos, wot, 'n, ere 
 Topic 11: pip, forge, marshes, convict, chap 
 Topic 12: doctor, agnes, baby, loved, cousin 
 Topic 13: boat, ’ly, sea, river, tide 
 Topic 14: manager, squeers, crummles, stage, collector 
 Topic 15: brothers, charles, ned, nicholas, squeers 
 Topic 16: oliver, sikes, yer, beadle, ’em 
 Topic 17: traddles, --’, waiter, agnes, commons 
 Topic 18: uncle, guard, horses, chaise, mail 
 Topic 19: madame, spy, village, roads, mender 
 Topic 20: agnes, umble, copperfield, heep, ’ve 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 71 (approx. per word bound = -8.236, relative change = 2.127e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 72 (approx. per word bound = -8.236, relative change = 1.905e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 73 (approx. per word bound = -8.236, relative change = 1.963e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 74 (approx. per word bound = -8.236, relative change = 1.934e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 75 (approx. per word bound = -8.236, relative change = 2.017e-05) 
Topic 1: nephew, clerk, yard, ’ly, uncle 
 Topic 2: baron, bottle, lord, wititterly, sisters 
 Topic 3: monks, intelligence, misery, oliver, tonight 
 Topic 4: project, works, electronic, foundation, copyright 
 Topic 5: ghost, scrooge, contrary, pointed, merry 
 Topic 6: ’am, sparsit, ith, ’em, ma’am 
 Topic 7: prisoner, judge, prison, jury, court 
 Topic 8: thee, thou, tis, fur, ’em 
 Topic 9: fat, 'am, magistrate, spinster, crowd 
 Topic 10: wery, wos, wot, 'n, ere 
 Topic 11: pip, forge, marshes, convict, chap 
 Topic 12: doctor, agnes, baby, loved, cousin 
 Topic 13: boat, ’ly, sea, river, tide 
 Topic 14: manager, squeers, crummles, stage, collector 
 Topic 15: brothers, charles, ned, nicholas, squeers 
 Topic 16: oliver, sikes, yer, beadle, ’em 
 Topic 17: traddles, --’, waiter, agnes, commons 
 Topic 18: uncle, guard, horses, chaise, mail 
 Topic 19: madame, spy, village, roads, mender 
 Topic 20: agnes, umble, copperfield, heep, ’ve 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 76 (approx. per word bound = -8.235, relative change = 2.049e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 77 (approx. per word bound = -8.235, relative change = 1.705e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 78 (approx. per word bound = -8.235, relative change = 1.765e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 79 (approx. per word bound = -8.235, relative change = 1.645e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 80 (approx. per word bound = -8.235, relative change = 1.488e-05) 
Topic 1: nephew, clerk, yard, ’ly, uncle 
 Topic 2: baron, bottle, lord, wititterly, sisters 
 Topic 3: monks, intelligence, misery, oliver, tonight 
 Topic 4: project, works, electronic, foundation, copyright 
 Topic 5: ghost, scrooge, contrary, pointed, merry 
 Topic 6: ’am, sparsit, ith, ’em, ma’am 
 Topic 7: prisoner, judge, prison, jury, court 
 Topic 8: thee, thou, tis, fur, ’em 
 Topic 9: fat, 'am, magistrate, spinster, crowd 
 Topic 10: wery, wos, wot, 'n, ere 
 Topic 11: pip, forge, marshes, convict, chap 
 Topic 12: doctor, agnes, baby, loved, cousin 
 Topic 13: boat, ’ly, sea, river, tide 
 Topic 14: manager, squeers, crummles, stage, collector 
 Topic 15: brothers, charles, ned, nicholas, squeers 
 Topic 16: oliver, sikes, yer, beadle, ’em 
 Topic 17: traddles, --’, waiter, agnes, commons 
 Topic 18: uncle, guard, horses, chaise, mail 
 Topic 19: madame, spy, village, roads, mender 
 Topic 20: agnes, umble, heep, copperfield, ’ve 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 81 (approx. per word bound = -8.235, relative change = 1.450e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 82 (approx. per word bound = -8.235, relative change = 1.618e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 83 (approx. per word bound = -8.234, relative change = 1.830e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 84 (approx. per word bound = -8.234, relative change = 2.042e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 85 (approx. per word bound = -8.234, relative change = 1.786e-05) 
Topic 1: nephew, clerk, yard, ’ly, merry 
 Topic 2: baron, bottle, lord, wititterly, sisters 
 Topic 3: monks, intelligence, misery, oliver, tonight 
 Topic 4: project, works, electronic, foundation, copyright 
 Topic 5: ghost, scrooge, contrary, pointed, merry 
 Topic 6: ’am, sparsit, ith, ma’am, ’em 
 Topic 7: prisoner, judge, prison, jury, court 
 Topic 8: thee, thou, tis, fur, ’em 
 Topic 9: fat, 'am, magistrate, spinster, crowd 
 Topic 10: wery, wos, wot, 'n, ere 
 Topic 11: pip, forge, marshes, convict, chap 
 Topic 12: doctor, agnes, baby, loved, cousin 
 Topic 13: boat, ’ly, sea, river, tide 
 Topic 14: manager, squeers, crummles, stage, collector 
 Topic 15: brothers, charles, ned, nicholas, squeers 
 Topic 16: oliver, sikes, yer, ’em, beadle 
 Topic 17: traddles, --’, waiter, agnes, commons 
 Topic 18: uncle, guard, horses, chaise, mail 
 Topic 19: madame, spy, village, roads, mender 
 Topic 20: agnes, umble, heep, copperfield, ’ve 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 86 (approx. per word bound = -8.234, relative change = 1.583e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 87 (approx. per word bound = -8.234, relative change = 1.493e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 88 (approx. per word bound = -8.234, relative change = 1.476e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 89 (approx. per word bound = -8.234, relative change = 1.682e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 90 (approx. per word bound = -8.233, relative change = 1.774e-05) 
Topic 1: nephew, clerk, yard, ’ly, merry 
 Topic 2: baron, bottle, lord, wititterly, sisters 
 Topic 3: monks, intelligence, misery, oliver, tonight 
 Topic 4: project, works, electronic, foundation, copyright 
 Topic 5: ghost, scrooge, contrary, pointed, merry 
 Topic 6: ’am, sparsit, ith, ’em, ma’am 
 Topic 7: prisoner, judge, prison, jury, court 
 Topic 8: thee, thou, tis, fur, ’em 
 Topic 9: fat, 'am, magistrate, spinster, crowd 
 Topic 10: wery, wos, wot, 'n, ere 
 Topic 11: pip, forge, marshes, convict, chap 
 Topic 12: doctor, agnes, baby, loved, cousin 
 Topic 13: boat, ’ly, sea, river, tide 
 Topic 14: manager, squeers, crummles, stage, collector 
 Topic 15: charles, brothers, ned, nicholas, squeers 
 Topic 16: oliver, sikes, yer, ’em, beadle 
 Topic 17: traddles, --’, waiter, agnes, commons 
 Topic 18: uncle, guard, horses, chaise, mail 
 Topic 19: madame, spy, village, roads, mender 
 Topic 20: agnes, umble, copperfield, heep, ’ve 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 91 (approx. per word bound = -8.233, relative change = 1.790e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 92 (approx. per word bound = -8.233, relative change = 1.708e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 93 (approx. per word bound = -8.233, relative change = 1.671e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 94 (approx. per word bound = -8.233, relative change = 1.655e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 95 (approx. per word bound = -8.233, relative change = 1.414e-05) 
Topic 1: nephew, clerk, yard, ’ly, merry 
 Topic 2: baron, bottle, lord, wititterly, sisters 
 Topic 3: monks, intelligence, misery, oliver, tonight 
 Topic 4: project, works, electronic, foundation, copyright 
 Topic 5: ghost, scrooge, contrary, pointed, ’ly 
 Topic 6: ’am, sparsit, ith, ’em, ma’am 
 Topic 7: prisoner, judge, prison, jury, court 
 Topic 8: thee, thou, tis, fur, ’em 
 Topic 9: fat, 'am, magistrate, spinster, crowd 
 Topic 10: wery, wos, wot, 'n, ere 
 Topic 11: pip, forge, marshes, convict, chap 
 Topic 12: doctor, agnes, baby, loved, cousin 
 Topic 13: boat, ’ly, sea, river, tide 
 Topic 14: manager, squeers, crummles, stage, collector 
 Topic 15: charles, brothers, ned, nicholas, squeers 
 Topic 16: oliver, sikes, yer, ’em, beadle 
 Topic 17: traddles, --’, waiter, agnes, commons 
 Topic 18: uncle, guard, horses, chaise, mail 
 Topic 19: madame, spy, village, roads, mender 
 Topic 20: agnes, umble, copperfield, heep, ’ve 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 96 (approx. per word bound = -8.233, relative change = 1.500e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 97 (approx. per word bound = -8.233, relative change = 1.499e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 98 (approx. per word bound = -8.232, relative change = 1.478e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 99 (approx. per word bound = -8.232, relative change = 1.531e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 100 (approx. per word bound = -8.232, relative change = 1.377e-05) 
Topic 1: nephew, clerk, yard, ’ly, merry 
 Topic 2: baron, bottle, lord, wititterly, sisters 
 Topic 3: monks, intelligence, misery, oliver, tonight 
 Topic 4: project, works, electronic, foundation, copyright 
 Topic 5: ghost, scrooge, contrary, pointed, ’ly 
 Topic 6: ’am, sparsit, ith, ’em, ma’am 
 Topic 7: prisoner, judge, prison, jury, court 
 Topic 8: thee, thou, tis, fur, ’em 
 Topic 9: fat, 'am, magistrate, spinster, crowd 
 Topic 10: wery, wos, wot, 'n, ere 
 Topic 11: pip, forge, marshes, convict, chap 
 Topic 12: doctor, agnes, baby, loved, cousin 
 Topic 13: boat, ’ly, sea, river, tide 
 Topic 14: manager, squeers, crummles, stage, collector 
 Topic 15: charles, brothers, ned, nicholas, squeers 
 Topic 16: oliver, sikes, yer, ’em, beadle 
 Topic 17: traddles, --’, waiter, agnes, commons 
 Topic 18: uncle, guard, horses, chaise, mail 
 Topic 19: madame, spy, village, roads, mender 
 Topic 20: agnes, umble, copperfield, heep, ’ve 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 101 (approx. per word bound = -8.232, relative change = 1.479e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 102 (approx. per word bound = -8.232, relative change = 1.301e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 103 (approx. per word bound = -8.232, relative change = 1.230e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 104 (approx. per word bound = -8.232, relative change = 1.329e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 105 (approx. per word bound = -8.232, relative change = 1.581e-05) 
Topic 1: nephew, clerk, yard, ’ly, merry 
 Topic 2: baron, bottle, lord, wititterly, sisters 
 Topic 3: monks, intelligence, misery, oliver, tonight 
 Topic 4: project, works, electronic, foundation, copyright 
 Topic 5: ghost, scrooge, contrary, pointed, ’ly 
 Topic 6: ’am, sparsit, ith, ’em, ma’am 
 Topic 7: prisoner, judge, prison, jury, court 
 Topic 8: thee, thou, tis, fur, ’em 
 Topic 9: fat, 'am, magistrate, spinster, crowd 
 Topic 10: wery, wos, wot, 'n, ere 
 Topic 11: pip, forge, marshes, convict, chap 
 Topic 12: doctor, agnes, baby, loved, cousin 
 Topic 13: boat, ’ly, river, sea, tide 
 Topic 14: manager, squeers, crummles, stage, collector 
 Topic 15: charles, brothers, ned, nicholas, squeers 
 Topic 16: oliver, sikes, yer, ’em, beadle 
 Topic 17: traddles, --’, waiter, agnes, commons 
 Topic 18: uncle, guard, horses, chaise, mail 
 Topic 19: madame, spy, village, roads, mender 
 Topic 20: agnes, umble, copperfield, heep, ’ve 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 106 (approx. per word bound = -8.231, relative change = 1.554e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 107 (approx. per word bound = -8.231, relative change = 1.635e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 108 (approx. per word bound = -8.231, relative change = 1.658e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 109 (approx. per word bound = -8.231, relative change = 1.553e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 110 (approx. per word bound = -8.231, relative change = 1.501e-05) 
Topic 1: nephew, clerk, yard, ’ly, cousin 
 Topic 2: baron, bottle, lord, wititterly, sisters 
 Topic 3: monks, intelligence, misery, oliver, tonight 
 Topic 4: project, works, electronic, foundation, copyright 
 Topic 5: ghost, scrooge, contrary, pointed, ’ly 
 Topic 6: ’am, sparsit, ith, ’em, ma’am 
 Topic 7: prisoner, judge, prison, jury, court 
 Topic 8: thee, thou, tis, fur, ’em 
 Topic 9: fat, 'am, magistrate, spinster, crowd 
 Topic 10: wery, wos, wot, 'n, ere 
 Topic 11: pip, forge, marshes, convict, chap 
 Topic 12: doctor, agnes, baby, loved, ’am 
 Topic 13: boat, ’ly, river, sea, tide 
 Topic 14: manager, squeers, crummles, stage, collector 
 Topic 15: charles, brothers, ned, nicholas, squeers 
 Topic 16: oliver, sikes, yer, ’em, beadle 
 Topic 17: traddles, --’, waiter, agnes, commons 
 Topic 18: uncle, guard, horses, chaise, mail 
 Topic 19: madame, spy, village, roads, mender 
 Topic 20: agnes, umble, copperfield, heep, ’ve 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 111 (approx. per word bound = -8.231, relative change = 1.487e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 112 (approx. per word bound = -8.231, relative change = 1.552e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 113 (approx. per word bound = -8.231, relative change = 1.512e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 114 (approx. per word bound = -8.230, relative change = 1.519e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 115 (approx. per word bound = -8.230, relative change = 1.657e-05) 
Topic 1: nephew, clerk, yard, ’ly, cousin 
 Topic 2: baron, lord, bottle, wititterly, sisters 
 Topic 3: monks, intelligence, misery, oliver, tonight 
 Topic 4: project, works, electronic, foundation, copyright 
 Topic 5: ghost, scrooge, contrary, pointed, ’ly 
 Topic 6: ’am, sparsit, ith, ’em, ma’am 
 Topic 7: prisoner, judge, prison, jury, court 
 Topic 8: thee, thou, tis, fur, wi’ 
 Topic 9: fat, 'am, magistrate, spinster, crowd 
 Topic 10: wery, wos, wot, 'n, ere 
 Topic 11: pip, forge, marshes, convict, chap 
 Topic 12: doctor, agnes, baby, loved, ’am 
 Topic 13: boat, ’ly, river, sea, tide 
 Topic 14: manager, squeers, crummles, stage, collector 
 Topic 15: charles, brothers, ned, nicholas, squeers 
 Topic 16: oliver, sikes, yer, ’em, beadle 
 Topic 17: traddles, --’, waiter, agnes, commons 
 Topic 18: uncle, guard, horses, chaise, mail 
 Topic 19: madame, spy, village, roads, mender 
 Topic 20: agnes, umble, copperfield, heep, ’ve 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 116 (approx. per word bound = -8.230, relative change = 1.823e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 117 (approx. per word bound = -8.230, relative change = 1.618e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 118 (approx. per word bound = -8.230, relative change = 1.452e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 119 (approx. per word bound = -8.230, relative change = 1.174e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 120 (approx. per word bound = -8.230, relative change = 1.164e-05) 
Topic 1: nephew, clerk, yard, ’ly, cousin 
 Topic 2: baron, lord, bottle, wititterly, sisters 
 Topic 3: monks, intelligence, misery, oliver, tonight 
 Topic 4: project, works, electronic, foundation, copyright 
 Topic 5: ghost, scrooge, contrary, pointed, ’ly 
 Topic 6: ’am, sparsit, ith, ’em, ma’am 
 Topic 7: prisoner, judge, prison, jury, court 
 Topic 8: thee, thou, tis, fur, wi’ 
 Topic 9: fat, 'am, magistrate, spinster, crowd 
 Topic 10: wery, wos, wot, 'n, ere 
 Topic 11: pip, forge, marshes, convict, chap 
 Topic 12: doctor, agnes, baby, loved, ’am 
 Topic 13: boat, ’ly, river, sea, tide 
 Topic 14: manager, squeers, crummles, stage, collector 
 Topic 15: charles, brothers, ned, nicholas, squeers 
 Topic 16: oliver, sikes, yer, ’em, beadle 
 Topic 17: traddles, --’, waiter, agnes, commons 
 Topic 18: uncle, guard, horses, chaise, mail 
 Topic 19: madame, spy, village, roads, mender 
 Topic 20: agnes, umble, copperfield, heep, ’ve 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 121 (approx. per word bound = -8.230, relative change = 1.197e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 122 (approx. per word bound = -8.230, relative change = 1.290e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 123 (approx. per word bound = -8.229, relative change = 1.263e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 124 (approx. per word bound = -8.229, relative change = 1.173e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 125 (approx. per word bound = -8.229, relative change = 1.220e-05) 
Topic 1: nephew, clerk, yard, ’ly, cousin 
 Topic 2: baron, lord, bottle, wititterly, sisters 
 Topic 3: monks, intelligence, misery, oliver, tonight 
 Topic 4: project, works, electronic, foundation, copyright 
 Topic 5: ghost, scrooge, contrary, pointed, ’ly 
 Topic 6: ’am, sparsit, ith, ’em, ma’am 
 Topic 7: prisoner, judge, prison, jury, court 
 Topic 8: thee, thou, tis, fur, wi’ 
 Topic 9: fat, 'am, magistrate, spinster, crowd 
 Topic 10: wery, wos, wot, 'n, ere 
 Topic 11: pip, forge, marshes, convict, chap 
 Topic 12: doctor, agnes, baby, loved, ’am 
 Topic 13: boat, ’ly, river, sea, tide 
 Topic 14: manager, squeers, crummles, stage, collector 
 Topic 15: charles, brothers, ned, squeers, nicholas 
 Topic 16: oliver, sikes, yer, ’em, beadle 
 Topic 17: traddles, --’, waiter, agnes, commons 
 Topic 18: uncle, guard, horses, chaise, mail 
 Topic 19: madame, spy, village, roads, mender 
 Topic 20: agnes, umble, copperfield, heep, ’ve 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 126 (approx. per word bound = -8.229, relative change = 1.439e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 127 (approx. per word bound = -8.229, relative change = 1.480e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 128 (approx. per word bound = -8.229, relative change = 1.353e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 129 (approx. per word bound = -8.229, relative change = 1.181e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 130 (approx. per word bound = -8.229, relative change = 1.139e-05) 
Topic 1: nephew, clerk, yard, ’ly, cousin 
 Topic 2: baron, lord, bottle, wititterly, sisters 
 Topic 3: monks, intelligence, misery, oliver, tonight 
 Topic 4: project, works, electronic, foundation, copyright 
 Topic 5: ghost, scrooge, contrary, pointed, ’ly 
 Topic 6: ’am, sparsit, ith, ’em, ma’am 
 Topic 7: prisoner, judge, prison, jury, court 
 Topic 8: thee, thou, tis, fur, wi’ 
 Topic 9: fat, magistrate, 'am, spinster, crowd 
 Topic 10: wery, wos, wot, 'n, ere 
 Topic 11: pip, forge, marshes, convict, chap 
 Topic 12: doctor, agnes, baby, loved, ’am 
 Topic 13: boat, ’ly, river, sea, tide 
 Topic 14: manager, squeers, crummles, stage, collector 
 Topic 15: charles, brothers, ned, squeers, nicholas 
 Topic 16: oliver, sikes, yer, ’em, beadle 
 Topic 17: traddles, --’, waiter, commons, agnes 
 Topic 18: uncle, guard, horses, chaise, mail 
 Topic 19: madame, village, spy, roads, mender 
 Topic 20: agnes, umble, copperfield, heep, ’ve 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 131 (approx. per word bound = -8.229, relative change = 1.175e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 132 (approx. per word bound = -8.229, relative change = 1.086e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 133 (approx. per word bound = -8.228, relative change = 1.038e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Model Converged 

We compare the vocabulary sizes of the two matrices (12,123 features in dtm_trimmed_rare against 18,511 in dtm_trimmed):

Code
dtm_trimmed_rare
Document-feature matrix of: 736 documents, 12,123 features (97.45% sparse) and 0 docvars.
    features
docs ghost anyone anywhere parts cost restrictions whatsoever copy re-use
   1     5      1        1     1    1            1          1    1      1
   2     0      0        0     0    1            0          0    1      0
   3    24      0        1     0    0            0          0    0      0
   4    15      0        0     1    0            0          0    0      0
   5     7      0        0     0    0            0          0    0      0
   6    11      0        0     0    1            0          0    0      0
    features
docs included
   1        1
   2        0
   3        0
   4        0
   5        0
   6        0
[ reached max_ndoc ... 730 more documents, reached max_nfeat ... 12,113 more features ]
Code
dtm_trimmed
Document-feature matrix of: 736 documents, 18,511 features (98.22% sparse) and 0 docvars.
    features
docs ghost anyone anywhere parts cost restrictions whatsoever copy re-use
   1     5      1        1     1    1            1          1    1      1
   2     0      0        0     0    1            0          0    1      0
   3    24      0        1     0    0            0          0    0      0
   4    15      0        0     1    0            0          0    0      0
   5     7      0        0     0    0            0          0    0      0
   6    11      0        0     0    1            0          0    0      0
    features
docs included
   1        1
   2        0
   3        0
   4        0
   5        0
   6        0
[ reached max_ndoc ... 730 more documents, reached max_nfeat ... 18,501 more features ]
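
The printouts above report the feature counts in their header lines. As a hedged sketch of a more direct comparison, quanteda::nfeat() returns the number of features in a matrix. The toy corpus and the names toy_dfm, toy_loose, and toy_tight are made up for illustration; in the tutorial one would call nfeat() on dtm_trimmed and dtm_trimmed_rare instead.

```r
library(quanteda)

# Toy corpus standing in for the Dickens chunks (hypothetical data)
toy_txt <- c(d1 = "ghost ghost clerk nephew",
             d2 = "ghost river boat tide",
             d3 = "ghost clerk river sea",
             d4 = "ghost judge jury court")
toy_dfm <- dfm(tokens(toy_txt))

# Loose upper bound: terms may occur in every document
toy_loose <- dfm_trim(toy_dfm, max_docfreq = 1.0, docfreq_type = "prop")

# Tighter upper bound: drop terms found in more than half of the documents
toy_tight <- dfm_trim(toy_dfm, max_docfreq = 0.5, docfreq_type = "prop")

# Compare vocabulary sizes side by side
vocab_sizes <- c(loose = nfeat(toy_loose), tight = nfeat(toy_tight))
print(vocab_sizes)
```

In this toy example only "ghost" occurs in more than half of the documents, so the tighter matrix has one feature fewer than the loose one.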
Code
plot(stmOut_3, n = 10)

Tightening the upper frequency threshold does not yield a clear improvement here. Some new words appear, but so do character names that the proper noun removal missed, which indicates that udpipe did not tag a small number of names as PROPN. For current purposes we continue with this imperfect data, which is a realistic situation: data always have some quirks.
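
The tutorial deliberately carries on with the imperfect data. If one did want to remove leftover names, one possible approach (an assumption on our part, not a step the tutorial performs) is a small manual stoplist passed to quanteda::dfm_remove(). The vector leftover_names is hypothetical and simply collects a few names visible in the topic output above; toy_dfm stands in for the tutorial's dtm.

```r
library(quanteda)

# Hypothetical stoplist: character names that appear in the topic output
leftover_names <- c("scrooge", "squeers", "traddles", "agnes",
                    "oliver", "sikes", "pip")

# Toy matrix standing in for the tutorial's document-feature matrix
toy_dfm <- dfm(tokens(c(d1 = "scrooge saw the ghost",
                        d2 = "oliver met the beadle")))

# Drop every feature that matches an entry in the stoplist
toy_clean <- dfm_remove(toy_dfm, pattern = leftover_names)

# Inspect what is left
print(featnames(toy_clean))
```

A manual list of this kind only catches the names one has already noticed, so it complements the POS-based removal and does not replace it.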


Fourth Model: More Common Words, More Topics

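We explore what happens when we shift the vocabulary towards more common words. Raising the minimum document frequency to 1% drops every term found in fewer than 1% of the chunks; the upper bound in the call is 25%. We fit this model with 30 topics.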
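
To make the proportional thresholds in the call below concrete, this sketch converts them into chunk counts. It assumes the 736 chunks reported in the matrix printouts above, and it assumes that a term is kept when its document frequency lies within the bounds, inclusive. The names n_docs, min_docs, and max_docs are illustrative and are not objects from the tutorial.

```r
# Assumption: 736 chunks, as shown in the document-feature matrix printouts
n_docs <- 736

# Proportional bounds used with docfreq_type = "prop"
min_prop <- 0.01
max_prop <- 0.25

# Smallest whole number of chunks that reaches the lower bound
min_docs <- ceiling(min_prop * n_docs)

# Largest whole number of chunks that stays within the upper bound
max_docs <- floor(max_prop * n_docs)

cat("A term must occur in at least", min_docs,
    "and at most", max_docs, "chunks\n")
```

Under these assumptions a term has to appear in at least 8 and at most 184 of the 736 chunks to stay in the vocabulary.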

Code
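# Keep terms that occur at least twice overall and in 1% to 25% of the chunks
dtm_trimmed_common <- quanteda::dfm_trim(
  dtm,
  min_termfreq = 2,
  min_docfreq  = 0.01,   # lower bound, as a proportion of documents
  max_docfreq  = 0.25,   # upper bound, as a proportion of documents
  docfreq_type = "prop"  # interpret the docfreq bounds as proportions
)

# Fix the seed for reproducibility, then fit a model with 30 topics
set.seed(42)
stmOut_4 <- stm::stm(documents = dtm_trimmed_common,
                     K          = 30,
                     max.em.its = 200)  # at most 200 EM iterations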
Beginning Spectral Initialization 
     Calculating the gram matrix...
     Finding anchor words...
    ..............................
     Recovering initialization...
    ................................................................................
Initialization complete.
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 1 (approx. per word bound = -8.265) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 2 (approx. per word bound = -8.031, relative change = 2.825e-02) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 3 (approx. per word bound = -7.980, relative change = 6.448e-03) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 4 (approx. per word bound = -7.959, relative change = 2.581e-03) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 5 (approx. per word bound = -7.949, relative change = 1.286e-03) 
Topic 1: merry, occasions, exists, respect, faithful 
 Topic 2: respect, neighbourhood, faithful, steady, passes 
 Topic 3: project, works, electronic, terms, foundation 
 Topic 4: aunt, traddles, agnes, doctor, --’ 
 Topic 5: invariably, umble, proud, weep, wonder 
 Topic 6: ha, ’em, thee, ’d, wi’ 
 Topic 7: magistrate, crowd, 'am, ma, office 
 Topic 8: hats, nephew, clerk, scrooge, ghost 
 Topic 9: beneath, goblin, earth, sisters, sun 
 Topic 10: wine, madame, husband, spy, shop 
 Topic 11: ’ly, ’m, ’d, ai, fur 
 Topic 12: wery, 'm, says, 'd, 're 
 Topic 13: brother, brothers, charles, ned, son 
 Topic 14: oliver, beadle, undertaker, boys, board 
 Topic 15: prisoner, ghost, prison, spirit, husband 
 Topic 16: manager, crummles, ladies, collector, stage 
 Topic 17: fat, garden, 're, ladies, 'oh 
 Topic 18: squeers, 'am, ma, boys, 'd 
 Topic 19: traddles, boys, school, book, shop 
 Topic 20: river, boat, sea, tide, wind 
 Topic 21: judge, court, jury, clerk, witness 
 Topic 22: roads, village, wine, stone, mender 
 Topic 23: uncle, sword, widow, collector, niece 
 Topic 24: attachment, agnes, papa, lived, silent 
 Topic 25: yer, sikes, dodger, ’ve, dog 
 Topic 26: ma, ’am, matron, sparsit, stranger 
 Topic 27: —and, pip, gate, ’d, shoulder 
 Topic 28: attended, letter, traddles, confidence, punch 
 Topic 29: fat, horse, stranger, chaise, ladies 
 Topic 30: doctor, giles, officer, ladies, miss 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 6 (approx. per word bound = -7.944, relative change = 6.220e-04) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 7 (approx. per word bound = -7.941, relative change = 3.697e-04) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 8 (approx. per word bound = -7.939, relative change = 2.731e-04) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 9 (approx. per word bound = -7.937, relative change = 2.540e-04) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 10 (approx. per word bound = -7.935, relative change = 2.663e-04) 
Topic 1: merry, occasions, exists, grey, coachman 
 Topic 2: respect, neighbourhood, faithful, steady, passes 
 Topic 3: project, works, electronic, terms, foundation 
 Topic 4: aunt, traddles, agnes, doctor, --’ 
 Topic 5: umble, invariably, baby, copperfield, wonder 
 Topic 6: ha, ’em, thee, wi’, ’d 
 Topic 7: magistrate, crowd, 'am, ma, office 
 Topic 8: nephew, hats, scrooge, ghost, clerk 
 Topic 9: beneath, sisters, goblin, earth, sun 
 Topic 10: madame, wine, husband, spy, streets 
 Topic 11: ’ly, ’m, ’d, ai, fur 
 Topic 12: wery, 'm, says, 'd, 're 
 Topic 13: brother, brothers, charles, ned, lord 
 Topic 14: oliver, beadle, undertaker, boys, board 
 Topic 15: prisoner, ghost, prison, spirit, husband 
 Topic 16: manager, crummles, ladies, collector, stage 
 Topic 17: fat, garden, ladies, 're, 'oh 
 Topic 18: squeers, boys, 'am, 'd, ma 
 Topic 19: traddles, boys, school, book, shop 
 Topic 20: river, boat, sea, tide, wind 
 Topic 21: judge, court, jury, clerk, prisoner 
 Topic 22: roads, village, wine, stone, mender 
 Topic 23: uncle, sword, widow, collector, niece 
 Topic 24: attachment, agnes, lived, papa, silent 
 Topic 25: sikes, yer, dodger, ’ve, dog 
 Topic 26: ma, ’am, matron, sparsit, stranger 
 Topic 27: —and, pip, ’d, ’re, gate 
 Topic 28: attended, traddles, letter, confidence, punch 
 Topic 29: fat, horse, stranger, chaise, ladies 
 Topic 30: doctor, giles, miss, ladies, officer 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 11 (approx. per word bound = -7.933, relative change = 2.611e-04) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 12 (approx. per word bound = -7.931, relative change = 2.585e-04) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 13 (approx. per word bound = -7.929, relative change = 2.273e-04) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 14 (approx. per word bound = -7.927, relative change = 2.113e-04) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 15 (approx. per word bound = -7.925, relative change = 2.111e-04) 
Topic 1: merry, occasions, coachman, guard, hearts 
 Topic 2: respect, neighbourhood, steady, faithful, sisters 
 Topic 3: project, works, electronic, terms, foundation 
 Topic 4: aunt, traddles, agnes, doctor, --’ 
 Topic 5: umble, invariably, baby, copperfield, wonder 
 Topic 6: ha, ’em, thee, wi’, ’t 
 Topic 7: magistrate, 'am, ma, crowd, office 
 Topic 8: nephew, hats, scrooge, ghost, game 
 Topic 9: beneath, goblin, sun, grave, earth 
 Topic 10: madame, wine, husband, spy, streets 
 Topic 11: ’ly, ’m, ’d, ai, fur 
 Topic 12: wery, 'm, says, 'd, 're 
 Topic 13: brother, brothers, charles, ned, lord 
 Topic 14: oliver, beadle, undertaker, boys, board 
 Topic 15: prisoner, ghost, spirit, prison, husband 
 Topic 16: manager, crummles, ladies, collector, stage 
 Topic 17: fat, garden, ladies, 'oh, 're 
 Topic 18: squeers, boys, 'd, 'am, 'oh 
 Topic 19: traddles, boys, school, book, somebody 
 Topic 20: boat, river, sea, tide, wind 
 Topic 21: judge, court, jury, clerk, prisoner 
 Topic 22: wine, roads, village, stone, mender 
 Topic 23: uncle, widow, sword, ma, 'am 
 Topic 24: attachment, agnes, lived, silent, papa 
 Topic 25: sikes, yer, dodger, dog, ’ve 
 Topic 26: ma, ’am, matron, sparsit, stranger 
 Topic 27: —and, pip, ’d, ’re, gate 
 Topic 28: attended, traddles, letter, confidence, punch 
 Topic 29: fat, horse, chaise, stranger, ladies 
 Topic 30: doctor, giles, miss, ladies, officer 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 16 (approx. per word bound = -7.924, relative change = 1.879e-04) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 17 (approx. per word bound = -7.923, relative change = 1.504e-04) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 18 (approx. per word bound = -7.922, relative change = 1.223e-04) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 19 (approx. per word bound = -7.921, relative change = 1.015e-04) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 20 (approx. per word bound = -7.920, relative change = 9.651e-05) 
Topic 1: merry, occasions, coachman, guard, drive 
 Topic 2: respect, neighbourhood, sisters, steady, faithful 
 Topic 3: project, works, electronic, terms, foundation 
 Topic 4: aunt, traddles, agnes, doctor, --’ 
 Topic 5: umble, invariably, copperfield, wonder, baby 
 Topic 6: ha, thee, ’em, wi’, ’t 
 Topic 7: magistrate, 'am, ma, crowd, office 
 Topic 8: nephew, hats, scrooge, game, uncle 
 Topic 9: beneath, goblin, sun, grave, scene 
 Topic 10: madame, wine, husband, spy, streets 
 Topic 11: ’ly, ’m, ’d, ai, fur 
 Topic 12: wery, 'm, says, 'd, 're 
 Topic 13: brother, brothers, charles, ned, lord 
 Topic 14: oliver, beadle, undertaker, boys, board 
 Topic 15: prisoner, ghost, spirit, prison, husband 
 Topic 16: manager, crummles, ladies, collector, stage 
 Topic 17: fat, garden, ladies, 'oh, 're 
 Topic 18: squeers, boys, 'd, 'oh, thou 
 Topic 19: traddles, school, boys, book, waiter 
 Topic 20: boat, river, sea, tide, wind 
 Topic 21: judge, court, clerk, jury, prisoner 
 Topic 22: wine, roads, village, stone, mender 
 Topic 23: uncle, widow, sword, ma, 'am 
 Topic 24: attachment, agnes, lived, silent, merry 
 Topic 25: sikes, yer, dodger, dog, ’ve 
 Topic 26: ma, ’am, matron, sparsit, stranger 
 Topic 27: —and, pip, ’d, ’re, guardian 
 Topic 28: attended, traddles, letter, confidence, punch 
 Topic 29: fat, horse, chaise, stranger, ladies 
 Topic 30: doctor, giles, miss, ladies, officer 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 21 (approx. per word bound = -7.919, relative change = 8.185e-05) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 22 (approx. per word bound = -7.919, relative change = 7.642e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 23 (approx. per word bound = -7.918, relative change = 7.208e-05) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 24 (approx. per word bound = -7.918, relative change = 6.386e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 25 (approx. per word bound = -7.917, relative change = 5.909e-05) 
Topic 1: merry, occasions, coachman, guard, drive 
 Topic 2: respect, sisters, neighbourhood, faithful, steady 
 Topic 3: project, works, electronic, terms, foundation 
 Topic 4: aunt, traddles, agnes, --’, loved 
 Topic 5: umble, invariably, copperfield, wonder, proud 
 Topic 6: ha, thee, ’em, wi’, ’t 
 Topic 7: magistrate, 'am, ma, crowd, office 
 Topic 8: nephew, hats, scrooge, game, uncle 
 Topic 9: beneath, goblin, sun, grave, scene 
 Topic 10: madame, wine, husband, spy, streets 
 Topic 11: ’ly, ’m, ’d, ai, fur 
 Topic 12: wery, 'm, says, 'd, 're 
 Topic 13: brother, brothers, charles, ned, lord 
 Topic 14: oliver, beadle, undertaker, boys, board 
 Topic 15: prisoner, ghost, spirit, prison, husband 
 Topic 16: manager, crummles, ladies, collector, stage 
 Topic 17: fat, garden, ladies, 'oh, 're 
 Topic 18: squeers, boys, 'd, 'oh, thou 
 Topic 19: traddles, school, boys, book, waiter 
 Topic 20: boat, river, sea, tide, wind 
 Topic 21: judge, court, clerk, jury, prisoner 
 Topic 22: wine, roads, village, stone, mender 
 Topic 23: uncle, widow, sword, ma, 'am 
 Topic 24: attachment, agnes, lived, waiter, spirit 
 Topic 25: sikes, yer, dodger, ’ve, dog 
 Topic 26: ma, ’am, matron, sparsit, stranger 
 Topic 27: —and, pip, ’d, ’re, guardian 
 Topic 28: attended, traddles, letter, confidence, punch 
 Topic 29: fat, horse, chaise, stranger, ladies 
 Topic 30: doctor, giles, miss, ladies, officer 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 26 (approx. per word bound = -7.917, relative change = 5.554e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 27 (approx. per word bound = -7.917, relative change = 4.674e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 28 (approx. per word bound = -7.916, relative change = 5.547e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 29 (approx. per word bound = -7.916, relative change = 4.862e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 30 (approx. per word bound = -7.915, relative change = 4.961e-05) 
Topic 1: merry, occasions, coachman, guard, drive 
 Topic 2: sisters, respect, neighbourhood, grey, faithful 
 Topic 3: project, works, electronic, terms, foundation 
 Topic 4: aunt, traddles, agnes, --’, loved 
 Topic 5: umble, invariably, copperfield, wonder, agnes 
 Topic 6: ha, thee, ’em, ’t, wi’ 
 Topic 7: magistrate, 'am, ma, crowd, office 
 Topic 8: hats, nephew, scrooge, uncle, ghost 
 Topic 9: beneath, goblin, grave, sun, scene 
 Topic 10: madame, wine, husband, spy, streets 
 Topic 11: ’ly, ’m, ’d, ai, fur 
 Topic 12: wery, 'm, says, 'd, 're 
 Topic 13: brother, brothers, charles, ned, lord 
 Topic 14: oliver, beadle, undertaker, boys, board 
 Topic 15: prisoner, ghost, spirit, husband, prison 
 Topic 16: manager, crummles, ladies, collector, stage 
 Topic 17: fat, garden, ladies, 'oh, 're 
 Topic 18: squeers, boys, 'd, 'oh, thou 
 Topic 19: traddles, school, boys, book, waiter 
 Topic 20: boat, river, sea, tide, wind 
 Topic 21: judge, court, clerk, jury, prisoner 
 Topic 22: wine, roads, village, mender, stone 
 Topic 23: uncle, widow, ma, sword, 'am 
 Topic 24: attachment, agnes, lived, waiter, wrong 
 Topic 25: sikes, yer, dodger, dog, ’ve 
 Topic 26: ma, ’am, matron, sparsit, stranger 
 Topic 27: —and, pip, ’d, guardian, ’re 
 Topic 28: attended, traddles, letter, confidence, punch 
 Topic 29: fat, horse, stranger, chaise, ladies 
 Topic 30: doctor, giles, miss, ladies, officer 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 31 (approx. per word bound = -7.915, relative change = 4.648e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 32 (approx. per word bound = -7.915, relative change = 4.165e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 33 (approx. per word bound = -7.914, relative change = 4.312e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 34 (approx. per word bound = -7.914, relative change = 4.092e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 35 (approx. per word bound = -7.914, relative change = 3.690e-05) 
Topic 1: merry, occasions, coachman, guard, drive 
 Topic 2: sisters, neighbourhood, grey, respect, loved 
 Topic 3: project, works, electronic, terms, foundation 
 Topic 4: aunt, traddles, agnes, --’, loved 
 Topic 5: umble, invariably, copperfield, wonder, agnes 
 Topic 6: ha, thee, ’em, ’t, wi’ 
 Topic 7: magistrate, 'am, ma, crowd, office 
 Topic 8: hats, nephew, scrooge, ghost, uncle 
 Topic 9: beneath, goblin, grave, scene, sun 
 Topic 10: madame, wine, husband, spy, streets 
 Topic 11: ’ly, ’m, ’d, ai, fur 
 Topic 12: wery, 'm, says, 'd, 're 
 Topic 13: brother, brothers, charles, ned, lord 
 Topic 14: oliver, beadle, undertaker, boys, board 
 Topic 15: prisoner, ghost, spirit, husband, prison 
 Topic 16: manager, crummles, ladies, collector, stage 
 Topic 17: fat, ladies, garden, 'oh, 're 
 Topic 18: squeers, boys, 'd, 'oh, thou 
 Topic 19: traddles, boys, school, book, waiter 
 Topic 20: boat, river, sea, tide, wind 
 Topic 21: judge, court, clerk, jury, prisoner 
 Topic 22: wine, roads, village, mender, stone 
 Topic 23: uncle, widow, ma, sword, 'am 
 Topic 24: attachment, agnes, lived, waiter, wrong 
 Topic 25: sikes, yer, dodger, dog, ’ve 
 Topic 26: ma, ’am, matron, sparsit, stranger 
 Topic 27: —and, pip, ’d, guardian, ’re 
 Topic 28: attended, traddles, letter, confidence, punch 
 Topic 29: horse, fat, stranger, chaise, ladies 
 Topic 30: doctor, giles, miss, ladies, officer 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 36 (approx. per word bound = -7.913, relative change = 3.590e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 37 (approx. per word bound = -7.913, relative change = 3.518e-05) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 38 (approx. per word bound = -7.913, relative change = 3.500e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 39 (approx. per word bound = -7.913, relative change = 3.535e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 40 (approx. per word bound = -7.912, relative change = 3.576e-05) 
Topic 1: merry, occasions, coachman, guard, drive 
 Topic 2: sisters, grey, garden, neighbourhood, loved 
 Topic 3: project, works, electronic, terms, foundation 
 Topic 4: aunt, traddles, agnes, --’, loved 
 Topic 5: umble, invariably, copperfield, wonder, agnes 
 Topic 6: ha, thee, ’em, ’t, wi’ 
 Topic 7: magistrate, 'am, ma, crowd, 'oh 
 Topic 8: hats, nephew, ghost, spirit, scrooge 
 Topic 9: beneath, goblin, scene, grave, sun 
 Topic 10: madame, wine, husband, spy, streets 
 Topic 11: ’ly, ’m, ’d, ai, fur 
 Topic 12: wery, 'm, says, 'd, 're 
 Topic 13: brother, brothers, charles, ned, lord 
 Topic 14: oliver, beadle, undertaker, boys, board 
 Topic 15: prisoner, ghost, husband, spirit, prison 
 Topic 16: manager, crummles, ladies, collector, stage 
 Topic 17: fat, ladies, garden, 'oh, 're 
 Topic 18: squeers, boys, 'd, 'oh, thou 
 Topic 19: traddles, boys, school, book, waiter 
 Topic 20: boat, river, sea, tide, wind 
 Topic 21: judge, court, clerk, jury, prisoner 
 Topic 22: wine, roads, village, mender, stone 
 Topic 23: uncle, widow, ma, sword, 'am 
 Topic 24: attachment, agnes, lived, waiter, likely 
 Topic 25: sikes, yer, dodger, dog, ’ve 
 Topic 26: ma, ’am, matron, sparsit, stranger 
 Topic 27: —and, pip, ’d, guardian, ’re 
 Topic 28: attended, traddles, letter, confidence, punch 
 Topic 29: horse, fat, stranger, chaise, ladies 
 Topic 30: doctor, miss, giles, ladies, officer 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 41 (approx. per word bound = -7.912, relative change = 3.395e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 42 (approx. per word bound = -7.912, relative change = 3.194e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 43 (approx. per word bound = -7.911, relative change = 3.200e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 44 (approx. per word bound = -7.911, relative change = 3.214e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 45 (approx. per word bound = -7.911, relative change = 3.061e-05) 
Topic 1: merry, coachman, guard, occasions, wonder 
 Topic 2: sisters, grey, garden, loved, flowers 
 Topic 3: project, works, electronic, terms, foundation 
 Topic 4: aunt, traddles, agnes, --’, loved 
 Topic 5: umble, invariably, copperfield, wonder, agnes 
 Topic 6: ha, thee, ’em, ’t, wi’ 
 Topic 7: magistrate, 'am, ma, crowd, 'oh 
 Topic 8: hats, nephew, ghost, spirit, scrooge 
 Topic 9: beneath, goblin, scene, grave, sun 
 Topic 10: madame, wine, husband, spy, streets 
 Topic 11: ’ly, ’m, ’d, ai, fur 
 Topic 12: wery, 'm, says, 'd, 're 
 Topic 13: brother, brothers, charles, ned, lord 
 Topic 14: oliver, beadle, undertaker, boys, waistcoat 
 Topic 15: prisoner, ghost, husband, prison, spirit 
 Topic 16: manager, crummles, ladies, collector, stage 
 Topic 17: fat, ladies, garden, 'oh, 're 
 Topic 18: squeers, boys, 'd, 'oh, thou 
 Topic 19: traddles, boys, school, book, waiter 
 Topic 20: boat, river, sea, tide, wind 
 Topic 21: judge, court, clerk, jury, prisoner 
 Topic 22: wine, roads, village, mender, stone 
 Topic 23: uncle, widow, ma, sword, 'am 
 Topic 24: attachment, agnes, lived, waiter, wrong 
 Topic 25: sikes, yer, dodger, dog, ’ve 
 Topic 26: ma, ’am, matron, sparsit, stranger 
 Topic 27: —and, pip, ’d, guardian, ’re 
 Topic 28: attended, traddles, letter, confidence, punch 
 Topic 29: horse, fat, stranger, chaise, ladies 
 Topic 30: doctor, miss, giles, ladies, officer 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 46 (approx. per word bound = -7.911, relative change = 3.015e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 47 (approx. per word bound = -7.911, relative change = 2.906e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 48 (approx. per word bound = -7.910, relative change = 2.802e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 49 (approx. per word bound = -7.910, relative change = 2.931e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 50 (approx. per word bound = -7.910, relative change = 3.075e-05) 
Topic 1: merry, coachman, guard, occasions, wonder 
 Topic 2: sisters, grey, loved, garden, flowers 
 Topic 3: project, works, electronic, terms, foundation 
 Topic 4: aunt, traddles, agnes, --’, loved 
 Topic 5: umble, invariably, copperfield, wonder, agnes 
 Topic 6: ha, thee, ’em, ’t, wi’ 
 Topic 7: magistrate, 'am, ma, crowd, 'oh 
 Topic 8: hats, nephew, ghost, spirit, scrooge 
 Topic 9: beneath, goblin, scene, grave, sun 
 Topic 10: madame, wine, husband, spy, streets 
 Topic 11: ’ly, ’m, ’d, ai, fur 
 Topic 12: wery, 'm, says, 'd, 're 
 Topic 13: brother, brothers, charles, ned, lord 
 Topic 14: oliver, beadle, undertaker, boys, waistcoat 
 Topic 15: prisoner, ghost, husband, prison, spirit 
 Topic 16: manager, crummles, ladies, collector, stage 
 Topic 17: fat, ladies, garden, 'oh, 're 
 Topic 18: squeers, boys, 'd, 'oh, thou 
 Topic 19: traddles, boys, school, book, waiter 
 Topic 20: boat, river, sea, tide, wind 
 Topic 21: judge, court, clerk, jury, prisoner 
 Topic 22: wine, roads, village, mender, stone 
 Topic 23: uncle, widow, ma, 'am, sword 
 Topic 24: attachment, agnes, lived, waiter, wrong 
 Topic 25: sikes, yer, dodger, dog, ’ve 
 Topic 26: ma, ’am, matron, sparsit, stranger 
 Topic 27: —and, pip, ’d, guardian, ’re 
 Topic 28: attended, traddles, letter, confidence, punch 
 Topic 29: horse, fat, stranger, chaise, ladies 
 Topic 30: doctor, miss, giles, ladies, officer 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 51 (approx. per word bound = -7.910, relative change = 2.933e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 52 (approx. per word bound = -7.909, relative change = 2.678e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 53 (approx. per word bound = -7.909, relative change = 2.565e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 54 (approx. per word bound = -7.909, relative change = 2.736e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 55 (approx. per word bound = -7.909, relative change = 2.588e-05) 
Topic 1: merry, coachman, guard, wonder, exists 
 Topic 2: sisters, loved, grey, garden, flowers 
 Topic 3: project, works, electronic, terms, foundation 
 Topic 4: aunt, traddles, agnes, --’, loved 
 Topic 5: umble, invariably, copperfield, wonder, agnes 
 Topic 6: ha, thee, ’em, ’t, wi’ 
 Topic 7: magistrate, 'am, ma, crowd, 'oh 
 Topic 8: hats, ghost, nephew, spirit, laughed 
 Topic 9: beneath, goblin, scene, sun, grave 
 Topic 10: madame, wine, husband, spy, streets 
 Topic 11: ’ly, ’m, ’d, ai, fur 
 Topic 12: wery, 'm, says, 'd, 're 
 Topic 13: brother, brothers, charles, ned, lord 
 Topic 14: oliver, beadle, undertaker, boys, waistcoat 
 Topic 15: prisoner, ghost, husband, prison, spirit 
 Topic 16: manager, crummles, ladies, collector, stage 
 Topic 17: fat, ladies, garden, 'oh, 're 
 Topic 18: squeers, boys, 'd, 'oh, thou 
 Topic 19: traddles, boys, school, book, waiter 
 Topic 20: boat, river, sea, tide, wind 
 Topic 21: judge, court, clerk, jury, prisoner 
 Topic 22: wine, roads, village, mender, stone 
 Topic 23: uncle, widow, ma, 'am, sword 
 Topic 24: attachment, agnes, lived, waiter, wrong 
 Topic 25: sikes, yer, dodger, dog, ’ve 
 Topic 26: ma, ’am, matron, sparsit, stranger 
 Topic 27: —and, pip, ’d, guardian, ’re 
 Topic 28: attended, traddles, letter, confidence, punch 
 Topic 29: horse, fat, stranger, chaise, ladies 
 Topic 30: doctor, miss, giles, ladies, officer 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 56 (approx. per word bound = -7.909, relative change = 2.524e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 57 (approx. per word bound = -7.908, relative change = 2.439e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 58 (approx. per word bound = -7.908, relative change = 2.265e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 59 (approx. per word bound = -7.908, relative change = 2.245e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 60 (approx. per word bound = -7.908, relative change = 2.570e-05) 
Topic 1: merry, coachman, guard, wonder, exists 
 Topic 2: sisters, loved, grey, garden, flowers 
 Topic 3: project, works, electronic, terms, foundation 
 Topic 4: aunt, traddles, agnes, --’, letter 
 Topic 5: umble, invariably, copperfield, agnes, wonder 
 Topic 6: ha, thee, ’em, ’t, wi’ 
 Topic 7: magistrate, 'am, ma, crowd, 'oh 
 Topic 8: ghost, spirit, hats, nephew, laughed 
 Topic 9: beneath, goblin, scene, sun, grave 
 Topic 10: madame, wine, husband, spy, streets 
 Topic 11: ’ly, ’m, ’d, ai, fur 
 Topic 12: wery, 'm, says, 'd, 're 
 Topic 13: brother, brothers, charles, ned, lord 
 Topic 14: oliver, beadle, undertaker, boys, waistcoat 
 Topic 15: prisoner, ghost, husband, prison, breast 
 Topic 16: manager, crummles, collector, ladies, stage 
 Topic 17: fat, ladies, garden, 'oh, 're 
 Topic 18: squeers, boys, 'd, 'oh, thou 
 Topic 19: traddles, boys, school, book, waiter 
 Topic 20: boat, river, sea, tide, wind 
 Topic 21: judge, clerk, court, jury, prisoner 
 Topic 22: wine, roads, village, mender, stone 
 Topic 23: uncle, ma, widow, 'am, sword 
 Topic 24: attachment, agnes, lived, waiter, wrong 
 Topic 25: sikes, yer, dodger, dog, ’ve 
 Topic 26: ma, ’am, matron, sparsit, stranger 
 Topic 27: —and, pip, ’d, guardian, ’re 
 Topic 28: attended, traddles, letter, confidence, punch 
 Topic 29: horse, fat, stranger, chaise, ladies 
 Topic 30: doctor, miss, giles, ladies, officer 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 61 (approx. per word bound = -7.908, relative change = 2.465e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 62 (approx. per word bound = -7.907, relative change = 2.188e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 63 (approx. per word bound = -7.907, relative change = 2.082e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 64 (approx. per word bound = -7.907, relative change = 2.030e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 65 (approx. per word bound = -7.907, relative change = 2.140e-05) 
Topic 1: merry, coachman, guard, wonder, exists 
 Topic 2: sisters, loved, grey, garden, flowers 
 Topic 3: project, works, electronic, terms, foundation 
 Topic 4: aunt, traddles, agnes, --’, letter 
 Topic 5: umble, invariably, copperfield, agnes, wonder 
 Topic 6: ha, thee, ’em, ’t, wi’ 
 Topic 7: magistrate, 'am, ma, crowd, 'oh 
 Topic 8: ghost, spirit, hats, nephew, laughed 
 Topic 9: beneath, goblin, scene, sun, grave 
 Topic 10: madame, wine, husband, spy, streets 
 Topic 11: ’ly, ’m, ’d, ai, fur 
 Topic 12: wery, 'm, says, 'd, 're 
 Topic 13: brother, brothers, charles, ned, lord 
 Topic 14: oliver, beadle, undertaker, boys, waistcoat 
 Topic 15: prisoner, ghost, husband, prison, breast 
 Topic 16: manager, crummles, collector, ladies, stage 
 Topic 17: fat, ladies, garden, 'oh, spinster 
 Topic 18: squeers, boys, 'd, 'oh, thou 
 Topic 19: traddles, boys, school, book, waiter 
 Topic 20: boat, river, sea, tide, wind 
 Topic 21: judge, clerk, court, jury, prisoner 
 Topic 22: wine, roads, village, mender, stone 
 Topic 23: uncle, ma, widow, 'am, sword 
 Topic 24: attachment, agnes, waiter, lived, son 
 Topic 25: sikes, yer, dodger, dog, ’ve 
 Topic 26: ma, ’am, matron, sparsit, stranger 
 Topic 27: —and, pip, ’d, guardian, ’re 
 Topic 28: attended, traddles, letter, confidence, punch 
 Topic 29: horse, fat, stranger, chaise, ladies 
 Topic 30: doctor, miss, giles, ladies, officer 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 66 (approx. per word bound = -7.907, relative change = 1.895e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 67 (approx. per word bound = -7.907, relative change = 1.681e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 68 (approx. per word bound = -7.907, relative change = 1.693e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 69 (approx. per word bound = -7.906, relative change = 1.611e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 70 (approx. per word bound = -7.906, relative change = 1.625e-05) 
Topic 1: merry, guard, coachman, wonder, exists 
 Topic 2: sisters, loved, grey, garden, flowers 
 Topic 3: project, works, electronic, terms, foundation 
 Topic 4: aunt, traddles, agnes, --’, letter 
 Topic 5: umble, invariably, copperfield, agnes, wonder 
 Topic 6: ha, thee, ’em, ’t, wi’ 
 Topic 7: magistrate, 'am, ma, crowd, 'oh 
 Topic 8: ghost, spirit, hats, nephew, laughed 
 Topic 9: beneath, goblin, sun, scene, grave 
 Topic 10: madame, wine, husband, spy, streets 
 Topic 11: ’ly, ’m, ’d, ai, fur 
 Topic 12: wery, 'm, says, 'd, 're 
 Topic 13: brother, brothers, charles, ned, lord 
 Topic 14: oliver, beadle, undertaker, boys, waistcoat 
 Topic 15: prisoner, ghost, husband, prison, breast 
 Topic 16: manager, crummles, collector, ladies, stage 
 Topic 17: fat, ladies, garden, 'oh, spinster 
 Topic 18: squeers, boys, 'd, 'oh, thou 
 Topic 19: traddles, boys, school, book, waiter 
 Topic 20: boat, river, sea, tide, wind 
 Topic 21: judge, clerk, court, jury, prisoner 
 Topic 22: wine, roads, village, mender, stone 
 Topic 23: uncle, ma, 'am, widow, sword 
 Topic 24: attachment, agnes, waiter, son, lived 
 Topic 25: sikes, yer, dodger, dog, ’ve 
 Topic 26: ma, ’am, matron, sparsit, stranger 
 Topic 27: —and, pip, ’d, guardian, ’re 
 Topic 28: attended, traddles, letter, confidence, son 
 Topic 29: horse, fat, stranger, chaise, ladies 
 Topic 30: doctor, miss, giles, ladies, officer 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 71 (approx. per word bound = -7.906, relative change = 1.520e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 72 (approx. per word bound = -7.906, relative change = 1.416e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 73 (approx. per word bound = -7.906, relative change = 1.291e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 74 (approx. per word bound = -7.906, relative change = 1.254e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 75 (approx. per word bound = -7.906, relative change = 1.270e-05) 
Topic 1: merry, guard, coachman, wonder, exists 
 Topic 2: sisters, loved, grey, garden, flowers 
 Topic 3: project, works, electronic, terms, foundation 
 Topic 4: aunt, traddles, agnes, --’, letter 
 Topic 5: umble, invariably, copperfield, agnes, wonder 
 Topic 6: ha, thee, ’em, ’t, wi’ 
 Topic 7: magistrate, 'am, ma, crowd, 'oh 
 Topic 8: ghost, spirit, hats, nephew, laughed 
 Topic 9: beneath, goblin, sun, scene, grave 
 Topic 10: madame, wine, husband, spy, streets 
 Topic 11: ’ly, ’m, ’d, ai, fur 
 Topic 12: wery, 'm, says, 'd, 're 
 Topic 13: brother, brothers, charles, ned, lord 
 Topic 14: oliver, beadle, undertaker, boys, waistcoat 
 Topic 15: prisoner, ghost, husband, prison, breast 
 Topic 16: manager, crummles, collector, ladies, stage 
 Topic 17: fat, ladies, garden, 'oh, spinster 
 Topic 18: squeers, boys, 'd, 'oh, thou 
 Topic 19: traddles, boys, school, book, waiter 
 Topic 20: boat, river, sea, tide, wind 
 Topic 21: judge, clerk, court, jury, prisoner 
 Topic 22: wine, roads, village, mender, stone 
 Topic 23: uncle, ma, 'am, widow, sword 
 Topic 24: attachment, agnes, waiter, son, lived 
 Topic 25: sikes, yer, dodger, dog, ’ve 
 Topic 26: ma, ’am, matron, sparsit, stranger 
 Topic 27: —and, pip, ’d, guardian, ’re 
 Topic 28: attended, traddles, letter, confidence, son 
 Topic 29: horse, fat, stranger, chaise, ladies 
 Topic 30: doctor, miss, giles, ladies, officer 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 76 (approx. per word bound = -7.906, relative change = 1.250e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 77 (approx. per word bound = -7.906, relative change = 1.307e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 78 (approx. per word bound = -7.905, relative change = 1.294e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 79 (approx. per word bound = -7.905, relative change = 1.287e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 80 (approx. per word bound = -7.905, relative change = 1.335e-05) 
Topic 1: merry, guard, coachman, wonder, exists 
 Topic 2: sisters, loved, grey, garden, flowers 
 Topic 3: project, works, electronic, terms, foundation 
 Topic 4: aunt, traddles, agnes, --’, letter 
 Topic 5: umble, invariably, copperfield, agnes, wonder 
 Topic 6: ha, thee, ’em, ’t, wi’ 
 Topic 7: magistrate, 'am, ma, crowd, 'oh 
 Topic 8: ghost, spirit, hats, nephew, laughed 
 Topic 9: beneath, goblin, sun, scene, grave 
 Topic 10: madame, wine, husband, spy, streets 
 Topic 11: ’ly, ’m, ’d, ai, fur 
 Topic 12: wery, 'm, says, 'd, 're 
 Topic 13: brother, brothers, charles, ned, lord 
 Topic 14: oliver, beadle, undertaker, boys, waistcoat 
 Topic 15: prisoner, ghost, husband, prison, breast 
 Topic 16: manager, crummles, collector, ladies, stage 
 Topic 17: fat, ladies, garden, 'oh, spinster 
 Topic 18: squeers, boys, 'd, 'oh, thou 
 Topic 19: traddles, boys, school, book, waiter 
 Topic 20: boat, river, sea, tide, wind 
 Topic 21: judge, clerk, court, jury, prisoner 
 Topic 22: wine, roads, village, mender, stone 
 Topic 23: uncle, ma, 'am, widow, sword 
 Topic 24: attachment, agnes, son, waiter, wrong 
 Topic 25: sikes, yer, dodger, dog, ’ve 
 Topic 26: ma, ’am, matron, sparsit, stranger 
 Topic 27: —and, pip, ’d, guardian, ’re 
 Topic 28: attended, traddles, letter, confidence, son 
 Topic 29: horse, fat, stranger, chaise, ladies 
 Topic 30: doctor, miss, giles, ladies, officer 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 81 (approx. per word bound = -7.905, relative change = 1.513e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 82 (approx. per word bound = -7.905, relative change = 1.420e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 83 (approx. per word bound = -7.905, relative change = 1.091e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Completing Iteration 84 (approx. per word bound = -7.905, relative change = 1.061e-05) 
.........................................................................................................
Completed E-Step (0 seconds). 
Completed M-Step. 
Model Converged 
Code
plot(stmOut_4, n = 10)

Some of the clearest topics from the second model are still recognisable (a water-related topic and a legal topic remain coherent), but there is no clear overall improvement. Allowing more common words lets high-frequency features dominate the model: they appear across many different topics without clearly distinguishing any of them. Rather than tinkering further with document-frequency thresholds on 1,000-word chunks, we should try smaller chunks.
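To make the effect of document-frequency thresholds concrete, the following base-R sketch computes proportional document frequencies on a toy set of pseudo-documents and filters the vocabulary with two different upper bounds. The data and the helper `keep_by_docfreq()` are hypothetical and only illustrate the idea. The tutorial's models use `quanteda::dfm_trim()` with `docfreq_type = "prop"`, which is assumed here to behave roughly like this filter.

```r
# Toy pseudo-documents (hypothetical data, not taken from the Dickens corpus)
docs <- list(
  d1 = c("river", "boat", "said", "tide"),
  d2 = c("court", "judge", "said", "jury"),
  d3 = c("river", "said", "wind", "boat"),
  d4 = c("said", "judge", "prisoner", "court")
)

# Proportional document frequency: the share of documents containing each word
vocab    <- sort(unique(unlist(docs)))
doc_freq <- sapply(vocab, function(w) mean(sapply(docs, function(d) w %in% d)))

# Hypothetical helper: keep words whose document frequency lies in [min_prop, max_prop]
keep_by_docfreq <- function(doc_freq, min_prop, max_prop) {
  names(doc_freq)[doc_freq >= min_prop & doc_freq <= max_prop]
}

strict  <- keep_by_docfreq(doc_freq, min_prop = 0.5, max_prop = 0.5)
lenient <- keep_by_docfreq(doc_freq, min_prop = 0.5, max_prop = 1.0)

print(round(doc_freq, 2))
print(strict)   # words found in exactly half of the documents
print(lenient)  # additionally keeps "said", which occurs in every document
```

In this toy example, raising the upper bound admits "said", a word present in every document that cannot help separate the river scenes from the court scenes. This is the same kind of feature that blurs topics when common words are allowed into the model.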


Fifth Model: Smaller Chunks (500 Words)

Section Overview

What you will learn: why chunk size is one of the most important parameters in topic modelling, how to re-chunk the token stream at a smaller size, and what improvements emerge from moving to 500-word pseudo-documents.

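Smaller chunks limit the number of other words each word co-occurs with. This should matter most for rare and scene-specific words, which in a shorter window are more likely to appear only alongside vocabulary from the same scene. We therefore reduce the chunk size from 1,000 to 500 words.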
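The re-chunking idiom can be checked on a toy example before it is applied to the full corpus. The sketch below assumes that the token stream is a flat character vector, as `list_toks` appears to be in this tutorial. The vector `toy_toks` and the helper `chunk_tokens()` are hypothetical names introduced for illustration.

```r
# Toy token stream (hypothetical; stands in for the tutorial's list_toks)
toy_toks <- paste0("w", 1:11)

# Hypothetical helper wrapping the rep()/split() idiom
chunk_tokens <- function(toks, chunk) {
  n <- length(toks)
  # Chunk index for every token: 1, 1, ..., 2, 2, ..., cut to length n
  r <- rep(seq_len(ceiling(n / chunk)), each = chunk)[seq_len(n)]
  split(toks, r)
}

chunks_large <- chunk_tokens(toy_toks, chunk = 6)
chunks_small <- chunk_tokens(toy_toks, chunk = 3)

print(lengths(chunks_large))  # two chunks: 6 and 5 tokens
print(lengths(chunks_small))  # four chunks: 3, 3, 3 and 2 tokens
```

Halving the chunk size roughly doubles the number of pseudo-documents, and the last chunk simply holds whatever tokens remain. Note that the chunks ignore sentence, chapter, and novel boundaries, so a chunk can straddle two scenes.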

Code
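# Re-chunk the token stream into 500-word pseudo-documents
chunk          <- 500
n              <- length(list_toks)
# Chunk index for every token (1, 1, ..., 2, 2, ...), cut to length n
r              <- rep(1:ceiling(n / chunk), each = chunk)[1:n]
chunky_dickens <- split(list_toks, r)
chunky_toks    <- quanteda::tokens(chunky_dickens)
dtm            <- quanteda::dfm(chunky_toks)

# Drop very rare features and features found in more than 25% of the chunks
dtm_trimmed_5 <- quanteda::dfm_trim(
  dtm,
  min_termfreq = 2,
  min_docfreq  = 0.005,
  max_docfreq  = 0.25,
  docfreq_type = "prop"
)

# Fix the seed for reproducibility, then fit a 25-topic model
set.seed(42)
stmOut_5 <- stm::stm(documents = dtm_trimmed_5,
                     K          = 25,
                     max.em.its = 200)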
Beginning Spectral Initialization 
     Calculating the gram matrix...
     Finding anchor words...
    .........................
     Recovering initialization...
    ...................................................................................
Initialization complete.
.........................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 1 (approx. per word bound = -8.214) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 2 (approx. per word bound = -8.004, relative change = 2.556e-02) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 3 (approx. per word bound = -7.963, relative change = 5.041e-03) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 4 (approx. per word bound = -7.950, relative change = 1.685e-03) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 5 (approx. per word bound = -7.943, relative change = 8.407e-04) 
Topic 1: master, doctor, nephew, merry, year 
 Topic 2: project, works, electronic, terms, copyright 
 Topic 3: ha, o, ’em, thee, wi’ 
 Topic 4: money, five, doctor, pounds, hundred 
 Topic 5: donations, foundation, information, laws, including 
 Topic 6: aunt, traddles, ,’, agnes, happy 
 Topic 7: beneath, ground, grave, cold, spirit 
 Topic 8: magistrate, person, friends, countenance, question 
 Topic 9: family, children, master, year, wife 
 Topic 10: boys, manager, water, landlord, stranger 
 Topic 11: 'll, squeers, 'd, wife, says 
 Topic 12: woman, sister, master, boys, ,’ 
 Topic 13: collector, ladies, daughter, girl, company 
 Topic 14: fat, 'll, party, ladies, 're 
 Topic 15: doctor, woman, girl, money, lay 
 Topic 16: case, judge, prisoner, gentlemen, court 
 Topic 17: water, lay, wind, stopped, stone 
 Topic 18: girl, sikes, oliver, ’ve, yer 
 Topic 19: wine, wife, husband, shop, madame 
 Topic 20: brother, sister, girl, brothers, daughter 
 Topic 21: wery, o, 'm, 'll, says 
 Topic 22: ,’, ’ly, ’d, ’m, ai 
 Topic 23: ma, ’am, 'am, ,’, ma’am 
 Topic 24: uncle, coach, gentlemen, horses, coat 
 Topic 25: widow, demd, demmit, soul, husband 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 6 (approx. per word bound = -7.939, relative change = 5.081e-04) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 7 (approx. per word bound = -7.936, relative change = 3.474e-04) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 8 (approx. per word bound = -7.934, relative change = 2.661e-04) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 9 (approx. per word bound = -7.933, relative change = 2.176e-04) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 10 (approx. per word bound = -7.931, relative change = 1.864e-04) 
Topic 1: doctor, master, nephew, merry, clerk 
 Topic 2: project, works, electronic, terms, agreement 
 Topic 3: ha, o, ’em, ’d, thee 
 Topic 4: money, five, pounds, hundred, pound 
 Topic 5: donations, information, foundation, laws, including 
 Topic 6: aunt, traddles, ,’, agnes, happy 
 Topic 7: beneath, cold, ground, spirit, grave 
 Topic 8: magistrate, person, friends, countenance, question 
 Topic 9: family, children, master, wife, pocket 
 Topic 10: boys, water, landlord, manager, stranger 
 Topic 11: 'll, squeers, 'd, says, wife 
 Topic 12: sister, woman, boys, —and, hair 
 Topic 13: collector, ladies, daughter, company, married 
 Topic 14: fat, 'll, ladies, party, 're 
 Topic 15: doctor, woman, girl, money, lay 
 Topic 16: case, judge, prisoner, gentlemen, court 
 Topic 17: water, wind, lay, windows, stone 
 Topic 18: girl, sikes, oliver, ’ve, woman 
 Topic 19: wine, husband, wife, doctor, shop 
 Topic 20: brother, girl, sister, brothers, daughter 
 Topic 21: wery, o, 'm, 'll, says 
 Topic 22: ,’, ’ly, ’m, ’d, ai 
 Topic 23: ma, ’am, 'am, ,’, ma’am 
 Topic 24: coach, uncle, gentlemen, horses, coat 
 Topic 25: widow, demd, soul, husband, demmit 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 11 (approx. per word bound = -7.930, relative change = 1.597e-04) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 12 (approx. per word bound = -7.929, relative change = 1.351e-04) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 13 (approx. per word bound = -7.928, relative change = 1.164e-04) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 14 (approx. per word bound = -7.927, relative change = 1.048e-04) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 15 (approx. per word bound = -7.926, relative change = 9.394e-05) 
Topic 1: doctor, master, nephew, merry, school 
 Topic 2: project, works, electronic, terms, agreement 
 Topic 3: ha, o, ’em, ’d, thee 
 Topic 4: money, five, pounds, hundred, pound 
 Topic 5: donations, information, compliance, chapter, including 
 Topic 6: aunt, traddles, ,’, agnes, happy 
 Topic 7: spirit, cold, beneath, ground, grave 
 Topic 8: magistrate, person, friends, countenance, question 
 Topic 9: family, children, pocket, master, wife 
 Topic 10: boys, water, landlord, stranger, manager 
 Topic 11: 'll, squeers, 'd, says, rejoined 
 Topic 12: sister, woman, —and, hair, boys 
 Topic 13: collector, ladies, daughter, company, married 
 Topic 14: fat, 'll, ladies, party, spinster 
 Topic 15: doctor, woman, girl, rose, lay 
 Topic 16: case, judge, prisoner, gentlemen, court 
 Topic 17: water, windows, wind, lay, stone 
 Topic 18: girl, sikes, oliver, ’ve, woman 
 Topic 19: wine, husband, wife, doctor, madame 
 Topic 20: brother, girl, sister, brothers, rejoined 
 Topic 21: wery, o, 'm, 'll, says 
 Topic 22: ,’, ’ly, ’m, ’d, ai 
 Topic 23: ma, ’am, ,’, 'am, ma’am 
 Topic 24: coach, uncle, gentlemen, horses, coat 
 Topic 25: widow, demd, soul, husband, wife 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 16 (approx. per word bound = -7.926, relative change = 8.785e-05) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 17 (approx. per word bound = -7.925, relative change = 8.318e-05) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 18 (approx. per word bound = -7.924, relative change = 7.685e-05) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 19 (approx. per word bound = -7.924, relative change = 7.110e-05) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 20 (approx. per word bound = -7.923, relative change = 6.732e-05) 
Topic 1: doctor, master, nephew, merry, school 
 Topic 2: project, works, electronic, terms, agreement 
 Topic 3: ha, o, ’em, ’d, thee 
 Topic 4: money, five, pounds, hundred, pound 
 Topic 5: donations, information, chapter, compliance, public 
 Topic 6: aunt, traddles, ,’, agnes, happy 
 Topic 7: spirit, cold, beneath, ground, grave 
 Topic 8: magistrate, person, friends, countenance, question 
 Topic 9: family, children, pocket, wife, master 
 Topic 10: boys, water, stranger, landlord, punch 
 Topic 11: 'll, squeers, 'd, rejoined, says 
 Topic 12: sister, —and, hair, woman, boys 
 Topic 13: collector, ladies, daughter, company, married 
 Topic 14: fat, 'll, ladies, party, 'oh 
 Topic 15: doctor, woman, girl, rose, lay 
 Topic 16: case, judge, prisoner, gentlemen, court 
 Topic 17: water, windows, wind, stone, lay 
 Topic 18: girl, sikes, oliver, woman, ’ve 
 Topic 19: wine, husband, wife, doctor, madame 
 Topic 20: brother, sister, girl, brothers, rejoined 
 Topic 21: wery, o, 'm, 'll, says 
 Topic 22: ,’, ’ly, ’m, ’d, ai 
 Topic 23: ma, ’am, ,’, 'am, ma’am 
 Topic 24: coach, uncle, gentlemen, horses, coat 
 Topic 25: widow, demd, soul, husband, wife 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 21 (approx. per word bound = -7.923, relative change = 6.283e-05) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 22 (approx. per word bound = -7.922, relative change = 5.749e-05) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 23 (approx. per word bound = -7.922, relative change = 5.321e-05) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 24 (approx. per word bound = -7.921, relative change = 4.696e-05) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 25 (approx. per word bound = -7.921, relative change = 4.816e-05) 
Topic 1: doctor, master, nephew, merry, school 
 Topic 2: project, works, electronic, terms, foundation 
 Topic 3: ha, o, ’em, ’d, thee 
 Topic 4: money, five, pounds, hundred, office 
 Topic 5: donations, chapter, information, compliance, public 
 Topic 6: aunt, traddles, ,’, agnes, happy 
 Topic 7: spirit, cold, beneath, ground, ghost 
 Topic 8: magistrate, person, friends, countenance, question 
 Topic 9: family, children, pocket, ,’, wife 
 Topic 10: boys, water, punch, stranger, landlord 
 Topic 11: 'll, squeers, 'd, rejoined, boys 
 Topic 12: sister, —and, hair, boys, walk 
 Topic 13: collector, ladies, daughter, company, married 
 Topic 14: fat, 'll, ladies, party, 'oh 
 Topic 15: woman, doctor, girl, rose, dead 
 Topic 16: case, judge, prisoner, gentlemen, court 
 Topic 17: water, wind, windows, stone, lay 
 Topic 18: girl, sikes, oliver, ’ve, woman 
 Topic 19: wine, husband, wife, doctor, madame 
 Topic 20: brother, sister, girl, brothers, rejoined 
 Topic 21: wery, o, 'm, 'll, says 
 Topic 22: ,’, ’ly, ’m, ’d, ai 
 Topic 23: ma, ’am, ,’, 'am, sparsit 
 Topic 24: coach, uncle, gentlemen, horses, coat 
 Topic 25: widow, demd, soul, 'am, wife 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 26 (approx. per word bound = -7.921, relative change = 4.957e-05) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 27 (approx. per word bound = -7.920, relative change = 4.646e-05) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 28 (approx. per word bound = -7.920, relative change = 4.388e-05) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 29 (approx. per word bound = -7.920, relative change = 4.083e-05) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 30 (approx. per word bound = -7.919, relative change = 3.918e-05) 
Topic 1: doctor, master, nephew, merry, happy 
 Topic 2: project, works, electronic, terms, foundation 
 Topic 3: ha, o, ’em, ’d, woman 
 Topic 4: money, five, pounds, hundred, office 
 Topic 5: donations, chapter, information, compliance, public 
 Topic 6: aunt, traddles, ,’, agnes, happy 
 Topic 7: spirit, cold, ghost, beneath, ground 
 Topic 8: magistrate, friends, person, countenance, question 
 Topic 9: family, children, ,’, pocket, wife 
 Topic 10: boys, water, glass, landlord, punch 
 Topic 11: squeers, 'll, boys, rejoined, 'd 
 Topic 12: sister, —and, hair, boys, walk 
 Topic 13: collector, ladies, daughter, company, married 
 Topic 14: fat, 'll, ladies, party, 'oh 
 Topic 15: woman, girl, doctor, rose, dead 
 Topic 16: case, judge, prisoner, gentlemen, court 
 Topic 17: water, wind, windows, stone, high 
 Topic 18: girl, sikes, oliver, ’ve, woman 
 Topic 19: wine, husband, wife, doctor, madame 
 Topic 20: brother, sister, girl, brothers, lord 
 Topic 21: wery, o, 'm, 'll, says 
 Topic 22: ,’, ’ly, ’m, ’d, ai 
 Topic 23: ma, ’am, ,’, 'am, sparsit 
 Topic 24: coach, uncle, gentlemen, horses, hat 
 Topic 25: widow, demd, 'am, soul, wife 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 31 (approx. per word bound = -7.919, relative change = 3.377e-05) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 32 (approx. per word bound = -7.919, relative change = 3.585e-05) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 33 (approx. per word bound = -7.919, relative change = 3.139e-05) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 34 (approx. per word bound = -7.918, relative change = 3.132e-05) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 35 (approx. per word bound = -7.918, relative change = 3.248e-05) 
Topic 1: doctor, master, nephew, happy, school 
 Topic 2: project, works, electronic, terms, foundation 
 Topic 3: ha, o, ’em, ’d, woman 
 Topic 4: money, five, pounds, hundred, office 
 Topic 5: donations, chapter, information, story, public 
 Topic 6: aunt, traddles, ,’, agnes, happy 
 Topic 7: spirit, ghost, cold, beneath, ground 
 Topic 8: magistrate, friends, person, countenance, question 
 Topic 9: family, children, ,’, pocket, wife 
 Topic 10: glass, water, boys, punch, company 
 Topic 11: 'll, squeers, boys, rejoined, 'd 
 Topic 12: sister, —and, hair, boys, walk 
 Topic 13: collector, ladies, daughter, company, married 
 Topic 14: fat, 'll, ladies, party, 'oh 
 Topic 15: woman, girl, doctor, rose, dead 
 Topic 16: case, judge, prisoner, gentlemen, court 
 Topic 17: water, wind, windows, stone, high 
 Topic 18: girl, sikes, oliver, ’ve, woman 
 Topic 19: husband, wine, wife, doctor, madame 
 Topic 20: brother, sister, girl, brothers, lord 
 Topic 21: wery, o, 'm, 'll, says 
 Topic 22: ,’, ’ly, ’m, ’d, ai 
 Topic 23: ma, ’am, ,’, 'am, sparsit 
 Topic 24: coach, uncle, gentlemen, hat, horses 
 Topic 25: widow, 'am, demd, soul, wife 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 36 (approx. per word bound = -7.918, relative change = 3.378e-05) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 37 (approx. per word bound = -7.917, relative change = 3.181e-05) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 38 (approx. per word bound = -7.917, relative change = 3.027e-05) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 39 (approx. per word bound = -7.917, relative change = 3.036e-05) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 40 (approx. per word bound = -7.917, relative change = 2.955e-05) 
Topic 1: doctor, master, nephew, happy, school 
 Topic 2: project, works, electronic, terms, foundation 
 Topic 3: ha, o, ’em, ’d, woman 
 Topic 4: money, five, pounds, office, hundred 
 Topic 5: donations, chapter, information, story, public 
 Topic 6: aunt, traddles, ,’, agnes, happy 
 Topic 7: spirit, ghost, cold, ground, beneath 
 Topic 8: magistrate, friends, person, countenance, certainly 
 Topic 9: family, children, ,’, pocket, wife 
 Topic 10: glass, water, boys, company, punch 
 Topic 11: squeers, 'll, boys, rejoined, 'd 
 Topic 12: sister, —and, hair, boys, walk 
 Topic 13: collector, ladies, daughter, company, married 
 Topic 14: fat, 'll, ladies, party, 'oh 
 Topic 15: girl, woman, doctor, rose, dead 
 Topic 16: case, judge, prisoner, gentlemen, court 
 Topic 17: water, wind, windows, stone, high 
 Topic 18: girl, sikes, oliver, ’ve, woman 
 Topic 19: husband, wine, wife, doctor, madame 
 Topic 20: brother, sister, girl, brothers, lord 
 Topic 21: wery, o, 'm, 'll, says 
 Topic 22: ,’, ’ly, ’m, ’d, ai 
 Topic 23: ma, ’am, ,’, sparsit, woman 
 Topic 24: coach, uncle, gentlemen, hat, coat 
 Topic 25: widow, 'am, soul, demd, ma 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 41 (approx. per word bound = -7.917, relative change = 2.865e-05) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 42 (approx. per word bound = -7.916, relative change = 2.644e-05) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 43 (approx. per word bound = -7.916, relative change = 2.350e-05) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 44 (approx. per word bound = -7.916, relative change = 2.787e-05) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 45 (approx. per word bound = -7.916, relative change = 2.514e-05) 
Topic 1: doctor, master, nephew, happy, school 
 Topic 2: project, works, electronic, terms, foundation 
 Topic 3: ha, o, ’em, ’d, woman 
 Topic 4: money, five, pounds, office, hundred 
 Topic 5: donations, chapter, story, information, public 
 Topic 6: aunt, traddles, ,’, agnes, happy 
 Topic 7: spirit, ghost, cold, ground, beneath 
 Topic 8: magistrate, friends, person, countenance, certainly 
 Topic 9: family, children, ,’, pocket, wife 
 Topic 10: glass, water, company, stranger, punch 
 Topic 11: squeers, 'll, boys, rejoined, 'd 
 Topic 12: sister, —and, hair, boys, walk 
 Topic 13: collector, ladies, daughter, company, married 
 Topic 14: fat, 'll, ladies, party, spinster 
 Topic 15: girl, woman, doctor, rose, dead 
 Topic 16: case, judge, prisoner, gentlemen, court 
 Topic 17: water, wind, windows, stone, sea 
 Topic 18: girl, sikes, oliver, ’ve, woman 
 Topic 19: husband, wine, wife, doctor, madame 
 Topic 20: brother, sister, girl, brothers, lord 
 Topic 21: wery, o, 'm, 'll, says 
 Topic 22: ,’, ’ly, ’m, ’d, ai 
 Topic 23: ma, ’am, ,’, sparsit, woman 
 Topic 24: coach, uncle, gentlemen, hat, coat 
 Topic 25: 'am, widow, ma, soul, demd 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 46 (approx. per word bound = -7.916, relative change = 2.336e-05) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 47 (approx. per word bound = -7.915, relative change = 2.163e-05) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 48 (approx. per word bound = -7.915, relative change = 2.115e-05) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 49 (approx. per word bound = -7.915, relative change = 2.000e-05) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 50 (approx. per word bound = -7.915, relative change = 1.992e-05) 
Topic 1: doctor, master, nephew, happy, school 
 Topic 2: project, works, electronic, terms, foundation 
 Topic 3: ha, o, ’em, ’d, woman 
 Topic 4: money, five, pounds, office, hundred 
 Topic 5: donations, chapter, story, public, information 
 Topic 6: aunt, traddles, ,’, agnes, happy 
 Topic 7: spirit, ghost, cold, ground, beneath 
 Topic 8: magistrate, friends, person, countenance, certainly 
 Topic 9: family, ,’, children, pocket, master 
 Topic 10: glass, water, waiter, stranger, company 
 Topic 11: squeers, 'll, boys, rejoined, 'd 
 Topic 12: sister, —and, hair, boys, walk 
 Topic 13: collector, ladies, daughter, company, married 
 Topic 14: fat, 'll, ladies, party, spinster 
 Topic 15: girl, woman, doctor, dead, rose 
 Topic 16: case, judge, prisoner, gentlemen, court 
 Topic 17: water, wind, windows, stone, sea 
 Topic 18: girl, sikes, oliver, ’ve, woman 
 Topic 19: husband, wife, wine, doctor, madame 
 Topic 20: brother, sister, girl, brothers, lord 
 Topic 21: wery, o, 'm, says, 'll 
 Topic 22: ,’, ’ly, ’m, ’d, ai 
 Topic 23: ma, ’am, ,’, sparsit, woman 
 Topic 24: coach, uncle, gentlemen, hat, coat 
 Topic 25: 'am, ma, widow, soul, demd 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 51 (approx. per word bound = -7.915, relative change = 2.005e-05) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 52 (approx. per word bound = -7.915, relative change = 1.915e-05) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 53 (approx. per word bound = -7.914, relative change = 1.808e-05) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 54 (approx. per word bound = -7.914, relative change = 1.668e-05) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 55 (approx. per word bound = -7.914, relative change = 1.573e-05) 
Topic 1: doctor, master, nephew, happy, school 
 Topic 2: project, works, electronic, terms, foundation 
 Topic 3: ha, o, ’em, ’d, woman 
 Topic 4: money, five, pounds, office, hundred 
 Topic 5: donations, chapter, story, public, information 
 Topic 6: aunt, traddles, ,’, agnes, happy 
 Topic 7: ghost, spirit, cold, ground, beneath 
 Topic 8: magistrate, friends, person, countenance, certainly 
 Topic 9: family, ,’, children, pocket, master 
 Topic 10: glass, water, waiter, stranger, company 
 Topic 11: squeers, 'll, boys, rejoined, 'd 
 Topic 12: sister, —and, hair, boys, walk 
 Topic 13: collector, ladies, daughter, company, married 
 Topic 14: fat, 'll, ladies, party, spinster 
 Topic 15: girl, woman, doctor, dead, rose 
 Topic 16: case, judge, prisoner, gentlemen, court 
 Topic 17: water, wind, windows, stone, sea 
 Topic 18: girl, sikes, oliver, ’ve, beadle 
 Topic 19: husband, wife, wine, doctor, madame 
 Topic 20: brother, sister, girl, brothers, lord 
 Topic 21: wery, o, 'm, says, 'll 
 Topic 22: ,’, ’ly, ’m, ’d, ai 
 Topic 23: ma, ’am, ,’, sparsit, woman 
 Topic 24: coach, uncle, gentlemen, hat, coat 
 Topic 25: 'am, ma, widow, soul, ma'am 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 56 (approx. per word bound = -7.914, relative change = 1.527e-05) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 57 (approx. per word bound = -7.914, relative change = 1.612e-05) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 58 (approx. per word bound = -7.914, relative change = 1.551e-05) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 59 (approx. per word bound = -7.914, relative change = 1.559e-05) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 60 (approx. per word bound = -7.914, relative change = 1.483e-05) 
Topic 1: doctor, master, nephew, happy, school 
 Topic 2: project, works, electronic, terms, foundation 
 Topic 3: ha, o, ’em, ’d, woman 
 Topic 4: money, five, pounds, office, hundred 
 Topic 5: donations, chapter, story, public, information 
 Topic 6: aunt, traddles, ,’, agnes, happy 
 Topic 7: ghost, spirit, cold, ground, beneath 
 Topic 8: magistrate, friends, person, countenance, certainly 
 Topic 9: family, ,’, children, pocket, master 
 Topic 10: glass, water, waiter, stranger, company 
 Topic 11: squeers, 'll, boys, rejoined, 'd 
 Topic 12: sister, —and, hair, boys, walk 
 Topic 13: collector, ladies, daughter, company, married 
 Topic 14: fat, 'll, ladies, party, spinster 
 Topic 15: girl, woman, doctor, dead, rose 
 Topic 16: case, judge, prisoner, gentlemen, court 
 Topic 17: water, wind, windows, stone, sea 
 Topic 18: girl, sikes, oliver, beadle, ’ve 
 Topic 19: husband, wife, wine, madame, doctor 
 Topic 20: brother, sister, girl, brothers, lord 
 Topic 21: wery, o, 'm, says, 'll 
 Topic 22: ,’, ’ly, ’m, ’d, ai 
 Topic 23: ma, ’am, ,’, sparsit, woman 
 Topic 24: coach, uncle, gentlemen, hat, coat 
 Topic 25: 'am, ma, widow, ma'am, soul 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 61 (approx. per word bound = -7.913, relative change = 1.548e-05) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 62 (approx. per word bound = -7.913, relative change = 1.464e-05) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 63 (approx. per word bound = -7.913, relative change = 1.413e-05) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 64 (approx. per word bound = -7.913, relative change = 1.345e-05) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 65 (approx. per word bound = -7.913, relative change = 1.246e-05) 
Topic 1: doctor, master, nephew, happy, school 
 Topic 2: project, works, electronic, terms, foundation 
 Topic 3: o, ha, ’em, ’d, woman 
 Topic 4: money, five, pounds, office, hundred 
 Topic 5: chapter, donations, story, public, information 
 Topic 6: aunt, traddles, ,’, agnes, happy 
 Topic 7: ghost, spirit, cold, ground, s 
 Topic 8: magistrate, friends, person, countenance, certainly 
 Topic 9: family, ,’, children, pocket, master 
 Topic 10: glass, water, waiter, stranger, company 
 Topic 11: squeers, 'll, boys, rejoined, 'd 
 Topic 12: sister, —and, hair, often, walk 
 Topic 13: collector, ladies, daughter, company, married 
 Topic 14: fat, 'll, ladies, party, spinster 
 Topic 15: girl, woman, doctor, dead, rose 
 Topic 16: case, judge, prisoner, gentlemen, court 
 Topic 17: water, wind, windows, stone, road 
 Topic 18: girl, sikes, oliver, beadle, ’ve 
 Topic 19: husband, wife, wine, madame, doctor 
 Topic 20: brother, sister, girl, brothers, daughter 
 Topic 21: wery, o, 'm, says, 'll 
 Topic 22: ,’, ’ly, ’m, ’d, ai 
 Topic 23: ma, ’am, ,’, sparsit, woman 
 Topic 24: coach, uncle, gentlemen, hat, coat 
 Topic 25: 'am, ma, widow, ma'am, soul 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 66 (approx. per word bound = -7.913, relative change = 1.236e-05) 
.........................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Model Converged 
Code
plot(stmOut_5, n = 10)

The smaller chunks produce noticeably different results. Several topics consist largely of informal or dialectal forms such as wery and ’em, showing how much space Dickens gives to characters who do not speak Received Pronunciation. These forms mark rural and working-class backgrounds, as well as the speech of children. Dickens’ willingness to represent people in poverty thus becomes even more salient than in earlier models.

We also get more distinct thematic topics. One points to graveyards, with words like ground, cold, beneath, lay, spot, earth and grave. Another describes comfortable evenings, with evening, dinner, remember, parlour, glad and sitting. Two topics indicate an engagement with justice: one containing prisoner, prison and death, and another containing case, question, judge, clerk, jury and attorney.

Why does 500 work better than 1,000? Shorter chunks allow thematically relevant co-occurrence patterns to emerge. They also change which terms pass the document-frequency thresholds, because those thresholds are now applied to a larger number of shorter pseudo-documents. However, chunks can also be too short: if pseudo-documents are too small, only a few features remain in each one, and topic detection, which depends on several words occurring together, suffers.
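The interaction between chunk size and proportional thresholds can be made concrete with a small base R sketch. It is illustrative only: toy_toks is a made-up token vector standing in for list_toks, the helper functions chunk_tokens() and summarise_chunks() are hypothetical names, and the proportions 0.005 and 0.25 are borrowed from the dfm_trim() call used for the 200-word model. The absolute counts are approximations of what a proportional threshold implies, not output from quanteda.

```r
# Toy token vector standing in for list_toks (hypothetical data, not the Dickens corpus)
vocab    <- c("ghost", "grave", "judge", "court", "coach", "wine")
toy_toks <- rep_len(vocab, 400300)

# Split a token vector into consecutive chunks of `chunk` tokens,
# using the same rep()/split() idiom as the tutorial code
chunk_tokens <- function(toks, chunk) {
  n <- length(toks)
  r <- rep(seq_len(ceiling(n / chunk)), each = chunk)[seq_len(n)]
  split(toks, r)
}

# Summarise what a given chunk size implies for proportional trimming:
# how many pseudo-documents there are, how long the final (possibly short)
# chunk is, and roughly how many chunks a term must appear in to survive
summarise_chunks <- function(toks, chunk, min_prop = 0.005, max_prop = 0.25) {
  chunks <- chunk_tokens(toks, chunk)
  n_docs <- length(chunks)
  data.frame(
    chunk_size = chunk,
    n_docs     = n_docs,
    last_chunk = length(chunks[[n_docs]]),
    min_docs   = ceiling(min_prop * n_docs),  # approximate lower bound in chunks
    max_docs   = floor(max_prop * n_docs)     # approximate upper bound in chunks
  )
}

overview <- do.call(
  rbind,
  lapply(c(1000, 500, 200), summarise_chunks, toks = toy_toks)
)
print(overview)
```

With the same proportions, halving the chunk size roughly doubles the number of chunks a term must occur in to be kept, while each chunk offers fewer opportunities for it to occur. This is one way to see why the set of surviving features shifts as chunks shrink, and why the final chunk, which may be much shorter than the rest, deserves a quick check.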


Sixth Model: Even Smaller Chunks (200 Words)

Smaller chunks have improved the model so far, so we try 200-word chunks:

Code
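# Split the token vector into consecutive pseudo-documents of 200 tokens
chunk          <- 200
n              <- length(list_toks)
r              <- rep(1:ceiling(n / chunk), each = chunk)[1:n]  # chunk index for each token
chunky_dickens <- split(list_toks, r)
chunky_toks    <- quanteda::tokens(chunky_dickens)
dtm_3          <- quanteda::dfm(chunky_toks)

# Keep terms that occur at least twice overall and appear in
# between 0.5% and 25% of the chunks
dtm_trimmed_3 <- quanteda::dfm_trim(
  dtm_3,
  min_termfreq = 2,
  min_docfreq  = 0.005,
  max_docfreq  = 0.25,
  docfreq_type = "prop"
)

# Fit a 25-topic model, capping the EM algorithm at 200 iterations
set.seed(42)
stmOut_6 <- stm::stm(documents = dtm_trimmed_3,
                     K          = 25,
                     max.em.its = 200)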
Beginning Spectral Initialization 
     Calculating the gram matrix...
     Finding anchor words...
    .........................
     Recovering initialization...
    ..............................................
Initialization complete.
......................................................................................................
Completed E-Step (3 seconds). 
Completed M-Step. 
Completing Iteration 1 (approx. per word bound = -7.682) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 2 (approx. per word bound = -7.559, relative change = 1.614e-02) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 3 (approx. per word bound = -7.532, relative change = 3.564e-03) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 4 (approx. per word bound = -7.521, relative change = 1.377e-03) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 5 (approx. per word bound = -7.516, relative change = 7.186e-04) 
Topic 1: work, project, works, electronic, foundation 
 Topic 2: says, wery, o, 'm, 'i 
 Topic 3: ha, o, ’em, ’ll, tell 
 Topic 4: information, public, state, chapter, new 
 Topic 5: family, boys, whole, always, among 
 Topic 6: 'i, three, hat, small, table 
 Topic 7: brother, 'i, sister, poor, brothers 
 Topic 8: ’ll, boy, girl, woman, cried 
 Topic 9: name, friend, ghost, three, whether 
 Topic 10: dog, thee, boy, bed, thou 
 Topic 11: magistrate, friend, officer, shall, boy 
 Topic 12: ,’, ’ly, ’ll, going, ’d 
 Topic 13: aunt, boy, fat, ladies, spinster 
 Topic 14: saw, light, place, people, seemed 
 Topic 15: ,’, returned, traddles, oh, mother 
 Topic 16: work, money, paid, full, form 
 Topic 17: lady, 'i, ma, 'am, rejoined 
 Topic 18: child, heart, father, love, knew 
 Topic 19: going, mother, home, always, saw 
 Topic 20: stranger, doctor, gentlemen, uncle, widow 
 Topic 21: 'i, 'll, cried, shall, yes 
 Topic 22: use, terms, read, law, work 
 Topic 23: ma, ’am, returned, lady, ,’ 
 Topic 24: coach, behind, road, horses, place 
 Topic 25: judge, gentlemen, court, case, jury 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 6 (approx. per word bound = -7.513, relative change = 4.308e-04) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 7 (approx. per word bound = -7.511, relative change = 2.750e-04) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 8 (approx. per word bound = -7.509, relative change = 1.917e-04) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 9 (approx. per word bound = -7.508, relative change = 1.437e-04) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 10 (approx. per word bound = -7.507, relative change = 1.120e-04) 
Topic 1: work, project, works, electronic, terms 
 Topic 2: o, wery, says, 'm, 'i 
 Topic 3: ha, o, ’ll, ’em, —and 
 Topic 4: public, state, chapter, new, place 
 Topic 5: family, boys, always, three, among 
 Topic 6: table, 'i, hat, small, coat 
 Topic 7: brother, sister, poor, brothers, 'i 
 Topic 8: boy, ’ll, girl, woman, cried 
 Topic 9: name, ghost, friend, three, spirit 
 Topic 10: dog, thee, bed, boy, still 
 Topic 11: friend, magistrate, officer, 'i, shall 
 Topic 12: ,’, ’ll, ’ly, going, ’d 
 Topic 13: aunt, boy, fat, spinster, ladies 
 Topic 14: light, saw, place, seemed, people 
 Topic 15: ,’, returned, oh, traddles, always 
 Topic 16: money, work, paid, pounds, five 
 Topic 17: lady, 'i, ma, 'am, ladies 
 Topic 18: child, heart, father, love, even 
 Topic 19: going, home, mother, saw, always 
 Topic 20: stranger, doctor, uncle, gentlemen, widow 
 Topic 21: 'i, 'll, cried, yes, shall 
 Topic 22: use, read, terms, law, reading 
 Topic 23: ma, ’am, returned, lady, tea 
 Topic 24: coach, street, behind, road, horses 
 Topic 25: judge, gentlemen, court, case, prisoner 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 11 (approx. per word bound = -7.506, relative change = 8.894e-05) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 12 (approx. per word bound = -7.506, relative change = 8.105e-05) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 13 (approx. per word bound = -7.505, relative change = 7.107e-05) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 14 (approx. per word bound = -7.505, relative change = 6.360e-05) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 15 (approx. per word bound = -7.504, relative change = 5.410e-05) 
Topic 1: work, project, works, terms, electronic 
 Topic 2: o, wery, says, 'm, 'i 
 Topic 3: ha, ’ll, o, —and, ’em 
 Topic 4: public, state, chapter, new, place 
 Topic 5: boys, family, always, school, four 
 Topic 6: table, hat, large, coat, small 
 Topic 7: brother, poor, sister, brothers, charles 
 Topic 8: boy, ’ll, girl, woman, cried 
 Topic 9: name, ghost, spirit, three, friend 
 Topic 10: dog, bed, thee, boy, father 
 Topic 11: friend, magistrate, 'i, friends, officer 
 Topic 12: ,’, ’ll, ’ly, ’m, ’d 
 Topic 13: aunt, boy, fat, spinster, ladies 
 Topic 14: light, saw, place, dark, seemed 
 Topic 15: ,’, returned, oh, traddles, always 
 Topic 16: money, five, work, paid, pounds 
 Topic 17: lady, 'i, ladies, ma, 'am 
 Topic 18: child, heart, father, love, even 
 Topic 19: going, home, saw, morning, always 
 Topic 20: stranger, doctor, uncle, glass, wine 
 Topic 21: 'i, 'll, cried, yes, shall 
 Topic 22: read, use, book, reading, law 
 Topic 23: ma, ’am, returned, lady, tea 
 Topic 24: coach, street, behind, road, horses 
 Topic 25: case, gentlemen, judge, court, prisoner 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 16 (approx. per word bound = -7.504, relative change = 4.783e-05) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 17 (approx. per word bound = -7.504, relative change = 4.267e-05) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 18 (approx. per word bound = -7.503, relative change = 3.941e-05) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 19 (approx. per word bound = -7.503, relative change = 3.482e-05) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 20 (approx. per word bound = -7.503, relative change = 3.077e-05) 
Topic 1: work, project, works, terms, electronic 
 Topic 2: o, wery, says, 'm, 'i 
 Topic 3: ’ll, ha, —and, o, tell 
 Topic 4: new, state, public, chapter, place 
 Topic 5: boys, family, always, school, people 
 Topic 6: table, large, coat, hat, black 
 Topic 7: brother, poor, sister, brothers, tell 
 Topic 8: boy, girl, ’ll, woman, cried 
 Topic 9: name, ghost, spirit, prison, three 
 Topic 10: dog, bed, thee, father, still 
 Topic 11: friend, friends, magistrate, 'i, countenance 
 Topic 12: ,’, ’ll, ’ly, ’m, ’d 
 Topic 13: aunt, boy, fat, spinster, ladies 
 Topic 14: light, place, dark, seemed, saw 
 Topic 15: ,’, returned, oh, traddles, mother 
 Topic 16: money, five, business, paid, pounds 
 Topic 17: lady, 'i, ladies, ma, 'am 
 Topic 18: child, heart, father, love, even 
 Topic 19: going, home, saw, morning, evening 
 Topic 20: stranger, doctor, uncle, glass, wine 
 Topic 21: 'i, 'll, yes, cried, shall 
 Topic 22: read, book, use, letter, reading 
 Topic 23: ma, ’am, returned, lady, tea 
 Topic 24: coach, street, behind, road, horse 
 Topic 25: case, gentlemen, court, judge, prisoner 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 21 (approx. per word bound = -7.503, relative change = 2.767e-05) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 22 (approx. per word bound = -7.503, relative change = 3.002e-05) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 23 (approx. per word bound = -7.502, relative change = 2.585e-05) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 24 (approx. per word bound = -7.502, relative change = 2.378e-05) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 25 (approx. per word bound = -7.502, relative change = 2.177e-05) 
Topic 1: work, project, works, terms, electronic 
 Topic 2: o, wery, says, 'm, 'i 
 Topic 3: ’ll, ha, —and, tell, o 
 Topic 4: new, state, chapter, place, public 
 Topic 5: boys, family, always, school, people 
 Topic 6: large, black, coat, table, hat 
 Topic 7: brother, poor, sister, brothers, tell 
 Topic 8: boy, girl, woman, ’ll, cried 
 Topic 9: name, ghost, spirit, prison, husband 
 Topic 10: dog, bed, thee, hands, let 
 Topic 11: friend, friends, 'i, countenance, magistrate 
 Topic 12: ,’, ’ll, ’ly, ’m, ’d 
 Topic 13: aunt, boy, fat, spinster, ladies 
 Topic 14: light, place, dark, seemed, saw 
 Topic 15: ,’, returned, oh, traddles, mother 
 Topic 16: money, five, business, ten, pounds 
 Topic 17: lady, 'i, ladies, ma, 'am 
 Topic 18: child, heart, father, love, tears 
 Topic 19: going, home, saw, morning, evening 
 Topic 20: glass, stranger, wine, uncle, doctor 
 Topic 21: 'i, 'll, yes, cried, shall 
 Topic 22: read, book, letter, use, paper 
 Topic 23: ma, ’am, returned, lady, tea 
 Topic 24: coach, street, behind, road, horse 
 Topic 25: case, gentlemen, court, judge, prisoner 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 26 (approx. per word bound = -7.502, relative change = 2.042e-05) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 27 (approx. per word bound = -7.502, relative change = 1.810e-05) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 28 (approx. per word bound = -7.502, relative change = 1.564e-05) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 29 (approx. per word bound = -7.502, relative change = 1.356e-05) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 30 (approx. per word bound = -7.501, relative change = 1.167e-05) 
Topic 1: work, project, works, terms, electronic 
 Topic 2: o, wery, says, 'm, 'i 
 Topic 3: ’ll, —and, tell, ha, o 
 Topic 4: new, state, place, chapter, public 
 Topic 5: boys, family, always, school, people 
 Topic 6: large, black, coat, table, small 
 Topic 7: brother, poor, sister, brothers, tell 
 Topic 8: boy, girl, woman, ’ll, cried 
 Topic 9: ghost, name, spirit, prison, husband 
 Topic 10: dog, bed, hands, thee, still 
 Topic 11: friend, friends, countenance, 'i, magistrate 
 Topic 12: ,’, ’ll, ’ly, ’m, ’d 
 Topic 13: aunt, boy, fat, spinster, ladies 
 Topic 14: light, place, seemed, dark, air 
 Topic 15: ,’, returned, oh, traddles, mother 
 Topic 16: money, five, business, ten, hundred 
 Topic 17: lady, 'i, ladies, ma, 'am 
 Topic 18: child, heart, father, love, tears 
 Topic 19: home, going, saw, morning, evening 
 Topic 20: glass, stranger, wine, uncle, doctor 
 Topic 21: 'i, 'll, yes, cried, shall 
 Topic 22: read, letter, book, paper, reading 
 Topic 23: ma, ’am, returned, lady, tea 
 Topic 24: coach, street, behind, road, horse 
 Topic 25: case, gentlemen, court, judge, prisoner 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 31 (approx. per word bound = -7.501, relative change = 1.159e-05) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 32 (approx. per word bound = -7.501, relative change = 1.056e-05) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 33 (approx. per word bound = -7.501, relative change = 1.063e-05) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Model Converged 
Code
plot(stmOut_6, n = 10)

Decreasing the chunk size to 200 words gets us much closer to the kind of results that allow us to address our research questions. Across the twenty-five topics, clear themes emerge.

Informal and dialectal language, which has appeared reliably since the first model, is now captured with even finer distinctions. One topic reflects dialectal speech, with features like wery, wot and ere. Another contains contractions like ’ll and n’t alongside interaction words like asked, hear and tell. In total, we get several topics of this kind, each with a distinct flavour.

Literary realism — one topic, displaying a high degree of internal consistency, pertains to travel with words including coach, road, horses, chaise and guard. A second describes outdoor spaces, containing light, wind, people, water, windows, dark, streets and sea. A third describes a comfortable social setting with glass, table, gentlemen, wine, company, bottle and chair. The granularity of these topics — describing different spaces in specific detail — reflects Dickens’ commitment to including mundane everyday experience.

Positive emotions and care — topics reflect the theme of family and love, with words like child, home, happy, loved, love and heart in one, and mother, home, pretty, laughing and remember in another. A third topic opens up in scope: heart, love, beautiful, happiness, happy, poor, people and world together indicate that Dickens has a broader conception of who is worthy of happiness and love than many of his contemporaries.

Other recurring themes include character prospects (father, business, money, brother, tell, hope), remembrance (saw, seen, sat, home, knew, gone), and loss (child, hands, heart, cried, moment, death). Topics pertaining to the legal system and to school also appear, albeit less clearly than in some earlier models.

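Some topics remain barely interpretable, which is not uncommon in topic modelling. Topic 1 (work, project, works, terms, electronic) appears to reflect the Project Gutenberg licence text rather than Dickens’ prose, which suggests that some boilerplate survived pre-processing.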

This model with 200-word chunks allows us to address both research questions rather well: the topics reveal Dickens’ social criticism through the prominence of poverty, informal speech, and legal themes; and they reveal his literary realism through the specific, granular descriptions of spaces, travel, and social settings.


Seventh Model: 200-Word Chunks, Narrower Frequency Range

We keep the 200-word chunks but narrow the range of included words by lowering the maximum document frequency to 15%:

Code
# Reuse the 200-word dfm from the sixth model; only max_docfreq changes (0.25 to 0.15)
dtm_trimmed_4 <- quanteda::dfm_trim(
  dtm_3,
  min_termfreq = 2,
  min_docfreq  = 0.005,
  max_docfreq  = 0.15,
  docfreq_type = "prop"
)

set.seed(42)
stmOut_7 <- stm::stm(documents = dtm_trimmed_4,
                     K          = 25,
                     max.em.its = 200)
Beginning Spectral Initialization 
     Calculating the gram matrix...
     Finding anchor words...
    .........................
     Recovering initialization...
    ..............................................
Initialization complete.
......................................................................................................
Completed E-Step (3 seconds). 
Completed M-Step. 
Completing Iteration 1 (approx. per word bound = -7.794) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 2 (approx. per word bound = -7.651, relative change = 1.839e-02) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 3 (approx. per word bound = -7.621, relative change = 3.953e-03) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 4 (approx. per word bound = -7.609, relative change = 1.513e-03) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 5 (approx. per word bound = -7.603, relative change = 7.912e-04) 
Topic 1: work, project, works, terms, electronic 
 Topic 2: wery, says, o, 'm, 'll 
 Topic 3: ha, o, ’em, ’ll, ’d 
 Topic 4: brother, sister, brothers, fellow, charles 
 Topic 5: bed, dog, thee, ’ll, lay 
 Topic 6: ’ll, woman, girl, sikes, oliver 
 Topic 7: information, foundation, state, public, new 
 Topic 8: magistrate, officer, town, matter, countenance 
 Topic 9: fat, small, ladies, hat, coat 
 Topic 10: pen, book, fact, four, daughter 
 Topic 11: uncle, stranger, 'am, ma, widow 
 Topic 12: family, among, boys, whole, new 
 Topic 13: ,’, ’ly, ’ll, ’d, ’m 
 Topic 14: child, father, love, mother, tears 
 Topic 15: men, dark, among, passed, wine 
 Topic 16: aunt, fat, spinster, ,’, else 
 Topic 17: ,’, traddles, mother, really, believe 
 Topic 18: ma, ’am, tea, woman, sparsit 
 Topic 19: mother, remember, horse, cart, gave 
 Topic 20: paid, money, full, work, pay 
 Topic 21: gentlemen, case, judge, court, question 
 Topic 22: 'll, 'd, squeers, boys, rejoined 
 Topic 23: coach, road, behind, horses, street 
 Topic 24: business, money, mean, girl, want 
 Topic 25: dinner, sister, pudding, glass, water 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 6 (approx. per word bound = -7.599, relative change = 4.667e-04) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 7 (approx. per word bound = -7.597, relative change = 3.038e-04) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 8 (approx. per word bound = -7.595, relative change = 2.186e-04) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 9 (approx. per word bound = -7.594, relative change = 1.602e-04) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 10 (approx. per word bound = -7.593, relative change = 1.295e-04) 
Topic 1: work, project, works, terms, electronic 
 Topic 2: wery, o, says, 'm, 'll 
 Topic 3: ha, o, ’em, ’ll, ’re 
 Topic 4: brother, sister, brothers, fellow, charles 
 Topic 5: bed, dog, thee, lay, arm 
 Topic 6: ’ll, woman, girl, oliver, sikes 
 Topic 7: information, foundation, state, number, new 
 Topic 8: magistrate, officer, countenance, matter, doctor 
 Topic 9: coat, hat, small, fat, large 
 Topic 10: pen, four, daughter, book, fine 
 Topic 11: uncle, stranger, 'am, ma, widow 
 Topic 12: family, boys, among, whole, new 
 Topic 13: ,’, ’ly, ’ll, ’d, ’m 
 Topic 14: child, father, love, tears, mother 
 Topic 15: dark, among, men, passed, sun 
 Topic 16: aunt, fat, ,’, spinster, else 
 Topic 17: ,’, traddles, really, mother, believe 
 Topic 18: ma, ’am, woman, tea, sparsit 
 Topic 19: mother, remember, coming, used, garden 
 Topic 20: money, paid, letter, pounds, five 
 Topic 21: gentlemen, case, judge, prisoner, court 
 Topic 22: 'll, 'd, rejoined, squeers, boys 
 Topic 23: coach, street, road, behind, horses 
 Topic 24: business, money, reason, mean, want 
 Topic 25: dinner, wine, glass, waiter, sister 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 11 (approx. per word bound = -7.592, relative change = 1.055e-04) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 12 (approx. per word bound = -7.592, relative change = 9.028e-05) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 13 (approx. per word bound = -7.591, relative change = 7.406e-05) 
......................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 14 (approx. per word bound = -7.591, relative change = 6.261e-05) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 15 (approx. per word bound = -7.590, relative change = 5.381e-05) 
Topic 1: work, project, works, terms, electronic 
 Topic 2: o, wery, says, 'm, 'd 
 Topic 3: ha, ’ll, o, ’em, —and 
 Topic 4: brother, sister, brothers, fellow, charles 
 Topic 5: bed, dog, arm, lay, thee 
 Topic 6: ’ll, girl, woman, oliver, sikes 
 Topic 7: information, number, foundation, new, state 
 Topic 8: magistrate, officer, countenance, doctor, immediately 
 Topic 9: coat, hat, fat, large, black 
 Topic 10: ghost, four, daughter, book, forty 
 Topic 11: uncle, stranger, 'am, ma, widow 
 Topic 12: boys, family, among, whole, six 
 Topic 13: ,’, ’ll, ’ly, ’d, ’m 
 Topic 14: child, father, love, tears, mother 
 Topic 15: dark, among, sun, passed, stone 
 Topic 16: aunt, ,’, spinster, fat, agnes 
 Topic 17: ,’, traddles, really, believe, father 
 Topic 18: ma, ’am, woman, tea, sparsit 
 Topic 19: mother, remember, evening, used, coming 
 Topic 20: money, letter, read, five, paper 
 Topic 21: gentlemen, judge, prisoner, case, court 
 Topic 22: 'll, rejoined, 'd, 're, squeers 
 Topic 23: coach, street, road, behind, horses 
 Topic 24: business, reason, money, question, bad 
 Topic 25: dinner, wine, glass, fire, water 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 16 (approx. per word bound = -7.590, relative change = 4.654e-05) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 17 (approx. per word bound = -7.590, relative change = 4.092e-05) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 18 (approx. per word bound = -7.589, relative change = 3.706e-05) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 19 (approx. per word bound = -7.589, relative change = 3.155e-05) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 20 (approx. per word bound = -7.589, relative change = 2.850e-05) 
Topic 1: work, project, works, terms, electronic 
 Topic 2: o, wery, says, 'm, 'd 
 Topic 3: ’ll, ha, o, ’em, —and 
 Topic 4: brother, sister, brothers, fellow, mother 
 Topic 5: bed, dog, arm, fell, men 
 Topic 6: girl, woman, ’ll, oliver, sikes 
 Topic 7: information, number, new, foundation, state 
 Topic 8: magistrate, countenance, officer, party, immediately 
 Topic 9: fat, hat, coat, black, large 
 Topic 10: ghost, four, spirit, daughter, forty 
 Topic 11: uncle, stranger, 'am, ma, widow 
 Topic 12: boys, family, among, whole, occasion 
 Topic 13: ,’, ’ll, ’ly, ’m, ’d 
 Topic 14: child, father, love, tears, mother 
 Topic 15: dark, among, sun, passed, stone 
 Topic 16: aunt, ,’, spinster, agnes, ’ll 
 Topic 17: ,’, traddles, really, believe, father 
 Topic 18: ma, ’am, woman, sparsit, tea 
 Topic 19: mother, evening, remember, used, coming 
 Topic 20: money, letter, read, five, paper 
 Topic 21: gentlemen, prisoner, judge, case, court 
 Topic 22: 'll, rejoined, 'd, 're, 'oh 
 Topic 23: coach, street, road, behind, horse 
 Topic 24: business, reason, question, bad, perhaps 
 Topic 25: dinner, glass, wine, water, fire 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 21 (approx. per word bound = -7.589, relative change = 2.367e-05) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 22 (approx. per word bound = -7.589, relative change = 2.186e-05) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 23 (approx. per word bound = -7.588, relative change = 2.088e-05) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 24 (approx. per word bound = -7.588, relative change = 1.857e-05) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 25 (approx. per word bound = -7.588, relative change = 1.580e-05) 
Topic 1: work, project, works, terms, electronic 
 Topic 2: o, wery, says, 'm, 'd 
 Topic 3: ’ll, ha, o, —and, ’em 
 Topic 4: brother, sister, brothers, mother, fellow 
 Topic 5: bed, arm, dog, fell, men 
 Topic 6: girl, woman, ’ll, oliver, sikes 
 Topic 7: number, information, new, foundation, state 
 Topic 8: magistrate, countenance, officer, party, immediately 
 Topic 9: fat, hat, coat, black, large 
 Topic 10: ghost, spirit, four, fine, forty 
 Topic 11: uncle, stranger, 'am, ma, widow 
 Topic 12: family, boys, among, occasion, world 
 Topic 13: ,’, ’ll, ’ly, ’m, ’d 
 Topic 14: child, father, love, tears, mother 
 Topic 15: dark, among, sun, passed, stone 
 Topic 16: aunt, ,’, spinster, agnes, ’ll 
 Topic 17: ,’, traddles, really, believe, agnes 
 Topic 18: ma, ’am, woman, sparsit, tea 
 Topic 19: mother, evening, used, remember, coming 
 Topic 20: money, letter, read, five, paper 
 Topic 21: gentlemen, prisoner, judge, court, case 
 Topic 22: 'll, rejoined, 'd, 're, 'oh 
 Topic 23: coach, street, road, behind, horse 
 Topic 24: business, question, reason, bad, perhaps 
 Topic 25: glass, dinner, wine, water, fire 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 26 (approx. per word bound = -7.588, relative change = 1.302e-05) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 27 (approx. per word bound = -7.588, relative change = 1.170e-05) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 28 (approx. per word bound = -7.588, relative change = 1.163e-05) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 29 (approx. per word bound = -7.588, relative change = 1.091e-05) 
......................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Model Converged 
Code
plot(stmOut_7, n = 10)

In comparison to the sixth model, the improvements here are subtler but worth examining.

Appearance and social setting — there is now a topic more clearly detailing people’s appearance, with coat, black, hat, small, white, large, pretty and appearance. A companion topic describes social settings with gentlemen, company, ladies, fat, chair, party, honourable and crowd.

Passing of time — a topic captures the passage of time evocatively, with evening, often, walked, quiet, hour, passed, days, window, thoughts and hours.

Legal and institutional interactions — the legal system theme gains specificity, with paper and read appearing in the same topic as case, prisoner, court and judge. A related topic captures formal institutional interactions with matter, beg, magistrate, fellow, person, pray, certainly and inquired.

Family and heritage — a topic on love contains love, child, happy, woman, speak, tears, loved and feel. A second captures heritage with mother, gave, mine, remember, pretty, wonder, father, suppose and baby. The words wonder and suppose bring a quality of speculation that may well evoke characters like the orphan Pip in Great Expectations.

Notably, the word poor no longer appears in this model. Since it was present when the maximum document frequency was 25% and is removed at 15%, it must occur in between 15% and 25% of all pseudo-documents, which makes it quite a frequent word. Of course, poor is not necessarily related to poverty per se, as it can also express sympathy. But Dickens’ use of poor could be a productive avenue for more qualitative investigation.
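
The reasoning about poor can be checked directly against a document-feature matrix. The following is a minimal sketch on ten invented pseudo-documents, not on the Dickens corpus; it assumes the quanteda package is installed and uses docfreq(), ndoc(), dfm_trim() and featnames(). The toy data are constructed so that poor occurs in 20% of the documents.

```r
library(quanteda)

# Ten toy pseudo-documents (invented for illustration): "poor" occurs in 2 of 10
docs <- c("poor child cried", "poor boy ran", "coach road horses",
          "judge court case", "wine glass table", "light dark sea",
          "aunt spinster ladies", "money pounds paid",
          "ghost spirit prison", "dog bed hands")
toy_dfm <- dfm(tokens(docs))

# Proportion of documents that contain "poor"
prop_poor <- docfreq(toy_dfm)[["poor"]] / ndoc(toy_dfm)

# Features that survive trimming at the two maximum document frequencies
kept_25 <- featnames(dfm_trim(toy_dfm, max_docfreq = 0.25, docfreq_type = "prop"))
kept_15 <- featnames(dfm_trim(toy_dfm, max_docfreq = 0.15, docfreq_type = "prop"))

"poor" %in% kept_25   # kept: 0.2 is below the 0.25 ceiling
"poor" %in% kept_15   # dropped: 0.2 is above the 0.15 ceiling
```

On the tutorial's own data, the same proportion could be obtained with quanteda::docfreq(dtm_3)[["poor"]] / quanteda::ndoc(dtm_3), assuming dtm_3 is the untrimmed 200-word dfm created for the sixth model.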

As always, some topics containing contractions and alternate spellings are present, and some topics remain challenging to interpret.


Final Comments

Many roads lead to Rome. This goes for each step of the way: the choice of programming language and packages, the pre-processing decisions, and the precise specification and interpretation of the models. There is no one-size-fits-all approach.

Perhaps the best frame for thinking about topic modelling is provided by Tangherlini and Leonard (2013), who discuss it in terms of a division of labour: “the computer algorithm is given the task of doing what it does best: counting words and calculating probabilities of term co-occurrence” and the “researcher is given the task of doing what he or she does best: applying domain expertise and experience for labelling and curating the topics” (Tangherlini and Leonard 2013, 728).

Working through this tutorial, you have seen how tweaking parameters — number of topics, document-frequency thresholds, chunk size — can change the results substantially. The journey from a first, uninterpretable model to one that provides genuine insight into Dickens’ social criticism and literary realism illustrates both the challenges and the rewards of computational text analysis.

Parameter Summary

| Model | Chunk size | K | min_docfreq | max_docfreq | Key finding |
|-------|------------|----|-------------|-------------|-------------|
| 1 | 1,000 | 10 | 0.0001 | 0.15 | Baseline: few interpretable topics |
| 2 | 1,000 | 20 | 0.0001 | 0.15 | More topics → clearer themes |
| 3 | 1,000 | 20 | 0.005 | 0.15 | Rarer words → marginal change |
| 4 | 1,000 | 30 | 0.01 | 0.25 | More common words → no improvement |
| 5 | 500 | 25 | 0.005 | 0.25 | Smaller chunks → clearer themes |
| 6 | 200 | 25 | 0.005 | 0.25 | Even smaller chunks → best overall |
| 7 | 200 | 25 | 0.005 | 0.15 | Narrower range → finer detail |

The general trend: smaller chunks and more topics tend to produce more interpretable results, up to the point where chunks become so small that co-occurrences are too sparse.
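To make the effect of chunk size concrete, the sketch below splits one toy text into pseudo-documents of the three sizes listed in the summary. It assumes chunking with quanteda's tokens_chunk(), which may differ from the chunking code used earlier in the tutorial; the vocabulary and the objects novel, toks and n_chunks are invented for illustration.

```r
library(quanteda)

# Hypothetical toy "novel": exactly 1,000 tokens from a small vocabulary
set.seed(42)
vocab <- c("court", "judge", "evening", "window", "mother", "child", "love", "paper")
novel <- paste(sample(vocab, 1000, replace = TRUE), collapse = " ")
toks  <- tokens(c(novel = novel))

# Number of pseudo-documents produced by each chunk size
chunk_sizes <- c(1000, 500, 200)
n_chunks <- sapply(chunk_sizes, function(size) {
  ndoc(tokens_chunk(toks, size = size))
})
names(n_chunks) <- chunk_sizes
n_chunks
```

Smaller chunks yield more but shorter pseudo-documents: here 1, 2 and 5 chunks respectively. Each chunk then contains fewer co-occurring words, which is why very small chunk sizes eventually make the co-occurrence signal too sparse.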


Citation & Session Info

Schneider, Gerold, Max Lauber & Martin Schweinberger. 2026. Topic Modelling of Charles Dickens’ Novels. Brisbane: The Language Technology and Data Analysis Laboratory (LADAL). url: https://ladal.edu.au/tutorials/topmod/topmod.html (Version 2026.05.01).

@manual{schneider2026topmod,
  author       = {Schneider, Gerold and Lauber, Max and Schweinberger, Martin},
  title        = {Topic Modelling of Charles Dickens' Novels},
  note         = {tutorials/topmod/topmod.html},
  year         = {2026},
  organization = {The University of Queensland, Australia. School of Languages and Cultures},
  address      = {Brisbane},
  edition      = {2026.05.01}
}
AI Transparency Statement

This tutorial was adapted for LADAL by Martin Schweinberger with the assistance of Claude (claude.ai), a large language model created by Anthropic. The original tutorial was authored by Gerold Schneider and Max Lauber for the Australian Text Analytics Platform (ATAP, 2022). The adaptation involved: converting the document to Quarto format; replacing spacyr (Python-dependent) with udpipe (pure R); adding set.seed(42) before all stm() calls for reproducibility; adding a disk-cache for the udpipe annotation step; replacing the direct gutenberg_download() call with a robust mirror-fallback helper (gutenberg_safe()); fixing the as.tokens() call to work correctly with udpipe data frame output via split(); converting all div blocks to Quarto callouts; and adding LADAL-style section overviews and learning objectives. The original research questions, corpus, pre-processing logic, and iterative modelling narrative are entirely the work of the original authors.

sessionInfo()
R version 4.4.2 (2024-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26200)

Matrix products: default


locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

time zone: Australia/Brisbane
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] purrr_1.2.1                udpipe_0.8.11             
[3] dplyr_1.2.0                stm_1.3.8                 
[5] tidytext_0.4.2             quanteda.textmodels_0.9.10
[7] quanteda_4.2.0             gutenbergr_0.2.4          
[9] checkdown_0.0.13          

loaded via a namespace (and not attached):
 [1] Matrix_1.7-2        glmnet_4.1-8        jsonlite_2.0.0     
 [4] janeaustenr_1.0.0   compiler_4.4.2      BiocManager_1.30.27
 [7] renv_1.1.7          stopwords_2.3       tidyselect_1.2.1   
[10] Rcpp_1.1.1          splines_4.4.2       yaml_2.3.10        
[13] fastmap_1.2.0       lattice_0.22-6      R6_2.6.1           
[16] SnowballC_0.7.1     generics_0.1.4      shape_1.4.6.1      
[19] knitr_1.51          iterators_1.0.14    htmlwidgets_1.6.4  
[22] tibble_3.3.1        pillar_1.11.1       tokenizers_0.3.0   
[25] rlang_1.1.7         fastmatch_1.1-8     stringi_1.8.7      
[28] xfun_0.56           cli_3.6.5           magrittr_2.0.4     
[31] digest_0.6.39       foreach_1.5.2       grid_4.4.2         
[34] rstudioapi_0.17.1   markdown_2.0        lifecycle_1.0.5    
[37] vctrs_0.7.2         data.table_1.17.0   evaluate_1.0.5     
[40] glue_1.8.0          codetools_0.2-20    survival_3.7-0     
[43] rmarkdown_2.30      matrixStats_1.5.0   tools_4.4.2        
[46] pkgconfig_2.0.3     htmltools_0.5.9    



References

Firth, John Rupert. 1957. “A Synopsis of Linguistic Theory, 1930–1955.” In Studies in Linguistic Analysis, 1–32. Oxford: Blackwell.
Jockers, Matthew L. 2013. Macroanalysis: Digital Methods and Literary History. Vol. 51. Urbana, Chicago; Springfield: University of Illinois Press. https://doi.org/10.5860/choice.51-4276.
Kailash, Sudha. 2012. “Charles Dickens as a Social Critic.” International Journal of Research in Economics & Social Sciences 2 (8): 1–51.
Mahlberg, Michaela. 2013. Corpus Stylistics and Dickens’s Fiction. Routledge. https://doi.org/10.4324/9780203076088.
Mohr, John W, and Petko Bogdanov. 2013. “Introduction–Topic Models: What They Are and Why They Matter.” Poetics 41 (6): 545–69. https://doi.org/10.1016/j.poetic.2013.10.001.
Roberts, Margaret E, Brandon M Stewart, and Dustin Tingley. 2019. “stm: An R Package for Structural Topic Models.” Journal of Statistical Software 91: 1–40.
Tangherlini, Timothy R, and Peter Leonard. 2013. “Trawling in the Sea of the Great Unread: Sub-Corpus Topic Modeling and Humanities Research.” Poetics 41 (6): 725–49. https://doi.org/10.1016/j.poetic.2013.08.002.