Local Large Language Models in R with Ollama

Introduction

This tutorial introduces Ollama and the ollamar R package — a toolkit for running open-source large language models (LLMs) directly on your own machine and calling them from R. Unlike cloud-based AI services, Ollama requires no API key, sends no data to external servers, and works entirely offline once a model has been downloaded. This makes it particularly well suited for research involving sensitive or proprietary text, for reproducible analyses that must not depend on third-party service availability, and for teaching environments without budget for commercial API access.
The tutorial covers the conceptual foundations of local LLM inference, installation and setup, and eight practical workflows relevant to corpus linguistics and NLP research: basic text generation, multi-turn conversation, sentiment analysis and text classification, named entity recognition, text summarisation, generating embeddings, corpus-scale batch processing, and using a model to assist with writing R code.
Before working through this tutorial, you should be comfortable with:
- Getting Started with R — R objects, functions, and the tidyverse
- String Processing in R — working with text in R
- Loading and Saving Data — reading files into R
Familiarity with basic NLP concepts (tokens, sentiment, named entities) is helpful but not required — concepts are introduced as they arise.
By the end of this tutorial you will be able to:
- Explain what Ollama is, how local LLM inference works, and when to prefer it over cloud APIs
- Install Ollama, pull a model, and verify the connection from R
- Generate text from a prompt using generate()
- Build multi-turn conversations using chat() and conversation history management
- Use prompt engineering to perform sentiment analysis, NER, and text summarisation
- Generate sentence embeddings with embed() for downstream analysis
- Process a corpus of texts at scale using parallelisation
- Use a local LLM to assist with writing and debugging R code
Schweinberger, Martin. 2026. Local Large Language Models in R with Ollama. Brisbane: The Language Technology and Data Analysis Laboratory (LADAL). url: https://ladal.edu.au/tutorials/ollama/ollama.html (Version 2026.05.01).
What Is Ollama and Why Use It?
What you will learn: What Ollama is and how it works; the difference between local and cloud-based LLM inference; the key advantages of running models locally; hardware requirements; and how ollamar connects R to the Ollama server
Local vs Cloud LLM Inference
When you use a cloud-based LLM service — such as OpenAI’s GPT, Anthropic’s Claude, or Google’s Gemini — your text is sent over the internet to a remote server, processed there, and the response is returned to you. This works well for many tasks but raises concerns in three areas:
Privacy — any text you send to a cloud API is processed on servers you do not control. For research involving sensitive data (patient records, confidential documents, anonymised survey responses), this is often unacceptable under institutional ethics approvals and data governance policies.
Cost — commercial APIs charge per token. For corpus linguistics, where you may need to process thousands of texts, costs can escalate rapidly. A corpus of 10,000 abstracts processed with a cloud API at typical 2025 pricing can cost tens to hundreds of dollars.
Reproducibility — cloud models are updated without notice. An analysis run today against GPT-4o may produce different results next month against a silently updated version of the same model. Local models, by contrast, are fixed: the weights you download today are the weights you use in six months.
Ollama eliminates all three concerns by running the model entirely on your own hardware.
What Ollama Is
Ollama is a free, open-source application that downloads, manages, and serves open-source LLMs locally. It provides a REST API on http://127.0.0.1:11434 that any application — including R, Python, or a web browser — can call to generate text, chat, or produce embeddings. From R, the ollamar package (Lin 2024) wraps this API in a clean set of R functions.
Your R script
│
▼
ollamar (R package) ──HTTP──▶ Ollama server (127.0.0.1:11434)
│
▼
Local LLM weights
(stored on your machine)
│
▼
Response text
Ollama supports a large and growing library of models. Once you have installed Ollama, pulling a new model is a single command.
Hardware Requirements
LLMs vary greatly in size. The model used throughout this tutorial, llama3.2:3b, is a 3-billion-parameter model that runs on virtually any modern laptop with at least 8 GB of RAM, without a GPU. Larger models require more resources:
| Model size | RAM required | GPU needed? | Typical use |
|---|---|---|---|
| 1B–3B | 4–8 GB | No | Teaching, prototyping, simple tasks |
| 7B–8B | 8–16 GB | Optional | Most NLP research tasks |
| 13B | 16 GB | Recommended | High-quality generation |
| 70B+ | 48 GB+ | Required | Near-GPT-4 quality |
For the tasks in this tutorial, llama3.2:3b is sufficient and runs comfortably without a GPU on a standard research laptop.
The ollamar Package
The ollamar package (Lin 2024) uses the httr2 library to make HTTP requests to the Ollama server. Most functions return an httr2_response object by default, which must be parsed with resp_process(). Alternatively, you can specify the output format directly using the output parameter, which accepts "text", "df" (tibble), "jsonlist", "raw", or "resp" (the default httr2 response).
ollamar is an R interface to Ollama — a separate application that must be installed on your machine before any ollamar function will work. Installing the R package alone is not sufficient.
To install Ollama:
- Go to ollama.com in your browser
- Click Download — the site detects your operating system (Windows, Mac, or Linux) automatically
- Run the installer (OllamaSetup.exe on Windows; drag to Applications on Mac)
- After installation, close and reopen your terminal — Windows and Mac need a fresh terminal session to recognise the new ollama command
- Verify the installation worked by opening a terminal and running:

ollama --version

You should see a version number. If you see “not recognised” or “command not found”, restart your computer and try again.
Then download the model used in this tutorial:

ollama pull llama3.2

This downloads the 3B model (~2 GB) and only needs to be done once. Ollama stores model weights automatically in its own cache folder — you do not need to specify a location.
Once installed, Ollama runs as a background service and starts automatically at login. You can confirm it is running by checking the system tray (Windows) or menu bar (Mac) for the Ollama icon. From R, verify the connection with ollamar::test_connection() before running any analysis code.
Q1. A researcher wants to use an LLM to analyse 5,000 interview transcripts that contain sensitive personal information about mental health. She is considering using a commercial cloud API. What are the two most important reasons she should use a local model via Ollama instead?
Q2. A colleague says: ‘llama3.2:3b is only 3 billion parameters — it must be far worse than GPT-4 and not worth using for research.’ What is the most accurate response to this argument?
Setup
What you will learn: How to install Ollama; how to pull (download) a model; how to install and load ollamar; how to test the connection; and what to do when things go wrong
Step 1 — Install Ollama
Download and install Ollama from ollama.com. Installers are available for Windows, Mac, and Linux. After installation, Ollama runs as a background service and starts automatically when your computer boots.
To verify Ollama is running, open a terminal and type:
ollama --version

You should see a version number. If you see “command not found”, Ollama is not installed or not on your PATH.
Step 2 — Pull a Model
Download the llama3.2:3b model. This is a 2 GB download and only needs to be done once:
ollama pull llama3.2

To see all models you have downloaded:

ollama list

llama3.2 defaults to the 3B parameter version, which runs on any laptop with 8 GB RAM and no GPU. If you have more resources available, llama3.1:8b (requires 8 GB RAM) produces noticeably better output for complex tasks. Pull it with ollama pull llama3.1. Throughout this tutorial we use llama3.2 for accessibility, with notes where a larger model would improve results.
Step 3 — Install ollamar
Code
# Stable version from CRAN
install.packages("ollamar")
# Development version with latest features (optional)
# install.packages("remotes")
# remotes::install_github("hauselin/ollamar")
# Additional packages used in this tutorial
install.packages(c(
"dplyr", "purrr", "tibble", "stringr",
"ggplot2", "flextable", "httr2", "checkdown"
))

Step 4 — Load Packages
Code
library(ollamar)
library(dplyr)
library(purrr)
library(tibble)
library(stringr)
library(ggplot2)
library(flextable)
library(httr2)
library(checkdown)

Step 5 — Test the Connection
Code
# Check that Ollama is running and R can reach it
ollamar::test_connection()

<httr2_response>
GET http://localhost:11434/
Status: 200 OK
Content-Type: text/plain
Body: In memory (17 bytes)
A successful connection prints a confirmation message. If you see "Ollama local server not running or wrong server", check that the Ollama application is open and running in the background.
Code
# See which models you have downloaded
ollamar::list_models()

                     name   size parameter_size quantization_level            modified
1         llama3.2:latest   2 GB           3.2B             Q4_K_M 2026-03-20T08:40:36
2 nomic-embed-text:latest 274 MB           137M                F16 2026-03-20T08:40:37
ollamar communicates with Ollama via HTTP. If Ollama is not running when you call generate(), chat(), or embed(), you will get a connection error. Always check test_connection() at the start of your session if you are unsure.
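A defensive pattern for the start of an analysis script is to fail fast when the server is unreachable, rather than erroring halfway through a batch run. This is a sketch: it assumes, as in the output above, that test_connection() returns an httr2 response when the server is up, and treats any error as "not ready".

```r
# Fail fast if the Ollama server is not reachable (sketch: assumes
# test_connection() returns an httr2 response, as shown above)
ollama_ready <- tryCatch(
  httr2::resp_status(ollamar::test_connection()) == 200,
  error = function(e) FALSE
)
if (!ollama_ready) {
  stop("Ollama server not reachable. Start the Ollama application and retry.")
}
```

Placed at the top of a script, this turns a confusing mid-analysis connection error into a single clear failure message.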
Q3. You run test_connection() and see "Ollama local server not running or wrong server". You are sure Ollama is installed. What should you check first?
Q4. You run list_models() and see an empty data frame. What does this mean and what should you do?
Basic Text Generation
What you will learn: How generate() works; the output parameter and its five format options; how to write effective prompts; and how to inspect and process the response object
The generate() Function
generate() is the simplest way to get a response from a model. It takes a model name and a prompt and returns a response in the format you specify:
Code
library(ollamar)
# Generate a response — returns httr2_response object by default
resp <- ollamar::generate("llama3.2", "What is corpus linguistics?")
# Inspect the raw response object
resp

<httr2_response>
POST http://127.0.0.1:11434/api/generate
Status: 200 OK
Content-Type: application/json
Body: In memory (6271 bytes)
Code
# <httr2_response>
# POST http://127.0.0.1:11434/api/generate
# Status: 200 OK
# Extract just the text
ollamar::resp_process(resp, "text")

[1] "Corpus linguistics is a subfield of linguistics that deals with the study of language through the analysis and examination of large databases or \"corpora\" of texts, speech, or other forms of communication. The term \"corpus\" comes from Latin, meaning \"body\" or \"collection\".\n\nIn corpus linguistics, researchers use digital tools to analyze and quantify linguistic data, often from large collections of texts, such as books, articles, emails, social media posts, or conversations. By examining these corpora, researchers can identify patterns, trends, and relationships in language that might not be apparent through traditional qualitative methods.\n\nCorpus linguistics is used in various areas of research, including:\n\n1. **Language description**: Corpus linguists study the grammar, syntax, vocabulary, and pronunciation of languages to describe their structure and usage.\n2. **Language teaching**: Corpora are used to develop language learning materials, such as textbooks and online resources, that reflect current language use.\n3. **Language evaluation**: Corpora help assess language proficiency, detect linguistic errors, and evaluate the effectiveness of language teaching methods.\n4. **Discourse analysis**: Corpus linguists analyze how people use language in different contexts, such as in conversations, meetings, or written texts.\n5. **Stylistics**: Researchers study how authors' styles and preferences influence language use in different genres, such as fiction, non-fiction, or poetry.\n\nSome of the key tools and techniques used in corpus linguistics include:\n\n1. **Text analysis software**: Programs like Latent Dirichlet Allocation (LDA), topic modeling, and network analysis help researchers identify patterns and trends in corpora.\n2. **Machine learning algorithms**: Techniques like clustering, classification, and regression are applied to analyze large datasets and make predictions about language behavior.\n3. 
**Natural Language Processing (NLP)**: NLP methods enable researchers to extract information from text data, such as named entities, sentiment analysis, or part-of-speech tagging.\n\nCorpus linguistics has many benefits, including:\n\n1. **Objectivity**: Corpora provide a neutral, quantifiable approach to language analysis.\n2. **Scale**: Large corpora allow researchers to study vast amounts of data and identify patterns that might be missed through qualitative methods.\n3. **Generalizability**: Corpus linguistics findings can be applied across languages and contexts.\n\nHowever, corpus linguistics also has limitations and challenges, such as:\n\n1. **Data quality**: The accuracy and relevance of corpora depend on their collection, annotation, and maintenance.\n2. **Methodological issues**: Researchers must carefully consider the design, implementation, and interpretation of corpus-based studies to ensure validity and reliability.\n\nOverall, corpus linguistics offers a powerful tool for understanding language structure, use, and behavior, enabling researchers to uncover insights that can inform language teaching, research, and policy-making."
Code
# Or get a tidy tibble with metadata
ollamar::resp_process(resp, "df")

# A tibble: 1 × 3
model response created_at
<chr> <chr> <chr>
1 llama3.2 "Corpus linguistics is a subfield of linguistics that dea… 2026-03-1…
Output Formats
The output parameter saves you from calling resp_process() separately:
Code
# Text string — most convenient for single outputs
txt <- ollamar::generate("llama3.2",
"Define collocations in corpus linguistics.",
output = "text")
cat(txt)

In corpus linguistics, a collocation is a pair or group of words that occur together in a language, often in a specific context or register, and are more common than expected by chance alone. Collocations are also known as lexical bundles or word clusters.
Corpus linguists use statistical analysis to identify patterns of co-occurrence between words in large databases of text, such as corpora. By examining these patterns, researchers can identify collocations that are particularly common or uncommon, and gain insights into the way language is used in different contexts.
Collocations can be classified into several types, including:
1. Fixed expressions: Phrases that are grammatically fixed and cannot be changed without altering their meaning.
2. Semantic associations: Words that have a shared meaning or connotation.
3. Syntactic patterns: Word orders that are commonly used together in sentences.
4. Idiomatic expressions: Collocations that have a unique meaning that is different from the individual words.
Understanding collocations is important for corpus linguistics because it allows researchers to:
1. Identify linguistic patterns and trends in language use.
2. Develop models of language acquisition and language teaching.
3. Analyze the style and tone of writing or speech.
4. Inform language teaching and learning strategies.
5. Improve language processing and machine translation algorithms.
Some examples of collocations include:
* "the meaning of life" (a fixed expression)
* "to be on the same page" (a semantic association)
* "in a nutshell" (a syntactic pattern)
* "break a leg" (an idiomatic expression)
By examining collocations, corpus linguists can gain a deeper understanding of how language is used in different contexts and develop new insights into the nature of language itself.
Code
# Tibble — useful for storing results alongside metadata
df <- ollamar::generate("llama3.2",
"Name three open-source corpora of English.",
output = "df")
glimpse(df)

Rows: 1
Columns: 3
$ model <chr> "llama3.2"
$ response <chr> "Here are three open-source corpora of English:\n\n1. **Cor…
$ created_at <chr> "2026-03-19T22:45:54.3719384Z"
Code
# Columns: model, response, done, total_duration, ...
# JSON list — useful for programmatic parsing
jl <- ollamar::generate("llama3.2",
"What is TF-IDF?",
output = "jsonlist")

| output value | Returns | Best for |
|---|---|---|
| "resp" (default) | httr2 response object | Checking status codes; low-level access |
| "text" | Character string | Simple single-call workflows |
| "df" | Tibble with metadata | Storing results with timing and model info |
| "jsonlist" | Named R list | Programmatic access to all response fields |
| "raw" | Raw bytes | Advanced / debugging |
Writing Effective Prompts
The quality of the output depends heavily on how the prompt is written. Three principles are especially important:
Be specific about the task. A vague prompt produces a vague response. “Tell me about this text” is far less effective than “In one sentence, identify the main topic of the following text.”
Specify the output format. If you need structured output, say so explicitly: “Respond with only a single word: positive, negative, or neutral.” Without this, the model may explain its reasoning at length, which complicates downstream processing.
Provide context. The model knows nothing about your research project. Brief framing — “You are a corpus linguistics researcher analysing parliamentary debate transcripts” — shapes the register and vocabulary of the response.
Code
# Vague prompt — produces a generic, hard-to-process response
ollamar::generate("llama3.2",
"What do you think about this sentence: 'The bill was passed.'",
output = "text")

[1] "A simple yet informative sentence. It conveys a neutral fact without taking a stance or expressing an opinion. The use of the present tense (\"was passed\") also adds to its objective and matter-of-fact tone.\n\nIn terms of language, the sentence is grammatically correct and follows standard English syntax. However, it's worth noting that the verb \"passed\" can have different connotations depending on the context in which it's used. For example, if the speaker was involved in the legislative process, they might use a more formal term like \"enacted.\" But in this case, the sentence seems to be simply reporting a fact without adding any emotional or evaluative language.\n\nOverall, I think the sentence is clear and effective for conveying a straightforward piece of information."
Code
# Specific prompt — produces a precise, processable response
ollamar::generate("llama3.2",
paste(
"You are a computational linguist.",
"Identify the grammatical voice (active or passive) of the following sentence.",
"Respond with exactly one word: active or passive.",
"Sentence: 'The bill was passed.'"
),
output = "text"
)

[1] "Passive."
Code
# Expected: "passive"

Q5. You call generate("llama3.2", "Summarise this corpus.", output = "text") and receive a lengthy explanation of what corpus summarisation is rather than a summary of your corpus. What is the most likely cause and how should you fix it?
Q6. You want to generate responses for 100 different prompts using generate() in a loop. A colleague suggests using output = "df" rather than output = "text". What advantage does the "df" format offer for a batch workflow?
Multi-Turn Chat
What you will learn: The difference between generate() and chat(); how conversation history is structured as a list of messages; how to use create_message(), append_message(), and related helpers; how system prompts shape model behaviour; and how to build a reusable chat loop
generate() vs chat()
generate() is stateless — each call is independent and the model has no memory of previous calls. chat() maintains conversation context by accepting a message history: a list of all previous turns (both user messages and assistant responses). This allows follow-up questions, iterative refinement, and role-playing scenarios where the model maintains a consistent persona across turns.
The message history is a list of named lists, each with two elements: role (one of "system", "user", or "assistant") and content (the message text):
list(
list(role = "system", content = "You are a helpful linguist."),
list(role = "user", content = "What is a hapax legomenon?"),
list(role = "assistant", content = "A hapax legomenon is a word that occurs only once..."),
list(role = "user", content = "Can you give me an example from Shakespeare?")
)

Creating and Managing Message History
ollamar provides helper functions so you never need to construct these lists manually:
Code
# create_message() — start a history with one message
# The second argument is the role; default is "user"
messages <- ollamar::create_message(
"You are a corpus linguistics expert. Give concise, precise answers.",
"system"
)
# append_message() — add a user turn
messages <- ollamar::append_message(
"What is the difference between type frequency and token frequency?",
"user",
messages
)
# Send to the model and get a response
resp <- ollamar::chat("llama3.2", messages, output = "df")
# The model's reply
cat(resp$content)

Token frequency refers to the number of times each word appears in a given text, regardless of its part of speech or grammatical context.
Type frequency, on the other hand, refers to the number of unique words (types) in a given text. It measures the variety of vocabulary used in the text, whereas token frequency provides information about the distribution of specific words within that text.
To continue the conversation, append the assistant’s reply and a new user message:
Code
# Append the model's reply to maintain context
messages <- ollamar::append_message(resp$content, "assistant", messages)
# Add the next user question
messages <- ollamar::append_message(
"How would I calculate type-token ratio in R?",
"user",
messages
)
# Continue the conversation
resp2 <- ollamar::chat("llama3.2", messages, output = "df")
cat(resp2$content)You can calculate the type-token ratio (TR) in R using the following steps:
1. Tokenize your text data into individual words or tokens.
2. Count the number of unique tokens (type).
3. Count the total number of tokens.
The formula for TR is: `TR = Type / Total Tokens`
In R, you can use the following code to calculate TR:
```r
library(stringr)
text_data <- your_text_data
tokenized_text <- str_split(text_data, "\\s+")[[1]]
unique_tokens <- length(unique(tokenized_text))
total_tokens <- length(tokenized_text)
tr_ratio <- unique_tokens / total_tokens
print(tr_ratio)
```
Note: This assumes that you have already tokenized your text data. If not, you can use the `str_split` function to split your text into individual words or tokens.
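The model's sketch above works for a single string, though its naming is loose (the measure is conventionally abbreviated TTR, not TR). A minimal, self-contained version you can use to check the model's suggestion, using only base R:

```r
# Type-token ratio: number of unique word forms (types) divided by
# the total number of word forms (tokens)
ttr <- function(text) {
  tokens <- unlist(strsplit(tolower(text), "\\s+"))
  tokens <- tokens[nzchar(tokens)]                 # drop empty strings
  length(unique(tokens)) / length(tokens)
}

ttr("the cat sat on the mat")  # 5 types / 6 tokens = 0.833...
```

Lower-casing before counting is a deliberate choice here, so that "The" and "the" count as one type; whether that is appropriate depends on your research question.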
System Prompts
A system prompt is a message with role = "system" placed at the start of the history. It sets the model’s persona, constraints, and output format for the entire conversation. System prompts are one of the most powerful tools for producing consistent, well-formatted output:
Code
# System prompt for a structured linguistic analysis assistant
sys_prompt <- paste(
"You are a linguistic analysis assistant specialising in corpus linguistics.",
"You always respond in plain text, without markdown formatting.",
"Your answers are concise and technically precise.",
"When asked to classify text, you respond with only the label — no explanation."
)
messages <- ollamar::create_message(sys_prompt, "system")
messages <- ollamar::append_message(
"Classify the register of this text: 'Pursuant to the provisions of section 42...'",
"user",
messages
)
ollamar::chat("llama3.2", messages, output = "text")

[1] "Formal/Legal"
Code
# Expected: "legal/formal"

A Reusable Chat Loop
For interactive use, you can build a simple loop that manages the conversation history automatically:
Code
# Initialise with a system prompt
messages <- ollamar::create_message(
"You are a helpful R programming assistant for linguists.",
"system"
)
# Simple interactive chat loop — run in RStudio console, not knitted document
chat_with_model <- function(model = "llama3.2") {
msgs <- ollamar::create_message(
"You are a helpful R programming and corpus linguistics assistant.",
"system"
)
cat("Chat started. Type 'quit' to exit.\n\n")
repeat {
user_input <- readline("You: ")
if (trimws(user_input) == "quit") { cat("Goodbye!\n"); break }
msgs <- ollamar::append_message(user_input, "user", msgs)
resp <- ollamar::chat(model, msgs, output = "df")
reply <- resp$content
msgs <- ollamar::append_message(reply, "assistant", msgs)
cat("Model:", reply, "\n\n")
}
}
chat_with_model()

ollamar provides a full set of message manipulation helpers:
- prepend_message() — add a message to the beginning of the history
- insert_message() — insert at a specific position (positive or negative index)
- delete_message() — remove a message at a specific position
- create_messages() — create a history with multiple messages at once
These are particularly useful when building complex multi-turn workflows where the conversation history needs to be edited programmatically.
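One common programmatic edit is pruning: a long conversation can grow past the model's context window, at which point early messages lose influence. The helper below is a hypothetical sketch (not an ollamar function) that operates directly on the list-of-lists history format shown earlier, keeping all system messages plus only the most recent turns.

```r
# Hypothetical helper: retain system messages plus the last n_turns
# user/assistant messages from a history (a list of lists with $role
# and $content, as used by ollamar)
trim_history <- function(messages, n_turns = 6) {
  is_system <- vapply(messages, function(m) m$role == "system", logical(1))
  c(messages[is_system], tail(messages[!is_system], n_turns))
}
```

Calling trim_history(messages) before each chat() call keeps the history bounded while preserving the system prompt's position at the front.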
Q7. You are building a chat workflow for annotating linguistic data. After 10 turns, you notice the model has started ignoring your system prompt instructions. What is the most likely cause?
Q8. What is the key practical difference between using generate() and chat() for a sequence of related prompts about the same document?
Sentiment Analysis and Text Classification
What you will learn: How to use prompt engineering to turn a general-purpose LLM into a text classifier; how to enforce structured output; how to process a vector of texts and collect results; and how to evaluate classification output
Prompt-Based Classification
Rather than fine-tuning a model, LLMs can be directed to perform classification tasks through careful prompt design. The key principle is to ask for a constrained response — a single label from a defined set — rather than an open-ended answer.
Code
classify_sentiment <- function(text, model = "llama3.2") {
prompt <- paste(
"Classify the sentiment of the following text.",
"Respond with exactly one word: positive, negative, or neutral.",
"Do not explain your answer.",
paste0("Text: '", text, "'")
)
ollamar::generate(model, prompt, output = "text") |>
trimws() |>
tolower()
}
# Test on a single sentence
classify_sentiment("The results were surprisingly strong and exceeded all expectations.")

[1] "positive."
Code
# Expected: "positive"
classify_sentiment("The methodology is flawed and the conclusions are unwarranted.")

[1] "negative."
Code
# Expected: "negative"

Batch Classification
Apply the classifier to a vector of texts using purrr::map_chr():
Code
reviews <- tibble::tibble(
id = 1:6,
text = c(
"An outstanding contribution to the field — clear, rigorous, and insightful.",
"The paper is poorly structured and the argument is difficult to follow.",
"The study replicates previous findings without offering new theoretical insights.",
"A welcome addition to the literature on discourse coherence.",
"The sample size is too small to support the generalisations made.",
"The cross-linguistic comparison is both ambitious and well executed."
)
)
reviews <- reviews |>
dplyr::mutate(
sentiment = purrr::map_chr(text, classify_sentiment)
)
reviews |>
dplyr::select(id, sentiment, text) |>
flextable::flextable() |>
flextable::set_table_properties(width = .95, layout = "autofit") |>
flextable::theme_zebra() |>
flextable::fontsize(size = 10) |>
flextable::set_caption(caption = "Sentiment classification of academic review sentences using llama3.2.") |>
flextable::border_outer()

| id | sentiment | text |
|---|---|---|
| 1 | positive. | An outstanding contribution to the field — clear, rigorous, and insightful. |
| 2 | negative | The paper is poorly structured and the argument is difficult to follow. |
| 3 | negative | The study replicates previous findings without offering new theoretical insights. |
| 4 | positive | A welcome addition to the literature on discourse coherence. |
| 5 | negative | The sample size is too small to support the generalisations made. |
| 6 | positive | The cross-linguistic comparison is both ambitious and well executed. |
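Note the stray full stop in row 1 ("positive."): small models do not always obey formatting instructions perfectly. A post-processing step that maps raw output onto the intended label set makes downstream tallies reliable. This is a sketch; the fallback value "unparsed" is an assumed convention for flagging outputs that need manual review.

```r
# Map raw model output onto a fixed label set; outputs containing no
# known label are flagged "unparsed" for manual review
normalise_label <- function(x, labels = c("positive", "negative", "neutral")) {
  hit <- stringr::str_extract(tolower(trimws(x)), paste(labels, collapse = "|"))
  ifelse(is.na(hit), "unparsed", hit)
}

normalise_label(c("positive.", "The sentiment is negative", "Neutral"))
# [1] "positive" "negative" "neutral"
```

Applied with mutate(sentiment = normalise_label(sentiment)) after classification, this also catches the verbose responses discussed below ("The sentiment is positive"), though genuinely ambiguous outputs still end up as "unparsed".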
Multi-Class Topic Classification
The same pattern extends to any classification scheme. Here we classify academic sentences by rhetorical function:
Code
classify_rhetorical <- function(text, model = "llama3.2") {
prompt <- paste(
"Classify the rhetorical function of the following academic sentence.",
"Choose exactly one label from: background, method, result, conclusion.",
"Respond with that single word only.",
paste0("Sentence: '", text, "'")
)
ollamar::generate(model, prompt, output = "text") |>
trimws() |>
tolower()
}
sentences <- c(
"Previous research has established a strong link between frequency and acceptability.",
"We collected data from 120 native speakers using an online survey platform.",
"The analysis revealed a significant effect of register on hedging frequency (p < .001).",
"These findings suggest that usage-based accounts require revision."
)
purrr::map_chr(sentences, classify_rhetorical)

[1] "conclusion" "background" "conclusion." "conclusion"
Code
# Expected: c("background", "method", "result", "conclusion")

Q9. You run batch sentiment classification on 200 texts and find that about 15% of results contain extra words like “The sentiment is positive” instead of just “positive”. What prompt change would most reliably fix this?
Q10. You want to classify 1,000 newspaper headlines by topic (politics, economics, sport, culture, science) using a local LLM. A colleague suggests evaluating the classifier on 50 manually labelled headlines before using it on the full corpus. Why is this a good practice?
Named Entity Recognition
What you will learn: How to prompt a local LLM to identify and classify named entities; how to request structured JSON output for easier parsing; how to parse and process entity output in R; and the trade-offs between LLM-based NER and dedicated NER models
Prompting for NER
Named entity recognition asks the model to identify spans of text that refer to real-world objects and classify them by type (person, organisation, location, etc.). Requesting JSON output makes the response much easier to parse programmatically:
Code
extract_entities <- function(text, model = "llama3.2") {
prompt <- paste(
"Extract all named entities from the following text.",
"Return a JSON array where each element has two fields:",
"'entity' (the text span) and 'type' (one of: PERSON, ORG, LOC, DATE, MISC).",
"Return only the JSON array — no explanation, no markdown, no code block.",
paste0("Text: '", text, "'")
)
raw <- ollamar::generate(model, prompt, output = "text") |> trimws()
# Try to isolate just the JSON array in case the model added surrounding text
json_str <- stringr::str_extract(raw, "\\[.*\\]")
if (is.na(json_str)) json_str <- raw
result <- tryCatch(
jsonlite::fromJSON(json_str, simplifyDataFrame = TRUE),
error = function(e) {
warning("JSON parsing failed. Raw output was: ", raw)
NULL
}
)
# Validate that we got a proper data frame with the right columns
if (is.null(result) ||
!is.data.frame(result) ||
!all(c("entity", "type") %in% names(result))) {
return(tibble::tibble(entity = NA_character_, type = NA_character_))
}
tibble::as_tibble(result)
}
# Test on a news sentence
result <- extract_entities(
"Christine Lagarde met Rishi Sunak in London last Tuesday to discuss IMF reform."
)
result
# A tibble: 3 × 2
entity type
<chr> <chr>
1 Christine Lagarde PERSON
2 Rishi Sunak PERSON
3 London LOC
Code
# Expected (complete) output — note that the run above caught only 3 of the 5 entities:
# entity type
# 1 Christine Lagarde PERSON
# 2 Rishi Sunak PERSON
# 3 London LOC
# 4 last Tuesday DATE
# 5 IMF ORG
Corpus-Scale NER
Apply the extractor across a corpus and bind the results into a single data frame:
Code
news_corpus <- tibble::tibble(
doc_id = paste0("doc", 1:4),
text = c(
"The European Central Bank announced rate rises in Frankfurt, affecting markets across Germany and France.",
"Ursula von der Leyen met Joe Biden at the G7 summit in Hiroshima to discuss trade policy.",
"Oxford University published a landmark study on language acquisition in the journal Nature.",
"Amazon opened a new fulfilment centre near Manchester, creating 1,500 jobs in the region."
)
)
ner_results <- purrr::pmap_dfr(news_corpus, function(doc_id, text) {
ents <- extract_entities(text)
if (nrow(ents) > 0 && !is.na(ents$entity[1])) {
dplyr::mutate(ents, doc_id = doc_id)
} else {
tibble::tibble(entity = NA, type = NA, doc_id = doc_id)
}
})
ner_results
# A tibble: 7 × 3
entity type doc_id
<chr> <chr> <chr>
1 European Central Bank ORG doc1
2 Frankfurt LOC doc1
3 Germany LOC doc1
4 France LOC doc1
5 <NA> <NA> doc2
6 <NA> <NA> doc3
7 <NA> <NA> doc4
LLM-Based NER vs Dedicated Models
LLM-based NER via prompting has different trade-offs compared to dedicated NER models (such as those available through udpipe or the BERT-based models in the BERT/RoBERTa tutorial):
| Property | LLM (Ollama) | Dedicated NER model |
|---|---|---|
| Setup complexity | Low — no fine-tuning | Low — pre-trained weights |
| Speed | Slow (seconds per text) | Fast (milliseconds per text) |
| Customisability | High — change entity types in prompt | Low — fixed to training categories |
| Output consistency | Variable — JSON parsing can fail | Consistent structured output |
| Domain adaptation | Easy — describe domain in prompt | Requires fine-tuning |
| Best for | Flexible exploration, novel entity types | Production pipelines, large corpora |
For corpora of thousands of documents, dedicated models are far more practical. LLM-based NER is most useful when you need non-standard entity types, when the domain is unusual, or when you are exploring a new task before committing to a heavier infrastructure.
Q11. You run the extract_entities() function on 50 texts and find that about 10% return a JSON parsing error. What are two good defensive coding strategies to handle this?
Text Summarisation
What you will learn: How to prompt for extractive and abstractive summaries; how to control summary length and format; how to apply summarisation at corpus scale; and how to compare model output across different prompt formulations
Single-Document Summarisation
Summarisation is one of the tasks where local LLMs perform most reliably. The key prompt design decisions are specifying the desired length, the target audience, and whether the summary should be extractive (drawn verbatim from the source) or abstractive (paraphrased in the model’s own words):
Code
summarise_text <- function(text,
n_sentences = 3,
model = "llama3.2") {
prompt <- paste(
paste0("Summarise the following text in exactly ", n_sentences, " sentences."),
"Write in plain academic prose. Do not use bullet points.",
"Do not include any preamble — begin immediately with the summary.",
paste0("\n\nText:\n", text)
)
ollamar::generate(model, prompt, output = "text") |> trimws()
}
# Darwin abstract (illustrative)
darwin_passage <- paste(
"The struggle for existence amongst all organic beings throughout the world,",
"which inevitably follows from their high geometrical powers of increase,",
"will be treated of. This is the doctrine of Malthus, applied to the whole",
"animal and vegetable kingdoms. As many more individuals of each species are",
"born than can possibly survive, and as, consequently, there is a frequently",
"recurring struggle for existence, it follows that any being, if it vary",
"however slightly in any manner profitable to itself, will have a better",
"chance of surviving and thus be naturally selected."
)
cat(summarise_text(darwin_passage, n_sentences = 2))
The struggle for existence among all organic beings throughout the world is inevitable due to their high reproductive capabilities, resulting from their geometrical powers of increase. As a consequence, individuals that vary slightly in advantageous ways are more likely to survive and be naturally selected, thereby securing a better chance of survival.
Varying Summary Length
Code
# Compare summaries at different lengths
lengths <- c(1, 2, 3)
summaries <- purrr::map_chr(
lengths,
~ summarise_text(darwin_passage, n_sentences = .x)
)
# Display side by side
tibble::tibble(
n_sentences = lengths,
summary = summaries
) |>
flextable::flextable() |>
flextable::set_table_properties(width = .95, layout = "autofit") |>
flextable::theme_zebra() |>
flextable::fontsize(size = 10) |>
flextable::set_caption(caption = "Same passage summarised at 1, 2, and 3 sentences.") |>
flextable::border_outer()n_sentences | summary |
|---|---|
1 | The struggle for survival among all organic beings worldwide, driven by their high reproductive abilities, will inevitably lead to natural selection as individuals with advantageous variations are more likely to survive and reproduce. |
2 | The struggle for existence among all organic beings worldwide is a fundamental principle, stemming from the high reproductive capabilities of living organisms. As a result, individuals with advantageous variations are more likely to survive and reproduce, leading to natural selection as populations adapt over time. |
3 | The struggle for existence among all organic beings throughout the world, driven by their high geometrical powers of increase, is a fundamental concept in understanding the natural world. According to Malthus' doctrine, the birth rate far exceeds the survival rate across various species, resulting in a recurring struggle for existence that favors individuals with advantageous traits. As a result, any being that undergoes slight variations beneficial to itself will have a higher chance of survival and be naturally selected. |
Corpus-Scale Summarisation
For a corpus of documents, apply the function across rows and store results:
Code
# Illustrative corpus of five abstracts
abstracts <- tibble::tibble(
paper_id = paste0("P", 1:5),
abstract = c(
"This study investigates the frequency distribution of hedging devices in spoken and written academic English using a corpus of 500,000 words drawn from lectures and journal articles...",
"We present a computational model of lexical alignment in dialogue, trained on the British National Corpus...",
"The paper examines the development of grammaticalisation in Old English modal verbs using diachronic corpus data spanning the 7th to 12th centuries...",
"Using eye-tracking methodology, we investigate how readers process garden-path sentences in English and German...",
"This paper reports on a large-scale survey of attitudes towards language change among speakers of Irish English in three urban centres..."
)
)
abstracts_summarised <- abstracts |>
dplyr::mutate(
summary = purrr::map_chr(
abstract,
~ summarise_text(.x, n_sentences = 1)
)
)
abstracts_summarised |>
dplyr::select(paper_id, summary) |>
flextable::flextable() |>
flextable::set_table_properties(width = .95, layout = "autofit") |>
flextable::theme_zebra() |>
flextable::fontsize(size = 10) |>
flextable::set_caption(caption = "One-sentence summaries generated by llama3.2.") |>
flextable::border_outer()
paper_id | summary |
|---|---|
P1 | The study examines the frequency distribution of hedging devices in both spoken and written academic English based on a corpus of 500,000 words derived from lectures and journal articles. |
P2 | A computational model of lexical alignment in dialogue has been developed and trained on the British National Corpus, aiming to improve understanding of language patterns in conversational contexts. |
P3 | This study investigates the evolution of grammaticalisation in Old English modal verbs through an analysis of diachronic corpus data from the 7th to 12th centuries. |
P4 | Researchers used eye-tracking methodology to examine how readers from both English and German-speaking populations process garden-path sentences, a type of sentence that can lead to cognitive dissonance due to its grammatical structure. |
P5 | A large-scale survey of attitudes towards language change was conducted among speakers of Irish English in three urban centres to examine their perspectives on linguistic evolution. |
Q12. You summarise 200 abstracts and find that about 20% of summaries begin with “Here is a one-sentence summary:” despite your prompt saying “Do not include any preamble.” What is the most robust fix?
Generating Embeddings
What you will learn: What embed() does and when to use it; how to extract numeric embedding vectors; how to compute cosine similarity between embeddings; and how to apply embeddings to semantic grouping and nearest-neighbour search
What Are Embeddings?
The embed() function sends text to the model and returns a numeric vector (the embedding) rather than generated text. An embedding is a fixed-length representation of meaning in a high-dimensional space: texts with similar meaning produce vectors that are close together (high cosine similarity); texts with unrelated meaning produce vectors that are far apart.
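The geometry behind this can be shown with a toy example. The three-dimensional vectors below are invented purely for illustration (real embeddings have hundreds of dimensions); the point is that cosine similarity measures the angle between vectors, not their magnitude:

```r
# Cosine similarity = dot product divided by the product of the vector norms.
# These 3-dimensional vectors are invented for illustration only.
cosine_sim <- function(x, y) sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))

a <- c(1, 2, 0)
b <- c(2, 4, 0)  # same direction as a, different magnitude
d <- c(0, 0, 5)  # orthogonal to a

cosine_sim(a, b)  # 1: identical direction, maximal similarity
cosine_sim(a, d)  # 0: orthogonal, no similarity
```

Because `b` is just `a` scaled by 2, their similarity is exactly 1; length differences do not matter, only direction does.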
Ollama can produce embeddings using models specifically optimised for the task. nomic-embed-text is a widely used embedding model that is fast, small (~270 MB), and produces high-quality 768-dimensional embeddings:
Code
# Pull the embedding model (one-time download, ~270 MB)
ollamar::pull("nomic-embed-text")
# Generate an embedding for a single sentence
emb <- ollamar::embed("nomic-embed-text", "Corpus linguistics studies language in use.")
length(emb[, 1]) # 768 dimensions
Not all Ollama models support embeddings. Use nomic-embed-text or mxbai-embed-large for embedding tasks — do not use generation models like llama3.2 for embeddings, as their output will be of lower quality. Conversely, embedding models cannot generate text. Pull the right model for the right task.
Cosine Similarity
Code
# Cosine similarity function
cosine_sim <- function(a, b) {
sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}
sentences <- c(
"Frequency effects are central to usage-based theories of grammar.",
"Usage-based linguistics emphasises the role of input frequency in acquisition.",
"The morphosyntax of Swahili noun class agreement has been extensively studied.",
"Corpus data reveal robust collocational preferences in academic writing.",
"Token frequency shapes the entrenchment of linguistic constructions."
)
embeddings <- purrr::map(
sentences,
~ ollamar::embed("nomic-embed-text", .x)[, 1]
)
# Compute pairwise cosine similarity matrix
n <- length(embeddings)
sim <- matrix(0, n, n, dimnames = list(paste0("S", 1:n), paste0("S", 1:n)))
for (i in seq_len(n)) for (j in seq_len(n)) {
sim[i, j] <- cosine_sim(embeddings[[i]], embeddings[[j]])
}
round(sim, 3)
      S1    S2    S3    S4    S5
S1 1.000 0.691 0.600 0.597 0.662
S2 0.691 1.000 0.665 0.658 0.801
S3 0.600 0.665 1.000 0.643 0.639
S4 0.597 0.658 0.643 1.000 0.644
S5 0.662 0.801 0.639 0.644 1.000
As expected, S1, S2, and S5 (all about frequency/usage-based linguistics) show the highest mutual similarities (roughly 0.66–0.80), while S3 (Swahili morphosyntax) and S4 (collocations) are less similar both to the frequency cluster and to each other.
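The similarity matrix can also be turned into an explicit semantic grouping. The sketch below hard-codes the similarity values printed above (for illustration, so it runs without an Ollama server), converts similarities to distances, and applies base R hierarchical clustering:

```r
# Rebuild the similarity matrix from the printed output above (illustrative)
sim <- matrix(c(
  1.000, 0.691, 0.600, 0.597, 0.662,
  0.691, 1.000, 0.665, 0.658, 0.801,
  0.600, 0.665, 1.000, 0.643, 0.639,
  0.597, 0.658, 0.643, 1.000, 0.644,
  0.662, 0.801, 0.639, 0.644, 1.000
), nrow = 5, byrow = TRUE,
   dimnames = list(paste0("S", 1:5), paste0("S", 1:5)))

# Convert similarity to distance (1 - similarity) and cluster
hc <- hclust(as.dist(1 - sim), method = "average")
cutree(hc, k = 2)  # S1, S2, S5 fall in one group; S3 and S4 in the other
```

With these values, cutting the tree at two clusters separates the frequency/usage-based sentences (S1, S2, S5) from the remaining two, matching the interpretation above.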
Nearest-Neighbour Search
Embeddings support semantic search: given a query, find the most similar texts in a corpus:
Code
# Simple nearest-neighbour search
semantic_search <- function(query, corpus_texts, corpus_embeddings, top_n = 3) {
query_emb <- ollamar::embed("nomic-embed-text", query)[, 1]
sims <- purrr::map_dbl(
corpus_embeddings,
~ cosine_sim(query_emb, .x)
)
tibble::tibble(
text = corpus_texts,
similarity = sims
) |>
dplyr::arrange(dplyr::desc(similarity)) |>
head(top_n)
}
# Find the sentences most similar to a query
semantic_search(
query = "How does experience shape language knowledge?",
corpus_texts = sentences,
corpus_embeddings = embeddings,
top_n = 3
)
# A tibble: 3 × 2
text similarity
<chr> <dbl>
1 Usage-based linguistics emphasises the role of input frequency in … 0.706
2 Token frequency shapes the entrenchment of linguistic construction… 0.638
3 Frequency effects are central to usage-based theories of grammar. 0.591
Q13. You use embed() with llama3.2 (a generation model) instead of nomic-embed-text and find the similarity scores are much lower and less meaningful. Why?
Corpus-Scale Batch Processing
What you will learn: Why LLM inference is slow and what determines speed; how to process a corpus sequentially with progress tracking; how to use httr2’s parallel request functionality for speedup; and practical strategies for managing large-scale processing jobs
Why Batch Processing Requires Care
Unlike fast string-processing functions, each generate() or chat() call involves running a neural network — a process that takes seconds per text even on modern hardware. A corpus of 1,000 texts at 3 seconds each takes roughly 50 minutes sequentially. Three strategies reduce this:
Parallelisation — ollamar integrates with httr2’s req_perform_parallel() to issue multiple requests simultaneously, reducing total time — though only up to the limits of your local hardware, not proportionally without bound.
Batching — grouping short texts together in a single prompt reduces the number of API calls.
Model selection — smaller, faster models (3B) are appropriate for simple tasks; reserve larger models for tasks where quality is critical.
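The batching strategy can be sketched as follows. The helper below is hypothetical (not part of ollamar): it packs several short texts into one numbered prompt so that a single generate() call classifies them all. The parsing step is fragile and may need adjusting to your model's actual output format:

```r
# Hypothetical batching helper: classify several short texts in one call.
# Numbering the sentences makes the single response easy to split apart again.
build_batch_prompt <- function(texts) {
  paste(
    "Classify the sentiment of each numbered sentence below as",
    "positive, negative, or neutral.",
    "Respond with one line per sentence in the format 'N: label', nothing else.",
    paste(sprintf("%d. %s", seq_along(texts), texts), collapse = "\n"),
    sep = "\n"
  )
}

# Usage sketch (requires a running Ollama server):
# raw    <- ollamar::generate("llama3.2", build_batch_prompt(texts), output = "text")
# labels <- stringr::str_match_all(raw, "\\d+:\\s*(\\w+)")[[1]][, 2]
```

Batching trades fewer model calls for a harder parsing problem: the longer the batch, the more likely the model drifts from the requested format, so keep batches small (5–10 short texts) and validate the parsed labels.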
Sequential Processing with Progress
For modest corpora (up to a few hundred texts), sequential processing with a progress indicator is the simplest approach:
Code
classify_batch_sequential <- function(texts, model = "llama3.2") {
n <- length(texts)
results <- character(n)
for (i in seq_along(texts)) {
cat(sprintf("Processing %d of %d...\r", i, n))
results[i] <- classify_sentiment(texts[i], model)
Sys.sleep(0.1) # small pause to avoid overwhelming the local server
}
cat("\nDone.\n")
results
}
Parallel Processing with httr2
For larger corpora, ollamar supports parallelisation by building request objects first (output = "req") and then executing them all simultaneously with httr2::req_perform_parallel():
Code
library(httr2)
texts_to_classify <- c(
"The results confirm the central hypothesis and extend previous findings.",
"The methodology contains several unacknowledged limitations.",
"No significant difference was found between the two groups.",
"This work represents a major advance in our understanding of acquisition.",
"The conclusions are not supported by the data presented."
)
# Step 1: Build a system prompt (shared across all requests)
sys_msg <- ollamar::create_message(
paste(
"You classify academic sentences by sentiment.",
"Respond with exactly one word: positive, negative, or neutral."
),
"system"
)
# Step 2: Create a list of httr2_request objects — one per text
reqs <- lapply(texts_to_classify, function(txt) {
msgs <- ollamar::append_message(txt, "user", sys_msg)
ollamar::chat("llama3.2", msgs, output = "req")
})
# Step 3: Execute all requests in parallel
resps <- httr2::req_perform_parallel(reqs)
# Step 4: Extract results
results <- dplyr::bind_rows(
lapply(resps, ollamar::resp_process, "df")
)
tibble::tibble(
text = texts_to_classify,
sentiment = trimws(tolower(results$content))
)
# A tibble: 5 × 2
text sentiment
<chr> <chr>
1 The results confirm the central hypothesis and extend previous find… positive
2 The methodology contains several unacknowledged limitations. negative
3 No significant difference was found between the two groups. neutral
4 This work represents a major advance in our understanding of acquis… positive
5 The conclusions are not supported by the data presented. negative
Ollama processes requests on your local hardware. Issuing 50 simultaneous requests does not make your laptop 50× faster — it will saturate your CPU or GPU and may actually slow down individual responses or cause timeouts. In practice, 2–4 parallel requests is a sensible limit for a laptop CPU. Experiment to find the sweet spot for your hardware.
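One way to respect that limit is to cap the number of in-flight requests. Recent httr2 versions expose a max_active argument for this; the chunking alternative below works regardless of version. Both lines assume the reqs list built in the previous step:

```r
# Option 1: cap concurrency at 4 in-flight requests (recent httr2 versions)
# resps <- httr2::req_perform_parallel(reqs, max_active = 4)

# Option 2 (version-independent): process the request list in chunks of 4
run_in_chunks <- function(reqs, chunk_size = 4) {
  chunks <- split(reqs, ceiling(seq_along(reqs) / chunk_size))
  # Each chunk runs in parallel; chunks run one after another
  unlist(lapply(chunks, httr2::req_perform_parallel), recursive = FALSE)
}
```

Start with a low cap, time a small sample, and only increase concurrency if responses remain fast and error-free.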
Saving and Resuming Large Jobs
For very large corpora, saving results incrementally prevents losing progress if the session is interrupted:
Code
process_corpus_with_checkpointing <- function(texts, ids,
output_file = "results.rds",
model = "llama3.2") {
# Load existing results if a checkpoint exists
if (file.exists(output_file)) {
existing <- readRDS(output_file)
done_ids <- existing$id
cat("Resuming from checkpoint:", nrow(existing), "texts already processed.\n")
} else {
existing <- tibble::tibble(id = character(), text = character(), result = character())
done_ids <- character()
}
# Process only texts not yet done
todo_idx <- which(!ids %in% done_ids)
cat("Remaining:", length(todo_idx), "texts.\n")
for (i in todo_idx) {
result <- classify_sentiment(texts[i], model)
new_row <- tibble::tibble(id = ids[i], text = texts[i], result = result)
existing <- dplyr::bind_rows(existing, new_row)
# Save checkpoint every 10 texts
if (i %% 10 == 0) saveRDS(existing, output_file)
}
saveRDS(existing, output_file)
existing
}
Q14. You are processing 500 newspaper articles with a local LLM and want to use parallel requests. You try 20 parallel requests at once and find the total processing time is actually longer than sequential processing. What is the most likely explanation?
Using a Local LLM to Help Write R Code
What you will learn: How to prompt a local LLM to generate R code; how to ask for code explanations and debugging help; how to use the model as a programming assistant within an R workflow; and the limitations and best practices for LLM-assisted coding
Why a Local LLM for R Help?
Commercial AI coding assistants (GitHub Copilot, ChatGPT) are excellent but require internet access and send your code to external servers. A local LLM provides a privacy-preserving coding assistant that works offline and never transmits your proprietary scripts or data descriptions to third parties.
For R-specific tasks, a well-prompted model can:
- Generate boilerplate code — read a CSV, reshape a data frame, run a t-test
- Explain unfamiliar functions — “What does purrr::reduce() do?”
- Debug error messages — paste the error and the code that produced it
- Suggest improvements — “How can I make this loop more idiomatic in R?”
- Write dplyr or ggplot2 pipelines — describe what you need, get working code
Setting Up an R Coding Assistant
Code
# System prompt for an R coding assistant
r_assistant_sys <- paste(
"You are an expert R programmer specialising in data science, corpus linguistics,",
"and natural language processing. You write clean, idiomatic R code using the",
"tidyverse (dplyr, purrr, ggplot2, stringr) and base R where appropriate.",
"When asked to write code: provide only the R code, no prose explanation unless asked.",
"When asked to explain code: be concise and precise.",
"When debugging: identify the error cause first, then provide the fixed code."
)
# Helper function for one-off coding questions
ask_r <- function(question, model = "llama3.2") {
msgs <- ollamar::create_message(r_assistant_sys, "system")
msgs <- ollamar::append_message(question, "user", msgs)
ollamar::chat(model, msgs, output = "text") |> trimws()
}Code Generation Examples
Code
# Generate a ggplot2 visualisation
ask_r("Write R code to create a bar chart showing word frequency from a character vector called 'words', using ggplot2. Show the top 20 words, with bars sorted by frequency.")[1] "```r\nlibrary(ggplot2)\n\ntop_20_words <- words %>% \n arrange(desc(str_count)) %>% \n head(20)\n\nggplot(top_20_words, aes(x = reorder(words, -str_count), y = str_count)) + \n geom_bar(stat = \"identity\") + \n labs(title = \"Top 20 Words by Frequency\", x = \"Words\", y = \"Frequency\")\n```"
Code
# Explain an unfamiliar function
ask_r("Explain what purrr::accumulate() does and give a simple example relevant to text processing.")[1] "`purrr::accumulate()` applies a function to each element of an iterable (such as a vector) and returns a new sequence with the results.\n\nHere is a simple example using `purrr::accumulate()` for text processing:\n```r\nlibrary(purrr)\n\ntext <- c(\"hello\", \"world\", \"foo\", \"bar\")\n\nresult <- accumulate(text, str_length)\nprint(result) # prints: numeric(4) of length 4\n```\nIn this example, `str_length` is applied to each string in the `text` vector. The result is a new sequence with the lengths of each string."
Code
# Debug an error
error_context <- paste(
"I get this error:",
"Error in UseMethod('select'): no applicable method for 'select' applied to 'character'",
"From this code:",
"result <- my_text |> select(word)"
)
ask_r(error_context)
[1] "```r\nresult <- my_text %>%\n str_extract(\"\\\\w+\") %>%\n unnest()\n```\nOr \n```r\nlibrary(tidytext)\nresult <- my_text %>%\n unnest_tokens(word, text = word)\n```\nAssuming `my_text` is a tidy text data frame and `word` is the desired column."
Code
# Expected response: explains that select() is a dplyr function for data frames,
# not for character vectors; suggests str_extract() or other string functions instead.
# Get a complete data processing pipeline
ask_r(paste(
"Write a complete R pipeline that:",
"1. Reads a CSV file called 'corpus.csv' with columns 'doc_id' and 'text'",
"2. Tokenises the text column into words using tidytext::unnest_tokens()",
"3. Removes stop words using tidytext's stop_words data",
"4. Counts word frequency per document",
"5. Returns a tibble sorted by frequency descending"
))
[1] "```r\nlibrary(tidytext)\nlibrary(dplyr)\n\npipeline <- corpus %>% \n unnest_tokens(word, text) %>% \n antonyms_remove() %>% \n inner_join(stop_words, by = \"word\", remove = TRUE) %>% \n count(doc_id, word) %>% \n group_by(doc_id) %>% \n summarise(word_count = n()) %>% \n arrange(desc(word_count))\n```"
Stateful Coding Session
For a sustained coding session where you want the model to remember earlier code and context:
Code
# Start a persistent coding session
msgs <- ollamar::create_message(r_assistant_sys, "system")
# Turn 1: Ask for initial code
msgs <- ollamar::append_message(
"Write a function called read_corpus() that reads all .txt files from a folder and returns a tibble with columns doc_id and text.",
"user", msgs
)
resp1 <- ollamar::chat("llama3.2", msgs, output = "df")
msgs <- ollamar::append_message(resp1$content, "assistant", msgs)
cat(resp1$content)
```r
read_corpus <- function(folder_path) {
docs <- dir(folder_path, pattern = ".txt$", full.names = TRUE)
df <- data.frame(doc_id = 1:length(docs),
text = readLines(folder_path[docs]))
tidy_df <- as_tibble(df, rows = "doc_id", cols = "text")
tidy_df$doc_id <- tidy_df$doc_id - 1
return(tidy_df)
}
```
Code
# Turn 2: Ask for an improvement — model remembers the function
msgs <- ollamar::append_message(
"Now add error handling: if the folder does not exist, print a clear message and return NULL.",
"user", msgs
)
resp2 <- ollamar::chat("llama3.2", msgs, output = "df")
cat(resp2$content)
```r
read_corpus <- function(folder_path) {
if (!file.exists(folder_path)) {
stop(paste0("Folder '", folder_path, "' does not exist"))
return(NULL)
}
tryCatch(
expr = {
docs <- dir(folder_path, pattern = ".txt$", full.names = TRUE)
df <- data.frame(doc_id = 1:length(docs),
text = readLines(folder_path[docs]))
tidy_df <- as_tibble(df, rows = "doc_id", cols = "text")
tidy_df$doc_id <- tidy_df$doc_id - 1
return(tidy_df)
},
error = function(e) {
stop(paste0("Error reading folder '", folder_path, "': ", e))
return(NULL)
}
)
}
```
A local LLM will sometimes produce R code that looks plausible but contains errors — deprecated function names, incorrect argument names, or logic that does not match the stated intent. Always run generated code in a test environment and verify the output before using it in a real analysis. The model is a first-draft assistant, not an infallible oracle.
Q15. You ask the model to write a function that reads a corpus and the code it produces uses read.csv() with stringsAsFactors = TRUE (the old R 3.x default). What does this tell you about LLM-generated code, and what should you do?
Q16. You are using the local LLM to help debug a complex purrr::map() pipeline that processes proprietary survey data. A colleague suggests you use ChatGPT instead because it is a better model. What is the strongest argument for continuing to use the local model?
Model Management
What you will learn: How to pull, list, copy, and delete models; how to inspect model metadata; how to choose the right model for a task; and an overview of the model ecosystem available through Ollama
Core Model Management Functions
Code
# List downloaded models
ollamar::list_models()
# Pull a new model (downloads from Ollama library)
ollamar::pull("llama3.2") # 3B generation model, ~2 GB
ollamar::pull("nomic-embed-text") # embedding model, ~270 MB
ollamar::pull("llama3.1") # 8B model, ~4.7 GB (optional)
# Show detailed information about a model
ollamar::show("llama3.2")
# Returns: model parameters, context length, quantisation, architecture
# Copy a model under a new name (useful for creating custom variants)
ollamar::copy("llama3.2", "llama3.2-linguistics")
# List models currently loaded in memory
ollamar::ps()
# Delete a model (frees disk space)
ollamar::delete("llama3.2-linguistics")
Recommended Models by Task
Task | Recommended_model | RAM_required | Notes |
|---|---|---|---|
Quick generation / prototyping | llama3.2 (3B) | 4–8 GB | Fast; good for teaching and exploration |
Production classification / NER | llama3.2 (3B) or llama3.1 (8B) | 4–16 GB | Test on labelled sample before full deployment |
High-quality summarisation | llama3.1 (8B) | 8–16 GB | Significantly better output than 3B for long texts |
Sentence embeddings | nomic-embed-text | 4 GB | Do not use generation models for embeddings |
Multilingual tasks | llama3.1 or aya:8b | 8 GB | Cohere's model; strong cross-lingual performance |
Code generation | codellama or llama3.2 | 4–8 GB | codellama is specialised for code and usually outperforms general models here |
Summary and Further Reading
This tutorial has introduced Ollama and the ollamar R package as a tool for running large language models locally, covering eight practical NLP workflows for corpus linguistics and language research.
Section 1 established the case for local LLMs: privacy for sensitive data, cost predictability for large corpora, and reproducibility from fixed model weights. It introduced the Ollama architecture (a local REST server at 127.0.0.1:11434) and the ollamar package as its R interface.
Section 2 covered setup: installing Ollama, pulling models, installing ollamar, and verifying the connection with test_connection() and list_models().
Section 3 introduced generate() for single-prompt text generation, the five output formats ("resp", "text", "df", "jsonlist", "raw"), and the principles of effective prompt engineering: specificity, constrained output format, and contextual framing.
Section 4 covered multi-turn conversation with chat() and the message history system. It introduced create_message(), append_message(), system prompts, and the full set of message management helpers (prepend_message(), insert_message(), delete_message()).
Section 5 demonstrated prompt-based text classification for sentiment analysis and rhetorical function labelling, with a wrapper function, batch processing via purrr::map_chr(), and the importance of evaluating classifiers on a gold standard before deploying on the full corpus.
Section 6 showed named entity recognition by prompting for JSON output, parsing with jsonlite::fromJSON(), and processing a corpus. It compared LLM-based NER with dedicated models for different use cases.
Section 7 covered text summarisation with length control and corpus-scale application. It discussed post-processing and few-shot prompting as remedies for common formatting failures.
Section 8 introduced embed() for generating sentence embeddings with nomic-embed-text, cosine similarity computation, and nearest-neighbour semantic search.
Section 9 addressed corpus-scale batch processing: sequential processing with progress tracking, parallel processing with httr2::req_perform_parallel(), hardware saturation limits, and checkpoint-based resumption of large jobs.
Section 10 demonstrated using a local LLM as a privacy-preserving R coding assistant for code generation, explanation, debugging, and sustained coding sessions with conversation history.
Section 11 surveyed model management functions and provided a task-by-model recommendation table.
Further reading: The ollamar package is documented at hauselin.github.io/ollama-r. The Ollama model library is at ollama.com/library. Lin (2024) is the primary package citation. For prompt engineering principles see White et al. (2023). For the broader landscape of open-source LLMs see Touvron, Martin, et al. (2023) and Touvron, Lavril, et al. (2023).
Citation & Session Info
Schweinberger, Martin. 2026. Local Large Language Models in R with Ollama. Brisbane: The Language Technology and Data Analysis Laboratory (LADAL). url: https://ladal.edu.au/tutorials/ollama/ollama.html (Version 2026.05.01).
@manual{schweinberger2026ollama,
author = {Schweinberger, Martin},
title = {Local Large Language Models in R with Ollama},
note = {tutorials/ollama/ollama.html},
year = {2026},
organization = {The University of Queensland, Australia. School of Languages and Cultures},
address = {Brisbane},
edition = {2026.05.01}
}
This tutorial was written with the assistance of Claude (claude.ai), a large language model created by Anthropic. Claude was used to draft and structure the entire tutorial, including all R code, conceptual explanations, and exercises. All content was reviewed and approved by Martin Schweinberger, who takes full responsibility for its accuracy.
Code
sessionInfo()
R version 4.4.2 (2024-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8
[2] LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
time zone: Australia/Brisbane
tzcode source: internal
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] httr2_1.2.2 flextable_0.9.11 ggplot2_4.0.2 stringr_1.6.0
[5] tibble_3.3.1 purrr_1.2.1 dplyr_1.2.0 ollamar_1.2.2
[9] checkdown_0.0.13
loaded via a namespace (and not attached):
[1] utf8_1.2.4 rappdirs_0.3.3 generics_0.1.3
[4] tidyr_1.3.2 fontLiberation_0.1.0 renv_1.1.7
[7] xml2_1.3.6 stringi_1.8.4 digest_0.6.39
[10] magrittr_2.0.3 evaluate_1.0.5 grid_4.4.2
[13] RColorBrewer_1.1-3 fastmap_1.2.0 jsonlite_2.0.0
[16] zip_2.3.2 scales_1.4.0 fontBitstreamVera_0.1.1
[19] codetools_0.2-20 textshaping_1.0.0 cli_3.6.5
[22] rlang_1.1.7 fontquiver_0.2.1 crayon_1.5.3
[25] litedown_0.9 commonmark_2.0.0 withr_3.0.2
[28] yaml_2.3.10 gdtools_0.5.0 tools_4.4.2
[31] officer_0.7.3 uuid_1.2-1 curl_7.0.0
[34] vctrs_0.7.1 R6_2.6.1 lifecycle_1.0.5
[37] htmlwidgets_1.6.4 ragg_1.5.1 pkgconfig_2.0.3
[40] pillar_1.10.1 gtable_0.3.6 glue_1.8.0
[43] data.table_1.17.0 Rcpp_1.1.1 systemfonts_1.3.1
[46] xfun_0.56 tidyselect_1.2.1 rstudioapi_0.17.1
[49] knitr_1.51 farver_2.1.2 patchwork_1.3.0
[52] htmltools_0.5.9 rmarkdown_2.30 compiler_4.4.2
[55] S7_0.2.1 askpass_1.2.1 markdown_2.0
[58] openssl_2.3.2