Introduction to Quantitative Reasoning: Why We Need Science

Schweinberger, Martin

doi:10.5281/zenodo.19424873

This tutorial provides a conceptual introduction to quantitative reasoning and the scientific method, covering the logical foundations of empirical research, the history of quantitative thinking, and the philosophical underpinnings of data analysis. It is designed for researchers in linguistics and the humanities who are new to quantitative methods and want to understand the ‘why’ behind statistical approaches.

Author

Martin Schweinberger

Published

2026

Great Court, The University of Queensland

Introduction

This tutorial introduces the foundations of quantitative reasoning and scientific thinking. It asks a deceptively simple question: why can we not simply observe the world carefully and reason from what we see? The answer — that human perception and cognition are systematically biased in ways that evolution has shaped but that our research goals require us to overcome — provides the motivation for the entire scientific enterprise.

The tutorial covers cognitive biases that affect how we perceive patterns, probability, and causation; logical fallacies that undermine valid reasoning; the philosophical foundations of the scientific method including Karl Popper’s theory of falsification; and what it means to apply scientific thinking to linguistics and to everyday claims about the world.

Learning Objectives

By the end of this tutorial you will be able to:

Explain why empirical evidence is necessary, and why pure logical reasoning is insufficient for knowledge about the world
Identify and describe the major cognitive biases that affect human reasoning — including confirmation bias, poor probabilistic intuition, pattern-seeking, pareidolia, and anthropocentric perception
Recognise and name at least ten common logical fallacies and explain why each undermines valid argumentation
Describe Popper’s principle of falsification and explain what distinguishes scientific from non-scientific claims
Apply the scientific circle to a concrete research question
Evaluate everyday claims, health claims, and news stories using scientific criteria
Explain why linguistics is an empirical science and distinguish descriptive from prescriptive approaches

Prerequisite Tutorials

This tutorial assumes no prior knowledge of statistics or research methods. It is designed as a first step and does not require completion of any earlier tutorial. Readers who want to build directly on this foundation may proceed to:

Citation

Martin Schweinberger. 2026. Introduction to Quantitative Reasoning: Why We Need Science. The Language Technology and Data Analysis Laboratory (LADAL), The University of Queensland, Australia. url: https://ladal.edu.au/tutorials/quant_intro/quant_intro.html (Version 3.1.1). doi: 10.5281/zenodo.19332884.

Part 1: Why We Need Science

Section Overview

What you will learn: Why pure logical reasoning cannot answer empirical questions; why careful observation alone is insufficient; and how human cognition is systematically biased in ways that make a disciplined scientific methodology necessary.

The problem with intuition

What science is

Before addressing why science is necessary, it is worth establishing what it is.

A working definition of science

Science is a methodological process used to acquire knowledge about the world based on empirical evidence.

The key components are:

Methodological: Systematic and principled, not haphazard
Process: Ongoing and self-correcting, not a fixed body of knowledge
Empirical: Grounded in observation of reality, not pure speculation
About the world: Concerned with how things actually are, not just how they could logically be

Why not just think about it?

For some domains, reasoning alone works well. The formal sciences — logic and mathematics — proceed entirely through deduction:

Premise 1: Socrates is a human being
Premise 2: All humans are mortal
Conclusion: Therefore, Socrates is mortal

If the premises are true and the logic is valid, the conclusion must be true. No observation of Socrates is required.

The problem is that logic cannot tell us which possible world is our world. Consider three equally coherent possibilities:

Possible world 1: I raise my left arm after counting to 3
Possible world 2: I raise my right arm after counting to 3
Possible world 3: I raise neither arm after counting to 3

All three are logically possible. To know which one actually happened requires empirical evidence, that is, observation of what occurred. (For the record: I counted to three and raised neither arm.)

Why not just observe carefully?

If we need evidence, why not simply observe the world attentively? Because human beings are systematically biased observers. The remainder of this tutorial demonstrates this problem in detail.

Cognitive biases: how we get it wrong

Bias 1: Emotional reasoning over facts

What we fear is often not what actually harms us. Two widely cited contrasts illustrate this:

Strangers versus known contacts. Our fear of strangers — sometimes called “stranger danger” — is vivid and pervasive. Yet the evidence consistently shows that most violence against children and adults occurs within families and among known contacts, not from strangers. The fear is misplaced, and the misplacement has real costs in how we direct protective attention.

Sharks versus cows and mosquitoes. Shark attacks are dramatic and memorable, and have been amplified by popular culture. Yet in the United States, cows kill roughly 20 people per year while sharks kill fewer than one on average. Globally, mosquitoes cause around 700,000 deaths annually through disease transmission. The asymmetry between fear and statistical risk is striking.

The explanation is that vivid, emotionally charged narratives override statistical information. Evolutionary pressures favoured quick emotional responses to salient threats over careful actuarial reasoning.

Bias 2: Confirmation bias

Confirmation bias is the tendency to seek out, interpret, and remember information in ways that confirm what we already believe, while ignoring or discounting contradictory evidence.

This bias is both pervasive and insidious: it affects experts as much as novices, operates even when we are trying to be objective, and reinforces existing beliefs, including incorrect ones, rather than correcting them. We will demonstrate it directly with the Wason Selection Task and the Number Sequence Puzzle in Part 2.

Bias 3: Poor probabilistic intuition

Most people are surprised by how consistently wrong their intuitions are when it comes to probability and statistics. Two classical demonstrations make this vivid.

The Monty Hall Problem

Monty Hall hosted the American television game show Let’s Make a Deal. The game works as follows:

Three doors are presented. Behind two of them are goats; behind one is a prize.
The contestant chooses a door (say, Door 1).
The host, who knows where the prize is, opens a different door to reveal a goat (say, Door 3).
The host asks: “Do you want to switch to Door 2?”

Question: should you switch?

Think about this carefully before reading on. Most people have a strong intuition about the answer.

The intuitive answer is that it does not matter — there are now two doors remaining, so the probability must be 50-50. This is incorrect.

You should always switch. Switching gives you a 2/3 probability of winning; staying gives you only 1/3.

Why switching is correct

When you initially chose Door 1, you had a 1/3 chance of being right. Doors 2 and 3 together held a 2/3 chance of hiding the prize.

When Monty opens Door 3 (always revealing a goat, because he knows where the prize is), that 2/3 probability does not disappear — it concentrates entirely onto Door 2. Door 1 still has only its original 1/3 probability.

Door	Before Monty opens Door 3	After Monty opens Door 3
Door 1 (your choice)	1/3	1/3
Door 2	1/3	2/3
Door 3	1/3	0 (revealed as goat)

The key insight is that Monty’s action is not random — he always opens a losing door. That constraint is what transfers probability.

A more transparent version: 20 doors. Imagine 20 doors instead of 3. You pick Door 1 (1/20 chance of winning). Monty then opens 18 doors, all revealing goats, leaving one other door closed. Would you switch? Almost everyone would — it is obvious that the 19/20 probability has concentrated onto that one remaining door. The logic with 3 doors is identical, just less intuitively obvious.

You can verify this empirically using an online Monty Hall simulation. Running 100 trials with each strategy consistently produces roughly 33% wins when staying and 67% wins when switching.
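The same check can be run locally. The following is a minimal sketch of such a simulation in R, not the code behind any particular online tool; the function name `simulate_monty_hall`, the seed, and the number of trials are all illustrative assumptions.

```r
# Minimal Monty Hall simulation (illustrative sketch; names are hypothetical)
set.seed(2026)  # arbitrary seed so that the run is reproducible

simulate_monty_hall <- function(n_trials = 10000) {
  doors <- 1:3
  wins_stay <- 0
  wins_switch <- 0
  for (i in seq_len(n_trials)) {
    prize <- sample(doors, 1)    # door hiding the prize
    choice <- sample(doors, 1)   # contestant's first pick
    # Monty may only open a door that is neither the pick nor the prize
    openable <- setdiff(doors, c(choice, prize))
    opened <- openable[sample.int(length(openable), 1)]
    # Switching means taking the one door that is neither picked nor opened
    switched <- setdiff(doors, c(choice, opened))
    wins_stay <- wins_stay + (choice == prize)
    wins_switch <- wins_switch + (switched == prize)
  }
  c(stay = wins_stay / n_trials, switch = wins_switch / n_trials)
}

result <- simulate_monty_hall()
round(result, 3)
```

With many trials the two proportions should settle close to 1/3 and 2/3. Note that the line defining `openable` is where Monty's knowledge enters: he never opens the prize door, which is exactly the constraint that makes switching advantageous.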

The Birthday Problem

Question: how many people?

How many people need to be in a room for there to be a 50% chance that two of them share a birthday? Think about your answer before reading on.

Most people guess something around 100 or even 183 (half of 365). The correct answer is only 23. With 23 people, the probability that at least two share a birthday is 50.7%.

The calculation is most easily approached by computing the complement — the probability that all 23 people have different birthdays:

Person 1: 365/365 (any birthday is fine)
Person 2: 364/365 (must differ from person 1)
Person 3: 363/365 (must differ from persons 1 and 2)
...
Person 23: 343/365 (must differ from all 22 others)

P(all different) = (365 × 364 × 363 × ... × 343) / 365^23
                 = 0.4927

P(at least one match) = 1 - 0.4927 = 0.5073

Code

# Verify in R
n <- 23
days_in_year <- 365

prob_all_different <- prod((days_in_year - 0:(n - 1)) / days_in_year)
prob_match <- 1 - prob_all_different
prob_match
# [1] 0.5072972

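With 73 people, the probability of a shared birthday exceeds 99.9%.

Code

# Same calculation for a group of 73 people
n <- 73
days_in_year <- 365  # repeated here so that the block runs on its own

prob_all_different <- prod((days_in_year - 0:(n - 1)) / days_in_year)
prob_match <- 1 - prob_all_different
prob_match
# approximately 0.9996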

The lesson is that we systematically underestimate how quickly probabilities accumulate, particularly in combinatorial settings: 23 people form 253 distinct pairs (23 × 22 / 2), and each pair is a separate opportunity for a match. We are reasonably good at linear arithmetic but very poor at reasoning about exponential growth and compound probabilities. This is one of many reasons why statistical analysis cannot be replaced by intuition.
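To see how quickly the probability climbs with group size, the calculation can be wrapped in a small function and applied to a range of group sizes. This is a sketch under the same simplifying assumptions as above (365 equally likely birthdays, no leap years); the function name `birthday_match_prob` is a hypothetical helper introduced here for illustration.

```r
# Probability of at least one shared birthday for a group of size n
# (assumes 365 equally likely birthdays and ignores leap years)
birthday_match_prob <- function(n, days_in_year = 365) {
  1 - prod((days_in_year - 0:(n - 1)) / days_in_year)
}

group_sizes <- 1:80
probs <- sapply(group_sizes, birthday_match_prob)

# Smallest group size at which a shared birthday is more likely than not
smallest_n <- min(group_sizes[probs >= 0.5])
smallest_n

# Probabilities for a few selected group sizes
round(probs[c(10, 23, 40, 60)], 3)
```

The value of `smallest_n` should reproduce the answer of 23 given above, and the selected probabilities show that the curve rises steeply rather than linearly.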

Fast and slow thinking

The ball and bat problem

Question

A ball and a bat together cost $1.10. The bat costs $1.00 more than the ball. How much does the ball cost?

Most people immediately answer “10 cents.” This is wrong. If the ball costs 10 cents, the bat costs $1.10, and the total is $1.20 — not $1.10.

The correct answer is 5 cents: ball = $0.05, bat = $1.05, total = $1.10.
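The slow, System 2 route to this answer is to write the problem as two equations (ball + bat = 1.10 and bat − ball = 1.00) and solve them. The sketch below does this with base R's `solve()`; the matrix layout and variable names are illustrative choices.

```r
# Unknowns: first column = price of the ball, second column = price of the bat
A <- matrix(c( 1, 1,    # ball + bat = 1.10
              -1, 1),   # bat - ball = 1.00
            nrow = 2, byrow = TRUE)
b <- c(1.10, 1.00)

prices <- solve(A, b)
names(prices) <- c("ball", "bat")
prices
```

Adding the two equations gives 2 × bat = 2.10, so the bat costs $1.05 and the ball $0.05, which is what `solve()` returns.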

System 1 and System 2

The psychologist Daniel Kahneman distinguishes two modes of cognition (Kahneman 2011):

System 1 (fast thinking) operates automatically and effortlessly. It generates intuitive responses based on pattern recognition and association. It is fast and requires no conscious effort — but it regularly produces errors on problems that require careful reasoning.

System 2 (slow thinking) is deliberate, effortful, and analytical. It applies logical rules and checks its own work. It is more reliable but requires cognitive effort that we often avoid expending.

The ball and bat problem shows System 1 in action: it generates “10 cents” almost instantly because the numbers $1.00 and $0.10 are salient and combine to give a plausible total. System 2, if engaged, immediately detects the error — but System 1 answers first and System 2 tends to be lazy about checking plausible-seeming answers.

Key insight: science as institutionalised System 2 thinking

Science can be understood as a set of institutional and methodological procedures designed to force deliberate, effortful, System 2 reasoning. Peer review, pre-registration, replication, controlled experiments, and statistical testing are all mechanisms for preventing the fast, intuitive, and frequently wrong conclusions of System 1 from being accepted as knowledge. Science is expensive in time and effort — but it produces more reliable knowledge precisely because of that cost.

Seeing patterns in randomness

Skinner’s superstitious pigeons

In a classic experiment, B. F. Skinner (1948) placed pigeons in boxes where food was delivered at random intervals, with no connection to anything the pigeon did. The result was that each pigeon developed idiosyncratic repetitive behaviours — one turned in circles, another pecked at corners of the box — which it had happened to be performing when food arrived by chance.

The pigeons had assumed a causal connection between their behaviour and the food reward, even though the delivery was entirely random. Each accidental co-occurrence reinforced the behaviour, creating what Skinner called “superstitious” conditioning.

Human superstitions operate by the same mechanism. Athletes who perform well while wearing a particular item of clothing begin treating that item as a causal agent. Gamblers develop “systems” based on perceived patterns in random sequences. In all cases, the cognitive machinery evolved to detect genuine patterns in the environment applies itself inappropriately to random co-occurrences.

Why this matters for research: The same tendency that creates superstition in pigeons and humans can create false patterns in data. If you run enough analyses on a dataset, some will produce significant results by chance alone. This is one reason why hypotheses should be specified before data collection (pre-registration), not inferred from the data retrospectively.
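The claim that enough analyses will yield "significant" results by chance can be illustrated with a short simulation. The sketch below runs many t-tests on pure noise; the number of tests, the group size, and the conventional threshold of 0.05 are illustrative assumptions.

```r
set.seed(42)  # arbitrary seed for reproducibility

n_tests <- 1000    # number of independent "analyses"
n_per_group <- 30  # observations per group in each analysis
alpha <- 0.05      # conventional significance threshold

# Each test compares two samples drawn from the SAME distribution,
# so every "significant" result is a false positive by construction
p_values <- replicate(n_tests, {
  group_a <- rnorm(n_per_group)
  group_b <- rnorm(n_per_group)
  t.test(group_a, group_b)$p.value
})

# Share of tests that look "significant" although no effect exists
false_positive_rate <- mean(p_values < alpha)
false_positive_rate

# Chance of at least one false positive across 20 independent tests
prob_any_false_positive <- 1 - (1 - alpha)^20
prob_any_false_positive
```

Roughly 5% of the tests should come out "significant" even though there is nothing to find, and with 20 independent tests the chance of at least one false positive is about 64%.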

Pareidolia: seeing faces everywhere

Pareidolia is the perception of meaningful patterns — especially faces — in random or ambiguous stimuli. Famous examples include the “Face on Mars” photographed by Viking 1 in 1976 (later shown to be an ordinary rock formation under different lighting), apparent religious figures in food burn marks, and the “Man in the Moon” (with different cultures perceiving different figures in the same lunar surface).

The evolutionary explanation (Bruce Hood, University of Bristol) is straightforward. The ability to quickly detect faces, and particularly to distinguish friend from foe and safe from threatening, was highly adaptive. The cost of a false negative (failing to detect a real face when one is present) was potentially severe: missing a predator or failing to recognise an enemy. The cost of a false positive (seeing a face where there is none) was low: a momentary misperception with no lasting consequence. Evolution therefore favoured an over-sensitive face-detection system, and we inherit the result.

The sweater experiment

Thought experiment

A professor offers you $10 to wear a sweater for one minute. Would you accept?

Most people would. Now consider an additional detail: the sweater previously belonged to a convicted serial killer. Does this change your answer?

Many people become reluctant, or feel discomfort even if they would still accept. Rationally, the sweater is just cloth — its history carries no physical trace that could harm the wearer. Yet the feeling of contamination is real and difficult to dismiss by reasoning.

The evolutionary explanation mirrors that for pareidolia. Ancestors who avoided objects associated with disease, death, or dangerous individuals had a genuine survival advantage, because contaminated objects can carry pathogens. The emotional response of disgust and avoidance was adaptive. Today, that same inherited response is triggered even in contexts where it no longer serves any protective function.

The anthropocentric bias

Anthropocentric bias (sometimes called experiential realism) is the assumption that the world appears to all organisms as it appears to us — that our perceptual experience constitutes, rather than merely filters, reality.

Consider human versus bee vision. Humans perceive light in the wavelength range of approximately 400–700 nanometres. Bees perceive roughly 300–650 nm, which includes ultraviolet light but excludes red (which appears black to them). The practical consequence is that flowers look dramatically different to bees than to us: many flowers have ultraviolet patterns that guide bees to nectar but are completely invisible to human eyes.

The philosopher-linguists Evans and Green put this well:

“However, the parts of this external reality to which we have access are largely constrained by the ecological niche we have adapted to and the nature of our embodiment. In other words, language does not directly reflect the world. Rather, it reflects our unique human construal of the world: our ‘world view’ as it appears to us through the lens of our embodiment.”

— Evans and Green (2006, 46)

The implications for research are significant. Any science that takes human perception as a transparent window onto reality — rather than as one evolved, partial, species-specific perspective on it — will systematically reproduce the biases of that perspective. This is a further argument for why we need systematic, instrument-mediated, and community-checked science rather than just careful personal observation.

The afterimage demonstration

Try this

Stare at the dot between the red and green squares for 30 seconds without looking away.
Immediately shift your gaze to the dot between the sand dunes.

What happens: After staring at the red square, the left portion of the dunes appears greenish. After staring at the green square, the right portion appears reddish.

Why: The red-sensitive and green-sensitive photoreceptors in your retina become temporarily fatigued (depleted of neurotransmitter). When you look at the neutral sand, the fatigued cells fire less strongly, so the complementary colour dominates. What you “see” is not simply what is there — it is the output of a neurophysiological process that is itself subject to fatigue, context, and prior stimulation.

Gestalt perception

Gestalt psychology (from the German word for “form” or “shape”) studies how we perceive unified wholes from collections of parts. Several classic principles demonstrate that perception is an active, constructive process, not a passive recording of stimulation.

The Kanizsa triangle above contains no actual triangle — there are three Pac-Man shapes and three angle markers. Yet virtually everyone perceives a bright white triangle overlaying the other elements. The brain constructs the missing contours from partial information, using the principle of closure (completing incomplete shapes).

When the same elements are rearranged, the triangle disappears and three Pac-Man shapes appear instead:

Same elements, different arrangement — radically different perception. Other Gestalt principles include proximity (nearby items are perceptually grouped), similarity (similar items are grouped), continuity (smooth lines are preferred over sharp changes), and common fate (items moving together are grouped).

All of these principles demonstrate the same point: perception is not a record of the external world but a construction that the brain generates based on partial information, prior expectations, and evolved heuristics.

Context effects on perception

The Thatcher illusion

Look at the two upside-down faces above. One may seem slightly unusual, but both appear roughly human and recognisable.

Now look at the same images right-side-up:

The distortion — eyes and mouth inverted relative to the face — that was barely noticeable upside-down is now grotesque and immediately obvious.

Why: When a face is inverted, the brain does not deploy its specialised face-processing system; it processes the image as a generic object. Local distortions go unnoticed. When the face is right-side-up, the full face-processing architecture activates, and the mismatch between the expected face template and the actual distorted image is immediately detectable. Context (orientation) determines which perceptual processing system is recruited, and that choice determines what we see.

The B/13 illusion

The ambiguous figure below illustrates how context determines categorical perception:

The middle symbol in the alphabetic sequence A, B, C is typically read as the letter “B.”

The same symbol in the numeric sequence 12, 13, 14 is typically read as the number “13.”

The physical stimulus is identical in both cases. What changes is the context, which activates different prior expectations and determines which categorical interpretation the perceiver reaches. This has direct implications for linguistics: the same linguistic form can carry different meanings in different contexts, and we cannot study meaning without studying context.

Exercises: Cognitive Biases and Perception

Q1. The Monty Hall problem reveals a systematic failure of probabilistic intuition. The core of the correct solution is that Monty’s action is not random. Which statement best captures why this matters?

Q2. Pareidolia and Skinner’s pigeon experiments both illustrate the same underlying cognitive tendency. What is it?

Part 2: Logical Fallacies

Section Overview

What you will learn: The most common logical fallacies encountered in academic discourse, media, and everyday argumentation — what they are, why they are fallacious, and how to recognise and counter them.

What are logical fallacies?

Definition

A logical fallacy is a pattern of argument that appears persuasive but contains a fundamental flaw in reasoning. Logical fallacies are not merely weak arguments — they are systematically invalid in a way that can be precisely identified.

Recognising logical fallacies matters because they are pervasive in public discourse, because everyone is susceptible to them (including trained researchers), and because they prevent accurate conclusions and undermine rational debate. Being able to name and explain a fallacy is not merely an academic exercise: it is a practical tool for evaluating claims.

The ten most important fallacies

1. Confirmation bias and cherry-picking

What it is: Selectively seeking out, reporting, or emphasising evidence that supports a preferred conclusion while ignoring or discounting contradictory evidence.

Example:

Claim: "Vaccines cause autism!"
Evidence cited: 1 study (subsequently retracted for scientific fraud) that found a link
Evidence ignored: 100+ subsequent independent studies that found no link

Why it is a fallacy: The strength of evidence lies in its totality, not in the existence of at least one supporting study. For almost any scientific question, at least one study can be found that points in a given direction; what matters is the weight and quality of the full body of evidence.

Scientific solution: Pre-register analysis plans before collecting data; report all results including negative ones; conduct systematic reviews and meta-analyses that pool evidence across studies.

2. Ad hominem (attack the person)

What it is: Attacking the character, credentials, or motives of a person making an argument rather than addressing the argument itself.

Examples:

“You can’t trust their climate research — they are a leftist.”
“His statistics are wrong because he is funded by industry.”
“She is just saying that because she is young and naive.”

Why it is a fallacy: A person’s character, political affiliation, or funding source does not determine whether their argument is logically valid or their evidence reliable. These are separate questions. An argument must be evaluated on its own merits.

Correct approach: Identify specific methodological or logical flaws in the argument itself. If funding bias is a concern, examine whether the methods and conclusions are appropriate — not whether the funding source is ideologically convenient.

3. Appeal to authority

What it is: Citing a person’s authority or expertise as the sole justification for accepting a claim, without engaging with the evidence or reasoning behind it.

When it is not a fallacy: Citing a researcher’s work in the sense of engaging with their evidence and methods is entirely appropriate. “According to Smith et al. (2020), who found X using method Y…” is legitimate evidence-based reasoning.

When it is a fallacy:

“Einstein said it, so it must be true!” — This is the person’s opinion, not the evidence.
“Dr. X claims treatment Y works, and she is an expert!” — Expertise confers credibility but not infallibility.

Key distinction: An authority’s evidence and reasoning can be cited as support; an authority’s opinion alone is not evidence.

4. Straw man

What it is: Misrepresenting an opponent’s position — usually by exaggerating or oversimplifying it — in order to attack the weaker, distorted version rather than the actual argument.

Example:

Person A: "We should have some regulations on firearms to reduce violence."
Person B: "You want to ban all guns and leave people completely defenceless!"

Person A said nothing about banning all guns. Person B has constructed a distorted version (“straw man”) of the argument because it is easier to defeat than the actual position.

Why it is called “straw man”: A straw man is easy to knock down, unlike a real opponent. Winning against a straw man creates the appearance of having refuted the real argument without having engaged with it.

5. Argument from ignorance

What it is: Claiming that a proposition is true because it has not been proven false (or vice versa). Treating absence of evidence as evidence of absence — or, more commonly in practice, as evidence of presence.

Examples:

“No one has proven aliens do not exist, so they must be real.”
“Science cannot explain consciousness, therefore it must be supernatural.”

Why it is wrong: Absence of evidence is not, in general, evidence of absence. There are many things that have not yet been investigated. The appropriate response to insufficient evidence is to remain agnostic — to say “we do not yet know” — not to fill the gap with a preferred explanation.

Correct reasoning: Maintain that the burden of proof lies with the person making the positive claim. Absence of disproof does not confirm the claim; it merely leaves it untested.

6. False dichotomy

What it is: Presenting a situation as though only two options exist, when in fact more are available — typically by framing the two extreme positions as the only possibilities.

Examples:

“America: love it or leave it.”
“You are either with us or against us.”
“Either we cut all social programmes or the economy collapses.”

Why it is manipulative: It forces a choice between extremes, eliminates middle ground and compromise, and polarises discussion by making nuanced positions invisible.

7. Slippery slope

What it is: Claiming that one action will inevitably lead, through a chain of steps, to an extreme and undesirable outcome — without providing evidence that the causal chain would actually operate.

Examples:

“If we allow same-sex marriage, next people will marry animals.”
“If we ban one type of gun, soon they will ban all guns.”

When it is legitimate: When there is actual evidence that each step in the chain follows predictably from the previous one, a slope argument may be valid. The fallacy lies in asserting the chain without that evidence.

When it is a fallacy: When the argument relies on fear of an extreme outcome rather than on evidence that the intermediate steps are likely.

8. Circular argument (begging the question)

What it is: An argument in which the conclusion is already contained in, or assumed by, one of the premises. The argument appears to provide evidence for its conclusion but actually just restates the same claim in different words.

Examples:

“The Bible is true because it says so in the Bible.”
“I am trustworthy because I say I am trustworthy.”

Why it fails: No new information is added. If you accept the premise, you have already accepted the conclusion. The argument provides no independent reason to believe the conclusion is true.

Valid structure: Independent premises lead through explicit reasoning to a conclusion that was not already assumed in the starting point.

9. Red herring

What it is: Introducing irrelevant information to distract from the actual question or issue under discussion.

Example:

Journalist: "Why did the government waste millions on this failed project?"
Politician:  "Let me tell you about all the great schools we have built.
              Education is so important, do you not agree?"

The politician has not addressed the question of the waste. Instead, they have introduced a different — and more politically comfortable — topic.

Why it works: People naturally follow new conversational directions, and the original question is easy to lose track of, especially in spoken discourse.

10. Sunk cost fallacy

What it is: Continuing to invest resources (time, money, effort) in something because of what has already been invested, even when the future expected costs outweigh the future expected benefits.

Examples:

Watching a film to the end even though you stopped enjoying it an hour ago, because you have already invested two hours.
Continuing to fund a research project that has clearly failed because substantial resources have already been committed.

Why it is irrational: Past costs are irretrievable. They cannot be recovered and are therefore irrelevant to the decision about what to do next. The only rational question is: given the current situation, do the expected future benefits outweigh the expected future costs?

Rational approach: Evaluate each decision forward-looking only. Ask: if I were starting from scratch with no prior investment, would I begin this? If no, the sunk cost fallacy may be operating.

Why fallacies matter for science

Fallacies undermine knowledge

Without awareness of logical fallacies, researchers and readers reach wrong conclusions, waste resources, defend indefensible positions, and spread misinformation — even in good faith.

Science provides the institutional antidote: peer review catches ad hominem and cherry-picking; pre-registration counters confirmation bias; the requirement to engage with the strongest version of opposing theories counters straw man arguments; and the norm of reporting negative results counters selective reporting.

Recognising fallacies in one’s own thinking is harder than recognising them in others’ — but it is the more important skill.

Testing your understanding: the Wason Selection Task

You see four cards. Each card has a letter on one side and a number on the other side. The visible faces are:

Card 1: A    Card 2: K    Card 3: 2    Card 4: 7

The rule: “If there is a vowel on one side of a card, then there is an even number on the other side.”

Which cards must you turn over to test whether the rule is true or false?

Think carefully. You need to choose the minimum set of cards that could definitively falsify the rule.

The most common answer is Cards 1 and 3 (A and 2). This is incorrect.

The correct answer is Cards 1 and 4 (A and 7).

Card 1 (A): Must be turned over. It is a vowel, so the rule requires an even number on the reverse. If there is an odd number, the rule is false.
Card 2 (K): Does not need to be turned over. The rule says nothing about what must be on the reverse of consonants.
Card 3 (2): Does not need to be turned over. The rule says nothing about what must be on the reverse of even numbers — a vowel or a consonant on the reverse would both be compatible with the rule.
Card 4 (7): Must be turned over. If there is a vowel on the reverse, the rule is violated (the vowel would require an even number on its reverse, but we have an odd number).

What this demonstrates: Most people turn over the cards that could confirm the rule (the vowel and the even number) rather than the cards that could falsify it (the vowel and the odd number). This is confirmation bias operating in a purely logical context. Scientific thinking requires actively seeking evidence that could prove you wrong, not just evidence consistent with your hypothesis.
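The logic of the task can also be checked mechanically: a card is worth turning over only if some possible hidden face would violate the rule. The sketch below encodes this idea; the function `violates_rule` and the sets of possible hidden faces are hypothetical constructs chosen for illustration.

```r
# The rule: "if there is a vowel on one side, there is an even number on the other"
# It is violated only by a vowel paired with an odd number
violates_rule <- function(letter, number) {
  is_vowel <- letter %in% c("A", "E", "I", "O", "U")
  is_odd <- number %% 2 == 1
  is_vowel && is_odd
}

# Hypothetical hidden faces used to probe each card
possible_letters <- c("A", "K")  # one vowel, one consonant
possible_numbers <- c(2, 7)      # one even number, one odd number

visible <- c("A", "K", "2", "7")

# A card must be turned only if SOME hidden face could violate the rule
must_turn <- sapply(visible, function(face) {
  if (face %in% LETTERS) {
    any(sapply(possible_numbers, function(n) violates_rule(face, n)))
  } else {
    any(sapply(possible_letters, function(l) violates_rule(l, as.numeric(face))))
  }
})

must_turn
names(must_turn)[must_turn]
```

Only the cards showing A and 7 are flagged. The cards showing K and 2 cannot violate the rule whatever is on their reverse, so turning them over yields no information about whether the rule holds.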

Testing your understanding: the number sequence puzzle

Here are three numbers that follow a rule I have in mind:

1    2    4

You may propose one additional number, and I will tell you whether it follows my rule. What number would you choose, and what rule do you hypothesise?

Think before reading on

What is your hypothesis about the rule? What number would best test it?

Typical responses: Most people guess 8 (following the “doubling” hypothesis) or 16 (following the “squaring” hypothesis). Both follow the rule — but neither tests whether the hypothesis is correct.

A better strategy: Propose a number that your hypothesis predicts would not follow the rule, say 5, 7, or 10. If the rule is “each number is larger than the previous one” (which it is), then 7 would follow the rule, falsifying the doubling hypothesis.

The actual rule: “Each number must be larger than the previous number.”

What this demonstrates: Confirmation bias again. When people think their hypothesis is “doubling,” they propose numbers that would confirm it rather than numbers that would challenge it. But only by testing the boundaries of your hypothesis — by attempting to falsify it — can you distinguish your hypothesis from the many other hypotheses compatible with your initial evidence.
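The difference between a confirming and a falsifying probe can be sketched in a few lines of R. This is an illustrative sketch only: the helper names `is_doubling` and `is_increasing` are assumptions, and the two probes (8 and 7) are simply examples of a number the doubling hypothesis predicts and one it forbids.

```r
# Two candidate rules that both fit the initial evidence 1, 2, 4.
is_doubling   <- function(x) all(x[-1] == 2 * x[-length(x)])
is_increasing <- function(x) all(diff(x) > 0)

evidence <- c(1, 2, 4)
probes   <- c(8, 7)  # 8 is a confirming probe, 7 a potentially falsifying one

# For each probe, record what each candidate rule says about the
# extended sequence.
results <- data.frame(
  probe      = probes,
  doubling   = vapply(probes, function(p) is_doubling(c(evidence, p)), logical(1)),
  increasing = vapply(probes, function(p) is_increasing(c(evidence, p)), logical(1))
)
print(results)
```

Both rules accept 8, so hearing “yes” to 8 cannot tell them apart. They disagree about 7, so the answer to that probe is the one that carries information.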

Exercises: Logical Fallacies

Q3. A politician responds to a question about rising crime rates by saying: “I am proud of the new schools this government has built, and education is the foundation of a safe society.” Which fallacy does this illustrate?

Q4. A friend argues: “I have watched five seasons of this series and it has been mediocre throughout. I might as well finish it — I have already invested 40 hours.” What is the flaw in this reasoning?

Part 3: What Is Science?

Section Overview

What you will learn: A comprehensive definition of science; the distinction between empirical and formal sciences; the scientific method as a cycle of hypothesis-testing; the Clever Hans case study as a demonstration of why methodology matters; and Popper’s principle of falsification as the criterion that separates scientific from non-scientific claims.

Defining science

The working definition given in Part 1 can now be made more precise:

Science: comprehensive definition

Science is an unbiased, fundamentally methodological enterprise that aims at building and organising knowledge about the empirical world in the form of falsifiable explanations and predictions, by means of systematic observation and experimentation.

The key components are:

Unbiased: Systematic checks against the cognitive biases documented in Parts 1 and 2
Methodological: Follows principled, replicable procedures
Empirical: Based on observation of reality, not pure reasoning
Falsifiable: Claims can in principle be proven wrong by evidence
Explanatory: Accounts for why patterns occur, not just that they occur
Predictive: Generates testable predictions about what will be observed
Observational: Depends on careful, instrument-mediated measurement
Experimental: Tests hypotheses through controlled manipulation

Types of science

Empirical sciences examine phenomena of reality through the scientific method. Their goal is to explain and predict what actually exists and occurs. Examples include biology, physics, chemistry, psychology, sociology, and linguistics. Their method involves observing reality, forming and testing hypotheses, and refining theories in response to evidence.

Formal sciences examine abstract systems through axiomatic reasoning. Their goal is logical coherence and internal consistency. Examples include mathematics, formal logic, theoretical computer science, and formal linguistics. Their method involves starting from axioms, applying logical operations, and deriving theorems. Crucially, formal sciences can prove their results — because their claims concern abstract objects defined by their own axioms.

The key difference is epistemological: formal sciences can establish truths by logical proof; empirical sciences cannot prove — they can only test and potentially falsify. As Popper showed, this asymmetry between proof and falsification is fundamental to understanding how science works.

The scientific method

Science does not proceed in a straight line from observation to truth. It is a cycle of hypothesis formation, testing, revision, and renewed testing — a continuous self-correcting process.

The basic steps are:

Observe a phenomenon and notice something requiring explanation
Formulate a research question — make it specific and tractable
Review existing literature — what is already known?
Form a hypothesis (H₁) — a testable prediction based on prior observation and theory
Form a null hypothesis (H₀) — the position that there is no effect or difference; what we try to disprove
Determine the significance level — how large a risk of wrongly rejecting a true H₀ are we willing to accept? (Typically α = .05)
Design the study — how will you collect data to test the hypothesis?
Collect data — execute the design
Analyse the data — apply statistical tests; calculate effect sizes
Draw conclusions — can you reject H₀? What does this imply for H₁?
If H₀ cannot be rejected — form a new hypothesis and repeat

Example: finding lost keys

The abstract steps become concrete with a trivial everyday example:

Observation:  My keys are missing.
Question:     Where are my keys?
Literature:   I have left them on the TV table before.
H₁:           My keys are on the TV table.
H₀:           My keys are NOT on the TV table.
Design:       I will check the TV table.
Data:         I checked — no keys there.
Analysis:     H₀ cannot be rejected.
Conclusion:   My keys must be elsewhere.
New H₁:       My keys are in my coat pocket.
[Repeat]

This trivial example captures the logic that applies to the most sophisticated experiments.
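The same sequence of steps can be sketched with a statistical test in R. The data below are hypothetical and invented purely for illustration (20 speakers, 15 of whom choose one of two variants). The exact binomial test is one reasonable choice for such counts, not the only possible analysis.

```r
# Hypothetical data, invented for illustration: 15 of 20 speakers chose
# variant A in a forced-choice task.
n_speakers  <- 20
n_variant_a <- 15

# Significance level, fixed before looking at the data.
alpha <- 0.05

# H0: both variants are equally likely (probability of A = 0.5).
# H1: the variants are not equally likely.
test <- binom.test(n_variant_a, n_speakers, p = 0.5)

p_value   <- test$p.value
effect    <- n_variant_a / n_speakers - 0.5  # deviation from chance level
reject_h0 <- p_value < alpha

cat("p =", round(p_value, 4),
    "| deviation from chance =", effect,
    "| reject H0:", reject_h0, "\n")
```

With these made-up counts the p-value falls just below .05, so H₀ would be rejected. The effect size (a 25 percentage point deviation from chance) is reported alongside it, as the steps above recommend.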

Clever Hans: a case study in methodology

The phenomenon

Between 1891 and 1904, a horse named Clever Hans became famous across Europe for apparently being able to perform arithmetic, answer questions in German, spell words, and tell the time. His owner, Wilhelm von Osten, would ask questions and Hans would tap his hoof the correct number of times. Multiple scientific commissions investigated and found no evidence of fraud. Von Osten appeared to genuinely believe in his horse’s abilities.

The investigation

The psychologist Oskar Pfungst (1907) took a more systematic approach. He designed controlled experiments varying two factors: whether the questioner knew the correct answer, and whether Hans could see the questioner.

Condition	Result
Questioner knows answer + Hans can see questioner	Hans answers correctly
Questioner does not know answer	Hans cannot answer
Hans can see questioner	Hans answers correctly
Hans cannot see questioner (blinders)	Hans cannot answer

The pattern was unambiguous: Hans’s performance depended entirely on whether he could see someone who knew the answer.
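The logic of a design that crosses the two factors can be laid out as a small table in R. This is a sketch of the reasoning rather than a reconstruction of Pfungst’s actual records: the column names are assumptions, and the two prediction columns simply spell out what each explanation would lead us to expect.

```r
# The two manipulated factors, fully crossed (a 2 x 2 design).
design <- expand.grid(
  questioner_knows_answer = c(TRUE, FALSE),
  hans_sees_questioner    = c(TRUE, FALSE)
)

# If Hans reads cues, he should succeed only when he can see a
# questioner who knows the answer.
design$cue_reading_predicts_success <-
  design$questioner_knows_answer & design$hans_sees_questioner

# If Hans can genuinely calculate, he should succeed in every condition.
design$arithmetic_predicts_success <- TRUE

print(design)
```

The two explanations make the same prediction in only one of the four cells. The uncontrolled demonstrations took place in exactly that cell, which is why they could not distinguish between the explanations.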

The discovery

Pfungst found that questioners unconsciously provided micro-cues that Hans had learned to read. When asking a question requiring a numerical tap count, the questioner would unconsciously tense up; as Hans approached the correct number, the questioner would relax slightly. Hans had learned to start tapping at the tensing cue and stop at the relaxing cue — appearing to know the answer when he was actually reading involuntary muscle movements.

Lessons from Clever Hans

Appearances deceive: Even trained scientists were fooled by careful observation that lacked proper controls.
Belief bias: Questioners who believed in Hans tended to confirm their belief through uncritical observation.
Unintentional cuing: Von Osten was not deceiving anyone — he gave the cues entirely without awareness.
The need for controls: Only a systematic design that manipulated questioner knowledge and visibility could reveal the truth.
Experimenter effects: The observer’s expectations can influence the outcome of an observation or experiment — a finding that motivates double-blind experimental designs.

The term Clever Hans effect now refers to any situation in which an experimenter’s unconscious behaviour influences a subject’s responses, and serves as a reminder of why blinding and systematic methodology are not merely bureaucratic requirements but essential safeguards against self-deception.

Popper and falsification

The problem of induction

The Austrian-British philosopher Karl Popper (1902–1994) took up a fundamental problem, first articulated by David Hume, with the traditional view of science as proceeding from many observations to general laws:

Traditional view:
  Observation 1: Swan 1 is white
  Observation 2: Swan 2 is white
  ...
  Observation 10,000: Swan 10,000 is white
  ↓
  Law: All swans are white

Popper’s insight: No number of confirming observations can prove a universal generalisation true. No matter how many white swans you observe, the 10,001st swan might be black. And indeed, when Europeans arrived in Australia they encountered black swans — observations that immediately falsified the “all swans are white” generalisation that had seemed secure for centuries.

But notice the asymmetry: a single black swan is sufficient to refute the universal claim. While we cannot verify by accumulating positive evidence, we can — and must — test by seeking negative evidence.
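The asymmetry can be shown in a few lines of R. The observation log below is hypothetical and the variable names are assumptions made for this sketch.

```r
# Hypothetical observation log: 10,000 swans, all of them white.
observations <- rep("white", 10000)

# The universal claim "all swans are white" survives every observation
# so far. Surviving is not the same as being proven.
claim_survives_before <- all(observations == "white")

# A single counterexample is enough to refute the claim.
observations <- c(observations, "black")
claim_survives_after <- all(observations == "white")

cat("before:", claim_survives_before, "| after:", claim_survives_after, "\n")
```

Ten thousand agreeing observations leave the claim standing but unproven, while one disagreeing observation settles the matter.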

Falsification as the criterion of science

Popper’s criterion

A theory is scientific if and only if it is falsifiable.

A theory is falsifiable when it is possible to describe, in advance, what kind of observation would prove it wrong. Falsifiable theories take an empirical risk: they stake out a position that could be contradicted by evidence.

A theory that is compatible with every possible observation is not scientific — not because it is necessarily false, but because it cannot be tested and therefore cannot be part of the self-correcting process that constitutes science.

Falsifiable (scientific) examples:

“All swans are white” — falsified by a single non-white swan
“The Earth orbits the Sun” — falsifiable by stellar parallax measurements
“Smoking causes cancer” — falsifiable by epidemiological studies

Not falsifiable (not scientific) examples:

“God exists” — no observation could definitively disprove this
“Everything happens for a reason” — compatible with any possible outcome
“This patient’s symptoms are caused by repressed childhood memories” — can be interpreted to confirm the theory regardless of the patient’s response

The last example points to Popper’s famous critique of psychoanalysis: Freudian theory, he argued, is structured so that any conceivable behaviour can be interpreted as confirming it. A patient who is close to their mother confirms the Oedipal hypothesis; a patient who is distant from their mother confirms it too (they are “repressing” their feelings). A theory that cannot be falsified by any evidence is not a scientific theory — even if it happens to be true.

Science as evolutionary progress

Popper drew an analogy between science and biological evolution. In evolution, genetic variation is subjected to natural selection — variants that fit their environment survive; those that do not are eliminated. In science, theoretical variation (new hypotheses and conjectures) is subjected to empirical testing — theories that withstand attempts at falsification survive; those that do not are rejected. Both processes are progressive but not teleological: they eliminate what does not work without guaranteeing that what remains is final truth.

Implications for research practice:

Ask of every hypothesis: “What would falsify this?”
Design studies with the explicit goal of testing, not just confirming
Treat a hypothesis that survives many serious attempts at falsification as well-corroborated, not as proven
Treat a failed attempt to confirm as informative — a narrowing of the space of possibilities

What is linguistics?

Linguistics

Linguistics is the scientific study of language and individual languages. Linguists aim to uncover, describe, explain, and model the systems that underlie human language use.

As an empirical science, linguistics studies language through systematic observation of real language use, tests hypotheses about linguistic structure and function, and produces falsifiable claims about how language works.

Descriptive versus prescriptive linguistics illustrates the scientific/non-scientific distinction:

Approach	Character	Example
Descriptive (scientific)	Describes what speakers actually do	“English speakers frequently use ain’t in casual conversation”
Prescriptive (non-scientific)	Prescribes what speakers should do	“You should not say ain’t”

Prescriptive claims are not falsifiable in Popper’s sense — they are normative, not empirical. Descriptive claims can be tested against corpus data and thus belong to the domain of science.

Example: the scientific cycle in linguistics

Observation:  Children appear to learn grammar without explicit instruction.
Question:     How do children acquire language?
Literature:   Chomsky's Universal Grammar hypothesis;
              Tomasello's usage-based approach.
H₁:           Children extract grammatical patterns through frequency tracking.
Design:       Expose children to artificial language with manipulated
              input frequencies; record which patterns they learn.
Data:         Children's productions; error patterns; learning rates.
Analysis:     Compare learning rates for high- versus low-frequency patterns.
Conclusion:   Higher frequency predicts faster acquisition —
              supports usage-based hypothesis.
Refinement:   Test with different age groups, complexity levels.
[Repeat]

Exercises: The Scientific Method and Falsification

Q5. A researcher proposes the theory: “Students who feel positively about their lecturer will perform better on written assessments.” Is this theory scientific in Popper’s sense?

Q6. A therapist argues: “If a patient denies having repressed childhood trauma, that itself shows how deeply it is repressed. If a patient acknowledges having difficult memories, that confirms the trauma theory.” What is the scientific problem with this argument?

Part 4: Applying Scientific Thinking

Section Overview

What you will learn: How to apply the scientific method to real-world claims — including health claims, news reports, and unusual beliefs — and how to design a linguistics study from the ground up.

Applying the scientific cycle to real claims

Ghost belief

Given what we have covered, we can offer a scientific analysis of why people believe in ghosts — not a dismissal of those beliefs, but an explanation of the cognitive and perceptual mechanisms that generate such experiences in the absence of actual ghosts.

Several factors operate together:

Pareidolia and agency detection — the brain is primed to detect faces and intentional agents. In low light, in unfamiliar environments, or when anxious, ambiguous stimuli are more likely to be interpreted as presences.

Confirmation bias — people who believe in ghosts attend to and remember experiences that are consistent with that belief (unexplained sounds, feelings of being watched) and discount or forget the vast majority of experiences that have mundane explanations.

Sleep paralysis — during transitions in and out of REM sleep, it is possible to experience vivid hallucinations combined with an inability to move. This experience — including the sensation of a threatening presence in the room — is well-documented neurologically and has likely generated ghost and demon narratives across cultures.

Infrasound — sound at frequencies below the lower limit of human hearing (roughly 20 Hz) has been reported to produce feelings of unease, anxiety, and the sensation of an unseen presence. Old buildings with large resonant chambers sometimes produce infrasound.

Emotional factors — grief, sleep deprivation, and fear heighten the tendency to perceive meaningful patterns in ambiguous stimuli.

None of these explanations requires ghosts to exist. Together, they can account for a wide range of reported ghost experiences using well-understood mechanisms.

Evaluating health claims

Claim: “Vitamin X cures cancer!”

Applying scientific criteria:

Is it falsifiable? Yes — we could test on cancer patients with controls.
What is the evidence? Anecdotes? Observational studies? Randomised controlled trials?
Is the sample adequate? How many participants? What was the control condition?
Were confounds controlled? Diet, other treatments, disease severity?
What is the effect size? Statistically significant but practically trivial?
Has it been replicated? By independent groups without financial stake?
Was it peer-reviewed? Does it appear in a respected journal or in promotional material?
Are there conflicts of interest? Who funded the study?

An anecdote about one person who took the vitamin and recovered is not evidence in the relevant sense — because people recover from cancer without the vitamin, and we have no way of knowing what would have happened without it.
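The question about effect size deserves a worked example, because statistical significance and practical importance are easy to confuse. The sketch below uses hypothetical summary statistics, invented for illustration, and a simple two-sample z-test. With a very large sample, even a trivially small difference yields a very small p-value.

```r
# Hypothetical summary statistics, invented for illustration.
n_per_group <- 10000  # participants in each arm of the trial
mean_diff   <- 0.5    # difference in mean outcome (treatment minus control)
sd_pooled   <- 10     # common standard deviation of the outcome

# Standardised effect size (Cohen's d).
cohens_d <- mean_diff / sd_pooled

# Two-sample z-test for the same difference.
standard_error <- sd_pooled * sqrt(2 / n_per_group)
z       <- mean_diff / standard_error
p_value <- 2 * pnorm(-abs(z))

cat("Cohen's d =", cohens_d,
    "| z =", round(z, 2),
    "| p =", signif(p_value, 2), "\n")
```

Here the difference is highly significant (p well below .001), yet a standardised effect of 0.05 is far below what is conventionally called a small effect. A significant result alone says nothing about whether the effect matters.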

Evaluating news claims

Headline: “Study shows chocolate improves memory!”

Critical questions for any such claim:

Is the claimed relationship causal or merely correlational? (Studies showing that chocolate eaters have better memory may simply reflect that wealthier people eat more good-quality chocolate and also have better access to education and healthcare.)
Was there a control group?
Was the sample size adequate?
What was the effect size — is the improvement practically meaningful?
Who funded the study? (A study funded by a chocolate manufacturer requires particular scrutiny.)
Has it been replicated independently?
Is the result consistent with the broader body of research?
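The first question, correlation versus causation, can be made concrete with a small simulation. The sketch below generates artificial data under an assumed causal structure in which wealth drives both chocolate consumption and memory, and chocolate has no effect on memory at all. All variable names, the seed, and the effect sizes are assumptions made for illustration.

```r
set.seed(42)  # arbitrary seed so that the sketch is reproducible
n <- 5000

# Simulated data: wealth influences both variables, and chocolate has
# NO direct effect on memory.
wealth    <- rnorm(n)
chocolate <- wealth + rnorm(n)
memory    <- wealth + rnorm(n)

# Naive analysis: chocolate and memory are clearly correlated.
naive_cor <- cor(chocolate, memory)

# Adjusting for the confound: the chocolate coefficient shrinks
# towards zero once wealth is included in the model.
adjusted <- coef(lm(memory ~ chocolate + wealth))["chocolate"]

cat("naive correlation:", round(naive_cor, 2),
    "| adjusted chocolate coefficient:", round(adjusted, 2), "\n")
```

A sizeable correlation appears even though the simulated chocolate does nothing, which is why a headline based on an observational correlation cannot by itself support a causal claim.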

Making personal decisions

Claim: “This quantum healing bracelet balances your body’s energy.”

Applying scientific analysis:

Falsifiability: The claim is vague. “Balancing your body’s energy” does not specify what would constitute evidence of success or failure.
Mechanism: There is no known biological mechanism by which wearing a bracelet could affect health through quantum effects. Quantum effects matter at atomic and sub-atomic scales; nothing in physics suggests that a bracelet could harness them to change the state of a human body.
Evidence: Only testimonials — anecdotes subject to all the biases documented in this tutorial (confirmation bias, placebo effect, regression to the mean).
Red flags: Pseudoscientific vocabulary (“quantum,” “energy balance”) misappropriated from physics; claims only found in alternative-medicine contexts; no peer-reviewed trials.
Conclusion: The claim is not falsifiable as stated and lacks any plausible mechanistic basis. Extremely unlikely to produce the claimed effects.

Designing a linguistics study

Exercise: designing a study on spoken fluency

You want to investigate whether younger and older speakers of English differ in spoken fluency. How would you design this study scientifically?

Consider which of the following approaches would make this a scientific study:

Search for anecdotal reports online about young versus old speakers and compile a list of observations.
Operationalise 'fluency' as a measurable variable (e.g. words per minute, rate of filled pauses, rate of self-corrections); define 'young' and 'old' with specific age ranges; recruit matched groups (controlling for education, health, language background); use a standardised elicitation task; record, transcribe, and code blind (coder does not know participant age); apply appropriate statistical tests; report effect sizes alongside significance tests.
Ask a group of people informally whether they think older or younger speakers are more fluent and report the modal answer.
Record a few conversations with young and old speakers and note your impressions of who seemed more fluent.
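Operationalising fluency, as described in the options above, means turning an impression into numbers that can be computed the same way for every speaker. A minimal sketch in R follows. The transcript, its duration, and the inventory of filled pauses are all hypothetical and chosen only for illustration.

```r
# Hypothetical transcript and duration, invented for illustration.
transcript       <- "well um I think that uh the film was um really good"
duration_seconds <- 6

# Split the transcript into lower-cased tokens.
tokens <- strsplit(tolower(transcript), "\\s+")[[1]]

# Assumed inventory of filled pauses. A real study would define this in
# its coding scheme.
filled_pauses <- c("um", "uh", "er", "erm")

n_pauses <- sum(tokens %in% filled_pauses)
n_words  <- sum(!tokens %in% filled_pauses)

# Two simple fluency measures.
words_per_minute     <- n_words / duration_seconds * 60
pauses_per_100_words <- n_pauses / n_words * 100

cat("words per minute:", words_per_minute,
    "| filled pauses per 100 words:", round(pauses_per_100_words, 1), "\n")
```

Because the measures are defined in code, a second researcher can apply exactly the same definitions to new recordings, which is what makes the resulting comparison replicable.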

Quick Reference

Cognitive biases checklist

When evaluating evidence or making decisions, watch for:

Emotional reasoning — feeling does not constitute fact
Confirmation bias — the tendency to seek only supporting evidence
Poor probabilistic intuition — systematic underestimation of compound and combinatorial probabilities
Pattern-seeking — perceiving agency or causal structure in random co-occurrences
Pareidolia — seeing faces and meaningful forms in ambiguous stimuli
Context effects — perceptions are shaped by prior expectations and surrounding context
Anthropocentric bias — assuming our perceptual experience constitutes rather than filters reality

Logical fallacies to avoid

Ad hominem — attacking the person rather than the argument
Appeal to authority — citing a person’s opinion rather than their evidence
Straw man — misrepresenting an opponent’s position
False dichotomy — presenting only two options when more exist
Slippery slope — asserting an inevitable chain of consequences without evidence
Circular argument — assuming what you are trying to prove
Red herring — introducing an irrelevant topic to avoid the real issue
Sunk cost fallacy — continuing due to prior investment rather than future prospects
Argument from ignorance — treating absence of disproof as proof
Confirmation bias / cherry-picking — selectively reporting supportive evidence

Scientific method summary

1. Observe → 2. Question → 3. Review literature →
4. Hypothesise (H₁ and H₀) → 5. Design → 6. Collect data →
7. Analyse → 8. Conclude → 9. Refine → [Repeat]

Key principles: falsifiable hypotheses; controlled observation; statistical analysis; peer review; replication.

Evaluating claims

Questions to ask of any empirical claim:

Is it falsifiable?
What is the evidence? (Anecdote? Observational study? Randomised trial?)
Was the sample size adequate?
Were proper controls included?
Were confounds addressed?
Is the effect size meaningful?
Has it been independently replicated?
Was it peer-reviewed?
Are there conflicts of interest?
Is it consistent with the broader body of evidence?

Citation & Session Info

Citation

Martin Schweinberger. 2026. Introduction to Quantitative Reasoning: Why We Need Science. The Language Technology and Data Analysis Laboratory (LADAL), The University of Queensland, Australia. url: https://ladal.edu.au/tutorials/quant_intro/quant_intro.html (Version 3.1.1). doi: 10.5281/zenodo.19332884.

@manual{martinschweinberger2026introduction,
  author       = {Martin Schweinberger},
  title        = {Introduction to Quantitative Reasoning: Why We Need Science},
  year         = {2026},
  note         = {https://ladal.edu.au/tutorials/quant_intro/quant_intro.html},
  organization = {The Language Technology and Data Analysis Laboratory (LADAL), The University of Queensland, Australia},
  edition      = {3.1.1},
  doi          = {10.5281/zenodo.19332884}
}

sessionInfo()

R version 4.4.2 (2024-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26200)

Matrix products: default


locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

time zone: Australia/Brisbane
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] cowplot_1.2.0    ggplot2_4.0.2    checkdown_0.0.13

loaded via a namespace (and not attached):
 [1] gtable_0.3.6        jsonlite_2.0.0      dplyr_1.2.0        
 [4] compiler_4.4.2      BiocManager_1.30.27 renv_1.1.7         
 [7] tidyselect_1.2.1    Rcpp_1.1.1          magick_2.8.5       
[10] scales_1.4.0        yaml_2.3.10         fastmap_1.2.0      
[13] R6_2.6.1            commonmark_2.0.0    labeling_0.4.3     
[16] generics_0.1.4      knitr_1.51          htmlwidgets_1.6.4  
[19] tibble_3.3.1        pillar_1.11.1       RColorBrewer_1.1-3 
[22] rlang_1.1.7         litedown_0.9        xfun_0.56          
[25] S7_0.2.1            cli_3.6.5           withr_3.0.2        
[28] magrittr_2.0.4      digest_0.6.39       grid_4.4.2         
[31] rstudioapi_0.17.1   markdown_2.0        lifecycle_1.0.5    
[34] vctrs_0.7.2         evaluate_1.0.5      glue_1.8.0         
[37] farver_2.1.2        codetools_0.2-20    rmarkdown_2.30     
[40] tools_4.4.2         pkgconfig_2.0.3     htmltools_0.5.9

AI Transparency Statement

This tutorial was revised and restyled with the assistance of Claude (claude.ai), a large language model created by Anthropic. All substantive content — examples, explanations, case studies, and reasoning — was retained from the original and reviewed and approved by Martin Schweinberger, who takes full responsibility for the tutorial’s accuracy.
