Monday, February 2, 2026

Post 4 — Why Replication Isn’t Enough: The Evolutionary Trap of Scientific Incentives

Among all the lessons in The Natural Selection of Bad Science, one strikes particularly hard: even strong replication efforts cannot, by themselves, reverse the evolutionary decline in scientific quality.

This is deeply unintuitive. Most scientists agree that the “replication movement” — from the Open Science Framework to large-scale reproducibility projects — is one of the most important reforms of the modern academic era. And yet, according to Smaldino & McElreath’s model, replication, even when well-funded and rigorous, cannot counteract the evolutionary pressures that select for low-effort research.

This article explains why replication fails as a corrective mechanism, what the math shows, and how real-world scientific history tracks the model’s predictions. We will also examine several case studies — from priming research to fMRI social neuroscience to cancer biomarker studies — that illustrate the “replication trap” in action.


1. Why People Think Replication Should Work

Replication seems like the perfect immune system for science.
If a result is false, just replicate it.
If it doesn’t repeat, discard it.

Simple.

But this reasoning assumes:

  1. Replications are common.

  2. Failed replications lead to consequences.

  3. Low-quality labs suffer reputational damage.

  4. The system rewards trustworthy results.

Unfortunately, none of these assumptions holds in practice.

Replication is not the default. Science has no built-in self-correcting machinery.
It has the potential to self-correct, but only under the right pressures — and those pressures are currently too weak.

Smaldino & McElreath quantify this problem and show that:

Replication has only a tiny effect on the evolutionary trajectory of scientific methods unless it is extremely punitive to low-effort labs.

Which it rarely is.


2. The Model’s Logic: Replicators Cannot Compete with Producers

In the model:

  • Some labs specialize in production: quick studies, low effort, high false-positive rate.

  • Other labs specialize in replication: they repeat published studies to check whether the findings hold.

What happens when we simulate a population containing both?

The result: Low-effort producers generate more papers and outcompete replicators.

Replicators:

  • publish less frequently

  • spend more time on confirmations

  • cannot generate flashy findings

  • rarely receive top-tier grants

  • don’t produce sensational media-worthy results

Meanwhile, low-effort producers:

  • publish frequently and visibly

  • get grants

  • train more students

  • create more academic successors

  • dominate institutional resources

If fitness = publication output, then:

Producers reproduce faster than replicators, causing replicators to be outcompeted.

This is much like the way parasitic strategies in nature can overwhelm cooperative ones.
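
To make the race concrete, here is a minimal toy simulation in Python (a sketch in the spirit of the model, not the paper’s actual code; the per-strategy output rates are invented for illustration). Fitness is raw publication count, and offspring labs inherit their parent’s strategy:

```python
import random

# Minimal toy model, not the paper's implementation: fitness is raw
# publication count, and producers publish more per generation.
PAPERS = {"producer": 10, "replicator": 3}  # illustrative output rates

def step(population):
    """One generation: labs reproduce in proportion to papers published."""
    weights = [PAPERS[lab] for lab in population]
    # Offspring inherit the parent's strategy unchanged (no mutation here).
    return random.choices(population, weights=weights, k=len(population))

population = ["producer"] * 50 + ["replicator"] * 50
for _ in range(30):
    population = step(population)

share = population.count("replicator") / len(population)
print(f"Replicator share after 30 generations: {share:.0%}")  # typically 0%
```

With these weights, the replicator share almost always collapses to zero within a few dozen generations. The only way to change the outcome is to change the weights, which is precisely the paper’s point.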


3. Replication Has Almost No Punitive Power in the Real World

The model assumes that failed replications might harm a lab.

But in practice:

  • Replication failures are rarely published in high-impact journals.

  • Original authors face little consequence.

  • Failed replication papers get fewer citations.

  • Journals prefer novel claims over verification.

  • Null results are undervalued.

  • Universities don’t reward replication studies at promotion time.

Even when replication failures happen, authors often:

  • invoke “hidden moderators”

  • claim the field has moved on

  • suggest conceptual misinterpretation

  • publicly dispute the findings

Replication often becomes a public debate, not a correction.

The producer has already extracted career value from their original flashy result.
A failed replication five years later affects nothing.

Thus:

Replication does not reduce the reproductive fitness of low-effort labs.
So low-effort labs continue to grow.


4. The Mathematical Trap: Replication Pressure Is Too Slow

Another key point in the paper is about the time lag:

Low-effort labs can outrun replication

Because:

  • Replications take more time than flashy original studies.

  • Producers generate multiple new papers in the time it takes for one failed replication to emerge.

  • Low-quality labs can pivot quickly to new topics.

  • Replicators remain tied to verifying old problems.

This resembles Red Queen dynamics:

The replicators must run as fast as they can just to stay in place,
while low-effort labs sprint ahead unhindered.


5. Real-World Case Study #1: Social Priming

Few fields provide a better illustration of this dynamic.

Early 2000s psychology was full of:

  • very small sample sizes

  • flexible analysis pipelines

  • researcher degrees of freedom

  • surprising “cute” findings

Classic examples:

  • priming people with words related to old age makes them walk slower

  • holding a warm cup makes you judge people as kinder

  • thinking about money makes you less social

These studies were published because:

  • they were novel

  • statistically significant (p < 0.05)

  • quick to run

  • highly publishable in top journals

Replication attempts began years later.

By then:

  • Many of the original authors had built entire careers

  • The most famous results appeared in textbooks

  • High-impact journals resisted null replications

  • Tenure committees didn’t care about replication failures

Even after the field-wide replication crisis, many original researchers insisted the failures were due to:

  • cultural differences

  • subtle context shifts

  • experimenter effects

  • conceptual misunderstanding

This is exactly what Smaldino & McElreath’s model predicts:
the producers had already won the evolutionary race.

The immune system activated too late.


6. Case Study #2: fMRI Social Neuroscience and “Dead Salmon” Problems

In 2009, Bennett et al. famously showed that an fMRI analysis pipeline detected “brain activity” in a dead salmon.
The lesson: without rigorous correction for multiple comparisons, false positives run rampant.

Did this humiliation lead to the downfall of low-effort fMRI studies?
Not really.

  • Labs kept publishing underpowered fMRI studies.

  • Multiverse analysis showed high false discovery rates.

  • The average fMRI sample size remained too small for years.

  • Replication attempts were rare and underfunded.

Why?
Because flashy fMRI studies:

  • made headlines

  • generated TED Talks

  • attracted major grants

  • produced visually compelling brain images

Replicators — who were slower and less flashy — were selected against.


7. Case Study #3: Cancer Biomarker Research

Cancer biomarker research is notorious for irreproducibility: most published candidate biomarkers never survive independent validation. In one famous case, reported in Nature in 2012, Amgen scientists could confirm only 6 of 53 landmark preclinical cancer studies.

And yet:

  • The field did not collapse.

  • Labs continued producing low-quality biomarker studies.

  • Replication studies were not rewarded.

  • Novel positive results dominated publication incentives.

Companies and journals prefer exciting claims:

“New blood biomarker predicts cancer risk!”

—even if statistically flawed.

This creates the exact ecological environment where low-effort labs thrive.


8. Replication is Not Evolutionary Pressure — It is Ecological Feedback

A key conceptual error many scientists make is assuming replication will automatically shape behavior.

But in evolutionary terms:

  • Replication is post-hoc ecological feedback.

  • Evolutionary selection is determined by reproductive success.

If failed replication does NOT affect a lab’s reproduction (its ability to secure students, grants, jobs, tenure), then:

Replication has no power as a selective force.

For replication to matter evolutionarily, two things must happen:

(1) Failed replication must be strongly punished

– loss of grants
– loss of prestige
– loss of student recruitment
– slowing of lab growth

(2) Successful replication must be rewarded

– career advancement
– grant funding
– hiring and promotion credit
– institutional prestige

But the current system does none of this.

Thus, as the paper says:

“Replication alone will have little effect unless it affects the differential reproduction of labs.”

In plainer terms:

Scientists must lose by producing bad science, not merely be embarrassed by it.
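
One way to see this condition is to fold a replication penalty directly into a toy fitness function. All numbers below are assumptions chosen for illustration, not values from the paper:

```python
# Toy fitness comparison (numbers are illustrative, not from the paper):
# replication is a selective force only if exposed failures cost fitness.

def fitness(papers, failed_replications, penalty):
    """Fitness = publication output minus a penalty per exposed failure."""
    return papers - penalty * failed_replications

low_effort = {"papers": 20, "failed_replications": 6}
high_effort = {"papers": 8, "failed_replications": 1}

for penalty in (0.0, 1.0, 3.0):
    f_low = fitness(penalty=penalty, **low_effort)
    f_high = fitness(penalty=penalty, **high_effort)
    winner = "low effort" if f_low > f_high else "high effort"
    print(f"penalty={penalty}: low={f_low}, high={f_high} -> {winner} wins")
```

A penalty of zero, which approximates the current system, makes reliability invisible to selection; only a steep penalty per exposed failure lets the high-effort lab win.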


9. Why Journals Defeat Replication

Even if replicators do their job perfectly, journals undermine their effect.

Replications are not glamorous

Science incentives promote “impact,” not verification.

Replication studies:

  • have lower citation potential

  • rarely produce new mechanisms or theories

  • do not attract media coverage

  • are harder to publish in top journals

Editors prefer:

  • breakthroughs

  • paradigm shifts

  • counterintuitive findings

  • novel experimental paradigms

This creates an asymmetry:

False positives have many outlets.
False negatives have few.

And asymmetry drives evolution.


10. Replication in Other Fields: A Historical View

The replication trap is not new.
It’s just more visible now.

A few examples:

Classical Anthropology

Margaret Mead’s controversial findings on Samoan adolescent sexuality were criticized by later ethnographers — but the replication attempts did not erase Mead’s influence.

Economics

  • Reinhart & Rogoff’s paper on national debt thresholds was undermined when a replication uncovered a spreadsheet error and selective data choices.

  • Yet the original paper shaped global austerity policy for years.

Replication came too late.

Nutritional Epidemiology

Contradictory diet studies appear weekly.

Nobody replicates them because:

  • replication is expensive

  • null findings are unpublishable

  • dietary questionnaires are unreliable

  • flashy claims drive media coverage

The field evolves based on visibility, not reliability.


11. The Deeper Evolutionary Lesson

Replication is vital for truth — but weak for evolution.

Evolution does not reward truth-seeking.
It rewards success.

If the system rewards:

  • speed

  • quantity

  • novelty

  • media visibility

…then evolution will select for labs that maximize those traits.

Replication cannot stop this any more than the occasional predator stops a rapidly multiplying prey species — unless predation is intense and targeted.

This is the “evolutionary trap” of scientific incentives.


12. Can Replication Ever Work as a Corrective Force?

Yes — but only under certain extreme conditions:

1. Replications must be common.

(e.g., 10–20% of published studies should be replications)

2. Failed replications must have major career consequences.

(denial of grants, loss of institutional credibility)

3. Replicators must receive strong institutional and financial rewards.

4. Journals must give equal prestige to replications and novel findings.

5. Funding agencies must incentivize adversarial replication.

6. Pre-registration and transparency must be standard.

These policies would change the evolutionary calculus.

Labs that produce unreliable work would:

  • lose funding

  • lose recruits

  • decline in prestige

  • shrink

  • eventually disappear

Labs that produce reliable work would:

  • survive

  • reproduce

  • shape the next generation

Only then does replication become an evolutionary force.


13. Conclusion: Replication is Necessary — but Not Sufficient

Replication is essential for a healthy scientific ecosystem.

But it is not enough.

The model shows — and history affirms — that:

  • Replicators cannot win an evolutionary race against low-effort producers.

  • Replication pressure is slow, weak, and rarely punitive.

  • The incentive structure protects flashy producers.

  • Failed replications seldom harm careers.

  • Replications themselves are under-incentivized.

The core insight:

Replication cleans up messes, but does not prevent them.
Only incentive reform can prevent their creation.

In the next post, we will explore how scientific fields have historically collapsed under their own incentive structures — and what they teach us about the future.

Sunday, February 1, 2026

Post 3 — The Maths of Misaligned Incentives: Modeling the Evolution of Bad Science

In the first two posts, we discussed a striking proposition: modern scientific practices evolve under natural selection. Labs that publish more quickly have higher “fitness,” and traits that boost publication rates — even if they weaken the reliability of findings — spread across generations of scientists.

But how do we formalize such a claim?

Smaldino & McElreath didn’t simply rely on anecdotes or intuition. They constructed mathematical and computational models to test how scientific cultures evolve under different incentives. The results were stark:

When productivity (number of publications) is rewarded over accuracy, scientific rigor inevitably declines.

This decline isn’t slow. It’s rapid, predictable, and difficult to avoid without deliberate counter-selection.

In this post, we explore:

  • the core structure of their model,

  • how “lab traits” are encoded mathematically,

  • how selection pressure is simulated,

  • why low methodological rigor outcompetes high rigor,

  • and how these results mirror real-world science.

We’ll also use examples from population genetics, epidemiology, and cultural evolution to illustrate why the model behaves the way it does.


1. Why Model Science at All?

Science is complicated. Labs vary enormously:

  • Some are huge industrial-scale operations.

  • Others are tiny one-person groups.

  • Some fields can run 200 experiments per year.

  • Others take 2 years for a single dataset.

So why did the authors model scientific evolution using simplified assumptions?

Because models clarify causality.

Real scientific cultures are messy — but the underlying logic of incentives is simple. By stripping away the noise, the model reveals a core truth:

Even with well-intentioned scientists, a system that rewards quantity will select for low-quality methods.

This is the same reason evolutionary biologists model complex ecosystems using simple genetic or ecological equations: clarity emerges from abstraction.


2. The Core Entities of the Model: Labs, Traits, and Fitness

The model treats labs as the evolving unit — not individual scientists.

Each lab is defined by three key traits:

  1. Effort (E)
    How much methodological rigor the lab applies (large samples, careful controls, slow pace).
    Higher effort → lower false positives but fewer studies per year.

  2. Power (W)
    The statistical power of research produced by the lab.
    Higher power → more true discoveries, but more expensive studies.

  3. Replication Rate (R)
    How often the lab attempts to replicate existing findings.

Each trait comes with a cost:

  • Higher power → fewer total studies

  • Higher effort → even fewer studies

  • Higher replication → fewer original publications

Meanwhile, the scientific environment rewards total publication count, not accuracy.

That reward system is encoded into a fitness function.
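
Before turning to the fitness function, here is a minimal sketch of how a lab and its trait costs might be encoded. The field names follow the post (E, W, R); the cost constants are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class Lab:
    """One lab in the evolving population. Field names follow this post
    (effort E, power W, replication rate R); cost constants are illustrative."""
    effort: float       # E: methodological rigor, in [0, 1]
    power: float        # W: statistical power, in [0, 1]
    replication: float  # R: fraction of studies that are replications

    def studies_per_year(self, budget: float = 100.0) -> float:
        # Raising any trait makes each study costlier, cutting throughput:
        # the trade-off that drives the whole model.
        cost_per_study = 1.0 + 2.0 * self.effort + 3.0 * self.power
        return budget / cost_per_study

careful = Lab(effort=0.9, power=0.8, replication=0.3)
hasty = Lab(effort=0.1, power=0.2, replication=0.0)
print(careful.studies_per_year(), hasty.studies_per_year())  # about 19 vs 56
```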


3. Fitness = Publication Count × Reward Structure

A lab’s fitness in the model is determined by:

  • the number of studies it runs,

  • the probability each study yields a publishable result,

  • the reward associated with each result
    (original positive > replication > null results).

Crucial point:

Positive results almost always produce higher rewards than nulls, regardless of truth.

Thus, labs face pressure to:

  • increase throughput,

  • maximize positive outcomes,

  • minimize time spent on replications,

  • avoid costly, high-effort research.

The fitness function creates a landscape where low effort and low power produce evolutionary advantages.
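
Here is one way to sketch that landscape in code, under an assumed reward scheme in which original positives pay the most and nulls pay almost nothing (none of these numbers come from the paper):

```python
# Sketch of an expected-fitness calculation under an assumed reward scheme
# (original positive > replication > null); no numbers here are from the paper.
REWARD = {"positive": 1.0, "replication": 0.5, "null": 0.1}

def expected_fitness(n_studies, p_positive, frac_replication=0.0):
    """Expected annual payoff: originals yield a positive or a null;
    replications earn a flat, lower reward (a simplification)."""
    originals = n_studies * (1 - frac_replication)
    replications = n_studies * frac_replication
    payoff = p_positive * REWARD["positive"] + (1 - p_positive) * REWARD["null"]
    return originals * payoff + replications * REWARD["replication"]

print(expected_fitness(n_studies=40, p_positive=0.30))  # fast, sloppy lab: ~14.8
print(expected_fitness(n_studies=10, p_positive=0.50))  # slow, careful lab: ~5.5
```

The fast, sloppy lab earns nearly three times the expected payoff of the careful one, even though a far larger share of its positives are false.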


4. How Do Labs “Reproduce”?

Just as organisms with more offspring spread their genes, labs with high fitness:

  • train more students,

  • place more postdocs in new positions,

  • expand into new subfields,

  • receive more grant money,

  • split or spawn new labs.

In the simulation, “offspring” labs inherit their parent’s traits with slight mutations. This mirrors real-world mentorship:

A student leaves the lab carrying its habits — sample size norms, statistical techniques, even its attitudes toward novelty vs. rigor.

Over generations, labs that publish more (even unreliably) produce more academic descendants.
High-rigor labs produce fewer.

Thus, traits that maximize publication count spread.
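
The inheritance step itself is simple to sketch: an offspring lab copies its parent’s traits, each perturbed by a small random mutation (the mutation scale below is an assumption):

```python
import random

def spawn(parent_traits, mutation_sd=0.02):
    """Offspring lab: inherit the parent's traits with small Gaussian
    mutations, clipped to [0, 1]. A sketch of the inheritance step."""
    return {trait: min(1.0, max(0.0, value + random.gauss(0.0, mutation_sd)))
            for trait, value in parent_traits.items()}

parent = {"effort": 0.7, "power": 0.8, "replication": 0.2}
print(spawn(parent))  # e.g. {'effort': 0.69, 'power': 0.82, 'replication': 0.21}
```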


5. The Model’s Heart: The Trade-Off Between Power and Productivity

Scientific studies have costs.

A high-power study (e.g., N = 200) provides robust statistical inference.
But it is expensive.

A low-power study (e.g., N = 10) is cheap and fast.

In the model:

  • Low power = many studies per year.

  • Many studies = more chances at positive results.

  • Positive results = more publications.

  • More publications = higher evolutionary fitness.

The logic is evolutionary dynamite.

Mathematically, the expected number of publishable results for a lab is:

Expected Positives = Number of Studies × P(Significant Result)

But P(Significant Result) includes:

  • true positives (depends on power & reality)

  • false positives (depends on effort & statistical standards)

This means:

Low power buys more studies → more positive results (true and false) → more publications → higher fitness.

This is the central paradox of modern science:

  • Low-power labs produce less reliable results,
    but

  • Low-power labs produce more publishable results.

Thus low power is selected for.
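
Plugging illustrative numbers into that identity shows the trap directly. Assume a 10% base rate of true hypotheses, α = 0.05, and a fixed budget that buys five cheap studies for every expensive one:

```python
# Worked numbers for the identity above, assuming a base rate of true
# hypotheses of 10%, alpha = 0.05, and a budget that buys more studies
# at lower power (all values are assumptions).
base_rate, alpha = 0.10, 0.05

def expected_positives(n_studies, power):
    p_sig = base_rate * power + (1 - base_rate) * alpha  # true + false positives
    return n_studies * p_sig

print(expected_positives(n_studies=50, power=0.2))  # low power:  ~3.25 per year
print(expected_positives(n_studies=10, power=0.8))  # high power: ~1.25 per year
```

The low-power lab publishes 2.6 times as many positives per year, yet 2.25 of its 3.25 expected positives are false, versus 0.45 of 1.25 for the high-power lab.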


6. Replication in the Model: A Weak Immune System

The model incorporates replication attempts. But there are incentives against replication:

  • Replications earn less prestige.

  • Replications of null results rarely get published.

  • Labs conducting replications lose precious time.

Mathematically, replications have:

  • lower reward,

  • lower impact,

  • higher cost.

Thus in the simulation:

  • Replication rates evolve downward.

  • Fields drift toward poor self-correction.

Even when replication is present, it cannot overcome the selection pressure toward low power unless the rewards for replication are radically increased.

This reflects real-world patterns:

  • Replications are rare in psychology, biology, and ecology.

  • Journals routinely reject replications.

  • Funding agencies rarely support them.

Replication becomes evolutionarily disadvantageous.


7. The Model’s Results: Decline Is Inevitable Under Current Incentives

After many generations in the simulation, labs evolve toward:

Low effort

(minimal methodological rigor)

Low power

(maximal throughput)

Low replication

(minimal time spent correcting errors)

This is not just one possible outcome. It is the stable outcome of the system.

The authors show that:

  • Even if all labs begin as high-quality, high-effort, high-power groups…

  • Evolutionary pressure rapidly degrades methods.

  • The average false-positive rate skyrockets.

  • Replication does not save the system.

  • High-quality labs go extinct (outcompeted by low-quality ones).

This reflects reality across many fields where:

  • underpowered studies dominate,

  • novelty outruns replication,

  • flashy claims outnumber reliable findings.

The model formalizes what many scientists intuitively observe.
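
The full dynamic fits in a few lines. The toy loop below (a sketch in the paper’s spirit, with invented cost and mutation constants) starts every lab at high effort and lets publication-weighted reproduction do the rest:

```python
import random

# Toy generational loop in the paper's spirit (cost and mutation constants
# are invented): effort cuts output, fitness is output, effort decays.
def papers(effort, budget=100.0):
    return budget / (1.0 + 4.0 * effort)  # illustrative cost of rigor

def generation(efforts):
    weights = [papers(e) for e in efforts]
    parents = random.choices(efforts, weights=weights, k=len(efforts))
    return [min(1.0, max(0.0, e + random.gauss(0.0, 0.02))) for e in parents]

efforts = [0.9] * 100  # every lab starts out rigorous
for _ in range(200):
    efforts = generation(efforts)
print(f"Mean effort after 200 generations: {sum(efforts)/len(efforts):.2f}")
# Typically collapses toward 0 even though all labs began at 0.9.
```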


8. Real-World Parallels: Why the Model Matches Reality

(a) Population Genetics Parallel

Low-power labs resemble advantageous alleles:

  • They reproduce more,

  • Their “offspring” spread,

  • They take over the population.

High-power labs resemble disadvantageous alleles:

  • They reproduce less,

  • They gradually disappear.

(b) Epidemiology Parallel

False positives behave like infectious agents:

  • They spread rapidly,

  • Transmission is easy,

  • Labs are susceptible hosts.

Replication is like an immune response:

  • Slow,

  • Underfunded,

  • Often neutralized by social pressures.

(c) Cultural Evolution Parallel

Just as religious rituals or political ideologies spread when they confer social benefits, bad scientific methods spread when they confer career benefits.

The parallels are mathematically and conceptually tight.


9. Why Good Intentions Do Not Change the Outcome

A key contribution of the paper is dispelling a myth:

Bad science is not the result of bad intentions.

Even if every scientist:

  • wants to uncover truth,

  • values rigor,

  • disapproves of p-hacking,

…the system pushes them toward:

  • cutting corners,

  • reducing sample sizes,

  • publishing prematurely,

  • avoiding replications.

This is why the authors emphasize:

“Good science will not survive unless good scientists are rewarded for doing good work.”

The mechanism is evolutionary, not ethical.


10. When Does Good Science Win? Rare but Possible

The authors simulated strong reforms:

  • high rewards for replication,

  • penalties for false positives,

  • mandatory methodological standards.

Under these artificial conditions, high-effort, high-power labs flourish.

This reflects real-world domains like:

  • particle physics (high standardization)

  • genomics (large collaborative consortia)

  • mathematics (proof-based verification)

These fields have strict norms that counterbalance productivity incentives.

The lesson is optimistic:

If we change the incentive landscape, evolution will favor good science.

But until then, decline is inevitable.


11. Conclusion: The Mathematics Make It Clear — Incentives Shape Evolution

Smaldino & McElreath’s model shows that:

  • Publication incentives are misaligned with truth-seeking.

  • Natural selection acts on labs based on those incentives.

  • Low rigor and low power spread through academic “lineages.”

  • Replication is too weak to stop this trend.

  • Only structural reforms can reverse the decline.

This is evolutionary theory applied not to organisms, but to epistemic culture.

The model is not meant to be a perfect mirror of science, but a lens to reveal its hidden structure. And it reveals something stark: we have built an evolutionary environment that selects against good science.

In the next post, we examine a critical question:

Why do bad methods spread faster than good ones — even when scientists know they’re bad?

This will take us deeper into incentives, lab sociology, and the dynamics of statistical shortcuts.

Saturday, January 31, 2026

Post 2 — Evolutionary Theory Meets Academia: How Selection Shapes Scientific Methods

In the first post of this series, we introduced a provocative idea: modern science is not just struggling — it is evolving in a direction that selects for weak, unreliable methods. This idea, central to Smaldino & McElreath’s influential 2016 paper The Natural Selection of Bad Science, rests on a powerful metaphor:

Scientific labs behave like organisms in an evolutionary system.
The methods they use are traits.
Their students are their progeny.
Publications are their fitness.

This metaphor isn’t just poetic. It is a rigorous conceptual framework allowing us to explain why poor research practices spread even when individuals have good intentions.

This post explores that evolutionary lens:

  • Why does academia behave like an ecosystem?

  • How do methods “reproduce”?

  • What is the unit of selection?

  • How do labs evolve over time?

  • Why do certain practices become dominant while others vanish?

We will also use stories from the history of science — from Darwin’s own notebooks to Thomas Kuhn to modern lab politics — to illustrate how culture, training, and incentives act as evolutionary forces.


1. The Central Analogy: Labs as Lineages

Smaldino & McElreath begin with a simple but profound observation:

Scientific practice is not created anew by each generation. It is inherited.

A PhD student or postdoc absorbs:

  • their advisor’s habits,

  • their lab’s methodological norms,

  • statistical preferences,

  • attitudes toward replication,

  • openness to data sharing,

  • norms about p-values,

  • and even meta-level beliefs about what “counts” as good science.

When these trainees move on to establish their own labs, they carry those inherited traits with them, modifying them slightly, recombining them with influences from other labs, but largely preserving the lineage.

This is cultural evolution — a well-studied field — but applied here to scientific methodology.

Examples of inherited scientific culture

  • The Cold Spring Harbor molecular biology lineage, which proliferated through shared summer courses and collaborative DNA work.

  • The Copenhagen School of quantum mechanics, where Bohr’s philosophical stance became a transmittable “method” for thinking about physics.

  • The Chicago School of economics, where rational-choice modeling spread through mentorship and institutional prestige.

Students didn’t just learn theories — they inherited methods, priorities, and epistemic values.


2. What Exactly Is Being Selected? “Traits” in Scientific Lineages

Traits = Research practices.

Examples include:

  • Sample sizes

  • Statistical thresholds

  • Willingness to preregister

  • Commitment to replication

  • How aggressively a lab chases significant results

  • The balance between carefulness and productivity

  • Whether negative results are ever written up

  • How much time is spent refining experiments

These traits are not innate; they are learned.

And crucially — some traits boost short-term success at the expense of long-term reliability.

This immediately sets up a tension:

  • Rigor is slow and costly.

  • Speed produces more publishable results.

  • Publishing more results increases grant success.

  • Therefore, speed boosts evolutionary fitness, even if it lowers rigor.

This is exactly the type of trade-off natural selection thrives on.


3. The Unit of Selection: The Lab, Not the Individual

Scientists often think of academic success as individual —
Scientist X wins awards, publishes papers, secures grants.

But the evolutionary view shifts the focus:

The lab or research group is the unit of selection.

Why?

Because:

  • Labs recruit students.

  • Those students carry the lab’s practices elsewhere.

  • Successful labs breed more “descendants.”

  • Practices are copied and transmitted through mentoring lineages.

A lab that produces many successful students spreads its methods faster.
A lab that produces few students leaves little evolutionary footprint.

This is why certain laboratory cultures — good or bad — propagate with surprising persistence.

A historical anecdote

In early molecular biology, the Watson–Crick style of rapid, intuitive model-building spread widely, while Linus Pauling's more hierarchical and chemistry-heavy style faded.
Not because the former was inherently better, but because the labs carrying it produced more trainees in a rapidly expanding field.

Methods spread because trainees spread.


4. Selection Pressure: Publication Counts as Fitness

In biological evolution, fitness = reproductive success.
In academic evolution, fitness = publication success.

The publication record determines:

  • Who gets grants

  • Who gets tenure

  • Whose students get jobs

  • Who attracts new students

  • Which labs grow, split, and reproduce

Thus, labs with traits that maximize publication numbers reproduce more successfully in the academic ecosystem.

These traits may include:

  • running many small-N studies

  • chasing p < 0.05 results

  • favoring novelty over accuracy

  • avoiding replications

  • presenting exploratory findings as confirmatory

  • inflating claims

  • streamlining the path to publication

In other words, questionable research practices (QRPs) increase fitness.

This is not a moral accusation.

It is an evolutionary prediction.


5. Cultural Evolution in Action: Famous Examples

Example 1: Mendel vs. Fisher — Methodological Divergence

Mendel’s experiments are sometimes criticized for being too perfect.
Fisher famously argued that Mendel’s ratios were suspiciously close to theoretical expectation, suggesting bias or over-tidied data.

But more interesting is how Mendel’s meticulous methods did not spread.
His successors were not trained in his exacting style, and the field evolved into very different methodological norms.

Why?
Because the evolutionary environment around genetics changed.
Speed of discovery mattered more than perfection.

Example 2: The Replication Crisis in Psychology

For decades, psychology labs that produced:

  • surprising effects

  • clever paradigms

  • small studies

  • publishable results

…thrived.
These labs trained many students.

Meanwhile, labs that insisted on:

  • high power

  • robust replication

  • slow, careful experimentation

…produced fewer papers and trained fewer students.

Over time, the field evolved toward flashy, unreliable results.

Example 3: Biomedical “breakthrough culture”

Preclinical cancer biology is notorious for irreproducibility.
Amgen reported in 2012 that they could reproduce only 6 of 53 “landmark” studies.

Why?
Because the labs producing “breakthroughs” got the funding.
They reproduced themselves.
Their methods spread.

Labs doing slow, confirmatory research did not grow.

Evolutionary selection at work.


6. Why Good Practices Often Lose

In evolution, “good” = survival-enhancing, not morally good.

In academia:

  • High-power studies (good for truth)
    require money, time, effort.

  • Low-power studies (bad for truth)
    allow more experiments → more publications.

Thus:

Low power has higher fitness than high power.

This single insight explains much of the replication crisis.

Bad methods win not because they are bad, but because they:

  • are cheaper,

  • produce publishable results faster,

  • generate more “offspring labs.”

Rigor is selected against.

This parallels biological evolution:

  • Peacocks evolve burdensome tails because sexual selection rewards flashiness.

  • Labs evolve burdensome statistical habits because academic selection rewards flashiness.


7. Transmission: How Practices Spread Through Academic Pedigrees

Humans are cultural animals.
We copy behaviors with high social payoffs.
Science is no different.

The main transmission pathways:

  1. Advisor → student

  2. Collaborator → collaborator

  3. Postdoc → new institution

  4. Hiring committee → new faculty (selecting for “productive” candidates)

  5. Grant panel → funded lab

This is similar to the transmission of:

  • languages

  • tool use in primates

  • religious norms

  • social rituals

  • craft techniques

Scientific methodology is a cultural artifact.


8. Drift, Mutation, Selection — All Present in Science

Smaldino & McElreath’s insight allows us to map biological evolutionary features directly onto academia.

Mutation: method innovations

Statistical innovations (Bayesian models, preregistration) are mutations.
Some spread, some die off.

Drift: accidental shifts

A charismatic advisor or influential journal editor can cause random swings in norms, independent of method quality.

Selection: survival of the most publishable

Traits that maximize output proliferate.


9. A Closer Look: Why Evolutionary Thinking Helps Us Understand Scientific Decline

The evolutionary perspective clarifies several mysteries:

❓ Why do bad practices persist even though everyone agrees they are harmful?

➡ Because they increase fitness under current incentives.

❓ Why don’t reforms (e.g., “use bigger samples”) stick?

➡ Because selection pressure overrides idealistic norms.

❓ Why do some labs produce generations of similarly unreliable work?

➡ Because success breeds reproduction of method lineages.

❓ Why do fields diverge so much in reliability?

➡ Because each field has a different ecological niche:

  • neuroscientists face expensive data collection → chronic low power

  • social psychologists have flexible experiments → high p-hacking incentives

  • physics has strong mathematical constraints → low methodological drift

Evolutionary environments differ.


10. Conclusion: Science Evolves — but Who Directs That Evolution?

Viewing science as an evolutionary system reveals something uncomfortable:

We have created a selection environment in which poor methods thrive.

This doesn’t mean the scientists themselves are bad.
It means the system rewards the wrong traits.

Until we redesign the incentive structure, science will continue to evolve toward:

  • more questionable practices

  • more flashy but unreliable findings

  • lower average rigor

  • higher false positive rates

  • declining public trust

In the next post, we’ll dive into the mathematical and computational models that Smaldino & McElreath use to demonstrate how bad science wins under current selection pressures.

Friday, January 30, 2026

Post 1 — The Crisis Beneath the Lab Coat: Why “Bad Science” Evolves

In 2016, Paul Smaldino and Richard McElreath published a striking and uncomfortable paper: “The Natural Selection of Bad Science.” It argues that science is not just failing in isolated pockets — it is evolving in a direction that systematically favors poor practices.

This is not because scientists are bad people. It’s because scientists are people inside an environment that rewards speed, flashiness, and positive results, regardless of whether those results are true.

This first post in our deep-dive series introduces the idea that bad science evolves, just like biological traits do — not through malice, but through selection pressures.


1. What Exactly Is “Bad Science”?

“Bad science” doesn’t necessarily mean fraudulent science or outright misconduct. It means science that:

  • uses underpowered studies,

  • relies on weak statistical methods,

  • employs p-hacking,

  • selectively reports only significant results, and

  • rarely replicates findings.

Such science can be performed by well-meaning researchers simply trying to survive in an academic ecosystem designed around publish-or-perish.

The replication crisis

The 2015 Open Science Collaboration attempted to replicate 100 psychology findings; only about 36% replicated. Similar failures have been seen in cancer biology and economics.

When replicability fails, it’s a sign that science is producing too many false positives, and doing so systematically.


2. Why Are False Positives So Common?

False positives arise naturally from noise, but the modern scientific ecosystem amplifies them.

The incentives look like this:

  • Publish flashy results quickly → grants, tenure, fame

  • Take years to do a careful, high-power study → very few rewards

  • Publish a null result → often impossible

  • Do a replication → actively discouraged

This creates a pressure cooker in which the quickest way to generate publishable results is simply to lower methodological standards.

As Ioannidis famously argued in 2005, “Most published research findings are false” — not because scientists are bad, but because the system selects for false-positive-generating behavior.


3. Historical Anecdotes: When Incentives Tilt, Science Skews

The ESP Debacle

In 2011, psychologist Daryl Bem published a paper suggesting students could predict future events — ESP.
The methods were weak and statistically tortured, but the findings were novel and surprising. So they were published in a prestigious journal.

Why? Because novelty sells, even if the methods are flimsy.

The Brian Wansink “P-hacking Factory”

Wansink, a Cornell researcher, ran a social-nutrition lab famous for headline-grabbing results (“People eat more soup from self-refilling bowls!”).
His emails later revealed systematic data dredging — not fraud, but a culture where “find something publishable” trumped rigor.

These stories illustrate the paper’s thesis: labs that produce lots of positive results prosper, even if the results are fragile.


4. Smaldino & McElreath’s Insight: Science Evolves Like a Darwinian System

Here’s the key insight of the paper:

Research methods are transmitted culturally through labs, and labs that publish more quickly produce more “descendant” labs.

Just as biological traits that increase reproductive fitness spread, research behaviors that increase publication output spread — regardless of whether they uncover truth.

This is the heart of the argument.

Labs = organisms

With traits such as:

  • sample size norms

  • statistical approaches

  • replication habits

  • degree of rigor

Students & postdocs = progeny

They carry the lab’s practices to new institutions.

Publication success = reproductive fitness

Thus, science becomes an evolutionary system — and not a benign one.

If quick-and-dirty methods generate more papers per year, they become dominant in the population of labs. Over decades, methodological deterioration becomes inevitable.


5. The Model: Why Low-Power Science Wins

Smaldino & McElreath built computational models to test this idea. The models show that:

  • Labs that use low sample sizes can run more studies.

  • More studies = more chances for false positives (“significant results”).

  • Significant results = publications.

  • Publications = hiring, tenure, grants.

  • Successful labs produce more trainees → spreading their methods.

In evolutionary terms:

Low-rigor labs have higher fitness.

This is an uncomfortable conclusion.
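
A back-of-envelope calculation makes the logic tangible. Every number below is an assumption chosen for illustration:

```python
# Back-of-envelope illustration (every number is an assumption): a fixed
# annual budget buys many small studies or a few large ones.
alpha, base_rate = 0.05, 0.10  # significance threshold, true-hypothesis rate

labs = {"small-N lab": (60, 0.2), "large-N lab": (12, 0.9)}  # (studies, power)

for name, (n_studies, power) in labs.items():
    p_pos = base_rate * power + (1 - base_rate) * alpha
    pubs = n_studies * p_pos
    false_share = (1 - base_rate) * alpha / p_pos
    print(f"{name}: ~{pubs:.1f} positive papers/yr, {false_share:.0%} false")
# small-N lab: ~3.9 positive papers/yr, 69% false
# large-N lab: ~1.6 positive papers/yr, 33% false
```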

Is this really how academia works?

Yes — and you can see it empirically.

  • Neuroscience has a median statistical power around 20–30%.

  • Ecology has chronically tiny sample sizes.

  • Biomedical research repeatedly fails in pharmaceutical replication checks (Amgen, Bayer).

These are not failings of individuals — they are signs of evolutionary pressure.


6. Why Replication Fails as Quality Control

Replication is supposed to act like the immune system of science. But it almost never does.

Why?

  • Replications are expensive.

  • Replications are discouraged.

  • Journals often reject replication papers.

  • Senior scientists sometimes retaliate against negative replications.

Thus, poor methods do not get “punished.”
Instead, they persist and propagate.

The model shows that unless replication is made incredibly common and highly rewarded, it cannot counteract the evolutionary drift toward bad science.


7. What This Means for the Future

If left unchanged, the system will continue to evolve toward:

  • lower power

  • higher false positive rates

  • more irreproducible results

  • faster publication cycles

  • increased pressure on young scientists

  • widening gap between published claims and reality

The paper is a warning: We are selecting for the worst kinds of science.

Unless incentives change, good methodology will go extinct in many fields.


Conclusion: A Crisis of Evolution, Not of Ethics

Smaldino & McElreath force us to confront a difficult truth:

The decline in scientific rigor is not caused by bad people, but by a bad system.

Science is evolving — and not toward greater reliability.

But evolution is not destiny.
In later posts, we’ll explore how to redesign incentives so that good science becomes the winning strategy again.


Thursday, January 29, 2026

Research as a Feeling: What Science Actually Feels Like

When we talk about research, we often describe it as a method, a discipline, a set of rules. We talk about protocols, replication, peer review, statistical significance.

But beneath the structure—beneath the grants and the deadlines and the unsolved problems—research is something far more intimate. It is a feeling.

It’s the pulse that scientists across history have recognized even when their worlds, tools, and fields were vastly different. Whether it was Rosalind Franklin staring down the helical shadows on her X-ray diffraction plate or Ramanujan scribbling mathematical visions in the early morning hours, that feeling—restless, luminous, stubborn—has always been the real engine of discovery.

And that is the feeling captured in this poem:


Research as a Feeling

(Original Poem)

Research is not a task,
not truly—
it is the thrum beneath the ribs,
the quiet electricity
that wakes before you do.

It begins as a tremor,
a question so small it barely casts a shadow,
yet it rearranges the furniture
of your mind.

It is the warm ache
of finding a clue at midnight,
the way hope curls inside the chest—
soft, persistent—
like a creature learning to breathe.

It is frustration, too:
a slow-burn hunger,
a door that will not open
no matter how many keys you forge.
But even then, the door glows,
and you keep walking back to it.

Research is the feeling of standing
at the edge of a forest
where every leaf whispers a secret
you almost understand.
It is the echo of “almost”
that pulls you deeper.

It is falling in love
with the unseen,
with the possibility that truth
is a shape you can hold
if you learn how to cup your hands
just right.

It is the moment the data shifts
like dawn finding a window—
a clarity so sudden
you forget to breathe.

And then,
quietly,
you begin again.


The Poem, Explained Through the Lives of Scientists

Let’s walk through the poem with real scientific stories that show research not as a career—but as an emotional landscape.


“The thrum beneath the ribs… the quiet electricity that wakes before you do.”

Marie Curie used to say that she was often awake long before the sun, thinking about radium. She once admitted to a friend that the excitement of possibility made her feel “physically restless.”

For her, science wasn’t a job. It was physiological. A heartbeat. An electrical hum.

Many scientists recognize this: the feeling of waking up with a question already pressing against the mind. The poem opens by naming that sensation.


“A question so small it barely casts a shadow… yet it rearranges the furniture of your mind.”

Charles Darwin’s entire life was changed by one small, almost inconspicuous question:
“Why do species vary from island to island?”

It wasn’t a grand philosophical inquiry at first. It was a tiny observation—finch beaks differing slightly across the Galápagos. But that small question shifted the mental architecture of biology forever.

Research often starts this way: a faint itch in the brain that slowly becomes a gravitational center.


“The warm ache of finding a clue at midnight…”

Richard Feynman described how some of his best insights came “not during the day, but when I should have been asleep.”

Watson and Crick’s breakthrough moment came after a long night staring at cardboard cutouts of bases, finally realizing that A must pair with T, and C with G.

Midnight discoveries feel different. The world is quiet. Your thoughts echo louder. The poem captures the mixture of exhaustion and elation that only late-night research delivers.


“It is frustration, too… a door that will not open no matter how many keys you forge.”

Every researcher knows this part.

Gregor Mendel spent years performing careful pea-plant experiments, only to have his work ignored during his lifetime. He faced the unopenable door of obscurity and scientific resistance.

Jocelyn Bell Burnell discovered the first pulsar but was initially dismissed outright—her signal was even jokingly called “LGM” for “Little Green Men.” She had to try key after key before the door cracked open for recognition.

Frustration is not the enemy of research. It is built into its architecture.


“Standing at the edge of a forest where every leaf whispers a secret…”

This line evokes the feeling many scientists report at the beginning of a major, mysterious project.

Barbara McClintock described her genetic work in maize as “walking through a dark forest” where every discovery led to another branching path.

When the unknown feels vast but textured—full of quiet clues—you understand why researchers keep moving forward.


“Falling in love with the unseen… truth as a shape you can hold.”

Einstein often wrote about his “almost romantic” pursuit of deep physical truths. He described falling in love not with results but with the hidden order of the universe.

And Ramanujan believed mathematical truths were “gifts” he could sense emotionally before he could prove them formally. To him, numbers were living things, and discovering them was an act of devotion.

This section of the poem captures that beautiful, irrational, almost spiritual part of research.


“The moment the data shifts like dawn finding a window…”

Every scientist remembers that moment.

The gel with the unexpected band.
The graph where the curve finally rises.
The microscope slide where the pattern becomes obvious.
The code that produces a clean output for the first time.

For Kary Mullis, PCR came to him like a sudden sunrise during a nighttime drive—an abrupt alignment of clarity. He pulled off the road to scribble down the idea.

Discovery often feels like dawn: silent, sudden, transformative.


“And then, quietly, you begin again.”

This is the most universal truth of research.

The project ends. The paper is published. The celebration lasts an hour or a day. And then the scientist returns to the bench, or the lab meeting, or the notebook—because the feeling that started everything is still alive.

Ada Lovelace described this cycle perfectly: “The more I know, the more I want to know.”

Research does not end. It loops.

And that is the quiet beauty of the poem’s final line.


In the End: Research Is an Emotion Before It Is a Method

This poem reminds us that research is a human experience—full of longing, frustration, joy, surprise, obsession, and wonder.

It is not just a career path.
It is not just a skillset.
It is a feeling.

And across centuries, every scientist we admire has felt it too.