Evidential Statistics: A New Lens for 21st Century Science

Moving beyond "significant" or "not significant" to measure the strength of scientific evidence itself


Beyond "Significant" or "Not Significant"

For decades, the world of science has been held in a binary grip. Research findings are either "statistically significant" or they are not—a determination often made by a value known as the p-value. This rigid framework has shaped countless conclusions, from the effectiveness of new medicines to the impact of environmental policies.

But what if this paradigm, the very bedrock of statistical inference, is no longer sufficient for the complex, data-rich questions of our time?

The Problem with P-Values
  • Encourages binary thinking
  • Arbitrary thresholds (p < 0.05)
  • Contributes to replication crisis
  • Doesn't measure evidence strength

Enter Evidential Statistics, a modern synthesis designed to measure the strength of scientific evidence itself. This approach moves beyond simply rejecting or accepting a pre-defined null hypothesis. Instead, it seeks to quantify how much the data supports one theory over another, offering a more nuanced and intuitive way to understand what our experiments are truly telling us.

As Mark L. Taper and José Miguel Ponciano argue, this could be the most effective statistical paradigm to support the progress of 21st-century science [7]. This article explores how this powerful framework is changing the game, providing scientists with the tools to build a more robust and reliable body of knowledge.

What is Evidential Statistics?

The Limits of the Old Guard

Traditional statistical testing, developed in the early 20th century, often revolves around the p-value. A p-value below an arbitrary threshold (like 0.05) is taken as evidence to reject a null hypothesis (e.g., "this drug has no effect").

However, this method has several well-documented shortcomings:

  • It encourages a binary "yes/no" or "on/off" style of thinking, which often fails to capture the true complexity of scientific problems [7].
  • It can lead to the dismissal of valuable findings that fail to cross the magic "p < 0.05" line, a major contributor to the "replication crisis" in some fields.
  • It does not directly measure the strength of evidence for competing hypotheses.

The Core of the Evidential Approach

Evidential statistics offers a different path. Its primary goal is to clarify scientific arguments with statistical arguments [7]. Instead of asking, "Can I reject this single null hypothesis?" it asks, "How much does this data support Hypothesis A versus Hypothesis B?"

This is a fundamentally different question. It allows for graded interpretations—the data might weakly support one model, strongly support another, or be almost indifferent between them.

This approach is built on the concept of the evidence function, which is used to estimate the difference between how well two competing models explain the observed data [7].
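
To make the comparison concrete, here is a minimal sketch in Python. The two Gaussian models, the data values, and the noise level are all hypothetical, chosen purely for illustration: the point is that the likelihood of the same data is computed under each model, and their ratio measures the relative evidence.

```python
import math

def log_likelihood_normal(data, mu, sigma):
    """Log-likelihood of the data under a Normal(mu, sigma) model."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma**2)
        - (x - mu)**2 / (2 * sigma**2)
        for x in data
    )

# Hypothetical measurements
data = [0.9, 1.1, 1.3, 0.8, 1.2]

# Two competing models: an effect centered at 1.0 (A) vs. no effect (B)
ll_a = log_likelihood_normal(data, mu=1.0, sigma=0.5)
ll_b = log_likelihood_normal(data, mu=0.0, sigma=0.5)

# A likelihood ratio far above 1 means the data favor model A
likelihood_ratio = math.exp(ll_a - ll_b)
print(f"LR(A vs B) = {likelihood_ratio:.1f}")
```

Note that the output is not a binary verdict but a graded quantity: a ratio near 1 would mean the data barely distinguish the two models.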

Comparing Statistical Approaches
Traditional Statistics
  • Question: Can we reject H₀?
  • Output: p-value (significant or not)
  • Interpretation: a binary decision

Evidential Statistics
  • Question: How much evidence for H₁ vs. H₂?
  • Output: likelihood ratio
  • Interpretation: graded evidence

A New Toolkit for Modern Science

The evidential approach is not just a theoretical exercise; it is a practical toolkit designed for contemporary scientific challenges.

Handling Model Misspecification

A key strength of evidential statistics is its ability to assess the security of a conclusion even when the models used are not perfect representations of reality [7].

Supporting Theory Development

The careful use of evidential statistics can play a crucial role in the future development of scientific theories by providing a clearer measure of which theoretical models are best supported by empirical data [7].

Moving Beyond "Proof"

The framework discourages the notion of "proving" a theory with 100% certainty. Instead, it embraces the ongoing process of scientific learning [5, 7].

Essential Reagents for Statistical Research

  • Likelihood Function: quantifies how well a statistical model explains the observed data. Why it's essential: it is the foundational component for measuring evidence, connecting theoretical models to real-world observations.
  • Likelihood Ratio: directly compares the strength of evidence for two competing models (Model A vs. Model B). Why it's essential: it provides a clear, quantitative answer to the core question, "How much more does the data support one model over another?"
  • Evidence Function: estimates the difference in divergence between competing models and the true data-generating process [7]. Why it's essential: it allows for robust inference even when all candidate models are imperfect approximations of reality.
  • Model Space Projection: a technique to visualize and compare many models simultaneously based on their divergence from the truth [7]. Why it's essential: it moves analysis beyond simple head-to-head comparisons, enabling a more holistic understanding of a scientific problem.

An In-Depth Look: The Millikan Oil Drop Experiment

To see the difference between a traditional analysis and an evidential approach, let's re-examine a classic experiment through this modern lens.

The Objective and Methodology

In the early 20th century, physicist Robert Millikan sought to measure the fundamental electric charge of the electron [1]. His method was ingenious:

  1. Creating Charged Drops: He sprayed fine oil droplets into a chamber.
  2. Balancing Forces: These droplets became charged through friction. Millikan then created an electric field between two metal plates.
  3. Suspending the Drops: At the specific voltage where the droplet hung motionless, the forces were equal.
  4. Repeated Measurement: He performed this balancing act for hundreds of individual droplets [1].
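
The balance condition in step 3 is simply zero net force: the electric force qE on the droplet equals its weight mg, so q = mg/E. Here is a quick numerical sketch; the droplet mass and field strength below are hypothetical values chosen for illustration, and buoyancy is ignored.

```python
g = 9.81       # gravitational acceleration, m/s^2
m = 1.6e-15    # droplet mass, kg (hypothetical illustrative value)
E = 9.8e4      # electric field at balance, V/m (hypothetical illustrative value)

# At the balancing voltage, q * E = m * g, so:
q = m * g / E
print(f"q = {q:.3e} C")  # on the order of the elementary charge, ~1.6e-19 C
```
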

Traditional vs. Evidential Analysis

Traditional Approach

Focuses on demonstrating that charges are integer multiples of a base value, concluding charge is "quantized."

Evidential Approach

Formally compares the strength of evidence for two competing models: continuous vs. quantized charge.

Results and Analysis

Oil Droplet   Measured Charge (× 10⁻¹⁹ C)   Ratio to Base Value (e)   Closest Integer Multiple of e
1             1.592                         0.995                     1
2             3.198                         1.999                     2
3             4.821                         3.013                     3
4             6.386                         3.991                     4
5             7.994                         4.996                     5

The measured charges are all very close to integer multiples of a base value of ~1.602 × 10⁻¹⁹ C, providing strong evidence for the quantized model of electric charge.

Evidential Interpretation

An evidential analysis would calculate a likelihood ratio showing the data is astronomically more probable under the quantized model. The result would not just be "significant" but would provide a quantitative measure of the overwhelming strength of evidence for charge quantization.
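
As a sketch of what such a calculation could look like (this is an illustration, not Millikan's actual analysis; the measurement-noise level and the continuous model's uniform range are assumptions), one can compare a "quantized" model, where each charge sits near an integer multiple of e with Gaussian noise, against a "continuous" model, where any charge in the measured range is equally likely:

```python
import math

e = 1.602e-19      # assumed base value of the elementary charge, C
sigma = 1e-21      # assumed measurement noise (standard deviation), C
charges = [1.592e-19, 3.198e-19, 4.821e-19, 6.386e-19, 7.994e-19]

def log_lik_quantized(q):
    """Gaussian log-density of q around the nearest integer multiple of e."""
    n = round(q / e)
    return (-0.5 * math.log(2 * math.pi * sigma**2)
            - (q - n * e)**2 / (2 * sigma**2))

def log_lik_continuous(q, lo=0.0, hi=1e-18):
    """Uniform log-density over an assumed measurable range [lo, hi)."""
    return -math.log(hi - lo)

log_lr = (sum(log_lik_quantized(q) for q in charges)
          - sum(log_lik_continuous(q) for q in charges))
print(f"log likelihood ratio (quantized vs. continuous) = {log_lr:.1f}")
```

With these assumed parameters the log likelihood ratio comes out on the order of 20, meaning the five measurements are roughly e²⁰ times more probable under the quantized model, and each additional droplet multiplies the ratio further.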

Implementing the Evidential Workflow

Adopting an evidential approach involves a deliberate process that aligns with the core principles of the scientific method [2, 5].

1. Ask a Question & Formulate Competing Models

Start with a clear research question. Instead of a single null hypothesis, explicitly define at least two plausible theoretical models that offer different explanations for the phenomenon.

Example: "How does the evidence for a model where the drug is effective compare to a model where it is not?"

2. Gather and Prepare Data

Collect data through careful experimentation or observation. The data must be organized in a way that allows the models to be evaluated. As with all science, this means ensuring data tables are tidy, with each variable in its own column and each observation in its own row [9].
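
A minimal illustration of the tidy layout in plain Python (the subjects and response values are hypothetical); each entry in the tidy list holds exactly one observation:

```python
# "Wide" layout: one row per subject, one column per treatment condition
wide = [
    {"subject": 1, "drug": 5.1, "placebo": 3.2},
    {"subject": 2, "drug": 4.8, "placebo": 3.5},
]

# Tidy layout: each variable (subject, treatment, response) is its own
# column, and each row is a single observation
tidy = [
    {"subject": row["subject"], "treatment": t, "response": row[t]}
    for row in wide
    for t in ("drug", "placebo")
]
```

In the tidy form, computing the likelihood of each observation under a candidate model becomes a simple pass over the rows.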

3. Calculate the Evidence

Use statistical software to compute the likelihood of the data under each competing model. From this, calculate the key reagent—the likelihood ratio—which directly measures the relative evidence.

4. Analyze and Interpret the Strength of Evidence

Interpret the likelihood ratio using established scales. The goal is to see how much the data has shifted the weight of evidence from your initial position.

Likelihood Ratio (Model A vs. Model B)   Interpretation of Evidence Strength
1 to 8                                   Weak or very weak evidence
8 to 32                                  Moderate evidence for Model A
32 to 128                                Strong evidence for Model A
> 128                                    Very strong evidence for Model A

Note: The same scale applies in reverse for evidence supporting Model B (e.g., 1/8 to 1/32 = moderate evidence for Model B).

5. Communicate the Findings

Report the results not just with a "p-value," but with the measured strength of evidence. A clear statement like, "The data provides strong evidence (LR = 50) in favor of the new cognitive therapy over the standard treatment" is more informative and nuanced than "the new therapy was statistically significant (p < 0.05)."
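
The evidence scale from step 4 can be codified in a small helper function; this is a sketch whose cutoffs simply follow the table above.

```python
def interpret_lr(lr):
    """Map a likelihood ratio (Model A vs. Model B) to an evidence label,
    using the cutoffs from the scale in step 4."""
    if lr < 1:
        # Evidence favors Model B: interpret the reciprocal ratio instead
        return interpret_lr(1 / lr).replace("Model A", "Model B")
    if lr < 8:
        return "Weak or very weak evidence"
    if lr < 32:
        return "Moderate evidence for Model A"
    if lr < 128:
        return "Strong evidence for Model A"
    return "Very strong evidence for Model A"

print(interpret_lr(50))      # Strong evidence for Model A
print(interpret_lr(1 / 20))  # Moderate evidence for Model B
```
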

Building a Stronger Scientific Future

The transition from traditional hypothesis testing to evidential statistics represents a significant evolution in scientific thought. It moves us away from making dichotomous decisions based on arbitrary thresholds and toward a more continuous, evidence-based understanding of the world.

By providing tools to quantify the strength of evidence directly, this paradigm offers a more intuitive and robust foundation for scientific progress.

As we face increasingly complex challenges—from personalized medicine to climate science—the need for sophisticated statistical tools has never been greater. Evidential statistics, with its focus on clarity and its ability to handle real-world complexity, is poised to be a cornerstone of 21st-century scientific discovery, ensuring that our conclusions are built on the solid ground of measured evidence [7].

References