A p-value is the probability of obtaining results as extreme as (or more extreme than) those observed, assuming the null hypothesis is true. A p-value of 0.03 means there is a 3% probability of seeing results at least this extreme if there were truly no effect. It does NOT mean there is a 97% probability that your hypothesis is correct. Misinterpretation of p-values is cited as a major contributor to the reproducibility crisis in psychology and medicine (Nature, 2015).
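The definition above can be made concrete with a small exact calculation. The sketch below assumes a hypothetical experiment (100 coin flips, 60 heads) that is not taken from the text, and the function name is illustrative. Under the null hypothesis of a fair coin, it adds up the probability of every outcome at least as far from 50 heads as the one observed.

```python
from math import comb


def binomial_two_sided_p(heads: int, flips: int) -> float:
    """Exact two-sided p-value for a fair-coin null hypothesis.

    Sums the probability of every head count whose distance from the
    expected count (flips / 2) is at least as large as the observed one.
    """
    expected = flips / 2
    observed_gap = abs(heads - expected)
    extreme_outcomes = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - expected) >= observed_gap
    )
    # Under a fair coin, each of the 2**flips sequences is equally likely.
    return extreme_outcomes / 2 ** flips


# Hypothetical data: 60 heads in 100 flips.
p = binomial_two_sided_p(60, 100)
print(f"p = {p:.4f}")  # about 0.057, so not significant at 0.05
```

Note what the number describes: how often a fair coin would produce a result this lopsided, not how likely it is that the coin is biased.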

Why is p < 0.05 the significance threshold?

The 0.05 threshold was proposed by Ronald Fisher in 1925 as a practical guideline. It became the de facto standard in psychology, medicine, and biology (especially in the US and UK) through decades of textbook and journal adoption. In 2016, the American Statistical Association (ASA) released a statement cautioning that scientific conclusions should not be based only on whether a p-value passes a specific threshold, and a 2019 editorial in its journal The American Statistician went further, urging researchers to stop using p < 0.05 as a binary decision rule. Some journals, such as Basic and Applied Social Psychology, have banned p-values entirely. Physics uses much stricter thresholds (p < 0.0000003, or 5-sigma) for new discoveries.

P-Value Calculator — Z-Test and T-Test, One-Tailed and Two-Tailed


P-Value Calculator

Calculate the p-value from a z-score or t-score for one-tailed or two-tailed hypothesis tests. The calculator reports whether the result is significant at the α = 0.05 and α = 0.01 thresholds.

Quick Answer

p-value = probability of results this extreme IF H₀ is true. p < 0.05 → reject H₀ at 5% significance. p < 0.01 → highly significant. p ≠ probability your hypothesis is true.
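The calculation behind a tool like this can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual implementation. The function names and the tail labels ("two", "left", "right") are assumptions made for the example. To stay within the standard library, the t-distribution tail is approximated by integrating the density numerically with Simpson's rule; in practice a statistics library would normally be used instead.

```python
import math


def normal_upper_tail(z: float) -> float:
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def t_upper_tail(t: float, df: float, steps: int = 20_000) -> float:
    """Approximate P(T >= t) for Student's t with df degrees of freedom.

    Integrates the density from 0 to |t| with Simpson's rule
    (steps must be even) and subtracts the area from 0.5.
    """
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(x: float) -> float:
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

    a = abs(t)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    upper = max(0.0, 0.5 - total * h / 3)  # P(T >= |t|)
    return upper if t >= 0 else 1.0 - upper


def p_value(score: float, tail: str = "two", df=None) -> float:
    """p-value for a z-score (df=None) or a t-score (df given)."""
    def tail_fn(x: float) -> float:
        return normal_upper_tail(x) if df is None else t_upper_tail(x, df)

    if tail == "two":
        return min(1.0, 2 * tail_fn(abs(score)))
    if tail == "right":
        return tail_fn(score)
    if tail == "left":
        return tail_fn(-score)  # by symmetry, P(X <= s) = P(X >= -s)
    raise ValueError("tail must be 'two', 'left', or 'right'")


def significance(p: float) -> str:
    """Label a p-value against the 0.05 and 0.01 thresholds."""
    if p < 0.01:
        return "significant at alpha = 0.01"
    if p < 0.05:
        return "significant at alpha = 0.05"
    return "not significant at alpha = 0.05"


print(p_value(1.96))                 # two-tailed z-test, roughly 0.05
print(p_value(2.5, "right", df=15))  # one-tailed t-test with 15 df
print(significance(p_value(2.8)))
```

For the same score, a two-tailed p-value is twice the one-tailed value, which is why the choice of tail should be fixed before looking at the data.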


Field / Country	Standard α	One or Two-Tailed?	Governing Body / Basis
🇺🇸 US Psychology (APA)	0.05	Two-tailed	APA Publication Manual
🇬🇧 UK Medical Research	0.05	Two-tailed	NIHR / BMJ / Lancet
🇪🇺 EU Drug Approvals (EMA)	0.05 (one-sided 0.025)	Two-tailed	ICH E9 Guideline
🇺🇸 US Drug Approvals (FDA)	0.05	Two-tailed	FDA / ICH E9
🔭 Physics (particle physics)	0.0000003 (5-sigma)	One-tailed	CERN / PDG
🧬 Genetics (GWAS)	5×10⁻⁸	Two-tailed	Multiple testing correction

Frequently Asked Questions

What is the reproducibility crisis and how does it relate to p-values?

The reproducibility crisis (or replication crisis) refers to the finding that many published scientific results fail to replicate when the studies are repeated. A 2015 study in Science attempted to replicate 100 psychology studies and obtained significant results in only 36% of them. Key contributors are p-hacking (testing multiple hypotheses and only reporting those with p < 0.05), HARKing (hypothesising after results are known), and publication bias (journals favouring significant results). Major efforts to address this include pre-registration of studies (AsPredicted.org, OSF) and the use of Bayesian methods. The UK Medical Research Council and US NIH both now require the clinical trials they fund to be registered.
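A short simulation shows why p-hacking inflates false positives. The sketch below relies on the standard fact that, when the null hypothesis is true and the test statistic is continuous, p-values are uniformly distributed between 0 and 1. The function name, the choice of 20 tests, and the number of simulated studies are illustrative assumptions.

```python
import random


def chance_of_false_positive(num_tests: int, alpha: float = 0.05,
                             studies: int = 20_000, seed: int = 0) -> float:
    """Estimate how often at least one of num_tests null tests gives p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(studies):
        # Every hypothesis is truly null, so each p-value is uniform on [0, 1).
        if any(rng.random() < alpha for _ in range(num_tests)):
            hits += 1
    return hits / studies


simulated = chance_of_false_positive(20)
analytic = 1 - (1 - 0.05) ** 20  # about 0.64 for 20 independent tests
print(f"simulated: {simulated:.3f}, analytic: {analytic:.3f}")
```

With 20 independent tests and no real effects, roughly two studies in three would still have at least one "significant" result to report.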

What is the difference between p-value significance thresholds in physics vs biology?

Physics (particle physics at CERN) requires 5-sigma (p < 0.0000003) for a "discovery"; the Higgs boson announcement used this threshold. The reasons are that (1) results can be checked against theory precisely, (2) experiments run for years and produce massive datasets, and (3) a wrong announcement would be a major setback. Biology and psychology use the 0.05 threshold, partly because effect sizes are smaller and data noisier, which makes far stricter thresholds hard to reach with practical sample sizes. Genomics uses a genome-wide significance threshold of p < 5×10⁻⁸ to correct for testing roughly 1 million genetic variants simultaneously (a Bonferroni correction: 0.05 / 1,000,000).
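Both thresholds can be checked with a few lines of arithmetic. The sketch below converts a sigma level to a one-tailed normal p-value and applies a Bonferroni correction. The function names are illustrative, and the count of 1,000,000 variants is the round figure used in the text.

```python
import math


def sigma_to_p(n_sigma: float, two_sided: bool = False) -> float:
    """Tail probability of a standard normal beyond n_sigma."""
    tail = 0.5 * math.erfc(n_sigma / math.sqrt(2.0))
    return 2 * tail if two_sided else tail


def bonferroni_threshold(alpha: float, num_tests: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at alpha."""
    return alpha / num_tests


print(sigma_to_p(5))                          # about 2.9e-7 (one-tailed)
print(bonferroni_threshold(0.05, 1_000_000))  # about 5e-8
```

The 5-sigma figure quoted for particle physics corresponds to the one-tailed value; the two-tailed probability is twice as large.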
