Egger's test is a linear-regression test for funnel-plot asymmetry used to detect small-study effects and possible publication bias in meta-analysis. It regresses the standardized effect estimate on its precision and asks whether the regression intercept differs significantly from zero. A non-zero intercept signals that smaller, less precise studies report systematically different effects than larger, more precise ones, which is the statistical fingerprint of an asymmetric funnel plot.
Introduced by Matthias Egger, George Davey Smith, Martin Schneider, and Christoph Minder in a 1997 BMJ paper, the test became the most cited diagnostic for funnel-plot asymmetry in clinical meta-analysis. Nearly three decades later it remains the default reported alongside a funnel plot in PRISMA-compliant manuscripts, despite well-documented limitations with binary outcomes and small evidence bases. This guide walks through what the test does, when it works, when it breaks, and how to run it cleanly in R and Stata.
How Egger's Regression Builds on the Funnel Plot
The starting point is the funnel plot interpretation for publication bias: a scatterplot of effect estimates against a measure of precision such as the inverse of the standard error. In the absence of small-study effects, the points form a symmetric inverted funnel around the pooled estimate. Asymmetry, with small low-precision studies clustered on one side, is the visual signal Egger's test was designed to quantify.
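To make the geometry concrete, the funnel outline at any given standard error is just the pooled estimate plus or minus 1.96 standard errors. A minimal sketch, with a hypothetical pooled effect of 0.5 (the function name and values are illustrative, not from any library):

```python
def funnel_band(pooled, se, z=1.96):
    """95% pseudo-confidence limits of the funnel at a given standard error."""
    return pooled - z * se, pooled + z * se

# The band widens as precision falls, tracing the inverted funnel
for se in (0.05, 0.10, 0.20, 0.40):
    lo, hi = funnel_band(0.5, se)
    print(f"precision {1 / se:5.1f}: [{lo:.3f}, {hi:.3f}]")
```

Studies falling symmetrically inside this band, with the scatter widening at low precision, are what a "symmetric funnel" means operationally.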
Egger's regression formalizes the visual judgment. It converts each study's effect estimate into a standardized normal deviate by dividing the effect by its standard error, then regresses that deviate on the study's precision, the inverse of the standard error. In a symmetric funnel, the fitted line passes through the origin: the slope estimates the underlying pooled effect, and at zero precision, corresponding to a hypothetical study with infinite standard error, the expected deviate is zero. A non-zero intercept means the expected deviate at zero precision is not zero, which happens only when the smallest, least precise studies are systematically pulled in one direction.
The test statistic is a t-test on the intercept of this regression. The reported quantities are the intercept estimate, its standard error, the t value, the degrees of freedom (number of studies minus two), and a p value. A small p value, conventionally below 0.10 in the original paper and often below 0.05 in modern practice, rejects the null hypothesis that the intercept is zero and supports the hypothesis of funnel-plot asymmetry.
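The whole procedure is an ordinary least-squares fit plus a t-test on the intercept, so it can be sketched in a few lines. The following is a minimal stdlib-only illustration, not a substitute for the named routines in R or Stata; the function name and the example data are hypothetical:

```python
import math

def egger_test(effects, ses):
    """Classical Egger regression: regress the standardized effect
    (effect / SE) on precision (1 / SE), then t-test the intercept.
    Returns (intercept, se_intercept, t_value, df)."""
    n = len(effects)
    if n < 3:
        raise ValueError("need at least 3 studies to fit the regression")
    y = [e / s for e, s in zip(effects, ses)]   # standardized normal deviates
    x = [1.0 / s for s in ses]                  # precisions
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx                  # estimates the underlying effect
    intercept = ybar - slope * xbar    # the asymmetry parameter
    # residual variance and the standard error of the intercept
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    df = n - 2
    s2 = sum(r ** 2 for r in resid) / df
    se_intercept = math.sqrt(s2 * (1.0 / n + xbar ** 2 / sxx))
    return intercept, se_intercept, intercept / se_intercept, df

# Hypothetical data: the least precise studies report the largest effects
effects = [0.9, 0.7, 0.55, 0.45, 0.4, 0.35]
ses     = [0.40, 0.30, 0.22, 0.15, 0.10, 0.08]
b0, se0, t_val, df = egger_test(effects, ses)
```

The p value is then obtained by referring the t value to a t distribution with n − 2 degrees of freedom; production code should use `regtest` in R's metafor package or `meta bias` in Stata rather than a hand-rolled fit.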
What the Intercept Actually Measures
The intercept in Egger's regression has a direct interpretation: it is the predicted standardized effect at zero precision, which corresponds to a hypothetical study with infinite standard error. Under a symmetric funnel, no signal exists at zero precision and the intercept estimates zero. Under asymmetry caused by missing small negative studies, the intercept is positive because the only small studies that appear contribute positive standardized effects.
A common misreading is that the intercept measures the magnitude of publication bias directly. It does not. The intercept measures asymmetry of the funnel, and asymmetry has several possible causes, of which publication bias is only one. Other causes include true heterogeneity that correlates with study size, methodological differences between small and large studies, chance, and selective outcome reporting within studies. Egger's test is best understood as a test of asymmetry, not as a test of publication bias per se.
This distinction matters when reporting. A significant Egger's test should be reported as "evidence of small-study effects" or "evidence of funnel-plot asymmetry," not as "evidence of publication bias." The same applies to a non-significant test: it should be reported as "no statistical evidence of funnel-plot asymmetry," not as "no evidence of publication bias." The literature is full of papers that overclaim in both directions, and reviewers and methodologists routinely flag the language.
When You Can Trust Egger's Test
The test was designed and validated for continuous outcomes measured as standardized mean differences and similar effect sizes where the standard error is independent of the effect estimate. Under those conditions, Egger's regression has acceptable type I error and reasonable power, although the power is modest unless the asymmetry is pronounced and the evidence base is moderate or large.
The Cochrane Handbook and the original authors recommend a minimum of ten studies before running the test. Below this threshold the test has very low power and the p value is unstable. With fewer than ten studies, a non-significant Egger's test should not be interpreted as evidence of symmetry: the test could not have detected asymmetry even if it were present. With ten to twenty studies, modest power is available for moderate asymmetry. With more than twenty studies, the test is reasonably powered for the kinds of asymmetry that publication bias typically produces.
A second precondition is moderate heterogeneity. When between-study heterogeneity is very large, the standardized normal deviate becomes noisy and the regression intercept is hard to estimate precisely. Both extreme homogeneity and extreme heterogeneity can distort the test. Most authors report Egger's test alongside the I-squared statistic and the tau-squared estimate so readers can judge whether heterogeneity is in a range where the test is interpretable.
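The heterogeneity quantities reported alongside the test follow directly from the inverse-variance weights. A minimal sketch of Cochran's Q, I-squared, and the DerSimonian-Laird tau-squared (function name and example values are illustrative):

```python
def heterogeneity(effects, ses):
    """Cochran's Q, I-squared (%), and the DerSimonian-Laird tau-squared
    from per-study effects and standard errors."""
    w = [1.0 / s ** 2 for s in ses]          # inverse-variance weights
    sw = sum(w)
    ybar = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    c = sw - sum(wi ** 2 for wi in w) / sw   # scaling constant for tau-squared
    tau2 = max(0.0, (q - df) / c)
    return q, i2, tau2

# Hypothetical example: three equally precise but discrepant studies
q, i2, tau2 = heterogeneity([0.2, 0.5, 0.8], [0.1, 0.1, 0.1])
```

Reporting these alongside the Egger intercept lets readers judge whether heterogeneity is in a range where the asymmetry test is interpretable.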
Why Egger's Test Can Fail on Binary Outcomes
The most important methodological caveat is that the classical Egger test can produce inflated type I error rates when applied to binary outcomes summarized as odds ratios or risk ratios. The reason is mathematical: for binary outcomes, the standard error of the log odds ratio is mechanically correlated with the log odds ratio itself, because both depend on the same cell counts. This induces funnel-plot asymmetry even when no publication bias or small-study effect exists.
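The mechanical coupling is visible in the standard formulas themselves. Both the log odds ratio and its Woolf standard error are functions of the same four cells of the 2x2 table, as this small sketch shows (function name and counts are hypothetical):

```python
import math

def log_or_and_se(a, b, c, d):
    """Log odds ratio and its Woolf standard error from a 2x2 table
    (a, b = events / non-events in the treatment arm; c, d = control).
    Both outputs depend on the same four cell counts, which is the
    source of the mechanical correlation on binary outcomes."""
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se

lor, se = log_or_and_se(10, 90, 5, 95)
```

When events are rare, the small cells dominate both quantities at once, so the estimate and its standard error move together even under the null.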
Several authors have proposed corrections. Peters and colleagues in 2006 developed an alternative test that uses a weighted regression of the log odds ratio on a transformed measure of precision derived from the total sample size, avoiding the mechanical correlation. Harbord and colleagues in 2006 proposed a modified test based on the efficient score and Fisher's information, which has better statistical properties for binary outcomes. Macaskill and colleagues in 2001 introduced a related test using study size as the predictor rather than precision.
For binary outcome meta-analyses, the Cochrane Handbook recommends Peters' test or Harbord's test in preference to the classical Egger test. Most current meta-analysis software offers both as named options. Reviewers increasingly catch papers that apply the classical Egger test to odds ratios or risk ratios when an outcome-appropriate variant would have been correct. The choice of variant should be documented in the methods section of the manuscript.
Alternative and Complementary Tests for Publication Bias
Egger's test is one of several diagnostics for publication bias and small-study effects. The classical alternative is the rank correlation test of Begg and Mazumdar from 1994, which computes the Kendall tau correlation between the standardized effect and its variance. Begg's test has lower power than Egger's in most realistic scenarios and is now usually reported only when historical comparison with older meta-analyses is needed.
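The core of Begg's procedure is a Kendall rank correlation. A simplified stdlib-only sketch (omitting the continuity correction and the normal approximation used for the p value; function names are illustrative):

```python
def kendall_tau(xs, ys):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(xs)
    conc = disc = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if s > 0:
                conc += 1
            elif s < 0:
                disc += 1
    return (conc - disc) / (n * (n - 1) / 2)

def begg_test(effects, ses):
    """Simplified Begg-Mazumdar sketch: Kendall tau between the
    variance-stabilized deviations from the pooled estimate and
    the per-study variances."""
    w = [1.0 / s ** 2 for s in ses]
    sw = sum(w)
    ybar = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    z = [(yi - ybar) / (si ** 2 - 1.0 / sw) ** 0.5
         for yi, si in zip(effects, ses)]
    v = [si ** 2 for si in ses]
    return kendall_tau(z, v)
```

A tau near zero is consistent with a symmetric funnel; a large positive or negative tau mirrors the asymmetry the Egger intercept measures, though with the lower power noted above.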
A different family of methods estimates the effect after adjusting for funnel-plot asymmetry. The trim-and-fill method of Duval and Tweedie iteratively imputes missing studies on the under-represented side of the funnel and recomputes the pooled estimate. The adjusted estimate gives a rough sense of how publication bias could shift the result, but trim-and-fill has known limitations under heterogeneity and is best interpreted as a sensitivity analysis rather than a definitive correction.
The Copas selection model explicitly models the probability that a study is published as a function of its standard error and effect estimate, and estimates the corrected pooled effect under that selection. It is computationally more demanding and requires the user to specify or sweep across selection parameters, but it can recover from severe selection that defeats simpler tests. Other selection-model approaches include the Vevea-Hedges weight function and the PET-PEESE regression family. A complete sensitivity analysis often combines Egger's regression with trim-and-fill and one selection-model estimator. For a broader survey of approaches, see publication bias detection methods and the related discussion of fail-safe N methods.