How to Do a Meta-Analysis: A Step-by-Step Guide for Researchers
Learn how to do a meta-analysis in 9 steps: define outcomes, calculate effect sizes, choose a model, run pooled analysis, assess heterogeneity, test for publication bias, and rate evidence with GRADE.
A meta-analysis statistically pools effect sizes from multiple studies to produce a single quantitative estimate of a treatment effect or association
The process follows 9 steps: define outcomes, extract/calculate effect sizes, select a statistical model, run pooled analysis, generate forest plots, assess heterogeneity, conduct sensitivity analyses, test for publication bias, and rate evidence certainty
Random-effects models (DerSimonian-Laird or REML) are recommended over fixed-effect models when clinical or methodological heterogeneity is expected
I-squared quantifies the percentage of variability due to heterogeneity rather than chance; Cochrane classifies 0-40% as low, 30-60% as moderate, 50-90% as substantial, and 75-100% as considerable
Publication bias is assessed using funnel plots, Egger's test (requires 10+ studies), and trim-and-fill analysis
Every meta-analysis should report forest plots, heterogeneity statistics (I-squared, tau-squared, prediction intervals), and sensitivity analyses at minimum
R (meta, metafor packages) and Stata provide transparent, reproducible analysis and are preferable to black-box software
This is the definitive guide on how to do a meta-analysis. A meta-analysis is a statistical method that combines effect sizes from multiple independent studies to produce a single pooled estimate of a treatment effect or association. It uses inverse variance weighting, assesses heterogeneity via I-squared and tau-squared, and produces forest plots to visualize results. Meta-analysis is typically conducted within a systematic review following Cochrane Handbook methodology (Higgins et al., 2023).
Whether you are pooling clinical trial results, observational cohort data, or diagnostic accuracy studies, this step-by-step meta-analysis guide for beginners walks you through every decision point with the statistical reasoning behind it. Our biostatisticians have conducted meta-analyses across clinical medicine, public health, and pharmacology; the most common error we see is ignoring prediction intervals, which describe the range of true effects you would expect in a future study conducted in a new setting. This guide covers the complete meta-analysis methodology from defining outcomes through GRADE-rated certainty of evidence, with links to free calculators and example R code at every step. For the broader evidence synthesis process, see our complete systematic review guide.
What Is a Meta-Analysis?
Nine-step meta-analysis pipeline with deliverable per step. Source: Cochrane Handbook v6.5, Borenstein 2021.
The distinction between a meta-analysis and a systematic review is fundamental. A systematic review is the complete methodological framework: protocol registration, database searching, screening, data extraction, and quality appraisal. A meta-analysis is the optional statistical component that sits inside a systematic review when the included studies are sufficiently homogeneous to pool quantitatively. Not every systematic review includes a meta-analysis, and not every meta-analysis is embedded in a systematic review, though best practice favors the combination.
Pro Tip
Always report prediction intervals
I-squared tells you the proportion of variability attributable to heterogeneity, but the prediction interval shows the expected range of true effects in future settings, which is far more clinically useful.
Pro Tip
Pre-specify subgroup analyses in your protocol
Post-hoc subgroups are hypothesis-generating only. Reviewers and editors scrutinize unplanned subgroup analyses for data dredging.
Frequently Asked Questions
How many studies do you need for a meta-analysis?
A minimum of 2 studies is technically possible, but 5+ is recommended for meaningful heterogeneity assessment, and 10+ for reliable publication bias testing (Higgins et al., 2023).
What is the difference between a systematic review and a meta-analysis?
A systematic review is the broader methodology (search, screen, appraise). A meta-analysis is the optional statistical component that quantitatively pools effect sizes.
Can you do a meta-analysis without a systematic review?
Yes, standalone meta-analyses exist, but best practice embeds meta-analysis within a systematic review to ensure comprehensive, unbiased study identification.
What does I-squared measure?
I-squared measures the percentage of total variability across studies that is due to heterogeneity rather than sampling error. Cochrane classifies 0-40% as low, 30-60% as moderate, 50-90% as substantial, and 75-100% as considerable.
When should you use a random-effects model?
Use random-effects when you expect clinical or methodological differences between studies, which is almost always the case in medical research.
What software is best for meta-analysis?
R (meta/metafor packages) and Stata are preferred for transparency and reproducibility. They produce publication-ready outputs and allow advanced techniques like network meta-analysis.
How do you read a forest plot?
Each horizontal line represents one study's effect estimate and confidence interval. The diamond at the bottom shows the pooled effect. If the diamond doesn't cross the line of no effect, the result is statistically significant.
Need help with your meta-analysis?
Our PhD statisticians run complete meta-analyses: effect sizes, forest plots, heterogeneity testing, and publication-ready results sections.
Reading About Meta-Analysis? Our PhD Team Runs Them Every Day.
From data extraction to forest plots, sensitivity analysis, and a journal-ready manuscript. We handle the full meta-analysis so you can focus on your research question.
Dr. Sarah Mitchell holds a PhD in Biostatistics from Johns Hopkins Bloomberg School of Public Health and has over 15 years of experience in systematic review methodology and meta-analysis. She has authored or co-authored 40+ peer-reviewed publications in journals including the Journal of Clinical Epidemiology, BMC Medical Research Methodology, and Research Synthesis Methods. A former Cochrane Review Group statistician and current editorial board member of Systematic Reviews, Dr. Mitchell has supervised 200+ evidence synthesis projects across clinical medicine, public health, and social sciences.
Not sure whether heterogeneity is a problem? Calculate I-squared and tau-squared with our free heterogeneity calculator.
Need expert help with your meta-analysis? Research Gold's biostatisticians deliver publication-ready forest plots, funnel plots, and GRADE-rated evidence. Explore our services.
In formal terms, a meta-analysis is a statistical synthesis method (Borenstein et al., 2009). It is a component of a systematic review. It requires effect size calculation from each included study. And it produces a forest plot, the defining visualization of pooled results. These four relationships anchor every decision in the nine-step process below.
When Is a Meta-Analysis Appropriate?
A meta-analysis is appropriate when the included studies are sufficiently similar in population, intervention, comparator, and outcome (PICO) to justify statistical pooling. The decision to pool is clinical and methodological before it is statistical.
Three conditions must be met. First, clinical homogeneity: the studies must address the same or closely related clinical question. Pooling a drug trial in pediatric asthma with a surgical trial in adult COPD produces a number, but not a meaningful one. Second, methodological similarity: studies should share broadly comparable designs (e.g., all RCTs, or all prospective cohorts). Mixing randomized controlled trials with cross-sectional studies introduces confounding that no statistical model can resolve. Third, sufficient studies: while two studies is the technical minimum, five or more are recommended for meaningful heterogeneity assessment, and ten or more are needed for reliable publication bias detection using Egger's test (Sterne et al., 2011).
When should you not meta-analyze? When the included studies differ so profoundly in design, population, or intervention that pooling them would obscure rather than clarify. In these cases, narrative synthesis, a structured qualitative summary, is more appropriate. The Cochrane Handbook explicitly warns against meta-analysis when "there is considerable variation in the results and the variation is not explained by study characteristics" (Higgins et al., 2023). If you determine that pooling is unjustified after examining your data, present the results in a structured table with individual study effect sizes rather than forcing a misleading pooled estimate.
Step 1, Define Your Outcomes and Effect Measures
The first step is selecting the outcomes to analyze and the effect measure to express each one. This decision shapes every downstream calculation.
Continuous Outcomes
For continuous outcomes (blood pressure reduction, pain scores, cognitive test performance), the primary options are the standardized mean difference and the weighted mean difference. The standardized mean difference (SMD) is used when studies measure the same construct on different scales, for example, depression measured by both the PHQ-9 and the HAM-D. The two most common SMD variants are Cohen's d and Hedges' g. Hedges' g corrects Cohen's d for small samples by applying a correction factor that removes upward bias in studies with fewer than 20 participants per arm (Borenstein et al., 2009). For most meta-analyses, Hedges' g is preferred. When all studies use the same measurement scale, the weighted mean difference (WMD) preserves the original units and is easier to interpret clinically.
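To make the small-sample correction concrete, here is a base-R sketch computing Cohen's d, Hedges' g, and the variance of g from summary statistics. All numbers are invented for illustration:

```r
# Hypothetical summary statistics for one two-arm study
m1 <- 24.5; sd1 <- 6.2; n1 <- 30   # intervention group
m2 <- 28.1; sd2 <- 5.8; n2 <- 32   # control group

# Pooled standard deviation across the two groups
sp <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / (n1 + n2 - 2))

d <- (m1 - m2) / sp                    # Cohen's d
J <- 1 - 3 / (4 * (n1 + n2 - 2) - 1)   # small-sample correction factor
g <- J * d                             # Hedges' g (slightly shrunk toward 0)

# Variance of g, needed later for inverse-variance weighting
v_d <- (n1 + n2) / (n1 * n2) + d^2 / (2 * (n1 + n2))
v_g <- J^2 * v_d
```

With these inputs d is about -0.60 and g about -0.59; the correction matters more as samples shrink.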
Binary Outcomes
For binary outcomes (mortality, treatment response, adverse events), the standard measures are the odds ratio (OR), risk ratio (RR), and risk difference (RD). Odds ratio calculation is the most common in meta-analysis because the OR has favorable mathematical properties: it is symmetric and unbounded, and it can be computed from case-control studies, where the RR cannot. Risk ratio estimation is more intuitive clinically ("patients are 1.5 times more likely to respond") and is preferred for prospective studies. Risk difference expresses the absolute difference in event rates and is needed to calculate the number needed to treat.
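For a quick sense of how these measures relate, here is a base-R sketch computing OR, RR, and RD from a single hypothetical 2x2 table (all counts invented):

```r
# Hypothetical 2x2 table
e1 <- 15; ne1 <- 35   # intervention: events, non-events (n = 50)
e0 <- 25; ne0 <- 25   # control: events, non-events (n = 50)

or <- (e1 * ne0) / (ne1 * e0)                   # odds ratio
rr <- (e1 / (e1 + ne1)) / (e0 / (e0 + ne0))     # risk ratio
rd <- e1 / (e1 + ne1) - e0 / (e0 + ne0)         # risk difference
se_log_or <- sqrt(1/e1 + 1/ne1 + 1/e0 + 1/ne0)  # SE of log(OR) for weighting
```

Note how the same table yields an OR of about 0.43 but an RR of 0.60: the OR always sits further from 1, which is why reporting an OR as if it were an RR overstates the effect.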
| Outcome Type | Effect Measure | When to Use | Key Property |
| --- | --- | --- | --- |
| Continuous, same scale | Weighted Mean Difference | All studies use identical measurement tool | Preserves original units |
| Continuous, different scales | Hedges' g (SMD) | Studies measure same construct on different scales | Unitless, corrected for small samples |
| Binary | Odds Ratio | Case-control studies or when RR is inappropriate | Symmetric, can be computed from any 2x2 table |
| Binary | Risk Ratio | Prospective studies, clinical interpretability | Directly interpretable as relative risk |
| Binary | Risk Difference | Absolute effect needed (NNT calculation) | Scale-dependent, affected by baseline risk |
Choose your effect measure before extracting data, and register this choice in your protocol. Changing the effect measure after seeing results introduces selective reporting bias.
Step 2, Extract or Calculate Effect Sizes
With your effect measure defined, the next task is extracting or computing individual study effect sizes and their variances. Every meta-analysis requires effect size calculation from each included study before pooling can occur.
What Data to Extract
For continuous outcomes, extract sample sizes, means, and standard deviations for each group. For binary outcomes, extract the 2x2 table (events and non-events in intervention and control groups). When studies report only summary statistics, p-values, confidence intervals, or t-statistics, you can back-calculate effect sizes using established formulas. The Cochrane Handbook Chapter 6 provides the standard conversion equations (Higgins et al., 2023).
Handling Missing or Incomplete Data
Studies frequently omit the data you need. Some report medians and interquartile ranges instead of means and standard deviations. Others report only "p < 0.05" without a specific value. For median-to-mean conversion, methods by Wan et al. (2014) and Luo et al. (2018) provide validated approximations. For studies reporting only a p-value and sample size, you can convert the p-value to a t-statistic and then to an SMD, our convert p-value to confidence interval tool demonstrates this logic. When data cannot be extracted or estimated reliably, document the study as "included but not pooled" and explain why in your results section.
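The p-value-to-SMD back-calculation described above can be sketched in base R. The sample sizes and statistics below are hypothetical:

```r
# Study reports only t = 2.10 for a two-group comparison
n1 <- 40; n2 <- 42
t_stat <- 2.10
d_from_t <- t_stat * sqrt(1/n1 + 1/n2)     # SMD recovered from the t-statistic

# Study reports only a two-sided p-value (plus the direction of effect)
p <- 0.038
t_from_p <- qt(1 - p/2, df = n1 + n2 - 2)  # invert the p-value back to |t|
d_from_p <- t_from_p * sqrt(1/n1 + 1/n2)
```

Both routes recover an SMD near 0.46; document whichever conversion you use in your statistical methods section.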
Converting Between Effect Measures
Sometimes individual studies report results in different formats. One randomized controlled trial reports means and SDs, another reports a t-test result, and a third reports only a p-value and direction of effect. You need to convert all of these into a common metric. Use our comprehensive effect size calculator to calculate Cohen's d and Hedges' g from means, t-statistics, F-statistics, or correlation coefficients. For binary outcomes, you can convert between OR, RR, and RD when you know the baseline event rate. Always report your conversion method in the statistical methods section.
Creating Your Data File
Organize extracted data in a structured spreadsheet with one row per study and columns for: study ID, year, sample sizes (intervention and control), effect size, standard error or variance, and any subgroup variables. This file becomes the direct input for your R or Stata analysis script.
Step 3, Choose Your Statistical Model
The choice between a fixed-effect model and a random-effects model is the most consequential statistical decision in your meta-analysis. This choice determines how study weights are calculated and what your pooled estimate actually means.
Fixed-Effect Model
The fixed-effect model assumes that all included studies estimate the same true underlying effect. Variation across studies is attributed entirely to sampling error; the studies are treated as conceptually identical replications. Under this assumption, larger studies receive proportionally more weight because they estimate the common effect with greater precision. The fixed-effect model uses inverse-variance weighting, where each study's weight is the reciprocal of its within-study variance.
The fixed-effect model is appropriate when studies are functionally identical in population, intervention, and setting. In practice, this is rare. Multi-center pharmaceutical trials using the same protocol and the same drug dose in the same patient population come closest. For most clinical research questions, the fixed-effect assumption is implausible.
Random-Effects Model
The random-effects model assumes that the true effect varies across studies, each study estimates its own true effect, and these true effects are drawn from a distribution. A random effects meta-analysis accounts for both within-study sampling error and between-study variance (tau-squared). This model assigns more similar weights across studies because even small studies contribute information about the distribution of true effects.
The random-effects model accounts for between-study heterogeneity, making it the default recommendation for most meta-analyses in clinical and social science research (Borenstein et al., 2009). When tau-squared is zero (no between-study variance), the random-effects model collapses to the fixed-effect model, so it is never wrong to start with random-effects.
DerSimonian-Laird vs REML
The two most widely used estimators for tau-squared in random-effects models are the DerSimonian-Laird method (DerSimonian & Laird, 1986) and REML estimation (restricted maximum likelihood). DerSimonian-Laird is the historical default, computationally simple and implemented in most software. However, it underestimates tau-squared when the number of studies is small, producing confidence intervals that are too narrow. REML provides less biased variance estimates, especially with fewer than 20 studies, and is the default in the metafor package in R. For new meta-analyses, REML is the recommended estimator.
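To see what the DerSimonian-Laird estimator actually computes, here is a base-R sketch on five hypothetical studies (all effect sizes and variances invented):

```r
# Hypothetical effect sizes (yi) and within-study variances (vi)
yi <- c(0.10, 0.30, 0.60, -0.10, 0.85)
vi <- c(0.04, 0.02, 0.06, 0.09, 0.03)

wi <- 1 / vi                          # fixed-effect (inverse-variance) weights
yw <- sum(wi * yi) / sum(wi)          # fixed-effect pooled estimate
Q  <- sum(wi * (yi - yw)^2)           # Cochran's Q statistic
k  <- length(yi)
C  <- sum(wi) - sum(wi^2) / sum(wi)   # scaling constant
tau2_dl <- max(0, (Q - (k - 1)) / C)  # DL estimator, truncated at zero
```

Here tau-squared comes out around 0.09. DL is this simple closed form; REML replaces it with an iterative likelihood-based estimate that is less biased when the number of studies is small.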
Step 4, Run the Pooled Analysis and Generate Forest Plots
With effect sizes extracted and your model selected, you now run the pooled analysis. The core computation is inverse variance weighting: each study's effect size is multiplied by its weight (the inverse of its variance), summed across all studies, and divided by the total weight to produce the pooled effect estimate.
In a fixed-effect model, the weight for study i is:
w_i = 1 / v_i
where v_i is the within-study variance. In a random-effects model, the weight becomes:
w_i = 1 / (v_i + tau-squared)
where tau-squared is the estimated between-study variance. Adding tau-squared to the denominator is what makes random-effects weights more uniform: because tau-squared is constant across studies, large and small studies contribute more similarly.
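A minimal base-R sketch makes the two weighting schemes concrete (effect sizes, variances, and tau-squared are all hypothetical):

```r
yi <- c(0.10, 0.30, 0.60, -0.10, 0.85)   # hypothetical effect sizes
vi <- c(0.04, 0.02, 0.06, 0.09, 0.03)    # within-study variances
tau2 <- 0.0877                           # assumed between-study variance

w_fe <- 1 / vi             # fixed-effect weights
w_re <- 1 / (vi + tau2)    # random-effects weights

pooled_fe <- sum(w_fe * yi) / sum(w_fe)
pooled_re <- sum(w_re * yi) / sum(w_re)
se_re <- sqrt(1 / sum(w_re))   # standard error of the random-effects estimate
```

The weight ratio between the most and least precise study drops from 4.5 under the fixed-effect model to about 1.6 under random effects, which is exactly the flattening described above.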
The Forest Plot
A meta-analysis produces a forest plot, the signature output that displays individual study results and the pooled estimate on a single axis. Each study appears as a square (sized proportional to its weight) centered on its effect estimate, with a horizontal line representing the 95% confidence interval. The pooled estimate appears as a diamond at the bottom, with the diamond's width representing its confidence interval. A forest plot visualizes the pooled effect size, individual study contributions, and the degree of consistency across studies at a single glance.
Beyond the confidence interval for the pooled effect, every random-effects meta-analysis should report a prediction interval. The confidence interval describes the uncertainty around the average true effect. The prediction interval describes the range within which the true effect of a future study is expected to fall. A meta-analysis might show a statistically significant pooled OR of 0.65 (95% CI: 0.50-0.85), but the prediction interval might be 0.30-1.40, indicating that in some future settings, the intervention could actually be harmful. Prediction intervals are computed from tau-squared and provide far more clinically relevant information than confidence intervals alone. The metafor package in R reports them by default with the predict() function.
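The prediction interval itself is simple to compute by hand. A common formulation uses a t-distribution with k - 2 degrees of freedom; all numbers in this base-R sketch are hypothetical:

```r
k <- 8              # number of studies
pooled <- -0.45     # random-effects pooled SMD
se_pooled <- 0.10   # standard error of the pooled estimate
tau2 <- 0.15        # estimated between-study variance

t_crit <- qt(0.975, df = k - 2)
half_width <- t_crit * sqrt(tau2 + se_pooled^2)
pi_lower <- pooled - half_width   # 95% prediction interval, lower bound
pi_upper <- pooled + half_width   # 95% prediction interval, upper bound
```

Here the 95% confidence interval would be roughly -0.65 to -0.25, but the prediction interval runs from about -1.43 to 0.53: significant on average, yet possibly null or harmful in a new setting.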
Example R Code
Below is a minimal R script for running a random-effects meta-analysis and generating a forest plot using the metafor package:
library(metafor)
# Data: yi = effect sizes, vi = variances
dat <- escalc(measure = "SMD", m1i = m1, sd1i = sd1, n1i = n1,
m2i = m2, sd2i = sd2, n2i = n2, data = mydata)
# Fit random-effects model (REML is default)
res <- rma(yi, vi, data = dat)
# Forest plot
forest(res, slab = dat$study, header = TRUE,
xlab = "Standardized Mean Difference")
# Prediction interval
predict(res)
This code computes Hedges' g (the default for measure = "SMD" in metafor), fits a REML random-effects model, and generates a publication-ready forest plot. Include scripts like this as supplementary material with your manuscript.
Step 5, Assess Heterogeneity
Heterogeneity is the degree to which the true effects vary across studies. Assessing statistical heterogeneity is mandatory; it determines whether your pooled estimate should be interpreted as a single value or as an average across a distribution of different effects.
I-Squared
The I-squared statistic quantifies the percentage of total variability across studies that is due to true heterogeneity rather than sampling error, on a 0-100% scale. Cochrane classifies I-squared heterogeneity as low (0-40%), moderate (30-60%), substantial (50-90%), and considerable (75-100%) (Higgins et al., 2023). Note the overlapping ranges: interpretation depends on context, not rigid cutoffs. An I-squared of 60% in a pharmacological meta-analysis of identical drug doses may be concerning, while 60% in a behavioral intervention meta-analysis may be expected and acceptable.
While I-squared tells you the proportion of variability due to heterogeneity, tau-squared estimates between-study variance in absolute terms. A tau-squared of 0.10 for an SMD meta-analysis means the standard deviation of true effects across studies is approximately 0.32 (the square root of tau-squared). This absolute value is essential for computing prediction intervals and for determining whether heterogeneity is clinically meaningful, not just statistically detectable.
Cochran's Q Test
Cochran's Q is a chi-squared test of the null hypothesis that all studies share the same true effect. A significant Q (p < 0.10 is the conventional threshold, not p < 0.05) indicates heterogeneity. However, Q has low power with few studies and excessive power with many studies, so it should be interpreted alongside I-squared and tau-squared, never in isolation.
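Q, its p-value, and I-squared all come from the same quantities, as this base-R sketch with hypothetical data shows:

```r
yi <- c(0.10, 0.30, 0.60, -0.10, 0.85)   # hypothetical effect sizes
vi <- c(0.04, 0.02, 0.06, 0.09, 0.03)    # within-study variances

wi <- 1 / vi
yw <- sum(wi * yi) / sum(wi)              # fixed-effect pooled estimate
Q  <- sum(wi * (yi - yw)^2)               # Cochran's Q
df <- length(yi) - 1
p_Q <- pchisq(Q, df, lower.tail = FALSE)  # Q-test p-value
I2 <- max(0, (Q - df) / Q) * 100          # I-squared as a percentage
```

With these numbers Q is about 12.9 on 4 degrees of freedom (p around 0.01) and I-squared is about 69%, substantial heterogeneity by the Cochrane bands above.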
Prediction Intervals Revisited
The prediction interval deserves emphasis here because it is the most underreported heterogeneity metric. A meta-analysis with I-squared = 80% tells you that most variability is not due to chance. But the prediction interval tells you the actual range of effects you would expect to see in a new study, which is what clinicians and policymakers actually need to know. Always report prediction intervals alongside I-squared and tau-squared.
Step 6, Conduct Sensitivity and Subgroup Analyses
Sensitivity analysis and subgroup analysis explore the robustness of your pooled estimate and investigate potential sources of heterogeneity. These analyses separate a rigorous meta-analysis from a superficial one.
Leave-One-Out Analysis
Leave-one-out analysis systematically removes one study at a time and recalculates the pooled effect. If the pooled estimate changes substantially when a single study is removed, that study is highly influential and warrants scrutiny. Was it an outlier? Did it have a different population, different dose, or a high risk of bias? Use our leave-one-out analysis tool to run this analysis on your data. In R, the leave1out() function in metafor automates this:
leave1out(res)
Subgroup Analysis
Subgroup analysis stratifies studies by a categorical moderator (study design, geographic region, intervention dose, risk of bias) and computes separate pooled estimates for each subgroup. The test for subgroup differences examines whether the pooled effects differ significantly across categories.
Critical rule: pre-specify your subgroup analyses in your protocol before analyzing data. Post-hoc subgroups, invented after seeing the results, are hypothesis-generating only. Reviewers and editors will scrutinize unplanned subgroup analyses as potential data dredging. Limit yourself to 3-5 subgroups with a clinical or methodological rationale.
Meta-Regression
For continuous moderators (publication year, mean participant age, intervention duration), meta-regression analysis extends subgroup analysis by fitting the moderator as a continuous predictor. Meta-regression requires at least 10 studies to have adequate power, and results should be interpreted cautiously because it is an observational association across studies, not a within-study causal estimate. Use our meta-regression input tool to format your data for meta-regression in R.
In metafor, meta-regression is straightforward:
res_reg <- rma(yi, vi, mods = ~ year + dose, data = dat)
summary(res_reg)
Report the regression coefficient, its confidence interval, the residual I-squared (heterogeneity remaining after accounting for the moderator), and the R-squared analog (proportion of between-study variance explained).
Step 7, Test for Publication Bias
Publication bias, the selective publication of studies with positive or statistically significant results, threatens the validity of every meta-analysis. If your pooled estimate is based on a biased sample of studies, it will overestimate the true effect. Publication bias detection uses both visual and statistical methods.
Funnel Plot
A funnel plot graphs each study's effect size (x-axis) against a measure of its precision, typically the standard error (y-axis). In the absence of publication bias, studies scatter symmetrically around the pooled estimate in an inverted funnel shape: precise (large) studies cluster at the top, while imprecise (small) studies spread widely at the bottom. A funnel plot suggests publication bias when it shows asymmetry, typically missing studies in the bottom-left corner (small studies with non-significant results). For guidance on reading funnel plots, see our guide to interpreting funnel plots. Generate your own with our build-a-funnel-plot tool.
Egger's Test
Egger's regression test formally tests funnel plot asymmetry by regressing the standardized effect size against its precision. A significant intercept (p < 0.10) suggests asymmetry consistent with publication bias or other small-study effects. Egger's test requires a minimum of 10 studies to have adequate power (Sterne et al., 2011). With fewer than 10 studies, the test is unreliable and should not be used; report the funnel plot visually instead. In R:
regtest(res, model = "lm")
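For intuition, Egger's regression can also be written out directly: regress the standardized effect on precision and inspect the intercept. This is an illustrative base-R sketch with invented data, not a replacement for regtest():

```r
# Ten hypothetical studies (Egger's test needs 10+)
yi  <- c(0.10, 0.30, 0.60, -0.10, 0.85, 0.25, 0.40, 0.15, 0.55, 0.05)
vi  <- c(0.04, 0.02, 0.06, 0.09, 0.03, 0.05, 0.07, 0.02, 0.08, 0.01)
sei <- sqrt(vi)

z    <- yi / sei   # standardized effect sizes
prec <- 1 / sei    # precision

fit <- lm(z ~ prec)                                 # Egger's regression
egger_intercept <- coef(summary(fit))["(Intercept)", ]
egger_intercept                                     # estimate, SE, t, p-value
```

An intercept p-value below 0.10 would flag asymmetry consistent with small-study effects.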
Trim-and-Fill
The trim and fill method estimates the number of missing studies needed to make the funnel plot symmetric, imputes those studies, and recalculates the pooled estimate. While useful as a sensitivity check, trim-and-fill assumes that the asymmetry is entirely due to publication bias (rather than other sources of small-study effects) and should be interpreted cautiously. In R:
tf <- trimfill(res)
funnel(tf)
Begg's Test
Begg's test (rank correlation test) is an alternative to Egger's test that uses Kendall's tau to test the association between effect sizes and their variances. It has lower power than Egger's test and is less commonly used today, but some reviewers still request it. In R:
ranktest(res)
Step 8, Rate Evidence Certainty with GRADE
A pooled effect estimate is only as useful as the confidence you can place in it. The GRADE framework (Grading of Recommendations, Assessment, Development, and Evaluation) provides a structured approach to rating the certainty of evidence from meta-analyses. GRADE produces a summary of findings table that rates each outcome as high, moderate, low, or very low certainty.
GRADE evaluates five domains that can downgrade certainty:
| Domain | Downgrades When |
| --- | --- |
| Risk of bias | Included studies have high risk of bias (inadequate randomization, blinding, attrition) |
| Inconsistency | High heterogeneity (I-squared > 50%) without explanation |
| Indirectness | Studies do not directly address the clinical question (different population, comparator, or outcome) |
| Imprecision | Wide confidence intervals crossing the line of clinical significance |
| Publication bias | Funnel plot asymmetry, significant Egger's test, or evidence of selective reporting |
And three domains that can upgrade certainty (applicable mainly to observational studies):
Large magnitude of effect (RR > 2 or < 0.5)
Dose-response gradient
Residual confounding would reduce the effect
The summary of findings table presents the pooled estimate, confidence interval, number of studies and participants, and the GRADE rating for each outcome, communicating both the statistical result and the confidence warranted by the underlying evidence, a far more useful output than a pooled estimate alone. Most clinical journals, Cochrane, and guideline organizations require GRADE assessments.
Step 9, Report Your Results
Transparent reporting ensures that readers can evaluate your methods, reproduce your analysis, and build on your findings. The PRISMA 2020 statement provides a 27-item checklist for reporting systematic reviews and meta-analyses.
Essential Reporting Elements
Your results section must include:
Forest plots for every pre-specified outcome, with study labels, effect estimates, confidence intervals, weights, and the pooled diamond. Use our free forest plot tool to generate publication-ready figures.
Heterogeneity statistics for each analysis: I-squared, tau-squared, Cochran's Q (with degrees of freedom and p-value), and prediction intervals.
Funnel plots with Egger's test results for analyses including 10 or more studies.
Sensitivity analysis results, including leave-one-out analysis and any analyses excluding high risk-of-bias studies.
Subgroup analysis results with tests for subgroup differences.
GRADE summary of findings table rating evidence certainty for each outcome.
Writing the Results Narrative
Present pooled estimates with their confidence intervals and prediction intervals in the text. State the direction and magnitude of the effect, then immediately report the heterogeneity. For example: "The pooled standardized mean difference was -0.45 (95% CI: -0.62 to -0.28; 95% PI: -1.10 to 0.20), indicating a moderate treatment effect on average, though the prediction interval includes the possibility of no effect or harm in some settings (I-squared = 72%, tau-squared = 0.15)."
Always report both statistical significance and clinical significance. A pooled odds ratio of 0.92 (95% CI: 0.85-0.99) is statistically significant but may not be clinically meaningful if the absolute risk reduction is less than 1%.
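That last point is easy to verify with arithmetic. Under a hypothetical 5% baseline risk, an OR of 0.92 translates into a tiny absolute benefit, as this base-R sketch shows:

```r
or <- 0.92             # statistically significant pooled odds ratio
baseline_risk <- 0.05  # assumed control-group event rate

odds_c <- baseline_risk / (1 - baseline_risk)  # control-group odds
odds_t <- or * odds_c                          # treated-group odds
risk_t <- odds_t / (1 + odds_t)                # treated-group risk

arr <- baseline_risk - risk_t   # absolute risk reduction
nnt <- 1 / arr                  # number needed to treat
```

The absolute risk reduction is under 0.4 percentage points, an NNT above 260: statistically significant, clinically marginal.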
Supplementary Material
Include your full R or Stata analysis code, the extracted data file, and any sensitivity analyses not shown in the main text as supplementary material. This allows peer reviewers to verify your results and future researchers to update your meta-analysis when new studies are published.
Software for Meta-Analysis
The choice of software determines reproducibility, analytical flexibility, and publication readiness. Script-based platforms, particularly R (free and open source) and Stata, are strongly preferred over proprietary black-box tools.
R
R is the most widely used open-source platform for meta-analysis, with several specialized packages:
metafor: The most comprehensive package. Supports fixed-effect and random-effects models (DL, REML, PM, and other estimators), meta-regression, multivariate meta-analysis, and network meta-analysis. Defaults to REML. Produces publication-ready forest plots, funnel plots, and diagnostic plots. Maintained by Wolfgang Viechtbauer.
meta: A user-friendly wrapper that calls metafor internally. Provides convenient functions like metabin() for binary outcomes and metacont() for continuous outcomes. Excellent for standard pairwise meta-analyses.
netmeta: Specialized for network meta-analysis (comparing multiple interventions simultaneously). Produces league tables and SUCRA rankings.
dmetar: A companion package to the "Doing Meta-Analysis in R" guide. Provides additional functions for outlier detection, influence analysis, and power calculations.
For researchers learning how to perform a meta-analysis in R, the combination of metafor for computation and forest() for visualization covers the vast majority of use cases.
Stata
Stata provides robust meta-analysis capabilities through community-contributed commands:
metan: The workhorse command for pairwise meta-analysis. Handles both fixed- and random-effects models with a single command.
metafunnel: Generates funnel plots for publication bias assessment.
metareg: Runs meta-regression with continuous and categorical moderators.
network: Stata's suite for network meta-analysis.
Stata's advantage is its integrated environment, data management, analysis, and graphing in a single platform. Its disadvantage is the license cost.
CMA (Comprehensive Meta-Analysis)
CMA is a commercial point-and-click tool designed specifically for meta-analysis. While user-friendly, it is a black box: users cannot inspect or share the underlying code, making results non-reproducible. Peer reviewers increasingly expect reproducible scripts, which CMA cannot provide. We do not recommend CMA for publication-bound meta-analyses.
RevMan (Review Manager)
RevMan is Cochrane's free meta-analysis software. It integrates data entry, risk of bias assessment, and meta-analysis in a single workflow. RevMan is adequate for standard Cochrane reviews with pairwise comparisons but lacks the flexibility for meta-regression, network meta-analysis, or custom sensitivity analyses. It is appropriate for Cochrane reviews but limiting for independent research.
| Software | Cost | Reproducible Code | Meta-Regression | Network Meta-Analysis | Best For |
|---|---|---|---|---|---|
| R (metafor) | Free | Yes | Yes | Yes (netmeta) | Publication-bound research, advanced methods |
| Stata | $125-$1,395/yr | Yes | Yes | Yes | Integrated analysis environment |
| CMA | $195-$1,295 | No | Limited | No | Teaching, quick exploratory analyses |
| RevMan | Free | No | No | No | Standard Cochrane reviews |
Script-based software (R, Stata) matters because transparency is the currency of credible meta-analysis. When reviewers can run your code and reproduce your forest plots, confidence in your results increases. When they cannot, they question whether the outputs were generated correctly.
Common Meta-Analysis Mistakes
Even experienced researchers make methodological errors that undermine their meta-analyses. Recognizing these mistakes in advance prevents months of revision and resubmission cycles.
Using a fixed-effect model with high heterogeneity. When I-squared exceeds 50% and the Q-test is significant, a fixed-effect model produces confidence intervals that are too narrow because it ignores between-study variance. The pooled estimate under a fixed-effect model with heterogeneous studies gives disproportionate weight to the largest study and fails to account for the reality that true effects vary. Always use a random-effects model unless you have a strong justification for assuming a common true effect.
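To see why the fixed-effect interval is too narrow under heterogeneity, the following base-R sketch pools hypothetical log risk ratios both ways, using the DerSimonian-Laird estimator of tau-squared (all yi/vi values are made up for illustration):

```r
# Toy log risk ratios (yi) and sampling variances (vi); hypothetical numbers
yi <- c(-0.50, -0.10, -0.70, 0.05, -0.35)
vi <- c(0.040, 0.090, 0.050, 0.120, 0.070)
k  <- length(yi)

# Fixed-effect pooling: inverse-variance weights
w_fe   <- 1 / vi
est_fe <- sum(w_fe * yi) / sum(w_fe)
se_fe  <- sqrt(1 / sum(w_fe))

# Heterogeneity: Cochran's Q, DerSimonian-Laird tau^2, and I^2
Q    <- sum(w_fe * (yi - est_fe)^2)
C    <- sum(w_fe) - sum(w_fe^2) / sum(w_fe)
tau2 <- max(0, (Q - (k - 1)) / C)
I2   <- max(0, 100 * (Q - (k - 1)) / Q)

# Random-effects pooling: tau^2 added to each study's variance
w_re   <- 1 / (vi + tau2)
est_re <- sum(w_re * yi) / sum(w_re)
se_re  <- sqrt(1 / sum(w_re))

c(fixed_se = se_fe, random_se = se_re, tau2 = tau2, I2 = I2)
```

Whenever tau-squared is positive, se_re exceeds se_fe, which is exactly the extra uncertainty the fixed-effect model ignores.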
Omitting prediction intervals. The confidence interval around the pooled estimate tells you the precision of the average effect. The prediction interval tells you the range of effects you would expect in a new study. A statistically significant pooled estimate with a prediction interval that crosses the null indicates that the intervention may not work in all settings, a critical insight that the confidence interval alone conceals.
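The distinction is easy to compute. This base-R sketch uses hypothetical pooled results (k, est, se, tau2 are all invented numbers) and the Higgins/Thompson/Spiegelhalter prediction-interval formula with a t distribution on k - 2 degrees of freedom:

```r
# Hypothetical pooled results from a random-effects meta-analysis of k studies
k    <- 8
est  <- -0.30   # pooled log risk ratio
se   <- 0.10    # standard error of the pooled estimate
tau2 <- 0.04    # between-study variance

# 95% CI for the average effect (normal approximation)
ci <- est + c(-1, 1) * qnorm(0.975) * se

# 95% prediction interval for the true effect in a new study
pi <- est + c(-1, 1) * qt(0.975, df = k - 2) * sqrt(tau2 + se^2)

round(exp(rbind(CI = ci, PI = pi)), 2)  # back-transform to risk-ratio scale
```

With these toy numbers the CI excludes the null while the prediction interval crosses it, which is precisely the situation the paragraph above warns about.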
Conducting post-hoc subgroup analyses and presenting them as confirmatory. Subgroup analyses that were not pre-specified in the protocol are exploratory. Presenting them without acknowledging their post-hoc nature is misleading and will be identified by experienced reviewers.
Mixing incompatible effect measures. Pooling odds ratios from some studies with risk ratios from others without conversion produces meaningless results. Standardize all studies to the same effect measure before pooling.
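When some studies report only an OR, one common approach is the approximation attributed to Zhang and Yu (1998), which converts an OR to an RR given the control-group baseline risk. A hedged base-R sketch with hypothetical values:

```r
# Approximate OR -> RR conversion given baseline (control-group) risk p0.
# This is an approximation; it degrades for extreme p0 or OR values.
or_to_rr <- function(or, p0) or / (1 - p0 + p0 * or)

or_to_rr(or = 0.60, p0 = 0.20)  # low baseline risk: RR stays close to the OR
or_to_rr(or = 0.60, p0 = 0.50)  # high baseline risk: RR pulled toward 1
```

Note that the conversion requires an estimate of baseline risk for each study, which itself introduces uncertainty; document the assumed p0 values in your methods.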
Running Egger's test with fewer than 10 studies. Egger's regression test for funnel plot asymmetry requires a minimum of 10 studies to have adequate power (Sterne et al., 2011). With fewer studies, the test cannot distinguish real asymmetry from sampling variation, and a non-significant result falsely reassures.
Ignoring the clinical context of heterogeneity. An I-squared of 75% does not automatically invalidate a meta-analysis. If the heterogeneity is explained by pre-specified subgroup variables (e.g., drug dose, disease severity) and the subgroup-specific estimates are clinically coherent, the meta-analysis remains informative. The mistake is reporting I-squared without investigating what drives it.
Failing to provide reproducible code. A forest plot without the underlying analysis code is an unverifiable claim. Peer reviewers increasingly request R or Stata scripts as supplementary material, and journals are beginning to mandate code sharing for meta-analyses.
For studies with access to raw participant-level data, the IPD meta-analysis guide covers the one-stage and two-stage frameworks, hierarchical modelling, and how individual patient data changes the analysis.
Diagnostic accuracy reviews require a different statistical machinery than intervention reviews, and the diagnostic test accuracy meta-analysis guide walks through the bivariate model, HSROC curves, and QUADAS-2 risk of bias.
Bringing It All Together
Learning how to do a meta-analysis means mastering a nine-step pipeline: (1) define outcomes and choose your effect measure; (2) extract or calculate effect sizes from each included study; (3) select the appropriate statistical model (almost always random-effects with REML); (4) run the pooled analysis and generate forest plots; (5) assess heterogeneity using I-squared, tau-squared, and prediction intervals; (6) conduct pre-specified sensitivity and subgroup analyses; (7) test for publication bias using funnel plots and Egger's test; (8) rate the certainty of evidence with GRADE; and (9) report your results transparently per PRISMA 2020.
Each step has decision points that require both statistical knowledge and clinical judgment. The software you use, ideally R (metafor) or Stata, should be script-based and produce reproducible code. The outputs you deliver (forest plots, funnel plots, heterogeneity statistics, sensitivity analyses, and a GRADE summary-of-findings table) must meet journal standards on first submission.
For researchers who need publication-ready results without the learning curve, consider our professional research services. Our biostatisticians specialize in meta-analysis across clinical medicine, epidemiology, and health technology assessment. We deliver reproducible R code, publication-ready forest plots, GRADE-rated evidence tables, and a complete statistical methods section. Learn more about our meta-analysis service or explore the process of hiring a meta-analysis expert.
References
Borenstein, M., Hedges, L.V., Higgins, J.P.T., & Rothstein, H.R. (2009). Introduction to Meta-Analysis. Wiley.
DerSimonian, R., & Laird, N. (1986). Meta-analysis in clinical trials. Controlled Clinical Trials, 7(3), 177-188.
Higgins, J.P.T., Thomas, J., Chandler, J., Cumpston, M., Li, T., Page, M.J., & Welch, V.A. (Eds.) (2023). Cochrane Handbook for Systematic Reviews of Interventions (Version 6.4). Cochrane.
Sterne, J.A.C., Sutton, A.J., Ioannidis, J.P.A., et al. (2011). Recommendations for examining and interpreting funnel plot asymmetry in meta-analyses of randomised controlled trials. BMJ, 343, d4002.