How to Do a Meta-Analysis: A Step-by-Step Guide for Researchers
Learn how to do a meta-analysis in 9 steps: define outcomes, calculate effect sizes, choose a model, run pooled analysis, assess heterogeneity, test for publication bias, and rate evidence with GRADE.
A meta-analysis statistically pools effect sizes from multiple studies to produce a single quantitative estimate of a treatment effect or association
The process follows 9 steps: define outcomes, extract/calculate effect sizes, select a statistical model, run pooled analysis, generate forest plots, assess heterogeneity, conduct sensitivity analyses, test for publication bias, and rate evidence certainty
Random-effects models (DerSimonian-Laird or REML) are recommended over fixed-effect models when clinical or methodological heterogeneity is expected
I-squared quantifies the percentage of variability due to heterogeneity rather than chance; Cochrane classifies 0-40% as low, 30-60% as moderate, 50-90% as substantial, and 75-100% as considerable
Publication bias is assessed using funnel plots, Egger's test (requires 10+ studies), and trim-and-fill analysis
Every meta-analysis should report forest plots, heterogeneity statistics (I-squared, tau-squared, prediction intervals), and sensitivity analyses at minimum
R (meta, metafor packages) and Stata provide transparent, reproducible analysis and are preferable to black-box software
This is the definitive guide on how to do a meta-analysis. A meta-analysis is a statistical method that combines effect sizes from multiple independent studies to produce a single pooled estimate of a treatment effect or association. It uses inverse variance weighting, assesses heterogeneity via I-squared and tau-squared, and produces forest plots to visualize results. Meta-analysis is typically conducted within a systematic review following Cochrane Handbook methodology (Higgins et al., 2023).
Whether you are pooling clinical trial results, observational cohort data, or diagnostic accuracy studies, this step-by-step meta-analysis guide for beginners walks you through every decision point with the statistical reasoning behind it. Our biostatisticians have conducted meta-analyses across clinical medicine, public health, and pharmacology; the most common error we see is ignoring prediction intervals, which describe the range of true effects you would expect in a future study conducted in a new setting. This guide covers the complete meta-analysis methodology from defining outcomes through GRADE-rated certainty of evidence, with links to free calculators and example R code at every step. For the broader evidence synthesis process, see our complete systematic review guide.
What Is a Meta-Analysis?
Nine-step meta-analysis pipeline with deliverable per step. Source: Cochrane Handbook v6.5, Borenstein 2021.
The distinction between a meta-analysis and a systematic review is fundamental. A systematic review is the complete methodological framework: protocol registration, database searching, screening, data extraction, and quality appraisal. A meta-analysis is the optional statistical component that sits inside a systematic review when the included studies are sufficiently homogeneous to pool quantitatively. Not every systematic review includes a meta-analysis, and not every meta-analysis is embedded in a systematic review, though best practice favors the combination.
Pro Tip
Always report prediction intervals
I-squared tells you the proportion of variability attributable to heterogeneity, but the prediction interval shows the expected range of true effects in future settings, which is far more clinically useful.
Pro Tip
Pre-specify subgroup analyses in your protocol
Post-hoc subgroups are hypothesis-generating only. Reviewers and editors scrutinize unplanned subgroup analyses for data dredging.
Frequently Asked Questions
How many studies do you need for a meta-analysis?
A minimum of 2 studies is technically possible, but 5+ is recommended for meaningful heterogeneity assessment, and 10+ for reliable publication bias testing (Higgins et al., 2023).
What is the difference between a systematic review and a meta-analysis?
A systematic review is the broader methodology (search, screen, appraise). A meta-analysis is the optional statistical component that quantitatively pools effect sizes.
Can you do a meta-analysis without a systematic review?
Yes, standalone meta-analyses exist, but best practice embeds meta-analysis within a systematic review to ensure comprehensive, unbiased study identification.
What does I-squared measure?
I-squared measures the percentage of total variability across studies that is due to heterogeneity rather than sampling error. Cochrane classifies 0-40% as low, 30-60% as moderate, 50-90% as substantial, and 75-100% as considerable.
When should you use a random-effects model?
Use random-effects when you expect clinical or methodological differences between studies, which is almost always the case in medical research.
What software is best for meta-analysis?
R (meta/metafor packages) and Stata are preferred for transparency and reproducibility. They produce publication-ready outputs and allow advanced techniques like network meta-analysis.
How do you read a forest plot?
Each horizontal line represents one study's effect estimate and confidence interval. The diamond at the bottom shows the pooled effect. If the diamond doesn't cross the line of no effect, the result is statistically significant.
Need help with your meta-analysis?
Our PhD statisticians run complete meta-analyses: effect sizes, forest plots, heterogeneity testing, and publication-ready results sections.
Reading About Meta-Analysis? Our PhD Team Runs Them Every Day.
From data extraction to forest plots, sensitivity analysis, and a journal-ready manuscript. We handle the full meta-analysis so you can focus on your research question.
Dr. Sarah Mitchell holds a PhD in Biostatistics from Johns Hopkins Bloomberg School of Public Health and has over 15 years of experience in systematic review methodology and meta-analysis. She has authored or co-authored 40+ peer-reviewed publications in journals including the Journal of Clinical Epidemiology, BMC Medical Research Methodology, and Research Synthesis Methods. A former Cochrane Review Group statistician and current editorial board member of Systematic Reviews, Dr. Mitchell has supervised 200+ evidence synthesis projects across clinical medicine, public health, and social sciences.
Not sure whether heterogeneity is a problem? Calculate I-squared and tau-squared with our free heterogeneity calculator.
Need expert help with your meta-analysis? Research Gold's biostatisticians deliver publication-ready forest plots, funnel plots, and GRADE-rated evidence. Explore our services.
In formal terms, a meta-analysis is a statistical synthesis method (Borenstein et al., 2009). It is a component of a systematic review. It requires effect size calculation from each included study. And it produces a forest plot, the defining visualization of pooled results. These four relationships anchor every decision in the nine-step process below.
When Is a Meta-Analysis Appropriate?
A meta-analysis is appropriate when the included studies are sufficiently similar in population, intervention, comparator, and outcome (PICO) to justify statistical pooling. The decision to pool is clinical and methodological before it is statistical.
Three conditions must be met. First, clinical homogeneity: the studies must address the same or closely related clinical question. Pooling a drug trial in pediatric asthma with a surgical trial in adult COPD produces a number, but not a meaningful one. Second, methodological similarity: studies should share broadly comparable designs (e.g., all RCTs, or all prospective cohorts). Mixing randomized controlled trials with cross-sectional studies introduces confounding that no statistical model can resolve. Third, sufficient studies: while two studies is the technical minimum, five or more are recommended for meaningful heterogeneity assessment, and ten or more are needed for reliable publication bias detection using Egger's test (Sterne et al., 2011).
When should you not meta-analyze? When the included studies differ so profoundly in design, population, or intervention that pooling them would obscure rather than clarify. In these cases, narrative synthesis, a structured qualitative summary, is more appropriate. The Cochrane Handbook explicitly warns against meta-analysis when "there is considerable variation in the results and the variation is not explained by study characteristics" (Higgins et al., 2023). If you determine that pooling is unjustified after examining your data, present the results in a structured table with individual study effect sizes rather than forcing a misleading pooled estimate.
Step 1, Define Your Outcomes and Effect Measures
The first step is selecting the outcomes to analyze and the effect measure to express each one. This decision shapes every downstream calculation.
Continuous Outcomes
For continuous outcomes (blood pressure reduction, pain scores, cognitive test performance), the primary options are the standardized mean difference and the weighted mean difference. The standardized mean difference (SMD) is used when studies measure the same construct on different scales, for example, depression measured by both the PHQ-9 and the HAM-D. The two most common SMD variants are Cohen's d and Hedges' g. Hedges' g corrects Cohen's d for small samples by applying a correction factor that removes upward bias in studies with fewer than 20 participants per arm (Borenstein et al., 2009). For most meta-analyses, Hedges' g is preferred. When all studies use the same measurement scale, the weighted mean difference (WMD) preserves the original units and is easier to interpret clinically.
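To make the small-sample correction concrete, here is a base-R sketch computing Cohen's d, Hedges' g, and the variance of g from summary statistics. All numbers are invented for illustration:

```r
# Hypothetical summary statistics for one two-arm study
m1 <- 24.5; sd1 <- 6.2; n1 <- 30   # intervention group
m2 <- 28.1; sd2 <- 5.8; n2 <- 32   # control group

# Pooled standard deviation across the two groups
sp <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / (n1 + n2 - 2))

d <- (m1 - m2) / sp                    # Cohen's d
J <- 1 - 3 / (4 * (n1 + n2 - 2) - 1)   # small-sample correction factor
g <- J * d                             # Hedges' g (slightly shrunk toward 0)

# Variance of g, needed later for inverse-variance weighting
v_d <- (n1 + n2) / (n1 * n2) + d^2 / (2 * (n1 + n2))
v_g <- J^2 * v_d
```

With these inputs d is about -0.60 and g about -0.59; the correction matters more as samples shrink.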
Binary Outcomes
For binary outcomes (mortality, treatment response, adverse events), the standard measures are the odds ratio (OR), risk ratio (RR), and risk difference (RD). Odds ratio calculation is the most common in meta-analysis because the OR has favorable mathematical properties: it is symmetric and unbounded, and it can be computed from case-control studies, where the RR cannot. Risk ratio estimation is more intuitive clinically ("patients are 1.5 times more likely to respond") and is preferred for prospective studies. Risk difference expresses the absolute difference in event rates and is needed to calculate the number needed to treat.
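For a quick sense of how these measures relate, here is a base-R sketch computing OR, RR, and RD from a single hypothetical 2x2 table (all counts invented):

```r
# Hypothetical 2x2 table
e1 <- 15; ne1 <- 35   # intervention: events, non-events (n = 50)
e0 <- 25; ne0 <- 25   # control: events, non-events (n = 50)

or <- (e1 * ne0) / (ne1 * e0)                   # odds ratio
rr <- (e1 / (e1 + ne1)) / (e0 / (e0 + ne0))     # risk ratio
rd <- e1 / (e1 + ne1) - e0 / (e0 + ne0)         # risk difference
se_log_or <- sqrt(1/e1 + 1/ne1 + 1/e0 + 1/ne0)  # SE of log(OR) for weighting
```

Note how the same table yields an OR of about 0.43 but an RR of 0.60: the OR always sits further from 1, which is why reporting an OR as if it were an RR overstates the effect.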
| Outcome Type | Effect Measure | When to Use | Key Property |
| --- | --- | --- | --- |
| Continuous, same scale | Weighted Mean Difference | All studies use identical measurement tool | Preserves original units |
| Continuous, different scales | Hedges' g (SMD) | Studies measure same construct on different scales | Unitless, corrected for small samples |
| Binary | Odds Ratio | Case-control studies or when RR is inappropriate | Symmetric, can be computed from any 2x2 table |
| Binary | Risk Ratio | Prospective studies, clinical interpretability | Directly interpretable as relative risk |
| Binary | Risk Difference | Absolute effect needed (NNT calculation) | Scale-dependent, affected by baseline risk |
Choose your effect measure before extracting data, and register this choice in your protocol. Changing the effect measure after seeing results introduces selective reporting bias.
Step 2, Extract or Calculate Effect Sizes
With your effect measure defined, the next task is extracting or computing individual study effect sizes and their variances. Every meta-analysis requires effect size calculation from each included study before pooling can occur.
What Data to Extract
For continuous outcomes, extract sample sizes, means, and standard deviations for each group. For binary outcomes, extract the 2x2 table (events and non-events in intervention and control groups). When studies report only summary statistics, p-values, confidence intervals, or t-statistics, you can back-calculate effect sizes using established formulas. The Cochrane Handbook Chapter 6 provides the standard conversion equations (Higgins et al., 2023).
Handling Missing or Incomplete Data
Studies frequently omit the data you need. Some report medians and interquartile ranges instead of means and standard deviations. Others report only "p < 0.05" without a specific value. For median-to-mean conversion, methods by Wan et al. (2014) and Luo et al. (2018) provide validated approximations. For studies reporting only a p-value and sample size, you can convert the p-value to a t-statistic and then to an SMD, our convert p-value to confidence interval tool demonstrates this logic. When data cannot be extracted or estimated reliably, document the study as "included but not pooled" and explain why in your results section.
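The p-value-to-SMD back-calculation described above can be sketched in base R. The sample sizes and statistics below are hypothetical:

```r
# Study reports only t = 2.10 for a two-group comparison
n1 <- 40; n2 <- 42
t_stat <- 2.10
d_from_t <- t_stat * sqrt(1/n1 + 1/n2)     # SMD recovered from the t-statistic

# Study reports only a two-sided p-value (plus the direction of effect)
p <- 0.038
t_from_p <- qt(1 - p/2, df = n1 + n2 - 2)  # invert the p-value back to |t|
d_from_p <- t_from_p * sqrt(1/n1 + 1/n2)
```

Both routes recover an SMD near 0.46; document whichever conversion you use in your statistical methods section.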
Converting Between Effect Measures
Sometimes individual studies report results in different formats. One randomized controlled trial reports means and SDs, another reports a t-test result, and a third reports only a p-value and direction of effect. You need to convert all of these into a common metric. Use our comprehensive effect size calculator to calculate Cohen's d and Hedges' g from means, t-statistics, F-statistics, or correlation coefficients. For binary outcomes, you can convert between OR, RR, and RD when you know the baseline event rate. Always report your conversion method in the statistical methods section.
Creating Your Data File
Organize extracted data in a structured spreadsheet with one row per study and columns for: study ID, year, sample sizes (intervention and control), effect size, standard error or variance, and any subgroup variables. This file becomes the direct input for your R or Stata analysis script.
Step 3, Choose Your Statistical Model
The choice between a fixed-effect model and a random-effects model is the most consequential statistical decision in your meta-analysis. This choice determines how study weights are calculated and what your pooled estimate actually means.
Fixed-Effect Model
The fixed-effect model assumes that all included studies estimate the same true underlying effect. Variation across studies is attributed entirely to sampling error; the studies are treated as conceptually identical replications. Under this assumption, larger studies receive proportionally more weight because they estimate the common effect with greater precision. The fixed-effect model uses inverse-variance weighting, where each study's weight is the reciprocal of its within-study variance.
The fixed-effect model is appropriate when studies are functionally identical in population, intervention, and setting. In practice, this is rare. Multi-center pharmaceutical trials using the same protocol and the same drug dose in the same patient population come closest. For most clinical research questions, the fixed-effect assumption is implausible.
Random-Effects Model
The random-effects model assumes that the true effect varies across studies, each study estimates its own true effect, and these true effects are drawn from a distribution. A random effects meta-analysis accounts for both within-study sampling error and between-study variance (tau-squared). This model assigns more similar weights across studies because even small studies contribute information about the distribution of true effects.
The random-effects model accounts for between-study heterogeneity, making it the default recommendation for most meta-analyses in clinical and social science research (Borenstein et al., 2009). When tau-squared is zero (no between-study variance), the random-effects model collapses to the fixed-effect model, so it is never wrong to start with random-effects.
DerSimonian-Laird vs REML
The two most widely used estimators for tau-squared in random-effects models are the DerSimonian-Laird method (DerSimonian & Laird, 1986) and REML estimation (restricted maximum likelihood). DerSimonian-Laird is the historical default, computationally simple and implemented in most software. However, it underestimates tau-squared when the number of studies is small, producing confidence intervals that are too narrow. REML provides less biased variance estimates, especially with fewer than 20 studies, and is the default in the metafor package in R. For new meta-analyses, REML is the recommended estimator.
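To see what the DerSimonian-Laird estimator actually computes, here is a base-R sketch on five hypothetical studies (all effect sizes and variances invented):

```r
# Hypothetical effect sizes (yi) and within-study variances (vi)
yi <- c(0.10, 0.30, 0.60, -0.10, 0.85)
vi <- c(0.04, 0.02, 0.06, 0.09, 0.03)

wi <- 1 / vi                          # fixed-effect (inverse-variance) weights
yw <- sum(wi * yi) / sum(wi)          # fixed-effect pooled estimate
Q  <- sum(wi * (yi - yw)^2)           # Cochran's Q statistic
k  <- length(yi)
C  <- sum(wi) - sum(wi^2) / sum(wi)   # scaling constant
tau2_dl <- max(0, (Q - (k - 1)) / C)  # DL estimator, truncated at zero
```

Here tau-squared comes out around 0.09. DL is this simple closed form; REML replaces it with an iterative likelihood-based estimate that is less biased when the number of studies is small.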
Step 4, Run the Pooled Analysis and Generate Forest Plots
With effect sizes extracted and your model selected, you now run the pooled analysis. The core computation is inverse variance weighting: each study's effect size is multiplied by its weight (the inverse of its variance), summed across all studies, and divided by the total weight to produce the pooled effect estimate.
In a fixed-effect model, the weight for study i is:
w_i = 1 / v_i
where v_i is the within-study variance. In a random-effects model, the weight becomes:
w_i = 1 / (v_i + tau-squared)
where tau-squared is the estimated between-study variance. Adding tau-squared to the denominator is what makes random-effects weights more uniform: because tau-squared is constant across studies, large and small studies contribute more similarly.
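A minimal base-R sketch makes the two weighting schemes concrete (effect sizes, variances, and tau-squared are all hypothetical):

```r
yi <- c(0.10, 0.30, 0.60, -0.10, 0.85)   # hypothetical effect sizes
vi <- c(0.04, 0.02, 0.06, 0.09, 0.03)    # within-study variances
tau2 <- 0.0877                           # assumed between-study variance

w_fe <- 1 / vi             # fixed-effect weights
w_re <- 1 / (vi + tau2)    # random-effects weights

pooled_fe <- sum(w_fe * yi) / sum(w_fe)
pooled_re <- sum(w_re * yi) / sum(w_re)
se_re <- sqrt(1 / sum(w_re))   # standard error of the random-effects estimate
```

The weight ratio between the most and least precise study drops from 4.5 under the fixed-effect model to about 1.6 under random effects, which is exactly the flattening described above.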
The Forest Plot
A meta-analysis produces a forest plot, the signature output that displays individual study results and the pooled estimate on a single axis. Each study appears as a square (sized proportional to its weight) centered on its effect estimate, with a horizontal line representing the 95% confidence interval. The pooled estimate appears as a diamond at the bottom, with the diamond's width representing its confidence interval. A forest plot visualizes the pooled effect size, individual study contributions, and the degree of consistency across studies at a single glance.
Beyond the confidence interval for the pooled effect, every random-effects meta-analysis should report a prediction interval. The confidence interval describes the uncertainty around the average true effect. The prediction interval describes the range within which the true effect of a future study is expected to fall. A meta-analysis might show a statistically significant pooled OR of 0.65 (95% CI: 0.50-0.85), but the prediction interval might be 0.30-1.40, indicating that in some future settings, the intervention could actually be harmful. Prediction intervals are computed from tau-squared and provide far more clinically relevant information than confidence intervals alone. The metafor package in R reports them by default with the predict() function.
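The prediction interval itself is simple to compute by hand. A common formulation uses a t-distribution with k - 2 degrees of freedom; all numbers in this base-R sketch are hypothetical:

```r
k <- 8              # number of studies
pooled <- -0.45     # random-effects pooled SMD
se_pooled <- 0.10   # standard error of the pooled estimate
tau2 <- 0.15        # estimated between-study variance

t_crit <- qt(0.975, df = k - 2)
half_width <- t_crit * sqrt(tau2 + se_pooled^2)
pi_lower <- pooled - half_width   # 95% prediction interval, lower bound
pi_upper <- pooled + half_width   # 95% prediction interval, upper bound
```

Here the 95% confidence interval would be roughly -0.65 to -0.25, but the prediction interval runs from about -1.43 to 0.53: significant on average, yet possibly null or harmful in a new setting.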
Example R Code
Below is a minimal R script for running a random-effects meta-analysis and generating a forest plot using the metafor package:
library(metafor)
# Data: yi = effect sizes, vi = variances
dat <- escalc(measure = "SMD", m1i = m1, sd1i = sd1, n1i = n1,
m2i = m2, sd2i = sd2, n2i = n2, data = mydata)
# Fit random-effects model (REML is default)
res <- rma(yi, vi, data = dat)
# Forest plot
forest(res, slab = dat$study, header = TRUE,
xlab = "Standardized Mean Difference")
# Prediction interval
predict(res)
This code computes Hedges' g (the default for measure = "SMD" in metafor), fits a REML random-effects model, and generates a publication-ready forest plot. Include scripts like this as supplementary material with your manuscript.
Step 5, Assess Heterogeneity
Heterogeneity is the degree to which the true effects vary across studies. Assessing statistical heterogeneity is mandatory; it determines whether your pooled estimate should be interpreted as a single value or as an average across a distribution of different effects.
I-Squared
The I-squared statistic quantifies the percentage of total variability across studies that is due to true heterogeneity rather than sampling error, on a 0-100% scale. Cochrane classifies I-squared heterogeneity as low (0-40%), moderate (30-60%), substantial (50-90%), and considerable (75-100%) (Higgins et al., 2023). Note the overlapping ranges: interpretation depends on context, not rigid cutoffs. An I-squared of 60% in a pharmacological meta-analysis of identical drug doses may be concerning, while 60% in a behavioral intervention meta-analysis may be expected and acceptable.
While I-squared tells you the proportion of variability due to heterogeneity, tau-squared estimates between-study variance in absolute terms. A tau-squared of 0.10 for an SMD meta-analysis means the standard deviation of true effects across studies is approximately 0.32 (the square root of tau-squared). This absolute value is essential for computing prediction intervals and for determining whether heterogeneity is clinically meaningful, not just statistically detectable.
Cochran's Q Test
Cochran's Q is a chi-squared test of the null hypothesis that all studies share the same true effect. A significant Q (p < 0.10 is the conventional threshold, not p < 0.05) indicates heterogeneity. However, Q has low power with few studies and excessive power with many studies, so it should be interpreted alongside I-squared and tau-squared, never in isolation.
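Q, its p-value, and I-squared all come from the same quantities, as this base-R sketch with hypothetical data shows:

```r
yi <- c(0.10, 0.30, 0.60, -0.10, 0.85)   # hypothetical effect sizes
vi <- c(0.04, 0.02, 0.06, 0.09, 0.03)    # within-study variances

wi <- 1 / vi
yw <- sum(wi * yi) / sum(wi)              # fixed-effect pooled estimate
Q  <- sum(wi * (yi - yw)^2)               # Cochran's Q
df <- length(yi) - 1
p_Q <- pchisq(Q, df, lower.tail = FALSE)  # Q-test p-value
I2 <- max(0, (Q - df) / Q) * 100          # I-squared as a percentage
```

With these numbers Q is about 12.9 on 4 degrees of freedom (p around 0.01) and I-squared is about 69%, substantial heterogeneity by the Cochrane bands above.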
Prediction Intervals Revisited
The prediction interval deserves emphasis here because it is the most underreported heterogeneity metric. A meta-analysis with I-squared = 80% tells you that most variability is not due to chance. But the prediction interval tells you the actual range of effects you would expect to see in a new study, which is what clinicians and policymakers actually need to know. Always report prediction intervals alongside I-squared and tau-squared.
Step 6, Conduct Sensitivity and Subgroup Analyses
Sensitivity analysis and subgroup analysis explore the robustness of your pooled estimate and investigate potential sources of heterogeneity. These analyses separate a rigorous meta-analysis from a superficial one.
Leave-One-Out Analysis
Leave-one-out analysis systematically removes one study at a time and recalculates the pooled effect. If the pooled estimate changes substantially when a single study is removed, that study is highly influential and warrants scrutiny. Was it an outlier? Did it have a different population, different dose, or a high risk of bias? Use our leave-one-out analysis tool to run this analysis on your data. In R, the leave1out() function in metafor automates this:
leave1out(res)
Subgroup Analysis
Subgroup analysis stratifies studies by a categorical moderator (study design, geographic region, intervention dose, risk of bias) and computes separate pooled estimates for each subgroup. The test for subgroup differences examines whether the pooled effects differ significantly across categories.
Critical rule: pre-specify your subgroup analyses in your protocol before analyzing data. Post-hoc subgroups, invented after seeing the results, are hypothesis-generating only. Reviewers and editors will scrutinize unplanned subgroup analyses as potential data dredging. Limit yourself to 3-5 subgroups with a clinical or methodological rationale.
Meta-Regression
For continuous moderators (publication year, mean participant age, intervention duration), meta-regression analysis extends subgroup analysis by fitting the moderator as a continuous predictor. Meta-regression requires at least 10 studies to have adequate power, and results should be interpreted cautiously because it is an observational association across studies, not a within-study causal estimate. Use our meta-regression input tool to format your data for meta-regression in R.
In metafor, meta-regression is straightforward:
res_reg <- rma(yi, vi, mods = ~ year + dose, data = dat)
summary(res_reg)
Report the regression coefficient, its confidence interval, the residual I-squared (heterogeneity remaining after accounting for the moderator), and the R-squared analog (proportion of between-study variance explained).
Step 7, Test for Publication Bias
Publication bias, the selective publication of studies with positive or statistically significant results, threatens the validity of every meta-analysis. If your pooled estimate is based on a biased sample of studies, it will overestimate the true effect. Publication bias detection uses both visual and statistical methods.
Funnel Plot
A funnel plot graphs each study's effect size (x-axis) against a measure of its precision, typically the standard error (y-axis). In the absence of publication bias, studies scatter symmetrically around the pooled estimate in an inverted funnel shape: precise (large) studies cluster at the top, while imprecise (small) studies spread widely at the bottom. A funnel plot suggests publication bias when it shows asymmetry, typically missing studies in the bottom-left corner (small studies with non-significant results). For guidance on reading funnel plots, see our guide to interpreting funnel plots. Generate your own with our build-a-funnel-plot tool.
Egger's Test
Egger's regression test formally tests funnel plot asymmetry by regressing the standardized effect size against its precision. A significant intercept (p < 0.10) suggests asymmetry consistent with publication bias or other small-study effects. Egger's test requires a minimum of 10 studies to have adequate power (Sterne et al., 2011). With fewer than 10 studies, the test is unreliable and should not be used; report the funnel plot visually instead. In R:
regtest(res, model = "lm")
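For intuition, Egger's regression can also be written out directly: regress the standardized effect on precision and inspect the intercept. This is an illustrative base-R sketch with invented data, not a replacement for regtest():

```r
# Ten hypothetical studies (Egger's test needs 10+)
yi  <- c(0.10, 0.30, 0.60, -0.10, 0.85, 0.25, 0.40, 0.15, 0.55, 0.05)
vi  <- c(0.04, 0.02, 0.06, 0.09, 0.03, 0.05, 0.07, 0.02, 0.08, 0.01)
sei <- sqrt(vi)

z    <- yi / sei   # standardized effect sizes
prec <- 1 / sei    # precision

fit <- lm(z ~ prec)                                 # Egger's regression
egger_intercept <- coef(summary(fit))["(Intercept)", ]
egger_intercept                                     # estimate, SE, t, p-value
```

An intercept p-value below 0.10 would flag asymmetry consistent with small-study effects.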
Trim-and-Fill
The trim and fill method estimates the number of missing studies needed to make the funnel plot symmetric, imputes those studies, and recalculates the pooled estimate. While useful as a sensitivity check, trim-and-fill assumes that the asymmetry is entirely due to publication bias (rather than other sources of small-study effects) and should be interpreted cautiously. In R:
tf <- trimfill(res)
funnel(tf)
Begg's Test
Begg's test (rank correlation test) is an alternative to Egger's test that uses Kendall's tau to test the association between effect sizes and their variances. It has lower power than Egger's test and is less commonly used today, but some reviewers still request it. In R:
ranktest(res)
Step 8, Rate Evidence Certainty with GRADE
A pooled effect estimate is only as useful as the confidence you can place in it. The GRADE framework (Grading of Recommendations, Assessment, Development, and Evaluation) provides a structured approach to rating the certainty of evidence from meta-analyses. GRADE produces a summary of findings table that rates each outcome as high, moderate, low, or very low certainty.
GRADE evaluates five domains that can downgrade certainty:
| Domain | Downgrades When |
| --- | --- |
| Risk of bias | Included studies have high risk of bias (inadequate randomization, blinding, attrition) |
| Inconsistency | High heterogeneity (I-squared > 50%) without explanation |
| Indirectness | Studies do not directly address the clinical question (different population, comparator, or outcome) |
| Imprecision | Wide confidence intervals crossing the line of clinical significance |
| Publication bias | Funnel plot asymmetry, significant Egger's test, or evidence of selective reporting |
And three domains that can upgrade certainty (applicable mainly to observational studies):
Large magnitude of effect (RR > 2 or < 0.5)
Dose-response gradient
Residual confounding would reduce the effect
The summary of findings table presents the pooled estimate, confidence interval, number of studies and participants, and the GRADE rating for each outcome, communicating both the statistical result and the confidence warranted by the underlying evidence, a far more useful output than a pooled estimate alone. Most clinical journals, Cochrane, and guideline organizations require GRADE assessments.
Step 9, Report Your Results
Transparent reporting ensures that readers can evaluate your methods, reproduce your analysis, and build on your findings. The PRISMA 2020 statement provides a 27-item checklist for reporting systematic reviews and meta-analyses.
Essential Reporting Elements
Your results section must include:
Forest plots for every pre-specified outcome, with study labels, effect estimates, confidence intervals, weights, and the pooled diamond. Use our free forest plot tool to generate publication-ready figures.
Heterogeneity statistics for each analysis: I-squared, tau-squared, Cochran's Q (with degrees of freedom and p-value), and prediction intervals.
Funnel plots with Egger's test results for analyses including 10 or more studies.
Sensitivity analysis results, including leave-one-out analysis and any analyses excluding high risk-of-bias studies.
Subgroup analysis results with tests for subgroup differences.
GRADE summary of findings table rating evidence certainty for each outcome.
Writing the Results Narrative
Present pooled estimates with their confidence intervals and prediction intervals in the text. State the direction and magnitude of the effect, then immediately report the heterogeneity. For example: "The pooled standardized mean difference was -0.45 (95% CI: -0.62 to -0.28; 95% PI: -1.10 to 0.20), indicating a moderate treatment effect on average, though the prediction interval includes the possibility of no effect or harm in some settings (I-squared = 72%, tau-squared = 0.15)."
Always report both statistical significance and clinical significance. A pooled odds ratio of 0.92 (95% CI: 0.85-0.99) is statistically significant but may not be clinically meaningful if the absolute risk reduction is less than 1%.
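That last point is easy to verify with arithmetic. Under a hypothetical 5% baseline risk, an OR of 0.92 translates into a tiny absolute benefit, as this base-R sketch shows:

```r
or <- 0.92             # statistically significant pooled odds ratio
baseline_risk <- 0.05  # assumed control-group event rate

odds_c <- baseline_risk / (1 - baseline_risk)  # control-group odds
odds_t <- or * odds_c                          # treated-group odds
risk_t <- odds_t / (1 + odds_t)                # treated-group risk

arr <- baseline_risk - risk_t   # absolute risk reduction
nnt <- 1 / arr                  # number needed to treat
```

The absolute risk reduction is under 0.4 percentage points, an NNT above 260: statistically significant, clinically marginal.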
Supplementary Material
Include your full R or Stata analysis code, the extracted data file, and any sensitivity analyses not shown in the main text as supplementary material. This allows peer reviewers to verify your results and future researchers to update your meta-analysis when new studies are published.
Software for Meta-Analysis
The choice of software determines reproducibility, analytical flexibility, and publication readiness. Script-based platforms, particularly R (free and open source) and Stata, are strongly preferred over proprietary black-box tools.
R
R is the most widely used open-source platform for meta-analysis, with several specialized packages:
metafor: The most comprehensive package. Supports fixed-effect and random-effects models (DL, REML, PM, and other estimators), meta-regression, multivariate meta-analysis, and network meta-analysis. Defaults to REML. Produces publication-ready forest plots, funnel plots, and diagnostic plots. Maintained by Wolfgang Viechtbauer.
meta: A user-friendly wrapper that calls metafor internally. Provides convenient functions like metabin() for binary outcomes and metacont() for continuous outcomes. Excellent for standard pairwise meta-analyses.
netmeta: Specialized for network meta-analysis (comparing multiple interventions simultaneously). Produces league tables and SUCRA rankings.
dmetar: A companion package to the "Doing Meta-Analysis in R" guide. Provides additional functions for outlier detection, influence analysis, and power calculations.
For researchers learning how to perform a meta-analysis in R, the combination of metafor for computation and forest() for visualization covers the vast majority of use cases.
Stata
Stata provides robust meta-analysis capabilities through community-contributed commands:
metan: The workhorse command for pairwise meta-analysis. Handles both fixed- and random-effects models with a single command.
metafunnel: Generates funnel plots for publication bias assessment.
metareg: Runs meta-regression with continuous and categorical moderators.
network: Stata's suite for network meta-analysis.
Stata's advantage is its integrated environment, data management, analysis, and graphing in a single platform. Its disadvantage is the license cost.
CMA (Comprehensive Meta-Analysis)
CMA is a commercial point-and-click tool designed specifically for meta-analysis. While user-friendly, it is a black box: users cannot inspect or share the underlying code, making results non-reproducible. Peer reviewers increasingly expect reproducible scripts, which CMA cannot provide. We do not recommend CMA for publication-bound meta-analyses.
RevMan (Review Manager)
RevMan is Cochrane's free meta-analysis software. It integrates data entry, risk of bias assessment, and meta-analysis in a single workflow. RevMan is adequate for standard Cochrane reviews with pairwise comparisons but lacks the flexibility for meta-regression, network meta-analysis, or custom sensitivity analyses. It is appropriate for Cochrane reviews but limiting for independent research.
| Software | Cost | Reproducible Code | Meta-Regression | Network Meta-Analysis | Best For |
|---|---|---|---|---|---|
| R (metafor) | Free | Yes | Yes | Yes (netmeta) | Publication-bound research, advanced methods |
| Stata | $125-$1,395/yr | Yes | Yes | Yes | Integrated analysis environment |
| CMA | $195-$1,295 | No | Limited | No | Teaching, quick exploratory analyses |
| RevMan | Free | No | No | No | Standard Cochrane reviews |
Script-based software (R, Stata) matters because transparency is the currency of credible meta-analysis. When reviewers can run your code and reproduce your forest plots, confidence in your results increases. When they cannot, they question whether the outputs were generated correctly.
Common Meta-Analysis Mistakes
Even experienced researchers make methodological errors that undermine their meta-analyses. Recognizing these mistakes in advance prevents months of revision and resubmission cycles.
Using a fixed-effect model with high heterogeneity. When I-squared exceeds 50% and the Q-test is significant, a fixed-effect model produces confidence intervals that are too narrow because it ignores between-study variance. The pooled estimate under a fixed-effect model with heterogeneous studies gives disproportionate weight to the largest study and fails to account for the reality that true effects vary. Always use a random-effects model unless you have a strong justification for assuming a common true effect.
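To see why the fixed-effect interval is too narrow under heterogeneity, the following base-R sketch pools hypothetical log risk ratios both ways, using the DerSimonian-Laird estimator of tau-squared (all yi/vi values are made up for illustration):

```r
# Toy log risk ratios (yi) and sampling variances (vi); hypothetical numbers
yi <- c(-0.50, -0.10, -0.70, 0.05, -0.35)
vi <- c(0.040, 0.090, 0.050, 0.120, 0.070)
k  <- length(yi)

# Fixed-effect pooling: inverse-variance weights
w_fe   <- 1 / vi
est_fe <- sum(w_fe * yi) / sum(w_fe)
se_fe  <- sqrt(1 / sum(w_fe))

# Heterogeneity: Cochran's Q, DerSimonian-Laird tau^2, and I^2
Q    <- sum(w_fe * (yi - est_fe)^2)
C    <- sum(w_fe) - sum(w_fe^2) / sum(w_fe)
tau2 <- max(0, (Q - (k - 1)) / C)
I2   <- max(0, 100 * (Q - (k - 1)) / Q)

# Random-effects pooling: tau^2 added to each study's variance
w_re   <- 1 / (vi + tau2)
est_re <- sum(w_re * yi) / sum(w_re)
se_re  <- sqrt(1 / sum(w_re))

c(fixed_se = se_fe, random_se = se_re, tau2 = tau2, I2 = I2)
```

Whenever tau-squared is positive, se_re exceeds se_fe, which is exactly the extra uncertainty the fixed-effect model ignores.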
Omitting prediction intervals. The confidence interval around the pooled estimate tells you the precision of the average effect. The prediction interval tells you the range of effects you would expect in a new study. A statistically significant pooled estimate with a prediction interval that crosses the null indicates that the intervention may not work in all settings, a critical insight that the confidence interval alone conceals.
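The distinction is easy to compute. This base-R sketch uses hypothetical pooled results (k, est, se, tau2 are all invented numbers) and the Higgins/Thompson/Spiegelhalter prediction-interval formula with a t distribution on k - 2 degrees of freedom:

```r
# Hypothetical pooled results from a random-effects meta-analysis of k studies
k    <- 8
est  <- -0.30   # pooled log risk ratio
se   <- 0.10    # standard error of the pooled estimate
tau2 <- 0.04    # between-study variance

# 95% CI for the average effect (normal approximation)
ci <- est + c(-1, 1) * qnorm(0.975) * se

# 95% prediction interval for the true effect in a new study
pi <- est + c(-1, 1) * qt(0.975, df = k - 2) * sqrt(tau2 + se^2)

round(exp(rbind(CI = ci, PI = pi)), 2)  # back-transform to risk-ratio scale
```

With these toy numbers the CI excludes the null while the prediction interval crosses it, which is precisely the situation the paragraph above warns about.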
Conducting post-hoc subgroup analyses and presenting them as confirmatory. Subgroup analyses that were not pre-specified in the protocol are exploratory. Presenting them without acknowledging their post-hoc nature is misleading and will be identified by experienced reviewers.
Mixing incompatible effect measures. Pooling odds ratios from some studies with risk ratios from others without conversion produces meaningless results. Standardize all studies to the same effect measure before pooling.
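When some studies report only an OR, one common approach is the approximation attributed to Zhang and Yu (1998), which converts an OR to an RR given the control-group baseline risk. A hedged base-R sketch with hypothetical values:

```r
# Approximate OR -> RR conversion given baseline (control-group) risk p0.
# This is an approximation; it degrades for extreme p0 or OR values.
or_to_rr <- function(or, p0) or / (1 - p0 + p0 * or)

or_to_rr(or = 0.60, p0 = 0.20)  # low baseline risk: RR stays close to the OR
or_to_rr(or = 0.60, p0 = 0.50)  # high baseline risk: RR pulled toward 1
```

Note that the conversion requires an estimate of baseline risk for each study, which itself introduces uncertainty; document the assumed p0 values in your methods.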
Running Egger's test with fewer than 10 studies. Egger's regression test for funnel plot asymmetry requires a minimum of 10 studies to have adequate power (Sterne et al., 2011). With fewer studies, the test cannot distinguish real asymmetry from sampling variation, and a non-significant result falsely reassures.
Ignoring the clinical context of heterogeneity. An I-squared of 75% does not automatically invalidate a meta-analysis. If the heterogeneity is explained by pre-specified subgroup variables (e.g., drug dose, disease severity) and the subgroup-specific estimates are clinically coherent, the meta-analysis remains informative. The mistake is reporting I-squared without investigating what drives it.
Failing to provide reproducible code. A forest plot without the underlying analysis code is an unverifiable claim. Peer reviewers increasingly request R or Stata scripts as supplementary material, and journals are beginning to mandate code sharing for meta-analyses.
For studies with access to raw participant-level data, the IPD meta-analysis guide covers the one-stage and two-stage frameworks, hierarchical modelling, and how individual patient data changes the analysis.
Diagnostic accuracy reviews require a different statistical machinery than intervention reviews, and the diagnostic test accuracy meta-analysis guide walks through the bivariate model, HSROC curves, and QUADAS-2 risk of bias.
Bringing It All Together
Learning how to do a meta-analysis means mastering a nine-step pipeline: (1) define outcomes and choose your effect measure; (2) extract or calculate effect sizes from each included study; (3) select the appropriate statistical model (almost always random-effects with REML); (4) run the pooled analysis and generate forest plots; (5) assess heterogeneity using I-squared, tau-squared, and prediction intervals; (6) conduct pre-specified sensitivity and subgroup analyses; (7) test for publication bias using funnel plots and Egger's test; (8) rate the certainty of evidence with GRADE; and (9) report your results transparently per PRISMA 2020.
Each step has decision points that require both statistical knowledge and clinical judgment. The software you use, ideally R (metafor) or Stata, should be script-based and produce reproducible code. The outputs you deliver (forest plots, funnel plots, heterogeneity statistics, sensitivity analyses, and a GRADE summary-of-findings table) must meet journal standards on first submission.
For researchers who need publication-ready results without the learning curve, consider our professional research services. Our biostatisticians specialize in meta-analysis across clinical medicine, epidemiology, and health technology assessment. We deliver reproducible R code, publication-ready forest plots, GRADE-rated evidence tables, and a complete statistical methods section. Learn more about our meta-analysis service or explore the process of hiring a meta-analysis expert.
References
Borenstein, M., Hedges, L.V., Higgins, J.P.T., & Rothstein, H.R. (2009). Introduction to Meta-Analysis. Wiley.
DerSimonian, R., & Laird, N. (1986). Meta-analysis in clinical trials. Controlled Clinical Trials, 7(3), 177-188.
Higgins, J.P.T., Thomas, J., Chandler, J., Cumpston, M., Li, T., Page, M.J., & Welch, V.A. (Eds.) (2023). Cochrane Handbook for Systematic Reviews of Interventions (Version 6.4). Cochrane.
Sterne, J.A.C., Sutton, A.J., Ioannidis, J.P.A., et al. (2011). Recommendations for examining and interpreting funnel plot asymmetry in meta-analyses of randomised controlled trials. BMJ, 343, d4002.