This is the definitive guide on how to do a meta-analysis. A meta-analysis is a statistical method that combines effect sizes from multiple independent studies to produce a single pooled estimate of a treatment effect or association. It uses inverse variance weighting, assesses heterogeneity via I-squared and tau-squared, and produces forest plots to visualize results. Meta-analysis is typically conducted within a systematic review following Cochrane Handbook methodology (Higgins et al., 2023).
Whether you are pooling clinical trial results, observational cohort data, or diagnostic accuracy studies, this step-by-step meta-analysis guide for beginners walks you through every decision point with the statistical reasoning behind it. Our biostatisticians have conducted meta-analyses across clinical medicine, public health, and pharmacology; the most common error we see is ignoring prediction intervals, which describe the range of true effects you would expect in a future study conducted in a new setting. This guide covers the complete meta-analysis methodology from defining outcomes through GRADE-rated certainty of evidence, with links to free calculators and example R code at every step. For the broader evidence synthesis process, see our complete systematic review guide.
What Is a Meta-Analysis?
A meta-analysis is a quantitative data synthesis technique that statistically combines the results of two or more independent studies addressing the same research question. Unlike a narrative review that summarizes findings qualitatively, a meta-analysis calculates a single pooled effect estimate with a confidence interval, giving researchers a precise, weighted summary of the evidence.
The distinction between meta-analysis and systematic review is fundamental. A systematic review is the complete methodological framework: protocol registration, database searching, screening, data extraction, and quality appraisal. A meta-analysis is the optional statistical component that sits inside a systematic review when the included studies are sufficiently homogeneous to pool quantitatively. Not every systematic review includes a meta-analysis, and not every meta-analysis is embedded in a systematic review, though best practice favors the combination. For a deeper comparison, see SR vs meta-analysis explained.
In formal terms, a meta-analysis is a statistical synthesis method (Borenstein et al., 2009). It is a component of a systematic review. It requires effect size calculation from each included study. And it produces a forest plot, the defining visualization of pooled results. These four relationships anchor every decision in the nine-step process below.
When Is a Meta-Analysis Appropriate?
A meta-analysis is appropriate when the included studies are sufficiently similar in population, intervention, comparator, and outcome (PICO) to justify statistical pooling. The decision to pool is clinical and methodological before it is statistical.
Three conditions must be met. First, clinical homogeneity: the studies must address the same or closely related clinical question. Pooling a drug trial in pediatric asthma with a surgical trial in adult COPD produces a number, but not a meaningful one. Second, methodological similarity: studies should share broadly comparable designs (e.g., all RCTs, or all prospective cohorts). Mixing RCTs with cross-sectional studies introduces confounding that no statistical model can resolve. Third, sufficient studies: while two studies is the technical minimum, five or more are recommended for meaningful heterogeneity assessment, and ten or more are needed for reliable publication bias detection using Egger's test (Sterne et al., 2011).
When should you not meta-analyze? When the included studies differ so profoundly in design, population, or intervention that pooling them would obscure rather than clarify. In these cases, narrative synthesis, a structured qualitative summary, is more appropriate. The Cochrane Handbook explicitly warns against meta-analysis when "there is considerable variation in the results and the variation is not explained by study characteristics" (Higgins et al., 2023). If you determine that pooling is unjustified after examining your data, present the results in a structured table with individual study effect sizes rather than forcing a misleading pooled estimate.
Step 1: Define Your Outcomes and Effect Measures
The first step is selecting the outcomes to analyze and the effect measure to express each one. This decision shapes every downstream calculation.
Continuous Outcomes
For continuous outcomes (blood pressure reduction, pain scores, cognitive test performance), the primary options are the standardized mean difference and the weighted mean difference. The standardized mean difference (SMD) is used when studies measure the same construct on different scales, for example, depression measured by both the PHQ-9 and the HAM-D. The two most common SMD variants are Cohen's d and Hedges' g. Hedges' g corrects Cohen's d for small samples by applying a correction factor that removes upward bias in studies with fewer than 20 participants per arm (Borenstein et al., 2009). For most meta-analyses, Hedges' g is preferred. When all studies use the same measurement scale, the weighted mean difference (WMD) preserves the original units and is easier to interpret clinically.
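The small-sample correction is simple arithmetic. Here is a minimal numeric sketch in Python (the guide's analysis code is in R; this is only to make the formulas concrete, and the function names are illustrative):

```python
import math

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    # Cohen's d: mean difference divided by the pooled standard deviation
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (m1 - m2) / sp

def hedges_g(d, n1, n2):
    # Small-sample correction factor J = 1 - 3 / (4 * df - 1), df = n1 + n2 - 2
    return d * (1 - 3 / (4 * (n1 + n2 - 2) - 1))

d = cohens_d(12.0, 5.0, 15, 9.0, 5.0, 15)  # d = 0.60
g = hedges_g(d, 15, 15)                    # about 0.58, shrunk toward zero
```

With 15 participants per arm the correction shrinks d by roughly 3%; with large samples it becomes negligible, which is why d and g nearly coincide in large trials.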
Binary Outcomes
For binary outcomes (mortality, treatment response, adverse events), the standard measures are the odds ratio (OR), risk ratio (RR), and risk difference (RD). The odds ratio is the most common in meta-analysis because it has favorable mathematical properties: the log OR is symmetric and unbounded, and the OR can be computed from case-control studies where the RR cannot. The risk ratio is more intuitive clinically ("patients are 1.5 times more likely to respond") and is preferred for prospective studies. The risk difference expresses the absolute difference in event rates and is needed to calculate the number needed to treat.
| Outcome Type | Effect Measure | When to Use | Key Property |
|---|---|---|---|
| Continuous, same scale | Weighted Mean Difference | All studies use identical measurement tool | Preserves original units |
| Continuous, different scales | Hedges' g (SMD) | Studies measure same construct on different scales | Unitless, corrected for small samples |
| Binary | Odds Ratio | Case-control studies or when RR is inappropriate | Symmetric, can be computed from any 2x2 table |
| Binary | Risk Ratio | Prospective studies, clinical interpretability | Directly interpretable as relative risk |
| Binary | Risk Difference | Absolute effect needed (NNT calculation) | Scale-dependent, affected by baseline risk |
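All three binary measures come straight from the 2x2 table. A small Python sketch (illustrative function name) showing how each is computed, including the NNT mentioned in the table:

```python
def binary_effects(events_t, n_t, events_c, n_c):
    # Event risks in the treatment and control groups
    p1, p0 = events_t / n_t, events_c / n_c
    odds_ratio = (p1 / (1 - p1)) / (p0 / (1 - p0))
    risk_ratio = p1 / p0
    risk_diff = p1 - p0
    nnt = 1 / abs(risk_diff)  # number needed to treat (or harm)
    return odds_ratio, risk_ratio, risk_diff, nnt

# 20/100 events with treatment vs 40/100 with control
or_, rr, rd, nnt = binary_effects(20, 100, 40, 100)
# OR = 0.375, RR = 0.5, RD = -0.20, NNT = 5
```

Note that the OR (0.375) is further from 1 than the RR (0.5): when events are common, the odds ratio exaggerates the relative risk, which is one reason the choice of measure must be fixed in the protocol.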
Choose your effect measure before extracting data, and register this choice in your protocol. Changing the effect measure after seeing results introduces selective reporting bias.
Step 2: Extract or Calculate Effect Sizes
With your effect measure defined, the next task is extracting or computing individual study effect sizes and their variances. Every meta-analysis requires effect size calculation from each included study before pooling can occur.
What Data to Extract
For continuous outcomes, extract sample sizes, means, and standard deviations for each group. For binary outcomes, extract the 2x2 table (events and non-events in intervention and control groups). When studies report only summary statistics, p-values, confidence intervals, or t-statistics, you can back-calculate effect sizes using established formulas. The Cochrane Handbook Chapter 6 provides the standard conversion equations (Higgins et al., 2023).
Handling Missing or Incomplete Data
Studies frequently omit the data you need. Some report medians and interquartile ranges instead of means and standard deviations. Others report only "p < 0.05" without a specific value. For median-to-mean conversion, methods by Wan et al. (2014) and Luo et al. (2018) provide validated approximations. For studies reporting only a p-value and sample size, you can convert the p-value to a t-statistic and then to an SMD; our convert p-value to confidence interval tool demonstrates this logic. When data cannot be extracted or estimated reliably, document the study as "included but not pooled" and explain why in your results section.
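The p-to-SMD back-calculation fits in a few lines. The Python sketch below is an approximation that treats the t statistic as normal (adequate for moderate samples; the exact route uses the t distribution), and the helper name is ours:

```python
import math
from statistics import NormalDist

def smd_from_p(p_two_sided, n1, n2, direction=1):
    # Invert the two-sided p-value to a z statistic, treat it as t,
    # then convert to an SMD: d = t * sqrt(1/n1 + 1/n2)
    z = NormalDist().inv_cdf(1 - p_two_sided / 2)
    return direction * z * math.sqrt(1 / n1 + 1 / n2)

d = smd_from_p(0.05, 50, 50)  # about 0.39 for p = 0.05 with n = 50 per arm
```

For a study reporting only "p < 0.05", a common conservative choice is to analyze it as if p = 0.05 exactly, which this sketch supports; document any such assumption in your methods.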
Converting Between Effect Measures
Sometimes individual studies report results in different formats. One RCT reports means and SDs, another reports a t-test result, and a third reports only a p-value and direction of effect. You need to convert all of these into a common metric. Use our effect size calculator to calculate Cohen's d and Hedges' g from means, t-statistics, F-statistics, or correlation coefficients. For binary outcomes, you can convert between OR, RR, and RD when you know the baseline event rate. Always report your conversion method in the statistical methods section.
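The OR-to-RR conversion is a one-liner once the baseline risk is known. A Python sketch (assuming the baseline risk p0 is taken from the control group):

```python
def or_to_rr(odds_ratio, p0):
    # RR = OR / (1 - p0 + p0 * OR), given the control-group risk p0
    return odds_ratio / (1 - p0 + p0 * odds_ratio)

rr = or_to_rr(0.375, 0.40)  # OR 0.375 at baseline risk 0.40 gives RR = 0.5
```

The gap between OR and RR widens as the baseline risk grows; when events are rare (roughly below 10%), the two measures are nearly interchangeable.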
Creating Your Data File
Organize extracted data in a structured spreadsheet with one row per study and columns for: study ID, year, sample sizes (intervention and control), effect size, standard error or variance, and any subgroup variables. This file becomes the direct input for your R or Stata analysis script.
Step 3: Choose Your Statistical Model
The choice between a fixed-effect model and a random-effects model is the most consequential statistical decision in your meta-analysis. This choice determines how study weights are calculated and what your pooled estimate actually means.
Fixed-Effect Model
The fixed-effect model assumes that all included studies estimate the same true underlying effect. Variation across studies is attributed entirely to sampling error: the studies are treated as conceptually identical replications. Under this assumption, larger studies receive proportionally more weight because they estimate the common effect with greater precision. The fixed-effect model uses inverse-variance weighting, where each study's weight is the reciprocal of its within-study variance.
The fixed-effect model is appropriate when studies are functionally identical in population, intervention, and setting. In practice, this is rare. Multi-center pharmaceutical trials using the same protocol and the same drug dose in the same patient population come closest. For most clinical research questions, the fixed-effect assumption is implausible.
Random-Effects Model
The random-effects model assumes that the true effect varies across studies: each study estimates its own true effect, and these true effects are drawn from a distribution. A random-effects meta-analysis accounts for both within-study sampling error and between-study variance (tau-squared). This model assigns more similar weights across studies because even small studies contribute information about the distribution of true effects.
The random-effects model accounts for between-study heterogeneity, making it the default recommendation for most meta-analyses in clinical and social science research (Borenstein et al., 2009). When tau-squared is zero (no between-study variance), the random-effects model collapses to the fixed-effect model, so it is never wrong to start with random-effects.
DerSimonian-Laird vs REML
The two most widely used estimators for tau-squared in random-effects models are the DerSimonian-Laird method (DerSimonian & Laird, 1986) and REML estimation (restricted maximum likelihood). DerSimonian-Laird is the historical default, computationally simple and implemented in most software. However, it underestimates tau-squared when the number of studies is small, producing confidence intervals that are too narrow. REML provides less biased variance estimates, especially with fewer than 20 studies, and is the default in the metafor package in R. For new meta-analyses, REML is the recommended estimator.
| Feature | Fixed-Effect | Random-Effects (DL) | Random-Effects (REML) |
|---|---|---|---|
| Assumption | One true effect | Distribution of true effects | Distribution of true effects |
| Weights | Inverse within-study variance only | Inverse within-study + between-study variance | Inverse within-study + between-study variance |
| Tau-squared estimation | Not applicable | Moment-based (simple, may underestimate) | Likelihood-based (less biased) |
| Best for | Identical protocols, same population | General use, adequate with many studies | General use, especially few studies |
For a detailed comparison with worked examples, read choosing your MA model.
Step 4: Run the Pooled Analysis and Generate Forest Plots
With effect sizes extracted and your model selected, you now run the pooled analysis. The core computation is inverse variance weighting: each study's effect size is multiplied by its weight (the inverse of its variance), the products are summed across all studies, and the sum is divided by the total weight to produce the pooled effect estimate.
In a fixed-effect model, the weight for study i is:
w_i = 1 / v_i
where v_i is the within-study variance. In a random-effects model, the weight becomes:
w_i = 1 / (v_i + tau-squared)
where tau-squared is the estimated between-study variance. Adding tau-squared to the denominator is what makes random-effects weights more uniform: large and small studies contribute more similarly because tau-squared is the same for every study.
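The whole computation fits in a few lines. A minimal Python sketch of inverse-variance pooling with made-up numbers (the same arithmetic metafor performs; tau2 = 0 reproduces the fixed-effect model):

```python
def pooled(yi, vi, tau2=0.0):
    # Inverse-variance weights: w_i = 1 / (v_i + tau^2)
    w = [1 / (v + tau2) for v in vi]
    mu = sum(wi * y for wi, y in zip(w, yi)) / sum(w)
    se = (1 / sum(w)) ** 0.5  # standard error of the pooled estimate
    return mu, se

yi = [0.30, 0.50, 0.10]   # study effect sizes (illustrative)
vi = [0.04, 0.01, 0.09]   # within-study variances
fe_mu, fe_se = pooled(yi, vi)             # fixed effect: dominated by study 2
re_mu, re_se = pooled(yi, vi, tau2=0.05)  # random effects: weights more even
```

Adding tau-squared both pulls the pooled estimate away from the single most precise study and widens its standard error, which is exactly the behavior the weighting formula above predicts.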
The Forest Plot
A meta-analysis produces a forest plot, the signature output that displays individual study results and the pooled estimate on a single axis. Each study appears as a square (sized proportional to its weight) centered on its effect estimate, with a horizontal line representing the 95% confidence interval. The pooled estimate appears as a diamond at the bottom, with the diamond's width representing its confidence interval. A forest plot visualizes the pooled effect size, individual study contributions, and the degree of consistency across studies at a single glance.
For a thorough walkthrough of reading forest plots, see how to read a forest plot. To generate your own, use our free forest plot tool.
Prediction Intervals
Beyond the confidence interval for the pooled effect, every random-effects meta-analysis should report a prediction interval. The confidence interval describes the uncertainty around the average true effect. The prediction interval describes the range within which the true effect of a future study is expected to fall. A meta-analysis might show a statistically significant pooled OR of 0.65 (95% CI: 0.50-0.85), but the prediction interval might be 0.30-1.40, indicating that in some future settings, the intervention could actually be harmful. Prediction intervals are computed from tau-squared and provide far more clinically relevant information than confidence intervals alone. The metafor package in R reports them by default with the predict() function.
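The prediction interval is straightforward to compute once tau-squared is in hand. A Python sketch using the normal critical value 1.96 for simplicity (the exact method uses a t critical value with k - 2 degrees of freedom, so this slightly understates the width when there are few studies):

```python
import math

def prediction_interval(mu, se_mu, tau2, crit=1.96):
    # Half-width combines between-study variance with the uncertainty
    # in the pooled mean itself
    half = crit * math.sqrt(tau2 + se_mu**2)
    return mu - half, mu + half

# Illustrative pooled SMD of -0.43 (SE 0.09) with tau-squared = 0.05:
lo, hi = prediction_interval(-0.43, 0.09, 0.05)
# The 95% CI excludes zero, but the prediction interval crosses it
```

This reproduces the pattern described above: a "significant" average effect whose prediction interval still admits no effect, or harm, in some settings.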
Example R Code
Below is a minimal R script for running a random-effects meta-analysis and generating a forest plot using the metafor package:
library(metafor)
# Data: yi = effect sizes, vi = variances
dat <- escalc(measure = "SMD", m1i = m1, sd1i = sd1, n1i = n1,
m2i = m2, sd2i = sd2, n2i = n2, data = mydata)
# Fit random-effects model (REML is default)
res <- rma(yi, vi, data = dat)
# Forest plot
forest(res, slab = dat$study, header = TRUE,
xlab = "Standardized Mean Difference")
# Prediction interval
predict(res)
This code computes Hedges' g (the default for measure = "SMD" in metafor), fits a REML random-effects model, and generates a publication-ready forest plot. Include scripts like this as supplementary material with your manuscript.
Step 5: Assess Heterogeneity
Heterogeneity is the degree to which the true effects vary across studies. Assessing statistical heterogeneity is mandatory: it determines whether your pooled estimate should be interpreted as a single value or as an average across a distribution of different effects.
I-Squared
The I-squared statistic quantifies the percentage of total variability across studies that is due to true heterogeneity rather than sampling error. I-squared measures statistical heterogeneity on a 0-100% scale. Cochrane classifies I-squared heterogeneity as low (0-40%), moderate (30-60%), substantial (50-90%), and considerable (75-100%) (Higgins et al., 2023). Note the overlapping ranges: interpretation depends on context, not rigid cutoffs. An I-squared of 60% in a pharmacological meta-analysis of identical drug doses may be concerning, while 60% in a behavioral intervention meta-analysis may be expected and acceptable.
Use our I-squared and tau-squared calculator to compute heterogeneity statistics from your data. For a detailed interpretation guide, see I-squared interpretation guide.
Tau-Squared
While I-squared tells you the proportion of variability due to heterogeneity, tau-squared estimates between-study variance in absolute terms. A tau-squared of 0.10 for an SMD meta-analysis means the standard deviation of true effects across studies is approximately 0.32 (the square root of tau-squared). This absolute value is essential for computing prediction intervals and for determining whether heterogeneity is clinically meaningful, not just statistically detectable.
Cochran's Q Test
Cochran's Q is a chi-squared test of the null hypothesis that all studies share the same true effect. A significant Q (p < 0.10 is the conventional threshold, not p < 0.05) indicates heterogeneity. However, Q has low power with few studies and excessive power with many studies, so it should be interpreted alongside I-squared and tau-squared, never in isolation.
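The three statistics above share one computation. A minimal Python sketch using the DerSimonian-Laird moment estimator for tau-squared (metafor's REML estimate will differ slightly; illustrative data):

```python
def heterogeneity(yi, vi):
    w = [1 / v for v in vi]                        # fixed-effect weights
    mu = sum(wi * y for wi, y in zip(w, yi)) / sum(w)
    q = sum(wi * (y - mu) ** 2 for wi, y in zip(w, yi))   # Cochran's Q
    df = len(yi) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0   # I-squared (%)
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                  # DerSimonian-Laird tau^2
    return q, i2, tau2

q, i2, tau2 = heterogeneity([0.10, 0.40, 0.80], [0.02, 0.02, 0.02])
# Q about 12.3 on 2 df, I-squared about 84%, tau-squared about 0.10
```

The max(0, ...) truncation is why tau-squared is reported as exactly zero when the observed spread is no larger than sampling error alone would produce.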
Prediction Intervals Revisited
The prediction interval deserves emphasis here because it is the most underreported heterogeneity metric. A meta-analysis with I-squared = 80% tells you that most variability is not due to chance. But the prediction interval tells you the actual range of effects you would expect to see in a new study, which is what clinicians and policymakers actually need to know. Always report prediction intervals alongside I-squared and tau-squared.
Step 6: Conduct Sensitivity and Subgroup Analyses
Sensitivity analysis and subgroup analysis explore the robustness of your pooled estimate and investigate potential sources of heterogeneity. These analyses separate a rigorous meta-analysis from a superficial one.
Leave-One-Out Analysis
Leave-one-out analysis systematically removes one study at a time and recalculates the pooled effect. If the pooled estimate changes substantially when a single study is removed, that study is highly influential and warrants scrutiny. Was it an outlier? Did it have a different population, different dose, or a high risk of bias? Use our leave-one-out analysis tool to run this analysis on your data. In R, the leave1out() function in metafor automates this:
leave1out(res)
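The underlying loop is simple. A Python sketch of the same idea, using fixed-effect pooling for brevity (metafor's leave1out() refits whatever model you passed to rma()):

```python
def leave_one_out(yi, vi):
    # Pooled inverse-variance estimate with study k removed, for each k
    def pool(ys, vs):
        w = [1 / v for v in vs]
        return sum(wi * y for wi, y in zip(w, ys)) / sum(w)
    return [pool(yi[:k] + yi[k+1:], vi[:k] + vi[k+1:])
            for k in range(len(yi))]

yi = [0.20, 0.30, 1.50]   # third study is an outlier (illustrative)
vi = [0.04, 0.04, 0.04]
loo = leave_one_out(yi, vi)  # the estimate shifts most when the outlier is dropped
```

In this toy example the pooled estimate drops from about 0.67 to 0.25 when the outlying study is removed, exactly the kind of shift that flags an influential study.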
Subgroup Analysis
Subgroup analysis stratifies studies by a categorical moderator (study design, geographic region, intervention dose, risk of bias) and computes separate pooled estimates for each subgroup. The test for subgroup differences examines whether the pooled effects differ significantly across categories.
Critical rule: pre-specify your subgroup analyses in your protocol before analyzing data. Post-hoc subgroups, invented after seeing the results, are hypothesis-generating only. Reviewers and editors will scrutinize unplanned subgroup analyses as potential data dredging. Limit yourself to 3-5 subgroups with a clinical or methodological rationale.
Meta-Regression
For continuous moderators (publication year, mean participant age, intervention duration), meta-regression analysis extends subgroup analysis by fitting the moderator as a continuous predictor. Meta-regression requires at least 10 studies to have adequate power, and results should be interpreted cautiously because it is an observational association across studies, not a within-study causal estimate. Use our meta-regression input tool to format your data for meta-regression in R.
In metafor, meta-regression is straightforward:
res_reg <- rma(yi, vi, mods = ~ year + dose, data = dat)
summary(res_reg)
Report the regression coefficient, its confidence interval, the residual I-squared (heterogeneity remaining after accounting for the moderator), and the R-squared analog (proportion of between-study variance explained).
Step 7: Test for Publication Bias
Publication bias, the selective publication of studies with positive or statistically significant results, threatens the validity of every meta-analysis. If your pooled estimate is based on a biased sample of studies, it will overestimate the true effect. Publication bias detection uses both visual and statistical methods.
Funnel Plot
A funnel plot graphs each study's effect size (x-axis) against a measure of its precision, typically the standard error (y-axis). In the absence of publication bias, studies scatter symmetrically around the pooled estimate in an inverted funnel shape: precise (large) studies cluster at the top, and imprecise (small) studies spread widely at the bottom. A funnel plot suggests publication bias when the plot shows asymmetry, typically missing studies in one bottom corner (small studies with non-significant results). For guidance on reading funnel plots, see interpreting funnel plots. Generate your own with our funnel plot generator.
Egger's Test
Egger's regression test formally tests funnel plot asymmetry by regressing the standardized effect size against its precision. A significant intercept (p < 0.10) suggests asymmetry consistent with publication bias or other small-study effects. Egger's regression test for funnel plot asymmetry requires a minimum of 10 studies to have adequate power (Sterne et al., 2011). With fewer than 10 studies, the test is unreliable and should not be used; report the funnel plot visually instead. In R:
regtest(res, model = "lm")
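The regression behind the test is ordinary least squares. A Python sketch of the intercept calculation only (a real test also needs the intercept's standard error to obtain a p-value, which regtest() provides):

```python
def egger_intercept(yi, sei):
    # Regress standardized effects (y/se) on precision (1/se);
    # a nonzero intercept indicates funnel-plot asymmetry
    x = [1 / s for s in sei]
    z = [y / s for y, s in zip(yi, sei)]
    n = len(x)
    xbar, zbar = sum(x) / n, sum(z) / n
    slope = (sum((a - xbar) * (b - zbar) for a, b in zip(x, z))
             / sum((a - xbar) ** 2 for a in x))
    return zbar - slope * xbar  # the Egger intercept

# A perfectly symmetric toy example: identical true effect at every precision
sym = egger_intercept([0.5, 0.5, 0.5, 0.5], [0.1, 0.2, 0.3, 0.4])   # near 0
# Small-study inflation: effects grow as standard errors grow
asym = egger_intercept([0.2, 0.4, 0.8, 1.2], [0.1, 0.2, 0.3, 0.4])  # positive
```

With identical true effects the intercept sits at zero; when small (high-SE) studies report inflated effects, the intercept moves away from zero in the direction of the inflation.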
Trim-and-Fill
The trim and fill method estimates the number of missing studies needed to make the funnel plot symmetric, imputes those studies, and recalculates the pooled estimate. While useful as a sensitivity check, trim-and-fill assumes that the asymmetry is entirely due to publication bias (rather than other sources of small-study effects) and should be interpreted cautiously. In R:
tf <- trimfill(res)
funnel(tf)
Begg's Test
Begg's test (rank correlation test) is an alternative to Egger's test that uses Kendall's tau to test the association between effect sizes and their variances. It has lower power than Egger's test and is less commonly used today, but some reviewers still request it. In R:
ranktest(res)
For a comprehensive overview of all publication bias methods, see publication bias methods.
Step 8: Rate Certainty of Evidence
A pooled effect estimate is only as useful as the confidence you can place in it. The GRADE framework (Grading of Recommendations Assessment, Development and Evaluation) provides a structured approach to rating the certainty of evidence from meta-analyses. GRADE produces a summary of findings table that rates each outcome as high, moderate, low, or very low certainty.
GRADE evaluates five domains that can downgrade certainty:
| Domain | Downgrades When |
|---|---|
| Risk of bias | Included studies have high risk of bias (inadequate randomization, blinding, attrition) |
| Inconsistency | High heterogeneity (I-squared > 50%) without explanation |
| Indirectness | Studies do not directly address the clinical question (different population, comparator, or outcome) |
| Imprecision | Wide confidence intervals crossing the line of clinical significance |
| Publication bias | Funnel plot asymmetry, Egger's test significant, or evidence of selective reporting |
And three domains that can upgrade certainty (applicable mainly to observational studies):
- Large magnitude of effect (RR > 2 or < 0.5)
- Dose-response gradient
- All plausible residual confounding would act to reduce the observed effect
The summary of findings table presents the pooled estimate, confidence interval, number of studies and participants, and the GRADE rating for each outcome. Most clinical journals, Cochrane, and guideline organizations require GRADE assessments. The table communicates both the statistical result and the confidence warranted by the underlying evidence, a far more useful output than a pooled estimate alone.
Step 9: Report Your Results
Transparent reporting ensures that readers can evaluate your methods, reproduce your analysis, and build on your findings. The PRISMA 2020 statement provides a 27-item checklist for reporting systematic reviews and meta-analyses.
Essential Reporting Elements
Your results section must include:
- Forest plots for every pre-specified outcome, with study labels, effect estimates, confidence intervals, weights, and the pooled diamond. Use our free forest plot tool to generate publication-ready figures.
- Heterogeneity statistics for each analysis: I-squared, tau-squared, Cochran's Q (with degrees of freedom and p-value), and prediction intervals.
- Funnel plots with Egger's test results for analyses including 10 or more studies.
- Sensitivity analysis results, including leave-one-out analysis and any analyses excluding high risk-of-bias studies.
- Subgroup analysis results with tests for subgroup differences.
- GRADE summary of findings table rating evidence certainty for each outcome.
Writing the Results Narrative
Present pooled estimates with their confidence intervals and prediction intervals in the text. State the direction and magnitude of the effect, then immediately report the heterogeneity. For example: "The pooled standardized mean difference was -0.45 (95% CI: -0.62 to -0.28; 95% PI: -1.10 to 0.20), indicating a moderate treatment effect on average, though the prediction interval includes the possibility of no effect or harm in some settings (I-squared = 72%, tau-squared = 0.15)."
Always report both statistical significance and clinical significance. A pooled odds ratio of 0.92 (95% CI: 0.85-0.99) is statistically significant but may not be clinically meaningful if the absolute risk reduction is less than 1%.
Supplementary Material
Include your full R or Stata analysis code, the extracted data file, and any sensitivity analyses not shown in the main text as supplementary material. This allows peer reviewers to verify your results and future researchers to update your meta-analysis when new studies are published.
Software for Meta-Analysis
The choice of software determines reproducibility, analytical flexibility, and publication readiness. Script-based platforms, particularly R and Stata, are strongly preferred over point-and-click black-box tools.
R
R is the most widely used open-source platform for meta-analysis, with several specialized packages:
- metafor: the most comprehensive package. Supports fixed-effect, random-effects (DL, REML, PM, and other estimators), meta-regression, multivariate meta-analysis, and network meta-analysis. Defaults to REML. Produces publication-ready forest plots, funnel plots, and diagnostic plots. Maintained by Wolfgang Viechtbauer.
- meta: a user-friendly package that relies on metafor for several of its computations. Provides convenient functions like metabin() for binary outcomes and metacont() for continuous outcomes. Excellent for standard pairwise meta-analyses.
- netmeta: specialized for network meta-analysis (comparing multiple interventions simultaneously). Produces league tables and SUCRA rankings.
- dmetar: a companion package to the "Doing Meta-Analysis in R" guide. Provides additional functions for outlier detection, influence analysis, and power calculations.
For researchers learning how to perform a meta-analysis in R, the combination of metafor for computation and forest() for visualization covers the vast majority of use cases.
Stata
Stata provides robust meta-analysis capabilities through community-contributed commands:
- metan: the workhorse command for pairwise meta-analysis. Handles both fixed-effect and random-effects models with a single command.
- metafunnel: generates funnel plots for publication bias assessment.
- metareg: runs meta-regression with continuous and categorical moderators.
- network: Stata's suite for network meta-analysis.
Stata's advantage is its integrated environment: data management, analysis, and graphing in a single platform. Its disadvantage is the license cost.
CMA (Comprehensive Meta-Analysis)
CMA is a commercial point-and-click tool designed specifically for meta-analysis. While user-friendly, it is a black box: users cannot inspect or share the underlying code, making results difficult to reproduce. Peer reviewers increasingly expect reproducible scripts, which CMA cannot provide. We do not recommend CMA for publication-bound meta-analyses.
RevMan (Review Manager)
RevMan is Cochrane's free meta-analysis software. It integrates data entry, risk of bias assessment, and meta-analysis in a single workflow. RevMan is adequate for standard Cochrane reviews with pairwise comparisons but lacks the flexibility for meta-regression, network meta-analysis, or custom sensitivity analyses. It is appropriate for Cochrane reviews but limiting for independent research.
| Software | Cost | Reproducible Code | Meta-Regression | Network MA | Best For |
|---|---|---|---|---|---|
| R (metafor) | Free | Yes | Yes | Yes (netmeta) | Publication-bound research, advanced methods |
| Stata | $125-$1,395/yr | Yes | Yes | Yes | Integrated analysis environment |
| CMA | $195-$1,295 | No | Limited | No | Teaching, quick exploratory analyses |
| RevMan | Free | No | No | No | Standard Cochrane reviews |
Scriptable software (R, Stata) matters because transparency is the currency of credible meta-analysis. When reviewers can run your code and reproduce your forest plots, confidence in your results increases. When they cannot, they question whether the outputs were generated correctly.
Common Meta-Analysis Mistakes
Even experienced researchers make methodological errors that undermine their meta-analyses. Recognizing these mistakes in advance prevents months of revision and resubmission cycles.
Using a fixed-effect model with high heterogeneity. When I-squared exceeds 50% and the Q-test is significant, a fixed-effect model produces confidence intervals that are too narrow because it ignores between-study variance. The pooled estimate under a fixed-effect model with heterogeneous studies gives disproportionate weight to the largest study and fails to account for the reality that true effects vary. Always use a random-effects model unless you have a strong justification for assuming a common true effect.
Omitting prediction intervals. The confidence interval around the pooled estimate tells you the precision of the average effect. The prediction interval tells you the range of effects you would expect in a new study. A statistically significant pooled estimate with a prediction interval that crosses the null indicates that the intervention may not work in all settings, a critical insight that the confidence interval alone conceals.
Conducting post-hoc subgroup analyses and presenting them as confirmatory. Subgroup analyses that were not pre-specified in the protocol are exploratory. Presenting them without acknowledging their post-hoc nature is misleading and will be identified by experienced reviewers.
Mixing incompatible effect measures. Pooling odds ratios from some studies with risk ratios from others without conversion produces meaningless results. Standardize all studies to the same effect measure before pooling.
Running Egger's test with fewer than 10 studies. Egger's regression test for funnel plot asymmetry requires a minimum of 10 studies to have adequate power (Sterne et al., 2011). With fewer studies, the test cannot distinguish real asymmetry from sampling variation, and a non-significant result falsely reassures.
Ignoring the clinical context of heterogeneity. An I-squared of 75% does not automatically invalidate a meta-analysis. If the heterogeneity is explained by pre-specified subgroup variables (e.g., drug dose, disease severity) and the subgroup-specific estimates are clinically coherent, the meta-analysis remains informative. The mistake is reporting I-squared without investigating what drives it.
Failing to provide reproducible code. A forest plot without the underlying analysis code is an unverifiable claim. Peer reviewers increasingly request R or Stata scripts as supplementary material, and journals are beginning to mandate code sharing for meta-analyses.
Bringing It All Together
Learning how to do a meta-analysis means mastering a nine-step pipeline:
1. Define outcomes and choose your effect measure.
2. Extract or calculate effect sizes from each included study.
3. Select the appropriate statistical model (almost always random-effects with REML).
4. Run the pooled analysis and generate forest plots.
5. Assess heterogeneity using I-squared, tau-squared, and prediction intervals.
6. Conduct pre-specified sensitivity and subgroup analyses.
7. Test for publication bias using funnel plots and Egger's test.
8. Rate the certainty of evidence with GRADE.
9. Report your results transparently per PRISMA 2020.
Each step has decision points that require both statistical knowledge and clinical judgment. The software you use, ideally R (metafor) or Stata, should be open-source and produce reproducible code. The outputs you deliver, forest plots, funnel plots, heterogeneity statistics, sensitivity analyses, and a GRADE summary of findings table, must meet journal standards on first submission.
If you are new to meta-analysis, start with a simple pairwise analysis of a single outcome using the R code examples in this guide and our free tools: the effect size calculator for computing Hedges' g or odds ratios, the forest plot generator for visualizing your results, and the funnel plot generator for checking publication bias. For heterogeneity, use our I-squared and tau-squared calculator, and for sensitivity testing, try the leave-one-out analysis tool.
For researchers who need publication-ready results without the learning curve, consider our professional research services. Our biostatisticians specialize in meta-analysis across clinical medicine, epidemiology, and health technology assessment. We deliver reproducible R code, publication-ready forest plots, GRADE-rated evidence tables, and a complete statistical methods section. Learn more about our meta-analysis service or explore the process of hiring a meta-analysis expert.
References
- Borenstein, M., Hedges, L.V., Higgins, J.P.T., & Rothstein, H.R. (2009). Introduction to Meta-Analysis. Wiley.
- DerSimonian, R., & Laird, N. (1986). Meta-analysis in clinical trials. Controlled Clinical Trials, 7(3), 177-188.
- Higgins, J.P.T., Thomas, J., Chandler, J., Cumpston, M., Li, T., Page, M.J., & Welch, V.A. (Eds.) (2023). Cochrane Handbook for Systematic Reviews of Interventions (Version 6.4). Cochrane.
- Sterne, J.A.C., Sutton, A.J., Ioannidis, J.P.A., et al. (2011). Recommendations for examining and interpreting funnel plot asymmetry in meta-analyses of randomised controlled trials. BMJ, 343, d4002.