To do a meta-analysis you define a precise question, systematically search and screen the relevant studies, extract a comparable effect size and its variance from each one, then combine those estimates with inverse-variance weighting into a single pooled effect and quantify how much the studies genuinely disagree. It is the statistical engine inside a systematic review, and its credibility rests as much on the early protocol and extraction work as on the model you fit at the end.
This guide is written for researchers running a real meta-analysis to a publishable standard, so it covers the mechanics most introductions skip: the variance of each effect metric, the choice of between-study variance estimator, the Hartung-Knapp adjustment, prediction intervals, dependency between effect sizes, and the diagnostics reviewers now expect. If your topic is psychology specifically, pair this with our worked overview of meta-analysis in psychology; the methodology below applies across every field.
Step 1: Define the question and pre-register the protocol
Everything starts with a focused, answerable question. Most reviewers structure it with the PICO framework, specifying the Population, Intervention, Comparison, and Outcome, or PECO for exposure questions. A sharp question dictates your eligibility criteria, your search terms, and the effect size you will eventually pool. Use our PICO framework builder to pin this down before anything else.
The protocol is where doctoral rigor begins, because it commits you to analytical choices before the data can influence them. Pre-register it, typically on PROSPERO for health topics or the Open Science Framework for others, and specify more than the question: name the effect size metric, the pooling model, the tau-squared estimator, whether you will apply the Hartung-Knapp adjustment, your planned subgroups and meta-regression covariates, and your rule for handling multiple effect sizes per study. Each of these decisions changes the result, so deciding them a priori is what separates a confirmatory synthesis from a flexible one. This single discipline does more for credibility than any computation later.
Step 2: Search multiple databases systematically
A meta-analysis is only as complete as its search: studies the search misses bias the pooled estimate no matter how clean the statistics. Build a reproducible search strategy that combines controlled vocabulary (MeSH in PubMed, Emtree in Embase) with free-text terms joined by Boolean logic, and run it across several databases, typically PubMed, Embase, the Cochrane Library, Web of Science, and a field-specific source such as PsycINFO or CINAHL. Our search strategy builder helps structure searches that hold up to peer review.
Extend the search beyond databases to limit publication bias at the source: search trial registries (ClinicalTrials.gov, the WHO ICTRP), grey literature and dissertations, conference proceedings, and the reference lists of included studies through backward and forward citation chasing. Record the exact strings, databases, and dates so the search is reproducible, and consider a PRESS peer review of the strategy. The search is the part of a meta-analysis most exposed in peer review, so document it as if a referee will rerun it.
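The Boolean structure described above (synonyms joined with OR inside each concept, concepts joined with AND) can be sketched in a few lines. This is a minimal illustration, not a validated strategy: the topic, the terms, and the helper names or_block and build_query are hypothetical, and the [MeSH Terms] and [tiab] field tags follow PubMed's syntax, so other databases would need their own tags.

```python
from datetime import date

def or_block(terms):
    """Join the synonyms for one concept with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"

def build_query(concepts):
    """Join the concept blocks with AND to form the full search string."""
    return " AND ".join(or_block(terms) for terms in concepts)

# Hypothetical concepts for an illustrative question (exercise for depression).
population = ['"Depression"[MeSH Terms]', "depress*[tiab]"]
intervention = ['"Exercise"[MeSH Terms]', "exercis*[tiab]", '"physical activity"[tiab]']

query = build_query([population, intervention])

# Keep the exact string, database, and run date together for the search log.
search_log_entry = {
    "database": "PubMed",
    "query": query,
    "run_date": date.today().isoformat(),
}
print(search_log_entry["query"])
```

Generating the string from term lists keeps the logged query identical to the one that was run, which is the property a referee rerunning the search depends on.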
Step 3: Screen studies in duplicate against your inclusion criteria
Searches return thousands of records, most irrelevant. Screening happens in two passes: titles and abstracts first, then full texts of the survivors. Best practice uses two independent reviewers at both stages, with disagreements resolved by discussion or a third reviewer. Quantify screening agreement with Cohen's kappa; a value below roughly 0.6 signals that your criteria are ambiguous and need refining before you proceed.
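As a rough sketch of the agreement check, Cohen's kappa compares the observed proportion of matching decisions with the proportion expected by chance from each reviewer's marginal rates. The decision lists below are invented for illustration, and the function name cohens_kappa is an assumption rather than part of any specific tool.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical decisions on the same records."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty decision lists")
    n = len(rater_a)
    # Observed agreement: share of records where both raters made the same call.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions, per category.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    categories = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / n**2
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)

# Hypothetical title/abstract decisions for 100 records:
# 20 both include, 10 only reviewer 1 includes, 5 only reviewer 2 includes, 65 both exclude.
reviewer_1 = ["include"] * 20 + ["include"] * 10 + ["exclude"] * 5 + ["exclude"] * 65
reviewer_2 = ["include"] * 20 + ["exclude"] * 10 + ["include"] * 5 + ["exclude"] * 65

kappa = cohens_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.3f}")  # prints kappa = 0.625
```

Here raw agreement is 85%, but because most records are excluded by both reviewers, chance agreement is already 60%, which is why kappa lands just above the 0.6 threshold rather than near 0.85.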
Track every record from identification to inclusion so you can build a PRISMA 2020 flow diagram documenting how many were found, deduplicated, screened, excluded, and included, with reasons for full-text exclusions. Our PRISMA flow generator produces that figure from your screening counts. Alongside screening, appraise each included study for risk of bias using a structured tool such as Cochrane RoB 2 for randomized trials or ROBINS-I for non-randomized studies, because those judgments feed both your sensitivity analyses and the final certainty rating.
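Before drawing the flow diagram, it can help to check that the screening counts reconcile at every stage. The sketch below is a simplified consistency check with made-up numbers; it ignores some boxes in the full PRISMA 2020 template (for example, reports sought but not retrieved), and the function name prisma_counts is hypothetical.

```python
def prisma_counts(identified, duplicates, excluded_title_abstract, full_text_exclusions):
    """Derive the downstream PRISMA counts and fail loudly if any goes negative.

    full_text_exclusions maps each exclusion reason to its count.
    """
    screened = identified - duplicates
    full_text_assessed = screened - excluded_title_abstract
    included = full_text_assessed - sum(full_text_exclusions.values())
    if min(screened, full_text_assessed, included) < 0:
        raise ValueError("counts do not reconcile: a stage removes more records than it received")
    return {
        "identified": identified,
        "screened": screened,
        "full_text_assessed": full_text_assessed,
        "full_text_excluded": dict(full_text_exclusions),
        "included": included,
    }

# Hypothetical screening counts.
flow = prisma_counts(
    identified=2480,
    duplicates=610,
    excluded_title_abstract=1742,
    full_text_exclusions={
        "wrong population": 41,
        "no usable outcome data": 37,
        "wrong study design": 26,
    },
)
print(flow["screened"], flow["full_text_assessed"], flow["included"])  # prints 1870 128 24
```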
Step 4: Extract effect sizes and their variances
This is the stage where errors do the most damage, because a mistaken effect size propagates straight into the pooled result. It is also where doctoral-level work diverges from a basic summary. From each included study you extract not only an effect size but also its variance, because pooling weights every study by the inverse of that variance. Choose the metric a priori from the outcome type.
Continuous outcomes. The standardized mean difference expresses the group difference in pooled standard deviation units. Cohen's d is the mean difference divided by the pooled standard deviation, but it is upward biased in small samples, so apply the Hedges' g correction: g = J × d, with J approximately equal to 1 − 3/(4df − 1), where df = n1 + n2 − 2 for two independent groups. The variance of g is approximately (n1 + n2)/(n1 × n2) + g²/(2(n1 + n2)). When all studies use the same well-understood scale, a raw mean difference is more interpretable than a standardized one.
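The calculation for one two-group study can be sketched as follows, using the formulas stated above with df = n1 + n2 − 2. The summary statistics are invented, and hedges_g is an illustrative helper name, not a function from a particular package.

```python
import math

def hedges_g(mean1, sd1, n1, mean2, sd2, n2):
    """Return (g, var_g) for two independent groups from summary statistics."""
    df = n1 + n2 - 2
    # Pooled standard deviation across the two groups.
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    d = (mean1 - mean2) / pooled_sd          # Cohen's d
    j = 1 - 3 / (4 * df - 1)                 # small-sample correction factor
    g = j * d                                # Hedges' g
    # Approximate sampling variance of g, as given in the text.
    var_g = (n1 + n2) / (n1 * n2) + g**2 / (2 * (n1 + n2))
    return g, var_g

# Hypothetical study: treatment mean 10 (SD 2, n = 20) vs control mean 9 (SD 2, n = 20).
g, var_g = hedges_g(10.0, 2.0, 20, 9.0, 2.0, 20)
print(f"g = {g:.3f}, variance = {var_g:.4f}")  # prints g = 0.490, variance = 0.1030
```

With 20 participants per arm the correction shrinks d = 0.50 only slightly, to about 0.49; the shrinkage grows as samples get smaller.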
Binary outcomes. Use the risk ratio, odds ratio, or risk difference, and analyze ratio measures on the natural-log scale, where they are roughly symmetric and normally distributed. The variance of the log odds ratio from a 2 by 2 table with cells a, b, c, d is 1/a + 1/b + 1/c + 1/d. Apply a continuity correction (commonly adding 0.5 to every cell) when a cell is zero, and prefer the Peto odds ratio or an exact method when events are rare, since the standard correction distorts sparse data.
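A minimal sketch of the log odds ratio extraction follows. It assumes a particular cell layout (a and b are events and non-events in the treatment arm, c and d the same in the control arm) and assumes the correction is added to all four cells when any cell is zero, which is a common convention but not the only one. The counts and the name log_odds_ratio are illustrative.

```python
import math

def log_odds_ratio(a, b, c, d, correction=0.5):
    """Return (log_or, variance) from a 2x2 table.

    Assumed layout: a = treatment events, b = treatment non-events,
                    c = control events,   d = control non-events.
    """
    if 0 in (a, b, c, d):
        # Assumption: add the correction to every cell when any cell is zero.
        a, b, c, d = (x + correction for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or, variance

# Hypothetical trial: 12/100 events under treatment, 24/100 under control.
log_or, var_log_or = log_odds_ratio(12, 88, 24, 76)
se = math.sqrt(var_log_or)

# Build the interval on the log scale, then exponentiate for reporting.
odds_ratio = math.exp(log_or)
ci_low = math.exp(log_or - 1.96 * se)
ci_high = math.exp(log_or + 1.96 * se)
print(f"OR = {odds_ratio:.3f}, 95% CI {ci_low:.3f} to {ci_high:.3f}")

# A zero cell triggers the correction instead of a division by zero.
zero_cell_log_or, zero_cell_var = log_odds_ratio(0, 50, 5, 45)
```

The interval is symmetric around the log odds ratio but asymmetric around the odds ratio itself, which is the practical reason for working on the log scale.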
Correlational outcomes. Convert each correlation r with the Fisher z transformation, z = 0.5 × ln((1 + r)/(1 − r)), which stabilizes the variance to approximately 1/(n − 3). Pool on the z scale and back-transform the pooled estimate to r for reporting.
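The transform, pool, and back-transform sequence can be sketched as below. For brevity the sketch pools under a common-effect assumption, with each study weighted by n − 3 (the inverse of its variance on the z scale); a random-effects analysis would add a between-study variance to each study's variance first. The correlations, sample sizes, and function names are hypothetical.

```python
import math

def fisher_z(r):
    """Fisher z transformation of a correlation (equivalent to atanh(r))."""
    return 0.5 * math.log((1 + r) / (1 - r))

def pooled_correlation(correlations, sample_sizes):
    """Pool correlations on the z scale (common-effect weights) and return
    the back-transformed estimate with an approximate 95% interval."""
    zs = [fisher_z(r) for r in correlations]
    weights = [n - 3 for n in sample_sizes]      # inverse of the variance 1/(n - 3)
    z_bar = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    # tanh is the inverse of the Fisher z transformation.
    back = math.tanh
    return back(z_bar), (back(z_bar - 1.96 * se), back(z_bar + 1.96 * se))

# Hypothetical studies.
r_pooled, (r_low, r_high) = pooled_correlation([0.25, 0.40, 0.31], [60, 120, 85])
print(f"pooled r = {r_pooled:.3f}, 95% CI {r_low:.3f} to {r_high:.3f}")
```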
Our guide to calculating standardized effect sizes covers the conversions between these metrics, and the effect size calculator handles individual comparisons, including from t statistics, F ratios, and odds ratios when a study does not report the raw inputs.
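To see why every extracted effect size needs its variance, here is a minimal sketch of inverse-variance pooling in its simplest common-effect form. The three effect sizes and variances are invented, and inverse_variance_pool is an illustrative name. A random-effects model uses the same arithmetic after adding an estimate of the between-study variance (tau-squared) to each study's variance, which this sketch does not do.

```python
import math

def inverse_variance_pool(effects, variances):
    """Common-effect pooled estimate and its standard error."""
    if len(effects) != len(variances) or not effects:
        raise ValueError("need matching, non-empty lists of effects and variances")
    weights = [1 / v for v in variances]         # precise studies get more weight
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return pooled, se

# Hypothetical extracted effect sizes (e.g. Hedges' g) and their variances.
effects = [0.30, 0.50, 0.10]
variances = [0.04, 0.01, 0.02]

pooled, se = inverse_variance_pool(effects, variances)
print(f"pooled = {pooled:.3f}, SE = {se:.3f}")  # prints pooled = 0.357, SE = 0.076
```

The second study has the smallest variance, so it carries 100 of the 175 total weight units and pulls the pooled estimate toward 0.50. An extraction error in that study's variance would shift the result far more than the same error in a small study.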