Effect sizes are the foundation of every meta-analysis. They translate the raw results from individual studies, which come in different formats, on different scales, and across different populations, into a common metric that can be pooled, compared, and interpreted across an entire body of evidence. Choosing the wrong effect size can invalidate your synthesis. Choosing the right one delivers clear, actionable conclusions that advance clinical knowledge and inform practice.
This guide covers every major effect size used in contemporary meta-analysis, explains when and why to use each one, provides worked decision frameworks, walks through conversion methods between metrics, and addresses the interpretation pitfalls that catch even experienced systematic reviewers.
Why Effect Sizes Are the Currency of Meta-Analysis
Individual studies report their results in a bewildering variety of formats: means and standard deviations, percentages and p-values, regression coefficients, hazard ratios, median survival times, or simple statements like "the intervention group improved significantly." A meta-analysis cannot work with this raw heterogeneity of reporting formats. It needs a standardised measure that captures the direction, magnitude, and precision of each study's finding in a common unit that allows direct comparison and mathematical pooling.
That standardised measure is the effect size, and it simultaneously answers the two most important questions in evidence synthesis. First, in which direction does the evidence point: does the intervention help, harm, or make no difference compared to the comparator? Second, how large is the effect: is it trivially small, clinically meaningful, or transformatively large? Without effect sizes, you cannot construct a forest plot, calculate a pooled estimate, assess heterogeneity, or conduct any of the subgroup or sensitivity analyses that give a meta-analysis its analytical power. They are not a statistical convenience; they are the fundamental unit of evidence synthesis.
The choice of effect size depends on three factors: the type of outcome being measured (continuous, dichotomous, time-to-event, or correlational), the measurement scales used across studies (identical or different instruments), and the study designs contributing data (experimental, observational, case-control). Getting this choice right at the protocol stage prevents painful analytical problems downstream.
Effect Sizes for Continuous Outcomes
When your outcome is measured on a continuous scale (blood pressure, pain scores, cognitive test performance, quality-of-life ratings), you have two primary options: the mean difference (MD) and the standardised mean difference (SMD). The choice between them depends entirely on whether your included studies all use the same measurement instrument.
Mean Difference (MD)
The mean difference is the simplest and most clinically interpretable effect size: the arithmetic difference between the intervention group mean and the control group mean, expressed in the original measurement units.
When to use the mean difference:
- All included studies measure the outcome on the same scale using the same instrument (e.g., systolic blood pressure in mmHg, Hamilton Depression Rating Scale score, forced expiratory volume in litres).
- The measurement scale has a clinically meaningful interpretation that your audience understands intuitively.
- You want readers to be able to directly translate your pooled result into clinical practice without requiring statistical training to interpret standardised units.
Advantages of the mean difference: Direct clinical interpretability is the primary strength. A pooled MD of -5.2 mmHg for blood pressure is immediately meaningful to clinicians, patients, and policymakers. No information is lost through standardisation, and the result requires no additional context for interpretation.
Limitations: The mean difference cannot be used when studies measure the same underlying construct using different instruments. You cannot meaningfully average a Beck Depression Inventory score change with a PHQ-9 score change in their original units, even though both measure depression severity.
Standardised Mean Difference (SMD)
The standardised mean difference expresses the difference between group means in standard deviation units rather than the original measurement units. This standardisation allows you to pool results from studies using different measurement instruments for the same underlying construct. The two versions you will encounter are Cohen's d and Hedges' g.
Cohen's d divides the mean difference by the pooled standard deviation of both groups:
d = (Mean_intervention - Mean_control) / SD_pooled
Hedges' g applies a small-sample correction factor to Cohen's d that adjusts for the upward bias present when sample sizes are small:
g = d × J, where J ≈ 1 - (3 / (4df - 1)) and df = n_intervention + n_control - 2
For studies with more than approximately 20 participants per group, Cohen's d and Hedges' g are nearly identical. For smaller samples, Hedges' g provides a less biased estimate and is the default choice in most meta-analysis software.
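Both formulas can be computed directly from the summary statistics trials report. The sketch below is illustrative, with invented means, standard deviations, and sample sizes, and uses the approximate correction factor given above:

```python
import math

def cohens_d_and_hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d and Hedges' g from group summary statistics."""
    # Pooled standard deviation across the two groups
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / sd_pooled
    # Approximate small-sample correction factor J, with df = n1 + n2 - 2
    j = 1 - 3 / (4 * (n1 + n2 - 2) - 1)
    return d, d * j

# Invented summary statistics for a small trial (12 participants per group)
d, g = cohens_d_and_hedges_g(25.0, 8.0, 12, 20.0, 8.0, 12)
```

With 12 per group the correction is visible (g is about 3.4% smaller than d); with 100 per group it would be negligible, which is exactly the behaviour described above.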
Pro Tip: Always use Hedges' g instead of Cohen's d as your default SMD. The correction factor adds negligible computational complexity (your software handles it automatically) but eliminates the systematic upward bias that Cohen's d exhibits with small samples. Since systematic reviews frequently include studies with modest sample sizes, defaulting to Hedges' g is a costless safeguard that improves accuracy. Every major meta-analysis software package (RevMan, Comprehensive Meta-Analysis, R metafor) offers Hedges' g as a standard option.
Interpreting Standardised Mean Differences
Cohen's widely cited benchmarks provide a starting framework for SMD interpretation, but they should always be contextualised within your specific clinical question:
| SMD Magnitude | Cohen's Label | Clinical Context Example |
|---|---|---|
| 0.2 | Small | A new antihypertensive lowers systolic BP by 0.2 SD, modest but potentially meaningful if the drug has a favourable safety profile |
| 0.5 | Medium | A psychotherapy intervention improves depression scores by 0.5 SD, a clearly noticeable clinical improvement |
| 0.8 | Large | A surgical intervention improves functional outcomes by 0.8 SD, a substantial, easily observable change |
| 1.2+ | Very large | Rare in clinical research; common in laboratory or educational interventions |
The critical caveat: these benchmarks are generic defaults, not clinical thresholds. An SMD of 0.2 in a life-threatening condition where no effective treatment exists may be profoundly meaningful. An SMD of 0.8 in a self-reported subjective outcome with known measurement bias may be less impressive than it appears. Always interpret SMDs in the context of the baseline severity, the patient population, the clinical significance threshold, and the precision of the estimate.
Choosing Between MD and SMD
Use the mean difference whenever possible because it preserves direct clinical interpretability. Reserve the standardised mean difference for situations where pooling requires standardisation across different instruments. If most of your included studies use the same measurement scale but a small number use different instruments, consider converting those outlier results to the common scale (if conversion formulas exist) rather than standardising everything. This preserves interpretability for the majority of your evidence.
Effect Sizes for Dichotomous Outcomes
When your outcome is binary (event or no event, response or no response, alive or dead), the three primary options are the risk ratio (RR), odds ratio (OR), and risk difference (RD). Each has distinct strengths, limitations, and appropriate use cases, and confusing them is one of the most common errors in meta-analysis reporting and interpretation.
Risk Ratio (Relative Risk)
The risk ratio compares the probability of the event in the intervention group to the probability in the control group:
RR = (Events_intervention / N_intervention) / (Events_control / N_control)
When to use the risk ratio: Cohort studies and randomised controlled trials where you can directly estimate the incidence of the event in both groups. The risk ratio is the preferred measure for most intervention meta-analyses because it is the most intuitive relative measure for clinicians and patients.
Interpretation framework:
| RR Value | Meaning | Clinical Example |
|---|---|---|
| RR = 1.0 | No difference between groups | The intervention has no effect on the outcome |
| RR = 0.75 | 25% relative risk reduction | The intervention reduces the event rate by one quarter |
| RR = 1.50 | 50% relative risk increase | The intervention increases the event rate by half |
| RR = 0.50 | 50% relative risk reduction | The intervention halves the event rate |
The risk ratio is bounded by zero on the lower end but has no upper bound, and its natural value of no effect is 1.0. It is typically log-transformed for meta-analytic pooling (because the log-RR has a more symmetric sampling distribution) and then back-transformed for reporting.
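The log-transform-and-back-transform workflow looks like this in practice. The 2×2 counts are invented, and the standard error of ln(RR) is the usual large-sample approximation from the cell counts:

```python
import math

def risk_ratio_with_ci(events_i, n_i, events_c, n_c, z=1.96):
    """RR with a 95% CI computed on the log scale and back-transformed."""
    rr = (events_i / n_i) / (events_c / n_c)
    # Large-sample standard error of ln(RR) from the 2x2 cell counts
    se = math.sqrt(1 / events_i - 1 / n_i + 1 / events_c - 1 / n_c)
    lo = math.exp(math.log(rr) - z * se)
    hi = math.exp(math.log(rr) + z * se)
    return rr, lo, hi

# Invented counts: 30/200 events with the intervention vs 40/200 with control
rr, lo, hi = risk_ratio_with_ci(30, 200, 40, 200)
```

Note that the back-transformed interval is asymmetric around the RR, which is expected: symmetry holds on the log scale, not the ratio scale.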
Odds Ratio
The odds ratio compares the odds (not the probability) of the event between groups:
OR = (Events_intervention / Non-events_intervention) / (Events_control / Non-events_control)
When to use the odds ratio: Case-control studies where you cannot directly estimate incidence, logistic regression models (which naturally output odds ratios), and situations involving rare events (below approximately 10% prevalence) where OR and RR converge to nearly identical values.
The critical caveat that every meta-analyst must understand: When baseline event rates exceed approximately 10%, odds ratios systematically overestimate the corresponding risk ratio, and the divergence grows as the event rate increases. An OR of 3.0 with a 30% baseline risk corresponds to an RR of approximately 1.9, so quoting the OR as though it were the RR overstates the relative risk by roughly 60%. Never describe or interpret an odds ratio as though it were a risk ratio when the outcome is common. This single error is responsible for more misinterpretation in published meta-analyses than almost any other statistical mistake.
| Baseline Event Rate | Odds Ratio | Corresponding Risk Ratio | Overestimation |
|---|---|---|---|
| 5% (rare) | 2.0 | 1.90 | Minimal (~5%) |
| 10% | 2.0 | 1.82 | ~10% |
| 20% | 2.0 | 1.67 | ~20% |
| 30% | 2.0 | 1.54 | ~30% |
| 50% (common) | 2.0 | 1.33 | ~50% |
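These conversions follow directly from the OR-to-RR formula introduced later in the conversions section. A short sketch, taking P₀ as the control-group (baseline) risk:

```python
def or_to_rr(odds_ratio, p0):
    """Convert an odds ratio to a risk ratio given the control-group event rate p0."""
    return odds_ratio / (1 - p0 + p0 * odds_ratio)

# How a fixed OR of 2.0 maps onto the RR as the baseline rate rises
for p0 in (0.05, 0.10, 0.20, 0.30, 0.50):
    print(f"baseline {p0:.0%}: OR 2.0 corresponds to RR {or_to_rr(2.0, p0):.2f}")
```

Running the loop shows the convergence at rare event rates and the widening gap as events become common.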
Risk Difference (Absolute Risk Reduction)
The risk difference is the absolute difference in event rates between groups:
RD = (Events_intervention / N_intervention) - (Events_control / N_control)
When to use the risk difference: When you need to communicate the absolute clinical impact of an intervention, calculate the Number Needed to Treat (NNT = 1 / |RD|), or help decision-makers understand how many events would be prevented per population treated. The risk difference contextualises relative effects within the baseline risk, which is essential for clinical decision-making.
Limitation for meta-analytic pooling: Risk differences tend to be more heterogeneous across studies than relative measures because they depend directly on the baseline event rate, which varies across populations and settings. For this reason, most meta-analyses use a relative measure (RR or OR) as the primary pooled estimate and present the risk difference or NNT as a supplementary measure for clinical interpretation.
Choosing Between RR, OR, and RD
For most meta-analyses of intervention effects, follow this decision framework:
- Primary measure: Use the risk ratio because it is intuitive, directly interpretable, and appropriate for cohort studies and RCTs.
- Case-control studies: Use the odds ratio because incidence cannot be estimated directly from case-control data.
- Supplementary measure: Report the risk difference and/or NNT alongside the primary relative measure to provide clinical context about absolute impact.
- Mixed study designs with rare events: Either RR or OR is acceptable since they converge when prevalence is below 10%.
- Mixed study designs with common events: Convert to a common metric using documented conversion formulas and present sensitivity analyses using unconverted values.
Pro Tip: Always check the direction of your effect sizes before pooling. A surprisingly common error in meta-analysis is accidentally combining effect sizes with reversed directions. If one study reports the mean difference as intervention minus control and another reports control minus intervention, your pooled estimate will be biased toward the null or even reversed. Create a consistent coding scheme during data extraction that defines the direction for all effect sizes, and verify the direction of every extracted value during data checking. This five-minute quality check guards against an error that has slipped into countless published meta-analyses.
Effect Sizes for Time-to-Event Outcomes
Hazard Ratio (HR)
The hazard ratio is the standard effect size for time-to-event (survival) data. It represents the ratio of hazard rates between two groups, where the hazard is the instantaneous rate of the event at any given time point, conditional on the participant having survived to that point.
When to use the hazard ratio: Studies reporting survival analysis or time-to-event outcomes where censoring is present (participants lost to follow-up, studies ending before all events occur). Common applications include overall survival, progression-free survival, time to relapse, time to treatment failure, and duration of response in oncology, cardiology, and infectious disease research.
Interpretation framework:
| HR Value | Meaning (for harmful events) |
|---|---|
| HR = 1.0 | No difference in event rates between groups |
| HR = 0.70 | 30% reduction in the hazard rate (favours intervention) |
| HR = 1.40 | 40% increase in the hazard rate (favours control) |
| HR = 0.50 | 50% reduction in the hazard rate (strong intervention benefit) |
The hazard ratio assumes proportional hazards, meaning that the relative difference in event rates between groups remains constant over time. When this assumption is violated (e.g., a treatment that provides early benefit but loses effectiveness over time), the HR can be misleading and alternative approaches such as restricted mean survival time (RMST) should be considered.
Extracting Hazard Ratios From Published Studies
Not all studies report hazard ratios directly, which makes extraction one of the most challenging aspects of time-to-event meta-analysis. The approach depends on what data the study provides:
| Available Data | Extraction Method | Precision |
|---|---|---|
| HR with 95% CI reported directly | Extract as-is, convert to ln(HR) and SE for pooling | Highest |
| HR with p-value only | Calculate SE from p-value using the normal distribution | High |
| Kaplan-Meier curves with numbers at risk | Use Parmar/Tierney methods to estimate HR from curve data | Moderate |
| Median survival times for both groups | Approximate HR from ratio of median times (assumes exponential distribution) | Lower |
| Only event counts and p-values | Use approximation methods from Tierney et al. (2007) | Lower |
The Tierney et al. (2007) practical methods paper and Cochrane Handbook Chapter 6 provide step-by-step extraction guidance for each scenario. Always document which extraction method you used for each study in a supplementary table so that reviewers can evaluate the potential impact of different extraction approaches on your pooled estimate.
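The first two extraction scenarios in the table reduce to a few lines of arithmetic. The sketch below recovers ln(HR) and its standard error from a reported 95% CI, and from a two-sided p-value; the HR, CI, and p-value used are invented examples:

```python
import math
from statistics import NormalDist

def log_hr_from_ci(hr, ci_low, ci_high, z=1.96):
    """ln(HR) and its SE recovered from a reported HR with 95% CI."""
    return math.log(hr), (math.log(ci_high) - math.log(ci_low)) / (2 * z)

def log_hr_from_p(hr, p_two_sided):
    """ln(HR) and its SE recovered from the HR and a two-sided p-value."""
    z = NormalDist().inv_cdf(1 - p_two_sided / 2)
    return math.log(hr), abs(math.log(hr)) / z

# Invented inputs: HR 0.70 (95% CI 0.55 to 0.89), or HR 0.70 with p = 0.004
log_hr, se_ci = log_hr_from_ci(0.70, 0.55, 0.89)
_, se_p = log_hr_from_p(0.70, 0.004)
```

Both routes yield a ln(HR) and SE pair ready for inverse-variance pooling, consistent with the workflow in the table's first two rows.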
Pro Tip: Extract data at the most granular level available. When studies report both adjusted and unadjusted hazard ratios, extract both and document which you use in your primary analysis versus sensitivity analysis. When individual patient data or detailed Kaplan-Meier curves are available alongside summary statistics, use the most granular source to maximise precision. And always record the covariates included in adjusted analyses, as pooling hazard ratios adjusted for different covariate sets introduces heterogeneity that should be explored.
Correlation Coefficients as Effect Sizes
Pearson's r and Fisher's z Transformation
The Pearson correlation coefficient measures the linear association between two continuous variables, ranging from -1 (perfect negative association) through 0 (no association) to +1 (perfect positive association). It is the standard effect size in meta-analyses that examine the strength of association between variables rather than the effect of an intervention. This measure is common in psychology, education, organisational behaviour, and social science research.
The pooling problem with raw correlations: The sampling distribution of Pearson's r is skewed, particularly when the true correlation is far from zero, and its variance depends on the correlation value itself. This means you should not average raw correlation coefficients directly. Instead, apply Fisher's z transformation before pooling:
z = 0.5 × ln((1 + r) / (1 - r))
Pool the z-transformed values using standard inverse-variance weighting, then back-transform the pooled z to obtain the pooled correlation coefficient. Fisher's z has a nearly normal sampling distribution regardless of the true correlation, which makes standard meta-analytic methods appropriate.
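A minimal fixed-effect version of this transform-pool-back-transform workflow, using the standard n − 3 inverse-variance weight for Fisher's z; the correlations and sample sizes are invented:

```python
import math

def fisher_z(r):
    """Fisher's z transform of a correlation coefficient."""
    return 0.5 * math.log((1 + r) / (1 - r))

def pool_correlations(rs, ns):
    """Fixed-effect pooling of correlations on the Fisher-z scale.
    The inverse-variance weight for Fisher's z is n - 3."""
    weights = [n - 3 for n in ns]
    z_pooled = sum(w * fisher_z(r) for w, r in zip(weights, rs)) / sum(weights)
    return math.tanh(z_pooled)  # back-transform (tanh is the inverse of Fisher's z)

# Invented correlations and sample sizes from three hypothetical studies
r_pooled = pool_correlations([0.30, 0.40, 0.25], [53, 103, 43])
```

The pooled r sits closest to the estimate from the largest study, as the n − 3 weighting dictates.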
Benchmarks for interpreting correlations (Cohen, 1988):
| Correlation (r) | Label | Research Example |
|---|---|---|
| 0.10 | Small | Association between a single personality trait and job performance |
| 0.30 | Medium | Association between socioeconomic status and academic achievement |
| 0.50 | Large | Association between height and weight in adults |
As with all effect size benchmarks, these are starting points for interpretation, not absolute thresholds. A "small" correlation observed consistently across dozens of high-quality studies may be more scientifically meaningful than a "large" correlation from a single underpowered study with measurement problems.
Converting Between Effect Size Metrics
Sometimes your included studies report different types of effect sizes for comparable outcomes, requiring conversion to a common metric before pooling. Established conversion formulas exist for the most common scenarios, but every conversion introduces assumptions that must be documented and tested.
Common Conversion Formulas
SMD to Odds Ratio (Hasselblad-Hedges formula):
ln(OR) = d × (π / √3) ≈ d × 1.814
Odds Ratio to SMD:
d = ln(OR) × (√3 / π) ≈ ln(OR) × 0.551
Odds Ratio to Risk Ratio (requires baseline event rate P₀):
RR = OR / (1 - P₀ + P₀ × OR)
Correlation to SMD:
d = 2r / √(1 - r²)
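The formulas above translate directly into code. A small sketch, with the distributional assumptions flagged in the docstrings:

```python
import math

# Hasselblad-Hedges scaling constant: pi / sqrt(3) ≈ 1.814
SCALE = math.pi / math.sqrt(3)

def smd_to_log_or(d):
    """SMD to log odds ratio; assumes logistic distributions in both groups."""
    return d * SCALE

def log_or_to_smd(log_or):
    """Log odds ratio back to an SMD, under the same assumption."""
    return log_or / SCALE

def or_to_rr(odds_ratio, p0):
    """OR to RR; requires the control-group (baseline) event rate p0."""
    return odds_ratio / (1 - p0 + p0 * odds_ratio)

def r_to_smd(r):
    """Correlation to SMD; assumes equal group sizes."""
    return 2 * r / math.sqrt(1 - r**2)
```

The SMD-OR pair is an exact inverse round trip, which is a useful sanity check when implementing conversions in a data-extraction pipeline.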
Important Caveats for Conversions
Every conversion formula relies on distributional assumptions that may not hold perfectly in practice. The SMD-to-OR conversion assumes logistic distributions in both groups. The OR-to-RR conversion requires accurate knowledge of the baseline event rate. The r-to-d conversion assumes equal group sizes. These assumptions introduce additional uncertainty beyond the sampling error of the original estimates.
Pro Tip: Document every conversion in a supplementary table. For every study where you converted between effect size metrics, create a row in a supplementary table showing the original reported value, the conversion formula applied, the converted value, and any assumptions required (such as the baseline event rate used for OR-to-RR conversion). This makes your analysis fully reproducible and allows reviewers to verify your calculations. Additionally, conduct a sensitivity analysis using only unconverted effect sizes to assess whether the conversions meaningfully influence your pooled estimate.
Understanding and Investigating Heterogeneity
Once you have pooled your effect sizes, you must assess heterogeneity, the extent to which the true underlying effect varies across studies beyond what would be expected from random sampling error alone. A meta-analysis that reports a pooled effect without assessing heterogeneity is fundamentally incomplete and potentially misleading.
Key Heterogeneity Statistics
| Statistic | What It Measures | Interpretation |
|---|---|---|
| Cochran's Q | Whether observed variability exceeds expected sampling error | A formal hypothesis test; significant p-value suggests true heterogeneity exists |
| I² | The percentage of total variability attributable to true heterogeneity | 0–40% might not be important; 30–60% moderate; 50–90% substantial; 75–100% considerable |
| τ² (tau-squared) | The absolute between-study variance in the true effect | Expressed in squared effect-size units; useful for comparing heterogeneity across different meta-analyses |
| Prediction interval | The range within which a future study's true effect is likely to fall | The most clinically meaningful heterogeneity measure; often much wider than the confidence interval |
Note the deliberately overlapping I² ranges in the Cochrane guidance. This reflects the fact that I² must be interpreted alongside the Q test p-value, the tau-squared estimate, and the clinical context. A meta-analysis with I² of 60% where all individual studies show benefit in the same direction is very different from one with I² of 60% where studies show conflicting directions of effect.
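For concreteness, here is how Q, I², and the DerSimonian-Laird τ² fall out of the inverse-variance weights, together with an approximate prediction interval. Higgins and colleagues recommend a t quantile with k − 2 degrees of freedom for the prediction interval; a normal quantile is used below to stay dependency-free, and the effects and standard errors are invented:

```python
import math

def heterogeneity(effects, ses):
    """Cochran's Q, I² (%), and DerSimonian-Laird tau² from study
    effect estimates and their standard errors."""
    w = [1 / se**2 for se in ses]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed)**2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    return q, i2, tau2

def random_effects_summary(effects, ses, z=1.96):
    """Random-effects pooled estimate with an approximate 95% prediction
    interval (normal quantile used; a t with k-2 df is preferable)."""
    q, i2, tau2 = heterogeneity(effects, ses)
    w_re = [1 / (se**2 + tau2) for se in ses]
    mu = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_mu = math.sqrt(1 / sum(w_re))
    half = z * math.sqrt(tau2 + se_mu**2)
    return mu, (mu - half, mu + half), q, i2, tau2

# Three invented studies with identical precision but spread-out effects
mu, pi95, q, i2, tau2 = random_effects_summary([0.2, 0.5, 0.8], [0.1, 0.1, 0.1])
```

With these invented inputs the prediction interval crosses zero even though the pooled estimate does not, illustrating exactly why the prediction interval is the more honest summary under heterogeneity.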
Pro Tip: Report prediction intervals alongside confidence intervals in every forest plot. The confidence interval for a pooled effect tells you the precision of the average estimate across the included studies. The prediction interval tells you the range within which a future study's true effect is likely to fall given the observed heterogeneity. In the presence of substantial heterogeneity, prediction intervals are often dramatically wider than confidence intervals and provide a far more honest picture of what clinicians should expect when applying the evidence to new settings and populations. Both intervals appear in the same forest plot row and cost nothing additional to compute.
Investigating the Sources of Heterogeneity
When heterogeneity is present, your job is not simply to report its magnitude but to investigate its sources. Pre-specified investigations should include:
Subgroup analyses compare effect sizes across predefined categories. Common subgroup variables include study quality (low vs. high risk of bias), population characteristics (age, severity, geographic region), intervention features (dose, duration, delivery mode), and study design (RCT vs. observational). Limit the number of subgroup analyses to avoid false-positive findings. A commonly cited guideline is no more than one subgroup variable per ten included studies.
Meta-regression models the association between continuous study-level covariates (e.g., mean age, intervention duration, publication year) and the effect size. Meta-regression requires at least ten studies per covariate examined to have adequate statistical power, and its results should be interpreted as exploratory and hypothesis-generating rather than confirmatory.
Sensitivity analyses test the robustness of your pooled estimate by systematically varying analytical decisions. Common sensitivity analyses include removing outlier studies (identified by visual inspection of the forest plot or formal statistical tests), restricting to studies at low risk of bias, using alternative effect size calculations, switching between random-effects and fixed-effect models, and excluding studies where effect sizes were converted from a different metric.
Reporting Effect Sizes: A Comprehensive Checklist
A well-reported meta-analysis presents effect sizes with sufficient context and detail for readers to independently evaluate the evidence. PRISMA 2020 mandates all of these reporting elements, and incomplete effect-size reporting is one of the most common reasons for revision requests at high-impact journals.
Essential Reporting Elements
| Element | What to Report | Why It Matters |
|---|---|---|
| Effect measure | Name and justification (e.g., "We used Hedges' g because studies used different depression instruments") | Allows readers to assess appropriateness |
| Pooled estimate with 95% CI | The primary result (e.g., SMD = 0.45, 95% CI 0.28 to 0.62) | Quantifies the best estimate and its precision |
| Prediction interval | The expected range for a new study (e.g., PI -0.12 to 1.02) | Contextualises heterogeneity clinically |
| Forest plot | Individual study effects with weights and pooled diamond | Visual display of the evidence structure |
| Heterogeneity statistics | Q, I², τ², and prediction interval | Assesses consistency across studies |
| Subgroup and sensitivity results | Pre-specified and post-hoc analyses with rationale | Tests robustness and explores variability |
| Publication bias assessment | Funnel plot and statistical test if ≥10 studies | Evaluates risk of missing studies |
| GRADE certainty rating | Certainty for each primary outcome with justification | Contextualises the strength of evidence |
Choosing the Right Effect Size: A Decision Framework
When selecting your effect size at the protocol stage, work through this decision tree to ensure you choose the most appropriate and informative measure for your specific review question.
Step 1: Identify Your Outcome Type
| Outcome Type | Go To |
|---|---|
| Continuous (measured on a scale) | Step 2 |
| Dichotomous (event / no event) | Step 3 |
| Time-to-event (survival data with censoring) | Use Hazard Ratio |
| Correlational (association between variables) | Use Fisher z-transformed Pearson r |
Step 2: Continuous Outcome: Same Scale or Different Scales?
| Scenario | Effect Size | Rationale |
|---|---|---|
| All studies use the same instrument and scale | Mean Difference (MD) | Preserves direct clinical interpretability |
| Studies use different instruments for the same construct | Standardised Mean Difference (Hedges' g) | Enables pooling across different scales |
| Most studies use the same scale, a few use different instruments | Convert outliers to the common scale if possible; otherwise use Hedges' g | Balances interpretability with inclusiveness |
Step 3: Dichotomous Outcome: Which Study Designs?
| Scenario | Primary Measure | Supplementary Measure |
|---|---|---|
| RCTs and cohort studies | Risk Ratio (RR) | Risk Difference and NNT for clinical context |
| Case-control studies | Odds Ratio (OR) | Convert to RR with baseline risk for interpretation |
| Mixed designs, rare events (<10%) | Either RR or OR (they converge) | Risk Difference |
| Mixed designs, common events (>10%) | Convert to a common metric with documented assumptions | Sensitivity analysis with unconverted values |
Common Mistakes That Undermine Meta-Analyses
Even experienced systematic reviewers make errors in effect-size selection, calculation, and interpretation. Being aware of these common pitfalls allows you to avoid them in your own work and identify them when peer-reviewing others' meta-analyses.
- Pooling different constructs under the same label. An effect size for pain intensity and one for pain frequency cannot be combined just because both are called "pain outcomes." Each pooled analysis must include only studies measuring fundamentally the same construct with conceptually equivalent endpoints.
- Ignoring the direction of effect sizes. Ensure that a positive SMD (or RR greater than 1.0) consistently means the same thing across all included studies before pooling. A single reversed-direction effect size can substantially bias the pooled estimate toward the null.
- Using Cohen's d with small samples. Switch to Hedges' g whenever any included study has fewer than 20 participants per group. The correction factor is trivial to compute and eliminates systematic upward bias.
- Treating odds ratios as risk ratios. When event rates exceed 10%, the OR overestimates the RR by an increasingly large margin. Always state clearly which measure you report and convert to the appropriate metric for interpretation.
- Averaging raw correlations without Fisher's z transformation. Raw correlations have skewed sampling distributions. Always transform to Fisher's z before pooling and back-transform the pooled estimate for reporting.
- Reporting pooled estimates without heterogeneity assessment. A pooled effect size without I², tau-squared, and ideally a prediction interval is incomplete. A tight confidence interval can mask enormous between-study variability that a prediction interval would reveal.
- Relying solely on statistical significance. An effect size can be statistically significant (p < 0.05) but clinically trivial, or non-significant but clinically meaningful given the sample size. Always report and interpret the magnitude of the effect, the width of the confidence interval, and the certainty of evidence, not just the p-value.
By understanding these fundamentals, from selecting the right metric through calculating, pooling, and interpreting effect sizes with appropriate nuance, you can produce a meta-analysis that is both statistically rigorous and clinically meaningful. The time invested in getting your effect-size decisions right at the protocol stage pays dividends throughout every subsequent phase of your systematic review.