Effect size calculation for meta-analysis is the process of computing a standardized, quantitative measure of the magnitude of a treatment effect, exposure association, or group difference from individual study data so that results can be pooled across studies. Meta-analysis requires effect sizes as its fundamental input; without them, there is nothing to synthesize. In our meta-analyses, effect size extraction is where most data errors originate, and standardizing the extraction process with a structured spreadsheet eliminates roughly 80% of those errors.
An effect size is a standardized, quantitative measure of the magnitude of a treatment effect, exposure association, or group difference. In meta-analysis, effect sizes from individual studies are pooled to produce a combined estimate. The three main families are: standardized mean differences (Cohen's d, Hedges' g), ratio measures (odds ratio, risk ratio, hazard ratio), and correlation coefficients (Pearson's r).
This guide covers every step of effect size calculation, from selecting the right measure for your outcome type, through formulas for computing each effect size family, to converting between measures when studies report results on different scales. Whether you are working with continuous outcomes, binary events, or survival data, you will find the formulas, tables, and practical guidance you need. For a broader overview of the full synthesis process, see our complete meta-analysis guide.
What Is an Effect Size in Meta-Analysis?
An effect size quantifies how large a treatment effect, group difference, or association is, independent of sample size. Unlike a p-value, which tells you whether a result is statistically significant, an effect size tells you whether it matters practically. Statistical significance can be achieved with a trivially small effect in a large enough sample, while a clinically meaningful effect may fail to reach significance in a small study.
Meta-analysis requires effect size calculation because pooling results across studies demands a common metric. Individual studies may report means and standard deviations, 2x2 tables, hazard ratios, or correlation coefficients. Before these can be combined, they must be translated into comparable effect sizes with accompanying variance or standard error estimates for weighting.
The distinction matters: a p-value of 0.03 from a study of 5,000 participants may reflect a negligible effect, while a p-value of 0.08 from a study of 40 participants may reflect a large one. Effect sizes strip away sample-size dependence and focus on what researchers actually care about: the magnitude and direction of the effect. Every forest plot in a meta-analysis visualizes these pooled effect sizes and their confidence intervals, making effect size calculation the foundation of evidence synthesis.
Types of Effect Size Measures
The correct effect size type depends on your outcome data and study design. There are three main families, each suited to different research questions. Choosing the wrong measure introduces bias or makes your results uninterpretable. The table below maps outcome types to the appropriate effect size measure.
| Outcome Type | Study Design | Recommended Effect Size | Example |
|---|---|---|---|
| Continuous (same scale) | RCT or cohort | Mean Difference (MD) | Blood pressure in mmHg |
| Continuous (different scales) | RCT or cohort | Standardized Mean Difference (Hedges' g) | Depression measured by BDI vs. PHQ-9 |
| Binary (events) | RCT | Risk Ratio (RR) or Risk Difference (RD) | Infection yes/no |
| Binary (case-control) | Case-control | Odds Ratio (OR) | Disease exposure status |
| Time-to-event | Survival analysis | Hazard Ratio (HR) | Time to relapse |
| Association | Cross-sectional | Pearson's r | Correlation between variables |
Standardized Mean Differences: Cohen's d and Hedges' g
Cohen's d is the most widely known standardized mean difference. It expresses the difference between two group means in units of the pooled within-group standard deviation: d = (M1 − M2) / SD_pooled. When studies measure the same construct on different scales (for example, depression severity using the Beck Depression Inventory in one study and the Patient Health Questionnaire in another), the standardized mean difference allows direct comparison.
Hedges' g corrects the upward bias inherent in Cohen's d when sample sizes are small by applying the correction factor J = 1 − 3/(4df − 1), which is particularly important when studies have fewer than 20 participants per group (Hedges, 1981). For meta-analysis, Hedges' g is the preferred standardized mean difference because most systematic reviews include at least some small studies.
Ratio Measures: Odds Ratio, Risk Ratio, Risk Difference
Odds ratios compare the odds of an event in one group to the odds in another. Computed from a 2x2 contingency table, OR = (a/b) / (c/d), where a, b, c, and d are the cell counts. Odds ratios are the standard effect size for case-control studies and are widely used in clinical trials.
Risk ratios (relative risk) compare the probability of an event: RR = (a/(a+b)) / (c/(c+d)). Risk ratios are more intuitive than odds ratios for prospective studies. When event rates are low (under 10%), OR and RR approximate each other closely. When event rates are high, they diverge substantially, and the choice between them matters.
Risk difference measures the absolute difference in event probability between groups: RD = (a/(a+b)) − (c/(c+d)). Risk difference conveys clinical impact directly: a risk difference of 0.05 means 5 additional events per 100 people treated.
Correlation-Based Measures: Pearson's r
Pearson's r measures the linear association between two continuous variables, ranging from −1 to +1. It is the natural effect size for studies examining relationships, for example, the correlation between physical activity and cardiovascular risk. For meta-analysis, r is typically transformed using Fisher's z-transformation before pooling and back-transformed for interpretation.
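The Fisher z workflow described above is straightforward to script. Here is a minimal sketch in Python (function names are illustrative, not taken from any particular meta-analysis package):

```python
import math

def fisher_z(r):
    """Fisher's z-transformation: z = 0.5 * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))

def inverse_fisher_z(z):
    """Back-transform a pooled z to r; equivalent to tanh(z)."""
    return math.tanh(z)

def fisher_z_variance(n):
    """Sampling variance of z depends only on sample size: 1 / (n - 3)."""
    return 1 / (n - 3)
```

Because the variance of z depends only on n, inverse-variance weighting is simple: a study with n = 28 enters the pooling step with variance 1/25 = 0.04.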
Hazard Ratios (Time-to-Event Data)
Hazard ratios are the standard effect size for survival analysis and time-to-event outcomes. An HR of 0.75 means the treatment group has a 25% lower instantaneous rate of the event at any given time. When primary studies report Kaplan-Meier curves without explicit HRs, you can digitize survival curves to extract the data needed for estimation.
How to Calculate Effect Size for Meta-Analysis: Cohen's d and Hedges' g
Cohen's d calculation starts with the difference between group means divided by the pooled standard deviation. The formula requires three inputs from each study: mean, standard deviation, and sample size for both groups.
Cohen's d formula: d = (M1 − M2) / SD_pooled, where SD_pooled = sqrt[((n1−1)SD1² + (n2−1)SD2²) / (n1 + n2 − 2)].
The variance of d is: V_d = (n1+n2)/(n1×n2) + d²/(2(n1+n2)).
Hedges' g correction: g = d × J, where J = 1 − 3/(4(n1+n2−2) − 1). This small sample bias correction adjusts for the tendency of Cohen's d to overestimate effects in small samples. The correction is negligible when total sample size exceeds 40, but critical for smaller studies.
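The three formulas above (d, its variance, and the Hedges correction) can be combined into a small helper. This is an illustrative Python sketch, with hypothetical function names:

```python
import math

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d: mean difference divided by the pooled SD."""
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2)
                          / (n1 + n2 - 2))
    return (m1 - m2) / sd_pooled

def variance_d(d, n1, n2):
    """V_d = (n1 + n2) / (n1 * n2) + d^2 / (2 * (n1 + n2))."""
    return (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))

def hedges_g(d, n1, n2):
    """Small-sample correction: g = d * J, with J = 1 - 3/(4*df - 1)."""
    df = n1 + n2 - 2
    return d * (1 - 3 / (4 * df - 1))
```

For two groups of 20 with means 10 and 8 and a common SD of 2, d = 1.0 and g ≈ 0.98, illustrating how modest the correction is once total n exceeds 40.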
Effect size interpretation follows the conventions established by Cohen (1988): 0.2 = small, 0.5 = medium, 0.8 = large. These benchmarks are widely cited but should be interpreted in context: a "small" effect may be clinically significant if the outcome is mortality, while a "large" effect may be trivial if the outcome measure is unreliable.
Use our free effect size calculator to compute Cohen's d and Hedges' g from means, SDs, and sample sizes without manual formula entry.
How to Calculate Odds Ratios and Risk Ratios
Odds ratio calculation and risk ratio calculation both begin with a 2x2 contingency table that cross-tabulates the intervention group and the outcome.
| | Event | No Event |
|---|---|---|
| Treatment | a | b |
| Control | c | d |
Odds ratio: OR = (a × d) / (b × c).
Risk ratio: RR = (a / (a + b)) / (c / (c + d)).
For meta-analysis, both OR and RR must be log-transformed before pooling. The log-transformed effect size has an approximately normal sampling distribution, which is required for standard meta-analysis weighting methods. The standard error of the log-OR is: SE(ln OR) = sqrt(1/a + 1/b + 1/c + 1/d). The standard error of the log-RR is: SE(ln RR) = sqrt(1/a − 1/(a+b) + 1/c − 1/(c+d)).
After pooling the log-transformed values, results are back-transformed by exponentiating to obtain the pooled effect estimate on the natural scale.
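As a sketch of how these steps fit together (illustrative Python, not tied to any meta-analysis package), the following computes the log effect size and its standard error directly from the 2x2 cell counts:

```python
import math

def log_odds_ratio(a, b, c, d):
    """Return (ln OR, SE) from 2x2 cell counts a, b, c, d."""
    ln_or = math.log((a * d) / (b * c))
    se = math.sqrt(1/a + 1/b + 1/c + 1/d)
    return ln_or, se

def log_risk_ratio(a, b, c, d):
    """Return (ln RR, SE) from 2x2 cell counts a, b, c, d."""
    ln_rr = math.log((a / (a + b)) / (c / (c + d)))
    se = math.sqrt(1/a - 1/(a + b) + 1/c - 1/(c + d))
    return ln_rr, se
```

For a table with a = 10, b = 90, c = 20, d = 80, exponentiating the pooled ln RR recovers RR = 0.10/0.20 = 0.5 on the natural scale.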
When OR and RR diverge: If the baseline event rate exceeds 10%, odds ratios overestimate the risk ratio. In such cases, report both or convert OR to RR using the formula: RR = OR / (1 − p0 + p0 × OR), where p0 is the baseline event rate. This distinction matters for clinical interpretation: an OR of 2.0 when baseline risk is 50% corresponds to an RR of only 1.33, which tells a very different clinical story.
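The OR-to-RR conversion is a one-liner; with OR = 2.0 and baseline risk 0.5 it returns 4/3 ≈ 1.33, matching the worked example above (the function name is illustrative):

```python
def or_to_rr(or_value, p0):
    """Convert an odds ratio to a risk ratio given baseline event rate p0:
    RR = OR / (1 - p0 + p0 * OR)."""
    return or_value / (1 - p0 + p0 * or_value)
```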
Converting Between Effect Size Measures
Effect size conversion is necessary when studies in your meta-analysis report results using different metrics and you need to pool them on a single scale. Established formulas enable conversion between major effect size types, allowing you to combine evidence that would otherwise be incomparable.
OR to SMD (Hasselblad-Hedges formula): d = ln(OR) × sqrt(3) / pi. This Hasselblad-Hedges formula assumes an underlying logistic distribution and works well for most meta-analysis data preparation applications (Hasselblad & Hedges, 1995).
SMD to OR: OR = exp(d × pi / sqrt(3)). This is the inverse of the Hasselblad-Hedges conversion.
d to r: r = d / sqrt(d² + 4). This conversion assumes equal group sizes; for unequal groups, use: r = d / sqrt(d² + (n1+n2)²/(n1×n2)).
r to d: d = 2r / sqrt(1 − r²). Again, this assumes equal group sizes.
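The four conversions above can be collected into small helpers. A minimal Python sketch with illustrative names:

```python
import math

def or_to_d(or_value):
    """Hasselblad-Hedges: d = ln(OR) * sqrt(3) / pi."""
    return math.log(or_value) * math.sqrt(3) / math.pi

def d_to_or(d):
    """Inverse conversion: OR = exp(d * pi / sqrt(3))."""
    return math.exp(d * math.pi / math.sqrt(3))

def d_to_r(d, n1=None, n2=None):
    """d to r; assumes equal group sizes unless n1 and n2 are supplied."""
    a = 4 if n1 is None else (n1 + n2) ** 2 / (n1 * n2)
    return d / math.sqrt(d**2 + a)

def r_to_d(r):
    """r to d, assuming equal group sizes: d = 2r / sqrt(1 - r^2)."""
    return 2 * r / math.sqrt(1 - r**2)
```

Note that or_to_d/d_to_or and d_to_r/r_to_d are exact inverses (under equal group sizes), which makes round-trip checks a useful sanity test during data preparation.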
See Borenstein et al. (2009) for derivations of these conversions and the corresponding variance formulas, which must be carried through into the pooling step alongside the converted point estimates.
Extracting Effect Sizes from Test Statistics
When primary studies report only test statistics rather than raw data, you can still extract effect sizes for your meta-analysis:
From t-values: d = t × sqrt(1/n1 + 1/n2). This is exact for independent-samples t-tests.
From F-values (one-way, two groups): d = sqrt(F) × sqrt(1/n1 + 1/n2). For F with df = 1 in the numerator, F = t².
From chi-square (df = 1): OR = exp(sqrt(chi²) × sqrt(1/a + 1/b + 1/c + 1/d)), or convert chi-square to a phi coefficient: r = sqrt(chi²/N).
From p-values: Convert the p-value to a z-score or t-value using the inverse normal or inverse t distribution, then apply the t-to-d formula. This is approximate but often the only option when effect size extraction from older studies is needed.
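These extraction formulas are easy to script using only the Python standard library (function names are illustrative; the p-value route uses the inverse normal distribution, so it is approximate, as noted above):

```python
import math
from statistics import NormalDist

def d_from_t(t, n1, n2):
    """Exact for independent-samples t-tests: d = t * sqrt(1/n1 + 1/n2)."""
    return t * math.sqrt(1/n1 + 1/n2)

def d_from_f(f, n1, n2):
    """One-way ANOVA with two groups: F = t^2, so t = sqrt(F)."""
    return math.sqrt(f) * math.sqrt(1/n1 + 1/n2)

def r_from_chi2(chi2, n):
    """Phi coefficient from a df = 1 chi-square: r = sqrt(chi2 / N)."""
    return math.sqrt(chi2 / n)

def d_from_p(p_two_sided, n1, n2):
    """Approximate: convert a two-sided p to |z|, then apply the t-to-d formula."""
    z = NormalDist().inv_cdf(1 - p_two_sided / 2)
    return d_from_t(z, n1, n2)
```

For example, t = 2.0 with 50 participants per group gives d = 0.4, and F = 4.0 in the same design gives the identical value, since F = t² when the numerator df is 1.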
Try our correlation to effect size converter to automate r-to-d and d-to-r conversions. You can also convert between SE and SD when studies report standard errors instead of standard deviations.
Dealing with Missing or Incomplete Data
Missing data is one of the most common obstacles in effect size extraction for meta-analysis. Studies frequently report results in formats that do not directly provide the statistics needed for effect size calculation. Several validated methods exist for handling these situations before resorting to excluding studies.
Estimating means from medians: When studies report medians, interquartile ranges (IQR), and ranges instead of means and standard deviations, the Wan et al. (2014) and Luo et al. (2018) methods provide validated approaches for estimation. These methods use the median, minimum, maximum, first and third quartiles, and sample size to estimate the mean and SD. Use our median-to-mean estimator to apply these methods automatically.
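As a rough normal-theory illustration only (not the full Wan et al. or Luo et al. corrections, which include sample-size-dependent terms), the quartile-based estimates can be sketched as follows; for real extractions, use the validated methods or the estimator tool mentioned above:

```python
def mean_from_quartiles(q1, median, q3):
    """Simple approximation: mean is roughly (q1 + median + q3) / 3."""
    return (q1 + median + q3) / 3

def sd_from_iqr(q1, q3):
    """For approximately normal data, IQR is roughly 1.35 * SD."""
    return (q3 - q1) / 1.35
```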
Extracting SDs from confidence intervals: If a study reports a mean and 95% confidence interval but no standard deviation, calculate: SD = sqrt(n) × (upper − lower) / (2 × 1.96). For small samples, replace 1.96 with the appropriate t-value for n − 1 degrees of freedom.
Extracting SDs from standard errors: The relationship is straightforward: SD = standard error × sqrt(n). Many studies report SEs in text or figures while omitting SDs.
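Both recovery formulas are trivial to implement. A Python sketch (the z argument defaults to 1.96 for a 95% CI; substitute a t critical value for small samples, as noted above):

```python
import math

def sd_from_ci(n, lower, upper, z=1.96):
    """Recover SD from a CI for a mean: SD = sqrt(n) * width / (2 * z)."""
    return math.sqrt(n) * (upper - lower) / (2 * z)

def sd_from_se(se, n):
    """Recover SD from a standard error: SD = SE * sqrt(n)."""
    return se * math.sqrt(n)
```

As a consistency check, a study with n = 100 and SE = 0.5 implies SD = 5.0, and the 95% CI that same SD produces recovers SD = 5.0 when fed back through sd_from_ci.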
Imputing correlations from test statistics: When you need a pre-post correlation for change-score analyses but the study does not report it, you can use the relationship between reported statistics (e.g., the SD of the change score, the SDs at each time point) to back-calculate the correlation.
Contacting study authors: When mathematical extraction fails, contacting corresponding authors is a legitimate and common practice. Document all author contact attempts and responses in your systematic review protocol. Response rates vary but are typically 30–50% within a reasonable timeframe. Always mention the specific data points you need and why.
Common Effect Size Calculation Errors
Avoiding these common errors will improve the accuracy and credibility of your meta-analysis. Each mistake introduces systematic bias that can distort your pooled effect estimate and mislead clinical or policy decisions.
Mixing Cohen's d and Hedges' g across studies. Pick one measure and use it consistently throughout your meta-analysis. Mixing them introduces inconsistency because the Hedges' g correction adjusts for small-sample bias while Cohen's d does not. The practical difference is small for large studies but meaningful for studies with fewer than 20 participants per group.
Forgetting to log-transform OR/RR before pooling. Odds ratios and risk ratios have asymmetric distributions on the natural scale. Pooling untransformed values produces biased estimates and incorrect confidence intervals. Always compute ln(OR) or ln(RR) and their standard errors before entering data into your meta-analysis software. The log-transformed effect size is what gets pooled; back-transform only for final reporting.
Using change scores and final values interchangeably. Combining studies that report change-from-baseline with studies that report post-intervention values introduces heterogeneity unless you account for baseline differences. Effect sizes from change scores and final values are not equivalent and should be analyzed separately or adjusted using appropriate formulas.
Not adjusting for clustering in cluster-RCTs. Cluster-randomized trials require a design effect adjustment: effective n = actual n / (1 + (m − 1) × ICC), where m is the average cluster size and ICC is the intraclass correlation coefficient. Failing to adjust inflates the effective sample size and produces falsely narrow confidence intervals.
Pooling effect sizes on different scales. Effect sizes must be on the same metric to be pooled. Combining odds ratios with standardized mean differences, or mixing adjusted and unadjusted estimates, produces meaningless results. Use the conversion formulas described above or analyze subgroups separately.
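The design-effect adjustment from the cluster-RCT point above is a one-line calculation; for example, 200 participants in clusters averaging 21 members with an ICC of 0.05 yield an effective n of only 100 (function name is illustrative):

```python
def effective_n(n, m, icc):
    """Effective sample size under clustering: n / (1 + (m - 1) * ICC)."""
    return n / (1 + (m - 1) * icc)
```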
For a visual guide to interpreting your pooled results after calculating effect sizes, see our guide on how to read a forest plot. For an overview of professional evidence synthesis support, visit our evidence synthesis services overview.