Assess heterogeneity in your meta-analysis with I², Cochran's Q, τ² (DerSimonian-Laird or REML estimator), and prediction intervals. Import studies directly from the Effect Size Calculator or Forest Plot via the pipeline, or drag-drop a CSV/Excel file. Calculate the minimum number of studies needed to detect a given effect size.
Heterogeneity tab: Import studies from the Effect Size Calculator or Forest Plot via the pipeline banner, drag-drop a CSV/Excel file, or enter effect sizes and standard errors manually. Switch the τ² estimator between DerSimonian-Laird (DL) and REML in the results panel. REML uses Fisher scoring iteration and gives less biased estimates when the number of studies is small. All inputs are auto-saved locally. Export results as CSV or Excel, or copy them to the clipboard.
Power tab: Enter your expected effect size, significance level, and desired power to estimate the minimum number of studies needed.
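One common way to approximate this kind of calculation is sketched below. It is an illustration under stated assumptions, not necessarily the formula this tool uses: every study is assumed to have the same within-study variance (`within_var`), the between-study variance (`tau2`) is treated as known, and the pooled random-effects estimate is tested with a two-sided z-test. The function name and its parameters are hypothetical.

```python
from statistics import NormalDist


def min_studies(effect, within_var, tau2=0.0, alpha=0.05, power=0.80, max_k=10_000):
    """Smallest number of studies k whose pooled z-test reaches the target power.

    Assumes (for illustration only) equal within-study variances, so the
    variance of the pooled random-effects estimate is (within_var + tau2) / k.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    for k in range(2, max_k + 1):
        se = ((within_var + tau2) / k) ** 0.5
        shift = abs(effect) / se
        # Two-sided power: probability of landing in either rejection region.
        achieved = (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)
        if achieved >= power:
            return k
    return None  # target power not reached within max_k studies


# Hypothetical inputs: expected effect 0.3, per-study variance 0.04, tau2 0.02.
print(min_studies(effect=0.3, within_var=0.04, tau2=0.02))
```

Raising `tau2` while holding everything else fixed increases the required number of studies, because between-study variance does not shrink as individual studies get larger.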
Move data between tools automatically. Compute effect sizes, then send results to Forest Plot, Funnel Plot, or Heterogeneity analysis with one click.
Drag and drop a CSV, TSV, or Excel (.xlsx/.xls) file, up to 500 rows.
Load sample data to see how the tool works, or clear all fields to start fresh.
Low heterogeneity. Studies are reasonably consistent. A fixed-effect model may be appropriate.
Moderate heterogeneity. Consider exploring sources via subgroup or meta-regression analyses.
High heterogeneity. Results should be interpreted with caution. Investigate sources before pooling.
An I-squared calculator quantifies the proportion of total variability in a set of effect estimates that is attributable to genuine between-study differences rather than within-study sampling error. Introduced by Higgins & Thompson (2002) and codified in the Cochrane Handbook (Higgins et al., 2023), I-squared has become the most widely reported heterogeneity statistic in systematic reviews because it offers an intuitive percentage-based interpretation. Cochrane RevMan, the standard software for Cochrane systematic reviews, reports I-squared alongside Cochran's Q and tau-squared as built-in heterogeneity statistics in every forest plot output. However, I-squared alone does not tell the full story. It is a relative measure that describes the ratio of between-study variance to total variance, and its value depends heavily on the precision of the included studies. Two meta-analyses with identical between-study variance can produce very different I-squared values if one contains large, precise trials and the other contains small, imprecise ones.
This is why a comprehensive heterogeneity test calculator must also report absolute measures. Tau-squared (the between-study variance on the effect-size scale) provides the information that I-squared cannot: how much the true effects actually vary across studies. DerSimonian & Laird (1986) proposed the most commonly used moment-based estimator of tau-squared, and it remains the default in many software packages. More modern estimators tend to produce less biased estimates, particularly when the number of studies is small. These include restricted maximum likelihood (REML), now recommended as the preferred alternative to DerSimonian-Laird, and the Paule-Mandel estimator, which has been shown to produce improved between-study variance estimates with better coverage properties. Regardless of the estimator, tau-squared feeds directly into the random-effects weights and into prediction intervals, which describe the range within which the true effect of a future similar study is expected to fall.
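To make the role of tau-squared concrete, the following minimal sketch shows how it enters both the random-effects weights and the prediction interval. It is an illustration rather than this tool's code. The function name is hypothetical, the confidence interval uses a normal critical value, and the t quantile for the prediction interval (k − 2 degrees of freedom) is passed in by the caller so that the example needs no external libraries.

```python
import math


def random_effects_summary(effects, std_errors, tau2, z_crit=1.96, t_crit=None):
    """Pool effects with inverse-variance random-effects weights.

    tau2 is taken as given (from DL, REML, or another estimator).
    t_crit is the t quantile with k - 2 degrees of freedom used for the
    prediction interval; it is supplied by the caller.
    """
    weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total
    se_pooled = math.sqrt(1.0 / total)
    ci = (pooled - z_crit * se_pooled, pooled + z_crit * se_pooled)
    pi = None
    if t_crit is not None:
        # Prediction interval adds tau2 to the uncertainty of the pooled mean.
        half_width = t_crit * math.sqrt(tau2 + se_pooled ** 2)
        pi = (pooled - half_width, pooled + half_width)
    return pooled, ci, pi


# Hypothetical data: 5 studies, so the t quantile has 3 degrees of freedom (3.182).
effects = [0.10, 0.30, 0.35, 0.65, 0.45]
std_errors = [0.10] * 5
pooled, ci, pi = random_effects_summary(effects, std_errors, tau2=0.03, t_crit=3.182)
print(pooled, ci, pi)
```

With a non-zero tau-squared the prediction interval is always wider than the confidence interval, because it describes where a new study's true effect may fall rather than how precisely the mean is known.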
The relationship between I-squared and tau-squared is governed by the equation I-squared = tau-squared / (tau-squared + typical within-study variance). This means that meta-analysis heterogeneity can appear low (small I-squared) even when the absolute between-study variance is clinically meaningful, a scenario that arises when the included studies are small and imprecise. Conversely, very large trials can inflate I-squared to alarming levels even when the absolute differences between study effects are trivially small. The Cochrane Handbook recommends interpreting I-squared alongside the confidence interval for I-squared itself, the Q-test p-value, and visual inspection of the forest plot. Prediction intervals, as advocated by IntHout et al. (2016), provide a valuable complement to I-squared by expressing the expected range of true effects in future similar studies, a clinically intuitive measure that can reveal meaningful heterogeneity even when I-squared appears moderate. Our effect size calculator helps ensure that the effect estimates entering your heterogeneity assessment are computed consistently, while the forest plot generator provides the visual context needed to judge whether observed variability is clinically important.
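The dependence of I-squared on study precision can be checked numerically. The sketch below applies the equation above, using the "typical" within-study variance defined by Higgins & Thompson (2002). The function names and the standard errors are hypothetical and chosen only to contrast imprecise with precise studies.

```python
def typical_within_variance(std_errors):
    """'Typical' within-study variance of Higgins & Thompson (2002)."""
    w = [1.0 / se ** 2 for se in std_errors]
    k = len(w)
    return (k - 1) * sum(w) / (sum(w) ** 2 - sum(x ** 2 for x in w))


def i_squared_from_tau2(tau2, std_errors):
    """I-squared (percent) implied by a given tau2 and set of standard errors."""
    s2 = typical_within_variance(std_errors)
    return 100.0 * tau2 / (tau2 + s2)


tau2 = 0.02                  # identical between-study variance in both scenarios
small_trials = [0.30] * 8    # imprecise studies
large_trials = [0.05] * 8    # precise studies

print(round(i_squared_from_tau2(tau2, small_trials), 1))  # 18.2
print(round(i_squared_from_tau2(tau2, large_trials), 1))  # 88.9
```

The same absolute heterogeneity appears as roughly 18% in one scenario and 89% in the other, which is why I-squared should not be read as a measure of how much the effects differ.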
Beyond descriptive statistics, understanding the sources of heterogeneity is often the most valuable part of a meta-analysis. When I-squared exceeds 40-50%, pre-specified subgroup analyses or meta-regression can be used to explore potential moderators, and PRISMA 2020 (Page et al., 2021) asks authors to report the methods used to explore causes of heterogeneity. Study-level characteristics such as risk of bias rating, intervention dose, follow-up duration, and participant demographics may explain why effects differ across studies. As a preliminary step, outlier detection using externally studentized residuals can identify individual studies whose effect sizes fall far from the pooled estimate, flagging potential data errors or genuinely distinct study populations that warrant separate investigation. Our meta-regression data formatter structures moderator data for direct import into R metafor or Stata, while the RoB 2 assessment tool provides the methodological quality ratings that serve as common moderator variables. Together, these tools create a workflow where heterogeneity is not merely reported but actively investigated.
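The outlier check mentioned above can be sketched as a leave-one-out procedure: each study is compared with the random-effects estimate obtained from all the other studies. This is a simplified illustration, not this tool's implementation. It assumes DerSimonian-Laird is used for each leave-one-out fit, it needs at least three studies, and the function names are hypothetical.

```python
import math


def _dl_fit(y, v):
    """DL tau2, random-effects pooled estimate, and the variance of that estimate."""
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (vi + tau2) for vi in v]
    sw_re = sum(w_re)
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sw_re
    return tau2, mu, 1.0 / sw_re


def studentized_deleted_residuals(effects, std_errors):
    """Externally studentized residual for each study (leave-one-out)."""
    y = list(effects)
    v = [se ** 2 for se in std_errors]
    residuals = []
    for i in range(len(y)):
        # Refit the model without study i, then standardize its deviation.
        tau2, mu, var_mu = _dl_fit(y[:i] + y[i + 1:], v[:i] + v[i + 1:])
        residuals.append((y[i] - mu) / math.sqrt(v[i] + tau2 + var_mu))
    return residuals


# Hypothetical data: the last study sits far from the others.
res = studentized_deleted_residuals([0.10, 0.12, 0.08, 0.11, 0.90], [0.10] * 5)
flagged = [i for i, r in enumerate(res) if abs(r) > 1.96]
print(flagged)
```

Studies with an absolute residual above about 1.96 are candidates for closer inspection; a flag is a prompt to check the data and study design, not a reason to exclude the study automatically.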
Finally, the tau-squared calculator output has direct implications for sample size planning in future research. When between-study variance is large, even a well-powered individual trial may not resolve the clinical question. What is needed instead is reduction in the sources of heterogeneity through standardized protocols and patient populations. Conversely, when tau-squared is near zero and the summary effect is imprecise, the field needs more or larger studies rather than better-designed ones. By combining I-squared, tau-squared, prediction intervals, and Cochran's Q in a single assessment, researchers gain a multidimensional view of heterogeneity that supports both the interpretation of current evidence and the planning of future research.
The DerSimonian-Laird (DL) estimator, introduced in 1986, is the most widely used method for estimating between-study variance (τ²) in random-effects meta-analysis. It is a moment-based estimator: it equates the observed Cochran's Q statistic to its expected value under the random-effects model and solves for τ². DL is computationally simple, non-iterative, and has been the default in Cochrane RevMan and most statistical software for decades. However, DL has well-documented limitations. It tends to underestimate τ² when the number of studies is small (fewer than 15–20), which in turn leads to confidence intervals that are too narrow and pooled estimates that appear more precise than they truly are. This negative bias can have real consequences for clinical decision-making, particularly in fields where meta-analyses routinely include only 5–10 studies.
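The moment-matching step described above fits in a few lines. The following is a minimal sketch, not this tool's source code, and the function name is hypothetical. It computes Cochran's Q from fixed-effect weights, solves E[Q] = df + C·τ² for τ², truncates at zero, and derives I² as (Q − df)/Q.

```python
def dersimonian_laird(effects, std_errors):
    """Return (Q, tau2, I2 in percent) using the DL method of moments."""
    w = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    # Under the random-effects model E[Q] = df + c * tau2; solve for tau2.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, tau2, i2


# Hypothetical data: five studies with equal standard errors.
q, tau2, i2 = dersimonian_laird([0.10, 0.30, 0.35, 0.65, 0.45], [0.10] * 5)
print(round(q, 2), round(tau2, 5), round(i2, 1))
```

The truncation at zero is what makes the estimate non-negative: whenever Q falls below its degrees of freedom, both τ² and I² are reported as 0.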
The Restricted Maximum Likelihood (REML) estimator addresses these shortcomings through an iterative likelihood-based approach. REML maximizes a restricted log-likelihood function that accounts for the loss of degrees of freedom from estimating the overall mean, analogous to using n − 1 instead of n in a sample variance calculation. Our implementation uses Fisher scoring, running up to 100 iterations with a convergence threshold of 1×10⁻⁸, initialized at the DL estimate to ensure rapid convergence. Simulation studies by Veroniki et al. (2016) and Langan et al. (2019) have shown that REML produces less biased τ² estimates than DL, particularly when the number of studies is small or when studies have unequal sample sizes. The Cochrane Statistical Methods Group and the metafor R package (Viechtbauer, 2010) now recommend REML as the preferred default estimator for random-effects meta-analysis.
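A Fisher scoring loop of this kind can be sketched as follows for a model with only an overall mean. This is an illustrative reconstruction using the iteration limit and tolerance quoted above, not the calculator's actual source. The function name is hypothetical, and the starting value is passed in by the caller (for example, a DL estimate) rather than computed inside the function.

```python
def reml_tau2(effects, std_errors, tau2_start=0.0, max_iter=100, tol=1e-8):
    """REML estimate of tau2 via Fisher scoring (intercept-only model)."""
    if len(effects) < 2:
        raise ValueError("at least two studies are required")
    v = [se ** 2 for se in std_errors]
    tau2 = max(0.0, tau2_start)
    for _ in range(max_iter):
        w = [1.0 / (vi + tau2) for vi in v]
        s1 = sum(w)
        s2 = sum(wi ** 2 for wi in w)
        s3 = sum(wi ** 3 for wi in w)
        mu = sum(wi * yi for wi, yi in zip(w, effects)) / s1
        # With P = W - W11'W / (1'W1): y'PPy, tr(P), and tr(PP) in scalar form.
        y_pp_y = sum(wi ** 2 * (yi - mu) ** 2 for wi, yi in zip(w, effects))
        tr_p = s1 - s2 / s1
        tr_pp = s2 - 2.0 * s3 / s1 + (s2 / s1) ** 2
        # Fisher scoring step: score divided by expected information.
        new_tau2 = max(0.0, tau2 + (y_pp_y - tr_p) / tr_pp)
        if abs(new_tau2 - tau2) < tol:
            return new_tau2
        tau2 = new_tau2
    return tau2  # returned without meeting tol after max_iter iterations


# Hypothetical data; a DL estimate could be supplied as tau2_start.
print(reml_tau2([0.10, 0.30, 0.35, 0.65, 0.45], [0.10] * 5, tau2_start=0.03))
```

When all studies have the same standard error, REML and DL coincide, so this example converges to the same value DL would give. The two estimators diverge when precision varies across studies.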
When should you use each? For meta-analyses with 20 or more reasonably homogeneous studies, DL and REML typically produce very similar τ² estimates, and the choice is unlikely to affect conclusions. For smaller meta-analyses, which represent the majority of published systematic reviews, REML is the safer choice because it reduces the risk of underestimating between-study variance. When you select REML in this calculator, the DL estimate is displayed alongside for comparison: if the two values are similar, you can be more confident in the result; if they diverge substantially, it signals that the DL estimate may be biased and REML should be preferred. In either case, always report which estimator was used, as recommended by PRISMA 2020 (Page et al., 2021) and the Cochrane Handbook (Higgins et al., 2023).
I² describes the percentage of variability in effect estimates that is due to heterogeneity rather than sampling error. An I² of 0% means all variability is due to chance; 100% means all variability reflects true differences between studies. It was proposed by Higgins & Thompson (2002).
τ² is the between-study variance in a random-effects meta-analysis. Unlike I², τ² is on the scale of the effect size, making it useful for calculating prediction intervals. It's estimated using methods like DerSimonian-Laird or REML.
Use random-effects when you expect genuine variation between studies (different populations, interventions, settings). A substantial I² or a significant Q test supports that choice, but the decision should rest mainly on whether the studies can plausibly share a single true effect, not on a heterogeneity test alone.
Technically, you can pool 2+ studies. However, with fewer than 5 studies, estimates of between-study variance (τ²) are imprecise, and tests for heterogeneity have low power. Most methodologists recommend at least 5-10 studies for reliable random-effects results.
An I² of 75% means that 75% of the total variability in effect estimates is due to true differences between studies (heterogeneity) rather than sampling error. The Cochrane Handbook suggests: I² = 0–40% might not be important, 30–60% may represent moderate heterogeneity, 50–90% may represent substantial heterogeneity, and 75–100% represents considerable heterogeneity. Always interpret I² alongside τ² and the prediction interval.
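Because the Cochrane ranges overlap, a single I² value can fall under more than one description. The small sketch below encodes the four ranges quoted above and returns every label that applies. The constant and function names are hypothetical, and the lookup is only an aid to reading the ranges, not a substitute for judgment.

```python
COCHRANE_BANDS = [
    (0, 40, "might not be important"),
    (30, 60, "may represent moderate heterogeneity"),
    (50, 90, "may represent substantial heterogeneity"),
    (75, 100, "considerable heterogeneity"),
]


def cochrane_labels(i2_percent):
    """Return every Cochrane description whose range contains i2_percent."""
    if not 0 <= i2_percent <= 100:
        raise ValueError("I-squared must be between 0 and 100")
    return [label for low, high, label in COCHRANE_BANDS if low <= i2_percent <= high]


print(cochrane_labels(75))
print(cochrane_labels(35))
```

An I² of 75% matches both the "substantial" and "considerable" descriptions, which is one reason to read it alongside τ² and the prediction interval.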
I² is a percentage describing the proportion of variability due to heterogeneity (a relative measure). τ² (tau-squared) is the absolute between-study variance in the true effect sizes, expressed on the effect size scale. I² depends on study precision and sample size: for a given τ², it increases mechanically as studies get larger. τ² is scale-dependent but precision-independent, making it more appropriate for comparing heterogeneity across meta-analyses that use the same effect measure.
Use a random-effects model when you expect true effect sizes to vary across studies due to differences in populations, interventions, or settings, which is almost always the case in systematic reviews. The Cochrane Handbook recommends random-effects (e.g., DerSimonian-Laird) as the default unless there is strong reason to believe all studies estimate the same true effect. Fixed-effect models are appropriate when studies are very similar in design and population.
REML (Restricted Maximum Likelihood) is an iterative, likelihood-based estimator for τ² that accounts for the degrees of freedom lost when estimating the overall mean. It produces less biased estimates than DerSimonian-Laird, especially when the number of studies is small (fewer than 15–20). This calculator implements REML via Fisher scoring with up to 100 iterations and a convergence threshold of 1×10⁻⁸. REML is now recommended as the default by the Cochrane Statistical Methods Group and the metafor R package.
Yes. If you have already computed effect sizes using the Effect Size Calculator or prepared data in the Forest Plot Generator, the pipeline banner at the top of the tool lets you import those studies directly with no re-entry needed. You can also drag-drop a CSV or Excel file, or paste spreadsheet data. All imported data can be edited in place before running the analysis.
Ready to visualize your pooled results? Our meta-analysis forest plot tool renders publication-ready forest plots with subgroup diamonds and prediction intervals. Before pooling, determine whether your study has adequate statistical power with the power analysis calculator, which estimates the probability of detecting a true effect at various sample sizes. To compute the standardized mean differences, odds ratios, or correlations you need as inputs, use our effect size calculator with built-in confidence intervals and variance estimates.
Reviewed by
Dr. Sarah Mitchell holds a PhD in Biostatistics from Johns Hopkins Bloomberg School of Public Health and has over 15 years of experience in systematic review methodology and meta-analysis. She has authored or co-authored 40+ peer-reviewed publications in journals including the Journal of Clinical Epidemiology, BMC Medical Research Methodology, and Research Synthesis Methods. A former Cochrane Review Group statistician and current editorial board member of Systematic Reviews, Dr. Mitchell has supervised 200+ evidence synthesis projects across clinical medicine, public health, and social sciences. She reviews all Research Gold tools to ensure statistical accuracy and compliance with Cochrane Handbook and PRISMA 2020 standards.