Subgroup analysis and meta-regression are the two primary methods for investigating why effect sizes vary across studies in a meta-analysis. When your I-squared value is high, indicating substantial heterogeneity beyond what chance alone would explain, these methods help identify which study-level characteristics are associated with larger or smaller effects. Understanding when and how to use each method is essential for producing informative, transparent meta-analyses that go beyond a single pooled estimate.
Both methods address the same fundamental question: do effect sizes differ depending on specific study characteristics? But they differ in how they model this relationship. Subgroup analysis is simpler, dividing studies into discrete groups and comparing pooled estimates. Meta-regression is more flexible, using weighted regression to model the relationship between covariates and effect sizes. The choice between them depends on the nature of your moderator variables, the number of included studies, and whether your investigation is prespecified or exploratory.
When to Investigate Heterogeneity
Before conducting subgroup analysis or meta-regression, confirm that meaningful heterogeneity exists. The Cochrane Handbook recommends investigating heterogeneity when:
- I-squared exceeds 50%, suggesting moderate to substantial heterogeneity
- The Q-test is statistically significant, indicating that between-study variation exceeds what would be expected by chance
- Clinical or methodological diversity among included studies suggests that effect sizes may legitimately differ
- The prediction interval around the pooled estimate is wide, indicating that the true effect in a new study could differ substantially from the average
If heterogeneity is low (I-squared below 30%) and studies are clinically similar, subgroup analysis and meta-regression add little value and may produce spurious findings through multiple testing.
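The heterogeneity statistics mentioned above can be computed directly from study effect sizes and their variances. The following is a minimal sketch, assuming inverse-variance weights and the DerSimonian-Laird estimator for the between-study variance; the function name and the data are hypothetical and serve only as an illustration.

```python
def heterogeneity(effects, variances):
    """Return the pooled estimate, Cochran's Q, I-squared (%), and DL tau-squared."""
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    sum_w = sum(weights)
    # Fixed-effect (common-effect) pooled estimate
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I-squared: share of total variation attributed to heterogeneity, floored at 0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # DerSimonian-Laird estimate of the between-study variance (assumed estimator)
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    return {"pooled": pooled, "Q": q, "df": df, "I2": i2, "tau2": tau2}


# Hypothetical effect sizes (e.g. standardized mean differences) and variances
effects = [0.10, 0.30, 0.35, 0.65, 0.45, 0.15]
variances = [0.010, 0.020, 0.015, 0.030, 0.025, 0.012]

result = heterogeneity(effects, variances)
print(f"Q = {result['Q']:.2f} on {result['df']} df, "
      f"I2 = {result['I2']:.1f}%, tau2 = {result['tau2']:.4f}")
```

The sketch does not compute a prediction interval, which additionally requires a t-distribution quantile; a dedicated meta-analysis package is the more practical choice for that.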
Subgroup Analysis: Methodology
How It Works
Subgroup analysis divides included studies into groups based on a categorical study-level characteristic and then calculates separate pooled effect estimates for each group. The key output is the test for subgroup differences (also called the interaction test or between-group Q-test), which evaluates whether the pooled estimates differ significantly between groups.
Example: A meta-analysis of exercise interventions for depression includes studies from high-income and low-income countries. Subgroup analysis pools the effect estimate separately for each country group and tests whether the pooled effects are statistically different.
Step-by-Step Process
- Prespecify your subgroups in the protocol. Limit to 3-5 subgroups with strong clinical or theoretical rationale
- Classify each study into the appropriate subgroup based on the moderator variable
- Pool effect sizes within each subgroup using the same meta-analytical model as your main analysis
- Conduct the test for subgroup differences to determine whether the between-group variation is statistically significant
- Present results with separate forest plots or a single forest plot with subgroup sections
- Interpret with caution, especially if the analysis was not prespecified
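The pooling and testing steps above can be sketched in a few lines. This is an illustration under stated assumptions, not a substitute for a meta-analysis package: it uses a fixed-effect (common-effect) model within each subgroup for simplicity, the function names are hypothetical, the data are invented, and the p-value helper is valid only for two subgroups (1 degree of freedom).

```python
import math


def pool_fixed(effects, variances):
    """Inverse-variance pooled estimate and its variance for one subgroup."""
    weights = [1.0 / v for v in variances]
    estimate = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return estimate, 1.0 / sum(weights)


def subgroup_difference_test(groups):
    """groups maps a subgroup name to (effects, variances).

    Returns the pooled results per subgroup, the between-group Q, and its df.
    """
    pooled = {name: pool_fixed(e, v) for name, (e, v) in groups.items()}
    weights = {name: 1.0 / var for name, (_, var) in pooled.items()}
    overall = (sum(weights[n] * pooled[n][0] for n in pooled)
               / sum(weights.values()))
    # Between-group Q: weighted squared deviations of subgroup estimates
    q_between = sum(weights[n] * (pooled[n][0] - overall) ** 2 for n in pooled)
    return pooled, q_between, len(pooled) - 1


def p_value_df1(q):
    """Upper-tail chi-square probability, valid for 1 degree of freedom only."""
    return math.erfc(math.sqrt(q / 2.0))


# Hypothetical data for two subgroups of studies
groups = {
    "high-income": ([0.45, 0.60, 0.52], [0.020, 0.030, 0.025]),
    "low-income": ([0.20, 0.15, 0.30], [0.040, 0.035, 0.050]),
}

pooled, q_between, df = subgroup_difference_test(groups)
for name, (estimate, variance) in pooled.items():
    print(f"{name}: {estimate:.3f} (SE {math.sqrt(variance):.3f})")
print(f"Q_between = {q_between:.2f}, df = {df}, p = {p_value_df1(q_between):.3f}")
```

If the main analysis uses a random-effects model, the within-subgroup pooling should use the same model, which means adding an estimate of the between-study variance to each study's variance before weighting.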
Interpreting the Test for Subgroup Differences
The correct way to evaluate subgroup differences is the interaction test, which directly compares the pooled estimates between groups. A common mistake is comparing whether individual subgroup estimates are statistically significant. This approach is flawed because a significant estimate in one subgroup and a non-significant estimate in another does not mean the effects are different; the confidence intervals may overlap substantially.
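A small numerical illustration makes the mistake concrete. The sketch below uses invented subgroup estimates and standard errors, assumes the two subgroups are independent and approximately normally distributed, and uses hypothetical function names. One subgroup estimate is individually significant and the other is not, yet the direct comparison gives no evidence that they differ.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def compare_subgroups(est_a, se_a, est_b, se_b):
    """z-test for the difference between two independent pooled estimates."""
    diff = est_a - est_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return diff, se_diff, two_sided_p(diff / se_diff)


# Hypothetical pooled estimates and standard errors for two subgroups
est_a, se_a = 0.40, 0.15
est_b, se_b = 0.25, 0.20

p_a = two_sided_p(est_a / se_a)  # subgroup A tested against zero
p_b = two_sided_p(est_b / se_b)  # subgroup B tested against zero
diff, se_diff, p_diff = compare_subgroups(est_a, se_a, est_b, se_b)

print(f"Subgroup A vs zero: p = {p_a:.3f}")
print(f"Subgroup B vs zero: p = {p_b:.3f}")
print(f"A vs B: difference = {diff:.2f} (SE {se_diff:.2f}), p = {p_diff:.3f}")
```

With exactly two subgroups, the square of this z statistic equals the between-group Q statistic under a fixed-effect model, so the two approaches give the same p-value.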
Meta-Regression: Methodology
How It Works
Meta-regression uses weighted regression to model the relationship between one or more study-level covariates and the effect size. Each study is a data point, with the effect size as the dependent variable and the study-level characteristic as the independent variable. Studies are weighted by their precision (inverse variance). In a random-effects meta-regression, the residual between-study variance is added to each study's variance and is commonly estimated by restricted maximum likelihood.
Example: A meta-analysis includes studies with intervention durations ranging from 4 to 52 weeks. Meta-regression models whether longer intervention duration is associated with larger effect sizes, treating duration as a continuous variable.
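The duration example can be sketched as a weighted least squares fit with a single covariate. This is a simplified illustration, assuming a fixed-effect meta-regression (no residual between-study variance) and known within-study variances; the function name and the data are hypothetical.

```python
import math


def meta_regression(effects, variances, covariate):
    """Fixed-effect meta-regression with one covariate (weighted least squares).

    Returns the intercept, the slope, and the standard error of the slope.
    """
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    sum_w = sum(weights)
    # Weighted means of the covariate and the effect sizes
    x_bar = sum(w * x for w, x in zip(weights, covariate)) / sum_w
    y_bar = sum(w * y for w, y in zip(weights, effects)) / sum_w
    # Weighted sums of squares and cross-products
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, covariate))
    sxy = sum(w * (x - x_bar) * (y - y_bar)
              for w, x, y in zip(weights, covariate, effects))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # With known variances, the slope variance is 1 / sxx
    se_slope = math.sqrt(1.0 / sxx)
    return intercept, slope, se_slope


# Hypothetical studies: intervention duration in weeks, effect size, variance
duration = [4, 8, 12, 24, 36, 52]
effects = [0.15, 0.22, 0.20, 0.35, 0.41, 0.55]
variances = [0.020, 0.030, 0.015, 0.025, 0.040, 0.030]

intercept, slope, se_slope = meta_regression(effects, variances, duration)
print(f"Change in effect per additional week: {slope:.4f} (SE {se_slope:.4f})")
print(f"z = {slope / se_slope:.2f}")
```

A random-effects version would first estimate the residual between-study variance and add it to every study's variance before computing the weights, which widens the standard error of the slope.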
When Meta-Regression Is Better Than Subgroup Analysis
- Continuous moderator variables. Subgroup analysis requires categorizing a continuous variable (e.g., splitting age into "young" and "old"), which loses information. Meta-regression can model the continuous relationship directly
- Multiple covariates. Meta-regression can include multiple covariates simultaneously, allowing you to assess the independent association of each while controlling for others (although the number of studies rarely supports more than 2-3 covariates)
- Dose-response relationships. Meta-regression can model whether effects increase linearly or non-linearly with dose, duration, or intensity
Minimum Studies Required
A common rule of thumb is at least 10 studies per covariate in the meta-regression model. This means:
- With 10-15 studies, examine only 1 covariate
- With 20-30 studies, examine up to 2-3 covariates
- With fewer than 10 studies, meta-regression should not be attempted
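The rule of thumb reduces to a simple integer division. The helper below is a hypothetical convenience function for planning, not part of any library, and it encodes only the 10-studies-per-covariate heuristic described above.

```python
def max_covariates(n_studies, studies_per_covariate=10):
    """Largest number of covariates the rule of thumb supports (0 if too few)."""
    if n_studies < studies_per_covariate:
        return 0  # meta-regression should not be attempted
    return n_studies // studies_per_covariate


for n in (8, 12, 25, 30):
    print(f"{n} studies: up to {max_covariates(n)} covariate(s)")
```

The result is an upper bound rather than a target; the low power of meta-regression argues for fewer covariates than the rule permits.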
Even with sufficient studies, meta-regression in systematic reviews has low statistical power because the number of data points (studies) is typically small compared to individual participant data analyses.
Common Pitfalls
1. Multiple Testing Without Correction
Testing many subgroups inflates the probability of finding at least one statistically significant difference by chance alone. If you run 10 independent subgroup tests at the 5% significance level, there is approximately a 40% chance (1 - 0.95^10 ≈ 0.40) of finding at least one "significant" result even when no true differences exist. Prespecify a limited number of subgroups and adjust your interpretation accordingly.
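The inflation can be checked with a short calculation. The sketch below assumes the tests are independent, which is a simplification because subgroup tests on the same set of studies are usually correlated. The Bonferroni threshold is shown as one common adjustment and is an illustration, not the only option; both function names are hypothetical.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold that keeps the familywise rate at or below alpha."""
    return alpha / n_tests


for n in (1, 3, 5, 10):
    print(f"{n:>2} tests: familywise error = {familywise_error_rate(n):.3f}, "
          f"Bonferroni threshold = {bonferroni_threshold(n):.4f}")
```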
2. Ecological Bias (Ecological Fallacy)
Both subgroup analysis and meta-regression use study-level data, not individual participant data. An association observed at the study level may not reflect the true relationship at the individual level. For example, studies conducted in countries with older populations may show different effects, but this does not mean that older individuals within those studies respond differently.
3. Confounding
Study-level characteristics are often correlated. Studies from high-income countries may also be larger, more recent, and use different intervention protocols. Without controlling for these correlations (which requires meta-regression with multiple covariates and sufficient studies), an apparent subgroup difference may be explained by a confounding study characteristic.
4. Post-Hoc Subgroup Analysis
Subgroup analyses conducted after seeing the data (post-hoc) are inherently exploratory and should be labeled as such. Readers and peer reviewers are appropriately skeptical of post-hoc subgroup findings because they are susceptible to data-dredging and selective reporting.
5. Ignoring Within-Study Variation
Subgroup analysis categorizes entire studies, but many studies include diverse participants. A study categorized as "adults over 65" may include participants aged 65 to 95 with very different responses. Individual participant data meta-analysis is the gold standard for investigating participant-level moderators but requires access to raw data from each study.
Reporting Subgroup Analysis and Meta-Regression
PRISMA 2020 requires transparent reporting of all planned and conducted subgroup analyses and meta-regressions:
- Prespecification: State which analyses were prespecified in the protocol and which were post-hoc
- Rationale: Provide a clinical or theoretical rationale for each subgroup
- Methods: Describe the statistical methods used (fixed or random effects, Q-test for interaction, meta-regression model)
- Results: Report the test for subgroup differences (Q-statistic, degrees of freedom, p-value) for each comparison
- Number per subgroup: Report the number of studies and total participants in each subgroup
- Forest plots: Present forest plots showing subgroup results
- Interpretation: Clearly distinguish between prespecified and exploratory findings
Frequently Asked Questions
The FAQ section below addresses the most common questions about subgroup analysis and meta-regression in meta-analysis.