Sensitivity analysis in systematic reviews tests whether the conclusions of your review hold up when key methodological decisions are varied. Every systematic review involves judgment calls, from which studies to include to which statistical model to use, and sensitivity analysis reveals which of those decisions actually matter for the final result. A finding that survives multiple sensitivity analyses is robust. One that flips under reasonable alternative choices is fragile, and readers deserve to know.
The Cochrane Handbook describes sensitivity analysis as a "crucial component" of systematic reviews, and PRISMA 2020 requires that all pre-specified sensitivity analyses and their results be reported regardless of outcome. Yet many published reviews either skip sensitivity analysis entirely or bury a single leave-one-out analysis in supplementary materials. This guide covers the full toolkit: when sensitivity analysis is needed, which methods to use, how to interpret and report results, and how to pre-specify analyses in your protocol.
What Sensitivity Analysis Tests
The core question of sensitivity analysis is simple: "Would my conclusion change if I had made a different reasonable decision?" This applies to every stage of a systematic review:
- Study inclusion. Would results differ if borderline studies (unclear eligibility, conference abstracts, unpublished data) were included or excluded?
- Data extraction. When a study reports multiple time points, outcome measures, or subgroups, does the choice of which data to extract affect the pooled result?
- Risk of bias. Does restricting the analysis to studies with low risk of bias change the conclusion?
- Statistical model. Does switching between fixed-effect and random-effects models alter the pooled estimate or its significance?
- Missing data. When studies have incomplete outcome data, do best-case and worst-case imputation scenarios produce different conclusions?
- Effect size measure. For binary outcomes, do odds ratios, risk ratios, and risk differences tell the same story? (See the sketch after this list.)
Each of these represents a decision node where an alternative choice was equally defensible.
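To make the effect-measure node concrete, here is a minimal sketch in R with metafor that re-pools the same 2x2 data under all three binary effect measures. The counts are invented for illustration:

```r
library(metafor)

# Hypothetical 2x2 counts: events and sample sizes per arm
dat <- data.frame(
  ai = c(12, 8, 30, 15, 6),   n1i = c(100, 60, 250, 120, 50),  # treatment
  ci = c(20, 14, 45, 24, 11), n2i = c(100, 62, 248, 118, 52)   # control
)

# Pool the same studies under each effect measure
for (m in c("OR", "RR", "RD")) {
  res <- rma(measure = m, ai = ai, n1i = n1i, ci = ci, n2i = n2i,
             data = dat, method = "REML")
  est <- if (m == "RD") as.numeric(res$b) else exp(as.numeric(res$b))  # OR/RR pool on the log scale
  cat(sprintf("%s: pooled estimate = %.3f, p = %.4f\n", m, est, res$pval))
}
```

If the three measures agree on direction and significance, the choice is largely cosmetic; if they diverge, report all three and explain which is primary and why.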
Leave-One-Out Analysis
Leave-one-out sensitivity analysis is the most common and most straightforward method. It sequentially removes each study from the meta-analysis, recalculates the pooled estimate, and examines whether any single study disproportionately influences the result.
How to interpret: If the pooled effect size and its statistical significance remain stable regardless of which study is removed, your findings are robust to individual study influence. If removing a single study changes the direction of the effect (e.g., from favoring treatment to favoring control) or changes statistical significance (from significant to non-significant or vice versa), that study is influential and warrants close examination.
What to do with influential studies: An influential study is not necessarily problematic. It may be the largest, highest-quality study that legitimately carries more weight. Investigate whether it differs clinically (different population, dose, or comparator), methodologically (different design, lower risk of bias), or statistically (different follow-up duration, different outcome definition). Report your findings transparently rather than excluding the study without justification.
Limitations: Leave-one-out analysis only tests single-study influence. It does not detect situations where two or three studies collectively drive the result, nor does it address methodological decisions beyond study inclusion.
Software implementation: In R, metafor::leave1out() performs this automatically. In Stata, metainf provides similar functionality. RevMan does not include built-in leave-one-out analysis. Our sensitivity analysis tool provides an interactive interface for exploring study influence.
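A minimal example with metafor, using invented counts (any study-level yi and vi computed via escalc() would substitute directly):

```r
library(metafor)

# Hypothetical per-study log risk ratios (yi) and sampling variances (vi)
dat <- escalc(measure = "RR",
              ai = c(12, 8, 30, 15, 6),   n1i = c(100, 60, 250, 120, 50),
              ci = c(20, 14, 45, 24, 11), n2i = c(100, 62, 248, 118, 52))

res <- rma(yi, vi, data = dat, method = "REML")  # primary random-effects model
l1o <- leave1out(res)                            # refit k times, omitting one study each time
print(l1o)

# Flag omissions that change the direction or significance of the pooled estimate
which(sign(l1o$estimate) != sign(coef(res)) |
      (l1o$pval < 0.05) != (res$pval < 0.05))
```

Any study index returned by the final line deserves the clinical and methodological scrutiny described above.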
Decision-Node Sensitivity Analysis
Decision-node analysis systematically varies choices made at each stage of the review. Unlike leave-one-out (which only probes the influence of individual studies), this approach examines the full range of methodological decisions.
Pre-specify decision nodes in your protocol. For each node, identify the primary analysis choice and at least one reasonable alternative:
| Decision Node | Primary Analysis | Sensitivity Analysis |
|---|---|---|
| Study eligibility | Include RCTs and quasi-experimental | Restrict to RCTs only |
| Risk of bias | Include all studies | Restrict to low risk of bias only |
| Missing data | Complete case analysis | Best-case/worst-case imputation |
| Statistical model | Random-effects (REML) | Fixed-effect model |
| Effect measure | Standardized mean difference | Mean difference (if scales comparable) |
| Outlier handling | Include all studies | Exclude statistical outliers (> 3 SD from pooled mean) |
| Publication type | Include only peer-reviewed | Add grey literature |
Run the meta-analysis under each alternative specification and present results side by side. This gives readers and guideline panels a comprehensive picture of evidence robustness.
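A sketch of how those side-by-side results might be assembled in R with metafor. The dataset, its rct flag, and its rob column are all hypothetical placeholders for your own extraction sheet:

```r
library(metafor)

# Hypothetical effect sizes plus study-level flags for two decision nodes
dat <- data.frame(
  yi  = c(-0.42, -0.18, -0.55, -0.30, -0.07, -0.48),
  vi  = c(0.04, 0.06, 0.09, 0.05, 0.07, 0.11),
  rct = c(TRUE, TRUE, FALSE, TRUE, TRUE, FALSE),        # RCT vs quasi-experimental
  rob = c("low", "low", "high", "some", "low", "high")  # overall risk of bias
)

# Each specification pairs a pooling model with an inclusion rule
specs <- list(
  primary      = list(method = "REML", keep = rep(TRUE, nrow(dat))),
  fixed_effect = list(method = "FE",   keep = rep(TRUE, nrow(dat))),
  rcts_only    = list(method = "REML", keep = dat$rct),
  low_rob_only = list(method = "REML", keep = dat$rob == "low")
)

results <- do.call(rbind, lapply(names(specs), function(s) {
  fit <- rma(yi, vi, data = dat, subset = specs[[s]]$keep,
             method = specs[[s]]$method)
  data.frame(spec = s, k = fit$k, est = round(as.numeric(coef(fit)), 3),
             ci.lb = round(fit$ci.lb, 3), ci.ub = round(fit$ci.ub, 3),
             I2 = round(fit$I2, 1))
}))
results  # one row per specification, ready for a summary table
```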
Threshold Sensitivity Analysis
Threshold analysis asks: "How much would the data need to change to overturn the conclusion?" Rather than testing specific alternative decisions, it quantifies the fragility of the result.
Fragility index for meta-analysis: For binary outcomes, the fragility index is the minimum number of participants whose outcome status would need to change (from non-event to event, or vice versa) for the pooled result to lose statistical significance. A fragility index of 2 means that changing just 2 outcomes would flip the conclusion, indicating a fragile finding.
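There is no single canonical algorithm for a meta-analytic fragility index, but a greedy approximation is straightforward to sketch in R. The function below assumes the pooled risk ratio favors treatment (RR < 1), so that adding events to treatment arms moves the result toward the null; all names and counts are illustrative:

```r
library(metafor)

# Greedy approximate fragility index for a binary-outcome meta-analysis.
# Counts how many single-event modifications are needed before the pooled
# result loses statistical significance.
fragility_index <- function(ai, n1i, ci, n2i, alpha = 0.05, max_iter = 100) {
  flips <- 0
  repeat {
    res <- rma(measure = "RR", ai = ai, n1i = n1i, ci = ci, n2i = n2i,
               method = "REML")
    if (res$pval >= alpha || flips >= max_iter) return(flips)
    # Try adding one event to each study's treatment arm in turn; keep the
    # single change that raises the pooled p-value the most
    best_p <- res$pval; best_j <- NA
    for (j in seq_along(ai)) {
      if (ai[j] < n1i[j]) {
        ai_try <- ai; ai_try[j] <- ai_try[j] + 1
        p_try <- rma(measure = "RR", ai = ai_try, n1i = n1i, ci = ci,
                     n2i = n2i, method = "REML")$pval
        if (p_try > best_p) { best_p <- p_try; best_j <- j }
      }
    }
    if (is.na(best_j)) return(flips)  # no further change raises the p-value
    ai[best_j] <- ai[best_j] + 1
    flips <- flips + 1
  }
}
```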
Threshold for clinical relevance: Beyond statistical significance, you can calculate how much the pooled effect would need to shift to cross a clinically meaningful threshold. If the pooled risk ratio is 0.72 and the minimally important difference is 0.85, the question becomes: "What would need to change for the effect to become clinically unimportant?"
Unmeasured confounding sensitivity analysis: For systematic reviews of observational studies, the E-value quantifies how strong an unmeasured confounder would need to be to explain away the observed association. A large E-value means the result is robust to potential confounding; a small E-value means even weak confounding could account for the finding.
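The EValue package in R computes this directly from a pooled estimate and its confidence interval; the numbers below are illustrative:

```r
# install.packages("EValue")
library(EValue)

# E-value for a hypothetical pooled risk ratio of 0.72 (95% CI: 0.60 to 0.86)
evalues.RR(est = 0.72, lo = 0.60, hi = 0.86)
# The output reports the confounder strength needed to explain away the point
# estimate, and the (smaller) strength needed to shift the CI to the null
```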
Risk of Bias Sensitivity Analysis
Restricting the meta-analysis to studies assessed as having low risk of bias is one of the most important and commonly performed sensitivity analyses. The Cochrane Handbook and GRADE framework both recommend this approach.
Implementation: After completing your risk of bias assessment (using RoB 2 for RCTs or ROBINS-I for non-randomized studies), run two meta-analyses: one including all studies and one restricted to those rated as low risk of bias overall.
Interpreting discordance: If the pooled effect is significant when all studies are included but non-significant when restricted to low-bias studies, this has direct implications for GRADE certainty ratings. The evidence may be rated down for risk of bias if the result depends on studies with serious methodological limitations.
Stratified analysis: Rather than a binary include/exclude approach, stratify studies by risk of bias level (low, some concerns, high) and test for interaction. This reveals whether effect sizes differ systematically by study quality; because higher-risk studies are often smaller, such a gradient can overlap with small-study effects and publication bias.
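Both the restricted analysis and the stratified interaction test can be run from the same data in R with metafor; the rob column below stands in for a hypothetical overall RoB 2 judgment per study:

```r
library(metafor)

# Hypothetical effect sizes with an overall risk of bias judgment per study
dat <- data.frame(
  yi  = c(-0.42, -0.18, -0.55, -0.30, -0.07, -0.48),
  vi  = c(0.04, 0.06, 0.09, 0.05, 0.07, 0.11),
  rob = factor(c("low", "low", "high", "some concerns", "low", "high"))
)

all_studies <- rma(yi, vi, data = dat, method = "REML")     # all studies
low_only    <- rma(yi, vi, data = dat, subset = rob == "low",
                   method = "REML")                         # low RoB only

# Stratified analysis: does the pooled effect differ by risk of bias level?
by_rob <- rma(yi, vi, mods = ~ rob, data = dat, method = "REML")
c(QM = by_rob$QM, p = by_rob$QMp)  # omnibus test for interaction
```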
Reporting Sensitivity Analysis Results
PRISMA 2020 item 20d requires reporting results of all sensitivity analyses, including those where conclusions did not change. The SWiM guideline provides additional reporting recommendations for non-quantitative sensitivity analyses.
Best practices for reporting:
- Table format. Present all sensitivity analyses in a single summary table with columns for analysis description, number of studies included, pooled estimate with confidence interval, I-squared, and whether the conclusion changed.
- Forest plot overlay. For key sensitivity analyses, consider showing the restricted analysis alongside the primary analysis in a single forest plot.
- Narrative interpretation. State explicitly whether the primary conclusion was robust or sensitive to each analysis. Avoid burying important sensitivity results in supplementary materials.
- Protocol concordance. Note which sensitivity analyses were pre-specified in the protocol versus conducted post hoc.
Example reporting language: "The primary analysis included 14 trials and found a pooled standardized mean difference of -0.45 (95% CI: -0.62 to -0.28) favoring the intervention. Restricting to the 8 trials with low risk of bias yielded a smaller but still significant effect (SMD -0.31, 95% CI: -0.52 to -0.10). Leave-one-out analysis showed that no single trial changed the direction or significance of the pooled estimate. Results were consistent when using a fixed-effect model (SMD -0.42, 95% CI: -0.55 to -0.29)."
Pre-Specifying Sensitivity Analyses in the Protocol
Sensitivity analyses gain credibility when pre-specified. Include a dedicated section in your protocol or PROSPERO registration:
- List each planned sensitivity analysis with justification. Example: "We will restrict the meta-analysis to studies rated as low risk of bias to assess whether pooled effects are driven by methodologically weaker studies."
- Distinguish from subgroup analyses. Subgroup analyses explore effect modification by clinical characteristics. Sensitivity analyses test robustness to methodological choices. Some analyses could be either (e.g., restricting by study design), so label them clearly.
- Limit the number. Running 20 sensitivity analyses inflates the chance of finding one that "works." Pre-specify 3-6 that address the most important decision nodes for your review question.
- Allow for post hoc additions. State that additional sensitivity analyses may be conducted if unexpected issues arise during data extraction or analysis, and label these as exploratory.
Common Mistakes to Avoid
Running sensitivity analysis only when results are unexpected. If you only test robustness when the primary result surprises you, this introduces bias. Pre-specify analyses regardless of what you expect to find.
Interpreting a non-significant sensitivity analysis as "no effect." If restricting to low risk of bias studies makes the pooled effect non-significant, this does not prove the intervention is ineffective. It may simply reflect reduced statistical power from fewer studies. Report the point estimate and confidence interval, not just the p-value.
Dropping studies based on sensitivity results. Sensitivity analysis is diagnostic, not prescriptive. If leave-one-out reveals an influential study, investigate and discuss it. Do not remove it from the primary analysis without a pre-specified, methodologically justified reason.
Ignoring sensitivity analyses that change the conclusion. If one of your pre-specified analyses overturns the result, this is arguably the most important finding of your review. Reporting only the analyses that support your conclusion is a form of selective reporting.
Not running enough sensitivity analyses. A single leave-one-out analysis is better than nothing, but it only addresses one type of uncertainty. Aim for sensitivity analyses that cover study inclusion, risk of bias, statistical model, and at least one domain specific to your review question (e.g., missing data handling, dose categorization, follow-up duration).