Sensitivity analysis in systematic reviews tests whether the conclusions of your review hold up when key methodological decisions are varied. Every systematic review involves judgment calls, from which studies to include to which statistical model to use, and sensitivity analysis reveals which of those decisions actually matter for the final result. A finding that survives multiple sensitivity analyses is robust. One that flips under reasonable alternative choices is fragile, and readers deserve to know.

The Cochrane Handbook describes sensitivity analysis as a "crucial component" of systematic reviews, and PRISMA 2020 requires that all pre-specified sensitivity analyses and their results be reported regardless of outcome. Yet many published reviews either skip sensitivity analysis entirely or bury a single leave-one-out analysis in supplementary materials. This guide covers the full toolkit: when sensitivity analysis is needed, which methods to use, how to interpret and report results, and how to pre-specify analyses in your protocol.

What Sensitivity Analysis Tests

The core question of sensitivity analysis is simple: "Would my conclusion change if I had made a different reasonable decision?" This applies to every stage of a systematic review: which study designs are eligible, how studies at high risk of bias are handled, how missing data are imputed, which effect measure and statistical model are used, how outliers are treated, and whether grey literature is searched.

Each of these represents a decision node where an alternative choice was equally defensible.

Leave-One-Out Analysis

Leave-one-out sensitivity analysis is the most common and most straightforward method. It sequentially removes each study from the meta-analysis, recalculates the pooled estimate, and examines whether any single study disproportionately influences the result.

How to interpret: If the pooled effect size and its statistical significance remain stable regardless of which study is removed, your findings are robust to individual study influence. If removing a single study changes the direction of the effect (e.g., from favoring treatment to favoring control) or changes statistical significance (from significant to non-significant or vice versa), that study is influential and warrants close examination.

What to do with influential studies: An influential study is not necessarily problematic. It may be the largest, highest-quality study that legitimately carries more weight. Investigate whether it differs clinically (different population, dose, or comparator), methodologically (different design, lower risk of bias), or statistically (different follow-up duration, different outcome definition). Report your findings transparently rather than excluding the study without justification.

Limitations: Leave-one-out analysis only tests single-study influence. It does not detect situations where two or three studies collectively drive the result, nor does it address methodological decisions beyond study inclusion.

Software implementation: In R, metafor::leave1out() performs this automatically. In Stata, metainf provides similar functionality. RevMan does not include built-in leave-one-out analysis. Our sensitivity analysis tool provides an interactive interface for exploring study influence.
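
If you want to see what a leave-one-out routine does under the hood (or you work outside R and Stata), the procedure is easy to script. Below is a minimal Python sketch using fixed-effect inverse-variance pooling for brevity; the effect sizes and variances are hypothetical:

```python
import math

def pool(effects, variances):
    """Fixed-effect inverse-variance pooled estimate and its standard error."""
    weights = [1 / v for v in variances]
    est = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return est, math.sqrt(1 / sum(weights))

def leave_one_out(effects, variances):
    """Re-pool with each study removed; return (index, estimate, CI low, CI high)."""
    results = []
    for i in range(len(effects)):
        ys = effects[:i] + effects[i + 1:]
        vs = variances[:i] + variances[i + 1:]
        est, se = pool(ys, vs)
        results.append((i, est, est - 1.96 * se, est + 1.96 * se))
    return results

# Hypothetical standardized mean differences and variances for 5 trials
effects = [-0.45, -0.30, -0.60, -0.10, -0.50]
variances = [0.04, 0.02, 0.05, 0.03, 0.06]

for i, est, lo, hi in leave_one_out(effects, variances):
    print(f"without study {i + 1}: SMD {est:+.2f} (95% CI {lo:+.2f} to {hi:+.2f})")
```

If every recomputed interval stays on the same side of the null, as in this toy example, the result is robust to single-study influence.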

Decision-Node Sensitivity Analysis

Decision-node analysis systematically varies choices made at each stage of the review. Unlike leave-one-out (which only tests study inclusion), this approach examines the full range of methodological decisions.

Pre-specify decision nodes in your protocol. For each node, identify the primary analysis choice and at least one reasonable alternative:

| Decision Node | Primary Analysis | Sensitivity Analysis |
| --- | --- | --- |
| Study eligibility | Include RCTs and quasi-experimental | Restrict to RCTs only |
| Risk of bias | Include all studies | Restrict to low risk of bias only |
| Missing data | Complete case analysis | Best-case/worst-case imputation |
| Statistical model | Random-effects (REML) | Fixed-effect model |
| Effect measure | Standardized mean difference | Mean difference (if scales comparable) |
| Outlier handling | Include all studies | Exclude statistical outliers (> 3 SD from pooled mean) |
| Publication type | Include only peer-reviewed | Add grey literature |

Run the meta-analysis under each alternative specification and present results side by side. This gives readers and guideline panels a comprehensive picture of evidence robustness.
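
As a sketch of how side-by-side specifications can be scripted, the loop below pools a set of studies under three of the decision nodes above. The study records, designs, ratings, and effect sizes are all invented for illustration:

```python
import math

def pool(effects, variances):
    """Fixed-effect inverse-variance pooled estimate and standard error."""
    weights = [1 / v for v in variances]
    est = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return est, math.sqrt(1 / sum(weights))

# Hypothetical study records: effect (SMD), variance, design, risk-of-bias rating
studies = [
    {"y": -0.45, "v": 0.04, "design": "RCT",   "rob": "low"},
    {"y": -0.30, "v": 0.02, "design": "RCT",   "rob": "some"},
    {"y": -0.60, "v": 0.05, "design": "quasi", "rob": "high"},
    {"y": -0.10, "v": 0.03, "design": "RCT",   "rob": "low"},
    {"y": -0.50, "v": 0.06, "design": "quasi", "rob": "some"},
]

# Each alternative specification is a named filter over the study set
specs = {
    "primary (all studies)": lambda s: True,
    "RCTs only":             lambda s: s["design"] == "RCT",
    "low risk of bias only": lambda s: s["rob"] == "low",
}

for label, keep in specs.items():
    subset = [s for s in studies if keep(s)]
    est, se = pool([s["y"] for s in subset], [s["v"] for s in subset])
    print(f"{label}: k={len(subset)}, SMD {est:+.2f} "
          f"(95% CI {est - 1.96 * se:+.2f} to {est + 1.96 * se:+.2f})")
```

Printing one line per specification yields exactly the kind of side-by-side summary table readers need.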

Threshold Sensitivity Analysis

Threshold analysis asks: "How much would the data need to change to overturn the conclusion?" Rather than testing specific alternative decisions, it quantifies the fragility of the result.

Fragility index for meta-analysis: For binary outcomes, the fragility index counts the minimum number of events that, if reassigned from treatment to control (or vice versa) across studies, would change the statistical significance of the pooled result. A fragility index of 2 means that reassigning just 2 events would flip the conclusion, indicating a fragile finding.
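
A simplified sketch of the idea in Python, using hypothetical trial data and a greedy rule that reassigns events in the first trial only (published meta-analytic fragility indices search across all studies, so treat this as illustrative):

```python
import math

def log_rr(a, n1, c, n2):
    """Log risk ratio and its variance for one 2x2 trial."""
    return (math.log((a / n1) / (c / n2)),
            1 / a - 1 / n1 + 1 / c - 1 / n2)

def pooled_p(trials):
    """Two-sided p-value for the fixed-effect pooled log risk ratio."""
    ests = [log_rr(*t) for t in trials]
    w = [1 / v for _, v in ests]
    est = sum(wi * y for wi, (y, _) in zip(w, ests)) / sum(w)
    z = est / math.sqrt(1 / sum(w))
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def fragility_index(trials, alpha=0.05, max_moves=50):
    """Greedy sketch: reassign one event at a time from the control arm to the
    treatment arm of the first trial until the pooled result loses significance."""
    trials = [list(t) for t in trials]
    for moves in range(max_moves + 1):
        if pooled_p(trials) >= alpha:
            return moves
        trials[0][0] += 1  # one more event in the treatment arm
        trials[0][2] -= 1  # one fewer event in the control arm
    return None

# Hypothetical trials: (treatment events, treatment N, control events, control N)
trials = [(10, 100, 20, 100), (8, 80, 15, 80), (12, 120, 22, 120)]
print("fragility index:", fragility_index(trials))
```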


Threshold for clinical relevance: Beyond statistical significance, you can calculate how much the pooled effect would need to shift to cross a clinically meaningful threshold. If the pooled risk ratio is 0.72 and the minimally important difference is 0.85, the question becomes: "What would need to change for the effect to become clinically unimportant?"
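
That question can be framed quantitatively: how many standard errors separate the pooled estimate from the MID, and does the confidence interval already cross it? A small sketch, assuming a hypothetical 95% CI of 0.60 to 0.86 around the RR of 0.72:

```python
import math

# Hypothetical pooled result: RR 0.72 with an assumed 95% CI of 0.60 to 0.86
rr, ci_lo, ci_hi = 0.72, 0.60, 0.86
mid = 0.85  # minimally important difference on the risk-ratio scale

se = (math.log(ci_hi) - math.log(ci_lo)) / (2 * 1.96)  # SE on the log scale
z_to_mid = (math.log(mid) - math.log(rr)) / se

print(f"pooled log RR sits {z_to_mid:.1f} standard errors below the MID")
print("upper CI limit exceeds the MID:", ci_hi > mid)
```

Here the point estimate is comfortably below the MID, but the upper confidence limit crosses it, so clinical importance remains uncertain even though statistical significance is secure.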

Unmeasured confounding sensitivity analysis: For systematic reviews of observational studies, the E-value quantifies how strong an unmeasured confounder would need to be to explain away the observed association. A large E-value means the result is robust to potential confounding; a small E-value means even weak confounding could account for the finding.
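
The E-value has a closed form for risk ratios (VanderWeele and Ding's formula), so it is straightforward to compute; the RR values below are illustrative only:

```python
import math

def e_value(rr):
    """E-value for a risk ratio (VanderWeele & Ding formula).
    Protective ratios (RR < 1) are inverted before applying the formula."""
    if rr < 1:
        rr = 1 / rr
    return rr + math.sqrt(rr * (rr - 1))

# Hypothetical pooled RR of 0.72: how strong would unmeasured confounding
# need to be (on the risk-ratio scale) to explain it away?
print(f"E-value for the point estimate: {e_value(0.72):.2f}")
# Also report the E-value for the CI limit closest to the null (assumed 0.86)
print(f"E-value for the CI limit:      {e_value(0.86):.2f}")
```

Reporting the E-value for both the point estimate and the confidence limit closest to the null is conventional, since the latter shows how much confounding would be needed to move the interval across the null.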

Risk of Bias Sensitivity Analysis

Restricting the meta-analysis to studies assessed as having low risk of bias is one of the most important and commonly performed sensitivity analyses. The Cochrane Handbook and GRADE framework both recommend this approach.

Implementation: After completing your risk of bias assessment (using RoB 2 for RCTs or ROBINS-I for non-randomized studies), run two meta-analyses: one including all studies and one restricted to those rated as low risk of bias overall.
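
A sketch of that two-analysis comparison, using hypothetical effect sizes and RoB 2 ratings with a DerSimonian-Laird random-effects model:

```python
import math

def dl_random_effects(effects, variances):
    """DerSimonian-Laird random-effects pooled estimate and standard error."""
    w = [1 / v for v in variances]
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)  # between-study variance
    w_star = [1 / (v + tau2) for v in variances]
    est = sum(wi * y for wi, y in zip(w_star, effects)) / sum(w_star)
    return est, math.sqrt(1 / sum(w_star))

# Hypothetical SMDs, variances, and overall RoB 2 ratings per trial
effects = [-0.45, -0.30, -0.60, -0.10, -0.50]
variances = [0.04, 0.02, 0.05, 0.03, 0.06]
rob = ["low", "some concerns", "high", "low", "some concerns"]

for label, keep in [("all studies", [True] * 5),
                    ("low risk of bias only", [r == "low" for r in rob])]:
    ys = [y for y, k in zip(effects, keep) if k]
    vs = [v for v, k in zip(variances, keep) if k]
    est, se = dl_random_effects(ys, vs)
    print(f"{label}: k={len(ys)}, SMD {est:+.2f} "
          f"(95% CI {est - 1.96 * se:+.2f} to {est + 1.96 * se:+.2f})")
```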

Interpreting discordance: If the pooled effect is significant when all studies are included but non-significant when restricted to low-bias studies, this has direct implications for GRADE certainty ratings. The evidence may be rated down for risk of bias if the result depends on studies with serious methodological limitations.

Stratified analysis: Rather than a binary include/exclude approach, stratify studies by risk of bias level (low, some concerns, high) and test for subgroup differences. This reveals whether effect sizes differ systematically by study quality; when lower-quality studies also tend to be smaller, this pattern can overlap with small-study effects and publication bias.

Reporting Sensitivity Analysis Results

PRISMA 2020 (item 20d) requires reporting the results of all sensitivity analyses conducted, including those where conclusions did not change. The SWiM guideline provides additional reporting recommendations for reviews that synthesize results without meta-analysis.

Best practices for reporting:

  1. Report every pre-specified sensitivity analysis, including those whose conclusions matched the primary analysis.
  2. Present results side by side, ideally in a summary table, so readers can compare specifications at a glance.
  3. Report point estimates and confidence intervals for each analysis, not just whether statistical significance was retained.
  4. Clearly label any post hoc sensitivity analyses as exploratory.

Example reporting language: "The primary analysis included 14 trials and found a pooled standardized mean difference of -0.45 (95% CI: -0.62 to -0.28) favoring the intervention. Restricting to the 8 trials with low risk of bias yielded a smaller but still significant effect (SMD -0.31, 95% CI: -0.52 to -0.10). Leave-one-out analysis showed that no single trial changed the direction or significance of the pooled estimate. Results were consistent when using a fixed-effect model (SMD -0.42, 95% CI: -0.55 to -0.29)."

Pre-Specifying Sensitivity Analyses in the Protocol

Sensitivity analyses gain credibility when pre-specified. Include a dedicated section in your protocol or PROSPERO registration:

  1. List each planned sensitivity analysis with justification. Example: "We will restrict the meta-analysis to studies rated as low risk of bias to assess whether pooled effects are driven by methodologically weaker studies."
  2. Distinguish from subgroup analyses. Subgroup analyses explore effect modification by clinical characteristics. Sensitivity analyses test robustness to methodological choices. Some analyses could be either (e.g., restricting by study design), so label them clearly.
  3. Limit the number. Running 20 sensitivity analyses inflates the chance of finding one that "works." Pre-specify 3-6 that address the most important decision nodes for your review question.
  4. Allow for post hoc additions. State that additional sensitivity analyses may be conducted if unexpected issues arise during data extraction or analysis, and label these as exploratory.

Common Mistakes to Avoid

Running sensitivity analysis only when results are unexpected. If you only test robustness when the primary result surprises you, this introduces bias. Pre-specify analyses regardless of what you expect to find.

Interpreting a non-significant sensitivity analysis as "no effect." If restricting to low risk of bias studies makes the pooled effect non-significant, this does not prove the intervention is ineffective. It may simply reflect reduced statistical power from fewer studies. Report the point estimate and confidence interval, not just the p-value.

Dropping studies based on sensitivity results. Sensitivity analysis is diagnostic, not prescriptive. If leave-one-out reveals an influential study, investigate and discuss it. Do not remove it from the primary analysis without a pre-specified, methodologically justified reason.

Ignoring sensitivity analyses that change the conclusion. If one of your pre-specified analyses overturns the result, this is arguably the most important finding of your review. Reporting only the analyses that support your conclusion is a form of selective reporting.

Not running enough sensitivity analyses. A single leave-one-out analysis is better than nothing, but it only addresses one type of uncertainty. Aim for sensitivity analyses that cover study inclusion, risk of bias, statistical model, and at least one domain specific to your review question (e.g., missing data handling, dose categorization, follow-up duration).