AMSTAR 2 (A MeaSurement Tool to Assess systematic Reviews, version 2) is the standard critical appraisal instrument for evaluating the methodological quality of systematic reviews. Published in 2017 by Shea and colleagues, AMSTAR 2 contains 16 items organized into critical and non-critical domains that assess whether a systematic review was conducted with sufficient rigor to produce reliable results.
AMSTAR 2 is used in umbrella reviews to assess included systematic reviews, by clinical guideline development groups to evaluate evidence quality, by peer reviewers assessing manuscripts, and by researchers and clinicians who need to determine whether a systematic review's conclusions can be trusted. Understanding how to apply and interpret AMSTAR 2 is essential for anyone who reads, conducts, or commissions systematic reviews.
The 16 AMSTAR 2 Items
AMSTAR 2 evaluates systematic reviews across 16 methodological domains. Seven are designated as critical domains (marked with an asterisk below) that can independently lower the overall confidence rating.
| # | Item | Critical? |
|---|---|---|
| 1 | Did the research questions and inclusion criteria include PICO components? | No |
| 2 | Was the review protocol registered before the review began, and were significant deviations reported?* | Yes |
| 3 | Did the review authors explain their selection of study designs for inclusion? | No |
| 4 | Did the review authors use a comprehensive literature search strategy?* | Yes |
| 5 | Was study selection performed in duplicate? | No |
| 6 | Was data extraction performed in duplicate? | No |
| 7 | Did the review authors provide a list of excluded studies and justify the exclusions?* | Yes |
| 8 | Did the review describe the included studies in adequate detail? | No |
| 9 | Did the review use a satisfactory technique to assess risk of bias in individual studies?* | Yes |
| 10 | Did the review report funding sources for included studies? | No |
| 11 | If meta-analysis was performed, did the review use appropriate methods for statistical combination?* | Yes |
| 12 | If meta-analysis was performed, did the review assess the potential impact of risk of bias on the results? | No |
| 13 | Did the review account for risk of bias when interpreting and discussing results?* | Yes |
| 14 | Did the review provide a satisfactory explanation for any heterogeneity observed? | No |
| 15 | If quantitative synthesis was performed, did the review carry out adequate investigation of publication bias and discuss its likely impact?* | Yes |
| 16 | Did the review authors report any potential sources of conflict of interest, including funding? | No |
The 7 Critical Domains Explained
The critical domains represent methodological elements whose absence or inadequacy is most likely to affect the validity of the review's conclusions.
Item 2: Protocol Registration
The review should state explicitly that its methods were established before the review was conducted, ideally in a protocol registered on PROSPERO or an equivalent registry. Following PRISMA-P when writing the protocol is good practice, and any significant deviations from the protocol should be reported and justified. Protocol registration guards against selective reporting and post-hoc methodological changes.
Item 4: Comprehensive Search Strategy
The search should cover at least two relevant databases with a documented Boolean search strategy. The search should include reference list checking, expert consultation, and grey literature searching where appropriate. See our search strategy guide for what constitutes a comprehensive search.
Item 7: List of Excluded Studies With Justifications
The review should provide a list of studies excluded at the full-text screening stage with specific reasons for each exclusion, following the eligibility criteria defined in the protocol. This enables readers to assess whether relevant studies were inappropriately excluded.
Item 9: Risk of Bias Assessment
The review should use an appropriate, validated tool for risk of bias assessment. For randomized controlled trials, this usually means the Cochrane risk of bias tool (currently RoB 2); for non-randomized studies, ROBINS-I or the Newcastle-Ottawa Scale. The assessment should be performed at the individual study level.
Item 11: Appropriate Meta-Analytical Methods
If meta-analysis was conducted, the review should use appropriate statistical methods. This includes selecting the correct effect size metric, choosing between random and fixed effects models with justification, and assessing heterogeneity.
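As a rough illustration of the heterogeneity assessment mentioned above, the sketch below computes Cochran's Q and the I² statistic from study effect sizes and standard errors using inverse-variance weights. The function name `heterogeneity` and the input layout are assumptions made for this example; they are not part of AMSTAR 2, which only asks whether heterogeneity was assessed appropriately.

```python
from typing import Sequence, Tuple


def heterogeneity(effects: Sequence[float], ses: Sequence[float]) -> Tuple[float, float]:
    """Return (Cochran's Q, I-squared in percent) for a set of studies.

    effects: study effect estimates on a common scale (e.g. log odds ratios).
    ses:     the corresponding standard errors.
    """
    if len(effects) != len(ses) or len(effects) < 2:
        raise ValueError("need at least two studies with matching inputs")

    # Inverse-variance weights and the fixed-effect pooled estimate.
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)

    # Q is the weighted sum of squared deviations from the pooled estimate.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1

    # I-squared is the share of variation beyond chance, floored at zero.
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i_squared
```

A review that reports a high I² but pools with a fixed effects model and gives no justification would be a candidate for a "no" on this item, although the final call remains an assessor judgment.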
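Item 13: Risk of Bias in Interpretation
The discussion and conclusions should account for the risk of bias findings. Reviews that identify high risk of bias in many included studies but still draw strong conclusions without caveat fail this item.
Item 15: Publication Bias
If quantitative synthesis was performed, the review should investigate publication bias (small-study bias) with graphical or statistical tests and discuss its likely impact on the results.
A Related Non-Critical Item: Item 12, Impact of Risk of Bias on Meta-Analysis Results
Item 12 is not among the seven critical domains in Shea et al. (2017), but it complements Items 9 and 13. The review should assess whether the risk of bias in included studies affects the meta-analysis results, typically through sensitivity analysis excluding high-risk-of-bias studies.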
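The sensitivity analysis described under Item 12 can be sketched as follows: pool all studies, pool again without the high-risk-of-bias studies, and compare the two estimates. This is a minimal illustration that assumes a fixed-effect inverse-variance model purely for brevity; the `Study` class and function names are hypothetical, and a real analysis would reuse whichever model the review itself applied.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Iterable, List, Tuple


@dataclass
class Study:
    name: str
    effect: float     # effect estimate, e.g. a log odds ratio
    se: float         # standard error of the effect estimate
    high_risk: bool   # True if judged at high risk of bias


def pooled_fixed_effect(studies: List[Study]) -> Tuple[float, float]:
    """Inverse-variance pooled estimate and its standard error."""
    weights = [1.0 / s.se ** 2 for s in studies]
    total = sum(weights)
    estimate = sum(w * s.effect for w, s in zip(weights, studies)) / total
    return estimate, sqrt(1.0 / total)


def sensitivity_by_risk_of_bias(
    studies: Iterable[Study],
) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """Return (all studies, excluding high-risk studies) pooled results."""
    all_studies = list(studies)
    lower_risk = [s for s in all_studies if not s.high_risk]
    if not lower_risk:
        raise ValueError("no studies remain after excluding high-risk ones")
    return pooled_fixed_effect(all_studies), pooled_fixed_effect(lower_risk)
```

If the pooled estimate shifts noticeably once high-risk studies are removed, the review's discussion should say so, which is where Item 13 comes in.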
How to Rate Overall Confidence
AMSTAR 2 does not produce a numerical score. Instead, the pattern of weaknesses across the 16 items determines an overall confidence rating:
| Rating | Criteria |
|---|---|
| High | No critical flaws and no more than one non-critical weakness |
| Moderate | No critical flaws but more than one non-critical weakness |
| Low | One critical flaw with or without non-critical weaknesses |
| Critically Low | More than one critical flaw with or without non-critical weaknesses |
This rating scheme means that a single critical flaw (such as no risk of bias assessment or no comprehensive search) can reduce confidence to "low" regardless of how well other aspects of the review were conducted. This reflects the reality that certain methodological shortcuts fundamentally compromise a review's reliability.
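The rating rules in the table above can be expressed as a short function. This is a sketch rather than an official scoring tool: it uses the seven critical items listed in Shea et al. (2017), namely Items 2, 4, 7, 9, 11, 13, and 15, and it assumes that only a "no" response counts as a flaw or weakness. Whether a "partial yes" should also count is a decision rule your team would need to agree on.

```python
from typing import Dict

# Critical items as designated in Shea et al. (2017).
CRITICAL_ITEMS = {2, 4, 7, 9, 11, 13, 15}


def overall_confidence(responses: Dict[int, str]) -> str:
    """Map AMSTAR 2 item responses to an overall confidence rating.

    responses: item number (1-16) -> "yes", "partial yes", "no", or "n/a".
    Assumption: only "no" is treated as a flaw or weakness.
    """
    flawed = {
        item for item, answer in responses.items()
        if answer.strip().lower() == "no"
    }
    critical_flaws = len(flawed & CRITICAL_ITEMS)
    noncritical_weaknesses = len(flawed - CRITICAL_ITEMS)

    if critical_flaws > 1:
        return "Critically Low"
    if critical_flaws == 1:
        return "Low"
    if noncritical_weaknesses > 1:
        return "Moderate"
    return "High"
```

For example, a review that answers "yes" to every item except Item 9 would be rated "Low", and one that also fails Item 4 would drop to "Critically Low".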
Applying AMSTAR 2: Practical Tips
Calibrate Before Starting
If multiple assessors are evaluating systematic reviews, calibrate by independently assessing the same 2-3 reviews and comparing results. Discuss disagreements and establish decision rules for ambiguous items. This is the same calibration principle used in screening pilot testing.
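One way to compare results from a calibration round is to list the items where two assessors disagree and to compute a chance-corrected agreement statistic. The sketch below uses Cohen's kappa, which is a common choice but is not prescribed by AMSTAR 2; the function names are made up for this example.

```python
from collections import Counter
from typing import List, Sequence


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa for two raters judging the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must judge the same non-empty item list")
    n = len(rater_a)

    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Agreement expected by chance from each rater's category frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1.0 - expected)


def disagreements(rater_a: Sequence[str], rater_b: Sequence[str]) -> List[int]:
    """1-based positions where the two raters gave different responses."""
    return [i for i, (a, b) in enumerate(zip(rater_a, rater_b), start=1) if a != b]
```

The list of disagreements is usually more useful than the kappa value itself, because it points directly at the items that need an explicit decision rule.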
Use the Detailed Guidance
Each AMSTAR 2 item includes detailed sub-questions and guidance in the original publication. The item descriptions above are summaries. For accurate application, consult the full item descriptions in Shea et al. (2017, BMJ) or the AMSTAR 2 website.
Document Your Assessments
Record your assessment for each item with a brief justification, not just a yes/no. This transparency allows readers of your umbrella review to evaluate your quality assessments and is expected by peer reviewers.
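A simple way to enforce this habit is to store each judgment in a record that cannot be created without a justification. The structure below is a hypothetical sketch; the field names and the "n/a" shorthand (for items that do not apply, such as meta-analysis items when no meta-analysis was conducted) are assumptions, not part of the AMSTAR 2 instrument.

```python
from dataclasses import dataclass

# Response labels assumed for this sketch.
ALLOWED_RESPONSES = {"yes", "partial yes", "no", "n/a"}


@dataclass(frozen=True)
class ItemAssessment:
    review_id: str       # identifier of the systematic review being appraised
    item: int            # AMSTAR 2 item number, 1 to 16
    response: str        # one of ALLOWED_RESPONSES
    justification: str   # brief reason supporting the response

    def __post_init__(self) -> None:
        if not 1 <= self.item <= 16:
            raise ValueError("item must be between 1 and 16")
        if self.response not in ALLOWED_RESPONSES:
            raise ValueError(f"unknown response: {self.response!r}")
        if not self.justification.strip():
            raise ValueError("a justification is required for every item")
```

Records like these can be exported alongside the umbrella review as a supplementary file so that readers can check each judgment.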
Present Results in Tables
Create a summary table showing the AMSTAR 2 rating for each item across all assessed reviews. Use color coding (green for yes, red for no, yellow for partial) for visual clarity. Include the overall confidence rating in the final column. Use our risk of bias chart tool for creating visual quality assessment summaries.
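As a plain-text starting point before any color coding is applied, the sketch below renders item-level responses and the overall rating as a markdown table. The abbreviations (Y, PY, N) and the function name `summary_table` are assumptions chosen for this example.

```python
from typing import Dict, List, Tuple

# Short cell labels; anything else (for example "n/a") is shown as "-".
ABBREVIATIONS = {"yes": "Y", "partial yes": "PY", "no": "N"}


def summary_table(rows: List[Tuple[str, Dict[int, str], str]]) -> str:
    """Build a markdown table with one row per assessed review.

    rows: (review label, item responses keyed 1-16, overall confidence rating).
    """
    items = range(1, 17)
    header = "| Review | " + " | ".join(str(i) for i in items) + " | Overall |"
    divider = "|" + "---|" * 18  # review label + 16 items + overall rating
    lines = [header, divider]
    for review, responses, overall in rows:
        cells = [ABBREVIATIONS.get(responses.get(i, ""), "-") for i in items]
        lines.append(f"| {review} | " + " | ".join(cells) + f" | {overall} |")
    return "\n".join(lines)
```

The resulting table can be pasted into a manuscript draft or passed to a plotting tool that maps Y, PY, and N to green, yellow, and red.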
Limitations of AMSTAR 2
AMSTAR 2 has known limitations:
- Designed for intervention reviews only. It is not appropriate for diagnostic test accuracy reviews, qualitative reviews, scoping reviews, or prognostic reviews.
- Binary critical flaw impact. A single critical flaw reduces confidence to "low" regardless of the flaw's actual impact on the review's conclusions.
- Does not assess reporting quality. AMSTAR 2 evaluates conduct, not reporting. A review may be well-conducted but poorly reported, or vice versa.
- Subjective judgment required. Several items require assessor judgment, which can vary between assessors despite calibration.
Despite these limitations, AMSTAR 2 remains the most widely accepted and validated tool for assessing systematic review quality and is the standard used in published umbrella reviews.
Frequently Asked Questions
The FAQ section below addresses the most common questions about using the AMSTAR 2 checklist.