Assess the methodological quality of systematic reviews using the AMSTAR-2 framework (Shea et al., 2017) with auto-calculated confidence ratings for your umbrella review.
Add one row per systematic review being appraised. Click each colored circle to cycle through judgments: + Yes, ~ Partial Yes, − No (and N/A for Q11, Q14, Q15). Critical domains (Items 2, 4, 7, 9, 11, 13, and 15) are flagged with a marker. The overall confidence rating is calculated automatically using the AMSTAR-2 algorithm (Shea et al., 2017). Use the group tabs to switch between Items 1-8 and Items 9-16, then export as a high-resolution PNG.
Load sample data to see how the tool works, or clear all fields to start fresh.
| Systematic Review | Q1 PICO components in research questions | Q2 Protocol registered before review | Q3 Study design selection explained | Q4 Comprehensive literature search | Q5 Study selection in duplicate | Q6 Data extraction in duplicate | Q7 List of excluded studies with justifications | Q8 Included studies described in detail | Overall Confidence | |
|---|---|---|---|---|---|---|---|---|---|---|
AMSTAR 2 (Shea et al., 2017, BMJ) • Generated with Research Gold
Click Add Review to create a row for each systematic review included in your umbrella review or overview of reviews. Enter the study identifier (e.g., Author Year) so your assessment table maps directly to your PRISMA flow diagram and characteristics of included reviews table.
For each review, click through the 16 AMSTAR-2 items across two tabs (Items 1-8 and Items 9-16). Cycle judgments between Yes, Partial Yes, No, and Not Applicable. Each item addresses a specific methodological feature such as protocol registration, search strategy, or statistical methods.
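The cycling behaviour described above can be sketched as a small state function. This is a hypothetical illustration, not the tool's actual code: the function names are invented, and the cycle order is assumed to follow the order in which the judgments are listed.

```python
# Hypothetical sketch of the click-to-cycle behaviour; not the tool's real code.
NA_ITEMS = {11, 14, 15}  # items the text says can be marked Not Applicable


def judgment_cycle(item: int) -> list[str]:
    """Return the ordered judgments available for an AMSTAR-2 item (1-16)."""
    if not 1 <= item <= 16:
        raise ValueError("AMSTAR-2 items are numbered 1 to 16")
    cycle = ["Yes", "Partial Yes", "No"]
    if item in NA_ITEMS:
        cycle.append("N/A")
    return cycle


def next_judgment(item: int, current: str) -> str:
    """Advance to the next judgment, wrapping around at the end of the cycle."""
    cycle = judgment_cycle(item)
    return cycle[(cycle.index(current) + 1) % len(cycle)]
```

For example, `next_judgment(1, "No")` wraps back to `"Yes"`, while `next_judgment(11, "No")` moves on to `"N/A"` first.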
Pay special attention to Items 2, 4, 7, 9, 11, 13, and 15, which are marked as critical. A No on any critical item will reduce the overall confidence to Low or Critically Low. These items represent methodological steps where flaws most commonly produce misleading conclusions in systematic reviews.
The tool automatically calculates the overall confidence rating (High, Moderate, Low, or Critically Low) for each review using the AMSTAR-2 algorithm. High means no critical flaws and at most one non-critical weakness. Moderate means no critical flaws but more than one non-critical weakness. Low means one critical flaw. Critically Low means two or more critical flaws.
Review the summary bar chart showing the distribution of Yes, Partial Yes, and No judgments across all included reviews for each item. This visualization reveals which methodological features are consistently weak across the body of included reviews.
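The per-item tallies behind such a chart can be computed in a few lines. The sketch below is a minimal illustration: the review labels and judgments are made up, and only three items are shown for brevity.

```python
from collections import Counter

# Made-up judgments for three hypothetical reviews (items 1-3 only, for brevity).
assessments = {
    "Review A": {1: "Yes", 2: "No", 3: "Partial Yes"},
    "Review B": {1: "Yes", 2: "Partial Yes", 3: "No"},
    "Review C": {1: "Partial Yes", 2: "No", 3: "No"},
}


def item_distribution(assessments):
    """Count the judgments given to each item across all reviews."""
    counts = {}
    for judgments in assessments.values():
        for item, judgment in judgments.items():
            counts.setdefault(item, Counter())[judgment] += 1
    return counts


distribution = item_distribution(assessments)
for item in sorted(distribution):
    print(f"Item {item}: {dict(distribution[item])}")
```

Items where No dominates the tally (Items 2 and 3 in this made-up data) are the ones that are consistently weak across the included reviews.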
Download the assessment table and summary bar chart as high-resolution PNG files for your manuscript supplementary materials. The visualizations are formatted to meet journal submission requirements and can be included directly in your umbrella review without additional formatting.
AMSTAR-2 distinguishes between critical and non-critical items. The 7 critical domains (Items 2, 4, 7, 9, 11, 13, 15) represent methodological steps where flaws are most likely to invalidate the review's conclusions. A single critical flaw reduces confidence to Low, and two or more critical flaws produce Critically Low. Non-critical weaknesses can only reduce confidence to Moderate at worst.
Unlike the original AMSTAR, which produced a numeric score (0-11), AMSTAR-2 uses a categorical algorithm. High confidence requires no critical flaws and at most one non-critical weakness. Moderate allows more than one non-critical weakness but no critical flaws. This means a review could have 9 non-critical weaknesses and still rate Moderate, while a single critical flaw drops it to Low regardless of other strengths.
Non-critical items (1, 3, 5, 6, 8, 10, 12, 14, 16) cover important but less decisive methodological features such as study design justification, data extraction methods, and conflict of interest reporting. A No on these items counts as a weakness but cannot alone reduce confidence below Moderate. Multiple non-critical weaknesses together move the rating from High to Moderate.
Item 1 (research question using PICO) and Item 2 (protocol registered before the review, critical) establish whether the review was planned a priori. Protocol registration protects against outcome switching and selective reporting. Reviews without protocols often modify their scope, inclusion criteria, or outcomes during data collection, introducing bias that cannot be detected from the final publication alone.
Item 4 (critical) requires a comprehensive search strategy including at least two databases, keyword and index term searches, and supplementary strategies such as reference list checking, expert contact, or grey literature searching. An inadequate search may miss relevant studies, distorting the pooled estimate and reducing the generalizability of the review's conclusions.
AMSTAR-2 assesses how well the systematic review was conducted, not the certainty of the evidence it synthesizes. A review can receive High confidence (excellent methodology) while its conclusions remain uncertain because the primary studies are few, small, or heterogeneous. Pair AMSTAR-2 with GRADE to communicate both the trustworthiness of the review process and the certainty of the findings.
AMSTAR-2 (Shea et al., 2017), published in the BMJ, replaced the original AMSTAR with a more nuanced 16-item instrument that identifies 7 critical domains and produces categorical confidence ratings (High, Moderate, Low, Critically Low) rather than numeric scores. A single critical flaw reduces confidence to Low regardless of performance on other items. Two or more critical flaws result in Critically Low. This design ensures that fundamental methodological weaknesses are never masked by strengths in minor areas. The original AMSTAR (Shea et al., 2007) used an 11-item scale that produced a numeric score, but this approach allowed reviews with serious flaws in critical areas to receive acceptable scores if they performed well on non-critical items.
The 7 critical items represent methodological steps where flaws most commonly produce misleading conclusions. Protocol registration (Item 2) prevents post-hoc modifications to eligibility criteria and outcomes. A comprehensive literature search (Item 4) ensures all relevant evidence enters the review. Justification for excluded studies (Item 7) provides transparency about what was left out and why. The risk of bias assessment technique (Item 9) ensures appropriate tools were applied to primary studies. Appropriate statistical methods (Item 11) guards against invalid pooling. Accounting for risk of bias in interpreting results (Item 13) prevents overconfident conclusions from flawed studies. Publication bias investigation (Item 15) assesses whether missing studies may have distorted the pooled estimate.
Umbrella reviews (also called overviews of reviews) synthesize evidence from multiple systematic reviews on the same or related topics. AMSTAR-2 is the standard tool for classifying the quality of included reviews in these higher-order syntheses. After classifying all included reviews, conduct sensitivity analyses restricted to reviews rated High or Moderate confidence to test robustness. If conclusions change substantially when Critically Low reviews are excluded, the overall evidence base may be unreliable. This approach mirrors the sensitivity analysis strategy used in primary meta-analyses when removing high risk of bias studies.
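Selecting the reviews for such a sensitivity analysis amounts to a simple filter on the confidence rating. The sketch below uses hypothetical review labels and ratings, and covers only the selection step; re-running the synthesis on the retained subset is left to the analyst.

```python
# Hypothetical review labels and ratings, for illustration only.
ratings = {
    "Review A": "High",
    "Review B": "Critically Low",
    "Review C": "Moderate",
    "Review D": "Low",
}


def sensitivity_subset(ratings, keep=("High", "Moderate")):
    """Split reviews into those retained for the sensitivity analysis and the rest."""
    retained = [name for name, rating in ratings.items() if rating in keep]
    excluded = [name for name in ratings if name not in retained]
    return retained, excluded


retained, excluded = sensitivity_subset(ratings)
print("Retained:", retained)
print("Excluded:", excluded)
```

Comparing conclusions drawn from `retained` alone against those drawn from all reviews shows how much the lower-confidence reviews drive the overall picture.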
AMSTAR-2 complements GRADE for umbrella reviews but addresses different questions. AMSTAR-2 assesses how well each included systematic review was conducted (methodological quality of the review process), while GRADE assesses the certainty of evidence for specific outcomes across the primary studies. A systematic review can receive High AMSTAR-2 confidence but still present Low GRADE certainty evidence if the primary studies are small, inconsistent, or indirect. Conversely, a Critically Low AMSTAR-2 review might contain high-certainty GRADE evidence if the primary studies are large and consistent, though the review methodology raises concerns about completeness and bias.
The distinction between Partial Yes and No responses is important for several AMSTAR-2 items. Partial Yes typically indicates that the review authors attempted but did not fully achieve the methodological standard. For example, Item 4 (comprehensive search) receives Partial Yes if at least two databases were searched but supplementary search strategies (grey literature, reference checking) were not employed. These partial responses count as non-critical weaknesses in the confidence algorithm, meaning they reduce confidence from High to Moderate but not further.
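The Item 4 example can be written as a small decision rule. This is a deliberately simplified sketch that uses only the two criteria named in the text (number of databases and use of supplementary strategies); the function and parameter names are hypothetical, and the full instrument lists further criteria that are omitted here.

```python
def judge_item_4(databases_searched: int, supplementary_search: bool) -> str:
    """Simplified Item 4 judgment using only the criteria named in the text."""
    if databases_searched < 2:
        # Fewer than two databases fails even the Partial Yes threshold.
        return "No"
    # Two or more databases: supplementary strategies separate Yes from Partial Yes.
    return "Yes" if supplementary_search else "Partial Yes"
```

Under these assumptions, a review that searched three databases without grey literature or reference checking, `judge_item_4(3, False)`, receives Partial Yes.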
For primary studies within your included reviews, use the RoB 2 tool for randomized trials or our GRADE evidence certainty tool to rate overall certainty. Ensure transparent reporting with the PRISMA checklist tool. When your umbrella review identifies gaps in the evidence base, plan future primary systematic reviews using structured frameworks and visualize their meta-analytic results with our forest plot generator.
AMSTAR-2 (A MeaSurement Tool to Assess systematic Reviews, version 2) is a 16-item critical appraisal instrument developed by Shea et al. (2017) and published in the BMJ. It evaluates the methodological quality of systematic reviews, including those with or without meta-analysis. Unlike the original AMSTAR, AMSTAR-2 identifies 7 critical domains and uses an overall confidence rating system (High, Moderate, Low, Critically Low) rather than a numeric score. It is the standard tool for umbrella reviews and overviews of reviews.
The 7 critical domains are: Item 2 (protocol registered before the review), Item 4 (comprehensive literature search strategy), Item 7 (list of excluded studies with justifications), Item 9 (satisfactory risk of bias assessment technique), Item 11 (appropriate statistical methods for meta-analysis), Item 13 (risk of bias accounted for in interpreting results), and Item 15 (publication bias investigation when 10 or more studies are included). A flaw in any critical domain has a disproportionate impact on the overall confidence rating.
AMSTAR-2 uses a categorical rating system. High confidence means no critical flaws and no more than one non-critical weakness. Moderate confidence means more than one non-critical weakness but no critical flaws. Low confidence means one critical flaw, with or without non-critical weaknesses. Critically Low confidence means more than one critical flaw, with or without non-critical weaknesses. A critical flaw is a 'No' response on any of the 7 critical domains. A non-critical weakness is a 'No' or 'Partial Yes' on a non-critical item.
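The rating rules can be sketched as a short function. This is an illustration under stated assumptions rather than the tool's actual implementation: the function name is invented, a Partial Yes on a critical item is assumed to count as a non-critical weakness, and the threshold for High is a parameter. The default `max_weaknesses_for_high=1` follows the published guidance's "no or one non-critical weakness"; setting it to 0 gives the stricter "no weaknesses at all" reading.

```python
CRITICAL_ITEMS = {2, 4, 7, 9, 11, 13, 15}


def overall_confidence(judgments: dict[int, str], max_weaknesses_for_high: int = 1) -> str:
    """Rate one review from its item judgments ("Yes", "Partial Yes", "No", "N/A")."""
    critical_flaws = 0
    weaknesses = 0
    for item, judgment in judgments.items():
        if judgment == "No" and item in CRITICAL_ITEMS:
            critical_flaws += 1
        elif judgment in ("No", "Partial Yes"):
            # Assumption: Partial Yes on a critical item counts as a
            # non-critical weakness rather than a critical flaw.
            weaknesses += 1
        # "Yes" and "N/A" contribute nothing to either count.
    if critical_flaws >= 2:
        return "Critically Low"
    if critical_flaws == 1:
        return "Low"
    if weaknesses > max_weaknesses_for_high:
        return "Moderate"
    return "High"


# Usage: a review with all Yes except a No on Item 2 (a critical item).
example = {item: "Yes" for item in range(1, 17)}
example[2] = "No"
print(overall_confidence(example))  # Low
```

Because critical flaws are checked before non-critical weaknesses, any number of weaknesses on non-critical items leaves a review at Moderate, while a single No on a critical item caps it at Low.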
Both AMSTAR-2 and ROBIS assess the quality of systematic reviews, but they differ in scope and structure. AMSTAR-2 evaluates 16 methodological items and produces an overall confidence rating, making it well suited for umbrella reviews that need a standardized, transparent quality classification across many included reviews. ROBIS (Whiting et al., 2016) focuses on risk of bias across 4 domains with signaling questions and is often preferred when reviewers want a domain-based risk of bias judgment rather than a global quality rating. Many umbrella review protocols specify AMSTAR-2 because its categorical output (High to Critically Low) integrates naturally into GRADE-CERQual and evidence mapping frameworks.
AMSTAR-2 can be applied to systematic reviews that do not include a meta-analysis; it was explicitly designed to assess reviews both with and without one. For reviews that do not include a meta-analysis, Items 11 (appropriate statistical methods for meta-analysis), 14 (satisfactory discussion of heterogeneity), and 15 (publication bias investigation) can be marked as Not Applicable. The remaining 13 items still provide a comprehensive assessment of the review methodology, and the overall confidence rating is calculated from the applicable items.
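As a minimal sketch of how the applicable items might be determined, the helper below drops the three items listed above when a review has no meta-analysis. The function name and the `has_meta_analysis` flag are hypothetical.

```python
NA_ELIGIBLE = {11, 14, 15}  # items the text lists as Not Applicable without meta-analysis


def applicable_items(has_meta_analysis: bool) -> list[int]:
    """Return the AMSTAR-2 item numbers that should be judged for a review."""
    items = range(1, 17)
    if has_meta_analysis:
        return list(items)
    return [item for item in items if item not in NA_ELIGIBLE]


print(len(applicable_items(has_meta_analysis=False)))  # 13
```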
Present AMSTAR-2 results in a summary table showing the per-review judgment for each of the 16 items and the overall confidence rating. Include a summary bar chart showing the proportion of reviews rated Yes, Partial Yes, and No for each item. Report the distribution of overall confidence ratings across all included reviews (e.g., 5 High, 8 Moderate, 3 Low, 2 Critically Low). State in the methods section that AMSTAR-2 (Shea et al., 2017) was used and that two reviewers independently assessed each review. Sensitivity analyses restricted to reviews rated High or Moderate confidence can strengthen the robustness of your conclusions.
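The distribution sentence for the results section can be generated directly from the list of overall ratings. A minimal sketch, with the rating list constructed to match the example counts above:

```python
from collections import Counter

RATING_ORDER = ["High", "Moderate", "Low", "Critically Low"]


def rating_summary(ratings: list[str]) -> str:
    """Format the count of each overall confidence rating, from best to worst."""
    counts = Counter(ratings)
    return ", ".join(f"{counts[rating]} {rating}" for rating in RATING_ORDER)


# Illustrative ratings matching the example in the text.
example = ["High"] * 5 + ["Moderate"] * 8 + ["Low"] * 3 + ["Critically Low"] * 2
print(rating_summary(example))  # 5 High, 8 Moderate, 3 Low, 2 Critically Low
```

A rating that no review received is reported with a count of 0, which keeps the four categories visible in every report.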
Assess risk of bias in the primary randomized trials within your included reviews using the RoB 2 assessment tool for randomized controlled trials. Rate the certainty of evidence for each outcome using the GRADE evidence certainty tool. Ensure your umbrella review reporting meets PRISMA standards with the PRISMA checklist tool for scoping and systematic reviews.
Reviewed by
Dr. Sarah Mitchell holds a PhD in Biostatistics from Johns Hopkins Bloomberg School of Public Health and has over 15 years of experience in systematic review methodology and meta-analysis. She has authored or co-authored 40+ peer-reviewed publications in journals including the Journal of Clinical Epidemiology, BMC Medical Research Methodology, and Research Synthesis Methods. A former Cochrane Review Group statistician and current editorial board member of Systematic Reviews, Dr. Mitchell has supervised 200+ evidence synthesis projects across clinical medicine, public health, and social sciences. She reviews all Research Gold tools to ensure statistical accuracy and compliance with Cochrane Handbook and PRISMA 2020 standards.