GRADE Evidence Certainty Tool

Rate the certainty of evidence using the GRADE framework. Assess five downgrading domains and three upgrading factors for each outcome, then generate a publication-ready Summary of Findings table.

Load sample data to see how the tool works, or clear all fields to start fresh.

Outcome details: outcome name, number of studies, participants, patients (intervention), patients (control).

Effect data: measure, effect estimate, CI lower, CI upper, and baseline risk (per 1000). For RR/OR/HR, the baseline risk is the number of events per 1000 in the control group.

Starting certainty (study design).

Downgrading factors, each with its own footnote: risk of bias, inconsistency, indirectness, imprecision, publication bias.

Next step

GRADE table started. Want a full Summary of Findings with expert grading?

A PhD methodologist grades certainty across all outcomes and delivers a Cochrane-style Summary of Findings table.

Our promise: Free rework on search, screening, or synthesis if reviewers push back.

Quote in minutes · Pay only after you approve scope · PhD methodologist · PRISMA 2020 + Cochrane Handbook · NDA available on request

Timeline

Most projects deliver in under 2 weeks. We confirm an exact date in your quote.

If reviewers push back

If reviewers question the search, screening, or synthesis, we rework the section free.

Confidentiality

NDA available on request before scope discussion. Your data, study design, and manuscript stay private either way.

How to Use This Tool

Add Outcomes

Add each outcome from your systematic review. Name it clearly (e.g., 'All-cause mortality', 'Quality of life') and select the starting certainty based on study design.

Rate Downgrading Domains

For each outcome, rate the five downgrading domains: risk of bias, inconsistency, indirectness, imprecision, and publication bias. Add footnotes to justify each rating.

Consider Upgrading

For observational evidence, evaluate three upgrading factors: large effect magnitude, dose-response gradient, and plausible confounding that would reduce the effect.

Generate SoF Table

Review the auto-calculated certainty ratings in the Summary of Findings view. Export the complete table as CSV or copy it as formatted text for your manuscript.
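For readers who want to post-process the exported table, the following is a minimal sketch of writing Summary of Findings rows to CSV. The column names and the `sof_to_csv` helper are hypothetical and chosen only for illustration; the file this tool actually exports may use different headers.

```python
import csv
import io

# Hypothetical column names; the tool's real CSV export may differ.
SOF_COLUMNS = [
    "outcome",
    "studies",
    "participants",
    "relative_effect",
    "absolute_effect_per_1000",
    "certainty",
    "footnotes",
]


def sof_to_csv(rows):
    """Serialize a list of outcome dictionaries to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=SOF_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # missing keys are written as empty cells
    return buffer.getvalue()


example_rows = [
    {
        "outcome": "All-cause mortality",
        "studies": 5,
        "participants": 2400,
        "relative_effect": "RR 0.75 (0.60 to 0.94)",
        "absolute_effect_per_1000": "50 fewer",
        "certainty": "moderate",
        "footnotes": "Downgraded one level for risk of bias",
    }
]
print(sof_to_csv(example_rows))
```

The example row is invented to show the shape of the data, not taken from any review.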

Want a PhD methodologist to handle the whole project?

Get a complete GRADE assessment with Summary of Findings tables. Free rework on synthesis or GRADE assessment if reviewers push back. Pay only after you approve scope.

Key Takeaways for GRADE Assessment

GRADE operates at the outcome level

Unlike study-level tools like RoB 2 or the Newcastle-Ottawa Scale, GRADE rates the certainty of a body of evidence for each specific outcome. A single systematic review may have high-certainty evidence for one outcome and very-low-certainty evidence for another.

Each downgrade requires explicit justification

Transparent reporting is fundamental to GRADE. Every downgrading or upgrading decision must include a footnote explaining the rationale, for example specifying which studies had high risk of bias, or why confidence intervals were deemed imprecise relative to a clinically meaningful threshold.

Starting certainty depends on study design

Evidence from randomized controlled trials starts at high certainty and can only be downgraded. Observational evidence starts at low certainty and can be both downgraded and upgraded. This distinction reflects the fundamental difference in susceptibility to confounding between these designs.
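The arithmetic behind a certainty rating can be sketched in a few lines. This is an illustrative simplification, assuming each domain contributes a whole number of levels and that the result is clamped to the four GRADE levels; the function name `grade_certainty` is hypothetical, and real GRADE judgments are holistic rather than purely additive.

```python
LEVELS = ["very low", "low", "moderate", "high"]
START_INDEX = {"rct": 3, "observational": 1}


def grade_certainty(design, downgrades, upgrades=0):
    """Return a certainty label for one outcome.

    design: "rct" (starts at high) or "observational" (starts at low).
    downgrades: total levels removed across the five downgrading domains.
    upgrades: total levels added across the three upgrading factors.
    """
    if downgrades < 0 or upgrades < 0:
        raise ValueError("levels must be non-negative")
    if design == "rct":
        # Following the text above, RCT evidence is only downgraded here.
        upgrades = 0
    index = START_INDEX[design] - downgrades + upgrades
    # Clamp to the four available levels.
    return LEVELS[max(0, min(len(LEVELS) - 1, index))]


print(grade_certainty("rct", downgrades=1))                       # moderate
print(grade_certainty("observational", downgrades=0, upgrades=1)) # moderate
```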

SoF tables are required by Cochrane

The Cochrane Handbook mandates Summary of Findings tables for all Cochrane Reviews. These tables present the GRADE assessment alongside effect estimates, making the strength of evidence immediately clear to clinicians, policymakers, and guideline panels who rely on systematic review conclusions.

Understanding GRADE in Systematic Reviews

The GRADE certainty of evidence tool implements the framework that Guyatt et al. (2008) introduced to bring transparency and consistency to evidence appraisal in systematic reviews and clinical practice guidelines. GRADE classifies the certainty of a body of evidence into four levels (high, moderate, low, and very low) by evaluating five downgrading domains (risk of bias, inconsistency, indirectness, imprecision, and publication bias) and three upgrading factors (large effect magnitude, dose-response gradient, and plausible residual confounding that would reduce the observed effect). The Cochrane Handbook (Higgins et al., 2023, Chapter 14) mandates GRADE assessments for every Cochrane Review, and over 110 organizations worldwide, including the WHO, the BMJ, and NICE, have formally endorsed the framework. GRADEpro GDT is the official software for creating Summary of Findings tables and interactive evidence profiles, streamlining the process of documenting downgrading and upgrading decisions in a structured format. Because GRADE operates at the outcome level rather than the study level, reviewers must first complete study-level quality evaluations using tools such as our RoB 2 assessment tool before synthesizing those judgments into an overall certainty rating.

A GRADE assessment workflow begins by establishing the starting certainty based on study design: evidence from randomized controlled trials starts at high certainty, while observational evidence starts at low certainty (Guyatt et al., 2011). Schünemann et al. (2023) emphasize that each downgrading or upgrading decision requires an explicit footnote justifying the rationale, for example specifying which studies contributed serious risk of bias or why the pooled confidence interval was deemed imprecise relative to a clinically meaningful threshold. The imprecision domain introduces the optimal information size (OIS) concept: if the cumulative sample across included studies falls below what a single adequately powered trial would require, imprecision concerns persist regardless of statistical significance. Bayesian reanalysis can serve as a sensitivity check for borderline imprecision judgments by quantifying the posterior probability that the true effect exceeds a clinically meaningful threshold. The evolving minimally contextualized framework (sometimes called GRADE 2.0) encourages reviewers to anchor imprecision thresholds to clinical decision-making contexts rather than relying solely on statistical significance. To interpret the pooled effect estimates that GRADE certainty applies to, reviewers can visualize results using our forest plot generator and compute standardized metrics with our effect size calculator.
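To make the optimal information size idea concrete, here is a minimal sketch for a binary outcome. It assumes the conventional two-group sample size formula for comparing proportions, a two-sided alpha of 0.05, and 80% power; the function name and default values are illustrative choices, not something this tool prescribes.

```python
from math import ceil
from statistics import NormalDist


def optimal_information_size(control_risk, relative_risk_reduction,
                             alpha=0.05, power=0.80):
    """Total participants a single adequately powered trial would need.

    control_risk: expected event proportion in the control group (0 to 1).
    relative_risk_reduction: smallest effect considered important (0 to 1).
    """
    if not 0 < control_risk < 1 or not 0 < relative_risk_reduction < 1:
        raise ValueError("inputs must lie strictly between 0 and 1")
    p1 = control_risk
    p2 = control_risk * (1 - relative_risk_reduction)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    n_per_group = (z_alpha + z_beta) ** 2 * variance_sum / (p1 - p2) ** 2
    return 2 * ceil(n_per_group)


ois = optimal_information_size(control_risk=0.20, relative_risk_reduction=0.25)
pooled_participants = 1200  # hypothetical total across included studies
print(ois, pooled_participants >= ois)
```

If the pooled sample falls below the computed value, the text above suggests that imprecision remains a concern even when the confidence interval excludes no effect.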

The summary of findings table generator embedded in this tool produces the structured output that PRISMA 2020 (Page et al., 2021) recommends for presenting the results of evidence synthesis. A Summary of Findings (SoF) table displays, for each outcome, the number of studies and participants, the relative and absolute effect estimates, the GRADE certainty rating, and footnotes explaining every judgment. This format makes the strength of evidence immediately accessible to clinicians, policymakers, and guideline panels who rely on systematic review conclusions to inform practice. The MAGIC app extends this workflow by enabling guideline developers to link GRADE evidence profiles directly to clinical recommendations, creating living guidelines that update as new evidence emerges. For outcomes expressed as binary events, translating relative measures into absolute terms improves clinical interpretability. Our number needed to treat calculator converts odds ratios and risk ratios into NNT values that clinicians can apply directly at the bedside.
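The translation from a relative measure to an absolute effect per 1000 can be sketched as follows. This assumes the standard conversions (risk ratio applied directly to baseline risk, odds ratio converted through baseline odds) and treats the hazard ratio case as out of scope; the function names are hypothetical and the calculators mentioned above may handle rounding and confidence limits differently.

```python
def intervention_risk_per_1000(baseline_per_1000, estimate, measure="RR"):
    """Expected events per 1000 in the intervention group."""
    p0 = baseline_per_1000 / 1000
    if measure == "RR":
        p1 = p0 * estimate
    elif measure == "OR":
        p1 = estimate * p0 / (1 - p0 + estimate * p0)
    else:
        raise ValueError("this sketch supports only RR and OR")
    return p1 * 1000


def absolute_effect(baseline_per_1000, estimate, measure="RR"):
    """Risk difference per 1000 and the corresponding NNT."""
    risk = intervention_risk_per_1000(baseline_per_1000, estimate, measure)
    difference = risk - baseline_per_1000
    # NNT is undefined when there is no difference in risk.
    nnt = None if difference == 0 else 1000 / abs(difference)
    return {"risk_per_1000": risk, "difference_per_1000": difference, "nnt": nnt}


# Baseline risk of 200 per 1000 with RR 0.75: 150 per 1000, 50 fewer, NNT 20.
print(absolute_effect(200, 0.75, "RR"))
```

Applying the same function to the lower and upper confidence limits gives the range of absolute effects usually reported alongside the point estimate.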

Integrating GRADE into a systematic review requires coordination across multiple analytical steps. The risk-of-bias domain draws on study-level assessments from RoB 2 for randomized trials and from tools like ROBINS-I for non-randomized studies of interventions. The inconsistency domain examines statistical heterogeneity, measured by I² and Cochran's Q, to determine whether variability across study results exceeds what chance alone would explain. The indirectness domain evaluates whether the available evidence directly addresses the review question in terms of population, intervention, comparator, and outcome. Schünemann et al. (2023) stress that GRADE judgments should reflect the totality of these considerations rather than relying on any single domain in isolation, ensuring that the final certainty rating communicates a coherent, evidence-grounded assessment of confidence in the estimated effect. For network meta-analysis, the CINeMA (Confidence in Network Meta-Analysis) framework adapts the GRADE domains to handle the additional complexity of indirect comparisons and network geometry, ensuring that certainty ratings account for intransitivity and incoherence across the evidence network.
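The heterogeneity statistics used for the inconsistency domain can be computed directly from study effects and standard errors. The sketch below assumes inverse-variance fixed-effect weights and effects on an additive scale (for ratio measures, log RR or log OR); the function name is hypothetical, and GRADE does not prescribe a single I² cut-off for downgrading.

```python
def heterogeneity(effects, standard_errors):
    """Return (pooled effect, Cochran's Q, degrees of freedom, I² in percent)."""
    if len(effects) != len(standard_errors) or len(effects) < 2:
        raise ValueError("need matching lists with at least two studies")
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q sums the weighted squared deviations from the pooled effect.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I² is the share of Q beyond what chance alone would explain.
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, q, df, i_squared


# Hypothetical log risk ratios and standard errors from three studies.
pooled, q, df, i2 = heterogeneity([-0.35, -0.10, -0.60], [0.15, 0.20, 0.25])
print(f"pooled={pooled:.3f} Q={q:.2f} df={df} I2={i2:.1f}%")
```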

Frequently Asked Questions

What is the GRADE framework?

GRADE (Grading of Recommendations, Assessment, Development and Evaluations) is the most widely adopted system for rating the certainty of evidence in systematic reviews. Developed by an international working group, GRADE categorizes evidence into four levels (high, moderate, low, and very low) based on the study design and five downgrading domains: risk of bias, inconsistency, indirectness, imprecision, and publication bias. Evidence from RCTs starts at high certainty and can be downgraded, while observational evidence starts at low certainty and can be upgraded.

What are the five downgrading domains in GRADE?

The five domains that can lower certainty are: (1) Risk of bias, meaning methodological limitations in the included studies, such as lack of blinding or incomplete follow-up; (2) Inconsistency, meaning unexplained heterogeneity in results across studies; (3) Indirectness, meaning differences between the review question and the available evidence in population, intervention, comparator, or outcome; (4) Imprecision, meaning wide confidence intervals or insufficient sample size, often evaluated against an optimal information size; (5) Publication bias, meaning systematic non-publication of studies with unfavorable results, assessed through funnel plots or statistical tests.

When can evidence be upgraded in GRADE?

Observational evidence (which starts at low certainty) can be upgraded by one or two levels through three factors: (1) Large magnitude of effect, where a risk ratio greater than 2 or less than 0.5 upgrades by one level, while greater than 5 or less than 0.2 upgrades by two levels, provided there is no plausible confounding; (2) Dose-response gradient, meaning a clear relationship between the amount of exposure and the magnitude of effect; (3) Plausible confounding that would reduce the demonstrated effect, where all plausible biases would work against the observed effect, strengthening confidence in the finding.
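The large-effect thresholds above can be expressed as a small rule. This is a sketch under the thresholds stated in this answer; the function name and the boolean `plausible_confounding` flag are illustrative, and a real judgment would also weigh the precision of the estimate and the quality of the underlying studies.

```python
def large_effect_upgrade(risk_ratio, plausible_confounding=False):
    """Levels to upgrade for magnitude of effect (0, 1, or 2)."""
    if risk_ratio <= 0:
        raise ValueError("risk ratio must be positive")
    if plausible_confounding:
        # The text above allows upgrading only when confounding is implausible.
        return 0
    if risk_ratio > 5 or risk_ratio < 0.2:
        return 2
    if risk_ratio > 2 or risk_ratio < 0.5:
        return 1
    return 0


print(large_effect_upgrade(2.4))   # 1
print(large_effect_upgrade(0.15))  # 2
print(large_effect_upgrade(1.6))   # 0
```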

What is a Summary of Findings (SoF) table?

A Summary of Findings table is a structured presentation of the key results of a systematic review, recommended by Cochrane and the GRADE working group. It presents, for each outcome: the number of studies and participants, the GRADE certainty rating with justification, the relative and absolute effect estimates, and footnotes explaining each downgrading or upgrading decision. SoF tables make the conclusions of a systematic review transparent and accessible to decision-makers, clinicians, and guideline panels.

How does GRADE differ from other quality assessment tools?

Unlike tools such as the Newcastle-Ottawa Scale or RoB 2, which assess individual study quality, GRADE rates the certainty of a body of evidence for a specific outcome across all studies. It operates at the outcome level, not the study level. GRADE also integrates multiple dimensions beyond internal validity; it considers the directness of evidence, precision of estimates, and consistency across studies. This makes GRADE complementary to study-level tools: you would use RoB 2 or NOS to assess each study, then use GRADE to synthesize those assessments into an overall certainty rating.

What is the difference between GRADE and risk of bias assessment?

Risk of bias (e.g., RoB 2) evaluates individual studies, while GRADE evaluates the overall body of evidence across all studies for a specific outcome. Risk of bias is one of five GRADE domains; the others are inconsistency, indirectness, imprecision, and publication bias. A single high-risk study does not automatically downgrade GRADE; the decision depends on how many studies are affected and their influence on the pooled estimate.

Can GRADE be used for observational studies?

Yes. Observational evidence starts at low certainty (two levels below RCTs) and can be further downgraded for the same five domains. Uniquely, observational evidence can also be upgraded if it shows a large effect (e.g., RR > 2), a dose-response gradient, or if plausible confounders would reduce the observed effect. Upgrading is rare and requires strong justification.

Related Research Tools

GRADE assessments build on study-level quality evaluations. Use our RoB 2 assessment tool for randomized trials to systematically evaluate bias domains before feeding those judgments into your GRADE risk-of-bias rating. To visualize the pooled estimates that GRADE certainty applies to, create publication-ready figures with our forest plot generator, and compute standardized metrics with our effect size calculator. To express the relative effects in your SoF table in absolute terms, the NNT calculator translates odds ratios and risk ratios into number needed to treat for clinical interpretation.

Reviewed by

Dr. Sarah Mitchell

PhD, Biostatistics & Research Methodology

Dr. Sarah Mitchell holds a PhD in Biostatistics from Johns Hopkins Bloomberg School of Public Health and has over 15 years of experience in systematic review methodology and meta-analysis. She has authored or co-authored 40+ peer-reviewed publications in journals including the Journal of Clinical Epidemiology, BMC Medical Research Methodology, and Research Synthesis Methods. A former Cochrane Review Group statistician and current editorial board member of Systematic Reviews, Dr. Mitchell has supervised 200+ evidence synthesis projects across clinical medicine, public health, and social sciences. She reviews all Research Gold tools to ensure statistical accuracy and compliance with Cochrane Handbook and PRISMA 2020 standards.

Quality Assessment Is Complex. Our Experts Handle It Daily.

We conduct full risk of bias assessments, GRADE evaluations, and complete systematic reviews with rigorous methodology that satisfies peer reviewers. Most projects deliver in under 2 weeks.

4.9 / 5 across 1,194+ projects · Quote in minutes · PRISMA 2020 + Cochrane Handbook · PhD methodologist · Pay only after you approve scope · NDA available on request
