Risk of bias assessment in a systematic review is a domain-based evaluation of whether systematic errors in study design, conduct, or reporting may distort results. Unlike quality scoring, it examines specific domains (selection, performance, detection, attrition, and reporting bias) using validated tools such as RoB 2 (for RCTs), ROBINS-I (for non-randomized studies of interventions), and the Newcastle-Ottawa Scale (for observational studies).

Every systematic review must include a formal risk of bias assessment of its included studies. The Cochrane Handbook (Higgins et al., 2023, Chapter 8) mandates domain-based evaluation because a single high-risk domain can invalidate an otherwise well-conducted study. Whether you are reviewing randomized controlled trials, cohort studies, or qualitative research, selecting the correct bias assessment tool and applying it consistently determines whether your review's conclusions can be trusted.

What Is Risk of Bias in a Systematic Review?

Risk of bias assessment evaluates whether systematic errors in included studies may distort the review's results. It uses domain-based tools (RoB 2 for RCTs, ROBINS-I for non-randomized studies, NOS for observational studies) and feeds into GRADE certainty ratings.

Risk of bias is not the same as reporting quality or methodological quality. Reporting quality evaluates whether a study describes what was done (assessed by tools like CONSORT or STROBE). Methodological quality is a broader, less precise term that can encompass everything from study design to statistical analysis. Risk of bias specifically asks: did systematic errors in the study's design, conduct, or analysis produce results that deviate from the truth?

The Cochrane Handbook identifies five core bias domains that apply across study designs: selection bias, performance bias, detection bias, attrition bias, and reporting bias.

Each bias domain is assessed independently because problems in one domain can completely undermine a study's findings regardless of how well other domains were handled. Risk of bias assessment is a core component of a systematic review: it is not optional but foundational to the integrity of the evidence synthesis.

Why Risk of Bias Assessment Matters

Risk of bias assessment directly determines whether your systematic review's conclusions are credible. Without it, readers and guideline developers cannot judge whether the studies supporting your findings are trustworthy.

Three sources make formal risk of bias evaluation a requirement. The Cochrane Handbook mandates it for all Cochrane reviews and provides detailed guidance on tool selection and application. PRISMA 2020 (Page et al., 2021) requires authors to describe the risk of bias methods used and present results for each included study. Most peer-reviewed journals now reject systematic reviews that lack risk of bias assessment, making it a practical requirement for publication.

Risk of bias is the first domain evaluated in the GRADE framework for assessing certainty of evidence. When studies contributing to a body of evidence have high risk of bias, GRADE downgrades the certainty of evidence, potentially moving it from "high" to "moderate" or lower. This directly affects whether clinical guidelines classify a recommendation as strong or conditional. Because risk of bias is an input to the GRADE assessment, your domain-level judgments are consequential far beyond the systematic review itself.

In our systematic reviews, we have found that calibration sessions between reviewers before full assessment reduce disagreements by approximately 40%, making the process faster and more reliable.

Choosing the Right Risk of Bias Tool

Which risk of bias tool should I use? Use RoB 2 for randomized controlled trials, ROBINS-I for non-randomized studies of interventions, Newcastle-Ottawa Scale for observational cohort and case-control studies, and JBI checklists for qualitative or other study types.

The decision depends entirely on the study design of your included studies. Many systematic reviews include multiple study designs, requiring you to use more than one tool. The table below maps each tool to its intended use:

| Tool | Study Design | Domains Assessed | Scoring Method | Developer |
| --- | --- | --- | --- | --- |
| RoB 2 | Randomized controlled trials | 5 domains | Low risk / Some concerns / High risk | Cochrane (Sterne et al., 2019) |
| ROBINS-I | Non-randomized studies of interventions | 7 domains | Low / Moderate / Serious / Critical / No information | Cochrane (Sterne et al., 2016) |
| Newcastle-Ottawa Scale | Observational (cohort, case-control) | 3 categories (8 items) | Star-based (max 9 stars) | Wells et al. |
| JBI Checklist | Qualitative, cross-sectional, prevalence, and others | Varies by checklist | Yes / No / Unclear / Not applicable | Joanna Briggs Institute |
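As a rough illustration of the mapping in the table, the lookup below returns the tool for each study design and the set of tools a mixed-design review would need. The design labels (`rct`, `non_randomized_intervention`, and so on) and the function names are hypothetical, chosen only for this sketch; a cohort study that compares interventions would be labeled `non_randomized_intervention` here.

```python
# Hypothetical design labels; the tool names come from the table above.
TOOL_BY_DESIGN = {
    "rct": "RoB 2",
    "non_randomized_intervention": "ROBINS-I",
    "cohort": "Newcastle-Ottawa Scale",
    "case_control": "Newcastle-Ottawa Scale",
    "qualitative": "JBI Checklist",
    "cross_sectional": "JBI Checklist",
    "prevalence": "JBI Checklist",
}


def select_tool(study_design):
    """Return the risk of bias tool for one study design label."""
    key = study_design.strip().lower().replace("-", "_").replace(" ", "_")
    if key not in TOOL_BY_DESIGN:
        raise ValueError(f"No tool mapped for study design: {study_design!r}")
    return TOOL_BY_DESIGN[key]


def tools_needed(designs):
    """Return the sorted set of tools required for a list of designs."""
    return sorted({select_tool(d) for d in designs})
```

For a review that includes RCTs and case-control studies, `tools_needed(["rct", "case-control"])` would list both the Newcastle-Ottawa Scale and RoB 2, reflecting the point that mixed-design reviews need more than one tool.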

RoB 2 -- For Randomized Controlled Trials

RoB 2 (revised Cochrane risk of bias tool) is the standard for assessing randomized controlled trials. It replaced the original Cochrane RoB tool in 2019 and provides a more structured, signaling-question-based approach. RoB 2 assesses risk of bias in RCTs across five domains, each evaluated through a series of signaling questions that guide the assessor to a domain-level judgment.

ROBINS-I -- For Non-Randomized Studies of Interventions

ROBINS-I (Risk Of Bias In Non-randomized Studies of Interventions) was developed for studies that compare health outcomes across groups but lack randomization. ROBINS-I assesses risk of bias in non-randomized studies by evaluating seven domains organized into pre-intervention, at-intervention, and post-intervention categories. Because non-randomized studies lack the built-in protection against confounding that randomization provides, ROBINS-I is necessarily more complex than RoB 2 and demands careful consideration of each study's analytical approach to confounding control. For a full walkthrough, see our ROBINS-I assessment guide.

Newcastle-Ottawa Scale -- For Observational Studies

The Newcastle-Ottawa Scale is widely used for cohort and case-control studies. It is simpler than ROBINS-I and uses a star-based scoring system rather than domain-level judgments. NOS is particularly common in public health and epidemiology reviews where observational designs predominate. Its ease of use makes it accessible to reviewers without extensive methodological training, though this simplicity comes at the cost of less detailed domain-level insight compared to ROBINS-I.

JBI Checklists -- For Qualitative and Other Study Types

The Joanna Briggs Institute (JBI) provides critical appraisal checklists for study types that RoB 2, ROBINS-I, and NOS do not cover. These include qualitative studies, cross-sectional studies, prevalence studies, case reports, and diagnostic accuracy studies. Each checklist is tailored to the methodological features of its target study design. JBI checklists use a Yes/No/Unclear/Not applicable format and are freely available from the JBI website, making them the most versatile option for mixed-methods systematic reviews.


RoB 2 -- Domain-by-Domain Overview

RoB 2 evaluates five domains for each RCT, with each domain containing signaling questions that lead to a structured judgment. The Cochrane Handbook recommends domain-based risk of bias assessment over summary quality scores, because a single high-risk domain can invalidate an otherwise well-conducted study (Higgins et al., 2023, Chapter 8).

Domain 1: Randomization process. This domain evaluates whether the allocation sequence was truly random and whether allocation concealment was adequate. Signaling questions ask whether the sequence was generated using a validated method (computer-generated, random number tables) and whether participants and recruiters could foresee assignments. Inadequate randomization introduces selection bias at baseline.

Domain 2: Deviations from intended interventions. This domain assesses whether participants, caregivers, or study personnel were aware of intervention assignments and whether any deviations from the protocol occurred. Blinding of participants and personnel is the primary safeguard. The domain also considers whether an intention-to-treat analysis was used, which preserves the benefits of randomization even when protocol deviations occur.

Domain 3: Missing outcome data. This domain evaluates whether outcome data were available for all or nearly all randomized participants. High dropout rates, differential attrition between groups, or inappropriate handling of missing data can introduce attrition bias. The signaling questions assess both the proportion of missing data and whether missingness is likely related to the outcome.

Domain 4: Measurement of the outcome. This domain examines whether the method of outcome measurement was appropriate and whether outcome assessors were blinded to intervention status. Subjective outcomes (pain scales, clinician-rated scores) are more vulnerable to detection bias than objective outcomes (mortality, laboratory values).

Domain 5: Selection of the reported result. This domain evaluates whether the reported result was selected from multiple eligible outcome measurements or analyses of the data. Reporting bias includes switching primary outcomes, selecting favorable subgroup analyses, or choosing between multiple measurement time points. A protocol or statistical analysis plan registered before the trial began (e.g., on ClinicalTrials.gov or the ISRCTN registry) lets assessors compare planned and reported analyses; a close match is strong evidence against selective reporting.

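Each domain receives a judgment of Low risk, Some concerns, or High risk. The overall RoB 2 judgment for a study generally follows the worst domain: a study with four "low risk" domains and one "high risk" domain receives an overall "high risk" judgment. This reflects the principle that a single compromised domain can invalidate an entire study's findings. The RoB 2 guidance also allows an overall "high risk" judgment when several domains raise "some concerns" in a way that substantially lowers confidence in the result.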
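The worst-domain rule described above can be sketched in a few lines. This is a minimal illustration, not the official RoB 2 algorithm: it takes the most severe of the five domain judgments and does not encode any reviewer discretion. The function name and the dictionary layout are assumptions made for the example.

```python
# Judgment levels ordered from least to most severe.
ROB2_LEVELS = ["Low risk", "Some concerns", "High risk"]


def overall_rob2(domain_judgments):
    """Return the most severe judgment across the five RoB 2 domains.

    domain_judgments maps a domain label (any string) to one of ROB2_LEVELS.
    """
    if len(domain_judgments) != 5:
        raise ValueError("RoB 2 expects judgments for exactly five domains")
    # list.index raises ValueError for an unrecognized judgment label.
    return max(domain_judgments.values(), key=ROB2_LEVELS.index)


example = {
    "Randomization process": "Low risk",
    "Deviations from intended interventions": "Low risk",
    "Missing outcome data": "High risk",
    "Measurement of the outcome": "Low risk",
    "Selection of the reported result": "Low risk",
}
print(overall_rob2(example))  # High risk
```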

ROBINS-I -- Domain-by-Domain Overview

ROBINS-I evaluates seven bias domains organized into three temporal categories that reflect when the bias could have been introduced relative to the intervention. ROBINS-I assesses risk of bias in non-randomized studies through a more granular framework than RoB 2, because non-randomized designs are inherently more vulnerable to confounding and selection effects.

Pre-intervention domains:

At-intervention domain:

Post-intervention domains:

ROBINS-I uses four graded judgments (Low risk, Moderate risk, Serious risk, and Critical risk) plus a "No information" option. The scale is intentionally more granular than RoB 2 because non-randomized studies face more heterogeneous threats to validity. A judgment of "critical" risk means the study is too problematic to provide any useful evidence for the review question.
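One way to combine the seven domain judgments into an overall rating is sketched below. The rule used here (the overall rating is the most severe domain rating, with "No information" reported only when no domain is serious or critical) is an assumption based on the published ROBINS-I guidance rather than anything stated in this article, and the domain keys `D1` to `D7` are placeholders.

```python
# Graded judgments ordered from least to most severe.
SEVERITY = ["Low", "Moderate", "Serious", "Critical"]
NO_INFO = "No information"


def overall_robins_i(domain_judgments):
    """Combine seven ROBINS-I domain judgments into one overall judgment."""
    if len(domain_judgments) != 7:
        raise ValueError("ROBINS-I expects judgments for exactly seven domains")
    values = list(domain_judgments.values())
    rated = [v for v in values if v != NO_INFO]
    for v in rated:
        if v not in SEVERITY:
            raise ValueError(f"Unknown judgment: {v!r}")
    worst = max(rated, key=SEVERITY.index) if rated else None
    # A serious or critical domain determines the overall judgment.
    if worst in ("Serious", "Critical"):
        return worst
    # Otherwise, missing information in any domain prevents a clean rating.
    if len(rated) < len(values):
        return NO_INFO
    return worst
```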

Newcastle-Ottawa Scale -- Scoring Guide

The Newcastle-Ottawa Scale evaluates observational study quality through a star-based system with a maximum of 9 stars across three categories. NOS evaluates observational study quality using a simpler approach than ROBINS-I, which is both its strength (ease of use) and its limitation (less granular assessment).

Selection (maximum 4 stars):

Comparability (maximum 2 stars):

Outcome (maximum 3 stars):

Common thresholds classify NOS scores as: Good quality (7-9 stars), Fair quality (4-6 stars), Poor quality (0-3 stars). However, these cutoffs are not officially validated by the tool developers, and Cochrane advises caution when using summary scores for any risk of bias tool.
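The star totals and the commonly used cutoffs above translate into a short helper. This is only a sketch of the thresholds quoted in this section (which, as noted, are not validated by the tool developers); the function names and the dictionary keys are illustrative.

```python
# Maximum stars per NOS category, as listed above.
NOS_MAX = {"selection": 4, "comparability": 2, "outcome": 3}


def nos_total(stars):
    """Validate per-category stars and return the total (0 to 9)."""
    for category, maximum in NOS_MAX.items():
        value = stars[category]
        if not 0 <= value <= maximum:
            raise ValueError(f"{category} must be between 0 and {maximum} stars")
    return sum(stars[category] for category in NOS_MAX)


def nos_label(total):
    """Map a star total to the commonly used (unvalidated) quality labels."""
    if not 0 <= total <= 9:
        raise ValueError("NOS totals range from 0 to 9 stars")
    if total >= 7:
        return "Good quality"
    if total >= 4:
        return "Fair quality"
    return "Poor quality"
```

Reporting the three category scores alongside the label keeps the category-level information that a single total hides.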

When should you use NOS instead of ROBINS-I? NOS is appropriate when your review includes cohort or case-control studies that are not studying interventions, for example studies examining risk factors or prognosis. (Prevalence and cross-sectional studies are better served by the JBI checklists described above.) ROBINS-I is specifically designed for studies comparing interventions and should be preferred in that context. For a detailed walkthrough, see our Newcastle-Ottawa Scale guide.

How to Present Risk of Bias Results

Risk of bias results should be presented using visual displays (traffic light plots and summary bar charts), narrative description, and integration into sensitivity analyses. PRISMA 2020 reporting guidelines require both a description of the methods used and study-level results.

Traffic light plots display individual domain judgments for each included study. Each cell shows a colored circle: green (low risk), yellow (some concerns/moderate), or red (high risk/serious/critical). Together these create an at-a-glance visual summary. Traffic light plots are the standard output from RoB 2 and ROBINS-I and can be generated using the robvis R package or Cochrane's RevMan software.

Summary bar charts show the proportion of studies at each risk of bias level for each domain. These plots answer the question: across all included studies, which domains are most problematic? A bar chart showing that 80% of studies have high risk of bias in the "blinding" domain tells a different story than one where risk is evenly distributed.

Summary tables provide a compact presentation of domain-level judgments across all studies. For reviews with many included studies, tables may be placed in supplementary materials with the traffic light plot in the main text.
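The numbers behind a summary bar chart or table are simply per-domain proportions of each judgment. The sketch below tallies them from study-level assessments; the nested-dictionary input format and the function name are assumptions for illustration, and plotting itself is left to tools such as robvis or RevMan.

```python
from collections import Counter


def domain_proportions(assessments):
    """Return {domain: {judgment: proportion of studies}}.

    assessments maps each study ID to a {domain: judgment} dictionary.
    """
    counts = {}
    for judgments in assessments.values():
        for domain, judgment in judgments.items():
            counts.setdefault(domain, Counter())[judgment] += 1
    return {
        domain: {j: n / sum(c.values()) for j, n in c.items()}
        for domain, c in counts.items()
    }
```

A domain where most of the proportion sits under "High risk" is the one to discuss first in the narrative description.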

Sensitivity analysis by risk of bias is a critical analytical step. Always repeat your primary meta-analysis excluding high-risk-of-bias studies. If the pooled effect changes substantially, this indicates that the overall result may be driven by biased studies, a finding that must be reported in the Discussion section. Sensitivity analysis tests result robustness and is one of the most informative analyses in any systematic review. For detailed data handling, see our guide on data extraction best practices.
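A sensitivity analysis of this kind can be sketched as pooling the studies twice, once with and once without the high-risk studies. The example assumes a fixed-effect inverse-variance model and effect estimates on an additive scale (such as log odds ratios); the article does not prescribe a model, and the field names `effect`, `se`, and `rob` are hypothetical.

```python
import math


def pooled_fixed_effect(studies):
    """Inverse-variance fixed-effect pooled estimate and its standard error."""
    weights = [1 / s["se"] ** 2 for s in studies]
    estimate = sum(w * s["effect"] for w, s in zip(weights, studies)) / sum(weights)
    return estimate, math.sqrt(1 / sum(weights))


def sensitivity_by_rob(studies, exclude="High risk"):
    """Pool all studies, then pool again without studies rated `exclude`."""
    kept = [s for s in studies if s["rob"] != exclude]
    if not kept:
        raise ValueError("No studies remain after exclusion")
    return {
        "all": pooled_fixed_effect(studies),
        "excluding_high": pooled_fixed_effect(kept),
    }
```

Comparing the two pooled estimates (and their confidence intervals) shows whether the overall result depends on the high-risk studies; a random-effects model could be substituted without changing the structure of the comparison.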

Common Risk of Bias Assessment Mistakes

Five errors undermine risk of bias assessments more than any others, and each one threatens the validity of your systematic review's conclusions.

Using the wrong tool for the study design. Applying RoB 2 to a cohort study or Newcastle-Ottawa Scale to a randomized trial produces meaningless results. RoB 2 is designed exclusively for RCTs. ROBINS-I is for non-randomized studies of interventions. NOS covers observational studies. JBI checklists handle qualitative and other designs. If your review includes mixed designs, you must use multiple tools.

Treating risk of bias as a quality score. Risk of bias assessment is domain-based assessment, not a summation exercise. Averaging domain judgments into a single number obscures critical information. A study with "low risk" in four domains and "high risk" in one domain is fundamentally different from a study with "some concerns" across all domains, but both might receive similar numeric scores. Cochrane explicitly recommends against summary scores.

Single-reviewer assessment. Having only one person assess risk of bias introduces subjectivity and reduces reliability. Cochrane recommends at least two independent reviewers assessing every study, with disagreements resolved through discussion or a third reviewer. In practice, reported inter-rater agreement for risk of bias assessment is often only moderate (kappa of roughly 0.40-0.60), which underscores why dual assessment is essential.

Not conducting sensitivity analysis by RoB level. Completing risk of bias assessment without using the results in your analysis is a missed opportunity. Sensitivity analysis by risk of bias, repeating the meta-analysis with and without high-risk studies, reveals whether your conclusions depend on potentially biased evidence.

Applying RoB 2 to non-randomized studies. RoB 2 was designed for trials with randomization. Applying it to non-randomized studies means you are not evaluating the most important sources of bias (confounding, selection into intervention groups). Use ROBINS-I instead. For guidance on the correct tool, see our ROBINS-I assessment guide.
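The kappa statistic mentioned under single-reviewer assessment can be computed directly from two reviewers' judgments on the same studies. The sketch below implements unweighted Cohen's kappa; a weighted kappa may suit ordered judgments better, and the function name and input format are assumptions for this example.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of judgments."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must judge the same non-empty set of studies")
    n = len(rater_a)
    # Proportion of studies on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    if expected == 1:
        raise ValueError("Kappa is undefined when both raters use a single category")
    return (observed - expected) / (1 - expected)
```

Computing kappa on a pilot batch of studies before full assessment gives a concrete measure of whether reviewers are applying the tool consistently.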

How Risk of Bias Feeds into GRADE

Risk of bias is the first of five domains in the GRADE framework for assessing certainty of evidence, and it is often the domain with the most impact on the overall certainty rating. The GRADE framework assesses certainty of evidence across risk of bias, inconsistency, indirectness, imprecision, and publication bias.

When the majority of studies contributing to an outcome have high risk of bias, GRADE recommends downgrading the certainty of evidence by one level (e.g., from "high" to "moderate"). If risk of bias is particularly severe or pervasive, downgrading by two levels may be warranted. This downgrade directly affects clinical recommendations: a body of evidence rated "low certainty" generally supports only conditional (weak) recommendations, regardless of the size of the effect estimate.
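The downgrading arithmetic described above amounts to moving down an ordered list of certainty levels. This sketch covers only the risk of bias step (zero, one, or two levels) and ignores the other four GRADE domains; the function name is illustrative, and the decision of how many levels to downgrade remains a reviewer judgment.

```python
# GRADE certainty levels ordered from lowest to highest.
GRADE_LEVELS = ["very low", "low", "moderate", "high"]


def downgrade(start, levels):
    """Lower a GRADE certainty rating by 0, 1, or 2 levels (floor: very low)."""
    if levels not in (0, 1, 2):
        raise ValueError("A single GRADE domain downgrades by 0, 1, or 2 levels")
    # list.index raises ValueError for an unrecognized starting level.
    index = GRADE_LEVELS.index(start)
    return GRADE_LEVELS[max(index - levels, 0)]


print(downgrade("high", 1))  # moderate
print(downgrade("high", 2))  # low
```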

The connection between risk of bias and GRADE means that your domain-level judgments have consequences beyond the systematic review itself. Guideline panels, health technology assessment agencies, and clinical decision-makers all use GRADE certainty of evidence ratings to determine how much confidence to place in your findings. Flawed risk of bias assessment, whether too lenient or too strict, propagates through to every downstream decision.

For Cochrane reviews, risk of bias results feed directly into the Summary of Findings table, which presents the GRADE assessment for each outcome. This table is often the most-read component of a Cochrane review, making accurate risk of bias assessment essential to its credibility. For a complete walkthrough of how GRADE works, see our GRADE framework guide.

RoB 2 evaluates 5 domains for each RCT: randomization process, deviations from intended interventions, missing outcome data, measurement of the outcome, and selection of the reported result (Sterne et al., 2019). These domain judgments are the raw input that determines whether your evidence receives a high, moderate, low, or very low certainty rating under GRADE.

Research Gold's systematic review service includes expert risk of bias assessment using the correct tool for your study designs, with results integrated into GRADE certainty ratings and presented in publication-ready Summary of Findings tables. For a comprehensive overview of the entire systematic review process, see our complete systematic review guide.