RoB 2 assessment is the process of evaluating risk of bias in randomized controlled trials using the revised Cochrane tool developed by Sterne et al. (2019). RoB 2 (Risk of Bias 2) provides a structured framework with five domains, signalling questions, and a judgement algorithm that produces transparent, reproducible bias evaluations for every included RCT in a systematic review. Whether you are conducting your first systematic review or updating a Cochrane review, understanding how to apply the RoB 2 tool correctly is essential for credible evidence synthesis.
RoB 2 replaced the original Cochrane risk of bias tool (Higgins et al., 2008) in response to documented limitations in reproducibility and the overuse of the "unclear" judgement category. The revised tool eliminates ambiguity by requiring assessors to answer structured signalling questions that map directly to domain-level judgements. This guide walks you through every domain, explains how to answer signalling questions, and shows how your RoB 2 results feed into GRADE certainty of evidence ratings.
What Is the RoB 2 Tool?
The RoB 2 tool is the official Cochrane instrument for assessing risk of bias in individual randomized controlled trials included in systematic reviews. Published by Sterne et al. (2019) in the BMJ, RoB 2 provides a domain-based evaluation framework that asks whether the trial result should be considered at risk of bias due to flaws in design, conduct, or reporting.
RoB 2 assesses risk of bias in RCTs across five structured domains. Each domain targets a specific aspect of trial methodology where bias can be introduced. The tool uses signalling questions within each domain to guide assessors toward a judgement of Low risk, Some concerns, or High risk of bias, replacing the old "unclear" category with more actionable classifications.
The Cochrane Handbook Chapter 8 (Higgins et al., 2023) provides detailed guidance on implementing RoB 2 within a systematic review. The tool is designed for outcome-level assessment, meaning you assess each trial separately for each outcome of interest. A trial may be at low risk of bias for its primary outcome but at high risk for a secondary outcome measured using a subjective scale.
Before beginning any RoB 2 assessment, you must specify your effect of interest: either the effect of assignment to intervention (the intention-to-treat effect) or the effect of adhering to intervention (the per-protocol effect). This choice determines which version of Domain 2 you use and affects how you evaluate deviations from intended interventions throughout the trial.
The 5 RoB 2 Domains Explained
RoB 2 organizes bias evaluation into five sequential domains, each addressing a distinct stage in the trial process where bias can enter. Understanding each domain in depth is critical for accurate and consistent assessment.
Domain 1: Bias Arising From the Randomization Process
This domain evaluates whether the randomization sequence was truly random and whether allocation concealment was adequate. Signalling questions ask whether the allocation sequence was random, whether it was concealed until participants were enrolled, and whether there were baseline differences suggesting problems with randomization.
Adequate randomization requires a validated method such as computer-generated random numbers or random number tables. Allocation concealment means that the person enrolling participants could not foresee the upcoming assignment. If baseline imbalances exist between groups, you must judge whether they are consistent with chance or suggest a flaw in the randomization process.
Domain 2: Bias Due to Deviations From Intended Interventions
Domain 2 evaluates whether participants, caregivers, or trial personnel were aware of intervention assignments during the trial and whether any deviations from the intended interventions affected the outcome. This is where blinding becomes relevant: the domain asks whether participants and those delivering the intervention were blinded, and whether any unblinding or co-interventions could have introduced bias.
The version of Domain 2 you use depends on your specified effect of interest. For the effect of assignment (intention-to-treat), you evaluate whether an appropriate analysis was used that estimates the effect regardless of deviations. For the effect of adhering to intervention, you evaluate whether deviations occurred and whether they were balanced between groups.
Domain 3: Bias Due to Missing Outcome Data
This domain addresses attrition: whether outcome data were available for all, or nearly all, participants. Missing data can introduce bias if the reasons for missingness differ between groups or are related to the outcome itself. Signalling questions ask whether data were available for all participants, whether evidence exists that the result was not biased by missing data, and whether missingness could depend on the true value of the outcome.
RoB 2 does not set a fixed threshold for acceptable missingness, but when a substantial proportion of outcome data are missing (some review teams use 5% as a rough flag), assessors should carefully evaluate whether appropriate methods such as multiple imputation or sensitivity analyses were used to handle the missing data. An intention-to-treat (ITT) analysis that includes all randomized participants reduces but does not eliminate concerns about missing data.
Domain 4: Bias in Measurement of the Outcome
Domain 4 asks whether the method of outcome measurement was appropriate, whether it was applied consistently across groups, and whether outcome assessors were aware of intervention assignments. Subjective outcomes such as pain scales or quality of life measures are more susceptible to measurement bias than objective outcomes like mortality or laboratory values.
When outcome assessors are not blinded, there is a risk that knowledge of the assigned intervention influences how they measure or interpret the outcome. This domain also evaluates whether the outcome measure itself could have been influenced by knowledge of treatment assignment, for example, if participants self-report outcomes and are aware of their group allocation.
Domain 5: Bias in Selection of the Reported Result
The final domain evaluates whether the trial report selectively presents results. Reported result selection bias occurs when investigators choose which outcomes, analyses, or subgroups to report based on the results themselves. Signalling questions ask whether the data were analyzed in accordance with a pre-specified analysis plan and whether the reported result is likely to have been selected from multiple eligible outcome measurements or analyses.
Pre-registration of the trial protocol on platforms such as ClinicalTrials.gov provides the strongest protection against selective reporting. Comparing the published report against the registered protocol helps assessors determine whether the reported results align with what was planned.
Signalling Questions and How to Answer Them
Each RoB 2 domain contains between three and seven signalling questions that guide the assessor toward a domain-level judgement. These questions are answered using a standardized set of response options: Yes (Y), Probably Yes (PY), Probably No (PN), No (N), and No Information (NI).
The response "Probably Yes" carries the same weight as "Yes" in the judgement algorithm: it indicates that the assessor believes the answer is likely affirmative based on available evidence, even if not explicitly stated in the trial report. Similarly, "Probably No" carries the same weight as "No." The "No Information" response indicates that the trial report provides insufficient detail to make a judgement.
Here is a summary of signalling question structure across all five domains:
| Domain | Number of Questions | Focus Area | Key Considerations |
|---|---|---|---|
| D1: Randomization | 3 | Sequence generation, concealment, baseline balance | Was allocation truly concealed? |
| D2: Deviations | 6-7 (varies by effect) | Blinding, co-interventions, analysis type | Effect of assignment vs. adhering |
| D3: Missing data | 4 | Completeness, reasons for missingness | Was missingness related to the outcome? |
| D4: Measurement | 5 | Assessor blinding, measurement method | Subjective vs. objective outcomes |
| D5: Reported result | 3 | Pre-registration, analysis plan adherence | Protocol vs. publication comparison |
The judgement algorithm maps signalling question responses to a domain-level judgement. If all questions suggest no bias concerns, the domain is judged as Low risk. If any question raises potential concerns without clear evidence of bias, the domain is judged as Some concerns. If any question indicates a definite bias problem, the domain is judged as High risk.
Answering "No Information" to critical signalling questions typically results in a "Some concerns" judgement because the inability to confirm adequate methodology leaves open the possibility of bias. Assessors should document their rationale for every response, particularly for "Probably Yes" and "Probably No" answers, to ensure transparency and reproducibility.
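The simplified mapping described above can be sketched as a small function. This is an illustration of the heuristic in the text only, not the official RoB 2 algorithms, which are domain-specific flowcharts published with the tool; the response orientation (Y/PY meaning "no bias concern") is an assumption made for the sketch.

```python
def domain_judgement(responses):
    """Map signalling-question responses to a domain-level judgement.

    `responses` is a list of 'Y', 'PY', 'PN', 'N', or 'NI', each
    oriented (hypothetically) so that Y/PY means 'no bias concern'.
    'N' is treated as a definite problem; 'PN' and 'NI' as potential
    concerns that cannot be ruled out.
    """
    if "N" in responses:
        return "High risk"        # a definite bias problem in this domain
    if any(r in ("PN", "NI") for r in responses):
        return "Some concerns"    # potential concern or missing information
    return "Low risk"             # every answer suggests adequate methods
```

The real tool additionally weights some questions more heavily than others within a domain, which is why the published flowcharts, not a generic rule like this, should drive actual assessments.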
Making the Overall RoB 2 Judgement
After completing all five domains, the assessor must make an overall risk of bias judgement for that trial-outcome pair. The overall judgement follows a strict hierarchical rule: it is determined by the worst domain-level judgement across all five domains.
The rules are straightforward. If all five domains are judged Low risk, the overall judgement is Low risk of bias. If any domain is judged Some concerns and none are judged High risk, the overall judgement is Some concerns. If any domain is judged High risk, the overall judgement is High risk of bias. Multiple domains judged as Some concerns may also collectively warrant an overall High risk judgement if the assessor believes the combined concerns are substantial enough to meaningfully affect the result.
This hierarchical approach means that a single domain at high risk is sufficient to classify the entire trial result as high risk, even if the other four domains are at low risk. This reflects the reality that bias in any stage of the trial process can distort the result, regardless of how well other aspects were conducted.
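The hierarchical rule can be expressed compactly. A minimal sketch: note that elevating several "Some concerns" domains to an overall High risk is a review-team judgement, so the `concerns_threshold` parameter below is a hypothetical default, not part of the tool.

```python
def overall_judgement(domains, concerns_threshold=3):
    """Overall RoB 2 judgement from the five domain-level judgements.

    `domains` is a list of five judgement strings. The threshold at
    which multiple 'Some concerns' domains collectively warrant
    'High risk' is an assessor decision; `concerns_threshold` only
    illustrates one way a team might operationalize it.
    """
    if "High risk" in domains:
        return "High risk"            # the worst domain dominates
    n_concerns = domains.count("Some concerns")
    if n_concerns >= concerns_threshold:
        return "High risk"            # combined concerns judged substantial
    if n_concerns > 0:
        return "Some concerns"
    return "Low risk"
```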
Documentation is essential. For each domain and for the overall judgement, assessors should provide a brief free-text justification explaining the reasoning behind their judgement. This justification serves both as a record for the review team and as transparency for readers who wish to evaluate the basis for the risk of bias assessment.
Risk of bias assessment is not an optional add-on: it is an integral component of the systematic review itself, and without a properly conducted RoB 2 assessment the credibility of the entire evidence synthesis is compromised.
RoB 2 vs the Original Cochrane RoB Tool
The transition from the original Cochrane risk of bias tool (Higgins et al., 2008) to RoB 2 represents a substantial methodological improvement. Understanding the differences helps researchers who are familiar with the older tool adapt to the revised framework.
| Feature | Original RoB (2008) | RoB 2 (2019) |
|---|---|---|
| Structure | 7 items (sequence generation, allocation concealment, blinding of participants/personnel, blinding of outcome assessment, incomplete outcome data, selective reporting, other bias) | 5 domains with structured signalling questions |
| Response options | Low risk, High risk, Unclear risk | Low risk, Some concerns, High risk |
| Ambiguity handling | "Unclear" category widely overused | "Unclear" eliminated; NI response triggers Some concerns |
| Assessment level | Study level (often) | Outcome level (required) |
| Effect of interest | Not specified | Must specify assignment vs. adhering effect |
| Judgement process | Subjective item-level ratings | Algorithm-driven from signalling questions |
| Free-text justification | Optional | Required for transparency |
The elimination of the "unclear" judgement category addresses one of the most widely criticized aspects of the original tool. Studies using the original RoB tool frequently reported 30-50% of items as "unclear," making it difficult to draw meaningful conclusions about the overall quality of the evidence base. RoB 2 forces a more definitive evaluation by structuring the assessment through specific, answerable questions.
RoB 2 also requires assessors to specify their effect of interest before beginning the assessment, which determines how Domain 2 is evaluated. The original tool did not make this distinction explicit, leading to inconsistent assessments of the same trial depending on the assessor's implicit assumptions.
For reviews transitioning from the original tool to RoB 2 in an update, the Cochrane Handbook recommends re-assessing all included studies using RoB 2 rather than attempting to map old judgements to the new framework. The structural differences between the two tools make direct mapping unreliable.
Presenting RoB 2 Results in Your Systematic Review
Effective presentation of RoB 2 results is required by the PRISMA 2020 reporting guideline and essential for readers to evaluate the credibility of your systematic review. Two standard visualization formats have become the norm: traffic light plots and summary bar charts.
A traffic light plot displays each included study as a row with colored circles indicating the domain-level judgement for each of the five domains and the overall judgement. Green indicates Low risk, yellow indicates Some concerns, and red indicates High risk. This format allows readers to quickly identify which studies and which domains are driving bias concerns across the evidence base.
The robvis R package (McGuinness and Higgins, 2021) generates both traffic light plots and weighted summary bar charts directly from RoB 2 assessment data. You export your completed assessments from the RoB 2 Excel template, import them into robvis, and produce publication-ready figures with a few lines of code. The robvis package has become the standard tool for RoB 2 visualization in Cochrane and non-Cochrane reviews alike.
Summary bar charts show the proportion of studies at each risk of bias level for each domain. This format provides an overview of the most problematic domains across the entire evidence base. If 80% of studies are at high risk for Domain 5 (selective reporting), this signals a systematic concern that should be addressed in the Discussion section.
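The proportions behind such a bar chart are simple to compute. A minimal sketch, assuming assessments are stored as dicts mapping domain names to judgements (a hypothetical layout chosen for illustration; robvis expects its own Excel-template format):

```python
from collections import Counter

LEVELS = ("Low risk", "Some concerns", "High risk")

def domain_proportions(assessments, domain):
    """Proportion of studies at each judgement level for one domain.

    `assessments` is a list of dicts, one per study, mapping domain
    names (e.g. 'D5') to judgement strings.
    """
    counts = Counter(a[domain] for a in assessments)
    total = sum(counts.values())
    return {level: counts.get(level, 0) / total for level in LEVELS}
```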
PRISMA 2020 requires that systematic reviews present individual study risk of bias assessments and describe how the results were used in evidence synthesis. This means your RoB 2 figures should appear in the Results section, and your Discussion should explicitly address how studies at high risk of bias affect the overall conclusions. Sensitivity analyses that exclude high-risk studies provide additional context for interpreting the robustness of your findings.
When preparing your risk of bias figures, ensure that the judgement rationale is available as supplementary material. Readers and peer reviewers increasingly expect access to the completed RoB 2 forms, not just the summary figures, to evaluate the transparency and reproducibility of your assessment. Our risk of bias assessment tool can help you organize domain-level judgements for each included study.
How RoB 2 Feeds Into GRADE
The GRADE framework assesses certainty of evidence across five domains, and risk of bias is the first domain evaluated. RoB 2 results provide the direct input for GRADE Domain 1, making the connection between study-level bias assessment and body-of-evidence certainty ratings explicit and traceable.
GRADE rates the certainty of evidence as High, Moderate, Low, or Very Low. For a body of evidence from RCTs, the starting rating is High certainty, and risk of bias can lead to downgrading by one or two levels. If most studies contributing to the pooled estimate are at low risk of bias, no downgrade is warranted. If an important share of studies have Some concerns or are at high risk, and the bias could plausibly affect the estimate, a one-level downgrade is appropriate. If most studies are at high risk, or the likely direction of bias would change the interpretation of the result, a two-level downgrade may apply.
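As a rough sketch of how pooled RoB 2 results might inform the downgrading decision: this is an illustrative heuristic only, since GRADE downgrading is a judgement rather than an algorithm, and the 50% cut-offs below are assumptions, not GRADE rules.

```python
def suggested_rob_downgrade(overall_judgements):
    """Suggest a starting point (0, 1, or 2 levels) for the GRADE
    risk-of-bias downgrade from per-study overall RoB 2 judgements.

    The proportion cut-offs are illustrative assumptions; real GRADE
    decisions also weigh study size, direction of bias, and each
    study's contribution to the pooled estimate.
    """
    n = len(overall_judgements)
    frac_high = overall_judgements.count("High risk") / n
    frac_concerns = overall_judgements.count("Some concerns") / n
    if frac_high >= 0.5:
        return 2        # most studies at high risk: consider two levels
    if frac_high > 0 or frac_concerns >= 0.5:
        return 1        # serious concerns: consider one level
    return 0            # predominantly low risk: no downgrade
```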
The GRADE framework assesses certainty of evidence by integrating risk of bias with four additional domains: inconsistency, indirectness, imprecision, and publication bias. RoB 2 feeds directly into the first domain, and the quality of your RoB 2 assessment determines the credibility of your GRADE rating. A superficial RoB 2 assessment that rates everything as Low risk without adequate justification will undermine the entire GRADE process.
For practical guidance on applying GRADE after completing your risk of bias assessment, see our GRADE framework guide. The integration between RoB 2 and GRADE demonstrates why rigorous bias assessment is not just a methodological checkbox but a foundational step that shapes the conclusions of your entire systematic review.
Risk of bias is GRADE Domain 1: it is evaluated first because even the most precise, consistent, and direct evidence is unreliable if the underlying studies are methodologically flawed. This is why Cochrane requires RoB 2 assessment for all included RCTs before GRADE ratings are assigned.
Common RoB 2 Assessment Mistakes
Even experienced systematic reviewers make errors when applying the RoB 2 tool. Recognizing these common mistakes helps you produce more accurate and defensible assessments.
Conflating RoB 2 with ROBINS-I. RoB 2 is exclusively for randomized controlled trials. ROBINS-I is the companion tool for non-randomized studies of interventions (cohort studies, case-control studies, interrupted time series). Applying the wrong tool to the wrong study design produces meaningless results. If your systematic review includes both RCTs and observational studies, you must use both tools: RoB 2 for the trials and ROBINS-I for the non-randomized studies. For guidance on ROBINS-I, see our ROBINS-I assessment guide.
Treating "No Information" as Low risk. When a trial report does not provide sufficient detail to answer a signalling question, the correct response is "No Information", not "Probably Yes" or any affirmative response. Absence of reported information about randomization procedures does not mean randomization was adequate. It means you cannot confirm it was adequate, which typically leads to a "Some concerns" judgement. This is one of the most frequent errors and leads to systematically overoptimistic bias assessments.
Failing to specify the effect of interest. RoB 2 requires you to declare whether you are assessing the effect of assignment (intention-to-treat) or the effect of adhering to intervention (per-protocol) before beginning your assessment. The choice affects how Domain 2 is evaluated. Without this specification, your assessment lacks a clear evaluative framework and cannot be replicated.
Single-reviewer assessment. While not technically prohibited, single-reviewer RoB 2 assessment substantially reduces reliability. The Cochrane Handbook recommends dual-reviewer assessment with documented inter-rater agreement. On the widely used Landis and Koch scale, a Cohen's kappa of 0.61-0.80 indicates substantial agreement and values above 0.80 indicate almost-perfect agreement; many review teams aim for at least 0.80. Disagreements should be resolved through discussion or a third reviewer, with the resolution process documented.
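Cohen's kappa is straightforward to compute from two reviewers' judgements. A minimal sketch of the unweighted statistic (many teams prefer weighted kappa for the ordered Low / Some concerns / High scale, which this does not implement):

```python
def cohens_kappa(rater1, rater2):
    """Unweighted Cohen's kappa for two raters' categorical judgements.

    `rater1` and `rater2` are equal-length lists of judgement labels,
    one entry per assessed trial-outcome pair.
    """
    assert len(rater1) == len(rater2) and rater1
    n = len(rater1)
    # Observed agreement: fraction of items where the raters match.
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Expected agreement under independence of the two raters.
    categories = set(rater1) | set(rater2)
    expected = sum((rater1.count(c) / n) * (rater2.count(c) / n)
                   for c in categories)
    if expected == 1:        # both raters used a single category throughout
        return 1.0
    return (observed - expected) / (1 - expected)
```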
Ignoring outcome specificity. RoB 2 is designed for outcome-level assessment. A trial may be at low risk of bias for an objective outcome such as mortality but at high risk for a subjective outcome such as self-reported pain. Assessing bias at the study level rather than the outcome level, a common shortcut, produces inaccurate results and misleading GRADE ratings.
Not using the guidance document. The RoB 2 guidance document (Sterne et al., 2019) provides detailed explanations and worked examples for every signalling question. Assessors who skip the guidance document and rely solely on the summary tool frequently misinterpret questions, particularly in Domain 2 where the effect of interest determines the applicable signalling questions.
Inconsistent calibration between reviewers. Even with dual-reviewer assessment, disagreements are inevitable if reviewers have not calibrated their interpretation of the signalling questions. Piloting the assessment on three to five studies before beginning full assessment allows reviewers to discuss their reasoning, identify areas of disagreement, and establish shared standards for the remaining studies. This calibration step is essential for achieving acceptable inter-rater reliability.
Avoiding these mistakes requires discipline, familiarity with the guidance document, and a commitment to transparency in documenting your assessment rationale. For a broader perspective on risk of bias assessment across different study designs, see our complete risk of bias guide.