ROBINS-I assessment is the standard approach for evaluating risk of bias in non-randomized studies of interventions. Developed by Sterne and colleagues in 2016 and endorsed by Cochrane, the ROBINS-I tool provides a structured framework for judging whether a cohort study, case-control study, or other non-randomized design produces results that can be trusted for clinical and policy decisions. If your systematic review includes any non-randomized evidence, ROBINS-I is the tool you need.
ROBINS-I stands for Risk Of Bias In Non-randomized Studies of Interventions. Unlike simple quality checklists that assign points, ROBINS-I asks assessors to reason through seven specific bias domains and reach a judgement for each one. The overall risk of bias rating for a study is determined by its worst domain-level judgement. This makes ROBINS-I more rigorous and more demanding than older tools, but it also produces more transparent and reproducible assessments.
What Is ROBINS-I?
ROBINS-I is a Cochrane tool for assessing bias in cohort, case-control, and other non-randomized study designs (Sterne et al., 2016). It evaluates seven domains (confounding, selection of participants, classification of interventions, deviations from intended interventions, missing data, measurement of outcomes, and selection of the reported result), benchmarked against a hypothetical target trial.
The tool was published in the BMJ by Sterne, Hernan, Reeves, Savovic, Berkman, Viswanathan, and colleagues as part of an international effort to bring the same rigor to non-randomized study assessment that the Cochrane Risk of Bias tool (now RoB 2) brought to randomized controlled trials. Before ROBINS-I, reviewers relied on ad-hoc scales and checklists that lacked a coherent theoretical foundation. ROBINS-I changed that by grounding every judgement in a comparison against an ideal randomized experiment.
The tool applies to any study that compares two or more intervention groups using non-randomized data. This includes prospective and retrospective cohort studies, case-control studies, controlled before-after designs, interrupted time series with a comparison group, and cross-sectional studies comparing exposed and unexposed populations. It does not apply to single-arm studies, uncontrolled case series, or diagnostic accuracy studies.
ROBINS-I is recommended in the Cochrane Handbook Chapter 25 (Higgins et al., 2023) as the preferred tool whenever a systematic review includes non-randomized studies of interventions. Its results feed directly into GRADE assessments, where risk of bias is one of five domains that determine the overall certainty of evidence.
The Target Trial Framework
The most important concept in ROBINS-I is the target trial framework. Before assessing any study, you must specify the hypothetical randomized controlled trial that would ideally answer your review question. Every domain-level judgement then asks: how does this study deviate from that ideal experiment?
The target trial specification includes the eligible population, the experimental and comparator interventions, the randomization and allocation procedure, the follow-up period, the primary outcome and how it would be measured, and the analysis plan. You do not need a detailed protocol for a trial that will never run. You need enough specificity to anchor your bias judgements.
This framework improves upon ad-hoc checklists because it forces assessors to think causally. Rather than asking "did the study control for confounders?" in the abstract, ROBINS-I asks "what confounders would be balanced by randomization in the target trial, and did the study adequately adjust for them?" The target trial makes the standard of comparison explicit, reducing subjective disagreement between assessors.
You specify one target trial for your entire review, not one per included study. Every study is then evaluated against the same benchmark. This ensures consistency and allows meaningful comparison of bias judgements across studies. Write the target trial specification into your systematic review protocol before you begin data extraction.
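As a concrete sketch, the specification can be recorded as structured fields in the protocol. The field names below mirror the components listed above; every value is hypothetical and not drawn from any real review.

```python
# Hypothetical target trial specification for a review protocol.
# Field names follow the components described above; all concrete
# values are illustrative.
target_trial = {
    "eligible_population": "Adults aged 40-75 with newly diagnosed type 2 diabetes",
    "experimental_intervention": "Drug X, 10 mg daily",
    "comparator_intervention": "Usual care without Drug X",
    "assignment": "1:1 individual randomization with concealed allocation",
    "follow_up": "24 months from intervention start",
    "primary_outcome": "All-cause mortality at 24 months, from registry linkage",
    "analysis_plan": "Intention-to-treat; hazard ratio from a Cox model",
}

# Every included study is then assessed against this single benchmark.
```

Writing the specification in this explicit, field-by-field form makes it easy to reference each component during domain-level assessment.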
The target trial concept comes from the work of Hernan and Robins on causal inference. It recognizes that non-randomized studies are attempting to emulate a trial that was never conducted. The degree to which a study succeeds in that emulation, or fails, determines its risk of bias.
The 7 ROBINS-I Domains
ROBINS-I evaluates seven distinct sources of bias. Each domain is assessed through a series of signalling questions that guide the assessor toward a domain-level judgement. The domains are ordered to follow the chronological structure of a study, from the conditions present before intervention assignment through to the reporting of results.
Domain 1: Bias Due to Confounding
Confounding occurs when a prognostic factor that predicts the outcome also influences which intervention a participant receives. In a randomized trial, randomization balances measured and unmeasured confounders across groups. In a non-randomized study, this balance is absent by default.
The signalling questions ask whether the study identified relevant confounders, whether those confounders were measured validly, and whether appropriate analytical methods (such as regression adjustment, propensity score matching, or instrumental variable analysis) were used to control for them. A study that provides no adjustment for key confounders will typically receive a serious or critical risk of bias judgement for this domain.
Confounding is the domain where non-randomized studies are most vulnerable. Even well-designed observational studies with rigorous adjustment can only control for measured confounders; residual and unmeasured confounding remain threats. For this reason, a low risk of bias rating for confounding requires strong justification, for example a design in which the intervention is assigned at a clear threshold (creating a natural experiment) or an analysis that uses an instrumental variable approach.
Domain 2: Bias in Selection of Participants
Selection bias in ROBINS-I refers to bias that arises when participant inclusion in the study is related to both the intervention and the outcome. This is not the same as external validity or generalizability. It concerns whether the selection process created a distorted comparison between intervention groups.
For example, if a cohort study excludes participants who experienced the outcome before the study start date but does so differentially between groups, the resulting comparison is biased. Similarly, if participants are selected into the study based on their intervention status and their outcome status simultaneously, the analysis will produce misleading estimates.
The signalling questions ask whether the start of follow-up coincided with intervention assignment, whether selection was related to the intervention and outcome, and whether appropriate adjustments were made for any selection-related biases.
Domain 3: Bias in Classification of Interventions
Misclassification of intervention status occurs when the study incorrectly assigns participants to intervention groups. In a well-conducted RCT, intervention assignment is unambiguous. In non-randomized studies, intervention status may be determined from medical records, self-report, or administrative databases, each of which introduces the possibility of error.
This domain asks whether intervention status was well-defined and whether classification was based on information collected at or before the time of intervention. Differential misclassification, where errors in classification are related to the outcome, is particularly problematic because it can bias the effect estimate in either direction.
Domain 4: Bias Due to Deviations From Intended Interventions
Once participants are assigned to an intervention group, they may not receive or adhere to that intervention as intended. Deviations can include switching between groups, co-interventions that differ between groups, or non-adherence to the assigned treatment protocol.
ROBINS-I distinguishes between two types of effect: the effect of assignment to intervention (analogous to intention-to-treat analysis in a trial) and the effect of starting and adhering to intervention (analogous to per-protocol analysis). The signalling questions differ depending on which effect your review targets.
For the effect of assignment, the key concern is whether deviations from intended interventions occurred and whether they were balanced across groups. For the effect of adherence, the questions ask whether appropriate statistical methods (such as inverse probability weighting) were used to account for deviations.
Domain 5: Bias Due to Missing Data
Missing data can bias results when the proportion of participants with missing outcome data is substantial and when missingness is related to the true value of the outcome. In a well-conducted trial, missing data are minimized through active follow-up and reported transparently. In non-randomized studies, missing data may be more prevalent and less well-documented.
The signalling questions ask whether outcome data were available for all or nearly all participants, whether the proportion of missing data was similar across intervention groups, and whether the study used appropriate methods to handle missing data (such as multiple imputation). A study that excludes 30% of participants without explanation or analysis of the impact will typically receive a serious risk of bias judgement for this domain.
Domain 6: Bias in Measurement of Outcomes
Bias in outcome measurement occurs when the method of measuring the outcome is influenced by knowledge of the intervention received. In a blinded RCT, outcome assessors do not know which treatment participants received. In most non-randomized studies, blinding is absent.
The signalling questions ask whether the outcome measure was objective or subjective, whether assessors were blinded to intervention status, and whether measurement methods were comparable across groups. Objective outcomes (such as mortality or laboratory values) are less susceptible to measurement bias than subjective outcomes (such as pain scales or functional assessments), even without blinding.
Domain 7: Bias in Selection of the Reported Result
Reporting bias at the study level occurs when the reported result is selected from among multiple analyses, outcomes, or subgroups based on the direction or magnitude of the effect. A study that measures an outcome at multiple time points but reports only the most favorable one introduces selection bias in the reported result.
The signalling questions ask whether the outcome and analysis plan were pre-specified, whether multiple outcome measures or analytical methods were used, and whether there is evidence that the reported result was selected on the basis of the findings. Pre-registration of the study protocol provides protection against this domain, though it is less common in non-randomized research than in randomized trials.
Judgement Categories and the Overall ROBINS-I Assessment Rating
Each domain receives one of five judgement categories. These categories are more granular than the "low, some concerns, high" scale used in RoB 2 for randomized trials, reflecting the additional complexity of assessing non-randomized evidence.
| Judgement | Meaning | Implication |
|---|---|---|
| Low risk of bias | Comparable to a well-conducted RCT for this domain | No concern about bias from this source |
| Moderate risk of bias | Sound for a non-randomized study but cannot be considered comparable to a well-conducted RCT | Some concern, but unlikely to seriously alter results |
| Serious risk of bias | Important problems exist in this domain | Results may be meaningfully biased |
| Critical risk of bias | The study is too problematic in this domain to provide useful evidence | Results should not be used for the comparison of interest |
| No Information | Insufficient reporting to make a judgement | Cannot determine whether bias exists |
The overall risk of bias judgement for a study is determined by the most severe domain-level judgement. If any single domain is rated as critical, the overall judgement is critical. If the worst domain-level rating is serious, the overall rating is serious. This conservative approach reflects the principle that a chain is only as strong as its weakest link: a study with excellent outcome measurement but no confounding adjustment still produces biased estimates.
When multiple domains receive a moderate risk of bias rating, the overall judgement may be elevated to serious if the combined effect of moderate concerns across several domains is judged to substantially lower confidence in the result. This requires assessor judgement and should be documented transparently.
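The worst-domain rule, including the handling of No Information and the optional elevation of several Moderate ratings, can be sketched as a short function. The severity ordering comes from the table above; the function name and the elevation threshold are illustrative choices, since ROBINS-I leaves that elevation to assessor judgement.

```python
# Severity ordering from least to most severe, per the judgement table above.
SEVERITY = ["Low", "Moderate", "Serious", "Critical"]

def overall_robins_i(domain_judgements, moderate_elevation_threshold=3):
    """Derive an overall ROBINS-I rating from the seven domain judgements.

    Implements the worst-domain rule. Elevating multiple Moderate ratings
    to Serious is an assessor judgement; the threshold here is illustrative.
    """
    informative = [j for j in domain_judgements if j != "No Information"]
    worst_idx = max((SEVERITY.index(j) for j in informative), default=-1)
    # A "No Information" domain blocks a favourable overall rating, but a
    # Serious or Critical domain still dominates the overall judgement.
    if len(informative) < len(domain_judgements) and worst_idx < SEVERITY.index("Serious"):
        return "No Information"
    worst = SEVERITY[worst_idx]
    if worst == "Moderate" and informative.count("Moderate") >= moderate_elevation_threshold:
        return "Serious"  # combined moderate concerns judged substantial
    return worst
```

For example, a study rated Low in six domains and Serious in one receives an overall Serious rating under this rule.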
It is important to distinguish between "No Information" and "Low risk of bias." Studies that fail to report relevant details should not receive favorable ratings by default. A study that does not describe its confounding adjustment strategy receives a "No Information" rating for the confounding domain, not a "Low risk" rating. Treating incomplete reporting as low risk inflates confidence in the evidence.
ROBINS-I vs RoB 2: When to Use Which
The choice between ROBINS-I and RoB 2 depends entirely on the study design, not on reviewer preference. RoB 2 is designed for individually randomized trials and cluster-randomized trials. ROBINS-I is designed for non-randomized studies that compare two or more intervention groups. Using the wrong tool produces meaningless results.
| Feature | ROBINS-I | RoB 2 |
|---|---|---|
| Study designs | Cohort, case-control, cross-sectional, controlled before-after | Individually randomized, cluster-randomized trials |
| Number of domains | 7 | 5 |
| Judgement categories | Low, Moderate, Serious, Critical, NI | Low, Some Concerns, High |
| Target trial required | Yes | No (randomization provides the benchmark) |
| Confounding domain | Yes (Domain 1) | Not applicable |
| Assessment time per study | 20–40 minutes | 15–25 minutes |
| Overall judgement rule | Worst domain determines overall | Algorithm-based overall judgement |
ROBINS-I is inherently more demanding than RoB 2 because non-randomized studies face threats that randomization eliminates. The confounding domain alone adds a layer of complexity that does not exist in RoB 2. Additionally, the five-category judgement scale in ROBINS-I (compared to three in RoB 2) requires more nuanced reasoning.
When a systematic review includes both randomized and non-randomized studies, you should use RoB 2 for the RCTs and ROBINS-I for the non-randomized studies. The results can then be synthesized using a framework like GRADE, where both tools feed into the risk of bias domain of the evidence certainty assessment. For detailed guidance on RoB 2, see our RoB 2 assessment guide. For a broader overview of risk of bias across study designs, see our complete risk of bias guide.
Presenting ROBINS-I Results
Clear presentation of ROBINS-I results is essential for transparency and reproducibility. Three approaches are commonly used: traffic light plots, summary bar charts, and narrative synthesis tables.
Traffic light plots display the domain-level judgement for each study in a color-coded matrix. Green indicates low risk, yellow indicates moderate risk, red indicates serious risk, and dark red indicates critical risk. Grey indicates no information. The robvis R package (McGuinness & Higgins, 2020) generates publication-ready traffic light plots from a structured dataset. These plots provide an immediate visual overview of where the evidence is strong and where it is compromised.
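The colour convention can be expressed as a simple lookup over the structured data such tools consume. The sketch below is only an illustration of that mapping; the names are not the robvis interface, which applies these colours automatically.

```python
# Traffic light colour convention described above. The function name and
# data layout are illustrative, not the robvis API.
COLOURS = {
    "Low": "green",
    "Moderate": "yellow",
    "Serious": "red",
    "Critical": "darkred",
    "No Information": "grey",
}

def traffic_light_row(study_name, domain_judgements):
    """One row of a traffic light matrix: the study name followed by the
    colour for each of the seven domains (D1-D7, in order)."""
    return [study_name] + [COLOURS[j] for j in domain_judgements]
```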
Summary bar charts aggregate the proportion of studies at each risk of bias level across domains. They show, for example, that 80% of included studies have serious or critical risk of confounding bias while 60% have low risk of outcome measurement bias. These charts help identify systematic weaknesses in the evidence base.
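The aggregation behind such a chart is straightforward to sketch; the studies, domains, and judgements below are entirely hypothetical.

```python
from collections import Counter

# Hypothetical domain-level judgements: one dict per study, mapping a
# domain name to its ROBINS-I judgement. Real reviews would carry all
# seven domains; two are shown for brevity.
assessments = {
    "Study A": {"Confounding": "Serious", "Outcome measurement": "Low"},
    "Study B": {"Confounding": "Critical", "Outcome measurement": "Low"},
    "Study C": {"Confounding": "Serious", "Outcome measurement": "Moderate"},
    "Study D": {"Confounding": "Moderate", "Outcome measurement": "Low"},
    "Study E": {"Confounding": "Serious", "Outcome measurement": "Serious"},
}

def judgement_proportions(assessments, domain):
    """Proportion of studies at each risk-of-bias level for one domain."""
    counts = Counter(study[domain] for study in assessments.values())
    n = len(assessments)
    return {level: count / n for level, count in counts.items()}

props = judgement_proportions(assessments, "Confounding")
# Here 4 of 5 studies (80%) are Serious or Critical for confounding.
```

These per-domain proportions are exactly the data a summary bar chart plots, one stacked bar per domain.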
Narrative synthesis tables combine the visual judgements with text explanations for each domain-level rating. This is particularly important when judgements are contentious or when the reasoning behind a rating is not obvious from the study alone. The Cochrane Handbook recommends documenting the rationale for each domain-level judgement, including references to specific study characteristics that informed the rating.
Your ROBINS-I results should be presented in the Results section of your systematic review, typically alongside the study characteristics table. They should also be referenced in the Discussion when interpreting findings, and in the GRADE evidence profile if you are conducting a formal certainty of evidence assessment. Use our ROBINS-I assessment tool to generate structured output that can be directly imported into your review.
Common ROBINS-I Assessment Mistakes
Even experienced reviewers make errors when applying ROBINS-I. Awareness of these common mistakes improves the reliability and credibility of your bias assessment.
Using ROBINS-I for randomized trials. ROBINS-I is exclusively for non-randomized studies. If a study randomized participants to intervention groups, use RoB 2 regardless of other design features. Applying ROBINS-I to an RCT produces inappropriate judgements because the confounding domain assumes the absence of randomization.
Ignoring the confounding domain. Some reviewers treat confounding as an afterthought, assigning moderate or low risk without examining whether the study adjusted for the key confounders specified in the target trial. Confounding is the most consequential domain for non-randomized studies. It demands careful attention to what confounders were measured, how they were controlled, and whether residual confounding remains plausible.
Not specifying the target trial. Assessing ROBINS-I domains without a clearly defined target trial removes the anchor that makes judgements consistent and reproducible. If two assessors have different mental models of the ideal experiment, they will reach different domain-level judgements. Write the target trial into your protocol and reference it explicitly during assessment.
Treating No Information as low risk. When a study does not report enough detail to assess a domain, the correct rating is No Information, not Low risk. Absent information about confounding adjustment is not evidence of adequate confounding adjustment. This mistake inflates the apparent quality of poorly reported studies and undermines the integrity of the bias assessment.
Assessing domains in isolation. While each domain is judged separately, assessors should consider interactions between domains. A study with serious missing data and serious outcome measurement bias may have compounding problems that are worse than either domain alone. The overall judgement should reflect these interdependencies.
Failing to calibrate across studies. The same standard should apply to every study in your review. If you rate one study as having serious risk of confounding because it did not adjust for age, you must apply the same judgement to every study that fails to adjust for age. Inconsistent calibration reduces the credibility of your assessment and invites criticism during peer review.
Conflating ROBINS-I with the Newcastle-Ottawa Scale. ROBINS-I and the Newcastle-Ottawa Scale serve different purposes. ROBINS-I assesses risk of bias in intervention studies through a domain-based, judgement-driven process. The Newcastle-Ottawa Scale provides a simpler, points-based quality assessment suitable for observational studies that are not focused on intervention effects. Choosing the right tool depends on your review question and the type of studies included.
Accurate ROBINS-I assessment requires training, practice, and dual-reviewer consensus. Disagreements between reviewers should be resolved through discussion or by consulting a third reviewer. Document all judgements and their rationale so that your assessment is transparent and auditable. The structured approach that ROBINS-I provides, when applied correctly, produces risk of bias assessments that strengthen the credibility of your systematic review and provide a solid foundation for evidence-based conclusions.