Assess risk of bias and applicability concerns in diagnostic test accuracy studies using the QUADAS-2 framework (Whiting et al., 2011), the recommended tool for Cochrane DTA reviews.
Add your diagnostic accuracy studies and enter their names. Click each colored circle to cycle through judgments: + Low, − High, ? Unclear, N/A. The table has two sections: Risk of Bias (4 domains) and Applicability Concerns (3 domains). Domain 4 (Flow and Timing) has no applicability assessment per the QUADAS-2 framework.
Load sample data to see how the tool works, or clear all fields to start fresh.
| Study | D1 Patient Selection | D2 Index Test | D3 Reference Standard | D4 Flow and Timing | A1 Patient Selection | A2 Index Test | A3 Reference Standard |
|---|---|---|---|---|---|---|---|

D1–D4: Risk of Bias domains. A1–A3: Applicability Concerns domains.
QUADAS-2 (Whiting PF et al., 2011) • Generated with Research Gold
Click Add Study to create a row for each primary diagnostic accuracy study included in your systematic review. Enter study identifiers (e.g., Author Year) so the traffic light table clearly maps to your PRISMA flow diagram and data extraction forms.
Assess whether consecutive or random enrollment was used, whether a case-control design was avoided, and whether inappropriate exclusions were applied. Assign Low, High, or Unclear risk of bias, then rate the applicability of the patient population to your review question.
Evaluate whether the index test results were interpreted without knowledge of the reference standard results and whether a pre-specified threshold was used. Consider whether the conduct of the index test, its technology, or its interpretation differs from your review question for applicability.
Determine whether the reference standard correctly classifies the target condition and whether its results were interpreted without knowledge of the index test results. Assess applicability by considering whether the reference standard definition matches your review question.
Evaluate whether all patients received the same reference standard, whether all patients were included in the analysis, and whether the interval between the index test and reference standard was appropriate. This domain has no applicability concern rating.
Review the traffic light plot showing per-study, per-domain judgments and the summary bar chart showing proportions at each risk level. Download both as high-resolution PNGs for your manuscript and copy the auto-generated methods paragraph.
Need this done professionally? Get a complete systematic review or meta-analysis handled end-to-end.
Get a Free Quote

QUADAS-2 produces seven separate judgments per study: four risk of bias ratings (Patient Selection, Index Test, Reference Standard, Flow and Timing) and three applicability concern ratings (Patient Selection, Index Test, Reference Standard). Flow and Timing has no applicability rating because it relates to study conduct rather than relevance to the review question.
Each domain includes signaling questions that prompt assessors to consider specific methodological features before assigning a judgment. These questions are answered Yes, No, or Unclear, and the pattern of answers informs the overall domain judgment. This structured approach improves inter-rater reliability compared to global assessments without guidance.
QUADAS-2 does not prescribe a formal algorithm for an overall risk of bias judgment across all domains. Most systematic reviews classify a study as high overall risk if any single domain is rated high. However, review teams should pre-specify their decision rule and apply it consistently across all included studies.
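The common "worst domain" decision rule described above can be sketched in a few lines of Python. This is an illustration only — QUADAS-2 itself prescribes no algorithm, and the study judgments below are hypothetical; pre-specify whatever rule your protocol adopts.

```python
# Illustrative "worst domain" rule for an overall risk-of-bias judgment.
# QUADAS-2 does not mandate this rule; the example judgments are hypothetical.

RISK_DOMAINS = ["Patient Selection", "Index Test", "Reference Standard", "Flow and Timing"]

def overall_risk(judgments: dict) -> str:
    """High if any domain is High; Unclear if any domain is Unclear
    (and none High); otherwise Low."""
    ratings = [judgments[d] for d in RISK_DOMAINS]
    if "High" in ratings:
        return "High"
    if "Unclear" in ratings:
        return "Unclear"
    return "Low"

study = {  # hypothetical judgments for one included study
    "Patient Selection": "Low",
    "Index Test": "Unclear",
    "Reference Standard": "Low",
    "Flow and Timing": "Low",
}
print(overall_risk(study))  # → Unclear
```

Whatever rule you choose, the key requirement is that it is stated in the protocol before assessment begins and applied identically to every study.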
An Unclear judgment indicates insufficient information to determine whether bias is present. High proportions of Unclear ratings suggest inadequate reporting in the primary studies rather than absence of bias. When interpreting results, consider conducting sensitivity analyses that treat Unclear as either Low or High to test the impact on your conclusions.
QUADAS-2 findings directly inform sensitivity analyses in DTA meta-analysis. When pooling sensitivity and specificity using bivariate or HSROC models, restrict analyses to studies at low risk of bias to assess whether the pooled estimate changes. A substantial shift indicates that methodological quality is influencing the accuracy estimates.
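A minimal sketch of the restriction step, with invented 2x2 counts: compare a pooled estimate from all studies against one from studies rated low risk in every domain. Real analyses would refit a bivariate random-effects model (Reitsma et al., 2005) rather than summing counts across studies as done here for brevity.

```python
# Hypothetical sensitivity analysis: restrict pooling to studies judged
# low risk of bias in all QUADAS-2 domains. All counts are invented, and
# direct summing of 2x2 counts is a simplification for illustration only.

studies = [
    # (name, all_domains_low, TP, FP, FN, TN) — hypothetical data
    ("Author 2018", True,  45,  5,  5, 45),
    ("Author 2019", False, 50,  2,  0, 48),
    ("Author 2021", True,  40, 10, 10, 40),
]

def pooled_sens_spec(rows):
    tp = sum(r[2] for r in rows); fp = sum(r[3] for r in rows)
    fn = sum(r[4] for r in rows); tn = sum(r[5] for r in rows)
    return tp / (tp + fn), tn / (tn + fp)

all_sens, all_spec = pooled_sens_spec(studies)
low_only = [r for r in studies if r[1]]
low_sens, low_spec = pooled_sens_spec(low_only)
print(f"All studies:   sens={all_sens:.2f}, spec={all_spec:.2f}")
print(f"Low risk only: sens={low_sens:.2f}, spec={low_spec:.2f}")
```

If the two sets of estimates diverge substantially, report both and discuss the likely direction of bias.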
The summary bar chart shows what proportion of included studies are at Low, High, or Unclear risk for each domain. Domains where most studies are High or Unclear warrant particular attention in your discussion section and may justify downgrading certainty of evidence within the GRADE framework for diagnostic tests.
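The data behind such a summary bar chart is simply a per-domain tally. A short sketch, using hypothetical judgments for four studies:

```python
# Per-domain proportions of studies at each risk level, as plotted in a
# QUADAS-2 summary bar chart. Judgments are hypothetical.
from collections import Counter

judgments = {  # domain -> per-study risk-of-bias ratings (hypothetical)
    "Patient Selection":  ["Low", "Low", "High", "Unclear"],
    "Index Test":         ["Low", "Unclear", "Unclear", "High"],
    "Reference Standard": ["Low", "Low", "Low", "Low"],
    "Flow and Timing":    ["High", "High", "Low", "Unclear"],
}

proportions = {}
for domain, ratings in judgments.items():
    counts = Counter(ratings)
    n = len(ratings)
    proportions[domain] = {lvl: counts.get(lvl, 0) / n
                           for lvl in ("Low", "High", "Unclear")}
    print(domain, proportions[domain])
```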
QUADAS-2 (Whiting et al., 2011) organizes quality assessment into four domains: Patient Selection, Index Test, Reference Standard, and Flow and Timing. Each domain uses signaling questions to guide transparent judgments of risk of bias, while the first three domains also receive applicability concern ratings. This structured approach replaced the original 14-item QUADAS checklist (Whiting et al., 2003), which lacked a clear pathway from item responses to domain-level conclusions. PRISMA-DTA (McInnes et al., 2018) requires authors to present QUADAS-2 results for all included studies in diagnostic test accuracy systematic reviews.
The Patient Selection domain evaluates whether the study enrolled a consecutive or random sample and whether it avoided inappropriate exclusions that could distort accuracy estimates. Case-control designs, where known diseased patients are compared with known healthy controls, tend to overestimate diagnostic accuracy because spectrum effects are eliminated. The Index Test domain focuses on whether test interpretation was blinded to the reference standard result and whether a pre-specified threshold was applied. Post-hoc threshold selection inflates sensitivity and specificity by optimizing the cutoff to the available data rather than validating it prospectively.
The Reference Standard domain assesses whether the reference test correctly classifies the target condition and whether its results were interpreted independently of the index test. Incorporation bias, where the index test forms part of the reference standard, artificially inflates agreement between the two tests. The Flow and Timing domain evaluates whether all enrolled patients received the same reference standard and whether the time interval between tests was appropriate. Differential verification (using different reference standards for different patients) and excessive delay between tests can both distort accuracy estimates in unpredictable directions.
After completing your assessment, investigate the influence of study quality on pooled accuracy by conducting sensitivity analyses restricted to studies at low risk of bias. This is particularly important when constructing summary receiver operating characteristic (SROC) curves or pooling sensitivity and specificity using bivariate models (Reitsma et al., 2005). Calculate diagnostic accuracy metrics for individual studies using our diagnostic accuracy calculator. For randomized controlled trials, use the RoB 2 assessment tool, and for non-randomized intervention studies, use the ROBINS-I tool.
QUADAS-2 findings feed directly into GRADE assessments of diagnostic evidence (Schünemann et al., 2020, Cochrane Handbook, Chapter 8). A high proportion of studies at high risk of bias provides grounds for downgrading certainty by one or two levels, depending on the severity and consistency of the bias across included studies. Present both the traffic light table (per-study domain judgments) and the summary bar chart (proportion at each risk level per domain) in your manuscript, as recommended by Cochrane DTA guidelines.
Two independent reviewers should complete assessments for every study, with disagreements resolved by discussion or a third reviewer. Pilot the tailored signaling questions on 3 to 5 studies before full application to ensure consistent interpretation across your review team. Document any modifications to the standard signaling questions in your protocol and supplementary materials. Visualize your pooled diagnostic accuracy results alongside your QUADAS-2 findings using our forest plot generator to present sensitivity and specificity estimates with confidence intervals for each included study.
QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies, revised) is a structured tool for assessing risk of bias and applicability concerns in primary diagnostic accuracy studies included in systematic reviews. Developed by Whiting et al. (2011) and published in Annals of Internal Medicine, QUADAS-2 replaced the original QUADAS checklist (2003) with a domain-based framework that uses signaling questions to guide transparent, reproducible judgments. It is the recommended tool for all Cochrane diagnostic test accuracy (DTA) reviews.
QUADAS-2 assesses four domains: Patient Selection, Index Test, Reference Standard, and Flow and Timing. The first three domains are evaluated for both risk of bias and applicability concerns, producing seven separate judgments per study (four risk of bias ratings plus three applicability ratings). The fourth domain, Flow and Timing, is assessed only for risk of bias because it relates to the conduct of the study rather than the relevance of its findings to the review question.
Risk of bias refers to methodological flaws in how the study was designed, conducted, or analyzed that could distort estimates of diagnostic accuracy (for example, partial verification bias or differential verification). Applicability concerns refer to the degree to which the study matches the review question in terms of patient characteristics, index test conduct, or reference standard definition. A study can have low risk of bias but high applicability concern if it was well conducted but enrolled a population that differs meaningfully from the target population of the review.
Use QUADAS-2 when your systematic review evaluates the accuracy of a diagnostic or screening test (sensitivity, specificity, predictive values, likelihood ratios, AUC). Use RoB 2 when your review evaluates the effect of an intervention in randomized controlled trials. The two tools address fundamentally different study designs: QUADAS-2 is built for cross-sectional or cohort diagnostic accuracy studies, while RoB 2 is built for randomized experiments. If your review includes both diagnostic accuracy studies and intervention trials, apply the appropriate tool to each study type.
Report QUADAS-2 results using a traffic light table showing per-study, per-domain judgments (Low, High, or Unclear risk of bias and Low, High, or Unclear applicability concern) and a summary bar chart showing the proportion of studies at each risk level for each domain. PRISMA-DTA (McInnes et al., 2018) requires authors to present risk of bias and applicability results for all included studies. Include the completed assessment as a supplementary table and reference the QUADAS-2 citation (Whiting et al., 2011) in your methods section.
Yes. QUADAS-2 is appropriate for any study that evaluates the accuracy of a test against a reference standard, whether the test is used for diagnostic confirmation, screening, staging, monitoring, or prognosis. The key requirement is that the study reports data from which two-by-two tables (true positives, false positives, false negatives, true negatives) can be constructed. The signaling questions may need tailoring to the specific review question, and Cochrane encourages review teams to pilot the tailored tool on a subset of studies before full application.
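The standard accuracy metrics follow directly from the two-by-two table mentioned above. A minimal sketch with hypothetical counts:

```python
# Accuracy metrics derivable from a 2x2 table (TP, FP, FN, TN).
# The counts in the example call are hypothetical.
def accuracy_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    sens = tp / (tp + fn)          # true positive rate
    spec = tn / (tn + fp)          # true negative rate
    return {
        "sensitivity": sens,
        "specificity": spec,
        "PPV": tp / (tp + fp),     # positive predictive value
        "NPV": tn / (tn + fn),     # negative predictive value
        "LR+": sens / (1 - spec),  # positive likelihood ratio
        "LR-": (1 - sens) / spec,  # negative likelihood ratio
    }

print(accuracy_metrics(tp=90, fp=10, fn=10, tn=90))
```

Note that predictive values depend on prevalence in the study sample, which is one reason spectrum effects (Patient Selection domain) matter for applicability.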
Calculate sensitivity, specificity, positive and negative likelihood ratios, and AUC for your diagnostic accuracy studies using our diagnostic accuracy calculator. For randomized controlled trials, assess bias with the RoB 2 assessment tool. For non-randomized studies of interventions, use the ROBINS-I tool for non-randomized studies. Visualize your pooled diagnostic accuracy estimates with our forest plot generator for meta-analysis.
Reviewed by
Dr. Sarah Mitchell holds a PhD in Biostatistics from Johns Hopkins Bloomberg School of Public Health and has over 15 years of experience in systematic review methodology and meta-analysis. She has authored or co-authored 40+ peer-reviewed publications in journals including the Journal of Clinical Epidemiology, BMC Medical Research Methodology, and Research Synthesis Methods. A former Cochrane Review Group statistician and current editorial board member of Systematic Reviews, Dr. Mitchell has supervised 200+ evidence synthesis projects across clinical medicine, public health, and social sciences. She reviews all Research Gold tools to ensure statistical accuracy and compliance with Cochrane Handbook and PRISMA 2020 standards.
Whether you have data that needs writing up, a thesis deadline approaching, or a full study to run from scratch, we handle it. Average turnaround: 2-4 weeks.