Compute sensitivity, specificity, PPV, NPV, likelihood ratios, diagnostic odds ratio, Youden's index, accuracy, and F1 score from a 2×2 table with Wilson score 95% confidence intervals.
Enter the four cells of a 2×2 diagnostic table comparing index test results against the reference standard.
Se = TP/(TP+FN), Sp = TN/(TN+FP), PPV = TP/(TP+FP), NPV = TN/(TN+FN). Wilson score 95% CIs for all proportions.
Fill in the true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN) from your diagnostic study.
See sensitivity, specificity, PPV, NPV, likelihood ratios, DOR, Youden’s index, accuracy, prevalence, and F1 score computed instantly.
All proportions include Wilson score 95% CIs. Likelihood ratios and DOR use log-based CIs for correct coverage.
Copy all results to your clipboard for reporting, load the example data to explore, or reset to start fresh.
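The proportion formulas above, together with the Wilson score interval, can be sketched in a few lines of Python; the 2×2 cell counts here are purely illustrative:

```python
from math import sqrt

def wilson_ci(successes, n, z=1.96):
    """Wilson score 95% CI for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half, centre + half

# Hypothetical 2x2 table from a diagnostic study
tp, fp, fn, tn = 90, 10, 10, 90

se  = tp / (tp + fn)   # sensitivity
sp  = tn / (tn + fp)   # specificity
ppv = tp / (tp + fp)   # positive predictive value
npv = tn / (tn + fn)   # negative predictive value

lo, hi = wilson_ci(tp, tp + fn)
print(f"Se = {se:.2f} (95% CI {lo:.3f} to {hi:.3f})")
```

Unlike the naive Wald interval, the Wilson interval never extends outside [0, 1] and behaves sensibly when a cell count is zero or a proportion is near 100%, which is common in diagnostic studies.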
Increasing sensitivity typically decreases specificity and vice versa. The optimal balance depends on the clinical context: screening tests favor high sensitivity (to catch all cases), while confirmatory tests favor high specificity (to minimize false positives).
Unlike PPV and NPV, likelihood ratios do not depend on disease prevalence. LR+ > 10 and LR− < 0.1 provide strong diagnostic evidence (Jaeschke et al., 1994). This makes LRs more generalizable across populations than predictive values.
The diagnostic odds ratio combines sensitivity and specificity into one measure (DOR = (TP×TN)/(FP×FN)). A DOR of 1 means no discrimination. DOR is useful for meta-analysis of diagnostic tests but does not indicate the direction of the trade-off.
Even an excellent test (Se = 0.99, Sp = 0.99) has a PPV of only 50% when prevalence is 1%. Always report prevalence alongside PPV/NPV, or use likelihood ratios to communicate test performance across populations.
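That arithmetic follows directly from Bayes' theorem; a minimal sketch (the prevalence values are illustrative):

```python
def ppv_from_prevalence(se, sp, prev):
    """PPV via Bayes' theorem: P(disease | positive test)."""
    return se * prev / (se * prev + (1 - sp) * (1 - prev))

# An excellent test (Se = Sp = 0.99) at three prevalences
for prev in (0.01, 0.10, 0.50):
    print(f"prevalence {prev:.0%}: PPV = {ppv_from_prevalence(0.99, 0.99, prev):.1%}")
```

At 1% prevalence the PPV is exactly 50%: for every 10,000 people tested, the 99 true positives are matched by 99 false positives from the much larger disease-free group.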
A diagnostic accuracy calculator transforms a 2×2 contingency table — comparing index test results against a reference standard — into the full set of metrics that clinicians and systematic reviewers require. The sensitivity specificity calculator computes the probability of a positive test given disease is present (sensitivity) and the probability of a negative test given disease is absent (specificity). These paired metrics are the foundation of Cochrane Diagnostic Test Accuracy (DTA) Reviews (Deeks et al., 2023), which pool sensitivity and specificity separately using bivariate or hierarchical summary ROC models to account for the threshold effect — the inherent trade-off between sensitivity and specificity as the positivity threshold changes. The bivariate model (Reitsma et al., 2005) jointly estimates pooled sensitivity and specificity with their correlation, while the hierarchical summary ROC (HSROC) model (Rutter & Gatsonis, 2001) parameterizes the underlying ROC curve directly; both approaches are implemented in software such as RevMan, Meta-DiSc, and the R mada package.
The likelihood ratio calculator derives LR+ (sensitivity / (1 − specificity)) and LR− ((1 − sensitivity) / specificity) from the same table. Likelihood ratios express how much a test result changes the pre-test probability of disease: LR+ above 10 or LR− below 0.1 provides strong diagnostic evidence (Deeks & Altman, 2004). The diagnostic odds ratio (DOR = LR+ / LR−) is a single summary of test discriminative ability, while Youden's J index (sensitivity + specificity − 1) identifies the optimal threshold when multiple cut-points are evaluated. The Fagan nomogram provides a visual method for Bayesian updating: by drawing a line from the pre-test probability through the likelihood ratio, clinicians can read off the post-test probability directly, making LRs immediately actionable at the point of care. Predictive values — PPV and NPV — depend on disease prevalence in the tested population, which is why this tool requires the 2×2 cell counts rather than pre-computed sensitivity and specificity.
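Those ratio formulas reduce to a few lines; a sketch that assumes sensitivity and specificity have already been computed from the 2×2 table:

```python
def lr_summary(se, sp):
    """LR+, LR-, DOR, and Youden's J from sensitivity and specificity."""
    lr_pos = se / (1 - sp)    # how much a positive result raises the odds
    lr_neg = (1 - se) / sp    # how much a negative result lowers the odds
    return {"LR+": lr_pos, "LR-": lr_neg,
            "DOR": lr_pos / lr_neg, "J": se + sp - 1}
```

For example, a test with Se = Sp = 0.90 gives LR+ = 9, LR− ≈ 0.11, DOR = 81, and J = 0.80.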
This 2×2 table calculator connects directly to the broader evidence synthesis workflow. The same contingency table structure underlies our chi-square and Fisher's exact test calculator, which tests statistical association between test classification and disease status. When test-and-treat strategies are evaluated, likelihood ratios can be combined with baseline risk estimates in our NNT calculator to determine the number of patients who need to be tested and treated for one additional patient to benefit. For Bayesian approaches to updating diagnostic probabilities, our Bayes factor calculator quantifies evidence strength under competing hypotheses.
When reporting diagnostic accuracy in a systematic review, PRISMA-DTA (McInnes et al., 2018) requires presenting sensitivity and specificity for each study alongside 95% confidence intervals, paired forest plots (one for sensitivity, one for specificity), and if applicable, a summary ROC curve. Individual study forest plots can be generated using our forest plot generator. Always report the reference standard used, the spectrum of disease severity in the study population, and whether index test interpretation was blinded to the reference standard result — factors that the QUADAS-2 risk of bias tool evaluates for diagnostic accuracy studies. Spectrum bias (when the enrolled case mix differs from the target clinical population) and partial verification bias (when only test-positive patients receive the reference standard) are among the most common threats to DTA study validity and should be explicitly assessed and reported.
Sensitivity (true positive rate) is the probability that a test correctly identifies patients who have the disease: Se = TP/(TP+FN). Specificity (true negative rate) is the probability that a test correctly identifies patients who do not have the disease: Sp = TN/(TN+FP). A highly sensitive test rarely misses true cases (few false negatives), while a highly specific test rarely misclassifies healthy individuals (few false positives).
Likelihood ratios quantify how much a test result changes the probability of disease. A positive likelihood ratio (LR+) above 10 provides strong evidence to rule in a diagnosis, while an LR+ of 5–10 provides moderate evidence (Jaeschke et al., 1994). A negative likelihood ratio (LR−) below 0.1 provides strong evidence to rule out a diagnosis. LR values close to 1 indicate the test provides little diagnostic information.
Positive predictive value (PPV) is the probability that a person with a positive test truly has the disease. PPV depends not only on the test’s sensitivity and specificity but also on the prevalence (pre-test probability) of the disease in the population being tested. Even a highly specific test will have a low PPV when used in a low-prevalence population because most positive results will be false positives. This is why screening tests perform differently in different populations.
The diagnostic odds ratio (DOR = LR+/LR−) is a single summary measure that combines sensitivity and specificity into one number. It is particularly useful in meta-analyses of diagnostic test accuracy (DTA reviews) because it can be pooled across studies using standard meta-analytic methods. A DOR of 1 indicates the test has no discriminatory power, while higher values indicate better discriminatory ability. However, DOR does not convey the trade-off between sensitivity and specificity.
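Because the DOR's sampling distribution is approximately normal on the log scale, its confidence interval is computed there and back-transformed; a sketch using the standard log-scale standard error (the cell counts are hypothetical):

```python
from math import exp, log, sqrt

def dor_with_ci(tp, fp, fn, tn, z=1.96):
    """Diagnostic odds ratio with a log-scale 95% CI."""
    dor = (tp * tn) / (fp * fn)
    se_log = sqrt(1/tp + 1/fp + 1/fn + 1/tn)   # SE of ln(DOR)
    return dor, exp(log(dor) - z * se_log), exp(log(dor) + z * se_log)

dor, lo, hi = dor_with_ci(90, 10, 10, 90)
print(f"DOR = {dor:.0f} (95% CI {lo:.1f} to {hi:.1f})")
```

Note the asymmetry of the interval around the point estimate, which is expected for a ratio measure; a symmetric interval on the raw scale would have poor coverage and could extend below zero.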
Verification bias (also called work-up bias) occurs when not all patients receive the reference standard, and the decision to verify depends on the test result. This typically inflates sensitivity and deflates specificity. To address it, report whether all patients received the reference standard, consider using correction methods (e.g., Begg & Greenes adjustment), and flag potential bias in your quality assessment using tools like QUADAS-2. This calculator assumes all patients received both the index test and the reference standard.
Sensitivity is the probability of a positive test given the patient has the disease (TP/(TP+FN)), while PPV is the probability of disease given a positive test (TP/(TP+FP)). Sensitivity is a fixed test property, whereas PPV depends on disease prevalence. A test with 99% sensitivity can have a PPV below 10% in low-prevalence populations.
Multiply the pre-test odds by the likelihood ratio to get post-test odds, then convert back to probability. Pre-test odds = prevalence / (1 − prevalence). Post-test odds = pre-test odds × LR. Post-test probability = post-test odds / (1 + post-test odds). This Bayesian updating process is often visualized using a Fagan nomogram.
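Those three steps, written as a small helper (the function name is mine):

```python
def post_test_probability(pretest_prob, lr):
    """Update a pre-test probability with a likelihood ratio via odds."""
    pre_odds = pretest_prob / (1 - pretest_prob)   # probability -> odds
    post_odds = pre_odds * lr                      # Bayes update on the odds scale
    return post_odds / (1 + post_odds)             # odds -> probability

# e.g. 20% pre-test probability, positive result with LR+ = 9
p = post_test_probability(0.20, 9)
print(f"post-test probability = {p:.1%}")   # roughly 69%
```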
Youden’s index (J = sensitivity + specificity − 1) summarizes a test’s discriminative ability in a single number ranging from 0 (useless) to 1 (perfect). It is most useful when selecting an optimal cut-point on a ROC curve: the threshold that maximizes J balances sensitivity and specificity equally. For clinical decisions where one error type is costlier, weighted alternatives are preferred.
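Cut-point selection by maximizing J can be sketched as follows; the (threshold, Se, Sp) triples below are invented for illustration:

```python
# Hypothetical (threshold, sensitivity, specificity) at candidate cut-points
candidates = [
    (1.0, 0.90, 0.55),
    (2.0, 0.82, 0.71),
    (3.0, 0.70, 0.88),
    (4.0, 0.55, 0.95),
]

def youden_j(se, sp):
    """Youden's J statistic: Se + Sp - 1."""
    return se + sp - 1

best = max(candidates, key=lambda t: youden_j(t[1], t[2]))
print(f"optimal cut-point: {best[0]} (J = {youden_j(best[1], best[2]):.2f})")
```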
Our biostatisticians specialize in Cochrane DTA reviews, SROC curves, bivariate meta-analysis, and QUADAS-2 quality assessment for your diagnostic accuracy systematic review.