Why diagnostic test accuracy meta-analysis is statistically different
A meta-analysis of intervention trials pools a single effect estimate per study (a risk ratio, an odds ratio, a mean difference). A diagnostic test accuracy review must pool two paired estimates per study at once: sensitivity and specificity. These two indices are negatively correlated across studies because they share an underlying threshold: moving the cut-off that defines a positive test trades sensitivity for specificity. A standard univariate meta-analysis of either index alone ignores that correlation, which can bias the summary estimates and understates their uncertainty.
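The threshold trade-off can be made concrete with a small simulation. The sketch below uses hypothetical Gaussian test scores (diseased patients scoring higher on average, an assumption for illustration only) and shows that loosening the cut-off raises sensitivity at the cost of specificity:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical continuous test scores: diseased patients score higher on average.
diseased = rng.normal(loc=2.0, scale=1.0, size=10_000)
healthy = rng.normal(loc=0.0, scale=1.0, size=10_000)

def sens_spec(cutoff):
    """Sensitivity and specificity when 'score >= cutoff' defines a positive test."""
    sensitivity = np.mean(diseased >= cutoff)
    specificity = np.mean(healthy < cutoff)
    return sensitivity, specificity

lenient = sens_spec(0.5)  # lenient cut-off: high sensitivity, lower specificity
strict = sens_spec(1.5)   # strict cut-off: lower sensitivity, high specificity
```

Studies that sit at different points on this trade-off will show exactly the negative sensitivity-specificity correlation described above.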
The Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy resolves this with two equivalent statistical frameworks: the bivariate random-effects model (Reitsma 2005) and the hierarchical summary ROC model (Rutter and Gatsonis 2001). Both are recommended by the Cochrane Screening and Diagnostic Tests Methods Group. The 2010 chapter "Analysing and Presenting Results" by Macaskill, Gatsonis, Deeks, Harbord, and Takwoingi remains the methodological reference standard.
The bivariate random-effects model in practice
The bivariate model assumes that the logit-transformed sensitivity and the logit-transformed specificity from each study are drawn from a bivariate normal distribution with a between-study covariance term. The model produces:
- A pooled summary sensitivity and a pooled summary specificity, with confidence intervals that account for the within-study and between-study variance components.
- A summary ROC curve that traces the sensitivity-specificity trade-off implied by the fitted model across the threshold space, rather than a line joining the individual study points.
- A 95% confidence region around the summary point and a 95% prediction region for the next study's expected sensitivity-specificity pair.
- The between-study correlation, which is typically negative when studies used different positivity thresholds.
The Reitsma model is a generalised linear mixed model with a logit link and correlated random effects for each study's logit sensitivity and logit specificity. The R package mada (Doebler) fits the approximate normal-within-study version via its reitsma() function; an exact binomial-likelihood fit can be obtained with glmer() from lme4, and metafor can approximate the model with rma.mv() on the paired logit estimates. Stata users typically rely on metandi or midas.
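To show what the model operates on without invoking the R packages, here is a deliberately simplified, moment-based sketch in Python. The 2x2 counts are invented for illustration; a real analysis would fit the bivariate GLMM rather than average the logits, so treat the "summary" values here only as a picture of the inputs:

```python
import numpy as np

# Hypothetical per-study 2x2 counts: (TP, FN, FP, TN). Not real data.
studies = [
    (45, 5, 10, 90),
    (38, 12, 4, 96),
    (50, 10, 20, 80),
    (30, 6, 8, 72),
]

def logit(p):
    return np.log(p / (1 - p))

# Continuity-corrected logit sensitivity/specificity per study, the raw
# material of the approximate (normal-within-study) bivariate model.
logit_sens, logit_spec = [], []
for tp, fn, fp, tn in studies:
    sens = (tp + 0.5) / (tp + fn + 1)
    spec = (tn + 0.5) / (tn + fp + 1)
    logit_sens.append(logit(sens))
    logit_spec.append(logit(spec))

logit_sens = np.array(logit_sens)
logit_spec = np.array(logit_spec)

# Crude summary point: back-transform the mean logits (no variance weighting).
summary_sens = 1 / (1 + np.exp(-logit_sens.mean()))
summary_spec = 1 / (1 + np.exp(-logit_spec.mean()))

# Observed between-study correlation of the paired logits; often negative
# when positivity thresholds differ across studies.
corr = np.corrcoef(logit_sens, logit_spec)[0, 1]
```

The full model replaces the naive averaging with a bivariate normal distribution over the paired random effects, which is what yields the confidence and prediction regions listed above.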
When the HSROC model is preferred
The hierarchical summary ROC model (Rutter and Gatsonis 2001) is mathematically equivalent to the bivariate model when no covariates are included. It expresses test performance with three parameters: an accuracy parameter (equal to the log diagnostic odds ratio when the curve is symmetric), a threshold parameter, and a shape parameter that allows the SROC curve to be asymmetric. The HSROC parameterisation is preferred when:
- The included studies used heterogeneous threshold definitions and you want to model threshold variation explicitly.
- You need to compare two diagnostic tests within the same review using meta-regression of the accuracy parameter.
- The asymmetry of the SROC curve is clinically interpretable, for example when test performance degrades faster at higher specificity than at higher sensitivity.
For most reviews with a fixed test and a small number of threshold variants, the bivariate parameterisation is more interpretable and easier to communicate to clinical collaborators.
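The role of the shape parameter can be illustrated with the SROC curve implied by the Rutter-Gatsonis formulation, logit(TPR) = Λ·exp(−β/2) + exp(−β)·logit(FPR), where Λ is the accuracy parameter and β the shape parameter. The sketch below (parameter values are illustrative, not from any fitted review) shows that β = 0 gives a symmetric curve with constant log diagnostic odds ratio, while β ≠ 0 does not:

```python
import numpy as np

def expit(x):
    return 1 / (1 + np.exp(-x))

def logit(p):
    return np.log(p / (1 - p))

def hsroc_curve(fpr, accuracy, shape):
    """SROC curve implied by the Rutter-Gatsonis model:
    logit(TPR) = accuracy * exp(-shape/2) + exp(-shape) * logit(FPR)."""
    return expit(accuracy * np.exp(-shape / 2) + np.exp(-shape) * logit(fpr))

fpr = np.linspace(0.01, 0.99, 99)
symmetric = hsroc_curve(fpr, accuracy=3.0, shape=0.0)   # beta = 0: symmetric SROC
asymmetric = hsroc_curve(fpr, accuracy=3.0, shape=0.8)  # beta != 0: asymmetric SROC
```

With β = 0 the vertical distance logit(TPR) − logit(FPR) is constant at Λ everywhere on the curve, which is exactly the fixed log diagnostic odds ratio; a nonzero β lets that distance vary along the curve.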
Threshold effects and what to do about them
A negative correlation between sensitivity and specificity across studies usually indicates that included studies used different positivity thresholds, either explicitly through different cut-off values or implicitly through different reader judgement criteria. Three pragmatic strategies address threshold variation:
- Pool only studies that used the same threshold if the literature is large enough to support this restriction. The pooled estimates are then directly clinically interpretable.
- Fit the HSROC model with the threshold parameter and report the SROC curve as the primary result, with a summary operating point only as a secondary estimate.
- Report a clinically defensible operating point chosen prospectively in the protocol, for example the threshold maximising the Youden index or minimising a weighted combination of false positives and false negatives.
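The third strategy reduces to a simple calculation once pooled estimates per candidate threshold are available. The sketch below uses invented (sensitivity, specificity) pairs and illustrative penalty weights; in a real review both the candidate thresholds and the weights would be fixed prospectively in the protocol:

```python
# Hypothetical candidate thresholds with pooled (sensitivity, specificity) pairs.
candidates = {
    0.5: (0.95, 0.60),
    1.0: (0.88, 0.75),
    1.5: (0.78, 0.86),
    2.0: (0.62, 0.93),
}

# Youden index J = sensitivity + specificity - 1 for each candidate threshold.
youden = {t: sens + spec - 1 for t, (sens, spec) in candidates.items()}
best_threshold = max(youden, key=youden.get)

# Weighted alternative: penalise false positives more heavily than false
# negatives (weights here are illustrative, not a recommendation).
w_fp, w_fn = 2.0, 1.0
cost = {t: w_fp * (1 - spec) + w_fn * (1 - sens)
        for t, (sens, spec) in candidates.items()}
cheapest_threshold = min(cost, key=cost.get)
```

Note that the Youden-optimal and cost-optimal thresholds need not coincide in general, which is precisely why the weighting scheme belongs in the protocol rather than in post hoc analysis.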
The Cochrane Handbook discourages the Q* statistic (the point on the SROC curve where sensitivity equals specificity) as a routine summary because it can misrepresent accuracy when the SROC curve is asymmetric or when study points cluster away from the sensitivity-equals-specificity diagonal.