A second reviewer for a systematic review is required by every major evidence synthesis guideline. The Cochrane Handbook (Higgins et al., 2023) states that at least two reviewers must independently screen titles, abstracts, and full texts to minimize selection bias. PRISMA 2020 requires authors to report how many reviewers screened records and how disagreements were resolved. The Joanna Briggs Institute (JBI) mirrors this standard, requiring independent dual screening across all stages of study selection, data extraction, and critical appraisal. If you are conducting a systematic review alone, you face a serious methodological gap that peer reviewers and journal editors will identify during submission.
What Happens When You Submit a Single-Reviewer Systematic Review
Journals that publish systematic reviews evaluate methodological rigor before scientific content. A single-reviewer systematic review raises immediate red flags during editorial screening, and many manuscripts never reach peer review as a result.
Desk rejection is the most common outcome. Editors at journals indexed in MEDLINE, Scopus, and Web of Science check whether the methods section describes independent dual screening with a documented conflict resolution process. When that description is absent, the manuscript is returned without review. Journals that follow the PRISMA 2020 reporting guideline (Page et al., 2021) expect item 8 to describe the selection process, including how many reviewers screened each record and whether they worked independently.
Peer reviewer criticism targets single-reviewer screening even when the manuscript passes editorial triage. Reviewers trained in evidence synthesis methodology will question the risk of selection bias, noting that a single reviewer introduces subjective judgment without any calibration check. They will typically recommend that the authors repeat screening with a second reviewer and report inter-rater agreement statistics.
Quality assessment tools penalize single-reviewer methods. The AMSTAR 2 critical appraisal tool (Shea et al., 2017) includes Item 5, which asks whether study selection was performed in duplicate. A "no" answer on this item downgrades the overall confidence rating of the review. Similarly, the ROBIS tool (Whiting et al., 2016) evaluates study selection within its assessment of bias in the review process.
The practical consequence is clear: investing months in a systematic review only to face rejection because of a single-reviewer design is an avoidable loss that proper planning eliminates from the start.
Why Dual Screening Reduces Bias in Study Selection
The requirement for a second reviewer is not bureaucratic. It addresses a well-documented source of error in evidence synthesis.
Confirmation bias causes a solo reviewer to favor studies that align with their hypothesis or prior expectations. When two reviewers screen independently, each serves as a check on the other's judgment, making it far more likely that borderline studies receive fair evaluation rather than reflexive exclusion.
Fatigue-related errors increase as the screening workload grows. A systematic review searching multiple databases can generate 2,000 to 15,000 records after deduplication. Screening thousands of titles and abstracts in a single session leads to declining attention, increased miss rates, and inconsistent application of inclusion and exclusion criteria. A second reviewer catches the studies that a fatigued primary reviewer overlooks.
Ambiguous eligibility decisions are inherent to systematic reviews. Studies that partially meet inclusion criteria, use non-standard outcome definitions, or employ mixed-methods designs create legitimate disagreement. Two independent reviewers surface these ambiguous cases for structured discussion rather than allowing one person's judgment call to determine study inclusion silently.
Reproducibility depends on dual screening. The Cochrane Handbook (Higgins et al., 2023) emphasizes that a systematic review should be reproducible by another team following the same protocol. When two reviewers document their independent decisions, the selection process becomes transparent and verifiable. A single reviewer's decisions are opaque by definition.
Research by Edwards et al. (2002) demonstrated that single-reviewer screening missed approximately 8 percent of relevant studies compared to dual-reviewer screening. In a review with 30 included studies, that represents two or three missing studies that could change the direction and magnitude of pooled effect estimates in a meta-analysis.
Inter-Rater Reliability: How to Measure and Report Screening Agreement
Inter-rater reliability quantifies how consistently two reviewers make the same screening decisions. Reporting this statistic is expected in the methods section of every systematic review, and calculating it correctly strengthens your manuscript.
Cohen's kappa is the standard measure of inter-rater agreement in systematic reviews. Introduced by Cohen (1960) and later extended to more than two raters by Fleiss (1971), kappa adjusts for agreement that would occur by chance alone. The statistic accounts for each reviewer's base rate of inclusion and exclusion, making it more informative than simple percent agreement.
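In its standard form,

\[
\kappa = \frac{p_o - p_e}{1 - p_e}
\]

where \(p_o\) is the observed proportion of records on which the two reviewers agree and \(p_e\) is the agreement expected by chance, calculated from each reviewer's marginal include and exclude rates.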
Kappa interpretation thresholds follow the scale published by Landis and Koch (1977). A kappa of 0.81 to 1.00 indicates almost perfect agreement. A kappa of 0.61 to 0.80 indicates substantial agreement, which most journals consider acceptable. A kappa of 0.41 to 0.60 indicates moderate agreement and signals the need for recalibration. Anything at or below 0.40 falls into the fair, slight, or poor bands and suggests fundamental problems with your eligibility criteria or reviewer training.
Calculate your kappa using our free Cohen's kappa calculator, which generates the statistic from a 2x2 contingency table of reviewer decisions and provides the interpretation alongside the raw value.
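If you prefer to check the arithmetic yourself, here is a minimal Python sketch that performs the same calculation from the four cells of the 2x2 decision table; the function names and the example counts are illustrative, not output from the calculator:

```python
def cohens_kappa(both_include, a_only, b_only, both_exclude):
    """Cohen's kappa for two reviewers from a 2x2 screening decision table."""
    n = both_include + a_only + b_only + both_exclude
    p_observed = (both_include + both_exclude) / n

    # Each reviewer's marginal inclusion rate
    a_include = (both_include + a_only) / n
    b_include = (both_include + b_only) / n

    # Chance-expected agreement from the marginals
    p_expected = a_include * b_include + (1 - a_include) * (1 - b_include)

    return (p_observed - p_expected) / (1 - p_expected)


def landis_koch_label(kappa):
    """Map kappa to the Landis and Koch (1977) interpretation bands."""
    if kappa > 0.80:
        return "almost perfect"
    if kappa > 0.60:
        return "substantial"
    if kappa > 0.40:
        return "moderate"
    if kappa > 0.20:
        return "fair"
    if kappa > 0.00:
        return "slight"
    return "poor"


# Illustrative screening result: 120 records included by both reviewers,
# 830 excluded by both, and 50 disagreements
k = cohens_kappa(both_include=120, a_only=30, b_only=20, both_exclude=830)
print(f"kappa = {k:.2f} ({landis_koch_label(k)})")  # kappa = 0.80 (substantial)
```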
When to calculate kappa depends on your screening volume. For reviews with fewer than 500 records, calculate kappa after completing all title-and-abstract screening. For larger reviews, calculate kappa after a pilot screening of 50 to 100 records, use the result to identify and resolve disagreements in criteria interpretation, then proceed with the remaining records. Report the pilot kappa and the final kappa separately in your methods section.
Percent agreement alone is insufficient. Two reviewers who exclude the same 950 of 1,000 records will achieve 95 percent agreement even if they disagree on every single included study. Kappa corrects for this prevalence-dependent inflation of raw agreement, which is why journals require it over simple percentage reporting.
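To make the inflation concrete, the short calculation below works through that scenario, assuming the 50 disagreements split evenly between the two reviewers (the split is illustrative; any split yields a kappa at or below zero):

```python
# Both reviewers exclude the same 950 of 1,000 records; the 50 remaining
# disagreements are split 25/25 between them (illustrative split).
n = 1000
p_observed = 950 / n                 # 95% raw agreement
a_include = 25 / n                   # reviewer A's inclusion rate
b_include = 25 / n                   # reviewer B's inclusion rate
p_expected = a_include * b_include + (1 - a_include) * (1 - b_include)
kappa = (p_observed - p_expected) / (1 - p_expected)
print(f"percent agreement = {p_observed:.0%}, kappa = {kappa:.3f}")
# percent agreement = 95%, kappa = -0.026
```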
Reporting template for your methods section: "Two reviewers [initials] independently screened all titles and abstracts against pre-defined eligibility criteria. Inter-rater agreement was measured using Cohen's kappa (kappa = [value], 95% CI [lower, upper]). Disagreements were resolved through discussion, with a third reviewer [initials] consulted when consensus could not be reached."