Data extraction in a systematic review is the structured process of collecting, recording, and organizing study-level information from every included primary study into a standardized form. It is the bridge between study selection and data synthesis, the step that transforms a set of screened articles into an analyzable dataset. When extraction is done well, your meta-analysis rests on a solid, reproducible foundation. When it is done poorly, no statistical method can rescue the results.
Data extraction is not casual reading and note-taking. It is a formalized, auditable procedure governed by a pre-designed form, executed independently by two reviewers, and verified through agreement statistics. The Cochrane Handbook for Systematic Reviews of Interventions devotes an entire chapter to the process (Chapter 5; Higgins et al., 2023) because extraction errors propagate directly into effect size calculations, subgroup analyses, and the conclusions that inform clinical practice and policy.
This guide covers every stage of the extraction process, from form design and piloting through dual-reviewer extraction, reliability measurement, missing data handling, and the transition from raw extracted data to analysis-ready datasets. As a working rule, pilot your extraction form on the first 3-5 included studies before full extraction begins, and refine it iteratively until reviewer agreement reaches acceptable thresholds.
What Is Data Extraction in a Systematic Review
Data extraction, sometimes called data collection or data abstraction, is the systematic process of identifying, recording, and coding relevant information from each study that passed full-text screening. Every included study is read in full, and predetermined data points are transferred from the published report into a structured extraction form.
The purpose is twofold. First, extraction captures the raw material needed for quantitative synthesis: sample sizes, means, standard deviations, odds ratios, confidence intervals, and other statistical parameters that feed into meta-analytic models. Second, extraction records the qualitative and contextual information needed to interpret those numbers: study design, setting, population characteristics, intervention details, outcome definitions, and follow-up duration.
A systematic review is only as reliable as the data extracted from its primary studies. If a reviewer misreads a sample size, transposes digits in a standard deviation, or records a subgroup result as the overall result, the downstream effect size estimate will be wrong. Dual-reviewer extraction exists precisely to catch these errors before they enter the analysis pipeline.
The extraction stage typically begins immediately after study selection is finalized, once the PRISMA flow diagram confirms the number of included studies and full-text exclusion reasons have been documented. For a step-by-step overview of the entire review process, see our guide on how to write a systematic review.
What to Extract: The Five Core Domains
A well-designed extraction form captures information across five domains. The specific fields within each domain depend on your review question, but the domains themselves are consistent across nearly all systematic reviews.
| Domain | What to Record | Example Fields |
|---|---|---|
| Study Characteristics | Metadata about the study itself | First author, year, journal, country, funding source, study design (RCT, cohort, cross-sectional), registration number |
| Population | Who was studied | Sample size (total and per arm), age (mean, SD, or median/IQR), sex distribution, diagnosis or condition, inclusion/exclusion criteria, setting (hospital, community, school) |
| Intervention / Exposure | What was done to the experimental group | Intervention type, dose, frequency, duration, delivery mode, comparator/control description, co-interventions |
| Outcomes | What was measured | Primary and secondary outcome definitions, measurement instruments, time points, outcome data (means, SDs, proportions, event counts), direction of effect |
| Effect Sizes and Precision | The numbers for meta-analysis | Point estimates (mean difference, odds ratio, hazard ratio), confidence intervals, p-values, sample sizes per group at each time point |
Beyond these five domains, most reviews also extract information needed for risk of bias assessment: randomization method, allocation concealment, blinding, attrition, and selective outcome reporting. Some teams extract risk of bias data on the same form; others use a separate tool. For a comprehensive guide to bias assessment, see our risk of bias systematic review guide.
The principle governing field selection is simple: extract everything you will need for your planned analyses and nothing you will not use. Every unnecessary field slows extraction, increases error opportunities, and adds no value. Conversely, discovering during analysis that you failed to extract a critical variable means returning to the primary studies, a costly and time-consuming correction.
Designing a Data Extraction Form
The extraction form is the most important methodological tool in the extraction phase. A well-designed form minimizes ambiguity, reduces disagreement between reviewers, and produces data that flows directly into analysis software without manual reformatting.
Start with your analysis plan. Before designing a single field, write out your planned analyses: the primary meta-analysis, subgroup analyses, sensitivity analyses, and any narrative summaries. Each analysis requires specific data inputs. Your extraction form should contain one field for every required input, no more, no less.
Use closed-ended fields wherever possible. Dropdown menus, radio buttons, and checkboxes produce consistent data. A free-text field asking for "study design" will yield entries like "RCT," "randomized controlled trial," "randomised trial," and "parallel group RCT", all meaning the same thing but requiring manual harmonization. A dropdown menu with predefined options eliminates this problem.
Include a notes field for each section. Despite your best efforts at standardization, some studies will report information in unexpected formats. A notes field gives reviewers a place to flag anomalies, document assumptions, and record information that does not fit neatly into predefined categories. These notes become invaluable during consensus meetings.
Build the form in the order you will present results. If your results section will start with study characteristics, move to risk of bias, then present outcome data by subgroup, arrange the extraction form in the same sequence. This alignment reduces cognitive load during extraction and ensures the form structure maps directly to your evidence tables.
| Form Design Principle | Why It Matters | Implementation |
|---|---|---|
| Closed-ended fields | Eliminates inconsistent coding | Dropdowns for study design, risk of bias judgments, outcome types |
| Explicit units | Prevents unit confusion | Label every numeric field (e.g., "Age, mean, years," "Follow-up, months") |
| One outcome per row | Supports multiple outcomes per study | Use a repeating row structure for each outcome-time point combination |
| Conditional logic | Hides irrelevant fields | If study design = "cohort," hide randomization-related fields |
| Source page/table reference | Enables verification | For every extracted number, record the page, table, or figure from the original study |
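To make these principles concrete, here is a minimal sketch of how a closed-ended extraction record might be encoded in machine-readable form. The field names, dropdown options, and validation rule are illustrative, not a Cochrane-mandated vocabulary:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class StudyDesign(Enum):
    RCT = "randomized controlled trial"
    COHORT = "cohort"
    CROSS_SECTIONAL = "cross-sectional"

@dataclass
class StudyRecord:
    study_id: str                     # e.g. "Smith2021"
    design: StudyDesign               # closed-ended: a dropdown, not free text
    age_mean_years: Optional[float]   # unit embedded in the field name
    followup_months: Optional[float]
    source_reference: str             # page/table reference for verification
    randomization_method: Optional[str] = None  # only meaningful for RCTs

    def __post_init__(self):
        # Conditional logic: randomization fields apply only to RCTs
        if self.design is not StudyDesign.RCT and self.randomization_method:
            raise ValueError("Randomization method recorded for a non-RCT design")

rec = StudyRecord("Smith2021", StudyDesign.RCT, 64.2, 12.0, "Table 2, p. 845",
                  randomization_method="computer-generated sequence")
```

Encoding the form this way means inconsistent free-text entries are rejected at data-entry time rather than harmonized by hand during analysis.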
Our free Extraction Template Builder generates a structured form pre-loaded with Cochrane-recommended fields. You select your review type (intervention, diagnostic, prognostic), choose the relevant domains, and the tool produces a downloadable form ready for piloting.
Piloting the Extraction Form
Never begin full extraction without piloting. A pilot test reveals ambiguities in field definitions, missing fields, redundant fields, and areas where reviewer interpretation diverges. The Cochrane Handbook recommends piloting on a small subset of included studies before commencing full extraction.
Select 3-5 diverse studies. Choose studies that vary in design, sample size, reporting quality, and the complexity of their results. If all pilot studies are well-reported RCTs, you will not discover how your form handles poorly reported observational studies until you are midway through extraction, when changes become expensive.
Have both reviewers extract independently. The purpose of piloting is not just to test the form but to test it in the hands of the people who will use it. Both reviewers extract the pilot studies independently, then compare their results field by field. Every discrepancy is a signal that the form or coding manual needs revision.
Revise the form and coding manual. After comparing pilot extractions, revise ambiguous field definitions, add missing fields, remove unused fields, and update the coding manual with explicit decision rules for situations that caused disagreement. Common revisions include clarifying how to handle studies that report medians instead of means, specifying whether to extract intention-to-treat or per-protocol results, and defining how to record outcomes measured at multiple time points.
Repeat if necessary. If the pilot reveals substantial disagreement or if you made major form revisions, pilot again on a new set of studies. The goal is to enter full extraction with a stable form and a shared understanding of how to use it.
Piloting is an investment that pays for itself many times over. A form that has been tested and refined produces cleaner data, fewer consensus meetings, and a smoother transition to analysis.
Dual-Reviewer Extraction and Reliability
The gold standard for data extraction in systematic reviews is independent, dual-reviewer extraction followed by consensus. This is not optional: it is a methodological requirement endorsed by Cochrane, JBI, and every major reporting guideline. Dual-reviewer extraction reduces both random transcription errors and systematic interpretation biases that a single reviewer cannot self-detect.
Independent extraction means truly independent. Both reviewers extract data from every included study without consulting each other, without sharing partially completed forms, and without discussing individual studies until both extractions are complete. Any communication during extraction undermines the independence that makes dual extraction valuable.
Consensus resolution follows a structured process. After both reviewers complete extraction for a batch of studies, their forms are compared field by field. Discrepancies are categorized as either transcription errors (simple mistakes caught immediately) or interpretation disagreements (genuine differences in how a data point was read or coded). Transcription errors are corrected by referring to the original study. Interpretation disagreements are discussed, and if consensus cannot be reached, a third reviewer adjudicates.
Measure agreement with Cohen's kappa. For categorical extraction fields (study design, risk of bias judgments, outcome classification), Cohen's kappa (Cohen, 1960) quantifies agreement beyond chance. Landis and Koch (1977) proposed the following interpretation scale:
| Kappa Value | Interpretation |
|---|---|
| < 0.00 | Poor agreement |
| 0.00 - 0.20 | Slight agreement |
| 0.21 - 0.40 | Fair agreement |
| 0.41 - 0.60 | Moderate agreement |
| 0.61 - 0.80 | Substantial agreement |
| 0.81 - 1.00 | Almost perfect agreement |
A kappa above 0.80 is the conventional threshold for excellent inter-rater reliability in systematic review extraction. If your kappa falls below 0.60, this signals a problem with the extraction form, the coding manual, or reviewer training that must be addressed before proceeding.
For continuous extraction fields (sample sizes, means, standard deviations, effect sizes), use the intraclass correlation coefficient (ICC) rather than kappa. An ICC above 0.90 indicates excellent agreement for continuous data.
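If you prefer to compute kappa yourself, here is a minimal sketch for a single categorical field, assuming two reviewers coded the same set of studies. The study-design labels and codes are hypothetical:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Kappa = (p_o - p_e) / (1 - p_e): observed agreement beyond chance."""
    n = len(rater_a)
    assert n == len(rater_b) and n > 0
    # Observed agreement: proportion of studies coded identically
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement, from each rater's marginal frequencies
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    categories = set(freq_a) | set(freq_b)
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)
    return (p_o - p_e) / (1 - p_e) if p_e < 1 else 1.0

# Two reviewers' study-design codes for ten studies (hypothetical)
a = ["RCT", "RCT", "cohort", "RCT", "cohort", "RCT", "RCT", "cohort", "RCT", "RCT"]
b = ["RCT", "RCT", "cohort", "RCT", "RCT",    "RCT", "RCT", "cohort", "RCT", "RCT"]
print(round(cohens_kappa(a, b), 2))  # 0.74: substantial agreement on the scale above
```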
Calculate your agreement statistics with our free Cohen's Kappa Calculator, which accepts raw agreement data and returns kappa with 95% confidence intervals.
Report agreement in your methods section. Every systematic review should report the inter-rater reliability achieved during extraction. State the agreement statistic used, the value obtained, and how discrepancies were resolved. This transparency allows readers to assess the reliability of your extracted dataset.
Handling Missing Data During Extraction
Missing data is the norm, not the exception. Primary studies frequently fail to report all the information your extraction form requests. Standard deviations may be missing, subgroup sample sizes may be unclear, follow-up duration may be ambiguous, and outcome data may be presented in figures rather than tables (requiring estimation).
Contact the corresponding author. When critical data points are missing from the published report, email the corresponding author with a specific, concise request. State which study you are referencing, which data points you need, and why (e.g., "We are conducting a meta-analysis and require the standard deviation for the primary outcome at 6 months"). Document the date of contact, whether a response was received, and what data were provided.
Set a response deadline. Allow two weeks for a response, with one follow-up email if no reply is received. After two attempts, record the data point as unavailable and proceed. Many journals and research institutions have data-sharing policies that support these requests, but response rates vary widely.
Document missing data systematically. Your extraction form should have a dedicated field or code for missing data. Never leave cells blank: a blank cell is ambiguous (is the data missing from the study, or did the reviewer forget to extract it?). Use a consistent code such as "NR" (not reported), "NA" (not applicable), or "REQ" (requested from author, pending response).
Plan for missing data in your analysis. Missing outcome data affects your meta-analysis in several ways. You may need to impute standard deviations using methods described in the Cochrane Handbook (Chapter 6), estimate data from figures using plot digitization software, or conduct sensitivity analyses comparing results with and without imputed values. The handling of missing data should be pre-specified in your protocol to avoid selective decisions during analysis.
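As one example, a standard deviation can often be recovered from a reported 95% confidence interval around a group mean. A minimal sketch, assuming scipy is available and that the interval was computed from a t distribution; the numbers are hypothetical:

```python
from scipy.stats import t

def sd_from_ci95(n, lower, upper):
    """SD = sqrt(n) * CI width / (2 * t_{0.975, n-1})."""
    t_crit = t.ppf(0.975, df=n - 1)            # approaches 1.96 for large n
    return (n ** 0.5) * (upper - lower) / (2 * t_crit)

# e.g., a group of 40 with mean 12.3 (95% CI 10.1 to 14.5)
print(round(sd_from_ci95(40, 10.1, 14.5), 2))  # ~6.88
```

Record any imputed value as imputed, so a sensitivity analysis can later compare results with and without it.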
| Missing Data Scenario | Recommended Action |
|---|---|
| Standard deviation not reported | Estimate from confidence intervals, p-values, or IQR; impute from similar studies if necessary |
| Results presented only in figures | Use plot digitization software (e.g., WebPlotDigitizer) to estimate values |
| Subgroup data not separated | Contact author; if unavailable, include only the overall result |
| Outcome measured but not reported | Record as "measured but not reported"; flag as a potential selective reporting concern |
| Ambiguous sample size (e.g., unclear dropouts) | Contact author; record the most conservative estimate and note the ambiguity |
Missing data is not a failure of extraction; it is a reality of working with published literature. The quality of your response to missing data determines whether your review is transparent and reproducible or opaque and questionable.
From Extraction to Analysis
The transition from completed extraction forms to an analysis-ready dataset is a critical but often overlooked step. Raw extracted data rarely feeds directly into meta-analytic software without transformation, harmonization, and verification.
Harmonize effect measures. Different studies may report different effect measures for the same outcome: one reports a mean difference, another reports a standardized mean difference, and a third reports only the raw means and standard deviations from which you must calculate the effect. Before analysis, convert all extracted data into a common effect measure. For continuous outcomes, this typically means calculating standardized mean differences (Hedges' g or Cohen's d). For binary outcomes, decide whether to use odds ratios, risk ratios, or risk differences and convert accordingly.
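As one illustration, here is a minimal sketch of that conversion for a two-arm study using the standard pooled-SD and small-sample-correction formulas; the input numbers are hypothetical:

```python
import math

def hedges_g(m1, sd1, n1, m2, sd2, n2):
    # Pooled SD across the two arms
    s_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / s_pooled                 # Cohen's d
    j = 1 - 3 / (4 * (n1 + n2) - 9)          # small-sample correction factor
    g = j * d                                # Hedges' g
    # Approximate variance of g: the precision input for meta-analysis
    var_g = j**2 * ((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    return g, var_g

g, var_g = hedges_g(m1=24.1, sd1=5.2, n1=60, m2=21.3, sd2=5.9, n2=58)
print(round(g, 3), round(var_g, 4))  # ~0.501, ~0.0345
```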
Verify data entry. After extraction is complete and consensus is reached, perform a final verification pass. Select a random 10-20% of studies and check every extracted field against the original publication. This audit catches residual errors that survived the dual-extraction and consensus process.
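Drawing that audit sample with a fixed random seed keeps the verification step itself reproducible. A minimal sketch, with hypothetical study IDs:

```python
import random

study_ids = [f"study_{i:03d}" for i in range(1, 43)]  # 42 included studies
rng = random.Random(2024)            # fixed seed so the audit can be repeated
audit = rng.sample(study_ids, k=max(1, round(0.15 * len(study_ids))))
print(sorted(audit))                 # ~15% of studies to re-check field by field
```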
Structure data for software input. Meta-analysis software, whether RevMan, R (metafor package), Stata, or Comprehensive Meta-Analysis, requires data in specific formats. Typically, you need one row per study (or per study-outcome combination for multivariate meta-analysis), with columns for effect size, standard error or variance, and subgroup variables. Restructure your extraction dataset to match these requirements before importing.
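A minimal sketch of that restructuring, assuming effect sizes (yi) and their variances (vi) have already been computed; the file name, column names, and values are illustrative rather than a required convention:

```python
import csv

# One row per study-outcome combination (long format)
rows = [
    {"study": "Smith2021", "outcome": "pain_6mo",  "yi": 0.50, "vi": 0.035, "subgroup": "adults"},
    {"study": "Smith2021", "outcome": "pain_12mo", "yi": 0.41, "vi": 0.038, "subgroup": "adults"},
    {"study": "Lee2019",   "outcome": "pain_6mo",  "yi": 0.22, "vi": 0.021, "subgroup": "adolescents"},
]

with open("meta_input.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["study", "outcome", "yi", "vi", "subgroup"])
    writer.writeheader()
    writer.writerows(rows)
# In R, a file like this could then feed metafor, e.g. rma(yi, vi, data = dat)
```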
Create an evidence table. Before running any analysis, produce a descriptive evidence table summarizing the characteristics of included studies. This table, typically Table 1 in a systematic review, serves as a quality check: if a study's characteristics look implausible in the table (e.g., a mean age of 12 in a study of elderly patients), the extraction needs revisiting. The evidence table also provides the narrative foundation for your results section.
Your search strategy determines which studies enter the review, and your extraction determines the data quality of those studies. For guidance on building comprehensive search strategies, see our search strategy systematic review guide.
Common Mistakes in Data Extraction
Even experienced reviewers make extraction errors. Understanding the most common mistakes helps you design safeguards against them.
Extracting from the wrong table or figure. Studies with multiple outcomes, subgroups, or time points often present results across several tables. A reviewer who extracts from Table 3 instead of Table 2, or from the per-protocol analysis instead of the intention-to-treat analysis, introduces a systematic error that may not be caught during consensus if both reviewers make the same mistake. Mitigate this by requiring page and table references for every extracted number.
Confusing standard deviation with standard error. This is the single most common numerical extraction error. A standard error is always smaller than the corresponding standard deviation, and using SE where SD is needed will underestimate the variance and produce artificially narrow confidence intervals in your meta-analysis. When a study reports "mean (SE)" or "mean (SD)," record exactly what is stated and convert during the harmonization step.
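The conversion itself is simple, since SD = SE × √n; a minimal sketch with hypothetical numbers:

```python
import math

se, n = 0.8, 64            # reported standard error and sample size
sd = se * math.sqrt(n)     # SD = SE * sqrt(n)
print(sd)                  # 6.4
```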
Extracting transformed rather than raw data. Some studies report log-transformed means, adjusted odds ratios, or age-standardized rates. These are not interchangeable with their raw counterparts. Your extraction form should specify which version to extract, and the coding manual should include decision rules for studies that report only transformed data.
Ignoring multi-arm trials. A three-arm trial contributes two comparisons to your meta-analysis, but the shared control group cannot be counted twice without inflating the total sample size. During extraction, flag multi-arm trials and record data for each arm separately so that the analytical adjustment (splitting the control group or using multivariate methods) can be applied correctly.
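One adjustment described in the Cochrane Handbook is to split the shared control group across the comparisons it serves. A minimal sketch, with hypothetical participant counts:

```python
def split_control(n_control, n_comparisons):
    """Divide shared control participants evenly across comparisons."""
    base = n_control // n_comparisons
    # Distribute any remainder so the totals still sum to n_control
    return [base + (1 if i < n_control % n_comparisons else 0)
            for i in range(n_comparisons)]

# Three-arm trial: 90 controls shared by two intervention comparisons
print(split_control(90, 2))  # [45, 45], so no participant is counted twice
```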
Failing to record the source of each number. Without page and table references, verification is impossible. When a discrepancy is discovered during consensus or audit, reviewers must return to the original study and locate the exact source of each number. If no reference was recorded, this search can take longer than the original extraction.
Not distinguishing between "not reported" and "zero." A study that does not report adverse events is not the same as a study that reports zero adverse events. Your extraction form must distinguish between these two situations, as they have different implications for meta-analysis (particularly for rare events).
These mistakes are preventable with a well-designed form, a thorough coding manual, and a pilot phase that surfaces ambiguities before full extraction begins. The investment in form design and piloting pays dividends in data quality, reviewer efficiency, and the credibility of your final results.
Data extraction is the methodological backbone of every systematic review and meta-analysis. It determines the accuracy, completeness, and reproducibility of your synthesized evidence. By designing a structured form, piloting it on diverse studies, extracting independently with two reviewers, measuring agreement with Cohen's kappa, handling missing data transparently, and verifying data before analysis, you build a dataset that can withstand scrutiny from peer reviewers, editors, and the broader research community. For a comprehensive framework that ties extraction into the larger review process, see our step-by-step systematic review guide.