In systematic review methodology, data extraction is the structured process of collecting, recording, and organizing study-level information from every included primary study into a standardized form. It is the bridge between study selection and data synthesis, the step that transforms a set of screened articles into an analyzable dataset. When extraction is done well, your meta-analysis rests on a solid, reproducible foundation. When it is done poorly, no statistical method can rescue the results.
Data extraction is not reading and note-taking. It is a formalized, auditable procedure governed by a pre-designed form, executed independently by two reviewers, and verified through agreement statistics. The Cochrane Handbook for Systematic Reviews of Interventions (Higgins et al., 2023) devotes an entire chapter, Chapter 5, to this process because extraction errors propagate directly into effect size calculations, subgroup analyses, and the conclusions that inform clinical practice and policy.
This guide covers every stage of the extraction process, from form design and piloting through dual-reviewer extraction, reliability measurement, missing data handling, and the transition from raw extracted data to analysis-ready datasets. Our extraction forms are piloted on the first 3-5 included studies before full extraction begins, and we refine the form iteratively until reviewer agreement reaches acceptable thresholds.
What Is Data Extraction in a Systematic Review
The purpose is twofold. First, extraction captures the raw material needed for quantitative synthesis: sample sizes, means, standard deviations, odds ratios, confidence intervals, and other statistical parameters that feed into meta-analytic models. Second, extraction records the qualitative and contextual information needed to interpret those numbers: study design, setting, population characteristics, intervention details, outcome definitions, and follow-up duration.
A systematic review is only as reliable as the data extracted from its primary studies. If a reviewer misreads a sample size, transposes digits in a standard deviation, or records a subgroup result as the overall result, the downstream effect size estimate will be wrong. Dual-reviewer extraction exists precisely to catch these errors before they enter the analysis pipeline.
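To make the cost of a single transcription error concrete, here is a minimal sketch in Python. The study values are hypothetical, and Cohen's d with a pooled standard deviation is used only as one common effect size; your review may call for a different measure.

```python
import math

def cohens_d(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2)
    return (mean_t - mean_c) / math.sqrt(pooled_var)

# Hypothetical study: the paper reports a control-arm SD of 12.5.
correct = cohens_d(mean_t=50.0, sd_t=12.0, n_t=40,
                   mean_c=44.0, sd_c=12.5, n_c=40)

# Same study, but the reviewer transposes the digits and records 21.5.
transposed = cohens_d(mean_t=50.0, sd_t=12.0, n_t=40,
                      mean_c=44.0, sd_c=21.5, n_c=40)

print(f"d with correct SD:    {correct:.2f}")
print(f"d with transposed SD: {transposed:.2f}")
```

In this made-up example, one pair of swapped digits shrinks the effect size by roughly 30%. A second reviewer extracting the same table independently would most likely record 12.5, and the mismatch would surface at comparison.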
The extraction stage typically begins immediately after study selection is finalized, once the PRISMA flow diagram confirms the number of included studies and full-text exclusion reasons have been documented. For a step-by-step overview of the entire review process, see our guide on how to write a systematic review.
What to Extract: The Five Core Domains
A well-designed extraction form captures information across five domains. The specific fields within each domain depend on your review question, but the domains themselves are consistent across nearly all systematic reviews.
| Domain | What to Record | Example Fields |
|---|---|---|
| Study Characteristics | Metadata about the study itself | First author, year, journal, country, funding source, study design (RCT, cohort, cross-sectional), registration number |
| Population | Who was studied | Sample size (total and per arm), age (mean, SD, or median/IQR), sex distribution, diagnosis or condition, inclusion/exclusion criteria, setting (hospital, community, school) |
| Intervention / Exposure | What was done to the experimental group | Intervention type, dose, frequency, duration, delivery mode, comparator/control description, co-interventions |
| Outcomes | What was measured | Primary and secondary outcome definitions, measurement instruments, time points, outcome data (means, SDs, proportions, event counts), direction of effect |
| Effect Sizes and Precision | The numbers for meta-analysis | Point estimates (mean difference, odds ratio, hazard ratio), confidence intervals, p-values, sample sizes per group at each time point |
Beyond these five domains, most reviews also extract the information needed for risk of bias assessment: randomization method, allocation concealment, blinding, attrition, and selective outcome reporting. Some teams extract risk of bias data on the same form; others use a separate tool. For a comprehensive guide to bias assessment, see our risk of bias systematic review guide.
The principle governing field selection is simple: extract everything you will need for your planned analyses and nothing you will not use. Every unnecessary field slows extraction, increases error opportunities, and adds no value. Conversely, discovering during analysis that you failed to extract a critical variable means returning to the primary studies, a costly and time-consuming correction.
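One way to apply this principle is to check a draft form against the analysis plan before piloting. The sketch below is illustrative only: the analysis names, the field names, and the `check_form` helper are all hypothetical.

```python
# Hypothetical analysis plan: each planned analysis lists the inputs it needs.
analysis_plan = {
    "primary_meta_analysis": {"mean", "sd", "n_per_arm"},
    "subgroup_by_setting": {"setting"},
    "sensitivity_by_risk_of_bias": {"overall_rob_judgment"},
}

# Hypothetical draft extraction form.
form_fields = {"mean", "sd", "n_per_arm", "setting", "journal_impact_factor"}

def check_form(plan, fields):
    """Return (missing, unused).

    missing: inputs required by some analysis but absent from the form.
    unused:  form fields that no planned analysis requires.
    """
    required = set().union(*plan.values())
    return sorted(required - fields), sorted(fields - required)

missing, unused = check_form(analysis_plan, form_fields)
print("Missing from form:", missing)
print("Not used by any analysis:", unused)
```

For this to work in practice, the plan should list descriptive outputs too. If your evidence table reports first author, year, and country, treat the table as one more "analysis" so those fields are not flagged as unused.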
Designing a Data Extraction Form
The extraction form is the most important methodological tool in the extraction phase. A well-designed form minimizes ambiguity, reduces disagreement between reviewers, and produces data that flows directly into analysis software without manual reformatting.
Start with your analysis plan. Before designing a single field, write out your planned analyses: the primary meta-analysis, subgroup analyses, sensitivity analyses, and any narrative summaries. Each analysis requires specific data inputs. Your extraction form should contain one field for every required input, no more, no less.
Use closed-ended fields wherever possible. Dropdown menus, radio buttons, and checkboxes produce consistent data. A free-text field asking for "study design" will yield entries like "RCT," "randomized controlled trial," "randomised trial," and "parallel group RCT," all of which mean the same thing but require manual harmonization. A dropdown menu with predefined options eliminates this problem.
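If data have already been collected as free text, the harmonization step looks something like the sketch below. The `StudyDesign` codes and the synonym table are assumptions for illustration, not a standard vocabulary.

```python
from enum import Enum

class StudyDesign(Enum):
    """The closed list a dropdown would offer (illustrative)."""
    RCT = "RCT"
    COHORT = "cohort"
    CROSS_SECTIONAL = "cross-sectional"

# Free-text variants mapped to a single code. Every new spelling a
# reviewer invents has to be added here by hand.
_SYNONYMS = {
    "rct": StudyDesign.RCT,
    "randomized controlled trial": StudyDesign.RCT,
    "randomised trial": StudyDesign.RCT,
    "parallel group rct": StudyDesign.RCT,
    "cohort": StudyDesign.COHORT,
    "cross-sectional": StudyDesign.CROSS_SECTIONAL,
}

def harmonize(label):
    """Map a free-text study design label to a StudyDesign code."""
    key = " ".join(label.lower().split())  # normalize case and whitespace
    try:
        return _SYNONYMS[key]
    except KeyError:
        # Fail loudly so unmapped labels get a human decision.
        raise ValueError(f"Unmapped study design label: {label!r}") from None

entries = ["RCT", "randomized controlled trial", "randomised trial", "parallel group RCT"]
print({harmonize(e).value for e in entries})
```

The synonym table is exactly the manual work a dropdown avoids: with a closed list, reviewers pick a code directly and no mapping is needed.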
Include a notes field for each section. Despite your best efforts at standardization, some studies will report information in unexpected formats. A notes field gives reviewers a place to flag anomalies, document assumptions, and record information that does not fit neatly into predefined categories. These notes become invaluable during consensus meetings.
Build the form in the order you will present results. If your results section will start with study characteristics, move to risk of bias, then present outcome data by subgroup, arrange the extraction form in the same sequence. This alignment reduces cognitive load during extraction and ensures the form structure maps directly to your evidence tables.
| Form Design Principle | Why It Matters | Implementation |
|---|---|---|
| Closed-ended fields | Eliminates inconsistent coding | Dropdowns for study design, risk of bias judgments, outcome types |
| Explicit units | Prevents unit confusion | Label every numeric field (e.g., "Age, mean, years," "Follow-up, months") |
| One outcome per row | Supports multiple outcomes per study | Use a repeating row structure for each outcome-time point combination |
| Conditional logic | Hides irrelevant fields | If study design = "cohort," hide randomization-related fields |
| Source page/table reference | Enables verification | For every extracted number, record the page, table, or figure from the original study |
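Several of these principles can be expressed directly in the data structure behind the form. The following is a minimal sketch, not a prescribed schema: the `OutcomeRow` class, its field names, and the `HIDDEN_FIELDS` rules are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class OutcomeRow:
    """One row per study, outcome, time point, and arm."""
    study_id: str
    outcome: str
    time_point_months: float  # explicit unit in the field name
    arm: str
    n: int
    mean: Optional[float]     # None when the study does not report it
    sd: Optional[float]
    source: str               # page, table, or figure in the original study

    def __post_init__(self):
        # Every extracted number must be traceable to its source.
        if not self.source.strip():
            raise ValueError("source reference is required")

# Conditional logic: fields to hide for a given study design.
HIDDEN_FIELDS = {
    "cohort": {"randomization_method", "allocation_concealment"},
}
ALL_FIELDS = ["first_author", "year", "study_design",
              "randomization_method", "allocation_concealment"]

def visible_fields(study_design):
    """Return the form fields shown for this study design."""
    hidden = HIDDEN_FIELDS.get(study_design, set())
    return [f for f in ALL_FIELDS if f not in hidden]

# Hypothetical study reporting one outcome at two time points.
rows = [
    OutcomeRow("Smith2020", "pain score", 3, "intervention", 40, 3.1, 1.2, "Table 2, p. 7"),
    OutcomeRow("Smith2020", "pain score", 6, "intervention", 38, 2.8, 1.3, "Table 2, p. 7"),
]
print(len(rows), visible_fields("cohort"))
```

Keeping outcomes in this long format means a study with five outcomes at three time points simply contributes more rows, and the form never needs extra columns.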
Our free Extraction Template Builder generates a structured form pre-loaded with Cochrane-recommended fields. You select your review type (intervention, diagnostic, prognostic), choose the relevant domains, and the tool produces a downloadable form ready for piloting.
Piloting the Extraction Form
Never begin full extraction without piloting. A pilot test reveals ambiguities in field definitions, missing fields, redundant fields, and areas where reviewer interpretation diverges. The Cochrane Handbook recommends piloting on a small subset of included studies before commencing full extraction.
Select 3-5 diverse studies. Choose studies that vary in design, sample size, reporting quality, and the complexity of their results. If all pilot studies are well-reported RCTs, you will not discover how your form handles poorly reported observational studies until you are midway through extraction, when changes become expensive.
Have both reviewers extract independently. The purpose of piloting is not just to test the form; it is to test the form in the hands of the people who will use it. Both reviewers extract the pilot studies independently, then compare their results field by field. Every discrepancy is a signal that the form or coding manual needs revision.
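The field-by-field comparison can be automated once both extractions are stored in the same structure. The sketch below assumes each reviewer's extraction for one study is a dictionary keyed by field name; the helper name and the data are hypothetical.

```python
def compare_extractions(reviewer_a, reviewer_b):
    """Return {field: (a_value, b_value)} for every field where the reviewers differ.

    A field missing from one extraction is reported with None on that side.
    """
    fields = sorted(set(reviewer_a) | set(reviewer_b))
    return {
        f: (reviewer_a.get(f), reviewer_b.get(f))
        for f in fields
        if reviewer_a.get(f) != reviewer_b.get(f)
    }

# Hypothetical pilot extractions of the same study.
reviewer_a = {"study_design": "RCT", "n_total": 120, "analysis_set": "intention-to-treat"}
reviewer_b = {"study_design": "RCT", "n_total": 102, "analysis_set": "per-protocol"}

for field, (a, b) in compare_extractions(reviewer_a, reviewer_b).items():
    print(f"{field}: reviewer A = {a!r}, reviewer B = {b!r}")
```

Here the `n_total` mismatch looks like a transcription slip, while the `analysis_set` mismatch points to a missing decision rule in the coding manual. Sorting discrepancies into these two kinds helps decide what to revise.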
Revise the form and coding manual. After comparing pilot extractions, revise ambiguous field definitions, add missing fields, remove unused fields, and update the coding manual with explicit decision rules for situations that caused disagreement. Common revisions include clarifying how to handle studies that report medians instead of means, specifying whether to extract intention-to-treat or per-protocol results, and defining how to record outcomes measured at multiple time points.
Repeat if necessary. If the pilot reveals substantial disagreement or if you made major form revisions, pilot again on a new set of studies. The goal is to enter full extraction with a stable form and a shared understanding of how to use it.
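To judge whether disagreement is substantial, teams often quantify it. The sketch below computes percent agreement and Cohen's kappa for one categorical field. Kappa is one common agreement statistic, chosen here as an assumption; the ratings are hypothetical, and the threshold for "acceptable" is something your protocol should define.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters coding the same items into categories."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: share of items coded identically.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected agreement by chance, from each rater's category frequencies.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n**2
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)

# Hypothetical "study design" codes from five pilot studies.
reviewer_a = ["RCT", "RCT", "cohort", "cohort", "RCT"]
reviewer_b = ["RCT", "RCT", "cohort", "RCT", "RCT"]

agreement = sum(a == b for a, b in zip(reviewer_a, reviewer_b)) / len(reviewer_a)
kappa = cohens_kappa(reviewer_a, reviewer_b)
print(f"percent agreement = {agreement:.0%}, kappa = {kappa:.2f}")
```

With only 3-5 pilot studies, any such statistic is very imprecise, so treat it as a rough signal alongside the list of specific discrepancies rather than as a pass/fail test.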
Piloting is an investment that pays for itself many times over. A form that has been tested and refined produces cleaner data, fewer consensus meetings, and a smoother transition to analysis.
Struggling with data extraction for your systematic review? Our team uses validated extraction forms and dual-reviewer verification, and handles everything from study coding to data reconciliation. Get a personalized quote to learn how we can help, or see our systematic review services for researchers.