Data extraction in a systematic review is the structured process of collecting, recording, and organizing study-level information from every included primary study into a standardized form. It is the bridge between study selection and data synthesis, the step that transforms a set of screened articles into an analyzable dataset. When extraction is done well, your meta-analysis rests on a solid, reproducible foundation. When it is done poorly, no statistical method can rescue the results.
Data extraction is not casual reading and note-taking. It is a formalized, auditable procedure governed by a pre-designed form, executed independently by two reviewers, and verified through agreement statistics. The Cochrane Handbook for Systematic Reviews of Interventions devotes an entire chapter to the process (Chapter 5; Higgins et al., 2023) because extraction errors propagate directly into effect size calculations, subgroup analyses, and the conclusions that inform clinical practice and policy.
This guide covers every stage of the extraction process, from form design and piloting through dual-reviewer extraction, reliability measurement, missing data handling, and the transition from raw extracted data to analysis-ready datasets. As a working rule, pilot your extraction form on the first 3-5 included studies before full extraction begins, and refine it iteratively until reviewer agreement reaches acceptable thresholds.
What Is Data Extraction in a Systematic Review
Data extraction, sometimes called data collection or data abstraction, is the systematic process of identifying, recording, and coding relevant information from each study that passed full-text screening. Every included study is read in full, and predetermined data points are transferred from the published report into a structured extraction form.
The purpose is twofold. First, extraction captures the raw material needed for quantitative synthesis: sample sizes, means, standard deviations, odds ratios, confidence intervals, and other statistical parameters that feed into meta-analytic models. Second, extraction records the qualitative and contextual information needed to interpret those numbers: study design, setting, population characteristics, intervention details, outcome definitions, and follow-up duration.
A systematic review is only as reliable as the data extracted from its primary studies. If a reviewer misreads a sample size, transposes digits in a standard deviation, or records a subgroup result as the overall result, the downstream effect size estimate will be wrong. Dual-reviewer extraction exists precisely to catch these errors before they enter the analysis pipeline.
The extraction stage typically begins immediately after study selection is finalized, once the PRISMA flow diagram confirms the number of included studies and full-text exclusion reasons have been documented. For a step-by-step overview of the entire review process, see our guide on how to write a systematic review.
What to Extract: The Five Core Domains
A well-designed extraction form captures information across five domains. The specific fields within each domain depend on your review question, but the domains themselves are consistent across nearly all systematic reviews.
| Domain | What to Record | Example Fields |
|---|---|---|
| Study Characteristics | Metadata about the study itself | First author, year, journal, country, funding source, study design (RCT, cohort, cross-sectional), registration number |
| Population | Who was studied | Sample size (total and per arm), age (mean, SD, or median/IQR), sex distribution, diagnosis or condition, inclusion/exclusion criteria, setting (hospital, community, school) |
| Intervention / Exposure | What was done to the experimental group | Intervention type, dose, frequency, duration, delivery mode, comparator/control description, co-interventions |
| Outcomes | What was measured | Primary and secondary outcome definitions, measurement instruments, time points, outcome data (means, SDs, proportions, event counts), direction of effect |
| Effect Sizes and Precision | The numbers for meta-analysis | Point estimates (mean difference, odds ratio, hazard ratio), confidence intervals, p-values, sample sizes per group at each time point |
Beyond these five domains, most reviews also extract information needed for risk of bias assessment: randomization method, allocation concealment, blinding, attrition, and selective outcome reporting. Some teams extract risk of bias data on the same form; others use a separate tool. For a comprehensive guide to bias assessment, see our risk of bias systematic review guide.
The principle governing field selection is simple: extract everything you will need for your planned analyses and nothing you will not use. Every unnecessary field slows extraction, increases error opportunities, and adds no value. Conversely, discovering during analysis that you failed to extract a critical variable means returning to the primary studies, a costly and time-consuming correction.
Designing a Data Extraction Form
The extraction form is the most important methodological tool in the extraction phase. A well-designed form minimizes ambiguity, reduces disagreement between reviewers, and produces data that flows directly into analysis software without manual reformatting.
Start with your analysis plan. Before designing a single field, write out your planned analyses: the primary meta-analysis, subgroup analyses, sensitivity analyses, and any narrative summaries. Each analysis requires specific data inputs. Your extraction form should contain one field for every required input, no more, no less.
Use closed-ended fields wherever possible. Dropdown menus, radio buttons, and checkboxes produce consistent data. A free-text field asking for "study design" will yield entries like "RCT," "randomized controlled trial," "randomised trial," and "parallel group RCT", all meaning the same thing but requiring manual harmonization. A dropdown menu with predefined options eliminates this problem.
Include a notes field for each section. Despite your best efforts at standardization, some studies will report information in unexpected formats. A notes field gives reviewers a place to flag anomalies, document assumptions, and record information that does not fit neatly into predefined categories. These notes become invaluable during consensus meetings.
Build the form in the order you will present results. If your results section will start with study characteristics, move to risk of bias, then present outcome data by subgroup, arrange the extraction form in the same sequence. This alignment reduces cognitive load during extraction and ensures the form structure maps directly to your evidence tables.
| Form Design Principle | Why It Matters | Implementation |
|---|---|---|
| Closed-ended fields | Eliminates inconsistent coding | Dropdowns for study design, risk of bias judgments, outcome types |
| Explicit units | Prevents unit confusion | Label every numeric field (e.g., "Age, mean, years," "Follow-up, months") |
| One outcome per row | Supports multiple outcomes per study | Use a repeating row structure for each outcome-time point combination |
| Conditional logic | Hides irrelevant fields | If study design = "cohort," hide randomization-related fields |
| Source page/table reference | Enables verification | For every extracted number, record the page, table, or figure from the original study |
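To make these principles concrete, here is a minimal sketch of how a closed-ended extraction record might be encoded in machine-readable form. The field names, dropdown options, and validation rule are illustrative, not a Cochrane-mandated vocabulary:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class StudyDesign(Enum):
    RCT = "randomized controlled trial"
    COHORT = "cohort"
    CROSS_SECTIONAL = "cross-sectional"

@dataclass
class StudyRecord:
    study_id: str                     # e.g. "Smith2021"
    design: StudyDesign               # closed-ended: a dropdown, not free text
    age_mean_years: Optional[float]   # unit embedded in the field name
    followup_months: Optional[float]
    source_reference: str             # page/table reference for verification
    randomization_method: Optional[str] = None  # only meaningful for RCTs

    def __post_init__(self):
        # Conditional logic: randomization fields apply only to RCTs
        if self.design is not StudyDesign.RCT and self.randomization_method:
            raise ValueError("Randomization method recorded for a non-RCT design")

rec = StudyRecord("Smith2021", StudyDesign.RCT, 64.2, 12.0, "Table 2, p. 845",
                  randomization_method="computer-generated sequence")
```

Encoding the form this way means inconsistent free-text entries are rejected at data-entry time rather than harmonized by hand during analysis.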
Our free Extraction Template Builder generates a structured form pre-loaded with Cochrane-recommended fields. You select your review type (intervention, diagnostic, prognostic), choose the relevant domains, and the tool produces a downloadable form ready for piloting.
Piloting the Extraction Form
Never begin full extraction without piloting. A pilot test reveals ambiguities in field definitions, missing fields, redundant fields, and areas where reviewer interpretation diverges. The Cochrane Handbook recommends piloting on a small subset of included studies before commencing full extraction.
Select 3-5 diverse studies. Choose studies that vary in design, sample size, reporting quality, and the complexity of their results. If all pilot studies are well-reported RCTs, you will not discover how your form handles poorly reported observational studies until you are midway through extraction, when changes become expensive.
Have both reviewers extract independently. The purpose of piloting is not just to test the form but to test it in the hands of the people who will use it. Both reviewers extract the pilot studies independently, then compare their results field by field. Every discrepancy is a signal that the form or coding manual needs revision.
Revise the form and coding manual. After comparing pilot extractions, revise ambiguous field definitions, add missing fields, remove unused fields, and update the coding manual with explicit decision rules for situations that caused disagreement. Common revisions include clarifying how to handle studies that report medians instead of means, specifying whether to extract intention-to-treat or per-protocol results, and defining how to record outcomes measured at multiple time points.
Repeat if necessary. If the pilot reveals substantial disagreement or if you made major form revisions, pilot again on a new set of studies. The goal is to enter full extraction with a stable form and a shared understanding of how to use it.
Piloting is an investment that pays for itself many times over. A form that has been tested and refined produces cleaner data, fewer consensus meetings, and a smoother transition to analysis.
Dual-Reviewer Extraction and Reliability
The gold standard for data extraction in systematic reviews is independent, dual-reviewer extraction followed by consensus. This is not optional: it is a methodological requirement endorsed by Cochrane, JBI, and every major reporting guideline. Dual-reviewer extraction reduces both random transcription errors and systematic interpretation biases that a single reviewer cannot self-detect.
Independent extraction means truly independent. Both reviewers extract data from every included study without consulting each other, without sharing partially completed forms, and without discussing individual studies until both extractions are complete. Any communication during extraction undermines the independence that makes dual extraction valuable.
Consensus resolution follows a structured process. After both reviewers complete extraction for a batch of studies, their forms are compared field by field. Discrepancies are categorized as either transcription errors (simple mistakes caught immediately) or interpretation disagreements (genuine differences in how a data point was read or coded). Transcription errors are corrected by referring to the original study. Interpretation disagreements are discussed, and if consensus cannot be reached, a third reviewer adjudicates.
Measure agreement with Cohen's kappa. For categorical extraction fields (study design, risk of bias judgments, outcome classification), Cohen's kappa (Cohen, 1960) quantifies agreement beyond chance. Landis and Koch (1977) proposed the following interpretation scale:
| Kappa Value | Interpretation |
|---|---|
| < 0.00 | Poor agreement |
| 0.00 - 0.20 | Slight agreement |
| 0.21 - 0.40 | Fair agreement |
| 0.41 - 0.60 | Moderate agreement |
| 0.61 - 0.80 | Substantial agreement |
| 0.81 - 1.00 | Almost perfect agreement |
A kappa above 0.80 is the conventional threshold for excellent inter-rater reliability in systematic review extraction. If your kappa falls below 0.60, this signals a problem with the extraction form, the coding manual, or reviewer training that must be addressed before proceeding.
For continuous extraction fields (sample sizes, means, standard deviations, effect sizes), use the intraclass correlation coefficient (ICC) rather than kappa. An ICC above 0.90 indicates excellent agreement for continuous data.
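If you prefer to compute kappa yourself, here is a minimal sketch for a single categorical field, assuming two reviewers coded the same set of studies. The study-design labels and codes are hypothetical:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Kappa = (p_o - p_e) / (1 - p_e): observed agreement beyond chance."""
    n = len(rater_a)
    assert n == len(rater_b) and n > 0
    # Observed agreement: proportion of studies coded identically
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement, from each rater's marginal frequencies
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    categories = set(freq_a) | set(freq_b)
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)
    return (p_o - p_e) / (1 - p_e) if p_e < 1 else 1.0

# Two reviewers' study-design codes for ten studies (hypothetical)
a = ["RCT", "RCT", "cohort", "RCT", "cohort", "RCT", "RCT", "cohort", "RCT", "RCT"]
b = ["RCT", "RCT", "cohort", "RCT", "RCT",    "RCT", "RCT", "cohort", "RCT", "RCT"]
print(round(cohens_kappa(a, b), 2))  # 0.74: substantial agreement on the scale above
```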
Calculate your agreement statistics with our free Cohen's Kappa Calculator, which accepts raw agreement data and returns kappa with 95% confidence intervals.
Report agreement in your methods section. Every systematic review should report the inter-rater reliability achieved during extraction. State the agreement statistic used, the value obtained, and how discrepancies were resolved. This transparency allows readers to assess the reliability of your extracted dataset.
Handling Missing Data During Extraction
Missing data is the norm, not the exception. Primary studies frequently fail to report all the information your extraction form requests. Standard deviations may be missing, subgroup sample sizes may be unclear, follow-up duration may be ambiguous, and outcome data may be presented in figures rather than tables (requiring estimation).
Contact the corresponding author. When critical data points are missing from the published report, email the corresponding author with a specific, concise request. State which study you are referencing, which data points you need, and why (e.g., "We are conducting a meta-analysis and require the standard deviation for the primary outcome at 6 months"). Document the date of contact, whether a response was received, and what data were provided.
Set a response deadline. Allow two weeks for a response, with one follow-up email if no reply is received. After two attempts, record the data point as unavailable and proceed. Many journals and research institutions have data-sharing policies that support these requests, but response rates vary widely.
Document missing data systematically. Your extraction form should have a dedicated field or code for missing data. Never leave cells blank: a blank cell is ambiguous (is the data missing from the study, or did the reviewer forget to extract it?). Use a consistent code such as "NR" (not reported), "NA" (not applicable), or "REQ" (requested from author, pending response).
Plan for missing data in your analysis. Missing outcome data affects your meta-analysis in several ways. You may need to impute standard deviations using methods described in the Cochrane Handbook (Chapter 6), estimate data from figures using plot digitization software, or conduct sensitivity analyses comparing results with and without imputed values. The handling of missing data should be pre-specified in your protocol to avoid selective decisions during analysis.
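As one example, a standard deviation can often be recovered from a reported 95% confidence interval around a group mean. A minimal sketch, assuming scipy is available and that the interval was computed from a t distribution; the numbers are hypothetical:

```python
from scipy.stats import t

def sd_from_ci95(n, lower, upper):
    """SD = sqrt(n) * CI width / (2 * t_{0.975, n-1})."""
    t_crit = t.ppf(0.975, df=n - 1)            # approaches 1.96 for large n
    return (n ** 0.5) * (upper - lower) / (2 * t_crit)

# e.g., a group of 40 with mean 12.3 (95% CI 10.1 to 14.5)
print(round(sd_from_ci95(40, 10.1, 14.5), 2))  # ~6.88
```

Record any imputed value as imputed, so a sensitivity analysis can later compare results with and without it.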
| Missing Data Scenario | Recommended Action |
|---|---|
| Standard deviation not reported | Estimate from confidence intervals, p-values, or IQR; impute from similar studies if necessary |
| Results presented only in figures | Use plot digitization software (e.g., WebPlotDigitizer) to estimate values |
| Subgroup data not separated | Contact author; if unavailable, include only the overall result |
| Outcome measured but not reported | Record as "measured but not reported"; flag as a potential selective reporting concern |
| Ambiguous sample size (e.g., unclear dropouts) | Contact author; record the most conservative estimate and note the ambiguity |
Missing data is not a failure of extraction; it is a reality of working with published literature. The quality of your response to missing data determines whether your review is transparent and reproducible or opaque and questionable.
From Extraction to Analysis
The transition from completed extraction forms to an analysis-ready dataset is a critical but often overlooked step. Raw extracted data rarely feeds directly into meta-analytic software without transformation, harmonization, and verification.
Harmonize effect measures. Different studies may report different effect measures for the same outcome: one reports a mean difference, another reports a standardized mean difference, and a third reports only the raw means and standard deviations from which you must calculate the effect. Before analysis, convert all extracted data into a common effect measure. For continuous outcomes, this typically means calculating standardized mean differences (Hedges' g or Cohen's d). For binary outcomes, decide whether to use odds ratios, risk ratios, or risk differences and convert accordingly.
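As one illustration, here is a minimal sketch of that conversion for a two-arm study using the standard pooled-SD and small-sample-correction formulas; the input numbers are hypothetical:

```python
import math

def hedges_g(m1, sd1, n1, m2, sd2, n2):
    # Pooled SD across the two arms
    s_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / s_pooled                 # Cohen's d
    j = 1 - 3 / (4 * (n1 + n2) - 9)          # small-sample correction factor
    g = j * d                                # Hedges' g
    # Approximate variance of g: the precision input for meta-analysis
    var_g = j**2 * ((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    return g, var_g

g, var_g = hedges_g(m1=24.1, sd1=5.2, n1=60, m2=21.3, sd2=5.9, n2=58)
print(round(g, 3), round(var_g, 4))  # ~0.501, ~0.0345
```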
Verify data entry. After extraction is complete and consensus is reached, perform a final verification pass. Select a random 10-20% of studies and check every extracted field against the original publication. This audit catches residual errors that survived the dual-extraction and consensus process.
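Drawing that audit sample with a fixed random seed keeps the verification step itself reproducible. A minimal sketch, with hypothetical study IDs:

```python
import random

study_ids = [f"study_{i:03d}" for i in range(1, 43)]  # 42 included studies
rng = random.Random(2024)            # fixed seed so the audit can be repeated
audit = rng.sample(study_ids, k=max(1, round(0.15 * len(study_ids))))
print(sorted(audit))                 # ~15% of studies to re-check field by field
```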
Structure data for software input. Meta-analysis software, whether RevMan, R (metafor package), Stata, or Comprehensive Meta-Analysis, requires data in specific formats. Typically, you need one row per study (or per study-outcome combination for multivariate meta-analysis), with columns for effect size, standard error or variance, and subgroup variables. Restructure your extraction dataset to match these requirements before importing.
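A minimal sketch of that restructuring, assuming effect sizes (yi) and their variances (vi) have already been computed; the file name, column names, and values are illustrative rather than a required convention:

```python
import csv

# One row per study-outcome combination (long format)
rows = [
    {"study": "Smith2021", "outcome": "pain_6mo",  "yi": 0.50, "vi": 0.035, "subgroup": "adults"},
    {"study": "Smith2021", "outcome": "pain_12mo", "yi": 0.41, "vi": 0.038, "subgroup": "adults"},
    {"study": "Lee2019",   "outcome": "pain_6mo",  "yi": 0.22, "vi": 0.021, "subgroup": "adolescents"},
]

with open("meta_input.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["study", "outcome", "yi", "vi", "subgroup"])
    writer.writeheader()
    writer.writerows(rows)
# In R, a file like this could then feed metafor, e.g. rma(yi, vi, data = dat)
```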
Create an evidence table. Before running any analysis, produce a descriptive evidence table summarizing the characteristics of included studies. This table, typically Table 1 in a systematic review, serves as a quality check: if a study's characteristics look implausible in the table (e.g., a mean age of 12 in a study of elderly patients), the extraction needs revisiting. The evidence table also provides the narrative foundation for your results section.
Your search strategy determines which studies enter the review, and your extraction determines the data quality of those studies. For guidance on building comprehensive search strategies, see our search strategy systematic review guide.
Common Mistakes in Data Extraction
Even experienced reviewers make extraction errors. Understanding the most common mistakes helps you design safeguards against them.
Extracting from the wrong table or figure. Studies with multiple outcomes, subgroups, or time points often present results across several tables. A reviewer who extracts from Table 3 instead of Table 2, or from the per-protocol analysis instead of the intention-to-treat analysis, introduces a systematic error that may not be caught during consensus if both reviewers make the same mistake. Mitigate this by requiring page and table references for every extracted number.
Confusing standard deviation with standard error. This is the single most common numerical extraction error. A standard error is always smaller than the corresponding standard deviation, and using SE where SD is needed will underestimate the variance and produce artificially narrow confidence intervals in your meta-analysis. When a study reports "mean (SE)" or "mean (SD)," record exactly what is stated and convert during the harmonization step.
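The conversion itself is simple, since SD = SE × √n; a minimal sketch with hypothetical numbers:

```python
import math

se, n = 0.8, 64            # reported standard error and sample size
sd = se * math.sqrt(n)     # SD = SE * sqrt(n)
print(sd)                  # 6.4
```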
Extracting transformed rather than raw data. Some studies report log-transformed means, adjusted odds ratios, or age-standardized rates. These are not interchangeable with their raw counterparts. Your extraction form should specify which version to extract, and the coding manual should include decision rules for studies that report only transformed data.
Ignoring multi-arm trials. A three-arm trial contributes two comparisons to your meta-analysis, but the shared control group cannot be counted twice without inflating the total sample size. During extraction, flag multi-arm trials and record data for each arm separately so that the analytical adjustment (splitting the control group or using multivariate methods) can be applied correctly.
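One adjustment described in the Cochrane Handbook is to split the shared control group across the comparisons it serves. A minimal sketch, with hypothetical participant counts:

```python
def split_control(n_control, n_comparisons):
    """Divide shared control participants evenly across comparisons."""
    base = n_control // n_comparisons
    # Distribute any remainder so the totals still sum to n_control
    return [base + (1 if i < n_control % n_comparisons else 0)
            for i in range(n_comparisons)]

# Three-arm trial: 90 controls shared by two intervention comparisons
print(split_control(90, 2))  # [45, 45], so no participant is counted twice
```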
Failing to record the source of each number. Without page and table references, verification is impossible. When a discrepancy is discovered during consensus or audit, reviewers must return to the original study and locate the exact source of each number. If no reference was recorded, this search can take longer than the original extraction.
Not distinguishing between "not reported" and "zero." A study that does not report adverse events is not the same as a study that reports zero adverse events. Your extraction form must distinguish between these two situations, as they have different implications for meta-analysis (particularly for rare events).
These mistakes are preventable with a well-designed form, a thorough coding manual, and a pilot phase that surfaces ambiguities before full extraction begins. The investment in form design and piloting pays dividends in data quality, reviewer efficiency, and the credibility of your final results.
Data extraction is the methodological backbone of every systematic review and meta-analysis. It determines the accuracy, completeness, and reproducibility of your synthesized evidence. By designing a structured form, piloting it on diverse studies, extracting independently with two reviewers, measuring agreement with Cohen's kappa, handling missing data transparently, and verifying data before analysis, you build a dataset that can withstand scrutiny from peer reviewers, editors, and the broader research community. For a comprehensive framework that ties extraction into the larger review process, see our step-by-step systematic review guide.