Extract time-to-event data from Kaplan-Meier curves and estimate hazard ratios for meta-analysis. Enter digitized time points, survival probabilities, and number at risk to reconstruct survival statistics using established methods.
| Time | Survival Prob. | No. at Risk | |
|---|---|---|---|
Enter the required data above and select an estimation method to see results. For the median method, ensure survival data crosses 0.5 in both arms.
Limitations and Notes
Input the time points and corresponding survival probabilities extracted from the Kaplan-Meier curve. Add number at risk data if available from the table below the curve.
Enter the total sample size and total events for each arm. For two-arm comparisons, provide data for both the treatment and control groups separately.
Choose from three methods: simple HR from median survival, Tierney method from p-value and events, or direct entry of published HR and CI for validation.
Review the estimated hazard ratio, log(HR), standard error, and median survival times. Copy the formatted results for direct use in your meta-analysis.
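The median survival time in the results can be read from the digitized points as the first time at which the survival probability reaches 0.5 or below. The following is a minimal sketch under that reading, not the tool's actual implementation; the function name `km_median` and the example values are hypothetical.

```python
from typing import Optional, Sequence


def km_median(times: Sequence[float], surv: Sequence[float]) -> Optional[float]:
    """Return the earliest digitized time with survival <= 0.5, or None.

    None means the curve never crosses 0.5, so the median is not reached
    and the median-based HR method cannot be used for that arm.
    """
    for t, s in sorted(zip(times, surv)):
        if s <= 0.5:
            return t
    return None


# Hypothetical digitized points (time in months, survival probability).
times = [0, 6, 12, 18, 24]
treatment_median = km_median(times, [1.0, 0.90, 0.72, 0.55, 0.48])
control_median = km_median(times, [1.0, 0.80, 0.58, 0.45, 0.33])
not_reached = km_median([0, 6, 12], [1.0, 0.9, 0.8])

print(treatment_median, control_median, not_reached)
```

With sparse digitization the returned time can overshoot the true median, which is one reason to capture every visible step of the curve.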
For meta-analysis of survival outcomes, the hazard ratio (HR) is the standard effect measure because it accounts for both the number and the timing of events across the entire follow-up period. Unlike median survival or event rates at fixed time points, the HR summarizes the relative instantaneous event rate between arms over the whole curve. When the HR and its variance are not directly reported, they must be estimated from available data using methods like those described by Tierney and Parmar.
When digitizing KM curves, the number at risk table matters for accurate data reconstruction. Without it, reconstruction methods have little information to separate events from censorings, which can bias the HR estimate. Always extract number at risk data when available, even if only at a few time points. Some journals report this table below the KM curve, while others include it as supplementary material.
Tierney and Parmar describe a hierarchy of methods depending on available data. The most precise approach uses the reported HR and CI directly. Next best is using the log-rank p-value with total events. When only KM curves are available, digitized coordinates with number at risk data enable IPD reconstruction. The least precise method uses only median survival times. Always use the highest-tier method available for each study.
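For the top tier, a reported HR and confidence interval can be converted to log(HR) and its standard error, because the interval is symmetric on the log scale. This is a small sketch assuming a two-sided interval based on the normal approximation; the function name `log_hr_from_ci` and the example numbers are illustrative only.

```python
import math
from statistics import NormalDist
from typing import Tuple


def log_hr_from_ci(hr: float, lower: float, upper: float,
                   level: float = 0.95) -> Tuple[float, float]:
    """Convert a reported HR and CI to (log HR, standard error).

    Assumes the CI was computed as exp(log HR +/- z * SE).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    log_hr = math.log(hr)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return log_hr, se


# Hypothetical published result: HR 0.75 (95% CI 0.60 to 0.94).
log_hr, se = log_hr_from_ci(0.75, 0.60, 0.94)
print(round(log_hr, 4), round(se, 4))
```

A quick consistency check is that the geometric mean of the two limits should sit close to the reported HR; a large gap suggests rounding or a transcription error.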
Because different estimation methods have different precision levels, it is critical to document which method was used for each study in your meta-analysis. This allows readers to assess potential bias and enables sensitivity analyses restricted to studies where higher-quality HR estimates were obtainable. Report the extraction method in your systematic review methods section and supplementary materials.
A survival curve digitizer enables systematic reviewers to extract numerical time-to-event data from published Kaplan-Meier (KM) plots when the underlying statistics are not reported in the manuscript. This situation is common in oncology, cardiovascular, and surgical research: a trial may present overall survival or progression-free survival as a KM curve without providing the hazard ratio, its confidence interval, or the log-rank test statistic needed for meta-analytic pooling. The Cochrane Handbook (Higgins et al., 2023) identifies time-to-event outcomes as requiring special extraction methods, and the Tierney et al. (2007) framework remains the authoritative reference for estimating hazard ratios from various forms of incompletely reported survival data.
The Kaplan-Meier data extractor workflow begins with digitization: reading coordinate pairs (time, survival probability) from the stepped KM curve using dedicated software. WebPlotDigitizer is a widely used open-source companion tool for coordinate extraction from published figures, while DigitizeIt and Engauge Digitizer offer alternative desktop applications with semi-automated curve tracing and axis calibration features. The accuracy of downstream estimates depends directly on the number and placement of extracted points. Best practice, as described by Guyot et al. (2012), is to capture every visible step in the curve and to record the number at risk at each reported time point. These number-at-risk values are critical because they encode censoring information: without them, a reconstruction algorithm has little basis for distinguishing patients who experienced the event from those who were lost to follow-up or remained event-free at the analysis cutoff.
Once digitized coordinates and number-at-risk data are available, a hazard ratio calculator from KM curve data applies the Tierney-Parmar method hierarchy to estimate the log hazard ratio and its variance. The most precise approach uses the reported HR and confidence interval directly. The next best uses the log-rank p-value combined with the total number of events: the p-value is converted to a standard normal z-score (the square root of the chi-squared statistic), which is then scaled by the log-rank variance to back-calculate the log(HR). When only the KM curve is available, individual patient data (IPD) can be approximately reconstructed using the Guyot et al. (2012) algorithm, which iteratively estimates event and censoring times from digitized coordinates and number-at-risk tables; the HR is then estimated from the reconstructed dataset using Cox regression. Before applying Cox regression to reconstructed IPD, researchers should assess whether the proportional hazards assumption holds. Violations, indicated by crossing survival curves or non-constant hazard ratios over time, may call for alternative time-to-event metrics such as the restricted mean survival time (RMST), which compares the area under each survival curve up to a pre-specified time horizon and does not rely on the proportional hazards assumption. The least precise method, dividing the control group median survival by the treatment group median, assumes exponential (constant-hazard) survival, which rarely holds in practice but may be the only option for poorly reported studies.
Quality assurance in survival data extraction calls for dual independent digitization, where two reviewers independently extract coordinates from the same KM curve and discrepancies are resolved by consensus. PRISMA 2020 (Page et al., 2021) asks systematic reviews to describe the data collection process in enough detail for replication. For survival outcomes, that means reporting which software was used for digitization, which Tierney-Parmar method tier was applied to each study, and how disagreements between extractors were resolved. Documenting the extraction method per study also enables sensitivity analyses, for instance restricting the meta-analysis to studies where the HR was directly reported versus studies where it was estimated from KM curves. Our effect size calculator can then convert these hazard ratios into other metrics if needed for comparison across outcome types.
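The RMST mentioned above is the area under the survival curve up to a chosen horizon. The sketch below treats the digitized points as the left ends of the steps of a step function and sums the rectangles. It is an illustration under that assumption rather than a validated estimator; `rmst`, `tau`, and the example data are hypothetical.

```python
from typing import Sequence


def rmst(times: Sequence[float], surv: Sequence[float], tau: float) -> float:
    """Area under a step survival curve from time 0 to tau.

    Each digitized point (t, s) is taken to mean S = s from t until the
    next digitized time. The curve is assumed to start at S(0) = 1.
    """
    points = sorted(zip(times, surv))
    if points[0][0] > 0:
        points.insert(0, (0.0, 1.0))
    if tau > points[-1][0]:
        raise ValueError("tau lies beyond the last digitized time point")
    area = 0.0
    for (t0, s0), (t1, _s1) in zip(points, points[1:]):
        if t0 >= tau:
            break
        area += s0 * (min(t1, tau) - t0)
    return area


# Hypothetical digitized curves (months), compared at a 24-month horizon.
times = [0, 6, 12, 18, 24]
rmst_treatment = rmst(times, [1.0, 0.90, 0.72, 0.55, 0.48], tau=24)
rmst_control = rmst(times, [1.0, 0.80, 0.58, 0.45, 0.33], tau=24)
print(rmst_treatment - rmst_control)  # difference in mean event-free months
```

The horizon `tau` should be fixed in advance and lie within the follow-up of both arms; the sketch raises an error rather than extrapolating past the last digitized point.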
The broader context of survival meta-analysis connects digitization to several other methodological steps. Before extracting data, reviewers should assess the risk of bias in each trial using domain-appropriate tools such as our RoB 2 assessment tool for randomized trials or ROBINS-I for non-randomized studies. After extraction, pooled hazard ratios are typically visualized with a forest plot that displays individual study HRs as weighted squares and the summary estimate as a diamond. Heterogeneity assessment via I-squared and tau-squared — available through our heterogeneity calculator — determines whether a single summary HR is meaningful or whether subgroup analyses are warranted. Together, these steps form a rigorous pipeline that transforms published KM curves into quantitative evidence suitable for informing clinical guidelines and health policy decisions.
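The pooling and heterogeneity steps can be illustrated with a short inverse-variance calculation on the log(HR) scale. This sketch assumes a fixed-effect summary with Cochran's Q, I-squared, and the DerSimonian-Laird estimate of tau-squared; the function name `pool_log_hrs` and the three example studies are made up for illustration.

```python
import math
from typing import Dict, Sequence


def pool_log_hrs(log_hrs: Sequence[float], ses: Sequence[float]) -> Dict[str, float]:
    """Inverse-variance pooling of log hazard ratios with heterogeneity stats."""
    weights = [1.0 / se ** 2 for se in ses]
    sum_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_hrs)) / sum_w
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_hrs))
    df = len(log_hrs) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # DerSimonian-Laird between-study variance.
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    return {
        "log_hr": pooled,
        "se": math.sqrt(1.0 / sum_w),
        "hr": math.exp(pooled),
        "q": q,
        "i2": i2,
        "tau2": tau2,
    }


# Three hypothetical studies: HRs with standard errors of log(HR).
result = pool_log_hrs(
    [math.log(0.70), math.log(0.85), math.log(0.60)],
    [0.15, 0.20, 0.25],
)
print({k: round(v, 3) for k, v in result.items()})
```

Each study's weight, 1/SE squared, is what a forest plot draws as the size of its square.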
Kaplan-Meier (KM) curve digitization is the process of extracting numerical data from published survival curves when the original data are not reported. Many clinical trials report survival outcomes as KM curves without providing the underlying hazard ratios, median survival times, or individual patient data (IPD). Systematic reviews of time-to-event outcomes require these numerical values to pool results in a meta-analysis. Digitization tools help researchers read coordinates from the curve and reconstruct the survival data needed for quantitative synthesis.
The Tierney et al. (2007) method provides a framework for estimating hazard ratios (HRs) and their variances from various forms of published survival data. When a study reports a p-value from a log-rank test and the total number of events, the p-value is converted to a z-score, the log-rank variance V is approximated from the total events and the allocation between arms, and log(HR) is estimated as (O - E)/V with variance 1/V, where O - E is the observed minus expected events in the treatment arm. The method also covers scenarios where only KM curves are available, where reported statistics vary (e.g., median survival times, event counts), or where HRs are reported without confidence intervals. It is the standard reference for time-to-event data extraction in systematic reviews.
Number at risk tables, typically shown below KM curves, report how many patients remain in the study at specific time points. These numbers are critical for accurate data reconstruction because they provide censoring information. Without number at risk data, reconstruction has to rely on assumptions about censoring, because the curve alone does not show how many patients were censored (lost to follow-up or still alive at the end of the study) between steps. When available, number at risk data substantially improve the accuracy of reconstructed survival curves and estimated hazard ratios.
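The log-rank route can be sketched numerically. The version below assumes a two-sided p-value and approximates the log-rank variance as V = (total events x n_treatment x n_control) / (n_treatment + n_control)^2, which reduces to events/4 under equal allocation. The function name and its parameters are hypothetical, and the direction of effect has to be supplied by the reviewer because a p-value carries no sign.

```python
import math
from statistics import NormalDist
from typing import Tuple


def log_hr_from_logrank_p(p_two_sided: float, total_events: int,
                          n_treatment: int, n_control: int,
                          favours_treatment: bool = True) -> Tuple[float, float]:
    """Estimate (log HR, SE) from a log-rank p-value and total events."""
    # Approximate log-rank variance from events and allocation.
    v = total_events * n_treatment * n_control / (n_treatment + n_control) ** 2
    z = NormalDist().inv_cdf(1 - p_two_sided / 2)
    o_minus_e = math.sqrt(v) * z
    if favours_treatment:
        o_minus_e = -o_minus_e  # fewer events than expected on treatment
    return o_minus_e / v, 1 / math.sqrt(v)


# Hypothetical trial: p = 0.05, 100 events, 100 patients per arm.
log_hr, se = log_hr_from_logrank_p(0.05, 100, 100, 100)
print(round(math.exp(log_hr), 3), round(se, 3))
```

Rounded p-values such as "p < 0.05" only give a bound on the z-score, so estimates derived from them are conservative rather than exact.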
Approximate IPD can be reconstructed from KM curves using the algorithm described by Guyot et al. (2012). This method uses digitized curve coordinates and number at risk data to estimate individual event and censoring times. However, the reconstructed IPD is an approximation and may not perfectly match the original data, particularly when curves have few patients at risk in later time points. The accuracy depends on the quality of digitization, the number of time points extracted, and the availability of number at risk information.
When only median survival times are available for two groups, a crude HR estimate can be calculated as: HR = median_control / median_treatment. This follows from assuming exponential survival distributions (constant hazard over time), under which each arm's hazard equals ln(2) divided by its median, so the ratio of hazards is the inverse ratio of medians. That assumption is often unrealistic. The log of this ratio gives the log(HR), and the standard error can be approximated using the total number of events. This method is the least precise of available approaches and should only be used when no other data are reported. Always note this limitation when using median-derived HRs in your meta-analysis.
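A worked example of the median-ratio estimate follows. The HR formula is the one stated above. The standard error uses sqrt(4 / total events), which is an assumption on our part (it corresponds to equal allocation and roughly equal follow-up) and not necessarily the approximation this calculator applies. The function name is illustrative.

```python
import math
from typing import Tuple


def hr_from_medians(median_control: float, median_treatment: float,
                    total_events: int) -> Tuple[float, float, float]:
    """Crude HR from median survival times, assuming exponential survival.

    Returns (HR, log HR, approximate SE of log HR).
    """
    hr = median_control / median_treatment
    log_hr = math.log(hr)
    se = math.sqrt(4 / total_events)  # assumed approximation, see lead-in
    return hr, log_hr, se


# Hypothetical arms: control median 12 months, treatment median 18 months.
hr, log_hr, se = hr_from_medians(12.0, 18.0, total_events=100)
print(round(hr, 3), round(log_hr, 3), round(se, 3))
```

An HR below 1 here means the treatment arm has the longer median, consistent with the usual convention of treatment over control.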
Use the Tierney et al. (2007) method: digitize survival probabilities at each time point where at-risk numbers are reported, reconstruct the number of events per interval, then estimate the log hazard ratio and its variance using inverse-variance weighting across intervals. Tools like WebPlotDigitizer extract coordinates from published figures, while the Guyot et al. (2012) algorithm reconstructs individual patient data (IPD) from the curve.
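The interval-based estimate can be sketched as follows. This is a deliberately simplified illustration: it takes the number at risk at the start of each interval, derives events from the relative drop in survival, ignores censoring within the interval, and skips intervals with no events in either arm. The published method adjusts the numbers at risk for censoring, so treat the function names and the simplifications here as assumptions.

```python
import math
from typing import List, Tuple

# One interval per arm: (number at risk at start, S at start, S at end).
Interval = Tuple[float, float, float]


def interval_events(n_start: float, s_start: float, s_end: float) -> float:
    """Events implied by the drop in survival, ignoring interval censoring."""
    return n_start * (1 - s_end / s_start)


def pooled_log_hr(treatment: List[Interval],
                  control: List[Interval]) -> Tuple[float, float]:
    """Inverse-variance pooled (log HR, SE) across matching intervals."""
    weighted_sum = 0.0
    weight_total = 0.0
    for (n_t, s0_t, s1_t), (n_c, s0_c, s1_c) in zip(treatment, control):
        d_t = interval_events(n_t, s0_t, s1_t)
        d_c = interval_events(n_c, s0_c, s1_c)
        if d_t <= 0 or d_c <= 0:
            continue  # interval carries no usable information
        log_hr_i = math.log((d_t / n_t) / (d_c / n_c))
        var_i = 1 / d_t - 1 / n_t + 1 / d_c - 1 / n_c
        weighted_sum += log_hr_i / var_i
        weight_total += 1 / var_i
    if weight_total == 0:
        raise ValueError("no interval had events in both arms")
    return weighted_sum / weight_total, math.sqrt(1 / weight_total)


# Hypothetical two-interval example.
treatment = [(100, 1.0, 0.90), (80, 0.90, 0.72)]
control = [(100, 1.0, 0.80), (70, 0.80, 0.56)]
log_hr, se = pooled_log_hr(treatment, control)
print(round(math.exp(log_hr), 3), round(se, 3))
```

Because within-interval censoring is ignored, intervals should be kept short (one per reported number at risk) so that the bias from this shortcut stays small.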
A risk ratio compares the probability of an event at a single time point, while a hazard ratio compares the instantaneous rate of events over time, accounting for censoring. Hazard ratios from Cox regression are the standard effect measure for time-to-event outcomes in meta-analysis. Unlike risk ratios, hazard ratios use all available follow-up data and handle variable follow-up durations.
Yes. The Tierney et al. (2007) method allows estimation of hazard ratios and their variances from published Kaplan-Meier curves combined with at-risk tables. Guyot et al. (2012) reconstruct pseudo-IPD from digitized curves. These aggregate data methods are standard in systematic reviews when IPD are unavailable, which is the case in most reviews. Always report the extraction method and any assumptions made.
Once you have extracted hazard ratios and their standard errors, use our Effect Size Calculator to convert them into other effect size metrics. Then pool and display your time-to-event estimates with the Forest Plot Generator, which creates publication-ready forest plots showing weighted study HRs and a diamond for the summary estimate of your survival meta-analysis.
Our methodologists can perform dual-independent KM curve digitization, IPD reconstruction, and time-to-event meta-analysis with full documentation of extraction methods for every included study.