Survival Curve Digitizer Helper

Extract time-to-event data from Kaplan-Meier curves and estimate hazard ratios for meta-analysis. Enter digitized time points, survival probabilities, and number at risk to reconstruct survival statistics using established methods.

Load sample data to see how the tool works, or clear all fields to start fresh.

Treatment Arm

Time	Survival Prob.	No. at Risk

Control Arm

Time	Survival Prob.	No. at Risk

Study Parameters

Total N (treatment)

Total N (control)

Events (treatment)

Events (control)

Estimation Method

From Median Survival Times

Estimates HR from ratio of median survival times (assumes exponential distribution).

Tierney Method (p-value + events)

Estimates HR from log-rank p-value and total number of events (Tierney et al. 2007).

Direct HR Entry (validation)

Enter a published HR and CI directly for validation and format conversion.

Results

Enter the required data above and select an estimation method to see results. For the median method, ensure survival data crosses 0.5 in both arms.

Limitations and Notes

The median survival method assumes an exponential distribution (constant hazard), which is often unrealistic. Use the Tierney method when a log-rank p-value is available.
This tool helps format and estimate HR from extracted data but does not replace dedicated digitization software (e.g., WebPlotDigitizer, Engauge Digitizer) for extracting coordinates from images.
For IPD reconstruction from KM curves, use the Guyot et al. (2012) algorithm, which requires number at risk data at multiple time points.
Always report which estimation method was used for each study in your systematic review methods section.

How to Use This Tool

Enter Curve Data

Input the time points and corresponding survival probabilities extracted from the Kaplan-Meier curve. Add number at risk data if available from the table below the curve.

Set Study Parameters

Enter the total sample size and total events for each arm. For two-arm comparisons, provide data for both the treatment and control groups separately.

Select Estimation Method

Choose from three methods: simple HR from median survival, Tierney method from p-value and events, or direct entry of published HR and CI for validation.

Copy Results

Review the estimated hazard ratio, log(HR), standard error, and median survival times. Copy the formatted results for direct use in your meta-analysis.

Key Takeaways for Survival Data Extraction

Hazard ratios are the preferred time-to-event metric

For meta-analysis of survival outcomes, the hazard ratio (HR) is the standard effect measure because it accounts for both the number and timing of events across the entire follow-up period. Unlike median survival or event rates at fixed time points, the HR compares the instantaneous event rates of the two arms over the whole curve and accommodates censoring. When the HR and its variance are not directly reported, they must be estimated from available data using methods like those described by Tierney and Parmar.

Number at risk data dramatically improve accuracy

When digitizing KM curves, the number at risk table is essential for accurate data reconstruction. Without it, reconstruction algorithms cannot distinguish events from censorings, which can bias the estimated HR. Always extract number at risk data when available, even if only at a few time points. Some journals report this table below the KM curve, while others include it as supplementary material.

The Tierney method hierarchy guides data extraction strategy

Tierney and Parmar describe a hierarchy of methods depending on available data. The most precise approach uses the reported HR and CI directly. Next best is using the log-rank p-value with total events. When only KM curves are available, digitized coordinates with number at risk data enable IPD reconstruction. The least precise method uses only median survival times. Always use the highest-tier method available for each study.
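The top tier of the hierarchy, using a reported HR and confidence interval, reduces to a short conversion on the log scale. The sketch below assumes a two-sided 95% interval that is symmetric on the log scale; the function name `log_hr_from_ci` is illustrative and not part of the tool.

```python
import math


def log_hr_from_ci(hr, ci_lower, ci_upper, z=1.959964):
    """Convert a reported HR and its CI to log(HR) and a standard error.

    Assumes the interval is symmetric on the log scale and that z matches
    the reported confidence level (1.959964 for a two-sided 95% CI).
    """
    if min(hr, ci_lower, ci_upper) <= 0:
        raise ValueError("HR and CI limits must be positive")
    log_hr = math.log(hr)
    # The CI width on the log scale spans 2 * z standard errors.
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * z)
    return log_hr, se


# Hypothetical published result: HR 0.75 (95% CI 0.60 to 0.94)
log_hr, se = log_hr_from_ci(0.75, 0.60, 0.94)
print(f"log(HR) = {log_hr:.4f}, SE = {se:.4f}")
```

If the back-calculated interval midpoint on the log scale differs noticeably from log(HR), the published values may have been rounded or the interval may not be a 95% interval.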

Document your extraction method for each study

Because different estimation methods have different precision levels, it is critical to document which method was used for each study in your meta-analysis. This allows readers to assess potential bias and enables sensitivity analyses restricted to studies where higher-quality HR estimates were obtainable. Report the extraction method in your systematic review methods section and supplementary materials.

Extracting Time-to-Event Data From Published Survival Curves

A survival curve digitizer enables systematic reviewers to extract numerical time-to-event data from published Kaplan-Meier (KM) plots when the underlying statistics are not reported in the manuscript. This situation is common in oncology, cardiovascular, and surgical research: a trial may present overall survival or progression-free survival as a KM curve without providing the hazard ratio, its confidence interval, or the log-rank test statistic needed for meta-analytic pooling. The Cochrane Handbook (Higgins et al., 2023) identifies time-to-event outcomes as requiring special extraction methods, and Tierney et al. (2007), building on Parmar et al. (1998), remains the standard reference for estimating hazard ratios from incompletely reported survival data.

The Kaplan-Meier data extractor workflow begins with digitization, reading coordinate pairs (time, survival probability) from the stepped KM curve using dedicated software. WebPlotDigitizer is the most widely used open-source companion tool for coordinate extraction from published figures, while DigitizeIt and Engauge Digitizer offer alternative desktop applications with semi-automated curve tracing and axis calibration features. The accuracy of downstream estimates depends directly on the number and placement of extracted points. Best practice, as described by Guyot et al. (2012), is to capture every visible step in the curve and to record the number at risk at each reported time point. These number-at-risk values are critical because they encode censoring information: without them, the algorithm cannot distinguish patients who experienced the event from those who were lost to follow-up or remained event-free at the analysis cutoff.

Once digitized coordinates and number-at-risk data are available, a hazard ratio calculator from KM curve data applies the Tierney-Parmar method hierarchy to estimate the log hazard ratio and its variance. The most precise approach uses the reported HR and confidence interval directly; the next best uses the log-rank p-value combined with the total number of events to back-calculate the log(HR) via the relationship between the chi-squared statistic and the normal distribution. When only the KM curve is available, individual patient data (IPD) can be approximately reconstructed using the Guyot et al. (2012) algorithm, which iteratively estimates event and censoring times from digitized coordinates and number-at-risk tables, and the HR is then estimated from the reconstructed dataset using Cox regression. Before applying Cox regression to reconstructed IPD, researchers should assess whether the proportional hazards assumption holds. Violations, indicated by crossing survival curves or non-constant hazard ratios over time, may necessitate alternative time-to-event metrics such as the restricted mean survival time (RMST), which compares the area under each survival curve up to a pre-specified time horizon and avoids the proportional hazards assumption entirely. The least precise method, dividing the control group median survival by the treatment group median, assumes exponential (constant-hazard) survival, which rarely holds in practice but may be the only option for poorly reported studies.
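As a rough illustration of the RMST idea mentioned above, the area under a digitized step curve up to a horizon can be summed rectangle by rectangle. This is a minimal sketch, assuming the digitized points are sorted by time, each point marks a downward step, and survival is 1.0 at time 0; the function name `rmst` and the sample values are hypothetical.

```python
def rmst(times, surv, tau):
    """Area under a right-continuous KM step curve from 0 to tau.

    times: step times in ascending order
    surv:  survival probability from each step time onward
    tau:   pre-specified time horizon
    """
    area = 0.0
    prev_t, prev_s = 0.0, 1.0  # assumption: S(0) = 1
    for t, s in zip(times, surv):
        if t >= tau:
            break
        # Survival stays at prev_s until the next step at time t.
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    # Final rectangle from the last step before tau up to tau.
    area += prev_s * (tau - prev_t)
    return area


# Hypothetical digitized arms (months)
treatment = rmst([6, 12, 18], [0.90, 0.75, 0.60], tau=24)
control = rmst([6, 12, 18], [0.80, 0.60, 0.40], tau=24)
print(f"RMST difference at 24 months: {treatment - control:.2f}")
```

The horizon should not exceed the shortest follow-up among the arms being compared, since the curve beyond the last observed time is unknown. A variance for the RMST difference needs the reconstructed IPD and is not covered by this sketch.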

Quality assurance in survival data extraction is best served by duplicate independent digitization, where two reviewers independently extract coordinates from the same KM curve and discrepancies are resolved by consensus. PRISMA 2020 (Page et al., 2021) asks that systematic reviews describe the data extraction process in enough detail for replication, including which software was used for digitization, which Tierney-Parmar method tier was applied to each study, and how disagreements between extractors were resolved. Documenting the extraction method per study also enables sensitivity analyses, for instance restricting the meta-analysis to studies where the HR was directly reported versus studies where it was estimated from KM curves. Our effect size calculator can then convert these hazard ratios into other metrics if needed for comparison across outcome types.

The broader context of survival meta-analysis connects digitization to several other methodological steps. Before extracting data, reviewers should assess the risk of bias in each trial using domain-appropriate tools such as our RoB 2 assessment tool for randomized trials or ROBINS-I for non-randomized studies. After extraction, pooled hazard ratios are typically visualized with a forest plot that displays individual study HRs as weighted squares and the summary estimate as a diamond. Heterogeneity assessment via I-squared and tau-squared, available through our heterogeneity calculator, determines whether a single summary HR is meaningful or whether subgroup analyses are warranted. Together, these steps form a rigorous pipeline that transforms published KM curves into quantitative evidence suitable for informing clinical guidelines and health policy decisions.
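The pooling and heterogeneity steps described above can be sketched with inverse-variance weights on the log(HR) scale. The example below assumes a DerSimonian-Laird estimate of tau-squared, which is one common choice among several; the function name `pool_log_hrs` and the input values are hypothetical, and a dedicated package such as metafor is the better option for a real analysis.

```python
import math


def pool_log_hrs(log_hrs, ses):
    """Inverse-variance pooling of log hazard ratios.

    Returns the fixed-effect estimate, Cochran's Q, the DerSimonian-Laird
    tau-squared, I-squared (percent), and the random-effects estimate with
    its standard error.
    """
    w = [1 / se ** 2 for se in ses]
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, log_hrs)) / sw
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_hrs))
    df = len(log_hrs) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # Random-effects weights add tau2 to each within-study variance.
    w_re = [1 / (se ** 2 + tau2) for se in ses]
    random = sum(wi * y for wi, y in zip(w_re, log_hrs)) / sum(w_re)
    return {
        "fixed": fixed,
        "q": q,
        "tau2": tau2,
        "i2": i2,
        "random": random,
        "se_random": math.sqrt(1 / sum(w_re)),
    }


# Hypothetical extracted studies: log(HR) and SE
res = pool_log_hrs([-0.29, -0.11, -0.45], [0.11, 0.20, 0.25])
print(f"Pooled HR (random effects): {math.exp(res['random']):.2f}, "
      f"I2 = {res['i2']:.0f}%")
```

Pooling is done on the log scale because log(HR) is approximately normally distributed; the pooled value is exponentiated only for reporting.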

Frequently Asked Questions

What is Kaplan-Meier curve digitization and why is it needed?

Kaplan-Meier (KM) curve digitization is the process of extracting numerical data from published survival curves when the original data are not reported. Many clinical trials report survival outcomes as KM curves without providing the underlying hazard ratios, median survival times, or individual patient data (IPD). Systematic reviews of time-to-event outcomes require these numerical values to pool results in a meta-analysis. Digitization tools help researchers read coordinates from the curve and reconstruct the survival data needed for quantitative synthesis.

What is the Tierney and Parmar method for estimating hazard ratios?

The Tierney et al. (2007) method, which builds on Parmar et al. (1998), provides a framework for estimating hazard ratios (HRs) and their variances from various forms of published survival data. When a study reports a p-value from a log-rank test and the total number of events, the HR can be estimated using the relationship between the log-rank statistic, observed events, and the HR. The method also covers scenarios where only KM curves are available, where reported statistics vary (e.g., median survival times, event counts), or where HRs are reported without confidence intervals. It is the standard reference for time-to-event data extraction in systematic reviews.
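The p-value route can be sketched as follows. This assumes a two-sided log-rank p-value, approximates the log-rank variance as total events times the product of the arm sizes divided by the squared total sample size, and takes the direction of effect from the reviewer, because a p-value alone does not say which arm did better. The function name and arguments are illustrative, not the tool's internals.

```python
import math
from statistics import NormalDist


def log_hr_from_logrank_p(p_two_sided, total_events, n_treatment, n_control,
                          favours_treatment=True):
    """Estimate log(HR) and its SE from a log-rank p-value and total events."""
    if not 0 < p_two_sided < 1:
        raise ValueError("p-value must lie strictly between 0 and 1")
    # z-score corresponding to the two-sided p-value.
    z = NormalDist().inv_cdf(1 - p_two_sided / 2)
    # Approximate variance of the log-rank statistic (O - E).
    v = total_events * n_treatment * n_control / (n_treatment + n_control) ** 2
    log_hr = z / math.sqrt(v)  # equals (O - E) / V
    if favours_treatment:
        log_hr = -log_hr  # HR below 1 when the treatment arm does better
    se = 1 / math.sqrt(v)
    return log_hr, se


# Hypothetical trial: p = 0.03, 120 events, 150 patients per arm
log_hr, se = log_hr_from_logrank_p(0.03, 120, 150, 150)
print(f"HR = {math.exp(log_hr):.2f}, SE of log(HR) = {se:.3f}")
```

Papers that report the p-value only as a threshold (for example "p < 0.05") give a conservative HR estimate with this approach, which is worth flagging in the extraction notes.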

Why are number at risk tables important for survival data extraction?

Number at risk tables, typically shown below KM curves, report how many patients remain in the study at specific time points. These numbers are critical for accurate data reconstruction because they provide censoring information. Without number at risk data, it is impossible to distinguish between events (deaths) and censored observations (patients lost to follow-up or still alive at end of study). When available, number at risk data substantially improve the accuracy of reconstructed survival curves and estimated hazard ratios.
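To make the point about censoring concrete, the sketch below splits each interval between reported number-at-risk values into events and censorings. It is a deliberately crude illustration and not the Guyot et al. (2012) algorithm: it assumes all censoring in an interval happens at the end of the interval, so the risk set is constant while events occur. The function name and the sample values are hypothetical.

```python
def interval_events_and_censoring(surv, n_risk):
    """Split each interval's drop in number at risk into events and censorings.

    surv[k] and n_risk[k] are the survival probability and number at risk at
    the start of interval k. Assumes censoring occurs only at interval ends.
    """
    result = []
    for k in range(len(surv) - 1):
        # With a constant risk set, the KM drop implies this many events.
        events = round(n_risk[k] * (1 - surv[k + 1] / surv[k]))
        # Whatever else left the risk set is attributed to censoring.
        censored = max(0, n_risk[k] - events - n_risk[k + 1])
        result.append((events, censored))
    return result


# Hypothetical arm read at 0, 12, and 24 months
print(interval_events_and_censoring([1.0, 0.8, 0.6], [100, 75, 50]))
```

Without the number-at-risk column, the second quantity in each pair cannot be computed at all, which is why curve coordinates alone leave events and censorings confounded.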

Can individual patient data (IPD) be fully reconstructed from KM curves?

Approximate IPD can be reconstructed from KM curves using the algorithm described by Guyot et al. (2012). This method uses digitized curve coordinates and number at risk data to estimate individual event and censoring times. However, the reconstructed IPD is an approximation and may not perfectly match the original data, particularly when curves have few patients at risk in later time points. The accuracy depends on the quality of digitization, the number of time points extracted, and the availability of number at risk information.

How do I estimate a hazard ratio when only median survival times are reported?

When only median survival times are available for two groups, a crude HR estimate can be calculated as: HR = median_control / median_treatment. This assumes exponential survival distributions (constant hazard over time), which is often unrealistic. The log of this ratio gives the log(HR), and the standard error can be approximated using the total number of events. This method is the least precise of available approaches and should only be used when no other data are reported. Always note this limitation when using median-derived HRs in your meta-analysis.
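The median-based estimate can be sketched in a few lines. The median is read as the first digitized time at which survival reaches 0.5 or lower, and the standard error uses the approximation sqrt(1/events_treatment + 1/events_control), which is an assumption of this sketch rather than a formula given above. Function names and sample values are hypothetical.

```python
import math


def median_from_curve(times, surv):
    """First digitized time at which survival is at or below 0.5, else None."""
    for t, s in zip(times, surv):
        if s <= 0.5:
            return t
    return None  # the curve never crosses 0.5, so the median is not reached


def hr_from_medians(median_treatment, median_control,
                    events_treatment, events_control):
    """Crude HR under exponential survival: median_control / median_treatment."""
    hr = median_control / median_treatment
    # Assumed approximation for the SE of log(HR), based on event counts.
    se = math.sqrt(1 / events_treatment + 1 / events_control)
    return hr, math.log(hr), se


# Hypothetical digitized arms (months)
m_t = median_from_curve([0, 6, 12, 18], [1.0, 0.85, 0.62, 0.48])
m_c = median_from_curve([0, 6, 12, 18], [1.0, 0.70, 0.45, 0.30])
hr, log_hr, se = hr_from_medians(m_t, m_c, 40, 50)
print(f"HR = {hr:.2f}, log(HR) = {log_hr:.3f}, SE = {se:.3f}")
```

Because the median is taken at the nearest digitized point, sparse digitization around the 0.5 crossing adds error on top of the exponential assumption.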

How do I extract a hazard ratio from a Kaplan-Meier curve?

Use the Tierney et al. (2007) method: digitize survival probabilities at each time point where at-risk numbers are reported, reconstruct the number of events per interval, then estimate the log hazard ratio and its variance using inverse-variance weighting across intervals. Tools like WebPlotDigitizer extract coordinates from published figures, while the Guyot et al. (2012) algorithm reconstructs individual patient data (IPD) from the curve.

What is the difference between a hazard ratio and a risk ratio?

A risk ratio compares the probability of an event at a single time point, while a hazard ratio compares the instantaneous rate of events over time, accounting for censoring. Hazard ratios from Cox regression are the standard effect measure for time-to-event outcomes in meta-analysis. Unlike risk ratios, hazard ratios use all available follow-up data and handle variable follow-up durations.

Can I meta-analyze Kaplan-Meier data without individual patient data?

Yes. The methods of Parmar et al. (1998) and Tierney et al. (2007) allow estimation of hazard ratios and their variances from published Kaplan-Meier curves combined with at-risk tables. Guyot et al. (2012) reconstruct pseudo-IPD from digitized curves. These aggregate data methods are standard in systematic reviews when IPD are unavailable, which is the case in most reviews. Always report the extraction method and any assumptions made.

Related Research Tools

Once you have extracted hazard ratios and their standard errors, compute standardized effect sizes with our Effect Size Calculator for converting between different metrics. Pool your time-to-event estimates visually using the Forest Plot Generator to create publication-ready forest plots with weighted HR estimates and diamond summary statistics for your survival meta-analysis.

Reviewed by

Dr. Sarah Mitchell

PhD, Biostatistics & Research Methodology

Dr. Sarah Mitchell holds a PhD in Biostatistics from Johns Hopkins Bloomberg School of Public Health and has over 15 years of experience in systematic review methodology and meta-analysis. She has authored or co-authored 40+ peer-reviewed publications in journals including the Journal of Clinical Epidemiology, BMC Medical Research Methodology, and Research Synthesis Methods. A former Cochrane Review Group statistician and current editorial board member of Systematic Reviews, Dr. Mitchell has supervised 200+ evidence synthesis projects across clinical medicine, public health, and social sciences. She reviews all Research Gold tools to ensure statistical accuracy and compliance with Cochrane Handbook and PRISMA 2020 standards.
