Cox proportional hazards regression is a survival analysis method that estimates how one or more predictors relate to the rate at which an event happens over time, while handling the fact that not everyone in a study experiences the event before it ends. It produces a hazard ratio, the relative rate of the event in one group compared with another at any given moment, without requiring you to assume a particular shape for the underlying survival curve. That flexibility is why it is the workhorse of clinical and epidemiological time-to-event analysis.
Survival data are different from the outcomes ordinary regression handles. The outcome is not simply whether an event occurred but when, and many participants are censored, meaning the study ended or they were lost to follow-up before the event happened. You know only that they survived event-free up to a certain point. Cox regression and its companion, the Kaplan-Meier estimator, are built to use that partial information rather than discard it.
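To make censoring concrete, here is a minimal sketch of how one participant's follow-up might be reduced to the (time, event) pair that survival methods expect. The function name, its arguments, and the convention that all times are measured from enrollment are assumptions made for illustration, not a standard API.

```python
def to_survival_record(event_time, last_contact_time, study_end):
    """Return (time, event) for one participant, with time zero at enrollment.

    event_time        : time of the event, or None if it was never observed
    last_contact_time : last time the participant was known to be under follow-up
    study_end         : administrative end of the study
    event is 1 when the event was observed, 0 when the record is censored.
    """
    # Follow-up stops at whichever comes first: loss to follow-up or study end.
    observed_until = min(last_contact_time, study_end)
    if event_time is not None and event_time <= observed_until:
        return event_time, 1
    # Censored: all we know is that the participant was event-free up to here.
    return observed_until, 0


# Hypothetical participants (times in months)
print(to_survival_record(4, 10, 12))     # event observed at month 4
print(to_survival_record(None, 6, 12))   # lost to follow-up at month 6
print(to_survival_record(None, 15, 12))  # still event-free when the study ended
```

The censored records are kept rather than dropped: each contributes the information that the participant was event-free for at least that long.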
Kaplan-Meier and Cox: two tools, two jobs
The Kaplan-Meier estimator describes survival. It produces the familiar stepped survival curve showing the probability of remaining event-free over time, and groups can be compared visually and tested with the log-rank test. It is descriptive and unadjusted: it shows what happened in each group but cannot account for other variables.
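The stepped curve comes from a simple product: at each observed event time, the running survival probability is multiplied by the fraction of people still at risk who did not have the event. The sketch below computes that product by hand on a small made-up dataset. The function name and the data are illustrative only, and a real analysis would normally use an established survival package.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : follow-up time for each participant
    events : 1 if the event was observed at that time, 0 if censored
    """
    # Number of observed events at each distinct event time.
    event_counts = Counter(t for t, e in zip(times, events) if e == 1)
    curve = []
    survival = 1.0
    for t in sorted(event_counts):
        # At risk: everyone whose follow-up reaches t, including people
        # censored at exactly t (the usual convention for ties).
        at_risk = sum(1 for u in times if u >= t)
        survival *= 1 - event_counts[t] / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical data: six participants, two of them censored (event = 0)
times = [2, 3, 3, 5, 7, 8]
events = [1, 1, 0, 1, 0, 1]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t={t}: S(t)={s:.3f}")
```

Censored participants never cause a step down, but they shrink the at-risk count for later event times, which is how their partial information enters the estimate.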
Cox proportional hazards regression explains and adjusts. It estimates the effect of a predictor on the hazard while holding other covariates constant, which is what you need when groups differ on characteristics beyond the exposure of interest. In practice the two are used together: Kaplan-Meier curves and a log-rank test to show the unadjusted picture, then a Cox model to estimate adjusted hazard ratios. Choosing the right combination for a given question is part of what our biostatistics consulting service does on clinical datasets.
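For intuition about what a Cox fit does, the sketch below maximizes the Cox partial likelihood for a single binary covariate using Newton-Raphson steps. It is a deliberately stripped-down illustration: the data are invented, the function names are hypothetical, ties are handled with the Breslow approximation, and a one-covariate model cannot adjust for anything. A real adjusted analysis would use a tested survival package with several covariates.

```python
import math


def score_and_information(beta, times, events, x):
    """First and negative second derivative of the log partial likelihood."""
    score, info = 0.0, 0.0
    for t_i, e_i, x_i in zip(times, events, x):
        if e_i != 1:
            continue  # censored records contribute only through risk sets
        s0 = s1 = s2 = 0.0
        for t_j, x_j in zip(times, x):
            if t_j >= t_i:  # participant j is still at risk at time t_i
                w = math.exp(beta * x_j)
                s0 += w
                s1 += x_j * w
                s2 += x_j * x_j * w
        mean = s1 / s0  # weighted mean of x in the risk set
        score += x_i - mean
        info += s2 / s0 - mean ** 2
    return score, info


def fit_cox(times, events, x, tol=1e-10, max_iter=50):
    """Return (beta_hat, standard_error) for a one-covariate Cox model."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = score_and_information(beta, times, events, x)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    _, info = score_and_information(beta, times, events, x)
    return beta, 1 / math.sqrt(info)


# Hypothetical data: group = 1 for exposed, 0 for reference
times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
events = [1, 1, 0, 1, 1, 1, 0, 1, 1, 0]
group = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]

beta_hat, se = fit_cox(times, events, group)
print(f"log hazard ratio = {beta_hat:.3f} (SE {se:.3f})")
print(f"hazard ratio     = {math.exp(beta_hat):.2f}")
```

Notice that only the ordering of event times and the composition of each risk set enter the calculation. No baseline hazard is ever specified, which is why the model needs no assumed shape for the survival curve.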
Interpreting a hazard ratio
The central output of a Cox model is the hazard ratio. A value of 1 means no difference in the event rate between groups. A value above 1 means a higher rate in the group of interest than in the reference group, and a value below 1 means a lower rate. A hazard ratio of 1.5 for a treatment, for example, indicates that the event occurs in the treated group at one and a half times the reference rate at any given instant. The confidence interval conveys the precision of that estimate, and the p-value its statistical significance.
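Software usually reports the Cox coefficient on the log scale, so the hazard ratio and its confidence interval are obtained by exponentiating. The sketch below shows that arithmetic with a Wald interval and a Wald test. The coefficient and standard error are hypothetical values chosen to give a hazard ratio of 1.5, and the function name is illustrative.

```python
import math
from statistics import NormalDist


def hazard_ratio_summary(beta, se, level=0.95):
    """Return (hazard ratio, CI lower, CI upper, two-sided Wald p-value)."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    hr = math.exp(beta)
    lower = math.exp(beta - z_crit * se)
    upper = math.exp(beta + z_crit * se)
    wald_z = beta / se
    p_value = 2 * (1 - NormalDist().cdf(abs(wald_z)))
    return hr, lower, upper, p_value


# Hypothetical model output: log hazard ratio and its standard error
beta = math.log(1.5)
se = 0.20

hr, lower, upper, p_value = hazard_ratio_summary(beta, se)
print(f"HR = {hr:.2f}, 95% CI {lower:.2f} to {upper:.2f}, p = {p_value:.3f}")
```

Because the interval is built on the log scale and then exponentiated, it is not symmetric around the hazard ratio. An interval that excludes 1 corresponds to a Wald p-value below 0.05 at the 95% level.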
A common error is to read a hazard ratio as a risk ratio or as a statement about how many people will eventually have the event. It is a ratio of instantaneous rates, not of cumulative outcomes, and it does not by itself tell you the difference in median survival. Reporting it alongside Kaplan-Meier curves gives readers both the rate comparison and the absolute survival picture. The distinction between rate-based and risk-based measures parallels the issues covered in our logistic regression guide for binary outcomes.
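A short numerical sketch makes the gap between a hazard ratio and a risk ratio visible. It assumes constant hazards in both groups (exponential survival), which is a simplifying assumption made only for this illustration and not something the Cox model requires. The baseline rate of 0.10 events per person-year is a made-up value.

```python
import math

baseline_hazard = 0.10  # hypothetical: events per person-year, reference group
hazard_ratio = 1.5
treated_hazard = hazard_ratio * baseline_hazard


def cumulative_risk(hazard, t):
    """Probability of having had the event by time t under a constant hazard."""
    return 1 - math.exp(-hazard * t)


def median_survival(hazard):
    """Time by which half the group has had the event under a constant hazard."""
    return math.log(2) / hazard


risk_ratios = []
for t in (1, 5, 20):
    risk_ref = cumulative_risk(baseline_hazard, t)
    risk_trt = cumulative_risk(treated_hazard, t)
    risk_ratios.append(risk_trt / risk_ref)
    print(f"year {t:>2}: risk {risk_ref:.3f} vs {risk_trt:.3f}, "
          f"risk ratio {risk_trt / risk_ref:.2f}")

median_ref = median_survival(baseline_hazard)
median_trt = median_survival(treated_hazard)
print(f"median survival: {median_ref:.2f} vs {median_trt:.2f} years")
```

The hazard ratio stays at 1.5 throughout, yet the risk ratio is always smaller and drifts toward 1 as follow-up lengthens, because eventually most people in both groups have had the event. The difference in median survival likewise depends on the baseline rate, not on the hazard ratio alone.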