Propensity score matching is a method for estimating the effect of a treatment or exposure from observational data by pairing treated and untreated participants who had a similar probability of receiving the treatment. The propensity score is that probability: the likelihood of being treated given a person's measured characteristics. Matching on it makes the treated and comparison groups resemble each other on those characteristics. The goal is to approximate, on the measured characteristics, the balance a randomized trial would have produced in settings where randomization was not possible.
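Written as a formula, the definition above reads as follows. The symbols are notation assumed here for illustration, not taken from the text: T is the treatment indicator (1 if treated, 0 if not), X is the vector of measured baseline characteristics, and e(x) is the propensity score.

```latex
e(x) = \Pr(T = 1 \mid X = x)
```

Two participants with the same value of e(x) were equally likely to be treated given what was measured, which is why pairing on it is intended to balance those measured characteristics.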
In a randomized trial, treatment is assigned by chance, so the groups are comparable on everything, measured or not. In observational research, sicker patients may be more likely to receive a treatment, which confounds the comparison: any difference in outcomes mixes the treatment effect with the baseline difference. Propensity score matching addresses the measured part of that problem by constructing comparison groups that look alike on the variables you observed.
When to use propensity score matching
Use propensity score matching when you want to estimate a treatment effect but cannot randomize, which is the normal situation in registry studies, electronic health record analyses, and many cohort designs. It is most appropriate when you have measured the important confounders, when treated and untreated groups overlap enough to find matches, and when a reviewer or your own design demands that the comparison groups be made comparable before outcomes are analyzed.
It is the wrong tool when the variables that drive both treatment and outcome are unmeasured. Matching can only balance what you observed. If an important confounder was never recorded, the estimate remains biased no matter how good the balance looks on paper. State that limitation plainly rather than hiding it.
How the method works
The procedure has a clear sequence. First, model the probability of treatment, usually with a logistic regression that predicts treatment status from the baseline covariates, producing a propensity score for every participant. Second, match treated participants to untreated ones with similar scores, often one-to-one within a tolerance called a caliper, though one-to-many and other schemes exist. Third, and most important, check covariate balance in the matched sample to confirm the groups now resemble each other. Fourth, estimate the treatment effect within the matched sample.
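The matching step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: it assumes propensity scores have already been estimated (for example, as fitted probabilities from a logistic regression), uses greedy one-to-one nearest-neighbor matching without replacement, and applies the caliper on the raw score scale. The function name `match_with_caliper`, the caliper of 0.05, and the score values are all hypothetical.

```python
def match_with_caliper(treated_scores, control_scores, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    Returns a list of (treated_index, control_index) pairs. A treated
    participant with no unused control within the caliper stays unmatched.
    """
    available = set(range(len(control_scores)))
    pairs = []
    for t_idx, t_score in enumerate(treated_scores):
        if not available:
            break
        # Closest still-unused control on the propensity score;
        # ties are broken by the lower control index.
        c_idx = min(available, key=lambda j: (abs(control_scores[j] - t_score), j))
        if abs(control_scores[c_idx] - t_score) <= caliper:
            pairs.append((t_idx, c_idx))
            available.remove(c_idx)
    return pairs


# Hypothetical propensity scores for three treated and four untreated participants.
treated = [0.62, 0.33, 0.90]
controls = [0.30, 0.60, 0.40, 0.10]

pairs = match_with_caliper(treated, controls, caliper=0.05)
print(pairs)
```

In this toy input the third treated participant (score 0.90) has no untreated counterpart within the caliper and is dropped. That is the overlap requirement in action: treated participants without a comparable control cannot contribute to the matched estimate. Greedy matching depends on the order in which treated participants are processed. Optimal matching and one-to-many schemes are alternatives this sketch does not cover.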
The third step is where careful analysts spend their attention. Balance is assessed with the standardized mean difference for each covariate, and absolute values below 0.10 are generally taken to indicate adequate balance. Comparing balance before and after matching, rather than assuming the matching procedure worked, is the standard a reviewer expects.
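A balance check for one continuous covariate might look like the sketch below. It assumes one common form of the standardized mean difference: the difference in group means divided by the square root of the average of the two group variances. Other conventions exist, for instance fixing the denominator at the pre-matching standard deviation or using proportions for binary covariates. The function name and the age values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(treated_values, control_values):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = sqrt((variance(treated_values) + variance(control_values)) / 2)
    if pooled_sd == 0:
        raise ValueError("covariate has no variation in either group")
    return (mean(treated_values) - mean(control_values)) / pooled_sd


# Hypothetical ages: the same treated group compared with all untreated
# participants (before matching) and with its matched controls (after).
treated_age = [70, 72, 68, 75]
control_age_before = [55, 60, 58, 62]
control_age_after = [69, 73, 68, 76]

smd_before = standardized_mean_difference(treated_age, control_age_before)
smd_after = standardized_mean_difference(treated_age, control_age_after)

print(f"before matching: {smd_before:.2f}")
print(f"after matching:  {smd_after:.2f}")
```

Reporting both numbers for every covariate, typically in a table or plot, shows how much matching improved balance and flags any covariate whose absolute value remains at or above 0.10.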