Reliability is the consistency of a measurement: whether it gives the same answer under the same conditions. Validity is its accuracy: whether it measures what it claims to measure. The two are distinct and not interchangeable: a bathroom scale that reads three kilograms heavy every time is perfectly reliable yet invalid as a measure of true weight. Any study that relies on a questionnaire, a scale, or a rating has to establish both, because an instrument that fails on either count cannot support a defensible conclusion, no matter how sophisticated the later analysis.
Why a measure can be reliable without being valid
This is the idea that anchors everything else. Reliability concerns random error: a noisy instrument scatters its readings. Validity concerns systematic error, or bias: a biased instrument is consistently wrong in the same direction. You can have consistency without accuracy, as the heavy scale shows, but you cannot have accuracy without consistency, because an instrument that gives different answers each time cannot be counted on to hit the truth on any single reading. Reliability is therefore a necessary but not sufficient condition for validity. Establishing reliability first, then validity, is the logical order for validating any instrument you build a study around.
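The distinction between random and systematic error can be made concrete with a small simulation. The sketch below is illustrative only: the true weight, the noise levels, and the function names are assumptions chosen for the example, not values from any real instrument.

```python
import random
import statistics

TRUE_WEIGHT_KG = 70.0  # hypothetical true value, chosen for illustration


def simulate_readings(bias_kg, noise_sd_kg, n=1000, seed=0):
    """Return n simulated readings: true value + constant bias + random noise."""
    rng = random.Random(seed)
    return [TRUE_WEIGHT_KG + bias_kg + rng.gauss(0.0, noise_sd_kg) for _ in range(n)]


def summarize(readings):
    """Return (estimated bias, spread) for a list of readings."""
    bias = statistics.fmean(readings) - TRUE_WEIGHT_KG  # systematic error
    spread = statistics.stdev(readings)                 # random error
    return bias, spread


# Consistent but biased: about 3 kg heavy with very little scatter.
heavy_scale = simulate_readings(bias_kg=3.0, noise_sd_kg=0.05)
# Unbiased but noisy: centered on the truth with a lot of scatter.
noisy_scale = simulate_readings(bias_kg=0.0, noise_sd_kg=2.0, seed=1)

for name, readings in [("heavy scale", heavy_scale), ("noisy scale", noisy_scale)]:
    bias, spread = summarize(readings)
    print(f"{name}: bias = {bias:+.2f} kg, spread (SD) = {spread:.2f} kg")
```

The heavy scale shows a small spread and a large bias (reliable, not valid). The noisy scale averages out near the truth over many readings, but any single reading can miss by several kilograms, which is why low reliability undermines validity in practice.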
The main types of reliability
Reliability is assessed in several complementary ways, and which ones you need depends on the instrument:
- Internal consistency asks whether the items on a multi-item scale measure the same underlying construct. It is the most commonly reported form, usually summarized by Cronbach's alpha, where values from roughly 0.70 to 0.95 are typically considered acceptable; values much above that range can indicate redundant items rather than a better scale. You can compute it directly with our Cronbach's alpha calculator.
- Test-retest reliability asks whether the same people score consistently when measured on two occasions, capturing stability over time.
- Inter-rater reliability asks whether different raters assign consistent scores to the same cases, which matters whenever judgment is involved. For categorical ratings this is usually quantified with a kappa statistic (Cohen's kappa for two raters, Fleiss' kappa for more than two); our guide to inter-rater reliability covers that case in depth.
- Parallel-forms reliability asks whether two equivalent versions of an instrument produce consistent results.
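As a rough illustration of internal consistency, Cronbach's alpha can be computed from a respondents-by-items table using the standard formula alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores). The sketch below uses only the Python standard library; the function name and the sample responses are hypothetical and made up for the example.

```python
import statistics


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(row) != k for row in scores):
        raise ValueError("every respondent must answer every item")

    items = list(zip(*scores))  # transpose: one tuple of scores per item
    item_variances = [statistics.variance(item) for item in items]
    total_variance = statistics.variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")

    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: 5 respondents answering 4 items on a 1-5 scale.
responses = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [1, 2, 1, 2],
]
print(f"Cronbach's alpha: {cronbach_alpha(responses):.2f}")
```

This version uses sample variances (n - 1 in the denominator) for both the items and the totals, which is the usual convention; a dedicated statistics package would also report confidence intervals and item-level diagnostics that this sketch omits.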
Reporting the form of reliability that matches your instrument, rather than defaulting to Cronbach's alpha for everything, is a mark of a careful measurement section.
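For an instrument scored by human raters, the matching statistic is an agreement measure rather than alpha. The sketch below assumes the simplest case, two raters assigning categorical labels, and computes Cohen's kappa as (observed agreement - chance agreement) / (1 - chance agreement). The function name and the example labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same cases with categories."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)

    # Proportion of cases on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Agreement expected by chance, from each rater's marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1:
        raise ValueError("both raters used a single identical label; kappa is undefined")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of ten cases by two raters.
rater_1 = ["yes", "yes", "no", "no", "yes", "no", "yes", "yes", "no", "no"]
rater_2 = ["yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "no"]
print(f"Cohen's kappa: {cohens_kappa(rater_1, rater_2):.2f}")
```

Kappa corrects raw percent agreement for the agreement two raters would reach by chance alone, so it is 1 for perfect agreement and 0 when agreement is no better than chance. Designs with more than two raters or with ordinal categories call for other variants, which this sketch does not cover.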