The Newcastle-Ottawa Scale (NOS) is a quality assessment tool for evaluating non-randomized studies in systematic reviews, developed by Wells et al. at the Ottawa Hospital Research Institute. It uses a star-based scoring system across three domains: Selection, Comparability, and Outcome (for cohort studies) or Exposure (for case-control studies). A study can earn a maximum of 9 stars, and the total is used to classify studies as high, moderate, or low quality.
If your systematic review includes observational studies, you need a validated method for assessing their methodological quality. This guide covers every aspect of NOS assessment: the three scoring domains, item-by-item evaluation criteria, differences between cohort and case-control versions, quality thresholds, and how NOS compares to ROBINS-I. Whether you are conducting your first quality assessment or refining your approach, this guide provides the practical knowledge you need to apply NOS correctly and consistently.
What Is the Newcastle-Ottawa Scale?
NOS uses a star-based scoring system that awards between 0 and 9 stars across three broad domains. Each star represents adequate fulfillment of a specific methodological criterion. The simplicity of this approach, awarding or withholding a star for each item, makes NOS faster to complete than more complex tools while still capturing the core dimensions of study quality.
The scale operates on a principle of methodological adequacy rather than perfection. A study does not need flawless methodology to receive a star; it needs to meet a minimum threshold of methodological soundness for each criterion. This pragmatic approach reflects the reality that observational studies inherently carry more bias risk than randomized controlled trials, and quality assessment should differentiate between studies that took reasonable precautions and those that did not.
NOS is endorsed by the Cochrane Collaboration as an acceptable tool for assessing non-randomized studies (Higgins et al., 2023, Cochrane Handbook Chapter 25). Its widespread use means that reviewers, editors, and readers are familiar with NOS scores, making your quality assessment results immediately interpretable to your audience.
The 3 NOS Domains and Scoring
The NOS scoring system distributes its maximum 9 stars across three domains. Understanding what each domain evaluates, and how stars are awarded, is essential for consistent, defensible quality assessment.
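As a rough illustration of the arithmetic, the sketch below checks per-domain star counts against the maxima described here (4, 2, and 3) and sums them. The names `NOS_DOMAIN_MAX` and `total_stars` are hypothetical helpers invented for this example, not part of any official NOS tooling.

```python
# Maximum stars per NOS domain, as given in the text (4 + 2 + 3 = 9).
NOS_DOMAIN_MAX = {"selection": 4, "comparability": 2, "outcome_or_exposure": 3}


def total_stars(domain_stars):
    """Check each domain score against its maximum and return the total."""
    if set(domain_stars) != set(NOS_DOMAIN_MAX):
        raise ValueError("expected exactly the three NOS domains")
    for domain, stars in domain_stars.items():
        if not 0 <= stars <= NOS_DOMAIN_MAX[domain]:
            raise ValueError(f"{domain}: {stars} is outside 0..{NOS_DOMAIN_MAX[domain]}")
    return sum(domain_stars.values())


# Hypothetical study: 3 Selection, 1 Comparability, 2 Outcome stars.
print(total_stars({"selection": 3, "comparability": 1, "outcome_or_exposure": 2}))  # 6
```

Rejecting out-of-range values at entry is one simple way to catch data-extraction slips, such as recording 3 Comparability stars, before they reach the summary table.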
Selection (4 Stars)
The Selection domain evaluates whether the study identified and enrolled participants in a way that minimizes selection bias. Four stars are available, each addressing a different aspect of how participants were selected and defined.
For cohort studies, the four Selection items assess: representativeness of the exposed cohort, selection of the non-exposed cohort, ascertainment of exposure, and demonstration that the outcome of interest was not present at the start of the study. For case-control studies, the items assess: adequacy of case definition, representativeness of cases, selection of controls, and definition of controls.
A study earns one star per criterion when it demonstrates adequate methodology. For example, a cohort study earns a Selection star for exposure ascertainment if exposure was measured using a validated instrument or secure medical record rather than self-report alone.
Comparability (2 Stars)
The Comparability domain is the most subjective component of NOS and the domain most frequently misapplied. It evaluates whether the study controlled for confounding variables, awarding up to 2 stars based on the adjustment strategy.
One star is awarded if the study controls for the single most important confounder. A second star is awarded if the study also controls for any additional important confounder. The critical requirement here is that you, as the reviewer, must pre-specify which confounders qualify before beginning your assessment. For example, in a study examining smoking and lung cancer, age might be designated the most important confounder and sex the second.
This domain requires you to state in your systematic review protocol which confounders earn the first and second star. Without pre-specification, you risk post-hoc rationalization (deciding which confounders matter only after seeing the results), which undermines the objectivity of your assessment.
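One way to make pre-specification concrete is to record the protocol's confounders as data before any study is scored. The sketch below is a minimal illustration under stated assumptions: `comparability_stars` is a hypothetical helper, the confounder names come from the smoking and lung cancer example above, and the two stars are treated as independent items. If your protocol only grants the second star once the first is earned, adjust the logic accordingly.

```python
def comparability_stars(adjusted_for, primary, additional):
    """Return 0-2 Comparability stars from pre-specified confounders.

    adjusted_for: confounders the study actually controlled for
    primary:      the single most important confounder named in the protocol
    additional:   other confounders the protocol accepts for the second star
    """
    stars = 0
    if primary in adjusted_for:
        stars += 1  # first star: most important confounder controlled
    if set(adjusted_for) & set(additional):
        stars += 1  # second star: at least one additional confounder controlled
    return stars


# Protocol for the smoking and lung cancer example: age first, sex second.
PRIMARY = "age"
ADDITIONAL = {"sex"}

print(comparability_stars({"age", "sex"}, PRIMARY, ADDITIONAL))  # 2
print(comparability_stars({"age"}, PRIMARY, ADDITIONAL))         # 1
```

Because `PRIMARY` and `ADDITIONAL` are fixed before scoring, every study in the review is judged against the same list.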
Outcome/Exposure (3 Stars)
The third domain evaluates the quality of outcome measurement (in cohort studies) or exposure measurement (in case-control studies), awarding up to 3 stars.
For cohort studies, the three Outcome items assess: method of outcome assessment, length of follow-up, and adequacy of follow-up (attrition). For case-control studies, the three Exposure items assess: ascertainment of exposure, same method of ascertainment for cases and controls, and non-response rate.
A cohort study earns a star for outcome assessment when outcomes are verified through independent blind assessment or record linkage rather than self-report. It earns a follow-up star when the duration is sufficient for outcomes to occur, and an adequacy star when the proportion lost to follow-up is acceptable (commonly less than 20%).
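The three Outcome decisions for a cohort study can be sketched as simple rules. This is only an illustration: the `CohortOutcome` fields, the method labels, and the `outcome_stars` function are assumptions made for the example. The 20% attrition cutoff is the common convention mentioned above, and the minimum follow-up duration must come from your own protocol.

```python
from dataclasses import dataclass

# Assessment methods treated as adequate in this sketch (labels are hypothetical).
ADEQUATE_ASSESSMENT = {"independent_blind", "record_linkage"}


@dataclass
class CohortOutcome:
    assessment_method: str    # e.g. "record_linkage" or "self_report"
    follow_up_years: float    # duration of follow-up
    lost_to_follow_up: float  # proportion of participants lost, 0.0 to 1.0


def outcome_stars(outcome, min_follow_up_years, max_attrition=0.20):
    """Return 0-3 Outcome stars for a cohort study."""
    stars = 0
    if outcome.assessment_method in ADEQUATE_ASSESSMENT:
        stars += 1  # outcome verified independently rather than self-reported
    if outcome.follow_up_years >= min_follow_up_years:
        stars += 1  # follow-up long enough for outcomes to occur
    if outcome.lost_to_follow_up < max_attrition:
        stars += 1  # attrition below the pre-specified cutoff
    return stars


# Hypothetical study: record linkage, 5 years of follow-up, 10% attrition.
study = CohortOutcome("record_linkage", 5.0, 0.10)
print(outcome_stars(study, min_follow_up_years=3.0))  # 3
```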
The following table summarizes the items evaluated in each domain for both study types:
| Domain | Cohort Study Items | Case-Control Study Items | Max Stars |
|---|---|---|---|
| Selection | Representativeness of exposed cohort, Selection of non-exposed cohort, Ascertainment of exposure, Outcome not present at start | Adequacy of case definition, Representativeness of cases, Selection of controls, Definition of controls | 4 |
| Comparability | Controls for most important confounder, Controls for additional confounder | Controls for most important confounder, Controls for additional confounder | 2 |
| Outcome/Exposure | Assessment of outcome, Length of follow-up, Adequacy of follow-up | Ascertainment of exposure, Same method for cases and controls, Non-response rate | 3 |
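For reviewers who track assessments in a script rather than a spreadsheet, the table can be encoded as a lookup structure. The following is a minimal sketch: `NOS_ITEMS` and `score_by_domain` are hypothetical names, and the item strings are shortened labels taken from the table rather than the official NOS wording.

```python
# NOS items per study design and domain, transcribed from the table above.
NOS_ITEMS = {
    "cohort": {
        "selection": [
            "representativeness of exposed cohort",
            "selection of non-exposed cohort",
            "ascertainment of exposure",
            "outcome not present at start",
        ],
        "comparability": [
            "controls for most important confounder",
            "controls for additional confounder",
        ],
        "outcome": [
            "assessment of outcome",
            "length of follow-up",
            "adequacy of follow-up",
        ],
    },
    "case_control": {
        "selection": [
            "adequacy of case definition",
            "representativeness of cases",
            "selection of controls",
            "definition of controls",
        ],
        "comparability": [
            "controls for most important confounder",
            "controls for additional confounder",
        ],
        "exposure": [
            "ascertainment of exposure",
            "same method for cases and controls",
            "non-response rate",
        ],
    },
}


def score_by_domain(design, awarded_items):
    """Count one star per awarded item within each domain of the chosen version."""
    domains = NOS_ITEMS[design]
    known = {item for items in domains.values() for item in items}
    unknown = set(awarded_items) - known
    if unknown:
        raise ValueError(f"items not in the {design} version: {sorted(unknown)}")
    return {
        domain: sum(item in awarded_items for item in items)
        for domain, items in domains.items()
    }


# Hypothetical cohort study that earned one star in each domain.
awarded = {
    "ascertainment of exposure",
    "controls for most important confounder",
    "length of follow-up",
}
print(score_by_domain("cohort", awarded))
# {'selection': 1, 'comparability': 1, 'outcome': 1}
```

Keeping the two versions in separate branches means a cohort-only item such as "length of follow-up" is rejected if someone tries to score it against a case-control study.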
NOS for Cohort Studies vs Case-Control Studies
The Newcastle-Ottawa Scale exists in two primary versions: one for cohort studies and one for case-control studies. While the three-domain structure and maximum 9 stars remain identical, the specific items within each domain differ to reflect the distinct methodological concerns of each study design.
| Feature | NOS Cohort Version | NOS Case-Control Version |
|---|---|---|
| Selection focus | Exposed and non-exposed cohort identification | Case and control identification |
| Third domain | Outcome assessment | Exposure assessment |
| Follow-up items | Length and adequacy of follow-up | Non-response rate |
| Temporal direction | Prospective/retrospective follow-up | Retrospective exposure assessment |
| Common confounders | Age, baseline disease severity | Age, sex, matching variables |
The cohort version emphasizes longitudinal follow-up: whether participants were tracked long enough for outcomes to develop and whether attrition was acceptable. The case-control version focuses on whether exposure was ascertained identically in cases and controls, because differential exposure measurement is the primary source of information bias in case-control designs.
Cross-sectional study adaptation is a common need that the original NOS does not officially address. Several research groups have published modified NOS versions for cross-sectional studies, most notably the adaptation by Herzog et al. (2013). These modified scales retain the three-domain structure but replace follow-up items with items relevant to cross-sectional designs, such as sample size justification and statistical adjustment. While widely used, these adaptations are not officially validated by Wells et al. and should be cited separately from the original NOS.
When your systematic review includes both cohort and case-control studies, apply the appropriate NOS version to each study type. Report NOS scores separately by study design in your results, as a 7-star cohort study and a 7-star case-control study have met different criteria and are not directly comparable on individual domain items.
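Reporting scores separately by design amounts to grouping the assessments before tabulating them. A minimal sketch, assuming each assessment is stored as a `(study, design, stars)` tuple; the function name and the study labels are made up for illustration.

```python
from collections import defaultdict


def group_scores_by_design(assessments):
    """Group (study, design, stars) records so each design is reported on its own."""
    grouped = defaultdict(list)
    for study, design, stars in assessments:
        grouped[design].append((study, stars))
    return dict(grouped)


# Hypothetical assessments; labels and star totals are invented for this example.
assessments = [
    ("Study A", "cohort", 7),
    ("Study B", "case_control", 7),
    ("Study C", "cohort", 5),
]

for design, rows in group_scores_by_design(assessments).items():
    print(design, rows)
```

Studies A and B both have 7 stars, but they land in different groups, which keeps the results table from implying that the two totals were earned on the same criteria.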