The Newcastle-Ottawa Scale is a quality assessment tool for evaluating non-randomized studies in systematic reviews, developed by Wells et al. at the Ottawa Hospital Research Institute. It uses a star-based scoring system across three domains: Selection, Comparability, and Outcome (for cohort studies) or Exposure (for case-control studies). A maximum of 9 stars is awarded, classifying studies as high, moderate, or low quality.

If your systematic review includes observational studies, you need a validated method for assessing their methodological quality. The Newcastle-Ottawa Scale guide you are reading covers every aspect of NOS assessment: the three scoring domains, item-by-item evaluation criteria, differences between cohort and case-control versions, quality thresholds, and how NOS compares to ROBINS-I. Whether you are conducting your first quality assessment or refining your approach, this guide provides the practical knowledge you need to apply NOS correctly and consistently.

What Is the Newcastle-Ottawa Scale?

The Newcastle-Ottawa Scale was developed by Wells et al. at the Ottawa Hospital Research Institute and the University of Newcastle, Australia, as a tool for assessing the quality of non-randomized studies included in systematic reviews. Introduced in the early 2000s and widely adopted since, NOS has become one of the most cited quality assessment instruments in evidence synthesis.

NOS uses a star-based scoring system that awards between 0 and 9 stars across three broad domains. Each star represents adequate fulfillment of a specific methodological criterion. The simplicity of this approach, awarding or withholding a star for each item, makes NOS faster to complete than more complex tools while still capturing the core dimensions of study quality.

The scale operates on a principle of methodological adequacy rather than perfection. A study does not need flawless methodology to receive a star; it needs to meet a minimum threshold of methodological soundness for each criterion. This pragmatic approach reflects the reality that observational studies inherently carry more bias risk than randomized controlled trials, and quality assessment should differentiate between studies that took reasonable precautions and those that did not.

NOS is endorsed by the Cochrane Collaboration as an acceptable tool for assessing non-randomized studies (Higgins et al., 2023, Cochrane Handbook Chapter 25). Its widespread use means that reviewers, editors, and readers are familiar with NOS scores, making your quality assessment results immediately interpretable to your audience.

The 3 NOS Domains and Scoring

The NOS scoring system distributes its maximum 9 stars across three domains. Understanding what each domain evaluates, and how stars are awarded, is essential for consistent, defensible quality assessment.

Selection (4 Stars)

The Selection domain evaluates whether the study identified and enrolled participants in a way that minimizes selection bias. Four stars are available, each addressing a different aspect of how participants were selected and defined.

For cohort studies, the four Selection items assess: representativeness of the exposed cohort, selection of the non-exposed cohort, ascertainment of exposure, and demonstration that the outcome of interest was not present at the start of the study. For case-control studies, the items assess: adequacy of case definition, representativeness of cases, selection of controls, and definition of controls.

A study earns one star per criterion when it demonstrates adequate methodology. For example, a cohort study earns a Selection star for exposure ascertainment if exposure was measured using a validated instrument or secure medical record rather than self-report alone.

Comparability (2 Stars)

The Comparability domain is the most subjective component of NOS and the domain most frequently misapplied. It evaluates whether the study controlled for confounding variables, awarding up to 2 stars based on the adjustment strategy.

One star is awarded if the study controls for the single most important confounder. A second star is awarded if the study also controls for any additional important confounder. The critical requirement here is that you, as the reviewer, must pre-specify which confounders qualify before beginning your assessment. For example, in a study examining smoking and lung cancer, age might be designated the most important confounder and sex the second.

This domain requires you to state in your systematic review protocol which confounders earn the first and second star. Without pre-specification, you risk post-hoc rationalization (deciding after seeing results which confounders matter), which undermines the objectivity of your assessment.

Outcome/Exposure (3 Stars)

The third domain evaluates the quality of outcome measurement (in cohort studies) or exposure measurement (in case-control studies), awarding up to 3 stars.

For cohort studies, the three Outcome items assess: method of outcome assessment, length of follow-up, and adequacy of follow-up (attrition). For case-control studies, the three Exposure items assess: ascertainment of exposure, same method of ascertainment for cases and controls, and non-response rate.

A cohort study earns a star for outcome assessment when outcomes are verified through independent blind assessment or record linkage rather than self-report. It earns a follow-up star when the duration is sufficient for outcomes to occur, and an adequacy star when the proportion lost to follow-up is acceptable (commonly less than 20%).

The following table summarizes the items evaluated in each domain for both study types:

| Domain | Cohort Study Items | Case-Control Study Items | Max Stars |
|---|---|---|---|
| Selection | Representativeness of exposed cohort, Selection of non-exposed cohort, Ascertainment of exposure, Outcome not present at start | Adequacy of case definition, Representativeness of cases, Selection of controls, Definition of controls | 4 |
| Comparability | Controls for most important confounder, Controls for additional confounder | Controls for most important confounder, Controls for additional confounder | 2 |
| Outcome/Exposure | Assessment of outcome, Length of follow-up, Adequacy of follow-up | Ascertainment of exposure, Same method for cases and controls, Non-response rate | 3 |

NOS for Cohort Studies vs Case-Control Studies

The Newcastle-Ottawa Scale exists in two primary versions: one for cohort studies and one for case-control studies. While the three-domain structure and maximum 9 stars remain identical, the specific items within each domain differ to reflect the distinct methodological concerns of each study design.

| Feature | NOS Cohort Version | NOS Case-Control Version |
|---|---|---|
| Selection focus | Exposed and non-exposed cohort identification | Case and control identification |
| Third domain | Outcome assessment | Exposure assessment |
| Follow-up items | Length and adequacy of follow-up | Non-response rate |
| Temporal direction | Prospective/retrospective follow-up | Retrospective exposure assessment |
| Common confounders | Age, baseline disease severity | Age, sex, matching variables |

The cohort version emphasizes longitudinal follow-up: whether participants were tracked long enough for outcomes to develop and whether attrition was acceptable. The case-control version focuses on whether exposure was ascertained identically in cases and controls, because differential exposure measurement is the primary source of information bias in case-control designs.

Cross-sectional study adaptation is a common need that the original NOS does not officially address. Several research groups have published modified NOS versions for cross-sectional studies, most notably the adaptation by Herzog et al. (2013). These modified scales retain the three-domain structure but replace follow-up items with items relevant to cross-sectional designs, such as sample size justification and statistical adjustment. While widely used, these adaptations are not officially validated by Wells et al. and should be cited separately from the original NOS.

When your systematic review includes both cohort and case-control studies, apply the appropriate NOS version to each study type. Report NOS scores separately by study design in your results, as a 7-star cohort study and a 7-star case-control study have met different criteria and are not directly comparable on individual domain items.

How to Score and Interpret the Newcastle-Ottawa Scale

Applying the NOS quality assessment consistently requires a structured approach. Inconsistent scoring between reviewers is the most common criticism of NOS (Stang, 2010), and following a systematic process minimizes this problem.

Step 1: Pre-specify your criteria. Before assessing any study, document in your protocol which confounders qualify for Comparability stars, what follow-up duration is adequate, and what attrition threshold is acceptable. These decisions should be based on your review question and the clinical context, not on what the included studies happen to report.

Step 2: Assess each item independently. Work through each NOS item one at a time. For each item, determine whether the study meets the criterion for a star based on what the authors reported. If information is missing or unclear, the study does not earn the star; do not assume adequate methodology when it is not documented.

Step 3: Use two independent assessors. Two reviewers should assess each study independently, then compare scores. Calculate inter-rater agreement using Cohen's kappa or percentage agreement. Resolve discrepancies through discussion or a third reviewer.
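As a concrete illustration, Cohen's kappa for two reviewers' star/no-star decisions can be computed with a short script. This is a minimal sketch: the function name and the reviewer data are hypothetical, and in practice you would pool the item-level decisions across all studies in your review.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical judgements (e.g. star/no star)."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return 1.0 if expected == 1 else (observed - expected) / (1 - expected)

# Hypothetical star (1) / no-star (0) decisions across ten NOS items
reviewer_1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
reviewer_2 = [1, 1, 0, 1, 1, 1, 0, 0, 1, 1]
print(f"kappa = {cohens_kappa(reviewer_1, reviewer_2):.2f}")
```

Kappa corrects the raw percentage agreement for the agreement expected by chance, which is why it is preferred when star-award rates are imbalanced.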

Step 4: Apply quality thresholds. The most commonly used quality thresholds for NOS are:

| Total NOS Stars | Quality Classification |
|---|---|
| 7-9 stars | High quality |
| 4-6 stars | Moderate quality |
| 0-3 stars | Low quality |

These thresholds should be pre-specified in your protocol and applied consistently. Some reviews use alternative cut-points (such as 6+ for high quality), which is acceptable as long as the threshold is justified and declared prospectively.
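The threshold mapping can be expressed as a small helper, which also makes the cut-points easy to change if your protocol pre-specifies different ones. A sketch in Python; the function name is illustrative and the 7/4 cut-points are the common defaults described above.

```python
def classify_nos(total_stars: int) -> str:
    """Map a total NOS score (0-9 stars) to the common three-tier classification."""
    if not 0 <= total_stars <= 9:
        raise ValueError("NOS totals range from 0 to 9 stars")
    if total_stars >= 7:
        return "high"
    if total_stars >= 4:
        return "moderate"
    return "low"

print(classify_nos(7), classify_nos(5), classify_nos(2))  # high moderate low
```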

Step 5: Record domain-level scores. Report both the total score and the breakdown by domain. A study scoring 7 total stars with 4-2-1 (strong Selection, strong Comparability, weak Outcome) has a fundamentally different quality profile than a study scoring 7 with 2-2-3 (weak Selection, strong Comparability, strong Outcome). Domain-level transparency allows readers to evaluate where quality concerns concentrate.
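The 4-2-1 versus 2-2-3 contrast is easy to make visible in your extraction data by storing domain scores rather than only totals. A minimal sketch with hypothetical study names:

```python
# Hypothetical domain breakdowns: both studies total 7 stars,
# but their quality profiles differ sharply
nos_scores = {
    "Study A": {"selection": 4, "comparability": 2, "outcome": 1},
    "Study B": {"selection": 2, "comparability": 2, "outcome": 3},
}
for study, d in nos_scores.items():
    profile = f"{d['selection']}-{d['comparability']}-{d['outcome']}"
    print(f"{study}: total {sum(d.values())}, profile {profile}")
```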

Pre-specifying your NOS criteria in the protocol is not optional; it is a methodological safeguard that prevents bias in the quality assessment itself. Cochrane Handbook Chapter 25 (Higgins et al., 2023) emphasizes that all quality assessment criteria should be established before data extraction begins.

Newcastle-Ottawa Scale vs ROBINS-I

Choosing between NOS and ROBINS-I is one of the most common decisions systematic reviewers face when planning quality assessment for non-randomized studies. Both tools assess observational study quality, but they differ fundamentally in approach, complexity, and output.

| Feature | Newcastle-Ottawa Scale | ROBINS-I |
|---|---|---|
| Scoring | Star-based (0-9 stars) | Domain judgements (Low/Moderate/Serious/Critical) |
| Domains | 3 (Selection, Comparability, Outcome/Exposure) | 7 bias domains |
| Time per study | 10-15 minutes | 30-60 minutes |
| Output | Numeric score | Overall risk of bias judgement |
| Target trial concept | No | Yes, requires specifying hypothetical RCT |
| Training required | Minimal | Substantial |
| Best for | Quick screening, large reviews | Rigorous assessment, Cochrane reviews |

NOS advantages. Speed is the primary advantage. For reviews including 30 or more observational studies, NOS allows efficient quality assessment without dedicating weeks to the task. The star-based system produces a numeric score that is easy to use in meta-regression or subgroup analyses. NOS is also simpler to learn, requiring less training than ROBINS-I.

ROBINS-I advantages. ROBINS-I provides more granular, structured assessment through its seven bias domains. The target trial framework, where you specify the hypothetical randomized trial that each observational study is attempting to emulate, forces explicit consideration of confounding, selection bias, and measurement bias at each stage. ROBINS-I domain judgements (Low, Moderate, Serious, Critical risk of bias) are more informative than a single numeric score.

When to choose NOS. Use NOS when your review includes many observational studies, when your journal or guideline body accepts NOS, or when time constraints favor a faster tool. NOS remains the most commonly used tool in published systematic reviews of observational studies.

When to choose ROBINS-I. Use ROBINS-I when conducting a Cochrane review that includes non-randomized studies, when your review includes a small number of key studies warranting deep assessment, or when you need domain-level risk of bias judgements for GRADE assessments. For a detailed walkthrough, see our ROBINS-I assessment guide.

Both tools have limitations. NOS has been criticized for inter-rater variability and lack of clear guidance on several items (Stang, 2010). ROBINS-I requires substantial training and can be time-prohibitive for large reviews. The best choice depends on your review scope, timeline, and the expectations of your target journal.

Using NOS Results in Your Systematic Review

NOS results are not an endpoint; they are an input to downstream analytical decisions. How you use quality assessment scores determines whether they add value to your systematic review or merely occupy a table in the appendix.

Sensitivity analysis by quality. The most important use of NOS scores is informing sensitivity analyses. Run your primary meta-analysis including all studies, then repeat it excluding studies classified as low quality (0-3 stars). If the pooled effect estimate changes substantially, your conclusions are sensitive to study quality, a critical finding that must be reported.
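The logic of such a sensitivity analysis can be sketched with inverse-variance fixed-effect pooling. All study data below are hypothetical, and a real review would typically use a dedicated package (such as metafor in R) and consider a random-effects model; this sketch only shows the exclusion-and-repool step.

```python
import math

def pool_fixed_effect(effects, variances):
    """Inverse-variance fixed-effect pooled estimate and its standard error."""
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return pooled, se

# Hypothetical log risk ratios, variances, and NOS totals for five studies
studies = [
    {"yi": 0.40, "vi": 0.04, "nos": 8},
    {"yi": 0.35, "vi": 0.05, "nos": 7},
    {"yi": 0.55, "vi": 0.06, "nos": 5},
    {"yi": 0.90, "vi": 0.10, "nos": 3},
    {"yi": 0.80, "vi": 0.09, "nos": 2},
]

all_pooled, _ = pool_fixed_effect([s["yi"] for s in studies],
                                  [s["vi"] for s in studies])
kept = [s for s in studies if s["nos"] >= 4]  # exclude low quality (0-3 stars)
sens_pooled, _ = pool_fixed_effect([s["yi"] for s in kept],
                                   [s["vi"] for s in kept])
print(f"all studies: {all_pooled:.3f}, excluding low quality: {sens_pooled:.3f}")
```

In this fabricated example the pooled estimate shrinks when low-quality studies are dropped, which is exactly the kind of quality-sensitivity you would need to report.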

Subgroup analysis. Present subgroup analyses stratified by NOS quality classification (high, moderate, low). This reveals whether effect estimates differ systematically by study quality. If high-quality studies show smaller effect sizes than low-quality studies, publication bias or methodological bias may be inflating the overall estimate.

Meta-regression. For reviews with sufficient studies (generally 10 or more), NOS total scores can serve as a covariate in meta-regression to test whether study quality predicts effect size. This provides a formal statistical test of the relationship between methodological rigor and study findings.
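The core computation can be illustrated with a weighted least-squares slope of effect size on NOS total. This is a sketch with hypothetical data; a real meta-regression also requires standard errors for the slope, an accounting for residual heterogeneity, and enough studies to be meaningful.

```python
def weighted_slope(x, y, w):
    """Slope of y regressed on x under weighted least squares (weights = 1/variance)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    num = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    den = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    return num / den

# Hypothetical data: NOS totals, log effect sizes, inverse-variance weights
nos = [8, 7, 6, 5, 3, 2]
effect = [0.30, 0.35, 0.50, 0.55, 0.80, 0.90]
weight = [1 / v for v in (0.04, 0.05, 0.05, 0.06, 0.10, 0.09)]

slope = weighted_slope(nos, effect, weight)
print(f"slope per star: {slope:.3f}")  # negative here: lower quality, larger effects
```

A negative slope in this fabricated dataset means each additional NOS star is associated with a smaller effect estimate, the pattern that would suggest methodological bias inflates the pooled result.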

GRADE integration. When assessing certainty of evidence using the GRADE framework, NOS results directly inform the risk of bias domain. If most studies score below your pre-specified quality threshold, you may downgrade the certainty of evidence by one or two levels for risk of bias. Conversely, consistently high NOS scores support maintaining the evidence certainty level. For a thorough understanding of how quality assessment feeds into evidence certainty, consult our complete risk of bias guide.

Reporting. Present NOS results in a summary table showing each included study, its domain scores, and total score. Include the pre-specified quality thresholds and the criteria used for Comparability stars. Transparent reporting allows readers to evaluate your quality assessment decisions and, if they disagree with your thresholds, re-classify studies according to their own criteria.

Common Newcastle-Ottawa Scale Mistakes

Even experienced reviewers make errors when applying the Newcastle-Ottawa Scale. Recognizing these common mistakes helps you avoid them and produce a more defensible quality assessment.

Using the wrong version for the study design. Applying the cohort NOS to a case-control study, or vice versa, produces meaningless scores because the items evaluate different methodological features. Before scoring, confirm each study design and select the appropriate NOS version. For studies with ambiguous designs, classify the study design first using established criteria (such as those in the STROBE statement), then apply the matching NOS version.

Not pre-specifying Comparability criteria. The Comparability domain requires you to name the most important and second most important confounders before beginning assessment. Reviewers who skip this step often award Comparability stars inconsistently, giving credit for different confounders across studies based on what each study happened to control for. This introduces assessor bias into the quality assessment.

Treating NOS as a binary good/bad classification. NOS produces a 10-point scale (0-9 stars). Collapsing this into "high quality" versus "low quality" using a single threshold discards useful information. Always report domain-level scores alongside the total, and use multiple thresholds in sensitivity analyses rather than a single cut-point.

Awarding Comparability stars for crude analyses. A study that reports only crude (unadjusted) associations has not controlled for any confounders and should receive 0 Comparability stars, regardless of how well-designed the rest of the study appears. Some reviewers mistakenly award Comparability stars when the study mentions confounders in the Discussion without actually adjusting for them in the analysis.

Ignoring missing information. When a study does not report how exposure was ascertained, what the follow-up duration was, or what the attrition rate was, the study should not receive the corresponding star. Do not assume adequate methodology based on the study being published in a reputable journal. NOS scores should reflect what is documented, not what is presumed.

Inconsistent application across studies. Apply the same criteria to every study in your review. If you require a minimum 12-month follow-up for one cohort study, require it for all. Inconsistent application produces quality scores that reflect assessor variability rather than genuine quality differences between studies.

Failing to report inter-rater agreement. Two independent reviewers should assess each study, and you should report the level of agreement between them. High inter-rater agreement strengthens confidence in your quality assessment, while low agreement signals that your NOS criteria may need clarification. Calculate Cohen's kappa and report it in your methods section.

By avoiding these mistakes and following a structured, pre-specified approach, your NOS assessment will produce quality classifications that are transparent, reproducible, and defensible during peer review. For additional context on how quality assessment fits within the broader systematic review methodology, see our complete risk of bias guide.