Score the methodological quality of cohort and case-control studies using the Newcastle-Ottawa Scale. Assign stars across selection, comparability, and outcome/exposure domains with multi-study support and CSV export.
Add studies and name them. For each study, select the option that best describes it for every NOS item. Options that award a star are marked with a star icon. Each study can earn up to 9 stars. Quality ratings: 0-3 Low, 4-6 Moderate, 7-9 High.
Representativeness of the exposed cohort
Selection of the non-exposed cohort
Ascertainment of exposure
Demonstration that outcome of interest was not present at start of study
Comparability of cohorts on the basis of the design or analysis
Comparability — additional factor
Assessment of outcome
Was follow-up long enough for outcomes to occur?
Adequacy of follow-up of cohorts
Select the appropriate tab for your study design: Cohort Studies or Case-Control Studies. Each tab presents the relevant NOS items for that design.
Add a row for each study in your review and enter the study identifier. Click a study row to expand its assessment form.
For each NOS item, select the option that best describes the study. Options marked with a star icon contribute to the total score.
Review the summary table showing stars per domain and overall quality ratings. Export results as CSV or copy the formatted text to your clipboard.
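The export step above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: the dictionary keys and column layout are assumptions chosen to mirror the summary table described here.

```python
import csv
import io

def export_csv(studies):
    """Write one summary row per study: identifier, domain stars, total, rating.

    `studies` is a list of dicts with hypothetical keys ("id", "selection",
    "comparability", "outcome"); the real tool's export format may differ.
    """
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["Study", "Selection", "Comparability",
                     "Outcome/Exposure", "Total", "Rating"])
    for s in studies:
        total = s["selection"] + s["comparability"] + s["outcome"]
        # Conventional three-tier interpretation (not a validated cutoff).
        rating = "High" if total >= 7 else "Moderate" if total >= 4 else "Low"
        writer.writerow([s["id"], s["selection"], s["comparability"],
                         s["outcome"], total, rating])
    return buf.getvalue()
```

A row such as `Smith 2019,3,2,3,8,High` can then be pasted directly into the summary table of a review manuscript.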
Each NOS star represents a specific methodological criterion that the study meets. Stars are awarded for secure exposure ascertainment, appropriate control selection, adequate follow-up, and other quality indicators. The maximum 9 stars span selection (4), comparability (2), and outcome/exposure (3).
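The star arithmetic is simple enough to state as code. The following sketch encodes the domain maxima (Selection 4, Comparability 2, Outcome/Exposure 3) and the conventional 0-3/4-6/7-9 tiers; the function name is illustrative, not part of any library.

```python
def quality_rating(selection: int, comparability: int, outcome_exposure: int) -> str:
    """Map NOS domain stars to the conventional three-tier rating.

    Domain maxima follow the scale: Selection 0-4, Comparability 0-2,
    Outcome/Exposure 0-3 (9 stars total). The tier cutoffs are
    conventions, not empirically validated thresholds.
    """
    if not (0 <= selection <= 4 and 0 <= comparability <= 2
            and 0 <= outcome_exposure <= 3):
        raise ValueError("domain stars out of range")
    total = selection + comparability + outcome_exposure
    if total >= 7:
        return "High"
    if total >= 4:
        return "Moderate"
    return "Low"
```

Note that a study scoring 4+2+1 and one scoring 2+1+4 would be invalid or very different profiles, which is why the domain breakdown matters more than the total.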
While 7-9 stars is commonly considered high quality, 4-6 moderate, and 0-3 low, these thresholds are conventions rather than validated cutoffs. Use NOS scores to conduct sensitivity analyses — re-run your meta-analysis restricted to high-quality studies to test whether results are robust to study quality.
A total score of 6 can arise from very different quality profiles. Always report the domain-level breakdown (selection, comparability, outcome/exposure) so readers can identify whether specific quality concerns apply to your evidence base.
NOS inter-rater reliability is moderate. Best practice requires two independent reviewers to score each study, with discrepancies resolved through discussion or a third reviewer. Report the initial agreement rate (e.g., using Cohen's kappa) in your methods section.
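For reporting the agreement rate, Cohen's kappa can be computed directly from the two reviewers' tier assignments. The sketch below uses the standard definition, kappa = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is chance agreement from the raters' marginal frequencies.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical scores (e.g. High/Moderate/Low).

    p_o = observed proportion of agreement; p_e = agreement expected by
    chance given each rater's marginal category frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and equal length")
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / (n * n)
    if p_e == 1.0:  # both raters used a single identical category
        return 1.0
    return (p_o - p_e) / (1 - p_e)
```

Established implementations such as scikit-learn's `cohen_kappa_score` give the same result and may be preferable in a real analysis pipeline.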
Observational studies — cohort, case-control, and cross-sectional designs — form the backbone of evidence in many systematic reviews, particularly when randomized controlled trials are impractical or unavailable. The Newcastle-Ottawa Scale calculator implements the star-based scoring system developed by Wells et al. (2000) at the Universities of Newcastle (Australia) and Ottawa (Canada) to provide a standardized, reproducible method for appraising the methodological quality of these study designs. The tool lets reviewers assign up to 9 stars across three categories: Selection (up to 4 stars), Comparability (up to 2 stars), and Outcome for cohort studies or Exposure for case-control studies (up to 3 stars). Each star represents a specific methodological criterion that the study satisfies, and the total score serves as a summary indicator of study quality that can be reported alongside effect estimates in meta-analyses. For cross-sectional studies, which fall outside the original NOS scope, a modified version proposed by Modesti et al. (2016) adapts the star-based framework to address the unique methodological concerns of cross-sectional designs, including response rate and standardized outcome measurement.
The Selection category evaluates whether the exposed and non-exposed cohorts (or cases and controls) were drawn from comparable populations and whether exposure or outcome ascertainment methods were reliable. The Comparability category — the only one whose single item can award up to two stars — assesses whether the study controlled for the most important confounders, typically age, sex, and other variables central to the research question. The Outcome (or Exposure) category examines how the outcome was measured, whether follow-up was sufficiently long and complete, and whether objective assessment methods were used. The Cochrane Handbook for Systematic Reviews of Interventions (Higgins et al., 2023) acknowledges NOS as one of several validated tools for observational study quality, while PRISMA 2020 (Page et al., 2021) requires all systematic reviews to present results of quality or risk of bias assessments for every included study. It is worth noting that AMSTAR 2 (Shea et al., 2017) assesses the quality of systematic reviews themselves rather than primary studies — a distinct purpose that should not be confused with the NOS, which evaluates individual observational study methodology. Systematic review management platforms such as Covidence and DistillerSR now include built-in quality assessment modules that support NOS scoring alongside other appraisal tools, streamlining the workflow for review teams.
A common interpretation framework classifies studies scoring 7-9 stars as high quality, 4-6 as moderate quality, and 0-3 as low quality, although these thresholds are conventions rather than empirically validated cutoffs. Best practice involves using NOS scores in sensitivity analyses — for instance, restricting a meta-analysis to high-quality studies to determine whether the pooled effect remains robust. NOS scores can also serve as a covariate in meta-regression, testing whether study quality moderates the treatment effect across studies. In dose-response meta-analyses, quality-stratified pooling by NOS score is particularly important because poorly designed studies may distort the shape of the exposure-response curve. Similarly, funnel plot asymmetry can be stratified by NOS score to disentangle publication bias from small-study effects driven by lower methodological quality. Because inter-rater reliability for NOS has been reported as moderate in validation studies (Lo et al., 2014), dual independent scoring followed by consensus resolution is essential. Reviewers should calculate and report the initial agreement rate, ideally using Cohen's kappa as an inter-rater reliability measure, in their methods section.
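The quality-restricted sensitivity analysis described above can be sketched as a fixed-effect inverse-variance pooling, first over all studies and then restricted to those at or above the NOS threshold. The tuple layout and function names are assumptions for illustration; real analyses would typically use a random-effects model in dedicated software.

```python
def pool_fixed_effect(effects, variances):
    """Inverse-variance fixed-effect pooled estimate and its variance."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    return pooled, 1.0 / sum(weights)

def sensitivity_by_nos(studies, threshold=7):
    """Pool all studies, then re-pool restricted to NOS total >= threshold.

    `studies` is a list of (effect, variance, nos_total) tuples
    (a hypothetical shape). Comparing the two pooled estimates shows
    whether the result is robust to excluding lower-quality studies.
    """
    all_pool, _ = pool_fixed_effect([s[0] for s in studies],
                                    [s[1] for s in studies])
    high = [s for s in studies if s[2] >= threshold]
    high_pool, _ = pool_fixed_effect([s[0] for s in high],
                                     [s[1] for s in high])
    return all_pool, high_pool
```

If the pooled estimate shifts materially when low- and moderate-quality studies are dropped, study quality is likely influencing the result and should be discussed explicitly.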
Choosing the right quality assessment tool depends on the study design and the level of detail required. NOS provides a relatively quick, star-based scoring system well-suited for large reviews with many observational studies. For a more granular, domain-based evaluation of non-randomized comparative studies, the ROBINS-I bias assessment for non-randomized studies uses signaling questions across seven domains with four judgment levels. Randomized trials should be appraised with the Cochrane RoB 2 risk of bias tool rather than NOS, as the two instruments address fundamentally different bias mechanisms. For reviews incorporating qualitative or prevalence data, the JBI critical appraisal checklists offer design-specific item sets from the Joanna Briggs Institute (Aromataris & Munn, 2020). Regardless of which instrument you choose, documenting the complete scoring rationale in your data extraction form ensures transparency and reproducibility across your review team.
The Newcastle-Ottawa Scale is a widely used tool for assessing the quality of non-randomized studies (cohort and case-control) in systematic reviews and meta-analyses. Developed by Wells et al., it assigns stars across three categories: Selection, Comparability, and Outcome (for cohort studies) or Exposure (for case-control studies). A study can earn a maximum of 9 stars, with higher scores indicating better methodological quality.
While there is no universally agreed threshold, a common interpretation uses three tiers: 7-9 stars indicates high quality, 4-6 stars indicates moderate quality, and 0-3 stars indicates low quality. Some systematic reviews use the NOS as a continuous variable in meta-regression to assess whether study quality moderates the pooled effect estimate. Always report the specific domain scores alongside the total.
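Using NOS as a continuous covariate in meta-regression amounts to a weighted regression of effect size on total stars, with inverse-variance weights. The sketch below is a minimal fixed-effect version written from the standard weighted-least-squares formulas; real meta-regressions typically use random-effects models (e.g. the `rma` function in R's metafor package).

```python
def wls_meta_regression(effects, variances, nos_scores):
    """Weighted least squares of effect size on NOS total (weights = 1/variance).

    Returns (intercept, slope); a slope far from zero suggests study
    quality moderates the effect. Minimal fixed-effect sketch only.
    """
    w = [1.0 / v for v in variances]
    sw = sum(w)
    xbar = sum(wi * x for wi, x in zip(w, nos_scores)) / sw
    ybar = sum(wi * y for wi, y in zip(w, effects)) / sw
    sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, nos_scores))
    sxy = sum(wi * (x - xbar) * (y - ybar)
              for wi, x, y in zip(w, nos_scores, effects))
    slope = sxy / sxx
    return ybar - slope * xbar, slope
```

A significance test on the slope (omitted here) would then indicate whether the quality-effect association is stronger than expected by chance.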
Both versions share the same structure with three categories and similar principles, but they differ in specific items. The cohort version evaluates outcome assessment (independent blind assessment, record linkage, follow-up adequacy), while the case-control version evaluates exposure ascertainment (secure records, structured interviews, same method for cases and controls, non-response rate). The Comparability category is identical in both versions.
The original NOS was designed for cohort and case-control studies. An adapted version for cross-sectional studies has been proposed by various authors, but it is not part of the validated original scale. For cross-sectional studies, consider using the JBI critical appraisal checklist for analytical cross-sectional studies instead, which was specifically designed for this study design.
Key limitations include: (1) the scoring is somewhat subjective, as different reviewers may interpret criteria differently; (2) the scale assigns equal weight to all items, even though some may be more important for specific research questions; (3) the threshold cutoffs for low/moderate/high quality are not evidence-based; (4) inter-rater reliability has been reported as moderate in validation studies. Despite these limitations, NOS remains one of the most commonly used quality assessment tools for observational studies in systematic reviews.
Scores of 7–9 stars are generally considered high quality, 4–6 moderate quality, and 0–3 low quality. However, these thresholds are conventions, not empirically validated cutoffs. Some reviews define their own thresholds based on the research question. Always report the individual domain scores alongside the total, as a study can score 7 overall while having a critical weakness in one domain.
The NOS was developed by Wells et al. at the Universities of Newcastle and Ottawa but has limited formal validation. Inter-rater reliability has been reported as moderate (Lo et al., 2014). Despite these limitations, NOS remains the most widely used quality assessment tool for observational studies in systematic reviews. The Cochrane Handbook acknowledges NOS but recommends ROBINS-I as a more rigorous alternative for non-randomized intervention studies.
The original NOS was designed for cohort and case-control studies only. A modified version for cross-sectional studies was proposed by Modesti et al. (2016), adapting the selection, comparability, and outcome domains. If your review includes cross-sectional studies, use the modified NOS or consider the JBI critical appraisal checklist for cross-sectional studies as an alternative.
Including randomized trials in your review? Use our RoB 2 tool for randomized trials to create traffic-light summary tables across 5 bias domains. For non-randomized comparative studies, the ROBINS-I assessment for non-randomized studies provides a more detailed 7-domain evaluation with signaling questions. For qualitative and mixed-methods studies, explore our JBI critical appraisal checklists covering multiple study designs.
Our methodologists can conduct dual-independent NOS scoring, ROBINS-I assessments, and comprehensive quality appraisal for your entire systematic review with full consensus documentation.