Not sure which statistical test to use? Answer a few questions about your outcome, groups, and design, and get the right test with its assumptions, the rank-based or exact alternative, and a direct link to the free calculator that runs it.
Choosing the right statistical test comes down to four questions asked in order. First, what is the goal: comparing groups, measuring a relationship, predicting an outcome, or quantifying agreement? Second, what type is the outcome variable: continuous measurements, ordinal ranks, or categories? Third, how many groups are involved, and are the observations independent or paired? Fourth, do the assumptions of the parametric option hold, chiefly approximate normality for continuous outcomes and adequate expected counts for categorical ones? Each standard test in the selector corresponds to one combination of these four answers, which is why a decision tree settles in seconds a question that produces thousands of anxious forum posts.
The most common mistakes come from mismatching the test to the design, not from the mathematics: running an independent-samples test on paired data (which throws away the pairing and its power), using a chi-square test on a table with tiny expected counts (where the chi-square approximation is unreliable and an exact test is the safer choice), and treating ordinal Likert responses as if they were normally distributed measurements. When in doubt between a parametric test and its rank-based counterpart, the rank-based test usually costs little power when assumptions hold and protects the conclusion when they do not.
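To make the first mistake concrete, here is a minimal sketch that computes both statistics on the same paired data. The scores are invented for illustration, and the helper names `welch_t` and `paired_t` are hypothetical, not part of any calculator on this site.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic for two independent samples (variances not pooled)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se

def paired_t(a, b):
    """Paired t statistic: a one-sample t on the within-pair differences."""
    diffs = [x - y for x, y in zip(a, b)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se

# Hypothetical before/after scores for the same six subjects.
before = [62.0, 71.0, 55.0, 80.0, 67.0, 74.0]
after = [65.0, 73.0, 59.0, 82.0, 70.0, 78.0]

t_independent = welch_t(after, before)  # treats the two columns as unrelated groups
t_paired = paired_t(after, before)      # uses the fact that rows are the same people

print(f"Welch t (pairing ignored): {t_independent:.2f}")
print(f"Paired t (pairing used):   {t_paired:.2f}")
```

In this made-up data every subject improves by a similar amount, but subjects differ widely from each other. The independent-samples statistic is swamped by that between-subject spread, while the paired statistic only sees the consistent within-subject change.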
Next step
Assumption checks, the analysis itself, effect sizes, and a publication-ready APA results section, handled end to end.
Our promise: Free re-run and re-write if reviewers question the analysis or reporting.
Timeline
Most projects deliver in under 2 weeks. We confirm an exact date in your quote.
Confidentiality
NDA available on request before any project discussion. Your data, study design, and manuscript stay private either way.
The full statistical test decision tree behind the selector, in text form. Comparing a continuous outcome: two independent groups take the independent-samples t-test (Welch) or the Mann-Whitney U test when non-normal; two paired measurements take the paired t-test or the Wilcoxon signed-rank test; three or more independent groups take one-way ANOVA or Kruskal-Wallis; repeated measurements on the same subjects take repeated-measures ANOVA or the Friedman test. Comparing a categorical outcome: independent observations take the chi-square test of independence, switching to Fisher's exact test when any expected count drops below 5; paired binary outcomes take McNemar's test.
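The chi-square versus Fisher switch can be sketched in a few lines. This is an illustrative implementation under stated assumptions: it handles 2x2 tables only, uses the "expected count below 5" rule exactly as described above, and the function names and example counts are hypothetical.

```python
from math import comb

def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def fisher_exact_2x2(table):
    """Two-sided Fisher's exact p-value for a 2x2 table.

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    lowest, highest = max(0, row1 + col1 - n), min(row1, col1)
    p_observed = prob(a)
    # The small tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))

def pick_categorical_test(table, threshold=5):
    """Apply the rule of thumb: any expected count below the threshold means Fisher."""
    smallest = min(min(row) for row in expected_counts(table))
    if smallest < threshold:
        return "Fisher's exact test"
    return "chi-square test of independence"

# Hypothetical 2x2 table with only eight observations.
small_table = [[3, 1], [1, 3]]
print(expected_counts(small_table))
print(pick_categorical_test(small_table))
print(round(fisher_exact_2x2(small_table), 4))
```

Every expected count in the small table is 2, so the rule sends it to Fisher's exact test. Larger tables need a generalized exact test, which this sketch does not cover.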
Measuring a relationship: two continuous variables take Pearson's r, or Spearman's rho when the relationship is monotonic rather than linear; two categorical variables return to chi-square with Cramer's V as the effect size. Predicting an outcome: linear regression for continuous outcomes, logistic regression for binary outcomes, and survival methods (Kaplan-Meier, Cox) for time-to-event data. Quantifying agreement: Cohen's kappa for two raters on categories, the intraclass correlation coefficient for continuous ratings, and Cronbach's alpha for the internal consistency of multi-item scales.
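The Pearson versus Spearman distinction is easiest to see on data that rise steadily but not in a straight line. The sketch below computes Spearman's rho as Pearson's r applied to ranks; the data are invented and the helper names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: y grows exponentially with x, so the trend is
# perfectly monotonic but far from linear.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [math.exp(v) for v in x]

r = pearson_r(x, y)
rho = spearman_rho(x, y)
print(f"Pearson r:    {r:.3f}")
print(f"Spearman rho: {rho:.3f}")
```

Because the ordering of y matches the ordering of x exactly, rho reaches 1, while r is pulled below 1 by the curvature.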
Every test in the selector belongs to one of three broad types. Parametric tests (t-tests, ANOVA, Pearson correlation, linear regression) assume the data, or the model residuals, follow a specific distribution, usually the normal, and in exchange offer the most statistical power when that assumption holds. Non-parametric tests (Mann-Whitney U, Wilcoxon signed-rank, Kruskal-Wallis, Friedman, Spearman's rho) replace the raw values with ranks, trading a little power for validity with skewed, ordinal, or outlier-prone data. Exact tests (Fisher's exact, the exact binomial, McNemar's exact variant) compute probabilities directly from the discrete distribution of the data rather than from a large-sample approximation, which makes them the safest choice for small samples and rare events.
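As an illustration of what "computing probabilities directly" means, here is a minimal sketch of a two-sided exact binomial test. It uses one common two-sided convention (summing every outcome no more likely than the observed one); the function name and the example counts are hypothetical.

```python
from math import comb

def exact_binomial_p(successes, trials, p_null=0.5):
    """Two-sided exact binomial p-value.

    Adds up the binomial probabilities of all outcomes that are no more
    likely, under the null proportion, than the observed count.
    """
    def pmf(k):
        return comb(trials, k) * p_null ** k * (1 - p_null) ** (trials - k)

    p_observed = pmf(successes)
    # The small tolerance guards against floating-point ties.
    return sum(pmf(k) for k in range(trials + 1)
               if pmf(k) <= p_observed * (1 + 1e-9))

# Hypothetical example: 9 successes in 10 trials against a null of 0.5.
p_value = exact_binomial_p(9, 10)
print(round(p_value, 4))
```

No normal or chi-square approximation appears anywhere in the calculation: the p-value is a finite sum over the eleven possible outcomes, which is why the method stays valid with only ten observations.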
Cutting across those three types, tests also divide by purpose: comparison tests ask whether groups differ, association tests ask whether variables move together, prediction models quantify how predictors drive an outcome, and agreement statistics measure consistency between raters or items. The selector asks about purpose first because it narrows the field fastest; the parametric versus non-parametric versus exact choice is then a matter of checking assumptions.
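The purpose-first ordering can be sketched as a simple lookup followed by an assumption check. This is not the selector's actual code: the key labels, the `DECISION_TABLE` name, and the `select_test` function are hypothetical, and the table covers only some of the branches listed in the decision tree above.

```python
# (purpose, outcome type, design) -> (default test, alternative if assumptions fail)
DECISION_TABLE = {
    ("compare", "continuous", "two independent groups"):
        ("Welch's t-test", "Mann-Whitney U test"),
    ("compare", "continuous", "two paired measurements"):
        ("paired t-test", "Wilcoxon signed-rank test"),
    ("compare", "continuous", "three or more independent groups"):
        ("one-way ANOVA", "Kruskal-Wallis test"),
    ("compare", "continuous", "repeated measurements"):
        ("repeated-measures ANOVA", "Friedman test"),
    ("compare", "categorical", "independent observations"):
        ("chi-square test of independence", "Fisher's exact test"),
    ("relate", "continuous", "two variables"):
        ("Pearson's r", "Spearman's rho"),
}

def select_test(purpose, outcome, design, assumptions_hold=True):
    """Narrow by purpose, outcome, and design first; check assumptions last."""
    default, alternative = DECISION_TABLE[(purpose, outcome, design)]
    return default if assumptions_hold else alternative

print(select_test("compare", "continuous", "two paired measurements"))
print(select_test("compare", "continuous", "two paired measurements",
                  assumptions_hold=False))
```

Keeping the assumption check as the final step mirrors the text: the first three answers pick a row, and the fourth only chooses between the two entries in that row.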
Use a t-test to compare two groups and an ANOVA to compare three or more. In the two-group case the two agree exactly when the t-test uses the pooled (equal-variance) estimate: the ANOVA F statistic equals the squared t statistic and the p-values match, so nothing is gained by running an ANOVA on two groups. Welch's t-test does not pool the variances, so its statistic and degrees of freedom generally differ from the classical ANOVA result. What matters more is matching the design: paired data need a paired t-test or repeated-measures ANOVA, and skewed or ordinal outcomes call for the rank-based counterparts (Mann-Whitney U or Kruskal-Wallis).
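The two-group equivalence can be checked numerically. The sketch below computes the pooled-variance t statistic and the one-way ANOVA F statistic from first principles; the data are invented and the function names are hypothetical.

```python
import math
import statistics

def pooled_t(a, b):
    """Student's t statistic with a pooled (equal-variance) estimate."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        pooled_var * (1 / na + 1 / nb))

def one_way_anova_f(*groups):
    """Classical one-way ANOVA F: between-group over within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical measurements for two groups of unequal size.
group_a = [5.1, 4.9, 6.2, 5.8, 5.5]
group_b = [6.4, 6.9, 5.9, 7.1, 6.6, 6.0]

t_stat = pooled_t(group_a, group_b)
f_stat = one_way_anova_f(group_a, group_b)
print(f"t squared: {t_stat ** 2:.6f}")
print(f"F:         {f_stat:.6f}")
```

The two printed values agree up to floating-point rounding, for equal or unequal group sizes, as long as the t statistic is the pooled version.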
The five tests most often described as the basic statistical tests are the t-test (comparing two means), analysis of variance (comparing three or more means), the chi-square test (association between categorical variables), correlation (strength of a relationship between two continuous variables), and regression (predicting an outcome from one or more predictors). Together they cover the majority of analyses in student research, and each has a rank-based or exact alternative when its assumptions fail.
A common three-way grouping is comparison tests (t-tests, ANOVA, and their non-parametric counterparts), association tests (correlation coefficients and chi-square), and prediction models (regression). Another frequent three-way split is parametric tests, which assume a distribution such as the normal; non-parametric tests, which work on ranks; and exact tests, which compute probabilities directly from the data and remain valid at any sample size.
One widely taught taxonomy lists descriptive analysis (summarizing data), inferential analysis (generalizing from a sample), comparative analysis (differences between groups), associational or correlational analysis (relationships between variables), predictive analysis (regression and classification models), reliability analysis (agreement and internal consistency), and exploratory analysis (pattern finding without pre-specified hypotheses). The selector above covers the inferential, comparative, associational, predictive, and reliability families.
The selector links into the calculators directly, but the most used destinations are the two-sample t-test calculator, the ANOVA calculator with post-hoc comparisons, the chi-square and Cramer's V calculator, and the Mann-Whitney U and Wilcoxon calculator. Before running any comparison, the power and heterogeneity calculator checks whether your sample can detect the effect you expect.
Reviewed by
Dr. Sarah Mitchell holds a PhD in Biostatistics from Johns Hopkins Bloomberg School of Public Health and has over 15 years of experience in systematic review methodology and meta-analysis. She has authored or co-authored 40+ peer-reviewed publications in journals including the Journal of Clinical Epidemiology, BMC Medical Research Methodology, and Research Synthesis Methods. A former Cochrane Review Group statistician and current editorial board member of Systematic Reviews, Dr. Mitchell has supervised 200+ evidence synthesis projects across clinical medicine, public health, and social sciences. She reviews all Research Gold tools to ensure statistical accuracy and compliance with Cochrane Handbook and PRISMA 2020 standards.
From data cleaning and assumption checks to the full analysis and a publication-ready results section, we handle the numbers so you can focus on the science.