Compute approximate Bayes factors for one-sample t-tests, two-sample t-tests, correlations, and binomial proportions. Interpret evidence strength using Jeffreys' classification scale.
Test whether a sample mean differs from a hypothesized value. Uses the BIC approximation (Wagenmakers, 2007).
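The BIC approximation can be sketched as follows. For a one-sample t-test, exp(ΔBIC/2) reduces to a closed form in t and n; the function below is a minimal illustration under that assumption, not the calculator's exact implementation.

```python
import math

def bic_bf10_one_sample(t, n):
    """BIC-approximate Bayes factor for a one-sample t-test
    (Wagenmakers, 2007). Returns BF10, the evidence for H1 over H0."""
    nu = n - 1  # degrees of freedom
    # BF01 = exp(dBIC10 / 2) reduces to this closed form for the t-test
    bf01 = math.sqrt(n) * (1 + t**2 / nu) ** (-n / 2)
    return 1 / bf01

# Example: t(29) = 2.5 from a sample of n = 30
print(round(bic_bf10_one_sample(2.5, 30), 2))  # ≈ 3.41
```

Note that at t = 0 the formula gives BF10 = 1/√n, so larger samples that show no effect yield progressively stronger evidence for the null.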
Select the appropriate test: one-sample t-test, two-sample t-test, correlation, or binomial proportion.
Provide the summary statistics required for each test (e.g., mean, SD, n for t-tests, or r and n for correlations).
See BF10, BF01, and log10(BF10) along with the Jeffreys scale classification of evidence strength.
Copy the Bayes factor, interpretation, and all computed values to your clipboard for reporting.
Unlike p-values, which provide a binary significant/not-significant decision, Bayes factors give a continuous measure of evidence strength. A BF10 of 2.5 tells you the data are 2.5 times more likely under H1 than under H0. Jeffreys (1961) and Kass & Raftery (1995) provide guidelines for interpreting these values, but they are not rigid cutoffs.
A key advantage of Bayesian hypothesis testing is the ability to quantify evidence in favor of H0. A BF10 of 0.1 (equivalently, BF01 = 10) means the data are 10 times more likely under H0 than H1. P-values cannot provide evidence for the null; they can only fail to reject it.
The Bayes factor depends on the prior distribution assigned to parameters under H1. Different priors yield different Bayes factors. Sensitivity analysis (varying the prior width) is recommended to ensure conclusions are robust. The BIC approximation used here is relatively insensitive to prior specification.
In systematic reviews, reporting both p-values and Bayes factors provides a more complete picture. When p-values hover near 0.05 or results are inconclusive, Bayes factors can clarify whether the evidence genuinely supports an effect, genuinely supports the null, or is simply ambiguous.
An online Bayes factor calculator quantifies the relative evidence that observed data provide for one statistical hypothesis over another. Unlike p-values, which can only reject the null hypothesis, the Bayes factor (BF₁₀) expresses how many times more likely the data are under the alternative hypothesis than under the null. This distinction makes Bayesian hypothesis testing particularly valuable in systematic reviews: when a meta-analysis yields a non-significant p-value, researchers cannot distinguish between "no effect exists" and "insufficient evidence to detect an effect." The Bayes factor resolves this ambiguity by providing a continuous measure of evidential support in both directions, a property Dienes (2014) calls "the evidential advantage of Bayesian inference." Replication Bayes factors extend this logic further by quantifying whether a new study's data are consistent with the effect size reported in an original publication, providing a formal framework for assessing the replicability of published findings.
Bayes factor interpretation follows the scale proposed by Harold Jeffreys (1961) and refined by Kass and Raftery (1995). A BF₁₀ between 1 and 3 provides anecdotal evidence for the alternative hypothesis, 3–10 provides moderate evidence, 10–30 provides strong evidence, 30–100 provides very strong evidence, and values above 100 provide decisive evidence. Conversely, BF₁₀ values below 1 favor the null hypothesis on the same scale (BF₀₁ = 1/BF₁₀). In practice, a BF₁₀ of 0.1 means the data are 10 times more likely under H₀: strong evidence that no meaningful effect exists. This two-directional interpretation is critical for systematic reviews assessing treatment futility or equivalence. For nested model comparison, the Savage-Dickey density ratio offers an elegant computational shortcut: it evaluates the posterior density at the point null relative to the prior density, avoiding the need for marginal likelihood integration.
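The Savage-Dickey shortcut is easiest to see with a conjugate model. The sketch below tests a binomial proportion against θ = 0.5 under a uniform Beta(1, 1) prior: because the posterior is also a Beta distribution, BF₀₁ is simply the posterior density at 0.5 divided by the prior density at 0.5. The specific counts are illustrative.

```python
from scipy import stats

def savage_dickey_bf01(k, n, theta0=0.5):
    """BF01 for H0: theta = theta0 vs H1: theta ~ Beta(1, 1),
    via the Savage-Dickey density ratio (posterior / prior at theta0)."""
    prior = stats.beta(1, 1)
    posterior = stats.beta(1 + k, 1 + n - k)  # conjugate Beta update
    return posterior.pdf(theta0) / prior.pdf(theta0)

# 60 successes in 100 trials: is the coin fair?
print(round(savage_dickey_bf01(60, 100), 2))  # ≈ 1.1, essentially uninformative
```

With 50/100 successes the same function returns BF₀₁ of about 8, illustrating how Bayesian testing can accumulate evidence *for* the null.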
This Bayesian hypothesis testing tool supports multiple test types: one-sample and two-sample t-tests for comparing means, correlation tests for assessing relationships, and proportion tests for categorical outcomes. Each uses the default Cauchy prior (width r = √2/2 for effect sizes, as recommended by Rouder et al., 2009) but allows custom prior specification. The choice of prior matters: wider priors spread probability over larger effect sizes, making it harder for small effects to generate strong Bayes factors. Sensitivity analysis across multiple prior widths — examining how BF₁₀ changes as the prior scales from narrow (r = 0.5) to wide (r = 1.5) — provides a robustness check analogous to the leave-one-out sensitivity analysis used in frequentist meta-analysis. JASP, the open-source statistical software with built-in Bayes factor computation, automates this robustness analysis through its "BF robustness check" plot, making prior sensitivity accessible to researchers without programming experience. A key advantage of Bayesian inference is that it supports sequential updating — because Bayes factors do not depend on the stopping rule, researchers can accumulate evidence as new studies appear without inflating error rates, unlike sequential frequentist testing that requires alpha-spending corrections.
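The robustness check described above can be sketched numerically. The function below computes the JZS Bayes factor for a one-sample t-test with a Cauchy(0, r) prior on effect size (Rouder et al., 2009), using a standard one-dimensional integral representation, then varies the prior width r; the t and n values in the loop are illustrative, and this is a sketch rather than the calculator's implementation (which uses the BIC approximation).

```python
import numpy as np
from scipy import integrate

def jzs_bf10(t, n, r=np.sqrt(2) / 2):
    """JZS Bayes factor for a one-sample t-test (Rouder et al., 2009),
    with a Cauchy(0, r) prior on effect size delta under H1."""
    nu = n - 1
    # Marginal likelihood under H0 (up to a constant shared with H1)
    m0 = (1 + t**2 / nu) ** (-(nu + 1) / 2)
    # Under H1: delta | g ~ N(0, g), with g ~ InverseGamma(1/2, r^2/2),
    # which is equivalent to the Cauchy(0, r) prior after integrating out g
    def integrand(g):
        return ((1 + n * g) ** -0.5
                * (1 + t**2 / ((1 + n * g) * nu)) ** (-(nu + 1) / 2)
                * r / np.sqrt(2 * np.pi) * g ** -1.5 * np.exp(-r**2 / (2 * g)))
    m1, _ = integrate.quad(integrand, 0, np.inf)
    return m1 / m0

# Robustness check: how does BF10 move as the prior widens?
for r in (0.5, np.sqrt(2) / 2, 1.0, 1.5):
    print(f"r = {r:.3f}: BF10 = {jzs_bf10(t=2.5, n=30, r=r):.2f}")
```

As the loop shows, for a moderate effect a wider prior yields a smaller BF₁₀, because it spreads prior mass over large effect sizes the data do not support; this is exactly the pattern JASP's robustness plot visualizes.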
In the context of evidence synthesis, Bayes factors complement rather than replace traditional frequentist statistics. When your meta-analysis yields a pooled effect near the null, computing the Bayes factor for the pooled estimate helps distinguish between absence of evidence and evidence of absence — a distinction that the Cochrane Handbook (Higgins et al., 2023) acknowledges is impossible with p-values alone. Use our effect size calculator to compute standardized effect measures as inputs for Bayesian analysis, and our p-value to confidence interval converter when you need to reconstruct standard errors from published test statistics. For sample size planning, our power analysis calculator estimates the number of participants needed for adequate frequentist power — the Bayesian equivalent (design analysis) uses similar inputs but optimizes for expected Bayes factor rather than Type I error rate. For researchers conducting Bayesian meta-analysis, Röver (2020) provides a practical framework for specifying informative priors derived from historical data, improving precision when the number of studies is small.
A Bayes factor (BF10) quantifies the relative evidence provided by the data for one hypothesis over another. BF10 = 5 means the data are 5 times more likely under the alternative hypothesis (H1) than under the null hypothesis (H0). Conversely, BF01 = 1/BF10 quantifies evidence for H0. Unlike p-values, Bayes factors allow you to quantify evidence in favor of the null hypothesis, not just against it.
Harold Jeffreys proposed a widely used scale for interpreting Bayes factors: BF10 > 100 = Decisive, 30–100 = Very Strong, 10–30 = Strong, 3–10 = Substantial, 1–3 = Anecdotal evidence for H1, and the inverse ranges for evidence supporting H0. Some researchers use the modified Kass & Raftery (1995) scale with slightly different thresholds. These are guidelines, not strict cutoffs.
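The Jeffreys scale described above maps directly to a small classification helper; the thresholds below follow the ranges listed in this section, with evidence for H0 handled by inverting the Bayes factor.

```python
def classify_bf(bf10):
    """Label a Bayes factor on Jeffreys' (1961) scale.
    Evidence for H0 mirrors the H1 thresholds via BF01 = 1/BF10."""
    for_h0 = bf10 < 1
    bf = 1 / bf10 if for_h0 else bf10
    if bf > 100:
        label = "Decisive"
    elif bf > 30:
        label = "Very strong"
    elif bf > 10:
        label = "Strong"
    elif bf > 3:
        label = "Substantial"
    else:
        label = "Anecdotal"
    return f"{label} evidence for {'H0' if for_h0 else 'H1'}"

print(classify_bf(2.5))   # Anecdotal evidence for H1
print(classify_bf(0.05))  # Strong evidence for H0
```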
Bayes factors and p-values answer fundamentally different questions. A p-value is the probability of obtaining data as extreme or more extreme than observed, assuming H0 is true. A Bayes factor is the ratio of the probability of the data under H1 to the probability under H0. A significant p-value (e.g., p < 0.05) does not always correspond to strong Bayesian evidence, and vice versa. Bayes factors incorporate prior information and provide a continuous measure of evidence strength.
For t-tests, this calculator uses BIC-based approximations (Wagenmakers, 2007) that are relatively robust to prior specification. For the binomial test, it uses a uniform Beta(1,1) prior on the proportion. These are general-purpose defaults suitable for exploratory analysis. For confirmatory research or when strong prior information exists, consider using specialized software like JASP or the BayesFactor R package with informed priors.
Bayes factors are particularly useful in systematic reviews when: (1) you want to distinguish between “no evidence of an effect” and “evidence of no effect”; (2) sequential analysis is needed as studies accumulate; (3) you want to incorporate prior evidence from earlier reviews; (4) traditional null hypothesis testing yields ambiguous results near the significance threshold. Bayesian meta-analysis is increasingly recommended by Cochrane and other organizations as a complement to frequentist approaches.
A p-value measures the probability of observing data at least as extreme as the result, assuming the null hypothesis is true. A Bayes factor quantifies the relative evidence for one hypothesis over another, without assuming either is true. Unlike p-values, Bayes factors can provide evidence in favor of the null hypothesis and are not affected by optional stopping.
A Bayes factor of 1 means the data are equally likely under both the null and alternative hypotheses — the evidence is completely uninformative. BF > 3 provides moderate evidence for the alternative; BF > 10 provides strong evidence. BF < 1/3 provides moderate evidence for the null. Values between 1/3 and 3 are considered inconclusive (Jeffreys, 1961).
Approximate conversions exist but are problematic because p-values and Bayes factors measure fundamentally different things. The "minimum Bayes factor" bound (Sellke, Bayarri & Berger, 2001) shows that for p < 1/e, the Bayes factor in favor of H0 satisfies BF01 ≥ –e × p × ln(p), or equivalently BF10 ≤ –1/(e × p × ln(p)). Thus p = 0.05 corresponds to a maximum BF10 of only about 2.5, far weaker evidence than commonly assumed.
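The bound is a one-liner to evaluate; the sketch below applies the Sellke, Bayarri & Berger formula to a few conventional p-values.

```python
import math

def max_bf10_from_p(p):
    """Upper bound on BF10 implied by a p-value
    (Sellke, Bayarri & Berger, 2001); valid only for p < 1/e."""
    if not 0 < p < 1 / math.e:
        raise ValueError("bound holds only for 0 < p < 1/e")
    return -1 / (math.e * p * math.log(p))

print(round(max_bf10_from_p(0.05), 2))   # ≈ 2.46: far short of "strong" evidence
print(round(max_bf10_from_p(0.005), 2))  # even p = 0.005 caps BF10 below 14
```

This is why a "just significant" result near p = 0.05 can at best constitute anecdotal-to-substantial Bayesian evidence against the null.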
Our biostatisticians can perform Bayesian meta-analyses with informative priors, model comparison, and full sensitivity analyses for your systematic review.