Evaluate whether a set of statistically significant findings contains evidential value or shows signs of p-hacking using p-curve analysis (Simonsohn et al., 2014). Enter p-values directly or compute them from t, F, chi-squared, or z statistics. Run binomial and continuous right-skew tests, a flatness test against 33% power, and visualize the p-value distribution with an interactive D3.js histogram. Import CSV/Excel data, auto-generate a methods paragraph, and export R code for the dmetar package.
Enter p-values directly (one per row), or switch to t, F, chi-squared, or z input mode. Each test statistic is automatically converted to a two-tailed p-value. Import from CSV/Excel or paste spreadsheet data.
Click Analyze to filter significant p-values (p < 0.05) and run all three tests: the binomial right-skew test, the Stouffer continuous right-skew test, and the flatness test against 33% power.
The histogram shows the distribution of significant p-values across five equal-width bins (0 to 0.01 through 0.04 to 0.05). Compare the observed distribution against the flat null expectation and the 33% power curve.
A significant right-skew test means the findings contain evidential value. A significant flatness test means the evidential value is inadequate. The overall conclusion integrates both results.
Copy a publication-ready methods paragraph summarizing the p-curve analysis. Export reproducible R code for the dmetar package that runs the full analysis in RStudio.
Download the p-curve plot as a high-resolution PNG. Export the results table as CSV or Excel. Copy test statistics for your manuscript.
Need this done professionally? Get a complete systematic review or meta-analysis handled end-to-end.
When the underlying effect is genuine, statistically significant p-values cluster near zero rather than spreading uniformly. A significantly right-skewed p-curve provides strong evidence that the set of findings reflects true effects rather than noise or p-hacking.
A p-curve that is flat (uniform) or left-skewed (concentrated near 0.05) suggests that the significant results were not generated by real effects. This pattern is consistent with p-hacking, selective reporting, or chance alone.
Funnel plots and Egger's test examine the relationship between effect sizes and precision. P-curve takes a different approach by focusing exclusively on the distribution of significant p-values. Using both methods together provides a more thorough assessment of evidential integrity.
P-curve analysis uses only p-values below 0.05 from the set of studies. Non-significant results are excluded because the method is specifically designed to evaluate the distribution pattern among significant findings.
The flatness test compares the observed p-curve against the expected distribution under 33% statistical power. This benchmark was chosen by Simonsohn et al. because it represents the minimum level of power that would still produce a meaningfully right-skewed distribution of significant p-values.
Best practice is to report both the right-skew test (evidence of real effects) and the flatness test (evidence that the evidential value is inadequate) in your manuscript. This dual reporting gives readers a complete picture of the evidential value of the included studies.
P-curve analysis was introduced by Simonsohn, Nelson, and Simmons (2014) as a diagnostic tool for evaluating the evidential value of a set of statistically significant findings. The method addresses a fundamental question in meta-analysis: do the reported significant results reflect genuine underlying effects, or could they be the product of selective reporting and p-hacking? Traditional publication bias methods like funnel plots and regression-based tests (Egger et al., 1997; Begg and Mazumdar, 1994) focus on the relationship between effect sizes and precision. P-curve offers a complementary lens by examining the shape of the distribution of significant p-values themselves.
The core insight behind p-curve is straightforward: when a true effect exists and studies have adequate power, the distribution of significant p-values (those below 0.05) should be right-skewed, with most p-values clustering near zero. Under the null hypothesis of no effect, significant p-values follow a uniform distribution between 0 and 0.05. When researchers engage in p-hacking, exploiting researcher degrees of freedom to push p-values just below 0.05, the distribution becomes left-skewed with a concentration of values near the significance threshold.
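To make this concrete, the short R simulation below (with an arbitrary effect size and per-group sample size chosen purely for illustration) draws two-sample t-tests under the null and under a true effect, then bins the significant p-values into the same five bins the tool plots. Under the null every bin holds roughly 20% of the significant p-values; under a true effect the mass piles up below 0.01.

```r
# Illustrative simulation: shape of the p-curve under the null vs. a true effect.
# The effect size (d = 0.5) and n = 50 per group are arbitrary illustrative choices.
set.seed(42)

sim_p <- function(d, n, reps = 10000) {
  replicate(reps, {
    x <- rnorm(n)            # control group
    y <- rnorm(n, mean = d)  # treatment group
    t.test(x, y)$p.value     # two-tailed Welch t-test
  })
}

bin_pcurve <- function(p) {
  sig <- p[p < 0.05]  # p-curve uses only the significant results
  prop.table(table(cut(sig, breaks = seq(0, 0.05, by = 0.01))))
}

bin_pcurve(sim_p(d = 0,   n = 50))  # null: roughly 20% per bin (flat)
bin_pcurve(sim_p(d = 0.5, n = 50))  # true effect: mass concentrates below 0.01
```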
This tool implements three complementary tests from the p-curve framework. The binomial right-skew test checks whether more than 50% of significant p-values fall below 0.025. Under the null hypothesis, exactly 50% should fall in each half, so a significant excess in the lower half indicates right-skew. The continuous right-skew test uses Stouffer's method to combine evidence across all p-values. Each significant p-value is transformed to a uniform scale (pp = p / 0.05), then converted to a z-score via the inverse normal distribution. The combined Stouffer Z-statistic follows a standard normal distribution under the null, providing a continuous measure of right-skew. The flatness test evaluates whether the observed p-curve is flatter than expected under 33% statistical power. If the flatness test is significant, the evidential value is deemed inadequate, suggesting the significant findings may not reflect true underlying effects.
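As a minimal base-R sketch of the two right-skew tests described above (the example p-values and variable names are our own; the official p-curve app also works from raw test statistics):

```r
# Right-skew tests on a vector of p-values: a minimal base-R sketch.
# `pvals` is hypothetical input; only values below .05 enter the analysis.
pvals <- c(0.001, 0.004, 0.012, 0.019, 0.030, 0.041, 0.002, 0.008)
sig   <- pvals[pvals < 0.05]

# 1) Binomial right-skew test: under the null, significant p-values are
#    uniform on (0, .05), so 50% are expected to fall below .025.
binom.test(sum(sig < 0.025), length(sig), p = 0.5, alternative = "greater")

# 2) Continuous right-skew test (Stouffer): rescale each significant p-value
#    to a uniform pp-value, convert to a z-score, and combine.
pp <- sig / 0.05                      # uniform on (0, 1) under the null
z  <- qnorm(pp)                       # strongly negative when p-values are tiny
z_stouffer <- sum(z) / sqrt(length(z))
p_rightskew <- pnorm(z_stouffer)      # small value => significant right-skew
z_stouffer; p_rightskew
```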
Many published studies report test statistics rather than exact p-values. This tool supports direct conversion from t-statistics (with degrees of freedom), F-statistics (with numerator and denominator degrees of freedom), chi-squared statistics (with degrees of freedom), and z-statistics. All conversions use two-tailed p-values to maintain consistency. You can also import data from CSV or Excel files for batch processing, or paste tab-separated data directly from a spreadsheet.
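The conversions themselves are one-liners in base R; the statistic values below are placeholders. F and chi-squared statistics are already direction-free, so their upper-tail probability corresponds to the two-tailed test of the underlying effect:

```r
# Two-tailed p-values from common test statistics (base R; values are placeholders).
2 * pt(abs(2.31), df = 28, lower.tail = FALSE)    # t(28) = 2.31
pf(5.62, df1 = 1, df2 = 84, lower.tail = FALSE)   # F(1, 84) = 5.62 (upper tail)
pchisq(7.90, df = 1, lower.tail = FALSE)          # chi-squared(1) = 7.90 (upper tail)
2 * pnorm(abs(2.17), lower.tail = FALSE)          # z = 2.17
```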
P-curve analysis works best when combined with other methods for evaluating the integrity of a body of evidence. Use our funnel plot and publication bias tool for visual inspection and formal tests of funnel plot asymmetry (Egger's test, Begg's test, trim-and-fill). Visualize individual study estimates with the forest plot generator to assess the overall pattern of results. Calculate individual study effect sizes with the effect size calculator before conducting your meta-analysis. Test the stability of your pooled estimate with the leave-one-out sensitivity analysis tool.
After running the analysis, the tool generates a publication-ready methods paragraph that reports the number of significant p-values analyzed, the results of the right-skew and flatness tests with exact p-values, and the overall conclusion about evidential value. For full reproducibility, the R code generator produces a script using the dmetar package (Harrer et al., 2021), which includes the pcurve() function for comprehensive p-curve analysis. The generated code includes your p-values and is ready to paste into RStudio.
Important caveats apply to p-curve analysis. The method requires a sufficient number of significant p-values (ideally 20 or more) for reliable inference. P-curve cannot distinguish between genuine effects with low power and effects inflated by p-hacking when sample sizes are very small. The method assumes that the selected studies represent a meaningful set of tests of the same or similar hypotheses. Mixing studies testing fundamentally different hypotheses can distort the p-curve shape. Always interpret p-curve results alongside visual inspection of the histogram and in the context of the broader evidence.
P-curve analysis (Simonsohn, Nelson, and Simmons, 2014) is a method for evaluating whether a set of statistically significant findings contains evidential value or shows signs of p-hacking. It examines the distribution of significant p-values (those below 0.05) from a collection of studies. If the effects studied are real, significant p-values should cluster near zero (right-skewed distribution). If there is no real effect, or if researchers have engaged in p-hacking, the distribution should be flat or left-skewed (clustered near 0.05).
The right-skew test evaluates whether the distribution of significant p-values is right-skewed, meaning there are more very small p-values than would be expected under the null hypothesis. This tool computes two versions: (1) a binomial test checking whether more than 50% of significant p-values fall below 0.025, and (2) a continuous test using Stouffer's method, which transforms each p-value to a uniform scale and then combines z-scores. A significant right-skew test (p < 0.05) indicates the set of findings contains evidential value, meaning the underlying effects are likely real.
The flatness test (also called the test for inadequate evidential value) evaluates whether the p-curve is flatter than would be expected if the studies had 33% statistical power. A flat or left-skewed p-curve suggests that the significant results may have been obtained through selective reporting, p-hacking, or other questionable research practices rather than genuine effects. If the flatness test is significant (p < 0.05), this indicates that the evidential value in the set of studies is inadequate.
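For intuition about the mechanics, here is a simplified base-R sketch of the flatness test for the special case where every result is (or has been converted to) a z statistic. The published method derives a separate 33%-power noncentrality parameter for each test family, so treat this as a didactic approximation, not the exact computation the tool performs:

```r
# Flatness test sketch for z statistics only (didactic approximation).
pvals <- c(0.012, 0.019, 0.030, 0.041, 0.024)   # hypothetical significant p-values
z_obs <- qnorm(1 - pvals / 2)                   # two-tailed p -> |z|

# Noncentrality that gives exactly 33% power at the two-tailed .05 threshold:
# P(Z > 1.96 | mean = ncp33) = 1/3.
ncp33 <- qnorm(0.975) - qnorm(2/3)              # ~1.53

# Conditional on significance, pp33 is uniform on (0, 1) when power is 33%;
# p-values near .05 (a flat curve) yield pp33 values near 0.
pp33 <- 3 * (pnorm(z_obs, mean = ncp33) - 2/3)
pp33 <- pmin(pmax(pp33, 1e-10), 1 - 1e-10)      # numerical guard

z_flat <- sum(qnorm(pp33)) / sqrt(length(pp33)) # Stouffer combination
p_flat <- pnorm(z_flat)                         # small => flatter than 33% power
z_flat; p_flat
```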
You can enter p-values directly (one per row), or provide test statistics that the tool will convert to p-values. Supported test statistics include t-statistics with degrees of freedom, F-statistics with numerator and denominator degrees of freedom, chi-squared statistics with degrees of freedom, and z-statistics. You can also import data from a CSV or Excel file using the drag-and-drop uploader.
Simonsohn et al. (2014) recommend a minimum of approximately 20 statistically significant p-values for reliable p-curve analysis. With fewer studies, the binomial and continuous tests may lack sufficient statistical power to detect right-skew or flatness. However, even with as few as 5 to 10 significant p-values, the p-curve histogram can provide useful visual information about the distribution pattern.
Funnel plots and Egger's regression test detect publication bias by examining the relationship between effect sizes and their precision (standard errors). P-curve takes a fundamentally different approach: it examines only statistically significant p-values and tests whether they are distributed in a way consistent with real effects. P-curve can detect p-hacking and selective reporting even when traditional publication bias tests show no asymmetry. The two approaches are complementary, and combining them provides a more complete picture of the integrity of a body of evidence.
Yes. After running the analysis, you can copy a ready-to-run R script that uses the dmetar package (Harrer et al., 2021) for p-curve analysis. The generated code includes your p-values and calls the pcurve() function, which produces the p-curve plot and all statistical tests. You can paste the code directly into RStudio for full reproducibility.
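As a rough sketch of what such a script looks like (the data frame is hypothetical, and the code the tool generates may differ in detail), note that dmetar::pcurve() expects a meta-analysis object created with the meta package:

```r
# Minimal sketch of a dmetar p-curve script (hypothetical data).
# install.packages("meta"); dmetar is installed from GitHub (see Harrer et al., 2021).
library(meta)
library(dmetar)

dat <- data.frame(
  study = paste("Study", 1:5),
  TE    = c(0.42, 0.35, 0.51, 0.28, 0.44),   # hypothetical effect sizes
  seTE  = c(0.15, 0.12, 0.18, 0.11, 0.16)    # hypothetical standard errors
)

m <- metagen(TE = TE, seTE = seTE, studlab = study, data = dat, sm = "SMD")
pcurve(m)  # p-curve plot plus right-skew and flatness tests
```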
Detect publication bias with funnel plots and formal asymmetry tests using our funnel plot and publication bias tool. Visualize individual study estimates with our forest plot generator for meta-analysis. Calculate individual study effect sizes before your analysis with our effect size calculator for SMD, OR, and RR.
Reviewed by
Dr. Sarah Mitchell holds a PhD in Biostatistics from Johns Hopkins Bloomberg School of Public Health and has over 15 years of experience in systematic review methodology and meta-analysis. She has authored or co-authored 40+ peer-reviewed publications in journals including the Journal of Clinical Epidemiology, BMC Medical Research Methodology, and Research Synthesis Methods. A former Cochrane Review Group statistician and current editorial board member of Systematic Reviews, Dr. Mitchell has supervised 200+ evidence synthesis projects across clinical medicine, public health, and social sciences. She reviews all Research Gold tools to ensure statistical accuracy and compliance with Cochrane Handbook and PRISMA 2020 standards.
Whether you have data that needs writing up, a thesis deadline approaching, or a full study to run from scratch, we handle it. Average turnaround: 2-4 weeks.