Apply monitoring boundaries to your cumulative meta-analysis and determine whether the evidence is conclusive or if further trials are needed.
Upload a CSV, TSV, or Excel (.xlsx/.xls) file (maximum 500 rows).
Boundary type: O'Brien-Fleming. Leave delta blank to use the observed pooled effect.
Enter at least 2 studies with valid data (name, year, and effect/SE or events/totals) to generate the TSA diagram.
Add each study in the order it was published, including the study name, publication year, and either effect size with standard error (continuous outcomes) or events and totals for both arms (binary outcomes). Chronological ordering is essential for valid cumulative analysis.
Set the type I error rate (alpha, default 0.05), the desired power (1 minus beta, default 80%), and the minimal clinically important difference (delta). You can also leave delta blank to use the observed pooled effect size as the anticipated intervention effect.
Choose whether to adjust the Required Information Size for between-study heterogeneity using D-squared. When heterogeneity is present, each participant contributes less information, and more total participants are needed. The tool estimates D-squared from your data automatically.
The tool computes the cumulative Z-statistic after each study, calculates the Required Information Size, and plots the O'Brien-Fleming monitoring boundaries, futility boundaries, and cumulative Z-curve on an interactive D3.js chart.
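A minimal sketch of how a cumulative Z-curve can be computed, assuming a fixed-effect inverse-variance model and that each study already has an effect estimate (for example a log risk ratio or a mean difference) with its standard error. The `Study` class and `cumulative_z_curve` function are hypothetical illustrations, not the tool's own code; a random-effects version would add tau-squared to each study's variance before weighting.

```python
import math
from dataclasses import dataclass


@dataclass
class Study:
    name: str
    year: int
    effect: float  # effect estimate, e.g. log risk ratio or mean difference
    se: float      # standard error of the effect estimate
    n: int         # total participants in both arms


def cumulative_z_curve(studies):
    """Return (cumulative participants, cumulative Z) after each study.

    Studies are sorted by publication year; Python's sort is stable, so
    studies from the same year keep their input order.
    """
    ordered = sorted(studies, key=lambda s: s.year)
    sum_w = 0.0   # running sum of inverse-variance weights
    sum_wy = 0.0  # running sum of weight * effect
    cum_n = 0
    curve = []
    for s in ordered:
        w = 1.0 / s.se ** 2
        sum_w += w
        sum_wy += w * s.effect
        cum_n += s.n
        pooled = sum_wy / sum_w             # fixed-effect pooled estimate
        pooled_se = math.sqrt(1.0 / sum_w)  # its standard error
        curve.append((cum_n, pooled / pooled_se))
    return curve


# Hypothetical data, entered out of chronological order on purpose.
studies = [
    Study("Trial B", 2012, 0.5, 0.25, 100),
    Study("Trial A", 2008, 0.5, 0.25, 100),
]
for n, z in cumulative_z_curve(studies):
    print(n, round(z, 3))
```

Sorting inside the function enforces the chronological ordering the cumulative analysis depends on, even if rows are uploaded in a different order.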
If the Z-curve crosses the upper monitoring boundary, evidence for benefit is conclusive. If it crosses the futility boundary, further studies are unlikely to demonstrate a significant effect. If it remains between boundaries without reaching the RIS, the evidence is inconclusive.
Download the TSA diagram as PNG or PDF for your manuscript. Copy the auto-generated methods paragraph describing your TSA parameters. Export the cumulative data table as CSV, or copy the reproducible R code for verification.
Each time a new study is added and significance is recalculated, the cumulative false positive rate rises above the nominal 5%. With 10 repeated tests, the actual type I error can approach 20%. TSA applies formal alpha-spending functions to control this inflation.
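The inflation from repeated testing can be checked with a small Monte Carlo simulation. This sketch assumes no true effect and that every update adds the same amount of information (real meta-analysis updates are unevenly sized), and tests the cumulative Z-statistic at the conventional two-sided 5% level after each update. The function name and defaults are illustrative.

```python
import math
import random


def repeated_testing_alpha(looks=10, sims=20000, z_crit=1.96, seed=1):
    """Monte Carlo estimate of the overall type I error when a null,
    accumulating dataset is tested at two-sided 5% after every update.

    Simplifying assumption: each update adds the same amount of information.
    """
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(sims):
        score = 0.0  # running sum of standardized increments
        for k in range(1, looks + 1):
            score += rng.gauss(0.0, 1.0)  # new data generated under the null
            z = score / math.sqrt(k)      # cumulative Z-statistic at look k
            if abs(z) > z_crit:
                false_positives += 1      # declared "significant" at some look
                break
    return false_positives / sims


print(repeated_testing_alpha(looks=1))   # close to 0.05
print(repeated_testing_alpha(looks=10))  # roughly 0.19
```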
The Required Information Size is the meta-analytic equivalent of a power calculation for a single trial. It represents the total number of participants needed across all studies to reliably detect or rule out a pre-specified effect size at the desired power level.
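As with a single-trial power calculation, the unadjusted RIS can be sketched with the standard two-group sample-size formula, giving total participants across both arms with 1:1 allocation. For a continuous outcome this sketch assumes a known standard deviation; for a binary outcome it assumes the anticipated effect is expressed as a relative risk reduction applied to an assumed control-group event proportion. Function names and inputs are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, sd 1


def _z_sum(alpha, power):
    """z_{1-alpha/2} + z_{1-beta} for a two-sided test."""
    return _STD_NORMAL.inv_cdf(1 - alpha / 2) + _STD_NORMAL.inv_cdf(power)


def ris_continuous(delta, sd, alpha=0.05, power=0.80):
    """Total participants (both arms, 1:1) to detect a mean difference delta."""
    return 4 * _z_sum(alpha, power) ** 2 * sd ** 2 / delta ** 2


def ris_binary(p_control, rrr, alpha=0.05, power=0.80):
    """Total participants (both arms, 1:1) to detect a relative risk
    reduction rrr, given the control-group event proportion p_control."""
    p_intervention = p_control * (1 - rrr)
    p_bar = (p_control + p_intervention) / 2  # average event proportion
    risk_difference = p_control - p_intervention
    return 4 * _z_sum(alpha, power) ** 2 * p_bar * (1 - p_bar) / risk_difference ** 2


# Hypothetical inputs; round up to whole participants.
print(math.ceil(ris_continuous(delta=0.5, sd=1.0)))     # 126
print(math.ceil(ris_binary(p_control=0.20, rrr=0.25)))  # 1814
```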
When between-study heterogeneity exists, each participant contributes less usable information. The diversity measure (D-squared) quantifies this information loss, and the adjusted RIS equals the base RIS divided by (1 minus D-squared), often substantially increasing the required sample.
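One common way to estimate D-squared is D² = (V_R − V_F) / V_R, where V_F and V_R are the variances of the fixed-effect and random-effects pooled estimates. The sketch below assumes a DerSimonian-Laird estimate of tau-squared and then applies the adjustment RIS / (1 − D²). It is an illustration under those assumptions, not necessarily the estimator the tool uses internally.

```python
def diversity_d2(effects, ses):
    """Estimate D-squared = (V_R - V_F) / V_R, where V_F and V_R are the
    variances of the fixed-effect and DerSimonian-Laird random-effects
    pooled estimates."""
    if len(effects) != len(ses) or len(effects) < 2:
        raise ValueError("need at least two studies with matching SEs")
    w = [1.0 / se ** 2 for se in ses]  # fixed-effect weights
    sum_w = sum(w)
    pooled_fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - pooled_fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)  # DerSimonian-Laird tau-squared
    v_fixed = 1.0 / sum_w
    v_random = 1.0 / sum(1.0 / (se ** 2 + tau2) for se in ses)
    return (v_random - v_fixed) / v_random


def adjusted_ris(ris, d2):
    """Heterogeneity-adjusted RIS = RIS / (1 - D-squared)."""
    if not 0.0 <= d2 < 1.0:
        raise ValueError("D-squared must be in [0, 1)")
    return ris / (1.0 - d2)


# Hypothetical log risk ratios and standard errors.
d2 = diversity_d2([-0.40, -0.10, -0.25, 0.05], [0.15, 0.20, 0.12, 0.18])
print(round(d2, 2), round(adjusted_ris(1814, d2)))
```

When the studies are homogeneous, tau-squared is zero, D-squared is zero, and the RIS is unchanged; as D-squared approaches 1 the adjusted RIS grows without bound.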
When the cumulative Z-curve crosses the O'Brien-Fleming monitoring boundary, the evidence is considered conclusive at the pre-specified alpha level, accounting for all previous looks at the data. No further trials are strictly needed for this outcome.
If the Z-curve enters the inner futility zone, continuing to accrue data is unlikely to yield a statistically significant result. This information helps research funders and ethics committees decide whether additional trials on the same question are justified.
The GRADE Working Group recommends comparing the cumulative sample size to the Optimal Information Size (equivalent to RIS) when rating imprecision. If total participants fall below the RIS, evidence certainty should be downgraded regardless of statistical significance.
When the cumulative Z-curve has not crossed any boundary and has not reached the Required Information Size, the meta-analysis cannot make definitive claims. More data from future trials is needed before firm conclusions can be drawn about the intervention effect.
TSA was designed for prospective application (deciding whether new trials are needed). When applied retrospectively to completed meta-analyses, the boundaries were not pre-specified, so results should be interpreted as exploratory evidence about adequacy rather than formal stopping rules.
Trial sequential analysis was introduced by Wetterslev et al. (2008) and further developed by Thorlund et al. (2011) to evaluate whether a cumulative meta-analysis has accrued sufficient evidence to draw firm conclusions. The core insight is that each time a new trial is added and the p-value recalculated, the cumulative type I error inflates well beyond the nominal 5% level. Trial sequential analysis imports principles from group sequential monitoring of clinical trials into evidence synthesis, providing O'Brien-Fleming alpha-spending boundaries that preserve the pre-specified error rate regardless of how many updates occur. The Required Information Size derived from trial sequential analysis also gives a quantitative basis for the Optimal Information Size criterion that GRADE uses when rating imprecision.
The Required Information Size parallels the sample size calculation for a single randomized trial. It depends on four parameters: the anticipated effect size (delta), the type I error rate (alpha), the desired statistical power (1 minus beta), and the between-study heterogeneity. When heterogeneity exists (D-squared greater than 0), the effective information from each participant is reduced, and the RIS must be inflated by dividing by (1 minus D-squared). A meta-analysis that has not reached the RIS remains potentially underpowered, even if the conventional p-value is below 0.05. Our power analysis calculator provides the foundational mathematics behind sample size determination, which trial sequential analysis extends to the multi-study context.
The cumulative Z-curve is the central visual element of the TSA diagram. It plots how the standardized test statistic evolves as each study is added chronologically. The shape of this curve reveals whether evidence is accumulating steadily toward conclusiveness or oscillating without convergence. A Z-curve that rises steeply with early studies but flattens as more data arrives suggests that initial enthusiasm may have been driven by small-study effects or publication bias, which can be formally investigated using our funnel plot and publication bias tool.
The alpha spending function determines how the overall type I error budget is distributed across sequential looks at the data. The O'Brien-Fleming approach is conservative early (requiring very strong evidence to cross the boundary when few participants have been accrued) and becomes progressively easier to cross as the information fraction approaches 1.0. Other spending functions, such as the Pocock-type function within the Lan-DeMets framework, distribute alpha more evenly across looks, at the cost of a stricter boundary and less power at the final analysis. The choice of spending function should be pre-specified and justified based on the research context.
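The O'Brien-Fleming-type spending function of Lan and DeMets, alpha(t) = 2 − 2Φ(z_{1−α/2} / √t), shows how little alpha is spent at small information fractions t. This sketch also computes the simple boundary shape z_{1−α/2} / √t. That approximation is an assumption made for illustration: it is exact at the first look, but exact Lan-DeMets boundaries at later looks require recursive numerical integration and are slightly higher.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def obf_alpha_spent(t, alpha=0.05):
    """Cumulative two-sided alpha spent by information fraction t under the
    Lan-DeMets O'Brien-Fleming-type spending function
    alpha(t) = 2 - 2 * Phi(z_{1-alpha/2} / sqrt(t))."""
    if not 0.0 < t <= 1.0:
        raise ValueError("information fraction must be in (0, 1]")
    z = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return 2.0 - 2.0 * _STD_NORMAL.cdf(z / math.sqrt(t))


def obf_boundary_approx(t, alpha=0.05):
    """Approximate monitoring boundary z_{1-alpha/2} / sqrt(t).

    Exact at the first look; at later looks the exact Lan-DeMets boundary
    (found by recursive numerical integration) is slightly higher.
    """
    if not 0.0 < t <= 1.0:
        raise ValueError("information fraction must be in (0, 1]")
    return _STD_NORMAL.inv_cdf(1 - alpha / 2) / math.sqrt(t)


for t in (0.25, 0.5, 0.75, 1.0):
    print(t, round(obf_boundary_approx(t), 2), f"{obf_alpha_spent(t):.5f}")
```

At a quarter of the RIS the boundary sits near Z = 3.92, and it falls back to the conventional 1.96 once the full information size is reached.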
The futility boundary (also called the inner boundary) addresses the complementary question: when should researchers conclude that the treatment is unlikely to show a meaningful benefit even if more trials are conducted? In TSA these inner boundaries are usually constructed with a beta-spending function, so crossing them indicates that an effect as large as the pre-specified delta can be ruled out at the chosen power. This makes further investment in new trials ethically and economically questionable. This information is particularly valuable for research funders, systematic review update authors, and clinical guideline panels considering whether to commission new trials.
Interpreting the TSA diagram requires understanding three possible outcomes: the Z-curve crossing the monitoring boundary (conclusive evidence), entering the futility zone (further trials unlikely to change the conclusion), or remaining between boundaries without reaching the RIS (inconclusive, more data needed). Combining trial sequential analysis with a forest plot showing individual study contributions and a sensitivity analysis identifying influential studies provides a complete picture of evidence stability and adequacy.
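The three outcomes can be written as a simple decision rule for the most recent point on the Z-curve. This hypothetical helper assumes the boundary values at the current information fraction have already been computed, that positive Z favours the intervention, and that the harm and futility boundaries mirror the benefit boundaries around zero. A futility boundary that is still negative early in accrual never triggers.

```python
def classify_tsa(z, info_fraction, monitoring_boundary, futility_boundary):
    """Hypothetical helper: classify the latest point on the Z-curve.

    monitoring_boundary and futility_boundary are the boundary values at the
    current information fraction (computed elsewhere). Positive Z is assumed
    to favour the intervention; harm is taken as the mirror image.
    """
    if z >= monitoring_boundary:
        return "conclusive: benefit"
    if z <= -monitoring_boundary:
        return "conclusive: harm"
    if abs(z) < futility_boundary:
        return "futility: further trials unlikely to change the conclusion"
    if info_fraction >= 1.0:
        return "RIS reached without crossing a monitoring boundary"
    return "inconclusive: more data needed"


print(classify_tsa(z=1.6, info_fraction=0.55,
                   monitoring_boundary=2.64, futility_boundary=0.4))
```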
When reporting trial sequential analysis in publications, authors should specify all pre-specified parameters (alpha, beta, delta, heterogeneity adjustment method), present the TSA diagram with clearly labeled boundaries, and state whether the analysis was prospective or retrospective. Wetterslev et al. (2008) and Thorlund et al. (2011) provide detailed reporting guidance. For meta-analyses that remain inconclusive after trial sequential analysis, quantify the additional information fraction needed and estimate how many participants in future trials would satisfy the Required Information Size, using our power analysis calculator for individual trial sample size planning.
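For an inconclusive analysis, the information fraction and the remaining shortfall follow directly from the accrued sample and the RIS. This sketch assumes the RIS (already heterogeneity-adjusted if applicable) stays fixed as new trials arrive; in practice new trials can change D-squared and therefore the adjusted RIS. The planned trial size is a hypothetical input that would come from an individual trial power calculation.

```python
import math


def information_gap(accrued_n, ris):
    """Return (information fraction, additional participants still needed)."""
    if ris <= 0:
        raise ValueError("RIS must be positive")
    fraction = accrued_n / ris
    additional = max(0, math.ceil(ris - accrued_n))
    return fraction, additional


def trials_to_close_gap(additional, planned_trial_size):
    """Number of future trials of a given (hypothetical) size needed."""
    return math.ceil(additional / planned_trial_size)


fraction, additional = information_gap(accrued_n=900, ris=2000)
print(f"{fraction:.0%} of RIS accrued; {additional} more participants needed")
print(trials_to_close_gap(additional, planned_trial_size=300), "trials of 300")
```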
Trial Sequential Analysis (TSA) is a methodology that applies formal monitoring boundaries to cumulative meta-analysis, analogous to interim analysis in a single randomized trial. Standard meta-analysis calculates a pooled estimate each time a new study is added, but this repeated testing inflates the risk of false positive findings (type I error). TSA addresses this by computing a Required Information Size and applying alpha-spending boundaries (such as O'Brien-Fleming) that maintain the overall type I error rate. It was developed by Wetterslev, Thorlund, and colleagues at the Copenhagen Trial Unit, and its Required Information Size corresponds to the Optimal Information Size that GRADE uses when assessing imprecision in evidence synthesis.
The Required Information Size is the meta-analytic analogue of a sample size calculation for a single trial. It represents the total number of participants that must be accrued across all trials for the meta-analysis to have adequate power to detect (or rule out) a specified effect size. The RIS depends on the expected effect size (delta), the type I error rate (alpha), the desired power (1 minus beta), and the heterogeneity among included studies. When heterogeneity is present, the RIS is adjusted upward using the diversity (D-squared) measure: adjusted RIS = RIS / (1 minus D-squared). A meta-analysis that has not yet reached the RIS is considered potentially underpowered, regardless of whether the conventional p-value is below 0.05.
The TSA diagram plots cumulative participants (x-axis) against the cumulative Z-statistic (y-axis). The blue Z-curve shows how the evidence accumulates as each study is added chronologically. The red dashed lines represent the O'Brien-Fleming monitoring boundaries for benefit (upper) and harm (lower). If the Z-curve crosses the upper boundary, you have firm evidence of benefit that is not due to repeated testing. If it crosses the lower boundary, there is evidence of harm. The green dashed inner boundaries represent futility: if the Z-curve remains inside this zone, further studies are unlikely to achieve significance. The vertical dashed line marks the Required Information Size. A Z-curve that stays between the monitoring and futility boundaries without reaching the RIS means the evidence remains inconclusive.
GRADE evaluates imprecision as one of five domains that can lead to downgrading the certainty of evidence. The GRADE handbook recommends considering imprecision not only based on the width of the confidence interval relative to the null, but also on whether the cumulative sample size meets the Optimal Information Size (OIS), which is conceptually equivalent to the Required Information Size in TSA. If the total number of participants in the meta-analysis is less than the RIS (or OIS), GRADE suggests downgrading for imprecision even if the pooled result is statistically significant. TSA formalizes and extends this logic by applying alpha-spending boundaries, making the assessment more rigorous than simply comparing cumulative N to a threshold.
TSA is most valuable when: (1) a meta-analysis finds statistical significance early with few studies and limited participants, raising concern about false positive results from sparse data and repeated testing; (2) you want to determine whether the current evidence base is large enough to draw firm conclusions; (3) you need to estimate how many more participants or trials are needed to confirm or refute an effect; (4) you are assessing GRADE imprecision and want a quantitative framework for the Optimal Information Size criterion. TSA is less useful when the meta-analysis includes dozens of large trials that clearly exceed any reasonable information size threshold, because in that scenario the boundaries are trivially satisfied.
TSA has several limitations. First, the Required Information Size depends on assumptions about the true effect size (delta) and variance, which may not be known precisely. Second, TSA assumes that the chronological order of trials is unrelated to their effect sizes, which may not hold in practice. Third, the method is based on fixed-effect or simple random-effects models and does not account for complex sources of bias or confounding. Fourth, the futility boundaries are constructed for the pre-specified delta, so they may signal futility when the true effect is smaller than delta but still clinically meaningful. Fifth, as with any sequential method, retrospective application (re-analysis of completed meta-analyses) should be interpreted cautiously because stopping rules were not pre-specified.
Reviewed by
Dr. Sarah Mitchell holds a PhD in Biostatistics from Johns Hopkins Bloomberg School of Public Health and has over 15 years of experience in systematic review methodology and meta-analysis. She has authored or co-authored 40+ peer-reviewed publications in journals including the Journal of Clinical Epidemiology, BMC Medical Research Methodology, and Research Synthesis Methods. A former Cochrane Review Group statistician and current editorial board member of Systematic Reviews, Dr. Mitchell has supervised 200+ evidence synthesis projects across clinical medicine, public health, and social sciences. She reviews all Research Gold tools to ensure statistical accuracy and compliance with Cochrane Handbook and PRISMA 2020 standards.