Sequential analysis of experimental data from A/B tests has been quite prominent in recent years due to the myriad of Bayesian solutions offered by big industry players. However, this type of sequential analysis is not sequential testing proper, as these solutions have generally abandoned the idea of testing, and therefore of error control, substituting it with what amounts to an ersatz decision-making machine (see Bayesian vs Frequentist Inference for more on this).

Frequentist sequential testing, on the other hand, is becoming more popular by the day, and the reasons are twofold. On the one hand, CRO experts, product managers, growth experts, and analysts are becoming more aware of the adverse impact of the misuse of significance tests and confidence intervals that we call peeking. In short, peeking with intent to stop breaks the validity of both risk estimates and effect size estimates, and largely defeats the very purpose of A/B testing. On the other hand, being able to analyze test data as it gathers and to act on it swiftly has many benefits, as long as one can maintain the desired error control throughout the process.

Sequential testing is based on a concept called error spending, which is what allows us to make statistical evaluations of the data continuously or at certain intervals (predefined or not) while retaining the error guarantees you would expect from a frequentist method. See the sequential testing entry of our glossary and the articles linked there for more details on both the benefits and the drawbacks of sequential hypothesis testing.

In this article I will try to explain error spending in an accessible manner, without going into the mathematical details. To answer this and other questions I will make ample use of simulations and visual aids. These are covered to a greater extent in chapter 10 of my 2019 book "Statistical Methods in Online A/B Testing" and the cited literature.

---

Suppose we perform a hypothesis test from a random sample $(x_i)_{i=1}^n$ with $n=10$:

- Otherwise, use the predictive distribution to evaluate the new sample size required to get $80\%$ predictive power to reject $H_0$.
- Then collect the new sample and perform the test.

Adopting this methodology, what about the probability of rejecting $H_0$ as a function of $\theta$? It would be easy to explore such questions using simulations, but do there exist any theoretical results about them?

Here is an example of using the Bayesian predictive distribution for planning a new experiment: Sample size determination for a Gaussian mean. It is not really related to the power of a hypothesis test, but the approach is of the same spirit as the question of the OP: the goal is to guarantee a given probability of success for a certain event, similarly to the OP's question in the context of hypothesis testing. Here the observation of interest is a sample standard deviation, and the Bayesian predictive distribution of this standard deviation enjoys the "frequentist-matching property": Bayesian $100p\%$-prediction intervals coincide with frequentist $100p\%$-prediction intervals. The conclusion is that the Bayesian prediction approach (with an appropriate noninformative prior) does indeed control the frequentist probability of success. In the context of the OP, the same result should occur if this predictive "frequentist-matching property" holds when the observation of interest is the test statistic.
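The question above does indeed invite simulation. Below is a minimal sketch of the described procedure for a one-sided z-test on a Gaussian mean with known variance, using a flat prior on $\theta$; the $5\%$ test level, the cap on the new sample size, and the choice to base the final test on the new sample alone are my own illustrative assumptions, not part of the original question.

```python
import random
from statistics import NormalDist

Z = NormalDist()                 # standard normal distribution
rng = random.Random(7)

ALPHA = 0.05                     # one-sided level for H0: theta <= 0 (assumption)
SIGMA = 1.0                      # known standard deviation (assumption)
N_INIT = 10                      # initial sample size, as in the question
TARGET = 0.80                    # desired predictive power, as in the question
M_MAX = 400                      # cap on the new sample size (assumption)
Z_CRIT = Z.inv_cdf(1 - ALPHA)

def predictive_power(xbar, m):
    """P(the new-sample z-test rejects H0) under the posterior predictive
    distribution of the new sample mean, given a flat prior on theta."""
    crit = Z_CRIT * SIGMA / m ** 0.5                 # rejection threshold for the new mean
    pred_sd = SIGMA * (1 / N_INIT + 1 / m) ** 0.5    # predictive sd of the new mean
    return 1 - Z.cdf((crit - xbar) / pred_sd)

def one_run(theta):
    # Initial sample mean from n=10 observations, simulated directly.
    xbar = rng.gauss(theta, SIGMA / N_INIT ** 0.5)
    # Smallest m (up to M_MAX) reaching the target predictive power.
    m = next((m for m in range(1, M_MAX + 1)
              if predictive_power(xbar, m) >= TARGET), M_MAX)
    new_mean = rng.gauss(theta, SIGMA / m ** 0.5)    # collect the new sample
    return new_mean > Z_CRIT * SIGMA / m ** 0.5      # reject H0?

def rejection_rate(theta, reps=2000):
    return sum(one_run(theta) for _ in range(reps)) / reps

for theta in (0.0, 0.5):
    print(f"theta={theta:.1f}  P(reject H0) ~= {rejection_rate(theta):.3f}")
```

Note that under $H_0$ ($\theta = 0$) the rejection rate stays near the nominal $5\%$ here, because the new-sample test level does not depend on how $m$ was chosen; for $\theta > 0$ the rejection rate reflects the (random) sample sizes the predictive-power rule selects.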
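The impact of peeking with intent to stop, described in the article above, can also be demonstrated with a short A/A simulation. This is only a sketch, not the article's own code: the peek schedule, sample sizes, and the approximate 10-look Pocock constant ($\approx 2.555$) are illustrative assumptions.

```python
import random
from statistics import NormalDist

Z = NormalDist()                 # standard normal distribution
rng = random.Random(42)

ALPHA = 0.05                     # nominal two-sided significance level
PEEKS = 10                       # number of interim looks (assumption)
N_PER_PEEK = 100                 # observations added between looks (assumption)

def aa_test(threshold):
    """One A/A test (no true effect) with repeated looks: returns True if
    any look crosses the given two-sided z threshold (a false positive)."""
    total, n = 0.0, 0
    for _ in range(PEEKS):
        # Sum of N_PER_PEEK iid N(0,1) observations, simulated directly.
        total += rng.gauss(0.0, N_PER_PEEK ** 0.5)
        n += N_PER_PEEK
        if abs(total / n ** 0.5) > threshold:   # z-statistic vs. threshold
            return True                          # "significant" -> stop early
    return False

def false_positive_rate(threshold, reps=2000):
    return sum(aa_test(threshold) for _ in range(reps)) / reps

naive = Z.inv_cdf(1 - ALPHA / 2)   # 1.96: fixed-sample threshold used at every look
pocock = 2.555                     # approx. Pocock constant for 10 looks (assumption)

fpr_naive = false_positive_rate(naive)
fpr_pocock = false_positive_rate(pocock)
print(f"naive peeking FPR:  {fpr_naive:.3f} (nominal {ALPHA})")
print(f"Pocock-style FPR:   {fpr_pocock:.3f}")
```

With ten looks at the naive fixed-sample threshold, the realized false positive rate comes out at close to four times the nominal $5\%$, while the single stricter threshold, a simple example of spending the error budget across looks, keeps it near the nominal level.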