Multiple Hypothesis Testing: How to Balance Power and False Positive Rate


In statistical analysis, multiple hypothesis testing is a common problem that arises when a researcher conducts several experiments or tests at once. The more hypotheses that are tested, the higher the probability of obtaining at least one false positive result. In this blog post, we will discuss the concept of multiple hypothesis testing and its implications, as well as methods for balancing the trade-off between the power of a test and the false positive rate.


When conducting a single hypothesis test, the null hypothesis is typically that there is no difference or relationship between the variables being studied, while the alternative hypothesis is that there is a difference or relationship. In a multiple hypothesis testing scenario, multiple null hypotheses are tested simultaneously. The false positive rate, also known as the type I error rate, is the probability of rejecting a null hypothesis when it is true. The power of a test, on the other hand, is the probability of correctly rejecting a null hypothesis when it is false.
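To make the two definitions concrete, here is a minimal sketch that computes the power of a single test at several significance levels. It assumes a one-sided z-test with known unit variance. The function name `z_test_power`, the effect size, and the sample size are illustrative choices, not taken from any particular study.

```python
from statistics import NormalDist


def z_test_power(effect_size, n, alpha):
    """Power of a one-sided z-test (assumed setting: known unit variance).

    effect_size: true mean shift in standard-deviation units (hypothetical value)
    n: sample size
    alpha: significance level, i.e. the false positive rate of this single test
    """
    std_normal = NormalDist()
    # Critical value: reject the null when the z statistic exceeds this.
    critical = std_normal.inv_cdf(1 - alpha)
    # Under the alternative, the z statistic is centred at effect_size * sqrt(n).
    shift = effect_size * n ** 0.5
    return 1 - std_normal.cdf(critical - shift)


# A stricter alpha means fewer false positives but also lower power.
for alpha in (0.05, 0.01, 0.0025):
    print(alpha, round(z_test_power(0.5, 30, alpha), 3))
```

When the effect size is zero the null hypothesis is true, and the function returns alpha itself: the false positive rate and the power are the same rejection probability evaluated under different truths.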

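The issue with multiple hypothesis testing is that the probability of at least one false positive, known as the family-wise error rate (FWER), grows with the number of hypotheses tested, even though each individual test still has a false positive rate of alpha. For example, with 20 independent tests at a significance level of 0.05, the probability of at least one false positive is 1 - 0.95^20, or about 64%. This is known as the multiple testing problem. To control it, researchers often use a technique called “multiple comparison correction,” which adjusts the significance level of each test, usually at some cost in power.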
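As a rough illustration of how quickly the problem grows, the sketch below computes the probability of at least one false positive across m tests. It assumes the tests are independent and that every null hypothesis is true. The function name `family_wise_error_rate` is a made-up helper for this post.

```python
def family_wise_error_rate(alpha, m):
    """Probability of at least one false positive among m tests.

    Assumes independent tests, each run at level alpha, with all nulls true.
    """
    # Chance that no test rejects is (1 - alpha) ** m; take the complement.
    return 1 - (1 - alpha) ** m


for m in (1, 5, 20, 100):
    print(m, round(family_wise_error_rate(0.05, m), 4))
```

With dependent tests the exact value differs, but the union bound still guarantees the rate is at most m times alpha, and that bound is what the Bonferroni correction relies on.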

Methods for Balancing Power and False Positive Rate

There are several methods that can be used to balance the trade-off between power and false positive rate in multiple hypothesis testing. Some of the most popular methods include:

  • Bonferroni correction: This method involves dividing the significance level (alpha) by the number of tests being conducted. For example, if a significance level of 0.05 is used and 20 tests are being conducted, the significance level for each test would be 0.05/20 = 0.0025.
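  • Holm-Bonferroni correction: This method is similar to Bonferroni correction, but it takes into account the order of the p-values. The smallest p-value is compared to the significance level divided by the number of tests, the second smallest p-value is compared to the significance level divided by the number of tests minus one, and so on. The procedure stops at the first p-value that exceeds its threshold, and that hypothesis and all those with larger p-values are not rejected. It controls the family-wise error rate like Bonferroni correction, but is never less powerful.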
  • False Discovery Rate (FDR) controlling procedures: Instead of controlling the family-wise error rate, FDR controlling procedures control the expected proportion of false discoveries among the rejected hypotheses.
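  • Benjamini-Hochberg procedure: This is the most widely used FDR controlling procedure. It controls the FDR at a predefined level, for example q = 0.05. The p-values are sorted from smallest to largest, the largest rank k is found for which the k-th smallest p-value is at most (k / number of tests) × q, and the hypotheses with the k smallest p-values are rejected.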
  • Bayesian Model Averaging (BMA): This approach uses Bayesian statistics to calculate the posterior probability of each hypothesis (or model) being true, taking into account the prior probabilities and the data. Unlike the corrections above, it does not adjust a significance level.
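The three p-value corrections in the list above can be sketched in a few lines of plain Python. This is an illustrative implementation, not a reference one: the function names and the example p-values are made up for this post, and each function returns a list of reject/keep decisions in the original order of the input.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject every hypothesis whose p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm(p_values, alpha=0.05):
    """Holm-Bonferroni step-down procedure (controls the family-wise error rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):  # rank 0 is the smallest p-value
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # stop at the first failure; larger p-values are kept
    return reject


def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure (controls the FDR at level q)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            cutoff = rank  # remember the largest rank that passes
    reject = [False] * m
    for i in order[:cutoff]:
        reject[i] = True
    return reject


example = [0.001, 0.008, 0.012, 0.039, 0.20]  # hypothetical p-values
print("Bonferroni:", bonferroni(example))
print("Holm:      ", holm(example))
print("BH:        ", benjamini_hochberg(example))
```

On these example p-values the three procedures reject progressively more hypotheses, which mirrors the trade-off: controlling the FDR is a weaker guarantee than controlling the family-wise error rate, and it buys power in return. In practice a tested library routine such as `multipletests` in statsmodels is likely a better choice than hand-rolled code.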

Multiple hypothesis testing is a common problem in statistical analysis that can lead to an increased false positive rate. However, by using methods such as Bonferroni correction, Holm-Bonferroni correction, False Discovery Rate (FDR) controlling procedures and BMA, researchers can balance the trade-off between power and false positive rate. It is important to note that each method has its own assumptions and limitations, and the choice of method will depend on the specific research question and study design.

  1. Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological), 289-300.
  2. Bonferroni, C. E. (1936). Teoria statistica delle classi e calcolo delle probabilità. Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commerciali di Firenze, 3-62.
  3. Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 65-70.
  4. Storey, J. D. (2002). A direct approach to false discovery rates. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 479-498.
  5. Westfall, P., & Young, S. (1993). Resampling-based multiple testing: Examples and methods for p-value adjustment. Wiley.
  6. Hoeting, J. A., Madigan, D., Raftery, A. E., & Volinsky, C. T. (1999). Bayesian model averaging: a tutorial (No. 1999-22). Technical report, Department of Statistics, University of Washington.