After having followed the course, the student
- is able to investigate general research questions by means of statistical testing, i.e., they can define null and alternative hypotheses in a formal way, perform a suitable test and interpret the outcome (in favour of or against the research question).
- is able to identify sources of multiplicity in a given test problem.
- can quantify the inflation of the type I error in an uncorrected multiple testing problem.
- is capable of adjusting a given testing problem to accommodate multiplicity such that
  - the FWER is controlled at a given level.
  - the FDR is controlled at a given level.
- can compare the outcomes of the different multiplicity adjustments using procedures proposed by Bonferroni, Holm, Simes and Benjamini/Hochberg.
- can quantify the loss of power (increase of the type II error) of a given corrected multiple testing problem.
- is able to implement all procedures discussed in the course using R.
Modern scientific applications often confront statisticians with thousands of hypothesis testing problems at the same time. As an example, a typical microarray experiment for monitoring gene expression levels consists of testing thousands of genes simultaneously and identifying those genes that are differentially expressed along the process. After having tested 10000 genes, say, by means of a standard statistical significance test with a significance level of 0.05, we must expect as many as 500 falsely identified genes (10000 × 0.05, in the case where no gene is in fact differentially expressed). In general, if we perform N independent hypothesis tests at a fixed level and all null hypotheses are true, the probability of making at least one false rejection, i.e., a type I error, converges to one exponentially fast in N. In experiments and studies like the above, it is, therefore, necessary to understand the effects of multiple testing and to know how to guard against making 'too many' false decisions. The challenge is to design procedures that control the type I error simultaneously over all tests while retaining as much power as possible.
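The two figures quoted above can be checked with a short calculation. The sketch below is written in Python purely for illustration (the course itself works in R). It assumes that all N null hypotheses are true and that the tests are independent; the function names are hypothetical and chosen only for this example.

```python
def expected_false_rejections(n_tests: int, alpha: float) -> float:
    """Expected number of type I errors when all n_tests null hypotheses are true."""
    return n_tests * alpha


def prob_at_least_one_false_rejection(n_tests: int, alpha: float) -> float:
    """Probability of at least one false rejection among n_tests independent,
    uncorrected level-alpha tests, assuming every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    # Microarray example from the text: 10000 genes tested at level 0.05.
    print(expected_false_rejections(10_000, 0.05))
    # Inflation of the type I error as the number of tests grows.
    for n in (1, 10, 100, 1000):
        print(n, prob_at_least_one_false_rejection(n, 0.05))
```

Under these assumptions the probability of at least one false rejection is already about 0.40 for 10 tests and exceeds 0.99 for 100 tests.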
At the beginning of this course we briefly recall the concepts of statistical significance testing. In particular, we review several specific examples of statistical tests that are often used for data analysis in practice. Subsequently, we address the problem of multiplicity and the associated inflation of the type I error and discuss the two most popular strategies of overall error control in multiple inference: control of the family-wise error rate (FWER) and control of the false discovery rate (FDR), where we focus on the procedures by Bonferroni, Simes, Holm and Benjamini/Hochberg. Finally, we discuss the link between popular statistical procedures, such as binary classification or model selection, and multiple testing problems.
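To give a first impression of how the four procedures named above differ, the following sketch computes adjusted p-values in their standard textbook form. It is an illustration in Python and not the course's own material, and the function names and the example p-values are made up for this purpose. In R, the built-in function p.adjust offers the methods "bonferroni", "holm" and "BH" for the same adjustments.

```python
from typing import List


def bonferroni(pvalues: List[float]) -> List[float]:
    """Bonferroni-adjusted p-values: multiply each p-value by m, cap at 1 (FWER)."""
    m = len(pvalues)
    return [min(1.0, m * p) for p in pvalues]


def holm(pvalues: List[float]) -> List[float]:
    """Holm step-down adjusted p-values (FWER control)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * pvalues[idx])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini/Hochberg step-up adjusted p-values (FDR control)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m - 1, -1, -1):  # walk from the largest p-value down
        idx = order[rank]
        candidate = min(1.0, m * pvalues[idx] / (rank + 1))
        running_min = min(running_min, candidate)  # enforce monotonicity
        adjusted[idx] = running_min
    return adjusted


def simes_global_pvalue(pvalues: List[float]) -> float:
    """Simes p-value for the global null hypothesis that all nulls are true."""
    m = len(pvalues)
    return min(m * p / (k + 1) for k, p in enumerate(sorted(pvalues)))


if __name__ == "__main__":
    example = [0.01, 0.04, 0.03, 0.005]  # made-up raw p-values
    print("Bonferroni:", bonferroni(example))
    print("Holm:      ", holm(example))
    print("BH:        ", benjamini_hochberg(example))
    print("Simes:     ", simes_global_pvalue(example))
```

A hypothesis is rejected at level alpha when its adjusted p-value is at most alpha. For the made-up p-values above and alpha = 0.05, Bonferroni and Holm each reject two hypotheses, whereas Benjamini/Hochberg rejects all four, which reflects the weaker error criterion (FDR instead of FWER).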
The course is accompanied by tutorials in which we mostly deal with practical issues related to multiple testing. A particular focus will be on data analysis using packages and routines for the software environment R for statistical computing. In particular, the packages multcomp and multtest are introduced, which cover all multiple tests that will be discussed in this course. In each tutorial, we will focus on a specific data set and a specific research question to apply the procedures discussed in class.
This course is suitable for students who have successfully completed the module B-AM M5 Statistics and Analysis, that is, the students should have solid knowledge of basic statistical and probabilistic concepts such as expectation, mean, variance, discrete and continuous probability distributions, hypothesis testing and confidence intervals. Moreover, basic knowledge of R is recommended.
Bachelor Applied Mathematics
Required material
Lecture notes. The lecture notes contain all relevant procedures, examples and theoretical results. Proofs of the theoretical results will be provided as sketches only. Further details will be provided in the lectures.
The lecture notes will be based on the monograph by Dickhaus (listed below) as well as on some seminal papers in the field, a list of which will be provided in the lecture notes.
Thorsten Dickhaus. Simultaneous statistical inference. Springer, Heidelberg, 2014. ISBN-10: 3662510065
Written exam and assignments
Remark: Written exam with open questions at the end of the course.