Course module: 202001377
Simultaneous Statistical Inference
Course info
Course module: 202001377
Credits (ECTS): 5
Course type: Study Unit
Language of instruction: English
Contact person: dr. K. Proksch
E-mail: k.proksch@utwente.nl
Lecturer(s)
Lecturer: dr. T. Akkaya
Examiner: dr. K. Proksch
Contact person for the course: dr. K. Proksch
Lecturer: dr. J.B. Timmer
Academic year: 2022
Starting block: 2A
Remarks: Elective of module 11 B-AM
Application procedure: You apply via OSIRIS Student
Registration using OSIRIS: Yes
Aims
After having followed the course, the student
  • is able to investigate general research questions by means of statistical testing, i.e., he/she can define null and alternative hypotheses in a formal way, perform a suitable test and interpret the outcome (in favour of or against the research question).
  • is able to identify sources of multiplicity in a given test problem.
  • can quantify the inflation of the type I error in an uncorrected multiple testing problem.
  • is capable of adjusting a given testing problem to accommodate multiplicity such that
      1. the FWER is controlled at a given level, or
      2. the FDR is controlled at a given level.
  • can compare the outcomes of the different multiplicity adjustments using procedures proposed by Bonferroni, Holm, Simes and Benjamini/Hochberg.
  • can quantify the loss of power (increase of the type II error) of a given corrected multiple testing problem.
  • is able to implement all procedures discussed in the course using R.
Content
Modern scientific applications often confront statisticians with thousands of hypothesis testing problems at the same time. As an example, a typical microarray experiment for monitoring gene expression levels consists of testing thousands of genes simultaneously and identifying those genes that are differentially expressed along the process. After having tested 10000 genes, say, by means of a standard statistical significance test at significance level 0.05, we must expect to have falsely identified as many as 500 genes: if no gene is in fact differentially expressed, the expected number of false rejections is 10000 × 0.05 = 500. In general, if we perform N independent hypothesis tests at level α and all null hypotheses are true, the probability of making at least one false rejection, i.e., a type I error, equals 1 − (1 − α)^N, which converges to one exponentially fast in N. In experiments and studies like the above, it is, therefore, necessary to understand the effects of multiple testing and to know how to guard against making 'too many' false decisions. The challenge is to design procedures that control the type I error simultaneously over all tests while retaining as much power as possible.
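The inflation described above can be checked numerically. The following is a minimal sketch, not course material: it assumes N independent tests, each carried out at the same level alpha, with all null hypotheses true. The function names are hypothetical, and Python is used here only for illustration (the course itself works in R).

```python
def fwer_uncorrected(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false rejection among n_tests
    independent level-alpha tests when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


def expected_false_rejections(n_tests: int, alpha: float = 0.05) -> float:
    """Expected number of false rejections under the same assumptions."""
    return n_tests * alpha


if __name__ == "__main__":
    # Tabulate how quickly the uncorrected error probability approaches one.
    for n in (1, 5, 10, 50, 100, 10000):
        print(
            f"N = {n:>5}: "
            f"P(at least one type I error) = {fwer_uncorrected(n):.4f}, "
            f"expected false rejections = {expected_false_rejections(n):.1f}"
        )
```

Already for N = 100 the uncorrected probability of at least one type I error exceeds 0.99, which is why the per-test level has to be adjusted.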
At the beginning of this course we briefly recall the concepts of statistical significance testing. In particular, we review several specific examples of statistical tests that are often used for data analysis in practice. Subsequently, we address the problem of multiplicity and the associated inflation of the type I error and discuss the two most popular strategies of overall error control in multiple inference: control of the family-wise error rate (FWER) and control of the false discovery rate (FDR), where we focus on the procedures by Bonferroni, Simes, Holm and Benjamini/Hochberg. Finally, we discuss the link between popular statistical procedures, such as binary classification or model selection, and multiple testing problems.
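To make the four named procedures concrete, here is a minimal sketch of their standard textbook forms, written as adjusted p-values (reject hypothesis i when its adjusted p-value is at most alpha). It is an illustration under stated assumptions, not the course's own implementation: the function names and the example p-values are hypothetical, and Python is used only for illustration. In R, the built-in p.adjust function offers the "bonferroni", "holm" and "BH" methods.

```python
def bonferroni(pvals):
    """Bonferroni adjustment (controls the FWER): multiply each p-value by m."""
    m = len(pvals)
    return [min(1.0, m * p) for p in pvals]


def holm(pvals):
    """Holm step-down adjustment (controls the FWER).
    The k-th smallest p-value is multiplied by (m - k + 1) and the
    results are made non-decreasing in the sorted order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):  # rank = 0 for the smallest p-value
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


def benjamini_hochberg(pvals):
    """Benjamini/Hochberg step-up adjustment (controls the FDR for
    independent tests). The k-th smallest p-value is multiplied by m / k
    and the results are made non-decreasing, working from the largest down."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m - 1, -1, -1):
        i = order[rank]
        running_min = min(running_min, m * pvals[i] / (rank + 1))
        adjusted[i] = running_min
    return adjusted


def simes_pvalue(pvals):
    """Simes p-value for the global null hypothesis that all nulls are true:
    the minimum over k of m * p_(k) / k, with p_(k) the k-th smallest p-value."""
    m = len(pvals)
    return min(m * p / (k + 1) for k, p in enumerate(sorted(pvals)))


if __name__ == "__main__":
    p = [0.01, 0.04, 0.03, 0.005]  # made-up p-values, for illustration only
    print("Bonferroni:        ", bonferroni(p))
    print("Holm:              ", holm(p))
    print("Benjamini/Hochberg:", benjamini_hochberg(p))
    print("Simes (global):    ", simes_pvalue(p))
```

Comparing the three adjusted vectors on the same input shows the usual ordering: the Holm values never exceed the Bonferroni values, and the Benjamini/Hochberg values never exceed the Holm values, reflecting the weaker error criterion controlled by the FDR.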

The course is accompanied by tutorials in which we mostly deal with practical issues related to multiple testing. A particular focus will be on data analysis using packages and routines for R, the software environment for statistical computing. In particular, the packages multcomp and multtest are introduced, which cover all multiple testing procedures that will be discussed in this course. In each tutorial, we will focus on a specific data set and a specific research question to apply the procedures discussed in class.
Assumed previous knowledge
This course is suitable for students who have successfully participated in the module B-AM M5 Statistics and Analysis, that is, the students should have solid knowledge of basic statistical and probabilistic concepts such as expectation, mean, variance, discrete and continuous probability distributions, hypothesis testing and confidence intervals. Moreover, basic knowledge of R is recommended.
Module
Module 11
Participating study
Bachelor Applied Mathematics
Required materials
Course material
Lecture notes. The lecture notes contain all relevant procedures, examples and theoretical results. Proofs of the theoretical results will be provided as sketches only; further details will be given in the lectures. The lecture notes will be based on the monograph by Dickhaus (listed below) as well as on some seminal papers in the field, a list of which will be provided in the lecture notes.
Recommended materials
Book
Thorsten Dickhaus. Simultaneous statistical inference. Springer, Heidelberg, 2014. ISBN: 978-3-642-45182-9
Instructional modes
Lecture

Tutorial

Tests
Written exam and assignments

Remark
Written exam with open questions at the end of the course.
