After this course, the student is able to:
- Understand and apply several types of advanced statistical learning models.
- Apply different methods to build and validate a statistical learning model, including variable selection.
- Deal with missing data using appropriate methods.
- Critically appraise the outcomes of such statistical models for health care or other application fields.
- Use R and RStudio to apply statistical learning methods.
- Apply the acquired knowledge to a real-life practical problem.
The amount of data available in health care, for example, has grown exponentially over the last two decades. Moreover, the diversity of digital health data has increased with the rise of genomics, proteomics, diagnostics and imaging, lab tests, and wearables. The field of statistical learning is seen as a promise for better risk stratification, personalized medicine, and health care optimization. In other fields, too, the amount and variety of available data is rising.
Statistical learning is the combination of machine learning with statistics. Like machine learning, it studies computer algorithms that learn from data with the goal of making predictions, and it applies the same algorithms. In addition, it involves statistical models and the assessment of their uncertainty. Statistical learning is about automatically learning the structure, patterns, and regularities in complicated data, and using these patterns to predict future data. Although statistical learning is applied in all kinds of fields, we focus primarily on health science; the methodology learned in this course can easily be used in other fields.
In this course you will learn how to derive sound statistical models, including variable selection and dealing with missing data, and how to assess their quality and goodness of fit using the statistical programming language R. In the first week we will start with a brief introduction to R and RStudio. In the following weeks, you will learn the theory behind the models through self-study and online material. Every week there will be two meetings of two hours each, where you can work under supervision on the lab assignments and ask questions about the theory. After that you will work on a health-related project, individually or with a fellow student, where the aim is to apply the models that you have learnt. For this project you need to write a short report. The final assessment will be an individual assignment on the theory and/or application of statistical learning. The focus of the course is not only on applications in health care.
Main topics in the course will be:
- Supervised learning with regression and/or classification and tree-based statistical models
- Model selection with resampling methods (cross-validation, bootstrap) and regularization (ridge, lasso)
- Variable selection methods
- Unsupervised learning (principal components analysis and clustering methods)
- Methods for dealing with missing data