After completing this course, the student is able to:
- explain the different dimensions of (value-based) reinforcement learning algorithms,
- explain the ODE (ordinary differential equation) approach to show convergence for stochastic approximation schemes,
- model a (real-world) sequential decision-making problem (SDMP) as a Markov decision process,
- choose a suitable reinforcement learning algorithm to solve the SDMP, which may include the design of an appropriate approximation framework,
- implement and test reinforcement learning algorithms using a modern software package,
- analyze a given reinforcement learning algorithm with respect to stability, convergence, and optimality,
- analyze, critically evaluate and explain a scientific article's reinforcement learning problem and the corresponding solution approach.
Reinforcement learning (RL) is an area of machine learning concerned with how intelligent agents should act in an environment to maximize a reward signal. Formally, the environment is given as a Markov decision process (MDP) for which the underlying dynamics may be known or unknown. This course introduces techniques for modeling and solving RL problems, focusing on provable performance guarantees such as convergence and optimality.
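As a small illustration of the known-dynamics case, the following sketch runs value iteration on a hypothetical two-state, two-action MDP. The transition probabilities `P`, rewards `R`, and discount factor `GAMMA` are made-up values chosen only for illustration; they are not taken from the course material, and this is a minimal sketch rather than a reference implementation.

```python
# Hypothetical 2-state, 2-action MDP; all numbers are illustrative assumptions.
# P[a][s][t]: probability of moving from state s to state t under action a.
P = [
    [[0.9, 0.1], [0.2, 0.8]],  # action 0
    [[0.5, 0.5], [0.0, 1.0]],  # action 1
]
# R[a][s]: expected immediate reward for taking action a in state s.
R = [
    [1.0, 0.0],  # action 0
    [0.0, 2.0],  # action 1
]
GAMMA = 0.9  # discount factor


def q_values(V):
    """One-step lookahead: Q[a][s] = R[a][s] + GAMMA * sum_t P[a][s][t] * V[t]."""
    return [
        [R[a][s] + GAMMA * sum(p * v for p, v in zip(P[a][s], V))
         for s in range(len(V))]
        for a in range(len(P))
    ]


def value_iteration(tol=1e-10, max_iter=10_000):
    """Repeat the Bellman optimality update until successive values differ by < tol."""
    V = [0.0] * len(P[0])
    for _ in range(max_iter):
        Q = q_values(V)
        V_new = [max(Q[a][s] for a in range(len(P))) for s in range(len(V))]
        converged = max(abs(x - y) for x, y in zip(V_new, V)) < tol
        V = V_new
        if converged:
            break
    # Greedy policy with respect to the final value estimate.
    Q = q_values(V)
    policy = [max(range(len(P)), key=lambda a: Q[a][s]) for s in range(len(V))]
    return V, policy


V, policy = value_iteration()
print(V, policy)
```

When the dynamics are unknown, `P` and `R` cannot be used directly, and the update has to be replaced by one driven by sampled transitions.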
The covered models and algorithms correspond to Chapters 1-10 of the textbook by Sutton and Barto, "Reinforcement Learning: An Introduction," 2nd edition. In addition to the textbook, we draw theory from Borkar's book "Stochastic Approximation: A Dynamical Systems Viewpoint" and further insights from Powell's book "Reinforcement Learning and Stochastic Optimization: A Unified Framework for Sequential Decisions."
There will be two homework sets. One set will focus on theoretical exercises and the other on implementation and experimentation. In addition, students will study a recent scientific article on RL in groups of 2 or 3 and present it in class.
Assessment
- Written exam (50%)
- Two homework sets (30%)
- Reading and presenting a scientific paper (20%)
Prerequisites
191531920 - Markov Decision Theory and Algorithmic Methods is a prerequisite.
Master Applied Mathematics
Master Electrical Engineering
Master Interaction Technology
Master Systems and Control
Required material
Book: Sutton and Barto, "Reinforcement Learning: An Introduction", second edition, 2018. https://www.andrew.cmu.edu/course/10-703/textbook/BartoSutton.pdf
Recommended material
Book: Borkar, "Stochastic Approximation: A Dynamical Systems Viewpoint", 2008. https://doi.org/10.1007/978-93-86279-38-5
Article: Powell, "A unified framework for stochastic optimization", EJOR, 275(3), 2019, 795-821. https://doi.org/10.1016/j.ejor.2018.07.014
Teaching methods
- Lecture
- Presentation(s) (attendance required: yes)
- Question hour
- Tutorial
- Supervised self-study
Tests
Written exam