
After following this course, the student is able to:
 explain the different dimensions of (value-based) reinforcement learning algorithms,
 explain the ODE (ordinary differential equation) approach to show convergence for stochastic approximation schemes,
 model a (real-world) sequential decision-making problem (SDMP) as a Markov decision process,
 choose a suitable reinforcement learning algorithm to solve the SDMP, which may include the design of an appropriate approximation framework,
 implement and test reinforcement learning algorithms using a modern software package,
 analyze a given reinforcement learning algorithm with respect to stability, convergence, and optimality,
 analyze, critically evaluate and explain a scientific article's reinforcement learning problem and the corresponding solution approach.
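
As a brief illustration of the ODE approach named in the objectives above, the following is a sketch of the standard setting, assuming the usual conditions from the stochastic approximation literature (Lipschitz $h$, a martingale difference noise sequence $M_{n+1}$, and almost surely bounded iterates). The symbols $x_n$, $a_n$, $h$, $M_{n+1}$ and $T$ are notation chosen here for illustration, not taken from the course material.

```latex
% Stochastic approximation iteration with step sizes a_n
x_{n+1} = x_n + a_n \bigl[ h(x_n) + M_{n+1} \bigr],
\qquad
\sum_{n} a_n = \infty, \quad \sum_{n} a_n^2 < \infty .

% Limiting ODE whose asymptotic behavior the iterates track
\dot{x}(t) = h\bigl(x(t)\bigr).

% Example: tabular Q-learning, with T the Bellman optimality operator
h(Q) = T Q - Q,
\qquad
(TQ)(s,a) = r(s,a) + \gamma \sum_{s'} p(s' \mid s,a) \max_{a'} Q(s',a').
```

Under these assumptions, the iterates converge almost surely to an invariant set of the ODE. In the Q-learning example, the unique equilibrium of the ODE is the fixed point $Q^*$ of $T$.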


Reinforcement learning (RL) is an area of machine learning concerned with how intelligent agents should act in an environment to maximize a reward signal. Formally, the environment is given as a Markov decision process (MDP) for which the underlying dynamics may be known or unknown. This course introduces techniques for modeling and solving RL problems, focusing on provable performance guarantees such as convergence and optimality.
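To make the distinction between known and unknown dynamics concrete, here is a minimal sketch in Python. It assumes a hypothetical two-state, two-action MDP whose transition probabilities `P`, rewards `R`, and discount factor `GAMMA` are illustrative values and not part of the course material. With known dynamics, value iteration computes the optimal action values directly; with unknown dynamics, tabular Q-learning estimates them from sampled transitions only. The function names and the step-size exponent are likewise choices made for this sketch.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (illustrative numbers only).
# P[s, a, s'] is the transition probability, R[s, a] the immediate reward.
P = np.array([
    [[1.0, 0.0], [0.2, 0.8]],
    [[0.1, 0.9], [1.0, 0.0]],
])
R = np.array([[0.0, 0.0], [1.0, 0.0]])
GAMMA = 0.9


def value_iteration(P, R, gamma, tol=1e-10):
    """Known dynamics: iterate the Bellman optimality operator on Q."""
    Q = np.zeros_like(R)
    while True:
        Q_new = R + gamma * P @ Q.max(axis=1)
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new
        Q = Q_new


def q_learning(P, R, gamma, n_steps=50_000, seed=0):
    """Unknown dynamics: the learner only sees sampled transitions.

    P and R are used here solely to simulate the environment.
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions = R.shape
    Q = np.zeros((n_states, n_actions))
    visits = np.zeros((n_states, n_actions))
    s = 0
    for _ in range(n_steps):
        a = rng.integers(n_actions)               # uniform exploration
        s_next = rng.choice(n_states, p=P[s, a])  # simulated transition
        visits[s, a] += 1
        # Step size n^(-0.6): the sum diverges, the sum of squares converges.
        alpha = visits[s, a] ** -0.6
        td_target = R[s, a] + gamma * Q[s_next].max()
        Q[s, a] += alpha * (td_target - Q[s, a])
        s = s_next
    return Q


if __name__ == "__main__":
    Q_star = value_iteration(P, R, GAMMA)
    Q_hat = q_learning(P, R, GAMMA)
    print("Q* (value iteration):\n", Q_star)
    print("Q estimate (Q-learning):\n", Q_hat)
    print("greedy policy:", Q_hat.argmax(axis=1))
```

The two results can be compared entry by entry; how close the Q-learning estimate gets depends on the number of steps, the exploration policy, and the step-size schedule.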
The covered models and algorithms correspond to Chapters 1-10 of the textbook by Sutton and Barto, "Reinforcement Learning: An Introduction," 2nd edition. In addition to the textbook, we draw theory from Borkar's book "Stochastic Approximation: A Dynamical Systems Viewpoint" and further insights from Powell's book "Reinforcement Learning and Stochastic Optimization: A Unified Framework for Sequential Decisions."
There will be two homework sets. One set will focus on theoretical exercises and the other on implementation and experimentation. In addition, students will study a recent scientific article on RL in groups of 2 or 3 and present it in class.
Assessment
 Written exam (50%)
 Two homework sets (30%)
 Reading and presenting a scientific paper (20%)




Prerequisites
 191531920 Markov Decision Theory and Algorithmic Methods is a prerequisite.
Master Applied Mathematics 
Master Electrical Engineering 
Master Interaction Technology 
Master Systems and Control 
Required material
 Book: Sutton and Barto, "Reinforcement Learning: An Introduction", second edition, 2018. https://www.andrew.cmu.edu/course/10703/textbook/BartoSutton.pdf

Recommended material
 Book: Borkar, "Stochastic Approximation: A Dynamical Systems Viewpoint", 2008. https://doi.org/10.1007/978-93-86279-38-5
 Article: Powell, "A unified framework for stochastic optimization", EJOR, 275(3), 2019, 795-821. https://doi.org/10.1016/j.ejor.2018.07.014

Teaching methods
 Lecture
 Presentation(s) (attendance required: yes)
 Q&A session
 Tutorial
 Supervised self-study

Tests
 Written exam


 