After following this course, the student is able to:
- formulate a real-world problem as a reinforcement learning model,
- choose a suitable learning algorithm and design an appropriate approximation framework for that model,
- use and extend a state-of-the-art software package for reinforcement learning,
- validate a given algorithm with respect to convergence, stability, and optimality.
Reinforcement learning (RL) is an area of machine learning concerned with how intelligent agents ought to take actions in an environment to maximize cumulative reward. The environment is typically formulated as a Markov decision process (MDP) whose underlying model is unknown. This course introduces techniques for modeling and handling large-scale RL problems. The focus is on i) provable performance guarantees, such as convergence and speed of convergence, and ii) algorithmic techniques.
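As a small illustration of this setting, the sketch below runs tabular Q-learning on a toy MDP. It is not taken from the course material: the five-state chain environment, the reward of 1 at the right end, the step size, the discount factor, and all names (`step`, `q_learning`, `N_STATES`) are assumptions chosen for brevity. The agent never reads the transition rules directly; it only observes sampled transitions and rewards.

```python
import random

# Hypothetical toy MDP: a chain of states 0..4, where state 4 is terminal.
N_STATES = 5
ACTIONS = (0, 1)          # 0 = move left, 1 = move right
GAMMA, ALPHA = 0.9, 0.5   # discount factor and step size (illustrative values)


def step(state, action):
    """Environment dynamics; the agent only sees the sampled outcome."""
    nxt = max(state - 1, 0) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    reward = 1.0 if done else 0.0
    return nxt, reward, done


def q_learning(episodes=500, seed=0):
    """Tabular Q-learning with a uniformly random behavior policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Q-learning is off-policy, so pure random exploration suffices here.
            action = rng.choice(ACTIONS)
            nxt, reward, done = step(state, action)
            # Bootstrapped target: reward plus discounted best next-state value.
            target = reward if done else reward + GAMMA * max(q[nxt])
            q[state][action] += ALPHA * (target - q[state][action])
            state = nxt
    return q


q = q_learning()
# Greedy policy with respect to the learned action values (non-terminal states).
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

On this deterministic chain the greedy policy should come out as "move right" in every non-terminal state. Problems at the scale targeted by the course generally require function approximation in place of the table `q`.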
The covered models and algorithms correspond to Chapters 1-10 in the textbook by Sutton and Barto: “Reinforcement Learning: An Introduction,” 2nd edition. In addition to the textbook, we draw theory from Borkar’s book “Stochastic Approximation: A Dynamical Systems Viewpoint” and further insights from Powell’s article “A unified framework for stochastic optimization.”
There will be two mandatory homework sets on these topics, each combining theoretical exercises with exercises focused on implementation and obtaining numerical results. In addition, students will study a recent scientific paper in groups of 2 or 3. Each group studies its paper under the supervision of a lecturer and presents it in class.
- Written exam (60%)
- Two mandatory homework sets (20%)
- Reading and presenting a scientific paper (20%)