Course: 202100109
Reinforcement Learning
Course code: 202100109
Credits (ECTS): 5
Course type: Course
Language of instruction: English
Contact person: dr.ing. A.B. Zander
E-mail: a.b.zander@utwente.nl
Lecturers
Course contact person: dr.ing. A.B. Zander
Examiner: dr.ing. A.B. Zander
Academic year: 2021
Starting block: 2B
Registration procedure: Self-registration via OSIRIS Student
Registration via OSIRIS: Yes
Course objectives
After completing this course, the student is able to:
  • model a real-world problem as a reinforcement learning problem,
  • choose a suitable learning algorithm and design an appropriate approximation framework for that model,
  • use and extend a state-of-the-art software package for reinforcement learning,
  • validate a given algorithm with respect to convergence, stability, and optimality.
Content
Reinforcement learning (RL) is an area of machine learning concerned with how intelligent agents ought to take actions in an environment to maximize cumulative reward. The environment is formulated as a Markov decision process (MDP) whose underlying model is unknown. This course introduces techniques for modeling and handling large-scale RL problems. The focus is on i) provable performance guarantees, such as convergence and speed of convergence, and ii) algorithmic techniques.
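
As a rough illustration of learning in an MDP whose model is unknown to the agent, the sketch below runs tabular Q-learning on a small chain environment. The environment, the hyperparameters, and the names `step`, `q_learning`, and `greedy_policy` are all assumptions made for this example; they are not taken from the course material.

```python
import random

# Hypothetical 5-state chain (an assumption for illustration only):
# states 0..4, the episode ends with reward 1 on reaching state 4.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Environment dynamics. The agent never reads this model directly;
    it only observes the sampled (next_state, reward, done) triples."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                # Exploit, breaking ties between equal values at random.
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Bootstrapped target; no bootstrap from the terminal state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Greedy action in each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    q_table = q_learning()
    print(greedy_policy(q_table))
```

In this toy setting the learned greedy policy should move right in every state, and the value of moving right from the start state should approach gamma cubed, since the reward is three discounted steps away after the first move.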

The covered models and algorithms correspond to Chapters 1-10 of the textbook by Sutton and Barto, “Reinforcement Learning: An Introduction,” 2nd edition. In addition to the textbook, we draw theory from Borkar’s book “Stochastic Approximation: A Dynamical Systems Viewpoint” and further insights from Powell’s article “A unified framework for stochastic optimization.”

There will be two mandatory homework sets on these topics. Each set includes both theoretical exercises and exercises focused on implementation and obtaining numerical results. In addition, students will study a recent scientific paper in groups of 2 or 3, under the supervision of a lecturer, and present it in class.

Assessment
  • Written exam (60%)
  • Two mandatory homework sets (20%)
  • Reading and presenting a scientific paper (20%)
Prerequisites
191531920 - Markov Decision Theory and Algorithmic Methods is a prerequisite.
Participating studies
  • Master Applied Mathematics
  • Master Computer Science
  • Master Electrical Engineering
  • Master Interaction Technology
  • Master Systems and Control
Required materials
Book
Sutton and Barto, "Reinforcement Learning: an Introduction", second edition, 2018. https://www.andrew.cmu.edu/course/10-703/textbook/BartoSutton.pdf
Recommended materials
Book
Borkar, "Stochastic Approximation: A Dynamical Systems Viewpoint", 2008. https://doi.org/10.1007/978-93-86279-38-5
Articles
Powell, "A unified framework for stochastic optimization", EJOR, 275(3), 2019, 795-821. https://doi.org/10.1016/j.ejor.2018.07.014
Teaching methods
Lecture

Presentation(s)
Attendance mandatory: Yes

Q&A session

Tutorial

Supervised self-study

Tests
Written Exam
