Course, academic year 2018/2019
Deep Reinforcement Learning - NPFL122
Title in Czech: Hluboké zpětnovazební učení
Guaranteed by: Institute of Formal and Applied Linguistics (32-UFAL)
Faculty: Faculty of Mathematics and Physics
Actual: from 2018 to 2018
Semester: winter
E-Credits: 6
Hours per week, examination: winter semester: 2/2, C+Ex [hours/week]
Capacity: unlimited
Min. number of students: unlimited
State of the course: taught
Language: Czech, English
Teaching methods: full-time
Additional information: http://ufal.mff.cuni.cz/courses/npfl122
Guarantor: RNDr. Milan Straka, Ph.D.
Annotation -
Last update: Mgr. Barbora Vidová Hladká, Ph.D. (25.01.2019)
In recent years, reinforcement learning has been combined with deep neural networks, giving rise to game agents with super-human performance (for example in Go, chess, or 1v1 Dota2, trained solely by self-play), datacenter cooling algorithms 50% more efficient than trained human operators, and improved machine translation. The goal of the course is to introduce reinforcement learning employing deep neural networks, focusing both on the theory and on practical implementations.
Aim of the course -
Last update: doc. RNDr. Vladislav Kuboň, Ph.D. (05.06.2018)

The goal of the course is to introduce reinforcement learning combined with deep neural networks. The course will focus both on theory as well as on practical aspects.

Course completion requirements -
Last update: doc. RNDr. Vladislav Kuboň, Ph.D. (05.06.2018)

Students pass the practicals by submitting a sufficient number of assignments. Assignments are announced regularly throughout the semester and are due within several weeks. Given these rules for completing the practicals, it is not possible to retake them. Passing the practicals is not a prerequisite for taking the exam.

Literature -
Last update: doc. RNDr. Vladislav Kuboň, Ph.D. (05.06.2018)

Richard S. Sutton and Andrew G. Barto: Reinforcement Learning: An Introduction, Second edition, 2018.

John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, Pieter Abbeel: Trust Region Policy Optimization https://arxiv.org/abs/1502.05477

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov: Proximal Policy Optimization Algorithms https://arxiv.org/abs/1707.06347

David Silver et al.: Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm https://arxiv.org/abs/1712.01815

Requirements for the exam -
Last update: doc. RNDr. Vladislav Kuboň, Ph.D. (05.06.2018)

The exam consists of a written part and an optional oral part, in which students can respond to queries regarding the written part and also answer additional questions.

The requirements of the exam correspond to the course syllabus, at the level of detail presented in the lectures.

Syllabus
Last update: Mgr. Barbora Vidová Hladká, Ph.D. (25.01.2019)

Reinforcement learning framework

Tabular methods

  • Dynamic programming
  • Monte Carlo methods
  • Temporal-difference methods
  • N-step bootstrapping

Approximate solution methods

Eligibility traces

Deep Q networks

Policy gradient methods

  • REINFORCE
  • REINFORCE with baseline
  • Actor-critic
  • Trust Region Policy Optimization
  • Proximal Policy Optimization
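Before the more advanced methods in the list above, REINFORCE with baseline can be illustrated in a purely tabular setting: a softmax policy over action preferences updated by the sampled return minus a running-average baseline. The two-armed bandit below is an illustrative toy setup, not an example from the course:

```python
import math
import random

# Hypothetical two-armed bandit: arm 1 always pays +1, arm 0 pays 0.
REWARDS = (0.0, 1.0)

def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce(episodes=2000, alpha=0.1, seed=0):
    rng = random.Random(seed)
    prefs = [0.0, 0.0]  # action preferences theta; policy pi = softmax(theta)
    baseline = 0.0      # running-average reward, reduces gradient variance
    for t in range(1, episodes + 1):
        probs = softmax(prefs)
        action = 0 if rng.random() < probs[0] else 1
        reward = REWARDS[action]
        baseline += (reward - baseline) / t
        # REINFORCE with baseline: theta += alpha * (G - b) * grad log pi(action)
        # For a softmax policy, d/dtheta_a log pi(action) = 1{a == action} - pi(a).
        for a in range(2):
            grad_log = (1.0 if a == action else 0.0) - probs[a]
            prefs[a] += alpha * (reward - baseline) * grad_log
    return softmax(prefs)

probs = reinforce()
print(probs)  # the learned policy should strongly prefer the rewarding arm
```

Subtracting the baseline does not bias the gradient (its expectation against the score function is zero) but substantially reduces its variance, which is the point of the "REINFORCE with baseline" item above.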

Continuous action domain

Monte Carlo tree search

Training networks with discrete latent variables

Entry requirements -
Last update: doc. RNDr. Vladislav Kuboň, Ph.D. (05.06.2018)

Python programming skills and TensorFlow skills (or experience with any other deep learning framework) are required, to the extent covered by the NPFL114 course. No previous knowledge of reinforcement learning is necessary.
