Reinforcement Learning

Credits

Types

Elective

Requirements

This subject has not requirements , but it has got previous capacities

Department

Web

https://www.cs.upc.edu/~mmartin/RL-MAI.html

Mail

mmartin@cs.upc.edu

The objective of this course is to introduce and deepen the framework of learning by reinforcement, where an agent learns the appropriate behavior to solve its objectives from direct interactions of the agent with the environment and without previous knowledge of the world.

Specifically, the course will begin introducing the most basic concepts of learning by reinforcement until reaching the most modern algorithms that are state of the art. Next the course will deepen in different advanced techniques that try to extend the described frame to (1) a more efficient learning from techniques of exploration and of modelization of the environment, (2) to a continuous learning of the agent to different tasks necessary for a General Artificial Intelligence and (3) to the automatic learning of behaviors in systems multi-agents either in cooperative and competitive environments.

When finishing, the student will know the state of the art in reinforcement learning and the domains where it is appropriate to apply it, and she will have implemented different algorithms in the programming frameworks more relevant in the area.

Teachers

Person in charge

Mario Martín Muñoz ( mmartin@cs.upc.edu )

Weekly hours

Theory

Problems

Laboratory

Guided learning

Autonomous learning

5.33

Competences

Generic Technical Competences

Generic

CG3 - Capacity for modeling, calculation, simulation, development and implementation in technology and company engineering centers, particularly in research, development and innovation in all areas related to Artificial Intelligence.

CG4 - Capacity for general management, technical management and research projects management, development and innovation in companies and technology centers in the area of Artificial Intelligence.

Technical Competences of each Specialization

Academic

CEA3 - Capability to understand the basic operation principles of Machine Learning main techniques, and to know how to use on the environment of an intelligent system or service.

CEA9 - Capability to understand Multiagent Systems advanced techniques, and to know how to design, implement and apply these techniques in the development of intelligent applications, services or systems.

CEA11 - Capability to understand the advanced techniques of Computational Intelligence, and to know how to design, implement and apply these techniques in the development of intelligent applications, services or systems.

Professional

CEP2 - Capability to solve the decision making problems from different organizations, integrating intelligent tools.

CEP3 - Capacity for applying Artificial Intelligence techniques in technological and industrial environments to improve quality and productivity.

CEP8 - Capability to respect the surrounding environment and design and develop sustainable intelligent systems.

Transversal Competences

Teamwork

CT3 - Ability to work as a member of an interdisciplinary team, as a normal member or performing direction tasks, in order to develop projects with pragmatism and sense of responsibility, making commitments taking into account the available resources.

Objectives

To understand the most important algorithms and state of the art in the area of learning by reinforcement
Related competences: CEA3, CEA11, CG3, CG4,
To know how to computationally formalize a real world problem as learning by reinforcement and know how to implement in the most current environments the learning algorithms that solve them
Related competences: CG4, CEP2, CEP3, CEP8, CT3,
To understand the most advanced and recent techniques in the field of Multi-Agent learning to cooperate and compete.
Related competences: CEA9, CG4, CEP2, CEP3, CEP8, CT3,
To understand the difficulties and inefficiencies of the reinforcement learning approach and propose the techniques and approaches that could solve them
Related competences: CEA11, CG3, CG4, CEP3, CEP8,
To understand the need, fundamentals, and particularities of behavior learning and the differences it has from supervised and unsupervised machine learning.
Related competences: CEA3, CEA11, CG3, CG4,
To distinguish the kind of problems can be modeled as a reinforcement learning problem and identify the techniques that can be applied to solve them.
Related competences: CEA3, CEA11, CEP2, CEP3,

Introduction: Behavior Learning in Agents and description of main elements in Reinforcement Learning
Intuition, motivation and definition of the reinforcement learning (RL) framework. Key elements in RL.
Finding optimal policies using Dynamic Programming
How to learn the optimal policy with full knowledge of the world model: algebraic solution, policy iteration and value iteration.
Introduction to Model-Free approaches.
Basic algorithms for reinforcement learning: Monte-Carlo, Q-learning, Sarsa, TD(lambda). The need for Exploration. Differences between On-policy and Off-policy methods.
Function approximation in Reinforcement Learning
Need for function approximation and Incremental methods in RL. The Gradient Descent approach. RL with Linear function approximation. The deadly triad for function approximation in RL. Batch methods and Neural Networks for function Approximation.
Deep Reinforcement Learning (DRL)
Revolution in RL by introducing Deep Learning. Dealing with the deadly triad with the DQN algorithm. Application to the Atari games case. Evolutions of the DQN algorithm: Double DQN, Prioritized Experience Replay, multi-step learning and Distributional value functions. Rainbow: the state-of-the-art algorithm in discrete action space.
Policy gradient methods
What to do in continuous action spaces. How probabilistic policies allow to apply the gradient method directly in the policy network. The REINFORCE algorithm. The Actor-Critic algorithms. State-of-the-art algorithms in continuous action spaces: DDPG, TD3 and SAC.
Advanced Topics: How to deal with sparse rewards
The problem of the sparse reward. Introduction to advanced exploration techniques: curiosity and empowerment in RL. Introduction to curriculum learning to easy the learning of the goal. Hierarchical RL to learn complex tasks. The learning of Universal Value Functions and Hindsight Experience Replay (HER).
Advanced topics: Model Based Reinforcement Learning (MBRL)
Separating the learning of the policy from the learning of a model of the world has some benefits and some problems. Sample efficiency in RL by hallucination and imagination.
Advanced Topics: Towards Long-life learning in agents
Is RL a way to obtain a General Artificial Intelligence? Multi-task learning in RL, Transfer learning in RL and Meta-learning in RL. State-of-the-art approaches.
Reinforcement Learning in the multi-agent framework
Learning of behaviors in environment where several agents act. Learning of cooperative behaviors, Learning of competitive behaviors, and mixed cases. State-of-the art algorithms. The special case of games: The AlfaGo case and the extension to Alfa-Zero.

Activities

Activity Evaluation act

Introduction, motivation and examples of successful applications in RL

Development of the corresponding topic and laboratory exercises
Objectives: 6 5
Contents:

1 . Introduction: Behavior Learning in Agents and description of main elements in Reinforcement Learning

Theory

Problems

Laboratory

Guided learning

Autonomous learning

Definition of the RL framework. Key elements in RL. Finding the optimal policy using Value Iteration and Policy Iteration

Development of the corresponding topic and laboratory exercises
Objectives: 1 6 5
Contents:

1 . Introduction: Behavior Learning in Agents and description of main elements in Reinforcement Learning

Theory

Problems

Laboratory

Guided learning

Autonomous learning

Introduction to Model-Free approaches. Monte-Carlo, Q-learning, Sarsa, TD(lambda)

Development of the corresponding topic
Objectives: 1 6 5
Contents:

3 . Introduction to Model-Free approaches.

Theory

Problems

Laboratory

Guided learning

Autonomous learning

Function approximation in RL

Development of the corresponding topic and laboratory exercises
Objectives: 1 2 4
Contents:

4 . Function approximation in Reinforcement Learning

Theory

Problems

Laboratory

Guided learning

Autonomous learning

Deep Reinforcement Learning (DRL)

Development of the corresponding topic and laboratory exercises
Objectives: 1 2 4
Contents:

5 . Deep Reinforcement Learning (DRL)

Theory

Problems

Laboratory

Guided learning

Autonomous learning

Policy gradient methods

Presentation of the corresponding course topic and lab exercises
Objectives: 1 4
Contents:

6 . Policy gradient methods

Theory

Problems

Laboratory

Guided learning

Autonomous learning

Tutorized practical works

Objectives: 2 6 3
Contents:

1 . Introduction: Behavior Learning in Agents and description of main elements in Reinforcement Learning
2 . Finding optimal policies using Dynamic Programming
3 . Introduction to Model-Free approaches.
5 . Deep Reinforcement Learning (DRL)
4 . Function approximation in Reinforcement Learning

Theory

Problems

Laboratory

Guided learning

Autonomous learning

10h

Final exam

Week: 15

Theory

Problems

Laboratory

Guided learning

Autonomous learning

Study of the state-of-the art in an advanced topic work

Week: 11

Theory

Problems

Laboratory

Guided learning

Autonomous learning

Advanced topics on behavior learning: Increasing sample efficiency

Theory

Problems

Laboratory

Guided learning

Autonomous learning

Multiagent RL

Objectives: 3
Contents:

10 . Reinforcement Learning in the multi-agent framework

Theory

Problems

Laboratory

Guided learning

Autonomous learning

Teaching methodology

Theory classes will introduce the knowledge, techniques and concepts required to apply them
in practice during the laboratory classes. Theory classes will be mainly of the type magistral
lecture, but some of them may be of the type exposition-participation, with the participation of
the students in solving problems or exercises.

Laboratory classes have as objective that the students work with software tools which allow the
application to real problems of the techniques presented in theory classes. Students will use
these tools to develop their practical work of the course, which will consist of a part of autonomous
individual work and a part of cooperative work in a team of 2/3 people. Some time of the laboratory
classes will be devoted to the orientation and supervision by the professor of these autonomous
and cooperative works.

Evaluation methodology

The mark (M) is calculated as follows:

M = 0.20 * Quiz + 0.30 * Practical + 0.5 * Theoretical

where

*Quiz* refers to a Quiz with theoretical and conceptual questions about the first part of the course
*Practical* refers to an implementation of a RL algorithm on a problem done in Python
*Theoretical* refers to a study of the state-of-the art in an advanced topic work to be chosen by the student

Bibliography

Basic

Reinforcement learning : an introduction - Sutton, Richard S; Barto, Andrew G, The MIT Press, [2018]. ISBN: 9780262039246
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991004166329706711&context=L&vid=34CSUC_UPC:VU1&lang=ca
Deep reinforcement learning - Morales, M, Manning Publications, 2020. ISBN: 9781617295454
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991004208939706711&context=L&vid=34CSUC_UPC:VU1&lang=ca

Complementary

Deep reinforcement learning in action - Zai, A.; Brown, B, Manning Publications Co., 2020. ISBN: 9781617295430
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991004203829706711&context=L&vid=34CSUC_UPC:VU1&lang=ca

Web links

Master in Artificial Intelligence http://www.fib.upc.edu/en/masters/mai.html
Web page of the course with all materials. https://www.cs.upc.edu/~mmartin/ATCI-RL.html

Previous capacities

Basic concepts of Deep Learning.