Human-in-the-Loop Reinforcement Learning

Theoretical (Analytical):

Practical (Implementation):

Literature Work:


Overview

Deep Reinforcement Learning (Deep RL) has achieved notable successes, e.g., playing video games at super-human levels and learning robot control policies. A major challenge in applying RL is the need to specify a reward function that communicates the desired goal to the agent. This reward function generally has to be specified by the designers of a scenario and relies on simple heuristics or intuition. An alternative is to use small amounts of direct feedback from humans who observe an RL agent acting in the environment and rate the quality of its execution. For the seminar, your task is to investigate strategies for learning goal-directed behavior from limited amounts of human interaction. The focus should lie on the specific interaction mechanics, not on the domains in which they are used.
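To make the idea of learning from human ratings concrete, the following is a minimal sketch of one common strategy: fitting a reward model to pairwise preferences with a Bradley-Terry model, in the spirit of reference [1]. It is an illustration under simplifying assumptions, not the method prescribed for this project: the reward is assumed to be linear in hand-picked features, a trajectory segment is a list of per-step feature vectors, and all function names (`segment_features`, `fit_reward`, `simulate_comparisons`) are hypothetical. The simulated rater only stands in for a real human.

```python
import math
import random


def segment_features(segment):
    """Sum the per-step feature vectors of one trajectory segment."""
    return [sum(step[i] for step in segment) for i in range(len(segment[0]))]


def preference_probability(weights, seg_a, seg_b):
    """Bradley-Terry probability that seg_a is preferred over seg_b,
    assuming a linear reward: reward(step) = dot(weights, step)."""
    fa, fb = segment_features(seg_a), segment_features(seg_b)
    diff = sum(w * (a - b) for w, a, b in zip(weights, fa, fb))
    diff = max(-50.0, min(50.0, diff))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-diff))


def fit_reward(comparisons, n_features, lr=0.05, epochs=200):
    """Fit reward weights by gradient ascent on the preference log-likelihood.

    comparisons: list of (seg_a, seg_b, label), where label is 1.0 if the
    rater preferred seg_a and 0.0 if the rater preferred seg_b.
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        grad = [0.0] * n_features
        for seg_a, seg_b, label in comparisons:
            p = preference_probability(weights, seg_a, seg_b)
            fa, fb = segment_features(seg_a), segment_features(seg_b)
            for i in range(n_features):
                # Gradient of the logistic log-likelihood for one comparison.
                grad[i] += (label - p) * (fa[i] - fb[i])
        weights = [w + lr * g / len(comparisons) for w, g in zip(weights, grad)]
    return weights


def simulate_comparisons(true_weights, n_pairs=50, steps=5, seed=0):
    """Stand-in for a human rater (assumption): always prefers the segment
    with the higher return under the hidden true_weights."""
    rng = random.Random(seed)

    def random_segment():
        return [[rng.uniform(-1, 1) for _ in true_weights] for _ in range(steps)]

    def true_return(segment):
        return sum(w * f for w, f in zip(true_weights, segment_features(segment)))

    data = []
    for _ in range(n_pairs):
        a, b = random_segment(), random_segment()
        data.append((a, b, 1.0 if true_return(a) > true_return(b) else 0.0))
    return data
```

Calling `fit_reward(simulate_comparisons([1.0, -1.0]), n_features=2)` should recover weights that point in roughly the same direction as the hidden ones. In a real setup the learned reward would then be handed to a standard RL algorithm, and a neural network would typically replace the linear model.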

Problem Statement

We choose an exemplary domain as a simplified use case. An existing web tool is extended with interactive, visual components that let users give feedback to agents acting in the environment. The goal is to train an agent on a specific task and to show that our approach can outperform a baseline algorithm, e.g., a standard RL algorithm trained with a fixed reward function.
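As an illustration of how feedback entered in the web tool might reach the training code, the sketch below defines a small record type with JSON serialization. The schema is purely hypothetical: the field names, the 0-to-1 rating scale, and the helper functions `to_json` and `from_json` are assumptions for this example and are not taken from the existing tool.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class FeedbackRecord:
    """One piece of human feedback on a stretch of agent behavior
    (hypothetical schema)."""
    episode_id: str
    start_step: int
    end_step: int
    rating: float  # assumed scale: 0.0 (poor) to 1.0 (good)
    user_id: str = "anonymous"

    def __post_init__(self):
        # Reject malformed input early, before it reaches training.
        if not 0.0 <= self.rating <= 1.0:
            raise ValueError("rating must lie in [0, 1]")
        if self.end_step < self.start_step:
            raise ValueError("end_step must not precede start_step")


def to_json(record):
    """Serialize a record, e.g. for sending it from front end to back end."""
    return json.dumps(asdict(record))


def from_json(payload):
    """Parse and validate a record received as a JSON string."""
    return FeedbackRecord(**json.loads(payload))
```

Validating on construction means that a record parsed with `from_json` passes the same checks as one built directly in Python, so the training loop can rely on well-formed ratings.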

Tasks

  • Develop and integrate a use case into an existing software platform
  • Add user interactions and visualizations that let users give feedback effectively
  • Develop a visualization to display the training progress
  • Run training and evaluate the resulting agent
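
For the last two tasks, a minimal sketch of the bookkeeping involved could look as follows. It assumes that each training run yields a list of episode returns; the helper names (`summarize_returns`, `moving_average`, `compare`) are hypothetical, and a real evaluation would likely add more runs and a proper statistical test.

```python
import math
import statistics


def summarize_returns(returns):
    """Mean and standard error of final returns from independent runs."""
    mean = statistics.mean(returns)
    if len(returns) > 1:
        sem = statistics.stdev(returns) / math.sqrt(len(returns))
    else:
        sem = 0.0  # a single run gives no spread estimate
    return mean, sem


def moving_average(values, window):
    """Smooth a learning curve for display in a training-progress plot."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def compare(feedback_returns, baseline_returns):
    """Summarize the human-feedback agent against the fixed-reward baseline."""
    f_mean, f_sem = summarize_returns(feedback_returns)
    b_mean, b_sem = summarize_returns(baseline_returns)
    return {
        "feedback": (f_mean, f_sem),
        "baseline": (b_mean, b_sem),
        "difference": f_mean - b_mean,
    }
```

Reporting the standard error next to the mean indicates whether a difference between the two agents is larger than the run-to-run variation.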

Requirements

Good programming skills in Python and JavaScript/TypeScript.

Knowledge of Reinforcement Learning is a big plus.

 

 

Scope/Duration/Start

  • Scope: Master
  • 3-month project, 6-month thesis
  • Start: immediately

Contact

References

[1] Deep Reinforcement Learning from Human Preferences, Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei, arXiv:1706.03741, 2017

[2] A Survey of Inverse Reinforcement Learning Techniques, Shao Zhifei, Er Meng Joo, International Journal of Intelligent Computing and Cybernetics, 2016

[3] Progressive Learning of Topic Modeling Parameters: A Visual Analytics Framework, Mennatallah El-Assady, Rita Sevastjanova, Fabian Sperrle, Daniel Keim, Christopher Collins, IEEE Transactions on Visualization and Computer Graphics, 2017

[4] A Framework for Data-Driven Robotics, Serkan Cabi, Sergio Gomez Colmenarejo, Alexander Novikov, et al., arXiv:1909.12200, 2019