Human-in-the-Loop Reinforcement Learning

Theoretical (Analytical):

Practical (Implementation):

Literature Work:


Deep Reinforcement Learning (Deep RL) has achieved notable successes, e.g., playing video games at super-human levels or learning robot control policies. A major challenge in applying RL is the need to specify a reward function that communicates the desired goal to the agent. This reward function generally has to be specified by the designers of a scenario and often relies on simple heuristics or intuition. An alternative is to directly use small-scale feedback from humans who observe an RL agent acting in the environment and rate the quality of its execution. For the project, your task is to investigate strategies for learning goal-directed behavior from limited amounts of human interaction. The focus should lie on the specific interaction mechanics, not on the domains in which they are used.
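To make the idea of learning from human feedback more concrete, the following is a minimal sketch of one possible interaction mechanic: pairwise comparisons of short trajectory segments, modeled with the Bradley-Terry preference probability used in [1]. Everything else here is an assumption made for illustration and not part of the project description: the reward model is linear in hand-made step features, the function names are hypothetical, and the human rater is replaced by a simulated one with a hidden reward.

```python
import math
import random


def feature_sum(segment):
    """Sum the per-step feature vectors of one trajectory segment."""
    return [sum(column) for column in zip(*segment)]


def segment_return(weights, segment):
    """Predicted return of a segment under a linear reward model (assumption)."""
    return sum(w * f for w, f in zip(weights, feature_sum(segment)))


def preference_probability(weights, seg_a, seg_b):
    """Bradley-Terry probability that seg_a is preferred over seg_b."""
    diff = segment_return(weights, seg_a) - segment_return(weights, seg_b)
    diff = max(-30.0, min(30.0, diff))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-diff))


def fit_reward_model(comparisons, n_features, lr=0.1, epochs=200):
    """Fit reward weights by gradient descent on the cross-entropy loss.

    comparisons: list of (seg_a, seg_b, label); label is 1.0 if the human
    preferred seg_a and 0.0 if they preferred seg_b.
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        grad = [0.0] * n_features
        for seg_a, seg_b, label in comparisons:
            p = preference_probability(weights, seg_a, seg_b)
            fa, fb = feature_sum(seg_a), feature_sum(seg_b)
            for i in range(n_features):
                # d(loss)/d(w_i) = (p - label) * (feature-sum difference)
                grad[i] += (p - label) * (fa[i] - fb[i])
        weights = [w - lr * g / len(comparisons) for w, g in zip(weights, grad)]
    return weights


def simulate_comparisons(true_weights, n_pairs=200, seg_len=5, seed=0):
    """Hypothetical stand-in for a human rater with a hidden linear reward."""
    rng = random.Random(seed)
    n = len(true_weights)

    def random_segment():
        return [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(seg_len)]

    data = []
    for _ in range(n_pairs):
        a, b = random_segment(), random_segment()
        prefers_a = segment_return(true_weights, a) > segment_return(true_weights, b)
        data.append((a, b, 1.0 if prefers_a else 0.0))
    return data


if __name__ == "__main__":
    comparisons = simulate_comparisons([1.0, -0.5])
    learned = fit_reward_model(comparisons, n_features=2)
    print("learned reward weights:", learned)
```

In a full system, the fitted reward model would replace the hand-specified reward function and be optimized by a standard RL algorithm, while new comparisons are requested from the human only occasionally. Which segments to show and how to ask for feedback are exactly the interaction mechanics the project is meant to investigate.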

Problem Statement

Users may lose engagement, especially when performing repetitive tasks, e.g., when interactively creating or updating a machine learning model. The goal of the project is to implement a collaborative visual analytics (VA) system in which users can compete with each other while improving the quality of a clustering model.


Good programming skills in Python and JavaScript/TypeScript.

Knowledge of Reinforcement Learning is a big plus.


  • Scope: Master
  • 6-month project, 6-month thesis
  • Start: immediately



[1] Deep Reinforcement Learning from Human Preferences, Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei, arXiv:1706.03741, 2017

[2] A Survey of Inverse Reinforcement Learning Techniques, Shao Zhifei, Er Meng Joo, International Journal of Intelligent Computing and Cybernetics, 2016

[3] Progressive Learning of Topic Modeling Parameters: A Visual Analytics Framework, Mennatallah El-Assady, Rita Sevastjanova, Fabian Sperrle, Daniel Keim, Christopher Collins, IEEE Transactions on Visualization and Computer Graphics, 2017

[4] A Framework for Data-Driven Robotics, Serkan Cabi, Sergio Gomez Colmenarejo, Alexander Novikov, et al., arXiv:1909.12200, 2019