RLHF (Reinforcement Learning from Human Feedback) is increasingly used to train machine learning models, such as LLMs, to align them with human preferences and requirements. So far, the types of feedback used in RLHF are fairly limited: often, only pairwise comparisons are used. Humans, however, are used to expressing feedback on behavior in diverse ways, e.g., numerical ratings, corrections, or demonstrations of the intended behavior. A prominent research question is therefore: "How can we enable expressive human feedback on the behavior of AI systems, and use it to train such models?" To enable such diverse human feedback, we need a better understanding of human feedback itself, e.g., of the biases and accuracy of different feedback types. In this project, your task is to contribute to the collection and analysis of human feedback for different problems, ranging from game-like environments to text generation.
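As a small illustration of how pairwise comparisons are commonly turned into a training signal (following the Bradley-Terry-style model used by Christiano et al., 2017, listed below; the function name and scalar-reward simplification here are illustrative):

```python
import math

def pairwise_preference_loss(r_a: float, r_b: float, pref_a: float) -> float:
    """Cross-entropy loss for learning a reward model from a pairwise comparison.

    The probability that segment A is preferred over segment B is modeled as
    sigmoid(r_a - r_b), where r_a and r_b are the (learned) scalar rewards of
    the two segments. pref_a is the human label: 1.0 if A was preferred,
    0.0 if B was preferred, 0.5 for "no preference".
    """
    p_a = 1.0 / (1.0 + math.exp(-(r_a - r_b)))
    return -(pref_a * math.log(p_a) + (1.0 - pref_a) * math.log(1.0 - p_a))
```

Minimizing this loss over many labeled comparisons pushes the reward model to assign higher reward to behavior humans prefer; richer feedback types (ratings, corrections, demonstrations) require different, and partly still open, loss formulations.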
You will further improve and extend an existing system for the collection, processing, and analysis of human feedback that can be used to train machine learning models interactively. These extensions should enable successful experimentation with human subjects. The existing system is already applicable to a range of tasks and scenarios, but it can be extended to encompass even more.
- Get familiar with the basics of reinforcement learning from human feedback
- Extend an existing code base
- Choose your own focus:
  - User interface design
  - Feedback processing systems
  - Training machine learning models with human feedback
  - Performing studies with human subjects
- Choose and implement your own task/scenario of interest, e.g.
  - text generation
  - productivity tasks
Knowledge of reinforcement learning/RLHF is a big plus, but not a strict requirement.
- Scope: Bachelor/Master
- 3-month project, 3-month thesis, or 6-month thesis
- Start: immediately
- RLHF-Blender: A Configurable Interactive Interface for Learning from Diverse Human Feedback, Metz et al., 2023, https://arxiv.org/abs/2308.04332
- Deep reinforcement learning from human preferences, Christiano et al., 2017, https://arxiv.org/abs/1706.03741
- Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback, Casper et al., 2023, https://arxiv.org/abs/2307.15217