Stable Diffusion is a state-of-the-art model for text-to-image generation. Taking a textual description as input, e.g., "a photograph of an astronaut riding a horse", the model iteratively refines an image to match this description. While the results can be astonishing (see figure 1), they are, at the same time, serendipitous and unpredictable.
It often remains unclear
- how a text prompt has to be refined to reach a desired output,
- which part of the input prevents the algorithm from producing good results, or
- how the local search space is distributed, i.e., what the close neighborhood of an input sentence looks like in terms of image outputs.
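One way to explore such a prompt neighborhood is to enumerate prompts that differ from the input in exactly one word and render each variant. The sketch below illustrates this idea in plain Python; the function name `prompt_neighborhood` and the substitution table are hypothetical choices, not part of the project specification.

```python
def prompt_neighborhood(prompt: str, substitutions: dict[str, list[str]]) -> list[str]:
    """Enumerate close neighbors of a prompt by swapping single words.

    `substitutions` maps a word of the prompt to alternative words; each
    neighbor differs from the original prompt in exactly one position.
    """
    words = prompt.split()
    neighbors = []
    for i, word in enumerate(words):
        for alt in substitutions.get(word, []):
            neighbors.append(" ".join(words[:i] + [alt] + words[i + 1:]))
    return neighbors

neighbors = prompt_neighborhood(
    "a photograph of an astronaut riding a horse",
    {"photograph": ["painting", "sketch"], "horse": ["camel"]},
)
```

Feeding each neighbor to the model and laying out the resulting images side by side would give a first visual impression of how sensitive the output is to small prompt changes.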
The goal of this project is to develop a visual analytics tool that produces images from textual input with a pre-trained Stable Diffusion model. Starting from those images, the user should be able to further refine the inputs to steer the outputs towards the desired results.
How can the state-of-the-art text-to-image model Stable Diffusion be leveraged to create an interactive VA workflow for the user-oriented creation of imagery?
- Implement a VA system with a front- and backend to interactively query a Stable Diffusion model
- Enable users to provide textual (and, optionally, image) input
- Implement a feedback loop to steer the results of the model interactively
- Experiment with the system to generate interesting insights into the model's behavior
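The feedback loop from the task list could be structured as follows: the backend generates an image for each prompt (in practice via a Stable Diffusion pipeline such as the one in Hugging Face `diffusers`), the user inspects it in the frontend and submits a refined prompt, and the session records the history for visualization. The skeleton below is a minimal sketch of this loop; `FeedbackSession` is a hypothetical name, and the lambda stands in for the actual model backend.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FeedbackSession:
    """Minimal sketch of the interactive refinement loop."""

    generate: Callable[[str], bytes]  # e.g. a wrapper around a diffusers pipeline
    history: list = field(default_factory=list)  # list of (prompt, image) pairs

    def step(self, prompt: str) -> bytes:
        """Generate an image for the prompt and record it in the history."""
        image = self.generate(prompt)
        self.history.append((prompt, image))
        return image


# Stub generator standing in for the Stable Diffusion backend.
session = FeedbackSession(generate=lambda p: f"<image for: {p}>".encode())
session.step("a photograph of an astronaut riding a horse")
session.step("a photograph of an astronaut riding a horse, studio lighting")
```

Keeping the full prompt/image history in the session is what allows the frontend to show the refinement trajectory and to let users backtrack to an earlier prompt.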
- High interest in the topic and knowledge of deep learning are useful
- Basic knowledge of information visualization and data mining
- Good programming skills
- Scope: Preferably Master, Bachelor might be possible in some cases
- Duration: 3 months project, 6 months thesis (Master)
- Start: immediately