Exploration of Datasets for Visualizations of High-Dimensional Data

Theoretical (Analytical):

Practical (Implementation):

Literature Work:


Which dataset is appropriate for evaluating a visualization of high-dimensional data?

In visualization research, the application and usefulness of a novel technique are often demonstrated on an arbitrarily chosen dataset. These datasets typically differ significantly in their data distributions, dimensionality, and number of data records. A fair comparison with competing approaches is therefore often not possible, because data characteristics, problems, and user tasks differ.

The goal of this project is to give an overview of the most commonly used datasets and their applications, and to give researchers guidance in selecting an appropriate dataset for a given analysis task.


Tasks:

  • Literature review to extract commonly used datasets together with their applications
  • Development of an interactive website to explore and visualize the datasets
  • Development of meta-visualizations of the used datasets


Requirements:

  • Good knowledge of information visualization
  • Motivation to search for and read many scientific papers about high-dimensional visualizations
  • Good programming skills in Java and JavaScript


  • Scope: Bachelor/Master
  • 6-month project, 3-month thesis (Bachelor) / 6-month thesis (Master)
  • Start: immediately



References:

  • Generative Data Models for Validation and Evaluation of Visualization Techniques [Schulz et al., 2016]
  • Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA:
    University of California, School of Information and Computer Science.
  • SOURCESIGHT: Enabling Effective Source Selection [Rekatsinas et al., 2016]
  • An Interactive Data Repository with Visual Analytics [Rossi and Ahmed, 2016]