Task-driven projection of high-dimensional data

Theoretical (Analytical):

Practical (Implementation):

Literature Work:


Overview

High-dimensional data is difficult for human analysts to access and understand. Visualization researchers have developed several techniques to represent high-dimensional data on screen. One common technique is the scatter plot, which shows a projection or embedding of the data points in two-dimensional space. Techniques such as PCA, MDS, Isomap, and, more recently, t-SNE are widely applied. While the number of available techniques has grown into the hundreds, it remains largely unclear which of them are most suitable when an analyst pursues a specific task. Common tasks include inspecting clustering results, classifying data points, looking for structure in an unknown dataset, and inspecting the neighborhood of individual data points in more detail. Several measures are available to assess the quality of the resulting low-dimensional representations, but it is still not clear which measure performs best for which task.
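To make the notion of a quality measure concrete, the following minimal sketch projects a small dataset with the four techniques named above and scores each result with trustworthiness, one of many possible measures, as implemented in scikit-learn. The synthetic dataset, the neighborhood size of 10, and the variable names are assumptions chosen for illustration only; they are not part of the project description.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import MDS, TSNE, Isomap, trustworthiness

# Assumed stand-in data: 200 points in 20 dimensions that lie close to a
# 3-dimensional linear subspace (not a dataset used in the project).
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
X = latent @ rng.normal(size=(3, 20)) + 0.05 * rng.normal(size=(200, 20))

# The four projection techniques mentioned in the overview.
projections = {
    "PCA": PCA(n_components=2),
    "MDS": MDS(n_components=2, n_init=1, random_state=0),
    "Isomap": Isomap(n_neighbors=10, n_components=2),
    "t-SNE": TSNE(n_components=2, random_state=0),
}

# Trustworthiness lies in [0, 1]; higher values mean that points that are
# neighbors in the 2D plot also tend to be neighbors in the original space.
scores = {}
for name, model in projections.items():
    Y = model.fit_transform(X)
    scores[name] = trustworthiness(X, Y, n_neighbors=10)

for name, score in scores.items():
    print(f"{name}: trustworthiness = {score:.3f}")
```

A single score like this says little about task suitability on its own: trustworthiness focuses on local neighborhoods, so it may be informative for neighborhood inspection while being less informative for, say, judging global cluster layout.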

Tasks

  • Implement quality measures with a unified interface.
  • Automatically test and compare different measures.
  • Present initial results on the suitability of the measures you implemented for pursuing selected tasks.
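One possible shape for the first two tasks is sketched below, under the assumption that every quality measure can be written as a function taking the high-dimensional data and its low-dimensional embedding and returning a single number. The names `QualityMeasure`, `trust`, `continuity`, `shepard_correlation`, and `compare` are hypothetical, as are the synthetic data and the choice of measures; measures that need extra input such as class labels would require an extended interface.

```python
from typing import Callable, Dict

import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap, trustworthiness

# Hypothetical unified interface: (high-dim data, low-dim embedding) -> score.
QualityMeasure = Callable[[np.ndarray, np.ndarray], float]


def trust(X_high: np.ndarray, X_low: np.ndarray, k: int = 10) -> float:
    """Penalizes points that are neighbors only in the embedding."""
    return float(trustworthiness(X_high, X_low, n_neighbors=k))


def continuity(X_high: np.ndarray, X_low: np.ndarray, k: int = 10) -> float:
    """Trustworthiness with the roles of the two spaces swapped."""
    return float(trustworthiness(X_low, X_high, n_neighbors=k))


def shepard_correlation(X_high: np.ndarray, X_low: np.ndarray) -> float:
    """Spearman rank correlation between pairwise distances in both spaces."""
    rho, _ = spearmanr(pdist(X_high), pdist(X_low))
    return float(rho)


def compare(X: np.ndarray, projections: dict,
            measures: Dict[str, QualityMeasure]) -> Dict[str, Dict[str, float]]:
    """Apply every measure to every projection of X."""
    results = {}
    for p_name, model in projections.items():
        X_low = model.fit_transform(X)
        results[p_name] = {m_name: fn(X, X_low) for m_name, fn in measures.items()}
    return results


# Assumed synthetic data, for illustration only.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
X = latent @ rng.normal(size=(3, 20)) + 0.05 * rng.normal(size=(200, 20))

measures = {
    "trustworthiness": trust,
    "continuity": continuity,
    "shepard": shepard_correlation,
}
projections = {
    "PCA": PCA(n_components=2),
    "Isomap": Isomap(n_neighbors=10, n_components=2),
}

results = compare(X, projections, measures)
for p_name, row in results.items():
    print(p_name, {m: round(v, 3) for m, v in row.items()})
```

Keeping each measure a plain function makes it straightforward to add further measures or projections to the dictionaries and rerun the comparison automatically.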

The project may be combined with the student seminar. In the seminar you would compile a set of relevant tasks and identify potentially suitable quality measures.

Requirements

  • Programming in Python (knowledge of scikit-learn preferred)

Scope/Duration/Start

  • Scope: Bachelor/Master
  • Duration: 6-month project, 3-month thesis (Bachelor) / 6-month thesis (Master)
  • Start: Immediately

Contact

References