High-dimensional data is difficult for human analysts to access and understand. Visualization researchers have developed several techniques to represent high-dimensional data on screens. One common technique is the scatter plot, which shows a projection or embedding of the data points in two-dimensional space. While the number of available techniques has grown into the hundreds, it remains largely unclear which techniques are most suitable for visualization when an analyst pursues a specific task. Common tasks include inspecting clustering results, classifying data points, looking for structure in an unknown dataset, and inspecting the neighborhood of individual data points in more detail. Techniques such as PCA, MDS, Isomap, and, more recently, t-SNE are widely applied. Although several measures are available to assess the quality of the resulting low-dimensional reductions, it is still not clear which measure performs best for which task.
- Implement quality measures with a unified interface.
- Automatically test and compare different measures.
- Present initial results on the suitability of the measures you implemented for pursuing selected tasks.
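To make the goals above more concrete, the following is a minimal sketch of what a unified interface for quality measures could look like. It is only an illustration under stated assumptions: the names `QualityMeasure`, `MEASURES`, and `compare` are hypothetical, and the two measures shown (one common variant of normalized stress and a simple k-nearest-neighbor overlap) are examples, not the measures the project prescribes. It uses only the Python standard library (Python 3.8 or later for `math.dist`).

```python
import math
from typing import Callable, Dict, List, Sequence

Points = Sequence[Sequence[float]]
# Hypothetical unified interface: every measure maps
# (high-dimensional data, low-dimensional embedding) to a single score.
QualityMeasure = Callable[[Points, Points], float]


def pairwise_distances(points: Points) -> List[List[float]]:
    """Full Euclidean distance matrix of a point set."""
    return [[math.dist(p, q) for q in points] for p in points]


def normalized_stress(high: Points, low: Points) -> float:
    """One variant of normalized stress; 0 means all pairwise distances are preserved."""
    d_high, d_low = pairwise_distances(high), pairwise_distances(low)
    n = len(high)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    num = sum((d_high[i][j] - d_low[i][j]) ** 2 for i, j in pairs)
    den = sum(d_high[i][j] ** 2 for i, j in pairs)
    return math.sqrt(num / den)


def knn_sets(points: Points, k: int) -> List[set]:
    """Index sets of the k nearest neighbors of each point (ties broken by index)."""
    dist = pairwise_distances(points)
    n = len(points)
    return [
        set(sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k])
        for i in range(n)
    ]


def neighborhood_preservation(high: Points, low: Points, k: int = 2) -> float:
    """Average fraction of k nearest neighbors shared by both spaces; 1 is best."""
    nn_high, nn_low = knn_sets(high, k), knn_sets(low, k)
    shared = sum(len(a & b) for a, b in zip(nn_high, nn_low))
    return shared / (k * len(high))


# Registry of measures behind the common interface (illustrative selection).
MEASURES: Dict[str, QualityMeasure] = {
    "normalized_stress": normalized_stress,
    "neighborhood_preservation": neighborhood_preservation,
}


def compare(high: Points, embeddings: Dict[str, Points]) -> Dict[str, Dict[str, float]]:
    """Evaluate every registered measure on every candidate embedding."""
    return {
        name: {m: fn(high, emb) for m, fn in MEASURES.items()}
        for name, emb in embeddings.items()
    }


if __name__ == "__main__":
    # Toy data: four 3D points and two hand-made 2D embeddings.
    data = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (3, 3, 0)]
    candidates = {
        "drop_z": [(0, 0), (1, 0), (0, 2), (3, 3)],
        "x_only": [(0, 0), (1, 0), (0, 0), (3, 0)],
    }
    for name, scores in compare(data, candidates).items():
        print(name, scores)
```

Keeping each measure as a plain function with the same signature means that new measures only need to be added to the registry, and the comparison loop can then be run automatically over any set of projections (for example, outputs of PCA, MDS, Isomap, or t-SNE).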
The project may be combined with the student seminar. In the seminar, you would compile a set of relevant tasks and identify potentially suitable quality measures.
- [Nonato2018] Multidimensional Projection for Visual Analytics: Linking Techniques with Distortions, Tasks, and Layout Enrichment
- [Liu2015] Visualizing High-Dimensional Data: Advances in the Past Decade
- [Etematpour2016] Choosing Visualization Techniques for Multidimensional Data Projection Tasks: A Guideline with Examples
- [vanderMaaten2009] Dimensionality Reduction: A Comparative Review