Visualizing the role of diversity in ensembles of classifiers

Theoretical (Analytical):

Practical (Implementation):

Literature Work:


Problem Statement

Classifiers are among the most widely used machine-learning models, with applications ranging from spam filtering and face detection to fraud detection. A group of classifiers whose outputs are combined into a single prediction often performs better than the individual models on their own. In this context, diversity among the models (e.g., classifiers that make different errors) is known to usually favor ensemble performance. However, this does not always hold: it depends on the data, the classification task at hand, and the models. This thesis aims to produce visual representations of ensembles built from distinct classifier models. We will work with several datasets and distinct classification problems. Ultimately, we want to search for patterns that help explain the role of diversity in ensembles of classifiers.
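
To make the notion of diversity concrete, the following minimal Java sketch combines three binary classifiers by majority vote and computes a pairwise disagreement score. Everything in it is an illustrative assumption rather than part of the thesis specification: the class name DiversityDemo, the toy labels, and the choice of the disagreement measure, which is only one of several diversity measures in the literature.

```java
/** Toy illustration (hypothetical data): majority voting and pairwise disagreement. */
final class DiversityDemo {

    /** Majority vote; predictions[m][i] is model m's binary label (0 or 1) for sample i. */
    static int[] majorityVote(int[][] predictions) {
        int samples = predictions[0].length;
        int[] voted = new int[samples];
        for (int i = 0; i < samples; i++) {
            int ones = 0;
            for (int[] model : predictions) {
                ones += model[i];
            }
            // Strict majority for class 1; a tie (even number of models) falls back to class 0.
            voted[i] = (2 * ones > predictions.length) ? 1 : 0;
        }
        return voted;
    }

    /** Fraction of samples whose predicted label matches the true label. */
    static double accuracy(int[] predicted, int[] truth) {
        int correct = 0;
        for (int i = 0; i < truth.length; i++) {
            if (predicted[i] == truth[i]) {
                correct++;
            }
        }
        return (double) correct / truth.length;
    }

    /** Disagreement: fraction of samples on which two models output different labels. */
    static double disagreement(int[] a, int[] b) {
        int differing = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) {
                differing++;
            }
        }
        return (double) differing / a.length;
    }

    public static void main(String[] args) {
        int[] truth = {1, 1, 1, 0, 0, 0};
        int[][] predictions = {
            {1, 1, 0, 0, 0, 1}, // wrong on samples 2 and 5
            {1, 0, 1, 0, 1, 0}, // wrong on samples 1 and 4
            {0, 1, 1, 1, 0, 0}, // wrong on samples 0 and 3
        };
        for (int[] model : predictions) {
            System.out.println("single-model accuracy: " + accuracy(model, truth));
        }
        System.out.println("majority-vote accuracy: "
                + accuracy(majorityVote(predictions), truth));
        System.out.println("disagreement(0, 1): "
                + disagreement(predictions[0], predictions[1]));
    }
}
```

In this toy data each model is correct on 4 of 6 samples, but because no two models err on the same sample, the majority vote is correct on all 6. If the three models made identical errors instead, the vote would be no better than any single model, which is the kind of contrast the visualizations are meant to expose.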

Tasks

  • Get familiar with classification models and ensembles of classifiers
  • Experiment with different visualizations of classification models, extracting for each model features such as performance metrics and diversity scores, among others
  • Implement basic search algorithms that select the best combination of models from large model libraries, to generate the ensembles that will be further visualized and explored
  • Implement a basic interactive visualization that allows the exploration of the collection of ensembles defined for this study, aiming to construct visual explanations that help to understand the role of diversity in ensembles of classifiers
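
As one possible starting point for the search task above, the sketch below shows a greedy forward selection over a model library, scored by majority-vote accuracy on a validation set. The posting does not prescribe a search algorithm, so the strategy, the class name GreedyEnsembleSelection, and the scoring rule are all assumptions made for illustration; a greedy search is also not guaranteed to find the best subset.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: greedy forward selection of ensemble members. */
final class GreedyEnsembleSelection {

    /**
     * Majority-vote accuracy of the chosen members on binary labels.
     * predictions[m][i] is model m's label (0 or 1) for validation sample i;
     * ties are resolved in favor of class 0.
     */
    static double voteAccuracy(List<Integer> members, int[][] predictions, int[] truth) {
        int correct = 0;
        for (int i = 0; i < truth.length; i++) {
            int ones = 0;
            for (int m : members) {
                ones += predictions[m][i];
            }
            int voted = (2 * ones > members.size()) ? 1 : 0;
            if (voted == truth[i]) {
                correct++;
            }
        }
        return (double) correct / truth.length;
    }

    /**
     * Adds, one at a time, the model that yields the highest vote accuracy,
     * up to maxSize members, and returns the best-scoring ensemble seen on the way.
     * Growing past a non-improving step matters because even-sized ensembles
     * can score worse than the odd-sized ensemble that follows them.
     */
    static List<Integer> select(int[][] predictions, int[] truth, int maxSize) {
        List<Integer> current = new ArrayList<>();
        List<Integer> best = new ArrayList<>();
        double bestScore = -1.0;
        int limit = Math.min(maxSize, predictions.length);
        while (current.size() < limit) {
            int bestCandidate = -1;
            double bestCandidateScore = -1.0;
            for (int m = 0; m < predictions.length; m++) {
                if (current.contains(m)) {
                    continue;
                }
                current.add(m);                      // try candidate m
                double score = voteAccuracy(current, predictions, truth);
                current.remove(current.size() - 1);  // undo the trial (removes by index)
                if (score > bestCandidateScore) {
                    bestCandidateScore = score;
                    bestCandidate = m;
                }
            }
            current.add(bestCandidate);
            if (bestCandidateScore > bestScore) {
                bestScore = bestCandidateScore;
                best = new ArrayList<>(current);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] truth = {1, 1, 1, 0, 0, 0};
        int[][] library = {
            {1, 1, 0, 0, 0, 1},
            {1, 0, 1, 0, 1, 0},
            {0, 1, 1, 1, 0, 0},
            {0, 0, 0, 0, 0, 0}, // always predicts class 0
        };
        List<Integer> chosen = select(library, truth, 3);
        System.out.println("selected models: " + chosen);
        System.out.println("vote accuracy: " + voteAccuracy(chosen, library, truth));
    }
}
```

On this toy library the search selects models 0, 1, and 2 and leaves out the constant predictor. Scoring on the same data used for selection tends to overfit, so a real study would evaluate the selected ensembles on a separate test set.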

Requirements

  • Good knowledge of the Java programming language
  • Basic knowledge of, or an interest in learning about, machine-learning classifiers

Scope/Duration/Start

  • Scope: Bachelor (3-month project + 3-month thesis)

Contact

References

  • Zhou, Zhi-Hua. Ensemble methods: foundations and algorithms. CRC Press, 2012.
  • Brown, Gavin, and Ludmila I. Kuncheva. "Good" and "bad" diversity in majority vote ensembles. International Workshop on Multiple Classifier Systems. Springer Berlin Heidelberg, 2010.
  • Talbot, Justin, et al. EnsembleMatrix: interactive visualization to support machine learning with multiple classifiers. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 2009.