[XAI] Neural Network Capacity and Bottlenecks

Overview

Finding a good set of hyperparameters for a neural network is a complex problem that typically involves a time-consuming trial-and-error process. Often, the number of trainable parameters is chosen far larger than a specific task requires, leaving parts of the network effectively idle. Many recent works therefore focus on automated architecture search [1, 2] or pruning [3, 4] of neural networks. In contrast to such automated or trial-and-error-based approaches, this project aims to derive a reasonable network capacity from the amount of information stored in the dataset. This affects not only the total number of neurons in a network, but also its shape, i.e., how the neurons are distributed across the layers.

The goal of this project is to evaluate the information-theoretic entropy of the dataset and of the trained neural network model [5, 6]. Relating the two helps to choose an initial network capacity and shape. Furthermore, identifying bottlenecks [7] in the network shape is a relevant sub-problem, since a single layer with insufficient capacity can limit the performance of a network whose overall capacity would otherwise suffice.
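
A first, accessible handle on these quantities is Shannon entropy [5], H = -Σ p_i · log2(p_i), estimated from an empirical distribution. The following sketch (plain Python/NumPy; the synthetic 10-class labels are a stand-in for a real dataset) illustrates such an estimate for a dataset's label distribution; for roughly balanced classes it approaches log2(10) ≈ 3.32 bits.

  import numpy as np

  def shannon_entropy(counts):
      """H = -sum(p * log2(p)) over the non-empty bins, in bits."""
      p = counts / counts.sum()
      p = p[p > 0]
      return float(-(p * np.log2(p)).sum())

  # Stand-in labels for a 10-class dataset; a real dataset would be loaded instead.
  labels = np.random.randint(0, 10, size=60_000)
  counts = np.bincount(labels, minlength=10)
  print(f"label entropy: {shannon_entropy(counts):.3f} bits")

Label entropy alone is of course only a lower bound on what a dataset contains; researching richer entropy measures for datasets is explicitly part of the task list below.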

The project aims to reveal connections between

  • the entropy in the training dataset
  • the entropy in the trained model
  • the capacity of the hyperparameter space
  • the capacity of individual layers

Problem Statement

When choosing the hyperparameters of a neural network, information-theoretic considerations are rarely taken into account. It seems reasonable, however, to orient the capacity of the network towards the information content of the dataset it is supposed to represent. Yet accessible techniques for evaluating the saturation of neural network layers during training are still missing. This includes relating the capacity of individual layers to that of the other layers in the network, since a mismatch can create information bottlenecks.
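
One simple way to make layer saturation measurable, sketched below under the assumption that PyTorch is used, is to attach forward hooks and estimate the entropy of each layer's activations with a histogram (binning) estimator. The toy MLP, the deliberately narrow 16-unit middle layer, the random input batch, and the bin count are all illustrative assumptions, not part of the project specification.

  import numpy as np
  import torch
  import torch.nn as nn

  def histogram_entropy(t, bins=30):
      """Histogram-based Shannon entropy (in bits) of a flattened activation tensor."""
      counts, _ = np.histogram(t.detach().cpu().numpy().ravel(), bins=bins)
      p = counts / counts.sum()
      p = p[p > 0]
      return float(-(p * np.log2(p)).sum())

  # Toy MLP with an artificially narrow middle layer acting as a bottleneck.
  model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(),
                        nn.Linear(256, 16), nn.ReLU(),
                        nn.Linear(16, 10))

  activation_entropy = {}

  def make_hook(name):
      def hook(module, inputs, output):
          activation_entropy[name] = histogram_entropy(output)
      return hook

  for name, module in model.named_modules():
      if isinstance(module, nn.Linear):
          module.register_forward_hook(make_hook(name))

  model(torch.randn(512, 784))  # one batch of dummy inputs
  for name, h in activation_entropy.items():
      print(f"layer {name}: ~{h:.2f} bits")

A layer whose activation entropy falls clearly below that of its neighbours is a candidate information bottleneck in the sense of [7]; more careful estimators and measurements on real training data are exactly what the project is meant to investigate.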

Tasks

  • Get familiar with TensorFlow / PyTorch
  • Train models of different complexity on example datasets
  • Save the weights of the trained models
  • Research entropy measures for datasets
  • Evaluate the entropy (information content) of
    • the dataset
    • the saved weights
  • Analyze the connections between these information contents
  • Compare the information content of individual layers (see the sketch after this list) to
    • identify bottlenecks
    • find over-sized layers
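
A minimal sketch of the last two tasks, assuming PyTorch and the same histogram estimator as above: load saved weights and compare the estimated information content per layer. The checkpoint name "trained_model.pt", the 256-bin histogram, and the restriction to 2-D weight tensors are placeholders for illustration.

  import numpy as np
  import torch

  def weight_entropy_bits(w, bins=256):
      """Shannon entropy (bits per weight) of a weight tensor, via histogram binning."""
      counts, _ = np.histogram(w.cpu().numpy().ravel(), bins=bins)
      p = counts / counts.sum()
      p = p[p > 0]
      return float(-(p * np.log2(p)).sum())

  # "trained_model.pt" is a hypothetical checkpoint produced by the training task above.
  state = torch.load("trained_model.pt", map_location="cpu")
  for name, w in state.items():
      if w.ndim < 2:  # skip biases and other 1-D parameters
          continue
      bits = weight_entropy_bits(w)
      print(f"{name:35s} {bits:5.2f} bits/weight   ~{bits * w.numel() / 8 / 1024:8.1f} KiB total")

Layers whose total estimated content lies far below that of their neighbours are bottleneck candidates; layers far above what the dataset's entropy suggests are candidates for over-sized layers.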

Requirements

  • Programming skills in Python
    (preferably also with PyTorch or TensorFlow)
  • Basic knowledge of neural networks
  • Interest in mathematics

Scope/Duration/Start

  • Scope: Bachelor/Master
  • Duration: 6-month project, 3-month thesis (Bachelor) / 6-month thesis (Master)
  • Start: immediately

Contact

References

[1] T. Elsken, J. H. Metzen, and F. Hutter, “Neural Architecture Search: A Survey,” Journal of Machine Learning Research, vol. 20, no. 55, pp. 1–21, 2019.
[2] M. Wistuba, A. Rawat, and T. Pedapati, “A Survey on Neural Architecture Search,” CoRR, vol. abs/1905.01392, 2019.
[3] M. Zhu and S. Gupta, “To prune, or not to prune: exploring the efficacy of pruning for model compression,” CoRR, vol. abs/1710.01878, 2017.
[4] J. Frankle and M. Carbin, “The Lottery Ticket Hypothesis: Training Pruned Neural Networks,” CoRR, vol. abs/1803.03635, 2018.
[5] C. E. Shannon, “A Mathematical Theory of Communication,” Bell System Technical Journal, vol. 27, no. 3, pp. 379–423, Jul. 1948.
[6] M. Borda, Fundamentals in Information Theory and Coding. Springer-Verlag GmbH, 2011.
[7] N. Tishby and N. Zaslavsky, “Deep Learning and the Information Bottleneck Principle,” CoRR, vol. abs/1503.02406, 2015.