AI-Cafe presents: The Information Bottleneck Principle for Analysis and Design of Neural Classifiers
The information bottleneck principle, a mathematical formulation of Occam's Razor, aims to create latent representations that are sufficient for a task and maximally compressed, i.e., a minimal sufficient statistic. In this talk, we first critically reflect on the application of the information bottleneck principle in deep learning, addressing the question of whether and how compression can be connected to generalization performance. We discuss theoretical, experimental, and engineering evidence in the form of non-vacuous generalization bounds, information plane analyses, and neural classifiers successfully trained using the information bottleneck principle. Taken together, these three perspectives suggest that compressed representations help improve generalization and robustness.
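For readers unfamiliar with the formalism, the trade-off between sufficiency and compression sketched above is commonly written as the IB Lagrangian (the standard formulation; Z is the latent representation of input X, Y the task variable, and beta a trade-off parameter):

```latex
% Information bottleneck objective: learn a stochastic encoder p(z|x)
% that compresses X (small I(X;Z)) while staying predictive of Y (large I(Z;Y)).
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y)
```

Larger values of beta place more weight on sufficiency for the task; smaller values enforce stronger compression.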
In the second, shorter part of the talk, we argue that the (variational) approaches used to implement the intractable information bottleneck objective can also be used successfully to implement other information-theoretic objectives. We make this concrete with the example of invariant representation learning for fair classification. We show that the resulting method has interesting and desirable properties, suggesting that information-theoretic objectives can be useful ingredients for deep learning.
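As an illustration of the variational approaches mentioned above, the deep variational information bottleneck replaces the intractable mutual information terms with tractable bounds. A sketch of the resulting training loss (here q is a variational decoder and r a prior over latents, both assumptions of this particular variational construction rather than details of the talk):

```latex
% Variational IB loss to be minimized: a cross-entropy-like prediction term
% plus a KL term that upper-bounds the compression term I(X;Z).
\mathcal{L}_{\mathrm{VIB}}
  = \mathbb{E}_{p(x,y)} \Big[
      \mathbb{E}_{p(z \mid x)} \big[ -\log q(y \mid z) \big]
      \;+\; \beta \, \mathrm{KL}\!\big( p(z \mid x) \,\big\|\, r(z) \big)
    \Big]
```

Swapping the prediction and penalty terms for other information-theoretic quantities, e.g., an invariance constraint with respect to a sensitive attribute, yields objectives of the kind discussed in this part of the talk.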

