EP15: The Information Bottleneck and Scaling Laws with Alex Alemi
In this episode, we sit down with Alex Alemi, an AI researcher at Anthropic (previously at Google Brain and Disney), to explore the powerful framework of the information bottleneck and its profound implications for modern machine learning.
We break down what the information bottleneck really means: a principled approach to retaining only the most informative parts of the data while compressing away what's irrelevant. We discuss why compression still matters in the era of big data, how it prevents overfitting, and why it's essential for building models that generalize well.
We also dive into scaling laws: why they matter, what we can learn from them, and what they tell us about the future of AI research.
Papers and links:
- Alex's website - https://www.alexalemi.com/
- Scaling exponents across parameterizations and optimizers - https://arxiv.org/abs/2407.05872
- Deep Variational Information Bottleneck - https://arxiv.org/abs/1612.00410
- Layer by Layer: Uncovering Hidden Representations in Language Models - https://arxiv.org/abs/2502.02013
- Information in Infinite Ensembles of Infinitely-Wide Neural Networks - https://proceedings.mlr.press/v118/shwartz-ziv20a.html
Music:
“Kid Kodi” — Blue Dot Sessions — via Free Music Archive — CC BY-NC 4.0.
“Palms Down” — Blue Dot Sessions — via Free Music Archive — CC BY-NC 4.0.