EP21: Privacy in the Age of Agents
Guest: Niloofar Mireshghallah (Incoming Assistant Professor at CMU, Member of Technical Staff at Humans and AI)
In this episode, we dive into AI privacy, frontier model capabilities, and why academia still matters.
We kick off by discussing GPT-5.2 and whether models rely more on parametric knowledge or context. Niloofar shares how reasoning models actually defer to context, even accepting obviously false information to "roll with it."
On privacy, Niloofar challenges conventional wisdom: memorization is no longer the main threat. The real risks are aggregation attacks (piecing together a pet's name from HTML metadata), inference attacks (models are expert geoguessers), and input-output leakage in agentic workflows.
We also explore linguistic colonialism in AI: how models fail for non-English languages, sometimes inventing cultural traditions outright.
The episode wraps with a call for researchers to tackle problems industry ignores: AI for science, education tools that preserve the productive struggle of learning, and privacy-preserving collaboration between small local models and large commercial ones.
Timeline
[0:00] Intro
[1:03] GPT-5.2 first impressions and skepticism about the data cutoff claims
[4:17] Parametric vs. context memory—when do models trust training vs. the prompt?
[9:28] The messy problem of memory, weights, and online learning
[16:12] Tool use changes model behavior in unexpected ways
[17:15] OpenAI's "Advances in Sciences" paper and human-AI collaboration
[24:17] Why deep research is getting less useful
[28:17] Pre-training vs. post-training—which matters more?
[30:35] Non-English languages and AI failures
[33:23] Hilarious Farsi bugs: "I'll get back to you in a few days" and invented traditions
[37:56] Linguistic colonialism—ChatGPT changed how we write
[41:20] Why memorization isn't the real privacy threat
[47:14] The three actual privacy problems: inference, aggregation, input-output leakage
[54:33] Deep research stalking experiment—finding a cat's name in HTML
[1:01:13] Privacy solutions for agentic systems
[1:03:23] What Niloofar's excited about: AI for scientists, small models, niche problems
[1:08:31] AI for education without killing the learning process
[1:09:15] Closing: underrated life advice on health and sustainable habits
Music:
"Kid Kodi" — Blue Dot Sessions — via Free Music Archive — CC BY-NC 4.0.
"Palms Down" — Blue Dot Sessions — via Free Music Archive — CC BY-NC 4.0.
Changes: trimmed
About
The Information Bottleneck is hosted by Ravid Shwartz-Ziv and Allen Roush, featuring in-depth conversations with leading AI researchers about the ideas shaping the future of machine learning.