AI for Science with Qichao Hu (Molecular Universe / SES AI)


Most AI-for-science companies are selling shovels. Qichao Hu wants the gold.
In this episode, we talk with Qichao, the founder and CEO of Molecular Universe, the AI-for-science platform that grew out of SES AI, a high-energy-density battery developer he's run for fourteen years. His core distinction is that companies from the AI world build tools, such as foundation models that predict properties, while companies from the science world care about the final product, such as the new battery or material that actually ships. Molecular Universe sits firmly on the science side, and the difference shows up everywhere from what they publish to what they refuse to.
We get into the actual workflow of materials discovery and where AI compresses it. A single trial in a traditional lab can take a year with maybe a 40% success rate; the goal is to run a thousand candidates in parallel and turn that year into a week. Qichao walks through improving low-temperature fast-charging for EV batteries: from hypothesis generation through molecule-, material-, and device-level property prediction, down to autonomous labs that synthesize and test the top candidates without a human touching a pipette.
The hardest problem, it turns out, isn't predicting molecular properties or measuring device performance; it's the black box connecting the two. In batteries, that's the solid-electrolyte interphase, which the field has been hand-waving about since the seventies. And what stands in the way of cracking it isn't the lack of a clever training trick but data: companies sitting on twenty years of records are finding them too messy, incomplete, and poorly labeled to train on, and are having to start collecting from scratch with new protocols and robots.
Timeline:
- 00:13 — Intro and welcome
- 01:19 — Shovel vs. gold
- 05:18 — Why the world's smartest scientist doesn't automatically give you a better battery
- 07:25 — The discovery workflow
- 09:37 — Exploration vs. exploitation
- 11:54 — Safety and filtering: screening novel molecules against banned and toxic-substance lists
- 17:55 — How hypotheses get generated, and where frontier LLMs help
- 20:29 — From hypothesis to ~400 formulations: property prediction, ranking, and handing off to autonomous labs
- 26:37 — "A foundation model for everything" — and the black box between molecular properties and device performance
- 30:01 — World models and physics
- 33:09 — The great unknown in batteries
- 37:08 — Simulation vs. reality: calibrating massive simulated datasets with a sliver of experimental data
- 41:47 — Lab robotics: how fast the hardware has caught up, and what a floor of autonomous labs looks like
- 43:50 — The real bottlenecks
- 50:21 — Pre-training from scratch vs. post-training LLMs, and why training tricks haven't reduced the need for good data
- 52:42 — Evaluation
- 55:42 — Publish the B+ model, keep the A model
- 58:05 — Five years out
- 1:00:37 — Closing thoughts and wrap
Music:
- "Kid Kodi" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
About: The Information Bottleneck is hosted by Ravid Shwartz-Ziv and Allen Roush, featuring in-depth conversations with leading AI researchers about the ideas shaping the future of machine learning.