49
June 23, 2026
We talk a lot on this show about RL, agents, and the move between pre-training and post-training, but not enough about the layer everything actually runs on. Benny Chen, co-founder of Fireworks AI, one of the largest inference platforms around, walks us through what it takes to serve models at scale: sourcing GPUs, writing the kernels, the runtime, and the routing layer that lets a customer hit one endpoint and forget the rest. We talk about why the real bottleneck is power, not chips, and why that fa