May 4, 2026

Inside xAI, and the Bet on AI Math - with Christian Szegedy (Math Inc)

The Information Bottleneck

We talked with Christian Szegedy, co-inventor of Inception and Batch Normalization, founding scientist at xAI, now at Math Inc, about what it takes to build a frontier lab, and why he left xAI to work on formal mathematics. Christian thinks Lean and auto-formalization are the missing piece for trustworthy AI: a machine-checkable layer underneath all reasoning, where proofs are guaranteed correct without anyone having to read them.

We got into his bet with François Chollet that AI will hit superhuman mathematician level by 2026, and what that actually unlocks beyond math itself: verified software instead of vibe-coded apps that break when you refactor, AI systems you can actually trust because their reasoning is checkable, and a path to handling protein folding, chemistry, and parts of biology with real guarantees instead of hand-waving. Christian also walked us through how Math Inc's Gauss system pulled off a proof in two weeks that human experts had estimated would take another year.

We also covered xAI's first 12-person year, why Christian no longer buys the original batch normalization story, why he's sure transformers won't be the dominant architecture in five years, what mathematicians do in a world of cheap proofs, and his take on whether humanity will handle AI well. He distrusts humanity more than he distrusts AI.


Timeline

00:12 — Intros: Christian's background (Inception, Batch Norm, xAI, Math Inc)

01:29 — Building a frontier lab from scratch: the first 12 people at xAI

04:15 — Hiring for proven track records when 200K GPUs are at stake

06:07 — Elon's "dependency graph" and balancing long-term vision with investor demos

07:28 — Gauss formalizes the strong prime number theorem in 2 weeks

12:25 — What "formalization" actually means (and why it's not what most people think)

14:39 — Why Lean gives 100% certainty and why that matters for RL

15:26 — ProofBridge and joint embeddings across mathematical subfields

18:07 — Does math formalization transfer to coding and other fields?

21:44 — Can every domain be mathematized?

23:14 — Verified software, chip design, and why vibe-coded apps are dangerous

26:35 — Scaling Mathlib by 100–1000x

28:27 — Artisan formalizers vs. invisible machine-language formalists

33:26 — Can verification generalize?

45:19 — Revisiting Batch Norm: covariate shift, loss landscape, and what really happens

48:22 — Is normalization even necessary?

50:10 — What's actually fundamental in modern AI architectures

51:41 — Why Christian thinks transformers won't last 5 years

52:38 — The 2026 superhuman AI mathematician bet

55:15 — What's missing: better verification + a much larger formalized math repository

56:13 — Lean vs. Coq vs. HOL Light - does the proof assistant actually matter?

59:26 — The role of mathematicians in 5–10 years

1:02:00 — A human element to mathematics: Newton, Leibniz, and competitive proving

1:03:25 — The telescope analogy: AI as the instrument that lets us see the math universe

1:05:19 — Job apocalypse or Jevons paradox?

1:08:41 — Advice for students

1:09:50 — Can we formally verify AI alignment?

1:11:52 — Closing thanks


Music:

  • "Kid Kodi" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
  • "Palms Down" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
  • Changes: trimmed

About: The Information Bottleneck is hosted by Ravid Shwartz-Ziv and Allen Roush, featuring in-depth conversations with leading AI researchers about the ideas shaping the future of machine learning.

Ravid Shwartz-Ziv: Hi everyone, and welcome back to The Information Bottleneck. Today we have a great guest, Christian, an AI researcher with a long history in AI and deep learning. He co-invented Inception and batch normalization, he co-founded xAI, and then he left to found Math Inc. And here he is. Hi, hi Christian. How are you?

 

Christian Szegedy: Hey, it's my pleasure to talk to you. So happy to be on your podcast.

 

Ravid Shwartz-Ziv: Thanks for coming and as always, Allen, hey Allen!

 

Allen Roush: Nice to see you, Ravid, and nice to meet you, Christian. It is a real honor.

 

Christian Szegedy: Thank you.

 

Ravid Shwartz-Ziv: So today we will talk about different topics, including how to build a frontier lab, what that means specifically for math, what we actually think about AI and math, and how we see the future. So let's start with xAI. You co-founded xAI, I think, three years ago, right?

 

Christian Szegedy: Yes.

 

Ravid Shwartz-Ziv: Something like that. So let's focus — I want to focus on what I think is the interesting question: how do you actually build a frontier lab? What types of people, and what types of directions and paths, do you need in order to build an organization that in the end will make a great model?

 

Christian Szegedy: Yes, I mean, I would consider myself more a founding scientist or a founding engineer than a co-founder, really. Elon was doing most of the organization, he was the CEO, and he basically drove the whole company in a sense, and maybe Igor and Jimmy were helping much more than I did. Another remark: Elon would hate this notion of a lab. He didn't think of xAI as a lab. He thought of it more as an entity that drives engineering and creates the best AI in the world, and helps accomplish a lot of other engineering goals. So every lab is different, or every entity is different, and it's very different from, let's say, what Yann LeCun started, or what Anthropic is, or how OpenAI started. All of them were very unique in their own ways, so I think there is no real pattern there. What xAI really needed, or what we tried to achieve in the early days, was to get people around who had a proven track record, knew what they did, and already had experience with various technologies like language models, training, and large-scale use of GPUs. So it was an extremely small circle of technical members — like 12 people for a year — who used initially 5,000-ish GPUs, and then, by the end, when I left, more than 100,000, closer to 200,000. It was a very small number of people using a huge amount of compute, so you had to be very deliberate in what you are doing and how you are spending these insane resources. So that's one issue. We basically didn't take any chances on hiring people who didn't have a proven track record of being able to utilize that compute and make good use of it. So that was very important.

 

Ravid Shwartz-Ziv: But how does it start? Do you write on the board, we need people like this — people who know how to pre-train, and people who know this topic and this topic and this topic? Or do you just want talented people who will figure out how to train a frontier model?

 

Christian Szegedy: Yes. Yeah, I mean, we definitely had people who were hired for a reason — most of them, most of us, were hired for a reason. For example, we had engineers hired from DeepMind who had done a lot of pre-training, and they knew the ins and outs. Igor, for example, had been pretty deeply involved in pre-training efforts on various models. So yeah, we had a lot of people who were hired for particular reasons. And basically you need different skills at the beginning of a company. You have to create the foundations for your models: collecting the data, training the initial models, getting the infrastructure together, all that work. You needed people for that reason, and we hired several good people who did amazing infrastructure work. Some people were very knowledgeable about data centers. So in the beginning a frontier lab is like a startup that plans to scale to a huge size, or huge compute, and it needs to be very deliberate about getting the right skill set together.

 

Ravid Shwartz-Ziv: And how do you manage that? At the start you don't have anything, right? But there is some really, really far-off goal. How do you manage this trade-off between deciding what is good for the next version of the model — which will probably be pretty bad — and the long-term goal, that we want it to be the best model out there a year from now?

 

Christian Szegedy: I mean, nobody would plan to make a bad model or anything like that. And actually, I think Elon is more principled in that sense: he had a very clear vision and understanding of what an intelligence explosion will mean, and he's still more forward-looking than most other places. A lot of other big companies say, okay, we have to invest in AI, let's spend this many billion dollars. And Elon says, I need this and this, because it is required for that and that. So he has a clear dependency graph in his head, and then he says, okay, if AI will be big and we don't do it, then we will basically have to pay a lot of money to the competition. So let's not do that, let's try to build it on our own. And then he finds a way of scaffolding the whole thing and getting to a point where it's competitive and also satisfies his needs. So I think Elon puts all the different stories together in his head, and he understood that AI is a basis for a lot of other technologies, and that's why he built the lab. But in the meantime, he had to get investments, so basically he had to say, okay, we produce this, we produce that, we demonstrate skills here, we have this user number, etc., because otherwise it's hard to get funding.

 

Allen Roush: So I want to shift a little bit, much as I love this discussion, to auto-formalization and Math Inc. Let's start with, I guess, Gauss. He completed — or sorry, it completed, I shouldn't be gendering it — the strong prime number theorem formalization in three weeks, after human experts had stalled for, I think, 18 months. Where exactly

 

Christian Szegedy: Yeah.

 

Allen Roush: did the humans get stuck that Gauss didn't? Was it the complex analysis components, or the search strategy? And what do you think this reveals about the comparative advantages of AI formalization versus human formalization?

 

Christian Szegedy: Okay. Yeah, I mean, okay, so the actual story is that this was a long-standing effort by Alex Kontorovich and Terry Tao, who were trying to prove some version of the prime number theorem. The strongest form of the prime number theorem is the Riemann hypothesis, so that would be a great thing if we could solve it — I mean, we don't have that proof, but this proof has been known for 130 years. But basically the story was that they worked with a lot of students who did a lot of the actual Lean coding, and after about a year they estimated — they didn't get stuck, they estimated — that the Borel-Carathéodory theorem would take another year or something like that. Or maybe half a year, but generally a lot of time. And they said, okay, that was enough for now: we have proven this theorem modulo the Borel-Carathéodory theorem, and that complex analysis theorem still had to be added. And then we were looking into that, and it was a good test bed for saying, okay, can we do it faster? So we worked with Jared Littman, who was at Stanford, and he basically created a blueprint for it, and we started to hand-feed that blueprint to the Gauss system and finished it in about two weeks, which is pretty impressive. But there was still a lot of human effort involved. Jared found various subtle bugs in the textbooks that proved those theorems, and they had to be fixed. So it turned out that the theorem was somewhat more technical than it looked. For example, the textbook proof has one epsilon and one delta, but we had to introduce three other radii that depended on each other, et cetera. That math had to be figured out too, so it was not just formalizing the theorem. But I think the cool thing was how we started the original version of the Gauss system. Actually, it was not my thing back then. This whole effort started at Morph Labs, which was Jesse's company, and it's more of an infrastructure company. But when I left xAI, I told Jesse that I would leave xAI if we did this formalization project together. And he said, yes, let's do it, and he diverted a lot of his resources and attention to this project. And basically our first formalization attempt created a 2,000-line-long formalization of a simplified version of the abc theorem — a relatively early result from the '60s that was not even published; it was more of a folklore result for a while. That was 2,000 lines of Lean code created semi-automatically, but much faster than humans would do that work. And Jared didn't know much Lean, yet he could do it. So we had Jared, who knew the mathematics, and we had students who knew Lean to some extent, but not very much, and who mostly just used AI.

 

Ravid Shwartz-Ziv: But let's take a step back. What does formalization even mean, and why do you think it's important?

 

Christian Szegedy: Yeah, I mean, that's a good question. So actually, I think formalization is an overloaded term. A lot of people think that mathematics as seen in the textbooks is formal mathematics. That's what most people would think: oh, that looks so complicated, with a lot of formulas, so it's formal. People use "formal" in that sense. But what we mean by formal mathematics is that we turn mathematics into computer code. It can be written in some language — there are various languages for that purpose, and the latest one is Lean. People like that language more because it is a bit simpler and more elegant: it has a more elegant syntax and higher abstraction levels than previous formalization languages. But it's essentially the same task as writing computer code, just much more complicated, because you need a very deep mathematical understanding to use that system. So it takes much longer to write 100 lines of Lean formalization than 100 lines of C code for some relatively well-known programming task.

 


 

Christian Szegedy: So yeah, basically formal mathematics is turning mathematics into this programming language. It's much more verbose than, let's say, a normal paper. For example, that simplified abc theorem took about a page, one and a half pages, but it was 2,000 lines of Lean code when it got formalized. So it was much, much more verbose than the human proof. When you formalize something, you have to go into the same level of detail as when you write computer code. You can explain to me in three sentences what you want to code up, and then you spend three weeks coding it. It's a similar thing with Lean. And the advantage is that once you have it, you can just ask the Lean compiler: verify this proof for me. It doesn't need any AI, it's completely mechanistic, and it will tell you with 100% certainty that your proof is correct. So there is no more error left to discuss — unless Lean itself has a bug.

 


 

Christian Szegedy: Generally, the only bug that might remain is in the formalization of the statement itself, but the proof will be 100% guaranteed. And that's a big deal, because one of the scarce resources for reinforcement learning in AI is verifiable problems. One of the problems with something like FrontierMath as a benchmark is: you say, okay, solve this mathematical problem, and then the AI spits out some text, and a mathematician has to read it — or maybe he doesn't understand it, so another mathematician reads it — and then they might discuss it for weeks and decide, okay, that might be correct, or it's roughly correct, or it has some holes but is essentially correct, or it has a subtle bug. So it's much harder to get to closure, even for a single theorem. It happened in various recent attempts at solving problems that people discussed for weeks whether the computer-generated solution was correct. If it's in Lean, then there is no more discussion: the proof is correct. You might not understand the proof, because it's very abstract, but it's correct.
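For readers who haven't seen Lean, here is a minimal toy example of the workflow Christian describes — a statement plus a proof that Lean's kernel checks mechanically, with no AI involved. (This is our own illustrative example, not code from Gauss or Mathlib.)

```lean
-- A toy machine-checkable statement and proof in Lean 4.
-- If the proof term were wrong, compilation would fail;
-- nobody has to read the proof to trust it.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```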

 

Allen Roush: So, related to that, you had that ProofBridge paper, which has joint embeddings aligning the natural language and the Lean proof pairs. But how sensitive is that to distribution shifts across mathematical subfields? I didn't look exactly at all of the training data, but I assume you didn't get every mathematical subfield, right? So does a model trained primarily on analysis proofs transfer to combinatorics or algebra? Or does each domain require its own embedding space, or maybe even its own set of data?

 

Christian Szegedy: I personally am not on that paper, so I don't really know the exact details. I like that paper, but I don't know the exact details. Actually, I was writing up some similar ideas already in 2019 in my "A Promising Path Towards Autoformalization and General Artificial Intelligence" paper — that was basically my paper. So I think this aligning of embeddings is a nice thing to do if you have fewer resources and cannot train large language models, because you don't have enough compute. Then it might be helpful, and it might accelerate the overall research, or make the auto-formalization much more efficient in terms of compute. But a lot of auto-formalization today is just done by agentic AI, where you are using some frontier language model and you don't need these kinds of bridges, to be honest. It will probably be a good idea in the long run, but I don't really know how sensitive it is to these domain-generalization questions. So I have no idea.
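As a rough illustration only — not the actual ProofBridge method or code — "aligning joint embeddings" of informal statements with their Lean counterparts is typically done with a contrastive objective along these lines; all names, shapes, and the toy data below are hypothetical:

```python
import numpy as np

def contrastive_alignment_loss(nl_emb, lean_emb, temperature=0.07):
    """InfoNCE-style loss: the i-th natural-language statement should be
    closest to its own Lean formalization and far from the others.
    nl_emb, lean_emb: (batch, dim) arrays, assumed L2-normalized."""
    logits = nl_emb @ lean_emb.T / temperature         # pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.diag(log_probs).mean()                  # matched pairs sit on the diagonal

# Toy usage with random "embeddings" standing in for encoder outputs.
rng = np.random.default_rng(0)
nl = rng.normal(size=(4, 8)); nl /= np.linalg.norm(nl, axis=1, keepdims=True)
ln = rng.normal(size=(4, 8)); ln /= np.linalg.norm(ln, axis=1, keepdims=True)
print(contrastive_alignment_loss(nl, ln))
```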

 

Ravid Shwartz-Ziv: So do you think — okay, let's assume you trained a model to be really good at math formalization. Do you think that is useful for other fields? Both the model itself, but also the methods and the things we can learn from it?

 

Christian Szegedy: So could you repeat the question?

 

Ravid Shwartz-Ziv: Like, do you think that a model you train to be an expert on Lean and very good at formalization — do you think you can use the same methods in other fields, even coding? Do you think there is some generalization ability between these fields?

 

Christian Szegedy: Yeah, yeah, I mean, it's hard to make definitive statements — measurements are all over the place — but there is general agreement, especially from what we found in early reasoning models, that models trained on informal mathematics, solving informal mathematics, not Lean, definitely gain coding abilities. So there is transfer: you get generalization from informal math to coding. So I assume that if you have an extremely sophisticated system that does Lean, it probably helps programming, but I don't think anybody has done that experiment. Generally, I don't see Lean as a prime capability in itself; that's not my goal. There are two main purposes formalization can have. One purpose is to create a scaffolding with which you can start addressing more and more complicated mathematics without any human feedback. You can say, okay, if my Lean is really good, then I can go to more and more frontier math and start solving problems that no human has solved, and still verify them, and not wait years for humans to verify those results. So it accelerates things 100 or 1000x. The second thing I think Lean is important for is that it can become something like the assembler language of all of reasoning. Because mathematics is not just mathematics. When mathematics starts to become cheap and easy to get — as Terry Tao said, generating ideas is now super cheap, and I would say even creating a completely new mathematical field will be very cheap, a few hundred dollars maybe — then at that point you can devise mathematics for particular specialized purposes in various domains, like biology or physics or special areas of computer science that were not mathematics yet, first of all because we humans are just too slow, etc. And then we can create whole mathematizations of topics that were not mathematics yet, with much stronger guarantees of correctness. For example, AI is not mathematics yet, and we could get much, much better foundations for AI if we create the right math. And that right math might be developed by AI, not by us.

 

Ravid Shwartz-Ziv: But do you think we can do it for all fields? Can we describe all fields with math?

 

Christian Szegedy: No, not all fields. I mean, we don't necessarily need mathematics for everything. If you do self-driving and you want to recognize a street sign, then you don't really need a mathematical theory for that. But you might discover rules that make the models much more robust, and those mathematical methods will help us create more robust self-driving systems, because the AI will be used to build those AI systems. Of course, for a particular application — say, which route should I take to get to a certain point on the map — there is mathematics behind it, but that particular question is not mathematically interesting. What is mathematically interesting is: okay, how do I find the shortest path, or how do I look something up quickly, et cetera. And those things can be addressed mathematically. And I think that will be a big thing, because now we can create math very, very quickly, and we will be generating much more efficient algorithms much more quickly. And those algorithms will be verified — unlike the code we generate right now, which is not really trustworthy at all. All these vibe-coded programs are kind of nice to some extent, and they work to some extent, but they have their drawbacks.

 

Ravid Shwartz-Ziv: But what does it mean to trust in this case? Because in the end, right, I want to build a website or whatever, some app. What does it mean? In the end, the usefulness of this app really depends on the user itself. So how can you even think about formalizing it in some sense?

 

Christian Szegedy: Mm-hmm. Yeah. You can still formalize a lot of aspects. You can ask the same question: why do you verify chips, or whatever? You verify chips because they go through a long optimization process, and during that optimization process, small things might break. The human specifies the chip, and at the end of the optimization, your chip might do something other than what the human wanted in the beginning. So that's already one important thing to verify. The second thing is that there are definitely important things you don't want to happen in any piece of code. For example, you don't want to crash with a memory error or a segmentation fault. There are various kinds of injection attacks that you never want to happen, or you don't want to access certain memory, or you want to say that if a user initiates a certain transaction, it will never cause a certain part of your database to change. All those properties could be extracted by an AI and then formulated mathematically, and these invariants could be kept through the overall development process. Because currently, maybe your application works, and then you ask your AI to refactor your code, and it changes 10,000 lines, and who knows what it broke. It's much more dangerous than in chip design, where the tools were developed with extreme scrutiny: we make sure there are no bugs, there are thousands of unit tests and other tests, and you still have bugs and still break your chip, but the verifiers catch those mistakes at the end — so you don't tape out another Pentium with a floating-point bug. For code it's even worse now, because the tools that optimize the code are super brittle; there is a much, much higher probability of introducing errors. It's basically guaranteed at this point. So that's something we need to fix. And these website vulnerabilities are one thing, but the other really interesting thing is algorithms that rely on deeper mathematical insights. There are surprisingly many areas where you actually have a clever algorithm at the core, and you want to prove that the clever algorithm works, or that it has certain characteristics: it's fast enough, it doesn't consume more than a certain amount of memory, et cetera. Those things were traditionally the work of computer scientists, and every single problem people thought about took years to refine, gradually, over and over, publishing papers making that algorithm slightly better and better. And now this could be done automatically.
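To make the "invariant kept through refactoring" idea concrete, here is a deliberately tiny sketch. All names are hypothetical, and it is a randomized check, far weaker than the formal, Lean-style proof Christian is advocating: the property is that no user transaction may ever touch the audit log, and the same check is run against an original and a "refactored" implementation.

```python
import random

class Database:
    """Toy database: balances may change, the audit log must not."""
    def __init__(self):
        self.balances = {"alice": 100, "bob": 50}
        self.audit_log = ["genesis"]          # the invariant-protected part

def transfer_v1(db, src, dst, amount):
    # Original implementation.
    if db.balances.get(src, 0) >= amount:
        db.balances[src] -= amount
        db.balances[dst] = db.balances.get(dst, 0) + amount

def transfer_v2(db, src, dst, amount):
    # "Refactored" implementation; an accidental write to audit_log
    # would be caught by the invariant check below.
    if amount <= db.balances.get(src, 0):
        db.balances[dst] = db.balances.get(dst, 0) + amount
        db.balances[src] -= amount

def audit_log_unchanged(transfer, trials=1000):
    """Randomized stand-in for a formal proof of the invariant."""
    for _ in range(trials):
        db = Database()
        before = list(db.audit_log)
        transfer(db, random.choice(["alice", "bob"]),
                 random.choice(["alice", "bob"]), random.randint(0, 200))
        if db.audit_log != before:
            return False
    return True

print(audit_log_unchanged(transfer_v1), audit_log_unchanged(transfer_v2))
```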

 

Allen Roush: So related to that, I've seen from your mission statement — and I think you had some public announcements back in September — that you want to increase the amount of formal mathematical code in existence by two to three orders of magnitude. And we're talking about the risk of bugs. Mathlib already has 1.5 million lines of code. Say it again?

 

Christian Szegedy: Which is tiny, which is tiny. I mean, 1.5 million lines of code is nothing compared to real codebases, even simple projects.

 

Allen Roush: Yeah, well... Well then, are you claiming 150-plus million lines of verified Lean in, like, a year? And what's the bottleneck, right? Is it generation speed, or proof search, or the availability?

 

Christian Szegedy: I think at this point the human is still more and more the bottleneck, because our formalization systems are not 100% automatic, especially when it comes to formalizing definitions and theories from scratch. Then humans typically have to check that the definitions are correct. It's getting better and better, and we have more and more automated ways of verifying it, but it's still a bit...

 


 

Christian Szegedy: ...problematic. The second problem I see is compute cost: this formalization effort costs a lot of money, and the cost has to go down in order to achieve 100x more verified math. And the third problem, I think, is collaboration. We have a lot of isolated efforts, but they are isolated and they don't build on top of each other. So we have to have an environment in which these agents or companies can collaborate and build the overall building of mathematics, not just individual towers everywhere.

 

Ravid Shwartz-Ziv: So I'd like to understand the verification side better. Describe to me: what is the actual work? You create a lot of proofs and then try to validate them?

 

Christian Szegedy: Yeah, that's one aspect, but in general — actually, I think I already mentioned two goals. One goal is what you said right now: if we prove something completely new. Like, I come here and say, okay, I proved the abc conjecture or whatever. Some people, like Mochizuki, claim they did it, but many people don't believe it. We could also check his proof if our systems get really good. And then either we succeed, or we fail at some point and we can ask the author: okay, we could prove this, this and this, but here our formalizer just gets stuck — can you help us? And then we could avoid these kinds of controversies in mathematics. But I think the real goal is not really clearing up a few controversial statements. It's much more that these AI agents will be extremely powerful within a year or even half a year. They will be better than any human, and then we will start creating mathematics — basically solutions to a lot of problems that never looked possible before — but we will not be able to verify it all. How do you verify that amount of mathematics? If you produce a lot of English-language mathematics, you can ask an AI to check it, but do you trust the AI that checks those statements, or those proofs? I mean, I wouldn't. I don't think the AIs are trustworthy in that respect — saying, okay, I went through this proof and I think it is correct, and then we say, okay, we have this proof. I don't think that works. I really think we need a formal mathematical proof that we can check without AI, and then we can say, yeah, okay, now it works. And the other thing is that I think this formalization will become basically an invisible machine language of all of reasoning, everywhere. It's just like how you typically don't look at your assembly code when you write a program, even if it's a compiled program. The same thing will happen with Lean: in the background you always have these checks that everything is all right, everything satisfies its properties, all my code is working, all my theorems are correct, et cetera, but you only interact with natural language. You don't need to care about all the Lean and formal verification going on in the background. A chip designer also doesn't care what's going on inside the SAT solver. But that might be the source of the recent controversy around the formalization of the sphere packing result: people came in with various problems, saying, okay, what do we do with this result? We can accept it, but all that code is not very nice, and they cannot build on it, because it's code generated not for humans. It's similar to writing C code, compiling it to assembler, handing over the assembler program and saying, okay, now you have to build on this assembler code. Of course those programmers would be upset, because they say, I don't want to extend compiled assembler, I have no idea what it's doing or how. It's very, very artificial.

And then there are these two schools: the old-school formalizers, who formalize because they like the nice formalized proof — it's basically artisan work. And then there is the vision that you generate all these formal proofs and nobody ever looks at them; we don't care that anybody looks at them, that's not their purpose. And I'm in the latter camp. I don't really care about the beauty or whatever of those Lean proofs. For me it could be a completely different logic; it could be a very messy first-order logic proof, and I would be just as happy with it. But other people are in the camp of: yeah, it's our garden, this nice garden of super nicely written formal proofs, and we are just tending this garden and growing these proofs here. Yeah.

 

Ravid Shwartz-Ziv: And do you think we can use it for, for example, chains of thought? There has been all this discussion about whether they are useful. Can we use this kind of output of the model, which blurts out something we don't really know? Do you think, in the optimal case, we can actually have some verification process for this output?

 

Christian Szegedy: Me, I don't understand. So what verification process do you want?

 

Ravid Shwartz-Ziv: So, I don't know, I want — for example, like you said, in math, right? In math, we do some verification: we verify the solution. We verify the process, and this gives us some guarantee about the solution. Do you think we can generalize it to other cases, where we can have some guarantees about the solution based on the intermediate steps of the process?

 

Christian Szegedy: Mm-hmm. Yeah, okay.

 

Ravid Shwartz-Ziv: For example, in biology, right? If we ask a model what the final product of some reaction is, we can check it, right? We can check the intermediate reactions, and if we have a good understanding of these reactions, we can verify it. So do you think this verification of the intermediate steps is

 

Christian Szegedy: Yeah.

 

Ravid Shwartz-Ziv: something we can use in other places? Or would you say math is very unique in this sense, that we know all of it, we know how to decompose it, so we can use it abstractly to...

 

Christian Szegedy: Yeah, I mean, as I mentioned earlier, I'm very, very bullish about mathematics in a lot of subjects that look very hard to mathematize. Because AI will make our mathematical abilities so strong that you will be able to capture mathematical statements that are much less clear-cut than before. For example, if AI helps us discover certain statistical or probability-theoretic facts about how quantum mechanics works, or how a molecule works, or how chemistry works, then we will be able to have — maybe not 100% logical evaluation, but very strong statistical bounds on the probability that something went wrong. For example, if you do protein folding, it's not clear that your predicted structure has the absolute minimum energy, that it's really the way it will fold. But mathematically, you could prove certain lower bounds and say, okay, I'm only this far away from optimal. And then also the question would be, is there any other structure that folds similarly? This can be captured. So AI will be able to create guarantees that are captured with very sophisticated mathematics, and then that mathematics could be checked by verifiers. And I'm very bullish about these things, because this is almost impossible for humans, but for AI it might be — I think it will be — possible. We need to accept that AI is so much more powerful in mathematics that we should actually aim for these kinds of applications. A lot of people just don't see the potential of mathematics in these domains, because they say, this chemistry or biology is very, very fuzzy and we don't really know the rules. And that's true. But that's on us: we already discovered a lot of new things in the past decades; it's just that humans are slow, and it takes a very, very long time to come up with these things. AI will be much, much faster.

 

Ravid Shwartz-Ziv: But do you think there is a trade-off? For example, I worked in the past on theory for AI, you know, trying to prove what the intermediate representations mean, things like that, or compression. But in the end you have a very fundamental trade-off between how much you can formalize in a rigorous way and how the actual models work in practice in the real world, and what you

 

Christian Szegedy: Mm-hmm. Yeah.

 

Ravid Shwartz-Ziv: can assume about them. So do you think this is also true in this case? Is there a trade-off between rigor — very formal methods for doing math — and models, LLMs, that you can scale to billions

 

Christian Szegedy: Mm-hmm.

 

Ravid Shwartz-Ziv: of parameters.

 

Christian Szegedy: So I think the number of parameters — the increase in parameters — might actually be a good thing for mathematics, because you can look at limits. With a very large structure you often get a simpler statement than if you look at smaller structures. But yeah, I think you are raising multiple questions in one. One idea is that when we think theoretically about AI, we often don't apply the exact mathematical output directly to create the AI; we use mathematics to build an intuition, and then we apply that intuition approximately and see whether it has at least some correlation with the real-world behavior. And I think that's already possible, because that's within human abilities. We can do that: we always analyze things mathematically, and at some point it breaks down and we say, okay, we give up, but at least it gives us an idea of what to design or how to run this. Almost all the optimizers designed in the past years had a mathematical idea in them, but whatever you prove, it's not really a proof that your optimizer will actually be better than the others. Or take something like these new attention mechanisms — the latest ones where you try to remove the redundancies from the attention. That also has a mathematical idea: there are correlations here, it's not efficient, so we remove the correlations, and then you try it and it works. You don't prove that it works, but at least the mathematics gave some idea of how to improve things, and then you try it out. AI will do the same thing. That's why it's important for the AI not to just try random stuff and say, okay, here is a new attention mechanism, I don't know why, but it's probably better. The other thing I think will be important is that you will actually be able to prove real things about AI that you cannot prove today, because it's just so complicated. AI will be able to analyze your network completely — do numerical analysis like floating-point inaccuracies, or maybe even how you round certain things. There are a lot of extremely unintuitive, subtle details in what you write and how it turns out in reality. For example, floating-point addition is not even associative. So it matters how you add things up, how you do even the basic operations; it can make or break your model. And AI can think about all these things and make sure that whatever has to cancel out cancels out, or that on average you get the right things, and it can analyze all of this. It will be able to analyze a thousand times more things than we can as humans, because today, when you have one month to come out with a paper — and somebody scoops you in the meantime if you don't — you cannot do a lot of analysis. You just come up with an idea, quickly implement it, it works, okay, you write a paper, that's it. AI will be much more thorough than that.
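A two-line demonstration of the non-associativity Christian mentions, using Python floats (IEEE-754 doubles):

```python
# Summation order changes the result: floating-point addition is not associative.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))   # False
print((0.1 + 0.2) + 0.3, 0.1 + (0.2 + 0.3))     # 0.6000000000000001 0.6
```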

 

Allen Roush: So related to this, if Gauss can formalize Fields Medal-level proofs, can it eventually formalize software correctness proofs? And so I guess my real question is: is Mathematics Incorporated secretly a program verification company?

 

Christian Szegedy: Okay. I mean, not secretly — it was in the charter. One of our goals is to create verified superintelligence, and that applies to programming next, and then to the sciences later. What I said about mathematics for AI applies to any other program as well. AI is a particularly useful application because it uses a lot of math directly, so of course it's a good idea to use AI for AI, especially mathematical AI for AI. But other code, definitely. That's what I said before: we definitely want to move toward verified software. And a lot of software — the most interesting software — has some mathematical tricks in it, or implements some complicated logic. For example, if you look at the various protocols, you have to verify that the protocols are safe in one way or another. These are mathematical statements about certain situations never arising, no matter what the players do adversarially. You want to verify protocols, verify numerics, verify memory behaviour and algorithmic behaviour — is this really not consuming too much memory, is it linear memory use, for example — all these things. And AI needs mathematics for that. For example, if you want to hard-code a simple big-O analysis into a programming language — several people tried it in the 2000s, making a programming language that tracks the asymptotic behaviour of programs — it mostly failed, because most of the behaviour depends on the more fine-grained behaviour of the program. You cannot track it easily with a simple type system. Therefore we need real mathematical analysis: big-O notation requires an understanding of what an algorithm is, what asymptotics are, what a limit is, what it means for one function to be asymptotically similar to another or to dominate it, all these things. This is mathematics — calculus, algebra, whatever. That is what you need, and then you need a big foundation for all of that. So in order to analyze code we will need a very firm mathematical foundation on which to do it, and you don't want to hand-code all these things — that will fail — but AI can do it.
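As a toy illustration of the kind of resource bound he is describing, here is a hypothetical cost model and a "linear memory use" spec stated in Lean. This is our own sketch, not anything from Math Inc, and it assumes a recent Lean 4 toolchain where the omega tactic is available:

```lean
-- Hypothetical cost model: memory use of some routine as a function of input size n.
def mem_use (n : Nat) : Nat := 3 * n + 7

-- The kind of linear-bound spec one would want a prover to discharge automatically.
theorem mem_use_linear (n : Nat) : mem_use n ≤ 10 * n + 7 := by
  unfold mem_use
  omega   -- decision procedure for linear arithmetic over Nat
```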

 

Allen Roush: And then I'm curious, going back to the foundational deep learning work you've done in the past. You did batch normalization back in 2015, and it has so many citations now, 45,000 plus. Its original explanation was about reducing internal covariate shift, but that's been challenged by some other work, which suggests, for example, that it smooths the loss landscape. Do you accept some of this subsequent work that offered revised explanations?

 

Christian Szegedy: So yeah, I mean, generally — just to give a very candid view of what happened: I had been thinking about the Batch Norm idea for about two years before the paper. And when I met Sergey, he was already implementing his own version, completely independently. He was proposing this covariate shift idea, and he wrote it in the paper, and I said, sure. I didn't completely understand it, to be honest. I'm always extremely skeptical of any theoretical result being presented as the ultimate truth. So I mean...

 

Ravid Shwartz-Ziv: So why do you think it works?

 

Christian Szegedy: So I mean, I had a very simple original motivation. I had been thinking about this for a long time, around the time that Dropout came out. Back then, it was a no-brainer that if you want your network to work well, you want to normalize its inputs. And I said, okay, why not normalize the inputs everywhere? Every output of every layer is the input of the rest of the network. So if it works for the first layer, the rest of the network should also get a normalized input. But there are other explanations too, like saying you want your activations to be in the most sensitive range of your activation functions — you don't want them too far apart, because if activations get too far from each other, one will dominate and the other will not play any role anymore. So there are a lot of reasons why normalization is a good idea in general, and that was the main motivation. It was a complete no-brainer to me: you normally normalize your input, and normalizing the intermediate input activations has a similarly huge effect.
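For reference, the mechanism he's describing is small. Below is a minimal sketch of a training-mode batch-norm forward pass — per-feature statistics over the batch plus the learned scale and shift; the running averages used at inference are omitted, and the toy data is ours:

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """x: (batch, features) activations of one layer.
    Normalizes each feature to zero mean / unit variance over the batch,
    then applies a learned scale (gamma) and shift (beta)."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# Toy usage: activations for a batch of 4 examples, 3 features each.
x = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.], [10., 11., 12.]])
out = batch_norm_forward(x, gamma=np.ones(3), beta=np.zeros(3))
print(out.mean(axis=0).round(6), out.std(axis=0).round(3))  # ~0 mean, ~1 std per feature
```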

 

Ravid Shwartz-Ziv: Do you think it's actually necessary? Because we talked with Zhuang Liu from Princeton, and they published a recent paper about normalization-free layers. Basically, they used a sigmoid or tanh instead of normalization, and they learned its slope. Do you think it's necessary?

 

Christian Szegedy: No. Mm-hmm. But normalization.

 

Ravid Shwartz-Ziv: like this

 

Christian Szegedy: I mean, the slope of the sigmoid — I don't know that paper, so I might say something completely stupid here — but the slopes are basically constant multipliers, so they are kind of like normalizations. I mean, so yeah, back to your...

 

Ravid Shwartz-Ziv: No, but like... No, but like... But do you think? OK, continue. So, yeah.

 

Christian Szegedy: So yeah, your original question was, is it necessary? And I say no. I mean, batch normalization, I think, is a convenience tool: if you don't know the right hyperparameters, in certain circumstances, for certain networks — convolutional networks especially — batch norm helps you extend the set of hyperparameters for which your model will work well. So you need less luck. I don't think it's necessary. If you have the exact right hyperparameters, you typically don't need batch norm.

 

Ravid Shwartz-Ziv: What do you think is necessary for current AI models? What are the fundamental parts there? Or maybe there aren't any. Do you think if we now replace the transformer with another architecture — I don't know,

 

Christian Szegedy: What do you mean? I don't get it.

 

Ravid Shwartz-Ziv: whatever, Mamba or something like that — or if we change the data or the training algorithm — do you think there is something fundamental that we must have in order to train good AI models?

 

Christian Szegedy: I mean, we have reasonably good AI models already, and it typically never works without good data or a good optimizer or a good network and good normalization. The transformer has a lot of normalization — not batch normalization; several kinds of normalization work, but batch norm is not the most appropriate for it. And yeah, there are results showing you can avoid all these normalizations if you do other things. But I think data is absolutely critical to get right, and all the others — normalization, infrastructure, architecture — are somewhat secondary. Of course, you can't be too far off, but there are a lot of different architectures that can work pretty well. The reason we use what we do is that we have three, four years of tuning of the various components: highly optimized attention mechanisms and all the research on how to manipulate them, normalize them, and so on; and we have mixture-of-experts, which is also very, very well studied and helps you use less memory during inference.

 

Ravid Shwartz-Ziv: Mm-hmm.

 

Christian Szegedy: And we have KV caches and all these things we created, so we have built up a huge infrastructure for transformer-based LLMs. It's similar to the hardware lottery: we have an infrastructure lottery, where it's very hard to create a new architecture from scratch that competes with it. It's not that other architectures might not be much better. I'm 100% convinced that transformers will not be the main workhorse in five years. It will be something that AI will design or develop or invent from scratch, and it will be much better than transformers.

 

Allen Roush: So, speaking of predictions and timelines, I think there was a Twitter thread where you revised your superhuman AI mathematician timeline from 2029 to 2026. Do you think we will achieve that this year? And specifically, what capability would a system need to demonstrate for you to declare the milestone achieved?

 

Christian Szegedy: Yeah. Okay. Okay, so I mean, I had some clear — clear to me, at least — guidelines. So basically, when I revised myself, I revised from 2029 to 2027. That was the original thing. And then I had a discussion with François around 2022, and he said, yeah, let's bet. And I said, okay, let's bet on 2027. And he said, I'm not so certain about that, and he proposed, let's do June 2026 — we had a paper together with François, the DeepMath paper from 2016, so that would be roughly the 10-year anniversary — and I said, okay, sure, that's the bet. So, I mean, I'm not completely sure about it. The criteria, let's put it this way: first, the criteria I suggested were that we will see 10 or 100 textbooks formalized completely autonomously, and we will see completely autonomously created AI proofs of at least 10 conjectures in 10 different areas of mathematics that had been attacked by humans for at least a decade or longer. I didn't suggest that we will see any Millennium Prize problems solved this year, or even next year. I don't really know — maybe it will take 100 years even for AI to prove P not equal to NP, if ever. These things are singular, and you don't really know how hard they are. But what we can see already is that the formalization abilities are going up exponentially: I think we will see 10 books easily, but even hundreds, formalized completely autonomously by this summer. And we will see a lot more. We have already seen several not-too-hard, but still frontier, statements that had been open for years solved by AI. And I think it will accelerate. So we are very, very close to the point I suggested five years ago.

 

Ravid Shwartz-Ziv: What? What are the missing parts?

 

Christian Szegedy: I don't know if anything is really missing; it's more about the capabilities — how far it really goes, whether it can do anything that mathematicians can do. And I currently think that this year, by this summer or September, we will not see an AI that is as good as a Fields Medalist, but it will be at a very strong working-mathematician level — like top 10 percent. And what we need, I think, is better verification — a lot of companies are working on it, and universities too — because it doesn't matter if you prove a hard problem if you cannot verify that the solution is correct, right? That's why we need formalization. And the other thing we need is a much bigger repository of formalized mathematics, because you cannot really formalize things if you haven't formalized the necessary ingredients.

 


 

Allen Roush: And we've talked a lot about Lean, the mathematical proof assistant. Are there any other proof assistants that you're very interested in for all of this mathematical formalization? I'm really not an expert on the field, so I don't know the quirks of the ecosystem, but I've heard of Coq and some others. What's your take on all of these?

 

Christian Szegedy: Generally, even in my last years at Google, when Lean was getting big, I didn't work on it, because I thought that AI doesn't need the same level of niceness. Lean is very nice for humans, but for AI it's just okay. If your language is strong enough — Coq is strong enough, HOL Light is strong enough, these are all higher-order logic systems — you don't really need more; even first-order logic would suffice. So you don't really need Lean or any particular one of them; you just need one of them. You don't want to build up 10 different repositories for all of them. I think it was Linus who said once that the good thing about standards is that there are so many of them, or something like that. The problem is that you don't really need 10 different programming languages for the same purpose, especially if you don't have to write them manually as a human. For example, Rust is nice, but if you had C++ plus formal verification, then who cares whether your AI writes C++ or Rust? I mean, I don't. Rust is a nicer language than C++, but I'm still fine: if it verifies everything for me with guarantees, with a proof checker, then I don't care.

 

Allen Roush: Well, there are concerns about who has the most training data in a particular language. And it's also about token efficiency.

 

Christian Szegedy: Yeah, yeah. Okay, so that's a very good point. One of the reasons that I work on Lean now, and almost everybody does, is that Lean has the most human data — most formalization today is written in Lean. That shifted the tide: previously there were a lot of projects and all of them had similar amounts of data, but now Lean has more data than all of the others. So naturally, as an AI person, you go where the data is and say, let's use Lean. But if you start doing a reinforcement learning loop and generating a lot of proofs from scratch, then you wouldn't care. You start doing RL, address all of mathematics, and generate all these statements completely automatically — then even Metamath would do, or whatever; even the crudest formalization language would do.

 

Ravid Shwartz-Ziv: So I have a question about math in general. Suppose that soon we can essentially formalize everything, or almost everything, and agents can go and prove exponentially many theorems, and we are able to check them automatically. What do you think the role of mathematicians will be? What will they actually do after that? Do you think we will see no more mathematicians, or something different?

 

Christian Szegedy: So I think in the next five to ten years, the role of mathematicians will shift more towards giving direction — which areas do we explore. Basically telling the AI, okay, I want you to develop a theory for this or for that, and then iterating with the AI on these things. I'm not sure if that role will go on forever, but I think for the next five years it probably will. Also, I think mathematicians have a lot of other roles, like teaching. I think AI can be very helpful for educating people, but I have kids too, and I don't really wish for them to be taught exclusively by AIs. I actually started a company in Hungary for AI-augmented education, and the goal is to create a system where teachers are augmented with AI so they can do the right thing: they can give the inspiration, the human touch, everything a teacher should do, but they don't need to do a lot of the boring work like grading or setting up problems or creating the curriculum — all that can be done by AI. The teacher should be there as a role model for the student. It's just human nature that we want to be with humans and we get excited by humans. If you look at chess, for example, it became kind of like a soap opera, but still, the skill is there too. People like skillful people, even if they are much worse than any AI.

 

Ravid Shwartz-Ziv: But chess competitions are between humans, right? We don't really see mathematicians competing with other mathematicians. Do you think...

 

Christian Szegedy: A lot of the history of mathematics was like that. Starting with Cardano and whatever, even Pythagoras, I think, they were challenging each other, and Newton and Leibniz did a lot of these challenges to each other: I can do this, can you do that too? So I think there is a much more human element to mathematics than most people know. But this puzzle-solving aspect, where we put someone on a pedestal because that person is so much better at solving problems, maybe we'll still do it, but I think it will be very strange for mathematics. I don't really know how it will work. There is another analogy that I like to use. In astronomy, before the invention of the telescope, you had to look at the sky with your natural eyes. So you needed very good eyes to be an astronomer, because otherwise you couldn't see the stars or the planets. And there were clear limits to how much you could do that way.

 

Christian Szegedy: And then the telescope came along, and people said, okay, we don't care about that anymore; it becomes more like theoretical work. And I think similarly for mathematics: we have this telescope that can look into the mathematical universe much more deeply than any human mind can, and we can decide what we do with that. Some mathematicians will need to adapt, and many are still in denial, but I think most of them are kind of excited about what they can suddenly learn. It's like dying and coming to heaven, and suddenly you can learn how everything works and God explains it to you. That's the dream of a scientist, a religious scientist: I die and then I can learn everything.

 

Ravid Shwartz-Ziv: Right, like when they retire and finally have time to put into all the other things that they wanted to do.

 

Christian Szegedy: I mean, it's up to us what we do with the AI, but I'm very concerned about AI's potential. I don't really think that humans need artificial work, so that part doesn't worry me that much. My worry is whether the benefits of AI will be distributed in a way where nobody is left behind. That's an important problem. Another problem is of course that AI could be completely disastrous, like terminators killing everybody. I think that's a very realistic outcome. And there are other, even crazier outcomes that I also consider realistic. So yeah, I'm not super optimistic, and that's why I keep thinking about these problems. And I think verification could play a role, to some extent, in making AI more honest and not cheating us.

 

Allen Roush: So, related to what humans' role is as AI gets better: I'm influenced by how in chess it took five to ten years after Deep Blue came out for chess engines to become so much better that even grandmasters trying to help the AI, as centaurs, were worse than the engines alone. But at the same time, I'm also influenced by the fact that chess is probably the biggest it's ever been, and it seems like most people who are seriously using AI right now are the busiest they've ever been. So I'm wondering if Jevons paradox outweighs that. What's your take: do you think there's a job apocalypse, or are we all going to work harder than ever in a work explosion? I don't know.

 

Christian Szegedy: These are very exciting questions, and I think a lot about them. I made several predictions like this about mathematics. I also made another prediction, about five years ago, about the ARC-AGI benchmark; I discussed it with friends, and I just lost it by a month. So I think my track record is fairly good on these predictions, but to be very honest with myself, I think we have reached a point where there is an event horizon for me within the next two years. Maybe for the next year I can make predictions, but not beyond two years. The potential of AI is getting to a level where any projections I make get lost in the non-linear effects of AI on the whole society. What I'm most concerned about is that we have a lot of different options, and it depends on us how we deal with that, and I distrust humanity more than I distrust AI, actually. I think we don't have a good track record of taking care of everybody to the extent that we could. Basically, humanity is a mess, and now we are at a point in history where we would need to be better than that, but it doesn't look like we are getting any better. So I think it could be very sad. We could use AI to basically enslave humanity, or we could use AI to create dictatorships and surveil everybody, or we could use AI to build huge robot armies. And unfortunately, I think that's something that's happening. I think humanity could have a super bright future, but we might destroy it.

 

Ravid Shwartz-Ziv: Okay, we're almost out of time and there are a few questions from social media. We already answered some of them, so let's see. Okay: what should students who work in this field focus on if companies like Math Inc can do large-scale auto-formalization?

 

Christian Szegedy: I mean, for manual formalization efforts, I would say you do it for fun maybe, but not because it's a good career path. I don't know how general this question is; it depends on its scope. If it just applies to formalization, I definitely would say don't do artisan formalization unless you have fun with it, the way you might enjoy solving Sudoku puzzles; it's at a similar level. But if you are asking about mathematics generally, what mathematics students should do, I still think one should rethink one's career choices or life choices. Try to embrace AI, have fun with AI, and figure out how to leverage it best. I think the traditional career paths will be over very soon, and the most important skill set will be how well you work with other humans and how well you work with AIs.

 

Ravid Shwartz-Ziv: Okay, another question: do you think it's possible to formally verify alignment properties of a model, or are they fundamentally too informal for formal systems?

 

Christian Szegedy: Yeah, I think we cannot 100% verify alignment. I like the word validation better. Validation means that if you give some task to an AI, you make sure that the AI's formalization of that task matches it. For example, in mathematics, you take an informal statement and transform it into a formal statement, and then the question is whether the two are in alignment: if you prove the formal statement, would it count as a proof of the informal statement? Or you create an algorithm based on a definition, and then would the formal algorithm actually do the same thing as your informal specification said? I think we could improve validation a lot, but it's inherently impossible to make it 100% certain, because the whole point of informal is that it's not 100% specified. So you cannot make it 100% certain that what is done is what the human meant. And there are a lot of other issues that have been raised, for example, what happens if your AI solves the task you told it to solve, but in the meantime it kills humanity, or does a lot of other harm, like boiling all the water out of a river or something like that. These are much harder questions, and the problem is you cannot specify all the harms that an AI could do, so it's very hard to formally verify that your AI doesn't do any harm that could be bad.
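As a small illustration of the validation gap Christian describes (our example, not his), consider an informal sentence and one possible Lean formalization. The proof below is mechanically certain; whether the formal statement faithfully captures the English sentence is a separate, informal judgment, and that judgment is the part that can never be made 100% certain.

```lean
-- Informal claim: "every natural number is smaller than its successor."
-- One possible formalization; the Lean kernel certifies the proof,
-- but deciding that this statement really is what the English sentence
-- meant is the validation step, and it stays outside the formal system.
theorem lt_succ_example (n : Nat) : n < n + 1 :=
  Nat.lt_succ_self n
```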

 

Ravid Shwartz-Ziv: Okay, yeah, I think we're out of time. Do you have anything else that you want to add to promote, to sell?

 

Christian Szegedy: I don't know.

 

Ravid Shwartz-Ziv: Okay, so, Christian, thank you so much for coming. It was a real pleasure. Thank you so much, and Allen, thank you.

 

Christian Szegedy: It was my pleasure, yeah.

 

Allen Roush: Yeah, it's a pleasure to be here and to get to ask these questions to somebody so legendary.

 

Christian Szegedy: Thank you very much. Have a great day. Bye.

 

Ravid Shwartz-Ziv: Thanks so much.