May 20, 2026

Intelligence in an Open World - with Mengye Ren (NYU)

The Information Bottleneck

We talk with Mengye Ren, Assistant Professor at NYU's Center for Data Science, about what intelligence actually means once you step outside a benchmark, and why scaling a single centralized model isn't the whole story.

We get into why intelligence has to be defined in open environments, not closed ones, and what that means for how we measure progress. We push on the creativity question: today's models sample bottom-up from a softmax or a Gaussian, with no internal loop of consideration, and as Mengye puts it, we haven't understood creativity yet and we're already prepared to hand it over.

We also talk about what's missing for the next paradigm: continual learning, memory, embodied grounding, and smaller models that actually accumulate experience instead of re-deriving everything from scratch each call. Along the way, we get into JEPA and latent variables, biology as inspiration vs. blueprint, why frontier labs don't lean on explicit latents, the limits of synthetic data and world models, agent-to-agent communication, model uncertainty and forecasting, and whether ML education still matters when AI writes the experiments.

A grounded, contrarian conversation about where AI research should be looking next, beyond benchmarks, beyond scale.


Timeline

00:00 — Intro and welcome

01:24 — What is intelligence? Defining it relative to objectives and open environments

04:19 — Is intelligence really the path to human flourishing, or is it productivity?

04:57 — Safety, scalable oversight, and whether stronger models help or hurt

06:09 — What does "alignment" actually mean?

07:18 — Centralized vs. decentralized models: objectivity vs. personal meaning

08:50 — Hinton vs. LeCun: where Mengye stands on AI risk

10:29 — Bottom-up vs. top-down architectures and feedback loops

21:28 — Biology and AI: inspiration, not blueprint

24:14 — Biological plausibility, spiking nets, and where the analogy breaks

25:39 — JEPA, Mamba, and architectures beyond the transformer

27:31 — Language as a special modality: abstraction built for communication

29:04 — Are we too locked into the current paradigm? Risk of creativity collapse

30:09 — Synthetic data, simulation, and the brain's own generative models

31:43 — World models and physical AI: how babies actually learn

33:03 — The case for smaller, continually learning models

37:02 — The role of academic research in a frontier-lab world

39:47 — Why LLMs aren't funny: the creativity gap

40:35 — What research areas matter most: embodiment, continual learning, creativity

42:05 — Creativity is bounded by experience — and why bottom-up sampling isn't enough

45:35 — Agent-to-agent communication and the limits of sub-agents

46:39 — Model confidence, epistemic uncertainty, and forecasting

49:44 — Tokenization, static vs. dynamic worlds, and always-learning systems

52:20 — Latent variables, JEPA, and why frontier models skip them

53:40 — The future of ML education when AI writes the experiments


Music:

  • "Kid Kodi" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
  • "Palms Down" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
  • Changes: trimmed

About: The Information Bottleneck is hosted by Ravid Shwartz-Ziv and Allen Roush, featuring in-depth conversations with leading AI researchers about the ideas shaping the future of machine learning.

Ravid Shwartz-Ziv: Hey everyone, and welcome back to The Information Bottleneck. As always, we have my co-host, Allen.

 

Allen Roush: Nice to chat with you again, and it's really great to introduce our guest, Mengye, here.

 

Ravid Shwartz-Ziv: Yeah, so hi, Mengye. Mengye is an assistant professor at NYU's Center for Data Science. And today we are going to talk a lot about intelligence, what it even means, and how we can actually use it for better models and AI. Now, I know these are very broad concepts or topics, and we will try to make them more specific. For sure, we can't cover all of it. But maybe we'll start with definitions, let's say, or the basics. What do you think intelligence is, or what does intelligence even mean?

 

Mengye Ren: Great. Yeah. Hi, Ravid. Hi, Allen. Great to be here. Yeah, very excited to chat with you about various topics. Well, I think it's a very interesting question to ask, what is intelligence? And to me, I think it always has to be defined with regard to some objective. And so the quality or the efficiency with which an agent is able to achieve a certain objective will demonstrate the level of intelligence. And of course, it encompasses a lot of capabilities, for example, being able to sense and perceive and act in the physical world, but also to adapt and learn flexibly in new environments. I think that's also one demonstration of intelligence as well.

 

Ravid Shwartz-Ziv: So do you think we can have one definition of intelligence that is some very, let's say, measurable objective that we can actually try to reach? So that we just need to optimize it and then we get an intelligent system?

 

Mengye Ren: I think perhaps we've always been trying to optimize according to some objectives or benchmarks and datasets. But the thing is, this is not the full picture. The world is too big to be encapsulated in one small model. And so we always need to anticipate changes and uncertainties in the future. So I think it is an open question how we can define intelligence in a closed environment. And my answer to that is we have to look into how things change in the open environment.

 

Ravid Shwartz-Ziv: But why do you think we actually need to optimize for intelligence? We want a model that works on some task and solves some task, you know, maybe other tasks or many tasks, but why do you think we actually need to aim for intelligence, to have a model that has this undefined, or not well-defined, property?

 

Mengye Ren: Yeah, I think it's a great question. The question is, you know, what are we even developing for? Right? What is the end goal? So I think the only thing that could make sense is to, you know, increase the total happiness of humanity and increase the chance of survival of humanity. Those can be the only end goals, right? Anything beyond these would be some intermediate objective.

 

Ravid Shwartz-Ziv: But why is intelligence the way to get there? Let's assume that our goal is to improve the happiness of humanity. Why are intelligent models the way to get to these goals?

 

Mengye Ren: Not necessarily. I think, yeah, I don't think intelligence is the only way to get there. Intelligence might alleviate a lot of the burdens and manual work in a lot of places, but I think its main driver is productivity and economics rather than happiness.

 

Allen Roush: So this is a worldview then that I have to ask you about, which is: do you think that as systems get more intelligent, we might get a natural tendency towards more safety? Because we've seen Anthropic already, you know, refuse to release Mythos, their latest model, to the public, and I'm kind of worried now, because if all we have is so-called scalable oversight, where we just have smarter models try to keep smaller models from doing unaligned things, I don't think that's a durable solution. So what do you think about all this?

 

Mengye Ren: Yeah, that's a fantastic question. I think there are multiple perspectives regarding safety. And one popular view is that you definitely need a stronger model, but you also need an aligned model, so that a strong model, given that it's aligned to humans, is able to provide higher quality in terms of safety. But there are also caveats to that view as well.

 

Ravid Shwartz-Ziv: What does alignment even mean? Okay. A long time ago we talked with someone from Google who talked about alignment and how you actually measure alignment. But I'd love to hear your perspective: when you say alignment, what does it mean? Alignment to humans is very, very vague.

 

Mengye Ren: Yeah, it's alignment to the survival of humanity. So that's the ultimate goal. Towards that survival, there are many values that we wish to preserve in human societies that we don't want to be disrupted by AIs. And that's a value judgment that could be defined in many different ways, and different societies may want different types of alignment more specifically. But we want that value to be injected into AIs, which is advocated by the alignment side of the community and the safety side, many of them.

 

Ravid Shwartz-Ziv: So what do you think, should Anthropic release their newest models to the world? Or is it just PR? I think, personally, it's just a PR announcement, but...

 

Mengye Ren: You know, yeah, I think stronger models have a lot of potential, and they do have safety implications as they get stronger. If used right, they can provide a layer of safety, but if their vulnerabilities are exploited, they can actually disrupt safety in a lot of ways. What my view really is, and I think I'm going to write this in a white paper in the near future, is that we don't necessarily only need a big centralized model; we also have to explore these different decentralized models that represent individual meanings and diversity. So I think they both provide value. Centralized is arguably more objective and has deeper reasoning that can provide insight into a lot of high-impact issues, whereas small, individual, personalized models can provide deeper interactions, understand personal perspectives more, and provide meaning as we explore our own human agency and creativity.

 

Ravid Shwartz-Ziv: But do you think... so you worked with Hinton, right, at Google. So Hinton is maybe on one side of the scale: that we need to be afraid of AI now, there are a lot of risks, and we need to try to deal with these risks in advance. Versus maybe Yann's perspective, right? That everything will be fine, and we will find a way to deal with it when we have it. Where do you stand?

 

Mengye Ren: You know, I think different perspectives are always very fun to listen to and think about. Without these sharp views, we wouldn't have a clear picture. But I also think the reality probably lies somewhere in between, right? So people are going to be increasingly aware of the risks for safety. And then people are probably going to figure out a way to prevent the collapse. So it's going to be interesting to see how it plays out, but that self-awareness is also going to be a key factor.

 

Ravid Shwartz-Ziv: Allen, where do you think... where do you stand in this fight? In this argument?

 

Allen Roush: Well, and the two sides are, again...? I mean, I tend to think that I'm very optimistic. I'll point that out. I'm very much a bull on the idea that AI is overwhelmingly good for the world.

 

Ravid Shwartz-Ziv: Hinton vs. Yann, about the risk in AI.

 

Allen Roush: And a lot of that is because I mostly trust that the people who are capable of building it understand how important it is to keep systems safe. But there have been notable exceptions. For example, what happened with Grok for a while there, making it totally easy for anybody on X to, you know, basically take the clothes off anybody they wanted to in images. There was a period there of about two weeks to a month of absolute disgust and chaos on X. And I thought that would never have happened, right? I thought that even the hubris of Elon Musk would not go as far as to allow such unfettered, unaligned systems to be released to that large a group of people. But clearly I was wrong. And because I was wrong, and probably will be wrong again, about this kind of rose-tinted view of the future... I don't know, I don't see a dystopia or anything locked in either, but I definitely think that the risks of a mistake made on AI safety are going to go up astronomically over time. And indeed, Sam Altman just got attacked with a Molotov cocktail, and then there was another attack or something that happened just today. And I even just sent Ravid an article being like, people are going to respond to AI with violence. And violence in the form of war fighting: you know, Trump literally went on Twitter and was like, go buy Palantir, and he spelled out the stock, PLTR. I was like, okay, I see what the administration is telling people. So I don't know where that puts me between the two, but those are most of my safety concerns. And then of course, the whole, I guess, recursive self-improvement, implying that eventually we'll kind of lose control as humans. And thus, going back to the whole idea that we really have to get it right the first time on a superintelligent system.

 

Mengye Ren: The thing is, yeah, you're right that we really have to get it right the first time, and as it self-improves, it's going to irreversibly change the way we think, the way we work. And yeah, so the consequences may be more far-reaching than what we're seeing right now. And it's not just going to be a single interaction, a single response that's unsafe. It's gradually distilling into every part of our lives.

 

Ravid Shwartz-Ziv: Do you think we will see self-improvement in the near future? A real self-improvement, where the model basically improves itself without human intervention?

 

Mengye Ren: So far we haven't seen it, and I have, you know, doubts. There might be some fundamental limitations on self-improvement, especially in the way it requires the external world for information, either from humans or by being able to sense and extract new information from the world. So far all this intelligence has been built on human-created knowledge, and it's able to bootstrap from that. But to what extent it can bootstrap new intelligence based on its own generations, I think there might be some fundamental limitation. And one limitation could potentially be addressed by having multiple perspectives from different models trained with more diverse data. And that's actually what our recent paper on LLM verification is telling us: that, you know, cross-model verification actually gives you better performance than self-verification.

 

Ravid Shwartz-Ziv: So why do you think that you actually have some fundamental limitations in the model versus getting information from other sources? Because in the end, if you're assuming that the data is quite similar between models, what is the difference?

 

Mengye Ren: So in the end, the data is similar.

 

Ravid Shwartz-Ziv: Let's assume that the data is similar, okay? That all the models have roughly the same data. So why do we actually need different models in order to...

 

Mengye Ren: Yeah, so if the data is similar, you eventually train some equivalent models and they wouldn't be much different. The training process could still provide different trajectories of the weight evolution, but it won't fundamentally give you more different models. I mean, this is probably based on information theory. And I think eventually it needs different data. And so, you know, there are just different corners of the world that aren't covered by the internet. And these data fundamentally live in individual perspectives. And I think it could be a dystopian view to aggregate all of the possible data available on this earth into one single model. Instead, I think we should keep it at different corners, if possible.

 

Allen Roush: So what do you think about the importance or value of continual or incremental learning? Because Dario and others in the foundation model labs seem to argue that you don't actually need to enable it to have proper recursive self-improvement. But I'm a big personal defender of it. I think it's very important, and we need to figure out how to enable it without, you know... like Microsoft's Tay famously had it enabled, and, you know, 4chan or whatever. This was before the whole ChatGPT revolution, these were very early Markov-chain-based chatbots, but 4chan was able to do an early type of prompt injection by just spamming it with racism, and so that was a bad experiment. But I think if we make continual learning today with LLMs significantly more, what do you call it, gentle in how much it tunes, but actively updating the weights, that seems like a possible mechanism to do a type of self-learning with human feedback. Do you think that's important or necessary for the future of learning?

 

Mengye Ren: Yeah, I think it's very important. I do a lot of continual learning research. I have been asking myself the very hard question that you just asked: what is the value of continual learning? And to put it in a simple way: if you just accumulate different contexts in a big context window, however long that is, the function that processes the context will never change. So it's the same perspective from which you're going to look at data point number one and data point number two. In a continual learning system, the function with which you process these data changes. So the way you look at data point number two will be shaped by the information from data point number one. Okay, so I think fundamentally it's going to give us a lot more efficiency and flexibility in terms of self-improvement and understanding this changing world. Now, if I have to make a claim, I don't think I can make a claim today. I don't know. I stay agnostic on how long and how far these always-accumulating systems can go. Perhaps Dario has a point, but I would not commit to a full claim.
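
The distinction drawn here, a fixed function versus a function that changes with each data point, can be sketched with a toy example. This is a minimal illustration under stated assumptions, not anything described in the conversation: the one-parameter linear model, the learning rate, the data stream, and the names `predict` and `online_step` are all hypothetical, and the sketch deliberately ignores whatever adaptation a frozen model can do in-context.

```python
# Toy illustration (all names and numbers are hypothetical):
# a frozen model keeps the same weight for every data point, while a
# continually learning model updates its weight after each one.

def predict(w, x):
    """One-parameter linear model: y_hat = w * x."""
    return w * x

def online_step(w, x, y, lr=0.1):
    """One SGD step on squared error (w*x - y)**2; returns the new weight."""
    grad = 2 * (w * x - y) * x
    return w - lr * grad

# A small stream generated by y = 2x, seen one point at a time.
stream = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w_frozen = 0.0   # never updated: the "function" stays the same
w_online = 0.0   # updated after every data point
frozen_errors, online_errors = [], []

for x, y in stream:
    frozen_errors.append(abs(predict(w_frozen, x) - y))
    online_errors.append(abs(predict(w_online, x) - y))
    # The next data point is processed by a function shaped by this one.
    w_online = online_step(w_online, x, y)

print("frozen errors:", frozen_errors)
print("online errors:", online_errors)
```

In this toy run the frozen model's error grows with the input, while the online learner's error on the third point is smaller because its weight was shaped by the first two.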

 

Ravid Shwartz-Ziv: But do you think that, in the end, continual learning will be the main way that we teach these systems? Or will we still have this very offline process of training and different types of training?

 

Mengye Ren: For real-world agents, like robots, like us, like animals, continual learning is the only way to learn. There's no other way. We don't know any other way of learning in the real world. Just because the data is so scarce, and we have to leverage every single data point in a sequence to be able to embrace this non-stationary world. However, if your interest is to collect a big dataset from a static world, you might not need continual learning.

 

Ravid Shwartz-Ziv: But do you think we have, again, do you think at the end, 10 years from now, 20 years from now, do you think we will still have this pre-training stage?

 

Mengye Ren: Sorry, whether we'll still have this pre-training stage? So part of my research is to hope that it can potentially provide a different paradigm. It's very hard to predict what it would look like 10 years from now. But it seems like there has been a lot of progress, yet the very type of architecture hasn't massively changed in the past five years.

 

Ravid Shwartz-Ziv: Pre-training.

 

Mengye Ren: from always accumulating to always learning.

 

Ravid Shwartz-Ziv: Do you think we will need different types of model architectures, or can we adjust the current ones to have it?

 

Mengye Ren: Yeah, I think it's plausible to just have the current architecture and enable always learning, but figure out what type of learning algorithm is needed to address this non-stationary change.

 

Ravid Shwartz-Ziv: Do you think it will still be based on backpropagation?

 

Mengye Ren: I think that's a great question. I don't know. I think biological systems give us some insight that maybe backprop is not needed. To what extent it's not needed, and to what extent non-backprop provides a benefit, is yet to be understood.

 

Ravid Shwartz-Ziv: and

 

Allen Roush: I just want to weigh in that I'm also a believer; I don't think backprop is necessary at all. There are alternatives to it that are feed-forward-only approaches.

 

Mengye Ren: Yeah, I agree. I think there is a lot of space to explore: whether there exist some local objectives, feedback units, modules that are more recurrent. Yeah, so there are a lot of things to explore for sure.

 

Ravid Shwartz-Ziv: And so you mentioned biology and biological systems and what we can learn from them. So, I don't know, at least in the research that I'm familiar with, there are two types of research, let's say. One is trying to understand biology with AI, right? But the other way around doesn't really work. The other way around, sometimes people do some retrospective analysis, like, yeah, we have a good algorithm, let's try to fit it to some biological system. But taking some biological idea and iterating on it until you have a state-of-the-art AI method that works almost never happens. So, correct me if I'm wrong, but this is my feeling, and the question is: why do you think it's like that, and do you think it will actually change in the future?

 

Mengye Ren: Yeah, so I don't think it's like that. You know, there are a lot of small ideas that might not pan out in AI systems, but the very idea of the neural net is actually inspired by biology. And so we just can't escape from that part of history. We couldn't have neural nets, we couldn't imagine billions of parameters as a self-organizing system from which intelligence emerges, if we didn't have that inspiration. But I think there is a distinction: as you pointed out, maybe some of the biological ideas don't translate to AI, but AI is a way to understand biology for sure. So a lot of the, you know, scientific discoveries are based either on the behavioral level or on the microscopic level. And there is less of a functional view, a functional picture of how different mechanisms function in a biological system. And this doesn't have to be situated on a neuron. It could be purely mathematical, as a dynamical system. So I think that role of AI is fundamentally important, as we can reverse engineer what a biological system has in the minimal and functional sense.

 

Allen Roush: So when you mentioned biological plausibility, my understanding is that, sure, there's some kind of analogy between giant tensors of floating-point values and how the human mind works. But we try to do more modeling of that biological plausibility, such as by implementing some analogy to the spiking mechanism in our brains, how neurons spike and fire. And I don't claim to be an expert, my understanding is Ravid knows a little more about this than me, but my understanding is that there haven't been any real successes with spiking neural nets. And so do you think that there's a limit where we don't want to follow the design of our own wetware too closely, or at least our belief about that design?

 

Mengye Ren: Yeah, absolutely. I think there's always a question of the level of fidelity at which we want to model the functional system. And I don't claim that they don't bring value, but you are right that so far we don't have enough evidence that they are needed in AI systems. It's the same argument for non-backprop, right? So if you are a believer in non-backprop, maybe you should also believe in spiking neural nets. I'm not saying you have to believe both, but the underlying logic is kind of similar.

 

Allen Roush: Well, what do you think is the architecture then that good AI systems of the future will work on? I mean, do you have opinions on JEPA, or Mamba, or any of these other alternatives to traditional transformers?

 

Mengye Ren: Yeah, I think recurrence is definitely a big question mark and could enable some aspects of online adaptability and continual learning, not all of it, but at least some parts of it that are currently limited by a fixed window size. I think JEPA is an interesting idea and it's very worth exploring in general, because the planning and the generation happen in the latent space, which provides us the abstraction, right? As opposed to trying to generate, you know, things that are either at the input level or very close to the input level, which doesn't provide us the abstraction. And this, I think, touches very fundamentally on how AIs generate. These days, AI generates diverse outputs based on just one thing, the softmax distribution at the token level. Right. So I think that's something to be understood. Is that going to give us enough creativity from now on? Human creativity will now rely on that softmax, or maybe, for diffusion, the Gaussian perturbation in the VAE space, let's put it that way, right? Is that going to provide all the diversity and creativity? I'm not sure. I think that's something to be understood and figured out.
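
For readers unfamiliar with the mechanism referred to here, the sketch below shows token-level sampling from a softmax distribution, the single source of output diversity described above. It is a minimal illustration under stated assumptions: the three logits, the temperature values, and the helper names `softmax` and `sample_token` are hypothetical, and the Gaussian-perturbation case for diffusion models is not covered.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into a probability distribution over tokens."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature=1.0, rng=random):
    """Draw one token index at random, weighted by the softmax probabilities."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Hypothetical scores for a three-token vocabulary.
logits = [2.0, 1.0, 0.1]

rng = random.Random(0)                   # seeded so the run is repeatable
samples = [sample_token(logits, 1.0, rng) for _ in range(5)]

print("probabilities at T=1.0:", softmax(logits, 1.0))
print("probabilities at T=0.1:", softmax(logits, 0.1))
print("sampled token indices:", samples)
```

Lowering the temperature concentrates probability on the highest-scoring token, and raising it spreads probability out. Either way, all of the variation in the output comes from this one random draw per token.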

 

Ravid Shwartz-Ziv: Do you think there is a difference between vision, language, or other modalities? Or is everything just different types of data that we can handle together?

 

Mengye Ren: Yeah, I think, you know, it's interesting that you bring this up. I think different modalities definitely make sense. They are projections of the world into different subjective views, like the camera is a subjective view of the world. Right. I think language is something very special. It's not really a sense; it's more something created by humans. And the reason it was created is that we need communication between different subjective individuals. I can't communicate brainwaves directly, so I have to rely on some common language tokens. And that's a very good level of abstraction that captures the core of the meaning of the communication and abstracts out all the other details. So I think that's a great abstraction, and in fact a lot of the intelligence has been bootstrapped on that level of abstraction. But in the end it doesn't capture sensorimotor intelligence and the physical world, so there will need to be more grounding from visual inputs, from physical grounding, or from other types of sensory signals such as audio.

 

Allen Roush: So, do you think that there will be any movement in the industry? Right? I mean, that's another question. Do you think we've gone too far with the current paradigm of models, so that even if there is a better paradigm, such as what you've discussed, it will struggle to get adoption? Or do you think that we're in a ripe spot for adopting some of these things?

 

Mengye Ren: Well, I think we are definitely seeing an exponential trajectory of progress. And of course, as you said, you are optimistic about the development of AI. I think it definitely gives us productivity. But I think there is a worry that it may lead us to ignore other aspects and other paradigms. And in the end, creativity collapses and we won't be able to develop other paradigms anymore.

 

Ravid Shwartz-Ziv: What do you think about synthetic data? There is a huge discussion about what the actual benefit is, whether we can push it further, whether we need any real data. What do you think about it? Simulations and all these related topics, do you believe that it's a good direction?

 

Mengye Ren: Yeah, I think this is an interesting direction. Simulation is very much needed for the purpose of safety. You want to simulate different environments so that you can deploy your agents in a simulated environment before you deploy them to the real world. To what extent simulation is needed for understanding embodied intelligence or biological intelligence, I think remains unknown. And I don't think we need a full-fidelity simulation. It's more like in the latent space. The brain does use synthetic data, in the form of the generative model and in the form of synthetic stimuli: even before a baby is born, we have these synthetic retinal waves that perhaps give the neurons a good initialization. But I think this also needs to be tested in real systems, to see whether this type of data can make us more data-efficient in general.

 

Allen Roush: And do you believe that that's important for figuring out physical AI systems too? Because today's world models, as that term has been used by real production systems, have basically been video generation models that you can walk or move in, prompting the world that they'll create. All the arguments I've seen for them are about creating the synthetic data to enable good physical robots. Do you subscribe to this idea and think it's the right path forward?

 

Mengye Ren: Yeah, so I draw inspiration from how a baby learns from the world, learns by, you know, playing with toys, and eventually becomes flexible in manipulating their physical body. So in that sense, I don't think full, real-scale simulation with large amounts of data is needed for that type of developmental intelligence. That being said, a lot of the directions currently being developed, pouring all the data into world models and video generation, still might get there before we even figure out the developmental questions for humans.

 

Ravid Shwartz-Ziv: And do you think we'll see bigger and bigger models with more compute? Right now you need who knows how many GPUs in order to train these models. Do you think we will see this in the future as well? Or will other approaches come? Smaller models, I don't know, very specific models for specific types of tasks or domains.

 

Mengye Ren: Absolutely. I think there is a real opportunity for smaller models. So I was struggling with my OpenClaw agent to do my reimbursement receipts. And they are actually pretty nice models. They sometimes have smarter intelligence than myself, but they couldn't figure out how to do receipts and made the same mistakes all the time. I said, remember this, save this in your skills, save this in your memory. Maybe that's a counterargument to Dario's point on continual learning. I think doing receipts for me is something a small level of intelligence should be able to do after some reasonable number of exposures to that experience. And that would be massively more efficient and cheaper than running the biggest model, one that can do math proofs, only to do receipts.

 

Allen Roush: It's funny. I just want to quickly mention, it's funny you bring up the receipts example. I've actually found Claude Code and Codex are very capable of processing receipts. And I think it's because they end up using state-of-the-art OCR tools sometimes to actually go look at your receipts and read them. And if you've taken decent enough photos or anything, I've had success with this, where I had like zero or maybe one inaccuracy in the resulting invoices that I had to fix. But I have also found OpenClaw seems to be a lot less effective at getting most things done than Claude Code or Codex. And so I've actually been disappointed with OpenClaw for everything except sending messages from my Slack to tell my bot to send a message to somebody else, right? In iMessage, right? Being a gateway, basically.

 

Ravid Shwartz-Ziv: And why do you think it's happening? Because of the harness? Why do you think you see such a huge difference that you don't see in the benchmarks themselves?

 

Allen Roush: I think that's... Mengye, I think that's for you.

 

Mengye Ren: Sorry, I thought that was a question for Allen. I think you brought up an interesting example. There are differences for sure. And I think it's the way the context is engineered by OpenClaw, maybe having too much distracting information, like everything, the history, what the world is. And then it's like, OK, here is a receipt, I need to do OCR. Right. But my pain point wasn't really about OCR in my real usage. It's actually just email receipts. And they couldn't get them organized in a Google Doc. It had all kinds of mistakes, messed up the dates, couldn't print them from my email, all kinds of silly mistakes. So I have been disappointed by the usefulness of an agent that does the simple task. And I believe that the future probably doesn't need all that high-level intelligence for every single task out there. And there's a huge opportunity to develop smaller models that can continue to learn, accumulate their experience, and improve.

 

Ravid Shwartz-Ziv: So you spent time at Google and now you are at NYU. What do you think is the role of academic research? Do you think it is still relevant these days, when you need so many GPUs in order to train frontier models?

 

Mengye Ren: Yeah, I think academic research is not the same as running GPU training jobs. I think it's really about exploring the unknown, being willing to explore the unknown boundary of human knowledge. So I think that's really what we stand for: to understand, in the most abstract form, who we are and where we are heading.

 

Ravid Shwartz-Ziv: And do you think we will see a combination or some integration between training frontier models and academic research, or exploring the unknown? Or do you think we will see academic labs doing very unique things while frontier industrial labs just try to do the same thing but scale it up?

 

Mengye Ren: Well, I think the type of work is going to fundamentally change now with AI coding, AI writing, and all of that, influencing not just academic research but a lot of different spheres. In academic research, I think human insights and feedback loops are still very necessary. It can't just figure everything out. It can get a very long way from where I am, and the amount of stuff it writes would maybe take me two weeks of work without AI, but it makes mistakes all along the way, and I have to really give feedback so that it can come back and rethink. So the type of work used to be that I make a small step but think for a longer time, whereas now it makes a huge jump and then jumps all the way back, and then makes another huge jump and jumps all the way back. So we all have to adapt to that type of workflow, but so far, and hopefully for the foreseeable future, humans stay relevant in providing the creativity and insights.

 

Allen Roush: And do you think it's inevitable that humans retain an advantage in creativity? You know, LLMs are often really not that funny relative to how intelligent they are at other things, and they also struggle at many other creative tasks. So what's your take on that issue, in any type of generative model, really, for any modality?

 

Mengye Ren: Yeah, I think the fundamental thing is we haven't understood creativity yet and we are ready to give it away. So I think that's the fundamental issue.

 

Allen Roush: That's quotable dialogue right there. We haven't even understood it yet, and we're ready to give it away.

 

Ravid Shwartz-Ziv: Okay, so what do you think are the most interesting ideas to explore? Like, areas of research to explore?

 

Mengye Ren: Right now, I think there are technically a few areas where AI is fundamentally lacking today. First of all, an embodied system that learns from visual signals, grounds them in physical actions, and is able to perform abstraction along the way in a learning environment. Secondly, continual learning and memory: being able to accumulate experience and learn in data-scarce environments. And thirdly, I think, creativity: understanding how models sample, how they can explore, and how they can help us realize the fullest form of exploration and creativity. But in general, I think we have to understand more from the perspective of human intelligence, and only by understanding humans can we imagine a future where AI helps us flourish.

 

Ravid Shwartz-Ziv: And take creativity, for example. What do you mean by creativity? Let's say we can define the creativity that humans have, right? Do you think models should have the same type of creativity? Or should they have, I don't know, another type that will help us be more creative? How do you see it?

 

Mengye Ren: There are different perspectives on that. I will raise one perspective, but I'm going to have another paper coming out on creativity, so I won't give it all away. One aspect is that creativity is deeply bound to experience. Imagine you have the same training data, OK? No matter how you sample the model, that creativity is the same. It's from the same perspective. It is from the same experience. And the reason why we are able to create different things is precisely because we each had a different trajectory coming into this world and sitting here right now. And I think understanding the influence of perspective, or subjective experience, on creativity is core to that. And of course, there is how we sample. I don't think a softmax or a diffusion model has an understanding of creativity. To some extent, they are able to sample from a distribution, but that is very much bottom-up. By bottom-up, I mean that you're perturbing a Gaussian or you sample directly from a softmax. You have no awareness of how it is going to... It is like me trying to decide what I'm going to get for lunch. Okay. The moment that I decide, I have already made a decision, but that's not how artists create new things. It needs an intrinsic loop of consideration, thinking, and understanding. And all of that needs to be going on in the process of creating new things. So yeah, my takeaway is that you can't just have a bottom-up process of sampling the future.

 

Ravid Shwartz-Ziv: But what types of new things do you think we can get? You know, that we can get today, right? Like, what types of thinking or output or results or new information?

 

Mengye Ren: What type of new things we can think of today?

 

Ravid Shwartz-Ziv: No, like you said, okay, today we don't have this creativity, right? We are not following the right process. But what things will be unlocked when we have the right process, when we figure out how to be more creative with models?

 

Mengye Ren: Yeah, I think it's all a design process, a human and AI interaction process to be figured out. Right. And the way many people use AI today is dumping in a context and taking whatever comes out. I think that way we're likely going to experience a collapse of creativity.

 

Allen Roush: Well, how important then is agent-to-agent communication, right? Because it seems like it's all about context management, context shuffling, and related things. And today a lot of the parallel subagents that get launched basically just launch with a clean context and then report their results back. But isn't there some kind of fancier way? Do you think it's important to figure out agent-to-agent communication?

 

Mengye Ren: I think it's a very important question to figure out. The way we construct agents today is that you have the same base model, and then you construct different contexts and delegate sub-agents for sub-tasks. This is a way to add different perspectives through different context windows, but we don't create perspectives just by launching sub-agents alone. The perspective eventually has to come from different training trajectories.

 

Allen Roush: Yeah. Well, related to that a little bit, how important is model confidence in their outputs? Do you think we need to have models find some way to report their epistemic uncertainty? And further, do you even think there are ways to be accurate about reporting this kind of thing? I mean, I suppose humans have problems with this too, right? But do you think we can ever achieve some notion of that with LLMs or with any generative model?

 

Mengye Ren: Yeah, it's a great question. In fact, I think as LLMs' capabilities grow, they have better control over reporting uncertainty in general, and especially in a type of task that we care a lot about in our lab, which is the forecasting task. And it has been measured that better LLMs report their uncertainty better. We also study a related question, which is how LLMs can also align with our perception of uncertainty. But in the end, fundamentally, you know, the world is only lived once, we only live once, so there's no fundamental way to tease out the epistemic uncertainty fully unless we cast some things as an equivalence class and then count those as repeated samples. So I think it's going to be a fundamental challenge, but better uncertainty will unlock better capability for sure.
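
Illustrative sketch (not from the conversation): one common way to score how well stated forecast probabilities match what actually happened is the Brier score, the mean squared error between each forecast and its 0/1 outcome. The episode does not say which metric the lab uses, so the choice of Brier score, the function name `brier_score`, and all the numbers below are assumptions made only for illustration.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.

    Lower is better; always answering 0.5 scores 0.25.
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    squared_errors = [(p - o) ** 2 for p, o in zip(forecasts, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical data: four yes/no events and two made-up forecasters.
outcomes = [1, 1, 0, 0]
hedged = [0.9, 0.8, 0.2, 0.1]          # confident but not absolute
overconfident = [1.0, 1.0, 0.0, 1.0]   # certain every time, wrong once

hedged_score = brier_score(hedged, outcomes)
overconfident_score = brier_score(overconfident, outcomes)
print(hedged_score, overconfident_score)
```

A single confident miss costs the second forecaster more than all of the first forecaster's hedging combined, which is the sense in which a score like this rewards honest uncertainty. It only works because many events are pooled together, which is the "equivalence class" move described above.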

 

Allen Roush: Well, and you mentioned the softmax earlier. I mean, that's one way that, you know, conversion of things to a 0-to-1 spectrum happens, right? Or hopefully I'm not mixing that up with sigmoid, I always seem to. But one of those two things refers to that. What's your take on the role of pure math in this kind of stuff?

 

Mengye Ren: Yeah, I think it's one type of mechanism that allows you to do bottom-up sampling, right? The way LLMs come up with uncertainty these days is also from that process, but you actually ask, from 0 to 100, how likely is that event to happen? And they will predict a token, which is, for example, 68%, right? And that's coming from the softmax distribution. So the model has a good understanding of what that token means numerically and how it maps to a distribution of uncertainty, rather than relying on the raw softmax distribution.

 

Allen Roush: Makes sense. Yeah, and I'm realizing that they both actually map everything to a zero-to-one range, it just depends on whether it's binary or multi-class. I think you're muted, Ravid.
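
Illustrative sketch (not from the conversation): the relationship between the two functions can be checked numerically. Sigmoid maps one score to a probability in (0, 1); softmax maps a list of scores to probabilities that sum to 1. For two classes where the second score is fixed at 0, the softmax probability of the first class equals the sigmoid of its score. The example scores are arbitrary.

```python
import math


def sigmoid(z):
    """Map a single real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    """Map a list of scores to probabilities that sum to 1."""
    largest = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


multi_class = softmax([2.0, 1.0, 0.1])    # three classes, sums to 1
two_class = softmax([1.5, 0.0])[0]        # two classes, second score pinned to 0
binary = sigmoid(1.5)                     # same number as two_class

print(multi_class, two_class, binary)
```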

 

Ravid Shwartz-Ziv: One more thing, what about tokenization? Do you think we will actually get rid of it at some point? Or do you think this is something that we will build more deeply into our systems, right? Like with all the vision tokens, that we actually share one tokenizer for all of them.

 

Mengye Ren: So you're asking about tokenization, as in different ways to tokenize stuff?

 

Ravid Shwartz-Ziv: Yeah.

 

Mengye Ren: Yeah, I think there could be better ways to tokenize stuff. In my view, things should be compressed based on their own environments. As the environment changes, maybe you need a different tokenizer, you need to change the distribution, you need to compress it differently. But so far, for a static world, having a fixed tokenizer might work very well for a lot of things.

 

Ravid Shwartz-Ziv: What do you mean by static vs. dynamic world? What are the cases where you have a dynamic world?

 

Mengye Ren: Yeah, so for example, maybe a new type of signal starts coming in more often. And then all of a sudden, your old distribution doesn't match your new distribution. Before, maybe you allocated a lot of the compression budget to the old frequent signals. And now, all of a sudden, you have to capture these new frequently occurring things in your dictionary. I think that's what I meant by this changing world, and being able to update your dictionary based on the changing distribution.
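
Illustrative sketch (not from the conversation): a toy version of a dictionary that tracks a shifting distribution. It keeps exponentially decayed counts of the symbols it has seen and gives its limited slots to whichever symbols are currently most frequent. The class name `AdaptiveDictionary`, the decay rule, and the symbol stream are all hypothetical; real tokenizers are far more involved, and nothing here is claimed to be the approach the speaker has in mind.

```python
class AdaptiveDictionary:
    """Fixed-capacity dictionary that follows recent symbol frequencies."""

    def __init__(self, capacity, decay=0.9):
        self.capacity = capacity  # how many symbols get a dictionary slot
        self.decay = decay        # how quickly old evidence fades
        self.counts = {}

    def observe(self, symbol):
        # Fade every existing count, then credit the new observation.
        for key in self.counts:
            self.counts[key] *= self.decay
        self.counts[symbol] = self.counts.get(symbol, 0.0) + 1.0

    def entries(self):
        # The slots go to the symbols with the highest decayed counts.
        ranked = sorted(self.counts, key=self.counts.get, reverse=True)
        return set(ranked[: self.capacity])


dictionary = AdaptiveDictionary(capacity=2)

# Old regime: "a" and "b" are the frequent signals.
for _ in range(20):
    dictionary.observe("a")
    dictionary.observe("b")
before_shift = dictionary.entries()

# New regime: "x" and "y" suddenly start arriving instead.
for _ in range(20):
    dictionary.observe("x")
    dictionary.observe("y")
after_shift = dictionary.entries()

print(before_shift, after_shift)
```

With a static dictionary the slots would stay on "a" and "b" forever; here the decayed counts let the new signals take over once they dominate the recent stream.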

 

Ravid Shwartz-Ziv: And do you think you should do it during inference? Or is this something that you actually need to learn?

 

Mengye Ren: Yeah, I think the system should just always be learning.

 

Ravid Shwartz-Ziv: But then during inference, you need to detect what the right hidden variable of the world is and use it to predict what the current environment is.

 

Mengye Ren: No, I think we cannot know what the right latent variable of the world is. We can only try to compress our own experience and then come up with latent variables that are more useful for predicting our next experience.

 

Ravid Shwartz-Ziv: So in general, what do you think about latent variables? Do you think this is something that we should train explicitly?

 

Mengye Ren: Yes, I think latent variables are excellent abstractions of the world, and for embodied agents they provide the right level of abstraction and data efficiency. And if there are objectives for latent variables, JEPA for example, I think we should definitely explore that as an option.

 

Ravid Shwartz-Ziv: And why do you think we don't see it enough? Like, let's say, the frontier models are not using latent variables.

 

Mengye Ren: I think they have enough data.

 

Ravid Shwartz-Ziv: So do you think that when we go to environments where we don't have enough data, we will need to use explicit latent variables?

 

Mengye Ren: When we don't have enough data, or maybe when the data keeps changing, right? Then every single window of that sequence doesn't have enough data. And then we will have to use latent variables to constrain the types of solutions that we...

 

Allen Roush: And what do you think, I know this is a slight change of topic, but what do you think about the future of machine learning education? I mean, some of these AI systems are now so good that they're writing experiments, and even some of the theory that we used to teach, you know, things like, I mean, maybe you used cross-validation. It was drilled into my head how important cross-validation was for, you know, verifying results. And now such a thing is computationally infeasible on pretty much any type of experiment. And then, you know, I remember double descent, triple descent now. Whatever happened to the basic theory of how you did things circa 2018 to 2020? So how do you see the evolution of that? And I ask this because you are a professor, right?

 

Mengye Ren: Yeah, so I think there are different questions in that. What is the importance of a fundamental understanding of ML, and what is the importance of education? I think education is a very broad topic. We provide education, we encourage education, because we want people to have a brighter future, to have more options to explore their passions and what they want to build for the world. As for a fundamental understanding of ML, I think it's still very much necessary. There are, you know, fundamental limits and fundamental insights about what intelligence could arise from these systems and why it arises. And I think today we're still lacking the full picture of that understanding. It's especially important as we move towards this online, continual, experiential learning regime for always-learning agents. There, you don't actually have enough data points. So then, what are the fundamental limits on what your agents can learn and generalize from that limited number of data points? I think that's still very relevant to today's frontier. So I hope that answers the two parts of your question.

 

Ravid Shwartz-Ziv: But people will say, okay, so if you don't have data points now, that's fine. We'll just put in, I don't know, a billion dollars and we'll collect these data points. What's the problem?

 

Mengye Ren: Yeah, so eventually we will run out of these data points unless we install a camera on every single person's head and start collecting their own experience. And I don't think that's a future we all want to head into.

 

Ravid Shwartz-Ziv: Yeah, fair enough. But do you think there is a trade-off between collecting more data points and understanding the models better and trying to structure them better? Do you think that, besides data efficiency, we are getting something more fundamental by using your approach, if you will?

 

Mengye Ren: Yeah. So I think the more fundamental aim is to understand human intelligence. And human intelligence is a working example that provides the breadth of intelligence. We're not all math Olympians, but we can do a variety of tasks, ranging from physical to cognitive tasks. And we're able to adapt to new environments with a limited number of data points, without having to read the internet first. So I think my main motivation is to understand human-level intelligence through the lens of continual learning and data-efficient learning, embodied in environments.

 

Ravid Shwartz-Ziv: Okay, I think we are almost out of time. Do you have anything else that you want to add? To promote?

 

Mengye Ren: No, I think that was great. Yeah, I think we touched upon a lot of topics, very exciting topics, including some of those I'm currently thinking about, but I won't be able to give a full definition because I'm also working on papers related to these interesting topics. But yeah, that was a great conversation.

 

Allen Roush: Yeah, I really enjoyed it. I don't sit down too often and talk to people who are working in so many subparts of machine learning at the same time. You really have a broad kind of skill set.

 

Mengye Ren: Thanks. Yeah, I appreciate your questions as well. I think a lot of them are really pushing me to think harder. And I hope some of the answers can, through your channels, inspire the audience. But yeah, I hope you like it, and we should stay in touch.

 

Ravid Shwartz-Ziv: I'm sure. Thank you so much, thank you to the audience, and see you next time.

 

Mengye Ren: Thank you.