March 23, 2026

Why Healthcare Is AI's Hardest and Most Important Problem with Kyunghyun Cho (NYU)


We talk with Kyunghyun Cho, who is a Professor of Health Statistics and a Professor of Computer Science and Data Science at New York University, and a former Executive Director at Genentech, about why healthcare might be the most important and most difficult domain for AI to transform. Kyunghyun shares his vision for a future where patients own their own medical records, proposes a provocative idea for running continuous society-level clinical trials by having doctors "toss a coin" between plausible diagnoses, and explains why drug discovery's stage-wise pipeline has hit a wall that only end-to-end AI thinking can break through. We also get into GLP-1 drugs and why they're more mysterious than people realize, the brutal economics of antibiotic research, how language models trained across scientific literature and clinical data could compress 50 years of drug development into five, and what Kyunghyun would do with $10 billion (spoiler: buy a hospital network in the Midwest). We wrap up with a great discussion on the rise of professor-founded "neo-labs," why academia got spoiled during the deep learning boom, and an encouraging message for PhD students who feel lost right now.


Timeline:

(00:00) Intro and welcome

(01:25) Why healthcare is uniquely hard

(04:46) Who owns your medical records? — The case for patient-controlled data and tapping your phone at the doctor's office

(06:43) Centralized vs. decentralized healthcare — comparing Israel, Korea, and the US

(13:19) Why most existing health data isn't as useful as we think — selection bias and the lack of randomization

(16:53) The "toss a coin" proposal — continuous clinical trials through automated randomization, and the surprising connection to LLM sampling.

(23:07) Drug discovery's broken pipeline — why stage-wise optimization is failing, and we need end-to-end thinking

(28:30) Why the current system is already failing society — wearables, preventive care, and the case for urgency

(31:13) Allen's personal healthcare journey and the GLP-1 conversation

(33:13) GLP-1 deep dive — 40 years from discovery to weight loss drugs, brain receptors, and embracing uncertainty

(36:28) Why antibiotic R&D is "economic suicide" and how AI can help

(42:52) Language models in the clinic and the lab — from clinical notes to back-propagating clinical outcomes, all the way to molecular design

(48:04) Do you need domain expertise, or can you throw compute at it?

(54:30) The $10 billion question — distributed GPU clouds and a patient-in-the-loop drug discovery system

(58:28) Vertical scaling vs. horizontal scaling for healthcare AI

(1:01:06) AI regulation — who's missing from the conversation and why regulation should follow deployment

(1:06:52) Professors as founders and the "neo-lab" phenomenon — how Ilya cracked the code

(1:11:18) Can neo-labs actually ship products? Why researchers should do research

(1:13:09) Academia got spoiled — the deep learning anomaly is ending, and that's okay

(1:16:07) Closing message — why it's a great time to be a PhD student and researcher


Music:

  • "Kid Kodi" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
  • "Palms Down" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
  • Changes: trimmed


About: The Information Bottleneck is hosted by Ravid Shwartz-Ziv and Allen Roush, featuring in-depth conversations with leading AI researchers about the ideas shaping the future of machine learning.

Ravid Shwartz-Ziv: Hi everyone, and welcome back to The Information Bottleneck podcast. Today we have a really great guest, Kyunghyun Cho. He's the Glen de Vries Professor of Health Statistics, and a professor of computer science and data science at New York University. Hey!


KC: Thanks for the invitation.


Ravid Shwartz-Ziv: Thank you for coming. And as always, we have Allen. Hey Allen!


Allen Roush: Good to see both of you. I'm back in North America again, and I have so many questions. This is such a special guest, one who's right up there. Obviously nobody's quite Yann LeCun, right? But we're getting close again, thank goodness. So this is very, very dignified praise that I'm trying to give here.


KC: Thank you.


Ravid Shwartz-Ziv: Yeah, so today we are going to talk about so many topics, because you're doing so many things. I think we can start with healthcare. It's such an important topic and so timely. So let's start with how you see the field. There are so many changes in LLMs and in AI, right? Every day we have something that


KC: Mm-hmm.


Ravid Shwartz-Ziv: Big Tech tries to take on as it moves into this field. How do you see it in healthcare? Do you see this phenomenon applying to healthcare as well? Are we already there, or getting closer to being dominated by LLMs, or do you see something else?


KC: Yes. Yeah, I'm actually very happy to see that many of these so-called frontier labs, as well as the scientists in them, are thinking about this major issue in our society called healthcare. Healthcare is very unique. It's unlike any other industry sector we can imagine, because healthcare is something we cannot opt out of. Almost everyone in modern society is born inside healthcare. Even if you are born at home, there are often midwives, and midwives are clinicians who are part of healthcare. And eventually everyone dies within healthcare as well. In fact, the death certificate is issued by clinicians, and they are all healthcare professionals too.

What that means is that we cannot approach this by simply saying, well, here's an industry sector, we're going to revolutionize it, because one small change is going to affect everyone in society. So we have to be very careful, and because of that, it's very difficult to change anything. In order to actually change anything, we have to approach it extremely seriously. I see that OpenAI and Anthropic joining in is making people think about healthcare a lot, and I think that's already a great first step.

Unfortunately, because of this conservative nature of healthcare, we can't change any single aspect of it that easily. I worked on drug discovery at Genentech for the past five years, and even drug discovery, which is a small part of overall healthcare, could not be changed that easily. Although we have amazing AI systems for drug discovery and molecular design, those are tackling just the tiniest possible part of healthcare. And when you have this kind of large system, there's always the question: will this large system change if we start changing it one small piece at a time? That seems like a very difficult thing. But at least we have now started to look at the whole thing, and hopefully we'll be able to change it sometime soon. The idea is: how do we think about healthcare from scratch using this new technology, so-called AI?

I can give you one example of how I think about it. There is a weird thing you have probably noticed if you've ever been to a clinic or a hospital, or even if your family members have: hospitals maintain patient records. And if you think about it, patient records are, legally as well as inherently, your own, the patient's own. Nowadays everyone carries multiple devices that have gigabytes of memory, and we all have access to the cloud, where we have another, let's say, tens of gigabytes of storage. But somehow we never maintain our own patient records. The patient record is always maintained at a hospital, in the case of the US, or, in other countries, at some centralized organization, government-driven or otherwise.

So the big question I started to have is: why is that the case? In fact, it's a really weird thing. And one more thing: we have to ask hospitals for our medical records, and sometimes hospitals can delay or not even answer our request. That's where everything becomes weird. But if you think about the historical context, it's not weird at all. Why? Because record keeping has never been that easy. Individuals were not able to easily and safely keep paper-based records at home. They would move around, they would lose them, and if there was a fire, everything was gone. So doctors maintained the records of their patients on behalf of those patients.

Now, of course, times have changed, but somehow the system has not changed at all. And this brings up a lot of issues. Because I don't own and carry my own patient record, every time I go to a new hospital, a new doctor, or a new clinic, or if I move to another country, everything starts from scratch. I can of course bring my own records, but how often do we actually do that? We just start from scratch. What that means is that all the information about me, or about any patient, is always fragmented across multiple hospitals, multiple clinics, and even multiple countries.

These are things we cannot simply change overnight, because we have to ensure that everyone can still maintain their patient record without spending too much effort. We need to make it happen for everyone in society. But it has to happen somehow. So there are a lot of complications, and a lot of things we think we can do much better with more recent technologies and with future technologies like AI. But it will take some time. That's the reason why I welcome OpenAI, Anthropic, Microsoft, and all those big companies, along with the amazing research hospitals, working on this more so than before.


Ravid Shwartz-Ziv: But I have a question. It looks like you don't need these sophisticated AI systems for sharing information, right? You can just pass some regulation so that you have one general main system. For example, in Israel there is a system that is shared between hospitals and some clinics. And every doctor, if you give them permission,


KC: Mm-hmm. Mm-hmm. Mm-hmm.


Ravid Shwartz-Ziv: can access your records, right? You don't need all these things. So do you think AI will make it easier? Or do you think this is just one step on the way to get to the real...


KC: Absolutely. You're absolutely correct. In fact, what we really want is for everything to be maintained by us, directly. Of course, it's going to be outsourced to some of these companies. Maybe Google is going to provide this kind of personal health record management service, or Apple may do that. Apple is already doing that to a certain degree with Apple Health. What we want is to go to the healthcare provider, literally tap our phone, and give them access for six months. The nice thing is that the doctor I gave access to will have my full record. The history of the patient and the family history are the two most important things for making a proper diagnosis and choosing a treatment, so this is going to improve the quality of your healthcare so much.

The interesting thing is that this is going to happen eventually, but it requires us to change the entire backend of healthcare. That involves not only the medical records of the patients themselves, but also the billing information. Who's actually going to pay for it? This connects to the earlier discussion we had about the inability to opt out. Because no one can opt out of healthcare, everyone is paying for everyone else. So we have to keep track of it really carefully and always convince everyone else that they need to pay for my healthcare. This makes it really, really complicated. So how are you going to update that? That's yet another thing that needs to change quite dramatically.


Allen Roush: So, I probably should have asked this right before we started recording, because I don't know. You spoke about trying to do work in Korea, and I don't know your citizenship status. The reason I ask is that when you're talking about healthcare, as an American, this is the most American problem ever. But I also know that the history, especially of modern Korea,


KC: Okay. Uh-huh. Right. Mm-hmm.


Allen Roush: I will just say it was easy as an American to do things there. I felt like America brought a lot of things, possibly a more privatized healthcare system. So first of all, please correct me if anything I've said is wrong in that analysis. But can you describe any differences between Korea's healthcare system and the US's in the context of the AI question, if you know those questions well?


KC: Yeah, I have some thoughts, of course, though I don't know too much about all those things. I am a dual citizen, Korean and also American. I was born and raised in Korea, so I experienced healthcare in Korea. I've also been here for more than 10 years now, so I've experienced a lot of the US system as well. Fortunately, I'm a faculty member at NYU, which has its own medical school, and that gives us a bit more of a superpower than many other people have. That's been a privilege I've enjoyed.

But of course the healthcare systems are very, very different. The Korean healthcare system is more in line with every other country's healthcare system: highly centralized. The US healthcare system, as we all know, is extremely decentralized. It's easy to think of it as public versus private, or single payer versus multiple payers, but I think it's more correct to think of it as centralized versus decentralized, because you can have a publicly mandated decentralized healthcare system as well. It doesn't have to be fully centralized. Meanwhile, there can be a centralized private version too.

The main difference is that what doctors can do for patients, and what is going to be covered by insurance, is determined extremely carefully and centrally in Korea, as in many other countries. In the US there are a lot of rules, and of course the insurance companies together with Medicaid and Medicare, which are in fact the largest payers in the US, determine what is covered, but it's relatively decentralized. You can always find a new insurance company, or you can decide to simply pay out of pocket for treatments that the insurance companies do not cover.

This makes it a really interesting case, in the sense that there is much more incentive and encouragement in the US to try things at the frontier, much more rapidly. That of course is the reason why there is a lot of progress in the US, both in healthcare generally and in AI for healthcare. But it turns out that centralized systems like Korea's have a distinct advantage: the data that drives a lot of the innovation in AI is stored in one place. Once researchers have access to it, they can look at the entire country's records and try to build a system that tackles the issues that actually exist in the country. So it's two sides of the same coin, because at the end of the day healthcare in every country has to cover everyone. No one really opts out. Even in the US, it's kind of impossible to opt out of the healthcare system. But there is that major difference.

The interesting thing is that we have the Global AI Frontier Lab that we created at NYU. It is a collaboration between NYU and the Korean government, and about seven institutions from Korea are participating in the lab. We have three research teams, and the third team is AI for healthcare. For that one, NYU Langone researchers and I are working together with university hospitals from Korea, and that's where we learn a lot about the differences as well as the potential synergies.


Ravid Shwartz-Ziv: So, okay, let's assume we know that it's very complicated to collect the data, merge it, and unify it. First of all, what types of data do you actually want, or actually think are useful? And then, once you have this data, what are you going to do with it?


KC: Hmm. Yeah, that's a great question. I don't think a lot of the data we have already collected in healthcare, whether that's patient data more narrowly or, say, molecular data for drug discovery, is as useful or as informative as people think it is. The reason is that many of these datasets were collected without a purpose that aligns well with how we build AI systems. For instance, I have this outrageous idea that I've been having over the past two or three years. I wrote a blog post on my homepage, and that made me hated by many doctors even more than before. If I just repeat the content here once more: a lot of the EHR data is somewhat useless for the purpose of, for instance, counterfactual or causal inference. The reason is that almost no randomization happens when you go to the hospital. And why is that? This is really a mystery to me, or it was a mystery to me, because there is always uncertainty. When it comes to diagnosis, as well as the treatment that healthcare providers give to patients, the level of uncertainty is dramatic. In fact, almost nothing in healthcare, as far as I can tell, is certain. Even defining a particular disease is an extremely difficult problem.


Ravid Shwartz-Ziv: Do you mean in the data? Or do you mean in diagnosing the specific disease, the specific problem that you have?


KC: In fact, yeah, for everything you can imagine. Let's think about diagnosing a particular disease. For that to happen, we need a very clear definition of that disease, but we often don't have one. A disease is really just a cluster of symptoms. Unless it's a particular infectious disease where we know the infectious agent, it's almost always just a cluster of symptoms, and often it is not even a single disease. What do we need next? We need to know about the patient. What kind of patient am I looking at? It's really difficult to tell, because we don't have the full history of the patient. We don't have the family history. And patients often don't share all their information with doctors. There's a bit of a trust issue, right? That's a somewhat unique problem in the US because of the modern history here. All of this makes it impossible for any clinician to make a hundred percent certain diagnosis, or even a treatment recommendation.

At the end of the day, what we would say as machine learning researchers or statisticians is that if there is uncertainty, then yes, they need to toss a coin and choose according to this random number generator, as long as the options all have high probability. Unfortunately, that's not what happens. Say one clinician sees 10 patients who all have similar symptoms, similar patient history, and everything, and say there is some uncertainty in the diagnosis. The doctor is going to give the same diagnosis to all of them. They're not going to toss the coin. And so, in fact, we have a huge amount of selection bias in our data. Thereby,


Ravid Shwartz-Ziv: you


KC: this kind of data is not going to be that useful for building causally consistent diagnosis systems. What that means is that the data is there. We have a massive amount of patient data: all the images, lab measurements, and the prescriptions and treatments that patients have been given, and sometimes even records of what they've been eating, right? Diet histories and so on. All of that is there. What we don't have in a well-controlled form is what was given to them together with what alternatives were considered. We actually have no idea about the alternatives.

So what we can do, I think, is to train and either require or encourage clinicians to toss the coin. What I have been suggesting is this. Every time you see a doctor or a clinician, they all write a note eventually. Right before they make a diagnosis, write a prescription, or recommend a treatment plan, we want them to write down at least two or three plausible diagnoses or plausible treatment plans on the screen. The system then chooses one for them automatically, and they just follow that one. So there's automated randomization happening, and what that means is that you are running a clinical trial nonstop, forever and ever.
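
[Illustration, not part of the conversation] The selection-bias point above can be made concrete with a toy simulation. This is only a sketch under invented assumptions: a hidden `severity` variable, a hypothetical rule by which a deterministic clinician always treats the sicker patients, and a made-up true treatment effect of +1.0. None of these numbers come from the episode.

```python
import random


def naive_effect_estimate(n, randomized, seed=0):
    """Return mean(outcome | treated) - mean(outcome | untreated).

    All quantities are invented for illustration:
    - severity: hidden confounder, uniform in [0, 1)
    - true treatment effect: +1.0
    - severity lowers the outcome by 2.0 * severity
    """
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        severity = rng.random()
        if randomized:
            # The "coin toss": assignment ignores the patient entirely.
            gets_treatment = rng.random() < 0.5
        else:
            # Deterministic clinician: sicker patients are always treated.
            gets_treatment = severity > 0.5
        outcome = 1.0 * gets_treatment - 2.0 * severity + rng.gauss(0, 0.1)
        (treated if gets_treatment else untreated).append(outcome)
    return sum(treated) / len(treated) - sum(untreated) / len(untreated)


if __name__ == "__main__":
    print("observational estimate:", naive_effect_estimate(20_000, randomized=False))
    print("randomized estimate:   ", naive_effect_estimate(20_000, randomized=True))
```

In this toy setup the deterministic policy makes the treated group systematically sicker, so the naive comparison lands near zero even though the treatment helps, while the coin-toss policy recovers an estimate close to the assumed +1.0.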


Ravid Shwartz-Ziv: But you also have the problem of precision and recall, right? And this is not a constant threshold. It may be that different doctors will need to put the threshold in a different place for different people.


KC: You're absolutely correct. In fact, doctors don't really set thresholds, because they cannot come up with a proper distribution over all possible diagnoses anyway. So we are using them to get the top-K diagnoses, let's say, and among those we treat them all as equivalent and choose one uniformly at random for them. Now this is, I have to say, a very difficult idea to chew and digest. I've got to say that's actually the case. But in this way we can run this kind of nonstop clinical trial at the society level, and the data collected will allow us to build a universal causal inference engine for healthcare. I think that's the dream we all have.
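
[Illustration, not part of the conversation] The "write down two or three plausible options and let the system pick one uniformly" step could be sketched as follows. The `Assignment` record, the `randomize` function, and the option names are all hypothetical; the only parts taken from the discussion are the uniform choice among clinician-listed options and the need to record which alternatives were considered.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Assignment:
    options: tuple      # every plausible option the clinician listed
    chosen: str         # the option the system picked
    propensity: float   # probability with which `chosen` was picked


def randomize(options, rng):
    """Pick one of the clinician's plausible options uniformly at random.

    The alternatives and the selection probability are stored with the
    choice, so later analysis knows what else was considered.
    """
    unique = tuple(dict.fromkeys(options))  # drop duplicates, keep order
    if len(unique) < 2:
        raise ValueError("need at least two plausible options to randomize")
    chosen = rng.choice(unique)
    return Assignment(options=unique, chosen=chosen, propensity=1.0 / len(unique))


if __name__ == "__main__":
    rng = random.Random()  # unseeded here; a real system would need auditable randomness
    print(randomize(["diagnosis_x", "diagnosis_y", "diagnosis_z"], rng))
```

Storing the propensity alongside the choice is a design assumption of this sketch: it is what would let downstream causal analysis weight each record correctly if the number of listed options varies from visit to visit.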


Allen Roush: I just want to intervene for a moment. This is a fascinating idea. I've never thought about this, by the way. I also want to point out that it collapses the problem into one nearly identical to LLM sampling, and anybody who knows our podcast, or at least knows me, knows that is near and dear to my heart. You mentioned top-K, but you can devise equivalents to min-p, for example, which Ravid and I are both on, where we say the predicted probability influences how many


KC: Yeah. Mm-hmm. Yeah? Mm-hmm.


Allen Roush: selection candidates we have. And when you start explaining more sophisticated forms of sampling in terms of what you're doing, it becomes a lot easier for people to swallow. Top-K, though, will really encourage either some diversity or some weird behavior from doctors in the system when they feel really confident. But I love this in the same way that I love taking some ideas about civics from movies or books like Starship Troopers.


KC: Absolutely. Uh-huh. Very right. Yep. Uh-huh. Uh-huh.


Allen Roush: It's almost an idea that seems like it would come from a movie or a book like that. So I just want to hear more. This is fascinating.
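
[Illustration, not part of the conversation] The contrast between a fixed top-K and a min-p style rule can be sketched with a small filter. In min-p sampling, a candidate is kept only if its probability is at least `min_p` times the probability of the most likely candidate, so a confident distribution yields few candidates and an uncertain one yields many. The probabilities and labels below are invented, and applying this to clinical options is purely hypothetical, especially since, as noted in the conversation, clinicians cannot readily produce calibrated distributions.

```python
def min_p_filter(probs, min_p=0.1):
    """Keep candidates whose probability is >= min_p * max probability,
    then renormalize the survivors so they sum to 1."""
    threshold = min_p * max(probs.values())
    kept = {name: p for name, p in probs.items() if p >= threshold}
    total = sum(kept.values())
    return {name: p / total for name, p in kept.items()}


if __name__ == "__main__":
    confident = {"option_a": 0.90, "option_b": 0.06, "option_c": 0.04}
    uncertain = {"option_a": 0.40, "option_b": 0.35, "option_c": 0.25}
    print(sorted(min_p_filter(confident)))  # only the dominant option survives
    print(sorted(min_p_filter(uncertain)))  # all three remain candidates
```

A fixed top-K of three would have forced randomization across all three options in both cases, which is the "weird behavior when confident" concern raised above.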


KC: Ha ha ha. Yeah, you're absolutely correct. And this connects to all those basic, fundamental concepts in machine learning, as you pointed out. If you go all the way down, at the end of the day almost every learning problem can be written down as a reinforcement learning problem with some form of reward function. The most challenging part is not the learning process itself. Fitting a parametric form of a function, I think we have figured out over the past three decades. We have residual connections, we have attention, and nowadays we have Muon and all those amazing stochastic optimizers. So we figured that part out.

The one thing that we'll never be able to figure out, in my opinion, is how to do exploration, because exploration is inherently domain-specific or problem-specific. There cannot be an arbitrary exploration scheme that works, because I can always come up with an arbitrary reward function that defeats the whole exploration scheme.

But when it comes to healthcare, what is really interesting is this. The reason ChatGPT and all those conversational LLMs became an overnight sensation is that everyone uses them and everyone will use them, so the amount of data created is massive. If we add a bit of a sprinkle of exploration on top of that, doing top-p, min-p, or any of those sampling algorithms, and get some kind of implicit feedback, then we suddenly have a massive amount of data that allows us to build a system that maximizes whatever reward we want to maximize down the road.

With healthcare, we cannot do that at the moment, because all clinicians swear to do the best for each and every patient. So if they have a tiny suspicion, however miscalibrated, that one diagnosis is better for this patient, then when they see the same kind of patient again, they're going to go for the same diagnosis over and over. Currently, healthcare doesn't train the individuals within the system to think at a group level or at the society level. We only think at the individual level. Changing that is going to give us the kind of data that I think will bring us to the next level of AI for healthcare, of healthcare driven by AI.


Allen Roush: I love it. I love it. It's fascinating. A lot to think about.


Ravid Shwartz-Ziv: So first... So, what do you think is missing? What parts are missing for making this happen?


KC: Let me give a somewhat extreme story about that. Healthcare as a whole is maybe, let's say, too big a chunk, so I'm going to talk a bit about drug discovery and development, where I spent more time over the past four and a half years at Genentech after selling our startup, Prescient Design.

What is difficult is this. Say I want to come up with a new drug. Usually people do not start with the drug. They start by thinking about the disease. What are the diseases that affect a lot of people and that may have a cause based on genetics? If there is an environmental cause, it's usually easier to fix the environment than to come up with a new drug, so we want to try that first. But if there is a genetic cause, we have to come up with a drug, because that's not something we can fix by changing the environment. So let's say we try to find some kind of genetic cause, and we identify the target gene or protein. Once we identify it, and if we think that's the right target to design a drug for, we design a molecule. It could be an antibody, a small molecule, or a different type of modality. We choose one that we know is going to bind to some of the proteins in the pathway leading from the cause to the symptoms. Great, we found it.

Then we need to develop it further. We need to characterize how it's going to work within the body, and we need to characterize the safety profile, because almost everything we design is toxic. In fact, designing a toxic molecule is very easy. Everything is toxic to us. Nature is not friendly to us. So we have to make sure it's actually safe. And on and on, and then we do some more testing before we ask the FDA for the clinical trial. Then we do phase one, which is where we check the actual safety. We recruit the healthiest possible people, 30 to 50 of them, and we increase the dose up to the point at which we see that, okay, there is no safety issue and this is the maximum dose we can go to. If we find that there is such a dose, we go to phase two. That's where we actually test the efficacy. And on and on until phase three, and then we need to manufacture the drug. We need to build a factory and everything.

Along the way, from beginning to end, it takes about five to ten thousand people and up to 50 years. What that means is that no one can easily see the entire thing. Everyone has to look at an individual stage only. At each stage, people have perfected their art. They know what to do. They have criteria they've been following. They know how to check all those boxes. Unfortunately, this is just like what we were talking about in machine learning from 2006 to 2009: layer-wise pre-training. At the moment, every layer, or stage, has been optimized on its own without taking into account what's going to happen later on. As a result, many of these stages have an extremely low chance of success, if you ask whether anything that came out of the stage became an actual, eventual therapeutic. That is because we are not thinking end to end.

It's the same in healthcare. A patient comes in, and we look at the symptoms, or rather not we, I mean the actual doctors look at the symptoms and try to come up with the best treatment plan. However, this takes into account only the particular incident for which the patient came to the hospital. We cannot really think about the much longer term. Can we think about the patient from birth to death altogether? Or can we look at drug discovery as "I come up with the molecule that's going to be eventually successful in the clinic," not "the molecule that passes this stage"? We haven't been able to do that because we don't know how. So the whole system is somewhat stuck in this old-school, stage-wise, funnel-based paradigm.

I think AI is here to get us out of this. And that actually happened within machine learning itself, right? We were doing feature engineering and feature selection, fitting a linear model, and hoping that everything worked well. Later on we went, okay, let's train restricted Boltzmann machines over and over on top of each other to initialize deep neural networks better. That was actually my own PhD dissertation. But eventually the key to success, just like what Yann did in the eighties in his PhD dissertation, was end-to-end training. We know the loss, and we update everything together. That's the best way to ensure that what we build achieves the goal we want, rather than making the individual stages better and hoping for the best. So this kind of conceptual change is the first thing that needs to happen. Some people have started to look into it, but not everyone yet.
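
[Illustration, not part of the conversation] The difference between a stage-wise funnel and selecting against the end goal can be shown with a deliberately tiny toy. Every molecule name and score below is invented, and the `clinical_benefit` column is assumed to be known only for the sake of the example. In practice it is exactly the quantity nobody observes up front, which is why an end-to-end approach would need a model that predicts it.

```python
# Hypothetical candidates: (binding, safety, clinical_benefit). All numbers invented.
CANDIDATES = {
    "mol_a": (0.95, 0.40, 0.10),
    "mol_b": (0.90, 0.60, 0.30),
    "mol_c": (0.70, 0.90, 0.80),
    "mol_d": (0.50, 0.95, 0.20),
}


def stage_wise(cands, keep=2):
    """Funnel: each stage optimizes its own criterion and ignores later ones."""
    # Stage 1: keep the strongest binders.
    survivors = sorted(cands, key=lambda m: cands[m][0], reverse=True)[:keep]
    # Stage 2: among the survivors, pick the safest.
    return max(survivors, key=lambda m: cands[m][1])


def end_to_end(cands):
    """Select directly on the final objective (assumed observable in this toy)."""
    return max(cands, key=lambda m: cands[m][2])


if __name__ == "__main__":
    print("stage-wise pick:", stage_wise(CANDIDATES))
    print("end-to-end pick:", end_to_end(CANDIDATES))
```

In this toy data the funnel discards the candidate with the best eventual outcome at the first stage because it is only a moderate binder, so no amount of care in later stages can recover it.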


Ravid Shwartz-Ziv: But in healthcare, it's much more complicated, right? Like, when you have, I don't know, Terminal-Bench 2, then that's fine, you break some code and you can try again and again, right? But in healthcare, it's much more complicated. First of all, it's life or death, right? And also, it's very expensive, you need to collect people, the physical world is much more complicated. So...


KC: Thank you. Absolutely.


Ravid Shwartz-Ziv: how do you think like we should approach it?


KC: Yeah, absolutely. It's absolutely true that much more is at stake with healthcare. If the code actually breaks, or even when AWS had a 13-hour-long outage, the work continued. But of course, in healthcare, if you have some mistake that goes on for 13 hours at an ICU, then people will really die. So it is really much more serious. And that's the reason why we have to be much more careful, and that's the reason why everything has been moving slowly. But also we have to acknowledge, I think, that healthcare or drug discovery, especially drug discovery, is at the moment already at the stage of failing the whole society. Why is that? Because the success rate of inventing and approving new drugs is only going down, and it's going down rapidly. Of course, Gary was, you know, kind of hammered because he said that deep learning was hitting the wall many years ago, which was incorrect. However, when it comes to drug discovery, I do believe that the existing paradigm has hit the wall. It's just that because the process is so long, we're feeling the fact that we hit the wall very slowly. We're just not sensing it, but we have hit the wall. We need to go over that wall, and AI is going to actually help us. Now, drug discovery is one thing, because we're still working with the molecules up until the last minute. But healthcare, general healthcare, is all about the patients. And of course we cannot run experiments with the patients. The patients are us. Every single one of us gets sick at one point. Thereby we are all patients ourselves. So that's the reason why these kinds of wearables are really important. All those measurements are really important. So trying to track how individuals are evolving over time, and then trying to be as preventive as possible, is one thing. So I'm the Glen de Vries Professor of Health Statistics. I got this endowed chair about a year or so ago.
And Glen de Vries, who unfortunately passed away a few years back in an accident, actually founded a company called Medidata. And he had been very, very into this question of how to make healthcare more data-driven. He wrote one book before he passed away called The Patient Equation. This is one of those, let's say, easy-to-read books for the public that still gives the public a sense of what we can do if we can actually collect more data and then do better, let's say, processing of the data in order to provide care to not just patients, but everyone. I think that we are getting there. It has to be done slowly, but the baseline, the current status, is actually not that great. It's getting worse and worse. So we have to do it despite the fact that it's going to be very difficult.


Allen Roush: So, you know, I'm so fascinated by healthcare. I don't care if we spend the whole episode on it, to be honest, even though there are other topics. It's just, I've been involved in it, so I have both my parents in the hospital for different but serious reasons. One with colorectal cancer, which they just got removed, stage two, they're clean, everything, just went home. So that's a miracle. And then a less good outcome, a mom who had a foot


KC: Ha ha.


Allen Roush: infection that lasted like 15 years, and it finally tried to infect her spine. So they amputated her foot at her request. Yeah. And so now she's figuring out how to live with, you know, a wheelchair and a prosthetic. And so I, as a caregiver, trained in physical therapy with her and all, you know, all this stuff happening to me. And I have unfortunately been given a crash course on this stuff. So


KC: ⁓ no. Ugh. Yes. Yes.


Allen Roush: You know, one thing, and this is one where, when you talk about aiming towards collective populations, most of, and I would claim even in my parents' case, you can trace back their problems to the traditional problem in America for most healthcare, which is people eating too much. Metabolic syndrome is what they call it, which is what heart disease and diabetes and everything else comes from. And I'm looking at GLP-1


KC: Okay. next. Mm-hmm.


Allen Roush: drugs and I am like, we need to give that stuff out like candy. Like, I want massive programs. I know there are memes about it too, where people will do memes in the other direction, like, you know, weird new types of body shaming, which is hilarious in its own way. But the main thing is, do you agree with this? Because, you know, a lot of people will say, well, what about the side effects? And I'm like, have you seen the side effects of being overweight? Like, have you seen, I've


KC: Yes. Mm-hmm. Mm-hmm. Okay. Hahaha!


Allen Roush: seen them and I'm curious your thoughts on this stuff.


KC: Right, right. Yeah, GLP-1 is fascinating. Actually, its history is even more fascinating, because we've known about the importance of the GLP-1 receptor for diabetes since the 80s. And it took us, what is it, like 40 years to figure out and approve a drug that is based on GLP-1 to treat, let's say, weight loss as well, the overweight syndrome, right? This is really fascinating. Now, what is really interesting about GLP-1 as well is that, probably, and I don't think anyone knows for sure with 100% confidence, it's not working the way we thought it was working for, let's say, weight loss. It's likely that it's actually acting on our brain. GLP-1 receptors are not only in our gut, but also in our brain. And I think that there is a growing pile of evidence showing that it actually works on the GLP-1 receptor in our brain. It actually does change the behaviors of people as well. Fascinating. And this also tells us something about drug discovery and healthcare: we have to embrace the uncertainty, because we have extremely high uncertainty about how this drug works. But we're just prescribing it, like, as you said, at the moment. Is this a good thing? I do think that increasing the access is better. But of course, any drug always has unanticipated side effects that may not be serious for individuals in the short term, by the way, but over time may actually show up. So in my view, just like what I talked about earlier with this kind of society-level clinical trial, where I toss a coin every time, we probably want to ensure that we neither give everyone this GLP-1 nor restrict the access too much,
but kind of mix it in, so that we can actually see at the society level, over time, how it affects individuals and society, as well as the overall economy, because changing behaviors will definitely have a huge impact on the overall economy and society as well. So maybe not everyone, but more people than now, I believe, yes.
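
A minimal simulation of why the "toss a coin" idea matters, under loudly stated assumptions: the effect size, the `severity` variable, and the rule that sicker patients are always given option A in the non-randomized arm are all invented for illustration and do not come from any real data.

```python
import random
from statistics import mean

TRUE_EFFECT = 1.0  # hypothetical benefit of option A over option B (made up)


def estimated_effect(n_patients: int, coin_toss: bool, seed: int = 0) -> float:
    """Difference in mean outcome between patients given A and patients given B."""
    rng = random.Random(seed)
    got_a, got_b = [], []
    for _ in range(n_patients):
        severity = rng.gauss(0.0, 1.0)  # how sick the patient is (lowers the outcome)
        if coin_toss:
            gets_a = rng.random() < 0.5  # fair coin between two plausible options
        else:
            gets_a = severity > 0.0      # assumed habit: sicker patients always get A
        outcome = -severity + (TRUE_EFFECT if gets_a else 0.0) + rng.gauss(0.0, 1.0)
        (got_a if gets_a else got_b).append(outcome)
    return mean(got_a) - mean(got_b)


if __name__ == "__main__":
    print("coin toss:   ", round(estimated_effect(20_000, coin_toss=True), 2))
    print("no coin toss:", round(estimated_effect(20_000, coin_toss=False), 2))
```

With the coin toss, the estimate should land close to the true effect. Without it, the same beneficial option looks harmful in this toy setup, because the people who received it were sicker to begin with.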


Allen Roush: Cool. And the anti-addictive properties they are having, yeah, like, the more I look at these, I really consider it the miracle of medical science in my lifetime. More than any one change that I have seen, I'm like, okay, it's a choice now. Not fully, right? But some of them, I forget the name of it, the one that starts with a T, tirzepatide, something like that. That's the strongest and best-performing clinical form


KC: Yeah Mm-hmm. Right. For best of time, okay. ⁓


Allen Roush: of it for most people. And I look at that and I'm like, that is just a marvel. That's like going to the moon, you know, given how big of a problem that was. So that's where, even if it's harder to do drug discovery... And I'll point out that one of my favorite films is The Andromeda Strain, and the book and all that, about the whole, you know, you have to time travel back to when the fungus you needed was actually there to solve a plague. I'm curious about how you view this, and then also things like the risk of superbugs.


KC: Yeah. Right. Yeah. Yeah. Hmm.


Allen Roush: I have my own problems. And this is where I'll get controversial. I think that the vast majority of, like, MRSA, maybe not MRSA infections, say, but the risk of superbugs, is primarily in overuse in agriculture, and that I should be able to get freaking penicillin if I want to from my pharmacy without a doctor's note. And I can in Asia, in a lot of cases, for frontline antibiotics. And so I'm just curious, what are your thoughts on all of these things?


KC: I ⁓ Mm-hmm. you Hmm That's true.


Allen Roush: I know antibiotics are overprescribed, but I don't think the risk is that bad compared to, you know, agriculture again, right? But anyway.


KC: Hahaha. Yeah, yeah. I have a few thoughts. So, okay, the more recent, let's say, weight loss drugs, or their variants that are coming out, are now not targeting just one receptor, that is, the GLP-1 receptor. They are actually looking at more than one of them. So can we design a molecule that can target two, if not three, let's say, different targets simultaneously? GLP-1 is one of them, PYY, and then there are a few more, GIP and a few others we can imagine. And this is really one thing that makes me somewhat frustrated about this, let's say, whole field of drug discovery: the choice of the targets is somewhat arbitrary. Because if we really knew about everything, why didn't we design a molecule for both GLP-1 and GIP from the beginning? Because we didn't know. We just saw that it works well when we target the GLP-1 receptor. Let's try to go for the other receptors as well. And then some are going to be successful, some are going to be unsuccessful. We actually don't know, but we pretend that we know. I think that's the biggest issue, we pretend that we know. In fact, my thinking is that we should actually simply go for every possible combination of the about 20 or so, let's say, targets that a lot of the pharmas and biotechs are tackling, all simultaneously, and then test them en masse. That's probably a better way to go than what we are doing, which is that we go one at a time, pretending that we know what we are doing, although we actually don't know exactly what is going on anyway. So it is definitely a wonder drug, but this drug is going to evolve quite rapidly over the next few years. And along the way, hopefully we'll know how to actually control these things better. So not by making one drug and hoping that it's going to work, but by designing drugs that are much more specific and sophisticated, so that they work much better for individuals or some subpopulations at a time.
And this connects to all those issues of controllability. At the moment, what I see with drugs is like this: say I have a laptop, and I see that there is some issue. And what I'm going to do is pour some water or some kind of substance on my laptop. And then I'm going to run the test on 30,000 different laptops I have. And I'm hoping that one of the substances that I poured onto my laptop is going to fix this problem miraculously. And then somehow, down the road, it's going to also fix other problems as well. This is a very inefficient way. It could be made efficient if you do it at, let's say, the scale of millions, if not billions. But doing it only narrowly and then thinking or pretending that we know something is, I think, the worst kind of midpoint we are at. Hopefully we can actually go to massive experimentation and deeper understanding simultaneously. And then we'll be able to say, let's just aim for weight loss. Let's aim for treating particular addictions. Maybe it's cigarette smoking we want to target. Maybe we want to target gambling. Because one weird thing about the GLP-1 agonists at the moment is that they don't actually work for cigarette smoking. People stop drinking, but they still smoke. And we don't know why that particular thing happens. Maybe nicotine has a different mechanism. Who knows, right? It's a very interesting thing that is happening there. So this kind of, let's say, GLP-1 and so on, absolutely fascinating. Now coming to antibiotics, there is a really interesting economic issue that we need to talk about. What is amazing about the GLP-1 drugs that we see is that their safety profile is extremely good. There are side effects, but relative to many other drugs, where the side effects can be extremely serious, sometimes fatal as well,
GLP-1 tends to have a much better safety profile, and that makes it easier for physicians as well as pharmacists to prescribe them, and to prescribe the compounded versions of them much more readily. And then, because it can be prescribed to a large population, Allen, you're talking about why don't we give it out to everyone, that means that there is now an economy of scale. Thereby the price can come down. But for most of the other drugs, that's not the case. And for antibiotics, there is a weird issue where, as soon as there is a particular resistant strain of the bacteria for a particular antibiotic, the value of that antibiotic drops very dramatically, because you cannot really prescribe it any longer, because the resistant strain is going to make it pointless to take it. So from the biotech and pharmaceuticals perspective, antibiotic research or development is economic suicide, if you think about it. It's not that making antibiotics is any easier, but the antibiotics economy is actually in shambles, because the value is going to just drop off as soon as it's being prescribed widely and there are more resistant strains. So often what happens is that governments tend to jump in, all over the world, trying to buy these kinds of frontier antibiotics and save them, and use that to essentially subsidize the development. But the development cost is only going up, and the subsidy can only stay flat. So that's an issue. Now, this is also where I think AI is going to really help us, because AI helps us find better antibiotics, or any kind of drugs, because it can tell us, based on all the existing data as well as the experiments that it recommends, where to look. And that, I think, is the superpower that AI is going to give us. And where to look is going to be determined based on this kind of end-to-end prediction: will it be very promising at the clinic?
So then maybe, once that happens, it's going to change a lot. Genentech has been looking at AI for antibiotic discovery for many years. There is definitely progress and a promise that we can see.
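
To put rough numbers on the suggestion earlier in this answer of testing every combination of about 20 targets, here is a back-of-the-envelope count. The figure of exactly 20 and the placeholder names `target_01` … `target_20` are assumptions for illustration; the conversation only says "about 20 or so" and names just a few targets.

```python
from itertools import combinations
from math import comb

# Placeholder names only; not a real list of drug targets.
TARGETS = [f"target_{i:02d}" for i in range(1, 21)]


def count_designs(n_targets: int, max_targets_per_drug: int) -> int:
    """Number of distinct target sets a molecule could aim at, up to a size limit."""
    return sum(comb(n_targets, k) for k in range(1, max_targets_per_drug + 1))


# Enumerating the dual-target designs explicitly is still cheap.
PAIRS = list(combinations(TARGETS, 2))

if __name__ == "__main__":
    print("dual-target designs:        ", len(PAIRS))
    print("single, dual or triple:     ", count_designs(20, 3))
    print("any non-empty subset of 20: ", count_designs(20, 20))
```

Up to triple-target designs the space is in the low thousands, which is why testing "en masse" is at least conceivable; allowing any subset pushes it past a million.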


Ravid Shwartz-Ziv: What do you think about language models for this use case? For sure in medical settings, you know, when I'm going to the doctor and he's just recording my notes, and then some LLM will collect it and analyze it and all these things, right? And maybe, right, if it's going to analyze my case, and if I have a lot of, like, tags, it can be helpful


KC: Mm-hmm. Mm-hmm. Yes. Mm hmm. Yes, yes.


Ravid Shwartz-Ziv: for sure. But what do you think about drug design? Do you think that we can actually use some knowledge that is out there, for example in the literature, to help it analyze, or use these coding skills, for example, to make more rigorous analyses?


KC: Mm-hmm. Yes. Oh yeah, absolutely. Actually on both sides. Perhaps I can start from the hospital side. One thing that every clinician is trained very well on is how to write clinical notes. The reason why they are trained extremely well to write clinical notes, and why they spend a massive amount of time writing clinical notes, is that the whole of healthcare is designed so that even if the patient is being, let's say, treated by another doctor or seen by another nurse, months if not years later, they should be able to get the same level of healthcare as if the patient had been seeing exactly the same doctor over that duration. So the clinical notes are where clinicians are trained to write everything that is relevant to the diagnosis as well as the treatments, and the outcome of the treatment as well. And what that means is that having these kinds of language models, or let's call them clinical language models, gives us a superpower. So rather than going into this kind of EHR database and trying to extract the particular features we believe are going to be helpful for building a predictor, what we can now do is have this language model read the clinical notes, which are effectively an enriched feature set that describes the individual patient, to build prediction models. And that can actually include, let's say, hundreds, if not thousands of them. How many days do we think this patient is going to stay here? So, length of stay. If we can predict the length of stay better than random chance, then we can suddenly optimize the entire hospital in order to accommodate more patients and provide them with better care. Can we actually guess, at the time of discharge, that this patient may be readmitted within 30 days?
If it says that they potentially may, then we're going to look at them once more, because any patient who comes back after they got discharged is a bad sign, not only for the hospital, but for the patient. And by ensuring that they only get discharged when they should, we effectively open up future capacity for other patients. So in this sense, these language models are amazing. They give us a superpower. That's the kind of thing that I've been working on together with NYU Langone, especially Eric Oermann, who's a neurosurgeon there. Now, on drug discovery, the language model is even more fascinating. What we actually tried when I was at Genentech, until recently, was to train one language model on the scientific articles, the molecular data that we have, all those experimental measurements and so on, and the various developmental data, as well as the clinical trial data and the actual patient EHR data. Roche actually has all those different units underneath. And what we could see was that these models are able to make connections across these different stages that, in some sense, span several decades. No single human scientist can actually make that kind of connection, because there is too much data, as well as too diverse a set of data sources. And this language model is how we are actually going to make connections and capture all those correlations that exist across all those stages. Either causal or spurious, that's fine. We'll experimentally validate whether those correlations are causal or spurious. That will allow us to backpropagate the predicted clinical outcome per patient, or per patient group, all the way, effectively, to the coordinates of the atoms in the molecule. I think that that's where we are actually going. And that's the reason why these kinds of large-scale models are very exciting.
Yes, it is a great transcription model that actually saves tons of time. So thereby the 40% of the time that is being spent on note-taking will be used for the patient. So that's amazing. But going one step further, we are essentially squeezing the entire 50 years of the process into, let's say, five years, by doing backpropagation across 50 years of the potential, let's say, progress. So yeah, I'm very excited. That's why I think that everyone is excited about it. Hopefully OpenAI and Anthropic and Microsoft are excited by this for this reason, rather than just putting out a web app saying, well, here's ChatGPT Health. Great idea, great thing, but I'm not sure if that's what the frontier labs should do.


Ravid Shwartz-Ziv: So I have a question regarding how much you need to specialize in the data, right? I don't know. When I worked, in my BA, on detecting some features in embryos, so...


KC: and


Ravid Shwartz-Ziv: like, we needed very specific types of knowledge in order to detect, like, to understand each one of the organs and things like that. Do you think Anthropic and OpenAI can just throw data, throw compute, and get results without knowing the specific field? Or do you think this is something that is very task-specific?


KC: Yeah, between 2016 and 18, we spent about three years at NYU Langone Radiology to collect the data set called the NYU Breast Cancer Screening Dataset Version 1. And we really spent a lot of time filtering, cleaning up, and then finding follow-up exams to check whether the lesions that were detected in the breast cancer screening mammograms were indeed, let's say, malignant or benign. So we spent three years getting all those data. Now, I think that this is probably the right way to do it, especially if you want to utilize the existing data as well as we can, because the existing data was collected and recorded and maintained without taking into account that it's going to be consumed by these AI algorithms or machine learning algorithms. But going forward, the thing about any kind of data we think about in the real world is that the real world changes, and thereby the future data is always more valuable than the past data. The value of the past data always drops, and the future data's value only goes up. So what that means is that we are now in a situation where all these AI systems are being deployed, or at least deployed in shadow. So we can, in fact, start with this kind of roughly curated data, as long as there is a large amount of data, with a large model that captures all possible correlations there. We're going to deploy them, maybe alongside the doctors or on their own, in a situation where that is actually safe and it is reasonable to do so. And then we're going to use this model to collect more data, and then mark which data is more important, which data tells us about the causal relationship or not, and then use that to fine-tune it. So in a sense, last year, Rich Sutton as well as a lot of people talked about continual learning. Continual learning has already been happening for three, four years in the real world, in the wild, essentially. And we just need to do it
in a better way and a more focused way for each of the individual areas, so that we go deep and collect much better future data. I think that that's the right way to go about it. So in that sense, yes, they are doing a good job, I believe. They just need to now figure out what the next step is. How are we going to ensure that they get good enough feedback quickly, so that all that past data is going to eventually be removed from the whole pile?


Allen Roush: Wow. So, you know, there are these connections between language modeling as a task and personalized medicine, DNA, and really a lot of what they do in bioinformatics is honestly the same stuff applied to different domains. And sometimes we're the ones looking over their shoulder, not the other way around. So do you have any thoughts there? Because I got my dog's genetics tested and I've been fascinated. Like, isn't this what clustering gets motivated by, right,


KC: Okay.


Allen Roush: dealing with DNA, like why hasn't this gotten better?


KC: Yeah, so there are a few things. There is this kind of idea that the genes, or the effects of the genes, are largely linear. That's kind of the belief that is exercised by many bioinformaticians as well as computational biologists. And that has been going on for many decades. Now, there are a couple of reasons, because it's always a cycle, right? It's not that they necessarily believe that that is the truth, but rather that it agrees well with the tools that they have at their disposal. Now, a lot of these linear models, in particular with sparsity, do get us a large portion of the correlations that exist. So if I have certain types of symptoms of the patients, and if I have genetic sequencing information, then I would anticipate that a couple of those genes, or their expression or mutations, are going to tell us about these symptoms, or are associated with the symptoms. Then I can start from there and say, well, which one is the actual cause? Let's try to design a drug or some kind of preventive measure there. So that was kind of driven by the tools. But as we learn increasingly more, the fact that 90% or more of the association is captured by linearity does not imply that the other 10% or 5% is not important. In fact, perhaps the more important and controllable signals are often nonlinear, and they exist in this kind of smaller variation in the data. So now that we know how to train these kinds of highly nonlinear sequence-based models or graph-based models, everyone has started to see that, in fact, there are highly nonlinear and higher-order relationships or functions that are happening across the different genomes, as well as even the environmental factors. So that's, I think, how it works. And in fact, it's the same thing with natural language processing and machine learning as well.
So when I was working more on the machine translation side in 2015 and 16, some people actually did tell me that I wasn't working on kind of core machine learning, because I was working on NLP. I wasn't working on vision. I wasn't working on anything else. But now, looking back, all those language models or machine translation systems that were built have suddenly become the core of AI. So in a sense, we can only understand well what we see. And we tend to make the mistake of thinking that our perception of the world is the truth about the world. And then whenever our perception changes, our thought about the truth of the world also changes. So I think that that's what's happening. And you're indeed right, HLN and whatnot, all those techniques were either independently invented or were invented in the first place in bioinformatics and many of these kinds of computational biology fields.
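
A deliberately extreme toy illustration of the point about nonlinear signal: if a trait depends on two genes only through their interaction, a gene-by-gene linear association finds nothing, while a nonlinear feature finds everything. The two binary "genes" and the XOR rule are invented for this sketch and are not a model of any real trait; real data would sit somewhere between this and the mostly-linear picture described above.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Four equally common genotypes over two hypothetical binary variants.
GENE_A = [0, 0, 1, 1]
GENE_B = [0, 1, 0, 1]

# Assumed rule: the trait shows up only when exactly one variant is present (XOR).
PHENOTYPE = [a ^ b for a, b in zip(GENE_A, GENE_B)]

# A nonlinear feature a model would have to discover: "the two genes differ".
INTERACTION = [int(a != b) for a, b in zip(GENE_A, GENE_B)]

if __name__ == "__main__":
    print("gene A vs trait:     ", pearson(GENE_A, PHENOTYPE))
    print("gene B vs trait:     ", pearson(GENE_B, PHENOTYPE))
    print("interaction vs trait:", pearson(INTERACTION, PHENOTYPE))
```

Each gene on its own has zero linear correlation with the trait here, so a sparse linear screen would discard both, even though together they determine the trait completely.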


Ravid Shwartz-Ziv: So, two questions. First, if you had, I don't know, 10 billion dollars, what do you think is the most urgent or impactful thing to do, both in drug discovery and also in general healthcare?


KC: ⁓ 10 billion dollars and that... I see. ⁓


Ravid Shwartz-Ziv: And the follow-up is, what do you think will actually happen in the next, let's say, 2, 3, 5 years to the field?


KC: So the retirement into an island is not an option I guess. Okay, all right, I'm going to skip that one.


Ravid Shwartz-Ziv: It's also possible, but with, like, a doctor that looks after your health all the time.


KC: Right, right. So there are a couple of things that I think are really important. One very immediate one: if I had $10 billion, I mean, if I had just a billion dollars, that's fine. We need to work much more and much better on the infrastructure side. Compute infrastructure is indeed becoming a bottleneck for going deep into individual domains at the moment, because most of the compute resources are being effectively monopolized by a very small number of highly, let's say, competent, but extremely narrow-minded, let's say, companies in the world, most of which are actually in Silicon Valley these days. I'm not going to name any of those companies, but I think that we all know who they are. And what this means is that yes, we are making huge progress, but the progress is really narrowly defined. And we are actually missing out on some of the areas, such as healthcare, drug discovery, and whatnot, that we think are going to have an actual bigger impact and touch every single one of us in this society. So if I had $10 billion, what I would actually do is build a GPU cloud that is going to be more distributed, but is going to be, in fact, much better priced, extremely competitive, and provided to the startups as well as the companies that are working on other, still narrow, yes, other narrow areas, like healthcare, drug discovery, or whatnot, and let them go deep. I think that that's the one thing that needs to be done immediately. And probably I shouldn't do it. It should be done by all the governments as well as the big companies all over the world. The second thing is that if I decided to go narrow and deep with the $10 billion, we need to create a patient-in-the-loop kind of, let's say, drug discovery system.
And I think the technologies are there. We just need to glue together the pieces, where some are going to be AI algorithms, like training these big models and inference, but also aspects of causal inference, causal discovery, and experimental design. So black-box optimization combined together with the actual healthcare providers. Effectively, what I think needs to be done is that, if I had $10 billion, and $10 billion is great, okay, sounds great, I think I would actually go out there and buy a hospital network somewhere in the Midwest. And then, while running the hospital, on the side, do in-house development of the entire, let's say, loop that involves patients, clinicians, the lab, and also the AI algorithm that is going to try to tackle the problem of, let's say, extremely personalized or hyper-personalized treatment design and treatment. I think that's what needs to be done. I believe that if we put in $10 billion over the next 5 to 10 years, we would be able to demonstrate that this can be done. From there, of course, the whole society takes over, and then you have to make it into a more standardized protocol. So that's what I think is going to happen. Even if I don't do it, I believe there are people at the frontier who are thinking about this, of course, Genentech being one of them. But Isomorphic Labs is thinking along that line as well. And there are many new generations of AI-for-biotech and AI-for-health startups that are probably thinking in a similar way.


Allen Roush: Do you think that maybe it's a little bit connected to centralization versus decentralization? You know, I tend to think AI is a winner-take-most industry, but healthcare hopefully isn't, in a sense, right? I mean, intuitively I want competition in there. Do you think that it's better to be a smaller company or a bigger company in some sense? I mean, I'm sure most would say bigger, right? And they want more money, but like, do you think that there's more of


KC: Yeah.


Allen Roush: for small startups to outcompete current existing big players.


KC: So there are two types of scaling. A lot of the scaling that we talk about nowadays when it comes to AI is horizontal scaling. Let's take the general-purpose, let's say, GPU computing, and let's scale it up so that we have a very big model that is able to know and do a lot of things generally well across different domains. But then, of course, there is the aspect of vertical scaling. That is, even if you have amazing, let's say, ChatGPT or GPT-like models, if those models never actually get to interact with the patients, or never get to interact with the actual experimental devices, and thereby interact with the real world, there will always be a limit to how far these models can distinguish spurious correlation from causal correlation. And without the causal correlation dominating what is inside these models, whatever comes out of these models will not actually be trustworthy enough for us to test it out. So what that means is that there has to be a vertical integration, the vertical scaling. In the case of drug discovery, of course, you need to be able to vertically integrate everything from the science all the way to the clinical trial. And many of the big pharmas have the ability; I'm not entirely sure they have the will or the idea, the kind of future-looking prospect, to be able to do so. In the case of healthcare, of course, you can think of it from the patients and health providers all the way to the payers as well. I think that this kind of vertical scaling will actually be important in a few domains like healthcare, drug discovery, even coding, in my opinion, as well as construction and many other areas. In those places, small companies will start, but those small companies will rapidly vertically integrate, and thereby they will look like a very big company.
But initially, it'll have to be a small company, because the incumbents in all those sectors, like manufacturing and so on, unfortunately are very, very behind in thinking about what AI can do for their sector.


Allen Roush: Well, no, you go.


Ravid Shwartz-Ziv: I want to talk a bit about regulation. You know, from the American perspective, every regulation is bad. Every regulation is making progress slower, right? You don't need it. You need to remove anything related to regulation. But on the other hand, we see that the American health system is quite broken, right? Do you think it is currently good enough? Do you think that we need more, less, or different regulation, both in healthcare and also in drug design?


KC: Yeah, well, generally, just like everyone else in the world, I think regulation always brings both joy and frustration. And often it depends on whether you are being regulated or whether you are regulating others, I feel. But perhaps what I can say, not necessarily for healthcare or drug discovery in particular but about regulation generally, because AI regulation is a major issue these days as well, is that there are two aspects that are missing at the moment in these kinds of discussions. Who is not missing? Sam Altman is never missing. Dario is never missing. And all those famous senators, they are never missing. Josh Hawley is never missing somehow. They're showing up at every hearing. That's great. And all those prime ministers and so on all over the world, they are never missing. And all those companies that are going to benefit from AI, they are never missing. But who is missing? The first one is the actual people who are going to be subjected to AI. In fact, once healthcare is dominated by all those AI algorithms, we are all going to be subject to the AI technology. And in my view, regardless of our own expertise, as patients we have to be involved in this discussion, but we don't really involve the people who are users and are being subjected to AI technologies. You see all those Senate hearings. You don't see the people who actually got hurt by the misuse of AI technologies; rather, we see all those star CEOs. So that's the one aspect that I think is problematic. We need to try to involve these people. And the second thing is the question of whether regulation should come before or after the technology has been deployed, for good or bad. When it comes to this question, I'm more on the side of the latter.
The reason being that I do want to give the legislature the chance to build a very, let's say, loose boundary that will be exercised by the judiciary. And only when that boundary turns out to be not enough should we use the legislature's power to narrow it slowly, let's say, shrink it, rather than start with the shrinkage and then try to widen it later on, because laws are difficult to change. Once the laws are there, changing them is going to be difficult, because the entire society is a stakeholder in any one of them. So these two are the things that I have in my mind. But of course, this is something where there's no right answer. That's the reason why we have democracy and why we want to involve everyone, not when it comes to the development of technologies, but when it comes to regulating technologies. So this is just my thought as one of the, I don't know, seven billion people on this earth who have some kind of stake in and right to this whole discussion.


Allen Roush: So what are your personal next steps? Because having this discussion, I want to vote for your campaign for legislature, you know, but I have a suspicion you're not going to run for office and will do more great science instead. So what's the plan?


KC: Hahaha. Well, so I'm now back 100% at NYU. So I'm going to focus a bit on teaching as well as advising my students. But I am thinking a bit about the ways in which, for all those things that we discussed, I could actually, I don't know, lend some help. So on the NeoCloud side, I'm talking to a lot of people in Korea, including the government as well as companies, to see if there's a way for me to put together a team to build one. On the biotech and healthcare side, I am indeed talking to a lot of people, but not necessarily as someone who's going to start something; I'm trying to get a sense of things, so I'm essentially back in the mode of landscaping. What is happening? Who is actually looking at things? And more specifically, recently, together with my former colleagues, Steven Ra as well as Kuno Choi, we started to look into which aspects of biotech related to drug discovery can be scaled up in terms of data generation and are very, very close to patients. And then we started to study a lot about organoids and how organoids can be connected with patient care directly in order to create this kind of data flywheel. And then with AI in the middle, we can actually use this data flywheel to make recommendations, both during exploitation and exploration. But of course, we have to be careful about this one for sure, right? You don't want to explore too much with the actual patient sitting in the doctor's office. So we're discussing that. But generally, I'm in the mode of exploration and teaching. So I'm having fun, actually. And also doing all that committee work at the university. They did notice that I was back 100% of the time, and they started to put me on all possible committees that you can imagine. So I'll be a bit busy trying to make the departments here work.


Ravid Shwartz-Ziv: Hmm. I have a question related to this. What do you think about this? It looks like all the professors are now starting their own companies, right? Some of them did it a long time ago. What do you think about it? Do you think professors have advantages in this, or do you think this is more, yeah, we need to do something very empirical, so


KC: Okay. Uh-huh. All right. Mmm. Hmm.


Ravid Shwartz-Ziv: Let's leave all these people that write papers for the living.


KC: Yeah, I have so many thoughts. One is that they should have made a company last year instead of five years ago, to raise much more and become richer. But that aside, that's a personal thing. There is a really weird thing that I noticed here. Of course, I'm always very pro, let's say, startups. I want my students to make companies. I want students to go out there and contribute to building new startups and whatnot. Some of my former students have become entrepreneurs themselves. William Falcon is one example, who founded Lightning and is now the CEO of this gigantic NeoCloud. But when it comes to professors, there is a unique advantage that professors have in starting a company, which is that we are kind of trained to think deep and then talk a lot. And that actually helps, because there are two aspects to it. One is that, because we think a lot, we tend to have better expertise in a particular area than anybody else, and that helps. And then we talk a lot, and talking about these deep ideas helps with self-motivation and the motivation of team members and others as well. And this is, I think, the superpower that people actually miss. Everyone goes for those professors for, let's say, expertise in technology or the particular domain. But I think the superpower is more on the side of actually delivering the idea behind this technology in a way that is convincing and authoritative. I think professors are very good at it. I think they have an advantage. But the real thing that I started to notice is that quite a few of the companies or startups that have been founded over the past few months have started to call themselves NeoLabs, right? It started with Ilya's company, Mira's Thinking Machines, and whatnot. And nowadays there's a, what was the name? Flapping wings? Flapping Airplanes or something like that.
But anyway, okay, so I feel like the name is a bit not serious enough, but that aside, that's not the point. What I see is that some of these professors cracked the code. What we actually do daily is write so many grant proposals, because we need to constantly raise money to support our students and our research. And as soon as the research is over, we need to raise again by writing all those grant proposals. This perpetual loop is the one thing that I dread about being a faculty member. But for instance, let's say somehow I magically raise $1 billion for this NeoLab that doesn't necessarily have a product, but just the idea of the direction that we want to pursue. I'm going to simply buy a billion dollars' worth of US Treasury bonds. Maybe not nowadays, I don't know. The US Treasury seems to be a bit in shambles, but anyway, bonds. And then I get the coupon every year. Let's say 5%. So 5% of a billion dollars is more than enough to run a research lab that's going to have top-class researchers, counting, let's say, 20 to 30 of them, nonstop. They cracked the code. Ilya, I think, cracked the code in that he now can do whatever research he wants with the people that he wants to work with, without having to raise any more money, until the point at which he actually productizes something. And that's what happened with OpenAI as well. OpenAI started with the pledge of a donation from Elon, as well as a few other people. And then they were just using that money to churn out research, until the point at which they became a product company, because they had a product. So in a sense, I feel like these so-called neo-labs and the professors who are actually founding these companies have cracked the code for a private, let's say, research lab. And I'm both envious and curious, and also, I don't know, confused. Let's put it that way.
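The endowment arithmetic in this answer can be checked with a short sketch. The $1 billion principal, the 5% coupon, and the 20 to 30 researchers are the figures KC mentions; the function names are hypothetical, and the calculation deliberately ignores taxes, inflation, compute costs, and changes in bond yields, so it is a rough illustration rather than a financial model.

```python
def annual_coupon(principal_usd: float, coupon_rate: float) -> float:
    """Yearly coupon income from holding bonds worth `principal_usd`."""
    return principal_usd * coupon_rate


def budget_per_researcher(annual_budget_usd: float, headcount: int) -> float:
    """Average yearly budget per researcher if the coupon is split evenly."""
    if headcount <= 0:
        raise ValueError("headcount must be positive")
    return annual_budget_usd / headcount


if __name__ == "__main__":
    principal = 1_000_000_000  # $1 billion raised (figure from the conversation)
    rate = 0.05                # 5% coupon (figure from the conversation)

    coupon = annual_coupon(principal, rate)
    print(f"Annual coupon: ${coupon:,.0f}")

    # 20 to 30 researchers, as suggested in the conversation
    for headcount in (20, 30):
        share = budget_per_researcher(coupon, headcount)
        print(f"{headcount} researchers: ${share:,.0f} each per year")
```

Under these assumptions the coupon is about $50 million a year, or roughly $1.7 million to $2.5 million per researcher per year, which has to cover salaries, compute, and overhead.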


Ravid Shwartz-Ziv: Jealous.


KC: Yeah, jealous, yes. Yes, exactly, yes.


Ravid Shwartz-Ziv: But do you think that in the long term these companies, these AI labs, have higher chances to get some product out of it? Or is it like, yeah, we will raise a lot of money, we will do some research for a few years, and then someone will acqui-hire us?


Allen Roush: I would eat


KC: So the answer, yeah, my thought is no. They won't be able to churn out products that are going to be very successful, at least not with the existing founders and founding members at the helm. They want to do research, great. They want to do a lot of things. But as far as I can tell, a lot of people, including myself, have great expertise in technology and doing research and even product development as well. But when you have a product, it's actually not only the quality of the product; you need to push it out there and make sure that people use it. And you need to work together with the competitors to make sure that the competition continues in a healthy way so that the whole pie grows and everything. All these things, I don't think we're necessarily good at. A very small number of us will be able to convert into that kind of role and will do very well. One of the examples is Karl Moritz Hermann, who used to be at DeepMind and worked a lot on NLP, and I've known him for some time. He's now a full-time entrepreneur. He has his own startup called Reliant AI. Great. But it's a very, very rare case. In my view, researchers should do research. Unfortunately, at this point, a lot of the things that they want to do are at the boundary between research and product development. Thereby, they go into the product development side and do the research there, because it's easier to raise money. But eventually, when the balance is restored, they'll all come back to academia or to a more research-focused organization. That's my belief.


Ravid Shwartz-Ziv: So how do you see the, yeah, to an island, yeah. But how do you see the role of academia in this situation? What do you think we actually need to do, or what do you want to do? I don't know, publish papers, focus on developing better products? What is the balance that you see, both in academia and also in industry?


KC: Or retire, I don't know. Yes. Mm-hmm. Right.


Ravid Shwartz-Ziv: big industries loves.


KC: A university is a lousy place to develop any product, so I highly recommend that you don't develop a product at a university. We are not a good place for that. Our role will continue to be exactly what it has been. It looks like our role has been robbed, but that's just because during the past 10 or 15 years, we had a great time. So for instance, people like me in particular, I got super lucky. I started my study in machine learning, and in particular deep neural networks, right before it got super popular, and then everything got popular, and whatever we were doing was almost right at, like, a step behind productization. Thereby companies were pouring money, universities were pouring money, everyone was pouring money into it. So we were writing papers that should actually have been the white paper that leads to the actual product, or the white paper that comes after the product, but because of this small lag that existed back then, for about 10, 15 years we could do at the university what others had to do in terms of product development, pretending in many parts that it's research, that it's science, and that we were writing papers. And then we got spoiled. I think I got spoiled. And we started to have this illusion that we can do whatever we want and we're supposed to be paid tons of money. That's absolutely not the case. In fact, in most of the industries that are more mature, if you go there, PhDs are a tiny fraction of the entire workforce. Because to make the actual product and push it forward, you don't need PhDs; you need well-trained people of all ranks, as you can imagine. PhDs are going to continue to do their research and write papers, but they are neither the majority nor the ones who actually make the money. So it's going to come back to that. But for now, we all are...
actually having this illusion that something we are supposed to do has been taken away, when in reality that was never actually for the university to do. We just got super lucky. I got lucky. So I'm very grateful for being lucky, but that's what happened. So I think that it's going to just come back to the normal kind of state in a few years, if not a couple of decades. And then we'll just continue to do our research, because yes, AI is going well, but I think that there are so many things that we want to answer because we want to know, not because AI can't find it. I want to know. There's a curiosity aspect. So we'll continue to do it. I think it's going to be fun.


Ravid Shwartz-Ziv: I agree in general. Okay, we're almost out of time. Do you have something that you want to add, that you want to advertise, that you want to say?


KC: Not really, but I just want to say that I think it's a very nice time to do research and science. Despite this, I do hear from my own PhD students, as well as many other students all over the world, that they feel lost and that they have lost the motivation to pursue a PhD or a research career. And I just want them to know that I think it's a great time to do research. There are so many exciting things to be found and discovered down the road. And this is especially the right time, because we are probably making huge progress in the industry. What that means is that everyone is going to look for the next paradigm to come out of the university or the research area. So it's a great time, and there's no need to be demotivated. I think it's a great time to be even more motivated, yes.


Ravid Shwartz-Ziv: Okay, yeah, and with this positive argument, we will finish. Thank you so much for joining us today.


Allen Roush: Yeah.


KC: Thank you.


Allen Roush: Thank you. Yeah, it's been a real pleasure to have you on and talk healthcare. It's unfortunately near and dear to my heart now too.


KC: I'm very sorry about that. Hopefully we'll be able to make healthcare better all together, so that there are going to be fewer and fewer people who actually suffer from health issues or from the healthcare system in general.


Allen Roush: Amen.


Ravid Shwartz-Ziv: Yeah. And thank you all for joining us and see you next time.


KC: Thank you.