Google DeepMind CTO: Advancing AI Frontier, New Reasoning Methods, Video Generation’s Potential
Channel: Alex Kantrowitz
Published at: 2025-05-24
YouTube video id: dIPdY541vus
Source: https://www.youtube.com/watch?v=dIPdY541vus
What's going on in the heart of Google's AI research operation? We'll find out with Google DeepMind's chief technology officer right after this.

Welcome to Big Technology Podcast, a show for cool-headed and nuanced conversation about the tech world and beyond. We have a great show for you today, a bonus show just as Google's I/O news hits the wire. We have so much to talk about, including what's going on with the company, what it's announced today, and also what's happening in the research effort underlying it all. Joining us today is Koray Kavukcuoglu, the chief technology officer of Google DeepMind. We'll speak with Koray today, and tomorrow you'll hear from DeepMind CEO Demis Hassabis. Koray, great to see you. Welcome to the show.

Thank you very much. Thanks for inviting me. It's great to be here.

Folks, if you're watching on video, Koray and I are in two separate conference rooms in Google's pretty cool new building, which they call the Gradient Canopy. Google's I/O presentation is about to go on. Anyway, let's talk about what you're working on, Koray. What's happening within Google DeepMind? Can you tell me a little bit about the biggest problems your research house is tackling?

Yeah, thank you very much. First of all, it's great to have this conversation.
When you think about the whole of Google DeepMind, first and foremost there's a single vision: we want to build AGI. That's our goal. But when we think about building AGI, there are two aspects to it. One is doing all the research that is very targeted at building AGI. But we are also very ambitious and passionate about doing a lot of research that showcases and explores how AI, even in its current form, can be used to impact the world. So there are those two categories. With the Gemini models, the Veo models, and all the generative AI models, that's our main line of AGI research. We also have, as many people know, things like AlphaFold and our work on mathematics and chemistry, where we are really exploring the boundaries of how AI can be used to do new kinds of science. And then we have much more general, exploratory computer-science research going on. So it's a big spectrum.

There was a moment a couple of years ago when those of us outside the company started to notice, wow, Google is really pushing all-in on generative AI, and I think your career journey was a little bit part of that. You went from overseeing lots of projects to being strictly focused on generative AI, which means you have one of the best views in the world as to what is helping models get better. And I wanted to ask you a question we've been asking on the show a lot, which is the scale question. Google has a tremendous amount of compute at its disposal, so you basically have the option: is it scale that you want to throw at these models, or is it new techniques? Let me ask it as plainly as I can: is scale the star right now, or is it a supporting actor in getting models to the next step?
It's a good question, and I like the way you framed it. Scale is definitely an important factor. It's rare that in any research problem you have a dimension that will pretty confidently give you improvements, of course with maybe diminishing returns, but most of the time with research it's like that. So when we think about our research right now in generative AI, scale is definitely one of those dimensions, but it's equally important with other things. The architectural elements and the algorithms that make up the model are as important as the scale. We analyze and understand how these different architectures and algorithms become more and more effective with scale. That's important, because you are putting in more computational capacity, and you want to make sure you research the kinds of architectures and algorithms that pay off best under that scaling property. But as I said, that's not the only one. Data is really important; I think it is as critical as anything else. The algorithms, architectures, and modules we put into the system are as important, and understanding their properties with more data and more compute is as important. And then of course inference-time techniques matter as well, because once you have a particular architecture, a particular model, you can multiply its reasoning capabilities by using that model over and over again through different techniques at inference time.

You know, to me it's both hopeful and puzzling to hear about all the different techniques to make these models better. And I'll explain that.
It's hopeful because it seems like we're definitely going to see a lot of improvement from where the models are today, and the models are already pretty good. The thing that's puzzling to me is that the idea with scale was that there was effectively limitless potential in making these AI models bigger, and you said the words "diminishing returns." We've heard that from you and from basically everybody working on this problem. And it's no secret that we've been waiting forever for GPT-5, Meta had some problems with Llama, and Anthropic has been telling us forever that there's a new Claude Opus model coming out, and we haven't seen it. So clearly a lot of the research houses, maybe with the exception of Google, are struggling with what you get when you make the models bigger. So I just want to ask you about that. It seems nice that there are all these techniques, but thinking about this one technique that was supposed to have limitless potential, is it a disappointment for the generative AI field overall if that's not going to be the case?

Yeah, I really don't think about it that way, because we have been able to push the capabilities of the models quite effectively.
In a way, the whole scale discussion starts from the scaling laws. Scaling laws explain the performance of the models in terms of data, compute, and number of parameters, and researching all three in combination is the important thing. When I look at the kind of progress we are getting from that general technology, it is still improving. What I think is important is to make sure there's a broad spectrum of research going on across the board. Rather than thinking about scaling in only one dimension, there are actually many different ways to think about it, and by investing in those we can see the returns. Across the field, not just here at Google, many different models are improving in quite significant steps, so as a field the progress has been quite stellar. At Google we are very excited about the progress we have been having with the Gemini models. Going from 1.5 to 2 to 2.5, we had very steady improvement in the capabilities of the models, both in the spectrum of capabilities we cover and in the quality level of each capability. So what I'm excited about is that we are pushing the frontier all the time, and we see returns in many research directions and many different dimensions. And there is a lot more progress to make, and a lot more progress that needs to happen to reach AGI as well.

You started your remarks today saying that the goal is AGI, and you just said there's progress that needs to happen before AGI. We had Yann LeCun on the show a couple of weeks ago. You worked in Yann's lab.
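The scaling laws Kavukcuoglu refers to, relating model performance to data, compute, and parameter count, are often written in a simple power-law form. The sketch below uses the published "Chinchilla" parameterization and constants from the research literature; it is an illustration of the general idea, not anything specific to Gemini.

```python
# Illustrative neural scaling law in the "Chinchilla" form from the literature:
#   L(N, D) = E + A / N**alpha + B / D**beta
# where N = parameter count and D = training tokens. The constants below are
# the published Chinchilla fits -- NOT Google-internal numbers.

def loss(n_params: float, n_tokens: float,
         E: float = 1.69, A: float = 406.4, B: float = 410.7,
         alpha: float = 0.34, beta: float = 0.28) -> float:
    """Predicted pretraining loss for a model of n_params trained on n_tokens."""
    return E + A / n_params**alpha + B / n_tokens**beta

# "Diminishing returns": each 10x in parameters shaves off less loss than the last.
small = loss(1e9, 1e12)    # 1B params, 1T tokens
large = loss(1e10, 1e12)   # 10B params, same data
larger = loss(1e11, 1e12)  # 100B params, same data
assert small > large > larger
assert (small - large) > (large - larger)  # shrinking marginal gain from scale
```

The last assertion is the diminishing-returns point made earlier in the conversation: scaling reliably helps, but each increment buys less, which is one reason labs also invest in data, architectures, and inference-time techniques.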
Yann emphatically stated there is no way the AI industry is going to reach human-level intelligence, which is his term for AGI, just by scaling up LLMs. Do you agree?

Well, I think that's a hypothesis that might turn out to be true or not. But I also don't think there is any research lab that is trying to do only scaling of LLMs, so I don't know if anyone is actually trying to negate that hypothesis. From my point of view, we are investing in such a broad spectrum of research because I think that is what is necessary. Clearly, many of the researchers I talk to, and I myself, think there are a lot more critical elements that need to be invented. There are critical innovations on our path to AGI that we need to get through. That's why we still look at this as a very ambitious research problem, and it's important to keep that kind of critical thinking in mind: with any research problem, you always try to look at multiple hypotheses and many different solutions. A research problem this ambitious is probably the most important problem we will work on in our lifetimes, maybe the hardest problem we will work on. Having that really ambitious research agenda and portfolio, and making investments in many different directions, is the important thing from my point of view. What is important is defining where the goal is: our goal is AGI. Our goal is not to build AGI in a particular way. What's important is to build AGI in the right way, so that it is positively impactful.
It's about building on it so that we can bring a huge amount of benefit to the world. That's why we are trying to research and build AGI. AGI might sometimes come across as a goal in itself, but the real goal is that if we do it, we can hugely benefit all of society and all of the world. With that responsibility, it's not very important to me whether that particular hypothesis is true or not. What is important is that we get there by pursuing a very ambitious research agenda and building a very strong understanding of the field of intelligence.

Okay, so let's get to a little bit of that research agenda. One of the announcements you're making at I/O this week, which by the time this airs will just have been made, is a new product called Deep Think that you're releasing, which relies on reasoning, or as you put it, test-time compute, I think I have that right. How effective has including reasoning in these models been in advancing them?
I mean, when you think about all the different techniques you've discussed so far today, scaling included, what sort of magnitude of improvement are you seeing by using reasoning? And talk a little bit about Deep Think.

Okay. First of all, Deep Think is not necessarily a separate product. It is a mode we are enabling in our 2.5 Pro model so that it can spend a lot more time during inference to think and to build hypotheses. The important thing is that it builds parallel hypotheses rather than a single chain: it can build parallel chains of thought, reason over multiple of them, build an understanding across them, and then continue building those parallel chains.

But this one thinks a little bit longer than your traditional reasoning model?

In the current setup, yes, it takes longer, because building and reasoning over those parallel thoughts is a much longer process. But one thing we are also positioning it as: right now, it's research. We are sharing some initial research results. We are excited about the technique and what it can enable in terms of new capabilities and new performance levels. But it's early days, and that's why right now we are only starting to share it with safety researchers and some trusted testers, because we want to understand the kinds of problems people want to solve with it, the kinds of new capabilities it brings, and how we should train it the way we want to train it. So it is early days, but it's what I think is an exciting research direction that we found in the inference-time thinking model space. Yeah.
So can you talk about what precisely it does differently than traditional reasoning models?

The current reasoning and thinking models, most of the time, at least speaking from our research point of view, build a single chain of thought. As the model continues to attend to its chain of thought, it builds a better understanding of what response it wants to give you. It can alternate between different hypotheses and reflect on what it has done before. Now, if you think about it in a visual kind of space, one kind of scalability you can bring to the table is having multiple parallel chains of thought, so that you can analyze different hypotheses in parallel. You have more capacity for exploring different hypotheses, and then you can compare them, eliminate some, continue pursuing others, and expand on particular ones. It's a very intuitive process in a way, but of course it is more involved.

I just want to cap this segment by asking you about the pace of improvement of models. I'm going to use the OpenAI naming scheme just to give an example. The progress, and this is something that everybody who comes on the show says, the progress going from GPT-3 to GPT-4 was undeniable. GPT-4 to 4.5, less of a leap. So I want to ask you, in terms of the velocity of improvement, if that's the right way to put it: are we coming back down to earth a little bit right now?
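The parallel-chains idea described above, sample several reasoning paths instead of one, score them, and keep the promising ones, can be sketched generically. This is not Google's Deep Think implementation; `generate_chain` and `parallel_think` are hypothetical stand-ins, and the random scorer stands in for a learned verifier.

```python
import random

# Generic sketch of parallel "chains of thought" with selection, versus a
# single sequential chain. NOT Deep Think itself -- just the pattern the
# conversation describes: sample hypotheses in parallel, compare, keep the best.

def generate_chain(problem: str, rng: random.Random) -> tuple[str, float]:
    """Stand-in for one sampled reasoning chain; returns (answer, score).
    A real system would decode a chain of thought and score it with a
    learned verifier or reward model, not a random number."""
    answer = f"candidate-{rng.randint(0, 3)}"
    score = rng.random()
    return answer, score

def parallel_think(problem: str, n_chains: int = 8, seed: int = 0) -> str:
    """Sample n_chains hypotheses in parallel and return the best-scoring one
    (best-of-N selection; real systems may also vote or expand chains)."""
    rng = random.Random(seed)
    chains = [generate_chain(problem, rng) for _ in range(n_chains)]
    best_answer, _ = max(chains, key=lambda c: c[1])
    return best_answer

print(parallel_think("2 + 2 = ?"))
```

Best-of-N selection is the simplest member of this family; the discussion also mentions comparing chains against each other and expanding particular ones, which corresponds to richer search strategies over the same parallel structure.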
Again, when I look at our model family, going from Gemini 1 to 1.5 to 2 to now 2.5, I'm very excited about the pace we have when I look at the capabilities we keep adding. We have always designed Gemini models to be multimodal from the beginning. That was our ambition, because we want to build AGI, and we want models that can fulfill the capabilities we expect from a general intelligence. So multimodality was key from the beginning, and as the versions have progressed we have been adding that natural multimodality more and more. When I look at the pace of improvement in our reasoning capabilities, lately we have added the thinking capabilities, and with 2.5 Pro we wanted to make a big leap in our reasoning and coding capabilities. One of the critical things is that we are bringing all of these together in one single model family, and that is actually one of the catalysts of improvement, and improvement at pace, as well. It is harder, but we find that creating a single model that can understand the world, which you can then ask, "Can you code me a simulation of a tree growing?", and it can do it, requires understanding a lot of things, not just how to code. Again, we are trying to make these models useful and usable by a very broad audience, and I think our pace has been really reflective of the research investments we have been making across the board.

So no velocity slowdown is what I'm hearing from you, which is good.

Look, let me just put it this way: I'm very excited about everything we have been doing as Gemini progresses, and the research is getting more and more exciting. For those of us doing research, it's really good.

Okay.
So I want to ask you, you're on the model side, and sometimes we debate on the show what the value is of improving models. So let me put a thought experiment to you: what do you think improving these models by 10% would get us?

You froze for a second, so maybe build up to the question again; I may have missed part, but I got the last part.

Yeah. So what do you think improving these models by 10% would get us?

I think the question there is how we define 10%, and that is where the value is defined. One of the important things about doing research and improving the models is quantifying progress. We use many different ways to quantify progress, and not every one of them is linear, and not every one of them is linear with the same slope. So when we say improving by 10%: if we can improve by 10% the model's understanding of math, of really highly complex reasoning problems, that is a huge improvement, because it would indicate that the general knowledge and the capabilities of the model have expanded a lot, and you would expect that to make the model applicable to a much broader range of problems.

And what if you improved the model by 50%? Is your product team saying there are things we could build if this model were just 50% better?

Again, we work with product teams a lot. Taking a step back, that's quite an important thing for me. Thinking about AGI as a goal also goes through working with the product teams, because while building AGI is a research problem...

Mhm.
We are doing research, but the most critical thing is that we actually understand what kinds of problems to solve and what kinds of domains to apply these models to for users. That user feedback, and the knowledge from interaction with users, is quite critical. So when our products tell us, okay, here is an area we want to improve on, that is quite important feedback that we can then turn into metrics and pursue. As you ask about increasing the capabilities of the model, what is important is improvement across a broad range of metrics, which I think we have been seeing in Gemini, as I said, from 1.5 to 2.5. You can see the capability increases across the model. A lot more people can actually use the models in their daily lives, either to learn something new or to help them solve an issue they see. And that's the goal. At the end of the day, the reason we build this technology is to build something that is helpful, and the products are a critical aspect of how we measure and understand what is helpful and what is not. That's our main ambition.

That's great. Let's take a concrete example that the company is releasing, or at least talking about, today, which is Veo 3. This is your video generation model, and I think we've seen an unbelievable acceleration in what these models can do from the first generation to the second to the third. For listeners and viewers, what Google is doing now is not only generating scenes; you're able to generate them with sound. And having watched a couple of these videos, I can tell you the sound matches. And then there's this other crazy product that Google's putting out.
I think it's called Flow, where you can extend the scene you've generated and storyboard out basically your own short film. So I'd love to hear your perspective on how this happened. I kind of asked you what we get at 10% or 50%, but is this that perfect example of the model getting better and producing something that goes from a fun little video to "oh, I can really use this now"?

Yes. I think the main progress going from Veo 1 to Veo 2 was a lot more about understanding the physics and the dynamics of the world. With Veo 2, I think for the first time we could comfortably say that, for many, many cases, the model has understood the dynamics of the world well. That's very important: to have a model that can generate complex scenes with a dynamic environment and interactions between objects. I remember one video that went quite viral, of cutting a tomato. The video generated by Veo 2 was so precise and looked so realistic: a person slicing tomatoes, and the dynamics there, not just how a single object like the hand moves, but the interaction between different objects, the blade, the tomato, how the slice falls down, everything. It was very precise. So that interactive element was important. Understanding dynamics is not just about a single object; it's also about multiple objects interacting with each other, which is much more complex. There we had a big jump, and with Veo 3 I think we are making another jump in that aspect. But I see the sound as an orthogonal, new capability coming in. Of course, in the real world we have multiple senses, and vision and sound go hand in hand; they are perfectly correlated.
We perceive them at the same time, and they complement each other. So to have a model that understands that interactivity and complementarity, and that can generate scenes and videos with both at the same time, speaks to the new capability level of the model, and the quality. This is the first step: there are very impressive examples, and there are examples that fall a little short of what you would call really natural. But I think this is an exciting step in expanding that capability, and as you said, I'm excited to see how this kind of technology can be useful. You just said it is becoming useful, and that is great to hear: now this is a technology that can be built with. And Flow is an experiment in that direction, to put it in users' hands so people can experiment and build something with it.

Yeah, you prompt a scene, and that creates a scene, then you prompt the next scene, and you can continue to have a flow, a story flow, which is a good name for it. All right, this next question comes to me from a pretty smart AI researcher. They basically talked about how there's a tension between open source and proprietary. Of course we have companies like Google: obviously "Attention Is All You Need" and the transformer came from Google, and now Google's building proprietary models. And we saw DeepSeek push the state of the art forward, you could argue. So this person wanted to know, and I think it's a really good question: is there coordination possible between open source and proprietary? I mean, we see OpenAI doing their new open-source model, or teasing it. Or should each side try to get its own part of the market? What do you think?
I want to say a couple of things. First and foremost, take a step back: there's a lot of research that went into building this technology. In the last two or three years it became so accessible and so general that people use it in their daily lives, but there's a long history of research that built up to this point. As a research lab, Google, and before that of course DeepMind and Google Brain, two separate labs working in tandem on different aspects, built many of the technologies we see today as research prototypes, as research ideas, and published them in papers. As you said, transformers, the most critical technology underlying all of this, and then models like AlphaGo and AlphaFold: all of these research ideas have been evolving into the knowledge base we have right now. All that research, the publications and the open sourcing, was a critical element, because we were really in the exploratory space at those times. Nowadays, the other thing we always need to remember is that at Google we have our Gemma models, which are open-weights models, just like the Llama open-weights models. The reason we do those is that there's a different community of developers and users who want to interact with those models, who actually need to be able to download the weights into their own environment and build with them. So I don't think it's an either-or. There are different kinds of use cases and communities that benefit from different kinds of models.
But what is most important, at the end of the day, on the path towards AGI, is that we are conscious about what we enable with the technologies we develop. So when we develop our frontier technologies, we choose to develop them under the Gemini umbrella, which are not open-weights models, because we want to make sure we can be responsible about the way they are used as well. But at the end of the day, what really matters is the research that goes into building the technology, pushing the frontier, and building it the right way, with positive impact. And that can happen both in the open-weights ecosystem and in the closed ecosystem. When I think about the umbrella of things we are trying to do, we have quite ambitious goals: building AGI and doing it the right way, with positive impact. That's how we develop our Gemini models.

Okay, I have about 30 seconds left with you. You're chief technology officer: are you a fan of vibe coding?

Yes, exactly. I find it really exciting, because what it does, all of a sudden, is enable a lot of people who do not necessarily have a coding background to build applications. It's a whole new world that is opening. You can say, "I want an application like this," and then you see it. Imagine what could be possible in the space of learning: you want to learn about something, and beyond a textual explanation you can ask the model to build you an application that explains certain concepts, and it will do it. And this is the beginning: some things it does well, some things it doesn't do well, but I find it really exciting. These are the kinds of things the technology brings.
All of a sudden, the whole space of building applications, of building dynamic, interactive applications, becomes accessible to a much broader community of people.

All right, great to see you. Thank you so much for coming on the show.

Yeah, thank you very much. Thanks for inviting me, Alex.

Definitely. We'll have to do it again in person sometime. All right, everybody, thank you for listening. We'll have Demis Hassabis, the CEO of Google DeepMind, on tomorrow, so we invite you to join us then. We'll see you next time on Big Technology Podcast.