Google DeepMind CTO: Advancing AI Frontier, New Reasoning Methods, Video Generation’s Potential
Channel: Alex Kantrowitz
Published at: 2025-05-24
YouTube video id: dIPdY541vus
Source: https://www.youtube.com/watch?v=dIPdY541vus
What's going on in the heart of Google's AI research operation? We'll find out with Google DeepMind's chief technology officer right after this.

Welcome to Big Technology Podcast, a show for cool-headed and nuanced conversation about the tech world and beyond. We have a great show for you today, a bonus show just as Google's I/O news hits the wire. We have so much to talk about, including what's going on with the company, what it's announced today, and also what's happening in the research effort underlying it all. Joining us today is Koray Kavukcuoglu, the chief technology officer of Google DeepMind. We'll speak with Koray today, and tomorrow you'll hear from DeepMind CEO Demis Hassabis. Koray, great to see you. Welcome to the show.

Thank you very much. Thanks for inviting me. It's great to be here.

Folks, if you're watching on video, Koray and I are in two separate conference rooms in Google's pretty cool new building, which they call the Gradient Canopy. Google's I/O presentation is about to go on. Anyway, let's talk about what you're working on, Koray. What's happening within Google DeepMind? Can you tell me a little bit about the biggest problems your research house is tackling?

Yeah, thank you very much. First of all, it's great to have this conversation.
When you think about the whole of Google DeepMind, first and foremost there's a single vision: we want to build AGI. That's our goal. But when we think about building AGI, there are two aspects to it. One is doing all the research that is very targeted at building AGI. But we are also very ambitious and passionate about doing a lot of research that showcases and explores how AI, even in its current form, can be used to impact the world. So there are those two categories. With the Gemini models, the Veo models, and all the generative AI models, that's our main line of AGI research. We also have, as many people know, things like AlphaFold and our work on mathematics and chemistry, where we are really exploring the boundaries of how AI can be used to do new kinds of science. And then we have much more general, exploratory computer-science research going on. So it's a big spectrum.

There was a moment a couple of years ago when those of us outside the company started to notice, wow, Google is really pushing all-in on generative AI, and I think your career journey was a little bit part of that. You went from overseeing lots of projects to being strictly focused on generative AI, which means you have one of the best views in the world as to what is helping models get better. And I wanted to ask you a question we've been asking on the show a lot, which is the scale question. Google has a tremendous amount of compute at its disposal, so you basically have the option: is it scale that you want to throw at these models, or is it new techniques? Let me ask it as plainly as I can: is scale the star right now, or is it a supporting actor in getting models to the next step?
It's a good question, and I like the way you framed it. Scale is definitely an important factor. It's rare that in any research problem you have a dimension that will pretty confidently give you improvements, of course with maybe diminishing returns, but most of the time with research it's like that. So when we think about our research right now in generative AI, scale is definitely one of those dimensions, but it's equally important with other things. The architectural elements and the algorithms that make up the model are as important as the scale. We analyze and understand how these different architectures and algorithms become more and more effective with scale. That's important, because you are putting in more computational capacity, and you want to make sure you research the kinds of architectures and algorithms that pay off best under that scaling property. But as I said, that's not the only one. Data is really important; I think it is as critical as anything else. The algorithms, architectures, and modules we put into the system are as important, and understanding their properties with more data and more compute is as important. And then of course inference-time techniques matter as well, because once you have a particular architecture, a particular model, you can multiply its reasoning capabilities by using that model over and over again through different techniques at inference time.

You know, to me it's both hopeful and puzzling to hear about all the different techniques to make these models better. And I'll explain that.
It's hopeful because it seems like we're definitely going to see a lot of improvement from where the models are today, and the models are already pretty good. The thing that's puzzling to me is that the idea with scale was that there was effectively limitless potential in making these AI models bigger, and you said the words "diminishing returns." We've heard that from you and from basically everybody working on this problem. And it's no secret that we've been waiting forever for GPT-5, Meta had some problems with Llama, and Anthropic has been telling us forever that there's a new Claude Opus model coming out, and we haven't seen it. So clearly a lot of the research houses, maybe with the exception of Google, are struggling with what you get when you make the models bigger. So I just want to ask you about that. It seems nice that there are all these techniques, but thinking about this one technique that was supposed to have limitless potential, is it a disappointment for the generative AI field overall if that's not going to be the case?

Yeah, I really don't think about it that way, because we have been able to push the capabilities of the models quite effectively.
In a way, the whole scale discussion starts from the scaling laws. Scaling laws explain the performance of the models in terms of data, compute, and number of parameters, and researching all three in combination is the important thing. When I look at the kind of progress we are getting from that general technology, it is still improving. What I think is important is to make sure there's a broad spectrum of research going on across the board. Rather than thinking about scaling in only one dimension, there are actually many different ways to think about it, and by investing in those we can see the returns. Across the field, not just here at Google, many different models are improving in quite significant steps, so as a field the progress has been quite stellar. At Google we are very excited about the progress we have been having with the Gemini models. Going from 1.5 to 2 to 2.5, we had very steady improvement in the capabilities of the models, both in the spectrum of capabilities we cover and in the quality level of each capability. So what I'm excited about is that we are pushing the frontier all the time, and we see returns in many research directions and many different dimensions. And there is a lot more progress to make, and a lot more progress that needs to happen to reach AGI as well.

You started your remarks today saying that the goal is AGI, and you just said there's progress that needs to happen before AGI. We had Yann LeCun on the show a couple of weeks ago. You worked in Yann's lab.
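The scaling laws Kavukcuoglu refers to, relating model performance to data, compute, and parameter count, are often written in a simple power-law form. The sketch below uses the published "Chinchilla" parameterization and constants from the research literature; it is an illustration of the general idea, not anything specific to Gemini.

```python
# Illustrative neural scaling law in the "Chinchilla" form from the literature:
#   L(N, D) = E + A / N**alpha + B / D**beta
# where N = parameter count and D = training tokens. The constants below are
# the published Chinchilla fits -- NOT Google-internal numbers.

def loss(n_params: float, n_tokens: float,
         E: float = 1.69, A: float = 406.4, B: float = 410.7,
         alpha: float = 0.34, beta: float = 0.28) -> float:
    """Predicted pretraining loss for a model of n_params trained on n_tokens."""
    return E + A / n_params**alpha + B / n_tokens**beta

# "Diminishing returns": each 10x in parameters shaves off less loss than the last.
small = loss(1e9, 1e12)    # 1B params, 1T tokens
large = loss(1e10, 1e12)   # 10B params, same data
larger = loss(1e11, 1e12)  # 100B params, same data
assert small > large > larger
assert (small - large) > (large - larger)  # shrinking marginal gain from scale
```

The last assertion is the diminishing-returns point made earlier in the conversation: scaling reliably helps, but each increment buys less, which is one reason labs also invest in data, architectures, and inference-time techniques.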
Yann emphatically stated there is no way the AI industry is going to reach human-level intelligence, which is his term for AGI, just by scaling up LLMs. Do you agree?

Well, I think that's a hypothesis that might turn out to be true or not. But I also don't think there is any research lab that is trying to do only scaling of LLMs, so I don't know if anyone is actually trying to negate that hypothesis. From my point of view, we are investing in such a broad spectrum of research because I think that is what is necessary. Clearly, many of the researchers I talk to, and I myself, think there are a lot more critical elements that need to be invented. There are critical innovations on our path to AGI that we need to get through. That's why we still look at this as a very ambitious research problem, and it's important to keep that kind of critical thinking in mind: with any research problem, you always try to look at multiple hypotheses and many different solutions. A research problem this ambitious is probably the most important problem we will work on in our lifetimes, maybe the hardest problem we will work on. Having that really ambitious research agenda and portfolio, and making investments in many different directions, is the important thing from my point of view. What is important is defining where the goal is: our goal is AGI. Our goal is not to build AGI in a particular way. What's important is to build AGI in the right way, so that it is positively impactful.
It's about building on it so that we can bring a huge amount of benefit to the world. That's why we are trying to research and build AGI. AGI might sometimes come across as a goal in itself, but the real goal is that if we do it, we can hugely benefit all of society and all of the world. With that responsibility, it's not very important to me whether that particular hypothesis is true or not. What is important is that we get there by pursuing a very ambitious research agenda and building a very strong understanding of the field of intelligence.

Okay, so let's get to a little bit of that research agenda. One of the announcements you're making at I/O this week, which by the time this airs will just have been made, is a new product called Deep Think that you're releasing, which relies on reasoning, or as you put it, test-time compute, I think I have that right. How effective has including reasoning in these models been in advancing them?
I mean, when you think about all the different techniques you've discussed so far today, scaling included, what sort of magnitude of improvement are you seeing by using reasoning? And talk a little bit about Deep Think.

Okay. First of all, Deep Think is not necessarily a separate product. It is a mode we are enabling in our 2.5 Pro model so that it can spend a lot more time during inference to think and to build hypotheses. The important thing is that it builds parallel hypotheses rather than a single chain: it can build parallel chains of thought, reason over multiple of them, build an understanding across them, and then continue building those parallel chains.

But this one thinks a little bit longer than your traditional reasoning model?

In the current setup, yes, it takes longer, because building and reasoning over those parallel thoughts is a much longer process. But one thing we are also positioning it as: right now, it's research. We are sharing some initial research results. We are excited about the technique and what it can enable in terms of new capabilities and new performance levels. But it's early days, and that's why right now we are only starting to share it with safety researchers and some trusted testers, because we want to understand the kinds of problems people want to solve with it, the kinds of new capabilities it brings, and how we should train it the way we want to train it. So it is early days, but it's what I think is an exciting research direction that we found in the inference-time thinking model space. Yeah.
So can you talk about what precisely it does differently than traditional reasoning models?

The current reasoning and thinking models, most of the time, at least speaking from our research point of view, build a single chain of thought. As the model continues to attend to its chain of thought, it builds a better understanding of what response it wants to give you. It can alternate between different hypotheses and reflect on what it has done before. Now, if you think about it in a visual kind of space, one kind of scalability you can bring to the table is having multiple parallel chains of thought, so that you can analyze different hypotheses in parallel. You have more capacity for exploring different hypotheses, and then you can compare them, eliminate some, continue pursuing others, and expand on particular ones. It's a very intuitive process in a way, but of course it is more involved.

I just want to cap this segment by asking you about the pace of improvement of models. I'm going to use the OpenAI naming scheme just to give an example. The progress, and this is something that everybody who comes on the show says, the progress going from GPT-3 to GPT-4 was undeniable. GPT-4 to 4.5, less of a leap. So I want to ask you, in terms of the velocity of improvement, if that's the right way to put it: are we coming back down to earth a little bit right now?
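The parallel-chains idea described above, sample several reasoning paths instead of one, score them, and keep the promising ones, can be sketched generically. This is not Google's Deep Think implementation; `generate_chain` and `parallel_think` are hypothetical stand-ins, and the random scorer stands in for a learned verifier.

```python
import random

# Generic sketch of parallel "chains of thought" with selection, versus a
# single sequential chain. NOT Deep Think itself -- just the pattern the
# conversation describes: sample hypotheses in parallel, compare, keep the best.

def generate_chain(problem: str, rng: random.Random) -> tuple[str, float]:
    """Stand-in for one sampled reasoning chain; returns (answer, score).
    A real system would decode a chain of thought and score it with a
    learned verifier or reward model, not a random number."""
    answer = f"candidate-{rng.randint(0, 3)}"
    score = rng.random()
    return answer, score

def parallel_think(problem: str, n_chains: int = 8, seed: int = 0) -> str:
    """Sample n_chains hypotheses in parallel and return the best-scoring one
    (best-of-N selection; real systems may also vote or expand chains)."""
    rng = random.Random(seed)
    chains = [generate_chain(problem, rng) for _ in range(n_chains)]
    best_answer, _ = max(chains, key=lambda c: c[1])
    return best_answer

print(parallel_think("2 + 2 = ?"))
```

Best-of-N selection is the simplest member of this family; the discussion also mentions comparing chains against each other and expanding particular ones, which corresponds to richer search strategies over the same parallel structure.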
Again, when I look at our model family, going from Gemini 1 to 1.5 to 2 to now 2.5, I'm very excited about the pace we have when I look at the capabilities we keep adding. We have always designed Gemini models to be multimodal from the beginning. That was our ambition, because we want to build AGI, and we want models that can fulfill the capabilities we expect from a general intelligence. So multimodality was key from the beginning, and as the versions have progressed we have been adding that natural multimodality more and more. When I look at the pace of improvement in our reasoning capabilities, lately we have added the thinking capabilities, and with 2.5 Pro we wanted to make a big leap in our reasoning and coding capabilities. One of the critical things is that we are bringing all of these together in one single model family, and that is actually one of the catalysts of improvement, and improvement at pace, as well. It is harder, but we find that creating a single model that can understand the world, which you can then ask, "Can you code me a simulation of a tree growing?", and it can do it, requires understanding a lot of things, not just how to code. Again, we are trying to make these models useful and usable by a very broad audience, and I think our pace has been really reflective of the research investments we have been making across the board.

So no velocity slowdown is what I'm hearing from you, which is good.

Look, let me just put it this way: I'm very excited about everything we have been doing as Gemini progresses, and the research is getting more and more exciting. For those of us doing research, it's really good.

Okay.
So I want to ask you, you're on the model side, and sometimes we debate on the show what the value is of improving models. So let me put a thought experiment to you: what do you think improving these models by 10% would get us?

You froze for a second, so maybe build up to the question again; I may have missed part, but I got the last part.

Yeah. So what do you think improving these models by 10% would get us?

I think the question there is how we define 10%, and that is where the value is defined. One of the important things about doing research and improving the models is quantifying progress. We use many different ways to quantify progress, and not every one of them is linear, and not every one of them is linear with the same slope. So when we say improving by 10%: if we can improve by 10% the model's understanding of math, of really highly complex reasoning problems, that is a huge improvement, because it would indicate that the general knowledge and the capabilities of the model have expanded a lot, and you would expect that to make the model applicable to a much broader range of problems.

And what if you improved the model by 50%? Is your product team saying there are things we could build if this model were just 50% better?

Again, we work with product teams a lot. Taking a step back, that's quite an important thing for me. Thinking about AGI as a goal also goes through working with the product teams, because while building AGI is a research problem...

Mhm.
We are doing research, but the most critical thing is that we actually understand what kinds of problems to solve and what kinds of domains to apply these models to for users. That user feedback, and the knowledge from interaction with users, is quite critical. So when our products tell us, okay, here is an area we want to improve on, that is quite important feedback that we can then turn into metrics and pursue. As you ask about increasing the capabilities of the model, what is important is improvement across a broad range of metrics, which I think we have been seeing in Gemini, as I said, from 1.5 to 2.5. You can see the capability increases across the model. A lot more people can actually use the models in their daily lives, either to learn something new or to help them solve an issue they see. And that's the goal. At the end of the day, the reason we build this technology is to build something that is helpful, and the products are a critical aspect of how we measure and understand what is helpful and what is not. That's our main ambition.

That's great. Let's take a concrete example that the company is releasing, or at least talking about, today, which is Veo 3. This is your video generation model, and I think we've seen an unbelievable acceleration in what these models can do from the first generation to the second to the third. For listeners and viewers, what Google is doing now is not only generating scenes; you're able to generate them with sound. And having watched a couple of these videos, I can tell you the sound matches. And then there's this other crazy product that Google's putting out.
I think it's called Flow, where you can extend the scene you've generated and storyboard out basically your own short film. So I'd love to hear your perspective on how this happened. I kind of asked you what we get at 10% or 50%, but is this that perfect example of the model getting better and producing something that goes from a fun little video to "oh, I can really use this now"?

Yes. I think the main progress going from Veo 1 to Veo 2 was a lot more about understanding the physics and the dynamics of the world. With Veo 2, I think for the first time we could comfortably say that, for many, many cases, the model has understood the dynamics of the world well. That's very important: to have a model that can generate complex scenes with a dynamic environment and interactions between objects. I remember one video that went quite viral, of cutting a tomato. The video generated by Veo 2 was so precise and looked so realistic: a person slicing tomatoes, and the dynamics there, not just how a single object like the hand moves, but the interaction between different objects, the blade, the tomato, how the slice falls down, everything. It was very precise. So that interactive element was important. Understanding dynamics is not just about a single object; it's also about multiple objects interacting with each other, which is much more complex. There we had a big jump, and with Veo 3 I think we are making another jump in that aspect. But I see the sound as an orthogonal, new capability coming in. Of course, in the real world we have multiple senses, and vision and sound go hand in hand; they are perfectly correlated.
We perceive them at the same time, and they complement each other. So to have a model that understands that interactivity and complementarity, and that can generate scenes and videos with both at the same time, speaks to the new capability level of the model, and the quality. This is the first step: there are very impressive examples, and there are examples that fall a little short of what you would call really natural. But I think this is an exciting step in expanding that capability, and as you said, I'm excited to see how this kind of technology can be useful. You just said it is becoming useful, and that is great to hear: now this is a technology that can be built with. And Flow is an experiment in that direction, to put it in users' hands so people can experiment and build something with it.

Yeah, you prompt a scene, and that creates a scene, then you prompt the next scene, and you can continue to have a flow, a story flow, which is a good name for it. All right, this next question comes to me from a pretty smart AI researcher. They basically talked about how there's a tension between open source and proprietary. Of course we have companies like Google: obviously "Attention Is All You Need" and the transformer came from Google, and now Google's building proprietary models. And we saw DeepSeek push the state of the art forward, you could argue. So this person wanted to know, and I think it's a really good question: is there coordination possible between open source and proprietary? I mean, we see OpenAI doing their new open-source model, or teasing it. Or should each side try to get its own part of the market? What do you think?
I want to say a couple of things. First and foremost, take a step back: there's a lot of research that went into building this technology. In the last two or three years it became so accessible and so general that people use it in their daily lives, but there's a long history of research that built up to this point. As a research lab, Google, and before that of course DeepMind and Google Brain, two separate labs working in tandem on different aspects, built many of the technologies we see today as research prototypes, as research ideas, and published them in papers. As you said, transformers, the most critical technology underlying all of this, and then models like AlphaGo and AlphaFold: all of these research ideas have been evolving into the knowledge base we have right now. All that research, the publications and the open sourcing, was a critical element, because we were really in the exploratory space at those times. Nowadays, the other thing we always need to remember is that at Google we have our Gemma models, which are open-weights models, just like the Llama open-weights models. The reason we do those is that there's a different community of developers and users who want to interact with those models, who actually need to be able to download the weights into their own environment and build with them. So I don't think it's an either-or. There are different kinds of use cases and communities that benefit from different kinds of models.
But what is most important, at the end of the day, on the path towards AGI, is that we are conscious about what we enable with the technologies we develop. So when we develop our frontier technologies, we choose to develop them under the Gemini umbrella, which are not open-weights models, because we want to make sure we can be responsible about the way they are used as well. But at the end of the day, what really matters is the research that goes into building the technology, pushing the frontier, and building it the right way, with positive impact. And that can happen both in the open-weights ecosystem and in the closed ecosystem. When I think about the umbrella of things we are trying to do, we have quite ambitious goals: building AGI and doing it the right way, with positive impact. That's how we develop our Gemini models.

Okay, I have about 30 seconds left with you. You're chief technology officer: are you a fan of vibe coding?

Yes, exactly. I find it really exciting, because what it does, all of a sudden, is enable a lot of people who do not necessarily have a coding background to build applications. It's a whole new world that is opening. You can say, "I want an application like this," and then you see it. Imagine what could be possible in the space of learning: you want to learn about something, and beyond a textual explanation you can ask the model to build you an application that explains certain concepts, and it will do it. And this is the beginning: some things it does well, some things it doesn't do well, but I find it really exciting. These are the kinds of things the technology brings.
All of a sudden, the whole space of building applications, of building dynamic, interactive applications, becomes accessible to a much broader community of people.

All right, great to see you. Thank you so much for coming on the show.

Yeah, thank you very much. Thanks for inviting me, Alex.

Definitely. We'll have to do it again in person sometime. All right, everybody, thank you for listening. We'll have Demis Hassabis, the CEO of Google DeepMind, on tomorrow, so we invite you to join us then. We'll see you next time on Big Technology Podcast.