Defying Gravity - Kevin Hou, Google DeepMind

Channel: aiDotEngineer

Published at: 2025-12-02

YouTube video id: HN-F-OQe6j0

Source: https://www.youtube.com/watch?v=HN-F-OQe6j0

[music]
All right. Hello. Last one of the day.
Can we get a little energy boost?
Who's ready? Who's ready? [applause]
All right, happy Friday. I hope everyone
has had a good week, a good conference.
And let me tell you, it's been a
really bad week if you are Gravity.
Wicked 2 is coming out tonight. And
then, of course, Anti-gravity came out
earlier this week alongside Gemini 3 Pro
on Tuesday.
Google Anti-gravity is a brand new IDE
out of Google DeepMind. It's the first
one from a foundational lab and it is
coming right off the press. In fact,
I probably should be working on the
product right now, but I wanted to spend
some time to share what we've built here
today.
Anti-gravity is unapologetically
agent first. And today, I'm going to
tell you a little bit about what that
means and how it manifests in the
product. But perhaps more interestingly, we're going to talk a little bit about how we got here: product principles, the direction of the industry, these sorts of things. So, my name is Kevin Hou. I lead our product engineering team at Google Antigravity.
And let's start with the basics. First, just to get a sense of the room: who has used anti-gravity?
All right, there you go. Power of Google. Love it. Who's used the agent manager?
Cool. Nice. Good. All right. So,
basics of anti-gravity.
Anti-gravity; notably Antigravity, one word, not anti-gravity. It's an AI
developer platform with three surfaces.
The first one is an editor, the second one is a browser, and the third is the agent manager. So we'll dive into what each one looks like. A paradigm shift
here is that agents are now living
outside of your IDE and they can
interact across many different surfaces
that your agent, or you as a software developer, might spend time in.
And let's start with the agent manager.
So that's the thing up top. This is your
central hub. It's an agent first view
and it pulls you one level higher than
just looking at your code. So instead of
looking at diffs, you sit a step further back. And at any
given time, there is one agent manager
window.
Now you have an AI editor. This is probably what you've grown to love and expect. It has all the bells and whistles that you would expect: lightning-fast autocomplete. This is the part where you can make your memes about, yes, we forked VS Code. And it has an agent sidebar, and this is mirrored with the agent manager. This is for when you need to dive into your editor to accomplish maybe the 80% to 100% of your task. And at any point, because we recognize not everything can be done purely with an agent, we made it very, very easy for you to Command-E or Control-E and hop instantly from the editor into the agent manager and vice versa. And this takes under 100 milliseconds. It's zippy. And then
finally, something that I love, an agent
controlled browser. This is really, really cool. And hopefully, for the folks in the room that have tried anti-gravity, you've noticed some of the magic that we've put in behind here. So, we have an agent-controlled Chrome browser, and this gives the agent access to the richness of the web. And I mean that in two ways. The first one is context retrieval, right? It has the same authentication that you have in your normal Chrome. You can give it access to your Google Docs. You can give it access to your GitHub dashboards and things like that, and it interacts with the browser like you would as an engineer. But also, what you're seeing on the screen is that it lets the agent take control of your browser, click and scroll and run JavaScript, and do all the things that you would do to test your apps. So here I put together this random artwork generator. All you do is refresh and you get a new piece of Thomas Cole artwork. And now we added a new feature, which is this little modal card, and the agent actually went out and said, "Okay, I made all the code, but instead of showing you a diff of what I did, let's instead show you a recording of Chrome." So, this is a recording of Chrome where the blue circle is the mouse. It's moving around the screen, and in this way, you get verifiable results. So, this is why we're very excited about our Chrome browser. And then the
agent manager can serve as your control
panel. The editor and the browser are
tools for your agent. And we want you to
spend time in the agent manager. And as
models get better and better, I bet
you're going to be spending more and
more time inside of this agent manager.
And it has an inbox, and I'll talk a
little bit about this and sort of why we
did this, but it lets you manage many
agents at once. So you can have things
that require your attention. For
example, running terminal commands. We
don't want it to just kind of go off and
just run every terminal command. There
are probably some commands that you want
to make sure you hit okay on. So
things like this will get surfaced
inside of this inbox. One click, you can
manage many different things happening
at once.
And it has wonderful OS-level notifications. So if there is something that needs your attention, it will let you know. And this solves that
problem of multi-threading across many
tasks at once. And so our team is
thrilled to launch this brand new
product. It's a brand new product
paradigm. And we did so in conjunction
with Gemini 3, which was a very exciting
week for the team. But alas, we ran out
of capacity.
[laughter]
This has been tormenting me the last couple of days. And so I apologize: on behalf of the anti-gravity team, I'd like to apologize for our global chip shortage. We're working around the clock to try and make this work for you; hopefully we'll have a few fewer of these sorts of errors. But what's been really exciting is that people who have used the product have seen what the magic of combining these three surfaces can do for your workflows, for your software development. So let's talk about it. Why did we build the product? How did we arrive at this conclusion? You might say, "Oh, adding in a new window, that's pretty random, right?" It's this one-to-many relationship between the agent manager and many other surfaces.
And it's important to remember, I've been at this conference a couple of times, and every single time there is this theme: the product is only ever as good as the models that power it. And this is very important for us as builders, right? Every year there is a new step function. First there was a year when it was autocomplete, right? Copilot. And that sort of thing was only enabled because models suddenly got good at doing short-form autocomplete. Then we had chat; we had chat with RLHF. Then we had agents. So you can see how every single one of these product paradigms is motivated by some change that happens with model capabilities. And it's a blessing that
our team is able to work and be embedded inside of DeepMind. We had access to Gemini a couple of months early, and we were able to work with the research team to figure out what strengths we want to show off in our product, what things we can exploit, and also where the gaps are: given this desired experience, where are the gaps in the model, and how can we fix that? And so this was a very, very powerful part of why anti-gravity came to be. There are four main categories of improvements, powered by a little Nano Banana artwork. The first one
is intelligence and reasoning. You are probably familiar with this: you used Gemini 3 and you probably thought it was a smarter model. This is good. It's better at instruction following; it's better at using tools. There's more nuance in the tool use. You can afford things like a browser now; there are a million things that you could do in a browser, and it can literally even execute JavaScript. How do you get an agent to understand the nuance of all these tools? It can do longer-running tasks. These things now take a bit longer, right? And so you can afford to run them in the background. It thinks for longer; time has just gotten stretched out. And then multimodal. I really love this property of what Google has been up to. The multimodal functionality of Gemini 3 is off the charts, and when you start combining it with all these other models like Nano Banana Pro, you really get something magical. So we have these roughly four categories where things have gotten much better.
And if you think about these properties, the question becomes: what do we do about these differences? From a product perspective, how do you construct a product that can take advantage of this new wave? And hopefully, in my opinion, this is the next step function: autocomplete, chat, agents, and then whatever this thing is called; I probably have to come up with something more interesting than that.
So step one is we want to raise the
ceiling of capability.
We want to aim higher, have higher
ambition.
And a lot of the teams at DeepMind were working on all sorts of cutting-edge research, right? Google is a big, big company, and one of my learnings going from a startup to one of these bigger companies is that there is a team of people attacking every very, very hard technical problem. And as a nerd, this is super exciting, right? And then as a product person, it's like, wow, we can start using computer use. So browser use has been one of these huge unlocks.
And this is twofold, right? I mentioned the retrieval aspect of things. For software engineers, there is much more that happens beyond the code. You can roughly think about it as: there's what to build, there's how to build it, and then you actually have to build it. I would say building it has become more or less reasonable for the model: given context, it can generate code that hopefully functionally works. Then you've got the what to build; this is the part that is up to human imagination. And then there's the how to build it, and there's this richness of context, this richness of institutional knowledge. These are the sorts of things where having access to a browser, having access to your bug dashboards, having access to your experiments, all now gives the agent this additional level of context. And maybe I should have clicked before, but if you saw on the screen... let's see, how do I do this?
So, this is now the other side of things: the browser as verification. You might have seen this video; this is a tutorial video that we put together on just how to use it. But this is the agent. The blue border indicates that it's being controlled by the agent. And so this is a flight tracker: you put in a flight ID and then it'll give you the start and end of that flight. And this is being done entirely by a Gemini computer-use variant. It can click, it can scroll, it can retrieve the DOM, it can do all the things. And then what's really cool is you end up with not just a diff; you end up with a screen recording of what it did. So it's changed the game. And because the model has the ability to understand images, it can take this and iterate from there. So that was the first category, browser use: just an insane, insane magical experience.
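None of this is Antigravity's internal code, but to give a feel for what agent-driven browser actuation conceptually involves (clicking, reading the page, capturing evidence the model can inspect), here is a small sketch using Playwright as a stand-in for the computer-use tooling described above. The app URL and selectors are invented for illustration.

```typescript
import { chromium } from "playwright";

// Illustrative stand-in for agent-driven browser actuation: open the app,
// type a flight ID, click search, and capture a screenshot plus some DOM
// state that a multimodal model could use to verify the result.
// The URL and selectors below are hypothetical.
async function verifyFlightTracker(flightId: string) {
  const browser = await chromium.launch();
  const page = await browser.newPage();

  await page.goto("http://localhost:3000");  // the app under test (assumed)
  await page.fill("#flight-id", flightId);   // hypothetical input selector
  await page.click("#search");               // hypothetical button selector

  // Visual evidence plus a bit of text for the model to reason over.
  const screenshot = await page.screenshot({ path: "flight-result.png" });
  const resultText = await page.textContent("#result"); // hypothetical selector

  await browser.close();
  return { screenshot, resultText };
}
```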
Now, the second place where we
wanted to spend time is on image generation. And we noticed this theme when I first started at Google: okay, Gemini is spending a lot of time on multimodal. And this is really great for consumer use cases, right? Nano Banana 2 was mind-boggling. But it also matters for devs. Development is inherently a multimodal experience. You're not just looking at text. You're looking at the output of websites. You're looking at architecture diagrams. There's so much more to coding than just text. And so there's image understanding: verifying screenshots, verifying recordings, all these sorts of things. And then the beautiful part about Google is that you have this synergistic nature. This product takes into account not just Gemini 3 Pro, but also the image side of things.
And so here I want to give you a quick demo of mockups. I have a hunch, and you all probably believe this too: design is going to change, right? You're going to spend maybe some time iterating with an agent to arrive at a mockup. But for something like, oh, let's build this website, we can start in image space. And what's really cool about image space is it lets you do things like this. We can add comments. And so you end up commenting and leaving a bunch of queued-up responses, kind of like GitHub. You'll just say, "All right, now update the design," and then it'll put it in here. The agent is smart enough to know when and how to apply those comments. And now we're iterating with the agent in image space. So a really, really cool new capability.
And what was awesome is that we had Nano Banana Pro. We pulled an all-nighter for the Gemini launch because that was our first launch. Then they said, "Do it again. Do it on Thursday." So we made Gemini Pro, or, I'm getting all these model names confused, the image gen one, the Nano Banana one, we made that available on day one inside of the anti-gravity editor. I'm running on very little sleep. And our hope is that the anti-gravity editor is this place where any sort of new capability can be represented inside of our product.
And so step two was all right, we have
this new capability. We've pushed the
ceiling higher. Agents can do longer
running tasks. They can do more
complicated things. They can interact on
other surfaces. And so this necessitates
a new interaction pattern. And we're
calling this artifacts.
This is a new way to work with an agent.
And this is one of my favorite parts
about the product. And at its core is
this agent manager.
So let's start by defining an artifact. An artifact is something that the agent generates that is a dynamic representation of information for you and your use case. And the key here is that it's dynamic. Artifacts are used to keep the agent organized; they can be used for self-reflection and self-organization. They can be used to communicate with the user, to maybe give you a screenshot or a screen recording like we described. And they can also be used across agents, whether that be with our browser sub-agent, with other conversations, or as memory. And this is what you see on the right side of the agent manager: we've dedicated roughly half the screen and your sidebar to this concept of artifacts.
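To make that concrete, here is a rough, purely illustrative sketch of what an artifact record could look like. The field names, types, and the helper below are my own assumptions for explanation, not Antigravity's actual data model.

```typescript
// Hypothetical shape of an artifact record, for illustration only.
// None of these names come from the Antigravity product itself.
type ArtifactKind =
  | "plan"             // implementation plan, PRD-style markdown
  | "task_list"        // progress tracker the agent updates as it works
  | "walkthrough"      // end-of-task summary, like a PR description
  | "screenshot"
  | "screen_recording"
  | "diagram";         // e.g. a mermaid architecture diagram

type ArtifactAudience = "user" | "sub_agents" | "other_conversations" | "memory";

interface Artifact {
  id: string;
  kind: ArtifactKind;
  purpose: string;              // why the agent created it, explained to the user
  content: string;              // markdown, image URI, or recording URI
  audience: ArtifactAudience[]; // who should be able to see or reuse it
  openQuestions: string[];      // unresolved items to surface back to the user
  updatedAt: Date;              // artifacts are dynamic: the agent keeps revising them
}

// One way the "auto-continue" decision described later could be modeled:
// no open questions, keep going; otherwise notify the user.
function needsUserInput(a: Artifact): boolean {
  return a.openQuestions.length > 0;
}
```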
And so we've all tried to follow along with a chain of thought. And I would say we did some fanciness here inside of the agent manager to make sure conversations are broken up into chunks, so in theory you can follow along a little better in the conversation view. But ultimately you're looking at a lot of strings, a lot of tokens. This is very hard to follow. And there are actually like ten of these, right? So you just scroll and scroll and scroll, and you're like, "What the heck did this agent do?" This has traditionally been the way that people review and supervise agents: you're just looking at the thought patterns.
But isn't it much easier to understand what is going on with a visual representation? That is what an artifact is. The whole reason I'm not just standing up here giving you a long stream of consciousness is because I have a PowerPoint. The PowerPoint is my artifact. And Gemini 3 is really, really strong with this sort of visual representation; it's really strong with multimodal. So instead of showing this, which of course we will always let you see, we want to focus on this. And I think this is the game-changing part about anti-gravity.
And the theme is this dynamism.
The model can decide if it wants to generate an artifact. And let's remember, there are some tasks, like changing a title or changing something small, that don't really need an artifact. So it will decide if it needs an artifact, and then second, what type of artifact. And this is where it's really cool: there are potentially infinite ways that it can represent information. The common ones are markdown, in the form of a plan and a walkthrough. This is probably what you've used most often. When you start a task, it will do some research and put together a plan, very much like a PRD. It will even list out open questions. So you can see in this feedback section, it'll surface, hey, you should probably answer these three questions before I get going. And what's really awesome, and we're betting on the models here, is that the model will decide whether or not it can auto-continue. If it has no questions, why should it wait? It should just go off. But more often than not, there are probably areas where you may be underspecified, or maybe it learned something during research. Everyone has started a big refactor and then realized they actually don't have all the information in front of them; they have to go back to the drawing board, maybe talk to some people. Same idea. So it'll surface open questions for you. You'll start with that implementation plan, and then you'll say, all right, LGTM, send it. As you go all the way down, it might produce other artifacts. We've got a task list here; this is the way that you can monitor the progress of the agent instead of looking at the conversation. It might put together some architecture diagrams, and then you'll get a walkthrough at the end. You saw a glimpse of this before, but it is, hey, how do I prove to you, agent to human, that I did the correct thing and I did it well? This is the part that you'll end with; it's kind of like a PR description. And then there's a whole host of other types, right? Images, screen recordings, these mermaid diagrams. And what's quite cool is that because it's dynamic, the agent will decide this over time. So suddenly there's maybe a new type of artifact that we missed, right? It'll figure that out; it'll just become part of the experience. So it's very scalable. But this artifact primitive is something very, very powerful that I'm pretty excited about. And then I
guess another question is: why is it needed? We'll always explain to the user what the purpose of this artifact is. And then, interestingly, who should see it? Should the sub-agents see it? Should other agents see it? Should other conversations see it? Should this be stored in my memory bank? If this is something the agent derived, one of the cool examples I like is: if you give it a piece of documentation and give it your API key, it'll go off and run curl requests to figure out the exact schema of the APIs you're using. It'll do this deep research for quite a while, then give you a report and deeply understand that API. You wouldn't want to just throw that away and have to rederive it the second time. So it'll store it in your memory, and all of a sudden that's just part of your knowledge base.
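As a purely hypothetical sketch of the kind of probing that example implies, imagine the agent hitting a documented endpoint and recording the observed response shape so it can be kept as a memory artifact. The URL, key handling, and helper below are made up for illustration; this is not anything Antigravity actually runs.

```typescript
// Illustrative only: probe a (hypothetical) documented endpoint and record
// what the response actually looks like, so the derived schema can be saved.
async function probeEndpoint(baseUrl: string, apiKey: string, path: string) {
  const res = await fetch(`${baseUrl}${path}`, {
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  const body = await res.json();

  // Derive a rough field -> type map from the live response.
  const schema = Object.fromEntries(
    Object.entries(body).map(([field, value]) => [field, typeof value]),
  );

  // In the talk's example, a report like this would be stored as memory
  // so it never has to be rederived on the next task.
  return { path, status: res.status, schema };
}

// Example usage (hypothetical endpoint):
// const report = await probeEndpoint("https://api.example.com", process.env.API_KEY!, "/v1/flights");
```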
And then there's also this idea of notifications: if there's an open question, you want the agent to be proactive with you. That's another very cool property of this artifact system. We want to be able to provide feedback along this cycle, from task start to task end, and inform the agent on what to change.
And the artifact system lets you iterate
with the model more fluidly
during this process of execution. And so, not to sound like a complete Google shill, but I love Google Docs, right? Google Docs is a great pattern. It's awesome. The comments are great, and this is how you might interact with a colleague: you're collaborating on a document, and all of a sudden you want to leave a text-based comment. So we took inspiration from that, and we took inspiration from GitHub. You leave comments. You highlight text. You say, "Hey, maybe this part needs to get ironed out a bit more. Maybe there's a part that you missed. Or actually, don't use Tailwind, use vanilla CSS." These are the sorts of comments that you would leave. You batch them up and then you send them off. And then in image space, this is very cool: we now have this Figma-style highlight-to-select, and now you're leaving comments in a completely different modality, right? And we've instrumented the agent to naturally take your comments into consideration without interrupting that task execution loop. So at any point during your conversation, you could just say, "Oh, actually, mid browser actuation, I really don't like the way that turned out. Let me just highlight that, tell you, send it off," and then I'll just get notified when you're done taking those comments into consideration. And so
it's a whole new way of working. And
this is really at the center of what
we're trying to build with anti-gravity.
It's pulling you out into this higher
level view. And the agent manager really
is built to optimize the UI of
artifacts.
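To make the batched-comment idea concrete, here is a small, purely illustrative sketch of how queued feedback might be represented. The types, and the split between text anchoring and image-region anchoring, are my own assumptions for explanation, not the product's actual schema.

```typescript
// Hypothetical representation of feedback queued up against an artifact.
// A comment can anchor to highlighted text (Docs/GitHub style) or to a
// selected region of an image (Figma style); comments are batched and sent
// together without interrupting the agent's current task.
type CommentAnchor =
  | { kind: "text"; artifactId: string; start: number; end: number }
  | { kind: "image_region"; artifactId: string; x: number; y: number; width: number; height: number };

interface QueuedComment {
  anchor: CommentAnchor;
  body: string; // e.g. "Actually, don't use Tailwind. Use vanilla CSS."
}

// Batch comments locally, then hand the whole batch to the agent at once.
class CommentBatch {
  private queue: QueuedComment[] = [];

  add(comment: QueuedComment): void {
    this.queue.push(comment);
  }

  // Drain the queue; the caller would forward these to the running agent,
  // which folds them in without restarting the task.
  send(): QueuedComment[] {
    const batch = this.queue;
    this.queue = [];
    return batch;
  }
}
```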
So we have a beautiful, beautiful artifact review system. We're very proud of this. And it can also handle parallelism and orchestration. Whether this be many different projects, or the same project where you want to execute a design mockup iteration at the same time you're doing research on an API, at the same time you're actually building out your app, you can do all these things in parallel, and the artifacts are the way that you provide feedback. The notifications are the way that you know something requires your attention. It's a completely different pattern. And what's really nice is that you can take a step back, and of course you can always go into the editor. I'm not going to lie to you: there are tasks where maybe you don't trust the agent yet. We don't trust the models yet. And so you can hit Command-E and it'll open inside the editor within a split second, with the exact files, the exact artifacts, and that exact conversation open, ready for you to autocomplete away or continue chatting synchronously to get you from 80% to 100%. So we always want to give devs that escape hatch. But we're building for the future, and in that future world you'll spend a lot of time in this agent manager working with parallel sub-agents, right? It's a very, very exciting concept.
Okay, so now you've seen we've got new capabilities, a multitude of new capabilities, and we've got a new form factor. Now the question is: what is going on under the hood at DeepMind? And the secret here is a lesson I've learned over the past, I don't know, three years I've personally spent in codegen: just be your own biggest user, right? And that creates this research and product flywheel.
And so I will tell you anti-gravity will be the most advanced product on the market because we are building it for ourselves. We are our own users. In the day-to-day, we were able to give Google engineers and DeepMind researchers early access, and now official internal access, to anti-gravity. And so all of a sudden the actual experience of the models that people are improving, the actual experience of using the agent manager and touching artifacts, is letting them see at a very real level what the gaps in the model are. Whether it be computer use, image generation, or instruction following, every single one of these teams, and there are many teams at Google, has some hand inside of this very, very full-stack product.
And so, as an infrastructure engineer, you might notice, "Oh, this page is a bit slow." Well, go off and make that better, right? So it gives you this level of insight that evals simply can't give you. And I think that's what's really cool about being at DeepMind: you are able to integrate product and research in a way that creates this flywheel and pushes that frontier. And I guarantee you that whatever that frontier provides, we will provide in anti-gravity for the rest of the world. These are the same product. And so, I'll
give you two examples of how this has worked. The first one was that computer use example, right? In collaboration with the computer use team, which we sit a couple tens of feet away from, we identify gaps on both sides. We're not just using an API; we are interacting across teams to say, oh, the capability is kind of off here, can we go off and figure out what's going on, maybe there's a mismatch in data distribution. And then on the other side it's like, yo, your agent harness is pretty screwed up, you've got to fix your tools, right? And so then we'll go off and we'll fix our side. But it's this harmony, it's both sides talking to each other, that really makes this type of thing possible. Similarly, you come up with a new product paradigm: artifacts. Artifacts were not good on the initial versions, right? What part of training, what part of the data distribution, includes this weird concept of reviews? And so it took a little bit of plumbing, a little bit of work with the research team, to figure out, all right, let's steadily improve this ability. Let's give you a hill to climb. And now we were able to launch Gemini 3 Pro with a very good ability to handle these sorts of artifacts. And so it's this cyclic nature that I'm really, really betting on.
And this is really how anti-gravity will defy gravity. First, we're pushing the ceiling. We're going to have an agent with a very, very high level of ambition, and we're going to try and do as much as we can. This includes vibe coding, though I will say there are some excellent products out there by Google; AI Studio is an excellent product. We are in the business of increasing the ceiling. Second, we built this agent-first experience: artifacts, the agent manager. And then finally, we have this research and product flywheel. And this is the magic. This is the three-step process that we used in building anti-gravity.
So, it's been a blast. I've been back at AI Engineer Summit; thank you again, Swyx and Ben, for having me. It's been awesome to come back every year. On behalf of the anti-gravity team, I just want to thank you for your time, for your patience as you use the product, and for your support. And of course, you too can adopt a TPU and help us turn off PagerDuty a bit more. And then of course, you could also yell at me on Twitter; that's another way of doing it. Maybe do it in DMs instead. But we've got a lot of exciting things, and I'm really, really excited to bring anti-gravity to market. The team is thrilled that this is now out in the wild, so we welcome your feedback. Thank you again for listening. Enjoy the rest of the conference. [applause]
[music]