Build & deploy AI-powered apps — Paige Bailey, Google DeepMind

Channel: aiDotEngineer

Published at: 2026-04-29

YouTube video id: G_bHFmEAarM

Source: https://www.youtube.com/watch?v=G_bHFmEAarM

Can everyone hear me?
>> Excellent. Awesome. Greetings, valiant few. I'm not sure how many folks have heard, but there were some electrical issues in the rest of the building. So y'all were the ones who showed up early, which means you're part of the few who get to hear the talks. If you don't feel lucky, just wait; you are definitely experiencing something special this morning. And for anybody who wants to come back and hear more, or if you missed the gen media session a little earlier today, we're going to be doing a whistle-stop tour of all of the DeepMind presentations this afternoon. So you can come back, meet more of the team, and hear more about all of the talks and technologies. I also really love for sessions to be interactive, so I'm going to show you some demos; this is going to be very demo-heavy as opposed to slide-heavy. And if you have any questions along the way, please feel free to shout them out. It's always much more interesting if this is more of a conversation than just me showing stuff over and over again.
And also, I guess, introductions. Hi everybody, my name is Paige. I'm one of the leads for developer relations at Google DeepMind. I've been doing machine learning for a really long time; I started in 2009 and contributed to some of the early days of the open source scientific computing libraries, things like NumPy, SciPy, and scikit-learn. Then I did product for a couple of years and am now back on the engineering ladder. And I really love that it's now really hard to nitpick what's product, what's engineering, and what's design; all of the roles seem to be conflated a little bit.
I don't think it's a secret that Google has been a little bit busy over the last while. Over the last month and a half, we've been releasing models so fast I feel like everybody has a little bit of whiplash. Gemini 3.1 Flash Live, which we'll take a look at in a second. Gemini 3.1 Pro and Flash-Lite, respectively our largest and our smallest models, which are performant, efficient, and able to do a lot of things very quickly and at low-cost profiles. We actually just had Augment Code, if anybody is familiar with them, replatform their entire agent system to default to Gemini 3.1 Pro, specifically for performance plus cost reasons. Nano Banana 2 for image generation and editing. Our embeddings model, which supports video, images, audio, text, and code all in the same embedding space; you can say "show me all of the content related to cats" and it will surface not just video of cats, not just images, but also things like audio of a cat purring or meowing, or books about cats, all sorts of stuff (there's a hedged embeddings sketch just below). Lyria 3 for music generation, which you saw if you were in the gen media session a little while ago. Genie 3 for world model building, so being able to dynamically generate new worlds based on user input. Our full-stack runtime for AI Studio, which includes things like databases and OAuth. Gemma 4, which is part of our open model family; we're lucky enough to have a member of the Gemma team here at AIE this week, so Ian, raise your hand. Yep, greetings. The Gemma team would be excellent to talk to if you're interested in open models. And then also Veo 3.1 Light, for video generation at a cost profile that's pretty compelling. So, lots of different stuff. Just a show of hands: how many folks have heard of all of these models before? Excellent, the DeepMinders in the back row, hopefully. But if you haven't heard of any of these, by the end of the session you'll know all about them, and hopefully know which ones you could use or consider for your projects.
I also don't think it's a secret that Gemini is kind of special in the industry. One of the reasons is that it's multimodal both for inputs and for outputs. It supports video, images, audio, text, and code as inputs, but it can also output multiple modalities: text and code, but also audio and images, including images and text interleaved. Most of the other models on the market are only capable of producing text and code as outputs, and only accept things like static images as inputs. So it's pretty compelling to see what you're able to do. And via our APIs you're also able to handle flexible input formats: you can have PDFs with embedded images, or different types of video and audio, all of which you can serve as tokens for inference.
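(As a concrete illustration of flexible inputs: a minimal sketch of sending a PDF with embedded images through the google-genai Python SDK. The file name is hypothetical and the model id is a placeholder.)

```python
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment

# Upload a local PDF (embedded images included) via the Files API,
# then pass it alongside a text prompt as ordinary model input.
report = client.files.upload(file="quarterly_report.pdf")  # hypothetical file

response = client.models.generate_content(
    model="gemini-2.5-flash",  # placeholder model id
    contents=[report, "Summarize this report and describe every embedded chart."],
)
print(response.text)
```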
But again, it's a lot cooler to see it than to just have me talking about it and waxing poetic. So I'm going to go ahead and pull up my personal instance of AI Studio, and, as I always say, if you see anything embarrassing, please don't judge me. How many folks have used AI Studio before? Cool, excellent. For folks who have never used it, you can access it at ai.dev or aistudio.google.com. It works with your personal Gmail account, so you can get started for free.
You can select different models here off to the right. There are different pills for the kinds of modalities you might want to work with, so things like video generation: Veo 3.1, 3.1 Fast, and 3.1 Light are all in that section. You can also select the different Gemini models, like Gemini 3 Flash Preview and Flash-Lite Preview; I'm going to select the latter just in the interest of time. And you can toggle on and configure many of these different tools off to the right. You can specify things like structured outputs; code execution, which we'll take a look at in a second; and function calling. You can turn on grounding with Google Search, which just automatically incorporates search as a tool, grounding with Google Maps, and even URL context, which gives you a sort of poor man's retrieval: you provide a list of URLs, which get incorporated into the model's context window so it can use them to ground some of its outputs.
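(A hedged sketch of what that tool configuration looks like in the google-genai Python SDK. The model id is a placeholder; only the search-grounding tool is shown here.)

```python
from google import genai
from google.genai import types

client = genai.Client()

# Turn on grounding with Google Search as a tool the model can call.
config = types.GenerateContentConfig(
    tools=[types.Tool(google_search=types.GoogleSearch())],
)

response = client.models.generate_content(
    model="gemini-2.5-flash",  # placeholder model id
    contents="Who won the most recent Formula 1 race?",
    config=config,
)
print(response.text)
# Grounding citations, when present, are attached to the response candidates.
print(response.candidates[0].grounding_metadata)
```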
And as I'm sure all of y'all know, models are limited by the data in their pre-training and post-training mixtures. If they're only trained on data up to a specific point, that's all the insight they have out of the box. If you want a model to answer questions about things that happened after that date, you're going to have to give it access to tools, either search or retrieval, in order to do that work. And again, if anybody has questions as I ramble along, feel free to raise your hand and shout them out; this is a small enough group that it should be pretty fun. Cool. So, I've turned on grounding with Google Search. You can also add media: you can connect to Drive, upload files, record audio, add camera footage, link a YouTube video, or link sample media. And YouTube just works via URL, so you can paste in a YouTube URL and have it used for inference with the Gemini models.
the Gemini models. So as an example, I
haven't tried this um so we'll see if it
works. Uh uh but we can take a look uh
to see uh to see if we can find a
dinosaur video. Um I love T-Rexes. So
this uh so this past uh weekend in the
Bay Area, we had this thing called Bay
Area Big Wheels. um which uh and it
defaults to one frame per second. You
can also specify different start and end
times. Um so just for the interest of
speed um I might specify start time is
like maybe 0 0 0
um or uh 0 seconds and then maybe end
time would be uh um maybe like 300
seconds. Um
uh and you can see that this ends up
being around 27,600
tokens for five minutes of content. Um
but I could say create a table
um with timestamps
for all of the kinds of dinosaurs
that uh come on in uh you see in this
video. No worries, it's all good.
Make sure to include a fun fact about
each dinosaur type. Um, and then hit
run. Um, but what I was saying about Bay
But what I was saying about Bay Area Big Wheels: there's a big bendy hill in San Francisco with a whole bunch of whiplash-inducing turns, and everybody gets a little tricycle and rides down it. I did that this past weekend; it's an Easter Sunday tradition. I was dressed as a dinosaur and handing out dinosaur Easter eggs, so this is very on-brand. And what's happening behind the scenes: we've turned on grounding with Google Search, so we have search as a tool that can help inform some of our fun facts, and we've got the video being pulled in, the first five minutes' worth of content. We can see the different dinosaur types. Rexy and his parents obviously make a lot of appearances in this first episode, as well as a brachiosaurus, a velociraptor, and a pteranodon, which is a flying reptile.
And I love that it's calling out the true fact that pteranodons are pterosaurs, not dinosaurs. You can also see the different citations along the way, from the URLs that are informing all of these fun facts. You can also click "get code" to see all of the code you would need to replicate the experiment you just ran in AI Studio. It selects the appropriate model, shows you how to handle the URI for YouTube, and gives you the prompt to use for the video, in Python, in TypeScript, in Java, whatever your favorite language might be. And if you didn't want to use a YouTube URL, if you wanted to use your own video, you could pass that to the model too. It's just really handy to be able to pull in a YouTube URL, as opposed to having to download the video and then send it off yourself. And now I also want to watch this episode of Rexie the Little T-Rex; this looks very cool.
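(Roughly, the generated Python looks like this sketch with the google-genai SDK. The model id and video URL are placeholders; the one-frame-per-second default and the start/end times map onto the video metadata fields.)

```python
from google import genai
from google.genai import types

client = genai.Client()

# Reference the YouTube video by URL and clip it to the first five minutes;
# fps controls how densely frames are sampled for tokenization.
video_part = types.Part(
    file_data=types.FileData(file_uri="https://www.youtube.com/watch?v=XXXXXXXX"),  # placeholder URL
    video_metadata=types.VideoMetadata(start_offset="0s", end_offset="300s", fps=1),
)

response = client.models.generate_content(
    model="gemini-2.5-flash-lite",  # placeholder model id
    contents=[video_part,
              "Create a table with timestamps for all of the kinds of dinosaurs "
              "you see in this video, with a fun fact about each dinosaur type."],
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
print(response.text)
```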
thinking config, you have different
thinking settings for all of our Gemini
3.1 series. Um so minimal, low, medium,
and high. If you want the model to spend
more tokens thinking, um you can turn on
high thinking. Um but and uh I often
just keep it on minimal or low just for
time sake. Um for Gemini 3.1 flashlight
uh you get a really nice uh price uh
versus uh sort of price um price
performance and uh and also speed
profile for for the models. Um so you're
not having to make big big tradeoffs
between them.
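(A hedged sketch of controlling thinking effort through the generation config. The exact field and accepted values vary by model generation; current SDKs expose a thinking config with a token budget, so treat the names below as an assumption for the levels described in the talk.)

```python
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash-lite",  # placeholder model id
    contents="Summarize the plot of Jurassic Park in two sentences.",
    config=types.GenerateContentConfig(
        # Cap the tokens spent on internal reasoning; 0 disables thinking on
        # models that allow it, larger budgets allow deeper reasoning.
        thinking_config=types.ThinkingConfig(thinking_budget=0),
    ),
)
print(response.text)
```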
So that's how you would interact with Gemini 3.1 Flash-Lite within AI Studio for video analysis. One of the other slept-on features, I think, in our APIs as well as in AI Studio in general, is compare mode, along with code execution, which we see here off to the right. If I turn on code execution, we give Gemini a sandboxed environment with Python and a whole bunch of data science libraries pre-installed, where it can pull in those libraries as tools to help solve arbitrary data science tasks. And since this gives the model access to a sandboxed environment, you don't run the risk of any of this impacting your local environment, which is quite nice.
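(A minimal sketch of turning on the code execution tool via the google-genai Python SDK; the model id is a placeholder. The response interleaves the model's commentary, the Python it wrote, and the sandboxed execution output as separate parts.)

```python
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash",  # placeholder model id
    contents="What is the sum of the first 50 prime numbers? "
             "Generate and run Python code for the calculation.",
    config=types.GenerateContentConfig(
        tools=[types.Tool(code_execution=types.ToolCodeExecution())],
    ),
)

# Walk the parts: model text, the sandboxed code it wrote, and the stdout.
for part in response.candidates[0].content.parts:
    if part.text:
        print(part.text)
    if part.executable_code:
        print(part.executable_code.code)
    if part.code_execution_result:
        print(part.code_execution_result.output)
```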
As an example, if I select Gemini 3.1 Flash-Lite Preview, turn on code execution, and go into compare mode, I might compare it against Gemini 3 Flash Preview, also with code execution. One of the things you can do (and we'll see if this works) is select a picture; this is just some Lego bricks. I can paste it in, we can make sure it's secure and safe for the corporate overlords, and the image itself is around 1,000 tokens. Then I could say something like: "Draw bounding boxes around all of the green Lego bricks using Python, and then display the image with the bounding boxes." And hit run.
And what should happen is a head-to-head comparison of the two different models. Gemini 3.1 Flash-Lite got it right out of the gate, which is pretty wild: this super tiny model worked really fast, wrote the Python code to pull in the image, analyze it, and define the bounding boxes. And if you hover over the token consumption, the amount of dollars required to do this work is pretty wild too. You're able to pull in an image and do this kind of analysis; you could also have asked for things like segmentation masks, or asked it to count specific kinds of entities in the photo, again using bounding boxes or something similar. And all of this was done at well under a fraction of a penny. So I strongly recommend experimenting with the smaller models, especially with these tools turned on to help them do their work more effectively. And you can also see that Gemini 3 Flash Preview got to the same answer; it just took a little while longer, and the cost is slightly more, but still well under a penny.
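(Outside the sandboxed-Python route, you can also ask Gemini for bounding boxes directly as structured output; Gemini's documented convention is box_2d coordinates normalized to a 0-1000 grid. A hedged sketch, with a placeholder model id and a hypothetical image file:)

```python
import json
from google import genai
from google.genai import types
from PIL import Image

client = genai.Client()
image = Image.open("lego_bricks.png")  # hypothetical local image

response = client.models.generate_content(
    model="gemini-2.5-flash-lite",  # placeholder model id
    contents=[image,
              "Return bounding boxes for every green Lego brick as JSON: "
              '[{"label": str, "box_2d": [ymin, xmin, ymax, xmax]}]'],
    config=types.GenerateContentConfig(response_mime_type="application/json"),
)

# box_2d values are normalized to 0-1000; rescale to pixel coordinates.
w, h = image.size
for det in json.loads(response.text):
    ymin, xmin, ymax, xmax = det["box_2d"]
    print(det["label"], (xmin * w // 1000, ymin * h // 1000,
                         xmax * w // 1000, ymax * h // 1000))
```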
Cool. So that's compare mode, again using Gemini 3.1 Flash-Lite, just with the addition of code execution along the way.
For folks who might be interested in URL context, because this is something we've heard quite a bit about from folks using the Gemini APIs pretty regularly: if you turn on URL context, you can add URLs. So I'm going to pull in a URL for a blog post about Gemma 4, which was released just last week, after the model's training data cutoff, and a blog post about Genie 3, also after the cutoff. And I could say something to the effect of: "Compare and contrast Genie 3 and Gemma 4. Tell me how they're similar, different, or completely unrelated." (They're mostly completely unrelated, but we'll see what the model thinks.) Maybe turn on medium for the thinking level, and then hit run. And what we should see is the model giving its output while citing each one of the sources it's using to make its assessments. You can see the different sources down at the bottom, the two URLs that I used (you can use many, many more than just two), and it cites each source inline as it makes assessments along the way. So you can use publicly available information; and there are also tools within Vertex AI that let you do retrieval over custom, internal-only documents without necessarily having to set up a vector database for retrieval. And again, if you click "get code", it gives you all of the code you would need to replicate what you're doing in the AI Studio interface.
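(A hedged sketch of the URL context tool in the google-genai Python SDK; the blog URLs and model id are placeholders.)

```python
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash",  # placeholder model id
    contents="Compare and contrast the announcements at "
             "https://blog.google/example-post-a and "   # placeholder URLs
             "https://blog.google/example-post-b. Tell me how they're "
             "similar, different, or completely unrelated.",
    config=types.GenerateContentConfig(
        tools=[types.Tool(url_context=types.UrlContext())],
    ),
)
print(response.text)
# Which URLs were actually fetched and used comes back as metadata.
print(response.candidates[0].url_context_metadata)
```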
Cool. So, we've talked about the Gemini 3.1 series of models. You can also use 3.1 Pro Preview, which is our largest model series; it's also our most expensive and, by virtue of being larger, a little bit slower. Gemini 3 Flash is kind of the workhorse, the model we see most commonly used in production. And Gemini 3.1 Flash-Lite is also gaining traction, especially among people who were using Gemini 2.0 Flash or 2.5 Flash. And then for our Gemini Live series: how many people have seen or heard of Gemini Live before?
Cool, a couple of hands. Thank you, DeepMinders, for paying attention and also for using the models. Gemini Live gives you the ability to share your screen, share a video feed, or share audio with a model and have a conversation with it dynamically, including things like adding custom function calls, generating automatic function responses, and grounding with Google Search. So I can turn on grounding with Google Search again, just as a tool the model can call, and then I can give the model access to my screen to help me navigate an app or answer questions about what it sees, and to do that in a broad variety of languages, accents, etc. As an example, I can start screen sharing.
Hey there, Gemini. Can you tell me what you see on the screen?
>> I see a Google search for Lego bricks and pieces. There are tons of images of colorful Lego bricks, sets, and bulk collections from various websites. On the right, there's a larger image featuring close-ups of red, yellow, blue, and green bricks. Are you looking for something specific on this page?
Does anybody in the room speak a language that is not English?
>> Yeah, Guillaume, you would not be a reliable narrator. Last night we tried this, and one of our colleagues very confidently yelled out a language that they do not speak, so we had nobody to verify. But I saw a couple of hands in the front row. What languages do y'all speak?
>> Italian.
>> Italian, okay. Are you okay with verifying whether the model is saying true things in Italian? Okay, cool. Thank you for being our fact checker. Hey, Gemini, could you tell me what you said again, but in Italian? And then also, could you tell me what the weather is like in London today? Also in Italian.
It was good. Excellent, very cool. So you can ask for different languages. You can also say: "Hey, Gemini, can you tell me a poem about what you see on the screen, but using a Texan accent?"
>> Well, howdy there. Let me take a gander at this here screen. I see piles of Lego bricks, a colorful commotion, like a tiny town fixing to rise from a plastic ocean. Red and yellow, blue and green, in batches big and small, ready to build a castle tall, or maybe just a wall. Folks want to buy them, sell them, trade them from all over the place, to put a smile, I reckon, on a little builder's face. Anything else you need help with, partner?
>> That's giving me flashbacks to my hometown. But you can change all of this dynamically, just by asking naturally within the flow of conversation. So you could imagine, practically, a scenario where you have an entryway in a bank with some sort of screen; somebody comes in and starts speaking in Spanish, or in whatever language they feel most confident in, and the model dynamically responds and answers their questions in a language that's familiar to them. Or you could specify within the system instructions what language, dialect, accent, or style you want the model to adopt. So if you only want the model to respond in a specific language, a specific style, or a specific tone, I strongly recommend modifying the system instructions. And same as always, if I click "get code", you see all of the code you would need to replicate the experiment you did within the UI, including the media resolution settings and the settings for compression, all incorporated naturally. I can also do things like share video feeds. So:
Hey there, Gemini. How many fingers am I holding up?
>> You're holding up two fingers.
>> What about now?
>> That's a thumbs up.
>> Yep. Cool.
So there's a big spectrum of things you can accomplish with Gemini Live, again at a very low price point compared to other solutions that make you stitch together the speech-to-text, LLM understanding, and text-to-speech pipeline, with all of the video content inputs and outputs, all by yourself.
I'm describing AI studio I'm just like
and also and also and also you can do
all these other things. Um we have
another feature called build which if
you've played with vzero.dev or lovable
feels very similar. Um it gives you the
option to to kind of create and deploy
um and to share um a whole spectrum of
apps. And now we have even added support
for things like databases and
authentication. Um so you can add a
database, you can add um login with
Google, you can add custom API keys um
that are all kind of kept secure for
you. Um and you can also uh of course
create and edit existing apps. Um, so
you saw a little while ago uh some
examples using um using music uh um from
uh from LIA 3 which is exciting. Guilm
who created the LIIA studio app is in
the is in the back today um and is uh
like these are all really really
fascinating to play with if you haven't
had a chance to experiment with some of
the generative media models. Um you can
see some of the examples with Nano
Banana 2 as well. Um, and also with
media pipe. So, as an example, if I
click on this app, um, you can, uh, see
that it's requesting camera access. This
is a game that's taking in, um, kind of
the the location of my hand. So, I can
grab, uh, grab and kind of, uh, we can
all find out that I play this game
really badly. Um, but you can uh sort of
play um the uh the game and then also
inspect all of the code that's used to
create the app itself. Um, but for the
But for the purposes of this, I'm going to show you how to get started creating an app from scratch, based on anything you could possibly imagine. I'm going to use database and authentication, so we can add Firestore and the Google login with Firebase. And I'll click this little speech-to-text microphone we have here, since it's easier than me typing out all of the details. As an example: "Create an app that allows me to upload a picture of a bookshelf. The bookshelf should have a lot of books in profile view, so we can see all of the spines and maybe some information about the titles of the books, the authors' names, etc. But the app should use Google Search grounding to add more information, so what we should get is the author name, the title, a description of the book, and what the category of the book might be. The app should ask the user to log in with their Google login, and it should save all of that information for the user to a database, so it persists." So it's basically: you take a picture of your bookshelf and it automatically catalogs all of your books for you, which in theory would have been a startup three or four years ago.
This looks reasonably correct, so I'm going to go ahead and click build. And what's happening behind the scenes: you can see Gemini 3.1 Pro Preview kick in; it starts thinking and planning about what would be needed to create this app. Since it's doing a lot, standing up a database, thinking about authentication, it's going to take a while. And while it does, we're going to show another couple of demos, so we can let the model cook in the background. If it needs me to take any actions, there will also be a little ping we can hear in the background, just in case. So I'm going to minimize this a little bit, pull up my other browser window, and we're going to take a look at Project Genie. How many people have heard of Genie before? Yep, excellent.
All of the hands in the back row, thank you, and also a few folks here in the audience. Genie 3 is DeepMind's model for generating new worlds. You can describe a scene, describe a character, and then actively experience it, with each frame generated dynamically: no physics engine behind the scenes, no Unity, no Unreal Engine, just each frame generated dynamically, pixel by pixel. You can navigate using the WASD keys within the Genie app, change the video perspective using the arrow keys, and do things like press the space bar. It's everything from this volcanic landscape, where you're navigating with a kind of rover, to navigating a watery landscape on a jet ski. And you can see that if you hit one of these lights, it actually responds as if there were some sort of physics engine, based on its training data and other information it's seen along the way. It also sounds like AI Studio might have done something, so we'll take a look at that in just a second, too. And then you can also see things like hurricanes, and what it would be like to experience a hurricane in Florida; or jellyfish in these thermal underwater situations. Just really wild and very magical sorts of experiences, anything you could create. So let's take a look at what AI Studio is asking me for. It wants me to enable the Firebase database, and it looks like it's setting that up, so that seems good; it's on track. And I'm going to head over to Project Genie and we're going to explore and create a world, if I can sign in.
We're very big on security, for good reason. Yeah. So we have the option to create an environment and to create a character, and since I'm feeling homesick after hearing that Texas twang in the poem about Lego bricks, I'm going to say: "Big Bend National Park in Texas, in the middle of the summer, sun shining in the sky, but all of the rock formations are made out of Lego bricks, and the sky has a rainbow. A quadruple rainbow." Why not? And that, I can guarantee you, is not a situation that exists in actual Texas. "The ground is sandy and dusty." And then maybe the character is... what would be a good idea for a character? "An ostrich with a rocket blaster. And goggles. Maybe make it pink." So, a pink ostrich. Cool. I don't think Texas has ever had that, so we'll see what gets created behind the scenes.
Genie 3 is actually a composition of models, not just one model: it's Nano Banana, Veo, and Gemini to help with prompting, all stitched together, along with some really interesting approaches to distributed systems and compute. Oh my gosh, this is amazing. I immediately want a YouTube video about this guy. Also, we see some Lego brick rock formations. So, let's create this world. And then what we should be able to do is navigate through it, again using the arrow keys to change the visualization and views, and the WASD keys to move the little dude around the world.
So we've got a couple of little options for the ostriches, each one moving. You can see that it also seems to have given him very, very muscular arms; maybe it wants the ostrich to be a military-grade fighter. But you can see it walking around, navigating the Lego bricks. And if I turn around, let me see if I can find my way out of this rock formation. You can even have it investigate some of the scenes. So, we've got the rainbows, and if I'm remembering correctly, if you walk towards this canyon, there should be a river at the bottom, so we can try to make him jump into the canyon. But all of this is captured, again, just dynamically by the Genie 3 model harness itself. Oh no, I didn't make it in time. But it's really interesting to see some of the things you can build. One of our colleagues on Twitter created a game where you're a fish and you have to escape a kitchen, so you're just bouncing along as a fish, trying to get out before it's dinner time. So it's really cool to be able to see some of these things in action. Genie 3 is not currently available as an API just yet, but the team is actively thinking about a trusted tester program, and today you can access Genie 3 through an Ultra subscription, though the Ultra subscription with Genie is only available in a select number of countries. So I strongly recommend taking a look at that.
>> Yep, question.
>> My understanding is, like, very...
>> No, no, you would not be able to create the 3D game meshes or pull this ostrich dude in as an asset for a game. It is just the pixels. We have seen people couple together things like the images generated with Nano Banana and use additional techniques to turn them into 3D assets, but that does require additional work. This isn't automatically creating the 3D assets for the games themselves.
But it's a really good question. There are some other companies taking different approaches to world model building. Fei-Fei Li's company, World Labs, as an example, is taking a different approach towards building out these environments, one that incorporates more of the Unity/Unreal Engine style of asset generation. But I think, longer term, as all of the models converge on many input modalities and many output modalities, we'll probably see all of that converge as well. So I wouldn't be surprised if in the future there were an opportunity to have video as an ingested input for a model, with the 3D world, or the code for it, produced as output. Especially given that with Gemini today, you can already give it an image and say "please create an SVG of this image", and it can do it pretty well. Which actually might be a fun demo, one I've never tried before, so let's see if it actually works, and hopefully it's not just me pretending that it does. Making sure I'm still sharing my screen. Cool.
>> Well, but that benchmark has gotten saturated, right? So I'll take the Lego bricks photo that we just used and say something like: "Create an SVG representation of this image." That's a very simplistic prompt; I could probably get much better results by asking Gemini to expand upon it, as opposed to me just spitballing a really simple one. So if we don't get great results, we'll ask Gemini to rewrite our prompt to improve it.
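(A hedged sketch of that image-to-SVG request in Python; the image path and model id are placeholders.)

```python
from google import genai
from PIL import Image

client = genai.Client()

# Pass the source image alongside a text instruction; the model returns
# SVG markup as ordinary text that you can save and open in a browser.
response = client.models.generate_content(
    model="gemini-2.5-pro",  # placeholder model id
    contents=[Image.open("lego_bricks.png"),  # hypothetical local image
              "Create an SVG representation of this image."],
)

with open("lego_bricks.svg", "w") as f:
    f.write(response.text)
```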
And we can see the thinking kick in. One of my most favorite hackathon projects ever actually used Nano Banana to take an input image and then show, step by step, how you would draw it, with the different stroke marks along the way. But we can see that it's thinking through the perspective; it's defining the bricks, thinking about their dimensions, calculating a grid, and defining some colors. Since I turned the thinking level up to high for the Gemini 3.1 Pro model, it's doing an awful lot of thinking, simulating the rotations and the transformations, and we can see that happen along the way. It also sounds like AI Studio has an update for the bookshelf cataloger, so let's take a look at that while the SVG is generated.
Firebase terms accepted. Let's retry, to see what it's doing. And it looks like it was able to create some of the code: the TypeScript, the CSS, etc. I wonder if, because I started using Gemini 3.1 Pro in a different tab, maybe it got a little bit tired. But we'll see.
I also really love that you can experiment with the generative media models in AI Studio. If you were here for the earlier session, you saw Guillaume share a lot about Lyria, about our Nano Banana models, and about Veo 3.1 Light. As an example, with Nano Banana 2 you also have the option to do things like image search grounding: you can turn on image search and it will do a kind of reverse image search and bring back things that are tightly aligned with what you're asking for. So, as an example, I could add sample media for this cute little dog, and sample media for (hey there, greetings, welcome) this very nature-friendly outdoor location, and then say something like: "Show me the dog in the middle of the natural park with a can of Celsius." If you have never had Celsius, bless your heart, that seems like a great life. Celsius is a notoriously (or at least from my perspective) pretty disgusting, but very popular at hackathons, caffeinated beverage that tastes a little bit like battery acid. At least to me; I'm sure it tastes delicious to many other folks. It's also very low calorie, so it's a bit like a Red Bull alternative. But I've given it a picture of a dog and a picture of this natural scene, and I've turned on reverse image search, so it should be able to pull in details about what a Celsius can might look like. And it's thinking through the assignment... it's got my little dog in the natural scene with a can of Celsius. And if you hover over the token consumption, you can see that, in comparison to the Nano Banana pro-tier model, it's much more cost effective than previous iterations.
So if you're interested in using the Nano Banana series, Nano Banana 2 is a good one to get started with. And just as always, if you click "get code", it gives you the code you would need to replicate whatever you just did in the AI Studio UI, in TypeScript or Python or whatever it might be.
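(A hedged sketch of multi-image composition with an image-output Gemini model in Python. The model id and file names are placeholders, and the image-search-grounding toggle shown in the UI is not reproduced here; this shows plain image generation only.)

```python
from io import BytesIO
from google import genai
from PIL import Image

client = genai.Client()

# Compose two reference images with a text instruction; generated image
# parts come back as inline data on the response.
response = client.models.generate_content(
    model="gemini-2.5-flash-image",  # placeholder model id
    contents=[Image.open("dog.png"),          # hypothetical reference images
              Image.open("nature_park.png"),
              "Show the dog in the middle of the natural park "
              "holding a can of an energy drink."],
)

for part in response.candidates[0].content.parts:
    if part.inline_data:  # image bytes
        Image.open(BytesIO(part.inline_data.data)).save("dog_in_park.png")
    elif part.text:
        print(part.text)
```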
>> This is true. As always, if you change the thinking settings to minimal or low, the model will give you a response much more quickly, whereas if you ask it to think, it will spend a lot of time generating tokens for planning and for reasoning about the task you've described.
Cool. So let's go back to this SVG representation. It looks like we've got a first pass, so I'm going to copy it, go to an online SVG visualizer, and paste it in. And it looks like we've got our Lego bricks. They're a little bit distorted, but they look pretty reasonable, honestly. As a reminder, this is the picture we were trying to replicate, and it was able to get all of the different kinds of Lego bricks, just not in the right configuration. So it's really cool to see that you can pull in an image, and this was a very simple prompt; with a much more detailed prompt, you would probably be able to get a much better representation. I'm also curious whether, if I turn on code execution, it would be able to invoke code execution as a tool call to do that more effectively, so we'll see that in a second. And it does look like it was able to pull in an appropriate library to think through the process of generating an SVG for the image. It's even doing the segmentation, which is very cool.
And for folks who came in a little bit later: code execution is a tool, automatically invokable via the API, that gives Gemini a sandboxed Python environment with a whole bunch of data science libraries pre-installed, and it can invoke those as sub-tools within the environment. Awesome.
Very cool. So, we're also still building out the bookshelf visualizer; it looks like it's creating the Firebase blueprint as well as some of the rules, and if we go back to the code, we can see all of this getting generated along the way. Another thing I strongly recommend folks take a look at, if you have interest, is our video generation series. We have a new model called Veo 3.1 Light that gives you the option to create really nice stock footage backed with audio, as well as basically anything you would be using the larger-tier Veo series to do, just with a lighter model. So as an example, let's go to Gemini and ask it to help us generate a prompt. I'm going to turn thinking to low and say something like: "Create a prompt for a video generation model to generate stock footage for a vegan, basketball-themed food truck. Make sure the food options are Warriors-themed" (which is a San Francisco team), and then hit run. And then what we're going to do is take that output prompt and put it in Veo 3.1 Light.
Hit run. You can see that the output resolution is set to 720p. You have a couple of different options for output resolution, though not 4K, which is something you would need a higher-tier video generation model for. You can also specify different aspect ratios, so 16:9, or 9:16 if you want more of a mobile app experience. And you can configure the video duration, so 8 seconds, or something a little more concise like four or six seconds. This is a paid-tier model, so you would have to attach an API key in order to use it.
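(A hedged sketch of video generation through the google-genai Python SDK. The model id is a placeholder, and video generation is a long-running operation that you poll.)

```python
import time
from google import genai
from google.genai import types

client = genai.Client()

operation = client.models.generate_videos(
    model="veo-3.0-generate-001",  # placeholder model id
    prompt="Stock footage of a vegan, basketball-themed food truck at golden hour.",
    # Resolution and aspect ratio support varies by Veo model tier.
    config=types.GenerateVideosConfig(aspect_ratio="16:9", resolution="720p"),
)

# Poll the long-running operation until the video is ready.
while not operation.done:
    time.sleep(10)
    operation = client.operations.get(operation)

video = operation.response.generated_videos[0]
client.files.download(file=video.video)
video.video.save("food_truck.mp4")
```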
Another handy thing about AI Studio is that if you expand the settings off to the left, there's a section called "get API key", and if you click it, you can create one that's acceptable for free-tier use right out of the box, without having to... oh my gosh, Chef Curry. This is amazing. Chef Curry. Splash Brothers. And that does look like tofu barbacoa with kale, avocado, and edamame. Oh my gosh, I love this. No kidding, I am absolutely going to send this to somebody I know, because their dream is to start a vegan basketball food truck, as well as a custom vegan nut butter business; apparently nut butters have something like a 50 to 60% margin. So if any of us needs a hobby plan, maybe cultivating some of these culinary hobbies is a good one to take up.
Another thing I want to make sure to mention, and we talked about it a little bit: we have Ian from the Gemma team available in the back, and he'll be coming back later this afternoon to discuss as well. We just recently released our Gemma 4 series of models, which are extremely powerful; they're able to punch far above their weight in terms of parameter size and the associated compute footprint, and you can use them via the APIs in AI Studio for free. So if you want to test out the Gemma series of models, you can have this kind of try-before-you-buy experience within AI Studio before downloading them to your own infrastructure. Or, if you don't necessarily have a spare GPU at home hiding out in your closet, you can just ping them through the AI Studio interface as well. I'm going to do another prompt and pull in an example image. The Gemma 4 models also support multimodal understanding, so they can analyze audio or video or images. You can say something like "generate a brief description of this image", and turn the thinking level to minimal. And the Gemma models are pretty fast as well. So if you need a lightweight model accessible via an API that you can work with for free, or if you need a model that you can download, use on your own infrastructure, fine-tune, and run for free under an Apache 2.0 license, the Gemma 4 models are an incredible option for you to try.
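(Calling a Gemma model through the same API surface is essentially a model-id swap. A hedged sketch: the id below is a current Gemma 3 identifier standing in for whatever the Gemma 4 ids end up being, and whether a given Gemma size accepts image input depends on the variant.)

```python
from google import genai
from PIL import Image

client = genai.Client()

# Gemma models are served behind the same generate_content surface;
# multimodal inputs work the same way as with Gemini.
response = client.models.generate_content(
    model="gemma-3-27b-it",  # placeholder: current Gemma 3 id, not Gemma 4
    contents=[Image.open("example.png"),  # hypothetical local image
              "Generate a brief description of this image."],
)
print(response.text)
```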
The smallest versions also run on mobile devices, so you can have one locally downloaded to your Pixel; the next series of Pixels, like the Pixel 10, should ship with Gemma already added. And Chrome as a browser is also incorporating the Gemma models.
Cool. So we've seen the vegan Warriors food truck, we've seen Genie 3, we've seen our open model family, and some Lego bricks and pieces. It looks like the AI Studio app is still cooking a little bit. And one of the other things that was mentioned was the Lyria model, which is also available via AI Studio. So if we go to audio, you can see a couple of different models available to try via API: Lyria 3 Pro Preview and Lyria 3 Clip Preview. As an example, if I click on this one, you can see some of the automatic templates you can use with it, so acoustic folk, '90s alt rock, etc. But I really love this app that Guillaume built, which you can find in the gallery and can also remix to your heart's content, and it incorporates different sound configurations. So if we preview this one, you can see an option to create your own sound. So: a clip, maybe electronic, danceable, about a vegan basketball food truck, and Legos. And we talked about Italian; what language do you speak, sir, in the front row?
>> Oh,
>> yep.
>> Spanish.
>> Spanish, excellent. So: lyrics in Spanish. And then create. And we should see the clip start synthesizing.
That does look pretty Spanish, and we'll see what it does with "electronic". And oh my gosh, that's amazing. That is so cool. [Foreign-language lyrics play.]
>> This is an advertising agency, you know.
>> Well, so we've got a video for it, and we've got a theme song for it. Clearly our post-ASI plan is now to start a vegan food truck that's basketball-themed, with Legos. But this is our Lyria 3 model. We had a great session about generative media just before this one, led by Guillaume, so if you missed it, it should be recorded and you can watch it afterwards, and we'll also be talking a little bit about it in the workshop later this afternoon. But again, all of the code for the app is available, so you can experiment with it and test it out.
And then we'll also take a look at the... oh cool, it looks like our bookshelf cataloger is done. I'm going to go ahead and sign in with Google, so it should ask me to log in with my personal Gmail account, and we'll continue. It's signed in as me, which is great. We're going to upload a photo, so I'm going to find a bookshelf with books on it, a smaller one to make it a little bit easier. Let's go with this one; we'll see what it looks like. Yep, this one has some handwritten-style text that I want to see if the model will be able to pick up on, and you also can't really see some of the author names, so I want to see if it'll be able to figure out the book titles even though it can't see everything on the spines. I'm going to upload this photo that we just downloaded, and it shows the latest upload.
It's figuring out the book details, and it's adding all of them. So it's figured out the different kinds of books, the names of the authors, and the descriptions of the books. And if I sign out and sign back in again, it should persist. Yep: it persists all of the books I had on my shelf. And if I wanted to share it with all of y'all, I can copy the link and make it public, so anybody can access it; then, with a QR code generator... yeah, if anybody wants to try out this bookshelf app themselves, you can access it via the QR code there. Which is pretty wild, right? It's also a one-button-click deploy to Cloud Run, though in the interest of not burning up my quota too awfully much, I will refrain from doing that for this app in particular.
But those are most of the things I wanted to show, so let me go back to the slides. I hate slides; I'm pretty allergic to them. We'll see how this works.
So, we've talked about Lyria. Another place you can use the Gemini Live model, that real-time interaction model we were just playing around with, is in robotics. This is a robot called Pupper. It is completely open sourced: you can 3D print all of the parts, it runs on a Raspberry Pi, and all of the software is open source, but it's using the Gemini models behind the scenes for things like object detection and responding to its environment. You can also run Gemini Live with the Pupper, and use it to flexibly tell the robot what to do. The way to orchestrate this isn't to have Gemini Live control the robotic actions directly; you have it build the plan and then invoke models that might be local on the robot to do things like pick up specific items. So you can use Gemini to build the plan to accomplish those tasks. And then there are also things like augmented reality. Gemini Live is great at giving directions, at responding to things it sees, at describing how to do math that might be on a whiteboard, and even at enabling things like real-time transcription, so if somebody is speaking to you in Chinese, it can transcribe what they're saying into English. So lots of really cool things are possible with these multimodal systems. And with that, it feels like a good place to stop and ask for questions. I know I'm the only thing standing between all of us and lunch, so hopefully we'll get everyone to the cafeteria, or the session with the food, a little bit early. Does anybody have any questions? Did anybody learn anything new? Cool, cool. Yeah.
>> So yes: not a Codex exactly, but there is a plan to have an AI Studio app, which Logan has alluded to at least a few times on Twitter. So stay tuned; it should be interesting to see, and the team's very excited.
>> Excellent. And you can also use the Gemini APIs with all of the things you know and love, like OpenClaw. We have a colleague, Gali, who is very emotionally invested in his Telegram-plus-Gemini setup and uses it all the time to invoke workspace actions, coupled with Google Search. So, especially given the free tier for the Gemma models and for some of our Gemini models, Gemini plus OpenClaw is a good path forward.
Cool.
Excellent. Well, thank y'all for coming, and thank you all for being early as well. Hope to see you tomorrow and later this afternoon.