Build & deploy AI-powered apps — Paige Bailey, Google DeepMind
Channel: aiDotEngineer
Published at: 2026-04-29
YouTube video id: G_bHFmEAarM
Source: https://www.youtube.com/watch?v=G_bHFmEAarM
Can everyone hear me? >> Excellent. Awesome. Greetings, valiant few. I'm not sure how many folks have heard, but there were some electrical issues in the rest of the building, so y'all were the ones who showed up early, which means you're part of the few who get to hear the talks. If you don't feel lucky, just wait; you are definitely experiencing something special this morning. And for anybody who wants to come back and hear more, or who missed the gen media session a little earlier today, we're going to be doing a whistle-stop tour of all of the DeepMind presentations this afternoon, so you can come back, meet more of the team, and hear more about all of the talks and technologies. I also really love for sessions to be interactive, so I'm going to show you some demos; this is going to be very demo-heavy as opposed to slide-heavy. And if you have any questions along the way, please feel free to shout them out. It's always much more interesting if this is a conversation rather than just me showing stuff over and over again.

I guess introductions first. Hi, everybody. My name is Paige. I'm one of the leads for developer relations at Google DeepMind. I've been doing machine learning for a really long time: I started in 2009 and was contributing to the early days of open source scientific computing libraries, things like NumPy, SciPy, and scikit-learn. Then I did product for a couple of years and went back on the engineering ladder. I really love that it's now really hard to nitpick what's product, what's engineering, what's design; all of the roles seem to be conflated a little bit.

So I don't think it's a secret that Google has been a little bit busy over the course of the last while. Over the last month and a half, we've been releasing models so fast I feel like everybody's got a little bit of whiplash. Gemini 3.1 Flash Live, which we'll take a look at in a second. Gemini 3.1 Pro and Flash-Lite, respectively our largest and smallest models, which are performant, efficient, able to do a lot of things very quickly, and at low-cost profiles. We actually just had Augment Code, if anybody is familiar with them, replatform their entire agent system to default to Gemini 3.1 Pro, specifically for performance-plus-cost reasons. Nano Banana 2 for image generation and editing. Our embeddings model, which supports video, images, audio, text, and code all in the same embedding space, so you can say "show me all of the content related to cats" and it will surface not just video of cats, not just images, but also things like audio of a cat purring or meowing, or books about cats, all sorts of stuff. Lyria 3 for music generation, which you saw if you were in the gen media session a little while ago. Genie 3 for world model building, so being able to dynamically generate new worlds based on user input. Our full-stack runtime for AI Studio, which includes things like databases and auth. And Gemma 4, which is part of our open model family. We're lucky enough to have a member of the Gemma team here at AIE this week, so definitely: Ian, raise your hand. Yep, greetings.
So the Gemma team, if you're interested in open models, would be excellent to talk to. And then also Veo 3.1 Light, for video generation at a cost profile that's pretty compelling. So lots of different stuff. Just a show of hands: how many folks have heard of all of these models before? Excellent. And the DeepMinders in the back row, hopefully, yeah. If you haven't heard of any of these, by the end of the session you'll know all about them, and hopefully know which ones you could use or consider for your projects.

I don't think it's a secret that Gemini is kind of special in the industry. One of the reasons is that it's multimodal both for inputs and for outputs. It supports video, images, audio, text, and code as inputs, but it can also output multiple modalities: text and code, but also audio, images, and images and text interleaved. Most of the other models on the market are only capable of handling text and code as outputs, and only things like static images as inputs. So it's pretty compelling to see what you're able to do. Via our APIs you're also able to handle flexible input formats: PDFs with embedded images, different types of video, different types of audio, all of which you can serve as tokens for inference. But again, it's a lot cooler to see it than to just have me talking about it and waxing poetic.

So I'm going to go ahead and go to AI Studio real quick and pull up my personal instance; I always say it, but if you see anything embarrassing, please don't judge me. How many folks have used AI Studio before? Cool, excellent. For folks who have never used it, you can access it at ai.dev, ai.studio, or aistudio.google.com, and it works with your personal Gmail account, so you can get started for free. You can select different models here off to the right, and you can see there are different pills for the kinds of modalities you might want to work with. Things like video generation: Veo 3.1, 3.1 Fast, and 3.1 Light are all in this tier section. You can also select the different Gemini models, so Gemini 3 Flash preview and Flash-Lite preview; I'm going to select that one just in the interest of time. And you can toggle on and configure many of these different tools here off to the right. You can specify things like structured outputs; code execution, which we'll take a look at in a second; function calling; grounding with Google Search, so that's automatically incorporated as a tool; grounding with Google Maps; and even URL context, which gives you kind of a poor man's retrieval: you can provide a list of URLs and have them incorporated into the model's context window, so it can use them to ground some of its outputs.

And as I'm sure all of y'all know, models are limited by the data they have in their pre-training and post-training mixtures. If they're only trained on data up to a specific point, that's all of the insight they have out of the box.
If you want the model to be able to answer questions about things that happened after that date, you're going to have to give it access to tools, either through search or through retrieval. And again, if anybody has questions as I'm rambling along, feel free to raise your hand and shout them out; this is a small enough group that it should be pretty fun.

Cool. So I've turned on grounding with Google Search. You can also add media: you can connect to Drive, upload files, record audio, add camera footage, link a YouTube video, or link sample media. YouTube just works via URL, so you can paste in a YouTube URL and have it used for inference with the Gemini models. As an example, and I haven't tried this, so we'll see if it works, we can take a look to see if we can find a dinosaur video. I love T-Rexes. So this past weekend in the Bay Area, we had this thing called Bay Area Big Wheels; more on that in a second. It defaults to one frame per second, and you can also specify different start and end times. Just in the interest of speed, I might specify a start time of 0 seconds and an end time of maybe 300 seconds, and you can see that this ends up being around 27,600 tokens for five minutes of content. Then I could say: "Create a table with timestamps for all of the kinds of dinosaurs you see in this video. Make sure to include a fun fact about each dinosaur type." (Come on in, no worries, it's all good.) And hit run.

What I was saying about Bay Area Big Wheels is that there's a big bendy hill in San Francisco with a whole bunch of very whiplash sorts of turns, and everybody gets a little tricycle and rides down it. I did that this past weekend; it's an Easter Sunday tradition. I was dressed as a dinosaur and handing out dinosaur Easter eggs, so this is very on brand.

What's happening behind the scenes is that we've turned on grounding with Google Search, so we have search as a tool, which can help inform some of our fun facts, and we've got the video being pulled in: the first five minutes' worth of content. We can see the different dinosaur types. Rexy and his parents obviously have a lot of appearances in this first episode, as well as a brachiosaurus, a velociraptor, and a pteranodon, which is a flying reptile. I love that it's calling out the true fact that pteranodons are pterosaurs, not dinosaurs. And you can also see the different citations along the way, from the URLs that are informing all of these fun facts.

You can also click "Get code" to see all of the code you would need to replicate the experiment you just did in AI Studio. It selects the appropriate model, shows you how to handle the URI for YouTube, and gives you the prompt you can use for the video, in Python, in TypeScript, in Java, whatever your favorite language might be. And if you didn't want to use a YouTube URL, if you wanted to use your own video, you could pass that to the model too.
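For readers following along outside the recording, the "Get code" output for this YouTube-plus-search flow looks roughly like the sketch below, using the google-genai Python SDK. Treat it as a sketch under assumptions: the model string is a stand-in (the 3.1 Flash-Lite preview shown in the UI may have a different API id), the video URL is a placeholder, and the start/end offsets mirror the 0-to-300-second clipping configured above.

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    # Stand-in model id; substitute whatever "Get code" shows for the preview.
    model="gemini-flash-lite-latest",
    contents=types.Content(parts=[
        types.Part(
            # YouTube links can be passed directly as file data.
            file_data=types.FileData(file_uri="https://www.youtube.com/watch?v=PLACEHOLDER"),
            # Clip the first five minutes, matching the UI settings above.
            video_metadata=types.VideoMetadata(start_offset="0s", end_offset="300s"),
        ),
        types.Part(text=(
            "Create a table with timestamps for all of the kinds of dinosaurs "
            "you see in this video. Include a fun fact about each dinosaur type."
        )),
    ]),
    config=types.GenerateContentConfig(
        # Grounding with Google Search, as toggled on in AI Studio.
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
print(response.text)
```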
It's just really handy to be able to pull in a YouTube URL, as opposed to having to download the video and then send it off yourself. And now I also want to watch this episode of Rexy the little T-Rex; this looks very cool.

If you hadn't seen it as well: in the thinking config, you have different thinking settings for all of our Gemini 3.1 series, so minimal, low, medium, and high. If you want the model to spend more tokens thinking, you can turn on high thinking; I often just keep it on minimal or low for time's sake. For Gemini 3.1 Flash-Lite you get a really nice price-performance and speed profile, so you're not having to make big tradeoffs between them. So that is how you would interact with Gemini 3.1 Flash-Lite within AI Studio for video analysis.

One of the other slept-on features in our APIs, as well as in AI Studio in general, is compare mode, and also code execution, which we see here off to the right. If I turn on code execution, we give Gemini a sandboxed environment with Python and a whole bunch of data science libraries pre-installed, where it can pull in those libraries as tools to help solve arbitrary data science tasks. And since the model gets access to this in a sandboxed environment, you don't run the risk of having any of it impact your local environment, which is quite nice.

As an example: if I select Gemini 3.1 Flash-Lite preview, turn on code execution, and go into compare mode, I might compare it against Gemini 3 Flash preview, also with code execution. Then one of the things you can do, and we'll see if this works, is select a picture. This is just some Lego bricks. I can paste it in; we can make sure it's secure and safe for the corporate overlords. The image itself is around 1,000 tokens, but I could say something like: "Draw bounding boxes around all of the green Lego bricks using Python, and then display the image with the bounding boxes." And hit run.

What should happen is that we see a head-to-head comparison of the two different models. Gemini 3.1 Flash-Lite got it right out of the gate, which is pretty wild: this super tiny model worked really fast, wrote the Python code to pull in the image, analyze it, and define the bounding boxes. And if you hover over the token consumption, the amount of dollars required to do this work is pretty wild too. Being able to pull in an image and do this kind of analysis; you could also have asked for things like segmentation masks, or asked it to count specific kinds of entities in the photo, again using bounding boxes or something similar; and all of this was done at well under a fraction of a penny. So I strongly recommend experimenting with the smaller-weight models, especially with these tools turned on to help them do their work more effectively. And you can see that Gemini 3 Flash preview got to the same answer; it just took a little while longer, and the cost is slightly more, but still well under a penny. Cool.
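As a rough sketch of the equivalent API call, with the same hedges as before about the preview model id, and a hypothetical local lego_bricks.png standing in for the pasted image: code execution is just another tool on the request, and the response comes back as interleaved text, model-written Python, and execution results.

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

# Hypothetical local copy of the Lego photo pasted into AI Studio.
with open("lego_bricks.png", "rb") as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-flash-lite-latest",  # stand-in for the 3.1 preview
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/png"),
        "Draw bounding boxes around all of the green Lego bricks using Python, "
        "then display the image with the bounding boxes.",
    ],
    config=types.GenerateContentConfig(
        # The sandboxed Python environment described above.
        tools=[types.Tool(code_execution=types.ToolCodeExecution())],
    ),
)

# Parts interleave prose, the generated Python, and its execution output.
for part in response.candidates[0].content.parts:
    if part.text:
        print(part.text)
    if part.executable_code:
        print(part.executable_code.code)
    if part.code_execution_result:
        print(part.code_execution_result.output)
```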
So that's compare mode, again using Gemini 3.1 Flash-Lite, just with the addition of code execution along the way.

For folks who might be interested in URL context, and I know this is something we've heard quite a bit about from folks using the Gemini APIs regularly: if you turn on URL context, you can do things like add URLs. So I'm going to pull in a URL for a blog post about Gemma 4, which was released just last week, after the model's training data cutoff, and a blog post about Genie 3, also after the cutoff. And I could say something to the effect of: "Compare and contrast Genie 3 and Gemma 4. Tell me how they're similar, different, or completely unrelated." They're mostly completely unrelated, but we'll see what the model thinks. Maybe turn on medium for the thinking level, and then hit run.

What we should see is that the model gives its output but cites each one of the sources it's using to make its assessments. You can see the different sources down at the bottom: the two URLs I used, and you can use many more than just two. It cites each source inline as it makes assessments along the way. So you can use publicly available information; and there are also tools within Vertex AI that allow you to do retrieval on custom documents that are internal-only, without necessarily having to set up a vector database for retrieval. And again, if you click "Get code", it gives you all of the code you would need to replicate what you're doing in the AI Studio interface.
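A minimal sketch of the same URL-context request in Python, assuming the current google-genai SDK. The blog post URLs are placeholders rather than the exact ones used on screen, and the thinking_level knob mirrors the minimal/low/medium/high setting in the UI (the exact set of accepted values may differ by SDK version).

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-flash-latest",  # stand-in for the preview used on screen
    contents=(
        "Compare and contrast Genie 3 and Gemma 4 using these posts. "
        "Tell me how they're similar, different, or completely unrelated.\n"
        "https://example.com/genie-3-announcement\n"   # placeholder URL
        "https://example.com/gemma-4-announcement"     # placeholder URL
    ),
    config=types.GenerateContentConfig(
        # URL context: the model fetches the listed pages into its context.
        tools=[types.Tool(url_context=types.UrlContext())],
        # Hedged: mirrors the UI's thinking-level setting.
        thinking_config=types.ThinkingConfig(thinking_level="medium"),
    ),
)
print(response.text)
# Which URLs were actually retrieved and used for grounding:
print(response.candidates[0].url_context_metadata)
```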
Cool. So we've talked about the Gemini 3.1 series of models. You can also use 3.1 Pro preview, which is our largest model series; it's also our most expensive and, by virtue of being larger, a little bit slower. Gemini 3 Flash is kind of the workhorse; it's the model we see most commonly used in production. And Gemini 3.1 Flash-Lite is also gaining traction, especially for people who were using Gemini 2.0 Flash or 2.5 Flash.

And then for our Gemini Live series: how many people have seen or heard of Gemini Live before? Cool, a couple of hands. Thank you, DeepMinders, for paying attention and also for using the models. Gemini Live gives you the ability to share your screen, share a video feed, or share audio with a model and have a conversation with it dynamically, including things like adding custom function calls, generating automatic function responses, and grounding with Google Search. So I can turn on grounding with Google Search again, just as a tool the model can call, and then I can give the model access to my screen, to help me either navigate an app or answer questions about what it sees, and to do that in a broad variety of languages, accents, etc. So as an example, I can start screen sharing. "Hey there, Gemini. Can you tell me what you see on the screen?" >> I see a Google search for Lego bricks and pieces. There are tons of images of colorful Lego bricks, sets, and bulk collections from various websites. On the right, there's a larger image featuring close-ups of red, yellow, blue, and green bricks. Are you looking for something specific on this page? >> Does anybody in the room speak a language that is not English? >> Yeah, but you all would not be reliable narrators. At least, last night we tried this and one of our colleagues yelled out "yes" very confidently for a language they do not speak, and we did not have anybody to verify. But I saw a couple of hands on the front row. What languages do y'all speak? >> Italian. >> Italian, okay. Are you okay with verifying whether the model is saying true things in Italian? Okay, cool. Thank you for being our fact checker. "Hey, Gemini. Could you tell me what you said again, but in Italian? And then also, could you tell me what the weather is like in London today? Also in Italian." It was good? Excellent, very cool. So you can ask for different languages. You can also say: "Hey, Gemini, can you tell me a poem about what you see on the screen, but using a Texan accent?" >> Well, howdy there. Let me take a gander at this here screen. I see piles of Lego bricks, a colorful commotion, like a tiny town fixing to rise from a plastic ocean. Red and yellow, blue and green, in batches big and small, ready to build a castle tall or maybe just a wall. Folks want to buy them, sell them, trade them from all over the place, to put a smile, I reckon, on a little builder's face. Anything else you need help with, partner? >> That's giving me flashbacks to my hometown. You can change all of this dynamically, just by asking naturally within the flow of conversation. So you could imagine, practically, a scenario where you have an entryway in a bank with some sort of a screen; somebody comes in and starts speaking in Spanish, or whatever language they feel most confident in, and the model is able to dynamically respond and answer their questions in a language that's familiar to them. Or you could specify within system instructions what language, dialect, accent, or style you want the model to adopt. If you only want the model to respond in a specific language, style, or tone, I strongly recommend modifying the system instructions. And same as always, if I click "Get code", you see all of the code you would need to replicate the experiment you did within the UI: the media resolution settings, the settings for compression, all of that incorporated naturally. I can also do things like share video feeds. "Hey there, Gemini. How many fingers am I holding up?" >> You're holding up two fingers. >> What about now? >> That's a thumbs up. >> Yep, cool. So there's a big spectrum of things you can accomplish with Gemini Live, and again at a very low price point compared to other solutions that make you stitch together the speech-to-text, LLM understanding, and text-to-speech pipeline, with all of the video content inputs and outputs, all by yourself.
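Under the hood this is the Live API. A compressed sketch of what "Get code" produces, assuming the google-genai async client: the model id is a stand-in for the Flash Live preview shown on screen, and real screen/audio streaming (the send_realtime_input path) is elided in favor of a single text turn.

```python
import asyncio
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

# Stand-in id for the Flash Live preview shown in AI Studio.
MODEL = "gemini-2.0-flash-live-001"

config = types.LiveConnectConfig(
    response_modalities=["TEXT"],  # could be ["AUDIO"] for spoken replies
    # System instructions pin the language/accent/tone, as described above.
    system_instruction=types.Content(
        parts=[types.Part(text="Respond only in Italian.")]
    ),
    tools=[types.Tool(google_search=types.GoogleSearch())],
)

async def main() -> None:
    async with client.aio.live.connect(model=MODEL, config=config) as session:
        await session.send_client_content(
            turns=types.Content(
                role="user",
                parts=[types.Part(text="What is the weather like in London today?")],
            )
        )
        # Stream the model's reply as it arrives.
        async for message in session.receive():
            if message.text:
                print(message.text, end="")

asyncio.run(main())
```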
We have another feature (I always feel like whenever I'm describing AI Studio I'm just saying "and also, and also, and also you can do all these other things") called Build, which, if you've played with v0.dev or Lovable, feels very similar. It gives you the option to create, deploy, and share a whole spectrum of apps, and now we've added support for things like databases and authentication. So you can add a database, you can add login with Google, you can add custom API keys that are all kept secure for you, and of course you can create and edit existing apps.

You saw a little while ago some examples using music from Lyria 3, which is exciting. Guillaume, who created the Lyria studio app, is in the back today, and these are all really fascinating to play with if you haven't had a chance to experiment with some of the generative media models. You can see some of the examples with Nano Banana 2 as well, and also with MediaPipe. As an example, if I click on this app, you can see that it's requesting camera access. This is a game that takes in the location of my hand, so I can grab; and we can all find out that I play this game really badly; but you can play the game and then also inspect all of the code used to create the app itself.

For the purposes of this, I'm going to show you how you can get started creating an app from scratch, based on anything you could possibly imagine. I'm going to use database and authentication, so we can add Firestore and the Google login with Firebase. And I'll click this little speech-to-text microphone we have here, since it's easier than me typing out all of the details. As an example: "Create an app that allows me to upload a picture of a bookshelf. The bookshelf should have a lot of books in profile view, so we can see all of the spines and maybe some information about the titles of the books, the authors' names, etc. The app should use Google Search grounding to add more information, so we should get the author name, the title, a description of the book, and what the category of the book might be. The app should ask the user to log in with their Google login, it should save all of that information for the user to a database, and that should persist." So it's basically: you take a picture of your bookshelf and it automatically catalogs all of your books for you, which in theory would have been a startup three or four years ago.

This looks reasonably correct, so I'm going to go ahead and click Build. What's happening behind the scenes is that Gemini 3.1 Pro preview kicks in and starts thinking and planning about what would be needed to create this app. Since it's doing a lot, standing up a database and thinking about authentication, it's going to take a while, and while it does, we're going to show another couple of demos. We can let the model cook in the background, and if it needs me to take any actions, there will be a little ping we can hear along the way.
So I'm going to minimize this a little bit and pull up my other browser window, and we're going to take a look at Project Genie. How many people have heard of Genie before? Yep, excellent. All of the hands in the back row, thank you, and also a few folks here in the audience.

Genie 3 is DeepMind's model for generating new worlds. You can describe a scene, describe a character, and then actively experience it, with each frame generated dynamically. No physics engine behind the scenes, no Unity, no Unreal Engine: each frame is generated dynamically, pixel by pixel. You can navigate using the WASD keys within the Genie app, change the video perspective using the arrow keys, and do things like hit the space bar. It's everything from this volcanic landscape that you're navigating with a rover, to navigating a watery landscape on a jet ski. And you can see that if you hit one of these lights, it actually responds as if there were some sort of a physics engine, based on its training data and other information it's seen along the way.

It also sounds like AI Studio might have done something, so we'll take a look at that in just a second too. You can also see things like hurricanes, and what it would be like to experience a hurricane in Florida; or jellyfish in these thermal underwater situations. Just really wild and very magical sorts of experiences; anything you could create.

So let's take a look at what AI Studio is asking me for. It wants me to enable the Firebase database, and it looks like it's setting that up, so that seems good; it's on track. I'm going to head over to Project Genie and we're going to explore and create a world, if I can sign in. We're very big on security, for good reason.

So we have the option to create an environment and to create a character, and since I'm feeling homesick after hearing that Texas twang in the poem about Lego bricks, I'm going to say: "Big Bend National Park in Texas, in the middle of the summer. Sun shining in the sky, but all of the rock formations are made out of Lego bricks, and the sky has a rainbow. A quadruple rainbow." Why not? That, I can guarantee you, is not a situation that exists in actual Texas. "The ground is sandy and dusty." And then maybe the character is, what would be a good idea for a character: "an ostrich with a rocket blaster, and goggles. Maybe make it pink."

Cool. I don't think Texas has ever had that, so we'll see what gets created behind the scenes. Genie 3 is actually a composition of models, not just one model: it's Nano Banana, Veo, and Gemini to help with prompting, all stitched together, along with some really interesting approaches to distributed systems and compute. Oh my gosh, this is amazing. I immediately want a YouTube video about this guy. Also, we see some Lego brick rock formations. So let's create this world, and then we should be able to navigate through it.
The arrow keys change the visualization and views, and the WASD keys navigate the little dude around the world. So we've got a couple of little options for the ostriches, each one moving. You can see that it also seems to have given him very muscular arms; maybe it wants the ostrich to be a military-grade fighter. But you can see it walking around, navigating the Lego bricks. And if I turn around, let me see if I can find my way out of this rock formation; you can even have it investigate some of the scenes. We've got the rainbows, and if I'm remembering correctly, if you walk towards this canyon, there should be a river at the bottom, so we can try to make him jump into the canyon. All of this is captured dynamically by the Genie 3 model harness itself. Come on... oh no, I didn't make it in time.

It's really interesting to see some of the things you can build. One of our colleagues on Twitter created a game where you're a fish and you have to escape a kitchen, and you're just bouncing along as a fish, trying to get out before it's dinner time. So really cool to see some of these things in action. Genie 3 is not currently available as an API just yet, but the team is actively thinking about a trusted tester program, and today you can access Genie 3 through an Ultra subscription, though the Ultra subscription with Genie is only available in a select number of countries. So I strongly recommend taking a look at that.

>> Yep, question. >> My understanding is [inaudible] >> No, no, you would not be able to create the 3D game meshes or pull this ostrich dude in as an asset for a game. It is just the pixels. We have seen people couple together things like the images that are generated with Nano Banana and use additional techniques to turn them into 3D assets, but that does require additional work; this isn't automatically creating the 3D assets for the games themselves. But it's a really good question. There are some other companies taking different approaches to world model building. Fei-Fei Li's company, World Labs, as an example, is taking a different approach to building out these environments, one that does incorporate more of the Unity and Unreal Engine style of asset generation. But longer term, as all of the models seem to converge on many input modalities and many output modalities, we'll probably see all of that converge as well. I wouldn't be surprised if in the future there were an opportunity to have video as an ingested thing for a model, and then a 3D world, or the code for it, produced externally. Yeah, especially given that with Gemini today, you can already give it an image and say "please create an SVG of this image," and it can do it pretty well. Which actually might be a fun demo, one I've never tried before, so let's see if it actually works, and hopefully it's not just me pretending that it does.
So what you can do is, making sure I'm still sharing my screen, cool. >> [Audience comment] >> Well, but that benchmark has gotten saturated, right? So I'll take the Lego bricks photo we just used and say something like: "Create an SVG representation of this image." That's a very simplistic prompt; I could probably get much better results by asking Gemini to expand upon it, as opposed to me just spitballing a really simple one. So if we don't get great results, we'll ask Gemini to rewrite our prompt to improve it.

And we can see the thinking kick in. One of my most favorite hackathon projects ever actually used Nano Banana to take an input image and then show, step by step, how you would draw it, with the different stroke marks along the way. But we can see that it's thinking through the perspective, it's defining the bricks, it's thinking about the dimensions of the bricks, it's calculating a grid, and it's defining some colors. Since I turned the thinking level up to high for the Gemini 3.1 Pro model, it's doing an awful lot of thinking about simulating the rotations and the transformations, and we can see that happen along the way.

It also sounds like AI Studio has an update for the bookshelf cataloger, so let's take a look at that while the SVG is generated. Firebase terms accepted; let's retry to see what it's doing. It looks like it was able to create some code for the TypeScript, the CSS, etc. I wonder if, because I started using Gemini 3.1 Pro in a different tab, maybe it got a little bit tired; we'll see.

I also really love that you can experiment with the generative media models in AI Studio. If you were here for the earlier session, you saw Guillaume share a lot about Lyria, about our Nano Banana models, and about Veo 3.1 Light. As an example, with Nano Banana 2 you also have the option to do things like image search grounding: you can turn on image search, and it will kind of reverse-image-search and bring back things that are tightly aligned with what you're asking for. So as an example, I could add sample media for this cute little dog, and sample media for, let's see (hey there, greetings, welcome), this very nature-friendly location, and then say something like: "Show me the dog in the middle of the natural park with a can of Celsius."

If you have never had Celsius, bless your heart; that seems like a great life. Celsius is a notoriously disgusting, or at least from my perspective pretty disgusting (I'm sure it tastes delicious to many other folks), but very popular at hackathons, caffeinated beverage that tastes a little bit like battery acid to me. It's also very low calorie, so it's a bit of a Red Bull alternative. But I've given the model a picture of a dog and a picture of this natural scene, and I've turned on reverse image search, so it should be able to pull in details about what a Celsius can might look like. And it's thinking through the assignment: it's got my little dog in the natural scene with a can of Celsius.
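A hedged sketch of that image-composition call with the Python SDK: the model id is a stand-in for Nano Banana 2 (a currently shipping Nano Banana id is used here), the two local files are hypothetical stand-ins for the sample media picked in the UI, and the image-search-grounding toggle is left out since its API surface isn't shown in the talk.

```python
from io import BytesIO
from google import genai
from PIL import Image

client = genai.Client(api_key="YOUR_API_KEY")

# Hypothetical local stand-ins for the two sample-media picks.
dog = Image.open("dog.jpg")
park = Image.open("park.jpg")

response = client.models.generate_content(
    model="gemini-2.5-flash-image",  # stand-in for Nano Banana 2
    contents=[
        dog,
        park,
        "Show me the dog in the middle of the natural park with a can of Celsius.",
    ],
)

# Image models return the result as inline image bytes among the parts.
for part in response.candidates[0].content.parts:
    if part.inline_data:
        Image.open(BytesIO(part.inline_data.data)).save("dog_with_celsius.png")
    elif part.text:
        print(part.text)
```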
And if you hover over the token consumption, you can see that in comparison to the Nano Banana pro-tier model, it's much more cost-effective than previous iterations. So if you're interested in using the Nano Banana series, Nano Banana 2 is a good one to get started with. And as always, if you click "Get code", it gives you the code you would need to replicate whatever you just did in the AI Studio UI, in TypeScript or Python or whatever it might be.

>> [Audience question] >> This is true. As always, if you change the thinking settings to minimal or low, the model will give you a response much more quickly, whereas if you ask it to think, it will spend a lot of time generating tokens for planning and for reasoning about the task you've described.

Cool. And so let's go back to this SVG representation. It looks like we've got a first pass, so I'm going to copy it, go to an SVG visualizer, just an online one, and paste it in. And it looks like we've got our Lego bricks. They're a little bit distorted, but they look pretty reasonable, honestly. As a reminder, the picture we were trying to replicate is this one; it was able to get all of the different kinds of Lego bricks, just not in the right configuration. So it's really cool to see that you can pull in an image, and this was a very simple prompt; with a much more detailed prompt, you would probably get a much better representation. I'm also curious, if I turn on code execution, whether it would be able to invoke code execution as a tool call to do this more effectively. We'll see that in a second. And it does look like it was able to pull in an appropriate library to think through the process of generating an SVG for the image; it's even doing the segmentation. This is very cool. For folks who came in a little bit later: code execution is a tool, automatically invokable via the API, that gives Gemini a sandboxed Python environment with a whole bunch of data science libraries pre-installed, and it can invoke those as subtools within the environment.

Awesome. Very cool. We're also still building out the bookshelf visualizer; it looks like it's creating the Firebase blueprint as well as some of the rules, and if we go back to the code, we can see all of this getting generated along the way.

Another thing I strongly recommend folks take a look at, if you have interest, is our video generation series. We have a new model called Veo 3.1 Light that gives you the option to create really nice stock footage backed with audio, as well as basically anything you would be using the larger-tier Veo series to do. So as an example, let's go to Gemini and ask it to help us generate a prompt. I'm going to turn thinking to low and say something like: "Create a prompt for a video generation model
to generate stock footage for a vegan, basketball-themed food truck. Make sure the food options are Warriors-themed" (which is a San Francisco team), and then hit run. Then we'll take this output prompt, put it in Veo 3.1 Light, and hit run. You can see that the output resolution is set to 720p; you have a couple of different options for output resolution, though not 4K, which is something you would need a higher-tier video generation model for. You can also specify different aspect ratios, so 16:9, or 9:16 if you want more of a mobile-app experience. And you can configure the video duration: eight seconds, or something a little more concise like four or six seconds. This is a paid-tier model, so you would have to attach an API key to use it. Another handy thing about AI Studio is that if you expand the settings off to the left, there's a section called "Get API key", and if you click it, you can create one that's acceptable for free-tier use right out of the box, without having to... oh my gosh, Chef Curry. This is amazing. Chef Curry. Splash Brothers. And that does look like tofu barbacoa, with kale and avocado and edamame. Oh my gosh, I love this. No kidding. I am absolutely going to send this to somebody I know, because their dream is to start a vegan basketball food truck, as well as a custom vegan nut butter business; apparently nut butters have a 50 to 60% margin. So if any of us needs a hobby plan, maybe cultivating some of these culinary hobbies is a good one to take.
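For reference, video generation is a long-running operation in the API rather than a single call. A sketch under the usual assumptions: a stand-in model id for Veo 3.1 Light, a paid-tier key, and config fields mirroring the 720p / 16:9 / duration knobs in the UI (exact field support varies by Veo version).

```python
import time
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_PAID_TIER_API_KEY")

operation = client.models.generate_videos(
    model="veo-3.1-generate-preview",  # stand-in for the Veo 3.1 Light id
    prompt=(
        "Stock footage for a vegan, basketball-themed food truck with "
        "Warriors-themed menu items."  # paste Gemini's expanded prompt here
    ),
    config=types.GenerateVideosConfig(
        aspect_ratio="16:9",   # or "9:16" for a mobile-style clip
        resolution="720p",     # no 4K option at this tier
        duration_seconds=8,    # 4 or 6 for something more concise
    ),
)

# Poll the long-running operation until the video is ready.
while not operation.done:
    time.sleep(10)
    operation = client.operations.get(operation)

video = operation.response.generated_videos[0]
client.files.download(file=video.video)
video.video.save("food_truck.mp4")
```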
Another thing I want to make sure to mention, and we talked about it a little bit, and we have Ian from the Gemma team available in the back (he'll be coming back later this afternoon to discuss as well): we just recently released our Gemma 4 series of models, which are extremely powerful. They're able to punch far above their weight in terms of parameter size and the associated compute footprint, and you can use them via the APIs in AI Studio for free. So if you want to test out the Gemma series of models, you can have this try-before-you-buy experience within AI Studio before downloading them to your own infrastructure. Or if you don't necessarily have a spare GPU at home hiding in your closet, you can just ping them through the AI Studio interface. So I'm going to do another prompt and pull in an example image. The Gemma 4 models also support multimodal understanding, so they can analyze audio or video or images. You can say something like "generate a brief description of this image," turn the thinking level to minimal, and the Gemma models are pretty fast as well.

So if you need a lighter-weight model, accessible via an API, that you can work with for free, or if you need a model you can download, use on your own infrastructure, fine-tune, and run for free with an Apache 2 license, the Gemma 4 models are an incredible option to try. They also run on mobile devices in the smallest versions, so you can have one locally downloaded to your Pixel; the next series of Pixels, the Pixel 10, should have Gemma already added to it; and Chrome as a browser is also incorporating the Gemma models.
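Calling Gemma through the same API is nearly a one-liner. A sketch assuming a currently available Gemma checkpoint id (the Gemma 4 ids from the talk aren't confirmed here) and a hypothetical local image:

```python
from google import genai
from PIL import Image

# A free-tier API key from AI Studio is enough for the Gemma models.
client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemma-3-27b-it",  # stand-in for a Gemma 4 checkpoint
    contents=[
        Image.open("example.jpg"),  # hypothetical local image
        "Generate a brief description of this image.",
    ],
)
print(response.text)
```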
Cool. So we've seen the vegan Warriors food truck, we've seen Genie 3, we've seen our open model family, and some Lego bricks and pieces. It looks like the AI Studio app is still cooking a little bit. One of the other things that was mentioned was the Lyria model, which is also available via AI Studio. If we go to Audio, you can see a couple of different models available to try via API: Lyria 3 Pro preview and Lyria 3 Clip preview. As an example, if I click on this guy, you can see some of the automatic templates you can use with it, so acoustic folk, '90s alt rock, etc. But I really love this app that Guillaume built, which you can find in the gallery and remix to your heart's content; it incorporates different sound configurations. If we preview this guy, you can see an option to create your own sound. So: a clip, maybe electronic, danceable, vegan basketball food truck, and Legos. And we talked about Italian; what language do you speak, sir, in the front row? >> Spanish. >> Spanish, excellent. So, lyrics in Spanish, and then create. And we should see the clip start synthesizing. That does sound pretty Spanish, and we'll see what it means for "electronic." Oh my gosh, that's amazing. That is so cool. This is... >> An advertising agency. >> Well, we've got a video for it and we've got a theme song for it, so clearly this is something we should all do: our post-ASI plan is now that we're going to start a vegan food truck that's basketball-themed, with Legos. But this is our Lyria 3 model. We had a great session about generative media just before this one, led by Guillaume; if you missed it, it should be recorded, and you can watch it afterwards. We'll also be talking a little bit about it in the workshop later this afternoon. And again, all of the code for the app is available, so you can experiment with it and test it out.

Oh, cool. It looks like our bookshelf cataloger is done. I'm going to go ahead and sign in with Google, so it should ask me to log in with my personal Gmail account. We're going to continue; it's signed in as me, which is great. We're going to upload a photo, so I'm going to find a bookshelf with books on it; a smaller one, to make it a little easier. Let's go with this one. It has some handwritten-style text that I want to see if the model will be able to pick up on, and also you can't really see some of the author names, so I want to see if it'll be able to figure out what the book titles are, even though it can't see everything on the spines. I'm going to upload the photo we just downloaded. It shows the latest upload, it's figuring out the book details, and it's adding all of them. So it's figured out the different kinds of books, the names of the authors, and the descriptions of the books. And if I sign out and sign back in again, it should persist. Yep: it persists all of the books I had on my shelf. And if I wanted to share it with all of y'all, I can copy the link, make it public so anybody can access it, and use a QR code generator. So if anybody wants to try out this bookshelf app themselves, you can access it via the QR code there. Which is pretty wild, right? It's also a one-button-click deploy to Cloud Run, though in the interest of not burning up my quota too awfully much, I will refrain from doing that for this app in particular.

Those are most of the things I wanted to show, so let me go back to the slides again. I hate slides; I'm pretty allergic to them. We'll see how this works. So we've talked about Lyria. Another place you can use the Gemini Live model, that real-time interaction model we were just playing around with, is in robotics. This is a robot called Pupper. It's completely open-sourced: you can 3D print all of the parts, it's running a Raspberry Pi, and all of the software is open-sourced, but it's using the Gemini models behind the scenes for things like object detection and responding to its environment. You can also run Gemini Live with the Pupper and use it to flexibly tell the robot what to do. The way to orchestrate this isn't to have Gemini Live control the robotic actions directly; you have it build the plan and then invoke models that might be local on the robot to do things like pick up specific items. So you can use Gemini to build the plan to accomplish those tasks. And then there are also things like augmented reality. Gemini Live is great at giving directions, at responding to things that it sees, at describing how to do math that might be on a whiteboard, and even enabling things like real-time transcription: if somebody's speaking to you in Chinese, it can transcribe into English what the person is saying. So lots of really cool things are possible with these multimodal systems.

And with that, it feels like a good place to stop and ask for questions. I know I'm the only thing standing between all of us and lunch, so hopefully we'll get to the cafeteria, or the session with the food, a little early. Does anybody have any questions? Did anybody learn anything new? Cool. Yeah. >> [Audience question] >> So, yes: not so much a Codex, but there is a plan to have an AI Studio app, which Logan has alluded to at least a few times on Twitter. So stay tuned; it should be interesting to see, and the team's very excited. >> Excellent.
And you can also use the Gemini APIs with all of the things you know and love, like OpenClaw. We have a colleague, Gali, who is very emotionally invested in his Telegram-plus-Gemini setup and uses it all the time to invoke workspace actions, coupled with Google Search. So, especially given the free tier for the Gemma models and for some of our Gemini models, Gemini plus OpenClaw is a good path forward. Cool. Excellent. Well, thank y'all for coming, thank you for being early as well, and I hope to see you tomorrow and later this afternoon.