Code Mode: Let the Code do the Talking - Sunil Pai, Cloudflare
Channel: aiDotEngineer
Published at: 2026-04-19
YouTube video id: 8txf05vVVl4
Source: https://www.youtube.com/watch?v=8txf05vVVl4
[music] >> Our next presenter created PartyKit, the open-source tool for real-time multiplayer apps. For his day job, he builds AI agents at Cloudflare. Please join me in welcoming to the stage Sunil Pai. >> Twenty minutes to the pub. Hi, my name is Sunil Pai. I work at Cloudflare, where I build agents for the Agents SDK. I'm trying very hard for this not to be a Cloudflare talk, but I think we are on the sponsor board, so that's nice. This is a talk about something we call code mode. I've been wearing the hat. There's some prior art to it; we don't claim to have invented it. This is a talk about the implications of something new that we're discovering. So, you've all built AI applications, and tool calling gets weird at scale. When it's just a couple of tools and very short runs, it's fine. But the moment you start stuffing in your Google services, your Jira, your wiki, and so on, and you have hundreds of tools filling up the context, it starts breaking. Composition is weird, and there's this back and forth you have to do with the model that's really slow. We decided to take a different tack. Instead of doing this JSON back-and-forth thing, we asked the model to generate code, usually JavaScript, that we could run against an environment. Some of the benefits seemed a little obvious to us. With code, you get a typed API, you can do type checking, there are syntax errors. Models are trained on gigabytes, if not terabytes, of code already in the training set. And instead of doing this back and forth, you can write code that executes it all in one run, just one execution. So this is what I mean: these are fundamental capabilities of code. You're able to do looping, you're able to hold state, you can do sequencing and parallelization, things that you would normally do with code anyway as an engineer.
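As a sketch of what that one-execution style can look like: everything below is illustrative, and `api` is a hypothetical stand-in capability object, not a real Cloudflare client. The point is that looping, held state, sequencing, and parallelization all happen inside one generated script instead of many model round trips.

```javascript
// A hypothetical capability object the harness injects into the sandbox.
// In a JSON tool-calling loop, each of these calls would be a separate
// model round trip; here the model emits one script that does it all.
const api = {
  listZones: async () => ["zone-a", "zone-b", "zone-c"],            // stub
  getRequestCount: async (zone) => ({ zone, requests: zone.length }) // stub
};

async function run() {
  const zones = await api.listZones();              // sequencing
  const counts = await Promise.all(                 // parallelization
    zones.map((z) => api.getRequestCount(z))
  );
  // Looping plus held state: aggregate without another model round trip.
  let total = 0;
  for (const c of counts) total += c.requests;
  return { zones: zones.length, total };
}

run().then((result) => console.log(result));
```

Only the final aggregate needs to re-enter the model's context, rather than every intermediate tool result.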
So, the first place we applied this: my colleague Matt Carey, who's actually going to be speaking about this a little more tomorrow; you should watch his talk. The Cloudflare API surface is about 2,600 API endpoints. If we exposed a tool for every single one of them, it's about 1.2 million tokens in your first call. It just blows up. There's no way to create an MCP server for the entire Cloudflare API surface. And he had a very clever idea where he exposes just two tool calls: search and execute. Both of these tools accept code as input, literally a string of code. For search, the input to the function that you pass to it is the entire OpenAPI JSON spec. And once it does that, execute gives you a whole bunch of functions that you can call against the things you searched for. And it reduced that 1.2 million token thing down to 1,000 tokens. Kind of unheard of. I think it's like a 99.9% reduction. This is going to be scary: I actually have a live demo of this, and demos don't usually go well for me on stage. But the point being, we were able to take a super wide API surface and make it incredibly fast. The prompt itself can be fairly generic. I should have kicked up the font size on this one. The prompt here is: as a customer, you come in and say, "We are getting DDoS'd. I want you to find every offending IP that's attacking us and block them." In a moment of panic, when your website is going down, you don't have the time to do menu diving. The Cloudflare dashboard is famously a little cumbersome to handle, and you just want the thing done. And you can't even get an AE; it's like 3:00 in the morning. With a regular MCP setup, and this isn't even talking about stuffing 1.2 million tokens, it would be about eight round trips to do each of those API calls. Instead, the model can generate a string of code, run it immediately right next to the API surface, and do it in one shot.
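A minimal sketch of the search-and-execute pattern, with a toy two-endpoint "spec" and a stubbed API standing in for the real 2,600-endpoint surface; every name here is hypothetical, not the actual Cloudflare implementation.

```javascript
// Hypothetical sketch of the two-tool "search and execute" pattern.
// A tiny two-endpoint "spec" stands in for the 2,600-endpoint surface.
const openApiSpec = {
  paths: {
    "/zones/{zone}/analytics/top-ips": { get: { summary: "Top IPs by request count" } },
    "/zones/{zone}/firewall/rules": { post: { summary: "Create a blocking rule" } }
  }
};

// Stubbed API surface that `execute` exposes to the model's code.
const api = {
  topIPs: async () => [
    { ip: "203.0.113.7", requests: 90000 },
    { ip: "198.51.100.2", requests: 12 }
  ],
  blockIP: async (ip) => ({ blocked: ip })
};

// Tool 1: search. The model sends a string of code; it runs with the full
// spec in scope, so only the endpoints it selects come back into context.
function search(codeString) {
  return new Function("spec", codeString)(openApiSpec);
}

// Tool 2: execute. The model's code runs right next to the API surface,
// collapsing what would be ~8 JSON round trips into one execution.
async function execute(codeString) {
  return new Function("api", `return (async () => { ${codeString} })()`)(api);
}

// Discovery step: find the relevant endpoints without loading the whole spec.
const firewallPaths = search(`
  return Object.keys(spec.paths).filter((p) => p.includes("firewall"));
`);

// "We are getting DDoS'd, block every offending IP," as one generated script.
const blockedPromise = execute(`
  const ips = await api.topIPs();
  const offenders = ips.filter((r) => r.requests > 1000);
  return Promise.all(offenders.map((r) => api.blockIP(r.ip)));
`);
blockedPromise.then((blocked) => console.log(firewallPaths, blocked));
```

The token win comes from the shape: the spec lives inside the sandbox, and only the model's short code strings and their small results cross the context boundary.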
And it's just running JavaScript. Just functions and things that you're exposing on the API surface. Okay, live demo. This is a demo of our MCP server. I hope I'm logged in, because if I'm not, I'll need all of you to close your eyes while I enter a password. Let's say I just want to list my workers. Oh, there it is. "List my workers." I hit send. Okay, no password required, that's fine. I gave it only read-only access for this demo. Do the thing. Yes, allow, sure, whatever. Nice. Okay, it comes back, and you'll see it start executing tool calls. I should be able to open this up. It has sent a search saying, "Hey, find me all API endpoints that say the words 'list workers' or something like that." It then runs code, which, yeah, is like one single request to the API endpoint to get all the workers. It must have received a whole bunch of these. It's actually going through JavaScript errors now. This is going to be fun to see if it actually succeeds. Yikes. Oh, is it trying to do it per page? It's trying to paginate through the thing. Assume that this worked anyway, and I'll keep talking while it does this. Love that this is happening to me on stage, because I did test it 10 times before coming on. I need to pay for the Mythos model to make this work accurately. >> [laughter] >> By the way, you can actually see it. It is actually listing workers over here; it might just be having trouble rendering it over here. The point being, we are able to shrink that down. Now, if this was a talk about optimizing MCP servers, I would be done and dusted. I'd say, "Hey, you should try this, and trust me, it works when you're not staring at it with 800 people looking at you on stage." But it did give us an idea that there's something deeper going on here.
The ability to run this code feels like a new way of interacting with systems with LLMs. Here's what I think. Everyone here is a programmer, and I give you a problem statement: you have 200 photos on your desktop, and I need you to categorize and rename them. The first thing you do is open up an IDE. You're going to write a little script. Maybe you're going to pass every image to a vision model now, because you get a nice caption for it. Rename it, and you're done and dusted. That is how you interact with systems. My mother's not going to do this. Her options are to, well, call me up, or just that. There are going to be lowest-common-denominator apps for photo management, and it's $7 a month, and for some reason you have to install a daemon, which is stealing your crypto or some such stuff. There's been this dichotomy, and it's fine. Until now, this has been an acceptable trade-off: non-technical people get custom-made interfaces built for their needs and desires. LLMs are breaking this boundary. Every human being on the planet now has access to a buddy that can spit out code that can interact with systems. It takes a line like "rename these files by date and location," generates code, and can run it on whatever system you expose to it. I say "execute it safely" here, and that's the bit I do want to talk about in a minute. The other example I have: this is Kenton. Kenton is the creator of Cloudflare Workers. Famously, he does the work, and I like taking credit for his work. This is our relationship in the company. So, he had a thread a little while ago where he built a little vibe coding environment for himself, because no one else does that in the world right now. So unique. Build your own little vibe coding thing.
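Going back to the 200-photos problem for a second, the little script a programmer, or a model, might write could look roughly like this. The vision model is stubbed with a fake captioner so the sketch is self-contained; a real version would call an actual vision API and `fs.rename` the files on disk.

```javascript
// Sketch of the "200 photos on your desktop" workflow, with the vision
// model replaced by a hypothetical stub captioner.
const photos = ["IMG_0001.jpg", "IMG_0002.jpg", "IMG_0003.jpg"];

// Stand-in for a vision model that captions an image.
async function caption(file) {
  const fakeCaptions = {
    "IMG_0001.jpg": "beach-sunset",
    "IMG_0002.jpg": "birthday-cake",
    "IMG_0003.jpg": "beach-volleyball"
  };
  return fakeCaptions[file];
}

async function categorizeAndRename(files) {
  const renames = {};
  for (const file of files) {
    const label = await caption(file);
    const category = label.split("-")[0];        // crude categorization
    renames[file] = `${category}/${label}.jpg`;  // e.g. beach/beach-sunset.jpg
  }
  return renames;
}

categorizeAndRename(photos).then((r) => console.log(r));
```

Trivial for a programmer; previously out of reach for everyone else, which is exactly the boundary being broken.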
The thing he asked it to generate was a canvas, one of those TLDraw, Excalidraw style canvases. And it did: a little canvas with little brushes and colors. And the first thing Kenton did was draw a tic-tac-toe board on it with a little X in the corner. This is the finished state, and I'll get to that in a second. He did that, and what he told the model then is, "I want you to play tic-tac-toe with me." The model, as you can guess, started generating a tic-tac-toe app. Okay? Kenton stopped it immediately. He's like, "No. You have access to the entire state of the system." And the state of the system here is an array of strokes, you know, just a whole bunch of points: grid line, grid line, X stroke, and so on. He said, "Inspect that and play it with me." Immediately, the model output the state into its own context and said, "I recognize what this looks like. It looks like a tic-tac-toe board, and I can see that you put an X in the top left. Let me draw a perfect circle in the middle of the app." To be clear, there is no tic-tac-toe code anywhere in this system. The emergent behavior is that the model goes, "Sure, I now know how to interact with the system with a set of strokes." Also, it lost. By the way, it lost the game, and when we saw the reasoning traces, we noticed that Opus let Kenton win, which is a whole other weird area of alignment we're not talking about. Anyway, this generated a lot of conversation internally, and that's why this talk is a little weird, a little woo-woo. I'm not even sure where we're going, and I want to spread the idea to you and have you folks integrate it. So, the phrase we've started using is: it stopped generating a program and instead started inhabiting the state machine. There's a Ghost in the Shell reference here for anyone who's over the age of 40; if you need ibuprofen, you should go back home.
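A rough sketch of what "inhabiting the state machine" might look like in code. The stroke format and the cell-mapping logic are my own invention for illustration, not Kenton's actual canvas; the point is that the only state is an array of strokes, and the move logic reads it directly.

```javascript
// There is no tic-tac-toe code in the canvas app, only strokes. This is
// the kind of code a model might generate after inspecting that state.
// The stroke shape here is hypothetical.
const strokes = [
  { kind: "line", points: [[100, 0], [100, 300]] }, // vertical grid line
  { kind: "line", points: [[200, 0], [200, 300]] }, // vertical grid line
  { kind: "line", points: [[0, 100], [300, 100]] }, // horizontal grid line
  { kind: "line", points: [[0, 200], [300, 200]] }, // horizontal grid line
  { kind: "x", at: [50, 50] }                       // X in the top-left cell
];

// Map mark strokes onto a 3x3 grid (100px cells) and pick the empty
// center for the model's O.
function nextMove(allStrokes) {
  const marks = allStrokes.filter((s) => s.kind === "x" || s.kind === "o");
  const occupied = new Set(
    marks.map((s) => `${Math.floor(s.at[0] / 100)},${Math.floor(s.at[1] / 100)}`)
  );
  if (!occupied.has("1,1")) {
    return { kind: "o", at: [150, 150] }; // draw a circle in the center cell
  }
  return null; // a fuller version would scan all nine cells
}

console.log(nextMove(strokes));
```

The "app" is emergent: the game lives in the model's interpretation of the strokes, not in any dedicated program.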
But no, it was a very strange thing for us: there's no separate app-generation stage that you then interact with. The interaction is the entire thing. So, what does this new software architecture look like? Everyone's building what they call a harness, because over the last three to six months, everyone has realized that these coding agents are great general-purpose computing machines. It's why they're running Claude Code, or they're running Pi on a Mac Mini, which is the wrong machine for this, by the way. You don't have to spend $400 for a thing that makes API calls. It's been driving me mad. If you check, the second-hand prices of Mac Minis have shot up. I got one before that, but I got it because I'm special that way. Everyone's building this harness, and the architecture of the harness is not just that it can generate code, but that it has a safe space to execute this code, into which capabilities are exposed. And there are some attributes to this sandbox. We're calling it a sandbox, which is again another completely overloaded term, and I have friends in the industry, everyone's building a different kind of sandbox. We have a Sandbox SDK, which uses containers and VMs, but that's not even what I'm talking about right now. There are some capabilities to it. Unlike a container, which comes with all sorts of features that you surround with security, you know, you do a bunch of things from the outside, you start with something that has no capabilities. The only thing it can do is execute code. It can't do fetches; there are no exposed APIs, no nothing. And then you grant capabilities to it explicitly. We have something called dynamic Workers. I told you this isn't really a Cloudflare talk; someone else can build something better if you think it's better, that's fine. But this is what we use. We use V8 isolates because they start up really, really quickly, and there's about 10 years of security hardening.
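To make the capability idea concrete, here's a deliberately tiny sketch. `new Function` stands in for a real isolate (V8 isolates, WASM, a custom interpreter) and is NOT an actual security boundary; the shape is the point: start with nothing, grant capabilities explicitly, log everything.

```javascript
// Toy capability-based sandbox. Not real security; a production version
// needs a real isolate. The generated code starts with no capabilities.
function runInSandbox(codeString, capabilities) {
  // Shadow ambient globals so the generated code can't reach them; the
  // only thing in scope is the capability object you chose to pass in.
  const fn = new Function(
    "caps", "fetch", "process", "require",
    `"use strict"; return (${codeString})(caps);`
  );
  return fn(capabilities, undefined, undefined, undefined);
}

// Grant exactly one capability: a stubbed trading API whose every call is
// recorded, so "why did it make that trade last Tuesday" is answerable.
const auditLog = [];
const caps = {
  trade: (symbol, amount) => {
    auditLog.push({ symbol, amount }); // full observability
    return { filled: true };
  }
};

const result = runInSandbox(`(caps) => caps.trade("LLAMA", 2_300_000)`, caps);
console.log(result, auditLog);

// No outgoing fetches by default: inside the sandbox, `fetch` is gone.
console.log(runInSandbox(`() => typeof fetch`, {}));
```

Contrast with a container: instead of starting with everything and fencing it off from the outside, you start with nothing and add only what each task needs.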
Security is in our DNA; we care a lot about that. Anyway, so you start exposing capabilities as APIs. And we can also control all outgoing fetches and any network connections. In fact, the default way we recommend you use this is no outgoing fetches, only APIs. It has to be fast, and you need absolute, full observability into it. You need to know why, last Tuesday, it made a trade for $2.3 million for, I don't know, llama poop or something, right? You need to be able to go back to that code. Absolute observability on these systems. It can be V8 isolates like we use. You could use, I don't know, WebAssembly, a custom JavaScript interpreter. That's not the main story here. You just want something that's able to execute, that you're able to expose capabilities to, and that runs really quickly. From here, you can start getting really ambitious. The example that I showed you was a one-off: take some code, run it against an API, done. Now, what if you could generate long-running workflows that run for days, months, years? What if each of those instances has some state that it can carry with it through its lifetime? What if, in this world of generative UI, you can start generating a perfectly custom UI for every single user that you have? Everyone who does e-commerce knows this problem. The more popular you get, the more your UI becomes this bland thing that has to work for every single user. And then you bring in the ML people and they're like, "Oh, what if we change the button color this way for somebody else?" No. You can go absolutely custom. I like the fact that I got Opus to generate generative UI for a slide where I'm making a point about generative UI, and it still looks a little off. But the idea is, let me talk about e-commerce. You have context about everything about the user: the things they like, the orders they have in their cart, the things that might be making them mad.
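A hedged sketch of the long-running-workflow idea: a generated workflow instance that carries its own state between steps. In production this state would live in durable storage; here it's just an object, and every name is made up for illustration.

```javascript
// Hypothetical generated workflow instance: state is carried across its
// lifetime, and each step receives and returns that state.
function createWorkflow(steps) {
  let state = { step: 0, history: [] }; // state carried for the lifetime
  return {
    // Run the next step. A real version would persist `state` durably so
    // the workflow can run for days, months, or years.
    async tick(input) {
      const step = steps[state.step];
      if (!step) return { done: true, state };
      state = await step(state, input);
      state.step += 1;
      state.history.push(step.name); // observability: what ran, in order
      return { done: state.step >= steps.length, state };
    }
  };
}

// A model-generated "watch my delayed order" workflow, stubbed:
const wf = createWorkflow([
  async function checkOrder(s, input) { return { ...s, order: input }; },
  async function notifyUser(s) { return { ...s, notified: true }; }
]);

wf.tick({ id: "order-42", status: "delayed" })
  .then(() => wf.tick())
  .then((r) => console.log(r.done, r.state.notified));
```

The same machinery generalizes from one-off scripts to instances that wake up, advance one step, and go back to sleep.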
You can surface these things as actions. The UI doesn't have to be a blank chat box. Though, honestly, blank-chat-box e-commerce might be a lot of fun. Here I have two different use cases. In the first one, it's "I need to return these shoes and find something similar under $100." If the product engineers have not implemented this flow, it's going to kind of suck, but you can generate something on the fly. Versus, "What is happening with my delayed order?" Point being, we are now in a world where we can generate completely different programs, backed by a system that you built on your back end, for every single user. It's a new kind of software we're building. And this harness idea isn't just built into the product. A lot of people are finding power by running the harness closer to the user, simply because then they get to start mashing up all their different services. This is an anti-Cloudflare talk at this point. I'm like, you should be running the software on your iPhone, not so much on our servers. Please run it on our servers. But there you start getting to stitch together different systems in this safe environment, and you get to do it on a task-by-task basis. I put this in here because I'm a React programmer and I don't want to freak out the React people by saying no one really wants to build UI anymore. But really, it's a harkening back to rethinking everything we have thought about UI for this new age. I keep thinking about it as a part of the tech tree we have not really explored for 30 years, because eval wasn't around; but now we have a safe eval, and we have these things that generate code for you. But you do need to be in a place where you understand that your next billion users are these little robots that are generating code for you. To be clear, your customers are still humans.
But the things interacting with your systems are agents. If you really love your users, you need to find out where they hang out, and they don't hang out in the pub; they hang out in registries. They dream in types and syntax errors, you know? >> [snorts] >> You need to be thinking about what the developer experience for these agents is. This is something a bunch of companies are already doing really well, by the way: docs which are markdown, errors that let the agent know what to do next, discoverability via search. The big one, the one I want you to embed in your head, is this idea of capability-based security. This isn't even a JavaScript talk. It can be in Python, it can be in WASM. I hope it brings a resurgence of Lisp; that's how I kind of learned how ASTs work, and it kind of breaks your brain. But the attributes are still very much the same: events, sandboxing, capability-based security, embeddable so that it's really fast to start up and run ephemerally. React programmers, well, UI programmers, simply because they've been so close to users, I suspect will do particularly well here, and that feels really good to me, by the way. I feel happy about it. So, to end: for the longest time, programmers like us, we got code. We had infinite power to interact with any system we could, and to complain about it on Twitter because the documentation doesn't have the right CSS or something. JavaScript programmers are super entitled, by the way. Everyone else got buttons and forms. That distinction is breaking. In a world like this, you need to let the code do the talking. The code is the thing that interacts with all your systems. Come talk to me about it at the pub. It feels like it's opening up a whole new area of research for us, and we have a lot of ideas. And I get to finish my talk, and the day, with six seconds left. How good is that? Thank you very much.
Appreciate it. >> [applause and cheering]