Code Mode: Let the Code do the Talking - Sunil Pai, Cloudflare

Channel: aiDotEngineer

Published at: 2026-04-19

YouTube video id: 8txf05vVVl4

Source: https://www.youtube.com/watch?v=8txf05vVVl4

[music]
>> Our next presenter created PartyKit, the
open-source tool for real-time
multiplayer apps. For his day job, he
builds AI agents at Cloudflare. Please
join me in welcoming to the stage Sunil
Pai.
Twenty minutes to the pub.
Hi, my name is Sunil Pai. I work at
Cloudflare. I build agents over there,
for the Agents SDK. I'm trying very hard
for this not to be a Cloudflare talk,
but I think we are on the sponsor board,
so that's nice.
This is a talk about something we call
code mode. There's some prior art to it;
we don't claim to have invented it. But
this is a talk about the implications of
something new that we're discovering.
So, you've all built AI applications,
and tool calling gets weird at scale.
When it's just a couple of tools and
very short runs, it's fine. But the
moment you start stuffing in your Google
services, your Jira, your wiki, etc.,
and you have hundreds of tools filling
up the context, it starts breaking.
Composition is weird, and there's this
back and forth that you have to do with
the model that's really slow.
We decided to take a different tack.
Instead of doing this JSON back-and-forth
thing, we asked the model to generate
code, usually JavaScript, that we could
run against an environment.
Some of the benefits seemed a little
obvious to us. With code, you get a
typed API, you can do type checking,
there are syntax errors. Models are
trained on gigabytes, if not terabytes,
of code already in the training set.
And instead of doing this back and
forth, you could write code that
executes it all in one run, just one
execution.
So this is what I mean: there are
fundamental capabilities of code. You're
able to do looping, you're able to hold
state, you're doing sequencing and
parallelization, things that you would
normally do with code anyway as an
engineer.
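Those capabilities, looping, held state, sequencing, and parallelism, can all happen inside one generated script instead of many model round trips. Here's a minimal sketch with a stubbed-out API; `listZones` and `requestCount` are invented stand-ins, not real endpoints:

```javascript
// Stubbed API standing in for whatever capabilities the sandbox exposes.
// These functions are hypothetical, purely for illustration.
const api = {
  listZones: async () => ['zone-a', 'zone-b', 'zone-c'],
  requestCount: async (zone) => zone.length * 100, // fake metric
};

// Sequencing, parallelism, loops, and state in a single execution --
// no JSON round trip back to the model between steps.
async function totalRequests() {
  const zones = await api.listZones();           // sequencing
  const counts = await Promise.all(              // parallelization
    zones.map((z) => api.requestCount(z))
  );
  let total = 0;                                 // held state
  for (const c of counts) total += c;            // looping
  return total;
}
```

The whole thing runs as one tool call; only the final number has to travel back through the model's context.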
The first place we applied this: my
colleague Matt Carey, who's actually
going to be speaking about this a little
more tomorrow, you should watch his
talk. The Cloudflare API surface is
about 2,600 API endpoints. If we exposed
a tool for every single one of them,
that's about 1.2 million tokens in your
first call. It just blows up. There's no
way to create an MCP server for the
entire Cloudflare API surface.
And he had a very clever idea: he
exposes just two tool calls, search and
execute. Both of these endpoints accept
code as an input, literally a string of
code. For search, the input to the
function that you pass to it is the
entire OpenAPI JSON spec.
And once it does that, execute gives you
a whole bunch of functions that you can
call against the endpoints you found.
It reduced that 1.2 million token thing
down to 1,000 tokens. Kind of unheard
of. I think it's like a 99.9%
reduction.
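A minimal sketch of that two-tool pattern, with a toy spec standing in for the real OpenAPI document. The names and shapes here are illustrative, not Cloudflare's actual implementation:

```javascript
// Toy stand-in for the full OpenAPI spec (the real one is huge).
const openApiSpec = {
  'GET /zones': 'List zones',
  'GET /zones/{id}/workers': 'List Workers scripts in a zone',
  'POST /zones/{id}/firewall/rules': 'Create a firewall rule',
};

// "search": the model sends a string of code; it runs with the full
// spec in scope, so only matching endpoints ever enter the context.
function searchTool(codeString) {
  const fn = new Function('spec', codeString);
  return fn(openApiSpec);
}

// The model's generated search body, as a plain string of code:
const result = searchTool(`
  return Object.entries(spec)
    .filter(([, desc]) => /workers/i.test(desc))
    .map(([endpoint]) => endpoint);
`);
```

The spec never leaves the server; the model only pays tokens for the handful of endpoints its search returned.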
This is going to be scary: I actually
have a live demo of this, and demos
don't usually go well for me on stage.
But the point being, we were able to
take a super wide API surface and make
it incredibly fast.
The prompt itself can be fairly generic.
I should have bumped up the font size on
this one. The prompt here is: as a
customer, you come in and say, "We are
getting DDoS'd. I want you to find every
offending IP that's attacking us and
block them."
In a moment of panic, when your website
is going down, you don't have the time
to do menu diving. The Cloudflare
dashboard is famously a little
cumbersome to handle, and you just want
the thing done. And you can't even get
an AE; it's like 3:00 in the morning.
With a regular MCP setup, and this isn't
even talking about stuffing 1.2 million
tokens, it would be about eight round
trips to do each of those API calls.
Instead, the model can generate this
string of code, run it immediately,
right next to the API surface, and do it
in one shot.
And it's just running JavaScript: just
functions and things that you're
exposing on the API surface.
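The kind of single-shot script the model might emit could look like this sketch, where `getTopIPs` and `blockIP` are hypothetical capability functions standing in for several real API calls:

```javascript
// Hypothetical capabilities exposed into the sandbox. The data and
// function names are invented for illustration.
const env = {
  getTopIPs: async () => [
    { ip: '203.0.113.7', requests: 90000 },
    { ip: '198.51.100.4', requests: 120 },
    { ip: '203.0.113.9', requests: 85000 },
  ],
  blocked: [],
  blockIP: async function (ip) { this.blocked.push(ip); },
};

// One generated script replaces roughly eight model round trips:
// fetch traffic, filter offenders, block them all in parallel.
async function mitigate(threshold) {
  const ips = await env.getTopIPs();
  const offenders = ips.filter((r) => r.requests > threshold);
  await Promise.all(offenders.map((r) => env.blockIP(r.ip)));
  return env.blocked;
}
```

Only the summary of what was blocked needs to flow back into the model's context.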
Okay, live demo. This is a demo of our
MCP server.
I hope I'm logged in, because if I'm
not, I'll need all of you to close your
eyes while I enter a password. Let's say
I just want to list my workers.
Oh, there it is. "List my workers." I
hit send.
Okay, no password required. Okay, fine,
that's fine. I gave it only read-only
access for this demo.
Do the thing. Yes, allow, sure,
whatever.
Nice. Okay, it comes back, and you'll
see it start executing tool calls. I
should be able to open this up. It has
sent a search saying, "Hey, find me all
API endpoints that match the words 'list
workers' or something like that."
It then runs code, which, yeah, is one
single request to the API endpoint to
get all the workers.
It must have received a whole bunch of
these.
It's actually working through JavaScript
errors now. This is going to be fun, to
see if it actually succeeds.
Yikes.
Oh, is it trying to do it per page? It's
trying to paginate through the thing.
Assume that this worked anyway, and I'll
keep talking while it does this.
Love that this is happening to me on
stage, because I did test it 10 times
before coming on. I need to pay for the
Mythos model to make this work
accurately.
>> [laughter]
>> By the way, you can actually see it:
it is actually listing workers over
here. It might just be having trouble
rendering it over there. The point
being, we are able to shrink that down.
Now, if this were a talk about
optimizing MCP servers, I would be done
and dusted. I'd say, "Hey, you should
try this, and trust me, it works when
you're not staring at it with 800 people
looking at you on the stage."
But it did give us an idea that there's
something deeper going on here.
The ability to run this code feels like
a new way of interacting with systems,
with LLMs.
Here's what I think. Everyone here is a
programmer. If I give you a problem
statement like "you have 200 photos on
your desktop; I need you to categorize
and rename them," the first thing you do
is open up an IDE. You're going to write
a little script. Maybe you're going to
pass every image to a vision model now,
because you get a nice caption for it.
Rename it, and you're done and dusted.
That is how you interact with systems.
My mother's not going to do this. Her
options are to, well, call me up, or
just that. There are going to be
lowest-common-denominator apps for photo
management, and it's $7 a month, and for
some reason you have to install a
daemon, which is stealing your crypto or
some such stuff.
And there's been this dichotomy, and
it's fine. Until now, this has been an
acceptable trade-off: non-technical
people have custom-made interfaces built
for their needs and desires.
LLMs are breaking this boundary.
Every human being on the planet now has
access to a buddy that can spit out code
that can interact with systems.
It takes a line like "rename these files
by date and location," generates code,
and can run it on whatever system you
expose to it. I say "execute it safely"
here, and that's the bit that I do want
to talk about in a minute.
The other example I have: this is
Kenton. Kenton is the creator of
Cloudflare Workers. Famously, he does
the work, and I like taking credit for
his work. This is our relationship in
the company.
So, he had a thread a little while ago
where he built a little vibe-coding
environment for himself, because no one
else does that in the world right now.
So unique. Build your own little
vibe-coding thing.
The thing he asked it to generate was a
canvas, one of these TLDraw,
Excalidraw-style canvases.
And it did: a little canvas with little
brushes and colors. And the first thing
Kenton did was draw a tic-tac-toe board
on it with a little X in the corner.
This is the finished state, and I'll get
to that in a second. He did that.
And what he told the model then is, "I
want you to play tic-tac-toe with me."
The model, as you can guess, started
generating a tic-tac-toe app.
Okay?
Kenton stopped it immediately. He's
like, "No. You have access to the entire
state of the system. And the state of
the system here is an array of strokes,
you know, just a whole bunch of points:
grid line, grid line, X stroke, etc.
Inspect that and play it with me."
Immediately, the model output the state
into its own context and said, "I
recognize what this looks like. It looks
like a tic-tac-toe board, and I can see
that you put an X in the top left. Let
me draw a perfect circle in the middle
of the board."
To be clear, there is no tic-tac-toe
code anywhere in this system.
The emergent behavior is that the model
decided, "Sure, I now know how to
interact with this system with a set of
strokes."
Also, by the way, it lost the game. And
when we saw the reasoning traces, we
noticed that Opus let Kenton win.
Which is a whole other weird area of
alignment we're not talking about.
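To make the idea concrete, here's an invented sketch of that state: the app stores only strokes, and a "move" is just appending one more stroke. None of this is Kenton's actual code; the stroke shape is made up for illustration.

```javascript
// The canvas app's entire state: an array of strokes. No game logic.
const state = {
  strokes: [
    { kind: 'line', from: [100, 0], to: [100, 300] },  // grid
    { kind: 'line', from: [200, 0], to: [200, 300] },
    { kind: 'line', from: [0, 100], to: [300, 100] },
    { kind: 'line', from: [0, 200], to: [300, 200] },
    { kind: 'x', at: [50, 50] },                       // human's move
  ],
};

// A "move" the model might emit after reading the state: append a
// circle stroke in the center cell. Tic-tac-toe exists only in the
// model's interpretation of these points.
function playCircle(strokes, center) {
  return [...strokes, { kind: 'circle', at: center, radius: 40 }];
}

const next = playCircle(state.strokes, [150, 150]);
```

The app never learned a new feature; the model inhabited the state it already had.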
Anyway, this actually generated a lot of
conversation internally, and that's why
this talk is a little weird, a little
woo-woo. I'm not even sure where we're
going, and I want to spread the idea to
you and have you folks integrate it. The
phrase we've started using is: it
stopped generating a program and instead
started inhabiting the state machine.
There's a Ghost in the Shell reference
here for anyone who's over the age of
40; you need ibuprofen, you should go
back home. But no, it was a very strange
thing for us not to have a separate
app-generation stage that you then
interact with. The interaction is the
whole thing.
So, what does this new software
architecture look like?
Everyone's building what they call a
harness, because over the last three to
six months, everyone has realized that
these coding agents are great
general-purpose computing machines. It's
why they're running Claude Code or
Codex... no, they're running Pi on a Mac
Mini, which is the wrong machine for
this, by the way. You don't have to
spend $400 for a thing that makes API
calls. It's been driving me mad; if you
check, second-hand prices of Mac Minis
have shot up. I got one before that, but
I got it because I'm special that way.
Everyone's building this harness, and
the architecture of the harness is not
just that it can generate code, but that
it has a safe space to execute this
code, into which capabilities are
exposed.
And there are some attributes to this
sandbox. We're calling it a sandbox,
which is again another completely
overloaded term; I have friends in the
industry, and everyone's building a
different kind of sandbox. We have a
Sandbox SDK, which uses containers and
VMs, but that's not even what I'm
talking about right now. There are some
capabilities to it. Unlike a container,
which comes with all sorts of features
that you surround with security, you
know, you do a bunch of things from the
outside, you start with something that
has no capabilities. The only thing it
can do is execute code. It can't do
fetches, there are no exposed APIs,
nothing.
And then you grant capabilities to it
explicitly. We have something called
dynamic workers. I told you this isn't
really a Cloudflare talk; someone else
can build something better if you think
it's better, that's fine. But this is
what we use. We use V8 isolates because
they start up really, really quickly,
and there's about 10 years of security
hardening behind them. It's in our DNA;
we care a lot about that. Anyway, you
start exposing capabilities as APIs.
And we can also control all outgoing
fetches and any network connections. In
fact, the default way we recommend you
use this is no outgoing fetches, only
APIs. It has to be fast, and you need
absolute, full observability into it.
You need to know why, last Tuesday, it
made a trade for $2.3 million for, I
don't know, man, llama poop or
something, right? You need to go back to
that code. Absolute observability on
these systems.
It can be V8 isolates like we use. You
could use, I don't know, WebAssembly, or
a custom JavaScript interpreter. That's
not the main story here. You just want
something that's able to execute, that
you're able to expose capabilities to,
and that runs really quickly.
From here, you can start getting really
ambitious. The example that I showed you
was a one-off: take some code, run it
against an API, and you're done.
Now, what if you could generate
long-running workflows that run for
days, months, years? What if each of
those instances has some state that it
can carry with it through its lifetime?
What if, in this world of generative UI,
you can start generating perfectly
custom UIs for every single user that
you have? Everyone who does e-commerce
knows this problem: the more popular you
get, the more the UI becomes this bland
thing that has to work for every single
user. And then you bring in the ML
people, and they're like, "Oh, what if
we change the button color this way for
somebody else?" No.
You can go absolutely custom. So, I like
the fact that I got Opus to generate
generative UI for a slide where I'm
making a point about generative UI, and
it still looks a little bit like...
But the idea is, let me talk about
e-commerce. You have context about
everything about the user: the things
they like, the orders they have in their
cart, the things that might be making
them mad. You can surface these things
as actions. The UI doesn't have to be a
blank chat box. Though, honestly,
blank-chat-box e-commerce might be a lot
of fun.
Here I have two different use cases. In
the first one, it's "I need to return
these shoes and find something similar
under $100." If the product engineers
have not implemented this flow, it's
going to kind of suck, but you can
generate something on the fly. Versus:
"What is happening with my delayed
order?"
Point being, we are now in a world where
we can generate completely different
programs, backed by a system that you
built on your back end, for every single
user. It's a new kind of software we're
building.
And this harness idea isn't just built
into the product. A lot of people are
finding power by running the harness
closer to the user, simply because then
they get to start mashing up all their
different services. This is an
anti-Cloudflare talk at this point; I'm
like, you should be running the software
on your iPhone, not so much on our
servers. Please run it on our servers.
But there you start getting to stitch
together different systems in this safe
environment, and you get to do it on a
task-by-task basis.
I put this in here because I'm a React
programmer, and I don't want to freak
out the React people by saying no one
really wants to build UI anymore. But
really, it's a harkening back to
rethinking everything that we have
thought about UI, for this new age. I
keep thinking about it as a part of the
tech tree we have not really explored
for 30 years, because eval wasn't
around; but now we have a safe eval, and
we have these things that generate code
for you. But you do need to be in a
place where you understand that your
next billion users are these little
robots that are generating code for you.
To be clear, your customers are still
humans. But the things interacting with
your systems are agents. If you really
love your users, you need to find out
where they hang out, and they don't hang
out in the pub; they hang out in
registries. They dream in types and
syntax errors, you know?
>> [snorts]
>> You need to be thinking about what
the developer experience for these
agents is. This is something a bunch of
companies are already doing really well,
by the way: docs which are markdown,
errors that let the agent know what to
do next, discoverability via search.
The big one that I want you to embed in
your head is this idea of
capability-based security.
This isn't even a JavaScript talk. It
can be in Python, it can be in WASM. I
hope it brings a resurgence of Lisp;
it's how I kind of learned how ASTs
work, it kind of breaks your brain. But
the attributes are still very much the
same: events, sandboxing,
capability-based security, and
embeddable, so that it's really fast to
start up and run ephemerally.
React programmers, well, UI programmers,
simply because they've been so close to
users, I suspect will do particularly
well here, and that feels really good to
me, by the way. I feel happy about it.
So, to end: for the longest time,
programmers like us, we got code. We had
infinite power to interact with any
system we could, and to complain about
it on Twitter because the documentation
doesn't have the right CSS or something.
JavaScript programmers are super
entitled, by the way. Everyone else got
buttons and forms.
That distinction is breaking. In a world
like this, you need to let the code do
the talking. The code is the thing that
interacts with all your systems. Come
talk to me about it at the pub. It feels
like this is opening up a whole new area
of research for us. We have a lot of
ideas, and I get to finish my talk, and
the day, with six seconds left. How good
is that?
Thank you very much. Appreciate it.
>> [applause and cheering]