MCP = Mega Context Problem - Matt Carey

Channel: aiDotEngineer

Published at: 2026-04-25

YouTube video id: YBYUvGOuotE

Source: https://www.youtube.com/watch?v=YBYUvGOuotE

[music]
>> Hello everyone. Welcome. Quiet down, quiet down. All right. Awesome. How is everyone? Yeah, good?
>> [applause]
>> Ooh, thanks. Want to hear some MCP versus CLI debates? Yeah, is that why you all came? Anyway, hello, my name is Matt. I work on MCP and agents at Cloudflare, and welcome to my talk. It's all about how we can make every API a tool for agents. APIs exist in the wild. How can we connect them to agents and make them do things?
So, I really love my job, because every day I get to decide: if an agent looked like this, would he do this, or would he do this? And I think it's kind of fun. We often fluctuate between the two. Someone does something you think is slightly funny, and then six months later we're all doing it and claiming it was the best thing in the world. So yeah, it's really good craic. But the main part of the role, I guess, and what I end up doing day-to-day, is: how do we give agents hands? How do we let them interact with the outside world? And you're probably familiar with something like this.
This is tool calling, function calling. It's been around for a while now. The LLM writes a function call, you execute the function. Bash bash bash, the weather in London is 18°. It's not, it's like eight and it's freezing. Sad times.
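That loop (the LLM emits a function call, the host executes it and hands the result back) can be sketched in a few lines of TypeScript. Everything here, the `get_weather` tool and the stubbed model, is illustrative rather than any real API:

```typescript
// A tool is a name, a description the model sees, and an executor.
type Tool = {
  name: string;
  description: string;
  execute: (args: Record<string, unknown>) => string;
};

const tools: Record<string, Tool> = {
  get_weather: {
    name: "get_weather",
    description: "Get the current weather for a city. Args: { city: string }",
    execute: (args) => `Weather in ${args.city}: 8°C and freezing`,
  },
};

// The model replies with either plain text or a request to call a tool.
type ModelReply =
  | { type: "text"; text: string }
  | { type: "tool_call"; name: string; args: Record<string, unknown> };

// Stand-in for a real LLM call: the model has the tool descriptions in
// context and decides whether to emit a tool call.
function fakeModel(prompt: string): ModelReply {
  if (prompt.includes("weather")) {
    return { type: "tool_call", name: "get_weather", args: { city: "London" } };
  }
  return { type: "text", text: "Hello!" };
}

// One turn of the loop: if the model asked for a tool, execute it and
// return the result (a real agent would feed it back into context).
function agentTurn(prompt: string): string {
  const reply = fakeModel(prompt);
  if (reply.type === "tool_call") {
    return tools[reply.name].execute(reply.args);
  }
  return reply.text;
}

console.log(agentTurn("what's the weather in London?"));
```

The key point for what follows: every tool's name and description has to sit in the model's context before the model can call it.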
And then from there, we went from bundled tools to something like shared tools. People made tools in their agents. This is all recent history, so everyone's probably aware of it, but before MCP, people would bundle all their tools in their agents and keep them bundled there. If I was trying to interact with Gmail or something, I would make loads of tools for Gmail, bundle them with my agents, and that would be it. And the next person would have to do exactly the same thing. Then we ended up with this big explosion of MCP and remote MCP around April last year. The service providers were like, "Oh, we can give everyone MCP tools, and then everyone can use the same standardized tools. We make them once and provide them as another surface for people to consume our API. Maybe there's a CLI, there's an API, maybe there's a GraphQL API, and now there's MCP as another surface." But this got a little bit fun, because it was okay with like eight tools.
But then, what happens if you add a few more? Or a few more? Or a few more? Or a few more? Or a few more? And now you're like, "I want to give an agent access to our whole API surface." Well, that ain't going to happen. Why not? You've exploded the context window of the agent. You've completely annihilated it. This is 1.something million tokens. And this was the problem we came across, well, around a year ago now. We were trying to give agents access to the whole of the Cloudflare API. You try to make naive tools covering every single API endpoint, and you fully explode a context window. Our OpenAPI spec is 2.3 million tokens. Turned into tools, that's something like 1.1 million tokens. And that's never going to fly, even with the biggest foundation models.
And at that time, we were like, "We know this is not necessarily an MCP problem, but it's how everyone else is doing it. So we're going to adapt. We're going to improvise." And we split up our API into lots of different product-based MCP servers. You've probably seen this: a company publishes 16 MCP servers, and users have to interact with the one they want, when they want to use it. There's much less context, but the user has to select. And most of the time there's incomplete coverage. For instance, for one of our product suites, we might have six tools in our MCP server, but the total API maybe has 30 endpoints. You've completely missed some coverage there. And this is not fulfilling the goal of making every API a tool for agents. It's actually kind of annoying.
So I think we all did it a little bit wrong. At Cloudflare, we had 16 servers very, very quickly. We were hovering around two and a half thousand endpoints; I think we're actually at like 2,600 API endpoints now. But we basically couldn't split all of these up across all of our servers, and users had to pick the ones they wanted. What we really needed was progressive discovery of tools. Who's heard of progressive discovery? Anyone? Yeah, cool.
And that brings us to the crux of the debates that everyone has online: how do we do progressive discovery? Is MCP dead? Was MCP a really bad idea? And I'm going to say, I don't think it was. MCP is a protocol. All of these approaches can be exposed over MCP. We just shouldn't be dumping loads of tools into context. That's the main thing: we shouldn't be dumping tools in context. And that goes for all capabilities. In the future we might use prompts and resources more; skills are basically resources. We just shouldn't be loading all of those at once. So there are roughly three ways you can get around that problem. There's a CLI, which people really like. There's tool search. Or there's a third one that we're going to come to a little bit later.
But how would a CLI work for agents? So, this is a sandbox in the background. If I use our CLI and just call Wrangler, we get a bunch of commands. The agent can read these commands, parse these commands, and be like, "Oh, I want to interact with the database. Let's do wrangler d1." And maybe we want to list our databases. Then, after some period of time, and some interactive process apparently, we get the databases I have on my account. And an agent can do this, mostly. It can call `--help` to get introspection on which parameters it needs, and this mostly works. It mostly works. It's very popular with things like OpenClaw, and people generally really like CLIs. But you need shell access. That's the main thing; that's the crux of it. You have to have shell access, and that's kind of annoying.
So, for things like Claude Code, they wanted a bit more of a structured way of doing things. So they have tool search: a search tool which loads the tools they need, when they need them, into context. Say I want to create a worker. What it would do is take the user's question, do some sort of keyword matching, and then add k = 8 tools to context. At some point the LLM is going to look at them: "Oh, actually, workers create, this is the one we need." And so we're going to use that one. But the rest of them stay in context. Maybe it's not eight, maybe it's six, it changes, but you end up with something like 2,100 tokens and only 500 of them being used. But it works. It works quite well. You only load the tools that are relevant.
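A minimal sketch of that kind of search tool: naive keyword matching over a tool catalogue, with only the top-k hits loaded into context. The catalogue entries and the scoring here are made up for illustration, not Claude Code's actual implementation:

```typescript
type ToolDef = { name: string; description: string };

// A catalogue far too big to put in context all at once.
const catalogue: ToolDef[] = [
  { name: "workers_create", description: "Create a new worker script" },
  { name: "workers_list", description: "List worker scripts on the account" },
  { name: "d1_query", description: "Run a SQL query against a D1 database" },
  { name: "dns_list_records", description: "List DNS records for a zone" },
  // ...imagine ~2,600 of these
];

// Naive keyword match: score each tool by how many query words appear
// in its name or description, and return the top k.
function searchTools(query: string, k: number): ToolDef[] {
  const words = query.toLowerCase().split(/\W+/).filter(Boolean);
  return catalogue
    .map((tool) => {
      const haystack = (tool.name + " " + tool.description).toLowerCase();
      const score = words.filter((w) => haystack.includes(w)).length;
      return { tool, score };
    })
    .filter((s) => s.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((s) => s.tool);
}

// Only these k tool definitions get loaded into the model's context.
const hits = searchTools("create a worker", 2);
console.log(hits.map((t) => t.name));
```

Context cost is now proportional to k, not to the size of the whole catalogue, which is the point of the approach.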
And then this last thing is a blog post that Cloudflare published last summer. It asks: instead of a static search tool, and instead of forcing an agent to use a CLI, how can we just let the agent write code? We let the agent write code against our API. And it turns out that TypeScript, well, types, are a very concise way of representing inputs and outputs, in a way that an agent can reason about. So say you have all of these endpoints: get worker scripts, create a worker, something like that. We generate these types, and then we let the model, given these types, write some code against them. So here we're doing code mode list workers. I hope you can see that. We're going to try and list some workers. This might be a user request to list workers. The model generates this code against a typed SDK that we generate from our API; you can generate them from OpenAPI specs. Then we can run that, and we can list the workers we have on our account.
We could deploy a worker. That would be fun. Hello world. And we could put it behind one of the hardest things to do at Cloudflare, which is so weird, because it's such a powerful product: we can add Access, which is our managed IdP. And now this worker is secure behind Access, with an access policy to only allow me into it, and all of this sort of good stuff. Super, super easy. And an agent can generate all of this code, given our types. So this feels like a step in the right direction. We just let the model write code. We benefit from the model getting better. We benefit from, I don't know, improving our OpenAPI spec. That should be the source of truth.
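The shape of code mode can be sketched like this: a concise typed surface (the kind you can generate from an OpenAPI spec) plus a small program the model writes against it. The `WorkersApi` interface and its stub implementation are hypothetical stand-ins, not Cloudflare's actual generated SDK:

```typescript
// Types like these can be generated from an OpenAPI spec. The model
// never sees the full spec, only this concise typed surface.
type Worker = { id: string; name: string };

interface WorkersApi {
  list(): Promise<Worker[]>;
  create(input: { name: string; script: string }): Promise<Worker>;
}

// Stub implementation standing in for real HTTP calls to the API.
const workers: WorkersApi = {
  async list() {
    return [{ id: "w1", name: "hello-world" }];
  },
  async create(input) {
    return { id: "w2", name: input.name };
  },
};

// This is the kind of code the model writes, given only the types:
// several API calls composed into one small, type-checkable program.
async function deployIfMissing(name: string): Promise<Worker> {
  const existing = await workers.list();
  const found = existing.find((w) => w.name === name);
  if (found) return found;
  return workers.create({
    name,
    script: "export default { fetch: () => new Response('hi') }",
  });
}

deployIfMissing("hello-world").then((w) => console.log(w.id));
```

Note what a single generated program buys you: branching and composition across endpoints, which would otherwise take several round trips of individual tool calls.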
But we had this kind of weird thing where we thought this was awesome, and we were pretty stoked about it, but the clients didn't implement it. And when I say clients, I've switched to MCP terms now: the client is the agent, so we'll be referring to the agent as a client from now on. The clients didn't really implement it. And we were a little bit confused about why. This was sort of eight, nine months ago now, and it's a better way of interacting with APIs, just letting the model write code against the API, but they didn't implement it. And why not?
And that's because running untrusted code is mega, mega scary. If I had said to you a few years ago, "Oh, we're just going to let a language model write some code that we're going to execute for our users, without looking at it, without reading it, without seeing what it does, and it might potentially have secrets access. Ideally, it has some secrets access." You'd be like, "That's crazy. That's a CVE, right?" It's a CVE. It's a vulnerability. That's a problem. And now we're proposing you do this. So it is quite scary. Loads of things can go wrong. It could read a file system, read some secrets you don't want it to read. It could exfiltrate those secrets in a network request, run infinite loops, consume all your resources, do really scary stuff, run a crypto miner, you know? That would be bad.
And in the past, people have tried loads of things to let people run code-like solutions. If anyone's ever written a DSL, some sort of JSON spec describing what to do, which gets interpreted as code, that is basically this. If you've ever used one of those integration softwares where you have to do that, that is this. They just don't trust you to write code on their servers. VMs too: people spinning up sandboxes to run code, big sandboxes, big VMs, that's this. And also code review.
But it's kind of lucky, because we have a pretty cool primitive that solves this. And there will be other primitives that solve it; I just think this is the first, and so it's worth shouting about, really. And this is: how do you run untrusted code in a way that's super safe for you and your infrastructure? It's kind of like this. We can execute a worker from a string. And a worker is just a little isolate in V8. There are many blogs about how all this works; I'm not going to go into it super deeply. I'm just going to show you what it can do.
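For a feel of the shape (code arrives as a string, and runs against only the globals you inject), here is a sketch using Node's `vm` module. Important caveat: `node:vm` is explicitly not a security boundary, unlike the V8 isolates workerd uses; this only illustrates the programming model:

```typescript
import * as vm from "node:vm";

// Run a string of untrusted code in a fresh context, exposing only the
// globals we explicitly allow. NOTE: node:vm is not a security boundary
// (workerd uses real V8 isolates), but the programming model is the
// same: code comes in as a string, capabilities are injected, a result
// comes out.
function runInSandbox(code: string, allowedGlobals: Record<string, unknown>): unknown {
  const context = vm.createContext({ ...allowedGlobals });
  return vm.runInContext(code, context, { timeout: 100 });
}

// The generated code sees no process unless we hand it one.
console.log(runInSandbox("typeof process", {})); // → "undefined"

// Capabilities are whatever we choose to inject.
console.log(runInSandbox("add(2, 3)", { add: (a: number, b: number) => a + b })); // → 5
```

The empty-context call is exactly the demo coming up next: the generated code goes looking for secrets and finds none, because nothing was injected.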
So, for instance, we have this piece of code that was generated, and we're going to run it. And this ran on the back end. It didn't run in my browser; it ran in a dynamic worker that's fully isolated. And how can I prove that to you? If we do this one, we're trying to get some secrets here: process.env. And if we print them, there are no secrets. And we also have this weird Cloudflare global. Ooh, interesting. That was with Node compat on. If we turn Node compatibility off, we don't even have process.env there, and it all errors out. So we have this programmable sandbox. It's not quite a sandbox; it's a very lightweight thing that you can load code into and then run. And I'll show you some other options later. It's not just us that has this, but we have one that we host for you, and it goes to Cloudflare-level scale. If you want to do billions of requests, knock yourself out.
And now here's one where the agent's written some code that accesses an external API. If we run it, we get: this worker is not permitted to access the internet via global functions. Or maybe we want it to access the internet, and now we can give it access. So it's a programmable sandbox with programmable guardrails. And all we're doing here is flicking a boolean in the server. That's all that's happening. But you can provide a more in-depth function: "Only allow access to these domains." And that's what we do on the Cloudflare MCP.
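One way to sketch that guardrail: wrap fetch in a policy function and inject the wrapped version into the sandbox in place of the real global. This illustrates the idea, not the actual Cloudflare MCP implementation; the stub transport below is hypothetical and avoids real network traffic:

```typescript
// A fetch-shaped wrapper: the host decides, per request, whether the
// generated code may touch the network. Inject this into the sandbox
// instead of the real global fetch.
type FetchLike = (input: string) => Promise<unknown>;

function makeGuardedFetch(
  policy: (url: URL) => boolean,
  fetchImpl: FetchLike
): FetchLike {
  return (input) => {
    const url = new URL(input);
    if (!policy(url)) {
      return Promise.reject(
        new Error(`worker is not permitted to access ${url.hostname}`)
      );
    }
    return fetchImpl(input);
  };
}

// Deny everything: the boolean flicked off.
const noNetwork = makeGuardedFetch(() => false, (input) => fetch(input));

// Or a more in-depth policy: only allow specific domains. The stub
// transport stands in for a real fetch in this example.
const allowlisted = makeGuardedFetch(
  (url) => url.hostname === "api.cloudflare.com",
  (input) => Promise.resolve(`fetched ${input}`)
);

noNetwork("https://example.com").catch((e) => console.log(e.message));
// → "worker is not permitted to access example.com"
```

Because the policy is just a function, "flick a boolean" and "allowlist these domains" are the same mechanism with different policies plugged in.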
If we go to the next one. Oh, speaking of the Cloudflare MCP, this is where I really hope the demo works. So, this is an MCP client in this slide. And if we ask it a question, we're going to get an auth screen pop-up. And then hopefully all this works. Oh, insane. So now we have, well, read-only access to the whole of the Cloudflare API. All of my Cloudflare infrastructure, I have read-only access to. Which is pretty cool. These account IDs, don't worry about them; they're not secrets in Cloudflare world.
Cool. So we just listed a worker, but you could do many more things here. You can deploy workers from your command line. You can do what we did earlier and add Access to something. You could inspect your DNS. You could send emails soon. You can do loads and loads of other stuff. It's very, very cool what you can do here, because you have access to the whole of the Cloudflare API, all 2,000-and-something endpoints.
And I guess it brings up the question: where are we going with letting agents access external tools? What does this look like? You have people installing CLIs for everything and running them on their own machine, maybe on a VM. That's kind of cool. You have us saying, "Oh, you could just run untrusted code in this other place that's really isolated." You have people doing tool search. You have people rendering UIs from JSON. I don't know. And I guess my main thought is that we're going to have so many isolated environments on the web. There are going to be loads of infrastructure primitives that let you run this type of untrusted code on the web, because code is actually a very compact plan. Instead of doing tool calls, you can have one tool called code, where the model generates the code of your choice and then you run it. And that code has so many more degrees of freedom than an individual tool call. So it makes sense to me that, as the models get smarter, this is what we will do. And people will adapt their infrastructure primitives to do it. So there'll be so many more of these.
And you see this starting with Pydantic's Monty, with Deno, and we also have it with workerd, the dynamic workers I showed earlier. More people are going to build these primitives, because they're going to become more and more useful. So, just a little explanation: this is workerd spawning a dynamic worker in the sandbox and running some code to get a Fibonacci sequence. You can do the same thing with Deno, with deno run and some questionable permission flags; I have no idea what that one does. And you can also do roughly the same thing with Pydantic's Monty, their new code interpreter for running untrusted Python. Because it's Python, we have to download Python. Sucks. This might never work, I actually have no idea. Oh, there we go. Great.
So maybe you can see where we're trying to go with this. There was a previous time when no one would ever run untrusted code. That was a CVE. You would have to immediately stop allowing that. And yet it turns out it's actually really good for LLMs to write code that you can run. So now we're building the primitives to actually enable that. And it feels like we missed out on this whole part of the tech scene that we've never tried before. In the 1950s, when you wanted to run something on a computer in your local town, you printed out some punch cards, you stamped them, and you gave them to the guy. And that was kind of like running untrusted code, right? That was kind of it. Then, when we went to the cloud, we got away from that. And now I think we're going to go much more back to that, where your users can write code, because your users are AI.
And AI is very good at writing code. That is how they're going to interact with your platform, whether through MCP, or even through bash and a CLI; I don't mind. I think they're just going to write code against your services. And your services have to be ready for this. Your APIs have to be ready to take a beating, because they have to have good rate limiting. I can run this in a for loop on multiple sandboxes at once and just hammer your API. You have to have some way of protecting against that. This is the new world we're now going to be living in. And that's the server side, the services side. Now, what's going to happen on the client side?
So I think that's almost even more interesting, because that's the user-facing side of things. The user's not going to see the server. The user doesn't care. The user just asks: why is my agent not getting my Gmail emails? Or why has it deleted my whole inbox? They're not going to see that. But on the client side, there's a lot of innovation that's going to happen. And I think we've stalled a little bit recently, because actually building an MCP client in particular got really, really hard. To build a client that was performant, that worked, you needed to manage stateful connections, and you needed to manage resumability between those connections. There are plenty of other reasons why building an MCP client was hard, but it was a pain, an absolute pain. And so people shipped the most stripped-down clients they possibly could. They mostly offloaded to the MCP SDKs, which are quite bare-bones.
And no one was building more unique UI experiences on top of that. I think that is going to come very, very soon. The most obvious thing is that we're going to have programmatic tool calling in the clients. The previous slide, showing those sandboxes with workerd, Deno, and Monty, that is just running untrusted code in a client. People are going to do that. If your client is remote, you're going to do it like that. If your client is local, well, just YOLO it, whatever. Just eval it, you know? It's going to be fine. But more people are going to do this programmatic tool calling. It's going to happen.
And because you're generating code, people are going to save this code. They're going to save it as these mini scripts. And users might be able to decide, "Oh, this action that I just did, that the LLM generated for me, I want to keep that for later." And then it will be much faster. So you can see this for things like cron jobs. A user might set up some web scraping job without any knowledge of how web scraping works. It generates a script, and that script runs every day, or every two days. And whenever it breaks, because web scraping's pretty brittle, the agent fixes it and resaves the script. This stuff is going to happen. And I think these saved mini scripts only work when you embrace programmatic tool calling, but they really do work. And then the last thing is that we're probably going to have many, many more clients, because they've been so hard to make up until now, and it's going to get easier.
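The save-and-replay pattern could look something like this: replays are free until a script breaks, at which point the agent regenerates and resaves it. The store and its API are a hypothetical sketch; no client I know of ships exactly this:

```typescript
// A tiny store for agent-generated scripts. The first run costs an LLM
// call; replays are free. When a replay throws (say a scraped page
// changed its layout), the agent regenerates and resaves the script.
type Script = { source: string; run: () => unknown };

const savedScripts = new Map<string, Script>();

function saveScript(name: string, source: string, run: () => unknown): void {
  savedScripts.set(name, { source, run });
}

// `regenerate` stands in for asking the model for a fresh script.
function replay(name: string, regenerate: () => Script): unknown {
  let script = savedScripts.get(name);
  if (!script) {
    script = regenerate();
    savedScripts.set(name, script);
  }
  try {
    return script.run();
  } catch {
    const fixed = regenerate();
    savedScripts.set(name, fixed);
    return fixed.run();
  }
}
```

So a broken cron job heals itself: the next `replay` catches the failure, swaps in a regenerated script, and the schedule keeps running without the user noticing.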
There's not actually a huge number of really well-used MCP clients. That's going to change. And with that change, more people are going to be able to make them, and more people are going to deploy agents to the cloud that end up being MCP clients. I think more people are going to try this stateless agent loop thing. It was fine to have a sandbox for every agent, running code locally, when there were a million agents in the world. When there are a hundred agents for each person (oh, hello, let's not do that), it's going to start getting really tough. You're going to have to embrace a cloud-native way of doing things, which means state has to be something you can turn on or off.
And this is my last thing; I think we're nearing the end. I work a lot on MCP servers and on the SDK, and this is where I think that bit's going. I think we're going to see MCP as a middleware. When you build an API service, it will be a flag that you can turn on in your favorite framework. The SDK itself is getting super, super lightweight, and I think by the end of this year it will be natively in every big full-stack framework, at least in TypeScript. It will just be there natively. Because it will be so small, it will literally just express the protocol itself, and it will be silly for them not to have it. They'll just have a native integration. And they'll be able to set MCP to true on all of your APIs. And because all of the clients will be doing programmatic tool calling, you can express your thousand APIs from one Next.js app, just set MCP to true, and expose them as tools over MCP as well. And I think that will happen. I've been thinking it's going to happen for a while, but I think we're really close now. The last blocker is fixing the SDK, really, so that it's capable of doing that, capable of fitting in every single bundle. And that's the plan.
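What that "MCP is a flag" future might look like: a framework that already knows your routes can derive MCP tool definitions from them. Every name below is hypothetical; no framework ships this API today:

```typescript
// A framework that already knows your routes can derive an MCP-style
// tool list from them. `mcp` is the hypothetical flag.
type Route = {
  method: "GET" | "POST";
  path: string;
  description: string;
  mcp?: boolean;
};

type McpTool = { name: string; description: string };

function deriveMcpTools(routes: Route[]): McpTool[] {
  return routes
    .filter((r) => r.mcp)
    .map((r) => ({
      // e.g. "GET /workers" becomes "get_workers"
      name: (r.method + r.path).toLowerCase().replace(/\W+/g, "_"),
      description: r.description,
    }));
}

const routes: Route[] = [
  { method: "GET", path: "/workers", description: "List workers", mcp: true },
  { method: "POST", path: "/workers", description: "Create a worker", mcp: true },
  { method: "GET", path: "/healthz", description: "Health check" }, // stays internal
];

console.log(deriveMcpTools(routes).map((t) => t.name));
// → ["get_workers", "post_workers"]
```

The appeal is that the route table is already the source of truth, so exposing it over MCP adds roughly zero maintenance, which is the talk's argument for why frameworks will just build it in.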
You can find out more: we have a Code Mode blog post that came out pretty recently. It's about how we gave agents an entire API in a thousand tokens. If you have a big API, you should probably do this. Any API providers, please just do this, because it's really, really good for people to access your data. And thank you. Try out `npm i agents`. Thank you very much. Woo!
>> [applause]
[music]