Building Agents (the hard parts!) - Rita Kozlov, Cloudflare
Channel: aiDotEngineer
Published at: 2025-07-23
YouTube video id: j_TKDweOsYE
Source: https://www.youtube.com/watch?v=j_TKDweOsYE
Hello everyone. I'm Rita, the VP of Product for Cloudflare's developer platform, which includes Workers and Durable Objects. Thank you for the shoutouts. I always like to start by talking a little bit about Cloudflare's mission, and especially our mission for developers. I saw a couple of hands when I asked how many people have used Cloudflare Workers before, but if you're sitting in this room, whether you've signed up for Cloudflare directly or not, you've definitely used Cloudflare, because about 20% of internet traffic flows through it. If you've ordered an Uber recently, or maybe ordered some food, you've absolutely used Cloudflare. Aside from our CDN, DNS, and DDoS protection services, we also offer services to developers, including serverless functions, storage, compute, and AI inference, spanning many, many things. Our vision for developers is to make it as easy as possible for someone to bring their idea to life, from the moment they write their first line of code, to deploying it to production, to making it live for the first user and the millions that come after. That's what makes my job exciting: waking up in the morning and seeing what developers are going to build. Now, if you're in this room, I don't need to tell you that AI is as big a technological paradigm shift as cloud, mobile, or social before it. Everyone here is already convinced of that. But it is interesting to see just how quickly things are moving, because I think it's a good reflection of how quickly things are about to move next. I realized that I gave a talk about a year ago where I pulled up some stats on where we were at. A year ago, about 44% of developers were using AI as part of their day-to-day work to help them write code.
And Gartner was predicting that by 2030, about 50% of knowledge workers would be using AI to augment their work. Those numbers seem really low now, right? Today, over 75% of knowledge workers use AI to augment their work, already surpassing the 2030 estimates, and more than 76% of developers use AI as part of their development process. Honestly, I think that from the time this report was pulled to now, those numbers have grown even more. The other interesting thing is that about a year ago, when we talked about AI workloads, we were primarily talking about training. We predicted then that workloads would shift toward inference, and again, we've been seeing that unfold. We saw it with OpenAI's o1 model, which shifts more and more of the work from training to post-training and inference. We saw a similar thing with DeepSeek, who optimized training so much that more and more energy is spent on inference. But let's talk about what's next. After training and inference comes, I think, actual automation. I know there's been a lot of talk about agents the past couple of days, but the reason this is so exciting is that we have the opportunity to not just augment people's work. You've been able for some time now to go somewhere like ChatGPT and ask, "Hey, help me draft an email." What's really, really powerful is to be able to go and say: "I have a campaign I want to run. Grab me a full list of the customers that I talked to this week at the conference. Then draft the email. Actually, I do want to review it before it goes to a customer, so send it to me for approval. And then ping me when the customer responds."
These are exactly the types of agentic workflows that I think we're going to see more and more of, and that are really going to unlock the next level of productivity. We're already starting to see these agents out in the wild, meaningfully impacting businesses. Some businesses are seeing 20% revenue increases from adopting agents as part of sales automation. Some are seeing 90% faster support response times when using AI agents. And in general, people are seeing about 50 to 75% time savings when using agents. So agents are going to become even more meaningful, but they are already reshaping the way we work. Okay. But you want to build an agent. Where do you start? What all goes into building one? The way I like to think about agents comes down to four components. First, you have the client: the interface through which a human interacts with the agent. Then you have the AI, the reasoning piece, the thinking part that comes up with the logic of what to execute next. The thinking part then needs an executive branch, a way to go and execute the actions it decided to take: that's the workflows. And workflows also need access to tools. It's not enough to decide "I'm going to go do this"; they need the tools to actually take the actions. So let's run through a quick example: that CRM agent I was just showing, something that helps me contact the people I talked to. What would that look like? First, if I want something that works over voice, where I can say, "Hey, do this for me," you need something that connects over WebRTC.
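The four components above can be sketched as a tiny TypeScript model. To be clear, this is purely illustrative: none of these names come from the Cloudflare Agents SDK; it just shows how the reasoning piece (planner), workflow loop, and tools relate, with a client sitting in front.

```typescript
// Hypothetical model of the four agent components; all names are
// illustrative, not the Cloudflare Agents SDK API.
type Tool = {
  name: string;
  run: (args: Record<string, string>) => string;
};

// The reasoning piece: given a goal and the steps already done,
// decide which tool to call next (or null when finished).
type Planner = (
  goal: string,
  done: string[],
) => { tool: string; args: Record<string, string> } | null;

// The workflow: loop until the planner has nothing left to do,
// keeping track of which actions have already been executed.
function runWorkflow(goal: string, plan: Planner, tools: Tool[]): string[] {
  const results: string[] = [];
  const done: string[] = [];
  let step = plan(goal, done);
  while (step !== null) {
    const { tool: toolName, args } = step;
    const tool = tools.find((t) => t.name === toolName);
    if (!tool) throw new Error(`unknown tool: ${toolName}`);
    results.push(tool.run(args));
    done.push(toolName);
    step = plan(goal, done);
  }
  return results;
}

// The client would sit in front of this: a chat UI or voice interface
// that turns "run my campaign" into a goal and shows the results.
const tools: Tool[] = [
  { name: "list_customers", run: () => "alice,bob" },
  { name: "draft_email", run: (a) => `draft for ${a.to}` },
];

const planner: Planner = (_goal, done) => {
  if (!done.includes("list_customers")) return { tool: "list_customers", args: {} };
  if (!done.includes("draft_email")) return { tool: "draft_email", args: { to: "alice,bob" } };
  return null; // campaign plan complete
};

const campaignResults = runWorkflow("run my campaign", planner, tools);
```

In a real agent, the planner is an LLM call and the tools are MCP servers or APIs, but the shape of the loop is the same.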
You then need a speech-to-text model to translate what you said back into text. Alternatively, we're all familiar with chat UIs, so you need somewhere to host that. Then, ideally, you're using some sort of gateway to do caching and to run your evals, so that as you iterate on the overall process, you can make sure things are getting better and better. Then you need to send that request to an LLM that's going to do the thinking and come up with the rest of the plan. From there, you need a workflow agent: that's what keeps track of which actions have been executed and which actions need to happen next. And then again, you need to connect to your tools. A tool can be a web browser, an API, an internal service you need to connect to, or a vector database if the agent needs to grab additional knowledge. Sometimes you're also going to need a human in the loop to verify some of the actions you're taking. So, how do you build an agent? I'm actually going to go backwards here and start with the tools part. Most recently, there's been a lot of talk about MCP. Anthropic introduced this new standard back in November, and I think the really interesting thing about it is that it got people thinking about how to expose APIs to LLMs in a way that lets us humans talk to LLMs in natural language. But I think the real missed headline of MCP was that LLMs became really, really good at tool calling. That wasn't so much the case a few years ago if you tried to play around with tool calling, but now they are. And so we have a new standard for how to write your code in a way that's incredibly easy to consume by any MCP client.
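The request path just described, voice or chat in, then gateway, then LLM, then workflow and tools, can be sketched as a chain of stages. This is purely illustrative wiring with stubbed stages, not real WebRTC, gateway, or LLM code:

```typescript
// Illustrative wiring of the request path described above. Each stage is a
// stub; in a real system these would be a speech-to-text model, an AI
// gateway (caching + evals), an LLM call, and a workflow engine with tools.
type Stage = (input: string) => string;

const speechToText: Stage = (audio) => audio.replace("audio:", ""); // stub STT
const gateway: Stage = (text) => text.trim(); // stub: would cache and log for evals
const llmPlan: Stage = (text) => `plan(${text})`; // stub reasoning step
const workflow: Stage = (plan) => `executed ${plan}`; // stub tool execution

// Compose the stages into one request path.
const pipeline = (stages: Stage[]) => (input: string) =>
  stages.reduce((acc, stage) => stage(acc), input);

const handleVoiceRequest = pipeline([speechToText, gateway, llmPlan, workflow]);
const result = handleVoiceRequest("audio: email my customers ");
// result: "executed plan(email my customers)"
```

The point of drawing it this way is that each stage is swappable: a chat UI replaces the speech-to-text stage, but the gateway, reasoning, and workflow stages stay the same.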
The other really cool thing about MCP is that it respects a traditional client-server architecture, where you're able to have that conversation back and forth and, importantly, have more than one client connect to the same MCP server. These are some of the core concepts that go into MCP: MCP servers generally have resources, prompts, tools, and sampling. Resources can be anything from file contents to database records. Prompts help you define how you want someone else to interact with your agent, because you can prompt your agent probably better than anyone else can; if there are any nuances about how your system works, you want to build that in as much as possible. Then you want to give it access to the actual tools and connect those queries with them. And last but not least, sampling. The interesting conclusion I came to while preparing this talk is that I haven't actually seen anyone using sampling in production in an MCP server yet, but the idea is to let you use a kind of shorthand with your LLM and allow it to complete some of the thinking behind it. Building an MCP server does come with some tricky parts, though, and I think the trickiest are the transport protocol (over SSE and WebSockets), the OAuth part, and the memory part. But I'm going to share a cheat code with everyone here. Get ready, I'm going to flash it real quick. Oh, you missed it. No, I'm just kidding. Cloudflare has an SDK called Agents that you can install, which gives you a lot of this functionality out of the box. We released the Agents SDK a few months ago, and yes, it has the same name as the one OpenAI released a few days ago as well, but the two actually play with each other really well.
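As a rough mental model of those four core concepts, an MCP server's surface area looks something like the shape below. The real protocol is JSON-RPC over a transport, and these are not the actual SDK types; every name here is illustrative:

```typescript
// Rough TypeScript model of an MCP server's surface area. The real protocol
// is JSON-RPC over a transport (stdio, SSE, streamable HTTP); this only
// shows how the four concepts relate. All names are illustrative.
interface McpServerShape {
  // Resources: readable context such as file contents or database records.
  resources: Record<string, () => string>;
  // Prompts: server-authored templates for how clients should talk to it.
  prompts: Record<string, (args: Record<string, string>) => string>;
  // Tools: actions the model can invoke.
  tools: Record<string, (args: Record<string, string>) => string>;
  // Sampling: the server asking the *client's* LLM to complete some text.
  sample?: (request: string) => string;
}

const bookServer: McpServerShape = {
  resources: {
    "shelf://read": () => "The Talented Mr. Ripley; Strangers on a Train",
  },
  prompts: {
    recommend: (a) => `Recommend a book for a fan of ${a.genre}.`,
  },
  tools: {
    add_genre: (a) => `saved genre: ${a.genre}`,
  },
};

const recommendPrompt = bookServer.prompts.recommend({ genre: "thrillers" });
// recommendPrompt: "Recommend a book for a fan of thrillers."
```

Note how the prompt lives on the server side: the server author, who knows the system's nuances best, ships the prompt alongside the tools rather than leaving every client to guess.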
Let me tell you a little bit about what it does. First of all, you can use the Agents SDK to run MCP servers, and it comes with a built-in class called McpAgent that lets you host remote MCP servers with OAuth, transport, and HTTP streaming all built in. So if you're one of those people who never wants to touch OAuth again, this lets you do that. The really cool thing is that it has state management built in, because Cloudflare has a primitive called Durable Objects. The idea behind Durable Objects is basically that it's like a serverless function, but with state attached directly to it. If you've ever wanted to write some code and save its state without ever having to set up a database or anything like that, this is a really great way to do it, and it makes it really easy to build these MCP servers. It also comes with real-time WebSocket communication, which makes the whole chat interface really easy, React integration hooks so you can integrate it into your front end, and basic chat capabilities. So let's walk through what it would actually look like to deploy an MCP server on Cloudflare. First, I define my MCP class that extends McpAgent, which I was just talking about. This MCP server is going to be kind of like a Goodreads server that recommends books to us. We set an initial state that's empty. Then I can give it a tool called add_genre, so I can start to specify my preferences. I'm a big Patricia Highsmith fan, so I can say I really like thrillers, and it's going to save that and persist it for future interactions. And then I can have a separate tool called get_recommendations that gets book recommendations.
And, as we discussed with MCP prompts, you can have a personalized prompt for recommending books to someone who likes the genres, and has read the books, that you've previously specified. It's a really good way to get personalized recommendations. Every time you interact with this tool, it persists the memory, so the recommendations keep getting better and better. And because this MCP server is standalone and can be interacted with through various clients, the memory persists regardless of which client you use to call into it. Now, why is this great? It's amazing because traditionally you would have to separately set up a database, manage connections, and handle scaling, and there would be added latency in the setup. With McpAgent, because the memory is built in, you don't have to do any of that: it scales automatically, it runs close to your AI agent, and you don't really need to think about infrastructure at all. You just get all of that out of the box. We have a blog post up, so you can go and deploy your first MCP server today. It's really easy; there is literally a "Deploy to Cloudflare" button, and it takes less than a minute to get your initial MCP server up and running. What's been really cool is working with some of the brands that we respect so much and seeing companies like Atlassian, Asana, Stripe, and Intercom building their own MCP servers in this exact way. So you're going down a really well-trodden path here. Okay, so that was the tools part. Let's keep working backwards from there. We've given our agents access to tools, but now we need a coordination component, right?
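Before moving on to coordination, here is a minimal, self-contained sketch of the Goodreads-style example above: tool calls that share state which persists across interactions. Plain TypeScript stands in for McpAgent plus Durable Object storage, whose actual API I'm not reproducing; the class, method, and tool names are all illustrative.

```typescript
// Minimal sketch of an MCP-style agent whose tool calls share persistent
// state, in the spirit of the Goodreads-style example above. This stands in
// for McpAgent + Durable Objects; names are illustrative, not the real API.
type BookState = { genres: string[]; booksRead: string[] };

class BookRecommender {
  // With Durable Objects this state would survive across requests and
  // across different MCP clients; here it lives for the process lifetime.
  private state: BookState = { genres: [], booksRead: [] };

  // Tool: record a genre preference so future calls can use it.
  addGenre(genre: string): string {
    if (!this.state.genres.includes(genre)) this.state.genres.push(genre);
    return `Added genre: ${genre}`;
  }

  // Tool: recommend based on everything persisted so far, skipping
  // books the user has already read.
  getRecommendations(catalog: Record<string, string[]>): string[] {
    return this.state.genres
      .flatMap((g) => catalog[g] ?? [])
      .filter((b) => !this.state.booksRead.includes(b));
  }

  markRead(book: string): void {
    this.state.booksRead.push(book);
  }
}

const agent = new BookRecommender();
agent.addGenre("thrillers");
agent.markRead("The Talented Mr. Ripley");
const recs = agent.getRecommendations({
  thrillers: ["The Talented Mr. Ripley", "Strangers on a Train"],
});
// recs omits the book already read
```

The key property is that `addGenre` and `getRecommendations` are separate tool calls, possibly from separate clients, yet they see the same state, which is exactly what the built-in memory buys you.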
We need a workflow that maintains state not just through that one tool interaction, but through the entire chain, perhaps with a human in the loop. Human-in-the-loop workflows require really long-running tasks that sometimes need to talk to an LLM; it might be a reasoning LLM that takes several minutes to come up with a response. And similarly, a human in the loop could take minutes, hours, days, or months to respond. So you need something that can come back and resume its flow after that task is completed. You also still need to consider things like WebSocket servers, state persistence, retries, and horizontal scaling, and these things can get quite tricky. So again, let's walk through a real use case that we built out with a customer. There's a company called Knock that does notification management, and they needed an agent that would handle approval when you request a new credit card: your boss needs to go and approve it, whether through email, Slack, or an in-app notification. So what do we need to do? First, we need to allow users to request a new card through a chat interface. You can see here that we're importing useAgent from the Agents React library, and then we create a new chat instance that has all of these things instantiated on our behalf; this is all part of the Agents SDK. Then we need to give it the ability to issue cards through an issue-card action, but we need to wrap it in the require-human-input tool in order to delegate that piece to Knock. We want to make sure that the issue-card tool always requires human input. Then we need Knock to send our approval notifications, and we defer the tool call that issues the card until there is approval. Right?
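The deferred-tool-call pattern just described can be sketched like this. It models the flow, wrap a sensitive tool so it never runs immediately, then resume it when the approval arrives, but it is not Knock's API or the Agents SDK's actual require-human-input implementation; all names are hypothetical.

```typescript
// Sketch of the "require human input" pattern: a sensitive tool call is
// deferred until an approval arrives, then resumed. This models the flow
// described above; it is not Knock's or the Agents SDK's actual API.
type PendingCall = {
  id: string;
  action: () => string;
  status: "pending" | "approved" | "denied";
};

class HumanInTheLoop {
  private pending = new Map<string, PendingCall>();

  // Wrap a tool so it never executes immediately; the action is stored
  // and an approval request goes out (e.g. via email, Slack, or in-app).
  requireHumanInput(id: string, action: () => string): string {
    this.pending.set(id, { id, action, status: "pending" });
    return `Sent request ${id} for approval`;
  }

  // Called when the approval decision comes back (e.g. from a webhook);
  // resumes the deferred tool call only if it is still pending.
  resolve(id: string, approved: boolean): string {
    const call = this.pending.get(id);
    if (!call || call.status !== "pending") return "no pending call";
    call.status = approved ? "approved" : "denied";
    return approved ? call.action() : `Request ${id} denied`;
  }
}

const loop = new HumanInTheLoop();
loop.requireHumanInput("card-1", () => "issued card ****4242");
const outcome = loop.resolve("card-1", true);
// outcome: "issued card ****4242"
```

Because the human might respond days later, the `pending` map is exactly the kind of state you'd want in a Durable Object rather than in process memory.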
So we have a tool call to get a new card provisioned, but we want to stall it on the actual approval. You can see that in here, where we route the messages to approve something. Now, once something is approved, we need to route it back to the appropriate agent, and this is automatically handled by the Durable Object and instantly routed back to the correct agent. You can see in here that I find the user ID from the tool call for the calling user, and then I can look it up: I get the agent by name, by the user ID. If it's an existing agent, we route to the correct Durable Object and make sure we're handling it with the correct webhook. We then need to resume the paused tool call, issue the card, and let the user know that the card was approved. So in here, if we received an approved status, we can move on with the deferred tool execution that we defined earlier. And last but not least, we need to make sure duplicate actions don't occur: if two things happen out of sync, we can't approve the card twice, or provision the card twice. This is where that state management becomes really important again, and we're able to store all of this directly in the state. You can see whether the card has been requested or processed already, and once it's been approved, we set the status so that when a new webhook comes in, we can't re-approve the same exact one. So, we talked about tools, we talked about workflows. Next you need the reasoning piece, and you need to choose the right model to run this.
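The deduplication idea above, that a second webhook for the same request must not provision a second card, boils down to an idempotency check against stored status. Here is a minimal self-contained sketch; the status values and method names are illustrative, not the customer's actual schema.

```typescript
// Sketch of idempotent webhook handling: only the first approval webhook
// for a given request issues a card; out-of-sync duplicates are ignored.
// Status values and names are illustrative.
type CardStatus = "requested" | "processed";

class CardState {
  private status = new Map<string, CardStatus>();
  private issued = 0;

  request(id: string): void {
    if (!this.status.has(id)) this.status.set(id, "requested");
  }

  // Webhook handler: check stored status before acting, then update it
  // so a replayed or duplicate webhook becomes a no-op.
  handleApprovalWebhook(id: string): string {
    if (this.status.get(id) !== "requested") {
      return `ignored duplicate webhook for ${id}`;
    }
    this.status.set(id, "processed");
    this.issued += 1;
    return `issued card for ${id}`;
  }

  cardsIssued(): number {
    return this.issued;
  }
}

const cards = new CardState();
cards.request("req-7");
const first = cards.handleApprovalWebhook("req-7");
const second = cards.handleApprovalWebhook("req-7"); // out-of-sync duplicate
// first issues the card; second is ignored; exactly one card is issued
```

With a Durable Object, the check-then-set sequence runs against a single instance per request ID, which is what makes this safe even when webhooks race.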
I'm actually going to skip this part, because there's an entire conference dedicated to it today, with people who will cover it far better than I will. Logan's talk this morning about everything happening with Gemini was really, really good, and there are a bunch of people talking about evals. But then you still need a client in order to connect to your server, right? And again, this is the really beautiful thing about MCP: once you've built out your MCP server, you can truly meet your users where they are. Realistically, the nice thing is you don't have to build a UI yourself at all. If your users are developers, they're most likely already using Cursor, and now that Cursor supports remote MCP servers, you just import your MCP server and your clients can interact with it. Similarly, Claude and ChatGPT both support remote MCP servers, so again, your users can start using your agents instantly, directly through there. But you can also build your own app and your own MCP client, and I think this is where you can build really interesting agentic workflows, when you do have more control over both the client and the server and over connecting the two pieces together. And not only that: your app doesn't have to be limited to just being a user interface. You can also talk to your MCP client over voice, especially with some of the Cloudflare tools we've built out that help translate WebRTC to WebSocket in a way that makes it really easy to build out these applications, because the MCP client can easily understand those connections. So yeah, how do you build an agent? These are the four pieces you need: your client, your AI, your workflows, your tools. And if you want to get started and don't know where to start, I really, really highly recommend the Agents SDK.
You'll be able to get up and running in just a few minutes. So, thank you.