Building Agents (the hard parts!) - Rita Kozlov, Cloudflare
Channel: aiDotEngineer
Published at: 2025-07-23
YouTube video id: j_TKDweOsYE
Source: https://www.youtube.com/watch?v=j_TKDweOsYE
Hello everyone. I'm Rita, the VP of Product for Cloudflare's developer platform, which includes Workers and Durable Objects. Thank you for the shoutouts. I always like to start by talking a little bit about Cloudflare's mission, and especially our mission for developers. I saw a couple of hands when I asked how many people have used Cloudflare Workers before, but if you're sitting in this room, whether you've signed up for Cloudflare directly or not, you've definitely used Cloudflare, because about 20% of internet traffic flows through it. If you've ordered an Uber recently, or maybe ordered some food, you've absolutely used Cloudflare. Aside from our CDN, DNS, and DDoS protection services, we also offer services to developers, including serverless functions, storage, compute, and AI inference, spanning many, many things. Our vision for developers is to make it as easy as possible for someone to bring their idea to life, from the moment they write their first line of code, to deploying it to production, to making it live for the first user and the millions that come after. That's what makes my job exciting: waking up in the morning and seeing what developers are going to build. Now, if you're in this room, I don't need to tell you that AI is as big a technological paradigm shift as cloud, mobile, or social before it. Everyone here is already convinced of that. But it is interesting to see just how quickly things are moving, because I think it's a good reflection of how quickly things are about to move next. I realized that I gave a talk about a year ago where I pulled up some stats on where we were at. A year ago, about 44% of developers were using AI as part of their day-to-day work to help them write code.
And Gartner was predicting that by 2030, about 50% of knowledge workers would be using AI to augment their work. Those numbers seem really low now, right? Today, over 75% of knowledge workers use AI to augment their work, already surpassing the 2030 estimates, and more than 76% of developers use AI as part of their development process. Honestly, I think that from the time this report was pulled to now, those numbers have grown even more. The other interesting thing is that about a year ago, when we talked about AI workloads, we were primarily talking about training. We predicted then that workloads would shift toward inference, and again, we've been seeing that unfold. We saw it with OpenAI's o1 model, which shifts more and more of the work from training to post-training and inference. We saw a similar thing with DeepSeek, who optimized training so much that more and more energy is spent on inference. But let's talk about what's next. After training and inference comes, I think, actual automation. I know there's been a lot of talk about agents the past couple of days, but the reason this is so exciting is that we have the opportunity to not just augment people's work. You've been able for some time now to go somewhere like ChatGPT and ask, "Hey, help me draft an email." What's really, really powerful is to be able to go and say: "I have a campaign I want to run. Grab me a full list of the customers that I talked to this week at the conference. Then draft the email. Actually, I do want to review it before it goes to a customer, so send it to me for approval. And then ping me when the customer responds."
These are exactly the types of agentic workflows that I think we're going to see more and more of, and that are really going to unlock the next level of productivity. We're already starting to see these agents out in the wild, meaningfully impacting businesses. Some businesses are seeing 20% revenue increases from adopting agents as part of sales automation. Some are seeing 90% faster support response times when using AI agents. And in general, people are seeing about 50 to 75% time savings when using agents. So agents are going to become even more meaningful, but they are already reshaping the way we work. Okay. But you want to build an agent. Where do you start? What all goes into building one? The way I like to think about agents comes down to four components. First, you have the client: the interface through which a human interacts with the agent. Then you have the AI, the reasoning piece, the thinking part that comes up with the logic of what to execute next. The thinking part then needs an executive branch, a way to go and execute the actions it decided to take: that's the workflows. And workflows also need access to tools. It's not enough to decide "I'm going to go do this"; they need the tools to actually take the actions. So let's run through a quick example: that CRM agent I was just showing, something that helps me contact the people I talked to. What would that look like? First, if I want something that works over voice, where I can say, "Hey, do this for me," you need something that connects over WebRTC.
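The four components above can be sketched as a tiny TypeScript model. To be clear, this is purely illustrative: none of these names come from the Cloudflare Agents SDK; it just shows how the reasoning piece (planner), workflow loop, and tools relate, with a client sitting in front.

```typescript
// Hypothetical model of the four agent components; all names are
// illustrative, not the Cloudflare Agents SDK API.
type Tool = {
  name: string;
  run: (args: Record<string, string>) => string;
};

// The reasoning piece: given a goal and the steps already done,
// decide which tool to call next (or null when finished).
type Planner = (
  goal: string,
  done: string[],
) => { tool: string; args: Record<string, string> } | null;

// The workflow: loop until the planner has nothing left to do,
// keeping track of which actions have already been executed.
function runWorkflow(goal: string, plan: Planner, tools: Tool[]): string[] {
  const results: string[] = [];
  const done: string[] = [];
  let step = plan(goal, done);
  while (step !== null) {
    const { tool: toolName, args } = step;
    const tool = tools.find((t) => t.name === toolName);
    if (!tool) throw new Error(`unknown tool: ${toolName}`);
    results.push(tool.run(args));
    done.push(toolName);
    step = plan(goal, done);
  }
  return results;
}

// The client would sit in front of this: a chat UI or voice interface
// that turns "run my campaign" into a goal and shows the results.
const tools: Tool[] = [
  { name: "list_customers", run: () => "alice,bob" },
  { name: "draft_email", run: (a) => `draft for ${a.to}` },
];

const planner: Planner = (_goal, done) => {
  if (!done.includes("list_customers")) return { tool: "list_customers", args: {} };
  if (!done.includes("draft_email")) return { tool: "draft_email", args: { to: "alice,bob" } };
  return null; // campaign plan complete
};

const campaignResults = runWorkflow("run my campaign", planner, tools);
```

In a real agent, the planner is an LLM call and the tools are MCP servers or APIs, but the shape of the loop is the same.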
You then need a speech-to-text model to translate what you said back into text. Alternatively, we're all familiar with chat UIs, so you need somewhere to host that. Then, ideally, you're using some sort of gateway to do caching and to run your evals, so that as you iterate on the overall process, you can make sure things are getting better and better. Then you need to send that request to an LLM that's going to do the thinking and come up with the rest of the plan. From there, you need a workflow agent: that's what keeps track of which actions have been executed and which actions need to happen next. And then again, you need to connect to your tools. A tool can be a web browser, an API, an internal service you need to connect to, or a vector database if the agent needs to grab additional knowledge. Sometimes you're also going to need a human in the loop to verify some of the actions you're taking. So, how do you build an agent? I'm actually going to go backwards here and start with the tools part. Most recently, there's been a lot of talk about MCP. Anthropic introduced this new standard back in November, and I think the really interesting thing about it is that it got people thinking about how to expose APIs to LLMs in a way that lets us humans talk to LLMs in natural language. But I think the real missed headline of MCP was that LLMs became really, really good at tool calling. That wasn't so much the case a few years ago if you tried to play around with tool calling, but now they are. And so we have a new standard for how to write your code in a way that's incredibly easy to consume by any MCP client.
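The request path just described, voice or chat in, then gateway, then LLM, then workflow and tools, can be sketched as a chain of stages. This is purely illustrative wiring with stubbed stages, not real WebRTC, gateway, or LLM code:

```typescript
// Illustrative wiring of the request path described above. Each stage is a
// stub; in a real system these would be a speech-to-text model, an AI
// gateway (caching + evals), an LLM call, and a workflow engine with tools.
type Stage = (input: string) => string;

const speechToText: Stage = (audio) => audio.replace("audio:", ""); // stub STT
const gateway: Stage = (text) => text.trim(); // stub: would cache and log for evals
const llmPlan: Stage = (text) => `plan(${text})`; // stub reasoning step
const workflow: Stage = (plan) => `executed ${plan}`; // stub tool execution

// Compose the stages into one request path.
const pipeline = (stages: Stage[]) => (input: string) =>
  stages.reduce((acc, stage) => stage(acc), input);

const handleVoiceRequest = pipeline([speechToText, gateway, llmPlan, workflow]);
const result = handleVoiceRequest("audio: email my customers ");
// result: "executed plan(email my customers)"
```

The point of drawing it this way is that each stage is swappable: a chat UI replaces the speech-to-text stage, but the gateway, reasoning, and workflow stages stay the same.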
The other really cool thing about MCP is that it respects a traditional client-server architecture, where you're able to have that conversation back and forth and, importantly, have more than one client connect to the same MCP server. These are some of the core concepts that go into MCP: MCP servers generally have resources, prompts, tools, and sampling. Resources can be anything from file contents to database records. Prompts help you define how you want someone else to interact with your agent, because you can prompt your agent probably better than anyone else can; if there are any nuances about how your system works, you want to build that in as much as possible. Then you want to give it access to the actual tools and connect those queries with them. And last but not least, sampling. The interesting conclusion I came to while preparing this talk is that I haven't actually seen anyone using sampling in production in an MCP server yet, but the idea is to let you use a kind of shorthand with your LLM and allow it to complete some of the thinking behind it. Building an MCP server does come with some tricky parts, though, and I think the trickiest are the transport protocol (over SSE and WebSockets), the OAuth part, and the memory part. But I'm going to share a cheat code with everyone here. Get ready, I'm going to flash it real quick. Oh, you missed it. No, I'm just kidding. Cloudflare has an SDK called Agents that you can install, which gives you a lot of this functionality out of the box. We released the Agents SDK a few months ago, and yes, it has the same name as the one OpenAI released a few days ago as well, but the two actually play with each other really well.
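As a rough mental model of those four core concepts, an MCP server's surface area looks something like the shape below. The real protocol is JSON-RPC over a transport, and these are not the actual SDK types; every name here is illustrative:

```typescript
// Rough TypeScript model of an MCP server's surface area. The real protocol
// is JSON-RPC over a transport (stdio, SSE, streamable HTTP); this only
// shows how the four concepts relate. All names are illustrative.
interface McpServerShape {
  // Resources: readable context such as file contents or database records.
  resources: Record<string, () => string>;
  // Prompts: server-authored templates for how clients should talk to it.
  prompts: Record<string, (args: Record<string, string>) => string>;
  // Tools: actions the model can invoke.
  tools: Record<string, (args: Record<string, string>) => string>;
  // Sampling: the server asking the *client's* LLM to complete some text.
  sample?: (request: string) => string;
}

const bookServer: McpServerShape = {
  resources: {
    "shelf://read": () => "The Talented Mr. Ripley; Strangers on a Train",
  },
  prompts: {
    recommend: (a) => `Recommend a book for a fan of ${a.genre}.`,
  },
  tools: {
    add_genre: (a) => `saved genre: ${a.genre}`,
  },
};

const recommendPrompt = bookServer.prompts.recommend({ genre: "thrillers" });
// recommendPrompt: "Recommend a book for a fan of thrillers."
```

Note how the prompt lives on the server side: the server author, who knows the system's nuances best, ships the prompt alongside the tools rather than leaving every client to guess.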
Let me tell you a little bit about what it does. First of all, you can use the Agents SDK to run MCP servers, and it comes with a built-in class called McpAgent that lets you host remote MCP servers with OAuth, transport, and HTTP streaming all built in. So if you're one of those people who never wants to touch OAuth again, this lets you do that. The really cool thing is that it has state management built in, because Cloudflare has a primitive called Durable Objects. The idea behind Durable Objects is basically that it's like a serverless function, but with state attached directly to it. If you've ever wanted to write some code and save its state without ever having to set up a database or anything like that, this is a really great way to do it, and it makes it really easy to build these MCP servers. It also comes with real-time WebSocket communication, which makes the whole chat interface really easy, React integration hooks so you can integrate it into your front end, and basic chat capabilities. So let's walk through what it would actually look like to deploy an MCP server on Cloudflare. First, I define my MCP class that extends McpAgent, which I was just talking about. This MCP server is going to be kind of like a Goodreads server that recommends books to us. We set an initial state that's empty. Then I can give it a tool called add_genre, so I can start to specify my preferences. I'm a big Patricia Highsmith fan, so I can say I really like thrillers, and it's going to save that and persist it for future interactions. And then I can have a separate tool called get_recommendations that gets book recommendations.
And, as we discussed with MCP prompts, you can have a personalized prompt for recommending books to someone who likes the genres, and has read the books, that you've previously specified. It's a really good way to get personalized recommendations. Every time you interact with this tool, it persists the memory, so the recommendations keep getting better and better. And because this MCP server is standalone and can be interacted with through various clients, the memory persists regardless of which client you use to call into it. Now, why is this great? It's amazing because traditionally you would have to separately set up a database, manage connections, and handle scaling, and there would be added latency in the setup. With McpAgent, because the memory is built in, you don't have to do any of that: it scales automatically, it runs close to your AI agent, and you don't really need to think about infrastructure at all. You just get all of that out of the box. We have a blog post up, so you can go and deploy your first MCP server today. It's really easy; there is literally a "Deploy to Cloudflare" button, and it takes less than a minute to get your initial MCP server up and running. What's been really cool is working with some of the brands that we respect so much and seeing companies like Atlassian, Asana, Stripe, and Intercom building their own MCP servers in this exact way. So you're going down a really well-trodden path here. Okay, so that was the tools part. Let's keep working backwards from there. We've given our agents access to tools, but now we need a coordination component, right?
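Before moving on to coordination, here is a minimal, self-contained sketch of the Goodreads-style example above: tool calls that share state which persists across interactions. Plain TypeScript stands in for McpAgent plus Durable Object storage, whose actual API I'm not reproducing; the class, method, and tool names are all illustrative.

```typescript
// Minimal sketch of an MCP-style agent whose tool calls share persistent
// state, in the spirit of the Goodreads-style example above. This stands in
// for McpAgent + Durable Objects; names are illustrative, not the real API.
type BookState = { genres: string[]; booksRead: string[] };

class BookRecommender {
  // With Durable Objects this state would survive across requests and
  // across different MCP clients; here it lives for the process lifetime.
  private state: BookState = { genres: [], booksRead: [] };

  // Tool: record a genre preference so future calls can use it.
  addGenre(genre: string): string {
    if (!this.state.genres.includes(genre)) this.state.genres.push(genre);
    return `Added genre: ${genre}`;
  }

  // Tool: recommend based on everything persisted so far, skipping
  // books the user has already read.
  getRecommendations(catalog: Record<string, string[]>): string[] {
    return this.state.genres
      .flatMap((g) => catalog[g] ?? [])
      .filter((b) => !this.state.booksRead.includes(b));
  }

  markRead(book: string): void {
    this.state.booksRead.push(book);
  }
}

const agent = new BookRecommender();
agent.addGenre("thrillers");
agent.markRead("The Talented Mr. Ripley");
const recs = agent.getRecommendations({
  thrillers: ["The Talented Mr. Ripley", "Strangers on a Train"],
});
// recs omits the book already read
```

The key property is that `addGenre` and `getRecommendations` are separate tool calls, possibly from separate clients, yet they see the same state, which is exactly what the built-in memory buys you.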
We need a workflow that maintains state not just through that one tool interaction, but through the entire chain, perhaps with a human in the loop. Human-in-the-loop workflows require really long-running tasks that sometimes need to talk to an LLM; it might be a reasoning LLM that takes several minutes to come up with a response. And similarly, a human in the loop could take minutes, hours, days, or months to respond. So you need something that can come back and resume its flow after that task is completed. You also still need to consider things like WebSocket servers, state persistence, retries, and horizontal scaling, and these things can get quite tricky. So again, let's walk through a real use case that we built out with a customer. There's a company called Knock that does notification management, and they needed an agent that would handle approval when you request a new credit card: your boss needs to go and approve it, whether through email, Slack, or an in-app notification. So what do we need to do? First, we need to allow users to request a new card through a chat interface. You can see here that we're importing useAgent from the Agents React library, and then we create a new chat instance that has all of these things instantiated on our behalf; this is all part of the Agents SDK. Then we need to give it the ability to issue cards through an issue-card action, but we need to wrap it in the require-human-input tool in order to delegate that piece to Knock. We want to make sure that the issue-card tool always requires human input. Then we need Knock to send our approval notifications, and we defer the tool call that issues the card until there is approval. Right?
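The deferred-tool-call pattern just described can be sketched like this. It models the flow, wrap a sensitive tool so it never runs immediately, then resume it when the approval arrives, but it is not Knock's API or the Agents SDK's actual require-human-input implementation; all names are hypothetical.

```typescript
// Sketch of the "require human input" pattern: a sensitive tool call is
// deferred until an approval arrives, then resumed. This models the flow
// described above; it is not Knock's or the Agents SDK's actual API.
type PendingCall = {
  id: string;
  action: () => string;
  status: "pending" | "approved" | "denied";
};

class HumanInTheLoop {
  private pending = new Map<string, PendingCall>();

  // Wrap a tool so it never executes immediately; the action is stored
  // and an approval request goes out (e.g. via email, Slack, or in-app).
  requireHumanInput(id: string, action: () => string): string {
    this.pending.set(id, { id, action, status: "pending" });
    return `Sent request ${id} for approval`;
  }

  // Called when the approval decision comes back (e.g. from a webhook);
  // resumes the deferred tool call only if it is still pending.
  resolve(id: string, approved: boolean): string {
    const call = this.pending.get(id);
    if (!call || call.status !== "pending") return "no pending call";
    call.status = approved ? "approved" : "denied";
    return approved ? call.action() : `Request ${id} denied`;
  }
}

const loop = new HumanInTheLoop();
loop.requireHumanInput("card-1", () => "issued card ****4242");
const outcome = loop.resolve("card-1", true);
// outcome: "issued card ****4242"
```

Because the human might respond days later, the `pending` map is exactly the kind of state you'd want in a Durable Object rather than in process memory.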
So we have a tool call to get a new card provisioned, but we want to stall it on the actual approval. You can see that in here, where we route the messages to approve something. Now, once something is approved, we need to route it back to the appropriate agent, and this is automatically handled by the Durable Object and instantly routed back to the correct agent. You can see in here that I find the user ID from the tool call for the calling user, and then I can look it up: I get the agent by name, by the user ID. If it's an existing agent, we route to the correct Durable Object and make sure we're handling it with the correct webhook. We then need to resume the paused tool call, issue the card, and let the user know that the card was approved. So in here, if we received an approved status, we can move on with the deferred tool execution that we defined earlier. And last but not least, we need to make sure duplicate actions don't occur: if two things happen out of sync, we can't approve the card twice, or provision the card twice. This is where that state management becomes really important again, and we're able to store all of this directly in the state. You can see whether the card has been requested or processed already, and once it's been approved, we set the status so that when a new webhook comes in, we can't re-approve the same exact one. So, we talked about tools, we talked about workflows. Next you need the reasoning piece, and you need to choose the right model to run this.
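The deduplication idea above, that a second webhook for the same request must not provision a second card, boils down to an idempotency check against stored status. Here is a minimal self-contained sketch; the status values and method names are illustrative, not the customer's actual schema.

```typescript
// Sketch of idempotent webhook handling: only the first approval webhook
// for a given request issues a card; out-of-sync duplicates are ignored.
// Status values and names are illustrative.
type CardStatus = "requested" | "processed";

class CardState {
  private status = new Map<string, CardStatus>();
  private issued = 0;

  request(id: string): void {
    if (!this.status.has(id)) this.status.set(id, "requested");
  }

  // Webhook handler: check stored status before acting, then update it
  // so a replayed or duplicate webhook becomes a no-op.
  handleApprovalWebhook(id: string): string {
    if (this.status.get(id) !== "requested") {
      return `ignored duplicate webhook for ${id}`;
    }
    this.status.set(id, "processed");
    this.issued += 1;
    return `issued card for ${id}`;
  }

  cardsIssued(): number {
    return this.issued;
  }
}

const cards = new CardState();
cards.request("req-7");
const first = cards.handleApprovalWebhook("req-7");
const second = cards.handleApprovalWebhook("req-7"); // out-of-sync duplicate
// first issues the card; second is ignored; exactly one card is issued
```

With a Durable Object, the check-then-set sequence runs against a single instance per request ID, which is what makes this safe even when webhooks race.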
I'm actually going to skip this part, because there's an entire conference dedicated to it today, with people who will cover it far better than I will. Logan's talk this morning about everything happening with Gemini was really, really good, and there are a bunch of people talking about evals. But then you still need a client in order to connect to your server, right? And again, this is the really beautiful thing about MCP: once you've built out your MCP server, you can truly meet your users where they are. Realistically, the nice thing is you don't have to build a UI yourself at all. If your users are developers, they're most likely already using Cursor, and now that Cursor supports remote MCP servers, you just import your MCP server and your clients can interact with it. Similarly, Claude and ChatGPT both support remote MCP servers, so again, your users can start using your agents instantly, directly through there. But you can also build your own app and your own MCP client, and I think this is where you can build really interesting agentic workflows, when you do have more control over both the client and the server and over connecting the two pieces together. And not only that: your app doesn't have to be limited to just being a user interface. You can also talk to your MCP client over voice, especially with some of the Cloudflare tools we've built out that help translate WebRTC to WebSocket in a way that makes it really easy to build out these applications, because the MCP client can easily understand those connections. So yeah, how do you build an agent? These are the four pieces you need: your client, your AI, your workflows, your tools. And if you want to get started and don't know where to start, I really, really highly recommend the Agents SDK.
You'll be able to get up and running in just a few minutes. So, thank you.