Building Agents at Cloud Scale — Antje Barth, AWS
Channel: aiDotEngineer
Published at: 2025-08-02
YouTube video id: WJjInLeaJjo
Source: https://www.youtube.com/watch?v=WJjInLeaJjo
[Music] Hi everyone. I'm thrilled to be back on stage here again at the AI Engineer World's Fair, and it's amazing to see this community grow. So today I'm going to speak about how we can build agents at cloud scale. Now, at Amazon and AWS we truly believe that virtually every customer experience we know of will be reinvented with AI. And not just the existing experiences; there will also be brand new experiences we are now able to build with the help of AI agents. And we're not just theorizing about this, right? We're all here together to actually build the future. Now, I want to start with a little bit of what that means internally across Amazon as a business. At Amazon, we have over 1,000 generative AI applications either built or in development, transforming everything from how we forecast inventory to how we optimize delivery routes to how customers shop and how they interact with their homes. And one of the most ambitious deployments of AI agents is the complete reimagining of Alexa. And I know many of us have been waiting for this for a long time. What you're about to see here represents the largest integration of services, agentic capabilities, and LLMs that we know of anywhere. So let's have a brief look. [Alexa+ demo video] Wow. Wow. Look at my style. I know you ain't seen it like this in a while. Oh, hey there. So, we can just, like, talk now. I'm all ears. Figuratively speaking. Do you know how to manage my kids' schedules? I noticed a birthday party conflicts with picking up grandma at the airport. Want me to book her a ride? Billie Eilish is in town soon. No way. I can share when tickets are available in your city. Yes, please. Got any spring break ideas? Somewhere not too far, only if there's a beach and nice weather. Santa Barbara is great for everyone. I found a restaurant downtown I think you'd like. What is Santa Barbara known for? It has great upscale shops and oceanfront dining. Can you go whale watching? Absolutely.
Want me to book a catamaran tour? Wow. What's the next step? Remove the nut holding the cartridge. Should I get bangs? You might only love them for a little while. You're probably right. Make a slideshow of baby teap. Mom, what part am I looking for again? 2-inch washers. Your Uber is 2 minutes away. For real? Wait, did someone let the dog out today? I checked the cameras, and yes, in fact, Mozart was just out. I love sharing this video because it really shows the power of agents at scale. Just to have a quick look at what that means in terms of numbers: we have over 600 million Alexa devices now out in the world, and with the help of the latest advancements in AI, we were able to really reimagine this experience. Alexa+ works through hundreds of specialized experts; that's what the Alexa team calls groups of capabilities, APIs, and instructions that accomplish a specific task for you. And all of these experts also orchestrate across tens of thousands of partner services and devices to get things done, which you just saw a glimpse of in this video. And we truly believe that the future will be full of those specialized agents, each with their own unique capabilities, working together seamlessly with other AI agents. Now, this example shows what's possible at massive scale. But how do we get there? How do we operate at this scale? Or, said differently, how do we move from the web services we've built for many years now into developing those agentic services? Luckily, many of the underlying principles remain the same, whether you're building for millions of devices, whether you're reimagining and integrating AI experiences into your enterprise applications, or you're a startup really looking to scale your idea to the next level. Now, another example I want to show you is an agentic service that we built at AWS.
You might have heard about Amazon Q Developer, which is our coding assistant that helps you across the software development life cycle. Just a few months ago, we released a Q Developer agent for your CLI. It brings the agentic chat experience into the terminal: it helps you debug issues, you can ask it natural-language questions, it can read and write files, and it really helps make your day-to-day in the terminal more productive. So let's have a quick look at how this works. Here is Amazon Q in the CLI, and I'll just ask a question here, in this case: hey, what do you know about Amazon Bedrock? The CLI is integrated with MCP, so what it does is actually figure out that there is a tool. Our AWS documentation team has released an MCP server, and it's connecting to it. You see the tool call happening, and it's asking for permission. So I give it the permission, and then it comes back with a response that is grounded in the official AWS documentation. Now, I don't want to talk much more about Q, but I do want you to quickly think about how long it took the AWS internal teams to build and ship this agentic service. Let's do a quick raise of hands: who thinks it took two months to develop and ship this? That's a few hands. Who thinks three weeks? All right, a bunch more hands. Who thinks it took half a year? Almost none. Wow, you folks are great. We built and shipped this within three weeks. And to me, this is almost insane, right? The speed. And we heard it earlier, one of the keynote speakers called it out: the moat in AI is execution. And I think three weeks is super impressive. Now, how do we enable teams, not just internally at AWS but in general, to build and ship production-ready AI agents this quickly? Internally, our teams needed to fundamentally rethink how to build agents.
And what we did is develop a model-driven approach that really taps into the power of today's LLMs, models that are so much more capable at deciding, planning, reasoning, and taking actions, and lets developers focus on what their agent should do rather than telling it exactly how to do it. And the great news is we made it available for all of you to use as well. Just a few weeks ago, we released Strands Agents. It's an open-source Python SDK which you can check out to start building and running AI agents in just a few lines of code. So let me quickly show you how this looks. And before I go in here, just a fun fact: if you wonder why we called it Strands Agents, well, this is what happens if you let AI pick its own name. The reasoning behind it, because again the AI agent is capable of reasoning, was: think about the two strands of DNA. Just like the two strands of DNA, Strands Agents connects the two core pieces of an agent together: the model and the tools. And it helps you build agents; it simplifies things by letting you rely on those state-of-the-art models to reason, to plan, and to take action. You can simply start by defining a prompt and your tools in code, test it out locally, and then, once you're ready, deploy it, for example, in the cloud. And this is how simple it is. Again, just a couple of lines; it should look pretty familiar. You install strands-agents, you import it, and it comes with pre-built tools, which I'll talk about in a bit more detail; basically, you just add the tools to your agent, and then you can start asking it questions or building more complex workflows with it. Now, by default, Strands Agents integrates with Amazon Bedrock as the model provider, so you can see the model config here using Claude 3.7 Sonnet. But of course, it's not just limited to AWS. You can use Strands Agents across multiple providers. For example, we have integrations with Ollama.
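Conceptually, the model-driven approach described here is a loop: the model decides which tool to call, the framework executes it, and the result goes back to the model until it can answer. Here is a toy sketch of that loop in plain Python; the names, the `calculator` tool, and the hard-coded fake model are all my own illustrations, not the Strands Agents API.

```python
# Toy sketch of a model-driven agent loop. A real framework like
# Strands Agents would call an actual LLM here; fake_model is a
# stand-in that first plans a tool call, then a final answer.

def calculator(expression: str) -> str:
    """A tiny 'tool': evaluate a simple arithmetic expression."""
    return str(eval(expression, {"__builtins__": {}}, {}))

TOOLS = {"calculator": calculator}

def fake_model(prompt: str, history: list) -> dict:
    """Stand-in for an LLM: decides on a tool call, then answers."""
    if not history:  # first turn: decide to use the calculator
        return {"tool": "calculator", "input": {"expression": "6 * 7"}}
    return {"answer": f"The result is {history[-1]['result']}."}

def run_agent(prompt: str) -> str:
    history = []
    while True:
        decision = fake_model(prompt, history)
        if "answer" in decision:           # model is done reasoning
            return decision["answer"]
        tool = TOOLS[decision["tool"]]     # model picked a tool
        result = tool(**decision["input"])
        history.append({"tool": decision["tool"], "result": result})

print(run_agent("What is 6 times 7?"))  # -> The result is 42.
```

The point of the pattern is that the developer only declares the tools and the prompt; the decide-execute-observe cycle is driven by the model, which is what lets a few lines of code stand in for hand-written orchestration logic.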
So you can start developing locally and testing it out. We have Anthropic integrations, Meta integrations to the Llama API, you can use OpenAI models, and any other provider available through the integration with LiteLLM. And of course, you can also develop your own custom model provider. Now, quickly on the tools: as I said, Strands Agents comes with over 20 pre-built tools, anything from simple tasks like file manipulation and API calls, obviously integrating with AWS services, to more complex use cases, and I just want to call out a couple of them. There's a whole group of integrated tools for memory and RAG. One tool specifically, called retrieve, lets you do semantic search over a knowledge base. And just to show you the power of this: we have an internal agent at AWS that manages over 6,000 tools. Now, 6,000 is a hard number of tools to put into a single context window and give to one model to decide over. So what we did is put the descriptions of those tools in a knowledge base and use the retrieve tool, so the agent can find the most relevant tools for the task at hand and pull only those back into the model context for the model to decide which one to use. So that's just one use case of how we're leveraging that. There is also support for multimodality across images, video, and audio with Strands. There is a tool to prompt for more thinking and deep reasoning, and it also comes with pre-built tools to implement multi-agent workflows, whether graph-based workflows or a swarm of sub-agents working together. Now, you cannot talk about tools without mentioning MCP, right? So obviously we integrated MCP natively within Strands, so you can use it to connect to thousands of available MCP servers and make them available as tools for your agent. Support for A2A is also coming soon. But let's start and talk a little bit about MCP first.
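The retrieve pattern described above (index all tool descriptions, pull back only the relevant ones) can be sketched as follows. A real setup would use embeddings and a managed knowledge base such as Amazon Bedrock Knowledge Bases; this toy version scores by simple word overlap just to show the shape of the idea, and all names in it are illustrative.

```python
# Sketch of tool retrieval for an agent with too many tools to fit
# in one context window: search the tool descriptions, hand the
# model only the top matches. Scoring here is naive word overlap;
# a production system would use semantic (vector) search.

TOOL_DESCRIPTIONS = {
    "roll_dice": "roll a dice with a configurable number of sides",
    "create_s3_bucket": "create a new Amazon S3 bucket",
    "resize_image": "resize an image to the given width and height",
    "send_email": "send an email message to a recipient",
}

def retrieve_tools(query: str, top_k: int = 2) -> list[str]:
    """Return names of the top_k tools most relevant to the query."""
    query_words = set(query.lower().split())
    scored = [
        (len(query_words & set(desc.lower().split())), name)
        for name, desc in TOOL_DESCRIPTIONS.items()
    ]
    scored.sort(reverse=True)  # highest overlap first
    return [name for score, name in scored[:top_k] if score > 0]

# Only the matching tools get bound to the agent, keeping the
# context small even if thousands of tools are registered.
print(retrieve_tools("please roll a 20 sided dice"))
```

This is why the internal 6,000-tool agent stays workable: the model never sees the full catalog, only the handful of candidates the retrieval step surfaces for the task at hand.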
If you're building on AWS already, make sure to bookmark this GitHub repo: awslabs/mcp. Here you can find a very long and growing list, much longer than what you see on this slide, of MCP server implementations, specifically if you're working and building on AWS. Now, one of the challenges stems from the fact that when we all started building MCP servers, what we had was standard IO, right? It started out to help you locally connect your systems, your clients, to the respective tools. And here's just a quick example, which is important for a demo I'll show in a little bit. This is a standard IO implementation of an MCP server. It should look familiar to most of you working with MCP: it uses the Python SDK with FastMCP. All I'm doing here is set up my server and use the decorator to define a tool. In this case, my tool is to roll a dice, and you might see in the code that it has an input to define the number of sides. And I had to put a picture here because, I have to admit, I just learned this myself. Do we have D&D fans in the room? Woohoo. All right, a few of them. So you all know what I'm talking about. For the rest of us: I just learned there are dice, and I have one here, not sure if the camera can catch this, it's the one on the slide, that have, for example, 20 sides. Something very normal in the D&D world to start your game, I think. Don't ask me questions about D&D; my colleague Mike Chambers, who is either here or in the expo right now, built the demo, so kudos to him, and he can answer all of the D&D questions. All right, just keep that in mind; I'll come back to this in just a second. Now, what we want to do here is decouple and connect to remote MCP servers, because the topic is scale, right? And in the AWS world, the way to do this is as easy as deploying it as a Lambda function. So we can do this now with streamable HTTP.
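Before moving to the remote version, the stdio dice server described above boils down to a tool like this. This is a plain-Python reconstruction of its core logic; in the real demo the function is registered as an MCP tool with FastMCP's decorator (roughly `mcp = FastMCP(...)`, `@mcp.tool()` above the function, then `mcp.run()`), and the exact names here are my assumption, not the demo's code.

```python
import random

# Core logic of the demo's dice-rolling MCP tool: one input
# defines the number of sides, so a d20 is roll_dice(sides=20).

def roll_dice(sides: int = 6) -> int:
    """Roll one die with the given number of sides."""
    if sides < 2:
        raise ValueError("a die needs at least 2 sides")
    return random.randint(1, sides)

print(roll_dice(sides=20))  # e.g. 7, as in the live demo
```

Run as a stdio MCP server, this tool is only reachable by clients on the same machine, which is exactly the limitation the Lambda deployment in the next step removes.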
And the same concepts apply. You put your Lambda functions, as you would have before, behind an API gateway and then connect. And because we care about security and authorization, in the quick demo I'm going to show you, I'm using an authorizer; you can also plug in Cognito for this part. And I'm also going to store session data in a DynamoDB table. So let's roll this quick demo. What you see here is an MCP Lambda handler that we developed. It's available on the GitHub repo, and it makes it really easy to set up your MCP server in Lambda. Here's a very simple hello-world example: the tool is again defined with a tool decorator, and then in the Lambda handler function you reference the incoming event and pass it to that MCP server. Now, if we look at the server implementation, here we're doing a little bit more. You can see how we're adding session-table support, which is a DynamoDB table. We're defining the tool; this is the rolling-dice tool that I just pointed out, but this time it's hosted as a Lambda function. You can write all the code you want to have there as well. And then, at the very end, it's the same single line that, when you call the Lambda function, passes the request on to the MCP server. Let's deploy this. And again, we're using the existing tools to deploy Lambda functions as we have before. This one uses AWS SAM to deploy to the cloud, and then we will receive the API Gateway URL as well. Now, from the client side, I'm using Strands Agents, as you can see, and I am using the MCP integration. I'm passing my API Gateway URL to connect, and for authorization I have a bearer token. Again, this is a simple concept demo, but you can build more robust integrations here as well. I'm calling the list-tools operation and then passing those tools to my agent as we've seen before; this time it's the MCP-provided tools.
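To make the shape of the Lambda-hosted server concrete, here is a heavily simplified stand-in: a handler that receives an API Gateway-style event, dispatches to a registered tool, and returns the result. The real demo uses the MCP Lambda handler from the AWS repo (speaking MCP over streamable HTTP, with a DynamoDB session table); none of the names below are that library's API, and the dispatch here is deliberately bare JSON rather than the MCP wire protocol.

```python
import json
import random

TOOLS = {}

def tool(fn):
    """Register a function as a callable tool (mimics a tool decorator)."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def roll_dice(sides: int = 6) -> int:
    """The same dice tool, now hosted server-side."""
    return random.randint(1, sides)

def lambda_handler(event, context):
    """Entry point a Lambda function would expose behind API Gateway."""
    body = json.loads(event["body"])
    fn = TOOLS.get(body["tool"])
    if fn is None:
        return {"statusCode": 404,
                "body": json.dumps({"error": "unknown tool"})}
    result = fn(**body.get("arguments", {}))
    return {"statusCode": 200, "body": json.dumps({"result": result})}

# A remote client (e.g. a Strands agent) would POST something like:
event = {"body": json.dumps({"tool": "roll_dice",
                             "arguments": {"sides": 20}})}
print(lambda_handler(event, None))
```

The key design point is the single dispatch line at the end of the handler: the Lambda function itself stays a thin shell, and everything else (sessions, auth, additional tools) plugs in around it using the AWS building blocks you already know.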
And then if we run this, we can quickly see it in action. We're basically going to ask it to roll a dice, and we're asking for a d20, so again, 20 sides. And it's coming back. What did we roll? You can see the tool kicking in here. We rolled a seven. Great. So this is really just a quick example. The good news is, once you're in the AWS world and you're working in Lambda, everything you can build with Lambda you can integrate there. So you basically have access again to all of the great features, capabilities, and applications you might have already built on AWS. Now, the next step here is: how do we make agents talk to each other, right? That's kind of the next frontier. And we are super excited about all the open protocols that are emerging right now. With MCP, for example, we joined the steering committee; we're an active part of the community, contributing code and helping to further evolve MCP. If you want to learn more about this, here is the QR code. We have a whole blog series started on our open source blog. Feel free to check that out as we continue to help evolve those protocols. Now, what's next? We're all aware that this is just the beginning, right? There will be so much more coming. And if you had a chance to check out my colleague Danielle's talk yesterday on useful general intelligence, I just want to quote her: "The atomic unit of all digital interactions will be an agent call." So we can imagine a future where you might just have a personal agent, like shown here, connecting to an agent store, with agents working together to accomplish tasks for you. And some of you here in the room might already be building this, right? So let's go and build this future together. Thanks so much. Check out the additional sessions we have: my colleague Mike is going much deeper into the rolling-dice demo, everything MCP and Strands, and my colleague Suman will also have a deep dive on Strands tomorrow.
And with that, thank you very much. Check us out in the expo hall and grab your own d20.