Building Agents at Cloud Scale — Antje Barth, AWS
Channel: aiDotEngineer
Published at: 2025-08-02
YouTube video id: WJjInLeaJjo
Source: https://www.youtube.com/watch?v=WJjInLeaJjo
[Music] Hi everyone. I'm thrilled to be back on stage here again at the AI Engineer World's Fair, and it's amazing to see this community grow. So today I'm going to speak about how we can build agents at cloud scale. Now, at Amazon and AWS we truly believe that virtually every customer experience we know of will be reinvented with AI. And not just the existing experiences; there will also be brand new experiences we are now able to build with the help of AI agents. And we're not just theorizing about this, right? We're all here together to actually build the future. Now, I want to start with a little bit of what that means internally across Amazon as a business. At Amazon, we have over 1,000 generative AI applications either built or in development, transforming everything from how we forecast inventory to how we optimize delivery routes to how customers shop and how they interact with their homes. And one of the most ambitious deployments of AI agents is the complete reimagining of Alexa. And I know many of us have been waiting for this for a long time. What you're about to see here represents the largest integration of services, agentic capabilities, and LLMs that we know of anywhere. So let's have a brief look. [Alexa+ demo video] Wow. Wow. Look at my style. I know you ain't seen it like this in a while. Oh, hey there. So, we can just, like, talk now. I'm all ears. Figuratively speaking. Do you know how to manage my kids' schedules? I noticed a birthday party conflicts with picking up grandma at the airport. Want me to book her a ride? Billie Eilish is in town soon. No way. I can share when tickets are available in your city. Yes, please. Got any spring break ideas? Somewhere not too far, only if there's a beach and nice weather. Santa Barbara is great for everyone. I found a restaurant downtown I think you'd like. What is Santa Barbara known for? It has great upscale shops and oceanfront dining. Can you go whale watching? Absolutely.
Want me to book a catamaran tour? Wow. What's the next step? Remove the nut holding the cartridge. Should I get bangs? You might only love them for a little while. You're probably right. Make a slideshow of baby teap. Mom, what part am I looking for again? 2-inch washers. Your Uber is 2 minutes away. For real? Wait, did someone let the dog out today? I checked the cameras, and yes, in fact, Mozart was just out. I love sharing this video because it really shows the power of agents at scale. Just to have a quick look at what that means in terms of numbers: we have over 600 million Alexa devices now out in the world, and with the help of the latest advancements in AI, we were able to really reimagine this experience. Alexa+ works through hundreds of specialized experts; that's what the Alexa team calls groups of capabilities, APIs, and instructions that accomplish a specific task for you. And all of these experts also orchestrate across tens of thousands of partner services and devices to get things done, which you just saw a glimpse of in this video. And we truly believe that the future will be full of those specialized agents, each with their own unique capabilities, working together seamlessly with other AI agents. Now, this example shows what's possible at massive scale. But how do we get there? How do we operate at this scale? Or, said differently, how do we move from the web services we've built for many years now into developing those agentic services? Luckily, many of the underlying principles remain the same, whether you're building for millions of devices, whether you're reimagining and integrating AI experiences into your enterprise applications, or you're a startup really looking to scale your idea to the next level. Now, another example I want to show you is an agentic service that we built at AWS.
You might have heard about Amazon Q Developer, which is our coding assistant that helps you across the software development life cycle. Just a few months ago, we released a Q Developer agent for your CLI. It brings the agentic chat experience into the terminal: it helps you debug issues, you can ask it natural-language questions, it can read and write files, and it really helps make your day-to-day in the terminal more productive. So let's have a quick look at how this works. Here is Amazon Q in the CLI, and I'll just ask a question here, in this case: hey, what do you know about Amazon Bedrock? The CLI is integrated with MCP, so what it does is actually figure out that there is a tool. Our AWS documentation team has released an MCP server, and it's connecting to it. You see the tool call happening, and it's asking for permission. So I give it the permission, and then it comes back with a response that is grounded in the official AWS documentation. Now, I don't want to talk much more about Q, but I do want you to quickly think about how long it took the AWS internal teams to build and ship this agentic service. Let's do a quick raise of hands: who thinks it took two months to develop and ship this? That's a few hands. Who thinks three weeks? All right, a bunch more hands. Who thinks it took half a year? Almost none. Wow, you folks are great. We built and shipped this within three weeks. And to me, this is almost insane, right? The speed. And we heard it earlier, one of the keynote speakers called it out: the moat in AI is execution. And I think three weeks is super impressive. Now, how do we enable teams, not just internally at AWS but in general, to build and ship production-ready AI agents this quickly? Internally, our teams needed to fundamentally rethink how to build agents.
And what we did is develop a model-driven approach that really taps into the power of today's LLMs, models that are so much more capable at deciding, planning, reasoning, and taking actions, and lets developers focus on what their agent should do rather than telling it exactly how to do it. And the great news is we made it available for all of you to use as well. Just a few weeks ago, we released Strands Agents. It's an open-source Python SDK which you can check out to start building and running AI agents in just a few lines of code. So let me quickly show you how this looks. And before I go in here, just a fun fact: if you wonder why we called it Strands Agents, well, this is what happens if you let AI pick its own name. The reasoning behind it, because again the AI agent is capable of reasoning, was: think about the two strands of DNA. Just like the two strands of DNA, Strands Agents connects the two core pieces of an agent together: the model and the tools. And it helps you build agents; it simplifies things by letting you rely on those state-of-the-art models to reason, to plan, and to take action. You can simply start by defining a prompt and your tools in code, test it out locally, and then, once you're ready, deploy it, for example, in the cloud. And this is how simple it is. Again, just a couple of lines; it should look pretty familiar. You install strands-agents, you import it, and it comes with pre-built tools, which I'll talk about in a bit more detail; basically, you just add the tools to your agent, and then you can start asking it questions or building more complex workflows with it. Now, by default, Strands Agents integrates with Amazon Bedrock as the model provider, so you can see the model config here using Claude 3.7 Sonnet. But of course, it's not just limited to AWS. You can use Strands Agents across multiple providers. For example, we have integrations with Ollama.
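Conceptually, the model-driven approach described here is a loop: the model decides which tool to call, the framework executes it, and the result goes back to the model until it can answer. Here is a toy sketch of that loop in plain Python; the names, the `calculator` tool, and the hard-coded fake model are all my own illustrations, not the Strands Agents API.

```python
# Toy sketch of a model-driven agent loop. A real framework like
# Strands Agents would call an actual LLM here; fake_model is a
# stand-in that first plans a tool call, then a final answer.

def calculator(expression: str) -> str:
    """A tiny 'tool': evaluate a simple arithmetic expression."""
    return str(eval(expression, {"__builtins__": {}}, {}))

TOOLS = {"calculator": calculator}

def fake_model(prompt: str, history: list) -> dict:
    """Stand-in for an LLM: decides on a tool call, then answers."""
    if not history:  # first turn: decide to use the calculator
        return {"tool": "calculator", "input": {"expression": "6 * 7"}}
    return {"answer": f"The result is {history[-1]['result']}."}

def run_agent(prompt: str) -> str:
    history = []
    while True:
        decision = fake_model(prompt, history)
        if "answer" in decision:           # model is done reasoning
            return decision["answer"]
        tool = TOOLS[decision["tool"]]     # model picked a tool
        result = tool(**decision["input"])
        history.append({"tool": decision["tool"], "result": result})

print(run_agent("What is 6 times 7?"))  # -> The result is 42.
```

The point of the pattern is that the developer only declares the tools and the prompt; the decide-execute-observe cycle is driven by the model, which is what lets a few lines of code stand in for hand-written orchestration logic.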
So you can start developing locally and testing it out. We have Anthropic integrations, Meta integrations to the Llama API, you can use OpenAI models, and any other provider available through the integration with LiteLLM. And of course, you can also develop your own custom model provider. Now, quickly on the tools: as I said, Strands Agents comes with over 20 pre-built tools, anything from simple tasks like file manipulation and API calls, obviously integrating with AWS services, to more complex use cases, and I just want to call out a couple of them. There's a whole group of integrated tools for memory and RAG. One tool specifically, called retrieve, lets you do semantic search over a knowledge base. And just to show you the power of this: we have an internal agent at AWS that manages over 6,000 tools. Now, 6,000 is a hard number of tools to put into a single context window and give to one model to decide over. So what we did is put the descriptions of those tools in a knowledge base and use the retrieve tool, so the agent can find the most relevant tools for the task at hand and pull only those back into the model context for the model to decide which one to use. So that's just one use case of how we're leveraging that. There is also support for multimodality across images, video, and audio with Strands. There is a tool to prompt for more thinking and deep reasoning, and it also comes with pre-built tools to implement multi-agent workflows, whether graph-based workflows or a swarm of sub-agents working together. Now, you cannot talk about tools without mentioning MCP, right? So obviously we integrated MCP natively within Strands, so you can use it to connect to thousands of available MCP servers and make them available as tools for your agent. Support for A2A is also coming soon. But let's start and talk a little bit about MCP first.
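The retrieve pattern described above (index all tool descriptions, pull back only the relevant ones) can be sketched as follows. A real setup would use embeddings and a managed knowledge base such as Amazon Bedrock Knowledge Bases; this toy version scores by simple word overlap just to show the shape of the idea, and all names in it are illustrative.

```python
# Sketch of tool retrieval for an agent with too many tools to fit
# in one context window: search the tool descriptions, hand the
# model only the top matches. Scoring here is naive word overlap;
# a production system would use semantic (vector) search.

TOOL_DESCRIPTIONS = {
    "roll_dice": "roll a dice with a configurable number of sides",
    "create_s3_bucket": "create a new Amazon S3 bucket",
    "resize_image": "resize an image to the given width and height",
    "send_email": "send an email message to a recipient",
}

def retrieve_tools(query: str, top_k: int = 2) -> list[str]:
    """Return names of the top_k tools most relevant to the query."""
    query_words = set(query.lower().split())
    scored = [
        (len(query_words & set(desc.lower().split())), name)
        for name, desc in TOOL_DESCRIPTIONS.items()
    ]
    scored.sort(reverse=True)  # highest overlap first
    return [name for score, name in scored[:top_k] if score > 0]

# Only the matching tools get bound to the agent, keeping the
# context small even if thousands of tools are registered.
print(retrieve_tools("please roll a 20 sided dice"))
```

This is why the internal 6,000-tool agent stays workable: the model never sees the full catalog, only the handful of candidates the retrieval step surfaces for the task at hand.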
If you're building on AWS already, make sure to bookmark this GitHub repo: awslabs/mcp. Here you can find a very long and growing list, much longer than what you see on this slide, of MCP server implementations, specifically if you're working and building on AWS. Now, one of the challenges stems from the fact that when we all started building MCP servers, what we had was standard IO, right? It started out to help you locally connect your systems, your clients, to the respective tools. And here's just a quick example, which is important for a demo I'll show in a little bit. This is a standard IO implementation of an MCP server. It should look familiar to most of you working with MCP: it uses the Python SDK with FastMCP. All I'm doing here is set up my server and use the decorator to define a tool. In this case, my tool is to roll a dice, and you might see in the code that it has an input to define the number of sides. And I had to put a picture here because, I have to admit, I just learned this myself. Do we have D&D fans in the room? Woohoo. All right, a few of them. So you all know what I'm talking about. For the rest of us: I just learned there are dice, and I have one here, not sure if the camera can catch this, it's the one on the slide, that have, for example, 20 sides. Something very normal in the D&D world to start your game, I think. Don't ask me questions about D&D; my colleague Mike Chambers, who is either here or in the expo right now, built the demo, so kudos to him, and he can answer all of the D&D questions. All right, just keep that in mind; I'll come back to this in just a second. Now, what we want to do here is decouple and connect to remote MCP servers, because the topic is scale, right? And in the AWS world, the way to do this is as easy as deploying it as a Lambda function. So we can do this now with streamable HTTP.
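Before moving to the remote version, the stdio dice server described above boils down to a tool like this. This is a plain-Python reconstruction of its core logic; in the real demo the function is registered as an MCP tool with FastMCP's decorator (roughly `mcp = FastMCP(...)`, `@mcp.tool()` above the function, then `mcp.run()`), and the exact names here are my assumption, not the demo's code.

```python
import random

# Core logic of the demo's dice-rolling MCP tool: one input
# defines the number of sides, so a d20 is roll_dice(sides=20).

def roll_dice(sides: int = 6) -> int:
    """Roll one die with the given number of sides."""
    if sides < 2:
        raise ValueError("a die needs at least 2 sides")
    return random.randint(1, sides)

print(roll_dice(sides=20))  # e.g. 7, as in the live demo
```

Run as a stdio MCP server, this tool is only reachable by clients on the same machine, which is exactly the limitation the Lambda deployment in the next step removes.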
And the same concepts apply. You put your Lambda functions, as you would have before, behind an API gateway and then connect. And because we care about security and authorization, in the quick demo I'm going to show you, I'm using an authorizer; you can also plug in Cognito for this part. And I'm also going to store session data in a DynamoDB table. So let's roll this quick demo. What you see here is an MCP Lambda handler that we developed. It's available on the GitHub repo, and it makes it really easy to set up your MCP server in Lambda. Here's a very simple hello-world example: the tool is again defined with a tool decorator, and then in the Lambda handler function you reference the incoming event and pass it to that MCP server. Now, if we look at the server implementation, here we're doing a little bit more. You can see how we're adding session-table support, which is a DynamoDB table. We're defining the tool; this is the rolling-dice tool that I just pointed out, but this time it's hosted as a Lambda function. You can write all the code you want to have there as well. And then, at the very end, it's the same single line that, when you call the Lambda function, passes the request on to the MCP server. Let's deploy this. And again, we're using the existing tools to deploy Lambda functions as we have before. This one uses AWS SAM to deploy to the cloud, and then we will receive the API Gateway URL as well. Now, from the client side, I'm using Strands Agents, as you can see, and I am using the MCP integration. I'm passing my API Gateway URL to connect, and for authorization I have a bearer token. Again, this is a simple concept demo, but you can build more robust integrations here as well. I'm calling the list-tools operation and then passing those tools to my agent as we've seen before; this time it's the MCP-provided tools.
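To make the shape of the Lambda-hosted server concrete, here is a heavily simplified stand-in: a handler that receives an API Gateway-style event, dispatches to a registered tool, and returns the result. The real demo uses the MCP Lambda handler from the AWS repo (speaking MCP over streamable HTTP, with a DynamoDB session table); none of the names below are that library's API, and the dispatch here is deliberately bare JSON rather than the MCP wire protocol.

```python
import json
import random

TOOLS = {}

def tool(fn):
    """Register a function as a callable tool (mimics a tool decorator)."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def roll_dice(sides: int = 6) -> int:
    """The same dice tool, now hosted server-side."""
    return random.randint(1, sides)

def lambda_handler(event, context):
    """Entry point a Lambda function would expose behind API Gateway."""
    body = json.loads(event["body"])
    fn = TOOLS.get(body["tool"])
    if fn is None:
        return {"statusCode": 404,
                "body": json.dumps({"error": "unknown tool"})}
    result = fn(**body.get("arguments", {}))
    return {"statusCode": 200, "body": json.dumps({"result": result})}

# A remote client (e.g. a Strands agent) would POST something like:
event = {"body": json.dumps({"tool": "roll_dice",
                             "arguments": {"sides": 20}})}
print(lambda_handler(event, None))
```

The key design point is the single dispatch line at the end of the handler: the Lambda function itself stays a thin shell, and everything else (sessions, auth, additional tools) plugs in around it using the AWS building blocks you already know.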
And then if we run this, we can quickly see it in action. We're basically going to ask it to roll a dice, and we're asking for a d20, so again, 20 sides. And it's coming back. What did we roll? You can see the tool kicking in here. We rolled a seven. Great. So this is really just a quick example. The good news is, once you're in the AWS world and you're working in Lambda, everything you can build with Lambda you can integrate there. So you basically have access again to all of the great features, capabilities, and applications you might have already built on AWS. Now, the next step here is: how do we make agents talk to each other, right? That's kind of the next frontier. And we are super excited about all the open protocols that are emerging right now. With MCP, for example, we joined the steering committee; we're an active part of the community, contributing code and helping to further evolve MCP. If you want to learn more about this, here is the QR code. We have a whole blog series started on our open source blog. Feel free to check that out as we continue to help evolve those protocols. Now, what's next? We're all aware that this is just the beginning, right? There will be so much more coming. And if you had a chance to check out my colleague Danielle's talk yesterday on useful general intelligence, I just want to quote her: "The atomic unit of all digital interactions will be an agent call." So we can imagine a future where you might just have a personal agent, like shown here, connecting to an agent store, with agents working together to accomplish tasks for you. And some of you here in the room might already be building this, right? So let's go and build this future together. Thanks so much. Check out the additional sessions we have: my colleague Mike is going much deeper into the rolling-dice demo, everything MCP and Strands, and my colleague Suman will also have a deep dive on Strands tomorrow.
And with that, thank you very much. Check us out in the expo hall and grab your own d20.