Katelyn Lesse – Evolving Claude APIs for Agents, Anthropic
Channel: aiDotEngineer
Published at: 2025-12-04
YouTube video id: aqW68Is_Kj4
Source: https://www.youtube.com/watch?v=aqW68Is_Kj4
[music] Good morning. First, let's give a huge thank you to swyx and the whole AI Engineer organizing team for bringing us together. [applause] I'm Katelyn, and I lead the Claude Developer Platform team at Anthropic. Let's start with a show of hands: who here has integrated against an LLM API to build agents? Okay, I'm talking to the right people. Love it.

Today I want to share how we're evolving our platform to help you build really powerful agentic systems using Claude. We love working with developers who do what we call raising the ceiling of intelligence: they're always trying to be on the frontier, always trying to get the best out of our models and build the highest-performing systems. So I want to walk you through how we're building a platform that helps you get the best out of Claude, and I'm going to do that using a product that you've hopefully all heard of before. It's an agentic coding product, we love it a lot, and it's called Claude Code.

When we think about maximizing performance from our models, we think about building a platform that helps you do three things. First, the platform helps you harness Claude's capabilities. We're training Claude to get good at a lot of things, and we need to give you the tools in our API to use the things Claude is actually getting good at. Next, we help you manage Claude's context window. Keeping the right context in the window at any given time is critical to getting the best outcomes from Claude. And third (we're really excited about this one lately), we think you should just give Claude a computer and let it do its thing. I'll talk about how we're evolving the platform to give you the infrastructure you need to actually let Claude do that. So, starting with harnessing Claude's capabilities.
We're getting Claude really good at a bunch of things, and here are the ways we expose that to you in our API, ideally as customizable features.

Here's a first, relatively basic example: Claude got good at thinking, and Claude's performance on various tasks scales with the amount of time you give it to reason through those problems. We expose this as an API feature so you can decide whether you want Claude to think longer about something complex or just give you a quick answer. We also expose this with a budget, so you can tell Claude how many tokens to spend on thinking. Claude Code is a good example: you're often debugging pretty complex systems, but sometimes you just want a quick answer, so Claude Code takes advantage of this feature in our API to decide whether or not to have Claude think longer.

Another basic example is tool use. Claude has gotten really good at reliably calling tools. We expose this in our API with both our own built-in tools, like our web search tool, and the ability to create your own custom tools. You just define a name, a description, and an input schema, and Claude is good at reliably knowing when to call those tools and passing the right arguments. This is relevant for Claude Code: it has many, many tools, and it's calling them all the time to do things like read files, search for files, write to files, and rerun tests.

The next way we're evolving the platform to help you maximize intelligence from Claude is helping you manage Claude's context window. Getting the right context into the window at the right time is one of the most important things you can do to maximize performance. But context management is really complex to get right.
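As a rough sketch, here is how these two features look together in a Messages API request. The model id, budget, and tool shape below are illustrative assumptions, not taken from the talk; check the current API reference for exact values.

```python
# Sketch of a Messages API request enabling extended thinking and
# defining one custom tool. All concrete values are illustrative.
params = {
    "model": "claude-sonnet-4-5",   # assumed model id; use a current one
    "max_tokens": 16000,
    # Give Claude an explicit token budget for reasoning before answering.
    "thinking": {"type": "enabled", "budget_tokens": 8000},
    # A custom tool is just a name, a description, and a JSON input schema.
    "tools": [
        {
            "name": "read_file",
            "description": "Read a file from the working repository.",
            "input_schema": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        }
    ],
    "messages": [{"role": "user", "content": "Why does test_auth.py fail?"}],
}

# With the official Python SDK this would be sent as:
# import anthropic
# response = anthropic.Anthropic().messages.create(**params)
```

When Claude decides to call `read_file`, the response contains a tool-use block with the arguments; your code runs the tool and sends the result back in a follow-up message.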
This is especially true for a coding agent like Claude Code. You've got your technical designs, your entire codebase, your instructions, your tool calls; all of these things might be in the window at any given time. So how do you make sure the right subset of them is in the window? Getting that context right and keeping it optimized over time is something we've thought a lot about.

Let's start with MCP, the Model Context Protocol. We introduced this a year ago, and it's been really cool to see the community swarm around adopting MCP as a standardized way for agents to interact with external systems. For Claude Code, you might imagine GitHub or Sentry: there are plenty of places outside the agent's context where there's additional information or tools that you want your agent, or the Claude Code agent, to be able to interact with. This gets you much better performance than an agent that only sees what lands in its window as a result of your prompting.

The next thing is memory. Where tools like MCP get context into your window, we introduced a memory tool to help you keep context outside of the window that Claude knows how to pull back in only when it actually needs it. The first iteration of our memory tool is essentially a client-side file system: you control your data, but Claude is good at recognizing "this is a good thing I should store away for later," and it knows when to pull that context back in. For Claude Code, you could imagine your patterns for your codebase or your preferences for your git workflows; these are all things Claude can store away in memory and pull back in only when they're actually relevant. And the third thing is context editing.
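A minimal sketch of enabling the memory tool is below. The tool type string and beta header are assumptions based on Anthropic's published beta docs at the time of writing and may have changed; verify them against the current API reference.

```python
# Sketch: enable the client-side memory tool (beta feature).
# The "memory_20250818" type string is an assumption; check the docs.
params = {
    "model": "claude-sonnet-4-5",   # assumed model id
    "max_tokens": 4096,
    "tools": [{"type": "memory_20250818", "name": "memory"}],
    "messages": [
        {"role": "user", "content": "Open a PR following our git workflow."}
    ],
}

# Sent with a beta header, e.g. (header value is an assumption):
# import anthropic
# response = anthropic.Anthropic().messages.create(
#     **params,
#     extra_headers={"anthropic-beta": "context-management-2025-06-27"},
# )
```

Because the file system is client-side, your code handles the memory commands Claude emits (view, create, edit, delete on memory paths) and returns the results, so you keep custody of the stored data.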
If memory helps you keep things outside the window and pull them back in when it makes sense, context editing helps you clear out things that aren't relevant right now and shouldn't be in the window. Our first iteration of context editing just clears out old tool results. We did this because tool results can be really large and take up a lot of space in the window, and we found that tool results from past calls aren't necessarily relevant to getting good responses from Claude later in a session. Think about Claude Code: it's calling hundreds of tools, and the files it read and so on are all taking up space in the window, so it takes advantage of context management to clear those things out.

We found that when we combined our memory tool with context editing, we saw a 39% performance improvement over the baseline on our own internal evals, which was really huge. It shows you the importance of keeping only the things in the window that are relevant at any given time. And we're expanding on this by giving you larger context windows: for some of our models, you can have a million-token context window. Combining that larger window with the tools to edit what's in it maximizes your performance. Over time, we're also teaching Claude to get better and better at understanding what's in its own context window: maybe it has a lot of room to run, maybe it's almost out of space, and Claude will respond accordingly depending on how much room it has left.

So here's the third thing: we think you should give Claude a computer and just let it do its thing. We're really excited about this one, because there's a lot of discourse right now around agent harnesses. How much scaffolding should you have? How opinionated should it be?
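As a sketch, clearing old tool results looks like an extra request parameter. The edit type string, trigger, and keep shapes below are assumptions from the beta docs at the time of writing; consult the current API reference before relying on them.

```python
# Sketch: context editing (beta) that clears old tool results once the
# prompt grows past a threshold. Type strings and values are assumptions.
params = {
    "model": "claude-sonnet-4-5",   # assumed model id
    "max_tokens": 4096,
    "context_management": {
        "edits": [
            {
                # Strategy: drop results from older tool calls.
                "type": "clear_tool_uses_20250919",
                # Only start clearing once input exceeds ~50k tokens.
                "trigger": {"type": "input_tokens", "value": 50000},
                # Keep the three most recent tool results intact.
                "keep": {"type": "tool_uses", "value": 3},
            }
        ]
    },
    "messages": [{"role": "user", "content": "Keep refactoring."}],
}
```

The appeal of doing this server-side is that your agent loop doesn't have to implement its own pruning logic; the platform rewrites the prompt before the model sees it.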
Should it be heavy? Should it be light? At the end of the day, Claude can write code, and if Claude can also run that same code, it can accomplish almost anything. You can get really great, professional outputs for the things you're doing just by giving Claude runway to go and do that. The challenge in letting you do that is the infrastructure, as well as the expertise: how do you give Claude access to the things that will get you better results when it's using a computer?

A fun story: we recently launched Claude Code on web and mobile, and this was a fun project for our team because we had a lot of problems to solve. When you're running Claude Code locally, it's essentially using your machine as its computer. But if you start a session on the web or on mobile and then walk away, what's happening? Where is Claude Code running? Where is it doing its work? We had some hard problems to solve. We needed a secure environment where Claude could write and run code that you haven't necessarily approved. We needed to solve container orchestration at scale. And we needed session persistence, because we launched this, and many of you were excited about it, started many, many sessions, and walked away, and we had to make sure all of it was ready to go when you came back and wanted to see the results of what Claude did.

One key primitive in this is our code execution tool. We released a code execution tool in the API that allows Claude to write code and run that code in a secure, sandboxed environment. Our platform handles the containers and the security, and you don't have to think about those things because they run on our servers. So you can imagine deciding that you want Claude to write some code and then be able to go and run it.
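A minimal sketch of turning this on is below. The tool type string and the beta flag are assumptions from the published beta docs at the time of writing; check the current API reference for the exact identifiers.

```python
# Sketch: enable the server-side code execution tool (beta).
# The "code_execution_20250522" type string is an assumption.
params = {
    "model": "claude-sonnet-4-5",   # assumed model id
    "max_tokens": 4096,
    # Server-side tool: Anthropic runs the code in a sandboxed container,
    # so there is no client-side execution loop to implement.
    "tools": [{"type": "code_execution_20250522", "name": "code_execution"}],
    "messages": [
        {"role": "user", "content": "Compute summary stats for results.csv."}
    ],
}

# Sent via the beta surface of the SDK, e.g. (beta flag is an assumption):
# import anthropic
# response = anthropic.Anthropic().beta.messages.create(
#     **params, betas=["code-execution-2025-05-22"])
```

Unlike the custom tools earlier, the response here already contains the execution results; the sandbox ran on Anthropic's servers.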
And for Claude Code, there are plenty of examples here, like "make an animation more sparkly," where you want Claude to actually be able to run that code. We really think the future of agents is letting the model work pretty autonomously within a sandboxed environment, and we're giving you the infrastructure to do that.

This gets really powerful once you think about giving the model actual domain expertise in the things you're trying to do. We recently released Agent Skills, which you can use in combination with our code execution tool. Skills are basically just folders of scripts, instructions, and resources that Claude has access to and can decide to run within its sandbox environment. It decides to do that based on the request you gave it and the description of the skill, and Claude is really good at knowing when it's the right time to pull a skill into context and go ahead and use it. You can combine skills with tools like MCP: MCP gives you access to tools and context, and skills give you the expertise to actually make use of those tools and that context.

For Claude Code, a good example is web design. Maybe whenever you launch a new product or feature, you build landing pages, and you want those landing pages to follow your design system and the patterns you've set out. Claude will know: "I'm being told to build a landing page; this is a good time to pull in the web design skill," and it will use the right patterns and design system for that page. Tomorrow, Barry and Mahesh from our team are giving a talk on skills. They'll go much deeper, and I definitely recommend checking that out.

So those are the ways we're evolving our platform to help you take advantage of everything Claude can do and get the absolute best performance for the things you're building.
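As a rough sketch, a skill really is just a folder. The layout and frontmatter below follow the published Agent Skills format as I understand it, but the file names, fields, and all skill contents here are invented for illustration; check the current skills documentation.

```markdown
web-design/
├── SKILL.md            <!-- entry point: metadata plus instructions -->
├── scripts/
│   └── build_page.py   <!-- helper Claude can run in its sandbox -->
└── resources/
    └── design-tokens.json

<!-- SKILL.md -->
---
name: web-design
description: Build landing pages that follow our design system.
---
When asked to build a landing page, load resources/design-tokens.json
and follow the component patterns described below...
```

The `description` field is what lets Claude decide, from the request alone, that this is the right moment to pull the skill's full instructions into context.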
First, harnessing Claude's capabilities: as our research team trains Claude, we give you the API features to take advantage of what it learns. Next, managing Claude's context: it's really important to keep your context window clean, with the right context at the right time. And third, giving Claude a computer and just letting it do its thing.

We're going to keep evolving our platform. As Claude gets better, gains new capabilities, and improves at the capabilities it already has, we'll continue to evolve the API around that so you can stay on the frontier and take advantage of the best Claude has to offer. Second, as memory and context evolve, we're going to up the ante on the tools we give you to let Claude decide what to pull in, what to store away for later, and what to clean out of the context window. And third, we're going to keep leaning into agent infrastructure. Some of the biggest problems with the idea of "just let Claude have a computer and do its thing" are the ones I talked about: orchestration, secure environments, and sandboxing. We're going to keep working to make sure those are ready for you to take advantage of.

And we're hiring at Anthropic. We're really growing our team, so if you're someone who loves building delightful developer products, and if you're excited about what we're doing with Claude, we would love to work with you: engineering, product, design, DevRel, lots of functions. Please reach out to us, and thank you. [applause] [music]