The Future of MCP — David Soria Parra, Anthropic
Channel: aiDotEngineer
Published at: 2026-04-19
YouTube video id: v3Fr2JR47KA
Source: https://www.youtube.com/watch?v=v3Fr2JR47KA
[music] >> Well, welcome. Let's get started. This is an MCP application. That's an agent shipping its own interface: not through a plugin, not through an SDK, not rendered on the fly by the model on the client side, and not hardcoded into the product. It is served over an MCP server, and you can take that server, put it into Claude, put it into ChatGPT, put it into VS Code or Cursor, and it will just work. And I think that's pretty cool, because to do it you need something that a lot of the alternatives in the ecosystem do not offer: semantics. You need both sides, client and server, to understand what the other is saying, to understand how to render this, to understand that a UI is coming. And for that, you need a protocol. The best part is that an MCP server doesn't just ship an app; it can also ship tools with it. So you can interact with the application as a human, and the model can interact with it through tools, which is a very unique combination that I think we have not explored much just yet. Okay. But let's quickly rewind from this really cool glimpse into the future of MCP to over a year ago. Eighteen months is an eternity in the AI life cycle, and none of this existed back then. There was just a little spec document and a few SDKs, mostly written by Claude: local only, with little more than tools. And in those 18 months, you have been building like crazy: servers, and a wild ecosystem around them. We, on our side, have been busy taking this local-only thing and adding remote capabilities, centralized authorization, new primitives like elicitation and tasks, and, last but not least, new experimental features to the protocol like the MCP applications you've just seen.
And in the meantime, we have reached a really cool milestone, because, again, all of you have been building, building, and building, luckily with the help of a bunch of agents. We're now at 110 million monthly downloads. And that's not just us using it in our clients and servers. That's OpenAI's Agents SDK, that's Google's ADK, that's LangChain, and thousands of frameworks and tools you might never have heard of, all pulling it in as a dependency, which means there's one common standard all of us have at our disposal to speak to each other. Just for context: React, probably one of the most successful open-source projects of the last decade, took roughly double that time to reach the same download volume. And in the meantime, you have all been building really cool servers, from little toy projects like WhatsApp servers and Blender servers to SaaS integrations like Linear, Slack, and Notion that power what everyone does every day when they use MCP. But most importantly, the vast majority of the MCP servers we have built are behind closed doors, connecting company systems to agents and AI applications. And I still think this is just the absolute beginning. 2025 was all about exploring, and 2026 is all about putting these agents into production. Because if you really think about it, in 2024 we just built a bunch of demos, showed some cool stuff to people, and there was a little bit of a buzz. 2025 was all about coding agents. But a coding agent, if you really think about it, is the most ideal scenario for an agent: it's local, it's verifiable, you can call a compiler, there's a developer in front of the computer who can fix things when they go wrong, you can display a UI, and the user is quite happy.
But now, with the capabilities of the models increasing, we're going into a new era, and I think this year we'll see the start of it: not just coding agents, but general agents that do real knowledge-worker work, the things a financial analyst or a marketing person wants to do. And they need one thing in particular. They don't need a local agent that calls a compiler. They need something that can connect to five SaaS applications and a shared drive, because the most important thing for them in an agent is connectivity. And in my mind, connectivity is not one thing. If someone tells you there's one solution to all your connectivity problems, be it computer use, be it CLIs, be it MCP, they are probably wrong, because the right answer is always "it depends": there's a big connectivity stack, and there's a right tool for the right job. In my mind, there are three major things to consider when building an agent in 2026: skills, MCP, and, depending on your use case, CLIs or computer use. And they do three very distinct things. Number one, skills: that's domain knowledge, specific capabilities captured in a very simple file, and mostly reusable, with some minor differences between platforms. CLIs, of course, are very popular with local coding agents. They're an amazing tool to get started simply, something you can compose in bash, where the model can automatically discover what the CLI is capable of.
And most importantly, if you have CLIs like GitHub, Git, and other things that are in pre-training, a CLI is an amazing solution for your connectivity needs, and CLIs are particularly good when you have a local agent where you can assume a sandbox and a code-execution environment. But if you don't have that; if you need rich semantics; if you need a UI that can display long-running tasks; if you need things like resources; if you need something fully decoupled and platform-independent; if you don't have a sandbox; if you need authorization, governance, policies, in short, boring but important enterprise stuff; or if you want experiments like MCP applications or what's coming soon, skills over MCP — then MCP is additional connective tissue, yet another tool in the toolbox for building an amazing agent. All of this is to say that in 2026 we're going to start building agents that use all of it. They don't use one thing; they use all of it, quite seamlessly together. But I don't think we're quite there just yet, because we need to build a lot of stuff: partially because our agents kind of still suck, and partially because I think we just haven't talked enough about some of the techniques you can use to really put this connective tissue together. The number one thing we need to start building is on the client side, on the agent-harness side, on the thing that powers the connective parts, be it Claude Code, be it an API, be it whatever application you're going to build. And the number one thing we have to do there, something I really want to get across today, is that we need to start building something called progressive discovery.
When most people think about MCP, they think about context load. But if you really consider what a protocol does, it just puts information across the wire; the client is responsible for dealing with that information. And what everybody has done so far, because we're in this very early experimentation phase, is simply put all the tools into the context window, and then be quite surprised that the context window gets large. What you can and should do instead is use the progressive-discovery pattern, which is to say: use something like tool search to defer loading tools, and load them only when the model needs them. We have this in the Anthropic API, and people can use it on competitors' APIs as well. But you can also build this in yourself: you give the model a tool-loading tool, basically, and the model goes, "Ah, maybe I need a tool now. Let me look up what tools I need," and then you load them on demand. In this example, on the left you see Claude Code before we added this, and on the right after; you see a massive reduction in tool-context usage. The second part is something called programmatic tool calling, or what other people usually refer to as code mode. This is the idea that you really want to compose things together. You don't want the model to call a tool, take the result, call another tool, take the result, and call another tool, because then you're letting the model orchestrate everything itself, and that orchestration spends inference, is latency-sensitive, and could all be done far more effectively by writing a script instead.
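The tool-loading-tool idea described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not a real MCP SDK API: the harness keeps a local catalog of tool definitions out of the context window and exposes one `search_tools` meta-tool; full definitions enter the context only when the model asks for them.

```python
# Hypothetical sketch of progressive discovery: instead of injecting every
# tool definition into the context up front, the harness gives the model one
# meta-tool that searches a local catalog on demand. All names illustrative.

TOOL_CATALOG = {
    "linear_create_issue": {
        "description": "Create an issue in Linear",
        "input_schema": {"type": "object",
                         "properties": {"title": {"type": "string"}}},
    },
    "slack_post_message": {
        "description": "Post a message to a Slack channel",
        "input_schema": {"type": "object",
                         "properties": {"channel": {"type": "string"},
                                        "text": {"type": "string"}}},
    },
    # ...hundreds more definitions stay out of the context window...
}

def search_tools(query: str, limit: int = 3) -> list[dict]:
    """The single 'tool loading tool' the model sees up front.

    Returns full definitions only for tools whose name or description
    matches the query, so they can be appended to the context on demand.
    """
    q = query.lower()
    hits = [
        {"name": name, **spec}
        for name, spec in TOOL_CATALOG.items()
        if q in name.lower() or q in spec["description"].lower()
    ]
    return hits[:limit]

# Mid-conversation, the model decides it needs something Slack-related:
found = search_tools("slack")
print([t["name"] for t in found])  # only this definition enters the context
```

A production harness would use embedding or BM25 search rather than substring matching, but the shape is the same: one cheap lookup tool standing in for hundreds of eager definitions.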
And in fact, that's exactly what you constantly see things like Claude Code do when they write bash commands. But you can do this with everything, you can do it with MCP, and you should. So what does this mean? Instead of having the model call one tool after another, you give it a REPL tool, provide an execution environment like a V8 isolate, Monty, or a Lua interpreter, and just have the model write the code; the code executes and composes the calls together. And there's a neat little feature in MCP called structured output that tells you what the return value of a tool will be. The model can use that type information to compose these things together really nicely. In this example, instead of making two separate calls, you make one call, and the filtering, where the model removes things from the JSON before continuing, happens in code. If you don't have structured output, you can always just ask the model for it: call a cheap model and say, "Here's the expected type; extract it and give it back to me." And bam, you have a type, and the model can compose things together. I think this is something we're just not doing enough yet, and it's somewhere we can really improve our agent harnesses. And last but not least, you can compose these things together with executables: with CLIs, with other components, and with APIs as well. Next, beyond the client work of progressive discovery and programmatic tool calling, we need to start building properly for agents. And that means we all need to stop taking REST APIs and putting them one-to-one into an MCP server.
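The two-calls-become-one example can be sketched concretely. This is an illustrative stub, not a real harness: `call_tool` stands in for the bridge into MCP, and the tool names, schemas, and canned results are invented. The point is that the model authors one script, so the intermediate JSON never round-trips through inference.

```python
# Hypothetical sketch of programmatic tool calling ("code mode"): the model
# writes one script that the harness runs in a sandbox, composing and
# filtering tool results in code instead of via repeated inference turns.

import json

def call_tool(name: str, args: dict) -> dict:
    """Stand-in for the harness's bridge into MCP tool calls. Stubbed here
    with canned results whose shape matches the tools' declared structured
    output, so the script below can rely on the types."""
    if name == "list_invoices":
        return {"invoices": [
            {"id": "inv_1", "amount": 120.0, "status": "overdue"},
            {"id": "inv_2", "amount": 80.0, "status": "paid"},
            {"id": "inv_3", "amount": 300.0, "status": "overdue"},
        ]}
    if name == "send_reminder":
        return {"sent_to": args["invoice_id"]}
    raise KeyError(name)

# --- the script the model would author and hand to its REPL tool ---
# Because structured output declares the return types, the model can chain
# two tools and filter in code; only the final summary costs inference.
overdue = [
    inv for inv in call_tool("list_invoices", {})["invoices"]
    if inv["status"] == "overdue"
]
receipts = [call_tool("send_reminder", {"invoice_id": inv["id"]})
            for inv in overdue]
print(json.dumps({"reminded": [r["sent_to"] for r in receipts]}))
```

Without code mode, the full invoice list would be fed back to the model just so it could pick out the overdue ones, spending tokens and a latency round trip on pure plumbing.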
Every time I see someone building another REST-to-MCP conversion tool, I cringe a little, because it just results in horrible servers. What you should do instead is design for an agent. A good starting point is to design for yourself as a human: how would you want to interact with this? That's actually a very good proxy for an agent. If you want to orchestrate things together, reach for programmatic tool calling; you can do this on the client side, as I said before, but you can also do it on the server side. The Cloudflare MCP server and others like it are great examples of providing the model an execution environment instead of individual tools and letting it orchestrate there, which again cuts token usage, cuts latency, and is far more powerful in its composition. And last but not least, as server authors we should start using the rich semantics that MCP offers over the alternatives. That means shipping MCP applications, shipping skills over MCP, and using things like tasks and elicitation, aspects of the protocol that are currently somewhat underused; things only MCP can do for you. And beyond the work you all need to do, and maybe some of our product people, we also need to do a lot of work on MCP itself. There are a few things down the line we're going to have to solve. The number one thing is that we need to improve the core. A few things, as we've developed the protocol over the last year, have ended up in not-great shape. Number one is that the current Streamable HTTP transport is very hard to scale if you're a large hyperscaler.
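The "design for an agent, not the REST API" point is easiest to see side by side. This is a hypothetical illustration (the endpoint names, tool name, and stubbed return data are all invented): mirroring endpoints one-to-one forces the model to do pagination and joins itself, while one task-shaped tool answers the question the agent actually has.

```python
# Illustrative contrast between mirroring a REST API one-to-one into MCP
# tools and designing one tool around the task. All names are hypothetical.

# Anti-pattern: one tool per endpoint. The model must orchestrate lookups,
# pagination, and joins itself, burning context and inference on plumbing.
rest_mirror_tools = [
    "GET_/users", "GET_/users/{id}", "GET_/users/{id}/tickets",
    "GET_/tickets/{id}", "GET_/tickets/{id}/comments",
    "POST_/tickets/{id}/comments",
]

# Agent-first: one tool shaped like the question a human would ask,
# returning exactly the fields needed to reason about the answer.
def get_open_tickets_for_customer(customer_email: str,
                                  limit: int = 5) -> list[dict]:
    """Resolve the customer, fetch their open tickets, join the latest
    comment, and return a compact summary in a single call.
    (Server-side, this would internally hit the same REST endpoints.)"""
    # Stubbed result standing in for the real backend lookups:
    return [{
        "ticket_id": "T-1001",
        "title": "Login fails on SSO",
        "last_comment": "Waiting on logs from the customer.",
        "age_days": 3,
    }][:limit]

summary = get_open_tickets_for_customer("ada@example.com")
print(summary[0]["ticket_id"])
```

The six-endpoint version costs six tool definitions in context and up to six inference round trips; the task-shaped version costs one of each.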
>> [snorts] >> And so we have a proposal from our friends at Google, who are working on a stateless transport, which makes it significantly easier to treat an MCP server like any other stateless REST-style server, the kind we already know how to deploy to Cloud Run or Kubernetes. That's coming in June and hopefully landing in the SDKs very soon after. In addition, we need to improve our asynchronous tasks primitive, which is basically a very fancy way of saying we want agent-to-agent communication. We have a very experimental version in the protocol that very few clients support, so we're going to help build out more clients, and, most importantly, we're improving some of the small semantics that need work. We're also going to ship version two of the TypeScript SDK and version two of the Python SDK, based on a lot of the lessons learned over the last year. There's an SDK called FastMCP. Who's using FastMCP? Yeah. It's just way better than the Python SDK we're shipping, right? And that's on me, because I wrote the Python SDK. So now I have a bunch of people who are way better Python developers than me helping me write it better. The second part is that we need to start integrating everywhere. Particularly for enterprises, we're going to ship something called cross-app access, a new thing we're building closely with identity providers. It's a very fancy way of saying that once you log in with your company identity provider, be it Google, be it Okta, you will be able to use MCP servers without having to log in again. A bit more smoothness. In addition, we're going to add server discovery, by specifying how servers can be discovered automatically at well-known URLs.
So crawlers, browsers, and agents can go to a website and ask, "Instead of just parsing the website, is there an MCP server I can use?", and discover it automatically. This is a really cool thing that will also land in June when we launch the next specification, and it will be supported there. And last but not least, we're starting to use the extension mechanisms in MCP, which means some clients will support certain features and others won't. For example, MCP applications will only be supported by web-based interfaces, because if you're a CLI, you just have a hard time rendering HTML, right? And we will do more of these extensions. One of the most exciting ones, I think, is skills over MCP, which we're just about to ship, because it's very obvious that if you have a large MCP server with tons and tons of tools, you want to ship the domain knowledge with it and say, "This is how you're supposed to use this." And it allows you as a server author to continuously ship updated skills without having to rely on plugin mechanisms, registries, and other machinery. So that's coming. There's already a lot of experimentation from people in that space. You can do some of this today just by giving the model a load-skills tool; you can build versions of it without relying on the official semantics, though of course we're going to define those semantics. Okay. So that's my long-winded way of saying that I think MCP is actually in really good shape, and this year we're going to push agents to full connectivity, where MCP will continue to play a major, major role. And of course we want your feedback. We are a very open community. We have just created a foundation, and we mostly run as an open-source community, with a Discord and with issues.
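The load-skills-tool workaround mentioned above can be sketched in plain Python. This assumes nothing about the eventual skills-over-MCP spec; it's just the do-it-yourself version: skills are markdown files of domain knowledge, the model sees only a cheap index up front, and full skill text enters the context only when the model asks for it (the `skills/` directory and file names are illustrative).

```python
# Hypothetical "load skills tool": domain knowledge lives in markdown files,
# and the model pulls a skill into context on demand instead of having all
# of it injected up front. Paths and skill names are illustrative.

from pathlib import Path

SKILLS_DIR = Path("skills")  # e.g. skills/refunds.md, skills/reporting.md

def list_skills() -> list[str]:
    """Cheap index the model sees up front: names only, not full contents."""
    return sorted(p.stem for p in SKILLS_DIR.glob("*.md"))

def load_skill(name: str) -> str:
    """Called by the model when it decides it needs the domain knowledge;
    only then does the full skill text enter the context window."""
    path = SKILLS_DIR / f"{name}.md"
    if not path.is_file():
        return f"Unknown skill {name!r}; available: {', '.join(list_skills())}"
    return path.read_text()
```

Exposed as two MCP tools, this gives you the same progressive-discovery benefit for knowledge that tool search gives you for tool definitions; the forthcoming extension would standardize the semantics so every client handles it the same way.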
Just come to us and tell us where we are wrong and what we're getting right, so that we can improve this on a continuous basis. So, 2026, I think, is all about connectivity, and the best agents will use every available method: they will use computer use, they will use CLIs, they will use MCP, and they will use skills, because they want a wide variety of things they can do. And then they can ship cool stuff like this, which is one of the product features we shipped recently. Under the hood, it's nothing but an MCP application that renders stuff, right? Cool. So we can now watch the model writing graphs. Anyway, thank you. >> [music]