MCP is all you need — Samuel Colvin, Pydantic
Channel: aiDotEngineer
Published at: 2025-07-18
YouTube video id: bmWZk9vTze0
Source: https://www.youtube.com/watch?v=bmWZk9vTze0
So yeah, I'm talking about "MCP is all you need". A bit about who I am before we get started. I'm best known as the creator of Pydantic, a data validation library for Python that is fairly ubiquitous: downloaded about 360 million times a month, which someone pointed out is about 140 times a second. Pydantic is used in general Python development everywhere, but also in GenAI: it's used in basically all of the SDKs and agent frameworks in Python. Pydantic became a company at the beginning of 2023, and we have built two things beyond Pydantic since then: Pydantic AI, an agent framework for Python built on the same principles as Pydantic, and Pydantic Logfire, an observability platform, which is the commercial part of what we do. I'm also a somewhat inactive co-maintainer of the MCP Python SDK.

"MCP is all you need" is obviously a play on Jason Liu's talks: "Pydantic is all you need", which he gave at AI Engineer nearly two years ago, and then "Pydantic is still all you need", maybe this time last year. It has the same basic idea: people are over-complicating something that we can use a single tool for. And, also similarly, the title is completely unrealistic. Of course Pydantic is not all you need, and neither is MCP for everything. But where we agree is that there are an awful lot of things MCP can do, and that people sometimes over-complicate the situation by trying to come up with new ways of doing agent-to-agent communication. I'm talking here specifically about autonomous agents, code that you're writing. I'm not talking about the Claude Desktop, Cursor, Zed, Windsurf etc. use case of coding agents, which is what MCP was originally primarily designed for.
I don't know whether David Soria Parra would say that what we're doing, using MCP from Python like this, was the intent. He definitely wouldn't say it's a misuse, but I don't think it was the primary design use case for MCP. Two of the primitives of MCP, prompts and resources, probably don't come into this use case much. They're very useful, or should be very useful, in the Cursor-type use case, but they don't really apply to what we're talking about here. The third primitive, tool calling, is extremely useful for what we're trying to do, and it's a lot more complicated than you might at first think. A lot of people say to me about MCP: couldn't it just be OpenAPI? Why do we need a custom protocol for this? There are a number of reasons: dynamic tools, tools that come and go during an agent execution depending on the state of the server; logging, being able to return data to the user while the tool is still executing; sampling, which I'm going to talk about a lot today, and which is perhaps the most confusingly named part of MCP, if not of tech in general right now; and things like tracing and observability. I would also add that MCP's ability to operate as effectively a subprocess over standard in and standard out is extremely useful for lots of use cases, and OpenAPI wouldn't solve those problems. This is the kind of prototypical image that you will see from lots of people of what MCP is all about.
The idea is that we have some agent and any number of different tools we can connect to it. The point is that the agent doesn't need to be designed with those particular tools in mind, and the tools can be designed without knowing anything about the agent; we can just compose the two together, in the same way that I can use a browser and the website I'm visiting doesn't need to know anything about the browser. I know we live in a kind of monoculture of browsers now, but at least the original ideal was that we could have many different browsers all connecting over the same protocol. MCP follows the same idea.

But it can get more complicated than this. We can have situations where tools within our system are themselves agents, doing agentic things that need access to an LLM, and they can in turn connect to other tools, over MCP or directly. This works nicely; this is elegant. But there's a problem: every single agent in our system needs access to an LLM, so we need to configure that and work out resources for it. And if we are using remote MCP servers, and a remote MCP server needs to use an LLM, it now has to worry about what that is going to cost. What if the remote agent operating as a tool could effectively piggyback on the model that the original agent already has access to? That's what sampling gives us. Sampling is the idea that, within the MCP protocol, the server can effectively make a request back through the client to the LLM.
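The piggybacking idea can be modelled with a toy sketch. Everything here is a plain-Python stand-in (`FakeLLM`, `Client`, `Server` are illustrative names, not the MCP SDK API): the point is only the shape of the flow, in which the server owns no model and routes its LLM request back through the client.

```python
# Toy model of MCP sampling: the server has no LLM of its own, so its
# request is proxied back through the client to the client's model.
# All class and method names are illustrative, not the real MCP SDK.
from dataclasses import dataclass, field


@dataclass
class FakeLLM:
    calls: list = field(default_factory=list)

    def complete(self, prompt: str) -> str:
        self.calls.append(prompt)
        return f"answer to: {prompt}"


@dataclass
class Client:
    llm: FakeLLM

    def handle_sampling(self, prompt: str) -> str:
        # The client proxies the server's sampling request to the model
        # it already holds, so the server needs no LLM configuration.
        return self.llm.complete(prompt)


@dataclass
class Server:
    client: Client

    def tool(self, question: str) -> str:
        # Inside a tool call, the server piggybacks on the client's
        # model via a sampling request.
        return self.client.handle_sampling(question)


llm = FakeLLM()
server = Server(Client(llm))
result = server.tool("how many downloads?")
```

The real protocol wraps this flow in JSON-RPC messages, but the dependency direction is the same: only the client ever talks to a model directly.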
So in this case the client makes a request, starting some sort of agentic query, and calls the LLM. The LLM comes back and says it wants to call a particular tool, which lives on an MCP server, and the client takes care of making that call. The MCP server now says: hey, I actually need an LLM to answer this question. That request gets sent back to the client, the client proxies it to the LLM, receives the response, sends it on to the MCP server, and the MCP server returns so we can continue on our way. Sampling is very powerful, but not that widely supported at the moment. I'm going to demo it today with Pydantic AI, where we have support for sampling. I'll be honest, it's a PR right now, but it will be merged soon. We support sampling both as the client, knowing how to proxy those LLM calls, and as the server, being able to use the MCP client as the LLM.

This example is, like all examples, trivialized and simplified to fit on screen. The idea is that we're building a research agent which is going to research open source packages or libraries for us, and we have implemented one of the many tools you would in fact need for this. I'll switch now to code and show you that one tool. This tool queries BigQuery's public dataset for PyPI to get numbers about the downloads of a particular package. This is pretty standard Pydantic AI code: we've configured Logfire, which I'll show you in a moment; we have the dependencies that the agent has access to while it's running; and we've said we can do some retries.
So if the LLM returns the wrong data, we can send a retry. We have a big system prompt where we give it basically the schema of the table, tell it what to do, give it a few examples, and so on. But then we get to what is probably the powerful bit. As an output validator, we first strip markdown code fences out of the SQL if they're there, then check that the table name it's querying against is right, and tell it if it isn't. Then we run the query, and critically, if the query fails, we raise ModelRetry in Pydantic AI, which asks the LLM to attempt the request again.

The other thing we're doing throughout is calling context.log on the MCP context. When we defined deps_type, we said it would be an instance of this MCP context, which is what we get when the MCP server is called. So we're providing a type-safe way, in this case within the output validator, though it could be in a tool call if you wanted, to access that context. Because the type is known to be the MCP context, we know the signature of its log function and can make this log call. The point is that this returns to the client, and ultimately to the user watching, before the tool call has completed, so you can get progress updates as you go. MCP also has a concept of progress, which I'm not using here, but you can imagine that being valuable too: if you knew how far through the query you were, you could show a progress update.
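The validate-then-retry loop can be sketched without any dependencies. Here `ModelRetry` is a stand-in for Pydantic AI's exception of the same name, the table name is an assumption, and `rows_to_xml` illustrates the XML-ish formatting of query rows that the tool returns; none of this is the talk's exact code.

```python
# Dependency-free sketch of the output-validation step: strip markdown
# fences from the model's SQL, check the table name, and signal a retry
# on failure. ModelRetry stands in for pydantic_ai.ModelRetry; the
# BigQuery table name is an assumption.
import re

ALLOWED_TABLE = "bigquery-public-data.pypi.file_downloads"  # assumed


class ModelRetry(Exception):
    """Stand-in: raising this tells the agent to re-prompt the LLM."""


def strip_fences(sql: str) -> str:
    """Remove ```sql ... ``` fences models sometimes wrap output in."""
    match = re.match(r"^```(?:sql)?\s*(.*?)\s*```$", sql.strip(), re.DOTALL)
    return match.group(1) if match else sql.strip()


def validate_sql(sql: str) -> str:
    sql = strip_fences(sql)
    if ALLOWED_TABLE not in sql:
        # The message is fed back to the LLM as the retry instruction.
        raise ModelRetry(f"query must use the table {ALLOWED_TABLE!r}")
    return sql


def rows_to_xml(rows: list[dict]) -> str:
    """Format query rows as XML-ish text, which models read well."""
    lines = ["<rows>"]
    for row in rows:
        cells = "".join(f"<{k}>{v}</{k}>" for k, v in row.items())
        lines.append(f"  <row>{cells}</row>")
    lines.append("</rows>")
    return "\n".join(lines)
```

In the real agent the query is also executed inside the validator, with a failed execution raising `ModelRetry` in the same way as the table-name check.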
The original principle of logging like this, I think, is that you have the Cursor-style agent running and you want to give updates to the user before it's finished: don't worry, I'm still going, and here's exactly what's happening. But you could also imagine this being useful if this research agent were running as a web application and you wanted to show the user what was going on. Deep research might take minutes to run; we can emit these logs while the tool call is still executing. Then we take the output, turn it into a list of dicts, and format it as XML, because models are very good at reviewing XML data. So we return the query results as that kind of XML-ish data, which the LLM will then be good at interpreting.

Now we get to the MCP bit. In this code we're setting up an MCP server using FastMCP. There are two versions of FastMCP right now; confusingly, this is the one from inside the MCP SDK. We're registering one tool here, pypi_downloads, and the docstring from that function will end up becoming the description on the tool that is ultimately fed to the LLM that chooses to call it. We pass in the user's question. One important thing to say here: you could of course set this up to generate the SQL within your central agent, and include the whole description of the SQL and the instructions within the description of the tool. But models don't seem to like that much data inside a tool description. More to the point, we would blow up the context window of our main agent if we shipped all of this context on how to make these queries into it.
That's just overhead in every call to that agent, regardless of whether we're going to call this particular tool. So doing the inference inside a tool like this is a powerful way of effectively limiting the context window of the main running agent. Then we just return this output, which will be a string, and run the MCP server; by default it runs over standard IO.

Then we come to our main application. Here we have the definition of our agent, and you can see we've defined one MCP server that just runs the script I showed you, the PyPI MCP server. This agent will act as the client and has that registered as a tool it can call. We also give it the current date, so it doesn't assume it's 2023, as models often do. Now we can run our main agent and ask it, for example, how many downloads Pydantic has had this year. I'm going to be brave and run it and see what happens. It has succeeded: it tells us we had 1.6 billion downloads this year.

Probably more interesting is to look at what that looks like in Logfire. I'll admit this is the run from just before I came on stage, but it would look exactly the same. I'm not going to talk too much about how observability or tracing works within MCP, because there's a talk directly after mine about that; think of this as a spoiler for what's coming up. But you can see we run our outer agent, it calls GPT-4o, which decides, sure enough, to call this tool. It doesn't need to think about generating the SQL.
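The client-side wiring described here might look roughly like the sketch below. The model name, server script filename, and instructions text are all assumptions, and the `mcp_servers` / `run_mcp_servers` names reflect the Pydantic AI API around the time of this talk, which may have changed since; the library calls are left inside an uncalled function because running them needs pydantic-ai installed, the server script on disk, and an API key.

```python
# Hypothetical main application: a Pydantic AI agent acting as the MCP
# client, with the PyPI server run as a stdio subprocess. Names and
# parameters are assumptions based on the talk, not its exact code.
from datetime import date

SERVER_ARGS = ["pypi_mcp_server.py"]  # assumed server script name


def build_instructions(today: date) -> str:
    # Pin the current date so the model doesn't assume it's 2023.
    return f"Today's date is {today.isoformat()}."


def run_research_agent() -> None:
    """Wiring sketch only; not executed in this document."""
    import asyncio

    from pydantic_ai import Agent
    from pydantic_ai.mcp import MCPServerStdio

    # The server runs as a subprocess over stdin/stdout, the default
    # MCP transport; the agent registers its tools automatically.
    server = MCPServerStdio("python", args=SERVER_ARGS)
    agent = Agent(
        "openai:gpt-4o",
        mcp_servers=[server],
        instructions=build_instructions(date.today()),
    )

    async def main() -> None:
        async with agent.run_mcp_servers():
            result = await agent.run(
                "How many downloads has pydantic had this year?"
            )
            print(result.output)

    asyncio.run(main())
```

The design point from the talk survives any API drift: the main agent only ever sees the tool's short description and its string result, while the SQL schema, examples, and retry loop stay inside the server's own context.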
It can just pass a natural language description of the query we're trying to make. Then, as you can see here, the MCP client calls into the MCP server, which runs a different Pydantic AI agent, which in turn makes a call to an LLM, and that call happens by proxying it through the client. That's where you can see the spans going client, server, client, server. If you look at the top-level exchange with the model, you'll see that the response returned from running the query was that kind of XML-ish data, and the LLM was able to turn it into a human description of what was going on. The other interesting thing is the agent call inside the MCP server: we can see the actual SQL it wrote and confirm that it indeed looks correct.

I'll end there and say thank you very much. We're at the Pydantic booth, so if anyone has any questions on this, or wants to see this fail in numerous other exciting ways, I'm very happy to talk. Come and say hi.