MCP is all you need — Samuel Colvin, Pydantic
Channel: aiDotEngineer
Published at: 2025-07-18
YouTube video id: bmWZk9vTze0
Source: https://www.youtube.com/watch?v=bmWZk9vTze0
So yeah, I'm talking about "MCP is all you need". A bit about who I am before we get started. I'm best known as the creator of Pydantic, a data validation library for Python that is fairly ubiquitous: downloaded about 360 million times a month, which someone pointed out is about 140 times a second. Pydantic is used in general Python development everywhere, but also in GenAI: it's used in basically all of the SDKs and agent frameworks in Python. Pydantic became a company at the beginning of 2023, and we have built two things beyond Pydantic since then: Pydantic AI, an agent framework for Python built on the same principles as Pydantic, and Pydantic Logfire, an observability platform, which is the commercial part of what we do. I'm also a somewhat inactive co-maintainer of the MCP Python SDK.

"MCP is all you need" is obviously a play on Jason Liu's talks: "Pydantic is all you need", which he gave at AI Engineer nearly two years ago, and then "Pydantic is still all you need", maybe this time last year. It has the same basic idea: people are over-complicating something that we can use a single tool for. And, also similarly, the title is completely unrealistic. Of course Pydantic is not all you need, and neither is MCP for everything. But where we agree is that there are an awful lot of things MCP can do, and that people sometimes over-complicate the situation by trying to come up with new ways of doing agent-to-agent communication. I'm talking here specifically about autonomous agents, code that you're writing. I'm not talking about the Claude Desktop, Cursor, Zed, Windsurf etc. use case of coding agents, which is what MCP was originally primarily designed for.
I don't know whether David Soria Parra would say that what we're doing, using MCP from Python like this, was the intent. He definitely wouldn't say it's a misuse, but I don't think it was the primary design use case for MCP. Two of the primitives of MCP, prompts and resources, probably don't come into this use case much. They're very useful, or should be very useful, in the Cursor-type use case, but they don't really apply to what we're talking about here. The third primitive, tool calling, is extremely useful for what we're trying to do, and it's a lot more complicated than you might at first think. A lot of people say to me about MCP: couldn't it just be OpenAPI? Why do we need a custom protocol for this? There are a number of reasons: dynamic tools, tools that come and go during an agent execution depending on the state of the server; logging, being able to return data to the user while the tool is still executing; sampling, which I'm going to talk about a lot today, and which is perhaps the most confusingly named part of MCP, if not of tech in general right now; and things like tracing and observability. I would also add that MCP's ability to operate as effectively a subprocess over standard in and standard out is extremely useful for lots of use cases, and OpenAPI wouldn't solve those problems. This is the kind of prototypical image that you will see from lots of people of what MCP is all about.
The idea is that we have some agent and any number of different tools we can connect to it. The point is that the agent doesn't need to be designed with those particular tools in mind, and the tools can be designed without knowing anything about the agent; we can just compose the two together, in the same way that I can use a browser and the website I'm visiting doesn't need to know anything about the browser. I know we live in a kind of monoculture of browsers now, but at least the original ideal was that we could have many different browsers all connecting over the same protocol. MCP follows the same idea.

But it can get more complicated than this. We can have situations where tools within our system are themselves agents, doing agentic things that need access to an LLM, and they can in turn connect to other tools, over MCP or directly. This works nicely; this is elegant. But there's a problem: every single agent in our system needs access to an LLM, so we need to configure that and work out resources for it. And if we are using remote MCP servers, and a remote MCP server needs to use an LLM, it now has to worry about what that is going to cost. What if the remote agent operating as a tool could effectively piggyback on the model that the original agent already has access to? That's what sampling gives us. Sampling is the idea that, within the MCP protocol, the server can effectively make a request back through the client to the LLM.
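The piggybacking idea can be modelled with a toy sketch. Everything here is a plain-Python stand-in (`FakeLLM`, `Client`, `Server` are illustrative names, not the MCP SDK API): the point is only the shape of the flow, in which the server owns no model and routes its LLM request back through the client.

```python
# Toy model of MCP sampling: the server has no LLM of its own, so its
# request is proxied back through the client to the client's model.
# All class and method names are illustrative, not the real MCP SDK.
from dataclasses import dataclass, field


@dataclass
class FakeLLM:
    calls: list = field(default_factory=list)

    def complete(self, prompt: str) -> str:
        self.calls.append(prompt)
        return f"answer to: {prompt}"


@dataclass
class Client:
    llm: FakeLLM

    def handle_sampling(self, prompt: str) -> str:
        # The client proxies the server's sampling request to the model
        # it already holds, so the server needs no LLM configuration.
        return self.llm.complete(prompt)


@dataclass
class Server:
    client: Client

    def tool(self, question: str) -> str:
        # Inside a tool call, the server piggybacks on the client's
        # model via a sampling request.
        return self.client.handle_sampling(question)


llm = FakeLLM()
server = Server(Client(llm))
result = server.tool("how many downloads?")
```

The real protocol wraps this flow in JSON-RPC messages, but the dependency direction is the same: only the client ever talks to a model directly.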
So in this case the client makes a request, starting some sort of agentic query, and calls the LLM. The LLM comes back and says it wants to call a particular tool, which lives on an MCP server, and the client takes care of making that call. The MCP server now says: hey, I actually need an LLM to answer this question. That request gets sent back to the client, the client proxies it to the LLM, receives the response, sends it on to the MCP server, and the MCP server returns so we can continue on our way. Sampling is very powerful, but not that widely supported at the moment. I'm going to demo it today with Pydantic AI, where we have support for sampling. I'll be honest, it's a PR right now, but it will be merged soon. We support sampling both as the client, knowing how to proxy those LLM calls, and as the server, being able to use the MCP client as the LLM.

This example is, like all examples, trivialized and simplified to fit on screen. The idea is that we're building a research agent which is going to research open source packages or libraries for us, and we have implemented one of the many tools you would in fact need for this. I'll switch now to code and show you that one tool. This tool queries BigQuery's public dataset for PyPI to get numbers about the downloads of a particular package. This is pretty standard Pydantic AI code: we've configured Logfire, which I'll show you in a moment; we have the dependencies that the agent has access to while it's running; and we've said we can do some retries.
So if the LLM returns the wrong data, we can send a retry. We have a big system prompt where we give it basically the schema of the table, tell it what to do, give it a few examples, and so on. But then we get to what is probably the powerful bit. As an output validator, we first strip markdown code fences out of the SQL if they're there, then check that the table name it's querying against is right, and tell it if it isn't. Then we run the query, and critically, if the query fails, we raise ModelRetry in Pydantic AI, which asks the LLM to attempt the request again.

The other thing we're doing throughout is calling context.log on the MCP context. When we defined deps_type, we said it would be an instance of this MCP context, which is what we get when the MCP server is called. So we're providing a type-safe way, in this case within the output validator, though it could be in a tool call if you wanted, to access that context. Because the type is known to be the MCP context, we know the signature of its log function and can make this log call. The point is that this returns to the client, and ultimately to the user watching, before the tool call has completed, so you can get progress updates as you go. MCP also has a concept of progress, which I'm not using here, but you can imagine that being valuable too: if you knew how far through the query you were, you could show a progress update.
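The validate-then-retry loop can be sketched without any dependencies. Here `ModelRetry` is a stand-in for Pydantic AI's exception of the same name, the table name is an assumption, and `rows_to_xml` illustrates the XML-ish formatting of query rows that the tool returns; none of this is the talk's exact code.

```python
# Dependency-free sketch of the output-validation step: strip markdown
# fences from the model's SQL, check the table name, and signal a retry
# on failure. ModelRetry stands in for pydantic_ai.ModelRetry; the
# BigQuery table name is an assumption.
import re

ALLOWED_TABLE = "bigquery-public-data.pypi.file_downloads"  # assumed


class ModelRetry(Exception):
    """Stand-in: raising this tells the agent to re-prompt the LLM."""


def strip_fences(sql: str) -> str:
    """Remove ```sql ... ``` fences models sometimes wrap output in."""
    match = re.match(r"^```(?:sql)?\s*(.*?)\s*```$", sql.strip(), re.DOTALL)
    return match.group(1) if match else sql.strip()


def validate_sql(sql: str) -> str:
    sql = strip_fences(sql)
    if ALLOWED_TABLE not in sql:
        # The message is fed back to the LLM as the retry instruction.
        raise ModelRetry(f"query must use the table {ALLOWED_TABLE!r}")
    return sql


def rows_to_xml(rows: list[dict]) -> str:
    """Format query rows as XML-ish text, which models read well."""
    lines = ["<rows>"]
    for row in rows:
        cells = "".join(f"<{k}>{v}</{k}>" for k, v in row.items())
        lines.append(f"  <row>{cells}</row>")
    lines.append("</rows>")
    return "\n".join(lines)
```

In the real agent the query is also executed inside the validator, with a failed execution raising `ModelRetry` in the same way as the table-name check.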
The original principle of logging like this, I think, is that you have the Cursor-style agent running and you want to give updates to the user before it's finished: don't worry, I'm still going, and here's exactly what's happening. But you could also imagine this being useful if this research agent were running as a web application and you wanted to show the user what was going on. Deep research might take minutes to run; we can emit these logs while the tool call is still executing. Then we take the output, turn it into a list of dicts, and format it as XML, because models are very good at reviewing XML data. So we return the query results as that kind of XML-ish data, which the LLM will then be good at interpreting.

Now we get to the MCP bit. In this code we're setting up an MCP server using FastMCP. There are two versions of FastMCP right now; confusingly, this is the one from inside the MCP SDK. We're registering one tool here, pypi_downloads, and the docstring from that function will end up becoming the description on the tool that is ultimately fed to the LLM that chooses to call it. We pass in the user's question. One important thing to say here: you could of course set this up to generate the SQL within your central agent, and include the whole description of the SQL and the instructions within the description of the tool. But models don't seem to like that much data inside a tool description. More to the point, we would blow up the context window of our main agent if we shipped all of this context on how to make these queries into it.
That's just overhead in every call to that agent, regardless of whether we're going to call this particular tool. So doing the inference inside a tool like this is a powerful way of effectively limiting the context window of the main running agent. Then we just return this output, which will be a string, and run the MCP server; by default it runs over standard IO.

Then we come to our main application. Here we have the definition of our agent, and you can see we've defined one MCP server that just runs the script I showed you, the PyPI MCP server. This agent will act as the client and has that registered as a tool it can call. We also give it the current date, so it doesn't assume it's 2023, as models often do. Now we can run our main agent and ask it, for example, how many downloads Pydantic has had this year. I'm going to be brave and run it and see what happens. It has succeeded: it tells us we had 1.6 billion downloads this year.

Probably more interesting is to look at what that looks like in Logfire. I'll admit this is the run from just before I came on stage, but it would look exactly the same. I'm not going to talk too much about how observability or tracing works within MCP, because there's a talk directly after mine about that; think of this as a spoiler for what's coming up. But you can see we run our outer agent, it calls GPT-4o, which decides, sure enough, to call this tool. It doesn't need to think about generating the SQL.
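The client-side wiring described here might look roughly like the sketch below. The model name, server script filename, and instructions text are all assumptions, and the `mcp_servers` / `run_mcp_servers` names reflect the Pydantic AI API around the time of this talk, which may have changed since; the library calls are left inside an uncalled function because running them needs pydantic-ai installed, the server script on disk, and an API key.

```python
# Hypothetical main application: a Pydantic AI agent acting as the MCP
# client, with the PyPI server run as a stdio subprocess. Names and
# parameters are assumptions based on the talk, not its exact code.
from datetime import date

SERVER_ARGS = ["pypi_mcp_server.py"]  # assumed server script name


def build_instructions(today: date) -> str:
    # Pin the current date so the model doesn't assume it's 2023.
    return f"Today's date is {today.isoformat()}."


def run_research_agent() -> None:
    """Wiring sketch only; not executed in this document."""
    import asyncio

    from pydantic_ai import Agent
    from pydantic_ai.mcp import MCPServerStdio

    # The server runs as a subprocess over stdin/stdout, the default
    # MCP transport; the agent registers its tools automatically.
    server = MCPServerStdio("python", args=SERVER_ARGS)
    agent = Agent(
        "openai:gpt-4o",
        mcp_servers=[server],
        instructions=build_instructions(date.today()),
    )

    async def main() -> None:
        async with agent.run_mcp_servers():
            result = await agent.run(
                "How many downloads has pydantic had this year?"
            )
            print(result.output)

    asyncio.run(main())
```

The design point from the talk survives any API drift: the main agent only ever sees the tool's short description and its string result, while the SQL schema, examples, and retry loop stay inside the server's own context.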
It can just pass a natural language description of the query we're trying to make. Then, as you can see here, the MCP client calls into the MCP server, which runs a different Pydantic AI agent, which in turn makes a call to an LLM, and that call happens by proxying it through the client. That's where you can see the spans going client, server, client, server. If you look at the top-level exchange with the model, you'll see that the response returned from running the query was that kind of XML-ish data, and the LLM was able to turn it into a human description of what was going on. The other interesting thing is the agent call inside the MCP server: we can see the actual SQL it wrote and confirm that it indeed looks correct.

I'll end there and say thank you very much. We're at the Pydantic booth, so if anyone has any questions on this, or wants to see this fail in numerous other exciting ways, I'm very happy to talk. Come and say hi.