Your MCP Server is Bad (and you should feel bad) - Jeremiah Lowin, Prefect
Channel: aiDotEngineer
Published at: 2026-01-12
YouTube video id: 96G7FLab8xc
Source: https://www.youtube.com/watch?v=96G7FLab8xc
I really do appreciate that you're all here. I'm going to try to make this as painless as possible. We're not going to do an interactive part; we're going to talk through stuff, and I'm happy to go off script or take questions at any moment. My goal is to share a lot of things I've learned and make them as actionable as possible, so there is real stuff to do here, more than in a more high-level talk. But let's be honest: it is late, it is a lot, it is long. Let's talk about MCP. I'm hoping folks here are interested in MCP and that's why you came to this talk; if you're here to learn the basics of MCP, this might have a bit of a different bent. Just a show of hands: heard of MCP? Used MCP? Written an MCP server? Okay. Anyone feel uncomfortable with MCP, which is 100% fine? We can tailor. Okay, then let's dive in. This is who I am. I'm the founder and CEO of a company called Prefect Technologies. For the last seven or eight years we've been building data automation and orchestration software. Before that I was a member of the Apache Airflow PMC, and I originally started Prefect to graduate those same orchestration ideas into data science. Today we operate the full stack. A few years ago I developed an agent framework called Marvin, which I would not describe as wildly popular, but it was my way into the world of AI, at least from a developer experience standpoint, and I learned a lot from it. More recently I introduced a piece of software called FastMCP, which is wildly, wildly popular, maybe even too popular, hence my status today: I'm a little overwhelmed. I find myself back in an open source maintenance seat, which I haven't been in for a few years, and it has been a hell of a lot of fun.
But the most important thing is that FastMCP has given me a very specific vantage point, and that vantage point is really the basis for this talk. This is our downloads chart. I've never seen anything like this and I've never worked on a project like this: it was downloaded a million and a half times yesterday. There are a lot of MCP servers out there, and FastMCP has become the de facto standard way to build them. I introduced it almost exactly a year ago. As many of you are probably aware, MCP itself was introduced almost exactly a year ago, and a few days later I introduced the first version of FastMCP. David at Anthropic called me up and said, "I think this is great. I think this is how people should build servers." We put a version of it into the official SDK, which was amazing. And then as MCP has gone crazy over the last year, we found it constructive to position FastMCP, which I maintain, as the high-level interface to the MCP ecosystem while the SDK focuses on the low-level primitives. In fact, we're going to remove the FastMCP vocabulary from the low-level SDK in a couple of months; it's too confusing that there are two things called FastMCP. So FastMCP will be the high-level interface to the world, and as a result we see a lot of not-great MCP servers. I named the talk after this meme, and then it occurred to me: do people even know what this meme is anymore? To me it's very funny and very topical, and it's from a 1999 episode of Futurama. So if you haven't seen it, my talk's title is not meant to be mean. I'm an optimist; I choose to interpret it as "but you can do better." And so we're going to find ways to do better. That is the goal of today's talk. To be more precise, what I want to do today is build an intuition for agentic product design.
I don't see this talked about nearly as much as it should be, given how many agents are using how many products today. What I mean by it is the exact analog of a talk on how to build a good product for a human user. There we would talk about human interface guidelines, user experience, and user stories. I've found it really instructive to talk about those same things from an agentic perspective, because what is an MCP server but an interface for an agent? We should design it for the strengths and weaknesses of those agents the same way we do everything else. Now, when I put this thought into the world, I very frequently get this pushback: "But if a human can use an API, why can't an AI?" There are so many things wrong with this question. The number one thing is an assumption I see in so much of AI product design, and it drives me nuts: that AIs are perfect, that they're oracles, that they're good at everything. They are very powerful tools, but judging by your responses earlier, I think everyone in this room has some scars from the fact that they are fallible, limited, imperfect. So I don't like this question because it presumes they're magically amazing at everything. But I really don't like this question, and it's a literal question I've gotten, not a paraphrase, because humans don't use APIs. Very, very rarely do humans use APIs. Humans use products. We do anything we can to put something between us and an API: a website, an SDK, a client, a mobile app. We do not like to use APIs unless we have to, or unless we are the person responsible for building that interface.
And so one of my core arguments, and why I love MCP so much, is that agents deserve their own interface, optimized for them and their own use case. To design that interface, which is what I want to motivate today, we have to think a little about the difference between a human and an AI. It's one of those questions that sounds really stupid when you say it out loud, but it's instructive to actually go through. I'd argue the difference exists on three dimensions: discovery, iteration, and context. To begin: for humans, discovery is really cheap, and we tend to do it once. If any of you have had to implement something against a REST API, what do you do? You call up the docs, or you open Swagger, look at it one time, figure out what you need, and never do it again. So while discovery may take you some time, it is cheap over the lifetime of the application you are building. AIs, not so much. Every single time that thing turns on, it shakes hands with the server, learns about the server, and enumerates every single tool and every single description on that server. So discovery is actually really expensive for agents; it consumes a lot of tokens. Next, iteration. Same idea. If you're a human developer writing code against an API, you can iterate really quickly. Why? Because you did your one-time discovery, figured out the three routes you're going to call, and wrote a script that calls them one after another as fast as your language allows. Iteration is cheap and fast; if it doesn't work, you just run it again until it does. For agents, I think we all know iteration is slow. Iteration is the enemy.
Every additional call, subject to your caching setup, also sends the entire history of all previous tool calls over the wire. You do not want to iterate if you can avoid it, and that's going to be an important consideration. The last dimension is context, and this is a little hand-wavy, but it matters. As humans in this conversation, I'm talking, you're hearing me, and you're comparing it to memories and experiences on different time scales; your brain is doing wonderful, amazing things. When you plug an LLM into any given use case, it remembers the last 200,000 tokens it saw. That's the extent of its memory, plus whatever is embedded somewhere in its weights, and that's it. So we need to be very conscious of the fact that it has a very small brain at this moment. It's a lot closer to when people talk about sending Apollo 11 to the moon with one kilobyte of RAM, or whatever it was. I think that's how we actually need to think about these things that frankly feel quite magical, because they go and open my PRs for me or whatever it is they do. So these are, in my mind, the three key dimensions of what is different, and we should not build APIs that are good for humans on these dimensions and pretend they are also good for agents. One way I've started talking about this: an agent can find a needle in a haystack; the problem is it's going to look at every piece of hay and decide if it's a needle. That's not literally true, but it's an intuitive sense of how we should think about what we put in front of agents and how we pose a problem. And an MCP server is nothing but an interface to that problem and/or its solution.
Finally, to go back to our product intuition: I'd argue that the most important word in the universe for MCP developers is "curate." How do you curate, from a huge amount of information that might be fine for a human developer, an interface appropriate for one of these extremely limited AI agents, at least on the dimensions we just went through? That brings us to this slide: why MCP? I almost made this the Derek Zoolander slide: "But why MCP?" "But I just told you why MCP, Derek." It's because it does all of these things. It gives us a standard, controllable way of communicating information to agents, where we can control not only how it's discovered but also how it's acted on. There's a big asterisk on that, because client implementations in the MCP space right now are not amazing, and some of them do things that are not compliant with the MCP spec. Maybe at the end we'll get into that; it's not directly relevant now, except that all we can do is build the best servers we can, subject to the limitations of the clients that will use them. I don't think we need to go through what MCP is for this audience, so we'll move quickly, but for the sake of the transcript, the cliché is that it's USB-C for the internet: a standard way to connect LLMs to either tools or data. And if you haven't seen FastMCP, this is what it looks like to build a fully functional MCP server. I live in Washington, DC; the subway is often on fire there, so this server checks whether or not the subway is on fire, and indeed it is. Now, the question we're actually here to explore is: why are there so many bad MCP servers? Maybe a better question is, do you all agree with me that there are many bad MCP servers? I'm sort of declaring this as if it's true.
I'm not trying to make a controversial statement: there are many bad MCP servers in the world, and I see a lot of them because people are using my framework to build them. Does it surprise anyone that I'm declaring that? I'm genuinely curious whether I've made an assumption.
>> In my experience, I won't say every MCP server I've come across is like that, but a lot of them are just API wrappers. They stringify the content of the API, and that's it, and they call it an MCP server.
>> Yeah. And I'll make the argument, going a little off script here, that a lot of them, even when they're not wrappers, are just bad products, because no thought was put into them. One comparison I sometimes make with my team: if you go to a bad website, you know it's a bad website. We don't need to sit there and figure out why; it's ugly, or it's hard to use, or it's hard to find what you're looking for, or it's all flash. I don't know exactly what makes a bad website, but you know one when you see one. We don't like to point out all the flaws, because there's an infinite number of them; instead, we try to find great examples of good websites. And so what I think we need more than anything else are MCP best practices. A big push of mine right now, and part of where this talk came from, is making sure we have as many best practices as possible out in the world and documented. I do want to applaud a few firms here; these are screenshots. Block has an amazing playbook; if you hate this talk, read their blog post, which is a better version of what I'm doing right now. GitHub recently put one out, and many other companies have as well. I could have put a lot here, but these are two I've referred to quite frequently, so I recommend them to you.
The Block team in particular is phenomenal in what they're doing on MCP. By coincidence, the same team has been my customer for six years on the data side. I really love the work they do; the blog posts they put out are very thoughtful, and I highly recommend them to you. I want to see more of this, and today is one of my humble efforts to put some of it into the world. So here's what I thought we'd do today, because I did not want to ask you to open your laptops, set up environments, and actually write code with me at 4:25 on a Saturday: we're going to fix a server together, through slides, to keep this actionable but gentle. And here is the server you were describing a moment ago. Someone wrote this server; I hope the notation is clear enough. We have a decorator that says a function is a tool, and then we have the tool itself. Forgive me, I didn't bore you with the details, because we think this is a bad server to begin with. What's our example here? We want to check an order status. To do that, we need to learn a lot of things about the user, learn what their orders are, filter them, and actually check the status. If this were a REST API, which underneath it presumably is, we know exactly what we would do: make one call to each of the functions in sequence and return some user-facing output. It would be easy, observable, fast, and testable. Everything would be good. Instead, if we expose this to an agent: what order is it going to call these in? Does it know what the format of the arguments is? How long will the minimum three round trips this requires take?
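The slide code isn't captured in the transcript, but the pattern being criticized looks roughly like this. The endpoint and function names here are hypothetical, and a no-op stand-in replaces fastmcp's `@mcp.tool` decorator so the sketch runs anywhere:

```python
def tool(fn):  # no-op stand-in for fastmcp's @mcp.tool decorator
    return fn

# One tool per REST endpoint: best practice for a REST API, a trap for MCP.
@tool
def get_user(email: str) -> dict:
    """GET /users?email=..."""
    return {"id": 1, "email": email}

@tool
def list_orders(user_id: int) -> list[dict]:
    """GET /users/{user_id}/orders"""
    return [{"order_id": 7, "status": "shipped"}]

@tool
def get_order_status(order_id: int) -> str:
    """GET /orders/{order_id}/status"""
    return "shipped"

# To answer "what's the status of my latest order?", the agent must
# discover all three tools, guess the calling order, and pay for at
# least three round trips.
```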
These are all the problems we're exposing just by looking at this. I don't mean we'll solve them all right now, but those are the problems I'd see if I were reviewing this as a product-facing effort. So the first thing we're going to think about, and probably the most important thing for an effective MCP server because it is product thinking, is outcomes, not operations. What do we want to achieve? This is a little annoying for engineers sometimes, because it's forced product thinking; it's not someone coming along with a user story, mapping it all out, and saying "this is what we need to implement." We cannot put something in this server unless we know for a fact it's going to be useful and have a good outcome. We have to start there; there's just not enough context for us to be frivolous. Here's what this feels like. When you're falling into the trap, you have a whole bunch of atomic operations. That's amazing if you're building a REST API; it's best practice there. It is bad if you're building an MCP server. Instead, we want things like "track latest order, given an email." It's hard to screw up, and you know what the outcome is when you call it. The other version of the trap is agent-as-glue, or agent-as-orchestrator. Please believe me, since I've spent my career building orchestration and automation software: there are things that are really good at orchestration and things that are really bad at it, and agents are right in the middle, because they can do it, but it's expensive, slow, annoying, hard to debug, and stochastic. If you can avoid that, please do. When you can't, when you don't know the algorithm, you don't know how to write the code, and it's not programmatic, that's a perfect time to use an LLM as an orchestrator.
Finding out an order status is a really bad, really expensive time to choose an LLM as your orchestration service. So don't. Instead, focus on this idea of one tool equals one agent story. Even here we're trying to introduce a new vocabulary: it's not a user story, because with "user story" everyone thinks human, even though the agent is a user. It's an agent story. It's something a programmatic, autonomous agent with an objective and a limited context window is trying to achieve, and we need to satisfy that as well as we can. Then there's one of those little tips that feels obvious but matters: name the tool for the agent, not for you. It's not a REST API. It's not supposed to be clear to future developers who need to maintain it; you're not writing an API built for change, you're writing an API so the agent picks the right tool at the right time. Don't be afraid of silly but explanatory names for your tools. I shouldn't say silly; they might feel a little silly, but they're very user-facing in this moment, even though it feels like a deep API. And just in case any of you didn't read the Block blog post, I found this section of it so important, where they say something very similar: design top-down from the workflow, not bottom-up from the API endpoints. Two different ways to get to the same place, but they result in very different forms of product thinking and a very different MCP server. Again, I really encourage you to go take a look at that blog post. So let's go back to that bad code example I showed you a moment ago and start rewriting it. You're welcome to have your laptops out and follow along, and the code will essentially run, but there's no need. Here's what that could look like. We did the thing you would do as a human.
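The rewrite slide isn't in the transcript either; a sketch of the outcome-oriented version might look like this. The underscore-prefixed helpers are hypothetical stand-ins for the three underlying API calls, and the decorator is a no-op stand-in for fastmcp's `@mcp.tool`:

```python
def tool(fn):  # no-op stand-in for fastmcp's @mcp.tool decorator
    return fn

# Hypothetical stand-ins for the three underlying REST calls.
def _get_user(email: str) -> dict:
    return {"id": 1, "email": email}

def _list_orders(user_id: int) -> list[dict]:
    return [{"order_id": 7, "placed": "2026-01-10"},
            {"order_id": 9, "placed": "2026-01-11"}]

def _get_order_status(order_id: int) -> str:
    return "shipped"

@tool
def track_latest_order(email: str) -> dict:
    """Return the status of the most recent order for this email address."""
    # The same three API calls a human developer would script, stitched
    # together on the agent's behalf: one tool call, one round trip.
    user = _get_user(email)
    orders = _list_orders(user["id"])
    latest = max(orders, key=lambda o: o["placed"])
    return {"order_id": latest["order_id"],
            "status": _get_order_status(latest["order_id"])}
```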
We made the three calls to our API in sequence, but we buried them in one agent-facing tool. That's how we went from operations to outcomes. The API calls still have to happen; there's no magic here. The question is whether we ask an agent to figure out the outcome and how to stitch the calls together to achieve it, or whether we just do it on its behalf, because we know how. So thing number one is outcomes over operations. Thing number two, and frankly a lot of these are going to seem kind of silly when I say them out loud, but please trust me from the download graph that these are the most important pieces of advice I can offer (and if none of them apply to you, think of yourself as in the top 1% of MCP developers): flatten your arguments. I see this so often, and I'll confess I do it myself. You say, here's my tool, and one of the inputs is a configuration dictionary, hopefully documented somewhere, maybe in the agent's instructions, maybe in the docstring. By the way, I don't remember if I have a point for this later, so I'll say it now: a very frequent trap with complex arguments is that you put the explanation of how to use them somewhere like a system prompt or a sub-agent definition, and then you change the tool in the server. Now it's almost worse than a poorly documented tool: you have a doubly documented tool, one version wrong and one right, and only error messages will save you. That's really bad. The gentler version of this advice: just don't ask your LLM to invent complex arguments. Now, you could ask, what if it's a Pydantic model with every field annotated? Fine, that's better than the dictionary, but it's still going to be hard.
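For concreteness, the trap version might look like this hypothetical sketch, where everything is funneled through one loosely specified dictionary:

```python
def tool(fn):  # no-op stand-in for fastmcp's @mcp.tool decorator
    return fn

@tool
def list_orders(config: dict) -> list[dict]:
    """List orders. `config` accepts keys like 'email', 'limit',
    'include_cancelled', and 'format'... documented, hopefully, somewhere."""
    # The agent must invent a correctly shaped dict from prose documentation
    # that may no longer match what the server actually expects.
    return []  # hypothetical lookup elided
```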
Until very recently there was, and there may still be, a bug (maybe it's not a bug, because no one seems to fix it) in Claude Desktop where all structured arguments, meaning object arguments, would be sent as strings. This created a real problem, because we did not want to support automatic string-to-object conversion, but Claude Desktop is one of the most popular MCP clients, so we bowed to it as a matter of necessity. FastMCP will now, if you supply a string argument to something that is very clearly a structured object, try to deserialize it; it will try to do the right thing. I really hate that we have to do that. It feels deeply wrong to me that we have a type schema that says "I need an object" and yet we're doing kludgy stuff like that. So this is an evolving, somewhat messy ecosystem. But what does it look like when you do it right? Top-level primitives as the arguments into the function. What's the limit? What is the status? What is the email? Clearly defined. Just like naming your tool for the agent, name the arguments for the agent. Here's what that looks like in code: instead of config: dict, we have an email, which is a string, and include_cancelled, which is a flag. And I highly recommend literals or enums whenever you can; they're much better than a plain string when you know what the options are. At this time, very few LLMs know that this syntax is supported, so if you had Claude Code or something write this tool, it would usually write format: str = "basic", which works; it just doesn't know to constrain the type. So there's a little actionable tip: use Literal, or equivalently an enum, when you have a constrained choice. Your agent will thank you. And I do have a slide on instructions and context, so I got ahead of myself.
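The flattened version described above can be sketched like this (hypothetical tool and field names; a no-op stand-in replaces fastmcp's `@mcp.tool`):

```python
from typing import Literal

def tool(fn):  # no-op stand-in for fastmcp's @mcp.tool decorator
    return fn

@tool
def list_orders(
    email: str,                       # named for the agent, not the schema
    limit: int = 10,
    include_cancelled: bool = False,  # a flag, not a key buried in a dict
    format: Literal["basic", "detailed"] = "basic",  # constrained choice
) -> list[dict]:
    """List a customer's recent orders, newest first."""
    return []  # hypothetical lookup elided
```

With the `Literal`, the schema itself tells the agent the only two valid values, instead of hoping it guesses a well-formed string.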
I'm sorry, everybody. It is 4:35 on a Saturday. The next thing I want to talk about is the instructions you give to the agent. This cuts both ways. The most obvious failure is when you have none. As we mentioned a moment ago, if you don't tell your agent how to use your MCP server, it will guess. It will try, it will probably confuse itself, and all of those guesses will show up in its history. That's not a great outcome. Please document your MCP server: document the server itself, document all the tools on it, and give examples. Examples are a bit of a double-edged sword. On the one hand, they're extremely helpful for showing the agent how it should use a tool. On the other hand, it will almost always do whatever is in the example. This is just one of those quirks; perhaps as models improve it will stop. But in my experience, if you have an example, say a field where you want to collect tags, and your example has two tags, you will never get ten tags. You will get two tags pretty much every time. They'll be accurate; it's not going to do a bad job. But the agent uses those examples along a lot more dimensions than just the fact that they work, if that makes sense. So use examples, but be careful with them. Yes, sir?
>> Have you seen giving out-of-distribution examples as a way to solve for that?
>> By out-of-distribution, do you mean...?
>> Examples that would not be representative of actual usage.
>> It's so interesting. I don't have a strong opinion on that; it seems super reasonable to me. In my experience, the fact that an example has some implicit pattern, like the number of objects in an array, becomes such a strong signal that I almost gave this its own bullet point called "examples are contracts."
If you give one, expect to get something like it back. Out-of-distribution examples are a really interesting way to fight against that inertia, and I would imagine it's better to do it that way; I would just be careful of falling into the same base-layer trap. So that's completely reasonable and I'd endorse it, with the broader caveat that whatever example you put out there, its weird quirks will show up. On an MCP server I'm building, I ran into this tag thing just yesterday, and it really confused me: no matter how much I said "use at least 10 tags," it always used two, and I finally figured out it was because one of my examples had two tags. So yes, good strategy; it may or may not be enough to overcome these basic caveats. Oh, and I do have "examples are contracts" on the slide. I'm sorry, it's 4:37. This next one, I think, is one of the most interesting things on this slide: errors are prompts. Your LLM doesn't know that a response coming out of a tool is "bad." It's not like it gets a 400 or a 500. It gets what it sees as information about the fact that it didn't succeed in what it was attempting. So if you just allow Python, in FastMCP's case, or whatever your tool of choice is, to raise, for example, an empty ValueError, or a cryptic MCP error with an integer code, that's the information that goes back to your LLM. Does it know what to do with it? Probably it knows at least to retry, because it knows there was an error. But you actually have an opportunity to document your API through errors, and that leads to some interesting strategies that I don't want to wholeheartedly endorse, but I will mention, for when you do have a complex API, because sometimes you can't get away from that.
Then, instead of documenting every possibility in the docstring that documents the entire tool, you might document how to recover from the most common failures. It's a very weird form of progressive disclosure, where you acknowledge that the agent will likely get its first call wrong, but based on how it gets it wrong, you have an opportunity to send more information back in the error message. As I said, it's not an amazing way to think about building software, but it is the ultimate version of what I'm recommending, which is: be as helpful as possible in your error messages. Do go overboard. As far as the agent is concerned, they become part of its next prompt, so they matter. If they are too aggressive or too scary, it may avoid the tool permanently; it may decide the tool is inoperable. So errors really matter. I don't think this needs much explanation, but this is what it looks like when you have a full docstring, an example, and so on. Block, in their blog post, makes a point I haven't seen used widely, although ChatGPT does take advantage of it in developer mode: the read-only hint. The MCP spec supports annotations, a restricted set of hints you can place on various components. One of them, for tools, is whether or not the tool is read-only. If you supply this, clients can optionally choose to treat that tool a little differently. The motivation behind the read-only hint was basically to help with setting permissions. Now, I don't know who here is a fan of --yolo or --dangerously-skip-permissions or whatever the flags are called in different terminals; if that's you, you don't care about this.
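Pulling the last two tips together, here is a hedged sketch of a tool whose error message teaches recovery and which carries the read-only annotation. Names are hypothetical, and a stand-in decorator replaces fastmcp, where this would be roughly `@mcp.tool(annotations={"readOnlyHint": True})` with fastmcp's `ToolError` raised instead of a bare `ValueError`:

```python
# Stand-in that records MCP tool annotations so this sketch runs anywhere.
def tool(annotations=None):
    def register(fn):
        fn.annotations = annotations or {}
        return fn
    return register

VALID_STATUSES = ("pending", "shipped", "delivered", "cancelled")

@tool(annotations={"readOnlyHint": True})
def track_latest_order(email: str, status: str = "any") -> dict:
    """Read-only: look up the newest order for a customer, optionally
    filtered by status."""
    if status != "any" and status not in VALID_STATUSES:
        # The error message is the agent's next prompt: teach it how to
        # recover instead of raising an empty, cryptic error.
        raise ValueError(
            f"Unknown status {status!r}. Retry with one of "
            f"{', '.join(VALID_STATUSES)}, or omit 'status' to match all."
        )
    return {"email": email, "status": status}  # hypothetical lookup result
```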
But ChatGPT, for example, will ask you for extra permission if a tool does not have this annotation set, because it presumes the tool can take a side effect and have an adverse impact. So use annotations to your advantage; they're one more form of design the client can use to provide a better experience. Next, and I've talked about this a bit already: respect the token budget. The meme right now is that the GitHub server ships something like 200,000 tokens when you handshake with it. This is a real thing, and I don't think it automatically makes the GitHub server bad. I think it makes it incumbent on folks like me who build frameworks, and folks who build clients, to find ways to actually solve this problem, because the answer can't always be "do less." Right now we want to do more; we want an abundance of functionality. We'll maybe talk about that a little later. But respect for the token budget really matters. It is a very scarce resource, and your server is not the only one the agent is going to talk to. I was on a call recently with a customer who was so excited to be rolling out MCP. I met with the engineering team, and to be clear, this is an incredibly forward-thinking, high-performing, massive company that I deeply respect; I won't say who they are. They got on the call, so excited, and said, "We're in the process of converting our stuff to MCP so that we can use it," and they had a strong argument for why it actually had to be their API. That's not even the punch line of the story, which is a whole other story in itself, but it fundamentally came down to this: they had 800 endpoints that had to be exposed. To which I had this thought: by the time you finish reading this sentence, that is the token budget for each of those 800 tools, if you assume a 200,000-token context window.
So if each of those 800 tools had only that much space, roughly 250 tokens each, not even to document itself, just to share its schema and name plus documentation, that's all the room you would get. And when you were done taking up that space, because you were so careful and each tool really fit, you would lobotomize the agent on handshake, because it would have no room left for anything else. The token budget really matters. If this agent connected to one more server with one more tool that had a one-word docstring, it would just fail; it would effectively overflow. So the token budget matters. There is probably a budget appropriate for whatever work you're doing. You may or may not know what it is; pretend you know and be mindful of it. In the worst case, try to be parsimonious, as efficient as possible. That's why we do experiments like sending additional instructions in the error message: it's one way to save on the token budget at handshake. And the handshake is painful. I'm not sure folks know that when an LLM connects to an MCP server, it typically downloads all the descriptions in one go so it knows what's available to it. It's usually not done in a progressively disclosed way; it's done outright. Yes?
>> We use progressive disclosure mechanisms: when it first initializes, it gets only tool names, with a describe step for each one. So it's 95% less context window, and a tool's details aren't exposed to the agent unless it needs them.
>> Okay, that's awesome. Let's talk about this idea for a second, because it's a really interesting design. There's a debate right now about what you can do that's compliant with the spec versus what's not. As long as you do things that are compliant with the spec, then by all means do them. Who cares?
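A minimal sketch of the design the audience member described, with hypothetical names and a no-op stand-in for fastmcp's decorator: advertise names only at handshake, and let the agent pull a tool's full documentation on demand through a meta-tool.

```python
def tool(fn):  # no-op stand-in for fastmcp's @mcp.tool decorator
    return fn

# Hypothetical catalog of full tool documentation, kept server-side.
_CATALOG = {
    "track_latest_order": "Return the newest order's status for an email.",
    "cancel_order": "Cancel an order by id. Destructive.",
}

@tool
def list_tool_names() -> list[str]:
    """Cheap handshake: names only, a fraction of the full-schema cost."""
    return sorted(_CATALOG)

@tool
def describe_tool(name: str) -> str:
    """Fetch one tool's full documentation only when the agent needs it."""
    if name not in _CATALOG:
        # Recovery-oriented error: tell the agent what names do exist.
        raise ValueError(
            f"No tool named {name!r}. Known tools: {', '.join(sorted(_CATALOG))}."
        )
    return _CATALOG[name]
```

The trade-off, as discussed below, is that the agent now needs tools to learn about tools, so the meta-layer itself has to be designed carefully.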
One of the problems is that there are clients that are not compliant with the spec. Claude Desktop is one of them. I've mentioned it a few times; I have a history with Claude Desktop. Claude Desktop hashes all of the tools it receives on first contact and puts them in a SQLite database, and it doesn't care what you do after that. It doesn't care that the spec allows you to send more information. I think your solution would get around this because it's a tool call. But many of the first attempts people make to use spec-compliant techniques for getting around this problem, such as notifications, fail in Claude Desktop. Usually you've failed before this in Claude Desktop. I'm not a fan of Claude Desktop from an MCP server author's perspective. I think it's a real missed opportunity, because it is such a flagship product of the company that introduced MCP. Claude Code is great. But again, everything gets cached in that SQLite database, so it doesn't matter what you do. Techniques similar to what you've described, where you provide mechanisms for learning more about a tool — that's a great idea; I really like it. There is a challenge where you are now back in a sort of flattened-arguments world, because you have meta-tools: I need to use tools to learn about tools, and tools to call tools, in some extreme cases or beyond. So you need to design this very carefully; that's why it usually shows up as a dedicated product. So thank you for sharing that. There are many really interesting techniques for trying to solve this problem. Yes?
>> So you talked about progressive disclosure. Do you use masking? For example, I connect to my Kubernetes server and my credentials only give me certain rights, so there are 28 tools I don't have access to — and therefore you don't need to show them.
>> So when you say "do I support that," do you mean does MCP support that, or do I in my product support that?
>> Yeah, I was just asking about something I've read about.
>> Okay. So the spec makes no claim about this. The spec says when you call list tools, you get tools back, and how that happens is up to the implementation. FastMCP makes that an overridable hook through middleware, but again makes no claim about how it works. Prefect's commercial products, which I'm not here to pitch, allow per-tool masking on any basis. And we see that as a place to have an opinion in the commercial landscape, as opposed to an opinion in the open-source landscape, as opposed to the protocol, which should have no opinion at all. So if that's interesting, we can chat about it.
>> You might be getting into this, but if you take this problem — the example you mentioned — is the solution a kind of table-of-contents approach, split over four different chunks? Or maybe the 800 don't all justify having their own server. What was the solution for them?
>> They can't do it. There's no solution that allowed them to have as much information as they wanted in the context window — and they didn't need it. It became a design question. Frankly, that call was probably four months ago now, and it was just call after call after call like this, which made me realize we need to have more talks like this one, and just talk about what it is to design a product for an agent. My worry is that MCP is viewed as infrastructure or a transport technology — and it is, and I'm very excited about it. I think a year from now we will be talking about context products as opposed to MCP servers. I'm very excited about that; we'll move past the transport. But we need to figure out how to use it, and I think that's how we should talk about it. The only other alternative I have discussed with a few companies, when you have a problem like this, is: if you control the client, much more interesting things become available to you.
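Before moving on: the table-of-contents / meta-tool idea from the earlier audience exchange can be sketched without any particular framework — one tool that returns one-line summaries, and one that returns the full schema on demand. All tool names and entries here are hypothetical:

```python
# Hypothetical registry of fully documented tools. In a real server this
# would be generated from your actual tool definitions.
FULL_TOOLS = {
    "create_invoice": {
        "summary": "Create a draft invoice for a customer.",
        "schema": {"type": "object",
                   "properties": {"customer_id": {"type": "string"},
                                  "amount_cents": {"type": "integer"}}},
        "docs": "Creates a draft invoice. Call send_invoice to deliver it.",
    },
    "send_invoice": {
        "summary": "Send a previously created draft invoice.",
        "schema": {"type": "object",
                   "properties": {"invoice_id": {"type": "string"}}},
        "docs": "Sends the invoice by email and marks it as issued.",
    },
}

def list_tool_summaries() -> dict[str, str]:
    """Cheap handshake surface: one line per tool instead of the full schema."""
    return {name: t["summary"] for name, t in FULL_TOOLS.items()}

def describe_tool(name: str) -> dict:
    """Second step: the agent pays the token cost only for tools it will use."""
    return {"schema": FULL_TOOLS[name]["schema"], "docs": FULL_TOOLS[name]["docs"]}
```

The trade-off called out above applies: the agent now needs tools to learn about tools, so this is worth doing only when the full surface genuinely doesn't fit the budget.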
If you can instruct your client to do things a certain way — for example, if you have a mobile app that presents an agentic interface to an end user — that's what I mean by controlling the client. Or if it's internal and you can dictate what client, or what custom client, a team uses. Now you can do much more interesting things, because you actually do know a lot more about that token budget and how to optimize it. But for an external-facing server, there's not a good solution. I think by now we have talked through all of this, so I'll leave it for posterity in the interest of time. We talked about "curate" as a key verb earlier in this talk. It is, I would argue, what we have been doing in each of these little vignettes we've been working through with the code: we are curating the same information set down to one that is more amenable and more recognizable to an agent. Fifty tools is where I draw the line where you're going to have performance problems. I know that seems really low to a lot of people; some would put it even lower than that, some higher. If you have more than 50 tools on a server, without knowing anything else about it, I'm going to start to think it's not a great server. The GitHub server has, I think, 170 tools. Does that mean it's not a great server? No — there's a good argument there, and the GitHub team has put out a lot of really interesting blog posts on the semantic routing they're doing. They had one just yesterday, actually, on some interesting techniques they're using. And there's software, like the one you mentioned a moment ago, sir, which helps with this problem. So having a lot of tools does not automatically make it a bad server, but it is a smell, and it does make me wonder: can we split them up? Do you have admin tools mixed in with user tools? Could we namespace these tools differently?
Would it be worthwhile having two servers instead of one? That is a little bit of a smell. If you can get down to 5 to 15, that would be ideal. I know that's not achievable for most people, so it's one of those actionable-but-maybe-not-so-actionable little tips — an aspiration you should have. Just be careful, unless you are prepared to invest in a lot of care and evaluation. Fifty tools per agent — I should have said per agent. If I have a 50-tool server and you have a 50-tool server, that's 100 tools to the agent. That's where the performance bottleneck is, not on the server. Sorry, the slides should be corrected: it's 50 tools to the agent where you start to see performance degradation. I love this next one. Kelly [surname unclear] is someone I've known a long time; he's at Fiverr now. While I was putting this talk together, I happened to come across these two blog posts of his, which are a little bit of a shot and a chaser. They're written almost exactly a month apart — one's from October, one's from November. In the first one, he talks about building up a Fiverr server, and he goes from a couple of basic tools to, I think, 188. And in the second blog post, he talks about how he curated that server from 188 down to five. You could read either of these blog posts independently as a success story about his adventure in learning MCP. Taken together, I think they tell a really interesting story about making something work and then making something work well — which is, of course, the product journey in some sense. And so where this takes us — sorry, do you have a question? Oh, sorry — where this takes us is the most obvious version of this. I wrote a blog post that went a little bit viral on this, which is why I talk about it a lot: please, please — if nothing else — stop converting REST APIs into MCP servers.
It is the fastest way to violate every single thing we've talked about today — every single one of the heuristics we laid out about agents. It really doesn't work. And it's really complicated, because this is the FastMCP documentation, and that's a blog post I had to write. The blog post basically says: I know I introduced the capability to do this; please stop. That's a really complicated thing — it could be a workshop in and of itself. I do bear a little bit of responsibility here. This is not just a feature of FastMCP, it's one of the most popular features of FastMCP, which is why, candidly, it's not going anywhere. Instead, we're going to document around that fact. But here's the problem: you just can't convert a REST API into a good MCP server. I'm not going to explain it again — you just can't. But it is an amazing way to bootstrap. When you are trying to figure out if something is working, do not write a lot of code that introduces new ways to find out you have failed. Do start by picking a couple of key endpoints and mirroring them out with FastMCP's auto-converter, or any other tool you like, or even just write that code yourself. Make sure you solve one problem at a time, and make the first problem: can you get an agent to use your tool at all? Once it's using it, by all means strip out the part that just regurgitates the REST API, and start to curate it and apply some of what we've talked about today. This is just one of those candid things, right? It is the fastest way to get started. You don't have to do it this way; I start this way. Just don't ship the REST API to prod as an MCP server. You will regret it; you will pay for it a little later, even though there's a dopamine hit up front.
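That bootstrap-then-curate sequence can be sketched in plain Python. The endpoint and tool names below are hypothetical, and the "auto-generated" set stands in for whatever a converter like FastMCP's would produce:

```python
# Step 1 (bootstrap): imagine an auto-converter has mirrored every REST
# endpoint into a tool -- one entry per endpoint, names straight from the API.
auto_generated = {
    "get_user", "list_users", "patch_user", "delete_user",
    "get_invoice", "list_invoices", "post_invoice", "void_invoice",
    "get_invoice_pdf", "list_invoice_events",
}

# Step 2 (prove it works): expose only the one or two endpoints you need to
# answer the first question -- will an agent use my tool at all?
bootstrap_surface = {"list_invoices", "get_invoice"}

# Step 3 (curate): replace the REST mirror with a small set of
# outcome-oriented tools, named for workflows rather than endpoints.
curated_surface = {"find_overdue_invoices", "send_invoice_reminder"}

assert bootstrap_surface <= auto_generated       # bootstrap is a subset of the mirror
assert not (curated_surface & auto_generated)    # curated tools aren't endpoint mirrors
print(len(auto_generated), "->", len(curated_surface))  # 10 -> 2
```

The point of step 3 is exactly the "outcomes, not operations" heuristic: the agent sees two workflows, not ten endpoints.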
So these are the five major things we talked about today, in our pseudo-workshop — the actionable talk that wasn't really a workshop. Outcomes, not operations: focus on the workflow, focus top-down, don't get caught up in all the little operations, and don't ask your agent to be an orchestrator unless you absolutely have to. Flatten your arguments: try not to ship large payloads, try not to confuse the agent, try not to give it too much choice. I don't think I said this out loud when we talked about it, but try not to have tightly coupled arguments — that really confuses the agent — and see if you can design around that if possible; it's not always possible. Instructions are context: it seems obvious to say out loud — of course they are, they're information for the agent — but use them as context, design them as context, and put real thought into your instructions, the same way you would into your tool signature and schema. Respect the token budget: you have to. This is the only one on this list where, if you don't actually do it, you will simply not have a usable server. The others you can get away with, and frankly the art of this intuition is to start with these rules and then work backwards into practicality — but this is the only one where you can't actually cross the line. And then: curate ruthlessly. If you do nothing else, start with what works and then tear it down to the essentials. I have been writing MCP servers about as long as anyone at this point — a year — and I still find myself starting by putting too many tools in the world, sometimes because I'm not sure which one the agent will use, or because I'm experimenting, and I have to remind myself to go back and get rid of them. And it's hard. I think as an engineer, especially one designing normal APIs, you think: okay, here's my tool, here's v2, it's backwards compatible — and you keep adding stuff. That's a really natural way to work, and it can be a best practice, and it doesn't work here. It would be like showing a raw REST API to a user as a UI.
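One concrete shape of the tightly-coupled-arguments problem: one argument's valid values depend on another argument's value. A sketch of the coupling, and one way to flatten it by splitting into separate self-describing tools — all names here are hypothetical:

```python
# Coupled version: `processor` is only valid for certain `file_type`s, and
# the agent has to hold that cross-argument rule in its head.
VALID_PROCESSORS = {
    "csv": {"summarize", "dedupe"},
    "pdf": {"summarize", "extract_text"},
}

def process_file_coupled(file_type: str, processor: str) -> str:
    if processor not in VALID_PROCESSORS.get(file_type, ()):
        raise ValueError(f"{processor!r} is not valid for {file_type!r} files")
    return f"{processor} on {file_type}"

print(process_file_coupled("csv", "summarize"))  # summarize on csv

# Flattened version: each tool's signature states everything it needs.
# More tools, but no hidden dependency between arguments -- nothing for
# the agent to get wrong about which combinations are legal.
def summarize_csv(path: str) -> str: ...
def dedupe_csv(path: str) -> str: ...
def extract_pdf_text(path: str) -> str: ...
```

Whether the flattened version is worth the extra tool count is a judgment call against the token budget; the point is that the coupling itself is a cost.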
This is a criticism I have offered of my own products at times, when I look and say: this looks a little too much like our REST API docs; we're not doing our job of giving this to our users in a consumable way. So if I can leave you with just one thought, it's this: you are not building a tool, you are building a user interface. Treat it like a user interface, because it is the interface your agent is going to use, and you can do a better job or a worse job, and either you or your users will benefit from that. I think we are at time, so I'm going to open it up for questions, or what's next, or what other challenges we can solve. I hope I walked the tightrope between things that are useful to you all and things that don't require you to write any code at 4:54 on a Saturday. But I hope there were some useful nuggets in there for you — more than you came in with. Happy to take any questions if there are any.
>> [Audience: What are tightly coupled arguments, typically?]
>> That would be where you have one argument that's, say, "what is the file type," and another that's "how should we process the file," and your input to the file-type argument determines the valid inputs for the other argument. So they're tightly coupled: some values for the second argument are invalid depending on what you said for the first. It's just one extra thing for the agent to keep track of. That's a good question — sorry I didn't define that. Do you have a question?
>> I'll start with the first one. When you are giving an agent an MCP server, you have to document the tools — the capabilities of the server — both in the server and in the agent, and that is not ideal. What would you recommend: both, or only in the server?
>> So this comes down to: do you control the client or not?
If you control the client, then this is a real choice, and there are different ways to think about it. For example, in some of my stuff that I write, where I know I'm using, say, Claude Code to access it, I might actually document my MCP server as files or Claude skills, because I know what the workflows are going to be, and I know some of my workflows are infrequent and I don't want to pollute the context space with them. So if you control the client, you have a real choice to make there. If you don't control the client, then you don't have much of a choice: you have to document it in the server, because you have to assume you're working with the worst possible client. Honestly, many of the answers in MCP space boil down to: do you control the client? If so, you can do really interesting things on both sides of the protocol. From a server author's perspective, you really do need to document everything in its docstring. The one escape hatch is that you can document the server itself — every server has an instructions field. It is not respected by every client; I believe my team has filed bugs where we have determined that to be the case, so hopefully that's not permanent. But most clients will, on handshake, download not only the tools and resources and everything else, but an instructions blob for the server itself. How much information you can put in there — I'd be careful; I don't think the client wants to read a novel. But you do have this one other opportunity to document the high level of your server.
>> Another one, but —
>> Oh, yeah. Well, let's mix it up and we'll come back. Did you have a question?
>> [Audience question about an upcoming spec proposal, partly inaudible.]
>> I'm not a member of the core committee, but I'm in very close contact with them, so maybe I can answer your question.
>> I'm so excited about this.
>> Yes, this I know a lot about. It's going to expand.
It's not actually going to change so much, because of the way it's implemented. What question could I answer — like, what is it? Am I excited about it? I am excited about it. And yes, all the rules still apply — that is a fantastic question; let's talk about it for one second. I don't know if any of you were at the meetup we hosted last night, where my colleague gave a presentation on this — oh, you were. I knew at least somebody was coming. My colleague Adam gave a very good talk on it, which we can chat about after this; I'll send you a link to a recording. But the nutshell version is this: SEP 1686 is the name of the proposal, and it adds asynchronous background tasks to the MCP protocol — not just for tools, but for every operation. We don't need to talk too much about what that is. The reason it doesn't involve changes to any of these rules is that it's essentially an opt-in mode of operating, in which the client says, "I want this to be run asynchronously," and therefore the client takes on new responsibilities: checking in on the task, polling for the result, and actually collecting the result. But the actual interface of learning about the tool, calling the tool, and so on is exactly the same as it is today. So this is fully opt-in on the client side, and that's why, from a design standpoint, nothing changes. The only question from a server designer's standpoint is: is this an appropriate thing to background, as opposed to doing it synchronously on the server? Or — sorry, let me take that back. You can background anything, because it's a Python framework; you can chuck anything into the background. The real question is: should the client wait for it or not? "Should it be a blocking task?" is really the right vocabulary. And that's just a design question for the server maintainer.
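The shape of that new client responsibility — opt in, then check in until the result is ready — can be sketched generically. This illustrates the polling pattern only; it is not the actual SEP 1686 wire format, and all names are hypothetical:

```python
import time

# Hypothetical server-side task store: a background task reports "working"
# until it finishes, at which point a result becomes available.
TASKS = {"task-1": iter(["working", "working", "completed"])}

def poll_status(task_id: str) -> str:
    return next(TASKS[task_id])

def get_result(task_id: str) -> str:
    return "42 rows exported"

# Client side: having opted in to async execution, the client -- not the
# server -- owns the loop of checking in and collecting the result. The
# tool-calling interface itself is unchanged.
def wait_for(task_id: str, interval: float = 0.0) -> str:
    while poll_status(task_id) != "completed":
        time.sleep(interval)  # a real client would back off here
    return get_result(task_id)

print(wait_for("task-1"))  # 42 rows exported
```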
Does that answer it — am I in the zone of what you were looking for?
>> [Audience comment, inaudible.]
>> Oh, no kidding. Yes, this happens a lot, actually — but until you said this, I didn't think of it as a pattern. I've seen it a lot; it's a real problem. Maybe we'll write a blog post on it — that would be fun.
>> Yes, the rules still apply. But as far as elicitation is concerned, how do you handle that?
>> Elicitation is really interesting. So now we're in advanced MCP. Elicitation — anyone not familiar with what that is? Yes? So elicitation is basically a way to ask the client for more input halfway through a tool execution. You take your initial arguments for the tool, then you do an elicitation — it's a formal MCP request — and you say, "I need more information." And what's kind of cool about it is that it's structured. The most common use case, in clients that support it, is approvals: you say, "I need a yes or no on whether I can proceed," for, say, some irreversible side effect. When it works, it works amazingly. Again, it's one of those things that doesn't have amazing client support, and therefore a lot of people don't put it in their servers, because it'll break your server if you send this out and the client doesn't know what to do with it. So you've got to be a little careful. Does it change the design? That's a fantastic question. I wish it were used more, so I could say yes and you should depend on it. And the reason all clients don't support this one, by the way — it's not like a meme that clients are bad — is that it's complicated to know how to handle elicitation. Some clients are user-facing; then it's super easy, just ask the user and give them a form. Some clients are automated, some are backgrounded, and so what you do with an elicitation is actually kind of complicated.
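A simpler approval pattern that works even without elicitation support is the `confirm` argument that defaults to false, forcing the model to acknowledge a destructive action explicitly. The tool name and messages below are hypothetical:

```python
def delete_environment(env_id: str, confirm: bool = False) -> str:
    """Destructive tool: permanently deletes an environment.

    Because confirm defaults to False, a bare call never destroys
    anything -- the model must acknowledge the side effect explicitly.
    """
    if not confirm:
        return (
            f"Refusing to delete {env_id!r}: this action is irreversible. "
            "Call again with confirm=True to proceed."
        )
    return f"Environment {env_id!r} deleted."

print(delete_environment("staging"))                # refused; nothing happened
print(delete_environment("staging", confirm=True))  # actually deletes
```

It's a blunter instrument than elicitation — the model can simply pass `confirm=True` on the first try — but it works in every client, which is the trade-off discussed above.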
If the LLM just fills it in itself, maybe it satisfied the elicitation, maybe it didn't — it's a little tough to know. So, if elicitation were widely used, I would say absolutely: it gives you an opportunity to move particularly tightly coupled arguments into an elicitation prompt, or confirmations. A lot of times, for destructive tools, you'll see a confirm argument that defaults to false, and you're forcing the LLM to acknowledge the action — at least as a way of, hopefully, tipping it into a saner operating mode. Elicitation is a better way to design for that. I don't think that made it into any of these examples. So — great question. Wish I could say yes; I hope to say yes. How about that? You had a second question?
>> Yeah. In my job, the main thing I do is build agents — with the OpenAI SDK or something like that — and I usually just write the tools, and the tools call the APIs, and I don't really see the need for MCP in that space. Do you agree that MCP is not needed there, or do you have a different view?
>> I do. I would not tell you to write an MCP server. I think that within a year, the reason you would choose to write an MCP server is that you'll get better observability and understanding of what failed, whereas the agent frameworks are not great at that, because part of an agent framework's whole job is to not fail on a tool call but to surface the error back to the LLM — similar to what we were talking about a moment ago. So you often don't get good observability into tool-call failures. Some frameworks do, but not all. And so one of the reasons to use an MCP server, even for a local case like that, is just that you now have automatic infrastructure for debugging and diagnosis. I don't think that's the strongest reason to do it today; I think that will come in a year, when the ecosystem is more mature.
I think if you fully control the client, and you're doing client orchestration, and you are writing the agentic loop and you're the only one — do whatever you want.
>> I think all of the advice you gave today also applies when you're building tools directly.
>> It absolutely does. Everything we said today applies to, say, a plain Python tool. Absolutely — and that's how FastMCP treats it. Good question. Any last questions? Happy to take them. Yes?
>> [Audience question about code mode.]
>> Yes — excited. So code mode is something that Cloudflare actually blogged about first, and then Anthropic followed up, where you solve some of the problems I just described by asking the LLM to write code that calls MCP tools in sequence. It's a really interesting sidestep of a lot of what I just talked through. The reason I don't recommend it wholeheartedly is that it brings in other problems — sandboxing and code execution, for instance — but if you're in a position to do it, it can be super cool. I actually have a colleague who, the day that came out, wrote a FastMCP extension that supports it, which we put in a package somewhere. At first we didn't want to put it in FastMCP main, because FastMCP tries to be opinionated and we weren't sure how to fit it in. Then it was so successful that we decided to add an experiments flag to the CLI and include it — I don't know if it's in yet. I forget if we called it experiments or optimize; it's on our roadmap right now, and this would go in there. And then there's a whole world right now of optimizing tool calls and so on. But I'd like to be respectful of your time and let you all go back to your lives. You're very kind to spend an hour talking about MCP with me.
I'm more than happy to keep talking if anybody has questions, but I would like to free you all from the conference. I hope you enjoyed the talk, and thank you very much for attending.