Your MCP Server is Bad (and you should feel bad) - Jeremiah Lowin, Prefect
Channel: aiDotEngineer
Published at: 2026-01-12
YouTube video id: 96G7FLab8xc
Source: https://www.youtube.com/watch?v=96G7FLab8xc
I really do appreciate that you're all here. I'm going to try to make this as painless as possible. We're not going to do an interactive part; we're going to talk through stuff, and I'm happy to go off script or take questions at any moment. My goal is to share a lot of things I've learned and make them as actionable as possible, so there is real stuff to do here, more than in a more high-level talk. But let's be honest: it is late, it is a lot, it is long. Let's talk about MCP. I'm hoping folks here are interested in MCP and that's why you came to this talk; if you're here to learn the basics of MCP, this might have a bit of a different bent. Just a show of hands: heard of MCP? Used MCP? Written an MCP server? Okay. Anyone feel uncomfortable with MCP, which is 100% fine? We can tailor. Okay, then let's dive in. This is who I am. I'm the founder and CEO of a company called Prefect Technologies. For the last seven or eight years we've been building data automation and orchestration software. Before that I was a member of the Apache Airflow PMC, and I originally started Prefect to graduate those same orchestration ideas into data science. Today we operate the full stack. A few years ago I developed an agent framework called Marvin, which I would not describe as wildly popular, but it was my way into the world of AI, at least from a developer experience standpoint, and I learned a lot from it. More recently I introduced a piece of software called FastMCP, which is wildly, wildly popular, maybe even too popular, hence my status today: I'm a little overwhelmed. I find myself back in an open source maintenance seat, which I haven't been in for a few years, and it has been a hell of a lot of fun.
But the most important thing is that FastMCP has given me a very specific vantage point, and that vantage point is really the basis for this talk. This is our downloads chart. I've never seen anything like this and I've never worked on a project like this: it was downloaded a million and a half times yesterday. There are a lot of MCP servers out there, and FastMCP has become the de facto standard way to build them. I introduced it almost exactly a year ago. As many of you are probably aware, MCP itself was introduced almost exactly a year ago, and a few days later I introduced the first version of FastMCP. David at Anthropic called me up and said, "I think this is great. I think this is how people should build servers." We put a version of it into the official SDK, which was amazing. And then as MCP has gone crazy over the last year, we found it constructive to position FastMCP, which I maintain, as the high-level interface to the MCP ecosystem while the SDK focuses on the low-level primitives. In fact, we're going to remove the FastMCP vocabulary from the low-level SDK in a couple of months; it's too confusing that there are two things called FastMCP. So FastMCP will be the high-level interface to the world, and as a result we see a lot of not-great MCP servers. I named the talk after this meme, and then it occurred to me: do people even know what this meme is anymore? To me it's very funny and very topical, and it's from a 1999 episode of Futurama. So if you haven't seen it, my talk's title is not meant to be mean. I'm an optimist; I choose to interpret it as "but you can do better." And so we're going to find ways to do better. That is the goal of today's talk. To be more precise, what I want to do today is build an intuition for agentic product design.
I don't see this talked about nearly as much as it should be, given how many agents are using how many products today. What I mean by it is the exact analog of a talk on how to build a good product for a human user. There we would talk about human interface guidelines, user experience, and user stories. I've found it really instructive to talk about those same things from an agentic perspective, because what is an MCP server but an interface for an agent? We should design it for the strengths and weaknesses of those agents the same way we do everything else. Now, when I put this thought into the world, I very frequently get this pushback: "But if a human can use an API, why can't an AI?" There are so many things wrong with this question. The number one thing is an assumption I see in so much of AI product design, and it drives me nuts: that AIs are perfect, that they're oracles, that they're good at everything. They are very powerful tools, but judging by your responses earlier, I think everyone in this room has some scars from the fact that they are fallible, limited, imperfect. So I don't like this question because it presumes they're magically amazing at everything. But I really don't like this question, and it's a literal question I've gotten, not a paraphrase, because humans don't use APIs. Very, very rarely do humans use APIs. Humans use products. We do anything we can to put something between us and an API: a website, an SDK, a client, a mobile app. We do not like to use APIs unless we have to, or unless we are the person responsible for building that interface.
And so one of my core arguments, and why I love MCP so much, is that agents deserve their own interface, optimized for them and their own use case. To design that interface, which is what I want to motivate today, we have to think a little about the difference between a human and an AI. It's one of those questions that sounds really stupid when you say it out loud, but it's instructive to actually go through. I'd argue the difference exists on three dimensions: discovery, iteration, and context. To begin: for humans, discovery is really cheap, and we tend to do it once. If any of you have had to implement something against a REST API, what do you do? You call up the docs, or you open Swagger, look at it one time, figure out what you need, and never do it again. So while discovery may take you some time, it is cheap over the lifetime of the application you are building. AIs, not so much. Every single time that thing turns on, it shakes hands with the server, learns about the server, and enumerates every single tool and every single description on that server. So discovery is actually really expensive for agents; it consumes a lot of tokens. Next, iteration. Same idea. If you're a human developer writing code against an API, you can iterate really quickly. Why? Because you did your one-time discovery, figured out the three routes you're going to call, and wrote a script that calls them one after another as fast as your language allows. Iteration is cheap and fast; if it doesn't work, you just run it again until it does. For agents, I think we all know iteration is slow. Iteration is the enemy.
Every additional call, subject to your caching setup, also sends the entire history of all previous tool calls over the wire. You do not want to iterate if you can avoid it, and that's going to be an important consideration. The last dimension is context, and this is a little hand-wavy, but it matters. As humans in this conversation, I'm talking, you're hearing me, and you're comparing it to memories and experiences on different time scales; your brain is doing wonderful, amazing things. When you plug an LLM into any given use case, it remembers the last 200,000 tokens it saw. That's the extent of its memory, plus whatever is embedded somewhere in its weights, and that's it. So we need to be very conscious of the fact that it has a very small brain at this moment. It's a lot closer to when people talk about sending Apollo 11 to the moon with one kilobyte of RAM, or whatever it was. I think that's how we actually need to think about these things that frankly feel quite magical, because they go and open my PRs for me or whatever it is they do. So these are, in my mind, the three key dimensions of what is different, and we should not build APIs that are good for humans on these dimensions and pretend they are also good for agents. One way I've started talking about this: an agent can find a needle in a haystack; the problem is it's going to look at every piece of hay and decide if it's a needle. That's not literally true, but it's an intuitive sense of how we should think about what we put in front of agents and how we pose a problem. And an MCP server is nothing but an interface to that problem and/or its solution.
Finally, to go back to our product intuition: I'd argue that the most important word in the universe for MCP developers is "curate." How do you curate, from a huge amount of information that might be fine for a human developer, an interface appropriate for one of these extremely limited AI agents, at least on the dimensions we just went through? That brings us to this slide: why MCP? I almost made this the Derek Zoolander slide: "But why MCP?" "But I just told you why MCP, Derek." It's because it does all of these things. It gives us a standard, controllable way of communicating information to agents, where we can control not only how it's discovered but also how it's acted on. There's a big asterisk on that, because client implementations in the MCP space right now are not amazing, and some of them do things that are not compliant with the MCP spec. Maybe at the end we'll get into that; it's not directly relevant now, except that all we can do is build the best servers we can, subject to the limitations of the clients that will use them. I don't think we need to go through what MCP is for this audience, so we'll move quickly, but for the sake of the transcript, the cliché is that it's USB-C for the internet: a standard way to connect LLMs to either tools or data. And if you haven't seen FastMCP, this is what it looks like to build a fully functional MCP server. I live in Washington, DC; the subway is often on fire there, so this server checks whether or not the subway is on fire, and indeed it is. Now, the question we're actually here to explore is: why are there so many bad MCP servers? Maybe a better question is, do you all agree with me that there are many bad MCP servers? I'm sort of declaring this as if it's true.
I'm not trying to make a controversial statement: there are many bad MCP servers in the world, and I see a lot of them because people are using my framework to build them. Does it surprise anyone that I'm declaring that? I'm genuinely curious whether I've made an assumption.
>> In my experience, I won't say every MCP server I've come across is like that, but a lot of them are just API wrappers. They stringify the content of the API, and that's it, and they call it an MCP server.
>> Yeah. And I'll make the argument, going a little off script here, that a lot of them, even when they're not wrappers, are just bad products, because no thought was put into them. One comparison I sometimes make with my team: if you go to a bad website, you know it's a bad website. We don't need to sit there and figure out why; it's ugly, or it's hard to use, or it's hard to find what you're looking for, or it's all flash. I don't know exactly what makes a bad website, but you know one when you see one. We don't like to point out all the flaws, because there's an infinite number of them; instead, we try to find great examples of good websites. And so what I think we need more than anything else are MCP best practices. A big push of mine right now, and part of where this talk came from, is making sure we have as many best practices as possible out in the world and documented. I do want to applaud a few firms here; these are screenshots. Block has an amazing playbook; if you hate this talk, read their blog post, which is a better version of what I'm doing right now. GitHub recently put one out, and many other companies have as well. I could have put a lot here, but these are two I've referred to quite frequently, so I recommend them to you.
The Block team in particular is phenomenal in what they're doing on MCP. By coincidence, the same team has been my customer for six years on the data side. I really love the work they do; the blog posts they put out are very thoughtful, and I highly recommend them to you. I want to see more of this, and today is one of my humble efforts to put some of it into the world. So here's what I thought we'd do today, because I did not want to ask you to open your laptops, set up environments, and actually write code with me at 4:25 on a Saturday: we're going to fix a server together, through slides, to keep this actionable but gentle. And here is the server you were describing a moment ago. Someone wrote this server; I hope the notation is clear enough. We have a decorator that says a function is a tool, and then we have the tool itself. Forgive me, I didn't bore you with the details, because we think this is a bad server to begin with. What's our example here? We want to check an order status. To do that, we need to learn a lot of things about the user, learn what their orders are, filter them, and actually check the status. If this were a REST API, which underneath it presumably is, we know exactly what we would do: make one call to each of the functions in sequence and return some user-facing output. It would be easy, observable, fast, and testable. Everything would be good. Instead, if we expose this to an agent: what order is it going to call these in? Does it know what the format of the arguments is? How long will the minimum three round trips this requires take?
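The slide code isn't captured in the transcript, but the pattern being criticized looks roughly like this. The endpoint and function names here are hypothetical, and a no-op stand-in replaces fastmcp's `@mcp.tool` decorator so the sketch runs anywhere:

```python
def tool(fn):  # no-op stand-in for fastmcp's @mcp.tool decorator
    return fn

# One tool per REST endpoint: best practice for a REST API, a trap for MCP.
@tool
def get_user(email: str) -> dict:
    """GET /users?email=..."""
    return {"id": 1, "email": email}

@tool
def list_orders(user_id: int) -> list[dict]:
    """GET /users/{user_id}/orders"""
    return [{"order_id": 7, "status": "shipped"}]

@tool
def get_order_status(order_id: int) -> str:
    """GET /orders/{order_id}/status"""
    return "shipped"

# To answer "what's the status of my latest order?", the agent must
# discover all three tools, guess the calling order, and pay for at
# least three round trips.
```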
These are all the problems we're exposing just by looking at this. I don't mean we'll solve them all right now, but those are the problems I'd see if I were reviewing this as a product-facing effort. So the first thing we're going to think about, and probably the most important thing for an effective MCP server because it is product thinking, is outcomes, not operations. What do we want to achieve? This is a little annoying for engineers sometimes, because it's forced product thinking; it's not someone coming along with a user story, mapping it all out, and saying "this is what we need to implement." We cannot put something in this server unless we know for a fact it's going to be useful and have a good outcome. We have to start there; there's just not enough context for us to be frivolous. Here's what this feels like. When you're falling into the trap, you have a whole bunch of atomic operations. That's amazing if you're building a REST API; it's best practice there. It is bad if you're building an MCP server. Instead, we want things like "track latest order, given an email." It's hard to screw up, and you know what the outcome is when you call it. The other version of the trap is agent-as-glue, or agent-as-orchestrator. Please believe me, since I've spent my career building orchestration and automation software: there are things that are really good at orchestration and things that are really bad at it, and agents are right in the middle, because they can do it, but it's expensive, slow, annoying, hard to debug, and stochastic. If you can avoid that, please do. When you can't, when you don't know the algorithm, you don't know how to write the code, and it's not programmatic, that's a perfect time to use an LLM as an orchestrator.
Finding out an order status is a really bad, really expensive time to choose an LLM as your orchestration service. So don't. Instead, focus on this idea of one tool equals one agent story. Even here we're trying to introduce a new vocabulary: it's not a user story, because with "user story" everyone thinks human, even though the agent is a user. It's an agent story. It's something a programmatic, autonomous agent with an objective and a limited context window is trying to achieve, and we need to satisfy that as well as we can. Then there's one of those little tips that feels obvious but matters: name the tool for the agent, not for you. It's not a REST API. It's not supposed to be clear to future developers who need to maintain it; you're not writing an API built for change, you're writing an API so the agent picks the right tool at the right time. Don't be afraid of silly but explanatory names for your tools. I shouldn't say silly; they might feel a little silly, but they're very user-facing in this moment, even though it feels like a deep API. And just in case any of you didn't read the Block blog post, I found this section of it so important, where they say something very similar: design top-down from the workflow, not bottom-up from the API endpoints. Two different ways to get to the same place, but they result in very different forms of product thinking and a very different MCP server. Again, I really encourage you to go take a look at that blog post. So let's go back to that bad code example I showed you a moment ago and start rewriting it. You're welcome to have your laptops out and follow along, and the code will essentially run, but there's no need. Here's what that could look like. We did the thing you would do as a human.
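The rewrite slide isn't in the transcript either; a sketch of the outcome-oriented version might look like this. The underscore-prefixed helpers are hypothetical stand-ins for the three underlying API calls, and the decorator is a no-op stand-in for fastmcp's `@mcp.tool`:

```python
def tool(fn):  # no-op stand-in for fastmcp's @mcp.tool decorator
    return fn

# Hypothetical stand-ins for the three underlying REST calls.
def _get_user(email: str) -> dict:
    return {"id": 1, "email": email}

def _list_orders(user_id: int) -> list[dict]:
    return [{"order_id": 7, "placed": "2026-01-10"},
            {"order_id": 9, "placed": "2026-01-11"}]

def _get_order_status(order_id: int) -> str:
    return "shipped"

@tool
def track_latest_order(email: str) -> dict:
    """Return the status of the most recent order for this email address."""
    # The same three API calls a human developer would script, stitched
    # together on the agent's behalf: one tool call, one round trip.
    user = _get_user(email)
    orders = _list_orders(user["id"])
    latest = max(orders, key=lambda o: o["placed"])
    return {"order_id": latest["order_id"],
            "status": _get_order_status(latest["order_id"])}
```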
We made the three calls to our API in sequence, but we buried them in one agent-facing tool. That's how we went from operations to outcomes. The API calls still have to happen; there's no magic here. The question is whether we ask an agent to figure out the outcome and how to stitch the calls together to achieve it, or whether we just do it on its behalf, because we know how. So thing number one is outcomes over operations. Thing number two, and frankly a lot of these are going to seem kind of silly when I say them out loud, but please trust me from the download graph that these are the most important pieces of advice I can offer (and if none of them apply to you, think of yourself as in the top 1% of MCP developers): flatten your arguments. I see this so often, and I'll confess I do it myself. You say, here's my tool, and one of the inputs is a configuration dictionary, hopefully documented somewhere, maybe in the agent's instructions, maybe in the docstring. By the way, I don't remember if I have a point for this later, so I'll say it now: a very frequent trap with complex arguments is that you put the explanation of how to use them somewhere like a system prompt or a sub-agent definition, and then you change the tool in the server. Now it's almost worse than a poorly documented tool: you have a doubly documented tool, one version wrong and one right, and only error messages will save you. That's really bad. The gentler version of this advice: just don't ask your LLM to invent complex arguments. Now, you could ask, what if it's a Pydantic model with every field annotated? Fine, that's better than the dictionary, but it's still going to be hard.
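For concreteness, the trap version might look like this hypothetical sketch, where everything is funneled through one loosely specified dictionary:

```python
def tool(fn):  # no-op stand-in for fastmcp's @mcp.tool decorator
    return fn

@tool
def list_orders(config: dict) -> list[dict]:
    """List orders. `config` accepts keys like 'email', 'limit',
    'include_cancelled', and 'format'... documented, hopefully, somewhere."""
    # The agent must invent a correctly shaped dict from prose documentation
    # that may no longer match what the server actually expects.
    return []  # hypothetical lookup elided
```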
Until very recently there was, and there may still be, a bug (maybe it's not a bug, because no one seems to fix it) in Claude Desktop where all structured arguments, meaning object arguments, would be sent as strings. This created a real problem, because we did not want to support automatic string-to-object conversion, but Claude Desktop is one of the most popular MCP clients, so we bowed to it as a matter of necessity. FastMCP will now, if you supply a string argument to something that is very clearly a structured object, try to deserialize it; it will try to do the right thing. I really hate that we have to do that. It feels deeply wrong to me that we have a type schema that says "I need an object" and yet we're doing kludgy stuff like that. So this is an evolving, somewhat messy ecosystem. But what does it look like when you do it right? Top-level primitives as the arguments into the function. What's the limit? What is the status? What is the email? Clearly defined. Just like naming your tool for the agent, name the arguments for the agent. Here's what that looks like in code: instead of config: dict, we have an email, which is a string, and include_cancelled, which is a flag. And I highly recommend literals or enums whenever you can; they're much better than a plain string when you know what the options are. At this time, very few LLMs know that this syntax is supported, so if you had Claude Code or something write this tool, it would usually write format: str = "basic", which works; it just doesn't know to constrain the type. So there's a little actionable tip: use Literal, or equivalently an enum, when you have a constrained choice. Your agent will thank you. And I do have a slide on instructions and context, so I got ahead of myself.
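The flattened version described above can be sketched like this (hypothetical tool and field names; a no-op stand-in replaces fastmcp's `@mcp.tool`):

```python
from typing import Literal

def tool(fn):  # no-op stand-in for fastmcp's @mcp.tool decorator
    return fn

@tool
def list_orders(
    email: str,                       # named for the agent, not the schema
    limit: int = 10,
    include_cancelled: bool = False,  # a flag, not a key buried in a dict
    format: Literal["basic", "detailed"] = "basic",  # constrained choice
) -> list[dict]:
    """List a customer's recent orders, newest first."""
    return []  # hypothetical lookup elided
```

With the `Literal`, the schema itself tells the agent the only two valid values, instead of hoping it guesses a well-formed string.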
I'm sorry, everybody. It is 4:35 on a Saturday. The next thing I want to talk about is the instructions you give to the agent. This cuts both ways. The most obvious failure is when you have none. As we mentioned a moment ago, if you don't tell your agent how to use your MCP server, it will guess. It will try, it will probably confuse itself, and all of those guesses will show up in its history. That's not a great outcome. Please document your MCP server: document the server itself, document all the tools on it, and give examples. Examples are a bit of a double-edged sword. On the one hand, they're extremely helpful for showing the agent how it should use a tool. On the other hand, it will almost always do whatever is in the example. This is just one of those quirks; perhaps as models improve it will stop. But in my experience, if you have an example, say a field where you want to collect tags, and your example has two tags, you will never get ten tags. You will get two tags pretty much every time. They'll be accurate; it's not going to do a bad job. But the agent uses those examples along a lot more dimensions than just the fact that they work, if that makes sense. So use examples, but be careful with them. Yes, sir?
>> Have you seen giving out-of-distribution examples as a way to solve for that?
>> By out-of-distribution, do you mean...?
>> Examples that would not be representative of actual usage.
>> It's so interesting. I don't have a strong opinion on that; it seems super reasonable to me. In my experience, the fact that an example has some implicit pattern, like the number of objects in an array, becomes such a strong signal that I almost gave this its own bullet point called "examples are contracts."
If you give one, expect to get something like it back. Out-of-distribution examples are a really interesting way to fight against that inertia, and I would imagine it's better to do it that way; I would just be careful of falling into the same base-layer trap. So that's completely reasonable and I'd endorse it, with the broader caveat that whatever example you put out there, its weird quirks will show up. On an MCP server I'm building, I ran into this tag thing just yesterday, and it really confused me: no matter how much I said "use at least 10 tags," it always used two, and I finally figured out it was because one of my examples had two tags. So yes, good strategy; it may or may not be enough to overcome these basic caveats. Oh, and I do have "examples are contracts" on the slide. I'm sorry, it's 4:37. This next one, I think, is one of the most interesting things on this slide: errors are prompts. Your LLM doesn't know that a response coming out of a tool is "bad." It's not like it gets a 400 or a 500. It gets what it sees as information about the fact that it didn't succeed in what it was attempting. So if you just allow Python, in FastMCP's case, or whatever your tool of choice is, to raise, for example, an empty ValueError, or a cryptic MCP error with an integer code, that's the information that goes back to your LLM. Does it know what to do with it? Probably it knows at least to retry, because it knows there was an error. But you actually have an opportunity to document your API through errors, and that leads to some interesting strategies that I don't want to wholeheartedly endorse, but I will mention, for when you do have a complex API, because sometimes you can't get away from that.
Then, instead of documenting every possibility in the docstring that documents the entire tool, you might document how to recover from the most common failures. It's a very weird form of progressive disclosure, where you acknowledge that the agent will likely get its first call wrong, but based on how it gets it wrong, you have an opportunity to send more information back in the error message. As I said, it's not an amazing way to think about building software, but it is the ultimate version of what I'm recommending, which is: be as helpful as possible in your error messages. Do go overboard. As far as the agent is concerned, they become part of its next prompt, so they matter. If they are too aggressive or too scary, it may avoid the tool permanently; it may decide the tool is inoperable. So errors really matter. I don't think this needs much explanation, but this is what it looks like when you have a full docstring, an example, and so on. Block, in their blog post, makes a point I haven't seen used widely, although ChatGPT does take advantage of it in developer mode: the read-only hint. The MCP spec supports annotations, a restricted set of hints you can place on various components. One of them, for tools, is whether or not the tool is read-only. If you supply this, clients can optionally choose to treat that tool a little differently. The motivation behind the read-only hint was basically to help with setting permissions. Now, I don't know who here is a fan of --yolo or --dangerously-skip-permissions or whatever the flags are called in different terminals; if that's you, you don't care about this.
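Pulling the last two tips together, here is a hedged sketch of a tool whose error message teaches recovery and which carries the read-only annotation. Names are hypothetical, and a stand-in decorator replaces fastmcp, where this would be roughly `@mcp.tool(annotations={"readOnlyHint": True})` with fastmcp's `ToolError` raised instead of a bare `ValueError`:

```python
# Stand-in that records MCP tool annotations so this sketch runs anywhere.
def tool(annotations=None):
    def register(fn):
        fn.annotations = annotations or {}
        return fn
    return register

VALID_STATUSES = ("pending", "shipped", "delivered", "cancelled")

@tool(annotations={"readOnlyHint": True})
def track_latest_order(email: str, status: str = "any") -> dict:
    """Read-only: look up the newest order for a customer, optionally
    filtered by status."""
    if status != "any" and status not in VALID_STATUSES:
        # The error message is the agent's next prompt: teach it how to
        # recover instead of raising an empty, cryptic error.
        raise ValueError(
            f"Unknown status {status!r}. Retry with one of "
            f"{', '.join(VALID_STATUSES)}, or omit 'status' to match all."
        )
    return {"email": email, "status": status}  # hypothetical lookup result
```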
But ChatGPT, for example, will ask you for extra permission if a tool does not have this annotation set, because it presumes the tool can take a side effect and have an adverse impact. So use annotations to your advantage; they're one more form of design the client can use to provide a better experience. Next, and I've talked about this a bit already: respect the token budget. The meme right now is that the GitHub server ships something like 200,000 tokens when you handshake with it. This is a real thing, and I don't think it automatically makes the GitHub server bad. I think it makes it incumbent on folks like me who build frameworks, and folks who build clients, to find ways to actually solve this problem, because the answer can't always be "do less." Right now we want to do more; we want an abundance of functionality. We'll maybe talk about that a little later. But respect for the token budget really matters. It is a very scarce resource, and your server is not the only one the agent is going to talk to. I was on a call recently with a customer who was so excited to be rolling out MCP. I met with the engineering team, and to be clear, this is an incredibly forward-thinking, high-performing, massive company that I deeply respect; I won't say who they are. They got on the call, so excited, and said, "We're in the process of converting our stuff to MCP so that we can use it," and they had a strong argument for why it actually had to be their API. That's not even the punch line of the story, which is a whole other story in itself, but it fundamentally came down to this: they had 800 endpoints that had to be exposed. To which I had this thought: by the time you finish reading this sentence, that is the token budget for each of those 800 tools, if you assume a 200,000-token context window.
So if each of those 800 tools had only that much space, roughly 250 tokens each, not even to document itself, just to share its schema and name plus documentation, that's all the room you would get. And when you were done taking up that space, because you were so careful and each tool really fit, you would lobotomize the agent on handshake, because it would have no room left for anything else. The token budget really matters. If this agent connected to one more server with one more tool that had a one-word docstring, it would just fail; it would effectively overflow. So the token budget matters. There is probably a budget appropriate for whatever work you're doing. You may or may not know what it is; pretend you know and be mindful of it. In the worst case, try to be parsimonious, as efficient as possible. That's why we do experiments like sending additional instructions in the error message: it's one way to save on the token budget at handshake. And the handshake is painful. I'm not sure folks know that when an LLM connects to an MCP server, it typically downloads all the descriptions in one go so it knows what's available to it. It's usually not done in a progressively disclosed way; it's done outright. Yes?
>> We use progressive disclosure mechanisms: when it first initializes, it gets only tool names, with a describe step for each one. So it's 95% less context window, and a tool's details aren't exposed to the agent unless it needs them.
>> Okay, that's awesome. Let's talk about this idea for a second, because it's a really interesting design. There's a debate right now about what you can do that's compliant with the spec versus what's not. As long as you do things that are compliant with the spec, then by all means do them. Who cares?
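A minimal sketch of the design the audience member described, with hypothetical names and a no-op stand-in for fastmcp's decorator: advertise names only at handshake, and let the agent pull a tool's full documentation on demand through a meta-tool.

```python
def tool(fn):  # no-op stand-in for fastmcp's @mcp.tool decorator
    return fn

# Hypothetical catalog of full tool documentation, kept server-side.
_CATALOG = {
    "track_latest_order": "Return the newest order's status for an email.",
    "cancel_order": "Cancel an order by id. Destructive.",
}

@tool
def list_tool_names() -> list[str]:
    """Cheap handshake: names only, a fraction of the full-schema cost."""
    return sorted(_CATALOG)

@tool
def describe_tool(name: str) -> str:
    """Fetch one tool's full documentation only when the agent needs it."""
    if name not in _CATALOG:
        # Recovery-oriented error: tell the agent what names do exist.
        raise ValueError(
            f"No tool named {name!r}. Known tools: {', '.join(sorted(_CATALOG))}."
        )
    return _CATALOG[name]
```

The trade-off, as discussed below, is that the agent now needs tools to learn about tools, so the meta-layer itself has to be designed carefully.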
One of the problems is that there are clients that are not compliant with the spec. Claude Desktop is one of them. I've mentioned it a few times; I have a history with Claude Desktop. Claude Desktop hashes all of the tools it receives on first contact and puts them in a SQLite database, and it doesn't care what you do after that. It doesn't care that the spec allows you to send more information. I think your solution would get around this because it's a tool call. But many of the first attempts people make to use spec-compliant techniques for getting around this problem, such as notifications, fail in Claude Desktop. Usually you've failed before this in Claude Desktop. I'm not a fan of Claude Desktop from an MCP server author's perspective. I think it's a real missed opportunity, because it is such a flagship product of the company that introduced MCP. Claude Code is great. But again, everything gets cached in that SQLite database, so it doesn't matter what you do. Techniques similar to what you've described, where you provide mechanisms for learning more about a tool — that's a great idea; I really like it. There is a challenge where you are now back in a sort of flattened-arguments world, because you have meta-tools: I need to use tools to learn about tools, and tools to call tools, in some extreme cases or beyond. So you need to design this very carefully; that's why it usually shows up as a dedicated product. So thank you for sharing that. There are many really interesting techniques for trying to solve this problem. Yes?
>> So you talked about progressive disclosure. Do you use masking? For example, I connect to my Kubernetes server and my credentials only give me certain rights, so there are 28 tools I don't have access to — and therefore you don't need to show them.
>> So when you say "do I support that," do you mean does MCP support that, or do I in my product support that?
>> Yeah, I was just asking about something I've read about.
>> Okay. So the spec makes no claim about this. The spec says when you call list tools, you get tools back, and how that happens is up to the implementation. FastMCP makes that an overridable hook through middleware, but again makes no claim about how it works. Prefect's commercial products, which I'm not here to pitch, allow per-tool masking on any basis. And we see that as a place to have an opinion in the commercial landscape, as opposed to an opinion in the open-source landscape, as opposed to the protocol, which should have no opinion at all. So if that's interesting, we can chat about it.
>> You might be getting into this, but if you take this problem — the example you mentioned — is the solution a kind of table-of-contents approach, split over four different chunks? Or maybe the 800 don't all justify having their own server. What was the solution for them?
>> They can't do it. There's no solution that allowed them to have as much information as they wanted in the context window — and they didn't need it. It became a design question. Frankly, that call was probably four months ago now, and it was just call after call after call like this, which made me realize we need to have more talks like this one, and just talk about what it is to design a product for an agent. My worry is that MCP is viewed as infrastructure or a transport technology — and it is, and I'm very excited about it. I think a year from now we will be talking about context products as opposed to MCP servers. I'm very excited about that; we'll move past the transport. But we need to figure out how to use it, and I think that's how we should talk about it. The only other alternative I have discussed with a few companies, when you have a problem like this, is: if you control the client, much more interesting things become available to you.
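Before moving on: the table-of-contents / meta-tool idea from the earlier audience exchange can be sketched without any particular framework — one tool that returns one-line summaries, and one that returns the full schema on demand. All tool names and entries here are hypothetical:

```python
# Hypothetical registry of fully documented tools. In a real server this
# would be generated from your actual tool definitions.
FULL_TOOLS = {
    "create_invoice": {
        "summary": "Create a draft invoice for a customer.",
        "schema": {"type": "object",
                   "properties": {"customer_id": {"type": "string"},
                                  "amount_cents": {"type": "integer"}}},
        "docs": "Creates a draft invoice. Call send_invoice to deliver it.",
    },
    "send_invoice": {
        "summary": "Send a previously created draft invoice.",
        "schema": {"type": "object",
                   "properties": {"invoice_id": {"type": "string"}}},
        "docs": "Sends the invoice by email and marks it as issued.",
    },
}

def list_tool_summaries() -> dict[str, str]:
    """Cheap handshake surface: one line per tool instead of the full schema."""
    return {name: t["summary"] for name, t in FULL_TOOLS.items()}

def describe_tool(name: str) -> dict:
    """Second step: the agent pays the token cost only for tools it will use."""
    return {"schema": FULL_TOOLS[name]["schema"], "docs": FULL_TOOLS[name]["docs"]}
```

The trade-off called out above applies: the agent now needs tools to learn about tools, so this is worth doing only when the full surface genuinely doesn't fit the budget.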
If you can instruct your client to do things a certain way — for example, if you have a mobile app that presents an agentic interface to an end user — that's what I mean by controlling the client. Or if it's internal and you can dictate what client, or what custom client, a team uses. Now you can do much more interesting things, because you actually do know a lot more about that token budget and how to optimize it. But for an external-facing server, there's not a good solution. I think by now we have talked through all of this, so I'll leave it for posterity in the interest of time. We talked about "curate" as a key verb earlier in this talk. It is, I would argue, what we have been doing in each of these little vignettes we've been working through with the code: we are curating the same information set down to one that is more amenable and more recognizable to an agent. Fifty tools is where I draw the line where you're going to have performance problems. I know that seems really low to a lot of people; some would put it even lower than that, some higher. If you have more than 50 tools on a server, without knowing anything else about it, I'm going to start to think it's not a great server. The GitHub server has, I think, 170 tools. Does that mean it's not a great server? No — there's a good argument there, and the GitHub team has put out a lot of really interesting blog posts on the semantic routing they're doing. They had one just yesterday, actually, on some interesting techniques they're using. And there's software, like the one you mentioned a moment ago, sir, which helps with this problem. So having a lot of tools does not automatically make it a bad server, but it is a smell, and it does make me wonder: can we split them up? Do you have admin tools mixed in with user tools? Could we namespace these tools differently?
Would it be worthwhile having two servers instead of one? That is a little bit of a smell. If you can get down to 5 to 15, that would be ideal. I know that's not achievable for most people, so it's one of those actionable-but-maybe-not-so-actionable little tips — an aspiration you should have. Just be careful, unless you are prepared to invest in a lot of care and evaluation. Fifty tools per agent — I should have said per agent. If I have a 50-tool server and you have a 50-tool server, that's 100 tools to the agent. That's where the performance bottleneck is, not on the server. Sorry, the slides should be corrected: it's 50 tools to the agent where you start to see performance degradation. I love this next one. Kelly [surname unclear] is someone I've known a long time; he's at Fiverr now. While I was putting this talk together, I happened to come across these two blog posts of his, which are a little bit of a shot and a chaser. They're written almost exactly a month apart — one's from October, one's from November. In the first one, he talks about building up a Fiverr server, and he goes from a couple of basic tools to, I think, 188. And in the second blog post, he talks about how he curated that server from 188 down to five. You could read either of these blog posts independently as a success story about his adventure in learning MCP. Taken together, I think they tell a really interesting story about making something work and then making something work well — which is, of course, the product journey in some sense. And so where this takes us — sorry, do you have a question? Oh, sorry — where this takes us is the most obvious version of this. I wrote a blog post that went a little bit viral on this, which is why I talk about it a lot: please, please — if nothing else — stop converting REST APIs into MCP servers.
It is the fastest way to violate every single thing we've talked about today — every single one of the heuristics we laid out about agents. It really doesn't work. And it's really complicated, because this is the FastMCP documentation, and that's a blog post I had to write. The blog post basically says: I know I introduced the capability to do this; please stop. That's a really complicated thing — it could be a workshop in and of itself. I do bear a little bit of responsibility here. This is not just a feature of FastMCP, it's one of the most popular features of FastMCP, which is why, candidly, it's not going anywhere. Instead, we're going to document around that fact. But here's the problem: you just can't convert a REST API into a good MCP server. I'm not going to explain it again — you just can't. But it is an amazing way to bootstrap. When you are trying to figure out if something is working, do not write a lot of code that introduces new ways to find out you have failed. Do start by picking a couple of key endpoints and mirroring them out with FastMCP's auto-converter, or any other tool you like, or even just write that code yourself. Make sure you solve one problem at a time, and make the first problem: can you get an agent to use your tool at all? Once it's using it, by all means strip out the part that just regurgitates the REST API, and start to curate it and apply some of what we've talked about today. This is just one of those candid things, right? It is the fastest way to get started. You don't have to do it this way; I start this way. Just don't ship the REST API to prod as an MCP server. You will regret it; you will pay for it a little later, even though there's a dopamine hit up front.
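That bootstrap-then-curate sequence can be sketched in plain Python. The endpoint and tool names below are hypothetical, and the "auto-generated" set stands in for whatever a converter like FastMCP's would produce:

```python
# Step 1 (bootstrap): imagine an auto-converter has mirrored every REST
# endpoint into a tool -- one entry per endpoint, names straight from the API.
auto_generated = {
    "get_user", "list_users", "patch_user", "delete_user",
    "get_invoice", "list_invoices", "post_invoice", "void_invoice",
    "get_invoice_pdf", "list_invoice_events",
}

# Step 2 (prove it works): expose only the one or two endpoints you need to
# answer the first question -- will an agent use my tool at all?
bootstrap_surface = {"list_invoices", "get_invoice"}

# Step 3 (curate): replace the REST mirror with a small set of
# outcome-oriented tools, named for workflows rather than endpoints.
curated_surface = {"find_overdue_invoices", "send_invoice_reminder"}

assert bootstrap_surface <= auto_generated       # bootstrap is a subset of the mirror
assert not (curated_surface & auto_generated)    # curated tools aren't endpoint mirrors
print(len(auto_generated), "->", len(curated_surface))  # 10 -> 2
```

The point of step 3 is exactly the "outcomes, not operations" heuristic: the agent sees two workflows, not ten endpoints.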
So these are the five major things we talked about today, in our pseudo-workshop — the actionable talk that wasn't really a workshop. Outcomes, not operations: focus on the workflow, focus top-down, don't get caught up in all the little operations, and don't ask your agent to be an orchestrator unless you absolutely have to. Flatten your arguments: try not to ship large payloads, try not to confuse the agent, try not to give it too much choice. I don't think I said this out loud when we talked about it, but try not to have tightly coupled arguments — that really confuses the agent — and see if you can design around that if possible; it's not always possible. Instructions are context: it seems obvious to say out loud — of course they are, they're information for the agent — but use them as context, design them as context, and put real thought into your instructions, the same way you would into your tool signature and schema. Respect the token budget: you have to. This is the only one on this list where, if you don't actually do it, you will simply not have a usable server. The others you can get away with, and frankly the art of this intuition is to start with these rules and then work backwards into practicality — but this is the only one where you can't actually cross the line. And then: curate ruthlessly. If you do nothing else, start with what works and then tear it down to the essentials. I have been writing MCP servers about as long as anyone at this point — a year — and I still find myself starting by putting too many tools in the world, sometimes because I'm not sure which one the agent will use, or because I'm experimenting, and I have to remind myself to go back and get rid of them. And it's hard. I think as an engineer, especially one designing normal APIs, you think: okay, here's my tool, here's v2, it's backwards compatible — and you keep adding stuff. That's a really natural way to work, and it can be a best practice, and it doesn't work here. It would be like showing a raw REST API to a user as a UI.
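One concrete shape of the tightly-coupled-arguments problem: one argument's valid values depend on another argument's value. A sketch of the coupling, and one way to flatten it by splitting into separate self-describing tools — all names here are hypothetical:

```python
# Coupled version: `processor` is only valid for certain `file_type`s, and
# the agent has to hold that cross-argument rule in its head.
VALID_PROCESSORS = {
    "csv": {"summarize", "dedupe"},
    "pdf": {"summarize", "extract_text"},
}

def process_file_coupled(file_type: str, processor: str) -> str:
    if processor not in VALID_PROCESSORS.get(file_type, ()):
        raise ValueError(f"{processor!r} is not valid for {file_type!r} files")
    return f"{processor} on {file_type}"

print(process_file_coupled("csv", "summarize"))  # summarize on csv

# Flattened version: each tool's signature states everything it needs.
# More tools, but no hidden dependency between arguments -- nothing for
# the agent to get wrong about which combinations are legal.
def summarize_csv(path: str) -> str: ...
def dedupe_csv(path: str) -> str: ...
def extract_pdf_text(path: str) -> str: ...
```

Whether the flattened version is worth the extra tool count is a judgment call against the token budget; the point is that the coupling itself is a cost.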
This is a criticism I have offered of my own products at times, when I look and say: this looks a little too much like our REST API docs; we're not doing our job of giving this to our users in a consumable way. So if I can leave you with just one thought, it's this: you are not building a tool, you are building a user interface. Treat it like a user interface, because it is the interface your agent is going to use, and you can do a better job or a worse job, and either you or your users will benefit from that. I think we are at time, so I'm going to open it up for questions, or what's next, or what other challenges we can solve. I hope I walked the tightrope between things that are useful to you all and things that don't require you to write any code at 4:54 on a Saturday. But I hope there were some useful nuggets in there for you — more than you came in with. Happy to take any questions if there are any.
>> [Audience: What are tightly coupled arguments, typically?]
>> That would be where you have one argument that's, say, "what is the file type," and another that's "how should we process the file," and your input to the file-type argument determines the valid inputs for the other argument. So they're tightly coupled: some values for the second argument are invalid depending on what you said for the first. It's just one extra thing for the agent to keep track of. That's a good question — sorry I didn't define that. Do you have a question?
>> I'll start with the first one. When you are giving an agent an MCP server, you have to document the tools — the capabilities of the server — both in the server and in the agent, and that is not ideal. What would you recommend: both, or only in the server?
>> So this comes down to: do you control the client or not?
If you control the client, then this is a real choice, and there are different ways to think about it. For example, in some of my stuff that I write, where I know I'm using, say, Claude Code to access it, I might actually document my MCP server as files or Claude skills, because I know what the workflows are going to be, and I know some of my workflows are infrequent and I don't want to pollute the context space with them. So if you control the client, you have a real choice to make there. If you don't control the client, then you don't have much of a choice: you have to document it in the server, because you have to assume you're working with the worst possible client. Honestly, many of the answers in MCP space boil down to: do you control the client? If so, you can do really interesting things on both sides of the protocol. From a server author's perspective, you really do need to document everything in its docstring. The one escape hatch is that you can document the server itself — every server has an instructions field. It is not respected by every client; I believe my team has filed bugs where we have determined that to be the case, so hopefully that's not permanent. But most clients will, on handshake, download not only the tools and resources and everything else, but an instructions blob for the server itself. How much information you can put in there — I'd be careful; I don't think the client wants to read a novel. But you do have this one other opportunity to document the high level of your server.
>> Another one, but —
>> Oh, yeah. Well, let's mix it up and we'll come back. Did you have a question?
>> [Audience question about an upcoming spec proposal, partly inaudible.]
>> I'm not a member of the core committee, but I'm in very close contact with them, so maybe I can answer your question.
>> I'm so excited about this.
>> Yes, this I know a lot about. It's going to expand.
It's not actually going to change so much, because of the way it's implemented. What question could I answer — like, what is it? Am I excited about it? I am excited about it. And yes, all the rules still apply — that is a fantastic question; let's talk about it for one second. I don't know if any of you were at the meetup we hosted last night, where my colleague gave a presentation on this — oh, you were. I knew at least somebody was coming. My colleague Adam gave a very good talk on it, which we can chat about after this; I'll send you a link to a recording. But the nutshell version is this: SEP 1686 is the name of the proposal, and it adds asynchronous background tasks to the MCP protocol — not just for tools, but for every operation. We don't need to talk too much about what that is. The reason it doesn't involve changes to any of these rules is that it's essentially an opt-in mode of operating, in which the client says, "I want this to be run asynchronously," and therefore the client takes on new responsibilities: checking in on the task, polling for the result, and actually collecting the result. But the actual interface of learning about the tool, calling the tool, and so on is exactly the same as it is today. So this is fully opt-in on the client side, and that's why, from a design standpoint, nothing changes. The only question from a server designer's standpoint is: is this an appropriate thing to background, as opposed to doing it synchronously on the server? Or — sorry, let me take that back. You can background anything, because it's a Python framework; you can chuck anything into the background. The real question is: should the client wait for it or not? "Should it be a blocking task?" is really the right vocabulary. And that's just a design question for the server maintainer.
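The shape of that new client responsibility — opt in, then check in until the result is ready — can be sketched generically. This illustrates the polling pattern only; it is not the actual SEP 1686 wire format, and all names are hypothetical:

```python
import time

# Hypothetical server-side task store: a background task reports "working"
# until it finishes, at which point a result becomes available.
TASKS = {"task-1": iter(["working", "working", "completed"])}

def poll_status(task_id: str) -> str:
    return next(TASKS[task_id])

def get_result(task_id: str) -> str:
    return "42 rows exported"

# Client side: having opted in to async execution, the client -- not the
# server -- owns the loop of checking in and collecting the result. The
# tool-calling interface itself is unchanged.
def wait_for(task_id: str, interval: float = 0.0) -> str:
    while poll_status(task_id) != "completed":
        time.sleep(interval)  # a real client would back off here
    return get_result(task_id)

print(wait_for("task-1"))  # 42 rows exported
```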
Does that answer it — am I in the zone of what you were looking for?
>> [Audience comment, inaudible.]
>> Oh, no kidding. Yes, this happens a lot, actually — but until you said this, I didn't think of it as a pattern. I've seen it a lot; it's a real problem. Maybe we'll write a blog post on it — that would be fun.
>> Yes, the rules still apply. But as far as elicitation is concerned, how do you handle that?
>> Elicitation is really interesting. So now we're in advanced MCP. Elicitation — anyone not familiar with what that is? Yes? So elicitation is basically a way to ask the client for more input halfway through a tool execution. You take your initial arguments for the tool, then you do an elicitation — it's a formal MCP request — and you say, "I need more information." And what's kind of cool about it is that it's structured. The most common use case, in clients that support it, is approvals: you say, "I need a yes or no on whether I can proceed," for, say, some irreversible side effect. When it works, it works amazingly. Again, it's one of those things that doesn't have amazing client support, and therefore a lot of people don't put it in their servers, because it'll break your server if you send this out and the client doesn't know what to do with it. So you've got to be a little careful. Does it change the design? That's a fantastic question. I wish it were used more, so I could say yes and you should depend on it. And the reason all clients don't support this one, by the way — it's not like a meme that clients are bad — is that it's complicated to know how to handle elicitation. Some clients are user-facing; then it's super easy, just ask the user and give them a form. Some clients are automated, some are backgrounded, and so what you do with an elicitation is actually kind of complicated.
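A simpler approval pattern that works even without elicitation support is the `confirm` argument that defaults to false, forcing the model to acknowledge a destructive action explicitly. The tool name and messages below are hypothetical:

```python
def delete_environment(env_id: str, confirm: bool = False) -> str:
    """Destructive tool: permanently deletes an environment.

    Because confirm defaults to False, a bare call never destroys
    anything -- the model must acknowledge the side effect explicitly.
    """
    if not confirm:
        return (
            f"Refusing to delete {env_id!r}: this action is irreversible. "
            "Call again with confirm=True to proceed."
        )
    return f"Environment {env_id!r} deleted."

print(delete_environment("staging"))                # refused; nothing happened
print(delete_environment("staging", confirm=True))  # actually deletes
```

It's a blunter instrument than elicitation — the model can simply pass `confirm=True` on the first try — but it works in every client, which is the trade-off discussed above.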
If the LLM just fills it in itself, maybe it satisfied the elicitation, maybe it didn't — it's a little tough to know. So, if elicitation were widely used, I would say absolutely: it gives you an opportunity to move particularly tightly coupled arguments into an elicitation prompt, or confirmations. A lot of times, for destructive tools, you'll see a confirm argument that defaults to false, and you're forcing the LLM to acknowledge the action — at least as a way of, hopefully, tipping it into a saner operating mode. Elicitation is a better way to design for that. I don't think that made it into any of these examples. So — great question. Wish I could say yes; I hope to say yes. How about that? You had a second question?
>> Yeah. In my job, the main thing I do is build agents — with the OpenAI SDK or something like that — and I usually just write the tools, and the tools call the APIs, and I don't really see the need for MCP in that space. Do you agree that MCP is not needed there, or do you have a different view?
>> I do. I would not tell you to write an MCP server. I think that within a year, the reason you would choose to write an MCP server is that you'll get better observability and understanding of what failed, whereas the agent frameworks are not great at that, because part of an agent framework's whole job is to not fail on a tool call but to surface the error back to the LLM — similar to what we were talking about a moment ago. So you often don't get good observability into tool-call failures. Some frameworks do, but not all. And so one of the reasons to use an MCP server, even for a local case like that, is just that you now have automatic infrastructure for debugging and diagnosis. I don't think that's the strongest reason to do it today; I think that will come in a year, when the ecosystem is more mature.
I think if you fully control the client, and you're doing client orchestration, and you are writing the agentic loop and you're the only one — do whatever you want.
>> I think all of the advice you gave today also applies when you're building tools directly.
>> It absolutely does. Everything we said today applies to, say, a plain Python tool. Absolutely — and that's how FastMCP treats it. Good question. Any last questions? Happy to take them. Yes?
>> [Audience question about code mode.]
>> Yes — excited. So code mode is something that Cloudflare actually blogged about first, and then Anthropic followed up, where you solve some of the problems I just described by asking the LLM to write code that calls MCP tools in sequence. It's a really interesting sidestep of a lot of what I just talked through. The reason I don't recommend it wholeheartedly is that it brings in other problems — sandboxing and code execution, for instance — but if you're in a position to do it, it can be super cool. I actually have a colleague who, the day that came out, wrote a FastMCP extension that supports it, which we put in a package somewhere. At first we didn't want to put it in FastMCP main, because FastMCP tries to be opinionated and we weren't sure how to fit it in. Then it was so successful that we decided to add an experiments flag to the CLI and include it — I don't know if it's in yet. I forget if we called it experiments or optimize; it's on our roadmap right now, and this would go in there. And then there's a whole world right now of optimizing tool calls and so on. But I'd like to be respectful of your time and let you all go back to your lives. You're very kind to spend an hour talking about MCP with me.
I'm more than happy to keep talking if anybody has questions, but I would like to free you all from the conference. I hope you enjoyed the talk, and thank you very much for attending.