Scaling GitHub for your Agents — Sam Morrow, GitHub
Channel: aiDotEngineer
Published at: 2026-04-27
YouTube video id: 0n3MKk7r60w
Source: https://www.youtube.com/watch?v=0n3MKk7r60w
[music]
>> All right. Hello, London. [applause] I hope everyone's been enjoying AI Engineer Europe so far. There have been so many amazing speakers. I've been watching talks and talking to people for days now and it's been immense. I'm Sam. I lead development of GitHub's MCP server, and I'm here to talk mostly about the challenges we've faced building and scaling our remote server and how we've overcome them.
Before I start, I like messing with people, so a quick show of hands. Who's used an MCP server? Good. Who's used GitHub's? Who has a hot take on GitHub? [laughter] Has anyone built a server or client? Oh, nice, quite a few. And has anyone contributed to the specification? Oh, I got one. That's actually the first one, I think, other than at the MCP Dev Summit, where there were quite a lot of them. Anyway, it's really awesome to see so many hands, so I'm glad I've come to the right place.
For GitHub, our MCP journey started, at least in public, in April last year, when we open-sourced our local MCP server. We've just turned one year old, so I'm super stoked by that. Back then there was a tremendous buzz. We were the most starred repo on GitHub that particular week, and the exposure meant we got a high volume of public contributions, rapidly filling gaps in platform coverage where people wanted to add tools. But not everything was perfect. After a month or so of new features, agents in some ways were getting worse at using GitHub, and context windows were getting blown out quicker. We peaked, I think, at over 100 tools, and certainly at the time that was just too many.
LangChain had already published research in February that year on the exact kind of problems we were seeing. More tools don't make better agents; they get confused and forgetful. Well, I say more tools: more context and more tools shoved directly into the context, to be precise. But GitHub's a really expansive platform, and we provided tools for repos, issues, PRs, Actions, projects, and more. The hard part of solving this was that we didn't want to prevent users from having the individual tools they needed and used, and suffice to say our user base is pretty diverse. On the GitHub platform at the moment there might even be one or two clauses as well. And for the record, there's a team of us who work on this. It's not just me, and my team is awesome.
To try and fix some of this, I quickly added this thing called toolsets, a grouping concept for related product tools. Users could just pick which ones they wanted and configure it. I also added dynamic tool selection, where agents could discover sets of tools and then turn them on in chunks. And, though we never released it, I made a RAG version of the same, for semantic tool search and discovery. But what do you think happened in spite of all this? [laughter] Everyone used the default settings. It was really annoying, because we had all these elegant solutions, and all they required was for users to configure the JSON a little bit. Most users just don't. Maybe it's partially a spec problem, because every proposal so far to add grouping to the MCP specification has been rejected for various reasons, and there have been several attempts.
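Editor's sketch: the toolset grouping and dynamic tool selection described above can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not GitHub's implementation. The `Tool`, `select_tools`, and `DynamicToolsets` names, the catalog entries, and the default set are all hypothetical and chosen only to show the idea of exposing tools in product-area chunks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    toolset: str  # the product area this tool is grouped under


# Hypothetical catalog; tool and toolset names are illustrative only.
CATALOG = [
    Tool("get_file_contents", "repos"),
    Tool("create_issue", "issues"),
    Tool("list_issues", "issues"),
    Tool("list_pull_requests", "pull_requests"),
    Tool("list_workflow_runs", "actions"),
]

# Assumed default: what a user gets if they never touch the configuration.
DEFAULT_TOOLSETS = frozenset({"repos", "issues", "pull_requests"})


def select_tools(catalog, enabled=None):
    """Static selection: expose only tools whose toolset is enabled."""
    active = DEFAULT_TOOLSETS if enabled is None else frozenset(enabled)
    return [tool for tool in catalog if tool.toolset in active]


class DynamicToolsets:
    """Dynamic selection: start with nothing and let the agent enable groups."""

    def __init__(self, catalog):
        self._catalog = list(catalog)
        self._enabled = set()

    def list_available(self):
        """Return the toolset names the agent can discover."""
        return sorted({tool.toolset for tool in self._catalog})

    def enable(self, toolset):
        """Turn on one toolset and return the names of all tools now exposed."""
        if toolset not in self.list_available():
            raise ValueError(f"unknown toolset: {toolset}")
        self._enabled.add(toolset)
        return [tool.name for tool in select_tools(self._catalog, self._enabled)]


if __name__ == "__main__":
    print([tool.name for tool in select_tools(CATALOG)])
    session = DynamicToolsets(CATALOG)
    print(session.list_available())
    print(session.enable("actions"))
```

The static path reflects the point made in the talk: if nobody edits the configuration, whatever sits in the default set is what almost every agent sees.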
In a sense, every mode or configuration we add is, one could argue, papering over potential gaps, or gaps in client implementations. As an example, we have a read-only mode, and roughly 17% of our users use it, but it maps one-to-one to the read-only hint annotation. No client exposes that as a method of filtering servers. I think some gateways now do. Anyway, it's an interesting easy win for more enterprise use cases, where people often only want that.
But we needed to find better solutions to context reduction. You don't need to worry too much about the specifics here; this is dated now. We started trying to optimize, and we looked at the usage patterns on our remote server. Initially we cut the amount of context used by focusing the tools more specifically on the general case, based on usage, for about a 49% reduction of the initial load. We subsequently also grouped CRUD tools and brought that down even more. I think you get about 40 tools if you use the default configuration, and then you can expand or contract that based on your own preference. It's easy to customize.
We've also recently had a massive push to reduce the output tokens of a lot of tools. In this example, just by tailoring exactly what comes back from list pull requests, it's lost more than 75% of the tokens used in the output. So in terms of how token-hungry GitHub's server is, it's a moving target. We're constantly changing things to improve it, and if you haven't used it in a while, it's likely very different from even a few months ago. And we haven't ruled out more advanced approaches like code mode. We're always experimenting internally.
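Editor's sketch: the output-token reduction described above for listing pull requests comes down to returning a tailored subset of each API object instead of the full payload. The example below is a rough illustration, assuming the agent mostly needs the number, title, state, URL, and author. The `trim_pull_request` helper, the chosen field list, and the sample payload are hypothetical, and character counts of the JSON are used only as a crude stand-in for token counts.

```python
import json

# Assumed set of fields worth keeping; the real server's choice may differ.
KEEP_FIELDS = ("number", "title", "state", "html_url")


def trim_pull_request(pr):
    """Return a slimmed-down copy of a pull request payload."""
    slim = {key: pr[key] for key in KEEP_FIELDS if key in pr}
    # Flatten the nested user object down to just the login.
    user = pr.get("user") or {}
    if "login" in user:
        slim["author"] = user["login"]
    return slim


def size_reduction(raw, slim):
    """Fraction of serialized characters removed by trimming."""
    before = len(json.dumps(raw))
    after = len(json.dumps(slim))
    return 1 - after / before


if __name__ == "__main__":
    # Made-up payload shaped loosely like a REST pull request object.
    raw = {
        "number": 42,
        "title": "Fix pagination",
        "state": "open",
        "html_url": "https://example.invalid/pull/42",
        "user": {"login": "octocat", "id": 1, "avatar_url": "https://example.invalid/a.png"},
        "body": "Long description " * 20,
        "_links": {"self": {"href": "https://example.invalid/api/pulls/42"}},
    }
    slim = trim_pull_request(raw)
    print(slim)
    print(f"{size_reduction(raw, slim):.0%} fewer characters")
```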
On the heels of this, we also dug into our data and found some more opportunities. We made a big push to reduce tool failures as well, and the success rate is, I think, over 95% at this point. Not all failure is preventable, because agents don't necessarily know which repos they have write permission on, and they still hallucinate. But we've been able to identify significant numbers of errors that could be overcome, mostly by encoding a sort of agent intent into our tool surface. You might have to make five API calls to make a tool more robust, but in that case we do that on the server side to reduce round trips, because that saves context, saves time, and usually makes for a massively better experience. It makes the agents more successful.
We also started to run evals last year. I'm not going to go into detail; that link takes you to a blog article that my colleague Ksenia wrote about doing it. But the gist is that instead of micro-optimizing individual tool descriptions, you test them against each other to make sure they're called at the right times and not called at the wrong times, so that in the pool with each other they don't fight. The perfect tool description that makes the agent call the tool all the time is terrible, as is the reverse. You need to get that as tight as possible. But this could be a whole other talk.
Security, on the other hand, is a kind of constant menace in all of this. I've seen lots of people talking about it, and it's a real problem for us in some ways, because a lot of people are using plain-text access tokens for MCP in the wild. Usually they're stored somewhere the agent can access. They're frequently long-lived. They're often over-privileged.
And they're sat there just waiting to be abused. I don't think end users are choosing this. It's actually hard to make configuration easy and secure at the same time, and clients have to make use of system keyrings or encrypted storage, like VS Code does. But the MCP spec also provided a better way with remote HTTP, which goes all the way back to April last year as well, and we embraced this, of course. We wanted to make the secure connection the path of least resistance. We didn't want users to have to download a local runtime. Our remote server supports OAuth 2.1, and my team even helped add support for Proof Key for Code Exchange, commonly known as PKCE, to GitHub's authorization server to improve the security posture for client apps.
As I said, we hoped OAuth would be the path of least resistance. And again, perhaps some of you know what happened. Everyone expected us to support dynamic client registration. For us, it created more problems than it solved, because if you implement it properly, it's hard not to have unbounded growth of app databases, there are challenges in how you would bucket apps for rate limits, and there isn't a reliable app identity. So we considered it and rejected it. We feel it's a well-intentioned mistake, and we're not the only authorization server not to support it. Even MCP itself decided that client ID metadata is probably the way to go. I can't promise that we're going to support it, but I promise that I am trying to get us to, and that should make logging in massively easier. More on that in the future.
Also, speaking of security, some of you may have seen this. This was a fun day.
[laughter] Invariant Labs published this, and it's a correctly done prompt injection exfiltration attack for getting private data out of GitHub. The thing is, they specifically call out GitHub's MCP server, and we do provide the tools that can enable that if you just enable them all. But it applies to almost every agent setup, whether or not they use MCP, and whether or not they use GitHub's MCP server. It's the lethal trifecta, which I'm not going to rehash now, because I think many of you have probably seen it, or you can look it up; Simon Willison's blog post on it is excellent. The utility of agents is in direct conflict with protecting this stuff, and it's an active space, trying to work out how to prevent these problems. It's not solved, and it's very much not unique to GitHub.
We have users with wildly different risk profiles. We even have people with air-gapped GitHub Enterprise Server instances, which are much more secure, and then obviously the collaborators, etc., are also just running straight to GitHub, probably with full token access to everything for the agent. That's also interesting, right? I'm not naysaying any of this. It's cool to see what people do and to see if we can support the different use cases and security postures while everyone experiments with this stuff.
We also lean on auth to manage tools, and this is something I'm pretty happy with. If you log into GitHub's MCP server with a PAT, we immediately filter the tools down by the scopes that the token has. You don't have to do anything other than give it the token. On OAuth, we support step-up OAuth.
So we can return a scope challenge, and the client will interactively ask the user if they want to allow the scope. If they do, the tool call can continue. It doesn't fail, which I think is also nice. VS Code, for example, supports that. I initially worked on this with them because they already have a token for using GitHub, and what they wanted was that if their baked-in token doesn't have permission to use everything, then instead of just failing there is a mechanism for users to have a clean install and then upscope later if they need it. Lastly, server tokens as well. On Actions and things, they don't have a user, so user-specific tools are out, and by removing those we're removing constant sources of failure and wasted context at the same time.
We run a completely stateless server setup. We have been using Redis for session storage, and it's standard observability and that kind of stack. This is not a weird picture, but I guess one of the weird things for some people is that a lot of people are running a single stateful MCP server process and have struggled with how to get it into this shape. We did a few things, because it's very dynamic, but one of the fun things is that we make a brand new server instance, in the SDK sense, on every single request, and we add the tools to it at the start. Whatever your configuration is, it just builds this, and then you get what you've asked for, or what you're allowed to use, because some things have policies that affect whether you get tools or not. We've been able to scale to the point where we serve around 7 million tool calls a week. And we don't have session affinity.
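Editor's sketch: the per-request server construction described above, combined with filtering by token scopes, read-only mode, and whether the token has a user, might look roughly like this. It is a sketch under stated assumptions rather than the real server: `Server`, `ToolSpec`, `RequestContext`, and `build_server` are hypothetical stand-ins for SDK types, the registry entries are illustrative, and scope matching is simplified to a plain subset check with no scope hierarchy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    name: str
    read_only: bool                            # mirrors a read-only hint on the tool
    required_scopes: frozenset = frozenset()   # scopes the tool needs to succeed
    needs_user: bool = False                   # tool only makes sense with a user


@dataclass(frozen=True)
class RequestContext:
    token_scopes: frozenset   # scopes granted to the caller's token
    read_only: bool = False   # caller asked for read-only mode
    has_user: bool = True     # False for tokens with no user behind them


class Server:
    """Stand-in for an SDK server object; it holds no state between requests."""

    def __init__(self):
        self.tools = {}

    def add_tool(self, spec):
        self.tools[spec.name] = spec


# Hypothetical registry; names and scope requirements are illustrative only.
REGISTRY = [
    ToolSpec("list_issues", read_only=True, required_scopes=frozenset({"repo"})),
    ToolSpec("create_issue", read_only=False, required_scopes=frozenset({"repo"})),
    ToolSpec("run_workflow", read_only=False, required_scopes=frozenset({"repo", "workflow"})),
    ToolSpec("get_me", read_only=True, needs_user=True),
]


def build_server(ctx, registry=REGISTRY):
    """Build a fresh server for one request, adding only the permitted tools."""
    server = Server()
    for spec in registry:
        if ctx.read_only and not spec.read_only:
            continue  # read-only mode drops anything that writes
        if spec.needs_user and not ctx.has_user:
            continue  # tokens without a user cannot use user-specific tools
        if not spec.required_scopes <= ctx.token_scopes:
            continue  # the token lacks a scope this tool would need
        server.add_tool(spec)
    return server


if __name__ == "__main__":
    ctx = RequestContext(token_scopes=frozenset({"repo"}), read_only=True)
    print(sorted(build_server(ctx).tools))
```

Because nothing survives between calls to `build_server`, any replica can answer any request, which is the property that removes the need for session affinity.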
Even the sessions, we generally only use them because that's the only way to identify the self-reported client identity that comes through MCP. It's useful for us to understand which clients people are using the server with, so we use sessions for that.
We also wanted to bring experiments to all of you. We have this thing called Insiders mode, and all it does is turn on certain feature flags for experiments that we're happy to ship to anyone who wants to use them. This just takes you to the documentation. An example of something that we haven't released generally yet but is on Insiders is our MCP apps. I set up the example before I came in. It's quite nice, when you're talking to the agent, to have the opportunity to edit the AI-generated issue, especially if you're working heavily in professional open source and you want to make sure that it's you posting and it's not going to get closed as a bot-generated thing. This is a nice human-in-the-loop thing that MCP enables. I wasn't sure how much I would like it at first, but I've come to love it, because I care about how my issues are received by people, and this is a really great way to make sure I can check that.
In terms of where I think it's going, it's something along these lines. In the near future, server discovery will hopefully be automatic, and tool use will probably become more compositional: like bash, piping tools into other tools and streaming data through them, or like Cloudflare's code mode approach, or Anthropic's tool search tool API, which landed in Claude Code a couple of weeks ago. And OpenAI recently added a similar API, too.
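Editor's sketch: the general idea behind tool search mentioned above is that only the few tools matching the current task are loaded into context, rather than the whole catalog. The toy example below uses plain keyword overlap to show that shape. It is not Anthropic's or OpenAI's API and not the unreleased RAG version described earlier in the talk; `search_tools`, the scoring rule, and the catalog entries are all hypothetical.

```python
import re


def tokenize(text):
    """Lowercase a string and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_tools(query, catalog, limit=3):
    """Rank tools by how many query words appear in their name or description."""
    query_words = tokenize(query)
    scored = []
    for name, description in catalog.items():
        score = len(query_words & tokenize(f"{name} {description}"))
        if score:
            scored.append((score, name))
    # Highest score first; break ties alphabetically so results are stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:limit]]


# Hypothetical catalog; a real one could hold thousands of entries.
CATALOG = {
    "create_issue": "Open a new issue in a repository",
    "list_pull_requests": "List pull requests in a repository",
    "run_workflow": "Trigger a workflow run in Actions",
}

if __name__ == "__main__":
    print(search_tools("open an issue", CATALOG))
    print(search_tools("pull requests for this repository", CATALOG))
```

A production version would more likely use embeddings or a proper ranking function; the point here is only that the agent pays context for the search result, not for the full tool list.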
I fully expect that thousands of tools will be normal very soon. We're trying to iron out all the problems that prevented it in the first place, and I'll probably reverse many of the fewer-tools decisions. Users hopefully won't even have to know what MCP is. They'll just convey what they want to do, and the OAuth setup and the tool selection will become truly autonomous. I don't think we're that far away from this, but we're in an experimental phase where we're not really there yet. I think harnesses like Pi are also interesting, because you can build a weird client yourself that maybe optimizes this in a really good way. So I would encourage people to experiment with crazy clients. You never know: if you're super lucky, you could be the next Claude, right? You could publish something that goes so viral it totally changes the agentic game.
I wanted to end on a high and look at some numbers. We've got over 11 million Docker downloads of our stdio server, which is by far not the most used version of it either. We've got 126 contributors now and over 2,300 issues and PRs, which has been over seven a day, every single day, for over a year now, and I do look at almost every single thing eventually. So it's been [laughter] quite a year. Some repos have it even worse, but I also love it, so please keep doing it. We've got almost 4,000 forks, which blows my mind. I kind of want to know the weirder things that people have done that they haven't contributed back. Nearly 30,000 stars, and we're fast approaching 8 million tool calls a week.
And GitHub itself is also facing a new challenge. This is really intense, right? And it shows no sign of slowing down.
I still want you to keep opening issues and PRs for us. We will cope, but this is new territory, and everything's mildly on fire for everyone these days, I think, and it's just exciting and fun. But yeah, thank you so much for having me. [applause]
I think I've got like 30 seconds. I don't know if anyone has anything they want to ask.
>> What's your take on piping tool calls?
>> You know what? I think trying out MCP CLIs and things like that is a fun avenue. I don't think it's entirely ironed out, but one thing you can do is take the read-only tools from some MCP server, wrap them in a CLI, give it a proper help, and just see how the agent does. Stuff like that is surprisingly effective. And like I said, I want people to mess with this stuff, so I would encourage you to just try it if you're interested. All right, I'm at 0 seconds. I will answer you, but in person, if that's okay.
[music]