Scaling GitHub for your Agents — Sam Morrow, GitHub

Channel: aiDotEngineer
Published at: 2026-04-27
YouTube video id: 0n3MKk7r60w
Source: https://www.youtube.com/watch?v=0n3MKk7r60w
[music]
>> All right. Hello, London.
>> [applause]
>> And I hope everyone's been enjoying
AI Engineer Europe so far. There are
so many amazing speakers. I've been
watching talks and talking to people for
days now and it's been immense.
I'm Sam. I lead development of GitHub's
MCP server and yeah, I'm here to talk
about mostly challenges we've faced
building and scaling our remote server,
how we've overcome them,
and before I start, I just
like messing with people. So,
here's a quick show of hands:
who's used an MCP server?
Good. Good.
Who's used GitHub's?
Who has a hot take on GitHub?
>> [laughter]
>> And uh yeah, does anyone build a server
or client?
Oh, nice. Quite a few. Um and yeah, has
anyone contributed to the specification?
Oh. Oh, yeah. I got one. That's
actually the first one, I think, other
than at the MCP Dev Summit. There were
quite a lot of them there.
But
yeah, anyway, it's really awesome to see
so many hands. So, I'm glad that I've
actually come to the right place. But
yes, for GitHub, you know, our MCP
journey started well, at least in public
in April last year.
And uh we actually open-sourced our
local MCP in April last year.
And we've just turned 1 year old. So,
I'm super stoked by that. But um yeah,
back then, right, there was a tremendous
buzz.
We were the most starred repo on
GitHub that particular week. And
the exposure meant we got a high
volume of public contributions,
rapidly filling gaps in
platform coverage where people
wanted to add tools and things. And you
know, not everything was perfect, right?
After a month or so of new features,
agents in some ways were getting worse
at using GitHub and context windows were
getting blown out quicker.
And you know, we peaked, I think, at
over 100 tools. And certainly at the
time, that was just too many.
Uh LangChain had already produced
research
uh they published in February that year,
you know, of the exact kind of problems
we were seeing.
More tools don't make better agents, you
know, they get confused and forgetful.
Well, I say more tools like more
context and more tools shoved directly
into the context to be precise. But uh
yeah, GitHub's a really expansive
platform. And we provided tools, you
know, for repos, issues, PRs, actions,
projects, like even more things.
Uh but the hard part of solving this was
like we didn't want to prevent users
from having the tools individually that
they needed and they used. Uh and
suffice to say our user base is pretty
diverse.
And probably, on the GitHub
platform at the moment, there might even be
one or two Claudes as well. Uh
[snorts] and for the record,
uh there's a team of us who work on it.
It's not just me.
>> [snorts]
>> And my team is awesome.
But uh
yeah, so
to try and fix some of this, I
quickly added this thing, toolsets,
which was a kind of
grouping concept for related product
tools. And users could just pick which
ones they wanted and configure it. I
also added a dynamic tool selection
thing where agents could discover
sets of tools and then turn them on in
chunks. And we never released it, but
I made a kind of RAG version of the
same, for semantic
tool search and discovery. But
what do you think happened
even in spite of all this stuff?
>> [snorts]
[laughter]
>> Everyone used the default settings. It
was really annoying because in a
way we had all these elegant solutions.
All they did was require users to
actually configure the JSON a
little bit. And most users just don't.
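A minimal sketch of the toolset idea described above, and of why defaults matter so much. The toolset names, tool names, and the default selection here are illustrative assumptions, not GitHub's actual configuration.

```python
# Hypothetical toolset registry: group name -> tool names.
# These names are illustrative, not the real server's list.
TOOLSETS = {
    "repos": ["get_file_contents", "search_code"],
    "issues": ["list_issues", "create_issue"],
    "pull_requests": ["list_pull_requests", "merge_pull_request"],
    "actions": ["list_workflow_runs"],
}

# Assumed default: what a user gets when they configure nothing.
DEFAULT_TOOLSETS = ["repos", "issues", "pull_requests"]


def enabled_tools(requested=None):
    """Return the tool names exposed for the requested toolsets."""
    names = requested if requested else DEFAULT_TOOLSETS
    unknown = [n for n in names if n not in TOOLSETS]
    if unknown:
        raise ValueError(f"unknown toolsets: {unknown}")
    tools = []
    for name in names:
        tools.extend(TOOLSETS[name])
    return tools
```

Since most users never pass anything, whatever sits in the default list is effectively the product for them.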
Maybe it's even partially a spec
problem, because
every proposal so
far for adding grouping to the MCP
specification has, for various reasons,
been rejected. And there have been
several attempts.
And in a sense, every mode or
configuration we add, one
could argue, is papering over potential
gaps in the spec, or gaps in client
implementations. So, as an example,
we have a read-only mode, and
roughly 17% of our users use it, but
it maps one-to-one to the
read-only hint annotation.
But no client exposes that as a
method of filtering servers. I think
some gateways now do, but anyway,
it's an interesting easy win for more
enterprise use cases where people often
only want that.
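As a rough sketch of what client-side filtering on that annotation could look like: MCP tool definitions can carry an `annotations` object with a `readOnlyHint` boolean. The tool entries below are made up for illustration, and treating a missing annotation as "not read-only" is an assumption of this sketch.

```python
def read_only_tools(tools):
    """Keep only tools whose annotations explicitly mark them read-only."""
    return [
        tool for tool in tools
        if tool.get("annotations", {}).get("readOnlyHint") is True
    ]


# Illustrative tool list, shaped like entries from a tools/list response.
tools = [
    {"name": "list_issues", "annotations": {"readOnlyHint": True}},
    {"name": "create_issue", "annotations": {"readOnlyHint": False}},
    {"name": "legacy_tool"},  # no annotations: treated as not read-only
]

safe = [tool["name"] for tool in read_only_tools(tools)]
```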
>> [snorts]
>> But uh yeah, we needed to find better
solutions to context reduction. And uh
you don't need to worry too much about
the specifics. This is dated now. But uh
we started trying to optimize and
we looked at the usage patterns
on our remote server. And initially,
we cut the
amount of context used by
focusing the tools more specifically on
the general case, based on usage, for
about a 49% reduction of the initial
load. And then we subsequently also
grouped CRUD tools and brought that down
even more. And I think you
get about 40 tools if you use the
default configuration. And then you can
expand or contract that based on
your own preference. But yeah,
it's easy to customize. And
we've also like recently had a massive
push to, you know, reduce output tokens
of a lot of tools as well.
And in this example,
just by tailoring exactly what
comes back from list pull requests, it's
actually lost more than 75% of the
tokens used in the output. So,
in terms of how token-hungry
GitHub's server is, it's a
moving target. We're constantly changing
things to improve it. And if you
haven't used it in a while, it's
likely very different from a few months
ago even. And
yeah, anyway, like and we haven't ruled
out more advanced approaches like code
mode. And we're always experimenting
internally. But uh
on the heels of this, we also dug into
our data and we found some more
opportunities.
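The output trimming mentioned a moment ago for list pull requests can be sketched roughly like this. The sample payload and the choice of which fields to keep are assumptions for illustration; the real tool's output shape is not shown in the talk.

```python
import json


def trim_pull_request(pr):
    """Project a verbose pull request object down to what an agent usually needs."""
    return {
        "number": pr["number"],
        "title": pr["title"],
        "state": pr["state"],
        "author": pr["user"]["login"],
    }


# Hypothetical verbose API-style payload.
full = {
    "number": 42,
    "title": "Fix flaky test",
    "state": "open",
    "user": {"login": "octocat", "id": 1, "avatar_url": "https://example.com/a.png"},
    "url": "https://example.com/pulls/42",
    "diff_url": "https://example.com/pulls/42.diff",
    "patch_url": "https://example.com/pulls/42.patch",
    "body": "Long description " * 20,
}

trimmed = trim_pull_request(full)
# Character counts are only a crude proxy for token counts.
saved = 1 - len(json.dumps(trimmed)) / len(json.dumps(full))
```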
>> [snorts]
>> So, yeah, like uh we made a big push to
reduce tool failures as well. And the
success rate is roughly, I think, over
95% at this point. But uh
like not all failure is preventable cuz
agents don't necessarily know which
repos they have write permission on.
They still hallucinate. But uh
we've been able to identify significant
numbers of errors that could be overcome,
mostly by encoding a sort of agent
intent into our tool surface.
And you might have to make
five API calls to make it more robust.
But in that case, we do that
on the server side to reduce round trips,
because that saves context, saves
time, and usually
makes for a massively better experience;
it makes the agents more
successful.
And yeah, we also started to run evals
last year. Um
I'm not going to go into detail.
That link takes you to a blog article
that my colleague senior wrote about
doing it. But
one of the gists is, instead of
micro-optimizing
individual tool descriptions,
you try to test them against each other
to make sure that they're called
at the right times and not called at the
wrong times, so that in the pool of each
other, they don't fight. The perfect tool
description that makes the agent call it
all the time is terrible, as is the
reverse of that. So, you need to try and
get that as tight as possible.
But yeah, this could be a whole other
talk.
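One way to picture "called at the right times and not at the wrong times" is per-tool precision and recall over a set of eval cases. This is a generic sketch, not the team's actual eval harness; the case data and tool names are invented.

```python
from collections import Counter


def tool_selection_scores(cases):
    """cases: list of (expected_tool, called_tool) pairs from eval runs.

    Returns {tool: (precision, recall)}. Low precision means the tool is
    called when it should not be; low recall means it is missed when it
    should have been called.
    """
    correct, expected, called = Counter(), Counter(), Counter()
    for want, got in cases:
        expected[want] += 1
        called[got] += 1
        if want == got:
            correct[want] += 1
    scores = {}
    for tool in set(expected) | set(called):
        precision = correct[tool] / called[tool] if called[tool] else 0.0
        recall = correct[tool] / expected[tool] if expected[tool] else 0.0
        scores[tool] = (precision, recall)
    return scores


# Invented eval results: list_issues is "greedy" and steals one search case.
cases = [
    ("list_issues", "list_issues"),
    ("list_issues", "list_issues"),
    ("search_issues", "list_issues"),
    ("search_issues", "search_issues"),
]
scores = tool_selection_scores(cases)
```

A description that wins every call shows up here as high recall with poor precision, which is the failure mode described above.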
Security, on the other hand, is
something that's like a kind of constant
menace in all of this. I've seen lots of
people talking about this.
Um
and it's a real problem in some ways for
us because, you know, we've a lot of
people using plain text access tokens
for MCP in the wild.
And uh usually they're stored somewhere
the agent can access.
They're frequently long-lived. They're
often over-privileged. And they're kind
of sat there just waiting to be abused.
Uh
End users, I don't think they're
choosing this. It's
actually hard to make configuration easy
and secure at the same time. Clients
have to make use of system keyrings or
encrypted storage, like VS Code
does. But the MCP spec
also provided a better way with remote
HTTP, which goes all the way
back to April last year as well.
Um [snorts]
and we embraced this, of course. Um
And we wanted to make the secure connection
the path of least resistance. We didn't
want users to have to download a local
runtime.
And you know, our remote server supports
OAuth 2.1. And my team even helped add
the Proof Key for Code Exchange support,
which is commonly known as PKCE, to
GitHub's authorization server to improve
the security posture for client apps. Um
but as I said, we hoped OAuth would be
the path of least resistance. And again,
perhaps some of you might know what
happened.
Everyone expected us to support
dynamic client registration. And for us,
it created more problems than it
solved, because
if you implement it properly,
it's hard not to have unbounded growth
of app databases, and there are
challenges of how you would bucket them
for rate limits,
and there isn't a reliable app identity.
So, we considered it and rejected
it, and
we feel like it's a
well-intentioned mistake, and
we're not the only authorization
server to not support this.
And um
even um
like MCP itself, right? It decided that
client ID metadata
is probably the way to go and I can't
promise that we're going to support it,
but I promise that I am trying to get us
to support it and that should make
logging in like massively easier.
But um yeah, more on that in the future.
And also, speaking of security, some of
you may have seen this.
Um
this was a fun day.
>> [laughter]
>> But
you know, Invariant Labs published
this, and it's a
correctly done prompt injection
exfil attack for getting private data
out of GitHub, and
the thing is, they call out
GitHub's MCP server specifically, and
I think that
we do provide the tools
that can enable that if you just
enable them all, but
it applies to almost every agent setup,
whether they use MCP or not, or whether
they use GitHub MCP. It's the
lethal trifecta stuff, which I'm not
going to rehash now because I think many
of you have probably seen it, or you can
look it up.
Simon Willison's blog post on that is
excellent, but
you know,
the utility of agents is in
direct conflict with protecting
this stuff, and it's an active
space, trying to work out how to prevent
these problems, but it's not solved
and it's very much not unique to GitHub,
and we have users with wildly different
risk profiles, you know, like
we you know, we even have people that
have like air-gapped GitHub Enterprise
server instances
in like much more secure
and then you know, obviously the
collaborators etc. are also
just running straight to GitHub with
like you know, probably full token
access to the agent everything and
that's kind of also interesting, right?
And like I'm
I'm not naysaying any of this. It's just
it's cool to kind of
see what people do and see if we can
actually support the different use cases
and security postures while everyone
experiments with this stuff.
And
we also lean on auth to
manage tools as well. And this is
something I'm pretty happy with.
If you log into GitHub MCP with a PAT,
we just immediately filter
the tools down by the scopes that the
token has.
You don't have to do anything other
than give it the token.
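A rough sketch of that scope-based filtering. The scope names (`repo`, `gist`, `notifications`) are real GitHub OAuth scopes, but the tool names and the tool-to-scope mapping are assumptions for illustration, and the sketch ignores scope hierarchies.

```python
# Hypothetical mapping of tool name -> scopes the tool needs.
REQUIRED_SCOPES = {
    "get_file_contents": {"repo"},
    "create_gist": {"gist"},
    "list_notifications": {"notifications"},
    "get_me": set(),  # needs no particular scope
}


def tools_for_token(granted_scopes):
    """Return the tools whose required scopes are all granted to the token."""
    granted = set(granted_scopes)
    return sorted(
        name for name, needed in REQUIRED_SCOPES.items()
        if needed <= granted  # subset test
    )
```

Tools the token could never call successfully are simply not listed, so they cost no context and produce no failed calls.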
On OAuth, we support step-up OAuth. So,
we can return
a scope challenge and then it will
interactively ask the user if they want
to allow the scope.
And if you do, then you can
continue the tool call. It
doesn't fail, which I think is also
nice. VS Code, for example,
supports that, and I initially worked on
this with them because they already
have a token to use GitHub, and what they
wanted was that if their baked-in token
doesn't have permissions to use
everything, instead of just
failing, there was a mechanism for
users having a clean install and then
upscoping later if they need it.
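A sketch of what a scope challenge can look like on the wire, following the `insufficient_scope` error from RFC 6750 (a 403 with a `WWW-Authenticate: Bearer` header naming the needed scopes). How GitHub's server actually shapes its response is not shown in the talk, so the response dictionary here is a simplified assumption.

```python
def check_scopes(tool_scopes, granted_scopes):
    """Return a simplified response: OK, or a step-up scope challenge."""
    missing = set(tool_scopes) - set(granted_scopes)
    if not missing:
        return {"status": 200, "headers": {}}
    needed = " ".join(sorted(tool_scopes))
    challenge = f'Bearer error="insufficient_scope", scope="{needed}"'
    return {"status": 403, "headers": {"WWW-Authenticate": challenge}}


ok = check_scopes({"repo"}, {"repo", "gist"})
step_up = check_scopes({"repo"}, {"gist"})
```

A client that understands the challenge can prompt the user, obtain a token with the extra scope, and retry the same tool call instead of surfacing an error.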
And yeah, lastly, server tokens as well.
On Actions
and things, they didn't have a user.
So, user-specific tools are out,
and by removing those, we're
just removing constant sources
of failure and wasted context at the
same time.
Uh
we run a completely sort of stateless
server setup and um
we have been using Redis for session
storage, you know, it's standard
observability and deep kind of stack.
Like this is not a weird picture, but I
guess one of the weird things for some
people is a lot of people are running a
stateful MCP server process in the
singular and have kind of struggled with
how you get it into this shape.
But um
for us, we did a few things because it's
very dynamic, but one of the fun
things we did is
we actually make a brand new server
instance, in the SDK sense, on
every single request and we add the
tools to it at the start. So, whatever
your configuration is, it just builds
this and then you get what you've asked
for, or what you're allowed to use, because
some things have policies that impact
whether you've got tools or not.
And
yeah, we've been able to scale to
this point: we serve around 7 million
tool calls a week. And we
don't have session affinity.
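The per-request server idea can be sketched without any SDK at all. The `Server` class below is a stand-in for an SDK server object, and the tools and the `allowed` list are invented; the point is only that nothing survives between requests, so any instance can serve any request.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Stand-in for an SDK server object; holds only registered tools."""
    tools: dict = field(default_factory=dict)

    def add_tool(self, name, handler):
        self.tools[name] = handler

    def call(self, name, **kwargs):
        return self.tools[name](**kwargs)


# Hypothetical catalogue of every tool the service knows about.
ALL_TOOLS = {
    "echo": lambda text: text,
    "add": lambda a, b: a + b,
}


def handle_request(allowed, tool, **kwargs):
    """Build a fresh server for this request, then dispatch one tool call."""
    server = Server()  # new instance every time, no shared state
    for name in allowed:  # config, scopes and policy decide this list
        server.add_tool(name, ALL_TOOLS[name])
    if tool not in server.tools:
        return {"error": f"tool not available: {tool}"}
    return {"result": server.call(tool, **kwargs)}
```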
Even the sessions, we generally
only use them because that's the
only way to identify the self-reported
client identity that comes through MCP.
So, it's useful for us to understand
what clients people are using the
server with. So,
yeah, we use sessions for that, but
we also wanted to
bring experiments to all of you and
everyone. And
we have this thing that's an Insiders
mode, and
all it does is turn on
certain feature flags and things for
experiments that we're happy to just
ship to anyone who wants to use them.
And this just takes you to the
documentation, but
an example of something that we haven't
released generally yet, but is on
Insiders, is our MCP Apps. I
set up the example before
I came in, but it's quite nice when
you're talking to the agent to have the
opportunity to edit the
AI-generated
issue, especially if
you're working heavily in professional
open source stuff and you want to make
sure that it's you posting and it's not
going to get closed as a sort of
bot-generated thing. This is a
nice human-in-the-loop thing that MCP
enables, and I
wasn't sure how much I would like it at
first, but then I've come to love it
because I care about how
my issues and things are received by
people, and this is just a really great
way to make sure that I can check
that.
Um
So, yeah, in terms of where I think
it's going, it's something along these
lines:
I think in the near future,
server discovery will hopefully be
automatic and tool use will
probably become more compositional, like
bash, piping tools into other tools,
streaming data through them, or like
Cloudflare's code mode approach or
Anthropic's tool search tool API, which
just landed in Claude Code a couple of
weeks ago.
And OpenAI recently added a similar API
as well. And
I fully expect that
thousands of tools will be normal very
soon. We're trying to iron out all the
problems that prevented it in the first
place, and I'll probably reverse many of
the fewer-tools decisions. And users
hopefully won't even have to know what
MCP is. They'll just convey what it is
they want to do and the
OAuth setup and like you know, the
tool selection things will become truly
autonomous and I don't think we're that
far away from this, but we're we're kind
of in this experimental phase where
we're not really there yet. But um
I think harnesses like Pi are also
interesting because
you can build a weird client that maybe
optimizes this in a really good way
yourself. So, I would encourage people
to experiment with crazy clients. I I
feel like
you never know, you could be like the
next
um
>> [snorts]
>> uh
like well, if you're super lucky, you
could be like the next Claude, right?
You could
publish something that goes so viral it
totally changes the agentic game. Uh
I wanted to end on a high and look at
some numbers.
So,
like GitHub itself, it's actually got
over 11 million Docker downloads of our
stdio server, which is
by far not the most used version of it
either. Um we've got 126 contributors
now
and over 2,300 issues and PRs which it's
been over seven a day like every single
day for over a year now which I do look
at almost every single thing eventually.
So, it's been like
>> [laughter]
>> quite a year. I mean, some
repos have it even worse, but I
also love it. So, please keep doing
it.
And yeah, we've got almost 4,000 forks
which blows my mind. I kind of want to
know like the weirder things that people
have done that they haven't contributed
back. Uh [snorts]
yeah, nearly 30,000 stars and uh we're
fast approaching 8 million tool calls a
week. And GitHub itself is also facing a
new challenge.
>> [snorts]
>> This is really intense, right? And it
shows no sign of slowing down.
I still want you to keep opening
issues and PRs for us. Like we will
cope, but you know, this is new
territory and um
uh you know, everything's like mildly on
fire for everyone I think these days and
it's just exciting and fun.
But yeah, thank you so much for
having me.
>> [applause]
>> I think I got like 30 seconds. I don't
know if anyone has anything they want to
ask, but
>> What's your take on piping tool calls?
>> You know what? I think
trying out MCP CLIs and things like
that is a fun avenue. I don't think it's
entirely ironed out, but one thing
you can do is take the read-only tools
from some MCP, wrap them in a CLI, and
just give it a proper help, and just
see how the agent does. Stuff like that
is surprisingly effective.
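A tiny sketch of that experiment: expose a read-only tool as a CLI subcommand with real help text and let the agent discover it via `--help`. The program name `ghmcp`, the subcommand, and its flags are all hypothetical, and the `run` function just echoes the call rather than contacting any MCP server.

```python
import argparse


def build_parser():
    """Build a CLI whose subcommands mirror read-only tools."""
    parser = argparse.ArgumentParser(
        prog="ghmcp",  # hypothetical name
        description="Read-only tools exposed as subcommands.",
    )
    sub = parser.add_subparsers(dest="tool", required=True)
    issues = sub.add_parser("list-issues", help="List issues in a repository.")
    issues.add_argument("--repo", required=True, help="Repository as owner/name.")
    issues.add_argument(
        "--state", default="open", choices=["open", "closed", "all"],
        help="Which issues to include (default: open).",
    )
    return parser


def run(argv):
    """Parse arguments and return the tool call that would be forwarded."""
    args = build_parser().parse_args(argv)
    # A real wrapper would send this to the MCP server; here it is returned.
    return {"tool": args.tool, "arguments": {"repo": args.repo, "state": args.state}}
```

Because output is plain text on stdout in a real wrapper, the agent can pipe it into other shell tools, which is where the compositional angle comes from.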
And
like I said, I want people to
mess with this stuff. So I would
encourage you to just try it if you're
interested.
All right, I'm at 0 seconds. I will answer
you, but in person, if that's okay.
>> [music]