What Data from 20m Pull Requests Reveal About AI Transformation — Nick Arcolano, Jellyfish
Channel: aiDotEngineer
Published at: 2025-11-24
YouTube video id: WqZq8L-v9pA
Source: https://www.youtube.com/watch?v=WqZq8L-v9pA
Hi, my name is Nicholas Arcolano and I'm the head of research at Jellyfish. Today, I'd like to talk to you about AI transformation, specifically what real-world data can tell us about what's actually happening in the wild. A lot of AI-native companies are being founded right now, and there are many more existing companies trying to transform themselves into being AI native. I've talked to many folks from these companies, and they all have the same big questions. Number one, what does good adoption of AI coding tools and agents actually look like? Number two, what productivity gains should I expect as we transform our team and the tools that we use? Number three, what are the side effects of this transformation? And perhaps most importantly, if AI transformation isn't delivering as advertised, what's going on and what can you do about it? At Jellyfish, we believe the best way to get answers is with data. So in the next 15-20 minutes, I'm going to give you some data-backed insights from studies we've done to help you tackle these big questions. Before we jump in, though, let's take a minute to talk about the data behind the rest of this talk. At Jellyfish, we provide analytics and insights for software engineering leaders. To do this, we combine information from multiple sources, including usage of and interactions with AI coding tools like Copilot, Cursor, and Claude Code; interactions with autonomous coding agents like Devin and Codex; as well as PR review bots. We combine this with data from source control platforms like GitHub, so we can understand the actual codebase where the work is happening. We also pull in data from task management platforms like Linear or Jira, which tells us what the goal of the work being done actually is.
For the rest of this talk, we're going to be looking at findings from a data set like this across our customers. It comprises about 20 million pull requests, written and merged by about 200,000 developers from around a thousand companies. We've been collecting this data for more than a year, so today we'll be looking at results that span from June 2024 to the present. Okay, let's dig in. Question one: what does good adoption look like? Well, let's start with lines of code. I don't think this is a great metric, but it's one we all hear about in the media a bunch, so it's worth talking about. Here's data from a cohort of companies we've been tracking since June of last year. The purple bar represents the fraction of those companies that are generating 50% or more of their code with AI. Starting last summer, only about 2% of these companies were generating 50% or more of their code with AI, but this has been steadily growing, and as of last month, among these same companies, nearly half are generating 50% or more of their code with AI. Now, I think a more useful thing to look at is developer adoption, because this gets at the actual behavior change that you want to see in your team. It's also the thing I've seen that correlates most directly with good productivity outcomes, and we're going to talk about this a lot more later. First, we define an AI adoption rate for developers by computing the fraction of the time that they use AI tools when they code. So 100% for a developer means you're using AI tools every time you code. A company's adoption rate is just the average of the adoption rates of all its individual developers, so 100% for a company means that every developer is using AI every time they code.
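The adoption-rate definition above can be sketched in a few lines of Python. This is an illustrative reconstruction, not Jellyfish's actual pipeline; the session-log shape (per-session records flagged for AI tool use) is an assumption:

```python
from collections import defaultdict

def developer_adoption(sessions):
    """sessions: list of (developer, used_ai) pairs, one per coding session.
    Returns each developer's adoption rate: fraction of sessions using AI."""
    totals = defaultdict(int)
    with_ai = defaultdict(int)
    for dev, used_ai in sessions:
        totals[dev] += 1
        if used_ai:
            with_ai[dev] += 1
    return {dev: with_ai[dev] / totals[dev] for dev in totals}

def company_adoption(sessions):
    """Company adoption rate = plain average of individual developer rates."""
    rates = developer_adoption(sessions)
    return sum(rates.values()) / len(rates)

sessions = [
    ("alice", True), ("alice", True), ("alice", False), ("alice", True),  # 3/4 = 75%
    ("bob", True), ("bob", False),                                        # 1/2 = 50%
]
print(company_adoption(sessions))  # (0.75 + 0.5) / 2 = 0.625
```

Note the averaging is over developers, not sessions, so a heavy user and a light user count equally toward the company's rate.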
What you see here is a plot of the 25th, 50th, and 75th percentiles of company adoption rates by week for the developers and companies that we've been tracking. As of last summer, the median adoption rate was around 22%, meaning developers at the median company were using AI 22% of the time that they code. It's grown steadily since then, and today we're seeing median adoption rates close to 90%. Now, if you're like me and you're using multiple tools constantly in parallel, in both synchronous and asynchronous modes, you're at 100%, and it might seem crazy that not everyone else is. But the reality is that for many teams, there are still real technical, organizational, and cultural barriers to adopting these tools more completely. That brings me to my final point on adoption. You might ask: what about autonomous coding agents? The results I've just shown you are overwhelmingly from interactive coding tools like Copilot, Cursor, and Claude Code. We know these tools all have interactive agentic modes, but what about true, fully autonomous agents like your Devins or your Codexes? Maybe you're using agents like these to great effect, or maybe you haven't really gotten going with autonomous agents yet. It's fine wherever you are in your journey, but if it feels like it's slow going getting off the ground with autonomous agents, I'm here to tell you you're not alone. In our data set, only about 44% of companies have done anything with autonomous agents at all in the past three months. The vast majority of that work is what you'd consider trialing and experimentation, not full-scale production, and ultimately it all amounts to less than 2% of the millions of PRs that were merged over that time frame. So it's still very early days. All right, let's move on.
Now, I'd like to talk about productivity. Even though autonomous agents aren't yet delivering at scale, we're still seeing big gains from adoption of interactive coding agents, so let's talk about what we're seeing. First, though, what do we mean by productivity? This can be a loaded term, kind of squishy and overloaded, and there are many ways to attack it. A good place to start is plain old PR throughput: how many pull requests does the average engineer merge per week? Not the most exotic metric, but it's proven and widely accepted. Do note that the absolute level of PR throughput varies. It depends on things like how you like to scope work, and it also depends on your architecture (put a pin in that, because we're going to talk about it more later). However, measuring the change in PR throughput for your team, which holds all those other things constant, is a good way to track productivity gains. Another good one is cycle time. There are lots of ways to define that one, but it's basically the latency or lead time to code getting deployed. For our purposes, we'll take each PR and measure the time from the first commit in the PR until it was merged. Okay, so here's what we're seeing for changes in PR throughput, and let me explain this chart. Every data point here is a snapshot of a given company on a given week. The x-axis is the company's AI adoption rate that we discussed earlier; the y-axis is the company's average PRs per engineer that week. You can see a clear correlation between AI adoption and PR throughput. The average trend here is about a 2x change as you go from zero to full adoption. So on average, a company should expect to double its PR throughput as it goes from not using AI at all, which hardly anybody is doing anymore, to 100% adoption of AI coding tools. Now, we also see some gains in cycle time.
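The two productivity metrics just defined, PRs per engineer per week and first-commit-to-merge cycle time, could be computed roughly like this. A minimal sketch with made-up field names, not the talk's actual implementation:

```python
from datetime import datetime

def weekly_pr_throughput(prs, engineers):
    """prs: PRs merged in one week. Throughput = merged PRs / active engineers."""
    return len(prs) / engineers

def cycle_time_hours(pr):
    """Cycle time for one PR: first commit to merge, in hours."""
    return (pr["merged_at"] - pr["first_commit_at"]).total_seconds() / 3600

def median_cycle_time(prs):
    """Median of the per-PR cycle times (robust to the long tail of slow PRs)."""
    times = sorted(cycle_time_hours(pr) for pr in prs)
    n = len(times)
    mid = n // 2
    return times[mid] if n % 2 else (times[mid - 1] + times[mid]) / 2

prs = [
    {"first_commit_at": datetime(2025, 1, 6, 9), "merged_at": datetime(2025, 1, 6, 17)},   # 8 h
    {"first_commit_at": datetime(2025, 1, 6, 9), "merged_at": datetime(2025, 1, 8, 9)},    # 48 h
    {"first_commit_at": datetime(2025, 1, 7, 12), "merged_at": datetime(2025, 1, 7, 18)},  # 6 h
]
print(weekly_pr_throughput(prs, engineers=2))  # 1.5 PRs per engineer
print(median_cycle_time(prs))                  # 8.0 hours
```

Using the median rather than the mean matters here, because, as the talk notes, the cycle-time distribution has a long tail of PRs that take much longer.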
So more work is happening, and it's happening faster. This is similar to the previous chart, but now on the y-axis we're looking at median cycle time for PRs merged each week instead of PR throughput. As an aside, I like this cycle-time distribution because you can see two clear horizontal bands. The lower horizontal cluster corresponds to tasks that take less than a day; then there's a valley, and then a band in the middle for tasks that take about two days. Then there's a long tail going up the y-axis of stuff that takes much longer. I've truncated it here, because as we all know, some things can take quite a while to get merged. What's exciting here is that the average trend is a 24% decrease in cycle times as you go from 0% to 100% adoption of AI coding tools. So the big picture is good news for productivity gains, and maybe you're seeing these things in your own organization. But what about the side effects? We all know there's no free lunch, so what else changes as you go through an AI transformation? One thing we've observed is that PRs are getting bigger. Here's a plot like the previous ones, except now the y-axis is PR size. On average, teams that have fully adopted AI coding tools are pushing PRs that are 18% larger in terms of net lines of code added. That size change is due more to additions than deletions, which means the combined change is primarily net new code, not just fully rewritten or heavily reworked code. Another interesting detail is that the average number of files touched is about the same. So this change is more about code that's more thorough, or maybe just more verbose; it's not the case that AI is touching more files and changing code in more places in the codebase.
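The two size measures mentioned here, net lines of code added and distinct files touched, fall straight out of a PR's per-file diff stats. A hypothetical sketch (the tuple layout is an assumption, not any platform's actual API shape):

```python
def pr_size_stats(files_changed):
    """files_changed: list of (filename, additions, deletions) for one PR.
    Returns net lines of code added and the count of distinct files touched."""
    net_loc = sum(add - rm for _, add, rm in files_changed)
    files = {name for name, _, _ in files_changed}
    return {"net_loc": net_loc, "files_touched": len(files)}

pr = [
    ("service/api.py", 120, 30),
    ("service/models.py", 45, 5),
    ("tests/test_api.py", 60, 0),
]
print(pr_size_stats(pr))  # {'net_loc': 190, 'files_touched': 3}
```

Tracking both numbers separately is what lets you make the distinction in the talk: net LOC rising while files touched stays flat points to more code per file, not broader changes across the codebase.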
This is largely happening within the same files. Now, if teams are pushing more PRs, writing and merging them faster, and the PRs are getting bigger, you might be wondering about quality. Are we seeing effects on quality as we use more AI and push code faster? Right now, the answer is not really; we're not seeing any big effects. We've looked at bug tickets created, and we've looked at rates of PR reverts, code that had to be rolled back, and we haven't found any statistically significant relationship with the rate of AI adoption. Interestingly, we have found increases in the rates of bugs resolved. When you dig into the data, you find this is because teams are disproportionately using AI to tackle bug tickets in their backlog. So you see a lot more bug tickets being addressed by AI, but not necessarily being caused by AI. This makes sense: bugs are often well-scoped, verifiable tasks that AI coding tools can be set up to succeed at, and we're seeing a lot of people having success throwing AI at those kinds of tasks. But basically, there's no smoking gun on quality yet, though we're going to keep digging in here, especially as usage of asynchronous agents grows. All right, last question: what if what you're seeing at your org doesn't align with the kind of results we've been talking about so far? What if you're listening to this and it's just not your reality? Well, I think I've made it clear that the most important thing to focus on first is adoption. You're not going to see gains until you get folks using these tools at scale. I think that's common sense, but maybe you are seeing high adoption and still not seeing the kind of productivity gains that all your friends on LinkedIn are crowing about. So what's going on?
We've looked at a lot of things here, and there's plenty more to investigate, but I'd like to share one that's particularly interesting, and that's code architecture. By code architecture, I mean how the code for your products and services is organized across your repositories. Think about code being organized into monorepos versus polyrepos. That arrangement of your code could be indicative of monolithic services versus microservices, or it could be the difference between a centralized versus a more federated product strategy. One key metric for understanding this is active repos per engineer. It's a pretty straightforward one: how many distinct repos does a typical engineer push code to in a given week? One really cool thing about this metric is that it's scale independent. It turns out that by computing it per engineer, normalizing by the number of engineers, you remove any correlation with the size of the company or the team. In other words, this metric tells you something about the shape of the code that your engineers have to work with on a daily and weekly basis, regardless of how big your company is. Here's what the distribution of that metric looks like: a probability distribution across the companies in our study. The more centralized architectures are on the left, there's a long tail of highly distributed architectures to the right, and the more balanced and lightly distributed architectures lie between these two extremes. So we've got these four regimes as you increase active repos per engineer. Here's where it gets really interesting. Remember those 2x gains in PR throughput that I showed you before? Here's a flashback.
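The active-repos-per-engineer metric described above is simple enough to sketch directly. This is an illustrative reconstruction of the definition in the talk, with an assumed push-event log format:

```python
from collections import defaultdict

def active_repos_per_engineer(pushes):
    """pushes: list of (engineer, repo) pairs for one week.
    Metric = average count of distinct repos each engineer pushed to."""
    repos_by_eng = defaultdict(set)
    for eng, repo in pushes:
        repos_by_eng[eng].add(repo)  # sets deduplicate repeat pushes to a repo
    return sum(len(r) for r in repos_by_eng.values()) / len(repos_by_eng)

pushes = [
    ("alice", "frontend"), ("alice", "frontend"), ("alice", "shared-libs"),  # 2 distinct
    ("bob", "billing"), ("bob", "billing-api"), ("bob", "infra"),            # 3 distinct
]
print(active_repos_per_engineer(pushes))  # (2 + 3) / 2 = 2.5
```

Because it's an average over engineers, the result doesn't grow just because the team does, which is the scale-independence property the talk highlights.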
Well, if we take that plot, all those data points from all those different companies, and segment on active repos per engineer, we've got four different regimes we can do this analysis in: centralized, balanced, distributed, and highly distributed. And if we perform that same analysis, we see big differences. Looking at the top row, you can see that centralized and balanced code architectures trend more like 4x, not 2x, so they're doing much better than the average. The distributed architecture, in the lower left-hand corner in teal, looks more like that global 2x trend we see when we look at all the data. What's really interesting is the highly distributed case: there's essentially no correlation here between AI adoption and PR throughput, and the weak trend that does exist is actually slightly negative. So what's going on? Why are teams with highly distributed architectures struggling? They don't seem to be getting real gains from AI, at least not on average. Well, a big part of what you're seeing here is the problem of context. Most of today's tools are set up to work best with one repo at a time: you pick a repo and you dive in, and combining context across repos is often challenging, for humans as well as for coding tools and agents. Moreover, the relationships between these repos and the systems and products they relate to are often not even written down very clearly. They might be largely locked in the heads of senior engineers, and they're often not accessible to coding tools and agents. So it's going to take some time for teams to invest in the context engineering that's needed here.
It's an interesting challenge, especially in light of the fact that a lot of folks are saying, and you may have heard this too, that microservices are the right way to go for AI-native development. I could certainly see a world where we solve these context challenges, we adopt autonomous agents at scale, they're set up for success, and this whole thing flips, and the highly distributed category becomes the most productive way to do things. But right now, this is what we're seeing out in the world. As an aside, another thing you might notice is that as you go from the most centralized to the most distributed architectures, these PRs-per-engineer distributions shift upward. What's happening is that the absolute number of repos increases as architectures get more distributed. Basically, in a highly distributed architecture, it just takes more PRs overall to get things done, due to things like migrations and cross-repo coordination. I bring this up because it's one of the many reasons why counting PRs in the absolute sense isn't a great metric. You really need to be tracking the change in PR throughput to understand productivity, because these things vary due to factors like architecture choices. Okay, so that's it. To recap, and this is probably not news to anyone watching, AI coding tools are being used in a big way. Autonomous agents, though, not so much; it's still early days. We're seeing big productivity gains, with more code being shipped, and faster, even if all you're using is interactive AI coding tools like Copilot, Cursor, and Claude Code. So even if you feel like you're not as far along on fully autonomous agentic coding as you ought to be, a 2x change in PR throughput should be your expectation; you should be seeing that or more. But you should also expect bigger PRs.
And maybe we can all ease up on some of the extreme quality anxiety. We want to keep an eye on that, but we're just not seeing big issues there, at least not yet. Finally, there are a lot of reasons why your mileage may vary, and we're going to continue looking at this, but one place you can start is to think about your code architecture: how it might be holding you back, and what you can do to compensate for some of the context limitations you have, to ultimately try to unlock some of those sweet AI productivity gains. So that's it, that's all I've got. I'm Nicholas Arcolano, head of research at Jellyfish. Thank you so much for listening.