Making Codebases Agent Ready – Eno Reyes, Factory AI

Channel: aiDotEngineer

Published at: 2025-12-22

YouTube video id: ShuJ_CN6zr4

Source: https://www.youtube.com/watch?v=ShuJ_CN6zr4

Hey everybody, my name is Eno, and I'm really excited to talk today about something we care a lot about at Factory. When we started two and a half years ago, we said our mission was to bring autonomy to software engineering. That sentence is full of loaded words and sounds a little buzzwordy right now, but my goal is that you leave these roughly twenty minutes with a bunch of insights that apply to your organization, the teams you build, and the companies you advise, and, if you're building products in this space, some insight into how to think about building autonomous systems and making your engineering org one that can use agents really successfully. A bonus is that ideally this applies to any AI tools you're using, so it won't be specific to our product or any of the other amazing tools out there.

I'd like to start with Andrej Karpathy, who had a very well-timed tweet, so of course I'm going to mention it.
He talked about this idea of software 2.0 coming from automation, from the ability to verify things. This is on the mind of Silicon Valley right now, because the most capable frontier models are built with post-training that involves lots of verifiable tasks. The most interesting thing here is that the frontier of what can be solved by AI systems is really a function of whether you can specify an objective and search through the space of possible solutions. We are used to building software purely via specification: we say the algorithm does this, input is x, output is y. But if you shift your mindset to automation via verification, there is a real difference in what is possible to build.
There is also a great blog post by Jason on the asymmetry of verification. It's pretty intuitive to anyone who knows about P versus NP, and it's an idea people have discussed throughout the history of computing: a ton of tasks are much easier to verify than they are to solve, and vice versa. The most interesting easy-to-verify problems are the ones where there's an objective truth; where validation is quick; where it's scalable, so you can validate a bunch of them in parallel; where it's low noise, so your chance of validating correctly is really high; and where the signal is continuous, not just a binary yes/no but maybe 30%, 70%, or 100% correct.
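To make that asymmetry concrete, here is a toy sketch of my own (not from the talk): subset-sum is expensive to solve by brute force but cheap to verify, and a verifier can emit a continuous score rather than a binary pass/fail.

```python
# Toy illustration of the asymmetry of verification (hypothetical example):
# finding a subset that sums to a target is exponential work, but checking
# a proposed subset takes linear time.
from itertools import chain, combinations

def solve_subset_sum(nums: list[int], target: int) -> tuple[int, ...] | None:
    """Brute-force search: exponential in len(nums)."""
    subsets = chain.from_iterable(combinations(nums, r) for r in range(len(nums) + 1))
    return next((s for s in subsets if sum(s) == target), None)

def verify_subset_sum(nums: list[int], target: int, candidate: tuple[int, ...]) -> bool:
    """Verification: a couple of linear-time checks."""
    remaining = list(nums)
    try:
        for x in candidate:
            remaining.remove(x)  # raises ValueError if x isn't actually available
    except ValueError:
        return False
    return sum(candidate) == target

def continuous_score(checks: list[bool]) -> float:
    """A continuous signal: fraction of validators passed, not just yes/no."""
    return sum(checks) / len(checks) if checks else 0.0

nums, target = [3, 34, 4, 12, 5, 2], 9
answer = solve_subset_sum(nums, target)                                 # slow to find...
assert answer is not None and verify_subset_sum(nums, target, answer)   # ...fast to check
print(continuous_score([True, True, False]))                            # ~0.67, a graded signal
```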
The reason I bring both of these up is that software development is highly verifiable. This is the frontier; it's why software development agents are the most advanced agents in the world right now. An enormous amount of work over the last 20 to 30 years has gone into automated validation and verification of the software you build: testing, meaning unit tests, end-to-end tests, QA tests. And that frontier is expanding; there are tons of cool companies like Browserbase, plus computer-use agents, that are making it easier to validate really complex visual or front-end changes. Docs too: having something like an OpenAPI spec for your codebase is something that can be automated and validated.
I could go through and enumerate a bunch of these, but I actually think it makes a nice checklist for yourself. Do you have automated validation for the format of your code? Do you have linters? For professional software engineers these feel like "yeah, of course we do." But I think you can go a step further, and this is where that continuous validation component comes in. Do you have linters so opinionated that a coding agent will always produce code at exactly the level your senior engineers would? How do you do that? What does that even mean? Do you have tests that fail when AI slop has been introduced, and pass when high-quality AI code is introduced? These additional layers of validators are things most codebases actually lack, because humans are pretty good at handling most of this stuff without the automated validation.
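As one hedged illustration of what such a validator could look like, here is a minimal pytest-style sketch. The rules and the `src` path are my assumptions, not anything Factory prescribes; the point is encoding house opinions as an always-on automated check.

```python
# A minimal sketch of an "opinionated validator": tests that fail on a couple
# of common slop patterns. Rules and paths are hypothetical.
import ast
from pathlib import Path

SRC = Path("src")  # hypothetical source root

def iter_functions():
    for path in SRC.rglob("*.py"):
        tree = ast.parse(path.read_text(), filename=str(path))
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                yield path, node

def test_public_functions_have_docstrings():
    """Generated filler often skips docs; make that gap fail loudly."""
    missing = [
        f"{path}:{node.lineno} {node.name}"
        for path, node in iter_functions()
        if not node.name.startswith("_") and ast.get_docstring(node) is None
    ]
    assert not missing, f"Public functions without docstrings: {missing}"

def test_no_bare_except():
    """A bare `except:` that swallows errors is a classic slop signature."""
    offenders = []
    for path in SRC.rglob("*.py"):
        tree = ast.parse(path.read_text(), filename=str(path))
        for node in ast.walk(tree):
            if isinstance(node, ast.ExceptHandler) and node.type is None:
                offenders.append(f"{path}:{node.lineno}")
    assert not offenders, f"Bare except blocks found: {offenders}"
```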
Your company may be at a test coverage rate of 50% or 60%, and that's good enough because humans will test manually. You may have a flaky build that fails every third run; everyone at your company secretly hates it, but no one says anything. These are the sorts of things we know are true about large codebases. And as you scale out to extremely large codebases, organizations with 44,000-plus engineers, it becomes a very accepted norm that the bar sits at maybe 50% or 60%. The reality is that most software orgs can scale like that; it's sort of fine to be at that lower bar. But when you start introducing AI agents into your software development lifecycle, and I don't just mean interactive coding but across the board (review, documentation, testing, all of it), this breaks their capabilities. Most of you have probably only seen an AI agent operate in a codebase that has a decent amount of validation. I think a lot of the best companies in the world right now have introduced very rigorous validation criteria, and it means their ability to use agents is significantly greater than that of the average developer.
And if you think about it, the traditional loop of understanding a problem, designing a solution, coding it out, and then testing it shifts once you have really rigorous validation. When you're using agents, it becomes a process of specifying the constraints you want to be validated against and what should be built, generating solutions toward that outcome, verifying with both your automated validation and your own intuition, and then iterating on that loop. This move from traditional development to specification-driven development is starting to bleed into all the different tools: different tools have a spec mode, our coding agent Droid has a specification mode and a plan mode, and there are entire IDEs that orient you around a specification-driven flow. If you combine these two things, that is really how you build reliable, high-quality solutions.
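Sketched as code, that specify-generate-verify-iterate loop might look like the following. All names and signatures here are illustrative, not a real Factory API; `generate` stands in for any coding agent, and the validators are whatever automated checks (lint, tests, build) your repo exposes.

```python
# A minimal sketch of the specification-driven loop described above.
from typing import Callable

Validator = Callable[[str], float]  # returns a score in [0, 1], not just pass/fail

def spec_driven_loop(
    spec: str,
    generate: Callable[[str, str], str],   # (spec, feedback) -> candidate change
    validators: list[Validator],
    threshold: float = 0.95,
    max_iters: int = 5,
) -> str | None:
    feedback = ""
    for _ in range(max_iters):
        candidate = generate(spec, feedback)
        scores = [v(candidate) for v in validators]
        overall = sum(scores) / len(scores)
        if overall >= threshold:
            return candidate            # verified well enough to hand to a human
        # Continuous signal: tell the generator *which* validators are unhappy.
        feedback = f"scores={scores}, improve the weakest checks"
    return None                         # escalate to a developer instead of merging
```

In practice the validators are your linters, tests, and build; the richer and more opinionated they are, the more useful each iteration's feedback becomes.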
So if you think about it, what is the best decision for you to make as an organization? Is it spending 45 days comparing every single coding tool in the space and determining that one tool is slightly better because it's 10% more accurate on SWE-bench? Or is it making changes to your organizational practices that enable all of these coding agents to succeed, and then picking one your developers like, or honestly letting people choose from the tons of amazing tools out there?
And when you have these validation criteria, you can introduce far more complex AI workflows to your organization. If you cannot automatically validate whether a PR is reasonably successful, or contains code that won't break prod, you are not going to be parallelizing several agents at once. You are not going to be decomposing a large-scale modernization project into a bunch of subtasks; that is a very frontier-style use of AI. And if single-task execution ("I would like to get this done, here's exactly how I'd like it done, and here's how you should validate it") does not work nearly 100% of the time, you can forget about successfully using those other things at scale in your company.

The same goes for other tools like code review. If you want really high-quality AI-generated code review, you need documentation for your AI systems. Yes, agents will get better at picking whether to run lint or tests; they'll get better at finding solutions when you don't have explicit pointers; they'll get better at search. But they won't get better at creating this validation criteria out of thin air. This, by the way, is why we believe software developers are going to continue to be heavily involved in the process of building software: your role starts to shift to curating the environment, the garden, that your software is built from. You're setting the constraints, building the automations, and introducing continued opinionatedness into those automations.
And if your company doesn't have at least all of these, that means there's a lot of work you can do totally absent of a procurement cycle, or buying one tool, or trying out another. My plug is that we help organizations do this. It's great to have tools that let you go in and assess this stuff, with ROI analytics you can interact with. But for most organizations there is actually a very clear way to do this: analyze where you are across those eight pillars of automated validation. Do you have a linter? How good is the linter? Do you have AGENTS.md files, an open standard that almost every single coding agent supports? Then you can improve and systematically enhance these different validation criteria.
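As a rough illustration, that kind of self-assessment can start as a simple script. The pillar list and file names below are my assumptions for the sake of example, not Factory's actual checklist.

```python
# Sketch: walk a repo and report which validation "pillars" are present.
from pathlib import Path

PILLARS = {
    "agent docs (AGENTS.md)": ["AGENTS.md"],
    "linter config": ["ruff.toml", ".eslintrc.json", ".golangci.yml"],
    "formatter config": [".editorconfig", ".prettierrc"],
    "tests": ["tests", "test"],
    "CI pipeline": [".github/workflows", ".gitlab-ci.yml"],
    "type checking": ["mypy.ini", "tsconfig.json"],
}

def audit(repo: Path) -> dict[str, bool]:
    """Return pillar -> present, based on whether any candidate file exists."""
    return {pillar: any((repo / c).exists() for c in candidates)
            for pillar, candidates in PILLARS.items()}

if __name__ == "__main__":
    for pillar, present in audit(Path(".")).items():
        print(f"{'OK ' if present else 'GAP'} {pillar}")
```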
You can also go through and say: well, we're seeing that coding agents are reliable enough for a senior developer to use, but, if you have the tooling to tell which developer is using what tools, you can ask whether your junior developers are actually totally unable to use these coding agents. And you'll learn that the reason is not that they're less competent or don't know how to use the tool, but that there are niche practices you have no automated validation for. If you think about the difference between a Google or a Meta and a still-large, 2,000-person engineering org, it's that a new grad with effectively zero context can go ship a change to make YouTube's like button slightly more round, and with some degree of confidence it won't take down YouTube for a billion users. The reason that's possible is the insane amount of validation that has to happen on that code before it ships.
The big difference we now have is coding agents that can identify exactly where these gaps are and actually remediate them. You can ask a coding agent to figure out where you're not being opinionated enough about your linters. You can ask a coding agent to generate tests. We have an engineer named Alvin, and I love this quote of his: "a slop test is better than no test." I think that's slightly controversial, but what I'd argue is that just having something there, something that passes when changes are correct and somewhat accurately matches the spec of what you want built, means people will enhance it and upgrade it, and other agents will actually notice those tests and follow the patterns. So the more opinionated you get, the faster the cycle continues.
So I think what you should be thinking about is: what are the feedback loops in our organization that we are catering toward? If you have better agents, they will make the environment better, which will make the agents better, which means you have more time to make the environment better. This is the new DevX loop that organizations can invest in, and it will enhance all of the tools you're procuring: whether it's a code review tool, a coding agent, or anything else, they will all benefit. I would also argue it shifts your mental model, as a leader, of what you're investing in when you invest in your software work right now. We're used to opex as the input to engineering projects: we want to solve this problem, so we need ten more people. The other thing you can now start investing in is this environment feedback loop that enables those additional people to be significantly more successful. And I think that's the feedback loop that can actually capture quite a lot of value, because coding agents can just scale this out.
All of this is to say there's a lot that can be done outside of the product itself to enable these systems. And the best coding agents will actually take advantage of these validation loops: if your coding agent isn't proactively seeking out linters, tests, and so on, then at the end of the day it's not going to be as good as one that will seek those validation criteria. In addition, when organizations think about these sorts of things, if you're the person who's able to say, "Here's my opinion; here's how I want software to be built," it scales your capabilities out further than ever before. One opinionated engineer can meaningfully change the velocity of the entire business, if you take this to heart and have a way to measure and systematically improve.
So that's the majority of what I came here to say. The one thing I'd leave you with: when you think about where AI is going and where we're at today, we are still really early in our journey of using software development agents. If you want a world where, the moment a customer issue comes in and a bug is filed, that ticket is picked up, a coding agent executes on it, the feedback is presented to a developer, they click approve, and the code is merged and deployed to production in a feedback loop that takes maybe an hour or two: that will be possible. We're all somewhat skeptical about that fully autonomous flow, but it is technically feasible today. The limiter is not the capability of the coding agent; the limiter is your organization's validation criteria. So this is an investment that, made today, will make your organization not 1.5x, not 2x, but the real 5x, 6x, 7x. It's sort of an easy thing to say, and it's an unfortunate story, because what it means is that you have to invest in this; it's not something AI will just magically give to you. It's a choice that you as an organization have. And if you make it now, I can guarantee you will be in the top 1 to 5% of organizations in terms of engineering velocity, and you will outcompete everybody else in the field. So I highly recommend investing in this sort of stuff; hopefully you found this helpful and have some lessons to take home. Thanks.
[applause]