Making Codebases Agent Ready – Eno Reyes, Factory AI
Channel: aiDotEngineer
Published at: 2025-12-22
YouTube video id: ShuJ_CN6zr4
Source: https://www.youtube.com/watch?v=ShuJ_CN6zr4
Hey everybody, my name is Eno. I'm really pumped to talk today about something we care a lot about at Factory. When we started two and a half years ago, we said that our mission is to bring autonomy to software engineering. That has a ton of loaded words in it, and it sounds a little buzzwordy right now. My goal is that you leave these roughly 20 minutes with a bunch of insights that apply to your organization, the teams you build, and the companies you advise. If you're building products in this space, I also hope you get some insight into how to think about building autonomous systems, and into making your engineering org one that can use agents really successfully. Ideally this applies to any tools you're using that involve AI, so it won't be specific to our product or to any of the other amazing tools out there. I'd like to start with Andrej Karpathy, who had a very well-timed tweet, so of course I'm going to mention it. He talked about the idea that Software 2.0 automates what you can verify. This is very much on the mind of Silicon Valley right now, because the frontier models are built with post-training that involves lots of verifiable tasks. I think the most interesting thing here is that the frontier of what AI systems can solve is really a function of whether you can specify an objective and search through the space of possible solutions. We are used to building software purely via specification: we say the algorithm does this, the input is x, and the output is y.
But if you shift your mindset to automation via verification, what is possible to build changes. There's another great blog post by Jason about the asymmetry of verification. This is pretty intuitive to anyone who knows about P versus NP, and a lot of people have talked about it throughout the history of computing and software. There are a ton of tasks that are much easier to verify than to solve, and vice versa. The most interesting easy-to-verify problems are ones where there is an objective truth. They're quick to validate. They're scalable, so validating many of them, maybe in parallel, is easy. They're low noise, so your chance of validating correctly is really high. And they have continuous signals: not just a binary yes or no, but maybe you're 30%, 70%, or 100% correct. The reason I bring both of these things up is that software development is highly verifiable. This is the frontier, and it's why software development agents are the most advanced agents in the world right now. So much work has gone in over the last 20 to 30 years to automated validation and verification of the software you build. There's testing: unit tests, end-to-end tests, QA tests. The frontier of this is expanding, with tons of cool companies like Browserbase, computer-use agents, and other tools making it easier to validate really complex visual or front-end changes. There are docs: having an OpenAPI spec for your codebase is something that can be automated and validated. I could go through and enumerate a bunch of these, but I actually think they make a nice checklist for yourself.
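To make the "continuous signal" property concrete, here is a minimal Python sketch, not taken from the talk: a verifier that scores a candidate function by the fraction of test cases it passes instead of returning a single pass/fail. The names `score`, `CASES`, `buggy_abs`, and `correct_abs` are hypothetical illustrations.

```python
from typing import Any, Callable, Sequence, Tuple

# A test case is (positional arguments, expected result).
TestCase = Tuple[tuple, Any]


def score(candidate: Callable[..., Any], cases: Sequence[TestCase]) -> float:
    """Continuous verification signal: fraction of cases the candidate gets right."""
    if not cases:
        raise ValueError("need at least one test case")
    passed = 0
    for args, expected in cases:
        try:
            if candidate(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crashing candidate fails this case; the verifier keeps going
    return passed / len(cases)


# Hypothetical task: absolute value of an integer.
CASES = [((3,), 3), ((-4,), 4), ((0,), 0), ((-1,), 1)]


def buggy_abs(x: int) -> int:
    return x  # forgets to negate negative inputs


def correct_abs(x: int) -> int:
    return -x if x < 0 else x


print(score(buggy_abs, CASES))    # 0.5
print(score(correct_abs, CASES))  # 1.0
```

A binary verifier would only report that `buggy_abs` fails; the fractional score also says how close it is, which is the kind of signal a search over candidate solutions can climb.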
Do you have automated validation for the format of your code? Do you have linters? For professional software engineers, these are sort of "yes, of course we do." But I think you can go a step further, and this is where that continuous validation component comes in. Do you have linters so opinionated that a coding agent will always produce code exactly at the level your senior engineers would? How do you do that? What does that even mean? Do you have tests that fail when AI slop is introduced, and pass when high-quality AI code is introduced? These additional layers of validation are things most codebases actually lack, because humans are pretty good at handling most of this stuff without automated validation. Your company may be at a test coverage rate of 50% or 60%, and that's good enough because humans will test manually. You may have a flaky build that fails every third run, and everyone at your company secretly hates it, but no one says anything. These are the sorts of things we know are true about large codebases. And as you scale out to extremely large codebases, organizations with 44,000-plus engineers, it becomes a very accepted norm that the bar sits at maybe 50% or 60%. The reality is that most software orgs can actually scale like that; it's fine to be at that lower bar. But when you start introducing AI agents into your software development life cycle, and I don't just mean interactive coding but really everything across the board, like review, documentation, and testing, this breaks their capabilities. Most of you have probably only seen an AI agent operate in a codebase that has a decent amount of validation.
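One way to read "linters so opinionated" is encoding house rules that generic linters don't enforce by default. Below is a minimal sketch using Python's standard `ast` module; the two rules (no bare `except:`, no `print` calls) are hypothetical examples of a team's opinions, not rules named in the talk.

```python
import ast


def lint(source: str) -> list[tuple[int, str]]:
    """Return (line, rule) findings for two example house rules, sorted by line."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # Hypothetical rule 1: a bare `except:` also swallows KeyboardInterrupt
        # and SystemExit, so require an explicit exception type.
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            findings.append((node.lineno, "no-bare-except"))
        # Hypothetical rule 2: library code should log, not print.
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "print"
        ):
            findings.append((node.lineno, "no-print"))
    return sorted(findings)


SAMPLE = '''\
def load(path):
    try:
        return open(path).read()
    except:
        print("failed")
'''

for line, rule in lint(SAMPLE):
    print(f"{line}: {rule}")
# 4: no-bare-except
# 5: no-print
```

Run as a required CI check, rules like these reject a change before review regardless of whether a human or an agent wrote it.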
I think a lot of the best companies in the world right now have introduced very rigorous validation criteria, and that means their ability to use agents is significantly greater than the average developer's. If you think about it, the traditional loop of understanding a problem, designing a solution, coding it out, and then testing it shifts when you have really rigorous validation. When you're using agents, it becomes a process of specifying the constraints by which you'd like the work to be validated and what should be built; generating solutions toward that outcome; verifying, both with your automated validation and with your own intuition; and then iterating on that loop. This move from traditional development to specification-driven development is starting to bleed into all of the different tools. Different tools have a spec mode; Droid, our coding agent, has a specification mode, or plan mode. There are entire IDEs that orient you around this specification-driven flow. If you combine these two things, this is really how you build reliable, high-quality solutions. So what is the best decision for you to make as an organization? Is it spending 45 days comparing every possible coding tool in the space, and then determining that one tool is slightly better because it's 10% more accurate on SWE-bench? Or is it making changes to your organizational practices that enable all of these coding agents to succeed, and then picking one your developers like, or honestly letting people choose from the tons of amazing tools out there? When you have these validation criteria, you can actually introduce far more complex AI workflows to your organization.
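The specify, generate, verify, iterate loop described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Factory's implementation: `fake_agent` is a hypothetical stand-in for a coding agent, and the two validators are toy string checks standing in for real linters and tests.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A validator returns None when the candidate is acceptable, else an error message.
Validator = Callable[[str], Optional[str]]
# An agent function maps (goal, feedback from the last round) to a candidate.
AgentFn = Callable[[str, list[str]], str]


@dataclass
class Spec:
    goal: str                     # what should be built
    validators: list[Validator]   # how it will be validated


def run_loop(spec: Spec, agent: AgentFn, max_iters: int = 5):
    """Specify -> generate -> verify -> iterate until every validator passes."""
    feedback: list[str] = []
    for attempt in range(1, max_iters + 1):
        candidate = agent(spec.goal, feedback)                 # generate
        errors = [v(candidate) for v in spec.validators]       # verify
        feedback = [e for e in errors if e is not None]
        if not feedback:
            return candidate, attempt  # hand off for human review
    return None, max_iters


def defines_add(src: str) -> Optional[str]:
    return None if "def add(" in src else "must define add()"


def has_type_hints(src: str) -> Optional[str]:
    return None if "->" in src else "missing type hints"


def fake_agent(goal: str, feedback: list[str]) -> str:
    # Hypothetical agent: it only adds type hints after a validator complains.
    if any("type hints" in msg for msg in feedback):
        return "def add(a: int, b: int) -> int:\n    return a + b\n"
    return "def add(a, b):\n    return a + b\n"


spec = Spec(goal="add two integers", validators=[defines_add, has_type_hints])
code, attempts = run_loop(spec, fake_agent)
print(attempts)  # 2
```

The human "intuition" step sits outside the loop here: only a candidate that clears every automated validator reaches a reviewer.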
If you cannot automatically validate whether a PR is reasonably successful, or whether its code definitely won't break prod, you are not going to be parallelizing several agents at once. You are not going to be decomposing a large-scale modernization project into a bunch of different subtasks; that is a very frontier-style task to use AI for. If single-task execution, the simple "I'd like to get this done, here's exactly how I'd like it done, and here's how you should validate it," does not work nearly 100% of the time, you can forget about successfully using these other things at scale in your company. The same goes for other tools like code review: if you want a really high-quality AI-generated code review, you need documentation for your AI systems. And yes, agents will get better at deciding whether to run lint or tests. They'll get better at finding solutions when you don't have explicit pointers, and they'll get better at search. But they won't get better at creating this validation criteria out of thin air. This is why we believe, by the way, that software developers will continue to be heavily involved in the process of building software: your role shifts to curating the environment, the garden, that your software is built from. You're setting the constraints. You're building these automations and introducing continued opinionatedness into them. If your company doesn't have at least all of these, there's a lot of work you can do entirely apart from a procurement cycle, buying one tool, or trying out another. The plug is that we help organizations do this. I think it's great to have tools that let you go in and assess this stuff, including ROI analytics.
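The "nearly 100% of the time" bar for single tasks follows from simple compounding. The back-of-the-envelope sketch below is not from the talk; it assumes, hypothetically, that a modernization project is split into 10 subtasks that succeed or fail independently with the same probability, and that the project needs every subtask to land with no retries.

```python
def all_succeed(p: float, n: int) -> float:
    """Chance that n independent subtasks, each succeeding with probability p, all succeed."""
    return p ** n


for p in (0.90, 0.99):
    print(f"p={p:.2f}, 10 subtasks: {all_succeed(p, 10):.3f}")
# p=0.90, 10 subtasks: 0.349
# p=0.99, 10 subtasks: 0.904
```

Under these assumptions, a 90% single-task success rate leaves only about a one-in-three chance that all ten subtasks land, while 99% keeps it around 90%, which is why reliable per-task validation matters before fanning work out to parallel agents.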
But for most organizations, there is actually a very clear way to do this. You can analyze where you stand across those eight different pillars of automated validation. Do you have a linter? How good is the linter? Do you have AGENTS.md files, an open standard that almost every coding agent supports? You can systematically improve these different validation criteria. And you can go through and say: we're seeing that coding agents are reliable enough for a senior developer to use, but, if you have tooling that tells you which developer is using which tools, maybe our junior developers are totally unable to use these coding agents. You'll learn that the reason isn't that they're less competent or don't know how to use the tool, but that there are niche practices you don't have automated validation for. Think about the difference between a Google or a Meta and a still large, but 2,000-person, engineering org. The difference is that at the former, a new grad with effectively zero context can ship a change that makes a YouTube border slightly more rounded, and, with some degree of confidence, it won't take down YouTube for a billion users. That's possible because of the insane amount of validation that code has to pass before it ships. The big difference now is that we have coding agents that can identify exactly where these gaps are and actually remediate them. You can ask a coding agent to figure out where you're not being opinionated enough about your linters. You can ask a coding agent to generate tests. We have an engineer named Alvin, and I love this quote of his: "A slop test is better than no test."
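A first pass at "where are you across those pillars" can be as simple as checking a repository for files that signal each practice. The sketch below is hypothetical: the signal names and file lists are illustrative choices (common config file names such as `ruff.toml`, `.golangci.yml`, and `.github/workflows`), not the eight pillars referenced in the talk.

```python
import tempfile
from pathlib import Path

# Hypothetical readiness signals, mapped to paths whose presence suggests them.
# Illustrative only; these are not the talk's eight pillars.
SIGNALS = {
    "agent instructions": ["AGENTS.md"],
    "linter config": ["ruff.toml", ".pylintrc", "eslint.config.js", ".golangci.yml"],
    "tests": ["tests", "test"],
    "ci": [".github/workflows", ".gitlab-ci.yml"],
}


def assess(repo: Path) -> dict[str, bool]:
    """Report which signals have at least one matching path in the repo root."""
    return {
        signal: any((repo / rel).exists() for rel in paths)
        for signal, paths in SIGNALS.items()
    }


def readiness(results: dict[str, bool]) -> float:
    """Fraction of signals present: a crude, continuous readiness score."""
    return sum(results.values()) / len(results)


# Demo on a throwaway repo that has only AGENTS.md and a tests/ directory.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "AGENTS.md").write_text("Run the linter and tests before committing.\n")
    (repo / "tests").mkdir()
    demo = assess(repo)

for signal, present in demo.items():
    print(f"[{'x' if present else ' '}] {signal}")
print(f"readiness: {readiness(demo):.0%}")  # readiness: 50%
```

Presence checks only answer "do you have a linter?"; "how good is the linter?" needs deeper checks, such as running it against known-bad code or measuring test coverage.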
That's slightly controversial, but my argument is that just having something there, a test that passes when changes are correct and reasonably matches the spec of what you want built, means people will enhance it and upgrade it, and other agents will notice these tests and follow their patterns. So the more opinionated you get, the faster the cycle runs. What you should be thinking about is which feedback loops in your organization you are catering to. If you have better agents, they will make the environment better, which will make the agents better, which will give you more time to make the environment better. This is also the new DevX loop that organizations can invest in, and it enhances all of the tools you're procuring: whether it's a code review tool, a coding agent, or something else, they will all benefit. I'd argue it shifts your mental model of what you, as a leader, are investing in when you invest in your software org right now. The traditional idea is opex as the input to engineering projects: we want more people to solve this problem, so we need 10 more people. I'd argue that the other thing you can now invest in is this environment feedback loop, which enables those additional people to be significantly more successful. That's the feedback loop that can capture a lot of value, because coding agents can scale it out.
All of this is to say that there's a lot that can be done outside the product itself to enable these systems, and the best coding agents will take advantage of these validation loops. If your coding agent isn't proactively seeking out linters, tests, and so on, then at the end of the day it won't be as good as one that seeks out those validation criteria. In addition, when organizations think about these things, if you're the person who's able to say, "Here's my opinion. Here's how I want software to be built," it scales your capabilities further than ever before. One opinionated engineer can meaningfully change the velocity of the entire business if you take this to heart and have a way to measure and systematically improve. That's the majority of what I came here to say. The only thing I'd leave you with is that, when you think about where AI is going and where we're at today, we are still really early in our journey of using software development agents. Picture a world where, the moment a customer issue comes in and a bug is filed, that ticket is picked up, a coding agent executes on it, the result is presented to a developer, they click approve, and the code is merged and deployed to production, in a feedback loop that takes maybe an hour or two. That will be possible. We're all somewhat skeptical about that fully autonomous flow, but it is technically feasible today. The limiter is not the capability of the coding agent; it's your organization's validation criteria. So this is an investment that, made today, won't make your organization just 1.5x or 2x better; this is where the real 5x, 6x, 7x comes from. It's an easy thing to say, and it's an unfortunate story, because it means you have to invest in this.
It's not something AI will just magically give you. It's a choice you have as an organization, and if you make it now, I can guarantee you'll be in the top 1 to 5% of organizations in terms of engineering velocity, and you'll outcompete everybody else in the field. So I highly recommend investing in this sort of thing. Hopefully you found this helpful and have some lessons to take home. Thanks.