Ship Production Software in Minutes, Not Months — Eno Reyes, Factory

Channel: aiDotEngineer

Published at: 2025-07-25

YouTube video id: iheWKg2Tkrk

Source: https://www.youtube.com/watch?v=iheWKg2Tkrk

Hi everybody, my name is Eno. I really
appreciate that introduction. Maybe I
can start with a bit of background. I
started working on LLMs about two and a
half years ago, when GPT-3.5 was coming
out and it became increasingly clear
that agentic systems were going to be
possible with the help of LLMs. At
Factory, we believe that the
way that we use agents in particular to
build software is going to radically
change the field of software
development. We're transitioning from
the era of human-driven software
development to agent-driven development.
You can see glimpses of that today. You
guys have already heard a bunch of great
talks about different ways that agents
can help with coding in particular.
However, it seems like right now we're
still trying to find what that
interaction pattern, what that future
looks like. And a lot of what's publicly
available is more or less an incremental
improvement. The current zeitgeist is to
take tools that were developed 20 years
ago for humans to write every individual
line of code, tools that were designed
first and foremost for human beings. You
sprinkle AI on top, you keep adding
layers of AI, and at some point maybe
there's some step-function change that
happens. But there's not a lot of
clarity in exactly what that means.
There's a quote attributed to Henry
Ford: if I had asked people what they
wanted, they would have said faster
horses. Now, we
believe that there are some
fundamentally hard problems blocking
organizations from accessing the true
power of AI. This power can only be
found when your team is delegating the
majority of their tasks across the
software life cycle to agents.
To do that, you need a platform that has
an intuitive interface for managing and
delegating tasks, centralized context
from across all your engineering tools
and data sources, agents that
consistently produce reliable,
highquality outputs, and infrastructure
that supports thousands of agents
working in parallel. These are all hard
problems to solve. But our team has
spent the last two years partnering with
large organizations to build towards
this future. This talk is going to serve
as a deep dive into agent-native
development and share some of the
lessons we've learned helping enterprise
organizations make the transition to
agent-native development.
When Andrej Karpathy said English is the
new programming language, he captured
this very exciting moment. Right? And if
you're to judge AI progress based on
Twitter, you'd think that you can
basically vibe-code your way to
anything. But vibe coding isn't the
approach to solve hard problems. You
can't vibe code a legacy Java 7 app that
runs 5% of the world's global bank
transactions, right? You need a little
bit more software engineering. So agents
really should not be thought of as a
replacement for human ingenuity, right?
Agents are climbing gear and building
production software is like scaling
Mount Everest. And so while better tools
have made this climb more accessible, we
still need to think about how to
leverage them and use our existing
expertise in order to drive this
transformation. I want to start with a
quick video of what's possible today,
right? And so in this you'll see a quick
glimpse of what it's like to delegate a
task to an agentic system. You can watch
the droid as we call them ingest the
task and start grounding itself in the
environment. It uses tools to search
through the codebase, determine the git
branch, check out what the machine has
available to it. It looks through recent
changes to the codebase. It looks at
memories of its recent interactions with
users as well as memories from its
interactions across the entire
organization. And then the droid comes
back with a plan and says, "Here's
exactly what I'm going to do, but I'd
like you to clarify a couple of things."
We need to expect our agents not just to
take what we say at face value, but to
question it and make us better software
developers. And so after the user comes
back with that info, the droid executes on
that task. It leverages its tools to
write code, runs pre-commit hooks,
lints, and ultimately generates a pull
request that passes CI.
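The delegation loop in that demo — ground in the environment, plan, push back with clarifying questions, then execute and open a PR — can be sketched in a few lines. Everything below (the `Droid` class, tool names, the `plan_fn` stand-in for the LLM call) is an illustrative assumption, not Factory's actual API.

```python
# Minimal sketch of the delegation loop from the demo. All names here
# are illustrative assumptions, not Factory's actual API.

class Droid:
    def __init__(self, tools, memory=None):
        self.tools = tools          # name -> callable
        self.memory = memory or []  # past interactions, org-wide learnings

    def ground(self, task):
        """Gather context before planning instead of guessing."""
        return {
            "branch": self.tools["git_branch"](),
            "matches": self.tools["search_code"](task),
            "memory": self.memory,
        }

    def run(self, task, plan_fn, ask_user):
        context = self.ground(task)
        plan, questions = plan_fn(task, context)  # an LLM call in a real system
        if questions:  # question the task rather than take it at face value
            context["answers"] = ask_user(questions)
            plan, _ = plan_fn(task, context)
        log = [self.tools[step](task) for step in plan]  # edit, lint, hooks
        log.append(self.tools["open_pr"](task))          # hand off to CI
        return log

# Toy wiring so the sketch runs end to end.
tools = {
    "git_branch": lambda: "main",
    "search_code": lambda task: ["auth/login.py"],
    "edit_code": lambda task: "edited",
    "run_lint": lambda task: "lint ok",
    "open_pr": lambda task: "PR #1 opened",
}

def plan_fn(task, context):
    if "answers" not in context:
        return [], ["Which auth provider should this target?"]
    return ["edit_code", "run_lint"], []

droid = Droid(tools)
log = droid.run("fix login bug", plan_fn,
                ask_user=lambda qs: {q: "OAuth" for q in qs})
print(log)  # ['edited', 'lint ok', 'PR #1 opened']
```

The structural point is the early return to the user: the plan is not executed until the clarifying questions are answered, which is the "make us better developers" behavior described above.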
But how can you achieve outcomes like
this on a regular basis? Right? It's
nice when it works, but what about when
it fails? At the heart of effective AI
assisted development lies a very
fundamental truth. AI tools are only as
good as the context that they receive.
So much of what people are calling
prompt engineering is really mentally
modeling this alien intelligence that
has a slice of context of the real
world. And if you start thinking about
your AI tools this way, you're going to
start to get a lot better at interacting
with them. We've investigated thousands
of droid assisted development sessions
and you see this sort of heristic emerge
where AI is most likely failing to solve
the problem. Not because the LLMs aren't
good enough, but because it's missing
crucial context that's required to truly
solve it. And better models are going to
make this happen less often. But the
real solution is not just making the AI
smarter. It's going to be getting better
at providing these systems with that
missing context.
LLMs don't know about your morning
standup. They don't know about the ad
hoc meeting that you had or the
whiteboarding that you did. But you can
give those things to the LLM if you
transcribe your notes, if you take a
photo and you upload it. You have to
start thinking about these things not as
tools, but as something in between a
co-worker and a platform,
right? And if you can get that context
that lies in the cracks between systems,
you use platforms that integrate
natively with all of your data sources
and you have agents that can actually
make use of those things, you can start
actually driving this transition to
agent native development.
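The "context that lies in the cracks between systems" can be made concrete as a small assembly step: pull the slices an LLM can't see on its own (standup notes, whiteboard photos, tickets) into one bundle before the agent starts. The source names and fetchers below are hypothetical.

```python
# Sketch of centralized context assembly. Source names and fetcher
# contents are made-up examples, not real integrations.

def gather_context(task, sources):
    """sources: name -> zero-arg fetcher returning text."""
    bundle = [f"## Task\n{task}"]
    for name, fetch in sources.items():
        try:
            bundle.append(f"## {name}\n{fetch()}")
        except Exception as err:  # one missing source shouldn't sink the task
            bundle.append(f"## {name}\n(unavailable: {err})")
    return "\n\n".join(bundle)

sources = {
    "standup_transcript": lambda: "We agreed to ship the retry fix first.",
    "whiteboard_photo_ocr": lambda: "queue -> worker -> dead-letter",
    "open_tickets": lambda: "BUG-42: retries drop messages",
}
prompt = gather_context("fix message retries", sources)
print(prompt.splitlines()[0])  # ## Task
```

The design choice worth noting is graceful degradation: a source that fails is reported as unavailable rather than silently dropped, so the agent knows which slice of the world it is missing.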
I want to talk a bit as well about
planning and design. When your
organization is doing agent-native
development, you are using agents at
every stage. Droids
don't just write code. They can help
with that part, but the hardest thing
about software development is not the
code. It's about figuring out exactly
what to build. Here you can watch a
droid as it's tasked with trying to find
the most up-to-date information about a
new model release and integrate that
into an existing chat application. It's
going to leverage internet search, its
knowledge of your codebase, its
understanding of your product goals from
its organizational memory, and its
understanding of your technical
architecture from the design doc you
wrote last week. Planning with AI is
fundamentally different from planning
alone. It's not necessarily just asking
please build this thing for me or give
me the design doc but instead it's about
delegating the groundwork and the
research to AI agents then using a
collaborative platform to interact and
explore possibilities together. That is
how you get better at planning with
agents.
Now you can see here we have a nice
document, a nice plan. You could export
that to Notion, Confluence, Jira, any of
your integrations with no setup. MCP is
great, but having every developer
install a bunch of servers, click a
bunch of things, and pass around API
keys is not necessarily ideal. And so
platforms are going to evolve and solve
a lot of these problems. But in the
meantime, you do have droids. And now a
little bit more on this. The real unlock
for AI transforming your organization
with respect to planning is going to be
when you start standardizing the way
that your organization thinks.
And so there's an example from just a
couple of weeks ago, when we were
planning out a feature related to our
cloud development environments. We got a
lot of feedback from users, so we had
about three months of user transcripts,
from people at enterprises and
individuals that we knew. We transcribe
every single interaction and meeting at
Factory. We take those notes and we
combine them with a droid that has
access to our architecture. We take an
ad hoc meeting that one of our engineers
recorded with Granola (if you use
Granola, I love that tool). And we throw
all of that at the
knowledge droid, and we don't say,
"Let's plan the feature out." We say,
"Could you find any patterns in the
customer feedback that map to our
assumptions? Can you highlight any
technical constraints in what we have
today that might help us make this
better?" And then we take all of that
output, those documents, there's maybe
four or five intermediate results here,
and that's what we use to start
iterating on a final PRD that helps us
outline the full feature.
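That intermediate step — mining months of transcripts for recurring themes and checking them against your stated assumptions before writing the PRD — can be sketched with simple counting. Keyword matching here stands in for the knowledge droid's LLM analysis; the transcripts and assumptions are made-up examples.

```python
# Sketch of theme-mining customer feedback against assumptions.
# Keyword matching is a stand-in for LLM analysis; all data is invented.

from collections import Counter

transcripts = [
    "Startup time for the cloud dev environment is too slow.",
    "Love the droids, but environment startup takes minutes.",
    "We need SSO before we can roll this out.",
]
assumptions = {"startup": "users care most about environment startup time"}

themes = Counter()
for line in transcripts:
    for keyword in ("startup", "sso", "pricing"):
        if keyword in line.lower():
            themes[keyword] += 1

# Keep only assumptions that the feedback actually supports.
confirmed = {k: v for k, v in assumptions.items() if themes[k] >= 2}
print(themes.most_common(1))  # [('startup', 2)]
print(list(confirmed))        # ['startup']
```

The point mirrors the talk: the droid's first output is not a plan but evidence — which assumptions the feedback confirms — and the PRD is iterated from those intermediate documents.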
You can take that PRD and if you have a
droid that has access to linear and Jira
with tools to create tickets, create
epics, modify those things, then that
PRD can be turned into a road map: eight
tickets, this ticket dependent on that
ticket, but ultimately work that can be
parallelized amongst a group of eight
code droids. And so this is how
software is going to evolve. We're going
to move from executing to orchestrating
systems that work on our behalf.
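Turning a PRD's ticket graph into waves of parallelizable work is a topological-sort problem, which the standard library handles directly. The eight ticket IDs and their dependencies below are invented to match the example in the talk.

```python
# Sketch: schedule a PRD's dependent tickets into parallel waves for a
# pool of code droids. Ticket IDs and edges are made up.

from graphlib import TopologicalSorter

# ticket -> set of tickets it depends on
deps = {
    "T1": set(), "T2": set(), "T3": {"T1"}, "T4": {"T1"},
    "T5": {"T2"}, "T6": {"T3", "T4"}, "T7": {"T5"}, "T8": {"T6", "T7"},
}

ts = TopologicalSorter(deps)
ts.prepare()
waves = []
while ts.is_active():
    ready = sorted(ts.get_ready())  # every ticket in a wave can run in parallel
    waves.append(ready)
    ts.done(*ready)

print(waves)
# [['T1', 'T2'], ['T3', 'T4', 'T5'], ['T6', 'T7'], ['T8']]
```

Each wave is the set of tickets whose dependencies are satisfied, so a group of droids can pick them up simultaneously; the orchestrating human only reviews at the boundaries between waves.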
I talked about a couple of these. I
think PRDs, design docs, RCA templates,
quarterly engineering and product road
maps, transcriptions of your meetings.
Normally, you might see this
stuff as a burden, but when your company
is doing agent-native software
development, your process and your
documentation is a knowledge base and a
map for your droids to learn and imitate
the way that your team thinks. This
documentation and process is a
conversation with both future developers
as well as future AI systems. And so if
you can communicate that why behind the
decision, that context for those future
developers and agents, then you'll start
to see that there's a huge lift in their
ability to natively work the way that
your team actually works.
I want to talk about uh agent-driven
development with respect to site
reliability engineering.
There is a lot that goes into a real
incident response. It would be crazy for
me to go up here and say you could
actually just automate all SRE and RCA
work today, but there is a difference in
the AI agent-driven approach. Right
here, we're watching a droid take a
Sentry incident and convert it into a
full RCA and mitigation plan.
Traditional incident response is
effectively solving a puzzle. The pieces
are scattered across dozens of systems,
logs in one place, metrics in another,
historical context somewhere else.
There's knowledge in your team's head.
Droids in your organization
fundamentally change this, right? When
an alert triggers, you can pull in
context from relevant system logs, past
incidents, runbooks in Notion or
Confluence, team discussions from Slack.
And you can see that a droid that has
the tools and the ability to access this
can condense that search effort from
hours to minutes. And so the acceptable
time to act for a standard enterprise
organization is really going to be zero. The
moment that an incident happens, you
should have a droid that's telling you
exactly what happened, exactly how to
fix it. And the thing that gets
interesting is when you have user and
organization level memory, you really
start to build a model of what your
team's response patterns and common
issues are. And so it's not just
generating runbooks or a mitigation for
one incident, but creating new processes
that help solve some of these issues.
And once you've written that RCA, you
can move on to generate runbooks for
those newly learned patterns, update
existing response workflows, and capture
team knowledge that gets shared
automatically without the need for
manual curation.
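The incident loop described here — an alert triggers, context is pulled from many systems in one pass, an RCA is drafted, and the learning is written back as a runbook — can be sketched as a small pipeline. The fetchers, alert shape, and runbook store below are illustrative assumptions.

```python
# Sketch of the agent-native incident loop. Fetchers, alert fields,
# and the runbook store are invented for illustration.

def handle_alert(alert, fetchers, runbooks):
    # 1. Condense the search: gather the scattered puzzle pieces at once.
    context = {name: fetch(alert) for name, fetch in fetchers.items()}
    # 2. Draft the RCA (an LLM call in a real system).
    rca = {
        "incident": alert["title"],
        "evidence": context,
        "mitigation": runbooks.get(alert["kind"], "escalate to on-call"),
    }
    # 3. Close the learning loop: next time, the runbook already exists.
    runbooks.setdefault(alert["kind"], rca["mitigation"])
    return rca

fetchers = {
    "logs": lambda a: f"errors near {a['time']}",
    "metrics": lambda a: "p99 latency spike",
    "slack": lambda a: "team discussed a similar issue last month",
}
runbooks = {"db-timeout": "fail over to replica"}
rca = handle_alert(
    {"title": "DB timeouts", "kind": "db-timeout", "time": "09:12"},
    fetchers, runbooks,
)
print(rca["mitigation"])  # fail over to replica
```

Step 3 is what makes the response predictive rather than reactive: each incident leaves behind a runbook, so the third occurrence of a pattern is answered from memory instead of re-investigated.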
And this is why all these things are
connected. Agent-native incident response
is a part of a larger learning cycle
that happens when you start to integrate
agents into the workflow. We're seeing
teams that are able to cut incident
response time in half because context is
immediate. They're able to reduce repeat
incidents because the third time
something happens, the droid starts to
say, "Maybe we should fix this." And
they're able to improve team
collaboration because when a new
engineer joins the team and says, "How
do we do this?" It's already in memory.
They can just ask the droid how we do
this. And so, most importantly, what
we're seeing in general is a shift from
reactive to predictive operations
because you can now start to really see
the patterns across the entire
operational history. And agentic systems
turn each of these incidents into an
opportunity to make the entire system
more reliable.
AI agents are not replacing software
engineers. They're significantly
amplifying their individual
capabilities. The best developers I know
are spending far less time in the IDE
writing lines of code. It's just not
high leverage. They're managing agents
that can do multiple things at once that
are capable of organizing the systems
and they're building out patterns that
supersede the inner loop of software
development and they're moving to the
outer loop of software development.
They aren't worried about agents taking
their jobs. They're too busy using the
agents to become even better at what
they do. The future belongs to
developers who understand how to work
with agents, not those who hope that AI
will just do the work for them. And in
that future, the skill that matters most
is not technical knowledge or your
ability to optimize a specific system,
but your ability to think clearly and
communicate effectively with both humans
and AI.
Now, if you find any of this interesting
and you want to try the droids, I'm
happy to share that everyone here uh at
this talk can use this QR code uh to
sign up for an account. Our mobile
experience is not optimized yet, but the
droids are on that. And so I'd recommend
trying this on a laptop, but you will
get 20 million free tokens credited to
your account. And I also want to add
that, first and foremost, Factory is an
enterprise platform. And so if you're
thinking about security, if you're
thinking about where the audit logs are,
whose responsibility is it when an agent
goes and runs rm -rf on your codebase?
Droids don't do that. But if one were
to, whose responsibility is that? These are
the types of questions that we're
interested in and that we're helping
large organizations solve today. And so
if you're a security professional, if
you're thinking about ownership,
auditability, indemnification, if you're
a lawyer, right? These are the types of
questions that you should start asking
today because YOLO mode is probably not
the best thing to be running inside your
enterprise, right? And so give it a
scan, give it a try, check out some of
the controls we have. Um, and if you
have any questions, feel free to reach
out via email. Thanks.