The Friction is Your Judgment — Armin Ronacher & Cristina Poncela Cubeiro, Earendil

Channel: aiDotEngineer

Published at: 2026-04-18

YouTube video id: _Zcw_sVF6hU

Source: https://www.youtube.com/watch?v=_Zcw_sVF6hU

Morning. Thanks for having us. Today I want to talk with Cristina about friction a little bit.

This is a social preview that was generated automatically when someone submitted an issue. It's a forum post that goes with a security incident: a configuration change that was deployed accidentally and caused a problem. And the social preview carried the marketing tagline of that company, which said, "Ship without friction."

We want to encourage you to add a little bit of friction back, and I'll tell you why.
So, who are we? I've been doing software development for 20 years, most of it in the open source space. I created Flask, a Python framework, which ironically is so deep in the model weights that a lot of people are learning about it now because the machines keep producing it. I left my previous company, Sentry, in April last year, which coincided perfectly with me having time and then, obviously, Claude Code. So I fell deep into a hole of agent engineering, I started writing on my blog, and a lot of people reached out to me over the last year, all excited about this. Then in October I started a company with a friend called Earendil, where we are trying to make sense of all the AI things.
Yeah, and my name is Cristina, and I work with Armin at this company called Earendil. But importantly, I am what I like to call a native AI engineer. What that means is that these tools have been around for longer than I have been an engineer. They have been foundational in how I became a software engineer, not just because I use them to work, but because they are the means by which I learned to do what I do. Before Earendil, I was working at Bending Spoons.
So, we want to share a little from practice, not just theory, though I will readily admit that I don't think we have all the solutions.

We have been building with, and on, agents for a good 12 months. We have had huge leverage, and also great disappointments, and we keep running into two types of problems. If you listened to some of the earlier talks at this conference, you will have heard a lot about how you should keep using your brain. For some reason, that is really, really hard. So there is a psychological problem, and then there is the engineering challenge: the agents seem to produce worse code for some people and better code for others, and what is it that actually makes them work? This is really not a solution so much as our part of the journey: how we think we have managed so far.
So, problem number one is the psychology part: why, even though everybody has told you many times over that you should be using your brain and slowing down, is it actually incredibly hard? It's just one more prompt, and we don't sleep that much. What is it that makes it so hard? And would it be that hard if the machines were actually writing perfect code and we didn't have to think quite as much? Is there something we can do to make this a little bit better?

So, I'll begin by introducing the first of these problems, the psychology problem.
What I want to talk about first is the shift. I'm sure a lot of us here who have been playing with these tools for a while experienced this at some point: we were prompting and prompting, not so good, and then suddenly it clicked and they were really, really useful for us.

It was fun in the beginning, and they gave us a lot of extra time, because not everyone was using them. They were tools that made us more productive and made our jobs more fun. But very quickly, because they were so useful and got us so hooked, everyone was using them. And that had the opposite effect: suddenly the baseline expectation was that everyone uses them, and you have to use them. The fun and the free time translated into pressure. Now we all have to ship faster and produce more code, and it is just not sustainable to review and to actually have time to think.
And so, this leads us to the trap, and I think there are actually two parts to it. One of them a lot of engineers have spoken about: these tools are super addictive. You never know if the next prompt is going to be the one that makes your product work and adds a new feature, or if it is going to be that last drop of slop that brings your product crashing down. So it's very addictive; we keep doing what we're doing, and it's not a great solution.

But also, most importantly, and I don't think we realize this as much: because we produce a lot of output very fast, we are tricked into thinking we are being more efficient, doing more work. It is quite the opposite, because now we don't have as much time to actually stop, think, and decide what we're doing, to ask ourselves: is this the best way I can implement this, or could I be doing something better? And when you're in this flow, it's very difficult to stop yourself, and it's definitely very difficult for your agent to stop, because it's running around reading files it should never even have read. So we are the ones who need to have the agency to be in control here.
One thing that took me quite a while to realize, once you scale this from one person to an engineering team, is that it really changes the composition of the team. We used to be supply-constrained on the creation of code, so the balance between writing code and reviewing code in engineering teams was usually quite decent. Now every engineer has a multiple of producing power compared to their reviewing power. So obviously we are piling up pull requests, but we are also slowly expanding the total number of humans in an organization who participate in the engineering process. I talked to a lot of engineers over the last year, and one of the things that increasingly came up is: now I have marketing people shipping code; I have former CEOs, who used to be engineers, shipping code again.
But the roles those people have in their companies don't carry the responsibility. The responsibility still rests with the engineering team. So the total number of entities, both humans and machines, participating in the code creation process now outnumbers the ones who can carry responsibility. We are not yet at the point where the machine can be responsible for code changes. And that has led to more and more code reviews being skipped, being rubber-stamped. The goal of small PRs, which we want to see again so that the reviewing process works, is under pressure; this amplification is something that, at the very least, we need to recognize.
So, when you get a pull request that looks really daunting and has 5,000 lines of code in it, that is exactly when you should be thinking, and that is exactly when it's the most overwhelming, and increasingly we are tapping out of it.
On the engineering side, what we're doing is creating larger pull requests, these massive changes, because it is free now, right?

If you think about how the agents work, they are really optimized for creating code that runs. Their main objective is: write some code, run the tests, make some progress. The reinforcement learning bakes this in. And so the agents write the kind of code that you, as a human learning software engineering, wouldn't necessarily write. For instance, you see quite a bit of code that tries to read a config file, and if it can't read the config file, it silently loads some defaults.
As an engineer, you know that's actually not great, because I might not notice that I'm reading the default config, and so I might only discover that I have a massive problem two hours later, when I have already written database records with wrong data.
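As a hypothetical sketch of that difference (the file name and config shape are invented for illustration): the silent-fallback pattern an agent tends to produce, versus failing fast so a missing config is noticed immediately instead of two hours later.

```typescript
import * as fs from "fs";

interface Config { databaseUrl: string; }

// Agent-style: silently fall back to defaults when the file is missing.
// The program keeps running, possibly writing to the wrong database.
function loadConfigLenient(path: string): Config {
  try {
    return JSON.parse(fs.readFileSync(path, "utf8"));
  } catch {
    return { databaseUrl: "postgres://localhost/dev" }; // hides the failure
  }
}

// Fail-fast: a missing or unreadable config stops the program right away,
// before any wrong data is written.
function loadConfigStrict(path: string): Config {
  if (!fs.existsSync(path)) {
    throw new Error(`config file not found: ${path}`);
  }
  return JSON.parse(fs.readFileSync(path, "utf8"));
}
```

Both functions type-check the same way to a caller; the difference is only in when the failure surfaces.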
These machines optimize toward making progress, toward shipping, toward unblocking themselves. As a result, they create many more failure conditions than human-written code normally would. In part that's because you, as a human, feel bad when you write code like this; something builds up emotionally in you. The agent has no such reason. It doesn't feel anything. And if you create services that hobble along, willing to recover from every local failure, you actually create very, very brittle systems.

It also means you very quickly create a code base of a size and complexity that the agent itself can no longer dig itself out of. It starts not reading all the files it should; it creates code in a new file that already exists somewhere else. Over time, this whole machinery creates much more entropy in the source code than you would normally have if humans were on it. And a big part of that is that humans feel bad, and the agents don't really have any emotions that they communicate to you.
But as Armin likes to say, don't worry, not all is lost. We have found some correlation between what the agents really excel at and the types of code bases we put them to work on. The main example here is libraries versus products. What we found is that they tend to excel a lot more at libraries. This makes sense: when you're building a library, you intrinsically have a very clearly defined problem to solve. Most of the time you can even map the set of features you want onto the API surface, and it has very tight constraints. And because a library is something you want to build on top of or make accessible to other people, it is likely to have a very simple core that you can plug into.
Products, on the other hand, and perhaps this is a bit unlucky for the rest of us, because most of us are building products, are much harder, because there are so many interacting concerns and components. For example, you have your UI, your API responses, different permissions depending on feature flags, billing, and so on. There is very heavy intertwining between the components. What this means is that it's impossible for the agent to fit all of this into its context window; it has no way to understand the entire global structure. So locally the agent tends to be very reasonable, but at the global scale it becomes a bit demented.
So, what we're proposing is that, just as you would with any kind of system design in the past, you treat your code base as infrastructure. And as such, you have to design it so that it is also legible to the agent, so the agent can make the most of it. This is what we're proposing: an agent-legible code base.
One of the main points, which I'm sure is clear to all of us, is modularization. With separate components, it's easy for the agent to add one feature in one spot without corrupting everything else. But importantly, this also means modularizing your code flow itself. For example, I've been working on some refactoring; we're building somewhat of an AI assistant. For me it was super important to understand which steps of my code are the main points: say, you get a user message, then I pass the message to the agent loop, and then I have to deal with the output. Where these points were very clearly defined for me, the code was not as messy. But it happens that between these points, between these steps, is where the agent tends to add the most fuzz. It will be parsing between different types; it adds things to state that shouldn't be in state. And you end up with behaviors that you never wanted to support, that are unexpected and can be quite dangerous.
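A minimal sketch of what modularizing the code flow could look like (all names and types here are invented for illustration, not from the actual assistant): each step gets an explicit typed boundary, so the seams between steps are pinned down and there is no room to smuggle in extra state or ad-hoc conversions.

```typescript
// Explicit types at each seam of the flow: message in, agent loop, output.
interface UserMessage { text: string; }
interface AgentResult { reply: string; }

// Step 1: accept the user message. Validation lives here and nowhere else.
function receiveMessage(raw: string): UserMessage {
  if (raw.trim() === "") throw new Error("empty message");
  return { text: raw.trim() };
}

// Step 2: the agent loop (stubbed out here) maps a message to a result.
function runAgentLoop(msg: UserMessage): AgentResult {
  return { reply: `echo: ${msg.text}` };
}

// Step 3: deal with the output; only this step renders for the user.
function renderResult(res: AgentResult): string {
  return res.reply;
}

// The whole flow is just the composition of three named steps, so any
// parsing or state the agent tries to add between them sticks out.
function handle(raw: string): string {
  return renderResult(runAgentLoop(receiveMessage(raw)));
}
```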
Another point is to follow all the known patterns, because I think we all know by now there's no point in fighting the RL, the reinforcement learning. The more we lean into it, the better the output is going to be, and it's also more scalable down the line. Then, as mentioned with libraries: if you have a simple core and push the complexity out to other abstraction layers, it's going to be easier for you and the agent to read your code base.
And no hidden magic. For example, using React server actions, or using an ORM instead of raw SQL, hides intent from the agent. And if the agent can't see something, it surely can't respect it.

To be more precise, here are examples of the mechanical enforcement we have been using at the company, most of which we achieve with linting rules. The main example would be: no bare catch clauses.
Great. Imagine there's an example on the slide here. The agent found a bare catch clause and went, "Oh no, this is bad," and edited it out.
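A hypothetical illustration of why a bare catch is worth a lint rule (the function and its config shape are made up): the bare-catch version swallows the real error and hobbles on with a misleading value, while the stricter version lets the caller see the failure immediately.

```typescript
// Bare catch: the real failure disappears and the function "succeeds".
// This is exactly the pattern the lint rule forbids.
function parsePortBad(raw: string): number {
  try {
    const port = JSON.parse(raw).port;
    if (typeof port !== "number") throw new Error("port is not a number");
    return port;
  } catch {
    return 8080; // swallows syntax errors, type errors, everything
  }
}

// Without the bare catch, malformed input throws at the call site,
// so the problem surfaces where it happened instead of hours later.
function parsePortGood(raw: string): number {
  const port = JSON.parse(raw).port;
  if (typeof port !== "number") throw new Error("port is not a number");
  return port;
}
```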
But yeah, we also try to keep all of our SQL behind one query interface, so the agent doesn't have to go hunting around the code base to find all the different places, because if it misses one, you can get breaking behavior, and again, that's dangerous. We have one primitives component library for the UI, and no raw elements, for example no raw input boxes, so everything always has one kind of styling and one kind of behavior, very consistent.
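A toy sketch (the table and function names are invented) of what "all SQL behind one query interface" means: callers never write SQL strings themselves, so there is exactly one module to grep and one place to change when the schema moves.

```typescript
// The only module allowed to contain SQL strings. Everything else calls
// these named queries, so a schema change touches exactly one file.
const queries = {
  userById: "SELECT id, name FROM users WHERE id = ?",
  renameUser: "UPDATE users SET name = ? WHERE id = ?",
} as const;

type QueryName = keyof typeof queries;

// Stand-in executor; a real one would hand the SQL and parameters to a
// database driver. Typing the name means an unknown query won't compile.
function runQuery(name: QueryName, params: unknown[]): string {
  return `${queries[name]} -- params: ${JSON.stringify(params)}`;
}
```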
We don't allow any dynamic imports. And this may not sound as important, but we actually enforce unique function names. The reason is not just legibility for you and the agent; it's also token efficiency. If your agent is grepping for a specific feature in your code base and it gets only one result, it's going to be much better at continuing with its loop.
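As a hedged illustration of how such a rule could be checked (a real setup would use a proper lint rule over the AST, not a regex; this scanner is our own toy), a duplicate-function-name check might look like:

```typescript
// Naive duplicate-function-name scan over a set of source files.
// A regex is only an approximation of real parsing, but it shows the idea:
// any name declared more than once across the code base is reported.
function findDuplicateFunctionNames(files: Record<string, string>): string[] {
  const seen = new Map<string, number>();
  for (const source of Object.values(files)) {
    for (const match of source.matchAll(/function\s+([A-Za-z_$][\w$]*)/g)) {
      seen.set(match[1], (seen.get(match[1]) ?? 0) + 1);
    }
  }
  return [...seen.entries()].filter(([, n]) => n > 1).map(([name]) => name);
}
```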
We've also started exploring something recently called erasable syntax-only TypeScript mode. What this does is that your code is basically JavaScript with the type annotations on top. And that means there's no transpiling indirection, because there is one source of truth between your actual code and what runs. So when the agent is looking for errors, it doesn't have the confusion of "where am I actually looking?" and it is much better at finding them.
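Concretely, this corresponds to TypeScript's `erasableSyntaxOnly` compiler option (introduced in TypeScript 5.8): with it enabled, constructs that carry runtime behavior of their own are rejected, so deleting the annotations yields the exact JavaScript that runs. A sketch of what stays allowed:

```typescript
// With "erasableSyntaxOnly": true in tsconfig.json, only constructs that
// can be stripped without changing runtime behavior are permitted.

// Allowed: plain annotations and type-only declarations erase cleanly.
type Direction = "up" | "down";
function move(dir: Direction, steps: number): string {
  return `${dir} x${steps}`;
}

// Not allowed under the flag (shown as comments, since each emits real JS):
//   enum Color { Red, Green }                        // generates a runtime object
//   class Point { constructor(public x: number) {} } // parameter property
//   namespace Utils { export const v = 1; }          // runtime namespace
```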
So the goal really is to get into this loop somehow: get the agent to produce code that is as good as it can, but you really need to find a way to feel the pain that the agent doesn't feel. You need to be woken up when you should be looking at something. One of the things we have been doing is building a pi extension for our review needs, where we separate out the kind of feedback that would normally just go back to the agent: mechanical bugs, places where it clearly violated H 1 D.

But then we specifically call out the kinds of changes where the human's brain should reactivate. We don't think a database migration should ever go in without a human making a judgment call, because it very much depends on the locks and the size of the data in production. If there are permissioning changes, you had better think about those yourself rather than leave them to the agent, because they can be under-documented. These are just some examples where we learned: if we miss it, we regret it. And you will miss things, but at least these machines can help you find them, and then you see it and you actually get a little hit of "oh, now I have to kick into gear and do something here."
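As a hedged sketch of that split (the category names and routing function are ours for illustration, not the actual extension's API), the idea is simply to route review findings by kind: mechanical issues go straight back to the agent, judgment calls are surfaced to the human.

```typescript
// Review findings are routed into two lanes: mechanical issues the agent
// can fix automatically, and judgment calls that must wake a human up.
type FindingKind =
  | "lint-violation"      // mechanical: send back to the agent
  | "type-error"          // mechanical: send back to the agent
  | "db-migration"        // judgment: depends on locks, production data size
  | "permission-change";  // judgment: often under-documented

interface Finding { kind: FindingKind; message: string; }

const HUMAN_KINDS: ReadonlySet<FindingKind> =
  new Set(["db-migration", "permission-change"]);

function route(findings: Finding[]): { agent: Finding[]; human: Finding[] } {
  return {
    agent: findings.filter((f) => !HUMAN_KINDS.has(f.kind)),
    human: findings.filter((f) => HUMAN_KINDS.has(f.kind)),
  };
}
```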
This is what it looks like in pi. At the bottom you have the human callouts; at the top you have what the agent would handle, so if you were to end this review and select "fix the issues," the agent would go back and automatically act on the first two. But this is the moment where I will now go and check: is this a dependency I actually want in this code base? Do I like the maintainers? Does this work for me?
And we obviously like the speed. It is addictive, it is great, and we feel a lot of productivity. But it is so devious if you start relying on that speed where you really shouldn't. So I can only encourage you to find the areas where you have the feeling this is actually not positive. For me, a lot of the positive is reproduction cases: when a customer reports an issue, I can have the agent reproduce it perfectly, and I have a really good starting point. Exploring different product directions is also great, for as long as you are deliberate about what you commit to doing with the code it generates. All of that is great; but on the other hand, system architecture and creating reliability in the system, those they are just not very good at.
Because we really still have to go slow. There is so much mess that can appear in a code base in so little time. Mario was already talking about this earlier: we forget that we are producing months and months of technical debt in a matter of weeks, sometimes days. And it becomes so much harder to understand what's going on in that code base. When the understanding of your own code drops, it is really, really hard, and it's psychologically hard too. I have found pieces of code that didn't actually work in production, and I was frustrated to learn that I was the one who committed them with the agent and just didn't see it. It's a very disappointing experience when it happens, and then you realize you were the one who screwed up. So it is psychologically incredibly hard to judge the state of the code base objectively, and the only way right now is to really slow down a little bit on that front.
And this friction: I know that every engineering team I've ever worked at has said, "We need to get rid of the friction in shipping." And that is true; there's a lot of stuff that's very, very annoying and shouldn't be there. But if you have worked in a large enough engineering org, SLOs are a great example of a system that is intentionally designed to put friction into the engineering process, to make you think: do I need this reliability? Do I need this criticality of service? Am I sufficiently staffed to run it? With the agents, we have now gotten into this idea that we should get rid of all of this, when in reality we need some of it. Because friction, in many ways, is what's necessary on a physical level to steer. Without friction there is no steering, and steering is really necessary.
So you should put a little bit more of a positive association on this idea of friction, because this is really where your judgment is, this is where your experience is, and you should be inserting that and start feeling it.

Thank you.