Context Is the New Code — Patrick Debois, Tessl
Channel: aiDotEngineer
Published at: 2026-05-03
YouTube video id: bSG9wUYaHWU
Source: https://www.youtube.com/watch?v=bSG9wUYaHWU
[music]
>> There are a few people who want to start earlier, so I'll take the opportunity to officially open the architect track. There's no track host, so I'll do it myself. Thank you for coming here; I hope you've already had a good conference. It's amazing that so many people showed up. Maybe before I start: who has used an AI coding agent in this room? Raise your hand. Now lower it. Who hasn't? Raise your hand. Okay, my kind of people. Perfect. All right.
Okay. Context is the new code, or the context development lifecycle. I feel honored to be here; every time I come to AI Engineer I try to do a different talk. So this is a bit of thinking ahead. It's an unpolished thought; not everything is there yet, but is anything fully there in AI anyway? Let's start.
I assume you're all vibe coding with prompts by now. I barely touch the code anymore; I just tell the AI to do something different. So I would say context is the new code, because the code itself is being generated.
A little more advanced, maybe, is a tendency I see in myself. I had large pieces of code, some helpers and other bits, and I just turned them into a skill. We had that in our product: it was an onboarding flow for AI agents. People have Python, Node.js, all the various ecosystems, and different tools for packaging, and it is impossible to actually code for all of that; it would require a lot of coding. But if a skill just says, please first figure out what their package manager is, then figure out what their ecosystem is, and then do these steps together with the user, it solves a lot more problems than we could ever code. So that's another piece: code is also transforming back into context, as a skill, as a reusable workflow. I'll leave that with you; there's a sketch of such a skill below.
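As a rough illustration only (not the actual Tessl onboarding skill), a skill like that could be a small markdown file with some frontmatter; the exact fields and steps here are assumptions, and the detection rules would be far richer in practice:

```markdown
---
name: project-onboarding
description: Onboard a project by detecting its ecosystem and package manager, then walking the user through setup.
---

# Project onboarding

1. Inspect the repository to detect the package manager
   (package-lock.json -> npm, pnpm-lock.yaml -> pnpm, poetry.lock -> poetry, requirements.txt -> pip).
2. From that, determine the ecosystem (Node.js, Python, ...).
3. Walk through the onboarding steps together with the user,
   confirming each command before running it.
```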
I like to think in parallels. In 2009, I don't know if there are any DevOps people in the room, it was basically me asking: what if ops looked more like dev? And then we got collaboration, continuous deployment, all of that. So last year I started thinking: what if context is the code? How do we deal with it in a more consistent way?
And it basically comes down to this: if we have a software development lifecycle, what does a context development lifecycle look like? Because we're shifting the work somewhere else; it's context, not code. So what does that look like? I came up with, of course, an infinity loop, given my DevOps background. The whole idea is that we generate a lot of context. Then, hopefully, we test the context. We distribute the context, maybe to colleagues, to other parts of the organization. We observe whether it works, and whether it works or not, we adapt and regenerate the context and go from there. That's the loop of this talk, and I'll walk through it with some examples.
So, step by step. Generate. It's probably the step you're all most familiar with, because you're all prompting. You're the human context creator, typing things in. I was actually amazed that when I asked, "tell me when my talk is at AI Engineer," it fetched the website and just said, "here's your talk." Blew my mind. But that's the context I'd given it: I'm Patrick, all that stuff. Very simple context; it's probably what you do a lot in your setup.
If you get a little more advanced, you decide that prompting is tedious and you want reusable prompts. Depending on the flavor of your coding agent, they call these instructions. Luckily there's a bit of standardization happening now, with AGENTS.md and pieces like that. Boo to Claude for still calling it CLAUDE.md, but anyway, you get the picture: there are reusable prompts, reusable pieces of context.
We can also bring other context in. If we have documentation for libraries we use day to day, we want to pull that in, because the LLM might not have the latest documentation, and then it hallucinates: is it version two, version three? We don't know. So we give it context and say, please download the documentation, hopefully agent-optimized, and then it will do a better job generating code for that version of the library. That's another way of getting better context, in this case from libraries; a small sketch of pulling docs in follows below.
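As a minimal sketch of that idea, assuming the library publishes agent-optimized docs at some known URL (the llms.txt convention, for example), you could vendor them into a context folder in the repo; the URLs and paths here are hypothetical:

```python
# Pull a library's agent-optimised docs into the repo so the coding agent
# works from the version you actually use, not from stale training data.
import urllib.request
from pathlib import Path

DOCS = {
    # library name -> its docs endpoint (hypothetical URL for illustration)
    "somelib": "https://docs.somelib.example/llms.txt",
}

target = Path("context/docs")
target.mkdir(parents=True, exist_ok=True)

for name, url in DOCS.items():
    with urllib.request.urlopen(url, timeout=30) as resp:
        (target / f"{name}.md").write_bytes(resp.read())
        print(f"pulled docs for {name} from {url}")
```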
And of course it wouldn't be complete without pulling context from wherever: MCP. Get it from your GitLab, your GitHub, Slack. That's all context we're pulling in and creating. Even a ticket creates context, because we pull it in as we go.
And then maybe the new kid on the block: what if we start writing our prompts as specifications, spec-driven development, which the agent then breaks down in planning mode into step-by-step prompts that it runs through. So there's a lot of creation happening in that area.
Simple. This is probably what you're closest to. But when you're typing and creating all that context: you change two lines in your CLAUDE.md. Do you know the impact? Is it just YOLO, looks good to me, let's ship it? You have to think about how we test things. It's not just that we have a piece of code; we now also have a piece of context, and we need to write tests to see what its impact is. New coding agent? We don't know whether those lines still work. Now, this is not new in the world of AI engineering, but it's not that common yet in the world of coding with AI: you start writing evals, which are tests for your context.
It's a little hard to read, but if you think in parallels, we have different levels of testing in code, and the simplest one is linting. Your IDE has the squiggly lines: hey, this is incorrect syntax, or you could do better here. Here's an example of validating a skill, where we say: you need to have a description, and it can only be so long. So it's validating against the spec of the context format, in this case. Simple analogy, a simple linter you can run; a sketch of one follows below.
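As a rough illustration (not the validator shown on the slide), a linter for a SKILL.md frontmatter could be as small as this; the required fields and the length limit are assumptions to adapt to whatever skill spec you target:

```python
# Minimal SKILL.md linter: check the YAML frontmatter against a few rules.
import sys
import yaml  # pip install pyyaml

MAX_DESCRIPTION = 1024  # assumed limit; adjust to your skill spec

def lint_skill(path: str) -> list[str]:
    """Return a list of problems found in the skill file's frontmatter."""
    text = open(path, encoding="utf-8").read()
    if not text.startswith("---"):
        return ["missing YAML frontmatter"]
    frontmatter = yaml.safe_load(text.split("---")[1]) or {}
    errors = []
    if not frontmatter.get("name"):
        errors.append("frontmatter is missing 'name'")
    description = frontmatter.get("description", "")
    if not description:
        errors.append("frontmatter is missing 'description'")
    elif len(description) > MAX_DESCRIPTION:
        errors.append(f"description is {len(description)} chars, max {MAX_DESCRIPTION}")
    return errors

if __name__ == "__main__":
    problems = lint_skill(sys.argv[1])
    for p in problems:
        print(f"LINT: {p}")
    sys.exit(1 if problems else 0)
```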
Then you can do other things. I haven't found the perfect coding equivalent, but think of this as a Grammarly for context. If you write context, can the agent actually understand what you're writing? If you write two words, it's not verbose enough for it to understand. So you can ask: given this context, do you understand it? And you get feedback: it's not written explicitly enough, or it's not complete, you're missing pieces. Tools are appearing for that as well. So whenever you're writing your context, you get a Grammarly saying, hey, fix this. That's why I like to voice code; for some reason I'm way more elaborate when voice coding than when typing. I'm a bad typist, two fingers, still, after so many years. But when I talk, I see the sentences appear on the screen, and it helps to get good context in there.
All right, another kind of test. Imagine you put in your CLAUDE.md, or AGENTS.md I should say: every API endpoint must use the prefix /awesome. You have some convention in your company, which is great. Your prompt will then be: add a new endpoint to save a user. And you expect your coding agent to generate code that has /awesome/user.
That's great. But the way we can test this is by then asking an LLM: does the code that was generated actually start with /awesome? You could do that with a regex, I know; this is just for example purposes, but you can ask it to judge your code based on your criteria and whether it did the right thing. Now imagine you asked the same question without your context above: no LLM is ever going to prefix your URL with awesome. That's where your company-specific, team-specific things come in, and that's why you still write those tests, to see whether it still works. Maybe Gemini reacts differently than Copilot, and in your company you need to make the context switchable. With this, you run the tests and you can actually tell. That's the difference. A rough sketch of such an eval follows.
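A minimal sketch of that LLM-as-a-judge check, assuming the OpenAI Python SDK purely as an example judge (any model or SDK would do, and a real eval harness would wrap this in a suite):

```python
# Ask a judge model whether generated code honours the team's /awesome convention.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

CRITERION = "Every API endpoint defined in the code must start with the prefix /awesome."

def judge(generated_code: str) -> bool:
    """Return True if the judge model says the code meets the criterion."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # judge model is an arbitrary choice here
        messages=[
            {"role": "system",
             "content": "You are a strict code reviewer. Answer only PASS or FAIL."},
            {"role": "user",
             "content": f"Criterion: {CRITERION}\n\nCode:\n{generated_code}"},
        ],
    )
    return resp.choices[0].message.content.strip().upper().startswith("PASS")

# Example: code produced by the coding agent after reading the AGENTS.md rule
generated = 'app.post("/awesome/user", saveUser);'
print("eval passed" if judge(generated) else "eval failed")
```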
And then you can build whole suites of these, which I'd compare almost to unit tests. I have a bunch of these tests, and they tell me whether the code is actually good, whether it follows the rules, whether everything's fine. In this case it's even infrastructure as code; it doesn't need to be code only, it could be config files as well. It's hard to read on the slide, but it's a bunch of criteria that I just run every time.
But if you want to test whether an endpoint serves /awesome/user, there's a real test we want to run: I want to test the endpoint itself. I don't want to only check the code, I want to have it running. So you give the judge a tool, and the judge becomes an agent that can do things in a sandbox and execute stuff. It can actually do the curl. So you can combine LLM-as-a-judge with some tooling, and then you can have a multitude of tests; in this case it ends up being an end-to-end test, because it's not just looking at the file, it's actually running the piece with everything it's supposed to do. There's a sketch of that below.
And then I can do this: given a certain commit in my repo, I want to run this scenario with this piece of context and see whether it made a difference, yes or no. So you're building this up while you're committing context into your repo as well. And because we now have tests, and they give us feedback on whether it's working and what's missing, we can optimize context. We can put that in a code action or something that says: okay, fix this context, improve this context, using all the feedback the LLM has given us. So again, like coding improvements, but we start thinking more about testing that piece as well; a small sketch of such an action follows.
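As one way such a "fix this context" action could look, assuming the OpenAI SDK and an eval report written to a file (both the file names and the prompt are assumptions for illustration):

```python
# Feed failing-eval feedback back into an LLM and ask for an improved context file.
from pathlib import Path
from openai import OpenAI

client = OpenAI()

context = Path("AGENTS.md").read_text()
feedback = Path("eval-report.txt").read_text()  # assumed: output of the eval run

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system",
         "content": "You improve agent context files. Return only the revised file."},
        {"role": "user",
         "content": f"Current AGENTS.md:\n{context}\n\nEval feedback:\n{feedback}\n\n"
                    "Revise the context so the failing evals pass."},
    ],
)
Path("AGENTS.md").write_text(resp.choices[0].message.content)
```

In practice you would open the result as a pull request rather than overwrite the file directly, so a human still reviews the context change.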
Now, one of the first reactions, once you have tests and optimizations, is: can we run this in a CI/CD system? That's perfect, right? That's where we run all our tests and test suites. But there's a slightly weird thing. If you run evals once, and then run them another time, they might not give the same result. Remember: nondeterministic things. So you cannot say, run it once and check whether it passes. You'd be in for a treat, because you can't debug that. Instead, think of it like this: run it five times, and out of five, how many times does it succeed? Maybe in several cases it hits 100% every time, which is great. In others it won't, and depending on how you change your context, that will influence which tests pass or not. I personally find it helpful to think about this as error budgets. I give the set of tests I really care about an error budget, so they are only allowed to fail minimally, and other pieces are okay. That's how you have to think about testing context: you cannot do exact testing all the time; it works differently. A small sketch of that pass-rate idea follows.
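A minimal sketch of that pass-rate-versus-budget check; the eval names, the budgets, and the simulated eval are all placeholders, with the real `run_eval` being whatever executes one eval scenario (for example, the judge sketch above):

```python
# Evals are nondeterministic: run each one N times and compare the pass rate
# against an error budget, instead of expecting a single deterministic pass/fail.
import random
import sys

RUNS = 5

# error budget per eval: minimum fraction of runs that must pass
ERROR_BUDGETS = {
    "endpoint_uses_awesome_prefix": 1.0,   # critical convention, must always hold
    "readme_mentions_setup_steps": 0.6,    # nice to have, allowed to flake
}

def run_eval(name: str) -> bool:
    """Placeholder: generate output with the agent and judge it.
    Simulated here with a flaky result so the script runs standalone."""
    return random.random() < 0.9

def check_budgets() -> bool:
    ok = True
    for name, budget in ERROR_BUDGETS.items():
        passes = sum(run_eval(name) for _ in range(RUNS))
        rate = passes / RUNS
        status = "OK " if rate >= budget else "FAIL"
        print(f"{status} {name}: {passes}/{RUNS} passed (budget {budget:.0%})")
        ok = ok and rate >= budget
    return ok

if __name__ == "__main__":
    sys.exit(0 if check_budgets() else 1)
```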
All right. So: generate, and hopefully you now understand what testing can do for you. Next, distribute. Maybe that's also something you already do: you check context into your repo, which is great; all of a sudden it becomes available, your colleague checks it out, zero friction, you can push, you can share. But we have another mechanism for this. Imagine you have reusable context that you want to reuse across multiple projects, across multiple teams. In code we had the concept of a library. So what if we package pieces of context, and are then able to install the pieces of context we need for a project: guidelines, front end, whatever it is. And if we take that up a notch, how do we discover which packages exist? That's a registry.
So in that sense it's no surprise that you see things like skills, the Tessl registry, and the marketplaces, where you can find a multitude of skills. Now, the reality is that 99.9 percent, and I mean that in a very sincere way, of the skills out there are crap. It's good to learn from others and see what they're doing, but hardly any of them, if you run any set of evals against them, are actually up to a quality standard. That will likely improve. But there's also a tendency that a lot of these skills and pieces are things people actually want to put in their own registry.
I'll come back to that later. But you start to see the gist: a skill not only contains context, it can contain scripts, documents, a bunch of things. So is this the package format? Probably; plugins can now also contain MCP, so you see a standard emerging. When skills came out, all of a sudden all the coding agents said, we're supporting this, almost as a package format for people to distribute their context with. And once I have one piece of context, I have dependencies. And I'm sorry, but with context we're also going to get dependency hell. I download this package for front end, and maybe it conflicts with what's in the React context package, and you have to start dealing with that as well. So you also start seeing packages that mirror your library versions with your context versions, and pull those in together.
And of course, when we have packages and people publishing things into a registry, we need security. OpenClaw, thank you for that: everybody all of a sudden became aware that we need more secure things, because we are able to run things on our laptop that come from strangers. So Snyk has a way of scanning context: it does some credential handling, it exposes third-party pieces. You start seeing scanners for context as well. And when you think about security: who actually built the skill? How was it built? With what model was it built? So we're capturing what we learned from packaging, like the SBOM; this is the AI SBOM for the packaged context we're pulling in.
So we're still on the path: generate, evaluate, distribute. Let's move into observe. When you're making libraries of skills and context for others, and I don't mean copy-pasting them over Slack, but actually maintaining something somebody else can use, similar to a library: when they start using it, how do you get feedback on whether it still works?
A great place to get feedback is actually the agent logs. Imagine developer one, coding on the project, and the agent is not doing what they want. They could put that into their own context, which is great: almost TDD-like, I hit a problem, I capture it; it's not TDD, but you get my gist. Or: what if we, at team or organization scale, looked at the logs every time an agent said "we're missing this piece," surfaced that, and said: if everybody is missing this piece, we should create context for it. Then we distribute that context to everybody, and all of a sudden the impact of the improvement is for everybody. Luckily, just as with AGENTS.md, standards for agent logs are now emerging, so we can read from the logs, and that's part of our feedback channel to see whether the agent is actually using or missing some of the context. A rough sketch of mining those logs follows.
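As a small illustration of that idea, assuming the agent writes JSONL session logs with a `message` field; the log layout and the marker phrases are assumptions to adapt to whatever your agent actually emits:

```python
# Mine agent session logs for "missing context" signals and surface the most
# common ones, so the team can turn them into shared context packages.
import json
from collections import Counter
from pathlib import Path

MARKERS = (
    "i could not find",
    "no documentation for",
    "missing instructions",
    "i don't know which",
)

def missing_context_signals(log_dir: str) -> Counter:
    counts: Counter = Counter()
    for log_file in Path(log_dir).glob("*.jsonl"):
        for line in log_file.read_text(encoding="utf-8").splitlines():
            event = json.loads(line)
            text = str(event.get("message", "")).lower()
            for marker in MARKERS:
                if marker in text:
                    counts[marker] += 1
    return counts

if __name__ == "__main__":
    for marker, count in missing_context_signals("./agent-logs").most_common():
        print(f"{count:4d}  {marker}")
    # Anything that shows up across many developers is a candidate for new shared context.
```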
Any feedback you get on a PR that's not complete is feedback on your context, because that PR was created with certain pieces of context. If you say this is not correct, you can keep arguing on the PR, or you can just say: let's improve the context, so the next iteration actually improves and you don't hit that same problem again.
What about running code in production that was generated from context, and that turns out not to be correct? Yes, we do our PR reviews, we give thumbs up or thumbs down and feedback, but the real feedback is also in production, when it's running. So this is a tool that instruments your code, almost like a wrapper, and pushes it out to production. When it fails, it says: these pieces of code were changed and are failing; here's the input, here's the output, it did something wrong. Can we create a test case for this, so the next time we don't hit it again in production? Feedback loop.
Now, these are all fairly trivial things: missing pieces of context, or improvements. But if you run agents, the equivalent of scanning in CI/CD is making sure that, when an agent is running in production, it isn't doing strange things. We need a way of looking at that. I've been toying with sandboxing agents myself, and an agent is very resourceful at finding things. I said: okay, run this thing, try to figure out anything useful to break out of the system. And okay, it uses my environment variables; okay, stupid, let me remove the secret. Then it looks at my memory files. So you really have to make sure that whatever it's doing, you have a way of tracing it.
And apologies again for the slide, but the gist is: we can have a sandbox that the agent runs inside. But your coding agent, by default, without any restrictions, loads your AGENTS.md, loads your SKILL.md. Nothing is blocking that. So if you download something, it's loaded immediately. You can't filter that with sandboxes; you need another way. I call that a context filter. Think of it as a web application firewall that filters out patterns, prompt injections, or anything suspicious coming in directly through that piece. A toy sketch of such a filter follows.
And if you take that further, there's a lot of talk here as well about harness engineering. Harness engineering itself also has this kind of full observability: looking at logs, looking at traces, looking at feedback. So it's useful for training, but just as useful for running your own pieces well.
Those were the pieces for today. I would say, for a lot of people there's the create-context, test-context loop; think of that as your library-authoring loop. And then, when you push this into the enterprise, there's an organizational loop: hey, I made a library, somebody else is using it, and I'm watching whether it's useful, whether it still works, whether it still works with all the other pieces. That's almost like a Sonar-plus-CI/CD model for context. You're currently probably doing a lot of this in the individual, solo mode: improving, honing, crafting your own markdown. What if you start doing this with your team? Make it a reflex: if something is missing, add some context. What if you put that out to a team of teams and you get a flywheel: if you fix it here, the other team can reuse it. That's how you scale this out into the organization.
There's a lot of talk about LLMs and coding agents, and I love them all, but the way I see it, they're just the engine. If you give the engine the wrong fuel, which is the context, it's not going to perform. And you can't do much about the LLMs, at least I can't; I'm just using the coding agent, using whatever they give me. But I can optimize my context. That, I think, is the message: do this in a more engineered way, rather than copy-pasting things and hoping for the best.
If you liked this talk, connect on LinkedIn for the slides, and give me some feedback, good and bad. If you want to try Tessl, where we implement some of these pieces, have a go. And if you're interested in yet another conference, I know, you can never have enough conferences, visit AI DevCon, which I curate the content for, here in London on the first and second of June. And that's it. I can maybe take a few questions.
>> [applause]
>> Any questions?
Sure.
>> I was wondering if you have any thoughts about more exotic forms of context, beyond the traditional ones. For example, one of the things I'm working on is an automated system for scoping out architectural problems, trying to create hard definitions for them, so that you can feed those to the agent and create actual objective tests.
>> Cool. Yeah. Microphones.
>> And one of the things I've been testing is the ability to use consistency as a form of context, or as a form of eval. Given a rough, very loose definition of the plan: if you have the agent system turn that into a really crisp definition, and you do that several times in parallel, how often do you get the same crisp definition? If they're all over the place, then the original definition was so poor that you need to go back to first principles, or to an architect. But if they're all the same, it's probably a pretty good definition and you can carry on with the downstream process. So, besides just code and typical evals, are there any other sources of context for generating context that you think are useful?
>> I don't have a specific answer for your exotic case, but I would say the piece people are underestimating is this: you thought you were going to save time by writing your context instead of all your code, but if you take this rigorously, you're going to spend that time writing the right evals. And that's a lot of work, because now it's not only one prompt you're trying to get right, it's all the prompts of the evals too. The more advanced people almost have their own process; they build their own process on top for creating the right evals for their business case as well. So, yeah.
Good question. Thank you. Any other questions? If not, I'll be around; say hi. I'm also going to be at the Tessl booth. So, thank you very much, and I'll make space for the next speaker. Thank you.
>> [music]