Stateful environments for vertical agents — Josh Purtell, Synth Labs

Channel: aiDotEngineer
Published at: 2025-07-22
YouTube video id: 5rMc-moNVx0
Source: https://www.youtube.com/watch?v=5rMc-moNVx0
[Music]
All right. Hi, I'm Josh, founder of
Synth. I help people make their agents a
lot better. And over the last few
months, uh, I found some patterns around
structuring people's agent code that I
think they found very helpful and I
found very helpful for, you know,
thinking about how to build effective
agents, especially for vertical
applications like finance, accounting,
health, um, and so on and so forth. So,
I like to call these stateful
environments because they're
environments that capture state for the
agent.
So, um, let's define terms. What is an
environment? Uh, it feels like a loose
term, but it actually has quite a long
history. People working on reinforcement
learning tasks, which are really just
tasks where you're trying to get the AI
to do something um, without stipulating
how to do it, have been using
environments to kind of containerize the
logic behind the task away from their AI
algorithm for quite a while. So the
first implementation was RL-Glue. Then
OpenAI back when OpenAI was an RL
company and and not really a language
model company came out with the OpenAI
Gym. And then most recently probably the
first kind of verticalish application in
academic papers uh SWE-bench and
SWE-agent um kind of coined the term of
agent-computer interface. So people have
been thinking about containerizing a
kind of stateful uh workspace for AIs to
have for for quite a while. This is not
reinventing the wheel. We're just
building on top of what people have
already thought about.
Okay. Um so why are we adding on this
abstraction of statefulness now? Well,
two years ago, um, people mostly gave
their LLMs tools to calculate simple sums
or, uh, you know, maybe search the
internet for the weather. You really
didn't need to have a lot of clean,
heavy duty abstractions, um, for for
pretty simple logic like that. As models
got better and people wanted them to use
more effective tools, they moved to
API-based tool use. And, you know, maybe you
see that with some people getting
excited about MCP. Um, and it wasn't
really until models got a lot better
with Sonnet 3.5 that people started uh
kind of thinking about a work, a
product or an artifact that the AI works
on, iterates on, improves step over step
over a long horizon. Um, and I think
when Claude Artifacts came out is
probably when a lot of people started
thinking about having some abstractions
to help agents like Claude work on
products um and artifacts like Claude
Artifacts in the web app. So um that's
kind of the impetus the why now. Uh so
what are we contributing? Well a
stateful environment is an engine that
computes results external to the agent
implementation. Um, the agent manipulates
the environment somehow, but there's a
lot of logic underneath that might
involve uh accessing an API, working on
um an Excel document or or some kind of
external um uh you know source of truth
that gets computed on and goes into a
system of record.
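
[Editor's sketch] The idea of an engine that computes results outside the agent can be illustrated with a small Python example. This is a minimal sketch under stated assumptions, not code from the talk or from Synth's repository: the class name LedgerEnvironment, the "transfer" action, and the reset/step/observe method names are all hypothetical, loosely following the Gym-style convention mentioned earlier.

```python
from dataclasses import dataclass, field


@dataclass
class LedgerEnvironment:
    """Hypothetical stateful environment: the ledger lives here, not in the agent."""

    balances: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

    def reset(self) -> dict:
        # Put the environment back into a known starting state.
        self.balances = {"cash": 100.0, "revenue": 0.0}
        self.history = []
        return self.observe()

    def step(self, action: dict) -> dict:
        # The environment, not the agent, validates and applies the action.
        if action.get("op") != "transfer":
            return {**self.observe(), "error": f"unknown op: {action.get('op')}"}
        src, dst = action["src"], action["dst"]
        amount = float(action["amount"])
        if self.balances.get(src, 0.0) < amount:
            return {**self.observe(), "error": "insufficient funds"}
        self.balances[src] -= amount
        self.balances[dst] = self.balances.get(dst, 0.0) + amount
        self.history.append(action)
        return self.observe()

    def observe(self) -> dict:
        # Return a copy so callers cannot mutate the source of truth directly.
        return {"balances": dict(self.balances), "num_entries": len(self.history)}


if __name__ == "__main__":
    env = LedgerEnvironment()
    env.reset()
    print(env.step({"op": "transfer", "src": "cash", "dst": "revenue", "amount": 40}))
```

In this sketch the agent only ever holds the dictionaries returned by reset, step, and observe. The ledger itself stays inside the environment, so the agent code can be replaced without touching the task logic.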
It can be a lot for an agent to interact
with uh Excel though like the entire
application. So a stateful environment
exposes a kind of representation or a
version of that environment that the
agent can can make sense of can observe
and and manipulate uh usefully. So you
don't have to show the agent the whole
um OS you you kind of just show it what
it needs to see in the terminal. And
then often, and this is important for
people doing RL training, but it can
also be really handy in multi-agent
settings, there are network boundaries. Um, so
that your agent doesn't have to run in
the same process as whatever your
stateful environment is.
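
[Editor's sketch] One way to picture the network boundary is an environment that sits behind a small HTTP service while the agent talks to it as a client. The example below is only an illustration using the Python standard library. The /reset and /step routes, the counter state, and the helper names start_server and call are assumptions made for this sketch and are not taken from the talk.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Environment state lives on the server side of the boundary (hypothetical toy state).
STATE = {"counter": 0}


class EnvHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        if self.path == "/reset":
            STATE["counter"] = 0
        elif self.path == "/step":
            STATE["counter"] += int(payload.get("delta", 0))
        else:
            self.send_error(404)
            return
        body = json.dumps({"counter": STATE["counter"]}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Silence per-request logging for the demo.
        pass


def start_server() -> HTTPServer:
    # Port 0 asks the OS for any free port.
    server = HTTPServer(("127.0.0.1", 0), EnvHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call(port: int, path: str, payload=None) -> dict:
    # The agent side: it only knows the port and the JSON protocol.
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("POST", path, body=json.dumps(payload or {}),
                     headers={"Content-Type": "application/json"})
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()


if __name__ == "__main__":
    server = start_server()
    port = server.server_port
    call(port, "/reset")
    print(call(port, "/step", {"delta": 3}))
    server.shutdown()
    server.server_close()
```

Here the server runs in a thread only to keep the demo self-contained. In practice the same protocol would let the environment run in a separate process or machine from the agent.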
Okay. Um, so what does this get us? I
help people improve their agents. Um, if
you containerize the logic of your
vertical app into code that never
changes, it's a lot easier to completely
revamp your agent when the new model
comes out. It's a lot harder to do that
when all the logic is kind of just
clumped together. Um, what else does it
give you? Well, if you have a separate
process
determining your environment um that has
standard network boundaries, you can
easily have multi-agent and spin up new
models uh or spin up new agents to work
on this single product together um
across time and there's really no
problems. People have thought about how
to do asynchronous work and the the
answer to that question of how to do
asynchronous work in a reliable way in
production is network boundaries. Um and
then I think the most exciting thing is
once you have um this this boundary you
can start doing things like uh resetting
the state of the thing that your agent
is working on. You could do roll backs.
I think a lot of people working with
agents in a in a code setting know how
valuable it is to just be able to roll
back the agent after it's kind of gotten
derailed. And if you have stateful
environments, that's really easy to
implement. And so in particular, um, a
few years ago, there's a paper called
Language Agent Tree Search, um, that was,
you know, really impressive and it got
really good results, but it's almost
impossible to implement in production
because just nobody had really good um,
abstractions for it. And and techniques
like this are really useful in a long
horizon setting like a lot of builders
care about today. Um, and if you have a
resettable environment, you sort of get
Language Agent Tree Search for free. And
so here's kind of a screenshot of uh a
step in the tree search. The agent
branched out in two directions while
playing Minecraft. Um, one of those
branches did a lot better. And then it's
really easy to kind of just converge,
pick the best branch, and go from there.
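
[Editor's sketch] The branch, compare, and converge step described here can be sketched with snapshot and restore on a toy environment. This is a simplified one-level branch-and-pick, not the full Language Agent Tree Search algorithm from the paper, and the names CounterEnvironment, snapshot, restore, and branch_and_pick are hypothetical. The fixed candidate plans stand in for action sequences an agent would propose.

```python
import copy


class CounterEnvironment:
    """Toy environment: the state is one integer the agent tries to move to a target."""

    def __init__(self, target: int = 10):
        self.target = target
        self.value = 0

    def step(self, delta: int) -> int:
        self.value += delta
        return self.value

    def score(self) -> int:
        # Higher is better; 0 means the target was hit exactly.
        return -abs(self.target - self.value)

    def snapshot(self) -> dict:
        # Capture the full state so it can be restored later.
        return copy.deepcopy(self.__dict__)

    def restore(self, snap: dict) -> None:
        # Roll the environment back to a previously captured state.
        self.__dict__ = copy.deepcopy(snap)


def branch_and_pick(env, candidate_plans):
    """Try each plan from the same checkpoint, then commit the best-scoring one."""
    if not candidate_plans:
        raise ValueError("need at least one candidate plan")
    start = env.snapshot()
    best_plan, best_score = None, None
    for plan in candidate_plans:
        env.restore(start)  # every branch starts from the same state
        for action in plan:
            env.step(action)
        branch_score = env.score()
        if best_score is None or branch_score > best_score:
            best_plan, best_score = plan, branch_score
    # Converge: roll back once more and replay only the winning branch.
    env.restore(start)
    for action in best_plan:
        env.step(action)
    return best_plan, best_score


if __name__ == "__main__":
    env = CounterEnvironment(target=10)
    print(branch_and_pick(env, [[3, 3], [5, 4], [20]]))
```

The same snapshot and restore pair also covers the plain rollback case: take a snapshot before a risky step and restore it if the agent gets derailed.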
Um, and in a game like Minecraft where
you have hundreds or thousands of steps,
avoiding derailing and resetting like
that can be really handy. Um, but maybe
not just in Minecraft, also in in kind
of a lot of other settings where people
are having their agents do a lot of work
over long horizons. Um, so if you'd like
to see some implementations of stateful
environments, you can go to our GitHub.
There's an open source repository that
captures a lot of these abstractions and
there's implementations across a lot of
academic benchmarks. Um, how do you find
that? Look for synth AI environments. Um
and and that's the talk.
[Music]