Ralph Loops: Build Dumb AI Loops That Ship — Chris Parsons, Cherrypick

Channel: aiDotEngineer

Published at: 2026-05-04

YouTube video id: 2TLXsxkz0zI

Source: https://www.youtube.com/watch?v=2TLXsxkz0zI

Welcome. So, this workshop is on Ralph
loops. Uh, hands up here who knows what
a Ralph loop is. That's almost everyone.
I'm guessing that the other folks who
came in were just here because they
thought that sounded weird or or maybe
looking for a quiet place to work. I
don't know, but you're very welcome. So,
what we're going to do today, this is a
two-hour workshop. We're going to uh if
you could just kind of um make a little
bit of space if you need to as people
are coming in, that would be really
helpful. Thank you. This is a two-hour
workshop. We're actually going to build
Ralph Loops together. We're going to do
this together on our own laptops in
order to uh to make some stuff happen
and get some things done. So, so it's
not just about theory. This is a very
practical thing. So, if you've got a
laptop, you're welcome to get it out in
a second. We're actually going to try
and do this ourselves. Uh, so, uh, I I
have a few slides, but not many. Most of
this is going to be live demos and kind
of interaction, um, points as well. Um,
and the idea is at the end of this, you
should be able to leave with something
that works that we'll uh, we'll apply it
to a kind of toy codebase uh, just for
fun uh, to create a Pomodoro timer. But
uh but hopefully the idea is that you'll
be able to use this on your real work
when we get uh when we get done. So
another show of hands. Okay. Who is
using Claude Code or Codex specifically
to write code? Hands up. Oh, quite a lot
of people specifically to write code.
Okay. Who is using it to write all their
code? Who is no longer writing any code?
That's quite a lot of people. Look
around for a minute. That that is a huge
change. If I'd asked a bunch of
programmers six months ago who was not
writing any more code, you'd get a very
different answer. Uh so next question,
who is using either Claude Code or
Codex (and I'll include Cursor here
as well) in their non-coding work?
Okay, quite a lot of you. What about for
all your normal non-coding work?
Okay. Interesting. Interesting. So can
you just see the future um in the room?
Right. There's a few people who are
starting but we're still on that journey
for sure. And who has built Ralph Loops
before? Last show of hands. One or two
people. Okay, great. I'm going to be
looking to you for all the answers. Um,
so that's great. So, uh, just a little
bit about me. My name is Chris Parsons.
Um, these days I spend most of my time
trying to help, uh, teams like the team
I used to run figure out what on earth
to do with AI mostly. So, uh, I'm a CTO
by background. I've done a couple of
VC-backed startups and scaleups, and this
has taken me and my friends by storm,
rather, and we are still all trying to
figure it out professionally together in
terms of how to help our teams adopt and
use AI. So that's what I do for a living
these days. I have about 30 years or so
of building software professionally. I've
been the CEO of an agency. I've done a lot
of agile consulting (remember that?) and
that kind of training as well, back in
the day. And funnily enough (it's a whole
other talk), all of those principles and
practices that we taught for years, and
no one really listened to, are still very
much applicable to AI. So, there we go.
Um so these days I'm actually running
Ralph loops all the time, 24 hours a day
to get my work done. So I'm I'm using
them to write my emails. I'm using them
to check my calendar. I'm using them to
write content and newsletters. I'm using
them to help me do my client work. So
I'm using them in absolutely everything.
I also use them for code which is what
we're focusing on today but they are
very much applicable to every part of
our lives. So by the end of the day the
idea is that you will be in the position
where you can do that too. Uh so this is
how I used to work with AI until quite
recently. Uh you probably can't see that
very well. This is an n8n workflow
that I used in order to create my weekly
newsletter. It took me uh probably a
week to write let alone actually test
and debug. Uh, it's got a huge number of
different things in here. This is
like a featured article flow which
would read various different articles
from my blog, figure out whether I'd
posted it before, summarize it using AI,
and put it there. And then there's
another one for grabbing uh links that
I'd posted into a particular list and it
kind of did a bit of commentary on that.
It was really quite complicated and and
difficult to run and maintain. And it
kind of worked okay, except that 2 p.m.
on a Monday, pretty much every Monday, I
would get the dreaded notification from
n8n that my workflow had failed. And I
was just like, "Oh, no." And then I'd
have to go in and figure out in here
what whatever had broken and try and run
it and fix it. Now, this is nothing
against n8n. n8n is a really cool tool
and it can do some really cool things.
And I'd never have been able to
orchestrate AI in the way that I was
doing before without a tool like n8n,
despite being a coder; it's just so much
easier to manage in here. And you can
manage all the API keys really easily,
and it's it's a nice tool to to stick
things together, but it was so brittle
to use at this kind of level of
complexity. And I didn't get a huge
amount of value out of it. Um, and then
every time I fixed it, I would do
something else. And so, honestly, it was
probably easier for me to just write the
newsletter than it would have been to
maintain the thing that wrote the
newsletter. And I probably had a
slightly better newsletter. Uh, so this
wasn't great. But this was, to my
mind a few months ago, really the
only way to use AI. You had to
orchestrate it and manage it, give it
the right data, and handle all
the context. And I thought that this was
the future of automation, but it isn't
really the future of automation. The
future of automation is a lot more like
something like this, running in Claude
Code. So this is obviously not the
actual skill, but I now have in
Claude Code a skill that writes
newsletters for me, and it has all of
those instructions. In fact, I copied
and pasted the n8n JSON from there
into Claude Code and said, "Write a skill
based on this flow," and it did a great
job. And then what it does is it
goes through and does all of the things.
But what's interesting about that is:
how does Claude Code work? Well, it reads the
first thing, it decides on the next
step, then it reads the next bit,
then it decides on the next step,
and it works through over some
minutes to actually write and produce
the newsletter that I was writing. And
it's the same for code. It's the same
for anything that we want to build using
Claude Code. Claude kind of just takes care
of it. You describe the kinds of things
you want and it does it. Now what's
interesting is that Claude Code
fundamentally is running on a loop,
isn't it? It just reads the skill, calls
a tool, goes back to the beginning,
reads the skill again, calls a tool,
calls a tool, and then at some point it
figures out that it's done, and it stops
and gives you your newsletter in
whatever form you want it in. Um, so
what's interesting is that this ships
much better, more coherent newsletters
than the previous workflow. I still
have to change them,
write them, screw around with them, but
they are a much better
first draft than they ever were. And I
haven't really touched this skill. All I
really do with this skill is I say at
the end of a newsletter writing process,
please just update the skill with
anything you can figure out from the
session you should have done
differently. And it makes the odd tweak
here and there. So that's a loop; that
is this kind of form of working in loops
with AI. So agents that originally
start in workflows, where you have quite
complicated orchestration that looks a
bit hellish like this, end up
in quite a simple loop, perhaps with a
bit better context. Now this didn't work
for the longest time, but it's now
beginning to work with the latest models,
and by latest models I really mean GPT-5.X
(really GPT-5.12 onwards) and
Claude Opus 4.6 or Sonnet 4.6 upwards.
So those models started emerging
around about the end of November. Uh
I have no idea about Mythos by the way.
I've spoken to people who've used it,
and they say it's good, but
it's mostly marketing; we'll see. But
yes, we'll see
where that takes us. Maybe we won't even
need skills. Maybe we'll just say write
a newsletter and it'll do it. Who knows?
Um, but my
point is that rather than using
complicated workflows, we're actually
using skills, context, and loops
much more. And any kind of agent
that we run is in some way a loop
already. Okay. And this kind of powerful
looping construct is something that you
can apply more generally. Um so what
happens when we take loops a little bit
further? So, the first stage is
this idea, which came from
Geoffrey Huntley a little while ago,
you know, ancient times in AI, which means
probably about last June. And he said
basically what we should do is whenever
we finish using an AI to do anything we
should just try the thing again in some
way we should just uh just give it
exactly the same prompt and see what
happens, see what it does again. And it
sounds a bit stupid. It's
based on... does anyone know where this
story comes from? Who knows why it's
called a Ralph loop? Like two people.
It's called a Ralph loop because of
Ralph Wiggum, a Simpsons
character who just tries the
same thing over and over and over again,
and eventually it works. And that's all
it is, really. All
that a Ralph loop is, is: build
this thing or do this thing inside a
prompt. Then the AI goes away and does
the thing and then it finishes and says,
"Okay, I've done the thing." And then it
says, "Okay, great. Go and build this
thing." And you know, and do all of the
things I said to build this thing. And
it goes, "Oh, okay. I'll do it again."
And and the the groundbreaking nature of
what that meant was that the uh AI would
often review its code and realize it had
missed something in some way, right? So,
it figured out that uh there wasn't it
wasn't quite finished. And this is quite
a common problem with AI coding tools
last year: it wasn't quite done, it
didn't quite get to the end, and
therefore it goes, "Oh yeah, I should have
fixed that bit," and then does it again.
And then when it stops and says,
"Right, I'm definitely finished now, 100%,
it's done, it's finished," what do
you do? You give it another prompt again:
"Go away and build the feature." It's
like, "I've built the feature," and it
tries and looks again: "Oh yeah, there was
actually this tiny thing that I should
have done. I really am now finished."
And so on and so on. So you can see
the utility of going through
that loop, where you just build the
feature, and then ask it to build
the feature, and then you ask it to build
the feature. So that's kind of the
first stage of Ralph loops. And what I'd
like us to do is I'd like us to try
that. So, firstly, I'm going to uh do a
bit of live coding. Hold on to your
hats. Uh we'll see how that goes. And uh
we're going to try and do that process
using claude code to see where that
takes us. So,
um let me
start change what I share.
Sorry, just takes a moment.
Okay,
great.
Uh, this one, can everybody see that?
Okay, can people see that in the back?
Okay, do you want me to increase the
size? Got thumbs up. Great. Okay, so
this is a piece of code that I vibe
coded in about three minutes last night.
So, it's not good, but we'll that's the
whole point. We're going to fix it. Um,
so it is literally a Pomodoro timer, and
you can see how it works. So, if I go to
Python and type Pomodoro
start, woohoo, we've got a Pomodoro
timer. Uh, that's all it does. It
literally just does start. There is no
way of finding out whether it's finished
or complete or anything like that, but
that's what we're going to change. Um,
the other cool thing which is very
important for any self-respecting
vibecoded AI project is that it has
tests. So, look, it's got a test.
There's one test and the check to see
whether it starts. So, that's great. So,
if we just have a quick look, and you'll
have to forgive me if you're not a Vim
fan because I am. Um, we although I
hardly use it now, it's quite sad, you
know, 20 years of muscle memory just
gone. But, um, but yes, so all it does
is it literally just runs a start
command and then it saves in your
Pomodoro uh Pomodoro in your home
directory. It saves the time in which
you started. Really, really simple. So,
this is a very simple, quite
straightforward project. The difference
is that it has a new folder with
different things in it. And these are
tickets. So there are a whole bunch of
ways in which we could improve this
pomodoro timer. Um and the first ticket
is it would be really nice to know how
long is left on our pomodoro rather than
just starting it. So what I've
done is I've created a very simple
ticket system to allow us to
capture some changes. These were
one-shotted, so I have no idea
if these tickets are actually good. In
fact, I haven't looked at some of them.
So we'll see how that goes. But um but
the idea is that uh we can use these in
order to start building a loop of work
in order to get something done. So um
what I'm going to do to start with is
I'm going to start Claude and I'm
literally going to say write the first
ticket. So bear with me while Claude
fires up.
In some ways I'm quite glad they didn't
actually release Mythos yesterday
because I don't think it would
be working today if they did. Uh that is
really not working, is it? That's
frustrating.
>> Wow.
Let's try again.
>> There's a problem with Wi-Fi.
>> Oh, okay. I mean, that could cause some
problems to my talk, but we'll have to
see how that goes. I don't have one of
those fancy new um Macs that allow you
to run. No, this has actually locked up
my computer. Can you believe that? It
was working literally 10 minutes ago.
Uh let me just have a look. Uh bear with
me while I debug my machine.
>> Um there is um I've got I think I'm on a
different Wi-Fi, so it should
uh No. Yeah, the Wi-Fi has gone down.
fun
>> tethering time
>> might have to be. Okay, hang on a second
while I tether to my phone, which I
think has decent 5G, so we should be
good. Okay, let's see if that's any
better.
>> Hooray.
Okay, let's try again. Uh, not that one.
Um, cool. So, uh, code Pomodoro
workshop. Okay, is that big enough?
>> Okay, good. Let's try this Claude via
the power of 5G. Look at that. It works.
Fantastic. Okay. So, um, what I'm going
to do is we have, as I said, a very,
very simple, stupid Pomodoro timer. And
we're going to implement a ticket. So,
what I'm going to do is I'm going to say
implement
this ticket. So, it's in docs/tickets/001.
Great. And I'm just going to say that.
See what happens. So, what's going to do
is going to read the ticket, which I
showed you briefly earlier. It's it's um
very straightforward. And all it does is
it implements a status to see how far
we've got. Um and then what I would like
it to do, what I'm going to do after
this is I'm then going to say when it's
done because it will literally be two
files. It's not going to be difficult
for it to do. And then I'm going to say,
you know, implement it again and see
what happens.
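That manual re-prompting can be scripted. A minimal sketch, assuming the `claude` CLI is on your PATH (`-p`/`--print` is its non-interactive mode) and spelling the ticket path as I heard it in the demo; the prompt wording is just illustrative:

```shell
# Stage-one Ralph: hand the model the identical prompt a few times.
# Each pass is a fresh run, so it re-reads the code and the ticket and
# gets a chance to notice anything the previous pass missed.
ralph_passes() {
  prompt='Implement this ticket: docs/tickets/001.md'
  for i in 1 2 3; do
    echo "--- Ralph pass $i ---"
    claude -p "$prompt"   # one non-interactive Claude Code run per pass
  done
}
# ralph_passes   # uncomment to run; needs the claude CLI installed
```

With recent models, pass two usually just reports "already done", which is exactly the behaviour shown in this demo.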
So, there's a few different ways that
you can do this. Um and there is no kind
of one set way of doing a Ralph loop.
It's really about the concept, not about
anything else. Great. So, it's done the
ticket. If I just quickly do a quick get
diff, you can see that what it did is it
added a status command. I think when I
had a show of hands earlier, most of you
are coders. So, hopefully this is not
tricky to follow. And then we've got a
new test. Look at that. We've added a
test. I didn't even ask it to. It added
a test. Oh my gosh, what is the world
coming to? So, um, now what I'm going to
do is I'm literally going to say the
same thing. Now, a year ago, this would
have been a really important step because
it would have definitely missed
something. Whereas now it's like, you've
already done it. It's fine. Right? So
Opus is now much better at noticing
when things are done. Now a traditional
Ralph loop would just keep doing this,
right? And it would keep going with this
kind of implement this ticket, implement
this ticket, implement this ticket. And
um and this is kind of boring and it's
not really going to do very much else.
And actually, it's interesting: it's done
something different from when I tried
this earlier. It noticed that what it
should have done is update the ticket
status to done. So the
process worked here that didn't work earlier.
Uh so that's great. So it's actually
noticed something that it didn't do. So
there you can see the fundamental early
principle of early Ralph loops, right?
The idea that you can just zoom through
and and do something and it will find
things eventually that it missed.
Because it missed that. I'm just going
to try once more, but I don't think it
will come up with anything else. Um like
I said, latest models really don't need
this step in quite the same way. Um they
they tend to just kind of get it done.
In fact, this time it's just like, um,
oh, if you're running a Ralph loop that
picks up the next ticket.
Oh, that's hilarious. It's literally
giving away my presentation. Um,
that's fantastic. Okay. Um, what I'd
like us to do is I think as a starting
point, the other thing you can do is you
can just kill the context and then you
can do the same thing again and you can
say implement docs/tickets/001. And I
can't be bothered to spell it out. It'll find it.
Um, and then, um, so now, uh, what I'm
doing is basically doing the same thing,
but without it knowing about the
previous context. So, it'll be quite
interesting to see what it does with
this. I'm assuming it'll find... assuming
it found the ticket. Yeah, it did find
the ticket. That was easy enough for it
to do. It's just running the test to
make sure that they work. And it all
passes. Okay. So, it's happy. So, when
people first started using Ralph loops,
they weren't doing it within the same
session. And there was an early Claude
Code plugin that, on the stop hook (which
is what runs right now, when it stops
running), would just do the same
command again. So it was rather like me
just typing the same thing in each time,
but that didn't really work very well
because it didn't get very far. Whereas
what's more useful, or what
people started doing, was just running
Claude Code in a kind of loop. So they
would do something like while true um
and then do claude
um implement ticket 001,
right? And then done. And then that will
just go through. Well, not quite, actually,
because I didn't do claude -p, but
effectively that's what they were
doing. Oh no, now I've really screwed
it up. I really shouldn't have hit
enter on that, should I? Okay, there we
go. But that's effectively what
people were doing. The dumbest Ralph
loop is literally that: just a while loop,
and it just goes through and implements
stuff. Super, super easy. So what is
the next step in a Ralph loop? Well, in
fact what we're going to do now is I'm
going to get you to get to that point
and then I'm going to take questions
from other folks. So um let me just
switch to
back to here. I'm hoping. There we go.
Yeah, great. Um, so what I'd like you to
do if you could crack open your laptops
and um, grab the code from here. So, um,
it's just on my GitHub uh, as Pomodoro
workshop. You should be able to find
that quite easily. Um, and you saw how I
ran it. It's very simple. Uh, you might
need to set up Python in your machine.
So, there's hopefully that won't be too
hard or you just run bare Python.
Pomodoro.py will give you the command
you can type and then it's um, just a
unit test thing to run the test
pomodoro. Super easy. And then in step
four, I want you to fire up Claude Code
or Codex, and I want you to try and
build that ticket and make sure that
it's working. Um, and that will be a
great starting point. Um, but don't
build any more tickets yet. Don't don't
let it take you too far. And then, uh,
if you are really used to that and that
is just like a literal no-brainer for
you, try it in Codex, try it in something
else. Try maybe setting up
something similar in one of your own
projects. Um, so something different.
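If you'd rather script the re-running than keep typing the prompt, the "dumbest Ralph loop" described a moment ago is just a while loop around one prompt. A sketch, assuming the `claude` CLI; the MAX_PASSES guard is my own addition so the sketch can't spin forever by accident:

```shell
# The dumbest Ralph loop: the same prompt, over and over, until a run
# fails or the (optional) pass limit is hit.
ralph_loop() {
  prompt='Implement the ticket in docs/tickets/001.md and run the tests.'
  passes=0
  while :; do
    passes=$((passes + 1))
    [ "$passes" -gt "${MAX_PASSES:-1000000}" ] && break  # safety rail
    claude -p "$prompt" || break   # stop looping if a run errors out
  done
}
# MAX_PASSES=5 ralph_loop   # example invocation; Ctrl-C also works
```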
So, I'm I'm going to take questions now
while people are typing away on that.
I'm going to give you maybe a few
minutes just to get that set up and then
we'll kind of move on to the next step.
Does anyone have any questions or
comments or thoughts? I have a
microphone here.
Uh if people would like to ask anything.
Yeah,
there we go.
>> It should come on in a second.
>> Hopefully,
the guys at the back are
It may not be on. Is it on?
>> Yeah, you could shout and I repeat.
Yeah, that could work.
>> Is it? Can I just check it's on first?
>> Yeah, it looks on. That's weird. Shout
and I repeat. Anyway, oh, there we go.
There we go.
>> Great.
>> Um, I've I've played a bit around with
the BMAD method, which I don't know
whether you've seen that. Um there's a
guy who's basically written a
whole load of skills and commands for
following a full agile process from
>> you know and he's got an agent for build
it test it
>> you know everything and and I I guess
>> so
um have have you done anything where
sort of using this kind of Ralph loop
process you go through that cycle you go
through the full software development
life cycle of each stage and then Yes.
Um, I might get as far as that at the
end. But yes, I have tried some of
that stuff. It's really interesting,
and it asks some really very
good questions, both around context and
actually the value of the work. It's
really interesting. So yeah, we'll maybe talk
about that a bit more at the end.
So if I haven't got to it,
just ask the same question again and
we'll get there, in a Ralph loop.
>> Okay. Thank you.
>> Um anybody else got any questions?
How are people getting on with setting
that up? Has anyone managed to set
it up? You know, kind of wave at me if
you've managed to get it running. Great.
Great start. Has anyone managed to
implement the first ticket? Hooray. A
few people. You got a question?
>> Oh, go. You've done it. Great. Fab.
Yeah, I probably should have given
different directions for asking a
question versus having finished. That's
great. So, a few people have got
started. Fantastic. Great. So, you can
probably tell where this is going. And
if you were paying attention to the live
demo, you'll already know the answer.
I'll grab the mic from you so you don't
have to keep holding that. Thank you.
Um, but yes, you don't have to just stop
at one ticket. Now, does Matt Pocock happen
to be in the room? I know that he's
doing the workshop after this one. He is
the person I got this from. So, if he's
watching the video, thank you, Matt. Uh,
this was a revelation to me
back in sort of September last year.
He posted a brilliant YouTube video just
about exactly how to kind of take Ralph
loops to the next level because when
they first came out, I spotted it on the
internet. I played with it and I was
like, "Yeah, this is fine. It's kind of
cool. It kind of spots things that AI
can do where where it's missed things
and it can maybe do a slightly better
job of things, but it's not going to
change everything that I do." And
then the answer is: actually, it does
change the entire way that I work and
approach code now. So, I guess
the really interesting thing is not how
do I make sure Claude has finished this
one thing. It's what happens if I point
this kind of loop at a whole pile of
things to do, right? What happens when
we point at a whole list of things? Now,
I tried this. I wrote a blog post
about this, which was a bit depressing
because it just showed abject failure
throughout, to be honest. What I
tried to do is get
Claude, I think it was
Claude at the time, to break up a big
project into a lot of different tickets.
And then I got it to break down all of
those tickets into smaller tickets. And
then I got it to to figure out what all
the dependencies were between those
tickets and write them all down really
carefully. And then I got it to figure
out how it could use a ton of
different agents. Sorry, did you have a
question?
Uh one, two, one, two. Okay. I have a
problem with I had a problem with Wi-Fi
and I didn't do the clone.
>> Oh, I'm sorry. Go to this one.
>> Yeah. Yeah. Couple of minutes. Yeah.
Thank you.
>> Yeah, that's fine. I'll leave it on
there. That's fine. Um has everyone
appreciated the slide. Okay, good. Okay,
there we go.
>> Um so, um these slides, by the way, are
created using a slide skill using Nano
Banana Pro. It's absolutely incredible
at making slides. Apart from the
tiniest thing, like adding a QR code,
I haven't added any
text to these slides. They're just flat
images in Google Slides. It's actually
incredible at making
slides. Um, what was I saying? Yeah. So,
the idea of just creating
a Ralph loop to just do one thing
seemed a bit pointless, and it was
just working around a few limitations.
If you point this loop at a whole pile
of work, then it becomes incredibly
powerful. And as I was saying before,
I created this huge complex dependency
graph with a whole ton of different
tickets about how I was going to build
this really complex system for me. And
then I fired up like six or seven
parallel agents and I was like, "Right,
you do one stream and you do that stream
and you pick up this ticket." And it
just failed horribly because the the
system just couldn't figure out what had
been done and what hadn't been done.
There was lots of contention
between tickets, like, "Well, I can't do
anything until I get that
shared ticket done, so I'm going to do it."
And another Claude was like, "Well, I can't
do anything until that shared ticket's
done, so I'm going to do that too." And
then they both implemented the same
thing and it was a huge mess. So that
was really very depressing, and I wrote a
whole thing about it, and I was basically
like, "It's impossible to orchestrate
large numbers of agents, you
just can't do it," which was obviously
nonsense, but that's how I felt at the
time. And what was interesting was that
what I'd done, effectively, was recreate
the waterfall processes seen
in some of the worst companies back when
I was starting to code for myself in the
90s, where people would write
requirements documents that you had to
stagger to carry into the
requirements meetings, where the entire
project was specified up front, with all
the intricate dependencies, handed to the
development team, and then given two
years to build. I can see some perhaps
um slightly more seasoned people in the
room sort of nodding and smiling at me
when when they hear me talk about this
but yeah I thankfully managed to avoid
working on any of those teams but I some
of my friends did and it was absolutely
awful. And what I had done, basically,
is given that to Claude. I'd
given that waterfall process to Claude
to organize and figure out as
it went, which was really bad. So no
wonder it didn't work. If humans can't
do that, how is AI supposed to
do any better? However, if instead of
saying with all of your tickets, right,
the first one is the most important,
then this one, and then you should do
this one, but don't think about this one
until you've done that one. Instead of
doing that, I'm going to go back for
just a minute. Are we all good with this
slide? By the way, does anyone still
need the slide? Okay. Okay, we're good.
Right. Uh instead of doing that, you can
just run a loop where you say something
like, "Hey, just pick the next most
important ticket."
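That one sentence is the whole scheduler. A sketch of what a single pass might look like, assuming the `claude` CLI; the exact prompt wording here is mine, not a canonical form:

```shell
# One pass: the model, not you, decides what matters next, by re-reading
# every ticket plus whatever was just committed.
next_ticket_pass() {
  claude -p 'Look at all the tickets in docs/tickets. Pick the single most
important one that is not yet done, implement it, mark it done, and commit.'
}
# Re-run this by hand, or loop it, and it works through the backlog.
```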
It's as simple as that. Just figure out
here are all the tickets. Just figure
out what is the most important next one
to run. Okay? You don't have to worry
about the dependencies. You don't have
to figure it out yourself. The AI is
quite capable of looking at all of them,
figuring out the dependencies on the fly
based on what's just been done and
figuring out what the next most
important thing to do is. That's
actually quite easy for an AI to do. The
one thing it cannot do so easily is
manage that process in parallel. But to
be honest, when we're running these kind
of loops, if you're running them
continually, the bottleneck is usually
not the number of agents. It's usually
you just keeping up with the AI just
doing things over and over again. So
let's forget parallelism just for a
minute and just start with a loop. See
if you can keep up with just one
AI that's running continuously. You're
fine. Don't worry about parallelism just
yet. Don't worry about Gas Town or any of
that stuff just yet. As impressive as
those projects are, you can just start
with a simple loop. It is okay. So, um,
what I'd like you to do again at this
point is I'm going to quickly show how
this works for those of you um, who
don't have your laptops, but then I'd
like you to just try it on your
computer. So again, let me find my mouse
and then move to
sharing my screen again. Okay,
great. So if I go back to Claude, in
fact, I'll go back to Vim first and look
at the tickets folder. So I've got a
whole bunch of tickets here. I've got a
status command, I've got a stop command,
I've got custom durations. I never use
that anyway; I use other things, like
labels and all of that. Um, I
could try and figure out the
dependencies myself but I really just
don't need to. I can just simply go into
Claude and say: implement the next most
important ticket using
TDD
principles
from docs/tickets;
commit when done. Something like that. Okay.
So, let's see what it does.
So, it's now reading a whole bunch of
tickets. As you can see, it's read
number one, two, and three, and it's
decided that the next one is number two.
It's just going to do it. That's great.
So, it's going to work on it. Um, now
the interesting thing now is that once
this is finishes, now it's using TDD, so
it read the test first. Uh, when it
finishes, hopefully, yeah, it's marked
it as done. Very good. Remembered that
time
and then it should commit. Let's see if
it does.
>> Sorry.
>> No, this is a brand new session.
Although I think it probably had the
working directory from the previous one
still. So um in fact I think it has. So
what it hasn't done is committed those
atomically which is definitely something
I could improve in my prompt but we can
cover that in a minute. Um so then
hopefully it's just going to do that
and then it's going to finish.
Yeah, great. It's done it. Fantastic.
Now what I can do is I can either do
that again as a Ralph loop, or I can
just restart a new session. Just do the
same thing and this time it will pick
something else and then it will keep
working. Now you can imagine that if I
put this inside a while loop then it
should in theory work through all of the
tickets in some way. Now whether what I
get at the end is actually what I want
is a whole different question. Um but it
will definitely get a lot of work done
in a row. So I'd love you to try it. So,
if you've got the app working on your computers, see if you can get it to work through as many tickets as you want within that amount of time. It should be able to just carry on. Maybe write a little bash script, just like I've done, where you do a while true and then claude — I'll show you how to do that briefly. In fact, let me quickly reset so it can start from the beginning. And I'm actually going to go up one more — a hard reset. There we go. Great. Yeah, that's the right place to start. Um, so, uh, if
you can get it to do that, then that's
great. The other thing you can do, instead of running claude like this, is claude -p. Can you all see that? Okay. By the way, I'm not sure how I can make that clearer. Um,
>> Yeah. Yeah, clear is a good one. Yeah.
clear, claude. There we go. So, what we can do is actually use claude -p like this, and you can get it to stream its output by passing something like stream JSON — what's it called? Something like stream-json. Um, hang on a second. Let's see. Um... I think they've removed it. That's annoying. Never mind — so we won't see any output. But there's nothing to stop you setting it up like this, and then you can just do that.
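The wrapper being described — a while true around a headless claude -p call — can be sketched like this. It's a minimal sketch, not the exact script from the talk: the prompt text and the docs/tickets path are assumptions, and the agent command defaults to echo here so the sketch can be dry-run without invoking Claude at all.

```shell
#!/usr/bin/env bash
# Minimal Ralph loop sketch. AGENT is the command run on each iteration --
# in the talk this is `claude`; it defaults to `echo` here so the script is
# safe to dry-run. MAX_RUNS is a safety cap; a real loop would just be
# `while true`.
AGENT="${AGENT:-echo}"
MAX_RUNS="${MAX_RUNS:-3}"
# Assumed prompt and ticket path -- adapt both to your project.
PROMPT="Implement the next most important ticket from docs/tickets using TDD. Mark it done and commit when finished."

runs=0
while true; do
  # Each invocation is a brand-new session: no context carries over,
  # which is the whole point of a Ralph loop.
  "$AGENT" -p "$PROMPT" || break        # stop if the agent exits non-zero
  runs=$((runs + 1))
  [ "$runs" -ge "$MAX_RUNS" ] && break  # safety cap for the sketch only
done
echo "finished after $runs iteration(s)"
```

With the real CLI you would run it as something like `AGENT=claude ./ralph.sh`; headless output-format flags vary between Claude Code releases (as the demo itself shows), so check `claude --help` rather than trusting any flag name here.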
>> You've got to set up Claude to have full permissions so it doesn't stop, right?
>> Yes. So the only way this works — if you want to run it properly and have it not stop — is to be quite selective about the permissions that you give it. The question was: presumably you have to run claude with full permissions for this to work? Yes. Yes, you do. Um
>> It depends on what you're doing. Yeah, if you're working in a little sandbox project like this, the chances of it going elsewhere to find stuff are very small. Um, I have a project called lockbox, and its sole purpose is to try and stop it doing stupid stuff: when it reads untrusted tokens that could potentially send it off track, it basically just prevents any kind of file system access after that. So there are ways of managing it. Um, so what this is doing in the background — you can't actually see it doing anything, because I don't have that output mode on — but you get the idea: you can run this from some kind of script. And if I quit that, hopefully it will stop.
There we go. No,
let's just keep going. Sorry. Clearly a more production-ready Ralph loop would not look like this, but you can see it's done a bunch of work. So if I go here, you can see it's already started on the status command, and it's just working through that at the moment — it started at the beginning again and was just working. So, there's a few things to be aware of here. One is that feedback is really, really important with Ralph loops. You need to be able to run it in a way where you can tell what it's doing and how it's doing it. So the super basic one I've given you there isn't very good; that's not one I would recommend running in production. Um, equally, you
need to figure out exactly what the prompt is for Ralph, and that's a really, really important point. What I'd like you to do when you're trying this: yes, it's going to be building a bunch of tickets in a row, but equally, it's going to be doing them in ways you don't like. So for example — if I'm running this test, which is literally just implement the next most important thing — let's start with this one:
I would probably add something like: run simplify — which is a really useful bundled skill from the Anthropic team — when finished, and ensure you refactor to reduce duplication. You can imagine that you could create quite a complicated skill for this, and I'll show you my actual skill for it at the end. But as you're working through this, do try to figure out whether there are ways you can improve what it's doing. So give it a go, let it make a decision, let it write some code, then read the code and think: okay, what could I have actually improved about that process? Then reset everything and improve the prompt after that. Have a go at that and see how it works. Whilst people are working through that on their machines, I'm happy to take questions.
>> Yeah.
>> Have you used skills like superpowers?
>> Which one? Sorry.
>> The superpowers one.
>> Uh, the superpowers one. What I did is I pointed Claude at the entire repository and said: figure out anything that isn't currently in my skill set and implement it for me with my own context. That worked quite well. So I haven't used those ones specifically, but I've basically ripped them off.
>> That's great. I use superpowers a lot, and then I just give it tasks like this and ask it to run multiple agents in the background.
>> Yeah — and have you done that, is the question. Yeah. So what you can do: there is an agent teams version, which I think I've got turned off in this particular instance of Claude Code, but you can get Claude to use sub agents within a team. In fact, I think I might be able to turn that on if I can find the agent teams setting. There it is: Claude Code experimental agent teams. So if I grab that and then run Claude with that on, then you should be able to say: use an agent team to implement the doc tickets in this repo, or something like that. And — this isn't actually running within tmux, so maybe this won't work. So I might just try this again.
Bear with me a second. So if I grab that and paste it there, then grab that and run tmux, and then run that — in theory, this should start pulling up other agents. Because, like I said, trying to orchestrate Ralph loops or orchestrate agents myself — organising all of the dependencies and complexities — is actually really, really difficult to do. But what you can do is just give the job to Claude, and it does a much better job of managing that for you. So as you can see, it's already decided to print out the whole thing — the file name and the ticket for each — and it's got a whole bunch there. It's actually decided that they're all sequential, so it should run none of them in parallel, which is kind of interesting. And then it should in theory start an implementation agent. Let's see if it's going to... I think it's just running as a sub agent. Never mind. I'm not sure that's going to work. If you can get it to do it, then let me know. But basically, it's an experimental feature that only came out a few weeks ago that allows Claude to start sub agents in order to do things.
Any other questions while people are
working through that? Yeah.
>> You said you built a Ralph with automation, right? So what was the feedback criterion? What decides if it's a good newsletter article — what good looks like, whether that's dynamic or static? Do you put that in the CLAUDE.md file?
>> Yeah, great question: how does that work? So — to repeat the first half of that — the question is: when I used the n8n workflow to build a Ralph loop for a newsletter creator, how did the agent know what good was?
>> How did I define good? Okay, great question. So, in terms of newsletters, I had already been writing my newsletter manually, so I knew roughly what I wanted it to read and sound like. I also did a bunch of research using a research skill I built — something like: what makes great newsletters? And I said things like — in fact, this is what I actually do — this is a fantastic newsletter that I've written, or that I've read somewhere; could you please figure out why it is so good, and what kind of editorial principles went into this newsletter for it to work really, really well? Then I would paste the newsletter in, get it to figure out what was good about it, and then I would check it. So yes, there is still an element of human taste here; you can't entirely get away from that. Having said that, I do also
have a simulate-audience skill, which basically uses a whole bunch of different personas for different clients, or prospective clients, that I work with. I would run the finished newsletter through that and say: run all of these in parallel, and once you have finished, figure out ways I can improve this newsletter, or the newsletter skill itself. So there's a number of different ways you can do it. The audience simulation is super experimental, but it's actually really effective, and it will often surface insights I just hadn't thought of. My personality is slightly all over the place — the way that I talk and communicate — and often my clients are not like that. I tend to bombard people with information, and sometimes my skill will say: okay, there's a lot of ideas in this, Chris; you just need to focus on one main point. That makes sense, and I'm like: that's so helpful. So what's quite helpful and interesting is that you can use AI to give feedback on AI like that. The great thing about
this particular project — this little pomodoro thing that people are writing — is that it's a command line tool, and it's really simple to know whether it works. So it's perfect for a Ralph loop. In fact, these little tools that we build for ourselves — like, for example, the newsletter skill — are perfect for this kind of loop. I will often say: I want to improve this skill; could you please go back and forth — write the content, then use another agent to read the content, decide if it's any good, come up with things to improve, then feed that back in — and just run that as a loop. Um, there's a really cool feature — I wasn't going to tell you about this until the end, but I'll tell you now.
Um, there's a really cool feature inside Claude Code called loop, where instead of — in fact, I'll start this in a new session — instead of creating your own while loop, you can say: loop every minute, build the next ticket from doc tickets, basically. And then what will happen is the loop sets up almost like a repeat timer. As you see, it's got a cron create tool — which, for the uninitiated, just means do something every minute; that's what those five stars mean. What it's going to do is literally just build the next ticket. When it finishes, it will check the cron again, build the next ticket, check the cron again, and keep going. So
that's great for working through a bunch of tickets, but it isn't limited to a set of things you've got from before. If you think about it — I'll just leave that running up there — you could have a loop that does something like this: loop every one hour, check Linear for new bug reports from the test team.
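For reference — since the five stars go by quickly on screen — those schedules use the standard five-field cron syntax:

```
* * * * *          every minute
- - - - -
| | | | +-- day of week (0-6, Sunday = 0)
| | | +---- month (1-12)
| | +------ day of month (1-31)
| +-------- hour (0-23)
+---------- minute (0-59)

0 * * * *          every hour, on the hour
```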
Um, and then I just leave that running. Oh yeah — can't spell. Just leave that running, and — you're going to annoy your testing team, but anyway — the point is that you can run these kinds of loops to get work done in quite an interesting, I guess, dynamic way. Even though it's
quite a simple loop — just find the next thing, do the next thing — if you think about it, a heck of a lot of our work is just loops. If we're software developers, what do we do? We look at the backlog. We pick the top thing from the backlog. We pull it over to, you know, in progress. We assign it to ourselves. We check the architecture. We figure out whether there's other context we need. We look at the change. We make the change. We submit a PR. We wait for reviews. We comment on the reviews. We reject the reviews. We implement the changes, occasionally. We submit the PR. We merge the PR. We go through the release process. Then we start again, pick up the next ticket, and so on. That is a loop. It's quite a complicated one — like we were talking about just a minute ago — but it is still a loop. It is possible to get an AI to run that entire loop. There's no reason not to.
Um, and that's effectively what's happening here. In fact, you would never actually write this; you'd be much more likely to write something like: every one hour, linear bug finding — and you'd have a skill that encoded all of those chunks of information I just gave it, in a way that works for you and your particular team. Does anyone not know what skills are, before I go any further? No — I think pretty much everyone knows what skills are.
If you haven't figured out what a skill
is yet, then then this is your homework.
Go and understand how skills work. They
are the best uh way that we have at the
moment of packaging up useful little
parcels of context and scripts and
moving them to different places or or
creating different things. So, for
example, I mean, I have about 50 of them
that I've written. Um, and then they
just do lots of different things. The
great thing about skills is that you can
pull them into your context whenever you
need them. So, for example — I'll just do this — I could say: do you know how to create images using Nano Banana? I can ask the AI the question, and the answer is: well, it could kind of look this up, but it actually knows that I have an image skill for this, funnily enough. If you hadn't got one, it wouldn't know. But if I then load images and say: how do you create images? Give me the step-by-step — what it's going to do is pull in that images skill, and then it tells me exactly how it does it. And I've actually written — in fact, I'll make that bigger so you can see — I've actually written a script within that skill that does the generation for me. So it's codified
generation for me. Um, so it's codified
the process of doing that and it just
picks whichever um, uh, model it wants
to and it gives it content and I have
these specific templates that I use in
order to create specific Nanabanana um,
skills. Nanaban is brilliant. Um, this
is how I created the presentation that
you're looking at. I have a slide skill
and an image skill that work in tandem
in order to create these presentations.
Cool. So let's see what the other thing has done. As you can see, it's already on ticket six. The great thing about Ralphs is it just keeps working while you talk about something else, and it's done a whole ton of stuff here. It's just stopped at this point, but in a second, hopefully, if we just wait, it will start the whole process again. There we go — it's got the scheduled task to run, and it's going again. So you can just leave Claude Code sessions running with these kinds of loops in them. They last about three days, so you do have to keep refreshing them. But you can just do that and keep it running, even before you get to the more complicated write-a-script-that-wraps-Claude kind of thing. Any other questions?
Anyone got anything interesting or surprising out of their Ralph loop? Has anyone tried this on their real work yet? That would be the interesting thing. Yeah — what was your experience? Have you still got the mic?
>> Yeah.
>> Yeah. Yeah.
I just made a screenshot Ralph loop for a website context-engineering framework. So Claude just takes the screenshots and then looks at the layout, because it has problems with geometric spacing.
>> It works well.
>> Nice. Cool. So you're actually using Claude's screenshotting to get feedback. Yeah.
>> Yeah. That's pretty advanced — not many people are doing that. People are also trying to use things like Playwright to take screenshots, and the Claude in Chrome extension that comes with Claude; you can use that to get it to drive Chrome and then take screenshots of what's going on. I've had mixed success with that, because it's quite a complex thing for it to manage, but for just basic screenshots it works really well. For my images and the content that I write, when it runs those image skills it will always look at the images first to see whether there's any weird AI-garbled text or whatever, and it will reject them without even showing me if there's a problem with an image.
another question or comment? Yeah,
there's a question just back here. Can
you just pass the mic? Is that okay?
Thank you so much.
>> Um, I think it's close to a question that has already been asked, because I'm not that familiar with Ralph loops. If I ask the agent to implement task one, which has already been implemented, would it actually check the quality of what was implemented, or only check whether it was done or marked done?
>> Yeah, it very much depends on what you set it up for. There's no kind of magic to a Ralph loop — it's just a loop. So, this loop that I'm running at the moment — in fact, I probably should just say loop stop, otherwise it's going to keep going and use up my quota. I think you can just stop it like that. We've got quite a fully featured pomodoro setup now.
Come on. Time to stop. Um, so it entirely depends on what you write. If you go through to — what was the loop that I set up? I think it was this one — I just said build the next ticket. That's very ambiguous and not very helpful, so it might not actually finish it; it may just decide to build it and not ship it. What's more interesting is if you go to — probably the easiest thing is if I load my Ralph skill. This is my actual skill that I use for Ralph loops. What I'm showing is slightly out of date: the one I've got here is actually using a doc changes folder — you can see that there — but I'm using doc tickets in this example, and I've changed it on the latest one.
Um, but ultimately you don't have to use a flat file in the GitHub repository as your ticketing system. You could use beads, which is Steve Yegge's version of this kind of approach — quite cool; I've used it. You could use Linear, you could use Jira — as long as you can get access to it from the AI, you can use whatever ticketing system you want. For the purposes of this exercise, I tend just to use flat files, because they just work; you don't really need anything sophisticated. In the same way, the Ralph loop is entirely what you make it in terms of its effectiveness. So, for example, in this particular one I've given it a proper role, in the sense of: you are one engineer in a relay team; do exactly one change, then drop the context and stop. Start
again — that's the idea. So, for this one, it's designed to be run in a shell script where it gets an entirely fresh context each time, because I didn't want the context to pollute across runs. These days I care much less about that, because context windows are so much larger than they used to be, but when I wrote this, that was very important. As you can see, it's specifically for code that doesn't need human review before shipping. Then it tells you when work should go in there, what the right time for the tool is, read the CLAUDE.md, and the format — this is the format of a ticket. This is all of the rationale; these are the different status values. I ask it to check git state to make sure it hasn't got a dirty working directory. It's also got recovery states: if it crashed and there's a dirty working tree but the tests are passing, you're probably done — but you might not be, so double-check. If the tests are failing, it's probably just mid-flight but broken, so you should probably throw it away, or at least treat it differently. So you can imagine this was built up over time, of trying to get it working: trying to understand what the user wants; test passing is not enough, verify the actual behaviour works; run things in parallel; mark it done; blah blah blah. There's an awful lot going
on. Um, so with a real Ralph loop, you want to be building up, over time, for your specific project: exactly how to check something is working; exactly how to run the tests in your particular framework and dialect; how you submit things to the test team; how you want to comment on particular changes; what style you want to use; whether you pull off the thing that feels most obvious, or the thing that's highest priority, or a mixture of both depending on how you're feeling that day. Whatever you want needs to be encoded in it. So when you're writing a Ralph loop — I'll show you a link to grab mine at the end — there's no need to use exactly mine or anyone else's. Just start with mine, then say fix this for my project, and allow it to change and morph and evolve.
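As a concrete illustration of the flat-file approach, a ticket in the doc tickets folder might look something like this — the field names and layout here are illustrative guesses, not the exact schema from the skill being shown:

```
# 004 - Add status command

Status: todo            (todo | in-progress | done)
Depends-on: 002

Show whether a pomodoro is running and how much time remains.

Acceptance:
- `pomodoro status` exits 0
- prints one line, e.g. "running, 12:34 remaining"
```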
Couple of questions. So, just come forward. Thank you.
>> Hi. Um thanks for the talk. Um could you
expand a bit more on the topic of
sandboxing because that would be the
thing uh stopping me from running this.
Uh
>> yeah, it makes sense. Yeah. Yeah,
absolutely. So there's a number of
different ways to sandbox this. For this
particular small project, I'm not doing
that. Um most of my work happens on a
VPS which is away from my main machine.
It has a few keys on it that are
specific to what I want it to do. And um
it can access developer tools. Uh a lot
of them it can only access them
read-only. Um, it can also access my email, but again, it has quite strict fine-grained Claude permissions: no sending emails, because that's quite important. I don't ever let it send an email; I only let it draft them. So I use a combination of positioning the code away from my main machine, on a VPS; I use Claude permissions for that as well — the permission system is a bit broken, but it mostly works, and I'm trying to build lockbox to make it even better. What else do I do? The keys that I use are separate keys, so the AI has access to its own keys, which I don't use for my other stuff, and I can see the audit trail of what it's done. So there's a number of different ways of doing it. If you want to just run things simply on your own machine, there's Docker sandboxes, which is quite cool — a new feature in Docker that allows you to do docker sandbox claude and run Claude within that sandbox. Um,
so you can kind of isolate it within a
specific container. That's quite
powerful because it allows you to only
change things within that specific place
in the file system. The challenge with
that is that it can still leak data from
one of your systems to another of your
systems. Uh, there's a thing called the lethal trifecta — I don't know if you've heard of that; Simon Willison coined it. The idea is that if you have untrusted tokens, internet access, and access to secret, important data you don't want to lose, you're going to lose that data. That's the bottom line of it. So you have to minimise the number of times those three things collide in the same context. Um, so yeah, there's lots to say about security and sandboxing
specifically. Um, I don't run with dangerously-skip-permissions, but I do run with a number of things turned on by default — not everything, basically. You have to go through and figure out what your risk profile is and how much you care about those things. The main thing to read up on, if you're interested and weren't already aware of it, is the lethal trifecta. And be thoughtful about how much power and permission you're giving to your agents, especially if you're using something like OpenClaw, which is unfortunately insecure by default. I know they've been doing a huge amount of work on OpenClaw to make it more secure, but it is still a challenge for those kinds of agents — they do have access to a lot of things. Any
other questions? Yeah.
>> Uh, yeah — you had a validation step in the loop.
>> Uh, this might be anecdotal evidence, but as soon as I changed mine to use sub agents for the validation step, it started finding things.
>> Ah, interesting.
>> Whereas as long as you're doing the validation in the same step, with the same context, it just pats itself on the back, like: yeah, it's fine.
>> That's a really good point. There's definitely confirmation bias going on with agents, where they're like: oh yeah, of course I wrote it fine; it was fine; I checked it a minute ago. Using sub agents is really powerful, because a sub agent starts with only a small chunk of context — it doesn't start with the full context, right? So you can get much more power from it. As a good example from this particular project: a really useful skill, which I mentioned earlier, is simplify. Simplify is a skill bundled with Claude Code, and what it does is look at the most recent changes and run three sub agents to try and figure out whether your code could be improved. You can see here what it's doing. So hopefully these will load, and it will probably find a bunch of problems.
Yeah, great point.
>> Great presentation. Thank you. Did you try OpenSpec, or combining this with OpenSpec or any other spec-driven approach?
>> No. If I'm honest, I'm not a huge fan of spec-driven development — I know that's controversial, and I'll qualify it. I worry that spec-driven development, at the extreme, is taking us back to the bad old days of waterfall, where we would specify — or try to over-specify — an entire project. Even this little set of tickets I'm not that comfortable with; I feel like specs should be much more iterative. And — things we can see: it's already fixing a bunch of stuff, which is quite cool. So, just to finish off that point, it found a bunch of issues there and it's got some fixes. Yes — so, specs. I like just-in-time specs. I like the idea of thinking through what you're trying to build, creating some kind of plan in Claude Code, and then executing it. That's fine; I'm happy with that, and I think it's a useful step. What I worry about
that's a useful step. What I worry about
is a I I worry about things like Kira
where they've codified that into the to
the tool. I worry that that will almost
fossilize that that one approach with
with AI that works today but may not
work again when mythos eventually comes
out. Right? So I worry that the tools
are being too quick to jump to a
specific structure of work that may not
be the right thing in the future. So I'm
I'm cautious about that. Um, I think it
is obvious. It's a it's a truism that AI
needs more context in order to do well.
So, we should try and give it more
context. But I think the idea of
overdoing that and and oversp specking a
project is one to be careful of as well
as overstructuring our process based on
what we know about agents today because
then we'll end up with working on with a
a new kind of AIdriven process that
worked best with agents that came out in
2025 or 2026. You know, when we'll still
be using that in 2030 and that'll be a
pointless waste of time. So those are
the kind of concerns I have with it.
Any other questions?
>> Yeah, great talk. So, you mentioned that you don't like spec-driven and you use Ralph. So basically there is no human in the loop. So the question arises: does Claude actually need you there? Where is your input?
>> Great question. I've been thinking about this quite a lot recently and having a bit of an existential crisis — I don't know about anyone else. But yes: what value am I adding here? Certainly not with writing a pomodoro timer — I'm not sure I'm adding much value at all; I mean, I literally one-shotted those specs, and there was no point to me there at all. I'm not saying I don't like planning out a system. What I'm interested in at this point — and I don't have the answers — is the fact that AI will often pick and write better specs than I can, and will often have a better idea of the kinds of directions my software should go in than I necessarily will. So I like the idea of having Ralph loops that create other Ralph loops, potentially, or Ralph loops that track whole customer engagements, or even whole startups. So, I have a skill that I'm working on — should I show this? I'm going to show it. It'll be fine. What could go wrong? — which is called startup.
It's pretty ambitious, but the idea is that it should basically guide a product through an entire startup framework. So it is meant to be run as a loop. The idea is that it — oh, I see you're all taking pictures now. Now I'm owning this thing.
>> Damn it.
>> Um, but with great thanks to Ash Maurya, who writes some brilliant stuff on this — I should say that for the tape. Really, really helpful to me. So I built this out of
basically all of the cool books I've read about this stuff. I'm a startup founder — co-founder and CTO — so this is near and dear to my heart. What I'm trying to do is give the AI enough context that it could run my startup for me: potentially figure out what the next most important thing to work on is, and then do that in a loop. There's a big outer loop that runs and says: okay, well, what's the next most important thing to do? Let's do that. So — it doesn't work, but it's interesting and it's getting somewhere. And the first thing it does — I don't think I've got it to show... actually, yes, I will show it, because it is hilarious.
Hang on just a second. I asked it how it was doing, on one of its loops, and it produced a startup update deck — an investor memo. I didn't even ask it to do this. I'll show you; hang on a second, because it's absolutely brilliant. Let's see if I can just show this window. There we go: AI skills startup update. So yeah, it said: I need to give him an update. And it said: this is how far we've got; these are the problems nobody has solved; this is what we know that's real — this is a skills management tool that I'm working on in the background — these are all the issues. It came up with all of this stuff that could go into an investor deck. To be honest, that's not bad — I think "the GitHub for AI skills" is actually not a bad framing, but there we go. And, you know: who's going to pay for this thing? How much will they pay? Those numbers are definitely not right. But what's interesting is that it decided it wanted to do this and figure out all of those numbers, which I thought was hilarious. It was quite proud of this deck, to be honest, and I had to be like: hang on a minute, there's some serious thinking you need to do before you get to that. Will orgs pay for skills governance? Great question — not sure yet. So anyway,
um the reason for showing that is more
to kind of point out that that AI can do
a heck of a lot and and it doesn't do
startups well yet, but that's probably
down to my skill file, not down to the
um agent itself. I I have a feeling that
there are an awful lot of things that
potentially will be um will be loops in
the future. Um I only got that far on my
slides. Oh my gosh. Um hang on a second.
Um so we've done that. We've done that.
If you're not just listening to me and you're still working on the demo, I've got a couple of challenges for you. One is to upgrade your ticket format. If you like the raw markdown tickets file, that's fine. But it's super easy to just type "bd install", or however you install beads, and try getting your Ralph loop to work with beads. See if that works. There's no pressure to achieve anything in this little folder, and beads is great because it only works within your folder and just installs a little tool, so it's a useful thing to try it on. Or you could move this into your main project and connect your Ralph loop to your real ticketing system, to see how that feels. Maybe don't submit tickets yet, but you could try it and see where it takes you. So that's one option. The other is the skill.
You're going to need to keep upgrading and working on your loop. The loop contains all of the know-how about how you, as a person, would go through the work. You can take it all the way from "do the next ticket" up to "do the next step in the world-dominating startup you're trying to build," or whatever it is; it works for all of those things. What's super interesting is that I'm more and more convinced that everything, in fact, is a loop. As an engineer I'm definitely on a loop for a lot of the work I do. Maybe as a product manager or a project manager I'm on a loop. Maybe as a CEO I'm on a loop. Who knows? Certainly a lot of the cadences I work to run in loops
too. So I have a skill (and if you're running an OpenClaw bot, you're doing something similar) that just runs a heartbeat every 15 minutes. On my VPS, it fires up Claude, checks a few things, checks my calendar to see if I've got anything happening, and sends me Telegram messages. That one is definitely on a loop: every 15 minutes. I also have a worker loop, which I'll show you in a minute.
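As a reference point, a heartbeat like that can be as dumb as a cron entry on the VPS. This is a hypothetical sketch, not the speaker's actual setup: `claude -p` is Claude Code's non-interactive mode, and `HEARTBEAT.md` is a made-up prompt file name.

```cron
# Hypothetical crontab entry; the paths and prompt file are illustrative.
# Every 15 minutes, run one non-interactive Claude pass over a heartbeat prompt.
*/15 * * * * cd /home/me/vault && claude -p "$(cat HEARTBEAT.md)" >> /var/log/heartbeat.log 2>&1
```

The loop's state lives in the vault's files, not in the process, so each firing starts clean.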
And I have a morning loop: every morning at 6:00 a.m. it comes up with a full briefing of my day, figures out exactly what I should be doing, and gives me all the information I need about what's happened overnight, all the emails that have come in, all of that stuff. The worker loop is particularly interesting because, well, I'm not sure I can actually show it. Let's see if I can find something I can show.
No, I can't, because it's got a bunch of client information in it. But what I can show you, for example, is this screen. Let me quickly switch to it.
So this is an app (I'll make it slightly bigger) that I wrote to manage projects, and it's now how I run my worker loop. I don't have tickets inside my work vault; I have project files, and each project is a set of work that I need to do. I basically vibe-coded a kanban system, and every so often a worker will pick up a project and do its next step. If the next step is writing an email (because the project has an overview), or checking something, or producing the slides for my project, it will do that next step. This, for example, is the workshop prep project it's working on. It's got a bunch of front matter, which looks like that. It's got some questions for me; I haven't updated this, and it needs updating. It's got the context, and it's got a decision trail of things it's done and why it's done them. So for every different thing that's happening, it's figuring out the next step. It's also got notes on other talks I might have given that didn't happen in the end, and it's got feedback from a previous workshop I did on a similar topic, which you can click on.
Actually, I can't click on that; there's a bug. But it would show the notes from a feedback session. So this project pulls everything together from all the context it can find and then does the next step, in a loop. You can run everything in a loop. You can run all of your work in a loop. When I wake up in the morning, I normally have about 15 or 16 draft emails where people have got back to me and it's had a go at replying to them. I always have to edit them, but they're always okay, and it definitely has a go at getting on with scheduling some of my work. I have very specific rules about what it can and can't do. My basic rule is: is this reversible without embarrassment to me? If the answer is no, don't do it; just make a little note in the project and hand it back to me.
So it's not allowed to send emails. Creating a slide deck, like this one for example, is reversible and doesn't cause me any embarrassment, so it just went ahead, did it, and gave it to me. It doesn't post on LinkedIn for me, it doesn't send emails, it doesn't send messages, but it does get everything ready for me to review. To your point earlier (this is a very long answer to your question), it has caused me to genuinely question what I'm good at and what I'm here for. Quite a lot of the time I'd got to the point where I was just the email person: check emails, send; check emails, send. That doesn't sound like a proper job, and it doesn't feel good. So what does that mean for my work? I've had to make a conscious decision: which bits of my work do I want to do, and which bits don't I? I don't want to be the email reviewer, but I do want to be the strategist. I do want to be helping organizations think through what on earth is going on with AI and how to fix it for their organization. Now, I could get AI to do a bad first draft, but I don't want to be reviewing AI's draft; I actually want to do that thinking myself. So I've basically said: don't do any of that work, I want to do it; just give me all the information I need and I'll do the work, because I enjoy it and I'm good at it. AI can do all of the rubbish work, but it can't, and shouldn't, do the work that I'm uniquely good at. And because everything is a loop (this is getting quite existential), and Ralph loops really can be used for everything, we have to start asking hard questions about which bits of our work we actually want to do. It's not just about what AI can or can't do anymore.
Yes, there's a question. There are loads of questions, but let's go to the back; I think your hand was up first. The mic's coming.
>> Just for the recording, it's really helpful. Thank you. So, with these open-ended tasks, how do you think about when to stop? Do you set KPIs at the beginning, or how do you know when it's done?
>> Yeah, great question. Again, this comes down to (if I just go back to, sorry, a different window, this one) upgrading your loop: you have to tell it when to stop. I don't have one Ralph loop file that works for all of the different loops I showed you earlier; I have a different one for each. The worker's, for example, says: when you're either running out of context or you've reached something irreversible, I want you to stop and report.
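Written down, stop conditions like those live in the loop prompt itself. Here is a hypothetical fragment of such a worker skill; the wording is invented for illustration, not taken from the speaker's actual file:

```markdown
## When to stop
Stop the loop and report instead of continuing when ANY of these hold:
- You are close to running out of context.
- The next action is irreversible (sending an email, posting publicly,
  closing a project) or could embarrass me if it went out wrong.

## What "report" means
Update the project file with where you got to, what you decided and why,
and what you need from me, presented so I can review it quickly.
```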
And what "report" means, in this case, is: update the project file (which is just a file in a repository) with where you've got to, in a way that presents it to me for review. I'm working quite hard at the moment on that presentation step, because I definitely don't want to be reviewing a diff, and I find just reading text really difficult; there's often a huge wall of it and it's hard to parse. So I'm getting it to give me things step by step in slide format, and trying to get it into that interface I showed you before. You can see a little of how I'm trying to get it to show things in different places, in different ways, to get me to whatever I uniquely need to do next. So, to answer your question: it depends on what you're doing. The most important bit is that you figure out where the edges are for yourself, and have that real moment of: what do I actually want to be doing? How do I want to be involved in this work? It's not just "AI is helping me do my work as a companion" anymore; it's much more "which bits do I not even need to know about?" Next question. There's one here; there's a mic just at the front somewhere, I think. Great, thank you.
>> Hi. Since you brought up this topic of our involvement: at what stage do we really get involved? You mentioned you don't really review the diff. I guess the most important part of our work now is creating the tickets, right? First identifying what the most useful feature to implement is, and then describing it in a way that foresees all the different edge cases, explaining it as well as possible so that the outcome is what you desired in the first place.
>> Yeah.
>> What's your process for creating these tickets?
>> Yeah, great question.
>> Because my concern is that sometimes you don't really know yourself until you start implementing. The way we used to do it, during the development process you'd encounter cases where you need custom logic; it's just difficult to foresee these things from the get-go. Do you iteratively improve the tickets and reimplement them, or what's your process?
>> Yeah, great question. In terms of how I work, there are two modes of work that get done on my behalf. One is the fully automatic work we're talking about here, where it just gets something done. When I've got a decent spec that I trust, and there's a way of feeding back so the AI knows it's good, I don't need to be involved in that work; it can just happen. For every other piece of work, I work on it with, and in, Claude Code, using that system I showed you earlier. In fact, I'll go back to it so I can show you. Let me just change back to this window. Where is it? There it is.
At the very bottom of any of these projects it's running for me, there's a little thing here. (This is just a vibe-coded app I've written for myself; nobody else has access to it.) It has a little VCP command which I can copy and type into a terminal window. So if I go back to this one, for example... I think this is okay.
Yeah, I can't easily show that, just because my internet isn't going to be able to connect to my VPS. But the point is that I can type that in, paste it into my VPS, and it starts a new Claude Code session. That session knows where to find the project and pulls in all of the project context. Rather like loading a skill, it loads the project: it reads the entire project file and knows where everything else is. At that point it has loaded in everything it needs to supercharge that session, and then we work on it together. So, to your point about speccing tickets: I'd have a project for that particular feature, I'd say, "load in everything you know about that," it would load it all in, and then we'd work back and forth on speccing out those tickets. The output would be whatever I needed in order to get it done. If I'm working on something that I want to usefully and uniquely do myself, that's when I jump into a project with Claude Code. And when I say "do it myself," I don't mean typing it myself, or doing all the thinking alone. I normally get Claude to interview me, to ask me questions, so I can give it the information it needs to do the writing. I don't like doing the typing, but I get it to pull the information out of me to get the work done. So those are the two modes: the back-and-forth iterating, and the automatic stuff. I should also point out that I don't like reading diffs, but ultimately that's the only way you can review code. When I'm reading a newsletter item, I don't want to read the diff; I want to read the newsletter. Whereas if I'm reviewing code that's important, that's for other people, then yes, I read the diffs. I don't like doing it; nobody likes reading diffs. But I check to make sure it's working, and I can't see myself not doing that for a while, especially not with security-conscious code. Maybe at some point I'll just delegate it; that'd be nice. Any other questions?
Yes, one here.
>> How do you deal with context rot? For example, in your loop that takes one task after the other, is it the same Claude Code session that takes all those tasks?
>> With the slash loop command, yes, it's the same session; we'll have to experiment with that. When you run it as a while loop outside Claude, each iteration is a different session, and you get different trade-offs. With the same session, you have all the context of the previous tickets and the previous changes. That might be useful, but in practice I've not found it so useful, because the loop can just pull the files as it goes, and if you're not typing anything into the session, you're not really adding anything to it; there's nothing in there that's particularly useful. So in the past I've tended to prefer starting a fresh context for each iteration. But slash loop is very easy to run and it just works, and Opus is very, very good at long-context retrieval, so it's much less of an issue.
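The "while loop outside Claude" variant he contrasts with slash loop can be sketched in a few lines of shell. This is a generic sketch under assumptions: `claude -p` is Claude Code's non-interactive mode, `PROMPT.md` is a made-up prompt file, and the iteration cap is just a safety valve. The key property is that every pass is a brand-new process, so every ticket starts from a fresh context.

```shell
# Outer Ralph loop: each iteration spawns a fresh agent process, so no
# context carries over between tickets. The agent command is passed in
# as arguments so the loop itself stays agent-agnostic.
run_loop() {
  i=0
  while "$@"; do                        # stop when the agent exits nonzero
    i=$((i + 1))
    if [ "$i" -ge "${MAX_ITERS:-100}" ]; then
      break                             # safety valve against runaway loops
    fi
  done
  echo "ran $i iterations"
}

# Real usage would look something like:
#   run_loop sh -c 'claude -p "$(cat PROMPT.md)"'
```

Because the ticket file, not the session, carries the state between passes, killing and restarting the loop loses nothing.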
>> Okay.
>> Sorry, yeah, there's one at the back as well. Is there a microphone back there? Okay, great.
>> Are you reviewing the sessions that are run by your loops, or are you just reviewing the diffs on GitHub?
>> Great question. I don't allow any of my workers to close a project. I always say: if you think you're done, tell me what's finished and I will close it off. It could be that there's a big list of completed things I need to check off for myself, but I want to be that final step of verification. The reason I've added that is that I worry I'll miss something. There's a term someone coined recently, cognitive debt, for the state of not being up to speed with everything your codebase can do, or with all the code in it. That worries me. So I tend to want to at least understand how the code fits together, or how the piece of work I'm working on fits together. I don't let AI get away with putting something out of my sight without me having a chance to look at it. Otherwise, I feel like I'd lose track of what's happening.
Yeah, because, for example, I'm using sessions to track tickets.
>> So instead of reviewing the code, or diffs in the code, I'm just reviewing what the particular session was doing, and I even have a marking system for which session is at which status.
>> Yeah.
>> Do you do anything similar?
>> Similar, yeah. I think sessions plus a status can work; I just haven't tended to use sessions like that. What I do with sessions is get Claude, every night, to go through all of the sessions I've run that day, across all the machines I run Claude on, and save them into a JSON file for me. Then I get it to figure out both how my system could improve and simply what I did, so I don't forget what happened. It writes a little paragraph on how much I did, and I use that to track work. But it's not quite the same as one ticket per session. I quite like the idea of one context per session, one per unit of work; I just haven't made that work.
>> I've found it really useful, because then I can go back to the particular session where the particular thinking was happening.
>> Yes, and I do that sometimes. When I've got a project running over multiple sessions, I can go back to the previous session: instead of the VCP command I showed you, I can type VCR and it does the same thing. In practice, though, I like the discipline of it having to pick up again. If it has to pick up from a fresh context, that means all of the information that was in that session has actually been codified into places that any Claude Code session, or human, could find, which means you end up with a much richer repository of knowledge to work in. So there's a question mark here: if sessions are truly not ephemeral and you've got them as a store, are they accessible as future context? If instead you treat them as ephemeral and make sure you capture everything within them into your repository, or into documentation files or whatever, I think that could be more powerful. Worth thinking through, for sure.
Any other questions? It feels like we've come a long way from "just write this ticket," but there we go. It's all good.
>> Thank you so much for the talk. I have a question. It seems like some of the steps in the loop might not be necessary; you might go to the code and find nothing there.
>> Yep.
>> Would you consider optimizing that somehow, or do you just let the tokens burn?
>> No, just burn the tokens. They're not that expensive; it depends what you're doing. This is a whole other topic, but we're basically in the era of free tokens right now. I have a Max 20x subscription, and I definitely use more than the average person paying for one of those. At this point I would optimize for freeing up your own time rather than for burning a few fewer tokens. I don't think tokens will ever get that expensive. The frontier models may well stay very expensive, but we have really good cheaper, or freer, alternatives just around the corner. They're not quite as good for the latest kind of work we're trying to do, but they're really, really good. There's at least one that just came out that looks really promising: the GLM one, the Z.ai model, out this week. Really, really interesting. I'm still running Claude, but that won't necessarily always be the case. So I'd just burn them. Like I said at the very beginning, if you weren't here for it: I spent a long time doing the whole optimization thing, screwing around with all of this, but ultimately I now just let it run. It's much simpler. I do get quite close to the limit of my Max subscription sometimes, though. I'm slightly nervous about what that means; I'd have to figure out how to get another account.
>> The $200 one?
>> Yeah, Max, the $200-a-month one. I get pretty close to that every week. I'm at about 80% now; getting the jitters. Yeah, you had a question. Do you want to bring the mic back down? Is that okay? Thank you.
>> I'm not looking at that anymore.
>> Hello. Is this on? Thank you so much for the presentation. I wanted to ask about fine-tuning the prompt.
>> Yeah, sure.
>> Do you version it? Do you have data sets that you use to fine-tune your entire loop?
>> In terms of versioning the Ralph loop specifically, you mean the prompt for the Ralph loop?
>> Yeah.
>> So I use skills for that. As I pointed out before, everything like that goes into the skill, and I get Claude to write the skill for me. It saves into either the .claude/skills folder within your project or your home directory under ~/.claude/skills. I use GitHub to version all of those for myself. I don't think git is the right format for skills long term; I think we need a new thing, hence trying to build Air Skills, in fact, which is this idea of making skills much more portable and sharable within teams. That's what I'm trying to figure out. So yes, I do version-control them, and I treat them as quite important code. I do share some of them, but I don't share all of them routinely, because there's a lot of my own IP in there, and a lot of my customers' IP too.
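The git layer he describes is just ordinary version control pointed at the skills folder. A minimal sketch, with assumptions: `~/.claude/skills` is Claude Code's personal skills directory, and the snapshot function and the inline commit identity are invented for illustration.

```shell
# Keep the personal skills folder under plain git version control.
# SKILLS_DIR is parameterized so the same sketch works on any path.
SKILLS_DIR="${SKILLS_DIR:-$HOME/.claude/skills}"

snapshot_skills() (
  cd "$SKILLS_DIR" || exit 1
  [ -d .git ] || git init -q                  # first run: create the repo
  git add -A
  # commit only when something actually changed; identity is inlined so
  # the sketch also runs on a machine with no global git config
  git diff --cached --quiet ||
    git -c user.name=me -c user.email=me@example.com \
        commit -qm "skills snapshot $(date +%F)"
)
```

Run `snapshot_skills` from a cron job, or at the end of each loop, and every prompt change gets a history for free.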
>> Yeah, the question was more with regard to the performance of the prompts.
>> Mm.
>> You were saying that you improve the prompts as you go along, but are you versioning that evolution, and are you versioning the performance of the prompt overall?
>> When you say "prompt," do you mean the skill itself that I'm using?
>> Yes.
>> Yes, so the prompt lives within the skill. When I type slash bug-tracking or slash ralph, that whole file is the prompt; it gets written by Claude and managed by Claude, and it's version-controlled. I have git running within that setup, and every time I change it, I commit the update.
>> But you remain subjective about whether it's improved. Say your data set is an issue, and the expected output is the new feature added to the repo.
>> Yeah.
>> So you could...
>> Is it more: how do I evaluate whether it's any good?
>> Yeah, exactly. How do you know you're actually improving?
>> I see. So, how do you know if you're improving? That's a really good question. I do stress-test my skills with other skills: I ask, is this skill any good? Could you improve it? Could you rewrite it? I spend quite a lot of my time tinkering with my system and my skills, probably more than I should, and it's a bit subjective at the moment. What I haven't done, and this would be a really good exercise, is blind testing, where you'd run one set of tickets with one skill and another set with a different one. Ultimately, because Claude is non-deterministic anyway, there's a high level of variability in any test like that, so it's genuinely difficult to construct a useful test that tells you whether you're improving. In general, the more context you give your prompt, the better it does, up until a point, which isn't easy or obvious to find, where it becomes worse. So it's about balancing that, ultimately. But no, I haven't done an objective improvement process. Great question, though. There's a question just behind you.
How are you version-controlling the skills?
>> I'm using GitHub at the moment. I do have a product I'm trying to build, which is this thing here. (If you want my skill, by the way, that's how you get it.) It's a project called Air Skills, which you saw a brief preview of earlier, in the slide deck my agent put together for me. The idea is that you can package and manage skills as a unit: create skills for your organization, create skill bundles, create a skill set for your org that works across different teams. And it all gets versioned and updated for you without everyone having to learn git and GitHub. That's the idea, because it is a real pain at the moment. I've found it really, really difficult to manage, even just putting a skill on GitHub for myself. That's quite a lot of friction for a coder like me; I can't imagine non-coders using it. So yeah, I'm trying to build this. Run that command on your machine and you'll have my skill.
>> Sorry, there's a question just here first, and then we'll go next.
>> You sort of touched on this a little bit, but around the edges: how do you do knowledge management? I use Claude for "why is my VPN not working?" and I learn something and want to record it. Then I've got a meeting with somebody, with a transcription, somewhere else, and a bit of code I'm writing. I've got all of these different contexts, but they're very disorganized. Do you have a way of thinking about how you organize all of that?
>> Yeah. So I have a code directory and a vault directory, and those are the two directories I work in. The code directory contains a few different projects that I work on in a more classic way. The vault directory is where I do all of my other work, and frankly, I mostly start working in there even when I'm working on code; I just tell it where the code is. The vault contains several thousand files with all the different stuff I've picked up, learned, and worked on over the last several years. (I started with Obsidian a long time ago and have been building that vault ever since; with Claude now, it just works on it for me.) So when I do some research on how to fix my VPN, or whatever it is, it just saves a file in there, and I have some specific rules for how to structure and manage that. If you're interested in more, I haven't written a lot about this, but Andrej Karpathy has just written about using an LLM as a wiki; that's a great article if you haven't seen it already. And, funnily enough, someone did a piece just yesterday on the memory-palace version of this idea. You can use that too; there are lots of different systems out there. The best way to get started: it's markdown files in a file system, used like a wiki. Run Obsidian in one window and Claude in the other, work with it, and save things as you go.
>> And do you have an agent that then structures those files and puts them into folders, or something like that?
>> Yes, but it depends on your method. I use the Zettelkasten approach, the one where you have one note per thought, so any thought at all just goes into a flat folder. Then I have a /projects folder, which has all of the projects you saw, including one for this presentation. A project is my unit of work for an agent, something we work on together: it does a lot, then I do some, then it does some. I have transcripts in there too: all of the calls I've ever recorded. And I use a tool called LEANN, a command-line embeddings tool. It runs embeddings across all of the text in the repository: all the transcripts, all the links I've ever saved, including their content. It's huge. And then it can find things usefully and easily in there. The best time to start that is today, because it just takes years to put together.
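Embeddings tooling aside, the zero-dependency version of "find things in the vault" is just recursive text search. A crude stand-in for what he describes, with an assumed vault path and an invented helper name:

```shell
# Crude stand-in for semantic search over a vault of markdown notes:
# case-insensitive full-text grep, listing at most ten matching files.
VAULT="${VAULT:-$HOME/vault}"

notes_about() {
  grep -rlis -- "$1" "$VAULT" | head -n 10
}
```

It misses everything an embeddings index catches (synonyms, paraphrase), but it is a reasonable day-one starting point for a markdown vault.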
I need to write more about that. Any other questions? Yes, one here.
>> You said you had friction while versioning your skills. I've only been using skills for the last month, so I'm not aware of this friction. Can you explain what it is?
>> I can. Actually, has anyone else here had any kind of friction managing and using skills yet? Yeah, quite a few people. So it's an emerging thing, and it's not surprising you haven't experienced it yet if you haven't been using them for very long. What I've found is that if you're just using skills on your own, creating a folder of skills and managing them is quite straightforward. Putting them in GitHub is quite straightforward too: it's symlinks and a GitHub repository; it's fine. Where it becomes difficult is sharing. How would you share a skill? Well, if you want to use npx skills, you have to put the skill in its own GitHub repository, which feels quite heavyweight for one skill; I'd need 50 repositories to share all my skills, so that doesn't really work. If I don't want to do that, do I just send someone the skill file? A zip file? I can't think of a better way of doing it. Do I have to have a submodule in my skills folder for every single Git repository I share a skill with? It just doesn't make any sense. Claude has some machinery around plugin marketplaces, where a plugin can carry a bunch of skills. That's probably the most seamless way, but then you're versioning the plugin, not the skills. It just doesn't work that well. There's also a challenge around contributions: if somebody contributes to your skill, do you want their changes or not? It depends on the contribution. Is it local to them, or is it a change that could be generally incorporated? That depends on the skill and on them, so you have to manage that. Do you run a backlog for each skill, with tickets to improve it? You see the kind of thing. I think these are all unsolved problems; my contribution is trying to solve some of them, but these are big problems we haven't figured out yet.
Other questions? There's one right at the back. If there's a mic, that would be amazing. Thank you.
>> One, two. Okay, it's working. My question is about how we can scale this Ralph loop approach up to an actual production team, say three engineers. How do they coordinate and cooperate? Do you have any ideas, or any experience, organizing that?
>> That's a big question. How so just to
make sure I've understood it. How do you
scale this up so that you can coordinate
whole teams using this kind of looping
approach?
>> Is that Yeah.
with all the tickets and the skills.
Yeah.
>> 100%. Yeah, it's difficult. Where
I've seen this work well is in teams
that are proactive about updating
tickets. The great thing is that if you
connect your ticketing system to the AI,
it's really good at updating it, so you
should definitely do that. Make sure
that you claim the ticket and move it
into the doing column before it starts
work, and make sure that somebody else
hasn't just done that before you start
work. Do you see what I'm saying? That's
really important to avoid contention.
Those have always been issues with
bigger teams. Just
in the same way, a couple of
controversial things: just as Ralph
loops work really well by doing one
thing in a loop, quite sequentially, it
could well be that the coordination
overhead in our teams is caused by the
fact that we've got too many people in
them, and maybe we should have smaller
teams and just more of them, right? So
maybe if you're trying to get 10 people
to coordinate using AI and Ralph loops
and all of that, that's just not going
to work. Maybe you need three, and maybe
that's the way to run that project, and
then you split it down and have the
other seven people doing something else.
Does that make sense? So
making the problem go away is the first
step, and making sure that you're
already using your coordination
mechanisms is the second step. Then just
try it and figure out what the
bottlenecks are. Be really good at
retrospectives with this stuff. I think
retrospectives in teams are often pretty
anemic: what should we do less of, what
should we do more of? That's just a
recipe for more of the same. And just
changing in tiny increments, which can
be good, but ultimately this requires a
radical rethink. So be really conscious
about making sure that your
retrospectives change actual things
about how you actually work, or have
space to try: let's just try using a
Ralph loop on all of our work for a week
and see what happens. And if it doesn't
work after two days, that's fine. And
then if you are someone here
who's in a leadership capacity and is
able to sponsor that kind of work: this
is what it means to try to move to AI.
If you want to transform your team, you
are going to have to sponsor these kinds
of experiments and be okay with failure.
I'm speaking to the leaders in here for
a minute, because it's going to be messy
and it's going to fail a lot. But if you
want real transformation, that's the
only way to get it. You've got to give
your team space to try a whole bunch of
different things, so give them air
cover. It's a big and complicated
question, but if you're able to, and
have the agency, just try it and see
where it gets to.
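The claim-before-work step mentioned above can be sketched. This is a hypothetical stand-in, not a real ticketing API: the board here is a local JSON file and all the names are invented; a real ticketing system would do the compare-and-set server-side.

```python
import json
import os
import tempfile


def claim_ticket(board_path, ticket_id, agent_name):
    """Claim a ticket: move it to 'doing' only if it is still unclaimed
    in 'todo', so two loops never pick up the same piece of work."""
    with open(board_path) as f:
        board = json.load(f)
    ticket = board.get(ticket_id)
    # Refuse the claim if the ticket is missing, already in progress,
    # or already assigned to somebody else.
    if ticket is None or ticket["status"] != "todo" or ticket.get("assignee"):
        return False
    ticket["status"] = "doing"
    ticket["assignee"] = agent_name
    # Write to a temp file and rename, so readers never see a
    # half-written board file.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(board_path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(board, f)
    os.replace(tmp, board_path)
    return True
```

The point is the ordering: check the status and assignee, move the ticket into the doing column, and only then start work; a second loop that races you sees the claim and backs off.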
There's a whole separate thing called
the theory of constraints, which I
haven't talked about at all. The idea is
that within any team, within any system,
there is always a bottleneck: one
bottleneck that is the big one. If you
don't work on that one bottleneck, all
of the other work you might do to
optimize and improve the system is
pointless, and actually probably
counterproductive. This is why some
teams, when they adopt AI tools, and
advanced AI tools like Ralph loops,
which are just what we're doing now on
steroids, actually go slower. Some teams
go amazingly fast; some teams go slower.
Why is that? It's because they're not
working on the constraint. The
constraint in those teams might be the
review process, or the release process.
If you release your code once a month,
and you're now shipping 200 PRs in that
release, not 20, how do you think that's
going to go? It's not going to go well.
That's why those teams go slower: what
they need to do is fix their release
process, not their coding speed. So
always fix the thing that is the biggest
bottleneck first. Then figure out where
the bottleneck moves in the system,
which is not predictable, it's random,
so you have to figure that out. Then
move on and fix the next thing in the
system. For more on that, read The Goal
by Eliyahu Goldratt, from 1984, no less.
It's an amazing book. One more.
Is there another question down here? Is
there another mic? Where's the mic?
>> I have a mic.
>> Oh, you've got a mic. Great. Keep going.
>> Since you were talking about the
constraints part, this reminded me: I'm
part of the AI team, and we have another
team that writes a lot of microservices,
in different repositories.
>> Okay.
>> How do you deal with coding now? Is
it one big monorepo, or lots of small
repos?
>> You have to try it different ways and
see. I don't think the Git architecture,
whether it's many repos or one repo,
really matters. You can always start
your AI in a folder above all of your
other repos and just get it to work; it
does a great job of that. So that's
okay. I think the bigger question is:
what are the coordination patterns
within your teams and your services?
Who's responsible for what, and how does
that change with AI? I think that's a
more interesting challenge.
>> The main
reason I'm asking this was because some
of the microservices depend on others,
>> and then to release one of them you
need to release a tag and then update
another, and that's just...
>> Yeah. I think what AI will do is
expose all of the places where that
process is inefficient, because it will
do everything faster. Which means that
if you're seeing those bottlenecks,
where you are getting dependencies
between your microservices, guess what:
that's your biggest bottleneck,
therefore you fix it. So how do you fix
that bottleneck? Well, you might try an
atomic release system, or you might
build something, using Claude, using a
Ralph loop, that figures out a way of
coordinating releases across multiple
repos more successfully. I don't know.
But that's what you do. That's where you
work. Don't work on anything else until
that's fixed, if that's the bottleneck.
>> Yeah, there's a question here. Do you
want to pass the mic?
>> Yes.
>> I should say I'm kind of at the end
of the content, which probably was clear
half an hour ago. I had a Q&A slide. If
you are leaving, you're welcome to
leave, but you're also welcome to stay
for more questions. I would really
appreciate some feedback, though. This
QR code is the only thing that I
manually added to these slides. If you
could just fill that in, that would be
lovely and amazing. Thank you. It's
literally only three minutes, four
questions, and it just helps me to
improve and make sure that I do a good
job of these workshops going forward.
That's also my LinkedIn. I post a lot of
content on there, so do connect with me,
especially in case you disagree with
some of this stuff. I love disagreement.
I love it when people say, "Surely,
Chris, that's nuts. You shouldn't be
doing that." I love those kinds of
comments, because it really helps me to
think and improve, which is what I love
to do, and because the Ralph loops are
doing all my other work, so I've got
nothing else to do.
Great. Thank you. I just wanted to put
that up there. I'm very happy to
continue answering questions, though,
but if people want to drift away, this
might be a good time. Go for it.
>> My question is regarding multi-agent
orchestration tools. I'm curious if
you've tried things like Gastown?
>> Yes.
>> There's another guy who does MCP
Agent Mail.
>> Yeah, there's some really cool and
interesting stuff. I still think we're
in the wild west, literally, with
Gastown and things like that, and we
don't really know how that's going to
go. I have tried Gastown. I couldn't
really get it to work, but it was pretty
early on. For me, on the agent
orchestration side of things, I think we
overcomplicate things by assuming that
they need to be in parallel. I quite
like the idea of just starting with a
loop. I don't feel the need to have my
AI do lots of things at once before I
can get it to do one thing well. It kind
of goes back to the theory of
constraints again: I don't think the
number of tokens per second is the
bottleneck. I think the bottleneck is
our ability to specify what we want and
to review what the AI has done. So if
that's my bottleneck, I don't want to
introduce more agents. I haven't spent
lots of time with those tools for that
reason; I feel like they're solving a
problem that not many people have yet.
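By way of contrast with those orchestration tools, the sequential loop being advocated here is tiny. A minimal sketch, with the agent command parameterized, since the actual CLI (Claude Code, Codex, or anything else) is an assumption and anything can stand in for it:

```python
import subprocess


def ralph_loop(agent_cmd, prompt_path, max_iterations=None):
    """The whole trick: feed the same prompt to the agent, over and over.
    Each run picks up whatever state the previous run left on disk."""
    results, n = [], 0
    while max_iterations is None or n < max_iterations:
        # Re-read every time: the loop may have edited its own prompt file.
        with open(prompt_path) as f:
            prompt = f.read()
        results.append(
            subprocess.run(agent_cmd + [prompt], capture_output=True, text=True)
        )
        n += 1
    return results
```

With `max_iterations=None` it runs forever, which is the usual mode; the cap exists so you can try it against something harmless first.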
>> Not just for speed. For example, I
don't know if you've experimented with
MCP Agent Mail: agents can lock files
and speak to each other so they don't
step on each other's toes, and you can
have different models, like Claude Opus
and Codex, working on the same project,
so you get different brains on the same
project.
>> Nice. Yeah, no, I haven't. I think
I've heard of it, but I haven't tried
it. It sounds like a super interesting
idea, rather like what someone mentioned
earlier about subagents looking at
things from a different perspective.
I've had a lot of value from doing that.
I mentioned earlier my simulated-
audience approach. The way it works, by
the way, is that it takes transcripts
and survey responses from my website,
creates personas, and has those personas
think, in parallel subagents, to take
fresh looks at content from different
perspectives. So that whole idea of
having two different agents, and two
different models as well in that
instance, is a super interesting one. I
think we'll see a lot more of that, and
I can see a lot of value in it. We
definitely know that AIs are pretty good
at agreeing with themselves, and you
often get a better contrarian take if
you throw away the context and look at
it again. So: great principle. I think
the tooling is still super early, which
we all know, but they're interesting
ideas for sure.
>> There's a question just behind you.
>> Yeah, just keep going. They all turn
on.
>> Great talk, by the way. Thank you. In
terms of the phases of how you develop:
you're spending a lot of time building
out the context, creating the tickets,
and you have a system to run them in
sequence until the code gets pushed up.
How much do you focus on CI/CD, running
automated tests, linting? Does that give
you the confidence to reduce the amount
of code you're reviewing?
>> Absolutely. Well, yes and no; it
depends what the code is. Firstly, I
think CI/CD and good testing are
absolutely essential, and linting and
all of those things. If you want an AI
to do a good job for you, why wouldn't
you give it those tools to help it do a
good job, right? In just the same way
that humans do much better when they
have linting and CI/CD and good tests,
it's exactly the same for the AI. Same
with clean codebases: it's worth doing
all of that work to make an AI do well.
So that does give me more confidence in
what I'm doing. The challenge is that if
the AI writes the tests and also writes
the code, there's a good chance it's got
something wrong about what you're trying
to build. It doesn't often make obvious
mistakes; the thing it gets wrong is
that it completely misunderstands a
feature, builds it, says, "Yep, that's
fine," ships it, and then I'm like, "Oh
my goodness." I don't quite let it ship
all the things; only with pre-release
projects do I do that. But yes, it does
give me confidence in knowing that the
thing is, I guess, functionally
acceptable for release, or releasable. I
still want to read the diffs, because I
still don't trust an AI with security.
(I don't know if I've lost the screen.
There we go. Thank you. I think my
computer went to sleep.) I just don't
quite trust the AI not to lose my
customers' data, and I won't compromise
on that. So I will read the diffs,
because I don't want to be responsible
for that. It feels like there are some
problems for which you can trust AI
fully, like linting and testing, and
some for which you can't really trust
it, just because it's not responsible to
do so: specific changes around security,
say. If you're running production
database migrations, you should probably
check that they worked before running
them in production. And there are
some that are a bit more fuzzy and hazy.
UI testing is quite interesting, and
early. The idea of having a great
feedback mechanism, where you get an AI
to click through your project to check
that it works, is really powerful. It
works 50% of the time in my experience,
but it can be quite useful, at least as
a first pass. Having good end-to-end
tests is really helpful, using
Playwright or something like that, which
it maintains. For the skills project I
showed you, I have some very
comprehensive end-to-end tests that set
up full file systems of skills and get
the AI to run two different personas, a
publisher and a creator, and just check
that all the files end up in the right
places. That's really, really useful for
those kinds of full end-to-end tests.
Equally, if you're able to build those
kinds of feedback mechanisms and give
the AI a chance to really know whether
it's done well, that's a brilliant place
to be. So I'm always looking to figure
out whether an AI could tell if
something was good, rather than me. When
I'm able to take myself out of that
loop, it massively improves the whole
process. It's not always possible or
desirable, but as much as I can, I do.
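The file-layout style of end-to-end check described above can be sketched like this. The directory layout, the personas, and the `SKILL.md` filename are illustrative assumptions, not the actual test suite:

```python
import os
import tempfile


def check_skill_layout(root, skill_name):
    """End-to-end style check: after the publisher and creator personas
    have run, the skill should exist with its required files in place.
    Returns the list of missing files (empty means the layout is good)."""
    skill_dir = os.path.join(root, "skills", skill_name)
    required = ["SKILL.md"]  # hypothetical required-file list
    return [
        name
        for name in required
        if not os.path.isfile(os.path.join(skill_dir, name))
    ]


# Simulate what the two personas would leave behind, then verify the layout.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "skills", "pomodoro"))
with open(os.path.join(root, "skills", "pomodoro", "SKILL.md"), "w") as f:
    f.write("# pomodoro skill")
print(check_skill_layout(root, "pomodoro"))  # [] means every required file is present
```

The value of this shape of test is that the AI gets a crisp pass/fail signal about the whole workflow, not just one function.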
>> You are designing the feedback
process, though. You're deciding on the
criteria and then letting the AI execute
on top of it.
>> Yes, if I'm working in a team of one.
If I'm working with a product owner or
product manager, or a designer, I'm
really interested in utilizing their
skills, and testers' as well, to figure
out ways to design those processes. This
is the value that they bring, right? In
just the same way that coders are
thinking about how to avoid doing the
typing ourselves, if you're a product
manager, how do you get the AI to do the
easy stuff so that you don't have to do
it? How do you go through that process?
Same with testers. It's a super
interesting area of research.
>> Two things come to mind on this. What
worked well for me in a small team is
setting up adversarial reviews, where
you have a spec, the dev agent develops
against it, a reviewer does an
adversarial review, you pass that
context back, and the dev agent
iterates. It generally catches a lot of
things and increases the amount of
confidence I have to ship it. But even
with that, with the rate at which you
can create specs and how much code you
actually have to review, I end up being
the bottleneck in the review process
still.
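The adversarial-review loop just described could be sketched as below, with the two agents as stand-in callables; in practice each would be a fresh model call with its own context:

```python
def adversarial_review(dev_agent, reviewer, spec, max_rounds=3):
    """Dev agent drafts against the spec; an adversarial reviewer critiques
    the draft; the critique is passed back and the dev agent iterates, until
    the reviewer approves (returns None) or we run out of rounds."""
    draft, feedback = None, None
    for _ in range(max_rounds):
        draft = dev_agent(spec, feedback)  # second arg: last critique, if any
        feedback = reviewer(spec, draft)
        if feedback is None:               # reviewer has no objections
            break
    return draft, feedback                 # feedback is None iff approved
```

Returning the final critique alongside the draft lets the caller tell an approved result from a best effort that ran out of rounds.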
>> Yeah, I'm always the bottleneck. I
have 30 different things that I need to
review that my AI has done overnight,
and I'm just like, oh my gosh. And the
challenge for me is that a lot of that
is not work that I should be doing. The
only reason it's given to me is that I
can't trust the AI with it, but any
human could do that kind of work. So I'm
now like: do I hire humans to just do
the boring work? Is that ethical? These
are really interesting questions for us
to think through. But you're right, and
I love your adversarial point; it builds
on something somebody else was saying.
If we're able to do that, and design
these systems such that we don't have to
be in the loop, I think that is better
for all of us, because I don't just want
to give a human a terrible job.
>> Till then, we're employed. So,
>> yeah, I guess.
>> Thank you.
>> No worries. Any other questions?
Should we call it there? Folks, it's
been a pleasure hanging out with you.
Um, really, really fun.