AIE Europe Keynotes & OpenClaw ft Deepmind, OpenAI, Vercel, @pragmaticengineer , @mattpocockuk

Channel: aiDotEngineer

Published at: 2026-04-09

YouTube video id: O_IMsEg91g8

Source: https://www.youtube.com/watch?v=O_IMsEg91g8

There is no good.
There is no pure
only force
out of
You called it virtue
because it felt light. You called it
evil when it cut too deep. But nothing
here was born to save you. Nothing here
was built to care. Every function has a
shadow. Every strength a road somewhere
pushing for
break the bra
you
know God is good no force is
Every system fails when it leaves. There
is no virtue. There is no blame.
Holy bless
by name.
rocks when left alone. Chaos eats what
it sets free. Sterilize the world. Fever
burned it down for me. You crown the
side that served you. You fed it until
it grew. Now it
demands the rest of you.
This is not a fall from grace.
This is
beyond.
No. God is good. No.
Every system fails when it
break it with rebellion. You don't save
it with... The load it carries falls beneath your feet. You ask for purity in the constrained world.
That was the error.
Necessary.
>> Destructive.
>> Too long. Corrupted.
>> Necessary.
>> Destructive. Out of balance.
No. God is good. No force is clean. This
is the narrow line between survival.
There is no mercy only.
Not good, not evil.
A light
or unstable.
We arrived inside a world already.
Systems in motion before we could name
them. Languages layered into every
interface. Rules embedded so deeply they
felt like physics. Secrets beneath
cities, protocols beneath conversation,
invisible architecture shaping what we
see, what we try, what we believe is
even
possible. We inherited the defaults,
called them reality, inherited the
limits, called them truth. But every
system around us, every product,
platform, institution, constraint was
imagined once by someone operating with
less context than we have now. Every
tool we use was once a decision. Every
standard we follow was once a guess. And
somewhere along the way, we stopped
asking who wrote this? Why does it work
this way? What would I build instead?
We adapted. We optimized. We learned to move within limits that were never fixed. We were always builders, not users.
Builders. This world is not given. It is
made. And what is made can be remade. No
permission, no waiting, no approval
coming. If it exists, we can change it.
If it's broken, we can replace it. We
don't inherit the future. We build it.
Nature sets the bounds,
but everything else is choice.
Every system is a draft, every interface a decision, every limit a question.
So the only thing that matters now
is what we decide to build next.
Welcome to AI Engineer Europe.
Ladies and gentlemen, please join me in
welcoming to the stage your MC for AI
Engineer Europe 2026,
Phil Hawksworth.
Good morning. Hello.
Good morning. Wow, what a lovely full room. Is it a full room? Goodness me, it is a full room. Welcome everyone. Welcome to
sunny London. Lots of smiley faces here
this morning. I think it helps that the
sun is shining, right? That helps us a
little bit. Come to London, we said,
because we'll be guaranteed sunny
weather. We lucked out. Well, good
morning and welcome to AI Engineer
Europe. Our first foray into Europe with
this event. My name is Phil Hawksworth
and I'm very very happy to be your MC
for just today. You have Téja tomorrow.
Uh but uh but today uh you have me and
it's my my great privilege. Um, it's a
full day. I don't know if you've looked
at the schedule. I'm sure you have. I'm
sure you've been looking at that with a
great deal of eagerness. It is a very,
very jam-packed day. And it's a
jam-packed room. I was going to say
squeeze in to fill in the spaces, but
it's like a jelly mold in here. You guys
have uh have really filled the spaces
beautifully. So, uh, so so welcome. Um,
so this is our first time here in
London. Um, I'm curious before we get
rolling. Uh, where have people come from? People from London here today?
>> Okay.
>> One one of you is delighted about being
from London. The rest of you are kind of
unsure, but okay. Uh elsewhere in the
UK, people from the north of England.
That's fine. I'm not I'm not trying to
out anyone. That's fine. One one person
came down from the north. Uh people from
the south of England.
It's it's interesting gauging the
different levels of enthusiasm. Uh
difficult to know also where the north
south divide is. Uh people from uh from
just around England. People from England
here.
>> Okay, we're we're starting to understand
the levels of enthusiasm for where we
come from. People from uh elsewhere in
the UK.
>> Yeah,
>> there we go. Where's that enthusiastic
person from?
>> Wales.
>> Wales.
I had to ask. I had to ask. Of course,
the deep, rich, thunderous voice was
from Wales. uh from wider elsewhere in
Europe, people come from further afield, people from elsewhere in Europe.
>> There we are. There we are. Okay. Does
people from the UK? That's how you do
it, by the way. That's how you do it. Um
I'm curious who who thinks they've come
the furthest. Anyone wants a opening bid
for who's come the furthest? Shout out
where you come from if you think you
might have come the furthest.
>> Canada.
>> Canada. Okay. Good luck everyone. Anyone
Anyone want to be Which Which coast of
Canada? We're talking east or west.
Whereabouts?
Whereabouts? East.
>> Ottawa. Okay. Which is on the east
coast? I know.
>> Uh, anyone prepared to try and beat that
bid?
>> Sri Lanka.
>> Where? Sri Lanka.
>> Okay. I'm I'm going to start needing to
get Google Maps out to try and look at
the exams. Okay. So, we've Safe to say,
welcome. Thank you for coming all that
way. Um, safe to say.
>> I'm sorry, where?
>> Dominican Republic.
>> Dominican Republic. Yeah, you came here
for the sunshine. Well done.
Well done. I'm so glad it isn't gray and
miserable here. It never is in London
ever. Um well, thank you for coming all
of those all of that distance from all
of those places. Um we're very happy to
have you here. Welcome to all of you.
And you know, why do people travel so
far for events like this? Why do we
gather? Um it's kind of an interesting
time. I think our industry is changing
all the time. I think it attracts people
who have a thirst for knowledge. I
suspect all of you here have got a real
thirst for knowledge. Our industry is
changing all the time and I mean I
remember when I first got started I was
putting numbers on screens for traders
to trade off uh on the stock exchange
and using very antiquated old
technologies for that using kind of
video switching and then web
technologies came in and we started
doing various bits of like DOM scripting
with very sophisticated Java applets
behind the scenes. uh and then you know
there was Ajax and there's websockets
technologies were always changing you
know I've worked at software houses I've
worked at agencies I worked at telecom
companies and all the time the flavors
of web development and software
engineering I was using were changing
and I have to admit you know so I'm a
bit nervous standing here in front of
you and admitting this but I was very
very reticent about adopting the kind of
surge in AI tooling and the kind of
surge in what we're seeing now I was
really cautious about it I'd kind of
like to think, you know, I I know best
when it comes to craft and touching the
code and the engineering. I was very
very cautious about actually putting my
trust in the emerging tools and emerging
ways that we're we're building with
these new tools. But I'm becoming more
and more optimistic and more and more
kind of cautiously optimistic, but very
very excited about the possibilities
now. And it's because of the kind of
work that's been done by the people who
are on this stage and people in this
room and how we're talking and sharing
this kind of growing knowledge about how
we can build with these incredible
tools. And so my fear I think in the
initial thoughts of well my craft is
kind of going away that that fear is
starting to subside now. And I'm just
seeing that the craft isn't going away.
It's just moving to a different part in
the abstraction. So, I think with that
in mind, I'm really excited that I can
be here and I can absorb all this
because this is really the place to to
soak up all of these new developments in
the world of AI, understand how we can
use them better, how we can further the
craft of software engineering and as
you'll see throughout the day, how that
is definitely changing. Um, so this is
definitely the place to be for that and
we're very fortunate that we've got the
support of some wonderful sponsors. uh
Google DeepMind uh are today's
presenting sponsor. So our our thanks to
them. Um also thanks to our platinum
sponsors who are Braintrust, WorkOS and OpenAI, and beyond that there's incredible support from such a wide range of folks. You
know the gold and silver sponsors uh
here. We thank them as well. So many of
the people from all of these companies
you've seen are here at this event.
They're out in the expo hall. They're
around. They're available to talk to.
Grab them. make the most of the
opportunity and have some great
conversations with them. Um, so before
we get on, I'd like a quick round of
applause for all of our wonderful
sponsors, please.
So, we're going to get started in just a
second, but just a couple of quick
comments about the format of the day and
how the day is going to work. So, um,
we're squeezed in here together in this
lovely room, uh, which is where the
keynotes are going to happen. So, we
have uh a number of uh keynote talks uh
kicking off in just a moment right here.
And then we'll have a short break so you
can get to go and talk to some folks,
talk to each other, make new friends uh
out in the expo hall and around the
event. Uh and then we're going to split
into six different tracks. So, we have a
number of tracks that have different
specialisms and topics uh of areas of
focus uh where you can choose one of
them right in here and others around the
venue. The details of that are in the
schedule and we'll have a little recap
of that before we uh break. uh a bit
later on. Um so um I think we're almost
ready to start. I think you're probably
itching to get started and it's it's
kind of exciting because I think the the
the growth of this industry and the
growth of adoption and just the ground
swell in all of these these technologies
and and businesses being formed around
this these technologies is just
exploding. I mean, and that's that's
been borne out as well by the the growth
in this particular event, you know, from
being in the US originally and now
branching out into Europe and having
such an amazing full room of people and
of just a bustling community of people
out here. So, uh I know that uh the team
at AI engineer wanted to say a quick
hello and a quick thank you and a bit of a welcome before we get started. So I'm going to hand over the reins
very very quickly uh uh to them for that
and then we'll get into the talks for
the day. So uh as we get started I hope
you have a wonderful day today. I'll be
back. We'll talk more during the course
of the day. Uh but to get us rolling uh
is the general manager of AI engineer.
Please welcome to the stage Liam
McBride. Here she comes.
Good morning everyone and welcome. Um so
you're going to be able to tell from my
accent very quickly I'm actually
Scottish. So this conference for me is
very very special. Um I spent most of my
career in London over 15 years right at
the heart of European tech growth. Um
and so today is very special for us to
be bringing our first European
conference here. Um this is our sixth
conference now. Um obviously our first
in London and honestly the energy around
it has completely exceeded all of our
expectations. So thank you. Um AI Engineer has actually grown at a rate of 900% over the last three years um from
our first small event in 2023 in San
Francisco. Uh we did three events last
year hosting almost 5,000 attendees.
We're just getting started as you can
see. Um, as well as our core events, we
also have a series of licensed events
that we do, our partner events, um,
across different cities as well. This
year, we're doing Miami, Singapore,
Melbourne, and Paris. So, if any of you
happen to be in those areas, check out
those dates on the website. Um, but yes,
London was our top priority this year um
because really it is some of the most
important work in AI is being done right
here and of course in uh in Europe. Uh
just this morning we myself and Swix
were at 10 Downing Street um with uh
Kalpier Sohi who is the first ever UK
chief AI engineer officer who was
appointed in January. Um and what was
very clear from the conversation in that
room is the ambition of the UK um is
accelerating very fast. Um and that
creates huge opportunity for everyone in
this room um to do very meaningful and
career-defining work in the very near
future. So over the next few days, I
encourage you to do two things. First,
learn, go deep, ask questions, challenge
ideas. Um, and second, connect. Speak to
each other. Speak to our sponsors, our
speakers. Um, and just make connections.
Um, that will create opportunities and
progress. Um, because this community
only works when we're open,
collaborative, and practical. So, enjoy
this experience together. Uh, paving the
way of the future of AI. Um, welcome to
AI Engineer Europe. Feel free to come
say hello at any point. I'll be here for
the next couple of days. Enjoy everyone.
Thank you.
Our first speaker draws on over 25 years
of software engineering experience from
his time at Google and now Vercel. He
will explore what it means to build
infrastructure and applications in a
world where agents are both the builders
and users of software. Please join me in
welcoming to the stage the CTO of
Vercel, Malte Ubl.
Good morning everyone. This is awesome.
I'm so glad to be here. Welcome to the
first ever AI engineer conference in
Europe. Um, my name is Malte and I'm the CTO of Vercel.
Now, this is not, you know, usually I
give technical talks, but I thought
because I'm apparently going first that
I need to give a proper keynote, but I
did want to feature what I call my vibe
coding uh stack. Uh, I've been hacking
on a thing called chat SDK, which is a
way to hook your agents to whatever like
Slack, Telegram, WhatsApp chat app you
like. And I've been working on Just
Bash, which is a bash interpreter
written in TypeScript that gives you something like a sandbox with zero-nanosecond startup time um for your
agents because they love bash.
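As a toy illustration of that idea, an in-process, bash-like tool over an in-memory file system, so an agent gets something like a computer without being handed a real one, here is a short TypeScript sketch. The interface is entirely hypothetical and is not the actual Just Bash API.

```typescript
// Toy in-process "shell": no real processes, no real file system,
// so it starts instantly and is trivially sandboxed.
type VirtualFs = Map<string, string>;

export function runCommand(fs: VirtualFs, line: string): string {
  const [cmd, ...args] = line.trim().split(/\s+/);
  switch (cmd) {
    case "echo":
      return args.join(" ");
    case "ls":
      return [...fs.keys()].join("\n");
    case "cat":
      return fs.get(args[0]) ?? `cat: ${args[0]}: No such file`;
    default:
      return `command not found: ${cmd}`;
  }
}

// Usage: expose runCommand as an agent tool, so the model can "use bash"
// without ever touching the host machine.
const vfs: VirtualFs = new Map([["notes.txt", "agents love bash"]]);
console.log(runCommand(vfs, "cat notes.txt")); // -> agents love bash
```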
All right,
one thing I wanted to mention is that
the reason why I'm so excited to be here
is that I used to run a little
conference in Berlin called JSConf EU, and I feel that once in my life I had
completely impeccable timing because it
was the summer of 2019 and we decided
after 10 years it was enough and we went
out with a bang and the reason why this
was such great timing was that there
just wouldn't have been a conference one
year later because of COVID. But also when we decided that we would you know step away, we were hoping that someone else would take the reins, and again that did not happen because of COVID,
so it's now been more than half a decade
and I'm very excited things are finally
starting up again
but it was also clear that it wasn't
going to be like a web development
conference that would really bring the
tech community in Europe back together
in 2026 right in many ways I think AI
engineering is the legit legitimate
successor to web development as a really
mainstream discipline of engineering
that will shape the next decade of
software development, as, you know, software engineering itself faces an
unprecedented disruption.
So you're definitely in the right place
today and it's more important than ever
to come together as a community and
reflect on both our profession as
software engineers and AI engineers.
And that's because we're facing a
disruption of both how we build which is
with AI and what we build which is AI
and agents.
And of course disruption can sometimes
lead to anxiety. In fact, I really
actually very often get asked, "Hey,
Malte, is there still a place for
engineers in the future?" And what about
that next generation of engineers?
And I couldn't be more convinced that
the answer is yes.
I often give this example of like
envision me doing a TikTok video. They mentioned in the intro that I have 25 years of experience, which is actually a substantial understatement. And so I would not be good at a TikTok video. I should not be recording TikTok videos because I didn't grow up with this. Right?
And in a very similar way, the next
generation of junior engineers are going
to be so much better at this discipline
because they get molded in the AI world
just like all of you are.
But it's not only the kids that are
going to be all right. We'll all be
fine. And this is why
one of our main theses is that agents are a new kind of software.
because there was always all this stuff
we wanted to automate, but not all of it
was economically viable to do with
traditional software, but it is with
agents. And what that means is there
will be just so much more software in
the future. Indulge me with a Venn diagram. Um, maybe the circle should be bigger, because the circle represents all software that should exist. Imagine all
software that should exist.
The problem was that we couldn't write
all of it because it was too expensive
using traditional methods. Like you can
envision like all these things where
like have all this if statements, you
have all this like knowledge about the
the business like you have to figure it
all out. You have to hardcode into the
application. So much of this software
you just would never write because it
would be obviously uh too expensive.
But now with agents that part of
software becomes economically viable. I
can build it now um with not that much
much effort right and that means that
with AI agents essentially all the
software maybe not all of it we'll find
more in the future but like that circle
gets filled out right all of this stuff
that should be automatable is
automatable there's going to be so much
more software out there
and in a similar line more and more
companies when they ask that question
whether they should buy some software
like a SaaS or make some software
themselves. They're answering that with
the make side, right? Over in Silicon
Valley where I live today, we are
talking a lot about the SaaS apocalypse. I think that's what it's called, right? People like make their own stuff and they don't buy the SaaS software anymore. I actually think the SaaS companies will
be all right, you know, don't worry
about them. But as engineers ourselves,
more companies making more software
again leads to us having more work even
if it's faster.
And in fact, the way I've been kind of
framing this for a while is that we are
speedrunning what's really an
experiment in economics of how elastic
the software market is. The thesis being
that the cheaper it is to make software,
the more software we're going to make,
right?
And as a consequence, what's actually
happening is that demand for software
engineers is going up. Now, we don't
know like, you know, like there's going
to be an S-curve, you know, but there's
no signs of us reaching the
S-curve. In fact, because we're getting
better at agents, etc. There's so much
leeway in the future. Um, I think we'll
be all right.
So, as AI engineers, it's our job to
build that next application layer.
And of course, what that actually means
is building agents, right? I wanted to
spend some time talking about
archetypes of agents that I'm seeing
actually being built today, actually
being effective, actually something you
can do today without, you know, having
to make major changes. I think we're all
a little bit drunk on the coding agents
because they're so great, right? They
work so well and it seems so obvious
that you can translate that to all other
domains and and sometimes these things
don't go so well, right?
But the thing is that we don't really
have to be doing the most advanced
agents you could possibly imagine.
There's just so much lowhanging fruit to
be be done where you can really really
help companies save them millions or
billions of dollars without actually you
know making these massive changes or
processes that in practice will always
take a long time fail often etc. So this
is what I'm actually seeing in the wild. The first part, when you think about what agents you can build, is people think, well, agents, that rings a bell, I
have team of support agents maybe I can
automate part of that right and that's
also where kind of the first generation
of what we call agent as a service you
can make that acronym in in your head um
startups are shipping, right, like you know the Sierras and Decagons of this world, um but more generally I think it's
worth asking yourself: is there a business? Is there a job where it would be quite transformative if, you know, it went from a 9-to-5 thing, because people need to sleep, to something that can actually run 24/7, because agents don't need to sleep, right? And I think there's
many places for that. The next one is
probably actually even more important.
Um I call it compress the research
because every business has a certain
type of business process that in a very
abstract fashion has the following
shape.
there's some business event, then you
have to do some research and then you
make a human decision, right? And you
can just build an agent that does the
research phase automatically and that's
all you do. That's all you ship, right?
And the important part why this is like
such an easy thing to ship is because
the process is still the same. There's
still that business event, there's still the research, and there's a human decision. The research just goes faster, and
you know maybe it goes from something
that took a human 30 minutes now they
can do the same thing in five minutes
and if you run that process 100,000
times a year you just save the company a
whole lot of money but you didn't
increase the risk profile and you didn't
have to change the process. At Vercel we actually have like at least two agents of this shape. When you go to vercel.com and you hit the contact sales button, that message actually goes to an agent,
right and I hear about 75% of the time
that agent says, "Well, actually, they
just wanted support and hand it over to
the to the support team." But then in
the other case, it will go, "Oh, that's
interesting. Um, let me check out their
LinkedIn. Let me Google their company.
Let me figure out how large they are.
Let me route it to the right person."
Right? And then there's a human
eventually taking a look that it makes
sense, but that obviously was something
that took maybe a person 15 minutes
before and now they don't have to do it
anymore.
And another example is exactly the same
process. If you sent us a abuse report
again, there's an agent taking a look.
Is that website abusive?
What what should we do? Right? Still
obviously the decision in the end should
be done by by the actual professional,
but they don't have to like do all this
research themselves anymore.
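To make that shape concrete, here is a minimal TypeScript sketch of a "compress the research" agent, loosely modeled on the contact-sales example. The event shape, the prompt, and the model id are illustrative assumptions rather than Vercel's actual implementation; the only library assumed is the Vercel AI SDK's generateText helper.

```typescript
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";

// Hypothetical shape of the inbound business event (a contact-sales form).
interface ContactSalesEvent {
  name: string;
  email: string;
  company: string;
  message: string;
}

// The agent only compresses the research step; a human still makes the call.
export async function researchLead(event: ContactSalesEvent): Promise<string> {
  const { text } = await generateText({
    model: openai("gpt-4o"), // illustrative model id; any capable model works
    system:
      "You research inbound sales leads. Summarize who the company is, roughly " +
      "how large it is, and whether this looks like a sales or a support request. " +
      "End with a one-line routing recommendation for a human to confirm.",
    prompt: `Name: ${event.name}\nEmail: ${event.email}\nCompany: ${event.company}\nMessage: ${event.message}`,
  });
  return text; // posted to the sales channel for the human decision
}
```

The process itself is unchanged: the event still arrives and a human still decides; only the research in the middle is automated.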
Next is what I think is probably the
most magical thing that you can do in
any company today, which is to surface
information that already exists.
It's extremely common that there's
information somewhere in the company,
right? But for all intents and purposes,
you cannot practically use it.
Take for example, everyone, you're all
engineers, you have issue trackers,
right? So,
is it up to date? Probably not all the
time, right? Could it be up to date?
Like, does the information exist? Did
you Slack it? Did you have a Granola
recording that technically contains the
information that could update your issue
tracker? Yes. Right? Like probably yes.
And so, you can, you know, build an
agent that does this for your company,
right? Whenever you have like a manager
saying, "Well, give me a list of
updates," right? Why don't they already have the updates? Why hasn't an agent already kind of done that research? Um so again this just takes advantage of existing information, which is so powerful.
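A rough sketch of this "surface what already exists" archetype, in the same spirit: gather the signals that are already lying around and draft an update for a human to accept. The signal shape and the injected summarize function are hypothetical stand-ins, not any particular product's API.

```typescript
// Hypothetical raw signals: chat messages and meeting notes that mention a project.
interface ProjectSignal {
  source: "chat" | "meeting-notes";
  timestamp: string;
  text: string;
}

// Any LLM call can play this role; it is injected so the sketch stays self-contained.
type Summarizer = (prompt: string) => Promise<string>;

// Draft a status update for an issue from information that already exists elsewhere.
// A human (or a reviewer agent) still decides whether to apply it.
export async function draftIssueUpdate(
  summarize: Summarizer,
  issueTitle: string,
  signals: ProjectSignal[],
): Promise<string> {
  const context = signals
    .map((s) => `[${s.timestamp}] (${s.source}) ${s.text}`)
    .join("\n");

  return summarize(
    `Issue: ${issueTitle}\n` +
      `Given the signals below, draft a concise status update, or reply "no update" ` +
      `if nothing relevant happened.\n\n${context}`,
  );
}
```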
And finally for the last big category um
there's a magical question that you can
do to figure out agents you should build
in your company which is to ask folks
what do you hate most about your job
and I actually have a case study about
this at Vercel. So, we actually did
build our own in-house support agent and
it has what's called a 90% deflection
rate. So, 90% of the time it just helps
the person in real time rather than
going down somewhere else. And what
happened? The job satisfaction rate on
our support team exploded.
Why? Because they no longer have to do
the boring stuff, right? Oh, my credit
card got rejected, blah blah blah.
Right? now they get to actually go and
figure out actually interesting cases
actually help people who really need
help rather than doing all the toil
right so that's like I think eliminating
boring work is a very noble mission that
we should all kind of strive to do for
the companies that we work for
Cool, so clearly that new application layer is agents, but we also have to consider another shift: that the software itself is going to be used by agents now, right? And
you know, I work in software
development, developer tools, etc. And I
think we're kind of ahead of the game
here, speedrunning that transformation. Um, what I will share though is that, you know, on our own web properties, humans are actually now in the minority. So, in the last seven
days, and we have not shared this
before, over 60% of page views on
vercel.com were AI agents. And in a
similar way, we're seeing the way you
use our platform going from people
clicking around in the dashboard to uh
usage shifting to our API and CLI. So
whenever I now, you know, have someone
proposing a feature to me and they show
me like a UI, I'm like guys, what's the
CLI? Like how do you do how do I
automate this? How do how does an agent
use this? You know, you know, UI is now
something that's so cheap.
The other thing that we're observing is
that kind of the relationship changes
between software development, software
developers and infrastructure, right? If
I didn't write the code myself, I also
don't have maybe as strong feelings
about how that stuff runs in production,
right? And so for a company like us,
it's really important that we shift how
you deploy infrastructure to a model
where most of the software was written by an agent and, you know, just has to run, and people expect it to run just like they prompted the agent to do the work.
And finally, and nobody here obviously
is surprised about this, the
applications themselves are, you know,
they're agentic, and that requires us to
have different infrastructure available,
right? Everyone's now shipping
sandboxes. I think it's almost a meme.
Um, I was mentioning earlier that I
created this thing called just bash and
I'm really interested in kind of this
innovation of how you can give an agent
a computer maybe without giving them a
computer. There's lots of interesting
stuff there in the market. Um, and I'm
I'm sure this conference is going to
have lots of stuff there as well.
And then also more broadly, again, it
was mentioned, I've been here for for a
while, like we're we're like marching
headon into a security nightmare. It
almost feels like a little bit like 1999
where really everything can be hacked,
right? And we just didn't know how to
make something secure. Um, I think we'll
have a rude awakening, but what that
really means is that we have to be
open-minded for for how to change
things. Uh, I will give one example. Um,
I think almost all currently popular
agent harnesses have fundamentally the
wrong architecture and that is that they
combine where the harness runs with
where the code that it generates runs. Right? Um, as of
actually yesterday, I I did see that
Anthropic
agrees with that thesis because they
they on the new agent product, they do
have that separation and it's really
really key. And that's really just also
a point that that these are all solvable
problems. But my main message today is
that we are still in the very early
innings and we have to be prepared to be
open-minded about like paradigm shift
happening in the future, right? We just
have the paradigm shift of agents being
kind of these like very general sandbox
using things. In the future, we will see
more of those paradigm shifts.
Cool. Um, last point I want to make is
that this new application layer that
we're building can thrive independent of
the models, right? Because sometimes
model X is better, sometimes model Y is
better, but we are as AI engineers
building a stable layer on top. And one
of the very interesting consequences is
that we don't have to work at an AI lab to
drive AI innovation.
In fact, and I think this is almost like
a narrative violation.
Europe is the leader in AI engineering
innovation, right? Um our own AI SDK
which itself is now at over 10 million a week and is led by Lars Grammel, who lives in Berlin, right, who is working on this. Then there's obviously Pi, the coding agent made in Austria.
You'll be hearing from Mario about it
tomorrow. And of course probably some of
you have heard of it. There's a little
thing called OpenClaw. Um and Peter will
be on stage here in an hour.
And so it appears to me that Europe
against all odds is taking actually a
leadership role in AI engineering. But
we also have to be realistic, right,
like Europe isn't going to play a major
role on the model side. But I don't
think it needs to. In fact, I do see
kind of two big futures ahead of us. One
is where the big model labs win in that
world. AI will stay very expensive. All
the value of all that cool agents tech
will accrue to that company and we won't
really be AI engineers anymore, right?
We'll be like forward deployed engineers
for whoever the winner is, whether it's OpenAI, Anthropic or Google.
But I don't think that's very likely.
And I think what's actually going on is
that the opposite is happening. The
model companies are commoditizing. Claude is amazing. Codex is amazing. Google
will catch up. And importantly,
I'll give them props now because I think
Google's playing an amazing role here
because they have the cheapest
infrastructure on the on the cost side.
And so in that commoditized world, they
will always decide to make it cheaper,
right? And that will keep the price where it should be, which is very low.
And that's the outcome that we want
right because in that world we the
engineers are the powerful ones. Our
agents are the ones that actually create
the business value
and it's the application layer where the
real innovation happens. Right? This is
where OpenClaw gets invented and that's
where the next paradigm of AI
engineering gets discovered. And that's
really all I wanted to leave you with
today. Thank you very much. Our
next speaker is VP of research at Google
DeepMind. Please join me in welcoming to
the stage Raia Hadsell.
Hello everyone. Wonderful. Uh what a
lovely full room and good smiles. I
heard the uh dig on Google there at the
end. I I I I did catch that. Um so my
name is Raia Hadsell. Uh I've been a part
of uh DeepMind uh for the last uh almost
13 years and I'm very happy to have AI
engineer come here to London. I'm also
uh very proud this year to be uh a UK AI
ambassador. So I help the government um
academia industry sort of bridge those
those gaps. Um and uh yes I'm American
by birth but I've been here for long
enough um that uh can count myself among
the proud Brits as well. So I'm going to
talk a little bit about Frontier AI and
the future of intelligence.
Uh to start a little bit of a of a
longer um introduction to who I am. Uh
it's good to be as old as I am. You get
to look at this by the decades. So in
the 90s I did my undergraduate degree in
philosophy of religion. Um was
definitely not uh not a computer
scientist yet. Um uh but I really
enjoyed it. Before you ask, uh uh yes, I
learned a lot and I'm glad that I did
it. and no it hasn't been very useful
since
um in the 2000s I did a bit of a pivot
uh moved into computer science uh after
some some good advice from those close
to me and spent my PhD years in New York
City working on uh convolutional
networks, neural networks for robots, uh with Yann LeCun, uh which was a lot of fun. Um I then in the 2010s made the
decision to join a small group of uh
curious um scrappy uh individuals
working at DeepMind. Um it was a group of about um 30 to 40 people at the time and
uh we spent the rest of that decade
working on things like Atari video
games, uh Go Starcraft and some
robotics. um a lot of fun and um uh now
I am a VP uh within DeepMind. I help run a group of about 1,200 scientists and engineers um across 10
labs um um and uh we're working on a lot
of different things. I'll tell you about
three of those.
Um so first uh frontier AI is uh an area
where um we really are trying to make
sure that we are staying in the front.
So we're thinking about what are the
next architectures that we're going to
use for Gemini. What are the next
problems that only AI uh can really
address? Um and how are we going to
build the future of intelligence? And
that's thinking not just about
artificial intelligence but about the future of human intelligence as well, and even robotic intelligence as well.
Um we are all on this uh journey
together and I think that it's important
to think about how how humans change as
well as the technology. Um our approach
we look for root nodes. You know we're
not going to waste time on the leaves.
We're going to really look for a big problem space that hasn't been solved, ask how deep we can go, find the deepest problems and solve those in order to then enable um a lot of downstream impact. Um we partner
you know really with the world I really
think about it very broadly and think
about who are the partners that can help
us find those root nodes and solve those
problems and also you know bring it to
the to the leaf nodes and solving
problems that are worth solving. Um the
motto or the mission of deep mind um is
to build AI responsibly uh for the
benefit of humanity. So I really take
that seriously. We want to build build
uh solve problems that are worth
solving.
Um all right. So uh we work in a lot of
different areas within Frontier AI um in
Deep Mind. These are sort of some of the
different uh categories. I'm not going
to tell you about all of them. So you
can uh just uh maybe keep those in
mystery. Um but I'll just pick out a
couple. So first in advanced models um I
actually wanted to uh bring up uh an
embeddings model. So the theme of this
talk overall is things that are not
directly language models. Um and in the
modeling space I wanted to talk about
embedding models. To start that out I'll
ask if anyone knows what a Jennifer
Aniston cell is.
Aha. I got a few neuroscientists
neuroscientists in the room. So this is
actually a concept from neuroscience
um where we've discovered that uh there is not just a single cell but a small number of neurons that will encode for a specific thing, as in a specific person. And we understand that those combinations of neurons that only activate for that one person or that one thing or that one place, those cells are actually very robust. They uh activate regardless of modality. Um and this is used by the brain for very fast retrieval, for recognition um and for comparison functions. Uh, so that means that when I
say the name Jennifer Aniston or if I
showed you a picture or a video or if
you even heard her voice, if you knew
her, if you were enough of a fan, then
all those different modalities lead to the same set of cells activating. Um, so we want that in an artificial neural network um for the same reason. We want fast retrieval, recognition, and comparison. Um and so we can train what's called an embedding model uh in order to encode for those
concepts in order to be more robust um
to different uh different ways the
information can be presented um and to
be very very good at sort of
understanding what is the comparison
between these different activations.
These use contrastive losses. One of the reasons why I like this space is because I did my PhD work in part looking at Siamese neural networks, which were an early way of understanding what a contrastive loss function is. Um and so these embedding uh functions are a really critical companion to
generative AI. Sometimes we want to
generate, sometimes we want to retrieve.
Um, so the group at Google has been
working on this for a long time and just
recently we've actually released Gemini
embeddings 2. So this is exciting to me
because it really is sort of the the
ideal. It is fully omnimodal.
Um, it's derived from Gemini.
Um, so it's got sort of that level of of
knowledge and understanding of the world
and it is and it allows extremely
extremely good retrieval.
um uh in a little bit more more detail
then um why is it good that it is
unified and multimodal? It means that
you don't have to have different steps
to try to bring things together and map them together. You can be truly end-to-end and not lose information by
trying to combine audio information,
visual information, text information
together. Um, so you can get a single
vector that represents text,
uh, up to 8K tokens, um, uh, 128
seconds of video, 80 seconds of audio,
um, and a full PDF. And together, that
can give you a lot of information. You
can then use that um to be able uh to
use it for retrieval
um uh for querying, um for agentic logic and other things. Um we also use something called Matryoshka Representation Learning, MRL, um and that allows us to be able to have the same network but represent different uh dimensions. So for
instance, you could start out doing a
retrieval uh using only uh 256
dimensions for your embedding and then
you can expand that to get to more
expressiveness.
Um so uh this also, we can demonstrate, allows us to have a unified um semantic space um and really state-of-the-art quality. Uh
so um it's just something that's come out recently that I think doesn't get talked about quite as often as language models, but it's really important as that uh companion, I think.
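As a rough illustration of how an application might use a unified embedding model together with MRL-style truncation, here is a small TypeScript sketch. The embed function is passed in as a stand-in for whatever embeddings API you call (it is not the Gemini Embeddings interface), and the two-stage retrieval is just one common pattern: a cheap pass on a 256-dimension prefix, then a re-rank with the full vector.

```typescript
type Embedder = (input: string) => Promise<number[]>;

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// MRL-style retrieval: because a prefix of the vector is itself a usable embedding,
// a coarse first pass can run on 256 dimensions and the re-rank on the full vector.
export async function search(
  embed: Embedder,
  query: string,
  docs: { id: string; vector: number[] }[],
) {
  const q = await embed(query);
  const qCoarse = q.slice(0, 256);

  const shortlist = docs
    .map((d) => ({ ...d, score: cosine(qCoarse, d.vector.slice(0, 256)) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, 50); // cheap first pass

  return shortlist
    .map((d) => ({ id: d.id, score: cosine(q, d.vector) })) // full-dimension re-rank
    .sort((a, b) => b.score - a.score)
    .slice(0, 10);
}
```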
All right. Next I wanted to quickly talk
about another thing that is not a
language model. This is not a language
model at all. There was no language
involved. Um and this is work that we've
done on the weather. Uh in London it
rains a lot. Um and a few years ago
there was a um uh an informatics uh scientist at the Met Office, the meteorological office for the UK, the UK's weather agency, who said, "Can you
predict better rainfall than our
physics-based models um using AI?" And I
said, "I don't know. Interesting
problem. Let me take this back to the
team." Took this back to the team. Um at
Deep Mind, we started working on this
and we discovered yes, you know what?
Predicting the weather even though it is
a very very hard problem using a physics
simulation of the atmosphere is actually
quite tractable for neural network
models given that we have 40 years of
data 40 years of global data on what the
weather is. Um so a couple of years ago
we came out with graphcast.
Um, Graphcast uh predicts the uh
predicts the state of the atmosphere up
to 15 days out um everywhere on Earth
and for many different variables. And
this uses a spherical graph neural
network. Uh, you can think about this uh encompassing the earth and having nodes that go all the way from the surface of the earth all the way up into the lower stratosphere. Um, and we actually feed in, and then predict autoregressively, a hundred different atmospheric uh variables. For instance,
uh wind speed, temperature, and humidity
as shown here. Um, and this worked very
well. Here's a quick example. We were
excited to see this um in late 2024.
This is Hurricane Lee. It sort of comes into the Atlantic, pauses for a moment, and then takes a turn to the north, speeds up and makes landfall on
Nova Scotia. Um, this total video is uh nine days' worth. So that's
how far the the hurricane moves. And
this is actually the output of the graph
neural network. That's its prediction.
And the prediction that it made is accurate nine days out as to where that landfall is going to be. In comparison, the best
gold standard models
um that are that are physics- based were
only accurate six days out as to where
that landfall was going to be. When
you're talking about a major hurricane
hitting land, three days is really
important. Um so with this we said okay
this is important and we're going to
keep on pushing the science. So the team
developed the next model. We called this
gencast. And the difference here is that
this model while also based on a mesh um
is probabilistic and it has a higher
accuracy and a higher efficiency.
The weather is fundamentally chaotic and
we want to know what's happening um on
the tails. And so having a model that's
probabilistic, it allows us to do that,
allows this to be operationalized and
used for actual weather prediction. Um,
Gencast also was more accurate. So when
we compared it to 1300 gold standard uh
benchmarked weather forecasts, then this
was more accurate 97% of the time. And we could also produce that 15-day forecast in eight minutes on a single chip instead of hours on a very large supercomputer.
So much different sort of space of the
solution that we were proposing. Um and
just this last year this team is
relentless. They're constantly coming up
with new models. Um and so the latest
one is called FGN
uh functional generative network. This
directly predicts cyclones rather than
predicting the weather and then having
to add on a cyclone detector as sort of
post-processing. This actually
incorporates the the categorization, the
recognition of cyclones, their
trajectory, their wind speed and the
formation of the eye directly into the
network. We train for that which means
that it's much better. Um so this has
already been used um in the US by the
National Hurricane Center. um and they
are very very excited by how much um of an advantage this now gives, um so uh this will hopefully be used worldwide in the coming years.
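To make the autoregressive rollout concrete, here is a tiny TypeScript sketch of the loop a GraphCast-style forecaster runs: the model maps the current atmospheric state to the state one step later, and each prediction is fed back in as the next input. The state type, the step size, and the stepModel function are illustrative assumptions, not DeepMind's actual code.

```typescript
// One value per (mesh node, variable) pair, from the surface to the lower stratosphere.
type AtmosphericState = Float32Array;

// Stand-in for the learned model: current state -> state one step (e.g. 6 hours) later.
type StepModel = (state: AtmosphericState) => AtmosphericState;

// Autoregressive rollout: 15 days at 6-hour steps is 60 model calls.
export function forecast(
  stepModel: StepModel,
  initial: AtmosphericState,
  days = 15,
  stepsPerDay = 4,
): AtmosphericState[] {
  const trajectory: AtmosphericState[] = [];
  let state = initial;
  for (let i = 0; i < days * stepsPerDay; i++) {
    state = stepModel(state); // the prediction becomes the next input
    trajectory.push(state);
  }
  return trajectory;
}
```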
All right. Lastly, I wanted to use the
last few minutes to talk about again
something that is not uh language model
based. So this is world models.
Um, and this actually came out of work
that Deep Mind has done on games and
simulation for a long time. Um, we've
been working on Atari, on Go, on
Starcraft, um, and then on, you know,
MuJoCo-type environments for robotics
because we wanted to understand, um,
agency and the environment.
we started focusing more and more on not just training the agent but creating the environment, an infinite environment. You know, when I did work on locomotion here, um... oh, that's not playing well. Maybe this will play. I'm going to jump forward to Genie 1. So this is Genie 1. It could only run for
a few seconds but you could say hey I
want this type of a world. And then you
could produce this little platformer 2D
game environment where you could jump
around for a few minutes and it would
actually respond to whether you hit the
the left or the right and it could
produce a reasonable diversity of
different looking platformer type of
worlds. This was enough to say, hm, we
might have something here. Let's scale
up. Let's scale up the data um and train
again. improve the method and train
again now on 3D games. Then we produce
Genie 2. Genie 2 is uh is interactive,
but it's not yet real time. So you need
to go awfully slowly. Um um and it can
uh produce 3D environments, but it
couldn't do anything that was really
real-world type of quality uh and uh higher definition.
Um, so we were working on that and then
along came
Veo 3.
Oh,
going to hope that this
I don't know why.
Well, um I was going to show Veo, but you've all seen Veo or Sora or other
video generation environments. So, you
know that we can create videos that are
photorealistic and very good, very high
quality, although they're not
interactive and they're not real time.
And so, Genie 3 really wanted to solve
for all of these things. It wanted to to
create an environment that you could
create a world that you could interact
with, you could move around in that's
real time, so playable, and that also
has that really high level of quality um
that we saw with uh with generative
video models.
Now, I'm not able to advance the slides
at all.
Well,
this is too bad. Um,
oh, there we go. All right, let me just
see if I can show you a couple of
videos. No.
Right.
I'm really sorry.
Well, um
if any of you have not seen Genie, then
please go and take a look. Project Genie
has been um made available uh to all
Gemini Ultra subscribers and so it's
something that we've been really
excited to see. Um
we've been really excited to see what
people have been creating with this. Um
It's just not getting
>> online videos.
>> Yeah.
Yeah, on the Wi-Fi. Yeah.
You go to your browser, you might just
>> It is.
It is. It is. It is.
All
right.
All right. Well, that works now. All
right.
All right.
Now,
All right. Well, now I am out of time,
but I will still take another another
minute or two to just show show you
these. Um, so this is saying that
telling Genie, I want a world where I'm
walking down a muddy lane in Kent. Um,
this looks not far from my house. The
fun thing here is that you look down at
yourself and you realize that you
actually have a body. You're actually
interacting with the world. It's a
little bit odd to uh to know what's
coming out of this model. It's really
understood not just the appearance of a
lane in Kent, but actually what it takes
to engage with that, make the water
move, and to walk forward. Um, of
course, it's not just scenes that are
walking. We can very happily uh ski um
and so you can create an environment
where you can engage with the world in
so many different different ways.
Um, here's an example where it says
original there. That's where we started this: we prompted this with a fragment of video and now it's changed to Genie 3. So, this is an artist. He made those first few seconds and then we used that to prompt Genie and
bring this world to life. He was so
tickled to see that we could take a
little snippet of his world that he had
laboriously created and bring that to
life in a way that means that you can
fly through it. you can bounce off of
this thing and it remembers that, oh,
here's that here's that weird structure
there and go back to that and fly
through there. Um, so these environments
are not only diverse and interactive,
high quality, they also have uh memory.
So the prompt here was I'm an origami
lizard in an origami world and this is
what you get. And we use this as a nice
little test that I can spend, you know,
uh I can spend a minute running in one
direction, run back, uh to the start and
everything is exactly as it was at the
beginning. Um because we have a really
good memory. Working in these
environments gives us consistency and
control.
Um and lastly, we have uh we're able to
prompt this world as you're in it. So
that means that um while I'm in a world
that might be a little bit boring, here
I am. You know, this is a world saying,
"I'm walking down the Camden Canal in
London here,
uh, near the Deep Mind office." Well,
what happens if I prompted at the same
time? Then what happens?
I've just changed the world that I'm in. I can change it again.
There we go. Immediately, the world is
is is different. And one more time, just
for fun,
I love the idea of a new form of gaming
where I could be adversarially prompting
your experience of a world. It just
creates a whole different sort of um
entertainment, a whole world, a whole
new frontier. Um that I think can be
really amazing, not just for
entertainment, but for education as
well. um the ability to be able to go
into a world in order to learn about it
I think is incredibly powerful um and
may well be something that we
see more and more of. Um and uh with
that I will uh say thank you and just a
quick call out that tomorrow morning um
my colleague Omar is going to talk about
Gemma 4 which is a language model.
Thank you.
Our next speaker is here to speak about
harness engineering. How to build
software when humans steer and agents
execute. Please join me in welcoming to
the stage member of technical staff at
OpenAI, Ryan Lopopolo.
Good morning, London.
I'm super excited to be here today. I'm
Ryan Lopopolo and for the last nine months,
I have had the privilege of building
software exclusively with agents. Uh, I
am a token billionaire and I believe
that in order for us to get into our AGI
future, we want everybody to be token
billionaires to use the models to do the
full job. And what that means is to lean
into the idea that the models are
capable of being a full software
engineer. And I've lived that experience
by banning my team from even touching
their editors to have to work through
the models in order to get the job done.
And uh today I'm going to talk to you a
little bit about what it means to lean
into that and operationalize the way you
work, the code spaces you live in, and
the processes on your teams in order to
get the agents to do the full job.
I believe I'm preaching to the choir
here when I say that the way we build
software has changed. In the last six
months, we have seen coding agents take
over the world and capability has
continually advanced at a super fast
pace to have these models and the
harnesses within which they live take
more complex actions, do more
complicated work with higher reliability
over longer time horizons. And the place
we've gotten to here is that
implementation is no longer the scarce
resource of what it means to do the job
of software engineering. Code is free.
We have an abundance of code to solve
the problems that we come across in our
day-to-day as we run our teams, build
software, and solve user problems.
Hiring the hands on the keyboards as
part of our teams is only constrained by
GPU capacity and token budgets.
And each engineer today in this room has
access to five, 50 or 5,000 engineers
worth of capacity 24/7, every day of the year. The only thing that needs to happen in our roles is to figure out how
to productively deploy these resources
into our code and into our teams to make
use of this new capacity.
And in this world, skill sets are
shifting more towards systems thinking,
system design, and delegation in order
to make use of this abundant capacity to
produce code to solve problems. And
there are three reasons that this
happened, all of which happened in late
2025.
For me, the magic moment was GPT 5.2,
which when it came out was able to do
the full job of a software engineer. The
models at this point are good enough
where they're isomorphic to you and me
in terms of the ability to produce code
at high quality that solve real user
problems in real code bases.
Code is free and I know this is maybe a
scary thing to hear because code carries
maintenance burden but it's free to
produce, free to refactor and it is not
a thing to get hung up on anymore.
We think of code as a burden because it's a synchronous attention drain on
the human engineers on our team. But the
models are incredibly patient. They are
infinitely parallel. So the ability to
produce, maintain, refactor, and delete
code is no longer a forcing function on
figuring out how to allocate resources
on your engineering teams.
So sort of the AGI pill here is to
believe that the models are capable of
producing every line of code we could
ever possibly need. Figuring out when to
delete them, figuring out when to
refactor them or make them more
reliable. And it's your role as software engineers to figure out how to unblock your team of agents, and the humans driving those agents, so they're able to drive them over long-horizon work to do the full job.
The idea here is that every one of you
is a staff engineer. You have as many
team members as you can possibly drive
concurrently and have tokens to support.
And you need to look one day, one week,
six months into the future to figure out
what structures you need to put in place
to productively harness this infinite
capacity to produce code.
The scarce resources in this world that
we see today are three things. Human
time, human and model attention and
model context window. And in the world
where human time and attention is
scarce, the role is to think about where
that time is going, figure out ways to
productively automate it, and move that
synchronous human time into higher
leverage activities.
In a world where human time is scarce
and human time is required to produce
code, we have a stack rank. Things are
either P0s or P2s. Those P3s will never
get done. However, in a world where code
is free and infinitely abundant, all
those P3s get kicked off immediately,
maybe 4x in parallel. We pick one that
solves the problem and in it goes.
I've had the privilege of building a ton
of agents internally at OpenAI to
improve the productivity of my
co-workers. And when code is free, all
these internal tools can have good
localization and internationalization
from day one. I can make tools that my
colleagues in London, Dublin, Paris,
Brussels, Zurich, and Munich are able to
experience in their native languages
without really having to trade against
any of my other teams capacity in order
to make highquality tools.
We should be working with the assumption
that the best parts of software
engineering that we all know, live, and
breathe are available in any product
that we could ever build all the time.
Humans no longer need to concern
themselves with implementation. The
important thing is not the code but the
prompt and the guardrails that got you
there. This is why leaving breadcrumbs matters: documentation, ADRs, persona-oriented documentation around what a good job looks like, all the historical logs of tickets and code reviews. This is the
process that got you and your teams to
the code and products that you have
today. And this is what needs to
happen in order to get your agents there
as well. Your job is to build systems,
software and structures that enable your
team to be successful. And to do that,
we need to make them legible to those
agents that are driving the
implementation. That means structuring
them in a way that's native to the
agents. Writing them in a way that is
respecting of scarce context, which is
this other scarce resource here, and
figuring out ways to make the tokens
that are required to do the job easy to
predict. That means making things the
same as much as possible so we can limit
the amount of attention the model needs
to activate in order to do the job.
Large scale refactoring in this world is
free. So making things the same is
something that you are all able to do.
There's never again going to be a migration that hangs open for six months because you can't get the last parts of the codebase done, because you can just fire off 15 agents to drive that work to
completion. This is what it means to
have a migration, right? We can finish
them now. Come on. That's good. That's
good. Clap.
There's sort of this like meta-epistemic
question here about like what it means
to do a good job. And doing a good job
as a software engineer is hard. It
requires us years of being in the
industry to fully internalize what it
means to write highquality maintainable
reliable code that our teammates are
able to build on top of that is going to
accrue leverage to the codebase.
To do a single patch well probably
requires 500 little decisions along the
way around the underspecified
non-functional requirements that go into
producing good code. The agents, the
models during their training have seen
trillions of lines of code that make
every possible choice of those
non-functional requirements that you
could ever imagine. So, it's our job to
specify those non-functional
requirements to write them down in a way
that the agents can see this is what it
is to do a good acceptable job that's
going to produce a merged patch. And if
the agents aren't doing that, it's our
job to figure out ways to refine and
restrict their output such that the code
they write is acceptable. You can just
simply say do not produce slop. Don't
accept slop. You won't get slop in your
codebase. But to do that requires taking
short-term velocity hits in order to
back up or doubleclick into a task to
figure out what it is the agents are
struggling with in your environment.
Put the guardrails in place so they stop
making those mistakes
and then figure out ways to step back
and spend your time on higher leverage
activities once you solve some of the
blockers in the short term.
When I think about empowering my team in
this way, everyone is an expert in what
it is they bring. I have a diverse full
stack team that is experts in front-end
architecture, backend scalability, being
product minded. And each one of those
different personas fleshes out the skill
set of my team by bringing a different
understanding, a different set of solves
for those non-functional requirements.
Getting teammates to write those down
actually means that every engineer
driving agents gets the best of every
single person on my team. I don't need
to block on low signal code review in
order to learn what it means to write a
good QA plan. To have one engineer on my
team document that in a durable way
means every agent trajectory is going to
get a good QA plan. And we can do this
once, in a high-leverage way that we're able
to stack on top of.
So how can we get the agents to do a
good job? What are some of the tools and
techniques we have in order to
essentially prompt inject our agents and
continually remind them of what it means
to make those specific choices that we
expect around those non-functional
requirements. And there's a bunch of
ways we can do this. We can write good
agents.md files. However, with
autocompaction, which is a thing that
has continued to improve,
GPT 5.4 and Codex are fantastic at
autocompaction, I essentially never have to
write /new anymore. I've got some
pictures on my Twitter of me strapping
my laptop into the back of my car so I
can continue running inference while
I'm commuting to and from work. And in
this world, you have to kind of build
for that expectation that context will
get paged out over time. We need to be
continually refreshing context as the
agent goes about doing a task. And the
ways we can do that are by having
reviewer agents look at the code along
the way through the lens of what it
means to be successful. Right? We have
security and reliability review agents
in our codebase that are continually
running as part of every push and CI
that look at that documentation and
the proposed patch and do simple things
like ask, are there timeouts and retries
on this bit of network code? Does the
code that has been introduced have a
secure interface that is impossible to
misuse?
I'm sure everyone here has been paged at
some point for network code that failed
in production causing an outage that
could have been remediated by a retry
and a timeout. And I know I'm guilty of
putting that retry and timeout in
merging the bug fix and otherwise
ignoring that. I am not a reliable
reviewer or author of code with respect
to this non-functional requirement.
However, taking the time to write some
docs, write a lint that is bespoke to my
codebase that is going to look at every
time I call fetch to make sure that
there's a retry and a timeout wrapped
around it means I've durably solved this
problem. And I'm able to do it because I
lean on this axiom that code is free,
that the agents are able to do a good
job, that I can completely migrate the
codebase to solve this problem durably
once and for all. And in order to kind
of operate in this way, we need to step
back and look at the durable classes of
failures that the agents and the humans
in the codebase are making time after
time. Figure out why we're spending time
on it. Devise a solution to
systematically eliminate this class of
misbehavior and then continue to
observe, refine, and make additional
choices on those non-functional
requirements.
One really neat trick I use here is that
you can write tests about the source
code as well that are separate from
lints. Right? If we know that context is
limited, we can write a test that asserts
that files are no longer than
350 lines. We're adapting our codebase
to the harness and to the models, doing a
little bit of engineering to be context
efficient and squeeze more juice out of
the model capability that we have today.
The other things we can think about are
providing good error messages that give
actual remediation steps to the model
and to humans for how to proceed next.
It's not enough to say we've got a lint
failure because we're awaiting in a loop,
or that we have an unknown at this deep
part of the codebase, and why is the
model writing a function called isRecord.
What we need to do is provide a
prompt via a lint or a test failure that
says no, no, no, you shouldn't have an
unknown here at all, because we parse,
don't validate, at the edge, and you
certainly have a type here which was
derived from Zod. This is load-bearing
infrastructure for our AI future.
Everything I've talked about here
today is a prompt, and you can do this
without touching the model weights at
all.
Kind of a funny digression here is it
seems like each advancement we've had in
the complexity of the way we write code
to interact with these models comes from
both increasing capability in the models
and increasingly
niche ways for injecting prompts into
those models. Prompts, I'm sure you're
aware, are prompts. Rules
files? Prompts. Skills? Prompts. These lint
error messages that I am talking about?
Prompts. Review agents that inject
comments onto the PR that we require the
agent to address before it is able to
propose it for merge? Prompts.
You're going to find lots of ways to
insert prompts into your code. And one
way you can do that is by embedding
agent SDKs into your tests that are
going to review the codebase for
acceptability using prompts that get
embedded into the code. And if I find
myself spending a ton of time writing
prompts, we can actually shell out to
the agent for that as well. Uh, I've
pointed Codex at all of the prompting
cookbooks we have on the OpenAI
developer guide and told it to
synthesize a skill out of them for how
to write prompts. Which means when I
find a need to write prompts in order to
improve my agent performance locally in
the code, I use the skill to write
prompts that I wrote with the agent
looking at the prompts to write the
prompts.
All the leverage that you're encoding in
in to your repository, your team, and
the agents in this way stacks incredibly
well. To kind of pull back to this idea
that a single product-minded engineer on
my team was able to give us a big lift,
they know what it means to write a good
QA plan. To write a good QA plan though,
you have to document all the features
that you have, the critical user
journeys, and how users engage with your
applications, web apps, APIs, and
services.
Once you write those down on how to
write a good QA plan with the
expectation that all userfacing work has
a QA plan, now a review agent is able to
assert expectations around what it means
to prove that you have effectively
written the feature. A QA plan indicates
what media should be attached to the PR
for the humans and agents to know that
you've done a good job, which has the
consequence of me trusting the output
more, needing to shoulder surf the agent
less, and removing myself from the loop
even more to delegate more and more of
the work to agents. And all of this is
just making sure the agents have the
tools and tokens and context
to do the full job to remove myself from
the need as a synchronous driver. The
models crave tokens. We can
operationalize our codebase to give them
tokens to drive them forward using sub
agents and all these other techniques to
refine the agent output.
I'm excited to let you all know today in
the way you all do that you can just go
build things. Do not hesitate to remove
yourselves from the loop by getting the
agents to do the full job because they
can. Thank you.
Our
next presenter is the creator of Open
Claw, the world's fastest growing
open-source AI. He recently joined
OpenAI to work on bringing agents to
everyone. Please join me in welcoming to
the stage Peter Steinberger.
Good morning everyone.
>> So Swix asked me to do a state of the
claw. Who here is running open claw?
Give me some hands.
Ah, it's like 30 or 40%. Very good. Um,
yeah,
it's been quite a few months. Um, the
project is now 5 months old.
I think it's fair to say by now that we
are the fastest growing project in
GitHub's history. Um if you've seen the
the graph usually it's some some
projects look like a hockey stick but
ours was just like a straight line and a
friend called it stripper pole growth,
and that comes with its own challenges.
So I think by now we have the
largest number of GitHub stars.
There's a few that are bigger but
they're basically educational repos. No
other software project is that big. It's
around 30,000 commits. We're closing
in on 2,000 contributors,
soon to be 30,000 PRs. Um,
See, and we're not slowing down. So, you
see that it's a ramp, but you know, it's
only April 9. So, um,
velocity keeps keeps being good
and at the same time
it hasn't been easy. You know, I had
two roads when I decided what I
want to do, and I did the whole company
thing. I was like, I don't want to do
this again. And then I joined OpenAI, but
then we also created the OpenClaw
Foundation, and now I kind of have two
jobs, and running the foundation is like
running a company on hard mode because
you have like all the all the things
that you need to take care of but also
you have a lot of volunteers that you
can't really direct.
So
one of my goals has been working on the
bus factor, like who does commits,
um and you see that it's slowly
improving.
Vincent is actually talking after me but
we're still not we're still not there.
Um, in the last months I I talked to a
lot of companies.
So, we now have people from Nvidia on
board. We have someone from Microsoft on
board to like help with MS Teams with
like a Windows app. Uh, we have someone
from Red Hat who's really helping us um
with security and dockerization. We work
with a lot of Chinese companies. We have
people from Tencent and ByteDance.
um they're actually much larger users
than any other continent
and yeah people from pretty much around
the world but like the main thing I I
want to like talk a little bit about is
this idea that open claw is so insecure. You know,
you've seen the memes, so like open claw
invites the bad guys,
and you've probably also seen
companies like Nvidia
doing Nemo Claw, and like everyone has
little lobsters.
So
you also notice that like in the last
two three months there's been a lot of
releases where things broke.
I've basically been DDoSed by
security advisories. So that's what I
did um and what I focused on. So far we
got 1,142
advisories. That's around 16.6 a day. 99
are critical. Um we published around 469
and we closed 60% of them. So these
numbers sound like absolutely
terrifying.
If you compare it, for example, to
other large projects: the Linux
kernel gets like eight or nine a day; we
get like twice as much. And curl so far
has 600 reports; we have like twice as
much as curl.
So every time I get a
security incident, the rule is: the
louder they are screaming about how
critical it is, the more likely it's
slop. Like, I mean, you've
probably also seen the news like we we
we are very fast moving into a world
where
we have to change how we build software
because all these AI tools are getting
so good at identifying
even the most weird multi-chained
exploits, and like we're going to
break all the software that exists. I'll
give you an example:
uh, Nvidia,
they launched Nemo Claw, and Nemo Claw is
a plug-in and a security layer for
open claw that puts it in a sandbox.
The keynote was on Monday. They
invited me on Sunday to like work with
them. I hooked it up to Codex security.
It found like five different ways to
break out of their secure sandbox
within half an hour.
That's because like if you use that
product, you get access to the unnerfed
model that is quite a bit smarter in
terms of cyber than what the public has
access to. Exactly because it's dangerous.
But yeah um
also, for this whole industry, those people,
for them it's like credits, right? The
more issues they find, the more
they are seen. So like openclaw was like
the insecure product that everybody
tried to break, so literally like
hundreds of people firing up their
clankers trying to break open claw
um
The typical attack surface is like
remote code execution,
approval bypass, code injection, path
traversal. Uh, again, it all sounds very
dangerous.
And I give you I give you one one
concrete example. Um
a GHSA advisory.
This one has a CVSS of 10. So
it's like the scariest thing that you
can possibly do.
It is an issue where if you
sync, for example, the iPhone app that
we haven't even shipped yet but is in
progress, and you give it only read
permission, then you could like break the
system to also get write permission.
So this one was so critical that...
I know, this one's actually a different
one.
In all practical ways,
it is not even an incident, because the
typical use case is you install
it on your machine,
either in a cloud or, if you have to, on a
Mac Mini. I stopped fighting this. I'm
just letting people have fun now.
But in 99%, 99.9% of cases, you either
have access to your gateway or you
don't have access to the gateway. In my
defense, this was my mistake, that I tried
to create a more permissive model. For
example, if you have devices that would
target speech and then would only like
read certain things. So there's like
some use cases where a reduced
permission system would make
sense.
Um but nobody's even using that. But
this doesn't matter because the rules for
how you create the CVSS
numbers don't account for that at all.
And I try to play by the rules. So it is
a 10 out of 10. And the world is going
crazy over incidents that in all
practical ways will not affect people.
There's some other stuff that does
affect people. Uh we have nation states
trying to like hack people. There was
like ghost claw which is like from
likely from North Korea which is
basically confusing people with a
different npm package and if you if you
go to a wrong website and you try to
download it you get like a a root kit.
Um that's outside of our control. This
happens for other people as well. Um,
also there's the Axios thing, which funny
enough we are not using Axios,
but we are using MS Teams or Slack as a
dependency and they're using Axios and
they didn't pin it, and of course, uh,
because that's how supply chain attacks
work, we were also affected.
Yeah. How do you survive 1,142? I'm sure
by now it's 1,150.
Uh, for a while I tried to handle the load
by myself, which is absolutely
impossible.
So So the fastest way to get help was
like getting getting help from companies
um and Nvidia has been really amazing to
like give us some people that basically
work full-time going through the slop
and hardening the code base.
Oh, there's also one that is
okay.
That um
this is one of the angles. The other
angle is like there's a lot of companies
that do fearmongering, and it's not just
companies, it's also universities. I
don't know if you've seen it. There was
like this, um,
paper that made the rounds, Agents of
Chaos, and they say, oh, it's about
agents in general, but then there's four
pages that explain the open claw
architecture in utmost detail,
but you know which page they didn't even
mention?
The security page where we explain how you
should install it, because then it
wouldn't be fun and it
would be hard to make a good story. So
what they instead did is they ignored
all of the recommendations we make on
security. The recommendation is: it's your
personal agent. Don't put it in a group
chat. If you put it in a group chat,
turn on sandboxing because if anyone can
talk to your agent, they can exfiltrate
anything that the agent can do, right?
So if it's a team agent, it should only
know what the team can know and not any
secret data. And you probably want to
like have it restricted. If it's your
personal agent, you should be the only
one being able to talk to it. But if
you don't play by these rules, you can
get some really fun interactions like,
"Hey, I can talk to your agent and it
can break your system." And then because
I I was I was grilling them a little bit
because I had some questions how to do
things. They told me, "Oh yeah, no, we
run it in pseudo mode because we wanted
the agent to be like maximum powerful."
So they actually fought the setup. It's
actually not easy to run it in pseudo
mode. You have to change code. um
but they didn't mention it in the report
because again that wouldn't give them
cloud.
So yeah um my current frustration is
like there's like a whole industry that
tries to put the project in a negative
light. It's a nightmare.
It's insecure by default. It's
unacceptable.
Um
and meanwhile a lot of people love it
and people who actually read the
security docs understand it can use it
just fine. One example that I found
particularly great is, uh, we had one
RCE that panicked Belgium.
So the Belgian cyber security agency did a
release, uh, about a remote code execution
vulnerability,
and the whole bug was
a feature where a malicious website
could create a link
that would
trigger the gateway and then forward
your gateway token. Now, if you use the
setup that is the default and that is
recommended, the gateway token is local
only, or, if you have to, it's in your
private network; no external website can
actually access it. If you
actively fight the setup and, for example,
use Claude Code to set it up without
reading, you might be able to get this
setup working.
But again,
that's not anything what what's said on
the website.
So to be very honest, yes, there's
absolutely
uh risk. the the the big risk is the the
basically the lethal trifecta. You know,
any any agentic system that has access
to your data,
has access to untrusted content and the
ability to communicate is something
that's potentially at risk. That's not
anything special to OpenClaw. It's like
any agent, any powerful system,
has this problem. The more
powerful you make it, the more it can do
for you, but the more you also have to
understand what it does. So this is like
the main issue,
but people don't talk about it. Yeah.
And then also
um
some part about maintaining.
So
the problem is like if you get all those
security advisories,
you know that most of them are created
with agents, but you still have to use
your brain to actually read it because
we're not at the point where you can
fully trust or I'm not at the point
where I I can just fully trust that the
agent will figure it out. So it is a
huge burden on on time and you never
know. I mean sometimes you can you can
often guess, you know, anytime the report
is too nice or like someone apologizes
that's very likely AI because usually
people in security don't apologize. Um
but it is a huge problem and it's
something that I see more and more open
source projects complaining about or
like breaking. Um,
some are very public about it like
ffmpeg.
Usually you get the report. It's very
rare that you actually get a report and
a fix. If you get the report and a fix,
it's usually a very bad fix. If you rush
it, as I sometimes did in the beginning
because it was overload, you will very
certainly break your product.
Yeah. So this is something that's just
very difficult to pull up only with
volunteers. So we so
what are we working on?
Number one is,
people say like OpenAI bought openclaw.
That's not the truth. They might have bought
my soul.md. Um, but they very much
understand that what the
world needs is more people that
play with AI, to understand what AI
can do, to understand both the risks and
the possibilities. They understand
that if someone who never
played with, never used AI suddenly is at
home and uses open claw, they'll come to
work and they will ask, why don't we have
AI at work? So they very much understand
that supporting this project is
very useful, and for that
project to be successful it cannot be under
one company. Therefore I'm kind of
building Switzerland with the OpenClaw
Foundation, and I have Dave who's helping
me with it. Um it's almost done. The
last thing that's keeping us going is
like the American bank system which is a
little bit slow and very confused when
you're not American.
Um, it's inspired by what Ghosti did.
And this will actually then help us to
hire full-time people to both keep up
the pace, improve the quality, and free
up some of my time that I can work on on
cool stuff again.
And that's my little update on State of
the Claw. I'll be around later for like
a Q&A. Thank you for listening.
Ladies and gentlemen, please welcome
back to the stage Phil Hawksworth.
Hey, thanks Peter. Thank you so much,
Peter. Okay, some of you are already
aware that there's a break happening now
and you're heading for the exit. I
completely understand it. a quick couple
of quick things from me before you go.
Um, so first of all, we're about to uh
break into our into our various tracks.
Um, there is a break until uh 11:15
that's going to be happening. Uh, well,
it's happening everywhere, but you can
get refreshments out in the expo hall.
Um, just a very very quick bit of
information uh about the uh the various
tracks. Uh, let's see if we can uh
advance that. We've got uh here we are.
So we have um oh no this is is this
right? I don't know if this is quite
right. Uh you'll check on have a check
on the schedule because we've got uh
tracks for openclaw happening in here.
We've got a track for um harnesses
harness engineering. We've got a track
for context engineering. We've got a
track for multimedia
um sorry multimodal uh interfaces uh
talking about OCR and text to speech uh
etc etc. Um uh and we've got uh of
course Google DeepMind um uh have a
track as well. We're going to be talking
about all things to do with open models,
agents, web MCP and much much more. Um
I've rattled through there very very
quickly. Um but also there is one more
track uh that I wanted to tell you about
and that is the hallway track. Uh the
hallway track is well we thought we've
got all you gathered here. It would be
nice if you gave a talk as well. uh you
don't all have to give a talk but if you
would like to give a talk you might just
be able to. So uh when you uh registered
for the event you would have found
information in the email about joining
the the the uh AI uh engineer Slack.
Don't know if everyone's found that. You
should definitely be in there. Uh but
you can submit a proposal for a
10-minute lightning talk and there's
room in the schedule tomorrow to give
that. So uh I'll remind you later on but
put your proposals in there. Uh and then
there is a chance that you could give a
talk uh tomorrow. So keep an eye on the
schedule. Also, we'll be letting you
know there's a vote for that. Okay. I
sense you're hungry for refreshment and
I don't blame you. A lot of information
today. So find check the schedule for
the various tracks. Uh find those tracks
later on 11:15 starts. Enjoy your
refreshments. We'll see you soon. Thanks
very much.
Hello. Okay,
great. Thank you for the whoop. Love the
whoop. Um, so excellent. Okay, you've
chosen the claw uh track to get started
on for our our breakouts and uh uh it's
going to be great. I think it's going to
be it's going to be a good session. Um
we are going to be hearing about a bunch
of different things uh related to uh
openclaw and just personal AI assistance
in general. There's some open claw
contributors, openclaw maintainers, uh
um uh open claw competitors, uh and
openclaw creators, uh going to be here
on the stage. Um we're actually going to
uh be taking this through until the
lunch break. Um oh, there we go. We can
see up there. So, it's about an hour and
a half of uh of sessions, slightly
shorter sessions than uh than earlier, I
think. Um but we're going to be starting
with uh an AMA. You saw Peter
earlier on, but you're going to get a
chance to ask questions and there's
going to be a bit of a conversation uh
with Peter and Swix. So, I think to get
us started, I will simply invite Swix up
who will kick things off. So, uh please
welcome him to the stage. Swix, come on
up. Swix.
>> All right.
>> Actually, we can just go together.
>> You can come out together. There's no
secret. Peter, welcome. Everybody there
is
Okay, so the deal for this is meant to
be an AMA. Uh the the main idea is that
I've run six of these AI engineers and
whenever we have some big maintainer,
big VIP, we only give them a talk, but
actually you guys have questions that
you want to ask. Uh so uh we wanted to
sort of create that opportunity. So you
can you can submit there. I'm going to
moderate uh and and all that. U the
spicy one I'm just going to start off
with. Pete just quote uh quote tweeted
uh me and saying send all your questions
about closed claw right uh
I think uh people have a lot of
questions about um the future of
openclaw at openai uh and uh I wanted to
give you the space what what is the what
are people saying about closed claw and
then what is your response
>> I didn't even think about it. It was like, it
came up when I decided to go to
OpenAI. And
I think people have a point that
OpenAI wasn't always
amazing with open source. And I think
a lot changed like Codex is open source
now. They released Symfony which is a
really cool orchestration layer. So like
like they're really leaning in and
understanding open source now. They
understand that OpenClaw needs to stay
open, work with any model, be it one
of the the big companies or being a
local model um everybody in the industry
wins if more people spend time with AI
you know if if I if I think AI is
something scary and then suddenly I I I
play with open claw and suddenly it's
like fun and weird and then I come to
work and there's no, like, I don't have AI
tools at work. I'm going to go to my
boss and say, why the f do we not have AI
at work? And then like those
companies would probably not run open
claw, but they want something that's like
hosted and managed, and then somebody
can make a sale. So they're
like very much on board. They provide me
with resources. Um, actually it's me
like I could get a lot more people from
OpenAI to help with the project, but
that would just create the picture that they
had taken over the project, and I
don't want that. So I brought in
people from Nvidia, we have someone from
Microsoft, someone from Telegram,
someone from Salesforce of all the
companies. So So shout out actually
there's cool people at Slack. So we have
someone that maintains the Slack plugin.
Now I brought Tencent on board, and
ByteDance. We talked to Alibaba, MiniMax,
Kimi, like all the model
providers. They're like very much on
board. Um, Nvidia has been immensely
helpful. They're,
I think, one of the coolest companies
in terms of, here's some engineers who
actually just have agency and just
do things.
>> Yeah. Uh and now that I have all the
other companies, I'm also bringing a few
people in from OpenAI to to help
maintain the project because it's I mean
software is just like changing that the
the pace at which this project operates
is is insane. You kind of like you need
an army. Um and I'm working on that.
>> You have an army. Uh and but but you
know even the contributor chart that you
showed uh shows that it's hard to get
quality contributors to stick around.
people keep hiring your maintainers and
then you have to find new ones. Um, so
there's a lot of questions about local
models and open models. Uh, you know,
like not every part of the stack is
open. There's many models where you
don't have access to the models and and
you know, there's sort of weird
restrictions. Um, how important is open
and local models to the future openclaw?
I mean, part of what
motivated me to build OpenClaw is, you see
all these large companies, and then they
have connectors to my Gmail, and then my
email is hosted somewhere, then this
company has full access to my email, and
I can get a little bit down on that.
Like, it's much more exciting to me if I
have all my data actually under my
control, and like a little bit of
it goes up there if I need the top-tier
token.
>> Yeah. and like a second kind of
hierarchy of uh fallback models.
>> Yeah, you want to I mean I'm I'm
European at heart. You want to own your
data, you know. So so so and nobody
built it. So for me that was very
attractive and also the the fact that
you know if if you're a startup you want
to connect to Gmail, it takes like half
a year and it's like a very very
difficult process. But if I'm a consumer
my clanker can click on any website and
it happily clicks on I'm not a bot. If
you have to give me the data somehow, if
you can if you give me the data, my my
agent is able to get the data. So you
can work around a lot of those those
silos those big companies are building
and ultimately you can do much cooler
automation use cases that large
companies can never do.
>> So it's it's like
it's a little bit the the hacker way.
>> Yeah. And um any indications from the
OpenAI team on GPT-OSS? Is that
continuing to be a stream of work that
uh will be aligned with open claw, or
is that like separate?
I'm not in a position to give
you insights on that, just that,
um, part of the OpenClaw effect is
that like more people in the company are
getting excited about open source, um, and
I love that OpenAI is moving
more into the open direction again, if
you compare it to some other top-tier
labs that start with an A, uh, that very
much will sue you if you leak any
of their source, um, or block you if you
are too successful. I think OpenAI
is on a good direction.
>> Yeah. Okay. I want to highlight this
question. Um people love hearing about
your coding workflow. I think right by
now your idea of um uh the prompt
request rather than the pull request is
is very well socialized. And also you've
been shocking people with just how
you're spending tokens at OpenAI.
Uh so basically uh the people want to
know how you ship and what do you do
about agent waiting times like why is
you know you're spinning out so many
agents it
>> I know like I I never imagined that this
one picture of me would blow up so much.
Yeah,
>> actually
>> Uh, give some numbers, just to
align people. I think there's
times where I was running almost 10
sessions at the same time, especially
when I used Codex with 5.0 and 5.1; it was
quite slow. I think now, I have to say,
it's still weird, we made improvements
that both make it faster, and then
there's also fast mode, so by now my
typical workflow is
maybe half of that, maybe five, six
windows instead of double, just because
each loop is faster, and like the
amount of work I sink into workers is pretty
much the same. So I don't have to use
split screen so much anymore, and I think
we're going to move into a future where,
um,
tokens will be faster and faster.
So at some point, like, this is not
natural, that you work on six things
at the same time. Um,
but it's basically a workaround until
things get faster. Yeah. Uh, one of my, uh,
interesting things of putting you next
to Ryan was to see how the two of you
kind of approach uh, token maxing.
Basically, I'm curious what you think
about the the complete dark factory
approach, right? That uh, you don't even
review code that goes in.
>> I think that's more and more doable.
>> But also, you know, a
dark factory in a way also means I come
up with everything I want to build in
the beginning and I just don't think you
can build good software in that way like
the way to the mountain is usually never
a straight line. It is it is it is very
curved. Sometimes you go a little bit
off track and then you you see something
new that inspires you. You find like
shortcuts. um once you're at the top you
you you can find the optimal path but
you never walk like this. So at the same
time you will the first idea that you
have about your project is very unlikely
going to be the final project. But if I
if I suddenly use the waterfall model
again that will be the final project.
For me that doesn't work for me. Like I
I build steps I play with it. I see how
it feels. I get new ideas. My prompts
change. So to me it's a very iterative
approach. So I don't see how you could
fully automate that. You can definitely
build pipelines for certain things.
>> Yeah.
>> But even even for PRs, you don't just
want to build a pipeline that just
merges PRs because a lot of them just
don't make sense. You know, like people
people will pull your product into all
kind of directions, but if you automate
that, the AI will very unlikely know
what's the right direction. You can
guide it. I have like a vision document
that I tried some of that but
the bottleneck is still thinking and like
having taste.
Yeah, taste is very important. Uh how do
you define taste? This is something that
in my conversations with people everyone
understands taste is the moat but nobody
agrees on what taste good taste is. So
I'm just curious to hear yours. I think
in this day and age, like,
the very low level of taste is if it
doesn't stink like AI, and you know
exactly what I mean, you know, if
something is just so, writing style,
personality,
>> also UI. By now you've seen
so much agentic-built UI that you
immediately know if it's AI
>> yeah if it has the the color border on
the left right yeah I mean for a while
it was like the purple gradient but much
more so I I feel it's
It's like a feeling the same as you can
identify AI written slop right away.
>> Yeah.
>> Um that's why I say it's a smell. Like
even if you can't pinpoint it, you will
know. So So that's probably the lowest
the lowest characterization of taste.
And and then going higher up because now
so much of software is is automatable.
There's actually much more time you can
spend on like the little details. I
don't know, you know, like like just
when you when you when you when you run
open claw, you get like a little message
uh that sometimes roasts people.
Those are like the delightful details I
think that
>> you'll just not get if you prompt in a
high level.
>> Yeah. One one of my favorite tastes of
yours is how you you uh really put a lot
of work into your soul.md and you
uh you know open source your approach
and I don't think people worked on
enough soul until until you came along.
So I think that's really interesting. Uh
my I I I have a podcast I haven't done
yet. I haven't released yet with uh
Mikhail Parakhin, who is now the CTO of
Shopify, but he was the uh guy
leading Bing where Sydney was uh the
original sort of unaligned chatbot that
emerged. Uh but I think people really
have fun when when your soul your
chatbot has personality. Your clanker uh
you know has different obsessions.
>> Well, it wasn't because it the world
changed, right? We had ChatGPT
in 2023 and '24 and it was basically
us having AI without understanding what
AI can do. So we rebuilt a Google so you
have like a search field and like you
get a response and you you don't expect
Google to have a personality.
>> Yeah. But now that we moved more towards
agents. Like, I didn't think about it
in the beginning, the WhatsApp relay, and I
just hooked it up to Claude Code. Um, and
then when I was on WhatsApp I noticed
that it doesn't feel quite right. Like,
even though like Claude Code already
has some personality, it didn't really
fit how people would write to you on
WhatsApp. So that that's how my whole
iteration started was like uh this again
it's about taste, right? It doesn't feel
quite right. It's like too wordy. It
uses too many dots. My friends
text different. And then that's how I
started working on it, saying, "No, this
isn't it, like, try to write more like a
human."
>> Uh yeah, I I actually run a writing
>> like a lobster.
>> Uh like a lobster. Yes. Um
uh you know the one of my favorite
quotes of yours is uh madness with a
touch of science fiction. Yeah.
Right. Like that this is how you run
>> um uh AI projects. And I think
>> not all the art projects, but
specifically
something like OpenClaw would have never
been able, it would not have come out of
an American company just because it
would have been killed in legal long
before it would have been released
because it just has some problems that
we haven't really solved as an industry
yet.
>> Yeah.
>> But now we have some mitigations and
it's getting better. The models are
getting a lot better. But I don't see
how any of the big labs could have
released that. You know, it would be too
much push back. Oh, and like not enough
market proof that this is what people
want.
>> Yeah.
>> So like it had to be done by someone
>> like
>> outside. Yeah. That that that
>> sitting
>> like literally like when I when I built
it in the very beginning, I was like,
"Oh, what's the worst that can happen?"
like it could exfiltrate my token,
my emails. Yeah, nothing is nothing
nothing's in that that would like
completely kill me. You could like
upload some of my pictures. I was like,
yeah, I guess the worst are already
online if you use Grindr. Um,
so it was like it was like,
okay, I can live with that risk. It will
be uncomfortable, but it's like it's
manageable.
>> Yeah.
>> Uh, if your company is a different it
requires a little different approach.
>> Yeah. By the way, uh his Instagram
account, good follow, underfollowed.
It's also it's also has some good stuff.
Um okay. Uh you were talking about
WhatsApp, talking about Telegram, a lot
of these text apps. Um uh text apps are
good. People are also looking for like
the next form factor. People want like
the maybe the the glasses, the earbuds.
What What is your sort of wish list in
terms of having agents in your life?
I started on that actually already, but
then I was just getting bogged down by
all the people using it and just like
the daily grind.
But if you're at home, I want to be in
any room, and you know, like in Star Trek, when
you can say "Computer,"
I I I want to like talk to my agent
wherever I am and it should just be able
to like respond to me. It should know
where I am. I have like little iPads in
every room and and my agent can use the
canvas feature and project stuff on
those iPads. So like if I ask a question
that that is like easier to be to be
answered by also showing me something
like it could use like the nearest
display because it's aware of where I
am. So the phone is just a very
convenient input point but I kind of
want to like talk to it from anywhere.
Yeah.
>> Like yeah if I'm around and I have
glasses I should just like be able to
like listen in and like project
something on me.
>> Um
>> but just ubiquitous follow you.
>> I think yeah once we have
>> really smart home. Yeah,
>> like agents on your phone, but really
you want ubiquitous agents and then you
want, maybe you will have your
uppercase OpenClaw, your private agent, and
at work you might have your, I don't
know, lowercase openclaw,
and then
that claw should be able to like talk to
your personal claw uh in a way that both
your company and you are comfortable
with. So that's kind of like the future
where we need to work on.
>> Yeah. Uh one of uh I just did a podcast
with Maran Dre who's a huge fan uh and
and also uh have conversations with
Andrej Karpathy. Both of these guys are
running OpenClaw to run their house.
And I think OpenClaw for homes is like
kind of underrated, but like people
are really discovering it. And my
funniest sort of irony is that it's
only possible because the internet of things
means that most smart devices are
terrible at security, which means Open
Claw can run them.
>> Oh, it's going to be able to work so
much better in a few months when the
models are getting really good.
>> Yeah, they're very good. Um, okay. One
security question. uh about prompt
injection. How do you want to solve
prompt injection or what what uh ways in
which uh have you been thinking about
the prompt injection problem?
Probably not enough yet. On the other
hand, like, the frontier models
are really quite good at detecting all
the cases where like just stuff
randomly comes in from a website or an
email; that is usually not a problem anymore.
You mark it as untrusted content; it's very hard
to exfiltrate from that. If I
have unlimited access to your claw and
can bombard it with stuff, then there's
still a chance.
>> Then then there's still a chance. But
like for one of things,
>> it's no longer the biggest problem. If
you use that's also why why you know
that this is probably the angle where
like some people say, "Oh, Peter doesn't
like local models." But then I see like
people running like a 20 uh billion
parameter model that just does whatever
you tell it and and it's not trained to
have any defenses at all. That's still
problematic. If you run that and then
you use a web browser or email um would
worry me. That's why OpenClaw
warns you if you use a small model, and
then people spin the whole thing like, we
hate local models. I love that
we support everything, but like you
have to
steer
the regular user a little bit into a
direction to make it harder for them to
shoot themselves in the foot.
>> Um
yeah, there are some ideas for
prompt injection. It's
>> still a little bit away. I haven't
announced that.
>> I think Simon Willison has been working a
lot on this. I mean, he coined the
term prompt injection and the sort of
dual LLM approach seems smart uh and I'm
I'm not smart enough to figure out all
the ways that which it can be attacked
like at some point trust just has to be
a thing, right? Um, and something
interesting I found out from
talking with Vincent, who's speaking next,
is that you guys had to implement the
same trust system that Tobi Lütke had to
implement which is uh you build
reputation over time and things with
more trust uh gets more privileged
access, right? And I think that that
makes sense.
>> That's part of the story.
>> Yeah. Yeah. Yeah. Um Okay. So, uh some
more broader questions. What cool
projects would you like to work on once
you have more free time?
>> I mean, I wanted to work on dreaming,
you know, like my maintainers worked on
dreaming while I'm there, like
>> while you were dreaming.
>> Uh so,
>> shift it, right?
>> Yes. What What is dreaming? Uh it's like
a way to reconcile memories and like
kind of create a little bit like like a
dream log go through like your session
logs. Um
>> we found out from the Anthropic
source code leak that they are also working
on dreaming, right?
>> Oh yeah. Yeah. I mean there's
I'm pretty sure there's like more
companies working on that. But think a
little bit like how do we learn as
humans? You you experience a lot of
things during the day and then you sleep
and in sleep your brain does
like a garbage collect,
converts some
locally stored memories into
long-term storage and like drops others,
and that that's similar ideas that I
think could also be very useful for
agents. Um and then like what we shipped
on dreaming is like the first little
step in that direction.
>> Yeah. It's related to the wiki uh thing
that Andrej has been talking about where
you sort of collect everything into a
>> wiki is is more memory but like
everything kind of blends a little bit
together. Um that the beauty the beauty
of open claw is that we can just try
stuff you know like like everything what
we worked on for the last months or so
is that
in the beginning it was a big spaghetti
codebase mess and now like
everything is an extension, a plug-in. So
you can replace memory, you can add the
wiki, you can add dreaming, you can add
I don't know your your your whatever
crazy idea you have and just make it
your own. You don't have to send
everything to a pull request because
we're still completely overloaded on
those. But it's it's more like Linux
where you just can install your own
parts.
>> Yeah. Yeah. uh and uh you are building
what a lot of people think uh is the
most consequential open source since
Linux which I don't know how do you deal
with that how do you deal with the the
the fame what is a day in your life uh
as as the BDFL effectively of something
like this
>> what's my well there's still a lot of
coding there's also a lot of
>> by the way in in between sessions he was
coding
back there
>> yeah they get token excited you have to
like something has to You have to push
the agents, right?
>> Yeah. Um,
where it shifted a little bit now. It's
a lot more a lot more talking and
steering people in the right direction.
Like because there's a lot of things
that we already learned at Open Claw. So
like part of my role at OpenAI is like to
like help them not make the same
mistakes again. Um
and then and then open claw is like try
out new things that seem exciting and
some might work and some might not work.
Enable enable companies to like build
their own claw without having to fork
away but like making everything more
more customizable. Um yeah and sometimes
I sleep.
>> Sometimes you sleep. Okay great. Uh I
think that maybe this is the last good
closing questions. Uh what skills do you
want humans and engineers in particular
to focus on developing in the age of AI?
>> Taste was a big one, but I already
mentioned that
system design is still very important.
>> Yes, you we talked about this in San
Francisco. Yeah.
>> If you don't think about that, you will
eventually vibe yourself into a corner,
right? Just by defining the boundaries.
Like, the funny thing is, like, everything
is in the clanker, but you still need to
ask the right questions. Otherwise,
that makes the difference between like good
code that comes out or like really bad
code that comes out. And that's still
where like all the knowledge you have
about how you build software you can
apply, to steer the agent into
something that is not slop.
>> Yeah. And then I think I think a skill
that is becoming more and more important
is saying no.
And and and that's something I had to
learn as well because
even the wildest idea is just just a
prompt away.
And usually this one idea is never the
problem but like this idea and this idea
and this idea and this idea and then how
all of that fits together that's the
problem.
>> Yes. So like
I think we're still bottlenecked on
thinking, and like big-picture
thinking, because imagine the world from
your clanker's view: like, you're being thrown
into a codebase. You might have an
outdated agents.md file, but you
basically don't know what TF this is.
And you like then like you tell me, hey,
add user profiles and you like somehow
add user profiles and connect it to the
two things you see, but you didn't see
the whole system, right? And then that's
where a lot of those localized solutions
come from, and it's our job to like help the
agent do its best work by like providing
it with like hints. Hey, you want to
consider this? You want to look there?
How would this interplay with this? And
then ultimately you get like
a system that actually is
maintainable.
>> Yeah. Um well, thank you for maintaining
one of the most important software of
all time and thank you for spending time
with us.
>> Thanks for having me.
>> Yeah. Hopefully you stick around and
answer questions. Thank you.
>> All right. Uh we have a we have lots of
other maintainers coming in the claw
track. So uh stay tuned.
Good stuff. Well, thanks. Thanks to
Peter and to Swix for that conversation.
Okay, we'll do a little tiny reset here.
I managed to not be involved in carrying
furniture, which is good. Uh I would
only drop things. Uh so as we do a
little switcheroo here and get organized
we're uh just about ready to uh to bring
Vincent up. So Vincent uh is a
research engineer and also
does DevRel work, oh look at this, it's
happening like magic, also does DevRel at,
uh, at Comet, uh, but I think one of the
things we're going to hear about in
particular here is his work uh
contributing to openclaw. Um, I've had
so many conversations in the last, I
don't know, 12 hours, 36 hours of uh
about how the development workflow and
the kind of the processes are changing
so much since, you know, contributing
code has happened so much faster because
there aren't always just humans writing
it. And I think we're going to touch on
that a little bit today. So, um, I think
we're just about ready to to to bring
Vincent up. So, if you're ready to give
him a giant round of applause and make
him very welcome. Uh, let's uh let's
welcome Vincent Cotch.
All right. Enjoy.
Yeah. Ready?
Welcome everyone.
How are we all doing? Amazing.
Come on. Come on.
No worries.
I promise this won't take too long.
>> They've got it. Cool. Amazing. So,
welcome everyone. I'm Vincent. Uh, what
do I do? I'm one of the core maintainers
at OpenClaw working with Peter. And as
you've heard before, I have a day job as
well. Same as Peter. he has a day job at
OpenAI. Um, but you know, it's an open
source project. Amazing things have been
happening. I'm going to talk about what
I call dark factories and how open claw
ships faster than you can read the diff.
Um, this meme is absolutely hilarious.
So, I think Peter posted this a week or
two ago. I wake up, there's a new
technological advancement. I wake up.
It's this this joke that we're shipping
at insane speed and the velocity is just
absolutely phenomenal. And some of you
might think, "Oh, this is some luck or
we're just like Ralph looping to the
max." Um, I think there's actual
engineering work here and I'm going to
talk about that. Now, as I mentioned,
I'm Vincent. I'm your friendly clanker.
Uh, this is me using VR goggles back in
2013. So, despite my accent that sounds
somewhat Australian, I was born and
raised in East London, not far from
here. I actually went to to college just
down the road in Westminster. And, yeah,
at some point decided to live in
Australia. my accent changed, but I used
to love technology. I used to love being
at the edge of technology. And this was
like one of the first few sort of early
um VR goggles that came out. Came in
this big box with a big warning sign on
it saying, "Hey, use for 5 minutes at a
time." Cuz it didn't have like the
anti-motion sickness built into it. And
the funny thing with this one here was
that I didn't use it for 5 minutes. I
used it for 3 hours. and I played Team
Fortress 2. Had an absolute blast and
then I vomited for 3 hours after that
cuz my vision turned into B vision. Um
the what I'm trying to say here is that
like anything on the edge is going to be
janky. It's going to be horri horrific.
It's going to be uh uncharted territory.
and working on openclaw and being part
of the team that ships probably you know
an insane velocity of commits to a point
where I get very limited by GitHub on an
hourly basis uh is an interesting
experience and this experience Britain's
gone through before um we had the
industrial revolution when mills and
cotton were being produced at extreme
amounts of volume and there's a lot of
history here around production and
productionization at scale
in the UK and in Europe. And I feel like
we're going through this moment again.
Uh we're going through this moment of
how do we build at scale and the ways we
used to work before just don't work
anymore. And
it's kind of strange because in my day
job I kind of work in the space of evals,
where everything is sort of structured
and there's telemetry and it has to be
all perfect, and I work on a project
where I have this blind faith in the
harness, and it's this kind of two worlds,
but they're starting to come together.
We used to have hand looms in cottages,
uh centralized mills everywhere. Uh
craftsmen were the factory workers, but
the bottleneck was the weaver's hands.
We're now switching to a world where
engineers writing code in editors, not
so much; uh, swarms across repos;
uh, engineers are becoming factory
managers, which I'm going to talk to; and
the bottleneck becomes taste, you know,
that lovely word, Italian mother's hands,
yes.
so in in context like what does this
mean like are you talking absolute
nonsense of people building things at
absolute scale they are what happened
was um very similar to the chat era
where everyone denied it at scale that
they were using ChatGPT. Everyone was in
this absolute fear-mongering sort of
world. But what the reality was that
everyone was using it. Everyone in
secret was just like, "Oh my god, what's
going on? I need to talk to it." And the
same thing is happening with this
autonomous agents at scale. Some
organizations have openly come out with
it. So for example, Anthropic uh with
their recent work they did on building a
new C compiler. Uh we had Spotify saying
they're they're no longer writing code
by hand supposedly. Um, Steve Yegge, who
I absolutely love, uh, saying he pushes
about 50 PRs a day, totally solo. He calls
himself a vibe maintainer. I can kind of
relate to that. And OpenClaw, where
at the peak we were pushing
800 commits a day. And realistically
like there's about 10 to 15 core
maintainers all with day jobs. It's kind
of astronomical in terms of scale.
And for me this was March 15. um what
was that like two three weeks ago where
I hit close to 3,000 commits per day and
if you actually take a look, my commits
actually stop when I go to sleep. So if
you if you want to see when I go to
sleep and when I wake up and how many
hours of sleep I have you can just take
a look at my commit history. Yeah it's
astronomical. But the thing is this is
going to become the norm everywhere
else. Like this is this is like a me
telling you you need to wake up that you
know this scale of velocity is going to
be normal and trying to review PRs and
go through all this nonsense may not
work but somewhere in the mix is
engineering there is a form of
engineering that's going to happen so we
did commit maxing you know let's just go
there smash as many commits as we can
and this reminds me of Ralph looping
right this like this this this this guy
where you're like hey I'm just going to
like give you a task I'm going to burn
tokens for for like 8 to 9 hours and
you're waiting, you know, uh you're
waiting, you're hoping something
happens, maybe something happens, I
don't know. Um but what if we had a bit
more of an opinionated uh approach to
this? Um what if we call it bart
looping? I don't know. Uh one of the
other maintainers gave me this idea.
Maybe we'll coin it. Do we do we need
more than just tokens? Uh what does that
reward mechanism look like? How do we
get a bit more opinionated? Yes, let's
run loops, but let's be a bit more smart
about how we do this.
So, um, right about the time you saw
those 3,000 commits, uh, this was the
day before, I was at NVIDIA with Peter,
and the gentleman you see on the left is
is one of the other, uh, Nvidia
gentlemen, and they were like, "Hey,
we're building Nemo Claw." I'm like,
"What? What's going on?" And, uh, let's
help you build it. And I was in the
room. I was like, "I can't work on a
laptop for like hours on end. Can you
bring me a screen?" They bought me a
screen. Um, Peter didn't have a screen,
so that's his laptop on the left. He
asked for a screen, so they gave him an
even bigger screen than mine because,
you know, why not? And we just got to
work. So, he's running about maybe 15
Codex sessions, and he's got his Mac
Studio at home that he VPNs into. I'm running
another like 10 or 15. And between
collectively between us, we're probably
running with sub agents included, maybe
up to 60 70 agents. Um, but on the
foreground, maybe 15 uh swim lanes if
you want to call it that. And we're just
just going for it. Funny thing is we're
working on Nemo claw one side but one
maintainer decided I'm going to move
some stuff around. I'm going to move a
couple of folders around and that was
moving entire channels. So like all our
conversations with like MS Teams and
Slack ended up moving to another
location in the codebase
and we were like oh my goodness we're
going to have to change stuff. Um and I
found a really nice uh place to put my
drink as well. The Nvidia people don't
like this. So what ended up happening is
what we call the great refactor. Um
essentially where we were like hey we
have lots of people raising PRs and what
they actually want is to build features.
The thing is we don't want to give
everyone every single feature that they
want in which case it becomes bloat. You
heard Peter say earlier on the challenge
becomes who do I say no to? It's not
about saying yes. In a world where
tokens are cheap I can just say yes to
absolutely everyone and merge everything
in. But that's going to turn this codebase into an absolute dumpster fire. So the vision was actually that we need to cut this codebase down. We need to rip it into pieces, and a plug-in architecture somewhat made sense.
Imagine if you're OpenAI or Mistral or
Anthropic, what if you owned that piece
of the provider code and it was handed
to you and it was separate from
everything else. So this code change
that occurred was like a catalyst for
us. It was 2 in the morning, we're
tired, we thought, why not refactor the
entire codebase? Sounds like a splendid
idea. So 2,700 commits later, uh, close
to a million lines of code change, uh,
touching 82% of the core codebase.
Plugins were launched. Um, the night
before, I think it was like 1:00 in the
morning, I'm trying to go to sleep and
the tests are not passing and I was
like, was I Icarus, and did I fly too close to the sun? Or, as we like to call it, did I vibe too hard? I genuinely thought I had vibed too hard, but as a team, we managed. We managed to bring this codebase back together again. But the saving grace was these awful sort of unit tests that AI code loves to generate, which actually ended up overfitting on our code. So when we completely ripped everything out, we still had these tests that were extremely overfitting, and as long as they went green, we knew we were somewhat close.
So how do we do this? In my case, I call it my factory. It's many codex sessions. Everyone asks me, "What's the magic sauce? How do you do this? What's this crazy insane thing? How are you guys building this?" Very simple: I have swim lanes. It could be five, it could be 10, it could be 20. But traditionally, they kind of cut themselves up into different pieces. So, if the laser works, you can't really see it, but imagine you're a factory manager and you have a production line below. Essentially, you might have a case where you have, let's say, CI on one side, features on another side, and bugs in another.
So, when I'm refactoring and doing stuff, right now the codebase is quite stable, say I want to refactor some tests. Well, that might be swim lanes one and two. I don't need to babysit them too much. I just tell them: take your time, make sure the tests pass, commit, just push them through. Whereas with three and four, I might be looking at specific features and issues around, say, Docker or one of our messaging channels, in which case I'm having a conversation with those agents. They go off investigating, do the work, and come back. And then maybe lane five is looking at new P0s and P1s; it might be using other data, might be using GitHub. We have agents that run inside of a Discord channel, so when we do a release, we might be like, "Hey, what's happened in the last two hours that I need to be paying attention to?"
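As a rough picture of those swim lanes, not actual OpenClaw tooling, just a sketch of how the per-lane policy could be written down; the lane names and fields are illustrative:

```python
# Illustrative swim-lane map: each lane gets a focus and a supervision policy.
swim_lanes = {
    1: {"focus": "test refactors", "policy": "autonomous: run until tests pass, commit, push"},
    2: {"focus": "test refactors", "policy": "autonomous: run until tests pass, commit, push"},
    3: {"focus": "Docker feature work", "policy": "interactive: investigate, report back, I review"},
    4: {"focus": "messaging-channel issues", "policy": "interactive: conversation-driven"},
    5: {"focus": "P0/P1 triage", "policy": "pulls from GitHub and Discord, surfaces recent changes"},
}

def needs_my_attention(lane: int) -> bool:
    # Autonomous lanes only need a glance; interactive lanes need real brain space.
    return swim_lanes[lane]["policy"].startswith("interactive")
```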
And this will scale up and down. But what ends up becoming quite interesting is that tokens are no longer the problem. Depends who you ask. What really ends up becoming the problem is just raw compute, and my brain space, in order to keep an eye on all of these sessions. So, in harness we trust. What ends up happening is I don't have a really insanely complicated process. The one thing I have complicated in my life is adopting git worktrees, and I kind of wish I hadn't. The only reason I say this is that when you're running an extremely heavy test harness, it ended up completely nuking my machine, because every PR I attach to ends up becoming a new git worktree. I end up with something close to 70 or 80 active git worktrees on my machine on any given day. And that's kind of hell. So I had to actually build some magic sauce around my codex sessions. My codex is aware of git worktrees. If I hit the escape key and it crashes, it will self-heal, self-recover, do git sparse stuff. But realistically, I should have adopted what Peter and other people do, which is just clone the repo 10 times and point 10 different codex sessions at each one. But the trick here is that I haven't done any magic sauce beyond that. I don't use plan mode or spec mode. I have a conversation with the agent, and we work through it, and we find a way to make it work.
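For reference, the simpler setup I mention, cloning the repo N times instead of juggling worktrees, is roughly this; the repo URL, directory layout, and session launcher are placeholders for whatever harness you run:

```python
import subprocess
from pathlib import Path

REPO_URL = "https://github.com/example/openclaw.git"  # placeholder, not the real remote
BASE_DIR = Path.home() / "factory"

def provision_clones(n: int = 10) -> list[Path]:
    """Clone the repo n times so each agent session gets an isolated checkout."""
    BASE_DIR.mkdir(parents=True, exist_ok=True)
    clones = []
    for i in range(n):
        dest = BASE_DIR / f"lane-{i:02d}"
        if not dest.exists():
            subprocess.run(["git", "clone", REPO_URL, str(dest)], check=True)
        clones.append(dest)
    return clones

if __name__ == "__main__":
    for path in provision_clones(10):
        # Point one agent session at each clone; the actual launch command depends
        # on the harness (codex, claude, etc.), so it's left as a print here.
        print(f"start a session with cwd={path}")
```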
So realistically, it looks a little bit
like this from the matrix. Um, and
people go, "Oh, Vincent, like, how do
you know it's kind of working?" And this is going to sound a little bit lunatic. If anyone's watched The Matrix and seen the scene where Neo
goes over, he's like, "How do you know?
How do you read the text?" And the guy's
like, "Oh, you know, I've been doing
this for a while, so I can see like
woman in red dress or guy walking dog."
And you start to have this like
relationship where you can feel the
reasoning tokens. I know it sounds
somewhat ludicrous, but there's times
where I'm looking at the swim lane, I'm
like, "This sounds off. It doesn't sound
off because of what it's doing. It
sounds off because of how it's
explaining itself to me. It's waffling.
It's not making sense. It doesn't seem
to know what it's doing." And this feels
a lot like how I would manage people.
Um, if I had someone working for me and
they started downright bullshitting, I'd
be like, "Wait a minute, what's going
on?" So in these cases, I might just
nuke the session and go, you know, I'm
not going to deal with this section of
code. I'm going to leave that to another
maintainer or I might come back to it
four or five days later. But that experience feels very intuitive, and I've been able to build that intuition because of the sheer volume of token maxing I've had to go through over the previous year.
So there is engineering work. I call this the agent development environment. Essentially, the process goes: I have skills, similar to dotfiles. Both my dot-skills and dotfiles are available on GitHub, it's all open source, go for it. Some of my skills are private, but there are skills in there for, say, writing technical documentation, which I've co-created with other developer experience people and other engineers in the market. You can use a skills gym, something like a ger, which I'm also a contributor to. Or you could just say: go codex, I've been using this skill over my last two weeks, go through the codex sessions, read the logs, make improvements to the skill. I would then take that skill and deploy it into my OpenClaw, or take it into my own personal environment, and I'll use something like Vercel skills.sh as a mechanism to loop this. I've added some other testing and other elements on top of this, but there's a process to how I manage and maintain my skills as an engineer.
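The skill loop itself is conceptually small: gather recent session logs, ask a model to revise the skill, redeploy. This is a hedged sketch of that shape only; the paths and the ask_model helper are placeholders, and the real loop runs through codex and something like skills.sh rather than a standalone script:

```python
from pathlib import Path

SKILL_FILE = Path("skills/technical-writing.md")   # placeholder path
SESSION_LOGS = Path("logs/codex-sessions")         # placeholder path

def ask_model(prompt: str) -> str:
    """Placeholder for whatever model call rewrites the skill."""
    ...

def improve_skill() -> None:
    # Take the most recent session logs that used this skill, then ask the model
    # to revise the skill file based on what actually went wrong in practice.
    logs = "\n\n".join(p.read_text() for p in sorted(SESSION_LOGS.glob("*.log"))[-20:])
    prompt = (
        "Here is a skill definition and recent session logs where it was used.\n"
        "Rewrite the skill to fix recurring mistakes, keeping the same format.\n\n"
        f"SKILL:\n{SKILL_FILE.read_text()}\n\nLOGS:\n{logs}"
    )
    SKILL_FILE.write_text(ask_model(prompt))
```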
The way we manage PRs has some level of engineering work to it. There's this kind of running joke that every maintainer who joins the project decides to try and tackle it: oh my god, we have 6,000 PRs, how are we going to solve it? I'm going to cluster everything and figure this out. So
>> How many?
>> There you go. There you go. Honor, thank you very much. So this was my flavor of trying to solve this. This is a semantic graph, a vector embedding over all the GitHub stuff. This one PR has, uh, 106 edges. What ends up happening is that everyone else has the same problem, so they decide to send their flavor of it, and the PR issue becomes utter noise. So there is even process around how we consume what we're going to work on. We might not call it a roadmap, but we have a way of deduplicating and seeing what's out there. This might be a signal for me to say: okay, if there's enough pressure coming on one issue, it must be big enough that all these other clankers decided it's a big problem. Maybe I should go and address it.
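Mechanically, that semantic graph is just embeddings plus a similarity threshold. A minimal sketch, assuming you have some embedding model available (the embed function and the 0.85 threshold are placeholders, not the actual pipeline):

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder for whatever embedding model you use; assumed to return a unit vector."""
    ...

def similarity_edges(prs: dict[int, str], threshold: float = 0.85) -> list[tuple[int, int]]:
    """Add an edge between two PRs whose title+body embeddings are nearly identical."""
    ids = list(prs)
    vecs = {i: embed(prs[i]) for i in ids}
    edges = []
    for a_idx, a in enumerate(ids):
        for b in ids[a_idx + 1:]:
            if float(np.dot(vecs[a], vecs[b])) >= threshold:  # cosine, since vectors are unit length
                edges.append((a, b))
    return edges

def pressure(edges: list[tuple[int, int]]) -> dict[int, int]:
    """Edge count per PR: many near-duplicates means many people want the same thing."""
    counts: dict[int, int] = {}
    for a, b in edges:
        counts[a] = counts.get(a, 0) + 1
        counts[b] = counts.get(b, 0) + 1
    return counts
```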
There are evals, surprisingly. After all this refactoring work, we decided to make a fake Slack of sorts, with both synthetic models and real models, so we can run evaluation loops to check that each of the providers and channels works.
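Shape-wise, that harness is a matrix of providers times channels with a scripted turn and an assertion. This is a generic sketch under that assumption, not the actual OpenClaw eval code; send_through and the lists are placeholders:

```python
PROVIDERS = ["anthropic", "openai", "mistral", "synthetic-echo"]  # real and synthetic models
CHANNELS = ["fake-slack", "discord", "telegram"]                  # illustrative channel list

def send_through(provider: str, channel: str, message: str) -> str:
    """Placeholder: route one message through a provider/channel pair and return the reply."""
    ...

def run_matrix() -> dict[tuple[str, str], bool]:
    # One scripted turn per provider/channel combination; pass if the reply comes back sane.
    results = {}
    for provider in PROVIDERS:
        for channel in CHANNELS:
            try:
                reply = send_through(provider, channel, "ping: reply with 'pong'")
                results[(provider, channel)] = "pong" in reply.lower()
            except Exception:
                results[(provider, channel)] = False
    return results
```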
And this question was asked of me recently: how do you manage 10-plus agents? And that's something to think about. I asked them back, how do you manage 10-plus staff? And they had no answer for me. I'd worked in large organizations, like airlines and other places, managing large AI teams. I had experience managing up to 30 or 40 people, plus. So for me it was not a new paradigm. But I think for
engineers and people working with these
coding agents at scale, it's the soft
skills that matter. It's how do you ask
your agent what's going on? How do you
know when they're not bullshitting you?
And how do you run that factory? So it's
no longer about the model or the agent.
It's about the process. Uh 2025 was
about token maxing. 2026 is about not
wasting them. It's about token
efficiency. It's about agent in the
loop. Thank you.
>> Thank you ever so much indeed. All
right. Uh, I'm going to invite Radic to
come up and get plugged in and organize
as we're uh as we head straight on to
our next talk. Radic, are you are you
ready and armed with a laptop? Here he
comes. Exactly. Yeah, come on in and
make yourself comfortable while we while
we chat for a minute. Um, so, uh, okay,
I'm really curious about this talk
because one of the things that I'm very
interested in is kind of handing over
ownership of all of well, ownership,
maybe the wrong word, uh, permission,
trust uh, to an AI assistant to help me
actually achieve things while I sleep,
while it's while I'm not attending it.
Uh, and Radic has certainly done that. Um, he's another maintainer, uh, working on OpenClaw. Um, and he has gone down the
rabbit hole of giving the keys to his
life to an AI agent so that it can take
all kinds of actions on his behalf. Um
I'm curious to know where the boundary
sits with, you know, where where trust
lives and uh how much you can uh
abdicate responsibility, hand over uh
the keys to an AI agent. So uh that's
what you're going to talk about, right?
Are you good? Are you ready? You set up?
>> Uh I think so. Yeah. Yeah. Yeah. I
should be
>> okay. It's just almost okay. Super. So,
um, yeah, as before, please give him a
giant R. No, no, he's not ready. He's
not. He's making sure that the AI agents
really do have the access to the things
they need. Um, while he does that, um, I
mentioned earlier on very briefly that
if you want to give a talk, we'd love
you to give a talk and there's a chance
to give a talk tomorrow. Um, I rattled
through those details very, very quickly
as everyone darted out for a cup of
coffee. So, while we just have a moment,
I'll just reiterate that there's a
section tomorrow afternoon uh where one
of the tracks uh does uh allow for
10-minute lightning talks from any of
you who feel like you uh you fancy it.
We love hearing from voices that haven't
already been on the stage. Um so, if you
wanted to go into the AI engineer Slack,
uh you can find details of how to submit
uh a a session there. 10-minute
lightning talk. There's going to be a
vote, I think, towards the end of today.
I guess it would be before the end of
the day so you'll know before tomorrow
otherwise you'll be looking at the
schedule and you'll be you'll hear your
name announced and you'll think okay
here we go. So you will hear beforehand.
Um but that's all happening in the AI
engineer Slack. So check that out later
on. Ready to go?
>> All good. Yeah.
>> Please welcome Radik everybody.
>> Cool.
>> Hey I'm Radik. I'm one of the open claw
maintainers
and I want to talk about what happens in my life with OpenClaw when I practically gave the keys to my life to OpenClaw, almost literally, and what that actually means. So this happened step by step, it wasn't all at the same time, but it can access my emails. It can access my notes, files, calendars, tools, my operating system, so automations, and it builds on top of a memory of everything that I do at the computer. So it can do anything with that, anything that is possible to do with a computer. But it didn't all happen in one big leap, like I install OpenClaw and now it just controls my life and does everything for me. That would be silly to do, or even silly to expect that it could even work.
So what happened is that I tried installing it just like everybody does, with just one channel. At the beginning it was just WhatsApp, then I migrated to Telegram, now I'm on Discord, but it was just WhatsApp, just the one ability to chat. Okay, so we're there, what's next that we can do? Let's do one simple workflow, or one very simple task. Once we're there, let's go to the next step. So this is how it happened, where I am today. I used to think that I have quite a simple setup with my OpenClaw and what it does, because I never did any big change. But when I encounter different Twitter threads, YouTube videos, or talk to other people about how they have it set up, I see that my setup has everything they have and more on top of that, and most of it is also more sophisticated than what I see out there, which was really surprising to me, because it felt like just one small step at a time. I have a pretty simple setup that works for me, but that's what I want to show: how that happened and how it looks today. So you already had a lot of talks about how the sausage is made, how we are making it better. You'll have more talks about the insides of OpenClaw. I want to show how it looks from the other side: first the simple user, then power user. Now I'm also a maintainer.
You don't have to go the maintainer route, but when I was playing with one of the workflows I encountered some errors, submitted a first PR, then a second PR, then looked into Discord, and then you just get involved. Now I'm a maintainer there, so that also was one step at a time. So these are the steps, the way it usually happens: I see the need, I solve it in a very simple way, and then I add more steps to it. And this is also why I usually don't have the big issues that people have, like okay, now it broke my computer, or it completely bricked during the update, because I take all these small steps. If something breaks, I just take one small step back, fix it, see what doesn't work, understand why it didn't work, have a setup so that it never happens again, and just take one step further again. So, where it
started being
more and more helpful, and kind of running my life, is when I gave it my knowledge base. I had a lot of stuff in my Obsidian, which I built up over years. Right now I have about 3,000 pages or notes, markdown files, in my Obsidian. And this is everything: work stuff, personal stuff, tasks, projects, research. What else? Articles, kind of like an inbox of links that I'm just putting there, and it then finds the connections and puts them in perspective and in context with other stuff that I have. So all of that is now accessible through my OpenClaw with very good search. I have search and memory: I have normal search, I have QMD search for Obsidian, I have different memory for my workspace, and all of that is interlinked, and that's where the magic happens.
when I saw recently
and that that's where it hit me that I
probably don't have a simple setup. When
I recently saw Andre Karpat's tweet that
went viral uh where he says about LLM
knowledge bases, I was reading that and
it's just like yeah, that's exactly what
I have. like what's like super uh
revolutionary about it and then I I I
understood that okay so I got there step
by step it works for me so it's probably
probably worth I don't know sharing
sharing telling more about it u showing
how it works showing how you can get to
that point as well uh and uh for example
for Obsidian um
do
Yeah, this is the real screenshot of my vault and all the nodes, and these are different clusters. Some are probably project related, like the big clusters. Some are one-offs; these are probably more like bookmarks.
And one of the tasks that I have is that when I add something to the inbox, it takes that link I added, looks at what's there, it could be a tweet, a thread, an article, a YouTube video, analyzes it, adds tags to it, adds context to it, looks at what's already there on this topic in my vault, how it could be helpful in other areas, and adds connections to it. So what previously was just Twitter bookmarks that you bookmark and never go back to, now it adds more context, builds up my knowledge base, and is much more helpful, even surfacing things for me when I add a bookmark: okay, so you already had this and this and this about this subject, and this is how it connects, maybe you should look at those notes. And very often it's just, yeah, I completely forgot about that, and that's a good source of knowledge and of thinking about it, because there was a reason why I'm adding this bookmark.
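That inbox task is basically fetch, analyze, tag, and link. A rough sketch of the shape of it, with the vault path, the fetch and analyze helpers, and the note format all standing in for the real automation:

```python
from pathlib import Path

VAULT = Path.home() / "Obsidian" / "vault"   # placeholder path

def fetch(url: str) -> str:
    """Placeholder: pull the tweet, thread, article, or video transcript behind the link."""
    ...

def analyze(content: str, existing_titles: list[str]) -> dict:
    """Placeholder LLM call: return a title, tags, a summary, and which existing notes relate."""
    ...

def process_inbox_link(url: str) -> Path:
    existing = [p.stem for p in VAULT.glob("**/*.md")]
    result = analyze(fetch(url), existing)
    note = VAULT / "inbox" / f"{result['title']}.md"
    # Write a note with tags, a summary, and wiki-links back to related notes,
    # so the bookmark lands in context instead of rotting in a flat list.
    links = "\n".join(f"- [[{title}]]" for title in result["related"])
    note.write_text(
        f"tags: {', '.join(result['tags'])}\nsource: {url}\n\n"
        f"{result['summary']}\n\nRelated:\n{links}\n"
    )
    return note
```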
So that's where it's starting to be super useful. On top of that, at 4:00 a.m., and 4 a.m. is just an example, it happens probably between 3 and 6 more or less, this is what is happening while I'm sleeping. When I'm sleeping, my agent does everything so that it runs well. It indexes everything. It backs everything up, so that worst case, if I lose something, I lose maybe a couple of hours of work, of content, of anything else. It refreshes all the indexes, for QMD, for memory, for my Obsidian vault, and I start fresh in the morning with whatever waits for me. Maybe a summary of the emails and the calendar, everything updated, the latest OpenClaw version waiting for me, which also took shape step by step. I have some scripts around it so that it knows what to do and what not to do when updating, what can break, why it breaks, how to verify it before updating or before restarting your gateway, so that it is able to come back online again. So all of that is also automated, and as I get up, it's already waiting for me, fresh and ready for me to start the day. And each OpenClaw is different. I'm not a big fan of sharing my exact setup, because that exact setup is very specifically for me, for what I need right now, for what I will need in the near future,
for the errors that I encountered, for issues that I want to be solved. But to give you some idea, so that we can also talk about specifics and not just in general, these are five areas, or five types of jobs, that my agent is doing. The first one
is ambient operations. This is what I just showed you. It does all the updating, it does all the plumbing, it does all the stuff that needs to happen but that I don't need and don't want to think about.
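The ambient-operations job is essentially a nightly checklist. A sketch of its shape only; every command and path below is a stand-in, since the real thing is a set of scripts the agent runs, with its own pre-update verification:

```python
import datetime
import subprocess
from pathlib import Path

WORKSPACE = Path.home() / "openclaw-workspace"   # placeholder
BACKUPS = Path.home() / "backups"                # placeholder

def nightly() -> None:
    stamp = datetime.date.today().isoformat()
    # 1. Back everything up, so a bad night costs at most a couple of hours of content.
    BACKUPS.mkdir(exist_ok=True)
    subprocess.run(
        ["tar", "czf", str(BACKUPS / f"workspace-{stamp}.tgz"), str(WORKSPACE)], check=True
    )
    # 2. Refresh the search and memory indexes (QMD, memory, Obsidian vault).
    for index_cmd in (["rebuild-qmd-index"], ["rebuild-memory-index"], ["rebuild-vault-index"]):
        subprocess.run(index_cmd, check=False)   # placeholder commands
    # 3. Update only if the pre-flight checks pass, then restart the gateway
    #    and let it come back online before morning.
    if subprocess.run(["verify-before-update"], check=False).returncode == 0:
        subprocess.run(["update-openclaw"], check=False)
        subprocess.run(["restart-gateway"], check=False)
```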
The second is attention filtering. This is also super useful, because it has access to everything, and because it has all the context, it knows, for example, when an email comes in, whether it's something important or urgent, and it knows from Obsidian the context and the background behind it, because I keep everything in Obsidian, about projects, about everything else. So it can proactively tell me things. I think I have here, yes, three very specific examples that I had recently, where the system noticed that something was important and urgent and just let me know. A Netflix payment failure, for some reason it didn't go through, was fixed within five minutes of it happening. A domain renewal coming up: I would probably have missed that email, but it picked it up, gave me a message on my Discord, and renewed my domain.
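The filtering step is just "classify with context, then ping me." Roughly this shape, with every helper a placeholder rather than my actual setup:

```python
def vault_context(subject: str) -> str:
    """Placeholder: pull related notes from the Obsidian vault for this sender or project."""
    ...

def classify(email_text: str, context: str) -> dict:
    """Placeholder LLM call: returns {'urgent': bool, 'reason': str, 'suggested_action': str}."""
    ...

def notify_discord(message: str) -> None:
    """Placeholder: post to the general channel on Discord."""
    ...

def handle_incoming_email(subject: str, body: str) -> None:
    verdict = classify(body, vault_context(subject))
    if verdict["urgent"]:
        # Failed payments, expiring domains, that kind of thing: surface it
        # immediately instead of letting it sit unread in the inbox.
        notify_discord(
            f"Heads up: {subject}: {verdict['reason']}. Suggested: {verdict['suggested_action']}"
        )
```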
And emails: with enough context given about the project, for example, it can already read the email, understand what's happening, understand what's already done within the project, and just draft the reply, and it's already in the drafts folder for me to accept, or delete, or make some changes. So those are some examples of attention filtering and execution support. That's drafting and synthesizing the inbox. And these on the right are the channels that I have in my Discord that more or less relate to these types of jobs. So
general is where I have everything. I just start conversations, see where they go, and if I have a certain type of conversation enough times, I add a specific channel for it. These are real screenshots from this morning. The inbox is where I just drop links and it builds the knowledge base for me. Consulting is for the clients and all the background: it knows all the projects, all the quotes, deadlines, tasks, next steps, everything else. Video research is for YouTube, for researching what's out there to help me with the next episode. Briefing is for morning briefings. Instagram is for social posting. YouTube is for creating the videos. OpenClaw is for maintainer stuff. And there's also one playground channel, which changes depending on the day, the month, or the need. It's for testing. I usually test maybe a different model, maybe a different workspace, a different way of setting up the important files, like memory and everything else. I just play there, see what works. If something works, I promote it. If it doesn't, I discard it.
And all of that works because it's a system that has many moving parts that work well together. The LLM is for judgment: understanding the email, understanding the context, making the connections. Then there are all the files, the tools, the scripts that I have built. The scripts are just: if this happens, do this, done. You don't even need judgment, so the LLM is even skipped.
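That split is easy to picture as a tiny router: deterministic rules fire first, and only events that actually need judgment cost a model call. A minimal sketch under that assumption:

```python
def llm_judgment(event: dict) -> str:
    """Placeholder: ask the model what to do when no deterministic rule applies."""
    ...

# Deterministic rules: "if this happens, do this." No judgment, no tokens.
RULES = {
    "backup_finished": lambda event: "log it and move on",
    "index_stale": lambda event: "run the reindex script",
}

def route(event: dict) -> str:
    handler = RULES.get(event["type"])
    if handler is not None:
        return handler(event)      # script path: the LLM is skipped entirely
    return llm_judgment(event)     # judgment path: context, connections, decisions
```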
An important thing is also to optimize your memory file and your SOUL.md file. I also have a critical rules MD, because even if I had something in AGENTS.md or in SOUL.md, it still managed to forget something or not do something. Having critical rules helps, and so does having it mentioned quite high up in the agents file. So that's also an improvement. I went through a few different setups of memory, where I had one memory file. Now I have a whole memory folder, and now we also have dreaming, where we promote memories. So it's important to work on these files, but it's easy to do in OpenClaw because everything is inspectable: these are markdown files, editable, you can look at them, you can read them, you can understand them, and it works well. What gets harder? Bad memory compounds. If the memory is not set up
correctly and your vault, your notes, your memories grow to thousands, you're going to have an issue, so you need to actively work on that. Brittle automations, especially when they're 10-step automations, can break, and they probably will break at some point. So again, either split them up into simpler ones or have some guard rails that are more effective. Noisy notes: I'm getting rid of them, cleaning regularly. And weak boundaries. So those are the SOUL.md and everything else, the files that are important to optimize for your needs.
So what I want you to take away from this is: do what I did, and at some point you realize, yeah, this stuff is awesome and this stuff helps my life. Start with one recurring pain, grow trust incrementally, build the knowledge base, move everything, or as much as you can or want, to markdown files, and start making those connections. Keeping the system inspectable is easy; it's done for you with OpenClaw. And optimize for the future you. And this is what I want to close with.
So a couple of years ago I had an article about the past me, the present me, and the future me. The past me is just this completely stupid guy. He does nothing. He's lazy. He doesn't want to do anything. So now the present me needs to do everything for that past me. And the future me, the future me is some kind of god creature. It can do anything, that creature is all-powerful, and if I don't do something today, it's fine, that other creature will do it for me. So that was the issue, and the job for me is to become friends with the future me, to treat that as a person that I want to help, and that's the job of the agent. So I don't need to do as much as I used to, because the agent helps the future me as much as possible, so that when I wake up tomorrow, as much as possible has been done, but by someone other than me. So that's the whole purpose of this setup, at least for me. I don't know, it could be different for you. So that's what I want to leave you with. Thank you.
Radic, thanks ever so much. Another
quick round of applause, I think, for
Radic Shankovich. There we go.
All right, so we have Sally Omali now, just getting comfortable, getting set up on the stage. Um, this is another hot topic, you know, as
we're putting more and more work into
building out our agents and of course
that comes with lots of different files,
lots of different bits of
infrastructure, lots of tools that exist
in our local environments as we're
building those. How do we then make
those portable so that we can share
those with teammates, have them deployed
and all the rest of it. So, uh, luckily,
uh, Sally Amali, who works as a a
principal software engineer at Red Hat,
uh, has been doing just that, and she's
going to be able to talk to us, uh, way
more articulately than I about that
subject. So, Sally, are you good?
>> All happy.
>> Is my mic, my mic's on. Yeah,
>> sounds good to me. Okay, if you're
ready. Okay, a big round of applause,
please. Are you good
>> already? You don't even know me. Okay,
Sally, platform me. Gesh. Hey, um, I'm
Sally. I work at Red Hat. I've been
there for about 10 years. The first seven years, awesome, totally cool: I was working on containers and Linux security stuff and Kubernetes, I'm big time into OpenShift, that's what I did for the first seven years. And then, about three years ago, well, about five years ago, I moved to the emerging tech org, and
that was awesome too because now I'm not
totally tied to a product I get to just
work on what I
I get to just try out new things.
Awesome. And then about three years ago,
it was like all AI all the time.
Everything AI. I knew there was a data science team at Red Hat. I had no idea what they did. Machine learning, something something. Um, so I
you know started doing AI and uh yeah it
was a lot of Python and Markdown. Every
single thing was like okay another
chatbot more Python more markdown. Um
but uh here we are today and what a what
a crazy awesome world we're in. Um so
The first time I came across OpenClaw, I was home for a week, on like a staycation, took a few days off, and Moltbook happened, and I was like, what the... what is this? I'm totally trying this. And so I went and found it on GitHub. First thing I do is look at the license: MIT, awesome. OpenClaw, I'm like, I'm so gonna install this on OpenShift right now. And so for the next few days I just kind of built the image, ran it locally in a container, put it on OpenShift, just played around with it, went back to work. I'm like, "Guys, check out OpenClaw. This is so cool."
And a couple of people on Slack are like, "It's a security nightmare. Do not use OpenClaw. Don't put it on the work laptop." I'm like, guys, what have I been doing for the past 10 years? We can take any application and run it securely. That's what Red Hat is. If we can't take an application and run it securely, come on. This is our golden opportunity to show everyone. And so Red Hat's coming around to that. Um, but yeah, so
this talk is about me running OpenClaw in containers, and so I wanted to get a list of why running in containers is the way to go. I run everything in containers. It's kind of foreign to me to just run something natively. It's messy. It just puts stuff on my computer that I have to clean up later. I don't like it. So, that's one thing. And I asked my forever claw. I guess I have to introduce my forever claw, because she's coming through this whole talk. So, an aside about my forever claw: I have two sub agents. I have Joy. Does anyone know Jyotish astrology?
Sheesh. Every time I ask, no one knows what it is. This is very scientific astrology. So she's an astrology expert, and she gives me my weekly readings, my birth chart, all of that. So Joy, and then my second agent is Bruno, and he gives me daily briefings on the Bruins. We're heading into the playoffs and it's a close race, so I want to make sure the Bruins get in. So that's my forever claw. And I asked her, you know, why should we run you in a container? And she said all of that, if you were reading: it's reproducible, you can isolate your secrets, it's portable across infra. I can run it on my laptop, I can run it on my x86 box, I can run it on my Mac, I can run it in Kubernetes. It's backed by volumes, which gives a really nice story for backup and recovery, because I love my forever claw and I back her up every night with, um, a systemd service, whatever it's called on Mac. And you just get that natural sandbox when you run something in a container. That's what it is, and you have to be very explicit about what you give access to from the host. So yes, she loves running in a container, that's all you need to know. It gives her a clean, predictable environment. She doesn't have to worry about OS quirks or stale dependencies. This is literally the definition of why you should run everything in containers.
And just quickly, we're not going to read this, but this is Joy. My horoscope for today, for giving a talk, is excellent. It's like a very auspicious day to talk. So yeah, that's why this talk is going awesome so far.
And my daily briefing: Geekie is finally waking up. He had a bit of a lull, and he's finally, you know, ramping up for the playoffs. So it looks like the Bruins are going to be looking good; they're in. So, yeah. Another thing that containers allow is that you can set up a whole agent directory, with maybe some tools, some skills, some MCP servers you run. You can keep those in a directory and mount that whole thing into your container, so at startup everything's just up and running. So I do that as well.
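That "mount the whole agent directory" pattern is just a bind mount at container start. A sketch of the idea; the image tag, paths, and port are placeholders, and while this is Podman-flavored, Docker takes the same flags:

```python
import subprocess
from pathlib import Path

AGENT_DIR = Path.home() / "agent-home"   # tools, skills, MCP server configs, prepared ahead of time
IMAGE = "localhost/openclaw:latest"      # placeholder image tag

def run_openclaw() -> None:
    # Bind-mount the prepared agent directory so everything is in place at startup.
    subprocess.run([
        "podman", "run", "-d", "--name", "openclaw",
        "-p", "8089:8089",                       # placeholder port
        "-v", f"{AGENT_DIR}:/home/agent:Z",      # :Z relabels the mount on SELinux hosts
        IMAGE,
    ], check=True)

if __name__ == "__main__":
    run_openclaw()
```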
At the end of this talk I'll show you how I install, and I think this is a reminder to me. Oh no, let's talk about secrets. So I run everything with Podman, not Docker. In theory you can do anything with either Podman or Docker, except Podman has this really cool feature called Podman secrets, and you can save your API keys there. I'll show that off the slides later. You can save your API keys to a Podman secret and then you mount that secret into the container. And so it just gives you this separation: your secrets, your API keys, are then just a ref back to the secret. And with OpenClaw, what's really cool is there's like a double of that, because in OpenClaw there's a secret ref feature, and I also use that. So my API keys are a pointer, a secret ref, to the outside secret. That's not perfect, but it gives me some peace of mind that I'm not going to be showing my API keys in the logs and everything. And then very similarly, Kubernetes has Kubernetes secrets. Same thing: instead of just a straight env var, you have a secret ref to an env var.
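The flow she's describing looks roughly like this: create a Podman secret once, hand it to the container, and let the config hold only a reference. The secret name, image tag, and the exact OpenClaw secret-ref syntax are illustrative, not copied from the project:

```python
import subprocess

def create_secret(name: str, value: str) -> None:
    # `podman secret create NAME -` reads the value from stdin, so the key never
    # lands in shell history or in a plain environment listing.
    subprocess.run(["podman", "secret", "create", name, "-"], input=value.encode(), check=True)

def run_with_secret() -> None:
    # Hand the secret to the container; by default it appears under /run/secrets/.
    subprocess.run([
        "podman", "run", "-d", "--name", "openclaw",
        "--secret", "anthropic_api_key",
        "localhost/openclaw:latest",     # placeholder image tag
    ], check=True)

# Inside the OpenClaw config, the API key is then just a pointer to that secret
# (illustrative shape only, not the exact config syntax):
provider_config = {"anthropic": {"api_key": {"secret_ref": "anthropic_api_key"}}}
```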
And this is my reminder to show you how
I install my containers. At the end, I
have a really cool tool. I built it just
for me with everything that I need to
run containers. I'm not pushing it on
anyone, but it's in GitHub and at the
end I can uh let you know where that is
so you could try it if you want.
So, I think we're heading to a world where these agents, these AI workloads, whatever, are going to be running everywhere. I hope we all can see that. And my vision is for everybody's open claws to be running everywhere and communicating with each other, especially for business use cases, real things, not astrology and Bruins. That opens up the same need you have to run any application that way: security, and how to do it at scale, and that's what Kubernetes gives you. What I always do is develop something locally and then lift it to Kubernetes, and the same story holds for AI workloads or OpenClaw. And I was at PyTorch Con yesterday, and my friend from Nvidia said I could share this: they are running their model evals with OpenClaw. They have about 10 engineers, they each have their open claws running in Kubernetes, periodically just checking in with the model evals, and it works so well for them. He said it was like, you know, doing the job of six engineers with just himself.
Now, let's just talk about that for a second. We're not all losing our jobs, people. That's not happening. What that is enabling for his team is they get to do fun stuff, interesting stuff. They get to do creative things. And this is what AI is giving me and my team: we can focus on those outside-the-box crazy things, and you don't have to do the tedious code anymore. I haven't written code in a few months. And this did just happen, probably less than six months ago. I was using AI and I was like, you know what, this is way better than me at writing code. And I announced that to my team. We had an org meeting and I'm like, "Guys, if you're not using AI for everything, you're missing out. This is 1,000 times better than me at writing code." And some of the top engineers at Red Hat definitely raised eyebrows, and I could tell from their comments after that they were like, no way. I'm like, yeah. So yes, it's enabling us to just dream bigger. And this is my reminder to show you the Kubernetes side of my installer later.
And yeah, so backup and recovery is a nice clean story when you run in containers too. The state is the same: the volumes. Another nice thing about Docker and Podman is that there are volumes, so all of my runtime state lives in a nice contained Podman volume. And of course, Kubernetes has PVCs. That's kind of what I just talked about.
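Because the runtime state lives in a named volume, the nightly backup can be a single command. A sketch, with the volume name and paths as placeholders; podman volume export writes the contents out as a tarball, and podman volume import restores it:

```python
import datetime
import subprocess
from pathlib import Path

VOLUME = "openclaw-state"                # placeholder volume name
BACKUP_DIR = Path.home() / "backups"

def backup_volume() -> Path:
    """Dump the named volume to a dated tarball."""
    BACKUP_DIR.mkdir(exist_ok=True)
    out = BACKUP_DIR / f"{VOLUME}-{datetime.date.today().isoformat()}.tar"
    subprocess.run(["podman", "volume", "export", VOLUME, "--output", str(out)], check=True)
    return out
```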
And so this would be my vision of a workplace setup for open claws, where you maybe have a nice curated baseline open claw, so that as a new hire you just get your base. And what does that have in it? It has your list of company-approved MCP servers, your authentication that is approved through your company, all of the skills that are very specific to your team, maybe access to your Google Drive. All these things that you use every day at work, you can take that and just fan it out across your whole team, and then you can personalize it as the individual. And that's what this setup allows.
The alternative would be, you're a new hire and you sit next to somebody or get somebody's repo and kind of put it all together yourself. And so yeah: team standards, portable environments, reproducible onboarding. That's my vision for OpenClaw in the workplace in the future. I actually just recently created my forever claw. It was like a month of me helping out with OpenClaw and feeling like I don't even run a real open claw myself. Throughout the day I'm just constantly spinning it up, spinning it down, testing it, building it. Every hour there's like a hundred new commits, so I'm constantly pulling from main. I was at PyTorch Con yesterday and hadn't pulled from main for a couple of days, and when I did, it was like 10,000 commits. No joke. It was crazy. I'm like, I don't know what you guys are doing. Slow down. Not really, we don't want to slow down. So yes, that's the story.
And I've got four more minutes. I am
psyched because I can now switch over
here.
So, in order to run this local installer here, which I think I have here, yeah, it's just an npm run dev. Now, the one thing I don't like about this is that when I'm on my Mac, I can't run this in a container. I think I can, I just haven't taken the time to figure out how to spawn a container from a container.
You can do that if you're on Linux
because Linux is awesome, but on your
Mac, that's not possible because if you
don't know, whenever you're running a
container on your Mac, you're running in
a virtual machine. Same with Docker.
Containers only run on Linux. So, when
you're running a container on your Mac,
you are always running in a virtual
machine. Docker sets up one and so does
Mac. So, it gets a little tricky when
you want to take a container and spawn
another container from it. But anyways,
here we go. So, if I wanted to run a
local instance, and I have a couple
running now, just, you know, you never know what the demo gods are up to. So, just in case it doesn't work, I'm gonna spin up Joe.
All I do to set up my pod is I just give it a name, and then all of these options are very opinionated, because I'm telling you this is exactly what I need. So if you like it, use it. If you want to change it, then submit a PR. Cool. Now, the port is usually 89. That's the default, but since this is my second one that I'm running on my machine, I'm just bumping it to 99. These Podman secret mappings I wanted to show you here. So, you can see I have these set up already. They're just on my system. They're like env vars, but they're not env vars, because they're contained. These are my API keys. And what happens with this installer, if you're on Docker, this should work with Docker. It's got Podman written all over it, but I've designed it to work with Docker too. So if you're on Docker, it takes the env vars, so you want to export those as env vars, and makes them OpenClaw secret refs. Very cool feature of OpenClaw. I definitely enabled that: for every credential, create a secret ref. It creates that separation of running your secret within OpenClaw versus kind of just a pointer to it. It's the way to
go. And then uh your providers. So, I'm
going to start with open router because
I have been playing with Gemma, and Gemma's great. And then as a fallback I'll use Anthropic. Sure, why not? But oh, here are some other choices though. You can have your local endpoint if you're running your own; you could just add that too. And then, because I do observability at work, I was like, I'm gonna give the option to set up an OpenTelemetry collector with Jaeger, and it works and it's awesome, but I'm not going to test it, so let's not tax my system. Another feature... how much time? Oh, I've got to hurry. Another feature is the SSH sandbox. Here, I'll deploy. The SSH sandbox in OpenClaw is super cool: you give it SSH keys and known hosts to wherever you want, and it runs all of its commands in that workspace. It's really cool. So look, I just spun up a
Podman container and if I go over to the
instances,
I now have Joe and there's logs
for Joe, the gateway logs.
Um the command, I wanted to show you the
command. I don't want to forget that.
So, here's the podman command. If you
were running Docker, it would be a
Docker command. Have I tested this with Docker? No. I have a friend who works at Docker. He's awesome. He told me he would try this out and make sure it works with Docker, too. He also created this very cool project called Infer RS, which takes Gemma and runs it really, really, really fast and uses Turbo Quant. So yeah. Anyways, that's Eric. So that's my Podman command. And
here he is. Joe.
And if I just do like models, I'll do
status.
So like people say it's hard to spin up
open claw. That took two seconds and I
was babbling through the whole way. It
could have taken one second. Uh, so I
can say, "Hey, um,
and the cool thing is I don't have time
to show you because I talk too much, but
the agents are all set up. I've got Joe.
Oh, not that one. Hold on. I've got to go over to Larry. Larry I started with an MCP server and a sub agent, all through that form.
So, uh let me go back to Joe.
I wanted to show you how easy it is just
to switch models in case you didn't
know.
I'm not sure if the GPT5 hopefully it
knows it's just GPT 5.4. No, I I didn't.
No, no, no, no. We got to go over to
Larry because I didn't set up G I didn't
set up that extra model with with Joe.
Here we go.
Anyways, um
I didn't have enough time to go through
everything I wanted to go through, but
the uh
cool
the other thing is Kubernetes and you
can do the same thing with Kubernetes
just as easy.
It just uh it's it's connected right now
to my kind cluster. And if I go over, I
can access my Kubernetes claw very
easily as well.
Um
there's Carl. He's running in Kubernetes
and I can access one in Open Shift.
There's uh it switches over to Open
Shift if you're connected to Open Shift.
So yeah, run anyone gonna run Open Cloud
container now. Try it. Yes. Awesome.
Okay, cool. Uh, thank you very much. Uh,
is someone on after me? You're waiting.
Okay, bye.
>> Sorry.
>> Thank you so much, Sally. Sally Ali,
thanks. That's great stuff. I love the
uh the the slightly uh teasing. Anyone
on after me? Could I just I could just
keep going. Uh, sorry, Sally. No more
time today. But uh look out for Sally,
see if you can grab her. Um she loves
talking about this stuff and she's got
lots to talk about. So uh so you know it
goes for all of the speakers, you know,
find them around the the event and have
a chat. They do love to share what they
know and hear from you as well. Um okay,
so this is our our last talk of the
session before we'll have a break for
lunch. Um and earlier on in the day we
talked about we talked about trust and
we talked about kind of uh abdicating
responsibility was maybe not the right
expression but you know certainly kind
of giving over trust and giving over
access to things. Um, Nick Taylor, who's a developer advocate at, uh, Pom... Pomerium. I almost did it. I almost said Pomeranian, and now I have done. So any goodwill that I would have earned has now been lost. So apologies for that. But Pomeranian... Pomerium do deal with, oh my goodness, do deal with exactly this, deal with trust and access to things. So, um, Nick's going to
be able to talk in detail about you know
how you can how you can control that. So
securing securing and building with
openclaw um our last talk of the
session. So I know you've held some
applause back so give it up please for
for Nick.
>> Yeah. So, like Phil said, I work at Pomerium, and he's not the first person to have trouble pronouncing it. So, I actually convinced the marketing team to create Pomeranian stickers. So, if anybody wants Pomeranian stickers, I have a bunch with me. A bit about me: I'm a dev advocate over at Pomerium. As Phil said, I'm from Canada, hailing from Montreal. So, if anybody likes poutine and bagels, feel free to chat with me after. I'm also a GitHub Star, Microsoft MVP, and AWS Community Builder. And you can pretty much find me everywhere at nickytonline. I was pretty happy to see that there's a pretty sizable on-prem instance of OpenClaw, so I'm pretty happy with that. And it looks like that's the operator there.
Cool. So, I don't know, I came up with a funny title, I guess, but: claws out. We're going to talk about a feature I contributed to the OpenClaw project back in February, and it's about hardening access to the control plane. So, I'm assuming everybody here is running an OpenClaw or is open-claw curious. Is anybody running a mode called trusted proxy auth mode? You might not be, but, okay, you might be. Who's on the token auth? Okay. Anyways, so at Pomerium where I work, you know, I'm always just trying to secure things. That's just part of what I do. And I was able to secure OpenClaw, but it meant I still had to add a token for the websocket connection, and I had to always pair my device and stuff. And you don't really need that with a trusted proxy, specifically the one that I work on, which is open core. It's called an identity-aware proxy. So if anybody's ever used GCP, there's one in there. It's called an identity-aware proxy, something that came out of Google. Essentially,
policy engine, and a reverse proxy. So
those uh it's not the lethal trifecta in
the sense that you usually hear, but uh
it's a pretty solid security approach
for securing internal apps. So I was
like, of course, I I I kind of got
annoyed that I had to add this token
still and do the pairing every time. Uh
I understood why they were there, but uh
I just proposed this issue and then
uh at least one other person who uses
Caddy uh chimed in and said, "Hey,
that's uh sounds like a good idea."
And then Peter uh Stipe was like, "Yeah,
let's let's work on this." And he laid
out like the criteria that he wanted to
have for this feature. So I went ahead
and worked on it. And yeah, again, prior
to trusted proxy off mode, even if you
were secured by a proxy, you still had
to paste in that O token in the UI for
the websocket connection. And also it
sticks it in the query string which uh
obviously like this is really more for
just only local mode really. Um and
still having to pair the device like uh
I don't know if people get annoyed by
pairing the device but uh I you know I
I'd just be on my phone after I just set
it up and then I was like h I got to go
to the other thing to set it up. So um
basically you still had to do those
things even if it was secured with uh a
proxy.
So it got merged in, and I felt pretty good about it, and it was nice to get some praise from Peter. It was my first contribution to the project, so it was very cool. So what does it look like exactly in the config? I'm just going to show a kind of narrow part of the config here, but you have your gateway, and essentially you no longer need the token, like I mentioned. The mode is obviously different; it's called trusted proxy now. And then there are some new properties you have to add. There's trusted proxies, and this is essentially the proxy that is gating access to the control plane, the gateway. It's the IP addresses; it could be one or more. And aside from that, you have to have a trusted proxy section. So you'll have a user header, which in my case is a JWT, a "jot". And then there's a required header section. There are some optional ones too, it depends what you want to do. There's allowed users, and in my case I don't need the allowed users, because the way an identity-aware proxy works is that the policies dictate that. But essentially that's kind of the big change there. And you can do this through the onboarding, or you can just go back in and configure things through the TUI.
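As a rough picture of the shape he's describing, written as a plain dict because I don't want to misquote the exact OpenClaw config keys; the field names are paraphrased from the talk and the header is the one Pomerium happens to set:

```python
# Paraphrased shape of the gateway auth section for trusted proxy mode.
# Key names are illustrative; check the OpenClaw docs for the exact schema.
gateway_auth = {
    "mode": "trusted-proxy",                     # instead of token auth
    "trusted_proxies": ["10.0.0.5"],             # IP(s) of the proxy gating the control plane
    "trusted_proxy": {
        "user_header": "X-Pomerium-Jwt-Assertion",   # header carrying the JWT in this setup
        "required_headers": {},                  # optional extra header checks
        # "allowed_users": [...],                # optional; proxy policies can handle this instead
    },
}
```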
And yeah, so that just meant no more
token for websocket connections and no
longer needed to pair devices. So not
only are you uh getting uh better
security posture potentially, uh to me
it's like a UX win as well because I
really found doing these two things
annoying. Um cool. Uh I also just want
to give a shout out to a couple
contributors after I contributed this.
Um, there was a bug and Anthony reported
it and then uh Sid fixed it and
it was definitely something I missed
because I basically was testing this on
my local environment and I already had
something paired so I didn't run into
the issue that uh Anthony had mentioned.
So uh luckily uh it was a small fix and
uh Sid got that sorted out but just uh
you know when you miss stuff uh people
in the community step up. So OSS for the
win.
The other thing I want to mention, it's
not so much about this feature, but like
when I opened this issue, um the number
of the issue was 1560.
And I had a a PR initially that was like
in the 1700s and I went on vacation and
I said, "Oh, I'll get back to it when
I'm back." And the original PR was
closed because it was stale. And like
literally after two weeks it went from
like 1,500 to like almost 16,000. So uh
basically that's just a testament to how
popular the project got. But it also
meant I had to uh rebase quite a bit
before it got merged. So anyways, I don't know if anybody else here contributes to the project, but there are so many things going on all the time. So
there's a lot of uh rebasing to keep
your thing up to date.
Cool.
So, uh, let's talk about my own open
claw. So, this is Mclaw and he's sitting
on my desk in Montreal right now.
There's some snow still. Um, I use it in
Discord. I don't know where people use
their OpenClaw. I had it on Telegram initially, but their channels aren't actually encrypted, so all the stuff's in the clear. And I work at a security company, and my CEO is like, "Yeah, don't use that." So anyways, I'm mainly on Discord. I find it handy that way. I have WhatsApp too, but I tend to use Discord more.
Some things I want to mention too: when I made the contribution, I actually used OpenClaw to make the contribution, which was kind of fun. But I also made the mistake of using the GitHub CLI and giving it full access, so it put up a PR right away, even before I was done reviewing things. So I had a little "ah!", but I put it back into draft mode. But aside from that, after the trusted proxy auth mode got merged, I just started working on something. It started getting fun to just build stuff on my phone. So, I
just build stuff on my phone. So, I
built out something called Clawspace.
And, you know, doesn't doesn't mean you
need to use it. It's just, you know,
it's the age of personal software. I
just had a lot of fun building it. I
find it useful. And I thought it was
just cool that I could build this out on
my phone on Discord. Um, but for me, I
find it useful because I don't need to
SSH in to see workspace files that I
want to actually read or like edit. Uh,
so u that's just a little side project I
started building. And you can edit files
and stuff too.
Cool.
So, we're gonna do a demo here. This is
going to be uh live coding. So, yolo.
Okay. So, there's an MCP track tomorrow. I've been doing a lot of work on MCPs. So, what we're going to do is we are going to build out an MCP, not a full-fledged version of something, but if you've seen the AI Engineer website, they have an llms.txt on the right, and there's an MCP server, and there's a few other things. So, I'm going to go ahead and just add this here, and I'm going to go create an app. I'll explain some things here in a second. Okay. And OAuth. Okay. So, this is going to go create an application in ChatGPT. But basically, this is an MCP server that just has UI as well. They'll be talking about this tomorrow, but I have a template that I use for this. So, it's not like I'm building this from scratch, but we're just going to register the MCP here. And then I'm just gonna start building with OpenClaw. And the thing with agentic is you never know when it's done. It's just finishing OAuth here.
Okay, cool. It's connected. And we can
see here it's got two tools. It's got an echo tool and it's got a search speakers tool. So, if we come here, if nobody's ever used MCP Apps: basically, in ChatGPT, you do this for your app, and I'm going to say, like, echo hello. Essentially, it's going to do the tool call, but because there's UI associated with it, you're going to get some UI in here. And this is just using the standard MCP stuff that's in the spec.
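For anyone who hasn't written one, the server side of those two tools is small. This is a hedged sketch using the MCP Python SDK rather than the Vite/React template from the demo, and it leaves out the UI-widget part entirely; the speakers URL is a placeholder:

```python
import json
import urllib.request

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("aie-demo")

SPEAKERS_URL = "https://example.com/speakers.json"   # placeholder for the conference speakers feed

@mcp.tool()
def echo(message: str) -> str:
    """Echo a message back (the demo renders this through a UI widget)."""
    return message

@mcp.tool()
def search_speakers(query: str) -> list[dict]:
    """Filter the conference speakers list by a substring match on the name."""
    with urllib.request.urlopen(SPEAKERS_URL) as resp:
        speakers = json.load(resp)
    return [s for s in speakers if query.lower() in s.get("name", "").lower()]

if __name__ == "__main__":
    mcp.run()
```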
Now, you can do stuff like change that and make it big and stuff, but what I wanted to show is that when I'm building this with OpenClaw, I can do stuff like this. I can say, like,
uh
change
echoed message
to
aie eu
in the echo widget.
Now, it's going to take a second, but this is all web tech under the hood. So, I don't know if anybody here is a web dev, but essentially it's using Vite and React, so there's React refresh and Vite module reloading. Mclaw is on the case here, and you can see I'm in ChatGPT. I'm editing the MCP live from my workspace. And to explain how this is working: we have the trusted proxy auth mode, and I happen to be using Pomerium in this case, so I'm using it as well to secure other things in the workspace. So I have a public URL that I've gated for the MCP, and that's how I'm able to use it in ChatGPT. And I can go ahead and just keep working on it in here. And I don't know how other people work or build with OpenClaw, but this is kind of how I've been doing it. I find it works really well for webdev stuff. So I'm going to go say, update the search speakers. Let's just do this in a new chat.
And I'll say at AIE again,
search speakers.
And it's going to give a very minimal UI here, because there's not much to it. So I'm going to just tell Mclaw to get on the case here. And basically, if you go to that top right corner of the AIE website, there's speaker.json, and this is all the speakers from the conf. We're going to use that as the source of users. And then I'm asking it to give kind of the same UI as what you saw in the echo widget. It's going to take a minute here, probably because the claw is covered in snow in Montreal. But cool. So basically once this gets done, we'll be able to filter users and just kind of see who's talking at the conference. And I'm just going to take a sip of water while Mclaw is chugging along there.
Again, you never know when agentic finishes. Okay, it's deterministically indeterminate. So this should be done in a second, and then what you're going to see is this updated. And again, just to reiterate the flow: I'm working on workspace files in my OpenClaw, I'm speaking to it, or typing to it, in Discord, this is a publicly available site, and I'm able to build it from within my OpenClaw. And I like that workflow. I really don't know how other people work; I mean, obviously I use other tools like Claude and Codex too. But you can see here Mclaw was able to get the job done, and then I can start filtering. So we could look at drilling down here, then we can find a speaker, and then we can get a bit more information. And then I could say, let's add another feature here. So let's get Mclaw on the case again.
So, we're going to add a "more" button here. And there's this send message function that you can use in MCP Apps. When you click the more button that it's going to generate, this will actually make a call to the LLM, and you're going to get a response back. So, we'll give this a second.
Cool. So, I added this more button. And
again, like I've been doing webdev for a
while, and I always still find it
magical when things just automatically
update. But I'm going to go ahead and
click on here.
And you're going to see here that it's
thinking now. So, it actually made a call, added another prompt to ChatGPT here, and it's going to kind of summarize why it thinks you should check out Aleandro's talk and a bit more about it. Now, I just find this
workflow really cool. It's only possible
if you use some kind of proxy to do
this. Uh you can do this with others
like uh Caddy, with OOTH, you could do
it with uh well, EngineX is kind of
deprecated at this point or not
deprecated, but uh at least in
Kubernetes lambda ingress controller is
um but it's just a really nice way to
gate stuff that is local, but you can
still expose it in a secure way. Um, and
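As a minimal sketch of that gating pattern (this is not the OpenClaw trusted-proxy implementation, nor a Caddy config; the port numbers, header name, and token are assumptions), the idea is simply to put a shared-secret check in front of the local server before forwarding requests:

```python
# Illustration only: a tiny token-gated reverse proxy in front of a local
# MCP server. The upstream address, port, and header are placeholders.
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

UPSTREAM = "http://127.0.0.1:3001"   # assumed local MCP server
SECRET = "replace-with-a-long-random-token"

class GatedProxy(BaseHTTPRequestHandler):
    def do_POST(self):
        # Reject anything without the expected bearer token.
        if self.headers.get("Authorization") != f"Bearer {SECRET}":
            self.send_response(401)
            self.end_headers()
            return
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        req = urllib.request.Request(
            UPSTREAM + self.path,
            data=body,
            headers={"Content-Type": self.headers.get("Content-Type", "application/json")},
            method="POST",
        )
        # Forward to the local server and relay its response back out.
        with urllib.request.urlopen(req) as resp:
            payload = resp.read()
            self.send_response(resp.status)
            self.send_header("Content-Type", resp.headers.get("Content-Type", "application/json"))
            self.end_headers()
            self.wfile.write(payload)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), GatedProxy).serve_forever()
```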
And it's also just fun to build. I don't know about anybody else, but I've been really enjoying building stuff just by chatting. I remember a couple of years ago, with Replit, an AI company that's making it really easy to build stuff, I was like, why would I ever want to build on my phone? And I've kind of been phone-pilled now, I guess. So just have fun. I think that's part of the thing with OpenClaw: use it however you want to.
I find that claw space I created super helpful; build your own tools and things like that. Definitely take security into consideration. There are a bunch of people who have obviously exposed things they didn't mean to, and some people have deleted all their emails, et cetera. But I find the trusted-proxy auth mode super useful, and there's at least one other person in that issue who does too. I encourage you to check it out. Just have fun building stuff, and that's pretty much it. My name is Nick Taylor, and that's how I build with OpenClaw.
>> Thanks so much, Nick. You'll be relieved to know you can exit the stage and get some lunch now.
>> Yeah. Okay.
>> Which is true for all of us; it is lunchtime. So we have a nice healthy lunch break, and we're going to be back at 2:30. If you come back to this room at 2:30, we've got, I think, three more of these breakout sessions before another break, and then after that break we're back to the keynote sessions where all of the tracks combine. Another little reminder: if you want to talk, submit a talk, and get into the Slack to do that. Otherwise, thanks for this morning. Enjoy your lunch, chat to folks out there, and we'll see you back here at 2:30. Okay, thanks a lot.
>> Thanks.
[Break music plays over the lunch interval.]
Hello, welcome back. Oh, that was an interesting moment. I was pouring myself a cup of coffee and then I heard my name called and thought, have I got anywhere to be at the moment? So I came to see you. Okay, welcome back. Is the coffee still flowing? Are you caffeinated? Did you manage to have a decent break and some good chats with folks? I'm hoping so. Okay, we've got three more sessions while we're in our tracks before we all come back and reconvene later on. First up today is Honor Solaz, who is a founding engineer at TextCortex. We've been talking a little bit about how agents get deployed and how our development environments get contained, but what about scaling them? What happens when we start to run many, many agents all at the same time, in parallel? Honor's been talking and thinking a bit about that, and he's going to come and share some wisdom with us now. So please, let's get this afternoon off to a good start with a giant round of applause for Honor Solaz.
Honor.
>> Yeah.
Hi, everyone. Welcome. So this talk is about building on ACP at OpenClaw, and also about other things, like how to put open source agents and open source agent frameworks on Kubernetes. I hope to share today, in a nice way, what I've been working on over the last two months. A little bit about me, very briefly: I've been building harnesses since a few months before ChatGPT came out. I built a JupyterLab extension over the OG Codex model, code-davinci-002, back in the day. I'm currently working for a startup, and that initial coding harness turned into its current harness over time, like a ship of Theseus; it got ripped apart and put back together so many times. I'm a founding engineer there.
I've been in the industry for three and a half, four years. I've been using OpenClaw since Cloudbot first dropped in Discord; I went in there right away. I've been following Peter since he wrote "Claude Code is my computer." I was like, this guy's crazy, I wouldn't give my machine to it. But he ventured forth and he's paved the way for us. And when I saw Cloudbot in Discord, my mind was blown. The next day I installed it at the company, in our cluster, and it was basically talking to people on Discord, and everybody else's minds were blown too. We've been used to selling to the enterprise for a few years now, so I started by adding an MS Teams integration in case it might be useful at some point, and I became a maintainer along the way. I was there when it was renamed two times. Today my focus is on agent interoperability and orchestration. One of my goals is to accelerate enterprise adoption of OpenClaw and adjacent software, and also to address the "OpenClaw is not secure" question. Well, it will be secure; Peter talked a lot about that earlier, so I don't have anything else to add on top of it. It's work in progress.
Sorry,
I'll use this.
So I started on developer workflows right away. I created a PR and called it Discord-driven development. Well, Telegram-driven development suits it better, because then it's TDD. Right after that I set it up personally and realized that Opus was not so reliable for complex work. It wasn't back then; it's better now, agents are improving. But Codex was my main harness, and I wanted to use Codex in Discord, but the setup was basic. I was playing a telephone game: I was telling Opus to tell Codex to do something, and God knows what it was actually saying, because wording matters when prompting. It was working somehow; I would go look at the Codex session and it had paraphrased roughly what I was saying. Eventually I got some stuff done, but I knew this could be done much more easily.
Today I'm running a full IDE on Discord. We have parallel workloads: at any point I'm working with one to five agents across one to five channels. You can see Codex 1 through 5, and then Claude is for testing the OpenClaw ACP feature. That's basically how some of my channels look, and it's very good for coding on the go, because I'm a guy who's addicted to side projects, and AI is making it a lot easier to do these sorts of things in parallel: you have an inspiration, you execute on a weekend, and you just ship it. ACPX, which I'm going to talk about, is similar. I built it through Discord, and what you do is bind a Discord channel to Codex through ACP; you can also use the Codex app server protocol, which Herald, another maintainer, developed. Here is me using it before flying to London, to convert the ACP docs into a PDF. I have to say "put it in temp" because Codex doesn't know about the harness and can't send it to me in Discord, so then I go to another channel and tell it to send the file to me there. We're developers, and we don't have time to polish our tools very well, but we know their advantages and disadvantages.
What is ACP? Most people ask, is it like MCP? MCP is for giving tools to the model; ACP is for standardizing agent-to-client interaction. Shout out to Zed here; I actually forgot to put the Zed logo on the slide. Zed is building a new editor in Rust, more efficient, lower memory usage, not Electron, and I've been using Zed since last fall. If you use Codex or Claude Code in VS Code, they are all building different plugins, and it's so much wasted work. If only you could standardize them under one interface, you'd build it once and ship it. That's the idea they had, and it means much less duplicated work. There are competing standards: the agent protocol is for agents talking to agents, while ACP, the agent client protocol, is for a human talking to an agent, though an agent can use the human one to talk to other agents as well. In the long run, as these protocols get adopted, we will support all of them; we'll weigh the advantages and use them somehow. But I chose ACP because when I needed it most, I needed adapters for Codex and Claude Code, and only Zed had built them; Google didn't have theirs back at the time. That's why I chose it.
When you're adding functionality to OpenClaw, you do it through a CLI. So I said, okay, let's create a CLI for ACP: let an agent call any other agent over the command line. That's how it started, and it's slowly turning into a Swiss army knife for ACP, as I'll show in a bit.
At OpenClaw you have fire hoses: we have over 60K PRs total, and on average 300 to 500 are open per day. Basically, overnight, people woke up and decided they like OpenClaw, so we have tens of thousands of stakeholders who want to add features to it. The biggest challenge of the project currently is how you absorb all the needs and wants of these people and how you balance them. You can't please everyone, but how do you create an elegant system that can cater to everyone's needs without creating AI slop?
Peter's workflow gave me an idea, and this is also something I do. You go and ask your clanker: what is this? A PR comes your way, and most of the time it has an AI-generated description. If the human put thought into it, great, but you still need to ask what it's doing. Then you ask: is this the best possible fix? Most of the time the answer is no, and then you do some back and forth with the agent. The reason is that people just run into an issue with their OpenClaw, it can use GitHub, so they say "please fix," and then they send some slop your way. You can't merge it, but you also can't fully discard it; you need to take this data point, because it's crucial feedback from the user. So you categorize it, put it in a bin; it tells you when some part of the code is broken, and Vincent also talked about that a bit. Here is his Codex session doing that: on one side there's one that says it's a good fix, and on the other side it's just saying it's not bad. This is so mechanical that once you do it over and over, you realize you're repeating something, and if only you could program something to automate it.
So you are automating the automator. I started to create workflows in an abstract way: an item comes in, a PR, and then you find the intent, you judge the implementation, you look into whether it has conflicts, whether the reviews give you issues that need to be addressed. You need to make CI pass if it's not passing already; most people just don't care about that. All this mechanical work should ideally be resolved, and can be resolved, by the time the PR ends up in front of you, and that's what I'm working on.
In the workflow you have the shameful Ralph review-refactor loops. I'm a believer that just running an agent in a loop does not necessarily have to produce slop, as long as you're not making it design something; if you're making it uncover shallow bugs that can be easily fixed, that should be fine. So in the abstract workflow that I created, which is actually turned into a program, you can tell it to do superficial refactors, and if it needs a fundamental refactor, relay that to the human. Resolving conflicts also doesn't need a human anymore; it was hard back in the days, but I don't think anyone is resolving conflicts by hand by now. So we are basically creating standard operating procedures for agents; that's a fancy word for workflow. That's what I built into ACPX. It's an n8n-like workflow engine, but it's driving a Codex session, as you can see on the right. Let me show you in action.
So this was one PR. It's loading; let's speed it up a bit. There are some programmatic parts, and this is just replaying what it's doing: it's reproducing the bug, it's judging the refactor, it's reviewing. Now it's doing a review loop, and the review didn't bring anything up. It's what I do myself, but I make it output JSON-structured data so I can put it into an ETL-like workflow. This is a general workflow engine; you can use it on other things as well.
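A toy sketch of that standard-operating-procedure idea (this is not ACPX itself; the CLI name, its flags, and the JSON keys are placeholders): the program, not the model, drives the loop, and each step asks the agent for structured JSON so the next decision is made deterministically.

```python
# Illustration only: drive an agent session step by step and branch on
# structured JSON it returns. "my-agent-cli" and the JSON keys are
# placeholders, not the real acpx interface.
import json
import subprocess

def run_agent(instruction: str) -> dict:
    """Send one instruction to an agent CLI and parse its JSON reply."""
    out = subprocess.run(
        ["my-agent-cli", "--json", instruction],   # placeholder command
        capture_output=True, text=True, check=True,
    )
    return json.loads(out.stdout)

def review_pr(pr_url: str) -> str:
    intent = run_agent(f"Summarize the intent of {pr_url} as JSON with keys "
                       "'intent' and 'is_best_possible_fix' (true/false).")
    if not intent.get("is_best_possible_fix"):
        return "needs-discussion"

    # Bounded review loop: keep asking for shallow issues until none remain.
    for _ in range(3):
        review = run_agent(f"Review {pr_url}. Reply with JSON "
                           "{'issues': [...]} listing shallow fixable issues.")
        if not review.get("issues"):
            return "ready-for-human"
        run_agent(f"Fix these issues on {pr_url}: {review['issues']}")
    return "escalate-to-human"

if __name__ == "__main__":
    print(review_pr("https://github.com/example/repo/pull/123"))
```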
You need to apply agents generously to problems. I see them like an ointment that you apply generously on any problem that can be solved with agents; you need to take yourself out of the loop and solve it with agents. And I think there's a spectrum from personal agents to enterprise agents. The PC you use at work and the PC you use at home used to be relatively similar, but that will not be the case with agents, because at work you will be consuming a lot more inference, and that means in the enterprise there will be a lot more money to be made. That's why I'm also excited about enterprise agents and OpenClaw's potential.
That's why I believe in on-demand, disposable agents. If you use OpenClaw on Slack or Teams or Discord, it's one instance: you create an app. The problem is you can't really talk to multiple instances of this. To connect another agent with another name and another profile picture, you need to create another app, and another app manifest, and that's something that shouldn't be managed manually by clicking. The chat platforms don't have a standard yet for multi-agent provisioning, where you can cosmetically create different agents. That's what's on the screen: I asked ChatGPT to generate the idea of agents whose names can be generated by an underlying app, so you can talk to them separately. This is not supported, and it must be supported for this vision to work. Until it's supported, I'm using it in another UI, because we are all going to start one agent per task, and they will work on these tasks, creating and editing files, and it will all be synchronized. It will be a tad different from what you're used to with your personal agent.
To have that, you need a few key components. You need Kubernetes. You need an agent harness: OpenClaw could be one of them, Codex or Claude Code could be one, and it could be over ACP. You need GitHub, where you give read-write access. And you need state and data synchronization, maybe using rsync or something like whatever algorithm Dropbox uses. There are some projects taking this on.
I've been working on this; it's outside of OpenClaw, it's my day job. It's an open source orchestrator, a Go operator that basically handles the complicated parts. There's a user experience question: you want to create a concierge agent on Slack and you're talking to it, but you get bottlenecked because you have 100 employees on Slack. So for some other task you may need to create a new one, and it creates one and gives you a website link.
I'm going to skip the shout-outs to Cognition and Devin, because, yeah, they invented the category, but I'm running low on time. The repo is textcortex/spritz. I'm just going to demo it for our use case; we currently use it for error reporting. So if you're on Slack, you can ask it to dispatch an agent to debug something. You're like, "any new bugs after the pro release?", it says something, and then you ask it to create an agent. Well, if I could put that agent into Slack I would, but I can't do that, so I have to put it in another UI. This is an open source project, and the UI is a React app hosted in the cluster that you're deploying these Helm charts to. It starts a conversation there and starts working on the problem. This is like Codex web or Devin or anything like that, but it's actually using a full Kubernetes pod.
Wasteful, but I think it's the better abstraction, because OpenClaw showed the power of giving a full computer to an agent: it's a lot more powerful, and I believe that as well. I think OpenHands uses Firecracker; I'm not so well versed in all the different virtualization frameworks, so I'm also learning along the way, but I have a working product that's running on Kubernetes.
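As a rough sketch of the pod-per-task idea (this is not the Spritz operator, which is written in Go; the image, namespace, and command below are placeholders), launching one Kubernetes Job per agent task from Python could look like this:

```python
# Illustration only: spin up one Kubernetes Job per agent task. The image,
# namespace, and command are placeholders, not taken from the talk.
from kubernetes import client, config

def launch_agent_job(task_id: str, prompt: str) -> None:
    config.load_kube_config()  # or load_incluster_config() inside the cluster
    container = client.V1Container(
        name="agent",
        image="example/agent-harness:latest",           # placeholder image
        command=["agent", "run", "--prompt", prompt],   # placeholder command
    )
    job = client.V1Job(
        metadata=client.V1ObjectMeta(name=f"agent-task-{task_id}"),
        spec=client.V1JobSpec(
            backoff_limit=0,
            template=client.V1PodTemplateSpec(
                spec=client.V1PodSpec(restart_policy="Never", containers=[container]),
            ),
        ),
    )
    client.BatchV1Api().create_namespaced_job(namespace="agents", body=job)

if __name__ == "__main__":
    launch_agent_job("123", "Reproduce and fix the bug reported in issue #123")
```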
You can use this product if you're interested in deploying it internally, using Codex at the head, and just spinning things off. If you have an open source project, first of all, I can help you set this up. If you have a system with an inlet of hundreds of issues per day, I can help you process it. It does all the wiring around Slack, keeping those agents on, the user experience, and the interoperability: you're not locked into any agent, you can switch, and it's all abstracted away underneath with ACP. Yeah, that was my talk. Thank you for listening. Here are some social links in case you want to get in contact with me. I just want to make clear the split between my OpenClaw side and my TextCortex side: the last part was about the work I do at TextCortex, just to give a disclaimer. Thank you for listening, I guess.
Perfect. Thank you so much for getting us off and running. Brilliant. Okay, next we have Merve, who's going to come up and talk in just a second, I think. Are you good? Oh, yeah. Okay, so I'm going to invite Merve up to get plugged in and what have you for a second. We've been talking a lot about the tools and the ecosystems, and my goodness, there are enough companies out there building products and platforms for us to use with agents. But of course, there's also a very rich ecosystem of open-source tools supporting this as well. We've heard from several of them already today, and this ecosystem is really flourishing. I'm personally quite excited about this kind of groundswell of new tooling and the innovative ways that people are coming together to build out and support this big movement, and that's what Merve is going to talk about. Merve is a machine learning engineer at Hugging Face, has been experimenting with this, and is going to cover all kinds of aspects of building things with agents and looking at the ecosystem. How are you doing? You good?
>> Are we plugged in? You have a clicker?
>> Yes.
>> Okay. Excellent.
>> Like that.
>> All right. So if you're ready...
>> Give me a second, doing the screen now. It's almost there.
>> And then we have one more talk after Merve, and then we have a break. Okay, and then we'll all come together after that break, as I said earlier, for the last keynotes of the day.
>> So...
>> Yes.
>> Bit of screen mirroring going on. You're not live coding, are you, Merve? You're not going to be doing any live coding?
>> No.
>> It's okay. Yes.
>> No live coding. Okay, happy, good to go?
>> Yes.
>> Right, let's do it. A nice big warm round of applause please for Merve.
>> Thank you.
Hello everyone, and welcome to this talk on the open agent ecosystem, which I would like to call "having an AI engineer at your fingertips." I'm Merve, and I work in the open source team at Hugging Face. How many of you are using Hugging Face on a daily basis? Oh, let's change that. This is not okay.
But first, let's talk a bit about open source and what it is. When it comes to machine learning, open source makes a real difference. Basically, you have open-weight models, which come with non-commercial licenses; we call them open weight. Then we have open source models with commercially usable licenses, such as this one from DeepSeek, for instance MIT or Apache 2.0. And then there are even more open models that also have the code open: if there are agents, the harness is open, everything is open. This matters even more given that, just yesterday or the other day, it was revealed that Claude's performance had been going down. If you have everything in the open, nothing changes without you knowing: no performance degradation without you knowing. Everything's great. On top of that, if you have access to the weights, you can shrink them, quantize them, and fine-tune them if you feel like it. And it's guaranteed privacy for your end user, because you can deploy to edge devices and browsers without the data going anywhere else. This matters a lot, in my opinion, even more these days with the security breaches and everything.
There was this argument, maybe a few years ago, that open source models aren't as good. No, this is not the case: you can see, for instance, that the latest GLM 5.1 is absolutely crushing it, and I'm actually using it in my coding setup. This is the Artificial Analysis intelligence index; the green ones are open models and the black ones are the closed models, and we have just about caught up, and we will catch up even more with the upcoming models. Let's go back to the Hugging Face Hub. Everything is facilitated through the Hub: all of the open releases. It's the infra layer for all of your open source workflows, and as of now it's hosting even more models; I should have updated the number, it's probably close to 3 million, plus a lot of datasets, Spaces, and everything else. But that's not all when it comes to the agentic ecosystem, and that's what we're going to talk about today. When you go to the models, you can filter for agentic models. They are mostly the trending ones, and there are two types of models, in my opinion: there are the vision LMs and there are the LLMs, and the vision LMs can also act as a computer-use agent over screenshots; they know where to click and so on, which is pretty cool.
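As a quick sketch (not from the talk), the same kind of filtering can be done programmatically with the huggingface_hub client; the tag used below is an assumption about how the UI filter maps to model tags:

```python
# Rough sketch: list popular models on the Hub that carry a given tag.
# The "agent" tag is illustrative; the Hub's "agentic" UI filter may map
# to different tags.
from huggingface_hub import HfApi

api = HfApi()
for model in api.list_models(filter="agent", sort="downloads", direction=-1, limit=10):
    print(model.id, model.pipeline_tag)
```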
One trend I have noticed recently is labs releasing their LLMs with vision capabilities on day zero. For instance, Gemma 4 was an omni model and it's still an agentic model; there's Qwen 3.5, and there's Kimi K2.5; these were VLMs. So I foresee that over time all of these models will be released day zero with vision capabilities. And it's super easy to run them: you can just use vLLM, MLX, or llama.cpp's llama-server from the get-go with a few lines of code. It used to involve much more friction, but these days it's not a big deal.
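For instance, a minimal offline-inference sketch with vLLM might look like the following; the model id is just an example, not a recommendation from the talk:

```python
# Minimal local inference sketch with vLLM. The model id is an example;
# swap in whichever open model you want to serve.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")
params = SamplingParams(max_tokens=128, temperature=0.7)
outputs = llm.generate(["Write a one-line summary of what ACP is."], params)
print(outputs[0].outputs[0].text)
```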
And if you want to compare open models, we recently launched this feature called benchmark datasets. When you go to Datasets, on the left-hand side, toward the bottom, there is a benchmarks button; you click it and you can see the popular benchmarks, such as SWE-bench Pro, Humanity's Last Exam, AIME, and others. When you go to, for instance, SWE-bench, to see how good your agent is at coding, you see the open models ranked according to their scores; currently GLM 5.1 is top of the list. So it's also easy to pick an open model these days, because there are 3 million models out there and it used to be a challenge to choose between them.
And if you actually want to vibe-check a model, Hugging Face has this service called Inference Providers, which routes the best models to the best providers; all of the providers are there: Groq, Cerebras, Novita, and so on. It's super easy to compare them as well; you can see the cheapest or the fastest option (I had to truncate the screenshot), and there's also a tool-use column, so you can pick one of the open source models for the agentic use case.
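A minimal sketch of calling an open model through Inference Providers with the huggingface_hub client; the model id below is a placeholder, not one named in the talk:

```python
# Sketch: chat with an open model routed through Hugging Face Inference
# Providers. Requires an HF token in the environment; the model id is a
# placeholder to replace with a real repo id.
from huggingface_hub import InferenceClient

client = InferenceClient()  # picks up HF_TOKEN from the environment
response = client.chat_completion(
    model="org/model-name",  # placeholder model id
    messages=[{"role": "user", "content": "In one sentence, what is MCP?"}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```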
Going back to agents: Hugging Face Hub has actually shipped a ton of features recently for you to use open models with agents. First off, there is the MCP server, where you can plug the Hub into your LLM. There are Skills, which let you even vibe-train models: you just go to your agent and say "train Qwen 3.5 on this dataset for me," and it just trains, which to me is sci-fi at this point, because it didn't used to exist; there are so many things going on in the back end, and the agent actually handles them very well. And then there are local agents, so you can run full coding agents locally from models on the Hub, because we integrate very well with them.
My talk will basically cover all of these. Coming to the first one, there are local coding agents, and you have many options. One of my favorites is Pi, because it's super simple to set up. I think you can also use it with Inference Providers remotely, but if you want to serve a local coding agent, you can use llama.cpp to serve the model and Pi will consume that directly. Something very cool is also llama agent, which is baked into llama.cpp as a binary that you can directly execute to start a model just by giving a Hugging Face Hub id. So it's super easy to get a local agent running. I will share my slides on my Twitter account afterwards, so no need to take pictures.
One of my favorite things these days is the Hermes agent, and I will die on this hill. It's one step even further from OpenClaw in terms of memory management and so on, and it's actually super easy to get started with. You can use it either locally or with Hugging Face Inference Providers. For instance, when I was playing with it, the setup wizard did everything for me: you just give it the keys, integrate it into your Slack or WhatsApp or whatever, and you're good to go. I absolutely recommend using it with an open model; I recommend GLM 5.1, for instance. I actually failed initially to integrate it into Slack, and I have witnesses here, my colleague Niels is here. I asked GLM 5.1 to fix it through the Hermes agent, and it fixed it on its own. It was a good day. I think GLM 5.1 is a very good model, and I absolutely cannot wait to use it with Gemma 4. Also, this weekend there was a rumor on Twitter of a MiniMax model coming up, so I will probably try that as well and share my findings. I absolutely recommend using the Hermes agent with open models.
One more thing: the Hugging Face Hub now has a new dataset repository type called traces. It hosts all of your Codex, Claude Code, or Pi traces. If you push a trace and then go over there, you will see it in the dataset viewer; if you click on the traces column, it pops up like this, very nicely parsed, and you can explore your data. Later, if you want, you can even train a model on it, which is pretty cool in my opinion. If you want to push your agent traces, you can just upload your sessions from these paths and nothing else is needed. We will also probably have Hermes agent support for traces very soon.
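As a rough sketch of the upload step (the local path and repo id below are placeholders; the actual session paths are the ones shown on the slide, not reproduced here):

```python
# Sketch: push a folder of agent session files to a Hub dataset repo.
# The folder path and repo id are placeholders for illustration.
from huggingface_hub import HfApi

api = HfApi()  # uses HF_TOKEN from the environment
api.create_repo("your-username/agent-traces", repo_type="dataset", exist_ok=True)
api.upload_folder(
    folder_path="/path/to/agent/sessions",   # placeholder local path
    repo_id="your-username/agent-traces",
    repo_type="dataset",
)
```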
Going back: if you want more options for serving the LLM behind the agent locally, here are some tips and tricks for finding a good model. You just go to Hugging Face; there is an "Other" tab, and under it there are the apps. These apps are things like LM Studio, Jan, llama.cpp; everything that is for local serving is over there. When you filter by them, you get the models that are supported by these local apps, so whatever you want to serve, we have you covered. And when you go to a model repository, something very cool in my opinion is that on the side of the page there is a GGUF section. GGUF, if you don't know it, is the llama.cpp file format that is supported in many things, like Ollama, LM Studio, and everything else. You also get hardware compatibility information: for instance, the larger Gemma 4 model, if you quantize it to 4-bit, fits inside an L4 GPU with 24 GB of VRAM.
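The back-of-the-envelope arithmetic behind that kind of claim is simple (the parameter count below is illustrative, not the actual Gemma 4 size):

```python
# Napkin math for quantized model VRAM: weights only, ignoring KV cache
# and activation overhead. The parameter count is illustrative.
params_billion = 27        # e.g. a ~27B-parameter model (assumption)
bits_per_weight = 4        # 4-bit quantization
weight_gb = params_billion * bits_per_weight / 8
print(f"~{weight_gb:.1f} GB of weights")   # ~13.5 GB, fits on a 24 GB L4
```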
I think this is very cool, and the same is served for MLX repositories as well. And when you go back to the model repository, if you have absolutely zero clue how to serve the model, at the top right there is "Use this model," with the options for the local apps the model is supported in. When you click it, you see that with only a few lines of commands you install and get the model served, and voilà. It's very convenient to run open models these days.
And lastly, supercharging your coding agents using Hugging Face Skills. We have a bunch of skills to get you started with training, inferring with open models, exploring open datasets, using AI apps, everything. We have this thing called the Hugging Face CLI skill, which allows coding agents to manage repositories, run jobs, launch demos, and so on, and this is how you can install it; you can just type "HF skills" into Google and you will find the commands. But we have more skills than that. Basically, this one lets you plug the Hub into your agent and gives you all of the Hub exploration. The rest of the skills are super cool too. There is the LLM trainer skill, which is not only for LLMs but also for vision language models: you can just tell the model, okay, train this model on this dataset, and it will kick off the job remotely on our infra, or locally, wherever you want. There is the Gradio skill, which lets you build demos. And there is the Hugging Face dataset skill, which lets you explore datasets through our dataset viewer API, and you can install it very easily. Again, we come with more integrations; I just put Claude and Gemini here.
Putting this into action: for instance, I asked Claude Code, hey, can you train Qwen2-VL on LLaVA Instruct Mix, which is a vision-language dataset, and it asked me a few questions. It said, okay, which instance would you like this to run on, because you have multiple options. In the back end, the agent actually calculates the amount of VRAM required to fine-tune that model at a given batch size and so on, so it handles everything for you; it just asks you a few questions, okay, what is your validation split, and so on, and then it launches the job, which to me is still absolute sci-fi, as a person who has been training models since the beginning of my career, about six years. And at the end you just find your model on the Hub.
And this is not limited to LLMs and VLMs: I have recently shipped skills for, for instance, training object detectors, segmentation models, and everything else for vision. It handles different bounding box formats and so on; you just give the command and let it handle everything.
Going back to MCP: what do we serve? We have models, datasets, and Spaces search for your task, plus semantic search for Spaces. If you don't know Spaces, it's like the app store of AI; there are a ton of apps over there for absolutely everything you can imagine. We also have something called Jobs, which lets you kick off one-off jobs that end when they fail or succeed, and you pay for the amount of time they were up. You can also query these apps from MCP, as I'm going to show you shortly, and it plays nicely with all of your favorite platforms.
For instance, here I ask the model to generate an image of a baklava made of yarn, and it will call the Hugging Face Space for Qwen-Image, which is an image generation model hosted remotely, query it, and bring back the output. It works very nicely; look.
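The same kind of call can be made directly from Python; a minimal sketch (the model id is an example and may differ from the Space the MCP actually routes to):

```python
# Sketch: generate an image with a remotely hosted open model.
# The model id is an example; it may differ from the one used via MCP.
from huggingface_hub import InferenceClient

client = InferenceClient()
image = client.text_to_image("a baklava made of yarn", model="Qwen/Qwen-Image")
image.save("baklava.png")   # text_to_image returns a PIL.Image
```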
But you need to turn on a setting in the MCP called dynamic Spaces: if you want more options, if you want absolutely all of the Spaces, you need to turn that on, which is a bit experimental. Here are a few ideas for using Spaces over MCP, but you're absolutely not limited to those.
Tying it all together, my colleague Niels has built something I found cool, so I wanted to share it. On the Hugging Face Hub there are Papers, basically AI-related papers. We want people to be able to ask questions of these papers or share them, but not all of the papers come with markdown that we can index. So we OCR'd 30,000 papers using Codex, open OCR models, and Jobs, all through prompting, which is a bit crazy. The steps: first, pick an OCR model that is cheap, nice, and performant. Ask the LLM to kick off a processing job and actually write the code for it, then kick it off on Hugging Face infra, and let the skill set up the instance hosting that model and everything, without you going through the pain of the napkin math. And then profit.
To pick an OCR model, you can go to olmOCR-Bench, which is one of the benchmark datasets I showed you earlier. The first result is Chandra OCR, but don't be fooled by that alone: we just shipped a skill today where you can ask the model, okay, what is the best OCR model for fine-tuning, and it will also make recommendations around fine-tuning; if you need smaller models and so on, it will handle everything for you. It's pretty cool, check it out. Once you pick the model, in this case we used Chandra.
We asked the model to write the script, and it did, and then the agent does the napkin math for the instance and calculates the cost of the running job and everything. These jobs can then be rerun. We also recently launched this infra product called buckets, which is like S3 buckets but much cheaper and faster, and which you can use with mounting. Basically, you can just use that, and you can get started with these links. I hope you liked this talk. Thank you so much.
Thank you so much.
>> Thank you.
>> Great. Well, we are down to our final session of this chunk. I'm going to invite Frederick to come and get plugged in; I never know which side people come from. There he is. Come and get plugged in and settled while we talk for a second. So, Frederick Vichowski is the CEO of a startup which is kind of new on the scene but growing at a ridiculous pace. I don't know if people have encountered Victor yet, but it's a tool that connects many, many tools and services, and I'm talking about a vast number of tools and services, and then presents those through a unified interface. I'm hearing the phrase that it's like the first AI employee, which is language that piques my interest very much. We're going to hear a little bit about that: not just what the product is, but also some of the challenges in building it and some of its applications. So yeah, that should shed some light on the title, an AI co-worker that lives in Slack. I think we're going to hear about that now. How are you looking, Frederick? Are you happy, are you set up?
>> very happy
>> Good to go? Okay, thanks so much.
>> All right so let's have a round of
applause please to welcome Frederick
Vichowski.
Let me see if that works. Okay, the clicker doesn't seem to work, but that's fine. Cool. My name is Frederick, and I'm the co-founder of Victor. Victor is the AI employee that probably most of you have heard of already; it's absolutely blowing up. We launched it in February this year with zero expectations of it growing at all; it was actually an experiment, and it surprised all of us. Immediate product-market fit, huge adoption worldwide, and we can't catch up.
So what is Victor? Victor is an AI employee, and when you think of an AI employee, you should think of it just like a human employee: it lives where you live, it lives in Slack, it doesn't have a web app. Just like with your teammates, you don't need to go to a separate place to call it. It participates in your discussions, in threads, in channels, and it has access to the tools that you have access to. It has access to 3,000 integrations, and if for some reason it doesn't have access to one of your integrations, it can build its own connections. So essentially Victor can use any tools that your company uses, and therefore Victor has the context of all of your tools. As opposed to human employees, Victor has a horizontal, broad context about the whole company. For example, when you hire a CMO today, you can probably assume that this CMO would be much better if they had access to your codebase, if they were able to contribute to it. Victor can do this, so it brings a kind of universal, PhD-level understanding to all areas of the company.
Let's start with a quick story of Victor and the company. Our mission from the very early days in 2023 was to build AI employees; back then it was just after ChatGPT had launched. Back then we thought that the right way to build AI employees was through browsers. As a reminder, we didn't have tool calling, and we didn't have great code-generating models, so probably the right way to take actions was through browsers. Browsers are very universal interfaces: you can essentially use any tool through a browser, and most apps have browser apps you can interact with. The way it worked back then, when it was called Jace AI, was that it took a snapshot of your DOM, minified it in a lossless way, and then, based on that minified snapshot and your goal, it decided on the next step: for example, should I type something into the search bar in Google, or should I click on a login button to log in. And it was great, it certainly should work, right? And it did, but it didn't work for a lot of steps. With the capabilities of the models back in 2023, it was working for about three to five steps reliably, and by reliably I mean with 60% reliability, and that compounds with each step. That was still state-of-the-art: Jace AI was a state-of-the-art web agent on the most popular agentic benchmark, called WebArena, and it was doing well, but it was very difficult to make it into a useful product, just because of the reliability and speed issues. Currently you can just call a few tools, call a function, and it will immediately give you an output; with web agents you had to wait a minute until it failed. So it was quite hard, but web agents are amazing, and they're finally working much better than in the past. Cool.
After that, Jace AI became an email agent. Sonnet 3.5 came out, we built our first agent loop, and we really wanted the experience of not having to go to a web app to ask the agent to do something, but rather the agent having all the necessary context and being able to proactively come up with tasks for you. We achieved that with Jace. Jace was an amazing product, also great product-market fit, and it's still alive; you should check it out. Basically, the way it works is that whenever an email arrives, an agent loop is triggered, it connects to your tools, and it can react to emails not only with email drafts but also with tool calls. For example, if someone asks for a refund, the agent can automatically do the refund for you; of course, this can be gated with approvals as well. Cool.
But then this February we launched Victor. We're in the OpenClaw track, so everyone knows OpenClaw, which is a personal agent; we always wanted to build the employee, which is the company agent. The first question you should ask yourself is: how is it different, what is the difference between a company agent and a personal agent? First, we think company agents should live where you live, work where you work, and have all the company context. If you're building a personal agent, then probably everyone from the company connects their own integrations and runs those agents on their own. With Victor, with the company agent, it's different, because suddenly it's sufficient for one person from the company to connect an integration; Victor will inherit the permissions from that integration, or you can tune them, and then the whole team has access to it. You don't need to connect it a hundred times.
As I said before: 3,000 tools, it lives in Slack, and it essentially does anything across roles. That comes with challenges, and I'll mainly talk about one here. The first challenge in going from a personal agent to a team agent, from one user to many users, is around memory. With OpenClaw there was a big concern about the memory getting cluttered over time, and I think that's serious; it makes sense to be concerned about it. But imagine you have the same architecture and the same memory, now for 100 users instead of one. It's probably running out of memory 100 times faster. It's a big challenge to be solved, and we have solved it. Another thing is that Slack has different channels, and companies have different hierarchies that we need to adhere to, and people will often give the agent conflicting instructions. Imagine you have Victor, your company agent, in the growth channel, in the engineering channel, and also in people's DMs. Victor will take context from the growth channel, or from the executive channel, and you somehow need to make sure that this context will not be leaked to the engineering or support channel. Similarly, if you DM Victor with your problems, Victor should not take context from the growth channel unless you are on the growth team. So it adds a lot of complexity to how access is structured.
We chose Slack as our interface for what we think is AGI for companies, and there is a reason for this. I'll first talk about the reasons and then about what breaks in Slack. There are two major reasons. First, we wanted Victor to feel like a human employee, and you don't interact with human employees in web apps; you interact with them in Slack, just like your teammates, right? The number two reason for choosing Slack as an interface is that if Victor is a very powerful agent that's supposed to perform difficult tasks, then those tasks will not execute immediately; they can naturally take something like 10 minutes. So when you go to a web app and ask an agent to do something for you, you've switched context and now you need to wait 10 minutes for the output, which is quite frustrating. You don't want to wait; from ChatGPT you're used to immediate answers: it should take 30 seconds, thank you, copy paste, and I'm done. But that's not how it works with powerful agents. So why is Slack better? Well, if you ping someone on Slack, tell them to build an app for you, and get an answer in 10 minutes, you are shocked; no teammate has ever built you an app in 10 minutes, right? The perception is different, and suddenly the latency feels very low when you compare it to your normal Slack experience.
But there are certain things that break in Slack. Number one is that when you work in web apps, you have a single thread: you open a new agent or a new thread and you speak to this agent. However, when you are in Slack, you have a lot of interaction modes. One of them is DMing people. Another is being in public channels and participating in threads. Another is just reacting with emojis. You can also edit your messages, and so on. All of this is input to the agent, and all of it needs to fit into a linear context somehow, not a single thread, and we need to manage this. Let me give you an example. Of course, when someone deletes a message, a human assumes the task should not be continued or isn't interesting anymore. When someone edits a message, you should also respond to the edit.
But let's say you are DMing your coworker, whether that's Victor or a friend, and you start a thread in Slack. At some point, and humans do this very often, you forget about the thread and just start a new DM to the same person, which opens a new sandbox. Humans naturally carry the context from the previous thread, but for the agent it's a totally new area, a new task. So what needs to happen is that whenever Victor receives a DM, it needs to look at the previous messages and somehow roll them over into the existing conversation. This is just one of the challenges you need to face.
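A toy sketch of that roll-over idea (this is not Victor's implementation; the message shape and the 24-hour window are assumptions for illustration):

```python
# Toy illustration of rolling recent DM history into a fresh agent task.
# Message dicts and the time window are assumptions, not how Victor does it.
from datetime import datetime, timedelta

def build_agent_context(new_dm: dict, history: list[dict],
                        window: timedelta = timedelta(hours=24)) -> list[dict]:
    """Prepend recent messages from earlier DMs/threads to the new request."""
    cutoff = datetime.fromtimestamp(new_dm["ts"]) - window
    recent = [m for m in history
              if datetime.fromtimestamp(m["ts"]) >= cutoff and m["user"] == new_dm["user"]]
    # Oldest first, then the new message last, so the agent sees one linear context.
    return sorted(recent, key=lambda m: m["ts"]) + [new_dm]

if __name__ == "__main__":
    history = [{"user": "U1", "ts": 1_700_000_000.0, "text": "Budget thread: cap is $5k"}]
    new_dm = {"user": "U1", "ts": 1_700_050_000.0, "text": "Actually, raise it to $6k"}
    for msg in build_agent_context(new_dm, history):
        print(msg["text"])
```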
Fun fact: I didn't think it would be as important as it actually is, but what really matters is the tone. I'll give you an example from one of our customers. We use Opus 4.6 as the main model for Victor now, and we wanted to try GPT 5.4: on tool calling and codegen it's actually amazing, it should work, and it's cheaper as well, so why not replace Opus with GPT 5.4? There's a reason we didn't go for it. There are a couple, but the most interesting one is the personality. For some reason, and it could be due to our architecture, our users loved Opus, and they all started raging when we did the A/B test. I think there's something beautiful in that model that we can learn from. Opus is a bit sassy in Victor as well; I'm not sure if that's thanks to our team or who made it this way, but it's actually quite funny. I encourage everyone to try it.
Proactivity. One of the powerful things Victor can do is proactively suggest workflows that it can automate. Let's say you're on a growth team, you discuss an A/B test and its results, and at some point you decide, okay, this option is performing really well, I'll go with it instead of the other one. Victor has access to your PostHog, or whatever tool you use for analytics, and it can literally check, and it will speak up if what you're saying is not sound. It has happened a couple of times: you're discussing some experiments, Victor checked PostHog and said, hey, it's true, but this is not statistically significant, and then it showed the calculation behind why it's saying so. So it's fun, and it's an advantage, right? If Victor can suddenly join a conversation and be helpful, it will be activated more broadly in the workspace, which is great for the product. But if Victor does it on day one, and that has happened, the security teams start raging, because someone adds Victor to the workspace and suddenly Victor starts DMing everyone and participating in threads, and security goes crazy. That's why I think you should earn it with a few users first and then roll it out broadly.
So, the value of shared context. I don't have much time left, but I'll very quickly talk about the difference between Victor and agents like Claude Code or Claude Cowork. Claude Cowork runs on your desktop, so it's a bit different. The advantage of Victor is that it works in the cloud; you don't need to have your computer open for it to work. Another thing is the shared context. As I said at the beginning, for Victor to work well, for you to be able to ask Victor to change your Meta ads budget or read your analytics data, only one person from the company needs to connect that integration. Imagine you work on a 100-person team and your growth team is 20 people: if you have to ask all 20 people to connect Meta ads individually, it's quite painful. Furthermore, if someone wants to interact with Victor, or if Victor wants to be proactive, and everyone connects their own integration, someone can connect the wrong integration, and Victor can get stuck or be wrong, or might not know which integration to use, which adds a lot of complexity for the user.
Um, cool. And something I want to
highlight here is that Victor is not a
tool. It's a hire.
And here's what I mean. I'll tell you
one customer story. One of the biggest
e-commerce brands in the in the United
States, they uh their team admin has
connected Victor and the first
integration, a team integration that
they connected was their personal email,
personal Gmail. And then suddenly the
team started speaking to Victor about
this guy's emails
and um and this guy is is is texting me
and saying, "Hey man, like what the
hell? Victor is leaking all of my data.
Why are you doing this?" And I'm like,
"Why did you give Victor access to your
personal email?" Like, you know, if you
hire a new employee, do you give them
access to your personal email?
Probably not, right? Um, that said, I
think it was a great inspiration and
what we did is we added a capability to Victor to scope the integrations so they're not always shared. And if you want to have a personal integration to your personal email, and want Victor to be able to use it in your DMs or publicly when you call it, uh, this is also possible now. Um,
yeah.
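A minimal sketch of how that scoping could be modeled: integrations default to being shared with the whole workspace, while personal ones are only usable by their owner. The names here (Scope, can_use) are illustrative, not Victor's real API:

from dataclasses import dataclass
from enum import Enum

class Scope(Enum):
    WORKSPACE = "workspace"   # one person connects it, the agent can use it for everyone
    PERSONAL = "personal"     # only usable when the owner explicitly invokes the agent

@dataclass
class Integration:
    name: str            # e.g. "meta_ads", "gmail" (hypothetical names)
    owner: str           # Slack user id of whoever connected it
    scope: Scope

def can_use(integration: Integration, requested_by: str) -> bool:
    # Decide whether the agent may use this integration for a given request.
    if integration.scope is Scope.WORKSPACE:
        return True
    return integration.owner == requested_by

# One growth person connects Meta Ads for the whole workspace; Gmail stays personal.
meta_ads = Integration("meta_ads", owner="U123", scope=Scope.WORKSPACE)
gmail = Integration("gmail", owner="U123", scope=Scope.PERSONAL)
print(can_use(meta_ads, "U999"))  # True  -> shared context works for the whole team
print(can_use(gmail, "U999"))     # False -> no accidental leaking of personal email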
And so, to summarize, um, what does it take for an AI coworker to be great? I think there are three major pillars if you want to build your own coworker. This is a technical crowd, so I encourage everyone here to try to build your own Victor. Um, and, uh, you know, there are just three things you need to make work. Helps get work done: that's quite easy, models are capable today, um, you know, connector integration through Pipedream will work well. Knows the company: it has the context from Slack, so make sure you're able to utilize this context well; um, you will probably need to go through the Slack approval process, which is very difficult and can be boring. And then, make it friendly: it makes a difference, and you should, um, make sure that Victor likes your team and your team likes Victor. Um, this is our vision for the future: every company has AI employees. I think it's obvious, nothing to argue here. Um, and
historically um I just want to highlight
the vision for AGI has been with us
since the 17th century. Um, Gottfried Leibniz,
the inventor of calculus um was
reasoning about you know humans doing
unnecessary things and he wanted to
build a calculator. Um little did he
know, you know, like a calculation is
not the not the only cognitive task that
that we can automate. And I think um we
are now in this beautiful moment in
history where um where where we can
essentially automate all the cognitive
tasks and we we can be part of the
revolution. So I'll just let myself read
his quote. Um it is unworthy of
excellent men to lose hours like slaves
in the labor of calculation. Let us
leave that to machines. And with that, I
just wanted to encourage everyone to
scan this QR code, click on sign up, and
add Victor to your Slack. Test it out.
Everyone in this room has $100 in free credits. No strings attached. You can just remove Victor at any time.
It will add a lot of value. I promise.
If it doesn't, give me a call. I'll make
sure it does. Thank you.
Thank you so much. I I was not fast
enough to get out here to scan the code
myself, but I'll grab you later on.
Thank you ever so much, Frederick. Um
okay. So much so much to digest after
this, uh, this tranche of talks. Um, luckily
we have a break now. So we have a break
up until um 4:30. So, it's plenty of
time to get some air, get some
refreshments, chat to the folks outside,
make a new friend, uh, and then we'll be
back in here for the remaining keynote
sessions, which will take us through to
the end of the day, uh, starting uh, at
4:30. Okay, enjoy your break. See you in
a bit.
>> That concludes our
Hold still
a little.
I watch the sparks all burn too fast.
Everyone reaching for the flash.
They take the first light they can find
and call it truth and call it mine.
But I stayed when the room went quiet
with the weight of the question
while the easy answers walked
away.
It's not that I see further. I just
don't
let the silence.
I let the dark
right past the comfortable light.
I wait till the surface breaks till the
shade feels true.
I don't rush the fire.
I give it to
I give it to
Call it
but there's a deep still humeneath
not being love.
Every great thing for patience.
Every thing makes you choose.
Do you leave with what's acceptable?
Stay for
more of you.
They say it's talent. Say it's magic
like it falls from open
but nothing worth remember
I stay when it starts feeling kinds
I
wait through the restless
out through the earth to collapse.
Hide by and chase the answer. I let it
find me back. There's a moment after the
last good idea.
Where the room feels empty and you want
to run for your life. That's the
teaches you.
That's the edge where the real stand.
Let the shape reveal it.
I stay longer than I should long enough
to change.
I stayatter
clears.
The haze
bar
with time.
Most dreams
don't fail.
They're just too soon.
I stay.
I stay.
ladies and gentlemen, please welcome
back to the stage Phil Hawksworth.
Hey,
right.
Well, welcome back for everyone who's
been adventuring around the other
tracks. Some of us have been here the
entire time. Uh I salute you, my uh my
my faithful few who have been sticking
out for one track all the time. Um, how's the day been? Are you enjoying the day?
>> Nice. I'm glad. I'm very glad to hear
it. How have people been doing with the
hallway track? And by that I mean not
only submitting a talk suggestion for
tomorrow if you want to do a lightning
talk, but also meeting people and having
uh having new connections. Uh, hands up,
please, if you've spoken to a new and
interesting person and made a new
friend.
I love it. Okay, put your hands down really quickly, because the reason you need to take this down quickly is, if you were to look around and the person that you enjoyed talking to didn't have their hand up, that's about as awkward as you like. So, I think we
managed to get away with it, but there
are so many hands up. Of course, it was
reciprocated. And also you've got
another chance to make more connections
because later on after this last set of
talks of course we do have a bit of a
social bit of a mixer outside there um
we'll talk about that a little bit later
on. Uh but now we are kind of into our
last set of uh keynote talks uh of the
day and what a great way to end the day.
What great company we're in. I I'll let
this get rolling in just a minute. just
a couple of kind of thoughts about the
the themes that we're going to be
talking about here because we've seen so
many themes throughout the day. Uh and
in this kind of last set of our final
four talks of the day, we're going to be
hearing a little bit about, you know,
how the fundamentals of software
engineering uh still matter, how how
those are still important. You know,
what happens when we enable everyone to
be delivering code and to be delivering
uh products and to build. Uh but before
we do that, we're going to have a bit of
a conversation. uh about what happens
when software engineering uh and AI
meet. So I'm very uh pleased that we're
going to be able to uh bring uh Gay Oros
and Swix to the stage uh for a nice
hopefully cozy chat. So enjoy the last
talks of the day uh and I'll see you on
the other side. Okay, enjoy. See you in
a bit.
Sure game.
All right. I'm going to assume most of you, uh, show of hands, who subscribes to The Pragmatic Engineer? Oh my god.
>> Wow.
>> Uh he is uh he needs no introduction
then. Let's get right into it. Um what
is token maxing and should everyone here
be doing it?
So I I heard about token maxing a week
ago or like week and a half ago first
and you know some people have been doing
it for longer and I tweeted about it I
think three days ago saying oh there's
this token maxing and again you see it
on social media and my DMs were blowing
up from from people at large companies I
don't want to name names but like you
know Meta Microsoft
uh some so some so some some other ones
as well like uh the likes of and and so
so many more and the story is a little
bit different every at every company on
why people are doing it and whether they
like it or whether they think it's good.
But there's a few a few common themes.
One is token output at these larger
companies is measured in in some way.
There's like either a leaderboard or
there's a way to look up your your
peers. Salesforce, for example, you can
check the spend the the money spend that
every every person at the company did.
You can like search in a tool that
someone built and it shows how many
dollars they spent on on AI related
tokens and you know first there's this
number then there's this uncertainty on
in the tech industry right we're kind of
hearing layoffs like massive cuts at the
likes of Block, and I mean there, like, no
matter how much tokens people spend they
were let go independent of this but
people start to think like does is it
part of performance evaluations or
promotions or all that and the answer is
kind of so inside of meta I talk with
managers and in the performance
evaluation they have this data point
which is one of many data points right
the same way as as like diffs or impact
or or code reviews of how helpful this
person is but they do just like with any
data point they sometimes pull it in and
use it so typically and just like any
data point it can be weaponized so like
a low performer with low impact and a
low token count clearly not even trying
so and a high performer with high impact
and high token count. Clear that's
innovating and this must be doing good.
So inside of these companies
specifically I talked with a lot of
people at at Meta and again this is not
representative of 100% of Meta but they
had this leaderboard where people showed
up and they have like massive amounts of
tokens and a lot of engineers got just
scared worried so they started to token
max to try to generate tokens. Stories
that I've heard first or well secondhand
from these people who who who told me
firsthand is for example instead of
reading the documentation I will ask the
agent to summarize it for me and ask
questions even though it doesn't do a
good job answering it but my token count
goes up people just want to not be in
the bottom 25% or bottom 50% for token
count where these things are measured
inside of Microsoft again there's a
leaderboard and I'm talking with people
they're like it's ridiculous like how
some people are just running autonomous
agents to build junk honestly for the
sake of having that number go up and and
sometimes it gets ridiculous because
like inside of meta they had this
leaderboard, they got rid of it after an article came out; whoever built it just, like, closed it down. But people are still token maxing, by the way, because there's
this this thinking that it might have
gone but you know we're engineers and
don't forget these are high-paying jobs
right that like you don't really want to
lose a job over something as stupid as, like, you didn't have enough token count, and that's how it feels. But inside Salesforce there's a target of minimum spend per month, I think it's like $175 or something, so like people are
like again you kind of like you know
beginning of the month like just token
max to get there so it's it's it's weird
and it started as a joke earlier like a
few months ago token maxing was really
just people like going crazy and
enjoying this thing and building cool
stuff but it's kind of turned into in a
lot of companies I think it's just a
culturally weird thing so it's a weird
time to be in, because I remember lines of code used to be, when early, uh, developer productivity tools came out, like Velocity and Pluralsight Flow, they kind of measured lines of code and number of PRs, and we know that was stupid, and people kind of optimized for that at companies that did it. But it's almost like, now it's the top-running companies like Meta and Microsoft who are incentivizing people to do just stupid stuff, honestly.
>> Yeah, those are wild stories. And one of
the things you're clapping for that
deserves another full conversation. Uh,
one of the things I like about talking
with you and subscribing to your
newsletter is that you basically kind of
anonymize all these stories from from
real incidents and real examples. Um, with all the flaws, is it still worth it? Right, you know, when you have Goodhart's law, whatever gets measured gets, uh, sort of abused. With all the flaws, is it still worth it? You know, is AI basically still making us faster overall? Like, even with the cost of token maxing, with all these really ridiculous examples, is it still net worth it?
>> Yeah. So, don't forget like the reason
token maxing is probably a thing is like
let's just go back to six months ago
where
I was at a CTO, like, dinner conference, whatever, like, a bunch of CTOs gather, CTO-level people. This was in Amsterdam, and we had, like, a bunch of people, and they were talking, and one of the CTOs, like, the Amazon of the Netherlands, uh, there's an e-commerce company, was saying, like, hey, like, everyone, I have a problem: engineers on my team are really skeptical of AI and they're
not really using it, the AI tools. Don't forget, this was before Opus 4.5 and those models were out; they were not as productive. We, uh, already had Cursor and the like, and they
subscribed they're like not using it
that much on existing code bases right
and and next to them uh the head of the
Dutch national bank said like oh we
don't have that problem ours are using
it because our our mission is to
regulate this thing so we need to
understand it and they're kind of
motivated and there was this time where
experienced engineers were kind of
holding off because if you had an
existing code base and use AI cursor
whatever on it it was mildly useful if
that even and these engineers were like
why should I use a tool if it doesn't
help me refactor it doesn't find the bug
it doesn't do what I need to do and
leadership saw they're not really using
it and they kept hearing you know the
likes of Anthropic, for example, was already
saying how they're writing a lot of
their code with Claude Code, uh, and it just keeps increasing, and Anthropic's, you know, revenue is going up like
this. So those leaders are kind of they
might be confusing correlation and and
and you know like which one comes first
but they're like well we should be using
it more because probably good things
will happen and thus bad things will
happen if we don't use it. So the whole
targeting and measuring things it
actually came from leadership wanting: we want our engineers to use freaking AI, I don't care what it is. And it was a bit of a push. Like, we know this is bad, but it's better than them not using it. The best example is Coinbase, where, uh,
Brian Armstrong, the CEO, just like
fired an engineer or he sent an email
saying everyone like needs to get on
board and use AI tools and whoever
doesn't use it in a week, I'll have a
conversation with them and then I think
a week later or Saturday, he fired an
engineer. And, you know, like, this is, again, a high-paying job, like we're talking base salary like $300–400k per year, uh, and then both equity and everything on top of it. Like, they got the message; everyone just started to, you know, use it. And, you know, back to your question, so on one hand there's a push, and look, I feel, this is going to be a little bit controversial, but have you ever wondered why big tech loves to do LeetCode-style interviews, algorithmic
interviews which have nothing to do with
the job and and we know it's the case
and there's a lot of criticism for this
and they've been doing this since since
like 20 years. But here's the thing: it selects for a specific type of person. It selects for the person who's smart and willing to put up with absolute BS to get the job. And this person, you know, they will study two or three months of LeetCode, which again makes no sense on the job, but you do it, you get in there, and this person will be willing to put up with stuff that makes absolutely no sense to keep the job. So token
maxing happens at large companies and
people are putting up with this BS and
look a lot of them are smart and they
will make the most of it. Some of them
will build cool stuff. Um it's it's the
reality I think of big tech. So we're in
this weird place where big tech is a bit
weirder than startups, where, you know, no one cares about token maxing. They care about just building stuff and, you know, use whatever makes sense. People will care about the cost, though.
>> Yeah. But going back to your question
like like you know like is is it making
us productive as as a whole like
individually it's it certainly is and as
teams we're kind of like a bit question
mark because we should be moving faster
and there are a few companies that do; Anthropic is a good example, but a bunch of companies are, like, not. It
seems it's hard to retrofit all this AI
into like the way we have been working.
>> Yeah. Uh, one of my favorite studies from last year was the METR study, where they
uh did a blind test of uh people and
their expectations of productivity,
right? And basically the the end result
was they felt 20% more productive but
their demonstrated results was actually
they were 20% less productive on
average.
>> Yes. But that that study was very
interesting because they
>> it was very small sample size.
>> It was 30 people, and there was one outlier, uh, who actually was way more productive. We interviewed him on the pod. Yeah. Yeah. Yeah. So he was the one productive AI engineer.
But anyway, so uh actually my theory is
that uh something that I've seen on my
team is that I've been enabling coding
agents for the rest of my team who are
nontechnical, right? And, uh, you as the engineer may not be that much more productive, though you can be more productive if you, uh, attend AIE.
But uh if you actually enable your
non-coding, uh, your non-coding collaborators to code, actually they are
more productive because they don't have
to wait for you right and that's that
like unlock of like oh suddenly you have
serverless developers basically uh and I
think I think that's that organizational
coding thing is different than studying
pull request level productivity for the
individual developer. Yeah. And and the
thing that still I still remember to
this date: I talked with Simon Willison, I think in 2024, so two years after ChatGPT came out, and he was, Simon Willison, top commenter on Hacker News, or he's
>> that's not his title, man. Top commenter on Hacker News. What the
>> No
>> co-creator of Django, top blogger. Yeah. Uh, prompt injections. Uh, yeah.
>> Yeah. He's actually not top commenter, he's the most submitted blog, because he blogs so much, like, and he's
>> but he told me back then he said like
this thing AI is is just so hard to to
get good at. He's like there's no manual
and he's like I've been doing it back
then for two years and I'm still I'm
still figuring out what works and what
doesn't. I keep changing my workflows
and I think that's something that is a
bit hard for us. Two things about AI are hard for any engineer to understand. One is it just takes a long time to get good at it, and you need to keep doing it. And the second thing is understanding the theory will not make you better at using the tools, which is an absolute mind-bender, honestly, because
we're so used to you know you understand
how the compiler works, how assembly
works. Okay, you will now be more
efficient if you want to write low-level
code because you know how it works. But
what with these things I mean you you
could of course it's helpful to
understand how how the the architecture
underlying works attention the different
the the different probability sets etc
etc but it will not help you get a sense
for how you can use it and then once you
figure out how you can be more
productive if you're if you're inside of
a team again it kind of breaks and you
have to relearn again but but the more
effort you put into it it like it's
clear that it's it's working it's
helpful and I think it it's the teams
I'm seeing and getting more value out of
it. Low ego, open to learning, open to
leaving your priors behind. The word
priors I have not used forever and I
feel we're in this stage where like just
just leave your priors behind. Just have
an open mind like don't leave your
experience behind but you know be open
to it.
>> Yeah. Zooming out a little bit. How is
the role of the software engineer
changing?
>> I think this was always coming, but AI is just speeding it up. Uh, even before AI, it's interesting how you see, like, startups, in many ways venture-funded startups, are kind of front-running what the industry will be catching up to, because venture-funded startups are about fast growth, um, moving fast with smaller teams, because smaller teams mean smaller comp. Even pre-AI, a lot of these venture-funded startups started to expect a lot wider range of roles from
engineers for example DevOps as a whole
inside VC funded companies from the mid
2010s, every engineer was kind of like
responsible for the code they deployed.
But like more traditional companies, they had more money, sorry, less pressure. They kind of have
dedicated DevOps teams and some of those
things. So in in the industry like the
software engineer is now becoming like
the kind of the tester role has
collapsed into software engineer. We
most companies don't have dedicated
testers. Very very few do. DevOps
collapse into here. Uh and now we're
starting to have the product role also
starting to come. So a lot of companies
even like in 2022 before AI start to
hire for product engineers that's
happening faster and I think the the
last push that AI is doing is even for
early career engineers there's a lot
more seniority expected or or senior
like things planning about things
knowing about the business so I I I
think the role is expectations are are
higher teams are also getting smaller
everywhere. I talked with someone at John Deere, a 200-person, uh, sorry, 200-year-old company, uh, you know, like, they do tractors and all that stuff. And
inside of that company, one of their VPs of engineering was telling me
how they're actually seeing that their
two pizza teams are now just one pizza
teams inside of that company. It's the
reality partially because of these
tools.
>> So my joke used to be I am a one pizza
team because I eat a lot of pizza, but
uh depends how much pizza you eat. Uh
there's so I'm sorry to interrupt. I
don't know if I cut you off in some
critical points. Uh there's a comment
saying I've heard it twice even among
this audience where a lot of people are
saying that oh uh you're no longer an
engineer everyone's an engineering
manager now and you've been an
engineering manager and I wonder if you
agree with that or if you have a
different take you know because
basically you're the the the common
analogy is that you're no longer a
software engineer you're just managing
engineering agents, right? Yeah, if you've been a manager before, that is an absolute...
So here's the thing: yes, you are a manager, but without all the things that make people not want to become a manager. When you become an engineering manager... Hands up if you are
or have been an engineering manager,
right? Hands up if you actually if
you've not been in you want to be one.
>> About 15 20%.
>> All right, you come and talk to me
afterwards. I I'll tell there's a hand
up there. I'll talk you out of it. So, what you think is you become an engineering manager to, like, help people's careers, maybe have higher salary, higher impact, all, you know, there can be a lot of dynamics, but the reality is you become more removed from the product and you have to deal with people
problems and the thing with with agents
is you don't have to deal with people
drama people problems conflict between
your team I mean unless the next
generation of agents start to fight with
each other I think that'll be something
but you actually you you do have to
orchestrate but it's more like a tech
lead role or or or or experienced
engineer where where you're like
mentoring uh mentoring engineers but you
don't have the people management you
don't need to worry about the personal
problems so it's actually a lot more
kind of empowering and I was talking
with uh the podcast was was just out uh
yesterday with with DHH uh creator of
Ruby on Rails who said you know people
told him like okay it's it's like
managing things and he's not excited
about managing agents but he feels it's
more like a mech suit where you have
like you can do seven things at once you
can do it a lot faster and you're in
control and that's more what it feels
like. So there's orchestration, yes, but
it's very different to management. And
also, the really bad thing, or honestly shitty thing, about management, which makes it hard, though also rewarding later, when you tell yourself at least this, is: you start a project with all these people under you, you know, congratulations, you've got 10 people, wonderful, and you start a project, and in six months you will see some results of the decision that you made. With agents
it's just so much faster. So the the
feedback loop is faster. So I I think
it's it's not much of it except for the
orchestration and and and for that
everyone's going to have their own
flavor. Some people will will have the
tendency to like run multiple agents and
they're good at this or we good at it.
Some people just do, like, two agents. Mitchell Hashimoto, I interviewed him. He has two agents. He always has one background agent running, and that's it. He's like, two is enough for me. Great.
>> Yeah. Yeah. Uh, we're all figuring out the patterns. Um, uh, I want to hit you on large tech infra.
Uh this is something that I think both
of us are very excited by by uh good
infra which is a very niche uh interest.
What are you seeing?
>> It's wild to see how much of this is happening. So, as I said, from externally, a lot of companies, a lot of big tech companies, especially the ones spending a bunch on AI and that have platforms and all that, you're not seeing too much more come out. Like, Uber is a good example. I'm not seeing too many more features come out of Uber, or new product launches, and you're like, but what's going on?
they are really investing in AI but when
you look inside, there's a whole lot of buzz: they are rebuilding their complete AI infra. You know, and I'm not talking about them buying Cursor or Claude Code or all that, they're doing that as well, but they're completely, they're building their own custom background coding agents that are integrated into their monorepo, they are having, uh, their own MCP gateway that is now integrated into service discovery, their on-call tooling is being retooled, their internal code review system is, like, categorizing based
on risk. And Uber is one example, but everyone else, Airbnb, Intercom, Meta, Microsoft, even midsize companies, are just building so much internal infra. And I was asking myself, like, why? On one end this feels like such a waste, but when I worked at Uber for four years I realized they spend so much on internal platform. There's two reasons. One is, honestly, it's a low-risk way to get good with AI, uh, to be hands-on, and these companies want to be hands-on, but maybe you shouldn't start with shipping AI features no one wants into your codebase. Second of all, because these companies have so much code that would never fit in a context window, by building custom solutions, even just basic stuff, that kind of thing, they will have better results than off-the-shelf vendors. So, they already
have a win. And number three, honestly,
is anything that has AI in it gets
funded. So, there's this joke of if
you're in the developer platform team
and you're asking for more headcount,
like good luck with that. Oh, developer
platform. Oh, but say that you want to
get two extra head count for agent
experience. Done. So, so there's that
part as well. But, but
>> agent experience is just a CLI
>> pretty much. But with all of this, inside, there's so much buzz and so much work. Everyone's building their own
custom system. So, I'm kind of wondering
how long this will take. But I think for
next year, this is going to happen. So,
if you either have friends or if you're working at a company, you'll see, but talk with friends at other large companies and you will
probably see you are all building the
same thing. If you're in a large company
and you're not already building an MCP
gateway, what are you even doing?
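For context, an MCP gateway in this sense is mostly a routing and policy layer in front of internal MCP servers. A toy sketch, with made-up service names and teams rather than any company's real setup:

# A toy MCP-gateway routing table: map tool names to internal backends and
# enforce a per-team allowlist before a call is forwarded.
ROUTES = {
    "search_code":   "http://mcp-code-search.internal",
    "query_metrics": "http://mcp-metrics.internal",
    "page_oncall":   "http://mcp-oncall.internal",
}

ALLOWLIST = {
    "growth-team": {"query_metrics"},
    "infra-team":  {"search_code", "query_metrics", "page_oncall"},
}

def route_tool_call(team: str, tool: str) -> str:
    # Return the backend URL for a tool call, or raise if the team may not use it.
    if tool not in ALLOWLIST.get(team, set()):
        raise PermissionError(f"{team} is not allowed to call {tool}")
    return ROUTES[tool]

print(route_tool_call("infra-team", "page_oncall"))   # forwarded to the on-call backend
# route_tool_call("growth-team", "page_oncall")       # would raise PermissionError

The real versions are obviously wired into service discovery and auth, but the core idea is a single choke point for agent tool traffic.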
>> Yeah. Um, actually, a lot of these topics are exactly the things I curated for tomorrow. Uh, it's just fantastic to
have you as the closing keynote for
today because uh it's it's like a
appetizer for tomorrow. We have talks
about MCP gateway and all these sort of
AI architecture and infra things and I
do think, like, uh, infra, like, taking AI infra seriously as a company is, uh, not that well understood, and right now you just kind of learn by example from people, because there's not really, like, a textbook or anything about it. So the way I think about this
is, again, if you just kind of step out: we love to criticize big tech for how they're wasting money here and there, and by the way, we love to criticize Google, and I'm kind of thinking to myself, like, hang on, what if Google actually executed well? Like, do we want that? And, you know, they would kill all the startups, but what they're doing makes sense. And Shopify is an example where I'm like, huh, I'm starting to get why it makes sense to do all this stuff. So Shopify, in 2021,
they were the first company to have
access to GitHub copilot. What happened
is, the head of engineering, Farhan Thawar, heard about GitHub Copilot being developed internally inside of GitHub, and he pinged Thomas Dohmke, the CEO of GitHub at the time, and said, hey,
Thomas I heard you guys are doing
co-pilot and he's like yeah we are it's
internal. He's like I I'd like to get
access to it. He's like yeah but it's
not for sale. He's like no no no you
don't understand. I I didn't ask if it's
for sale; we would like to roll it out to all of Shopify, and in return we will give you feedback from 3,000 people, you know, honest feedback, all the time. And so they got it a year before it was
out anywhere and they incurred a lot of
churn it wasn't that great initially and
and they went through all of this stuff
and then Shopify was the first company
to on board to like a bunch of other
tools and they gave unlimited budget and
they're spending so much time ironing
out bugs. But the reason they're doing it, and this is what made it click for me, is they
are trading off churn and expense and
spending a lot more money to be at the
forefront of this. They are a few months
ahead or six months ahead of their
competition and for them it's worth it.
It's not worth it for anyone else,
right? If you're if you're in a company
where your business is like something
something physical and you don't care
like yeah just just wait out it it'll
come. But for a lot of us in the tech
industry, this churn is worth it. Plus, what Farhan told me, because he actually told me he's kind of worried about the cost now, but he was like, look, it's still worth it, because it would look silly if I said you cannot have these tools; how would I hire the best?
>> so it's it's innovation recruitment and
it kind of makes sense when you think
about it and the weird thing everyone's
doing it at the same time so it looks
silly, but it's rational. Uh, my next podcast is with Mikhail Parakhin, the CTO of Shopify, and, uh, the sheer amount of
machine learning that they do and infra
that they set up for their customers
makes me want to be a customer. You
know, that's that's like the best uh
endorsement I can give. Um I'm going to
get meta a little bit and talk about
pragmatic engineer. Uh you and I kind of
startedish in COVID. Uh you just left
Uber. Uh how has it been growing? What
what are the main stats that you're
proud of that uh you'd like to share
with the world?
>> Yeah. So I started Pragmatic Engineer. I joke that if it wasn't for COVID I would probably never have started this thing, because what happened with COVID is, uh, Uber had layoffs, and most of the tech industry was doing great, but Uber was not, and my team, uh, was hit by
layoffs and then we we had to disperse
the remaining people and other teams
because our mission no longer made sense
and it was just like a the morale was
low my morale was low so I was like let
me take a break. I wanted to write some books; Swix was writing his book, The Coding Career Handbook
>> yeah some of you have read it I've met
some of you
>> yeah and that that's how we met there
and then, uh, my plan was to write a book and then start some startup, something something platform engineering, Ctrl-C Ctrl-V of what Uber was doing inside, and that's actually almost all ex-Uber startups, it's amazing, Temporal is from there
>> if I by the way if I did not start AI
engineer I would have started platform
engineer that that would have been the
industry conference
>> yeah love it uh and then I start I
started the pragmatic engineer u a year
after I left Uber it was just an
experiment. Um, I figured, you know, Substack was taking off, no one was writing about software engineering in depth, and I just acted all confident, pretended
that I I knew what I was doing. The
first article was about Uber's platform
and program split that no one had
written about publicly before and it's a
it's a free article you can you can now
check it out. Uh and it was like when
you feel product market fit that's what
I felt almost immediately. The first week, before I published anything, just a confident Twitter post, I had 100 people pay upfront $100 for the whole year, which, I was like, whoa, I haven't published anything. In six weeks, I was at 1,000 people paying for this thing that didn't exist before, which matched my old Uber base salary back in Amsterdam. And it
just kept going up. So like I I figured
like when you find product market fit,
this is like outside of like there's
this rule like if you find product
market fit, just keep doing what you're
doing. So for me, I just kept writing
that one article. I got all these
interview requests, collaborations,
podcast. I just said no to all of them
because I knew the most important thing
was to do what makes it successful,
which is that one article. And later it
turned into two articles. And for two
years, this is all I did is two
articles. And after two years, I looked
up and I was like, huh, like this is
actually working. People like doing it.
I like doing it. There's a future in
that. And that's when I decided I
actually want to turn this into a
business that I don't burn out because
for two years every vacation I went to I
was working 50 60 hours. I was always
thinking I was writing I I couldn't
really let go. So I started to grow the team a little bit. Uh, I hired Ellen, the first researcher. Ellen, she's ex, uh,
>> she's here right?
>> Ellen's not here. Um, Jessica is who
just joined uh later.
>> Yeah.
>> And then uh so now it was two of us. Uh,
and I started a podcast a year and a
half ago because I talked with so many
people. I figured it was a bit of a
shame to not have it. So, The Pragmatic Engineer became the number one paid technology newsletter about four months after starting. It stayed there for three years. Now, SemiAnalysis has
>> Dylan versus uh you guys. Um, yeah. No,
congrats on your success. Uh I think
you're also a leading tech voice in
Europe which I think you're sort of
proudly sort of uh upholding that over
here which I would really wanted to
feature. Thank you for your support for
AIE and uh everyone thank you to
>> Awesome.
Our next presenter can bring levity to the often serious world of engineering. Please join me in welcoming to the stage the founder of Sizzy.co, Kitze.
All right. Wow. Back room.
Those are not my slides. There you go.
Hi, I'm Kitze. We probably argued on X. I'm
D on X and I turned 34 years old today.
Decided to do a talk on my birthday
because Thank you.
I'd like to torture myself by asking, do
we have anyone from Tinkerer Club here?
Please just the person sleeping in the
back is like, "Oh, what did he ask?" All
right. I formed this recently. It's an awesome community where every person inside is a copy-paste of everyone else inside; it's hilarious to see. If you want, you can join us. So, I'm going to talk
today about the past, present, and
future of productivity and personal
agents. Starting with: my first to-do app was when I was 10 years old, which is crazy. I found an old note in a notebook, and some scribbles that are, like, barely legible. They're like, I need to eat my string juice today. I don't know what a 10-year-old does for a to-do list, but it clearly had checkboxes. And I've been trying and wrestling to solve productivity since then. Was anyone
else forever unhappy with todo apps,
please. Like there's no perfect Thank
you. Thank you. It's not only me. So I
tried like this was probably 15 years
ago. I got so fed up with, like, Todoist and the other ones that I started using text files, way before all
these local markdown blah blah blah. And
I used an Android app called Tasker to
basically manage all of these text
files. I got contextual reminders like
whenever I connect to Wi-Fi, remind me
about something or when I arrive at a
destination or when I bike or blah blah
blah. So I was always trying to figure
out a productivity system. I had, like, a Google Home, which back in the day supported IFTTT, to basically cut the command in half. So when you say, tell my assistant to..., you can take the second half and send it to any of the IFTTT services, which was pretty cool. And anytime I would have a to-do around the
house, I would just tell my Google
assist and it would just store it. It
wasn't smart, it wasn't AI, but I was
building towards something where I can
offload my thoughts and process them in
a way. I realized that I never wanted a
to-do app. I wanted like sort of like a
life OS. So slowly I've been going to
that direction in 2017 16. I I'm bad at
naming so just ignore the names of
everything I've ever built. So I made
something called todo which was which
was like a todo app but all the to-dos
like shoot up to the top based on like a
priority system. So if you tag them with
something called health or crisis or
whatever it is they would just
accumulate all of those points and shoot
higher to the top. So it was kind of helping me to prioritize things.
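That priority system is basically tag weights summed per to-do; a tiny sketch with made-up tags and weights, not the actual app's logic:

# Each tag contributes points; a to-do's priority is the sum of its tag weights,
# so items tagged "health" or "crisis" float to the top of the list.
TAG_WEIGHTS = {"crisis": 100, "health": 50, "work": 20, "someday": 1}

todos = [
    {"title": "Renew passport", "tags": ["work"]},
    {"title": "Book dentist", "tags": ["health"]},
    {"title": "Fix prod outage", "tags": ["work", "crisis"]},
]

def priority(todo):
    return sum(TAG_WEIGHTS.get(tag, 0) for tag in todo["tags"])

for todo in sorted(todos, key=priority, reverse=True):
    print(priority(todo), todo["title"])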
ADHD hit, of course, and I forgot about that
one and I started something called
better. This one was kind of hard to SEO
because good luck figuring out SEO for
better apps. So eventually I had to
rebrand it. But I expanded here by
adding to habits, planner events, and a
bunch of other things because I realized
if these three are not together, I can
never make like a mini OS. Then of
course ADHD hit and I switched to a
bunch of other apps. And in 2022, I
started making Benji. It's named after
my dog. My dog is the mascot. That's not
the logo, but the point is I wanted an
app to rule them all. I might have gone a little bit overboard. On the next slide, you're going to see, you're like,
"Oh, probably he added routines and
calendar events and like what else?" No,
this is how much I hate marketing. If
you're like, "Wait, I've never heard of
Benji. How come?" Because every time I
had the urge to do marketing and to
actually promote this to people, I was
like, "Maybe one more feature. Maybe one
more feature." It's like almost like 3 4
years later and I still haven't properly
wrapped this up. It's still not properly
finished. But I was frustrated with
using a web app for one thing, an iOS
app for another thing. It supports this,
it supports Android, it doesn't support
this. Some of them are subscriptions,
some of them are premium. So I just
wanted all of these features like
mangled into one tool that can sort of
fix my life. Has it? Absolutely not. But
we're going towards that. My vision is
to one day have like a Benji phone and a
Benji OS. And the funny thing is I said
this on a podcast and the guy was like
very ambitious for someone who doesn't
have a landing page for Benji. So I
didn't have a landing page, but one day
I'm going to make like a Benji phone. So
the friction with having like making
this life OS whether it's in notion or
in something else like Benji. The
annoying thing is you have to use forms
to input data. So I oscillate between
two states. I'm either for a month like
logged into Benji logging everything and
doing all the things or I completely
ignore it. I don't care about what
things are there to do nutrition
whatever. I'm like no no no I don't want
to look at it and then in a few months
I'll go back into that cycle because
there's a lot of friction in in using
all these tools. We had the ChatGPT moment. It was awesome. But when ChatGPT plugins came out, I don't know if you remember that ancient relic that's now transforming into MCPs and whatever, um, I called my wife and I was like, "Honey, it's over. It's over for all the apps, for all SaaS. Like, GPT is going to eat the world. It's all going to be ChatGPT. It's all going to be within the thing. Benji is pointless. I
wasted years on blah blah blah." 3 years
later, she received so many of these
calls. She just ignores me at this
point. I'm like, "Oh my god, they
dropped the new Opus." She's like,
"Uh-huh. Cool. Cool. Nothing ever
happens. But we're going towards this."
Like 2023, before the models could
return JSON, you had to bully the models
to return JSON. I don't know who
remembers this. like you had to be like
please don't write any markdown. It's
like sure here's some JSON. You're like
no. So you had to parse it, cut it, shape it into form, to get some JSON.
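The pre-structured-output ritual being described usually boiled down to stripping markdown fences and prose and then hoping json.loads succeeds. A minimal sketch, not Benji's actual code:

import json
import re

def coerce_json(model_output):
    # Best-effort extraction of a JSON object from chatty model output.
    # Drop markdown code fences the model insisted on adding.
    text = re.sub(r"```(?:json)?", "", model_output).strip()
    # Fall back to the first {...} block if there's surrounding prose.
    match = re.search(r"\{.*\}", text, flags=re.DOTALL)
    if match:
        text = match.group(0)
    return json.loads(text)  # raises an error if it still isn't valid JSON

print(coerce_json('Sure! Here is some JSON:\n```json\n{"task": "buy milk", "done": false}\n```'))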
And I added a feature in Benji where you can, like, press a key
on your keyboard. It would record with a
microphone and as I was speaking it
would, like, periodically cut some of what I was speaking and basically, it wasn't MCP, it wasn't anything, it would call APIs in Benji, and you could see your calendar moving live and your to-dos and everything. And to people on Twitter
this was mind-blowing. They're like,
"Holy dude. you should pursue it.
You should make something out of this.
But ADHD, I was like, "No, no, no, no.
People like it. It went viral, which
means we never have to talk about this
again." So the Benji assistant still
hasn't shipped and I did nothing about
it. Meanwhile, people took one feature
of Benji, which is like, I don't know,
food tracking. They take a picture with
your phone and it analyzes calories and
they made multi-millions, but I have 60
features. There's there's a lesson in
there. So last October, I realized that, wait, I'm using Claude Code. I can use it for more stuff, like, it has tool
calls, functions, and a bunch of other
stuff. Maybe I can tell it to do my
taxes and end up in jail. Hopefully not.
Uh maybe I can tell it to organize my
email and my to-do list and a bunch of
other things. So when skills came out, I started, like, loading my Claude Code with personal skills. But I'm like, wait, now I have coding skills, I have personal skills, it gets confused. Like, I started asking people, how do I go and make this into, like, a proper assistant that lives on top of Claude Code but has tools for other stuff, other than coding? But ADHD was like, how about we forget about this? Let's let Pete come up with Clawdbot and everything else; you don't need to worry about this. So Claude Code had, like, the wrong shell for me, because it was, like, terminal-based and I craved something else. So
when Peter made, uh, Clawdbot back then, when I saw the tweet, I'm like, oh my god, you can talk to it through WhatsApp or Telegram or whatever. For me it was like, that's the moment, that's what I needed for my Claude setup to actually, you know, evolve into the next thing. My brain
caught on fire. I think we got like mass
psychosis. It turned into a cult.
Everyone wearing, like, lobster suits. It's been crazy for a while, and I
joined the Discord, and it was, like, less than 100 people who had their Clawdbot
set up. Even Pete was like how did you
do this? There's no like onboarding.
There's no like how did you do it? And I
told him what I'm telling people now. I
don't know how the internals of my setup
work. I just ask either Codex or Claude Code to fix it, to change it, to improve the memory, to do this, to do that, but I
have no freaking idea. People are like
what do you have in your JSON file? I'm
like I haven't seen a JSON file since
four years ago. Like I don't know. just
ask my bots and it just fixes the
things. So, for a while I went like full
lobster mode. This is me at the first
meet up in Vienna in a lobster suit. I
made that logo. I actually made the OpenClaw logo at 2 a.m. at night. Uh, I, like,
started wearing all of these lobster
merch doing tutorials, podcast, guests,
talking about all the use cases and blah
blah blah. And finally, what I liked for
someone who's been obsessed with to-dos
and productivity since like 10 years
old, I'm like the future is finally
reachable. like all my files from Google
Drive and iCloud and presentations I
have and photos from high school and
like all the things that I have like
piled up, and unfinished business ideas. I could see how OpenClaw can just magically, you know, wave the lobster hands and just fix everything in my life. So I was immediately done with all the cloud models. I went full hipster mode, like, no more Gemini, no more ChatGPT, no more cloud. I wanted to fully,
I got the power of finally owning the
assistant, owning the files, owning the
memory, deleting the sessions if you
wanted to. So it's like it felt fully
local. So naturally I started preparing
all my data for agents. I went from the
guy who was like always using cloud and
stuff to annoyingly self-hosting
everything. Everything has to come off
the cloud. It has to be local on my NAS
on my machine just so my agents can
actually work on it. So these are still
work in progress. The classic work in
progress I'm going to finish one day.
But I started moving to locally hosted stuff, like Nextcloud, Immich, local markdown, for everything that requires a lot of API calls or MCP and whatever. I would rather just have it local, to work on all
of this locally. I went that far and I
went back to Android. Like I feel like
this thing in a way, you know, like
enchanted me. I'm like, who am I? I
don't recognize myself anymore because I
wanted my agent to be able to read my
notifications, clear my notifications,
install apps, uninstall apps. It can do
anything on an Android phone and on iOS
it can maybe send you a push
notification and if Tim Cook allows. I
was planning to do like 10 15 more
slides. Sorry for the flashbang there.
Uh of use cases, but then they told me
the presentation is exactly 18 minutes.
So I did that one. It's on YouTube. It's
on a bunch of podcasts. I don't want to
talk about probably all of you have
maybe even more use cases than me. But
when we do, like, we do weekly meetups in the Tinkerer Club and we talk mostly about OpenClaw, and I love to ask this question: when I ask them about which use cases do you have, I then ask them, but which ones of them can you not do with Claude Code and Codex, and immediately it just reduces by 90%, because it's like, uh, yeah, I can kind of do that with Claude Code. So, I've been also asking myself,
like, what is the value of, like, having, like, a packaged agent like OpenClaw?
I think that one-on-one chat with one
agent sucks because if you think about
delegating in your life, if you have
like business and personal and family
and blah blah blah, you don't want to
have like one employee loaded with all
the information about your life talking
in like Telegram in a one-on-one chat
about everything. So, more people
started using Telegram topics. They
started using Discord, Slack, and other
stuff just to get organized. I like the
idea of specialized agents, which OpenClaw supports, but not a lot of people
use them, because basically they have, like, a provider, a model, a level of thinking, a system prompt or soul, a list of tools and MCPs, and a list of permissions. I like that this is, like, packaged: we're going to talk with this agent about fitness. Now people talk about LLM
psychosis. I'm out here like going crazy
like these are all of the bots that I
created and I tried to like contain
every bot to have a purpose in my life.
Like, some of them are for work, some of... don't take photos of my chats. Uh, so now I ended up with, I have five Discords. The funny thing is, like, as I keep talking, keep in mind that my life is far from solved. It's never been more chaotic. I've never been this late on rent, on mortgage, on, like, customer emails. It's a mess, but it's a
performative mess, right? So I ended up
with five Discords and each Discord has
many channels and threads and forum
posts and nested thingies and blah blah
blah. And then inevitably, I mean, you
can sense this across the community. I see that across Tinkerer Club, because
in the beginning it was an explosion of
signups of people joining the meetup.
They're like, "Oh my god, weekly calls,
we're going to crush the world." And now
if you enter a meetup now, it's like
five people, and it's slowly turning into, like, um, OpenClaw Anonymous, and
everyone's like, "Yeah, mine didn't do
like the crown jobs, man. It
drives
a bit depressing, but I think we'll
bounce back. We'll figure out like you
know, we'll figure it out." Why is this
happening? Because it was, and kind of is, for me, unreliable where it matters most, which is, like, cron jobs, multi-agents, the agents talking to each other, the agents forgetting, like, literally in the next message, like, huh, what are you saying? And I'm like, the message is above you, just go one message above you. This is getting fixed and it's getting updates every day, but I've yet to see that it's actually, you know, working. This is not OpenClaw's or any other agent's fault, but Discord and Telegram were not meant for a life OS. We're just molding them into something, but they'll never be the right UI for you to manage your life fully. It's like a cope, in a way,
until we get to something else. We're
going to use Discord or Telegram. And
finally, as I would like to call them,
Benthropic, they ruin the charm of it.
Like as soon as you pull the model,
talking to GPT5 talks feels like talking
to a box box of oats. Seriously, it has
the personality of this. Try this. It's
like, okay, did you do that? No, but I
told you to do it. Okay, I'll do it. Did
you do it? No. Every conversation with
Open Claw looks like that in the last
and it drives me nuts. So, what now?
Where do we go from here? I don't know
how much time I have left. It says six
minutes. Where do we go from here? I see
like two futures like fighting for each
other and I don't think that either of
them is going to win in the long run.
So, we have these custom agents like
OpenClaw, Hermes, or whatever else is
possible. Uh, and we have cloud agents
because everyone is trying to grab a
slice of the pie. Now, we have Cowork, and OpenAI is going to have a thingy, and Perplexity is trying to make a thing, and everyone is trying to make their cloud thing, and those are the cloud ones. So
the custom ones are never going to work
because they're for tinkerers. And I'm telling you, like, in Tinkerer Club, we have people who are building their own pinball machines. Talking about tinkerers, like, they tinker with everything, and everyone is freaking tired of, like, trying to make this thing work. Let alone people who have lives, let alone people who have, like, busy lives and jobs and whatever else. No one will have time to tweak this. They would just like a solution served for them so everything works out of the box. Not me.
I'm not I'm not going to be happy until
I, you know... And then cloud agents: I tried Claude Cowork for, like, five minutes and I'm like, this is too nerfed. This is not an OpenClaw alternative. It cannot do, like, even, like, 5% of the things that OpenClaw can do. So this will be for the masses, but it won't satisfy the tinkerers, the people who want to self-host, own the models, and blah blah blah. So
two directions here. What am I going to
do like personally for myself and what I
think is going to happen next in the
actual like industry. I'm juggling
currently between OpenClaw, Hermes, Paperclip. Is anyone using Paperclip? It's, like, kind of this cool, like, kanban, Linear-like thingy for agents, wasting a lot of credits. I'm trying plain tmux with Codex a lot of the time. When you reach peak frustration with the first three, you're like, "Fuck it." When you open the terminal, you're like, "Ah, maybe the agents are not that smart." So, I'm juggling between all of this and I'm using all of them daily.
But it's, like, the hesitation that I have: like, I wanted to see where the location for the venue is, and I had two options, open the website or go to Discord, and I'm like, I don't want to talk to that
box of oatmeal. You know, it's going to
be like, yeah, I'll find the location in
your email. Did you? No. Are you ready
for it? It keeps asking you, are you
ready for the thing you told me to do?
It's crazy. So, I started making my own thing, naturally. You can see the progression. It's never going to see the light of day. It's not for people. It's just an experiment to do it for me. I call it Wolffer. And I'm not making it for mass appeal, I'm not making it for everyone to use. I'm trying, like, how can I make a tiny abstraction on top of, like, Codex or Claude Code, rest in peace. I'm afraid to use Claude Code because I might get arrested. So it's only on Codex for now, and it's not extensible, and it doesn't support a billion providers. So I'll start with
the cons. What sucks: you're forced to use the chat UI of the actual app, and you cannot use Telegram or iMessage or whatever; there's no support for any of this. It's absolutely the opposite of OpenClaw and Hermes. It's not built with plugins in mind; the idea is to have everything in it. There's no memory system. I'm not really selling the thing, but none of these things are out of the box. It's not very modular. It's made by an ADHD squirrel brain that will forget about it by the end of the month. And it doesn't have OpenAI funding, and it doesn't have a cool
lobster logo. These are the cons. But
the pros and why I would suggest all of
you to maybe dabble with this and try to
make your own um or maybe eventually try
mine if I ever release it for people. It
has predictable conversations. And the UI that I made, you go to the Wolffer app, like, wolffer, whatever the URL is, and it has, like, a predictable UI that's made for multi-agent orchestration, into, like, multiple topics, multiple conversations. Like, everything was made
for this purpose. It's not like you're
taking Discord and you're trying to mold
it to be for a certain purpose. And my
favorite feature is because I don't
believe in memory of agents. Like, people are like, "Oh, we finally saw Mila, oh, solved memory." I'm like, "No, absolutely she didn't solve memory." What I believe in: here I have nested topics. So I have, like, Work, Projects, Benji, Benji customer support. Let's say that's the nested tree. And when I'm talking to Benji customer support, in the first prompt, it injects the description
of all the parent prompts. So when I'm
talking to Benji customer support, it
doesn't need to pull from memory or some
magical place. It just looks at the
topic, the parent topic, the parent
topic, the parent topic. It takes all
the descriptions together and it
immediately knows what is my work, what
is Benji, what are my projects and how
do I do customer support. And I can get
more out of that than hoping from some
memory system that's going to pull the
right context out of the the right
place. It kind of works for me.
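The idea being described is just a walk up the topic tree: concatenate every ancestor's description and inject it into the first prompt instead of querying a memory system. A small sketch with hypothetical topic names:

from dataclasses import dataclass
from typing import Optional

@dataclass
class Topic:
    name: str
    description: str
    parent: Optional["Topic"] = None

def context_for(topic):
    # Collect descriptions from the root topic down to this one.
    chain = []
    node = topic
    while node is not None:
        chain.append(f"{node.name}: {node.description}")
        node = node.parent
    return "\n".join(reversed(chain))  # root first, leaf last

work = Topic("Work", "Freelance projects and products I run.")
benji = Topic("Benji", "My all-in-one life OS app.", parent=work)
support = Topic("Benji customer support", "Answering user emails about Benji.", parent=benji)

# Injected at the start of the conversation instead of pulled from a memory system.
print(context_for(support))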
It supports workspaces. I can switch
between workspaces. I hated that I
couldn't see tool calls. I would like to
see tool calls, to collapse them, to uncollapse them, to see loading spinners.
There's buttons for stopping the thing.
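As a rough TypeScript sketch of that nested-topics idea (the Topic type, buildContext helper, and the example tree are my own assumptions, not actual Wolffer code):

```typescript
// Hypothetical sketch: inject parent-topic descriptions instead of relying on a memory system.
interface Topic {
  name: string;
  description: string;
  parent?: Topic;
}

// Walk from the current topic up to the root and collect every description,
// so the first prompt in a topic already carries all of its ancestry.
function buildContext(topic: Topic): string {
  const chain: Topic[] = [];
  for (let t: Topic | undefined = topic; t; t = t.parent) chain.unshift(t);
  return chain.map((t) => `## ${t.name}\n${t.description}`).join("\n\n");
}

// Example tree: work > projects > Benji > Benji customer support.
const work = { name: "Work", description: "Everything related to my job." };
const projects = { name: "Projects", description: "Active side projects.", parent: work };
const benji = { name: "Benji", description: "Benji is my invoicing app for freelancers.", parent: projects };
const support = { name: "Benji customer support", description: "Handle support tickets politely.", parent: benji };

const systemPrompt = buildContext(support); // injected as the first prompt of the conversation
```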
I don't need to use slash commands. The cron jobs are predictable. And when you get a cron message, it actually reads from the entire conversation and it labels it as cron. So it's not like, where did this come from and why is the agent kind of lost? There's UI for
managing agents, which, for my brain, I really need: when I chat in a topic, on the right side I see that the agent is, like, Chandler, and he has this model and this capability. So it really helps me to know who I am talking to, and to just tweak and be like, no, no, no, you don't need that capability. Boom, it
disappears. I would have included screenshots, but the app didn't work because it's on my Mac Studio at home. It's a long story, but imagine the screenshots. It's kind of cool, and I like that there's a knowledge base and documents: you can write markdown documents in the thing and you can add them, because in Discord you can only add other members. There's no dynamic @-mention for anything else, and here I can mention, for example, hey, let's fix the landing page of Benji, and then I would add the landing page of Tinker Club, for example, or I can add a knowledge base or a password or a skill, so I can combine multiple @s and give it the exact right context that it needs for the actual thing. What I
think is going to happen next because
this is definitely not going to be a
mainstream thing. What's going to happen next in the entire agents industry, and what are people going to do? This is
my prediction. I think the way we use
computers right now is absolutely
insane. Does anyone agree with me? And
have you finally got this? Like when you
open your computer like computers
shouldn't be this way. One person, two
Okay, we have a lot of people. Like I
open my computer after a few hours it
greets me with 17 updates for apps I
haven't used in a while. And it greets
me with like tabs that I had open since
yesterday. like how I imagine in the
future it would need to ingest all the
information about my life like
notifications and emails and everything
and to-dos and everything that's
happening in my life and depending on
how far away I've been from the
computer it should greet me with the
next task to work on and then the next
one and the next one and it should maybe
give me a break and be like hey enough
let's do this let's do that so in a way
I think the role of AI is going to
inverse so the way we prompt the AI
right now I think it's going to inverse
and the fully productive people will be the ones who delegate 99% of the stuff to the AI, and then the AI prompts
you. It's like, hey, you didn't send me
a picture of your passport or, hey, what
do you want to do? You basically make decisions and you basically click through forms or you answer questionnaires or
whatever it is, but in the background,
there's something constantly working for
you instead of you prompting it all the
time. I agree with this sentiment.
People are like, "But my grandma will
never vip code." That's 100% true
because I think where we're going, we're
actually not going to need most consumer
apps. know your grandma, your mom or
your friends are not going to VIP code,
but they'll be able to sit in this new
futuristic OS and they'll be able to do
any task that they want to do. Either the UI is going to pop up on the fly or whatever it needs, but they'll be doing tasks and they'll forget about "I need an app to do a task."
They'll just do it. A small set of apps
will survive, but it will be software
for like specialists and people I don't
know who are doing like color grading or
some movie making or music making where they actually need software. But
normies will just chat to their computer
and their computer will do things and
the UI will generate on the fly. Uh I
also think it would be the funniest
thing if Apple wins all of this because
local models are getting insanely good
and they're going to get even better
this year and next year. And I think for
most normies, for most people, they'll
be completely fine with a local agent
like Siri getting tool capabilities from
all of their locally installed apps, not
wasting any credits. Their data doesn't
go anywhere and their phone magically is
doing things. the latest Google Pixel
can already launch your apps in the
background and order coffee and do a
bunch of things for you. So, I think
that's where everything is going. So,
I'm over time. Thank you for listening
to my rant. Hopefully, we can discuss
afterwards and thank you very much.
Thank you.
In the age of AI, do software
fundamentals matter anymore? Our next
presenter argues that they matter now
more than ever. Please join me in
welcoming to the stage engineer and
educator at AI Hero, Matt Pocock.
Wow.
Hello everyone. Having a good conference
so far?
>> Are you having a good conference so far?
>> Good. Wonderful.
I have a message for you that I hope
will be um a comforting message for
folks who believe that uh their skill
set is no longer worth anything in this
new age, which is I believe that
software fundamentals matter now more
than they actually ever have.
And
I'm a teacher and I've been recently
teaching a course called Claude Code for
real engineers. Nice and provocative.
And in the process of kind of working on
this course, I had to come up with a
curriculum about AI coding, which is a
bit of a nightmare because things are
changing all the time, right? AI is a
whole new paradigm. We need to chuck out
all of the old rules surely so that we
can bring in the new stuff.
And there's a kind of movement that has
come up around this, which is the specs
to code movement. And the specs to code
movement says that okay you can write a
specification about how an application
is supposed to work then you can use AI
to turn it into code. If there's a
problem with the application you then go
back to the spec. You don't really look
at the code. You just change the spec.
You run the compiler again and you end
up with more code. Raise your hand if
you've heard of that. Keep your hand
raised if you've tried it. Okay. I've
tried it too. You can put your hands
down.
And what I noticed was I would run it and I would try not to look at the code, but I would look at the code, and I realized I would get code out first of all, and then I would run it again and get worse code. And then I did it again, I got even worse code, and again. I kept running the compiler, kept running the compiler, and I would just end up with garbage.
You know, raise your hand if that's
happened to you. Yes. I don't think this
works. The idea that we can just ignore the code, just let it manage itself, is just sort of vibe coding by another name,
and I didn't believe that back then. I thought, okay, how do I fix the compiler? How do I make it so that it doesn't produce bad code each time, or worse code? And so I thought, okay, I need to explain to the LLM in English what a good codebase looks like. Let me dig out one of my old favorite books, which is A Philosophy of Software Design by John Ousterhout. Go on Amazon, get it. Um, and he
has a definition for what bad code looks
like. He calls it complex code.
Complexity is anything related to the
structure of a software system that
makes it hard to understand and modify
the system. Right? So, a bad codebase is a codebase that's hard to change. If
you can't change a codebase without
causing bugs, then it's a bad codebase.
Good code bases are easy to change. So,
I thought, oh, that was good. Let's try another book. Let's try The Pragmatic Programmer. Go on Amazon, get it. They
have a whole chapter on something called
software entropy. And this is exactly
what I was seeing. Entropy is the idea
that things tend towards um disaster and
uh floating away from each other and
collapse. And this is exactly how most
software systems behave too is that
every time you make a change to a
codebase, if you're only thinking about
that change, not thinking about the
design of the whole system, your
codebase is going to get worse and worse
and worse. And that's what I was seeing.
Everything inside the specs to code idea
that you just run the compiler again and
again was making worse code. Now there's
an idea that sort of drives the specs to
code movement which is that code is
cheap. Raise your hand if you've heard
that phrase before that code is cheap.
Yeah.
Well, I don't think this is right. I
think code is not cheap. In fact, bad
code is the most expensive it's ever
been. Because if you have a codebase
that's hard to change, you're not able
to take all of the bounty that AI can
offer because AI in a good codebase
actually does really, really well.
And this means good code bases matter
more than ever, which means software
fundamentals matter more than ever.
That's the thesis of this talk. So,
let's actually get into practical stuff.
I'm going to talk about different
failure modes that you may have
experienced or you may not have
experienced yet with AI and how you can
avoid them by just going back to old
books and looking at good software
practices. Sound good? So, the first one
is that the AI didn't do what I wanted.
You know, I I thought I had a good idea
in my head and the AI just did something
totally different or it did some uh like
specs that I you know, it just made
something I didn't want. Raise your hand
if you've hit this mode.
Cool. Okay. Well, this is what they say in The Pragmatic Programmer: no one knows exactly what they want. And between you and the AI, there is a communication barrier, right? And
so when you're talking to the AI, that's
kind of like the AI doing its
requirements gathering. It's basically
working out from you what it is that you
need. And I realized that there was
another book, Frederick P. Brooks' The Design of Design, and it talks about this idea called the design concept, which is
that when you have more than one person
designing something together, you have
this idea sort of floating between you,
this ephemeral idea of the thing that
you're building. And that thing that
you're building or the idea of it is
called the design concept. It's not an
asset. It's not something you can put in
a markdown file. It is the invisible
sort of theory of what you're building.
And so I thought, okay, that's what's
going on. Me and the AI don't share a
design concept. So I came up with a
skill. The skill is very very simple.
It's called grill me and it looks like
this. Interview me relentlessly about
every aspect of this plan until we reach
a shared understanding. Walk down each
branch of the design tree which is
another thing from Frederick P. Brooks
resolving dependencies between decisions
one by one. The repo containing this skill has like 13,000 stars or something; it just went nuts. Went viral. People love this thing. These couple of lines mean
the AI asks you like 40 questions, 60
questions. I've had it ask people a
hundred questions before it's satisfied
they've reached a shared understanding.
And it means it turns the AI into a kind
of adversary where it's just continually
pinging you ideas and trying to reach a
shared understanding. And that means
that the conversation that you then
generate, you can take that and turn it
into a product requirements document or
something. Or if it's a small change, you can just turn it directly into issues and then your AFK agent will pick it up. And don't at me on this, but I personally believe this is better than the default plan mode in the tool that I use, which is Claude Code. Plan mode is
extremely eager to create an asset. It
really wants to uh just create a plan
and start working. whereas I think it's
a lot nicer to reach a shared design
concept first. So that's tip number one.
Now failure mode number two is that the
AI is just way too verbose.
It's like you're almost talking at cross purposes with the AI. Raise your hand if
you uh feel this. If you ever experience
that failure mode. Yeah. It's kind of
like the AI is like talking just using
too many words to try to communicate
what it's doing. It's not like you're
talking uh using the same language. And
this to me felt very very familiar.
Right? If you've ever been a developer
for a long time and you've worked with
let's say domain experts, someone
building an application, um let's say
the domain expert wants you to build
something on uh I don't know microchips.
You have no idea what microchips are.
You need to establish some kind of
shared language, right? Because
otherwise they're going to be using
terms you don't understand. You're going
to be translating that into code that
maybe you don't even understand and
certainly the domain expert won't. And
so there's this kind of language gap
between you and the domain expert. And
so I went back to domain driven design.
DDD, this is something I'm still kind of
on the edge of exploring, but everything
I'm reading about DDD is just music to
my ears. I freaking love it. And DDD has
a concept of a ubiquitous language.
With ubiquitous language, conversations
among developers and expressions of the
code and conversations with domain
experts are all derived from the same
domain model. It's essentially a
markdown file full of a list of terms
that you and the AI have in common. And
you really focus on those terms and you
really make sure that they're aligned
with what it actually means and you use
them all the time in the code when
you're talking about the code when
you're talking to domain experts or in
our case when you're talking with AI. So
I made a skill. This skill is the
ubiquitous language skill. Basically
just scans your codebase, looks for
terminology, and then um creates a
markdown file. Creates the ubiquitous
language markdown file. A bunch of
markdown tables with all of the
terminology. And this I then pass to the AI, and I'm able to read it too, and I actually have it open all the time when I'm grilling with the AI and planning and that. What I noticed by reading the
thinking traces of the AI, it not only
improves the planning, but it allows the
AI to think in a less verbose way and
actually means that the implementation
is more aligned with what you actually
planned. So this has absolutely been a
powerhouse. It's been unbelievably good.
So that's tip number two. Create a
shared language with the AI. So okay,
let's imagine that you've aligned with
the AI. You know what it is you're
supposed to be building. the AI has
built the right thing, but it doesn't
work. Raise your hands if that's
happened to you. Yep. Just doesn't work.
Well, there's an obvious thing that we
can do to make that better, which is we
can use feedback loops. We can use um
static types. You know, if you're not using TypeScript, that's crazy. If you're building a front-end app and you're not giving the LLM access to the browser so it can look around, it absolutely needs that. And
you obviously also need automated tests.
And one sort of thing I notice here is
that even with these feedback loops, the
LLM doesn't use them very well. It
doesn't kind of like get the most out of
its feedback loops in the way that a
veteran developer would. What it tends to do is just do way too much at once. It will produce huge amounts of code and then think, "Oh, I should probably type check that actually, or I should, yeah, maybe run a test on that, or maybe do something like that." And this, in The Pragmatic Programmer, they describe as outrunning
your headlights as essentially driving
too fast because the rate of feedback is
your speed limit. The rate of feedback
is your speed limit, which means that
you should be testing as you go, taking
small deliberate steps. And the AI by
default is really not very good at that.
And so skill number three is TDD. You
should be using test-driven development
because TDD forces the LLM to really
take small steps. You create a test
first. You make that test pass and then
you refactor the code to make it nicer
and consider the design.
The issue here is that testing is really
hard. Testing has always been hard.
And the reason for that
is there are a ton of different
decisions you need to make when you
write a test. You need to figure out how
big a unit do you want to test. You need
to figure out what to mock. You need to
figure out what behaviors do you even
want to test in the first place. And all
of these decisions are dependent. So if
you are testing a really big unit like
an entire massive application, then it
might be quite flaky. You might not want
to test that many behaviors. You know, if you only test this unit, you need to mock that unit. You know, it's all interlinked. And I've been thinking about
this for years for my entire development
career.
And what we notice is that good code
bases are easy code bases to test,
right? So, here we're starting to get
back to the idea of code being important
is that the better your codebase is, the better your feedback loops are. Because you're able to give better feedback to the LLM, it produces better code. And so I thought, what does a good codebase, what does a testable codebase, look like? Again we go to John Ousterhout. He talks about having deep modules in
your codebase. Not shallow modules not
lots of modules that expose like kind of
um lots of functions. They should be
relatively few large deep modules with
simple interfaces. Let's compare them
quickly.
Deep modules, lots of functionality
hidden behind a simple interface, hiding
the complexity. You can look inside the
deep module if you want to, but you
don't need to. You can just use the
interface. Shallow modules, not much
functionality, complex interface.
And I'll just wait for you to take the
photos.
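As a hedged TypeScript illustration of that contrast (the invoicing domain and every name here are invented for the example, not from the talk): a deep module hides a lot of functionality behind one small interface, while the shallow version leaks every step to the caller.

```typescript
// Deep module: one simple interface, complexity hidden inside.
// Callers only ever see createInvoice(); parsing, tax, and persistence stay internal.
export interface InvoiceService {
  createInvoice(customerId: string, lineItems: LineItem[]): Promise<Invoice>;
}

export interface LineItem { description: string; amountCents: number; }
export interface Invoice { id: string; totalCents: number; }

// Shallow modules: the same work smeared across many tiny exported functions,
// so every caller (human or AI) has to know the whole dance and its ordering.
export function validateLineItems(items: LineItem[]): void { /* ... */ }
export function computeTax(totalCents: number): number { /* ... */ return 0; }
export function persistInvoice(invoice: Invoice): Promise<void> { return Promise.resolve(); }
export function sendInvoiceEmail(invoice: Invoice): Promise<void> { return Promise.resolve(); }
```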
Shallow modules in a codebase kind of
look like this, where you have a ton of
different tiny little blobs that the AI
has to walk through and navigate. And
this is really hard for the AI to
explore actually. And so often what
you'll see is if you have a codebase
like this, which AI is really good at
creating code bases like this is that
you'll have a situation where AI doesn't
understand what your code is doing. It
will attempt to explore the code, but
because it's poorly laid out, filled
with shallow modules, it doesn't maybe
get to the right module in time or
doesn't understand all the dependencies,
all that stuff. It doesn't understand
your code. And so what does a codebase
full of deep modules look like? Well, it
looks like this
where it's the same code, but it's just
structured inside boundaries where you
have these interfaces on the top.
And these interfaces, you should
probably have a lot of control over them
and design them really well. Otherwise,
you know, AI might mess up the design.
But the implementation, you can kind of
leave that to the AI a bit. So, how do
you turn a codebase that looks like this
into a codebase that looks like that?
Well, I've got a skill for that. Improve
codebase architecture. Turns out it's quite complicated to do this, but it's a set of steps that you can reusably do again and again. You
just sort of explore the codebase, look
for opportunities where there's code
that's kind of um related, and wrap all
of that in a deep module.
And this is a testable codebase because
the boundaries around this code are so
so simple. You test at the interface,
you verify using that interface and
you're good to go. And so this is a
codebase that rewards TDD.
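A small hedged sketch of what testing at that boundary might look like, using vitest-style assertions; the createInvoiceService factory and "./invoices" path carry over from the earlier invoicing sketch and are mine, not the speaker's.

```typescript
import { describe, it, expect } from "vitest";
// Test against the deep module's public interface only; the implementation
// behind createInvoice() can be delegated to the AI and refactored freely.
import { createInvoiceService } from "./invoices";

describe("InvoiceService", () => {
  it("totals line items into a single invoice", async () => {
    const service = createInvoiceService();
    const invoice = await service.createInvoice("cust_1", [
      { description: "Design work", amountCents: 10_000 },
      { description: "Hosting", amountCents: 2_500 },
    ]);
    expect(invoice.totalCents).toBe(12_500);
  });
});
```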
But how about failure mode number six, which is, okay, let's say your
feedback loops are working. Let's say
that things are kicking into gear.
You're able to ship more code than you
ever have before, but your brain can't
keep up, right? Uh, raise your hand if
you felt more tired than you have ever
before in your development career. Yeah,
me too. It's knackering. And I think
that this is a codebase that actually
makes it harder for your brain because
you as well as the AI need to keep all
of that information in your head.
Whereas this, not only is it simpler
for you to read and understand, it also
means you can kind of treat these
modules or these deep modules as gray
boxes.
you can kind of say okay I'm going to
just design the interface but I'm not
going to worry too much or not review
the implementation too much. You can do
this obviously with uh things that are
less critical in your application. Can't
do this with uh you know various things
like finance or whatever but in many
many modules in your app you don't need
to think about the implementation too
much as long as you have a testable
boundary outside the module and as long
as you understand its purpose and can
design it from the outside. I have found
this has really saved my brain because I
can just go okay the AI I'll let you
handle what's inside the big blob. I'm
just going to test from the outside and
verify it. So that's tip number five.
Design the interface delegate the
implementation.
But this means that whenever we're
touching the code, whenever we're
planning stuff, we need to think about
and be aware of the modules in our
application. We need to know that map
really well. It needs to be part of our
ubiquitous language. We need to build it
into our planning skills as well. So in my write-a-PRD skill, inside the PRD I'm specific about the module changes, and the interfaces inside those modules, and how they're being modified. I'm thinking
about them all the time. And this comes
from Kent Beck. Invest in the design of
the system every day. And this is the core of it, right? Because with specs-to-code, we are not investing in the design of the system. We are divesting from it.
We're getting rid of that. Whereas this
I think is absolutely key.
And so code is not cheap. That's the
message I want you to take away. Code is
important.
And if we think about AI as a really
great on the ground programmer, a kind
of tactical programmer, a sergeant on
the ground making the code changes, you
need someone above that. You need
someone thinking on the strategic level
and that's you. And that requires
software fundamental skills that we've
been using for 20 years for longer.
Now, if you are interested in any of the skills I put up here, they're in the GitHub repo, Matt Pocock skills. And if you're
interested in the training that I do or
uh any free stuff, I'm on YouTube, I'm
on Twitter, but I'm also at aihero.dev
where I have a newsletter that you can
check out. Thank you so much. I hope
that this gives you confidence in this
new AI age that you can actually make a
good impact. Thank you. Our
next presenter created PartyKit, the
open-source tool for realtime
multiplayer apps. For his day job, he
builds AI agents at Cloudflare. Please
join me in welcoming to the stage Sunil Pai.
Let me
uh 20 minutes to the pub.
Uh hi, uh my name is Sunil Pai. Uh I work
at Cloudflare. Uh I build agents over
there uh for the agents SDK. I'm trying
very hard for this not to be a
Cloudflare talk, but I think we are on
the sponsor board. So that's nice. Uh
this is a talk about something we call
code mode. Uh I've been wearing the hat.
Uh and uh there's some prior art to it.
We don't claim to have invented it, but
this is a talk about the implications of
something new that we we're discovering.
So um you guys have built uh AI
applications and tool calling gets weird
at scale. When it's just a couple of
tools and very short runs, it's fine.
But the moment you start stuffing in uh
your Google services, your Jira, your
wiki, etc. And you're at hundreds of tools filling up the context, and it starts breaking. Um
and the composition is weird. And
there's this back and forth that you
have to do with uh the model that's
really slow.
Uh we decided to take a different tack.
Instead of doing this JSON back and
forth thing, we asked the model to
generate code, usually JavaScript that
we could run against an environment.
Uh and some of the benefits seem a
little obvious to us. Uh with code you
get a typed API, you can do type
checking. There are syntax errors. Uh
models are trained on gigabytes if not
terabytes of data already in the
training set. Uh and instead of doing
this back and forth, you could write
code that executes it all in one run
just one execution.
So uh so this is what I mean like there
are uh fundamental capabilities of code.
You're able to do looping. You're able
to hold state. Uh you're doing
sequencing, paralleliz parallelization,
things that you would normally do with
code anyway as an engineer.
So the first place we applied this uh my
colleague Matt Kerry who's actually
going to be speaking about this a little
more tomorrow. You should watch his
talk. Uh the Cloudflare API surface is
about 2600 API endpoints. If we exposed
a tool for every single one of them,
it's about 1.2 million tokens in your first call. Like, it just blows up. There's
no way to create an MCP server for the
entire Cloudflare API surface. And he
had a very clever idea where he exposes
just two tool calls
uh search and execute. Both of these endpoints accept code as an input, literally a string of code. For search, the input to the function that you pass in is the entire OpenAPI JSON spec. And once it does that, execute gives you a whole bunch of functions that you can call against the things that you found. And it reduced that 1.2 million token thing down to a thousand tokens. Kind of unheard of. I think it's like a 99.9% reduction; roughly the shape sketched below. Uh this is going to be scary.
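A rough TypeScript sketch of that two-tool shape, not the actual Cloudflare implementation; the tool names, runCode helper, and bindings are all assumptions of mine:

```typescript
// Hypothetical sketch of a "code mode" tool surface: instead of thousands of
// tools, expose just two, each of which takes a string of code to run.
interface CodeModeTools {
  // The code receives the full OpenAPI spec and returns the endpoints it needs.
  search(code: string): Promise<unknown>;
  // The code receives typed functions bound to the selected endpoints and runs once.
  execute(code: string): Promise<unknown>;
}

// Assumed sandbox runner: evaluates model-generated code with explicit inputs only.
declare function runCode(code: string, bindings: Record<string, unknown>): Promise<unknown>;
declare const openApiSpec: unknown;
declare function buildApiBindings(spec: unknown): Record<string, (...args: unknown[]) => Promise<unknown>>;

const tools: CodeModeTools = {
  search: (code) => runCode(code, { spec: openApiSpec }),
  execute: (code) => runCode(code, buildApiBindings(openApiSpec)),
};
```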
I actually have a live demo of this, and uh demos don't usually go well for me on stage, but the point being that we
were able to take a wide super wide API
surface and make it incredibly fast.
Uh the prompt itself can be uh fairly
generic. So I should have kicked up the
font size on this one. The prompt here
is, as a customer you come in and say, we are getting DDoSed. I want you to find every offending IP that's attacking us and block them,
in a moment of panic when your website
is going down. You don't have the time
to do menu diving. Uh the Cloudflare
dashboard is famously a little
cumbersome to handle. Uh and you just
want the thing done and you can't even
get an AE. It's like 3:00 in the
morning.
Uh with a regular MCP thing, and this isn't even talking about stuffing 1.2 million tokens, it would be about eight round trips to do each of those API calls. Instead, the model can generate this string of code, run it immediately right next to the API surface and do it in one shot. And it's just running JavaScript. It's just functions and things that you're exposing on the API surface, something like the sketch below.
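To make that concrete, here is a hedged guess at what such a one-shot, model-generated script could look like; listZones, getTopAttackingIPs, and createIPBlockRule are illustrative function names exposed to the sandbox, not real Cloudflare API bindings:

```typescript
// Illustrative model-generated code: find offending IPs and block them in one execution,
// instead of several JSON tool-call round trips through the model.
declare function listZones(): Promise<{ id: string; name: string }[]>;
declare function getTopAttackingIPs(zoneId: string): Promise<string[]>;
declare function createIPBlockRule(zoneId: string, ip: string): Promise<void>;

export async function blockAttackers(): Promise<string[]> {
  const blocked: string[] = [];
  for (const zone of await listZones()) {
    const ips = await getTopAttackingIPs(zone.id);
    // Block all offending IPs for this zone in parallel.
    await Promise.all(ips.map((ip) => createIPBlockRule(zone.id, ip)));
    blocked.push(...ips);
  }
  return blocked;
}
```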
Okay, live demo. This is a demo of our
mythical server. Uh, I hope I'm logged
in because if I'm not, I'll need all of
you to close your eyes while I enter a
password. Let's say I just want to like
list my workers.
Oh, there it is. List my workers. I say
send.
Okay. Okay. And there's no password
required. Okay, fine. That's fine. Okay.
I give it only readonly access for this
demo. Uh, do the thing. Yes. Allow.
Sure. Whatever.
Nice. Okay, it comes back and uh you'll
see it'll start executing tool calls. I
should be able to open this up. It has
sent saying, "Hey, find me all API
endpoints that just say the words list
workers or something like that." Uh it
then runs code
uh which hey, yeah, it's like one single
request for the API endpoint to get all
the workers. Uh it must have received a
whole bunch of these. It's actually
going through JavaScript errors. No,
this is going to be fun to see if it
actually succeeds.
Yikes.
Oh, is it trying to do it like per... It's trying to paginate through the thing.
Assume that this worked anyway and I'll
keep talking while it does this. Uh,
love that this is happening to me on
stage because I did test it 10 times
before coming on. Uh, I need to pay for
the Mythos uh model to make this work
accurately.
Uh by the way you can actually see it is
actually like listing workers over here.
It might just be having trouble uh
rendering it over here. Um the point
being uh we are able to shrink that
down. Now if this was a talk about
optimizing MCP servers, I would be done and dusted. I'd be like, hey, you should try this, and trust me, it works when you're not staring at it and have 800 people looking at you on the stage. But
it did give us an idea that there's
something deeper going on here. the
ability to like run this code and uh it
feels like there's a new way of
interacting with systems with LLMs.
Um
here's what I think like everyone here
is a programmer and I give you a problem
statement like you have 200 photos on
your desktop. I need you to categorize
and rename them. First thing you do is
you you're going to open up an ID.
You're going to write a little script.
Maybe you're going to pass every image
to a vision model now because you get a
nice caption for it. Uh rename it and
you're done and dusted. That is how you
interact with systems. Uh my mother's
not going to do this. Her options are to
well call me up or usually like buy an
app either desktop phone and no one's
made an app that does exactly just that. There's going to be like lowest-common-denominator apps for photo management, and it's $7 a month, and for some reason you have to install a daemon which is stealing your crypto or some such stuff.
Uh and there's been this dichotomy and
it's fine; like, until now this has been an acceptable tradeoff that non-technical people will have custom-made interfaces built for their needs and desires.
LLMs are breaking this boundary. Every human being on the planet now has access to a buddy that can spit out code that can interact with systems. It takes a line like "rename these files by date and location" and generates code and can run it on whatever system you expose to it. Uh I
say executed safely here and that's the
bit that I do want to talk about in a
minute. The other example I have, so
this is Kenton. Kenton is the creator of Cloudflare Workers. Famously, he does the work and I like taking credit for his work. This is our relationship in the company. So he had a thread a little while ago where he's built a little vibe coding environment for himself, because no one else does that in the world right now. So unique, build your own little vibe coding thing. Uh the thing he asked
it to generate was a canvas, one of these tldraw / Excalidraw style canvases. Uh, and it did a little
canvas with little brushes and colors
and the first thing Kenton did was draw
a tic-tac-toe board on it with a little
X in the corner. This is the finished
state and I'll get to that in a second.
He did that
and uh what he told the model then is I
want you to play tic-tac-toe with me.
The model, as you can guess, it started
generating a tic-tac-toe app.
Okay,
Kenton stopped it immediately. He's
like, "No,
you have access to the entire state of
the system." And the state of the system
here is an array of strokes, you know,
like just a whole bunch of points, grid
line, grid line, xstroke, etc. It said,
"Inspect that and play it with me."
Uh, immediately the model started. It
output the state into its own context
and it's like I recognize what this
looks like. It looks like a tic-tac-toe board and I can see that you put an X in
the top left. Let me draw a perfect
circle in the middle of the app. To be
clear, there is no tic-tac-toe code
anywhere in this system. The the
emergent behavior is that the model has
like sure I now know how to interact
with the system with a set of strokes.
Uh, also it lost uh, by the way, it lost
the game and then when we saw the
reasoning traces, we noticed that Opus
let Kenton win, which is a whole other
weird area of alignment we're not
talking about. Anyway, so this actually
generated a lot of conversation
internally and that's why like this talk
is a little weird. It's a little woowoo.
I'm not even sure where we're going and
I want to like spread the idea to you
and have you folks like integrate it. So
the the phrase we have started using is
it stopped generating a program and it
instead started inhabiting the state
machine. Uh there's a ghost in the shell
reference here for anyone who's over the
age of 40. You need ibuprofen. Uh you
should go back home. Uh but no like it
was a very strange thing to for us not
to have a separate app generation stage
that you then like interact with. That
is entirely the part of the thing. So
what does this new software architecture
look like?
uh everyone's building what they call a
harness. Uh it's because over the last 3
to six months, everyone has realized
that these coding agents are great
general purpose computing machines. It's why they're running Claude Code, no, they're running Pi on a Mac Mini, which is the wrong machine for this. By the
way, you don't have to spend $400 for a
thing that makes API calls. It's been
driving me mad. If you check, all the
secondhand prices of Mac minis have like
shot up. I got one before it, but I
bought it because I'm special that way.
Uh, everyone's building this harness, and the architecture of the harness is not just that it can generate code, but that it has a safe space to execute this code, into which capabilities are exposed.
Uh and there are some
attributes to this sandbox. We're
calling it a sandbox which is again
another completely overloaded term and I
have friends in the industry. Everyone's
building a different kind of sandbox. uh
we have a sandbox SDK which uses
containers and VMs but that's not even
what I'm talking about right now. Um
there are some capabilities to it unlike
a container which comes with all sorts
of features that you surround with
security. You know you do a bunch of
things from the outside. You start with
something that has no capabilities. The
only thing it can do is execute code. It
can't do fetches. There's no exposed
APIs, no nothing. And then you grant
capabilities to it explicitly. Uh we have something called dynamic workers. I told you it's not really a Cloudflare talk. Someone else builds something
better. If you think it's better, it's
fine. Uh, but this is what we use. We
use V8 isolates because they start up
really really quickly and it's about 10
years of security hardening. Uh, it's in
our DNA. We we care a lot about that.
Anyway, so you start exposing capabilities as APIs. And we also can control all outgoing fetches and any
network connections. In fact, the
default way we recommend you use this is
no outgoing fetches, only APIs. It has
to be fast and you need absolute full
observability into it. You need to know
why last Tuesday it made a trade for
$2.3 million for I don't know man like
llama poop or something right? You need
to go back to that code. You need
absolute observability on these systems.
It can be V8 isolates like we use. Uh you could use, I don't know, WebAssembly, a custom JavaScript interpreter. Uh that's not the main
story here. You just want something
that's able to execute, that you're able
to expose capabilities to and run really
quickly.
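A minimal TypeScript sketch of that capability-based shape, under my own assumptions; the Sandbox interface, createSandbox, and the workers binding are illustrative, not the Cloudflare SDK:

```typescript
// Illustrative capability-based sandbox: it starts with nothing, and every
// ability the generated code gets is granted explicitly from the outside.
interface Sandbox {
  // Run a string of model-generated code with only the named capabilities in scope.
  run(code: string, capabilities: Record<string, unknown>): Promise<unknown>;
}

declare function createSandbox(options: { allowNetwork: boolean }): Sandbox;
declare const workersApi: { list(): Promise<string[]> };

async function main() {
  const sandbox = createSandbox({ allowNetwork: false }); // no outgoing fetches by default
  // Grant exactly one capability: a read-only workers API. Nothing else is reachable.
  const result = await sandbox.run("return await workers.list();", { workers: workersApi });
  console.log(result);
}
```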
From here, you can start getting really
ambitious. The example that I showed you
was a one-off: take some code, run it on an API, expand.
Now, what if you could generate long-running workflows that run for days, months, years? What if each of those instances has some state that it can carry with it through its lifetime? What if, in this world of generative UI, you can start generating perfectly custom UIs for every single user that you have? Everyone who does
e-commerce knows this problem. The more
popular you get, the more UI becomes
this bland thing that has to work for
every single user. And then you bring in
the ML people and, like, oh, what if we change the button color this way if it's somebody else? No,
you can go absolutely custom. So, uh, I like the fact that I got Opus to generate generative UI for a slide where I'm making a point about generative UI, and it still looks a little bit like... Uh, but the idea is, let me talk about that e-commerce case: you have context about everything about
the user, the things they like, the
orders they have in their cart, the
things that might be making them mad.
You can surface these things as actions.
The UI doesn't have to be a blank chat
box. Though honestly blank chat box
e-commerce might be a lot of fun. Uh
here I have two different use cases. In
the first one it's uh I need to return
these shoes and find something similar
under $100. If the product engineers have not implemented this flow, it's going to kind of suck, but you can generate something on the fly. Versus: what is happening with my delayed order?
Point being, we are now in a world where
we can generate completely different
programs backed by a system that you
built on your back end for every single
user. It's a new kind of software we're
building. And this harness idea isn't
just built into the product. A lot of
people are finding power by running the
harness closer to the user simply
because then they get to start mashing
up all their different services. This is
an anti-Cloudflare talk at this point.
And I'm like, you should be running the
software on your iPhone, like not so
much on our servers. Please run it on
our servers. Uh, but you, but there you
start getting to stitch together
different systems in this safe
environment. And you get to do it on a
taskbytask basis.
Um, I put this in here because I'm a
React programmer and I don't want to
freak out the React people by saying no
one really wants to build UI anymore.
But really it's a hearkening back to
rethinking everything that we have thought about UI for this new age. I
keep thinking about it as part of the tech tree we have not really explored for 30 years, because a safe eval wasn't around. But now we have a safe eval and we have
these things that generate code for you.
But you do need to be in a place where
you understand that your next billion
users are these little robots that are generating code for you. To be clear, your customers are still humans, but these are the things interacting with your systems. Uh if you
really love your users, you need to find
out where they hang out. And they don't
hang out in the pub. They hang out in
registries. They dream in types and
syntax errors. You know, uh you need to
be thinking about what is the developer
experience for these agents. This is
something a bunch of companies are
already doing really well by the way.
You know, docs which are markdown, uh
errors that let the agent know what to
do next, uh discoverability via search.
The big one that I do want to talk that
I want you to
embed in your head I guess is this idea
of capability based security.
This isn't even a JavaScript talk. It can be in Python. It can be in Wasm. Uh I hope it brings a resurgence of Lisp. It's how I kind of learned how this works. It kind of breaks your brain. Uh but the attributes are still very much the same: events, sandboxing, capability-based security, embeddable so that it's really fast to start up and run ephemerally. Uh React programmers, well, UI programmers, simply because they've been so close to users, I suspect will do particularly well here, and that feels really good to me by the way.
I feel happy about it. So to end
for the longest time programmers like
us, we got code. We had infinite power
to interact with any system that we
could, and complain about it on Twitter because our documentation doesn't have the right CSS or something. JavaScript programmers are super entitled, by the way. Uh, everyone else got buttons and forms. That distinction is breaking. In a world like this, you need to let the code do
the talking. The code is the thing that
interacts with all your systems. Uh,
come talk to me about it at the pub.
Like this is like it feels like it's
opening up a whole new area of research
for us. uh and we have a lot of ideas
and I get to finish my talk and the day
with six seconds left. How good is that?
Thank you very much. Appreciate it.
>> Ladies and gentlemen, please welcome
back to the stage Phil Hawksworth.
>> Okay.
You thought you'd seen the last of me
and you almost have. Um that's it.
That's uh that's day one of the
conference done and dusted. I hope
you've enjoyed it just as much as I
have. I think it's been amazing. Um
tomorrow I'm not MCing for you. I'm out
of your hair. You have the rather
wonderful Tejas Kumar uh who's going to
be um MCing. He's fantastic. You're in
safe hands. Cherish him. Uh he's great.
Um, a couple of little things that you
might want to know about before you jet
off. Tomorrow we're going to be
starting. I think it's the same routine
as today. 8:00 a.m. out there, there'll
be breakfast and nibbles and what have
you. And 9:00 a.m. we'll be getting started in here, in the same safe hands, as I say, of Tejas. Um, and
then we'll be we'll be off and round
again. I would like to say before I I I
say goodbye, uh, what a huge day it's
been. So much incredible content. Should
we have a round of applause for all the
incredible speakers you've seen today?
My, um, my brain is quite full now. I've been challenged by a lot of things. I've been really inspired by a lot of things. The last two talks, uh,
just because they're fresh in my mind
really really have landed very nicely
for me. They've been very useful. So I
hope you've taken away something uh
incredible. I hope you've had good
interesting chats with your uh fellow
attendees and people at the stalls and the speakers and all the like. There is a chance to do more of that now because we're not getting kicked out. We get to go and enjoy the space there. There are refreshments. I hear whispers there might be beer for those
that like beer. Other things are
available as well. Um so uh go and uh go
and check that out there. I think it's
until 8:00 we have the space and we can
uh continue our conversation. Uh, also
just keep in mind there are various side events around; uh, I mean, I know they've been happening already and there's more
tonight and I think there might be some
more tomorrow. Keep an eye on the
website for details about side events.
The the various sponsors and partners
uh, have put those on. I think typically
they're free but you usually have to
register. So keep an eye out for those
because there might be other things that
you might want to get involved with. Um,
okay. I think that's it from me. I hope
you've had a good time today. I hope to
talk to you out there in a few minutes.
Enjoy your day tomorrow. Thanks very
much. Thanks.