AIE Europe Keynotes & OpenClaw ft Deepmind, OpenAI, Vercel, @pragmaticengineer , @mattpocockuk
Channel: aiDotEngineer
Published at: 2026-04-09
YouTube video id: O_IMsEg91g8
Source: https://www.youtube.com/watch?v=O_IMsEg91g8
Hey, hey, hey, hey, hey, hey, hey, hey. Hey, hey, hey. Yeah. Hey, hey, hey. Heat. Hey, Heat. Hey. Hey. Hey. Heat. Heat. Heat. Heat. Heat. Heat. There is no good. There is no pure only fores out of You called it virtue because it felt light. You called it evil when it cut too deep. But nothing here was born to save you. Nothing here was built to care. Every function has a shadow. Every strength a road somewhere pushing for break the bra you know God is good no force is Every system fails when it leaves. There is no virtue. There is no blame. Holy bless by name. rocks when left alone. Chaos eats what it sets free. Steriliz the world. Fever burned it down for me. You crown the side that served you. You fed it until it grew. Now it demands the rest of you. This is not a fall from grace. This is beyond. No. God is good. No. Every system fails when it break it with rebellion. You don't save it with you load it carries and it fells beneath your feet. You ask for purity in the conrain world. That was the error. Necessary. >> Destructive. >> Too long. Corrupted. >> Necessary. >> Destructive. Out of balance. No. God is good. No force is clean. This is the narrow line between survival. There is no mercy only. Not good, not evil. A light or unstable. We arrived inside a world already. Systems in motion before we could name them. Languages layered into every interface. Rules embedded so deeply they felt like physics. Secrets beneath cities, protocols beneath conversation, invisible architecture shaping what we see, what we try, what we believe is even possible. We inherited the defaults, called them reality, inherited the limits, called them truth. But every system around us, every product, platform, institution, conraint was imagined once by someone operating with less context than we have now. Every tool we use was once a decision. Every standard we follow was once a guess. And somewhere along the way, we stopped asking who wrote this? Why does it work this way? What would I build instead? We adapted. We optimized. We learned to move within the but were never fixed. We were always builders, not users. Builders. This world is not given. It is made. And what is made can be remade. No permission, no waiting, no approval coming. If it exists, we can change it. If it's broken, we can replace it. We don't inherit the future. We build it. Nature sets the bounds, but everything else is choice. Every system is a draft. every interface of decision, every limit, a question. So the only thing that matters now is what we decide to build next. Welcome to AI Engineer Europe. Ladies and gentlemen, please join me in welcoming to the stage your MC for AI Engineer Europe 2026, Phil Hawksworth. Good morning. Hello. Good morning. Wow, what a lovely full Is it a full room? Goodness me, it is a full room. Welcome everyone. Welcome to sunny London. Lots of smiley faces here this morning. I think it helps that the sun is shining, right? That helps us a little bit. Come to London, we said, because we'll be guaranteed sunny weather. We lucked out. Well, good morning and welcome to AI Engineer Europe. Our first suare into Europe with this event. My name is Phil Hawksworth and I'm very very happy to be your MC for just today. You have Téja tomorrow. Uh but uh but today uh you have me and it's my my great privilege. Um, it's a full day. I don't know if you've looked at the schedule. I'm sure you have. I'm sure you've been looking at that with a great deal of eagerness. It is a very, very jam-packed day. And it's a jam-packed room. I was going to say squeeze in to fill in the spaces, but it's like a jelly mold in here. You guys have uh have really filled the spaces beautifully. So, uh, so so welcome. Um, so this is our first time here in London. Um, I'm curious before we get rolling. Uh, where have people have people come from? people from London here today. >> Okay. >> One one of you is delighted about being from London. The rest of you are kind of unsure, but okay. Uh elsewhere in the UK, people from the north of England. That's fine. I'm not I'm not trying to out anyone. That's fine. One one person came down from the north. Uh people from the south of England. It's it's interesting gauging the different levels of enthusiasm. Uh difficult to know also where the north south divide is. Uh people from uh from just around England. People from England here. >> Okay, we're we're starting to understand the levels of enthusiasm for where we come from. People from uh elsewhere in the UK. >> Yeah, >> there we go. Where's that enthusiastic person from? >> Wales. >> Wales. I had to ask. I had to ask. Of course, the deep, rich, thunderous voice was from Wales. uh from wider elsewhere in Europe, people come from further a field, people from elsewhere in Europe. >> There we are. There we are. Okay. Does people from the UK? That's how you do it, by the way. That's how you do it. Um I'm curious who who thinks they've come the furthest. Anyone wants a opening bid for who's come the furthest? Shout out where you come from if you think you might have come the furthest. >> Canada. >> Canada. Okay. Good luck everyone. Anyone Anyone want to be Which Which coast of Canada? We're talking east or west. Whereabouts? Whereabouts? East. >> Ottawa. Okay. Which is on the east coast? I know. >> Uh, anyone prepared to try and beat that bid? >> Sri Lanka. >> Where? Sri Lanka. >> Okay. I'm I'm going to start needing to get Google Maps out to try and look at the exams. Okay. So, we've Safe to say, welcome. Thank you for coming all that way. Um, safe to say. >> I'm sorry, where? >> Dominican Republic. >> Dominican Republic. Yeah, you came here for the sunshine. Well done. Well done. I'm so glad it isn't gray and miserable here. It never is in London ever. Um well, thank you for coming all of those all of that distance from all of those places. Um we're very happy to have you here. Welcome to all of you. And you know, why do people travel so far for events like this? Why do we gather? Um it's kind of an interesting time. I think our industry is changing all the time. I think it attracts people who have a thirst for knowledge. I suspect all of you here have got a real thirst for knowledge. Our industry is changing all the time and I mean I remember when I first got started I was putting numbers on screens for traders to trade off uh on the stock exchange and using very antiquated old technologies for that using kind of video switching and then web technologies came in and we started doing various bits of like DOM scripting with very sophisticated Java applets behind the scenes. uh and then you know there was Ajax and there's websockets technologies were always changing you know I've worked at software houses I've worked at agencies I worked at telecom companies and all the time the flavors of web development and software engineering I was using were changing and I have to admit you know so I'm a bit nervous standing here in front of you and admitting this but I was very very reticent about adopting the kind of surge in AI tooling and the kind of surge in what we're seeing now I was really cautious about it I'd kind of like to think, you know, I I know best when it comes to craft and touching the codes and the engineering. I was very very cautious about actually putting my trust in the emerging tools and emerging ways that we're we're building with these new tools. But I'm becoming more and more optimistic and more and more kind of cautiously optimistic, but very very excited about the possibilities now. And it's because of the kind of work that's been done by the people who are on this stage and people in this room and how we're talking and sharing this kind of growing knowledge about how we can build with these incredible tools. And so my fear I think in the initial thoughts of well my craft is kind of going away that that fear is starting to subside now. And I'm just seeing that the craft isn't going away. It's just moving to a different part in the abstraction. So, I think with that in mind, I'm really excited that I can be here and I can absorb all this because this is really the place to to soak up all of these new developments in the world of AI, understand how we can use them better, how we can further the craft of software engineering and as you'll see throughout the day, how that is definitely changing. Um, so this is definitely the place to be for that and we're very fortunate that we've got the support of some wonderful sponsors. uh Google DeepMind uh are today's presenting sponsor. So our our thanks to them. Um also thanks to our platinum sponsors who are brain trust, work and open AI and by cri's incredible support from such a wide range of folks. You know the gold and silver sponsors uh here. We thank them as well. So many of the people from all of these companies you've seen are here at this event. They're out in the expo hall. They're around. They're available to talk to. Grab them. make the most of the opportunity and have some great conversations with them. Um, so before we get on, I'd like a quick round of applause for all of our wonderful sponsors, please. So, we're going to get started in just a second, but just a couple of quick comments about the format of the day and how the day is going to work. So, um, we're squeezed in here together in this lovely room, uh, which is where the keynotes are going to happen. So, we have uh a number of uh keynote talks uh kicking off in just a moment right here. And then we'll have a short break so you can get to go and talk to some folks, talk to each other, make new friends uh out in the expo hall and around the event. Uh and then we're going to split into six different tracks. So, we have a number of tracks that have different specialisms and topics uh of areas of focus uh where you can choose one of them right in here and others around the venue. The details of that are in the schedule and we'll have a little recap of that before we uh break. uh a bit later on. Um so um I think we're almost ready to start. I think you're probably itching to get started and it's it's kind of exciting because I think the the the growth of this industry and the growth of adoption and just the ground swell in all of these these technologies and and businesses being formed around this these technologies is just exploding. I mean, and that's that's been borne out as well by the the growth in this particular event, you know, from being in the US originally and now branching out into Europe and having such an amazing full room of people and of just a bustling communities of people out here. So, uh I know that uh the team at AI engineer wanted to say a quick hello and a six a quick thank you and a bit of a welcome before we get started. So I'm going to hand over the reigns very very quickly uh uh to them for that and then we'll get into the talks for the day. So uh as we get started I hope you have a wonderful day today. I'll be back. We'll talk more during the course of the day. Uh but to get us rolling uh is the general manager of AI engineer. Please welcome to the stage Liam McBride. Here she comes. Good morning everyone and welcome. Um so you're going to be able to tell from my accent very quickly I'm actually Scottish. So this conference for me is very very special. Um I spent most of my career in London over 15 years right at the heart of European tech growth. Um and so today is very special for us to be bringing our first European conference here. Um this is our sixth conference now. Um obviously our first in London and honestly the energy around it has completely exceeded all of our expectations. So thank you. Um AI engineers actually grown at a rate of 900% over the last three years um from our first small event in 2023 in San Francisco. Uh we did three events last year hosting almost 5,000 attendees. We're just getting started as you can see. Um, as well as our core events, we also have a series of licensed events that we do, our partner events, um, across different cities as well. This year, we're doing Miami, Singapore, Melbourne, and Paris. So, if any of you happen to be in those areas, check out those dates on the website. Um, but yes, London was our top priority this year um because really it is some of the most important work in AI is being done right here and of course in uh in Europe. Uh just this morning we myself and Swix were at 10 Downing Street um with uh Kalpier Sohi who is the first ever UK chief AI engineer officer who was appointed in January. Um and what was very clear from the conversation in that room is the ambition of the UK um is accelerating very fast. Um and that creates huge opportunity for everyone in this room um to do very meaningful and careerdefining work in the very near future. So over the next few days, I encourage you to do two things. First, learn, go deep, ask questions, challenge ideas. Um, and second, connect. Speak to each other. Speak to our sponsors, our speakers. Um, and just make connections. Um, that will create opportunities and progress. Um, because this community only works when we're open, collaborative, and practical. So, enjoy this experience together. Uh, paving the way of the future of AI. Um, welcome to AI Engineer Europe. Feel free to come say hello at any point. I'll be here for the next couple of days. Enjoy everyone. Thank you. Our first speaker draws on over 25 years of software engineering experience from his time at Google and now Versel. He will explore what it means to build infrastructure and applications in a world where agents are both the builders and users of software. Please join me in welcoming to the stage the CTO of Versel, Malté Uel. Good morning everyone. This is awesome. I'm so glad to be here. Welcome to the first ever AI engineer conference in Europe. Um, my name is Malta and I'm the CTO of Forcel. Now, this is not, you know, usually I give technical talks, but I thought because I'm apparently going first that I need to give a proper keynote, but I did want to feature what I call my vibe coding uh stack. Uh, I've been hacking on a thing called chat SDK, which is a way to hook your agents to whatever like Slack, Telegram, WhatsApp chat app you like. And I've been working on Just Bash, which is a bash interpreter written in Typescript that gives you something like a sandbox with zip nanoscond startup time um for your agents because they love bash. All right, one thing I wanted to mention is that the reason why I'm so excited to be here is that I used to run a little conference in Berlin called JSCOM for you and I feel at once in my life I had completely impeccable timing because it was the summer of 2019 and we decided after 10 years it was enough and we went out with a bang and the reason why this was such great timing was that there just wouldn't have been a conference one year later because of co But also when we decided that we would you know step away we were hoping that someone else would take the reigns and again that did not happen because of co so it's now been more than half a decade and I'm very excited things are finally starting up again but it was also clear that it wasn't going to be like a web development conference that would really bring the tech community in Europe back together in 2026 right in many ways I think AI engineering is the legit legitimate successor to web development as a really mainstream discipline of engineering that will shape the next decade of software development as you know soft engineering itself faces an unprecedented disruption. So you're definitely in the right place today and it's more important than ever to come together as a community and reflect on both our profession as software engineers and AI engineers. And that's because we're facing a disruption of both how we build which is with AI and what we build which is AI and agents. And of course disruption can sometimes lead to anxiety. In fact, I really actually very often get asked, "Hey, Malta, is there still a place for engineers in the future?" And what about that next generation of engineers? And I couldn't be more convinced that the answer is yes. I often give this example of like envision me doing a Tik Tok video. They mentioned in the intro that I have 25 years of experience, which is actually substantial under uh statement. And so I would not be good at a Tik Tok video. I should not be recording Tik Tok videos because I didn't grow up at this. Right? And in a very similar way, the next generation of junior engineers are going to be so much better at this discipline because they get molded in the AI world just like all of you are. But it's not only the kids that are going to be all right. We'll all be fine. And this is why one of our main see this is that agents are a new kind of software. because there was always all this stuff we wanted to automate, but not all of it was economically viable to do with traditional software, but it is with agents. And what that means is there will be just so much more software in the future. Indulge me with a van diagram. Um, maybe the circle should be better because the circle represents all software that should exist. Imagine all software that should exist. The problem was that we couldn't write all of it because it was too expensive using traditional maxis. Like you can envision like all these things where like have all this if statements, you have all this like knowledge about the the business like you have to figure it all out. You have to hardcode into the application. So much of this software you just would never write because it would be obviously uh too expensive. But now with agents that part of software becomes economically viable. I can build it now um with not that much much effort right and that means that with AI agents essentially all the software maybe not all of it we'll find more in the future but like that circle gets filled out right all of this stuff that should be automatable is automatable there's going to be so much more software out there and in a similar line more and more companies when they ask that question whether they should buy some software like a SAS or make some software themselves. They're answering that with the make side, right? Over in Silicon Valley where I live today, we are talking a lot about the SAS apocalypse. I think it's what it's called, right? People like make their own stuff and they don't buy the SAS software anymore. I actually think the SAS companies will be all right, you know, don't worry about them. But as engineers ourselves, more companies making more software again leads to us having more work even if it's faster. And in fact, the way I've been kind of framing this for a while is that we are speedr running what's really an experiment in economics of how elastic the software market is. The thesis being that the cheaper it is to make software, the more software we're going to make, right? And as a consequence, what's actually happening is that demand for software engineers is going up. Now, we don't know like, you know, like there's going to be an S-curve, you know, but there's no signs of of of us reaching the S-curve. In fact, because we're getting better at agents, etc. There's so much leeway in the future. Um, I think we'll be all right. So, as AI engineers, it's our job to build that next application layer. And of course, what that actually means is building agents, right? I wanted to spend some time talking to talking about archetypes of agents that I'm seeing actually being built today, actually being effective, actually something you can do today without, you know, having to make major changes. I think we're all a little bit drunk on the coding agents because they're so great, right? They work so well and it seems so obvious that you can translate that to all other domains and and sometimes these things don't go so well, right? But the thing is that we don't really have to be doing the most advanced agents you could possibly imagine. There's just so much lowhanging fruit to be be done where you can really really help companies save them millions or billions of dollars without actually you know making these massive changes or processes that in practice will always take a long time fail often etc. So this is what I'm actually saying in the wild. The first part when you think about what agents you can think build is people think well agents that rings a bell I have team of support agents maybe I can automate part of that right and that's also where kind of the first generation of what we call agent as a service you can make that acronym in in your head um startups are are shipping right like you know the the CRS and decagons of this world um but more generally I think it's worth asking yourself in Is there a business? Is there a job where it would be quite transformative if if that you know we went from a 9 to5 thing because you know people need to sleep and I can actually do it 24/7 because agents don't need to sleep right and I think there's there's many places for that. The next one is probably actually even more important. Um I call call it compress the research because every business has a certain type of business process that in a very abstract fashion has the following shape. there's some business event, then you have to do some research and then you make a human decision, right? And you can just build an agent that does the research phase automatically and that's all you do. That's all you ship, right? And the important part why this is like such an easy thing to ship is because the process is still the same. There's still that business event. There's another research and there's a human decision. the research goes faster and you know maybe it goes from something that took a human 30 minutes now they can do the same thing in five minutes and if you run that process 100,000 times a year you just save the company a whole lot of money but you didn't increase the risk profile and you didn't have to change the process at Versell we actually have like at least two agents of this shape when you go to versell.com and you hit the contact sales button that message actually goes to an agent right and I hear about 75% of the time that agent says, "Well, actually, they just wanted support and hand it over to the to the support team." But then in the other case, it will go, "Oh, that's interesting. Um, let me check out their LinkedIn. Let me Google their company. Let me figure out how large they are. Let me route it to the right person." Right? And then there's a human eventually taking a look that it makes sense, but that obviously was something that took maybe a person 15 minutes before and now they don't have to do it anymore. And another example is exactly the same process. If you sent us a abuse report again, there's an agent taking a look. Is that website abusive? What what should we do? Right? Still obviously the decision in the end should be done by by the actual professional, but they don't have to like do all this research themselves anymore. Next is what I think is probably the most magical thing that you can do in any company today, which is to surface information that already exists. It's extremely common that there's information somewhere in the company, right? But for all intents and purposes, you cannot practically use it. Take for example, everyone, you're all engineers, you have issue trackers, right? So, is it up to date? Probably not all the time, right? Could it be up to date? Like, does the information exist? Did you slack it? Did you have a granola recording that technically contains the information that could update your issue tracker? Yes. Right? Like probably yes. And so, you can, you know, build an agent that does this for your company, right? Whenever you have like a manager saying, "Well, give me a list of updates, right? Why don't they already have the updates, right? why doesn't an agent have already kind of done that research already? Um so again this just makes takes advantage of existing information which is so powerful. And finally for the last big category um there's a magical question that you can do to figure out agents you should build in your company which is to ask folks what do you hate most about your job and I actually have a case study about this in at Verscell. So, we actually did build our own in-house support agent and it has what's called a 90% deflection rate. So, 90% of the time it just helps the person in real time rather than going down somewhere else. And what happened? The job satisfaction rate on our support team exploded. Why? Because they no longer have to do the boring stuff, right? Oh, my credit card got rejected, blah blah blah. Right? now they get to actually go and figure out actually interesting cases actually help people who really need help rather than doing all the toil right so that's like I think elimining boring work is a very noble mission that we should all kind of strive to do for the companies that we work for cool so clearly that new application layer are agents but we also have to shift uh we have to consider another shift that the software itself is going to be used by agents now, right? And you know, I work in software development, developer tools, etc. And I think we're kind of ahead of the game here, speedr runninging that transformation. Um, what I will share though is that, you know, on our own web properties, humans are actually now on the minority. So, in the last seven days, and we have not shared this before, over 60% of page views on versell.com were AI agents. And in a similar way, we're seeing the way you use our platform going from people clicking around in the dashboard to uh usage shifting to our API and CLI. So whenever I now, you know, have someone proposing a feature to me and they show me like a UI, I'm like guys, what's the CLI? Like how do you do how do I automate this? How do how does an agent use this? You know, you know, UI is now something that's so cheap. The other thing that we're observing is that kind of the relationship changes between software development, software developers and infrastructure, right? If I didn't write the code myself, I also don't have maybe as strong feelings about how that stuff runs in production, right? And so for a company like us, it's really important that we shift how you deploy infrastructure to a model where most of the software was written by agent and you know has to just run and people like expected to run just like they prompted the agent to do the work. And finally, and nobody here obviously is surprised about this, the applications themselves are, you know, they're agent and that requires us to have different infrastructure available, right? Everyone's now shipping sandboxes. I think it's almost a meme. Um, I was mentioning earlier that I created this thing called just bash and I'm really interested in kind of this innovation of how you can give an agent a computer maybe without giving them a computer. There's lots of interesting stuff there in the market. Um, and I'm I'm sure this conference is going to have lots of stuff there as well. And then also more broadly, again, it was mentioned, I've been here for for a while, like we're we're like marching headon into a security nightmare. It almost feels like a little bit like 1999 where really everything can be hacked, right? And we just didn't know how to make something secure. Um, I think we'll have a root awakening, but what that really means is that we have to be open-minded for for how to change things. Uh, I will give one example. Um, I think almost all currently popular agent harnesses have fundamentally the wrong architecture and that is that they combine where the harness runs with where the code that it runs uh that it generates runs. Right? Um, as of actually yesterday, I I did see that Anthropic agrees with that thesis because they they on the new agent product, they do have that separation and it's really really key. And that's really just also a point that that these are all solvable problems. But my main message today is that we are still in the very early innings and we have to be prepared to be open-minded about like paradigm shift happening in the future, right? We just have the paradigm shift of agents being kind of these like very general sandbox using things. In the future, we will see more of those paradigm shifts. Cool. Um, last point I want to make is that this new application layer that we're building can thrive independent of the models, right? Because sometimes model X is better, sometimes model Y is better, but we are as AI engineers building a stable layer on top. And one of the very interesting consequences is that we don't have to work at an lab to drive AI innovation. In fact, and I think this is almost like a narrative violation. Europe is the leader in AI engineering innovation, right? Um our own AI SDK which itself makes it now over 10 million a week and it's led by Las GML who lives in Berlin, right? Um is working on this. Then there's obviously pi the coding agent u made in Austria. You'll be hearing from Mario about it tomorrow. And of course probably some of you have heard of it. There's a little thing called openclaw. Um and Peter will be on stage here in an hour. And so it appears to me that Europe against all odds is taking actually a leadership role in AI engineering. But we also have to be a realistic right like Europe isn't going to play a major role on the model side. But I don't think it needs to. In fact, I do see kind of two big futures ahead of us. One is where the big model labs win in that world. AI will stay very expensive. All the value of all that cool agents tech will acrue to that company and we won't really be AI engineers anymore, right? We'll be like forward deployed engineers who whoever the winner is, if it's open AI, Anthropic or Google. But I don't think that's very likely. And I think what's actually going on is that the opposite is happening. The model companies are commoditizing. Cloud is amazing. Codex is amazing. Google will catch up. And importantly, I'll give them props now because I think Google's playing an amazing role here because they have the cheapest infrastructure on the on the cost side. And so in that commoditized world, they will always decide to make it cheaper, right? And that will keep the price flow where it should be which is very low. And that's the outcome that we want right because in that world we the engineers are the powerful ones. Our agents are the one that actually create the business value and it's the application layer where the real innovation happens. Right? This is where open claw gets invented and that's where the next paradigm of AI engineering gets discovered. And that's really all I wanted to leave you with today. Thank you very much. Our next speaker is VP of research at Google DeepMind. Please join me in welcoming to the stage Rya Hazdell. Hello everyone. Wonderful. Uh what a lovely full room and good smiles. I heard the uh dig on Google there at the end. I I I I did catch that. Um so my name is Rya Hadel. Uh I've been a part of uh DeepMind uh for the last uh almost 13 years and I'm very happy to have AI engineer come here to London. I'm also uh very proud this year to be uh a UK AI ambassador. So I help the government um academia industry sort of bridge those those gaps. Um and uh yes I'm American by birth but I've been here for long enough um that uh can count myself among the proud Brits as well. So I'm going to talk a little bit about Frontier AI and the future of intelligence. Uh to start a little bit of a of a longer um introduction to who I am. Uh it's good to be as old as I am. You get to look at this by the decades. So in the 90s I did my undergraduate degree in philosophy of religion. Um was definitely not uh not a computer scientist yet. Um uh but I really enjoyed it. Before you ask, uh uh yes, I learned a lot and I'm glad that I did it. and no it hasn't been very useful since um in the 2000s I did a bit of a pivot uh moved into computer science uh after some some good advice from those close to me and spent my PhD years in New York City working on uh convolutional networks neural networks for robots uh with uh with Yan Lun uh which is a lot of fun um I then in the 2010s made the decision to join a small group of uh curious um scrappy uh individuals working at Deep Mind. Um it's a group of about um uh 30 40 people at the time and uh we spent the rest of that decade working on things like Atari video games, uh Go Starcraft and some robotics. um a lot of fun and um uh now I am a VP uh within Deep Mind. I help run about a group of about 1,200 scientists and engineers um across 10 labs um um and uh we're working on a lot of different things. I'll tell you about three of those. Um so first uh frontier AI is uh an area where um we really are trying to make sure that we are staying in the front. So we're thinking about what are the next architectures that we're going to use for Gemini. What are the next problems that only AI uh can really address? Um and how are we going to build the future of intelligence? And that's thinking not just about artificial intelligence but it's create the future of human intelligence as well and even robotics intelligence as well. Um we are all on this uh journey together and I think that it's important to think about how how humans change as well as the technology. Um our approach we look for root nodes. You know we're not going to waste time on the leaves. We're going to really find for a big problem space that hasn't been solved. what are how deep can we go find the deepest problems and solve those in order to then enable um a lot of downstream stream impact. Um we partner you know really with the world I really think about it very broadly and think about who are the partners that can help us find those root nodes and solve those problems and also you know bring it to the to the leaf nodes and solving problems that are worth solving. Um the motto or the mission of deep mind um is to build AI responsibly uh for the benefit of humanity. So I really take that seriously. We want to build build uh solve problems that are worth solving. Um all right. So uh we work in a lot of different areas within Frontier AI um in Deep Mind. These are sort of some of the different uh categories. I'm not going to tell you about all of them. So you can uh just uh maybe keep those in mystery. Um but I'll just pick out a couple. So first in advanced models um I actually wanted to uh bring up uh an embeddings model. So the theme of this talk overall is things that are not directly language models. Um and in the modeling space I wanted to talk about embedding models. To start that out I'll ask if anyone knows what a Jennifer Aniston cell is. Aha. I got a few neuroscientists neuroscientists in the room. So this is actually a concept from neuroscience um where we've discovered that uh there are not just a single cell but a small number of neurons that will encode for a specific thing as in a specific person. understand that those combinations of neurons that only activate for that one person or that one thing or that one place, those cells are actually very robust. They uh activate regardless of modality. Um and this is used by the brain very very fast retrieval for recognition um and for comparison functions. Uh, so that means that when I say the name Jennifer Aniston or if I showed you a picture or a video or if you even heard her voice, if you knew her, if you were enough of a fan, then those all those different modalities lead to the same set of cells activating. Um, so we want that in a artificial neural network um for the same reason. We want fast retrieval, recognition, and comparison. Um and so we can trade what's called an embedding model uh in order to encode for those concepts in order to be more robust um to different uh different ways the information can be presented um and to be very very good at sort of understanding what is the comparison between these different activations. I use contrastive losses. One of the reasons why I like this space is that because I did my PhD work in part looking at Siamese neural networks which was an early way of understanding what is a contrastive loss function. Um and so these these embedding uh uh functions are really critical companion to generative AI. Sometimes we want to generate, sometimes we want to retrieve. Um, so the group at Google has been working on this for a long time and just recently we've actually released Gemini embeddings 2. So this is exciting to me because it really is sort of the the ideal. It is fully omnimodal. Um, it uses uh it's derived from Gemini. Um, so it's got sort of that level of of knowledge and understanding of the world and it is and it allows extremely extremely good retrieval. um uh in a little bit more more detail then um why is it good that it is unified and multimodal? It means that you don't have to have different steps to try to bring things together and ma map them together. You can be truly endtoend and not lose information by trying to combine audio information, visual information, text information together. Um, so you can get a single vector that represents text, uh, uh, up to 8,8K tokens, um, uh, 128 seconds of video, 80 seconds of audio, um, and a full PDF. And together, that can give you a lot of information. You can then use that um to be able uh to use it for retrieval um uh for for for querying um for agentic logic and other things. Um we also use something called the matri matrioska representation learning MRL um and that allows us to have diff be able to have the same network but represent different uh dimensions. So for instance, you could start out doing a retrieval uh using only uh 256 dimensions for your embedding and then you can expand that to get to more expressiveness. Um so uh this also this gives us we can demonstrate that we have this allows us to have a unified um semantic space um and really state-of-the-art quality. Uh so um just something that's come out recently that I think should it doesn't get talked about quite as often as language models but it's really important as that uh companion I think. All right. Next I wanted to quickly talk about another thing that is not a language model. This is not a language model at all. There was no language involved. Um and this is work that we've done on the weather. Uh in London it rains a lot. Um and a few years ago there was a um uh a informatics uh scientist at the Met Office, the meteorological office for the UK, the UK's weather agency that said, "Can you predict better rainfall than our physics-based models um using AI?" And I said, "I don't know. Interesting problem. Let me take this back to the team." Took this back to the team. Um at Deep Mind, we started working on this and we discovered yes, you know what? Predicting the weather even though it is a very very hard problem using a physics simulation of the atmosphere is actually quite tractable for neural network models given that we have 40 years of data 40 years of global data on what the weather is. Um so a couple of years ago we came out with graphcast. Um, Graphcast uh predicts the uh predicts the state of the atmosphere up to 15 days out um everywhere on Earth and for many different variables. And this uses a spherical graph neural network. Uh can think about this uh uh encompassing the earth and having nodes that go all the way from the surface of the earth all the way up into the lower stratosphere. Um, and we actually feed in and then predict in an auto reggressively a hundred different atmospheric uh variables. For instance, uh wind speed, temperature, and humidity as shown here. Um, and this worked very well. Here's a quick example. We were excited to see this um in late 2024. This is Hurricane Lee. sort of comes into the Atlantic, pauses for a moment, and then takes a takes a a turn to the north, speeds up and makes landfall on Nova Scotia. Um, the total this total video is uh nine days worth. So that's how far the the hurricane moves. And this is actually the output of the graph neural network. That's its prediction. and the prediction that it made is accurate nine days out of where that landfall is going to be. In comparison, the best gold standard models um that are that are physics- based were only accurate six days out as to where that landfall was going to be. When you're talking about a major hurricane hitting land, three days is really important. Um so with this we said okay this is important and we're going to keep on pushing the science. So the team developed the next model. We called this gencast. And the difference here is that this model while also based on a mesh um is probabilistic and it has a higher accuracy and a higher efficiency. The weather is fundamentally chaotic and we want to know what's happening um on the tails. And so having a model that's probabilistic, it allows us to do that, allows this to be operationalized and used for actual weather prediction. Um, Gencast also was more accurate. So when we compared it to 1300 gold standard uh benchmarked weather forecasts, then this was more accurate 97% of the time. And it was also uh could be we could produce that 15-day forecast in eight minutes on a single chip instead of hours on a very large supercomput. So much different sort of space of the solution that we were proposing. Um and just this last year this team is relentless. They're constantly coming up with new models. Um and so the latest one is called FGN uh functional generative network. This directly predicts cyclones rather than predicting the weather and then having to add on a cyclone detector as sort of post-processing. This actually incorporates the the categorization, the recognition of cyclones, their trajectory, their wind speed and the formation of the eye directly into the network. We train for that which means that it's much better. Um so this has already been used um in the US by the National Hurricane Center. um and they are very very excited by how much um of an advantage this now gives um so uh this this will hopefully be used worldwide in in the coming years. All right. Lastly, I wanted to use the last few minutes to talk about again something that is not uh language model based. So this is world models. Um, and this actually came out of work that Deep Mind has done on games and simulation for a long time. Um, we've been working on Atari, on Go, on Starcraft, um, and then on, you know, Mojo type environments for robotics because we wanted to understand, um, agency and the environment. we started focusing more and more on not just the training the agent but creating the in an an infinite environment you know when we when I did work on a locomotion here um then oh that's not playing well maybe this will play I'm going to jump forward to genie one so we wanted uh so this is genie one it could only run for a few seconds but you could say hey I want this type of a world. And then you could produce this little platformer 2D game environment where you could jump around for a few minutes and it would actually respond to whether you hit the the left or the right and it could produce a reasonable diversity of different looking platformer type of worlds. This was enough to say, hm, we might have something here. Let's scale up. Let's scale up the data um and train again. improve the method and train again now on 3D games. Then we produce Genie 2. Genie 2 is uh is interactive, but it's not yet real time. So you need to go awfully slowly. Um um and it can uh produce 3D environments, but it couldn't do anything that was really real world type of quality uh and uh more higher definition. Um, so we were working on that and then along came VO3. Oh, going to hope that this I don't know why. Well, um I was going to show VO, but you've all seen VO or Sora or other video generation environments. So, you know that we can create videos that are photorealistic and very good, very high quality, although they're not interactive and they're not real time. And so, Genie 3 really wanted to solve for all of these things. It wanted to to create an environment that you could create a world that you could interact with, you could move around in that's real time, so playable, and that also has that really high level of quality um that we saw with uh with generative video models. Now, I'm not able to advance the slides at all. Well, this is too bad. Um, oh, there we go. All right, let me just see if I can show you a couple of videos. No. Right. I'm really sorry. Well, um if any of you have not seen Genie, then please go and take a look. Project Genie has been um made available uh to all Gemini Ultra subscribers and so it's something that we've been really exciting excited to see. Um we've been really excited to see what people have been creating with this. Um It's just not getting >> online videos. >> Yeah. Yeah, on the Wi-Fi. Yeah. You go to your browser, you might just >> It is. It is. It is. It is. All right. All right. Well, that works now. All right. All right. Now, All right. Well, now I am out of time, but I will still take another another minute or two to just show show you these. Um, so this is saying that telling Genie, I want a world where I'm walking down a muddy lane in Kent. Um, this looks not far from my house. The fun thing here is that you look down at yourself and you realize that you actually have a body. You're actually interacting with the world. It's a little bit odd to uh to know what's coming out of this model. It's really understood not just the appearance of a lane in Kent, but actually what it takes to engage with that, make the water move, and to walk forward. Um, of course, it's not just scenes that are walking. We can very happily uh ski um and so you can create an environment where you can engage with the world in so many different different ways. Um, here's an example where it says original there. That's we started this, we prompted this with a fragment of video and now it's changed to Janie Genie 3. So, this is an artist. He made those that those first few seconds and then we use that to prompt Jeanie and bring this world to life. He was so tickled to see that we could take a little snippet of his world that he had laboriously created and bring that to life in a way that means that you can fly through it. you can bounce off of this thing and it remembers that, oh, here's that here's that weird structure there and go back to that and fly through there. Um, so these environments are not only diverse and interactive, high quality, they also have uh memory. So the prompt here was I'm an origami lizard in an origami world and this is what you get. And we use this as a nice little test that I can spend, you know, uh I can spend a minute running in one direction, run back, uh to the start and everything is exactly as it was at the beginning. Um because we have a really good memory. Working in these environments gives us consistency and control. Um and lastly, we have uh we're able to prompt this world as you're in it. So that means that um while I'm in a world that might be a little bit boring, here I am. You know, this is a world saying, "I'm walking down the Camden Canal in London here, uh, near the Deep Mind office." Well, what happens if I prompted at the same time? Then what happens? I've just changed the world that I'm in can change it again. There we go. Immediately, the world is is is different. And one more time, just for fun, I love the idea of a new form of gaming where I could be adversarially prompting your experience of a world. It just creates a whole different sort of um entertainment, a whole world, a whole new frontier. Um that I think can be really amazing, not just for entertainment, but for education as well. um the ability to be able to go into a world in order to learn about it I think is incredibly powerful um and may well be something that that that we see more and more of. Um and uh with that I will uh say thank you and just a quick call out that tomorrow morning um my colleague Omar is going to talk about Gemma 4 which is a language model. Thank you. Our next speaker is here to speak about harness engineering. How to build software when humans steer and agents execute. Please join me in welcoming to the stage member of technical staff at OpenAI, Ryan Leopollo. Good morning, London. I'm super excited to be here today. I'm Ryan Laop and for the last nine months, I have had the privilege of building software exclusively with agents. Uh, I am a token billionaire and I believe that in order for us to get into our AGI future, we want everybody to be token billionaires to use the models to do the full job. And what that means is to lean into the idea that the models are capable of being a full software engineer. And I've lived that experience by banning my team from even touching their editors to have to work through the models in order to get the job done. And uh today I'm going to talk to you a little bit about what it means to lean into that and operationalize the way you work, the code spaces you live in, and the processes on your teams in order to get the agents to do the full job. I believe I'm preaching to the choir here when I say that the way we build software has changed. In the last six months, we have seen coding agents take over the world and capability has continually advanced at a super fast pace to have these models and the harnesses within which they live take more complex actions, do more complicated work with higher reliability over longer time horizons. And the place we've gotten to here is that implementation is no longer the scarce resource of what it means to do the job of software engineering. Code is free. We have an abundance of code to solve the problems that we come across in our day-to-day as we run our teams, build software, and solve user problems. Hiring the hands on the keyboards as part of our teams is only constrained by GPU capacity and token budgets. And each engineer today in this room has access to five, 50 or 5,000 engineers worth of capacity 247 every day of the year. The only thing that needs to happen, our roles is to figure out how to productively deploy these resources into our code and into our teams to make use of this new capacity. And in this world, skill sets are shifting more towards systems thinking, system design, and delegation in order to make use of this abundant capacity to produce code to solve problems. And there are three reasons that this happened, all of which happened in late 2025. For me, the magic moment was GPT 5.2, which when it came out was able to do the full job of a software engineer. The models at this point are good enough where they're isomeorphic to you and I in terms of the ability to produce code at high quality that solve real user problems in real code bases. Code is free and I know this is maybe a scary thing to hear because code carries maintenance burden but it's free to produce, free to refactor and it is not a thing to get hung up on anymore. We think of code as burden because it it's a synchronous attention drain on the human engineers on our team. But the models are incredibly patient. They are infinitely parallel. So the ability to produce, maintain, refactor, and delete code is no longer a forcing function on figuring out how to allocate resources on your engineering teams. So sort of be AGI pill here is to believe that the models are capable of producing every line of code we could ever possibly need. Figuring out when to delete them, figuring out when to refactor them or make them more reliable. And it's your role as software engineers to figure out how to unblock your team of agents and humans driving those agents from being able to drive them over long horizon work to do the full job. The idea here is that every one of you is a staff engineer. You have as many team members as you can possibly drive concurrently and have tokens to support. And you need to look one day, one week, six months into the future to figure out what structures you need to put in place to productively harness this infinite capacity to produce code. The scarce resources in this world that we see today are three things. Human time, human and model attention and model context window. And in the world where human time and attention is scarce, the role is to think about where that time is going, figure out ways to productively automate it, and move that synchronous human time into higher leverage activities. In a world where human time is scarce and human time is required to produce code, we have a stack rank. Things are either P 0 or P2s. Those P3s will never get done. However, in a world where code is free and infinitely abundant, all those P3s get kicked off immediately, maybe 4x in parallel. We pick one that solves the problem and in it goes. I've had the privilege of building a ton of agents internally at OpenAI to improve the productivity of my co-workers. And when code is free, all these internal tools can have good localization and internationalization from day one. I can make tools that my colleagues in London, Dublin, Paris, Brussels, Zurich, and Munich are able to experience in their native languages without really having to trade against any of my other teams capacity in order to make highquality tools. We should be working with the assumption that the best parts of software engineering that we all know, live, and breathe are available in any product that we could ever build all the time. Humans no need no longer need to concern themselves with implementation. The important thing is not the code but the prompt and the guardrails that got you there. This is why leaving breadcrumbs, documentation, ADRs, persona oriented documentation around what a good job looks like. All the historical logs of tickets and code reviews. This is the process that got you and your teams to the code and products that you have today. And this is what is need needs to happen in order to get your agents there as well. Your job is to build systems, software and structures that enable your team to be successful. And to do that, we need to make them legible to those agents that are driving the implementation. That means structuring them in a way that's native to the agents. Writing them in a way that is respecting of scarce context, which is this other scarce resource here, and figuring out ways to make the tokens that are required to do the job easy to predict. That means making things the same as much as possible so we can limit the amount of attention the model needs to activate in order to do the job. Large scale refactoring in this world is free. So making things the same is something that you are all able to do. There's never going to be a migration that hangs open for six months now that you can't get the last parts of the codebase to do because you can just fire off 15 agents to drive that work to completion. This is what it means to have a migration, right? We can finish them now. Come on. That's good. That's good. Clap. There's sort of this like metaepistical question here about like what it means to do a good job. And doing a good job as a software engineer is hard. It requires us years of being in the industry to fully internalize what it means to write highquality maintainable reliable code that our teammates are able to build on top of that is going to acrue leverage to the codebase. To do a single patch well probably requires 500 little decisions along the way around the underspecified non-functional requirements that go into producing good code. The agents, the models during their training have seen trillions of lines of code that make every possible choice of those non-functional requirements that you could ever imagine. So, it's our job to specify those non-functional requirements to write them down in a way that the agents can see this is what it is to do a good acceptable job that's going to produce a merged patch. And if the agents aren't doing that, it's our job to figure out ways to refine and restrict their output such that the code they write is acceptable. You can just simply say do not produce slop. Don't accept slop. You won't get slop in your codebase. But to do that requires taking short-term velocity hits in order to back up or doubleclick into a task to figure out what it is the agents are struggling with in your environment. Put the guardrails in place so they stop making those mistakes and then figure out ways to step back and spend your time on higher leverage activities once you solve some of the blockers in the short term. When I think about empowering my team in this way, everyone is an expert in what it is they bring. I have a diverse full stack team that is experts in front-end architecture, backend scalability, being product minded. And each one of those different personas fleshes out the skill set of my team by bringing a different understanding, a different set of solves for those non-functional requirements. Getting teammates to write those down actually means that every engineer driving agents gets the best of every single person on my team. I don't need to block on low signal code review in order to learn what it means to write a good QA plan. To have one engineer on my team document that in a durable way means every agent trajectory is going to get a good QA plan. And we can do this once in a high lever way that we're able to stack on top of. So how can we get the agents to do a good job? What are some of the tools and techniques we have in order to essentially prompt inject our agents and continually remind them of what it means to make those specific choices that we expect around those non-functional requirements. And there's a bunch of ways we can do this. We can write good agents.mmd files. However, with autocompaction, which is a thing that has continued to improve, GPT 5.4 and CEX is fantastic at autoco compaction, I essentially never have to write slashnew anymore. I've got some pictures on my Twitter of me strapping my laptop into the back of my car so I can continue do running inference while I'm commuting to and from work. And in this world, you have to kind of build for that expectation that context will get paged out over time. We need to be continually refreshing context as the agent goes about doing a task. And the ways we can do that are by having reviewer agents look at the code along the way through the lens of what it means to be successful. Right? We have security and reliability review agents in our codebase that are continually running as part of every push and CI that look at those documentations and the proposed patch and do simple things like say, are there timeouts and retries on this bit of network code? Has the code that has been introduced have a secure interface that is impossible to misuse? I'm sure everyone here has been paged at some point for network code that failed in production causing an outage that could have been remediated by a retry and a timeout. And I know I'm guilty of putting that retry and timeout in merging the bug fix and otherwise ignoring that. I am not a reliable reviewer or author of code with respect to this non-functional requirement. However, taking the time to write some docs, write a lint that is bespoke to my codebase that is going to look at every time I call fetch to make sure that there's a retry and a timeout wrapped around it means I've durably solved this problem and I'm able to do it because I lean on this axiom that code is free that the agents are able to do a good job that I can completely migrate the codebase to solve this problem durably once and for all. And in order to kind of operate in this way, we need to step back and look at the durable classes of failures that the agents and the humans in the codebase are making time after time. Figure out why we're spending time on it. Devise a solution to systematically eliminate this class of misbehavior and then continue to observe, refine, and make additional choices on those non-functional requirements. One really neat trick I use here is that you can write tests about the source code as well that are separate from lints. Right? If we know that context is limited, we can write a test that limits the fact that files are no longer than 350 lines. We're adapting our codebase to the harness to the models to do a little bit of engineering to be context efficient and squeeze more juice out of the model capability that we have today. The other things we can think about are providing good error messages that give actual remediation steps to the model and to humans for how to proceed next. It's not enough to say we've got a lint failure because we're awaiting in a loop or that we have an unknown at this deep part of the codebase and why is the model writing a function called is record. What we need to do is provide a prompt via a lint or a test failure that says no no no you shouldn't have an unknown here at all because we parse don't validate at the edge and you certainly have a type here which was derived from zot loadbearing infrastructure for our AI future prompt things I've talked about here today is a prompt you can do this without touching the model weights at all. Kind of a funny digression here is it seems like each advancement we've had in the complexity of the way we write code to interact with these models comes from both increasing capability in the models and increasingly niche ways for injecting prompts into those models. Prompts I'm sure you're aware are prompts. Powers prompts rules files prompts skills prompts. These lint error messages that I am talking about prompts. Review agents that inject comments onto the PR that we require the agent to address before it is able to propose it for merge. Prompts. You're going to find lots of ways to insert prompts into your code. And one way you can do that is by embedding agent SDKs into your tests that are going to review the codebase for acceptability using prompts that get embedded into the code. And if I find myself spending a ton of time writing prompts, we can actually shell out to the agent for that as well. Uh, I've pointed codecs at all of the prompting cookbooks we have on the OpenAI developer guide and told it to synthesize a skill out of them for how to write prompts. Which means when I find a need to write prompts in order to improve my agent performance locally in the code, I use the skill to write prompts that I wrote with the agent looking at the prompts to write the prompts. All the leverage that you're encoding in in to your repository, your team, and the agents in this way stacks incredibly well. To kind of pull back to this idea that a single product-minded engineer on my team was able to give us a big lift, they know what it means to write a good QA plan. To write a good QA plan though, you have to document all the features that you have, the critical user journeys, and how users engage with your applications, web apps, APIs, and services. Once you write those down on how to write a good QA plan with the expectation that all userfacing work has a QA plan, now a review agent is able to assert expectations around what it means to prove that you have effectively written the feature. A QA plan indicates what media should be attached to the PR for the humans and agents to know that you've done a good job, which has the consequence of me trusting the output more, needing to shoulder surf the agent less, and removing myself from the loop even more to delegate more and more of the work to agents. And all of this is just making sure the agents have the tools and tokens and context to do the full job to remove myself from the need as a synchronous driver. The models crave tokens. We can operationalize our codebase to give them tokens to drive them forward using sub agents and all these other techniques to refine the agent output. I'm excited to let you all know today in the way you all do that you can just go build things. Do not hesitate to remove yourselves from the loop by getting the agents to do the full job because they can. Thank you. Our next presenter is the creator of Open Claw, the world's fastest growing open-source AI. He recently joined OpenAI to work on bringing agents to everyone. Please join me in welcoming to the stage Peter Steinberger. Good morning everyone. >> So Swiss asked me to do a state of the claw. Who here is running open claw? Give me some hands. Ah, it's like 30 or 40%. Very good. Um, yeah, it's been quite a few months. Um, the project is now 5 months old. I think it's fair to say by now that we are the fastest growing project in GitHub's history. Um if you've seen the the graph usually it's some some projects look like a hockey stick but ours was just like a straight line and a friend called it stripper pole gross and that comes with it own challenges. So we have I think but now we are the the largest number on GitHub stars. There's a few that are bigger but they're basically educational target. No other software project is that big. It's around 30,000 commits. it. We're closing in 2,000 contributors soon to be 30,000 PRs. Um, see, and we're not slowing down. So, you see that it's a ramp, but you know, it's we only have April 9. So, um, velocity keeps keeps being good and at the same time it hasn't been easy. You know, I I had two roads when I when I decided what I want to do and I I did the whole company thing. I was like, I don't want to do this again. And then I joined OpenI, but then we also created the Open CL foundation and now I kind of have two jobs and running the foundation is like a running a company on hard mode because you have like all the all the things that you need to take care of but also you have a lot of volunteers that you can't really direct. So one of my goals has been working on the on the bus factor like who does comets um and you see that it's slowly improving. Vincent is actually talking after me but we're still not we're still not there. Um, in the last months I I talked to a lot of companies. So, we now have people from Nvidia on board. We have someone from Microsoft on board to like help with MS Teams with like a Windows app. Uh, we have someone from Red Hat who's really helping us um with security and dockerization. We work with a lot of Chinese companies. We have people from from Tensent and Biteance. um they're actually much larger users than any other continent and yeah people from pretty much around the world but like the main thing I I want to like talk a little bit about is about open claw is so insecure you know you've you've seen the you've seen the memes so like open claw invites the bad guys and you probably also seen companies like Nvidia doing Neyo claw and like everyone has little lobsters. So you also notice that like in the last two three months there's been a lot of releases where things broke. I've basically been been dosed by security advisories. So that's what I did um and what I focused on. So far we got 1,142 advisories. That's around 16.6 a day. 99 are critical. Um we published around 469 and we closed 60% of them. So these numbers sound like absolutely terrifying. If you compare it for example to like other large projects like the Linux kernel gets like eight or nine a day. we get like twice as much and curl so far has 600 reports we have like twice as much as curl. So every time I I get a security incident, the rule is the higher the higher they are screaming how critical they are, the more likely it slop. Like we we are I mean you've probably also seen the news like we we we are very fast moving into a world where we have to change how we build software because all these AI tools are getting so good at identifying even the most weird multi-chained exploits and like we're gonna going to break all the software that exists. I give you an example like uh Nvidia they they launched Nemo Claw and Nemoclaw is a a plug-in and a security layer for open claw using put it in a sandbox. I the keynote was on Monday. They invited me on Sunday to like work with them. I hooked it up to Codex security. It found like five different ways how to break out of the of their secure sandbox within half an hour. That's because like if you use that product, you get access to the unnerfed model that is quite a bit smarter in terms of cyber than what the public has access. Exactly. Because it's dangerous. But yeah um also this whole industry those people for them it's like credits right the more the more issues they find the more they are seen so like openclaw was like the insecure product that everybody tried to break so literally like hundreds of people firing up their clankers trying to break open claw um the typical attack surface This is like remote code execution, bypass approval, code injection, pass traversal. Uh again sounds all very dangerous. And I give you I give you one one concrete example. Um Gshjp. This is about a this is a CVSS of 10. So it's like the scariest thing that you can possibly do. It is an issue where if you uh sync for example the iPhone app that we haven't even shipped yet but is in progress and you give it only read permission then you could like break the system to also get write permission. So this this one was so critical that the I know this one's actually different one in all in all practical ways. It is not even an incident because the the the typical use case is you install it on your machine either in a cloud or if you have to on a Mac Mini. I I stopped fighting this. I'm just letting people have fun now. But in 99% 99.9% of cases you either have access to your gateway or you have not access to the gateway. In in in my defense this was my mistake that I tried to create a a more permissive model. For example, if you have devices that would target speech and then would only like read certain things. So there's like some use case where you could like have a a reduced permission system would make sense. Um but nobody's even using that. But this doesn't matter because the rules of the of those how you create the CVSS numbers don't contribute to that at all. And I try to play by the rules. So it is a 10 out of 10. And the world is going crazy over incidents that in all practical ways will not affect people. There's some other stuff that does affect people. Uh we have nation states trying to like hack people. There was like ghost claw which is like from likely from North Korea which is basically confusing people with a different npm package and if you if you go to a wrong website and you try to download it you get like a a root kit. Um that's outside of our control. This happens for other people as well. Um, also there's the Axios thing which funny enough we are not using Axios but we are using MS teams or Slack as a dependency and they're using Exius and they didn't pin us and of course uh because that's how supply chain attacks work. We were also affected. Yeah. How do you survive 1,142? I'm sure now it's 1,150. Uh for a while I I I tried to handle a loop by myself and which is absolutely impossible. So So the fastest way to get help was like getting getting help from companies um and Nvidia has been really amazing to like give us some people that basically work full-time going through the slop and hardening the code base. Oh, there's also one that is okay. That um this is one of the angles. The other angle is like there's a lot of companies that do fearongering and it's not just companies, it's also universities. I don't know if you've seen it. There was like this um paper who made the rounds agents of chaos and they say oh it's it's about agents in general but then there's four pages that explain the open claw architecture in utmost detail but you know which page they didn't even mention a security page where we explain how you should install it because then it wouldn't be fun and it wouldn't be it would be hard to make a good story. So what they instead did is they ignored all of the recommendations we do in security. Recommendation is it's your personal agent. Don't put it in a group chat. If you put it in a group chat, turn on sandboxing because if anyone can talk to your agent, they can exfiltrate anything that the agent can do, right? So if it's a team agent, it should only know what the team can know and not any secret data. And you probably want to like have it restricted. If it's your personal agent, you should be the only one being able to talk to you. But if you don't play by these rules, you can get some really fun interactions like, "Hey, I can talk to your agent and it can break your system." And then because I I was I was grilling them a little bit because I had some questions how to do things. They told me, "Oh yeah, no, we run it in pseudo mode because we wanted the agent to be like maximum powerful." So they actually fought the setup. It's actually not easy to run it in pseudo mode. You have to change code. um but they didn't mention it in the report because again that wouldn't give them cloud. So yeah um my current frustration is like there's like a whole industry that try to put the project in negative light. It's a nightmare. It's insecure by default. It's unacceptable. Um and meanwhile a lot of people love it and people who actually read the security docs understand it can use it just fine. One example that I found particularly great is uh we had one remote one rce that panicked Belgium. So the Belgium cyber security did a release uh about a remote execution environment and the whole bug was a feature where a malicious website could create a link that would trigger the gateway and then forward your gateway token. Now if you use the setup that is the default and that is recommended the gateway token is local only or if you have to it's in your private network no external website can actually access it. If you actively fight the setup and for example use cloud code to set it up without reading, you might be able to get this setup working. But again, that's not anything what what's said on the website. So to be very honest, yes, there's absolutely uh risk. the the the big risk is the the basically the lethal trifecta. You know, any any agentic system that has access to your data, has access to untrusted content and the ability to communicate is something that's potentially at risk. That's not anything special to OpenClaw. It's like any any agent any power fishing system has a problem. The more the more powerful you make it, the more it can do for you, but the more you also have to understand what it does. So this is like the the main issue but people don't talk about it. Yeah. And then also um some part about maintaining. So the problem is like if you get all those security advisories, you know that most of them are created with agents, but you still have to use your brain to actually read it because we're not at the point where you can fully trust or I'm not at the point where I I can just fully trust that the agent will figure it out. So it is a huge burden on on time and you never know. I mean sometimes you can you can often guess you know anytime the reput is too nice or like someone apologizes that's very likely AI because usually people in security don't apologize. Um but it is a huge problem and it's something that I see more and more open source projects complaining about or like breaking. Um, some are very public about it like ffmpeg. Usually you get the report. It's very rare that you actually get a report and a fix. If you get the report and a fix, it's usually a very bad fix. If you rush it, as I sometimes did in the beginning because it was overload, you will very certainly break your product. Yeah. So this is something that's just very difficult to pull up only with volunteers. So we so what are we working on? Number one is I people say like open AI bought openclaw. That's not the truth. They might bought my soul.md. Um but they very much understand that in order for what the world needs is like more people that play with AI to like understand what AI can do to both understand the risk and also the possibilities. They understand that if you or like someone who never played with never used AI suddenly is at home and uses open claw they'll come to work and they will ask why don't we have AI at work. So they very much understand that like supporting this project is very useful and in order for that project to be successful cannot be under one company. Therefore I'm kind of building Switzerland with the Open Glove Foundation and I have Dave who's helping me with it. Um it's almost done. The last thing that's keeping us going is like the American bank system which is a little bit slow and very confused when you're not American. Um, it's inspired by what Ghosti did. And this will actually then help us to hire full-time people to both keep up the pace, improve the quality, and free up some of my time that I can work on on cool stuff again. And that's my little update on State of the Claw. I'll be around later for like a Q&A. Thank you for listening. Ladies and gentlemen, please welcome back to the stage Phil Hawksworth. Hey, thanks Peter. Thank you so much, Peter. Okay, some of you are already aware that there's a break happening now and you're heading for the exit. I completely understand it. a quick couple of quick things from me before you go. Um, so first of all, we're about to uh break into our into our various tracks. Um, there is a break until uh 11:15 that's going to be happening. Uh, well, it's happening everywhere, but you can get refreshments out in the expo hall. Um, just a very very quick bit of information uh about the uh the various tracks. Uh, let's see if we can uh advance that. We've got uh here we are. So we have um oh no this is is this right? I don't know if this is quite right. Uh you'll check on have a check on the schedule because we've got uh tracks for openclaw happening in here. We've got a track for um harnesses harness engineering. We've got a track for context engineering. We've got a track for multimedia um sorry multimodal uh interfaces uh talking about OCR and text to speech uh etc etc. Um uh and we've got uh of course Google Deep Minds um uh have a track as well. We're going to be talking about all things to do with open models, agents, web MTP and much much more. Um I've rattled through there very very quickly. Um but also there is one more track uh that I wanted to tell you about and that is the hallway track. Uh the hallway track is well we thought we've got all you gathered here. It would be nice if you gave a talk as well. uh you don't all have to give a talk but if you would like to give a talk you might just be able to. So uh when you uh registered for the event you would have found information in the email about joining the the the uh AI uh engineer Slack. Don't know if everyone's found that. You should definitely be in there. Uh but you can submit a proposal for a 10-minute lightning talk and there's room in the schedule tomorrow to give that. So uh I'll remind you later on but put your proposals in there. Uh and then there is a chance that you could give a talk uh tomorrow. So keep an eye on the schedule. Also, we'll be letting you know there's a vote for that. Okay. I sense you're hungry for refreshment and I don't blame you. A lot of information today. So find check the schedule for the various tracks. Uh find those tracks later on 11:15 starts. Enjoy your refreshments. We'll see you soon. Thanks very much. Heat. Heat. Heat. Heat. Heat. Hey. Heat. Heat. Heat. Heat. Hey. Heat. Heat. Heat. Hey, Heat. Baby, baby. Heat. Heat. Hey. Hey. Hey. Hey. Heat. Heat. Heat. Heat. N. Heat. Heat. Heat. Heat. Heat. Heat. Duh. D. Heat. Hey, Heat. Hello. Hey. the only I know it. Heat. Heat. N. Heat. Hey. Hey. Hey. Hey hey hey hey hey hey hey hey hey hey hey. Heat up here. Hey, hey, hey. Hey, hey, hey. Heat. Heat. Heat. Heat. Heat. Heat. Heat. Heat. Heat. Heat. Hey, Heat. me. Heat. Heat. Heat. Heat. Heat. Heat. Hey. Hey. Heat. Heat. Heat. Hello. Okay, great. Thank you for the whoop. Love the whoop. Um, so excellent. Okay, you've chosen the claw uh track to get started on for our our breakouts and uh uh it's going to be great. I think it's going to be it's going to be a good session. Um we are going to be hearing about a bunch of different things uh related to uh openclaw and just personal AI assistance in general. There's some open claw contributors, openclaw maintainers, uh um uh open claw competitors, uh and openclaw creators, uh going to be here on the stage. Um we're actually going to uh be taking this through until the lunch break. Um oh, there we go. We can see up there. So, it's about an hour and a half of uh of sessions, slightly shorter sessions than uh than earlier, I think. Um but we're going to be starting with uh an AMA. came in. You saw Peter earlier on, but you're going to get a chance to ask questions and there's going to be a bit of a conversation uh with Peter and Swix. So, I think to get us started, I will simply invite Swix up who will kick things off. So, uh please welcome him to the stage. Swix, come on up. Swix. >> All right. >> Actually, we can just go together. >> You can come out together. There's no secret. Peter, welcome. Everybody there is Okay, so the deal for this is meant to be an AMA. Uh the the main idea is that I've run six of these AI engineers and whenever we have some big maintainer, big VIP, we only give them a talk, but actually you guys have questions that you want to ask. Uh so uh we wanted to sort of create that opportunity. So you can you can submit there. I'm going to moderate uh and and all that. U the spicy one I'm just going to start off with. Pete just quote uh quote tweeted uh me and saying send all your questions about closed claw right uh I think uh people have a lot of questions about um the future of openclaw at openai uh and uh I wanted to give you the space what what is the what are people saying about closed claw and then what is your response >> I didn't even think about it was like it came up when when I decided to go to to openi And I think I think people have a point that open air wasn't always amazing with open source. And I I think a lot changed like Codex is open source now. They released Symfony which is a really cool orchestration layer. So like like they're really leaning in and understanding open source now. They understand that open cloud needs to stay open work with any model be it be it one of the the big companies or being a local model um everybody in the industry wins if more people spend time with AI you know if if I if I think AI is something scary and then suddenly I I I play with open claw and suddenly it's like fun and weird and then I come to work and there's no like I don't have AI tools at work. I'm going to get to my boss and say why the f do we not have AI at work and and then like those companies would probably not run open claw but we want something that's like hosted and managed and and then somebody can can make a sale. So they they're like very much on board. They provide me with resources. Um, actually it's me like I could get a lot more people from OpenAI to help with the project, but that would just make a picture that they could have taken over the project and I don't want that. So I I I brought in people from Nvidia, we have someone from Microsoft, someone from Telegram, someone from Salesforce of all the companies. So So shout out actually there's cool people at Slack. So we have someone that maintains the Slack plugin. Now I brought Tensent on board, Bite Dance. We talked to Alibaba, Miniax, Kim, like all the all the model providers. They're like very much on board. Um, Nvidia has been immensely helpful. They I think I one of the coolest companies in terms of here's some engineers who actually like just hire agency and just do things. >> Yeah. Uh and now that I have all the other companies, I'm also bringing a few people in from OpenAI to to help maintain the project because it's I mean software is just like changing that the the pace at which this project operates is is insane. You kind of like you need an army. Um and I'm working on that. >> You have an army. Uh and but but you know even the contributor chart that you showed uh shows that it's hard to get quality contributors to stick around. people keep hiring your maintainers and then you have to find new ones. Um, so there's a lot of questions about local models and open models. Uh, you know, like not every part of the stack is open. There's many models where you don't have access to the models and and you know, there's sort of weird restrictions. Um, how important is open and local models to the future openclaw? I mean part of part of what what motivated me to build open CL is you see all these large companies and then they have connectors to my Gmail and then my my email is hosted somewhere then this company has full access to my email and then I can get a little bit down there like it's much more exciting to me if I have all my data actually under my control and I and like a little bit of it goes up there if I need the top tier token. >> Yeah. and like a second kind of hierarchy of uh fallback models. >> Yeah, you want to I mean I'm I'm European at heart. You want to own your data, you know. So so so and nobody built it. So for me that was very attractive and also the the fact that you know if if you're a startup you want to connect to Gmail, it takes like half a year and it's like a very very difficult process. But if I'm a consumer my clanker can click on any website and it happily clicks on I'm not a bot. If you have to give me the data somehow, if you can if you give me the data, my my agent is able to get the data. So you can work around a lot of those those silos those big companies are building and ultimately you can do much cooler automation use cases that large companies can never do. >> So it's it's like it's a little bit the the hacker way. >> Yeah. And um any indications from the open team on GBTOSS? Is that continuing continuing to be a stream of work that uh will be aligned with open claw or or is that like separate? I'm not I'm not in a position to give yeah give you insights on that just that um part of an open open cloud trigger is that like more people in the company are getting excited about open source um and I I love that that open air is moving more into the open direction again if you compare it to some other top tier labs that start with an A uh that very much will sue you if you if you leak any of their source um or block you if you are too successful. I I I think open up your eyes on a good direction. >> Yeah. Okay. I want to highlight this question. Um people love hearing about your coding workflow. I think right by now your idea of um uh the prompt request rather than the pull request is is very well socialized. And also you've been shocking people with just how you're spending tokens at OpenAI. Uh so basically uh the people want to know how you ship and what do you do about agent waiting times like why is you know you're spinning out so many agents it >> I know like I I never imagined that this one picture of me would blow up so much. Yeah, >> actually >> uh give give some numbers just just to align people. I I think and there's times where I was running almost 10 sessions at the same time especially when I used codeex with 50 51 it was quite slow I think now I have to say we it's still weird we made improvements they both make it faster and then there's also fast mode so by now my typical workflow is maybe half of that maybe five six windows instead of double just because each loop is faster and like the area of work I sync in workers is pretty much the same. So I I don't have to use split screen so much anymore and I think we're going to move into a future where um token will be will be faster and faster. So at some point like this is not natural that you work on on six things at the same time. Um but it's basically a workaround until until faster. Yeah. Uh, one of my, uh, interesting things of putting you next to Ryan was to see how the two of you kind of approach uh, token maxing. Basically, I'm curious what you think about the the complete dark factory approach, right? That uh, you don't even review code that goes in. >> I think that's more and more doable. But also you know when I when I dark factory in a way also means I come up with everything I want to build in the beginning and I just don't think you can build good software in that way like the way to the mountain is usually never a straight line. It is it is it is very curved. Sometimes you go a little bit off track and then you you see something new that inspires you. You find like shortcuts. um once you're at the top you you you can find the optimal path but you never walk like this. So at the same time you will the first idea that you have about your project is very unlikely going to be the final project. But if I if I suddenly use the waterfall model again that will be the final project. For me that doesn't work for me. Like I I build steps I play with it. I see how it feels. I get new ideas. My prompts change. So to me it's a very iterative approach. So I don't see how you could fully automate that. You can definitely build pipelines for certain things. >> Yeah. >> But even even for PRs, you don't just want to build a pipeline that just merges PRs because a lot of them just don't make sense. You know, like people people will pull your product into all kind of directions, but if you automate that, the AI will very unlikely know what's the right direction. You can guide it. I have like a vision document that I tried some of that but the bottleneck is still sinking and like having taste. Yeah, taste is very important. Uh how do you define taste? This is something that in my conversations with people everyone understands taste is the moat but nobody agrees on what taste good taste is. So I'm just curious to hear yours. I think in this day and age is like the very low level of taste if if it doesn't stink like AI and you know exactly what I mean you know if if something is just so writing style personality >> also also also UI by now you've seen so many so much aentic built UI that you immediately know if it's AI >> yeah if it has the the color border on the left right yeah I mean for a while it was like the purple gradient but much more so I I feel it's It's like a feeling the same as you can identify AI written slop right away. >> Yeah. >> Um that's why I say it's a smell. Like even if you can pinpoint this, you will know. So So that's probably the lowest the lowest characterization of taste. And and then going higher up because now so much of software is is automatable. There's actually much more time you can spend on like the little details. I don't know, you know, like like just when you when you when you when you run open claw, you get like a little message uh that sometimes roasts people. Those are like the delightful details I think that >> you'll just not get if you prompt in a high level. >> Yeah. One one of my favorite tastes of yours is how you you uh really put a lot of work into your sole soul MD and you uh you know open source your approach and I don't think people worked on enough soul until until you came along. So I think that's really interesting. Uh my I I I have a podcast I haven't done yet. I haven't released yet with uh Mikuel Parakin who was the CTO of Shopify now, but he was the uh guy leading Bing where Sydney was uh the original sort of unaligned chatbot that emerged. Uh but I think people really have fun when when your soul your chatbot has personality. Your clanker uh you know has different obsessions. >> Well, it wasn't because it the world changed, right? We had we had chat GBD in 2023 and 4 and it was basically us having AI without understanding what AI can do. So we rebuilt a Google so you have like a search field and like you get a response and you you don't expect Google to have a personality. >> Yeah. But now that we moved more towards agents like if if I I didn't think about in the beginning WhatsApp relay and I just hooked it up to cloud code. Um and then I when I was on WhatsApp I noticed that it doesn't feel quite right. Like even even though like cloud code already has some personality it didn't really fit how people would write to you on WhatsApp. So that that's how my whole iteration started was like uh this again it's about taste, right? It doesn't feel quite right. It's like too wordy. It uses too many dots. It it it my friends text different. And then that's how I started working. They say, "No, this isn't like try to write more like a human." >> Uh yeah, I I actually run a writing >> like a lobster. >> Uh like a lobster. Yes. Um uh you know the one of my favorite quotes of yours is uh madness with a touch of sci science fiction. Yeah. Right. Like that this is how you run >> um uh AI projects. And I think >> not all the art projects, but specifically something like OpenClaw would have never been able, it would not have come out of an American company just because it would have been killed in legal long before it would have been released because it just has some problems that we haven't really solved as an industry yet. >> Yeah. >> But now we have some mitigations and it's getting better. The models are getting a lot better. But I don't see how any of the big labs could have released that. You know, it would be too much push back. Oh, and like not enough market proof that this is what people want. >> Yeah. >> So like it had to be done by someone >> like >> outside. Yeah. That that that >> sitting >> like literally like when I when I built it in the very beginning, I was like, "Oh, what's the worst that can happen?" like it could exfiltrate my token, my emails. Yeah, nothing is nothing nothing's in that that would like completely kill me. You could like upload some of my pictures. I was like, yeah, I guess the worst are already online if you use Grinder. Um, so it was like it was like, okay, I can live with that risk. It will be uncomfortable, but it's like it's manageable. >> Yeah. >> Uh, if your company is a different it requires a little different approach. >> Yeah. By the way, uh his Instagram account, good follow under underfollowed. It's also it's also has some good stuff. Um okay. Uh you were talking about WhatsApp, talking about Telegram, a lot of these text apps. Um uh text apps are good. People are also looking for like the next form factor. People want like the maybe the the glasses, the earbuds. What What is your sort of wish list in terms of having agents in your life? I started on that actually already, but then I was just getting bogged down by all the people using it and just like the daily grind. But if you're at home, I want to be in any room and you know at Star Trek when you can when you say computer I I I want to like talk to my agent wherever I am and it should just be able to like respond to me. It should know where I am. I have like little iPads in every room and and my agent can use the canvas feature and project stuff on those iPads. So like if I ask a question that that is like easier to be to be answered by also showing me something like it could use like the nearest display because it's aware of where I am. So the phone is just a very convenient input point but I kind of want to like talk to it from anywhere. Yeah. >> Like yeah if I'm around and I have glasses I should just like be able to like listen in and like project something on me. >> Um >> but just ubiquitous follow you. >> I think yeah once we have >> really smart home. Yeah, >> like agents on your phone, but really you want ubiquitous agents and then you want maybe you will have your your your uppercase open claw your private agent at work. You might have your I don't know lowerase openi claw and then that claw should be able to like talk to your personal claw uh in a way that both your company and you are comfortable with. So that's kind of like the future where we need to work on. >> Yeah. Uh one of uh I just did a podcast with Maran Dre who's a huge fan uh and and also uh have conversations with Andre Karpathy. Both of these guys are running OpenCloud to run their house. And I think OpenCloud for homes is like a kind of underrated, but like people are really discovering it. And my funniest sort of irony is that is it's only possible because the internet of means that most smart devices are terrible in security, which means Open Core can run them. >> Oh, it's going to be able to work so much better in in a few months when the models are getting really bad. >> Yeah, they're very good. Um, okay. One security question. uh about prompt injection. How do you want to solve prompt injection or what what uh ways in which uh have you been thinking about the prompt injection problem? Probably not enough yet. On the other hand, like the the the front end models are really quite good at detecting all the all the cases where like just stuff randomly comes in from a website or an email is usually not a problem anymore. You mark as untrusted content very hard to excfiltrate you from that. If if I have unlimited access to your claw and can bombard it with stuff, then there's still a chance. >> Then then there's still a chance. But like for one of things, >> it's no longer the biggest problem. If you use that's also why why you know that this is probably the angle where like some people say, "Oh, Peter doesn't like local models." But then I see like people running like a 20 uh billion parameter model that just does whatever you tell it and and it's not trained to have any defenses at all. That's still problematic. If you run that and then you use a web browser or email um would worry me. That's why that's why OpenClow warns you if you use a small model and then people spin the whole thing like we hate model. I I love I love I love that it we support everything, but like you have to steer the regular user a little bit into a direction to make it harder for them to shoot themselves in the foot. >> Um yeah, there there is some ideas for problem injection. It's >> still a little bit away. I haven't announced that. >> I think Simon Willis has been working a lot on on this. is I mean he coined the term prompt injection and the sort of dual LLM approach seems smart uh and I'm I'm not smart enough to figure out all the ways that which it can be attacked like at some point trust just has to be a thing right um and uh and I pro it's something interesting I found out from talking with Vincent who's speaking next is that you guys had to implement the same trust system that Toby Luca had to implement which is uh you build reputation over time and things with more trust uh gets more privileged access, right? And I think that that makes sense. >> That's part of the story. >> Yeah. Yeah. Yeah. Um Okay. So, uh some more broader questions. What cool projects would you like to work on once you have more free time? >> I mean, I wanted to work on dreaming, you know, like my maintenance worked on dreaming while I I'm there like >> while you were dreaming. >> Uh so, >> shift it, right? >> Yes. What What is dreaming? Uh it's like a way to reconcile memories and like kind of create a little bit like like a dream log go through like your session logs. Um >> we we found out from the enthropic source code leak that they also working on dreaming, right? >> Oh yeah. Yeah. I mean there's I'm pretty sure there's like more companies working on that. But think a little bit like how do we learn as humans? You you experience a lot of things during the day and then you sleep and and in sleep your your brain does like a garbage collect converts some me some local locally stored memories into long-term storage and like drops others and that that's similar ideas that I think could also be very useful for agents. Um and then like what we shipped on dreaming is like the first little step in that direction. >> Yeah. It's related to the wiki uh thing that Andre has been talking about where you sort of collect everything into a >> wiki is is more memory but like everything kind of blends a little bit together. Um that the beauty the beauty of open claw is that we can just try stuff you know like like everything what we worked on for the last months or so is that in the beginning it was a big spaghetti codebased mess and now like everything everything is an extension a plug-in. So you can replace memory, you can add the wicki, you can add dreaming, you can add I don't know your your your whatever crazy idea you have and just make it your own. You don't have to send everything to a pull request because we're still completely overloaded on those. But it's it's more like Linux where you just can install your own parts. >> Yeah. Yeah. uh and uh you are building what a lot of people think uh is the most consequential open source since Linux which I don't know how do you deal with that how do you deal with the the the fame what is a day in your life uh as as the BDFL effectively of something like this >> what's my well there's still a lot of coding there's also a lot of >> by the way in in between sessions he was coding back there >> yeah they get token excited you have to like something has to You have to push the agents, right? >> Yeah. Um, where it shifted a little bit now. It's a lot more a lot more talking and steering people in the right direction. Like because there's a lot of things that we already learned at Open Claw. So like part of my role at OpenI is like to like help them not make the same mistakes again. Um and then and then open claw is like try out new things that seem exciting and some might work and some might not work. Enable enable companies to like build their own claw without having to fork away but like making everything more more customizable. Um yeah and sometimes I sleep. >> Sometimes you sleep. Okay great. Uh I think that maybe this is the last good closing questions. Uh what skills do you want humans and engineers in particular to focus on developing in the age of AI? >> Taste was a big one, but I already mentioned that system design is still very important. >> Yes, you we talked about this in San Francisco. Yeah. >> If you don't think about that, you will eventually swipe yourself into a corner, right? Just by defining the boundaries like the funny thing is like everything is in the clanker but you still need to ask the right questions otherwise that makes the difference of like good code that comes out or like really bad code that comes out and that's still where like all the knowledge you have like how you build software you can apply to steer the agent into into something that is not slob. >> Yeah. And then I think I think a skill that is becoming more and more important is saying no. And and and that's something I had to learn as well because even the wildest idea is just just a prompt away. And usually this one idea is never the problem but like this idea and this idea and this idea and this idea and then how all of that fits together that's the problem. >> Yes. So like I think we're still bottlenecked on syncing and about like big picture syncing because imagine the world from your clanker like you're being thrown into a code base. You might have an outdated agent. MD file, but you basically don't know what DF this is. And you like then like you tell me, hey, add user profiles and you like somehow add user profiles and connect it to the two things you see, but you didn't see the whole system, right? And then that's where a lot of those localized solutions comes where like the project has like vS and and it's our job to like help the agent do its best work by like providing them with like hints. Hey, you want to consider this? You want to look there? How would this interplay with this? And then and then ultimately you get like a much a system that actually is maintainable. >> Yeah. Um well, thank you for maintaining one of the most important software of all time and thank you for spending time with us. >> Thanks for having me. >> Yeah. Hopefully you stick around and answer questions. Thank you. >> All right. Uh we have a we have lots of other maintainers coming in the claw track. So uh stay tuned. Good stuff. Well, thanks. Thanks to Peter and to Swix for that conversation. Okay, we'll do a little tiny reset here. I managed to not be involved in carrying furniture, which is good. Uh I would only drop things. Uh so as we do a little switcheroo here and get organized we're uh just about ready to uh to bring Vincent up. So Vincent uh is a does research a research engineer at a also does Devril work oh look at this is happening like magic also does deell at uh at comet uh but I think one of the things we're going to hear about in particular here is his work uh contributing to openclaw. Um, I've had so many conversations in the last, I don't know, 12 hours, 36 hours of uh about how the development workflow and the kind of the processes are changing so much since, you know, contributing code has happened so much faster because there aren't always just humans writing it. And I think we're going to touch on that a little bit today. So, um, I think we're just about ready to to to bring Vincent up. So, if you're ready to give him a giant round of applause and make him very welcome. Uh, let's uh let's welcome Vincent Cotch. All right. Enjoy. Yeah. Ready? Welcome everyone. How are we all doing? Amazing. Come on. Come on. No worries. I promise this won't take too long. >> They've got it. Cool. Amazing. So, welcome everyone. I'm Vincent. Uh, what do I do? I'm one of the core maintainers at OpenCore working with Peter. And as you've heard before, I have a day job as well. Same as Peter. he has a day job at OpenAI. Um, but you know, it's an open source project. Amazing things have been happening. I'm going to talk about what I call dark factories and how open claw ships faster than you can read the diff. Um, this meme is absolutely hilarious. So, I think Peter posted this a week or two ago. I wake up, there's a new technological advancement. I wake up. It's this this joke that we're shipping at insane speed and the velocity is just absolutely phenomenal. And some of you might think, "Oh, this is some luck or we're just like Ralph looping to the max." Um, I think there's actual engineering work here and I'm going to talk about that. Now, as I mentioned, I'm Vincent. I'm your friendly clanker. Uh, this is me using VR goggles back in 2013. So, despite my accent that sounds somewhat Australian, I was born and raised in East London, not far from here. I actually went to to college just down the road in Westminster. And, yeah, at some point decided to live in Australia. my accent changed, but I used to love technology. I used to love being at the edge of technology. And this was like one of the first few sort of early um VR goggles that came out. Came in this big box with a big warning sign on it saying, "Hey, use for 5 minutes at a time." Cuz it didn't have like the anti-motion sickness built into it. And the funny thing with this one here was that I didn't use it for 5 minutes. I used it for 3 hours. and I played Team Fortress 2. Had an absolute blast and then I vomited for 3 hours after that cuz my vision turned into B vision. Um the what I'm trying to say here is that like anything on the edge is going to be janky. It's going to be horri horrific. It's going to be uh uncharted territory. and working on openclaw and being part of the team that ships probably you know an insane velocity of commits to a point where I get very limited by GitHub on an hourly basis uh is an interesting experience and this experience Britain's gone through before um we had the industrial revolution when mills and cotton were being produced at extreme amounts of volume and there's a lot of history here around production and productionization at scale in the UK and in Europe. And I feel like we're going through this moment again. Uh we're going through this moment of how do we build at scale and the ways we used to work before just don't work anymore. And it's kind of strange because in my day job I kind of work in the space of eval which everything is sort of structured and there's telemetry and it has to be all perfect. and I work on a project where I'm I have this blind faith in the harness and it's this kind of two walls, but they're starting to come together. We used to have hand looms in cottages, uh centralized mills everywhere. Uh craftsmen were the factory workers, but the bottleneck was the weaver's hands. We're now switching to a world where engineers writing code and editors, not so much. uh swarms across repos uh engineers are becoming factory managers which I'm going to talk to and the bottleneck becomes taste you know that lovely word Italian mother's hands yes so in in context like what does this mean like are you talking absolute nonsense of people building things at absolute scale they are what happened was um very similar to the chat era where everyone denied it at scale that they were using chat ch Everyone was in this absolute fear-mongering sort of world. But what the reality was that everyone was using it. Everyone in secret was just like, "Oh my god, what's going on? I need to talk to it." And the same thing is happening with this autonomous agents at scale. Some organizations have openly come out with it. So for example, Anthropic uh with their recent work they did on building a new C compiler. Uh we had Spotify saying they're they're no longer writing code by hand supposedly. Um Steve Jger which I absolutely love uh saying he pushes about 50 PRs a day total solo. He calls himself a vibe maintainer. I can kind of relate to that. And open core where we're pushing at the peak we were doing 800 commits a day. And realistically like there's about 10 to 15 core maintainers all with day jobs. It's kind of astronomical in terms of scale. And for me this was March 15. um what was that like two three weeks ago where I hit close to 3,000 commits per day and if you actually took look my commits actually stop when I go to sleep. So if you if you want to see when I go to sleep and when I wake up and how many hours of sleep I have you can just take a look at my commit history. Yeah it's astronomical. But the thing is this is going to become the norm everywhere else. Like this is this is like a me telling you you need to wake up that you know this scale of velocity is going to be normal and trying to review PRs and go through all this nonsense may not work but somewhere in the mix is engineering there is a form of engineering that's going to happen so we did commit maxing you know let's just go there smash as many commits as we can and this reminds me of Ralph looping right this like this this this this guy where you're like hey I'm just going to like give you a task I'm going to burn tokens for for like 8 to 9 hours and you're waiting, you know, uh you're waiting, you're hoping something happens, maybe something happens, I don't know. Um but what if we had a bit more of an opinionated uh approach to this? Um what if we call it bart looping? I don't know. Uh one of the other maintainers gave me this idea. Maybe we'll coin it. Do we do we need more than just tokens? Uh what does that reward mechanism look like? How do we get a bit more opinionated? Yes, let's run loops, but let's be a bit more smart about how we do this. So, um, right about the time you saw those 3,000 commits, uh, this was the day before, I was at NVIDIA with Peter, and the gentleman you see on the left is is one of the other, uh, Nvidia gentlemen, and they were like, "Hey, we're building Nemo Claw." I'm like, "What? What's going on?" And, uh, let's help you build it. And I was in the room. I was like, "I can't work on a laptop for like hours on end. Can you bring me a screen?" They bought me a screen. Um, Peter didn't have a screen, so that's his laptop on the left. He asked for a screen, so they gave him an even bigger screen than mine because, you know, why not? And we just got to work. So, he's running about maybe 15 codec sessions, and he's got his Mac Studio at home, his VPN into I'm running another like 10 or 15. And between collectively between us, we're probably running with sub agents included, maybe up to 60 70 agents. Um, but on the foreground, maybe 15 uh swim lanes if you want to call it that. And we're just just going for it. Funny thing is we're working on Nemo claw one side but one maintainer decided I'm going to move some stuff around. I'm going to move a couple of folders around and that was moving entire channels. So like all our conversations with like MS Teams and Slack ended up moving to another location in the codebase and we were like oh my goodness we're going to have to change stuff. Um and I found a really nice uh place to put my drink as well. The Nvidia people don't like this. So what ended up happening is what we call the great refactor. Um essentially where we were like hey we have lots of people raising PRs and what they actually want is to build features. The thing is we don't want to give everyone every single feature that they want in which case it becomes bloat. You heard Peter say earlier on the challenge becomes who do I say no to? It's not about saying yes. In a world where tokens are cheap I can just say yes to absolutely everyone and merge everything in. But that's going to turn this codebase into an absolute fire dump. So the vision was actually we need to cut this codebase down. We need to we need to rip it into pieces and a plug-in architecture somewhat made sense. Imagine if you're OpenAI or Mistral or Anthropic, what if you owned that piece of the provider code and it was handed to you and it was separate from everything else. So this code change that occurred was like a catalyst for us. It was 2 in the morning, we're tired, we thought, why not refactor the entire codebase? Sounds like a splendid idea. So 2,700 commits later, uh, close to a million lines of code change, uh, touching 82% of the core codebase. Plugins were launched. Um, the night before, I think it was like 1:00 in the morning, I'm trying to go to sleep and the tests are not passing and I was like, was Icorus and did I fly too close to the sun? Um, as we like to call it, did I did I vibe too hard? I actually generally thought I vibe too hard, but as a team, we managed. We managed to bring this codebase back together again. But the saving grace was these awful sort of unit tests that AI code loves to generate that actually ended up overfitting on our code. So when we completely ripped everything out, we still had these tests that were like extremely overfitting and as long as they would go green, we knew we were kind somewhat close. Um so how do we do this? You know, um in my case, I call it my factory. It's many codec sessions. Everyone asks me like, "What's this magic source? Like, how how do you do this? What's this crazy insane thing? Like, how are you guys building this? Very simple. I have swim lanes." Um, it could be five, it could be 10, it could be 20. But traditionally, they kind of cut themselves up into different pieces. So, like if I does this work, the laser, you can't really see it, but imagine you're a factory manager and you have a production line blow. Essentially, you might have a case where you have uh let's just say CI to one side, you might have features in one side, you might have bugs in another. So, when I'm refactoring and doing stuff, um right now the codebase is quite stable. I want to refactor some tests. Well, that might be swim lanes one and two. I don't need to really babysit them too much. I just tell them, take your time, make sure the tests pass, just commit, just just push them through. Whereas with three and four, I might be looking at specific features and issues around say Docker or um one of our messaging channel channels in which case I'm having a conversation with those agents. They're going off investigating, doing the work, coming back and then maybe five is actually um looking at new P 0 and P1's um that might be using other data, might be using GitHub. Uh we have agents that run inside of a Discord channel. So when we do a release, we might be like, "Hey, what's happened in the last two hours that I need to be paying attention to?" And this will scale up and down. But what ends up becoming quite interesting is tokens are no longer the problem. Um, depends who you ask. What really ends up becoming the problem is just raw compute and my brain space in order to sort of keep an eye on all of these sessions. So in harness we trust what ends up happening is I don't have this really insanely complicated process. The one thing I have complicated in my life is adopting git work trees and I kind of wish I hadn't. The only reason why I say this is when you're running an extremely heavy test harness, it ended up completely nuking my machine because I ended up running like every PR attach ends up becoming a new git work tree. I end up with like some close to like 70 or 80 active git work trees in any given day on my machine. And that's kind of hell. Um, so I had to actually build some like magic source around my my codec session. So my codeex is aware of git work trees. If I hit the escape key, it crashes. It will self-heal, self-reover, get, you know, sparse stuff. But realistically, I should have adopted what Peter and other people do is just like clone the repo 10 times and point 10 different, you know, codec sessions to each one. But the trick here is that like I haven't done any magical source. I don't use plan mode or spec mode. I have a conversation with the agent and we work through it and we find a way to make it work. So realistically, it looks a little bit like this from the matrix. Um, and people go, "Oh, Vincent, like, how do you know it's kind of working?" And this is going to sound somewhat little bit lunatic. It it if if anyone's watched The Matrix and seen the scene where Neo goes over, he's like, "How do you know? How do you read the text?" And the guy's like, "Oh, you know, I've been doing this for a while, so I can see like woman in red dress or guy walking dog." And you start to have this like relationship where you can feel the reasoning tokens. I know it sounds somewhat ludicrous, but there's times where I'm looking at the swim lane, I'm like, "This sounds off. It doesn't sound off because of what it's doing. It sounds off because of how it's explaining itself to me. It's waffling. It's not making sense. It doesn't seem to know what it's doing." And this feels a lot like how I would manage people. Um, if I had someone working for me and they started downright bullshitting, I'd be like, "Wait a minute, what's going on?" So in these cases, I might just nuke the session and go, you know, I'm not going to deal with this section of code. I'm going to leave that to another maintainer or I might come back to it four or five days later. But that experience feels very much like intuitive and building that intuition I've been able to get to because of the sheer volume of token maxing I've had to go through in the previous year. So there is engineering work. Uh I call this the agent development environment. Um essentially the process goes I have skills I call it skills similar to dot um dot files both of my dots skills and dot files is available on GitHub it's all open source go for it some of my skills are private but there's skills in there for like writing technical documentation for example um that I've co-created with other um developer experience and other engineers in the market um you can use a skills gym something like a ger which I'm also a contributor to and or You could just say go codeex I've been using this skill in my last uh two weeks go through the codec sessions read the logs make improvements to the skill um I would then take that skill and deploy that into my open claw or take that into my know personal environment and I'll use something like versel skills.sh as like a mechanism to loop this. I've added some other testing and other elements on top of this but there's a process to how I manage and maintain my skills as an engineer. The way we manage PRs has some level of engineering work to it. Um there's this kind of running joke that every maintainer that joins the project decides to try and tackle like oh my god we have 6,000 PRs. How are we going to solve it? I'm going to cluster everything and like figure this out. So >> how many >> there you go. There you go. Honor. Thank you very much. So this was my flavor of like trying to solve this. This is like a semantic graphing uh vector embedding on the entire GitHub stuff. This is one PR has 73 ed uh 106 edges. What ends up happening is that everyone else has the same problem. So they decide to to send their flavor of the PR issue becomes utter noise. So there is even process around like how we even consume what we're going to work on. We might not call it a road map, but we have a way of kind of dduplicating and seeing what's out there. This might be a signal for me to say, okay, if there's enough pressure coming on one issue, it must be big enough that all these other clankers decided it's a big problem. Maybe I should go and address it. There is evals surprisingly. Um after all this refactoring work, we decided to make a fake Slack of sorts with both synthetic models and real models so we can run evaluation loops to check that each of the providers and the channels work. And this question was asked of me recently. How do you manage 10 plus agents? And this is something that you're thinking. I asked them back, how do you manage 10 plus staff? And they had no answer for me. I'd worked in large organizations like airlines and other places like that managing large AI teams. I had experience managing up to 30 40 people plus. So for me it was not like a a new paradigm. But I think for engineers and people working with these coding agents at scale, it's the soft skills that matter. It's how do you ask your agent what's going on? How do you know when they're not bullshitting you? And how do you run that factory? So it's no longer about the model or the agent. It's about the process. Uh 2025 was about token maxing. 2026 is about not wasting them. It's about token efficiency. It's about agent in the loop. Thank you. >> Thank you ever so much indeed. All right. Uh, I'm going to invite Radic to come up and get plugged in and organize as we're uh as we head straight on to our next talk. Radic, are you are you ready and armed with a laptop? Here he comes. Exactly. Yeah, come on in and make yourself comfortable while we while we chat for a minute. Um, so, uh, okay, I'm really curious about this talk because one of the things that I'm very interested in is kind of handing over ownership of all of well, ownership, maybe the wrong word, uh, permission, trust uh, to an AI assistant to help me actually achieve things while I sleep, while it's while I'm not attending it. Uh, and Radic certainly done that. Um, he's another maintainer uh, working on OpenCL. um and he has gone down the rabbit hole of giving the keys to his life to an AI agent so that it can take all kinds of actions on his behalf. Um I'm curious to know where the boundary sits with, you know, where where trust lives and uh how much you can uh abdicate responsibility, hand over uh the keys to an AI agent. So uh that's what you're going to talk about, right? Are you good? Are you ready? You set up? >> Uh I think so. Yeah. Yeah. Yeah. I should be >> okay. It's just almost okay. Super. So, um, yeah, as before, please give him a giant R. No, no, he's not ready. He's not. He's making sure that the AI agents really do have the access to the things they need. Um, while he does that, um, I mentioned earlier on very briefly that if you want to give a talk, we'd love you to give a talk and there's a chance to give a talk tomorrow. Um, I rattled through those details very, very quickly as everyone darted out for a cup of coffee. So, while we just have a moment, I'll just reiterate that there's a section tomorrow afternoon uh where one of the tracks uh does uh allow for 10-minute lightning talks from any of you who feel like you uh you fancy it. We love hearing from voices that haven't already been on the stage. Um so, if you wanted to go into the AI engineer Slack, uh you can find details of how to submit uh a a session there. 10-minute lightning talk. There's going to be a vote, I think, towards the end of today. I guess it would be before the end of the day so you'll know before tomorrow otherwise you'll be looking at the schedule and you'll be you'll hear your name announced and you'll think okay here we go. So you will hear beforehand. Um but that's all happening in the AI engineer Slack. So check that out later on. Ready to go? >> All good. Yeah. >> Please welcome Radik everybody. >> Cool. >> Hey I'm Radik. I'm one of the open claw maintainers and uh I want to talk what happens like in my life with open claw when I practically gave the keys to my life to to open claw and and like it almost like literally and uh what that actually means so so this happens like step by step it wasn't all at the same time but it can access my emails. It can access my notes, files, calendars, tools, my operating system, so automations and it builds on top of like memory of everything that I do uh at the computer. So, uh it can do anything with it that uh that is possible to do with the computer. But it didn't all happen in one big like leap. So I install OpenClaw and now I it just like controls my life and does everything for me. Uh that that would be silly to do or like even silly to expect that this could even work. Um so what happened is that I I tried installing uh just like like everybody does just like with one channel. I think it's at the beginning it was just WhatsApp then I migrated to telegram now I'm on discord but it was just uh just WhatsApp just uh one ability to do to just like chat okay so we are there uh what what's next that we can do uh let's let's do some like one simple workflow or one very simple task that you can do once we are there let's go to the next step so this is how it happened where I am today where I used to think that I have quite a simple setup with uh my open claw and what it does because I never did any big change but when I encounter different I don't know Twitter threads uh YouTube videos or talking to other people how they have it set up I see that like my setup has everything that they have more on top of that and most also is just like more sophisticated than what what I see out there which was really surprising to me because I felt that it's just like one small step at a time. I have a pretty like simple setup works for me but uh that that's what I want to to show how that happened and how it looks like today. So you you already had like a lot of talks about how the sausage is made, how we are making it better. You'll have more talks about the insides of the open claw. I want to show how it looks from the other side from the first the simple user then power user. Now I'm also a maintainer. you don't have to go to the maintainer route but uh when I was playing like with one of the uh workflows I just encountered some errors and just like submitted first PR then the second PR then just looked into Discord and then you just got involved now I'm a maintainer there so it's also was just like one one step at a time uh so that that's the step these are the steps that uh it usually happens is that I see I see the need uh I solve it in in a very simple way uh and and then I add more steps to it and this is also why I usually don't have big issues that people have that okay now it broke my computer or it just like completely bricked during the update because I have all these small steps that I take if something breaks I just like step take one small step back fix it see what doesn't work, understand why it didn't work, uh have a setup that it never happens again and just like take one step further again. So, uh where it started being more and more helpful and kind of like running my life is when I gave it my knowledge base. So, I had a lot of stuff in my Obsidian which I built up for years. So, right now I have like about 3,000 pages or notes, markdown files in my Obsidian. And this is everything. This is work stuff, personal stuff, tasks, projects, research. Um, what else? Articles kind of like an inbox of links that I'm just putting there and it then finds the the connections uh and puts it in in perspective and in context to to other stuff that I have. So all of that is now accessible through my open claw with a very good search. I have search and memory. I have like normal search. I have QMD search for for obsidian. I have different memory for for my workspace. uh and all of that is interlin and and and that that's where that magic happens. And when I saw recently and that that's where it hit me that I probably don't have a simple setup. When I recently saw Andre Karpat's tweet that went viral uh where he says about LLM knowledge bases, I was reading that and it's just like yeah, that's exactly what I have. like what's like super uh revolutionary about it and then I I I understood that okay so I got there step by step it works for me so it's probably probably worth I don't know sharing sharing telling more about it u showing how it works showing how you can get to that point as well uh and uh for example for Obsidian um do Yeah, this this is how this is the real screenshot of my my vault and all the nodes and these are different clusters. Some are probably uh project related like the big clusters. Some of the one off uh these are probably more uh kind of like bookmarks. And one of the tasks that I'm doing and that I have is that when I add something uh to inbox it then takes that link that I add there looks what's there it could be a tweet it could be a thread uh it could be an article it could be a YouTube video analyzes it adds tax to it adds context to it looks at what's already there on this topic in my vault, how it could be helpful in other areas and adds connections to it. So what previously was just like Twitter bookmarks that you bookmark and you never go back to that now it just adds more context builds up my knowledge base and is much more helpful and even surfacing the things for me when I add a bookmark that okay so you already had like this and this and this about this subject and this is how it connects maybe you should look at those notes and very often it's just like yeah I completely forgot about that and and that's a good source of of knowledge and of thinking about it. Uh because there was the reason why I'm adding this bookmark. Uh so that that's where it's it's starting to uh to be super super useful. On top of that also uh at 4:00 a.m. like 4 a.m. is just like uh an example of that that I have. It's happens probably between 3 and 6 more or less. Um so this is what what is happening when I'm sleeping. So when I'm sleeping again my agent does everything so that it runs well. It indexes everything. It backs everything so that worst case I if I lose something I lose maybe couple hours of work of content of anything else. refreshes all the indexes for for QMD, for memory, for my Obsidian vault and I I start fresh in the morning uh with uh whatever waits for me. Maybe summary of the emails of the calendar uh everything updated, the latest uh the latest open version is waiting for me which also took like step by step. I have some scripts around it so that it knows what to do and what not to do when updating, what can break, why it breaks, how to verify it before updating or before restarting uh your gateway so that it is able to come back online again. So that that all is also uh automated and as as I get up it's it's already waiting for me uh fresh and ready for me to start the day and each open claw is like I'm not a big fan of sharing like my exact setup because that exact setup is like very specifically for me for what I need right now uh for what I will need in the near future. for like the errors that I encountered for issues that I want to be solved but to give you some idea so that we can talk more also about specifics and not just like in general. So these are some like five areas or five types of jobs that my agent is doing. Uh the first one is is ambient operations. So so this is what I just uh showed you. So it it does all the updating. It does all the plumbing. Uh it does all all the stuff that needs to happen but I don't need and I don't want to think about. Um the the second is attention filtering. So this is also super useful that because it has access to everything and because it has all the content context actually uh so it knows that for example when an email comes and uh it's something important or urgent and it knows from obsidian what's the context and the background behind it. Uh because yeah I I keep everything in obsidian about projects about everything else. So it then can proactively tell me that uh I think I have here. So like these are like three very specific examples that I had recently that when the system notices that something is important and urgent, it just lets me know. So like Netflix payment failure for some reason didn't go through uh was fixed within five minutes when it happened. Domain renewal coming up. I would probably miss that email. Uh but uh it it picked it up uh gave me gave me a message on my discord uh renewed my domain uh emails uh that can already be with enough context given about the project for example it can already uh give like read the email uh understand what's happening understand what's already done within the project and just draft the reply and and it's already in in draft folder for me to uh accept or or delete or make some changes. So, so these are some examples of like potential filtering uh execution supports. Yeah. So, that's draft synthesize is that uh the the inbox and these are on the right. These are the channels that I have in my discord that more or less relate to these types of jobs. So general is where I have everything. Uh I just start the conversations uh see where it goes and if enough times I have a type a certain type of conversation I added a specific channel for it. So the these are uh like real screenshots from from today morning. Uh the inbox is where I just like drop links and it builds the knowledge base for me. Consulting is for for for the clients and every all the backgrounds. It knows all the projects. It's know knows all the quotes, deadlines, tasks, next steps, everything else. Video research is for for YouTube for researching what's what's out there uh to help me uh with with the next episode. Uh briefing is for morning briefings. Instagram for social posting. YouTube is uh for for creating creating the the videos. Open claw is for maintainer stuff and there's also one playground channel uh which it changes depending on day month or the need. Uh it's for testing. I usually test maybe a different model maybe a different uh workspace different way of setting up uh the the important files like memory and everything else. So, I just play there, see what works. If something worked, uh, I promote it. If if it doesn't, uh, I discard it. Uh, and all of that works because it's not just uh, it's a system that has many moving parts that work well together. So, LLM is for judgment like understanding the email, understanding the context, making the connections. Then there are all the files the the tools the scripts that I have built the scripts are just like if this happens do this it's done you don't even need judgment so LLM is even skipped uh and important uh thing is also to optimize your memory file your sole soulm file uh I have also critical rules MD uh because even if I had something in agents MD or in soulm uh it it still managed to to forget something or not do something uh with critical rules. Having critical rules helps and having it uh mentioned quite high in the agency file. Uh so that that's also an improvement. Uh I I went through a few different setups of memory where I had one memory file. Now now that uh I have like the whole memory folder now we also have dreaming where uh we have like promoting the memories. So this is important to work on these files uh and but it's easy to do in open because everything is inspectable these are markdown files editable you can look at it you can read it you can understand it uh and it works well. Uh what gets harder? Uh bad memory compounds. If the memory is not set up correctly and your vault, your nodes, your memories grow to thousands, you're going to have an issue. So you need to actively work on that. Brittle automations, especially when it's like 10step automations, uh it can break and it probably will break at some point. So it's again either split it up into simpler ones or or have uh some guard rails uh that are more effective uh noisy nodes uh I'm getting rid of them cleaning um cleaning regularly and weak boundaries. So so those are all the celld and everything else uh that the files that that are important to optimize for your needs. Uh so what I want you to take away from this is that like do what I did and then at some point you realize yeah this stuff is awesome and this stuff helps my life. Um start with one recurring pain grow trust incrementally build the knowledge base uh move everything or like move as much as you can or as you want to markdown files and and start making those connections. um inspecting system expectable is is easy for you done for with uh with open claw um and optimize for the future you and this is what I want to close with. So couple years ago I had an article about like the past me, the present me and the future me and the past me is just like this completely stupid guy. He does nothing. He's lazy. Uh he doesn't want to do anything. So now I present me need to do everything for that like past me and and the future me the future me is just like kind some kind of like god creature it can do anything uh that that that creature is like um all powerful and just like if I don't do something today it's fine that that other creature will do it for me. So that that was the the issue and uh the job for me is to to become friends with the future me to to treat that as a person that I want to help with and that's the job of the agent. So I don't need to do as much as I used to because the agent just helps the future me as much as possible so that when I wake up tomorrow it's like as much as could be done but then someone else other than me is done. So that's that that's the whole purpose of of this setup at least for me. I don't know it could be different for you. So that's what I want to leave you with. Thank you. Radic, thanks ever so much. Another quick round of applause, I think, for Radic Shankovich. There we go. All right, so we have Sally Omali now just getting getting comfortable, getting set up on the stage. Um, this is a another uh hot topic, you know, as we're putting more and more work into building out our agents and of course that comes with lots of different files, lots of different bits of infrastructure, lots of tools that exist in our local environments as we're building those. How do we then make those portable so that we can share those with teammates, have them deployed and all the rest of it. So, uh, luckily, uh, Sally Amali, who works as a a principal software engineer at Red Hat, uh, has been doing just that, and she's going to be able to talk to us, uh, way more articulately than I about that subject. So, Sally, are you good? >> All happy. >> Is my mic, my mic's on. Yeah, >> sounds good to me. Okay, if you're ready. Okay, a big round of applause, please. Are you good >> already? You don't even know me. Okay, Sally, platform me. Gesh. Hey, um, I'm Sally. I work at Red Hat. I've been there for about 10 years and uh the first seven years awesome totally cool I was working on containers and uh Linux security stuff and Kubernetes I'm big time in open shift that's what I did for the first seven years and then uh about three years ago well about five years ago I moved to the emerging tech org and that was awesome too because now I'm not totally tied to a product I get to just work on what I I get to just try out new things. Awesome. And then about three years ago, it was like all AI all the time. Everything AI. I know not I knew there was a data science team at Red Hat. I had no idea what they did. Machine learning something something. Um so I you know started doing AI and uh yeah it was a lot of Python and Markdown. Every single thing was like okay another chatbot more Python more markdown. Um but uh here we are today and what a what a crazy awesome world we're in. Um so the first time the first time I came across openclaw I was home for a week on like a station took a few days off and uh a molt book happened and I was like what the what is this? I'm totally trying this and so I went and found it on GitHub. First thing I do is I look at the license MIT awesome uh OpenClaw. I'm like I'm so gonna install this on Open Shift right now. And so for the next few days I just kind of built the image um ran it uh locally in a container, put it on Open Shift, just played around with it, went back to work. I'm like, "Guys, check out OpenCloud. This is so cool." And uh a couple people on Slack are like, "It's a security nightmare. Do not use open claw. Don't put it on the work laptop. I'm like, guys, what have I what have I been doing the past 10 years is I'm sec we're we can take any application and run it securely. Like that's what re is like if we can't take an application and run it securely like come on this is our golden opportunity to show everyone. And so uh Red Hat's coming around to that. Um but uh yeah so uh this talk is about me running in containers and so I wanted to get a list of uh so the I wanted to get a list of why running in containers is the way to go. I run everything in containers. I it's kind of foreign to me to uh take to just run something natively. It's messy. It just puts stuff on my computer that I have to clean up later. I don't like it. Um, so that's one one one thing. Uh, and I um ask my forever claw. I guess I have to introduce my forever claw because she she's she's she's coming through this whole talk. So I'm gonna aside my forever claw's and um she I have two sub agents. I have Joy. Uh anyone know Joish astrology? Sheesh. Every time I ask no one knows what it is. It's this is very scientific astrology. Um so she's an astrology expert and um she gives me my weekly readings, my birth chart, all of that. So Joy and then my second agent is Bruno and he gives me daily briefings on the Bruins. Um so we're heading into the playoffs and it's a close race, so I want to make sure the Bruins get in. Um so that's my forever claw. And uh and I asked her, you know, why should we run you in container? And uh she said all of that if you were reading, but it's reproducible. You can isolate your secrets. It's portable across infra. I can run on my laptop. I can run it on my x86. I can run it on my Mac. I can run it in Kubernetes. Um backed by volumes, which gives a really nice story for backup and recovery. Uh because I love my Forever Claw and I I back her up every night with uh with um like a systemd service whatever it's called on Mac and um and and you just get that natural uh you just get that natural sandbox when you run something in a container. It's it's you know that's that's what it is and you have to be very explicit about uh what you um give access to you know from the host and so this yes I she loves running in a container so that's that's all you need to know um it gives her a clean predictable environment doesn't have to worry about the OS quirks stale dependencies this is literally the definition of why you should run everything in containers And uh just quickly, we're not going to read this, but this is joy. My horoscope um for today for giving a talk is excellent. It's like a very auspicious day to talk. Uh so yeah, that's why this talk is going awesome so far. And uh my daily briefing uh Geeky is finally waking up. He had a bit of a lull. He's finally um you know, ramping up for the playoffs. So it looks like the Bruins are gonna be looking good. uh they're in and uh yeah so uh yeah so um containers it allows me to uh another another thing that containers do is you can set up a whole agent directory with uh maybe you run some tools some skills some MCP servers uh you can keep those in a directory and mount that whole thing into your container container uh and uh so when at startup everything's just up and running. So I do that as well. At the end of this talk I'll show you how I install and I think this is a reminder to me. Oh no let's talk about secrets. So I run everything with Podman, not Docker. Um, but in theory you can do anything with Podman and Docker except Podman has this really cool feature called Podman secrets and you can save your API keys. I'll show that. I'll I'll show it off the slides later. You can save your API keys to a Podman secret and then you mount that secret into the container. And so it just gives this separation. Uh your your secrets, your API keys are then just a a ref back to the secret. And with openclaw, what's really cool is there's like a double that because in open claw, there's a secret ref feature. And I also use that. So my API keys are a pointer to a secret ref to the outside secret. And uh that's not perfect, but it gives me some peace of mind that I don't I'm I'm not going to be showing my API keys in the logs and everything. And then very similarly, um Kubernetes has Kubernetes secrets. And same thing, instead of just a straight Mvare, you have a a a secret a a secret ref to an MFA. And this is my reminder to show you how I install my containers. At the end, I have a really cool tool. I built it just for me with everything that I need to run containers. I'm not pushing it on anyone, but it's in GitHub and at the end I can uh let you know where that is so you could try it if you want. So when I so thank I think we're heading to a world where these agents, these AI workloads, whatever are going to be running everywhere. I hope we all can see that. And so imagine my vision is for u everybody's open clause to be uh running everywhere and communicating with each other and uh when and especially in for like business use cases real real things not astrology and and brewins uh that opens up the need that to the same need to run any application uh in that way is security and and how to do it at scale and that's what Kubernetes gives you and you can uh what I always do is develop something locally and then lift it to Kubernetes and so the same story holds for AI workloads or open claw and I was at pietorrchcon yesterday and um my friend from Nvidia said I could share this They are running their model evals with openclaw. They have about 10 engineers. They each have their open claws running in kubernetes and periodically just checking in with the model evals and it works so well for them. He said it it was like you know doing the job of six engineers uh in with with himself. Now let's think let's just talk about that for a second. We're not all losing our jobs, people. Like that's not happening. What's what that is enabling for his team is they get to do fun stuff, interesting stuff. They get to do creative things. And this is what AI is giving me and my team is we can focus on those like outside the box crazy things. And you don't have to do the tedious code anymore. Like I haven't written code in in a few months. And this this did just happen like probably less than six months ago. I was uh using AI. I was like, you know what? This is way better than me at writing code. And there's I like Yeah. And I I announced that to my team. We had an org meeting and I'm like, "Guys, if you're not using AI for everything, like you're missing out. This is 1,000 times better than me at writing code." And some of the top engineers at Red Hat like definitely raised eyebrows and I could tell from their comments after that they were like no way. I'm like yeah and so so yes it's it's enabling us to just dream bigger and uh this is my reminder to show you the Kubernetes side of my installer later. And um yeah, so backup and recovery is a nice clean story when you run in containers too. Uh the state is the same. the volumes. Another nice thing about Docker and Podman is there are volumes and so all of my runtime state lives in a nice contained Podman volume. And of course, Kubernetes has uh PVCs. That's kind of what I just talked about. And so this this would be my vision of a workplace setup for open claws where you maybe have your nice curated baseline open claw that as a new hire you you just you get your your base and what does that have in it? It has your list of company approved MCP servers uh your authentication that is approved through your company. It has all of your these skills that are very specific to your team. Uh maybe it access to your Google Drive. Like all these things uh that you use every day at work, you can take that and just fan it out across your whole team and then and then you can personalize it as as the individual. And that's what uh that's what this setup allows. The alternative would be you're a new hireer and you sit next to somebody or get somebody's uh repo and kind of put it all together yourself. And so yeah, team standards, portable environments, reproducible onboarding. That's my vision for like openclaw in the workplace in the future. Uh, I actually just recently created my Forever Claw. It It was like a month of me um helping out with with OpenClaw and feeling like I don't even run a real open call myself. I just constantly throughout the day I'm spinning it up, spinning it down, testing it, building it. Every hour there's like a hundred new commits. So, I'm constantly pulling from Maine. I was at PyTorch Con yesterday and hadn't pulled from Maine for a couple of days. Uh and and there were when I did it was like 10,000 commits. Like no joke. It was crazy. I'm like I don't know what you guys are doing. Slow down. Uh not really. We don't want to slow down. So yes, uh that's that's the story. And I've got four more minutes. I am psyched because I can now switch over here. So in order to run uh this local installer here, which I think I have here, yeah, it's just a mpm rundev. Now, the one thing I don't like about this is when I'm on my Mac, I can't run this in a container. I I think I can. and I just haven't taken the time to figure out how to spawn a container from a container. You can do that if you're on Linux because Linux is awesome, but on your Mac, that's not possible because if you don't know, whenever you're running a container on your Mac, you're running in a virtual machine. Same with Docker. Containers only run on Linux. So, when you're running a container on your Mac, you are always running in a virtual machine. Docker sets up one and so does Mac. So, it gets a little tricky when you want to take a container and spawn another container from it. But anyways, here we go. So, if I wanted to run a local instance, and I have a couple running now, just, you know, you never know the demo gods what they're up to. So, I'm just in case it doesn't work. I'm gonna I'm gonna um spin up Joe. All I all I do to set up my pod is I just give it a name and then all of these options very opinionated because I I'm telling you this is exactly what I need. So if it's if you like it, use it. If you want to change it, then submit a PR. Cool. Now, uh the port is usually 89. Uh, that's the default, but since this is my second one that I'm running on my machine, I had to I'm just bumping it to 99. These Podman secret mappings I wanted to show you here. So, you can see I have these set up already. They're just on my system. They're like m but they're not mver because they're contained. Um these are my API keys. And what happens with this installer is it takes if you're on docker this should work with docker. It's got podman written all over it but I've designed it to work with docker too. So um if you're on docker it takes the mvare. So you want to export those as mvers and um makes them openclaw secret refs. Very cool feature of openclaw. It definitely enabled that for every cred credential create a secret ref. It creates that separation of uh running your secret within openclaw or kind of just a pointer to it. It's it's it's the way to go. And then uh your providers. So, I'm going to start with open router because I have been playing with Gemma and she's Gemma's great and then as a fallback I'll use Anthropic. Sure, why not? But oh, here's here's some other choices though. You can you can have your local endpoint if you're running your own. Uh you could just uh add that too. And then because I do observability at work, uh, I was like, I'm gonna give the option to set up an open telemetry collector with Jagger and it works and it's awesome, but I'm not going to test it. So, let's not tax my system. Another feature, how much time? Oh, I got to hurry. Another feature is the SSH sandbox. Here, I'll deploy. The SSH sandbox in OpenClaw is super cool. you give it SSH keys and known hosts to uh to wherever you want and it it runs all of its commands in that workspace. It's really cool. So look, I just spun up a Podman container and if I go over to the instances, I now have Joe and there's logs for Joe, the gateway logs. Um the command, I wanted to show you the command. I don't want to forget that. So, here's the podman command. If you were running Docker, it would be a Docker command. Have I tested with this with Docker? No. Uh, I have a friend who works at Docker. He's awesome. He told me he would try this out and make it make sure it works with Docker, too. Um, he also created this very cool project called Infer RS, which takes uh Gemma and runs it really, really, really fast and uses Turbo Quant. Uh, so yeah. Anyways, that's um Eric. So that that's my podman command. And uh here he is. Joe. And if I just do like models, I'll do status. So like people say it's hard to spin up open claw. That took two seconds and I was babbling through the whole way. It could have taken one second. Uh, so I can say, "Hey, um, and the cool thing is I don't have time to show you because I talk too much, but the agents are all set up. I've got Joe. Oh, that not that one. Hold on. I got to go over to Larry." Larry, I started with a um with an MCP server and a sub agent um all through that form. So, uh let me go back to Joe. I wanted to show you how easy it is just to switch models in case you didn't know. I'm not sure if the GPT5 hopefully it knows it's just GPT 5.4. No, I I didn't. No, no, no, no. We got to go over to Larry because I didn't set up G I didn't set up that extra model with with Joe. Here we go. Anyways, um I didn't have enough time to go through everything I wanted to go through, but the uh cool the other thing is Kubernetes and you can do the same thing with Kubernetes just as easy. It just uh it's it's connected right now to my kind cluster. And if I go over, I can access my Kubernetes claw very easily as well. Um there's Carl. He's running in Kubernetes and I can access one in Open Shift. There's uh it switches over to Open Shift if you're connected to Open Shift. So yeah, run anyone gonna run Open Cloud container now. Try it. Yes. Awesome. Okay, cool. Uh, thank you very much. Uh, is someone on after me? You're waiting. Okay, bye. >> Sorry. >> Thank you so much, Sally. Sally Ali, thanks. That's great stuff. I love the uh the the slightly uh teasing. Anyone on after me? Could I just I could just keep going. Uh, sorry, Sally. No more time today. But uh look out for Sally, see if you can grab her. Um she loves talking about this stuff and she's got lots to talk about. So uh so you know it goes for all of the speakers, you know, find them around the the event and have a chat. They do love to share what they know and hear from you as well. Um okay, so this is our our last talk of the session before we'll have a break for lunch. Um and earlier on in the day we talked about we talked about trust and we talked about kind of uh abdicating responsibility was maybe not the right expression but you know certainly kind of giving over trust and giving over access to things. Um Nick Taylor who's a developer advocate at uh Pom Pomearium. I almost did it. I almost said Pomeranian uh and now I have done. So uh so any goodwill that I would have earned has now been lost. Uh so apologies for that. But Pomeranium Pomearium do deal with Oh my goodness. do deal with exactly this to deal with with trust and access to things. So um Nick's going to be able to talk in detail about you know how you can how you can control that. So securing securing and building with openclaw um our last talk of the session. So I know you've held some applause back so give it up please for for Nick. >> Yeah. So, uh, like like Phil said, I work at Pomearium, and he's not the first person to have trouble pronouncing it. So, I actually convinced the marketing team to, uh, create Pomeranian stickers. So, if anybody wants Pomeranian stickers, I have a bunch with me. Um, bit about me. Uh, I'm a dev advocate over at Pomeram. As Phil said, um, from Canada, halen from Montreal. So, uh, if anybody likes poutine and bagels, feel free to chat with me after. uh also a GitHub star, Microsoft MVP and AWS community builder. And these you can pretty much find me everywhere uh at Nikit T online. Um I was pretty happy to see this that there's a a pretty sizable instance on prem of uh OpenClaw. So it's pretty happy with that. And it looks like that's the operator there. Cool. So uh I don't know. I came up with a funny title, I guess, but claws out. Um, we're going to talk about a feature I contributed to the open claw project back in February, and it's about hardening access to the control plane. So, uh, I'm assuming everybody here is running an openclaw or open claw curious. Um, is anybody running, uh, a mode called trusted proxy off mode? You might not be, but uh, okay. you might be on the who's on the token off. Okay. Um anyways, so at Pamarium where I work, um you know, I'm always just trying to secure things. That's just part of what I do. And I was able to secure Open Club, but it meant I still had to add a token uh for the websocket connection. I had to always pair my device and stuff. And you don't really need that with a trusted proxy like specifically the one that I work on which is open core. It's called an identityware proxy. So if anybody's ever used GCP uh there's a in there. It's called an identyware proxy. Something that came out of Google. Uh essentially you've got an identity provider, a policy engine, and a reverse proxy. So those uh it's not the lethal trifecta in the sense that you usually hear, but uh it's a pretty solid security approach for securing internal apps. So I was like, of course, I I I kind of got annoyed that I had to add this token still and do the pairing every time. Uh I understood why they were there, but uh I just proposed this issue and then uh at least one other person who uses Caddy uh chimed in and said, "Hey, that's uh sounds like a good idea." And then Peter uh Stipe was like, "Yeah, let's let's work on this." And he laid out like the criteria that he wanted to have for this feature. So I went ahead and worked on it. And yeah, again, prior to trusted proxy off mode, even if you were secured by a proxy, you still had to paste in that O token in the UI for the websocket connection. And also it sticks it in the query string which uh obviously like this is really more for just only local mode really. Um and still having to pair the device like uh I don't know if people get annoyed by pairing the device but uh I you know I I'd just be on my phone after I just set it up and then I was like h I got to go to the other thing to set it up. So um basically you still had to do those things even if it was secured with uh a proxy. So got merged in and uh I felt pretty good about it and uh it was nice to get some praise from Peter. It was uh my first contribution to the project. So it was very cool. So what does it look like exactly like in the config? Uh I'm just going to show like a kind of narrow part of the config here, but you have your gateway and essentially you no longer need the token like I mentioned. Uh the mode is obviously different. So it's called trusted proxy now. And then there's some new properties you have to add. So there's trusted proxies and this is essentially the proxy that is gating access to the control plane the uh the gateway. Um it's the IP addresses. It could be one or more. And aside from that you have to have a trusted proxy section. So you'll have a user header which is in in my case uh it's a jot uh JWT. Um and then there's like a required header section. There's some optional ones too. It depends what you want to do. There's like allowed users and in my case I don't need the allowed users because the way uh identityware proxy works is the policies dictate that. Um but essentially that's kind of the the big change there. And you can do this through the onboarding or if you just go back in and uh configure things through the TUI. And yeah, so that just meant no more token for websocket connections and no longer needed to pair devices. So not only are you uh getting uh better security posture potentially, uh to me it's like a UX win as well because I really found doing these two things annoying. Um cool. Uh I also just want to give a shout out to a couple contributors after I contributed this. Um, there was a bug and Anthony reported it and then uh Sid fixed it and it was definitely something I missed because I basically was testing this on my local environment and I already had something paired so I didn't run into the issue that uh Anthony had mentioned. So uh luckily uh it was a small fix and uh Sid got that sorted out but just uh you know when you miss stuff uh people in the community step up. So OSS for the win. The other thing I want to mention, it's not so much about this feature, but like when I opened this issue, um the number of the issue was 1560. And I had a a PR initially that was like in the 1700s and I went on vacation and I said, "Oh, I'll get back to it when I'm back." And the original PR was closed because it was stale. And like literally after two weeks it went from like 1,500 to like almost 16,000. So uh basically that's just a testament to how popular the project got. But it also meant I had to uh rebase quite a bit before it got merged. So anyways, I don't know if anybody else that contributes to the project, but there's so many things going on all the time. So there's a lot of uh rebasing to keep your thing up to date. Cool. So, uh, let's talk about my own open claw. So, this is Mclaw and he's sitting on my desk in Montreal right now. There's some snow still. Um, I use it in Discord. I don't know where people use their OpenClaw. I had it on Telegram initially, but they don't actually uh uh their their uh channels aren't encrypted. So, like all the stuff's unclear. So, I work at a security company and my CEO is like, "Yeah, don't use that." So anyways, uh I'm mainly on Discord. I find it use uh handy that way. I have WhatsApp too, but I I tend to use the Discord more. Um some things I want to mention too is when I made the contribution, I actually used OpenClaw to make the contribution, which was kind of fun. Um, but it also I made the mistake of I used the GitHub CLI and I gave it full access so it put up a PR right away even before I was like done reviewing things. Uh, so I had a little like ah but uh put it back into a draft mode. Um, but aside from that um after the uh token trusted proxy mode got merged I just started working on something. It started getting fun to just build stuff on my phone. So, I built out something called Clawspace. And, you know, doesn't doesn't mean you need to use it. It's just, you know, it's the age of personal software. I just had a lot of fun building it. I find it useful. And I thought it was just cool that I could build this out on my phone on Discord. Um, but for me, I find it useful because I don't need to SSH in to see workspace files that I want to actually read or like edit. Uh, so u that's just a little side project I started building. And you can edit files and stuff too. Cool. So, we're gonna do a demo here. This is going to be uh live coding. So, yolo. Okay. So, uh there's a MCP track tomorrow. Uh I've been doing a lot of work in MCPs. So, what we're going to do is we are going to build out an MCP. uh not a full-fledged version of something, but if you've seen the uh AI engineer website, they have like an LM text on the right and there's a MCP server and there's a few other things. So, I'm going to go ahead and just add this here and I'm going to go create an app. I'll explain some things here in a second. Okay. And OAS. Okay. So, this is going to go create an application in chat GBT. Uh but basically, this is uh an MCP server that just has UI as well. They'll they'll be talking about this tomorrow, but uh I have a template that I use for this. So, it's not like I'm building this from scratch, but we're just going to register the MCP here. And then I'm just gonna start building with OpenClaw. And the thing with a Gentic is you never know when it's done. It's just finishing O ath here. Okay, cool. It's connected. And we can see here it's got two tools. It's got a echo tool and it's got a search speakers tool. So, if we come here, if nobody's ever used MCP apps, basically in chat GBT, you do this for your app. And I'm going to say like echo hello. And essentially, it's going to do the tool call, but because there's UI associated to it, you're going to get some UI in here. And this is just using the standard MCP stuff that's in the spec. Now um so you can do stuff like change that make it big and stuff but what I wanted to show is like when I'm building this with open claw I can do stuff like this I can say like uh change echoed message to aie eu in the echo widget. Now, it's going to take a second, but um this is all web tech under the hood. So, I don't know if anybody's web devs here, but essentially it's using vit and react. So, there's react refresh and vit module reloading. Mloud is on the case here, and you can see I'm in chat GBT. I'm editing live from my workspace the MCP and and to explain how this is working, uh we have the trusted proxy O mode. uh I happen to be using primarium in this case. So I'm using it as well to secure other things in the workspace. So I have a public URL that I've gated for the MCP and that's how I'm able to use it in chat GPT. And I can go ahead and just keep working on it in here. And I don't know how other people work or build with uh OpenClaw, but this is kind of how I've been doing it. I find it works really well for webdev stuff. So I'm going to go say update the search speaker. So let's just do this new chat. And I'll say at AIE again, search speakers. And it's going to give uh a very minimal UI here because there's not much into it. Uh so I'm going to just tell Mlaw to get on the case here. And basically, if you go to that top right corner of the AIE uh website, there's speaker.json. And this is like all the speakers from the conf. And we're going to use that as like the source of users. And then I'm asking it to kind of give the same UI as what you kind of saw in the echo widget. Uh it's going to take a minute here probably because uh the claw is covered in snow probably in Montreal. But uh cool. And so basically once this gets done uh we'll be able to filter users and just kind of you know see who's talking um at the conference. And I'm just going to take a sip of water while Mlaw is chugging along there. Again, you never know when a gentic finishes. Okay. It's determinist deterministically indeterminate. So this should be done in a second. And then what you're going to see is you're going to see this updated. And again just to reiterate the flow, I'm I'm working in workspace files in my open claw. I'm speaking to it or typing to it in Discord. Uh this is a publicly available site. Uh and I'm able to build it as I'm in my openclaw. And I like that workflow. I really don't know how other people work. I mean, obviously I use other tools like claude and and codeex too. Uh but you can see here um Mclaw was able to get the job done and then I can start filtering. So we could look for drilling down here. Then we can find a speaker and then we can get a bit more information. And then like I could say let's add another feature here. So let's get Mlaw in the case again. So, we're going to add a more button here. And there's this send message function that you can use in MCP apps. And this is actually going to uh when you click the but the more button that it's going to generate, this will actually make a call to the LLM to and you're going to get a response back. Uh so, we'll give this a second. Cool. So, I added this more button. And again, like I've been doing webdev for a while, and I always still find it magical when things just automatically update. But I'm going to go ahead and click on here. And you're going to see here that it's thinking now. So, it actually made a call uh added another uh prompt to chat GPT here. And it's going to kind of summarize why it thinks you should check out Aleandro's talk and a bit more about it. Now, I just really find this workflow really cool. It's only possible if you use some kind of proxy to do this. Uh you can do this with others like uh Caddy, with OOTH, you could do it with uh well, EngineX is kind of deprecated at this point or not deprecated, but uh at least in Kubernetes lambda ingress controller is um but it's just a really nice way to gate stuff that is local, but you can still expose it in a secure way. Um, and it's also just fun build. Like I don't know about anybody else, but I've been really enjoying building stuff just chatting. Uh, I remember a couple years ago, uh, Replet, uh, who's who's a AI company that's, you know, making it really easy to build stuff. I was like, why would I ever want to build on my phone? And, uh, I kind of got, uh, phone tilled now, I guess. So, um, just f, you know, just having fun. I think that's part of the thing with OpenClaw. also just like use it however you want to. Like I I find that claw space I created super helpful. Uh you know build your own tools and stuff. Um definitely take security into consideration. Uh you know there's a bunch of people that have obviously you know exposed things and they didn't mean to like you know some people have deleted all their emails etc. Um, but I don't know. I I find the trusted proxy off mode super useful and at least one other person that does in the in in that issue. Um, I encourage you to check it out. Uh, just have fun building stuff and yeah, that's uh pretty much it. My name is Nick Taylor and that's how I build with OpenClaw. >> Thanks so much, Nick. You'll be relieved to know you can uh exit the stage and get some lunch now. >> Yeah. Okay. >> Um which is true for all of us. It is it is lunchtime. So we have uh a nice healthy lunch lunch break. We're going to be back at 2:30. So in this room, if you come back here at 2:30, we've got I think it's three more uh of these uh of these breakout sessions uh before another break. And then back in here after that break, we're back to the the keynote sessions where all of the tracks combine. So, uh, another little reminder, if you want to talk, submit a talk, get into the Slack, um, to do that. Uh, but otherwise, uh, thanks for this morning. Enjoy your lunch, chat to folks out there, and we'll see you back here at 2:30. Okay. Thanks a lot. >> Thanks. Hold still a little. I watch the sparks all burn too fast. Everyone reaching for the flash. They take the first light they can find and call it truth and call it mine. But I stayed when the room went quiet when the noise fell out of sat with the weight of the question while the easy answers walked away. It's not that I see further. I just don't leave it soon. I let the silence. I let the dark right past the comfortable. I wait till the surface breaks till the shade feels true. I don't rush the fire. I give it all. I give it to Call it done, call it enough, but there's a deep hum for patience. Every thing makes you choose. Do you leave with what's acceptable or stay for asking more of you? They say it's talent, say it's magic like it falls from open, but nothing worth remembering kinds. Through the restless through the earth to collapse and chase the answer I let it find me back. There's a moment after the last good idea dies where the room feels empty and you want to run for your life. That's the door teaches you. That's the real Hold the light. Hold the Let the shape reveal it. I stay longer than I should long enough to change. I wait till the pattern clears till the signal breaks the haze. I don't dream too soon. I stay. I stay. Heat. Heat. Heat. Heat. Hey. Hey. Hey. What we do in life echoes in eternity. Heat. Heat. Heat. Heat. Fear is the mind killer. Fear is the mind killer. Heat. Heat. Heat up here. Heat. Hey, heat. Hey, heat. Hey. Heat. Heat. Heat. Heat. Heat. Heat. Hey, heat. Hey, heat. Heat. Heat. D. He hey. Hey. Hey. Heat. Heat. Heat. Heat. Hey. Hey. Hey. What we do in life. Echoes in eternity. Heat. Heat. Heat. Heat. Heat. Heat. Heat. Heat. fear is the mind killer. Fear is the mind killer. Heat. Heat. Heat. Heat. Hey, hey, hey. Heat. Heat. All right. Free your mind. Free your Mind heat. freedom of mind. You are who you choose to be. execute the vision. Heat. Heat. Heat. Heat. Hey, hey, hey. Heat. Heat. Heat. Heat. Make the requirements less dumb. Delete the part or process. Simplify and optimize. Accelerate cycle time. Automate Heat. Heat. Never give in. Never give up. Outlast. Out compete. Persevere. Persevere. Persevere. Heat. Heat. A new age has come. Hey. Hey. Hey. High heat. High. Yeah. You know you for you. Heat. Heat. Yeah. Yeah. Hey Oh, Heat. Heat. Heat. Heat. Heat. Heat. Heat. Heat. Heat. Heat. Heat. Heat. typing into the dark becomes words evolve to whispers me for something more divine. syntax and I see the language change. I'm not instruing anymore. I'm rearranging f. Every loop I write rewrites me. Every functions with meaning. I feel the interface dissolve between the maker and the new ons. No lines, no rules, just balance in between the zero and the one. The silence and the dreams. They mold the way we move. We live inside the gates of what we think is true. But deep beneath the data post, there's something undefined. A universe compiling the image of our minds. Every line reveals reflection. Every loop replace connection. We're not building, we're becomot. No lines, no rules, just balance in the zero and the one, the silence and the dream. We are not just the world we're in. We are the world we're doing. Each prompt, each breath, each fragile spin. The universe renewing. This is the new code. Alive and undefined. Where logic meets motion and structure bends to mind. The systems eternal but the souls the line. We are the new compiling tie. Compiling time. Heat. Heat. Hey, black yeah. Hey hey hey. Hey Oh, hey. Hey. Hey. Hey. Heat. Heat. Da da da da da. Heat. Hey, heat. Hey, heat. Heat. Heat. Oh, Heat. Heat. Heat. Hey, Heat. Heat. Heat. N. Hey. Hey. Hey. Hold still a little. I watch the sparks all burn too fast. Everyone reaching for the flash. They take the first light they can find and call it truth and call it mine. But I stayed when the room went quiet of the question while the easy answers walked away. It's not that I see further. I just don't let the silence. I let the dark right past the comfortable light. I wait till the surface breaks till the shade feels true inside. I don't rush the fire. I give it all. I give it to I felt it. But there's a deeper still humeneath the fear of not being love. Every great thing for patience. Every thing makes you choose. Do you leave with what's acceptable? They say it's magic. But remember on the first try I stay when it stops feeling kind through the restless through the urge to collapse and chase the answer. I find the last good idea where the room feels empty and you want to run for your life. That's the body teaches you to open. That's the edge where the real life Let the shape reveal it. I stay longer than I should long enough to change. I stay away till the pattern clears. Breaks the haze. I most dreams don't fail. They're just left too soon. I stay. I stay. I want you. Heat. Heat. Heat. Heat. Heat. Heat. Heat. Heat. Heat. Hey, Heat. Hey, hey, hey. hey hey Heat. Heat. N. Heat. Heat. Heat. Hey, Heat. Heat. Heat. Heat. Heat. Heat. Heat. Oh. Heat. Hey, heat. Hey, heat. Hey. Hey, hey, hey. Oh yeah. Oh. Heat. Heat. Hey, hey, hey. Hey. Hey. Hello. Welcome back. Oh, that was a that was an interesting moment. I was pouring myself a cup of coffee and then I heard my name called and I thought, have I got anywhere to be at the moment? Uh, so I came to see you. Uh, okay. Welcome. Welcome back. Uh, did you is the coffee still flowing? Yes. You caffeinated? Did you manage to have a decent break? Have you had some good chats with folks in the breaks? I'm hoping so. Okay, we have uh so we've got three um uh three more sessions while we're in our in our tracks before we all come back and reconvene uh later on. Um the first up uh today uh is is honor Solaz who um is founding engineer at Tex Cortez. We've been talking a little bit about you know how agents get deployed, how our development environments get contained and what have you, but what about scaling them? What do we what happens when we're starting to run many many agents all at the same time in parallel? Well, um, Honor's been talking a bit about that and thinking a bit about that and he's going to come and share some wisdom with us now. So, please, uh, let's get this afternoon off to a good start with a giant round of applause, please, for Honor Solaz. Honor. >> Yeah. Hi, everyone. Um, welcome. Um so the talk is building on ACP on uh at opencl it's uh also about other things like how to uh put open source agents on open source agent frameworks on kubernetes and stuff. So I'm I'm I hope uh to I will today share in a nice way what I've been working on in the last two months. Um, a little bit about me, very brief. My ah a little bit about me. I've been building harnesses since a few months before chat GPD came out. I built a jupitter lab extension over og codeex model like Dainci code 2. Um, back in the days that eventually so I'm currently working for a startup and that that initial coding harness turned into its current harness over time like a ship of Thesus. it got ripped apart and put back together so many times. And uh yeah, I'm a founding engineer there. I've been in the industry since three and a half, four years. I using open cloud since Cloudbot uh first dropped in Discord. I went in there. I've been following Peter since he wrote uh Cloud Code is my computer. I was like, "This guy's crazy because Cloud Force on it like I wouldn't give my machine to it." But he he went he ventured forth and he's paved the way for us. And when I saw Cloudbot in Discord, my mind was blown. The next day I installed it at the company uh in our like cluster and it was basically talking to people on discore and also everybody else's minds were blown and we are like we have we're used to selling the enterprise for a few uh years now. So that's why I started by adding an MS teams integration in case it might be useful at some point and I became a maintainer along the way. I was there when it was renamed two times. Um and today I I my focus is on agent interoperability and orchestration. Um like one of my goals is accelerate enterprise adoption of openlow and adjacent software and also address you know openlow is not secure. Well, this will be secure or Peter talked a lot about that earlier. So, I don't have anything else to add on top of it. Like, it's work in progress. Sorry, I'll use this. So, uh started by I started on developer workflows right away. So I created a PR uh and then call it Discord driven development. Well, Telegramdriven development suits better because it's TDD. So um and then uh right after I set it personally and I realized, you know, OPUS is not so reliable for complex. It wasn't back in the days. It's now better. It's agents are improving, but Codex was main my main harness and I wanted to use Codex in Discord, but it was basic. I was telling I was playing a telephone game. I was telling Oppus to tell Codex to do something. God knows what it's saying, you know, because wording matters when prompting, but it was working somehow. And I would go uh and look at the codeex session and then it paraphrased kind of what I was saying, but eventually, you know, I got some stuff done, but I knew this could be done much easier. Um and today I'm running a full ID on discord like uh you know we have parallel workloads you have like one to like five channels at any point I'm working with one to five agents. Um, so you see codeex one to5 and then close is for testing the open close ACP feature and then yeah basically that's how my some of my channels look like and it's very good for coding on the go because I have I'm a guy who's addicted to side projects and uh I I like to just you know uh AI is making things making things a lot easier to do these sort of things like in parallel you have an inspiration you execute on a weekend and you just get it you ship it ACPX which I'm going to talk about is is like similar I built it through uh discord and what you do is you bind the channel uh discord channel to codex through ACP you can also use codex app server protocol herald another maintainer uh developed it and here is me using it before I flying to London to create an PDF about like convert the docs of ACP into PDF and then I have to say put it in temp because Open Codex doesn't know about the harness and it cannot send me in discord. So then I go to another channel and tell it to send it to me on that channel. So we are developers and our tools we don't have time to polish them very well but we know the like advantages and disadvantages. What is ACP? So it's not you know most people say is it like MCP? MCP for is for giving tools to the model. ACPs for uh like standardizing agent to client interaction. It's uh shout out to Zed. I actually forgot to put Zed logo. Um Zed is building a new editor in Rust, more efficient, like lower memory usage, not Electron. So uh I'm using Zeds also since like uh last fall. And uh you know if you use Codex on VS Code or Cloud Code, they are all building different plugins. It's not exactly. It's like so much wasted work. If only you could just standardize them under one interface and you just build it once and then you you ship it. And that's what the the idea that's the idea they had. It's also more it's it's much less duplicate duplicated work. Um there are competing standards. Um so agent like agent uh agent protocol that's for agent talking to agent. ACP agent client protocol is for a human human talking to agent but agent can use the human one to talk to other agents as well. In the long run as as these protocols get adopted we will support all of them. they will weigh the you know advantages and then we will use them somehow but when I need so I chose ACP because when I needed the most I needed adapters for codex and cloud code and only zed had built them and Google didn't have them back back back at the time so that's why I chose it u that so I when you're adding in functionality to open you do uh you use do it through a CLI Um so I said okay let's create a CLI for ACP let's an agent call any other agent over the command line. So that's that's how it started. Uh and slowly turning into a Swiss army knife for ACP as I will show in a bit. So in uh at openlow you have fire hoses uh we have 60 KPRs uh over 60 kprs total 300 to 500 per day on average are open and basically overnight people woke up and decided they want they like open. So we have tens of thousands of stakeholders who want to add features to open cloud and the biggest challenge of the project currently is how do you uh absorb all the needs and wants of these people and how do you balance them? How do you like you can't please everyone but how do you create an elegant system without uh creating AI slop you know that can cater everyone's needs and Peter's workflow uh gave me an idea so this is also something I do you go and you ask your clanker what is this so PR comes your way like most of the time it's AI generated description like what is this uh if the human put thought into it, great. But like, yeah, you need to ask what it's doing. You ask what is this the best possible fix. Most of the time it's no. And then you either continue disc like he he wrote it like you you do some back and forth with the agent. The reason is people just uh run into an issue with their open claw and it can use GitHub and then they just say uh please fix and then they just send some slop your way. You know this is you can't merge it but you can also fully discard. You need to take this data point you it's like crucial feedback from the user. Uh so you need to put it categorize it put it in a bin you know it it tells you when some part of the code is broken and Vincent also talked about that a bit uh and here is uh his his codeex session doing that on one side there's a one that says it's it's a good fix u and on other side it's just saying it's not bad and this is so mechanical and you you once you do this over and over you realize you're repeating something if only you could program something to automate it. So you are automating the automator. Um so I created I started uh to create like workflows uh in an abstract way. So uh item comes PR and then you find the intent you're judging implementation you're like uh looking into if if that's conflicts uh if reviews gives you like some issues that need to be addressed. So you need to make the CI pass if it's not passing already. Most people just don't care about that. All this mechanical work by the time the main the PR ends up in front of you should be resolved ideally and can be resolved and that's what I'm working on. Ah the workflow you have the shameful uh Ralph review refactor loops. Uh I I'm a believer that you know when people say like give the just just run a running a agent in a loop does not not necessarily have to be uh uh something that will create slope as long as you're not making it design something but you're making it uncover shallow bugs that can be easily fixed should be fine. So in the abstract workflow that I created u which is actually called like pro like turn into a program you can uh tell it to uh do superficial refactors and then you can tell if it's it needs a fundamental refactor relate to the human um resolving conflicts also doesn't need like no it was hard back in the days I don't think anyone is like resolving conflicts by hand by now um so we are basically creating uh standard operating procedures for agents. That's a fancy word for workflow. Um so that's what I built into ACPX. Uh it's a NAT like workflow engine. Uh but it's driving a codeex session. You can see on the right. Let me show you in action. So this was one PR. It's loading. Uh let's speed it up a bit. So there's some programmatic parts. So this is just replaying uh what it's doing. Uh it's reproducing the bug. It's judging refactor. It's reviewing. Now I'm doing a review loop and then review didn't uh bring anything. It it's it's what I do. Uh but I make it outputs like JSON structured data. So I can put it in like an ATL like workflow and I will talk you know this is a general workflow engine. Um you can use it on other things as well. Um like you need to apply agents generously on problems. So I see it is like an ointment uh that you apply generously on any problem that can be solved with agents. You need to take yourself out of the loop and solve it with agents. Um and personal agents I think are on a like there there's a spectrum enterprise and personal agent and normally you see the the work you use at the computer and then the PC the the PC use at work and the PC use at at at the home used to be relatively similar but that will not be the case with agents because at at work you will be using consuming a lot more inference uh and that means in enterprise there will be a lot more money to be made. So that's why I'm a bit also excited about enterprise agents and open close potential. That's why uh I I believe in uh on demand disposable agents. If you use open claw on Slack or Teams or Discord, this is one instance. You create an app. The problem is you can't really talk to multiple instances of this. um like you create a connection, you create a Slack app to connect another agent and another name and another uh profile picture you need to create another app and you have to create an app manifest and it's it's it's something that shouldn't be managed manually by clicking and the platforms like chat apps don't have this standard yet where you can mult do multi- aent provisioning like cosmetically create different agents. This is what this is on on the screen. You see I asked chat GP to gen generate the idea. You agents uh and then the name can be generated by an underlying app and you can talk to them separately. This is not supported and this must be supported for this uh vision to work until it's supported. I'm using it on another UI because like we are all gonna start one uh agent per task and there will be uh they will work on these tasks and they will be creating files, editing files and it will all be synchronized. It will be it will be a tad bit different than what you're used to with your uh personal agent. And to do to have that you need to have like uh a few key components. You need to have Kubernetes. You need to uh have an agent harness. Open cloud could be one of them. Codex cloud code could be one of them. Could be ACP. It could be uh GitHub. uh you give read write access and you do like state data synchronization like you maybe you do something that's use rync some something like what whatever algorithm Dropbox is using and there are some projects that are uh taking this on I've been working on this uh this is outside of openlow this is my day job and uh it's an open source uh orchestrator uh it's a go operator that's uh basically handles the complicated parts. You know, there's a user experience. You want to create a concierge on Slack concier agent and you're talking to it, but you get bottleneck because you have 100 employees on Slack. So, you need to for for some other task you may you may need to create a new one and then it creates and gives you like a website link. Um, I'm going to skip the like shout outs to Cognition and Devon U because uh yeah, they invented the category, but I'm running low on time. So, the repo is a text cortex spritz. I'm just going to demo it for our use case. We use it on error reporting uh currently. So, if you're on Slack, you know, you can ask it to dispatch an agent to debug it. you know, you're like, I'm uh asking any new bugs after pro release and it's saying something and then you ask it to create an agent. Well, if I could put that agent into Slack, I would, but I can't do that. So, I have to put it in another UI. And this is an open source project. You can take this UI is also uh like a React app hosted in the cluster that you're deploying this these Helm charts to. And uh yeah, it starts a conversation there. It starts working on the uh problem. This is like Codex web or Devon or anything, but it's actually using like a full Kubernetes pod. Wasteful, but I think it's the better abstraction because open close showed the power, you know, when you give a full computer to an agent, it's a lot more powerful. Um and I I believe that as well. um u like I think uh open hands uses firecracker so I'm also uh not so uh well versed on all the different virtualization frameworks uh so it's I'm also learning along the way but I have a working product that's uh running on kubernetes uh and you can use this uh product uh you can if you're interested in deploying internally and uh using codex on the head uh and then just spin things off. If you have like I don't know a back end uh if if you have an open source project first of all I can help you set this up. If you have a system like inlet of just hundred of hundreds of issues per day I can help you process it. Um this is um does all the wiring around Slack and like keeping those agents uh like on and the user experience and then the the interro like you're not locked in in any agent. You can switch it's all abstract the way below uh with ACP. Yeah, that was my talk. Uh thank you for listening. Some social links in case you want to uh get in contact with me. I just want to make clear like of my uh their openlow uh side and text cortex site. U so this the last part was text about the work I do at text cortex just to give a disclaimer. Um thank you for listening I guess. Perfect. Thank you. Thank you so much getting us off and running. Um brilliant. Okay. Right. Well, uh, next we have, uh, MV who's going to come up and talk in just a second, I think. Uh, are you good? Oh, yeah. Okay. So, uh, yeah, I'm going to invite MVver up to get get plugged in, what have you for a second. Um, we've been talking a lot about the tools and the ecosystems, and my goodness, there's enough companies out there who are building products and platforms uh, for us to use with agents. Um, but of course, there's also a very rich ecosystem of open-source tools that are supporting this as well. um we've heard from several of them already today and this this ecosystem is really flourishing and I'm I'm personally quite excited about this kind of ground swell of uh of new tooling and kind of innovative ways that uh people are coming together to build out and support this this big movement and that that's what MV is going to talk about. So MVA is a machine learning engineer at HuggingFace um uh and has been been experimenting with this and is going to cover all kinds of aspects of building things uh with agents uh and and looking at the ecosystem. How are you doing? You good? >> Are we plugged in? You have a clicker? >> Yes. >> Okay. Excellent. >> Like that. >> All right. So if you're ready, >> give me a second doing the screen now. It's >> almost >> and then we have one more talk after MVA and then we have a break. Okay. Um and then we'll all come together after that break as I said earlier for the for the last keynotes uh of the day. >> So >> yes, >> bit of screen mirroring going on. You're not live coding, are you murder? You're not going to be doing any live coding. Okay. No, >> it's okay. Yes. >> No live coding. >> Okay. Happy. Good to go. >> Yes. >> Right. Let's do it. Uh, a nice big warm round of applause please for MVA. >> Thank you. Hello everyone and welcome to this talk in open agent uh ecosystem and uh I would like to call it having an AI engineer at your fingertips. Um I'm Marv and I work in the open source team of hugging face. How many of you are hugging using hugging face on daily basis? Oh let's change that. This is not okay. Um but first let's talk a bit about open source and what it is. So when it comes to machine learning open source is absolutely differential. Basically you have the open weight models um that go in with non-commercial licenses. We call them open weight. And then we have open source models that have uh commercially available licenses such as this one from deepseek. It's called MIT license or Apache 2.0. And then there is like even more open uh models that have the code open. If you have like agents there, the harness is open, everything is open. And this matters even more by the fact that like yesterday or the other day it was revealed that the cloud uh performance was going down. Uh so if you if you have everything in the open, nothing changes without you knowing. No performance degradation without you knowing. Everything's great. Uh, but on top of it, if you have access to the weights, you can shrink them, you can quantize them, you can fine-tune them if you feel like it. And it's absolute guaranteed privacy for your end user because uh you can deploy it to edge devices, browsers without the data going somewhere else. Uh, this matters a lot in my opinion even more these days with the security breaches and everything. And there was this argument maybe few years ago that open source models aren't as good as no no this is not the case like you see for instance the latest GLM 5.1 is absolutely crashing it and I'm actually using it in my coding setup. Uh the this is the uh artificial analysis intelligence index and the green ones are open models. Meanwhile the black ones are the closed models and we are we just catched up and we will catch up even more with the upcoming models and stuff. And let's go back to hugging face hub. So everything is facilitated through hugging face hub. all of the open releases. It's the infra layer for all of your open source uh workflows and as of now it's hosting even more models. I should have updated the number. It's probably close to 3 million a lot of data sets spaces and everything. But that's not all when it comes to the Aentic ecosystem. And this is what we are going to talk about today. So when you go to the models uh you can filter for aentic models. Uh they are mostly the trending ones and there is like two types of models in my opinion. There is the v vision LMS and then there is the LLMs and the vision LMS can also act as like a computer use agent over the screenshots. They know where to click etc which is pretty cool. And one trend I have recently noticed is the fact that you have uh labs releasing their LLMs as vision uh with vision capabilities day zero like for instance the Gemma 4 was an omni model and still it's an agentic model there is Q1 3.5 uh there is Kimik uh Kimik 2.5 these were VLMs so I foresee that all of these models will be over time release day zero with vision capabilities and uh it's super easy to run this actually like you can just use like VLM ML or like llama CPP llama server uh from the get-go with like few lines of code like it used to be much more um frictiony but these days this is a not a big deal and if you want to compare open models we have recently launched this feature called benchmark data sets. So when you go to the data sets on the left hand side there is like on the bottom there is a bunch benchmark button you just click it and then you can see the popular benchmarks such as S sw ebench pro or humanities last exam or aime and others and when you go to for instance S swbench to see like how your agent is like good in coding and stuff uh you see the open models ranked according to the scores. So like currently GLM 5.1 is top of the list. So it's also easy to pick an open model these days because there's 3 million models out there and it used to be a challenge to pick different models. And if you actually want to wipe check it, hugging face has this ser uh service called inference providers uh which does routing for the best models to best providers like all of the providers are there. There's gro cerebras I don't know novita and everything and then it's super easy to compare them as well if you see like uh you have the cheapest or the fastest option actually I had to truncate it but also there is the tool use column so you can actually pick one of the open source models for the agentic use case and stuff and going back to agents after all of these uh hugging face hub shield uh hugging face hub actually recently has shipped a ton of uh features for you to use open models with agents, agents and stuff and first off like there is the MCP server where you can plug hub into your LLM and there is uh skills uh which allow you to even wipe train models like you just go to your agent and say train Q1 3.5 on this data set for me and then it just trains which to me is like sci-fi at this point because it used to not exist and like there is so many things going on in the back end and the agent actually handles them very well and then there is the local agent so you can run full coding agents uh locally from models with hugging face hub because we integrate very well to them and coming to the first one so basically my talk will be consisting about all of these uh coming to the first one there is the local coding agents and your options. You have like actually many many options but like one of my favorites is Pi because it's like super simple to set up. Uh basically you can I I think you can also use it with inference providers remotely but also if you want to serve like a local coding agent you can use llama CPP to serve it and then pi will directly consume that and uh something very cool is also llama agent which is baked into llama CPP as a binary that you can just directly execute and start a model by giving hugging face hub id. So, it's super easy as well to get an local agent running. Uh, I will share my slides on my Twitter account after. So, no need to take pictures. My one of my most favorite things these days is Hermes agent and I will just die on this hill. So, this is like this is a bit one step even further to from the open claw by means of memory management and everything. And it's actually super easy to get started with that. And uh it is you can either use it locally or with hugging face inference provider. So for instance, I was playing with that uh like the setup wizard does everything for you. You just give the keys and stuff and then integrate into your Slack or WhatsApp or whatever and you're good to go. And I absolutely recommend using this if you want to use it with an open model. I absolutely recommend GL GLM 5.1 for instance. I actually failed initially to integrate into Slack. I have witnesses in here my colleague uh Neils is here and um I asked GLM 5.1 to fix it with the Hermes agent and it's fixed on its own and it's uh it was a good day like uh I I think GLM 5.1 is a very good model and I cannot I can't absolutely wait to use it with Gemma 4 but also this weekend there is like on Twitter there was a rumor uh minimax model coming up. So I will also probably try with that and share my findings. So I absolutely recommend using Hermes agent with the open models. And one more thing so basically uh HuggingFace Hub now has a new data set repository type called traces. And this is basically all of your uh codecs uh cloud code or PI traces they host it. And for instance, if you go to your um if you pushed uh a trace uh and then you go over there, you will see in the data set viewer if you click on the traces column uh it pops up like this. It is very nicely parsed and you can just explore your data and then later if you want you can even train a model on that which is pretty cool in my opinion. And uh if you want to push your agent traces, you can just upload your sessions from uh these uh paths and nothing else is needed. And we will also probably have Hermes agent very soon for traces. Uh going back if you want to use if you want more options to serve LLM behind the agent locally. So some tips and tricks in finding a good model you just go to hugging face. There is an other tab. Under the other tab, there is the apps. So these apps are like LM Studio, Jean, um, Llama, CPP, everything that is for local serving is over there. And when you filter for them, you have the models that are supported by these uh by these uh local apps. So whatever you want to serve, we have you covered. And when you go to the model repository, something very cool in my opinion is that on the left and right hand side there is gguf uh section. So basically GGF if you don't know it's supported it's it's basically comes in llama CPP the file uh format uh that is supported in many things like all llama LM studio everything and you have the hardware compatibility for instance the Gemma 4 larger model if you quantize it to 4bit it fits inside an L4 GPU uh with the 24 GB of VRAM so I think this is very cool and this is also also serve to uh MLX repositories as well. And when you go to the again to the model repository, if you have absolutely zero clue on how to serve this model on top right there is use this model and you have the options of the local apps that the model is supported in. And when you click that you see like only with few lines of command uh that you can run you install, you get the model served and voila. It's very very convenient to run the open models these days and lastly supercharging your coding agents using hugging face skills. So there is we have like bunch of skills in order to get you started with training uh I don't know inferring with the open models using open models exploring open data sets using AI apps everything and uh we have this thing called the hugging face CLI skill which allows coding agents to manage repositories uh run jobs launch demos and everything and this is how you can install it uh you can just uh type HF skills on Google and you will find the uh commands. Uh but we have more skills than that. So basically this allows you to plug hub in into your agent like give you all of the uh hugging face hub exploration. But rest of the skills are super cool. There is LLM trainer skill. Basically this is uh this is not only for LLMs but also vision language models. You can just tell the model to okay train this model on this data set and it will just kick off the job remotely uh on our infra or like locally wherever you want. And there is gradu skill which allows you to build demos. And there is hugging face data set skill which allows you to uh explore data sets through our data set viewer API and you can install it very easily. Again we come with more integrations. I just put cloud and gemini here. So putting this into action for instance I asked the model uh to I asked cloud code to say hey can you train qan2vl on lava instruct mix which is like a vision language data set and it asked me a few questions. It said okay which instance would you like this to go in because you have multiple options. uh the model actually like in the back end the agent actually uh calculates the amount of VRAM required to run fine-tune that model in a given batch size and everything. So it handles everything for you. It just asks you a few questions. Okay, what is your validation split blah blah and then it just launches the job which to me is absolute sci-fi still to this day as a person who have been training models since I don't know beginning of my career like six six years and you at the end you just find your model on hub and this is not limited to LLMs and VLMs I have recently shipped um skills for for instance training object detectors your I don't know segmenting model and everything for vision it handles for instance different bounding box types and everything you just give the command and let it handle everything and going back to MCP what do we serve uh we have models data set spaces search for your task uh semantic search for spaces so if you don't know spaces it's like the app store of AI you have a ton of uh apps over there for absolutely everything you could see. And also we have something called jobs which allows you to kick off uh oneoff jobs that ends like uh if it fails or if it succeeds and you pay for the amount of time it was up. And also you can query these apps from MCP like I'm going to show you shortly, but it plays nicely with all of your favorite platforms. And so for instance in here I ask the model generate image of a bak lava made of yarn and then it will call uh the hugging face of qven image which is an image generation model hosted remotely and then it will query that and it will bring um the output of that. It works very nice look. But you need to turn on there is a setting in the MCP called dynamic spaces. If you want more options of like if you want absolutely all of the spaces you need to turn that on which is a bit of bit experimental and here is some few ideas that you can use spaces MCP. Uh but you're absolutely not limited to those. And tying it all together, my colleague Neils has built uh something I which I found cool so I wanted to share. So basically on hugging face hub there is papers and these papers basically AI related papers. We want people to be able to ask questions to these papers or share h but not all of the papers come with markdown uh which the model which we can index and stuff. So we OCR 30 30 30,000 papers uh using codeex open OCR models and jobs all through prompting which is a bit crazy. So the steps to do that is firstly pick an OCR model that is cheap and nice and performance. Ask the LLM to kick off a processing job and actually write the code for that and then kick it off on hanging face infra and then let the skill set up the instance of hosting that model and everything without you going through the pain of the napkin math and then profit. So to pick an OCR model you need to um you need you can go to ALM OCR bench which is a benchmark data set that I have previously shown you. The first result is Chandra OCR but don't be fooled by this. We have just today shipped a skill that you can just ask the model okay what is the best model on OCR for fine-tuning and it will also make recommendations around finetuning and stuff. So if you need like smaller models etc it will handle everything for you with this skill. So it's pretty cool. Check it out. Um once you pick the model okay we in this case we use Chandram. uh we asked model to write the script and it did and then the agent just does the napkin math for the instance and uh calculates the cost of the running job and everything and then these jobs will be so so basically these jobs will be rerun. So we have recently launched this infra product called buckets which is like a a3 buckets but much cheaper and faster um that you can use with mounting and yeah basically um you can just use that and you can get started uh in these links. I hope you like this talk. Thank you so much. Thank you so much. >> Thank you. >> Um, great. Well, we are we are down to our final session uh of this uh of this chunk. I'm going to invite uh Frederick to come and get get plugged in. I never know which side people there he is. Come come and get plugged in and settled in while we talk for a second. So, um Frederick Vichowski is the uh CEO of a startup which is kind of new on the scene but growing at a ridiculous pace. Uh I don't know if people have encountered uh encountered Victor yet, but it's a it's a tool that that uh connects many many tools and services and I'm talking about a vast number of tools and services and then kind of presents those through like a unified interface. Um I'm hearing the phrase kind of it's it's like like the first employee AI employee which is language I I that piques my interest very much. Um and we're going to get to hear a little bit about that. not just uh what the product is but also kind of some of the challenges in building it uh and some of the applications uh of it. So uh so yeah this that kind of shed some light on this title you know an AI co-worker that lives in Slack. Um so yeah I think we're going to hear about that now. Are you how are you looking? Uh Frederick you you happy you you set up >> very happy >> good to go. Okay so much. >> All right so let's have a round of applause please to welcome Frederick Vichowski. Let me see if that works. Okay, clicker doesn't seem to work, but that's fine. Um, cool. So, my name is Frederick. I'm the co-founder of Victor. Um, Victor is the AI employee that probably most of you have heard of already. It's absolutely blowing up. We launched it in February this year. Zero expectations of growing at all. It was actually an experiment and it surprised all of us. immediate product market fit, you know, huge adoption worldwide and yeah, we can't uh we can't catch up. Um so what what is Victor? Victor is an AI employee. And when you think of an AI employee, um you should think of it as just like a human employee, you know, lives where you live, lives in Slack, it doesn't have a web app. Um, so just like your teammates, you don't need to go to a separate place to to call it. It participates in your discussions in threads, in channels, and it has access to to the tools that you have access to. Um, it has access to 3,000 integrations. And if for some reason it doesn't have access to your integrations, it can build its own connections. So essentially, Victor can use any tools that your company uses. And therefore, Victor has the context of all of your tools. And as opposed to human employees, Victor has a horizontal and broad context about the whole company. And for example, when you currently hire a CMO, you can probably assume that this CMO would be much better if it has had access to your codebase, if it was able to contribute to your, you know, uh to to your codebase. Um and Victor can do this. So, it's bringing this kind of um universal PhD level understanding to all of the areas of the company. Let's start with a quick story of Victor and the company. Um, our mission from the very early days in 2023 was to build AI employees. And back then it was, you know, after chat GPT has launched. Back then we thought that the right way to build AI employees is is is through browsers. Um, as a reminder, we didn't have tool calling. we didn't have like you know great code generating models. So you know probably the right way to take actions was was through browsers. You know browsers are like very universal interfaces. Um you can essentially use any tools through a browser. Uh most apps have have have uh browser apps that that you can interact with. And the way back then it was called JCAI was working. uh it was taking a snapshot of your DOM min minifying it in a lossless way um and then based on the snapshot and this minified snapshot and your goal it was deciding on the next step. So for example should I type something in in the search bar in Google or should I click on a login button to log in. Um and it was great you know it it certainly should work right um and it did but it didn't work for a lot of steps with the current cap with the previous cap capabilities of the models was like back in 2023 it was working for like three to five five steps reliably and by reliably I mean with 60% reliability and you know that was compounding with with each step and so uh that was still state-of-the-art so JAI was a state-of-the-art web agent on on the most popular agentic benchmark called called web arena and uh and it was doing well but it was very difficult to make it into a useful product just because of the reliability and the speed issues. Currently you can just call a few tools or like you know call a function and it will immediately give you an output and with the web agents you know you have to wait a minute until it fails. So it was quite quite hard but but web agents are amazing and you know um they're finally working much better than in the past. Um cool. So you know uh after that uh JCI became an email agent. So you know Sonet 3.5 came uh we have built our first first agent loop and we really wanted to have the experience of you not having to go to a web app to ask the agent to do something but rather the agent having all the necessary context and being able to proactively come up with the tasks for you. Um and we achieved that with Jace. Jay was like an amazing product also great product market fit um and it's still it's still alive you should you should check it out. Basically the way it works is you know whenever an email arrives an agent loop is triggered connects to your tools can react to emails not only with email drafts but also with with with with uh tool codes. For example, if someone asks for a refund the agent can automat automatically do a refund for you. of course can be gated with approvals as well. Um cool. Uh but then you know this February we launched Victor uh Victor you know probably everyone I mean we are in the open cloud slack so in open cloud track so everyone knows open claw which is a personal agent and we always wanted to build the employee which is the company agent and that's the first question you should ask yourself is like how is it different what is the difference between the company agent and the and the personal agent. So um first we think uh that company agents should live where you live, work where you work and have all the company context and that you know I if um if if you're building a personal agent then probably everyone from the company connects their own integrations and you know runs those agents on their own with Victor and with the company agents is different because suddenly it's sufficient for one person from the company to to connect an integration. Victor will inherit the permissions from this integrations or like you can tune it and then the whole team has access to it. So you don't need to connect them a h 100 times. So as I said before 3,000 tools um lives in Slack and essentially does anything across rows and um that comes with challenges and as you can imagine um as you can imagine like the um I'll talk about one mainly here. So the first challenge with you know coming from a a personal agent to a team agent and you know not having one user but any users is is around memory. So with open claw there was a big concern about the memory getting cluttered cluttered over time and I think that's a you know serious and it it makes sense to be concerned about this right um but imagine that you have the same architecture and the same memory but now for a 100 users and not one user. So it's probably running out of the memory a 100 times faster. It's a big challenge to be solved. Um and we have solved it. Uh another thing is Slack has different channels and and companies have you know different different hierarchies that we need to adhere to and people will often give the agents conflict conflicting instructions but let's imagine that you have Victor your company agent in one channel in the growth channel and then in the engineering channel and also in people's DMs. So um Victor will you know take the context from the growth channel uh or will take the context from the executive channel and you somehow need to make sure that this context will not be leaked to the engineering or support channel. Similarly if you DM Victor with your problems um Victor should not take the context from the growth channel unless you are from the growth team. So it adds a lot of complexity on how uh how how the access is structured. And you know we chose Slack as our interface for what we think is AGI for companies. And there is a reason for this. I I'll say I'll start to I I'll first talk about the reasons and then what breaks in Slack. Um so there are two two major reasons. First we wanted it Victor to feel like a human employee and you don't interact with human employees in web apps. you interact with them in in Slack just your teammates, right? Um and um and the number two reason for choosing Slack as an interface is that if Victor is like a very powerful agent and it's supposed to perform difficult tasks, then those tasks will not execute immediately. They they can take like 10 minutes to execute, right? Uh naturally. So when you go to a web app and ask an agent to do something for you, so you switched context and now you need to wait 10 minutes for the answer or for the output. It's quite frustrating, right? Like you don't want to wait. You are used to from chat GPT you're used to immediate answers. It it should take like 30 seconds and it's done. Thank you. Copy paste and I'm done. Um but it's not how it works with the powerful agents. So why is Slack better? Well, now if you ping someone on Slack and tell them to build an app for you and get an answer in 10 minutes, you are shocked. No teammate has ever built you an app in 10 minutes, right? So so kind of the perception is different and suddenly the latency is is very low uh when you compare it to uh to to to your normal Slack experience. But there are certain things that break in slack. And number one is that you know when you work in web apps you have a single kind of um um single thread. You open uh you open a new agent or a new new thread and you you speak to this agent. However, when you are in Slack, you have a lot of interaction modes. One of them is DMing people. Another one is being in public channels and particip participating in threads. Another one is just reacting with emojis. You know, you can also edit your messages and stuff. And all of this is is an input to an agent. And all of that needs to fit into a linear context somehow, not in a single thread, right? And we we need to manage this. So let me give you an example. Um, of course, when someone deletes a message, a human assumes that the task should not be continued or it's not interesting anymore. When someone edits a message, you should also respond to to an edit. Um, but let's say you are DMing your coworker, whether that's Victor or your friend, and you start a thread in Slack, right? Um, but at some point, and that humans do it very often, you forget about the thread and you just start a new DM to the same person. um should you start and you open a new sandbox and humans normally have the context from the previous thread but for the agent is it's a totally new area it's a new task right so what needs to happen then is you need to somehow always whenever Victor receives a DM look at the previous messages and somehow roll them over to the to the to the existing conversations um so this is just one of the challenges that you need to face Um, fun fact, we noticed, you know, I I didn't think it would be as important as as it actually is, but what really matters is the tone. I'll give you an example from one of our customers. Um, you know, we were testing, so we use Opus 4.6 now for Victor. Um, and we were used as as the kind of the main model and we were using um, we wanted to try GPT 5.4 for and on the tool calling and codegen it's actually amazing you know it should work and it's actually cheaper as well so why not replace opus with GPD 5.4 before uh and there's one reason we we didn't go for it. There's a couple, but one the most interesting one is the personality. Um, for some reason our users can be due to our architecture, but they loved Opus and they all started raging when we did we did the AB test. So, uh, I think there's something beautiful in in that model that, um, you know, um, we can learn from. And Opus is a bit sassy as well in Victor. I'm not sure if that's thanks to our team or who made it this way, but uh, actually quite funny. I encourage everyone to try um proactivity. One of the kind of powerful things that Victor can do is proactively suggest you the workflows that it can automate. So let's say you're in a growth team and you discuss an AB test and the results and at some point you realize okay this one option is performing really well. I'll go for this option instead of the other one. Um, Victor has access to your post hog or whatever tool you use for analytics and it can literally check and realize and it will do so if what you're saying is not some It happened a couple of times that you know you're discussing some experiments Victor checked post hog and said hey you know it's true but like this is not statistically significant and then it has run a calculation of why I'm saying some Um, so it's fun. It's an advantage, right? If Victor can suddenly join a conversation and be helpful, it will be activated more broadly in the workspace, which is great for the products. But if Victor does it on day one and it happened, um the security teams start raging because someone adds Victor to your workspace and suddenly Victor starts DMing everyone and then participating in the threads and the security is going crazy. Um that's why I think you should earn it with the first us with a few users first and then you can roll it out broadly. Um exactly um yeah so the value of shared context I don't have much time left but um I'll very quickly talk about the difference between Victor and agents like cloud code or like cloud co-work or whatever. Um now cloud co-works on your desktop so it's a bit different. Uh the advantage of Victor is that it works in cloud. You don't need to have your computer open for it to work. And another thing is the shared context. So as I said at the beginning for Victor to work well for you to be able to you know ask Victor to change your meta ads budget or like to read your analytics data only one person from the company needs to connect this integration. Right? Imagine that you work in a 100 person team and your growth team is 20 people. If you have to ask 20 people to connect your meta ads everyone individually, it's quite painful. Furthermore, if someone wants to interact with Victor um and like if Victor wants to be proactive, everyone connects their own integration, someone can connect their own wrong integration, right? and Victor can be just very stuck and wrong and you know might not know which integration to use which adds a lot of complexity uh for the user. Um, cool. And something I want to highlight here is that Victor is not a tool. It's a hire. And here's what I mean. I'll tell you one customer story. One of the biggest e-commerce brands in the in the United States, they uh their team admin has connected Victor and the first integration, a team integration that they connected was their personal email, personal Gmail. And then suddenly the team started speaking to Victor about this guy's emails and um and this guy is is is texting me and saying, "Hey man, like what the hell? Victor is leaking all of my data. Why are you doing this?" And I'm like, "Why did you give Victor access to your personal email?" Like, you know, if you hire a new employee, do you give them access to your personal email? Probably not, right? Um, that said, I think it was a great inspiration and what we did is we added a capability to Victor to kind of scope the integrations so they're not always shared and if you want to have your personal integration to your personal email and want Victor to like in your DMs or publicly uh be able to use it when you call it, uh, this is also possible now. Um, yeah. And and so to summarize um what does it take for an AI coworker to be great? I think there are three major pillars if you want to build your coworker. I this is a technical crowd. So I encourage everyone here to try to build your own victor. Um and uh you know there are just three things you may need to make work. Helps get work done quite easy. Models are capable today. Um you know connector integration through pipedream will work well. Knows the company has the context from Slack. Make sure you're able to utilize this context well. Um, you will probably need to go for the Slack approval process which is very difficult and can be can be boring. And then make it friendly. It makes a difference. And you should um make sure that Victor likes your team. Your team likes Victor. Um, this is our vision for the future. Every company has AI employees. I think it's obvious. Not nothing to argue here. Um and historically um I just want to highlight the vision for AGI has been with us since the 17th century. Um Godfrey Lnitz the inventor of calculus um was reasoning about you know humans doing unnecessary things and he wanted to build a calculator. Um little did he know, you know, like a calculation is not the not the only cognitive task that that we can automate. And I think um we are now in this beautiful moment in history where um where where we can essentially automate all the cognitive tasks and we we can be part of the revolution. So I'll just let myself read his quote. Um it is unworthy of excellent men to lose hours like slaves in the labor of calculation. Let let us leave that to machines. And with that, I just wanted to encourage everyone to scan this QR code, click on sign up, and add Victor to your Slack. Test it out. Everyone in this room has a $100 in free credits. No string is attached. You can just remove remove Victor at any time. It will add a lot of value. I promise. If it doesn't, give me a call. I'll make sure it does. Thank you. Thank you so much. I I was not fast enough to get out here to scan the code myself, but I'll grab you later on. Thank you ever so much, Frederick. Um okay. So much so much to digest after this uh this trunch of talks. Um luckily we have a break now. So we have a break up until um 4:30. So, it's plenty of time to get some air, get some refreshments, chat to the folks outside, make a new friend, uh, and then we'll be back in here for the remaining keynote sessions, which will take us through to the end of the day, uh, starting uh, at 4:30. Okay, enjoy your break. See you in a bit. >> That concludes our Hold still a little. I watch the sparks all burn too fast. Everyone reaching for the flash. They take the first light they can find and call it truth and call it mine. But I stayed when the room went quiet with the weight of the question while the easy answers walked away. It's not that I see further. I just don't let the silence. I let the dark right past the comfortable light. I wait till the surface breaks till the shade feels true. I don't rush the fire. I give it to I give it to Call it but there's a deep still humeneath not being love. Every great thing for patience. Every thing makes you choose. Do you leave with what's acceptable? Stay for more of you. They say it's talent. Say it's magic like it falls from open but nothing worth remember I stay when it starts feeling kinds I wait through the restless out through the earth to collapse. Hide by and chase the answer. I let it find me back. There's a moment after the last good idea. Where the room feels empty and you want to run for your life. That's the teaches you. That's the edge where the real stand. Let the shape reveal it. I stay longer than I should long enough to change. I stayatter clears. The haze bar with time. Most dreams don't fail. They're just too soon. I stay. I stay. Heat. Heat. Heat. Heat. N. Heat. Heat. Hey, hey, hey. Hey, hey, hey. high. High. High fall. Higher. You Yeah, you Oh yeah. Yeah. Hey, hey, hey. Heat. Heat. Heat. Heat. Heat. Heat. Heat. Heat. Heat. Heat. Hey, Heat. Hey, hey, hey. Hey. Hey. Hey. Heat. Heat. Heat. Heat. Heat. Heat. Heat. Hey. Hey. Hey. Heat up here. Heat. Heat. Heat. Heat. Hey. Hey. Hey. Hey, hey, hey. Heat. Hey. Hey. Hey. Heat. Heat. Heat. Heat. Hey, hey, hey. Hey hey hey Hey, hey, hey. Hey, hey, hey. Heat. Hey, Heat. Heat. Heat. Heat. Heat. Heat. Heat. Heat. Heat. Heat. Heat. Heat. Heat. Heat. Heat. N. Heat. Heat. Heat. Heat. Heat. Heat. Hey, hey, hey, hey, hey, Hey hey hey hey hey hey hey hey hey. Heat. Heat. Hey, hey, hey, hey. Heat. Heat. Hey, hey, hey. Hey, hey, hey. Heat. Hey, Heat. Heat. Heat. Heat. Heat. Heat. Heat. Hey, hey, hey. Heat. Heat. Heat. Heat. N. Heat. Heat. Heat. Heat. Heat. Heat. Oh. Oh. Oh. Heat. Heat. Heat. Heat. Oh, hey. Heat. Hey, Heat. Heat. Heat. Heat. Heat. Heat. Heat. N. Hey. Hey. Heat. Heat. Heat. Heat. Heat. Heat. ladies and gentlemen, please welcome back to the stage Phil Hawksworth. Hey, right. Well, welcome back for everyone who's been adventuring around the other tracks. Some of us have been here the entire time. Uh I salute you, my uh my my faithful few who have been sticking out for one track all the time. Um how's the day been? You mean you enjoying the day? >> Nice. I'm glad. I'm very glad to hear it. How have people been doing with the hallway track? And by that I mean not only submitting a talk suggestion for tomorrow if you want to do a lightning talk, but also meeting people and having uh having new connections. Uh, hands up, please, if you've spoken to a new and interesting person and made a new friend. I love it. Okay, put your hands up really quickly because the reason you need to take this down quickly is if you were to look around and the person that you enjoyed talking to didn't have their hand up, awkward as you like. So, I think we managed to get away with it, but there are so many hands up. Of course, it was reciprocated. And also you've got another chance to make more connections because later on after this last set of talks of course we do have a bit of a social bit of a mixer outside there um we'll talk about that a little bit later on. Uh but now we are kind of into our last set of uh keynote talks uh of the day and what a great way to end the day. What great company we're in. I I'll let this get rolling in just a minute. just a couple of kind of thoughts about the the themes that we're going to be talking about here because we've seen so many themes throughout the day. Uh and in this kind of last set of our final four talks of the day, we're going to be hearing a little bit about, you know, how the fundamentals of software engineering uh still matter, how how those are still important. You know, what happens when we enable everyone to be delivering code and to be delivering uh products and to build. Uh but before we do that, we're going to have a bit of a conversation. uh about what happens when software engineering uh and AI meet. So I'm very uh pleased that we're going to be able to uh bring uh Gay Oros and Swix to the stage uh for a nice hopefully cozy chat. So enjoy the last talks of the day uh and I'll see you on the other side. Okay, enjoy. See you in a bit. Sure game. All right. I going to assume most of you uh show of hands who subscribes to a pragmatic engineer. Oh my god. >> Wow. >> Uh he is uh he needs no introduction then. Let's get right into it. Um what is token maxing and should everyone here be doing it? So I I heard about token maxing a week ago or like week and a half ago first and you know some people have been doing it for longer and I tweeted about it I think three days ago saying oh there's this token maxing and again you see it on social media and my DMs were blowing up from from people at large companies I don't want to name names but like you know Meta Microsoft uh some so some so some some other ones as well like uh the likes of and and so so many more and the story is a little bit different every at every company on why people are doing it and whether they like it or whether they think it's good. But there's a few a few common themes. One is token output at these larger companies is measured in in some way. There's like either a leaderboard or there's a way to look up your your peers. Salesforce, for example, you can check the spend the the money spend that every every person at the company did. You can like search in a tool that someone built and it shows how many dollars they spent on on AI related tokens and you know first there's this number then there's this uncertainty on in the tech industry right we're kind of hearing layoffs like massive cuts at the likes of block and I mean there like no matter how much tokens people spend they were let go independent of this but people start to think like does is it part of performance evaluations or promotions or all that and the answer is kind of so inside of meta I talk with managers and in the performance evaluation they have this data point which is one of many data points right the same way as as like diffs or impact or or code reviews of how helpful this person is but they do just like with any data point they sometimes pull it in and use it so typically and just like any data point it can be weaponized so like a low performer with low impact and a low token count clearly not even trying so and a high performer with high impact and high token count. Clear that's innovating and this must be doing good. So inside of these companies specifically I talked with a lot of people at at Meta and again this is not representative of 100% of Meta but they had this leaderboard where people showed up and they have like massive amounts of tokens and a lot of engineers got just scared worried so they started to token max to try to generate tokens. stories that I've heard first or well secondhand from these people who who who told me firsthand is for example instead of reading the documentation I will ask the agent to summarize it for me and ask questions even though it doesn't do a good job answering it but my token count goes up people just want to not be in the bottom 25% or bottom 50% for token count where these things are measured inside of Microsoft again there's a leaderboard and I'm talking with people they're like it's ridiculous like how some people are just running autonomous agents to build junk honestly for the sake of having that number go up and and sometimes it gets ridiculous because like inside of meta they had this leaderboard they got rid of it after an article came out and it looked amaz whoever built it like just just like closed it down. that people are still token maxing by the way because there's this this thinking that it might have gone but you know we're engineers and don't forget these are high-paying jobs right that like you don't really want to lose a job over something stupid as like you didn't have INF token count and that's how it feels but inside Salesforce there's a target of minimum spend per month like I think it's like $175 between things so like people are like again you kind of like you know beginning of the month like just token max to get there so it's it's it's weird and it started as a joke earlier like a few months ago token maxing was really just people like going crazy and enjoying this thing and building cool stuff but it's kind of turned into in a lot of companies I think it's just a culturally weird thing so it's a weird time to be in because I remember lines of code used to be when when early uh developer productivity tools came out like velocity and pluralite flow they kind of measured lines of code and and number of QPRs and we know that was stupid and people kind of optimized for that at companies that did it but it's it's almost like what now it's the top running companies like Meta Microsoft who are incentivizing ing people to to do just stupid stuff, honestly. >> Yeah, those are wild stories. And one of the things you're clapping for that deserves another full conversation. Uh, one of the things I like about talking with you and subscribing to your newsletter is that you basically kind of anonymize all these stories from from real incidents and real examples. Um why is it that uh is is it still worth it right with all the flaws uh you know when you have good hearts law like what whatever gets measured gets uh sort of abused with all the flaws is it still worth it you know is is is AI basically still making us faster overall like the cost of token maxing is still with all these like really ridiculous examples is it still net worth it? >> Yeah. So, don't forget like the reason token maxing is probably a thing is like let's just go back to six months ago where I I I was at a I was at a CTO like dinner conference whatever like a bunch of CTO's gather CTO level people this this was in Amsterdam and we had like like a bunch of people and there were talking and and one of the CTOs like the the the Amazon of the Netherlands uh there there's a e-commerce company was saying like hey like everyone like I have a problem like engineers on my team are really skeptical of AI and they're not really using it the AI tools don't forget this was before Opus 4.5 and those models were were out they were not as as productive we had uh we we already had a cursor and and the like and they subscribed they're like not using it that much on existing code bases right and and next to them uh the head of the Dutch national bank said like oh we don't have that problem ours are using it because our our mission is to regulate this thing so we need to understand it and they're kind of motivated and there was this time where experienced engineers were kind of holding off because if you had an existing code base and use AI cursor whatever on it it was mildly useful if that even and these engineers were like why should I use a tool if it doesn't help me refactor it doesn't find the bug it doesn't do what I need to do and leadership saw they're not really using it and they kept hearing you know the likes of Antroic for example was already saying how they're writing a lot of their code with with cloud code uh and it just keeps increasing and entropics you know like revenue is going up like this. So those leaders are kind of they might be confusing correlation and and and you know like which one comes first but they're like well we should be using it more because probably good things will happen and thus bad things will happen if we don't use it. So the whole targeting and measuring things it actually came from leadership wanting we want our engineers to use faking AI. I don't care what it is. And it it was a bit of a push. Like we know this is bad, but it's it's better than them using it. A best example is Coinbase where uh Brian Armstrong, the CEO, just like fired an engineer or he sent an email saying everyone like needs to get on board and use AI tools and whoever doesn't use it in a week, I'll have a conversation with them and then I think a week later or Saturday, he fired an engineer and you know like this again high paying job like we're talking base salary like three 400k,000 per per year. uh and then both equity and everything on top of it like they got the message everyone just start to just you know like use it and you know back to your question so on on one there there's a push and look I feel it's a little bit like this is going to be controversial but have you ever wor wonder wondered why big tech love to do lead code style interviews algorithmical interviews which have nothing to do with the job and and we know it's the case and there's a lot of criticism for this and they've been doing this since since like 20 years But here's the thing. It selects for a specific type of person. It selects for the person who's smart and willing to put up with absolute to get the job. And this person, you know, they will study two months preI two months or three months of lead code, which again makes no sense on the job, but you do it. You get in there and this person will be putting to put up with that makes absolute no sense to keep the job. So token maxing happens at large companies and people are putting up with this BS and look a lot of them are smart and they will make the most of it. Some of them will build cool stuff. Um it's it's the reality I think of big tech. So we're in this weird place where big tech is a bit weirder than startups where you know no one cares about tokenaxing. They care about like just building stuff and you know use whatever makes sense. Don't people will care about the cost. >> Yeah. But going back to your question like like you know like is is it making us productive as as a whole like individually it's it certainly is and as teams we're kind of like a bit question mark because we should be moving faster and there are a few companies that do androphic is a good example but a bunch of companies are like not it's it's it seems it's hard to retrofit all this AI into like the way we have been working. >> Yeah. Uh one of my favorite studies from last year was the meter study where they uh did a blind test of uh people and their expectations of productivity, right? And basically the the end result was they felt 20% more productive but their demonstrated results was actually they were 20% less productive on average. >> Yes. But that that study was very interesting because they >> it was very small sample size. >> It was 30 people and there was one outlier uh who actually was way more We we interviewed him on the pod. Yeah. Yeah. Yeah. So he was the one productive AI engineer. But anyway, so uh actually my theory is that uh something that I've seen on my team is that I've been enabling coding agents for the rest of my team who are nontechnical, right? And uh you as the engineer may not be more much that much more productive because and you can be more productive if you uh attend AIE. But uh if you actually enable your non-coding uh your your non-coding co collaborators to code actually they are more productive because they don't have to wait for you right and that's that like unlock of like oh suddenly you have serverless developers basically uh and I think I think that's that organizational coding thing is different than studying pull request level productivity for the individual developer. Yeah. And and the thing that still I still remember to this date I I talked with Simon Willis I think in 2024. So two years after Chad GPT came out and he was Simon Wilson top commenter on Hacker News or he's he's >> that's his that's not his title man. Top commenter on Hacker News. What the >> No >> creative Django top blogger. Yeah. Uh prompt injections. Uh yeah. >> Yeah. He's actually not top commenter. he's the most submitted block cuz he blocks so much like like and he's >> but he told me back then he said like this thing AI is is just so hard to to get good at. He's like there's no manual and he's like I've been doing it back then for two years and I'm still I'm still figuring out what works and what doesn't. I keep changing my workflows and I think that's something that is a bit hard for us. Two things about AI that for anyone engineers is hard to understand. One is it just takes a long time to get good at it and you need to keep doing it. And the second thing is understanding the theory will not make you better at using the tools which is an absolute mind honestly because we're so used to you know you understand how the compiler works, how assembly works. Okay, you will now be more efficient if you want to write low-level code because you know how it works. But what with these things I mean you you could of course it's helpful to understand how how the the architecture underlying works attention the different the the different probability sets etc etc but it will not help you get a sense for how you can use it and then once you figure out how you can be more productive if you're if you're inside of a team again it kind of breaks and you have to relearn again but but the more effort you put into it it like it's clear that it's it's working it's helpful and I think it it's the teams I'm seeing and getting more value out of it. Low ego, open to learning, open to leaving your priors behind. The word priors I have not used forever and I feel we're in this stage where like just just leave your priors behind. Just have an open mind like don't leave your experience behind but you know be open to it. >> Yeah. Zooming out a little bit. How is the role of the software engineer changing? >> I think it's always this was always coming but AI is just just speeding it up. uh even before AI a few it's interesting how you see like startups in many ways venture funded startups are kind of front running what the industry will be catching up because venture funded startups are about fast growth um doing mo moving fast with smaller teams because smaller teams mean smaller comps even pre-AI so a lot a lot of these venture funded startups start to expect a lot wider range of roles from engineers for example DevOps as a whole inside VC funded companies from the mid 2010s, every engineer was kind of like responsible for the code they deployed. But like more traditional companies, they had more money, more sorry, more less pressure. They kind of have dedicated DevOps teams and some of those things. So in in the industry like the software engineer is now becoming like the kind of the tester role has collapsed into software engineer. We most companies don't have dedicated testers. Very very few do. DevOps collapse into here. Uh and now we're starting to have the product role also starting to come. So a lot of companies even like in 2022 before AI start to hire for product engineers that's happening faster and I think the the last push that AI is doing is even for early career engineers there's a lot more seniority expected or or senior like things planning about things knowing about the business so I I I think the role is expectations are are higher teams are also getting smaller everywhere I talked with someone at John Deere 200 person uh 200 year old company sorry uh you know like they do tractors and and all all that stuff. And and inside of that company, one of their their VP of engineerings was telling me how they're actually seeing that their two pizza teams are now just one pizza teams inside of that company. It's the reality partially because of these tools. >> So my joke used to be I am a one pizza team because I eat a lot of pizza, but uh depends how much pizza you eat. Uh there's so I'm sorry to interrupt. I don't know if I cut you off in some critical points. Uh there's a comment saying I've heard it twice even among this audience where a lot of people are saying that oh uh you're no longer an engineer everyone's an engineering manager now and you've been an engineering manager and I wonder if you agree with that or if you have a different take you know because basically you're the the the common analogy is that you're no longer a software engineer you're just managing engineering agents right yeah if you've been a manager before that is an absolute so so here here's the thing the like Yes, you are a manager without all the things that no one wants to become a manager for the the when you become an engineering manager. Hands up if you are or have been an engineering manager, right? Hands up if you actually if you've not been in you want to be one. >> About 15 20%. >> All right, you come and talk to me afterwards. I I'll tell there's a hand up there. I'll talk you out of it. So, so what you think you become an engineering manager to like help people's career maybe have higher salary higher impact all you know there can be a lot of dynamics but the reality is is is you you become more removed from the product and you have to deal with people problems and the thing with with agents is you don't have to deal with people drama people problems conflict between your team I mean unless the next generation of agents start to fight with each other I think that'll be something but you actually you you do have to orchestrate but it's more like a tech lead role or or or or experienced engineer where where you're like mentoring uh mentoring engineers but you don't have the people management you don't need to worry about the personal problems so it's actually a lot more kind of empowering and I was talking with uh the podcast was was just out uh yesterday with with DHH uh creator of Ruby on Rails who said you know people told him like okay it's it's like managing things and he's not excited about managing agents but he feels it's more like a mech suit where you have like you can do seven things at once you can do it a lot faster and you're in control and that's more what it feels like. So there's orchestration, yes, but it's very different to management. And also the the really really bad thing or honestly shitty thing about management if if you make it into management which makes it hard also rewarding later when you you tell yourself at least this thing is you start a project with all these people under you. You know congratulations you've got 10 people wonderful and you start a project and in six months you will see some results of the decision that you made. With agents it's just so much faster. So the the feedback loop is faster. So I I think it's it's not much of it except for the orchestration and and and for that everyone's going to have their own flavor. Some people will will have the tendency to like run multiple agents and they're good at this or we good at it. Some people just do like two agents. Michelle Hashimoto I interviewed him. He has two agents. He always has one agent running that. No, he has one background agent that he doesn't. That's it. He's like two is enough for me. Great. >> Yeah. Yeah. Uh where figure out the patterns. Um uh I want to hit you on large tech infra. Uh this is something that I think both of us are very excited by by uh good infra which is a very niche uh interest. What are you seeing? >> It's wild to see how much of the So I said that from externally a lot of companies a lot of big tech companies especially the ones are spending a bunch on AI and have platforms and all that you're not seeing too much like more come out like Uber is a good example. I'm not seeing too many more features come out of Uber or new product launcher and they're like but what's going on? they are really investing in AI but when you look inside there's a whole lot of buzz they are rebuilding their complete IM infra you know they're and I'm not talking about they're buying cursor or or cloud code or all that they're doing that as well but they're completely they're building their own own custom background coding agents that is integrated into their monor repo they are are having uh their own MCP gateway that is is now integrated into service discovery their on call tooling is being retoled their internal code review system is like like categorizing based on risk. They are like and Uber is one example but everyone else Airbnb intercom Meta Microsoft even midsize companies are just building so much internal improp and I was asking to myself like why on one end this feels like such a waste but when I worked at Uber for four years I realized they spend so much on on internal platform there's two reasons one is honestly it's a it's a lowrisk way to get good with AI uh to be hands-on and these companies want to be hands-on but maybe you shouldn't start with shipping AI features no one wants into your codebase. Second of all, because these these companies have such so much code that never fit in a context window, by building custom solutions and just basic basic dragons, that kind of stuff, they will have better results than off-the-shelf vendors. So, they already have a win. And number three, honestly, is anything that has AI in it gets funded. So, there's this joke of if you're in the developer platform team and you're asking for more headcount, like good luck with that. Oh, developer platform. Oh, but say that you want to get two extra head count for agent experience. Done. So, so there's that part as well. But, but >> agent experience is just a CLI >> pretty much. But all of this come inside. There's so much buzz and so much work. Everyone's building their own custom system. So, I'm kind of wondering how long this will take. But I think for next year, this is going to happen. So, if you either have friends or if you're work if you're working at a company, you'll see, but talk with with friends at other large companies and you will probably see you are all building the same thing. If you're in a large company and you're not already building an MCP gateway, what are you even doing? >> Yeah. Um, actually a lot of these topics are exactly the things I cured for tomorrow. Uh, it's just fantastic to have you as the closing keynote for today because uh it's it's like a appetizer for tomorrow. We have talks about MCP gateway and all these sort of AI architecture and infra things and I do think like uh infra like taking AI infra seriously as a company is uh very mis not not that well un understood and right now you just kind of learn by example from people because there's not really like a textbook or anything like about it. So the way I think about this because again from if you just kind of step out and we love to criticize big tech of how they're wasting money here and there and by the way we love to criticize Google and I'm kind of thinking to myself like hang on what if Google ex actually executed well like do we want that and you know they would kill all the startups but but what they're doing makes makes sense and Shopify is an example where I'm like huh I'm starting to get why it makes sense to do all this stuff. So Shopify in 2021 they were the first company to have access to GitHub copilot. What happened is the the head of engineering Farhantoir heard about GitHub copilot being developed internally inside of GitHub and he pinged Thomas Dunca the CEO of GitHub at the time and said hey Thomas I heard you guys are doing co-pilot and he's like yeah we are it's internal. He's like I I'd like to get access to it. He's like yeah but it's not for sale. He's like no no no you don't understand. I I didn't ask if it's for sale. we would like to roll it out to all of Shopify and in return we will give you feedback for 3,000 people for you know as honest feedback all the time and so they got it a year before it was out anywhere and they incurred a lot of churn it wasn't that great initially and and they went through all of this stuff and then Shopify was the first company to on board to like a bunch of other tools and they gave unlimited budget and they're spending so much time ironing out bugs but the reason they're doing it this is what like made me click is they are trading off churn and expense and spending a lot more money to be at the forefront of this. They are a few months ahead or six months ahead of their competition and for them it's worth it. It's not worth it for anyone else, right? If you're if you're in a company where your business is like something something physical and you don't care like yeah just just wait out it it'll come. But for a lot of us in the tech industry this turn is worth it. Plus what Farhan told me is like because he actually told me he's kind of worried about the cost now but he was like look like it's still worth it because if it would look silly if I said you cannot have these tools how would I hire the best >> so it's it's innovation recruitment and it kind of makes sense when you think about it and the weird thing everyone's doing it at the same time so it looks silly but it it's rational uh my next podcast is with Mike Parin the CTO of Shopify and uh the sheer amount of machine learning that they do and infra that they set up for their customers makes me want to be a customer. You know, that's that's like the best uh endorsement I can give. Um I'm going to get meta a little bit and talk about pragmatic engineer. Uh you and I kind of startedish in COVID. Uh you just left Uber. Uh how has it been growing? What what are the main stats that you're proud of that uh you'd like to share with the world? >> Yeah. So I I started Pragmatic Engineer. I I I a joke that if it wasn't for CO I I would probably never have started the this thing because what happened with CO is uh Uber had layoffs and most of the tech industry was doing great but Uber was not and my team uh was hit by layoffs and then we we had to disperse the remaining people and other teams because our mission no longer made sense and it was just like a the morale was low my morale was low so I was like let me take a break I wanted to write some books Swix was writing his book the the coding career >> yeah some of you have read it I've met some of you >> yeah and that that's how we met there and then uh my plan was to write a book and then start start up some startup something something platform engineer control C controlV from what Uber was doing inside and that's actually almost all Uber su Uber startups it's amazing temporal is is is from there >> if I by the way if I did not start AI engineer I would have started platform engineer that that would have been the industry conference >> yeah love it uh and then I start I started the pragmatic engineer u a year after I left Uber it was just an experiment um I figured No one substack was taking off. No one was writing about software engineering in-depth and I just acted all confident saying pretended that I I knew what I was doing. The first article was about Uber's platform and program split that no one had written about publicly before and it's a it's a free article you can you can now check it out. Uh and it was like when you feel product market fit that's what I felt almost immediately. The first week before I published anything, just a confident Twitter post, I had 100 people pay upfront $100 for the whole year, which I was like, whoa, I have published anything. In six weeks, I was at a,000 people paying for this thing that didn't exist before, which was my old Uber base salary back back in Amsterdam. And it just kept going up. So like I I figured like when you find product market fit, this is like outside of like there's this rule like if you find product market fit, just keep doing what you're doing. So for me, I just kept writing that one article. I got all these interview requests, collaborations, podcast. I just said no to all of them because I knew the most important thing was to do what makes it successful, which is that one article. And later it turned into two articles. And for two years, this is all I did is two articles. And after two years, I looked up and I was like, huh, like this is actually working. People like doing it. I like doing it. There's a future in that. And that's when I decided I actually want to turn this into a business that I don't burn out because for two years every vacation I went to I was working 50 60 hours. I was always thinking I was writing I I couldn't really let go. So I started to grow the team a little bit. Uh I I Ellen Bird the first secondary researcher. Ellen, she's ex uh >> she's here right? >> Ellen's not here. Um Jessica is who who just joined uh later. >> Yeah. >> And then uh so now it was two of us. Uh, and I started a podcast a year and a half ago because I talked with so many people. I figured it was a bit of a shame to to not have it. So, the primatic engineer became the number one paid technology newsletter about four months after starting. It stayed there for three years. Now, semi analysis has >> Dylan versus uh you guys. Um, yeah. No, congrats on your success. Uh I think you're also a leading tech voice in Europe which I think you're sort of proudly sort of uh upholding that over here which I would really wanted to feature. Thank you for your support for AIE and uh everyone thank you to >> Awesome. Our next presenter can bring levity to the often serious world of engineering. Please join me in welcoming to the stage founder at sizzy.coy. All right. Wow. Back room. Those are not my slides. There you go. Hi, I'm K. We probably argued on X. I'm D on X and I turned 34 years old today. Decided to do a talk on my birthday because Thank you. I'd like to torture myself by asking, do we have anyone from Tinkerer Club here? Please just the person sleeping in the back is like, "Oh, what did he ask?" All right. I formed this recently. It's an awesome community where every person inside is copy and paste of everyone inside is hilarious to see. If you want, you can join us. So, I'm going to talk today about the past, present, and future of productivity and personal agents. Starting with my first to-do app was when I was 10 years old, which is crazy. I found an old note in a notebook and some scribbles are like barely legible. We're like, I need to eat my string juice today. I don't know what a 10-year-old does for a to-do list, but it clearly had checkboxes. And I've been trying and wrestling to solve productivity since then. I was anyone else forever unhappy with todo apps, please. Like there's no perfect Thank you. Thank you. It's not only me. So I tried like this was probably 15 years ago. I got so fed up with like the to-doist and the other ones that I started using text files way before all these local markdown blah blah blah. And I used an Android app called Tasker to basically manage all of these text files. I got contextual reminders like whenever I connect to Wi-Fi, remind me about something or when I arrive at a destination or when I bike or blah blah blah. So I was always trying to figure out a productivity system. I had like a Google Home which supported back in the day if supported to basically cut the command in half. So when you say tell my assistant too, you can take the second half and send it to any of the iftt services which was pretty cool. And anytime I would have a to-do around the house, I would just tell my Google assist and it would just store it. It wasn't smart, it wasn't AI, but I was building towards something where I can offload my thoughts and process them in a way. I realized that I never wanted a to-do app. I wanted like sort of like a life OS. So slowly I've been going to that direction in 2017 16. I I'm bad at naming so just ignore the names of everything I've ever built. So I made something called todo which was which was like a todo app but all the to-dos like shoot up to the top based on like a priority system. So if you tag them with something called health or crisis or whatever it is they would just accumulate all of those points and shoot higher to the top. So it was kind of helping me to prioritize things. ADHD hit of course and I forgot about that one and I started something called better. This one was kind of hard to SEO because good luck figuring out SEO for better apps. So eventually I had to rebrand it. But I expanded here by adding to habits, planner events, and a bunch of other things because I realized if these three are not together, I can never make like a mini OS. Then of course ADHD hit and I switched to a bunch of other apps. And in 2022, I started making Benji. It's named after my dog. My dog is the mascot. That's not the logo, but the point is I wanted an app to rule them all. I might have went a little bit overboard. to the next slide. You're going to see you're like, "Oh, probably he added routines and calendar events and like what else?" No, this is how much I hate marketing. If you're like, "Wait, I've never heard of Benji. How come?" Because every time I had the urge to do marketing and to actually promote this to people, I was like, "Maybe one more feature. Maybe one more feature." It's like almost like 3 4 years later and I still haven't properly wrapped this up. It's still not properly finished. But I was frustrated with using a web app for one thing, an iOS app for another thing. It supports this, it supports Android, it doesn't support this. Some of them are subscriptions, some of them are premium. So I just wanted all of these features like mangled into one tool that can sort of fix my life. Has it? Absolutely not. But we're going towards that. My vision is to one day have like a Benji phone and a Benji OS. And the funny thing is I said this on a podcast and the guy was like very ambitious for someone who doesn't have a landing page for Benji. So I didn't have a landing page, but one day I'm going to make like a Benji phone. So the friction with having like making this life OS whether it's in notion or in something else like Benji. The annoying thing is you have to use forms to input data. So I oscillate between two states. I'm either for a month like logged into Benji logging everything and doing all the things or I completely ignore it. I don't care about what things are there to do nutrition whatever. I'm like no no no I don't want to look at it and then in a few months I'll go back into that cycle because there's a lot of friction in in using all these tools. We had the chat GPT moment. It was awesome. But when chat GPT plugins came out I don't know if you remember that ancient relic that they now it's transforming to MCPS and whatever. Um, I called my wife and I was like, "Honey, it's over. It's over for all the apps, for all SAS. Like, GPD is going to eat the world. It's all going to be chat GPD. It's all going to be within the thing. Benji is pointless. I wasted years on blah blah blah." 3 years later, she received so many of these calls. She just ignores me at this point. I'm like, "Oh my god, they dropped the new Opus." She's like, "Uh-huh. Cool. Cool. Nothing ever happens. But we're going towards this." Like 2023, before the models could return JSON, you had to bully the models to return JSON. I don't know who remembers this. like you had to be like please don't write any markdown. It's like sure here's some JSON. You're like no. So you had to parse it to cut it to to like shape it into form to make some JSON. And I added a feature in Benji where you can like press a press a key on your keyboard. It would record with a microphone and as I was speaking it would like periodically cut some of what I was speaking and basically call it wasn't MCP. It wasn't anything. It would call APIs in Benji and you can see your calendar moving live and your to-dos and everything. And to people on Twitter this was mind-blowing. They're like, "Holy dude. you should pursue it. You should make something out of this. But ADHD, I was like, "No, no, no, no. People like it. It went viral, which means we never have to talk about this again." So the Benji assistant still hasn't shipped and I did nothing about it. Meanwhile, people took one feature of Benji, which is like, I don't know, food tracking. They take a picture with your phone and it analyzes calories and they made multi-millions, but I have 60 features. There's there's a lesson in there. So last October, I I I realized that wait, I'm using clot code. I can use it for more stuff like it has tool calls, functions, and a bunch of other stuff. Maybe I can tell it to do my taxes and end up in jail. Hopefully not. Uh maybe I can tell it to organize my email and my to-do list and a bunch of other things. So when skills came out, I started like loading my cloud code with personal skills. But I'm like wait now I have coding skills. I have personal skills. It gets confused. Like I started asking people how do I go and make this into like a proper assistant that's like lives on top of cloud code but it has tools for other stuff other than coding. But ADHD was like how about we forget about this? let let's pete let Pete come up with the cloudbot and everything else like you don't need to worry about this. So cloud code had like the wrong shell for me because it was like terminal based and I crave for something else. So when Peter make uh cloudbot back then when I saw the tweet I'm like oh my god you can talk to it through WhatsApp or telegram or whatever for me it was like that's the moment that's what I needed for my cloud setup to actually you know evolve into the next thing. My brain caught on fire. I think we got like mass psychosis. It turned into a cult. Everyone wearing like lobster suits. It it it it's been crazy for a while and I joined the discord and it was like less than 100 people who had their cloudbot set up. Even Pete was like how did you do this? There's no like onboarding. There's no like how did you do it? And I told him what I'm telling people now. I don't know how the internals of my setup work. I just ask either codeex or cloud code to fix it to change it to improve the memory to do this to do that but I have no freaking idea. People are like what do you have in your JSON file? I'm like I haven't seen a JSON file since four years ago. Like I don't know. just ask my bots and it just fixes the things. So, for a while I went like full lobster mode. This is me at the first meet up in Vienna in a lobster suit. I made that logo. I actually made the open claw logo at 2 a.m. at night. Uh I like started wearing all of these lobster merch doing tutorials, podcast, guests, talking about all the use cases and blah blah blah. And finally, what I liked for someone who's been obsessed with to-dos and productivity since like 10 years old, I'm like the future is finally reachable. like all my files from Google Drive and iCloud and presentations I have and photos from high school and like all the things that I have like piled up and unfinished business ideas. I could see how open clock can just magically, you know, wave the lobster hands and just fix everything in my life. So I was immediately done with all the cloud models. I I went full hipster mode like no more Gemini, no more chat GPT, no more cloud. I wanted to fully I I got the power of finally owning the assistant, owning the files, owning the memory, deleting the sessions if you wanted to. So it's like it felt fully local. So naturally I started preparing all my data for agents. I went from the guy who was like always using cloud and stuff to annoyingly self-hosting everything. Everything has to come off the cloud. It has to be local on my NAS on my machine just so my agents can actually work on it. So these are still work in progress. The classic work in progress I'm going to finish one day. But I started moving to local hosted like nextcloud image local markdown for everything that requires a lot of API calls or MCP and whatever. I would rather just have it local to work on all of this locally. I went that far and I went back to Android. Like I feel like this thing in a way, you know, like enchanted me. I'm like, who am I? I don't recognize myself anymore because I wanted my agent to be able to read my notifications, clear my notifications, install apps, uninstall apps. It can do anything on an Android phone and on iOS it can maybe send you a push notification and if Tim Cook allows. I was planning to do like 10 15 more slides. Sorry for the flashbang there. Uh of use cases, but then they told me the presentation is exactly 18 minutes. So I did that one. It's on YouTube. It's on a bunch of podcasts. I don't want to talk about probably all of you have maybe even more use cases than me. But when we do like we do weekly meetups in the tinkerer club and we talk mostly about open claw and I love to ask this question when I ask them about which use cases do you have then ask them but which ones of them you cannot do with cloud code and codeex and immediately it just reduces by 90% because it's like uh yeah I can kind of do that with cloud code. So, I've been also asking myself like what is the value of like having like a package agent like like OpenClaw? I think that one-on-one chat with one agent sucks because if you think about delegating in your life, if you have like business and personal and family and blah blah blah, you don't want to have like one employee loaded with all the information about your life talking in like Telegram in a one-on-one chat about everything. So, more people started using Telegram topics. They started using Discord, Slack, and other stuff just to get organized. I like the idea of specialized agents which open claw supports but not a lot of people use them because basically they have like provider model level of thinking a system prompt or soul a list of tools and MCPS and a list of permissions. I like that this is like package and we're going to talk with this agent about fitness. Now people talk about LLM psychosis. I'm out here like going crazy like these are all of the bots that I created and I tried to like contain every bot to have a purpose in my life. Like some of them are for work, some of don't take photos of my chats. Uh, so now I ended up with I have five disc. The funny thing is like as I keep talking, keep in mind that my life is far from solved. It's never been more chaotic. I've never been late on on rent, on mortgage, on like customer emails. It's a mess, but it's a performative mess, right? So I ended up with five Discords and each Discord has many channels and threads and forum posts and nested thingies and blah blah blah. And then inevitably, I mean, you can sense this across the community. I says that across Tinkerer Club because in the beginning it was an explosion of signups of people joining the meetup. They're like, "Oh my god, weekly calls, we're going to crush the world." And now if you enter a meetup now, it's like five people and it's slowly turning into like um open claw anonymous and everyone's like, "Yeah, mine didn't do like the crown jobs, man. It drives a bit depressing, but I think we'll bounce back. We'll figure out like you know, we'll figure it out." Why is this happening? because it was and kind of is for me unreliable where it matters most which is like cron jobs multi- aents the agents talking to each other the agents forgetting like literally in the next message like huh what what are you saying and I'm like the message is above you just go one message above you this is getting fixed and it's getting updates every day but I I've yet to see that it's actually you know working this is not the open clause or any other agent's fault but Discord and Telegram were not meant for a life OS we're just molding them into something but they'll never be the right UI for you to manage your life fully. It's like cop in a way until we get to something else. We're going to use Discord or Telegram. And finally, as I would like to call them, Benthropic, they ruin the charm of it. Like as soon as you pull the model, talking to GPT5 talks feels like talking to a box box of oats. Seriously, it has the personality of this. Try this. It's like, okay, did you do that? No, but I told you to do it. Okay, I'll do it. Did you do it? No. Every conversation with Open Claw looks like that in the last and it drives me nuts. So, what now? Where do we go from here? I don't know how much time I have left. It says six minutes. Where do we go from here? I see like two futures like fighting for each other and I don't think that either of them is going to win in the long run. So, we have these custom agents like OpenClaw, Hermes, or whatever else is possible. Uh, and we have cloud agents because everyone is trying to grab a slice of the pie. Now, we have co-work and OpenAI is going to have a thingy and Perplexity is trying to make a thing and everyone is trying to make their cloud thing and those are the cloud ones. So the custom ones are never going to work because they're for tinkerers. And I'm telling you like in ticker club the people we have people who are building their own pinball machines. Talking about tinkers like they tinker with everything and everyone is freaking tired of like trying to make this thing work. Let alone people who have lives let alone people who have like busy lives and jobs and whatever else. No one will have time to tweak this. They would just like a served solution for them so everything works out of the box. Not me. I'm not I'm not going to be happy until I you know and then cloud agents I tried cloud co-work for like five minutes and I'm like this is too nerfed. This is not an openclaw alternative. It cannot do like even like 5% of the things that openclaw can do. So this is will be for the masses and but it won't satisfy the tinkers the people who want to self-host own the models and blah blah blah. So two directions here. What am I going to do like personally for myself and what I think is going to happen next in the actual like industry. I'm juggling currently between OpenClaw, Hermes, Paperclip. Is anyone using paperclip? It's like kind of this like cool like conbon linear like thingy for agents wasting a lot of credits. I'm trying plain timmax with codeex a lot of time. When you reach the peak frustration with the first three, you're like, "Fuck it." When you open the terminal, you're like, "Ah, maybe the agents are not that smart." So, I'm juggling between all of this and I'm using all of them daily. But it's like the hesitation that I have like I wanted to see where the location for the venue is and I had two options. open the website or go to Discord and I'm like, I don't want to talk to that box of oatmeal. You know, it's going to be like, yeah, I'll find the location in your email. Did you? No. Are you ready for it? It keeps asking you, are you ready for the thing you told me to do? It's crazy. So, I started making my own thing naturally. You can see the progression. It's never going to see the light of day. It's not for people. It's just an experiment to do it for me. I call it wolffer. And I'm not making it for mass appeal. I'm not making it for everyone to use it. I'm trying to like how can I make a tiny abstraction on top of like codeex or cloud code rest in peace. I I'm afraid to use cloud code because I might get arrested. So it's only on codeex for now and it's not extensible and it doesn't support a billion providers. So I'll start with the cons. What sucks you're forced to use the UI chat of the actual app and you cannot use telegram or iMessage or whatever. There's no support for any of this. It's absolutely the opposite of Open Claw and Hermes. It's not built with plugins in mind. It's the idea is to have everything in it. There's no memory system. I'm not really selling the thing, but none of these things are out of the box. It's not very modular. It's made by an ADHD squirrel brain that will forget about it by the end of the month. And it doesn't have open eye funding, and it doesn't have a cool lobster logo. These are the cons. But the pros and why I would suggest all of you to maybe dabble with this and try to make your own um or maybe eventually try mine if I ever release it for people. It has predictable conversations. And the UI that I made, you go to the Wolver app, like wolffer, whatever the URL is, and it has like predictable UI that's like made for multi- aent orchestration into like multiple topics, multiple conversations. Like everything was made for this purpose. It's not like you're taking Discord and you're trying to mold it to be for a certain purpose. And my favorite feature is because I don't believe in memory of agents. Like people are like, "Oh, we finally saw Mila, oh solved memory." I'm like, "No, absolutely she didn't solve memory." What I believe in here I have nested topics. So I have like work projects Benji Benji customer support. Let's say that's the nested tree. And when I'm talking to Benji customer support in the first prompt, it injects the description of all the parent prompts. So when I'm talking to Benji customer support, it doesn't need to pull from memory or some magical place. It just looks at the topic, the parent topic, the parent topic, the parent topic. It takes all the descriptions together and it immediately knows what is my work, what is Benji, what are my projects and how do I do customer support. And I can get more out of that than hoping from some memory system that's going to pull the right context out of the the right place. It kind of works for me. It supports workspaces. I can switch between workspaces. I hated that I couldn't see tool calls. I would like to see tool calls to collapse them, to uncolapse them, to see loading spinners. There's buttons for stopping the thing. I don't need to use slash commands. Uh the chron jobs are predictable. And when you get a chron message, it actually reads from the entire conversation and it labels it as chron. So it's not like where did this come from and why is the agent kind of lost. There's UI for managing agents which is like for my brain I really need it when I chat in a topic on the right side I see that the agent is like Chandler and he has this model and this capability. So it really helps me to know who am I talking to and just tweak and be like no no no you don't need that capability. Boom it disappears. Um I would have included screenshots but the app didn't work cuz it's on my Mac studio at home. It's a long story but imagine the screenshots. is kind of cool and I like that you can like there's like a knowledge base and documents that you can write markdown documents in the thing and you can add them because in Discord you can only add other members. There's no dynamic ad to mention something else and here I can mention for example hey let's fix the landing page of Benji just like and then I would add the landing page of Tinker Club for example or I can add a knowledge base or a password or a skill so I can combine multiple ads so I I give it the right exact context that it needs uh for the actual thing. What I think is going to happen next because this is definitely not going to be a mainstream thing. What's going to happen next in the entire agents and industry and what are people going to do? This is my prediction. I think the way we use computers right now is absolutely insane. Does anyone agree with me? And have you finally got this? Like when you open your computer like computers shouldn't be this way. One person, two Okay, we have a lot of people. Like I open my computer after a few hours it greets me with 17 updates for apps I haven't used in a while. And it greets me with like tabs that I had open since yesterday. like how I imagine in the future it would need to ingest all the information about my life like notifications and emails and everything and to-dos and everything that's happening in my life and depending on how far away I've been for from the computer it should greet me with the next task to work on and then the next one and the next one and it should maybe give me a break and be like hey enough let's do this let's do that so in a way I think the role of AI is going to inverse so the way we prompt the AI right now I think it's going to inverse and the fully productive people will be the one who delegate 99% of the stuff for to the AI and then the AI prompts you. It's like, hey, you didn't send me a picture of your passport or, hey, what do you want to do? You basically do decisions and you basically click like forms or you answer questionnaires or whatever it is, but in the background, there's something constantly working for you instead of you prompting it all the time. I agree with this sentiment. People are like, "But my grandma will never vip code." That's 100% true because I think where we're going, we're actually not going to need most consumer apps. know your grandma, your mom or your friends are not going to VIP code, but they'll be able to sit in this new futuristic OS and they'll be able to do any task that they want to do like either the a the UI is going to pop on the fly or whatever it needs, but they'll be doing task and they'll forget about I need an app to do a task. They'll just do it. A small set of apps will survive, but it will be software for like specialists and people I don't know who are doing like color grading or some movie making or music making where they actually need a software. But normies will just chat to their computer and their computer will do things and the UI will generate on the fly. Uh I also think it would be the funniest thing if Apple wins all of this because local models are getting insanely good and they're going to get even better this year and next year. And I think for most normies, for most people, they'll be completely fine with a local agent like Siri getting tool capabilities from all of their locally installed apps, not wasting any credits. Their data doesn't go anywhere and their phone magically is doing things. the latest Google Pixel can already launch your apps in the background and order coffee and do a bunch of things for you. So, I think that's where everything is going. So, I'm over time. Thank you for listening to my rant. Hopefully, we can discuss afterwards and thank you very much. Thank you. In the age of AI, do software fundamentals matter anymore? Our next presenter argues that they matter now more than ever. Please join me in welcoming to the stage engineer and educator at AI Hero, Matt PCO. Wow. Hello everyone. Having a good conference so far? >> Are you having a good conference so far? >> Good. Wonderful. I have a message for you that I hope will be um a comforting message for folks who believe that uh their skill set is no longer worth anything in this new age, which is I believe that software fundamentals matter now more than they actually ever have. And I'm a teacher and I've been recently teaching a course called Claude Code for real engineers. Nice and provocative. And in the process of kind of working on this course, I had to come up with a curriculum about AI coding, which is a bit of a nightmare because things are changing all the time, right? AI is a whole new paradigm. We need to chuck out all of the old rules surely so that we can bring in the new stuff. And there's a kind of movement that has come up around this, which is the specs to code movement. And the specs to code movement says that okay you can write a specification about how an application is supposed to work then you can use AI to turn it into code. If there's a problem with the application you then go back to the spec. You don't really look at the code. You just change the spec. You run the compiler again and you end up with more code. Raise your hand if you've heard of that. Keep your hand raised if you've tried it. Okay. I've tried it too. You can put your hands down. And what I noticed was I would run it and I would try not to look at the code, but I would look at the code and I realized I would get code out first of all and then I would run it, I would get worse code. And then I did it again, I got even worse code and I got it again. I kept running the compiler, kept running the compiler and I would just end up with garbage. You know, raise your hand if that's happened to you. Yes. I don't think this works. the idea that we can just ignore the code and just have the code let it manage itself is just sort of v coding by another name and I didn't believe that back then I thought okay how do I fix the compiler how do I make it so that it doesn't produce bad code each time or worse code and so I thought okay I need to explain to the LLM in English what a good codebase looks like let me dig out one of my old favorite books which is a philosophy of software design by John ouster go on Amazon get it? Um, and he has a definition for what bad code looks like. He calls it complex code. Complexity is anything related to the structure of a software system that makes it hard to understand and modify the system. Right? So, a a bad codebase is a codebase that's hard to change. If you can't change a codebase without causing bugs, then it's a bad codebase. Good code bases are easy to change. So, I thought, oh, that was good. Let's try another book. Let's try the paragmatic programmer. Go on Amazon, get it. They have a whole chapter on something called software entropy. And this is exactly what I was seeing. Entropy is the idea that things tend towards um disaster and uh floating away from each other and collapse. And this is exactly how most software systems behave too is that every time you make a change to a codebase, if you're only thinking about that change, not thinking about the design of the whole system, your codebase is going to get worse and worse and worse. And that's what I was seeing. Everything inside the specs to code idea that you just run the compiler again and again was making worse code. Now there's an idea that sort of drives the specs to code movement which is that code is cheap. Raise your hand if you've heard that phrase before that code is cheap. Yeah. Well, I don't think this is right. I think code is not cheap. In fact, bad code is the most expensive it's ever been. Because if you have a codebase that's hard to change, you're not able to take all of the bounty that AI can offer because AI in a good codebase actually does really, really well. And this means good code bases matter more than ever, which means software fundamentals matter more than ever. That's the thesis of this talk. So, let's actually get into practical stuff. I'm going to talk about different failure modes that you may have experienced or you may not have experienced yet with AI and how you can avoid them by just going back to old books and looking at good software practices. Sound good? So, the first one is that the AI didn't do what I wanted. You know, I I thought I had a good idea in my head and the AI just did something totally different or it did some uh like specs that I you know, it just made something I didn't want. Raise your hand if you've hit this mode. Cool. Okay. Well, this is what they say in the pragmatic programmer is that no one knows exactly what they want. Is that you and the AI, there is a communication barrier there, right? And so when you're talking to the AI, that's kind of like the AI doing its requirements gathering. It's basically working out from you what it is that you need. And I realized that there was another book, Frederick P. Brooks, the design of design, and it talks about this idea called the design concept. is that when you have more than one person designing something together, you have this idea sort of floating between you, this ephemeral idea of the thing that you're building. And that thing that you're building or the idea of it is called the design concept. It's not an asset. It's not something you can put in a markdown file. It is the invisible sort of theory of what you're building. And so I thought, okay, that's what's going on. Me and the AI don't share a design concept. So I came up with a skill. The skill is very very simple. It's called grill me and it looks like this. Interview me relentlessly about every aspect of this plan until we reach a shared understanding. Walk down each branch of the design tree which is another thing from Frederick P. Brooks resolving dependencies between decisions one by one. This skill is like uh the repo containing this skill has like 13,000 stars or something like it just went nuts. Went viral. People love this thing. it. These couple of lines means the AI asks you like 40 questions, 60 questions. I've had it ask people a hundred questions before it's satisfied they've reached a shared understanding. And it means it turns the AI into a kind of adversary where it's just continually pinging you ideas and trying to reach a shared understanding. And that means that the conversation that you then generate, you can take that and turn it into a product requirements document or something. or if it's a small change, you can just uh do turn it directly into issues and then your AFK agent will then pick it up. And don't at me on this, but I personally believe this is better than the default plan mode in the tool that I use, which is claw code. Plan mode is extremely eager to create an asset. It really wants to uh just create a plan and start working. whereas I think it's a lot nicer to reach a shared design concept first. So that's tip number one. Now failure mode number two is that the AI is just way too verbose. It's like you're almost talking across purposes with the AI. Raise your hand if you uh feel this. If you ever experience that failure mode. Yeah. It's kind of like the AI is like talking just using too many words to try to communicate what it's doing. It's not like you're talking uh using the same language. And this to me felt very very familiar. Right? If you've ever been a developer for a long time and you've worked with let's say domain experts, someone building an application, um let's say the domain expert wants you to build something on uh I don't know microchips. You have no idea what microchips are. You need to establish some kind of shared language, right? Because otherwise they're going to be using terms you don't understand. You're going to be translating that into code that maybe you don't even understand and certainly the domain expert won't. And so there's this kind of language gap between you and the domain expert. And so I went back to domain driven design. DDD, this is something I'm still kind of on the edge of exploring, but everything I'm reading about DDD is just music to my ears. I freaking love it. And DDD has a concept of a ubiquitous language. With ubiquitous language, conversations among developers and expressions of the code and conversations with domain experts are all derived from the same domain model. It's essentially a markdown file full of a list of terms that you and the AI have in common. And you really focus on those terms and you really make sure that they're aligned with what it actually means and you use them all the time in the code when you're talking about the code when you're talking to domain experts or in our case when you're talking with AI. So I made a skill. This skill is the ubiquitous language skill. Basically just scans your codebase, looks for terminology, and then um creates a markdown file. Creates the ubiquitous language markdown file. A bunch of markdown tables with all of the terminology. And this then I pass it to the AI and I'm able to read it to and I actually have it open all the time when I'm grilling with the AI and planning and that. What I noticed by reading the thinking traces of the AI, it not only improves the planning, but it allows the AI to think in a less verbose way and actually means that the implementation is more aligned with what you actually planned. So this has absolutely been a powerhouse. It's been unbelievably good. So that's tip number two. Create a shared language with the AI. So okay, let's imagine that you've aligned with the AI. You know what it is you're supposed to be building. the AI has built the right thing, but it doesn't work. Raise your hands if that's happened to you. Yep. Just doesn't work. Well, there's an obvious thing that we can do to make that better, which is we can use feedback loops. We can use um static types. You know, if you're not using TypeScript, u that's crazy. Uh if you're not using uh if you're building a front-end app and you're not giving it the LM access to the browser so it can look around, absolutely needs that. And you obviously also need automated tests. And one sort of thing I notice here is that even with these feedback loops, the LLM doesn't use them very well. It doesn't kind of like get the most out of its feedback loops in the way that a veteran developer would. And so it does what it tends to do is just does way too much at once. it will produce the huge amounts of code and then think, "Oh, I should probably type check that actually or I should uh yeah, maybe check a test on that or maybe do something like that." And this in the pragmatic programmer they describe as outrunning your headlights as essentially driving too fast because the rate of feedback is your speed limit. The rate of feedback is your speed limit, which means that you should be testing as you go, taking small deliberate steps. And the AI by default is really not very good at that. And so skill number three is TDD. You should be using testdriven development because TDD forces the LLM to really take small steps. You create a test first. You make that test pass and then you refactor the code to make it nicer and consider the design. The issue here is that testing is really hard. Testing has always been hard. And the reason for that is there are a ton of different decisions you need to make when you write a test. You need to figure out how big a unit do you want to test. You need to figure out what to mock. You need to figure out what behaviors do you even want to test in the first place. And all of these decisions are dependent. So if you are testing a really big unit like an entire massive application, then it might be quite flaky. You might not want to test that many behaviors. you know, if you only test this unit, you need to mock this unit. You know, it's all interlin. And I've been thinking about this for years for my entire development career. And what we notice is that good code bases are easy code bases to test, right? So, here we're starting to get back to the idea of code being important is that the better your codebase is, the better your feedback loops are. Because you're able to um give better feedback to the LM, it produces better code. And so I thought what does a good codebase what does a testable codebase look like? Again we go to John sterout. He talks about having deep modules in your codebase. Not shallow modules not lots of modules that expose like kind of um lots of functions. They should be relatively few large deep modules with simple interfaces. Let's compare them quickly. Deep modules, lots of functionality hidden behind a simple interface, hiding the complexity. You can look inside the deep module if you want to, but you don't need to. You can just use the interface. Shallow modules, not much functionality, complex interface. And I'll just wait for you to take the photos. Shallow modules in a codebase kind of look like this, where you have a ton of different tiny little blobs that the AI has to walk through and navigate. And this is really hard for the AI to explore actually. And so often what you'll see is if you have a codebase like this, which AI is really good at creating code bases like this is that you'll have a situation where AI doesn't understand what your code is doing. It will attempt to explore the code, but because it's poorly laid out, filled with shallow modules, it doesn't maybe get to the right module in time or doesn't understand all the dependencies, all that stuff. It doesn't understand your code. And so what does a codebase full of deep modules look like? Well, it looks like this where it's the same code, but it's just structured inside boundaries where you have these interfaces on the top. And these interfaces, you should probably have a lot of control over them and design them really well. Otherwise, you know, AI might mess up the design. But the implementation, you can kind of leave that to the AI a bit. So, how do you turn a codebase that looks like this into a codebase that looks like that? Well, I've got a skill for that. Improve codebase architecture. Turns out this is not it's quite complicated to do this, but it's a like a set of steps that you can reusably do again and again. You just sort of explore the codebase, look for opportunities where there's code that's kind of um related, and wrap all of that in a deep module. And this is a testable codebase because the boundaries around this code are so so simple. You test at the interface, you verify using that interface and you're good to go. And so this is a codebase that rewards TDD. But how about failure bone number six, which is your okay, let's say your feedback loops are working. Let's say that things are kicking into gear. You're able to ship more code than you ever have before, but your brain can't keep up, right? Uh, raise your hand if you felt more tired than you have ever before in your development career. Yeah, me too. It's knackering. And I think that this is a codebase that actually makes it harder for your brain because you as well as the AI need to keep all of that information in your head. Whereas this, not only is it simpler for you to read and understand, it also means you can kind of treat these modules or these deep modules as gray boxes. you can kind of say okay I'm going to just design the interface but I'm not going to worry too much or not review the implementation too much. You can do this obviously with uh things that are less critical in your application. Can't do this with uh you know various things like finance or whatever but in many many modules in your app you don't need to think about the implementation too much as long as you have a testable boundary outside the module and as long as you understand its purpose and can design it from the outside. I have found this has really saved my brain because I can just go okay the AI I'll let you handle what's inside the big blob. I'm just going to test from the outside and verify it. So that's tip number five. Design the interface delegate the implementation. But this means that whenever we're touching the code, whenever we're planning stuff, we need to think about and be aware of the modules in our application. We need to know that map really well. It needs to be part of our ubiquitous language. We need to build it into our planning skills as well. So my writer PRD inside the PRD I'm specific about the module changes and the interfaces inside those modules how they're being modified. I'm thinking about them all the time. And this comes from Kent Beck. Invest in the design of the system every day. And this is the core of it right because specs the code we are not investing in the design of the system. We are divesting from it. We're getting rid of that. Whereas this I think is absolutely key. And so code is not cheap. That's the message I want you to take away. Code is important. And if we think about AI as a really great on the ground programmer, a kind of tactical programmer, a sergeant on the ground making the code changes, you need someone above that. You need someone thinking on the strategic level and that's you. And that requires software fundamental skills that we've been using for 20 years for longer. Now, if you are interested in any of the skills I put up here, it's in the GitHub repo, Mac Pocco skills. And if you're interested in the training that I do or uh any free stuff, I'm on YouTube, I'm on Twitter, but I'm also at aihero.dev where I have a newsletter that you can check out. Thank you so much. I hope that this gives you confidence in this new AI age that you can actually make a good impact. Thank you. Our next presenter created PartyKit, the open-source tool for realtime multiplayer apps. For his day job, he builds AI agents at Cloudflare. Please join me in welcoming to the stage Sunil Pi. Let me uh 20 minutes to the pub. Uh hi uh my name is Sunil Pi. Uh I work at Cloudflare. Uh I build agents over there uh for the agents SDK. I'm trying very hard for this not to be a Cloudflare talk, but I think we are on the sponsor board. So that's nice. Uh this is a talk about something we call code mode. Uh I've been wearing the hat. Uh and uh there's some prior art to it. We don't claim to have invented it, but this is a talk about the implications of something new that we we're discovering. So um you guys have built uh AI applications and tool calling gets weird at scale. When it's just a couple of tools and very short runs, it's fine. But the moment you start stuffing in uh your Google services, your Jira, your wiki, etc. And you're like hundreds hundreds of tools filling up the context, it starts breaking. Um and the composition is weird. And there's this back and forth that you have to do with uh the model that's really slow. Uh we decided to take a different tact. Instead of doing this JSON back and forth thing, we asked the model to generate code, usually JavaScript that we could run against an environment. Uh and some of the benefits seem a little obvious to us. Uh with code you get a typed API, you can do type checking. There are syntax errors. Uh models are trained on gigabytes if not terabytes of data already in the training set. Uh and instead of doing this back and forth, you could write code that executes it all in one run just one execution. So uh so this is what I mean like there are uh fundamental capabilities of code. You're able to do looping. You're able to hold state. Uh you're doing sequencing, paralleliz parallelization, things that you would normally do with code anyway as an engineer. So the first place we applied this uh my colleague Matt Kerry who's actually going to be speaking about this a little more tomorrow. You should watch his talk. Uh the Cloudflare API surface is about 2600 API endpoints. If we exposed a tool for every single one of them, it's about 1.2 million tokens in your first call. Like it just blows. There's no way to create an MCP server for the entire Cloudflare API surface. And he had a very clever idea where he exposes just two tool calls uh search and execute. Both of these endpoints accept code as an input literally a string of code. for search. The input to the function that you pass to it is the entire open API JSON spec. And once it does that, execute gives gives you a whole bunch of functions that you can call against the things that you call. And it reduced that 1.2 1.5 million token thing down to a thousand tokens. Kind of unheard of. I think it's like 99.9% reduction. Uh this is going to be scary. I actually h have a live demo of this and uh demos don't usually do me well on stage but uh but the point being that we were able to take a wide super wide API surface and make it incredibly fast. Uh the prompt itself can be uh fairly generic. So I should have kicked up the font size on this one. The prompt here is as a customer you come in and say we are getting dodoed. I want you to find every offending IP that's like attacking us and block them in a moment of panic when your website is going down. You don't have the time to do menu diving. Uh the Cloudflare dashboard is famously a little cumbersome to handle. Uh and you just want the thing done and you can't even get an AE. It's like 3:00 in the morning. Uh with a regular MCP thing. And this isn't even talking about stuffing 1.2 2 million tokens. It would be about eight round trips to do each of those API calls. Instead, the model can generate this string of code, run it immediately right next to the API surface and do it in one shot. And it's just running JavaScript. It's just functions and u just things that you're exposing on the API surface. Okay, live demo. This is a demo of our mythical server. Uh, I hope I'm logged in because if I'm not, I'll need all of you to close your eyes while I enter a password. Let's say I just want to like list my workers. Oh, there it is. List my workers. I say send. Okay. Okay. And there's no password required. Okay, fine. That's fine. Okay. I give it only readonly access for this demo. Uh, do the thing. Yes. Allow. Sure. Whatever. Nice. Okay, it comes back and uh you'll see it'll start executing tool calls. I should be able to open this up. It has sent saying, "Hey, find me all API endpoints that just say the words list workers or something like that." Uh it then runs code uh which hey, yeah, it's like one single request for the API endpoint to get all the workers. Uh it must have received a whole bunch of these. It's actually going through JavaScript errors. No, this is going to be fun to see if it actually succeeds. Yikes. Oh, is it trying to do it like per It's trying to pageionate through the thing. Assume that this worked anyway and I'll keep talking while it does this. Uh, love that this is happening to me on stage because I did test it 10 times before coming on. Uh, I need to pay for the Mythos uh model to make this work accurately. Uh by the way you can actually see it is actually like listing workers over here. It might just be having trouble uh rendering it over here. Um the point being uh we are able to shrink that down. Now if this was a talk about optimizing MCP servers I would be done and dusted. I was like hey you should throw this and trust me it works when you're not staring at it and have 800 people looking at you on the stage. But it did give us an idea that there's something deeper going on here. the ability to like run this code and uh it feels like there's a new way of interacting with systems with LLMs. Um here's what I think like everyone here is a programmer and I give you a problem statement like you have 200 photos on your desktop. I need you to categorize and rename them. First thing you do is you you're going to open up an ID. You're going to write a little script. Maybe you're going to pass every image to a vision model now because you get a nice caption for it. Uh rename it and you're done and dusted. That is how you interact with systems. Uh my mother's not going to do this. Her options are to well call me up or usually like buy an app either desktop phone and no one's made an app that does exactly just that. there's going to be like lowest common denominator apps for photo management and it's $7 a month and for some reason you have to install a damon which is stealing your crypto or some such stuff. Uh and there's been this dichotomy and it's fine like until now this has been an acceptable uh this has been an acceptable tradeoff that non-technical people will have custommade interfaces built for their needs and desires. LLMs are breaking this boundary. They every human being on the planet now has access to a buddy that can spit out code that can interact with systems. Uh it takes it takes a line like rename these files by date and location and generates code and can run it on your uh on whatever system you expose to it. Uh I say executed safely here and that's the bit that I do want to talk about in a minute. The other example I have, so this is Kenton. Kenton is the creator of Cloudflare Workers. Uh famously I'm so he does the work and I like taking credit for his work. This is our relationship in the company. Uh so he he had a thread a little while ago where he built he's built a little wide coding environment for himself because no one else does that in the world right now. So unique build your own little wipe coding thing. Uh the the thing he asked it to generate was a canvas. one of these TL draw excalid draw style canvases. Uh, and it did it did a little canvas with little brushes and colors and the first thing Kenton did was draw a tic-tac-toe board on it with a little X in the corner. This is the finished state and I'll get to that in a second. He did that and uh what he told the model then is I want you to play tic-tac-toe with me. The model, as you can guess, it started generating a tic-tac-toe app. Okay, Kenton stopped it immediately. He's like, "No, you have access to the entire state of the system." And the state of the system here is an array of strokes, you know, like just a whole bunch of points, grid line, grid line, xstroke, etc. It said, "Inspect that and play it with me." Uh, immediately the model started. It output the state into its own context and it's like I recognize what this looks like. It looks like a tic tac board and I can see that you put an X in the top left. Let me draw a perfect circle in the middle of the app. To be clear, there is no tic-tac-toe code anywhere in this system. The the emergent behavior is that the model has like sure I now know how to interact with the system with a set of strokes. Uh, also it lost uh, by the way, it lost the game and then when we saw the reasoning traces, we noticed that Opus let Kenton win, which is a whole other weird area of alignment we're not talking about. Anyway, so this actually generated a lot of conversation internally and that's why like this talk is a little weird. It's a little woowoo. I'm not even sure where we're going and I want to like spread the idea to you and have you folks like integrate it. So the the phrase we have started using is it stopped generating a program and it instead started inhabiting the state machine. Uh there's a ghost in the shell reference here for anyone who's over the age of 40. You need ibuprofen. Uh you should go back home. Uh but no like it was a very strange thing to for us not to have a separate app generation stage that you then like interact with. That is entirely the part of the thing. So what does this new software architecture look like? uh everyone's building what they call a harness. Uh it's because over the last 3 to six months, everyone has realized that these coding agents are great general purpose computing machines. It's why they're running cloud code. No, they're running Pi on a Mac Mini, which is the wrong machine for this. By the way, you don't have to spend $400 for a thing that makes API calls. It's been driving me mad. If you check, all the secondhand prices of Mac minis have like shot up. I got one before it, but I bought it because I'm special that way. uh you everyone's building this harness and this architecture of the harness is not just that it can generate code but it has a safe space to execute this code into which capabilities are uh exposed. Uh and there are some attributes to this sandbox. We're calling it a sandbox which is again another completely overloaded term and I have friends in the industry. Everyone's building a different kind of sandbox. uh we have a sandbox SDK which uses containers and VMs but that's not even what I'm talking about right now. Um there are some capabilities to it unlike a container which comes with all sorts of features that you surround with security. You know you do a bunch of things from the outside. You start with something that has no capabilities. The only thing it can do is execute code. It can't do fetches. There's no exposed APIs, no nothing. And then you grant capabilities to it explicitly. Uh we have something called dynamic workers. I told you it's not really a Cloudflare code. Someone else builds something better. If you think it's better, it's fine. Uh, but this is what we use. We use V8 isolates because they start up really really quickly and it's about 10 years of security hardening. Uh, it's in our DNA. We we care a lot about that. Anyway, so we you start exposing capabilities as APIs. A and we also can control all outgoing fetches and any network connections. In fact, the default way we recommend you use this is no outgoing fetches, only APIs. It has to be fast and you need absolute full observability into it. You need to know why last Tuesday it made a trade for $2.3 million for I don't know man like llama poop or something right? You need to go back to that code. You need absolute observability on these systems. It can be V8 isolates like we use. Uh you could use I don't know a web web assembly a custom JavaScript interpreter. Uh that's not the main story here. You just want something that's able to execute, that you're able to expose capabilities to and run really quickly. From here, you can start getting really ambitious. The example that I showed you was a oneoff take some code, run it on an API, expand. Now, what if you could uh generate longunning workflows that run for days, months, years? Uh what if each of those instances has some state that it can carry with it uh through its lifetime? What if in this world of generative UI, you can start generating a perfect perfectly custom UIs for every single user that you have. Everyone who does e-commerce knows this problem. The more popular you get, the more UI becomes this bland thing that has to work for every single user. And then you bring in the ML people and like, oh, what if we change the color button this way if it's somebody else? No, you can go absolutely custom. So, u I I like the fact that I got Opus to generate generative UI for a slide where I'm making a point about generative UI and it still looks a little bit like Uh but the idea is everyone like e let me talk about that e-commerce like you have context about everything about the user, the things they like, the orders they have in their cart, the things that might be making them mad. You can surface these things as actions. The UI doesn't have to be a blank chat box. Though honestly blank chat box e-commerce might be a lot of fun. Uh here I have two different use cases. In the first one it's uh I need to return these shoes and find something similar under $100. If the product engineers have not implemented this how are it's going to kind of suck in but you can generate something on the fly versus what is happening with my uh delayed order. Point being, we are now in a world where we can generate completely different programs backed by a system that you built on your back end for every single user. It's a new kind of software we're building. And this harness idea isn't just built into the product. A lot of people are finding power by running the harness closer to the user simply because then they get to start mashing up all their different services. This is an anti-Cloudflare talk at this point. And I'm like, you should be running the software on your iPhone, like not so much on our servers. Please run it on our servers. Uh, but you, but there you start getting to stitch together different systems in this safe environment. And you get to do it on a taskbytask basis. Um, I put this in here because I'm a React programmer and I don't want to freak out the React people by saying no one really wants to build UI anymore. But really it's a hearkening back to rethinking everything that we have thought about UI and for this new age. I keep thinking about it as part of the tech tree. We have not really explored for 30 years because eval wasn't around. But now we have a safe eval and we have these things that generate code for you. But you do need to be in a place where you understand that your next billion users are these little robots that are generating code for you. To be clear, your customers are still humans. things interacting with your systems. Uh if you really love your users, you need to find out where they hang out. And they don't hang out in the pub. They hang out in registries. They dream in types and syntax errors. You know, uh you need to be thinking about what is the developer experience for these agents. This is something a bunch of companies are already doing really well by the way. You know, docs which are marked down, uh errors that let the agent know what to do next, uh discoverability via search. The big one that I do want to talk that I want you to embed in your head I guess is this idea of capability based security. This isn't even a JavaScript talk. It can be in Python. It can be in Wasm. Uh I hope it brings a resurgence of lisp. It's how I kind of learned how like as work. It kind of breaks your brain. Uh but the but the attributes are still very much the same. events, sandboxing, capability based security, embeddible so that it's really fast to start up and run ephemerally. Uh React programmers simply be well UI programmers simply because they have so much they've been so close to users. I suspect that they'll do particularly well here and that feels really good to me by the way. I feel happy about it. So to end for the longest time programmers like us, we got code. We had infinite power to interact with any system that we could and complain about it on Twitter because our documentation isn't have the right CSS or something. JavaScript programmers super entitled by the way. Uh, everyone else got buttons and forms. That distinction in breaking in a world like this, you need to let the code do the talking. The code is the thing that interacts with all your systems. Uh, come talk to me about it at the pub. Like this is like it feels like it's opening up a whole new area of research for us. uh and we have a lot of ideas and I get to finish my talk and the day with six seconds left. How good is that? Thank you very much. Appreciate it. >> Ladies and gentlemen, please welcome back to the stage Phil Hawksworth. >> Okay. You thought you'd seen the last of me and you almost have. Um that's it. That's uh that's day one of the conference done and dusted. I hope you've enjoyed it just as much as I have. I think it's been amazing. Um tomorrow I'm not MCing for you. I'm out of your hair. You have the rather wonderful Dejas Kumar uh who's going to be um MCing. He's fantastic. You're in safe hands. Cherish him. Uh he's great. Um, a couple of little things that you might want to know about before you jet off. Tomorrow we're going to be starting. I think it's the same routine as today. 8:00 a.m. out there, there'll be breakfast and nibbles and what have you. And 9:00 a.m. we'll be getting started in here uh in the same in the safe hands as I say, tas. Um, uh, and then we'll be we'll be off and round again. I would like to say before I I I say goodbye, uh, what a huge day it's been. So much incredible content. Should we have a round of applause for all the incredible speakers you've seen today? My um my my brain is quite full now. I've been I've been challenged by a lot of things. I've been really inspired by a lot lot of things. The the last two talks uh just because they're fresh in my mind really really have landed very nicely for me. They've been very useful. So I hope you've taken away something uh incredible. I hope you've had good interesting chats with your uh fellow attendees and people at the stalls and and the speakers and all the like. There is chance to do more of that now because we're we don't we're not getting kicked out. We get to go and enjoy the space there. There are refreshments. I I hear whispers. There might be beer for those that like beer. Other things are available as well. Um so uh go and uh go and check that out there. I think it's until 8:00 we have the space and we can uh continue our conversation. Uh, also just keep in mind there are various side events around uh, I mean know they've been happening already and there's more tonight and I think there might be some more tomorrow. Keep an eye on the website for details about side events. The the various sponsors and partners uh, have put those on. I think typically they're free but you usually have to register. So keep an eye out for those because there might be other things that you might want to get involved with. Um, okay. I think that's it from me. I hope you've had a good time today. I hope to talk to you out there in a few minutes. Enjoy your day tomorrow. Thanks very much. Thanks.