Agentic Engineering: Working With AI, Not Just Using It — Brendan O'Leary
Channel: aiDotEngineer
Published at: 2026-04-07
YouTube video id: BEKc4P87XKo
Source: https://www.youtube.com/watch?v=BEKc4P87XKo
Let's talk a little bit about what I mean by agentic engineering, and let's start with a question. If I were to ask you right now, how are you using AI in your work, could you actually explain it? Not just "it helps me code faster" or "it can write code really fast," but the real workflow: what you hand off, what you keep, how you decide in between. Most engineers can't, and that's a little wild to me, because 90% of engineers are already using AI tools or have used them. Maybe only half of them use them regularly, but that number is definitely growing all the time. And that's the current state. So the question isn't whether your team is using AI; they are. The question is whether you're getting the most out of it or just autocompleting your way through the day. That gap between using AI and being able to articulate how you work with it is what this talk is all about. And really, I think it represents a paradigm shift in how we think about AI. The history of AI in software engineering is moving very fast. It's also surprisingly short. In the early 2020s, we got tools that could finish lines for you: you'd type half of a function signature and the model would guess the rest, kind of like autocomplete on steroids. A neat trick. Then in 2022, models started to be able to suggest entire functions: you could describe what you wanted, chat with a model, and maybe get a working implementation back. This is where GitHub Copilot first came on the scene and broke through, and millions of developers started using it. For the first time it started to seem like maybe AI wasn't a novelty; maybe it was genuinely useful. But then in 2025, something really broke open. It's what we're living in now in 2026. The models don't just suggest, they execute.
They can take a task, break it down, figure out which files need to be touched, make the changes, run the tests themselves, and then come back with an actual pull request. That's not just fancy autocomplete. It's not just a faster horse. It's a collaborator. It's a different way of working. And Armin, the creator of Flask for the Python folks here, put it, I think, perfectly: we're no longer just using machines, we're now working with them. That framing captures this real shift. Tools are things you pick up and put down. You use a hammer; you don't work with a hammer. But the AI coding agents we have today are somewhere in between, and maybe a little more like working with another engineer. It just happens to be an engineer who's read every Stack Overflow answer ever written. And I think that requires a mental-model shift. This is the mental model I want you to carry through the rest of this video, and honestly through the next couple years of your career working with these tools. I do think they're still tools, but we have to think about them differently. You have to think about your AI agent as an energetic, enthusiastic, extremely well-read, often confidently wrong junior developer. That junior developer is incredibly fast. They don't get tired. They have no ego about their code; they'll happily rewrite something six times if you ask them to. And they have an astonishing breadth of knowledge: they've seen lots of languages, lots of frameworks, lots of patterns. But, and this is critical, what they don't have is judgment. They don't know your business context. They don't understand the reasons why you made that very specific architectural decision three months ago. And they'll confidently write code that is technically correct and contextually wrong.
Armin also said that he's gained more than 30% of his day because the machine is doing a lot of the work. That's a real gain. But he's getting that 30% because he knows what he can hand off and what he has to keep for himself. He's not blindly accepting every suggestion; he's directing the work. That's the difference between using AI and working with AI, and that's what agentic engineering actually means. So let's get tactical. If you're an engineer, how do we get really good at this? I think the number one thing to think about is context engineering. Karpathy describes context engineering as the delicate art and science of filling the context window with just what the agent needs to have the right context for the next step. I think that's critical for a couple of reasons. First, context is expensive. Every token you add into the context adds cost, because the whole chat history is sent back in as input tokens every time you send a message, and that can add up pretty quickly. The other key is that more context doesn't always mean better results; in fact, it can make the model dumber. It's not just about the money: quality can degrade once you get over about 50% full. And there are lots of things that can trap you here, not least the fact that MCP servers became so popular that we have a lot of them enabled all the time now. Each one loads more and more input tokens into the context, and that can be a real problem if you start getting into this dumb zone around 50%. And that isn't the only problem, because not only can too much context hurt you, bad context can poison everything.
This happens when you're mixing two different tasks that didn't really overlap, or when you've got outdated comments, either in the code or in what you've said to the agent. Or even worse, and I've seen a lot of people do this, they start walking down a road with an agent, realize they're on the wrong path and have made a lot of wrong decisions, and then try to steer the agent back. The problem is, again, the agent is not doing real reasoning like you and I as humans. It's taking in all that context every time, and it may get lost in the middle, or still treat some of those earlier wrong turns as part of the context, and you see those negative patterns creeping back in if you're not careful. That's why it's better not to let these things compound, and to always start a new session once you realize things are off the rails. Because not only is context expensive, more of it doesn't always mean better quality; past a tipping point it means worse quality, and bad context can corrupt the output. So the critical thing for engineers is to manage the context. What does that mean? One, it means persisting a lot of information outside the context window so that we can bring it in as needed: scratch pads for things we're working on, memory files, the agents.md, those kinds of files that give agents context on what you're working on. We also need to be very selective when pulling in that context: only pull in what's relevant for this step of the problem, not everything that might be useful. That could mean bringing in the right at-mentions for files we're referencing, or making sure we don't have unnecessary MCP servers enabled.
It also means making sure the agent has the right data, and that we as humans have curated that data for the agent. Then, as the window gets bigger, we want to summarize, trim, and compress that context. If we've gone through a whole big deep-dive debugging session with the agent and now we think we understand the problem and the solution, great; it might be time to compress that context and focus the agent back in: okay, now we understand this problem, we're going to go fix it. The other most important thing is to isolate context. I think this is why we've seen such a huge rise in parallel agents in the past six or eight months: splitting work across several agents or sessions keeps things from accumulating and really drives task separation. And if you think about it, aren't these all the same things I would tell a brand-new engineering manager about managing a junior engineer? The story I tell here is from early in my career, when I spent a lot of time as an engineering manager and product manager before I went into the dark arts of developer relations. In my first job ever as an engineering manager, I was at a healthcare software company, and there was this new thing coming out called an iPad, which dates me a little bit. It was released in the market, and we thought this could be a great place to collect patient history, you know, that form you have to fill out every year at the doctor. It's very critical to assessing a lot of your risk of disease, but having to fill it out from scratch every time is not fun. So I designed, in this other archaic tool some people may have heard of called Balsamiq, basically a wireframing tool, a wireframe of what this would look like.
Now, that wireframe used things like Comic Sans and silly smiley-face icons as placeholders, and a lot of other stuff you'd expect from just a wireframe. I handed it to a set of interns working for us that summer, thinking this was a great greenfield project for them to take some time on. A few weeks later I got back a working prototype, and the font was Comic Sans and there were silly emoji placeholders, because that's what the spec had in it. So whose fault was that? Obviously it was not the interns' fault. It was my fault as an engineering manager for not giving those junior engineers the right context: what's important, what's not, what we really need to focus on, and what problem we're solving. So the habits that tie all of this together: you don't need to think about all four of those things for every task. Just do one task per session, keep an eye on your context meter, and if you're in doubt and it feels like things are off the rails, you're probably right. Start a new session, and ask the agent to summarize the session for a new agent. It turns out that AI is really great at writing prompts for AI. So if you've worked on something with an agent for a while, have that agent summarize where you're at; you can read it, make sure it matches your understanding, and then start a new session with just the right context. Again, it's a little bit of art and a little bit of science. So how do we put this into practice? There are lots of workflows and lots of things written out there that you can read. I've even compiled a lot of them at path.kilo.ai, where you can find all of these trends, ideas, and workflow patterns that have been talked about. But what I keep coming back to is maybe one of the simpler ones, and that's the research-plan-implement loop.
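As a rough illustration of that context-meter habit, here is a minimal sketch. Everything in it is an assumption for demonstration: the ~4-characters-per-token heuristic is a crude stand-in for a real tokenizer, and the 200k window is a made-up number, not any specific model's limit. The point is simply to make "watch the meter and summarize past ~50%" mechanical.

```python
# Rough context-meter sketch: estimate token usage of a chat session and
# warn when it crosses the ~50% zone where quality tends to degrade.
# ASSUMPTIONS: ~4 chars per token (crude heuristic, not a real tokenizer)
# and a hypothetical 200k-token context window.

CONTEXT_WINDOW = 200_000   # hypothetical model context window, in tokens
WARN_FRACTION = 0.50       # quality tends to degrade past roughly 50% full

def estimate_tokens(text: str) -> int:
    """Very rough heuristic: about four characters per token for English."""
    return max(1, len(text) // 4)

def context_meter(messages: list[str]) -> tuple[int, float, bool]:
    """Return (estimated tokens used, fraction of window, over the line?)."""
    used = sum(estimate_tokens(m) for m in messages)
    fraction = used / CONTEXT_WINDOW
    return used, fraction, fraction >= WARN_FRACTION

# Example: a long debugging session that should trip the warning.
session = [("step %d: long traceback and diff..." % i) * 200 for i in range(60)]
used, frac, over = context_meter(session)
if over:
    print(f"~{used} tokens (~{frac:.0%} of window): summarize and start a new session")
```

A real agent UI shows this meter for you; the habit is acting on it, by asking the agent to summarize the session and carrying only that summary into a fresh one.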
And I think the research-plan-implement loop really helps solve a lot of the classic mistakes people make when they pick up agentic engineering for the first time, or pick up AI to help with some engineering. What most people do is say, "Hey, help me implement this feature. I want it to do X and Y." And these large language models are very good at outputting lots of code. In fact, when I joined Kilo Code over a year ago, I made a pronouncement that our website would never be just a prompt and a whole lot of code flying by. It makes for a great demo, and you've seen lots and lots of coding agents show themselves off that way. But the reality is that jumping straight into code like that can cause a lot of wrong assumptions, waste more time than it saves, and create a lot of frustration. It also creates that paradigm we've seen where people become anti-AI, or think AI is not a useful tool, because they've jumped right in, put garbage in, and gotten garbage out. Or maybe it's just been a while since they've used it. If you think of the Will Smith eating spaghetti era of AI videos, that's come a long way in just the past two, three, four years. The same is true of the AI coding models, but you have to do what works to give them the best chance at a great result. And that is: first, understand the problem really well, and make sure both you and the AI agent understand it. Then lay out explicit steps for implementing those changes or fixing that problem. Only then do we jump to the implementation phase, where we're writing code. Dex Horthy has a great phrase for this: a bad line of research can potentially be hundreds of lines of bad code.
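The loop can be sketched as a small orchestration. To be clear, `run_session` here is a placeholder, not a real agent API, and the mode names (ask, architect, code) are just borrowed labels; the structural point is that each phase runs in a fresh session seeded only with the previous phase's written output, with a human checkpoint between phases.

```python
# Minimal sketch of the research -> plan -> implement loop.
# ASSUMPTION: `run_session` is a stand-in for whatever agent API or CLI
# you actually use; the mode names are illustrative labels.

def run_session(mode: str, prompt: str) -> str:
    """Placeholder for a real agent call; returns a labeled artifact."""
    return f"[{mode} output for: {prompt[:40]}]"

def research_plan_implement(task: str) -> dict:
    # Phase 1: research only -- no file writes, just understanding.
    research = run_session("ask", f"Research the system for task: {task}")
    # Human checkpoint: read the research doc, confirm it matches your
    # own understanding before moving on.

    # Phase 2: plan -- explicit steps, scope, and verification commands.
    plan = run_session("architect", f"Write a step-by-step plan given:\n{research}")
    # Human checkpoint: review the plan before any code is written.

    # Phase 3: implement in a NEW session seeded with only the plan,
    # keeping the implementation context small and focused.
    code = run_session("code", f"Execute this plan exactly:\n{plan}")
    return {"research": research, "plan": plan, "implementation": code}

result = research_plan_implement("add patient-history import")
```

The human checkpoints are the whole point: they are where judgment gets injected before the next phase spends tokens, and time, building on a mistake.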
So we're really going to focus on how to get the research and the plan in place to give ourselves the best chance of great code coming out. In that first phase, we're going to use a tool that is focused only on research. For Kilo, we call that ask mode, and the reason we call it that is that ask mode can't actually do anything. It can only chat. It can't write files. It can read files if you let it, but it can't start trying to code a solution. So instead of trying to code a solution from the beginning, we first try to understand the system. How does it actually work today? Which files are going to be involved? What are the paradigms we want to mirror, or how does this differ from something we already have? We learn where in the codebase this is going to go, how the data flows through the system, how it will change with our change, and any edge cases we need to consider. AI is really great at brainstorming, so it can help you brainstorm those things and make sure you've really covered all your bases. Once that research is done, what comes out is an actual output document with the details of that research, which you can read and basically agree with: "This matches my understanding of the problem. I think we're ready to move on to the plan." Once we've reviewed that as a human, we can say, "Okay, let's outline the next steps. What files are we going to create or change?" Maybe there are some code snippets, though it's not always a good idea to have code snippets in the plan. And we are definitely going to include how we're going to verify and know this change is correct.
What test changes or additions are we going to make to know that? We're also going to be really explicit at the planning phase about what is in and out of scope: what is going to change and what isn't. Again, the output of that is a very clear plan file; you'll see a lot of repositories nowadays have a folder called plans. We want that plan file to be step-by-step instructions with the specific changes we're going to make, test commands to verify them, and a strategy for understanding how the system will change. It should be clear enough that we could even use a smaller, faster, or cheaper model to implement it, because we've spent the time in the research and plan phases to really understand what we'll be doing once we get to implementing the change. When we come to implementation, we can start a new session and give it just the plan to execute. That keeps the context in that session very low, and it lets us carefully review each change and, I think, commit very frequently. Now, I used to work at a company called GitLab for many, many years, so maybe I'm a little biased toward Git, but I think Git can be a huge helper here when it comes to slowly iterating on and understanding the changes the agents are making. I treat Git on my local machine as my own first pull-request review with my agents, before I put up an actual pull request for my colleagues to look at. But again, it's critical to understand that human time at the research and planning phases is the highest-leverage use of your time. By the time you're implementing, you want to have all that hard thinking done.
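To make that plan file concrete, here is one possible shape. The feature, paths, and commands below are hypothetical placeholders, not from the talk; the structure (steps, scope, verification) is what matters.

```markdown
# Plan: Add patient-history import endpoint   <!-- hypothetical feature -->

## In scope
- New `POST /api/history/import` handler      <!-- hypothetical path -->
- Validation for the uploaded form data

## Out of scope
- Any database schema changes
- Front-end form redesign

## Steps
1. Add request/response types in `api/types.py`
2. Implement the handler in `api/history.py`, mirroring `api/records.py`
3. Add unit tests covering empty, partial, and duplicate uploads

## Verification
- `pytest tests/api/test_history.py` passes
- Manual check: import a sample form and confirm fields round-trip
```

A file like this is what lets a cheaper model do the implementation: every decision that requires judgment has already been made and written down.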
And that's really critical because, going back to Dex Horthy, who has spoken a lot on this subject (I highly recommend you check out his videos on YouTube), he says very aptly that AI can't replace thinking. It can only amplify the thinking you've done, or the lack of thinking you haven't done. So let's talk about how we can configure our agents, one more step down from this research-plan-implement paradigm, to really make sure we do this. First, modes and customizations. We already talked about these modes: ask, code, architect. These modes are specialized and focused on the thing we're trying to get done. Architect is for planning, ask mode is for research, code mode is for actually implementing. Then we also want a set of rules that makes sense for our workspace, for the repository we're in, or maybe globally on our machine, so we have a certain set of rules we always want to adhere to. Agents are pretty good at loading in and understanding those rules, but we have to have them written down for them to be in their context. A lot of the agent behavior, then, is something we want to tweak as we're learning. Do we want to run multiple agents at a time? Do we want those agents to use worktrees so we can merge them back into our local repository before committing them to a pull request? How much do we want to auto-approve? Most agents let you tune what they can do independently: which tools they can use, whether they can read files inside or outside the workspace, whether they can run tests.
What can the agent do autonomously without your intervention, versus what do you need to approve? I think this is something you have to set up so you're comfortable in the beginning, and then you need to be comfortable changing it as you learn how to use these tools. A good mental model for agent configuration is maybe three distinct buckets. We talked about modes: that's the role-based configuration, the behavior we want from an agent. But there are two other really key things: the agents.md, and then the skills you'll hear about. So what's the difference between the two? Well, agents.md is quickly becoming the de facto standard readme for agents: the always-on rules and details about the project. I think it's critical that your project has an agents.md with the minimal amount of information an agent needs to know: what conventions we're using, what commands we use to build and test, and what requirements we need to check off before committing. Skills, on the other hand, are more like specific workflows: reusable playbooks for agents. If there's something you're doing a lot, like making motion graphics, or compiling a daily, weekly, or monthly changelog, those kinds of things are great to put in as skills that an agent can pick up when it needs to run those specific workflows. Typically those are on demand: you say, "Hey, use this skill for this task." The agents.md, by contrast, is almost always loaded into the context for the agent, so it knows what's going on.
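As an illustration of that "minimal always-on information" idea, a small agents.md might look like this. The stack, commands, and conventions shown are hypothetical placeholders for your project's own:

```markdown
# agents.md  (always-on context for coding agents)

## Project
Patient-history service; Python 3.12, FastAPI.   <!-- hypothetical stack -->

## Commands
- Install: `pip install -e .`
- Test: `pytest -q` (must pass before any commit)
- Lint: `ruff check .`

## Conventions
- New endpoints mirror the structure in `api/records.py`
- No direct database access outside `db/`; use the repository layer
- Never commit schema migrations without human review
```

Because this file rides along in every session, keeping it short matters: every line here is a token cost paid on every single interaction.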
And then of course, I work at Kilo Code, so I've got some power-user tips, but many of these apply regardless of which agent you're using, and I think they're critical as you get comfortable with those first paradigms: how do I customize this and make it work for me? One is at-mentioning for context: mentioning files, commits, or terminal output, and bringing them into the context quickly, is really helpful. Another is using slash commands to do things like starting a new task when we need to, or condensing the context when it's getting too full; those quick commands help us move a lot faster. If we're working in VS Code with Kilo Code, we can also select a section of code, right-click, and say "add to Kilo Code," and that context is brought right in; I can then ask questions about that code, or ask the agent to change a certain part of it. And of course, we have autocomplete built in as well, which I think is still useful, especially because we have it not just in code but also as you're prompting. Beyond the IDE, we're seeing a shift this year in where else people want to use this: in the CLI, from a mobile phone, in a cloud agent, directly in Slack. The ability to use these agents wherever you are is becoming more expected of everyone's agents, and I think that's a good thing. It means we're learning to use these agents, again, more like a collaborator that's everywhere we need to be. One other thing I want to talk about is getting other context in. First of all, the Model Context Protocol: context is right in the name.
The idea is that, fundamentally, these models originally could only receive input tokens and create output tokens. Slowly but surely we've been enabling them to use tools, where they can make tool calls out and affect things in the environment, like running tests. MCP basically expands this: "Hey, I want to give the agent other tools." For instance, the GitHub MCP gives the agent a lot of tools to interact with the GitHub API: look up pull requests, comments, and issues, and understand a lot more about your GitHub environment. Or Context7 helps it look up up-to-date framework documentation, because as you know, LLMs have a knowledge cutoff date, and anything that's changed since then they don't know about. So these MCP servers can be very helpful, and there are thousands of them out there. But the concern is that every one of them adds at least some information, details about its tools, to the system prompt that gets sent every time you interact with the agent. So if you're not actually using one, disable it. Say I have a Postgres MCP that connects to my database, and I'm doing a whole bunch of front-end work that doesn't involve the database at all. That Postgres MCP is just wasted tokens, and maybe even worse, tokens that confuse the agent about the fact that it's not supposed to touch the database right now. So we want to be really careful not to overuse MCPs. Another thing we hear from enterprises a lot is: how do we work with internal platform APIs? I think there are four different ways of doing that. One, if there's already an OpenAPI or Swagger spec for it, use that.
Two, if there's not, convert the docs to markdown so you can save that markdown in the agents.md, or somewhere else in the repository, and reference it. Three, if it's something that changes more frequently, maybe you do need a reference URL that the agent can pull every time to see the latest and greatest. And four, we've seen some customers with complex multi-step, multi-system workflows where building their own MCP server might be the right choice. One way or another, the key when working alongside Kilo or any of these agents is to isolate your work from the agent's work, and then review the agent's work as a pull request. That lets you review the code just like you would review a junior engineer's code. And that's really the presentation I have on Kilo. We've got some exciting new features coming up; we've expanded across all these surfaces. We also have a big focus on Openclaw and Kiloclaw, and on making a very safe way to use Openclaw agents. If you haven't taken a look at Kilo, just a little plug at the end here: visit kilo.ai, and we'd love your feedback on what we're building. And to give you a sense of where we go from here: I think you've got to pick a tool and get lots of reps. We said earlier that it's part art and part science, and that just means you need a lot of reps to get a feel for what you can and can't trust the models to do. Then try this research, plan, implement feedback loop and see how it works for you. And maybe you'll end up like some of these other senior engineers who have said, "Look, I'm having more fun programming now than I've had in years and years,"
as we farm out some of this tedious work to AI agents and let our brains work on the harder engineering problems. Thanks.