Agents need more than a chat - Jacob Lauritzen, CTO Legora
Channel: aiDotEngineer
Published at: 2026-04-22
YouTube video id: XNtkiQJ49Ps
Source: https://www.youtube.com/watch?v=XNtkiQJ49Ps
>> How's everyone doing? Still good? Right. It's 5:00 p.m. on a Friday, there's just me and two more people between you and Friday beer, so I'll try to be quick. I'm here to talk to you today about vertical AI and complex agents, and why I think they need more than just a chat. If you've ever worked with a long-running complex agent, you've probably tried something like this. Sorry that it's all white; I can see it flashbanging your faces. You tell it to research something, draft a contract, make no mistakes, and it starts thinking, starts reading, launches a bunch of sub-agents, does web search, writes files, launches more sub-agents, does more reading, writes more files, keeps going, takes forever. After 30 minutes, it gives you your contract. You take a look. Clause three doesn't look right. What? Did you make a mistake here? Could you look at another document? "You're absolutely right." Then you see this: compaction. That's when you know you can give up. It's going to forget everything; it's in the context-rot state. Anyway, it continues, keeps going, and you get a new contract. Was it only clause three that was changed? Probably not. And so you end up in this state. Not the greatest experience. My name is Jacob. I'm the CTO of Legora. We are a collaborative AI workspace for law firms, so we're a vertical AI company. We have more than 1,000 customers across more than 50 markets. We've raised a bunch of money, and we're growing extremely fast; I'm told maybe the fastest in history. We are also hiring engineers in London, so in case anyone's interested and wants to be on this growth journey, please talk to me after my talk. Our goal, and the goal of most vertical AI companies, is to make agents complete more and more complex work end-to-end. How you do that has changed a lot in the past 6 to 12 months, because there are new economics of production.
It used to be that if you wanted to complete end-to-end work, you would be focused on doing the work; that was the main thing, actually getting it done. But today things look a little different, because right now planning work and reviewing work is the new bottleneck. Doing the actual work is extremely cheap and very easy. But now you have to spend time planning: you have to get the non-functional requirements, you have to get the specs, and you have to spend a lot of time reviewing the work. If anyone's reviewed big PRs on GitHub, it really sucks; it's extremely painful. Maybe if you're super AI-pilled, you just get your AI agents to review their own work, no humans involved. Maybe it works, maybe it doesn't. When we think about completing complex work, across the planning stage, the doing stage, and the reviewing stage, verifier's law is a good way to think about it. Verifier's law is a term coined by Jason Wei, which states that if a task is solvable and easy to verify, then it's going to get solved by AI. He was primarily talking about foundation models: if you can make something very easy to verify, then you can build an RL environment, you can post-train, and the model is going to solve it. I think it also goes for agents. If you can make a task verifiable, you can just run an agent in a loop and tell it, "Hey, you did this wrong. Please fix it," and it'll eventually get there. Different industries sit at different places on this spectrum. It's actually a little more complex than that, because each vertical has tasks at different places on the spectrum. If you take legal, we can check definitions in a contract: super easy to verify, super easy to get done.
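The agent-in-a-loop idea, where you make the task verifiable and then feed failures back until the check passes, can be sketched like this. `agent` and `verify` are hypothetical callables standing in for a real model call and a real check; this is a sketch of the pattern, not any particular framework's API.

```python
# Sketch of "make it verifiable, then loop": failures from the verifier are
# fed back to the agent as new instructions until the check passes.
from typing import Callable, Optional

def run_until_verified(
    agent: Callable[[str], str],
    verify: Callable[[str], Optional[str]],  # returns an error message, or None if OK
    task: str,
    max_attempts: int = 5,
) -> str:
    prompt = task
    for _ in range(max_attempts):
        draft = agent(prompt)
        error = verify(draft)
        if error is None:
            return draft  # verifiable task: the loop converged
        # Feed the concrete failure back: "you did this wrong, please fix it."
        prompt = f"{task}\nPrevious attempt failed verification: {error}"
    raise RuntimeError("agent did not converge within the attempt budget")
```

The whole pattern hinges on `verify` being cheap and mechanical; the harder the task is to verify, the less this loop buys you, which is exactly the spectrum the talk describes.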
Writing a contract is very easy to solve, but actually extremely difficult to verify, because if you think about it, the only time you can truly verify whether the language you used works is when it goes to court and a judge basically verifies it and tells you if it's good or not. So that's actually quite complex. Litigation strategy is also basically impossible to verify. If you don't know what litigation is, it's when you sue someone or someone sues you. I know we're in Europe now, but the Americans really love doing this all the time. Essentially, if you ask five lawyers, "What should be the right strategy for this litigation case?" they're going to give you different answers. There's no objective truth, which means it's basically impossible to verify and really difficult for AI to solve. Similarly in coding, some parts are easy to verify, while building a successful consumer app is very difficult to verify. So when we think about this, we think about how to involve humans where it really matters and let agents do the work that we can let them do. There are two things that are important to think about with agent-human collaboration. Control is the first one. Control is how effectively a human can instill their knowledge into the work the agent is doing: how effectively can I steer it? Trust is a matter of how much I need to review. If I have very low trust, I'm going to look at every single agent trace and see exactly what it did. If I have very high trust, I won't look at it at all. Depending on where the task falls on the chart, different things are important. So how do you increase trust? There are a few different things you can do. Firstly, you can bring a task down the spectrum. Here is an example from coding.
If you want to implement a feature, you can give the agent browser access, you can do test-driven development, and suddenly it's actually a verifiable task and the agent is going to do much better. You can do similar things in finance and in legal. Take the contract example in legal: you can't really verify it, but you can look for a proxy for verification. For contracts, what you can do is take a look at previous contracts. These are our golden contracts; we know they work well. So let's set up a test: is the new contract similar to the old ones? That's a proxy for verification that's going to let your agent do a much better job. You can also decompose tasks. Take the example of writing a contract. I can turn that from one task into a bunch of other tasks, and I can leave picking a risk profile, picking the precedent documents, and the negotiation stance to the human, while I get the other stuff done where it's easy to verify. So: apply formatting, make it look like all my other contracts. Check definitions, which is essentially linting: are all defined terms used? Are all the terms that are used defined? This kind of stuff you can build, and then the agent can basically rip much better. You can also add guardrails. Guardrails are essentially a way to increase trust by limiting what the agent can do. Instead of being able to do everything, you say: you can only edit these three files, you can only read from this directory, you can only search these websites. By limiting what it can do, you get more trust, because you know it won't do all these weird things. An example of this, which you probably all know, is Claude Code. If there's very low trust, it's going to ask you every single time it wants to do anything, which basically makes it useless.
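The definition check really is lintable. A minimal sketch, assuming definitions are written as a quoted term followed by "means" (an illustrative convention, not how Legora actually detects definitions; real contracts would need a much more robust parser):

```python
# Toy contract linter: every defined term should be used somewhere else,
# and every quoted term that is used should have a definition.
import re

def lint_definitions(contract: str) -> dict:
    # Assumed convention: a definition looks like  "Term" means ...
    defined = set(re.findall(r'"([^"]+)" means\b', contract))
    # Every quoted phrase counts as a mention of a term.
    mentions = re.findall(r'"([^"]+)"', contract)
    counts: dict[str, int] = {}
    for term in mentions:
        counts[term] = counts.get(term, 0) + 1
    return {
        # Defined, but only ever appears in its own definition.
        "unused": sorted(t for t in defined if counts.get(t, 0) <= 1),
        # Used as a defined term, but never defined.
        "undefined": sorted(set(mentions) - defined),
    }
```

Because the output is a concrete list of violations, an agent can be run in a loop against it, which is exactly what moves this subtask down the verifiability spectrum.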
On the high-trust end of the spectrum, you just put it in YOLO mode, let it rip, and hope it doesn't delete your prod database. Then there's control. How do we increase control? If you think about complex agent work, you can think about it as a tree of work, a DAG essentially. Here's an example where I wanted to write a report on a bunch of employment contracts. The agent's going to say, "Okay, let me research the organization first. Then I want to review the contracts, and I'm going to review each contract for a few different things. And then I'm going to draft a report at the end." This setup is extremely low control, because essentially I can only impose my judgment at the root level. It's going to do all of this work, then get back to me, and only then can I try to talk to it again; that's basically the example I gave at the beginning. So: very low control. Then there's planning. Planning allows you to steer the agent up front and align on the approach. With planning, it might say, "Okay, you should take these steps. These are the clauses you should be looking for. This is what you want to review." That's a good step; it gives you a bit more control, and it's easier to impose what you want it to do. The problem with planning is that you basically have to do all the work just to know what to do. I'm sure people have tried this in Claude Code: you have to go through the entire thing, it's really inefficient, it takes a long time and asks you a bunch of questions, and in the end it's basically impossible for it to know whether it has all the information it needs. Let's say one of these contracts has a special clause. It wouldn't know that in the planning step, and you can't really tell it what to do when it sees that, because it hasn't done the work yet.
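The tree of work can be made literal. A toy rendering of the employment-contract example above: tasks expand into subtasks, and an executor walks them children-first. The task names come from the talk's example; the structure itself is the point, because in a plain chat the human can only attach judgment at the root, before or after this entire walk.

```python
# Toy work tree for "write a report on a bunch of employment contracts".
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    subtasks: list["Task"] = field(default_factory=list)

def execute(task: Task, log: list[str]) -> None:
    for sub in task.subtasks:   # children first: the leaves are the real work
        execute(sub, log)
    log.append(task.name)       # a parent completes once its children are done

reviews = Task("review contracts", [
    Task(f"review contract {i}", [Task("check termination"),
                                  Task("check confidentiality")])
    for i in (1, 2)
])
job = Task("draft report", [Task("research organization"), reviews])
```

Skills and elicitation, discussed next in the talk, are ways of attaching human judgment to the inner nodes of exactly this kind of tree rather than only to `job` itself.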
Essentially, you could compare planning to working with a co-worker who comes up to you, tells you about the approach, aligns with you, and then you never hear from them again until they deliver the final document. That's not a great way to collaborate. It's a good thing we have right now, but I don't think planning is going to stick around. Then we have skills. Skills are really, really good, because they allow you to encode human judgment into the individual nodes of work. I can say: whenever you review confidentiality, you should do it in this way. And the really good thing is that this allows for contingencies. Say that when reviewing one of the termination clauses, there's a special EU law. If I have that in a skill, then whatever happens when it actually does the work, it knows how to handle that special case. You can't really do this with planning. There's also progressive discovery, which again is really awesome: whatever happens, it knows it'll pick the right skill up. The problem is you don't have skills for everything. The next step is to use elicitation, which means asking the user, asking the human. You might have skills as well, but instead of you giving all the info up front, the agent comes to you. It says, "Hey, here's a thing I don't know how to handle. What do you want me to do?" This makes a lot of sense. But what you don't want is for the agent to be blocked. So ideally, if you implement this, you tell the agent: "If you're unsure about something, make a decision, unblock yourself, but write it to a decision log." Then the human can review the decision log afterwards and reverse decisions if needed. Now, the right UX for this: if you imagine this tree of work being 10 times bigger, 100 times bigger, you don't want this in a chat.
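The "unblock yourself, but write it down" idea can be sketched as a tiny data structure. The `Decision` fields and method names here are assumptions for illustration, not anything Legora ships:

```python
# Sketch of a decision log: the agent records provisional choices instead of
# blocking, and the human reviews (and can reverse) them after the run.
from dataclasses import dataclass, field

@dataclass
class Decision:
    question: str
    choice: str
    rationale: str
    reversed_by_human: bool = False

@dataclass
class DecisionLog:
    entries: list[Decision] = field(default_factory=list)

    def decide(self, question: str, choice: str, rationale: str) -> str:
        # Agent side: pick something reasonable and keep going.
        self.entries.append(Decision(question, choice, rationale))
        return choice

    def review(self) -> list[Decision]:
        # Human side: every decision still standing, in the order it was made.
        return [d for d in self.entries if not d.reversed_by_human]

decision_log = DecisionLog()
decision_log.decide("Unusual EU termination clause in contract 3?",
                    "flag as high risk",
                    "no skill covers this jurisdiction")
```

The point of the design is asymmetry: the agent never waits on the human mid-run, and the human reviews a short, structured list instead of answering 50 questions in a chat.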
You don't want to open up a chat that's infinitely long, where you have to answer 50 questions. You wouldn't know what to answer; you wouldn't really be able to, because you don't have the right context. So: not chat. Chat is one-dimensional. It's a very low-bandwidth interface, and it tries to collapse this work tree into a single linear thing. So what's a better interface? I think humans and agents should collaborate in high-bandwidth artifacts. They need to work in things that are typically persistent, and these will look different industry to industry, vertical to vertical, depending on what task you're solving. An example from us is a document. That's a durable interface where it makes sense to collaborate; it's how you collaborate with your co-workers. You can highlight clause three and it will only change clause three. You can add comments. You can tag your agents. You can tag your collaborators. You can hand off parts of the document to specialized agents. Another example is our tabular review. I ask the agent to do the contract review I talked about, and it says, "Okay, let me spin up a tabular review," which is a known primitive that our users recognize. It looks like this, and then it says, "I'm going to review all the contracts, and I'm going to flag a few items that I want your take on." Then I can go in there and see very quickly where the problems are. So it's high control: it's very effective for me to instill judgment, and I can also very quickly get an idea of what the agent has actually done, so reviewing is easy. Once I've done that, I can just kick off the rest of the agent's work. Right now, what we're seeing a lot is convergence of UI; today's version is post-hoc and linear, and within the last two weeks we've been shipping this new UI. To be clear, chat boxes as input are great.
Chat is extremely flexible and lets you do a lot of stuff, but you don't want it to be your main mode of collaboration with a complex agent. The good thing about chat is that language is essentially the universal interface; it's what people use to communicate, and you can do everything with your voice. But agents aren't humans. Just a few minutes ago, I was talking to a potential candidate for Legora and describing our org chart, and I was limited because I can only use language. I wish I could just draw up an org chart that they could interact with and use, but I can't, because I'm a human; I'm limited by language. Agents are not humans, and so we should not constrain them to human language. Thank you.