Agents need more than a chat - Jacob Lauritzen, CTO Legora

Channel: aiDotEngineer

Published at: 2026-04-22

YouTube video id: XNtkiQJ49Ps

Source: https://www.youtube.com/watch?v=XNtkiQJ49Ps

[music]
>> How's everyone doing? Still good?
Right. It's 5:00 p.m. on a Friday. It's just me, two more people behind you, and Friday beer, so I'll try to be a little bit quick here.
I'm here to talk to you today about vertical AI and complex agents, and why I think they need more than just a chat.
If you've ever worked with a
long-running complex agent, you've
probably tried something like this.
Sorry that it's all white. I can see it flash-banging your faces.
You tell it to research something, draft a contract, make no mistakes, and it starts thinking, it starts reading, launches a bunch of sub-agents, does web search, writes files, launches more sub-agents, does more reading, writes more files, keeps going, takes forever, and after 30 minutes it gives you your contract.
You take a look.
Clause three doesn't look right. Did you make a mistake here? Could you, you know, look at another document?
You're absolutely right.
Then you see this: compaction. That's when you know you can give up. It's going to forget everything. It's in the context-rot state.
Anyway, it continues, it keeps on going, and you get a new contract.
Does it look right? Was it only clause three that was changed?
Probably not.
And so you end up in this state.
Not the greatest experience.
My name is Jacob. I'm the CTO of Legora.
We are a collaborative AI workspace for
law firms, so we're a vertical AI
company. We have more than 1,000 customers in more than 50 markets. We've raised a bunch of money. We're growing extremely fast. I'm being told maybe the fastest in history.
We are also hiring engineers in London, so in case anyone's interested and wants to be on this growth journey, please talk to me after my talk.
Our goal, and the goal of most vertical AI companies, is to make agents complete more and more complex work end-to-end.
Doing that has changed a lot in the past 6 to 12 months, because there are new economics of production. It used to be that if you wanted to complete end-to-end work, you would be focused on doing the work. Right? The main thing was actually just getting it done.
But today things look a little bit different, because right now planning work and reviewing work are the new bottleneck. Doing the actual work is extremely cheap. It's very easy to do.
But now you have to spend time planning, you have to get the non-functional requirements, you have to get the specs, and you have to spend a lot of time reviewing the work. And if anyone's reviewed big PRs on GitHub, it really sucks. It's extremely painful.
Maybe if you're super AI-pilled, you just get your AI agents to review their own work. No humans involved. Maybe it works, maybe it doesn't.
And when we think about completing complex work, across the planning stage, the doing stage, and the reviewing stage, the verifier's rule is a good way to think about work. The verifier's rule is a term that was coined by Jason, which states that if a task is solvable and it's easy to verify, then it's going to get solved by AI.
He was primarily talking about foundation models: if you can make something very easy to verify, then you can build an RL environment, you can post-train, and it's going to solve it.
I think it also goes for agents. You
know, if you can make a task verifiable,
you can just run an agent in a loop and
tell it, "Hey, you did this wrong.
Please fix it." and it'll eventually get
there.
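As a rough sketch of that loop (not any specific framework's API; run_agent and verify here are hypothetical callables you would supply), it can be as simple as:

```python
from typing import Callable

def run_until_verified(
    task: str,
    run_agent: Callable[[str, str, list[str]], str],  # hypothetical: (task, prior draft, feedback) -> new draft
    verify: Callable[[str], list[str]],                # hypothetical: draft -> list of concrete problems
    max_attempts: int = 5,
) -> str:
    """Run an agent in a loop until the verifier finds no remaining problems."""
    draft, feedback = "", []
    for _ in range(max_attempts):
        # Do (or redo) the work, passing along whatever the verifier complained about.
        draft = run_agent(task, draft, feedback)
        # A verifiable task turns "is this good?" into a concrete list of issues.
        feedback = verify(draft)
        if not feedback:
            return draft
    raise RuntimeError(f"Still failing verification after {max_attempts} attempts: {feedback}")
```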
Different industries are at different places on this spectrum. It's a little bit more complex than just this, because verticals have tasks that sit at different places on the spectrum. If you take legal: we can check definitions in a contract, super easy to verify, super easy to get done.
Writing a contract is very easy to solve, but actually extremely difficult to verify, because if you think about it, when you write a contract, the only time you can actually verify whether the language you used works is if it goes to court and a judge basically verifies it, tells you if it's good or not. So that's actually quite complex.
Litigation strategy is also basically
impossible to verify.
If you don't know what litigation is,
it's when you sue someone or someone
sues you. I know we're in Europe now,
but the Americans really love doing this
all the time.
But essentially, if you ask five lawyers, "What should be the right strategy for this litigation case?" they're going to give you different answers. So there's no objective truth, which means it's basically impossible to verify, and it's really difficult for AI to solve.
Similarly in coding: some parts of it are easy; building a successful consumer app is very difficult to verify.
So when we think about this, we think about how to involve humans where it really matters and let agents do the work that we can let them do.
There are two things that are important to think about with agent-human collaboration.
Control is the first one. Control is how effectively a human can instill their knowledge into the work the agent is doing. How effectively can I steer it?
Trust is a matter of how much I need to review. If I have very low trust, I'm going to look at every single agent trace and see exactly what it did. If I have very high trust, I won't look at it at all.
Depending on where the task falls on the chart, different things are important.
How do you increase trust? If you want to increase trust, there are a few different things you can do.
Firstly, you can bring a task down the spectrum. Here's an example from coding: if you want to implement a feature, you can give the agent browser access, you can do test-driven development, and suddenly it's actually a verifiable task, and it's going to do much better.
There are similar things you can do in finance and in legal. Let's take the contract example in legal. You can't really verify it, but you can look for a proxy for verification. For contracts, what you can do is take a look at previous contracts. These are our golden contracts. We know they work well. Let's set up a test: is the new contract similar to the old ones? That's a proxy for verification that's going to allow your agent to do a much better job.
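As an illustration of that proxy (a minimal sketch, not Legora's implementation: embed is a hypothetical text-to-vector function and the 0.85 threshold is arbitrary):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def passes_golden_check(new_contract: str, golden_contracts: list[str], embed, threshold: float = 0.85) -> bool:
    """Proxy verification: is the new contract close enough to one we already know works?"""
    new_vec = embed(new_contract)
    return any(cosine(new_vec, embed(golden)) >= threshold for golden in golden_contracts)
```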
You can also decompose tasks. Here's the example with writing a contract. I can turn that from one task into a bunch of other tasks. Picking a risk profile, picking the precedent documents, the negotiation stance, I can leave that to the human, but I can try to get the other stuff done where it's easy to verify. Apply formatting, make it look like all my other contracts. Check definitions, which is essentially linting: are all definitions used? Are all the definitions that are used actually defined? This kind of stuff you can build, and then the agent can basically rip much better.
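A toy version of that definitions lint might look like this (assuming defined terms are introduced as a quoted Capitalized Term followed by "means", which is only one drafting convention; a real linter would need more than a regex):

```python
import re

def lint_definitions(contract_text: str) -> list[str]:
    """Toy contract-definitions lint: flags defined-but-unused and used-but-undefined terms."""
    # Assume defined terms are introduced as: "Confidential Information" means ...
    defined = set(re.findall(r'"([A-Z][A-Za-z ]+)" means', contract_text))
    # Assume usages are Capitalized Multi-Word Terms in the body text.
    capitalized = re.findall(r'\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b', contract_text)

    problems = []
    for term in sorted(defined):
        # The definition itself counts as one occurrence, so require at least two.
        if capitalized.count(term) < 2:
            problems.append(f'Defined but never used: "{term}"')
    for term in sorted(set(capitalized) - defined):
        problems.append(f'Used but never defined: "{term}"')
    return problems
```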
You can also add guardrails. Guardrails are essentially a way to increase trust by limiting what the agent can do. Instead of being able to do all of this, you're just going to say: you can only edit these three files, you can only read from this directory, you can only search these websites. By limiting what it can do, you basically get more trust, because you know it won't do all these weird things.
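Here's a minimal sketch of that kind of allowlist guardrail, checked before each tool call; the tool names, fields, and example paths are made up for illustration:

```python
from dataclasses import dataclass, field
from pathlib import Path

@dataclass
class Guardrails:
    """Allowlist-style guardrails checked before every tool call."""
    editable_files: set[str] = field(default_factory=set)
    readable_dirs: set[str] = field(default_factory=set)
    searchable_sites: set[str] = field(default_factory=set)

    def allows(self, tool: str, target: str) -> bool:
        if tool == "edit_file":
            return target in self.editable_files
        if tool == "read_file":
            return any(Path(target).is_relative_to(d) for d in self.readable_dirs)
        if tool == "web_search":
            return any(target.endswith(site) for site in self.searchable_sites)
        return False  # deny anything not explicitly allowed

# Example: the agent may edit three files, read one directory, search one site.
policy = Guardrails(
    editable_files={"contracts/nda.md", "contracts/msa.md", "contracts/dpa.md"},
    readable_dirs={"precedents/"},
    searchable_sites={"legislation.gov.uk"},
)
```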
An example of this, you probably all know this one: Claude Code. If there's very low trust, it's going to ask you every single time it wants to do anything, which makes it extremely useless.
And on the high-trust end of the spectrum, you just YOLO-mode it, let it rip, and hope that it doesn't delete your prod database.
Then there's control.
So how do we increase control? Well,
if you think about complex agent work,
you can kind of think about it as a tree
of work, as a DAG essentially. So here's
an example where I wanted to write a
report on a bunch of employment
contracts.
So the agent's going to say, "Okay, let
me research the organization first. Then
I want to review the contracts and I'm
going to review for a few different
things for each of the contracts.
And then I'm going to draft a report at
the end."
This is extremely low control, because essentially I can only impose my judgment at the root level. It's going to do all of this work, then it's going to get back to me, and then I can try to talk to it again. That's basically the example I gave at the beginning.
So, very low control.
Then there's planning.
Planning essentially allows you to steer
the agent up front and align on the
approach. And so with planning here, you might say, "Okay, you should absolutely take these steps. These are correct. These are the clauses you should be looking for. This is what you want to review."
So this is a good step. It gives you a
bit more control. It's easier to impose
what you want it to do.
The problem with planning is that you basically have to do all the work just to know what to do.
I'm sure people have tried this in Claude Code. You basically have to go through the entire thing. It's really inefficient. It takes a long time, it asks you a bunch of questions, and in the end it's basically impossible for it to really know if it has all the information it needs. Let's say for one of these contracts there's a special clause. It wouldn't know that in the planning step. You can't really tell it what to do when it sees that, because it hasn't done the work yet.
Essentially, you could compare planning to working with a co-worker who comes up to you, tells you about the approach, you align with them, and then you never hear from them again until they deliver the final document. It's not a super nice way to collaborate.
It's a good thing we have right now, but I don't think planning is going to stay around.
Then we have skills. Skills are really, really good. They are good because skills allow you to encode human judgment into essentially the nodes of work that happen here. So I can say: whenever you review confidentiality, you should do it in this way.
And the really good thing about this is it allows for contingencies. Here, when reviewing one of the termination clauses, there's a special EU law. But I have that in a skill, so whatever happens when it actually does the work, it knows how to handle that special case.
You can't really do this with planning. There's also progressive discovery, which again is really awesome: whatever comes up, it'll pick the relevant skill up when it needs it.
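As a rough sketch of the idea (not the skill file format of any particular framework; the skill contents here are made-up examples), you can think of a skill as human judgment attached to a kind of work node and looked up only when that node actually runs:

```python
from dataclasses import dataclass

@dataclass
class Skill:
    """Human judgment attached to a kind of work node, discovered when that node runs."""
    applies_to: str     # which kind of work node this skill is for
    instructions: str   # how the human wants it done, contingencies included

skills = [
    Skill(
        applies_to="termination clause",
        instructions=(
            "Check the notice period against the employment agreement. "
            "Contingency: if the employee is based in the EU, apply the relevant "
            "EU termination rules before flagging the clause."
        ),
    ),
    Skill(
        applies_to="confidentiality clause",
        instructions="Confirm the definition of Confidential Information and its carve-outs.",
    ),
]

def skills_for(node_name: str) -> list[Skill]:
    # Progressive discovery: only pull in the skills relevant to the node being worked on.
    return [s for s in skills if s.applies_to.lower() in node_name.lower()]
```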
The problem is you don't have skills for everything.
The next step is to use elicitation, which means: ask the user. Ask the human. So you might have skills as well, but instead of you giving it all the info up front, it's going to come to you. It's going to say, "Hey, here's something I don't know how to handle. What do you want me to do?"
This makes a lot of sense. But what you don't want is for the agent to be blocked. So ideally, if you implement this, you tell the agent: "If you're unsure about something, make a decision, unblock yourself, but write it to a decision log." Then the human can review the decision log afterwards and reverse decisions if needed.
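A minimal sketch of such a decision log, written as JSON lines the human can review afterwards (the field names and the example entry are illustrative, not a prescribed schema):

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class Decision:
    """One decision the agent made on its own instead of blocking to ask the human."""
    node: str          # which piece of work it was doing
    question: str      # what it was unsure about
    decision: str      # what it chose to do
    reversible: bool   # whether the human can still change it afterwards
    timestamp: str = ""

def log_decision(entry: Decision, path: str = "decision_log.jsonl") -> None:
    entry.timestamp = datetime.now(timezone.utc).isoformat()
    with open(path, "a") as f:
        f.write(json.dumps(asdict(entry)) + "\n")

# Example: the agent hit an unusual clause, decided, kept going, and logged it.
log_decision(Decision(
    node="Review termination clause in contract_b.pdf",
    question="Clause references a collective bargaining agreement I have no context on.",
    decision="Flagged the clause as high risk and continued with the remaining reviews.",
    reversible=True,
))
```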
Now, the right UX for this: if you imagine this work, this tree, being 10 times bigger, 100 times bigger, you don't want this in a chat. You don't want to open up a chat that's infinitely long, where you have to answer 50 questions. You wouldn't know what to answer. You wouldn't really be able to do it, because you don't have the right context. So: not chat. Chat is one-dimensional. It's a very low-bandwidth interface, and it tries to collapse this work tree into a single linear thing.
So what's a better interface? Well, I think humans and agents should collaborate in high-bandwidth artifacts. I think they need to work in things that are typically persistent, and they will look different industry to industry, vertical to vertical, depending on what task you're solving.
So an example from us is a document. That's a durable interface where it makes sense to collaborate. That's how you collaborate with your co-workers. You can highlight clause three and it will only change clause three. You can add comments. You can tag your agents. You can tag your collaborators. You can hand off parts of the document to specialized agents.
Another example is our tabular review. I ask it to do the contract review that I talked about, and it's going to say, "Okay, let me spin up a tabular review," which is a primitive our users already know.
It looks like this, and then it's going to say, "I'm going to review all the contracts, and I'm going to flag a few items for you that I want your take on." Then I can go in there and see very quickly where the problems are. So it's high control. It's very effective for me to instill judgment. And I can also very quickly get an idea of what the agent has actually done.
So reviewing is easy. And then once I've
done that, I can just kick off the rest
of the agent.
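As a rough sketch of the shape of that (not Legora's actual data model; the contracts, columns, and findings are made up), a tabular review is essentially a grid of contracts by review criteria, with a few cells flagged for the human:

```python
from dataclasses import dataclass

@dataclass
class Cell:
    """One cell of the review grid: the agent's finding, plus whether it wants a human's take."""
    finding: str
    needs_human: bool = False

# Rows are contracts, columns are the things we review for; values are agent findings.
tabular_review: dict[str, dict[str, Cell]] = {
    "contract_a.pdf": {
        "Termination": Cell("90-day notice period, consistent with the template."),
        "Confidentiality": Cell("Unusual carve-out for subcontractors.", needs_human=True),
    },
    "contract_b.pdf": {
        "Termination": Cell("References a collective bargaining agreement.", needs_human=True),
        "Confidentiality": Cell("Standard clause, matches the golden contracts."),
    },
}

# Reviewing is easy: the human only looks at the flagged cells, then kicks off the rest.
flagged = [
    (contract, column, cell.finding)
    for contract, columns in tabular_review.items()
    for column, cell in columns.items()
    if cell.needs_human
]
```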
Right now, what we're seeing a lot of is a convergence of UIs. This one is post-hoc and linear. Within the last two weeks, we've been shipping this new UI.
To be clear, chat boxes as input are great. They're extremely flexible and let you do a lot of stuff, but you don't want chat to be your main mode of collaboration with a complex agent.
The good thing about this is that language is essentially the universal interface. It's what people use to communicate. You can do everything with your voice.
But agents aren't humans.
Just a few minutes ago, I was talking to a potential candidate for Legora, and I was describing our org chart. I was limited because I can only use language. I wish I could just draw up an org chart they could interact with and use, but I can't, because I'm a human.
I'm limited by language. But agents are not humans, and so we should not constrain them to human language. Thank you.
>> [music]