How to build Enterprise Aware Agents - Chau Tran, Glean
Channel: aiDotEngineer
Published at: 2025-07-24
YouTube video id: hxFpUcvWPcU
Source: https://www.youtube.com/watch?v=hxFpUcvWPcU
Thanks Alex for the introduction. That was a very impressive LLM-generated summary of me. I've never heard it before, but nice. So today I'm going to talk to you about something that has been keeping me up at night, and probably some of you too: how to build enterprise-aware agents. How to bring the brilliance of AI into the messy, complex realities of how your business operates. Let's jump straight to the hottest question of the month for AI builders: should I build workflows, or should I build agents? So what are workflows? Workflows are systems where LLMs and tools are orchestrated through predefined code paths. There are two main ways to represent workflows. The first is through an imperative code base: you write a program that calls LLMs, reads the responses, calls tools, and so on, in a traditional programming flow, and you have direct control over the execution of all the steps. The second way is through declarative graphs: you represent your workflow as a graph where the nodes are steps that call tools or LLMs, and the edges connect the nodes. You define the structure but not the execution; the execution is usually handled by some workflow framework. I'm not going to go into the pros and cons of these two approaches, but the main point is that with workflows you get structure and predictability: if you run a workflow today, it will mostly behave the same way if you run it tomorrow. On the other hand, we have agents, which are systems where the LLM dynamically directs its own process: it decides how to achieve a task, which tools to call, and which steps to take, depending on the task itself. The core agent loop is pretty simple.
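The two representations can be sketched in a few lines. This is a toy illustration, not Glean's implementation: `call_llm`, `fetch_ticket`, and the linear graph runner are all hypothetical stand-ins for a real model, tool, and workflow framework.

```python
# Hypothetical stand-ins for a real LLM call and a real tool.
def call_llm(prompt: str) -> str:
    return f"summary of: {prompt}"

def fetch_ticket(ticket_id: str) -> str:
    return f"ticket {ticket_id}: printer on fire"

# 1) Imperative workflow: your code controls execution step by step.
def summarize_ticket(ticket_id: str) -> str:
    ticket = fetch_ticket(ticket_id)   # call a tool
    return call_llm(ticket)            # then an LLM, in a fixed order

# 2) Declarative graph: you define structure; a framework executes it.
graph = {
    "nodes": {"fetch": fetch_ticket, "summarize": call_llm},
    "edges": [("fetch", "summarize")],  # output of fetch feeds summarize
}

def run_graph(graph, start_input):
    """A toy linear-graph executor standing in for a workflow framework."""
    out, node = start_input, graph["edges"][0][0]
    while node:
        out = graph["nodes"][node](out)
        successors = [dst for src, dst in graph["edges"] if src == node]
        node = successors[0] if successors else None
    return out
```

Both paths produce the same result here; the difference is who owns the control flow, your program or the framework.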
It receives a task or goal from a human, then enters an iterative loop: it plans what to do, executes an action, reads the results from the environment, and iterates until the task is done, at which point it responds to the user. So what are the tradeoffs between workflows and agents? Workflows are the Toyota of AI systems: very predictable. They're good when you want to automate repetitive tasks, or encode the existing best practices and know-how in your business. They're usually lower cost and lower latency, because you don't spend LLM calls deciding what to do. They're also easier to debug, because you have code or a graph where you can manually pinpoint which step went wrong in the execution. And in building workflows, humans are in control: you control your destiny. Even given imperfect LLMs, you can tweak and engineer things so that your task works right now. Agents, on the other hand, are the Tesla of AI systems: more open-ended. They're good for researching unsolved problems, and they're usually better at taking advantage of improving LLM capabilities, because here the AI is in control. They generally have higher cost and latency, because you need the LLM to figure out what to do. The upside is that there's less logic to maintain (the core loop is very simple), and sometimes you get these hints of brilliance that make it feel like everything is going to be automated in a few months.
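The plan-act-observe loop described above can be sketched as follows. The `plan` function is a hypothetical stand-in for an LLM deciding the next action; a real agent would send the goal and history to a model instead of using this hard-coded stub policy.

```python
def plan(goal, history):
    # Stub policy: search first, then answer. A real agent would ask an LLM
    # to choose the next action given the goal and the history so far.
    if not history:
        return ("search", goal)
    return ("finish", f"answer to '{goal}' using {history[-1][1]}")

def execute(action, arg):
    # Stub environment: the only available tool is a fake search.
    if action == "search":
        return f"results for {arg}"
    raise ValueError(f"unknown action: {action}")

def agent(goal, max_steps=5):
    history = []
    for _ in range(max_steps):            # plan -> act -> observe, repeated
        action, arg = plan(goal, history)
        if action == "finish":
            return arg                    # respond to the user
        observation = execute(action, arg)
        history.append((action, observation))
    return "gave up"                      # budget exhausted
```

The `max_steps` cap is a common safety valve so an agent that never decides to finish can't loop forever.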
The problem is that, like your Tesla, it works very well most of the time, but sometimes it still takes the wrong exit on the highway, and that's when you miss your Toyota. The decision to build workflows or agents is a tricky one, because it depends heavily on the state of the LLMs: some tasks that don't work in an agentic loop now might start to work in a few months when a new model comes out. So it's a real dilemma. But recently, one thought really changed how I think about it: what if you don't have to choose? Think about what an agent does: when you give it a task, it figures out the steps needed to achieve that task. You give it the task, it figures out one step, takes the action, figures out the next step, and so on. When the agent finishes and you look at the trace of what happened, that series of steps is a workflow. So if I represent this in a programming kind of way: an agent takes a task and generates a workflow to achieve that task. Once you think of it this way, you can see there are really good synergies between workflows and agents. The first is that you can use workflows as evaluation for your agents. Say your company collects a large set of golden workflows: given a task, these are the steps that need to be done to solve it, a kind of handbook of how to do things in your company. Then you can evaluate your agents by giving them a task, seeing what they did, and comparing that to the golden workflow: did the agent figure out the right steps? This is a little different from evaluating end to end.
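The step-level evaluation described above can be sketched as a simple trace-vs-golden comparison. This is a toy metric of my own construction, not Glean's evaluation method: it scores the fraction of golden steps the agent executed in order, tolerating extra steps in between.

```python
def step_match_score(agent_trace, golden_steps):
    """Fraction of golden steps the agent executed, in order.
    A toy metric; a real eval might fuzzy-match step descriptions."""
    matched = 0
    for step in agent_trace:
        if matched < len(golden_steps) and step == golden_steps[matched]:
            matched += 1
    return matched / len(golden_steps)

golden = ["fetch_calls", "extract_competitors", "analyze"]
# Extra steps are tolerated as long as the golden sequence appears in order.
full_credit = step_match_score(
    ["fetch_calls", "search_web", "extract_competitors", "analyze"], golden
)
```

Contrast this with end-to-end evaluation, which would only judge the final answer and could reward an agent that got there by the wrong process.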
You're not judging the agent by its end response, but by whether it took the right steps to get to that response. The second, and even better, way for workflows to help agents: given that same golden workflow library, you can use it to train your agents. Here you truly get the best of both worlds. With that data fed in, your agents can execute the exact workflows in your library for known tasks, but they can also rely on their own internal reasoning capabilities to compose different workflows together to achieve new tasks, and even extend what you teach them and make it better. Agents can also help workflows. One way is that workflow-building platforms can use an agent to generate the workflows. This is roughly how Glean agents work under the hood: the user gives the workflow builder a natural-language description of the task they're trying to achieve, we run an agent implementation to figure out the steps needed to achieve that workflow, and then the user can edit or change the workflow the agent proposed. And lastly, what I think is the most powerful synergy: you can use agents as a workflow discovery engine. You ship an agent, users try to accomplish new tasks with it, and when they find that the agent did a good job, you save that trace: "this is how you do this task in my company." Over time, you can use this as training data to help your agents get better. Cool. So those were the main points of my talk. I guess maybe some of you are thinking: do we still need this kind of stuff in a world where we have AGI?
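The discovery loop described above can be sketched as a library of endorsed traces. This is a hypothetical sketch: the rating threshold, the flat dict, and the function names are all made up for illustration.

```python
# Agents as a workflow discovery engine: successful traces are saved
# into a library keyed by task, and later reused as training examples.
workflow_library = {}

def record_if_good(task, trace, user_rating):
    # Only traces the user endorses become "golden" workflows.
    # The 1-5 rating scale and the threshold of 4 are hypothetical.
    if user_rating >= 4:
        workflow_library.setdefault(task, []).append(trace)

record_if_good("competitor analysis", ["fetch_calls", "extract", "analyze"], 5)
record_if_good("competitor analysis", ["search_web"], 2)  # discarded
```

The key design point is the flywheel: the agent's own successes feed the library that makes future runs more deterministic.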
So here's my thought experiment for why I think this is still needed after AGI. AGI is going to be a super-intelligent employee, right? But if that AI doesn't know how your company works, it's like a really good employee who just joined and doesn't know any of the business practices: they still need onboarding, need to know who to talk to to get unblocked, and need to learn all the very nuanced ways of doing things in the enterprise. So what is enterprise-aware AGI? Enterprise-aware AGI is fully onboarded, very intelligent, and knows the way your company does things. One key insight here is that there are many acceptable ways to achieve a task, but there's a gap between an acceptable output and a great output. One example is competitor analysis: sure, the AI can do some basic Google searches and read some external notes to produce a basic competitor analysis, but does it actually follow the protocols and processes your company defines, and does it address all the key metrics your executives really care about? So, given all this data, tasks and golden workflows, how do you actually train your agents with it? This is the second part of my talk. There are two main approaches we have experimented with. The first is fine-tuning, which comes in two main flavors. One is supervised fine-tuning, where you give an input and an expected output and train the model to mimic that behavior. The second is RLHF, where you don't have a golden label, but you have a rating or a reward: is this workflow for this task a good one or a bad one? Then you run your favorite optimization algorithm to fine-tune the LLM.
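The difference between the two flavors shows up directly in the shape of the training records. These record layouts are hypothetical, just to make the contrast concrete:

```python
# Supervised fine-tuning: input paired with a golden output to imitate.
sft_example = {
    "input": "task: competitor analysis",
    "output": "steps: fetch calls, extract mentions, write analysis",  # golden label
}

# RLHF-style data: a sampled output paired with a reward, no golden label.
rlhf_example = {
    "input": "task: competitor analysis",
    "sampled_output": "steps: google it",
    "reward": -1.0,  # a rating of the workflow, not the correct answer
}
```

SFT needs someone to write the right answer; RLHF only needs someone to judge an answer, which is often cheaper to collect at scale.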
The pro of this method is that it can learn really well when you have a lot of data. If you have a huge set of tasks and workflows, it can generalize across different tasks and combine workflows. The problems: first, you have to create a fork from the frontier LLM. You start with some LLM, do some fine-tuning, and by the time the fine-tuning finishes, maybe a new and better model has already come out, and you have to redo the whole process. Second, any change to your training data requires retraining. If you add a new tool, some of the existing workflows may be outdated, so you have to retrain; if you change business priorities or business processes, you have to redo the training again. It's also not very flexible for personalization: given the same task, different teams or different employees might have different optimal workflows, and fine-tuning is not well suited to those use cases. Then comes the second option: dynamic prompting through search. Given the same labeled data mapping tasks to golden workflows, you build a really good search engine over tasks, so that given a new task you can find similar tasks. At runtime, to accomplish a new task, you find the most similar tasks in the training data and feed the representations of those workflows to the LLM as examples. Here you really have a spectrum between determinism and creativity. When no stored workflow matches your input task, the LLM is in control and can use its creativity to generate a new workflow; but when there's a high-confidence match to something you've done before, the LLM will give you a workflow very similar to what was in the training data.
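The retrieve-then-prompt mechanism described above can be sketched like this. To stay self-contained, it uses `difflib.SequenceMatcher` as a crude similarity stand-in; the real system described in the talk would use hybrid lexical plus embedding search, and the task library here is invented.

```python
from difflib import SequenceMatcher

# Toy task-to-workflow library; real entries would come from golden data.
library = {
    "analyze competitor X": ["fetch calls", "extract mentions", "write analysis"],
    "summarize q3 revenue": ["query finance db", "aggregate", "summarize"],
}

def similar_tasks(new_task, k=1):
    # Rank stored tasks by string similarity to the new task.
    scored = sorted(
        library,
        key=lambda t: SequenceMatcher(None, new_task, t).ratio(),
        reverse=True,
    )
    return scored[:k]

def build_prompt(new_task):
    # Feed the retrieved workflows to the LLM as few-shot examples.
    examples = [
        f"Task: {t}\nSteps: {', '.join(library[t])}"
        for t in similar_tasks(new_task)
    ]
    return "\n\n".join(examples) + f"\n\nTask: {new_task}\nSteps:"
```

When the retrieved example matches closely, the LLM mostly copies it (determinism); when nothing matches well, the examples carry little signal and the LLM improvises (creativity).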
One very concrete example, coming back to the competitor analysis case from before: you've collected this big list of task-to-workflow pairs, and a new task comes in, say "what competitors have we been running into recently?" The search retrieves a workflow for how to analyze each competitor and a workflow for how to find your recent customer calls, and the LLM takes those examples and generates a composed workflow: read customer calls, read internal messages, extract competitors, and run an analysis for each of them. Okay, so comparison time. Fine-tuning or RLHF is very strong when you have a lot of data you want to generalize from. Dynamic prompting with search is more flexible, and it also gives you better interpretability: you can look at the exact examples that affected your outputs. Fine-tuning is good for learning generalized behaviors where the ground-truth labels don't change over time or across users. Dynamic prompting with search is better for learning customized behaviors, or closing the last-mile quality gap, where requirements change quickly. One analogy I think about for fine-tuning versus dynamic prompting: fine-tuning is like building custom hardware. When you have a task you really want to optimize for and the requirements don't change over time, you can build custom hardware that does it very well, but it's costly when you change your requirements. Dynamic prompting is more like writing software: not as optimized, but you can change it very quickly.
Last point: how do you actually build this workflow search, where given a task you find similar tasks? I'd say it's very similar to building document search, and there are two main components. The first is what everyone usually thinks of when they think of search: textual similarity. Given this task, what are the similar-sounding tasks in the training data? Here the golden recipe is the usual one: hybrid search combining lexical matching and vector embeddings, plus reranking and late interaction. But what I've found is that in enterprise settings, pure text similarity is not enough. When you give users the ability to create workflows and write documents, any search will turn up hundreds or thousands of similar-looking documents or workflows, and the problem becomes how to choose the right one. This is what I call authoritativeness, and to solve it you have to go into the knowledge graph: if this workflow was created by someone I work closely with, has a high success rate, and people post about it on Slack, it's more likely to be the right one. All the tricks from the recommender-systems world apply here to workflow search. These authoritativeness signals are very hard to encode directly into an LLM, which is why we have a separate system that does the search for workflows. Cool. So, key takeaways. Workflows are good for determinism; humans are in control. Agents are more open-ended; AI is in control. The synergies between agents and workflows: workflows can be used for agent evaluation, workflows can be used for agent training, and agents can be used for workflow discovery. Fine-tuning is good for generalized behaviors; dynamic prompting with search is good for personalized behaviors. All right.
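The two-component workflow search described in the talk (text similarity plus authoritativeness reranking) can be sketched as a weighted combination. All weights and signal names here are made up for illustration; a real system would learn them.

```python
def authority(wf):
    # Knowledge-graph-style signals: success rate, author proximity,
    # and social proof (e.g. Slack mentions). Weights are hypothetical.
    return (
        0.5 * wf["success_rate"]
        + 0.3 * (1.0 if wf["author_in_my_team"] else 0.0)
        + 0.2 * min(wf["slack_mentions"], 10) / 10
    )

def rank(candidates):
    # candidates: [(text_score, workflow_metadata), ...] from hybrid search.
    # Blend text similarity with authoritativeness to break ties among
    # the many similar-looking workflows.
    return sorted(
        candidates,
        key=lambda c: 0.6 * c[0] + 0.4 * authority(c[1]),
        reverse=True,
    )
```

The point of the blend is exactly the tie-breaking problem above: among near-duplicate workflows, the authoritative one should win even if its text score is slightly lower.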
I still have one minute and thirty seconds, so maybe time for one question. So the question, and let me reinterpret it, tell me if I'm wrong, was: how much data do we need to do fine-tuning, given the new RLVR methods? That's a very difficult question to answer, because it really depends on how out-of-distribution your task is compared to the internal knowledge of the LLM. But I'll catch you after and we can talk more. Thank you.