How to build Enterprise Aware Agents - Chau Tran, Glean
Channel: aiDotEngineer
Published at: 2025-07-24
YouTube video id: hxFpUcvWPcU
Source: https://www.youtube.com/watch?v=hxFpUcvWPcU
Thanks Alex for the introduction. That was a very impressive LLM-generated summary of me. I've never heard it before, but nice. So today I'm going to talk to you about something that has been keeping me up at night, and probably some of you too: how to build enterprise-aware agents. How to bring the brilliance of AI into the messy, complex realities of how your business operates. Let's jump straight to the hottest question of the month for AI builders: should I build workflows, or should I build agents? So what are workflows? Workflows are systems where LLMs and tools are orchestrated through predefined code paths. There are two main ways to represent workflows. The first is through an imperative code base: you write a program that calls LLMs, reads the responses, calls tools, and so on, in a traditional programming flow, and you have direct control over the execution of all the steps. The second way is through declarative graphs: you represent your workflow as a graph where the nodes are steps that call tools or LLMs, and the edges connect the nodes. You define the structure but not the execution; the execution is usually handled by some workflow framework. I'm not going to go into the pros and cons of these two approaches, but the main point is that with workflows you get structure and predictability: if you run a workflow today, it will mostly behave the same way if you run it tomorrow. On the other hand, we have agents, which are systems where the LLM dynamically directs its own process: it decides how to achieve a task, which tools to call, and which steps to take, depending on the task itself. The core agent loop is pretty simple.
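The two representations can be sketched in a few lines. This is a toy illustration, not Glean's implementation: `call_llm`, `fetch_ticket`, and the linear graph runner are all hypothetical stand-ins for a real model, tool, and workflow framework.

```python
# Hypothetical stand-ins for a real LLM call and a real tool.
def call_llm(prompt: str) -> str:
    return f"summary of: {prompt}"

def fetch_ticket(ticket_id: str) -> str:
    return f"ticket {ticket_id}: printer on fire"

# 1) Imperative workflow: your code controls execution step by step.
def summarize_ticket(ticket_id: str) -> str:
    ticket = fetch_ticket(ticket_id)   # call a tool
    return call_llm(ticket)            # then an LLM, in a fixed order

# 2) Declarative graph: you define structure; a framework executes it.
graph = {
    "nodes": {"fetch": fetch_ticket, "summarize": call_llm},
    "edges": [("fetch", "summarize")],  # output of fetch feeds summarize
}

def run_graph(graph, start_input):
    """A toy linear-graph executor standing in for a workflow framework."""
    out, node = start_input, graph["edges"][0][0]
    while node:
        out = graph["nodes"][node](out)
        successors = [dst for src, dst in graph["edges"] if src == node]
        node = successors[0] if successors else None
    return out
```

Both paths produce the same result here; the difference is who owns the control flow, your program or the framework.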
It receives a task or goal from a human, then enters an iterative loop: it plans what to do, executes an action, reads the results from the environment, and iterates until the task is done, at which point it responds to the user. So what are the tradeoffs between workflows and agents? Workflows are the Toyota of AI systems: very predictable. They're good when you want to automate repetitive tasks, or encode the existing best practices and know-how in your business. They're usually lower cost and lower latency, because you don't spend LLM calls deciding what to do. They're also easier to debug, because you have code or a graph where you can manually pinpoint which step went wrong in the execution. And in building workflows, humans are in control: you control your destiny. Even given imperfect LLMs, you can tweak and engineer things so that your task works right now. Agents, on the other hand, are the Tesla of AI systems: more open-ended. They're good for researching unsolved problems, and they're usually better at taking advantage of improving LLM capabilities, because here the AI is in control. They generally have higher cost and latency, because you need the LLM to figure out what to do. The upside is that there's less logic to maintain (the core loop is very simple), and sometimes you get these hints of brilliance that make it feel like everything is going to be automated in a few months.
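The plan-act-observe loop described above can be sketched as follows. The `plan` function is a hypothetical stand-in for an LLM deciding the next action; a real agent would send the goal and history to a model instead of using this hard-coded stub policy.

```python
def plan(goal, history):
    # Stub policy: search first, then answer. A real agent would ask an LLM
    # to choose the next action given the goal and the history so far.
    if not history:
        return ("search", goal)
    return ("finish", f"answer to '{goal}' using {history[-1][1]}")

def execute(action, arg):
    # Stub environment: the only available tool is a fake search.
    if action == "search":
        return f"results for {arg}"
    raise ValueError(f"unknown action: {action}")

def agent(goal, max_steps=5):
    history = []
    for _ in range(max_steps):            # plan -> act -> observe, repeated
        action, arg = plan(goal, history)
        if action == "finish":
            return arg                    # respond to the user
        observation = execute(action, arg)
        history.append((action, observation))
    return "gave up"                      # budget exhausted
```

The `max_steps` cap is a common safety valve so an agent that never decides to finish can't loop forever.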
The problem is that, like your Tesla, it works very well most of the time, but sometimes it still takes the wrong exit on the highway, and that's when you miss your Toyota. The decision to build workflows or agents is a tricky one, because it depends heavily on the state of the LLMs: some tasks that don't work in an agentic loop now might start to work in a few months when a new model comes out. So it's a real dilemma. But recently, one thought really changed how I think about it: what if you don't have to choose? Think about what an agent does: when you give it a task, it figures out the steps needed to achieve that task. You give it the task, it figures out one step, takes the action, figures out the next step, and so on. When the agent finishes and you look at the trace of what happened, that series of steps is a workflow. So if I represent this in a programming kind of way: an agent takes a task and generates a workflow to achieve that task. Once you think of it this way, you can see there are really good synergies between workflows and agents. The first is that you can use workflows as evaluation for your agents. Say your company collects a large set of golden workflows: given a task, these are the steps that need to be done to solve it, a kind of handbook of how to do things in your company. Then you can evaluate your agents by giving them a task, seeing what they did, and comparing that to the golden workflow: did the agent figure out the right steps? This is a little different from evaluating end to end.
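The step-level evaluation described above can be sketched as a simple trace-vs-golden comparison. This is a toy metric of my own construction, not Glean's evaluation method: it scores the fraction of golden steps the agent executed in order, tolerating extra steps in between.

```python
def step_match_score(agent_trace, golden_steps):
    """Fraction of golden steps the agent executed, in order.
    A toy metric; a real eval might fuzzy-match step descriptions."""
    matched = 0
    for step in agent_trace:
        if matched < len(golden_steps) and step == golden_steps[matched]:
            matched += 1
    return matched / len(golden_steps)

golden = ["fetch_calls", "extract_competitors", "analyze"]
# Extra steps are tolerated as long as the golden sequence appears in order.
full_credit = step_match_score(
    ["fetch_calls", "search_web", "extract_competitors", "analyze"], golden
)
```

Contrast this with end-to-end evaluation, which would only judge the final answer and could reward an agent that got there by the wrong process.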
You're not judging the agent by its end response, but by whether it took the right steps to get to that response. The second, and even better, way for workflows to help agents: given that same golden workflow library, you can use it to train your agents. Here you truly get the best of both worlds. With that data fed in, your agents can execute the exact workflows in your library for known tasks, but they can also rely on their own internal reasoning capabilities to compose different workflows together to achieve new tasks, and even extend what you teach them and make it better. Agents can also help workflows. One way is that workflow-building platforms can use an agent to generate the workflows. This is roughly how Glean agents work under the hood: the user gives the workflow builder a natural-language description of the task they're trying to achieve, we run an agent implementation to figure out the steps needed to achieve that workflow, and then the user can edit or change the workflow the agent proposed. And lastly, what I think is the most powerful synergy: you can use agents as a workflow discovery engine. You ship an agent, users try to accomplish new tasks with it, and when they find that the agent did a good job, you save that trace: "this is how you do this task in my company." Over time, you can use this as training data to help your agents get better. Cool. So those were the main points of my talk. I guess maybe some of you are thinking: do we still need this kind of stuff in a world where we have AGI?
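The discovery loop described above can be sketched as a library of endorsed traces. This is a hypothetical sketch: the rating threshold, the flat dict, and the function names are all made up for illustration.

```python
# Agents as a workflow discovery engine: successful traces are saved
# into a library keyed by task, and later reused as training examples.
workflow_library = {}

def record_if_good(task, trace, user_rating):
    # Only traces the user endorses become "golden" workflows.
    # The 1-5 rating scale and the threshold of 4 are hypothetical.
    if user_rating >= 4:
        workflow_library.setdefault(task, []).append(trace)

record_if_good("competitor analysis", ["fetch_calls", "extract", "analyze"], 5)
record_if_good("competitor analysis", ["search_web"], 2)  # discarded
```

The key design point is the flywheel: the agent's own successes feed the library that makes future runs more deterministic.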
So here's my thought experiment for why I think this is still needed after AGI. AGI is going to be a super-intelligent employee, right? But if that AI doesn't know how your company works, it's like a really good employee who just joined and doesn't know any of the business practices: they still need onboarding, need to know who to talk to to get unblocked, and need to learn all the very nuanced ways of doing things in the enterprise. So what is enterprise-aware AGI? Enterprise-aware AGI is fully onboarded, very intelligent, and knows the way your company does things. One key insight here is that there are many acceptable ways to achieve a task, but there's a gap between an acceptable output and a great output. One example is competitor analysis: sure, the AI can do some basic Google searches and read some external notes to produce a basic competitor analysis, but does it actually follow the protocols and processes your company defines, and does it address all the key metrics your executives really care about? So, given all this data, tasks and golden workflows, how do you actually train your agents with it? This is the second part of my talk. There are two main approaches we have experimented with. The first is fine-tuning, which comes in two main flavors. One is supervised fine-tuning, where you give an input and an expected output and train the model to mimic that behavior. The second is RLHF, where you don't have a golden label, but you have a rating or a reward: is this workflow for this task a good one or a bad one? Then you run your favorite optimization algorithm to fine-tune the LLM.
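The difference between the two flavors shows up directly in the shape of the training records. These record layouts are hypothetical, just to make the contrast concrete:

```python
# Supervised fine-tuning: input paired with a golden output to imitate.
sft_example = {
    "input": "task: competitor analysis",
    "output": "steps: fetch calls, extract mentions, write analysis",  # golden label
}

# RLHF-style data: a sampled output paired with a reward, no golden label.
rlhf_example = {
    "input": "task: competitor analysis",
    "sampled_output": "steps: google it",
    "reward": -1.0,  # a rating of the workflow, not the correct answer
}
```

SFT needs someone to write the right answer; RLHF only needs someone to judge an answer, which is often cheaper to collect at scale.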
The pro of this method is that it can learn really well when you have a lot of data. If you have a huge set of tasks and workflows, it can generalize across different tasks and combine workflows. The problems: first, you have to create a fork from the frontier LLM. You start with some LLM, do some fine-tuning, and by the time the fine-tuning finishes, maybe a new and better model has already come out, and you have to redo the whole process. Second, any change to your training data requires retraining. If you add a new tool, some of the existing workflows may be outdated, so you have to retrain; if you change business priorities or business processes, you have to redo the training again. It's also not very flexible for personalization: given the same task, different teams or different employees might have different optimal workflows, and fine-tuning is not well suited to those use cases. Then comes the second option: dynamic prompting through search. Given the same labeled data mapping tasks to golden workflows, you build a really good search engine over tasks, so that given a new task you can find similar tasks. At runtime, to accomplish a new task, you find the most similar tasks in the training data and feed the representations of those workflows to the LLM as examples. Here you really have a spectrum between determinism and creativity. When no stored workflow matches your input task, the LLM is in control and can use its creativity to generate a new workflow; but when there's a high-confidence match to something you've done before, the LLM will give you a workflow very similar to what was in the training data.
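The retrieve-then-prompt mechanism described above can be sketched like this. To stay self-contained, it uses `difflib.SequenceMatcher` as a crude similarity stand-in; the real system described in the talk would use hybrid lexical plus embedding search, and the task library here is invented.

```python
from difflib import SequenceMatcher

# Toy task-to-workflow library; real entries would come from golden data.
library = {
    "analyze competitor X": ["fetch calls", "extract mentions", "write analysis"],
    "summarize q3 revenue": ["query finance db", "aggregate", "summarize"],
}

def similar_tasks(new_task, k=1):
    # Rank stored tasks by string similarity to the new task.
    scored = sorted(
        library,
        key=lambda t: SequenceMatcher(None, new_task, t).ratio(),
        reverse=True,
    )
    return scored[:k]

def build_prompt(new_task):
    # Feed the retrieved workflows to the LLM as few-shot examples.
    examples = [
        f"Task: {t}\nSteps: {', '.join(library[t])}"
        for t in similar_tasks(new_task)
    ]
    return "\n\n".join(examples) + f"\n\nTask: {new_task}\nSteps:"
```

When the retrieved example matches closely, the LLM mostly copies it (determinism); when nothing matches well, the examples carry little signal and the LLM improvises (creativity).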
One very concrete example, coming back to the competitor analysis case from before: you've collected this big list of task-to-workflow pairs, and a new task comes in, say "what competitors have we been running into recently?" The search retrieves a workflow for how to analyze each competitor and a workflow for how to find your recent customer calls, and the LLM takes those examples and generates a composed workflow: read customer calls, read internal messages, extract competitors, and run an analysis for each of them. Okay, so comparison time. Fine-tuning or RLHF is very strong when you have a lot of data you want to generalize from. Dynamic prompting with search is more flexible, and it also gives you better interpretability: you can look at the exact examples that affected your outputs. Fine-tuning is good for learning generalized behaviors where the ground-truth labels don't change over time or across users. Dynamic prompting with search is better for learning customized behaviors, or closing the last-mile quality gap, where requirements change quickly. One analogy I think about for fine-tuning versus dynamic prompting: fine-tuning is like building custom hardware. When you have a task you really want to optimize for and the requirements don't change over time, you can build custom hardware that does it very well, but it's costly when you change your requirements. Dynamic prompting is more like writing software: not as optimized, but you can change it very quickly.
Last point: how do you actually build this workflow search, where given a task you find similar tasks? I'd say it's very similar to building document search, and there are two main components. The first is what everyone usually thinks of when they think of search: textual similarity. Given this task, what are the similar-sounding tasks in the training data? Here the golden recipe is the usual one: hybrid search combining lexical matching and vector embeddings, plus reranking and late interaction. But what I've found is that in enterprise settings, pure text similarity is not enough. When you give users the ability to create workflows and write documents, any search will turn up hundreds or thousands of similar-looking documents or workflows, and the problem becomes how to choose the right one. This is what I call authoritativeness, and to solve it you have to go into the knowledge graph: if this workflow was created by someone I work closely with, has a high success rate, and people post about it on Slack, it's more likely to be the right one. All the tricks from the recommender-systems world apply here to workflow search. These authoritativeness signals are very hard to encode directly into an LLM, which is why we have a separate system that does the search for workflows. Cool. So, key takeaways. Workflows are good for determinism; humans are in control. Agents are more open-ended; AI is in control. The synergies between agents and workflows: workflows can be used for agent evaluation, workflows can be used for agent training, and agents can be used for workflow discovery. Fine-tuning is good for generalized behaviors; dynamic prompting with search is good for personalized behaviors. All right.
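The two-component workflow search described in the talk (text similarity plus authoritativeness reranking) can be sketched as a weighted combination. All weights and signal names here are made up for illustration; a real system would learn them.

```python
def authority(wf):
    # Knowledge-graph-style signals: success rate, author proximity,
    # and social proof (e.g. Slack mentions). Weights are hypothetical.
    return (
        0.5 * wf["success_rate"]
        + 0.3 * (1.0 if wf["author_in_my_team"] else 0.0)
        + 0.2 * min(wf["slack_mentions"], 10) / 10
    )

def rank(candidates):
    # candidates: [(text_score, workflow_metadata), ...] from hybrid search.
    # Blend text similarity with authoritativeness to break ties among
    # the many similar-looking workflows.
    return sorted(
        candidates,
        key=lambda c: 0.6 * c[0] + 0.4 * authority(c[1]),
        reverse=True,
    )
```

The point of the blend is exactly the tie-breaking problem above: among near-duplicate workflows, the authoritative one should win even if its text score is slightly lower.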
I still have one minute and thirty seconds, so maybe time for one question. So the question, and let me reinterpret it, tell me if I'm wrong, was: how much data do we need to do fine-tuning, given the new RLVR methods? That's a very difficult question to answer, because it really depends on how out-of-distribution your task is compared to the internal knowledge of the LLM. But I'll catch you after and we can talk more. Thank you.