Agentic Search for Context Engineering — Leonie Monigatti, Elastic
Channel: aiDotEngineer
Published at: 2026-05-08
YouTube video id: ynJyIKwjonM
Source: https://www.youtube.com/watch?v=ynJyIKwjonM
Everyone ready? >> Yes. >> Awesome. Welcome to AI Engineer, and thanks for joining my session. We're going to be talking about agentic search for context engineering today. My name is Leonie, I work at Elastic, the company behind Elasticsearch, and usually I like to talk about retrieval on Twitter. Today I'm super excited to be doing this in person. A little bit of housekeeping: if you want to access the slides and the code we'll be looking at, you can scan the QR code.

Let's start with why I'm excited about search and retrieval, and hopefully why you'll be excited about it by the end of this workshop as well. Who here has built an agent, or some form of one, before? Awesome, then you're probably not intimidated by this image; you've probably seen some alternative to it before. This is essentially what context engineering looks like. Context engineering, when we talk about it, is the art (or the set of engineering techniques) of deciding, out of all the possible context sources we have, what actually goes into the context window, so our LLMs can generate the best responses. Often when we talk about this, we talk about context curation, and we mean this little arrow from context sources to context window. But in my opinion we're not giving that little arrow enough credit, because what's powering it is the search tool, or search tools, that actually decide what goes from context sources into the context window. Today we're going to look at the different search tools we have. This is my personal hot take: I like to say that context engineering is about 80% agentic search, because it's this little box right here.

All right, let's start with a little bit of history, and when I say history, I mean the last three years: RAG. When we started with RAG, the original idea was a fixed retrieval pipeline. The user message would be used more or less verbatim as a search query, usually a vector search query, to pull some data or chunks from a database. The retrieved context would then go into the context window together with the user message and be fed to the LLM. Nice. This clearly has many limitations. Since it's a fixed pipeline, you retrieve additional information whether or not you actually need any context, and in the worst case that can confuse your LLM. On the other hand, you're only retrieving once. Say you're asking your LLM something more complex that needs multi-hop retrieval: the retrieved chunks might reveal information about another search query you need, so you'd actually want a second round of search.

That's why we then moved on to agentic RAG. We replace the fixed pipeline with a search tool, so now the agent can decide by itself whether or not to call the search tool and retrieve information. We no longer have the problems of "do I actually need any information?", "is what I retrieved even relevant?", "do I need to retrieve more, or rewrite the search query?". We still only have one context source in this case, though: one database. Conceptually, the shift looks like the sketch below.
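To make the difference concrete, here is a minimal sketch of fixed-pipeline RAG versus agentic RAG. It assumes LangChain 1.x's `create_agent`; the model id, tool body, and names are illustrative, not the demo's exact code.

```python
# A minimal sketch of fixed-pipeline RAG vs. agentic RAG.
# Assumes LangChain 1.x; names and the tool body are illustrative.
from langchain.agents import create_agent
from langchain_core.tools import tool

# Fixed pipeline: EVERY user message triggers retrieval, needed or not,
# and there is exactly one retrieval round.
def fixed_rag(llm, vector_store, user_message: str) -> str:
    chunks = vector_store.similarity_search(user_message, k=3)
    context = "\n\n".join(c.page_content for c in chunks)
    return llm.invoke(f"Context:\n{context}\n\nQuestion: {user_message}").content

# Agentic RAG: retrieval becomes a tool, and the agent decides whether,
# how often, and with which (rewritten) query to call it.
@tool
def semantic_search(query: str) -> str:
    """Retrieve chunks relevant to `query` from the database."""
    return "...top-k chunks..."  # placeholder; see the vector store sketch later

agent = create_agent(
    model="openai:gpt-5-nano",  # stand-in model id
    tools=[semantic_search],
    system_prompt="Decide whether you need to retrieve context before answering.",
)
```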
Now, when we look at context engineering, the context lives in many different places. We have context sources in local files: when you think about your coding agent, you probably have your code files lying around in your local file system. Maybe you're using some kind of working memory like a scratchpad, so you're planning with your agent what you want to do, and you have something like a plan.md file. When you have agent skills, they usually also live in a local folder somewhere. We still have databases, because many enterprises have their data stored in databases. We have the web as another context source. And I know this is a super controversial image, because I did not commit to putting long-term memory in the local file system or in the database; that's currently a very big discussion, and we can get into it in the Q&A if you like. But we also have long-term memory as another context source.

So how do we actually retrieve context from these? Usually we have a set of, let's say, context-source-native search tools. For local files you usually have something like a file search tool. For skills you usually have a skill loading tool. For databases we have slightly more custom tools: something like a semantic search tool, or maybe something more general purpose, like a tool that lets you execute entire search queries against the database, SQL for example. For the web you have web search tools, and for memory you have something like a dedicated memory tool.

If that's not overwhelming enough, we now also have something called a shell tool. LangChain calls it the shell tool, Anthropic calls it the bash tool, and if you've played around with OpenClaw, it's called the exec tool. What all of these tools do is let your agent run commands in the terminal, and that makes them super versatile. Now you can let your agent use CLIs to navigate and explore your local files: just run ls and grep to find data in your local file system. If your database has a custom CLI, the agent can use the shell tool to interact with the database. You could also let your agent write an entire script from scratch: connect to your database, run a search query. And if your database is exposed via HTTP, you can just run a curl command and interact with the database through the shell tool. Speaking of curl commands, you can also do web searches with it if you like. Here's a sketch of that versatility.
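As a sketch of that versatility, assuming LangChain's community ShellTool; the paths, index name, and port below are made up:

```python
# A sketch of what a shell tool gives the agent. ShellTool ships with no
# safeguards, so sandbox it in practice. Paths and URLs are illustrative.
from langchain_community.tools import ShellTool

shell = ShellTool()

# Explore and search the local file system:
print(shell.run({"commands": ["ls session_data/", "grep -ril 'GEPA' session_data/"]}))

# Or talk to a database exposed over HTTP with nothing but curl:
print(shell.run({"commands": [
    "curl -s 'http://localhost:9200/conference_sessions/_search?q=GEPA'"
]}))
```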
So this shell tool is super versatile, and the question, the topic of today, is: which search tools do we actually need? Do we only need a shell tool? Do I need all of these? If you take home only one thing from today, it's that doing good search is incredibly difficult, and that's why we have many different techniques to do it. We have vector search and keyword search; even within vector search we have dense embeddings, sparse embeddings, multi-vector embeddings. Then we have many different indexing techniques. So depending on what kind of search and latency requirements you have, you will need to curate your own stack of search tools. Today we're going to look at a few of these; unfortunately we only have one hour, so I cannot show you all of them.

Before we get into some code, I want to give you a few fundamentals of building good search tools, because agentic search at the surface level seems very straightforward: the user makes a request, the agent calls the right tool with the right parameters (I see someone laughing), the retrieval tool gives you the tool response, and then your agent responds to you with the correct answer. At Elastic, we help a lot of internal and external teams build agents that interact with Elasticsearch data, and the reality is that this can break in many different ways. I'm going to show you three today.

The first is that the agent doesn't call any tool at all: the agent decides "I can answer this question from my parametric knowledge, I don't need any context retrieval tool." The second problem is that the agent calls the wrong tool. I was recently talking to a colleague of mine and asked him what the most challenging aspect of his project was, and he said: you won't believe it, but it was really difficult to get the agent to not call the web search tool and to call the database search tool instead. And third, depending on how complex the parameters of your search tools are, it can be quite challenging to get your agent to generate the right search parameters. There are many more failure cases, but we're going to limit it to these three today.

I personally hate this slide, because I feel like everyone in this room probably knows that the tool description is the most important aspect. But any time I see a tool description, it's the least-effort one sentence, and then you're wondering why your agent isn't calling the right tool. Arguably, this one is a very long tool description, and I'm not saying you have to write it like this. Start with just the core purpose; if that works fine, great. But if you add more parameters or more tools and your agent starts struggling to call the right tool, then add some trigger conditions: when should this tool be used, and when should it not be used, especially if you have multiple tools. Adding something like relationships is super important too, like "first load this agent skill before you call this tool" or "get confirmation before you call this tool". And if you have the perfect tool description and your agent still doesn't call the right tool, reinforce it in the agent's system prompt. That should help in most cases.

Then I want to quickly touch on parameter complexity. If you have a search tool that's very simple, something like "get customer by ID", it should be fairly straightforward for the agent to generate an ID parameter, given that it's a valid ID. Same if you're doing a semantic search: generating a valid string shouldn't cause any issues. But say you want a semantic search tool and, instead of just giving it a topic, you also want to give it filter conditions, and maybe let it define the top k. Then you start to have more parameters. That isn't very complex yet, but the longer your list of parameters, the more difficult it gets for the agent to generate the right ones. And I think the really complex case for an agent is something more general purpose, like letting the agent execute entire search queries against a database. Here I have ES|QL, which is the Elasticsearch query language; it could just as well be SQL. Letting the agent write an entire query from scratch can be quite challenging. Most models are pretty good at it, but some aren't. The ladder of parameter complexity looks roughly like the sketch below.
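Here is a sketch of that ladder, with a fuller tool description on the first tool in the spirit of the slide (core purpose, trigger conditions, relationships). All tool names and fields are illustrative, not the demo's exact code.

```python
# A sketch of escalating parameter complexity; bodies are stubs.
from typing import Optional
from langchain_core.tools import tool

@tool
def get_customer_by_id(customer_id: str) -> str:
    """Look up a single customer record by its ID.

    Use when: the user references a concrete customer ID.
    Do NOT use when: the user describes a customer in natural language
    (use semantic_search instead).
    """
    ...  # trivial parameter: hard for the agent to get wrong

@tool
def semantic_search(query: str, top_k: int = 3,
                    session_type: Optional[str] = None) -> str:
    """Semantic search over sessions, with an optional type filter."""
    ...  # more parameters: more ways to generate the wrong ones

@tool
def execute_esql_query(query: str) -> str:
    """Execute a full ES|QL query against the index."""
    ...  # one parameter, but the agent must write a whole valid query
```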
So just keep in mind that parameter complexity is also kind of a failure mode, and you might need to help the agent out with the more complex ones. Good, let's look at some code.

Okay, quick show of hands: who here has built some sort of agentic RAG or agentic search before? Okay, about half, that's good. We're going to be looking at three things. First, I'll give you a quick recap, or intro, to the very vanilla agentic search demo, and then I'm going to show you how easy it is to break this usual demo. Second, we're replacing the semantic search tool with something more general purpose: we're letting the agent write an entire search query from scratch. For these two examples we'll be using a local Elasticsearch cluster as the context source. For the third part, I'm switching gears and showing you how search over a local file system works with the shell tool, and then I'll also show you some limitations of the shell tool and how you can extend it with custom CLIs.

The example we'll be working with today is the conference session data of this very conference. Quick recap: we have an Elasticsearch database, and we're going to write a semantic search tool. In the database I already have the conference sessions chunked. You probably already know how to chunk and store data in a database, so we're skipping that part; it's not the important aspect here.

So what do we need? An agent, an LLM. We're going to be using LangChain for this session, just because it wraps a lot of the complexity and lets us concentrate on the high-level concepts. It also has some nice built-in features: the shell tool is built in, and it has code samples for skill loading tools, which we'll look at later. Okay, switching back. I'm using GPT-5.4 Nano for this demo. Then we define a very simple system prompt, the usual: you are a search agent tasked with answering questions, you have access to different context retrieval tools, and before answering a question, decide whether or not you need to retrieve additional context. To help the agent a little, I include some information about how the data is structured in Elasticsearch. We have a text field comprised of the title and the description of each session, and the text field is what actually gets embedded as the vector embeddings for semantic search. Then I also have some metadata fields: for example the day, the time, the room, the speaker's name. Since the metadata is not embedded, I can only run filters over it, not semantic search; just for your information.

Okay, now the interesting part: let's build a semantic search tool. The way that works in LangChain is that I first define an embedding model; in this case I'm using the new Jina Embeddings v5 model. This is used to embed the search queries at query time. I put the embedding model into my Elasticsearch store, together with the Elasticsearch data, to create a vector store, and then I can create a search tool. All I have to do is call the similarity search method: it takes in a search query, and in this case I'm setting the limit, the top k, to three. In code, the setup looks roughly like the sketch below.
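A sketch of that setup, assuming the langchain-elasticsearch package and the community JinaEmbeddings wrapper; the model name here is an older one than the demo's, and the URL and index name are stand-ins:

```python
# A sketch of the vector store behind the semantic search tool.
# Assumes JINA_API_KEY is set; model, URL, and index name are stand-ins.
from langchain_community.embeddings import JinaEmbeddings
from langchain_elasticsearch import ElasticsearchStore

embeddings = JinaEmbeddings(model_name="jina-embeddings-v3")

vector_store = ElasticsearchStore(
    es_url="http://localhost:9200",
    index_name="conference_sessions",
    embedding=embeddings,  # embeds the search queries at query time
)

# The whole search tool boils down to one call: top 3 chunks by similarity.
results = vector_store.similarity_search("regulatory constraints", k=3)
```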
This is a little bit of foreshadowing, because I'm limiting the capabilities of this tool to just returning three search results. What's also nice in LangChain is that when you use the tool decorator, it lets you convert any Python function into an agent tool. By default it takes the Python function's name as the tool name, and the docstring gets converted into the tool description. You can see I'm breaking my own rule by having a very short tool description here. That works because I only have one search tool; I'll be adding a few things later on, but it's not going to get very descriptive in this demo.

Now we can run a test with the search query "regulatory constraints", and you can see it finds a talk by my friend Bili on engineering AI systems under regulatory constraints, and it also finds some more talks, one by Tas and one by Pedro. All right, let's plug it in: the LLM, the system prompt, and the search tool. I'm leaving out memory; obviously that would be another core component of an agent, but I'm leaving it out to keep things concise. Now I can ask a simple question like "which sessions discuss regulatory constraints in AI systems?", and you can see the agent first calls my semantic search tool. It wrote quite an extensive search query, in my opinion, but it works, and it finds the right talk by Bili. Then it decided that apparently wasn't enough, so it rewrote the search query, but got very similar search results back. After that, it decided it was now able to respond with the right talks.

This is where most agentic search demos end. But this is very brittle. Does anyone have an idea how we can break it? Yes, that's a good idea. Anything else? Asking it something that's not in the database would be a great one. What about asking it something where semantic search actually falls short? Maybe something where we want to look for a specific keyword, or something that needs filtering, because this search tool doesn't have any filters implemented. My choice of search query is: "which sessions should I visit to learn more about GEPA?" I've heard people talk about GEPA, and I'm not even sure I'm pronouncing it correctly; that's why I definitely need to attend that session. What you can see is that the agent calls the semantic search tool, and this time it's looking for "GEPA". So far so good, but now you can see it's actually returning a talk about DeepMind's Gemma models. I guess from a tokenization perspective that could be similar to "GEPA", I don't know. Then it returns something on harness engineering, not sure that's necessarily related, and then a third one. Clearly none of these are related to GEPA, and, spoiler alert, I know there is a talk about GEPA; I think it's right after this one.

So the search tool we just created is not very useful, or at least only useful for a very narrow scope of use cases. What if we let the agent write an entire search query from scratch? Let me show you how we can do this. We're replacing the database tool we had with an execute-query tool: instead of the tool taking in a topic, we're now giving the search tool an entire search query, as sketched below.
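A sketch of such a general-purpose query tool, assuming the official Elasticsearch Python client; the host, index name, and result formatting are stand-ins:

```python
# A sketch of the general-purpose ES|QL query tool. The try/except is the
# important part: errors go back to the agent as text, so it can
# self-correct instead of crashing the whole run.
from elasticsearch import Elasticsearch
from langchain_core.tools import tool

client = Elasticsearch("http://localhost:9200")  # stand-in host

@tool
def execute_esql_query(query: str) -> str:
    """Execute an ES|QL query against the conference schedule index in
    Elasticsearch."""
    try:
        resp = client.esql.query(query=query)
        return str(resp)
    except Exception as exc:
        return f"Query failed: {exc}"  # let the agent rewrite the query
```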
Because this is quite difficult for an agent, I'm also going to show you how to combine it with a skill loading tool. I'm doing the same setup: setting up my LLM, and you've probably noticed I'm switching to a slightly more powerful model here, from GPT-5.4 Nano to the mini, because I'm now anticipating that writing search queries is a little more difficult and the nano is probably not powerful enough. I'm using the exact same system prompt as before. And now I'm creating a general-purpose database query tool. Since I'm using Elasticsearch, I'll be using ES|QL, the Elasticsearch query language, which is a piped query language for filtering, transforming, and analyzing data. It looks something like this; maybe it reminds you of SQL. It's a little different and has different capabilities, but that's not important for this session. You can see that when I connect to my client, use the query method from the ES|QL class, and run this query, there is actually a session by Samuel that talks a lot about GEPA: here's a match, and here's another match.

So let's wrap this into a search tool. I use the same query method again, the agent passes in the ES|QL query as a parameter, and I exchanged the tool description for one that says "execute an ES|QL query against the conference schedule index in Elasticsearch". Notice anything different about how I wrote this search tool versus the other one? This time I added a try/except block for error handling, as in the sketch above. Generally speaking, you should always have error handling, but since I'm anticipating that writing a good, or even valid, ES|QL query is going to cause more problems for the tool, I don't want the agent to just fail and the whole system to crash. Instead, I return the error response to the agent so it can self-correct and rewrite the query. Super important to have this, so the agent can self-correct.

We can test it here; that part's not important. Then I plug the LLM, the system prompt, and my new search tool into the agent again. When I now ask it the exact same question as before, "which session should I visit to learn more about GEPA?", you can see it calls the execute-ES|QL-query tool and generates something that looks like valid ES|QL. I'm not expecting anyone here to be very familiar with ES|QL; what's wrong with this query is that ES|QL doesn't use the percentage sign as a wildcard character. In ES|QL you use the asterisk. So in this case it's actually looking for percent-GEPA-percent in the data as an exact match, and that's why it returns zero search results. This is also super important to think about when you're working with search tools: is returning zero search results a valid response, or is it a failure mode?

Okay, how could I overcome this? I could write a more descriptive tool description and give it a bit more help on how to write better parameters. I could reinforce it in the system prompt and give it more instructions there. Or I could use an agent skill, because you need more documentation than just a one-liner. So now I'm going to show you how to add an agent skill. In this case I'm writing my own very short custom agent skill. Quick question: who has used or played with agent skills before? Okay, a good amount. The skill file itself looks roughly like the sketch below.
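A minimal sketch of such a skill file: YAML frontmatter (name and description, which get injected into the system prompt) plus a body that is only loaded on demand. The skill name, path, and exact wording here are illustrative, not the demo's file.

```python
# A sketch of a minimal ES|QL agent skill, written out as markdown
# with YAML frontmatter. Path and wording are illustrative.
from pathlib import Path

SKILL_MD = """\
---
name: elasticsearch-esql
description: Syntax rules for writing valid ES|QL queries. Load this before using the ES|QL query tool.
---

# ES|QL basics
Basic structure: FROM <index> | WHERE <condition> | LIMIT <n>
- ES|QL uses double quotes for string literals.
- Wildcard matching uses LIKE with `*` (NOT the SQL `%` sign), e.g.
  WHERE text LIKE "*GEPA*"
"""

skill_path = Path("skills/elasticsearch-esql/SKILL.md")
skill_path.parent.mkdir(parents=True, exist_ok=True)
skill_path.write_text(SKILL_MD)
```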
There are official Elasticsearch agent skills available if you want to play around with those; in this case I'm writing my own very short custom one, roughly like the sketch above. The way it works is that you have the skill name and the skill description, which get injected into the system prompt. If you write the skill in markdown, it's the frontmatter that gets injected into the system prompt, and then, when the agent needs it, more information from the agent skill is loaded into the context window. This is called progressive disclosure: you add more information about the skill as needed. In this case I have some minimal instructions, like the basic structure of an ES|QL query and the fact that ES|QL uses double quotes for string literals, just some very basic syntax rules. And I also added information about the wildcard pattern, so it doesn't make that mistake again.

As I mentioned, LangChain has some boilerplate code you can just copy and reuse for agent skills, so I'm skipping over it. All you have to know is that we have a skill loading tool, and it gets combined with something called a skill middleware; I'm skipping over this because it's not relevant for our session. Now all I have to do is edit the tool description of my general-purpose search tool. This is the exact same tool I had before, except I'm now adding a relationship: I'm saying "always use the Elasticsearch ES|QL skill to generate the ES|QL query before using this tool", because if the agent still used this tool first, without loading the agent skill, that would be a shame. And I'm doing the exact same thing in the system prompt: reinforcing that it should use the Elasticsearch agent skill first, before calling the general-purpose search tool.

Now I plug it all into the agent again: the LLM, the system prompt, the skill loading tool, the skill middleware, and my general-purpose ES|QL query tool. When I ask the agent "which session should I visit to learn more about GEPA?", you can see it first loads the skill, so it actually loads everything in the body part of the skill into my context window. And then it generates, this time, perfectly valid ES|QL, with asterisks as the wildcard characters. And you can see it actually finds the right session, and now it tells me that at 10:40, right after this session, I should go to that session to learn more about GEPA, and learn how to pronounce it correctly.

Okay, what's also cool about this is that the agent can now do a lot of things, like aggregations. If I ask it something like "how many sessions are on April 8th?", you can see it again loads the Elasticsearch ES|QL skill, and then it writes an ES|QL query that uses a filter, filtering for April 8th, and then does an aggregation, some counting, and tells me there are 27 sessions today, via a call roughly like the sketch below.
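Roughly what the agent generated for the counting question; the field names are guesses at the demo's schema, and `execute_esql_query` is the tool sketched earlier:

```python
# The aggregation runs inside Elasticsearch, so only one number comes
# back instead of 27 full session records. Field names are guesses.
result = execute_esql_query.invoke({
    "query": 'FROM conference_sessions '
             '| WHERE day LIKE "*April 8*" '
             '| STATS sessions = COUNT(*)'
})
```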
This is nice, because imagine I had just asked it which sessions are on April 8th and it had only run a filtered search: it would give me a list of all 27 sessions, and we'd be letting the agent count how many there are. That would probably not be so good, because we all know agents, or LLMs, are notoriously bad at counting things, and it would also fill up your context window. By outsourcing the calculation into the search tool, letting the database do the counting, we get a much more efficient way to do this. Any questions so far?

Okay, let's switch gears. This is a very prominent topic at the moment; maybe you've heard the discussion that all an agent needs is a shell tool and a file system. I work at Elastic, but I don't discriminate, so let's look at file systems and how to do this, because I think it's a very interesting topic in general. What I did here is prepare the data in a local file system this time. I have a folder called session_data, and in there, for each type of session, like keynotes and workshops, I have another folder, and in those there is one file per session. It looks something like this: a title, some metadata, and the description.

Now I'm going to show you how to use the shell tool with this. I'm switching back to GPT-5.4 Nano, because LLMs are just generally good at navigating file systems and writing shell commands, so the nano is sufficient. I define another system prompt: the first part is exactly the same as before, but instead of explaining how the data is structured in Elasticsearch, I explain how the data is structured in my local file system.

Okay, let's use the shell tool. I have to give you a disclaimer: using the shell tool can be risky, since giving your agent access to a terminal means it can delete files or do other things you don't want it to do. It's always recommended to use it in a sandboxed environment, and in LangChain it doesn't have any safeguards by default, so please be careful when using this. Other than that, it's very easy to use: you just import the shell tool and instantiate it. It takes a commands parameter, so when I pass "echo hello world", you can see it actually prints "hello world" from my terminal. Then, again, all you have to do is plug in the LLM, the system prompt, and the shell tool.

Now you can ask the exact same thing we had earlier: "are there any sessions about GEPA?" It's called the terminal tool here, but the agent calls the shell tool and writes a few commands. First it looks at the folder structure, then it runs some grep commands, looking for "GEPA" in the session data; I think it limits itself to the first 50 matches. You can see it saw the folder structure and also found the one session we were talking about earlier. But since it only looked at the first 50 and found one session, it decided it should probably search the entire session data, and it finds the exact same session again. So this time it decides, okay, I should probably look at the contents of this session, and it reads the entire file, and here you can see the session information as the tool response. At the end, the agent tells me which session I should visit. Wired together, the whole thing looks roughly like the sketch below.
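A sketch of the file-system agent, assuming LangChain's ShellTool and `create_agent`; the model id is a stand-in and the prompt is abbreviated:

```python
# A sketch of the file-system agent: one shell tool, plus a system
# prompt that explains the folder layout. Model id is a stand-in.
from langchain.agents import create_agent
from langchain_community.tools import ShellTool

SYSTEM_PROMPT = """You are a search agent tasked with answering questions.
Session data lives in ./session_data/<session_type>/<session>.md, one file
per session, each with a title, metadata, and a description. Use shell
commands (ls, grep, cat) to explore and search it before answering."""

agent = create_agent(
    model="openai:gpt-5-nano",  # stand-in for the demo's nano-sized model
    tools=[ShellTool()],        # remember: sandbox this in practice
    system_prompt=SYSTEM_PROMPT,
)

agent.invoke({"messages": [
    {"role": "user", "content": "Are there any sessions about GEPA?"}
]})
```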
Okay, grep works on exact matches and regexes, right? I just want to show you this, because I think it's funny how surprisingly good agents are with bash: they can kind of cheat at semantic search. Let me show you. When I ask "which sessions discuss handling regulatory constraints?", our semantic search query from the beginning, you can see it again looks at the folder structure, and the first search it does looks for "regulate", then "regulatory". I guess that's a fair start. But then it goes ahead and just chains a bunch of synonyms together: it's looking for "compliance", for "constraints", for "GDPR", for "governance". Yeah, I guess that's fair. It actually finds a bunch of sessions, then goes, okay, let me try a bunch of other synonyms, and now it's looking again for "regulate", "compliance", "GDPR", "sovereignty"; the list goes on. It just greps a bunch of different synonyms, and it's actually successful with this: it finds the session by Bili, returns the session information, and is able to respond correctly.

I guess it works. Is it the most efficient way to do this? Probably not. Just as an example: say you want to search for something like "movies with animal superheroes". Do you really want your agent to grep for a list of all possible animals? Will that find all the superhero movies with animal superheroes? Probably not. So it works; whether it's the best, I'll let you decide.

At the moment there are many semantic search alternatives to grep. There's one by LlamaIndex called semtools, there's a really cool one by LightOn called colgrep, based on multi-vector embeddings, and there's one by Jina called Jina grep. Today I'm going to show you how easy it is to use one of these together with your agent. All you have to do is install the Jina CLI, and then tell your agent that it now has access to this CLI. This is the exact same system prompt as earlier, with the difference that I now explain that Jina grep exists and how it should be used, with some examples. A disclaimer: Jina grep has many different modes; you can use it for classification, you can use it for re-ranking. Today I'm just showing you how to use it for semantic search. At the end, I also explain to the agent when it should use grep and when it should use Jina grep, so it knows: for exact matches you probably still want plain grep, and for more semantic or fuzzy queries, use Jina grep.

Then I plug this into my agent again, and when I run the exact same semantic search query we had earlier, "which sessions discuss handling regulatory constraints?", you can see the same behavior: it calls the terminal tool, first explores the folder structure, and then, on the first try, it's able to correctly use Jina grep. It searches for "regulatory constraints" and, boom, it finds the session by Bili on the first try. It finds a few others too, because it returns a top k of ten, and then it's able to answer me correctly. Nice. The instruction block I added to the prompt looked roughly like the sketch below.
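A sketch of teaching the agent a semantic-grep CLI via the system prompt, reusing the `SYSTEM_PROMPT` from the earlier sketch. The `semantic-grep` command name and syntax below are placeholders, not the real Jina CLI; substitute the actual invocation from the docs of whichever tool you install (semtools, colgrep, or Jina's).

```python
# PLACEHOLDER command name and syntax below; swap in the real CLI's
# invocation from its documentation.
SEMANTIC_GREP_PROMPT = SYSTEM_PROMPT + """
You also have a semantic search CLI available in the shell:

    semantic-grep "<natural language query>" <path>    # placeholder syntax

Routing rules:
- Use plain grep for exact keywords, names, and identifiers.
- Use the semantic CLI for fuzzy, topical, or synonym-heavy queries
  (e.g. "regulatory constraints"), instead of chaining grep synonyms.
"""
```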
Any questions so far? Good, then I'm switching back. We were looking at a bunch of different tools today, and we saw how big the tool landscape is; I showed you a few of the search tools we have. Now, some practical recommendations on when you should actually use what. Maybe let's start with this: if you're looking for just one silver-bullet tool, that's probably not the right way to go. Again, doing good search is incredibly difficult. Ideally you want to curate the right set of search tools for your agent's search behaviors, and you want a combination of specialized tools and general-purpose tools.

Specialized tools are ones the agent can use out of the box: something with very simple parameters, something where you don't need a very powerful LLM, where you know the agent isn't going to make a lot of mistakes. At Elastic, we like to think of this as having a low floor; it's a concept from user experience. The agent can just use the tool without making many mistakes, and it's also efficient, so it doesn't have to run the tool multiple times. Think of the semantic search tool we had earlier, or, if you need to look up customers by ID a lot of the time, a specialized tool for that exact operation would be helpful.

But you also want to give the agent a high ceiling. That means for unexpected queries and complex questions, you want the agent to still be able to handle them, rather than being stuck with limited, specialized tools and saying "I cannot solve this". For that, something like a shell tool, or the very general-purpose query execution tool we had earlier, is very helpful. The trade-off with the query execution tool or the shell tool is that, since they're so general purpose, the agent sometimes needs more iterations to get to the right answer. So that's my practical recommendation: have a balanced set of search tools, with a low floor and a high ceiling.

This is all nice when you already know your agent's behavior. If you don't know your agent's query behavior yet, I would recommend starting with a general-purpose tool, then logging your agent's behavior (generally speaking, logging your agent's behavior is recommended anyway). If you notice your agent is taking four or five tool calls per question, that's too many tool calls, and probably an indicator that the tool your agent has is too difficult for it to use. Then look at what the agent is actually trying to solve, and maybe scope out something more specialized for that case. The same goes if you notice specific query behaviors. This is what I personally did with my test OpenClaw instance: it has the exec tool, I started logging its behavior, and since I was obviously playing around with databases, after three days I asked it what interesting patterns it saw. It actually recommended that I implement some specific search tools for interacting with the database, because out of the box it was only using the exec tool.

All right: start with general-purpose tools if you don't know your users' behavior yet, log the breakages, and add purpose-built interfaces. Yeah, that was a lot to take in. I'm sure you have lots of questions, so I'm opening it up for Q&A. And otherwise, on your way out, don't forget to grab yourself some stickers.
And then thank you for joining my session. I think there's a mic coming. Thank you.

>> So, would you say the tool stack you need also mostly depends on the model you're willing to use? If you were using a very good model, it might be fine using the shell tool or the search tool, but a very small model and a light agent might need more specialized tools.
>> Yeah, actually, in our internal testing we noticed that a more powerful model reduces the error rate for the parameters. I don't know the numbers exactly, but it was a very big amount. So having a stronger model definitely helps for the general-purpose tools, but I think you can't expect that just because you have a very strong model there are going to be no errors, if that makes sense. Yeah.
>> Thanks for a nice talk. I have one question, maybe slightly off topic. We're talking about agentic RAG, but it comes with the drawback of higher latency compared to typical RAG. Would you recommend having a second pathway of simple RAG for fast answers? And how would you guide the agent to actually choose the right one? Because I think it's hard to say which questions should be answered by an agent and which by simple RAG.
>> That's a good question. I don't think I have a good answer off the top of my head. Maybe something related: I was asked recently, if you have a RAG system, should you replace it with agentic RAG? And I guess that goes in the same direction of when you actually need agentic RAG. I know RAG has been declared dead many times, but I think the reality is that RAG is still very effective for many use cases. How you would actually switch between RAG and agentic RAG, I'm not sure, because I assume it again needs some kind of almost agentic routing logic to switch between them. So I'm not sure, I'm sorry.
>> In such cases, like when you have the wildcard in the GEPA example, why can't we just have a hybrid tool that searches for and replaces common wrong wildcard symbols coming from SQL with the correct ones, for instance?
>> I'm not sure I understand your question correctly.
>> I mean, there are cases in which the agent doesn't know how to write the correct query, because maybe it thinks the placeholders coming from SQL apply to ES|QL. So why don't we have a hybrid tool that deterministically searches for and replaces the wrong placeholder, the percentage symbol, with the asterisk?
>> Yeah, actually, the agent skill I showed you wasn't necessarily the only solution for this. You can also just add some very simple instructions, like "for ES|QL, don't use the percentage sign as a wildcard character". That actually works; I tried it when I was building the demo. But then when the agent runs into the next issue, you start adding the next piece of documentation, and then you can kind of end up writing the entire ES|QL documentation from scratch into your system prompt. Yes, for demo purposes it would have worked, but it's probably not how you would do it when you're building something more robust, because if you just add little band-aids every time you run into an error, what happens when you run into the next edge case? Does that make sense? Yeah.
>> Hi, thank you for the really wonderful presentation. I have a question. In the demo we walked through agentic search with a DB query tool and also one with a shell tool. Would you recommend, in practical use, combining both tools, validating the result from each tool, and then weighing the confidence of the results from the LLM? Would you recommend doing this in practice?
>> Yes, that's a great question. Again, I'm kind of cheating in this demo, because I'm only showing you one tool per demo. In reality, you would have a bunch of different tools, where the agent then has to decide which tool to use. There was a very interesting blog post by Vercel, I believe; if you want to look it up, I think the title is something like "testing if bash is all you need". They benchmarked an agent with a bash tool, an agent with just file search tools, I believe, and an agent with database tools. And in the end they also had one agent with both a bash tool and the database tool. What they noticed was super interesting: for a specific set of queries, analytical queries, the database tool was more effective, but on the other hand the file search tool was very effective, as you saw, for just quickly finding things. The really interesting part was that the hybrid agent, with the bash tool and the database tool, actually achieved the highest accuracy, because, I believe, it first used the database tool and then verified the results with the shell tool, and that led the agent to better accuracy. So I think that was a very interesting behavior to see in agents.
>> Thanks for sharing. A second question: if we use the semantic search tool, in practice you'd probably use some kind of threshold to cut off the results, so you don't get anything back when there's no answer. But in the agentic regime, would you put in a conservative threshold so we don't confuse our agent, or would you say the agent is smart enough that even if we retrieve results that aren't really relevant, it will notice?
>> Yeah, that's a great question, actually. In the examples you probably saw some of this: I think in the last Jina grep example, it was returning the top-k results where only the first one was actually relevant. Because the agent does a little bit of reasoning over whether the search results are relevant to the search query, I think agents today are much better at weeding out what's not relevant. But you do run into a risk with longer-running conversations, where those search results sit in your context window long term and could end up confusing your agent. So I think it depends on your use case and on how your agent handles irrelevant search results. Generally speaking, given a set of search results, it can filter out what's irrelevant.
>> Thank you for the talk, amazing. Are you utilizing subagents for these search queries?
Because if we let them decide whether something is relevant to a question, like it's done in the RAGAS framework for evaluation, for example, subagents would help a lot. Do you have any experience with them?
>> Unfortunately not; I haven't played around with subagents yet. I can only tell you that, for example, I believe Claude Code uses subagents for specific search tasks. I think there was a blog post on how they use a subagent to answer specific questions about Claude Code itself, because those are kind of niche questions a user would ask, so in this case they outsourced that expertise to a subagent. Having a subagent for specific niche questions would be interesting, I think, but I don't have much experience there.
>> Okay, thanks. Can I ask another question?
>> Sure.
>> Damn, I forgot, sorry, I'll try to catch up later. It's kind of off topic, but you talked about skills, and the big benefit of skills is that you just have the description in the system prompt, and whenever needed you load the full skill. Do you have any recommendation on when and how to clear the context again? Because we want to keep the context window small, and over a long session we might end up with up to ten full skills in the context.
>> I'm not sure. Joe, do you have a better answer, or an idea?
>> So what we're doing behind the scenes is providing that kind of progressive disclosure of skills: we provide the skill names and descriptions, plus their location within the file store, and then we load a skill from the file store into the context window when we need it, and offload it again once the context window has progressed past it. So it's more on demand. And it's the same with our compaction of context. That's what I would advise for some of these questions: try to use the file store as much as you can, and have tools for it as well, like being able to grep the file store when you want to see previous tool results, and work from that.
>> Thanks to my colleague Joe, from Elastic as well. Awesome. If there are no more questions, I'll let you go into the coffee break. Again, don't forget to grab yourself some stickers, and I'm happy to catch up in the halls if anyone's interested. Thanks so much.