Demand-Driven Context: A Methodology for Coherent Knowledge Bases Through Agent Failure
Channel: aiDotEngineer
Published at: 2026-05-05
YouTube video id: _QAVExf_1uw
Source: https://www.youtube.com/watch?v=_QAVExf_1uw
Thank you. Maybe we can get started. First of all, thank you so much for coming to the workshop, especially those of you who didn't get a seat. I promise I'll do my best to make it entertaining, especially for you who are standing. Thank you so much. >> Volunteers, right? Yeah. Actually, it makes sense. Now I know why the tickets sold out, and which workshop sold them out. Let's start with my introduction. I'm Raj. I work as a staff software engineer at IKEA, in a domain called delivery and services. We're more than 100 engineers across six product teams, like a mini company within the company. I'm very interested in architecture, neuroscience, and linguistics, and now AI. Everyone is building cool projects these days, so if anyone has one, please find me after this session. Quick pulse check with the audience: who is visiting London for the first time? Okay, cool. Welcome to London. Who is here from an engineering background, comfortable with live coding and prototyping? Okay. Who actively uses agents like Copilot, or agent extensions? Okay, so everybody is a pro. Fine, not as many as I'd hoped. So much tension now. You're going to sit in this hot room for more than an hour, so first let me introduce what I'm presenting today. It's about agents and context management, divided into three parts. One is the situation, which all of you already know, so I'll keep it tight, about 5 minutes. Then I'll talk about the problem. This is where I'll spend a bit more time on the slides, because I think nobody is seriously looking into this problem, and I want to bring it up. Then fewer slides and more hands-on: how demand-driven context actually works. All good? Okay. Let's start with the first part. How many of you have seen this movie, Memento? Okay, cool. I'll give you the gist. This guy is very skilled, very talented. The only problem is that he can't hold a memory for more than 15 minutes. So every 15 minutes he has to take out his notebook, look at the tattoos he put on his own body, and figure out: okay, what was I doing 15 minutes ago? And he does this again and again. If you relate this to AI and AI agents, it fits exactly how agents work right now. If you watch this movie, you don't need YouTube or blogs to understand agents and MCP; the movie tells you everything, literally. And just as this guy has a memory problem, the AI we were introduced to a couple of years ago is very good at reasoning, computation, and code generation; it benchmarks above par. The only problem is institutional knowledge, the domain knowledge that you have. That's the one thing that remains problematic. From AI to agents, the evolution exploded. It started with prompt engineering, then came RAG, MCPs, then multi-agent systems, and now deep agents. I recently found out that using Replit you can build a full-stack app in 10 minutes.
That means by the time you've made your instant noodles, you already have a million-dollar app running on your laptop. So we got to this point; it's extraordinarily good. That's AI and agents. Now, let's talk about enterprise AI. I don't know how many of you have this question, but in most enterprises I see, the question is: okay, AI is pretty smart. It's doing code generation, full-stack apps, reviewing your PRs, doing incident management, all those things. If AI is doing that much, why are the Jira tickets and epics not moving on the dashboard? Why don't I see the delivery? Everybody is saying, look, in 3 minutes everything is ready. Yeah, okay, fine. Why are my Jira epics not moving? Because that defines the business delivery, and that defines the return on investment. And as you see, per McKinsey this year, 88% of all companies use AI, but they only see about 6% in value creation. So I think this is the problem we have. Here are four sample Jira tickets, different ones. The parts marked green are things the LLM is already trained on, like public standards and general knowledge. That's fine; it can pick up those tasks from the ticket and do them. Then there's a second part, the orange ones, which we have to teach it: you know this, but do it this way. All of the orange items fit into your agent extensions, like skills. But the red ones are the institutional knowledge, which sits within the company and within the people. If the agent picks up a ticket, it has to fulfill all of them. It is so good with the green and orange ones, but it struggles with the red ones, the institutional knowledge. And I believe coding agents are getting so much better that if AGI comes, the first AGI will be a coding agent, for sure. To fix this, to give institutional knowledge to agents, we already have an industry solution. This is what your return-on-investment pipeline for AI looks like: you have LLM model quality, you have agents, and you have the agent harness. Your institutional knowledge sits in Confluence, Jira, SharePoint, GitHub, all those places. And the retrieval layer is what the industry tells us will fix the issue: you build a retrieval layer, it fetches all those things and hands them to the agent, and the agent should be able to do the work, right? Around 40% in factual accuracy can be achieved through RAG or knowledge graphs, but only with a documented knowledge base. So if you build a retrieval layer, it has to work, right? Now let me ask you a question: how many of you have built retrieval-layer things like RAG and MCPs? Okay, cool, all of you. Now, how many did you build? How many MCPs? Has anyone built more than 20 MCPs? Okay, so nobody beats my record. What I mostly see in enterprise organizations is people building 10 to 20 MCP servers, or RAG and knowledge graphs, on top of their institutional knowledge and wiring them to the agent.
So the assumption is: if I can build all those MCP servers and give them to the agent, I don't need to do anything; it will just work. But when you plug in these MCP servers, the data coming out is mostly non-deterministic. It's unreliable, and it's untested. Especially in engineering, nobody does evals; it's seen as more of a data or machine-learning concept, so we don't do evals. When we plug in an MCP server or a RAG, we check whether output is coming at all, rather than whether it's really valuable, whether it's really solving the problem. That's the main problem I see. And I'm not pointing fingers, because I was that person. I thought: let me build all the MCP servers, plug in my institutional knowledge, and prove the point that agents can semi-autonomously pick up those Jira tickets and finish them. But every time I built those MCP servers, 10%, 20%, 30% of the time the result was accurate, and the rest of the time I was doing data-entry work for them: filling the gaps, answering the questions. So I was doing more work, not less. I think this is the main problem. I actually reached a fourth stage where I literally started writing the domain context by hand: okay, let me write everything down and prove the point. But I got really exhausted doing it. I don't know how many of you can relate to this pie chart, but most enterprise institutional knowledge looks something like this: 20% is outdated, 20% is unreliable, 10% is duplicated across different places. And the major problem is that 40% of the knowledge is tribal knowledge: people know how things work, but it's never documented. In that enterprise situation, you can build 100 MCP servers and plug them into that monolith; it doesn't matter how many you build, it won't work, because your whole institutional knowledge is a monolith. You're all from engineering backgrounds, so you already know the transformation of monolithic legacy systems into microservices. In the same way, unless we break that knowledge monolith down into context blocks that are useful for agents, we can't actually make it useful for them, and we can't make them do tasks semi-autonomously. So that's what this workshop is mostly about: that monolith, how to break it, what the approach is, and how useful it is once it's broken. And this is a job we have to do ourselves, because the LLM providers will focus on model quality, the agent vendors will focus on the harness, and there's a big retrieval market, around 9 billion dollars, focused on retrieval. But nobody is going to come to your company and fix your knowledge base. You have to fix it yourself. So, how can we do it? Demand-driven context is the solution I'm proposing. To give an abstract of it: we had monolith services and a process for breaking them into microservices; we had the waterfall model, which we transformed into agile. In the same way, when you have a monolith of institutional knowledge, how do you transform it into context blocks using a defined approach?
Here is that approach. Before we start: this is not just an idea. We tried it with some datasets to show that the approach works, and in March we published a preprint on arXiv. If anyone is interested in reading academic papers, you can find it under "demand-driven context," or I can give you the link after the workshop. Okay, so how does it work? When we give institutional knowledge to agents today, we use a push strategy: we build everything and push it to the agent. This approach is more of a pull approach. For example, say a new joiner arrives at your company. How do you onboard the person? You onboard them for a day or two, give some initial orientation, and point them at the Confluence links, the GitHub, the documentation to follow. Then you just assign them a task. You don't say: go get graduated on all this knowledge, come back, and then I'll give you work. You assign a work item, and the person starts asking questions and filling the gaps. If the person is diligent about documentation, they'll also fill in the documentation for you, and they gradually build up their institutional knowledge. In the same way, we don't push all the knowledge to the agent; instead, we give the agent problems, work items, and let it pull the information from us. And once it pulls the information, we also ask it to document it, in a better form than the monolithic structure. So those are the four layers: you have a monolith, a framework that pulls, and it creates good, well-curated context blocks. You can relate it directly to moving from a legacy monolith to microservices, if you want an analogy. This is how one cycle works: you give a problem to the agent, and on the first attempt the agent fails. It says: you gave me a problem, but in most of the documentation I couldn't find anything; I couldn't do it. Here are the things I need in order to finish this task, and it gives a checklist. We fulfill the checklist. Once that's given and the problem is solved, it takes that knowledge and updates it, meaning it curates the knowledge in a particular place so it can be reused, by itself or by others. That's one cycle. The idea is that if we do this over multiple sessions with multiple problems, it gradually curates your monolithic knowledge base and documents it for you. You can also relate it to TDD. How many of you do TDD? Nobody hates TDD, right? In a TDD approach, we write the failing test cases first; we don't build the product first. We see what code is missing for the failing test to pass, we supply that code, and we gradually build the product based on the failing test cases. In the same way, we give the agent problems it will definitely fail at, we gradually fill those gaps, and at a certain point it becomes semi-autonomous with a good institutional knowledge base. In code, one cycle looks roughly like the sketch below.
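A minimal sketch of one demand-driven cycle, assuming a hypothetical `agent` client with `attempt` and `curate` methods; none of these names come from the speaker's actual implementation, this is just the fail, demand, fill, curate loop spelled out.

```python
from dataclasses import dataclass, field


@dataclass
class AttemptResult:
    solved: bool
    gaps: list[str] = field(default_factory=list)  # checklist of missing knowledge


def demand_driven_cycle(agent, task, knowledge_base: dict, max_rounds: int = 3) -> bool:
    """Run one work item through the fail -> demand -> fill -> curate loop."""
    for _ in range(max_rounds):
        result: AttemptResult = agent.attempt(task, knowledge_base)
        if result.solved:
            agent.curate(knowledge_base)  # persist the newly surfaced knowledge
            return True
        # The agent failed and returned a checklist of what it still needs.
        for gap in result.gaps:
            # A domain expert fills each gap - this is the "pull" step.
            knowledge_base[gap] = input(f"Agent needs: {gap}\nYour answer: ")
    return False
```

Repeating this over many work items is what gradually turns the monolith into curated context blocks.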
Okay, I think I can jump into a demo. I'll use the terminal, so don't hate me; you're all from engineering backgrounds, so I think you'll like the terminal. Let me switch over. Okay. On the far left, what you see is how it works under the hood: when you've given it a problem, how the agent fails, how it demands the knowledge needed to solve the problem, how a human, a domain expert, fills those gaps, how it then curates a new, much better knowledge base for you, and how the agent then succeeds so you can repeat on the next problem. That's one cycle. How can it be implemented? With any agent. It works on Claude Code or Copilot, because it's an approach; you can do it any way you want. At work I use Copilot, so I implemented it with Copilot, but since I believe everybody loves Claude Code more, I created this demo with Claude Code. You can see it's just a combination of skills, rules, agents, and hooks, plus a place to save the knowledge base. In the middle pane, at the top, is your monolith. It's a stand-in for your Confluence, Slack, GitHub, and so on; for the sake of the demo I just put in some flat files that look like them. That's how your monolithic knowledge base looks. At the bottom, you see live how it adds new knowledge while it's solving a problem. So let me show you. I'm going to go to the agent and give it an incident problem to do root-cause analysis on. Remember the sample Jira tickets from the earlier slide? They were a mix of documented and completely undocumented knowledge. This incident represents the same kind of mix: some knowledge is already in your monolith, some is missing or outdated, and most of it the agent won't find at all, because it was never written down. When I give it this problem, it uses the skills I developed with this approach. First it goes to your monolith, the knowledge base, and tries to find what's there. Think about it like this: the first part is retrieval, which means it's already doing what RAG and MCP do. But what does it do with the data after it fetches it? That's the missing part. When you give Confluence links to a new employee, the employee goes there, looks, and doesn't find the information, but doesn't stop there; they keep asking questions to solve the problem, and then they add the new knowledge. Those next steps are what's missing today; we just stop at retrieval. So this does the next three steps. You can see the confidence score, on a one-to-five scale, is very low, because it says: these are the particular terminologies I don't understand, and this is business logic I'm missing.
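A hypothetical shape for that failure report, the output of the agent's first attempt: a confidence score plus the things it could not find in the monolith. This mirrors what the demo displays, not an actual schema from the project.

```python
# Illustrative only: field names and values are assumptions, not the demo's format.
failure_report = {
    "task": "Root-cause analysis for the notification incident",
    "confidence": 1.5,                         # 1-5 scale; low, mostly tribal knowledge
    "retrieved": ["runbook-notifications.md"],  # what plain retrieval did find
    "undocumented": [                          # surfaced gaps: never written down
        "meaning of the internal terminology used in the incident",
        "business logic behind the retry policy",
    ],
    "checklist": [                             # what a domain expert must supply
        "define the unknown terms",
        "describe the undocumented business rules",
    ],
}
```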
One thing to look at here: whatever it has listed is the undocumented information, things that were never written down. Unless you work this way, you will never know what is not documented. If somebody says, "there's documentation missing, we need to write it," the answer is: okay, what exactly do you want me to write? There is so much in people's heads; I can't write it all down. Somehow it has to surface. When you give the agent a problem, it surfaces what is not documented and tells me: this is missing, I need new information here. So it does all three steps, and then I give it a pre-prepared answer, a very high-level one, covering the missing information: okay, here is the missing information you asked for; can you solve the problem now? I didn't expect this one. Notifications: yes, I'll just say yes. Now it's asking what fictional name it should respond with. Okay, I didn't see that; when I tested it, it didn't ask me these questions. It knows it's a demo; I trained it to recognize that. Now you can see, live, that it's already adding the entities; the new knowledge base is coming into place. >> So the knowledge base is managed as a file system? For the demo, I'm showing it as a file system, but your data can stay in Confluence, Slack, and so on; you can plug in the same MCP servers or RAG. It doesn't need to be flat files. >> Treat this as your system slide for this agent, right? It's like a map. Do you use any memory tool for this? I'll show on a later slide how I save it. Okay. So it started from around 56 entities. One problem surfaced six entities that had never been documented, and when I gave it that information, it was able to discover and curate another five or six new entities that were never documented. So it discovers the gaps, gets the information from me, and stores the new information. That's one cycle. Next, let's see; this is a busy window; I tried to tidy it, but it didn't work out. What you're seeing here is 14 incidents. You've seen one problem that I solved with the agent. What if I take 14 incidents and run 14 cycles of this? On the left is the first incident: right now it has a confidence of 1.5, and everything is critical; basically nothing is documented, so everything is critical or high and the data is missing. I started giving answers on the first incident, then repeated for the second and third, continuously, through all 14. By the 14th incident, it reached a confidence level of 4.4, because at every incident it discovered gaps, got the answers from me, and documented everything. So it gradually improved from about 1.5 to nearly the top of the five-point range.
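One plausible way a curated context block could be stored after a cycle: a small, agent-readable entity record rather than a page in the monolith. The field names are illustrative assumptions, not the demo's actual format.

```python
# One curated entity as it might land in the knowledge base after a cycle.
context_block = {
    "entity": "customer-notification-service",
    "kind": "system",
    "status": "active",                # vs. "stale" for documentation not to be trusted
    "last_validated": "2026-05-05",
    "source": "incident-0007",         # the work item that surfaced this knowledge
    "summary": "Sends SMS/email updates; depends on the delivery-events API.",
}
```

Small records like this are what let the entity count and the confidence score climb cycle by cycle.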
In the traditional way, we solve the whole context problem first, and only then hand things to the agent. In this approach, we move the agent from a consumer to a knowledge manager: don't just consume from me; the whole knowledge management is also your job, and you do it for me. Okay, let's get back to the slides for a bit. What we've seen is one cycle, and what it looks like when I run 14 or 15 different cycles. But doing this manually would be really painful; I tried it, and after 15 cycles, nobody wants to sit with an agent during an incident, answering its questions and explaining your problems. That is super painful. The good news is that we can automate the process, and this is where it gets really good and interesting. Here's the thing: you already have all the work items. You have Jira, incidents, customer support tickets, all sitting in the archive. So why not take them, use the framework to validate them against your knowledge base, run it as an automation, and see what the actual state of your knowledge is right now? Let me show you how that looks: the same approach, but at scale rather than manually. In this demo, everything is preset. I have a platform-operations agent, and I say: here are the recent incidents, say 20 recent past incidents, as Markdown or JSON files with descriptions, comments, everything. The rest of the files are your knowledge base. It's a file system here, but you could connect Confluence and the rest in the same way; the flat files are just for the demo. Now I take all these incidents, validate each incident against my knowledge base, and ask the agent: tell me how much of the documentation is good, how much I can't trust because it's old or outdated, and how much is missing entirely, never documented, as far as this incident is concerned. Let me run it. It takes some time, in three steps: first it generates probes, which are basic tests it writes to test your knowledge; then it runs those tests; then it analyzes the gaps. It's a little hot in this room, actually. While it identifies the gaps, here's an example. Say you have an incident: the notification service is not sending customer messages over SMS. When the notification service is mentioned, the agent checks: is there documentation about the notification service? It doesn't find any, which means you never wrote documentation for it. It understands what customer SMS is, that's general knowledge, but the customer notification service is a gap, because it was never documented. Or it takes the customer notification service, goes to Confluence, and checks how old the documentation is. If it's a year old, it tells you: look, I checked, it's a year old, I don't know whether to trust this documentation or not. Or the documentation is incomplete. A minimal version of that probe loop could look like the sketch below.
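A minimal sketch of the automated scan over archived work items, following the three steps just described: generate probes, run them against the knowledge base, collect the gaps. The two callables stand in for agent calls; everything here is an assumed shape, not the speaker's scanner.

```python
def scan(incidents, extract_probes, answer_from_kb):
    """Surface knowledge gaps from archived work items.

    extract_probes(incident) -> list[str] of facts the incident needs;
    answer_from_kb(probe)    -> str, or None if the knowledge base can't answer.
    """
    gaps = []
    for incident in incidents:
        # Step 1: demand extraction - a checklist of knowledge this item needs.
        for probe in extract_probes(incident):
            # Step 2: run the probe against the knowledge base alone.
            if answer_from_kb(probe) is None:
                # Step 3: anything unanswerable is a surfaced, undocumented gap.
                gaps.append({"incident": incident, "missing": probe})
    return gaps
```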
So for each incident, it looks at the entire knowledge base you've connected and produces a consolidated scoring: for example, agents can only partially handle the basic cases of the incidents you give, because your knowledge base is incomplete. It shows how much tribal knowledge is missing, what system information and business processes are missing, exactly what your institutional knowledge lacks, whatever was never documented. Those are the probes. It also classifies what is critical and what is high priority. This is really important: say the notification service from my example keeps appearing across 20 incidents and you have no documentation for it; that's the first thing you need to fix. So when you're breaking down your knowledge base, it helps you understand what is critical, what to focus on first, what creates value; it organizes everything into critical, high, and medium. I showed flat files, but you can connect the various data sources you have. Step one is demand extraction: for every incident, it extracts a checklist of the missing information. Step two consolidates everything that's missing: it maps out systems and APIs and classifies how many are clean, how many are stale, which are incomplete, what is entirely missing, and what is tribal. And it creates a Kanban board for you. So if you want to fix your institutional knowledge base, you finish these items just like Jira tickets: document the missing pieces. It also saves this into its own context, so it builds its own knowledge base, and you can see the performance improve as you fix the tickets on the institutional-knowledge Kanban. A rough sketch of that consolidation step follows below.
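A sketch of step two, consolidating per-incident gaps and ranking them: a gap that recurs across many incidents (like the notification service example) surfaces first. Counting and sorting is the whole trick; the severity thresholds are invented, and the labels merely echo the demo's critical/high/medium board.

```python
from collections import Counter


def consolidate(gaps):
    """gaps: output of the scan above; returns a prioritized gap board."""
    counts = Counter(g["missing"] for g in gaps)
    board = []
    for item, n in counts.most_common():
        # Thresholds are assumptions; tune them to your incident volume.
        severity = "critical" if n >= 5 else "high" if n >= 2 else "medium"
        board.append({"gap": item, "occurrences": n, "severity": severity})
    return board  # ready to publish as Kanban tickets for domain experts
```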
So what we've seen is, first, the approach itself: a pull approach rather than a push approach. Then what one cycle and multiple cycles look like. And then, if you put it at the scale of automation, how much more valuable it becomes. Okay. Now, the important question. All along I've been saying the agent receives the context, we give it information, and it stores it. But where does it actually store it? I have a very opinionated opinion, hear me out: I prefer that it goes into a GitHub repository. Eventually somebody will come up with a 20-million-dollar, seed-funded SaaS solution for this, but until then, I prefer GitHub. Why? Because at scale there will be multiple agents and multiple teams contributing to the same knowledge base, and there will be conflicts and resolutions. The easiest way to handle that is GitHub, because it comes with built-in PR and review processes. If multiple domain experts are uploading files, or agents are contributing, the most efficient way to manage it is a GitHub repo with a structure something like this. The other advantage is that from GitHub you can publish to Confluence later, or Slack, or wherever you want. So I prefer GitHub, but if you want to integrate directly with Confluence, you can do that too. Next is the meta model. How many of you are aware of the term? Maybe I can quickly show you what it looks like. A meta model captures how your domain is structured: how a business process relates to a system, how systems relate to APIs, and which business or tech jargon is linked to which of them. This kind of relationship model is really important. It's not necessary for the approach I've proposed; it's an add-on. Why you'd want it: think of it as a map. Right now, your agents don't have any map to navigate your knowledge base. You're dumping this many files on them, and they need to figure out which file they need. But if your file structure is a representation of your meta model, the agent knows how to navigate. For example, ask it to fix a system, and it understands: if I make changes here, these business processes will be affected, and these are the APIs I need to change or touch. So it's important to have a meta model; if you have one, the approach produces more value. I strongly prefer having a meta model along with this approach.
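A minimal meta-model sketch: business processes, systems, APIs, and jargon linked so the agent has a map. Plain dicts here for readability; in practice it could be YAML in the repo or a graph store. All entries are invented examples.

```python
# Hypothetical meta model: every name below is an invented example.
meta_model = {
    "business_processes": {
        "customer-delivery-updates": {"runs_on": ["notification-service"]},
    },
    "systems": {
        "notification-service": {
            "exposes": ["POST /v1/notifications"],
            "depends_on": ["delivery-events"],
        },
    },
    "jargon": {
        "DDC": "demand-driven context",  # term -> its definition / home
    },
}


def affected_processes(system: str) -> list[str]:
    # Navigate the map: changing `system` touches which business processes?
    return [name for name, proc in meta_model["business_processes"].items()
            if system in proc["runs_on"]]
```

With a map like this, "fix this system" lets the agent enumerate the affected processes and APIs instead of guessing from file names.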
Okay. The last part: what value does it create? You've seen a lot of slides and demos, so let me share the value I saw when using it, and the feedback from the people I've already shared it with. First, the most valuable thing is knowing the unknown. What was never documented can only be surfaced by an approach like this; otherwise, you end up with an endless Miro board of tickets saying "this is missing, this is missing, I need to add it." This is the fastest, best way to discover, from your previous work items, what was never documented. Second, I can now give the work to agents rather than doing everything myself. Instead of me feeding the agents all this information, I let the agent manage my knowledge management; I don't want to be the knowledge manager. Those are the two big values I've seen, and if you use it, I think you'll find the same two most valuable; the slide lists the other things I noticed. Now I also need to tell you the drawbacks of using it. First, if you're on a small team, or if you say, "no, my documentation, my knowledge base is really good," then I'm super happy for you; you're among the lucky ones in this world of agents, and this might not be relevant for you unless your documentation is very complicated. Second, as I already mentioned, doing it manually is very painful; I don't recommend anyone do it by hand. Try it manually for testing purposes if you like, but automation is by far the best way to use this. Third, this approach is very early. By tomorrow morning, somebody will have posted something on YouTube that does it differently, better than me. In the era of AI, nobody knows how long a thesis, an approach, or a product will survive. For now, I see this as the best approach. So, the whole workshop: we started with one pipeline, the ROI pipeline. Demand-driven context sits between the monolith and the retrieval layer, and it helps you build curated context blocks. You can think of it like a cache: your agent doesn't need to boil the ocean every time it fixes an issue. If you have a good context block, it's usable maybe 80% of the time, because I believe the 80/20 rule applies here too: 20% of your documentation is the most useful, and 80% covers corner cases. So rather than handing over 100% of everything, figure out the 20% that is super helpful for the agent, keep it as a cache, the context block, and leave the rest as links. Whenever the agent feels it needs more information, only then does it go and check the whole monolithic institutional knowledge. From this workshop, you can take away three things. One, I hope I made sense of the approach. Two, there's a GitHub repo where I've detailed it, with a starter guide, if you want to go home, try it, and remix the whole approach; do it and let me know, and I'll join you for contributions. Three, there's the context gap scanner I showed you, which is live with presets. I think I loaded about $20 of credit onto it, so hit it as much as you can: first come, first served. And because this is a workshop, I'd like you to try something, one of three things. If you say, "I'm so tired already, it's almost four, I'm about to head to a party, I don't want to work," just go to the context gap scanner; everything is preset, so try it, see how it works, and if you think it can be done better, let me know so we can work on it. Or, if you're feeling technical and want to know how it works under the hood, there's the GitHub repository with all the information, plus a guide if you want to try it out. Or, if you want something even simpler: take this prompt, take one of your current Jira tickets or incidents, and, if you've already built MCP servers or anything similar, give the prompt to your agent along with the ticket and ask: give me the quality of my knowledge base as measured against this work item, and see how much of it comes up red, meaning never documented. I'll leave it up like this; maybe take a picture.
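One possible wording of that "simple version" prompt, reconstructed from the description above as a Python template; this is an assumption about the gist, not the exact text on the slide.

```python
# Hypothetical prompt template; fill `ticket` with a real work item via .format().
PROMPT = """Here is a work item (incident / Jira ticket): {ticket}
Using only my connected knowledge base, classify the knowledge this item needs as:
GREEN  - general knowledge you already have,
ORANGE - documented, but I must point you to it,
RED    - never documented anywhere.
List every RED item explicitly, and give an overall 1-5 confidence score
for the knowledge base with respect to this work item."""
```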
Maybe I can switch back to the slide. Cool. >> This is your slide, you can go. Yeah, I see some problems in this approach, but I find it very interesting. My first question: have you already used this way of working at scale? Because we've mostly seen toy examples. Yep. I've used it, but not at scale. I started small, because you need to see what the right scope is. If I try it at the enterprise level, I can't; there are multiple domains. I tried it at the domain level, and even there, there's so much domain expertise I'd need to fill in. So I cut the scope down further: take the smallest team I have, that team's Jira tickets, that team's incidents, that team's Confluence pages. When I narrow the scope like that, it's faster and more useful. At a bigger scope, no one person has the whole domain expertise, so five or six people have to sit down and do this together. >> I'm a bit concerned that this might denial-of-service your team members in a certain way, because our LLMs are fine-tuned to keep eliciting information, to keep asking follow-ups. I think it will be hard on the engineers who have to do the question-answering. And secondly, the scanner is nice, but it's built on the assumption that your team members and the rest of the enterprise are using your enterprise IT as planned, actually filling in their tickets with all the details, and I know from practice that most of the time that's not the case. That is true. I agree with you. My assumption is that even if I go to leadership for buy-in, "can you give me bandwidth, I need these people to sit and fix the context," nobody will agree right now. But I think it will happen, because we're slowly moving toward being agent managers, where agents become semi-autonomous or autonomous and we manage them. At a certain point, somebody has to fix that knowledge, because it's not going to come from anywhere else; you have to do it. Then the enterprise focus will shift toward this gap. That's what I said at the start: I don't think anybody is looking at this problem yet.
Everybody is focused on how good the agent is and how good the retrieval is, but nobody is solving how good the context is. I think within a year or so, people will realize its importance, and that Kanban board will come into reality very soon. Yeah, thanks. >> I think we're on the same page. When we look at large codebases, the source of truth is not actually the documentation, it's the code. Have you applied this to a code base? I did. I also applied it to a code base, but I got mixed results. Here's the thing: when I use only the code base, it's particularly good, and when I use only Confluence or textual data, it also gives good results. But when I combine them, they conflict: it builds a theory from the GitHub repository, but the same repository is also documented on Confluence, and then it hits a conflict. What is the source of truth? The code says one thing; should I implement it the way the documentation says? So I had to create an additional skill or rule about ranking: if you see it in GitHub, that's the source of truth; if you don't see it there, look for the information in Confluence. I'm still fixing those things, finding the gaps and closing them, but I definitely see that issue when combining the two. >> The second question: interestingly, we're applying the same approach to skills. What we found is that the process starts by running agents that bring in context and identify the right skills to use. Then you go do the task and you fail. Once you fail, you identify what you need to solve, you go back, you curate, you fix, but you're fixing the knowledge base. What we found is that we go back and fix the skill. That's the increment for the next iteration. So I'm wondering whether the skill could also be part of the iteration loop. I think the skill I've built right now is static, but what you're proposing, if I understand correctly, is an evolving skill: if the skill fails, it has to evolve. I agree with you. I never tried it, but I think it should work like that. I'm concentrating more on how to do this at scale, because I want the context fixed before retrieval, not during operation. When I started, I did it operationally: I have a work item, I assign it, it fails, then I start giving context. But that takes a lot of time and a lot of patience. So instead: fix the context before retrieval. As I said while answering the earlier question, if you take a single team, the context you need to fix is small. Use something like the context gap scanner, and if you have a good domain expert, I think in a couple of weeks you can fix your documentation.
Not 100%, but at least 60, 70, 80% of good quality you can already build. So my proposal is always: don't do it at the operational, real-time level; do it before retrieval. That works much better with this approach. Yep. >> I have another question. If you have a lot of documentation sitting in a GitHub repo, you may get questions that need five or six different docs, so you go there and read all of them. That takes time for retrieval, and it also takes a lot of context. Right now, after Claude Code announced a 1-million-token context window, I don't have that problem. I measured it: on average, it's about 96K tokens per domain. I tried different domains, and consolidating everything, Confluence and the rest, comes to around 96K tokens per domain, so it easily fits in the context window. I experimented with graph RAG, putting the files in a graph and working from intent rather than loading everything, but for me, just putting the whole context into the window gives better results than doing RAG. Unless you have a very large corpus, approaching a million tokens of context to fit, then you'd need retrieval mechanisms in between; otherwise, I think it's fine. >> I have one question. I opened your paper; could you explain this graph, the comparison between techniques like domain knowledge strategy and knowledge access? Sure. This one? Okay. I also cited other papers there. Not directly related, but there's the AS paper, which does a similar thing, though AS isn't exactly about discovery and curation, if I remember correctly; maybe I need to refresh my memory. >> I mean the difference between domain knowledge and strategy knowledge. Okay, strategic knowledge. What AS and similar systems do is this: when you have a conversation with the AI, you can see in Claude Code and others that it updates its memory, the relationships with your things, and from the chat history it understands what context is most important to remember. They propose improvement based on those operational conversations with the AI. My proposal is not based on your conversation with an AI, but on your domain knowledge as documented. Yeah. >> Somebody else has a question. When you go to remote knowledge, things on Confluence and that kind of stuff, how do you ensure your agent only points to updated, relevant documentation? In your local file system you have tags like "outdated," but remotely that's harder to tag, right? Okay. When I wrote the pipeline for extracting from Confluence, it also gives you dates: last updated, who created it, analytics on the space. You can use that to set a threshold: anything older than this date, consider it potentially outdated, and let me know. Don't just mark it outdated: tell me, because sometimes a document has been stale for a long time but is still an important document. So it flags, it doesn't decide; you decide which one is stale and which one is not. A sketch of that flag-don't-decide check is below.
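A sketch of the staleness check just described: apply a date threshold to extracted page metadata and flag rather than decide. The Confluence fetch itself is abstracted away (`pages` is a list you already extracted), and the field names are assumptions.

```python
from datetime import date, timedelta


def flag_stale(pages: list[dict], max_age_days: int = 365) -> list[dict]:
    """pages: dicts with 'title' and a datetime.date 'last_updated' (assumed shape)."""
    cutoff = date.today() - timedelta(days=max_age_days)
    flagged = []
    for page in pages:
        if page["last_updated"] < cutoff:
            # Flag only - a human decides whether an old page is stale or vital.
            flagged.append({
                "page": page["title"],
                "last_updated": page["last_updated"].isoformat(),
                "state": "review-needed",
            })
    return flagged
```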
>> So you have an intermediate layer in the repo where you store "this is stale"? Okay. When it curates the context, it also records a date and the state of the document: stale, active, or clean. So it sees: this is stale, I know not to touch it, and it looks for other, newer documents instead. >> Did you think about how to manage access or permissions to this knowledge? Say you have knowledge in the company which is sensitive, and only specific people can access it. Currently, I guess you just have all the knowledge accessible. Okay. Because it's not a product or a SaaS solution, just GitHub, permissions aren't difficult to implement right now: GitHub out of the box lets me decide who gets read access, write access, who can merge, and so on. If it evolves into a product, for example the context gap scanner as a product, it's different. The reason I used presets for this workshop, rather than asking you to upload files, is that I don't want to take your IP data. So as long as it's GitHub, you don't have a problem; if it becomes a SaaS solution, access management depends on how that SaaS solution handles it. But the approach itself has nothing to do with access; how you implement access to the knowledge is up to you. >> You have a question? This is of course about documentation, but did you consider using it on central tooling that a company uses? Say you have a platform team with a CLI that different teams use, and now it's used by different agents. The agents can see that an action is available for a resource, but you don't want to do 500 calls just because you have a list of 500 resources. It would be nice if the tool could handle that. Have you given this any consideration? I think that is how it has to work in an organization: you need a central solution. But how you build that solution is up to the organization. For example, we do agile, and agile can be done with Scrum, Kanban, or Lean, and with different apps. The process is the same, but which method and which app you choose in your organization differs. In the same way, what we've discussed is the approach; if you put it into your organization, you can implement it whichever way you want. >> My point is more that with this you can identify gaps in your documentation. Could you use it to identify gaps in your tooling? Internal tools, maybe something a team is building for the rest of the company, infrastructure in general. Maybe. Yeah.
Can you give me an example of how that would look? >> Say you build some sort of abstraction on top of Kubernetes. You don't necessarily want your developers to know what to do with that, so you have a different CLI, or something like it. Maybe you thought teams would use one of your custom corporate applications one at a time, but a team has grown into using much more of it, and suddenly they have a lot and they don't want to make that many calls. Or perhaps the agent itself says: this is inefficient, I'd like this internal tool to work in a different way. In that way, it would identify a gap in the tool, or a performance improvement, the way these tasks do for documentation. It could be extended that way, actually. We've seen that it handles business processes too, and a business process is nothing but how a process actually runs in the application. So you could extend it to find gaps in the business process, in how things work; it could be an extension. Okay, yeah. Thank you. >> How would you ensure that maintenance won't kill you? Because a company's knowledge changes with time. If the answer to a question today is B, tomorrow it could be B1, and on Friday it could be C. So you have to identify B1 and B and overwrite them, and if it's stored as text, that's a problem. Okay. When I showed the context gap scanner, you also saw a duplication indicator. If today you have a document, and tomorrow you have a version 2.0 and something else, it will find that the same information lives in three different places. If you have only one document and it changed, it takes the latest updated one, because a human changed it, and treats it as the source of truth. But if you have three versions of the same document, that's duplication, and it flags it as a duplicate. >> But it's a search problem. How do you ensure performance? Say you have a document of 100,000 words and you just change one word, like a password. You'd have to find that specific word, compare it across three documents, find and replace it, and so on. How is that feasible? Which tools would you use to make sure it doesn't kill you cost-wise? I didn't quite get the question. Is it the token usage you're worried about, the cost? >> Yeah, precisely. If it grows, and there are a lot of changes, you'd have to maintain that whole database. What you presented is sort of a happy path: you have a gap, you fill it, and then you reuse it. But after a while, you'll have a bigger problem, where you pretend the gap is filled, but it doesn't contain up-to-date information; it contains wrong information. You want to prevent that. Okay. So it can flag documents by their creation or last-updated date.
You can set filters like that. >> But say you have a latest document with wrong information. You don't know that, right? No, that's true. But compare it with a human being: you tell somebody to look at the documentation, the person reads it, and implements things the way the documentation says. The person will do it, right? It's not an agent-versus-human issue. As for cost: I don't think it costs that much. As I said, when I tested it, none of the domains crossed 100K tokens. And for the context gap scanner, you don't have to run it daily; even if you did, it's roughly one scan at around 100K tokens. If all of you started hitting the context gap scanner right now, I don't think you could burn even one dollar. Oh, you already are? Okay, I'll cancel the subscription. >> But the moment you scale to a specific domain, you'd have to solve this cost question. Okay. >> Which also depends, to his point, on how fast the data changes, and I think most of the time it's not that much. So the moment you get to 80, 90%, you just continue performing the task. Yep. I see it differing from use case to use case. Any other questions? >> Yeah, I was curious: I ran the gap scanner for one of these domains, and it produced a bunch of recommendations. How do I know that that's enough? It tries to detail things out as much as possible. Right now, I haven't exposed everything it did, just for the UI, but for each ticket it writes 100 or 150 lines of Markdown and saves it somewhere, which gives you more detail if you want it. For the demo, I put nice UX on top, but the detailed information is there. Anyone else have questions? >> Can I have another one? Yeah, sure, go on. An easy one, huh? I'll just try. You started with the tribal knowledge, that it's 40%. >> Yeah. And if I understood right, your claim is that within a couple of weeks you will have filled that in? Not filled the gaps, but discovered them. If you scope down to a team, within weeks you can do it. >> Right, because if you don't fill it, you'd have to keep asking those questions. Yeah, so you do it one time first, or a few times at the beginning, to see the whole picture: what is the state of your knowledge base? First fix it at that level. Then you go into operations, where you can still keep doing it with the agent and its skills. >> So you could also feed it all the meeting transcripts, rather than running the cycle? That can also be done, if all your discussions happen in meetings and everything is captured in the meeting transcripts themselves.
But I don't think that's the case for everyone. Given the amount of time people spend in Teams, if you use the transcripts, those are the sources with the most tokens, and there are so many useless meetings. >> Could be. Again, it depends on the institution. Are you solving problems in meetings, within conversations, so those conversations hold the data? Or does your Confluence hold the data? Whichever holds it, use those transcripts as your knowledge base; the compression into context blocks works the same way, and that's what's most useful. Yeah. Anyone else? Any questions? No? All good? Then thank you so much for attending this session.