Demand-Driven Context: A Methodology for Coherent Knowledge Bases Through Agent Failure
Channel: aiDotEngineer
Published at: 2026-05-05
YouTube video id: _QAVExf_1uw
Source: https://www.youtube.com/watch?v=_QAVExf_1uw
Thank you. Maybe we can get started. First of all, thank you so much for coming to the workshop, especially those of you who didn't get a seat. I promise I'll do my best to make it entertaining, especially for you who are standing. Thank you so much. >> Volunteers, right? Yeah. Actually, it makes sense. Now I know why the tickets sold out, and which workshop sold them out. Let's start with my introduction. I'm Raj. I work as a staff software engineer at IKEA, in a domain called delivery and services. We're more than 100 engineers across six product teams, like a mini company within the company. I'm very interested in architecture, neuroscience, and linguistics, and now AI. Everyone is building cool projects these days, so if anyone has one, please find me after this session. Quick pulse check with the audience: who is visiting London for the first time? Okay, cool. Welcome to London. Who is here from an engineering background, comfortable with live coding and prototyping? Okay. Who actively uses agents like Copilot, or agent extensions? Okay, so everybody is a pro. Fine, not as many as I'd hoped. So much tension now. You're going to sit in this hot room for more than an hour, so first let me introduce what I'm presenting today. It's about agents and context management, divided into three parts. One is the situation, which all of you already know, so I'll keep it tight, about 5 minutes. Then I'll talk about the problem. This is where I'll spend a bit more time on the slides, because I think nobody is seriously looking into this problem, and I want to bring it up. Then fewer slides and more hands-on: how demand-driven context actually works. All good? Okay. Let's start with the first part. How many of you have seen this movie, Memento? Okay, cool. I'll give you the gist. This guy is very skilled, very talented. The only problem is that he can't hold a memory for more than 15 minutes. So every 15 minutes he has to take out his notebook, look at the tattoos he put on his own body, and figure out: okay, what was I doing 15 minutes ago? And he does this again and again. If you relate this to AI and AI agents, it fits exactly how agents work right now. If you watch this movie, you don't need YouTube or blogs to understand agents and MCP; the movie tells you everything, literally. And just as this guy has a memory problem, the AI we were introduced to a couple of years ago is very good at reasoning, computation, and code generation; it benchmarks above par. The only problem is institutional knowledge, the domain knowledge that you have. That's the one thing that remains problematic. From AI to agents, the evolution exploded. It started with prompt engineering, then came RAG, MCPs, then multi-agent systems, and now deep agents. I recently found out that using Replit you can build a full-stack app in 10 minutes.
That means by the time you've made your instant noodles, you already have a million-dollar app running on your laptop. So we got to this point; it's extraordinarily good. That's AI and agents. Now, let's talk about enterprise AI. I don't know how many of you have this question, but in most enterprises I see, the question is: okay, AI is pretty smart. It's doing code generation, full-stack apps, reviewing your PRs, doing incident management, all those things. If AI is doing that much, why are the Jira tickets and epics not moving on the dashboard? Why don't I see the delivery? Everybody is saying, look, in 3 minutes everything is ready. Yeah, okay, fine. Why are my Jira epics not moving? Because that defines the business delivery, and that defines the return on investment. And as you see, per McKinsey this year, 88% of all companies use AI, but they only see about 6% in value creation. So I think this is the problem we have. Here are four sample Jira tickets, different ones. The parts marked green are things the LLM is already trained on, like public standards and general knowledge. That's fine; it can pick up those tasks from the ticket and do them. Then there's a second part, the orange ones, which we have to teach it: you know this, but do it this way. All of the orange items fit into your agent extensions, like skills. But the red ones are the institutional knowledge, which sits within the company and within the people. If the agent picks up a ticket, it has to fulfill all of them. It is so good with the green and orange ones, but it struggles with the red ones, the institutional knowledge. And I believe coding agents are getting so much better that if AGI comes, the first AGI will be a coding agent, for sure. To fix this, to give institutional knowledge to agents, we already have an industry solution. This is what your return-on-investment pipeline for AI looks like: you have LLM model quality, you have agents, and you have the agent harness. Your institutional knowledge sits in Confluence, Jira, SharePoint, GitHub, all those places. And the retrieval layer is what the industry tells us will fix the issue: you build a retrieval layer, it fetches all those things and hands them to the agent, and the agent should be able to do the work, right? Around 40% in factual accuracy can be achieved through RAG or knowledge graphs, but only with a documented knowledge base. So if you build a retrieval layer, it has to work, right? Now let me ask you a question: how many of you have built retrieval-layer things like RAG and MCPs? Okay, cool, all of you. Now, how many did you build? How many MCPs? Has anyone built more than 20 MCPs? Okay, so nobody beats my record. What I mostly see in enterprise organizations is people building 10 to 20 MCP servers, or RAG and knowledge graphs, on top of their institutional knowledge and wiring them to the agent.
So the assumption is: if I can build all those MCP servers and give them to the agent, I don't need to do anything; it will just work. But when you plug in these MCP servers, the data coming out is mostly non-deterministic. It's unreliable, and it's untested. Especially in engineering, nobody does evals; it's seen as more of a data or machine-learning concept, so we don't do evals. When we plug in an MCP server or a RAG, we check whether output is coming at all, rather than whether it's really valuable, whether it's really solving the problem. That's the main problem I see. And I'm not pointing fingers, because I was that person. I thought: let me build all the MCP servers, plug in my institutional knowledge, and prove the point that agents can semi-autonomously pick up those Jira tickets and finish them. But every time I built those MCP servers, 10%, 20%, 30% of the time the result was accurate, and the rest of the time I was doing data-entry work for them: filling the gaps, answering the questions. So I was doing more work, not less. I think this is the main problem. I actually reached a fourth stage where I literally started writing the domain context by hand: okay, let me write everything down and prove the point. But I got really exhausted doing it. I don't know how many of you can relate to this pie chart, but most enterprise institutional knowledge looks something like this: 20% is outdated, 20% is unreliable, 10% is duplicated across different places. And the major problem is that 40% of the knowledge is tribal knowledge: people know how things work, but it's never documented. In that enterprise situation, you can build 100 MCP servers and plug them into that monolith; it doesn't matter how many you build, it won't work, because your whole institutional knowledge is a monolith. You're all from engineering backgrounds, so you already know the transformation of monolithic legacy systems into microservices. In the same way, unless we break that knowledge monolith down into context blocks that are useful for agents, we can't actually make it useful for them, and we can't make them do tasks semi-autonomously. So that's what this workshop is mostly about: that monolith, how to break it, what the approach is, and how useful it is once it's broken. And this is a job we have to do ourselves, because the LLM providers will focus on model quality, the agent vendors will focus on the harness, and there's a big retrieval market, around 9 billion dollars, focused on retrieval. But nobody is going to come to your company and fix your knowledge base. You have to fix it yourself. So, how can we do it? Demand-driven context is the solution I'm proposing. To give an abstract of it: we had monolith services and a process for breaking them into microservices; we had the waterfall model, which we transformed into agile. In the same way, when you have a monolith of institutional knowledge, how do you transform it into context blocks using a defined approach?
Here is that approach. Before we start: this is not just an idea. We tried it with some datasets to show that the approach works, and in March we published a preprint on arXiv. If anyone is interested in reading academic papers, you can find it under "demand-driven context," or I can give you the link after the workshop. Okay, so how does it work? When we give institutional knowledge to agents today, we use a push strategy: we build everything and push it to the agent. This approach is more of a pull approach. For example, say a new joiner arrives at your company. How do you onboard the person? You onboard them for a day or two, give some initial orientation, and point them at the Confluence links, the GitHub, the documentation to follow. Then you just assign them a task. You don't say: go get graduated on all this knowledge, come back, and then I'll give you work. You assign a work item, and the person starts asking questions and filling the gaps. If the person is diligent about documentation, they'll also fill in the documentation for you, and they gradually build up their institutional knowledge. In the same way, we don't push all the knowledge to the agent; instead, we give the agent problems, work items, and let it pull the information from us. And once it pulls the information, we also ask it to document it, in a better form than the monolithic structure. So those are the four layers: you have a monolith, a framework that pulls, and it creates good, well-curated context blocks. You can relate it directly to moving from a legacy monolith to microservices, if you want an analogy. This is how one cycle works: you give a problem to the agent, and on the first attempt the agent fails. It says: you gave me a problem, but in most of the documentation I couldn't find anything; I couldn't do it. Here are the things I need in order to finish this task, and it gives a checklist. We fulfill the checklist. Once that's given and the problem is solved, it takes that knowledge and updates it, meaning it curates the knowledge in a particular place so it can be reused, by itself or by others. That's one cycle. The idea is that if we do this over multiple sessions with multiple problems, it gradually curates your monolithic knowledge base and documents it for you. You can also relate it to TDD. How many of you do TDD? Nobody hates TDD, right? In a TDD approach, we write the failing test cases first; we don't build the product first. We see what code is missing for the failing test to pass, we supply that code, and we gradually build the product based on the failing test cases. In the same way, we give the agent problems it will definitely fail at, we gradually fill those gaps, and at a certain point it becomes semi-autonomous with a good institutional knowledge base. In code, one cycle looks roughly like the sketch below.
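A minimal sketch of one demand-driven cycle, assuming a hypothetical `agent` client with `attempt` and `curate` methods; none of these names come from the speaker's actual implementation, this is just the fail, demand, fill, curate loop spelled out.

```python
from dataclasses import dataclass, field


@dataclass
class AttemptResult:
    solved: bool
    gaps: list[str] = field(default_factory=list)  # checklist of missing knowledge


def demand_driven_cycle(agent, task, knowledge_base: dict, max_rounds: int = 3) -> bool:
    """Run one work item through the fail -> demand -> fill -> curate loop."""
    for _ in range(max_rounds):
        result: AttemptResult = agent.attempt(task, knowledge_base)
        if result.solved:
            agent.curate(knowledge_base)  # persist the newly surfaced knowledge
            return True
        # The agent failed and returned a checklist of what it still needs.
        for gap in result.gaps:
            # A domain expert fills each gap - this is the "pull" step.
            knowledge_base[gap] = input(f"Agent needs: {gap}\nYour answer: ")
    return False
```

Repeating this over many work items is what gradually turns the monolith into curated context blocks.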
Okay, I think I can jump into a demo. I'll use the terminal, so don't hate me; you're all from engineering backgrounds, so I think you'll like the terminal. Let me switch over. Okay. On the far left, what you see is how it works under the hood: when you've given it a problem, how the agent fails, how it demands the knowledge needed to solve the problem, how a human, a domain expert, fills those gaps, how it then curates a new, much better knowledge base for you, and how the agent then succeeds so you can repeat on the next problem. That's one cycle. How can it be implemented? With any agent. It works on Claude Code or Copilot, because it's an approach; you can do it any way you want. At work I use Copilot, so I implemented it with Copilot, but since I believe everybody loves Claude Code more, I created this demo with Claude Code. You can see it's just a combination of skills, rules, agents, and hooks, plus a place to save the knowledge base. In the middle pane, at the top, is your monolith. It's a stand-in for your Confluence, Slack, GitHub, and so on; for the sake of the demo I just put in some flat files that look like them. That's how your monolithic knowledge base looks. At the bottom, you see live how it adds new knowledge while it's solving a problem. So let me show you. I'm going to go to the agent and give it an incident problem to do root-cause analysis on. Remember the sample Jira tickets from the earlier slide? They were a mix of documented and completely undocumented knowledge. This incident represents the same kind of mix: some knowledge is already in your monolith, some is missing or outdated, and most of it the agent won't find at all, because it was never written down. When I give it this problem, it uses the skills I developed with this approach. First it goes to your monolith, the knowledge base, and tries to find what's there. Think about it like this: the first part is retrieval, which means it's already doing what RAG and MCP do. But what does it do with the data after it fetches it? That's the missing part. When you give Confluence links to a new employee, the employee goes there, looks, and doesn't find the information, but doesn't stop there; they keep asking questions to solve the problem, and then they add the new knowledge. Those next steps are what's missing today; we just stop at retrieval. So this does the next three steps. You can see the confidence score, on a one-to-five scale, is very low, because it says: these are the particular terminologies I don't understand, and this is business logic I'm missing.
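A hypothetical shape for that failure report, the output of the agent's first attempt: a confidence score plus the things it could not find in the monolith. This mirrors what the demo displays, not an actual schema from the project.

```python
# Illustrative only: field names and values are assumptions, not the demo's format.
failure_report = {
    "task": "Root-cause analysis for the notification incident",
    "confidence": 1.5,                         # 1-5 scale; low, mostly tribal knowledge
    "retrieved": ["runbook-notifications.md"],  # what plain retrieval did find
    "undocumented": [                          # surfaced gaps: never written down
        "meaning of the internal terminology used in the incident",
        "business logic behind the retry policy",
    ],
    "checklist": [                             # what a domain expert must supply
        "define the unknown terms",
        "describe the undocumented business rules",
    ],
}
```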
One thing to look at here: whatever it has listed is the undocumented information, things that were never written down. Unless you work this way, you will never know what is not documented. If somebody says, "there's documentation missing, we need to write it," the answer is: okay, what exactly do you want me to write? There is so much in people's heads; I can't write it all down. Somehow it has to surface. When you give the agent a problem, it surfaces what is not documented and tells me: this is missing, I need new information here. So it does all three steps, and then I give it a pre-prepared answer, a very high-level one, covering the missing information: okay, here is the missing information you asked for; can you solve the problem now? I didn't expect this one. Notifications: yes, I'll just say yes. Now it's asking what fictional name it should respond with. Okay, I didn't see that; when I tested it, it didn't ask me these questions. It knows it's a demo; I trained it to recognize that. Now you can see, live, that it's already adding the entities; the new knowledge base is coming into place. >> So the knowledge base is managed as a file system? For the demo, I'm showing it as a file system, but your data can stay in Confluence, Slack, and so on; you can plug in the same MCP servers or RAG. It doesn't need to be flat files. >> Treat this as your system slide for this agent, right? It's like a map. Do you use any memory tool for this? I'll show on a later slide how I save it. Okay. So it started from around 56 entities. One problem surfaced six entities that had never been documented, and when I gave it that information, it was able to discover and curate another five or six new entities that were never documented. So it discovers the gaps, gets the information from me, and stores the new information. That's one cycle. Next, let's see; this is a busy window; I tried to tidy it, but it didn't work out. What you're seeing here is 14 incidents. You've seen one problem that I solved with the agent. What if I take 14 incidents and run 14 cycles of this? On the left is the first incident: right now it has a confidence of 1.5, and everything is critical; basically nothing is documented, so everything is critical or high and the data is missing. I started giving answers on the first incident, then repeated for the second and third, continuously, through all 14. By the 14th incident, it reached a confidence level of 4.4, because at every incident it discovered gaps, got the answers from me, and documented everything. So it gradually improved from about 1.5 to nearly the top of the five-point range.
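One plausible way a curated context block could be stored after a cycle: a small, agent-readable entity record rather than a page in the monolith. The field names are illustrative assumptions, not the demo's actual format.

```python
# One curated entity as it might land in the knowledge base after a cycle.
context_block = {
    "entity": "customer-notification-service",
    "kind": "system",
    "status": "active",                # vs. "stale" for documentation not to be trusted
    "last_validated": "2026-05-05",
    "source": "incident-0007",         # the work item that surfaced this knowledge
    "summary": "Sends SMS/email updates; depends on the delivery-events API.",
}
```

Small records like this are what let the entity count and the confidence score climb cycle by cycle.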
In the traditional way, we solve the whole context problem first, and only then hand things to the agent. In this approach, we move the agent from a consumer to a knowledge manager: don't just consume from me; the whole knowledge management is also your job, and you do it for me. Okay, let's get back to the slides for a bit. What we've seen is one cycle, and what it looks like when I run 14 or 15 different cycles. But doing this manually would be really painful; I tried it, and after 15 cycles, nobody wants to sit with an agent during an incident, answering its questions and explaining your problems. That is super painful. The good news is that we can automate the process, and this is where it gets really good and interesting. Here's the thing: you already have all the work items. You have Jira, incidents, customer support tickets, all sitting in the archive. So why not take them, use the framework to validate them against your knowledge base, run it as an automation, and see what the actual state of your knowledge is right now? Let me show you how that looks: the same approach, but at scale rather than manually. In this demo, everything is preset. I have a platform-operations agent, and I say: here are the recent incidents, say 20 recent past incidents, as Markdown or JSON files with descriptions, comments, everything. The rest of the files are your knowledge base. It's a file system here, but you could connect Confluence and the rest in the same way; the flat files are just for the demo. Now I take all these incidents, validate each incident against my knowledge base, and ask the agent: tell me how much of the documentation is good, how much I can't trust because it's old or outdated, and how much is missing entirely, never documented, as far as this incident is concerned. Let me run it. It takes some time, in three steps: first it generates probes, which are basic tests it writes to test your knowledge; then it runs those tests; then it analyzes the gaps. It's a little hot in this room, actually. While it identifies the gaps, here's an example. Say you have an incident: the notification service is not sending customer messages over SMS. When the notification service is mentioned, the agent checks: is there documentation about the notification service? It doesn't find any, which means you never wrote documentation for it. It understands what customer SMS is, that's general knowledge, but the customer notification service is a gap, because it was never documented. Or it takes the customer notification service, goes to Confluence, and checks how old the documentation is. If it's a year old, it tells you: look, I checked, it's a year old, I don't know whether to trust this documentation or not. Or the documentation is incomplete. A minimal version of that probe loop could look like the sketch below.
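A minimal sketch of the automated scan over archived work items, following the three steps just described: generate probes, run them against the knowledge base, collect the gaps. The two callables stand in for agent calls; everything here is an assumed shape, not the speaker's scanner.

```python
def scan(incidents, extract_probes, answer_from_kb):
    """Surface knowledge gaps from archived work items.

    extract_probes(incident) -> list[str] of facts the incident needs;
    answer_from_kb(probe)    -> str, or None if the knowledge base can't answer.
    """
    gaps = []
    for incident in incidents:
        # Step 1: demand extraction - a checklist of knowledge this item needs.
        for probe in extract_probes(incident):
            # Step 2: run the probe against the knowledge base alone.
            if answer_from_kb(probe) is None:
                # Step 3: anything unanswerable is a surfaced, undocumented gap.
                gaps.append({"incident": incident, "missing": probe})
    return gaps
```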
So for each incident, it looks at the entire knowledge base you've connected and produces a consolidated scoring: for example, agents can only partially handle the basic cases of the incidents you give, because your knowledge base is incomplete. It shows how much tribal knowledge is missing, what system information and business processes are missing, exactly what your institutional knowledge lacks, whatever was never documented. Those are the probes. It also classifies what is critical and what is high priority. This is really important: say the notification service from my example keeps appearing across 20 incidents and you have no documentation for it; that's the first thing you need to fix. So when you're breaking down your knowledge base, it helps you understand what is critical, what to focus on first, what creates value; it organizes everything into critical, high, and medium. I showed flat files, but you can connect the various data sources you have. Step one is demand extraction: for every incident, it extracts a checklist of the missing information. Step two consolidates everything that's missing: it maps out systems and APIs and classifies how many are clean, how many are stale, which are incomplete, what is entirely missing, and what is tribal. And it creates a Kanban board for you. So if you want to fix your institutional knowledge base, you finish these items just like Jira tickets: document the missing pieces. It also saves this into its own context, so it builds its own knowledge base, and you can see the performance improve as you fix the tickets on the institutional-knowledge Kanban. A rough sketch of that consolidation step follows below.
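A sketch of step two, consolidating per-incident gaps and ranking them: a gap that recurs across many incidents (like the notification service example) surfaces first. Counting and sorting is the whole trick; the severity thresholds are invented, and the labels merely echo the demo's critical/high/medium board.

```python
from collections import Counter


def consolidate(gaps):
    """gaps: output of the scan above; returns a prioritized gap board."""
    counts = Counter(g["missing"] for g in gaps)
    board = []
    for item, n in counts.most_common():
        # Thresholds are assumptions; tune them to your incident volume.
        severity = "critical" if n >= 5 else "high" if n >= 2 else "medium"
        board.append({"gap": item, "occurrences": n, "severity": severity})
    return board  # ready to publish as Kanban tickets for domain experts
```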
So what we've seen is, first, the approach itself: a pull approach rather than a push approach. Then what one cycle and multiple cycles look like. And then, if you put it at the scale of automation, how much more valuable it becomes. Okay. Now, the important question. All along I've been saying the agent receives the context, we give it information, and it stores it. But where does it actually store it? I have a very opinionated opinion, hear me out: I prefer that it goes into a GitHub repository. Eventually somebody will come up with a 20-million-dollar, seed-funded SaaS solution for this, but until then, I prefer GitHub. Why? Because at scale there will be multiple agents and multiple teams contributing to the same knowledge base, and there will be conflicts and resolutions. The easiest way to handle that is GitHub, because it comes with built-in PR and review processes. If multiple domain experts are uploading files, or agents are contributing, the most efficient way to manage it is a GitHub repo with a structure something like this. The other advantage is that from GitHub you can publish to Confluence later, or Slack, or wherever you want. So I prefer GitHub, but if you want to integrate directly with Confluence, you can do that too. Next is the meta model. How many of you are aware of the term? Maybe I can quickly show you what it looks like. A meta model captures how your domain is structured: how a business process relates to a system, how systems relate to APIs, and which business or tech jargon is linked to which of them. This kind of relationship model is really important. It's not necessary for the approach I've proposed; it's an add-on. Why you'd want it: think of it as a map. Right now, your agents don't have any map to navigate your knowledge base. You're dumping this many files on them, and they need to figure out which file they need. But if your file structure is a representation of your meta model, the agent knows how to navigate. For example, ask it to fix a system, and it understands: if I make changes here, these business processes will be affected, and these are the APIs I need to change or touch. So it's important to have a meta model; if you have one, the approach produces more value. I strongly prefer having a meta model along with this approach.
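A minimal meta-model sketch: business processes, systems, APIs, and jargon linked so the agent has a map. Plain dicts here for readability; in practice it could be YAML in the repo or a graph store. All entries are invented examples.

```python
# Hypothetical meta model: every name below is an invented example.
meta_model = {
    "business_processes": {
        "customer-delivery-updates": {"runs_on": ["notification-service"]},
    },
    "systems": {
        "notification-service": {
            "exposes": ["POST /v1/notifications"],
            "depends_on": ["delivery-events"],
        },
    },
    "jargon": {
        "DDC": "demand-driven context",  # term -> its definition / home
    },
}


def affected_processes(system: str) -> list[str]:
    # Navigate the map: changing `system` touches which business processes?
    return [name for name, proc in meta_model["business_processes"].items()
            if system in proc["runs_on"]]
```

With a map like this, "fix this system" lets the agent enumerate the affected processes and APIs instead of guessing from file names.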
Okay. The last part: what value does it create? You've seen a lot of slides and demos, so let me share the value I saw when using it, and the feedback from the people I've already shared it with. First, the most valuable thing is knowing the unknown. What was never documented can only be surfaced by an approach like this; otherwise, you end up with an endless Miro board of tickets saying "this is missing, this is missing, I need to add it." This is the fastest, best way to discover, from your previous work items, what was never documented. Second, I can now give the work to agents rather than doing everything myself. Instead of me feeding the agents all this information, I let the agent manage my knowledge management; I don't want to be the knowledge manager. Those are the two big values I've seen, and if you use it, I think you'll find the same two most valuable; the slide lists the other things I noticed. Now I also need to tell you the drawbacks of using it. First, if you're on a small team, or if you say, "no, my documentation, my knowledge base is really good," then I'm super happy for you; you're among the lucky ones in this world of agents, and this might not be relevant for you unless your documentation is very complicated. Second, as I already mentioned, doing it manually is very painful; I don't recommend anyone do it by hand. Try it manually for testing purposes if you like, but automation is by far the best way to use this. Third, this approach is very early. By tomorrow morning, somebody will have posted something on YouTube that does it differently, better than me. In the era of AI, nobody knows how long a thesis, an approach, or a product will survive. For now, I see this as the best approach. So, the whole workshop: we started with one pipeline, the ROI pipeline. Demand-driven context sits between the monolith and the retrieval layer, and it helps you build curated context blocks. You can think of it like a cache: your agent doesn't need to boil the ocean every time it fixes an issue. If you have a good context block, it's usable maybe 80% of the time, because I believe the 80/20 rule applies here too: 20% of your documentation is the most useful, and 80% covers corner cases. So rather than handing over 100% of everything, figure out the 20% that is super helpful for the agent, keep it as a cache, the context block, and leave the rest as links. Whenever the agent feels it needs more information, only then does it go and check the whole monolithic institutional knowledge. From this workshop, you can take away three things. One, I hope I made sense of the approach. Two, there's a GitHub repo where I've detailed it, with a starter guide, if you want to go home, try it, and remix the whole approach; do it and let me know, and I'll join you for contributions. Three, there's the context gap scanner I showed you, which is live with presets. I think I loaded about $20 of credit onto it, so hit it as much as you can: first come, first served. And because this is a workshop, I'd like you to try something, one of three things. If you say, "I'm so tired already, it's almost four, I'm about to head to a party, I don't want to work," just go to the context gap scanner; everything is preset, so try it, see how it works, and if you think it can be done better, let me know so we can work on it. Or, if you're feeling technical and want to know how it works under the hood, there's the GitHub repository with all the information, plus a guide if you want to try it out. Or, if you want something even simpler: take this prompt, take one of your current Jira tickets or incidents, and, if you've already built MCP servers or anything similar, give the prompt to your agent along with the ticket and ask: give me the quality of my knowledge base as measured against this work item, and see how much of it comes up red, meaning never documented. I'll leave it up like this; maybe take a picture.
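One possible wording of that "simple version" prompt, reconstructed from the description above as a Python template; this is an assumption about the gist, not the exact text on the slide.

```python
# Hypothetical prompt template; fill `ticket` with a real work item via .format().
PROMPT = """Here is a work item (incident / Jira ticket): {ticket}
Using only my connected knowledge base, classify the knowledge this item needs as:
GREEN  - general knowledge you already have,
ORANGE - documented, but I must point you to it,
RED    - never documented anywhere.
List every RED item explicitly, and give an overall 1-5 confidence score
for the knowledge base with respect to this work item."""
```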
Maybe I can switch back to the slide. Cool. >> This is your slide, you can go. Yeah, I see some problems in this approach, but I find it very interesting. My first question: have you already used this way of working at scale? Because we've mostly seen toy examples. Yep. I've used it, but not at scale. I started small, because you need to see what the right scope is. If I try it at the enterprise level, I can't; there are multiple domains. I tried it at the domain level, and even there, there's so much domain expertise I'd need to fill in. So I cut the scope down further: take the smallest team I have, that team's Jira tickets, that team's incidents, that team's Confluence pages. When I narrow the scope like that, it's faster and more useful. At a bigger scope, no one person has the whole domain expertise, so five or six people have to sit down and do this together. >> I'm a bit concerned that this might denial-of-service your team members in a certain way, because our LLMs are fine-tuned to keep eliciting information, to keep asking follow-ups. I think it will be hard on the engineers who have to do the question-answering. And secondly, the scanner is nice, but it's built on the assumption that your team members and the rest of the enterprise are using your enterprise IT as planned, actually filling in their tickets with all the details, and I know from practice that most of the time that's not the case. That is true. I agree with you. My assumption is that even if I go to leadership for buy-in, "can you give me bandwidth, I need these people to sit and fix the context," nobody will agree right now. But I think it will happen, because we're slowly moving toward being agent managers, where agents become semi-autonomous or autonomous and we manage them. At a certain point, somebody has to fix that knowledge, because it's not going to come from anywhere else; you have to do it. Then the enterprise focus will shift toward this gap. That's what I said at the start: I don't think anybody is looking at this problem yet.
Everybody is focused on how good the agent is and how good the retrieval is, but nobody is solving how good the context is. I think within a year or so, people will realize its importance, and that Kanban board will come into reality very soon. Yeah, thanks. >> I think we're on the same page. When we look at large codebases, the source of truth is not actually the documentation, it's the code. Have you applied this to a code base? I did. I also applied it to a code base, but I got mixed results. Here's the thing: when I use only the code base, it's particularly good, and when I use only Confluence or textual data, it also gives good results. But when I combine them, they conflict: it builds a theory from the GitHub repository, but the same repository is also documented on Confluence, and then it hits a conflict. What is the source of truth? The code says one thing; should I implement it the way the documentation says? So I had to create an additional skill or rule about ranking: if you see it in GitHub, that's the source of truth; if you don't see it there, look for the information in Confluence. I'm still fixing those things, finding the gaps and closing them, but I definitely see that issue when combining the two. >> The second question: interestingly, we're applying the same approach to skills. What we found is that the process starts by running agents that bring in context and identify the right skills to use. Then you go do the task and you fail. Once you fail, you identify what you need to solve, you go back, you curate, you fix, but you're fixing the knowledge base. What we found is that we go back and fix the skill. That's the increment for the next iteration. So I'm wondering whether the skill could also be part of the iteration loop. I think the skill I've built right now is static, but what you're proposing, if I understand correctly, is an evolving skill: if the skill fails, it has to evolve. I agree with you. I never tried it, but I think it should work like that. I'm concentrating more on how to do this at scale, because I want the context fixed before retrieval, not during operation. When I started, I did it operationally: I have a work item, I assign it, it fails, then I start giving context. But that takes a lot of time and a lot of patience. So instead: fix the context before retrieval. As I said while answering the earlier question, if you take a single team, the context you need to fix is small. Use something like the context gap scanner, and if you have a good domain expert, I think in a couple of weeks you can fix your documentation.
Not 100%, but at least 60, 70, 80% of good quality you can already build. So my proposal is always: don't do it at the operational, real-time level; do it before retrieval. That works much better with this approach. Yep. >> I have another question. If you have a lot of documentation sitting in a GitHub repo, you may get questions that need five or six different docs, so you go there and read all of them. That takes time for retrieval, and it also takes a lot of context. Right now, after Claude Code announced a 1-million-token context window, I don't have that problem. I measured it: on average, it's about 96K tokens per domain. I tried different domains, and consolidating everything, Confluence and the rest, comes to around 96K tokens per domain, so it easily fits in the context window. I experimented with graph RAG, putting the files in a graph and working from intent rather than loading everything, but for me, just putting the whole context into the window gives better results than doing RAG. Unless you have a very large corpus, approaching a million tokens of context to fit, then you'd need retrieval mechanisms in between; otherwise, I think it's fine. >> I have one question. I opened your paper; could you explain this graph, the comparison between techniques like domain knowledge strategy and knowledge access? Sure. This one? Okay. I also cited other papers there. Not directly related, but there's the AS paper, which does a similar thing, though AS isn't exactly about discovery and curation, if I remember correctly; maybe I need to refresh my memory. >> I mean the difference between domain knowledge and strategy knowledge. Okay, strategic knowledge. What AS and similar systems do is this: when you have a conversation with the AI, you can see in Claude Code and others that it updates its memory, the relationships with your things, and from the chat history it understands what context is most important to remember. They propose improvement based on those operational conversations with the AI. My proposal is not based on your conversation with an AI, but on your domain knowledge as documented. Yeah. >> Somebody else has a question. When you go to remote knowledge, things on Confluence and that kind of stuff, how do you ensure your agent only points to updated, relevant documentation? In your local file system you have tags like "outdated," but remotely that's harder to tag, right? Okay. When I wrote the pipeline for extracting from Confluence, it also gives you dates: last updated, who created it, analytics on the space. You can use that to set a threshold: anything older than this date, consider it potentially outdated, and let me know. Don't just mark it outdated: tell me, because sometimes a document has been stale for a long time but is still an important document. So it flags, it doesn't decide; you decide which one is stale and which one is not. A sketch of that flag-don't-decide check is below.
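A sketch of the staleness check just described: apply a date threshold to extracted page metadata and flag rather than decide. The Confluence fetch itself is abstracted away (`pages` is a list you already extracted), and the field names are assumptions.

```python
from datetime import date, timedelta


def flag_stale(pages: list[dict], max_age_days: int = 365) -> list[dict]:
    """pages: dicts with 'title' and a datetime.date 'last_updated' (assumed shape)."""
    cutoff = date.today() - timedelta(days=max_age_days)
    flagged = []
    for page in pages:
        if page["last_updated"] < cutoff:
            # Flag only - a human decides whether an old page is stale or vital.
            flagged.append({
                "page": page["title"],
                "last_updated": page["last_updated"].isoformat(),
                "state": "review-needed",
            })
    return flagged
```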
>> So you have an intermediate layer in the repo where you store "this is stale"? Okay. When it curates the context, it also records a date and the state of the document: stale, active, or clean. So it sees: this is stale, I know not to touch it, and it looks for other, newer documents instead. >> Did you think about how to manage access or permissions to this knowledge? Say you have knowledge in the company which is sensitive, and only specific people can access it. Currently, I guess you just have all the knowledge accessible. Okay. Because it's not a product or a SaaS solution, just GitHub, permissions aren't difficult to implement right now: GitHub out of the box lets me decide who gets read access, write access, who can merge, and so on. If it evolves into a product, for example the context gap scanner as a product, it's different. The reason I used presets for this workshop, rather than asking you to upload files, is that I don't want to take your IP data. So as long as it's GitHub, you don't have a problem; if it becomes a SaaS solution, access management depends on how that SaaS solution handles it. But the approach itself has nothing to do with access; how you implement access to the knowledge is up to you. >> You have a question? This is of course about documentation, but did you consider using it on central tooling that a company uses? Say you have a platform team with a CLI that different teams use, and now it's used by different agents. The agents can see that an action is available for a resource, but you don't want to do 500 calls just because you have a list of 500 resources. It would be nice if the tool could handle that. Have you given this any consideration? I think that is how it has to work in an organization: you need a central solution. But how you build that solution is up to the organization. For example, we do agile, and agile can be done with Scrum, Kanban, or Lean, and with different apps. The process is the same, but which method and which app you choose in your organization differs. In the same way, what we've discussed is the approach; if you put it into your organization, you can implement it whichever way you want. >> My point is more that with this you can identify gaps in your documentation. Could you use it to identify gaps in your tooling? Internal tools, maybe something a team is building for the rest of the company, infrastructure in general. Maybe. Yeah.
Can you give me an example of how that would look? >> Say you build some sort of abstraction on top of Kubernetes. You don't necessarily want your developers to know what to do with that, so you have a different CLI, or something like it. Maybe you thought teams would use one of your custom corporate applications one at a time, but a team has grown into using much more of it, and suddenly they have a lot and they don't want to make that many calls. Or perhaps the agent itself says: this is inefficient, I'd like this internal tool to work in a different way. In that way, it would identify a gap in the tool, or a performance improvement, the way these tasks do for documentation. It could be extended that way, actually. We've seen that it handles business processes too, and a business process is nothing but how a process actually runs in the application. So you could extend it to find gaps in the business process, in how things work; it could be an extension. Okay, yeah. Thank you. >> How would you ensure that maintenance won't kill you? Because a company's knowledge changes with time. If the answer to a question today is B, tomorrow it could be B1, and on Friday it could be C. So you have to identify B1 and B and overwrite them, and if it's stored as text, that's a problem. Okay. When I showed the context gap scanner, you also saw a duplication indicator. If today you have a document, and tomorrow you have a version 2.0 and something else, it will find that the same information lives in three different places. If you have only one document and it changed, it takes the latest updated one, because a human changed it, and treats it as the source of truth. But if you have three versions of the same document, that's duplication, and it flags it as a duplicate. >> But it's a search problem. How do you ensure performance? Say you have a document of 100,000 words and you just change one word, like a password. You'd have to find that specific word, compare it across three documents, find and replace it, and so on. How is that feasible? Which tools would you use to make sure it doesn't kill you cost-wise? I didn't quite get the question. Is it the token usage you're worried about, the cost? >> Yeah, precisely. If it grows, and there are a lot of changes, you'd have to maintain that whole database. What you presented is sort of a happy path: you have a gap, you fill it, and then you reuse it. But after a while, you'll have a bigger problem, where you pretend the gap is filled, but it doesn't contain up-to-date information; it contains wrong information. You want to prevent that. Okay. So it can flag documents by their creation or last-updated date.
You can set filters like that. >> But say you have a latest document with wrong information. You don't know that, right? No, that's true. But compare it with a human being: you tell somebody to look at the documentation, the person reads it, and implements things the way the documentation says. The person will do it, right? It's not an agent-versus-human issue. As for cost: I don't think it costs that much. As I said, when I tested it, none of the domains crossed 100K tokens. And for the context gap scanner, you don't have to run it daily; even if you did, it's roughly one scan at around 100K tokens. If all of you started hitting the context gap scanner right now, I don't think you could burn even one dollar. Oh, you already are? Okay, I'll cancel the subscription. >> But the moment you scale to a specific domain, you'd have to solve this cost question. Okay. >> Which also depends, to his point, on how fast the data changes, and I think most of the time it's not that much. So the moment you get to 80, 90%, you just continue performing the task. Yep. I see it differing from use case to use case. Any other questions? >> Yeah, I was curious: I ran the gap scanner for one of these domains, and it produced a bunch of recommendations. How do I know that that's enough? It tries to detail things out as much as possible. Right now, I haven't exposed everything it did, just for the UI, but for each ticket it writes 100 or 150 lines of Markdown and saves it somewhere, which gives you more detail if you want it. For the demo, I put nice UX on top, but the detailed information is there. Anyone else have questions? >> Can I have another one? Yeah, sure, go on. An easy one, huh? I'll just try. You started with the tribal knowledge, that it's 40%. >> Yeah. And if I understood right, your claim is that within a couple of weeks you will have filled that in? Not filled the gaps, but discovered them. If you scope down to a team, within weeks you can do it. >> Right, because if you don't fill it, you'd have to keep asking those questions. Yeah, so you do it one time first, or a few times at the beginning, to see the whole picture: what is the state of your knowledge base? First fix it at that level. Then you go into operations, where you can still keep doing it with the agent and its skills. >> So you could also feed it all the meeting transcripts, rather than running the cycle? That can also be done, if all your discussions happen in meetings and everything is captured in the meeting transcripts themselves.
But I don't think that's the case for everyone. Given the amount of time people spend in Teams, if you use the transcripts, those are the sources with the most tokens, and there are so many useless meetings. >> Could be. Again, it depends on the institution. Are you solving problems in meetings, within conversations, so those conversations hold the data? Or does your Confluence hold the data? Whichever holds it, use those transcripts as your knowledge base; the compression into context blocks works the same way, and that's what's most useful. Yeah. Anyone else? Any questions? No? All good? Then thank you so much for attending this session.