AI Engineer World’s Fair 2025 — GraphRAG

Channel: aiDotEngineer

Published at: 2025-06-04

YouTube video id: RR5le0K4Wtw

Source: https://www.youtube.com/watch?v=RR5le0K4Wtw

Quick question: everyone who attended the keynote this morning. We got a very good question from the organizer. Not a question, a statement: is RAG dead? Are agents taking over RAG? How many of you have implemented RAG in production? Oh, plenty. So I can say with high confidence RAG is not dead.
My take on RAG is: if RAG can solve the problem you're working on in production, you don't need agents, and vice versa. Why build an Eiffel Tower when you can get it done with a minuscule version of it? So that is how I see RAG being important, and there are a lot of use cases where RAG has found its application. I'll stop there; we'll talk more in my presentation. I'm waiting for 11:15, because this session is streamed live on YouTube for our friends who couldn't attend this conference. So when he gives me a heads-up, I'll start.
Should we start? Okay. Oh, thank you. All right. So thanks for coming. To quickly introduce myself, my name is Mitesh. I lead the developer advocacy team at NVIDIA. The goal of my team is to create technical workflows and notebooks for different applications, and then we release that code base on GitHub, so that developers in general, which is me and you, all of us together, can harness that knowledge and take it further for the application or use case you're working on. In today's talk, I'm going to talk about a project we did with one of our partners and some of my colleagues at NVIDIA: how can we create a graph RAG system, what are its advantages, and how does adding a hybrid nature to it help? I will not be able to give you a 10-foot view where I dive with you into the codebase, but there is a GitHub link at the end of this talk which you can scan, and all the notebooks I'm going to talk about are available for you to take home. I'll give you a 10,000-foot view of how you can build your own graph RAG system. So, a quick refresher: what is a knowledge graph, and why is it important?
It is a network that represents relationships between different entities, and those entities can be anything: people, places, concepts, events. A simple example would be me being here. What is my relationship to the AI Engineer World's Fair conference? I'm a speaker at this conference. What is my relationship to anyone attending here? Well, you attended my session. So this edge, the relationship between two entities, becomes very important, and it is something only graph-based networks, knowledge graphs, can exploit. That is the reason there is a lot of active research in this domain on how you can harness knowledge graphs and put them into a RAG-based system. The goal is to create triplets that define the relationships between these entities; exploiting those relationships is what is unique about knowledge graphs.
Why can they work better than a semantic RAG system? They capture the information between entities in much more detail. Those connections can provide a very comprehensive view of the knowledge you are creating in your RAG system, and that becomes very important to exploit when you are retrieving that information and converting it into a response for the user asking the question. And it has the ability to organize your data from multiple sources, though that's a given no matter what kind of RAG system you're building.
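As a minimal sketch of the idea (the entity names and relations here are my own illustration, not from the talk), a knowledge graph can be held as a set of (entity, relationship, entity) triplets with a small index for lookups:

```python
from collections import defaultdict

# Each fact is a triplet: (head entity, relationship, tail entity).
triplets = [
    ("Mitesh", "SPEAKER_AT", "AI Engineer World's Fair"),
    ("Attendee", "ATTENDED", "GraphRAG session"),
    ("GraphRAG session", "PART_OF", "AI Engineer World's Fair"),
]

# Index triplets by head entity so we can ask "what do we know about X?"
index = defaultdict(list)
for head, rel, tail in triplets:
    index[head].append((rel, tail))

def facts_about(entity):
    """Return every (relationship, tail) edge leaving this entity."""
    return index.get(entity, [])

print(facts_about("Mitesh"))  # [('SPEAKER_AT', "AI Engineer World's Fair")]
```

The point of the structure is exactly what the talk stresses: the edges themselves carry the information a flat embedding store would lose.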
So how do we create a graph RAG or hybrid system? This is the high-level diagram of what it entails. I broke it down into four components. The very first thing is your data: you need to process it. The better you process your data, the better the knowledge graph; the better the knowledge graph, the better the retrieval. So: data, data processing, and graph creation or semantic embedding vector database creation. Those are the first three steps. The last step is, of course, inference, when you're asking questions of your RAG pipeline.
At a higher level, this can be broken down into two big pieces, offline and online. All your data processing work, which is a one-time process, is offline. Once you have created your knowledge graph, which is your triplet (entity, relationship, entity), or your semantic vector database, then it's all about querying it and converting that information into a response that is readable to the user. It cannot be "here are the three relationships" where we as the user have to go figure out what that exactly means.
The top part of this flow diagram is where you build your semantic vector database: you pick your documents, convert them into vector embeddings, and store them in a vector database. The piece below is how you create your knowledge graph, and there are many more steps to follow, and more care to take, when you're creating your knowledge graph.
So, diving in. The first step is creating your knowledge graph. How can you create those triplets out of documents that are not that structured? Creating triplets that expose the information between two entities, and picking those entities so that the information becomes helpful, is very important. Here's a simple example. This document is ExxonMobil's quarterly results, and we try to create the knowledge graph using an LLM. If you look at the first line: ExxonMobil, which is a company, is the first entity; "cut" is the relationship between ExxonMobil and the second entity, whose name is "spending on oil and gas exploration and activity." This is how the relationship needs to be extracted. Now the question that comes to mind is: that sounds very difficult to do. And exactly, it is difficult to do, and that is the reason we need to use LLMs to extract this information and structure it for us, so that we can save it in a triplet format. And how can we do that? Prompt engineering, but we need to be much more precise about it. Based on the use case you are trying to work on, you define your ontology; once you have defined your ontology, you put it in your prompt and ask the LLM to extract the information that is ontology-specific from the documents, and to structure it so it can be stored in the form of a triplet. This step is very important.
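A minimal sketch of what that can look like (the ontology labels, the prompt template, and the pipe-separated output format are my own illustration, not the exact prompt from the talk):

```python
# A tiny ontology: the entity and relationship types the LLM may use.
ONTOLOGY = {
    "entity_types": ["COMPANY", "FINANCIAL_METRIC", "ACTIVITY"],
    "relation_types": ["CUT", "INCREASED", "REPORTED"],
}

PROMPT_TEMPLATE = """You are an information-extraction system.
Extract triplets from the document below.
Only use these entity types: {entity_types}
Only use these relationship types: {relation_types}
Output one triplet per line as: head | RELATION | tail

Document:
{document}
"""

def build_extraction_prompt(document: str) -> str:
    """Fill the ontology and the document into the extraction prompt."""
    return PROMPT_TEMPLATE.format(
        entity_types=", ".join(ONTOLOGY["entity_types"]),
        relation_types=", ".join(ONTOLOGY["relation_types"]),
        document=document,
    )

def parse_triplets(llm_output: str):
    """Parse 'head | RELATION | tail' lines back into tuples,
    silently dropping lines that don't match the format."""
    triplets = []
    for line in llm_output.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and all(parts):
            triplets.append(tuple(parts))
    return triplets

# Example: parsing a hypothetical LLM response.
response = "ExxonMobil | CUT | spending on oil and gas exploration"
print(parse_triplets(response))
```

Forcing a rigid output format like this is what makes the LLM's answer machine-storable as triplets.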
You might be spending a lot of time here to make sure your prompt is doing the right thing and creating the right ontology for you. If your ontology is not right, if your triplets are not right, if they are noisy, your retrieval will be noisy. So this is where you will be going back and forth, figuring out how to get a better ontology.
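One small, concrete way to catch noise in that loop is to validate every extracted triplet against the ontology before it enters the graph (an illustrative sketch with invented checks, not the talk's code):

```python
# Relations allowed by the (illustrative) ontology.
ALLOWED_RELATIONS = {"CUT", "INCREASED", "REPORTED"}

def filter_triplets(triplets):
    """Split triplets into clean vs. noisy: keep only those whose
    relation is in the ontology and whose entity names are non-trivial."""
    clean, noisy = [], []
    for head, rel, tail in triplets:
        if rel in ALLOWED_RELATIONS and len(head) > 1 and len(tail) > 1:
            clean.append((head, rel, tail))
        else:
            noisy.append((head, rel, tail))
    return clean, noisy

clean, noisy = filter_triplets([
    ("ExxonMobil", "CUT", "exploration spending"),
    ("ExxonMobil", "LIKES", "oil"),   # relation outside the ontology
    ("X", "CUT", "y"),                # degenerate entity name
])
print(len(clean), len(noisy))  # 1 2
```

Inspecting the rejected pile each iteration is one cheap signal for tightening the prompt.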
This is where, my take is, you'll spend 80% of your time: making sure you get the ontology right, going back and forth in an iterative manner to see how you can make it better over time.
Then the next step, for a hybrid RAG system, is to create the semantic vector database, and that is reasonably straightforward, or at least well studied.
So you pick your document. This is the first page of the "Attention Is All You Need" research paper. You break it into chunks, and you have another factor called overlap. Chunk sizes are important because what the semantic vector database does is pick up each chunk, use the embedding model to convert it into an embedding vector, and store it in the vector database. If you don't have an overlap, then any context shared between the previous and the next chunk is lost. So you try to be smart about how much overlap you need between the previous chunk and the next chunk, and what chunk size to use when splitting your documents into paragraphs. That is where the advantage of graph RAG comes into play: the important information, the relationships between different entities, is not exploited by your semantic vector database, but it is exploited really well when you use or create a knowledge graph based system.
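The chunking-with-overlap step can be sketched in a few lines (character-based here for simplicity; real pipelines usually split on tokens or sentences, and the sizes below are illustrative):

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50):
    """Split text into fixed-size chunks, each sharing `overlap`
    characters with the previous chunk so boundary context survives."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

chunks = chunk_text("a" * 500, chunk_size=200, overlap=50)
print([len(c) for c in chunks])  # [200, 200, 200, 50]
```

Each chunk then goes through the embedding model and into the vector database; the overlap is what keeps a sentence straddling two chunks retrievable from either.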
So once you have created this knowledge graph, what is the next step? Now comes the retrieval piece. You ask a question, say, "What is ExxonMobil's cut looking like this quarter?", and the knowledge graph will help you figure out how to retrieve those nodes, those entities, and the relationships between them. But if you do a very flat retrieval, a single hop, you are missing the most important thing the graph allows you: exploration through multiple nodes. I cannot stress how important that becomes. So think of different strategies. Again, you will spend a lot of time optimizing this: whether you should look at single hop or double hop, how deep you want to go, so that the relationship between your first node and the second node, your second node and the third node, is exploited well. The deeper you go, the better context you'll get. But there's a disadvantage: the deeper you go, the more time you spend retrieving that information. So latency becomes a factor as well, especially when you're working in a production environment. There is a sweet spot you'll have to hit between how many hops you go into your graph and what latency you can survive. And some of those searches can be accelerated. We created a library called cuGraph, which is available, or integrated, in a lot of libraries out there, like NetworkX.
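A minimal sketch of hop-limited retrieval over a triplet graph (plain dictionaries here, with invented example data; libraries like NetworkX, optionally GPU-accelerated with cuGraph, provide the same traversal at scale):

```python
from collections import deque

# Adjacency list built from triplets: entity -> [(relation, neighbor)].
graph = {
    "ExxonMobil": [("CUT", "exploration spending")],
    "exploration spending": [("PART_OF", "capital budget")],
    "capital budget": [("REPORTED_IN", "Q2 results")],
}

def retrieve(start, max_hops):
    """Breadth-first expansion from `start`, collecting every triplet
    reachable within `max_hops` edges. More hops = more context,
    but also more retrieval latency."""
    seen, results = {start}, []
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for rel, nbr in graph.get(node, []):
            results.append((node, rel, nbr))
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    return results

print(len(retrieve("ExxonMobil", 1)))  # 1 triplet: flat, single-hop view
print(len(retrieve("ExxonMobil", 3)))  # 3 triplets: deeper context
```

The `max_hops` knob is exactly the depth-versus-latency sweet spot described above.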
But that acceleration becomes important because it gives you the flexibility to get deeper into your graph, to go through multiple hops, while at the same time reducing the latency, so the performance of your graph improves a lot. So this is where the retrieval piece comes into play: you can define different strategies so that when you're querying your data and getting responses, you can have better responses.
The other important piece, and I personally worked on this piece, so I can talk at length on it, but I'm going to stay at a very high level, is evaluating performance. There are multiple factors you can evaluate: faithfulness, answer relevancy, precision, recall, and, if you use an LLM judge, helpfulness, correctness, coherence, complexity, verbosity. All these factors become very important. There is a pip-installable library called Ragas.
It is meant to evaluate your RAG workflow end to end. Anyone here used Ragas for evaluating your graph RAG? All right, a few of you. Thank you. It is an amazing library that you can use to evaluate your RAG pipeline end to end, because it evaluates the response, it evaluates the retrieval, and it evaluates the query. That becomes very handy when you're trying to test whether your retrieval is doing the right thing, or whether the LLM is interpreting the questions you ask in the right way. The Ragas pipeline will evaluate all those pieces and give you an eventual score. It is a pip-installable library. And Ragas under the hood uses an LLM, no surprises there. By default it is integrated with GPT, but it provides you the flexibility to bring in your own model as well: you can wire it up with your API and use that LLM for the evaluation parameters that Ragas offers.
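For a sense of the shape of the inputs, a Ragas-style evaluation takes each question together with the retrieved contexts and the generated answer. The snippet below only builds that evaluation record; the actual `ragas.evaluate(...)` call, shown as a comment, needs Ragas installed and an LLM API key, and its exact signature varies by version, so treat it as a sketch:

```python
# One evaluation record per question: what was asked, what the RAG
# pipeline retrieved, what it answered, and the reference answer.
record = {
    "question": "What did ExxonMobil cut this quarter?",
    "contexts": [
        "ExxonMobil cut spending on oil and gas exploration activity.",
    ],
    "answer": "ExxonMobil reduced its oil and gas exploration spending.",
    "ground_truth": "ExxonMobil cut oil and gas exploration spending.",
}

dataset = [record]

# With Ragas installed, evaluation would look roughly like:
# from ragas import evaluate
# from ragas.metrics import faithfulness, answer_relevancy
# scores = evaluate(dataset, metrics=[faithfulness, answer_relevancy])

required = {"question", "contexts", "answer", "ground_truth"}
assert required <= set(record), "every record needs all four fields"
print("records:", len(dataset))
```

The split into question, contexts, answer, and ground truth is what lets the evaluator score retrieval and generation separately, which is the end-to-end property the talk highlights.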
So it's quite comprehensive, I would say, and it's really good in terms of giving you that flexibility. The other path is using a model that is meant specifically to evaluate the responses coming out of an LLM, and that is where the Nemotron-4 340B reward model we released comes in. At the time it was a really good reward model. It's a 340-billion-parameter model, so reasonably big, but it's a reward model: it will evaluate the response of another LLM and judge how the responses look on those five parameters. It is meant to judge other LLMs; that is how it was trained. So, moving further, I would like
to use this analogy: creating a graph RAG system, which is 80% of the job, will take 20% of your time; but then making it better, the last 20%, will take 80% of your time, the classic 80/20 rule, because now you are optimizing it further to make sure it works well enough for the use case and application you're working on. There are some strategies there I would like to walk you through. One, as I said before, and I can't stress it enough: the way you create your knowledge graph out of your unstructured data is very important. The better your knowledge graph, the better results you're going to get. Something we did as experimentation, in the use case we were exploring with one of our partners, was: can we fine-tune an LLM to improve the quality of the triplets we are creating, and does that improve results? Can we do a better job at data processing, like removing, with regexes, apostrophes, brackets, characters that don't matter; does removing them give better results? These are small things you can think about, but they improve the performance of your overall system. That is where, as I said, 80% of your time goes: the small nitty-gritty knobs that you fine-tune slowly and steadily to make sure your performance gets better and better. I would like to share a few strategies which led us to better results. The
very first thing was regex cleanup, just cleaning your data: we removed apostrophes and other characters that are not that important for triplet generation, and that led to better results. We then implemented another strategy of trimming longer outputs, making them smaller, and that got us better results. And we also fine-tuned the Llama model, and that got us better results. If you look at the last three columns, you'll see that by using the base Llama model as-is we got 71% accuracy. This was tested on 100 documents, to see how it performs. As we introduced LoRA and fine-tuned the Llama 3.1 model, our accuracy went up from 71% to 87%. And then those small tweaks improved the performance further. Again, remember this is on 100 documents, so the accuracy looks high; if your document pool increases, it will come down a bit. But in comparison to where we were before, we saw improvement, and that is where the small tweaks come into play, which will be very helpful to you when you're putting a graph RAG or RAG system into production. The other is from a latency
standpoint. If your graph gets bigger and bigger, now you're talking about a network with millions or billions of nodes. How do you search a graph that has millions or billions of nodes? That is where acceleration comes into play, with cuGraph, which is now available through NetworkX. NetworkX is also a pip-installable library; anyone here used NetworkX? Right, a few. Under the hood it can use GPU acceleration, and for a few of the algorithms we ran a performance test on, you can see the overall execution latency reducing drastically. So again: small tweaks that lead to better results. These are two things we experimented with that led to better results, in terms of accuracy as well as reducing the overall latency. So then the question, obviously,
is: should I use a graph-based RAG system, a semantic one, or a hybrid? And I'm going to give you the diplomatic answer: it depends. But there are a few things I would like you to take home which will help you come to a decision, so you can make an educated guess that for the use case you're working on, a plain RAG system would solve the problem and you don't need a graph, or vice versa, or you need a hybrid approach. It depends on two factors. One is your data. Traditionally, if you look at retail data, FSI data, or companies' employee databases, those have a really good structure defined, so those kinds of datasets become really good use cases for graph-based systems. The other thing to think about is: even if you have unstructured data, can you create a good knowledge graph out of it? If the answer is yes, then it's worthwhile experimenting with the graph path. And it will depend on the application and use case: only if your use case requires understanding complex relationships, and extracting that information for the responses to the questions you are asking, does it make sense to use a graph, because remember, these are compute-heavy systems. You need to make sure those things are taken care of.
I am running out of time, I think, but as I said before, I gave you a 10,000-foot view; if you want the 100-foot view, where you are coding into things, all of this is available on GitHub, even the fine-tuning of the Llama LoRA model. We had a two-hour workshop; I gave you a 20-minute talk, but this whole workshop is covered in two hours as well. And lastly, join our developer programs. We release all these things on a regular basis; if you join the mailing list, you get this information based on your interests. And as my colleague mentioned, I will be across the hall at the Neo4j booth to answer questions, if any. I would love to interact with you and answer any questions you have. Thank you for your time.
[Applause]
Thank you. Thank you, Mitesh. That was fantastic. We've got another great talk coming up here. J, come on up. And if I get this right, you're going to take a philosophical perspective on this. Yes. Yes.
Hello. Yeah. Thanks. Five, four, three, two, one. Thanks. You've got this.
Wait, where's the note there? Um, yeah. Oh, it seems like there's a rehearsal view. Okay. So, I don't know if there's a way to get speaker notes onto the screens at the bottom. Do you guys know? Yeah, on full screen. Do you need the notes? Yeah, I need the notes there. I see the notes. You can also just walk through it on the side. Yeah, like that. Can you collapse this? I think it's better. Yeah. Okay.
So, hi. Hi, everybody. My name is Ching Kyong Lamb. I'm the founder and CEO of PTO.AI. A bit of background about my company: PTO.AI started two years ago with an invitation from the National Science Foundation, through SBIR grant funding investigating LLMs. We did an LLM-driven drug discovery application. Since then we have branched out to leverage what we learned about building AI systems for large corporations. We are currently building expert AI systems for several clients. The systems we build go beyond RAG: many of our clients are asking for AI systems that perform tasks like research and advisory roles based on their area of interest. Today's talk is about sharing with our fellow AI engineers what we have learned so far building this kind of system. Okay.
What is knowledge? Generally, philosophically, I'd say knowledge is the understanding and awareness gained through experience, education, and the comprehension of facts and principles. That leads to the next question: what is a knowledge graph? A knowledge graph is a systematic method of preserving wisdom by connecting pieces of it, creating a network of interconnected relationships. That's important: the graph represents the thought process and a comprehensive taxonomy of a specific domain of expertise. That's why this is very important moving forward: AI systems that think a lot and return advice, instead of just retrieving data from your database. So that brings us to KAG. What does KAG stand for? Knowledge-augmented generation. And it's different from RAG: it enhances language models by integrating structured knowledge graphs for more accurate and insightful responses, making it a smarter, more structured approach than simple RAG. KAG doesn't just retrieve; it remembers, it understands. This is different.
After interviewing a lot of my clients, who are experts in certain areas of skill, I found that there are common patterns in their thinking and decision-making processes, the things that make them experts in their area. A knowledge graph seems to be a perfect fit. So here is a graph, or a state
diagram, if you're a computer engineering grad like me. It shows the wisdom node. As you can see, the wisdom node is the core. Wisdom isn't static: it actively guides decisions and is fused by the other elements. The output from wisdom goes to decision-making, in blue: wisdom isn't passive, it guides decisions, helping us choose wisely. Then decision-making analyzes the situation, given in the green circle: decisions aren't made in a vacuum, they analyze real-world situations. That's the difference.
Now look at the wisdom inputs. Look at the feedback relationship from knowledge to wisdom, in gold. An example of knowledge-to-wisdom is all your books and encyclopedias, Wikipedia, whatever you store; once that data gets absorbed by whatever model you use up there, it needs to be synthesized and understood. That's why it's very important that wisdom is able to synthesize the data after you've ingested knowledge. That's kind of abstract, but I'll come to what I mean later. From insight: wisdom derives patterns from chaos. Some of my clients have a lot of social media around their products; how do they track their product sentiment from social media? It's chaotic, tweets from X, but from that you can see patterns of their competitors versus their own current product. That's an example of that, and I will get to it later.
When all these connected nodes work together, why do they matter? All the nodes relate to one another in an ever-enriching wisdom-storing system. This talk is about storing wisdom, right? So knowledge tells you what it is. Experience tells you what worked before. Insight invents what to try next. Like a pizza: knowledge is the recipe; experience is knowing your oven burns the crust; insight is, hey, adding honey to the crust makes it caramelize perfectly. The most important part of the knowledge graph is the feedback loop. Feedback isn't a one-way street; it learns from itself. Look at the feedback going back to all the nodes from insight and wisdom. Situations inform future wisdom; experience deepens it; insight sharpens it, like a tree growing roots: the more feedback, the stronger it gets. Now I want to ask you a question. Where do you see this circle in your life? Maybe a tough decision that taught you
something. One practical application, for leadership, is wisdom avoiding knee-jerk reactions by learning from feedback. As for personal growth: ever notice how past mistakes make you wiser? That's the loop in action. So the takeaway from this slide is: wisdom isn't a trophy you earn, it is a muscle you exercise. The more you feed it knowledge, experience, and insight, the more it guides you.
Now I will show you how this maps to a current client, because all of this is very abstract, right? One of my clients is doing competitive analysis. They used to have a modeling department doing that, but they want AI to do it, and they asked me to build the system. This is exactly what I did, with the same taxonomy for storing all of this; later on I'll talk about how multi-agent systems handle it. Here is one of the chatbots I built for my client. It's not just some chatbot: it's our wisdom-graph-powered AI, designed to turn data into strategy. So what kind of questions? Questions like "how do I beat my competitor in this market space?". That's a very sophisticated question. If you simply do RAG as the first speaker talked about, it's not going to cut it; it won't be able to answer that kind of question.
Okay, what I did is this: we retained the same taxonomy, and the wisdom is mapped to the same engine. The wisdom engine is like an orchestration agent that does a lot of the decision-making, including advising, based on what the LLM is able to see of the current situation, what to do next. For decision-making, I mapped it to a strategy generator, since these customers are doing competitive analysis. I mapped knowledge to what they have: market data. I mapped experience to their past campaigns; they have run a lot of marketing campaigns. The insight is mapped to industry insight; they have a database storing that. And then, of course, the most important is the situation: how am I doing, how is my product selling? I mapped that to competitor weakness, meaning that if you make the system aware of that, you'll probably get a very good answer, and the chatbot will probably do the right thing when advising. So from this very high-level state diagram, how do I map it to a system that runs? Well, here comes the trick. Has anybody here heard of
n8n? All right, all right, it's all good. I first encountered something similar in a past IoT project, which used Node-RED, developed by IBM. It's the same kind of thing: it's low-code, but underneath the hood there's a bunch of code, all Node.js code. For proving your concept, it's very, very flexible, and I highly recommend it. Here you can take a look at the workflow: it enabled the implementation of this complicated state diagram. There is a whole community of nodes, and one of the very powerful nodes is the AI agent node. Previously n8n was just a workflow automation tool; I'm not selling n8n here, I'm just telling you I'm using it for prototyping. Further down the road, maybe the client says they really need to scale; you now have the option to drive different models, like OpenAI models, Anthropic models, and even on-prem models. And then the key in making the state machine work is
Why are we talking about Fusion-in-Decoder? Well, I'm glad you asked, because the next big breakthrough was knowledge graphs with Fusion-in-Decoder. You can use knowledge graphs with Fusion-in-Decoder as a technique, and this improves upon the Fusion-in-Decoder paper by using knowledge graphs to understand the relationships between the retrieved passages; it helps with the efficiency bottleneck and improves the process. I'm not going to walk through this diagram step by step, but this is the diagram from the paper of the architecture: it uses the graph and then does a kind of two-stage ranking of the passages, and it helps improve efficiency while also lowering cost. And so the team took all this research and came together to build their own implementation of Fusion-in-Decoder, since we actually build our own models, to make that kind of the final piece of the puzzle. It really helped our hallucination rate, it really drove it down, and then we published a white paper with our own findings.
So then we had that piece of the puzzle, and there are a few other techniques we don't have time to go over, but the point being: we're assembling multiple techniques, based on research, to get the best results we can for our customers.
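As a rough illustration of the two-stage idea (not the team's actual implementation: the scores, the link set, and the boost value below are invented for the sketch), a first stage ranks passages by vector similarity and a second stage re-ranks using graph relationships between passages:

```python
# Stage 1 input: passages with vector-similarity scores (assumed precomputed).
passages = {"p1": 0.90, "p2": 0.70, "p3": 0.65}

# Knowledge-graph links between passages (e.g. they mention shared entities).
links = {("p1", "p3"), ("p3", "p1")}

def two_stage_rank(passages, links, top_k=2, bonus=0.3):
    """Stage 1: keep the top_k passages by vector score.
    Stage 2: boost any passage graph-linked to a kept passage, so
    related-but-lower-scored passages can surface in the final order."""
    kept = sorted(passages, key=passages.get, reverse=True)[:top_k]
    rescored = {}
    for pid, score in passages.items():
        linked = any((pid, k) in links for k in kept if k != pid)
        rescored[pid] = score + (bonus if linked else 0.0)
    return sorted(rescored, key=rescored.get, reverse=True)

print(two_stage_rank(passages, links))  # p3 rises above p2 via its link to p1
```

The graph edges let a passage that embeddings alone would rank low be promoted because it is related to a strong hit, which is the intuition behind the knowledge-graph re-ranking described above.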
So that's all well and good, but does it actually work? That's the important part, right? We did some benchmarking last year. We used Amazon's RobustQA dataset and compared our retrieval system, with the knowledge graph and Fusion-in-Decoder and everything, against seven different vector search systems, and we found that we had the best accuracy and the fastest response time. So I encourage you to check that out.
Benchmarks are really cool, but what's even cooler is what it unlocks for our customers, which are various features in the product. For one, like most graph structures, we can actually expose the thought process, because we have the relationships and the additional context: you can show the snippets, the subqueries, and the sources for how the RAG system is actually getting its answers, and we expose this in the API to developers as well as in the product. And then we're also able to have knowledge-graph-accelerated, call it multi-hop, questions, where we can reason across multiple documents and multiple topics without any struggles.
And then lastly, it can handle complex
data formats where vector retrieval
struggles where an answer might be split
into multiple pages or maybe there's a
similar term that doesn't quite match
what the user is looking for. But
because we have that graph structure
and Fusion-in-Decoder, with the
additional context and relationships,
we're able to formulate these correct
answers. So again, my main takeaway here
is that there are many ways that you can
get the benefits of knowledge graphs in
rag. That could be through a graph
database. It could be through doing
something creative with Postgres. It
could be through a search engine. But
you can take advantage of the
relationships that you can build with
knowledge graphs in your RAG system,
and as you get there you can challenge
your assumptions and focus on the
customers to get to the end
result and make the team successful.
For our team, that meant focusing on
customer needs instead of what was
hyped, staying flexible based on the
expertise of the team, and letting
research challenge their assumptions.
So if you want to join this amazing
team: we're hiring across research,
engineering, and product. We would
love to talk to you about any of our
open roles. Uh and I am available for
questions. You can come find me in the
hallway or reach out to me on Twitter or
LinkedIn. And that's all I've got for
you. Thank you so much.
[Applause]
Can you hear me now? There. I am live now.
Okay. In the giant umbrella that is
graph rag, there are many techniques,
many approaches, many ways to get things
done. There's knowledge graph
construction, there's retrieval, but
then there's the notion of going
post-RAG and thinking about different
ways of thinking about what knowledge
is, what we're actually doing in the
first place.
So, next up is my good friend Daniel
from Zep to lead us through that.
Daniel,
let's move on.
Not yet. Oh, here we go. 5 4 3 2
1. Great. Well, welcome everybody. Uh,
thank you Andrea. Andreas as well for
the intro. Uh, I'm Daniel, the founder
of Zep AI, and we build memory
infrastructure for AI
agents. And I'm going to tell you that
you're doing memory all wrong. Well, it
may not be you directly, but it may be
the framework that you're using to build
your
agents. I also think that knowledge
graphs are awesome. Otherwise, why would
we be here, right? And you should be
using them for agent memory, not just
for graph
rag. So, before I dive into expanding on
my hot takes, I want to touch on why
memory is so important. So we're
routinely building agents that forget
important context about our users. All
that dynamic data that we're gathering
from conversations between the agent and
the user. All the data, business data
from our applications, line of business
applications, etc. There's so much
richness about who the user is and yet
we're not enabling our agents with that
data. And as a result, our agents respond
generically or, even worse, hallucinate.
And this definitely isn't the path
to AGI, or, more concretely, to
retaining our
semantically similar content. Rag does
that really well. And when I when I talk
about rag here, I'm primarily talking
about vector database-based rag. uh not
necessarily graph
rag. But consider the stylized example
where we have learned a brand preference
for Adidas shoes. Unfortunately,
Robbie's Adidas shoes fall apart, so
he's rather unhappy, and the preference
changes. However, Robbie's follow-up
question to the agent, where he asks
what sneakers he should purchase, is
most similar to the first Adidas fact.
And so if we're using a vector
database, that fact may be at
the top of the search results, and the
agent responds
incorrectly. So when using rag, each
fact is an isolated and immutable piece
of content. And this is a real problem.
The three facts on the left exist with
no understanding of
causality. Semantic search can't reason
about why things change over time. And
this is why RAG approaches fail as
memory. RAG lacks native temporal and
relational
reasoning. And none of this should be a
surprise. Under the hood, we're just
working with similarity in an abstract
space. There's no explicit relationships
between these embeddings, these vector
representations of the facts that we've
generated for our memory.
However, when we look at knowledge
graphs, we can define explicit
relationships. Graphs can model the why
and at Zep, we've got them to model the
when as well behind the preference
change, which adds a temporal dimension
that your agent can reason over. And
this structural difference is
fundamental to how memory should
work. So, which is a good segue to
Graphiti. Graphiti is Zep's open-source
framework for building real-time,
dynamic, temporal graphs, and it
addresses these exact problems. Graphiti
is temporally aware and
graph-relational. You can find it on
GitHub: go to
git.new/graphiti. It has 10,000-plus
stars, almost 11,000, quadrupled within
the last six weeks. So thank you,
everybody who's tried out Graphiti and
loved it. So let's dive into how each of
these attributes of Graphiti
works. So this is the secret sauce.
Graphiti extracts and tracks multiple
temporal dimensions for each fact. It
identifies when a fact becomes valid and
when it becomes invalid. On the
right-hand side, using the example that
I illustrated a few slides back, you can
see how Graphiti would parse those
different time frames.
And this enables temporal reasoning:
what did the user prefer in February? It
enables your agent to answer questions
that RAG simply cannot handle.
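The temporal bookkeeping described here can be pictured as each fact carrying valid/invalid timestamps, so "what did the user prefer in February?" becomes a point-in-time filter. A rough sketch of the idea, with invented fields and dates, not Graphiti's actual data model or API:

```python
# Point-in-time querying over temporally tracked facts: each fact
# records when it became valid and (possibly) when it was invalidated.
# Illustrative sketch only; field names and dates are made up.

from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Fact:
    text: str
    valid_at: date
    invalid_at: Optional[date] = None  # None = still valid

facts = [
    Fact("Robbie LOVES Adidas shoes", date(2025, 1, 10), date(2025, 3, 5)),
    Fact("Robbie's Adidas shoes fell apart", date(2025, 3, 5)),
    Fact("Robbie PREFERS Puma shoes", date(2025, 3, 5)),
]

def facts_at(when: date) -> list[str]:
    """Return the facts that were valid on a given date."""
    return [
        f.text for f in facts
        if f.valid_at <= when and (f.invalid_at is None or when < f.invalid_at)
    ]

print(facts_at(date(2025, 2, 1)))  # February: the Adidas preference held
print(facts_at(date(2025, 4, 1)))  # April: the Puma preference holds
```

A plain vector store would return whichever fact embeds closest to the query; the timestamp filter is what makes "in February" answerable at all.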
And so when we look
at what rag can do, we actually sit with
a bunch of contradictory embeddings with
no resolution in the vector database. So
if we're updating the brand preference,
we'll have a new brand
preference fact in the vector
database.
However, Graphiti understands that the
broken shoes invalidate the "loves"
relationship, which creates a causal
relationship between those three events
in the previous
slide: broken shoes result in
disappointment, which results in a brand
preference
change. Graphiti doesn't delete the
history of facts as they change, as
they're invalidated, but rather marks
them invalid. And so we store a
sequence of state changes on the graph,
which allows your agent to
then reason over those state changes
over time. So, for example, the next
time I come back to the e-commerce agent
to purchase shoes, it's not going to
recommend the Adidas shoes to me.
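The "mark invalid, don't delete" behavior can be sketched as a tiny update routine: a contradicting fact closes out the old edge instead of removing it, so the state history stays queryable. All names here are hypothetical, and the real pipeline uses LLMs to detect the contradiction rather than a simple key match:

```python
# Sketch of invalidation without deletion: when a new fact about the
# same subject/relation arrives, the old edge is stamped invalid_at
# rather than removed, preserving the sequence of state changes.
# Illustrative only; Graphiti's real pipeline is LLM-driven.

history = []  # append-only log of edges

def assert_fact(subject, relation, obj, at):
    # Close out any currently valid edge for this (subject, relation).
    for edge in history:
        if (edge["subject"] == subject and edge["relation"] == relation
                and edge["invalid_at"] is None):
            edge["invalid_at"] = at  # invalidated, not deleted
    history.append({"subject": subject, "relation": relation,
                    "object": obj, "valid_at": at, "invalid_at": None})

assert_fact("Robbie", "PREFERS_BRAND", "Adidas", at=1)
assert_fact("Robbie", "PREFERS_BRAND", "Puma", at=3)

current = [e for e in history if e["invalid_at"] is None]
print([e["object"] for e in current])  # ['Puma']
print(len(history))                    # 2 -- the Adidas edge survives
```

Because the Adidas edge is retained with its invalidation time, the agent can still answer "what did Robbie prefer before?" while never recommending it now.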
And here's the resulting graph, a closer
approximation to how humans might
process and recall changing state over
time. On the graph, we can see that the
existing Adidas brand
preference is still there, but it has an
expired-at date. We also see that
there's a new brand preference for Puma
shoes, which is valid and
doesn't have an invalid-at
date. So Graphiti doesn't abandon
embeddings. They're still incredibly
useful. Graphiti uses semantic search
and BM25 full-text retrieval to identify
subgraphs within the broader Graphiti
graph. And these can be traversed using
graph traversal techniques to develop a
richer understanding of memory. So we
can find adjacent facts that might fill
out the agent's understanding of
memory. And the results are then
fused together. And so this offers a
very fast, accurate retrieval approach.
And Graphiti has a number of different
search recipes built into it, so you can
really explore different
approaches to retrieving data for
your particular agent.
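The fusion step, combining a semantic ranking and a BM25 ranking into one list before graph traversal, is commonly done with reciprocal rank fusion. Here is a generic sketch of that fusion step; the two ranked lists are invented, and Graphiti's internals may differ:

```python
# Reciprocal rank fusion (RRF): merge ranked result lists by summing
# 1 / (k + rank) per item. A standard technique for fusing semantic
# and BM25 rankings; the input lists below are made up.

def rrf(rankings, k=60):
    scores = {}
    for ranked in rankings:
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["fact_brand_change", "fact_adidas_love", "fact_shoe_size"]
bm25     = ["fact_adidas_love", "fact_brand_change", "fact_return_policy"]

fused = rrf([semantic, bm25])
print(fused[:2])  # the facts ranked highly by both lists come first
```

Items that appear near the top of both lists dominate the fused ranking, which is the behavior you want before expanding the subgraph around them with graph traversal.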
So, a little bit of a
bonus: with recent changes
we've added to Graphiti, we now allow
developers to model their business
domain on the graph, because a mental
health application will have very
different types of things it needs to
store and recall from memory than an
e-commerce agent.
And so Graphiti allows you to build
constructs, custom entities and edges,
that represent the business objects
within your particular business or
application. And so here we have an
example of a media preference, where
we've been learning all about a
user's preferred podcasts and music.
And we have defined an actual structure
here for media
preference. And what this does is it
allows us to then explicitly
retrieve media
preferences from the graph, rather than
a bunch of other noise that we might
have added to
memory. And this ontology really enables
you to bring a lot of depth to how
memory operates.
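The custom-entity idea is essentially a typed schema for memory objects, so retrieval can ask for one business type instead of sifting noise. Daniel later mentions ontologies are defined with Pydantic, Zod, or Go structs; this stdlib sketch uses dataclasses to show the same shape, with invented fields:

```python
# Sketch of a typed ontology for memory entities: declaring a schema
# (here MediaPreference) lets retrieval ask for one business-object
# type. Fields are invented for illustration; Graphiti itself takes
# Pydantic models (or Zod / Go structs) for custom entity types.

from dataclasses import dataclass

@dataclass
class MediaPreference:
    kind: str        # e.g. "podcast" or "music"
    title: str
    sentiment: str   # e.g. "likes", "dislikes"

@dataclass
class BrandPreference:
    brand: str
    sentiment: str

memory = [
    MediaPreference("podcast", "Hard Fork", "likes"),
    BrandPreference("Puma", "likes"),
    MediaPreference("music", "jazz", "likes"),
]

def retrieve(entity_type):
    """Typed retrieval: only nodes of the requested entity type."""
    return [m for m in memory if isinstance(m, entity_type)]

print([m.title for m in retrieve(MediaPreference)])  # ['Hard Fork', 'jazz']
```

The win is that a query for media preferences can never return a brand preference, no matter how semantically similar the text is.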
So I'm not advocating that you replace
RAG everywhere. RAG, graph RAG, the
various forms of graph RAG, and Graphiti
each have their strengths and ideal use
cases. The key is recognizing when you
need each. Graphiti is really strong
when you want to integrate dynamic data
incrementally into a graph without
significant recomputation. It's really
strong when you want to model your
business domain. And it's strong because
it has very low-latency retrieval:
there's no LLM in the path. If you've
tried graph RAG, it often has an LLM in
the path, and incrementally summarizing
the output from the graph can take tens
of seconds. Graphiti operates in under a
few hundred milliseconds.
And so the key is recognizing what each
solution offers to your business. And
most agent applications could use both a
RAG approach and
Graphiti. So just summing it all up,
agent memory is not about knowledge
retrieval. Temporal and relational
reasoning is so critical to coherent
memory. We need to track state changes
over time. We need to understand how
something like preferences or user
traits might change over
time and that's something that
contemporary rag solutions
lack. So we published a paper earlier
this year describing Zep's use of
Graphiti. It's a deep dive into the
Graphiti architecture and how Zep
performs as a consequence of using
Graphiti under the hood. You can
follow the link below to land at the
arXiv preprint if you'd like to take a
look, and I'm sure the slides will be
available after the talk so you can
go to the
paper. So, a quick plug for Zep. Zep
goes beyond simple agent memory to
build a unified customer record derived
from both chat history and business
data. So you can stream in user chat
conversations, but also stream in
business data from your SaaS
application, from line-of-business
applications like
CRM or billing systems, and it builds
this unified holistic view of the user
really enabling your agent to have an
accurate and very comprehensive
real-time understanding of the user so
it can solve complex problems for that
user.
So stick around for the agent memory
lunch and learn which is the next
session. It's being led by Mark Bane and
in it uh amongst other folks uh I'll be
demoing Zep's approach to
domain-specific memory built on
Graphiti's custom entities and edges. So
thanks
for listening to me. Uh we have a few
minutes so I'm happy to answer questions
and I will Yeah, if there are
any no questions. Oh, there's one over
there.
The question was, do you need to use
Zep to use Graphiti? No. Graphiti is
open source. It's available on GitHub.
You can go to the link
git.new/graphiti, and all you'll need
today is Neo4j. Our partners at Neo4j
can assist you with a Community Edition
install, and I strongly recommend their
desktop product. It's wonderful and you
can get going very
easily. Another question here.
Yeah.
So underneath the
hood, how do you invalidate graph
edges? Are we using
LLMs? So Graphiti makes extensive use of
LLMs to intelligently parse incoming
data, which could be unstructured or
structured (unstructured
conversations, unstructured emails,
structured data in JSON format), and
fuse it together on the graph. And as
part of integrating, we're using LLMs in
a pipeline to identify
conflicting facts. And so that's where
we get this ability to go from broken
shoes to disappointment to a switch in
brand preferences.
The LLM is able to understand the
emotional valence of the events that
it is
seeing. One more question.
Yeah, depending on the context. So the
question was how do we handle
revalidation of facts if a state flips
back to a prior state and so it depends
on the context. A new edge might
be created that represents a
successor fact, or the invalid-at date
might be nullified.
Yeah. Yeah, that's a really good
question: why can Graphiti do real-time
updates but Microsoft GraphRAG
cannot?
So Microsoft GraphRAG, as an
oversimplification, is
summarizing document chunks or documents
at many different levels, creating
repeated summarizations at different
levels: a summary of summaries of
summaries, etc. And that's very
computationally expensive. So if any of
the underlying data changes, you end
up with a cascading number of
re-summarizations. It's expensive and
complicated. Graphiti is designed to
identify the specific nodes and edges
that are
implicated in an
update, and then is able to invalidate,
with surgical precision, the edges
that are implicated in the
conflict. Or we just add new edges or
nodes into the graph where
it's relevant. So we're able to use a
variety of search pipelines as well
as a number of different heuristics to
make very focused changes in the
graph, which are lightweight and cheap.
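The cost difference Daniel describes can be made concrete with a toy count: in a summary-of-summaries hierarchy, one changed leaf dirties every ancestor summary up the tree, while a targeted edge update touches only the implicated edges. The numbers and names below are invented for illustration:

```python
# Toy comparison: hierarchical re-summarization vs. targeted edge
# invalidation. In a binary tree of summaries over N chunks, one
# changed chunk forces recomputing one summary per level on the path
# to the root (~log2 N LLM calls); a targeted update touches only
# the edges implicated in the conflict. Purely illustrative.

import math

def summaries_to_recompute(num_chunks):
    """Ancestor summaries dirtied by one changed chunk."""
    return math.ceil(math.log2(num_chunks))

def edges_to_update(implicated_edges):
    """Targeted update cost: just the edges found by search."""
    return len(implicated_edges)

print(summaries_to_recompute(1024))         # 10 re-summarizations
print(edges_to_update(["LOVES->invalid"]))  # 1 edge touched
```

And the re-summarizations are each an LLM call over a large context, while the edge update is a small, cheap write, which is where the "tens of seconds vs. hundreds of milliseconds" gap comes from.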
Here we go. [Question inaudible.] Yeah,
that's
a good question as well. So, there are
two ways that Graphiti
operates. This is the last question. The
first is
that Graphiti will build the ontology
for you and will very carefully try to
deduplicate edge and node types.
Secondly, as I mentioned a little bit
earlier, we allow you to
define an ontology
using Pydantic, Zod, Go structs,
etc. All right, I think we're at time.
So, thank you everybody.
And as Daniel mentioned, there's lunch
being served outside right now,
actually. So, I encourage you all to go
get a bite to eat. At 1:00, come back
into the room. We're gonna have a panel
discussion about overall agentic memory
and different implementations of it and
you're going to be part of that panel as
well, right? Yeah. So, it'll be a great
session. So, come back at one o'clock
for a longer session on memory. Hi,
I have a question. Are there
limitations? Why don't I hop down? Yeah.
Are there any limitations when it comes
to the data set the graph can handle?
Because I
You did a great job. Thank you. I will
come back about 10 minutes before 1.
Okay. Yeah.
That's right. This is the musical
section of the afternoon. Uh, hi
everyone. Welcome to the graph rag track
and we're having a lunch and learn. And
I should remind myself, as I've told
the other speakers, to stay here in the
middle of the stage. For the lunch and
learn we have the great treat that my
good friend Mark Bane, who's over
there, who's going to be taking us
through agentic memory, doing a kind of
a broad sweep, like, you know, a dive
into agentic memory. But then we're also
going
to have a panel discussion around it and
a couple of demos. This is going to be a
longer session than the rest of the
graph track. It's about 45 minutes.
Totally worth staying for the entire
time. Should be amazing. Mark, are you
ready to talk? Of course. My friend Mark
Bane, please. All
right. One, one, one. All right. How is
everyone doing here?
I'm super excited to be here with you.
Um, this is my first time speaking at AI
Engineer. And, um, we have an amazing
um, group of speakers, guest speakers.
Vasilia Markovic from Cognee. Vasilia,
um, oh, there is Vasilia; Daniel Chalef
from Graphiti and Zep AI; and Alex
Gilmore from Neo4j. Um,
The plan looks like this. I will do
a very quick power talk about
the topic that I'm super passionate
about, AI memory. Next we'll have four
live demos, and we'll move on to a
new solution that we are proposing, a
graph RAG chat arena, that I will
demonstrate, and I would like you
to follow along once it's being
demonstrated. And at the very end
we'll have a very short Q&A session.
Um there is um a Slack channel that I
would like you to join. Um so please
scan the QR code right now before we
begin and let's make sure that everyone
has access to the um to these materials.
There is a walkthrough shared on the
channel that we will go
through closer to the end of our
workshop. But I would like you to start
setting it up, if you may, on your
laptops, if you want to follow
along. All right, it's the workshop graph
chat. You can also find it on Slack and
you can uh join the channel. So a little
bit about myself. Uh so hi everyone
again. I'm Mark Bane, and I'm very
passionate about memory: what memory is,
the deep physics, and the applications
of memory across different technologies.
You can find me at markbane on
social media or on my website. And let
me tell you a little bit of a story
about myself. When I was 16 years old,
I was very good at maths, and I did math
olympiads with many brilliant minds,
including Wojciech Zaremba, the
co-founder of OpenAI. And thanks to that
deep understanding of maths and physics,
I had many great opportunities to
be exposed to the problem of AI memory.
So first of all, I would like to recall
two conversations that I had with
Wojciech and Ilya in
September 2014, when I came here to
study at Stanford. At one party I
met with Ilya and Wojciech, who back
then worked at Google, and they were
kind of trying to pitch me that there
would be a huge revolution in AI, and I
kind of followed that. I was a little
bit unimpressed back then. Right now, I
probably
kind of take it as a very big
excitement when I look back to those
times. I was really wishing good luck to
the guys who were doing deep learning,
because back then I didn't really see
this prospect of GPUs giving that huge
edge in
compute. However, during that
conversation, which was like 20 minutes,
at the very end I asked
Ilya: all right, so there is going to be
a big AI
revolution, but how will these AI
systems communicate with each
other? And the answer was very
perplexing, and kind of sets the stage
for what's happening right now. Ilya
simply answered: I don't know; I think
they'll invent their own
language. So that was 11 years ago. Fast
forward to now. Um the last two years
I've spent doing very deep research on
physics of AI and kind of like dove into
all of these most modern AI
architectures, including attention,
diffusion models, VAEs, and many other
ones. And I realized that there is
something critical, something
missing. And this power talk is about
this missing thing.
So over the last two
years I kind of followed on on my last
years of doing a lot of research in
physics, computer science, information
science and I came to this conclusion
that
memory, AI memory, in fact is any data
in any format, and this is important,
including code,
algorithms, and hardware,
and any causal changes that affect
them. That was something very
mind-blowing to to reach that conclusion
and that conclusion sets the tone to
this whole track, the graph track.
In fact, I was also perplexed by how
biological systems use memory and how
different cosmological structures or
quantum structures in fact have a
memory. They kind of
remember
and let's get back to maths and to
physics and geometry. When I was doing
science olympiads, I was really focused
on two three things. Geometry,
trigonometry and algebra. And I
realized in the last
year that, more or less, the volume
of laws in physics perfectly matches the
volume of laws in mathematics. And also
the constants in mathematics, if you
really think deeply through geometry,
match the
constants in
physics. And if you really think even
deeper, they kind of like transcend over
the all the other
disciplines. So that made me think a
lot. And I found
out that the principles that govern
LLMs are the exact same principles that
govern
neuroscience and they are the exact same
principles that govern mathematics. I
studied the papers
of Perelman. I don't know if you've
heard who Perelman is. Perelman is the
mathematician who refused
to take a $1 million award for
proving one of the
most important
conjectures about the symmetries of the
3-sphere.
And then I realized that this deep
math
of spheres and circles is very much
linked with how attention and diffusion
models work. Basically, the formulas
that Perelman reached
link
entropy with curvature, and curvature,
if you think about it, is
attention; it's gravity. So in a sense,
there are multiple disciplines where the
same things appear multiple
times. And I will be publishing a series
of papers with some amazing
supervisors who
are co-authors of two of these
methodologies,
the transformers and
VAEs. And I came to this realization
that this equation governs everything:
it governs maths, governs physics,
governs our AI memory, governs
neuroscience, biology,
chemistry, and so on and so
forth.
So I came to this equation that memory
times compute would like to be a squared
imaginary unit
circle. If that ever existed, we would
have perfect symmetries, and we would
kind of not exist, because for us to
exist, these asymmetries need to show
up. And in a sense, every single LLM,
through weights and
biases, the weights are giving the
structure. The compute comes and
transforms the data from sort of a raw
format; the compute turns it into
weights. The weights are basically, if
you take these billions of parameters,
the sort of matrix
structure of what this data looks like
when you really find relationships
in the raw data. All right. And then
there are these biases, these tiny
shifts that are kind of trying, in
a robust way, to adapt the model
so that it doesn't break apart but
still very well reflects
when we take weights and biases and we
apply scaling laws and we keep adding
more data, more compute, we kind of get
a better and better and better
understanding of the reality. In a
sense, if we had infinite data, we
wouldn't have any biases. And this
understanding is again the principle of
this track of
graph. The disappearance of biases is
what we are looking for when we are
scaling our
models. So in a sense, the amount of
memory and compute should be exactly the
same; it's just expressed in a slightly
different way. But if there are any
imbalances, then something important
happens. And I came to another
conclusion that our universe is
basically a network
database. It has a graph structure and
it's a temporal structure. So it keeps
on moving following some certain
principles and rules.
And these principles and
rules are necessarily
fuzzy. They have to be
fuzzy, because otherwise everything
would
be completely
predictable. But if it were
completely predictable, it would mean
that I would know everything about every
single one of you, about myself from the
past, and about myself from the future.
So in a
sense, it's impossible. And that's why
we have this sort of like heat diffusion
entropy
models. They allow us to exist. But
something is
preserved.
Relationships. Any single asymmetry that
happens at the quantum
level, any single tiny asymmetry that
happens preserves causal
links. And these causal links are the
exact thing that I would like you to
have as a takeaway from this workshop.
The difference between simple RAG,
hybrid RAG, any type of RAG, and graph
RAG is that we have the
ability to keep these causal links in
our memory systems. Basically, the
relationships are what preserves
causality. That's
why we can solve hallucinations.
That's why we can
optimize hypothesis generation and
testing. So we will be able to do
amazing research in biosciences,
chemical
sciences just because of understanding
that this causality is preserved within
the relationships. And these
relationships, when there are these
asymmetries that are needed, kind of
create this curvature, I would say. So
we intuitively feel it: every single one
of you is choosing the specific
workshops and talks that you go
to. Right now all of you are attending
the talk and workshop that we are
giving. It means that it matters to you,
and it means that potentially you see
value, and this value, this information,
is transcended through space and
time. It's very subjective to you or any
other
object, and I think we
really need to understand this. So LLMs
are basically these weights and biases.
So correlations give us this
opportunity to be
fuzzy. You know, actually, one thing
that I learned from Wojciech 11 years
ago was that hallucinations are the
exact necessary thing to be able to
solve a problem where you have too
little memory or too little compute for
the combinatorial space of the problem
you are solving. So you're
basically imagining: you're taking some
hypothesis based on your history,
and you're trying to project it
into the future. But you have too little
memory, too little compute, to do that.
So you can only be as good as the amount
of memory and compute you have. It means
that the missing part is something that
you can kind of curve, thanks to all of
these causal relationships and this
fuzziness. And
reasoning is the reading of these
asymmetries and the causal links.
Hence, I really believe that agentic
systems
are sort of the next big thing right now
because they are following the network
database
principle. But to be
causal, to recover this causality from
our
fuzziness, we need graph databases. We
need causal relationships. And that's
the major
thing in this emerging trend of
graph that we are here to talk about.
And I would like to at this moment
invite on stage our three amazing guest
speakers. And I would like to start with
Vasilia. Vasilia, please come over to to
the stage. Next will be
Alex and Daniel.
and I'll present something myself. All
right. So Vasilia will show us how
to load, search, and optimize memory
based on a certain use case at hand. All
right, let's test.
Um, so let's just make sure this works.
Okay. So, um,
nice to meet you all. Uh, and I'm
Vasili. I'm originally from Montenegro,
small country in the Balkans. Um,
beautiful. So, if you want to go there,
my cousins Igor and Milos are going to
welcome you. Everyone knows everyone. So
uh you know if in case you're just
curious about memory I'm building a
memory tool on top of the graph and
vector databases. My background is in
business big data engineering and
clinical psychology. So a lot what Mark
uh talked about kind of connects to
that. Um I'm going to show you a small
demo here. The demo is a Mexican
standoff between two developers,
where we are analyzing their GitHub
repositories. The data from the
GitHub repositories is in the graph, and
this Mexican standoff means that we will
let a crew of agents go analyze
their data, try to compare them
against each other, and give us a result
that should represent who we should
hire, ideally, out of
these two people. So what we're
seeing here currently is how Cognee
works in the background. Cognee
works by adding some data, turning
that into a semantic graph, and then we
can search it with a wide variety of
options. We plugged in CrewAI on top of
it, so we can pretty much do this on the
fly. So here in the background I have
a client running. This client is
connected to the system, and
it's now currently searching the data
sets and starting to build the graphs.
Let's see; it takes a couple of
seconds, but in the background we
are effectively ingesting the
data from the GitHub API, building the
semantic structure, and then letting
the agents actually search it and make
decisions on top of it. As every time
with live demos, things might go wrong,
so I have a video version in case this
does. Let's
see and I'll switch to the vid. Oh here
we go. So the semantic graph has started
generating, and as you can see we have
an activity log where the graph is being
continuously updated on the fly. Data is
being stored in memory and then uh data
is being enriched and the agents are
going and making decisions on top. So
what you can see here on the side is
effectively the agentic logic that is
reading, writing, analyzing, and using
all of these, let's say, preconfigured
weights and benchmarks to
analyze each person here. So Cognee is a
framework that's modular. You can build
these tasks. You can ingest from any
type of data source; 30-plus data
sources are supported now. You can build
any type of custom graph. You can build
graphs from relational databases and
semi-structured data, and we also have
these memory association layers inspired
by the cognitive-science approach. And
then
effectively, as we build and
enrich this graph on the fly, we see
that it's getting bigger and
more populated, and then we're
storing the data back into the graph. So
this is the stateful, temporal aspect
of it: we build the graph in a
way that we can add the data back, that
we can analyze these reports, that we
can search them, and that we can let
other agents access them on the fly. The
idea
for us was let's have a place where
agents can write and continuously add
the data in. So I'll have a look at
the graph now so we can inspect it a
bit. If we click on any node, we
can see the details about the
commits, the information from
the developers, the PRs, whatever
they did in the past, and which repos
they contributed to. And then at the
end, as the graph is pretty much filled,
we will see the final report
starting to come in. So let's see how
far we got with this. So it's
now preparing the final
output for the hiring decision task, so
let's have a look at that when it gets
loaded. We just finished this this
morning. I hoped to have a hosted
version for you all today, but it didn't
work; CrewAI was causing some troubles,
so we had to resolve this
one. So let's
see. Yes. So I will
just show you the video with the end so
we don't wait for it.
So here, towards the end,
we can see the
graph,
and we can see the final decision, which
is a green node. In the green node we
can see that we decided to hire Laszlo,
our developer who has a PhD in graphs,
so it's not really difficult to make
that call, and we see why, and we see
the numbers and the benchmarks. So thank
you. This has been a very fast
three-minute demo. I hope you enjoyed
it, and if you have some questions, I'm
here afterwards. We are open source, so
I'm happy to see new users; if you're
interested, try it. Thanks.
Woohoo. Thank you. Thank you, Vasilije. Next up is Alex. So Vasilije showed us something I call semantic memory: basically you take your raw data, you load it and "cognify" it, as they like to say. Come on, come on up, Alex. And that's the base; that's something we already are doing. And next up, Alex will show us the Neo4j MCP server. The stage is yours.
Test. Four, three, two, one. We're good. Okay. All right. Okay. So, hi everyone. My name is Alex. I'm an AI architect at Neo4j, and I'm going to demo the memory MCP server that we have available. There is this walkthrough document that I have; we'll make it available in the Slack or by some other means so that you can do this on your own. It's pretty simple to set up. What we're going to showcase today is really the foundational functionality that we would like to see in an agentic memory sort of application. Primarily we're going to take a look at semantic memory in this MCP server, but we are actively developing it and we're going to add additional memory types as well, which we'll discuss later in the presentation.
So in order to do this we will need a Neo4j database. Neo4j is a graph-native database that we'll be using to store the knowledge graph that we're creating. There is an Aura option, which is hosted in the cloud, or we can just do this locally with the Neo4j Desktop app. Additionally, we're going to do this via Claude Desktop, so we just need to download that. Then we can add this config to the MCP configuration file in Claude, and it will connect to the Neo4j instance that you create. What's happening here is that Claude will pull down the memory server from PyPI and host it in the back end for us, and then it'll be able to use the tools that are accessible via the MCP server.
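For reference, the shape of that Claude Desktop MCP config is roughly the following. This is a sketch based on the public mcp-neo4j project; the package name, command, and environment-variable names are assumptions and may differ from the walkthrough document:

```json
{
  "mcpServers": {
    "neo4j-memory": {
      "command": "uvx",
      "args": ["mcp-neo4j-memory"],
      "env": {
        "NEO4J_URL": "neo4j+s://<your-instance>.databases.neo4j.io",
        "NEO4J_USERNAME": "neo4j",
        "NEO4J_PASSWORD": "<your-password>"
      }
    }
  }
}
```

This registers the server so Claude can launch it locally and call the memory tools it exposes.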
And the final thing that we're going to do before we can actually have the conversation is to use this brief system prompt. What this does is ensure that we are properly recalling and then logging memories after each interaction that we have. So with that, we can take a look at a conversation that I had in Claude Desktop using this memory server.
This is a conversation about starting an agentic AI memory company, and we can see all these tool calls here. Initially we have nothing in our memory store, which is as expected, but as we progress through this conversation, we can see that at each interaction it tries to recall memories that are related to the user prompt, and at the end of the interaction it creates new entities and relationships in our knowledge graph. In this case, an entity is going to have a name, a type, and a list of observations: facts that we know about this entity, which get updated as we learn more, as well as the relationships between entities.
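The entity shape described here (a name, a type, a list of observations, plus relationships) maps naturally onto parameterized Cypher. A minimal sketch of how such records could be written to Neo4j; the `Memory` label and property names are illustrative assumptions, not necessarily what the MCP server uses internally:

```python
# Sketch: writing entities (name, type, observations) and relationships
# to Neo4j as parameterized Cypher. The "Memory" label and property
# names are illustrative assumptions, not necessarily what the MCP
# server uses internally.

def entity_to_cypher(name, entity_type, observations):
    """Build a parameterized MERGE statement for one entity."""
    query = (
        "MERGE (e:Memory {name: $name}) "
        "SET e.type = $type, e.observations = $observations"
    )
    return query, {"name": name, "type": entity_type, "observations": observations}

def relation_to_cypher(source, rel_type, target):
    """Build a MERGE linking two entities by name. Relationship types
    cannot be query parameters in Cypher, so the type is interpolated;
    real code should validate it against an allow-list."""
    query = (
        "MATCH (a:Memory {name: $source}), (b:Memory {name: $target}) "
        f"MERGE (a)-[:{rel_type}]->(b)"
    )
    return query, {"source": source, "target": target}

q, params = entity_to_cypher(
    "Neo4j", "Technology",
    ["graph-native database", "offers a hosted Aura option"],
)
print(q)
print(relation_to_cypher("Alex", "WORKS_WITH", "Neo4j")[0])
```

Against a live database you would execute these with the official Python driver, roughly `driver.execute_query(query, params)`.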
The relationships identify how the entities relate to one another. And this is really the core piece of why using a graph database as the context layer here is so important: we can identify how these entities are actually related to each other, which provides very rich context. And so as this goes
on, we can see that we have quite a few interactions: we are adding observations and creating more entities. And at the very end, after quite a lengthy conversation, we can say, let's review what we have so far. We can read the entire knowledge graph back as context, and Claude can then summarize it for us. So we have all of the entities we found, all the relationships we've identified, and all the facts that we know about these entities based on our conversation. This provides a nice review of what we've discussed about this company and our ideas about how to create it. Now,
we can also go into Neo4j Browser, which is available both in Aura and locally, and actually visualize this knowledge graph. We can see that we discussed Neo4j, we discussed MCP and LangGraph. And if we click on one of these nodes, we can see the list of observations we have: all the information that we've tracked throughout that conversation. It's important to note that even though this knowledge graph was created from a single conversation, we can take it and use it in additional conversations, and we can use it with other clients such as the Cursor IDE or Windsurf. So this is really a powerful way to create a memory layer for all of your applications. And so with that,
I'll pass it on. Thank you. All right,
give a round of applause to Alex.
Thank you, Alex. Next up is Daniel. I will just share a personal belief about MCPs: I was testing the MCPs of Neo4j, Graphiti, Cognee and Mem0 just before the workshop, and I'm a strong believer that this is our future. We'll have to work on that, and in a second I will be showing a mini GraphRAG chat arena. And next up, something very, very important that Daniel does is temporal graphs. Daniel is co-founder of Graphiti and Zep; they have 10,000 stars on GitHub and are growing very fast. The stage is yours. Daniel, please show us what you do. Thank you. So, five, four, three, two, one. Did that work? Seems to have. Right. So,
I'm here today to tell you that there's no one-size-fits-all memory, and why you need to model your memory after your business domain. If you saw me a little earlier talking about Graphiti, Zep's open-source temporal graph framework, you might have seen me speak to how you can build custom entities and edges in the Graphiti graph for your particular business domain, so, business objects from your business domain. What I'm going to demo today is how Zep implements that and how easy it is to use from Python, TypeScript or Go. What we've done here is solve a fundamental problem plaguing memory, and we're enabling developers to build out memory that is far more cogent and capable for many different use cases.
So I'm going to show you a quick example of where things go really wrong. Many of you might have used ChatGPT before. It generates facts about you in memory, and you might have noticed that it really struggles with relevance; sometimes it just pulls out all sorts of arbitrary facts about you. And unfortunately, when you store arbitrary facts and retrieve them as memory, you get inaccurate responses or hallucinations.
And the same problem happens when you're building your own agents. So here we go: we have an example media assistant, and it should remember things about jazz music, NPR podcasts, The Daily, etc., all the things that I like to listen to. But because I'm in conversation with the agent, or it's picking up my voice when it's a voice agent, it's learning all sorts of irrelevant things, like that I wake up at 7 a.m., my dog's name is Melody, etc. The point here is that irrelevant facts pollute memory; they're not specific to the media-player business domain. And the technical reality is that many frameworks take a really simplistic approach to generating facts. If you're using an agent framework that has memory capabilities, it's generating facts and throwing them into a vector database. And unfortunately, facts dumped into a vector database or Redis mean that when you're recalling that memory, it's difficult to differentiate what should be returned: we're just going to return whatever is semantically similar.
And here we have a bunch of facts that are semantically similar to my request for my favorite tunes. We have some good things, and unfortunately Melody is there as well, because Melody is a dog named Melody and that might have something to do with tunes, and so on, a bunch of irrelevant stuff. So basically, semantic similarity is not business relevance, and this is not unexpected. I was speaking a little earlier about how vectors are just projections into an embedding space; there are no causal or relational links between them. So we need a solution: we need domain-aware memory, not better semantic search.
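The contrast between similarity and domain relevance can be shown with a toy retriever. This is not Zep's implementation; crude word overlap stands in for vector similarity, and the fact list is invented:

```python
# Toy illustration (not Zep's implementation): why similarity-only
# recall surfaces off-domain facts, and how typing facts by
# business-domain entity lets retrieval filter them out. Jaccard word
# overlap stands in for embedding similarity; the facts are invented.

FACTS = [
    {"text": "user likes jazz music and NPR podcasts", "type": "MediaPreference"},
    {"text": "user listens to The Daily every morning", "type": "MediaPreference"},
    {"text": "user has a dog named Melody", "type": "Pet"},
    {"text": "user wakes up at 7 a.m.", "type": "Routine"},
]

def similarity(query, text):
    """Jaccard word overlap as a stand-in for vector similarity."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q | t)

def recall(query, allowed_types=None):
    """Return facts ranked by similarity, optionally restricted by type."""
    pool = FACTS if allowed_types is None else [
        f for f in FACTS if f["type"] in allowed_types
    ]
    return sorted(pool, key=lambda f: similarity(query, f["text"]), reverse=True)

# Unfiltered recall ranks whatever is lexically close, dog included;
# restricting to MediaPreference keeps memory on the business domain.
hits = recall("what music does the user like", {"MediaPreference"})
print([f["text"] for f in hits])
```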
So, with that, I am unfortunately going to be showing you a video, because the Wi-Fi has been absolutely terrible. Let me bring up the video. Okay. So, I built a little application here. It is a finance coach, and I've told it I want to buy a house. It's asking me, well, how much do I earn a year? It's asking me about what student loan debt I might have. And we'll see, on the right-hand side, that what is stored in Zep's memory are some very explicit business objects: we have financial goals, debts, income sources, etc. These are defined by the developer, and they're defined in a way which is really simple to understand. We can use Pydantic or Zod or Go structs, and we can apply business rules.
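That schema-first idea can be sketched with stdlib dataclasses standing in for the Pydantic/Zod/Go-struct schemas. Class and field names here are invented for illustration; this is not Zep's SDK:

```python
# Sketch of domain-defined entity types with business rules, in the
# spirit of registering schemas with a memory layer like Zep. Plain
# dataclasses stand in for Pydantic models so the sketch runs anywhere;
# class and field names are invented.
from dataclasses import dataclass

@dataclass
class FinancialGoal:
    """A financial goal the user is working toward, e.g. buying a house."""
    name: str
    target_amount: float   # USD; business rule: must be positive
    timeframe_months: int  # business rule: 1..600

    def __post_init__(self):
        if self.target_amount <= 0:
            raise ValueError("target_amount must be positive")
        if not 1 <= self.timeframe_months <= 600:
            raise ValueError("timeframe_months must be in 1..600")

@dataclass
class DebtAccount:
    """A recurring obligation extracted from conversation, e.g. rent."""
    kind: str
    monthly_payment: float

# The ontology you would register with the memory layer at startup.
ONTOLOGY = {"FinancialGoal": FinancialGoal, "DebtAccount": DebtAccount}

rent = DebtAccount(kind="rent", monthly_payment=5000.0)
print(rent)
```

Typed entities like these are what make the later node-type filtering possible.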
So let's take a look at some of the code here. We have a TypeScript financial-goal schema using Zep's underlying SDK. We can define these entity types, give a description to each entity type, and even define fields and the business rules for those fields, so the values that they take on. Then we can build tools for our agent to retrieve a financial snapshot, which runs multiple Zep searches concurrently and filters by specific node types. And when we start our Zep application, we register these particular objects with Zep, so it knows to build this ontology in the graph. So let's do a quick little addition here. I'm going to say that I have $5,000-a-month rent. I think it's rent. And in a few seconds, we see that Zep has already parsed that new message and captured that $5,000. We can go look at the graph; this is the Zep front end. We can see that the knowledge graph for this user has a debt-account entity, with fields on it that we've defined as developers. So again, we can get really tight about what we retrieve from Zep by filtering. Okay, so we're at time. So
just very quickly, we wrote a paper about how all of this works; you can get to it via the link below. I appreciate your time today. You can look me up afterwards. Great paper, by the way. All right. While I'm getting
ready, I would appreciate it if you could confirm whether you have access to Slack. Is the Slack channel working for you? All right. I think we are slowly running out of time, so if you have any questions for any of the speakers, please write them on Slack; we will be outside of this room and happy to answer more of these questions just after the workshop.
I'll now move on with a use case that I developed, this GraphRAG chat arena, to be specific. Before delving into agentic memory and knowledge graphs, I led a private cybersecurity lab and worked for defense clients, very big clients with very serious problems on the security side. In one project I had to navigate between something like 27 to 29 different terminals and shells, and that requires knowing lots of languages: think of the different Linux distros; every firewall and networking device usually has its own, often proprietary, shell; there is PowerShell. So you need to know lots of languages to communicate with these machines and work with such clients. And I realized that LLMs are not only amazing at translating between these languages, they are also very good for creating a new type of shell, a human-language shell. There are such shells, but they would really be excellent if they had episodic memory, the sort of temporal memory of what was happening in this shell historically. And if we have access to this temporal history, the events, then we know what the users were doing, what their behaviors are, and we can control every single code-execution function that's running, including the ones run by agents. So, with some investors and advisers of mine, I spotted a niche, something we call an agentic firewall, and I wanted to do a super quick demo of how it would work.
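What such a human-language shell with episodic memory does can be sketched as a toy. A hard-coded rule table stands in for the LLM translation step, and all names are illustrative:

```python
# Toy sketch of a human-language shell with episodic memory. A
# hard-coded rule table stands in for the LLM translation, and a
# per-session event log answers questions like "show the command we
# did three commands ago". Everything here is illustrative.
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class ShellEvent:
    """One episodic record: who ran what, in which session, and when."""
    user: str
    session: str
    command: str
    at: datetime

RULES = {  # the LLM's job in a real agentic firewall; hard-coded here
    "show me running docker containers": "docker ps",
    "show me just image and status": "docker ps --format '{{.Image}}\t{{.Status}}'",
}

class AgenticShell:
    def __init__(self, user, session):
        self.user, self.session = user, session
        self.history = []

    def run(self, request):
        """Translate a natural-language request and log it as an event."""
        command = RULES.get(request.lower(), request)  # unknown: pass through
        self.history.append(
            ShellEvent(self.user, self.session, command, datetime.now(timezone.utc))
        )
        return command

    def n_commands_ago(self, n):
        """Episodic lookup over this session's temporal log."""
        return self.history[-(n + 1)].command

sh = AgenticShell("mark", "tty1")
sh.run("show me running docker containers")
sh.run("pwd")
sh.run("ls")
print(sh.n_commands_ago(2))  # the command two before the latest: docker ps
```

Logging every execution as a typed event, per user and per session, is what would let a firewall inspect and gate agent-issued commands.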
So basically you would run commands and type pwd. I suppose lots of us had computer science classes, or we worked in a shell, and we have to remember all of these commands: "show me running Docker containers" is docker ps, right? But if you go for more advanced commands, we can... Uh, I think it's for a reason. Yeah, I think it's for a reason. One second. Sorry about that. All right, it's there. Okay, thank you. In general, I would need to know right now some command that can extract, for instance, the name of the container that's running and its status: show me just image and status. I can make mistakes, human-language fuzzy mistakes: show if Apache is running. All right: show the command we did three commands ago. So basically, if you plug the agentic memory into things like that... I think it got it wrong, but you get me, right? So if I go through
different shells and terminals, and I have this textual context of what was done, plus the context of the particular machine, of what is happening here, and it spans across all the machines, all the users and all the sessions in PTY/TTY, then I think we can really have very good context, also for security. So that space, the temporal logs, the episodic logs, is something that I see will boom and emerge. I believe that all of our agents that will be executing code in terminals, maybe not all, but the ones running on the enterprise side, will be going through agentic firewalls. I'm close to sure about that. So that's my use case. And now let's move on to the GraphRAG Chat Arena. So, you have on Slack a link to this doc, and it allows you to set up a repo that we've created for this workshop, which we'll be promoting afterwards. So about a year
ago, I met with Jerry from LlamaIndex and we were chatting for quite a while about how to evolve this conversational memory, and he gave me two pieces of advice: one, think about data abstractions; the other, think about evals. Data abstractions I solved fairly quickly, within about two months. For evals, I realized that there won't be any evals in the form of a benchmark. All of these HotpotQA-style benchmarks, it's fun, and I know that there are great papers written by our guest speakers and other folks evaluating on HotpotQA, but it's not the thing: you can't build a benchmark for a thing that doesn't exist. Basically, agentic graph memory will be a type of memory that evolves, so you don't know what will evolve. And if you don't know what will evolve, you will need a simulation arena, and that will be the only right eval.
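The arena described next reduces to one interface over swappable memory backends; a minimal sketch, where the backends are in-memory stand-ins rather than the real Mem0/Graphiti/Cognee or MCP integrations:

```python
# Sketch of the arena idea: one common interface over swappable memory
# backends, so the same simulated conversation can be replayed against
# each backend and the resulting graphs compared. The backends here are
# in-memory stand-ins, not real integrations.
from typing import Protocol

class MemoryBackend(Protocol):
    name: str
    def add(self, text: str) -> None: ...
    def search(self, query: str) -> list: ...

class FakeBackend:
    """Stand-in backend: naive substring search over stored texts."""
    def __init__(self, name):
        self.name, self._store = name, []
    def add(self, text):
        self._store.append(text)
    def search(self, query):
        return [t for t in self._store if query.lower() in t.lower()]

class Arena:
    """Routes the same chat turns to whichever backend is selected."""
    def __init__(self, backends):
        self.backends = {b.name: b for b in backends}
        self.current = next(iter(self.backends))
    def switch(self, name):
        self.current = name
    def add(self, text):
        self.backends[self.current].add(text)
    def search(self, query):
        return self.backends[self.current].search(query)

arena = Arena([FakeBackend("graphiti"), FakeBackend("cognee")])
arena.switch("graphiti")
arena.add("Mark is passionate about memory architectures")
print(arena.search("memory"))
```

Because all backends share one interface, the simulation can score how each one stores and retrieves the same turns.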
So, one year fast-forward, and we've created a prototype of such an agentic memory arena. Think about it like WebArena, but for memory. Let me quickly show you: you can go to this repository, I did a fork of it; there is Mem0, there is Graphiti, there is Cognee, and there will be two approaches. One approach is through the repo, the library itself, and the other is through MCPs, because we don't really know which will work out better, the repos or the MCPs, so we'll need to test both approaches, and we need this arena for that. You basically clone the repo, and we use ADK for it, so we get this nice chat where you can talk to these agents, and you can switch between agents. So I want to talk with Neo, and there is a Neo4j agent running behind the scenes, and a Cypher graph agent running behind the scenes, and I can for now switch between these agents. Maybe I'll increase the font size a little bit. The Neo agent basically answers questions about this amazing technology, graphs, specifically Neo4j.
And I can switch to Cypher, and then an agent that is excellent at running Cypher queries talks with me. I'm writing: add to the graph that I'm Mark and I'm passionate about memory architectures. Basically, what it does is run the layers that are created by Cognee, by Mem0, by Graphiti and all the other vendors of semantic and temporal memory solutions, or specifically created by the MCP server that Alex was demonstrating, the Neo4j MCP server. So I'm really looking forward to how this technology evolves. But what I quickly wanted to show you is that it already works; it has the essence of being this agentic memory arena. I can ask my graph questions, and the agent goes to the connection. And you know what's amazing? It's just one Neo4j graph on the back end, and all of these technologies can be tested for how the graphs are being created and retrieved. When I think of it, it's like the most brilliant thing we can do with agentic memory simulations. So I get answers from the graph. Here is the graph. I can rerun the commands to see what's happening on this graph. And let me just move
on. Next, I would like to add to the graph that Vasilije will show how to integrate Cognee. So I add new information, and the Cypher agent writes it to the graph. Then I want to do something else. It's super early stage still, but then I switch to Graphiti and I can repeat the exact same process: I can now, using Graphiti, search what I just added, and I can switch between these different memory solutions. So that's why I'm so excited about it. We do not have time to practice it together and do the workshop, but I'm sure we will write some articles, so please follow us. And if you have any questions, pass them on to Slack. I will ask Andreas whether we have time for a short Q&A or we need to move it to a breakout or outside of the room. Take like five minutes? Five minutes. All right. So that's all for now, for today. I would really like Vasilije, Daniel and Alex to come back to the stage so you can ask any of us. Please direct your questions to any of us and we'll try to answer them. Yeah, let's go. Hi,
I'm Lucas. I want to ask a fundamental question: how do you decide what is a bad memory over time? Because as a developer, and as a person, we evolve our line of thought, right? So one thing that you thought was good three or ten years ago may not be good today. So how do you decide? A very good question. So I'll answer, and maybe you guys can help; I will answer in a very scientific way. Basically, the one that causes a lot of noise, the noisy one, doesn't make a lot of sense. You decrease noise by redundancy and by relationships. So, the fewer the relationships and the more the noisiness: in a sense, a not-well-connected node has a potential of not being correct. But there are other ways to validate that. Would you like to follow on? Yeah,
sure. A practical way: we let you model the data with Pydantic. So you can load the data you need and add weights to the edges and nodes; you can do something like temporal weighting, you can add your own custom logic, and then effectively you would know how your data is evolving in time, how it's becoming less or more relevant, and what set of algorithms you would need to apply. So the idea is not to solve it for you, but to help you solve it with tooling. But yeah, it depends on the use case, I would say. Yeah, I don't have much to add; I think it's a great explanation. What I would add is that there are missing causal links, and missing causal links are probably a good indicator of fuzziness.
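The connectivity heuristic from that answer can be sketched in a few lines. The toy graph and threshold are invented; a real system would combine this with redundancy, temporal weighting, and causal-link checks:

```python
# Sketch of the panel's heuristic: a weakly connected node (few
# relationships) is more likely to be noise than a well-connected one.
# Degree is a crude proxy; the edge list and threshold are invented.
from collections import Counter

EDGES = [
    ("alice", "neo4j"), ("alice", "graphrag"), ("bob", "neo4j"),
    ("graphrag", "neo4j"), ("melody", "alice"),  # "melody": weakly linked
]

def connectivity_scores(edges):
    """Node -> degree normalized to [0, 1]; low scores flag suspect memories."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    top = max(degree.values())
    return {node: d / top for node, d in degree.items()}

scores = connectivity_scores(EDGES)
suspect = sorted(n for n, s in scores.items() if s < 0.5)
print(suspect)
```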
Yeah. Next question. Can you hear me? How would you embed security or privacy into the network or the application layer? If there's a corporation, they have top-secret data, or I have personal data that is a graph; I want to share that, but not all of it. Oh, that's a really good one. I think I'll answer that very briefly. Basically, you do have to have that context; you have to have the decisions and intentions of colonels, of majors, of the CISO, of anyone in the enterprise stack, and in a sense it also gets fuzzy and complex. So I expect this to be a very big challenge, and that's why I want to work on it. But I'm sure that applying ontologies, the right ontologies first of all, to this enterprise cybersecurity stack really provides guardrails for navigating this challenging problem and decreasing this fuzziness and these errors. Thank you. Yeah, I would also just add that all these applications are built on Neo4j, and in Neo4j you can do role-based access controls, so you can prevent users from accessing data that they're not allowed to see. It's something that you can configure with that.
This question is for Mark. Yeah, go on. You were about to say something; please go. Just one thing: we also noticed that if you isolate per graph, per user, keep it very physically separate, for us it really works well; people react to that really well. So that's one way. Yes: independent graphs, personal graphs. Yeah. Mark, in your earlier presentation you mentioned an equation that related gravity, entropy and something else, and another relating memory and compute to I squared. Could you show those two again and explain them?
Of course, yeah, if we have time; beyond that, it's probably for a series of papers to properly explain. So one is: memory times compute equals I squared. The other is that if you take all of the attention, diffusion and VAEs, which are doing the smoothing, it preserves the sort of asymmetries. Very briefly, let's set up the vocabulary. First of all: curvature equals attention equals gravity. This is the simplest, most important principle here. When writing these papers, we are trying to define these very tightly. The next three: diffusion, heat, entropy; it's the exact same thing, we just need to align definitions, and if it's not the exact same thing, if there are other definitions, we need to show what's really different. Now, if you think about attention, it shows the sort of pathways towards certain asymmetries. If you take a sphere and start bending it, trying to extend it, two things happen: entropy increases and curvature increases, in a sense. And Perelman proved that you can bend these spheres in any way; 4D, 5D and higher-dimensional spheres were already solved, and he solved it for the 3D sphere. These equations suggest that, basically, there won't be any other architectures for LLMs: it will be just attention, diffusion models and VAEs, maybe not just VAEs, but something that leaves room for biases. All right, thank you all. I really appreciate you coming. I hope it was helpful. Thank you to the guest speakers; we'll answer questions outside of the room. Appreciate that.
Okay, we've got maybe a 10-minute break before the next speaker is up, but we've got a bit of setup to do, so this is a great time to grab a coffee. Michael is going to be talking to us next about practical graph RAG, right? Yeah. Hi. If you are staying for the next session, I believe you have to go out and get your badge scanned, because that's how they keep track of how many people are at each session. So, that was the directive. Thank you. Thank you. We'll hack later. Thanks.
Thank you. Thank you. Yeah, thank you everyone. We're closing out this session for turnover. Appreciate you. Have a very nice presentation; I like your way of presenting. Please go out and get your badges scanned if you're going to stay in the room. Let's head out, because they need to get ready. So, I will be there. Let's talk. I'll be there in 30 seconds. Okay, just going to grab my stuff.
Am I a robot?
So very interesting. Of course. Let's go. Yes, we will. Go on. Love the physics connection you built there. Do you write blogs about this? Can I discover some of your work? I will be writing such deep-science, theoretical-physics papers; first theoretical physics. I mean, I have drafts, okay, that are being reviewed. It's, one second, it's really challenging to question general relativity; it's built on Perelman and all of this quantum physics. I only feel comfortable doing that when I have very good supervisors, so it takes time. The way it was is, I was starting with Cognee and Graphiti and Mem0; we were building things, but I want to get into science. So follow me on LinkedIn and on the website. I can probably write some posts about it briefly. But what we are trying to do is the deep theoretical-physics papers first: papers one to three will be about that, and papers three to five will be about relating that to transformers, diffusion models, heat transfer and all of these other things. And in a sense, as for doing popular science, someone else will take care of that; we are trying to do real research. What's your website again? It's markbane.com. I can show it to you. One second. You could take a picture. Of course. Yeah, of course. Can you, one second, can you take it? Yeah. Thanks. Yeah. Yeah, we'll take care.
Did you get the mic back? Oh, okay. Cool. Okay. So, he's going to be on one. Okay. Okay, go ahead and do the count back from five. Hello. Hello. Michael, count back from five, please. A little slower. Oh, good. Give me a countdown back from 10. 10, 9, 8, 7, 6, 5, 4, 3, 2, 1. All right, you're good. So go ahead. This part I'll fix. Sure. Okay, count back from five for me. Five, four, three, two, one, zero. Is that okay? Yeah, that's fine. Okay, you know best. Beautiful. All right. Thank you very much.
Thank you.
So, at the beginning we have all these slides about the papers. Do we want to skip some of them? Because I had this summary slide that had all the search stuff on one slide. How do you want to do it? I mean, I'm just going to click through them. Okay. Yeah. So, and I'll go here, right? From here. Yeah. Switch. She said you need to stay at the podium, so you have to step back and then I go here, because of the live streaming. Okay. It's going to YouTube. It's fine. Cool. Okay. So, I'll just intro; I think we should be up here together for the intro. Yeah. But then if you don't want to stand here and watch me, you can sit down.
So you can just do this and then... Yeah. Do you have a clicker? No, I forgot mine at home. Do you have one? Yeah, I got one. I mean, let's try. Oh, I think just one USB stick. Oh, USB. We're always ready. Good. Now we can... That's good. Hello everyone. Hope you had some good coffee. Please come in. We are talking about graph RAG today; that's the graph RAG track, of course. And we want to look at patterns for successful graph applications, for making LLMs a little bit smarter by putting knowledge graphs into the picture. My name is Michael Hunger; I'm VP of product innovation at Neo4j. My name is Stephen Chin; I lead developer relations at Neo4j. And this is fun, because we're both already authors, we've been friends for years, and we finally get to co-author a book: we're co-authoring Graph RAG: The Definitive Guide for O'Reilly. So basically we didn't sleep this past weekend because we had a book deadline. Yep.
So, I'm going to talk a little bit, at a high level, about what graph RAG is, why it's important, and what we're seeing in the industry, and then Michael's going to drill down into all of the details and patterns and give you a bunch of takeaways and things you can do. If you want to know how to do graph RAG, Michael's deep dive is probably the best introduction you can get, so I'm also excited. Awesome, let's get going. Okay, so the case for graph RAG is where we're going to start. The challenge with using LLMs, and with other patterns for this, is basically that they don't have the enterprise domain knowledge, they don't verify or explain their answers, they're subject to hallucinations, and they have ethical and data-bias concerns. They are very much like our friendly parrot here: they behave and act like parrots in all these ways, except for being a cute bird.
So we want to do better than this with graph RAG and figure out how we can use domain-specific knowledge for accurate, contextual and explainable answers. Really, I think what a lot of companies, and the industry as a whole, are figuring out is that it's a data problem: you need good data, you need data you can power your system with. One of the patterns for this is RAG, where you stick your external data into a RAG system and pull it back from a database. But vector databases and plain RAG fall short because they lack your full data set; they only pull back a fraction of the information via vector-similarity algorithms. Typically, the modern vector databases everyone's using are easy to get started with, but they're not robust or mature; they don't give you the scalability and fallback you need to build a strong, robust enterprise system. And vector similarity is not the same as relevance: the results you get back from a basic RAG system are related to the topic, but they're not complete and typically not very relevant, and it's very hard to explain what's coming out of the system. So we need an answer lifeline: graph RAG.
What graph RAG does is bring the knowledge, the context and the environment to what LLMs are good at. You can think of this kind of like the human brain: our right brain is more creative, it builds things and does more extrapolation of information, whereas our left brain is the logical part, the one that does reasoning, holds facts, and can enrich data. And that part is built off of knowledge graphs. A knowledge graph is a collection of nodes, relationships, and properties. Here's a really simple example of a knowledge graph: you have two people who live together, and a car. But when you look into the details, it's actually a little more complex than it seems at first, because while they both have the car, the owner of the car is not the person who drives it. This is kind of like my family: my wife does all the bills, but then she hands me the keys whenever we get on the freeway; she hates driving. Knowledge graphs are also a great way of capturing really rich data. Here's an example of the Stack Overflow data built into a knowledge graph, where you can see all of the rich metadata and the complexity of the results. And we can use this to
evolve rag into a more complex system,
basically graph rag, where we get better
relevancy. We're getting more relevant
results. we get more context because now
we can actually pull back all of the
related information by graph closeness
algorithms. We can explain what's going
on because it's no longer just um
vectors. It's no longer statistical
probabilities coming out of a vector
database. We actually have nodes. We
have structure. We have semantics we can
look at and we can add in security and
role-based access on top of this. So
it's context-rich, it's grounded. This gives us a lot of power, and it gives us the ability to start explaining what we're doing: we can visualize it, we can analyze it, and we can log all of it. This is one of the initial papers, the GraphRAG paper from Microsoft Research, where they went through this and showed that you could actually get not only better results but lower token costs; it was actually less expensive to do a graph RAG approach. There have been a lot of papers since then showing all of the research and interesting work going on in the graph RAG area, and this is just a quick view of the different studies and results that are coming out, even from the early data.world study, where they showed a three-times improvement with graph RAG. And the analysts are even showing how graph RAG is trending up. So
this is the Gartner hype cycle from 2024, and you can see generative AI is kind of on the downtrend. RAG is getting over the hump, but graph RAG and a bunch of these things are actually providing and breathing more life into the AI ecosystem. So there are a lot of great reports from Gartner showing that it's grounded in facts and it resolves hallucinations; together, knowledge graphs and AI are solving these problems. And it's getting a lot of adoption by industry leaders, by big organizations who are taking advantage of this and producing production applications and making it work, like LinkedIn customer support, where they wrote a great research paper showing that using a knowledge graph for customer support scenarios gave them better results and allowed them to improve quality and reduce the response time for getting back to customers. Median per-issue resolution time was reduced by 28.6%. I mentioned the data.world study, which was basically a comparison of doing RAG on SQL versus RAG on graph databases, and they showed a three-times improvement in accuracy of LLM responses. And let's chat about patterns, Michael, because I think everyone's here to learn how to do this. Exactly. So let's look at how to do this. Right.
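Before the patterns: the knowledge-graph building blocks mentioned above (nodes, relationships, properties) and the two-people-and-a-car example can be sketched in plain Python. This is a minimal illustration, not any actual library's data model; the names are invented.

```python
# A minimal in-memory knowledge graph: nodes with properties plus typed
# relationships, mirroring the two-people-and-a-car example. Names invented.
nodes = {
    "p1": {"label": "Person", "name": "Ann"},
    "p2": {"label": "Person", "name": "Ben"},
    "c1": {"label": "Car", "brand": "Volvo"},
}
relationships = [          # (start, type, end)
    ("p1", "LIVES_WITH", "p2"),
    ("p1", "OWNS", "c1"),      # the owner...
    ("p2", "DRIVES", "c1"),    # ...is not the one who drives it
]

def neighbours(node_id):
    """All relationships touching a node: the basis of graph 'closeness'."""
    return [r for r in relationships if node_id in (r[0], r[2])]
```

Asking for the car's neighbours surfaces both the OWNS and DRIVES edges, which is exactly the distinction a flat vector store would blur.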
So if you look at graph RAG, there are actually two sides to the coin. First, of course, you don't start in a vacuum; you have to create your knowledge graph. So we see basically multiple steps to get there. Initially you get unstructured information. You structure it and put it into a lexical graph, which represents documents, chunks, and their relationships. In a second step, you can then extract entities, using for instance LLMs with a graph schema, to pull entities and relationships out of that graph. And in a third phase, you enrich this graph, for instance with graph algorithms doing things like PageRank, community summarization, and so on.
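The first of those steps, building the lexical graph, can be sketched as a few lines of Python. This is a rough sketch only: chunking by word count stands in for real text splitters, and the relationship names are illustrative.

```python
# Rough sketch of the lexical-graph step: split text into fixed-size chunks,
# link each chunk to its document (PART_OF) and to its successor (NEXT).
def build_lexical_graph(doc_id, text, chunk_size=20):
    words = text.split()
    chunks = [" ".join(words[i:i + chunk_size])
              for i in range(0, len(words), chunk_size)]
    nodes = [{"id": f"{doc_id}-chunk-{i}", "text": c}
             for i, c in enumerate(chunks)]
    rels = [(n["id"], "PART_OF", doc_id) for n in nodes]
    rels += [(nodes[i]["id"], "NEXT", nodes[i + 1]["id"])
             for i in range(len(nodes) - 1)]
    return nodes, rels

nodes, rels = build_lexical_graph("doc1", "word " * 50)
```

The entity-extraction and enrichment phases would then run over these chunk nodes.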
And then, when you have this built-up knowledge graph, you do graph RAG as the search mechanism, either with local search or global search and other ways. Right? So let's first look at the first phase, knowledge graph construction, a little bit. Like always in data engineering, if you want higher-quality outputs, you have to put in more effort at the beginning; basically, nothing comes for free.
There's no free lunch, after all. But what you do at the beginning pays off multiple times, because what you get out of your unstructured documents is high-quality, highly structured information, which you can then use to extract contextual information for your queries, which allows the rich retrieval at the end. Okay. And so after seeing graph RAG being used by a number of users and customers, and after looking at research papers, we saw a number of patterns emerging in terms of how we structure our graphs, how we query these graphs, and so on. So we started to collect these patterns and put them on graphrag.com, and I wanted to show what this looks like. We have example graphs; each pattern has a name, a description, and context, and we also see queries that are used for extracting this information. So for instance, here's a mix of a lexical graph and a domain graph, and then we have the query that fetches this information.
Let's look at the three steps in a little more detail on the graph-model side. On one side, for lexical graphs, you have documents and their elements. That could be something as simple as a chunk. But if you have structured documents, you can also do something like: you have a book, which has chapters, which have sections, which have paragraphs, where the paragraph is the semantically cohesive unit that you would use to, for instance, create a vector embedding that you can use later for vector search. But what's really interesting in the graph is that you can connect all these things up, so you know exactly who's the predecessor and who's the successor of a chunk, and who's the parent of an element. And using something like vector or text similarity, you can also connect these chunks by a k-nearest-neighbor or similarity graph, where you store similarities between chunks: you put a relationship between them with a weighted score for how similar the two chunks are. And then you can use all these relationships when you extract the context in the retrieval phase, to for instance find related chunks by document, by temporal sequence, by similarity, and other things. Right? So
that's the lexical side. It looks like this: if, for instance, you have an RFP and you want to break it up in a structured way, you create the relationships between these chunks, or these subsections of the text, do the vector embeddings, and then you do it at scale, and you get a full lexical graph out of that. The next phase is entity extraction, which is also something that has been around for quite some time with NLP, but LLMs take it to the next level with their multi-language understanding, their high flexibility, and their good language skills for extraction. So you provide a graph schema and an instruction prompt to the LLM, plus your pieces of text. With large context windows, you can put in 10,000 to 100,000 tokens for extraction.
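The schema-plus-prompt shape described here can be sketched as follows. This is a hypothetical illustration: the schema, the prompt wording, and the mocked model reply are all invented, and no real LLM is called.

```python
import json

# Sketch of the extraction step: the prompt carries a graph schema and the
# text; the model is expected to reply with JSON we can load into a graph.
SCHEMA = {"entities": ["Person", "Company"], "relationships": ["WORKS_FOR"]}

def extraction_prompt(schema, text):
    return (
        "Extract entities and relationships that match this schema.\n"
        f"Schema: {json.dumps(schema)}\n"
        'Answer as JSON: {"entities": [...], "relationships": [...]}\n'
        f"Text: {text}"
    )

prompt = extraction_prompt(SCHEMA, "Alice works for Acme Corp.")
# Stand-in for a real llm(prompt) call:
fake_reply = ('{"entities": [{"type": "Person", "name": "Alice"},'
              ' {"type": "Company", "name": "Acme Corp"}],'
              ' "relationships": [["Alice", "WORKS_FOR", "Acme Corp"]]}')
graph = json.loads(fake_reply)
```

The parsed entities and relationships would then be merged into the graph alongside the lexical chunks.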
You can also put in already-existing ground truth. For instance, if you have existing structured data where your entities, let's say products or genes or partners or clients, already exist, then you can pass those in as part of the prompt, so that the LLM doesn't do an extraction but more of a recognition-and-matching approach: you find your entities, then you extract relationships between them, and then you can store additional facts and information as part of relationships and entities as well. So basically, in the first part you have the lexical graph, which represents document structure; in the second part you extract the relevant entities and their relationships. If you already have an existing knowledge graph, you can also connect this to it. So imagine you have a CRM where you already have customers, clients, and leads in your knowledge graph, but you want to enrich it with, for instance, protocols from call transcripts; then you connect this to the existing structured data as well. So
that's also a possibility. And then, in the next phase, you can run graph algorithms for enrichment, which can, for instance, do clustering on the entity graph; then you generate something like communities, where an LLM can generate summaries across them. Especially that last one is interesting, because what you identify is actually cross-document topics. Each document is basically a vertical representation of information, but this looks at which topics recur across many different documents, so you find these topic clusters across documents as well. Cool. So if you look
at the second phase, the search phase, which is basically the retrieval part of RAG: what we see here is that in a graph RAG retriever, you don't just do a simple vector lookup to get results returned. What you do is an initial index search (it could be vector search, full-text search, hybrid search, spatial search, or other kinds of searches) to find the entry points in your graph. And then, starting from these entry points, you follow the relationships up to a certain degree, or up to a certain relevancy, to fetch additional context. This context can come from a user question, or it can be external user context that comes in. For instance, when someone from, let's say, your finance department is looking at your data, you return different information than when someone from the engineering department is looking at it. So the retriever also takes this external context into account for how much and which context you retrieve, and then you return it to the LLM to generate the answer. Not just text fragments, like you would in vector search; you also return these more complete subsets of the contextual graph to the LLM. And modern LLMs are actually more trained on graph processing, so they can deal with these additional pattern structures, node-relationship-node patterns, that you provide as additional context. And of course, as I mentioned, you can enrich the graph using graph algorithms: you can do things like clustering, link prediction, PageRank, and other things to enrich your data. Cool. Let's
look at some practical examples. We don't have too much time left. One is knowledge graph construction from unstructured sources. There are a number of libraries; you've already heard about some today from people doing these kinds of things. One thing we built is a tool that allows you to take PDFs, YouTube transcripts, local documents, web articles, and Wikipedia articles, and it extracts your data into a graph. Let me just switch over to the demo here. So this is the tool. I uploaded information from different Wikipedia pages, YouTube videos, articles, and so on, and here's for instance a Google DeepMind extraction. You can use a lot of different LLMs here. And then, if you want to, under graph enhancement you can provide a graph schema as well. So you can for instance say a person works for a company, add these patterns to your schema, and then the LLM uses this information to drive the extraction. And if you look at the data that has been extracted from DeepMind, which is this one here, we can see two aspects from the Wikipedia article: one is the document with the chunks, which is this part of the graph, and the second is the entities that have been extracted from the article. So you actually see the connected knowledge graph of entities, which are companies, locations, people, and technologies. It followed our schema to extract this. And then, if I want to run graph RAG, you have a number of different retrievers here: vector retriever, graph and full-text, entity retrievers, and others that you can select. All of this is also an open-source project, so you can just go to GitHub and have a look at it. And so I
just ran this before, because the internet is not so reliable here. So: what has DeepMind worked on? And I get a detailed explanation. And then, if I want to, I can look at the details here. It shows me which sources it used: the AlphaFold Wikipedia article, another PDF. I see which chunks have been used, which is basically the full-text and hybrid search, but I also see which entities have been used from the graph. So from an explainability perspective, I can really see these are the entities that were retrieved by the graph retriever and passed to the LLM, in addition to the text connected to those entities. So it gets a richer response, and then you can also do evals on that, with Ragas for example. While I'm on the screen,
let me just show you another thing we worked on, which is more of an agentic approach, where you put these individual retrievers into a configuration with domain-specific retrievers that run individual Cypher queries. For instance, if you look at, let's say, this one: it has the query here, and basically a tool with inputs and a description, and then we can have an agentic loop using these tools, doing graph RAG with each individual tool, taking the responses, and then doing deeper tool calls. I'll show you a deeper example in a minute. So this is basically what I showed you; this is all available as open-source libraries, and you can use it yourself from Python as well. It also showed NeoConverse, which is able to output not just text but also charts and other visualizations, network visualizations as well.
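The core retrieval pattern behind these tools, an index search for entry points followed by relationship traversal for context, can be sketched in a few lines. The tiny graph, the naive keyword matching standing in for a real index, and the hop limit are all invented for illustration.

```python
# Sketch of graph RAG retrieval: find entry points with an index search
# (here a naive keyword match), then follow relationships a fixed number
# of hops to gather connected context.
nodes = {
    "q1": "Question about Cypher syntax",
    "a1": "Accepted answer with example",
    "u1": "User profile: Michael",
    "t1": "Tag: neo4j",
}
rels = [("q1", "ANSWERED_BY", "a1"),
        ("a1", "WRITTEN_BY", "u1"),
        ("q1", "TAGGED", "t1")]

def entry_points(query):
    terms = query.lower().split()
    return [n for n, text in nodes.items() if any(t in text.lower() for t in terms)]

def expand(starts, depth=2):
    seen, frontier = set(starts), set(starts)
    for _ in range(depth):
        frontier = ({e for s, _, e in rels if s in frontier} |
                    {s for s, _, e in rels if e in frontier})
        seen |= frontier
    return seen

context = expand(entry_points("cypher"), depth=2)
```

In an agentic setup, a retriever like this would be wrapped as one tool among several, each with its own inputs and description.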
And what's interesting in the agentic approach is that you don't just use vector search to retrieve your data; you break down a user question into individual tasks, extract parameters, and run these individual tools, which then run either in sequence or in a loop to return the data. You get these outputs back, and for each of these things, different individual tools are called and used here. And the last thing that I want to
show is the GraphRAG Python package, which encapsulates all of this, the construction and the retrieval, in one package. So you can build the knowledge graph, implement the retrievers, and create the pipelines here. And here's an example where I pass in PDFs plus a graph schema, it runs the import into Neo4j, and then I can visualize the data later in a Python notebook. And with that, I leave you, with one second left, with the takeaway: on graphrag.com you'll find all of these resources and a lot of the patterns, and we'd love to have contributions and would love to talk more. I'm outside at the booth if you have more questions. Yeah.
So, that was great, and I think you're getting it all from the expert with all the tooling. Michael's team actually builds a lot of the tools, like the knowledge graph builder. Very excited you all came to the graph RAG track, and I hope to chat with you all more. If you have questions for me and Michael, just meet us at the Neo4j booth across the way. Thank you. Thank you.
Thank you. Thank you, Michael and Stephen. That was fantastic. My big takeaway was that there is so much to look at. It's amazing.
[Music]
This is the one for power. Wi-Fi. Okay, thank you so much. So this next talk is going to take us through a multi-agent framework for network analysis, is that right? Correct. Fantastic. Looking forward to it. Yes. Awesome.
I'm so sorry. Just so we could get sound. All right. One, two, three, four, five. Five, four, three, two, one. Microphone check. One, two, one, two. All right. Good afternoon,
everyone. My name is Hola Mabad. I'm a product guy from Cisco. So my presentation is going to be a little more producty than techie, but I think you're going to enjoy it. I've been at Cisco working on AI for the last three years, and I work in a group called Outshift. Outshift is Cisco's incubation group; our charter is to help Cisco look at emerging technologies and see how these emerging technologies can help us accelerate the road maps of our traditional business units. By training I'm an electrical engineer; I dabbled in network engineering, enjoyed it, and did that for a while, but over the last three years I've focused on AI. Our group also focuses on quantum technology, so quantum networking is something we work on. And if you want to learn more about what we do, we're Outshift at Cisco; you can learn more about that. So for today, we're going
to dive into this real quick. Like I said, I'm a product guy, so I usually start with my customers' problems, trying to understand what they're trying to solve for, and then from that work backwards towards creating a solution. As part of the process, we usually go through an incubation phase where we ask customers a lot of questions, then we come up with prototypes, we do A/B testing, and then we deliver an MVP into a production environment; once we get product-market fit, that product graduates into Cisco's businesses. So this customer had this issue. They said: when we do change management, we have a lot of challenges with failures in production. How can we reduce that? Can we use AI to reduce that? So we double-clicked on that problem statement, and we realized it was a major problem across the industry. I won't go into the details here, but it's a big problem. Now, for us to solve the problem, we wanted to understand: does AI really have a place here, or is it just going to be rule-based automation that solves this? We looked at the workflow and realized there are specific spots in the workflow where AI agents can actually help address the problem. And so we
highlighted steps three, four, and five, where we believe AI agents can increase the value for customers and reduce the pain points they were describing. So we sat down together with the teams and said, let's figure out a solution for this. The solution consists of three big buckets. The first is that it has to be a natural language interface, where network operations teams can actually interact with the system. And not just engineers but also systems: for example, in our case we built this system to talk to an ITSM tool such as ServiceNow, so we actually have agents on the ServiceNow side talking to agents on our side. The second piece is a multi-agent system that sits within this application. We have agents that are tasked with doing specific things: an agent that's tasked with doing impact assessment, doing testing, doing reasoning around potential failures that could happen in the network. And the third piece, where we're going to spend some of the time today, is the network knowledge graph. We have the concept of a digital twin in this case: what we're trying to do is build a twin of the actual production network, and that twin includes a knowledge graph plus a set of tools to execute testing. And so we're
going to dive into that in a little bit. But before we do, we had this challenge: okay, we want to build a representation of the actual network. How are we going to do this? Because if you know networking pretty well, networking is a very complex technology. You have a variety of vendors in a customer's environment, a variety of devices: firewalls, switches, routers, and so on.
And then all of these different devices are spitting out data in different formats. So the challenge for us was: how can we create a representation of this real-world network, using knowledge graphs, in a data schema that can be understood by agents? The goal was to create an ingestion pipeline that can represent the network in such a way that agents can take the right actions in a meaningful and predictive way. To proceed with that, we had three big buckets of things to consider. First, we had to think about what the data sources are going to be. Again, in networking there are controller systems, there are the devices themselves, there are agents on the devices, there are configuration management systems; all of these are collecting data from the network, or they have data about the network. Now, when they spit out their data, they're spitting it out in different languages: YANG, JSON, and so on.
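The ingestion idea described here, many vendor formats funneled into one shared shape agents can query, can be sketched as per-source adapters. Field names on both sides are invented for illustration; this is not any actual vendor's format.

```python
# Sketch of ingestion normalization: one adapter per source translates each
# vendor's own format into a shared record shape that agents can query.
def from_config_export(raw):      # e.g. a JSON config export
    return {"device": raw["hostname"], "interface": raw["if_name"],
            "state": raw["oper_status"]}

def from_telemetry(raw):          # e.g. a streaming telemetry sample
    return {"device": raw["node"], "interface": raw["port"],
            "state": "up" if raw["link"] else "down"}

records = [
    from_config_export({"hostname": "fw1", "if_name": "eth0", "oper_status": "up"}),
    from_telemetry({"node": "sw2", "port": "Gi0/1", "link": True}),
]
```

Once every source lands in the same record shape, loading the records into a graph and querying them with agents becomes uniform.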
That's another set of considerations to keep in mind. And then, in terms of how the data is actually coming out: it could be streaming telemetry, it could be configuration files in JSON, it could be some other form of data. How can we look at all three of these considerations and come up with a set of requirements that allows us to build a system that addresses the customer's pain point? So from the product side, the team had a set of requirements. We wanted a knowledge graph with multimodal flexibility: it can handle key-value pairs, it understands JSON files, it understands relationships across different entities in a network. The second thing is performance: if an engineer is querying the knowledge graph, we want instant access to the information about a node no matter where that node is located. That was important for our customers. The third thing was operational flexibility: the schema has to be such that we can consolidate into one schema framework.
The fourth piece is where the RAG part comes into play. We've been hearing a lot about graph RAG today; we wanted this to be a system with the ability to have vector indexing in it, so that when you want to do semantic searches at some point, you can do that as well. And then, in terms of ecosystem stability, we want to make sure that when we put this in the customer's environment, there's not going to be a lot of heavy lifting for the customer to integrate with their systems; and again, it has to support multiple vendors. So those were the requirements from the product side, and then our engineering teams started to consider some of the options on the table: Neo4j, obviously the market leader, and various other open-source tools. At the end of the day, the engineering teams decided to do some analysis around this. I'm showing the table on the right-hand side; it's not an exhaustive list of the things they considered, but these were the things they looked at to see what the right solution was to address the requirements coming from product. We all kind of centered around the first two here, Neo4j and ArangoDB, but for historical reasons the team decided to go with ArangoDB, because we had some use cases in the security space, kind of recommendation-system-type use cases, that we wanted to continue using. But we are still exploring the use of Neo4j for some of the use cases coming up as part of this project. So we settled on ArangoDB for this, and
we eventually came up with a solution that looks like this. We have this knowledge graph solution; this is an overview of it. On the left-hand side we have the production environment: we have the controllers, Splunk (which is a SIEM system), and traffic telemetry coming in. All of them feed into this ingestion service, which is doing an ETL, transforming all of this information into one schema: OpenConfig. The OpenConfig schema is designed primarily around networking, and it helps us because there's a lot of documentation about it on the internet, so LLMs understand it very well. So this setup is primarily a database of networking information that has the OpenConfig schema as the primary way to communicate with it, whether that's natural language communication through an individual engineer or the agents that are interacting with the system. And we built this in the form of layers. If you're into networking, there is a set of entities in the network that you want to be able to interact with, so we have layered this up in such a way that if there's a tool call, or a decision to be made about a test, for example, let's say you want to do a test for configuration drift, you don't need to go through all of the layers of the graph; you just go straight down to the raw configuration file and do your comparisons there. If you're trying to do a test around reachability, for example, then you need a couple of layers: maybe you need the raw configuration layer, data plane layers, and control plane layers.
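This layer-selection idea can be sketched as a simple mapping from test type to the layers it needs. The layer names, test names, and toy data here are illustrative, not the actual schema from the talk.

```python
# Sketch of the layered lookup: each test type declares the graph layers it
# needs, so an agent fetches only those instead of walking the whole graph.
LAYERS_BY_TEST = {
    "config_drift": ["raw_config"],
    "reachability": ["raw_config", "data_plane", "control_plane"],
}

graph = {                 # toy layered graph keyed by device
    "raw_config":    {"fw1": "permit tcp any host 10.0.0.5"},
    "data_plane":    {"fw1": {"routes": 42}},
    "control_plane": {"fw1": {"bgp_peers": 3}},
}

def fetch(test_type, device):
    """Return only the layers this test type declares it needs."""
    return {layer: graph[layer][device] for layer in LAYERS_BY_TEST[test_type]}
```

A config-drift check touches one layer; a reachability check pulls three. That selectivity is what keeps agent tool calls cheap.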
So it's structured in a way that when the agents make their calls to this system, they understand what the request from the system is, and they can go to the right layer to pick up the information they need to execute on it. So this is a high-level view of what the graph system looks like in layers. Now
I'm going to switch gears and go back to the system. Remember, I described a system that has agents, a knowledge graph and digital twin, as well as a natural language interface. So let's talk about the agentic layer. Before I talk about a specific agent in this system, in this application: we are looking at how we're going to build a system based on open standards for all of the internet, and this is one of the challenges we have within Cisco. We are part of an open-source collective that includes all of the partners you see down here: we have Outshift by Cisco, we have LangChain, Galileo, and all of these members who are supporters of the collective. What we're trying to do is set up a system that allows agents from across the world (so it's a big vision) to talk to each other without having to do the heavy lifting of reconstructing your agents every time you want to integrate them with another agent. It consists of identity, a schema framework for defining an agent's skills and capabilities, the directory where you actually store these agents, how you compose the agents at both the semantic layer and the syntactic layer, and how you observe the agents in process. All of these are part of this collective vision as a group, and if you want to learn more, this is at agntcy.org. And I also have a slide
here that talks about it. There's real code you can leverage today, and if you want to contribute, there's a GitHub repo you can go to and start contributing, or just use the code. There's documentation available as well, and there are sample applications that let you see how this works in real life. We know there's MCP, there's A2A; all of these protocols are becoming very popular, and we integrate with all of them, because the goal, again, is not to create something bespoke. We want to make it open, so everyone can create agents and make those agents work in production environments. So, back to the specific application we're talking about: based on this framework, we built and delivered a set of agents as a group. We have five agents right now as part of this application. There's an assistant agent that's kind of the planner, orchestrating things across all of these agents, and then we have other agents that are all based on ReAct reasoning loops. There's one particular agent I want to call out here: the query agent. This query agent is the one that actually interacts directly with the knowledge graph on a regular basis. We had to
fine-tune this agent, because we initially started by attempting to use RAG to do some querying of the knowledge graph, but that was not working out well. So we decided that, for immediate results, we were going to fine-tune it. We did some fine-tuning of this agent with some schema information as well as example queries. And that helped us reduce two things. The first was the number of tokens we were burning: before that, the AQL queries were going through all of the layers of the knowledge graph, and in a reasoning loop that was consuming lots of tokens and taking a lot of time to return results. After fine-tuning, we saw a drastic reduction in the number of tokens consumed, as well as in the amount of time it took to come back with a result. So that helped us there. I'm going to pause here; there's been a lot of slideware. I want to show a quick demo of what this actually looks like, tying together everything from the natural language interface, to the interaction with an ITSM system, to how the agents interact, to how the system collects information from the knowledge graph and delivers results to the customer. Okay. So the scenario
we have here is a network engineer who wants to make a change to a firewall rule. They have to do that to accommodate a new server in the network. What they need to do is first start from the ITSM side, so they submit a ticket in ServiceNow. Now, the UI I'm showing right here is the UI of the actual system we built, the application we built. We have ingested information about the tickets here in natural language, so the agents can start to work on this. I'm going to play a video here just to make it more relatable. The first thing that happens is that the first agent is asked to synthesize the information in a summarized way, so the engineer can quickly understand what to do. The next action being requested is to create an impact assessment. Impact assessment here just means: I want to understand whether this change will have any implications beyond the immediate target area. That gets summarized, and we now ask the agent responsible for this particular task to go and attach this information to the ITSM ticket. So I say: attach this information about the impact assessment to the ITSM ticket. So that's been done. Now
the next step is to actually create a test plan. Test plans are one of the biggest problems our customers are facing: they run a lot of tests, but they miss the right tests to run. These agents are able to reason through a lot of information about test plans across the internet and, based on the intent collected from the ServiceNow ticket, come up with a list of tests you have to run to make sure this firewall rule change doesn't create problems in the production environment. So, as you can see here, this agent has gone ahead and listed all of the test cases that need to be run and the expected results for each of the tests. So, we're
going to ask this agent to attach this information again back to the ITSM ticket, because that's where the approval board needs to see this information before they approve the implementation of this change in the production environment. So, we can see here that the information has now been attached back to the ITSM ticket by this agent: two separate systems, but agents talking to each other. Now, the next step is to actually run all of these test cases. In this case, the configuration file that is going to be used to make the change in the firewall sits in a GitHub repo. So we're going to open a pull request for that config file and take that information. This is the GitHub repo where we open the pull request; we take the link for that pull request and paste it into the ticket, so that when the execution agent starts doing its job, it can pull from that and use it to run the tests. So at this moment, we're going to start running the tests. We're going to ask this agent to go ahead and execute the tests. And
so, "I have attached", sorry, I don't have my glasses, "I've attached my change candidates to the ticket; can you go ahead and run the tests?" What happens here, if you look on the right-hand side of the screen, is a series of things. The first thing is that the agent called the executor agent looks at the test cases, then goes into the knowledge graph and takes a snapshot of the most recent information about the network. It's now going to take the pull request it pulled from GitHub and the snapshot it just took from the knowledge graph, put them together, and then run all of the individual tests one at a time. So we can see that it's running test one, test two, test three, test four. All of this is happening in what we call a digital twin. A digital twin, again, is a combination of the knowledge graph and a set of tools that you can use to run the tests. An example of a tool here could be Batfish, or RouteNet, or some other tool that you use for network engineering purposes. So once all of these tests are completed, this agent is going to generate a report about the test results: which tests actually passed, which ones failed, and, for the ones that failed, some recommendations of what you can do to fix the problem. I'm going to skip ahead here to get this done quickly because of time. So it's attached the results to the ticket, and this is the report it produces for the tests that were run. This execution agent created a report about all of the different test cases that were run by the system. So, a very quick, short
demo here. There's a lot of detail behind the scenes, but I can answer some questions offline. The couple of things I want to leave us with, before I get to the end, is that evaluation is very critical here for us to understand how this delivers value to customers. We're looking at a variety of things: the agents themselves, the knowledge graph, the digital twin, and what we can actually measure quantifiably. For the knowledge graph, we're looking at extrinsic metrics in particular, not intrinsic ones, because we want to map this back to the customer's use case. So this is the summary of what we see in terms of evaluation metrics. We are still learning; for now, this is an MVP. But what we are learning so far is that those two key building blocks, the knowledge graph and an open framework for building agents, are very critical for us to build a scalable system for our customers. And so, I'm going to stop with eight seconds to go. Thank you for listening to me. And if you have questions, I'll be out there.
Yeah.
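A rough sketch of the executor-agent loop described in the demo. Every name here (snapshot_network, run_test, the test-case fields) is an illustrative assumption, not the actual system's API; the real system uses tools like Batfish inside a digital twin.

```python
# Hypothetical sketch of the executor agent's test loop: take a network
# snapshot from the knowledge graph, combine it with the proposed config
# change, run each test case, and produce a pass/fail report.
# All function and field names here are illustrative assumptions.

def snapshot_network(knowledge_graph: dict) -> dict:
    # Stand-in for querying the knowledge graph for the current network state.
    return {"firewall_rules": knowledge_graph.get("firewall_rules", [])}

def run_test(test_case: dict, snapshot: dict, change: dict) -> bool:
    # Stand-in for a digital-twin tool (e.g. Batfish) evaluating one test
    # against the snapshot plus the proposed change.
    proposed = snapshot["firewall_rules"] + [change]
    return test_case["check"](proposed)

def execute_test_plan(test_cases, knowledge_graph, change):
    snapshot = snapshot_network(knowledge_graph)
    results = [{"name": t["name"], "passed": run_test(t, snapshot, change)}
               for t in test_cases]
    # The report that would be attached back to the ITSM ticket.
    return {"passed": [r["name"] for r in results if r["passed"]],
            "failed": [r["name"] for r in results if not r["passed"]]}

kg = {"firewall_rules": [{"port": 443, "action": "allow"}]}
change = {"port": 8080, "action": "allow"}
tests = [
    {"name": "https still allowed",
     "check": lambda rules: any(r["port"] == 443 and r["action"] == "allow"
                                for r in rules)},
    {"name": "no rule without a port",
     "check": lambda rules: all(r.get("port") is not None for r in rules)},
]
report = execute_test_plan(tests, kg, change)
print(report)  # both tests pass for this change
```

The key design point from the talk survives even in this toy form: the tests run against a snapshot plus the proposed change, never against the live network.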
Thank you so much, Ola. That was fantastic. I love getting a deep dive, and a perspective from a product guy is always good to hear; it keeps us grounded in reality. Thank you.
So, closing out the day on the graph track, it's going to be my friend Tom Smoker from WhyHow.AI, who's going to be talking about legal documents and how to turn those into knowledge, right? Yeah, part of it.
Yeah. Awesome.
I have bad eyesight and an Australian accent, so this is not a great combination; I appreciate you working with me. Thank you. Hello everyone. I am here to talk about graph rag, as we're here for the track, but I'm talking about what we do in the legal industry and what it looks like to turn documents into graphs and use those graphs in the age of AI. I tend to have to qualify why I'm at
places. There are various reasons why I could be talking today; you choose the one that you want. But generally: I've been working on graphs for about a decade, I have a good relationship with the Neo4j team, and I've been doing graphs for a long time. But primarily I am the technical founder of a company called WhyHow.AI, and we find cases first, before lawyers do, and then give them to lawyers. How we find these cases is a process that I'll go through, but we use a combination of graphs, multi-agent systems, signals, etc. I'll detail today how we do that at a high level and a low level, and I'm happy to answer questions at any point. This is broadly what we do. We
work in law. This is an example: we find class action/mass tort cases before other people do. We have agents. We have graphs. We store that information. We scrape the web. We qualify that with a proprietary process. And we deal with lawyers every day and understand exactly how they think and build these cases. The cases I'm referring to would be like: many people used a pharmaceutical product, that product has caused them harm, science has proven that harm, and we can collect those people and collectively sue the pharmaceutical company. So we support the law firms
that do that. And as I'm talking, everyone here for a graph rag track can start to imagine that I'm starting to develop a bit of a schema: I'm describing individuals; I'm describing products; those products have ingredients; those ingredients have concentrations; those concentrations may have an ID number. And all of a sudden you can start to imagine there is this large, networked, schematized body of data that has particular points in it that are very valuable, very visual, and very useful to domain experts. So I'm going to start to use
some definitions, because knowledge graphs have been around for a long time, and ABK would know that more than I would. But I started my PhD, well, my masters, in graphs in 2016, and it was not nearly as popular as it is now, and it's fascinating to see how far it's come. But I do think it's important for me to define how we use them and how we think about them. Broadly, to me, graphs are
relations. That's part of the visual element; there's a backend element as well. But the benefit of using graphs is that I can see what is connected to something else, I can be explicit about what is connected to something else, and I can do mass analytics on what is connected to something else. That, all the way from seeing it down to doing large-scale analytics on it, is the value of the relations. And when I use relations, it's not necessarily node to node. It can be node to node to node. It can be multi-hop. It can be as varied and as forked and as distributed as you want. This is why we use graphs in our process. Broadly, throughout the process
of running this company, and previously as an academic, this is what I think is easy about graphs: people look at them and go, well, that's fantastic, I have a great understanding of what this is. And someone else says, me too. And there isn't necessarily a consistency in what those two people just said; they may have a different understanding of what is represented. Broadly throughout my career, these are the things that are difficult about graphs. You can say they're nodes connected by edges. You can say they're distributed. You can say they're backed up. There's a variety of ways in which people use the data they have, the way they store it, and the way they talk about it. And now, as graphs have become very necessary and consistent for things like graph rag, for things like structured data, etc., more and more people are coming to this previously relatively niche area that, even at the time, wasn't necessarily agreed upon. So I do like to define what it is we're using. So graphs and multi-agent systems: these are the two things that I want to define, as there's a variety of ways that people use them. This is how we use multi-agent systems, right?
So multi-agent systems range all the way from very specifically defining what you're dealing with, chaining those together, and using an LLM to glue it all together, to, in our case, breaking a complicated white-collar workflow down into a specific set of steps that I can IO-test. Each of those steps has different requirements, different frequencies, different state, and that state can be controlled, often in our case, by a graph.
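As a toy illustration of that decomposition (not the production code), each step is a plain function whose input and output can be tested in isolation, with shared state carried in a small graph-like structure:

```python
# Toy decomposition of a workflow into IO-testable steps, with state
# carried in a simple graph (dict of nodes and edges). All names here
# are illustrative assumptions.

def extract_entities(text: str) -> list[str]:
    # Step 1: in production this might be an LLM call; here, a stub
    # that keeps capitalized words.
    return [w.strip(".,") for w in text.split() if w[0].isupper()]

def update_graph(graph: dict, entities: list[str]) -> dict:
    # Step 2: add entities as nodes; the graph controls downstream state.
    for e in entities:
        graph["nodes"].add(e)
    return graph

def summarize(graph: dict) -> str:
    # Step 3: produce output from the controlled state, not the raw text.
    return "entities: " + ", ".join(sorted(graph["nodes"]))

# Each step can be IO-tested on its own:
assert extract_entities("Acme sued Beta.") == ["Acme", "Beta"]

graph = {"nodes": set(), "edges": set()}
graph = update_graph(graph, extract_entities("Acme sued Beta."))
print(summarize(graph))  # entities: Acme, Beta
```

The point of the sketch is the testing seam: if step 2 misbehaves, you can pinpoint it without re-running steps 1 or 3.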
This is why we like to use them when we're building an application for the legal industry. Not sure if you know this, but lawyers don't really like when things are incorrect. The whole industry is basically: make this very specifically correct and proper, and definitely in the right language. So when it comes to building applications, probabilistic large language models don't necessarily work for that in isolation. I need to have very specific control and structure and schema for the way we build these systems, and I need to be able to test and pinpoint exactly what is going right and wrong at any point in time. Here are some of the issues with
that. And we've heard about, well, at least I've heard about, multi-agent systems a lot; I'm sure other people have as well. Sometimes one part of the workflow is much more important than another part. Sometimes there are parts of the workflow I don't particularly care about. There are also agents in the world: "agent" implies that these things are very capable, but I can write a bad prompt very easily, and all of a sudden I have a bad agent. So when it comes to which agents I trust: very few. We spend a lot of time guardrailing as much as we possibly can. We spend time making sure that the memory is not just immediate but episodic. We spend time capturing the information state over time and then pruning that state. And again, to bring it back: capturing, expanding, pruning, structuring, and then querying state happens for us in a graph format, because having the structure, having the extensibility, and then having the ability to remove that extension is really important for us. Then finally, and I'm trying not to make this too in-depth or too heavy on numbers:
but 95% accuracy for a single agent, I
think, is a tall order at this point.
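To make the compounding of per-step accuracy concrete, here is a toy calculation, assuming each step succeeds independently (an assumption, but it matches the arithmetic quoted in the talk):

```python
# Expected end-to-end success rate of n sequential steps, each
# independently correct with probability p.
def chained_accuracy(p: float, n: int) -> float:
    return p ** n

for n in range(1, 6):
    print(n, round(chained_accuracy(0.95, n), 3))
# Five 95%-accurate steps in a row already drop to about 0.774,
# i.e. the ~77% figure quoted for a five-step workflow.
```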
Maybe people have entirely accurate agents; I'm very happy for you. I don't have that right now. I have systems I can put in place, like guardrails and humans in the loop, that can bring these agents to a point where they are accurate enough that people are willing to use them. However, five 95%-accurate agents chained together sequentially? That's 77% expected accuracy. That's not that many agents in a row. If you think about a workflow, that's five steps, and if each of those five steps is 95% accurate, already quite a hard thing to ask, especially if there's an LLM involved, then only 77% of the time does the workflow get to the end in the way that I want. If I were to summarize my main problem, it would be decision-making under uncertainty throughout the process of building these systems. That's the background; that's how we understand these systems.
We use multi-agent systems, and we're naturally skeptical. We use graphs every day, and we have a natural skepticism of exactly how these things are stored and structured, but we use them specifically and consistently in the way that we like. So I am using the term agent because everyone's using the term agent. We build litigation agents. Litigation is a process I'm going to summarize: we work with class action/mass tort law; as I said before, get everyone who was harmed together, put that harm all in one place, and then sue a pharmaceutical company. Now, we don't do any of the litigating or suing as a company, but we do support the lawyers who do. We do that in a few different ways. Here is one of the ways that we
look at the legal industry. Without exception, everything needs to be perfect. It needs to be accurate. It needs to be written in the correct way. And once you have that correct format, there are creative arguments: the best lawyers are very, very detail-oriented and then very, very creative in the way they apply those details to a case. For example, there was an issue with Netflix. They were capturing data from their users, as they do and as they should; I'm a Netflix user, they capture my data, and I appreciate it because they give me better shows to watch. However, there is a legal limit to how much information they can capture from me, and you cannot surpass that legal limit, or you can, but then you can go into the process of litigation. Now, if they surpass that, there needs to be a precedent for why someone could say, "You cannot capture this much information." And the particular precedent I'm referring to is from many years ago: Blockbuster was sued for keeping too many details about the literal physical DVDs that people rented. That is a reasonably creative way to say, "Look, I remember that Blockbuster happened, and what Netflix is doing isn't that different. It may be in a digital format, it may be at a larger scale, it may feed an algorithm instead of a person recommending it. However, that is an interesting application of the same idea."
So these problems, necessary accuracy, then creativity on top of that accuracy, while all of that information is kept in separate places and a lot of the creativity comes from latent knowledge in the expert's head, come to a bit of a fore when you say: well, I have these probabilistic agents that you could argue aren't that creative; I have these agents that most of the time do a pretty good job and can be creative in a way that, frankly, can be quite frustrating, especially to a lawyer. So this butts heads with exactly how lawyers want to deal with this information. And again, I'm painting with a very broad brush. I'm not a lawyer; my co-founder is. If any lawyer in the audience is offended, I do apologize. But this is broadly what I've seen to be accurate. We help with legal discovery
as well. Like I described before, there could be an unnamed pharmaceutical company. Pharmaceutical companies are great, but this one happens to have done some harm. And it is in their best interest to give all of the information to the law firm and describe it, well, not exactly describe it, but hand it over in as many ways as possible: here is 500 gigabytes of emails that don't matter, go nuts. Figure out exactly what happened at what point and bring up the information. Now, that is a challenge at the moment. A lot of the time it's manually reviewed. There are shortcuts and processes by necessity, because a lot of these lawsuits are on a particular timeline; it is physically impossible to read all of the information that is produced in the discovery process of a lawsuit. However, and this is just a generic graph I use because I'm not allowed to use the ones I'm currently working on, if you can take all of that information, extract it, and structure it in such a way that it is consistent, all of a sudden that mountain of emails becomes a lot of information I can immediately dismiss and a bunch of genuinely useful information that I can look at.
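As a toy sketch of that dismiss-or-keep triage over a document mountain, with every field name, document, and threshold invented purely for illustration:

```python
# Toy triage over "discovery" documents: extract structured fields,
# then keep only records matching the case schema. The schema and
# field names here are illustrative assumptions.

documents = [
    {"id": 1, "text": "Quarterly catering invoice", "mentions_product": False},
    {"id": 2, "text": "Email: concentration raised to 12% in batch 7",
     "mentions_product": True, "concentration": 0.12},
    {"id": 3, "text": "Holiday party RSVP list", "mentions_product": False},
]

CASE_SCHEMA = {"max_safe_concentration": 0.10}

def relevant(doc: dict) -> bool:
    if not doc.get("mentions_product"):
        return False  # immediately dismissible
    return doc.get("concentration", 0.0) > CASE_SCHEMA["max_safe_concentration"]

keep = [d["id"] for d in documents if relevant(d)]
print(keep)  # only document 2 survives the triage
```

In the real pipeline the extraction step would be the hard part (and would feed a graph, not a list), but the shape of the payoff is the same: most of the mountain is dismissed by schema, and a human looks only at what remains.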
And not just that: when it comes to a graph, I can actually augment the information from discovery and then give that visual to the expert, who can make an immediate decision. I'm going to loop back to the example I was describing before, the pharmaceutical example. If ingredients are at a certain concentration, that concentration is a problem, and that problem happened at a certain time, there are only going to be a few people in that graph of potentially millions of nodes that are a problem, in the same way that there are only a few people in that mountain of documents that were a problem. However, now I've changed the form factor such that I can specifically hone in on what matters, and not just hone in in a data-driven way: I can hone in in a visual way and in natural language, such that the lawyer or the expert who knows exactly what that natural language means can make a decision that's data-driven. This is also a process where, if we can build this information exactly, and I'm giving the fundamentals here, this is a graph rag talk, we want to bring this graph in. The graph I just described is not that large, it has a consistent schema, and it can be relatively easily retrieved. I'm not going to say that retrieval is completely solved. I am going to say we have agents in production right now that lawyers can query in natural language to further understand the lawsuit and the individuals they're representing. Now we get to case
research. So that was more discovery: a mountain of documents. Case research would be: a lot of people used said product, and they're complaining about it online. And this is a lot of the value of our company and what we do. People complain all the time. They can shout into the void of a niche subreddit, or they can go on Twitter, or they can be on a forum they're used to, or on IRC; they can be wherever they want. But they're using similar language about a specific thing. And so when it comes to traditional case research, that information isn't really discovered. A lot of the time it happens through talking to another individual, subscribing to a newsletter, etc. How do people find the
information? So, this is a graphic I've taken from our website, which I promise looks significantly better than the slides I make. Here is how case research works for our business: we start by scraping the entire web. Now, anyone can scrape the entire web. It's doable; it's a technical challenge, but it's doable, and you can scrape it at a frequency, use services, etc. What we do is scrape the web and then qualify the leads from that scraping. We filter all of the information down to specifically what the individuals want. We have schemas that we work out with particular law firms and lawyers, and those schemas get us down to just the information they care about. And look, maybe there is, but right now, at least for me, there's no such thing as a perfect case. There's no such thing as a perfect lawsuit. It depends on the lawyer or the partner or the firm who's willing to take it on. So it is not a problem of best; it's a problem of specific and personalized. And that is where things like LLMs are particularly useful at the moment.
That's where things like multi-agent systems are fantastic. That's where things like structured information and graphs come in: all of a sudden, a different lawyer can have a different multi-agent system and a different graph that backs up the specific way they like to work, as opposed to previously compromising on the way everyone else liked to work. And from there, once we've honed down to just the signals they care about, the qualified signals that are specific to them, that signal can then generate a report, and that report can be entirely specific to the lawyer as well. So when it comes to report generation, again, it's a multi-agent system backed by a schema; that schema is consistent and pruned, and that schema looks like controlled state in a graph that can build the report the lawyer wants. Every report is going to be different, but the structure is going to be the same for each lawyer, and each lawyer has a different process.
What I'm broadly describing is mass-scraping the web down to a specific signal generated just for the lawyer. It's an entirely personalized service that's been automated, and that is what we do: this is part of how we are able to manage and use state and graphs and multi-agent systems to bring the information together. Cool. I'm going to go through, I think, one case study that I want to describe, just conscious of time.
This happens. It's not great; no one really wants it to. There may be situations in which a bunch of people bought a car really wanting it to catch fire; we don't deal with them. What we do find is that there are people who are driving their car, and it starts to smoke, and then it catches fire, and that's not the behavior they intended. It was not on the brochure when they bought it. It's not what they want. Those people immediately go and complain, as they should. They go to government websites. They go to carcomplaints.com. They're on a specific subreddit or forum. And once we can start to track that, which we can, and once we can start to scrape and then structure and then schematize and then analyze, we can start to build a density of complaints for a specific vehicle, for a specific year, for a specific problem. And that density is a combination of how many complaints multiplied by the velocity of complaints: a certain amount per month over some number of months. All of a sudden, we get to the point where we're finding these leads particularly early. And now, as we're building models, we're finding these leads earlier and earlier, such that we don't necessarily need the velocity straight away. We can start to figure out what the previous lawsuits were, which are all public and very well documented, and exactly what happened in that process. And so a large law firm can take a lawsuit on maybe eight or nine months after people start to complain, if they want to. For us, we can find it within about 15 minutes, and then it generally takes about a month to be confident that this is the signal you want, so we can find things significantly earlier:
that process again is scraping the web, filtering down, and producing the specific report. This is an example that we did, and again, we deal with what the lawyers want. So this lawyer made the case that people's cars are catching fire and they don't really want them to; those are the cases he would like to take on. It's of a certain amount of money, of a certain make and model, in a certain jurisdiction, etc. Those specific filters, that schema, can be applied throughout the entire process. That's basically the graph.
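The complaint-density signal described earlier (complaint count multiplied by velocity) could be sketched, purely illustratively, like this; the vehicle names, the data, and the exact weighting are all made up for the example:

```python
# Illustrative complaint-density signal: total complaint count for a
# (vehicle, year, problem) bucket multiplied by its velocity
# (complaints per active month). The exact formula is an assumption.
from collections import Counter

complaints = [  # (vehicle, year, problem, month observed) -- fake data
    ("RoadsterX", 2023, "fire", "2025-01"),
    ("RoadsterX", 2023, "fire", "2025-02"),
    ("RoadsterX", 2023, "fire", "2025-02"),
    ("RoadsterX", 2023, "fire", "2025-03"),
    ("SedanY", 2022, "rattle", "2025-02"),
]

def density(bucket):
    rows = [c for c in complaints if (c[0], c[1], c[2]) == bucket]
    months = Counter(m for *_, m in rows)
    count = len(rows)
    velocity = count / max(len(months), 1)  # complaints per active month
    return count * velocity

print(density(("RoadsterX", 2023, "fire")))  # 4 * (4/3) ≈ 5.33
print(density(("SedanY", 2022, "rattle")))   # 1 * 1 = 1.0
```

A per-lawyer schema would then sit on top of this: the same signal, filtered by make, model, jurisdiction, and dollar value before it ever becomes a report.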
Each of these lawyers has a specific graph that they want. And not just that: they can filter and feed back that information, so it's not just a static graph. The benefit of a graph structure, or one of the benefits of a graph structure, I should say, is that it's an extensible schema that I can update, that I can query across, and that I can understand. So while we are dealing
with rag, I would say we have less of a chat-rag interface. While the lawyers definitely do appreciate that, a lot of what we have when it comes to rag, or retrieval-augmented generation, is generating these reports, because as much as a lawyer does want an answer, what they also want is the form factor they're used to. And so all of these graphs are consistently made and built each day, and then some subgraph of that broader monolithic structure is brought in and composed into a report that a lawyer can action. As for what's next, and I'll talk about the future a little, what I described is kind of what we're doing, but this is what we're doing: find lawsuits early, compensate harm, and then people can have that information if they want to. We're able to do this entirely technically: we're able to scrape the web, structure it, etc. We're able to iteratively build up a schema as we want to. This
is not just a GenAI problem, and I think this is an important thing that I've seen around this conference and people may be seeing: GenAI is not better than machine learning, and LLMs are not, you know, better than traditional ML systems, but there are situations in which one is fantastic and one is not. If you look at multi-agent systems, and again, I was previously an academic in multi-agent systems and no one ever listened to me, so this is a bizarre situation, when you used to stitch multi-agent systems together, somewhere along that workflow you would have to stop and say: this is not doable, because I cannot plug these two bits of information together. It's too probabilistic, or it's too random, or it's too inconsistent, or the way to describe it is not a binary feature; it's really just, I want to kind of type what I want. Now, with LLMs, you can. But for us it's very much not an LLM-filtered system; it's an ML-filtered system that LLMs have allowed us to pipe together, such that you can actually provide value completely end to end, which I think was previously not doable. And for us, again, we've been using graphs for a long time. For us, the ability to iteratively build that graph and prune that graph, so that every single report gets better because we're able to manage the state, is why people like working with us: we can consistently follow and track exactly what they want, specifically. Cool. I think I'm just
about at time; I got in a little early, but that's been the talk. I'm happy to talk to anyone about the specifics, graph rag, multi-agent systems, etc., but that's how we use the process. Thank you very much.
[Applause]
Thank you everyone for attending the
sessions in this room. We do have to
clear this room out to set it for
tomorrow. So we thank you and there's
plenty of
workspace out there.
I promise they still have Wi-Fi out
there. It's probably even better. They
might even have coffee and lemon bars.
Thank you. We're going to have our
questions outside in the track. Thank
you. He's going to meet you outside. I
promise he's not going to go anywhere.
Awesome. Thank you.