When Vectors Break Down: Graph-Based RAG for Dense Enterprise Knowledge - Sam Julien, Writer

Channel: aiDotEngineer

Published at: 2025-07-22

YouTube video id: XlAIgmi_Vow

Source: https://www.youtube.com/watch?v=XlAIgmi_Vow

Welcome. So glad to see you all here. Welcome to When Vectors Break Down: Graph-Based RAG for Dense Enterprise Knowledge. And a big thank you to swyx and Ben for putting on yet another amazing event. It's a pretty interesting signal that we have an entire track dedicated to graph-based RAG. I think in addition to all of the agentic promise of graph-based RAG, we're also seeing the market start to catch up to the fact that vector search is just not enough for RAG at scale. You may have seen this really interesting article by Jo Kristian Bergum, who is around here somewhere, on the rise and fall of the vector database infrastructure category, and his subsequent interview on Latent Space, where he talked about how vector databases experienced a gold rush after ChatGPT's launch, but the industry is starting to recognize that vector search alone is insufficient for sophisticated retrieval, and that we're going to need multiple strategies beyond simple vector similarity. This is music to our ears at Writer, because we've actually been talking about this for a long time. We've been talking about the benefits of graph-based RAG for a couple of years now. In fact, if you look at this article from November 2023, which in AI time is like prehistoric times, we actually talk about the benefits of knowledge graphs and the shortcomings of vector databases and simple similarity search for enterprise RAG at scale.
And if you're not familiar with Writer, we're an end-to-end agentic platform for enterprises: we build our own models, we build our own graph-based RAG system, and we have a suite of software tools on top of that for enterprises to build agents and AI applications. As we've been building Knowledge Graph over the years, it's been an interesting journey working with Fortune 500 and Global 2000 companies at scale. Many of them are in highly regulated industries like healthcare and finance, where accuracy and low hallucination rates are critically important. So our team has been putting this system together over the years out of different components and different techniques, so that we could drive our accuracy rate up and reduce our hallucinations.
And so what I wanted to share in this talk is the journey of how we got there. The main takeaway, as you're seeing in several of these talks, like the first talk about hybrid search, is that there are many different ways you can get the benefits of knowledge graphs in RAG, and also that how you get there, and what you learn along the way, is often very valuable as you're building out your retrieval system, almost as valuable as the end result itself. So I'm going to weave together two stories: our journey to graph-based RAG, and the first-principles thinking that I think has made our team successful in putting together this system as we continue to iterate and improve on it. I'm Sam Julien. I'm the director of developer relations at Writer, and you can find most of my writing, books, newsletters, and all of those things at samjulien.com.
So, I talked about this system being composed of multiple pieces put together over a couple of years, and I want to talk about how we got to this point and where we are now. I'm going to put a blanket caveat on here: please consider this a sketch, not a blueprint of what is currently in production. Of course, there are many moving pieces and many layers to this, but I want to abstract it enough to make it practical and usable for people. So
our research team at Writer is a crack team with four main areas of focus. Enterprise models, like our Palmyra X5 model; that's the one powering the chat on the AI Engineer website right now. Practical evaluations, like our finance benchmark called FailSafeQA. Domain-specific specialization; these are our domain-specific models like Palmyra Med and Palmyra Fin. And then what our focus is here: retrieval and knowledge integration, bringing enterprise data to work with our models in a secure, reliable way. What's really cool about the way our research team works is that they're very focused on solving practical problems for our customers. They're not working in isolation on theoretical things; they're driven by customer insights. And that's what I would consider the first meta lesson of why I think this is working so well for Writer: we're focused on solving the customer's problems rather than implementing specific solutions.
The problem that we're constantly trying to solve, as most of us here are, is that enterprise data is really dense, specialized, and massive. We're often dealing with terabytes of data, it uses very specific language, and it's often very clustered together; there's not a lot of diversity in the language used in these documents. That's what our research and engineering teams have been focused on these last few years. Like most teams, we started out with regular search: querying a knowledge base with a search algorithm and passing the results to the LLM. But that quickly ran out of steam, because while it was good for basic keyword searches, it wasn't great for the advanced similarity search we needed. So then, again like most teams, we went to vector embeddings: we did chunking and embeddings, put them in a database, ran similarity search, and passed the results to the LLM for the end user to query. But we ran into
two major problems with this. The first is that with vector retrieval, chunking and nearest-neighbor search can give inaccurate answers. If you look at this example of text about the founding of Apple and its timeline, it's very easy for us as humans to look at these text chunks and pick out the fact that the Macintosh was created in 1984. But when you chunk this text naively and just hand it to a nearest-neighbor search, it can get confused and conclude that the year was actually 1983 instead of 1984, because the Macintosh mention sits in the same chunk as the introduction of the Lisa. Side note: I'm a huge vintage Apple nerd, so I liked this example. The other big
problem we ran into with vector retrieval was that it failed with really concentrated data. Think about large enterprises: it's not like they're dealing with documents where some talk about animals and some talk about fruit. If you have a mobile phone company, for example, with thousands and thousands of documents that all use megapixels and cameras and battery life and terms like that, and you ask the RAG system and the LLM to compare two different phone models, it's going to really struggle, because it's going to find all of these answers and have no idea how to make sense of them.
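The first of those two problems, the Apple example, can be made concrete with a toy sketch. This is not Writer's pipeline; the sentences and the character budget below are invented for illustration. A greedy chunker that packs sentences under a size budget ends up putting the Lisa's 1983 and the Macintosh's 1984 into the same chunk, so a retriever that returns whole chunks hands the LLM two competing dates:

```python
# Toy demonstration of naive chunking misleading nearest-neighbor retrieval.
# Sentences and the 120-character budget are illustrative, not real data.
SENTENCES = [
    "Apple was founded in 1976 by Steve Jobs and Steve Wozniak.",
    "The Apple II launched in 1977 and became a huge success.",
    "In 1983 Apple released the Lisa, its first GUI computer.",
    "The Macintosh was introduced in 1984.",
]

def pack_chunks(sentences: list[str], budget: int = 120) -> list[str]:
    """Greedily pack sentences into chunks under a character budget,
    with no awareness of which facts belong together."""
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + 1 + len(s) > budget:
            chunks.append(current)
            current = s
        else:
            current = f"{current} {s}".strip()
    if current:
        chunks.append(current)
    return chunks

chunks = pack_chunks(SENTENCES)
# The Lisa (1983) and Macintosh (1984) sentences land in the same chunk:
# a retriever returning that chunk gives the model two candidate dates.
```

Context-aware splitting, discussed later in the talk, is one way to keep a chunk from mixing facts like this.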
And so that's what took us to graph-based RAG, where instead we query a graph database, get back the relevant documents using keys, and generate an answer. It's especially powerful if you combine that with full-text and similarity search. This really helped our accuracy, because we were able to preserve the relationships in the text and provide more context to the model. And this was really interesting, because at the time, over the last couple of years, there actually weren't that many people doing graph-based RAG. That's why I think the team's focus on solving the customer's problem, rather than chasing whatever was being hyped at the time, was really important. So that was great, but we did run into some challenges back then with using graph databases. Now, this is not an indictment of any graph database technology; it's just what we were running into at the time, a couple of years ago. There were four things we ran into.
First, converting the data into the structured graph was getting really challenging and costly at scale. As the graph database scaled, we were hitting the limits of our team's expertise, as well as some cost issues. Then we ran into problems where Cypher was struggling with the advanced similarity matching we needed, and we noticed that LLMs were doing better with text-based queries than with complex graph structures. Again, if you were to do this now, you might not run into those problems, but this is what we ran into historically.
And so I think the way the team approached this is also very interesting: they decided to stay flexible based on their expertise. They were running into problems that I think were not fundamental to the technology itself, and instead asked: how can we solve these problems for our customers using the expertise we have on the team? And they came up with a few really interesting solutions. First, when it came to converting the data into the graph structure, the team went back to their expertise and asked: what do we know how to do? We know how to build models. So let's build a specialized model that can scale and run on CPUs or smaller GPUs, which I think is a really clever solution. If you were to do this now, there are probably enough fast, small models out there that you could fine-tune something like that; you wouldn't have to build it yourself. But at the time we didn't really have any options like that, so the team built it themselves and fine-tuned a model trained to map this data into graph structures of nodes and edges. We also did better context-aware splitting and chunking to preserve the context and the semantic relationships, and this really helped preserve reliability.
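The text-to-graph step can be pictured like this. The real system uses a fine-tuned model; the rule-based stand-in below (the regex pattern, the node types, and the schema are all hypothetical) just shows the nodes-and-edges shape such a model is trained to emit:

```python
import re

def text_to_graph(sentence: str) -> dict:
    """Map a sentence to graph nodes and edges.
    A hypothetical single-pattern stand-in for a trained extraction model;
    it only illustrates the output structure, not the real coverage."""
    m = re.match(r"(\w+) (released|acquired|founded) (\w+) in (\d{4})", sentence)
    if not m:
        return {"nodes": [], "edges": []}
    subj, rel, obj, year = m.groups()
    return {
        "nodes": [{"id": subj, "type": "Org"}, {"id": obj, "type": "Product"}],
        "edges": [{"source": subj, "target": obj, "relation": rel, "year": year}],
    }

graph = text_to_graph("Apple released Macintosh in 1984")
# The fact and its date now travel together as an edge, instead of
# depending on which chunk a nearest-neighbor search happens to return.
```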
Okay. So then there were the issues with the scaling of the graph databases, the limits of the team's expertise, and the cost at scale. Again, we went back and thought about what our team's expertise is and what we can do. What we did instead was store the data points as JSON in a Lucene-based search engine. We take the graph structure, convert it into JSON, and put it in the search engine. This allowed us to easily handle large amounts of data without any performance or speed degradation at scale, while still being something the team was really good at. So the team had started to assemble this concept of what our RAG system was looking like. Again, this is more of a historical snapshot and a sketch over time: we do the context-aware splitting and the text-to-graph conversion with the specialized model, and then pass the output to a search engine. And we were really starting to drive up our accuracy.
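The graph-as-JSON-in-a-search-engine idea can be sketched with a tiny in-memory inverted index standing in for the Lucene-based engine. The edge schema and data are invented for illustration:

```python
import json
from collections import defaultdict

# Graph edges serialized as JSON documents, as they would be indexed.
edges = [
    {"source": "Apple", "relation": "released", "target": "Macintosh", "year": "1984"},
    {"source": "Apple", "relation": "released", "target": "Lisa", "year": "1983"},
]

index = defaultdict(set)  # term -> ids of documents containing it
docs = {}                 # id -> serialized JSON document
for i, edge in enumerate(edges):
    docs[i] = json.dumps(edge)
    for term in " ".join(edge.values()).lower().split():
        index[term].add(i)

def search(*terms: str) -> list[dict]:
    """Return edge documents matching every query term (AND semantics)."""
    hits = set.intersection(*(index.get(t.lower(), set()) for t in terms))
    return [json.loads(docs[i]) for i in sorted(hits)]

results = search("apple", "macintosh")
# The graph structure survives as fields on each document, while the
# search engine handles scale; only the Macintosh edge matches here.
```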
But we still had those problems with similarity matching, and with text-based queries doing better than complex graph structures. So again, the team went back to first principles and thought: what is it that we're actually trying to solve here? Let's go back to the research and figure out what we can build on to create the solution that's best for our customers and our specific needs. And I think this is the final meta point: let research challenge your assumptions. Rather than staying fixated on a solution, step back, look at the research, and figure out what you can do to solve the challenges for your customers. So they went back to the original RAG paper, and if you go back to the original RAG paper, it doesn't actually ever talk about using prompt, context, and question, which is super interesting, right? That's the de facto way of doing RAG now. But the original RAG paper actually proposed a two-component architecture, a retriever and a generator built on a pre-trained sequence-to-sequence model; it never actually talks about prompt and context and question. And that's where they came across Fusion-in-Decoder, which I kind of think of as an alternate timeline for RAG, as if we had never gone down the road of prompt, context, and question. Fusion-in-Decoder is a technique that builds upon the proposal of the original RAG paper: it processes the retrieved passages independently in the encoder, to get linear scaling instead of quadratic scaling, but then jointly in the decoder, for better evidence aggregation. So it's a big efficiency breakthrough with lots of state-of-the-art performance. I know this is super abstract. If you go to Facebook's research code, they actually have a Fusion-in-Decoder library that you can play around with to walk through the steps of Fusion-in-Decoder.
I also know that at this point you're going, what the heck is this guy talking about in a graph RAG track? Why are we talking about Fusion-in-Decoder? Well, I'm glad you asked, because the next big breakthrough was knowledge graphs with Fusion-in-Decoder, KG-FiD. You can use knowledge graphs with Fusion-in-Decoder as a technique, and this improves upon the Fusion-in-Decoder paper by using knowledge graphs to understand the relationships between the retrieved passages, which helps with the efficiency bottleneck and improves the process. I'm not going to walk through the architecture diagram from the paper step by step, but it uses the graph and then does a two-stage reranking of the passages, which improves efficiency while also lowering cost. The team took all this research and came together to build their own implementation of Fusion-in-Decoder, since we actually build our own models, to make that the final piece of the puzzle. It really drove our hallucination rate down, and then we published a white paper with our own findings.
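In code, the Fusion-in-Decoder idea looks roughly like this. The question/context prompt pattern follows the style of the FiD paper, and the token counts in the cost comparison are purely illustrative; this is a sketch of the scaling argument, not Writer's implementation:

```python
def fid_encoder_inputs(question: str, passages: list[str]) -> list[str]:
    """One independent encoder input per passage, FiD-style: each passage
    is paired with the question and encoded on its own."""
    return [f"question: {question} context: {p}" for p in passages]

def attention_cost(n_passages: int, tokens_per_passage: int) -> tuple[int, int]:
    """Rough self-attention cost: concatenating everything into one encoder
    input is quadratic in total length; encoding passages independently is
    linear in the number of passages."""
    joint = (n_passages * tokens_per_passage) ** 2
    per_passage = n_passages * tokens_per_passage ** 2
    return joint, per_passage

inputs = fid_encoder_inputs(
    "When was the Macintosh released?",
    ["The Lisa shipped in 1983.", "The Macintosh shipped in 1984."],
)
joint, fid = attention_cost(100, 250)
# For 100 passages of 250 tokens, joint encoding costs 100x the attention
# of FiD's per-passage encoding; the decoder still sees all evidence,
# because the encoder outputs are concatenated before decoding.
```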
And so then we had that piece of the puzzle. There are a few other techniques we don't have time to go over, but the point is, we're assembling multiple techniques, based on research, to get the best results we can for our customers. That's all well and good, but does it actually work? That's the important part, right? So we did some benchmarking last year. We used Amazon's RobustQA dataset and compared our retrieval system, with the knowledge graph and Fusion-in-Decoder and everything, against seven different vector search systems, and we found that we had the best accuracy and the fastest response time. So I encourage you to check that out, and to check out this process. Benchmarks are really cool, but what's even cooler is what it unlocks for our customers, which is various features in the product. For one, like most graph structures, we can actually expose the thought process, because we have the relationships and the additional context: you can show the snippets, the subqueries, and the sources for how the RAG system is actually getting its answers, and we can expose this in the API to developers as well as in the product.
We're also able to have Knowledge Graph excel at multi-hop questions, where we can reason across multiple documents and multiple topics without any struggles. And lastly, it can handle complex data formats where vector retrieval struggles: where an answer might be split across multiple pages, or where there's a similar term that doesn't quite match what the user is looking for. Because we have the graph structure and Fusion-in-Decoder, with the additional context and relationships, we're able to formulate the correct answers.
So again, my main takeaway is that there are many ways you can get the benefits of knowledge graphs in RAG. That could be through a graph database. It could be through doing something creative with Postgres. It could be through a search engine. But you can take advantage of the relationships that you can build with knowledge graphs in your RAG system. And as you get there, challenge your assumptions and focus on the customers to reach the end result and make the team successful. For our team, that meant focusing on customer needs instead of what was hyped, staying flexible based on the team's expertise, and letting research challenge their assumptions.
So if you want to join this amazing team, we're hiring across research, engineering, and product, and we would love to talk to you about any of our open roles. I'm also available for questions: you can come find me in the hallway, or reach out to me on Twitter or LinkedIn. And that's all I've got for you. Thank you so much.