When Vectors Break Down: Graph-Based RAG for Dense Enterprise Knowledge - Sam Julien, Writer
Channel: aiDotEngineer
Published at: 2025-07-22
YouTube video id: XlAIgmi_Vow
Source: https://www.youtube.com/watch?v=XlAIgmi_Vow
Welcome. So glad to see you all here. Welcome to When Vectors Break Down: Graph-Based RAG for Dense Enterprise Knowledge. And a big thank you to Swyx and Ben for putting on yet another amazing event. It's a pretty interesting signal that we have an entire track dedicated to graph-based RAG. I think in addition to all of the agentic promise of graph-based RAG, we're also seeing the market start to catch up to the fact that vector search is just not enough for RAG at scale. You may have seen this really interesting article by Jo Kristian Bergum, who is around here somewhere, on the rise and fall of the vector database infrastructure category, and his subsequent interview on Latent Space, where he talked about how vector databases experienced a gold rush after ChatGPT's launch, but the industry is starting to recognize that vector search alone is insufficient for sophisticated retrieval, and that we're going to need multiple strategies beyond simple vector similarity. This is music to our ears at Writer, because we've actually been talking about this for a long time. We've been talking about the benefits of graph-based RAG for a couple of years now. In fact, if you look at this article from November 2023, which in AI time is like prehistoric times, we actually talk about the benefits of knowledge graphs and the shortcomings of vector databases and simple similarity search for enterprise RAG at scale. If you're not familiar with Writer, we're an end-to-end agentic platform for enterprises: we build our own models, we build our own graph-based RAG system, and we have a suite of software tools on top of that for enterprises to build agents and AI applications. As we've been building Knowledge Graph over the years, it's been an interesting journey working with these Fortune 500 and Global 2000 companies at scale.
Most of them, or many of them, are in highly regulated industries like healthcare and finance, where accuracy and low hallucination rates are super important. So over the years our team has been putting together this system of different components and different techniques so that we could really drive our accuracy rate up and reduce our hallucinations. What I wanted to share in this talk is the journey of how we got there. The main takeaway, as you're seeing in several of these talks, like the first talk about hybrid search, is that there are many different ways to get the benefits of knowledge graphs in RAG, and also that how you get there and what you learn along the way is often very valuable as you're building out your retrieval system, almost as valuable as the end result itself. So I'm going to weave together two stories: our journey to graph-based RAG, and the first-principles thinking that I think has made our team successful in putting together this system as we continue to iterate and improve on it. I'm Sam Julien. I'm the Director of Developer Relations at Writer, and you can find most of my writing, books, newsletters, and all of those things at samjulien.com. So, I talked about this system composed of multiple pieces put together over a couple of years, and I want to talk about how we got to this point and where we are now. I'm going to put a blanket caveat on here: please consider this a sketch, not a blueprint of what is currently in production. Of course, there are many moving pieces and many layers to this, but I want to abstract it enough to make it practical and usable for people. We have a crack research team at Writer, and they have four main areas of focus. Enterprise models, like our Palmyra X5 model; that's the one powering the chat on the AI Engineer website right now.
Practical evaluations, like our finance benchmark called FailSafeQA. Domain-specific specialization: these are our domain-specific models like Palmyra Med and Palmyra Fin. And then our focus here: retrieval and knowledge integration, bringing enterprise data to work with our models in a secure, reliable way. What's really cool about the way our research team works is that they're very focused on solving practical problems for our customers. They're not just working in isolation on theoretical things; they're actually driven by customer insights. That's what I would consider the first meta-lesson of why I think this is working so well for Writer: we're really focused on solving customer problems rather than implementing specific solutions. So the problem we're trying to solve, pretty much constantly, as most of us here are, is that enterprise data is really dense, specialized, and massive. We're often dealing with terabytes of data, it uses very specific language, and it's often very clustered together: there's not a lot of diversity in the language used in these documents. That's what our research and engineering teams have been focused on these last few years. Like most, we started out with regular search: querying a knowledge base using an algorithm and passing the results to the LLM. But that quickly ran out of road, because it was good for basic keyword searches but not great for the advanced similarity search that we needed. So then, again like most, we went to vector embeddings: chunking, embedding, putting it in a database, then similarity search and passing the results to the LLM for the end user to query. But we ran into two major problems with this. The first is that with vector retrieval, chunking and nearest neighbors can give inaccurate answers.
If you look at this example of text about the founding of Apple and its timeline, it's very easy for us as humans to look at these text chunks and pick out the fact that the Macintosh was created in 1984. But when you chunk this text naively and just give it to a nearest-neighbor search, it can get confused and think that it was actually 1983 instead of 1984, because it's in the same chunk as the introduction of the Lisa. Side note: I'm a huge vintage Apple nerd, so I liked this example. The other big problem we ran into with vector retrieval was that it was failing with really concentrated data. If you think about large enterprises, it's not like they're dealing with documents where some are talking about animals and some are talking about fruit, right? If you have a mobile phone company, for example, with thousands and thousands of documents that all use terms like megapixels and cameras and battery life, and you ask the RAG system and the LLM to compare two different phone models, it's going to really struggle, because it's going to find all these answers and have no idea how to make sense of them. That's what took us to graph-based RAG, where instead we would query a graph database, get back the relevant documents using keys, and generate an answer. This is especially powerful if you combine it with full-text and similarity search. It really helped our accuracy, because we were able to preserve the relationships within the text and provide more context to the model. And this was really interesting, because at the time, over the last couple of years, there actually weren't that many people doing graph-based RAG.
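The retrieval flow described here (find entities in the query, follow graph edges to document keys, fetch those documents as context) can be sketched in miniature. Everything below is illustrative, not Writer's actual implementation: the entities, documents, and the `graph_retrieve` helper are all made up for the example.

```python
# Minimal sketch of graph-based retrieval: entities link to the
# document chunks that mention them, so a query comparing two phone
# models pulls back *both* spec sheets by key instead of relying on
# embedding similarity alone. All data and names are illustrative.

# Tiny "graph": entity -> set of document keys (the edges)
entity_to_docs = {
    "PhoneX": {"doc_specs_x", "doc_reviews"},
    "PhoneY": {"doc_specs_y", "doc_reviews"},
    "battery life": {"doc_specs_x", "doc_specs_y"},
}

documents = {
    "doc_specs_x": "PhoneX: 48MP camera, 4500mAh battery life.",
    "doc_specs_y": "PhoneY: 50MP camera, 5000mAh battery life.",
    "doc_reviews": "Reviewers compared PhoneX and PhoneY cameras.",
}

def graph_retrieve(query: str) -> list[str]:
    """Find entities mentioned in the query, follow their edges to
    document keys, and return the matching keys sorted."""
    keys = set()
    for entity, doc_keys in entity_to_docs.items():
        if entity.lower() in query.lower():
            keys |= doc_keys
    return sorted(keys)

hits = graph_retrieve("Compare PhoneX and PhoneY battery life")
# Both spec sheets plus the shared review are retrieved by key,
# and their text becomes the context passed to the LLM.
context = "\n".join(documents[k] for k in hits)
```

In a real system the entity lookup and edge traversal would run against a graph store (and be combined with full-text and similarity search, as the talk describes), but the shape of the flow is the same.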
That's why I think the team's focus on really trying to solve the customer's problem, rather than chase whatever was being hyped at the time, was really important. So that was great, but we did run into some challenges back then with using graph databases. Now, this is not an indictment of any graph database technology; it's just what we were running into at the time, a couple of years ago. There were four things. First, converting the data into the structured graph was getting really challenging and costly at scale. Second, as the graph database scaled, we were hitting the limits of our team's expertise as well as hitting some cost issues. Then we were running into problems where Cypher was struggling with the advanced similarity matching we needed. And we were noticing that LLMs were doing better with text-based queries than with complex graph structures. Again, if you were to do this now, you might not run into these problems, but this is what we ran into historically. I think the way the team approached this is also very interesting: they decided to stay flexible based on their expertise. They were running into problems that I think were not necessarily fundamental to the technology itself, but more like, okay, how can we solve our customers' problems using the expertise we have on the team? And so they came up with a few really interesting solutions to these problems. First, when it came to converting the data into the graph structure, the team went back to their expertise and said, what do we know how to do? We know how to build models. So let's build a specialized model that can scale and run on CPUs or smaller GPUs, which I think is a really clever solution. Now, if you were to do this today, there are probably enough fast, small models out there that you could fine-tune something like that.
You wouldn't have to build it yourself. But at the time we didn't really have options like that, so the team built it themselves and fine-tuned a model trained to map this data into graph structures of nodes and edges, and we did better context-aware splitting and chunking to preserve the context and the semantic relationships, which really helped preserve reliability. Okay, so then there were the issues with the scaling of the graph databases, the limitations of the team's expertise, and the cost at scale. Again, we went back and thought about what our team's expertise is and what we can do, and what we did instead was store the data points as JSON in a Lucene-based search engine. We take the graph structure, convert it into JSON, and put it in the search engine. This allowed us to easily handle large amounts of data without any performance or speed degradation at scale, while still being something the team was really good at. So the team had started to assemble this concept of what our RAG system was looking like. Again, this is more of a historical snapshot and a sketch over time, but we do the context-aware splitting and the text-to-graph conversion with this specialized model, and then pass the result to a search engine. And we were really starting to drive up our accuracy. But we still had those problems with similarity matching, and with text-based queries doing better than complex graph structures. So again, the team went back to first principles and thought, okay, what is it that we're trying to solve here? Let's go back to the research and figure out what we can build on to create a solution that's best for our customers and our specific needs. And I think this is the final meta-point: let research challenge your assumptions.
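The "graph structure as JSON in a search engine" idea can be sketched like this. The `text_to_graph` stub stands in for the fine-tuned text-to-graph model, and the toy inverted index stands in for a Lucene-based engine; both are assumptions made up for illustration, not Writer's production code.

```python
import json
import re
from collections import defaultdict

def text_to_graph(text: str) -> list[dict]:
    """Stub for the specialized text-to-graph model: returns
    (source, relation, target) edges as JSON-serializable dicts.
    A real model would extract these; hard-coded for the sketch."""
    return [
        {"source": "Apple", "relation": "released",
         "target": "Macintosh", "year": 1984, "text": text},
    ]

class JsonIndex:
    """Toy inverted index over JSON records (a Lucene stand-in):
    every alphanumeric token in a record points back to its doc id."""
    def __init__(self):
        self.postings = defaultdict(set)
        self.records = []

    def add(self, record: dict):
        doc_id = len(self.records)
        self.records.append(json.dumps(record))
        for token in re.findall(r"[a-z0-9]+", json.dumps(record).lower()):
            self.postings[token].add(doc_id)

    def search(self, term: str) -> list[dict]:
        return [json.loads(self.records[i])
                for i in sorted(self.postings.get(term.lower(), set()))]

index = JsonIndex()
for edge in text_to_graph("Apple released the Macintosh in 1984."):
    index.add(edge)

hit = index.search("macintosh")[0]   # edge record, relationships intact
```

The point of the design is that the graph's relationships survive as fields in each record, so the search engine's mature scaling and full-text machinery does the heavy lifting without running a separate graph database.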
Rather than staying focused on a particular solution, step back, look at the research, and figure out what you can do to solve the challenges for your customers. So they went back to the original RAG paper, and if you go back to the original RAG paper, it doesn't actually ever talk about using a prompt with context and questions, which is super interesting, right? That's the de facto way of doing RAG now, but the original RAG paper actually proposed a two-component architecture, a retriever and a generator built on a pre-trained sequence-to-sequence model; it never actually talks about prompt and context and questions. And that's where they came across Fusion-in-Decoder, which I kind of think of as an alternate timeline for RAG, like if we hadn't gone down the road of prompt and context and questions. Fusion-in-Decoder is a technique that builds on the proposal of the original RAG paper: it processes the passages independently in the encoder, to get linear scaling instead of quadratic scaling, but then jointly in the decoder, for better evidence aggregation. So: a big efficiency breakthrough, and lots of state-of-the-art performance. I know this is super abstract. If you go to Facebook Research, they actually have a Fusion-in-Decoder library that you can play around with to walk through the steps of Fusion-in-Decoder. I also know that at this point you're thinking, what the heck is this guy talking about in a graph RAG track? Why are we talking about Fusion-in-Decoder? Well, I'm glad you asked, because the next big breakthrough was knowledge graphs with Fusion-in-Decoder. You can use knowledge graphs with Fusion-in-Decoder as a technique, and this improves on the Fusion-in-Decoder paper by using knowledge graphs to understand the relationships between the retrieved passages, so it helps with the efficiency bottleneck and improves the process.
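The "linear instead of quadratic" claim is just the arithmetic of self-attention: attention cost grows with the square of sequence length, so encoding N passages as one concatenated sequence costs (N·L)² attention pairs, while encoding each passage independently, as Fusion-in-Decoder does, costs N·L². A small back-of-the-envelope sketch (counts of attention score computations, not full FLOPs; the numbers are illustrative):

```python
# Why Fusion-in-Decoder's encoder scales linearly in the number of
# retrieved passages: each passage is encoded on its own, so only
# the decoder's cross-attention sees all N*L encoder states jointly.

def concat_encoder_cost(n_passages: int, passage_len: int) -> int:
    """Attention pairs when all passages share one encoder pass:
    quadratic in the total token count."""
    total = n_passages * passage_len
    return total * total

def fid_encoder_cost(n_passages: int, passage_len: int) -> int:
    """Attention pairs when each passage is encoded independently:
    linear in the number of passages."""
    return n_passages * passage_len * passage_len

n, length = 100, 250            # 100 retrieved passages of 250 tokens
naive = concat_encoder_cost(n, length)   # (100*250)^2 = 625,000,000
fid = fid_encoder_cost(n, length)        # 100*250^2   =   6,250,000
speedup = naive // fid                   # 100x fewer attention pairs
```

This is why FiD can afford to feed the decoder evidence from many passages at once: the expensive encoder work grows only linearly as you retrieve more.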
I'm not going to walk through this diagram step by step, but this is the architecture diagram from the paper, where it uses the graph and then does a two-stage reranking of the passages, which improves efficiency while also lowering cost. So the team took all this research and came together to build their own implementation of Fusion-in-Decoder, since we actually build our own models, to make that the final piece of the puzzle, and it really helped our hallucination rate, really drove it down. We then published a white paper with our own findings. So then we had that piece of the puzzle, and there are a few other techniques we don't have time to go over, but the point is that we're assembling multiple techniques based on research to get the best results we can for our customers. That's all well and good, but does it actually work? That's the important part, right? We did some benchmarking last year. We used Amazon's RobustQA dataset and compared our retrieval system, with Knowledge Graph and Fusion-in-Decoder and everything, against seven different vector search systems, and we found that we had the best accuracy and the fastest response time. So I encourage you to check that out. Benchmarks are really cool, but what's even cooler is what this unlocks for our customers, which is various features in the product.
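The two-stage, graph-aware reranking idea from the diagram can be sketched roughly like this: a cheap first stage scores passages, then a second stage boosts passages whose graph neighbors scored well, so related evidence stays together and fewer passages reach the expensive decoder. The scoring functions, edges, and boost factor below are all illustrative assumptions, not the paper's exact method.

```python
# Toy sketch of graph-aware two-stage passage reranking.

def stage1_scores(query: str, passages: dict[str, str]) -> dict[str, float]:
    """Cheap first-stage score: fraction of query words found in
    the passage text (a stand-in for a learned ranker)."""
    words = query.lower().split()
    return {pid: sum(w in text.lower() for w in words) / len(words)
            for pid, text in passages.items()}

def stage2_rerank(scores: dict[str, float],
                  edges: dict[str, set[str]],
                  boost: float = 0.5) -> list[str]:
    """Second stage: add a fraction of the best neighbor's score,
    then return passage ids sorted best-first."""
    final = {}
    for pid, s in scores.items():
        neighbor_best = max((scores[n] for n in edges.get(pid, set())),
                            default=0.0)
        final[pid] = s + boost * neighbor_best
    return sorted(final, key=final.get, reverse=True)

passages = {
    "p1": "The Macintosh shipped in 1984.",
    "p2": "It followed the Lisa, released in 1983.",
    "p3": "The ship left the harbor in 1984.",
}
edges = {"p1": {"p2"}, "p2": {"p1"}}   # p1 and p2 share entities

ranked = stage2_rerank(
    stage1_scores("When did the Macintosh ship", passages), edges)
# Stage 1 alone would rank the off-topic p3 above p2; the graph
# boost lifts p2, which is linked to the top passage p1.
```

The design choice mirrored here is the one the talk describes: the graph tells the reranker which retrieved passages are actually related, which lexical or embedding similarity alone can miss.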
For one, because, like most graph structures, we have the relationships and the additional context, we can actually expose the thought process: you can show the snippets, the subqueries, and the sources for how the RAG system is getting its answers, and we can expose this in the API to developers as well as in the product. Knowledge Graph is also able to excel at multi-hop questions, where we can reason across multiple documents and multiple topics without any struggles. And lastly, it can handle complex data formats where vector retrieval struggles, where an answer might be split across multiple pages, or maybe there's a similar term that doesn't quite match what the user is looking for. Because we have that graph structure, and Fusion-in-Decoder with the additional context and relationships, we're able to formulate correct answers. So again, my main takeaway here is that there are many ways you can get the benefits of knowledge graphs in RAG. That could be through a graph database. It could be through doing something creative with Postgres. It could be through a search engine. But you can take advantage of the relationships that you can build with knowledge graphs in your RAG system. And as you get there, you can challenge your assumptions and focus on the customers to get to the end result and make the team successful. For our team, it was focusing on customer needs instead of what was hyped, staying flexible based on the expertise of the team, and letting research challenge their assumptions. So, if you want to join this amazing team, we're hiring across research, engineering, and product. We would love to talk to you about any of our open roles. And I'm available for questions; you can come find me in the hallway or reach out to me on Twitter or LinkedIn. That's all I've got for you. Thank you so much.