Stop Using RAG as Memory — Daniel Chalef, Zep

Channel: aiDotEngineer
Published at: 2025-07-22
YouTube video id: T5IMo5ntyhA
Source: https://www.youtube.com/watch?v=T5IMo5ntyhA
I'm here today to tell you that there's
no one-size-fits-all memory,
and why you need to model your memory
after your business domain. If you
saw me a little earlier, when I was
talking about Graphiti, Zep's
open-source temporal graph framework,
you might have seen me speak to
how you can build custom entities and
edges in the Graphiti graph for your
particular business domain: business
objects from your business domain. What
I'm going to demo today is how
Zep implements that and how easy it
is to use from Python, TypeScript, or Go.
And what we've done here is we've solved
a fundamental problem plaguing memory.
And we're enabling developers to
build out memory that is far more cogent
and capable for many different use
cases. So I'm going to show you a
quick example of
where things go really wrong. Many of
you might have used ChatGPT before. It
generates facts about you in memory, and
you might have noticed that it really
struggles with relevance.
Sometimes it just pulls out all sorts of
arbitrary facts about you. And
unfortunately when you store arbitrary
facts and retrieve them as memory, you
get inaccurate responses or
hallucinations.
And the same problem happens when you're
building your own agents.
So here we go. We have an example media
assistant, and it should remember things
about jazz music, NPR, podcasts, The
Daily, etc. All the things that I like
to listen to. But unfortunately, because
I'm in conversation with the agent, or
because it's a voice agent and it's
picking up my voice, it's
learning all sorts of irrelevant things,
like that I wake up at 7 a.m., my dog's
name is Melody, etc. And the point here is
that irrelevant facts pollute memory.
They're not specific to the media player
business domain. And the technical
reality here is also that many
frameworks take a really simplistic
approach to generating facts.
If you're using an agent framework that
has memory capabilities,
it's generating facts and throwing them
into a vector database. And
unfortunately, dumping facts into a
vector database or Redis means that when
you're recalling that memory, it's
difficult to differentiate what should
be returned. We're going to return what
is semantically similar. And here we
have a bunch of facts that are
semantically similar to my request for
my favorite tunes. We have some good
things. And unfortunately Melody is
there as well, because my dog is
named Melody, and that might have something
to do with tunes. So we get a
bunch of irrelevant stuff.
So basically, semantic similarity is not
business relevance,
and this is not unexpected.
I was speaking a little bit earlier
about how vectors are just basically
projections into an embedding space.
There are no causal or relational
links between them. And so we
need a solution. We need domain-aware
memory, not better semantic search.
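
To make the point concrete, here is a minimal sketch in plain Python. It is not Zep's API. The three-dimensional "embeddings" are hand-assigned toy values (axes: music, daily routine, pets), and the `domain` label and the `top_k` helper are hypothetical names introduced only for illustration. It shows how ranking by similarity alone lets the Melody fact into the results for a music request, while a domain filter keeps it out.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    text: str
    domain: str                          # hypothetical label: "media" or "personal"
    vector: tuple[float, float, float]   # hand-assigned toy embedding, not a real model


def cosine(a, b) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy axes: (music, daily routine, pets). All values are invented for illustration.
FACTS = [
    Fact("I like jazz music", "media", (0.9, 0.1, 0.0)),
    Fact("I listen to NPR podcasts", "media", (0.7, 0.3, 0.0)),
    Fact("I wake up at 7 a.m.", "personal", (0.0, 1.0, 0.0)),
    # The word "Melody" pulls this pet fact toward the music axis.
    Fact("My dog's name is Melody", "personal", (0.6, 0.0, 0.8)),
]

QUERY_VECTOR = (1.0, 0.0, 0.0)  # stands in for "play my favorite tunes"


def top_k(query_vector, facts, k, domain=None):
    """Rank facts by similarity; optionally keep only one business domain first."""
    candidates = [f for f in facts if domain is None or f.domain == domain]
    ranked = sorted(candidates, key=lambda f: cosine(query_vector, f.vector), reverse=True)
    return ranked[:k]
```

Calling `top_k(QUERY_VECTOR, FACTS, k=3)` returns the jazz, NPR, and Melody facts, while `top_k(QUERY_VECTOR, FACTS, k=3, domain="media")` returns only the two media facts. A better embedding model would change the scores, but not the fact that similarity by itself knows nothing about which facts belong to the media domain.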
So, with that, I am going to
unfortunately be showing you a video
because the Wi-Fi has been absolutely
terrible.
Let me bring up the video.
Okay. So,
I built a little application here and it
is a finance coach and I've told it I
want to buy a house.
And it's asking me how much I
earn a year. It's asking me about what
student loan debt I might have. And
we'll see that on the right-hand side,
what is stored in Zep's memory
are some very explicit
business objects. We have financial
goals, debts, income sources, etc. These
are defined by the developer, and they're
defined in a way that is really simple
to understand. We can use Pydantic or
Zod or Go structs,
and we can apply business rules. So
let's go take a look at some of the code
here. We have a TypeScript financial
goal schema using Zep's underlying SDK.
We can define these entity types. We can
give a description to the entity type.
We can even define fields and the
business rules for those fields: the
values that they take on. And then we
can build tools for our agent to
retrieve a financial snapshot, which runs
multiple Zep searches
concurrently and filters by specific
node types.
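
The code from the demo is not reproduced in this transcript, so what follows is only a rough sketch of the two ideas just described, written with the Python standard library rather than Zep's SDK. Everything in it is an assumption for illustration: the field names, the allowed `goal_type` values, the sample amounts, the in-memory `STORE`, and the `search_by_type` and `financial_snapshot` helpers. Dataclasses stand in for the Pydantic, Zod, or Go struct definitions mentioned in the talk.

```python
import asyncio
from dataclasses import dataclass

# Hypothetical set of allowed values; a business rule on the goal_type field.
GOAL_TYPES = {"home_purchase", "retirement", "emergency_fund"}


@dataclass(frozen=True)
class FinancialGoal:
    """A savings or purchase target the user has stated."""
    goal_type: str
    target_amount: float

    def __post_init__(self) -> None:
        # Business rules: restrict the values each field may take.
        if self.goal_type not in GOAL_TYPES:
            raise ValueError(f"unknown goal_type: {self.goal_type!r}")
        if self.target_amount <= 0:
            raise ValueError("target_amount must be positive")


@dataclass(frozen=True)
class Debt:
    """Money the user owes, such as a student loan."""
    kind: str
    balance: float

    def __post_init__(self) -> None:
        if self.balance < 0:
            raise ValueError("balance cannot be negative")


# Hypothetical in-memory store standing in for one user's graph (sample values).
STORE: list[object] = [
    FinancialGoal("home_purchase", 600_000.0),
    Debt("student_loan", 40_000.0),
]


async def search_by_type(entity_type: type) -> list[object]:
    """Stand-in for one type-filtered search; a real call would go over the network."""
    await asyncio.sleep(0)
    return [e for e in STORE if isinstance(e, entity_type)]


async def financial_snapshot() -> dict[str, list[object]]:
    """Run one search per entity type concurrently and group results by type name."""
    types = [FinancialGoal, Debt]
    results = await asyncio.gather(*(search_by_type(t) for t in types))
    return {t.__name__: r for t, r in zip(types, results)}
```

An agent tool could call `asyncio.run(financial_snapshot())` and hand the grouped result to the model. Because each search is restricted to one entity type, unrelated facts never enter the snapshot, and `asyncio.gather` lets the per-type searches overlap instead of running one after another.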
And when we start our Zep application,
what we're going to do is
register these particular objects
with Zep, so it knows to
build this ontology in the graph. So
let's do a quick little addition here.
I'm going to say that I have $5,000 a
month in rent.
I think it's rent.
And in a few seconds, we see that Zep's
already parsed that new message and has
captured that $5,000. And we can go look
at the graph. This is the
Zep front end. And we can see the
knowledge graph for this user has got a
debt account entity. It's got fields on
it that we've defined as a developer.
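
As a toy model of what registering an ontology buys you, the sketch below keeps a per-user graph that only accepts nodes whose type and fields were declared up front. It is an assumption-laden illustration, not how Zep works internally: `EntityType`, `UserGraph`, `register`, `add_node`, and the `amount` and `period` field names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EntityType:
    name: str
    description: str
    fields: frozenset[str]   # field names the developer has declared


@dataclass
class Node:
    type: str
    attributes: dict[str, object]


@dataclass
class UserGraph:
    ontology: dict[str, EntityType] = field(default_factory=dict)
    nodes: list[Node] = field(default_factory=list)

    def register(self, entity_type: EntityType) -> None:
        """Declare an entity type so nodes of that type may be stored."""
        self.ontology[entity_type.name] = entity_type

    def add_node(self, type_name: str, **attributes: object) -> Node:
        """Store a node only if its type is registered and its fields are declared."""
        entity_type = self.ontology.get(type_name)
        if entity_type is None:
            raise KeyError(f"entity type not registered: {type_name}")
        unknown = set(attributes) - entity_type.fields
        if unknown:
            raise ValueError(f"undeclared fields for {type_name}: {sorted(unknown)}")
        node = Node(type_name, dict(attributes))
        self.nodes.append(node)
        return node

    def nodes_of_type(self, type_name: str) -> list[Node]:
        """Retrieve by type: the filter that keeps unrelated facts out."""
        return [n for n in self.nodes if n.type == type_name]


graph = UserGraph()
graph.register(
    EntityType("DebtAccount", "A debt or recurring obligation", frozenset({"amount", "period"}))
)
# Mirrors the demo message: $5,000 a month in rent.
graph.add_node("DebtAccount", amount=5000, period="monthly")
```

With this shape, a fact like a dog's name has nowhere to land unless the developer registers a type for it, and `graph.nodes_of_type("DebtAccount")` returns only the typed node with its declared fields.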
And so again, we can get really
tight about what we retrieve from Zep by
filtering. Okay, so we're at time. So
just very quickly, we wrote a paper
about how all of this works. You
can get to it by that link below. And
I appreciate your time today.
You can look me up afterwards.