Building a Smarter AI Agent with Neural RAG - Will Bryk, Exa.ai

Channel: aiDotEngineer

Published at: 2025-07-29

YouTube video id: xnXqpUW_Kp8

Source: https://www.youtube.com/watch?v=xnXqpUW_Kp8

All right. So, I was going to do some live demo coding, and I will, but I know you're all actually here to hear a cool story. So I'll tell you a story about web search built for AI, and then we'll do some coding at the end.
This story will end with this slide: one API to get any information from the web. You'll know what that means by the end, but the story starts in 1998, and what you're looking at is the state of the art in information retrieval in 1998. You type a word, "Australia", into this new search engine called Google, and it magically finds you all the documents on the web that contain the word "Australia". It's crazy. And the big insight of Google was this PageRank algorithm: the results are ranked by authority, based on the graph structure of the web. It was a clever algorithm, and it was really cool. I was two years old at the time, so if I had been conscious, I would have thought it was cool.
Okay, and now our story skips 23 years to 2021. By this point I was conscious, barely, and I noticed that GPT-3 had recently come out. It was this magical thing: you could input a whole paragraph explaining exactly what you want, and it would really understand the subtleties of your language and give you an output that exactly matched. It's hard to remember how magical this was, but it was really magical in 2021. And at the same time, I noticed there was Google, where you type in a simple query like "shirts without stripes" and it gives you shirts with stripes, which is crazy. It just doesn't understand the word "without", because it's doing a keyword comparison algorithm. So I decided that for at least the next 10 years, I was going to devote myself to building a search engine that combines the technology of GPT-3 with a search engine: a search engine that actually understands what you're saying at a deep level, understands all the documents on the web at a deep level, and gives you exactly what you ask for. This is a very big idea. We've been working on it for four years, with a lot of progress, and it would change the world if you actually solved this problem. And so in 2021 we joined YC Summer 2021, we raised a couple million dollars, and we did what every YC startup should do: we spent half of it on a GPU cluster.
I'm joking. You shouldn't do that.
And then we also followed YC's advice, where we didn't talk to any users or customers for a year and a half, and we just did research. Again, you shouldn't do that; you should talk to users. But in our case it made sense, because we were trying to solve a really hard problem: redesign search from scratch using the same technology as GPT-3, this next-token-prediction idea with transformers. What if you could apply the same thing to search? This is actually one of our Weights & Biases training runs. The purple one, I believe, was a breakthrough where it really learned. There were a few breakthroughs along the way, involving random datasets and different transformer architectures that we were trying, and this purple one really started to work well.
And the general idea we had was: okay, so what is a search engine? There are like a trillion documents on the web. Traditional search engines, at a very high level, create a keyword index of those documents. For each document, you ask: what are the words in this document? And you create this big inverted index, where you map from words, like "brown", to all the documents that contain that word. Then at search time, when a search like "shirts without stripes" comes in, you do some crazy keyword comparison algorithm and get the top results. That's obviously a simplification of what Google does, but at a fundamental level, it's doing a keyword comparison.
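The inverted-index idea he's describing can be sketched in a few lines of Python. This is a toy illustration of the data structure, nothing close to a real engine's implementation:

```python
from collections import defaultdict

# Toy corpus: doc id -> text
docs = {
    0: "brown leather shirts with stripes",
    1: "plain brown shirts",
    2: "striped blue shirts",
}

# Build the inverted index: word -> set of doc ids containing that word
index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.split():
        index[word].add(doc_id)

def keyword_search(query):
    """Return docs containing every query word (simple AND semantics)."""
    word_sets = [index[w] for w in query.split()]
    return set.intersection(*word_sets) if word_sets else set()

print(keyword_search("brown shirts"))  # docs 0 and 1

# The failure mode from the talk: keyword matching has no notion of
# negation, so "shirts without stripes" just looks for the literal
# words "shirts", "without", and "stripes" rather than excluding
# striped shirts.
```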
But the idea was: with transformers, the big thing is, what if you could turn each document not into a set of keywords, but into embeddings? And these embeddings can be arbitrarily powerful, right? An embedding is just a list of numbers, and it can represent lots of information. An embedding doesn't just capture the words in the document, but also the meaning, the ideas in the document, and the way people refer to that document on the web. An embedding can be arbitrarily big, so of course, in the limit, it would just destroy keywords. You have this arbitrarily powerful representation. And the fundamental idea was just the bitter lesson: what if we could train transformers to output embeddings for documents? If we keep getting more and more high-quality data, we could make a search engine that actually understands you. The way it works at inference, at search time, is that a query comes in, like "shirts without stripes". Traditional search engines would use the approach above, a very fancy keyword comparison and a bunch of other things. Instead, we would just embed "shirts without stripes" and compare it to the embeddings of all the trillion documents.
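The embed-and-compare loop he describes looks roughly like this. A toy sketch with made-up 3-dimensional vectors; real systems use a trained embedding model with hundreds or thousands of dimensions and an approximate nearest-neighbor index rather than a brute-force scan:

```python
import math

# Pretend these vectors came from a trained embedding model.
doc_embeddings = {
    "striped dress shirt": [0.9, 0.1, 0.2],
    "plain cotton shirt":  [0.1, 0.9, 0.3],
    "solid-color tee":     [0.2, 0.8, 0.4],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def neural_search(query_embedding, k=2):
    """Rank every document by similarity to the query vector, return top k."""
    ranked = sorted(
        doc_embeddings,
        key=lambda d: cosine(query_embedding, doc_embeddings[d]),
        reverse=True,
    )
    return ranked[:k]

# A query embedding for "shirts without stripes" would land near the
# plain/solid documents, because the model can encode the negation.
query_vec = [0.1, 0.9, 0.35]
print(neural_search(query_vec))  # the two non-striped items rank first
```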
And after a year and a half, we actually had a new search engine that worked in a very different way. You search "shirts without stripes" on Google... sorry, on Exa, and you get a list of results that actually do not have stripes. It's a simple example, but it could handle way more complex queries, like paragraph-long queries.
And when we launched this in November 2022, we got a lot of excitement on Twitter. This was a very new paradigm for search; you could do all sorts of interesting queries that you couldn't do before. And then, two weeks later, this happened. It was a small tweet. This is a visual depiction of San Francisco at the time. You probably all remember this.
And this is a visual depiction of the Exa team at the time, because ChatGPT completely changed the way we interact with the world's information. Everyone can now use an LLM to just talk to their computer and get information. And we were thinking: wait, is there even a role for search in this world? These LLMs are so powerful. Then very quickly we realized: yes, there is a role, because LLMs don't know everything on the web. For example, if you ask an LLM like GPT-4, "find me cool personal sites of engineers in San Francisco," it can't; it just doesn't have that in the weights. It'll apologize, whatever. And there's a very simple information-theory argument here: there literally isn't enough information in the weights of GPT-4 to store the whole web. We don't know exactly how many parameters GPT-4 has (I think someone leaked it on YouTube once), but it's something like a couple trillion parameters, call it less than 10 terabytes in the weights. And the internet is over a million terabytes, and that's just the documents on the web; there are also images and video, which are way more. Actually, I did a tweet recently about the size of the web, and it's in the exabyte range. Our name is Exa; it's not a coincidence. Anyway, so LLMs need to search the web, just from this simple argument, and they're going to need to do that for a long time. If you talk to ML researchers, they'll say the same thing: it's just too hard. Also, the web is constantly updating; that's another problem. It's not just the size of the web, it's the constant updating of the web that makes it very tricky. So LLMs will always need search. That's great. And when you combine an LLM with a search engine like Exa, you can handle these queries. Like "find me cool personal sites of engineers in SF": the LLM will search Exa, get a list of personal sites, and then use that information to output the perfect thing for the user. You're all very familiar with this LLM-plus-search setup. It's obvious now, right? Everyone knows about it. But now let me tell you a secret about search that most people don't know.
And the secret is that traditional search engines were not built for this world of AI. Traditional search engines were built for humans, and humans are very different from AIs. Every search engine, Google, Bing, you name it, was built in a different era for this kind of creature: this slow flesh human that types keywords, wants to read a few links, and really cares about the UI of the page. It's a lazy human; it types simple keywords. Google is great for this creature. Google was optimized for this creature; it gives you exactly the kinds of things you would click on.
But AIs are very different. An AI can gobble up information like crazy. This is a much slowed-down version of what our AIs probably feel like inside. AIs want to use complex queries, not simple ones, to find not a couple of links but tons of knowledge, as much knowledge as they can get, because they actually have the patience to analyze it all extremely fast. So the search algorithm that's optimal for this type of creature is not the same algorithm that's optimal for the human. It would be crazy if the same algorithm that was optimal for humans were also optimal for AIs. And a lot of the search tools we're talking about these days on Twitter are still using the old traditional search combined with AIs. It's just not the right puzzle fit. So at Exa, we're really trying to think about what the right search engine for this AI world is.
So, just a few examples we could dive into of how AIs are different. Well, AIs want precise, controllable information. By the way, when I say AI, I'm usually talking about an AI product. So imagine, in this case, a VC that's using an AI system to find a list of companies to invest in. They're looking for something: what's the next big thing? What's the next big thing that feels like Bell Labs? When they tell their AI what they want, the AI will then go search a search engine, right? And if it searches a search engine like Google, it'll get a list of results that humans like to click on, but it's not very information-dense, and it doesn't even match what the person, what the AI, asked for. The AI asked for startups working on something huge that feels like Bell Labs; it should get a list of startups. It's kind of a crazy idea, but what if search engines actually returned exactly what you asked of them, and not what Google knows you will click on? With AI especially, they just want a search engine that returns exactly what they ask for. Because what the world is really going to look like is: you interact with your AI agent, you ask for something, and it makes tons of searches. Okay, maybe they want startups working on something similar to Bell Labs. Maybe they want startups only in New York City that have this quality and that quality. It'll do all sorts of searches, and it just wants a search API that does what it asks. So you need a search engine like that. Exa is like that.
Another difference between AIs and humans is that AIs want to search with lots of context. Again, if you have an AI assistant and you talk to it all day, and then you ask for restaurants or apartments or what have you, the AI has lots of context on you. So it should be able to search with a large, multi-paragraph query saying something like: my human is a software engineer, and he likes these types of things, and I like these types of things; can you give me restaurants that match those preferences? So you need a search engine that can literally handle multiple paragraphs of text. But traditional search engines like Google were not meant to do that, because humans would never type in multiple paragraphs; they're too lazy. Google was optimized for simple keyword queries. I think Google has something like a few-dozen-keyword limit, whereas Exa can handle multiple paragraphs of text.
Another big one where AIs are different from humans is that AIs want comprehensive knowledge. If you give a human 10,000 links, or 10,000 pages, they don't know what to do with that. It would take 10 days of extreme patience to process it all. But an AI can do it in 3 seconds if it's parallelized, right? So if I'm a VC and I want a report on all the companies in a space, I want literally all the companies. There's a huge amount of value in getting truly all of them, and not just the 10 or 20 that Google is able to find. So you need a search engine that exposes the ability to return a thousand, ten thousand, whatever it is, and that also has the semantic ability so that when you say "every startup funded by YC working on AI," you actually get all of them. Google literally just can't do this at all.
Okay. I hope that through these examples you see that the space of possible queries is actually way larger than people realize. Until about 2022, we were kind of in this top-left blue world. This circle is the space of possible queries, and the blue regions are specific subsets of that space. We were all in that top-left corner of blue for a long time, where search engines could handle basic keyword queries like "Stripe pricing," or someone's GitHub page, or Taylor Swift's boyfriend, or whatever it is. After 2022, everyone started to want the top-right blue circle: hey, actually, I want to make queries like "explain this concept to me like I'm a 5-year-old," or "here's my code, can you debug it?" That's a form of query. It doesn't require search, but it's another type of query that was introduced to the world in 2022. Then there are other types of queries, like these semantic queries: "people in San Francisco who know assembly." As far as I'm aware, Exa kind of introduced this kind of query, and does really, really well on those queries. And then there are these really complex queries, like "find me every article that argues X and not Y, from an author like Z." We're starting to have systems, like Exa's Websets product, that can handle these things. And I think this is actually a huge space, because it turns the web into a database you can filter however you want. That's really what AIs want: this full-control, database-like query system from which they can get whatever they need for their user. And then there are the queries that no one has thought of yet. Every week we get tons of queries and go: oh wait, that's a really interesting type of query that no search engine can do right now. Eventually we'll try to handle all the queries that are possible. But there are so many new types of queries now, because we have these AI systems, and the expectations have just gotten way higher.
Okay. So now we end our story with the same slide: one API to get any information from the web. So again, Exa is trying to handle not just the keyword queries, but also the semantic queries, and also the super complex queries, and eventually all queries. We want one API that can give these AI systems whatever knowledge they want. You have the AI, and you have Exa providing the knowledge. Oh, I only have four minutes. Okay.
So, let's see. Oops. How do I go to a different part of my computer? If I change to the code editor, how do I do that? Let's see. What? Oh, it's there. Oh, but I can't see it. So weird. Oh, cool. Okay.
There we go. Okay, cool. Well, first of all, a very quick exploration: this is our search dashboard, where we can try different queries. I'd just point out that in the search API endpoint, we expose lots of different toggles. So first of all, you just try out a query; it shows you the code and gets you a list of results. And it exposes tons of different types of filters you might want: for example, number of results, 10, 100, a thousand, whatever it is. You can have date ranges, or say I only want to search over these domains. It's a lot of toggles, but I think the point is that you actually want the toggles, because your AI is going to be calling this. You want a search engine that gives you full control. And we have neural and keyword search, so you can try different ones. Okay, let me quickly jump to the code. So, I prepared this code, agent.py. We made this agent, Agent Mark, and Mark loves to make markdown out of things. Anything you give it, it will make markdown. Mark will make markdown. So in this case, let's try this query: personal site of engineer in San Francisco who likes information retrieval.
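As an aside, the toggles mentioned a moment ago (number of results, date ranges, domain restrictions, neural versus keyword) would show up as fields on the search request. Here is a hedged sketch of what such a request body might look like; the field names are my assumptions for illustration, not verified against Exa's actual API schema:

```python
# Illustrative search request carrying the kinds of toggles described above.
# NOTE: these field names are assumptions for the sketch, not a guaranteed
# match to the real API.
request = {
    "query": "personal site of engineer in San Francisco"
             " who likes information retrieval",
    "type": "neural",                    # or "keyword"
    "numResults": 100,                   # 10, 100, 1000, ...
    "startPublishedDate": "2024-01-01",  # hypothetical date-range filter
    "includeDomains": ["github.io"],     # hypothetical domain restriction
}

def validate(req):
    """Minimal client-side sanity checks before sending such a request."""
    assert req["type"] in ("neural", "keyword"), "unknown search type"
    assert 1 <= req["numResults"] <= 1000, "result count out of range"
    return req

validate(request)
```

The point of exposing every field, rather than hiding them behind a ranking heuristic, is that an agent can set them programmatically per search.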
Well, this is the kind of query that neural search would be a lot better at. What? Okay. Save it. Oh, wrong agent. Okay, so it's just making a query to get a list of personal sites of engineers in San Francisco who like information retrieval, and Mark the agent is just making a markdown output of that. That's a very neural type of query. You also might want to do a different type of query, a more keyword-heavy one. Let's see, like GitHub. Let me... my GitHub. So okay, here I would want to make a keyword query, so you just change to keyword search. It's going to get information from my GitHub using keyword search, because this is a very typical, Google-like search that would work well, right? Oh god, I'm running the wrong one.
Okay, cool. That's information from my GitHub. And then, okay, so when you're actually building an agent, you're going to be combining lots of different types of searches: neural searches and keyword searches, and all sorts of other searches that Exa exposes. The right agent in the future is going to be this system that decides what type of search it needs for whatever the user says. It'll go: oh, okay, I'm going to make a neural search to get a list of things, and then for each one, I'm going to do a keyword search. Right? You want to give the agent full access to the world's information in whatever way it wants, not just keyword search, but also all these other things. So here I one-shotted with o3 a GitHub agent which combines these two queries. I want to get the GitHub of every engineer in San Francisco who likes information retrieval, so the agent will make a neural search to get a list of people, extract the names, and then search those using a keyword search to get their GitHubs. And then if you run that... here, it's just getting 10 results, but with Exa we could do 100 or a thousand if you're on an enterprise plan. So now it's getting all the GitHub info.
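That two-stage pattern (a neural search to find candidates, then a keyword search per candidate) can be sketched with stubbed search functions. The stubs below stand in for real search API calls, so the function names and the returned data are hypothetical:

```python
def neural_search(query):
    """Stub for a semantic search; a real agent would call a search API here."""
    # Pretend this returned personal sites of SF engineers into IR.
    return [
        {"title": "Ada's homepage", "author": "Ada"},
        {"title": "Grace's blog", "author": "Grace"},
    ]

def keyword_search(query):
    """Stub for a keyword search, the right tool for exact-match lookups
    like finding a specific person's GitHub profile."""
    name = query.split()[0].lower()
    return [{"url": f"https://github.com/{name}"}]

def github_agent(goal):
    # Stage 1: neural search for a list of matching people.
    people = neural_search(goal)
    # Stage 2: keyword search per person to find their GitHub.
    results = {}
    for person in people:
        hits = keyword_search(f"{person['author']} GitHub")
        results[person["author"]] = hits[0]["url"]
    return results

print(github_agent("personal site of engineer in SF"
                   " who likes information retrieval"))
```

The design point is the routing: the agent, not the search engine, decides which search type fits each sub-task, which is why it helps to expose both behind one API.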
Cool. So that's just an example. And yeah, there are lots of other things you can do with Exa. We actually just today launched this research endpoint, where it will do as many searches and LLM calls as needed in the background to get you that perfect report, or that perfect structured output, for the thing you asked for. So it's kind of like a deep research API, a state-of-the-art deep research API. Cool. That is the talk. I hope that was interesting. Thank you.