Building a Smarter AI Agent with Neural RAG - Will Bryk, Exa.ai
Channel: aiDotEngineer
Published at: 2025-07-29
YouTube video id: xnXqpUW_Kp8
Source: https://www.youtube.com/watch?v=xnXqpUW_Kp8
All right. So, I was going to give a live coding demo, and I will, but I know you're actually here to hear a cool story. So I'll tell you a story about web search built for AI, and then we'll do some coding at the end. This story ends with this slide: one API to get any information from the web. You'll know what that means by the end, but the story starts in 1998, and what you're looking at is the state of the art in information retrieval in 1998. You type a word, Australia, into this new search engine called Google, and it magically finds you all the documents on the web that contain the word Australia. It's crazy. The big insight of Google was the PageRank algorithm: results are ranked by authority based on the graph structure of the web. It was a clever algorithm and it was really cool. I was two years old at the time, so if I'd been conscious, I would have thought it was cool. Now our story skips 23 years to 2021. By this point I was conscious, barely, and I noticed that GPT-3 had recently come out. It was this magical thing: you could input a whole paragraph explaining exactly what you want, and it would really understand the subtleties of your language and give you an output that exactly matched. It's hard to remember how magical that was, but it really was magical in 2021. At the same time, there was Google, where you could type a simple query like "shirts without stripes" and it would give you shirts with stripes, which is crazy. It doesn't understand the word "without" because it's doing a keyword comparison algorithm.
So I decided that for at least the next 10 years, I would devote myself to combining the technology of GPT-3 with a search engine, to build a search engine that actually understands what you're saying at a deep level, understands all the documents on the web at a deep level, and gives you exactly what you ask for. This is a very big idea. We've been working on it for four years and made a lot of progress, but it would change the world if you actually solved this problem. So in 2021, we joined YC Summer 2021, we raised a couple million dollars, and we did what every YC startup should do: we spent half of it on a GPU cluster. I'm joking; you shouldn't do that. We also followed YC's advice by not talking to any users or customers for a year and a half while we just did research. Again, you shouldn't do that; you should talk to users. But in our case it made sense, because we were trying to solve a really hard problem: redesign search from scratch using the same technology as GPT-3, this next-token-prediction idea with transformers. What if you could apply the same thing to search? This is actually one of our Weights & Biases training runs. The purple one, I believe, was a breakthrough. There were a few breakthroughs along the way, involving different datasets and different transformer architectures we were trying, and this purple one really started to work well. The general idea we had was: okay, what is a search engine? You have something like a trillion documents on the web, and traditional search engines, at a very high level, create a keyword index of those documents.
For each document, you ask what words are in that document, and you build a big inverted index mapping from words like "brown" to all the documents that contain that word. Then at search time, when a query like "shirts without stripes" comes in, you do some fancy keyword comparison algorithm and return the top results. That's obviously a simplification of what Google does, but at a fundamental level, it's a keyword comparison. The idea with transformers was: what if you could turn each document not into a set of keywords, but into embeddings? Embeddings can be arbitrarily powerful. An embedding is just a list of numbers, and it can represent lots of information. It doesn't just capture the words in the document, but also the meaning, the ideas in the document, and the way people refer to that document on the web. An embedding can be arbitrarily big, so in the limit, of course, it would just destroy keywords. You have this arbitrarily powerful representation, and the fundamental idea was just the bitter lesson: what if we could train transformers to output embeddings for documents? If we keep getting more and more high-quality data, we could make a search engine that actually understands you. At inference, at search time, a query comes in, like "shirts without stripes". Traditional search engines would do a very fancy keyword comparison and a bunch of other things; instead, we would just embed "shirts without stripes" and compare it to the embeddings of all the trillion documents. And after a year and a half, we actually had a new search engine that worked in a very different way.
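The contrast between the two retrieval paradigms can be sketched in a few lines. This is a toy illustration, not Exa's actual system: the documents and embedding vectors are hand-made stand-ins, where a real system would produce vectors with a trained transformer encoder over a trillion documents.

```python
# Toy contrast: keyword inverted index vs. embedding similarity.
from collections import defaultdict
from math import sqrt

docs = {
    0: "shirts with stripes",
    1: "plain solid color shirts",
    2: "brown shoes",
}

# --- Keyword approach: map each word to the documents containing it.
inverted_index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.split():
        inverted_index[word].add(doc_id)

# Naive keyword scoring: count how many query words each document matches.
# It can't understand "without", so the striped shirt wins.
query = "shirts without stripes"
scores = {d: sum(d in inverted_index[w] for w in query.split()) for d in docs}
keyword_top = max(scores, key=scores.get)  # doc 0: exactly the wrong intent

# --- Embedding approach: compare query and document vectors directly.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

# Hypothetical 2-d embeddings; one dimension loosely encodes "stripedness".
doc_vecs = {0: [0.9, 0.9], 1: [0.9, -0.8], 2: [-0.7, 0.1]}
query_vec = [0.9, -0.9]  # "shirts without stripes", stripedness negated

embedding_top = max(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]))
```

Here the keyword index ranks the striped shirt first because it matches the most query words, while the embedding comparison picks the stripe-free shirt, because the representation captures the meaning of "without" rather than its surface form.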
You search "shirts without stripes" on Google, sorry, on Exa, and you get a list of results that actually do not have stripes. It's a simple example, but it can handle way more complex queries, like paragraph-long queries. When we launched this in November 2022, we got a lot of excitement on Twitter. This was a very new paradigm for search; you could do all sorts of interesting queries that you couldn't do before. And then two weeks later, this happened. It was a small tweet. This is a visual depiction of San Francisco at the time, which you probably all remember, and this is a visual depiction of the Exa team at the time, because ChatGPT completely changed the way we interact with the world's information. Everyone can now use an LLM to just talk to their computer and get information. And we were thinking: wait, is there even a role for search in this world? These LLMs are so powerful. Then very quickly we realized yes, there is a role, because LLMs don't know everything on the web. For example, if you ask an LLM like GPT-4 to find cool personal sites of engineers in San Francisco, it can't; it just doesn't have that in its weights. It'll apologize. And there's a very simple information-theory argument here: there literally isn't enough information in the weights of GPT-4 to store the whole web. We don't know exactly how many parameters GPT-4 has (I think someone leaked it on YouTube once), but it's something like a couple trillion parameters, which you could call less than 10 terabytes in the weights. The internet is over a million terabytes, and that's just the documents on the web; there are also images and video, which are way more.
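The back-of-envelope version of that information-theory argument, using the talk's own rough numbers (the parameter count is a rumor, not a confirmed figure):

```python
# Rough capacity comparison: model weights vs. the web's documents.
params = 2e12            # "a couple trillion" parameters (rumored, not confirmed)
bytes_per_param = 2      # e.g. 16-bit weights
model_bytes = params * bytes_per_param   # ~4e12 bytes, i.e. ~4 TB

web_bytes = 1e6 * 1e12   # "over a million terabytes" of web documents

assert model_bytes < 10e12               # under 10 TB, as claimed in the talk
ratio = web_bytes / model_bytes          # the web is vastly larger
```

Even with generous assumptions, the weights come to a few terabytes against a million-plus terabytes of text, so the model cannot memorize the web and must retrieve from it.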
Actually, I did a tweet recently about the size of the web, and it's in the exabyte range. Our name is Exa; it's not a coincidence. Anyway, LLMs need to search the web just from this simple argument, and they're going to need to do that for a long time. If you talk to ML researchers, they'll say the same thing: it's just too hard. The web is also constantly updating; that's another problem. It's not just the size of the web, it's the constant updating of the web that makes it very tricky. So LLMs will always need search. And when you combine an LLM with a search engine like Exa, you can handle these queries. For "find me cool personal sites of engineers in SF", the LLM will search Exa, get a list of personal sites, and then use that information to output the perfect thing for the user. You're all very familiar with this LLM-plus-search pattern. It's obvious now; everyone knows about it. But now let me tell you a secret about search that most people don't know. The secret is that traditional search engines were not built for this world of AI. Traditional search engines were built for humans, and humans are very different from AIs. Every search engine, Google, Bing, you name it, was built in a different era for this kind of creature: a slow flesh human that types keywords, wants to read a few links, and really cares about the UI of the page. It's a lazy human; it types simple keywords. Google is great for this creature. Google was optimized for this creature: it gives you exactly the kinds of things you would click on. But AIs are very different. An AI can gobble up information like crazy. This is a much slowed-down version of what our AIs probably feel like inside.
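The LLM-plus-search loop described above can be sketched as follows. `search_web` and `call_llm` are hypothetical stand-ins: in a real system, they would hit a search API (such as Exa's) and an LLM API respectively.

```python
# Minimal sketch of the "LLM plus search" loop.

def search_web(query, num_results=3):
    # Stand-in for a real search call; returns (url, snippet) pairs.
    fake_index = {
        "personal sites of engineers in San Francisco": [
            ("https://example.dev/alice", "Alice, SF engineer, writes about IR"),
            ("https://example.dev/bob", "Bob, SF engineer, loves compilers"),
        ]
    }
    return fake_index.get(query, [])[:num_results]

def call_llm(prompt):
    # Stand-in for an LLM call; a real agent would send `prompt` to a model.
    return f"Summary based on {prompt.count('https://')} sources."

def answer(user_question):
    # 1. The model can't answer from its weights alone, so it searches first.
    results = search_web("personal sites of engineers in San Francisco")
    # 2. Retrieved snippets become fresh context in the prompt.
    context = "\n".join(f"{url}: {snippet}" for url, snippet in results)
    prompt = f"Question: {user_question}\nSources:\n{context}"
    # 3. The model writes the final answer grounded in the search results.
    return call_llm(prompt)

reply = answer("Find me cool personal sites of engineers in SF")
```

The key move is step 2: the search results are injected into the prompt, so the model's answer is grounded in retrieved documents rather than in whatever happens to be stored in its weights.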
They want to use complex queries, not simple ones, to find not a couple of links but tons of knowledge, as much knowledge as they can get, because they actually have the patience to analyze it all extremely fast. The search algorithm that's optimal for this type of creature is not the same algorithm that's optimal for the human. It would be crazy if the same algorithm that was optimal for humans were also optimal for AIs. And yet a lot of the search tools we're talking about these days on Twitter and elsewhere are still old traditional search combined with AIs. It's just not the right puzzle fit. So at Exa, we're really trying to think about what the right search engine for this AI world is. Let me dive into a few examples of how AIs are different. First, AIs want precise, controllable information. By the way, when I say AI, I'm usually talking about an AI product. So imagine a VC using an AI system to find a list of companies to invest in. They're looking for the next big thing, something that feels like Bell Labs. When they tell their AI what they want, the AI will go search a search engine. If it searches a search engine like Google, it'll get a list of results that humans like to click on, but it's not very information dense, and it doesn't even match what the AI asked for. The AI asked for startups working on something huge that feels like Bell Labs; it should get a list of startups. It's kind of a crazy idea, but what if search engines actually returned exactly what you asked of them, and not what Google knows you will click on? With AIs especially, they just want a search engine that returns exactly what they ask for.
Because what the world is really going to look like is this: you interact with your AI agent, you ask for something, and it makes tons of searches. Maybe they want startups working on something similar to Bell Labs; maybe they want startups only in New York City with this quality and that quality. It'll do all sorts of searches, and it just wants a search API that does what it asks. So you need a search engine like that, and Exa is like that. Another difference between AIs and humans is that AIs want to search with lots of context. If you have an AI assistant and you talk to it all day, and then you ask for restaurants or apartments or what have you, the AI has lots of context on you. It should be able to search with a large, multi-paragraph query saying something like: my human is a software engineer, they like these types of things, and I like these types of things; can you give me restaurants that match those preferences? So you need a search engine that can literally handle multiple paragraphs of text. But traditional search engines like Google were not meant to do that, because humans would never type in multiple paragraphs; they're too lazy. Google was optimized for simple keyword queries. I think Google has a limit of a few dozen keywords, whereas Exa can handle multiple paragraphs of text. Another big way AIs are different from humans is that AIs want comprehensive knowledge. If you give a human 10,000 links or 10,000 pages, they don't know what to do with that; it would take 10 days of extreme patience to process it all. But an AI can do it in 3 seconds if it's parallelized. So if I'm a VC and I want a report on all the companies in a space, I want literally all the companies.
There's a huge amount of value in getting truly all of them, and not just the 10 or 20 that Google is able to find. So you need a search engine that exposes the ability to return a thousand, ten thousand, whatever it is, and that also has the semantic ability so that when you say "every startup funded by YC working on AI", you actually get all of them. Google literally just can't do this at all. Okay, I hope that through these examples you see that the space of possible queries is actually way larger than people realize. Until 2022, we were kind of in this top-left blue world. This circle is the space of possible queries, and the blue regions are specific subsets of that space. We were all in that top-left corner of blue for a long time, where search engines could handle basic keyword queries like "Stripe pricing", or someone's GitHub page, or "Taylor Swift's boyfriend", whatever it is. After 2022, everyone started to want the top-right blue circle: hey, actually, I want to make queries like "explain this concept to me like I'm a 5-year-old", or "here's my code, can you debug it?" That's a form of query too. It doesn't require search, but it's another type of query that was introduced to the world in 2022. Then there are other types of queries, like these semantic queries: "people in San Francisco who know assembly". As far as I'm aware, Exa kind of introduced this kind of query, and does really, really well on them. And then there are these really complex queries, like "find me every article that argues X and not Y, from an author like Z". We're now starting to have systems, like Exa's Websets product, that can handle these things.
And I think this is actually a huge space, because it turns the web into a database you can filter however you want. That's really what AIs want: a full-control, database-like query system where they can get whatever they need for their user. And then there are the queries that no one has thought of yet. Every week we get tons of queries where we go, oh wait, that's a really interesting type of query that no search engine can do right now. Eventually we'll try to handle all the queries that are possible. There are so many new types of queries now because we have these AI systems, and the expectations have gotten way higher. Okay, so now we end our story with the same slide: one API to get any information from the web. Exa is trying to handle not just the keyword queries, but also the semantic queries, and the super complex queries, and eventually all queries. We want one API that can give these AI systems whatever knowledge they want: you have the AI, and you have Exa providing the knowledge. Oh, I only have four minutes. Okay. [Speaker switches to the code editor.] Okay, cool. First, a very quick exploration. This is our search dashboard, where we can try different queries. I'd point out that in the search API endpoint, we expose lots of different toggles. You try out a query, it shows you the code, and it gets you a list of results. It exposes tons of different types of filters you might want, for example, the number of results: 10, 100, a thousand, whatever it is.
You can have date ranges, or say you only want to search over these domains. It's a lot of toggles, but the point is you actually want the toggles, because your AI is going to be calling this. You want a search engine that gives you full control. And we have both neural and keyword search, so you can try different ones. Okay, let me quickly jump to the code. I prepared this agent, agent.py. We made this agent, Agent Mark, and Mark loves to make markdown out of things. Anything you give it, it will turn into markdown. Mark will make markdown. In this case, let's try this query: "personal site of engineer in San Francisco who likes information retrieval". This is the kind of query that neural search is a lot better at. So it's making a query to get a list of personal sites of engineers in San Francisco who like information retrieval, and Mark the agent is just making a markdown output of that. That's a very neural type of query. You also might want to do a different type of query, a more keyword-heavy one: say, my GitHub. Here I'd want to make a keyword query, so you just change to keyword search, and it gets information from my GitHub using keyword search, because this is a very typical Google-like search that would work well. Cool, that's information about my GitHub. When you're actually building an agent, you're going to be combining lots of different types of searches: neural searches, keyword searches, and all sorts of other searches that Exa exposes.
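The dashboard toggles described above map onto fields in a search request. The sketch below shows roughly what such a payload could look like; the field names are illustrative guesses at a search API's JSON shape, not the exact Exa schema, so check the real API reference before relying on them.

```python
# Sketch of assembling a search request with the toggles mentioned above:
# search type (neural vs. keyword), result count, domain and date filters.

def build_search_request(query, search_type="neural", num_results=10,
                         include_domains=None, start_date=None, end_date=None):
    """Assemble a search request dict, omitting any filters left unset."""
    payload = {"query": query, "type": search_type, "numResults": num_results}
    if include_domains:
        payload["includeDomains"] = include_domains
    if start_date:
        payload["startPublishedDate"] = start_date
    if end_date:
        payload["endPublishedDate"] = end_date
    return payload

# A neural search with a domain filter and a date floor, as in the demo.
req = build_search_request(
    "personal site of engineer in San Francisco who likes information retrieval",
    search_type="neural",
    num_results=100,
    include_domains=["github.io"],
    start_date="2023-01-01",
)
```

Keeping unset filters out of the payload matters when an agent is the caller: the agent can compose exactly the constraints it needs per search without carrying dead fields around.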
So the right agent of the future is going to be a system that decides what type of search it needs for whatever the user says: okay, I'm going to make a neural search to get a list of things, and then for each one, I'm going to do a keyword search. You want to give the agent full access to the world's information in whatever way it wants, not just keyword search, but all these other things. So here I one-shotted with o3 a GitHub agent that combines these two queries. I want to get the GitHub of every engineer in San Francisco who likes information retrieval, so the agent makes a neural search to get a list of people, extracts the names, and then searches those names using keyword search to get their GitHubs. If you run that here, it's just getting 10 results, but with Exa we could do 100 or a thousand if you're on an enterprise plan. So now it's getting all the GitHub info. Cool, that's just an example. And there are lots of other things you can do with Exa. We actually just today launched a research endpoint, which will do as many searches and LLM calls in the background as needed to get you that perfect report or that perfect structured output for the thing you asked for. It's kind of like a deep research API, a state-of-the-art deep research API. Cool, that is the talk. I hope that was interesting. Thank you.
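The two-stage agent described in the demo can be sketched like this. Both search functions are stubs standing in for real API calls: the neural search handles the fuzzy semantic part of the task, and the keyword search handles the exact lookups.

```python
# Sketch of the demo's two-stage pipeline: neural search finds people,
# then one keyword search per person finds their GitHub.

def neural_search(query, num_results=10):
    # Stand-in: a semantic search returning people matching a fuzzy description.
    return ["Alice Chen", "Bob Park"][:num_results]

def keyword_search(query, num_results=1):
    # Stand-in: a keyword search, good for exact lookups like "<name> GitHub".
    fake_results = {
        "Alice Chen GitHub": ["https://github.com/alicechen"],
        "Bob Park GitHub": ["https://github.com/bobpark"],
    }
    return fake_results.get(query, [])[:num_results]

def github_agent(description):
    # Stage 1: neural search for the semantic query ("engineers in SF who...").
    people = neural_search(description)
    # Stage 2: a keyword search per extracted name, the Google-style lookup.
    return {name: keyword_search(f"{name} GitHub") for name in people}

profiles = github_agent("engineers in San Francisco who like information retrieval")
```

The point is the division of labor: neither search type alone can do the whole job, so the agent routes each sub-task to the search mode that fits it.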