Building Alice’s Brain: an AI Sales Rep that Learns Like a Human - Sherwood & Satwik, 11x
Channel: aiDotEngineer
Published at: 2025-07-29
YouTube video id: KWmkMV0FNwQ
Source: https://www.youtube.com/watch?v=KWmkMV0FNwQ
Okay, thanks everyone for coming today. Today's talk is called Building Alice's Brain: how we built an AI sales rep that learns like a human. My name is Sherwood. I'm one of the tech leads at 11x; I lead engineering for our Alice product, and I'm joined by my colleague Satwik. 11x, for those of you who are unfamiliar, is a company building digital workers for the go-to-market organization. We have two digital workers today: Alice, our AI SDR, and Julian, our voice agent, with more workers on the way. Today we're going to talk about Alice specifically, and in particular Alice's brain, the knowledge base, which is effectively her brain. Let's start from the basics: what is an SDR? An SDR is a sales development representative. I know this is a room full of engineers, so I thought I'd start with the basics. It's essentially an entry-level sales role, the kind of job you might get right out of school, and your responsibilities boil down to three things. First, you're sourcing leads: people you'd like to sell to. Then you're contacting them and engaging them across channels. And finally, you're booking meetings with those people. Your goal is to generate positive replies and meetings booked; these are the two key metrics for an SDR. A lot of an SDR's job boils down to writing emails like the one in front of you right now. This is an actual email that Alice has written, and it's an example of the kind of work output Alice produces. Alice sends about 50,000 of these emails in a given day, compared to a human SDR, who would send 20 to 50. And Alice is now running campaigns for about 300 different business organizations.
Before we go any further, I want to define some terms, because we have our customers at 11x, but our customers also have their customers, so things get a little confusing. Today we'll use the term seller to refer to the company that is selling something through Alice; that's our customer. And we'll use the term lead to refer to the person being sold to. Here's what that looks like as a diagram. The seller is pushing context about their business to Alice: the products they sell, the case studies they can reference in emails. Alice then uses that context to personalize emails for each of the leads she contacts. There are two requirements Alice needs to satisfy in order to succeed in her role. First, she needs to know the seller: the products, the services, the case studies, the pain points, the value props, the ICP. Second, she needs to know the lead: their role, their responsibilities, what they care about, what other solutions they've tried, pain points they might be experiencing, the company they work for. Today we're going to focus on knowing the seller. In the old version of our product, the seller was responsible for pushing context about their business to Alice, and they did so through a manual experience called the library. You can see what that looks like here: the library shows all of the different products and offers available for this business, which Alice can then reference when she writes emails. The user had to enter details about every individual product and service, along with all of the pain points, solutions, and value props associated with them, in our dashboard, including these detailed descriptions.
Those descriptions were important to get right, because they get included in Alice's context when she writes the emails. Later on comes campaign creation; this is what it looks like to create a campaign. You can see we have a lead in the top left, and in the top right the user is selecting the different offers they've defined in the library. These are the offers Alice has access to when she's generating her emails. We had a lot of problems with this user experience. The first was that it was extremely tedious: a really bad, cumbersome experience. The user had to enter a lot of information, which created onboarding friction, because users couldn't actually run campaigns until they had filled out their library. And finally, the emails we were generating with this approach were just suboptimal. Users had to choose between too few offers, which meant irrelevant offers for a given lead, or too many offers, which meant all of that material sat in the context window and Alice just wasn't as smart when she wrote those emails. So how could we address this? We had an idea: instead of the seller being responsible for pushing context about the business to Alice, we could flip things around so that Alice proactively pulls all of the context about the seller into her system and then uses whatever is most relevant when writing those emails. That's effectively what we accomplished with the knowledge base, which we'll tell you more about in just a moment. For the rest of the talk, we're going to first do a high-level overview of the knowledge base and how it works. Then we'll do a deep dive on the different steps in our RAG pipeline.
After that, we'll talk through the user experience of the knowledge base, and we'll wrap up with some lessons from this project and our future plans. So let's start with an overview. What is the knowledge base? It's basically a way for us to get closer to a human experience. If you're training a human SDR, you bring them in, dump a bunch of documents on them, they ramp up over a period of weeks or months, and you can check in on their progress. Similarly, the knowledge base is a centralized repository on our platform for seller info: users come in, dump all their source material, and we're then able to reference that information at message-generation time. Now, what resources do SDRs care about? Here's a glimpse: marketing materials, case studies, sales calls, press releases, and a bunch of other stuff. How do we bucket these into categories that we're actually going to parse? We created documents and images, websites, and media (audio and video), and you're going to see why that's important. Here's an overview of what the architecture looks like. It starts with the user uploading a document or resource in the client. We save it to our S3 bucket and send it to the back end, which creates a bunch of resources in our DB and kicks off jobs depending on the resource type and the vendor selected. The vendors do the parsing asynchronously. Once they're done, they send us a webhook, which we consume via ingest. Once we've consumed that webhook, we take the parsed artifact we get back from the vendor, store it in our DB, and at the same time embed it and upsert it to Pinecone.
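The ingestion path just described (consume the vendor's webhook, persist the parsed artifact, embed it, and upsert it to the vector index) can be sketched roughly as follows. Everything here is a simplified stand-in: in-memory dicts replace the real database and Pinecone index, and a toy hash-based embedder replaces a real embedding model.

```python
import hashlib
import math

# In-memory stand-ins for the real stores (the production system
# uses a database plus a Pinecone vector index).
DOCUMENT_DB: dict[str, dict] = {}
VECTOR_INDEX: dict[str, list[float]] = {}

def toy_embed(text: str, dim: int = 8) -> list[float]:
    """Deterministic stand-in for a real embedding model:
    hash the text and normalize the first few bytes to unit length."""
    digest = hashlib.sha256(text.encode()).digest()
    vec = [b / 255 for b in digest[:dim]]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def handle_parse_webhook(payload: dict) -> str:
    """Consume a (hypothetical) vendor webhook carrying parsed markdown:
    store the artifact, embed it, and upsert it to the vector index."""
    resource_id = payload["resource_id"]
    markdown = payload["markdown"]
    DOCUMENT_DB[resource_id] = {"markdown": markdown, "status": "parsed"}
    VECTOR_INDEX[resource_id] = toy_embed(markdown)
    return resource_id
```

The payload shape and function names are illustrative; the real webhook contract depends on the parsing vendor.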
Once we've stored it in our local DB, we update the UI, and eventually our agent can query Pinecone, our vector DB, for the stored information we just put in. Now that we have a high-level understanding of how the knowledge base works, let's dig into each individual step in the pipeline. There are five steps: first parsing, then chunking, then storage, then retrieval, and finally visualization, which sounds a little untraditional, but we'll cover it in a moment. Let's start with parsing. What is parsing? I think we all take this for granted, but it's worth defining. Parsing is the process of converting a non-text resource into text, and it's necessary because, as we all know, language models speak text. To make information represented in another form, like a PDF, an MP4 file, or an image, legible or useful to the LLM, we first need to convert it to text. So one way of thinking about parsing is that it's the process of making non-text information legible to a large language model. Multimodal models are one solution to this, but there are enough restrictions on multimodal models that parsing is still relevant. To illustrate, we have the five resource types we mentioned a moment ago going through our parsing process, and what comes out is markdown, a type of text that contains structural information and formatting that is semantically meaningful and useful. Let's talk about how we implemented parsing. The short answer is that we didn't; we didn't want to build this from scratch, and we had a few reasons.
The first is that, as you just saw, we had five different resource types and a lot of different file types within each of them. We thought it would be too many and too much work, and we wanted to get to market quickly. The last reason was that we just weren't that confident in the outcome. There are vendors who dedicate their entire company to building an effective parsing system for a single resource type. We didn't want our team to have to become parsing specialists for each of these resource types and build a system for each; we thought that if we tried, the outcome probably wouldn't be that successful. So we chose to work with a vendor, and here are a bunch of the vendors we came across. You can find 10 or 20 or 50 with a quick Google search, but these are some of the leaders we evaluated. To make a decision, we came up with three specific requirements. First, we needed support for our necessary resource types; that goes without saying. Second, we wanted markdown output. And third, we wanted the vendor to support webhooks, so we could receive the output in a convenient manner. A few things we didn't consider to start with: accuracy. Crazy, right? We considered neither accuracy nor comprehensiveness. Our assumption was that the vendors leading the market would all be within a reasonable band on both. Accuracy refers to whether the extracted output actually matches the original resource; comprehensiveness is how much of the information in the original makes it into the final output. The last thing we didn't really consider was cost, to be honest, and that's because this system was pre-production.
We didn't have real production data yet and didn't know what our usage would be, so we figured we'd come back and optimize cost once we had real usage data. On to our final selections. For documents and images, we chose LlamaParse, which is a LlamaIndex product; I think Jerry was up here earlier today. We chose LlamaParse because, first, it supported the most file types of any document parsing solution we could find, and second, their support was really great: Jerry and his team were quick to get in a Slack channel with us, I think within just a couple of hours of our initial evaluation. With LlamaParse, we're able to turn documents like this PDF of an 11x sales deck into a markdown file like the one you see on the right. For websites, we chose Firecrawl. The other main vendor we considered was Tavily, and this is not really a knock on Tavily. We chose Firecrawl because, first, we were familiar with them, having worked with them on a previous project, and second, Tavily's crawl endpoint, the endpoint we would have needed for this project, was still in development at the time, so it wasn't something we could actually use. Similar to LlamaParse, with Firecrawl we can take a website, like this homepage you see here, and turn it into another markdown document. Then we have audio and video, for which we chose a newer upstart vendor called CloudGlue. We chose CloudGlue because, first, they supported both audio and video, not just audio, and second, they were capable of extracting information from the video itself, as opposed to just transcribing it and giving us back a markdown file containing the audio transcript.
So with CloudGlue, we're able to turn YouTube videos, MP4 files, and other video formats into markdown like you see on the right. Now that everything is markdown, we move on to the next step: chunking. We have a blob of markdown, and we want to break it down into semantic entities that we can embed and put in our vector DB. At the same time, we want to preserve the structure of the markdown, because it carries inherent meaning: something being a title versus a paragraph is meaningful. So we're splitting these long blobs of text, like ten-page documents, into chunks that we can eventually retrieve after we've embedded and stored them in a vector DB. Now, chunking strategies. There are various things you can do: split on tokens, split on sentences, split on markdown headers, or make LLM calls and have an LLM split your document into chunks, or any combination of the above. What you want to ask yourself when deciding on a chunking strategy is: what kinds of logical units am I trying to preserve in my data? What do I eventually want to extract during retrieval? What strategy will keep those units intact while still letting me embed and store them in whatever DB I'm using? Should I try a different strategy for different resource types, since we have to deal with PDFs, PowerPoints, and videos? And finally, what kinds of queries or retrieval strategies am I expecting? We ended up with a combination of all the strategies we mentioned.
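One way to sketch that kind of waterfall (split on markdown headers, fall back to sentences, fall back to a hard token split) is shown below. The token budget and the whitespace "tokenizer" are illustrative stand-ins, not the actual limits or tokenizer 11x uses.

```python
import re

MAX_TOKENS = 100  # hypothetical per-chunk budget

def count_tokens(text: str) -> int:
    # Whitespace split as a stand-in for a real tokenizer.
    return len(text.split())

def split_sentences(text: str) -> list[str]:
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def split_tokens(text: str, max_tokens: int) -> list[str]:
    words = text.split()
    return [" ".join(words[i:i + max_tokens])
            for i in range(0, len(words), max_tokens)]

def chunk_markdown(markdown: str, max_tokens: int = MAX_TOKENS) -> list[str]:
    # Pass 1: split on markdown headers to preserve document structure.
    sections = re.split(r"(?m)^(?=#{1,6}\s)", markdown)
    chunks: list[str] = []
    for section in (s for s in sections if s.strip()):
        if count_tokens(section) <= max_tokens:
            chunks.append(section.strip())
            continue
        # Pass 2: section too large, fall back to sentence splits.
        for sent in split_sentences(section):
            if count_tokens(sent) <= max_tokens:
                chunks.append(sent)
            else:
                # Pass 3: last resort, hard token-count split.
                chunks.extend(split_tokens(sent, max_tokens))
    return chunks
```

A production version would also pack adjacent small sentences into one chunk rather than emitting one chunk per sentence, but the waterfall structure is the point here.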
We split on markdown headers first, and then it's a waterfall: because we want the records in our vector DB to stay within a certain token count, we split on markdown headers, then on sentences, then eventually on tokens. That has worked well for us across all types of documents. It successfully preserves our markdown chunks, which we can cleanly show in the UI, and it prevents super-long chunks, which dilute the meaning of your document if you end up with them. Okay, so we've split all of our markdown into individual chunks; now it's time to put those chunks somewhere. Let's talk about storage technologies. I'm sure everyone here for the RAG section assumes we're using a vector database, and we actually are. But to be pedantic, RAG is retrieval-augmented generation, and any time you're retrieving context from an external source, whether it's a graph database, Elasticsearch, or a file on the file system, that also qualifies as RAG. Some of the other options you can use for RAG: graph databases, document databases, relational databases, key-value stores, even object storage like S3. In our case, we did use a vector database, because we wanted to do similarity search, which is what vector databases are built and optimized for. Once again, we had a lot of options to choose from; this is not an exhaustive list. In the end, we chose Pinecone. We chose Pinecone because, first, it was a well-known solution: we were new to the space and figured we probably couldn't go wrong with the market leader. It was cloud-hosted, so our team wouldn't have to spin up additional infrastructure. And it was really easy to get started.
They had great getting-started guides and SDKs, and they had embedding models bundled with the solution. For a vector database, you typically have to embed the information before it goes in, which would require a third-party or external embedding model. With Pinecone, we didn't have to find another embedding model provider or host our own; we just used the one they provide. And last but not least, their customer support was awesome: they got on a lot of calls with us, helped us analyze different vector database options, and helped us think through graph databases and graph RAG and whether those made sense for us. On to retrieval, the R in the RAG workflow we just built. There's been an evolution of RAG techniques over the last year. We started with traditional RAG, where you pull information and use it to enrich the system prompt for an LLM API call. That eventually turned into agentic RAG, where you have tools for information retrieval, attach those tools to whatever agentic flow you have, and the agent calls them as part of its larger flow. And something we've seen emerge in the last couple of months is deep research RAG, where deep research agents come up with a plan and then execute it, and the plan may contain one or many retrieval steps. These deep research agents can go broad or deep depending on the context needs, and they can evaluate whether or not they want to do more retrieval. We ended up building a deep research agent. We used a company called Letta; Letta is a cloud agent provider, and they're really easy to build with. Here's how it works: we pass the lead information to our agent, and it comes up with a plan.
The plan contains one or many context retrieval steps; the agent does the tool calls, summarizes the results, and then generates an answer for us in a nice, clean Q&A format. This is how it looks for a system with two questions that we ask. Now, on to visualization, the most mysterious part of the pipeline. What does visualization have to do with a RAG or ETL pipeline? For context, our customers are trusting Alice to represent their business. They really want to know that Alice knows her stuff, that she actually knows the products they sell, and that she's not going to lie about case studies or testimonials or make things up about the pain points they address. So how can we reassure them? The solution we came up with is to let users peek into Alice's brain. Get ready: this is what that looks like. We have an interactive 3D visualization of the knowledge base available in the product. What we've done here is take all of the vectors from our Pinecone vector database and project them down to just three dimensions, so we can render them as nodes in three-dimensional space. Once the nodes are visible in this space, you can click on any node to view the associated chunk. This is one of the ways that, for example, our sales team or support team will demonstrate Alice's knowledge. Now, how does it look in the actual UI? You start off with this nice little modal: you drop in your URLs, your web pages, your documents, your videos, and you click Learn, and it all shows up nicely in the UI. You have all the resources there, and you have the ability to interrogate Alice about what she knows of your knowledge base, via a really nice agent that we built, again using Letta.
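The projection step described above, collapsing high-dimensional embedding vectors down to three dimensions for rendering, could be done several ways; the talk doesn't name the exact method 11x uses, so here is a minimal PCA sketch via NumPy's SVD, purely as an illustration:

```python
import numpy as np

def project_to_3d(vectors: np.ndarray) -> np.ndarray:
    """Project (n, d) embedding vectors down to (n, 3) via PCA:
    center the data, take the top three principal axes from the SVD."""
    centered = vectors - vectors.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:3].T

# Example: 100 fake 768-dimensional "embeddings" -> 100 points in 3D space.
rng = np.random.default_rng(0)
points = project_to_3d(rng.normal(size=(100, 768)))
print(points.shape)  # (100, 3)
```

Each 3D point would then be rendered as a clickable node, keyed back to its source chunk. Nonlinear methods like UMAP or t-SNE are common alternatives when cluster structure matters more than global geometry.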
And here's how it looks in the campaign creation flow. On the left-hand side, the knowledge base content shows up as a nice Q&A: you can click on a question and it shows a dropdown of the chunks we retrieved, which were used as part of the messaging flow. With that, we've achieved our goal: our agent is now closer to a human. We're basically emulating how you onboard a human SDR: you dump in a bunch of context, and they just know. In conclusion, the knowledge base was a pretty revolutionary project for our product. It really changed the user experience and also leveled up our team a lot. We learned a lot of lessons; it was hard to narrow this slide down, but there are three I want to highlight today. The first is that RAG is complex. It was a lot harder than we thought it would be: there were a lot of micro-decisions to make along the way, a lot of different technologies to evaluate, and supporting different resource types was hard. Hopefully you all have a better appreciation of how complicated RAG can be. The second lesson is to first get to production, then benchmark, then improve. With all of those decisions and vendors to evaluate, it can be hard to get started, so we recommend shipping something to production that satisfies the product requirements, then establishing real benchmarks you can use to iterate and improve. The last lesson is to lean on vendors. You're all going to be buying solutions, and vendors are going to be fighting for your business. Make them work for it; make them teach you about their offerings and why their solution is better. As for our future plans: first, to track and address hallucinations in our emails.
Second, to evaluate parsing vendors on accuracy and comprehensiveness, the metrics we identified earlier. Third, to experiment with hybrid RAG: introducing a graph database alongside our vector database. And finally, to focus on reducing cost across our entire pipeline. If any of this sounds interesting to you, we are hiring, so please reach out to either Satwik or myself. Join us. And thank you all for coming today.