Building Alice’s Brain: an AI Sales Rep that Learns Like a Human - Sherwood & Satwik, 11x
Channel: aiDotEngineer
Published at: 2025-07-29
YouTube video id: KWmkMV0FNwQ
Source: https://www.youtube.com/watch?v=KWmkMV0FNwQ
Okay, thanks everyone for coming today. Today's talk is called Building Alice's Brain: how we built an AI sales rep that learns like a human. My name is Sherwood. I'm one of the tech leads at 11x; I lead engineering for our Alice product, and I'm joined by my colleague Satwik. 11x, for those of you who are unfamiliar, is a company building digital workers for the go-to-market organization. We have two digital workers today: Alice, our AI SDR, and Julian, our voice agent, with more workers on the way. Today we're going to talk about Alice specifically, and in particular Alice's brain, the knowledge base, which is effectively her brain. Let's start from the basics: what is an SDR? An SDR is a sales development representative. I know this is a room full of engineers, so I thought I'd start with the basics. It's essentially an entry-level sales role, the kind of job you might get right out of school, and your responsibilities boil down to three things. First, you're sourcing leads: people you'd like to sell to. Then you're contacting them and engaging them across channels. And finally, you're booking meetings with those people. Your goal is to generate positive replies and meetings booked; these are the two key metrics for an SDR. A lot of an SDR's job boils down to writing emails like the one in front of you right now. This is an actual email that Alice has written, and it's an example of the kind of work output Alice produces. Alice sends about 50,000 of these emails in a given day, compared to a human SDR, who would send 20 to 50. And Alice is now running campaigns for about 300 different business organizations.
Before we go any further, I want to define some terms, because we have our customers at 11x, but our customers also have their customers, so things get a little confusing. Today we'll use the term seller to refer to the company that is selling something through Alice; that's our customer. And we'll use the term lead to refer to the person being sold to. Here's what that looks like as a diagram. The seller is pushing context about their business to Alice: the products they sell, the case studies they can reference in emails. Alice then uses that context to personalize emails for each of the leads she contacts. There are two requirements Alice needs to satisfy in order to succeed in her role. First, she needs to know the seller: the products, the services, the case studies, the pain points, the value props, the ICP. Second, she needs to know the lead: their role, their responsibilities, what they care about, what other solutions they've tried, pain points they might be experiencing, the company they work for. Today we're going to focus on knowing the seller. In the old version of our product, the seller was responsible for pushing context about their business to Alice, and they did so through a manual experience called the library. You can see what that looks like here: the library shows all of the different products and offers available for this business, which Alice can then reference when she writes emails. The user had to enter details about every individual product and service, along with all of the pain points, solutions, and value props associated with them, in our dashboard, including these detailed descriptions.
Those descriptions were important to get right, because they get included in Alice's context when she writes the emails. Later on comes campaign creation; this is what it looks like to create a campaign. You can see we have a lead in the top left, and in the top right the user is selecting the different offers they've defined in the library. These are the offers Alice has access to when she's generating her emails. We had a lot of problems with this user experience. The first was that it was extremely tedious: a really bad, cumbersome experience. The user had to enter a lot of information, which created onboarding friction, because users couldn't actually run campaigns until they had filled out their library. And finally, the emails we were generating with this approach were just suboptimal. Users had to choose between too few offers, which meant irrelevant offers for a given lead, or too many offers, which meant all of that material sat in the context window and Alice just wasn't as smart when she wrote those emails. So how could we address this? We had an idea: instead of the seller being responsible for pushing context about the business to Alice, we could flip things around so that Alice proactively pulls all of the context about the seller into her system and then uses whatever is most relevant when writing those emails. That's effectively what we accomplished with the knowledge base, which we'll tell you more about in just a moment. For the rest of the talk, we're going to first do a high-level overview of the knowledge base and how it works. Then we'll do a deep dive on the different steps in our RAG pipeline.
After that, we'll talk through the user experience of the knowledge base, and we'll wrap up with some lessons from this project and our future plans. So let's start with an overview. What is the knowledge base? It's basically a way for us to get closer to a human experience. If you're training a human SDR, you bring them in, dump a bunch of documents on them, they ramp up over a period of weeks or months, and you can check in on their progress. Similarly, the knowledge base is a centralized repository on our platform for seller info: users come in, dump all their source material, and we're then able to reference that information at message-generation time. Now, what resources do SDRs care about? Here's a glimpse: marketing materials, case studies, sales calls, press releases, and a bunch of other stuff. How do we bucket these into categories that we're actually going to parse? We created documents and images, websites, and media (audio and video), and you're going to see why that's important. Here's an overview of what the architecture looks like. It starts with the user uploading a document or resource in the client. We save it to our S3 bucket and send it to the back end, which creates a bunch of resources in our DB and kicks off jobs depending on the resource type and the vendor selected. The vendors do the parsing asynchronously. Once they're done, they send us a webhook, which we consume via ingest. Once we've consumed that webhook, we take the parsed artifact we get back from the vendor, store it in our DB, and at the same time embed it and upsert it to Pinecone.
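The ingestion path just described (consume the vendor's webhook, persist the parsed artifact, embed it, and upsert it to the vector index) can be sketched roughly as follows. Everything here is a simplified stand-in: in-memory dicts replace the real database and Pinecone index, and a toy hash-based embedder replaces a real embedding model.

```python
import hashlib
import math

# In-memory stand-ins for the real stores (the production system
# uses a database plus a Pinecone vector index).
DOCUMENT_DB: dict[str, dict] = {}
VECTOR_INDEX: dict[str, list[float]] = {}

def toy_embed(text: str, dim: int = 8) -> list[float]:
    """Deterministic stand-in for a real embedding model:
    hash the text and normalize the first few bytes to unit length."""
    digest = hashlib.sha256(text.encode()).digest()
    vec = [b / 255 for b in digest[:dim]]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def handle_parse_webhook(payload: dict) -> str:
    """Consume a (hypothetical) vendor webhook carrying parsed markdown:
    store the artifact, embed it, and upsert it to the vector index."""
    resource_id = payload["resource_id"]
    markdown = payload["markdown"]
    DOCUMENT_DB[resource_id] = {"markdown": markdown, "status": "parsed"}
    VECTOR_INDEX[resource_id] = toy_embed(markdown)
    return resource_id
```

The payload shape and function names are illustrative; the real webhook contract depends on the parsing vendor.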
Once we've stored it in our local DB, we update the UI, and eventually our agent can query Pinecone, our vector DB, for the stored information we just put in. Now that we have a high-level understanding of how the knowledge base works, let's dig into each individual step in the pipeline. There are five steps: first parsing, then chunking, then storage, then retrieval, and finally visualization, which sounds a little untraditional, but we'll cover it in a moment. Let's start with parsing. What is parsing? I think we all take this for granted, but it's worth defining. Parsing is the process of converting a non-text resource into text, and it's necessary because, as we all know, language models speak text. To make information represented in another form, like a PDF, an MP4 file, or an image, legible or useful to the LLM, we first need to convert it to text. So one way of thinking about parsing is that it's the process of making non-text information legible to a large language model. Multimodal models are one solution to this, but there are enough restrictions on multimodal models that parsing is still relevant. To illustrate, we have the five resource types we mentioned a moment ago going through our parsing process, and what comes out is markdown, a type of text that contains structural information and formatting that is semantically meaningful and useful. Let's talk about how we implemented parsing. The short answer is that we didn't; we didn't want to build this from scratch, and we had a few reasons.
The first is that, as you just saw, we had five different resource types and a lot of different file types within each of them. We thought it would be too many and too much work, and we wanted to get to market quickly. The last reason was that we just weren't that confident in the outcome. There are vendors who dedicate their entire company to building an effective parsing system for a single resource type. We didn't want our team to have to become parsing specialists for each of these resource types and build a system for each; we thought that if we tried, the outcome probably wouldn't be that successful. So we chose to work with a vendor, and here are a bunch of the vendors we came across. You can find 10 or 20 or 50 with a quick Google search, but these are some of the leaders we evaluated. To make a decision, we came up with three specific requirements. First, we needed support for our necessary resource types; that goes without saying. Second, we wanted markdown output. And third, we wanted the vendor to support webhooks, so we could receive the output in a convenient manner. A few things we didn't consider to start with: accuracy. Crazy, right? We considered neither accuracy nor comprehensiveness. Our assumption was that the vendors leading the market would all be within a reasonable band on both. Accuracy refers to whether the extracted output actually matches the original resource; comprehensiveness is how much of the information in the original makes it into the final output. The last thing we didn't really consider was cost, to be honest, and that's because this system was pre-production.
We didn't have real production data yet and didn't know what our usage would be, so we figured we'd come back and optimize cost once we had real usage data. On to our final selections. For documents and images, we chose LlamaParse, which is a LlamaIndex product; I think Jerry was up here earlier today. We chose LlamaParse because, first, it supported the most file types of any document parsing solution we could find, and second, their support was really great: Jerry and his team were quick to get in a Slack channel with us, I think within just a couple of hours of our initial evaluation. With LlamaParse, we're able to turn documents like this PDF of an 11x sales deck into a markdown file like the one you see on the right. For websites, we chose Firecrawl. The other main vendor we considered was Tavily, and this is not really a knock on Tavily. We chose Firecrawl because, first, we were familiar with them, having worked with them on a previous project, and second, Tavily's crawl endpoint, the endpoint we would have needed for this project, was still in development at the time, so it wasn't something we could actually use. Similar to LlamaParse, with Firecrawl we can take a website, like this homepage you see here, and turn it into another markdown document. Then we have audio and video, for which we chose a newer upstart vendor called CloudGlue. We chose CloudGlue because, first, they supported both audio and video, not just audio, and second, they were capable of extracting information from the video itself, as opposed to just transcribing it and giving us back a markdown file containing the audio transcript.
So with CloudGlue, we're able to turn YouTube videos, MP4 files, and other video formats into markdown like you see on the right. Now that everything is markdown, we move on to the next step: chunking. We have a blob of markdown, and we want to break it down into semantic entities that we can embed and put in our vector DB. At the same time, we want to preserve the structure of the markdown, because it carries inherent meaning: something being a title versus a paragraph is meaningful. So we're splitting these long blobs of text, like ten-page documents, into chunks that we can eventually retrieve after we've embedded and stored them in a vector DB. Now, chunking strategies. There are various things you can do: split on tokens, split on sentences, split on markdown headers, or make LLM calls and have an LLM split your document into chunks, or any combination of the above. What you want to ask yourself when deciding on a chunking strategy is: what kinds of logical units am I trying to preserve in my data? What do I eventually want to extract during retrieval? What strategy will keep those units intact while still letting me embed and store them in whatever DB I'm using? Should I try a different strategy for different resource types, since we have to deal with PDFs, PowerPoints, and videos? And finally, what kinds of queries or retrieval strategies am I expecting? We ended up with a combination of all the strategies we mentioned.
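One way to sketch that kind of waterfall (split on markdown headers, fall back to sentences, fall back to a hard token split) is shown below. The token budget and the whitespace "tokenizer" are illustrative stand-ins, not the actual limits or tokenizer 11x uses.

```python
import re

MAX_TOKENS = 100  # hypothetical per-chunk budget

def count_tokens(text: str) -> int:
    # Whitespace split as a stand-in for a real tokenizer.
    return len(text.split())

def split_sentences(text: str) -> list[str]:
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def split_tokens(text: str, max_tokens: int) -> list[str]:
    words = text.split()
    return [" ".join(words[i:i + max_tokens])
            for i in range(0, len(words), max_tokens)]

def chunk_markdown(markdown: str, max_tokens: int = MAX_TOKENS) -> list[str]:
    # Pass 1: split on markdown headers to preserve document structure.
    sections = re.split(r"(?m)^(?=#{1,6}\s)", markdown)
    chunks: list[str] = []
    for section in (s for s in sections if s.strip()):
        if count_tokens(section) <= max_tokens:
            chunks.append(section.strip())
            continue
        # Pass 2: section too large, fall back to sentence splits.
        for sent in split_sentences(section):
            if count_tokens(sent) <= max_tokens:
                chunks.append(sent)
            else:
                # Pass 3: last resort, hard token-count split.
                chunks.extend(split_tokens(sent, max_tokens))
    return chunks
```

A production version would also pack adjacent small sentences into one chunk rather than emitting one chunk per sentence, but the waterfall structure is the point here.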
We split on markdown headers first, and then it's a waterfall: because we want the records in our vector DB to stay within a certain token count, we split on markdown headers, then on sentences, then eventually on tokens. That has worked well for us across all types of documents. It successfully preserves our markdown chunks, which we can cleanly show in the UI, and it prevents super-long chunks, which dilute the meaning of your document if you end up with them. Okay, so we've split all of our markdown into individual chunks; now it's time to put those chunks somewhere. Let's talk about storage technologies. I'm sure everyone here for the RAG section assumes we're using a vector database, and we actually are. But to be pedantic, RAG is retrieval-augmented generation, and any time you're retrieving context from an external source, whether it's a graph database, Elasticsearch, or a file on the file system, that also qualifies as RAG. Some of the other options you can use for RAG: graph databases, document databases, relational databases, key-value stores, even object storage like S3. In our case, we did use a vector database, because we wanted to do similarity search, which is what vector databases are built and optimized for. Once again, we had a lot of options to choose from; this is not an exhaustive list. In the end, we chose Pinecone. We chose Pinecone because, first, it was a well-known solution: we were new to the space and figured we probably couldn't go wrong with the market leader. It was cloud-hosted, so our team wouldn't have to spin up additional infrastructure. And it was really easy to get started.
They had great getting-started guides and SDKs, and they had embedding models bundled with the solution. For a vector database, you typically have to embed the information before it goes in, which would require a third-party or external embedding model. With Pinecone, we didn't have to find another embedding model provider or host our own; we just used the one they provide. And last but not least, their customer support was awesome: they got on a lot of calls with us, helped us analyze different vector database options, and helped us think through graph databases and graph RAG and whether those made sense for us. On to retrieval, the R in the RAG workflow we just built. There's been an evolution of RAG techniques over the last year. We started with traditional RAG, where you pull information and use it to enrich the system prompt for an LLM API call. That eventually turned into agentic RAG, where you have tools for information retrieval, attach those tools to whatever agentic flow you have, and the agent calls them as part of its larger flow. And something we've seen emerge in the last couple of months is deep research RAG, where deep research agents come up with a plan and then execute it, and the plan may contain one or many retrieval steps. These deep research agents can go broad or deep depending on the context needs, and they can evaluate whether or not they want to do more retrieval. We ended up building a deep research agent. We used a company called Letta; Letta is a cloud agent provider, and they're really easy to build with. Here's how it works: we pass the lead information to our agent, and it comes up with a plan.
The plan contains one or many context retrieval steps; the agent does the tool calls, summarizes the results, and then generates an answer for us in a nice, clean Q&A format. This is how it looks for a system with two questions that we ask. Now, on to visualization, the most mysterious part of the pipeline. What does visualization have to do with a RAG or ETL pipeline? For context, our customers are trusting Alice to represent their business. They really want to know that Alice knows her stuff, that she actually knows the products they sell, and that she's not going to lie about case studies or testimonials or make things up about the pain points they address. So how can we reassure them? The solution we came up with is to let users peek into Alice's brain. Get ready: this is what that looks like. We have an interactive 3D visualization of the knowledge base available in the product. What we've done here is take all of the vectors from our Pinecone vector database and project them down to just three dimensions, so we can render them as nodes in three-dimensional space. Once the nodes are visible in this space, you can click on any node to view the associated chunk. This is one of the ways that, for example, our sales team or support team will demonstrate Alice's knowledge. Now, how does it look in the actual UI? You start off with this nice little modal: you drop in your URLs, your web pages, your documents, your videos, and you click Learn, and it all shows up nicely in the UI. You have all the resources there, and you have the ability to interrogate Alice about what she knows of your knowledge base, via a really nice agent that we built, again using Letta.
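The projection step described above, collapsing high-dimensional embedding vectors down to three dimensions for rendering, could be done several ways; the talk doesn't name the exact method 11x uses, so here is a minimal PCA sketch via NumPy's SVD, purely as an illustration:

```python
import numpy as np

def project_to_3d(vectors: np.ndarray) -> np.ndarray:
    """Project (n, d) embedding vectors down to (n, 3) via PCA:
    center the data, take the top three principal axes from the SVD."""
    centered = vectors - vectors.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:3].T

# Example: 100 fake 768-dimensional "embeddings" -> 100 points in 3D space.
rng = np.random.default_rng(0)
points = project_to_3d(rng.normal(size=(100, 768)))
print(points.shape)  # (100, 3)
```

Each 3D point would then be rendered as a clickable node, keyed back to its source chunk. Nonlinear methods like UMAP or t-SNE are common alternatives when cluster structure matters more than global geometry.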
And here's how it looks in the campaign creation flow. On the left-hand side, the knowledge base content shows up as a nice Q&A: you can click on a question and it shows a dropdown of the chunks we retrieved, which were used as part of the messaging flow. With that, we've achieved our goal: our agent is now closer to a human. We're basically emulating how you onboard a human SDR: you dump in a bunch of context, and they just know. In conclusion, the knowledge base was a pretty revolutionary project for our product. It really changed the user experience and also leveled up our team a lot. We learned a lot of lessons; it was hard to narrow this slide down, but there are three I want to highlight today. The first is that RAG is complex. It was a lot harder than we thought it would be: there were a lot of micro-decisions to make along the way, a lot of different technologies to evaluate, and supporting different resource types was hard. Hopefully you all have a better appreciation of how complicated RAG can be. The second lesson is to first get to production, then benchmark, then improve. With all of those decisions and vendors to evaluate, it can be hard to get started, so we recommend shipping something to production that satisfies the product requirements, then establishing real benchmarks you can use to iterate and improve. The last lesson is to lean on vendors. You're all going to be buying solutions, and vendors are going to be fighting for your business. Make them work for it; make them teach you about their offerings and why their solution is better. As for our future plans: first, to track and address hallucinations in our emails.
Second, to evaluate parsing vendors on accuracy and comprehensiveness, the metrics we identified earlier. Third, to experiment with hybrid RAG: introducing a graph database alongside our vector database. And finally, to focus on reducing cost across our entire pipeline. If any of this sounds interesting to you, we are hiring, so please reach out to either Satwik or myself. Join us. And thank you all for coming today.