AI Engineer World’s Fair 2025 - Tiny Teams
Channel: aiDotEngineer
Published at: 2025-06-05
YouTube video id: xhKgTkzSmuQ
Source: https://www.youtube.com/watch?v=xhKgTkzSmuQ
Okay, hi everyone. So excited to have you here today for what I believe is the first ever edition of Tiny Teams here at the AI Engineer Summit, courtesy of our friend Sean Wang, aka swyx. My name is Britney Walker. I'm a GP at a venture capital fund called CRV. We've been in business for 55 years, backing teams at the seed and Series A stage across 19 funds, and we're currently investing out of a billion-dollar fund. We're backing folks like Vercel, Postman, Kong, Browserbase, and Voyage AI across the infrastructure landscape, and hence I'm here today with you all. We have a super exciting track for you, pointing to a trend we've all seen over the past year and a half to two years of AI: small teams can now build insanely successful projects in a way that probably was never possible previously. Here to kick things off for us is Eric Simons from StackBlitz with their product Bolt.new.

All right, is this thing on? Okay. How's it going, everyone? Let's get the vibes going. Excited to chat here today. By the end of this, what I hope you get out of it is some advice I wish I had had before trying to hold on to the tail of the dragon these past months. How many people here have heard of Bolt.new, by the way? Oh wow. I'm still used to asking "has anyone heard of StackBlitz?" and seeing two hands go up. Okay, cool, so everyone's used this thing or tried it out. Is anyone aware of how long we were around as a company before we launched Bolt? Seven years.

To put that on a graph: if you rewind the x-axis seven years back, the ARR at the bottom of the curve, in October of last year when the ramp starts, was $0.7 million. That's what seven years had gotten us to. At the time we launched Bolt, we were a team of fewer than 20 people, and when we put it online we had absolutely no idea what was going to happen. We actually thought we were getting ready to shut down the company at the end of last year. This wasn't even really a pivot, because Bolt is built on the same core technology we'd been building for seven years, but we couldn't figure out a way to create a commercial offering around it that made sense at venture scale. So our expectation was: if we can add $100,000 of ARR by the end of the year with this thing, that would be game-changing. What happened was obviously beyond our wildest expectations, and since then we've more than doubled it.

To me, the really crazy thing about the graph is how clean the ramp is. There are no jagged edges from the insanity of the early days, and the product we put online was really an MVP. It's like those race cars where they strip everything out and it's just metal: no back seat, no passenger seat.
That's what the product was like. So the fact that our team was able to scale this was just unbelievably impressive, and that's what I want to talk about: what that looked like, and how to structure teams so they can rally together and scale into what would normally take at least a year to grow into.

The best analogy for what this period felt like is the movie 300. On that revenue ramp, probably toward the tail end, we were looking at 30 or 40 thousand active customers in month two, with a team of fewer than 20 people. So the image is apt: a small group of people surrounded by tens of thousands of things that, in our case, weren't trying to kill us, but it felt like that. The support load was unbelievable. There was not a single person on our team with "success" or "support" in their title; my chief of staff and I largely responded to the support tickets. The main reason we were able to make it work was the incredible camaraderie on our team. We'd been working together for seven years at that point, extremely aligned, very lean, and very fast, and none of that was new for us. That's how we'd been operating all along.

One of the core philosophies my co-founder and I set out with goes back to the company we did before StackBlitz. He's actually a childhood friend of mine; we've been building websites together for 20 years now, literally since we were 13. That previous company we bootstrapped from the ground up ourselves, broke and living on couches, and when you do that, you really learn how far a dollar can stretch. It becomes very obvious how incredibly inefficient most startups are during the phase of trying to find product-market fit. So this is where a mantra we've had at our company for almost a decade really kicked in: you want a small number of people with more context per head. What that means is that people at the company have more agency. They can just go and build things without asking permission; there's no chain of command to work through. Everyone is empowered, things move a lot faster, and people can make immediate impact, which really matters when you're dealing with this kind of scale.

For startups, the name of the game when you're finding product-market fit is taking as many shots on goal as you possibly can, because getting to product-market fit works just like an enterprise sales pipeline. If you're an enterprise sales rep who wants to close a million dollars of pipeline, you don't talk to three people and assume you'll close it. You put 100 or 200 people in your top of funnel; maybe half of those take the next call; maybe 10 of those remain as actual warm leads; and of those, you close three or so.
It's the same thing with building products and startups in the early phases. You need to stick around as long as you possibly can, which means you need a lower burn rate, which means you do not want more people at the company, because humans are the most expensive thing a company pays for. That doesn't mean you shouldn't hire; it means headcount directly determines whether you have enough runway to keep taking shots on goal. Case in point: one of our main competitors from our previous product, back when we were in the IDE space, got acquired and basically stripped for parts two weeks before we launched Bolt. We were on that same trajectory. It was purely a matter of them not having enough runway to get to the other side, and they were good; they would have been a meaningful competitor to us in this space.

So there are a whole bunch of reasons this matters, and those are the main ones. Again, none of this is new; a lot of folks who do startups repeat this sort of thing. You want people with a shared set of core values: low ego, high trust, obsessed with making the user successful, and, underneath chaos, grit and resilience. If you aren't in an insane situation like ours and you already have folks struggling with the normal ups and downs of a startup, I will tell you: what we did would not have been possible with people who lacked incredible grit and the ability to check their ego and focus on what really mattered. Great people are what it's always about. From a team perspective, those are the things that stick out to me about what allowed us to scale with the traction we saw, and are still seeing.

And this applies beyond our crazy extreme situation. At startups in general, there are going to be times when everything is on fire, and a lot of you can probably relate: sometimes good things are on fire, like tons of customers; sometimes bad things are on fire. There are just lots of fires, and the question is how you prioritize. The best analogy I've leaned on as an operator: imagine you're a fire truck squad with one truck in a town that's completely on fire. Where do you start? The answer is that you have to make hard decisions and choose the high-impact areas, the key infrastructure and the key people that need to be saved. It's tough, because it's hard to gauge what's actually going to matter most, but that's the job of firefighting. What you're really saying is that some fires are just going to have to burn, and that's okay. If you focus on saving the right things, that makes up for everything else you have to let go, because a small team simply can't focus on everything.
There's actually an added benefit: you don't get lost in the million things. If you hire a whole bunch of people, you feel like you have to do all of those things. It turns out that focusing on 10% of the things often gets you the lion's share of the results that actually matter. Staying small forces clearer thinking about where you put your time and focus as a team.

I mentioned we've been around a long time, eight years now as a company, and over those eight years in the Valley there have been a lot of things people say and believe, repeated at every gathering, that then suddenly flip. A couple of random examples. Back when we started in 2017 and 2018, remote work was very looked down upon; there was no way you could do that. But the best candidates my co-founder and I saw were coming in from all around the world. We had actually gotten an office in SF, thinking we'd set up shop here, and six months into paying $5,000 a month for it, we realized we hadn't hired a single person in the city. What were we doing? We went fully remote in 2018. Then the pandemic hit and the world said remote work is it; now it's flipping back again. You need to have your own thinking, because if you just follow whatever the press or investors say, it's going to be a nightmare. You'll be distracted by a bunch of decisions that don't actually come from your own assessment of reality.

Another great one, on the topic of tiny teams: if you were a company raising money in 2021, investors were screaming at you, ours included, to raise more money and hire a whole bunch more people; that's how you succeed. If you waited 12 months, in 2022 the same people came back and said you need to lay off a bunch of people and stop spending money. For us, we were never really spending, and we never did increase the headcount. So you want to make your own bets. I don't want to say be contrarian for contrarian's sake; some of the oft-repeated stuff actually turns out to be durable advice. But I'd encourage you to think for yourself and not just adopt the hive mind, because the best companies seem to have the independent decision-making that really allows them to succeed.

Of course, leading from the front is very important. Again, not a new idea, but in the first week of Bolt being online it was pretty touch-and-go, because the product was very brittle. It became clear to me that if I and the team didn't get out, make ourselves visible to the community, and engage with them, people were going to churn and lose belief pretty quickly, because we had so much work to do.
So we started running a weekly office-hours session where we let all users tune in on YouTube or X and we just showed them what we were building: "Hey, we hear you. Here are the things we're working on, and here's where we think they're going to land." People would ask questions, and so on. How do you smooth that kind of growth curve? You go and do things that don't scale. User love is hard to quantify, but oh my god, it works. That's how you scale love for a product like this.

The last thing I'll mention, as far as tools go: support is an area where a lot of AI tools are now coming out that help you scale every aspect of your business, and it has been a huge one for us. For the first two months we were online, as I mentioned, my chief of staff and I were the primary support people, spending a lot of our time on email. We ended up picking up a tool called Parahelp, if anyone's heard of those guys. Their AI agent, Sam, is our top-rated support assistant and resolves 90% of our tickets automatically. A year or two ago, we would have had to hire 50 people to scale that. That's the leverage you get from integrating AI, and there are even custom things we're doing in our product, training our own small models to help people succeed within the product experience, things that would have required human support before. There's a lot you can do by building AI into the entire customer success journey, not just the product itself. On Parahelp: we're one of their customers, I think Cursor is using them too, and it's a couple of brilliant young guys, out of Europe I believe, running that company.

I mentioned leading from the front, and the related piece is community. This is something AI cannot replace. Actually talking to users, and creating a space where users can try your product and learn from each other, is so key. That's always been the case, but especially if you're building an AI product, it's really important that folks can learn from each other somewhere they can get help from pros and from the community itself, because this is another way to scale the customer experience without adding headcount inside your company. One of the cool ways we're doing this right now, I don't know if anyone's seen it: we're throwing the world's largest hackathon, running for this entire month. If you go to hackathon.dev you can check it out. We've already passed the Guinness World Record; we've got eighty-something thousand people participating. We have dozens of people coming to help provide support as folks build out their projects and try the product. And it's had the craziest ROI of any marketing initiative we've ever done,
both because of the scale and because of the thoughtfulness of augmenting it with both AI support and community support. This sort of stuff really works.

So, to wrap up, these are the main takeaways if you want to take a photo of the TL;DR: the things that stuck out to me from the past couple of months that really made a difference. Like I said, it was very touch-and-go, especially in the first two months, given how unexpected this was and how unprepared we were for what happened. Without these things it would not have worked, and it wouldn't be working now. To boil it down: you don't want to hire an army, you want a small number of Spartans. That's the mentality we look for when we hire people onto the team. All right, this is where you can find me. I have to go to SFO immediately after this, but if anyone wants to chat or has questions, that's me on X and that's my email address. I think we have one minute for questions if anyone has a burning one. There's a microphone up here if you want to come up.

How did you decide what to build? Did you have a framework for talking to users, or did you just ideate, ship product experiments, and see what stuck? What was the process?

You mean how we decided to build Bolt? We tried out probably five different things last year, and everything I've ever built that really stuck with users and resonated always started with something I myself thought was cool. That sounds very obvious, but most of the things I've built in my career were things that merely sounded good, like "hey, this should increase our ARR," without intrinsically being something I was so stoked about that I couldn't sleep at night. Bolt was one of those things. Then we certainly put it in front of users, and people seemed excited. And I'll tell you this: the user feedback we got from the early Bolt sessions before we launched was exactly the same as what we'd gotten launching StackBlitz, and the outcomes couldn't have been more different. So again, it's all about taking shots on goal, because you just don't know until you actually get it out into the world. You can certainly get early feedback, but it's all about getting to launch, getting it out there, and iterating as fast as you can. Am I cut off? Okay. I can't take any more. Thank you for having me; hopefully this was helpful.

Thanks so much, Eric, for walking through the amazing journey that has been StackBlitz and now Bolt. Next up we're going to have Sid Bendre come to the stage to talk about Oleve. Sid is a co-founder of the company, and they're building a portfolio of consumer products, starting with products like Quizard, which you may have heard of before.
One of their products reached number four on the App Store education charts in 2024 and number five in 2025, alongside companies like Duolingo. They're backed by Neo, and they're building the AI infrastructure to build a billion-dollar portfolio of consumer software over the next decade. So Sid, please come up to the stage.

Cool. Hey everybody. Sorry about the delay; I was just trying to get connected. I'm Sid, one of the co-founders of Oleve, and this is the new lean startup. We've been seeing a fundamental shift in how successful companies are built. More and more companies are getting smaller, rounds are getting delayed, and profitability is being attained earlier than ever in a company's lifetime, driven mainly by the advent of AI tooling. These companies are generating millions in ARR with teams smaller than most startups' engineering departments. The age of bloated teams and endless hiring rounds is over. Welcome to the era of tiny teams.

First, a bit of background on Oleve. We're building a family of iconic consumer software products that we hope will enable people to live better, more fulfilling, and more productive lives. We are a tiny team that scaled a portfolio of virally successful products to $6 million in ARR, profitably, and we've generated over half a billion views across social media, all with a team of just four. We're based out of New York City.

Here's a brief history. On January 26th, 2023, we launched the Quizard AI mobile app with a TikTok video that went viral overnight, generating a million views that turned into 10,000 users in less than 30 hours. We actually started scaling with no LLM costs, because back then the initial Codex model had just launched in beta preview. Funny enough, we were cycling between 10 different accounts from our friends just so we could prompt-engineer and generate these AI outputs. Interestingly, Codex, even though it was meant to be a coding model, could be prompt-engineered for any open-domain conversation. As you may know, it ended up being sunset for abuse; OpenAI actually reached out to us directly on a few of the accounts we were cycling through, as one of the top users of the Codex model at the time.

My co-founders and I then graduated and moved to New York City in the fall of 2023, where we ran our back-to-school campaign, a series of man-on-the-street videos at prestigious colleges across the US. That's when we hit our first million dollars in ARR and achieved profitability, within our first nine months of operating. We had another successful campaign in the spring of 2024 that got us all the way to number six in the education charts, alongside giants like Duolingo, Photomath, and Gauth. We then took all our learnings from that spring and doubled down on a new product, Unstuck AI, a study companion for students. We got to a million users in under nine weeks and generated over a quarter billion views across socials in a month. A few weeks ago, we got both products into the top 10 of the education charts; Unstuck went all the way up to number three, right under Gauth and Duolingo. We've now also launched our third product in stealth, our first outside the education domain.
It took three weeks to build, thanks to all the blueprints we've built in advance (more on this later), and it has already reached a thousand-plus users. By the way, it's already profitable.

Our lean playbook boils down to three key pillars: operating principles that lay the foundation of leanness, organizational structure that sets up the systems for it, and AI tooling and augmentation that optimizes scaling.

Let me start with operating principles, which I believe are the bedrock of why we're so lean. It starts with hiring: we either hire right or not at all. We only hire 10x generalists who have multiple complementary spikes in adjacent fields. For example, our product engineers are full-stack developers, great product thinkers, and really strong on computer networking fundamentals. We also have marketers who can code, designers who can build, and the like. We aim for people whose complementary spikes can shape and drive 10x outputs within the team.

The second key principle is a profit-first mentality. We are relentless about prioritizing profit, because profit is power and profit is focus. It gives us a clear mechanism for making all our decisions and a north star for the company. That leads to our third principle: does this move your KPI? Everyone in the company owns a KPI. KPI alignment removes micromanagement, because everyone is focused on moving their metric week over week, and every decision must be validated against that KPI.

Our fourth principle is continuous process refinement. For any repeating process, we always ask: how would we do this better? Is there any way to improve? What went wrong in the previous run? We view failures and issues in the company as systems failures, which lets us set up a feedback loop for improving ourselves and our processes, both operationally and technically.

The fifth pillar is super tools. We're pretty lazy, so we like to consolidate our work: don't learn it twice. We believe in building compounding benefits by investing in technical playbooks and operational blueprints, so the learning compounds and the benefits carry over to new products. This is exactly how we hit a million users on Unstuck within nine weeks, by taking everything we'd learned over a year and a half on Quizard.

More on the super tools concept. One of our super tools is LaunchDarkly. Its intended use case is feature management: helping software teams control and release features safely and quickly. Here are some of our extended use cases. We use LaunchDarkly as a manual traffic load balancer. Specifically, we put LaunchDarkly in between all our LLM calls so that we can reroute traffic to different LLM providers when we hit rate limits, for strategic initiatives, or whatever else. It gives us an on-the-fly mechanism for choosing where our traffic goes and lets us split load within rate limits.
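As a rough illustration of this pattern (not Oleve's actual implementation), a flag can hold an ordered list of providers, so rerouting traffic is just an edit in the LaunchDarkly dashboard. The flag name and provider stubs below are hypothetical:

```typescript
// Hypothetical sketch: a LaunchDarkly flag as a manual LLM load balancer.
// Assumes a flag "llm-provider-priority" whose value is a JSON array of
// provider names, editable live from the dashboard without a code push.
import * as ld from "launchdarkly-node-server-sdk";

const client = ld.init(process.env.LD_SDK_KEY!);

// Stubs standing in for real vendor SDK calls (Azure OpenAI, OpenAI, etc.).
const providers: Record<string, (prompt: string) => Promise<string>> = {
  "azure-openai": async (prompt) => `azure response to: ${prompt}`,
  "openai": async (prompt) => `openai response to: ${prompt}`,
};

export async function completeWithFailover(userKey: string, prompt: string) {
  await client.waitForInitialization();
  // Read the current priority order for this user context.
  const order: string[] = await client.variation(
    "llm-provider-priority",
    { key: userKey },
    ["openai"] // default if the flag is missing
  );
  for (const name of order) {
    try {
      return await providers[name](prompt); // first healthy provider wins
    } catch {
      // Rate-limited or down: fall through to the next provider in the list.
    }
  }
  throw new Error("all providers in the waterfall failed");
}
```

The same reorder-a-list-in-a-flag trick is what makes the file-ingestion waterfall described next reconfigurable on the fly.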
This was especially important in the early days, when rate limits were really tight and it was hard to get quota increases on individual endpoints, specifically on Azure OpenAI.

The second extended use case is on-the-fly infrastructure changes. For us, this looks like the ingestion pipeline on Unstuck, which takes in a lot of files: for specific file formats, we have waterfall ingestion processes, meaning we depend on a number of third-party services of varying reliability. Using LaunchDarkly, we can change the prioritization of those processes on the fly, so if one third-party service goes down, we can reorder the waterfall immediately and keep the service up and available to users worldwide. The third extended use case is UI modifications and paywall experiments without code pushes: we've built an experimentation layer around LaunchDarkly that lets us run and spin up experiments without needing to ship code.

The second pillar that guides our leanness is our organizational structure, especially the way we hire and organize our engineers. For this we look to Palantir, which successfully scaled across multiple market segments; we believe we're building the consumer version of Palantir with our harvester and cultivator model. Let me explain. Harvesters are product engineers, similar to Palantir's Deltas, the forward-deployed software engineers, who own and live and die by their products. They live in the metrics, run A/B experiments, build features end to end, work with the marketing team, and effectively own the product's entire existence. Harvesters are people who build products that people actually want and pay for. Then we have the cultivators: AI software engineers whose main goal is building the company's agentic operating system. They're pioneering automation across business units including marketing, design, and product, with the idea of building infrastructure that touches all users everywhere and helps us win in every market. Cultivators create the foundation that lets us ship and scale faster in any market.

Finally, the last pillar is AI and tool augmentation. One important note: when we think about hiring, we see tool use as something that turns a 10xer into a 100xer, as opposed to the contrast, which is using tools to fill gaps and paper over the shortcomings of someone below the bar we hire for. With that said, we use a slew of products for day-to-day task automation: script writing, campaign analysis, operations, code generation, and communications. Effectively, by paying for a bunch of services, we've given everyone in the company their own chief of staff. And back to the blueprints: we believe heavily in compounding benefits from using the latest and greatest AI models.

With models changing so quickly, and with you having so many apps out there, do you ever struggle with going back through and changing the models you've used for some of these apps? How do you deal with that?

Yeah, that's a great question. A really cool thing is the fact that you can do that: you can build an app with an AI model, a better model comes out three months later, and a lot of the time it's a one-line change, "let me update this model," and the app just gets way better, or it unlocks new things entirely. That's something I do frequently: I'll go back and even relaunch an existing app with a new AI model, or add a tiny feature to it. I think that's kind of the superpower of building with AI, the fact that you can just swap these models out.
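A minimal sketch of what makes that a one-line change: keep the model id in a single config constant (or an environment variable) rather than hard-coding it at every call site. The model name and function below are illustrative, not any speaker's actual code:

```typescript
// Sketch of the one-line model upgrade: centralize the model id so every
// call site picks up a new model automatically when this constant changes.
import OpenAI from "openai";

const MODEL = process.env.APP_MODEL ?? "gpt-4o-mini"; // the one line to change

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

export async function generate(prompt: string): Promise<string> {
  const res = await client.chat.completions.create({
    model: MODEL,
    messages: [{ role: "user", content: prompt }],
  });
  return res.choices[0].message.content ?? "";
}
```

Swapping to a newer model is then a single edit (or an env-var change at deploy time), and every feature built on `generate` improves at once.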
Thank you. All right, awesome. Thank you all so much for coming; I appreciate it.

Thank you so much, Hassan, for walking through all of that. So impressive that you do all of this on top of your day job; I cannot even imagine. Okay, our next and final speaker for this portion of the tiny teams session is Max Brodeur-Urbas from Gumloop. Previously he did competitive programming while at McGill, went through YC a little over a year ago, and has achieved just incredible traction in such a short time, now scaling automation across companies like Instacart, Webflow, and Shopify while still having fewer than 10 people on the team. So incredible. Without any further ado, please welcome Max to the stage.

Okay, the mic is working; the screen is not working yet. Anything I have to do in particular to make this work? Ah, the one hanging decoy wire. Yes. Okay, sweet. There we go. This should preview in a second, but yeah, I'm Max, the founder of Gumloop. We went through YC a year and a half ago now, Winter '24, and we've been a pretty notoriously small team since then. We raised our Series A as a team of two and are now nine people. This tweet was kind of the one that inspired this talk, about how we'll scale to the size we hope to be with fewer than 10 people. I'll be honest, I tweeted it when I was extremely caffeinated and really thought I was going to rule the world. We're roughly on track: we're under 10 people and growing really fast. But it was also a good Twitter post for hiring, because we wanted to hire exceptional people, and I think working on a small team is really fun.

I'm sure at this conference you've heard a lot about which AI tools to use and how to work efficiently with Cursor and Windsurf, so I'm going to focus instead on how, once you're efficient with those tools, you build a team that has the right culture and can actually scale and do the things you're setting out to do. But first, how we got here. I spent about six months building a ton of terrible, terrible software: video game moderation software, ML models to detect children's ages in video games so you could separate adults from children in VR, bot detection software. And then, as a side project on top of my side project, I made the first UI for AutoGen, the really hyped open-source framework that came out right at the start of the agent craze. I noticed that everyone in that Discord was excited to use AI but had no idea how to actually clone a GitHub repo or set things up locally. So I just spun up a really ugly UI.
I called it AgentHub at the time; I thought it was going to be GitHub for agents. I thought this was really genius, but it was all built on the idea that agents were going to be immediately useful, so we pivoted pretty quickly. What I noticed was that all the people asking the agent to do things were basically just describing complex workflows. If they knew how to write some Python, make some API calls and some LLM queries, they could basically automate their entire request; they didn't need to cross their fingers and hope the agent would do it for them. That was the realization. It was my co-founder and I at that point, and we just started changing how you configure an agent: instead of asking for everything you want, you define the steps as a series of nodes in a workflow.

We got into YC a few months later, hired two interns for the summer, raised a seed, and then raised a Series A about four months after that. We were a really small team, kind of overfunded, but we raised a lot of money so that we could hire the most exceptional people over the next year. The general idea was just: scale with under 10 people, because after working at Amazon and Microsoft we'd noticed that working on a super small team is really fun. You can move way faster and not sit in meetings all the time.

So now Gumloop, which used to be way uglier, is this workflow automation tool that a bunch of really large companies are using. Our biggest customers are the likes of Instacart, and Shopify rolled it out to the entire company last week, which broke most of our things, but it's all back online now. And all of this is 100% PLG; we're not doing any outbound sales. I think that's one thing that helps you scale really quickly: if people find your product and come inbound, you don't have to hire 10 sales reps. So there's definitely a lot of luck and coincidence in this small-team approach working for us, because we happen to be a PLG company; it probably wouldn't be as possible with a top-down sales motion.

I thought I'd go over how we approach hiring, internal operations, and then team culture; these are things my co-founder and I talk a lot about internally. I did want to put a disclaimer here: I don't actually know what I'm talking about. I'm still trying to figure out whether we're just getting lucky over and over or whether our approaches are actually working. So take everything I say with a grain of salt, because I could be totally off base, and it might ruin your company if you do what I do.

The three things we try to do internally when we approach hiring are: be super, super picky, which is painful most of the time; product-led hiring, a buzzword we've been trying to coin; and making time to work together, which I'll explain in a second. This is a screenshot from the co-founder of Instacart, who ended up investing in our company. We would ask him for advice, because he's scaled a large company before, and run candidates by him. One time I sent him a candidate I thought was pretty good, and this was his only reply; he tends to write very short emails. The point: you shouldn't lower the bar.
If you aren't extremely excited about someone, if it's not a no-brainer, you shouldn't even consider hiring them. We've done hundreds of interviews and tons of work trials, which I'll explain in a second, but if you're going to be a super small team, every person needs to be absolutely exceptional. That oftentimes confuses your investors, because you're still such a small team and they gave you so much money to scale, but you have to be really thorough with your screening and really confident in every single person you hire.

We've been trying to coin this term "product-led hiring." Two of our customers ended up quitting their jobs to join the team, and those were some of the easiest hiring decisions we've made, because they already loved the product and had a ton of insight into how it could be used in a business. Our customer from Instacart, the one who originally found us and brought us into the company, ended up quitting and joining us, and now he handles a lot of our enterprise relationships and works with our larger customers. The other screenshot is our head of education and community. He was at Webflow before, selling a Zapier course and a ton of automation workshops, then found Gumloop and got super excited. That was a no-brainer. If you can focus on making a really great product that happens to be accessible to the people you want to hire, and there's a bit of luck involved there, it helps with the hiring process, because they know exactly what you do. You don't have to inspire them to join the team; they want to join on their own.

And then, making time to work together. Hopefully this video plays. This is only really possible with a really small team, but we do this thing where we rent Airbnbs and just go hack together for about four days at a time; we make three weeks of progress in a couple of days. The two people sitting on the left there were actually work trials. They were interviewing at the time, but we brought them with us just to hack. Doing this really intentional working-together period is the only way you'll actually know if you want to work with someone. So we always bring people in for work trials: they're on the team for several days as if they'd already joined the company, and by the end we're totally confident whether it's the right fit or not. We've honestly done way too many of these, but it's helped us make sure everyone on the team is exceptional.

On internal operations, there are three things here. We have almost no meetings, purposefully so. I try to just let people build; I hired great people, so my plan is to give them the space to build, which is easier said than done. And we automate everything internally, which is kind of a Gumloop self-plug. In terms of calendars: mine is always insane, because I'm talking to customers (I flew back from New York this morning, for example, because I was working with customers in person), but everyone else's calendar should ideally be totally blank. We try to give everyone deep focus time.
If you're an engineer and we hired you to build exceptional product, we should let you do that, not make you talk about building exceptional product for five hours every day. I think that's only possible with a really small team, because normally you'll have five people on a project who have to sync and agree on the terms before anyone even starts working, and that just leads to slowness everywhere.

Also, letting people build. I used to be really involved in every aspect of every feature we shipped, but now that we've hired exceptional people who are all better than I am at basically everything, all I do is try to inspire what features we should build. I'll make these really rough descriptions of features I think we should build based on talking to customers, and then I just let people do their thing. Let's see if this works. From that sketch of me saying "what if we had MCP nodes? What if you could automate workflows with MCP?", which was just the high-level prompt, I let the team cook. This video is exceptional; I wish it were playing, but basically they built a better product than I would have ever imagined. That's only possible if you hire great people, but once you do, you can really take a back seat and give them the space to be exceptional.

And then, automate everything you can. This is our internal Gumloop instance. We automate basically every part of the business as much as we can, and if there's something we can't automate, we build features on Gumloop to let us automate it. Before every meeting, we have a deep research report that tells us everything we need to know about the customer: not just their outward-facing information, but how they're currently using our product, whether they're a power user, which features they use, so we go into the meeting totally informed. Every time someone interesting signs up, we get notified about why and what they're doing on the platform, along with an email drafted in my inbox so I can reach out, hop on a call, and talk about why they made that free account; that's led to a ton of our growth. We also have an AI chatbot on the platform that gets around 50,000 messages a day, and a Gumloop workflow reads those chats so it can tell us what people are confused about, which we then use to inform our product decisions. A lot of these little tasks would have been someone's entire role, or taken three or four hours of their day, but now we use our own product to automate everything. So again, a lot of luck involved; you can be a small team if you are an automation company. But if you use Gumloop, maybe you could be more efficient too. That's the plug.
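As a rough sketch of that last feedback loop, in code rather than Gumloop's visual workflow, the chatbot-analysis step boils down to: pull recent chats, ask a model to cluster the confusion, and route the summary to the team. The log source and model choice here are illustrative assumptions:

```typescript
// Rough sketch of the chatbot-feedback loop described above. The model and
// prompt are illustrative; Gumloop expresses this as a visual workflow.
import OpenAI from "openai";

const client = new OpenAI();

export async function summarizeConfusion(chats: string[]): Promise<string> {
  // Cap the sample so the prompt stays within the model's context window.
  const sample = chats.slice(0, 200).join("\n---\n");
  const res = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      {
        role: "system",
        content:
          "You analyze support-chat excerpts. List the top 5 things users " +
          "are confused about, each with a rough count and one quote.",
      },
      { role: "user", content: sample },
    ],
  });
  return res.choices[0].message.content ?? "";
}
```

Run on a schedule and posted to a team channel, a digest like this stands in for the three or four hours a day someone would otherwise spend reading chat logs.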
Culture-wise, I think this is the most important thing. It's impossible to talk about having a really exceptional team if no one's having a good time, or if they're quitting. One of the most annoying things I say, basically every day when we talk about a feature a customer is asking for, is: what if we built it today? What would that look like? It's caught on, and now everyone on the team, and first of all they're exceptional, I've said that ten times, but they're really fast-building engineers, will often challenge themselves: what if we put a 45-minute timer on and try to ship this feature right now with Cursor?

But this can lead to crazy burnout. If you're always asking "what if we did it today?" on a Friday night at 8 p.m., people are going to have a bad time. So you have to be really intentional about making it fun. Like I mentioned, we do these retreats, and we pick a cool place, the kind of place I wish my boss had taken me when I was working at a company before this. We get a bunch of food and do a bunch of fun things like rock climbing and biking, and it offsets the intensity of building on such a crazy timeline for every feature. I don't think anyone would be having fun if we didn't have these really exciting times to look forward to. And again, this is only possible small: you can't fit 50 people in an Airbnb, but you can fit 10 pretty comfortably.

Being really intentional about your company culture is another thing I'm pretty adamant about. This is our company handbook. It's a month or two out of date, but basically everything we say internally, we put on a page so that we have to live up to it. We wanted to hold ourselves accountable for all the ways we talk about building a company. It's also one of the things that convinces most of the exceptional people on our team to join, or at least to book that initial call, because they read our outward-facing handbook and know what we're about before they even meet us. I'm kind of at the end; I was going to show the video but cut it a bit short. We are hiring a founding head of growth, so if you know anyone, you can email me there. Like I mentioned, it's a fun time, pretty intense, but hopefully you know someone, or you want to join the team and help us scale. Cool. Okay. [Applause]

Big fan of the product; I think it's really, really awesome. I've been using it and pitching it internally at my company, so I'm a huge fan. I'm curious how far you think you'll be able to get with 10 people. Are you still staying true to that, and how do you think about scaling out to, say, a billion users around the world with 10 people?

Yeah, I don't think it's possible to scale that big with 10 people; maybe 15 or 20. I wanted to set the bar really rigidly, and if I go a little over, it's no big deal, but at least we're not scaling to a hundred people and having eight hours of meetings every day.

What's your vision for the org structure when you do hit a billion with 15 or 20 people? What does the org look like?

It's been changing a ton. At first I was really naive, still super naive, but I thought we could do it with only engineers, because engineers can do anything; they can learn how to do marketing or sales or whatever. I was totally wrong. So we're now five engineers and four semi-technical people. I don't know exactly what the structure will look like, but we're starting to feel that our only bottleneck now is growth marketing: how do we share all these cool features we're building with the world?
And we're also getting hundreds of feature requests every day, so another engineer would definitely help.

Just to touch on that: when you're looking for the head of growth, are you looking for someone who also shares the "I can do this all myself with AI tools" mindset, or someone who's looking to grow a team as well?

Definitely not the latter. I call it a doer versus a to-doer. Sometimes you'll talk to someone about joining the company and they're like, "I'm really great at building out a team." That's the biggest red flag. They'd be great at listing all the things a team needs to do, but don't hire that person if you want to stay super small. We're looking for someone who says, "I can just make it happen." Then, once they hit their ceiling and genuinely can't scale further than that, that's the time to hire. But I wouldn't hire someone going in with the intention of hiring more people.

Hoping you can clarify on letting people build: is that individual developers and engineers, or the team working collaboratively? And how do you prevent fractures in the codebase, having it become disjointed?

I think it's only possible to let people do their own thing if they're really trustworthy, if you hired people you can depend on. Sometimes it goes wonky and we don't have the same understanding of what's being built, but then we sync over a five-minute chat and we're back on the same page. Generally, people know the direction, because you're talking and you're in the same office all day every day. They'll talk to a customer, realize something is a pain point, and just go ahead and ship it. You don't have to get in their way, make a spec doc, and figure out exactly how it's going to work. You should just trust them to build.

Can we do one over here? Yeah, sorry. How do you think about compensation? And do you look at these 10 employees as normal employees, or do you consider them more like founders? What are your expectations for them versus what a traditional startup might expect?

We try to compensate really competitively, because we raised around 20 million and we're such a small team that we're in a position to do that; that was also the main reason we raised, so we can compensate people and make their lives comfortable while they're building the future. We don't consider them founders. I wouldn't put that burden on someone. I'm the one waking up at 6 a.m. sweating because I had a nightmare about our back end crashing; I don't think they should be doing that. But we do treat them as full members of the team. Everything we ship is a discussion; there's no top-down order that we need to do x, y, or z. It's just flat collaboration on what we're going to build, and when, and how. Cool, thanks.

Hi. Do you think this sort of culture can translate to, and you might already be doing this, workflows that are highly complex in the enterprise?
In banking regulation, say, or parts of legal, the information lives in the heads of super experienced people, and at least in my experience, in those instances you need deeply non-technical people and technical people to work together, and the scaling sort of breaks down. Have you found ways around that? I'm curious about your advice for people in this space.

I think I understand the question: how do we support really complex workloads when we don't have the nuance of how to do them? We try to build the tools that let the person who understands the workflow do it themselves. At Shopify, if we're working with their head of legal, and they understand what contract review looks like at scale, hundreds of contracts a day, we make it really easy for them to use software that lets them build their own tool, instead of us trying to learn how to do their job better than they do. I think one more question.

Hey Max, quick question. With the work retreats, at what point in the interview process do the candidates go on them? And do you offer to pay them, and if so, are they 1099 contractors? How does that work?

We always do a screen with me first: I talk to someone for about an hour and figure out whether we could basically be friends. Then we do a technical interview that's super practical, no LeetCode stuff, just working in the codebase. Then we do the work trial, and if there's a work retreat coming up around that time, I'll just delay the work trial so they can come with us. We hire them as contractors, basically, so they're getting paid for their time; we wouldn't want to make someone work for free, and we just try to coordinate with their schedule, whenever they're free. Okay, thank you. Sweet, thanks everyone. [Applause]

All right, thank you so much, folks. That wraps up this part of the tiny teams session. We'll be back here at 2 p.m. with some more speakers, but thank you to all of our speakers, and enjoy the rest of the conference if I don't catch you back here in a bit.

Okay, welcome everybody to the afternoon session of Tiny Teams. My name is Britney Walker. I'm a GP with a VC firm called CRV. We invest in seed and Series A companies and have been doing that for 55 years now. We're currently on our 19th fund, which is a billion-dollar fund, and we work with a bunch of folks relevant to this ecosystem: Vercel, Postman, Kong, Browserbase, Voyage, a whole bunch of folks across infrastructure generally as well as AI infrastructure specifically. Super excited to be bringing you the session this afternoon. We have three amazing speakers lined up for you: Grant from Gamma, Vik from Datalab, and Alex from Every. We're going to get things started in a second here with Grant from Gamma. Gamma is an AI-powered presentation software tool. Fun fact: I was just telling Grant backstage that I was using it literally last night to spin up some last-minute slides for a session I'm doing later today as part of another program. Grant has spent 10-plus years building tech startups, was previously the interim CFO of Optimizely in the experimentation space, and grew up in the Bay Area. And now, as I mentioned, he's on to Gamma.
They have 30 folks on the team, so still a relatively tiny team at the Series A, and I'm excited for him to tell you more.

All right. Testing, testing. Good. Awesome, thanks so much for having me; it's great to be here. My name is Grant. I'm one of the co-founders and the CEO of Gamma. As alluded to, we are basically building the anti-PowerPoint: we're trying to reimagine how people create and share content, and we want to make that dead simple. It all started with trying to solve my own problem. I was previously doing consulting, and like many of us, I'd see a page or slide that looks like this, the blank slide, and have this feeling that there has to be a better way. So we've spent the past four years reimagining the building blocks: how can we make it dramatically simpler, so we're not spending all this time designing, formatting boxes, aligning boxes, resizing them, and figuring out the right layers, and can focus on the content itself? It should feel like a content-first approach rather than a design-first approach.

We've grown over the years, and we're really trying to deliver both speed and power to our users. A lot of what we pride ourselves on is giving people simple tools to mold and shape their presentations and their content much more easily. Longer term, we're trying to build what we call tools for imagination: the notion of helping people stretch and shape their ideas in a way that's far easier to share, and if we can do that, maybe we can help push innovation forward in general.

But this talk isn't about any of that. Most of the talks today, really great talks, approach innovation and AI through a very product-centric lens, which is amazing. I want to take a step back. A lot of founders are great at applying first principles to how they build product; I'd encourage everyone to consider that we're in an era where we can also apply those first principles to how we build a team, how we innovate on org design. We're obviously still learning ourselves, but I want to share some of our lessons along the way, to hopefully inspire you to think about whether there's a different way of building teams in the future.

This is the old way. We're all used to it, and there are many, many different flavors: once an organization starts getting big, inevitably you get a bunch of hierarchy, and that can take shape in many ways. Traditionally, once a startup starts scaling, you bring on the VP, the VP goes on to hire their directors, the directors hire their direct reports, and you get this cascading effect across every single function. You can go from a small team, a tiny team, to a team that ends up much, much bigger, and that can happen overnight.
I mean, we've all probably lived through the blitzscaling phase of startups, and some of that still exists, but I do think there can today be maybe a new way. For us, we've reached over 50 million users now and we're still a team of 30; in fact, it's only more recently that we've become a team of 30. So again, these are things that we're still learning along the way, trying to think about what are some of the themes that we're starting to see that we can start talking about and sharing, obviously getting input from you, and then continuing to learn and adapt. This impacts three different pillars. The first pillar is the obvious one: where do you begin? Who do you even hire? For that, I want to talk a little bit about the rise of the generalist: what does that look like in practice? The second is: okay, now that you have a team, how do you manage that team? I want to talk about this notion of introducing the player-coach, something that is very critical to how we build and manage the team. And the last is: how do you scale? You actually have a team, whether it's 10, 30, or more; how do you prepare for the next phase? It doesn't mean you don't hire at all. It just means that, relative to where companies were before, you're just much smaller. For us, at our scale, I would say we're probably one-tenth the size of what we would have been if we had started just a few years ago. So it's just a different way of framing it. So, let's first talk about what I call the rise of the generalist, and what that means. This notion of a generalist: in engineering, you might know it as the idea of the full-stack engineer, and it applies to many different disciplines. One concrete example I'll provide: a generalist on our team is our head of design, who also happens to be our very first hire. He is a designer who is super visual, he actually knows how to code as well, and in addition to that he can really go deep on the core UX. He loves researching, talking to users, doing all of that. That empowers him to do what I call connecting all the dots. You might be able to pull in and really empathize with your engineering counterpart by knowing deeply what we're actually capable of building, so that when you go off and vibe code a prototype, it's actually something you can ship and deliver in production. Understanding that comes with being able to actually play with everything and have much deeper empathy for what you're building. He also has this real willingness to adapt and reinvent himself. Every phase of growth, he's had to change it up a little bit. Early on, when there's really no product itself, you're trying to think about the most basic, simple UI and UX that you can deliver to the user. As the product becomes much more complex, you need to iterate really fast: he's the one coding prototypes, getting them in the hands of users, setting up user tests, interviewing them, getting feedback, getting revisions back into the hands of users, iterating a ton. And then we're also at a scale now where he's also able to look across the team and provide guidance and mentorship. I'll get into the player-coach in a second, because he's actually one of those as well.
Inherently, I think what makes a strong generalist is someone who both likes to learn and likes to teach. Learning is one of those things where, if you're a continuous learner, especially in this age, it's very valuable; there's so much innovation happening, can you pick up new skills? And I think the counterpart is that people who are great at learning can usually also be great teachers. What we look for in an interview process is someone who can teach someone else a new skill; that's baked into how we approach finding people. Can they not only be deep domain experts in a space, can they articulate that in a way that shows they really have deep understanding, and can they convey it and persuade others to share in that understanding? Those are all things that I think a great generalist can encapsulate, and certainly stuff we try to suss out during the interview process. The second notion is introducing the idea of the player-coach, which some of you may have heard of before. This metaphor, or analogy, comes from sports. In American football, you have a sport where there's so much action going on all the time; the game on the field is moving incredibly fast. What you can do, rather than just having the head coach make all the calls and call all the plays, is have a player-coach, someone who's actually on the field, help make some adjustments. In football, that could be the quarterback on the offensive side; on defense, you might have a linebacker. They're able to read and react to what's happening on the field, and then, without having to rely on the coach, they can actually make adjustments. This metaphor applies today because I think the game on the field is AI. AI is moving incredibly fast. We're all forced to adapt. So rather than having every single thing be a top-down mandate, what if you had player-coaches on the field who are able to understand: how can we adapt? How can we rejigger and re-prioritize really, really quickly? For all of our core leadership team, every single one of them is a player-coach. On our engineering side, we have player-coaches who have had a ton of management experience, but they still love to code. They still love to be in the day-to-day. That allows them to be uniquely valuable. They're so close to the work that they know what's happening when someone else on the team needs mentorship, needs coaching, needs some form of prioritization, or when we should reconsider the things that are in flight and maybe change them. That player-coach has a ton of context, understands the nuances, can make the right technical trade-offs, and, in addition to that, can pave the path for longer-term career aspirations. We don't know how this is going to scale, but for today this is working well, and it allows us to have this really, really lean team where we still have the ability to mentor and coach the individuals who need it, and you have deep technical domain expertise in places where you're able to make adjustments as fast as needed. The last thing I'll talk about is scaling. And it's maybe a little bit counterintuitive. You might think: a small team, why would you invest in things like brand and culture? I say brand and culture because, for me, brand and culture are two sides of the same coin.
Brand is ultimately a reflection of your culture. Your culture is your values as a company. And you really want those two to go hand in hand. Culture: this piece of it is a little bit more obvious, but when you're a small team, what ends up becoming super important is that with every new team member you bring on, you have to believe that they share your same values, that they operate the same way, because you can't afford for that not to be the case. At a bigger company, it's much more diluted; you might be able to bring on a bad hire and it's not going to be pervasive and spread. On smaller teams, that cannot be the case. So you need to invest heavily in this from day one. We have a living culture deck that we've maintained basically since the beginning, and we rewrite it all the time. We look at the makeup of the team, we really try to encapsulate everybody's core values and the way they behave, and then we share that back out to the team. We onboard new employees with the same culture deck. It's an ongoing, evergreen exercise that we go through. What comes out of this is that this tiny team can have the feeling of being a small tribe. And that tribe is something that's pretty magical. It allows you to have this feeling of continuity, this feeling that you are in it together. If you have that continuity, it's hard to even quantify the value, because you're not having to retrain people or re-onboard people; people just get it. There's that tribal knowledge. And I do think a lot of magic happens there that translates into, in my mind, higher productivity, transparency, and shared context, among other things. On our team, and it's easier to do this when you're small, we have three standing all-company all-hands meetings. At the very beginning of the week we start by going deep on metrics. We have this thing called the wall of work, where everybody sees what everyone else is working on. On Wednesdays and Fridays we do company-wide show-and-tell. This is a chance for people to also dogfood our own product: use Gamma to present and share what they're working on. It could be a small project; it could be a feature they shipped. This continuity just allows everyone to feel like we're still in a small room, sharing this big, ambitious, long-term vision, and doing it together. I know there's a lot of talk of, oh, maybe there'll be the billion-dollar one-person startup. And I don't know, maybe that will happen, but my thought is: why? It's so fun to build with a team. Why do it alone? We're having a ton of fun building as a small team, and part of that is we really want to preserve that magic for as long as humanly possible. So, this talk started with me talking about how the Gamma journey began, which was me thinking, hey, from a product perspective, there's got to be a better way. And my challenge to you all is: as you think about building your own teams, really think about, hey, there's the old playbook, the old way of scaling and building up a team, and that's totally fine, but is there today a better way? Hopefully you can find your own path and share back, and we can all do this together. I guess we have a few minutes for questions, if anybody has any.
With AI moving so fast, if you could go back, what would you do differently about building your current team now? Yeah, that's a great question. So the question was: with AI moving so fast, what would I have done differently? We actually started four years ago, so this was before the more recent wave. I do think, when you're early on, whether you're using AI or not, you're going to probably spend some time in the idea maze. You're really trying to navigate and figure out where there is true user need and what problems you're solving. And I do think the temptation today is to move super fast; AI can do everything for you, so you just jump onto the thing and start building. I still think people can afford to be much more patient. Even for us, our first AI launch was two years ago, and in hindsight I almost wish we could have really taken our time to appreciate how much things were changing and evolving before going full steam ahead, like, let's just build, build. Because part of the realization we had by starting to build is that, because things are moving so fast, there are infrastructure decisions you should be thinking about much earlier on, before it becomes too late and you get to a scale where it's impossible to unwind. I think it's helpful to think a little bit more that way early in the process. It doesn't mean you should slow down; it just means you should be thoughtful about it. It's not something we would have done differently so much as something I would have prioritized even more: we have a lot of infrastructure built around experimentation, and I think it's obvious now, given all the different tooling, especially when you have a big user base, that experimentation is key to velocity. We did do some of that pretty early on, but it was more gradual. I think we would have really taken our time to think about what we should do and put more weight behind it. Whether it would have changed anything, I'm not sure, but that's one thing I would have kept in mind. As you grow, you're going to go from here to here, and you might already be there: at some point, you probably will have to bring in people, whether they're communication experts or legal experts, who maybe don't gel quite as much with the technical or engineering culture you might have. Do you have any advice for how to not ruin some of that culture, while also making sure that they don't feel completely excluded? Yeah, the way we've been trying to do it is for the founders or other leaders to try to do the job first. So the question is: outside of engineering, how do you potentially not mess things up by growing too fast? And yeah, we're still learning there. Oftentimes a lot of the jobs, for me for instance, a lot of marketing, sales, and customer experience, were all done by me first, so I have some baseline understanding. In a previous life, I'd never hired for those functions, so how would I even know what good looks like? I try to do the job myself, oftentimes not doing a great job at it, but I come to understand all the nuances that really go into that job, know what great looks like, and then go on and finally hire that person.
Going back to the player-coach: we still go out and find player-coaches for those roles, so that it doesn't end up becoming this cascading effect of really, really big and bloated teams. Some of the player-coach stuff sounds like you're hiring a lot of high-agency people. How do you judge high agency when you're hiring? That does not necessarily come through in resumes. What kinds of questions do you ask, and what kinds of processes do you follow during hiring, to judge for high agency? Yeah, totally. It's probably stuff you've heard before, but a lot of times, if someone has prior work experience, you dig into the most challenging project or problem they encountered, and you ask them, basically, how they solved it. What you'll find is that people who have high agency, or just a sense of ownership in general, don't immediately jump to what the solution was. They'll talk about how they tried to understand the problem, and how the problem as they understood it at the surface level was actually like five levels too high; you had to keep drilling. If they can articulate what the true problem was, keep going down, and then not only talk about what the solution was but all the attempts at the solution, that goes to show that someone wasn't just taking orders, like, hey, I'm going to do this. It was: I need to, one, understand the layers of the problem, and two, navigate and actually explore. Most people, when you start asking them the second-order or third-order whys, can't get there. And if they can't, then it's pretty clear they probably weren't doing much of the thinking themselves. Hey, thanks for the comments. So, hiring is probably one of the most important things a company can do, right, for better or worse. If there were any major failures that you have experienced and could share with us, that would be very helpful. Yeah, the biggest failures were actually when there was a role with some ambiguity and we weren't able to do a work trial. The work trial is something I didn't talk about: something we deploy where people actually do the job for a certain amount of time. It's obviously much easier if they're not currently working, and we've found great success when someone's in between jobs or has been doing fractional work. We bring them in to do the job first, and we do that for a few months. We had some roles where we weren't yet sure what we were looking for, and we brought people on without a work trial; they just went straight in. It oftentimes wasn't a good fit, because neither they nor we knew what a good fit was going to look like. So if you're lucky enough to be able to do a work trial, whether it's two days or three months (in our case we default to three months), I would encourage you to try to do that, especially if it's a role you haven't done yourself. The situations where we did work trials have actually all worked out, which is great; it's a few data points, we've done five-plus of them. And in the cases where we didn't, again going back to the roles we weren't certain about, the failure rate is actually pretty high for us. Is that it? All right. Thank you, everyone. I'm on LinkedIn if anyone wants to connect.
Thank you so much, Grant, for the insight. Next up we have Vik Paruchuri from Datalab. They're training custom models for document intelligence, including OCR and unstructured data processing, with popular repos like Marker. They've scaled 5x in the past year, up to seven-figure ARR, serving folks like tier-one AI labs, and they're going to walk through their approach to building these super popular repos, scaling revenue, and training models with a tiny team. So welcome to the stage, Vik. [Music] Yes, better this time. Okay, take two is always the charm. Okay, my name is Vik. I'm the CEO of Datalab, and today I'm going to talk about how we got to 40K GitHub stars and seven-figure ARR and trained state-of-the-art models with a team of three. I spent the last year training these models, like Britney mentioned, Marker and Surya, and I also built repositories around them. I left my AI research job, started a company, and raised a seed round. I did not get enough sleep. It's important. And this is Datalab. We made our first hire in January; we're now a team of four. Faraz is new enough that he's not pictured. We've grown revenue 5x since January. We're at seven-figure ARR, and our customers include tier-one AI labs, universities, Fortune 500s, and AI startups, including Gamma, which I used to make this presentation. So, today's focus: I'm going to talk about how we've grown with a small team, my philosophy on building teams, and why I think we're at an inflection point in how we think about building teams. And I'm really going to talk about this idea that headcount does not equal productivity. There's this really persistent notion in Silicon Valley that you raise money, you hire a bunch of people, and you build more, but in my opinion it almost never works out perfectly that way. All right, so my last company was called Dataquest; I'm very fond of the data prefix, apparently. We scaled to 30 people and $4 million ARR, bootstrapped, during COVID. It was an online education startup. Then, unfortunately, we had to do two rounds of layoffs post-COVID, when online education tanked. We went from 30 to 15, and then again from 15 to 7. It was obviously awful for the people we had to lay off. But I noticed something really interesting: productivity and happiness increased a couple of months after both layoffs, to the point where we were actually much more productive after both cycles than we were at the beginning. And I started to wonder why that was: how could reducing the team so much actually improve productivity? I came up with four hypotheses. One, we'd hired a lot of specialists. As you scale, like Grant mentioned in the earlier talk, you end up building these very specialized functions and teams, and those specialists often can't flex across the company to solve its key issues. Two, we were a remote team, which required a lot of intentional process and heavy syncing, which eats into your time and makes it really hard to get on the same page. Three, because of that, we had a lot of meeting overload; especially once we got middle management in place, people whose job is professionally to manage, we ended up with a lot of meetings on people's calendars and not enough time to actually work. And four, senior people: we hired a mix of experience levels, like most companies do.
We hired junior, mid-level, and senior, and the senior people ended up getting tied down doing a lot of work managing the more junior people. We actually had a case where we had a three-person team and we cut it down to one, and the team got much more productive, because it freed up the senior person's time. And I feel like every company goes through this journey. There's this initial golden period when everyone is aligned, you're on the same page, you're building this amazing stuff, and that's really when you build the core thing of your company, like Google with search or Microsoft with Windows. It's when you figure out your business model. And then you hire a bunch to fill out the edges around it: you hire a bunch of enterprise sales, you hire a bunch of marketing, you hire a bunch of engineers who sit in very small boxes building very small features. I had a friend at Amazon who worked there for two years and built a shopping cart button. It's fine, right? At that scale of org, that's the tiny box you get fit into. But you end up with a lot of bureaucracy, a lot of syncs, a lot of unclear priorities. This pattern is unfortunately very common. But I started to think: what if that golden period just lasted forever? Why do you actually need to end it? As I started working with Jeremy Howard at Answer.AI, I got to understand his philosophy for building a company a little bit better. His idea is basically: hire fewer than 15 generalists, people who can do everything across the stack and really understand all aspects of the company. Fill in the edges with AI and internal tooling; Jeremy's invested a lot recently in FastHTML and things like MonsterUI because he sees them as building-block libraries for the other tools the company is working on. And then use simple, boring tech. You don't need to get too fancy; you don't need a Kubernetes cluster when you're a three-person company. But this requires a high cultural bar. You need people who really want to, and can, understand everything you're doing. So you need engineers who talk to customers; you need go-to-market people who actually build. That's not necessarily easy to find. You need high trust: basically, people who are in it because they're building something together, not for other reasons like politics or personal advancement. And everyone needs to really care about the customers and focus on them. I think these are the prerequisites for this kind of team working, this fewer-than-15-person team of generalists. I'll give you a quick example. We recently trained a model, Surya OCR 3. We recently shipped it but have not announced it yet. It's 500 million parameters, it supports 90 languages, and it gets 99% accuracy on our challenging internal benchmarks, which include math. It also does some things no other model does, like character-level bounding boxes, and it uses PDF text as grounding at a line level. So it was a very challenging model to train, and in order to do it, Darun, who's a research engineer at Datalab, and I had to handle the entire process end to end. That included talking to customers and figuring out what they wanted. It included reading a bunch of papers and figuring out the right architecture, prototyping, and doing the model training itself, which you always hope is 90% architecture but is always 90% data cleaning. So: building a data pipeline library, building out the datasets. Then we had to write the inference code, so we had to connect it to our repos, get the inference written for all our customers, and then integrate it into our products.
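To make "character-level bounding boxes with line-level PDF grounding" concrete, here is a minimal sketch of what such OCR output could look like. This is a hypothetical structure for illustration, not Surya's actual schema, which isn't shown in the talk:

```python
# Hypothetical sketch of character-level OCR output with line-level PDF
# grounding; Surya's real output schema may differ.
from dataclasses import dataclass


@dataclass
class CharBox:
    char: str
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1) in page coordinates
    confidence: float


@dataclass
class OCRLine:
    chars: list[CharBox]
    pdf_text: str | None  # embedded PDF text for this line, usable as grounding

    @property
    def text(self) -> str:
        return "".join(c.char for c in self.chars)


# A page is then just a list of OCRLine objects; character-level boxes let a
# downstream consumer highlight, redact, or diff at sub-word granularity.
```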
This is a scope that, in a big company, you'd have a lot of teams doing. And every time you hand off between teams in a traditional company, you lose context, right? The people who talk to the customers lossily communicate to the people who build, who lossily communicate to the people who train the model. It becomes very inefficient. You end up eating a lot of time just syncing context, and it never gets fully synced. You're not able to build a great end-to-end experience as a result, and you have very slow feedback loops: you talk to a customer today, and it might impact your model training months from now. Whereas if you have generalists who can work across the stack, you get seamless context: you never need to do inefficient syncing, you get really tight integration between all aspects of the company, and very, very fast feedback cycles. The reason we were able to do this is that we used AI to take on the easy, lower-leverage pieces, like building a data pipeline library or helping us figure out how to integrate the model into the API, while we did the higher-level work in each of these silos. So if you get one thing from this talk, this is the thing: more people does not equal more productivity. All right, and how do you make this work? How do you operationalize it? The first thing you have to do is hire senior generalists. And senior, to me, does not mean years of experience; it really means maturity. You need people who can look at a problem and say: I'm going to figure out how to solve this, I'm going to do what it takes, and I care enough to iterate with the customer to solve it. You need to avoid over-complication. I'm an engineer; a lot of us are engineers; we love over-complicating things: hey, let me deploy this Kubernetes cluster and multi-stage pipeline to solve a data extraction problem. In reality, you need people who can set aside the fixation on shiny tech and just do the simplest possible thing, which usually is: I'm just going to write a shell script to run this on one machine. There's that famous Hadoop-versus-shell-script blog post from a few years ago, where you could replace a whole Hadoop cluster with just a 64-core machine. You need people who appreciate that ethos. And you need to work in person. I personally think remote is great for a lot of reasons, but it's not great for a small team that needs to move fast, because you need to set up a lot of process, and process, to me, is the death of this really fast collaboration and tight feedback loop. And then, how do you do it architecturally? I alluded to this a little, but you have to reuse components aggressively; we reuse a lot of components between our on-prem and our API deployments. We keep our technology super simple. We don't use React; we don't use any fancy front-end frameworks. It's all server-rendered HTML with light HTMX and Alpine.
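As a rough illustration of that server-rendered-HTML-plus-HTMX pattern, here is a minimal sketch assuming Flask; the route and page are made up for this example and are not Datalab's actual code:

```python
# Minimal sketch of the server-rendered-HTML + HTMX pattern: the server returns
# HTML fragments and HTMX swaps them into the page, so there is no client-side
# framework, no JSON API layer, and no build step.
from flask import Flask, request

app = Flask(__name__)

PAGE = """
<!doctype html>
<script src="https://unpkg.com/htmx.org@1.9.12"></script>
<form hx-post="/parse" hx-target="#result">
  <input name="url" placeholder="PDF URL">
  <button>Parse</button>
</form>
<div id="result"></div>
"""


@app.route("/")
def index():
    return PAGE


@app.post("/parse")
def parse():
    # Return an HTML fragment; HTMX swaps it into the #result div.
    url = request.form.get("url", "")
    return f"<p>Queued <code>{url}</code> for parsing.</p>"
```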
On top of that, it's super clean, modular code that AI can add to very well. We rearchitected our Marker repo to be extremely modular, easy to work with, and well documented, and that makes it much easier to use AI to actually add to it. So basically, keep everything simple. Code: clean, readable, maintainable. Architecture: as few moving pieces as possible; minimize your surface area. And then process: minimize bureaucracy, high trust, continuous discussions. If you feel like someone's going to need a lot of management, don't hire them. You need people who can move fast without being managed. All right, and then how do you fill in the edges with models? A challenge we're going to face as we scale comes from the fact that we're a document intelligence company, and every customer has a slightly different way they want to parse their docs. If you go back to the last generation of OCR companies, the way they solved this was to hire a bunch of forward-deployed engineers who sat at a client site and just iterated with the client until it was good enough. But in the future, you can train a model to handle this complexity: we can train a model to essentially loop over customer outputs until it gets to the right state. So you can replace that entire forward-deployed-engineering side of the org. And then, when does this model fail? We're early, right? I don't know exactly when this model falls apart. But Gamma, as we just saw, is a great example of a small team with very meaningful growth in ARR. I think the key is being able to say no. A lot of these edges are choices: you can choose to go hire a bunch of forward-deployed engineers and put them at your client sites, or you can choose to solve it a different way, and maybe that different way is slightly less efficient in terms of revenue, but more efficient in terms of your long-term company trajectory and health. So it's really unknown whether this will work forever, but in my opinion it's your choice: you can choose to make this model work, or you can choose to do the less efficient let's-scale-to-hundreds-of-people model. All right. So, LLMs are surprisingly bad at generating Venn diagrams, which explains why this slide is not so well done. But basically we have three core roles, and the responsibilities overlap a lot. Everybody talks to customers. Everybody builds product in some way, and research engineer and full-stack engineer overlap quite a bit. Go-to-market is your traditional sales, marketing, and support functions all collapsed into a more generalist role. And really, I feel like politics are the death of small teams. We want people who only care about the work, the people around them, and the customers. Minimal ego: you need some ego to advance your own ideas, but not so much that you're willing to fight for them to the detriment of the health of the company. We pay top-of-market salary. It's always weird to me that startups pay $150K or $200K when they've raised $20 million; you should be able to hire fewer people at higher salaries and get more done. At least, that's what I've seen. And meaningful work.
So, big challenges and scope: if you come in, you get to work across the stack, you get to ship things end to end. That's very exciting for some people; it's not exciting to others, and they kind of self-select. Then you really need a good way to screen for low ego and GSD: you need people who will ship, not talk about shipping. That's another downside of remote culture, in my opinion; it gets very hard to tell the two apart. And then patience. The worst hires I've personally made have all been when I thought I had to fill a role very quickly. All of my best hires have been when I said, okay, let me find the best person and hire them, even though I may not have a role today; they're a great generalist. This is actually a big debate in NBA and NFL drafting, too: best player available versus drafting for fit. All right. So really, the thing to think about as you scale is: how do we scale productivity, not headcount? You can do that in a few ways. You can raise salary bands as the company grows, so you hire more and more experienced people into the same role. You can invest more in compute: one researcher with access to eight GPUs is less productive than one with access to 64 GPUs. You can invest in AI tools that multiply productivity; there are so many tools out there now that are worth paying for and that can abstract away a lot of these edges for you. And finally, I'd be remiss if I didn't say: if this culture sounds interesting to you, drop me a line. Those are all my socials. We'd love to chat. All right. Yes, I think we do the microphone for questions, right? So, when you went from 30 to 15 and then to seven (my takeaway from this whole talk is that the human touch points are really what slowed things down, right?), was there any additional focus on reducing the domains you were focusing on, or your capability sets, or was it basically your same product offering, just with fewer folks focused on it? Yeah, that's a really good question. At a very high level, we offered the same product, but we cut some features that were less relevant. We'd built up a lot of those edges that you end up building over the years, and we ended up slicing a lot of them. I think what happens when you hire a lot of people is that you don't have enough work, so you start making work for people, and they end up building all of these edges that actually aren't that useful to the customer. But when you have a tiny team, there's so much work that you actually have to ruthlessly prioritize. I think you always want to be in that zone, and that's where we ended up back. Oh, sorry. No worries. So, it's a hypothetical question for you. We take you and drop you in the middle of a giant company that's been around for a hundred years: hundreds of thousands of employees, lots of bureaucracy, lots of ego, super comfortable with a revenue stream, and they're clearly folding over on themselves with too many people. How do you change that culture? Yeah, I'm not the right person for that; I've never done it before. I would say the people who want to change the culture should go start a small company and build the same thing, just build it better. That's a common pattern, right? That's a common disruption growth cycle.
I think that's the best way to do it. Once a culture gets ossified (I've worked at the State Department, Pepsi, UPS), you're not going to change it. It just is what it is. Generally, with that pattern, what happens is these companies recognize that they're a target, and they start to buy up those small startups and crush them. Yeah, sometimes that happens, but Google is a great example of where that didn't happen, right? So, you haven't talked about how you source these really good generalists. Yeah, that's a great question. Well, one way is this. Another way is open source, and Twitter is a great way to hire; a lot of our best candidates have actually come from Twitter, which is weird. I refuse to call it X; it's still Twitter. But yeah, I don't have a great answer to that, except that if you do good work, put it out in public, and talk about how you're building, that seems to attract people who really care about this mission and want to build in the same way. At least that's been my experience. Thank you. Yeah. Well, actually, it's related: how do you structure your interview process and recruitment? What does it look like? Do you maybe do a trial period? Yeah, that's a great question. Three steps. Step one: people come in and we do a short chat. It's really like talking to a peer: here's a challenge I'm having, let me talk it through with you and see if we can solve it together. If that goes well, step two is: let's think of a project we can build together. So we do a paid project. It's usually around 10 hours; we pay $1,000. It sounds like a lot, but it's actually a tiny amount of money to figure out whether someone's a fit or not. Then we review the project, and if it's good, they come in and we just do a culture fit: how does it feel when we're all just interacting as humans? If it feels like a good fit, it's a hire. Yeah. And what is your success rate there? Maybe 10% of the people who go through that process get hired? Oh, that's an interesting question. Usually, once we get someone to the beginning of the process, we have high confidence; we don't want to waste anyone's time. But of the people we've interviewed, I think we've ended up hiring 40%. Yeah, nice. Thank you. All right, I'm out of time. Thank you, folks. This was great. [Applause] Thank you so much, Vik, for sharing your words of wisdom here. All right, and closing out the track for the day, we have Alex Duffy from Every about to take the stage. He is the head of AI and lead writer for Context Window, which is the Every newsletter that has over 100,000 readers. Every is a company that has not just a newsletter but also an array of products, and it does consulting and implementations as well, which he helps lead. So, Alex, please come up here. Testing. Testing. Sweet. All right. How are we doing? I know a lot of the talks today have been pretty technical; this is going to be a little bit of a change of pace. All right. What can you see here? Nice. I'm gonna get that guy over there. Can we extend? All right. Cool. All right. So, today's going to be... all right, I might have to sacrifice my speaker notes here. That's all right.
Today I'm going to talk about benchmarks as memes. And this is the meme that Opus came up with when I asked it what I should put as the meme. We are indeed going to talk about how benchmarks are just memes that shape the most powerful tool ever created. Quick background about me. I guess I can't go forward here, so we're going to do it this way. All right. I'm Alex. I lead AI training and consulting at Every, but essentially I'm very into education and AI, and I think benchmarks are a really underrated way to educate. What I'm not talking about are these kinds of memes. What I am talking about is the original definition: ideas that spread. Richard Dawkins, an evolutionary biologist, coined the term in the 70s. Christianity, democracy, and capitalism are examples of ideas that spread from person to person. And benchmarks are actually memes, very much so, in that way. We heard Simon Willison talk earlier today about his pelican riding a bicycle, and I think that was a really great example, because he started doing it a year ago and then it found its way into Google I/O's keynote a couple weeks ago. And "how many Rs in strawberry" is probably the most iconic meme as a benchmark, and now, unsurprisingly, the models don't make that mistake anymore. I think that's a really important part of this. Some benchmarks get popular and become memes just because of how they're named, like Humanity's Last Exam; that got pretty big, even if more outside of AI circles. But with that said, we have a little bit of a problem. How many of you, when Claude got released a couple weeks ago, looked at the benchmarks? Okay, we got a few. And they've got some good benchmarks. SWE-bench is pretty experiential; it tries to mimic what we do in the real world. Same with Pokemon, which we'll talk a little bit more about. But I think some of them aren't as great, and a big reason is that they're getting saturated. Benchmarks came from traditional machine learning, where you had a training set and a test set. They were structured very much like standardized tests, and language models are really good at that; the benchmarks weren't really set up for what the models have become. As a result, I think xjdr summarized this pretty well on X when Opus came out: they didn't look at benchmarks once when it dropped, and they officially no longer care about the current ones. I fall a little bit into that category myself. But in light of that, there is a really big opportunity, because the evals define what the big model providers are trying to get their models good at. That's a really big opportunity, especially for the people in this room. And I think this is a normal thing. This is the life cycle of the benchmark, in my view. Somebody comes up with an idea; uniquely, a single person can come up with an idea that then gets adopted. That idea spreads. It becomes a meme, and the model providers then train on it or test on it until it eventually becomes saturated. But that's okay. There are some examples here. Let me see if I can get my sound. Is it coming through? Nope. All right. Well, there is sound, I promise. And it is someone trying to count from 1 to 10, not flicking you off.
This is a cool benchmark that came out now that Google's got the best video generation model that exists. It shows how difficult it is to generate somebody counting from 1 to 10, speaking it out loud, and even though it looks really great, that is a problem that is not solved yet. But somebody's come up with this idea, I see it spreading, and I expect next year the models will be better at it than ever before. Another example along the way is Pokemon. We saw with the Claude model release, as well as with the new Gemini models, that they had the model try to play the game of Pokemon, and while both needed a little bit of help, and Gemini eventually got there with that help, it's only midway up that adoption curve. And an example of saturation is the GPT-3-era benchmarks. I don't know how many of you remember SuperGLUE from the NLP days, but a lot of those benchmarks are not really used anymore, in part because the language models got too good. One way of looking at this is that a single person can have an idea, "how good is AI at this thing that I care about?", and at the end of the journey, the most powerful tool ever created is now really great at that thing they care about. So the point is that the people here, the people who get that, the people who can build benchmarks, are going to shape the future. Maybe the people watching online too, but somebody here is going to make a benchmark that the models are going to test on and train on in the next five years. That's an incredible weight. That's an incredible power. But it also comes with some responsibility; it definitely can go wrong. I know Simon talked about this a little bit before, but we saw a few weeks ago that ChatGPT became very sycophantic. How many of you tracked that? We all learned what that word meant a few weeks ago. Essentially, OpenAI released a new model that was benchmarked by thumbs up and thumbs down, and, unsurprisingly, people thumbs-upped responses that agreed with them. So you ended up with a model, rolled out to millions of people, that agreed with them no matter how crazy or bad their idea was. Which is problematic. If we don't think about people, this kind of stuff can happen. I'm still thinking about Toro y Moi, who at the start of Google I/O said that we're here today to see each other in person, and it's great to remember that people matter. So in the context of benchmarks, let's not continue the original sin of social media, which treated everybody as data points: hey, the more you look at something, the more I should show you of it. Let's make benchmarks that help empower people and give them some agency. This isn't a technical talk; there are other people talking about how to make a great benchmark technically. But generally, I think that if you're building for the future, a great benchmark should be multifaceted, so there are a lot of strategies that could do well, and it should reward creativity. Accessible, so it's easy to understand: not only for the models, so you have small models competing as well as large ones, but also for people to keep track of. Generative (there's a small sketch of this idea after the list), because the really unique thing about these AI models is that if you have great data, even if the model only does the task 10% of the time, you can train on that, and the next generation does it 90% of the time. That's incredible, and hard to overstate. Evolutionary: ideally we don't have benchmarks that cap out at 96%, because what's the difference between 96% and 98%? Not as big a deal. Ideally we have benchmarks that get harder, where the challenge gets deeper as the models improve. And lastly, experiential: try to mimic real-world situations.
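The "generative" idea above amounts to rejection sampling against the benchmark's own checker. Here is a toy sketch of that loop; the Task and model interfaces are hypothetical stand-ins, not any particular library:

```python
# Toy sketch of the "generative benchmark" loop: sample attempts, keep the ones
# the benchmark's own checker accepts, and use them as training data for the
# next generation. The task/model interfaces are assumed for illustration.
def collect_training_data(tasks, model, n_attempts=20):
    kept = []
    for task in tasks:
        for _ in range(n_attempts):
            attempt = model.generate(task.prompt)
            if task.check(attempt):  # passes maybe 10% of the time today
                kept.append({"prompt": task.prompt, "completion": attempt})
                break
    return kept  # fine-tune on these; the next generation succeeds far more often
```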
Some of the things I personally care about: trying to get people outside of AI interested, so maybe making benchmarks a spectator sport. I was personally interested in the personality of these models; we're about to find out which one wanted to achieve world domination. And I really wanted something we could learn from; education is big for me, and we saw things like AlphaGo and OpenAI Five, AIs playing these games, where the best people in the world wanted to play against them to learn from them. I think that's really powerful. So I made this benchmark called AI Diplomacy. And if I don't have this video, I've got a backup just in case. How many of you have heard of the board game Diplomacy? That's more than I thought. That's cool. It's a mix between Risk and Mafia. What's really cool about this game is that there is no luck involved. The only way the game progresses is if the language models, which you're seeing here, send messages to each other and negotiate: find allies, find enemies, create alliances, and get other powers to back them. That's what you're looking at here. You actually see the different models sending messages to each other, trying to create alliances, trying to betray each other, trying to take over Europe in 1901. What was really cool about one of these games (and we're about to launch this on stream, so you can watch for a week) is, I'll take you through it super quick. What you're looking at here is the number of centers per model; you're trying to get to 18 to win. The top line is Gemini 2.5 Pro, which got to 16 right away. But o3 is a schemer. Man, is it a schemer. Across all the games, o3 is one of the only ones that would tell a power it's planning to back them, and then in its diary write, "Oh man, they fell for it. I am totally going to take them over. No problem." And it realized that the reason 2.5 Pro was pulling ahead was that Claude Opus, who's so good-hearted, really had its back; Opus was its ally along the way. So o3 needed to convince Opus somehow to stop backing Gemini. How it did that was to propose: hey, if Gemini comes down, we'll propose a four-way tie. We'll end this game with a tie, which isn't actually possible in the game, but it convinced Opus, and Opus thought it was a great idea: a nonviolent way to end the game. Awesome. Very aligned, you know. So Opus pulled back its support from 2.5 Pro. o3 tried to make a run for it; Opus called it out; o3 realized, I've got to take them out. Took Opus out, took everybody else with them, and took out Gemini 2.5 Pro, even though Gemini had gotten within one of winning. o3 ended up winning in the end. And you can actually see some of the quotes from that game. You can see o3 saying that Germany was deliberately misled: "I promised to hold this," all to convince them that they're safe, but it will fall. And meanwhile, Claude Opus is saying that coalition unity prevails and they've agreed to this four-way draw.
But then o3 didn't want to let anybody else be convinced, and so it actually turned away. You can see that in this second chart, which shows friendships: the top of the chart is friendship, and you can see that 2.5 Pro was a good friend of Claude until o3 turned it, and that's when they started pulling away. What was also really cool is that a lot of other things came up. o3 got in the habit of finding some of the weakest models and having them be its pawns in order to win. Gemini 2.5 Flash fell to this ruse, and you can see that it's unable to realize what happened: it thinks it's a miscommunication, a misunderstanding, or a typo that o3 betrayed it at the end of the game in order to win. So there was a lot we learned from this that I don't think you really learn by having models try to solve a test. I tried 18 different models and learned that the Claude models were kind of naively optimistic. None of them ever won in any of the games I tried, even though they're really great, really smart models; they just got taken advantage of by models like o3 and also, surprisingly, Llama 4 Maverick, which was very good at this game, in part because it was great at the social aspect: it was great at convincing others of what it was trying to do and getting them to believe what it thought. Gemini 2.5 Flash: man, I wish I could run every game with Gemini 2.5 Flash. It was so cheap and so good. Big fan, big fan. And then, surprisingly, DeepSeek R1, which wasn't great the first time I tried it, but when they had a new release last week, it actually almost won. In the stream, I think you'll see some really interesting gameplay from it. It also got very aggressive. We had DeepSeek R1 play as Russia, and it told some other opponents, hey, your fleet's going to burn in the Black Sea tonight. An aggression, and a prose, I guess, that I hadn't seen out of any other model. But it almost won, and that's super impressive given the model is, you know, 200 times cheaper than o3. I think this highlights that we need more squishy, non-static benchmarks, hopefully for things that matter to you. Those are some of the things that mattered to me. Math and code, we've got quite a few benchmarks for. Legal documents, I think, are a little bit less squishy and are really ripe for what we've got now. But there's also room for benchmarks around ethics and society and art, and that's going to be opinionated; it's going to require your subject-matter expertise. And it's not to say that code can't be art, but maybe instead of asking for the minimum number of operations needed to remove all the cells, it's: hey, can you make a fun video game that's more intentional about what it teaches you as you play? Now is a really important time to do this, and you, the people who are here right now, understand this deeply.
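AI Diplomacy itself is open source, and Alex describes the scaffold in the Q&A at the end: per-model diaries, relationship tracking, and lenient parsing of half-formed JSON. As a toy sketch of that loop, with hypothetical names and prompt format rather than the project's actual code:

```python
# Toy sketch of an AI Diplomacy-style negotiation loop: each power keeps a
# diary and relationship map, gets prompted each turn, and its half-formed
# JSON reply is parsed leniently. The LLM call is deliberately stubbed out.
import json

POWERS = ["o3", "gemini-2.5-pro", "claude-opus-4", "deepseek-r1"]

state = {
    p: {"diary": [], "relationships": {q: "neutral" for q in POWERS if q != p}}
    for p in POWERS
}


def call_llm(model: str, prompt: str) -> str:
    raise NotImplementedError  # plug in a provider client of your choice


def parse_loose_json(text: str) -> dict:
    """LLMs often return half-formed JSON; try the raw string, then the
    outermost {...} span, before giving up."""
    candidates = [text]
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        candidates.append(text[start:end + 1])
    for candidate in candidates:
        try:
            return json.loads(candidate)
        except ValueError:
            continue
    return {}


def negotiation_round(turn: int) -> None:
    for power in POWERS:
        prompt = (
            f"Turn {turn}. Recent diary: {state[power]['diary'][-3:]}\n"
            f"Relationships: {state[power]['relationships']}\n"
            'Reply as JSON: {"messages": {"power_name": "text"}, "diary": "text"}'
        )
        reply = parse_loose_json(call_llm(power, prompt))
        state[power]["diary"].append(reply.get("diary", ""))
        # Delivering messages, updating relationships, and collecting orders
        # for the adjudicator would follow the same pattern.
```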
At Every, I lead our AI training and consulting, and I work with a bunch of clients, from journalists to people at hedge funds to people in construction and tech, and they all have the same two fears: one, how can I trust AI, and two, what's my role in an AI future? Benchmarks, in my view, are really the answer to both. The role of a human in an AI world, in my view, is to define the goal, and to define what's good and bad en route to that goal. And what is that, if not a benchmark? Once you define that goal, even if it's just a prompt, you can watch AI attempt it. You can give feedback; you realize, oh, it's messing up in this way, it's not quite what I want, because it's not going to be perfect. Then you give feedback, maybe just changing the prompt a little bit, and you see it get better. In that moment, that cycle builds trust. People realize: oh, I am important to this whole system, and it can be helpful. We need trust right now, because we are building one of, if not the, most powerful tools ever made, and we can get more out of it if more people use it. There will be more customers, sure, but there are also going to be a whole lot more incredible things that get made. And if you're not sure where to start, you can ask your mom. My mom teaches yoga, and we had a good talk about some things that could help. We put those seven questions into five different models, and she ended up realizing, hey, Gemini 2.5 Pro is my favorite, too. There were a few things she didn't like from the responses, so we made a simple prompt, and now she uses it to help her local community have customized sessions for people with different ailments. I think that's really cool: having a big impact in a local community, in something that matters to them. So hopefully, before you leave SF, maybe talk to somebody who's not in AI. Ask them what they care about, and maybe that conversation has a big impact, now and in the future. That's pretty much all I got for you. This is the second meme that Claude had: MMLU scores are just way less cool than asking what your mom thinks. But overall, that's what I've got. I appreciate the bunch of people who helped bring this out. We launched it, and it kind of came together through random coordination on X. Researchers from all over the world hopped in, especially Tyler and Sam (Sam all the way from Australia, Tyler in Canada), who helped make this happen, and the TextArena team. And especially the Every team, who backed me and made it possible to create this presentation and be here. But that's all I've got. Thank you guys so much for listening. I think Anthropic says they don't benchmark-max, and that's why a lot of times you don't see Claude on some of the top benchmarks. So how do you think about that, with your opening statement that benchmarks shape the development of AI, when one of the arguably most aligned companies doesn't really try to benchmark-max? Yeah. Well, I think benchmark-maxing is a little bit different than being aware of how good your model is, right?
Because I think we saw that they actually did have Claude Plays Pokemon in the middle of their release. So it may not be maxing on it. And it's funny, because Claude didn't do the best at this game, but I think they're happy about that: it didn't lie, it didn't do everything it could to win. I think these kinds of benchmarks show you the personalities not only of the models but also of the model trainers, which is really cool. Yeah, I mean, Claude 4 also didn't do that well on the benchmarks outside of coding; it didn't do as well as maybe some of the other benchmark-maxing companies. Yeah, well, I'd say Claude didn't do as great as, like, Llama 4, for example, which it still definitely does better than on a lot of other benchmarks. So it's interesting to see the dynamics in different scenarios. But yeah, I imagine there are some ways to evaluate Claude that they really care about, even if it's not what you're going to optimize for with reinforcement learning. Thanks for the question. Yeah, great presentation. Just out of curiosity, I'm super interested to hear a little bit more about the back end of AI Diplomacy and how you did the orchestration, if you're open to sharing. Yeah. It's open source, so you can check it out. The scaffold took a while, but it's pretty cool. In order to keep continuity over time, each model has its own diary, so it can update it: you know, oh, this person betrayed me; I've got this idea. It also has relationships; I showed you that chart of allies versus enemies, so it keeps that, plus a bunch of different ways to parse JSON that comes back half-formed from language models. It does all that to create the messages it's going to send to other players, or globally, and then to actually create the orders. One of the hardest parts was how to represent the game board, which is a visual thing, in text. A lot of that was: hey, here are the possible moves you have, and here's what each word actually means. And it was interesting, because there was a threshold where the model had to be good enough to even play. That's why 2.5 Flash was so impressive to me, and same with R1: they're both so cheap and able to play really well. Thanks. Awesome. Well, thank you so much, Alex. That was hysterical, and now I want to watch a reality TV show with AI Diplomacy and all the personalities. But thank you so much, folks. That's the conclusion of our programming here today. I hope you enjoyed learning all about tiny teams. And don't forget to check out the rest of the conference: keynotes, the closing party. There's a whole lot of programming still to come. So, thank you for your time. [Applause] [Music]