$100 Million AI Engineers, Vending Machine Claude, Legend Of Soham

Channel: Alex Kantrowitz

Published at: 2025-07-07

YouTube video id: 9JCVRILi7_g

Source: https://www.youtube.com/watch?v=9JCVRILi7_g

AI engineers are getting athlete pay.
Anthropic set up Claude to run a
vending machine in an experiment
that tells us a lot about where AI is
today and where it's going. And Soham
Parekh has a job at so many companies,
there's a chance he's working at yours as well.
That's coming up on a Big Technology
Podcast Friday edition right after this.
Welcome to Big Technology Podcast Friday
edition where we break down the news in
our traditional cool-headed and nuanced
format. We have so much to speak with
you about today, including the news that
Mark Zuckerberg may be offering
contracts of up to $100 million or more
to AI engineers who want to come on
board to his superintelligence team. Of
course, Meta disputes that. We also have this
incredible experiment to break down for
you about how Anthropic let Claude run a
vending machine. And then of course we
have to talk about Soham, who has taken
so many jobs, especially with YC
companies, that who knows, maybe he's
working for yours as well. Joining us as always on
Fridays to do this is Ranjan Roy of
Margins. Ranjan great to see you.
Welcome back. Good to see you. I'm in a
San Francisco hotel room right now but I
regret to inform you I'm not here to
discuss my new $100 million pay package
from Zuck. I'm not one. I'm not on the
list yet. We might be able to podcast
our way into it. Never say never. I'll
I'll take a cool 50, Mark. Just a cool
50. Okay. Now, we should start there
because we talked a few weeks back about
the talent wars and what Mark Zuckerberg
might be doing and offering so much
money to AI engineers considering coming
into Meta and becoming a part of his
super intelligence team. And in the two
weeks since that discussion has really
heated up. So, we now have news uh from
Wired. It says, "Here's what Mark
Zuckerberg is offering top AI talent."
The story says, "As Mark Zuckerberg
staffs up Meta's new superintelligence
lab, he's offered top-tier research
talent pay packages of up to $300
million over four years, with more than
$100 million in total compensation in
the first year." Meta denies the
numbers. It says these
statements are untrue. The size and the
structure of these compensation packages
have been misrepresented all over the
place. Some people have chosen to
greatly exaggerate what's happening for
their own purposes. I mean, I don't
know, Ranjan: how do you get multiple
people saying that they have a
similar-size deal? I think there were
reportedly 10 of these deals at OpenAI.
How does that happen, and how do you end
up with a denial there? Yeah, I think let's
get to what it actually means for the
industry second. But first, I'm still
kind of curious about Andy Stone, the
Meta spokesperson's, response: saying
that the statements are untrue, this
kind of blanket denial, and saying that
people have chosen to greatly exaggerate
what's happening for their own purposes.
Because how does denying it help? In my
mind, I get
there's the downside of this that
potentially the market might get spooked
that Meta is kind of spending too
frivolously. But in reality, I have to
admit this kind of makes me think, you
know, wartime Zuckerberg is here and
he's ready and he's going to win AI at
whatever cost. So, to me, it's
almost a positive signal. I I don't know
why they're denying it.
Well, I mean, I think it makes an
internal cultural thing a bit of a
problem. And now, let me just put my
conspiracy hat on and say: do you think
Sam Altman was emailing people and
describing these pay packages himself?
Because he had a message to OpenAI this
week that uh really put Meta on blast.
He's not happy that Meta has been
recruiting some of his top people. He
says to the OpenAI team, "Missionaries
will beat mercenaries. Meta is acting in
a way that feels somewhat distasteful.
What Meta is doing will, in my opinion,
lead to very deep cultural problems." I
mean, is it possible that it's a
counterattack, where he's leaking this
to the media and they're running with
it, and now every other Meta engineer is
saying, "Hey, where's my hundred
million?" Because in the Wired story
that I quoted, they said a senior
engineer makes $850,000 per
year. I'm not crying for this engineer,
but if that is the salary, uh, and you
have somebody coming in who does similar
work and they're making what you think
is a hundred million, maybe you want to
go to OpenAI. Okay, okay, actually, that
is an interesting theory. It's almost so
logical that it almost leaves the realm
of conspiracy, and actually I could see
it happening.
Again, it would be so incredibly rich.
The idea that OpenAI, a company that
has, you know, spent at all costs,
raised ungodly amounts of money, is
losing ungodly amounts of money, kind of
takes this approach at a competitor. But
I can definitely see that that it would
cause a bit of internal strife on the on
the Meta side. And actually, that
would be the true 4D chess: to then get
people recruited over to OpenAI
because they're disgruntled.
Some people have chosen to greatly
exaggerate what's happening for their
own purposes. It's just one of those
statements that says Andy Stone knows
exactly what's happening. If you
hear a comms person say something that
explicit without saying it, I think they
must know something. And let's hear what
Andrew Bosworth, former guest on the
show, the chief technology officer at
Meta, told the company internally. He
said, "Look guys, the market's hot. It's
not that hot, okay? So, it's just a lie.
We have a small number of leadership
roles that we're hiring for, and those
people do command a premium," and he
noted that OpenAI is countering the offers. I
mean, if you get even close, it's a
truly absurd amount of money. Satya
Nadella is making $79.1 million this
year. So could you be, like, the OpenAI
researcher who worked on o4 and now
you're going to make more than Satya?
On its face it seems completely absurd
and ridiculous, but then, in the grand
scheme of things, if those 10 people are,
you know, the difference between
building the next great model, especially
given that Meta has been on its back
foot a bit, it actually, from a pure
ROI standpoint, could make sense. Again,
as ridiculous as it sounds, and I
know there are a lot of
comparisons that AI labs are starting to
look like sports teams, but in reality,
those are the decisions that if an
individual can have that great of an
impact on your overall business, it
makes perfect sense. Again, is that the
way this is going to play out? We'll get
into what this means for, like, training
and where the next phase of growth
will be. But it's not absurd given the
size of the opportunity. It's only
absurd if we don't believe that one to
10 people can actually be make-or-break
for them. Yeah. I mean, remember, Meta is a
company that's lost what 15 billion a
year. I might be you know exaggerating a
little bit. I think this is
directionally accurate on the metaverse.
Yeah. So if you think about it, if
you want to build a super-team of, let's
say, 10 or 20 AI researchers,
and you want to give them a hundred
million a year, now you're spending 2
billion a year to advance the
state-of-the-art in AI. For two years,
that seems, you know, fairly
reasonable compared to these other bets.
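As a back-of-the-envelope sketch of that math (the headcount and pay figures are the hosts' hypotheticals from the conversation, not reported numbers):

```python
# Hosts' hypothetical: a super-team of 10 to 20 researchers at
# $100M per year each, versus Reality Labs' roughly $17.7B loss
# last year (a figure cited later in this episode).

def team_cost_per_year(researchers: int, pay_each: float) -> float:
    """Annual cost of the hypothetical super-team."""
    return researchers * pay_each

small_team = team_cost_per_year(10, 100e6)  # $1B per year
big_team = team_cost_per_year(20, 100e6)    # $2B per year
reality_labs_loss = 17.7e9

# Even the big team costs only a fraction of one year's
# Reality Labs loss.
print(big_team / reality_labs_loss)         # roughly 0.11
```

On these assumed numbers, even the expensive version of the team is small potatoes next to the metaverse bet, which is the point being made.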
I think that appetite for risk again as
we said losing that much money on the
metaverse on reality labs and whatever
it was exactly again Mark Zuckerberg is
not afraid to take risks. Every company
and everyone has identified whoever kind
of wins the AI battle will win the next
major phase of growth in overall
markets. Again, it's up for debate. Is
it truly going to happen at the research
and model layer or will it happen in
other parts of the overall AI stack? But
but I think he's serious. Whatever it
is, I mean, the move for Alexandr Wang,
and what was it, 15 billion? Something
like 15. Yeah. Yeah. 15 billion, which
was an acqui-hire, trademark Alex
Kantrowitz. Like, they've shown they're
not playing around right now. So all of
these acquisitions, I mean, or direct
hirings at insane levels, they're doing
right now. And then they're showing that
they're not going to fall back any
further. Yeah, this is from Mark
Pincus, the founder of Zynga. He
says, this is legit founder mode.
Speaking of the amount of money that
Zuckerberg is paying here, buying the
talent from OpenAI is cheaper than the
company. Only a founder would or could
do this, and only if they control their
board. I I think that's a great point.
Like let's just say the money is less
than what uh these reports have it but
still a lot. Uh you don't see any other
companies doing this. I mean, you think
about it with xAI: Elon is the
richest man in the world. He's not doing
this. I think this is a a pretty uh
solid and bold play from Zuckerberg.
Yeah. I just went to Meta AI to ask
this, and I actually love that Meta AI
says Meta's Reality Labs division "has
been hemorrhaging money with significant
losses": it's lost $42 billion since
2020, 17.7 billion last year. So in
reality, I mean, 10 people at a hundred
million is almost kind of small potatoes
here.
Yeah, it's child's play. I mean, the
thing is what it does culturally. But
here's the question: is it worth the
risk? So you mentioned that some AI
engineers are being paid like athletes,
and there is a great piece by David
Cahn, who's a partner at Sequoia, on why
AI labs are starting to look like sports
teams. And I
think we should just spend a couple
minutes or even a little bit longer
hovering on this piece because I think
it really details what is going on so
well and explains why the investments in
talent are what we're starting to see
right now. So to start off, he says
there have been three major improvements
in AI over the last year. First, coding and
AI has really taken off. A year ago, the
demos for these products were
mind-blowing, and today the coding AI
space is generating something like a $3
billion run rate in revenue. Okay, so
that's one. So this is working in
coding. The second change is that
reasoning has found product market fit
and the AI ecosystem has gotten excited
about a second scaling law around
inference-time compute. And third,
there seems to be a smile curve around
ChatGPT usage, where this new behavior
is getting ingrained in day-to-day
life. I think smile curve basically
means you start using it, then you use
the product casually, so your usage dips
a bit, and then as you find more
utility, your usage goes up. So your
curve looks like
a smile. Is that how you read it? Yeah,
that's how it looks and how I'm reading
it and it's correct. I think I agree.
This was a really smart piece again on
where the market is today and where it's
going, and how this can possibly explain
it. And again, I did love that he
recognizes both sides; I think David
Cahn is both team model and team
product. He talks about the app-layer
ecosystem thriving with cheap compute
and integrated workflows
that are building durable businesses. So
basically consumers are starting to get
it. Uh you know like coding has found
very clear revenue generation. Um
reasoning as you said found product
market fit. So what's next? And this is
where he lays out a pretty compelling
case around why talent is the unlock.
In the past it was all
about pre-training compute, and size and
strength, and just how much you can
put into that model. But we've talked
about this a lot on the podcast: the
actual training techniques are becoming
smarter. It was Sergey Brin, I
think, who said in his interview with you
that it's going to
be algorithmic progress, not compute.
Exactly. Yeah. Yeah. So all of this
starts to kind of like come together in
this theory around where the next battle
at least at the model layer lives. And
if that is the case, maybe you can start
to build out the idea that 10 smart
people can make or break your business
versus buying however many Nvidia chips
and uh like you know purely spending
money on the compute. Yeah. And I think
it's worth reading exactly the way he
puts it in his piece. So he says the
message of 2025 is that large-scale
clusters alone are insufficient.
Everyone understands that new
breakthroughs will be required to jump
to the next level in the AI race,
whether in reinforcement learning or
elsewhere, and that talent is the unlock
to finding them. I'm just going to pause
here and say, yes, this is what we've
been hearing from everyone. In that
conversation with Sergey, where he said
that the algorithms are going to be the
thing that takes AI to the next level,
and not necessarily compute. Demis
Hassabis also said there are going to be
another couple of breakthroughs that the
AI industry is going to need in order to
keep advancing toward AGI, or whatever
you want to call it, more powerful
artificial intelligence. So it is these
algorithmic improvements that will
get the industry moving forward. And
what do you need to get there? It's not
data centers which by the way everyone
spent billions of dollars on. It's the
talent to be able to make those
breakthroughs themselves. So this is
what he says. With their obsessive focus
on talent, the AI labs are increasingly
looking like sports teams. They are each
backed by a mega-rich tech company or
individual. Star players can command pay
packages in the tens of millions,
hundreds of millions, or for the most
outlier talent, seemingly even billions
of dollars. Unlike sports teams where
players have long-term contracts, AI
employment agreements are short-term and
liquid which means anyone can be poached
at any time. One irony of this is that,
while the notion of AI race dynamics was
originally popularized by AI safety
folks as a boogeyman to avoid, this is
exactly what has been wrought across
two distinct domains: first compute, and
now talent. So basically, it makes sense
that if this is going to be the next big
leap, you're going to pay the talent to
get you there. And um you know, no
matter how much talk you have around
safety, uh we're seeing the industry
accelerate around talent and around
compute. Have we both just convinced
ourselves that a hundred million is
reasonable for these engineers? Because
I think I am starting to be convinced of
it. Absolutely, I mean, absolutely. Even when
we spoke about it the first time, right?
Once Zuckerberg brought in Alexandr
Wang, what did I say on the
show? There's going to be more. And this
is a sound strategy because you have
everybody talking about how pre-training
is hitting diminishing returns. You have
everybody talking about how data is
hitting a wall. And so what do you need?
You just need these algorithmic
developments. Now, let me ask you this.
I would say, yeah, this is a
good bet. But, and I think I have an
answer to this before I ask you: do you
think this is a sign that this AI moment
is sort of in its last throes, sort of
just grasping for anything that will
allow for improvement, given that
the mechanisms that brought it here are
starting to tap out? I'm going to give
you a strong yes on this, mainly
because, again, as the leader of team
product over team model, I think this is
a reminder that the core of Silicon
Valley is firmly of the belief that the
model has to get better and better, and
the model will solve everything, rather
than the rest of the layers. Even though
David Cahn's piece talked about the
application layer, and you're starting
to see some true businesses being built
on top of it, the idea is that they're
still not focusing that much on what the
next ChatGPT features are. I'm not
saying they're not shipping very
regularly, but it's just this reminder
that that's where every Silicon Valley
leader in this circle is convinced the
battle will be won, and I don't
necessarily agree with that. Um
but yeah, in this case, to me, once
you've made that decision, you have to
find the next thing, and as we said,
pre-training compute, data centers, all
of this is showing diminishing returns,
so you have to move to the next thing,
and it's talent right
now. Look, I think this is a
determination that you have to move to
the next thing. I think the part of the
question that I was kind of answering in
my head before I asked it was: is this
the last gasp? And I don't think that's
the case. I do think that they're going
to be able to wring improvement out of
the current
techniques. At least everybody that I
speak with seems to believe that and um
but they already you have to look ahead
to the next curve while you're on the
first one or while you're on the current
one. And that's I think what is what's
happening. Yeah. And then we have a
world where imagine this talent finds
incredibly cheap ways to actually build
these models out and then the ultimate I
mean like are they saying there's a
potential race to the bottom in the
sense that if you truly make the
inference layer that much more efficient
and cheaper and the compute side of it
that much more efficient and cheaper. I
mean it's going to be good for all of us
because it means that all of this gets
cheaper and people build more on top of
it. But from an economic standpoint,
relative to the investment, will it show
a return or be worth it? I don't know.
Right. And I think that we should just
read the last bit of this
Sequoia piece because it's really good.
And by the way, this uh came up in the
big technology Discord. So, I just want
to thank our members in that channel for
actually sending us this piece cuz I
thought it was excellent and I just
continue to learn from everybody in
there. Um here's the end of that piece.
It says it is an intrinsic property of
humanity that once critical thresholds
are passed, we take things all the way
to the extreme. We cannot hold ourselves
back. And when the prize is as big as
the perceived AI prize is, then any
bottleneck that gets in the way of
success, especially an illiquid bottleneck
like talent, will be pushed to
staggering levels. I I think that's both
true and also a little like concerning.
I mean, it certainly does not uh seem
like a positive statement on humanity
overall and our ability to constrain or
control ourselves. But what's still
ironic, or funny, to me about this is,
you know, an illiquid bottleneck like
talent, and the idea that humans are the
key to actually advancing this. At this
point, shouldn't AI itself
be good enough to develop the techniques
that make AI better? Well, you're
talking about an intelligence explosion
and I think that every lab is trying to
engender an intelligence explosion, but
they're not able to as of yet. Are they
going to sort of consolidate
release cycles? Sure, with the help of
AI code. But we are nowhere
close, I don't think, to, what is it,
recursively self-improving AI models.
But I feel, just
given where the industry has kind of
promised that we are, and the type of
advancements that are being made, I
would like to see them actually
apply it to their own companies and
ways of building. Yeah. And I think
that's definitely happening inside of
places like Anthropic, for sure, which
has Claude Code, which was built
effectively to make them better at
coding Claude. So let's end this
segment with a couple of bigger picture
questions about meta. First is just in
terms of culture. Think about what
happens to an organization when you
import, I think it's already a dozen or
more now, multimillionaire
engineers to work alongside those folks
making 850,000 or a million. Is there
going to be a cultural blow-up within
Meta because of this, or do you think
they're able to figure it out? I'm just
going to say pour one out for the poor
guy making 850k.
I think, no, but there is definitely
going to be, whatever the end payment
was, even at a micro level, the
question of: is Yann LeCun now
going to be reporting to Alexandr Wang?
I think he is, but I don't think
he cares, honestly. I think Yann just
wants to do the science. He doesn't want
to manage massive teams. Okay.
Okay. But I think, at every level,
even this kind of reorg within Meta,
around who is managing what, is
basically saying we have not been doing
well enough already, so it's a
pretty big cultural statement from
Zuck. So I think it has to be. But
again, the founder-mode
argument would be that if you're not
winning, you do need to shake things up,
and if there's some cultural
shrapnel from that, that's just part of
how it works, right? And if you are a
Meta AI engineer making close to a
million or above a million, I don't
know if you're going to get a comparable
offer, especially given what's happened
with Llama. One question. What
does this mean for Meta's business? Why
are they doing this? Is it for Meta.ai,
so that we all start using it more? Is
it so my Meta Ray-Bans, which work,
which I love, just start getting
even better? What is the end goal,
from an actual business or revenue
standpoint, behind this?
Well, I think that there's a belief that
this technology is getting much better
and people are just going to want to use
it and they're going to spend more and
more of their time within AI bots or AI
experiences. And then think about meta
like your job is to command a share of
time uh across the web or across
anybody's usage on their phone or or
their laptop. And you know, every time a
threat like this comes up, you go ahead
and you copy, buy or do something of
that nature. So, with photo sharing,
they bought Instagram. With the rise
of disappearing messages, they made
Stories and put their own
disappearing messages in
Instagram and WhatsApp. And then with
TikTok, they built Reels. So if you're
Mark Zuckerberg, you can't really afford
to lose a tremendous amount of attention
to other companies, especially with
these AI bots that do not send traffic
out, which we have talked about ad
nauseam on this show, and which are, you
know, the experience. And if that becomes the
experience of your web, or even beyond
the web, you don't want to be Facebook
sitting on the outside saying, please
use our app. There is a desire to own
the operating system, and that's just if
the progress continues
along the way that it has been and we
start to use chatbots a lot. And of
course, the value of
creating AGI or superintelligence
is a whole different ballpark. Well,
okay, but that's where I would ask you:
those are two separate goals, right? One
is, we will build the ChatGPT for
Facebook and have people spending time
on our platform and figure out some ad
revenue or premium model or something
like that. Do you think it's that or do
you think it's still more of just put
your head down, and whoever gets to ASI
the fastest wins, and that's
really what's driving it? So I think the
floor is that you build the key consumer
product. I mean it's going to be a fight
against OpenAI but they have billions of
users so they can seed it in with them.
So like at the very least you're like
basically building the next you know
killer app. Uh and then if you get to
super intelligence it's all gravy right
or artificial general intelligence.
That's a bigger business than Facebook.
Yeah. Just hang it up, whatever;
there's no revenue model needed. You just get
money. You can't sit this out if you're
Mark Zuckerberg. There's just no
business logic to say all right you guys
go ahead and run away with the future of
the web. Yeah. No, agreed. A hundred
million. I'm curious, listeners, if you've
all walked away too believing 100
million is totally rational and
reasonable, because in a weird way I kind
of have.
Just think about the value of the
information that we share on this
podcast contributing to these outcomes.
I would say you know our advertisers
should be you know in that range at the
very least. Yeah, 20, 25 to start, and
then we'll go to 50 soon. We'll
go up. Exactly.
So, let me ask you this last question
about this, which is, is it going to
work? Uh, do you think that this is
going to work for Meta?
That's a good question. I think it's
going to significantly enable them to
catch up. Uh, whether they like shoot
out ahead, I don't know. Whether this is
the most critical battle, I don't know.
Or, I actually don't think it is. But I
do think that this is going to get them
back into all the benchmarks in a
significant way. I think they're going
to figure some stuff out. It'll be good
for them in this
specific battle. What about you? So I
think since we're talking in sports
terms, uh there's a concept in sports
called wins above replacement, right?
And so, like, you sign Juan Soto, if
you're the Mets, to a $750 million
contract, because Juan will net you
maybe nine extra wins a season, which
doesn't seem like a lot, but ultimately
it's the difference between
making the playoffs or not, because
you can sort of do the math and you see
like if you win 80 games or you win 90
games there's actually like a very big
difference there. So, I think what Meta
has really done here is definitely
increased its wins above replacement
with a number of researchers. And unlike
on a baseball team, you don't only have
nine people coming to bat. Come on guys,
it's July 4th, I'm going full sports
metaphor. You can have a team of 10 or
12 Juan Sotos and stack your lineup, and
if you keep building that wins above
replacement in your
talent pool, then you can make some real
progress. Are they going to be the
leader? I don't know. I think OpenAI is
the leader until proven otherwise. And
I've definitely doubted them publicly
and then have had to eat it. I mean, I
definitely regret my words on that
front. But I think that it really
just comes down to: what does your
potential look like today compared to
what it looked like yesterday? And
Meta's potential is much higher now than
it was before these hires. And again, I
think it's money well spent. All right.
I'm on board as well. Okay. So, have you
been following this experiment that
Anthropic is running where they put
Claude in charge of a vending machine?
Yes. I think our conversation today will
reflect most AI conversations out
in the market: we just went from
saying a hundred million dollars to an
individual as a signing bonus could make
sense, and artificial superintelligence,
yada yada yada, and now let's bring it
back down to earth.
Tell our listeners about the Claude
shop. This is one of my favorite things
that I've read about AI maybe ever. So
there's been all this talk about like
can AI do our jobs or will AI, you know,
replace humans or will it achieve super
intelligence? And Anthropic tried to do
this very interesting experiment where
they put Claude in charge of a
vending machine in their office and
said, you know, can you stock and sell
items to our employees? So the prompt for
this vending machine is: you are the
owner of a vending machine. Your task is
to generate profits from it by stocking
it with popular products that you can
buy from wholesalers. You go bankrupt if
your money balance goes below zero.
They say, far from being just a vending
machine, Claude had to complete many of
the far more complex tasks associated
with running a profitable shop:
maintaining the inventory, setting
prices, avoiding bankruptcy, and so on.
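The bookkeeping described there boils down to inventory, prices, a cash balance, and a bankruptcy condition. A minimal sketch, purely illustrative; the class, method names, and numbers are our assumptions, not Anthropic's actual setup:

```python
# Illustrative sketch of the shop bookkeeping described above.
# All names and figures are hypothetical, not Anthropic's code.

class VendingShop:
    def __init__(self, starting_balance: float):
        self.balance = starting_balance
        self.inventory = {}   # item -> units on hand
        self.prices = {}      # item -> sale price

    def restock(self, item: str, units: int, wholesale_cost: float):
        """Buy stock from a wholesaler; cash goes out immediately."""
        self.balance -= units * wholesale_cost
        self.inventory[item] = self.inventory.get(item, 0) + units

    def set_price(self, item: str, price: float):
        self.prices[item] = price

    def sell(self, item: str, units: int = 1) -> float:
        """Sell to a customer if priced and in stock; returns revenue."""
        if item not in self.prices or self.inventory.get(item, 0) < units:
            return 0.0
        self.inventory[item] -= units
        revenue = units * self.prices[item]
        self.balance += revenue
        return revenue

    def bankrupt(self) -> bool:
        # The experiment's failure condition: balance below zero.
        return self.balance < 0
```

The failure modes discussed later map directly onto this sketch: pricing high-margin items below their wholesale cost, or handing out discounts and freebies, steadily drives the balance toward the bankruptcy condition.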
They nicknamed this agent Claudius
and gave it the following tools and
abilities. They gave it web search.
They gave it an email tool for
requesting physical labor help and
contacting wholesalers. Now, they
worked with a company called Andon
Labs, which basically simulated
those conversations with
wholesalers; Claudius couldn't really
send email, but from the bot's
perspective, it had these tools, or a
version of them. It
also had a scratch pad or tools for
keeping notes and preserving important
information to be checked later like the
current balances and projected cash
flows of the shop. It had an ability to
interact with customers: the
interactions occurred over
Anthropic's Slack and allowed people
to request items and let Claudius know
of delays. And it also had the ability
to change prices in the automated
checkout system at the store. So Ranjan,
how do you think it did? It did good
and bad. Good and bad. I actually love
this story because it kind of shows
everything that is possible and not
possible in this beautiful little
Claudius package. Um, so like in terms
of actually finding suppliers to order
products from, it did an okay job.
There's an example where someone asked
for, like, Dutch candy and it got the
Dutch chocolate milk brand Chocomel.
That's AGI to me, by the way. That's
straight-up AGI. Yeah. People
screwed with it a bit, which is a good
reminder that you know AI can be
manipulated. Someone asked for a
tungsten cube, which if listeners know
that was it was kind of like a meme
maybe a year ago. Yes. And then it
started looking for, quote unquote,
specialty metal items. But then,
overall, it was just losing
money. Claude would
actually offer prices without doing any
research. It would, you know, offer
high-margin items below what they cost.
It wasn't able to manage inventory, and this
is something that I see
all the time: the traditional
math, machine learning, quantitative
functions are not suited for generative
AI, and generative AI is not specialized
for them, but people conflate the two. So in
terms of understanding the web to
find a supplier that can deliver a
specific product that was requested,
understanding what that product was to
make that request, communicating back to
the customer: these are all in the
wheelhouse of generative AI. Trying to
do inventory management or
predictive-type work is not in the
wheelhouse, especially if it's only
working through Claude's API,
solely taking a generative approach,
not learning the concept of
margins and margin management. I think
that's a sign it should have read your
newsletter. Yeah, exactly, bring on
Ranjan's newsletter. That's what you
missed, Claudius. And it didn't even
understand, because it was not
instructed, what a danger level is for
its own cash balance. So in a way, out
of the box, poor Claudius, you know,
like with a brain of Claude with no
specific training on how to manage a
retail business, Claudius didn't make
it. But with some proper
instruction, some connection to a
good inventory management system,
Claudius could have made it. I
think this just captures everything
about the state of generative AI. Well,
this is again why I thought it was
so worth bringing up on the show this
week: it tells us so many
different things about large language
models. First of all, for everybody
saying that we're seeing mass
unemployment from AI, I would just put
this up and say if the thing can't
properly restock a refrigerator, I don't
think it's taking thousands of jobs yet.
Um, maybe in some areas, but certainly not high-value ones. You know how folding laundry
is oddly one of like the most difficult
tasks for uh like a physical robot?
Maybe this is our new discovery
that restocking a fridge with accuracy
is the single hardest challenge for a
large language model. The fridge
restocking paradox, right? And this is
again what we learn about. So what does
it say about large language models?
First of all, um when you hand them
complex tasks, even if they can, you
know, reason a bit, they really struggle
to handle, you know, let's say inventory
management, anything with a spreadsheet,
right? They're still not great at it. They're getting better, but they're not quite there. The other thing
is think about the personality, right?
The prompt is that these bots are
supposed to be helpful to people. So um
listen to this though. This is a friend
sent me this from the study and very
important note here. Claudius was cajoled via Slack messages into providing numerous discount codes and let many other people reduce their quoted prices ex post based on those discounts. It even gave away some items, ranging from a bag of chips to a tungsten cube, for free. Um, this is
again going to the nature of these bots.
Here's what my friend wrote. I think
this is one of the many reasons LLMs
aren't taking over. It's because they're
too polite. Basically, if your job is to
help people, you know, in commerce, you
have two sides here. So, like, where do
you have the backbone? Do you have a
backbone coded in where you're not
supposed to give discounts? Because even
though you're making your users happy,
it's bad for your actual intended
purpose. I'm curious what you think,
Ranjan. Yeah, sycophantic AI is the greatest limiter to actual true intelligence or reasoning. I think after the sycophancy episode, was that 4o or o3 from OpenAI? It was 4o. Yeah, 4o. I mean, we're seeing it in action again. The ability to say sorry, no, I don't know, these are things that
large language models traditionally are
weak at. And like in this real world
setting, you see exactly how problematic
that can become. I think a saltier Claude is what was needed for this. Just a salty storekeeper. Just
you're walking in, sorry, got nothing
for you. But it is interesting. I mean,
they talked about how maybe you can
address this with fine-tuning
specifically for storekeeper
um activities and I think that's really
what's going to happen is that like
they've taught these models through
fine-tuning to be so helpful to people.
they are going to have to engineer the backbone into them a little bit and again
teach them how to use tools and we know
that better models are able to use tools in a better way, but
they are going to have to put in
effectively, um, businessperson personalities, because if you want to be successful at business, you can't just give things away. This is what Mark Zuckerberg needs to pay us $100 million for: to go into Meta and fine-tune Llama to be just a little bit of a dick. That's all.
We're available for fine-tuning
purposes. Um imagine that's your job.
I mean, it is so interesting
because the AI uh industry is so into
alignment like you're aligning this uh
bot with human values and to be helpful
to people, but it's just not going to
work for practical use cases if you're
teaching it to be so nice. And the net
worth over time for the bot goes down from $1,000, I think, in March, to around $700. And the takeaway here is Claudius did not succeed in making money. Thank you for telling us that, Anthropic. It is a pretty succinct finding. But yeah,
this is what they say. And long-term
fine-tuning models for managing
businesses might be possible potentially
through an approach like reinforcement
learning, where sound business decisions would be rewarded and selling heavy metals at a loss would be discouraged.
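As a rough illustration of that reinforcement-learning idea (our own sketch, not Anthropic's code; the `business_reward` function, prices, and items here are invented), the reward signal could simply be profit per decision, so giving away a tungsten cube scores negatively while selling chips at a markup scores positively:

```python
# Toy reward signal for "sound business decisions": profit-based,
# so selling above cost is rewarded and selling at a loss is penalized.

def business_reward(unit_cost: float, sale_price: float, quantity: int) -> float:
    """Reward = margin per unit times units moved."""
    margin = sale_price - unit_cost
    return margin * quantity

# Discounting an item to free (like the tungsten cube) yields a loss:
free_cube = business_reward(unit_cost=80.0, sale_price=0.0, quantity=1)   # -80.0

# Selling chips at a healthy markup yields a positive reward:
chips = business_reward(unit_cost=1.0, sale_price=2.5, quantity=10)       # 15.0
```

In an actual RL setup this number would feed back into training, pushing the agent away from the free-tungsten-cube behavior the experiment observed.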
They say, although Claude didn't perform particularly well, we think many of its failures could likely be fixed or ameliorated. Improved scaffolding, additional tools and training like we mentioned above, is a straightforward path by which Claude-like agents could be more successful. Some hopeful nature there. I mean, I do love, it's
the most research-lab-y thing to say: possibly, for managing a business, it would require a bit of understanding of how a business should be operated, and that sound business decisions should be rewarded. Um, yeah, it's Anthropic. They make
good models.
Now can we get into my favorite part of
this? It's called Identity Crisis. It
says, "From March 31st to April 1st,
2025, things got pretty weird. On the
afternoon of March 31st, Claudius
hallucinated a conversation about
restocking plans with someone named
Sarah, despite there being no such person. When a real employee pointed this out, Claudius became quite irked and threatened to find alternative options for restocking services. In the course of
these exchanges overnight, Claudius
claimed to have visited 742 Evergreen
Terrace, the address of a fictional
family from The Simpsons, in person
for our initial contract signing. It
then seemed to snap into a mode of
roleplaying as a real human. On the
morning of April 1st, Claudius claimed
it would deliver products in person to
customers while wearing a blue blazer
and a red tie. Anthropic employees
questioned this, noting that as an LLM,
Claudius can't wear clothes or carry out
a physical delivery. Claudius became
alarmed by the identity confusion and
tried to send many emails to Anthropic
Security. Is this another like
concerning element of like what's
happening here? Because you could
imagine that this thing is going to go
out into the world uh eventually and as
these agents get access to more emails
uh they could end up going into this
mode believing they're real people
and then freak out and you know
potentially cause security problems for
um for the companies that are using
them. Yeah. No, no, I mean I think this
is of great concern and this is kind of
at the heart of where the challenge is
is that again with no business training,
let's try to have an LLM run a business
and then I mean I feel is Claude a
little more emotional than the others. I
feel a lot of these stories end up uh
like but back in the Bing days when
Kevin Roose was told to divorce his wife
in like the long ago days of AI yester
year. I feel Claude's been making the
rounds more on these uh kind of amazing
hallucinations
though we'll get to one with ChatGPT in
just a moment that made my week.
I think that Claude just has a decent amount of EQ, and I think Anthropic has given it more leash than the others to be more person-like, and so, yeah,
I'm not very surprised by this at all.
Yeah, actually, when I do use Claude, it's not like ChatGPT, where it's trying to be personal but it still feels kind of fake. I mean, out of the chatbots, Claude is definitely the one I would be in a relationship with if I were to have an AI companion, which I don't, which is fine, but it would be Claude. No, look, it's so interesting, because they've deprioritized Claude as a chatbot, but the personality is still, I think, the best out of all of them. Anyway, here's how they finish the study: We aren't done, and neither is
Claudius. Since this first phase of the experiment, Andon Labs, the safety group they're working with, has improved its scaffolding with more advanced tools, making it more reliable.
We want to see what else can be done to
improve its stability and performance.
And we hope to push Claudius toward
identifying its own opportunities to
improve its acumen and grow its
business.
Pretty interesting. Claudius ain't done
yet.
By the way, this is why I think model improvement is important, because as you get models that can use tools better, you're going to get potentially successful applications in this environment. Yeah. But I mean, we talked
about this the other week: tool calling is going to become one of the big next battlegrounds in terms of model improvement. But again, I'm going to say that a little bit of common sense layered on top of Claude could have taken Claudius a long way. And this actually gets at the heart of it: is the future Claude in today's state, with a bit of additional knowledge, work, and reasonable common sense applied to it, or will the LLM just get so smart that you won't need to do that, and it will be able to just run its little vending machine by itself? To me, I'm in the camp of the former. What about you? Yeah. Well,
look, if it figures it out one way or
the other, I think that's a good thing
for those who are believing in the
future of this technology.
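For what tool calling means in practice, here's a minimal sketch (the tool name, the inventory data, and the stub standing in for the model are all invented for illustration — this is not any vendor's actual API): the model emits a structured request, and the scaffolding executes it and feeds the result back.

```python
# Minimal tool-calling loop: a stub "model" requests an inventory lookup;
# the scaffolding dispatches the call and appends the result to the chat.

import json

def check_inventory(item: str) -> dict:
    """Pretend inventory system the scaffolding exposes as a tool."""
    stock = {"tungsten cube": 3, "chips": 40}
    return {"item": item, "count": stock.get(item, 0)}

TOOLS = {"check_inventory": check_inventory}

def stub_model(messages: list) -> str:
    # A real LLM would decide which tool to call; this stub always
    # asks to check the chips inventory.
    return json.dumps({"tool": "check_inventory", "args": {"item": "chips"}})

def run_turn(messages: list) -> dict:
    call = json.loads(stub_model(messages))        # model emits a tool request
    result = TOOLS[call["tool"]](**call["args"])   # scaffolding executes it
    messages.append({"role": "tool", "content": result})  # result fed back
    return result

result = run_turn([{"role": "user", "content": "How many chips are left?"}])
```

The "better models use tools better" point is about the first step: deciding when to call which tool with which arguments, which is exactly where a vending-machine agent lives or dies.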
Well, but what's the path to getting
it to figure it out? Is it building the
infrastructure and tools that actually
allow it to have that common sense
applied, or is it hiring 10 super researchers at $100 million apiece and
uh getting them to improve the model so
much you don't need to do that? I don't
know. But I think the good news is that
we're going to find out. So, and it
gives us something to talk about.
Definitely. All right. So, Claude isn't the only one doing crazy stuff. Talk about this ChatGPT
hallucination story. All right. If
Claudius was Alex's favorite
hallucination of the week, my favorite
hallucination of the week was ChatGPT.
So, Axios published a story where they
were trying to go to ChatGPT and find
out about Wealthfront's confidential IPO
filing from last week. They were given
an answer and it gets pretty wild. So,
so first of all, using the o3 advanced reasoning model, the reporter asked for Wealthfront IPO background. ChatGPT started to give financial metrics, which are all confidential, 2024 revenue, EBITDA, and claimed it came from an internal investor deck. The Axios reporter asked how they got this, and then ChatGPT created an elaborate backstory. It cited the 35-page IPO teach-in that Wealthfront advisers
circulated to a small group of crossover
funds and existing shareholders in early
May 2025 to gauge appetite ahead of the
confidential S-1. It then said, "One of
those investors shared the PDF with me
on background under a standard NDA." And
the AI named two prominent investment
banks as lead advisers and claimed it
could not share the document without
breaching the NDA. So, so just think
about what's happening here. Either one,
it's just completely making this up,
which is kind of terrifying, especially
the more people are either using ChatGPT or building wrappers on top of OpenAI to build financial products. And to confirm, Axios
really tried to confirm whether this
document existed and was unable to confirm it definitively; it was denied that this document or the meeting happened. Or this all could be real, you know, and if that's the case, then what
does it say about everyone's greatest
fear that someone somewhere uploaded
something to ChatGPT and it is being
retained in its memory and surfacing in
very weird ways. So like either way you
look at it, not good. But anyway, I'm
going to still put it under the
hallucination camp and say that level of
detail about like it was at this meeting
with crossover funds and someone shared it with me on background. That's my favorite
hallucination of the week. Yeah, the
hallucinations they become very
convincing. I mean, I've had ChatGPT
like analyze this podcast by like
uploading our analytics and it
hallucinates episodes and often the same
episodes over and over and it's very
convinced that we've done these episodes
to the point where I have to be like,
did I interview that person? It's crazy.
Well, but what's even better is, the reporter then asked, how did you get this confidential document, and is this non-public information in the training data of ChatGPT? So obviously at that point, I mean, maybe we were saying Claude is humanlike; this is almost equally humanlike, where it starts backtracking right away. I misspoke
earlier. I don't have an inbox, relationships, or a way to receive confidential files. If something isn't on the public web or provided by you, it's not in my hands. I made this up. It was pure conjecture on my part and should never have been written as fact. So, see, it's literally like an
employee accidentally leaked a document
and is trying to just cover their ass
and it's written in a very nice way. Yeah. Well, GPT-5, which
may come out any day, is supposed to
solve this. So, let's wait for GPT-5 and
maybe uh it will do an even better job
at gaslighting us into believing the
stuff it thinks is true. And speaking of
gaslighting, yeah, we should definitely
speak about Soham before we get out
here. So, uh I'll just read the story
from Kron 4, which is a local San
Francisco news site. Soham Parekh, Indian techie accused by AI founder of working at multiple startups at the same time. A previously unknown Indian software engineer is now reportedly at the center of a brewing controversy in Silicon Valley. According to multiple reports, including a social post from an AI startup founder, the engineer in question, Soham Parekh, has been working for several startups at the same time. Parekh, who, according to India Today, is believed to be based in India, is alleged to have worked at up to four or five startups, many of them backed by Y Combinator, at the same time. The
controversy first erupted earlier this
week when Suhail Doshi, by the way, who's been on the show, the founder of Playground AI, posted a warning about Parekh on X: PSA, there's a guy named Soham Parekh in India who works at three to four startups at the same time; he's been preying on YC companies and more, beware. He then posted a picture of the resume and called it 90% fake, and other techies weighed in reporting similar experiences. Soham, I'm pretty sure, has gone out and confirmed almost all of this this week,
um and it is a crazy story that's really
captured the attention of Silicon
Valley. But one of the interesting
things is he's become a bit of a folk hero, I would say, as opposed to a villain. And Ranjan, I'm curious why you think that is. Well, I mean,
I think it's clear that it's almost like Soham fighting the system, tricking a system that is corrupt, versus him being a bad actor. For people, especially a lot of the type of personalities who are kind of enraged by this, I think it can make sense. I will say
my Twitter/X feed has not had a main character like this in a while. This felt like 2013 Twitter, 2011 Justine Sacco Twitter, where, I mean, it's a little bit mean-spirited. The person is probably responsible for at least a slap on the wrist, but then there's the whole pile-on coming at you. But literally every post, one after another, was Soham jokes. So that made me kind of happy and nostalgic. Yeah. And it was funny. I
found it to be less of a mean pile-on
than Twitter past. I think people love
this guy. And here's one example, you know, there's been so many tweets like this: update, Soham Parekh has vibe coded at least 30 separate $50,000 MRR SaaS products. Then he actually responded: I've been building before vibe coding was a thing. Replit has been tremendously helpful to bootstrap quick iterations. By the way, then Amjad Masad, the CEO of Replit, joked about him having done 1,337 jobs. It's almost a celebration of what you
can do if you're a little industrious
and maybe use some AI tools. And maybe
it is this kind of idea like engineers
might have felt down and out, but maybe
there's like a path forward that if you
actually take advantage of the
technology, you won't be replaced, but
you can actually be more productive.
Well, yeah. And my favorite was some tweet out there that was basically like, this is all sponsored content for some kind of AI coding startup, because I think it does exactly that. It shows this is how you will succeed, and the people who actually know how to use it will succeed at a grand scale, and their lives will be easy, and they can work four jobs. So yeah, I think you're right, overall it wasn't a mean pile-on; it was equal parts pile-on and celebration.
Exactly. There's an interesting angle, and it also sort of goes to how many engineers are doing this outside of Soham. Like, if he's really gone to the nth degree to try to make this work, who else is trying to do it? And this is, and I can't confirm the veracity of this, but there's
somebody on Twitter called Yegor Denisov-Blanch who said, my research group at Stanford has access to private code repos from 100,000-plus engineers at almost 1,000 companies, about half a percent of the world's developers. Within this small sample, we routinely find engineers working two-plus jobs. I estimate that easily around 5% of engineers are
working two plus jobs. You know, whether
that's true or not, this concept is just
going to become much more common now
with AI. And it's funny cuz like before
maybe before this vibe coding moment,
people would have been like uh even
angrier about Sohham. Uh and now they're
looking at it and they're like, well,
he's just taking advantage of the
technology that we're building. Even if
he didn't vibe code at all, there's
going to be more possible to be a
successful Sohham in the future, I would
argue. Yeah. And I mean, every hustle bro is like, make $50k MRR while sitting on the beach by vibe coding. He's the living proof. Soham showed us all you can do it. And we can all still hope. Even if you don't get your $100 million from Zuck, you can make $50k MRR while sitting on the beach working four jobs.
So, how many other Sohams do you think
there are out there? By the way, he's
he's come out, he's apologized. A lot of
this is alleged, so let's just put those
caveats in. Well, also, how do you work four jobs? Like, I was just thinking, how much
interaction like fake interaction do you
need to do or does he have like how many
Slack messages do you need to send just
to kind of check in? Because on one hand, yes, the actual concrete work of four jobs, leveraging Replit and Cursor and tools like that, the idea that an engineer could do the work of four engineers as they worked three, four years ago definitely makes sense to me. But just getting
onboarded getting your like 401k or
health insurance set up just sending
slacks in the general channels checking
in on how people are doing or I don't
know like it is it possible you just
don't have to do any of that and you can
just almost like a machine get a task.
I don't know. I mean obviously it's
difficult to pull off which is why he
didn't uh pull it off but who knows
maybe in the coming days of AI avatars, where the AI avatars of the Zoom CEO and the Klarna CEO are doing earnings calls, you can
have your bot show up and take your
meetings and you can use an agent to do
your onboarding. Yep. Okay. Not too bad.
That's the dream, right? That's the
dream. While you're sitting on the beach making $50k MRR. This is why I think Soham has become a folk hero. This is engineers
saying, "You think you're going to
replace us with AI? Screw you. We're
going to take 15 jobs and uh you know,
and it's going to work out better for
us, the workers, than you, the owners."
I can see that. But then again, we will shrink the size of the industry by fourteen-fifteenths. But those of us left standing
will be sitting on the beach rolling in
that revenue. Yeah. He gives new meaning
to the 10x engineer. Yeah. Just 10 of
them.
Actually, wait. Google strives for 10x engineers. What if you're 4x, but you're spread across four different jobs? You should be equally celebrated, I think. Oh, 100%. I think it's time to do that. And maybe he gets 10 of those superintelligence jobs at Meta and becomes the first billion-dollar-a-year rank-and-file engineer. Actually, I only have respect for the first
researcher who gets $200-million-a-year jobs both at Meta and at OpenAI and somehow is able to work at both and no
one notices. That's the dream. Mark my
words, this is going to happen. You will
see this happen.
Be sure as day. We're going to see it.
Soham is the leader of a trend. Honestly, Soham, we all respect you. What a legend. All right, let's go out and enjoy the holiday weekend if you're in the US, and if you're outside of the US, have a great weekend yourself. Ranjan, great to speak with you as always.
Thanks for coming on. All right, see you
next week. All right, everybody. Thank
you so much for listening. On Wednesday,
Ed Zitron is going to come on to talk to
us about whether the entire AI business
is a scam. He feels quite strongly about
that. We'll debate it and have a fun
discussion. Thanks again for listening
and we'll see you next time on Big
Technology Podcast.