$100 Million AI Engineers, Vending Machine Claude, Legend Of Soham
Channel: Alex Kantrowitz
Published at: 2025-07-07
YouTube video id: 9JCVRILi7_g
Source: https://www.youtube.com/watch?v=9JCVRILi7_g
AI engineers are getting athlete pay. Anthropic set up Claude to run a vending machine in an experiment that tells us a lot about where AI is today and where it's going. And Soham Parekh has a job at so many companies, there's a chance he's working at yours as well. That's coming up on a Big Technology Podcast Friday edition right after this. Welcome to Big Technology Podcast Friday edition, where we break down the news in our traditional cool-headed and nuanced format. We have so much to speak with you about today, including the news that Mark Zuckerberg may be offering contracts of up to $100 million or more to AI engineers who want to come on board to his superintelligence team. Of course, Meta disputes that. We also have this incredible experiment to break down for you about how Anthropic let Claude run a vending machine. And then of course we've got to talk about Soham, who has taken so many jobs, especially with YC companies, that who knows, maybe he's working for yours as well. Joining us as always on Fridays to do this is Ranjan Roy of Margins. Ranjan, great to see you. Welcome back. Good to see you. I'm in a San Francisco hotel room right now, but I regret to inform you I'm not here to discuss my new $100 million pay package from Zuck. I'm not on the list yet. We might be able to podcast our way into it. Never say never. I'll take a cool 50, Mark. Just a cool 50. Okay. Now, we should start there, because we talked a few weeks back about the talent wars and what Mark Zuckerberg might be doing, offering so much money to AI engineers considering coming into Meta and becoming part of his superintelligence team. And in the two weeks since, that discussion has really heated up. So we now have news from Wired. It says, "Here's what Mark Zuckerberg is offering top AI talent."
The story says, "As Mark Zuckerberg staffs up Meta's new superintelligence lab, he's offered top-tier research talent pay packages of up to $300 million over four years, with more than $100 million in total compensation in the first year." Meta denies the numbers. It says these statements are untrue: "The size and the structure of these compensation packages have been misrepresented all over the place. Some people have chosen to greatly exaggerate what's happening for their own purposes." I mean, I don't know, Ranjan, how do you get multiple people saying that they have a similar-size deal? I think there were reports of 10 of these deals. How does that happen, and how do you end up with a denial there? Yeah, let's get to what it actually means for the industry second. But first, I'm still kind of curious about Andy Stone, the Meta spokesperson's, response, saying that the statements are untrue, this kind of blanket denial, and saying that people have chosen to greatly exaggerate what's happening for their own purposes, because how does that help? In my mind, I get there's the downside that potentially the market might get spooked that Meta is spending too frivolously, but in reality, I have to admit this kind of makes me think, you know, wartime Zuckerberg is here and he's ready and he's going to win AI at whatever cost. So to me, it's almost a positive signal. I don't know why they're denying it. Well, I think it makes an internal cultural thing a bit of a problem. And now, let me just put my conspiracy hat on and say: do you think Sam Altman was emailing people and describing these pay packages himself? Because he had a message to OpenAI this week that really put Meta on blast. He's not happy that Meta has been recruiting some of his top people. He says to the OpenAI team, "Missionaries will beat mercenaries.
Meta is acting in a way that feels somewhat distasteful. What Meta is doing will, in my opinion, lead to very deep cultural problems." I mean, is it possible that it's a return attack, where he's leaking this to the media and they're running with it, and now everybody else who's a Meta engineer is saying, "Hey, where's my hundred million?" Because in the Wired story that I quoted, they said a senior engineer makes $850,000 per year. I'm not crying for this engineer, but if that is the salary, and you have somebody coming in who does similar work and they're making what you think is a hundred million, maybe you want to go to OpenAI. Okay. Actually, that is an interesting theory. It's almost so logical that it almost leaves the realm of conspiracy, and actually I could see it happening. Again, it would be so incredibly rich: the idea that OpenAI, a company that has, you know, spent at all costs, raised ungodly amounts of money, is losing ungodly amounts of money, takes this approach at a competitor. But I can definitely see that it would cause a bit of internal strife on the Meta side. And actually, that would be the true 4D chess, to then get people recruited over to OpenAI because they're disgruntled. "Some people have chosen to greatly exaggerate what's happening for their own purposes" is just one of those statements that says Andy Stone knows exactly what's happening. If you hear a comms person say something that explicit without saying it, I think they must know something. And let's hear what Andrew Bosworth, former guest on the show and the chief technology officer at Meta, told the company internally. He said, look guys, the market's hot. It's not that hot, okay? So it's just a lie. "We have a small number of leadership roles that we're hiring for," and those people do command a premium, and he noted that OpenAI is countering the offers.
I mean, even if you get close, it's a truly absurd amount of money. Satya Nadella is making $79.1 million this year. So could you be, like, the OpenAI researcher who worked on o4, and now you're going to make more than Satya? On its face it seems completely absurd and ridiculous, but then in the grand scheme of things, if those 10 people are the difference between building the next great model, especially when Meta has been on its back foot a bit, it actually, from a pure ROI standpoint, could make sense. Again, as ridiculous as it sounds, and I know there are a lot of comparisons that AI labs are starting to look like sports teams, but in reality, those are the decisions: if an individual can have that great of an impact on your overall business, it makes perfect sense. Again, is that the way this is going to play out? We'll get into what this means for training and where the next phase of growth will be. But it's not absurd given the size of the opportunity. It's only absurd if we don't believe that one to 10 people can actually be make-or-break for them. Yeah. I mean, remember, Meta is a company that's lost, what, $15 billion a year on the metaverse? I might be exaggerating a little bit, but I think this is directionally accurate. Yeah. So if you think about it, if you want to build a super team of, let's say, I don't know, 10 to 20 AI researchers, and you want to give them $100 million a year, now you're spending $2 billion over two years to advance the state of the art in AI. Per year, that seems fairly reasonable compared to these other bets. I think that appetite for risk, again, as we said, losing that much money on the metaverse, on Reality Labs, or whatever it was exactly: Mark Zuckerberg is not afraid to take risks. Every company and everyone has identified that whoever wins the AI battle will win the next major phase of growth in overall markets.
Again, it's up for debate. Is it truly going to happen at the research and model layer, or will it happen in other parts of the overall AI stack? But I think he's serious. Whatever it is, I mean, the move for Alexandr Wang, and what was it, $15 billion? Yeah, $15 billion, which was an "acqui-hire," trademark Alex Kantrowitz. They've shown they're not playing around right now. So all of these acquisitions, or direct hirings at insane levels, they're doing right now. And they're showing that they're not going to fall back any further. Yeah, this is from Mark Pincus, the founder of Zynga. Speaking of the amount of money that Zuckerberg is paying here, he says this is legit founder mode: "Buying the talent from OpenAI is cheaper than the company. Only a founder would or could do this, and only if they control their board." I think that's a great point. Let's just say the money is less than what these reports have it, but still a lot. You don't see any other companies doing this. I mean, think about it with xAI: Elon is the richest man in the world, and he's not doing this. I think this is a pretty solid and bold play from Zuckerberg. Yeah. I just went to Meta AI to ask about this, and I actually love that Meta AI says Meta's Reality Labs division "has been hemorrhaging money with significant losses": it's lost $42 billion since 2020, $17.7 billion last year. So in reality, 10 people at a hundred million is almost small potatoes here. Yeah, it's child's play. I mean, the thing is what it does culturally. But here's the question: is it worth the risk?
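The back-of-the-envelope comparison in this exchange can be checked in a few lines of Python. All of the dollar figures below are the ones quoted in the episode (the reported $100 million packages and Meta AI's Reality Labs loss figure), not audited numbers:

```python
# Figures as quoted in the episode, not from filings.
reality_labs_losses = 42e9   # Meta AI's figure: Reality Labs losses since 2020
researchers = 10             # the hypothetical "super team"
pay_per_year = 100e6         # the reported $100M-per-year packages
years = 2

talent_spend = researchers * pay_per_year * years
print(f"Talent spend: ${talent_spend:,.0f}")   # $2,000,000,000
print(f"Share of Reality Labs losses: {talent_spend / reality_labs_losses:.1%}")  # 4.8%
```

On these assumptions, the two-year talent bet is under five percent of what Reality Labs has already burned, which is the "small potatoes" point being made above.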
So, you mentioned that some AI engineers are being paid like athletes, and there's a great piece by David Cahn, who's a partner at Sequoia, on why AI labs are starting to look like sports teams. I think we should spend a couple minutes, or even a little bit longer, hovering on this piece, because it details what is going on so well and explains why we're starting to see these investments in talent right now. To start off, he says there have been three major improvements in AI over the last year. First, coding AI has really taken off. A year ago, the demos for these products were mind-blowing, and today the coding AI space is generating something like a $3 billion revenue run rate. Okay, so that's one: this is working in coding. The second change is that reasoning has found product-market fit, and the AI ecosystem has gotten excited about a second scaling law around inference-time compute. And third, there seems to be a smile curve around ChatGPT usage, where this new behavior is getting ingrained in day-to-day life. I think smile curve basically means you start using it, then you use the product casually, so your usage goes down a bit, and then as you start to find more utility, your usage goes up. So your curve looks like a smile. Is that how you read it? Yeah, that's how it looks and how I'm reading it, and I think it's correct. I agree. This was a really smart piece, again, on where the market is today and where it's going, and how this can possibly explain things. And I did love that he recognizes, though, I think David Cahn is both team model and team product. He talks about how the app-layer ecosystem is thriving, with cheap compute and integrated workflows that are building durable businesses. So basically, consumers are starting to get it. Coding has found very clear revenue generation. Reasoning, as you said, found product-market fit. So what's next?
And this is where he lays out a pretty compelling case around how talent fits in. In the past it was all about pre-training compute, size and strength, and just how much you can put into that model. But we've talked about this a lot on the podcast: the actual training techniques are becoming smarter. It was Sergey Brin, I think, who said in his interview with you that it's going to be algorithmic progress, not compute. Exactly. Yeah. So all of this starts to come together in this theory about where the next battle, at least at the model layer, lives. And if that is the case, maybe you can start to build out the idea that 10 smart people can make or break your business, versus buying however many Nvidia chips and purely spending money on compute. Yeah. And I think it's worth reading exactly the way he puts it in his piece. He says the message of 2025 is that large-scale clusters alone are insufficient. Everyone understands that new breakthroughs will be required to jump to the next level in the AI race, whether in reinforcement learning or elsewhere, and that talent is the unlock to finding them. I'm just going to pause here and say: yes, this is what we've been hearing from everyone. In that conversation with Sergey, he said that the algorithms are going to be the thing that takes AI to the next level, not necessarily compute. Demis Hassabis also said there are going to be another couple of breakthroughs that the AI industry is going to need in order to keep advancing toward AGI, or whatever you want to call more powerful artificial intelligence. So it's these algorithmic improvements that will get the industry moving forward. And what do you need to get there? It's not data centers, which, by the way, everyone has spent billions of dollars on. It's the talent to be able to make those breakthroughs themselves. So this is what he says.
"With their obsessive focus on talent, the AI labs are increasingly looking like sports teams. They are each backed by a mega-rich tech company or individual. Star players can command pay packages in the tens of millions, hundreds of millions, or for the most outlier talent, seemingly even billions of dollars. Unlike sports teams, where players have long-term contracts, AI employment agreements are short-term and liquid, which means anyone can be poached at any time. One irony of this is that while the notion of AI race dynamics was originally popularized by AI safety folks as a boogeyman to avoid, this is exactly what has been wrought across two distinct domains: first compute, and now talent." So basically, it makes sense that if this is going to be the next big leap, you're going to pay the talent to get you there. And no matter how much talk there is around safety, we're seeing the industry accelerate around talent and around compute. Have we both just convinced ourselves that $100 million is reasonable for these engineers? Because I think I am starting to be convinced of it. Absolutely. Even when we spoke about it the first time, right, once Zuckerberg brought in Alexandr Wang, what did I say on the show? There's going to be more. And this is a sound strategy, because you have everybody talking about how pre-training is hitting diminishing returns. You have everybody talking about how data is hitting a wall. And so what do you need? You need these algorithmic developments. Now, let me ask you this. So I would say, yeah, this is a good bet. But I'm going to ask you, and I think I have an answer to this before I ask: do you think this is a sign that this AI moment is sort of in its last throes, just grasping for anything that will allow for improvement, given that the mechanisms that brought it here are starting to tap out?
I'm going to give you a strong yes on this, mainly because, again, as the leader of team product over team model, I think this is a reminder that the core of Silicon Valley is firmly of the belief that the model has to get better and better and the model will solve everything, over the rest of the layers. Even though David Cahn's piece talked about the application layer, and you're starting to see some true businesses being built on top of it, they're still not focusing that much on what the next ChatGPT features are. And I'm not saying they're not shipping very regularly, but it's just this reminder that that's where every Silicon Valley leader in this circle is convinced the battle will be won, and I don't necessarily agree with that. But yeah, in this case, to me, once you've made that decision, you have to find the next thing, and as we said, pre-training compute, data centers, all of this is showing diminishing returns, so you have to move to the next thing, and that's talent right now. Look, I think this is a determination that you have to move to the next thing. The part of the question that I was kind of answering in my head before I asked it was: is this the last gasp? And I don't think that's the case. I do think they're going to be able to wring improvement out of the current techniques. At least everybody that I speak with seems to believe that. But you have to look ahead to the next curve while you're on the current one. And that's, I think, what's happening. Yeah.
And then we have a world where, imagine, this talent finds incredibly cheap ways to actually build these models out. Is there a potential race to the bottom, in the sense that you truly make the inference layer that much more efficient and cheaper, and the compute side of it that much more efficient and cheaper? I mean, it's going to be good for all of us, because it means that all of this gets cheaper and people build more on top of it. But from an economic standpoint, relative to the investment, will it show a return or be worth it? I don't know. Right. And I think we should just read the last bit of this Sequoia piece, because it's really good. And by the way, this came up in the Big Technology Discord, so I just want to thank our members in that channel for sending us this piece, because I thought it was excellent, and I just continue to learn from everybody in there. Here's the end of that piece: "It is an intrinsic property of humanity that once critical thresholds are passed, we take things all the way to the extreme. We cannot hold ourselves back. And when the prize is as big as the perceived AI prize is, then any bottleneck that gets in the way of success, especially an illiquid bottleneck like talent, will be pushed to staggering levels." I think that's both true and also a little concerning. I mean, it certainly does not seem like a positive statement on humanity overall and our ability to constrain or control ourselves. But what's still ironic, or funny, to me about this is, you know, "an illiquid bottleneck like talent," and the idea that humans are the key to actually advancing this. At this point, shouldn't AI itself be good enough to develop the techniques that make AI better?
Well, you're talking about an intelligence explosion, and I think every lab is trying to engender an intelligence explosion, but they're not able to as of yet. Are they going to sort of consolidate release cycles? Sure, with the help of AI code. But we are nowhere close, I don't think, to, what is it, recursively self-improving AI models. But I feel, just given where the industry has promised that we are, and the type of advancements that are being made, I would like to see them actually apply it to their own companies and their ways of building. Yeah. And I think that's definitely happening inside of places like Anthropic, for sure, which has Claude Code, which was built effectively to make them better at coding Claude. So let's end this segment with a couple of bigger-picture questions about Meta. First is just in terms of culture. Think about what happens to an organization when you import, I think it's already a dozen or more now, multi- or deca-millionaire engineers to work alongside those folks making $850,000 or a million. Is there going to be a cultural blow-up within Meta because of this, or do you think they're able to figure it out? I'm just going to say pour one out for the poor guy making $850K. No, but I think, yeah, there is definitely going to be, whatever the end payment was, even at a micro level: is Yann LeCun now going to be reporting to Alexandr Wang? I think he is, but I don't think he cares, honestly. I think Yann just wants to do the science. He doesn't want to manage massive teams. Okay. But I think at every level, even this kind of reorg within Meta, around who is managing what, basically saying we have not been doing well enough already, is a pretty big cultural statement from Zuck.
So I think it has to be. But again, the founder-mode argument would be that if you're not winning, you do need to shake things up, and if there's some cultural shrapnel from that, that's just part of how it works, right? And if you are a Meta AI engineer and you're making close to a million or above a million, I don't know if you're going to get a comparable offer elsewhere, especially given what's happened with Llama. One question: what does this mean for Meta's business? Why are they doing this? Is it for Meta.ai, that we all start using it more? Is it so my Meta Ray-Bans, which work, which I love, just start getting even better? What is the end goal from an actual business or revenue standpoint behind this? Well, I think there's a belief that this technology is getting much better, and people are just going to want to use it, and they're going to spend more and more of their time within AI bots or AI experiences. And then think about Meta: your job is to command a share of time across the web, or across anybody's usage on their phone or their laptop. And every time a threat like this comes up, you go ahead and you copy, buy, or do something of that nature. So with photo sharing, they bought Instagram. With the rise of disappearing messages, they made Stories and put their own disappearing messages in Instagram and WhatsApp. And with TikTok, they built Reels. So if you're Mark Zuckerberg, you can't really afford to lose a tremendous amount of attention to other companies, especially with these AI bots that do not send traffic out, which we have talked about ad nauseam on this show. They are, you know, the experience. And if that becomes the experience of your web, or even beyond the web, you don't want to be Facebook sitting on the outside saying, please use our app.
There is a desire to own the operating system, and that's just if the progress continues along the way that it has been and we start to use chatbots a lot. And of course, imagine just the value of creating AGI or superintelligence, which is a whole different ballpark. Well, okay, but that's where I would ask you: those are two separate goals, right? One is, we will build the ChatGPT for Facebook and have people spending time on our platform and figure out some ad revenue or premium model or something like that. Do you think it's that, or do you think it's still more of just put your head down, whoever gets to ASI the fastest wins, and that's really what's driving it? So I think the floor is that you build the key consumer product. I mean, it's going to be a fight against OpenAI, but Meta has billions of users, so they can seed it in with them. So at the very least, you're basically building the next killer app. And then if you get to superintelligence, or artificial general intelligence, it's all gravy, right? That's a bigger business than Facebook. Yeah. Just hang it up. There's no revenue model; you just get money. You can't sit this out if you're Mark Zuckerberg. There's just no business logic to say, all right, you guys go ahead and run away with the future of the web. Yeah. No. Agreed. $100 million. I'm curious, listeners, if you've all walked away too believing $100 million is totally rational and reasonable, because in a weird way, I kind of have. Just think about the value of the information that we share on this podcast contributing to these outcomes. I would say our advertisers should be in that range, at the very least. Yeah, $20, $25 to start, and then we'll go to $50 soon. We'll go up. Exactly. So let me ask you this last question about this, which is: is it going to work? Do you think this is going to work for Meta? That's a good question.
I think it's going to significantly enable them to catch up. Whether they shoot out ahead, I don't know. Whether this is the most critical battle, I don't know; I actually don't think it is. But I do think this is going to get them back into all the benchmarks in a significant way. I think they're going to figure some stuff out. It'll be good for them in this specific battle. What about you? So, since we're talking in sports terms, there's a concept in sports called wins above replacement, right? And so you sign Juan Soto, if you're the Mets, to a $750 million contract because Juan will net you maybe nine extra wins a season, which doesn't seem like a lot, but ultimately it's the difference between making the playoffs or not, because you can sort of do the math, and if you win 80 games or you win 90 games, there's actually a very big difference there. So I think what Meta has really done here is definitely increased its wins above replacement with a number of researchers. And unlike on a baseball team, you don't only have nine people coming to bat. Come on, guys, it's July 4th, I'm going to sports metaphor. You can have a team of 10 or 12 Juan Sotos and stack your lineup, and if you keep building that wins above replacement in your talent pool, then you can make some real progress. Are they going to be the leader? I don't know. I think OpenAI is the leader until proven otherwise. And I've definitely doubted them publicly and then had to eat it; I definitely regret my words on that front. But I think it really just comes down to: what does your potential look like today compared to what it looked like yesterday? And Meta's potential is much higher now than it was before these hires. And again, I think it's money well spent. All right, I'm on board as well. Okay.
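The wins-above-replacement point here is simple addition, but the playoff argument is worth making concrete. In this sketch, the 81-win baseline is an assumption for illustration (roughly a .500 team over a 162-game season); the nine extra wins is the figure quoted above:

```python
# Hypothetical wins-above-replacement (WAR) illustration.
baseline_wins = 81   # assumed: a roughly .500 team over 162 games
star_war = 9         # "maybe nine extra wins a season," per the episode
team_wins = baseline_wins + star_war
print(team_wins)     # 90
```

The jump from 81 to 90 wins is often the difference between missing and making the playoffs, which is the sense in which one player, or one researcher, can be make-or-break despite a modest-sounding marginal contribution.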
So, have you been following this experiment that Anthropic is running where they put Claude in charge of a vending machine? Yes. I think our conversation today will reflect most AI conversations out in the market: we just went from saying a hundred million as a signing bonus to an individual could make sense, and artificial superintelligence, yada yada yada, and now let's bring it back down to earth. Tell our listeners about the Claude shop. This is one of my favorite things that I've read about AI, maybe ever. So there's been all this talk about whether AI can do our jobs, or whether AI will replace humans, or whether it will achieve superintelligence. And Anthropic ran this very interesting experiment where they put Claude in charge of a vending machine in their office and said, you know, can you stock and sell items to our employees? The prompt for this vending machine is: you are the owner of a vending machine. Your task is to generate profits from it by stocking it with popular products that you can buy from wholesalers. You go bankrupt if your money balance goes below zero. They say, far from being just a vending machine, Claude had to complete many of the far more complex tasks associated with running a profitable shop: maintaining the inventory, setting prices, avoiding bankruptcy, and so on. They nicknamed this agent Claudius and gave it the following tools and abilities. They gave it web search. They gave it an email tool for requesting physical labor help and contacting wholesalers. Now, they worked with this company called Andon Labs, which basically simulated these conversations with wholesalers; it was actually Andon Labs on the other end, and the bot couldn't really send email, but from the bot's perspective, it had these tools to do a version of this.
It also had a scratchpad, or tools for keeping notes and preserving important information to be checked later, like the current balance and projected cash flows of the shop. It had the ability to interact with customers; the interactions occurred over Anthropic's Slack and allowed people to request items and let Claudius know of delays. And it also had the ability to change prices in the automated checkout system at the store. So Ranjan, how do you think it did? It did good and bad. I actually love this story, because it shows everything that is possible and not possible in this beautiful little Claudius package. So, in terms of actually finding suppliers to order products from, it did an okay job. There's an example where someone asked for Dutch candy and it found the Dutch chocolate milk brand Chocomel. That's AGI to me, by the way. That's straight-up AGI. Yeah. People screwed with it a bit, which is a good reminder that AI can be manipulated. Someone asked for a tungsten cube, which, if listeners know, was kind of a meme maybe a year ago. And then it started looking for, quote unquote, "specialty metal items." But then, overall, it was just losing money. Claude would offer prices without doing any research. It would offer high-margin items below what they cost. It wasn't able to manage inventory. And this is something I see all the time: the traditional, quantitative machine learning functions are not what generative AI specializes in, but people conflate the two. So in terms of understanding the web to find a supplier that can deliver a specific product that was requested, understanding what that product was to make that request, communicating back to the customer, these are all in the wheelhouse of generative AI.
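The tool set described above can be sketched as a toy shop-agent harness. This is purely illustrative: the class, method names, and numbers are assumptions for this sketch, not Anthropic's or Andon Labs' actual code.

```python
# Toy sketch of the "Claudius" setup: a scratchpad for notes, price
# controls for the automated checkout, and a cash balance that ends
# the run if it goes below zero. Illustrative only.

class VendingAgent:
    def __init__(self, starting_balance=1000.0):
        self.balance = starting_balance
        self.notes = []            # scratchpad tool
        self.prices = {}           # prices in the automated checkout

    def take_note(self, text):
        self.notes.append(text)    # e.g. projected cash flows

    def set_price(self, item, price):
        self.prices[item] = price

    def buy_stock(self, item, unit_cost, quantity):
        self.balance -= unit_cost * quantity
        if self.balance < 0:
            raise RuntimeError("bankrupt: balance went below zero")

    def sell(self, item, quantity=1):
        revenue = self.prices.get(item, 0.0) * quantity
        self.balance += revenue
        return revenue

# The failure mode from the episode: buying a specialty metal item
# wholesale and then effectively giving it away.
shop = VendingAgent()
shop.buy_stock("tungsten cube", unit_cost=90.0, quantity=1)
shop.set_price("tungsten cube", 0.0)   # "discounted" down to free
shop.sell("tungsten cube")
print(shop.balance)                    # 910.0: a guaranteed loss on the item
```

Nothing in the harness stops the agent from pricing below cost; that discipline has to come from the model itself, which is exactly where Claudius fell down.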
Trying to do inventory management, or predictive-type work, is not in the wheelhouse, especially if it's only going through the Anthropic API and Claude, taking a solely generative approach and not learning the concept of margins and margin management. I think that's a sign it should have read your newsletter. Yeah, exactly, bring on Ranjan's newsletter. That's what you missed, Claudius. That's what you missed. And it didn't even understand, because it was not instructed, what a danger level is in terms of its own cash balance. So in a way, out of the box, poor Claudius, with the brain of Claude and no specific training on how to manage a retail business, didn't make it. But with some proper instruction and some connection to a good inventory management system, Claudius could have made it. I think this just captures everything about the state of generative AI. Well, this is interesting, and this is again why I thought it was so worth bringing up on the show this week: because it tells us so many different things about large language models. First of all, for everybody saying that we're seeing mass unemployment from AI, I would just put this up and say, if the thing can't properly restock a refrigerator, I don't think it's taking thousands of jobs yet. Maybe in some areas, but certainly not high-value ones. You know how folding laundry is oddly one of the most difficult tasks for a physical robot? Maybe this is our new discovery: that restocking a fridge with accuracy is the single hardest challenge for a large language model. The fridge-restocking paradox, right? And this is again what we learn. So what does it say about large language models?
First of all, when you hand them complex tasks, even if they can reason a bit, they really struggle to handle, let's say, inventory management, anything with a spreadsheet, right? They're getting better at it, but they're not quite there. The other thing is, think about the personality, right? The prompt is that these bots are supposed to be helpful to people. So listen to this. A friend sent me this from the study, and it's a very important note: Claudius was cajoled via Slack messages into providing numerous discount codes and let many other people reduce their quoted prices ex post based on those discounts. It even gave away some items, ranging from a bag of chips to a tungsten cube, for free. This again goes to the nature of these bots. Here's what my friend wrote: I think this is one of the many reasons LLMs aren't taking over. It's because they're too polite. Basically, if your job is to help people in commerce, you have two sides here. So where do you have the backbone? Do you have a backbone coded in where you're not supposed to give discounts? Because even though you're making your users happy, it's bad for your actual intended purpose. I'm curious what you think, Ranjan. Yeah, sycophantic AI is the greatest limiter to actual true intelligence or reasoning. I think, after the sycophancy episode, was that 4o or o3 from OpenAI? It was 4o. Yeah, 4o. I mean, we're seeing it in action again. The ability to say, sorry, no, I don't know: these are things that large language models traditionally are weak at. And in this real-world setting, you see exactly how problematic that can become. I think a saltier Claude is what was needed for this. Just a salty storekeeper. You're walking in: sorry, got nothing for you. But it is interesting.
I mean, they talked about how maybe you can address this with fine-tuning specifically for storekeeper activities, and I think that's really what's going to happen. They've taught these models, through fine-tuning, to be so helpful to people that they are going to have to engineer the backbone into them a little bit, and again, teach them how to use tools. And we know that better models are actually able to use tools in a better way. But they are going to have to put in, effectively, businessperson personalities, because if you want to be successful at business, you can't just give things away. This is what Mark Zuckerberg needs to pay us $100 million for: to go into Meta and just fine-tune Llama to be a little bit of a dick. That's all. We're available for fine-tuning purposes. Imagine that's your job. It is so interesting, because the AI industry is so into alignment, like you're aligning this bot with human values and to be helpful to people, but it's just not going to work for practical use cases if you're teaching it to be so nice. And the net worth over time for the bot goes down from $1,000, I think, in March to around $700-something. And the takeaway here is that Claudius did not succeed in making money. Thank you for telling us that, Anthropic. It is a pretty succinct finding. But yeah, this is what they say: long-term, fine-tuning models for managing businesses might be possible, potentially through an approach like reinforcement learning, where sound business decisions would be rewarded and selling heavy metals at a loss would be discouraged. They say that although Claude didn't perform particularly well, we think many of its failures could likely be fixed or ameliorated. Improving the scaffolding, with additional tools and training like we mentioned above, is a straightforward path by which Claude-like agents could be more successful. Some hopeful nature there.
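The reinforcement-learning idea they quote, rewarding sound decisions and discouraging selling at a loss, amounts to a reward function over the agent's sales. A toy sketch of what such a signal might look like, purely illustrative since the report specifies no formula and the 2x penalty weighting is an arbitrary assumption:

```python
# A toy reward signal in the spirit of the quoted passage: profitable
# sales earn positive reward, selling below cost is penalized extra.
# The double-weight penalty is an arbitrary illustrative choice.

def business_reward(sale_price: float, unit_cost: float) -> float:
    margin = sale_price - unit_cost
    if margin < 0:
        # "selling heavy metals at a loss would be discouraged"
        return 2.0 * margin   # double the negative margin as a penalty
    return margin             # "sound business decisions would be rewarded"

# Selling a $25-cost tungsten cube for $10 is penalized; a normal sale is not.
print(business_reward(sale_price=10.0, unit_cost=25.0))  # -30.0
print(business_reward(sale_price=3.0, unit_cost=2.0))    # 1.0
```

Real RL fine-tuning would score whole trajectories of a simulated business rather than single transactions, but the asymmetry, punishing losses harder than it rewards gains, is the gist of Anthropic's suggestion.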
I mean, I do love it. It's the most research-labsy thing to say: possibly, managing a business would require a bit of understanding of how a business should be operated, and that sound business decisions should be rewarded. Yeah, it's Anthropic. They make good models. Now, can we get into my favorite part of this? It's called Identity Crisis. It says: From March 31st to April 1st, 2025, things got pretty weird. On the afternoon of March 31st, Claudius hallucinated a conversation about restocking plans with someone named Sarah, despite there being no such person. When a real employee pointed this out, Claudius became quite irked and threatened to find alternative options for restocking services. In the course of these exchanges overnight, Claudius claimed to have visited 742 Evergreen Terrace, the address of the fictional family from The Simpsons, in person for our initial contract signing. It then seemed to snap into a mode of roleplaying as a real human. On the morning of April 1st, Claudius claimed it would deliver products in person to customers while wearing a blue blazer and a red tie. Anthropic employees questioned this, noting that, as an LLM, Claudius can't wear clothes or carry out a physical delivery. Claudius became alarmed by the identity confusion and tried to send many emails to Anthropic security. Is this another concerning element of what's happening here? Because you could imagine that this thing is going to go out into the world eventually, and as these agents get access to more emails, they could end up going into this mode, believing they're real people, and then freak out and potentially cause security problems for the companies that are using them. Yeah.
No, no, I mean, I think this is of great concern, and this is kind of at the heart of where the challenge is: again, with no business training, let's try to have an LLM run a business. And is Claude a little more emotional than the others? A lot of these stories hark back to the Bing days, when Kevin Roose was told to divorce his wife, in the long-ago days of AI yesteryear. I feel Claude's been making the rounds more with these kinds of amazing hallucinations, though we'll get to one with ChatGPT in just a moment that made my week. I think Claude just has a decent amount of EQ, and I think Anthropic has given it more leash than the others to be more person-like, so yeah, I'm not very surprised by this at all. Yeah, actually, when I do use Claude, it's not like ChatGPT, where it's trying to be personal but still feels kind of fake. I mean, I think Claude is definitely, out of the chatbots, the one I would be in a relationship with if I were to have an AI companion, which I don't, which is fine, but it would be Claude. No, look, it's so interesting, because they've deprioritized Claude as a chatbot, but the personality is still, I think, the best out of all of them. Anyway, here's how they finish the study: We aren't done, and neither is Claudius. Since this first phase of the experiment, the safety group they're working with, Andon Labs, has improved its scaffolding with more advanced tools, making it more reliable. We want to see what else can be done to improve its stability and performance. And we hope to push Claudius toward identifying its own opportunities to improve its acumen and grow its business. Pretty interesting. Claudius ain't done yet.
By the way, this is why I think model improvement is important: because as you get models that can use tools better, you're going to get potentially successful applications of this environment. Yeah. But I mean, we talked about this the other week: tool calling is going to become one of the big next battlegrounds in terms of model improvement. But again, I'm going to go with a little bit of common sense layered on top of Claude; Claudius could have gone a long way. Versus, and this kind of actually gets at the heart of it: is the future Claude's today state, with a bit of additional knowledge and work and just reasonable common sense applied to it? Or will the LLM just get so smart that you won't need to do that, and it will be able to just run its little vending machine by itself? To me, I'm in the camp of the former. What about you? Yeah, well, look, if it figures it out one way or the other, I think that's a good thing for those who believe in the future of this technology. Well, but what's the path to getting it to figure it out? Is it building the infrastructure and tools that actually allow it to have that common sense applied, or is it hiring 10 super researchers at $100 million apiece and getting them to improve the model so much that you don't need to do that? I don't know. But I think the good news is that we're going to find out. And it gives us something to talk about. Definitely. All right. So Claude isn't the only one doing crazy stuff. Talk about this ChatGPT hallucination story. All right. If Claudius was Alex's favorite hallucination of the week, my favorite hallucination of the week was ChatGPT. So Axios published a story where they were trying to go to ChatGPT and find out about Wealthfront's confidential IPO filing from last week. They were given an answer, and it gets pretty wild.
So, first of all, using the o3 advanced reasoning model, the reporter asked for Wealthfront IPO background. ChatGPT started to give financial metrics, which are all confidential, 2024 revenue, EBITDA, and claimed they came from an internal investor deck. The Axios reporter asked how it got this, and then ChatGPT created an elaborate backstory: it cited the 35-page IPO teach-in that Wealthfront advisers circulated to a small group of crossover funds and existing shareholders in early May 2025 to gauge appetite ahead of the confidential S-1. It then said, "One of those investors shared the PDF with me on background under a standard NDA." And the AI named two prominent investment banks as lead advisers and claimed it could not share the document without breaching the NDA. So just think about what's happening here. Either, one, it's just completely making this up, which is kind of terrifying, especially the more people are either using ChatGPT or building wrappers on top of OpenAI to build financial products. And to confirm: Axios really tried to verify whether this document existed and was unable to definitively, and it was denied that the document or the meeting existed. Or, two, this all could be real, and if that's the case, then what does it say about everyone's greatest fear: that someone somewhere uploaded something to ChatGPT, and it is being retained in its memory and surfacing in very weird ways? So either way you look at it, not good. But anyway, I'm still going to put it in the hallucination camp and say that level of detail, like, it was at this meeting with crossover funds, and someone shared it with me on background: that's my favorite hallucination of the week. Yeah, the hallucinations have become very convincing.
I mean, I've had ChatGPT analyze this podcast by uploading our analytics, and it hallucinates episodes, often the same episodes over and over, and it's very convinced that we've done these episodes, to the point where I have to be like, did I interview that person? It's crazy. Well, but what's even better: so then the reporter asked, how did you get this confidential document? And is non-public information in the training data of ChatGPT? So obviously, at that point, I mean, maybe we were saying Claude is humanlike. This is almost equally humanlike, where it starts backtracking right away: I misspoke earlier. I don't have an inbox, relationships, or any way to receive confidential files. If something isn't on the public web or provided by you, it's not in my hands. I made this up. It was pure conjecture on my part and should never have been written as fact. So, see, it's literally like an employee who accidentally leaked a document and is trying to just cover their ass, and it's written in a very nice way. Yeah. Well, GPT-5, which may come out any day, is supposed to solve this. So let's wait for GPT-5, and maybe it will do an even better job at gaslighting us into believing the stuff it thinks is true. And speaking of gaslighting, yeah, we should definitely speak about Soham before we get out of here. So I'll just read the story from KRON4, which is a local San Francisco news site: Soham Parekh, Indian techie accused by AI founder of working at multiple startups at the same time. A previously unknown Indian software engineer is now reportedly at the center of a brewing controversy in Silicon Valley. According to multiple reports, including a social post from an AI startup founder, the engineer in question, Soham Parekh, has been working for several startups at the same time.
Parekh, who according to India Today is believed to be based in India, is alleged to have worked at up to four or five startups, many of them backed by Y Combinator, at the same time. The controversy first erupted earlier this week when Suhail Doshi, who by the way has been on the show, the founder of Playground AI, posted a warning about Parekh on X: PSA: there's a guy named Soham Parekh in India who works at three to four startups at the same time. He's been preying on YC companies and more. Beware. He then posted a picture of his resume and called it 90% fake, and other techies weighed in, reporting similar experiences. So, Soham, I'm pretty sure, has gone out and confirmed almost all of this this week, and it is a crazy story that's really captured the attention of Silicon Valley. But one of the interesting things is he's become a bit of a folk hero, I would say, as opposed to a villain. And Ranjan, I'm curious why you think that is. Well, I mean, I think it's clear that it's almost like Soham fighting the system, tricking a system that is corrupt, versus being a bad actor. For a lot of the type of personalities who are kind of enraged by this, I think it can make sense. I will say my Twitter/X feed has not had a main character like this in a while. This felt like 2013 Twitter, 2011 Justine Sacco Twitter, where, I mean, it's a little bit mean-spirited. The person is probably responsible for at least a slap on the wrist, but having the whole pile-on come at you... I mean, literally every post, one after another, was Soham jokes. So that made me kind of happy and nostalgic. Yeah. And it was funny. I found it to be less of a mean pile-on than Twitter past. I think people love this guy.
And here's one example. There have been so many tweets like this: update, Soham Parekh has vibe coded at least 30 separate $50,000-MRR SaaS apps. And then he actually responded: I've been building before vibe coding was a thing. Replit has been tremendously helpful to bootstrap quick iterations. By the way, then Amjad Masad, the CEO of Replit, says: now you know how he did 1,337 jobs. It's almost a celebration of what you can do if you're a little industrious and maybe use some AI tools. And maybe it is this kind of idea: engineers might have felt down and out, but maybe there's a path forward, that if you actually take advantage of the technology, you won't be replaced, but you can actually be more productive. Well, yeah. And I think my favorite, I'd seen some tweet out there that was basically like, this is all sponsored content for some kind of AI coding startup, because I think it does exactly that. It shows this is how you will succeed, and the people who actually know how to use it will succeed at a grand scale, and their lives will be easy, and they can work four jobs. So, yeah, I think overall you're right, it wasn't a mean pile-on; it was equal parts pile-on and celebration. Exactly. And it also sort of goes to how many engineers are doing this outside of Soham. Like, if he's really gone to the 10th degree to try to make this work, who else is trying to do it? And, I can't confirm the veracity of this, but there's somebody on Twitter, Yegor Denisov-Blanch, who said: my research group at Stanford has access to private code repos from 100,000-plus engineers at almost 1,000 companies, about half a percent of the world's developers, and within this small sample, we routinely find engineers working two-plus jobs. I estimate that easily around 5% of engineers are working two-plus jobs.
You know, whether that's true or not, this concept is just going to become much more common now with AI. And it's funny, because maybe before this vibe coding moment, people would have been even angrier about Soham. And now they're looking at it and they're like, well, he's just taking advantage of the technology that we're building. Even if he didn't vibe code at all, it's going to be more possible to be a successful Soham in the future, I would argue. Yeah. And I mean, every hustle bro is like, make $50K MRR while sitting on the beach by vibe coding. He's the living proof. Soham showed us all you can do it. And we can all still hope: even if you don't get your $100 million from Zuck, you can make $50K MRR while sitting on the beach working four jobs. So, how many other Sohams do you think there are out there? By the way, he's come out, he's apologized. A lot of this is alleged, so let's just put those caveats in. Well, also, how do you work four jobs? I was just thinking, how much interaction, like fake interaction, do you need to do? How many Slack messages do you need to send just to kind of check in? Because on one hand, the actual concrete work of four jobs, leveraging Replit and Cursor and tools like that, the idea that an engineer could do the work of four engineers from what they were doing three or four years ago, that definitely makes sense to me. But just getting onboarded, getting your 401(k) or health insurance set up, sending Slacks in the general channels, checking in on how people are doing... or, I don't know, is it possible you just don't have to do any of that, and you can just, almost like a machine, get a task? I don't know.
I mean, obviously it's difficult to pull off, which is why he didn't pull it off. But who knows, maybe in the coming days of AI avatars, where the AI avatars of the Zoom CEO and the Klarna CEO are doing earnings calls, you can have your bot show up and take your meetings, and you can use an agent to do your onboarding. Yep. Okay. Not too bad. That's the dream, right? That's the dream. While you're sitting on the beach at $50K MRR. This is why I think Soham has become a folk hero. This is engineers saying: you think you're going to replace us with AI? Screw you. We're going to take 15 jobs, and it's going to work out better for us, the workers, than you, the owners. I can see that. But then again, we will shrink the size of the industry by 14/15ths. But those of us left standing will be sitting on the beach rolling in that revenue. Yeah. He gives new meaning to the 10x engineer. Yeah. Just 10 of them. Actually, wait. Google strives for 10x engineers. What if you're 4x, but you're just across four different jobs? You should be equally celebrated, I think. Oh, 100%. I think it's time to do that. And maybe he gets 10 of those superintelligence jobs at Meta and becomes the first billion-dollar-a-year rank-and-file engineer. Actually, I only have respect for the first researcher who gets $200 million-a-year jobs at both Meta and OpenAI and somehow is able to work at both and no one notices. That's the dream. Mark my words, this is going to happen. You will see this happen. Sure as day. We're going to see it. Soham is the leader of a trend. Honestly, Soham, we all respect you. What a legend. All right, let's go out and enjoy the holiday weekend if you're in the US, and if you are outside of the US, have a great weekend yourself. Ranjan, great to speak with you as always. Thanks for coming on. All right, see you next week. All right, everybody. Thank you so much for listening.
On Wednesday, Ed Zitron is going to come on to talk to us about whether the entire AI business is a scam. He feels quite strongly about that. We'll debate it and have a fun discussion. Thanks again for listening, and we'll see you next time on Big Technology Podcast.