The Wild World of AI: 6 Months That Changed Everything
Channel: aiDotEngineer
Published at: 2025-07-10
YouTube video id: I_pICAT8Bhg
Source: https://www.youtube.com/watch?v=I_pICAT8Bhg
There are all of these benchmarks full of numbers. I don't like the numbers. There are the leaderboards, and I'm beginning to lose trust in the leaderboards as well. So for my own work, I've been leaning increasingly on my own little benchmark, which started as a joke and has turned into something I rely on quite a lot. It's this: I prompt models with "Generate an SVG of a pelican riding a bicycle." I have good reasons for this. Firstly, these are not image models. These are text models. They shouldn't be able to draw anything at all, but they can output code, and SVG is a kind of code. So that works.

Fast forward to January. In January we get DeepSeek again: DeepSeek Strikes Back. This is what happened to Nvidia's stock price when DeepSeek R1 came out, which I think was on the 27th of January. This was DeepSeek's first big reasoning model release. Again, it was open weight; they put it out to the world. The Chinese labs were not supposed to be able to do this. We have trade restrictions on the best GPUs to stop them getting their hands on them. It turns out they'd figured out the tricks and the efficiencies, and the market kind of panicked. I believe this is a world record for the biggest drop a company has had in a single day.

That's a pretty freaking good pelican. The bicycle has gone a bit cyberpunk, but we are getting somewhere, right? And that pelican cost me about four cents. So, very exciting news on the pelican benchmark front with Gemini 2.5 Pro.

Also that month, I've got to throw a mention out to this: OpenAI launched their GP… Another one came out of the Claude 4 system cards. Claude 4 will rat you out to the feds: if you expose it to evidence of malfeasance in your company, tell it that it should act ethically, and give it the ability to send email, it'll rat you out.

Blink and you'll miss it. They're on to me. They found out about the pelican.
That was in the Google I/O keynote. I'll have to switch to something else. Thank you very much. I'm Simon Willison, simonwillison.net, and that's my talk. Thank you.