Predicting AI’s Next Advances — With Suhail Doshi
Channel: Alex Kantrowitz
Published at: 2024-03-20
YouTube video id: 11a5dqvGOvc
Source: https://www.youtube.com/watch?v=11a5dqvGOvc
A leading AI CEO and entrepreneur joins us to talk about the state of the field — the research, products, competition, and where this is all heading. All that and more coming up right after this.

Welcome to Big Technology Podcast, a show for cool-headed, nuanced conversation of the tech world and beyond. Today we have a great guest for you, someone I've been trying to bring on for months, and I'm very excited to have him here. Suhail Doshi is here. He's the CEO and founder of Playground, which is an AI image generation and editing software company, and somebody that I've been following on Twitter pretty religiously to get a sense as to where this is all heading. So I'm thrilled he's here. Suhail, welcome to the show.

Yeah, thanks for having me, Alex.

Thanks for being here. I always end up learning about the latest breakthroughs through your Twitter account. You're definitely on it, you're talking with the right people, you have a sense as to where this is going. So just to start this conversation on that theme: where are we right now in terms of the curve? Are we at the early part? Are we kind of plateauing here? And I'm also curious to hear — just kind of a two-parter — where do you think the business cases for this stuff are going to land? Because I think that's still kind of an open question. I mean, you run an image generation and editing company, so that's something I'm sure you think about, because you have to figure out who you're going to sell to. So yeah, let's just start real broad to begin with.

Yeah. In terms of where we are — gosh, I'm always surprised. I had this tweet last year that there were AI breakthroughs every single week, and eventually I got to the point where, I think like Elon Musk tweeted, it was every single day. And the interesting thing is that it still feels like that's happening, right? I try to follow just the right people, researchers. I trawl through this place where all the research
papers get uploaded, called arXiv, and I just read through things that are interesting, things that are kind of taking off. Sometimes these things are good demos and they're nothing more than a good demo, but sometimes they are truly breakthroughs. So right now the pace continues to seem relentless. I'm often surprised by what new thing happens, and what's interesting is sometimes a breakthrough will happen in a week and it's only a couple of days later that something beats that thing in terms of performance or capability or some sort of surprise.

But if I were to take a snapshot of roughly where we are right now: it has been over a year since GPT-4 has basically been out. I view it as a little bit over a year because I was a lucky person and I had access to GPT-4 around October, and I think it came out around February. And I think it's interesting to point at that milestone because we don't know what has been happening internally at a company like OpenAI over the last year, and they're definitely training a new model. So I had this thought today that was kind of like: we don't really know how far behind everybody else is in language, because we have no idea what OpenAI or Anthropic or some other company is training internally. We have markers for things like Gemini from Google, or we have markers from Mistral, but we really don't know how far behind they are. We only know where they are matching compared to last year, or the year before that in October. That's in language.

And images — images is interesting because it's probably a couple of years behind a true GPT-4 moment, right? And now audio is starting to happen, with a company called Suno that I actually tried out this weekend. I'm a producer, so I was making songs, so I was trying that out — I have some weird thoughts about that. And then I think the last area is that the companies doing 3D are just starting
to get started. There's a friend of mine who's starting a 3D foundation model to do, like, Pixar-level type of creation. I don't know if I can name them yet, so I'll probably avoid doing that for now, but the fact that that's happening is —

Yeah, and video.

Yeah, there's video. Now we're getting, like, minute-long sequences that don't have artifacts — they're sort of more coherent, with the right character consistency. We are at the very beginning, I think, still.

I mean, what is the North Star for this stuff? So I'm trying to think about a chatbot, or a GPT model, right? It's already pretty good — it does a great job of synthesizing information and spitting stuff back. Where does it end? Because you mentioned, okay, they're working on a new model. Well, how's that new model going to be an improvement on what we have, and then where do we end up getting to if this keeps on getting better and better?

I mean, Sam Altman's response would be, like, AGI, right? I think there's another version of his belief, which is — one time when we were talking with Sam, just kind of at a dinner party, I think he said this thing, that he believed that everybody would just have, like, a thousand employees. And I think we all thought he was crazy. Still do, by the way. He may prove to be right. But I think chatbots are just, like, a very basic, primitive thing that we'll end up getting. My general feeling is — I was talking to someone at OpenAI who was working on the robotics team there back in the day, and I was starting to get into AI robotics a little bit. I was kind of curious where things were. In general, to summarize kind of where the field seems to be — I'm not an expert, but I've talked to enough people that are —
broadly, robotics kind of asymptoted and hit a ceiling about three or four years ago, and the research still isn't on a trajectory that's amazing. But the reason why I'm bringing up robotics — because I want to answer your question about where I think things are headed — is that there was a belief, for a little while, that someone at OpenAI at least had (I can't say that it was everybody there): that the ceiling in robotics was in part because maybe the solution to solving it was actually through large language models first. Maybe if we could find a model that could reason to the extent of language, then maybe that could help the robots navigate through some of the toughest problems that they're having trouble with. So the sequence is sort of: first make language great; then the second thing we start to see is image models and graphics becoming great; and then the next piece is, now we're starting to get a sense of multimodal — vision plus language plus maybe audio. Can we make a very powerful multimodal model? And if we can do that, maybe those will surface and cause many breakthroughs, one of which could be in robotics — which means that you'd have a robot, not a Roomba, but a robot that maybe embodies us. There are a lot of humanoid startups right now, and by the way, the reason why the humanoid startups are building humanoids and not different-looking robots is because we know that humans are already able to generalize to lots of human-related things, right? We know that if we look like a human, then we can hit a printer button and carry a box and do all these different activities. So I think that if the models get more powerful, we're probably going to see some kind of Westworld version of the world. We're going to go way beyond a chatbot.

And what about reasoning? Because
OpenAI had this Q* thing that people were talking about, and there's another company that says they've been able to reason. Is adding reasoning into large language models another new bound, or is that just a way to get us to this reality that you're talking about?

Man, reasoning in AI is kind of this really big philosophical word, I think, amongst researchers. You work with a research team, you talk to researchers — reasoning is really tough for anyone to prove whether it's actually happening or not. Because the big question is: is it just spitting back the next word, or is it really able to work through problems on its own?

Which — I mean, how do we know you're reasoning, or I'm reasoning, right now? How do we know we're not doing that? How do we know we're not just reacting to our surroundings and predicting the next token? We don't really know.

Interesting. But then why do you think there are AI companies working on this problem? I mean, it's not an abstract thing — there are actual research programs and progress being made on this question itself. I think the Q* example, from what I've read, is that it basically can solve complex math problems on its own. So that requires being able to conceptualize and reason through a problem — it could do novel problems — as opposed to taking what you've seen before and spitting out something that looks like an answer to a math problem.

Yeah. I mean, just to dive into the philosophy of reasoning — just the tip of the iceberg — I think that just because something is able to articulate its reason for getting to an answer doesn't mean that it is necessarily reasoning. Because it's possible that its training data — you know, it's
just really tightly fit to its training data, and it just happens that the next token ends up being step one, and then step two, and step three. We don't really know. But I think one way that you could maybe litmus-test reasoning is if you gave the model something truly out of distribution. An example of something truly out of distribution that humans faced was COVID — a pandemic that we had not yet seen before — and then we had to reason about how we would deal with that kind of odd current event. The question would be: if you gave a model something that it was truly not trained on — if you could prove that it was not trained on it — and you asked it for a solution, could it really figure out the right solution? We might find that really difficult.

Yeah, and humans don't necessarily do that either, right?

Well, it's interesting, because when people talk about artificial general intelligence, it's like, well, who's your baseline?

Right, yeah. I don't know — we'll see. I mean, AI can definitely exceed humans in some areas, and in others it can't. So anyway, it'll be something that we'll all be talking about for a while.

Obviously, image generation is something that requires some understanding of the world. You're doing it at Playground, right? The model will understand — let's say you say, show me a monkey sitting on a beach ball — it will understand that there's some physics in the world and the monkey has to sit on top of the beach ball. So I'd love to hear your perspective on the state of image generation right now. You have an update that you're releasing, or will have released by the time this goes live. Obviously it's an exciting time, but you're also coming up against some very big companies that are trying to do this as well. Midjourney's been at it for a while. DALL-E 3 is pretty impressive — I use it
through Copilot from Microsoft. Google, of course, has tried it, but they've had some problems there. So talk a little bit about it, and also I'd love to hear the business case here. Because for LLMs, that's one thing — you can say, all right, it'll read contracts, understand them, help spit stuff back. But for images: is it that it will replace design, that it will democratize design and make it available to everyone else? I'm curious to hear your perspective on why that's a problem to work on.

Right, yeah. Images are definitely behind in terms of overall capability and utility relative to language. I think at the end of the day, all these models, for the time being, have their kind of narrow, somewhat limited utility, right? There are a lot of things posted on Twitter about how people are using language models, but the predominant use case continues to be homework, right? And then there's kind of this other one, which is coding. For images, it just turns out that the predominant use case is making art. It's very surprising, but it just turns out that millions of people are very excited to make art. And it's not art that you're going to necessarily always put on your wall, but it's art that could be used in marketing — maybe you post it on your Instagram, or maybe you make an icon and use it as an icon for your app, or maybe it's a YouTube thumbnail, or it's an image that you put in a blog post, or maybe it's just a fun meme that you send a friend. That's the state of images right now — it's really interesting, imaginative art — but I think it hasn't quite gotten to the utility of language. And I think there are a number of things that are probably coming for graphics, but I do think it's going to be about democratizing graphics for people. I mean, our company is trying to help people make graphics like a pro without being one. You shouldn't have to — if
you ever open up Photoshop, there's a dizzying amount of menus, right? There are all these icons. You have to go to YouTube and do a really sophisticated tutorial to be good at Illustrator or Photoshop or Lightroom.

I had to take classes on both of those — Photoshop and Illustrator, semester-long classes — to be able to do that stuff.

Right, yeah. There was a summer where I grinded just making logos, and then I would upload them to SitePoint and try to win logo contests to get better at my own skills, back in high school when I was a lot younger. And it doesn't have to be that way anymore, right? So I think the first thing that's going to happen is that a lot of graphics capability is going to be in the model. Maybe, you know, if you have wedding pictures and you wish you could color-grade them somehow — maybe you'd use Lightroom or something for that — I think an image model will be very good at that.

I know we have to move on, but graphic designers — do they become, like, an extinct profession, or where do they go? Because you can not only create images within Playground, you can edit them.

No. I mean, look: Walt Disney started as a person who drew pictures, and then he worked somewhere where he animated them, and then that evolved, and eventually we got to 2D cartoon movies like Snow White and things like that, right? And then things like Pixar came about, and we built, like, a 3D rendering engine. So did all the people who drew the 2D cartoons lose their jobs — was that the end of an era? Definitely not. People retooled. The stories that came from them were still really material to their creative process. Story matters more for a company like Disney or Pixar than the animation itself. So I think in this case the graphics matter, but I think that people will retool. The question is: are we giving people enough time to retool?

Yeah. So if
you're doing one-sheets — like, I used to work in marketing before I went into reporting, and we used to do one-sheets — and if you're doing one-sheets, right, like getting the headline image that gets passed off as a piece of marketing collateral, that seems like you might want to try to invest in some new skills.

Yeah. I also think that there's something beautiful about the person who is creating the thing. Let's say you're doing a piece of writing, or maybe you're a music artist and you made a song. I think there's something really beautiful about that person being able to connect their art — their graphics — as closely as possible with the other kind of art that they're making. Like, if I'm a music artist, I want to be able to choose the exact album art. I don't want to have to always outsource that to somebody who may not really understand what I want.

Yeah, it totally resonates, because I've been using image generation for Big Technology, and I'm a one-person shop — I couldn't afford to do graphic design for every single story. I mean, I was hardly making it work with, like, whatever iStock photo, and every now and again I would pay for the image. But now we get almost perfect illustrations every time, for every story, because this technology has made that possible.

Right — you're able to marry the creative process of your podcast with the graphics and the thing you want to show people, and only you. You can go through as many iterations as you want to find the perfect thing that you think is the right mapping. I have actually talked to people — artists, people who draw — who hate this stuff. A year or two ago I basically got almost canceled on Reddit and Twitter and everywhere, back when people hated the idea that you would even say AI art is art. And so one of the things I decided to do is be
really curious, and I said, let me go talk to some of these people who are basically sending me death threats on Twitter or something like that. And some of these people love drawing. It doesn't matter that you offer them a better tool — they love the idea of picking up a pencil and drawing. And so for those people, certainly, that's one way of making art, and some people will treasure that and enjoy that.

But that would be taking something that they enjoy —

Well, they still can do it. They can still enjoy it. It just might mean that it kind of doesn't evolve, perhaps, with the times. So there will be some people — I think it does matter to think a little bit about how fast the technology is moving and how people will deal with that.

Right, definitely. Okay, sorry, I didn't mean to break your momentum.

No, no worries. Yeah, so I think those things will be possible. I think the utility of graphics is going to increase significantly, though, in the next year. I think we haven't really thought through editing, for example. A lot of this stuff is generating synthetic images, but we haven't really thought about: what if I have an image and I want to add my dog, who's not in the image, for my holiday card? Could I take my dog from a different image and just insert it, where it gets all the lighting and the shadows and the color and all the ambiance correct? What if I want to do fantastical things — what if I want to see what you would look like if you were the Incredible Hulk, big and green, but it really had your face, and you thought that face was your face?

Doesn't take much imagination, to tell you the truth.

Okay, yeah. What if I want to make a logo? Logos are really hard. I remember having to pay — I have paid people $50,000 to hand-make a bunch of logos that I want to use to brand my product. Sometimes you
can only get five of them, though. Why can't I get a hundred of them? I think graphics is on this never-ending cycle where we never feel like it's good enough. You can think a little bit about PlayStation 1, and then PlayStation 2, and then 3, 4, 5, right? Grand Theft Auto — from the first Grand Theft Auto to what maybe five looks like now. We can see that graphics is still improving, you know, 30 years later. So I think giving people tools where they can do incredible feats of graphics is going to be really exciting.

But I think graphics is only a subfield of a bigger plan that at least our company has — I don't know if there are other companies that care to do this — but our company cares about creating a unified vision model, where we can create and edit and understand anything with pixels: a single unified model. This is missing in vision but definitely kind of exists in language. In language we can solve hundreds or thousands of different tasks, but in vision it's all separated. It's kind of like where language was three or four years ago, where there would be a model that could summarize, and a model that could do sentiment analysis, and a model that could do these different little tasks, but there wasn't a unified single large language model. There's no equivalent for vision. What is a large vision model? We don't really have a term for that. So my feeling is that vision as a field is going to significantly expand. Why can't a robot look at images and navigate the world, like a self-driving car? That's one thing. Why can't we understand images, or what's going on, better? We've seen early glimpses of that. I think there's a famous picture of Barack Obama stepping on a scale, and the model knows that Barack Obama is trying to make the scale read heavier — it's a joke. But can models really
understand what's going on in these images at a much deeper level? So there are large vision models that are starting to incorporate language and images, but there's no real all-encompassing multitask vision model.

So I have a couple of questions for you on this. First of all, on the vision part: does that play into — a lot of people have been talking about how one of the biggest applications of this current generation of AI is going to be in augmented reality, right? Like, Meta has those glasses where you don't have an overlay right now, but you can talk to their AI bot, and it will look at the world and give you a sense as to what you're looking at, or you can even just ask questions about things and it will talk to you. So I'm curious how seriously you take this new era of augmented reality that we seem to be heading into. Because, speaking of one of your tweets, you wrote: "it's going to be hard to beat a computer in your pocket you can use inconspicuously when you need to." So it sounds like you're a believer that the phone is going to be the way that we're going to interact with computing for a while — but maybe there's something I'm missing.

Yeah, I mean, I think the phone is a really good form factor for computing. I've talked to lots of different friends who've tried the Vision Pro and such, and it seems like that's still kind of early in terms of its use cases and its utilities. So we'll see what happens over the next year or two. I tend to be more optimistic no matter what, because you never know about these things. I think one thing that Meta is doing, regardless of where VR is headed, or AR is headed, is they have one of the most world-class teams for graphics — and they have to, because of all the stuff they're doing in VR. But yeah, I think it's kind of unclear what the right form
factor is. Is it on your face? Is it somewhere else? Is it a thinner video screen? I'm not sure. But one thing I do feel pretty confident in is that we will care a lot about being able to use AI to manipulate graphics regardless of the form factor. I'm somewhat form-factor agnostic: is it a TV, is it a watch, is it glasses, is it some new thing? I don't know. But it seems very likely that we're going to care. An example would be: I wish I could just go into a store, stand in front of a mirror, and then just sort of swipe for, like, a jacket that I'm wearing. I went this weekend to the de Young, and it had this sort of San Francisco fashion exhibit, and there was this powered-by-Snapchat thing where it put a dress on you — so I was in a dress at the exhibit. It was really cool, and it's very obvious that this thing could be higher fidelity, right? That was a really cool AR experience. But why can't I have that for jeans, a jacket, or anything I want to wear, without having to try it on? So I think those kinds of experiences seem inevitable regardless of the form factor.

Yeah. And then, talking about the limitations of image generation models today: it just seems like they all end up generating images that look so similar. When I said before that I generate the perfect image for each story — conceptually, yeah, but you can still tell that it's been generated by an AI image model and not a graphic designer. So I'm curious: why do so many of these AI-generated images, from your perspective, look so similar? Is it because they're using the same underlying technology, the same training set? Is it just that they're not high quality enough that people can pick them out? What do you think?

Yeah, I mean, language models have this problem too, right? The way we know this is that language models are kind of overly verbose, right? They talk a lot. So that's kind of the little tell for language models. For images, the tells are a little bit different: maybe they have overly crazy bokeh, or they are super lush in ways that you don't need them to be lush. But I think with images, what's happening is that maybe the models are a little bit too curated. It's in its infancy, but I think the models are probably too curated, and maybe overfit to human preference — and human preference is some kind of average of human preference. In art, there's art that we like in the modern time, and then there's kind of avant-garde art, and maybe you prefer that, right? You want something more ostentatious, or maybe you want something more minimal and laid-back. And I think what we've kind of discovered is that there are just huge, wide varieties of preference, and then there's the average. So I think with image models it's somewhat twofold: one problem is that we're not catering to people's personalized preferences and styles — or the niches. And the other is that quality is the lowest it's ever going to be, starting today. The quality is going to get incredibly good.

It's also interesting because, having worked at publications where we did have graphic artists, one of the interesting things was: you'd write a story, and that artist would take it back and, based on their own style, end up creating an image. And I loved doing this, because I was always surprised by what they came up with — they would do it through their own lens and focus. But what AI does, I think, is it tends to take everything into account and spit back the average, right? Like a kind of word-average image,
and that's where I sort of say: sometimes I'll be surprised by what an AI image generation engine will create, but oftentimes it's like, yeah, that sounds right, or that's close enough, let's put it on top of the story.

Right, yeah. I mean, it's interesting — human graphic designers are also kind of overfit, right? They have their own particular style. Anytime I've reached out to a graphic designer, sometimes they'll say, hey, why did you reach out to me? What did you like that I did? Or an interior designer, or whoever. So they all have their style. These human graphic designers are somewhat less robust designers, in some sense — they are very skewed to something, and then you pick them. And that's cool, because then they can lead to brandable things. The models, if you prompt them simply, will give you an average style. At this point you will probably get something that is beautiful, but it might not be stretched — it might not be headed in a stylistic direction — and because everybody uses it, it feels kind of fatigued. You know what I call it? Internally we call this "pop." It's like pop art, right? Like pop music, Top 40 music. There's, like, Skrillex, with growls and noises and sounds, and then there's pop music — there's Justin Bieber, that kind of thing. What you're getting from the image models is pop, and people love pop — we know that, we love Top 40 — but it's hard to market pop all the time, because it gets tiring. So yeah, in this case the models are definitely capable; it's just that you have to have this perfect alchemy of figuring out the right prompt — prompt engineering.

That's where it gets interesting, then. Because let's say, okay, I like art in the style of a specific contemporary graphic designer, and let's say
the model's trained on that art, and I say, all right, create an image of a robot playing tennis in the style of person Y. Sure — but then you get into some really tricky questions, because we have to figure out a way to either compensate these people, or it really becomes some sort of intellectual property theft. So I'm curious — you're running a company that does image generation — how do you think about this?

Oh yeah, super interesting issue, and really complex. You know, these days we're not really seeing customers wholesale be like, "I want it in this one person's name," which is good — I think it's a good thing. Greg Rutkowski is someone that I think about in this case, because a lot of people add that person's name. He makes really amazing fantasy art that's on DeviantArt — maybe he's helped out some video game studios and stuff, I'm not sure — but his name is kind of quintessential in this debate. And it's important to understand the reason why people are doing it. People aren't doing it because they're trying to copy Greg Rutkowski; they're doing it as a shortcut to get somewhere. Because if you take an image from Rutkowski, it's not easy to articulate — in fact, there's a reason why there's a phrase, "a picture speaks a thousand words." It's very hard to describe his art completely. We can come up with some words, but it's a vibe, it's a style. And so people are using it as a shortcut. And there are other people, like H.R. Giger, who does kind of more eerie, horror-type stuff. I've learned about a lot of artists because of how people are prompting — there are a lot. There's no easy way around this, but I think this thing is going to go away in the next year — this year, I mean. We're working on something — I can't talk about it exactly right now — but the idea is that users are doing this because it's a
shortcut to get to a very difficult to describe Style and what they really want is to say I like this like I want to reference this I want to reference actually want to reference five of these different things and get to this get to an image because actually with graphics and a lot of a lot of images and art what's happening is like it's like remixing a lot of things like even I I because I make music I can kind of relate to it because it's sort of like you know if your inspiration is Kanye West and then your other inspiration is you know Dr Dre and then your other inspiration is um you know six who produces music for logic and you want to combine like the drum rhythms of this person and the instrumentals of this person and the lyrics of this person did you copy them I mean all of these people were inspired by people and so I think in this case people are just feeling inspired but they're using a shortcut so the question is how do we get them away from just copying actual gregorowski because that that's definitely the wrong thing we definitely don't want that in the world nobody should be copying Kanye uh you know wholesale that's bad too right but you just kind of it's difficult to eliminate the prompt completely like let's say you did have audio generation you know and you could say write me a song about you know I don't know my girlfriend in the you know style of Kanye West yeah I don't know why you want to do that but you could and you know you sort of get into those issues you do yeah you you definitely do but I don't think I don't think that's people's true intent let me ask you this do you think that the artist that are whose work is being trained on should be compensated I think we need to find some solution for them yeah you know we we do a small you know it's not clear like you know every time you generate an image they get like you know some Spotify streaming payment you know 100 1,000 of a penny I don't think anyone's going to be happy in that 
circumstance right but you know we we try to do something small small we don't we don't think this is like solution we don't think this is enough of a remedy per se by any means as a stretch of imagination but one thing we do that nobody seems to else to do is uh we actually link back to a lot of these artists yeah when an image gets generated we say additional credit gregorowski and it links directly back to his Deviant Art page so that people can find him learn about him pay him donate to him whatever they want right we even we even link back to like Wikipedia artists that you know no longer or artists that are on Wikipedia they're not living just so people understand what they're doing um yeah I think that's a good start okay so let me let me ask you this um you're like you're you're doing you're doing a an image generation startup you're very focused like you'll tweet often about how it's so so important to stay focused and I do think there's something to be said for that because there's so many other companies that are just kind of going all over the place um do you regret not doing video though because what we've seen out of open AI Sora and others is just kind of you know jaw-dropping it's pretty amazing so is that something that now you think you should have done I mean it's too soon to tell You' have to ask me in a year to find out if I regret it right um right now you know I don't have any regret um I it's it's funny where we are with video is kind of where we were with images and I don't know if people remember but about two years ago Dolly came out Dolly 2 came out in April uh and the world was amazed totally amazed but if anyone goes looks at a dolly2 image today images are awful horrendous images you would laugh right right so when we see something like Sora come out I you know I I I have this belief i' I've been having this personal reaction or a moment with all of this stuff which is that my Baseline for Quality instantly resets like 15 minutes 
after the the technology comes out I'm just kind of like anything worse than this is unacceptable and my feeling is that we're only at the very beginning of video and and the truth is if you could probably go talk to real video people they'll be like yeah this is not good enough this is definitely I can't use this people are going to have a lot of fun doing it but the utility is probably not there yet it's probably we're really just at the beginning I think for video so I think my feeling is my bet to people you know sort of listening is that in a year we will think you know we'll think something like s was not even close right yeah it is amazing I mean that transition you're you talked about from Dolly 2 to Dolly 3 I mean even going from Dolly 2 to Mid journey I was just like I'll never type the word dolly in a Google search ever again I get anywhere close to it and it is right it's amazing I think you've pointed this out how fast that we're moving that these jaw-dropping breakthroughs become obsolete or like kind of looked at as unimpressive a few months later that's the speed that this stuff is is moving up totally and I think with you know so that's video so I don't I don't have any worries about video because I think video is still early like there's still maybe a moment where we can do vide there's nothing there's nothing stopping us from doing that to somewhat easily take what you've learned with images and go to go to video underrating it but yeah yeah I think I think you know without saying too much I feel like probably where we're headed with images is not going to be like it's not going to be like a completely you throw every we have to restart and throw everything away to go do video you know to put put it simply I mean we were trying we were trying to work to a a unified Vision model that incorporate 3D and video and everything related to pixels into a single model that's capable of everything but I think for now we're just we're trying to start with 
something that's narrow and sharp that we think is deeply underinvested in, and we still think images have a ways to go.

Let me ask you something about video before we go to break, because there's a debate I've been trying to wrap my head around, between Yann LeCun, who built this thing called V-JEPA, which will black out a portion of a video, and then the model, with its understanding of the world, will basically fill in what should have been there. So, say you have a guitar and someone seems to be playing it: black out the hand, and the model will predict the hand and the strings, showing that it has an understanding of the real world. They say that's not generative, that it's actual real-world understanding. And on the other side you have OpenAI, which created Sora, this pretty amazing thing where clearly the model seems to understand the physics of what's happening. The pirate ships are sloshing around in a cup-of-coffee ocean, and it's like: oh, it understands that the ships belong in the ocean, and this is the way the ocean moves, and this is the way the ships should interact with the water. It's so impressive, and it seems like it also understands the world. But you ask the Meta folks, and they'd say that the process of generating these videos is actually limited and doesn't achieve what the AI research community is trying to achieve. What do you think?

I haven't studied Yann's V-JEPA thing too deeply, but I get the gist of it. I would posit this to you: are you sure it understands physics?

No. But let me stand on the side that it does, and you can take the argument down. I mean, come on: boats in the water, the water's coffee. That's my argument. What do you have to say?

Well, it's not too difficult to refute, in part because: imagine there's a video, and the video represents the physics of a different world, like Mars. Even though there are natural physics on Mars, they don't necessarily represent the physics of Earth. They represent some physics; it just happens not to be Earth's. You could pull that thread a little longer and say that what it's really doing is representing the physics it understands from the videos it's trained on, which could be incorrect physics. It understands what it's trained on; that's the main thrust of my point. To a human, to us, it looks like physics. It's imitating physics, but not necessarily correct physics. It's really mimicking an understanding of its training data, and if there's any training data that's cool CG, like The Matrix, Neo bending backwards to dodge bullets, that's not the real physics of our world, but it models its training data. I think that's totally fine for a tool that's meant for creativity; that's acceptable. But can we really say that it has learned physics? I can't say that. I don't think we can, not yet. Maybe lighting, but even the videos that have lighting could have incorrect lighting, and so on.

Yeah, the folks I speak with in the AI community are really divided on this. We had Bryan Catanzaro from Nvidia on a while back, he runs applied deep learning research there, and he was implying some almost metaphysical capabilities in these large language models, whereas others would say it's just predicting the next word. This could be the same thing. We're still so early, still trying to figure out what's happening in these advances, that it's an open question. Or maybe I'm just giving people too much credit.

I take a very different argument from these two factions. I posit that it just doesn't matter. At the end of the day, we're making these models; it could do it or it could not, but either way, what matters is what utility it brings to humanity. If what it brings is this amazing creative tool to make super-slow-motion action shots for the next Matrix movie, that's fine. And if it can truly model physics in the real world, because we want to simulate what might happen with self-driving cars faster than actually having the cars out in the world, so be it. To me it's kind of irrelevant. What matters more is its value to us as humans, and I think we're going a little too deep on a philosophical level about whether it's this or that.

The reason I ask the philosophy questions is that they matter, from my perspective, for what you can do next. If it does understand physics, then you can anticipate it will be able to do more than if it doesn't. But it's definitely interesting.

I guess I'm trying to say it can do both, so the options are kind of wide open.

Definitely. Okay, let's take a break. When we come back, I want to do a quick lightning round through the tech giants, and also talk a little bit about one of them in particular, the state of Google. Why don't we do that when we come back, right after this.

And we're back here on Big Technology Podcast with Suhail Doshi, the CEO and founder of Playground. We talked a little bit about image generation in the beginning; in the time we have left, let's go rather quickly through the tech giants. Let's start with Google, because Google's been sort of the punching bag of the AI community for a while. You even had a tweet that said Google's lost its way, it's the best company to compete with,
even investors have stopped asking "what if Google does it." I mean, Google did just start doing image generation, and they had to shut it down. What is happening there?

Oh man. Obviously I only have a slight preview into what's going on at Google, but my guess as to what's happening is this: they're in a significant race where investors or customers believe that losing this race is an existential issue (time will tell, however), Google's rushing to be a strong leader in that race, and they have to contend with a significant, complex bureaucracy that is not well attuned to the velocity AI is running at right now.

So it's organizational. I'll just say this: I've wondered how much of it is because Google sees a threat to search over time if it pushes the status quo forward too quickly. Right before we were talking, I was on CNBC discussing the state of Google, and I was absolutely floored by one of the numbers that Deirdre Bosa, an anchor there, brought up: Gartner believes that by 2026 we will be doing 26% fewer searches than we do today, that search engines will have 26% less traffic. I know you're connected with Perplexity in some way; we just had Aravind Srinivas on. I'm kind of floored by that number. I don't believe it. I think search is going to continue to be a way we do web navigation, and AI search like Perplexity will be more for satisfying curiosity and engaging with different topics. What do you think about the stat, and what do you think about the argument I'm making?

I think there's a very high probability it's greater than that number, in a shorter span of time.

Whoa, for real? That search engines will see an even greater decline than 25%?

That's right, and it will happen before 2026.

Okay. Yeah, that's right.
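As a quick back-of-the-envelope aside (illustrative only; the 26% figure and the 2024-to-2026 window come from the conversation above, but the constant-rate assumption is an editorial simplification, not something either speaker claims): a 26% total drop over two years works out to roughly a 14% decline per year.

```python
# If search-engine traffic ends up 26% lower in 2026 than in 2024
# (the Gartner figure cited above), what constant annual decline
# rate would that imply over the two-year window?
total_decline = 0.26                      # 26% fewer searches by 2026
years = 2026 - 2024

remaining = 1.0 - total_decline           # fraction of traffic left: 0.74
annual_factor = remaining ** (1 / years)  # per-year retention factor
annual_decline = 1.0 - annual_factor

print(f"~{annual_decline:.1%} decline per year")  # → ~14.0% decline per year
```

Suhail's bet is that the real drop will be steeper and arrive sooner than even that steady rate.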
Exactly. Because let's think about the model jumps so far. We got DALL·E 2 in April two years ago; look at the difference between that and any cutting-edge model today. We can look at GPT-3, which was four years ago, and now we have GPT-4, and GPT-5 is probably slated for this year. The jump from three to four was incredible. The reason I believe this perhaps very surprising thing is that I don't think people quite internalize how big a jump can still be had. We're still so at the beginning, the early phases of this thing, that it's moving faster than Moore's law by a lot, and the biggest players right now are putting in huge quantities of money. I already find it annoying to have to go to Google and run through a few links, and click, and go back, and click, and go back, and oh, there's an ad here, okay, let me scroll down. I think humanity can already tell it's frustrating. This thing is already kind of inefficient. I just want the answer; I don't want to have to find the answer. I think that's the problem these things are solving, and if you look at the model jumps over the last three or four years, it doesn't seem surprising that almost all the traffic would shift to something better. Google has very low switching costs. Right now it happens to be integrated well in the browser, but the funny thing about Google is that it has slightly less lock-in and ease on mobile, and in most consumer traffic, desktop is shrinking while mobile is dramatically increasing; those lines crossed a while ago.

Yeah, but you have Android.

We do have Android, but we're talking about whether Google Search matters. Google could make a model that matters and is relevant, but it still
might still spell the end of its search business. My general feeling is that with the UX of something like Perplexity, which we've already figured out is a nice UX, combined with another model jump like GPT-5 or 6, it doesn't seem that crazy to me that we end this habit of going to Google and scrolling through blue links and clicking on each of them.

Is your default search engine Google, or something else?

Oh, certainly it's Google, but I've already shifted so much that it's not my first go-to, unless I want to go to a very specific site. And people going to the address bar in a browser or phone to search for a website they're trying to reach aren't really using Google's value. It's like my dad, who used to type CNN.com into Google when you could just type it into the address bar. That's not a real search. I already think it's not really a great go-to.

Interesting. Okay, another thing that you said. Let's go to Meta. You said the only thing scarier than Satya is Mark Zuckerberg taking AI seriously. Unpack that.

Well, I feel like Mark has been very focused on VR, because he's trying to do something that, regardless of your view on whether VR will succeed, is ambitious if he succeeds. I think he's a very relentless entrepreneur and founder, one of the few founders still running a trillion-dollar-plus company. There aren't many left; I think it's him and Jensen. And he's still very young. So for him to take AI seriously matters, and the thing about Meta is that it's super set up to succeed at this. They have one of the first or second biggest research labs in the world, and an immense quantity of compute that's only growing. I think he talked about having 350,000 H100s by the end of the year, or something like that, and a total of 650,000 H100-equivalent GPUs by the end of the year, which is crazy. And he's got an extremely ambitious AI research leader.

That's a lot of GPUs.

It's a lot.

How many do you guys have?

Not anywhere close to that.

More than a thousand?

Not more than a thousand.

Right, so it's just crazy. I was speaking with ServiceNow, a $150 or $160 billion public company, and in an interview I did with them, even they were only in the thousands of GPUs. So wow, to have 600,000 is crazy.

Yeah. I just think that when you combine a founder with relentless ambition, with compute, with the best talent, to me that's a recipe that's hard to beat. You compare that with Google, and to me Meta feels like a force to be reckoned with in the next few years.

Okay, let's talk about Nvidia, speaking of Jensen. I want to test an assumption here. I recently found out that the software they sell along with their chips is core to training AI models, and that makes switching away a lot more difficult. Is that something you're finding in your business, that you're using the chips and the software to train models and you'd have a hard time switching to, say, AMD?

Yeah, the software is called CUDA, and it's their platform, their way of interfacing with their GPUs. It has lock-in in the sense that there's a huge developer community around it, just like x86: there's software really tuned and optimized for x86, and that's what causes people to stay on it. With CUDA, though, it's not really CUDA that's keeping a lot of us; it's that there is nothing dramatically
better than Nvidia's GPUs. And if there's nothing dramatically better, then the reality is that training and inference costs are so high at companies at scale that CUDA is not a big reason you're going to stay. It's going to come down to compute costs. If somebody were really driving costs down for the rest of us, we would all flip, because it would be worth it. So to me it's not just a function of CUDA. That's true to some extent, but for the big companies, or anyone spending a lot of money, we all want someone who can compete with Nvidia. One of the problems with Nvidia is that they released the H100 but didn't really reduce its cost: it's roughly 1.9x faster, but 2x costlier. It technically reduces your cost because you're getting more GPU compute per node. A server costs a finite amount, and now you can put more GPU-dense compute per node, so your cost goes down. But they didn't price the GPUs themselves lower, which is somewhat disappointing, because it would have been nice if it were the same price with double the compute. Obviously, Nvidia knows what they have.

Right. And what about a company like Amazon? They're developing their own chip, they're making models available off the shelf, and people are using AWS compute, I imagine, to run models. What's your perspective on Amazon's place here? They also have Alexa, which is like the sleeping giant.

I think AWS has significantly missed the mark on this, actually. I think Azure and GCP are doing better: Azure better than GCP, better than AWS. AWS is interesting. We were looking for compute last year, and AWS wanted to charge us five times more for the same GPU than ten different providers all around them. Would you stop at a gas station that cost five times more than the one right next to it?

I would speed right past it.

Right. And here's what I think is happening there; this is my insight, so I hope it's helpful for someone. My guess is that they have a scarce stockpile of GPUs, and they know they can price those GPUs to their internal customers at that rate, and those customers will buy, because they can't go anywhere else, maybe because they're not allowed to within their company. So the sales reps charge five times more. But if you're a new customer with choices, you're not going to take that. The sales reps will do it anyway, because it helps them reach quota.

That's crazy.

So it kind of feels like, ever since Andy Jassy became CEO, AWS has turned very short-term minded about how it's going to earn revenue. And that's obviously bad, because anyone who knows anything about startups knows the biggest companies are yet to be built, and they're definitely not going to be built on AWS if its compute is five times more expensive.

Wow, yeah, that's crazy. And I wonder what that means for startups like Anthropic that have billions in funding from Amazon and are going there; they might be paying that kind of markup.

Yeah, exactly.

Let's talk Apple real quick. I wonder what they're going to do with AI. They're hinting they'll do something at WWDC, like a supercharged Siri, or taking the search bar away from Google and giving up all that money they're getting. They have some incumbency advantage, don't they? Because if they really push hard on AI to take up more room in the operating system, they can crowd out some of the advantages they have today.

Yeah, I think Apple is in a really good position, because their culture is already
seemingly one where they wait and see, and their advantages are not easily eroded, because they own the hardware platform and all the network effects associated with it. So Apple seems to be in a really healthy position to wait and see and build the best things, not just build aimlessly. Google feels like it's trying to build everything: image gen, Gemini, coding tools, an IDE, and then they've clearly asked all the PMs to integrate it this week into every imaginable product. I opened Gmail, I opened Docs, I opened so many different random Google things, and they're all trying to convince me to use AI. I think Apple is super well positioned to just let Google do all those experiments, pluck the ones that turn out best, and use its massive install base and distribution power to deliver an amazing experience, not a rushed one. So I think Apple is behind, but they're often okay being behind, and they execute very well from behind, because they find ways to leap.

All right, lastly, let's talk about Microsoft and OpenAI. You gave a pretty strong statement about Amazon; I'm curious what you think about the current offering from those companies. Obviously you're competing with them on the image-gen front. And just from your sense, do you think the OpenAI situation is stable right now, or are there going to be more fireworks on the governance side?

I think the folks at OpenAI really only care about one thing, and I think people don't fully internalize this because it seems a little too crazy. Sometimes when you read a company's mission, you're like, whatever. But I think Sam is genuinely focused on AGI and on attaining it, and I think he does not necessarily care about graphics and video. Those are stepping stones that help the research get to the next point, but he is very focused on that goal. So broadly, we don't tend to worry about it, because we're pouring all of our energy into graphics. I can't say much about Microsoft, but I can say I genuinely believe OpenAI is pursuing that effort. I can't tell whether it'll be three years from now or thirty years from now, though.

Yeah. And in terms of what Microsoft's doing?

I don't know, I'm not sure, but a brilliant play by Satya either way.

Seriously, yeah. No matter what, they happen to be the most valuable company in the world, and they seem like the tech giant in the best position right now, which is wild given where they were seven or eight years ago. It's also very surprising because, if they had not done that, they would maybe be in the worst position.

Totally. Being aggressive sometimes matters.

They've learned their lesson, right? They sat by and tried to ride Windows for as long as they could, and then people said, yeah, we don't want to use desktop operating systems anymore, and they went, oh, that's interesting. And the person who led that shift from one era of computing to another was Satya, in the Server and Tools division. So here he goes again.

Yeah. He's doing something I find even slightly more brilliant, which is not just the OpenAI deal: if you observe carefully, he's actually partnering with everybody. He's bringing all the models into Azure, and he's doing it very methodically. I think he's really setting up Azure to leap and be a lot more competitive. I actually think he's doing a really good job playing every field, and positioning himself kind of in the
middle, positioning Microsoft, sorry, kind of in the middle of all that.

So, you know, game recognize game, Suhail. All right, just to end, I want to say that I actually reached out to you initially when you had a tweet advising founders that if you're going to speak with a journalist, speak with someone who's independent, and I certainly am independent. I DM'd you, and you lived up to your words, so I appreciate that. I'd also say these conversations are super valuable, and, while we're probably not going to agree on this one, so it may be a different conversation, speaking with journalists inside some of the corporate media: I don't think they're all out to get tech founders, especially in off-the-record conversations. If there's a divide between founders and reporters, the misunderstandings will just grow. But anyway, that's my piece. I appreciate you being here. Go ahead.

Yeah, I think the real issue is not that the individual reporter is a bad person. I think they're mostly well-meaning and well-intentioned. If you have a conversation with them over drinks or dinner, they're obviously good people working hard. That's not the issue. What a lot of us who think you should largely stop talking to the institutional media believe is not that the reporters are bad; it's that their institutions are bad, and their institutions create incentives that create bad situations. I think we should also be a little curious: what is the cause that makes a reporter write a story, email you asking "do you have a comment," and then publish the story one hour later? Is that person a bad person? Probably not. That person is under some kind of deadline, or incentive, or pressure that causes it. I pick on this instance because it's a very obvious one that everybody knows is bad: not giving founders a chance to respond to something has really bad implications. It's happened to a lot of my friends, and a lot of other people talk about it. So I think that's the real issue, and that's why, even though you could just as well work at any of these media institutions, the fact that you're independent makes your incentives, and what you want to write and do, totally different. And the reporters who used to work at some of these institutions and have struck out on their own, like you, you can see it all get cleaned up. They completely change what they write, what their beats are, and how they work and interact with people in the world. I think it's a lot better, more factual, more interesting reporting.

Yeah, it's interesting. I obviously am competing against the broader media ecosystem, so I do hear you on that front. Anyway, it's something we could talk about forever, and it's good to hear your perspective on it. Once again, I appreciate that you put something out there in the world, and then when I said, all right, let's talk, you said yes. I hope this isn't the last time; I hope to have you back. I'm thrilled you were able to come on and talk about the new stuff you're working on and the broader industry. It's cool to speak with someone whose stuff you read on Twitter and then have a conversation like this; it's gone longer than an hour, and it could easily go two or more. The substance is there, and I appreciate you being here. Thanks again.

Thank you for having me.

All right, everybody, thank you for listening. We'll be back on Friday to break down the news, as we do every week, and we'll see you next time on Big Technology Podcast.