Predicting AI’s Next Advances — With Suhail Doshi

Channel: Alex Kantrowitz

Published at: 2024-03-20

YouTube video id: 11a5dqvGOvc

Source: https://www.youtube.com/watch?v=11a5dqvGOvc

A leading AI CEO and entrepreneur joins us to talk about the state of the field, the research, the products, the competition, and where this is all heading. All that and more coming up right after this.

Welcome to the Big Technology Podcast, a show for cool-headed, nuanced conversation of the tech world and beyond. Today we have a great guest for you, someone I've been trying to bring on for months, and I'm very excited to have him here. Suhail Doshi is here. He's the CEO and founder of Playground, which is an AI image generation and editing software company, and somebody that I've been following on Twitter pretty religiously to get a sense as to where this is all heading. So I'm thrilled he's here. Suhail, welcome to the show.

Yeah, thanks for having me, Alex.

Thanks for being here. I always end up learning about the latest breakthroughs through your Twitter account. You're definitely on it, you're talking with the right people, you have a sense as to where this is going. So just to start this conversation on that theme: where are we right now in terms of the curve? Are we at the early part? Are we kind of plateauing? And I'm also curious to hear, kind of as a two-parter, where do you think the business cases for this stuff are going to land? Because I think that's still kind of an open question. I mean, you run an image generation and editing company, so that's something I'm sure you think about, because you have to figure out who you're going to sell to. So yeah, let's just start real broad to begin with.

Yeah. In terms of where we are... gosh, I'm always surprised.
You know, I had this tweet last year that there were AI breakthroughs every single week, and eventually it got to the point where, I think Elon Musk tweeted, it was every single day. And the interesting thing is that it still feels like that's happening, right? I try to follow just the right people, researchers. You know, I trawl through this place where all the research papers get uploaded, called arXiv, and I just read through things that are interesting, that are kind of taking off. Sometimes these things are good demos and they're nothing more than a good demo, but sometimes they are truly breakthroughs. So right now the pace continues to seem relentless. I'm often surprised by what new thing happens, and what's interesting is sometimes a breakthrough will happen in a week, and it's only a couple days later that something beats that thing in terms of performance or capability or some sort of surprise.

But if I were to take a snapshot of roughly where we are right now: it has been over a year since GPT-4 has basically been out. I view it as a little bit over a year because I was a lucky person and I had access to GPT-4 around October, and I think it came out around February. And I think it's interesting to point at that milestone because we don't know what has been happening internally at a company like OpenAI over the last year, and they're definitely training a new model. So I had this thought today: we don't really know how far behind everybody else is in language, because we have no idea what OpenAI or Anthropic or some other company is training internally. We have markers for things like Gemini from Google, or markers from Mistral, but we really don't know how far behind they are. We only know where they are relative to last year, or the year before that in October. That's in language.

And in images... images is interesting, because it's probably a couple years behind a true GPT-4 moment, right? And now audio is starting to happen, with a company called Suno that I actually tried out this weekend. I'm a producer, so I was making songs, so I was trying that out, and I have some weird thoughts about that. And then I think the last area is the companies doing 3D, which are just getting started. There's a friend of mine who's starting a 3D foundation model to do Pixar-level type of creation.

Wow.

I don't know if I can name them yet, so I'll probably avoid doing that for now, but the fact that that's happening...

Yeah, and video.

Yeah, there's video. Now we're getting minute-long sequences that are not full of artifacts; they're more coherent, with the right character consistency. We are at the very beginning, I think, still.

So what
is the North Star for this stuff? I'm trying to think about a chatbot, or a GPT model, right? It's already pretty good: it does a great job of synthesizing information and spitting stuff back. Where does it end? Because you mentioned, okay, they're working on a new model. How is that new model going to be an improvement over what we have? And then where do we end up getting to if this keeps on getting better and better?

I mean, Sam Altman's response would be, like, AGI, right? I think there's
another version of his belief, which is... one time when we were talking with Sam, just at a dinner party, I think he said this thing: he believed that everybody would just have a thousand employees. And I think we all thought he was crazy. Still do, by the way. He may prove to be right. But I think chatbots are just a very basic, primitive thing that we'll end up getting.

You know, my general feeling is... I was talking to someone at OpenAI who was working on the robotics team there back in the day, and I was starting to get into AI robotics a little bit. I was kind of curious where things were. To summarize where the field seems to be, and I'm not an expert, but I've talked to enough people that are: broadly, robotics kind of asymptoted and hit a ceiling about three or four years ago, and the research still isn't on a trajectory that's amazing. But the reason I'm bringing up robotics is because I want to answer your question about where I think things are headed. There is a belief, at least someone at OpenAI had this belief for a little while, I can't say that it was everybody there, that the ceiling in robotics was in part because maybe the solution to solving it was actually through large language models first. Maybe if we could find a model that could reason to the extent of language, then maybe that could help the robots navigate through some of the toughest problems that they're having trouble with.

And so the sequence is sort of: first, make language great. Then the second thing we're starting to see is image models and graphics becoming great. And then the next piece is multimodal: vision plus language plus maybe audio. Can we make a very powerful multimodal model? And if we can do that, maybe those models will surface and cause many breakthroughs, one of which could be in robotics. Which means that you'd have a robot, not a Roomba, but a robot that maybe embodies us. You know, there are a lot of humanoid startups right now, and by the way, the reason the humanoid startups are building humanoids and not different-looking robots is because we know that humans are already able to generalize to lots of human-related things, right? We know that if it looks like a human, then it can hit a printer button and pick up a box and do all these different activities. So I think that if the models get more powerful, we're probably going to see some kind of Westworld version of the world. We're going to go way beyond a chatbot.
What about reasoning? Because OpenAI had this Q* thing that people were talking about, and there's another company that says they've been able to reason. Is adding reasoning into large language models another new leap, or is it just a way to get us to this reality that you're talking about?

Man, reasoning in AI is kind of this really big philosophical word, I think, amongst researchers. You work with a research team, you talk to researchers, and reasoning is really tough for anyone to prove is actually happening or not. Because the big question is: is it just spitting back the next word, or is it really able to work through problems on its own? Which, I mean, how do we know you're reasoning, or I'm reasoning, right now? How do we know we're not just reacting to our surroundings and predicting the next token? We don't really know.

Interesting. But then why do you
think that there are AI companies that are working on this problem? I mean...

Well, I think it's not an abstract thing. There are actual research programs, and progress being made, on this question itself. I think the Q* example, from what I've read, is that it basically can solve complex math problems on its own, so that requires being able to conceptualize and reason through a problem. It could do novel problems, as opposed to taking what you've seen before and spitting out something that looks like an answer to a math problem.

Yeah. I mean, just to
dive into the philosophy of reasoning, and just the tip of the iceberg: just because something is able to articulate its reason for getting to an answer doesn't mean that it is necessarily reasoning. It's possible that it's just really tightly fit to its training data, and it just happens that the next token ends up being step one, and then step two, and step three. We don't really know. But I think one way that you could maybe litmus-test reasoning is if you gave the model something truly out of distribution. An example of something truly out of distribution that humans faced was COVID: a pandemic that we had not yet seen before, and then we had to reason about how we would go and deal with that kind of odd current event. The question would be: if you gave a model something that it was truly not trained on, and you could prove that it was not trained on it, and you asked it for a solution, could it really figure out the right solution? We might find that really difficult.

Yeah. And humans don't necessarily do that either, right?

Well, it's interesting, because when people talk about artificial general intelligence, it's like, well, who's your baseline?

Right. Yeah, I don't know. We'll see. I mean, AI can definitely exceed humans in some areas, and in others it can't. So anyway, it'll be something that we'll all be talking about for a while.
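The exchange above bundles two testable ideas: that a model may just be predicting the next token, and that a litmus test for reasoning is input that is provably outside the training distribution. A toy statistical sketch, with invented data and no real evaluation methodology, can at least show how "surprise" at out-of-distribution text might be measured:

```python
from collections import Counter
import math

def bigram_model(corpus):
    """Count word bigrams in a training corpus (a stand-in for 'training data')."""
    words = corpus.split()
    pairs = Counter(zip(words, words[1:]))
    unigrams = Counter(words)
    return pairs, unigrams

def surprise(text, pairs, unigrams, vocab_size=10000):
    """Average negative log-probability of each next word, with add-one smoothing.
    High surprise suggests the text is unlike the training distribution."""
    words = text.split()
    total = 0.0
    for prev, nxt in zip(words, words[1:]):
        p = (pairs[(prev, nxt)] + 1) / (unigrams[prev] + vocab_size)
        total += -math.log(p)
    return total / max(1, len(words) - 1)

# Invented toy corpus: the "training set" for our stand-in model.
corpus = "the cat sat on the mat the cat ate the fish " * 50
pairs, unigrams = bigram_model(corpus)

in_dist = surprise("the cat sat on the mat", pairs, unigrams)
out_dist = surprise("quantum lockdown protocols reshaped daily life", pairs, unigrams)
assert out_dist > in_dist  # the unseen topic is far more surprising to the model
```

A real lab would measure perplexity under the actual model and, harder still, prove the input was absent from the training set; this toy only illustrates the shape of the idea.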
Obviously, image generation is something that requires some understanding of the world. You're doing it at Playground, right? The model will understand, let's say you say, show me a monkey sitting on a beach ball, it will understand that there's some physics in the world and the monkey has to sit on top of the beach ball. So I'd love to hear your perspective on the state of image generation right now. You have an update that you're releasing, or will have released by the time this goes live. Obviously it's an exciting time, but you're also coming up against some very big companies that are trying to do this as well. Midjourney's been at it for a while. DALL-E 3 is pretty impressive; I use it through Copilot from Microsoft. Google of course has tried it, but they've had some problems there. So talk a little bit about it, and also I'd love to hear the business case here. Because for LLMs, that's one thing: you can say, all right, we'll read contracts, understand them, help spit back answers. But for images, is it that it will replace design? That it will democratize design and make it available to everyone else? I'm curious to hear your perspective on why that's a problem to work on.

Right, yeah.
You know, images are definitely behind in terms of overall capability and utility relative to language. I think at the end of the day, all these models, for the time being, have their kind of narrow, somewhat limited utility, right? There are a lot of things posted on Twitter about how people are using language models, but the predominant use case continues to be homework, right? And then there's kind of this other one, which is coding. For images, it just turns out that the predominant use case is making art. It's very surprising, but it just turns out that millions of people are very excited to make art. And it's not art that you're necessarily going to put on your wall, but it's art that could be used in marketing. Maybe you post it on your Instagram, or you make an icon and use it as the icon for your app, or maybe it's a YouTube thumbnail, or an image that you put in a blog post, or maybe it's just a fun meme that you send a friend. That's the state of images right now: really interesting, imaginative art. But I think it hasn't quite gotten to the utility of language.

I think there are a number of things that are probably coming for graphics, but I do think it's going to be about democratizing graphics for people. I mean, our company is trying to help people make graphics like a pro without being one. You shouldn't have to... if you ever open up Photoshop, there's a dizzying amount of menus, right? There are all these icons. You have to go to YouTube and do a really sophisticated tutorial to be good at Illustrator or Photoshop or Lightroom.

I had to take classes on both of those, Photoshop and Illustrator, semester-long classes, to be able to do that stuff.

Right, yeah. There was a summer where I grinded just making logos, and then I would upload them to SitePoint and try to win logo contests to get better at my own skills, back in high school when I was a lot younger. And it doesn't have to be that way anymore. So I think the first thing that's going to happen is that a lot of graphics capabilities are going to be built into the model. Maybe, if you have wedding pictures and you wish you could color grade them somehow, maybe you'd use Lightroom or something for that; I think an image model will be very good at that.
I know we have to move on, but graphic designers: do they become an extinct profession, or where do they go? Because you can not only create images within Playground, you can edit them.
No. I mean, look, Walt Disney started as a person that drew pictures, and then he worked somewhere where he animated them, and then that evolved, and eventually we got to 2D cartoon movies like Snow White and things like that, right? And then things like Pixar came about, and we built a 3D rendering engine. So did all the people that drew the 2D cartoons lose their jobs, and was that the end of an era? Definitely not. People retooled. The stories that came from them were still really material to the creative process; story matters more for a company like Disney or Pixar than the animation itself. So I think in this case the graphics matter, but I think that people will retool. The question is: are we giving people enough time to retool?
Yeah. So if you're doing one-sheets... I used to work in marketing before I went into reporting, and we used to do one-sheets. If you're doing one-sheets, right, getting the headline image and passing it off as a piece of marketing collateral, that seems like a case where you might want to invest in some new
skills.

Yeah. I also think that there's something beautiful about the person that is creating the thing. Let's say you're doing a piece of writing, or maybe you're a music artist and you made a song. I think there's something really beautiful about that person being able to connect their art, their graphics, as closely as possible with the other kind of art that they're making. If I'm a music artist, I want to be able to choose the exact album art. I don't want to have to always outsource that to somebody who may not really understand what I want.

Yeah, it totally
resonates, because I've been using image generation for Big Technology, and I'm a one-person shop. I couldn't afford to do graphic design for every single story. I mean, I was hardly making it work with, whatever, iStock photos, and every now and again I would pay for the image. But now we get almost perfect illustrations every time, for every story, because this technology has made that possible.

Right. You're able to marry the creative process of your podcast with the graphics, the thing you want to show people, and only you can go through as many iterations as you want to find the perfect thing that you think is the right mapping.
Though, I have actually talked to people, artists, people that draw, who hate this stuff. A year or two ago I basically got almost cancelled, on Reddit and Twitter and everywhere, back when people hated the idea that you would even say AI art is art. And so one of the things I decided to do was be really curious, and I said, let me go talk to some of these people that are basically sending me death threats on Twitter, or something like that. And some of these people love drawing. It doesn't matter that you offer them a better tool; they love the idea of picking up a pencil and drawing. And so for those people, certainly, that's one way of making art, and some people will treasure that and enjoy that.

But that would be taking something that they enjoy...

Well, they still can do it. They can still enjoy it. It just might not evolve, perhaps, with the times. So there will be some people...

I think it does matter to think a little bit about how fast the technology is moving and how people will deal with that.

Right, definitely. Okay, sorry, I didn't mean to break your momentum.

No, no worries.
So I think that those things will be possible. But I think the utility of graphics is going to increase significantly in the next year. I think we haven't really thought through editing, for example. A lot of this stuff is generating synthetic images, but we haven't really thought about: what if I have an image and I want to add my dog, who's not in the image, for my holiday card? Could I take my dog from a different image and just insert it, where it gets all the lighting and the shadows and the color and all the ambiance correct? What if I want to do fantastical things? What if I want to see what you would look like if you were the Incredible Hulk, big and green, but it really had your face, and you thought that face was your face?

Doesn't take much imagination, to tell you the truth.

Okay, yeah.
I think, you know, what if I want to make a logo? Logos are really hard. I mean, I have paid people $50,000 to hand-make a bunch of logos that I want to use to brand my product. Sometimes you can only get five of them, though. Why can't I get a hundred of them? You know, I think graphics is on this never-ending cycle where we never feel like it's good enough. You can think a little bit about PlayStation 1, and then PlayStation 2, and then 3, 4, 5, right? Or Grand Theft Auto, from the first Grand Theft Auto to what 5 looks like now. We can see that graphics is still improving, you know, 30 years later. So I think giving people tools where they can do incredible feats of graphics is going to be really exciting.

But I think graphics is only a subfield of a bigger plan that at least our company has. I don't know if there are other companies that care to do this, but our company cares about creating a unified vision model, where we can create and edit and understand anything with pixels: a single unified model. This is missing in vision, but it definitely exists in language. In language we can solve hundreds or thousands of different tasks, but in vision it's all separated. It's kind of like where language was three or four years ago, where there would be a model that could summarize, and a model that could do sentiment analysis, and a model that could do these different little tasks, but there wasn't a unified single large language model. There's no equivalent for vision. What is a large vision model? We don't really have a term for that. So my feeling is that vision as a field is going to significantly expand.
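The fragmentation described here, one vision model per task versus a single instruction-conditioned model, can be caricatured in a few lines. Every name below is invented for illustration; no real vision API is being shown:

```python
# Fragmented world: each vision task needs its own trained model
# (represented here by a separate function). A new task means a new model.
single_task_models = {
    "caption": lambda img: f"a caption for {img}",
    "segment": lambda img: f"a mask for {img}",
}

def run_fragmented(task, img):
    if task not in single_task_models:
        raise KeyError(f"no model trained for task: {task}")
    return single_task_models[task](img)

# Unified world: one model, with the task expressed as a free-form
# instruction, the way instruction-tuned LLMs folded summarization,
# sentiment analysis, and translation into a single text interface.
def run_unified(instruction, img):
    return f"unified_model({img!r}, instruction={instruction!r})"

caption = run_fragmented("caption", "dog.png")
edit = run_unified("remove the leash and relight the scene", "dog.png")
```

The point of the sketch is structural: the fragmented dispatcher fails on any task it wasn't built for, while the unified interface accepts novel instructions, which is the property being argued vision still lacks.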
Why can't a robot look at images and navigate the world, like a self-driving car, right? That's one thing. Why can't we understand images, and what's going on in them, better? We've seen early glimpses of that. I think there's a famous picture of Barack Obama stepping on a scale, and the model knows that Barack Obama is, like, trying to make the scale read heavier, that it's a joke. But can models really understand what's going on in these images at a much deeper level? So there are large vision models that are starting to incorporate language and images, but there's no real all-encompassing multitask vision model.

So I have a couple
questions for you on this. First of all, on the vision part: does that play into... a lot of people have been talking about how one of the biggest applications of this current generation of AI is going to be in augmented reality, right? And Meta has those glasses, where you don't have an overlay right now, but you can talk to their AI bot, and it will look at the world and give you a sense as to what you're looking at, or you can even just ask questions about things and it will talk to you. So I'm curious how seriously you take this new era of augmented reality that we seem to be heading into. Because, speaking of one of your tweets, you wrote, "it's going to be hard to beat a computer in your pocket you can use inconspicuously when you need to." So it sounds like you're a believer that the phone is going to be the way we interact with computing for a while, but maybe there's something I'm missing.

Yeah, I mean, I think the phone is a really good form factor for computing.
You know, I've talked to lots of different friends who've tried the Vision Pro and such, and it seems like that's still kind of early in terms of its use cases and its utilities, so we'll see what happens over the next year or two. I tend to be more optimistic no matter what, because you never know about these things. I think one thing that Meta is doing, regardless of where VR or AR is headed, is they have one of the most world-class teams for graphics, and they have to, because of all the stuff that they're doing in VR. But yeah, I think it's kind of unclear what the right form factor is. Is it on your face? Is it somewhere else? Is it a thinner, you know, video screen? I'm not sure. But one thing I do feel pretty confident in is that we will care a lot about being able to use AI to manipulate graphics regardless of the form factor. I'm somewhat form-factor agnostic: is it a TV, is it a watch, is it glasses, is it some new thing? I don't know, but it seems very likely that we're going to care. An example would be: I wish I could just go into a store, stand in front of a mirror, and then just sort of swipe through, like, jackets I'm wearing. I went this weekend to the de Young, and it had this sort of San Francisco fashion exhibit, and there was this thing, powered by Snapchat, where it put a dress on you, so I was in a dress at the exhibit. It was really cool, and it's very obvious that this thing could be higher fidelity, right? That was a really cool AR experience, but why can't I have that for jeans and a jacket, or anything that I want to wear, without having to try it on? So I think those kinds of experiences seem inevitable regardless of the form factor.

Yeah. And
then, talking about the limitations of image generation models today: it just seems like they all end up generating images that look so similar. When I said before that I generate the perfect image for each story, conceptually, yeah, but you can still tell that it's been generated by an AI image model and not a graphic designer. So I'm curious why, from your perspective, so many of these AI-generated images look so similar. Is it because they're using the same underlying technology, the same training set? Is it just that they're not high-quality enough, so people can pick them out? What do you think?

Yeah, I mean, language models have
this problem too, right? The way that we know this is that language models are kind of overly verbose, right? They talk a lot. So that's kind of the little tell for language models. For images, the tells are a little bit different: maybe they have overly crazy bokeh, or they are super lush in ways that you don't need them to be lush. But I think with images, what's happening is that maybe the models are a little bit too curated. It's at its infancy, but I think the models are probably too curated, and maybe overfit to human preference. And human preference isn't your preference; it's some kind of average of human preference. In art, there's art that we like in the modern time, and then there's kind of avant-garde art, and maybe you prefer that, right? You want something more ostentatious, or maybe you want something more minimal and laid-back. And I think what we've kind of discovered is that actually there are just huge, wide varieties of preference, and then there's the average. So I think with image models it's somewhat twofold. One problem is that we're not catering to people's personalized preferences and styles, or the niches, right? And I think the other is that quality is the lowest it's ever going to be, starting today, right? So the quality is going to get incredibly good.
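The "average of human preference" point has a simple numeric caricature: if two users sit at opposite ends of a style axis, a model tuned to the mean preference is equally wrong for both of them. The vectors below are invented purely for illustration:

```python
# Toy illustration (assumed numbers): two users' style preferences as
# feature vectors [ornate, minimal]. Fitting one output to the average
# pleases neither user.

user_a = [1.0, 0.0]   # loves ornate, avant-garde imagery
user_b = [0.0, 1.0]   # loves minimal, laid-back imagery

average = [(a + b) / 2 for a, b in zip(user_a, user_b)]  # [0.5, 0.5]

def mismatch(pref, output):
    """L1 distance between what a user wants and what they got."""
    return sum(abs(p - o) for p, o in zip(pref, output))

# The averaged "pop" output is equally wrong for both users...
assert mismatch(user_a, average) == mismatch(user_b, average) == 1.0
# ...while a personalized output matches its user exactly.
assert mismatch(user_a, user_a) == 0.0
```

Personalization, in this toy, is just moving the output toward an individual's vector instead of the population mean, which is roughly the "catering to niches" problem described here.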
But it's also interesting, because having worked at publications where we did have graphic artists, one of the interesting things was that you would give a prompt, or you'd write a story, and that artist would take it back and, based on their own style, end up creating an image. And I loved doing this, because I was seeing what they came up with. I was always surprised by what they built, because they would do it through their own lens and focus. But what AI does, I think, is it tends to take everything into account and spit back the average, right? Kind of the world's average image. And that's where I sort of say, sometimes I'll be surprised by what an AI image generation engine will create, but oftentimes it's like, yeah, that sounds right, or that's close enough, let's put it at the top of the story.

Right. Yeah, I mean, it's interesting:
human graphic designers are also kind of overfit, right? They have their own particular style. Anytime I've reached out to a graphic designer, sometimes they'll say, hey, why did you reach out to me, what did you like that I did? Or an interior designer, or whoever, right? So they all have their style. These human graphic designers are somewhat less robust designers, in some sense: they are very skewed to something, and then you pick them. And that's cool, because then they can lead to brandable things. The models, if you prompt them simply, will give you an average style, an average style that represents something. At this point you will probably get something that is beautiful, but it might not be stretched, it might not be headed in a stylistic direction, and because everybody uses it, it feels kind of fatigued. You know what we call it internally? Pop. It's like pop art, right? Like pop music, Top 40 music. There's, like, Skrillex, with growls and noises and sounds, right? But then there's pop music, there's Justin Bieber, that kind of thing. What you're getting from the image models is pop, and people love pop. We know that; people love Top 40. But it's hard to market pop all the time, because it gets tiring. So in this case, the models are definitely capable; it's just that you have to have this perfect alchemy of figuring out the right prompt, the prompt engineering thing.

That's
where it gets interesting, then. Because let's say, okay, I like art in the style of a specific contemporary graphic designer, and let's say the model's trained on that art, and I say, all right, create an image of a robot playing tennis in the style of person Y. Sure, but then you get into some really tricky questions, because we have to figure out a way to either compensate these people, or... it really becomes some sort of intellectual property theft. So I'm curious: you're running a company that does image generation, so how do you think about this?

Oh yeah,
super super interesting issue and and
really complex you know these days we
don't really we're not really seeing
customers you know whale be like I
wanted in this
person's this one person's name you know
um which is good I think it's a good
thing you know GRE Greg rowski is
someone that I think about in this case
because a lot of people add that
person's name he makes really Amazing
Fantasy r that on dvnr maybe he's helped
out some video game studios and stuff
I'm not sure but
um uh but his name is kind of quit
essential in this debate and um you know
what the reason why people are it's
important to understand the reason why
people are doing it people aren't doing
it because they're trying to copy Greg
Kowski they're doing it as a shortcut to
get somewhere because if you take an
image from rowski it's not easy to
articulate in fact there's a reason why
there's a phrase called you know picture
speaks a thousand words it's very hard
to describe his artart completely we can
come up with some words but it's a Vibe
it's a style and so people are using it
as a shortcut and there's other people
like HR Gyer who does like kind of more
eerie you know horse type stuff there
I've learned about a lot of artists
because of how people are promting
there's a lot and and uh you there's no
no easy way around this but I think that
this thing is going to go away in the
next year this year I mean we're working
on something I can't talk about it
exactly right now but I think this idea
is is that users are doing this because
it's a shortcut to get to a very
difficult to describe Style and what
they really want is to say I like this
like I want to reference this actually
I want to reference
five of these different things and get
to an image because actually
with graphics and a lot of a lot of
images and art what's happening is like
it's like remixing a lot of things like
even I I because I make music I can kind
of relate to it because it's sort of
like you know if your inspiration is
Kanye West and then your other
inspiration is you know Dr Dre and then
your other inspiration is um you know
6ix who produces music for Logic and you
want to combine like the drum rhythms of
this person and the instrumentals of
this person and the lyrics of this
person did you copy them I mean all of
these people were inspired by people and
so I think in this case people are just
feeling inspired but they're using a
shortcut so the question is how do we
get them away from just copying actual
Greg Rutkowski because that's
definitely the wrong thing we definitely
don't want that in the world nobody
should be copying Kanye uh you know
wholesale that's bad too right but you
just kind of it's difficult to eliminate
the prompt completely like let's say you
did have audio generation you know and
you could say write me a song about you
know I don't know my girlfriend in the
you know style of Kanye West yeah I
don't know why you want to do that but
you could and you know you sort of get
into those issues you do yeah you you
definitely do but I don't think I don't
think that's people's true intent let me
ask you this do you think that the
artists whose work is being
trained on should be
compensated I think we need to find some
solution for them yeah you know we do
a small thing you know it's not clear like
every time you generate an image
they get like you know some Spotify
streaming payment you know a hundredth or a
thousandth of a penny I don't think anyone's going to
be happy in that circumstance right but
you know we try to do something
small we don't think this is
a solution we don't think this is
enough of a remedy by any
stretch of the imagination but one
thing we do that nobody else seems to
do is uh we actually link back to a lot
of these artists yeah when an image gets
generated we say additional credit
Greg Rutkowski and it links directly back
to his DeviantArt page so that people
can find him learn about him pay him
donate to him whatever they want right
we even link back to like
Wikipedia for artists that are no
longer living
just so people
understand what they're doing um yeah I
think that's a good start okay so let me
let me ask you this um you're like
you're you're doing you're doing a an
image generation startup you're very
focused like you'll tweet often about
how it's so so important to stay focused
and I do think there's something to be
said for that because there's so many
other companies that are just kind of
going all over the place um do you
regret not doing video though because
what we've seen out of OpenAI's Sora and
others is just kind of you know
jaw-dropping it's pretty amazing so
is that something that now you think you
should have
done I mean it's too soon to tell you'd
have to ask me in a year to find out if
I regret it right
um right now you know I don't have any
regret um it's funny where we are
with video is kind of where we were with
images and I don't know if people
remember but about two years ago DALL-E
2 came out in April uh
and the world was amazed
totally amazed but if anyone goes and looks
at a DALL-E 2 image today the images are awful
horrendous images you would laugh right
right so when we see something like Sora
come out you know I have this
belief I've been having this personal
reaction or a moment with all of this
stuff which is that my baseline for
quality instantly resets like 15 minutes
after the technology comes out I'm
just kind of like anything worse than
this is
unacceptable and my feeling is that
we're only at the very beginning of
video and the truth is if you
go talk to real video people
they'll be like yeah this is not good
enough I can't use
this people are going to have a lot of
fun doing it but the utility is probably
not there yet we're really
just at the beginning I think for video
so I think my feeling is my bet to
people you know sort of listening is
that in a year we will think you know
we'll think something like s was not
even close right yeah it is amazing I
mean that transition you talked
about from DALL-E 2 to DALL-E 3 I mean
even going from DALL-E 2 to Midjourney I
was just like I'll never type the word
DALL-E in a Google search ever again or
get anywhere close to it and it is right
it's amazing I think you've pointed this
out how fast that we're moving that
these jaw-dropping breakthroughs become
obsolete or like kind of looked at as
unimpressive a few months later that's
the speed that this stuff is is moving
up totally and so
that's video so I don't have any
worries about video because I think
video is still early like there's still
maybe a moment where we can do video
there's nothing stopping
us from somewhat easily
taking what we've learned with images and
going to video you're not underrating it but
yeah yeah I think I think you know
without saying too much I feel like
probably where we're headed with images
is not going to be
a case where
we have to restart and throw everything
away to go do video you know to put
it simply I mean we were
trying to work toward a unified vision
model that incorporates 3D and video and
everything related to pixels into a
single model that's capable of
everything but I think for now we're
just we're trying to start with
something that's narrow and sharp that
we think is deeply underinvested in and
we still think that images have a ways
to go um yeah let me ask you something
about video before we go to break there
because there's a debate that I've been
trying to wrap my head around which is
kind of this debate between Yann LeCun who
built this thing called V-JEPA right
which will black out a portion of a
video and then the model with its
understanding of the world will
basically fill in what it should have
been so you know you have a guitar and
someone seems to be playing it black out
the hand and the model will create the
hand and the strings showing that it has
an understanding of the real world they
say that that's not generative that
that's actual real world understanding
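The masked-prediction setup being described (hide part of a video, have the model reconstruct it from context) can be sketched in a toy form. To be clear, this is only an illustration of the objective's shape, not Meta's actual V-JEPA, which predicts in a learned representation space rather than in pixels; the mean-fill "predictor" here is a hypothetical stand-in for a trained model.

```python
import numpy as np

def mask_patch(frame, top, left, size):
    """Return a copy of the frame with a square patch hidden (NaN)."""
    masked = frame.astype(float).copy()
    masked[top:top + size, left:left + size] = np.nan
    return masked

def fill_from_context(masked):
    """Trivial stand-in 'predictor': fill hidden pixels with the context mean."""
    filled = masked.copy()
    context_mean = np.nanmean(masked)  # mean over the visible pixels only
    filled[np.isnan(filled)] = context_mean
    return filled

rng = np.random.default_rng(0)
frame = rng.uniform(0.4, 0.6, size=(8, 8))   # a smooth, low-contrast "frame"
masked = mask_patch(frame, 2, 2, 3)          # hide a 3x3 region
pred = fill_from_context(masked)

# Reconstruction error on the hidden patch is the quantity a real
# model would be trained to minimize.
err = np.abs(pred[2:5, 2:5] - frame[2:5, 2:5]).mean()
print(round(float(err), 3))
```

A trained model replaces the mean-fill with something that actually uses spatial and temporal context, which is where the claimed "world understanding" would come from.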
and then on the other side you have
OpenAI that's created Sora and it's uh
it's this pretty amazing thing where
like clearly this model understands the
physics of what's happening because the
pirate ships are you know sloshing
around in a cup of coffee ocean and it's
like oh they understand that the ships
belong in the ocean and this is the way
the ocean moves and this is the way the
ships should interact with the water
it's so impressive and it seems like it
also understands the world but you ask
the Meta folks and they would say that
actually uh that process of generating
these videos is actually limited and
doesn't achieve what the AI research
Community is trying to achieve what do
you
think yeah I think that you know I
haven't studied um Yann's V-JEPA thing too
deeply uh but I get the gist of
it I mean I would posit this to you are
you sure it understands physics
no right because actually let me
stand on the side that it does and
then you can sort of take this
argument down I mean come on like boats
in the water the water's coffee you know
right yeah I mean that's my argument
what do you have to say well I mean
it's not too
difficult to refute in part because
just imagine that there's um a video
and the video represents uh the physics
of a different world like Mars right
right and even though there are natural
physics to Mars they don't necessarily
represent the physics of Earth they
represent some Physics it just happens
to not be Earth and so I would say you
could just pull that thread a little bit
longer and just say actually what it's
really doing is it's representing the
physics it understands in the videos
it's being trained on which could
be incorrect physics it's really
modeling what it's being trained on
that's kind of the main thrust
of my point and to us it
looks like physics it's imitating
physics but it's not
necessarily imitating correct
physics so it's really mimicking an
understanding of its training data and
if there's any training data
that's like cool CG or like you know the
Matrix with Neo like bent you know on his
back that's not real physics of our
world but it models its training data
and I think that that's totally fine
though for a tool that's meant for
creativity that's acceptable but can we
really say that it has learned physics I
can't say that I don't think we can not
yet you know maybe lighting but even the
videos that have lighting could have
incorrect lighting right uh yeah so on
and so forth yeah I think that the
folks that I speak with in the AI
Community are really divided on this
like we had Bryan Catanzaro from Nvidia a
little while back he runs applied
deep learning research there and he's like
implying some metaphysical capabilities
in these large language models
whereas others would say that it's
just predicting the next word and this
could be the same thing we're
still so early on and
still trying to figure out what's
happening in these advances that it's
still an open question or maybe I'm just
giving people too much credit
but I take like a very
different argument than these like two
different factions yeah okay I posit I
take the argument that it doesn't I just
don't think it
matters right at the end of the day
we are making these models they
could do it or could not but either way
what matters is what utility it brings
to humanity and if what it brings is
this amazing you know creative tool to
create super slow motion action shots
for the next Matrix movie that's
fine and if it can truly model physics
in the real world because we want to
simulate what might happen with
self-driving cars uh at a faster speed
than actually having the cars be out in
the world so be it to me it doesn't
matter it's kind of irrelevant what
matters more is its value to us as
humans and I think we're a little like
too deep on a philosophical level about
whether you know it's this or that the
reason why I ask the philosophy
questions is that they matter from my
perspective in terms of like what you
can do next like if it does understand
physics then you can imagine or or
anticipate that it will be able to do
more than if it doesn't but it's
definitely interesting I guess I'm
trying to say that it can do both right
so anything is really like the options
are kind of wide open definitely okay
let's take a break I want to when we
come back I want to do a quick lightning
round through the tech Giants and also
talk a little bit about uh well one of
the tech Giants the state of Google so
uh why don't we do that when we come
back right after this and we're back
here on big technology podcast we're
here with Suhail Doshi he's the CEO and
founder of playground uh we talked a
little bit about image generation in the
beginning uh in the time we have left
let's go rather quickly through the tech
Giants um let's start with Google because
Google's been sort of like the punching
bag of the AI community for a while um
so you even had a tweet that says
Google's lost its way it's the best
company to compete with even investors
have stopped asking what if Google does it I
mean Google did just start doing image
generation they had to shut it down um
what is happening
there oh man I wish you know obviously I
only have a slight purview into what's
going on at Google but you know my my
guess as to what feels like is happening
is uh they are in a significant race
where either investors or customers
believe that by losing this race uh it's
an existential issue time will tell
however and Google's rushing to uh be a
strong leader in that race and they have
to contend with a significant complex
bureaucracy that is not really well
attuned for the velocity um that AI is
running at right
now so it's
organizational and I also yeah I'll
just say this I've
wondered how much of it is
because Google sees a threat to search
over time if it pushes the status quo
forward too quickly and right before we
were talking I was on uh CNBC talking
about the state of Google and I was
absolutely floored by one of the numbers
that Deirdre Bosa uh who's an anchor there
brought up which is that um I think
Gartner believes that by
2026 we will be doing 26% fewer searches
than we are today or search engines will
have 26% less traffic I know you've uh
you're connected with Perplexity
in some way we just had Aravind Srinivas
on I'm kind of floored by that number I
don't believe it I think that um search
is going to continue to be a way
that we do web navigation and AI search
like Perplexity will be more to satisfy
curiosity um and engage with
different topics what do you think about
the stat and what do you think about
that that argument that I'm
making I mean I think there's a very
high probability it is greater than that
number in a shorter span of time whoa
for real we'll be doing even fewer
searches search engines will have even
less traffic an even greater
decline than
25% that's right and that will happen
before 2026 okay yeah that's right
exactly because let's think about the
model jumps so far right we've got um
DALL-E 2 in April two years ago look at
the difference between that and any
cutting-edge model uh we can look at
GPT-3 which was four years ago um and now
we have GPT-4 GPT-5 is probably slated
imminently this year the jump from three
to four was incredible and I don't and
I think the reason why I believe
this perhaps very
surprising thing is because
uh I don't think people quite
internalize how big of a
jump can be had still we're still
so at the beginning the early phases of
this thing that um it is moving
faster than Moore's
law by a lot mm and the biggest players
right now are putting in huge quantities
of money I think I already find it
annoying to have to go to Google and
like run through a few links and then
click and then back and then click and
then back and then oh there's an ad here
okay let me scroll down you know it's
already it's I think Humanity already
can tell it's frustrating so if you were
to go hm this thing is
already kind of inefficient somewhat
frustrating in fact like I just want the
answer I don't want to have to find the
answer right I think that's the problem
these things are solving and you look at
the model jumps right over the last
three or four years it doesn't
seem surprising that like almost all the
traffic would shift to something that is
I mean Google has very low switching
costs you mean right now it happens to
be integrated well in the browser right
it happens to be
um but actually the funny thing about
Google is like it has slightly less
lock-in and ease on mobile and most
consumer traffic desktop is shrinking
for consumer while uh mobile's
dramatically increasing those lines
crossed a long time ago yeah you have
Android we do have Android but we're
talking about you know whether Google
search matters like Google could make a
model that matters and is relevant but
it might still
spell the end of its search business so
my my general feeling about this is that
the UX of something like Perplexity
we've already figured out is like a nice
UX and you combine that with another
model jump like GPT-5 or 6 it doesn't
seem that crazy to me that we end this
desire of going to Google and then
scrolling through blue links and clicking
on each of them is your default search
Google or something else oh certainly
it's Google but
I've already shifted so much it's not
my first go-to right right unless I want
to go to a very specific site I mean
people going to their address bar inside
of a browser or phone to search a
website that they're trying to go to is
just they're not really using Google's
value it's like
my dad used to type CNN.com into
Google when you could just type it in the address bar
that's not a real search yeah right um I
I already think it's not really
a great go-to interesting okay another
thing that you said let's go to meta you
said the only thing scarier than Satya is
Mark Zuckerberg taking AI seriously
unpack
that well I feel like Mark has been very
focused on VR because he's trying to do
something that I think you know just
regardless of your view of whether VR is
going to succeed or not it's ambitious
if he succeeds and I think he's like a
very relentless entrepreneur and founder
so he's one of the few entrepreneurs
and founders that are like running a
trillion-dollar-plus company not that
many left so I somewhat
feel like it's him and Jensen I think
yeah and he's and he's he's very young
still so I think that you know for him
to take AI seriously and the thing about Meta
is it is super set up to succeed at this
they're like the
first or second biggest research lab they
are they have an immense quantity of
compute that's only growing I mean I
think he talked about having 350,000
h100s by the end of the year or
something like that yeah and they're
going to have 650,000 total
GPU equivalents
by I think the end of the year which is
crazy yeah he's got an extremely
ambitious AI research leader that's a
lot of
GPUs it's a lot how many do you guys
have uh no not anywhere close to that
I mean more than a thousand not more
than a thousand right so it's just
crazy I mean speaking with ServiceNow
also which is a 150 billion 160
billion public company like they
wouldn't say that they have in the
thousands in an interview that I did
with them wow so yeah wow to have
600,000 it's crazy yeah I just I
just think that you know you combine
founder with uh Relentless ambition with
compute with the best talent you know to
me it's a recipe that is hard to I mean
and then you compare that with Google
you know it feels a little like you know
uh to me it feels like uh they're forced
to be reckoned with in the next few
years okay let's talk about Nvidia
speaking of Jensen I want to test an
assumption here uh I recently found
out that basically the
software that they sell along
with their chips is core to training AI
models and that makes switching away a
lot more difficult is that something
that you're finding in your business
that you're using the chips and the
software to train models and you'd have
a hard time switching to like an
AMD yeah the software is called CUDA and
it's like their platform
it's their way of
interfacing with their
GPUs uh and so you know it has lock-in
in the sense that there's like a huge
developer Community around it just like
x86 or something like that you know
maybe there's um you know software
that's really tuned and optimized for
x86 so that's what causes people to
kind of stay on it with CUDA um you know
it's not CUDA that's keeping I think
a lot of us it's actually that
there is nothing really dramatically
better than nvidia's gpus and so if
there's nothing dramatically better then
I mean the reality is the costs for
training and inference are so high at
companies at scale that CUDA is
not like a big reason why you're going
to stay there it's going to
come down to compute costs and so if
were somebody that were really driving
the costs down for the rest of us we
would all flip because it would be worth
it so to me it's not
just a function of CUDA you know I think
that is true to some extent
but I think for the big companies or
anyone spending a lot of money uh you
know we all want there
to be someone that can compete with
Nvidia because one of the problems with
Nvidia was that they released their
H100 but they didn't really reduce its
cost mhm you know it's
1.9x faster but 2x
costlier uh and it
technically reduces your cost because
you're getting more GPU compute per node
like you have a server a server costs a
finite amount well now you can put more
GPU-dense compute per node so
your cost goes down but they didn't really
price their GPUs lower so that's
somewhat disappointing because it would
have been nice if it were the same price
but double the compute obviously
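The per-node arithmetic being described can be made concrete with a rough sketch. Only the 1.9x speed and 2x price ratios come from the conversation; the dollar figures and the 8-GPU node size below are made-up illustration values.

```python
# Rough sketch of the node-cost argument: the 1.9x speed and 2x price
# ratios are from the conversation; all dollar figures are hypothetical.

def cost_per_throughput(gpu_price, gpus_per_node, node_overhead, speed):
    """Total node cost divided by total node throughput (arbitrary units)."""
    total_cost = gpu_price * gpus_per_node + node_overhead
    total_throughput = speed * gpus_per_node
    return total_cost / total_throughput

# Previous generation: 8 GPUs at a hypothetical $10k each at 1.0x speed,
# plus $20k of fixed per-server cost (chassis, CPUs, networking).
old = cost_per_throughput(10_000, 8, 20_000, 1.0)

# H100 generation per the conversation: 2x the price, 1.9x the speed.
new = cost_per_throughput(20_000, 8, 20_000, 1.9)

# Per chip alone, 2x price for 1.9x speed is slightly worse value, but
# amortizing the fixed server cost over denser compute still lowers the
# cost per unit of throughput, just nowhere near by half.
print(f"old: {old:.0f}  new: {new:.0f}")
```

Under these assumed numbers the new node comes out only a few percent cheaper per unit of throughput, which matches the point that the density gain helps but is "somewhat disappointing" relative to same-price-double-compute.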
Nvidia knows what they have right and so
what about a company like Amazon they're
obviously developing their own chip
they're making models available off the
shelf people are using
AWS compute I imagine to run models
what's your perspective on Amazon's
place here I mean they also have Alexa
which is like you know the sleeping
giant yeah I mean I think AWS
has significantly missed the mark
actually on this I think that Azure and
GCP are doing better
Azure better than GCP better than AWS
AWS is interesting we were
looking for compute last year and AWS
wanted to charge us five times more for
the same
GPU uh than 10 different providers all
around
them would you stop at a gas station
that cost you five times more
than the one right next to it I would
speed right by it
right and that's kind of what
you know I think what's happening there
is that this is my insight um so I hope
it's helpful for someone but my guess is
that they have a scarce stockpile of
gpus and they know that they can price
those gpus internally they can price
them to their internal customers at that
price and the customers will buy it
because the customers can't go anywhere
else you know maybe maybe because
they're not allowed to in their company
so then they can charge five times more
and that's what the sales reps are doing
but if you are a new customer we have
choices
you're not doing that you're not going
to do that but the sales reps will do it
because it helps them reach you know
quota that's crazy so I think
there's a sort of
short-termism it kind of feels like ever since
Andy Jassy became CEO AWS has turned
very short-term minded um about how it's
going to earn revenue and this is
obviously bad because anyone that knows
anything about startups knows that the
the biggest companies are yet to be
built but they're definitely not going
to be running on AWS if their compute is
five times more expensive wow yeah
that's crazy and I wonder what that
means for startups like Anthropic that
you know have billions of funding
from Amazon and are going there they
might be priced at 35x yeah exactly um
let's talk Apple real quick I wonder
what they're going to do with AI I mean
they're hinting that they're going to do
something at WWDC it's like they're
going to make a supercharged Siri or you
know take the search bar away from
Google and then give up on all that
money they're getting
um they have some incumbency
disadvantage don't they
because if they really push hard on
AI to you know take up more room in the
operating system then they can crowd out
some of the advantages that they have
today yeah I think Apple is in a
really good position because their
culture is already seemingly like one
where they wait and
see and their advantages are not easily
eroded because they own the hardware
platform and all the network effects
that are associated with that so Apple
seems like they're in a really healthy
position to wait and see and build the
best things not just build uh kind of
like aimlessly and Google feels like
it's just trying to build everything
right they're building image gen and
Gemini and coding and they're
building an IDE and then
they've clearly asked all the PMs
to integrate it this week
into every imaginable product I
opened Gmail I opened Docs I opened so
many different random Google things uh
and they're all trying to convince me to
use AI and I think Apple is super well
positioned to just like let Google do
all those
experiments and then just pluck the ones
that are the best
ones and use its massive install base
and distribution power uh to deliver an
amazing experience not a rushed
one so I think Apple is behind but I
think that they are often okay being
behind and they execute very well
uh kind of from behind because they find
ways to leap um yeah all right let's
talk lastly about Microsoft and
OpenAI you know you gave a pretty
strong statement about Amazon I'm
curious what you
think about the current offering from
those companies I mean obviously you're
competing with them on the image gen
front um and also just like from your
sense do you think that the open AI
situation is stable right now or are
there going to be more fireworks on the
governance side
there huh you know I think
that the folks at OpenAI really only
care about one thing and I think
people don't fully internalize this
because it seems a little too
crazy sometimes when you
read a company's mission you're like
whatever but I think
that Sam is genuinely focused on AGI and
attaining that and I think he does not
care about graphics and you know video
necessarily I think those are
stepping stones and that helps research
uh you know get to the next point but I
think he is very focused on that and so
I think you know broadly we don't tend
to worry about that because we're
pouring all of our energy uh into
Graphics um I can't I can't say much
about Microsoft but I can just say that
I you know genuinely believe that open
AI is trying to pursue that effort um
I can't tell whether that'll be three
years from now or 30 years from now
though yeah in terms of what Microsoft's
doing yeah I don't know not sure but
brilliant play by Satya either way
seriously yeah no matter what
happens they're the most valuable company
in the world and they seem like
they're the tech giant that's in the
best position right now which is wild
given where they were like seven years
ago eight years ago and also
just very surprising because it's like
if they had not done that they would
have been maybe in the worst position
totally I mean being aggressive
sometimes it matters they've learned
their lesson right they sat by and tried
to ride windows for as long as they
could and then people were like yeah we
don't want to use desktop operating
systems anymore and they're like oh
that's interesting okay and right and
the person that led that shift from you
know one era of computing to another was
Satya in the server and tools division so
here he goes again yeah you know he
is doing something that I find
even slightly more
brilliant which is not just the OpenAI
deal but if you observe very carefully
he is actually partnering with
everybody he is bringing all the models
into Azure right um and he's doing it
very methodically and I think that he is
really setting up Azure to
leap and be a lot more
competitive so I actually think
that he's doing a really good job kind
of playing every field yeah uh and
positioning himself
positioning Microsoft sorry uh kind of
in the middle of all that um so you know
game recognizes game
totally all right uh just to end I
want to say that um I actually reached
out to you initially uh when you had a
tweet advising founders that you know if
you're going to speak with a journalist
speak with uh someone who's independent
and I certainly am independent and uh I
DMed you and you
lived up to your words so I appreciate
that and I'd also say that like these
conversations are super valuable and um
I think that speaking with
journalists inside we're probably not
going to agree on this one so maybe it's a
different conversation but
speaking with journalists inside um some
of the corporate media um I don't think
they're all out to get tech founders
um especially off the record
conversations sort of like if there's a
divide between founders and
reporters then the misunderstandings
will just grow um but anyway that's my piece so
I appreciate you being here though but
go ahead yeah I think that's sort
of it the real issue is not that
the individual reporter is a bad
person I think they're all
well-meaning well-intentioned so
if we have conversations with
them right if you have a conversation
over drinks or dinner whatever they're
obviously good people
well-meaning people working hard that's not
so much the issue a lot of us
who basically think
that you should largely stop talking to
the institutional media it's not
that we think they're bad it's that we
think that their institutions are bad
and their institutions create incentives
that uh create bad situations um you
know uh I think we
should also be a
little bit curious like what is the
cause that causes a reporter to write a
story and then email you and say you
know do you have a comment and then
publish the story one hour later what is
the cause for that you know is that
person a bad person probably not that
person is under some kind of deadline or
incentive or pressure that is causing
this thing and I
pick on this instance because it's
a very obvious one that everybody knows
is bad you know it
has uh really bad implications not
giving founders time to respond to something
um it's happened to a lot of my
friends a lot of other people talk about
this so you know I think that's the real
issue and that's why you know you could
just as well work at you know any of
these media institutions but the fact
that you're independent causes your
incentives and your desires of what you
want to write and what you want to do to
be totally different and the reporters
that used to work at some of these
institutions that have struck out on
their own like
you uh you can see it all get cleaned up
right they completely change what
they write uh what their beats are and
how they work and interact with other
people in the world
um so I think it's like it's a lot
better more factual more interesting um
reporting yeah it's interesting I
mean like I obviously am competing
against like the broader media ecosystem
um so I do hear you on that front anyway
it's one it's something that we could
talk about forever it is good to hear
your perspective on it and once
again I appreciate that you put
something out there in the world and
then when I was like all right let's
talk you said yes so I hope this isn't
the last time I hope to have you back
and so thrilled that you were able to
come on and join and talk about all the
new stuff that you're working on in the
broader industry it's like it's cool to
be able to speak with someone who like
you read their stuff on Twitter and then
like you have a conversation like this
that goes longer than an hour and it
could easily go two or more so uh the
substance is there and and appreciate
you being here thanks again yeah thank
you for having me all right everybody
thank you for listening we'll be back on
Friday to break down the news as we do
every week and we'll see you next time
on big technology podcast