Spotify Co-President Gustav Söderström on their future with Generative AI

Channel: Alex Kantrowitz

Published at: 2024-11-13

YouTube video id: jV4MinWX39o

Source: https://www.youtube.com/watch?v=jV4MinWX39o

We have a great show for you today
because we're sitting here in Four World
Trade Center, Spotify's New York City
headquarters with the company's
co-president, chief product officer, and
chief technology officer. Yes, all that
in one. Gustav Söderström is here. Gustav,
great to see you. Welcome to Big
Technology.
Thank you for having me, Alex. It's a pleasure to be here.
It's great to be here. I mean, we're in a beautiful studio in your office. I've been looking around, and I can't believe how amazing the studio is. And it's cool for me to be sitting here with you because I use your app every day. Spotify is the place where I keep one of my most beloved experiences, though I wouldn't even call it a possession, since I'm subscribed to it: music. So many of us use Spotify all the time, but we rarely hear from you. So I do appreciate the opportunity to speak with you.
Me too, I appreciate that. I'm very glad to hear that, and I'd love to share as much as I can about how Spotify actually works. It's sort of a passion of mine to try to explain things and how they work. So I actually love these podcasts.
In some ways an app will determine how people experience a format, but in some ways a moment in time will determine how an app has to deal with the content within it.
Yeah.
And Spotify is going through both of those, and both of those involve artificial intelligence.
I don't know if you've heard of Suno. In fact, I'm sure you've heard of it. It's one of our favorite things to use on Big Technology Podcast. Ron and I do a show on Friday; we built a theme song with Suno and played it, and it was a good time. And I'm curious, from your perspective running product at Spotify, how do you feel about AI music, AI-generated music? Because the songs, they're not amazing, but they're good. There have been some big hits. Do you view this as an opportunity or a threat? Do you want it on your platform?
So, the way I think about it: I'm a technologist, so obviously I'm very excited about the technology itself, and I love AI. I think it's a super impressive product. It works amazingly well, and philosophically it's very interesting that something we thought was impossible just a few years ago, that a machine could sound like something a human made, could be legitimately creative. It is incredible: you prompt it, and out comes a great-sounding song. So I think that technology is amazing. Now, my interest is to think of these technologies as tools. If you think about music, it's gone through a journey of more capable tools. If you go way back, if you were a musical genius, like a Bach or someone, you literally needed access to an orchestra to be able to realize that genius. Even if you could play multiple instruments yourself, you couldn't play them at the same time, so you actually needed an orchestra. Then we got to recording music, and you could record one instrument at a time, so you got more and more independent. Then somewhere around the 80s the synthesizer came along, which meant you didn't have to be able to play all the instruments yourself. You could, quote unquote, fake the drums using the synthesizer, and the guitar, and so forth. So I think there's been this progression of more and more powerful tools that enabled more and more creativity. And then somewhere in the 90s the DAW, the digital audio workstation, came along, and, being a Swede, I'm very proud of this, someone like Avicii came along. What is interesting about Avicii is that he was not very proficient at any one instrument, nor as a singer. So in a previous world he would not have been considered a very creative person, because he couldn't have realized his genius. With access to this tool, the digital audio workstation, it turns out he was one of the most creative people we had, someone we are very, very proud of. So for him the digital audio workstation was, as Steve Jobs would say, a bicycle for the mind. It meant that he could get more productive and express his genius.
And the big question with this next round of tools is the same: is it amplifying creativity, or is it replacing people? I think it's amplifying creativity. It is giving more and more people access to being creative. You need even less motor skill on a piano or something; you need less technical skill in a digital audio workstation. So I think of them as tools. And there's this interesting question of what AI music is. I think when people say AI music, they mean something that was prompted with not too much of a prompt and not too much work, so 100% AI. But the truth is that much of the music being made today is a combination. I think many of the big artists are using AI for parts of their songs, or parts of the track, or the drums, et cetera. So I think there's actually a scale between zero AI and 100% AI, and I think we're on this progression where it's actually going to be very difficult to say what an AI song is. Does it have to be 100%, 99%, 70%, 50%?
But the real question is, do you welcome this stuff on your platform? Let's say somebody does prompt 100% AI. Spotify could fill up with songs that are AI-prompted; it's very easy to create these songs and then upload them to the internet. How do you feel about those? Do you want them?
So there are two questions there. One is, what is Spotify about? We're a tool for creators, and if creators want to use AI to enhance their music, as long as we follow the legislation and copyright laws, we want them to be able to monetize their music and get paid out, right? So for us, we are trying to support creators, and the music catalog has grown tremendously since we started, from tens of millions of tracks to hundreds of millions of tracks, and I think it's going to keep expanding. But what I think is important for us to figure out, which I think is our job and the rest of the music industry's, is this: if you go back to the years of piracy, there was this technology called peer-to-peer file sharing that was amazing.
We worked on that early on.
Exactly. We actually incorporated that technology into Spotify. But before Spotify, the technology had preceded the business model. It was great for consumers, who could now get all of this music for free, but it didn't work for creators. And I think we're in the same period of time now, where the technology has preceded the business model. So I think the technology is great, but I do think we need to find a way for the creators who have participated in this to be reimbursed. That's something that we are thinking about, and the rest of the industry is thinking about. If we can find the business model, I think we could unlock a tremendous amount. Then there's a separate question, which is whether the way these models were trained will be considered legal or not. That's a legal question that is being decided over some time period; for example, in the US these companies are now being sued. So I think that question will be decided by legislation. But let's assume there is one of these models, whether it has to be retrained on other data or not: is that an interesting tool for us? If it was trained legally, yes, if creators can participate in it.
So first of all, it's good to hear that you're already thinking about the issue of compensating creators and musicians, because, you know, I write text in addition to podcasting, and I know that models have trained on my text, and previously I wasn't going to see a dime from that. It's a little different with music, but yes, if you can channel different musicians, there should be, I think, some remuneration. But I'm going to ask one last time on this point, and then we're going to move on. Meta, for instance, has AI generators. The feeds have, I won't say filled up, but there are lots of AI-generated images, and they're engaging. Meta seems to be okay with this; it doesn't ban it. And now some of the top content on a Meta platform is Shrimp Jesus, which sort of combines two of people's great loves: God, Jesus, and seafood.
I've seen that shrimp.
It's massive. These types of images are massive on Meta.
Yeah.
So from a Spotify perspective, if these songs generated by AI music generators become engaging, and let's say they follow the rules, is that good for Spotify?
Well, I think of it like this. If creators are using these technologies, they are creating music in a legal way that we reimburse, and people listen to them and they are successful, then we should let people listen to them. What is different, though, is that I don't think it's our job to generate that music instead of the creators. That's a key difference: we are a platform for creators. Then we can have a discussion on which tools they're allowed to use. Like, they could use the digital audio workstation but not an LLM? Maybe that's not actually something we should decide for them. But there is a question of whether we should generate all the music ourselves, and that's where we're saying no. We're not going to generate that music, and other platforms maybe will, because it's cheap content, right? So that's the key difference: we decided what we want to be in this world, and it's a platform for creators. Then there's the question of which tools they're allowed to have, which is partially a legal question and partially up to the creators, I think.
Okay. So there's a potential world where one of these tools is found to have violated copyright, and you might ban creators from uploading music made with that tool.
We are already taking action there. We have detection systems for whether something is a derivative work of something that already exists, so we have systems to take those down. If you're creating something completely new that isn't a derivative of anything, there isn't a copyright infringement; then the labels tell us. So that's the other question, what these models are trained on, and we're not creating these models. So we're watching what happens there, and we're going to follow the law. But I think, at a high level, this should be a very exciting tool for creators: for musicians, for authors, for podcasters. I think if you look at something like NotebookLM, for example, it was actually created by a journalist and a writer as a tool. So my bet is that these are bicycles for the mind, but sort of bicycles for the mind on steroids, right?
And when those shifts happen, there's always tension between the people who didn't use these tools, for whom it feels like this is a little bit like cheating, and the people who are saying, no, I want to be creative too. It's always a difficult transition period. It's just the story of technology.
And by the way, we're going to get to NotebookLM in a bit, so I definitely want to hear your perspective on that. But let me ask this one. First of all, what you're describing is just what happens in tech companies: you think you have something figured out, and then the next thing you know, there's a new innovation you have to account for.
That's kind of what makes it exciting. That's what makes it fun, that it happens.
And you've already addressed where this is going. Remember, you started talking about this saying we never could have anticipated that this was possible, and now it feels like magic: a prompt, and you get a song out. I called them great earlier; they're not great, but they're good enough. And this is literally the first generation of this stuff. It's going to get better. As you think deeper about it, do we get to a place where you can start to prompt music that is going to be better than any song you might listen to that has been created for certain moods? For instance, let's say you're in an introspective mood, or a loving mood, or an angry mood, and you're just able to prompt it and create that song that perfectly touches the heart at that moment. I started off talking about how this format is beloved. Music is beloved; it touches the heart. And if AI can do that, does that become the future of music? You've already said you don't want to play in it, but is that something you can rule out?
So, I think two things. Music is used for many different things, right? For example, music that you're using to study is a good example. The extreme version of that is people listening to white noise. So, would white noise be generated? It's actually already artificially generated.
It's one of the top podcast formats, right?
So there's a scale here, and I think you're right for certain things. Maybe you could create better white noise. Maybe you could create better, you know, always-varying ambient music for your studying. Maybe for gaming, maybe that music should automatically adjust to what's happening on the screen. So I think we're going to see lots of AI-generated music for those use cases. But there's another use case which I think is very important: a lot of people use music to build their identity, right? Especially when you're a teenager. You go to a concert, you buy the jacket from that concert. Why did you buy that jacket? Well, it's like a pin. You're identifying with this band; you're building your own identity through this band. I don't think that will work with AI-generated music, because there is no one behind it. So I think some music, and I'm sure this is happening already, I'm sure many publishers are generating music for coffee tables and so forth, that will probably happen. But I do think in the human need for having someone to believe in, an actual artist that you care about: I don't think Taylor Swift will be replaced by an AI. Not because the music couldn't sound similar, but because the whole point is Taylor Swift and belonging to something. So I think it's not a binary answer, like is this going to happen or not. I think both will probably happen.
You know, two years ago I might have fully agreed with you that there's always going to be that need for the story and the human connection, and now I'm not so sure, because I do think this stuff can be good enough. It's already exceeded some of our greatest expectations. And I think we would like to think that we want that connection with the human. But all right, let's go right into NotebookLM.
But one thing to say that I think is interesting: what tends to happen in these worlds is that the thing that is scarce gets even more valuable. So one bet would be that true human connection gets more valuable than ever when a lot of what you talk to in the future may be LLMs. That would be my bet.
I'm hoping that's the case, because part of the business that I'm running is predicated on the idea that connecting to a human who can dissect and break stuff down is valuable. So I'm hoping that is the case. But I'm also not as sure as I used to be.
And I think it's wise not to be sure of anything right now, given the pace of progress.
And I think that brings us right into NotebookLM, which I was planning to leave for later, but you set it up perfectly. It's this Google product where you can put notes in, and it will actually generate a podcast with two co-hosts that sound ridiculously human.
Yeah.
They don't sound like robots. In fact, people have fed them scripts where they realize that they're actually not real people, they're AIs, and they just have this kind of breakdown, and it's insanely entertaining. But the bottom line is, they're not quite where they need to be. They're still a little hokey, I think: if you listen for a minute, you're blown away; if you listen for five minutes, you start to cringe. But they also do a good enough job of breaking things down that they can pass. And I've started to see them showing up in the second half of episodes, where people are like, "We're going to do the episode, and in the second half we're going to give you the AI to listen to." But what happens if they end up being the first half? And Spotify has made a big move into podcasts. What do you think about the rise of these AI podcast hosts?
So I think NotebookLM is very impressive, and, you know, you could predict, given the evolution of the voice quality of these things and the understanding of a language model, that this would happen. So I'm not at all surprised, in the sense that you can generate audio that is engaging to listen to, talk audio. But what I think was the great innovation of NotebookLM was this: people had been generating monologues, and what humans really respond to are dialogues. In retrospect it's pretty obvious; almost all podcasts are dialogues. If I sat here alone for one hour, it's not that interesting. So I think the big hack was to go through a piece of material, present it as a dialogue, and prompt it the right way. There was also, obviously, the internal Gemini model at Google, which is probably very good, and the voice models got better. But I actually think what they found was product-market fit for the actual audio format, and it turned out to be the podcast format, quite literally.
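The "big hack" he describes, reframing source material as a two-host conversation rather than a monologue, is at heart a prompting pattern. Here is a minimal sketch of that pattern; the template wording and the `build_prompt` helper are entirely invented for illustration, and this is not NotebookLM's actual prompt.

```python
# Hypothetical illustration of the "present it as a dialogue" prompting hack.
# The template text is invented; any chat-style LLM API could consume the
# resulting string as its instruction.

DIALOGUE_PROMPT = """You are writing a two-host podcast episode.
Host A is curious and asks short questions; Host B explains clearly.
Turn the source material below into a natural dialogue, not a monologue.

Source material:
{source}
"""

def build_prompt(source: str) -> str:
    """Fill the template with the listener's own source material."""
    return DIALOGUE_PROMPT.format(source=source)

print(build_prompt("Notes on the history of the digital audio workstation."))
```

The point is the framing, not the model: the same source material prompted as a single narrator tends to be far less engaging than two voices trading questions and answers.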
That's pretty crazy. I mean, somebody on Threads tagged me and said the male voice sounds like you. And I listened, and I was like, not the same tone, but the cadence and the type of questions, yes. I'm like, does that mean that I'm just the blend of all the different voices? Am I, like, you know, kind of the unremarkable middle of this, or did they copy my voice? I'm hoping it's the second one. It'll be interesting to see if people either get tired of hearing the same two people talk about everything, or the opposite: they get used to the same two people and would prefer to hear the same two and build trust.
I don't know. I think humans are very quick and prone to anthropomorphizing. It's sort of a hack on our human brain: you feel like you know these people because you've heard them talk about so many things now. So I think it's very interesting; it's hard to predict where it will go. As a platform, we view it the same way. Of course, people are uploading these podcasts to Spotify as well, and I don't know, off the top of my head, whether anyone has super high engagement, but certainly people are listening to them. So it's the same question. Does this turn into a tool for creative people who can write stories but don't want to build the podcast around it, or who just have no one interviewing them, so they do an interview around their own material? I think you're going to run into the same problem where, if you just ask it to talk about something, it's not going to be very good. You need good source material. So it's the same question: is this a tool for creative people to get even more productive and creative, or is it a replacement for creative people? My bet is it's another tool.
It's pretty interesting, because it sort of broadens out the long tail. For those not familiar with the industry jargon, it's basically that a lot of listening is concentrated in a small number of shows, and then there's this great long tail, right? Like if you think about a bar chart as it sweeps out, there are lots of, you know, seldom-listened-to shows. And the thing about these podcast generators, NotebookLM in particular, is you can take it and create a podcast for something so niche that it would never have a show. Similar with AI code, right? You can start coding things. I think you spoke about this in your interview with Tom on Building One, another LinkedIn podcast network show, where now you'll code things that you would never have coded before, because you can. It might go the same way with podcasts. For instance, before I was heading down to Menlo Park to interview Andrew Bosworth, I just dumped in all my source material and created a podcast about his current statements. There were like seven interviews that he and Zuck did before I showed up there, and I was able to get the summary. That podcast never would have made sense to produce, but for me it made sense. And maybe that's where this goes.
Yeah, I love that framing. One useful framing of these technologies is a financial one: the cost of something goes to zero. The cost of writing code goes to zero, the cost of doing a podcast goes to zero, the cost of prediction goes to zero. What happens? Usually the substitutes for that good get challenged, but the complements to that good explode. You have the famous example: if the price of coffee goes to zero, then tea is going to be replaced, but sugar, its complement, is going to explode. So I like that way of thinking about it, and I think what's going to happen is exactly what you're saying: we're going to have enormous amounts of content around niches where it didn't make sense to produce a podcast. One way to think about it is just that the cost went to zero. So I do think the catalog is going to explode. And then what does that mean? Well, it probably means the recommendation problem becomes even more important, because now it's even harder to keep track of everything that is uploaded. You'll have this vast sea of the perfect discussion around any topic, so the recommendation problem becomes more valuable to solve the bigger the catalog is. But I also think you're going to see the same thing we see in music: the superstars will actually also get bigger. This is what I find fascinating. People ask, is Netflix winning or YouTube? Well, the truth is both: the tail is getting bigger, but the big shows are getting bigger too. And they ask, are the indies winning, or Taylor Swift? Well, the indies are winning, but Taylor Swift is bigger than ever. I tend to see both things happening at the same time, which is why I'm hesitant to say that this will happen but not that.
Yep. Okay. Let's talk about AI recommendation. It's a big part of Spotify, and we're going to start at the end for this conversation. Right now we'll go into Spotify and there'll be some algorithmic recommendation, and there'll be some stuff that we listen to. Your vision, if I have it right, is that eventually you want Spotify to be sort of this ambient friend for us that knows the context of the situations we're in. Maybe AR: we were just talking about Orion glasses before we started recording, but maybe they'll know the context of where we are and can chime in and suggest, you know, some music that we might want to listen to. Is that right? Why would you be pursuing that?
Well, I think of it like this. When we started Spotify, and I was not part of founding Spotify, I joined in late 2008, 2009; Spotify was founded in 2006, but I was there pretty early on. It's interesting that this was before machine learning became a thing, and so Spotify was quite focused on social features for purposes of recommendation. We needed social features because that's how most people discover music: through a friend. So we wanted you to connect to people. And then AI came along, or what was called machine learning back then, and we realized something about all the playlisting data we had. One way to think about the playlisting data is almost as labeling. For themselves, users were creating a set; for Spotify, they were saying these tracks go well together, these tracks go well together. So we got a lot of labeled data, basically. And we said internally: some people have a musical friend that happens to know their taste and so forth, but most people don't. So now we can build this friend for everyone. That was the AI. But the interesting thing is that this idea of building a friend for everyone that can give music recommendations, like Discover Weekly, was always an analogy. People did not think of Discover Weekly as a friend; they thought of it as a set, as a service, and so forth. I think what's happening now with AI is that the analogy is actually becoming reality. And you can see us moving a little bit in that direction: you have the AI DJ, which starts to give Spotify a voice that talks to you. I think what is going to happen with these LLMs is that, at least for some brands, you will start having literal relationships with them. And I would love it if you came to think of Spotify as actually a friend, not an analogy anymore, but reality.
This is a thing that knows me well: a musical intelligence, a podcast intelligence, a book intelligence, and I actually like hearing it, you know, tell me about new things and suggest things I'm interested in. So I think that's where we're moving, and I think other brands are moving there as well. If you look at someone like Duolingo, they've actually only communicated through four characters all along. When you get a push notification, it's not from Duolingo, it's from Lily or Star or something.
They really give me a hard time if I'm away for a couple of hours.
And that was also kind of an analogy, but now with AI you can actually talk to these characters. So I think this is a journey many companies are on, and it's interesting to play that out. It means that part of what was called branding before is now: what personality do you want your company to have? Not as an analogy, but literally, what personality should Spotify have? I think it's a fascinating time to work in tech, and it's something we're thinking a lot about.
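The playlist-as-labels idea Söderström describes, that each user playlist implicitly says "these tracks go well together", can be sketched as a tiny co-occurrence recommender. The playlist data and track names below are invented, and real systems use far richer models; this only illustrates the labeling insight.

```python
from collections import defaultdict
from itertools import combinations

# Invented playlist data: each playlist implicitly labels its tracks
# as "going well together".
playlists = [
    ["track_a", "track_b", "track_c"],
    ["track_a", "track_b", "track_d"],
    ["track_c", "track_e"],
]

# Count how often each pair of tracks is playlisted together.
cooccur = defaultdict(lambda: defaultdict(int))
for pl in playlists:
    for t1, t2 in combinations(set(pl), 2):
        cooccur[t1][t2] += 1
        cooccur[t2][t1] += 1

def recommend(track, k=3):
    """Tracks most often playlisted alongside `track`."""
    neighbors = cooccur[track]
    return sorted(neighbors, key=neighbors.get, reverse=True)[:k]

# track_b ranks first: it appears alongside track_a in two playlists.
print(recommend("track_a"))
```

The design choice mirrors the quote: no audio analysis is needed at all, because the users' own curation supplies the supervision signal.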
And I think you might be underrating how much people view Discover Weekly as a friend. For folks who don't use Spotify, Discover Weekly basically takes into account your listening and your preferences and gives you a playlist of, what, 30 songs on a Monday morning, and they're just new songs for you to discover. And people will be like, "Uh, Discover Weekly really got me this week," or "Discover Weekly is inflicting some pain on me this week," or "What happened? I thought we had a close relationship, and now you don't know me at all." And you also have this AI DJ. You can find it in the app. Um, it's okay, I think. [laughter] I'm curious: the feedback I've heard is that people were excited about it initially and have gradually moved away from it. So now I'm sitting in front of the, you know, person running product at Spotify: what is actually happening with this AI DJ? Is the experience there, and are people using it?
Yeah. So in the numbers, they're not moving away from it. It's actually very successful.
So my friends are just pretty snobby music listeners.
Well, for the people that use it, it's actually their biggest set; it's bigger than their Discover Weekly usage. So it's quite a binary experience. For people who don't know what they want to listen to and just want to put something on, it's working very, very well. What I would say, though, is that when we launched the AI DJ, the big innovation was that we managed to digitize the voice of a real person and make it sound very believable. But the things it said around the music were, to some extent, heuristics, and kind of repetitive after a while. So what we've done since then, and this is quite recent and rolling out now, is invest quite a lot in LLMs that actually tell interesting stories about the music, and we see very strong effects from this on the retention of the application. So whereas the thing used to say, "Here's this and this song from this and that, I think you'll like it," now we can say things like, this artist was just in Copenhagen, or has played here recently. You're starting to get interesting stories. It's starting to feel more personal.
The other thing that I think is missing, and that I hope we can do someday: right now it can talk to you, and you can talk back only by skipping. But obviously, in the age of talking to machines, you would like to be able to just talk to it and say, no, this was not very good, my Discover Weekly this week was not what I wanted, and give actual feedback. That is technically very possible now with these LLMs. So that's what I'm hoping will happen. This should not be a one-way relationship, which Spotify has been for technical reasons; it should turn into a two-way relationship.
Okay, I have questions about that coming up. And to introduce that segment, I want to talk to you a little bit about how much we should allow the algorithms to dictate what our music and podcast experience is going to be, versus how much should be dictated by us. How much agency should we have over our own choices? Kyle Chayka, a New Yorker reporter, recently wrote about how he's leaving Spotify. I'm just going to put the argument out there and hear what you think; I'll read it straight from the story. He goes, "Through Spotify, I can browse many decades of published music more or less instantly. I can freely sample the work of new musicians. It has become aggravatingly difficult to find what I want to listen to." With a recent product update, he says, it became clearer than ever what the app has been pushing him to do: "Listen to what it suggests, not choose my music on my own." What do you think
about that argument?
Well, this is one individual's feedback, but generally you have very different types of users. So, I'm going to get this person back on Spotify, 100%. I think there's an interesting trade-off here that is real. People want less friction; they want to spend less time searching. You want to make things as easy as possible, right? But there is this end of the line where you just sit there and receive. You're kind of force-fed, and you don't give any signal back, maybe a few clicks and so forth. And that's something that we want to avoid. I think this is where the industry is going: more towards distraction content, towards just sitting and receiving. And there's a little bit of a dystopian end of the line there. So what is interesting with Spotify, which we are re-emphasizing, is that it was actually a platform where you invested quite a lot in your own playlisting, right? There's a trade-off here. You could have as a vision that we should be so good at machine learning that you should never playlist again; that would be the goal, because then you've supposedly done the user a great service. But then you also receive no signal, and the user makes no investment. So we're actually re-emphasizing playlisting quite a lot.
Okay.
Your own investment. And, you know, over the years we've gone more towards machine learning and algorithms because it works: people listen more, and they appreciate the service more. But we need to cater to everyone, including this reporter. The Spotify user base is divided into many different kinds of people. You have the track listeners, who only listen to playlists. You have the hardcore album listeners: I just want to listen to an album the way the creator thought about it, I don't want songs in between. You have the artist-radio listeners, who only listen to one type of artist. It's actually a big challenge to build a service that serves everyone when people are very different. So we try our best to make sure that the music aficionados who want their library to be albums can have their service, but then you have the other people who just want their daily mix to play in their ears, you know, who just want to collect tracks. They also need to be successful. So we're trying to build and cater for both. You can never please everyone 100%, but we're trying to be statistical about it, to make sure that it is vastly better for the majority of people. But our goal is to cater to everyone. And I do think there's a real point here: going to zero user investment seems good in the short term, but I don't think it's good in the long term, because you actually lose signal from that user, and in the end I think they feel less participatory in the experience. Even if the engagement looks high, if you've given no feedback, I don't know how much you feel this is actually your service.
Definitely. And look, I'll confirm that
Spotify does listen to user feedback. I
sent a tweet out a couple years ago
talking about how sometimes I'm baffled
by Spotify's product decisions, and
maybe it was because I was a reporter,
but someone from your team reached out,
and I talked about how I wanted to see
recently played. Oftentimes I'll be
listening to something, then I'll go
away from it and I can't find it in the
app. And a couple months later, there's
a recently played button in the app.
There are some great updates coming for
you as well on that topic, because this
is a big user need. Maybe it takes a
little bit longer than we want, but
obviously our goal is to listen to user
feedback and try. But sometimes we get
completely opposing user feedback.
That's the tricky thing. Who do you
listen to the most, the people who want
this desperately or the people who hate
it desperately? And there's a lot of
both types of feedback. So product
development at this scale is sort of a
statistical experience, but you still
have to have a bit of an opinion. If
you only treat it as statistics, the
application is going to be very weird
at the end of the day. You have to
combine some sort of vision and
conviction, but you still have to be
very data-driven. I think an interesting
example of user investment and AI that
we launched recently is something
called AI Playlist. This is, I think, a
good example of the first time you can
talk to Spotify. The AI DJ talks to
you, and it's getting better, but it
doesn't listen; it listens to the
clicks, maybe. But with AI Playlist, we
built this experience where you can
prompt what is essentially an LLM with
what kind of playlist you want. So we
have an LLM, and LLMs have a set of
world knowledge about music, but then
we have the music catalog and we have
your listening history. So this is an
LLM that understands your particular
taste, and you can ask it for a
playlist with, you know, big drops and
EDM for driving fast at night or
something, and it will try to do that.
And then you can say, like, "no, a bit
more upbeat," or "not that artist," and
so forth. And this, I think, is a good
mix of using AI, but not to force-feed
you stuff. It's actually very high
signal: you are literally telling us
what you want, and then when we say
"here it is," you say "that one yes,
no, no, yes," and then you can
reprompt. So it's back to, I think it
should be a two-way conversation. The
first wave of machine learning allowed
us to do the one-way push; the next
wave, generative, allows us to actually
listen to you, even in clear text.
Communicating with Spotify just through
skip buttons is a pretty narrow signal,
so it's kind of hard for us to
understand: when you skip, was it
because you hated it, or because you
liked it but had heard it too many
times? Now you can actually say, "I
really don't like this," or "remove it."
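As a rough sketch of the loop he describes (catalog plus listening signal plus iterative refinement), here is a toy version in Python. Everything in it is hypothetical: the catalog, the keyword matching that stands in for the LLM, and the refine step. A real system would prompt an actual LLM grounded in the catalog and the user's listening history.

```python
# Toy catalog: (track, artist, genre, energy). All entries are made up.
CATALOG = [
    ("Night Drive", "Kavinsky", "edm", 0.90),
    ("Big Drop", "DropLord", "edm", 0.95),
    ("Quiet Morning", "Folky", "indie folk", 0.20),
]

def seed_playlist(prompt, catalog):
    """Stand-in for the LLM call: pick tracks whose genre appears in the prompt."""
    p = prompt.lower()
    return [t for t in catalog if t[2] in p]

def refine(playlist, not_artist=None, min_energy=None):
    """Apply follow-up instructions like "not that artist" or "more upbeat"."""
    if not_artist is not None:
        playlist = [t for t in playlist if t[1] != not_artist]
    if min_energy is not None:
        playlist = [t for t in playlist if t[3] >= min_energy]
    return playlist

# "big drops and EDM for driving fast at night" -> seed, then "not that artist"
playlist = seed_playlist("big drops and edm for driving fast at night", CATALOG)
playlist = refine(playlist, not_artist="Kavinsky")
print([t[0] for t in playlist])  # ['Big Drop']
```

The point of the sketch is the shape of the interaction: each refinement is a high-signal edit from the user, not just a skip.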
So I was DMing with Kyle last night. I
was like, "Hey, I'm going to meet with
Gustav. What should I ask him?" And one
of the things he said is, should
Spotify users be able to tweak the
recommendations? And your answer here
is a resounding yes. And you're working...
Absolutely. Absolutely. We are working
on these things, both the obvious
things where you can say, "I didn't
like this particular thing," but I
think the free-text element is very
interesting. If you could talk to it,
it would probably learn much more, but
you would probably also get more trust.
Definitely. Let me ask you one broader
question about this, because I won't
stick on Kyle's stuff for the entire
conversation, but I thought it was
really interesting, and he wrote a book
called Filterworld. He's been on the
show; I'll link it in the show notes.
The main argument is that our world,
mediated by algorithms, has become too
bland, and effectively that the
algorithms have flattened out what used
to be a more vibrant experience with
things like music. Do you see that at
all?
I think this is a really interesting
argument. There are two ways I want to
address it. One is for Spotify
specifically. We've seen the feedback
that people feel like it's great for
the kind of stuff I already listen to,
but I feel like I'm in a bubble; I'm
getting more of the same, I'm not
getting new stuff. This is sort of a
Spotify-specific challenge, because
most of the time your phone is in your
pocket and you're listening. And when
you're listening, you're listening to a
session. Let's say you're listening to
indie folk or something. Then it's
quite easy for us to say, "Here's
another indie folk song," and you're
going to say, "Oh, that's a good
recommendation." But if we start
playing Metallica there, you're going
to be like, "What is this?" So most of
the recommendation inventory we have is
naturally constrained to what you're
already listening to, because we can't
put in very random things. You would
say this is a bad recommendation.
So this is a challenge for us when we
want to show you something completely
new. The favorite example is, I love
reggaeton, but you wouldn't have seen
that from my listening history. How do
we solve that problem? So we started
investing about two years ago in other
types of foreground recommendation,
sort of like the feeds that you see on
social media, where you can literally
say, "Okay, I'm bored. I want to go
wide." Then you can go into these
foreground feeds of music where you can
swipe through many tracks, and they're
very efficient. The hit rate is going
to be low, because now we're in a
territory where the whole point is we
don't know that you like this. So our
hit rate is going to be low. Then I
think you need a very efficient UI to
evaluate lots of content, right?
Because the hit rate may be one in 20,
and you're not going to listen to 20
songs; that's over an hour of music.
You need to go quick. So we try to
solve that problem for when Alex is
bored and wants to branch out, as soon
as we see that signal. We didn't have
tools for that before, so we built
that. That's part of the answer:
Spotify being an audio service made it
a bit harder to go explore. So now we
have these foreground feeds. They have
music videos (not in the US yet, but in
much of the rest of the world we have
music videos). They're very helpful
when you're evaluating new music. But the more
philosophical part of this answer is:
did the algorithms sort of flatten
things out? Because they are, to some
extent, trying to find statistical
patterns and averages. And if you look
at recommendation technology, I don't
think this is widely known yet, but
these deep-learning-based systems had
flattened out, in the sense that if you
added more user data or more
parameters, they did not get better,
unlike the LLMs. There were no scaling
laws. It is what it is, and you could
move it 2%. There's something that has
happened there recently, which is
called generative recommendations,
where you actually use a sort of large
language model instead of these old
deep learning models, and you basically
think of user actions as a language.
For us, you have a sequence: they
clicked this, they listened to that,
they clicked this, they listened to
that. If you turn that into tokens,
just as you can turn a language into
tokens, then just as you can try to
predict the missing word in a sentence,
you can try to predict the missing
action in a sequence. And it turns out
that these generative recommendations
do scale with more user data and more
parameters, just like the LLMs.
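The "user actions as a language" idea can be sketched in miniature. This toy uses simple bigram counts over tokenized actions to predict the next action; a real generative recommender would train a transformer over a vastly larger action vocabulary, which is where the scaling behavior he mentions comes from. The action tokens and histories here are made up.

```python
from collections import Counter, defaultdict

def train_bigram(sequences):
    """Count how often each action token follows each other token."""
    follows = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            follows[prev][nxt] += 1
    return follows

def predict_next(follows, action):
    """Predict the most likely next action, like predicting a missing word."""
    if action not in follows:
        return None
    return follows[action].most_common(1)[0][0]

# Hypothetical histories: clicks and listens, tokenized like words in a sentence.
histories = [
    ["click:indie_folk", "listen:song_a", "click:indie_folk", "listen:song_b"],
    ["click:indie_folk", "listen:song_a", "listen:song_a"],
]
model = train_bigram(histories)
print(predict_next(model, "click:indie_folk"))  # listen:song_a
```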
So this is a long-winded way of saying
I think he's right that the
recommendations did flatten out. It's
also true that people are changing
recommendation stacks, and now it's
unclear why they couldn't continuously
get better. So I'm hoping that the
recommendations do get more
intelligent, because now it's not just
a statistical average. They can look at
your specific user history going years
back, and they could potentially
understand that it's actually
Christmas again, and last year at
Christmas you did this. So I'm hoping
it gets more intelligent.
And one last question about
recommendations, or maybe I have two,
but one important one comes from Ranjan
Roy, who's on the Friday show with us.
He would like there to be a parent mode
on Spotify, where if you have kids, you
can say, I'm on child mode, then
recommend kid music, and then parent
mode, and don't blur my
recommendations. What do you think
about that?
So we have a bunch of different
solutions for this. Obviously, there's
a family plan, so hopefully your kid
can have their own account, and then it
doesn't cost more, and the
recommendations...
Exactly.
What are you going to do for your
three-year-old?
Exactly. The other thing is, you can
create a playlist for your kid, and
then if you click the settings, you can
say "do not include in my
recommendations," and then it actually
doesn't destroy your recommendations at
all. So there are those solutions.
We're also trying to understand when
something is kids' music, so that while
this is part of your taste profile, we
should not play it in your other sets,
because it's probably something you're
doing for a specific use case. So you
probably want a kids' music playlist in
there, but you don't want that music to
affect your other sets. There's an
algorithmic component, there's a
subscription plan component, and then
it's back to more user control: you can
already say that this playlist should
not be considered my taste, and we're
going to build more of those controls.
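That "do not include in my recommendations" control amounts to filtering flagged playlists out of the listening history before computing the taste profile. A minimal sketch with made-up data (the play log and field names are hypothetical, not Spotify's schema):

```python
from collections import Counter

# Hypothetical play log: each play records the artist and the source playlist.
plays = [
    {"artist": "Metallica", "playlist": "my mix"},
    {"artist": "Kids Choir", "playlist": "kids songs"},
    {"artist": "Metallica", "playlist": "my mix"},
]
excluded = {"kids songs"}  # playlists marked "do not include in my recommendations"

def taste_profile(plays, excluded):
    """Count artist plays, skipping playlists the user excluded from their taste."""
    return Counter(p["artist"] for p in plays if p["playlist"] not in excluded)

print(taste_profile(plays, excluded))  # Counter({'Metallica': 2})
```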
Okay. Ranjan will be happy to hear that.
Yeah.
Uh, okay. Really last question about
recommendations, then we're going to go
into podcasts and some other formats. I
don't know if you have seen this
YouTuber; his name is Fantano. He did
this thing about the Shaboozey song
being the song of the summer,
explaining why. And he made an
observation there that was interesting
to me, talking about how we used to
hear music on the radio, and the music
that was played there was often music
that would be played when we're with
other people, with friends, having a
good time. And it led to more, you
know, dance songs, rock anthems, and
stuff like this. And today we're mostly
accessing music via streaming
platforms, and he says those are much
more individualized recommendations,
which has kind of shifted the way that
music is made, and even the hits in
music. What do you think about that
argument?
So there is a philosophical question
there, which has been researched a few
times: do you have an innate taste in
your brain, and our job is to search
for that and find it, or does what we
play actually affect what you like?
There are all these experiments in
colleges where you play different songs
to different groups and then you see
what they like, and it seems like it's
a bit of both. You have some sort of
innate taste, but you're also affected
by what you hear, which supports this
argument that the radio can change your
taste. So I think there's truth to that
argument. What I think is interesting
about our music listening is that when
we survey users and we ask them what
percentage of your listening is with
others, it's a huge percentage, a
double-digit percentage. So music is
actually a very social activity still.
And in some cases we see this: we have
this feature called Jam that is taking
off like a rocket for us. It's doing
very well. And Jam is essentially, we
can detect when two phones are close to
each other: hey, do you want to join
Alex's Jam? And now we have a joint
queue. So at a party, the way you party
right now with Spotify is you don't go
and interrupt; you just bring up your
phone, you join the queue, and then you
can queue things up. So we have a lot
of joint listening, and like I said, I
don't want to say the exact percentage,
but a double-digit percentage of
listening is happening in groups. It
just looks like individual listening to
us. So I think it's actually happening
more than maybe people think. It's not
100% individual listening. But because
we don't see them as group listening,
we're still treating them as individual
listening. So now that we're getting
more data on what is good group music,
that becomes a different category. So I
think the radio use case is happening.
You're hearing songs at parties and
with others and when you're riding in
the car and so forth. It just looks to
these services like lonely listening,
but it's actually quite social,
right? Okay, let's take a quick break
and come back to talk about podcasts,
audiobooks, and see how many random
questions I can get to before our time
is out. We'll be back right after this.
And we're back here on Big Technology
Podcast with Gustav Söderström. He's
the chief product officer, chief
technology officer, and co-president of
Spotify. So Spotify is investing
heavily in podcasts. This has been
going on for a long time, first largely
through an originals strategy, and now
less so. Also audiobooks. You can find
my book, Always Day One, on Spotify if
you're a premium listener, which I'm
happy about because more people can
listen to the book. What has gone into
the decision to bring all these formats
together in one app? And are they good
businesses for you, podcasts and
audiobooks?
Yes. If we start with the first one,
how did we come to this decision? What
happened is that we saw internally at
Spotify a lot of our developers hacking
podcasts, using RSS, into the Spotify
experience. We saw it again and again
at hack weeks, and at first we thought
maybe it's a niche, random need, but we
saw it again and again. And it's like
user feedback or user research:
Spotify is still many thousands of
employees, so it's not a very
representative sample of society, but
it is some sample of society, and if
you see the same user need many times,
you should take it seriously. So we
started looking at that, and then we
looked at podcasts, which we saw had a
lot of potential and were growing, but
we didn't think anyone was doing
something very interesting with the
format. So we decided to approach it,
because we saw the user need
internally, we saw the market growing,
we sized it, and then we saw that there
was no one really investing in it.
Apple hadn't invested in it, and they
had like 98% of the market. So that's
how we came to it. And then the
question is...
Yeah, that Apple Podcasts app needs
work. Okay, but sorry, go ahead.
But we were grateful for that. So then
the question is, why in the same
application? Why not as a separate
application? There are two views of
that. One is that it's a strategic
decision. The biggest barrier to
something new right now, unfortunately,
isn't necessarily the quality of the
application. It's the user acquisition
cost.
Distribution is everything.
Distribution is still everything. And
actually, at the beginning of the
iPhone era, there was a lot of organic
distribution. People went to the app
store every day. No one goes there
anymore. So you almost have to pay for
users. So user acquisition cost is
probably the biggest inhibitor to most
business plans. If we built a separate
app, we would have to reacquire our own
users again, and that would make it
very expensive. And we have seen all of
these big American tech companies
launching app after app, and basically
nothing worked. Then we look at China,
which has a different strategy, the
super apps, where they double down on
their own distribution. So you can
think of it like podcasts
pre-installed. So that was the
strategic angle for why this made
sense.
But I actually have a user angle on
this where I think it is the better
experience. In 2024, the user should
not have to adapt to the software; the
software should adapt to the content.
So if you play a piece of music, there
should be skip buttons. If you play a
podcast, it's not rocket science to
change the skip buttons to a 15-second
scrub. And if you play an audiobook, to
change them to chapters. Like, come on,
it's 2024. Why do you have to switch
apps for that? So we actually believe
both that it was strategically the best
for us, because then we could double
down on our own distribution, but also
that long-term this is the right user
experience. It is the easiest for the
user. Now we have these beautiful
connections between the audiobook and
the author being interviewed in a
podcast on the same thing, where it's
seamless, instead of having to switch
the app and go somewhere else. So
that's the reason that we do it in the
same application.
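The format-adaptive player he describes (skip buttons for music, a 15-second scrub for podcasts, chapter jumps for audiobooks) is, as he says, not rocket science. A minimal sketch of a per-format control mapping; the control names are illustrative, not Spotify's actual code:

```python
def transport_controls(content_type):
    """Return the player controls appropriate for a given content format."""
    controls = {
        "music": ["previous_track", "play_pause", "next_track"],
        "podcast": ["back_15s", "play_pause", "forward_15s"],
        "audiobook": ["previous_chapter", "play_pause", "next_chapter"],
    }
    # Unknown formats fall back to the one control every format shares.
    return controls.get(content_type, ["play_pause"])

print(transport_controls("podcast"))  # ['back_15s', 'play_pause', 'forward_15s']
```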
And talk a little bit about
discoverability, because that's the
biggest issue for podcasts. As a
company that's an expert in
recommendations, which we've spent most
of this show talking about, that should
be something that you do pretty well.
But for instance, if I'm listening to
tech shows and I'm not listening to Big
Technology Podcast, I probably want to
see that there's a show called Big
Technology Podcast out there. And from
what I've heard, discoverability, both
from product people and from podcast
producers, has been the biggest issue.
Probably because there's a huge
investment that goes into listening to
even the first 5 minutes of a show.
That's like 2 minutes longer than your
average song to try out a new show. I
actually changed my show so that we
could do our really information-rich
intro, which you just experienced, and
then take a break and come back in,
because if people are going to try it
out, I want them to know what they're
getting, versus the typical long way.