How Amazon Rebuilt Alexa From The Ground Up

Channel: Alex Kantrowitz

Published at: 2025-03-05

YouTube video id: Jq-JmvFUXCM

Source: https://www.youtube.com/watch?v=Jq-JmvFUXCM

the Amazon leaders who spearheaded the
new Alexa are here in studio to talk
about what it took to rebuild the
pioneering AI and where voice AI is
heading in the age of large language
models we're joined today by Panos Panay
Amazon senior vice president of devices
and services and Daniel Rausch the VP of
Alexa and Fire TV gentlemen great to see
you welcome to the show thanks man so
great to be here sounded kind of fun so
you you both sounded kind of fun yeah
you both must be relieved to have this
out uh yeah I mean excited relieved is a
tricky word on this one you know we're
we're finishing the product now it's
coming out next month so we're pumped
that we're through the event and yeah
there's some relief I would say would
you agree you feel a little bit of
relief but the truth is like it's all
about getting it into a customer's hands
as fast as possible so you still the
team's feeling that urgency right now
yeah that's the big moment for the team
right you get that first customer
response so we're we still feel like
we're building towards it but yesterday
was great okay so I have uh three Echo
devices in my house we have three rooms
yeah what are they house is generous uh
but in my apartment there's one in the
bedroom there's one in the kitchen
dining room and there's one in the
office yeah so they are first generation
I'm really looking forward to getting
these updates working hopefully within
these devices and uh and getting a
chance to use a new and improved Alexa
I've been hanging on to the Echos for a
long time in the hope that something
like this would happen so we're here and
I I'm just I was at your your event
where you were announcing it I'll give
listeners a little bit understanding of
what I saw and then we're going to go
into some questions about what it was
like to build this so this new Alexa
it's called Alexa plus it is
conversational so it understands natural
language it understands your context and
you don't have to say Alexa every time
it will sort of have a back and forth
with you it is I think you could call it
agentic it allows you to take action
like book a table call an Uber it will
go out in the world and help monitor
ticket prices for you for instance and
it's also deeply integrated into Amazon
services and namely Prime it's going to
be free for Prime members $19.99 a
month if you're not a Prime member
and the coolest thing
I saw in the demo was that uh you you I
think one of you asked for a uh the song
with what was it Bradley Cooper and Lady
Gaga and didn't say Lady Gaga I just said
Bradley Cooper Bradley Cooper uh in in
that what was the movie called uh A Star Is
Born A Star Is Born great great movie and
then it it called up pulled up played the
song and then you said now let me see it
in the movie and it connects to
Prime Video and you could see it in the
movie so very cool product uh definitely
I think what a lot of us have been
hoping to see from the Alexa team and
from Amazon on Alexa um we're going to
talk a little bit about what it took to
build it and then the strategy here so I
think the first question I need to ask
you both is uh what what did take so
long because I think that like for all
of us who've you know I think there's 500
or 600 million uh Alexa-enabled devices
out there we've been wondering as the
OpenAIs of the world and other companies have
made these big advances on voice AI when
Amazon was going to make its move and
and you have made the move um but what
what was the uh process that made it
take as long as it has Panos I think the
easiest way to say it is uh you've
started the summary when you have
hundreds of millions of customers that
are active right now you I mean this is
we talked a little bit about it
yesterday but every one of them matter
how do we make sure they all get the
great experience they need um meaning
you can't start from
zero and ignore it and if you could it
could be much faster although it's not
that easy to hook up the thousands of
apis and all the partners that we're
bringing together and all the experts
it's not it takes time but the first
thing is and there's two parts to it but
the first thing is you got hundreds of
millions of customers they love certain
things that they do on Alexa today they
might not love everything but they love
certain things for sure you can't you
can't leave that behind can't wake up
one day and whatever you use Alexa for
whether it's timers or music you can't
not make it better and great and so you
don't feel like something was taken away
from you when you when you take
something away from a customer um you've
just missed you've missed and so that's
one it takes time to make sure you can
get it all done so everything on what
you would call Alexa not Alexa plus
works on Alexa plus but better and that
was just the first point part of the
vision can't leave anyone behind which
was important we can talk about devices
and so forth but customers who love
their products um that are in and they
need them we can't take that away that
was one second piece is you're rearchitecting
from the ground up so you've got first
the weight of keeping hundreds of
millions of customers and then you're
rearchitecting from the ground up if we
started from zero customers I think this
is a different story you can move a lot
faster we can solve problems and then
just add features as we go if that makes
sense so maybe we just had a
conversationalist a pretty cool one then
we can add personalization then we can
add memory then we can add the experts
and people would just get updates along
the way and maybe learn and be great
however on day one we need to support
everything people love and know about
Alexa day one and so a little bit of
patience there um it takes a little bit
longer and the vision was the vision
was clear like we're going to go
bring a conversational agent forward an
assistant for everyone that is smart has
memory can personalize to you and then
ultimately be incredibly useful and so
we when we had that laid out we're okay
great but we can't leave any customers
behind and right at that point you kind
of step back once you put the vision
together you realize you need a full
rearchitecture but you're not going to
leave your customers out so you're
rearchitecting pretty much two stacks at
that point one what is classically known
as Alexa to be awesome and come into
this conversational world and the other
is everything new that it has to do yeah
and I want to go a level deeper with
Daniel on this one because Panos what you're
talking about a rearchitecture is sort
of what I've heard has been the holdup
here with Alexa for all these years
which is that and Daniel tell me if I'm
wrong but basically what folks have told
me is that the old version or the
original version of Alexa was built with
a lot of like if-then commands right so you
know it will understand some structured
commands turn on the lights okay then it
will take that and almost like
deterministically say okay I understand
this command this is what I'm going to
do turn the switch with large language
models it's a completely different ball
game because you have to make room for
uncertainty so actually the fact that
you've been able to introduce an Alexa
with large language models which I think
will have will be able to keep that
functionality is an engineering feat um
that's my perspective from the outside
what what is it actually like on the
inside and how close is that assessment
to the challenge well the team will love
to hear you say engineering feat uh cuz
I do think that I think there's no lack of
feats it is it is real that is the size
of the task for sure I think um I think
you're you're on to it for sure uh you
know large language models the one thing
I'd add just in terms of thinking
through the technical architecture to
what Pano said is that it's really just
the the latest generations of large
language models that can even do the
things that Alexa needs to be able to do
uh so you're talking about our Nova
models right which we uh announced
within the last few months and starting
to get into customers hands that's super
exciting uh you know partnership that we
have with Anthropic like you you really
need very state-of-the-art technology at
the base of the architecture and those
large language models and in large part
because of what you said we need them to
behave in ways that we can predict and
are certain someone says lock my door or
you know play that song you want it to
happen right some are higher consequence
than others and you really need to get
it right but you also want all the
elegance and nuance and understanding
and non-deterministic behaviors of large
language models themselves right so we
would call that a stochastic system that
you know it's literally at runtime that
you're making those determinations so if
you want to integrate tens of thousands
of services on day one day one out of
the box take advantage of everything
that Alexa has always been able to do as
Panos was saying and introduce all of
this new unbelievable behavior that you
can get out of large language models
that is a big engineering feat so how
does it know when the user is saying
Turn The Lights On versus like something
more esoteric like is there something
built within the technology that's kind
of like a switcher that determines first
your intent and then decides which part
of the model to send it out to the way
to think about it is you know at at the
base level you have large language
models and you have this model agnostic
system that's even itself going to
choose the right model for the job and
the models play different roles in there
what what's already happened is um even
even honestly sort of in the way you
asked a few of the questions is that
people assume the large language model
is the product a product like Alexa is
so much more than quote unquote just a
large language model so you have models
playing many different roles in the in
the system overall even models helping
us decide which model and models
themselves deciding if they're the best
you know tool for the job so to speak so
then you have a system that
progressively decides how to get
something done I wouldn't think about it
like a switch or something in classic
computer science that is a you know it's
a gate that's not that's not how the
system works it's it's a collection of
model behaviors and systems Downstream
of that that complete specific tasks and
that and that's where we introduce this
term expert to try to help coalesce around
the system behavior and explain it
better the large language models are
interacting with these experts that do
things like get you the sports score
play a song play a video know where you
are in the song so that you can go to
the video like all the things that you
saw yesterday at the event and so Panos
this is a mixture of experts model it is
you think about it as a mixture of
experts model but each expert
theoretically has its own model as well
so you're building on top of it each
expert is smarter when you think experts
it's like it's a weird term yeah but
there's think photos Smart Home
entertainment whether that's music or
video local info info all the partners
that connect you have communication
expert you have an artifact expert you
have a memory expert you have a
personalization expert each of them play
a role and they kind of arbitrate with
each other at all times so like the
model is just lighting up when it
determines that that's what you want to
do that's right uh just Daniel kind of
said it well like because the LLM at the
bottom of that stack is it's deter it's
deterministic it's choosing which model
to use then the experts come into play
on top of it it's pretty it's a pretty
phenomenal way to you know it's a pretty
interesting way to think about it this
is a mixture of experts model for those
at home it's been part of what DeepSeek
has used to become much more efficient
uh in its in its reasoning for instance
because instead of lighting up the
entire large language model it is deciding
to light up certain areas that might be
I it's not a DeepSeek innovation but
they've just kind of used it to an
extreme extent um has that has using
that architecture helped you build this
in a way that for instance like reducing
latency or sort of lightening the
compute burden that you otherwise might
have had if you want something
incredibly fast stable even secure like
the paths on data right that where
you're really taking care of customers
um this is this is the fundamental
approach I think that that is
state-of-the-art and accurate and for
sure accurate don't forget accurate so
important yeah but on on that note I
mean are the is the new Alexa is there
going to be some sacrifice to having
those Alexa commands those standard turn
the lights on set the alarm um to in
order to enable all the LLMs to work the
way that they they're going to I think
you just called out the sacrifice and
it's time okay how long it's taken us to
get to where we are it's why it's my
favorite question like why is it taking
you so long like if I told you where we
were four months ago on somebody said
lock that door and then we had to
determine what that meant
versus in the past lock my front door
and you had to know it was the front
door and you had to say front door um
it's pretty phenomenal but you know 6
months ago it took longer than anyone
would wait to lock a door and you know
our customers need immediate response
and we won't make that trade-off so to
be that accurate with the latency that's
needed with the speed sub two seconds at
the end of the day um you end up you end
up needing a little bit more time refining
the expert so the expert can be quicker
and the model can pick the right model
quicker and the smaller model can be
trained to make sure it knows where the
door is he gave an example earlier which
I thought it's a Nuance but let me just
share it with
you previously in Alexa you couldn't say
play that
song it would look for a song called
that right it was that
simple now the model has to reason and
say that song I wonder what he's asking
I wonder what she's asking I wonder what
the person's asking that's what's
happening in the system then the expert
shows up looks at the history the
personalization what conversation were
we having play that song oh he's talking
about the conversation we just had about
Bradley Cooper and Lady Gaga Shallow
Shallow play that all happens in you
know sub 2000 you know how how many
milliseconds are we talking we count in
single milliseconds now in component so
now you're all that is going on and the
Stack's working through it versus today
which is play Shallow and that's the
only way you're going to play Shallow
yep that's it and so I think it's just
understanding that
Nuance um in where natural language
comes in where you can talk to the you
can talk to Alexa without being precise
just like you can talk to me and I'll
use some micro-tells to get you know are
you asking me a rude question a great
question a nice question are you leading
me um and then from those micro-tells I
can then move to the words and then
determine where you're taking me and you
don't have to write it down type it and
read it exactly all that is happening
now in the machine which is pretty
powerful there was a cool scene in your
demo uh at the event at the launch event
where I think Panos it was you where you
said don't play the music in the baby's
room yeah so and it's really I didn't
say that so that's very explicit too
like don't play the music in the baby's
room it will the model will come up the
expert will show up the music expert
this is where it's super powerful and go
got it play it everywhere else or you
can just say don't wake the baby play
the music everywhere then the model will
go don't play it in the baby's room I
know I know what they're asking so this
is where that just that small model in
the expert does its job um and the fact
that you can just naturally move it
around in that demo I don't know if you
noticed by the way
nerve-wracking yeah so for listeners
Panos did this entire demo live I mean
we're going to talk about Apple
Intelligence in a second but Apple
Intelligence I was at the WWDC launch
event and uh it was all a vision and
what we saw at this Alexa launch event
was a working demo now look I I mean we
know to reserve us commentators know to
reserve complete judgment until it's in
our hands yeah you have to for sure but
it was real it was all real real and
working yeah but what makes you nervous
in an event like that you're not worried
about the product working I mean six
months ago I would have worried about
the product working and I would have
shown you more Vision
demos like videos but the product's
working
the challenge is the
infrastructure the thousands of Wi-Fi
signals that are pinging around that
room like it's just an unusual these
live environments are very unusual turns
out Tech reporters like Tech and they're
using a lot of it we're all on the Wi-Fi
well more more I mean the signals that
are being pulled from Bluetooth to Wi-Fi
to I mean who knows what's in pockets
and one of my favorite tech demo moments
is Steve Jobs just losing his on
stage cuz all the reporters are connected
to Wi-Fi and he's like you could either
be connected to Wi-Fi or you can have a
demo you pick totally yes that's but we
didn't have to have that situation so
but then and then you got you know the
servers have to be lit up and you you
know you're worried about latency and
what's happening in the room so you got
all that going on and now you're going
to do live and this is your baby right I
mean you love what you're about to show
you love it and if it doesn't go off
like I don't want to tell you what the
backup plan was you know what was backup
we're not going to talk about it for
real let's not talk about the backup
plan let me just say I can't tease the
backup plan and then not share the
backup plan they were really good they
were really good not a great it was not
a great backup just say no it was they
were great they were great they weren't
going to work but they were great plans
I would say I'm looking over here at
some of the team that was helping
yesterday I I uh but during that moment
um you you may have you may have heard
it's it was very nuanced at one point I
said move the music bring the music
here I want to hear the music over there
and the reason I use different
sentences I know what the model's going
to reason over and do but I wanted to
make it clear like you don't have to
think about what you want to happen you
just have to talk I want the music over
there okay and if and if the model
doesn't know or if the if Alexa doesn't
know she'll ask
you do you mean in the living room yeah
so so are we going to have a speed
trade-off here from the traditional
Alexa tasks just quickly Daniel I'm
curious like is it is the stuff I was
doing beforehand like or doing I'm doing
now um set an alarm is it going to take
a little longer because of this process
or it'll be the same amount of time no
be I mean this is where we have such a
high bar before we're willing to put it
out because and and deterministic
systems are incredibly fast right it is
it is straightforward computer science
in this day and age with an AWS cloud
and the great connectivity that everyone
has in their homes to make a
deterministic system fast on something
exactly like you said making a
non-deterministic system fast that can
respond in any way gathers all the
context figures out Legions of different
things which experts to invoke making
that system fast on something as simple
as an instruction or you know is is hard
that's quite hard what technological
breakthroughs or innovations did you
rely on to get it from a place where you
were dissatisfied with the latency to a
point now where you're happy well I think
it's it's another version of using the
right tool for the job and building
building a system that's frankly just
more complex overall to get the simple
things done so it's a bit you know
there's like an irony in that but you
need a system
that creates very fast paths for simple
things even though you started with an
incredibly complex system already you're
adding these kinds of complexity to get
simple things done so that I mean I
won't go into the specific technical
details here but that is that's the
upshot you need to be able to figure out
you're trying to do something simple so
that you can do it fast very comp and it
gets tricky you know people understand
how to speak to Alexa today I think our
new customers we want to you know and
current customers we want to open their
minds on what they can ask for and how
to how to get something done um take the
simple tasks that we have timers alarms
there's a different way to think about
them and then in the non-deterministic
world how to translate what's being said
into what's being asked for which is
different um an example you said how
quick will be setting an alarm it'll be
lightning fast and you'll likely set it
the way you always have I need an alarm
set an alarm for 8 a.m. I think that's
the classic way to set an alarm or you
can say Alexa I need to I need to wake
up tomorrow at
8 okay and now that's non-deterministic
and now it's going I think you need an
alarm and then it'll offer you an alarm
or just set it same with the timer set
me a
timer um by the way how long do you want
the timer for you say the time you can
move that to uh set me a timer to I'm
cooking my steak medium rare and then
she'll say I'm setting you a timer for
six minutes okay and so you understand
like when you get into that natural
language non-deterministic what's
happening what are you asking for you're
cooking your steak okay I'll get you six
minutes on each side or tell me how
thick it is and then the answer is you
know 2 inches thick whatever or I want a
ramen egg that's 8 minutes I got you
tell me when you start I'm starting
8-minute timer started for you and so the
world just changed from even these most
simple
tasks it it just changes in the spirit
of by the way I never knew how long it
took to cook a ramen egg so I'd always
have to go to TikTok open it spend 20
seconds watching somebody make ramen
eggs and then eventually it says put it
in the water for eight minutes like
that's all you see on TikTok for the next
week and then I would say yeah that's
very true by the way don't search
ramen eggs you'll get hammered with
ramen eggs but I I think uh and then all
of a sudden you're like got it 8 minutes
set a timer for 8 minutes now just
change it just ask for ramen egg and
Alexa will just determine what you're
looking for and give you an 8-minute
timer okay so just to wrap this section
on the technical side my note that I
wrote to myself that said they spent too
much time building the Alexa microwave
and the Alexa alarm clock and not
focusing on the technology maybe uh I
underestimated the technological lift
here a little bit
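
To make the routing idea from this segment concrete, here is a minimal Python sketch. It is not Amazon's implementation, only an illustration under stated assumptions: a deterministic fast path handles simple, precisely worded commands, and anything else is handed to an "expert" that can use conversation context (for example, to resolve "that song"). Every name here (Context, fast_path, choose_expert, music_expert, timer_expert, COOK_TIMES) is hypothetical, and the keyword check in choose_expert is only a stand-in for the model-driven selection the speakers describe. The 8-minute ramen egg figure is the one quoted in the conversation.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Context:
    """Hypothetical conversation state that experts can consult."""
    last_song: Optional[str] = None


# Cooking time quoted in the conversation (minutes); illustrative only.
COOK_TIMES = {"ramen egg": 8}


def fast_path(text: str) -> Optional[str]:
    """Deterministic route for a precisely worded timer command."""
    m = re.fullmatch(r"set (?:a )?timer for (\d+) minutes?", text)
    if m:
        return f"timer set for {int(m.group(1))} minutes"
    return None


def music_expert(text: str, ctx: Context) -> str:
    """Plays a named song, or resolves 'that song' from context."""
    m = re.fullmatch(r"play (.+)", text)
    title = m.group(1) if m else None
    if title is None or title == "that song":
        title = ctx.last_song  # fall back to what was discussed earlier
    if title is None:
        return "which song do you mean?"
    ctx.last_song = title
    return f"playing {title}"


def timer_expert(text: str, ctx: Context) -> str:
    """Infers a timer length from what the user says they are cooking."""
    for food, minutes in COOK_TIMES.items():
        if food in text:
            return f"timer set for {minutes} minutes"
    return "how long do you want the timer for?"


def choose_expert(text: str) -> Optional[Callable[[str, Context], str]]:
    """Keyword stand-in for a model deciding which expert fits."""
    if "play" in text or "song" in text:
        return music_expert
    if "timer" in text or "egg" in text:
        return timer_expert
    return None


def handle(utterance: str, ctx: Context) -> str:
    text = utterance.lower().strip()
    quick = fast_path(text)  # simple things take the fast route
    if quick is not None:
        return quick
    expert = choose_expert(text)
    if expert is None:
        return "sorry, I can't help with that yet"
    return expert(text, ctx)


if __name__ == "__main__":
    ctx = Context()
    for line in ["Set a timer for 8 minutes", "I want a ramen egg",
                 "Play Shallow", "Play that song"]:
        print(line, "->", handle(line, ctx))
```

The design point is the one made in the conversation: the overall system gets more complex so that simple requests can still be answered quickly, while looser phrasing goes through the slower, context-aware route.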
I don't know we can't determine what you
were thinking for sure but I think
there's a lift here you said it's a feat
of engineering that's where you started
we have one of the best teams on the
planet working on this uh a lot of it
has 10 years of history in it you know
there's so many um people that work on
Alexa today that have been there since
its Inception you've got a lot of
passion around that in the engineering
team and the product you know just the
product Team all up we call product
makers when you put them all in a
collection um and yeah it's a feat it's
it it it's okay though we don't it
doesn't matter if somebody thinks it
should be easy or it's not easy or
whatever it doesn't matter actually if
it feels like it's easy that sounds
pretty good to me right I mean I don't
mind it means the customer is happy like
this must have been easy like yeah okay
I don't care do you like it like do you
love it great and that's I think that's
where we go so I want to talk about the
vision of this product because and the
strategy that you're going to put into
play here because again I was sitting in
the audience and I talked about Apple
Intelligence before I guess this
segment of our conversation
I've headlined it's Apple Intelligence
but it works um and you know it's a
little facetious but I tried not to read
anything you posted coming in today
because I was like oh no I don't want to
defend or have a preconceived notion so
that's interesting to keep sharing we've
been talking on the show a lot about how
you know and yeah just we talked a lot
about the buildup to WWDC the reveal and
it was a a it seems like every big tech
company has almost the same vision and
tell me if I'm wrong here but like Apple
was like the Apple Intelligence demo was
like um you talk to Siri and ask when
your flight is and you're switching
flights and it's helping you pick your
kids up and um that demo looked a lot
like the Google Assistant demo that I've
seen like almost every year at Google I/O
and uh and then I saw your demo and I
was also just like this is a similar
idea which is that it's a a contextually
it's a contextually
aware smart AI assistant that helps you
get things done and makes your life
easier so I'm curious if if you both see
the competitive landscape in the same
way I do if there's something different
about Alexa than the others and how you
plan to win uh given the the landscape
is developing the way it is you want to
jump in so I got a long one here so why
don't you just no you start and then
I'll
go I mean here look the vision for Alexa
has been super consistent actually uh
for 10 years I think Panos this was it
made it into your final deck I believe
yesterday you know we we have always
wanted to just make lives easier better
simpler uh and be the world's best
personal assistant that's been the
vision for Alexa from the beginning um
and so now we just have a technical leap
that lets us get closer to that Vision
but nothing
you know that's been the vision since uh
for for all 10 years that Alexa has been
out there we have a much more capable AI
assistant that's
conversational that is personal and
personalized now that can get an
incredible amount of things done for you
uh but the vision is consistent okay I
want to go to Panos in a second but I I
need to follow up on that because you
know the the reaction to this reveal has
been this is great it's personalized it
has your data to help you figure things
out but then you look at a company like
Apple which has so much personal data
that people have trusted Apple with
because it's all it has this security
messaging or Google which you know has
your you know maybe your Gmail your
Google calendar Google Maps uh this is
these are the services that you use to
get around the world and interact with
people um so if you're going to be this
personalized assistant like you are
coming up against these companies that
basically have already been deeply
integrated into people's daily daily
routines so what is the play there I
mean it the the phone you're basically
asking about the role of the phone uh in
not just the phone because Google has
I'm I'm plenty of services on the
desktop I mean I'm on an Apple machine I
got Gmail open maps to figure out how to
get here calendar and so it's it's the
op almost the operating system for your
your life I mean look you you you told
us you have Echo in every room in your
home and that's great that's also true
I'm starting to think maybe too much
and well you might look at your job I
mean come
on you didn't this would be a problem
saying customers you know we do so much
for customers in the home today and of
course we're Amazon so that's not just
thank you by the way for having Echo in
every room in your home that's awesome
uh but also we probably put some
packages on your doorstep and probably
stream you some content and we've got
great deep relationships with our
customers Prime is an incredibly
valuable program for example and you
know hundreds of millions of
customers literally take value uh in
that and love it and use it all the time
so we we love our relationship with our
customers too and think that we can
deeply integrate any Services customers
want as well we work with Gmail we have
the Outlook calendar we integrate Apple
calendar I think it's a very powerful
Point like you have to take that and and
understand like we're both kind of a we
have this if you will you have music
shopping movies this is real things that
people love doing in the home I mean
this these are personal at every level
um photos but also but we're such an
open platform with thousands of Partners
um it's hard to say it's a platform so
I'd be careful with the word but at the
end of the day every single integration
Point um across Alexa gives us so many
of those insights as well but the key
and Daniel hit it he when he asked you a
question it might have been rhetorical
at some level um
I don't I don't think there's anyone
close um to being able to understand
your
home as as Amazon as Alexa it's very
it's it's it's a super important element
for us Alex like the idea that smart
home is connected to your music to your
entertainment to your life the fact that
we're now bringing in memory to Alexa
and you can have that conversation it'll
hold the context for you I I think I
don't think there's anything else like
it because then it's connected to to all
your services in a natural way too I
don't think it replaces the centerpiece
of the phone I think it just adds value
to your life in a very different way and
I think there might be a little bit of
opportunity and this is me understating
it but the ambient devices in your house
right now and the ones that you can buy
from us and some of the beautiful
products that we're both making now and
have released recently they're in your
home and you don't have to think you
don't have to open anything you don't
have to log into anything you just have
to be there and speak and it's a it's a
powerful concept when it when natural
language shows
up yeah I was with uh speaking with Jal
gani the head of prime at your event
yesterday and he was talking about how
the the family calendar is on his Alexa
device and it is a Google Calendar so
there the fact that there is that
interoperability I think where yeah you
don't have a phone uh that actually
might maybe that's an advantage I'm just
trying to it is an advantage just think
of it this way like we're not asking you
to start something new that
you don't already do right we we just
want to make it simpler for you so
Google Calendar is a great example okay
just attach all four of your family's
calendars we'll make it a family calendar
and put it front and center for you and
then when you decide if you're going to
dinner on Friday night we'll rationalize
it and you know that concept that
there's a communal device in your house
that everyone can see you know it's
something that people have been asking
for for a long time but now that you
have so much intelligence in the product
and it can do the rationalization for
you you I feel like we stand alone there
I do think I would I think this calendar
example is one that helps flip the
question a little bit in my mind
because it really is like how often do
you say well it was just on my
calendar I didn't know to meet you there
why I was on my work
calendar I say that to my wife tally you
know all the time she's like we we we
missed the restaurant we missed the
reservation so anyway having one spot
that is can be communal and personal
it's pretty powerful I want to press a
little bit on this because the phone
seems to be the place where people like
it's it's all about like where do people
interact with these assistants yep the
phone seems like it's going to be a
pretty important place it will be so if
you don't have a phone I mean again
there's some advantage in that like you
can bring any service in but like if
people are like on uh an Android and
they're summoning a Google Assistant uh
whatever the name is that week or
they're on uh an iPhone and they're
summoning Apple intelligence or Siri um
where does Alexa fit in on that like is
are you going to have to look at deeper
Integrations with these phone makers
will they even allow you to do that I
think people will use different
assistants I don't think there's any
question about it I don't think there's
one although if you lean into Alexa we
have the Alexa app on the phone and with
one touch of the button on your iPhone
you're having the same conversation
you're actually carrying the
conversation from your home to your
phone to your car to your PC with
alexa.com we thought that through
because we needed that thread for sure
so you know as she becomes more personal
to you and then you know more needed you
want to have her with you everywhere
that that app is doing a crazy cool job
right now and we haven't released the
new Alexa app yet it's coming with if
you get Alexa plus you get the Alexa app
the Alexa Plus app as well as alexa.com
alexa.com right there's going to be a web
version of this there is um and you just
see the more traditional long form work
that you do with any AI browser at this
point it's the easiest way to say it but
you also get all the personalization you
also get the context of carryover if you
had a conversation in your kitchen it'll
just remind you what conversations
you've had lately if you've booked a
reservation whatever you've done it'll
collect it there it so it'll be on your
PC and your phone as well so I think we
just want to provide that for our
customer so they have the opportunity to
say I want my assistant my single
assistant with me everywhere you might
use your phone for different things you
might use a different AI assistant on
your phone I think that's a fair you
know fair proxy I I don't I wouldn't
disagree it just depends on what's the
best path to get something done I think
Alexa will provide a lot of that best
path okay I want to take a quick break
and then talk a little bit about the
agentic elements in your uh new Alexa
release where agents uh might be going
and then maybe we dream a little bit
about where this technology is going to
lead we'll be back right after this and
we're back here on Big Technology
Podcast with two Amazon executives
responsible
for the new Alexa we have Panos Panay
here is Amazon's senior vice president
of devices and services and Daniel Rausch
is Amazon's vice president of Alexa and
Fire TV uh so it's interesting that Alex
that that um this agentic buzzword is
now starting to be translated into
things that we're things that we're
seeing in product and it's kind of
interesting because Alexa's had skills
for a while like call me an Uber and now
you can use Alexa to call you an Uber
so um is this actually like a really a
new moment for agentic AI or is this a
rebranding of some stuff that works a
little better than it has Panos what do
you
think I I can't get it to work anywhere
else I mean I think this is a at the end
of the day it's it's incredibly new but
it's also solving so many different
things at the same time first
um you have to always go back to how
much understanding is is in an utterance
just in natural language being able to
translate it and we've talked about this
already getting down to calling a
service calling the right API part
making the right partnership so that API
is called to make it as simple as
possible um it it it's uh I don't think
it's been accomplished I don't think
you're seeing it out there anywhere
connected to an assistant right now I
think there's a lot
of maybe I maybe you've seen it you got
to share with me where it is but uh I
don't think you have I don't I have not
and so what agents fundamentally like
using you know a core LLM with an agent
non-deterministic calling the right API
calling that service booking that
service bringing it back and tying it
back into all your other services it's a
demo we've we've all seen a thousand
times but haven't been able to use I
think as consumers yeah
um okay I I think yeah maybe maybe
that's the case I haven't seen those
demos myself but I do I believe it I
believe it I maybe just need to watch
closer uh but I do think it's new I
think it's new um what we've created in what
we're doing and building it up I think
it is I also think we mean we might mean
different things by agent and so I'm
just curious Alex what do you what yeah
make sure we're grounded in your
definition sure there's a ground just in
passing I mentioned yesterday in my own
part of uh part of our event you know
that boy everyone just uses this term
agent and I do think people use it in
different ways what does it mean to you
yeah it's such a great question because
I do think that in some ways that agent
has been used to rebrand
automation um we've been seeing automation
demos forever I mean even so just to
give you one example Panos I wasn't
trying to shade the Amazon uh demo I was
just to give you one example yeah we
were all I mean we a lot of folks
watching the tech world were at
Google I/O when they demoed a voice
assistant that will go they will call a
restaurant for you and book you a table
and like they did the actual
conversation and the the assistant has
like human utterance goes um well maybe
we could have a table for it's like and
then it would go and book you book you
the restaurant I don't I I just don't
remember using it but correct so again
there's the demo there's the demo and
then there's real life and but I think
it was also just like you gave a tech
command and it would go out and do that
for you um but I a lot of this stuff
like I said we we've seen demos we
haven't seen it uh actually work my
definition for agent uh is something
that can go out and and accomplish for
you uh so um you know you you had a a
good demo that I enjoyed watching about
um trying to go see a uh Red Sox Yankee
game uh by the way for folks listening
we the the reveal event was in New York
Daniel's apparently a Red Sox fan he
trolled the entire audience including
the guy sitting directly guy wearing a a
Yankee it was almost like he planned it
we I kept saying I'm are you sure you
want to do this Red Sox bit he's like
sure goes through the entire offseason
acquisitions which I like I mean as a
Mets fan I I will say fine you were fine
by the way you saw that that you saw the
info expert in action right there that's
what it was yeah because your and it was
it was not deterministic and then of
course it's a different answer every
time Alex um every time um Daniel did
the demo at the end of the day I mean it
was Alexa's decision to talk about Alex
Bregman it wasn't Daniel's like you
couldn't lead that you can't you can't
plan that and so a bit of a risky demo
because if Alexa decided not to talk
about Bregman I don't know where you
would have taken them I do know a lot
about the Red Sox so I figured you know
maybe eventually we get to buy some
tickets as what was I was thinking but
it it wasn't it was to set an example of
that kind of agentic capability of set
the baseline of what we mean which is
hey I just I actually was just having a
a chat about the Red Sox could I get
some tickets actually that's a tough
game to get oh they're expensive can you
watch for tickets for me I mean that
that was where we ended up with the demo
could have ended up in a lot of
different places but being able to set
an agent off if you want to call it an
agent in that case we think about it a
little bit differently but in that case
that agentic capability to say first of
all I could buy you these tickets right
now second of all you don't like the
price I'll watch for you infinite
patience never runs out of gas if those
tickets do drop below a certain price
I'm notified and can buy them that's a
hugely useful thing for a customer yeah
and you could buy it with a command yeah
because you're integrated with Ticketmaster
exactly yep so we so yeah to me I
would say that's agentic behavior great
I would say it qualifies um we we had
some questions in in a we have a Big
Technology Discord I was like sharing
notes with the with the crew as the
event was going on and we had some um
notes from people about what they they
want uh sort of beyond those simple use
cases cool what is I call it simple but
you know obviously there's a tech
there's a there's a tech lift to get it
done so um one one of our listeners said
is it going is Alexa still going to be
reactive to requests or can it be
proactive and suggest at the start of
the day
uh some smart ideas based on the context
that Amazon has for instance um I would
say you know do I need to order any
birthday gifts and it would then go out
go out and say well look on your
calendar there are you know five
birthdays coming up these are the dates
and these are our suggestions so is it
going to get is it because that's I
think a step further I think you're
stepping in you're stepping you said you
want to talk a little bit about the
future and how proactive Alexa can be
like there's a balance one we think
Alexa can be incredibly proactive like
to the point of when you wake up in the
morning you walk into the kitchen it's
like Alex you didn't sleep well you know
then you can imagine integration with
some partners that he like okay let's
have the conversation um also say hey
your day looks pretty packed today you
should probably find some time that
proactivity is there it's in the system
we're using it in a very different way
we don't want to be intrusive with it we
got to learn from our customers first
like how much proactivity do you want I
think it's very very important to you
know you don't want to jump to that
future you you got to be right so yeah
it's a good example wake up in the
morning and if I need to buy a birthday
gift can you just remind me we can
create reminders we can create a
conversational piece but I don't think a
lot of people want Alexa just to wake up
and start talking to you no I do think
that yeah don't want to be intrusive you
got to be really careful it's we got to
be so smart about you know we have 10
years of lessons this is what's so
awesome about it and you know how much
privacy matters and and when you want to
invoke Alexa to be part of the
conversation
versus um when you how proactive you
want you want it to be and you know we
have a balance on it but I think it's a
good push she's already proactive in the
spirit
of um she has a way to if if I if I went
out there and said hey I've been looking
for this I watched this movie last week
what was that song that was playing in
that movie okay give it that little
information check Prime video what was
he watching okay I got it I think you're
watching this movie it was this
song proactivity also includes do you
want me to play that song or you just
want the name of it and a lot of times
Alexa will say do you want me to play it
for you that's a subtle proactive it's
not intrusive it's using context you
know contextual information some memory
some of your history and in the past
you've asked me to play it every time so
I'm not don't I just ask you to play it
I think those are different forms of
proactivity but our vision includes
Alexa being proactive it has to be that
we believe the next step customers will
ask for is I I want her more not less
right and so instead of me thinking oh I
should ask a
is there a point where Alexa will know
to ask me I think that's a real question
I don't think that's today I think that
is the future um and I think you know
back to where you know we're pretty well
positioned for that if that's what
customers want I think we can do it for
them but the I think what this listener
was asking is can I just like with
natural language say um can't you know
take a look at my calendar and oh yeah
tell me something okay so that's
different that's different sorry I went
all the way to my vision but but here's
what I'll pitch back that already
happens okay so when you wake up in the
morning whoever that listener is here's
the answer yes with Alexa Plus or with
current okay sorry not with Alexa right
so this there's no there's no way it's
going to happen with Alexa okay it's not
but with Alexa plus 100% wake up in the
morning get your daily brief tell me
what's going on and I you know Alexa
knows what time you start work will warn
you of the traffic you should probably
leave by 8:20 if you got to be there by
9 today like that level of proactivity
that's in the system but you have to
engage first okay uh this this idea of
of Alexa uh being proactive like it is
it's definitely I I see where your
caution is coming from because there are
these proactive notifications that you
get with Alexa I've had to turn them off
yeah um yeah we learned from that yeah
so okay that's that's good that um
there's learning
there I I could go with some other Alexa
product feedback but I feel let's let's
use our time let's stick with Alexa Plus
for a minute but if you want to talk
about Alexa and we can tell you if Alexa
plus has fixed your frustration we too
well the one thing I'll say is I've had
uh I use it to play alarms and there
have been moments where it will play the
ad before it will play the song in the
morning so um but but that kind of goes
to a question that we did also get in
the Discord where people talked about um
they talked about who whose assistant do
you
trust and in the back of some people's
head there will be this perspective um
Amazon is just going to try to sell me
something like for instance that example
of you didn't sleep very well like all
right or it's like a suggestion for
sleeping pills coming up I don't know
exactly what it is but like how do you
get past this perception of like I'm
going to get because you do with an
assistant you trust it with a lot of
data so how do you get to the point
where where people are comfortable
sharing this data and feeling good about
the fact that it won't be used to lead
to
purchases well I mean first I think even
before you get to that part of the
question it's just how do you manage a
customer's data how do they see
transparently what you're doing what
they've told the system how do they have
control over their data so all of that's
so paramount uh that you have to start
there actually it's like one question
earlier than that which is do you trust
Alexa and the answer has to be yes so
we've been building on a foundation of
transparency and control there's the
Alexa privacy dashboard which is one great
place to see everything in terms of
system settings and your data Etc I just
want to make clear all of that carries
forward to Alexa plus I think that's
sort of the the important point to make
at the top um and then if the question
is you
know is the question boy should I be you
know should I be offered a product in a
given case where a system thinks I need
it um I find that great when it's great
it is great when it's great like I found
a pair of shoes I I don't even think it
was on Amazon uh recently through uh
something I was reading online and I've
been I've got a orthotic and you know
it's great when it's great basically I
was referred something they're awesome
Altras they have a wide toe box I'm not
going to sell Altras on your show I'm just
telling you that I found them because if
you're listening we need sponsors it's
an arcane is this the camera which camera
that's the Altra sponsor yeah P give
them a
head
sponsors Alex needs sponsors it's an
arcane example but the bottom line is
like it's great when it's great and why
is it great it's contextual it's
relevant it's offering me something that
I actually need and so building systems
where you can do that elegantly like
customers actually love that we get
feedback that that's great it's not what
what's terrible is when you get you know
inundated with things that are
irrelevant to you and so we're building
a system that doesn't do that does Alexa
need to have a screen I mean Panos put this
to you you're the head of devices at
Amazon a lot of the the demos uh that
you did at your launch event were with
Alexa with the screen again I have like
first or second generation echoes in my
house might be time to upgrade but you
should upgrade like there's a couple of
things you're missing one you're missing
speed that you could have that you don't
have and I think we speed is time for me
okay um it's comfort you know it's
confidence like there's so much like
first yeah I I would always
encourage not not just because I want to
sell the next device that's not why I
just having something modern if your
device is nine years old you you you're
missing eight years of tech okay so I'm
judging you giving what you do you know
and so your feedback is like half her at
this point but I would say okay I say
that you know jokingly but I go look um
you need more you it's better the
product's just better as it you know
generationally generation over
generation always got better now does it
need a screen inredible yeah it does
does okay it doesn't have to have a
screen it's a better experience with a
screen okay it really is now let me let
me qualify it because you have a screen
in your pocket that works with
Alexa you have a screen on your desktop
that works with
Alexa the screen in your home you should
have one it's very powerful it's nuanced
it's not intrusive the new design is
elegant it's soft if that makes sense
like where it's what you want in the
home something softer um you can get the
expression from Alexa from that screen
and she brings visual expressions as as
much as anything but but here's the
trick it will come with you in your
earbuds it'll come on your Alexa frames
it'll be in your pocket it will be in
your car so you don't always need a
screen but in your home I mean the the
the command and control the information
management what you get off of it it is
powerful will it work without a screen
absolutely absolutely and it'll be great
so need is a relative term I want you to
have a screen okay because it the
experience is that much better and uh
there's a nuance in it like when we when
we start rolling out preview the first
customers to get preview will be our
screen based customers because it's the
best experience okay that simple and so
you'll be like I want the preview and
I'll I'll
say you need a screen get a screen all
right maybe two and then I'll light
we'll light up all your we'll light up
all your Echoes but but you need a
screen okay maybe one in the kitchen one
in the office you only need one yeah
well I mean it's up to you keep the
screen out the bedroom at least that's
that's my perspective totally like you
know the only the only screen I allow in
the bedroom is the Kindle that's a cool
product but I I'm using mine here you
know I just listening to you I by the
way I got the alarm in the morning note
I get that bug filed like I got you but
the but
the but the idea that different devices
work in different places is real right
but I think you need a central hub right
now I think Alexa plus is so
dynamic um and the more you can learn to
do the screen will teach you like hey
get after it you saw Daniel's
Thumbtack demo which is a little bit
even is was more agentic if you
will for us than the Grubhub slash did
we do Grubhub or OpenTable last night
OpenTable gu t with Uber right um but
the Thumbtack demo was you know
conversation let's I need a repair
person well that agent goes out and
starts booking it for you on the website
and then you need the screen to give you
a status like working on it back in a
bit don't worry about it okay I think
that is uh that's what you want that
ambience for in the background so I
think the can't be more clear I don't
think I think it'd be great okay I'm I'm
sold I'm going to get one all right
we're running up on time here I want to
give you both a minute to answer uh this
question and then we'll head out but
it's got to be a minute or your team
here will have my head um uh we talked
about how uh voice AI might be the
future of AI or the catalyst for these
large language models on the show a
while back OpenAI for instance debuted
or introduced this advanced form of AI
uh called uh in with GPT-4o and you can
see the inflection point of ChatGPT
that the second they announced that bam
it goes from 100 million to 300 million
users um is voice AI the future of
artificial intelligence you want to
start and I'll close this out I mean
we've believed for a long time that voice
is the most natural interface uh we're
using it right now we're using it with
your listeners we're using it with each
other um it's incredibly expressive you
can load an unbelievable amount of
context and Power in it you can be
definite you can be vague you can be
nuanced so and it's just we're born with
the knowledge of how to use it and it's
completely intuitive so I think we do
strongly believe that it's one of the
best ways to get things done it is not
the only way to get things done but I do
think it's pushing us it's challenging
us to get more and more human more
natural and that's why it's always been
one of the kind of centerpieces of our
vision for Alexa so yes my answer is yes
and I think it's really pushing the
envelope now okay a minute to Panos I
think we're at that time we're this is
the inflection point and mentioned it
yesterday you know the I believe the
vision for Alexa is incredibly ambitious
it centers around voice for sure I don't
think it ends at voice I think the
interaction model needs to be the one
that's most natural to you no doubt if
you need to touch the screen to complete
a task if you need to get to your
computer and write the long form I think
it's a flow and what the thing you don't
want to do is you don't want to block
the customer from the interaction that
they need to go get something done it's
why we're on the phone it's why we're on
the PC it's why we're in your glasses
it's why we're in your ears and
ultimately though the anchoring point of
all of it is the voice because it is
natural it's innate to all of us the
trick is getting to natural conversation
the trick is trusting that you can just
talk and and realize that as we talk to
each other it's pretty sure you can talk
that way with Alexa and you're going to
find that and I think that is the
transformation that's coming I think it
finishes you know um the next chapter
ends the first chapter and starts the
next chapter and leads us to getting so
finishing is the wrong word there but
getting us to that next that next leap
over the next 10 years this is that
starting point that technology is
enabling it right now and that
inflection is happening um and it's
compelling so it was a longer way to say
yeah it starts with voice but I don't
think it ends with voice it never will
like we it is also innate to us you
always we as as humans were always going
to find the
best and easiest path to get something
done and we think voice will lead to
most of that but not all of it like we
don't want to overstate it like we will
find the best easiest which means
basically the fastest path to completion
which is why you need to upgrade your
devices and get a screen you with me I
told you already I'm buying one all
right well get on it man we did sell we
sold at least one device here in New
York good news while we're here our goal
this week was not to sell devices but
we'll do that soon very efficient and SC
we're killing it now we have a new
sponsor we uh we sold a device this is
we're we're killing it well look Panos
and Daniel um I want to just say while
we're recording that I don't take it for
granted to be speaking uh on record with
Amazon um it's always great for me to be
able to hear what you're doing and be
able to ask these questions and I'm sure
for listeners uh it'll be great as well
so thank you both for being here and
thanks for coming you so much really
great awesome well thank you everyone
for listening and we'll see you next
time on Big Technology Podcast