NVIDIA VP Rev Lebaredian Talks Plan To Build AI That Understands The Real World

Channel: Alex Kantrowitz

Published at: 2025-02-05

YouTube video id: n_A1Nf7mjjA

Source: https://www.youtube.com/watch?v=n_A1Nf7mjjA

Let's talk about Nvidia's push to build AI that understands the real world, technology that could influence the future of robotics, labor, cars, Hollywood, and more. We're joined by the company's VP of Omniverse and simulation technology right after this.
Welcome to Big Technology Podcast, a show for cool-headed, nuanced conversations about the tech world and beyond. Today we're joined by Rev Lebaredian, the vice president of Omniverse and simulation technology at Nvidia, for a fascinating conversation about what may well be the next stage of AI progress: the pursuit of world models that provide common sense to AIs. Rev, I'm so happy to have you here. We actually spent some time at your headquarters a couple of months back, and I'm really glad you're here today so I can introduce you to the Big Technology audience. Welcome to the show.
Thank you for having me.

All right, before we jump into world models: obviously we're having this conversation in the wake of the DeepSeek revolution, or whatever you want to call it, and everyone is talking about Nvidia. You're in a quiet period, so we're not going to go into financials, but I can and do want to ask you about the technology side of this, specifically about Jevons paradox. I keep hearing Nvidia, Jevons paradox; Jevons paradox, Nvidia. What is Jevons paradox, and what do you think about it?
My understanding of Jevons paradox is that it's an economic principle: as you reduce the cost of running something, you create more demand for it, because it unlocks more uses of that technology once it becomes economically feasible. I think that really does apply in this case, in the same way it has applied to almost every other important computing innovation over the last 40 or 50 years, or at least as long as I've been alive.
At Nvidia's inception in 1993, the company very carefully selected the first computing problem to address, in order to create the conditions for us to keep innovating and keep growing that market. That problem was computer graphics, and particularly rendering, generating these images. The reason we selected it is that it's an endless problem: no matter how much compute you throw at it, no matter how much innovation we throw at it, you always want more. Throughout my time at Nvidia, which is now 23 years, I've heard many times, "Graphics are good enough. Rendering is good enough. Soon Nvidia's big GPUs and more computing power won't be necessary; we'll just get consumed by SoCs or integrated into another chip as integrated graphics and disappear." But that never happened, because the fundamental problem of simulating the physics of light and matter was endless.
We see this in almost every important computing domain, and AI is one of them. Can we really say we've reached the point where our computers are intelligent enough, where the intelligence we create is good enough, and so it's just going to shrink and we'll have no more use for compute power there? I don't think so. Intelligence is probably the most endless of all computing problems. If we can throw more compute at the problem, we can make more intelligence, and do it better and better. So making AI more efficient will just increase its economic value in the many applications we want to apply it to, and increase demand.
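For readers who want the mechanics: Jevons paradox falls out of simple price-elasticity arithmetic. Here's a minimal sketch with invented numbers (nothing below comes from the interview or from Nvidia): if a 10x cost reduction causes usage to grow by more than 10x, total spending on the technology rises even as the per-unit price falls.

```python
# Illustrative sketch of Jevons paradox. All numbers are assumptions,
# not figures from the interview or from NVIDIA.
#
# With constant-elasticity demand, usage = k * price**(-elasticity).
# If elasticity > 1, a price cut increases total spend.

def total_spend(price_per_unit: float, elasticity: float, k: float = 1.0) -> float:
    usage = k * price_per_unit ** (-elasticity)  # units consumed at this price
    return usage * price_per_unit                # total money spent

elasticity = 1.5  # assumed: demand grows faster than the price falls
before = total_spend(price_per_unit=1.00, elasticity=elasticity)
after = total_spend(price_per_unit=0.10, elasticity=elasticity)  # 10x cheaper

print(f"spend before: {before:.2f}")  # 1.00
print(f"spend after:  {after:.2f}")   # ~3.16: cheaper inference, more total demand
```

With these assumed numbers, a 10x price cut grows usage about 31x, so total spend roughly triples; that is the shape of the argument being made about cheaper AI and GPU demand.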
Can we talk about the progression of AI models becoming more efficient? I know it's a hot topic right now, but it does seem to me that over the past couple of years we've definitely seen models become more and more efficient. Just talking about large language models on this front, what can you tell us about the efficiency gains we've seen over time?
This isn't new. It's been happening for the past 10 or 12 years, essentially since we first discovered deep learning on our GPUs with AlexNet. If you look at the computational curve, what our GPUs can do in terms of tensor operations, the AI kind of math we need to do, over the last 10 years we've had essentially a million-x performance increase. And that increase isn't just from the raw hardware; it also comes through many layers of software and algorithms. We're getting these speedups continuously, at a very rapid, exponential rate, by compounding gains at all the different layers at which this computing happens: the fundamental hardware, the chips themselves, the systems level, networking, system software, algorithms, frameworks, and so on. What we've seen with DeepSeek is a great advancement on that same curve we've been on for a decade now.
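As rough arithmetic on how a million-x gain can accumulate: if each layer of the stack contributes a modest multiplier each year, the products compound. The per-layer numbers below are invented purely for illustration; only the compounding principle comes from the interview.

```python
# Rough arithmetic sketch of compounding gains across the stack.
# The per-layer yearly multipliers are invented for illustration,
# not NVIDIA's actual numbers.

yearly_gain = {
    "chip architecture":  1.6,
    "systems/networking": 1.3,
    "system software":    1.2,
    "algorithms":         1.6,
}

per_year = 1.0
for layer, gain in yearly_gain.items():
    per_year *= gain

years = 10
total = per_year ** years
print(f"combined gain per year: {per_year:.2f}x")  # ~4x
print(f"over {years} years: {total:,.0f}x")        # ~1,000,000x
```

The point is that no single layer needs a miracle: roughly 4x per year, compounded for a decade, is a million-fold.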
Okay, and 23 years at Nvidia. I'm going to save a question about that for later in the interview, because I'm very curious what your experience has been after being at Nvidia for so long, especially given that, at least from the outside, the company's technology was viewed as in favor, then people questioned it, then it was back in favor, then people questioned it again. Obviously we see what's going on now; maybe we're living through a mini cycle at this point. So I'm very curious about your experience, but I want to talk about the technology first.

Let me bring you into a conversation we had here on the show with Yann LeCun, Meta's chief AI scientist, right after ChatGPT came out. One of the things Yann did was say, "Go ask ChatGPT what happens if you let go of a piece of paper with your left hand." I typed it in, and it gave a very convincing answer that was completely wrong, because with text alone you don't get common sense about physics. Try as you might to teach a model physics with text, you can't; there's just not enough literature describing what happens when you drop a paper from your hand, and therefore the models are limited. Yann's point was basically: if you want to get to truly intelligent machines, you need to build something into the AI that teaches common sense, that teaches physics, and you need to look beyond words to do that. So now I turn it over to you, Rev, because I do think a big initiative within Nvidia right now is to build a picture of the world, to teach AI models the common sense Yann said was lacking. I have some follow-ups, but first I want to hear a little about what you're doing and whether your efforts are geared toward solving the problem Yann brought up.
Well, what Yann said is absolutely true, and it makes intuitive sense, right? If an AI has only been trained on words, on text we've digitized, how can it possibly know about concepts from our physical world, like what the color red really is, or what it means to hear sound, or what it means to feel? It can't know those things, because it never experienced them. When we train a model, essentially what we're doing is providing life experience to that model, and it's discerning patterns from all the experience we give it. What was really amazing about GPT and the advancements with LLMs, starting with the transformer, is that we could take this really complex set of rules that humans had no way of defining directly in a clear and robust manner, the rules of language, and pull it out of a corpus of data. We took all of this text, all these books, and whatever information we could scrape from the internet, and somehow this model figured out the patterns of language across many different languages. And because it understands the fundamental rules of language, it can do some amazing things: generate new text, restyle text you give it, translate from one form to another, from one language to another. But it lacks any information about our world other than what's been described in those words.
So the next step in AI is to take the same fundamental technology, this machine where we can feed it life experience and it figures out what the patterns and rules are, and feed it actual data about our physical world and how it works, so it can apply that same learning to the rules of physics instead of the rules of grammar and language. It's going to understand how the physical world around us works. Our thesis is that of all the AIs we're going to create in the future, the most valuable ones will be the ones that can interact with our physical world: the world we experience around us, the world made of atoms. Today, the AIs we're creating are largely about our world of knowledge, our world of information, ones and zeros, things you can easily represent inside a computer in the digital world. But if we can apply the same AI technology to the physical world around us, then essentially we unlock robotics. We can have agents with this intelligence, even superintelligence in specific tasks, doing amazing things in the world around us. If you look at global markets, at all the commerce happening in the world, the world of knowledge and information technology is somewhere between two and five trillion dollars a year. But everything else, transportation, manufacturing, supply chain, warehousing and logistics, drug development, all the stuff in the physical world, is about a hundred trillion dollars. So applying this kind of AI to the physical world is going to bring far more value to us.
So it's interesting. It's not just about inputting that real-world knowledge into LLMs so they can get the question about dropping the paper correct. You're also working on building the foundation for robots to go out into our world and operate within it.
Yes, and it's not inputting it the same way we do for text models. We're not just going to describe with words what happens when you drop a piece of paper. We're going to give these models other senses during the learning process. They'll watch videos of paper dropping. We can also give them more accurate, specific information in the 3D realm, because we can simulate physical worlds inside a computer today; we have physics simulations of worlds. We can pull ground-truth data about the position, orientation, and state of things inside that 3D world and use it as another mode of input into these models. What we'll end up with is a world foundation model trained on many different modes of data, essentially different senses. It can see, it can hear, it can touch and feel. It can do many of the things we can do, many things other animals can do, and even things no creature can do, because we can provide it with sensors that don't exist in the natural world. From that, it can decipher the actual combined rules of the world. And this encoding of knowledge about how the physical world works can then be the basis for building agents in the real world, for building the brains of these agents, otherwise known as physical robots.
Right. And this is your recently announced Cosmos project. Talk a little about what Cosmos is. Obviously it's a world foundation model, but how long have you been building it, what types of companies and developers might use it, and what might they use it for?
We've been working toward Cosmos for probably about 10 years. We envisioned that this new technology that had formed with deep learning was eventually going to be the critical technology necessary for us to create robot brains, and that that was ultimately what would unlock this incredible amount of value. So we started working toward it a long time ago.

We realized early on that the big problem we were going to have is that in order to train such a model, to train a robot brain to understand the physical world and work within it, we were going to have to give it experience. We were going to have to give it data that represents the physical world, and capturing that data from the real world is not easy. It's very expensive and, in some cases, very dangerous. Take self-driving cars, which are a type of robot: a robot that can autonomously figure out how to get from point A to point B by controlling a physical body, a car, by braking, accelerating, and steering. How are we going to ensure that a self-driving car really understands that when a child runs into the street as it's barreling along, it should stop? And how can we be sure it will actually do that without actually trying it in the real world? We don't want to go capture data of a child running across the street. Well, we can do it by simulating it inside a computer.
We realized this early on, so we set about applying all the technologies we'd been working on up to that point, computer graphics, video games and game engines, physics inside these worlds, to create a system for physically accurate world simulation that we could use to train these AIs. We call that operating system, if you will, Omniverse. It's a system for creating physics simulations, which we then use to train AIs that we can test in that same simulation before we put them out in the real world. We use it for self-driving cars and other robots. So building Cosmos actually starts with simulating the world, and we've been building that stack and those computers for quite a while.
Once the transformer was introduced and we started seeing the amazing things large language models can do, and the ChatGPT moment came, we understood that it had essentially unlocked the one thing we needed to really push forward in robotics: the ability to have this kind of general intelligence about a really complex set of rules. So we set about building what is Cosmos today, essentially a few years ago, using all the technology we had built before for simulation and AI training.
And what Cosmos is, it's actually a few things. It's a collection of open-weight models that we made freely available. Along with it, we also provide essentially all the tooling and pipelines you need to create a new world foundation model. So we give you the world foundation models we've started training, which are world-class, especially for the purposes of building physical AI. We also have what's called a tokenizer, which is an AI itself and is world-class; it's a critical element of building world foundation models. And then we have curation pipelines. The data you select and curate to feed into the training of your world foundation model is critical, and just selecting the right data requires a lot of AI in and of itself. So we released all of this and put it out in the open so the whole community can join us in building physical AI.
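To make the tokenizer idea concrete: a video tokenizer compresses raw frames into discrete tokens that a transformer can model, the way text tokenizers turn characters into token ids. The sketch below is a generic vector-quantization illustration with made-up sizes and a random codebook; it shows the concept, not Cosmos's actual tokenizer architecture.

```python
# Minimal sketch of the idea behind a video tokenizer: map patches of
# raw frames to ids in a codebook, so a transformer can model video as
# a sequence of discrete tokens. This is a generic vector-quantization
# illustration with random data, NOT Cosmos's actual tokenizer.

import numpy as np

rng = np.random.default_rng(0)

# Pretend video: 8 frames of 32x32 grayscale.
video = rng.random((8, 32, 32)).astype(np.float32)

# A "learned" codebook of 256 entries; random here for illustration.
PATCH, CODES = 8, 256
codebook = rng.random((CODES, PATCH * PATCH)).astype(np.float32)

def tokenize(frames: np.ndarray) -> np.ndarray:
    """Split frames into PATCHxPATCH patches and snap each patch to its
    nearest codebook entry, returning a grid of integer token ids."""
    f, h, w = frames.shape
    patches = (frames.reshape(f, h // PATCH, PATCH, w // PATCH, PATCH)
                     .transpose(0, 1, 3, 2, 4)
                     .reshape(-1, PATCH * PATCH))
    # Nearest neighbor in the codebook (squared Euclidean distance).
    d = ((patches[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1).reshape(f, h // PATCH, w // PATCH)

tokens = tokenize(video)
print(tokens.shape)  # (8, 4, 4): each frame becomes a 4x4 grid of token ids
```

In a real system, the codebook and the patch encoder are trained so the tokens can be decoded back into faithful video; everything is random here purely to show the data flow from pixels to discrete tokens.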
So who's going to use it? Is it going to be robotics developers? Is it going to be somebody building, let's say, an LLM-based application who just wants it to be a little smarter? Both?

It will be all of them, yes. We feel that we, as an industry, as a world, are right at the beginning of this physical AI revolution, and no one company, no one organization, is going to be able to build everything we need. So we're building it out in the open, to encourage others to come build on top of what we've built, and to come build it with us.
This is going to be essentially anybody with an application that involves the physical world. Robotics companies are definitely part of it, and robotics in the very general sense: that includes self-driving car companies and robotaxi companies, as well as companies building robots for our factories and warehouses. Anybody who wants to make intelligent robots that have perception and operate autonomously in the real world wants this. But it's not only about robots as we usually think of them, as agents that move around. We have sensors that we're placing in our spaces, in our cities, in urban environments, inside buildings. These sensors need to understand what's happening in the world around them, maybe for security, for coordinating other robots, or for managing the climate control and energy efficiency of our buildings and data centers. So there are many applications of physical AI that are broader than what you imagine when you hear "a robotics application." There are going to be thousands and thousands of companies building these physical AIs, and this is just the beginning.
Now, you mentioned that the transformer was an important development on this path, and it obviously underpinned a lot of the real innovation we've seen with large language models. Can real-world AI learn from the knowledge base that has been distilled into these text models? If you have a model that's trying to understand the world with common sense, does it take text as an input?

They take all of it as input.

Then how does it work with text? It's very interesting, because when we talk about the progression toward general intelligence, being able to read something and intuit what it means in a physical space is a pretty amazing capability, don't you think?
Yeah. The way I think about it, and I think this is right, is that these AIs learn the same way we do. When you're brought into this world, you don't know who is mommy, who is daddy. You don't even know how to see yet; you don't have depth perception; you can't see color or understand what it is. You don't know language. But you learn by being bombarded with all of this information simultaneously through many different senses. When your mommy looks at you and says, "I'm mommy," pointing, you're getting multiple modes of information coming at you, including, essentially, text coming through in audio form. Eventually, when you learn how to read, you learn because a teacher points at letters, and then words, and sounds them out. So you build an association between the information you understand, like mommy, and the letters that mean that thing.
AIs learn the same way. When we train them, if we give them all of these modes of information associated with each other at the same time, they'll associate them together. That's how image generators work today. When you generate an image using a text prompt, the reason it can generate, say, an image of a red ball in a grass field on an overcast day is that when it was trained, text was associated with the images fed into it. It knew during the training process that these words were related to that image, so it can gather that understanding from the association.
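One widely used mechanism for learning exactly this kind of cross-modal association (the interview doesn't name a specific one) is a CLIP-style contrastive objective: embed paired captions and images, then train so each caption scores highest against its own image. A minimal sketch with random stand-in embeddings:

```python
# Sketch of one common way text/image associations are learned during
# training: a CLIP-style contrastive objective. The interview doesn't
# specify the mechanism; the embeddings here are random stand-ins.

import numpy as np

rng = np.random.default_rng(0)
BATCH, DIM = 4, 16

# Stand-ins for encoder outputs: text_emb[i] describes image_emb[i].
text_emb = rng.standard_normal((BATCH, DIM))
image_emb = rng.standard_normal((BATCH, DIM))

def l2_normalize(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Similarity of every caption with every image in the batch.
logits = l2_normalize(text_emb) @ l2_normalize(image_emb).T  # (BATCH, BATCH)

# Contrastive loss: each caption should match its own image (the
# diagonal), not the others. Training lowers this loss, which is what
# ties "red ball in a grass field" to the pixels it appeared with.
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
loss = -np.log(np.diag(probs)).mean()
print(f"contrastive loss: {loss:.3f}")
```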
What we're trying to do with world foundation models is take it to the next level, giving the model more and richer modes of information, but part of that will still include text. We'll feed in the text along with the video and other ground-truth information about the physical state of the world.

Yeah. So this is going to be a multi-part question, and I apologize, but I don't really know another way to ask it. What are the other modes of information you're feeding in there?
And do you really need to go through this simulation process? I'll tell you, it all sounds like a worthwhile endeavor to me, and I'm sure it is. But I also look at video models today, and something that's really surprised me about video generation models is that they seem to have an understanding of physics. An image generator's output doesn't move, so you just know, say, that the guy sits on the chair. But with video, you can see people walking through a field and watch the grass move, which means those models inherently have some concept of how physics works, I think. I'm going to run it by you, because you're the expert here. Yann is coming back on the show in a couple of weeks, so maybe this is just on my mind because I'm gearing up and thinking about our last conversation, but I'm going to put it to you too, and I'll ask him to weigh in on your answers. The thing he always talked about is that a human mind is able to see infinite possibilities and accept that; it doesn't break us. If you hold up a pencil, you know it's going to fall, and you know it could fall in infinite ways, but it's still going to fall. For an AI trained on different scenarios, it's very difficult to understand that the pencil might fall in infinite ways when asked to generate it. And yet the video generators have been doing a very good job of showing they understand that. So, just to reiterate: what different modes of information are you using, and why do we need this broader simulation environment, this Cosmos tool, if we're already getting such good results from video generation?
All very good questions. First off, we use many modes. The primary one for training Cosmos, though, is video, just like the video generation models. But along with that there's text, and we also feed it extra information and labels we can gather from the data, particularly when we generate the data synthetically. If you use a simulator to generate the videos, you have perfect information about everything going on in every pixel of that video. We know how far away each object is in each pixel; we know the depth; we know what the object in each pixel is; you can segment all of that out. Traditionally, for perception training for autonomous vehicles, we've used humans to go through and label that information across hours and hours of collected video, and it's inaccurate and incomplete. From simulation, we can get perfect information about the videos themselves.
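As a sketch of what "perfect information about every pixel" means in practice: a simulator's renderer can emit exact per-pixel depth and object segmentation alongside each RGB frame, because it knows the full 3D scene state. The `render_frame` function below is a hypothetical stand-in for whatever a real simulator exposes, with random arrays in place of real buffers:

```python
# Sketch of ground-truth extraction from a simulator. `render_frame` is
# a hypothetical stand-in for a real simulator API; a real renderer
# computes these buffers exactly while producing the image.

import numpy as np

rng = np.random.default_rng(0)
H, W = 64, 64

def render_frame() -> dict:
    """Hypothetical simulator step: returns the image plus per-pixel
    ground truth that comes for free from the 3D scene state."""
    rgb = rng.random((H, W, 3))            # the rendered image
    depth = rng.random((H, W)) * 50.0      # exact distance (meters) per pixel
    seg = rng.integers(0, 4, (H, W))       # exact object id per pixel
    return {"rgb": rgb, "depth_m": depth, "segmentation": seg}

# A training sample pairs the video modality with its exact labels:
# no human annotation pass, no label noise, no missing pixels.
sample = render_frame()
for name, arr in sample.items():
    print(name, arr.shape)
```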
Now, that being said, on your question about how these video models seem to really know physics, and know it well: I think it is pretty amazing how much physics they do know, and it's kind of surprising that we're here at this point. Had you asked me five years ago whether we'd be able to generate videos with this much physical plausibility by this stage, I wasn't sure, because I had continually been wrong in the years prior. I didn't expect to see image classification solved in my lifetime, until we saw it with AlexNet; I would have bet against it. So we're pretty far along.
That being said, there are a lot of flaws in the physics we see. One of the basic ones is object permanence: if you direct the video to move the camera, point away, and come back, objects that were there at the beginning of the video are no longer there, or they're different. That's such a fundamental violation of the laws of physics that it's hard to say these models currently understand physics well. And there's a whole bunch of other things in there.
You know, my life's work has been primarily computer graphics, and specifically rendering. 3D rendering is essentially a physics simulation: the simulation of how light interacts with matter and eventually reaches a sensor of some sort. We simulate what a camera would do in a 3D world and what image it would gather from the world.
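The physics being simulated here is usually summarized by the rendering equation (Kajiya, 1986), which physically based renderers approximate: the light leaving a surface point toward the camera equals the light it emits plus all incoming light it reflects toward that direction.

```latex
% The rendering equation: outgoing radiance at point x in direction w_o
% equals emitted radiance plus reflected incoming radiance.
L_o(x, \omega_o) = L_e(x, \omega_o)
  + \int_{\Omega} f_r(x, \omega_i, \omega_o)\, L_i(x, \omega_i)\,
    (\omega_i \cdot n)\, \mathrm{d}\omega_i
```

Here f_r is the surface's reflectance (BRDF) and n its normal; the integral over all incoming directions is what makes accurate shadows and reflections expensive to compute, and what a trained eye notices when they're wrong.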
When I look at a lot of these generated videos, I see tons and tons of flaws, because when we do those simulations and that rendering, we're attuned to seeing when shadows are wrong, when reflections are wrong, these sorts of things. To the untrained eye it looks plausible, it looks correct, but I think people can still feel that something is wrong when it's AI-generated, in the same way that, for decades now, ever since we introduced computer graphics to visual effects in the movies, you may not know exactly what it is, but if the rendering isn't great, it just feels CG, it feels wrong. We still have that kind of uncanny-valley thing going on. All that being said, I think we're going to rapidly get better and better. The models today have an amazing amount of knowledge about the physical world, but they're maybe at 5 or 10 percent of what they should understand. We need to get them to 90 or 95 percent.
Right. I just saw a video of a tidal wave hitting some island, and it looked super realistic. Of course, it was on Instagram, because that's basically all Instagram is right now, AI-generated video. It took me a second, and it's more frequently taking me a minute, to realize, oh, that's AI-generated. Sometimes I have to look in the comments and trust the wisdom of the crowds on that front.

But you might not be the best judge of it, either. Humans aren't particularly good at knowing whether the physics is really accurate. This is why movie directors can take such license with the physics when they do explosions and all kinds of other fun stuff, like tidal waves.
Yeah, it's interesting. Some comedian made a joke about this: Neil deGrasse Tyson likes to come out after movies like Gravity and talk about how they're scientifically incorrect, and the comedian said, yeah, well, how about the fact that George Clooney and Sandra Bullock are the astronauts? That didn't bother you at all? But it is interesting that we can watch these movies and fully believe, at least in the moment, that they're real. We can allow ourselves to lose ourselves in the moment.

Exactly.

And just be like, yep, I'm in this story, I feel emotion right now watching George Clooney in a spaceship, even though I know he's no astronaut.
And for that purpose, I mean, I worked on movies. Before I was at Nvidia, that's what I did: computer graphics for visual effects. That is a perfectly legitimate use of the technology. It's just that that level of simulation is not sufficient for building physical AIs that are going to be the underpinnings, the fundamental components, of a robot brain. I don't want my self-driving car, or my robot operating heavy machinery in a factory, to be trained on physics that doesn't match the real world. Even if it looks right to us, if it's not right, then it's not going to behave correctly, and that's dangerous. So it's a different purpose. That's why what we're doing with Cosmos is really a different class of AI than video generators. You can use it to generate videos, but the purpose is different. It's not about generating beautiful or interesting imagery as art. It's about simulating the physical world, using AI to create the simulation.
Rev, I want to ask you one more follow-up question, not about the flaws, but about the video generators' ability to get things right, and then we'll move on from this topic. It's just surprising and interesting for me to hear you, and Demis Hassabis, the CEO of Google DeepMind, who was just on the show and commented on this, talk about how these video generators have been surprisingly good at understanding physics, while Yann, in our previous conversations, was effectively saying that it's very difficult for AI to solve these problems. I won't say they've solved it, but everybody's surprised they've gotten to this point. So what is your best understanding of how they've gotten, though flawed, this good?
You know, this is the trillion-dollar question, I guess. We've been betting for years now that if we just throw more compute and more data at the problem, these scaling laws are going to give us a level of intelligence that's really, really meaningful, that we'll see step-function changes in capabilities. There's no way to know for sure; it's very hard to predict. It feels like we're on an exponential curve, but which part of the exponential curve we're on, we can't tell. So we don't know how fast it's going to happen. Honestly, I've been surprised at how well these transformer models have been able to extract the laws of physics by this point in time. At this point, I believe that in a few years we're going to reach a level of physics understanding in our AIs that will unlock the majority of the applications where we need to apply them in robotics.
Let me ask you one more question about this, then we're going to take a break and talk about some of the societal implications of putting robots in the workforce and in all different areas of our lives. There's definitely a sizable portion of the population, maybe not our listeners, that would be surprised to hear that Nvidia itself is building these world foundation models and releasing weights to help others build on top of them. The perception from someone on the outside is: hey, isn't Nvidia just the company that makes those chips? What do you say to that, Rev?
Well, yeah, that's been the perception. It's been the perception since I started at Nvidia 23 years ago, and it's never been true that we just build chips. Chips are a very, very important part of what we do; they're the foundation we build on. But when I joined the company, there were about a thousand employees, and the grand majority of them were engineers, just like today. And the majority of those engineers are software engineers. I myself am a software engineer; I wouldn't know the first thing about making a chip.
Our form of computing, accelerated computing, the form of computing we invented, is a full-stack problem. It's not just a chip we throw over the fence and leave to others to figure out how to use. It doesn't work unless we have these layers of software, and those layers of software have to have algorithms that are harmonized with the architecture of our chips and our systems. So in the new markets we enter, what Jensen calls zero-billion-dollar industries, we have to actually go invent these new things, top to bottom, because they don't exist yet and nobody else is likely to do it. So we build a lot of software, and we build a lot of AI these days, because that's what's necessary to build the computers that power all of this.
We did this with LLMs early on. Many years ago we trained what was, at the time, the largest LLM in terms of parameter count; it was called Megatron. And because we did that, because we build our chips and computers and the system software and the frameworks and pipelines and everything, we were able to tune them to do these large-scale things, and we put all of that software out there, which was then used to create the LLMs we enjoy today. Had we not done that, I don't think we would have had ChatGPT. And this is essentially the same thing.
We're creating a new market, a new capability that doesn't exist. We see this as an endeavor that's greater than Nvidia; we need many others to participate. But there are some things we're uniquely positioned to contribute, given our scale and our particular expertise. So we're going to go do that, and then we're going to make it freely available for others to build on.
Yeah, for those wondering why Nvidia has such a hold on the market right now, I think you just heard the answer. I do want to take a break, and then I want to talk about the implications for society when we have, let's say, humanoid robots doing labor in the part of the economy we simply haven't really put AI into yet, and what it means when that part is many more trillions of dollars than knowledge work. We're going to do that when we're back, right after this.
And we're back here on Big Technology Podcast with Rev Lebaredian, the vice president of Omniverse and simulation technology at Nvidia. Rev, I want to ask you the question that has obviously been bouncing around my mind since we started talking about the fact that you're going to enable robots to take over (I don't know, is "take over" the right word?) a lot of what we currently do in the workforce. What do you think the labor implications are? If you've spent your entire life working at a certain manual task, and the next thing you know someone uses the Cosmos platform, or your new project for humanoid robots. I think it's called GR00T. What is it called? GR00T?

GR00T. That's our project for building and training humanoid robot brains.

All right. So some company uses GR00T to put humanoid labor in, let's say, a factory, or even deploys it as a care robot, and I'm a nurse, and all of a sudden some GR00T-built robot is now helping take care of the elderly. What are the labor implications of that?
Well, first and foremost, I think we need to understand that this is a really hard problem. It's not like overnight we're going to have robots replace everything humans do everywhere. It's a very, very difficult problem. We're just now at an inflection point where we finally see a line of sight to building the technology needed to unlock the possibility of these general-purpose robots. We can now build a general-purpose robot brain. Twenty years ago that wasn't true. We could have built the physical robot, the actual body, but it would have been useless, because we couldn't give it a brain that would let it operate in the world in a general-purpose manner. We couldn't interact with it or program it in a useful way to do anything. So that's what's been unlocked here.
I talk to a lot of CEOs and executives at companies in the industrial sector, in manufacturing and warehousing, through to retail companies, and at all of these companies, in every geography, there's a recurring theme: there's a demographic problem the whole world is facing. We don't have as many young people who want to do the jobs that the older people retiring now have been doing. If you go to an automotive factory in Detroit or in Germany and look around, most of the factory workers are aging, and they're quickly retiring. The biggest concern of the CEOs I'm talking to is that all the knowledge they have about how to operate those factories and work in them is going to be lost. The young people don't want to come do these jobs.
So we have to solve that problem if we're going to maintain, not just grow, our economy, and keep producing the same amount of things. We need to find some solution: we don't have enough workers. We've been seeing it in transportation: there aren't enough truck drivers in the world to deliver all the stuff moving around in our supply chains. We can't hire enough of them, and every year there are fewer and fewer young people who want to do that job. So we need self-driving trucks, and we need self-driving cars, to solve that problem. I think before we talk about replacing jobs humans want to do, we should first be talking about using these robots to fill the gap being left by humans, because they don't want to do those jobs anymore.
Right, and there could be specialization. Take nursing, for example. The nurse that injects me with a vaccine, or puts medication in my IV, maybe we keep that human for a while, even though humans make mistakes too, because I'd feel a lot more comfortable if that were a human. The nurse that takes me for a walk down the hall after I've had a knee replacement, that could be a robot. It might even be better to have a robot.
We'll see how this plays out. We believe the first place we're going to see general-purpose robots like humanoids really take off is the industrial sector, for two reasons. One, the demand is great there, because of the shortage of workers. And two, it makes more sense to adopt them in spaces where a company simply decides to put them in, and warehouses and factories are mostly unseen. I think the last place we're going to see humanoids show up is in our homes.

In your kitchen. Don't tell Jeff Bezos that.
Well, they will show up there, and I think it's going to be uneven, uneven geographically. They'll probably show up in a kitchen in somebody's home in Japan before they show up in a kitchen in somebody's home in Munich, in Germany. I think that's a cultural thing. You know, I personally don't even want another human in my kitchen. I like being in my kitchen and preparing things myself; my wife and I are always in each other's space there, so we get kind of annoyed. Having a humanoid robot would be kind of weird. I don't even want to hire somebody else to do that; we do it ourselves. So that's a personal decision.
I think jobs like caring for our elderly, and health care, those are very human professions. A lot of what the care is, it's not really about the physical thing they're doing; it's about the emotional connection with another human. And I don't think robots are going to take that away from us anytime soon.

Well, the question is whether we have enough care professionals to take those jobs. That's the one that really seems in danger.
What's likely to happen is a combination. The care professionals we do have will do the things that require EQ, that require empathy, that require really understanding the other human you're taking care of. And then they can instruct the robots around them to assist with all the more mundane things, like cleaning, and maybe giving the shots and IVs, I don't know.

How far away is that future, Rev? How long do you think?
You know, I wouldn't venture to guess about that kind of interaction in a hospital or care situation quite yet. I believe it's going to happen in the industrial sector first, and I believe that within a few years we're going to see humanoid robots widely used in the most advanced manufacturing and warehousing.

Wild.
Okay, I want to ask you about Hollywood before we go. I have this question rattling around in my mind: are we just going to see movies that look real but are computer generated? We have computer-generated movies now with CGI, but they all look pretty CGI-y.

They don't all look CGI-y. Some of them look pretty amazing.

But I'm curious: do you think Hollywood is going to move to an era where it's super real and just simulated? Go ahead.

Absolutely. I mean, was it a year or two ago when the last Planet of the Apes came out?
I went to see it with my wife. Now, my wife and I have been together since I worked at Disney in the mid-'90s on visual effects and rendering. I had a startup company doing rendering, and she was part of it. So she has a good eye, and she's been around computer graphics and rendering for decades now. When we went to see Planet of the Apes, even though obviously those apes were not real, at one point she turned to me and said, "That's all CG, right?" She couldn't quite believe it. What Weta did there is amazing. It's indistinguishable from real life, except for the fact that the apes were talking. Other than that, it's indistinguishable.
The problem, though, is that doing that level of CG in the traditional way requires an incredible amount of artistry and skill that only a few studios in the world can deliver, with the teams they have and the pipelines they've built, and it's incredibly expensive to produce. With generative AI, and particularly with world foundation models, once we get to the point where they really understand the depths of the physics needed to produce something like Planet of the Apes, of course studios are going to use those technologies to produce the same images, because it's going to be a lot faster and a lot less expensive to do the same things. It's already starting to happen.
Rev, I know we're getting close to time. Do I have time for two more questions?

Absolutely.

Okay. So, the more I think about robotics, the more I think about what the applications in war might be. I know you can't think of every permutation when you're developing foundational technology, but we're living in a world where war is becoming much more roboticized, and it's sort of remarkable that we have some wars going on where people are still fighting in trenches. So I'm just curious whether you've given any thought to how robotics might be applied in warfare, and whether there's a way to prevent some of the bad uses that might come of it.
You know, I'm not really an expert in warfare, so I don't feel I'm the best person to talk about how it might or might not be used, but I can say this: this isn't the first time a new technology has been introduced that is so powerful that we can imagine not only great, beneficial uses but also really scary, devastating consequences, particularly in warfare. And somehow we've managed not to have that kind of devastation. In general, the world has gotten better and better, more peaceful and safer, despite what it might feel like today. By almost any measure, we lose fewer lives to wars and these sorts of tragedies than ever before in mankind's history.
The big one everybody always talks about, of course, is nuclear technology. I was a little kid in the '80s, toward the height of the Cold War, the end of it, and I remember thinking every day that it might happen, that ICBMs might arrive in Los Angeles at any point. It hasn't happened, because there was a general understanding, collectively, that this would be so bad for everyone that we put systems together. Even though we had intense rivalry, even enemies, between the Soviet Union and the US, we somehow figured out that we should create a system that prevents that sort of thing. We've done the same with biological and chemical weapons: largely, they haven't been used, even though the technology has existed.
I think that's a good indicator of how we should deal with this new, powerful technology of AI, and a reason to be optimistic that it's possible to have this technology and not have it be so devastating. We can set up rules and conventions that say that even though it's possible to use AI in this way, we shouldn't, and we should all agree on that. And for anybody who skirts that line, there should be ramifications that disincentivize them from using it that way.
Yeah, I hope you're right about that. It seems like something we're going to deal with as a society more and more as this stuff becomes more advanced. All right, last one for you. You've been at Nvidia, as we've mentioned a couple of times, for 23 years; I already teased this. The technology has been in favor, it's been out of favor. You're at the top of the world right now, even though there was some hiccup last week, but whatever; it doesn't seem like it's going to be a long-term issue. What is one insight you can draw from your time at Nvidia about the way the technology world works?
Well, first I can tell you how Nvidia works, and the reason I'm here. I've been here for 23 years, and this will be the last job I ever have; I'm positive of it. When I joined Nvidia, that wasn't the plan. I thought I'd be here one year, two years max. And now it's been 23 years. When I hit my 20-year mark, Jensen, at our next company meeting, rattled off a bunch of stats on how long people had been here: how many had been here a year, two years, and so on. When he got to 20, there were more than 650 people who had been here 20 years.

Wow.

Now, earlier I said that when I joined the company there were about a thousand people. So that means most of the people who were here when I started were still here after 20 years. I wasn't as special as I thought I was when I hit my 20-year mark. This is actually a very strange thing about Nvidia: we have people who have been here a long time and haven't left. That's unusual for most companies in general, but particularly for Silicon Valley tech companies, where people move around a lot.
And I believe the reason we've stayed here through all of our trials and tribulations is that, fundamentally, what Jensen has built is a company where people come to do their life's work. And we really mean it; you feel it when you're here. This is about more than making some money or having a job. You come here to do great work, to do your life's work. So the idea of leaving just feels painful to me, and I think it does to many others. I think that's what's actually behind why we've endured, despite the fact that Nvidia has had its ups and downs. You can look at our stock chart going back to the mid-2000s. We introduced CUDA in 2006. That was a really important thing, and we stuck to it. The analysts, nobody, wanted us to keep at it, but we kept investing in it, and our stock price took a huge hit and stayed flat, or dropped, for a long time. And then it finally happened: AI was born on our GPU. That's what we were waiting for. We went all in on that, and we've had ups and downs since then. We'll continue to have ups and downs, but I think the trend is still going to be up and to the right, because this is an amazing place where the people who want to do their life's work, the best people in the world at what we do, come here and stay here.
Well, Rev, look, it's always such a pleasure to speak with you. I really enjoyed our time together at headquarters; it was a really fun day, we did some cool demos, and I appreciate that. And I'm just thrilled to have had the chance to speak with you about this technology today. It's fascinating, it's cutting edge, and it obviously brings up a lot of questions, some of which we got to today. I'm sure we could have talked for three hours, and I hope to keep the conversation going. Thanks for coming on the show.

Thank you for inviting me, and I hope we do talk for three hours one day.

That'll be great.
All right, everybody, thank you for listening. Ranjan and I will be back to break down the news on Friday. There's a lot of news this week, with OpenAI's Deep Research coming out. I just paid $200 a month for ChatGPT, which is a lot more than I ever thought I would pay, but that's where we are today. So we're going to talk about that and more on Friday. Thanks for listening, and we'll see you next time on Big Technology Podcast.