NVIDIA's Artificial Intelligence Moat & Origins — With Bryan Catanzaro
Channel: Alex Kantrowitz
Published at: 2024-02-28
YouTube video id: 0hQJZh_Fzzw
Source: https://www.youtube.com/watch?v=0hQJZh_Fzzw
The Nvidia executive who started its AI push joins us to talk about what makes the company so indispensable, unpacking the secrets to its success. That's coming up right after this.

Welcome to Big Technology Podcast, a show for cool-headed, nuanced conversation about the tech world and beyond. We have a great conversation for you today: we're going deep inside Nvidia with Bryan Catanzaro. He's the vice president of applied deep learning research at Nvidia, and he'll be speaking at the company's forthcoming GTC conference, March 18th to 21st in San Jose; his talk is specifically on practical AI agents that reason and code at scale. So if you're in the area and thinking about heading out, you can mark that down. Let's get to the conversation. Bryan, welcome to the show.

Thanks, I'm glad to be here.

Great to have you. You're kind of the guy that kicked off this whole AI push within Nvidia.

Well, it took a whole company to transform Nvidia into what it is today, so there are tens of thousands of people that deserve a lot of credit, but I was honored to be one of the people that helped Nvidia get started a long time ago.

Exactly, so maybe we can get into that in a bit, but first I want to talk to you about what makes Nvidia so indispensable to developers building with AI. Everything that's been going on recently comes back to the fact that Nvidia has these capabilities. A lot of folks hear about it and understand it philosophically: we know that Nvidia has the technology and that competitors have struggled to catch up. But let's just focus on Nvidia for a moment. What exactly is happening within your company? What do you offer that allows anyone who wants to build AI, and then run AI models, to do it effectively?

Nvidia is an accelerated computing company, and...

Wait, wait. What does that mean, accelerated computing? I hear you guys talk about it all the time.

We've been talking about it for a long time, and the world still doesn't quite understand it, so I'm glad to try to explain it. The idea is that the world faces a lot of computational challenges that can't be solved without faster computers, but building a computer is not enough. In order to actually deliver acceleration, all of the pieces have to line up and plug together and be fully optimized across the entire stack, so that people have the chance to do things that they just couldn't do otherwise computationally. AI is a great example of that: training and deploying the awesome generative models that are changing the world right now is extraordinarily computationally intensive. It's the biggest computational challenge the world has ever faced. The reason Nvidia is providing something useful here is that for decades we've taken on this mission of optimizing the entire stack (software, algorithms, libraries, frameworks, compilers, systems, networking, chips, the whole thing) and optimizing it all together for the most important workloads. And that's AI.

Okay. When I hear you recite that list of different things you do, someone hearing it for the first time with a layman's knowledge might say, "Wait a second, no, Nvidia just makes the H100 chip." So it's obviously more than that. Can you talk a little bit about that? People might get it wrong that it's just the chip, because that's what gets the headlines: $20,000 to $40,000 apiece, Facebook will have 350,000 of them by the end of the year, that's what people need to run AI. But it's actually, and I think this is the important part, the ecosystem. It's not just the chip but everything around it. So talk about that in layman's terms if you can: what surrounds that chip that Nvidia provides?

Well, there's an entire culture around what accelerated computing is. We have to be very strategic about what we're going to accelerate, and we have to make decisions years in advance about how we're going to build the software and hardware that's going to provide this acceleration. I always like to say that accelerated computing implies decelerated computing, because it's not actually helpful to say, "I'm just going to make a fast computer." Everybody makes a fast computer; that's the goal of every computer. So acceleration really is about specialization. It's about being able to focus and prioritize and say: this is the workload that matters most, and I'm going to optimize the entire stack for that workload. In order to do that, we have an enormous amount of hardware, software, and algorithms that we're working on to enable the community to do things they could never do before. Sometimes we like to say that we're building a time machine for scientists and engineers that allows them to see into the future, because of the acceleration that comes from our systems.

It's interesting, because if you think about a traditional chip company (and by the way, I might be totally off base on this, so correct me if I am), what they do is sell the chip; the manufacturer buys it, puts it inside, let's say, their computer, and then builds the software around it. But you guys not only make the chip, you also build the algorithms and the software that surround it, which enables companies to get the most out of it. So, a few questions on that. One: is that a right characterization of what you're doing?

I think so. The core thesis that powers Nvidia is that a chip could never be enough, just the same way that a chip couldn't be enough for my Apple phone, for example. Apple makes awesome chips, but the experience of using my phone is a lot more than the chip, and the way Apple is able to vertically integrate and optimize their entire system to create an amazing consumer experience is pretty incredible and super valuable. What Nvidia is doing is not the same, but it's related, in the sense that we understand that the value of the technology we create is only understood in context. It's really about whether we are delivering transformative acceleration to the most important computational workloads of our time.

So why couldn't other companies just go and build their own software to train using Nvidia chips or other chips? Because it seems to me (correct me if I'm wrong here also) that if I'm reliant on Nvidia's software, which is closed source, and I train my model with it, it's kind of difficult to switch to another chip. So why rely on the Nvidia software in particular?

We have a lot of open-source software as well as some closed-source software; we make the decision about what to open source based on what we think will help the market most. But the reason people work with us is that we deliver transformational acceleration: we enable people to do things computationally that they just couldn't do otherwise. We know it would never be enough to just provide a chip that said it was really fast and had a lot of operations per second inside, because the gap between what a particular chip can do and the experience of a scientist or engineer trying to invent the future is quite enormous. If any one of the links in that chain, whether we're talking about systems design, networking, data center design, or the compilers, frameworks, libraries, applications, and algorithms, fails, the acceleration is lost, and the value therefore is lost. And so Nvidia has a unique way of approaching this problem,
co-optimizing the entire stack in order to deliver that acceleration to the end scientists and researchers who are trying to invent the future. That's what differentiates us from other companies. Now, could other companies do that? Absolutely. It's not a secret; in fact, we've been shouting it from the rooftops for decades that this is what we do and that it's different from being a chip company. But we're continuing to test that thesis: is there value in accelerated computing above and beyond what you get just from making and selling awesome chips? I think the answer is yes, and I think that's the reason we've been so successful.

Right. And for anyone listening: I spent the whole week making calls on Nvidia trying to figure out where I was wrong. I thought everything would kind of slow down this year, and I was speaking to customers and analysts, asking them to tell me exactly what I missed. This was the thing I underestimated: it's not just the chip, but the chip, the software, and everything that goes along with it, and that's why the company has been so successful. So let's talk a little bit about what actually goes into training an AI model. Do you have companies, let's say an OpenAI or whoever it might be, that say, "Okay, I'm ready to train a model"? Do they then get in touch with Nvidia, say what they're looking to do, and you help them figure out how many chips they need, what pieces of software they need to train, and everything else that comes along with that?

We have a really great set of relationships with institutions around the world that are building AI, and we're always trying to help them get the benefits of accelerated computing in whatever way makes sense to them. Obviously, every institution is going to have a different perspective on what they're trying to do, and they're going to have different secrets they need to keep as part of their strategy, and we respect that. But at the same time, we do try to help them understand what's possible with our systems and make sure they're actually getting the transformational acceleration that we expect. So we do partner pretty closely with a lot of important customers. I think one of the things that's special about Nvidia is that it's a supporting and sustaining kind of interaction when we work with our customers. Nvidia technology is integrated into the heart of many different companies, from amazing AI institutions like OpenAI to very established companies that do manufacturing or consumer products or self-driving; basically every aspect of the world's economy. Nvidia is able to provide technology at the level that makes sense for each company: if they'd like us to provide just systems and they want to write all the software, they can write as much software as they want; if they want to use all of our software, that's great too. We're just trying to help support and sustain all of the different companies as they use AI for their own work.

Right. So let's walk through how this actually happens. Let's say I'm an organization, I come to Nvidia, and I say I have a bunch of data, or maybe I don't even have data, and I'm looking to build a large language model. What do I do now?

That's a great question. The first thing on my mind is: what data center are you going to use to train this model in? That's a really important question, because it turns out the AI market is growing pretty fast, there are so many institutions training these huge models, and you actually have to have a building to put these machines in, and
they're not small, and you need to hook them up to power. So that would be one of my first questions: okay, are you ready to stand this up, or are you going to be working with a CSP, for example AWS? And we love to support our customers through cloud providers as well.

Okay, so then what happens next? Let's say I'm set up.

You're set up, okay. We will definitely point you to our reference implementations of the various LLMs and their training setups on these clouds. We'll show you how to scale to many thousands of GPUs efficiently, we'll tell you what kind of speed you should expect to get while training the model, and we'll also discuss reliability: how do we make sure the job is actually progressing properly and yielding, you know, intelligence? So we definitely help our customers with things like that. Then, when the model is trained, there's a question about how to deploy it, and we'd love to help people deploy AI as well. I think Jensen said on the earnings call this week that somewhere around 40% of our data center GPUs were going toward inference, which I think is pretty amazing, and definitely a shift from where things were a few years ago. So we're spending a lot more time helping our customers accelerate the deployment of these models as well, making sure they get the best speed, so that they can get as much out of the systems they're deploying these models on as possible.

And what are they using the models for?

I think language models are starting to be used in a lot of different parts of a lot of different companies. Things like question answering: I think it's really important to help people understand answers to their specific questions, especially relating to private data stores that they need to answer questions with. We're seeing a lot of people use AI in office-type settings. I don't know if you've interacted with Microsoft Copilot at all, but it can be really helpful, at least to me, when I'm looking at a summary of a meeting and what the action items are for everybody at the meeting, and other sorts of office automation tasks. We're also pushing forward with the use of these models for our own internal work at Nvidia: we have a project called ChipNeMo that is using language models to help our chip designers and verification efforts be more efficient as we build our own products.

It's called ChipNeMo? Is that similar to... I was just speaking with ServiceNow; they're also using a program from Nvidia called NeMo.

That's correct, yeah.

And that's what it is? That's the software they use to propel their question answering?

Yeah. NeMo is kind of our most user-friendly open-source software for training and fine-tuning language models and other kinds of conversational AI. It also has a lot of speech capabilities. We've been building it for quite a few years, because we believed that conversational AI was really going to transform industry, and we wanted to make a platform for companies to build and deploy their own conversational AI. So that's what NeMo is, and when we talk about ChipNeMo, we're talking about using that for our own chip work.

Wait, how do you use it for your own chip work?

At the moment, a lot of it has to do with improving communication between chip designers. You have something like a thousand people working on this project, there are a lot of interfaces that need to be described, and people have questions and don't know exactly who to talk to. So basically we're making knowledge bases about our own work that people can then use to answer questions. We found that it's kind of like having a more senior engineer you can talk to all the time, who helps you find the things you need to find in a huge code base. That's the primary thing we're doing right now: augmenting the engineers on the team with kind of superpowers to understand our own code better and interact with it better. Over time, I expect ChipNeMo is going to do other things as well, like improving the quality of our designs. Our Hopper GPUs, for example, have a lot of circuits in them that were designed by AI that we built ourselves, with better speed, power, and cost characteristics than we knew how to build with any other tool.

And generative AI programs designed some of the chips?

Yes, Hopper is designed with generative AI.

That's insane.

It's wild, yeah.

So let's dream a little bit. Obviously we know that knowledge repositories inside companies are something this stuff is going to be really good for, and maybe a little bit of consumer agents or consumer chatbots like ChatGPT. Is this where it ends? Where do you see it going?

I don't think this is where it ends. I've been thinking recently about past revolutions in the media space. We got books, which transformed society, because we could distribute ideas and reference the same ideas in a new way: everybody could read the same book. As soon as we had audio recordings, that created an entirely new industry, the recorded music industry, which continues to be totally vibrant and important to our culture. Movies, TV, video games: every time that we come up with a new
technology, we find a way to explore ideas as humans and explore our culture together, in a way that helps us solve problems better and also creates a new form of culture that we interact with. I think the most exciting applications for AI are ones we haven't really even dreamed up yet, in the same way that it would have been hard to imagine how books were going to change the world back when Gutenberg first made the press. I think AI is going to create a new form of media that is much more interesting, much more engaging, much more useful, and ultimately we're going to use it to refine our ideas and explore them together the way we have with other media. It's just going to be much more interesting and useful.

When I hear you say "media," it leads me to believe you think this is going to be more of an agent, or a digital friend, that people will start interacting with, because that's media. Or is there something else I could be thinking about that could take the form of media?

Something along those lines. I'm expecting AI to change the lives of all of us here on planet Earth, and when I think about how eight billion people on this planet live, most of us aren't reading and writing that much. But we do love virtual worlds: people love interacting in video games, and we love interacting with each other. I think the primary way people are going to interact with AI is going to be in virtual worlds, because I think that's going to be the most natural and most useful way of interacting, and I think we're going to perceive that as a new form of media that really touches all aspects of our work and our play. It's going to be something new.

So you're a real believer in this metaverse vision, that you'll just kind of end up in a digital world where the people and the scenery are all AI generated, or maybe mostly AI generated?

I think we have a culture, and it's very important to us. The ideas that we share together, and the sort of shared humanity that we have, are more important to us than the content of the things we're interacting with. For example, AI is probably going to be really awesome at playing soccer, but do I think people are going to go watch robots play soccer, even if the robots are kicking the ball around better than humans? I don't think it's as interesting, because I don't think it's related to us. The primary thing we're interested in is ourselves: we're trying to understand ourselves and how we relate to other people, and I think AI is going to give us new ways of doing that. I do think we're going to be interacting in virtual worlds. Nvidia has been a big believer in virtual worlds for the past 30 years...

You were in gaming before you were in AI.

...and we've had this initiative called the Omniverse since long before Meta renamed itself, because we believe that simulating the world, and providing virtual agents a place to interact with people, is hugely important to the future of technology. I see these things coming together: there are a lot of opportunities to use virtual worlds to make AI stronger, to teach AI how to understand the real world and act better in the real world, and then, of course, to give humans the opportunity to interact with AIs in much more natural and useful ways. I think a lot of that is going to happen in a virtual world.

And is that the next place this goes, world models? We just saw Meta put something out where it's not even generative software, but AI software that can kind of guess
what would happen if you black out a certain frame in a video. Is that where the next stage of this goes?

I think that's going to be really helpful, these sorts of world models. I was really impressed with the OpenAI Sora project this week as well; really fantastic results. And I'm thinking about how these things work together. If you read the Sora post, they talk a lot about how building a world model is going to help make artificial intelligence more useful, because it's going to understand how things interact in the real world, and then it's going to be able to use that to make better decisions in order to do things. So I think that's great. I also think the other way around is really important: having a world model then allows us to synthesize a world, which allows virtual worlds to be richer, more interesting, more interactive, and I think that's hugely valuable.

And how do you train a world model? With text I get it: you put the text in, it spits the text out. But teaching AI a way to understand what the world looks like is completely different.

Yeah. Usually the way these work is that there's some sort of implicit, learned representation of the world. We can call it a state space. Imagine if you could write down every attribute of every object in the world: where it is, how it's moving, what color it is. If you were able to write down very precisely where every object is, then you would have a good way to draw the world, because you could take that representation and just turn it into a picture. But you could also use it to simulate: you could ask a question like, okay, if I took this particular action, what would the updated state space look like? For example, if I swing the baseball bat when the ball's right there and I hit the ball, that's going to be a different future than if I swing the bat when the ball's not there and I miss it. With a world model like this, you can ask those questions and then simulate how things go forward in time.

Now, one of the tricky things is that writing down all of the state of the world very precisely is basically impossible; it's way too complex. This is well known in weather forecasting, for example: the idea that a butterfly flapping its wings in Japan could magnify, over the course of time, into a hurricane on the other side of the world, because very small changes in the state space of the world can actually have pretty large outcomes. So one of the ways these learned neural network models deal with this is that rather than having an explicit representation of everything in the world, it's all done implicitly. The model learns a function that goes from some implicit representation of the world to drawing it, and another that carries that implicit representation forward into the future, and it learns the representation itself directly from the data it's trained on. It's all learned, and that way we don't have to try to describe the world to the model; it learns how to describe the world itself.

And, I'm going to regret asking this question, but how does it learn that?

It starts to get metaphysical for me a little bit. These models are trained using stochastic gradient descent. What we're trying to do is fit the data we're given as best we can, by taking a lot of really small steps to improve the model. Gradient descent is kind of like walking down a mountain: the idea is that the fastest way down is to look where you are at any moment, find the direction that points downward the steepest, and take a step in that direction. Now, you can tell this algorithm isn't very smart, right? Because if there's a canyon, depending on how big your step is, you might actually step over the canyon, or you might get caught going back and forth when really you should be going down the canyon. And in the spaces we're optimizing, with, let's say, a trillion dimensions, these kinds of effects are really interesting and difficult to understand, difficult to relate to. But the thing that makes this work is that we don't actually have to be super precise about how we update the model for every bit of data it learns. We just make a rough guess (this data is pointing the model to fit in this particular direction), take a little step that way, and then do that a lot. That's the "stochastic" part: it's a little bit random.

It's actually kind of a great philosophy for life, I think. You could spend an awful lot of time trying to be very precise about what direction to go to make things better, but often the right thing to do is just make a guess, take a step, and then re-evaluate what the best direction is next, and do that a lot. Be really iterative, really flexible, and don't be too wedded to the idea you have about where to go at the beginning of the process; just let the process guide you as you walk through it. That's the algorithm we use to train all neural networks. The specific contents of what the networks are learning are difficult to interpret; we don't have a lot of
tools that help us understand that, in the same way that we don't really understand how our own brains work. We don't really understand all the things happening inside our heads that allow us to think; it's too complicated for our analytics at the moment.

The thing is, it works.

Yeah, it works. We didn't build the brain, but we did build these systems, and it's working, and we don't know how that's happening, which is wild.

Okay, I want to talk about reasoning, I want to talk about robotics potentially, and a few other things about how companies are going to work with Nvidia and what might be coming down the pike. We're going to do that right after the break.

And we're back here on Big Technology Podcast with Bryan Catanzaro, the vice president of applied deep learning research at Nvidia. Bryan, just to start off: what made you think, okay, AI is going to be big enough that I should go to Jensen, the CEO of Nvidia, and say we need to really work hard to make this part of our core offering?

I had been spending my research career at Berkeley, as a PhD student, on the future of computing, and we knew that computing was going to have to change. Back in 2005 or so, it was obvious that computers would have to be different. The standard way of making computers wasn't working anymore; we would have to be more specialized, we'd have to be more parallel. So I had been spending my time as a grad student thinking about what kinds of applications could take advantage of the computers it would be possible to build, and that would provide enormous amounts of value to humanity. At the time, AI was not a very big field, and it wasn't actually super popular to work in, but when I thought about it, I felt that from first principles this was something that had the potential to really change the world. And Nvidia's approach to solving this, I think, was also fairly careful and iterative. I published my first paper in 2008 on machine learning on the GPU, and Nvidia really jumped in full steam ahead, for the whole company to become an AI company, in 2013. So it took about five years of testing that thesis: is AI actually going to be something that could really change the world? We started getting some early indicators of success. One of those was, of course, the ImageNet competition in 2012, which really shocked the world with the quality of its results and wouldn't have been possible without accelerated computing. The results they got were so incredible because they built a very fast system for training neural nets.

And that wasn't generative, right? That was just identifying what was in photos.

That's correct, it wasn't generative at the time. But the idea of generative AI is fairly old. When I was a grad student, generative AI was a thing we talked about all the time; it's just that we weren't using neural nets for it. We were using other models, like graphical models: other mathematical approaches that are a little bit more clever but don't scale as well. And this was another part of the thesis that I had: the thing that's really going to help AI succeed is scale. If we can apply huge data sets and huge amounts of compute to AI, then the results are going to get much better. This was controversial back then, and even today some people really don't like this idea, because they would like AI progress to be mostly held back by our smarts, our mathematical skills in coming up with more clever models to describe our data in the world. But it does seem, these days, that there's a lot of evidence that the most important thing is having really good data
sets to learn from, and then enormous computational scale. So that was my thesis, and I was advocating for it at Nvidia. I wrote this little prototype of a library for training neural nets on the GPU, which then became cuDNN, our very first library for AI on the GPU. The process of getting the company to rally around that, and build it as a product and ship it, took some time. But because there were these early indicators of success, and there was a lot of demand picking up even back then, it made sense for the company to really pay attention. And then Jensen himself is such a visionary. I remember when he first started interacting with me about this back in 2012; I felt like he was just so hungry to learn. I gave him all the things I had learned from my PhD in the course of about an hour: how AI could change Nvidia's business, and what Nvidia could potentially build. And my ambitions for what that meant were like a thousand times smaller than Jensen's were. He took it immediately and then elaborated on it, and thought about where this is going. One of the things he first said, back in 2012, was that this is an entirely new way of writing software: rather than having humans enumerate all the different cases that software needs to understand, we're going to have models that learn from our data how to solve problems. These days that sounds like the truth, right? We see that happening every day when we interact with these models. But 12 years ago that was a pretty bold thing to say, and I was a little bit nervous about it, because the history of AI over the past 70 years had been one of overpromising and underdelivering in a lot of ways.

Which led to a lot of booms.

Yeah, a lot of thinking it can do something, and then it just
totally dries up until it starts to prove itself again. And so when Jensen immediately glommed onto this and started thinking about what it could mean, I wanted to slow him down a little. I was like, Jensen, this is a big, huge idea, but I'm not sure if it's going to happen now; it might be 30 years from now. But it turns out that Jensen was right about this. This was the right time to apply enormous data and enormous compute to AI and get these results.

Right, but 2012 wasn't it; it took another 10 years, 11 years really, for the boom to come. So what did it feel like? Yeah, go ahead.

Oh, I was going to say, I think Nvidia is really good at decade-long technology development. I've seen that happen at Nvidia many times. Ray tracing: I was in meetings in 2008 with Jensen on ray tracing, and we launched our first ray tracing GPU in 2018. It took 10 years of continuous development and research to make ray-traced virtual worlds a reality. And CUDA itself: the projects that led to CUDA started in the early 2000s, and CUDA was released as a beta in 2006.

This is the software that all AI programming is done with, on the H100s.

Pretty much, and A100s, yeah. CUDA is our framework for programming the GPU and making it do stuff that's interesting. And that project was crazy for a long time. Wall Street hated it, because it subtracted value from our earnings reports. They looked at the costs of our products and said, these products are too expensive, your margins are too low. Back then the margins were quite low, and that's because there weren't yet the applications and the ecosystem using CUDA that would let us build a strong business around it. But Nvidia continued investing in CUDA: in the libraries, the software, the compilers, the frameworks, and of course also the chips,
for 10 years, actually maybe more than 10 years, before all of a sudden CUDA became an overnight success. It's like 10 years of hard work that everyone ignored, and Wall Street criticized Nvidia for it mercilessly: why are you wasting your time on this, everybody knows the GPU is just for gamers, why are you trying to make the GPU do something else? And we did it anyway. That's one of the things I love about this company, and I think it's one of the reasons why we're successful at the accelerated computing mission: when we decide to do something, we do it out of our convictions about how technology will unfold, and we base those convictions on a speed-of-light analysis of what's actually possible, to try to keep ourselves honest. And then, once we have that conviction, we're able to follow through.

What did you see in those years when everybody gave up on this? I mean, obviously there were big advances made in things like machine learning, computer vision, and natural language processing, and that's where we had Facebook really take the lead as the public spokesperson for this stuff, talking about image recognition. They even built this fake generative chatbot called M that I had access to. It was supposed to be like a large language model before we even knew what a large language model was going to be; it was pre-Transformer, right? But you would talk to this bot and it would talk back, and they were trying to figure out what people were interested in if they were going to build a bot, and they had this whole bot platform that came out. But overall, everyone's telling you, yeah, this is not worth building; it's maybe just one or two companies that are using it. So why did you still think that? I guess it's hard to predict what happened next, but why did you believe that was going to happen?

Nvidia really thinks about these problems from
first principles. We know that the way computers are built is changing. We know that because Moore's law is slowing down, that requires more specialization. We know there are a lot of opportunities to provide transformational speedups to important workloads if we specialize the systems and the software for them. And we felt like, what is more important than this? What's more important than intelligence? Does the world need more intelligence? Absolutely. The world needs enormous amounts of intelligence; the problems we face as a planet, I think we're going to need a lot of intelligence to work through them. So for us it was, I think, just kind of an obvious thing to do. We had a lot of conviction, we understood the technology, and we also saw early indicators of success from a lot of different directions: a lot of different companies and research institutions were talking to us and saying, hey, we have these goals, like training this huge language model on enormous amounts of text, but the current systems are just too slow. And back then, 10 years ago, there was this idea that unsupervised learning was going to change the world, but nobody knew how. Unsupervised learning meaning that rather than having humans go in and label every picture, is it a cat, is it a dog (that's supervised learning), we're just going to show the model all the pictures we can find, and the model is going to learn something about pictures itself that we can then use to solve problems. That idea has been around for decades, but actually turning it into something that worked has only happened over the past 10 years, and I think it's only happened because of the increases in scale we've been able to bring to the problem. So during those 10 years we saw continuous improvement, even if the rest
of the world didn't see it. One of the things about technology when it's growing on an exponential curve is that the beginning of it feels like nothing's happening from the outside. Exponential curves, that hockey-stick kind of curve, look like nothing, nothing, nothing, then all of a sudden huge success. But the interesting thing about an exponential curve is that the rate of progress is constant: it's always getting, let's say, 10% better every year. And so you can tell, you can see, wow, this technology is continuing to improve, even if it hasn't reached the point where it's useful for the world yet. We just had this confidence that it would.

So people had these basically large swaths of text, and they were like, we want to build something like a large language model, but it just wasn't available yet. Did you guys notice when, in 2017, the paper "Attention Is All You Need" comes out from Google? What was the reaction internally? Because even within Google, I'll say this, I've spoken with people within Google, it was not a yawn, but it was like, ah, okay; not a holy-crap moment. But I'm curious what happened within Nvidia, because it's sort of your bread and butter.

Absolutely, and that paper caught our attention immediately because of the implications for our entire business. I told you earlier, accelerated computing is not about the chips, and this is a great example of that. If we built a system for, let's say, ResNet-50, which in 2017 was the most widely talked-about kind of neural network, these image classification networks, that would be a really different kind of system than a system designed to accelerate Transformers. And so we had to ask ourselves this question:
what's going to be the future, what's going to drive demand, and how are we going to build the right technology to accelerate the things that will matter a few years from now? Of course we're always asking ourselves that question: is there something coming along that's going to change the way people build AI, and if there is, what are the implications for the systems we're building? So yeah, we saw that paper. I have to say the title is maybe a bit of a pill to swallow, because "attention is all you need," it's like, but is it? It kind of elicits that reaction from a lot of people. But the thing that was really attractive to us about Transformers was that we knew they had really favorable computational properties. And again, going back to this thesis that the model is less important than the data and the compute that go into training it: if you have a model with really excellent compute properties, one that allows you to scale really efficiently to many thousands of GPUs, the kinds of results you can get from that are pretty spectacular.

So you saw that early on: that's what the Transformer model did, that's what this paper "Attention Is All You Need" sort of architected?

Absolutely. We saw that it had the potential to do that, and so we were very curious about it. In my team we had our own language models group back in 2017, and at the time we were using recurrent neural networks, which were the standard way of doing things before the Transformer paper came out. So I asked an intern, hey, can you take a look at doing language modeling with Transformers? I'm hearing good things about it; it would be great for us to have an independent perspective on whether this is a good idea. And he came back a month or two later with
just really astonishing results. There was no question that it was better than the models we were using, and also that it was more scalable, so we were able to train bigger and smarter models because of that scalability. That was really important for us, and then the whole company kind of paid attention to the way Transformers were changing AI, and started building systems to help make that even better.

Should Google have open-sourced it? I mean, they haven't gotten the most value out of it; others have gotten more value out of that paper.

I can't really speculate on Google's business, or whether they should or shouldn't have done things. I think if Google had not open-sourced that, or had not published that paper, but we started seeing incredible language modeling results, we would have figured out some sort of a model with good scalable properties that could help in this space. There's not just one Transformer variation; I think ultimately the community would have figured something out, because it's so important. But I think Google deserves a ton of credit for doing that work first and for publishing that paper.

So you're building for this world of AI, you see the Transformer model come out, you shift, you incorporate it, and then you start to see the GPTs from OpenAI. Is that the next big moment on this journey where you're like, oh, this could be it? Because it was interesting, speaking again about what people saw from the outside: we all knew that OpenAI was doing text generation, but it didn't really click for most people until it became a chatbot. So what did it look like for you, when you'd been watching this the whole time? What did it look like on your end?

Yeah, well, I'd been watching OpenAI's work in
language modeling since before GPT. I don't know if you recall, they had this sentiment neuron project, which I thought was really cool because it was doing unsupervised modeling of text, and they found that just by showing the model a lot of text, the model had started to understand high-level concepts about text, like, for example, what kind of emotion is being expressed inside of this text. That was a really interesting thing because, like I said, the idea had been around for a long time that we would make a lot more progress as a field if we were able to do unsupervised learning, but actually figuring out how to practically get some value out of just showing a lot of data to a model wasn't very obvious to everybody. So when I saw that unsupervised sentiment neuron project from OpenAI, I thought that was really interesting. And they followed that up with the first GPT paper, which kind of applied Transformers to this, and in the process made a much better sort of text analytics model. The first GPT, GPT-1, was really using a generative model more for classification than for generation. It was more like, can we use a generative pre-trained model to understand text, rather than can we use it to create text, because at the time creating text seemed too hard. And then of course GPT-2 came out and had really astonishing text generation capabilities, and not just that: it had also already started to learn things about the world that were very difficult to teach any AI system before. Remember, they had this story about unicorns in South America being studied by some university professor, and the model could remember that in South America people speak Spanish, and that there's a country in South America called Peru, and that there are mountains in that country. It's like, wow, the
amount of facts this model is able to recall, after only being trained on enormous amounts of text, was really shocking.

Right. What do you think when people say that these models just predict the next word, don't get too excited about it? Because what you're describing seems like something more.

Yeah, I mean, it's always possible to get very reductive with systems. You could say that I'm just meat, right, I'm just a monkey made of meat, and everything happening in my head is also just energy minimization; there's chemistry happening in my head.

It's equivalent to telling me that, yeah, love is not love, it's just a chemical.

I think you're totally right. Yes, it's a chemical, but also there's something more here.

So you're saying, with LLMs...

Yeah. The fact that chemistry is involved in our own consciousness doesn't make our consciousness less interesting to me. And the fact that neural networks are trained to predict the next word, well, that may not be the ultimate way of training them. We're learning how to do this, right? Maybe we'll come up with a better way tomorrow; I'm not attached to that particular way. But I also don't think that understanding a little bit of how something works takes away from the magic.

Okay, so we have just a few minutes left, and I want to ask you a couple more questions. ChatGPT: when that comes out, obviously you had already been pretty impressed by GPT-1 and 2, and we're already at 3 and 3.5 by the time ChatGPT comes out in November 2022, and then this stuff explodes. What was your reaction? What was it like sitting where you were?

It was just extraordinary. I mean, the amount of change that ChatGPT brought to the world was incredible. I thought it was kind of cheeky of OpenAI to release it at the same time as the NeurIPS conference, because usually the AI world is
entirely focused on the cool papers coming out at the conference, but instead the entire world was focused on this chatbot that was doing things no one had ever seen a chatbot do before. To me that was a statement that we were entering a new era of AI, where applied research starts to dominate. ChatGPT didn't come out with a fully fledged academic paper that described exactly what they did to make it so awesome, but because the results were so strong, it kind of dominated the academic discussion. I felt like that was really interesting in terms of a watershed moment for the maturity of the AI industry: it was now possible to create systems that would solve problems in ways we'd never seen before, if we applied some really good engineering and applied research to it. So that definitely changed the world, and since then my world has been just continuously on fire. Every day I open my email, there's a new awesome result. It's really exciting times. And one of my favorite things about working at Nvidia is that we get to collaborate with people from all sorts of companies and institutions, and we get to sort of rejoice in the good work happening around the industry, because at the end of the day it's really exciting to see AI flourish. That's our mission: to make AI flourish everywhere. And so when I open my email and see all these great results, it always makes me happy.

Do you think we're going to get to artificial intelligence that's on par with or greater than human-level intelligence?

I don't really like that question, because I don't know what human intelligence really is. For example, I think that Cardi B is extremely intelligent. She is able to capture the attention of hundreds of
millions of people by doing things that I'm not exactly sure why they're so interesting, but they totally are, right? There are a lot of people who would love to do that but don't have the kind of intelligence that she does to make it work. What is Cardi B's SAT score? I have no idea; it's not very interesting to me.

Oh yeah, there's book smarts and emotional smarts and other forms of brilliance.

There are eight billion forms of brilliance on this planet.

Here's the thing, though: these models are getting good at everything, right? They're making music, they're writing books, they're making videos. So there's a world where you could say it can approximate. Getting just to the baseline of human intelligence is one thing, but there's a chance this stuff can maybe even exceed some of our most talented people across all spectrums.

Well, you know, AI has been smarter than humans at many things for a long time. When I was in high school, Deep Blue beat Garry Kasparov at chess, right? Did that mean that humans stopped playing chess? No. Actually, it changed the way humans played chess. It made humans play chess better, because humans had new tools to learn with; they had AI to help them learn how to play chess. And the reason we play chess isn't to win. We play chess because it's part of our culture, because it's interesting, because we like the challenge, because we like the interaction, because what we're doing as humans is exploring what it means to exist. I don't think AI challenges that. I've been in a lot of rooms with a lot of smart people; I don't think it's necessary for me to be the smartest person in order to have value, or to be interested or engaged in something that's going on.

You don't want to be the smartest person in the room, because you're not learning that way, right?

Exactly. So I'm not threatened by this AI. My thesis is, AI has always been
smarter than us at some things. The number of things it's getting better than us at is getting larger, but that doesn't threaten me. I'm not worried about being obsolete, in the same way that I don't think an oak tree is obsolete. What does it mean for a tree to be obsolete? How do you measure the worth of a tree? Are we going to just talk about how tall it is, or count how many leaves it has, and say, well, this tree is worth more because it has more leaves than that other one? It's just not a very interesting question to me.

Well, it's so interesting that you're going straight to obsolescence, where some might say that if AI equals human intelligence, it's not a bad thing. Maybe it becomes a tool for us.

Yeah, I think it is a tool for us. But it is interesting that the way it's portrayed will often take these conversations to the obsolescence part. I don't really fear that either.

I don't either.

One person whose thoughts on this I really love is Jürgen Schmidhuber, and he has said multiple times that a truly intelligent AI is going to be, first of all, not very interested in living on the surface of planet Earth, because it can beam itself over the radio at the speed of light anywhere, and it can live underground; in fact, other places are better, because there are more resources outside of the crust of planet Earth where we live. So I think we don't have a lot to fear. The scariest thing for me is, are we going to fail to figure out how to use this technology? Because I think we desperately need it. I think our world desperately needs more intelligence, and so that's our mission.

Yeah, I've been emailing with Jürgen trying to get him on the show, so you're reminding me to follow up here. Maybe you can help put in a good word. Bryan, great
speaking with you. Thanks so much for joining.

Great to be on the show, Alex.

All right, everybody, thanks so much for listening, and we'll see you next time on Big Technology Podcast.