NVIDIA's Auto Play and the Future of Autonomous Driving — With Danny Shapiro
Channel: Alex Kantrowitz
Published at: 2024-07-31
YouTube video id: tAjfSE-l1aM
Source: https://www.youtube.com/watch?v=tAjfSE-l1aM
Alex: NVIDIA's automotive VP is here to speak with us about the state of autonomous driving and how the latest AI innovations translate to the car. That's coming up right after this. Welcome to Big Technology Podcast, a show for cool-headed, nuanced conversation of the tech world and beyond. Well, look, the car is a computer, right? Increasingly, the car definitely looks a lot more like a computer, and we're relying on technology more and more within our vehicles, and so of course we had to speak with NVIDIA, and the perfect person to do it with is here today. Danny Shapiro is the vice president of automotive at NVIDIA, someone I've spoken with before privately, but we're so excited to bring him here and have him share everything that NVIDIA is working on with you, and also give a really important picture into the state of autonomous driving, which I think is one of the most interesting tech innovations that we're in the midst of currently. So Danny, great to have you here. Welcome to the show.

Danny: Hey Alex, thanks so much for inviting me. Really excited.

Alex: Great. So let's start with the big question: how far are we from fully autonomous driving?

Danny: You know, they're operating on our roads today. There are vehicles, robotaxis, even some trucks that are truly driverless. They've gotten to the point where the safety driver can be removed, and they're operating, some as part of a revenue program as well for those companies. It's not widespread. In the Bay Area where I am, I see them on the roads all the time. I've ridden in them; it's an amazing experience, it really is transformative. But I think we're seeing a lot of the technology come down into consumer vehicles, so you have driver assistance systems, things that were originally based on the technology for self-driving, but we're bringing it into vehicles like Mercedes, like Jaguar Land Rover, vehicles like Volvos, and a number of other brands all over the world are integrating NVIDIA technology to enable much safer driving
on the streets before we get to fully autonomous driving.

Alex: Yeah, I had an experience in San Francisco last summer where I spent a bunch of time riding in Waymos and then riding in Cruises, and we had the Cruise CEO at the time, Kyle Vogt, come on the show and talk about how his plan was to 10x Cruise rides every year going forward. And of course we know it didn't happen: they ran into some safety concerns, he's out, and it looks like their ambitions have scaled back pretty dramatically, although maybe they ramp back up again. But I'm curious to hear your perspective on what the difference is between a Waymo and a Cruise. I mean, not specifically going into the technical details, but you mentioned that we have autonomy already, so why is it taking us so long, and I know "so long" is relative, but so long to have that spread from one company that does it really well to every car through the economy? Is it a safety thing? Is it a cost thing? What is the roadblock now?

Danny: Well, for us, and I believe a lot of our partners, safety is the primary concern. I think if you look back to 2016, when a lot of predictions were made, everyone was talking about 2020 as the year, and from a compute standpoint and a software development standpoint that really looked realistic. I think everyone underestimated the true complexity of making sure you get it right virtually all the time, and that's what's really challenging. The basics are easy: when you can drive down the freeway and the cars are all going the same direction, there are no pedestrians, and there are good lane markings, that's really a solved problem. But as soon as you introduce the complexity and the anomalies that come from human behavior, people falling asleep on the roads, or driving recklessly, or being impatient, or road rage, whatever it is, it's really hard to predict that, and it creates hazards for the self-driving car. So I think what we're doing is we're seeing a whole new wave
of innovation now. Part of it is based on the same fundamental technology as ChatGPT, these large language models: more of an end-to-end system that's able to look holistically at the whole environment around the car, anticipate and predict what other drivers will do, and understand how to react. Just like ChatGPT, where you can say anything now and it knows how to respond, it's quite amazing. So while ChatGPT is a large language model using a form of generative AI that's text in and text out, what we're doing is applying that same type of algorithmic approach and training of the neural networks, really the technology innovation from that. We're training with video in, and imagery from cameras or sensor data, to be able to understand that environment and then determine the best course of action for that vehicle to safely navigate.

Alex: Yeah, I was going to ask you what the path forward is, given that we know this is something these models are struggling with. So you think this is the path forward, the way that you just described?

Danny: You look at the innovation that's taking place from players out there: companies like Wayve in the UK have put together a really amazing underlying technology that lets the front-facing camera feed be interpreted in a way that the system can communicate to the occupants of the vehicle what's going on outside the car, and also use that to determine what the car is going to do, how it's going to steer, accelerate, or brake. So it can interpret that video feed and explain that there's somebody jaywalking, or somebody ran a red light, or there's a child waiting to cross, or whatever it may be. So this generative AI approach is really, I think, going to accelerate the adoption. Very recently there was a conference called CVPR, that's Computer Vision and Pattern Recognition, which takes place annually; last month it was in Seattle, Washington, and there was a competition that they held. So
it's a research conference, and there were over 400 entries in this autonomous driving challenge, which was basically looking at sensor data and trying to predict the best trajectory for the vehicle moving into the future. NVIDIA submitted and won the challenge. Our research team had developed a new large language model, basically end-to-end training of that sensor data system for then controlling the vehicle, and out of over 400 entries NVIDIA came out on top with this new large-language-model type of approach. I think that's really where we're seeing a lot of innovation now: instead of having a lot of individual neural nets that are trained on lanes, and on signs, and on pedestrians, and all these individual things, a more end-to-end approach looks at the entire environment and can then understand what to do in the case where maybe there are no lane markings. So again, as you look at how dynamic the environments are, how there's no standard where street lights all look the same, or signs are all the same, or even lane markings are all the same, this end-to-end approach, a really new type of neural net, is going to help get us to the point where we don't worry like we used to about those end cases that haven't been explicitly trained.

Alex: I think this is important, what you're saying. So basically, if I have it right, the current set of autonomous driving vehicles, at least the base layer of technology they use, have a bunch of different artificial intelligence systems picking out each different feature in the road, and they combine that to eventually make their predictions of where to go. But you're saying that the cutting edge today is not these individual systems, but one system that looks at everything and then is able to predict.

Danny: I think what's really key about being safe is the combination of diversity and redundancy. You want backup systems, but you also want a variety of different algorithms. So I think we're seeing a layering of different technologies
and these things: you're looking for lane markings, but if you don't find the lane markings, there's also an end-to-end network there to guide the vehicle and determine what to do. And we have neural networks now that are looking at signs and can interpret complex signs. So if you're trying to figure out whether you can park here or not, the sign net will actually be able to read that sign and understand: is it a Saturday, is there street sweeping, or whatever it is. It has context. So the complexity of these networks is quite elaborate as well.

Alex: Trust me, I could have been saved a lot of tickets if I had something like that. So what you're saying is, basically, if people have cars with this technology embedded, they'll be able to pull up to the curb and it'll tell you, no, you can't park there?

Danny: That's right.

Alex: So it's interesting to me that NVIDIA is developing a lot of this technology. I went to the automotive section of your website and thought, wow, there's a tremendous amount of models coming out of NVIDIA. I thought it was largely the carmakers, the Waymos or the Teslas, that were developing the autonomous technology. So how involved is NVIDIA in developing these models itself, and then who are the customers?

Danny: That's a great question. We work with hundreds of automakers, truck makers, robotaxi companies, software startups, the sensor companies, the mapping companies. It really is quite an ecosystem that we've built. We're not creating the vehicles, but we work with those manufacturers. So we offer the compute hardware, that's our DRIVE platform, the brain that goes inside the car. Our DriveOS is the safe operating system that's part of that package. We have a lot of different middleware and libraries they can use to develop their applications, algorithms, the neural networks. That application layer, though, is generally built by our customers, a Mercedes-Benz or a Jaguar Land Rover, a Volvo, an NIO in China, and they can pick
whatever parts of the software stack they want, and in many cases our customers are taking the whole stack and developing some of their own algorithms as well. So there might be a pedestrian detection algorithm from Mercedes running alongside a pedestrian detection algorithm from NVIDIA, and we collaborate on that. So starting toward the end of this year, with the introduction of the new CLA, which has already been announced by Mercedes as their new model, every Mercedes will be built on NVIDIA DRIVE with the software that we've developed and rolled out. It starts with that CLA-class vehicle and then goes through their entire line over time.

Alex: Do you sense a time when eventually NVIDIA will be able to, using some of the technology you've discussed, build the whole software suite that empowers autonomous driving, and then it's just plug-and-play for these auto manufacturers?

Danny: That's essentially what we're doing. We're making the software available, so we're developing the whole stack, and really it's a three-computer problem, as we call it. You have the computer we just talked about inside the car, the DRIVE platform: a very high-performance, energy-efficient, automotive-grade supercomputer. You plug all the sensors into it, and it's purpose-built for a vehicle. It's going to operate in the heat of the desert sun; it's going to operate in very cold temperatures, in Alaska, unlike your phone, which will shut off if it gets too hot or too cold. So we have to make sure the temperature range will work, along with the shock and vibration and the dust environment; all of that goes into making this computer automotive grade. But in addition to that, we make the computer that's used to train the artificial intelligence, that's our DGX, the supercomputer, and we have a huge business in terms of the automakers building out their own data centers or using
cloud providers like Azure, AWS, and Oracle to train. And then we also have our OVX, that's our Omniverse computer for simulation, so again, that's another data center solution for first developing and then testing and validating in simulation before the software even goes into the car. NVIDIA is the only company that has these three computers, and it's really this whole life cycle of developing, testing, and deploying the software. It's a continuous flywheel: just like your phone gets software updates, all these cars are designed to get software updates and get smarter and smarter over their life.

Alex: Yeah, that's wild. You know, Danny, a lot of folks think of NVIDIA as just the chip company for AI training, and I'm always saying it's a little bit more than that. It's just wild that, more than just in this one discipline, it's deeply involved in pushing the cutting edge forward with autonomy.

Danny: Absolutely, I think you're right. People tend to focus on what happens in the car, not realizing there's so much work, and so much development, before you get to that point. What's great for the customers that work with us is that it's a single architecture. It's the same chip technology in their data center that they're training on and that we do testing on. It's called hardware-in-the-loop testing: we actually test the whole software and hardware that goes in the car, and we can test it in the data center first in these virtual environments called a digital twin. We create a model of a city, and we simulate the camera, the radar, and the lidar signals that are detecting everything happening around the car: motorcycles cutting off vehicles, or pedestrians jaywalking. All of that can be tested before we actually put it on the road, so it's very efficient, it's very safe to test it that way, and it really helps create a much better product in the end.

Alex: Yeah, I'm going to talk to you a little bit more about that in a moment, but first I want to ask you a
technology question, because it sort of seems like there are two different schools of thought when it comes to building autonomous cars. One, I'm just going to call it, shorthand, the Waymo school of thought, which is that you need a gazillion sensors and cameras, and your car is going to look like a submarine and cost hundreds of thousands of dollars, but it's going to work pretty freaking well. And the other, I'm just going to call it the Tesla school of thought, is that you just need a few cameras, and eventually you'll be able to train machine learning models to the point where you can run your Tesla autonomously without a lidar. Who do you think is right?

Danny: I think they both have merits, and the reality is, as I mentioned before, this diversity and redundancy is really how you get higher levels of safety. Cameras are great, but they don't work in all conditions, and when you combine radar, when you combine lidar, you have the strengths of many different types of sensors, and they complement each other. So I think we see, in the case of Waymo, a higher level of security and confidence, and that redundancy that comes from the system, and so they're operating fully autonomously with no drivers. Whereas, I do have a Tesla, and it's quite remarkable being camera-based, but every once in a while I still need to jump in and grab the wheel. So it's not there yet. Can it get there? I think it probably can, eventually, but it's not there today.

Alex: How was the latest Full Self-Driving update?

Danny: Like I said, it's pretty good. I rely on it every day. I use it, and it is still considered beta, so I'm watching it. I'm in the industry, so I'm curious about each software rev and what it can do and what it can't do, but it's quite remarkable.

Alex: There was a report in the Wall Street Journal this week that talked a little bit about some of the deficiencies of the way the Tesla operates
and points exactly at this issue, right? It's decided not to use lidar, for instance; some of them have radars, but Elon basically wants to do everything as cheaply as possible, right? You're seeing it in SpaceX, you're seeing it in Tesla. And because these systems are just good enough that people trust them, we've seen some of the tragedies happen. There was a video the Journal put out that showed, effectively, a car driving at night, and there was an overturned tractor trailer blocking the road, and because the car's computer vision models hadn't seen enough examples of, effectively, the dark underbelly of a truck on a dark evening, it couldn't pick it up, and the car went right into the tractor trailer. But thinking about your answer, you think this is just a temporary thing, that eventually they'll be able to figure out these sorts of outlier scenarios over time?

Danny: I think if the models are able to detect that there's a physical obstacle, even if they don't know what it is, then the car will be able to take the right action. But again, this is where having the diversity of sensor data becomes a really big differentiator.

Alex: Right. Now, talking again about what NVIDIA is doing internally, it's pretty wild: you're actually simulating collisions. Is this something that you do in that sort of virtual world that you talked about?

Danny: Absolutely. What we try to do is create the scenarios. It's less about creating accidents than about creating the scenarios to ensure that the systems will have a safe outcome. There's no way to ensure that there are zero accidents in the world, right? There's always going to be crazy stuff that happens on the road, and no human driver could avoid something falling right in front of the car, or somebody getting pushed in front of a vehicle, or something like that. But what we want to do is be
able to anticipate all that and be able to avoid it, or mitigate what would happen in one of these hazardous scenarios. One of the things we're able to do, actually, is record drives that we're taking and then use them as input to create a huge range of different scenarios, permutations on them, and test the software. So we can actually capture cars in a scene, make any one of those cars the autonomous car, and see how it would behave. So we're building a massive database of scenarios and ways to test and validate that the technology is good. The other thing we can do is take accident reports, and now, using these large language models, we can input those accident reports and create scenarios from text input explaining what happened, or from a map or something like that.

[Recording paused briefly while a window is closed to cut background noise.]

Alex: Okay, cool. So then, how important is simulation in training anything having to do with autonomous driving?

Danny: Simulation plays a really big role in ensuring the safety of the system. First of all, driving around collecting data, you're rarely going to see the dangerous scenarios, the hazards, the things that very rarely occur; you're not going to capture them in your data collection. So we need to use simulation, and what we call synthetic data generation, to create those kinds of scenarios. We can create fake potential hazards: things falling off of trucks, people running across the street at night, somebody running a red light, whatever it may be. We can create that data to augment the real data for training the AI, and then we can actually simulate all these dangerous scenarios to ensure that the system will do the right thing. And the benefit of using simulation, too, is that it's
repeatable, so we can adjust the software and test something that maybe didn't pass a month ago, run it through the same scenario, and see, yes, we fixed that. There may be situations, and it often happens, where the sensors are blinded by the sun as it's setting: the sun is coming right into the driver's eyes and into the camera's eyes, and you only have a few minutes a day where you can actually capture that data, as well as test. In simulation, it can be sunset 24 hours a day. We really can control that environment, and we can create rain, snow, fog. It's really remarkable, and it gives us the ability to test things that, again, may never be seen in the real world. If you're driving a real vehicle trying to test it in autonomous mode, you may never know; in simulation, we can be sure it's going to work before we release.

Alex: And you also have a system you're working on that appears within the car and helps you navigate tough things. For instance, it will look around, it's called L, I believe, and if you're in New York and you want to make a right on red, it will say, hey, listen, right on red is illegal here. Or if you're in Mexico and you're taking a hairpin turn, there are some traffic patterns that allow you to go outside the lane to take a peek and then come back in, and it will help you figure out what's going on with that. I mean, I'm about to go to Ireland, and I definitely need something like that to tell me what side of the road to be driving on. So talk a little bit about the progress there.

Danny: I think you're referring to a video series we put out called DRIVE Labs. What we do in each episode is pick one little piece of technology that's part of this huge software suite we're building. Some of these things come out of our research team, so they might not be in our software today, but it's sort of
a preview of what's coming out soon. What we're able to do is train these systems and create these different modules, essentially, that become part of the software stack our customers can use. How they decide to bring it to a customer is the decision of a Mercedes, or a Volvo, or others, but the technology is there so that we can train it on the laws of a particular region. The signs, the light signals, and the lane markings in different regions of the country, or around the world, differ quite a bit. So we're basically creating these large language models for each region that understand what the car is allowed and not allowed to do. If it's been implemented in the car, it could give you an alert if you try to turn right on red in New York, for example, which is perfectly fine to do in California.

Alex: All right, let's talk about robotics. To begin with, you were talking a little bit about how, when you're driving around with a car that has all these sensors, you start building a world model. Is that helpful for the development of robotics, and is NVIDIA already taking what it's learning from cars and putting it into robots, and what it's learning from robots and putting it into cars?

Danny: Absolutely, yeah. We have an entire robotics group that sits alongside our automotive group, and there are a lot of related aspects to the problems. If we think about driving a vehicle, the whole thing is that we don't want to hit anything, right? We have to understand the environment, we have to sense it, we want to plan how we're going to maneuver, and then we act: we actuate the vehicle and drive it, but we don't want to touch anything. Robotics is almost the flip side of that: robots need to interact, they're going to grab something, they want to touch something, but they have to do it very delicately. But the ability to sense, plan, and act is really
the same. So much of what we're doing is related: an autonomous car is really a form of robot, it's got wheels and it drives around. Some robots we're doing, in terms of factory robots, might be autonomous machines roaming around a warehouse or a factory; in others, there's an arm that's moving around. I think the key thing is, again, this three-computer model, of training the systems, simulating them in digital twins, and then deploying the software, is the same in both of these cases. We're doing a lot of work in factories, all types of factories, but also factories to build cars. Companies like Mercedes-Benz and BMW are working with our teams to develop the factory as a digital twin first. It's a full simulation of the entire factory: all the robots, the workers, the assembly lines, the trucks pulling up, the logistics of moving parts around. All of that is modeled and run digitally in simulation before they even build the factory. The benefit is that you're not halfway into construction when you realize, oh, wait a minute, the arm that has to swing here to take the body of the vehicle and rotate it around isn't going to clear the ceiling; we need to raise the roof another two feet. You plan all of that in advance, before you ever build the factory, and you can really optimize your layout. So digital twins and AI are a huge part of planning how robots are actually going to interact, and then we train those robots in simulation so that you can take that software, just like you do in the autonomous car, and load it into the robot.

Alex: Yeah, it is so interesting thinking about it: car, your one mission is don't touch anything; robot, your mission is interact. But I imagine both types of technologies are building models of the world, trying to figure out what's going on. Is there a shared universe that both the robotics and the automotive
elements of NVIDIA work on?

Danny: Absolutely, there's a lot of shared technology. This is the strength of NVIDIA as a company: we're a relatively small company for the impact we have on industries around the world. The engineering team that's developing a lot of the core hardware and software is leveraged across groups, from automotive to robotics to healthcare. An example: we're developing a pedestrian detection algorithm, and that same core tech can be used to detect cancer in an X-ray or a CT scan.

Alex: Just computer vision?

Danny: Well, it's AI, it's deep learning, so it's those same techniques, just different patterns of data. It's trained on different types of data, different modalities, but the concepts are essentially the same, and that's how we've been able to go to market in many different industries with the same basic architecture, hardware and software, with purpose-built applications or devices. Again, the core technology is leveraged across so many different groups. If you're looking for where to drill for oil, you have seismic data that you can apply deep learning to, to figure out where a pocket of oil is buried miles below the Earth. So it's the same conceptually; the applications and markets are totally different.

Alex: Right. Now let me ask this: how does the company incentivize these divisions to cooperate? Because I imagine you're the automotive group and you have one set of goals you're working toward, and there's the robotics group with another set of goals they're working toward. Now, if you work together, you could probably both help each other to the point where you both exceed those goals, or do better than you would have in a silo. But a lot of companies work in silos, whether it's by design, thinking about Apple: if you're working on Face ID and you're working on automotive road detection, don't talk to each other, you're on two different projects, and maybe, wink wink, that's
why the automotive project within Apple failed. And then other companies incentivize silos just through the incentives in their performance reviews: if you came up short, even though you were collaborating, you came up short, you get this grade, you don't advance. So how does NVIDIA address this?

Danny: Those things you describe are not the culture of our company. One of the principles we were founded on is one team. It's all about NVIDIA first, and the individual group comes second. In fact, the notion of the group is kind of dynamic, and we really don't have much of an org chart in the company. Jensen says the mission is the boss. So we have these virtual teams, there's a lot of cross-functional work that goes on, and people have different roles and responsibilities and might be working on a variety of different things. It's really all about what's the best thing for the company, and working across groups is really rewarded. Part of the culture is that we want to help each other, so the whole company succeeds, as opposed to, hey, this is my thing, I own this, I'm just going to focus on this. It really is part of the culture of the company, and it's embraced throughout. And Jensen is constantly looking: if he finds two different groups doing related things, he says, you guys get together and figure it out together. We don't need two separate programs going on here; let's pick the best. So there really is huge collaboration throughout the company.

Alex: So I'm wondering whether another collaboration is going to be, or already is, yours with the groups working on generative AI models. Because the biggest limitation with generative AI has been that it doesn't really understand the real world, at least with text. Now, you might push back on me on this, but this is what we've heard on the show, which is
basically that there's only a small amount of human knowledge that's been codified in text, and the rest is just out there in the world: interacting, using your eyes, figuring out what gravity is. You can't really understand that from text; you can read about it, but you don't really experience it until you're out there in the world. So I'm curious how these real-world interactions, whether it's something like an autonomous car or something like a robot, might be used to advance the knowledge that we have with large language models today.

Danny: I think the thing that we're seeing, since you bring up ChatGPT, is that a lot of the information it gives back to you sounds good, but it may or may not be real, right? There are hallucinations in those kinds of systems. So what we're really working on is how you create these foundation models, how you put guardrails in place, how you create a system that can be relied upon. These retrieval-augmented generation systems, the RAG systems, get trained on specific data sets. I think that's really what we're doing: taking the large language model but creating smaller versions that are purpose-built for specific applications. So as we think about having some kind of an avatar in the car that's your concierge, we're going to train it specifically on everything to do with that car, that car brand, that car model, and the kinds of things you would want to do in the car. It's not necessarily going to know about current events, it's not going to know about other kinds of things, but it's going to be able to handle what you need while you're driving, and be an interactive thing that's just a great experience. And in healthcare there may be other things: we have assistants that are going to be trained on known medical information, but not trained on
things out there on the web that have to do with medicine. So again, we're going to start with the data that's used to train these systems and ensure that it's a much better experience for the user.

Alex: Right, but how important do you think a merging of those two knowledge bases will be: the understanding of the real world that automotive and robotics can potentially provide, and the understanding of the text?

Danny: I think it's happening. Part of what we do is model physics, right? That's the key thing. We can model gravity, we can model how things interact, we can model the motions of different types of materials, or fluids, or things like that. It comes down to mathematical modeling of the real world, and that's a big part of what we do.

Alex: Yeah. Have you thought about, when it comes to the electric vehicle field, the immense progress that China is making? I've heard that you can buy an electric vehicle in China for $10,000, whereas in the US, if you could do that, it would be a sea change. So what's your view there?

Danny: We work with a lot of companies around the world; we have a number of customers in China. They're doing remarkable work in terms of the development of EVs, but also driver assistance systems and autonomy with robotaxis. It's a big market, the biggest in the world, and it's a big market for NVIDIA, so we work closely and have teams over there. We work in Japan, in South Korea, throughout Germany, the UK, and of course North America as well.

Alex: Is there anything the US can learn from the China market to price our cars better?

Danny: I'm not an expert on battery technology, but I think there are certainly economies of scale and some things they're doing to bring the cost down. I think there's also a lot of government support they're getting in China as well.

Alex: Okay, I guess last question for you: how long do you think
it's going to be until, anywhere, let's say in the US, you can open an app and hail a robotaxi?

Danny: I can do it today.

Alex: Anywhere, though? I can't do it in New York.

Danny: Anywhere? I mean, I don't know if it's always going to be anywhere, right? There's got to be a market for it, but I think cities will make a lot of sense, and the suburbs in some areas, yes; in the rural areas it's going to be a challenge. You can't get an Uber everywhere today, right? You can't get a taxi everywhere. But I think in major markets it's very soon.

Alex: Okay, can't wait for it. Danny Shapiro, thanks so much for joining. Great to see you.

Danny: Great to see you too. Thanks, Alex.

Alex: All right, everybody, thanks so much for listening. We'll be back on Friday with Ranjan Roy breaking down the week's news. We'll see you next time on Big Technology Podcast.
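The "diversity and redundancy" idea Shapiro keeps returning to, several independent sensing channels cross-checking each other so no single failure is fatal, can be sketched in a few lines. This is a toy illustration under invented names, not NVIDIA's DRIVE stack; real systems fuse calibrated confidence scores, not booleans.

```python
def fuse_obstacle_votes(camera: bool, radar: bool, lidar: bool) -> bool:
    """Declare an obstacle when a majority of independent sensors agree.

    Diversity: the channels fail in different conditions (cameras at
    night, lidar in heavy rain), so their errors rarely line up.
    Redundancy: one sensor dropping out cannot silence the system.
    """
    return sum([camera, radar, lidar]) >= 2


# A camera blinded by glare is outvoted by radar plus lidar:
assert fuse_obstacle_votes(camera=False, radar=True, lidar=True) is True
# A single spurious radar return does not trigger a phantom brake:
assert fuse_obstacle_votes(camera=False, radar=True, lidar=False) is False
```

A majority vote is only one fusion policy; a safety-biased system might instead brake on any single high-confidence detection and use agreement only to suppress false positives.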
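The contrast Shapiro draws between a stack of single-purpose neural nets and one end-to-end model can be caricatured like this. Every name below is hypothetical, and the "nets" are trivial stand-ins; real systems map sensor tensors to trajectories, not dicts to strings.

```python
# Modular stack: separate detectors combined by hand-written arbitration.
def detect_lanes(frame: dict):
    return frame.get("lanes")            # stand-in for a lane-detection net

def detect_pedestrians(frame: dict) -> list:
    return frame.get("pedestrians", [])  # stand-in for a pedestrian net

def modular_plan(frame: dict) -> str:
    """Hand-composed logic: every edge case needs an explicit rule."""
    if detect_pedestrians(frame):
        return "brake"
    if detect_lanes(frame) is None:      # unmarked road: explicit fallback
        return "slow"
    return "follow_lane"

# End-to-end: one learned policy maps the raw observation straight to a
# command, so an unmarked road is just another input pattern rather than
# a special case someone had to anticipate.
def end_to_end_plan(frame: dict, policy) -> str:
    return policy(frame)

print(modular_plan({"lanes": None, "pedestrians": []}))  # slow
```

The point of the sketch is structural: in the modular stack, the arbitration rules grow with every new edge case, while the end-to-end policy concentrates that complexity into training data, which is why simulation and synthetic scenarios matter so much for the latter.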
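The "record a drive, then permute it" workflow described for simulation testing amounts to expanding one seed scenario over a grid of conditions. The parameter names below are made up for illustration and are not Omniverse or DRIVE Sim APIs.

```python
import itertools

# Controllable simulation axes: conditions rarely captured on real drives.
WEATHER = ["clear", "rain", "fog", "snow"]
LIGHTING = ["noon", "sunset", "night"]
HAZARD = ["none", "jaywalker", "red_light_runner", "road_debris"]

def expand_scenarios(seed: dict) -> list:
    """Turn one recorded drive into a full grid of test permutations."""
    return [
        dict(seed, weather=w, lighting=l, hazard=h)
        for w, l, h in itertools.product(WEATHER, LIGHTING, HAZARD)
    ]

variants = expand_scenarios({"map": "urban_4way", "ego_speed_mps": 12})
print(len(variants))  # 4 weather x 3 lighting x 4 hazards = 48 variants
```

Because the grid is deterministic, the same 48 variants can be re-run after every software change, which is the repeatability benefit mentioned above: a scenario that failed a month ago can be replayed bit-for-bit to confirm the fix.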