AWS VP of AI and Data Swami Sivasubramanian — GenAI's Growth Potential
Channel: Alex Kantrowitz
Published at: 2024-05-01
YouTube video id: DmUjdOXGw9c
Source: https://www.youtube.com/watch?v=DmUjdOXGw9c
We're going to talk about some of the big pressing issues in generative AI, including whether generative AI has that much more room to run given the constraints on data, compute, and even energy. We are joined today by a very special guest: Swami Sivasubramanian is here. He's the VP of AI and Data at AWS, and we're so lucky to have yet another executive from Amazon Web Services here. Swami, great to see you. Thank you so much for coming on.

Hey, thanks for having me. Really looking forward to our conversation.

Me too. There was a moment in a podcast that I watched and listened to recently where Mark Zuckerberg was talking about the potential constraints on the further building of generative AI systems, and on training and inference, and he said, "You've got to ask Amazon." And here we are, with Amazon. So let me set it up for you, and then I'm going to ask you the question. He's on this podcast with Dwarkesh Patel, and they're talking about whether we're going to hit a wall with LLMs. Because if you think about Llama 3, which Meta just released and which we just talked about, it trained with 10 times more data and 100 times more compute. And Dwarkesh was trying to get Mark to say, well, where's the limit here? You're pressing on compute resources, you're pressing on data resources; are you eventually going to hit an end? Zuckerberg, of course, says that energy is the real question, that if you're going to train these models and run these models, you need very big data centers. He talked about the fact that you might even need a gigawatt data center, which is effectively as big as a nuclear power plant. And Dwarkesh, and this is the point here, and then we're going to let you talk, Dwarkesh was like, didn't Amazon do this? And Mark Zuckerberg says, you have to ask them. So I'm curious what you think about the constraints of data when it comes to training large language models, and then maybe you can talk about your efforts, since Mark Zuckerberg suggests that we ask you.

All right. First of all, just to give some context: AI is, for us, the most transformative technology, with the most transformative potential. The reason we are having this moment and all these conversations is the advent of Transformer technology, the new neural-net architecture that helps us build these large language models that can learn from huge amounts of data but also scale up. That means you can throw more compute and more data at them, and they naturally get better and better. That is why AI is suddenly having its moment.

Now, to get to your question: where does it stop? This is what researchers call scaling laws. Are we going to keep throwing more and more compute and more and more data at these models, or is there no more compute or no more data left in the world? A few things I suspect are going to happen. One, I don't think we are anywhere close to hitting that wall yet. I do think we are going to see a lot of net-new innovation in optimizing how we train these models in parallel and getting better utilization, so that we can continue to build these models really well.
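For context, the scaling laws Swami refers to are often summarized in the parametric form fit by Hoffmann et al. (2022, the "Chinchilla" paper). A minimal sketch, using that paper's approximate fitted constants purely as an illustration (these are not AWS numbers):

```python
# Chinchilla-style scaling law: predicted pretraining loss as a function of
# parameter count N and training tokens D. Constants are the paper's
# approximate fitted values, used here only for illustration.
E, A, B = 1.69, 406.4, 410.7
alpha, beta = 0.34, 0.28

def predicted_loss(n_params: float, n_tokens: float) -> float:
    """L(N, D) = E + A / N^alpha + B / D^beta  (Hoffmann et al., 2022)."""
    return E + A / n_params**alpha + B / n_tokens**beta

# Doubling the data at a fixed model size shows the diminishing returns
# behind the "are we hitting a wall?" question:
for d in (1e12, 2e12, 4e12):
    print(f"{d:.0e} tokens -> predicted loss {predicted_loss(7e10, d):.4f}")
```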
At AWS, for instance, we have invested in technologies such as Trainium 2 chips, which can deliver up to 4x faster training while improving energy efficiency by up to 2x, as an example. So innovation in compute is one dimension.

The second thing is that right now, many of the scaling laws are built on the assumption of a certain kind of architecture. But if you look at how these models are trained, you are already starting to see new kinds of architectures that are going to change how these models scale in the future. I suspect that is going to reshape the scaling laws as well, and that is going to drive net-new efficiency gains.

So compute, and the architecture of the model. But you still didn't answer my question about energy, so let me put a fine point on it. What is the biggest data center you've built to train these models, in, I don't know, megawatts or gigawatts? And do you think there comes a point where energy becomes the biggest constraint on these things? Because that is exactly what Zuckerberg is saying, and this is a guy who has 650,000 Nvidia H100 GPUs, or the equivalent, in his back pocket. So I think he has an idea of what the resource constraint might be after you take care of compute.

I think compute and energy, along with data, are the three dimensions where the scaling constraints are going to be.

But focus on the energy part.

Yeah, energy is a constraint, but that is why at AWS, when we offer compute, we are not just offering a server; we are offering a cluster that comes with all of it. This is an area where we have always invested a lot, not just to procure more energy but also to be more energy efficient. What is going on in the industry, especially in the AI space, is that the rate of scaling and demand is so incredibly high that energy consumption needs are also increasing in a big way. We are innovating in a big way, not just in building our data centers and capacity; we have made lots of announcements, including today, on new regions we are expanding, including in Indiana. But the other aspect that is equally important is how you make this energy sustainable, and that continues to be a big priority for us.

Right, everybody agrees that making it sustainable is important, but I want to get back to this constraint. Is there legitimacy in this idea that the industry is coming up against an energy constraint?

Yes, there is. We are going to be compute- and energy-constrained when it comes to this. But I do think that constraint is contingent on certain things around how these models are architected. So currently all of this is true, and this is where the cloud is making it better; that is the nuance I'm talking about. And then net-new architectures are going to make it better, and net-new chip designs are also going to make it better.
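As an aside, rough back-of-the-envelope arithmetic connects the two figures in this exchange: 650,000 H100-class GPUs and a roughly one-gigawatt data center. The 700 W figure is NVIDIA's published TDP for the SXM H100; the overhead multiplier for cooling, networking, and host machines is an assumption:

```python
# Back-of-the-envelope only; the overhead factor is a guess, not an AWS figure.
gpu_tdp_watts = 700          # NVIDIA's published TDP for the SXM H100
overhead = 1.5               # assumed cooling/networking/host overhead
datacenter_watts = 1e9       # one gigawatt, roughly one large reactor's output

gpus_supported = datacenter_watts / (gpu_tdp_watts * overhead)
print(f"~{gpus_supported:,.0f} GPUs")   # on the order of a million GPUs
```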
So we should talk a little bit about the way LLMs are going to be re-architected; that's an important issue here. But first, going back to these two legacy problems of data and compute, think about the way these models have grown. Just the Llama example: 10 times more data, 100 times more compute. They had clusters of, I think, 24,000 or 32,000 Nvidia chips put together, some crazy number. And that's from some of the healthiest compute resources in the whole industry. On data, they also effectively started running out. There was a story in The New York Times where, internally at Meta, they were saying if we don't get more data, we're never going to catch up with ChatGPT, and they even started to consider buying Simon & Schuster, or spending $10 a book, to get the data necessary to train. So here's the question for you, because you're in this position. Again, it's so cool to speak with someone at AWS, because you have all these resources and you know how the data center works; this is your bread and butter. Before we get into the new architectures, quickly: is the traditional way of training these LLMs going to hit a wall? It seems like there's fairly little data left to use, and the compute might be running out also. So is it not a nice-to-have but a must-have that new architectures are necessary?

I actually think new architectures are absolutely necessary in the future, and there are already hybrid architectures evolving: state-space models, and hybrids between state-space models and Transformers, and so forth. But there is also still a lot of room left in how these models are trained with even the current data, and in how they get reinforced. You are already seeing, with Meta for instance, that it is all about high-quality data rather than simply lots of data: being able to cherry-pick and do the right cleansing. That's number one.

The second thing is that it is not about bigger and bigger models. Your view is purely on the model providers who are trying to build models. But take an enterprise that wants to use these models to build a chatbot or an agent, a mortgage provider like Rocket Mortgage, and wants to put it in production. It's not about picking the most capable model in terms of model size. They want to take a model, customize it with their data using different techniques, and get a business outcome. For them it is not the most capable model but the most flexible model, the one that achieves the trade-off among their cost, accuracy, and latency needs. This is where I think a service like Amazon Bedrock helps: instead of saying the answer is always the biggest model, it makes it super easy to pick from a variety of state-of-the-art models, whether from Anthropic, Amazon, AI21, Cohere, Meta, or Mistral, then makes it super easy to evaluate them for your use case, and then to build retrieval-augmented generation with the right set of guardrails. That is going to be a big deal.
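To make the "pick from a variety of models" point concrete, here is a minimal sketch using the AWS SDK for Python (boto3) and Bedrock's Converse API, which offers a uniform request shape across providers. The model IDs and region are illustrative assumptions; a real account needs model access enabled, and available identifiers vary by region:

```python
# Minimal sketch, assuming boto3 is installed, AWS credentials are configured,
# and access to the listed models has been granted in the Bedrock console.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def converse(model_id: str, prompt: str) -> str:
    """Send one user message to a Bedrock model and return the reply text."""
    response = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return response["output"]["message"]["content"][0]["text"]

# Swapping providers is a one-line change, which is the flexibility point.
for model_id in ("anthropic.claude-3-sonnet-20240229-v1:0",
                 "meta.llama3-70b-instruct-v1:0"):
    print(model_id, "->", converse(model_id, "Summarize our refund policy."))
```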
And that's why we are already seeing tens of thousands of customers leveraging Bedrock. The key thing, and it's super easy for all of us to get enamored with what the most capable model is, but to put these LLMs to work solving real-world problems, you have to take some of these models and customize them. The end result is not the biggest model; it is a much more customized, smaller model, or a distilled model, that solves a specific business problem. That's why this is important.

Well, I sort of disagree with you on that. I agree that the application is important, but the thing that's most important, of course, is how this stuff continues to evolve and get better. You wouldn't be satisfied just saying, let's stop with what we have now.

No, that is not what I'm saying. I'm saying that from the perspective of customers who are building and innovating with LLMs, you're not hitting a wall because there is not enough data. I'm pointing out that for the millions of developers trying to build amazing applications, what is important is taking these LLMs and being able to customize them. There is still so much innovation and value left there, because those sets of data are not accessible to any of the model providers. There is a lot of net-new innovation possible, because those data live in secure environments that are not accessible; most of these models are trained on public data, data that is publicly available or can be procured.

But you're not going to get a step change if you start training with Rocket Mortgage data. You might just get a better application for Rocket Mortgage; you're not going to advance the field of AI by getting mortgage data in there.

I'm not saying it's not important.

Yeah, go ahead.

Each of these enterprises, each of these startups, is going to see a step change for their application. That I agree with. All I'm pointing out is that this is an equally important element that should not be overlooked. Imagine every enterprise suddenly got 10x better at handling customer claims, or their developers became 5x more productive; that is equally important. I also think that as the AI field continues to get better at training these models, or at changing the architectures to be more efficient in how they learn from data and how we reinforce them, we are constantly going to reinvent ourselves. Just look back six years, even before Transformers came about: we thought deep learning was an order of magnitude better than how machine learning was traditionally done, and then we found a new kind of scaling architecture with Transformers. I have no doubt we are going to start seeing some such things again soon.

Now, going back to the way these models evolve: can you talk a little bit about how the smaller models perform better than the last generation of bigger models? Is it just that the training is different? You talked a little bit about cleaning the data that goes in. I'm curious what would be involved in that, and how these smaller models are performing better than the bigger models.
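Swami's mention of "a distilled model" refers to knowledge distillation, where a small student model is trained to match a larger teacher's output distribution. A minimal PyTorch sketch of the classic loss (Hinton et al., 2015); the tensors here are random stand-ins, not a real training pipeline:

```python
# Illustration only: random logits stand in for real teacher/student outputs.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions."""
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_soft_student = F.log_softmax(student_logits / t, dim=-1)
    # The t**2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_soft_student, soft_teacher,
                    reduction="batchmean") * t * t

student = torch.randn(4, 32000)   # batch of 4, 32k vocabulary (illustrative)
teacher = torch.randn(4, 32000)
print(distillation_loss(student, teacher))
```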
There are a few things. The notion of what is small and what is large is relative, because our definition of small and large keeps changing. But equally important, there are multiple ingredients in what makes a great model: how much compute we have thrown at it, how much data we have used to train it, how it gets reinforced so the model learns to deliver the right responses to certain questions, and then how you iterate based on that feedback. These are some of the ingredients that go into how a model provider trains these models. Even if someone is training a much smaller model in terms of parameters, if they have trained it on a huge number of tokens with better recipes for reinforcement, there is a case to be made that they can get higher-quality results. Second, if you look at interesting research coming out of Stanford and elsewhere on novel architectures, they expect to see equally compelling models that learn faster from the same amount of data, with potentially even 10x cheaper inference. That's why I'm very optimistic about the future of novel model architectures, which have the potential to disrupt this space in a big way, and that's why I keep saying no one model will rule the world in the future.

Let's talk a little bit about your decisions about partnering versus building in house versus offering other models. You have a large team building your own large language models in house at AWS, and you also have a very big investment in Anthropic. Can you talk through the logic of going at it both ways?

Yeah. If you look at the history of AWS, we have always believed in customer choice. When I started the database team, the world used to think databases had to speak SQL, and we pioneered the idea that customers in fact want different databases for different use cases. We ended up pioneering that; AWS now has a diverse portfolio on the database front, and the rest of the industry completely pivoted that way.

Now, coming to the generative AI world: internally, as we were building a bunch of these GenAI applications, we saw what it takes to put an application together end to end. While there is a notion of one powerful model at the heart of some of these things, there are actually a bunch of different machine learning models that you need. You need a model to ingest and vectorize your data for retrieval-augmented generation; you need a separate ranker model; you need a separate model for guardrails; you need API orchestration for software agents and automation. And each of these models varies in capability, price, and performance.
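A toy sketch of that multi-model composition, an embedder for retrieval, a reranker, a guardrail check, and a generator, with every "model" replaced by a stand-in function. The point is the shape of the pipeline, not any specific AWS API:

```python
# All four "models" below are stand-in functions for illustration.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in embedding model: a pseudo-random vector per string."""
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    return rng.standard_normal(8)

DOCS = ["Refunds are issued within 5 business days.",
        "Standard shipping takes 3-7 days.",
        "The warranty covers manufacturing defects for one year."]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Vector retrieval: rank documents by dot product with the query."""
    q = embed(query)
    return sorted(DOCS, key=lambda d: -float(embed(d) @ q))[:k]

def rerank(query: str, docs: list[str]) -> list[str]:
    """A real reranker model would reorder by relevance; this is a no-op."""
    return docs

def on_topic(query: str) -> bool:
    """Guardrail stand-in: keep the assistant on customer-service topics."""
    return "weather" not in query.lower()

def answer(query: str) -> str:
    if not on_topic(query):
        return "Sorry, I can only help with order and policy questions."
    context = "\n".join(rerank(query, retrieve(query)))
    # A generator LLM would be called here with the retrieved context.
    return f"[prompt context]\n{context}\n[question] {query}"

print(answer("How long do refunds take?"))
```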
This led us to believe that no single model will be ideal for every use case, especially because we are so early in this space and customers are still learning and experimenting with different models. That's why we started Bedrock on the premise that we would offer not just our first-party models but also third-party large language models and other foundation models: Anthropic, Meta, Mistral, Cohere, Stability AI, AI21, and others. Three months after we launched Bedrock and made it generally available, we saw many of our customers using more than one model, because they were still learning. I'll give an example: Thomson Reuters Labs recently said that the ability to use a diverse range of models was a key driver for them to innovate quickly, because the space is evolving so much. What is important is not just the individual model but also the ability to customize for various use cases, and that's where we are starting to see these patterns emerge strongly.

Are you happy with the Anthropic partnership?

Anthropic has been a great partner, and they are the producers of some of the most popular models out there right now. Especially if you look at what they have done with Claude 3: they have shown amazing innovation in reasoning capability, and they have really been a thought leader in responsible AI as well. So we are very excited about the partnership, and their models on Bedrock have been incredibly powerful and popular with our customers. On the other end, Anthropic will also use AWS Trainium and Inferentia chips to build, train, and deploy their future models. We have been super excited to see the art of the possible, and we are already seeing huge adoption from customers through our joint innovation.

Fascinating. I have a couple of questions for you based off what you just said. First of all, you talked about Trainium chips, and you said they can train a model four times faster. I'm curious to hear your perspective on how things are going there. Obviously, the reason people like to go with Nvidia is the software that they like to use, so I'm curious how Trainium stacks up. And then there's this other new chip called Groq, not the Elon Musk Grok, but Groq with a Q, that is apparently running these models unbelievably fast. I'd be eager to hear your perspective on Groq as well.

First, on Trainium: Trainium 2 chips are designed with the sole purpose of training these large language models, and they are highly optimized for the training use case. As I mentioned, Trainium 2 chips can deliver up to 4x faster training than first-generation Trainium chips, while improving energy efficiency as well. We also have a separate instance family optimized for running inference on these models, called Inferentia. The EC2 instances powered by our Inferentia chips deliver up to 50% better performance per watt over comparable EC2 instances and can reduce cost by up to 40%.
Internally, when we are training these models, we are seeing great performance. And while I can only share some early customer feedback, because Trainium 2 is not generally available yet, it has been super compelling and super promising, because the scale at which it can train has been very, very fascinating.

Thoughts on Groq, or can you not comment?

I have not looked at it deeply enough to offer much on it.

You've talked a little bit about reasoning through our conversation, and there was this whole kerfuffle around OpenAI, where they developed this Q* model that apparently had reasoning in it, and we still never got the full story there. So what is going on? Because from my understanding, reasoning is a different skill, a different type of teaching, than just spitting back information, because the model goes through the steps it has to take in order to come back to you with an answer. I'd love to hear your perspective on what the cutting edge of reasoning looks like, and it seems like it's already in production in some of the models we're talking about.

Yeah. When it comes to reasoning, let's talk about where the state of the art is, because reasoning means different things to different people. It's the ability to take a problem that has logical steps and break it down into a sequence of actions, which is what these models do: here is the plan I propose, broken down into, say, five or ten steps; then get feedback, quickly iterate, and produce the work. In a domain like software development, these models are starting to get pretty good. There are benchmarks, like SWE-bench in the software domain, and you can already see some of these models getting pretty good scores on software development tasks. For instance, take our GenAI assistant, Amazon Q. It can take a task that a developer would otherwise spend today or the next couple of days on, say, "change this error condition for this use case to that error condition", which is the description they put in their Git repository. Give it to Q, and it will consume the task, generate a plan, and then get feedback from the developer. Say the developer replies that the condition under which the code needs to change isn't right, can you do it this way instead; Q will update the plan and generate the diff in the code, saying, here are the changes I'm planning to make, does that look okay? And then it can iterate from there. All of this is possible today. I think this reasoning capability is going to get better and better: generating a plan, iterating with a fellow expert, a human, who assists and gives input on the plan, and then executing. It is a compelling capability if you look at it.

I talked about software upgrades being done by Q. The reason that is possible is, again, not just the reasoning capability of these LLMs; it is that we can now couple the reasoning capability with compilers and have them reinforce each other. We don't just ask the LLM, "Here is JDK 8 code; convert it to JDK 17." Instead, we say, here is the thing I'm trying to accomplish, and then we also leverage compilers and other technologies to play off each other and iterate back and forth.
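A hedged sketch of that "LLM plus compiler" loop: propose a change, compile, and feed the real diagnostics back to the model until the build is clean. `llm_propose_fix` is a hypothetical stand-in for whatever model call a production system such as Q's transformation feature would make; nothing here is Amazon's actual implementation, and the sketch assumes a JDK (`javac`) on PATH:

```python
# Sketch of compiler-in-the-loop iteration; llm_propose_fix is hypothetical.
import subprocess
import tempfile
from pathlib import Path

def compile_java(source: str) -> str:
    """Return javac's error output, or "" if compilation succeeds."""
    with tempfile.TemporaryDirectory() as tmp:
        f = Path(tmp) / "Main.java"
        f.write_text(source)
        result = subprocess.run(["javac", str(f)],
                                capture_output=True, text=True)
        return result.stderr

def llm_propose_fix(source: str, errors: str) -> str:
    raise NotImplementedError("stand-in for a model call")

def upgrade(source: str, max_rounds: int = 5) -> str:
    for _ in range(max_rounds):
        errors = compile_java(source)
        if not errors:          # the compiler signs off; we're done
            return source
        source = llm_propose_fix(source, errors)  # model sees real diagnostics
    raise RuntimeError("did not converge")
```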
That is why this capability of Q to do things like code transformation is extremely remarkable, in a way that helps Amazon in a big way. If we view reasoning as everything being purely LLM-driven, the industry is going to be very, very unsatisfied. It has to be more iterative, where we work with the LLM as one of the ingredients; it is not the answer for everything.

Right. Okay, in terms of where this computing is happening: obviously a lot of it is happening in the cloud right now, but Apple has WWDC coming up in just a few weeks, and I bet they're going to make a big play of building and running LLMs on device. There was a story in MacRumors today: "Apple releases open source AI models that run on device." I think they released eight of them that, as MacRumors says, run on device rather than through cloud servers; it's called OpenELM, efficient language models. What do you think about this push to run models on device, and if people do start running them on device, how does it impact the cloud business?

First of all, I do expect model inference to happen in multiple places. This is no different from what happened in the deep learning world. Look at Alexa, for instance: it runs a deep learning model on the device, makes certain decisions locally, and then does what is called hierarchical inference, where it knows to route certain queries to the cloud while certain queries can be answered locally. In fact, Alexa runs a very powerful but tiny model for things like Alexa Auto. I expect LLMs to evolve in the same way. Especially if you have a powerful computing device, which many smartphones now are, you can run some of the simple LLM tasks at the edge. But there is still going to be a huge amount of inference where you need more data that is available elsewhere, or more capability. So I expect this hierarchical inference to play out the same way it played out in the deep learning era.

This is also part of why I don't view it as "you put a big model behind an API and then send questions to it." The way these generative AI applications are going to be built is as a series of things chained together: some of it is model inference, some of it is workflows, and so on. That pattern is going to be important, and there are going to be different models and different data connectors all working together to produce a very seamless experience from the consumer's perspective.

So if, hypothetically, a big mobile device manufacturer tells us that on-device large language models are the future, we should view that with a bit of context. Is that what you're trying to tell us, Swami?

It would be hard for me to comment on anything hypothetical, but what I can say is that there are always going to be inferences happening at different layers of the stack. There are going to be use cases that are very well served just happening at the edge, but there are going to be a huge number of use cases happening on the cloud as well, and both of those are always going to exist.
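The hierarchical inference pattern Swami describes can be sketched as a simple confidence-based router; the confidence heuristic and both model calls below are stand-ins, not any real Alexa or AWS mechanism:

```python
# Toy router: a cheap on-device check decides whether a query stays local
# or is escalated to a larger model in the cloud. All three functions are
# illustrative stand-ins.
def on_device_confidence(query: str) -> float:
    # A real system would run a small local model here.
    return 0.9 if len(query.split()) < 8 else 0.3

def answer_locally(query: str) -> str:
    return f"[small on-device model] {query}"

def answer_in_cloud(query: str) -> str:
    return f"[large cloud model] {query}"

def route(query: str, threshold: float = 0.7) -> str:
    if on_device_confidence(query) >= threshold:
        return answer_locally(query)       # fast path: stay at the edge
    return answer_in_cloud(query)          # fall back to the bigger model

print(route("set a timer"))
print(route("compare the trade policies of three countries in detail"))
```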
Okay, great. I want to end with a little discussion about open source. Llama 3 is running through Bedrock, I believe, on AWS from day one. What is your view of open source compared to the closed-source models? Dangerous, good, helpful, just one of the pack? What are your thoughts?

Well, we are big supporters of enabling access to these models. Companies like Meta and Mistral have been innovating at an incredible clip in some of these spaces, with publicly available weights and so forth. The key thing, and both of them are doing this incredibly well, is doing it in a thoughtful, responsible way, putting in the appropriate guardrails. That's why I think this responsible innovation, even in open source, is a good trend, and I expect it to continue, especially with thoughtful innovation from some of these providers. Those are some of the reasons we partner with them to make these models accessible to all developers.

Can you actually build the guardrails into the models? Can't people override them once they're open?

These models have built-in guardrails, and in services like Bedrock we also layer guardrails on top of the models. Imagine you're an enterprise wanting to build a customer-service chatbot for insurance: you want it to talk only about your data and your specific insurance products, and not answer questions about the weather or anything else. When we think of guardrails, we always think of the extreme scenarios, but for many enterprises it is about keeping the topic constrained, and about preserving private or sensitive information, and various other things. That's why, on top of the guardrails of the individual foundation models, Bedrock Guardrails is another layer that helps developers and companies get those capabilities out of the box. We see up to an 84% improvement in accuracy because of those capabilities.

Cool.
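A minimal sketch of that layering idea: an application-level topic filter plus an output redaction pass wrapped around any model call. Bedrock Guardrails provides managed versions of such checks; the topic list and regex below are illustrative stand-ins, not its actual rules:

```python
# Illustrative guardrail layer; the allowed-topic list, regex, and fake_model
# are stand-ins for a real policy and a real model call.
import re

ALLOWED_TOPICS = ("claim", "policy", "premium", "coverage")
PII_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # e.g. US SSN format

def guarded_call(model, prompt: str) -> str:
    if not any(t in prompt.lower() for t in ALLOWED_TOPICS):
        return "I can only help with insurance questions."
    reply = model(prompt)
    return PII_PATTERN.sub("[REDACTED]", reply)  # scrub sensitive output

fake_model = lambda p: "Your claim 123-45-6789 is approved."
print(guarded_call(fake_model, "What's the status of my claim?"))
print(guarded_call(fake_model, "What's the weather tomorrow?"))
```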
All right, let's end with some fun ones. Are you thinking about building a nuclear power plant to power AI training?

I am not thinking about it myself, but I can't speak for my infrastructure team.

Do you think $7 trillion is the right amount, a little bit more, or a little bit less than we might need for chip companies focused on building AI?

I've never done the math myself to know, but I would say it all depends on what time scale you're talking about.

Right, with inflation these days, it could be next year; that's relatively cheap.

Yeah.

AGI: before or after 2030?

I'd say intelligence is a never-ending pursuit. I think AGI is a very heavily overloaded term and widely misunderstood, so I don't use that word. I think intelligence is going to keep getting better, but it is going to be human-assisted, and intelligence assisting humans is going to be the future. That's my guess.

And finally, I didn't mean to demean Rocket Mortgage earlier; just to be clear, I think this stuff is really fascinating when it comes to business. It seems like it'll be hand in hand: businesses will provide the capital necessary to advance the state of the art, and then the state of the art will learn from them as it moves the next models forward. Does that sound right to you?

Generative AI is such an infant space, with so much happening in this domain, and there are a lot of bright minds working on it. Our goal is to enable all those bright minds to interact with it and get the best out of these technologies. That's why at AWS we are not just enabling model builders to build these models, which is where we spend a lot of time talking about how to enable them so they don't hit a wall. This technology won't reach its potential, and you won't have that virtuous loop or feedback cycle, unless there are enough people able to consume these models and build remarkable applications with them; otherwise we are building it with no means to do something useful with it. That is the other mission we are very excited about, and those are some of the reasons we built things like Bedrock, which is already used by tens of thousands of companies, and things like Amazon Q. This virtuous cycle of companies and organizations getting useful value out of it and leveraging these models to do amazing things, it can sound simple, like what Rocket Mortgage was able to do in terms of automation, or what Bridgewater is able to do, or what Pfizer is doing, from scientific and medical content generation to manufacturing process improvement, to the Dana-Farber Cancer Institute using Claude on Bedrock to develop new research solutions. The possibilities are endless, and that's what is exciting to me about this space. And the good news is we're still just getting started; it's truly day one here.

Spot on, I agree. As I spoke about with Matt Wood, and now speaking with you, it's just clear that Amazon is in every layer, whether that's foundational models, running other companies' foundational models, or helping companies build their own technology and advance it in their own ways. A smart way to do business. Thank you for coming on and speaking with me, Swami, and I hope we can do it again soon. Really good to get a chance to chat.

Hey, me too. Thanks again, Alex, and thanks for having me.

You bet. Thank you, everybody, for watching, and we'll see you next time on Big Technology.