Ali Farhadi — Allen Institute for AI CEO on LLMs' Room to Grow, New Modalities, Next Breakthroughs
Channel: Alex Kantrowitz
Published at: 2024-04-30
YouTube video id: Wmm4ZHmJGEM
Source: https://www.youtube.com/watch?v=Wmm4ZHmJGEM
Alex Kantrowitz: Hello YouTube, we are here with a YouTube exclusive for you. We know that you love hearing about AI, and we have a great guest for you. Ali Farhadi is here. He's the CEO of the Allen Institute for AI, one of the world's leading research institutions focused on AI, and it has been for a very long time, even before this whole era of generative AI. Yes, they're one of the OGs, and they have a relatively new CEO, that's Ali, of course. We were going to have a discussion just to get to know each other, and I thought, well, why don't we just put it on YouTube? So if you have any questions, feel free to drop them in the comments, and if you're watching afterwards, feel free to let us know what you think. If you like these, we'll do more of them here on the channel, and if you don't, then, you know, totally understand, we'll just keep it to the podcast. Always trying to experiment here. One last thing before we start: we hit a big milestone on the channel today, 4,000 subscribers. It's amazing to see the number of people that have signed up. Much of the growth has happened in the past year; I think we had about a thousand a year, a year and a half ago, and so we've added 3,000 in the year. It's great that you're all here, and we continue to grow. So if you like the channel, please share it, let people know about it, and thank you for being here. So, with that out of the way, I just want to say: Ali, it's great to see you. Welcome to the show.

Ali Farhadi: Thank you, Alex. Excited to be here, and congrats on the growth. It's phenomenal.

Alex Kantrowitz: Thank you. You know, 4,000 is modest, but the growth has been nice, and we definitely continue to see regulars here in the channel, so it's good to see a little community forming. First question for you: just thinking about the way that AI is heading, it feels like we're kind of in this pause moment now, which is kind of funny to describe it as such. We heard about all this innovation with LLMs, and then, all of a
sudden, we have a lot of stuff on the horizon that people talk about but that isn't quite here yet, right? Agents, emotional types of sensing, robotics. So what do you think the state of AI is at the moment?

Ali Farhadi: I would probably answer that question in the context of the progress so far. If you had asked me five years ago where AI would be, I would have never predicted today as it is. So I think the pace of progress has probably outpaced even our most radical expectations, or the most optimistic among us. A lot has happened; a lot of, in my opinion, breakthroughs have happened in AI. We know a lot more than before. But there are also roadblocks, there are challenges, and there are problems that we, as a community, don't know how to solve. These are key characteristics of any kind of exploration; most scientific disciplines see this kind of behavior. You see rapid progress for a while, you figure things out, and then there's another delta of progress. The challenge in AI, in my opinion, is actually the amount of noise; sifting through the noise is by itself a hard problem. The momentum is huge, the pace of progress is, in my opinion, unsurpassed, but at the same time there's so much noise and so much hype around that it's just hard to see through. All in all, I think this is natural and expected. I don't think we're stalled, and I don't think we're blocked in the sense that no one knows what to do. There are hard problems, there is a lot at stake, and a good portion of Earth's resources and talent and brainpower is being poured into this topic, so I'm sure we'll see more on this.

Alex Kantrowitz: Okay, but then answer the question about the state of where we are. Okay, there's progress, there's noise; all right, anyone could say that. So what do you actually think is happening?

Ali Farhadi: Fair point. Where we are today is a mix of what we've learned so far. We've learned the role of scale. To me, we've learned what you could
squeeze out of, quote-unquote, abstractive or textual knowledge from the web or from specialized sources. The models have exposed certain properties, certain capabilities, that went beyond our expectations, in my opinion. Where we are today is a situation where, probably for the first time, and I might be wrong, our abilities to... Alex, did I lose you, or did you just switch the screens?

Alex Kantrowitz: Oh no, I'm just making you more prominent.

Ali Farhadi: I see. So our ability to evaluate where we are has been slower than our ability to generate new capabilities, partly because we are learning about new capabilities by surprise. The other part of it is that these capabilities are of a form that we as a whole community are not comfortable evaluating. So to me, one part of the state today is an evaluation crisis. There is no scientific evaluation of where things are, and the ones out there have major issues, and I think we need to fix that problem collectively as a community. We are in a situation where, on a daily basis, or maybe weekly, but most probably daily, people come in and say "mine is better than yours, and here is one way to look at it," and again, there are so many problems with the evaluation piece. So the evaluation crisis would be, to me, one problem that we're facing today. The other problem we're facing is that over the last couple of years, AI, which was a discipline, in my opinion, born and raised in the open, is suddenly practiced behind closed doors. And I think we deployed a piece of technology that was not ready to be deployed. It's phenomenal, it's a breakthrough, it does amazing things, but there are technological gaps, there are research problems that we don't know how to solve, and you alluded to some of those. We deployed things at a phenomenal pace, a technology that's amazing but not mature enough to be deployed, and as a result we are... Alex, you're muted, if
you're saying anything.

Alex Kantrowitz: You're talking about large language models?

Ali Farhadi: Large language models, and generative AI technology in general. These are amazing pieces of technology; they're great breakthroughs. At the same time, we as a whole community don't know how to control them, how to control the output space of these models, and as a result, when we scale them and deploy them, interesting things happen. We're learning about certain behaviors, and all of those things are great, but at the same time we're also being warned, being surprised, by certain things that we didn't like. So going back to your question: the evaluation crisis is one piece, being surprised is another piece, and the third piece is acknowledging that these technologies have gaps, understanding those gaps, and being able to work on them and solve them. These, to me, are the three pillars. And how do you fill in those gaps? We know one solution that has worked before, and that is open, communal approaches to the problem. AI is where it is today because we practiced it in the open: I built something, you built on top of my thing, someone else built on top of that, and you came back again. This has been absolutely the only way that AI has progressed, with the exception of the last couple of years. And now, confining an immature piece of technology behind closed doors only hinders the pace of progress that we desperately need to get AI from what it is today to a piece of technology that we could actually deploy at scale, with people being comfortable with it.

Alex Kantrowitz: So last week I wrote this story asking whether LLMs are about to hit a wall, just looking at the fact that there's been so much data and compute and energy used in the most recent training of the models. We had Ahmad Al-Dahle here on the channel (he's the head of generative AI for Meta), and he mentioned that to train Llama 3 over Llama 2, which was like a six-to-eight-month process, they used 10 times
more data and 100 times more compute. So doesn't this eventually slam into a resource constraint?

Ali Farhadi: AI is getting more and more expensive, for sure. Playing in that playground requires more data and more parameters, and that means more compute. Does this mean that we're hitting a wall? I don't think so. I think we still have space to grow. Yes, it is expensive, but I also don't want to tie progress to increasing the number of parameters or increasing the number of data points. Our models consume way more data than they probably should. Why is that? I want to say that again: why is that? Let's go back to how these things evolved. Transformers and attention-based models came out, and we all got excited about them. People started scaling them up, and they realized, oh, there's so much capacity in these models: let me add more parameters, let me add more data, let me figure out a fine way to scale it, and suddenly these new capabilities actually popped up. Then the common practice was: let me just scale more. Then we realized, if I want to scale more, oops, I need way more data. Where do I find data? And once we found data: oops, I need more compute. Let me find more compute. And you see there's been innovation and creative solutions in both of these spaces. We are collecting and creating more data in more creative ways on a weekly basis, and people are coming up with more sophisticated, more innovative solutions on the compute front: you see a whole gamut of different accelerators coming up, and you see innovation in how you deploy them, at the compiler level, in model-level optimization, and all that jazz. All of them are, I think, healthy and natural and, to some extent, necessary for progress as a field. The part that I was pointing to is measuring progress for a piece of technology that we particularly don't know how to
evaluate. Tying progress to the number of parameters or the amount of data being digested is nonscientific, and I want to caution you and your audience about this. "I generated my new many-billion-parameter model, I consumed many times more data, and I show improvements on some of these benchmarks." Some of those benchmarks are saturated, some of them have been leaked, some of them do not actually correlate with the end capability that we want, and some of them might have good value. So I think we're in a space where we don't know, deep down, what "better" means, because we don't know how to scientifically evaluate these models, and as a result we need to go back to some understandable notion of metrics so people can actually get excited about it. And remember, this is an environment where people, at least the big entities, need to play a very careful game: they need to stay on top, they need to stay relevant. What we end up seeing is a competition for more parameters and more data points, both of them necessary, and some improvements over benchmark data. Is this the only way forward? I don't think so. Have we squeezed all we could have squeezed out of the existing amount of data? We don't know yet. Are the loss functions that we're using today, I mean next-token prediction, the only thing that we need to do? We still don't know. There's a lot to be explored, in my opinion. And there are a lot of design decisions that we, as a whole community, make in designing these complex language models, and many of those design decisions we either inherited from someone else, or we ran some set of ablations, some set of experiments, but we only cover a small fraction of those parameters. The design space of these complex systems is heavily unexplored. And remember, any of those explorations is really expensive,
because you have to actually define a set of parameters, train a model under those parameters, look at your result, and then decide if those parameters are good or bad.

Alex Kantrowitz: And how much does that cost each time you do it?

Ali Farhadi: It's very expensive. It really depends on how many billions of parameters you have, what data you're using, and whether you're renting in the cloud or you have your own infrastructure. It varies, but if you're doing 7-billion-parameter models, it's $5 to $10 million.

Alex Kantrowitz: Wow. Okay, it is expensive.

Ali Farhadi: Yeah. Some people are more efficient at it, some people are less efficient at it, but any combination of these parameters is expensive to evaluate. And these parameters are combinatorially related: if I change parameter number one to a value and change parameter two to something else, can I go back and actually redo this thing? So my argument is that the design space... I don't know if you hear that; there's a plane in South Lake.

Alex Kantrowitz: Like a seaplane?

Ali Farhadi: That's exactly it, a seaplane. Let me pause for one second for the plane to pass.

Alex Kantrowitz: You can keep going through it; we can hear you fine.

Ali Farhadi: So the design space of these models is complex and is governed by a large number of parameters, and figuring out those parameters is expensive. People have been creative with scaling laws, with how you can learn from a smaller model and extrapolate the behavior of a bigger model. All of that work is very valuable, but there's still a lot to be done. So I would probably disagree with the notion that, oh, the whole progress has stalled. No, there's a lot to be done.

Alex Kantrowitz: It's all about optimization at the moment, is that what you're saying?

Ali Farhadi: Say that again?

Alex Kantrowitz: It's all about optimization.

Ali Farhadi: It's about learning more about these systems. We know very little about them.

Alex Kantrowitz: It's kind of encouraging that it's so early on, I guess, and they're already this
powerful, in the crudest ways of training them. This is phenomenal.

Ali Farhadi: Yes, they're powerful, and we know very little about them. I used the word "surprise": we are surprised by certain behaviors of these models because we don't truly understand what's happening. One aspect is more parameters and more data, and people are heavily exploring that dimension, and that's great; we should. Another aspect is questioning our design decisions; another is questioning our parameter choices, our loss functions. All of them are yet to be explored. If I want to speculate, and I've been wrong about my speculations before, I think we are not squeezing all we can from the current number of parameters and the current number of data points that we have. What is the form of the solution? Yet to be known.

Alex Kantrowitz: Right. So what are you working on? Let me give you my impression of the Allen Institute for AI, and you tell me if I'm wrong, and where you're heading. And by the way, folks, if you're watching live with us right now, I've seen some folks checking in; feel free to drop in some questions for Ali Farhadi, the CEO of the Allen Institute for AI, one of the leading research houses in AI, long-time going. So, you're a nonprofit, right?

Ali Farhadi: We are a nonprofit.

Alex Kantrowitz: A kind of nonprofit, okay. So, unlike OpenAI, is that what you're poking at? So let me ask you then: what are you working on, and how do you compete today in a world where funds are so important?

Ali Farhadi: Those are great questions, and great challenges for us. The position that we have today is that we don't feel like we have to compete. Our mission is rather clear: we are after scientific progress, and after a deep understanding of what's happening within these models. Our philosophy has been anchored around openness, and true openness. "Openness" is an overloaded term these days. You've seen the space: I train a model behind closed doors, I don't tell you what
I did, but after I'm done, I'm going to toss the model over the fence with the right licensing so you can use it. That's great, actually; we love it. This has bootstrapped the rate of progress. But it's not enough without people understanding the whole pipeline, and the most important piece of this pipeline, as we're learning as we speak, is data. Data rules this game. Data makes it, and data also breaks it. Data is the one that causes the legal conversations, the policy conversations, the alignment conversations. So data is the root cause of many of these problems, or benefits, and yet we're being hush-hush about it, being very quiet about it. It is scientifically impossible to evaluate these models without actually opening up the data, and scientifically hard to build upon these models without knowing the whole gamut, the whole pipeline. So one of the things that we're after these days is true openness, which means: let's open up every piece of the pipeline. We started by opening up data. Dolma was the very first version of our data, released a while back: three trillion tokens of open training data. People have started doing phenomenal things with it, incorporating it; we would love to see the adoption. After that, we released our OLMo 7B models. These are on the smaller side of models, but they are fully open: the training data is open, the training algorithms are open, what we did with the data, how we collected it, how we cleaned it, is open, but also the logs are open and all the checkpoints are open. And that's extremely valuable, in my opinion, because you get to see what happened to the model after we changed a few parameters during training. That, to me, is us getting closer and closer to a true open-source approach to software development, because open source means that I could grab the piece of software that you wrote, fork it, and do what I want to do with it. But with partially open models, or models that are trained behind
closed doors, I don't actually have access to those checkpoints. I don't know what the algorithm was, I don't know what the data was, and it's hard for me to build upon it without guessing about what was happening behind the closed doors.

Alex Kantrowitz: So are you guys all in on large language models now?

Ali Farhadi: We spend a lot of our energy and time on making these components, this whole pipeline, open, so scientists, researchers, developers, and engineers can look into it, build upon it, and make progress in that space, and we're releasing our artifacts as we get our hands on them: we train them, we work on them, and we release them. We are a research institute; we have a few hundred top-of-the-top researchers and engineers in the world working on these kinds of problems, and obviously our job is to innovate in that space, and that's what we're after, by building the building blocks that are necessary for that innovation. We also take the position that there's a lot we don't know about these models, contrary to the claims that these problems are solved, or the combination of those three letters, A, G, and I, together, where, oh, this is actually going to replace human intelligence. We don't believe in those directions. We believe this has been great progress, with a lot still unknown, a lot to be discovered and innovated, and we're right after that.

Alex Kantrowitz: Awesome, Ali. Well, look, I hope we can keep in touch. I love speaking with people at the Allen Institute whenever I have a question about AI, so I hope to keep the tradition up with you as the leader. It's great to meet you, and congrats on taking the helm.

Ali Farhadi: Absolutely, Alex. It was great talking to you, and this is a great show and a great podcast. Thank you so much; I really appreciate it.

Alex Kantrowitz: Thanks, everybody, for watching, thanks again, Ali, for being here, and we'll be back later this week with a lot more stuff, so stay tuned.
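Editor's note: the training-cost figure Farhadi quotes ($5 to $10 million for 7B-parameter models) can be sanity-checked against the common rule of thumb that training a dense transformer costs roughly 6 x parameters x tokens in FLOPs. The sketch below is not from the interview: the GPU throughput, utilization, rental price, and token count are illustrative assumptions.

```python
# Back-of-envelope LLM training cost, using the common ~6 * N * D FLOPs heuristic.
# All hardware and pricing numbers are illustrative assumptions, not interview figures.

def train_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training FLOPs for a dense transformer: ~6 * N * D."""
    return 6.0 * n_params * n_tokens

def train_cost_usd(
    n_params: float,
    n_tokens: float,
    gpu_flops_per_sec: float = 300e12,  # assumed peak throughput per GPU (~300 TFLOP/s)
    utilization: float = 0.4,           # assumed model FLOPs utilization (MFU)
    usd_per_gpu_hour: float = 2.50,     # assumed cloud rental price per GPU-hour
) -> float:
    """Rough dollar cost: total FLOPs / effective throughput, priced per GPU-hour."""
    effective_flops = gpu_flops_per_sec * utilization
    gpu_seconds = train_flops(n_params, n_tokens) / effective_flops
    return gpu_seconds / 3600.0 * usd_per_gpu_hour

# A 7B-parameter model trained on 2 trillion tokens (token count is an assumption):
flops = train_flops(7e9, 2e12)   # 8.4e22 FLOPs
cost = train_cost_usd(7e9, 2e12)
print(f"{flops:.2e} FLOPs, ~${cost / 1e6:.2f}M for a single run")
```

Under these assumptions a single 7B run comes out around half a million dollars, well below the quoted $5 to $10 million. The gap is consistent with Farhadi's point: the expense he describes is exploring a combinatorial design space, which means many full training runs, failed experiments, and ablations, not one pass over the data.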