Ali Farhadi — Allen Institute for AI CEO on LLMs' Room to Grow, New Modalities, Next Breakthroughs
Channel: Alex Kantrowitz
Published at: 2024-04-30
YouTube video id: Wmm4ZHmJGEM
Source: https://www.youtube.com/watch?v=Wmm4ZHmJGEM
Alex Kantrowitz: Hello YouTube, we are here with a YouTube exclusive for you. We know that you love hearing about AI, and we have a great guest for you. Ali Farhadi is here. He's the CEO of the Allen Institute for AI, one of the world's leading research institutions focused on AI, and it has been for a very long time, even before this whole era of generative AI. Yes, they're one of the OGs, and they have a relatively new CEO, that's Ali, of course. We were going to have a discussion just to get to know each other, and I thought, well, why don't we just put it on YouTube? So if you have any questions, feel free to drop them in the comments, and if you're watching afterwards, feel free to let us know what you think. If you like these, we'll do more of them here on the channel, and if you don't, then, you know, totally understand, we'll just keep it to the podcast. Always trying to experiment here. One last thing before we start: we hit a big milestone on the channel today, 4,000 subscribers. It's amazing to see the number of people that have signed up. Much of the growth has happened in the past year; I think we had about a thousand a year, a year and a half ago, and so we've added 3,000 in the year. It's great that you're all here, and we continue to grow. So if you like the channel, please share it, let people know about it, and thank you for being here. So, with that out of the way, I just want to say: Ali, it's great to see you. Welcome to the show.

Ali Farhadi: Thank you, Alex. Excited to be here, and congrats on the growth. It's phenomenal.

Alex Kantrowitz: Thank you. You know, 4,000 is modest, but the growth has been nice, and we definitely continue to see regulars here in the channel, so it's good to see a little community forming. First question for you: just thinking about the way that AI is heading, it feels like we're kind of in this pause moment now, which is kind of funny to describe it as such. We heard about all this innovation with LLMs, and then, all of a
sudden, we have a lot of stuff on the horizon that people talk about but that isn't quite here yet, right? Agents, emotional types of sensing, robotics. So what do you think the state of AI is at the moment?

Ali Farhadi: I would probably answer that question in the context of the progress so far. If you had asked me five years ago where AI would be, I would have never predicted today as it is. So I think the pace of progress has probably outpaced even our most radical expectations, or the most optimistic among us. A lot has happened; a lot of, in my opinion, breakthroughs have happened in AI. We know a lot more than before. But there are also roadblocks, there are challenges, and there are problems that we, as a community, don't know how to solve. These are key characteristics of any kind of exploration; most scientific disciplines see this kind of behavior. You see rapid progress for a while, you figure things out, and then there's another delta of progress. The challenge in AI, in my opinion, is actually the amount of noise; sifting through the noise is by itself a hard problem. The momentum is huge, the pace of progress is, in my opinion, unsurpassed, but at the same time there's so much noise and so much hype around that it's just hard to see through. All in all, I think this is natural and expected. I don't think we're stalled, and I don't think we're blocked in the sense that no one knows what to do. There are hard problems, there is a lot at stake, and a good portion of Earth's resources and talent and brainpower is being poured into this topic, so I'm sure we'll see more on this.

Alex Kantrowitz: Okay, but then answer the question about the state of where we are. Okay, there's progress, there's noise; all right, anyone could say that. So what do you actually think is happening?

Ali Farhadi: Fair point. Where we are today is a mix of what we've learned so far. We've learned the role of scale. To me, we've learned what you could
squeeze out of, quote-unquote, abstractive or textual knowledge from the web or from specialized sources. The models have exposed certain properties, certain capabilities, that went beyond our expectations, in my opinion. Where we are today is a situation where, probably for the first time, and I might be wrong, our abilities to... Alex, did I lose you, or did you just switch the screens?

Alex Kantrowitz: Oh no, I'm just making you more prominent.

Ali Farhadi: I see. So our ability to evaluate where we are has been slower than our ability to generate new capabilities, partly because we are learning about new capabilities by surprise. The other part of it is that these capabilities are of a form that we as a whole community are not comfortable evaluating. So to me, one part of the state today is an evaluation crisis. There is no scientific evaluation of where things are, and the ones out there have major issues, and I think we need to fix that problem collectively as a community. We are in a situation where, on a daily basis, or maybe weekly, but most probably daily, people come in and say "mine is better than yours, and here is one way to look at it," and again, there are so many problems with the evaluation piece. So the evaluation crisis would be, to me, one problem that we're facing today. The other problem we're facing is that over the last couple of years, AI, which was a discipline, in my opinion, born and raised in the open, is suddenly practiced behind closed doors. And I think we deployed a piece of technology that was not ready to be deployed. It's phenomenal, it's a breakthrough, it does amazing things, but there are technological gaps, there are research problems that we don't know how to solve, and you alluded to some of those. We deployed things at a phenomenal pace, a technology that's amazing but not mature enough to be deployed, and as a result we are... Alex, you're muted, if
you're saying anything.

Alex Kantrowitz: You're talking about large language models?

Ali Farhadi: Large language models, and generative AI technology in general. These are amazing pieces of technology; they're great breakthroughs. At the same time, we as a whole community don't know how to control them, how to control the output space of these models, and as a result, when we scale them and deploy them, interesting things happen. We're learning about certain behaviors, and all of those things are great, but at the same time we're also being warned, being surprised, by certain things that we didn't like. So going back to your question: the evaluation crisis is one piece, being surprised is another piece, and the third piece is acknowledging that these technologies have gaps, understanding those gaps, and being able to work on them and solve them. These, to me, are the three pillars. And how do you fill in those gaps? We know one solution that has worked before, and that is open, communal approaches to the problem. AI is where it is today because we practiced it in the open: I built something, you built on top of my thing, someone else built on top of that, and you came back again. This has been absolutely the only way that AI has progressed, with the exception of the last couple of years. And now, confining an immature piece of technology behind closed doors only hinders the pace of progress that we desperately need to get AI from what it is today to a piece of technology that we could actually deploy at scale, with people being comfortable with it.

Alex Kantrowitz: So last week I wrote this story asking whether LLMs are about to hit a wall, just looking at the fact that there's been so much data and compute and energy used in the most recent training of the models. We had Ahmad Al-Dahle here on the channel (he's the head of generative AI for Meta), and he mentioned that to train Llama 3 over Llama 2, which was like a six-to-eight-month process, they used 10 times
more data and 100 times more compute. So doesn't this eventually slam into a resource constraint?

Ali Farhadi: AI is getting more and more expensive, for sure. Playing in that playground requires more data and more parameters, and that means more compute. Does this mean that we're hitting a wall? I don't think so. I think we still have space to grow. Yes, it is expensive, but I also don't want to tie progress to increasing the number of parameters or increasing the number of data points. Our models consume way more data than they probably should. Why is that? I want to say that again: why is that? Let's go back to how these things evolved. Transformers and attention-based models came out, and we all got excited about them. People started scaling them up, and they realized, oh, there's so much capacity in these models: let me add more parameters, let me add more data, let me figure out a fine way to scale it, and suddenly these new capabilities actually popped up. Then the common practice was: let me just scale more. Then we realized, if I want to scale more, oops, I need way more data. Where do I find data? And once we found data: oops, I need more compute. Let me find more compute. And you see there's been innovation and creative solutions in both of these spaces. We are collecting and creating more data in more creative ways on a weekly basis, and people are coming up with more sophisticated, more innovative solutions on the compute front: you see a whole gamut of different accelerators coming up, and you see innovation in how you deploy them, at the compiler level, in model-level optimization, and all that jazz. All of them are, I think, healthy and natural and, to some extent, necessary for progress as a field. The part that I was pointing to is measuring progress for a piece of technology that we particularly don't know how to
evaluate. Tying progress to the number of parameters or the amount of data being digested is nonscientific, and I want to caution you and your audience about this. "I generated my new many-billion-parameter model, I consumed many times more data, and I show improvements on some of these benchmarks." Some of those benchmarks are saturated, some of them have been leaked, some of them do not actually correlate with the end capability that we want, and some of them might have good value. So I think we're in a space where we don't know, deep down, what "better" means, because we don't know how to scientifically evaluate these models, and as a result we need to go back to some understandable notion of metrics so people can actually get excited about it. And remember, this is an environment where people, at least the big entities, need to play a very careful game: they need to stay on top, they need to stay relevant. What we end up seeing is a competition for more parameters and more data points, both of them necessary, and some improvements over benchmark data. Is this the only way forward? I don't think so. Have we squeezed all we could have squeezed out of the existing amount of data? We don't know yet. Are the loss functions that we're using today, I mean next-token prediction, the only thing that we need to do? We still don't know. There's a lot to be explored, in my opinion. And there are a lot of design decisions that we, as a whole community, make in designing these complex language models, and many of those design decisions we either inherited from someone else, or we ran some set of ablations, some set of experiments, but we only cover a small fraction of those parameters. The design space of these complex systems is heavily unexplored. And remember, any of those explorations is really expensive,
because you have to actually define a set of parameters, train a model under those parameters, look at your result, and then decide if those parameters are good or bad.

Alex Kantrowitz: And how much does that cost each time you do it?

Ali Farhadi: It's very expensive. It really depends on how many billions of parameters you have, what data you're using, and whether you're renting in the cloud or you have your own infrastructure. It varies, but if you're doing 7-billion-parameter models, it's $5 to $10 million.

Alex Kantrowitz: Wow. Okay, it is expensive.

Ali Farhadi: Yeah. Some people are more efficient at it, some people are less efficient at it, but any combination of these parameters is expensive to evaluate. And these parameters are combinatorially related: if I change parameter number one to a value and change parameter two to something else, can I go back and actually redo this thing? So my argument is that the design space... I don't know if you hear that; there's a plane in South Lake.

Alex Kantrowitz: Like a seaplane?

Ali Farhadi: That's exactly it, a seaplane. Let me pause for one second for the plane to pass.

Alex Kantrowitz: You can keep going through it; we can hear you fine.

Ali Farhadi: So the design space of these models is complex and is governed by a large number of parameters, and figuring out those parameters is expensive. People have been creative with scaling laws, with how you can learn from a smaller model and extrapolate the behavior of a bigger model. All of that work is very valuable, but there's still a lot to be done. So I would probably disagree with the notion that, oh, the whole progress has stalled. No, there's a lot to be done.

Alex Kantrowitz: It's all about optimization at the moment, is that what you're saying?

Ali Farhadi: Say that again?

Alex Kantrowitz: It's all about optimization.

Ali Farhadi: It's about learning more about these systems. We know very little about them.

Alex Kantrowitz: It's kind of encouraging that it's so early on, I guess, and they're already this
powerful, in the crudest ways of training them. This is phenomenal.

Ali Farhadi: Yes, they're powerful, and we know very little about them. I used the word "surprise": we are surprised by certain behaviors of these models because we don't truly understand what's happening. One aspect is more parameters and more data, and people are heavily exploring that dimension, and that's great; we should. Another aspect is questioning our design decisions; another is questioning our parameter choices, our loss functions. All of them are yet to be explored. If I want to speculate, and I've been wrong about my speculations before, I think we are not squeezing all we can from the current number of parameters and the current number of data points that we have. What is the form of the solution? Yet to be known.

Alex Kantrowitz: Right. So what are you working on? Let me give you my impression of the Allen Institute for AI, and you tell me if I'm wrong, and where you're heading. And by the way, folks, if you're watching live with us right now, I've seen some folks checking in; feel free to drop in some questions for Ali Farhadi, the CEO of the Allen Institute for AI, one of the leading research houses in AI, long-time going. So, you're a nonprofit, right?

Ali Farhadi: We are a nonprofit.

Alex Kantrowitz: A kind of nonprofit, okay. So, unlike OpenAI, is that what you're poking at? So let me ask you then: what are you working on, and how do you compete today in a world where funds are so important?

Ali Farhadi: Those are great questions, and great challenges for us. The position that we have today is that we don't feel like we have to compete. Our mission is rather clear: we are after scientific progress, and after a deep understanding of what's happening within these models. Our philosophy has been anchored around openness, and true openness. "Openness" is an overloaded term these days. You've seen the space: I train a model behind closed doors, I don't tell you what
I did, but after I'm done, I'm going to toss the model over the fence with the right licensing so you can use it. That's great, actually; we love it. This has bootstrapped the rate of progress. But it's not enough without people understanding the whole pipeline, and the most important piece of this pipeline, as we're learning as we speak, is data. Data rules this game. Data makes it, and data also breaks it. Data is the one that causes the legal conversations, the policy conversations, the alignment conversations. So data is the root cause of many of these problems, or benefits, and yet we're being hush-hush about it, being very quiet about it. It is scientifically impossible to evaluate these models without actually opening up the data, and scientifically hard to build upon these models without knowing the whole gamut, the whole pipeline. So one of the things that we're after these days is true openness, which means: let's open up every piece of the pipeline. We started by opening up data. Dolma was the very first version of our data, released a while back: three trillion tokens of open training data. People have started doing phenomenal things with it, incorporating it; we would love to see the adoption. After that, we released our OLMo 7B models. These are on the smaller side of models, but they are fully open: the training data is open, the training algorithms are open, what we did with the data, how we collected it, how we cleaned it, is open, but also the logs are open and all the checkpoints are open. And that's extremely valuable, in my opinion, because you get to see what happened to the model after we changed a few parameters during training. That, to me, is us getting closer and closer to a true open-source approach to software development, because open source means that I could grab the piece of software that you wrote, fork it, and do what I want to do with it. But with partially open models, or models that are trained behind
closed doors, I don't actually have access to those checkpoints. I don't know what the algorithm was, I don't know what the data was, and it's hard for me to build upon it without guessing about what was happening behind the closed doors.

Alex Kantrowitz: So are you guys all in on large language models now?

Ali Farhadi: We spend a lot of our energy and time on making these components, this whole pipeline, open, so scientists, researchers, developers, and engineers can look into it, build upon it, and make progress in that space, and we're releasing our artifacts as we get our hands on them: we train them, we work on them, and we release them. We are a research institute; we have a few hundred top-of-the-top researchers and engineers in the world working on these kinds of problems, and obviously our job is to innovate in that space, and that's what we're after, by building the building blocks that are necessary for that innovation. We also take the position that there's a lot we don't know about these models, contrary to the claims that these problems are solved, or the combination of those three letters, A, G, and I, together, where, oh, this is actually going to replace human intelligence. We don't believe in those directions. We believe this has been great progress, with a lot still unknown, a lot to be discovered and innovated, and we're right after that.

Alex Kantrowitz: Awesome, Ali. Well, look, I hope we can keep in touch. I love speaking with people at the Allen Institute whenever I have a question about AI, so I hope to keep the tradition up with you as the leader. It's great to meet you, and congrats on taking the helm.

Ali Farhadi: Absolutely, Alex. It was great talking to you, and this is a great show and a great podcast. Thank you so much; I really appreciate it.

Alex Kantrowitz: Thanks, everybody, for watching, thanks again, Ali, for being here, and we'll be back later this week with a lot more stuff, so stay tuned.
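Editor's note: the training-cost figure Farhadi quotes ($5 to $10 million for 7B-parameter models) can be sanity-checked against the common rule of thumb that training a dense transformer costs roughly 6 x parameters x tokens in FLOPs. The sketch below is not from the interview: the GPU throughput, utilization, rental price, and token count are illustrative assumptions.

```python
# Back-of-envelope LLM training cost, using the common ~6 * N * D FLOPs heuristic.
# All hardware and pricing numbers are illustrative assumptions, not interview figures.

def train_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training FLOPs for a dense transformer: ~6 * N * D."""
    return 6.0 * n_params * n_tokens

def train_cost_usd(
    n_params: float,
    n_tokens: float,
    gpu_flops_per_sec: float = 300e12,  # assumed peak throughput per GPU (~300 TFLOP/s)
    utilization: float = 0.4,           # assumed model FLOPs utilization (MFU)
    usd_per_gpu_hour: float = 2.50,     # assumed cloud rental price per GPU-hour
) -> float:
    """Rough dollar cost: total FLOPs / effective throughput, priced per GPU-hour."""
    effective_flops = gpu_flops_per_sec * utilization
    gpu_seconds = train_flops(n_params, n_tokens) / effective_flops
    return gpu_seconds / 3600.0 * usd_per_gpu_hour

# A 7B-parameter model trained on 2 trillion tokens (token count is an assumption):
flops = train_flops(7e9, 2e12)   # 8.4e22 FLOPs
cost = train_cost_usd(7e9, 2e12)
print(f"{flops:.2e} FLOPs, ~${cost / 1e6:.2f}M for a single run")
```

Under these assumptions a single 7B run comes out around half a million dollars, well below the quoted $5 to $10 million. The gap is consistent with Farhadi's point: the expense he describes is exploring a combinatorial design space, which means many full training runs, failed experiments, and ablations, not one pass over the data.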