AI Engineer World’s Fair 2024 - Keynotes & Multimodality track
Channel: aiDotEngineer
Published at: 2024-06-28
YouTube video id: vaIiNZoXymg
Source: https://www.youtube.com/watch?v=vaIiNZoXymg
[Music]

Good morning, ladies and gentlemen. We will start our day in the Ballroom in 10 minutes. Thank you.

[Music]

Ladies and gentlemen, our program begins in 5 minutes.

[Music]

Ladies and gentlemen, please take your seats; our program is about to begin. Thank you. Ladies and gentlemen, please welcome back to the stage your host and co-founder of the AI Engineer Summit, Benjamin Dunphy.

Good morning, good morning, good morning! How are we doing? We all had fun last night? Is this the crowd that had fun and just came back, or is this a crowd that did not have fun? Looks like some people had too much fun. But thank you so much for coming back for the final day of the AI Engineer World's Fair; it's so great to see you all. I have 3 minutes and 45 seconds and a couple of announcements to make, so I'd like to get right to it.

I've been talking to a few people, when I get a chance, about the mobile app we put a lot of effort into and how it's going to help you with your attendance and interactions at this event, so I wanted to clear up a couple of things. Number one, the schedule: you can see session details, favorite sessions, and then go to My Schedule and see them all there. For a multitrack event, I just want to make that clear: you can build your own schedule here, and some people didn't get that memo. Of course you can also see all the sponsors, and you've got the map there, but the real killer feature is the generative matching and the other networking features. So I'd like to invite Simon Sturmer, the lead architect for this, on stage to help explain some features. Can we have a round of applause for Simon, please?

So Simon, we've got some generative profiles going on here, and these look to all be unique for each interaction. Tell me a little bit about what's going on behind the scenes. "What we do is we pull in your profile, or we build a profile from your socials and from the questions you answered when you registered. Then we create embeddings from there, put those into a vector database, do a cosine-similarity search, and pull your five most similar profiles." Very nice. Now I'm going to Apoorva's profile, and I can see all her talks there; I can also see some talking points, and these are generated for me as well. "That's right, those were generated in real time while that spinner was going." Awesome. And if we go to one of her sessions, you can rate it after you watch the talk; we've got session feedback built in. So this is a fully custom app.
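For reference, the matching flow Simon describes (build a profile, embed it, run a cosine-similarity search, take the top five) can be sketched in a few lines of Python. This is an illustrative sketch, not the app's actual code: the embed function stands in for a real embedding model, and the profiles are made up.

```python
import numpy as np

def embed(texts: list[str]) -> np.ndarray:
    """Stand-in for a real embedding model; returns unit-length vectors."""
    rng = np.random.default_rng(42)  # deterministic placeholder
    vecs = rng.normal(size=(len(texts), 384))
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

profiles = {
    "alice": "ML engineer, interested in RAG and evals",
    "bob": "frontend dev exploring AI agents",
    "carol": "infra engineer focused on GPUs and inference",
}
names = list(profiles)
matrix = embed([profiles[n] for n in names])  # one row per attendee

def top_matches(name: str, k: int = 5) -> list[tuple[str, float]]:
    """Rank everyone else by cosine similarity to one attendee."""
    query = matrix[names.index(name)]
    sims = matrix @ query  # unit vectors, so dot product == cosine similarity
    order = np.argsort(sims)[::-1]
    return [(names[i], float(sims[i])) for i in order if names[i] != name][:k]

print(top_matches("alice"))
```

In production you would hand the top-k search off to a vector database, which is exactly what Simon describes.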
All right, but here's the real killer feature, the one I like, because when I first came to San Francisco in 2015, I went to a tech conference and everyone was handing out business cards. When you meet someone at an event, how do you connect with them? "Let's exchange emails," right? That's one. "Give me your Twitter, your LinkedIn." It's always different with different people. Why don't we get them all right now, Mr. Simon? All right, I'm going to scan your badge. Boom, there we go. We have badge scanning for everyone. Now I'm connected: as soon as I scan his badge I get his email, even if he made it private at first; he can change his settings to make it default to sharing when I scan him. I've still got the talking points, the notes, etc., and now he's on my short list of scans. After the event I can review all my scans and even export all my connections, which will email them to me. Is that pretty cool? What do we think? All right, cool. So that's the matching. Thank you so much, Simon, I appreciate your help. Very cool. So that's the app; I just want to make sure we all understand that.

The other thing is that we have AI Engineer Jobs, at ai.engineer/jobs. Currently we're featuring jobs from our Expo partners, so go check those out if you're looking for your next opportunity. Any Expo partners who aren't up there yet, please email me and I'll get you up there before end of day; anyone else who wants to get jobs up there, email me and we can talk.

Lastly, one more announcement; I've got 20 seconds left. We're excited to announce that 2025 is now on sale. If you enjoyed your time here, this is the best time to lock in the best rate: go to ai.engineer/2025 or scan that QR code, and that will reserve your seat and lock in the best rate. We don't have dates locked in yet, but it will likely be around the same time of year. That's my time for today, so I'm pleased to welcome to the stage our opening keynote speaker, an absolute legend in this industry. Please join me in welcoming Chris Lattner. [Music]

All right, good morning everyone. I'm here to talk to you about Modular and accelerating the pace of AI. You all know what GenAI is, so I'm not going to tell you all about it. Let me tell you one thing I think is really cool about it, and very different from certain other technologies: it's super easy to deploy. There are lots of great endpoints out there, a lot of good implementations, a lot of ways to build a prototype and get going very quickly. But despite the availability of all these endpoints, sometimes you have other needs. Sometimes you want to control your data instead of sending it to somebody else. Sometimes you want to integrate it into your own security, because you've got critical company data in your model and you don't want to fine-tune it somewhere else. Sometimes you want to customize the model: there's research happening all the time, and a lot of things in building proprietary models work best for your use cases and can make your applications even better. And of course, inference endpoints are expensive, so sometimes you want to save money. Sometimes there's hardware out there that's really interesting and you want to explore beyond the mainstream. If you care about any of these things, what you need to do is go beyond the endpoint. So how do you do that? Many of you have explored this, I'm sure.
The answer has shifted. It used to be that we had things like PyTorch and TensorFlow and Caffe. As inference became more important, the world shifted: first we got ONNX and TensorRT and things like that, and today we have an explosion of different frameworks, some of which are specific to one model. That's cool if you care about that one model, but if you have many different things you want to deploy and work with, it's very frustrating to switch between all these different technologies. And of course it's not just the model: you all know there's a gigantic array of different technologies that get used to build real-world things in production, and none of these were really designed for GenAI.

My objection to the status quo is that this fragmentation slows down getting the research and innovation in GenAI into your products. We've seen so many demos; last year was really the year of the GenAI demo, but we're still struggling to get GenAI into products in an economical and good way. So whose fault is it? Is it our fault? Many of you are AI engineers, so let's sympathize with the plight of the AI engineer. The folks building this have new models and optimizations coming out every week. Every product needs to be enhanced with GenAI; this is not just one thing being dumped on us, and there's so much to do we can't even keep up. There's no time to deal with new hardware and all the other exciting new features. And once you get something that actually works, the costs end up making it very difficult to scale, because getting things into production means suddenly you're paying on a per-unit basis. So it's not the AI engineer's fault; we should look at the concerns and the challenges faced here. I think we need a new approach. We've learned so much; let's look at what we need to do, and how we solve and improve the world here. That's what Modular is about, so I'll give you a quick intro to what we're doing and our approach.

First of all, who are we? Modular is a fairly young company; we've been around for a couple of years. We've brought together some of the world's experts who built all of these things: we built TensorFlow and PyTorch, and we built compilers like LLVM, MLIR, and XLA. What I can say is that we learned a lot, and I apologize, because we know why it is so frustrating to use all these things. But the world looked very different five years ago; GenAI didn't exist. It's understandable; we tried really hard, but we've learned. Our goal is to make it so you can own your AI: you can own your data, control your product, and deploy where you want, and to make it much easier than the way current systems work today.

So how? What we're doing is really going back to basics. We're bringing together best-in-class technologies into one stack, not one solution per model. Our goal is to lift Python developers and PyTorch users; this is where the entire industry is, so we want to work with existing people. We're not trying to say, "hey, ditch everything you know and try something new"; we want to gradually teach and give folks new tools so they can have superpowers.
And finally, I spent a lot of time at Apple, and I want things that just work. You want to build on top of infrastructure; you do not want to have to be an expert in the infrastructure. That's the way all of this stuff should work, and unfortunately it's just not the case today in AI.

So at Modular we're building this technology called MAX, and I'll explain super fast what it is. MAX is two things. One is an AI framework, which I'll spend a bunch of time on; the framework is free and widely available, and we'll talk about it today. The other is our managed services; this is how Modular makes money, very traditional, and we're not going to spend a lot of time on that today.

If you dive into the AI framework, we see it as two things: it's the best way to deploy PyTorch, and it's also the best way to do GenAI, and both halves are really important. MAX is currently very focused on inference; these are the areas where PyTorch is challenging at times and where GenAI is driving us crazy with cost and complexity, so really focusing on this problem is something we're all about. The other thing, as I said before, is Python: we natively speak Python, because that is where the entire world is. We also have other options, including C++, which we'll talk about later.

How do we approach this? As I said, we work with PyTorch out of the box: you can bring your models, your model works, and we can talk to the wide array of PyTorch things like ONNX, TorchScript, torch.compile, and all this stuff, so you can pick your path, and that's all good. If you want to go deeper, you can use native APIs. Native APIs are great if you speak the language of KV caches and paged attention and things like that, and you care about pushing the state of the art of LLMs and other GenAI techniques. Also, MAX is very different in that it really rebuilds a ton of the stack, which I don't have time to go into: we do not build on top of cuDNN and the Nvidia libraries, or on top of the Intel libraries. We replace all of that with a single consistent stack, which is a really different approach, and I'll talk about what that means later. What you get is a whole bunch of technology you don't have to worry about: as a next-generation technology you get a lot of fancy compiler technology, runtimes, high-performance kernels, all of this in the box, and you don't have to worry about it, which is really the point.

Now, why would you use MAX? It's an AI framework, and you already have one, right? There are lots of reasons people might want an alternative. For example, developer velocity, your team being more productive: that's incredibly important, particularly if you're pushing the state of the art, but it's also very hard to quantify. So I'll do what people generally do and talk about the quantifiable thing, which is performance. I'll give you one example. We just shipped a release that has our fancy Int4/Int6 K-quantization approach, and this is actually 5x faster than llama.cpp.
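For background on what an Int4 quantization scheme is doing under the hood (the talk doesn't go into MAX's internals, so this is a generic illustration, not Modular's implementation), here is a minimal sketch of symmetric blockwise 4-bit weight quantization; the block size of 32 is an assumption:

```python
import numpy as np

def quantize_int4(weights: np.ndarray, block: int = 32):
    """Symmetric blockwise int4: one float scale per block, codes in [-8, 7]."""
    w = weights.reshape(-1, block)
    scales = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scales[scales == 0] = 1.0  # guard against all-zero blocks
    codes = np.clip(np.round(w / scales), -8, 7).astype(np.int8)
    return codes, scales

def dequantize_int4(codes, scales, shape):
    return (codes * scales).reshape(shape).astype(np.float32)

w = np.random.randn(4, 64).astype(np.float32)
codes, scales = quantize_int4(w)
w_hat = dequantize_int4(codes, scales, w.shape)
print("max abs error:", float(np.abs(w - w_hat).max()))
```

The practical win is memory bandwidth: 4-bit codes mean roughly 8x less weight data to stream than fp32, and weight streaming is often the bottleneck for LLM inference on CPUs.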
So if you're using llama.cpp on cloud CPUs today, this is actually a pretty big deal: 5x can have a big impact on the perceived latency of your product and on its performance and cost characteristics. The way this is possible is, again, that combination of really crazy compiler technology and other stuff underneath the covers, and the fact that you don't have to care about that is pretty nice. It's also nice that this isn't just one model: we make the technology easy to use and then demonstrate it with a model people are very familiar with. So if you care about this kind of stuff, this is pretty interesting: it's a next-generation approach to a lot of things that are very familiar, but done in a generalizable way.

Now, CPUs are cool, and so far we've been talking about CPUs, but GPUs are also cool. What I've seen is that CPUs and AI are fairly well understood, but GPUs are where most of the pain is, so I'll talk a little about our approach there. Before I tell you what we're doing, let me tell you our dream, and this is not a small ambition; this is kind of a crazy dream. Imagine a world where you can program a GPU as easily as you can program a CPU, in Python, not C++. In Python. That is a very different thing from the world today. Imagine a world in which you can actually get better utilization from the GPUs you're already paying for. I don't know your workload, but you're probably somewhere between 30% and maybe 50% utilization, which means you're paying for two to three times the amount of GPU that you should be. That's understandable given the technology today, but it's not great, for lots of obvious reasons. Imagine a world where you have the full power of CUDA, so you don't have to choose between the powerful thing and the easy-to-use thing; you can have one technology stack that scales well. This is something that's really hard. Nvidia has a lot of very good software people, and they've been working on this for 15 years, but I don't know about you, I don't run 15-year-old software on my cell phone; it doesn't run BlackBerry software either. I think it's time to really rethink this technology stack and push the world forward, and that's what we're trying to do.

So how does it work? Well, "it's just like PyTorch: you use one line of code and switch out CPU for GPU." Ha, we've all seen this, right? This doesn't say anything. I actually hate this kind of demo, because the way it's usually implemented is with a big fork at the top of two completely different technology stacks, one built on top of Intel MKL and one built on top of CUDA, and as a consequence nothing actually works the same except the thing on the slide. What Modular has done is go down and say: let's replace that entire layer of technology. Let's replace the matrix multiplications, the fused attention layers, the graph pieces, all of this stuff, and make it work super easily and super predictably, all stitched together. Yes, it looks fine on a slide, but the slide is missing the point.
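For anyone who hasn't seen the one-liner Chris is poking at, this is the familiar PyTorch pattern; his point is that everything below this line dispatches into two very different vendor stacks:

```python
import torch

model = torch.nn.Linear(1024, 1024)
x = torch.randn(8, 1024)

# The famous "one line": identical user code, but underneath it forks into
# completely different backends (CPU kernels vs. CUDA kernels).
device = "cuda" if torch.cuda.is_available() else "cpu"
model, x = model.to(device), x.to(device)
print(model(x).shape, "on", device)
```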
If you are an advanced developer, and I know many of you don't want to know about this, and that's cool, but if you are, you get the full power of CUDA: if you want, you can write custom kernels directly against MAX, and that's great. For advanced developers, and I'm not going to dive too deeply into this, it's way easier to use than things like the Triton language; it has good developer tools and all the things you'd expect from a world-class implementation of GPU programming technology. For people who don't want to write kernels, you also get a very fancy auto-fusing compiler, so you get good performance in the normal cases without having to write hand-fused kernels, which again is a major usability improvement.

Now, there are a lot of things out there that promise to be easy, but what about performance? A lot of the reason to use a GPU in the first place is performance. One of the things that's very important to Modular is that we're not comparing against low standards; we're comparing against the vendor's best, in this case Nvidia, and they're experts in their own architecture. There are a million ways to measure things, but take a micro-benchmark of the core operation within a neural network, matrix multiplication, the most important thing for a wide variety of workloads. Again, this is one set of data, but we compare against cuBLAS, the hardcoded thing, and also against CUTLASS, the more programmable C++-ish thing, and MAX is meeting and beating both of these, by a little bit. It depends on your bar, and data is complicated, but if you're winning by 30%, that's actually a pretty big deal given the amount of cost, complexity, and effort that goes into these kinds of things.

I've talked a lot about the what, but I haven't talked about the how, and the how is actually a very important part of this, so I'll give you a sample. We are crazy enough that we decided to rebuild the world's AI stack from the bottom up for GenAI, and as part of doing that we realized we had to go even deeper: we built a new programming language called Mojo. The thing about Mojo is, if you don't want to know about Mojo, you don't have to use it; you can just use MAX and that's fine. But we had to build Mojo in order to build MAX. I'll tell you just a couple of things about it. Our goal is for Mojo to be the best way to extend Python, which means you can get out of C, C++, and Rust. What is it as a programming language? It's Pythonic: it looks like Python, it feels like Python, and everything you know about Python comes over, so you don't have to retrain on everything, which is a really big deal. You get a full toolchain you can download on your computer and use in Visual Studio Code; it's open source and available on Linux, Mac, and Windows, with a couple hundred thousand users and twenty thousand people in Discord. It's really cool, and I'd love for you to check it out if you're interested.

But what is Mojo, actually? Fine, there's a programming language thing going on. What we decided is that AI needs two things. It needs everything that's amazing about Python: in my opinion that's the developers, the ecosystem, the libraries, the community, even the package managing, all the things people are already used to using. Those are the things that are great about Python.
What is not great about Python, unfortunately, is its implementation. So what we've done is combine the things that are great about Python with some very fancy high-performance compiler stuff, MLIR, all this good stuff, which then allows us to build something really special. While it looks like Python, please do forget everything you know about Python, because this is a different beast. I'm not going to give you a full hour-long presentation on Mojo, but I'll give you one example of why it's a different beast, and I'll pull it back to something many of you care about, which is performance. Mojo is fast. How fast? It depends. This isn't a slightly faster Python; this is a working-back-from-the-speed-of-light-of-the-hardware kind of system, and many people have found that it's 100 to 1,000 times faster; in crazy cases it can be even better than that.

But the speed is not the point; the point is what it means. In Python, for example, you should never write a for loop if you care about performance; Python is just not designed for writing for loops. In Mojo, you can write code that does arbitrary things. Here's an example pulled from our Llama 3 implementation written in Mojo that does tokenization using a standard algorithm: it's chasing linked lists, it has if statements and for loops, it's just normal code, and it feels like Python. That is really the point. So for you, the benefit of Mojo is, first of all, that you can ignore it if you don't care. But if you do care, you don't have to learn C or C++; you get lower cost by default versus Python, because performance is cost; as a researcher, you can have full-stack hackability; and as a manager, it means you don't have to have people on your team who know Rust and C++ and things like that. You can have a much more coherent engineering structure, where you're able to scale into the problem no matter where it is. And if you want to see something super polarizing, go check the Modular blog, where we explain how it's actually faster than Rust, which many people consider the gold standard, even though it is, again, a 15-year-old language.

I have to wrap things up; they'll get mad at me if I go over. The thing I'm here to say is that many of you may want to go beyond the API. The APIs are fantastic, there's amazing technology out there, and I'm very excited about them too. But if you care about control over your data, integrating into your security, customization, saving money, or portability across hardware, then you need to get onto something else. If you're interested in those things, MAX can be very interesting to you. MAX is free, you can download it today, it's totally available, go nuts. We didn't talk about production or deployment, but if you want to do that we can also help: we support production deployment on Kubernetes and SageMaker, and we can make it super easy for you. Our GPU support, like I said, is actually really hard; we're working really hard on it and we want to do it right, so it will launch officially in September. If you join our Discord you can get early access, and we'd be very happy to work with you ahead of that too. We're cranking out new stuff all the time, so if you're interested in learning more, check out modular.com, find us on GitHub (a lot of this is open source), and join our Discord. Thank you, everyone. [Applause]
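As a quick aside before the next talk: Chris's "never write a for loop in Python" point is easy to feel with a toy micro-benchmark. This is pure illustration of interpreter overhead (the exact ratio depends on your machine, and it says nothing about Mojo itself):

```python
import time
import numpy as np

xs = np.random.rand(10_000_000)

t0 = time.perf_counter()
total = 0.0
for v in xs:  # interpreted loop: one bytecode dispatch per element
    total += v
t1 = time.perf_counter()

vec_total = xs.sum()  # vectorized: a single compiled loop over the buffer
t2 = time.perf_counter()

print(f"loop: {t1 - t0:.2f}s  vectorized: {t2 - t1:.4f}s  "
      f"(~{(t1 - t0) / (t2 - t1):.0f}x difference)")
```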
Ladies and gentlemen, please welcome to the stage Principal Developer Advocate at AWS, Antje Barth.

Hi everyone, I'm so excited to be part of this conference and to share with you five practical steps from software developer to AI engineer. And if anyone is wondering about these avatars on the slide: this is what happens if you ask AI to make you look a little bit more agentic. All right, let's get started.

I'm pretty sure everyone is familiar with this image and the post from swyx that defines the new role of the AI engineer. As you've probably experienced daily in your jobs, you don't need to be a full ML researcher or data scientist anymore; things that took months or years to get AI projects into production can now be just a couple of API calls. Super exciting. Still, if you're working with AI, it makes sense to understand the basics of the technology, and that involves a couple of things. You have to understand, at a basic level, how foundation models work and why they sometimes produce output you don't expect in your application code. You have to understand how you can customize models, for example by fine-tuning them to adapt them to your specific use cases and datasets, and how to include functions in your application code to give models access to additional systems. The good news is that if you're just starting on this journey to becoming an AI engineer, there are plenty of resources available these days. I want to call out one specific course, "Generative AI with Large Language Models": a few colleagues of mine collaborated with Andrew Ng and the team at DeepLearning.AI to put this course together and help you understand the fundamentals of generative AI so you can build real-world applications. If you're curious, it's available on deeplearning.ai and on Coursera.

The second step in this journey is to start getting hands-on with the AI developer tools that help you increase your productivity. I think we've all seen this quote, and we experience it in our daily jobs: how we do work and how we develop applications has changed a lot. We can literally use natural-language inputs to interact with applications, and English really has become one of the hottest programming languages. We can see this happening everywhere: you can go from English to code, for example by asking AI to turn a README file into working code, and you can do code to English, for example by asking AI to document the functions in your code. But that's not all. If we look at the software development lifecycle, I think many of us can agree that the majority of our time is usually spent not on writing valuable code but on all the other things around it, sometimes up to 70% on less valuable tasks: writing boilerplate code, writing documentation, trying to maintain old codebases. Sometimes we only have a fraction of the time, maybe 30%, for the creative tasks in software development that actually create joy. This is what inspired us at AWS to create Amazon Q. Amazon Q is a generative-AI-powered assistant specifically developed for software development, and it's much more than just a coding assistant: Q Developer actually uses agents to perform much more complex tasks and help you automate them, for example feature development and code transformation. Think about working with old Java-based codebases that you need to migrate to a newer Java version. To show you how this works, I asked my colleague Mike Chambers to put together a quick demo. Let's have a look.
"With Amazon Q installed inside my IDE, I can go to a new tab and start a conversation with Amazon Q Developer. I can do the kinds of things you'd expect, such as asking 'how can I create a serverless application, how do I get started?', and the chat session brings back a list of instructions: start by installing the AWS SAM CLI, how to do that, where to get it, and how to step through creating a project. If I've done that, then SAM, for example, might come back with some generated code, and here is that code. Maybe I don't quite know what this code does, so I can right-click on the code and send it to Amazon Q, asking it to explain; the code goes into a prompt along with 'explain' and generates an answer. This is great for code that's been generated for us, but also imagine code from legacy systems, something that was worked on years ago by somebody else, where you can get Amazon Q to help explain it.

We can also get Amazon Q to generate code. This is, again, probably the kind of thing you'd expect: I can put a comment line inside my code, in this case saying I want to create an input-checking function, and I'll give it some more definition, that I actually want it to trim any string that's sent into this function, and yes, Amazon Q can generate this small function. That's great, but what if I've got more code that needs to be generated? I can go to the chat, type /dev, and put in a much more comprehensive description of something I'd like. In this particular case, I'm going to ask it to write a function to search by category in DynamoDB, with a bunch of details about how I want the output to be formatted, so this is much more than a single line or a few lines of code. What's going to happen is that it will come back with a step-by-step list of what's required.
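To make the request concrete, the kind of function being asked for might look roughly like this. This is a Python/boto3 sketch for illustration only: the demo itself targets Node.js, and the table and index names here are invented.

```python
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Products")  # hypothetical table name

def search_by_category(category: str, limit: int = 25) -> list[dict]:
    """Query items by category via a (hypothetical) GSI on 'category'."""
    response = table.query(
        IndexName="category-index",  # assumed index name
        KeyConditionExpression=Key("category").eq(category),
        Limit=limit,
    )
    return response.get("Items", [])

print(search_by_category("books"))
```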
I need to add to template.yaml, and it's recommending that I create search-by-category.mjs, among many other things. But this isn't just a big shopping list of things I need to do; it's actually a plan, and it's a plan that Amazon Q can follow for us. It generates code as a change set, something we can review as a diff between our current code and what it suggests, and if we like it we can click the insert-code button and it will add all of that code into our project, way more than just a couple of lines. So Amazon Q Developer is much more than code completion."

All right, if you're curious to learn more about Amazon Q Developer, we have a couple more sessions throughout the day, so make sure you check out those Expo sessions, and we also have a session at our AWS booth. You can also visit our Amazon Q Developer Center for many more examples of what you can do with it.

Let's come to step three, and this is where the fun starts: prototyping and building with AI. The fun includes a couple of steps, and everyone developing with AI knows this. It all starts with defining your use case, and then you're on this road of choosing between different models and customizing them to your use case: deciding whether prompt engineering is enough, whether you do RAG, or whether you need to do a little fine-tuning with your data. Across the whole development workflow you have to incorporate responsible-AI policies, making sure data is private and secure, and implement guardrails in your application. Then, when you integrate, there's another fun part, working with agents, which we're hearing a lot about throughout this conference, and then the fun topic of keeping everything up to date; there are a lot of terms for that: GenOps, FMOps, LLMOps. So there's really a lot to consider here.

I want to dive briefly into the topic of choosing models, because this is really important: you have to evaluate models thoroughly, because most likely there is no one size that fits all. In fact, if you look at all the use cases you want to implement, there's likely no one model to rule them all. This is why we developed Amazon Bedrock. Bedrock is a fully managed service that gives you access to a wide range of leading foundation models you can start experimenting with and implementing into your applications. It also integrates the tooling you need to customize your models, whether that's fine-tuning, RAG workflows, or building agents, and everything runs in a secure environment where you are in full control of your data. Speaking of choice, just to give you a quick overview, as of today this is the selection of models you can choose from: we're working with leading companies such as AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, and Stability AI, and we also offer our own Amazon Titan models. And I'm super excited to call out that last week, together with Anthropic, we launched Claude 3.5 Sonnet on Amazon Bedrock, so since last week you can use this model as well.

Now, with choice also comes responsibility, and we continuously innovate to make it easier for you to build applications across the different model types. Just a few weeks ago we introduced the new unified Converse API in Amazon Bedrock. What does it do?
The unified Converse API gives you a single, structured invocation method, meaning you can use the same parameters and request bodies regardless of which model you choose. On the platform side, we handle the translation when parameters are named differently across models, we handle the system, user, and assistant prompts for you, we give you a consistent output format, and there is native function-calling support as well. Let me show you how this looks in code. Here's a Python example using the new API: we start by creating the client with the Python SDK, then define a list of messages, and this is where you put your user message prompts; you can include system prompts as well. Then you pass that message list in a single API call using the Converse API: in the model ID you choose which model you want to test (here I'm using an Anthropic model), and you pass the messages along with the inference parameters. Again, in this API all those parameters are standardized, and we do the work behind the covers to convert them to the specific format the model expects, so you have an easy way to work across different models. Similarly, for function calling, support is built in for the models that support it: you define a tool list, where the tools are equivalent to the functions you want to give the model access to, and when you make the Converse API call you pass in this list of tools. If you want to find out more about the Converse API, here's a link to our generative-AI space on community.aws, which has a lot more tutorials and code examples, not just for Python but across different languages as well, so check it out. The author, Dennis Traub, is also somewhere here in the audience this week, so if you want to connect with him and talk about different code examples and how to use the API, feel free to reach out.
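The call Antje walks through looks roughly like this; a minimal sketch where the region, prompt, and tool definition are placeholders, shown with a real Bedrock model ID for Claude 3.5 Sonnet:

```python
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Optional "tool list": the Converse API's native function-calling support.
tool_config = {
    "tools": [{
        "toolSpec": {
            "name": "get_weather",  # hypothetical client-side function
            "description": "Look up the current weather for a city.",
            "inputSchema": {"json": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            }},
        }
    }]
}

response = client.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
    system=[{"text": "You are a helpful, concise assistant."}],
    messages=[{"role": "user", "content": [{"text": "What should I pack for Seattle?"}]}],
    inferenceConfig={"maxTokens": 512, "temperature": 0.5},  # standardized params
    toolConfig=tool_config,
)
print(response["output"]["message"]["content"])
```

Swapping in another provider's model is, in principle, just a change to modelId; the message shape and inference parameters stay the same, which is the point of the unified API.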
All right, now let's integrate AI into our applications. This could be a whole session on its own, but I want to focus on one of the hottest topics we're discussing during this conference, which is of course agents, and I have one more demo: I asked my colleague Mike to put together another exciting demo to show you what you can do with agents. Mike?

"Hi! We're able to create agentic workflows right inside the AWS console, and inside the service it works fully serverless. I've used it to create an agent that plays Minecraft; let me show you how I did it. If we jump into the AWS console and go down the menu on the left-hand side to Agents, you can see the agents screen, and I can open up my Minecraft agent. If I go into the Agent Builder and expand the screen a little, you can see some of the parameters I used to create this agent: the large language model I used, in this case Claude 3 Haiku, and the instructions for the agent. These are not notes for myself; this is actually prompt engineering, explaining how we want the agent, in this case the Minecraft bot, to play the game. Then we also have to add some tools, some Minecraft tools, and we do that through actions, inside action groups. I've got a couple of different action groups, Minecraft Actions and Minecraft Experimental, so let's have a look at Actions. Inside here we can see some really simple actions the bot will be able to do, all linked up to code. We've got the action to jump, and we've got the action to dig; you can see the description for the action to dig has some instructions, and again, this is prompt engineering. Then we've got some parameters that we can collect; in fact, we require these parameters, so the bot needs to get them for us. If I scroll down a little further, there are a couple of really simple actions in here: an action to get a player's location and an action to move to a location. I want to show you those in action, because the bot can actually problem-solve and reason its way through using these tools to solve simple problems. Let's jump into the game. It's nighttime, so let's set it to daytime so we can see what's going on: 'set time to day'. Okay, and there in the middle of the screen you can see Rocky. Rocky is the Bedrock agent running inside the game, and we can talk to it and have a chat session. But what if we want it to come to us? There is no tool to come to us. So I'm going to back up a little further, to make it a bit more of a challenge, and I'm going to say 'come to me' in chat. What happens now is that the agent reasons through a whole set of actions: it looks to see who requested that, takes that name (that's my name), finds the location of that player, and then maps a path from where it was to me. All of those things happened in the blink of an eye, and it's an agentic workflow making it all happen. This is super exciting; I'm discovering new things this bot can do every day. But with that, back to you."

All right, thank you Mike. If you're curious how we did this, check out our booth session (we're running the demo there as well), and we have another session in the agents track later today, so make sure you pop in if you want to know more. You can also find the project code on GitHub, so if you want to play with it on your own and see how you can integrate agents into something fun, check out the project.

We're almost there. The last step I really want to call out is to stay up to date. There's so much happening in this space, as you all know, and a really good way to keep up is to engage with the community. Speaking of community, I have one last announcement, and I'm super excited to make it: we're transforming our AWS Loft here in San Francisco into an AI engineering hub for the community. We're excited to host workshops, events, and meetups there. If you want to suggest the topics you're most interested in, to make those events most valuable to you, fill out this quick survey; and if you're interested in speaking or hosting a meetup yourself, let us know. We also have another event tonight, which I think has just reached capacity: a happy hour with Anthropic at the Loft. In case you didn't make it in, don't worry, we're working on putting together many more events like this in the upcoming weeks and months, so keep an eye out for those.

And with that, I'm coming to the end of my presentation. That wraps up the five practical steps to become an AI engineer. Let's innovate together; I'm looking forward to seeing what you build with AI. Thanks so much, make sure you check out the rest of the sessions, and pop by our booth outside. Thanks so much.

Ladies and gentlemen, please welcome to the stage, from Developer Relations at Anthropic, Alex Albert.
All righty, morning everyone. Today I want to start with a little story, a short history lesson if you will, so sit back and get comfortable. I'm going to take us back to the year 1882. It's the dawn of the electrical revolution; the world's first commercial power plant has just opened. Electricity, this amazing new force, is all the rage in the manufacturing industry, and people are claiming it's going to change everything. And yet something very interesting happened around this time, or rather, it didn't happen. Despite electricity's obvious superiority over the traditional techniques of the time, like steam engines, it didn't immediately improve manufacturing productivity. Why? Because factory owners were simply trying to drop the new technology into an outdated paradigm. Picture a typical factory of the time: a huge coal-fired steam engine on one end, and a network of transmission lines running across the top, driving hundreds of machines locked in the same rhythm. These legacy steam-powered factories were incredibly inefficient: if one station needed power, you had to turn on the entire steam engine and power all of them. Factory layouts were dictated by the limitations of the transmission lines, not by what was best for the process or for the workers. When electricity arrived, many factory owners simply swapped out the steam engine for an electric one, and sure, they added some lights, and workers didn't have to toil next to a coal-fired furnace all day, but the fundamental limitations of the factory remained. The real electrical revolution didn't come until we reimagined factories from the ground up with electricity at their core. Factories became flexible and adaptable; they allowed for smaller, specialized tools; workers could bring their tools to the items instead of having to lug the items back to their workstations. The entire manufacturing process became more efficient, more humane, and more productive.

Now let's fast-forward 140-something years to today, and you can see we find ourselves at a similar point with AI and LLMs. Enterprises, startups, and developers are all building and integrating LLMs into their products, but often they're just tacking them onto their existing product surface, adding a few star-icon buttons in the top-left corner and calling it a day. And this is not the first time we've seen this in Silicon Valley. Think back to when mobile first emerged: companies simply tried to shrink down their website and put it on a phone. It wasn't until we redesigned apps from the ground up around the unique capabilities of mobile, like the always-on camera and GPS, that we began to see true innovation and adoption in the space; that's when the Snapchats and Ubers of the world started to emerge. So just as factories went through their replace-the-steam-engine-with-an-electric-one phase, and tech companies went through their just-hire-a-couple-mobile-web-devs phase, we're now in our magic-star-icon phase with respect to AI. And yeah, it's funny, but the thing is, you can't blame any of the companies or developers trying to do this right now; all of us are trying. In many ways we're just still so early. LLMs are non-deterministic, they're hard to build on, and they're completely different from what most developers are used to. Reliability is still an issue.
Prompts still take rounds and rounds of optimization, and we've only just started to scratch the surface of the potential product opportunities. So far, not much has really stuck beyond the text box; we've been missing something that's a little hard to put a finger on. But just last week, I think we scratched the surface of a potential new product future we can build. As some of you may have heard, last Thursday we released our new model, Claude 3.5 Sonnet. 3.5 Sonnet is the first model we've released in the new Claude 3.5 family; it's only the middle model, and yet it's better than our last best model, Claude 3 Opus. In my opinion, Claude 3.5 Sonnet is one of the best models in the world right now, and the benchmarks seem to back it up: MMLU, HumanEval, GPQA, tool use, all the common characters. It's top of its class in many regards in these academic, lab-type environments, but what I'm most excited about is how it does in the real world. The model is particularly strong in RAG use cases thanks to its 200K context, and it has near-perfect recall over that entire context as well. On coding tasks, 3.5 Sonnet seems to grasp debugging problems better; it's not getting stuck in the same loops as much as previous models. One of the best methods we've found for measuring more complicated chains of reasoning is pull requests: they have a defined task, they usually take a few steps to solve, and the model is able to iteratively write and test its way to a solution. In our own internal pull-request evals, Claude 3.5 Sonnet scores 64%, and to put that number in context, Claude 3 Opus scored only 38%. 3.5 Sonnet also has state-of-the-art vision abilities, showing considerable improvement over 3 Opus in basically every benchmark we tested. Things like table transcription and OCR are a breeze now: I passed this table to 3.5 Sonnet and it basically replicated it perfectly in Markdown. You probably can't read all those numbers, but trust me, I double-checked them to make sure they're right. The vision capabilities were actually what amazed me most when I started playing with this model; it feels like we're really on the cutting edge of unlocking many more use cases.

As you hear me say all this, you might be thinking, "that's great, Alex, but it doesn't mean anything if I can't actually use the model." You're right, and we heard you; that's why 3.5 Sonnet is available on our API, AWS Bedrock, and Vertex AI. We understand that developers want choice when they're building, and we want Claude to be available wherever you are. In terms of pricing, 3.5 Sonnet is five times cheaper than 3 Opus: it's only $3 per million input tokens and $15 per million output tokens. 3.5 Sonnet's combination of speed, intelligence, and low cost makes it much more economical to use and embed in your apps than 3 Opus.

But 3.5 Sonnet is not all we've released in the past week. We also released a new product feature that I think is even more inspiring for developers thinking about building AI products from the ground up: it's called Artifacts. Artifacts separate the content Claude produces from the chat dialogue itself, which allows you to work collaboratively with Claude on everything from documents to SVGs to React websites. Artifacts become really powerful when you combine them with 3.5 Sonnet: those coding skills, plus that reasoning ability, plus that strong visual acuity enable a new product experience that's really fun to use. It's also a developer's best friend.
It allows you to quickly take screenshots and Figma diagrams and turn them into code and components you can actually go use. As you can see here, I basically cloned the entire claude.ai chat layout in React from a single screenshot. This capability has practically been hiding in plain sight, just waiting to be discovered, for over a year and a half. Maybe this tweet is right and we really are early on the S-curve of productionizing LLMs, which I think is actually pretty inspiring.

And Artifacts is not the only AI feature we've launched recently. On Tuesday we released Projects. Projects enables dev teams to work and collaborate much more efficiently by grounding Claude's outputs in your own knowledge, whether that's style guides, codebases, transcripts, or even your past work, on our Claude Team plan. You can even share these projects and chats with all your teammates. At Anthropic, our engineers now upload the code repos and documentation they use, and I've started to see people share the chats and the artifacts instead of Google Docs or site documentation. Projects is another great example of how, when you think from an LLM and AI standpoint first, you can build product experiences that complement these technologies and don't feel like a simple add-on to what you already have.

So now that the creative product juices are hopefully flowing in everyone's minds, I want to dive into the API improvements we've rolled out recently, the things that let you actually build this cool stuff, and give a preview of what's coming next that will enable you to build even more. A month ago we released our new tool use API. Tool use allows you to give Claude custom client-side functions that it can then intelligently leverage, and it also enables things like consistent structured JSON output. With 3.5 Sonnet, I've actually started to see devs give Claude hundreds of tools at a time.
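A minimal sketch of that tool-use flow with the Anthropic Python SDK; the tool here is a made-up example, and in a real app you would execute the function yourself and send back a tool_result message:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

tools = [{
    "name": "get_stock_price",  # hypothetical client-side function
    "description": "Get the current stock price for a ticker symbol.",
    "input_schema": {
        "type": "object",
        "properties": {"ticker": {"type": "string"}},
        "required": ["ticker"],
    },
}]

message = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's Amazon trading at?"}],
)

# If Claude decides to call a tool, the response includes a tool_use block
# with structured JSON arguments that conform to the input_schema.
for block in message.content:
    if block.type == "tool_use":
        print(block.name, block.input)
```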
On the developer console front, we're also continuing to iterate. We added a prompt generator that uses Claude to write prompts for you based on a task description; you can see in this video that we put in a task description and out comes an optimized prompt, and once the prompt is done you can start editing it right in the Workbench itself. We've also added support for variables, so you can edit prompt templates and test things like RAG use cases. And finally, we're working on a new Evaluate feature, which is in the console right now with a beta tag; we plan to share more on it and continue to iterate on it very soon.

So what else is next? There are two things I can share right now. First, you can expect more models: 3.5 Haiku and 3.5 Opus are coming later this year. With each model generation, we're looking to increase the intelligence, decrease the latency, and decrease the cost. The number one thing I tell developers is to build with that in mind: models will become smarter, cheaper, and faster in a matter of months, not years, so when you're planning your product roadmap, be ambitious enough to build with the belief that new models may arrive during your development period.

We're also working on other areas of research, like interpretability. In one of our latest papers, called Scaling Monosemanticity, we explained how we've been able to find features within models that activate for different topics. Once you identify a feature, you're able to clamp its value and turn it up or down to actually steer the model's outputs. A few weeks ago we showed claude.ai users how this works through Golden Gate Claude, a version of Claude that had the Golden Gate Bridge feature turned up significantly. Yeah, a fan favorite. We currently have a few beta testers experimenting with a steering API as well: it allows developers to find and clamp features for specific attributes and turn that dial up or down, which again lets you control Claude's outputs in addition to just prompting it. We hope to roll this out to more developers in the very near future.

Now, if anything in this talk has sparked any ideas, I want to encourage you to go out there and build, and to make quick prototypes as fast as you can to get that validation and feedback loop started. For even more of an incentive, we actually just launched another Build with Claude contest yesterday. It runs until July 10th, and the top three projects will each receive $10K in Anthropic API credits. To see more details, visit the link below; it's also at the top of our docs page, so you can find it there too. I'll leave that up for a second. And finally, if you have any questions or want to hear more about what we're thinking about, I'll be at the AWS booth down the hall for the next few hours. You can also find me on X/Twitter at alexalbert__ (with two underscores); I do try to read all my DMs, and I spend way too much time on that site, so feel free to ask questions there as well. With that, thank you very much, and enjoy the last day of the summit. [Applause]

Ladies and gentlemen, please welcome to the stage the CEO of LangChain, Harrison Chase.

Hello. Today I want to talk about agents. LLM-powered agents are really nothing new: the ReAct paper came out in October of 2022, LangChain launched about a month after that, and AutoGPT is over a year old. To me, AutoGPT represents the peak of hype in agents, and I actually think that for a few months after that there was a bit of a falloff in interest, as people realized the generic agent architecture wasn't reliable enough to build systems to ship to production. While there was this falloff, I do think some really interesting work was being done; OpenAI's Assistants API, I think, was really novel in a few regards, and I'll come back to that in a little bit. Earlier this year we launched LangGraph. While LangChain did agents, it also did a bunch of other things; LangGraph is purpose-built for agents. What exactly does that mean? LangGraph is highly controllable and low-level. As mentioned, we saw that these generic agent architectures weren't reliable enough, and the companies that were shipping agents to production were building custom cognitive architectures, encoding little differences in how they wanted their agents to behave. This was super important, so we made LangGraph extremely low-level and controllable. It also comes with a built-in persistence layer, which enables a lot of really cool human-in-the-loop interaction patterns, and it's streaming-first, because streaming is really important for LLM UIs. And just to emphasize: LangGraph works with or without LangChain, and it integrates seamlessly with LangSmith, our testing and observability platform.
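To ground what "low-level and controllable, with built-in persistence and human-in-the-loop breakpoints" looks like in code, here is a minimal LangGraph sketch; the node logic is a placeholder, and exact import paths may differ between LangGraph versions:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver

class AgentState(TypedDict):
    question: str
    draft: str

def plan(state: AgentState) -> dict:
    # Placeholder: a real agent would call an LLM here.
    return {"draft": f"Plan for: {state['question']}"}

def act(state: AgentState) -> dict:
    # Placeholder: this is where a tool call would happen.
    return {"draft": state["draft"] + " [executed]"}

graph = StateGraph(AgentState)   # you define the state shape, not the framework
graph.add_node("plan", plan)
graph.add_node("act", act)
graph.set_entry_point("plan")
graph.add_edge("plan", "act")
graph.add_edge("act", END)

app = graph.compile(
    checkpointer=MemorySaver(),  # the built-in persistence layer
    interrupt_before=["act"],    # breakpoint: pause before the tool step
)

config = {"configurable": {"thread_id": "demo-1"}}
app.invoke({"question": "book a flight", "draft": ""}, config)
print(app.get_state(config).next)  # ('act',) while paused, awaiting approval
app.invoke(None, config)           # resume once the human confirms
```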
LangGraph is already being used in production by a lot of our awesome partners, ranging from cutting-edge open-source projects like GPT Researcher, to trailblazing unicorns like Replit, to public companies innovating at scale like Norwegian Cruise Line, Ally Bank, and Elastic. Today we're excited to announce the first stable version of LangGraph, reaffirming our commitment to building an agent framework that allows you to build the custom cognitive architectures necessary for bringing agents to production.

But that's not the only thing we're launching. I want to go back to the Assistants API, which I mentioned earlier. I think there were a lot of really cool and novel things there. It wasn't just a framework for building agents; it also introduced infrastructure components. It had built-in persistence: it kept track of the messages and stored them for you, so you didn't have to do that. It had this concept of background runs, really good for longer-running asynchronous workloads, and it allowed you to configure agents. The downside is that it didn't give you full control over the cognitive architecture of your application: it came with a specific state it expected your application to have, a list of messages, and it was a little rigid and didn't easily let you do other things beyond that. That got us thinking: what if we took LangGraph, which lets you build these custom cognitive architectures, and combined it with those generic agent infrastructure pieces? Today we're excited to announce LangGraph Cloud, which is a step in that direction.

With LangGraph Cloud, you can take your LangGraph applications, written in Python or JavaScript code, and with no changes get a production-ready agent API. That agent API has all the benefits of the Assistants API: it comes with built-in persistence for whatever the state of your LangGraph agent is, it comes with a task queue to manage background runs, and you can configure different instances of your graph to swap out the LLM or prompt that's used. But we've also added a few other things that we're really excited about. When you kick off an agent run and send it a message, and it goes off and does a bunch of work, what happens when you send it another message before it's finished? We call this double texting, and we've introduced four different modes to handle it. Agents aren't just invoked through chat; they're also triggered on schedules, and LangGraph Cloud comes with built-in cron jobs to easily support this. I talked about human-in-the-loop: one of the really important human-in-the-loop features we're seeing is the ability to break before specific steps, for example when you have a tool that you really want the user to approve access to. LangGraph Cloud comes with breakpoints, which let you add this and then resume once you get that confirmation. And that's not the only human-in-the-loop feature that's supported: with LangGraph Cloud and the built-in persistence, you can easily go back to any step in the agent's trajectory, edit it, and then resume from there. This supports a bunch of really cool time-travel-like features that we think will be very important for the UXs of the future. And just to emphasize another benefit of LangGraph Cloud: it's not bound to OpenAI, and it supports any cognitive architecture you can build with LangGraph.

Finally, I want to talk about LangGraph Studio. LangGraph Studio is what we think is the best way to build, debug, and share agents. Taking a look at this video, you can see that you can easily see the graph of the agent; you can invoke it and get streaming output of all the steps and the tokens; you can go back, modify steps, edit them, and resume from there; and you can also add breakpoints, so that on future iterations you have to explicitly approve a step before it continues.
So all those human-in-the-loop features I mentioned are built into LangGraph Studio, and we're providing a developer experience specifically aimed at building agents. LangGraph Cloud is in private beta; we're excited to work with cutting-edge companies to figure out how to deploy agents reliably. We think a framework like LangGraph, which gives you complete control over the cognitive architecture of your application, is just one part — we strongly believe that everyone should be building that, that it is business logic you should be incorporating, and that it is your moat in some sense. However, we also think there are generic infrastructure pieces that just slow down the time to get to production, and that's what we want to be building with LangGraph Cloud. Thank you all.

All right, how did we enjoy the opening keynote? Yeah — really great time. I want to run through a couple of the tracks coming up. Agents: Andrew Ng famously said that AI agents could drive more progress in 2024, even more than the next generation of frontier models. Our host Demetrios from the MLOps Community just came off organizing his own AI Quality Conference next door, and brought his ukulele to guide us through building crews and factories of AI agents. Join him in salons 14 to 15 — to get there, head to the AWS booth down there and turn right at the end of the hallway. Evals and LLM Ops: AI engineers should build real moats instead of GPT-4 wrappers. Twitter loves hacky MVPs, but serious AI engineering means writing great evals and building good operational discipline, so you can ship better, faster, and cheaper. You might think our track sponsor Galileo AI ships a UI generation platform, but you might be surprised by how they ship the evals that ship the platform — mouthful to say; this is swyx's copy, and I'm reading it for the first time. So join Osman in salons 2 to 6, just outside the doors behind you and to the left. At our summit last year, OpenAI launched the year of multimodal AI with vision and image generation; this year GPT-4o is just one of many frontier models we will use to push into applications for on-device vision, real-time TTS, character simulation, and classroom education. One of the track speakers, Ben Hylak, worked on AI and the Apple Vision Pro at Apple, and his co-founder Alexis will guide us through the dawn of multimodality — join her right here in Salon 7. And AI engineer is a convention of both GPU-rich and GPU-poor, and we are excited to learn the state of the art from the people making it — didn't get a chance to rehearse this, this is going well. Our host Nyla worked in product at NVIDIA and Convey and is excited to accelerate our learning from the top teams in GPUs and inference. Due to scheduling conflicts we also have two talks here from yesterday's tracks: Scott Wu from Cognition, AKA Devin, on codegen, and Kathleen Kenealy of Google Gemma on open models. Join Nyla and her speakers right here in Salon 8. Last but not least, day two of the AI Leadership track addresses more needs of VPs of AI, from understanding enterprise RAG, building evals, and security, privacy, and compliance safeguards, to hiring and growing an AI engineering org, as well as case studies from Cohere and Twilio. As a reminder, this track is exclusive to AI Leadership track attendees with the green lanyards and green badges; if there is room at session start time, we can let in speakers with the blue badges and blue lanyards,
but anyone else — again, please do not attempt to attend these sessions; they are exclusive, or you'll be escorted off the premises by security. Please don't make us do that. If you want to attend these sessions next year, the tickets are there for you to purchase. These take place right across the hall in salons B to D. We have a number of things happening after this block of talks: Expo and Expo sessions from 10:30 to 11:10, breakout tracks from 11:15 to 12:15, then lunch, then more Expo and Expo sessions. Then HF0 is doing a demo day in salons 2 to 6 over there, just outside the doors, followed by more Expo sessions, and then we'll be back here for closing keynotes from 4 to 5, which includes the CEO of GitHub, Thomas Dohmke. But I believe, if the demo gods are good, we might actually have a special guest with us today. Tim, did we get that set up? Oh hey — hello, who is this? Hey, what's going on everyone — please welcome swyx, from Singapore. Hey everyone — I'm watching along live as well; it is 1 a.m. our time right now. Well, glad you could join us, swyx. How's it going, Ben — do we want to give any more context to the folks here? Scheduling speakers is hard, especially when they're busy launching Gemma 2, and it's been a fun experience putting this together. I hope everyone's having fun and that the talks have been engaging; I wish I was there with you. We absolutely miss you — I know a lot of people thought it was a joke for a long time that you didn't get to make it, so I'm really glad that you're supporting us. This guy has been so helpful in everything; all the sessions you see here, he's curating, with help from his friends. But what is it like from afar, watching the live stream — do you have time to catch any of the talks? Yeah — if people are watching the YouTube live stream, that's me in the chat, hanging out with folks, saying things I regret sometimes. It's very fun to just be one of the audience. My dream was always to show up at one of these conferences and just be an attendee and not know what's going on — I think there's a little bit too much not knowing what's going on, even as a core organizer — but it's been fun nonetheless. I think the lessons learned from this event will let us implement a lot of procedures and policies that smooth out the organization task that bit, and we might have a little bit more budget now that we've proven the model here. Any parting thoughts as we enter the last day? Yeah — there are lots of amazing talks. I can't pick a favorite; I'm personally excited about all of them, and I'll catch up on the live stream afterwards. I think there are some after-parties as well, right? I'm not sure what the social schedule is. There are tons of after-parties — they should all be on the homepage; the homepage is getting very long at this point. Let's put those in Slack — is everyone on Slack? We'll drop those in announcements or general. And I think I restored the QR code for World's Fair 2025 — I broke it just before you went on stage,
because I told you I was going to change the URL, and then we didn't change the QR code — so try it again. That's why it broke; I thought it was our QR provider that just got a spike and cut us off. Interesting — no, it's me; swyx is to blame. For the demo gods, swyx is the demo demon. All right, well, thank you so much for showing up. So sad that you couldn't be here with us, but it's really great to be a partner with you at this event — let's hear it for swyx, everyone. So we're going to go to break now; we will see you all back here at 4:00 p.m., but enjoy the breakouts until then. See you. [Music]
We are super excited to welcome you to the multimodality track for the AI Engineer World's Fair. Things have been changing a lot in the world of AI and LLMs, going from just text-to-text to now a whole world of multimodal inputs. We have a really exciting set of speakers here today to help you learn a ton more about this, and so I'm excited to welcome Rob Cheung to talk about Substrate — it is their launch week, so give them a round of applause.

Hey — yeah, it's really good to be here. This is a particularly exciting talk for us, because we've been working with private clients for about a year now, but this is the first time we've really talked about it in public since our launch last week. I'm incredibly proud of the work we've done so far and excited to take a few minutes to tell you about it. If you look at the products out there that have really successfully leveraged this generation of AI, I think one thing is true about nearly all of them: they're using more than one inference run, often many different types of models in tandem, to accomplish a specific kind of task really well. People quickly realized that the foundation model alone is not enough. From very simple tasks, like summarizing a document, to much more complex tasks, like solving coding problems end to end, I think the best products right now are all using systems of inference runs in a logical structure. At Substrate, we believe that building with modular intelligence is always going to be more effective than building with a monolithic intelligence. These systems are inherently more legible, which means you can understand them structurally, which means they're debuggable and extensible — and evals become a lot easier, because the decision trees are explicit and you can verify at every step what's going on and what's going wrong. Substrate is a new approach to this. I think our model is fast in ways that other paradigms can't be, it's flexible enough to build any AI product out there, and it works at scale by default. So what is it? At its core, Substrate is a coupling of two things. First, it's a really elegant developer SDK that lets you describe a computation graph over any number of nodes — the abstractions here are really general, and we have intelligence nodes across all the modalities you might care about: generating images, transcribing speech, generating text, JSON, embeddings, executing code. But second, Substrate is also an inference engine, specifically built to run these computation graphs as efficiently as possible.
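I don't have the exact SDK surface from the talk, so here is a purely hypothetical sketch of the graph-over-nodes idea he describes. The `Graph`/`Node` classes and names like `GenerateText` are illustrative assumptions, not the real Substrate SDK:

```python
# Hypothetical sketch of a "computation graph over nodes" SDK in the
# spirit of the talk. Class and method names are invented for
# illustration; they are not the actual Substrate API.
class Node:
    def __init__(self, **params):
        self.params = params

class GenerateText(Node): ...
class GenerateImage(Node): ...

class Graph:
    """Collects nodes and edges so an engine could batch, cache, and
    schedule the whole plan at once, instead of the caller dispatching
    separate API calls step by step."""
    def __init__(self):
        self.nodes, self.edges = [], []

    def add(self, node, depends_on=()):
        self.nodes.append(node)
        self.edges += [(dep, node) for dep in depends_on]
        return node

graph = Graph()
summary = graph.add(GenerateText(prompt="Summarize this story: ..."))
# The image prompt depends on the summary node's (future) output.
art = graph.add(GenerateImage(prompt_from=summary), depends_on=[summary])
# A real engine would now topologically sort and co-schedule both runs.
```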
These graph representations describe many tasks and their relationships, and since we run a very coordinated compute cluster, we can statically and dynamically optimize things like batching, caching, networking, concurrency, and physical placement — which really makes a big difference. If you look at most frameworks out there, they typically involve dispatching a bunch of API calls separately, and if you look at what happens mechanically when you do that, every step means you've got to resolve DNS, go through proxies, go through authentication and balance checks — and all of that adds hundreds of milliseconds of latency on every single step. Contrast that with Substrate, where we transfer data from node to node, process to process, on the order of microseconds — some 10,000 times faster — meaning it's actually feasible now to run online applications that involve dozens of nodes. We've also noticed that JSON decoding is one of the most useful patterns for multi-inference runs, and we've invested a lot into offering a best-in-class JSON mode, both in terms of reliability and speed. If you look at all of this together, I think what it means is that Substrate is really a way to enable higher-quality outcomes with AI, letting you work in a system that's more flexible, more legible, and more verifiable than any of the current paradigms that exist. There's a lot more to say, but that's all the time I have today — it's only five minutes. If you're curious, please come say hi at the Expo floor — you can scan this QR code and get some credits — go to the website, substrate.run, or shoot me an email at rob at substrate.run. [Applause]
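As a back-of-envelope check on the latency claim, here is the arithmetic behind "hundreds of milliseconds per step" versus microsecond hand-offs (the per-call overhead figure is an assumption chosen to match the "10,000x" ratio from the talk):

```python
# Overhead comparison for a pipeline of "dozens of nodes".
nodes = 30                      # "dozens of nodes"
per_call_overhead_s = 0.300     # assumption: DNS + proxy + auth per API call
in_cluster_hop_s = 30e-6        # assumption: microsecond-scale hand-off
                                # (0.300 / 30e-6 = 10,000x, as in the talk)

print(nodes * per_call_overhead_s)  # ~9.0 s of pure overhead per request
print(nodes * in_cluster_hop_s)     # ~0.0009 s — feasible for online apps
```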
Awesome — thank you so much, Rob. And now we're excited to welcome Vik Korrapati, who is the CTO at M87 Labs, to talk about Moondream.

Hi, can y'all hear me? I can hear myself. Hi, my name is Vik, and I work on an open source vision model called Moondream. A little bit about myself before I dive in: I was at AWS for about nine years before I started working on this model. Looking at where the stock price is going, I'm not sure that was the right financial decision, but I'm very happy with the work I'm doing. So let's dive into it — I'll talk about Moondream a little bit. It is a tiny vision language model: it's less than two billion parameters, so it can run anywhere, and it's open source, Apache 2.0, so you can use it to do anything. Here are some examples of things you can do with Moondream: you can ask it questions about images, you can caption images, it can detect specific objects inside images — here I asked it to tell me where the peak is and it gives me coordinates — it can count stuff, all sorts of things. I had the audacity to title my talk "How can a tiny vision model slap so hard?", so I have to back that up a little bit, and here's me doing that. These are two vision question answering benchmarks — one's called VQAv2, the other is called GQA — and as you can see, Moondream has been steadily improving over the releases I've made over the last three months. I've included a reference line there for LLaVA 1.5, which is a popular 7-billion-parameter vision model, so this shows you that Moondream gives you performance comparable to models that are about four times bigger than it. I didn't really set out to build a vision model — I kind of got roped into it. I was originally trying to build an application that required an AI agent, so I needed to be able to see what was going on on the user's screen and have it describe what's on the browser page, for QA testing automation. I tried to do this at first with GPT-4V, but there were too many safety refusals back then — if there was any human being present in the image, it would just refuse to process it — and it was also going to be really slow and expensive. So I realized that if this is a product I'm trying to build, I really need to have control over the model itself, and I figured, how hard can it be, let me just go try and build this model myself. Now, the task I was trying to perform here was fairly constrained — I just needed to describe screens and answer questions about screens, so it doesn't need to be generally intelligent. I had a couple of 3090s at home, so I figured I'd train a small version of the model at home and then rent some machines in the cloud to train a bigger version. Once I got done training the small version, I was like, hey, this actually works pretty well, so I posted it on Twitter. I thought I might get 20 likes off of it and then move on with my side project. It blew up far beyond expectations — I was a little surprised, pleasantly surprised, but surprised nonetheless — and I immediately started seeing other automated testing companies reach out, asking, hey, can I use this to describe browser screens? Because this would work really well for us. As well as other companies — shout-out to our friends at Open Interpreter from Seattle. So basically I thought, this is getting a lot of traction; let me pause on the whole automated testing app for a couple of weeks, focus on Moondream, and see where it goes.
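As a sketch of the developer experience he's describing — asking the model questions about an image — here is roughly how the open source model could be queried through Hugging Face Transformers, following the project's README around that time (method names like `encode_image`/`answer_question` and the recommended revision pinning may have changed in later releases):

```python
# Sketch of querying Moondream locally via Hugging Face Transformers
# (README-style usage circa mid-2024; may differ in newer releases).
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "vikhyatk/moondream2"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

image = Image.open("road_camera.jpg")   # any local image
enc = model.encode_image(image)         # one-time image encoding
print(model.answer_question(enc, "Is there a bus in this picture?", tokenizer))
```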
Yeah, so let me dive into a couple of the technical details around what makes the model succeed despite being small. The first thing we did that I think really helped was deciding what problems the model should solve and what it should not solve. Moondream wants to be a developer tool: we focus on being really accurate and not hallucinating. It doesn't have a lot of knowledge about the world, so if you ask it to write a poem, it's probably not going to help you — it's really focused on helping you understand images. This is really important, because it affects the type of data that you use and the sort of benchmarks that you want to focus on. There's a popular vision language model benchmark, MathVista, which measures how good models are at solving math problems: you take a picture of a differential equation and see whether the model can solve it. That was an example of a non-goal for us, because we just want the model to be good at looking at images — the most we'd do is generate a LaTeX representation of the problem; we don't want to even attempt to solve calculus. The model was not pre-trained from scratch: we fuse a vision encoder called SigLIP, from Google, with a pre-trained text model called Phi-1.5, from Microsoft. The notable thing here is that Phi-1.5 was also trained on mostly synthetic data, which is very similar to our pipeline, so it works very well for this sort of task. Pre-training from scratch doesn't really make a difference versus using pre-trained models, and it is cost-prohibitive — so unless you want the brownie points for saying you trained it from scratch, it's probably not worth doing. I've experimented with a bunch of other models as they were released, and nothing really made too much of a difference. What does make a difference is training data. The latest release of Moondream is trained on around 35 million images, and the problem, especially when you're on a budget, is that high-quality multimodal training data is really hard to come by. There are a lot of companies out there that will annotate data with humans, but it's really expensive — I've heard a rumor recently that they won't even talk to you anymore unless you're willing to sign an upfront seven-figure commitment. There's a lot of data on the internet — image/alt-text pairs — but the problem is that it's often not in the format you want, and it's really noisy, and the noise is really problematic when you're training small models. Synthetic data is a way to solve this, where you take that alt-text information and process it. It's a bit of an open secret that a lot of people are training on outputs from GPT-4. You probably don't want to do that: besides being questionable in terms of the terms of use, it's often not helpful. GPT-4 is a very powerful model; it has reasoning capabilities and knowledge that your small model is never going to be able to get, and so when you train on GPT-4 outputs, what your model learns instead is to hallucinate — it will generate plausible-sounding outputs that include details it cannot possibly have memorized, and you end up in trouble. So this is important — I'm going to get a little more technically detailed for a couple of minutes and dive into how to do synthetic data, so bear with me for a second; we'll pop back up.
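To make the "fuse a vision encoder with a pre-trained text model" step concrete, here is a schematic sketch of the common projector pattern — SigLIP patch features mapped into the language model's embedding space by a small MLP. This is the generic recipe, not Moondream's actual training code; the dimensions are illustrative (1152 matches SigLIP's large variant, 2048 matches Phi-1.5's hidden size):

```python
# Schematic of the vision-encoder + LLM fusion pattern (generic recipe,
# not Moondream's actual code).
import torch
import torch.nn as nn

class Projector(nn.Module):
    """MLP that maps vision features into the text model's embedding space."""
    def __init__(self, vision_dim=1152, text_dim=2048):  # illustrative dims
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(vision_dim, text_dim),
            nn.GELU(),
            nn.Linear(text_dim, text_dim),
        )

    def forward(self, patch_features):       # (batch, patches, vision_dim)
        return self.mlp(patch_features)      # (batch, patches, text_dim)

# Training idea: the projected image tokens are prepended to the question's
# text embeddings, and the fused sequence is fed to the language model.
proj = Projector()
fake_patches = torch.randn(1, 729, 1152)     # 729 tokens/image, as in the talk
image_tokens = proj(fake_patches)
print(image_tokens.shape)                    # torch.Size([1, 729, 2048])
```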
Here's an example of how not to do it. COCO is a dataset with around 200K images; each image has five short descriptions and a bunch of object annotations — like, "hey, there's a bicycle at these coordinates" and whatnot. Let's say you want to take the short descriptions and these object annotations and generate more detailed captions that include the union of all the information present. If you just naively call GPT-4 with this information, it generates this — not important to read all of it, but there are two important things to note. The first is that it hallucinates: it says, in the second paragraph, that there's a person near the right side of the harbor. I think there are maybe five pixels way back there — it may be a post, it may be a person, we don't really know. That's because the object annotations were bad. But besides that, the model is also taking a lot of creative liberties, saying things like "five yachts standing out from the rest" and whatnot. So you need to do a little more pre-processing of your data before you feed it to the model. Here's another example: there's a dataset from Google called Localized Narratives. The task annotators are given is: verbally describe this image, and as you're describing it, hover your mouse over the part of the image that you're describing. It's nice in that it encourages people to create really detailed descriptions that capture spatial positioning in the image — for example, here it says the girl in the front is playing the guitar — and spatial reasoning is something that vision language models typically struggle with. I ended up having to build a fairly sophisticated data processing pipeline to get really good results with this. It's not really important to dive into the details here, but the important thing to note is that it gets really expensive: each image ends up being 20 LLM calls, and the LLM here is Mixtral 8x7B, so it adds up. But it was necessary — the training data is the biggest needle-mover in terms of model performance — and because of this, I'd say we spent maybe one or two orders of magnitude more compute on generating training data than on actually training the model itself. This particular dataset we've open sourced; it's available on Hugging Face. Here's an example of the type of questions it generates for this image. There's an interesting question towards the end — "what theory does the kid have about the existence of pleasure in the image?" — I'll talk about that in a second, but basically you want to generate a few distractor questions, so the model learns not to always agree with the question the user is asking. A couple of the challenges involved in working with synthetic data: there was an interesting incident early on where a user said, hey, I asked a relatively simple question — why couldn't the model answer it? When I looked at it, it turned out they hadn't capitalized the first letter of their question, and the model had never seen anything like that during training. So what do you do here? It's really important to make sure your training data has roughly the same distribution as your real-world queries, so we ended up adding an extra step where we artificially inject capitalization issues, typos, and the like into the training data before training on it.
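A minimal sketch of that augmentation step — randomly degrading capitalization and injecting typos into synthetic questions so the training distribution matches messy real-world queries. The probabilities and the doubled-letter typo are illustrative assumptions, not Moondream's actual pipeline:

```python
# Sketch: degrade clean synthetic questions to look like real user input.
import random

def degrade(question: str, p_lower=0.5, p_typo=0.3) -> str:
    if question and random.random() < p_lower:
        # "what is..." instead of "What is..."
        question = question[0].lower() + question[1:]
    if len(question) > 1 and random.random() < p_typo:
        # Simple doubled-letter typo at a random position.
        i = random.randrange(len(question))
        question = question[:i] + question[i] + question[i:]
    return question

random.seed(0)
print(degrade("What color is the bus?"))  # e.g. "what ccolor is the bus?"
```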
There's also this risk of what we call model collapse, where your model has biases ingrained into it. For example, if you ask Mixtral to generate distractor questions — questions completely irrelevant to the image — it'll always generate something about dinosaurs and aliens, and if you train your model on that, it'll instead learn "if the question is about dinosaurs and aliens, always say no," which doesn't really help. So you need to inject some entropy into the process of generating synthetic data to avoid this. In the case of synthetic captioning, you can do something like: "describe this image, but also consider the alt text on the image, which may or may not be relevant — and if it is relevant, use relevant facts from it." That tends to help a lot. All right, popping back up. There were a couple of important learnings I had over the last three months that I'd like to share with you. The first was that the community was really critical in this whole journey. Seeing the original engagement we got from the community helped me realize that maybe this is more valuable than the QA testing application I was working on, because a lot of people have a need for this to build applications like that. Coming from an enterprise company, it's been really refreshing to be able to just talk to customers directly — someone will Twitter DM me like, "hey, just saw you're looking for this, what do you think?" — and it's also helped us connect with a lot of partners and mentors and get a lot of support from the community. Being open source was critical. I kind of didn't have a choice here, because the competition was free — so what am I going to do? But when you're in the dev tool space, it's pretty important: open source matters to a lot of developers, who want the ability to run it in different environments. It's also pretty important for a lot of enterprise users — in a lot of cases they don't actually want to run the software themselves, but having the option is very important to them, because most enterprises have had situations where a vendor goes out of business or decides to screw them in some other capacity. It's also been really critical for engagement: we've had a lot of people in the community help out, porting it to different platforms, running the model in the web browser, and whatnot, so it's been very valuable for us. This one is a little controversial — not sure everyone agrees with it — but I feel pretty strongly that safety guardrails should be implemented at the application layer, not baked into the model itself. This was one of my learnings from my first attempt to build a QA testing application with GPT-4V: it made no sense for that application to reject any picture that contained a human being. I understand why they felt it was important, but dev tools are B2B, not B2C, so it's important to make it easy for developers to decide what guardrails they want and implement them themselves, as opposed to deciding for all users. I'm not saying this isn't important at all — it kind of makes sense, if you're building an assistant, to bake that stuff directly into the model — but when you're building for developers, it makes less sense. And yeah, I now believe pretty strongly that tiny models are going to run the world. In computer vision, perhaps even more so than in text models, efficiency is really important.
In a lot of cases you're really worried about cost, because you're processing video at 30 frames a second, and 7/10 of a cent per second (that's about $25 per hour of video) adds up very quickly and doesn't give you a lot of room to work with. But there are also situations where you're really worried about privacy or latency, and therefore want to run the model really close to where decisions need to be made. Which is not to say big models are not useful — I think they're very useful; I just think we'll mostly be running them in our development environments, maybe for generating training data, while the artifact you're going to want to deploy is most likely a smaller model. Another thing that was a little surprising to me was looking at the different things people were doing with Moondream. There were a lot of people building net-new applications that weren't possible before, because the model can understand language as well as images. But there were also a lot of people doing traditional computer vision things with the model — like "is there a person in the scene?", or "is there something suspicious going on?", or "tell me where the bus is in this picture from a road camera" — all of which was possible to do before we had Transformers: just train a YOLOv3 model or whatnot. The lesson I took from this was that prompting is a much better developer experience than having to train a custom model. For a lot of developers who would be interested in incorporating vision into their applications, before, they'd decide it wasn't worth spending two weeks learning how to collect data, annotate it, and train a custom model. Giving them the option to, fairly cheaply, just describe in English what they want extracted from an image makes it something they'll actually consider doing now. All right — I think I'm a little ahead of time, so I'm excited to maybe do a live demo, if the demo gods smile upon me; we'll see. In conclusion, where is Moondream going? We're not AGI people — I'm really focused on making it easy for developers to build amazing applications with vision. There are a bunch of model improvements I'm working on right now; I'll talk about some. Right now we use 729 tokens to represent an image, so you can only really send one image to the model at a time. We're working on giving users the option to give a more compressed representation to the model, which makes sense if you're not trying to read text from the image — if you're just doing classification or whatnot. That makes the model run a lot faster, which is important especially if you're on CPU as opposed to GPU, since CPUs can't do as much parallel compute, and so that sort of thing ends up being really important. We also just raised a seed round from Felicis, Ascend, and also the GitHub Fund, which I forgot to include in the slide — sorry, GitHub. This means more GPUs, but more importantly it means I can finally get some sleep, because we're able to get a couple more people to join the team. If you're interested, please reach out — we have a contact email on the website, or just hit me up on Twitter. We also have an exciting release coming up later this summer that I'm super pumped for, so stay tuned. I think that's about it, and I have a couple of minutes left, so I'm going to try doing something that may not be the wisest idea, but we'll see how it goes.
All right — I've turned the Wi-Fi off; this whole thing is running locally. What this is going to do is start taking my webcam input, and it's going to use Moondream in an infinite loop to describe what it sees, and we can ask it different questions. So we'll see how that goes. You can ask it different things — let's say, "is the person wearing glasses?" You do have to tell the model to answer briefly if you want a yes or no; otherwise it gives you more than a single word. Let's try that. Yes — okay, I'll take them off. I can't see — did it get it? [Applause] Let's do that; I'll go back to the old prompt. [Applause] All right, well, that was it for me. Thank you all.

Thank you so much, Vik. We are super excited to welcome our next speaker, Ben Hylak, who is one of the founders of Dawn Analytics, and he's going to be talking about the era of unbounded products.

Hey all — I guess I actually have this on, I don't need that. I'm so excited to be here with you today, at what is probably the coolest AI conference in the world, at such an exciting time in history — especially, I think, for AI products. If you don't already know me, from either demos on Twitter or sometimes probably ill-advised spicy takes on Twitter, my name is Ben Hylak and I'm the founder of Dawn. At Dawn we help some of the best companies in the world, everyone from GitHub to can-of-soup, build better, more predictable AI products. My entire life I've been really obsessed with building and designing unbounded products — products that transcend the mouse and the monitor in some way. For me that started with robotics, I think the first one was when I was in eighth grade, and eventually rockets at SpaceX — these are very unbounded products. And most recently I was on the design team for the Apple Vision Pro for four years, where we designed the first version of visionOS. I think that AI makes products less bounded than they've ever been: you can type, you can talk, you can show images or video, just like we just saw — you can also sort of plead, you can bargain, you can confide. These are very interesting input modalities, and this unboundedness often makes products unpredictable: confusing, hard to understand. Users assume your product can do things that it can't; they try to do those things, it doesn't work, and they walk away thinking that it can't even do the things that it can. When you talk to people — specifically people that are not in this room — about how they use ChatGPT and how they learned to use it, it's often word of mouth: they hear one of their friends say they used it for travel planning, and then they go use it for travel planning. A lot of us in this room, especially people who are more technical, learn through trial and error — we just keep trying, because we know that these models are good, we know it's impressive. But a lot of people don't do the trial-and-error thing: they try it once, it doesn't work, and they don't try again. So this talk is about making good AI products, and to that end I'm going to cover just three things: the past — how products have become more unbounded, and what has worked for unbounded products in the past; the present — AI products today, and the good and bad design patterns;
and the third point is going to be the future. So again, just three things — the past, the present, and the future. Easy. We're going to start with the past. Most software that we use lives on a screen, and you use it primarily by swiping, clicking, and tapping. When you click something, whatever the developer expected to happen is what happens — depending on how good of a developer you are. It's easy for users to understand what your app can do: they look, they see the buttons, they get it. It's also very easy for you to understand what your users are doing: you just add an Amplitude or Mixpanel call on a button press and you see what they did. One of the biggest changes to this, prior to the last two years, was multi-touch — instead of one pointer you have two — and just by adding that second pointer you get relative distance, you get rotation. That one little change largely made the smartphone possible; it largely made it easy to use a screen that small. And now it's just getting crazy: we have unbounded products everywhere — products are so unbounded you have software freely roaming the streets of San Francisco, getting attacked by fiery mobs. So I want to talk a little bit about one unbounded product I got to work on, the Vision Pro, and about three lessons we learned while designing it — lessons I think aren't as intuitive looking from the outside in. I think unbounded products are often defined by this "what if" question. When we were starting, it was like: users get themselves into the craziest situations. Something as simple as, well, what if someone's in the living room, and then they move to the bedroom and lie down on their bed — what should happen to your apps? If you're designing macOS on a laptop you don't have to worry about that, but that's something we had to think about, and there are hundreds more questions like this: what if someone's on a plane, what if someone's next to their friend, what if someone has a disability of some sort — they can't move, they can't move their neck, they're bedridden. All these what-ifs are, again, what define unbounded products. All of us building AI products are constantly thinking, what if someone puts in this, what if someone puts in that — and there are evals, et cetera. And so, without structure, you just have chaos: a blank slate, all these what-ifs, an infinite world of possibilities. It's really on us as product designers to add structure, and structure is what creates clarity. So I want to talk about three ways we added structure. The first was highlighting what matters, and doing it really fast. The first thing you see in visionOS is a home screen: it has apps, it has people, and it has environments. Those are the things we think matter when you're using visionOS, so they're the first thing you see. That might not sound that novel — and in a lot of ways it's the same thing that happens on your iPhone — but when you compare it to the VR products that came before, where it's very hard to understand what the thing is actually good for when you look at the menu. The second point is hierarchy. Hierarchy is what gives unbounded products a shape and a purpose;
it's what helps users understand what it's good for and what they should use it for. So again, we have the home menu — that's kind of where everything starts and ends for visionOS — we have windows, which have bounds, you can resize them and move them, and any individual window can go full screen. That was our hierarchy. The last point, which is really important: the easiest way to make unbounded products feel intuitive is familiarity. It was something we hit when we were building Dawn — our first prototype was this star-cluster thing you could explore, which was really fun; nowadays it looks a lot more like this, which is tables and graphs and examples. Again, it's just structure and clarity. And I think it's no accident that the TV app on visionOS looks a lot like the TV app on tvOS — it wasn't laziness. When people are in uncharted territory, you want to give them as many signs of home as possible. Same thing for Control Center: when people see it in visionOS, they already know how to use it. So again, these three points: highlighting what matters and bringing it to the forefront, establishing hierarchy, and leveraging familiarity. All right, that was the past; now we're going to talk about the present, and specifically about AI products — ways, both good and bad, that products have been incorporating structure into their AI features. It's really important to note that the right structure is very unique to your app; that's the whole point — it gives your app a shape and helps your user understand what it's actually for. Take something like Dot. Dot is a companion — sort of a journal, at least for me — and the structure they added is that if you pinch out, you can see each day separated, so it feels a lot like a journal. And if you tap a person, or two people — in this case my co-founders — I can see structured information about them, in a timeline, from every time I've mentioned them to Dot. Again, you're pulling that structure out of the chat. Perplexity does a really good job of using structure to make their experience feel more like a search engine and less like ChatGPT, less like a chat you're having a conversation with. They do this by pulling your query up top as a title, highlighting the sources it came from, and having the answer below that — and having that take up the full page regardless. It makes it feel more one-shot, less like something you're having a back-and-forth with. Now I want to talk about an anti-pattern I've seen, which is in the Vercel chatbot demo. I think Vercel does some of the coolest design work in the entire world — I just didn't like this one. This is the idea of almost ephemeral UI, but inside the flow of chat. And I get the appeal — actually, if we go back here (sorry, this was a video meant to show it), you have a slider: you inquire about buying Doge, and it shows you this UI so you can adjust exactly how much, instead of having to do it over text. It could be good. The problem is that when it's stuck inside this unstructured thing, it starts floating away as I try to ask follow-ups — and at some point I even have two of them.
So I go back up to the first one, I press purchase, and now I'm interacting with something that's completely different. It reminds me a lot of the house in Up — it just kind of floats up, up, and away. So instead of trying to stuff structure into this unstructured thing, I think the answer is to pull it out — pull it off to the side — which means that as the conversation continues, you can just update that structure without disrupting where the user is. And that's exactly what Claude did with Artifacts. I think the reason it's so successful is that they pulled the structure — the app you're working on and iterating on — out of the actual conversation, so as you make changes you can even go between versions here without having to scroll in the conversation. It's beautiful, and it brings us to another thing that I think has been really effective for AI apps, which is this concept of version control. This was actually one of the original shipping ChatGPT features, which is kind of crazy: if you edit a message, you can go between the versions, and it actually maintains this entire tree — very complicated, but super powerful. With v0, Vercel did something again amazing, where it feels extremely familiar, almost like you're working in Google Slides, but you can go back and keep iterating on UI without having to be afraid that you're losing something. So again: versions. And again, I think familiarity is really one of the most important things for unbounded products. I think Claude did an excellent job with this too — I'm hyping them up here. ChatGPT introduced memory across all of your chats, completely unbounded: when I tell it something about some sort of medical problem while I'm working on JavaScript, it knows that, which is very weird to me. The idea of projects, and the structure of a project, is very familiar, so sharing context across a project makes more sense. Agents are something that is extremely unfamiliar to most people — this idea of having all these different tasks, feeding data between steps, whatever. But you know what is familiar? Spreadsheets. Spreadsheets are extremely familiar — not to me, actually, but to a lot of people — and I think the only real uses of agents I've seen in the real world are spreadsheets. This is Clay: each column is essentially a step that an agent is taking, defined by the user, so it's going across, building up context across the spreadsheet, row by row. You almost do it like an eval: you run the first 10 rows, and then you run the next 50,000 or 100,000, so you get it right. And you can see here that eventually you end up with a personalized email as the last column, with all these steps in between. The next thing that I think is extremely effective in helping people understand what your app is for, skipping all the noise of prompt hacking and prompt engineering, is examples and presets. ChatGPT, I think, was the first at this, with those suggestions like "message to comfort a friend," "plan a relaxing day," and so on. v0 does an awesome job with this too: not just having those suggestions below, but also having an explore page where you can see what other users are doing and what's actually working — again, trying
to shortcut that blank-canvas prompt problem. Notion as well: they have a simple menu where you can change the tone of text, instead of having to write "you are a very concise GPT," whatever whatever — you're just using tried-and-proven things that Notion can validate. And that last point brings us to the future: where are interfaces going? Linus gave an awesome talk last year where he described prompt engineering as almost like a llama trying to drive a car with a pool noodle from the back seat — I think that was his metaphor — and there's some real truth to it. So first of all, I think the future has a lot less prompt engineering, and we're already seeing this. We're already seeing it with generative images — the way that Apple designed it, where you're mixing and matching these different concepts, and there's a ton of demos on Twitter of people going between emotions in a more intuitive way. And just yesterday Figma released a way of adjusting the tone of text, where you're going between professional, casual, expanded, concise. The problem with this is that "casual" means a lot of different things: casual for a Fortune 500 company and for a direct-to-consumer cosmetics brand with ads on TikTok — these are very different things; casual when talking to your best friend or to a coworker — these are different. So how do we avoid being reductive when offering these sorts of presets? The answer is that you just — I don't know exactly how many zeros to put here — million-x or billion-x the number of presets, so you have enough presets for everything. And I think sparse autoencoders show a really promising path towards that. If you've tried Golden Gate Claude, where you can identify the one Golden Gate Bridge feature and amplify it, it makes Claude obsessed with bridges — Golden Gate bridges specifically, or the Golden Gate Bridge. My friend Gytis has an amazing demo towards this, but for manipulating images: you can see here he's increasing the amount of "play of light and shadows," increasing the amount of "serene forest streams" or "Venetian canals" — again, a very controllable and predictable way. Okay, but now we have a million, a billion options, whatever — how do we avoid too many options? I think this gets to point three, which is ranked presets: presets that are personalized, searchable, and even invoked through natural language. They might not even be directly visible to the user — the user types in something like "more friendly," and you pull up the corresponding presets, like how close you are, how confrontational it is; maybe they're directly editing them, maybe not. This gets to the second-to-last point, which is developer-defined personalization. As soon as you're able to define those sorts of features, you can start tuning them per user — in a way that you can't do with just text prompts today, because text prompts are this fragile house of cards where if you remove one word, the whole output changes. And the last point, which is especially true as your app becomes increasingly different per user, is shifting from evals to analytics. I don't think there's going to be some objectively correct answer for a lot of domains.
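As a sketch of the sparse-autoencoder steering idea he references — decompose an activation into sparse features, amplify one, and reconstruct — here is the generic pattern in schematic form. The dimensions, the (untrained) SAE weights, and the feature index are all illustrative assumptions, not any particular model's internals:

```python
# Schematic of sparse-autoencoder feature steering (generic pattern only).
import torch
import torch.nn as nn

d_model, d_features = 512, 4096          # illustrative sizes
enc = nn.Linear(d_model, d_features)     # stands in for a trained SAE encoder
dec = nn.Linear(d_features, d_model)     # ...and its decoder

def steer(activation, feature_idx, scale=5.0):
    """Decompose an activation into sparse features, amplify one
    (e.g. a hypothetical 'casual tone' feature), and reconstruct."""
    f = torch.relu(enc(activation))      # sparse feature coefficients
    f[..., feature_idx] *= scale         # turn the dial up on one feature
    return dec(f)

resid = torch.randn(1, d_model)          # a model activation to modify
steered = steer(resid, feature_idx=123)  # 123 is an arbitrary example index
print(steered.shape)                     # torch.Size([1, 512])
```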
For things like "who was the first president?", yes — but the right sort of tone for a summary for a specific user? I don't think so. So I think it's increasingly going to be about how you understand whether you're meeting the needs of your users and what they're asking for. That's it — thank you so much. Oh yeah, we'll skip this one — and thank you so much for coming.

Awesome, thank you so much, Ben. And now we're excited to welcome Karan Goel from Cartesia to talk about state space models.

Can everybody hear me? Awesome — we'll see if I can stay on track for 18 minutes. Great to be here. I'm Karan, the CEO of a small company called Cartesia, which is an eight-month-old company. I'll tell you a little bit about what we're working on, try to describe some of the challenges that we see in emerging multimodal intelligence and how we plan to fix them, and hopefully show a demo as well, if the Wi-Fi works. So maybe to set the stage a little bit: the last four or five years of AI have been really focused on this idea of batch intelligence, which is pretty core to the idea of building an AI system that can reason for long periods of time on a problem and then solve it — think hard math or physics problems. But there are a lot of applications where what you actually need are systems that are streaming: they're real-time, they work instantly. Imagine generating video or audio, or doing understanding applications on sensor streams, et cetera. So it bifurcates into these two different types of applications, similar to how there's generally this idea of batch workloads and streaming workloads. A lot of what we've seen over the last few years has been focused on batch APIs, where you call a model in the cloud, it takes a few seconds, and you get a pretty good response back. Now we're seeing a shift towards more real-time applications, where you'll constantly be querying a model, asking it to return responses at low latency, and using that to incorporate or generate information. I think this area is really exciting, because it's going to be transformative for a lot of interesting applications that so far haven't been the main focus of the last few years. Conversational voice is an example: you should be able to interact with a system, talk to it, and it should be able to understand you and do all kinds of tasks on your behalf. Similarly, assistants that are on-device and run really efficiently, at low power, at all times, regardless of whether you're on a phone or a laptop. Then things like world generation, where you can imagine actually playing a game that is generated in real time, similar to how graphics are rendered on GPUs — and all of this should be able to happen in real time, at low power, on your phone, on your MacBook, et cetera. Robotics is another great example, where it all culminates on a single device that is trying to interpret everything in the world. So I think this is the exciting intersection: how do we make intelligence faster and cheaper, so that we can put it everywhere? And here are a couple of examples that are really
A couple of examples where this could be really powerful. Real-time intelligence for conversational interfaces is going to be really interesting, because you'd be able to have an agent that can provide customer support for a problem, answer questions about health insurance, or call your vendor to arrange a shipment pickup — all these coordination tasks that are generally annoying should be automated, with real-time intelligent agents doing them. Humans can then spend their time on harder, more interesting problems; in customer support, that could mean dealing with the tail of customers who matter more because they're upset, or because they have more customer value. Similarly, in robotics there's this idea of ingesting audio, video, and sensor data, much as humans do, and responding instantly to all those pieces of information. That's the world we should be living in: intelligent models that run super fast, solve all these different problems, and power new experiences that are interactive at their core.

So this is where we come in: we're building real-time foundation models. Some of what I'll talk about is the work we've done on genuinely new ideas for how you create deep learning models. I did my PhD before this, working with a lot of these folks — Chris was our PhD adviser — and we were focused on the idea that a model should be able to compress information as it streams in, and use that to build powerful systems that are streaming at their core. That's the technology we've been developing in academia for the last four or five years. Some of you might have heard of Mamba, a more recent iteration of this line of work; I did my PhD on some of the early iterations that nobody uses anymore, but they're the precursors to a lot of the modern stuff that's now more widely used. What we're doing at Cartesia is taking this and trying to understand how to improve it: how do we push the boundaries of what architectures can do? It's an interesting question, because we shouldn't settle for one way of doing things — that's a poor way to think about the future. Our approach is to think about new ways of designing models that aren't necessarily built on, say, the Transformer architecture and the standard deep learning recipe that's prevalent today.

I think it boils down to this: efficiently modeling long context is a huge problem, because a lot of practical data is long-sequence data. Text is maybe the least interesting long-sequence data, because text is already fairly compressed — a lot of information is embedded in two sentences. But in all these other domains — audio, video, and so on — there's vastly more raw information. Imagine watching a security camera feed for a day: an enormous amount of information comes into the system, and very little of it is useful.
So compression is really fundamental to intelligence: we can take in all this information and compress it down to whatever is necessary to remember or understand. So far, the AI systems we've built haven't exhibited that behavior. They're built not on principles of compression but on retrieval — keeping all the context around and using it to reason over everything you've seen. Our point of view is that multimodal AI will remain challenging as long as you work in that paradigm. If you think about what a human does in a year, you're processing and understanding roughly a billion text tokens, ten billion audio tokens — these are back-of-the-envelope calculations I did — and about a trillion video tokens (probably an underestimate of how much video we process), not counting all the other sensory information. You do it all simultaneously, on a computer that fits in your brain, sometimes while skipping food and drink, and you still function fine — the system runs on variable amounts of power. So the idea that intelligence is solved is very far from the truth; humans are extraordinary machines that do something remarkable, in a very compressed way, that our AI models can't match. That's the reason we get up in the morning: we think about this and realize we're very far from where we should be. The best models today are in the 10-million-to-100-million-token range — which is really good, a lot of progress has been made — but what we aspire to is machines that are long-lived and can understand information over very long periods of time. The cool thing is that, as a human, you can remember things that happened 30 years ago with very little effort. You don't need RAG or retrieval; you just remember it — it's lodged in your brain — and you figure it out. That's an extraordinary capability we should be able to put into our AI models as well.

Some of the big problems with models today: they're built on Transformers, which are really optimized for the data center. We saw this in our work on sub-quadratic models. Quadratic scaling in context length means the computation required to process long context is very large, and right now the predominant approach is to throw compute at the problem and hope it scales. Compute is obviously an important piece of the puzzle — you do need more computation to do more difficult things — but quadratic scaling behaves poorly with very large multimodal context. Text contexts tend to be shorter; multimodal contexts will get much larger, because far more tokens and information flow into the system. That's going to be a big challenge for these models, especially doing inference efficiently, so you're not burning down data centers to do a fairly limited amount of inference.
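To put the quadratic-versus-linear point in perspective, here is a toy back-of-the-envelope calculation — illustrative constants only, not Cartesia's numbers:

```python
# Rough cost comparison: attention touches every (query, key) pair, so
# per-sequence work grows as n^2; a streaming/recurrent model does a
# fixed amount of work per token, so work grows as n. All constants
# here are made up purely for illustration.

def attention_cost(n_tokens: int) -> int:
    return n_tokens ** 2                 # O(n^2)

def recurrent_cost(n_tokens: int, state_size: int = 4096) -> int:
    return n_tokens * state_size         # O(n)

# Using the talk's yearly human estimates: ~1e9 text, ~1e10 audio,
# ~1e12 video tokens.
for n in (10**9, 10**10, 10**12):
    ratio = attention_cost(n) / recurrent_cost(n)
    print(f"{n:.0e} tokens: attention is ~{ratio:,.0f}x more work")
```

At trillion-token video scale, the quadratic term works out to hundreds of millions of times more work than the linear one — which is the gap the talk is pointing at.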
You have to imagine that we'll be doing a thousand, or a hundred thousand, times more inference — and if the models scale the same way, it's going to be really, really expensive, and you won't be able to permeate all those applications I talked about very easily. So that's a big challenge. Again, our hypothesis is that you need new architectures, and that's where we spend our time: making models more efficient, faster, and more capable while handling these long-context problems. (This slide is about Transformers being somewhat inefficient at this — though obviously a very good recipe for scaling models out.)

Some of the work we've been doing is on fundamentally efficient new architectures with compression at their core. They scale more linearly in context length, so you can have lower-power implementations; you compress information as it enters the system, memory usage stays low, and you can scale to much more massive context as a result. This is the work around SSMs — I threw in a nice slide which I thought was cool; Jensen had an interesting quote about SSMs in one of his Wired articles that I like to keep bringing up — and I think it's a cool technology with a lot of potential, which is where we spend a lot of our time. If you're interested in reading more, there are lots of videos on YouTube and plenty of resources that make it more accessible.

The working intuition is this: Transformers generate quadratically by attending to every past token. As tokens come into the system, you keep them around and look back at all of them. To generate the word "jumps" after "the quick brown fox", you look at the entire context, figure out what the next word should be, generate it, push it into the context, and do it again. With SSMs, you have a streaming system instead: tokens stream in, each one updates an internal memory in the model, and then the token is thrown away. That really simplifies the system, and it's why the interface is streaming at its core: you're not keeping all that memory about the past around; you're compressing it into a sort of zipped-file state inside the model, which is then used for future generation. This takes advantage of recurrence, which is core to how even humans do a lot of their reasoning. Over the last few months a lot of these models have been getting adopted, so it's great that folks are excited about this alternate way of doing things — oriented around recurrence rather than retrieval. I think we'll see a lot more activity here, especially as multimodal data becomes more important and its efficiency challenges mean these models have more of a role to play over the next three to five years, as we also do our work scaling them up and making them more interesting.
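To make that intuition concrete, here is a deliberately tiny sketch of the token-in, state-update, token-out loop — a toy linear state-space recurrence, not Cartesia's or Mamba's actual parameterization:

```python
import numpy as np

# Toy SSM layer: h_t = A @ h_{t-1} + B @ x_t,  y_t = C @ h_t.
# The model's entire memory of the stream is the fixed-size state h.
d_state, d_model = 16, 8
rng = np.random.default_rng(0)
A = rng.normal(scale=0.1, size=(d_state, d_state))  # state transition
B = rng.normal(size=(d_state, d_model))             # input projection
C = rng.normal(size=(d_model, d_state))             # output projection

h = np.zeros(d_state)  # the compressed "zipped file" of everything seen
for x_t in rng.normal(size=(1000, d_model)):        # tokens stream in...
    h = A @ h + B @ x_t     # ...each one updates the internal memory...
    y_t = C @ h             # ...an output is emitted...
    # ...and x_t is thrown away. Memory stays O(d_state) forever,
    # unlike attention's key/value cache, which grows with every token.
```

The contrast with the Transformer loop is exactly the one described above: no growing context to re-read, just a constant-size state folded forward at each step.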
A lot of people ask me about quality. I only have a few minutes, so I'll go through the rest of this fast, but SSMs generally have the right quality. There's obviously a trade-off between compressing and keeping all the information around, but compression can actually help: in the security-camera example, if you're watching 24 hours of footage, compressing that information on the fly lets you solve tasks and answer questions better than re-examining all 24 hours every time. So the rule of thumb is: compression is super helpful for long context, less helpful for short context. And we see that quality is in fact very good on long-context and multimodal problems.

Let me talk quickly about some of our recent work. We've started working on multimodal data, and a few weeks ago we released a voice generation model — text to speech — in line with our effort to bring more multimodal data into a single model and use SSMs to power training and inference. It's a model you can actually play with; I'll try to show you a demo. One of the things we're proudest of is that we really shrunk the latency: when you play with it on the playground, you get instant voice back, generated from the data center. And there's some cool work underway to run these models on Macs and other devices, so you can get the same experience you have from the data center on any device, efficiently. How much time do I have? Okay, we're out of time — but I was almost done. Go to the website, play.cartesia.ai.
I unfortunately couldn't walk through the demo, but please play with it and send us feedback. This is my email in case you want to send me a note — I'd love to hear feedback and anything you folks find interesting. Thank you. [Applause] [Music]
Hello, hello, and welcome back to the afternoon session of the multimodal track. We are super excited to get started with Chang She and Noah, talking about the hierarchy of needs for training dataset development.

Oh wait — okay, I think this thing is on. Hey everyone, thanks for coming to our talk. Hopefully you didn't eat too much and you're not too sleepy from lunch, but we're excited to be here and to talk to you about training dataset development for LLMs. My name is Chang. I'm the CEO and co-founder of LanceDB. I've been creating data tools for data science and machine learning for almost two decades, starting as one of the co-authors of the pandas library a long time ago; I spent a bunch of years in big data systems and recommender systems, and most recently I started this company, LanceDB, which is the database for multimodal AI. These days I spend roughly equal time tweeting and on GitHub.

Hi everyone, I'm Noah. I currently lead the AI data platform at Character.AI, one of the leading personalized AI platforms. We train our own foundation models as well as run a direct-to-consumer online platform, and I focus on data research. Since we train our own foundation models, we need to learn what to train on to engage our users, so we focus both on academic benchmarks and on things like A/B tests, trying to drive more engagement on our platform. My team also works on research acceleration, so we tend to build a lot of tools — which led to this collaboration with Lance and to how we think about storing our data.

If there's one thing I want to convey with this talk, it's that you should really care about what you're training on, and you should care for it by giving it a nice format that does a lot of nice things for you. I wanted to start with broad strokes on how we think about pre-training versus post-training. There's definitely a lot of overlap, but in pre-training you tend to think wider: which domains you're training on — books, or more chat data — and quantity: how big is your model, how many tokens do you need. In post-training, you're looking at very specific tasks, and not just the context of a task but how difficult that math problem is, or how easy that multiple-choice question is. You have to get much deeper and more granular in what you understand about your data, at scale. In the middle I've grouped some of my favorite problems that a lot of people are looking into right now: data-efficient learning (how do we reduce the amount of data needed to get good results from a similarly sized model), how we sample from data (what kinds of metrics do we need), and how we look at diversity (measuring diversity is very difficult; there are various automated approaches used in industry and across all the papers out there).
Everyone loves a good hierarchy of needs. For us, it always starts with clean data and goes quickly right up to evaluations — we always start there, because it's hard to measure anything without a compass. And since we're so focused on post-training nowadays, systems for dataset management are becoming more and more of a need. When we think about mixtures — collections of datasets, usually with different ways of including them in your batches and training sets — you want to understand those collections not just in terms of which dataset they are (is this Wikipedia or something else?) but also what's actually in there. That naturally rolls up into analytics: we want token counts, an understanding of length, and sometimes more complicated signals — you might classify your code data not just by whether it's Python or Java, but by how difficult it is, how many functions are in the problem, how many classes you were supposed to generate. More and more analytics lets you understand your data better. More than anything, though, reading data has probably been the biggest win; the rest is largely ways of automating things we've learned from looking at data, looking at outputs, looking at performance, and trying to understand what is going on.

Everything in the top half of the hierarchy is about using language models to improve language models: synthetic data, quality scoring, dataset selection. Dataset selection is probably the simplest, and one of my favorites: you're looking for ways to match distributions between the behavior you want from your model and the data you have. A lot of what we do is retrieval or clustering — you can embed the web pretty quickly nowadays — and then pick the data we like, according to the evaluations we're looking at. Quality scoring is similarly simple: we've built a lot of classifiers in-house for a variety of things, and there's a lot of cool work on doing this with just prompted classification, which is even simpler than building a classifier, evaluating it, and running that whole loop. Synthetics, given how we've structured our platform, are also super powerful for us: we have an ecosystem of big data tools like Spark and Trino alongside GPU-backed services for prompting, embedding, and classification, so we can enrich and augment our datasets. You can generate quick examples of, say, preference pairs and explore a method before it's at peak quality — synthetics will have problems, but you can start getting signal on what types and shapes of data help, and then loop in human labeling to make it even better. At the top, we use human labeling a lot to improve these classifiers, and we also want to use it for rewriting synthetic data that has issues — or real data that just has issues in itself.
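As a toy illustration of the distribution-matching idea behind dataset selection (hypothetical names and random stand-in embeddings — not Character.AI's actual pipeline), you can embed a small set of examples of the behavior you want, then keep the candidate documents that land closest to them:

```python
import numpy as np

def select_closest(candidates: np.ndarray, targets: np.ndarray, k: int):
    """Indices of the k candidates most cosine-similar to any target."""
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    t = targets / np.linalg.norm(targets, axis=1, keepdims=True)
    best = (c @ t.T).max(axis=1)     # each candidate's closest target
    return np.argsort(-best)[:k]

# Stand-ins for real document embeddings from an embedding model.
rng = np.random.default_rng(7)
web_docs = rng.normal(size=(100_000, 128))  # embedded candidate pool
wanted = rng.normal(size=(500, 128))        # examples of desired behavior
keep_ids = select_closest(web_docs, wanted, k=10_000)
```

Swapping the top-k for clustering, or the cosine score for a prompted quality classifier, gives the other selection variants mentioned above.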
All of this comes together to motivate a lot of our platform tooling, and as I said at the beginning, accelerating research is a big part of it. I've included some beautiful YAML here — hopefully people can see there's a SQL block in there — and I think it's pretty motivating in terms of how we materialize datasets. If you've worked in machine learning at all, you know you usually have a specific training format — maybe TFRecords, maybe JSON Lines, depending on where you're coming from — and in my experience it's one of the most error-prone components of training: I don't know what data this is, I'm training on it, I'm getting weird results. Since our team iterates so much on data, separating concerns — how your data is materialized versus what your training job is doing — is really, really nice, and this is where Lance started becoming a big deal for us, especially as we move into multimodal, where data volumes are much larger and the problems we're trying to solve are more complicated. The materialization service aside — it's a nice interface where you send a request and get back a list of files — things really hit the road at data loading, which is its own problem once data volume gets really large. Lance has a nice property, which Chang will talk about a lot more, that allows quick random access and lets us shuffle data very cheaply: it essentially lets you shuffle references to rows rather than shuffling the rows themselves, which saves a lot of iteration time. At the end of the day, we just want to watch the GPUs go brr and the numbers go up. I'll pass over to Chang, who can talk a lot more about the Lance format in detail.

Thanks. Cool — so you've heard from Noah about the importance of data in developing models, and if data is critical, then it's also critical to have the right data infrastructure for your workloads. Now, AI workloads tend to differ from traditional data warehousing, OLAP, and analytics workloads in a couple of ways; let me give you one motivating example. A distributed training workload typically breaks down into three steps. First a filter: you select the right samples from your raw dataset. Then a shuffle: you draw random rows from the filtered set. Then, if the dataset is large, you stream those observations — text, images, or videos — from object storage into your GPUs. So in that one workload you need fast scans to run the filter, fast random access to do the shuffling, and the ability to handle potentially very large binary data — large blobs — to stream quickly into your GPUs. These three properties are often required together in one AI workload, from training to search and retrieval, but existing data formats and infrastructure are good for at most two of the three — often just one. This is what I'm calling the new CAP theorem for AI data, and it's the motivation for designing the Lance format, around which we've built LanceDB.
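Here is a minimal sketch of the shuffle-references pattern that both speakers describe, assuming a format with cheap random row access; `fetch_rows` is a hypothetical stand-in for that random-access read, returning dummy rows here so the sketch runs:

```python
import numpy as np

def fetch_rows(row_ids):
    # Hypothetical stand-in for a random-access read (e.g. fetching
    # exactly these rows of a dataset on object storage).
    return [{"id": int(i)} for i in row_ids]

def shuffled_batches(num_rows: int, batch_size: int, seed: int = 0):
    """Shuffle cheap integer row ids instead of the heavy rows."""
    order = np.random.default_rng(seed).permutation(num_rows)
    for start in range(0, num_rows, batch_size):
        yield order[start:start + batch_size]

# The multi-gigabyte rows are never copied or rewritten; only a
# permutation of integers is shuffled, and each batch is fetched
# on demand via random access.
for row_ids in shuffled_batches(num_rows=1_000_000, batch_size=256):
    batch = fetch_rows(row_ids)
    break  # in real code, hand `batch` to the training loop
```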
This problem is, of course, exacerbated by the scale of AI data, especially multimodal data. Looking at tabular data from the past, one row of simple scalar columns averages about 150 bytes. Add embeddings and it gets roughly 20 to 25 times larger, depending on the number of dimensions; add images and that's another 20x; add videos and it gets pretty astronomical — and that's a single row. And with generative AI, data is no longer limited by the speed at which manual human interaction can generate observations: new rows of data are being generated at thousands of tokens per second, so scale blows up quickly. In the past — and I've been in data a long time — if you were in the tens of terabytes, you were a fairly large company. These days, if you're working in generative AI, it's not unheard of for ten- or twenty-person teams to be managing tens of terabytes to even petabytes of data.

So what does the Lance format do to solve these problems? First, it's a columnar file format — like Parquet, but optimized for AI. It gives you the ability to do fast scans, like Parquet, and it supports fast lookups, unlike Parquet: we got rid of a big limiting factor in Parquet called row groups, which also allows you to store blobs inline. Second, Lance is a lightweight table format: as you add data, it's automatically versioned, and you can add columns without copying the original dataset, which makes it a lot easier to add experimental features to a large multimodal dataset and roll them back later — we call this zero-copy schema evolution. It also supports time travel, so if you make a mistake, or there's an error or bad data, rolling back to a previously known-good version is instantaneous and doesn't corrupt downstream model-training processes. The third interesting aspect is indexing extensions. Parquet has indices, and they can quickly tell you which rows you need, but because Parquet doesn't support random access, fetching those rows is still slow even once you know which ones you want. Not so with Lance. We've added indexing extensions for embeddings, so you can do essentially billion-scale vector search directly off S3; scalar indices that make filtering on metadata columns really quick; and full-text search indices for keyword or fuzzy search, directly from your S3 dataset — you don't really need that Elasticsearch cluster anymore.

What Lance gives you is the ability to have a single table for many, many different workloads. If you have metadata or time-series columns, you can plug Lance directly into, say, DuckDB or Trino or Spark and run SQL on it. If you're storing large blobs and tensors — videos, text, images — you can plug the same table into PyTorch training. And if you have embedding vectors, you can use the vector index to do similarity search. That makes the full AI workflow — analyzing and exploring your dataset, searching and retrieving over it, fine-tuning and training your model — a lot easier. Around this format we've built LanceDB: the vector database, and the more general database for multimodal AI. One big feature is distributed vector search — searching through billions of vectors at low latency and very high QPS with an order of magnitude less infra than other vector databases — and it provides data infrastructure for all of your multimodal data needs.
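As a hedged sketch of what that single-table experience looks like from Python (using the lancedb package's basic API as I understand it; the table name and rows are invented):

```python
import lancedb

# Connect to a local directory; object-store URIs work similarly.
db = lancedb.connect("./my_lance_data")

# One table holding scalar metadata alongside an embedding vector.
table = db.create_table(
    "clips",
    data=[
        {"id": 1, "caption": "sunrise over hills", "vector": [0.1, 0.9]},
        {"id": 2, "caption": "city at night", "vector": [0.8, 0.2]},
    ],
)

# Vector similarity search over the same table you could also scan
# with SQL engines or feed into a training loop.
hits = table.search([0.15, 0.85]).limit(1).to_pandas()
print(hits[["id", "caption"]])
```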
When we talk about multimodal, we often think narrowly about image or video generation, but when you look at the data, "multimodal" has several meanings. First, the data itself can be multimodal: unlike traditional tabular data, we can store features, audio waveforms, images — everything we're already familiar with — and of course vectors (and a vector is a vector, whether it's an image embedding or a text embedding). Second, the workload can be multimodal: not just OLAP SQL, but vector search, full-text search, filtering, and other dataframe and SQL workloads. And finally, the use case can be multimodal: operational scenarios, where you're running a production service for RAG, search and retrieval, or personalization; training; or your data lake, for analyzing and exploring all that multimodal data you have.

Yeah — so from my team's experience, and a lot of what Chang is describing, we just think speed is probably our best strategic bet. A lot of the tools we worked with really slowed down under load and under new multimodal needs, and we're looking to develop what the future of those data systems looks like. So thanks so much for listening to our talk. [Applause] QR codes: both Character.AI and LanceDB are hiring. On your left is the QR code for Character's job board, and our LanceDB Discord is on the right. Check us out — and thank you for coming.

Up next, we are really excited to welcome Stefania Druga, who is going to be talking about the multimodal future of education. She's been doing research in this domain since 2015, so we're super excited to hear from her.

Thank you so much. Is this on? Can you hear me? Hi everyone, I'm Steph, and I'm going to talk about the future of education with multimodal AI. We're here at an AI engineering summit, and AI engineering starts very, very early — so I'm curious: how many of you have kids? How many people in the room have kids? Okay, wonderful. And how many of your kids have played with generative AI so far? Okay. So you won't be surprised by the next slide: basically 70% of generative AI users are from Generation Z — this is a recent study from Salesforce. So it starts very early, and the reason I care about the future of education with generative AI is that education needs a wake-up call. We know that early literacy rates ... she can run around the room and the robot is constantly going to try to find her. Now, this is the first step. It uses, as I mentioned, a block-based programming language extending Scratch, and at the time it allowed kids not only to program their smart lights and voice assistants but also to train their own custom models — models trained on examples of images or text that they could then use in their own games and applications. For example, here a student trained a model to distinguish between unicorns and narwhals; when he plays the game, he not only gets a prediction but also the confidence level — how confident is his custom model that the drawing is a unicorn — and we can see the confidence is pretty low.
They made all sorts of things: looking at what's in their food, programming games like rock-paper-scissors, getting the robot to talk like Shakespeare. This was used all over the world, translated into more than 30 languages. And the good news is that we evaluated it, to see how it increases critical understanding of AI and how it helps with AI literacy. To do that, I ran a longitudinal study in public and private schools: we asked kids what they think about AI, then let them engage in AI learning activities, and then asked the same questions at the end. What we found, after they learned how to do text training, image training, and smart-home programming, is that they became much more skeptical of the AI's smarts. In the beginning they would say things like "yes, Google Home is smarter than me" or "this model is much better than me"; after they learned how it works and how to train it, they were not so sure it was smarter than they are. I'll show you a quick video of how that went. [Video] "For a lot of the more tech-savvy parents, it's: go for it — technology is going to be a huge part of their lives, much more so than of my life. If this AI technology is scary for some people, I totally get it, but as a parent and as a teacher I thought it was really important, because these are skills that 21st-century kids need to have. When my dad was young, he bought a car and took it apart to see how it worked. So you teach people that young how these things — which grown-ups mostly program — actually work."

So, as I was saying, AI engineers are years in the making, and this is the significant difference in kids' perception of the smarts of AI before and after these learning activities. So why did that happen? Why did they become more skeptical, more critical, and also more literate in how to read and write with AI? Because by providing this platform and allowing them to tinker, form hypotheses, and test them, we let them engage in the scientific process — just like researchers do, just like we do. But we needed the right sandbox, the right platform, for them to be able to tinker and iterate quickly. And kids are not alone in learning this: parents need to learn too, and so do teachers. During the pandemic, when kids were stuck at home with their parents, we saw a huge opportunity for them to learn together. I'll show you one of the early demos of Cognimates. (Oh, the audio is not working on this one, I'm not sure why... there you go.) "You did it! Now I need you to help me ask a question. For that we'll need the ask block — see if you can find it." Awesome. So the thing you're programming is collaborating with you to teach you how to program it. Now imagine applying that to any of the chatbots we have today — when you're not happy with the answer, or the answer is not age-appropriate, or you want to teach the model something about your language, your culture, the weird facts you're interested in. How do we do that?

So I did another study, with kids and parents in ten different US states over multiple weeks, where we first wanted to learn how to design a copilot for programming for families: before we start and build it, what do they want, what works, and what doesn't? What we found was that one of the things kids and parents liked most was generating coding ideas with an AI friend — having a copilot in Scratch.
This was very, very helpful — here are some quotes. One participant said most people would like coding with an AI friend because one of the hardest parts of a project is when you start: you run into a wall because you're out of ideas — and the AI friend helped with that. It also allowed them to express and elaborate their ideas in code: if they had an idea for a game — "I want to make the bear jump over the hedgehog" — but didn't know how to do it, it would help them find the right code constructs. More importantly, it supported their creative coding identity: it wasn't the bot doing all the coding; they were doing it, and the bot just helped when they were stuck. It also encouraged kids and parents to work together, which is not always easy — I've been working with kids and families since 2015, and it's not always easy — so having a third moderator saying "what does Mommy say? what does Daddy think? take turns, try this" really helped with family joint engagement. It doesn't always work, though: sometimes it's too distracting, so it was very important to enable families to shut it off — maybe they want to do the game or the coding alone — and stop it whenever they wanted. If you have multiple siblings fighting over the laptop, it cannot help with that. And if the concepts were too complex, it couldn't always scaffold and break them down, so parents were very helpful there.

After understanding the core things families want from co-creating and learning to program with an AI friend, we went and evaluated the generative AI models to see whether they could deliver. For Scratch, the top generative AI models are pretty good at generating explanations and offering ideas or questions that help kids and parents explore and test new games. This was published, and we created a benchmark for measuring it as well. And this is just an example of what the future of education with multimodal AI could look like — applied to Minecraft, to games, to physics and science simulations. It can become a creative sidekick: there are a lot of people who love to build things with their hands — what if I could take pictures of flowers and colors I like, get ideas, and have it help me generate 3D models I could then print and paint? Or, if I'm into knitting, use a generative AI model to inspire my knitting projects. It can also be a learning companion and a coach. It can help with math: together with Nancy Otero, we created the first benchmark for math misconceptions, covering the most common math problems kids have in K-12 and evaluating how good state-of-the-art generative AI models are at identifying those misconceptions when kids talk with a chatbot — I put up a link if you want to download it. So I am here to invite you to think about AI engineering and AI tinkering for all ages, and about how we go from my experiments with Cognimates to the things people are building and tinkering with on Hugging Face — making sure we open up the space, so that we use AI not just to teach, but so that people use AI to learn how to tinker, learn by playing, and learn by doing.
And I like to do what I preach, so I'm going to show you what I tinkered with last night — these are very fresh demos, built on the latest Gemini API. I have three demos; let's hope they work. Let's start with the science one. I was hoping to draw in real time, but I don't have a table, so luckily I have some drawings, and we'll see how well this works. So I have a drawing — oops — of a scale with a weight on each side. "What would happen if you had another 5 kilograms?" So it's asking me questions based on my drawings, and then I can make a new drawing with 10 kg and 10 kg and see if that gets better. Let's try another one: water and CO2 — what happens if they get mixed? And imagine I had a webcam over a table and were drawing in real time; we could play with it — it's very interactive. Let's try this one: the Earth is being hit by something — hopefully not. Let's add one more arrow and see what happens. Ah, that was fun — it finally understood it was the moon. Let's play with the math one: solve the expression inside the parentheses. I have one where I did that — okay, solve the multiplication with the parentheses — and let's assume I've done that too, so I have the next question. (I need a better background for this demo, that's for sure.) "The first step is to simplify" — no, no, go back — okay: "any number divided by itself equals one." You see, it doesn't give me the answer; it gives me a question, so I can keep trying and keep learning. Let's try a more complicated one: locate four on the axis — I was hoping it would give me a better question. The last one is the one that encourages curiosity. This one: "What is the lady doing?" Okay. "What are the colors on the flag? What shape is the star?" Oh, it asked me things about Jordan. Let's see what it does with an apple: "Do you know what apple this is?" "What does the apple smell like?" I had a nice origami piece as well... I don't know where the origami Dory went, but you sort of get the gist. Do you want me to draw something, or do you want to ask a question of the science or math or objects demo? Any requests from the audience? Don't be shy. What should we ask? A system of equations — yeah, can you tell me what to write? 2x + 7 = 2. Thank you, let's try it; let's see if it does well with my handwriting. "Subtract seven from both sides of the equation" — not bad. And now if I do that: "divide both sides by two," and so on and so forth.

Now, the cool thing is that I made the code open source, as a template, so you can play with it too. It's less than 100 lines; you just need to create an API key, which is free, and you can create your own instructions. So hopefully I've inspired you to think beyond chatbot interfaces and delegated questions — to think more like a tinkerer, and to think about how we can put these tools in the hands of young people, because they are the future and they need to learn about this technology and how it works as well. I think that's my time. All my research is on my website, and I put up a QR link for that as well. I look forward to your questions afterwards. Thank you so much. [Applause]
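In that spirit, here is a minimal sketch of what a Socratic drawing tutor could look like with the google-generativeai Python SDK — the model name, system instruction, and image path are assumptions for illustration, not Steph's actual code:

```python
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # free key from Google AI Studio

# The system instruction is what turns answers into guiding questions.
model = genai.GenerativeModel(
    "gemini-1.5-flash",
    system_instruction=(
        "You are a tutor for kids. Look at the drawing and reply with one "
        "short, encouraging question that nudges the student toward the "
        "next step. Never state the final answer."
    ),
)

drawing = Image.open("whiteboard_snapshot.png")  # e.g. a webcam frame
response = model.generate_content([drawing, "What should I try next?"])
print(response.text)
```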
And next we're excited to have Kwindla Hultman Kramer, talking about building voice AI agents.

Thank you. So, my name's Kwindla; I work at a company called Daily. We make real-time audio and video infrastructure, and when GPT-4 was released last year, we started building lots and lots of experiments. We thought GPT-4 was so good at conversational exchange in text that it really should be able to talk to you — literally talk to you, with a voice. So we built a bunch of things like this. This is probably a year ago, which in gen AI terms is ancient, ancient history, but it's probably worth looking at as context. [Demo] "I'm your friendly chatbot, here to assist you with any questions or tasks you have in mind. How can I help you today?" "Uh, can you tell me a joke?" "Absolutely! Why don't skeletons fight each other—" "You always start with that one. Give me a different joke." "All right, I'll give it another go. What do you call fake spaghetti? An—" "Heard that one too. Come on." We've used "tell me a joke" from very early on all the way through today, and I think we now know all the jokes that all the LLMs we use regularly tell — which is funny, but not necessarily funny in the same way they were the first time we heard them.

So this is a really high-level schematic of what we're trying to do: we've got a user on a phone or a laptop who wants to talk to their device, and somewhere in the cloud we've got a bunch of GPUs doing a whole lot of heavy-duty compute, and we need to talk to that cloud computing resource somehow. As soon as you build stuff like the video we just saw, a couple of things come very much top of mind. One is that speed really matters; the other is that architectural flexibility is really, really important. I'll talk about both of those things today.

Let's start with architectural flexibility. That really nice, clean diagram gets really messy really fast — not unusual in an engineering or software-development problem domain — but I made a slide based on looking at a bunch of source code and thinking about all the conversations I've had with colleagues, customers, and friends who are building this stuff. It turns out you have to be aware, at some level, of a bunch of these things if you want to build real-time, robust voice AI deployed at production scale. It's a little bit of an intimidating map — we're definitely putting the "multi" in multimodal AI here — running from audio processing (echo cancellation, CPU management while encoding and decoding audio and video), through networking issues like firewall traversal, all the way to building retrieval-augmented generation and tool calling so that your real-world applications are really, really useful.

We can collect that messy map into a few higher-level categories; they're worth going through quickly because they give you a sense of the territory. You need really robust, low-latency media processing and transport: you've got to encode the media and send it over the network, and that has to work really well and really fast. You need really good, fast transcription — at least until the future of truly multimodal, audio-native models arrives, which will happen at some point; and even after that, you'll probably still need to go from audio to text for lots of kinds of AI use cases. You have to do a lot of real-time data pipeline and buffer management: in Discord I've probably answered the question "why is my audio stream not working when I did local development on my Mac and then pushed it to an Intel box in the cloud?" twenty or thirty times — endianness issues always get you when you're writing low-level audio code — and there's a lot of pipeline and buffer management that comes into play here.
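As a quick aside on that buffer-management bucket, here is the endianness trap in miniature (generic PCM decoding, not Daily's code): the same two bytes of a 16-bit audio sample decode to different values depending on assumed byte order, so a pipeline that trusts the host default can break on redeployment:

```python
import struct

raw = b"\x01\x02"                   # two bytes of one 16-bit PCM sample
print(struct.unpack("<h", raw)[0])  # little-endian: 513
print(struct.unpack(">h", raw)[0])  # big-endian:    258

# Safe habit: decode whole buffers with an explicit byte order
# (little-endian is the WAV/PCM convention).
buf = b"\x01\x02\x03\x04"
samples = struct.unpack("<" + "h" * (len(buf) // 2), buf)
print(samples)                      # (513, 1027)
```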
There's a lot of pipeline and buffer management that comes into play here. You want to be able to swap between models for a whole bunch of reasons, or use multiple models together. You generally need to call out to external systems. You have to do things like phrase endpointing, which is the fancy academic term for figuring out when the person is done talking and expects the AI to talk. You need to handle interruptions gracefully, and that is a whole rabbit hole of its own: people will interrupt the bot, and you need to figure out how you're going to handle that and how you're going to maintain state when they do. You have to do echo cancellation, unless you can convince everybody in the world who uses your thing to wear headphones 100% of the time. I've been doing audio and video development on the internet for a really long time (our core product at Daily is a WebRTC tech stack), so I can tell you that you will not convince everybody in the world to wear headphones all the time. Maybe if you're only deploying to people in a professional call-center context, but even then it's pretty tough. And you need good, fast text-to-speech, or voice generation, on the other end. It's a lot, but when it all comes together it really is amazing to build with these new genAI tools. You can build stuff that feels magical and new to people but is genuinely useful, stuff that just feels like how things should be. Here's an example of something we built for a healthcare customer that solved a real pain point. If you've been to the doctor, which I'm sure all of us have, you've had to fill out way more forms than you thought you should at subsequent visits. Maybe you filled them out on paper; maybe you got a link to a form to fill out before the visit, which is the current state of the art. What if an LLM could just call you on your phone, or you could click a link in an app, and you could have a conversation and never fill out a form again? "Hello Chad, my name is Jessica and I am with Tri-County Health Services. To ensure your privacy and to confirm your identity, could you please provide me with your birthday, including the year?" "Uh, yeah, it's January 1st." "I appreciate that, Chad. Can you also tell me..." "Oh, sorry, yeah, 1983." "Thank you for providing your birthday, Chad. Now could you please list your current prescriptions, including the medication name and the dosage for each?" "Uh, yeah, I take Amoxicillin and also Gabapentin." "Great, could you please provide the dose?" "Oh yeah, sorry, 10 milligrams and 100 milligrams respectively." "Thank you for that information, Chad. Additionally, do you have any allergies that we should be aware of? If so, please list them." "Uh, yeah, I'm allergic to latex and I'm also allergic to penicillin." "Thank you for letting us know about your allergies, Chad." So I think you can see how all those pieces we just talked about are actually necessary to pull together something like that, but when you do, it feels simple; it feels like it's supposed to. That's a real video, nothing in there is edited. The call-outs to the external system are mocked (it's not a real electronic health record system on the back end), but the tool calling is real. And that stuff is now starting to move into production in the real world.
In a lot of cases the genAI frontier is moving very quickly, and people are really pulling it out of our hands, which is super exciting. So let's switch gears and talk about fast, because that's the other really big piece here: how fast do we need to be, and how fast is fast? A quick side note: there are always two kinds of fast when you're talking about engineering things, throughput and latency. These days, for conversational interactions, throughput is pretty okay for all the tools we use; LLMs and other tools can generate content as fast as people can read it or listen to it. What's hard is latency, and latency here is the time to first byte, the time to first token. In lots of engineering contexts there are trade-offs and complicated relationships between throughput and latency. One of the graphs I sometimes show in these talks is that throughput tends to improve by an order of magnitude every couple of years in lots of domains, while latency improvements tend to be linear and way behind throughput improvements. So latency is hard, and latency is mostly what bites us here. Consider human conversational latency: if I'm talking to another person, it feels weird to me if that person doesn't respond in about half a second. Sometimes people respond a lot faster. As humans, we seem to be natively doing something like speculative decoding, next-token prediction; that's what we do. I know what you're going to say four or five words before you finish saying it, I'm queuing up my response, I'm doing my inference in this greedy fashion. If you say something I didn't expect, I can reroute, but most of the time I'm right. If you actually record people in conversation, they'll commonly respond in 200 or 300 milliseconds, and if they don't, they'll give you some kind of cue. So the 500 millisecond target is pretty important, because we hit the uncanny valley pretty quickly when we're above it. In fact, in that video of my colleague Chad you just saw, if you watched it with a critical eye, what I hope you saw was pretty cool orchestration of state-of-the-art genAI stuff, and probably slower response times than really should be there. So I've spent the last couple of months thinking a lot about how to improve these response times. Just as a benchmark of how hard this is: Gemini Pro's time to first token is something like 900 milliseconds, so if you're aiming for 500 milliseconds total, you're already almost double before you do anything else, before you even send anything over the network to other services. So which models and tools you choose matters a lot, and it matters a lot how you string them together. To pop up a level again: this is what we're trying to achieve, and the most powerful tool we have today for making everything run fast in this domain is putting as much as possible together into one compute container. If the really big things we're trying to do are natural-language speech-to-text, then phrase endpointing (when should the bot process or talk), then LLM inference, then voice output, and we can run all of those locally and colocated, we're way ahead of where we'd be if we can't. And this is worth emphasizing, because for 95, 98, 99% of the stuff we're all building today with genAI, we're calling out to hosted services.
There are a lot of really good reasons for that, but it's tough in this domain if latency is what you're prioritizing. Latency might not be what you're prioritizing, and that's okay; there are lots of different trade-offs you can make. But if you're trying to make things really, really fast, you need to figure out how to host things yourself, in a way where you can tune, control, and combine everything. This is the part of the talk where I look at the clock, look out at all of you, and try to figure out how much tolerance you have for me talking about latency, because I will, maybe ironically, talk about latency for hours; it's what I'm obsessed with as an engineer. I do think it's worth quickly going over this list of best-case latency numbers for a typical voice AI context, because some of them are non-obvious. First, what are we actually measuring? We measure from the time I stop talking. Picture a green waveform on one side and a purple waveform on the other side of an audio editor: there's the time I stop talking, then some kind of gap (usually silence; we could play hold music if it's too long), and then another waveform on the other side when I first start to hear the LLM talking to me. That's the gap we care about, the voice-to-voice latency, and it has to include everything: audio encoding, sending stuff over the network, all the processing, sending stuff back, and playing it out of the speakers. The very first number here is actually kind of shockingly high: if you're using the laptop mic on macOS, it's something like 30 to 40 milliseconds of delay. There are some good reasons for that; a lot of audio processing (again, echo cancellation, background noise removal, gain control) plays a role here. As these use cases get more important, maybe hardware and operating system manufacturers will push this down, but today you're about 40 milliseconds into your 500 millisecond target before the bytes ever get into any of what we think of as software we control. Then you're encoding the audio, sending it over the network, doing some packet handling, reassembling those packets on the other side in what we call a jitter buffer, and decoding, and then you're finally at the transcription stage. If you're really careful about transcription and endpointing, you can probably get that down to a couple hundred milliseconds. Then you're doing LLM inference, and in this context you're really hoping to get your time to first token from the LLM in maybe another 100 milliseconds. Then you're doing text-to-speech, and then you're going back over the network and playing audio out on the other side. These are real measured numbers, roughly medians; there's obviously some hand-waving, because it's hard to measure everything perfectly and there's huge variation, but you're at something like 600 to 700 milliseconds, optimistically but realistically, in this world. That gives you a sense of where we are. Yesterday we posted a demo on Twitter of really trying to drive all this latency down, and we have an open-source container that you can look at, play with, and hack on that gets you to about 500 to 700 milliseconds of voice-to-voice latency.
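Summing the stage-by-stage numbers Kwindla quotes gives a useful back-of-envelope budget. Here is that arithmetic as a quick sketch; the values are the rough medians from the talk, and the aggregated network and playout entries are assumptions added to round out the path:

```python
# A rough voice-to-voice latency budget. These are illustrative numbers,
# not measured constants; the talk's estimate lands at 600-700 ms.
BUDGET_MS = {
    "mac_mic_and_os_audio_processing": 40,
    "encode_network_jitter_decode": 60,    # assumption: rough aggregate
    "transcription_and_endpointing": 200,
    "llm_time_to_first_token": 100,
    "text_to_speech_first_audio": 100,     # assumption
    "return_network_and_playout": 100,     # assumption
}
total = sum(BUDGET_MS.values())
print(f"voice-to-voice: ~{total} ms")      # ~600 ms
```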
"Hello, I'm fbot. How can I help you today?" "Let's play a geography game. Do you want to give me some geography questions to answer?" "That's the spirit. I'll give you a question: what is the world's largest desert?" "Oh, I think it's the Sahara." "That's correct. The Sahara Desert is indeed the largest hot desert in the world, covering most of North Africa. Well done. Next question: which river is the longest in South America?" "It's got to be the, uh, not the Nile, sorry, the Amazon." "No need to apologize, you're absolutely right again. The Amazon River is indeed the longest river in South America and one of the longest rivers in the world. Good job. Next question: what is the capital city of Australia?" "Canberra." "Correct again, Canberra is the capital city of Australia. You're on a roll. Next question: which mountain range runs along the border between France and Spain?" "The Pyrenees." "Exactly right, the Pyrenees mountain range does indeed run along the border between France and Spain. You're really showing off your geography skills here. Next question: what is the largest city in Scandinavia?" "Visit... Stockholm." "I think there might be a slight mistake: visit.com is not a city but rather a travel website." "I was trying to say Stockholm." "No worries, I knew what you meant, and you're absolutely right again. Stockholm is indeed the largest city in Scandinavia and the capital of Sweden. You're doing great. Next question: which city is the largest in Poland?" "Kraków." "Close, but not quite. Kraków is a beautiful city in Poland, but it's not the largest. The correct answer is Warsaw, which is the capital and largest city of Poland." "Oh, I feel like I should have known that. If we're going to talk about capitals, maybe I'll ask you some questions. What's the capital of France?" "I'm ready for a challenge. The capital of France is Paris." "Germany." "Not quite, Germany is a country, not..." "What's the capital of Germany?" "Good one. The capital of Germany is indeed Berlin." "And what's the capital of Stockholm? I mean, sorry, what's the capital of Sweden?" "No worries, the capital of Sweden is Stockholm." So the idea there was not to do prompt engineering; clearly I could have done a better job on the prompt. It was to really test latency and show all the latency numbers in one place, because it's useful to visualize this stuff, and that UI is open source too. The big number in the middle is the voice-to-voice latency measured on the client side, and the numbers around the edge are per-stage metrics sent back from the server doing all the work, so you can see the variation, correlate things, and build good intuitions. The architecture here is two models by a company called Deepgram for the transcription and the voice generation, which are really good compromises between quality and speed; Deepgram has a hosted service, but they also let you run those models on premises in little Docker containers. And the LLM is Llama 3 8B, I think, because I couldn't quite get the 70B to run as fast as I wanted, although in theory that's possible. I'll post some links to this if you want to look at it more.
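The demo's actual source isn't shown in the talk, so here is only an illustrative, blocking sketch of the colocated speech-to-text, LLM, text-to-speech turn it describes. The three stage functions are stand-ins, not Deepgram's or Llama's real APIs:

```python
# Illustrative only: one server-side conversational turn, with stand-in
# stage functions in place of the real colocated services.
import time

def transcribe(audio_chunk: bytes) -> str:
    # stand-in for a local Deepgram STT container (~200 ms with endpointing)
    return "what is the largest desert"

def llm_reply(prompt: str) -> str:
    # stand-in for a colocated Llama 3 8B server (~100 ms to first token)
    return "The Sahara is the largest hot desert."

def synthesize(text: str) -> bytes:
    # stand-in for a local Deepgram TTS container
    return b"\x00" * 16000

def handle_turn(audio_chunk: bytes) -> bytes:
    t0 = time.monotonic()
    text = transcribe(audio_chunk)
    reply = llm_reply(text)
    audio = synthesize(reply)
    print(f"server-side turn took {(time.monotonic() - t0) * 1000:.0f} ms")
    return audio

handle_turn(b"...")  # in the real system this runs per utterance, per WebRTC track
```

In the real system each stage streams into the next rather than blocking, which is how the demo keeps time to first audio low even though the total work is the same.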
Because we solved so many problems over and over, we thought it would be great to have an open-source framework for this stuff. We've seen this in other parts of the AI landscape: things like LangChain and LlamaIndex are really valuable, and this is sort of that for real-time and multimodal AI. This slide probably looks familiar, because I stole the list of hard problems from it and made the slide I moved higher up in the talk today. This is an open-source framework called Pipecat. It's gotten a bunch of traction recently, and it's vendor-neutral, even though it came out of work we did at Daily early on. We're really excited about it, and it's super fun to be getting lots of community contributions now. If you're trying to build really fast multimodal AI stuff, I think it's at least worth taking a look at. You can build things like conversational bots, speech-to-speech language translation apps, voice-controlled agents of various kinds that control your software user interfaces, and real-time vision-model stuff like the awesome last presentation, which is also baked into Pipecat services now. Here's all the stuff that's supported in Pipecat today; we're adding things all the time, and you can add things too. So if you're interested in building, please hang out with us in the Pipecat Discord. If you want to contribute a service plugin, please do, and if you want to be a maintainer for an open-source project, which is a lot of fun, ping me; maintainers are gold in the open-source world, and we're always trying to recruit great ones. Just one last slide about the context here: this is the Pipecat GitHub star chart, and the day it went vertical was the GPT-4o announcement. We are going to get great multimodal models, and they'll be incredibly useful; they'll make building super fast stuff easier and easier. But (a) they're not here yet, and (b) we're still going to need orchestration layers for all this stuff. Also, the demo I showed a minute ago, which I posted yesterday, is now at 175,000 views on Twitter, so there's more and more interest in voice AI, and we'd love to have people come build with us. [Applause] Awesome, thanks so much, everyone, for joining us for the multimodal track, and that is the end of the sessions today. [Music]
Ladies and gentlemen, please return to the ballroom to take your seats for our afternoon program. We will start our program in 10 minutes. Thank you. [Music] Ladies and gentlemen, our program begins in five minutes. [Music] Ladies and gentlemen, please take your seats, our program is about to begin. Thank you. Ladies and gentlemen, please welcome to the stage the head of developer experience at OpenAI, Romain Huet. Wow, good afternoon everyone, super excited to see you all here today; such incredible energy here. I'm Romain, I lead developer experience at OpenAI, and before joining OpenAI I was also a founder, and like many of you in this room I experienced firsthand the magic of building with the frontier models. Now I'm working on making sure we offer the most delightful experience for all of you builders in this room, and what I love most about this role is showing the art of the possible with our AI models and technologies. So today we're going to go through a few of the great capabilities the OpenAI team has built recently, and we'll show some live demos to really bring them to life. But first I'd like to start with a quick zoom out on how we've gotten to where we are today. OpenAI is a research company, and we're working on building AGI in a way that benefits all of humanity. To achieve this mission we believe in iterative deployment: we really want the technology to enter into contact with reality as early and as often as possible, and for that, a top focus for us at OpenAI is all of you, the best builders in the world. We believe strongly that the best developers and startups are integral to this AGI mission; you are the ones who are going to build the AI-native products of the future. Today we have three million developers around the world building on the OpenAI platform, and we are very fortunate to have so much innovation. But I'd like to rewind a little bit. Today, outside of this room, when people think of OpenAI they often think of ChatGPT first, because that's the product that has taken the world by storm, but the first product was actually not ChatGPT. The first product we put out there was the developer platform. Back in 2020 we had GPT-3, and that's when we first started launching it to the public behind an API. Maybe a quick show of hands: who in this room played with the API back in the GPT-3 days? Wow, that's more than half of you; you're really my crowd here, that's awesome. At the time we got a taste for what AI would be capable of doing: basic coding assistance, copy editing, maybe some very simple translation.
To really put things in perspective, at the time this was one of the most popular use cases on the platform: AI Dungeon. This was a role-playing game purely based on text; it generated open-ended stories, you could navigate the world, and at each scene, when you tried to look around, it would generate new text. That was the state of the art at the time. Then, in 2023, GPT-4 changed the game. It completely changed the way we thought about AI: it got better at reasoning, more creative, more specific; it could start being better at coding and at reasoning about complex problems; and it could use tools and interpret data, which dramatically expanded the aperture of what was possible with the platform. We've had the great fortune of working with many developers and companies integrating GPT-4 into their own apps and services, and this is just one example among many: Spotify took our models to generate playlists on the fly based on your music taste and history. But the one thing I want to highlight in this talk is that GPT-4 was also the beginning of our multimodality journey. It was the very first time we introduced vision capabilities: suddenly GPT-4 could analyze and interpret data such as images and photos, not just text. Then, with GPT-4 Turbo, for the first time we brought vision capabilities into the same model, so that the exact same model handles both modalities at once. And of course, last month we took a step further with multimodality, and that brings us to GPT-4o, our new flagship model that can reason across audio, vision, and text, all in real time. Thanks to its native image and audio capabilities, we really think you can now build natural human-computer interactions in a way we've never seen before. The reason we call it GPT-4o is that the O is for Omni, an omni model if you will, because we brought all of these modalities into one single model for you to interact with. There are many things that are unique and interesting about GPT-4o, but I'd like to call out a couple of step-function changes. The first is that it's a big step toward this multimodal future, with ultra-fast latency. I'm not sure how many of you have played with ChatGPT's voice mode before, or tried to build your own voice assistant, but you used to have to stitch three models together: Whisper for transcription, then GPT-4 for reasoning, and then text-to-speech at the end. Of course, that added a loss of context and some latency along the way; now all of that is in a single model. And for all of you developers, what's really exciting is that for the same state-of-the-art reasoning capabilities we offer with GPT-4, we also shipped massive efficiency improvements, and we've been able to turn those efficiency gains back to you. As a result, GPT-4o is twice as fast as GPT-4 Turbo, it's half the price, and we drastically increased the rate limits. I have 5x higher rate limits on this slide, but we're always pushing them higher, and we really want to live in a world where there aren't any rate limits at all, so you can build with no limits.
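From the API side, the one-model-for-all-modalities point shows up in how a single request is shaped. Here is a minimal sketch of a mixed text-and-image call using the OpenAI Python SDK; the image URL and prompt are placeholders, not from the talk:

```python
# A minimal sketch of a multimodal GPT-4o request: one model, one call,
# text and image together in the same message.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What landmark is in this photo?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/bridge.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```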
But with all that said, I'd like to give you a glimpse into GPT-4o, so we're going to switch over to my Mac. For this first live demo I'm going to bring up the ChatGPT desktop app; all I have to do is hit Option-Space, and I can bring up voice mode right here. "Hey ChatGPT, how are you?" "Hey, I'm doing great, thanks for asking. How about you?" "I'm great. I'm actually on stage at the AI Engineer World's Fair in San Francisco. Could you greet the audience? We have some amazing developers here in the room." "Hey everyone, it's awesome to be virtually here at the AI Engineer World's Fair in San Francisco." "So, ChatGPT, I kind of like how you started, but I really want you to bring the energy; people are very excited here, so make this more live. And don't forget, people are also tuning in from the live stream, so don't forget about them." "Hello San Francisco, and to everyone joining us on the live stream: welcome to the AI Engineer World's Fair! We've got an incredible lineup and a ton of energy here today. Let's dive into all the exciting tech and innovation!" "That's much better, thank you ChatGPT, I think they really enjoyed that." "If you need anything else, just let me know. Hope the event goes fantastically." "Yeah, maybe one thing you could do, since we have a crowd of the best builders out there: can you whisper one secret that's really precious for building the future of their AI-native products? What should they know?" "All right, just between us: remember, the key to building great AI-native products is focusing on responsible and ethical transparency." "Glad you liked it. Can you whisper another one, but even lower, very, very slow?" "Sure, here's one more: it's crucial to keep your AI adaptable and scalable. Technology evolves fast; your products should too." Cool, thank you. A couple of things to call out here. First of all, you can see the latency is pretty impressive; it's very natural in conversation, there's literally no difference from talking to humans, and it's almost too fast at times, I would say. The second thing that's really cool is that it understands emotion and can generate tones: it can whisper, it can bring the energy, it can be more alive. And I can jump back into the conversation at any time; I can interrupt the model, and I don't have to wait for the audio output to finish, which makes things much more natural versus what we knew before. The next thing I'd like to show you, since we're talking about modalities, is the second modality: let me turn on the camera here. "Hey ChatGPT, can you see me now?" "Yes, I can see you." "You know, you don't have to whisper anymore, you can talk normally now." "Got it. How's the event going?" "It's good. Do I look good, actually? Because I'm in front of a lot of people here." "You look great, ready for the spotlight." "Awesome. Okay, give me a second, I'm going to try to draw something for you, and I'll show it to you." "Sounds good, I'm curious to see what you come up with." "Just give me a few more seconds." "You know, take your time. I'm here whenever you're ready. Patience, no rush." "Okay, so based on what I drew here, can you take a guess at where I am? And also, can you translate what I wrote?" "Yes, that's the Golden Gate Bridge, so you must be in San Francisco, and you wrote 'bonjour développeurs,' which means 'hello developers' in French." Pretty cool. That was a pretty poor bridge drawing, but ChatGPT is pretty clever. Next, I have this book. "Can you take a look, ChatGPT? What is this?" "Yeah, that's Poor Charlie's Almanack by Charlie Munger."
"It's a collection of the speeches, talks, and life philosophy of Munger, who is known for his wisdom on investing and decision-making." "Yeah, can you just give me a random number between 1 and 400?" "Sure, how about 126?" "126, here we go. Okay, page 126: can you give me a brief overview of what's happening on this page?" "It looks like this page talks about Coco Chanel's rule for success. It mentions that Chanel focused on identifying the qualities her customers wanted and made sure to offer them; essentially, understanding your audience's desires is key to success. Would you like a bit more detail from the page?" "No, I think I'm good. This is pretty amazing; you read faster than I could read just one line." So, pretty amazing. All right, next I'd like to show you something else. Instead of sharing the camera, I'll share my screen. I'm going to bring up ChatGPT one more time with audio capabilities and reduce this window a little bit. "Okay ChatGPT, I'm going to share my screen with you. Can you see what I see?" "Yes, I can see your screen. What are we looking at here?" "Well, I'm currently working on this app, a travel app, and I'd like to make it a little better, but there's one thing I'm not quite sure how to do, and I'd like your help. If I go to this tab here and try to put the final touches on it, it feels like the responsiveness is not really good. What do you think?" "I see what you mean. When you switch to the mobile view, the layout looks a bit cramped and the images seem squished. Have you considered using media queries to adjust the layout for smaller screens?" "So let's take a look at the code; maybe you could help me. I'm going to send you this whole component first, I'll copy the code over to you. Can you give me a one-line overview of what's going on in that one?" "This code defines a React hook, useAssistant, that manages interactions with an AI assistant. It handles sending and receiving messages, file uploads, and streaming responses from a server." "Okay, cool. I don't think that's where we need to change things; I think it might be over here in the Discover component. What do you think?" "Yeah, it does look like this is where the layout is being set for those descriptions." "I'm using Tailwind CSS, so what exactly should I do to change it and make it responsive here?" "You could adjust the grid-cols attributes for mobile sizes." "What's the prefix I should use for mobile screens?" "You can use the sm prefix for small screens." "Okay, I'll try that. And maybe for medium screens, two columns, and then we keep the three for large screens, something like that?" "Yeah, that looks good." "Take a look now, what do you think?" "It looks much better; the images and layout adjust nicely. Great job solving that. Anything else you want to tweak?" So I really like using ChatGPT on desktop as a coding assistant, and of course this was a very trivial use case, but what's even more interesting is when you start reasoning out loud with ChatGPT to build something and then say, hey, actually I'm going to get Cursor to do it, so what should I prompt Cursor with? I've done that many times, and it's pretty amazing to see how the two of them can interact across modalities. But let's go back to my presentation. I'd like to give you a sneak peek of what's on our mind, what we're working on next at OpenAI, as we think about these modalities and the future of our models.
There are four things that are currently top of mind for us, especially for all of you developers building on the platform. The first is textual intelligence. As you can tell, we're extremely excited about modalities, but we also think that increasing textual intelligence is still key to unlocking the transformational value of AI, and we expect the potential of LLM intelligence to still be very large in the future. Those models today are pretty good; we're building things with them. But at the same time, what's really striking to realize is that today's models are the dumbest they'll ever be; there will always be better models coming. It's almost like we have first-graders working alongside us: they still make mistakes every now and then, but a year from now they might be completely different and unrecognizable from what we have today. They could become master's students in the blink of an eye, in multiple disciplines like medical research or scientific reasoning, and we expect the next frontier model to bring a step-function change in reasoning improvements. The second area of focus we're excited about is faster and cheaper models. We know that not every use case requires the highest intelligence. GPT-4 pricing has decreased significantly, 80% in fact, over a year, but we also want to introduce more models over time: we want these models to be cheaper for you all to build on, and we want models of different sizes. We don't have timelines to share today, but that's something we're very excited about as well. And we want to help you run async workloads: we launched the Batch API a couple of months ago and we're seeing tremendous success already, especially for the modalities. Say you have documents to analyze with vision, or photos or images: all of that can be batched for another 50% discount on pricing. Third, we believe in model customization. We really believe that every company and every organization will have a customized model, and we have a wide range of offerings here. I'm sure many of you have tried our fine-tuning API, which is available for anyone to build with, but we also assist companies all the way through; Harvey, for instance, a startup building a product for law firms, was able to customize GPT-4 on case law, and they've seen amazing results in doing so. And last, we'll continue to invest in enabling agents. We're extremely excited about the future of agents; we shared a little of that vision back in November at DevDay. Agents will be able to perceive and interact with the world using all of these modalities, just like human beings, and once again that's where the multimodality story comes into play. Imagine an agent being able to coordinate with multiple AI systems, but also securely access your data, and even, yes, manage your calendar and things like that. Devin, of course, is an amazing example of what agents can become: Cognition Labs has built this awesome software engineer that can code alongside you; it's able to break down complex tasks, browse documentation online, submit pull requests, and so on and so forth. It's really a glimpse into what we can expect from the future of agents.
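As a side note on the Batch API mentioned a moment ago, the shape of an async workload submission looks roughly like this; a sketch using the OpenAI Python SDK, where the file name and its contents are illustrative:

```python
# Hypothetical example of submitting a batch of requests for the 50% batch
# discount. Each line of batch_requests.jsonl is one /v1/chat/completions
# request in the Batch API's JSONL format.
from openai import OpenAI

client = OpenAI()

batch_file = client.files.create(
    file=open("batch_requests.jsonl", "rb"),
    purpose="batch",
)
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",   # results arrive asynchronously
)
print(batch.id, batch.status)  # poll later with client.batches.retrieve(batch.id)
```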
And with all of that, it's no surprise that Paul Graham observed a few months ago that 22-year-old programmers are often as good as, if not better than, 28-year-old programmers, and that's because they have these amazing AI tools at their fingertips. So with that, I'd like to switch to another demo, this time to show you not ChatGPT but what we can build with these modalities. In the title of this talk I did not mention video, but I'm sure most of you have seen Sora, the preview of our diffusion model that can generate videos from a very simple prompt, and this is one of them. In the interest of time, I've already sent this prompt to Sora, describing a documentary with a tree frog, very precise about what I'm expecting, and if I click here, this is what came out of Sora. It's pretty cool. But next, what I'd like to do is bring this video to life. What I'm doing here is very simple: I sliced frames out of the Sora video, and I'm going to send these six frames over to GPT-4o with vision, with this prompt if you're curious, telling it to narrate what it sees as if it were a documentary narrator. Going back here, I'm going to click analyze and narrate. Again, this is all happening in real time, so every single time the story is unique, and I'm discovering it along with all of you. And boom, that's it: that's what GPT-4o with vision was able to create based on what it saw in those frames. It's pretty magical. Last but not least, I wanted to show you one thing that we also previewed recently: our Voice Engine model. Voice Engine is the ability to create custom voices based on very short clips, and of course we take safety very seriously, so this is not a model that's broadly available just yet, but I wanted to give you a sneak peek of how it works. Voice Engine is also what we use internally with actors to bring you the voices in the API and in ChatGPT. So here I'm going to show you a quick demo. "Hey, so I'm on stage at the AI Engineer World's Fair. I just need to record a few seconds of my voice. I'm super excited to see the audience that's really captivated by these modalities and what we can now build on the OpenAI platform." All right, that sounds like it's perfect; that's all we need. So now, to bring it all together, I'm going to take this clip and the script that we just generated, send all of it to Voice Engine, and we'll see what happens. "In the heart of the dense, misty forest, a vibrant frog makes its careful way along a moss-covered branch. Its bright green body, adorned with black and yellow patterns, stands out amidst the lush foliage." I can also have it translate into multiple languages, so let's try French; for those who know me, that's actually how I sound when I speak French. Maybe one last one with Japanese. All right, thank you. Let's go back real quick to the slides. Of course, this was one very specific example of bringing modalities together, with Sora video, GPT-4o with vision, and the Voice Engine that we have not released yet, but I hope it inspires you to see how you can picture the future with these modalities combined.
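The frame-slicing-and-narration step isn't shown as code on stage, but a minimal sketch of it, assuming OpenCV for frame extraction and the OpenAI Python SDK for the vision call (file name, frame count, and prompt are illustrative), could look like this:

```python
# Grab six evenly spaced frames from a video, then ask GPT-4o to narrate
# them. This is a reconstruction of the demo's idea, not its actual code.
import base64
import cv2
from openai import OpenAI

client = OpenAI()

video = cv2.VideoCapture("sora_frog.mp4")
total = int(video.get(cv2.CAP_PROP_FRAME_COUNT))
frames = []
for i in range(6):
    video.set(cv2.CAP_PROP_POS_FRAMES, i * total // 6)  # jump to the i-th slice
    ok, frame = video.read()
    if ok:
        _, jpg = cv2.imencode(".jpg", frame)            # encode frame as JPEG
        frames.append(base64.b64encode(jpg.tobytes()).decode())

content = [{"type": "text", "text": "Narrate these frames like a nature documentary."}]
content += [
    {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{f}"}}
    for f in frames
]
reply = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": content}],
)
print(reply.choices[0].message.content)
```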
So, to wrap up, we're focused on these four things: driving textual intelligence up; making our models faster and more affordable so you all can scale; customizable models for your needs; and finally, making sure you can build for this multimodal future and for agents. If there's one thing I want to leave you with today, it's that our goal is not for you to spend more with OpenAI; our goal is for you to build more with OpenAI. Let's remember, we're still in the very early innings of this transition, and it's a fundamental shift in how we think about and build software every day. We really want to help you in that transition; we're dedicated to supporting developers and startups, and we love feedback, so if there's anything we could do better, please come find me after this talk. This is really the most exciting time to be building an AI-native company, so we want you to bet on the future of AI, and we know that bold builders like all of you are going to come up with the future and invent it before anyone else. So with that, thank you so much, and we can't wait to see what you're going to build with these new modalities as you reinvent software 2.0. [Applause] Ladies and gentlemen, please welcome to the stage the authors of "What We Learned from a Year of Building with LLMs," Bryan Bischof and Charles Frye. [Music] [Applause] Hey everyone. So you're about to experience something of a strange talk, and not just because Bryan and I are strange, but because something kind of strange happened over the last year. A bunch of us were posting things on Twitter and writing blog posts complaining about LLMs, and we formed a little group chat, where we continued to complain about LLMs to each other and share what we were working on, when we realized we were all about to write the exact same blog post: what we learned in the last year. So we got together and turned what was initially a couple of short blog posts into a long white paper on O'Reilly, combining our lessons across the strategic, operational, and tactical levels of building LLM applications. The response to that white paper was overwhelmingly positive: we heard from everybody, from people who contribute to Postgres to venture capitalists to tool builders, saying "we loved what you wrote in that article; I felt that pain too." On the strength of that, we were invited to give this keynote address, and so we faced a kind of funny challenge. Part of the appeal of the article was that the six of us all came together to write it; as Scott Condron put it, it was like an Avengers team-up. So we had to figure out a way to deliver one keynote talk from six people, and we pulled the Avengers together for one night only, to deliver some of the most important insights from that 30-page article, to add some of the spicy extra takes that ended up on the cutting room floor, and to respond to the allegations: I'd like to state unequivocally that we are not, in fact, crypto bros who just found out that GPT-4 is the new Web3. We all trained our first neural networks back when you had to write the gradients by hand. So we split the article, and the talk, into three pieces. First you're going to hear from me and Bryan on the strategic considerations for building LLM applications: how do you look to the future, how do you see around corners, how do you make big decisions? Then we'll hand the clickers and the stage over to Hamel Husain and Jason Liu, who will share the operational considerations:
how do you put together processes, how do you put together teams, how do you think about workflows around delivering LLM applications? And then they will hand the clickers and the stage over to Shreya Shankar and Eugene Yan, who will talk about the tactical considerations for building LLM applications: what are the specific techniques, tactics, and moves that have stood the test of one year's time? All right, so Bryan: how do you build an LLM application without getting outmaneuvered and wasting everybody's time and money? Ah yes, well, many of you may be thinking that there's really only one way to win in this new, exciting, dynamic, and very scary industry, and that of course is to train your own custom model: pre-training, fine-tuning, a little RLHF here and there; you'd better start from scratch, buddy. Eh, not quite. The model is actually not your moat; for almost no one in this audience is the model the moat. You all, as AI engineering devotees, should be building in your zone of genius: you should be leveraging your product expertise, or your existing product if you've got one, and you should be finding your niche, digging into it, and exploiting it. You should be building what the model providers are not. There's a high likelihood that the model providers have to build a lot of things for all of their customers; don't waste your calories on building those things. The Sam Altman phrase "steamrolling" is appropriate here. And you should be treating the models like any other SaaS product: quickly drop them when there's a competitor that's clearly better. No offense to GPT-4o, but Sonnet 3.5 is looking pretty sharp. It's important to keep in mind that a model with high MMLU scores is not a product. 87% on Spider SQL does not automate all data requests, or even 87% of them. You can't sell a HumanEval pass rate; at least, my go-to-market team doesn't know how. An excellent LLM-powered application is an excellent product: it's well designed, it solves a job to be done, and it enhances your user. Why are we so excited about AI? Human enhancement. So what should you build, if not all of these things? Things that generalize to smarter and faster models, things that help you maintain your product's quality bar under uncertainty, and things that help you continuously improve. Whoa, Bryan, continuous improvement: that's my trigger phrase. The idea of continuous improvement has been brought to the world of LLM applications by this shift in focus that we've all felt since the previous AI Engineer Summit, a focus on evaluation and data. It's nicely crystallized by this diagram from our co-author Hamel Husain, showing a virtuous cycle of improvement. It has evals and data at the center, but the core reason to create those evals, the core reason to collect that data, is to drive forward this loop of continuous improvement. And despite what your expensive consultants, or the many LinkedIn influencers posting about LLM apps, might say, this is not actually the first time that engineers have tried to tame a complex system and make it useful and valuable. This same loop of iterative improvement was at the core of MLOps, the operationalization of machine learning models before LLMs; this figure from our co-author Shreya Shankar's paper had that same loop of iterative improvement, centered also on evaluation and data collection. And MLOps was not the first time that engineers faced this problem: the problem of complexity, of non-determinism and uncertainty.
The DevOps movement that gave MLOps its name also focused on this kind of iterative improvement, and on monitoring information in production to turn it into improvements to products. But dear reader, DevOps was not the first time that engineers tackled this problem of uncertainty and solved it with iterative improvement. DevOps built on the ideas of the Lean Startup movement from Eric Ries, which focused not just on building an application, or a machine learning model, or an LLM agent, but on building the entire business, and it used this same loop, centered on measurement and data, to drive the improvement and the building of a business. And this idea itself was not invented in Northern California, despite what some people might say: it has its roots in the Toyota Production System and in the idea of kaizen, or continuous improvement. Genchi genbutsu is one of the core principles from that movement that we can take forward into the development of LLM applications. It means "real things, real places," and at Toyota it meant sending executives out to factory floors to get their khakis a bit dirty. For LLM applications, the equivalent is looking at your data: that data is the real information about how your LLM application is delivering value to users, and there's nothing more valuable than that. Finally, there are lots of people selling tools at this conference, including myself, and it's easy to get overly excited about the tools and the construction of this iterative loop of improvement, and to forget where value actually comes from. There's a great, pithy, earthy statement from the Toyota Production System, from Shigeo Shingo, that I really like: value is only created when metal gets bent. So we have to make sure we don't get lost just building our evals and calculating concept drift, and instead make sure we continue to get out there, bend metal, and create value for our users. Not going to lie, I might have misunderstood earlier when you said "let's get bent." Okay, so right off the bat we need to spin that data flywheel, Bob. Oh wait, sorry, wrong game show. Point is, we need to get this moving; we need to get this in front of users, in front of human beings; we need to express the goals for our system. And how do we do that? With evals. Remember, evals are not convenient, weird, bespoke metrics; evals are objectives. They're what we want our system to do, and any system for capturing this behavior is good enough. I don't have an eval framework to sell you, but what I do have to sell you is the idea that you should be getting out there and getting started. But wait, Bryan, I'm really nervous: what if this isn't good enough for my customers? Fear is the mind-killer. Put it out there in beta. If it's good enough for these incredible companies, Apple Intelligence, Photoshop, and Hex (that's me), it's good enough for you. You need to collect this data; you need to put something in the wild; you need to start looking at your user interactions, the real user interactions. LLM responses deserve human eyes; you can give them some AI eyes too, but definitely look at them with your human eyes. Binary human feedback is valuable; it's nice to add some rich feedback too, which can be interesting, but start with binaries. And finally, user requests will reveal the product-market-fit opportunities that lie below your product's substrate. Where is your PMF? Everybody wants to know. It's in your user interactions: what are they asking your chatbot that you haven't yet implemented? That's a really nice direction to skate, if that's where the puck is going.
And despite the focus on the user interactions you can have today, the things you can ship right now, it's important to also think about the future. The best way to predict the future is to look at the past: find people predicting the present and copy what they did. In designing many of the components of the personal computing revolution, Alan Kay and others at PARC adopted, as a core technique, projecting Moore's Law out into the future. They built expensive, unmarketable, slow, and buggy systems for themselves, so they could experience what that future was like, build for it, and create it. We don't have quite the industrial scaling information Moore had when he wrote down his predictions, but we do have the beginnings of the same kind of law: there's been an order-of-magnitude decrease in cost every 12 to 18 months at three distinct levels of capability. At the capability level of davinci, the original GPT-3 API model that excited a lot of us about the idea of building on foundation models; at the capability level of text-davinci-002, the model lineage underlying ChatGPT that brought the rest of the world to excitement about this technology; and at the latest and greatest level of capability, with GPT-4 and Sonnet. In each case, around 15 months is enough time to drop the cost by an entire order of magnitude. This is faster than Moore's Law, so the appropriate way to plan for the future is to ask what this implies about which applications that are not economical today will be economical by the time you need to raise your next round. In 2023 it cost about $625 an hour to run a video game where all the NPCs were powered by a chatbot; that's pretty expensive. In 1980 it cost about $6 an hour, inflation adjusted, to play Pac-Man. That suggests that if we just wait for a two-order-of-magnitude cost reduction, or about 30 months from mid-2023, it should be possible to deliver a compelling video game experience with chatbot NPCs at about $6 an hour, and people will probably pay for it. You can't sell it now, but you can live it, you can design it, and you can be ready when the time comes.
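The projection in that Pac-Man example is simple enough to check in a few lines. Here is the arithmetic as a sketch; the dollar figures and the 15-months-per-10x rate are the ones quoted in the talk, not independent data:

```python
# Back-of-envelope version of the cost projection above: if cost per unit
# of capability drops ~10x every ~15 months, when does a $625/hour
# experience reach the $6/hour price point of 1980s arcade play?
import math

cost_now = 625.0        # $/hour for chatbot-driven NPCs, mid-2023
target = 6.0            # inflation-adjusted $/hour for Pac-Man in 1980
months_per_10x = 15

orders = math.log10(cost_now / target)   # ~2 orders of magnitude
print(f"{orders:.1f} orders -> ~{orders * months_per_10x:.0f} months")  # ~30 months
```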
So that's how to think about the future, and how to think strategically when building LLM applications. I'd like to call to the stage my co-authors Jason Liu and Hamel Husain to talk about the operational aspects; let's give them a hand. [Applause] All right. So Hamel and I have basically been doing a lot of AI consulting in the past year. We've worked with about 20 companies so far, everything from pre-seed all the way to public companies, and I'm pretty bored of giving generic good advice, especially because there's such a range of operators here. So instead I'm going to invert: my goal today is to tell you how to ruin your business. First of all, everyone knows that in a gold rush you sell shovels, and so if you want to get gold, you've got to buy shovels too, right? If you want to find more gold: keep buying shovels. Where do I dig? Keep buying shovels. How do I know when to stop digging? The shovel will tell you. And how do I choose between digging one deep hole versus making investments in plenty of shallow holes? Again, the answer is more shovels, clearly. This might be too generic, so I'll give you some more specific advice. If your RAG app doesn't work, try a different vector database. If the methodology doesn't work, implement a new paper, and maybe if you update the embedding model you'll finally find product-market fit. Because, truth be told, success does not lie in developing expertise or processes: try more tools. There's no need to balance exploring and exploiting the mechanisms that work for you; just change the tools, because processes and decision-making frameworks don't matter. The right tool will solve everything. Number two: find a machine learning engineer who can fine-tune, as quickly as possible. A $2,000-per-month OpenAI bill is very expensive, so instead hire someone for a quarter of a million dollars, give them 1% of your company, and have them fight CUDA build errors and figure out server cold starts. Because what's the point of growing your company if you're just a wrapper? And if your margins are too low, try fine-tuning; it's much easier than figuring out how to build something worth charging for. I cannot reiterate this enough: it's very important to hire a machine learning engineer as quickly as possible, even if you have no data-generating products. They love fixing Vercel TypeScript build errors. And generally, if you hire a full-stack engineer who's really caught the LLM bug, they're going to lack real experience, because Python is a dead language, right? Machine learning engineers and research engineers can easily pick up TypeScript, and the ecosystem that exists in Python could be quickly reimplemented in a couple of weekends. The people who have written Python code for the past 10 years doing data analysis will easily transition their tools, and if anything, it's really easy to teach things like product sense and data literacy to the JavaScript community. And most important of all, in order to find this kind of magic talent, we need to create a very catchall job title: let's use words like ninja and wizard, or data scientist, or prompt engineer, or even AI engineer. In the past 10 years we've learned that this works really well: we know exactly who we want, as long as we cast a very wide net of skills, and it doesn't really matter that we don't know what outcomes we're looking for. Anyway, to dig me out of this hole, I'll have Hamel explain; you know, take a deep breath and think out loud, step by step. Thank you, Jason. [Applause] So that was really good. I mean, let's step back from the cliff a little bit and linger on the topic of the AI engineer; I heard some booing in the audience. I love the term AI engineer, much props to swyx for popularizing it; it allows us all to get together and have conversations like this. But I think there's a misunderstanding of the skills of the AI engineer, of what skills you need to be successful, and there are a lot of inflated expectations. As a founder or engineering leader, your talent is the most important lever you have, so I'm going to talk about some of the problems, and perhaps some solutions, when it comes to this talent misunderstanding. Just to review: what is an AI engineer? This is a diagram everyone has probably seen. There's a spectrum of skills in the AI space, with an API dividing line in the middle, and to the right of the API dividing line we have the AI engineer. AI engineer skills are focused on things like chains, agents, tooling, and infra, and conspicuously missing from the AI engineer are things like evals and data. I think a lot of people have taken this diagram too literally, taken it to heart, and said, hey, we don't really need to know about evals, for example. The problem is that you can go from zero to one really fast; in fact, you can go from zero to one faster than ever before with all the great tools out there, just by using vibe checks.
than ever before, with all the great tools out there, just by using vibe checks and implementing the tools we talked about. However, without evals, you can't make progress quickly; it leads to stagnation, because if you can't measure what you're doing, you can't make your system better, and you can't go beyond zero to one. So what can we do about this eval skill set and data literacy? Jason and I have found that people can get really good at writing evals and at data literacy with just four to six weeks of deliberate practice; it's very effective. We think these skills, evals and data, should be brought more into the core of the AI engineer. It really helps solve this problem, and it's something we see over and over again.

The next thing I want to talk about is the AI engineer job title itself. Vague job titles can be problematic. What we see over and over again in our consulting is that this kind of catchall role carries very inflated expectations: any time anything goes wrong with the AI, people look toward that role to fix it, and sometimes that role doesn't have all the skills needed to move forward. We've seen this before with the role of data scientist; titles and names really matter. What I want to emphasize is that AI engineer is very aspirational: you should keep learning, and it's a good thing to strive toward, but you need to have reasonable expectations. To bring it back to data science: about a decade ago, when that role was coined, it was a unicorn that had all these skills, software engineering, statistics, math, domain expertise, and we found out as an industry that we had to unroll it into many different roles, such as decision scientist, machine learning engineer, data engineer, and so on. I think similar things may be happening with the role of AI engineer, and it's good to keep that in mind. What we both see in consulting is that it's helpful to be more specific, more deliberate, about what skills you need and at what time, and depending on your maturity, it's helpful to specify not only the skills but the kinds of products you'll be working on. These are some job titles from GitHub Copilot that are very specific about the skills needed at that time. Really, it's important to hire the right talent at the right time on the maturity curve: when you're first starting out, you only need application development, software engineering and/or AI engineering, to go from zero to one; then you need platform and data engineering to capture that data; and only after that should you hire a machine learning engineer. Do not hire a machine learning engineer without having any data. But again, you can get a lot more mileage out of your AI engineer with deliberate practice on evals and data; we usually find four to six weeks of practice does the job. So, in recap: one of the biggest failure modes is talent. We think the AI engineer is often over-scoped but under-specified, but we can fix that by learning evals. Next, I want to hand it over to Shreya Shankar and Eugene Yan to dive into evals and data literacy. [Applause]

Thank you, Jason; thank you, Hamel. Next up, Shreya and I are going to share with you the tactical aspects of building with LLMs in production, specifically evals, monitoring, and guardrails. Here's a Hacker News quote: how important evals are to the team is a differentiator between teams shipping out hot garbage and those building real products. I would agree. Here's an example from Apple's recent LLM work, where they shared how they collected 750 summaries of push notifications and email summarizations, because those datasets are representative of their actual use case. So how do we build evals for our own products? The simple thing is to make it simpler. For example, if you're trying to extract product attributes from a product description, break it down into title, price, and rating, and then you can simply do assertions. Similarly for summarization: instead of trying to eval that amorphous blob of a summary, break it down into dimensions such as factual inconsistency, relevance, and information density. Once you've done that, assertion-based tests can go a long way: are we extracting the correct price? Are we extracting the correct title? Or, if you're doing natural-language-to-SQL generation, is it using the expected table? Is it using the expected columns? These are very simple to eval, and it reiterates what Hamel mentioned about keeping it simple.
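As a concrete illustration, here is what assertion-based evals like these can look like. The fields and SQL checks mirror the examples in the talk; the function names and expected values are illustrative placeholders, not anything prescribed by the speakers.

```python
# Sketch of assertion-style evals: cheap, deterministic unit tests on one
# dimension of the output at a time. All names here are illustrative.
import re

def eval_product_extraction(output: dict, expected: dict) -> list[str]:
    """Check extracted product attributes field by field."""
    failures = []
    if output.get("title") != expected["title"]:
        failures.append(f"title: got {output.get('title')!r}")
    if abs(float(output.get("price", 0)) - expected["price"]) > 0.01:
        failures.append(f"price: got {output.get('price')!r}")
    return failures

def eval_generated_sql(query: str, expected_table: str,
                       expected_columns: list[str]) -> list[str]:
    """Check a natural-language-to-SQL result against the expected schema."""
    failures = []
    if not re.search(rf"\bfrom\s+{re.escape(expected_table)}\b", query, re.IGNORECASE):
        failures.append(f"expected table {expected_table!r} not used")
    failures += [f"expected column {col!r} missing"
                 for col in expected_columns if col not in query]
    return failures

# A passing case: the expected table and columns all appear.
assert eval_generated_sql(
    "SELECT title, price FROM products", "products", ["title", "price"]
) == []
```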
Lastly: assertions alone can't do everything; they can only go so far. So consider model-based evaluation: maybe training a classifier for factual inconsistency, or a reward model for relevance. This is easier if your evals are classification- or regression-based. But, that said, I don't know how I feel about LLM-as-judge.

What do you mean, you don't like LLM-as-judge? I personally am super bullish on LLM-as-judge, and I'm curious how many of you are exploring LLM-as-judge or have implemented it. Oh, there's a judge right here; do you want to stand up? An actual judge, an LLM judge, here. Anyway, we're going to go through some points on what to consider when deploying LLM-as-judge. First of all, it's a no-brainer that LLM-as-judge is the easiest to prototype: you just have to write a prompt to check for the criterion or metric you want, and you can even align it toward your own preferences by providing few-shot examples of good and bad for that criterion. On the other hand, fine-tuned evaluator models, where you have to collect a lot of data and set up a training pipeline, are not as easy to prototype and carry a lot of upfront investment.

Yeah, but that said, with LLM-as-judge it's pretty difficult to align it to your specific criteria in the business. Who here has not had any difficulty aligning an LLM judge to their criteria? Anyone? Okay, we've got to talk later, Shreya. I think that if you have just a few hundred to a few thousand samples, it's very easy to fine-tune a simple model that can do it more precisely. Second, if you want LLM-as-judge to be fairly precise, you generally need to use chain of thought, and chain of thought is going to take maybe five to eight seconds per request. A simple classifier or reward model, on the other hand, takes maybe ten milliseconds per request; that's two orders of magnitude lower and would improve throughput.

Next, we want to think about technical debt when implementing our validators in production, whether they run asynchronously or in the critical path: how much effort do we need to put in to keep them up to date? With LLM-as-judge, if you don't keep your few-shot examples dynamic, or have some way of keeping the judge prompt aligned with your definition of good and bad, then you're toast.
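For concreteness, here is a minimal sketch of the judge pattern being discussed, with the few-shot good/bad examples baked into the prompt. The criterion, the examples, and the `call_llm` completion function are all illustrative placeholders you would supply, not anything prescribed in the talk.

```python
# Minimal LLM-as-judge sketch: one prompt encoding the criterion plus
# few-shot anchors, so the judge is aligned to your preferences.
from typing import Callable

JUDGE_PROMPT = """You are judging summaries for factual consistency with the source.
Answer with exactly one word: pass or fail.

Source: The meeting moved from Tuesday to Thursday.
Summary: The meeting is now on Thursday.
Verdict: pass

Source: Revenue grew 3% year over year.
Summary: Revenue grew 30% year over year.
Verdict: fail

Source: {source}
Summary: {summary}
Verdict:"""

def judge(source: str, summary: str, call_llm: Callable[[str], str]) -> bool:
    """Return True if the judge passes the summary; `call_llm` is any completion fn."""
    verdict = call_llm(JUDGE_PROMPT.format(source=source, summary=summary))
    return verdict.strip().lower().startswith("pass")
```

Note that the few-shot examples in this prompt are exactly the artifact the speakers warn about: they have to be kept current with your evolving definition of good and bad, or the judge drifts.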
The effect is not as pronounced for fine-tuned models, but if you don't continually fine-tune your validators on new production data, they will also be susceptible to drift. So, overall, when do you want to use LLM-as-judge? It's honestly a question of resources and of where you are in your application development. If you're starting to prototype, need quick evals with minimal dev effort, and have a lowish volume of evals, start with LLM-as-judge and invest in the infrastructure to align it over time. If you have more resources, or you know your product is going to be sticky, go for a fine-tuned model.

Next I'm going to talk about looking at the data. Eugene mentioned that you should create evals on your custom or bespoke criteria, but how do you know what criteria you want? Simple answer: look at your data. "Great AI researchers look at their data," but we changed that to engineers: great AI engineers look at their data. So how do we do this? Actually, before the how, the first question is when you look at the data. I know people who never look at their data at all, or who look at it only once, right after deployment. Wrong answer: you want to look at it regularly. I work with a startup that, whenever they ship a new LLM agent, creates a new Slack channel into which all of the agent's outputs come in real time; after a couple of weeks, they transition this to daily batch jobs, and they make sure they're not running into errors they didn't anticipate. The second question is what, specifically, you are looking for. You want to find slices of the data that are simple or easy to characterize in some way: for example, data that comes from a particular source, or that has a certain keyword or phrase, or that is about a certain topic. Simply saying "all of these are bad," with no way of characterizing them and then improving your pipeline based on that, is not going to help. Finally, something to keep in mind throughout this whole looking-at-your-data experience: your codebase is changing very rapidly over time, probably your prompts, components of the pipeline, et cetera. So when you're inspecting traces, it's super helpful to be able to know which GitHub commit, which model version, or which prompt version a trace corresponds to. This is one of the things traditional MLOps tools did very successfully, MLflow for example: they made it very easy to trace back and then, hopefully, replay something. Well, I see the judge shaking his head, but great. And finally, when using LLMs as APIs, pin model versions. LLM APIs are known to exhibit different behavior that is very hard to quantify for certain tasks, so pin gpt-4-1106, pin gpt-4o, whatever it is you're using.

So Shreya mentioned that we need to look at our data, but how do we look at our data all the time? I think the way to do this is via automated guardrails. Here's Brandolini's law, adapted: the amount of energy needed to catch and fix defects is an order of magnitude larger than the amount needed to produce them. And that's true: it's really easy to call an LLM API and just get something back, but how do we know whether it's actually bad? I think it's really important that we have some basic form of guardrails, and some of them are just table stakes: toxicity, personally identifiable information, copyright, and expected language. Now, you may imagine this is pretty straightforward, but sometimes you don't actually have control over the context.
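As a rough sketch of what two of those table-stakes checks can look like: the language check below uses `langdetect`, a real library chosen here for illustration, while the PII regexes are deliberately naive placeholders; a production system would use a proper PII and toxicity service.

```python
# Sketch of basic output guardrails: expected language and a naive PII screen.
# Patterns and thresholds are illustrative only.
import re
from langdetect import detect  # pip install langdetect

EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")
PHONE = re.compile(r"\b\+?\d[\d\s().-]{7,}\d\b")

def guardrail_violations(output: str, expected_lang: str = "en") -> list[str]:
    """Return a list of violated guardrails for one model output."""
    violations = []
    if detect(output) != expected_lang:
        violations.append(f"output not in expected language {expected_lang!r}")
    if EMAIL.search(output) or PHONE.search(output):
        violations.append("possible PII in output")
    return violations
```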
For example, if someone posts an ad on your English-language website in a different language, and you're asking your LLM to extract the attributes or to summarize it, you may be surprised that, some nonzero proportion of the time, the output is actually in a different language. Similarly, hallucinations happen more often than we would like. Imagine you're trying to summarize a movie based on its description: you just have the description for the trailer, but the summary may actually include spoilers, information that's not in the source, because the model is trying so hard to be helpful, and that's a bad user experience. So here's a tip: if we spend a little bit more time building reference-free evals, we can use them as guardrails. Reference-based evals are when we generate some output and compare it to some ideal, gold sample; this is pretty expensive, and you actually have to collect all those gold samples. On the other hand, if we have those labels, we can train an evaluator model and just compare the output to the source document. For example, for summarization we can just check whether the summary entails or contradicts the source document, and now we have a hallucination eval. So if we spend some time building reference-free evals once, we can use them to guardrail all new output.
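A minimal sketch of that idea: gate each new output on an entailment score against the source, with no gold reference needed at serving time. The `entailment_prob` scorer stands in for whatever NLI-style evaluator model you trained, and the threshold is an assumption to tune on your own data.

```python
# Reference-free hallucination guardrail: pass a summary only if the source
# sufficiently entails it. The scorer and threshold are placeholders.
from typing import Callable

def hallucination_guardrail(source: str, summary: str,
                            entailment_prob: Callable[[str, str], float],
                            threshold: float = 0.8) -> bool:
    """Return True if the summary is sufficiently supported by the source."""
    return entailment_prob(source, summary) >= threshold

# Usage: plug in any scorer, e.g. a fine-tuned NLI cross-encoder.
# ok = hallucination_guardrail(doc, model_summary, my_nli_scorer)
```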
Cool, thanks. So we're going to wrap up in the next minute or so with some high-level, bird's-eye-view, 20,000-foot-view, whatever you want to call it, takeaways. First off, how many of you remember this figure from the pretty seminal paper on MLOps that came out, maybe ten years ago? 2015, so nine years ago. I think this paper really communicated the idea that the model is a small part of the picture: when you're productionizing ML systems, there's so much more around the model that you have to maintain over time: data verification, feature engineering, monitoring, your infrastructure, et cetera. So you might be wondering: now that we have LLMs, does any of this matter? Yeah, seeing a few nods here. Absolutely. With LLMs, all of these tech-debt principles still apply, and you can map every single component here to its LLM equivalent. For example, maybe we don't have feature engineering pipelines, but cast in a new light, it's RAG: we're looking at context, trying to retrieve what's relevant and engineer it so it doesn't distract the LLM too much, and we have a ton of experimentation around that. All of this needs to be maintained over time, especially as models change under the hood. Similarly for data validation and verification: we have evals, we have guardrails that need to be deployed. It's not simply "wrap your model, or GPT, in some software and ship it"; no, there's a lot of investment that needs to happen around the model. All right, I'd like to end with this quote from Karpathy-senpai: there's a large class of problems that are really easy to imagine and build demos for, but extremely hard to make products out of. For example, Charles dug up the paper on the first car driven by a neural network; that was 1988. Twenty-five years later, Andrej Karpathy took his first demo drive, in 2013. Ten years after that, and I hope all of you have had a chance to try Waymo, we got the first driverless permit for Waymo in San Francisco; maybe in a couple more years we'll have it for the whole of California. The point is, going from demo to production takes time. So that's all we had. Thank you, and let's build. [Applause]

And now I'm delighted and honored to welcome a very special guest, Thomas Dohmke. Thomas has been fascinated by software development since his childhood in Germany, and he's built a career building tools developers love and accelerating innovations that are changing software development. Currently Thomas is CEO at GitHub, where he has overseen the launch of the first at-scale AI developer tool, GitHub Copilot. So please join me in welcoming to the stage Thomas Dohmke. [Music]

Thank you, Thomas. Well, thank you everyone. Let's start with Copilot. Many people have shared their own takes on the Copilot origin story, but what was your personal experience seeing it at GitHub? Maybe you have a sneak preview for us; take us back to the start, in 2020.

So imagine it's 2020. It's lockdown, here in San Francisco, in Seattle, everywhere GitHub engineers are sitting, so, like all of you probably, we were on a Zoom call. One of us had early access to a new model that OpenAI had just released in preview, a version of GPT-3 called Codex. Oege, I think, had the keyboard, the leader of GitHub Next at the time, and we were dictating prompts and asking the model to write some code. The first aha moment I had was that you could ask it to write JavaScript code and it would put the curly braces in the right places and whatnot, and you could ask it to write Python code. The model, in a way, doesn't work like a compiler: it doesn't have a syntax tree, it doesn't know these things. Or you could also argue it knows them exactly the way we know them. That was probably the first moment. We kept exploring the model and then decided to build this autocompletion copilot, which was the first Copilot, and we built it all while being remote, while being on lockdown. So if your investors tell you today that you need to be in a room in front of a whiteboard: you can innovate, if you want to, while being in your home offices around the world. I think the next moment was when we shipped a preview to our internal engineers, what we call a staff ship at GitHub, and the NPS survey with those engineers was through the roof, I think 72, 73, something like that. Typically, for our early-stage products, especially with a large language model, with all the hallucinations, and with the UI not really figured out yet, it's much lower. So that was kind of a holy moment for us. Then, as the product shipped in mid-2021, with COVID still going on, we started looking at telemetry, and the team came and said it writes about 25% of the code in the files where it is enabled. I remember saying, I don't believe this, your telemetry is wrong, please go back and validate that. It turned out that was actually right, and by now it's about half the code that's written; some languages like Java even have a higher acceptance rate and more lines written. So this journey we went through over the first two years really was one moment after another where we saw the future of AI long before ChatGPT opened everybody else's mind.

Amazing, and now it's available to everyone here as well. So Copilot started as autocomplete in the IDE, and now it's all over GitHub; I know I have PRs, boards, et cetera. What did you do to make Copilot integrated across all of GitHub? What were some experiments, what worked, and what didn't work?

I think the first thing is to think about what I do as a leader, as the CEO
of a company, and it's really about constantly reconfiguring our approach. So much of the AI world is changing almost daily; there's some news, on The Information or elsewhere, every morning. So there is no more "I have a long-term strategy, I have my features all laid out, and I work through the backlog." It's really about operating as agilely as possible, even as a 3,000-person company that's part of one of the largest companies on the planet. The second thing is that we really try to meet the developer where they are. We say we're not trying to build an AI engineer; we're trying to build AI for engineers, a human-centric approach, which is what the name Copilot ultimately visualizes. But also, we're trying to make developers' lives better, and because we have developers ourselves, every productivity improvement we can find ultimately helps us at GitHub build our AI products. So that really is the approach: looking at what we can do next to make our own work of building more features for Copilot a little bit easier.

You mentioned a great point: you're trying to meet the developer where they are. So far we've been bringing the AI to the IDE. Are we going to try to bring the developer, the IDE, closer to the AI? How are you thinking about that?

The idea of bringing AI into the IDE, really into ghost text, into autocompletions, was a way of getting around hallucinations. It was a way of saying, okay, the model is not always going to be perfect, but neither are autocompletions, whether in your Google Docs, in your email, or in your editor in the old IntelliSense way: as you're typing, it cannot know what you wanted to type, so you're used to adjusting your typing, and then you find the moment when you press the Tab key. Even without autocompletion, think about what developers do in the editor while they write code. The best developers write a lot of code before they get stuck; the newbies, and those that rarely write code, like me, get stuck more often. Then you Control-Tab or Command-Tab into your browser, and you open Google or Stack Overflow or GitHub, and what you do there is find code and argue with other developers, and then you copy and paste that code into your editor and modify it. So in a way, Stack Overflow has as many hallucinations as the model might have, and not because the answers are bad, but because the world is changing so much. I code a little bit on iPhone projects, in Swift, and there's always a new Swift version after WWDC, or a new Xcode version, so how you use the APIs has changed. So it keeps the developer in the flow; that really was the crucial thing here. In a world ten years ago, we probably wouldn't even have called this AI; we would just have called it smarter autocompletion. The AI piece is not the core feature of Copilot; the core feature is that it helps developers stay in the flow, get the job done, and not be in this constant distraction between the editor and the browser.

That's a great point. A few months ago you wrote this post about Workspace. What was the journey to creating Workspace, and, for folks who are unfamiliar with it, what is Workspace?
So, you already mentioned autocompletion; that's how we started. Then, in November 2022, ChatGPT happened, so in early 2023 we added chat, and GPT-4, to Copilot in the IDE, as a separate sidebar window. So we have that available, and it has RAG and all the information, the context, available in the IDE. But ever since, we have been thinking about how we can make the developer flow even easier, and Workspace does exactly that. It takes a GitHub issue, or just a task, an idea that you write down on github.com, and it helps you, in the context of your codebase and repositories, figure out how to implement that change. It bridges from the issue, from the task description, to the pull request, to the code. And the magic behind this is that the human is still in the center. At every step of the way (writing a specification, analyzing the current repo and the current behavior, using your description to figure out how to modify it, writing the plan that shows you how to change all the files, through to the implementation, which is the diff view, if you will) the human can interact, can change the bullet points, can change the code. What that really gives you is a pair programmer that helps you explore the codebase, because the challenge we all have as engineers is that as soon as you move onto a new project, or you want to modify an open-source project, or you're just coming back from vacation and trying to remember what is implemented where in your thousand-plus files, navigating the codebase is the first challenge you have: figuring out what's the current behavior and what's the new behavior. So you're getting an AI-native, a Copilot-native, developer environment that helps you along the journey you're naturally also doing in your IDE. And that really is the key here: it's not about building an autonomous agent (I'm sure you've heard a lot about that in the last three days); it's about building agents that help us as humans solve a task and learn along the way, as we figure out, oh, there's this test file I also have to modify if I want to implement this feature.

I love the point you mentioned about not building autonomous agents and instead helping the developers. So how should non-developers use Workspace?

They can, and in fact, once we announced this last year at GitHub Universe in November, I think the first feedback email we got was from a program manager, or product manager, saying: this is awesome, because now I can not only write a user story or a work item, I can also see what it would mean to implement it in the codebase. In many ways, the biggest challenge we have today is whether we can be as specific as possible when we write down a task, as product managers or as engineers ourselves. Often, everything is obvious until it is not. And then you kind of need to size the task: how long will it take? The mythical man-month (I think The Pragmatic Engineer covered that a couple of weeks ago) is still true: most estimates are half the time the job actually takes, and we're really bad at estimating how much time it takes to get things done, whether it's in coding or in building houses or roads or infrastructure. So Workspace helps you with that, as it helps you figure out: is what I just described actually specific enough to write the code for it, or even to figure out what the plan
would look like?

Can you share a bit about your vision of how we will build and code in natural language, and how it will help us collaborate better: devs and PMs, coders and non-coders, across languages and across the world?

For me, the very first thing when you say natural language (I have it on my t-shirt here: "Copilot speaks your language") is that these large language models we're using today in GitHub Copilot, and in many other AI applications, are the same models that power chat agents; they speak almost every major human language. So whether you want to explore coding in English when you don't yet understand the concepts of true/false Boolean logic, or whether you want to learn that in German, in Hindi, in Brazilian Portuguese, in Spanish, in Chinese, you can do that now. And if I look at kids today in school, most of them are growing up with mobile phones. When you go into a restaurant here in San Francisco, or Seattle, or elsewhere in the world at night, you probably see a family with little kids where the kids have their phones, because the parents want to enjoy five minutes on their own. Then, as the kids grow up, they see Super Mario or Minecraft and get into gaming, and naturally that leads to: how can I create my own game? How can I create my own web page? Copilot enables that, in the language the kids grow up with, which for the majority of the humans on this planet is not English. So that's number one: it democratizes access to technology. It also democratizes access for those who don't have parents at home with a technical background, or parents with infinite patience; most parents do not. I have two kids, and I speak from my own experience: at some point you're just done explaining the world to your kids, and you just want to switch on the TV and watch that Netflix show. And if you look at the professional context, one of the biggest challenges we have is: if you joined my company, or I joined your company, tomorrow, what's all the institutional knowledge? How are things being done? What we don't like as humans is asking a thousand questions, especially as a new employee in a big company; you have this anxiety in your head that everybody else thinks you're dumb, that they wonder why you got hired in the first place. So a copilot also democratizes access to all the information in companies, and I think that is going to change how we work, not only for developers in the workforce, but for really every human.

Thank you, Thomas, for sharing your vision. The next thing I want to ask is maybe a little more unhinged. Speaking of agents: in your opinion, what makes an agent, or a copilot? What's your definition of an agent?

I think an agent is like an AI dishwasher: you fill it with the dishes, you let it do its thing, and at the end you take the output and put it back on the shelves. And today, well, we used to call them bots, or CI/CD; in many ways that's an autonomous agent: you push your pull request and your CI/CD runs, GitHub Actions or a similar product. Many compute primitives we have today are agents, as they get a
job done on their own. My monitoring, figuring out whether GitHub is up or down, is somewhat autonomous; hopefully it pages somebody without us first hearing from you that you cannot access your repository. So in many ways, what we're building is still tools that help us get the job done, and there are many jobs that developers have to get done, many jobs that AI engineers now need to get done. You saw on the slide earlier all the things that are still true even though you can automate things with large language models. A lot of work in software engineering is bogging us down: a lot of boilerplate, a lot of security compliance, that Friday evening when you want to enjoy the barbecue because the sun is out and instead you have to update all your Log4j dependencies. Security tooling, in fact, is creating more work today. It's not a dishwasher; it's a tool that tells you the dishes are dirty, and then you have to do the dishes yourself. That's security tooling: it just adds stuff to our backlog, while we actually want to work on the creative side; you want to build new features, you want to build innovative products, the creative things. I think many software developers do not understand themselves as production workers; they understand themselves as artists, as creators. But our companies, our governments, the world, require us to do a lot of other work, and we need AI tools (Autofix, for example, things that not only scan for security issues but then fix those security issues) supported by AI, so we have more time for the things we do want to do, and AI takes over the things we don't want to do. That's where the agents will go.

Fantastic. What's an agent you want to have, and how far are we from it?

I mean, I want agents that burn down my whole security backlog. As in any company, the challenge is that I have way too many of these items, and there isn't really a book you can buy that tells you, as an engineering manager, how to balance those two things. You cannot put all your work into security, compliance, accessibility, and whatnot; and you cannot put all your work into innovation, because your customers will lose all their trust the moment you have a security issue that threatens their data. So you have to balance those two things, or you find AI agents that bring the work down. Like any leader of a software development company, I always want to go faster, I always want to get that feature done faster, and I'm sure it's the same for you folks at Amazon: when I have an idea and I ask my folks how long it will take to implement, the estimate leaves me scratching my head, thinking I could do it myself faster than waiting for my team to do it. But of course that's not the truth. The truth is that there are so many other things in the process these days that we need to find new abstraction layers that help us get control over our development life cycle again.

That's a great point. So, last question: do you have any advice for devs, both new and experienced, on how they should navigate this new world of tools, this new world of abstractions, in what some say is the biggest technology innovation since...
I think the most exciting thing about this new technology, and you saw it hopefully over the last three days at this conference, is that we are moving into a new world of software development, and there have been multiple step functions over my life. I was born right before the PC was invented; I remember getting my Commodore 64, then a PC in the '90s. I remember open source and the internet; before the internet, open source meant buying CDs and DVDs in bookstores, and then the internet came, SourceForge and then GitHub came, and all of a sudden developers started collaborating. The mobile wave came. And every time we had one of those step functions, software development got more exciting, and I think we are at such a step function again. It means we can embrace our nerditude; we can build new things. For me, as the CEO of GitHub, I don't get to touch code often, and so when I get to touch code on a Sunday afternoon, I don't want to spend all my time updating all my dependencies, and I don't want to have to figure out everything that has changed in the API documentation, or how to deploy the container to the cloud now. There are so many things changing around us, and we want to bring the fun back. I think that's it: AI brings the fun back into software development. So I want to encourage you all: go back home, build some cool stuff, and embrace this new world of AI.

Okay, that's all we had. Thank you, Thomas. Please join me in thanking him.

Ladies and gentlemen, please welcome back to the stage your host and co-founder of the AI Engineer Summit, Benjamin Dunphy. [Music]

Oh my god, how are we feeling? Better than me, I hope; I'm exhausted. What a way to end an event: OpenAI, rapid-fire talks from legends, and the CEO of GitHub. Can we have a round of applause for our final keynote speakers? Incredible. I think the only thing left to say is thank you. Thank you all for coming to this event. We can put together the best flows, content, and productions, but it means nothing without all of you, the community members who show up to learn, engage, and connect. So we thank you for that; please give yourselves a round of applause. [Applause]

Next, I want to thank Microsoft. They were a sponsor of Summit last year, last October, and Britain Winterose in particular was our first champion, as I recall, for taking their sponsorship up a notch, so thanks to him; I couldn't be happier with their partnership. Sharon, Allison, Kayla, you've all been absolutely incredible to work with, and special thanks to all the speakers from Microsoft and GitHub for your hard work and incredible content. AWS: I've been a user and fan of this company for some time, and as someone who produces events, I'm in absolute awe of what they do for re:Invent and all the other events they run. Memo, you are just incredible; keep doing what you're doing. And Antje, I'm so excited to hear that you're turning San Francisco's AWS Loft into an AI engineering meetup hub; you announced that at your keynote, and it's incredible. San Francisco is back, baby. MongoDB, Google, Neo4j: you all brought your A-game; thanks for your partnership and support, from the planning and organizational side to the content and production side. To all of our other sponsors, especially our gold and silver sponsors: you not only help make the event a financial success, you make every aspect of this event more interesting, engaging, and valuable. This event is enriched
by your presence, and we thank you for being here. Argus HD: I mean, seriously, Argus HD, you never cease to amaze me. Tim, Ryan, Tim, the other Tim, everyone else: you guys are incredible. Our budget is small, but you all make it work; they did all the breakouts, and you put on an incredible show. Motif Events: did you guys see that Expo? That's Motif Events; they designed and built it. We selected them not only because they were the most innovative, but because it was clear to me that they put the most work and heart into what they do, so thanks, Dave, Ben, and everyone else. And thanks to the entire Local 16 crew for your excellent craftsmanship in assembling and disassembling this thing; it had to come down between the sessions, we brought the airwall in, y'all saw that, and they did it super fast, so thank you all for that. Everyone at the Marriott Marquis: you've all been absolutely incredible; the food was good, the coffee was good, everyone was super responsive, and the internet was good, at least as far as I was concerned; hopefully it was for you as well. All of the speakers: we can't have an event without the speakers, and they work incredibly hard on these talks. Like Picasso, it took them their entire lives to create these talks; unlike Picasso, it also took weeks or even months of preparation. So to all of our speakers, thank you so much for being an incredible part of this event. Coach and Vide, for managing our livestream with real-time clips of our YouTube live stream: thank you. Chris Otto, everyone at Sourcegraph, everyone at Cody: thank you for helping with the distribution of this incredible content to a wider audience. And the volunteers, the folks in the yellow staff t-shirts: they are not paid, they're volunteers. Events are super expensive, and we can't do this without you, so we thank you so much. And Leah McBride: many of you met her at registration, and she's an absolute force; seriously, she's an entire agency at this event. So I'd like to welcome her to the stage; if we could all just show our gratitude with a round of applause. And Leah, we will get you some more on-site support for next year, so you're going to be working a little less hard. Simon Sturmer is a beast of an engineer and volunteers so much of his time to help with the conference website and mobile app. It of course has bugs; we have volunteers running it, so if you want to contribute and build some of the cool stuff we're doing, email me directly
at ben@ai.engineer; we could use some help. Lastly, swyx. Seriously, you all don't know how much heart and soul and sweat and time this man has put into this event. I'm beyond disappointed that he missed it, but he is watching on the livestream, so can we show some love for swyx? [Applause] Max Video Productions, who handle our b-roll, cinematography, and speaker interviews; Randall Gee, who does our photography; Angelie, for the live voiceovers; Sasha Shang, who helped with the diversity committee; and everyone from the diversity committee: Sarah Chang, Tony Shaw, Harper Carroll, so many people. If I forgot you, I'm sorry; I put this together five minutes before I went on stage. Crazy times. But thank you all so much for being here, thank you for coming, and we will see you next year. Don't forget to secure your ticket today. Bye-bye, and thank you, everyone. [Music] [Applause]