The 2025 AI Engineering Report — Barr Yaron, Amplify

Channel: aiDotEngineer

Published at: 2025-08-01

YouTube video id: mQ7_Zje7WKE

Source: https://www.youtube.com/watch?v=mQ7_Zje7WKE

All right. Hi everyone. Thank you for having me here, and huge thanks to Ben, to swyx, and to all the organizers who've put so much time and heart into bringing this community together.
All right. So, we're here because we care about AI engineering and where this field is headed. To better understand the current landscape, we launched the 2025 State of AI Engineering survey, and I'm excited to share some early findings with you today.
All right, before we dive into the results, the least interesting slide. I don't know everyone in this audience, but I'm Barr. I'm an investment partner at Amplify, where I'm lucky to invest in technical founders, including companies built by and for AI engineers.

And with that, let's get into what you actually care about, which is enough Barr and more bar charts. And there are a lot of bar charts coming up.
Okay, so first, our sample. We had 500 respondents fill out the survey, including many of you here in the audience today and on the livestream. Thank you for doing that.

The largest group called themselves engineers, whether software engineers or AI engineers. While this is the AI engineering conference, it's clear from the speakers and the hallway chats that there's a wide mix of titles and roles. You even let a VC sneak in.
So let's test this with a quick show of hands. Raise your hand if your title is actually "AI engineer" at the AI engineering conference. Okay, that is extremely sparse. Put your hands down. Raise your hand if your title is something else entirely. So that should be almost everyone. Keep it up if you think you're doing the exact same work as many of the AI engineers.
All right. So this sort of tracks: titles are weird right now, but the community is broad, it's technical, and it's growing. We expect that "AI engineer" label to gain even more ground. I couldn't help myself, so here's a quick Google Trends search: the term "AI engineering" barely registered before late 2022. We know what happened: ChatGPT launched, and interest in AI engineering has not slowed since.

Okay. So people had a wide variety of titles, but also a wide variety of experience. The interesting part here is that many of our most seasoned developers are AI newcomers. Among software engineers with 10+ years of software experience, nearly half have been working with AI for three years or less, and one in ten started just this past year. So change, right now, is the only constant, even for the veterans.
All right. So what are folks actually building? Let's get into the juice. More than half of the respondents are using LLMs for both internal and external use cases. What was striking to me was that three of the top five models, and half of the top ten models, that respondents are using for those external, customer-facing products are from OpenAI.
The top use cases that we saw are code generation and code intelligence, and writing assistance and content generation. Maybe that's not particularly surprising, but the real story here is heterogeneity: 94% of people who use LLMs are using them for at least two use cases, and 82% for at least three. Basically, folks who are using LLMs are using them internally, externally, and across multiple use cases.

All right. So you may ask: how are folks actually interfacing with the models, and how are they customizing their systems for these use cases? Besides few-shot learning, RAG is the most popular way folks are customizing their systems; 70% of respondents said they're using it. The real surprise for me here, and I'm looking to gauge surprise in the audience, was how much fine-tuning is happening across the board. It was much more than I had expected overall. In the sample, we have researchers and research engineers, who were by far the ones doing the most fine-tuning.
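To make the RAG pattern concrete, here's a minimal sketch of the retrieve-then-generate flow the survey asked about. The `embed` function and the tiny document store are hypothetical stand-ins; a real system would call an embedding model and then hand the built prompt to an LLM.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Hypothetical stand-in: a real system would call an embedding model.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(64)

# Toy document store: snippets of context, pre-embedded.
docs = ["Our refund window is 30 days.", "Support hours are 9-5 ET."]
doc_vecs = np.stack([embed(d) for d in docs])

def retrieve(query: str, k: int = 1) -> list[str]:
    # Rank documents by cosine similarity to the query embedding.
    q = embed(query)
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    return [docs[i] for i in np.argsort(sims)[::-1][:k]]

def build_prompt(query: str) -> str:
    # The "augmented" part of RAG: stuff retrieved context into the prompt.
    context = "\n".join(retrieve(query))
    return f"Answer using this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("What is the refund policy?"))  # then send to an LLM
```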
We also asked an open-ended question for those who were fine-tuning: what specific techniques are you using? Here's what the fine-tuners had to say. 40% mentioned LoRA or QLoRA, reflecting a strong preference for parameter-efficient methods. We also saw a bunch of different fine-tuning methods, including DPO and reinforcement fine-tuning, and the most popular core training approach was good old supervised fine-tuning. Many hybrid approaches were listed as well.
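A rough illustration of why LoRA is so popular as a parameter-efficient method: the pretrained weights stay frozen and only a small low-rank update is trained. This is a minimal sketch of the idea in PyTorch, not any particular library's implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Sketch of the LoRA idea: freeze the pretrained weight and learn a
    low-rank update B @ A, so only r * (d_in + d_out) params are trained."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # pretrained weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # starts at 0
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen base output plus the scaled low-rank correction.
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable params: {trainable}")      # a tiny fraction of 768 * 768
```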
Moving on to updating systems: sometimes it can feel like new models come out every single week. Just as you've finished integrating one, another one drops with better benchmarks and a breaking change. It turns out more than 50% are updating their models at least monthly, and 17% weekly.

And folks are updating their prompts much more frequently: 70% of respondents are updating prompts at least monthly, and one in ten are doing it daily. So it sounds like some of you have not stopped typing since GPT-4 dropped.
But I also understand; I have empathy. You see one blog post from Simon Willison and suddenly your trusty prompt just isn't good enough anymore. Despite all of these prompt changes, a full 31% of respondents don't have any way of managing their prompts. What I did not ask is how AI engineers feel about not doing anything to manage their prompts. So, we have the 2026 survey for that.
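For the 31%, prompt management can be as light as versioning templates by content hash so every change is traceable. A minimal sketch, with a hypothetical on-disk layout:

```python
import hashlib
import json
import time
from pathlib import Path

PROMPT_DIR = Path("prompts")  # hypothetical location; adjust to taste
PROMPT_DIR.mkdir(exist_ok=True)

def save_prompt(name: str, template: str) -> str:
    # Version by content hash: identical templates get identical versions,
    # and any edit produces a new, auditable version id.
    version = hashlib.sha256(template.encode()).hexdigest()[:8]
    record = {"name": name, "version": version,
              "saved_at": time.time(), "template": template}
    (PROMPT_DIR / f"{name}-{version}.json").write_text(json.dumps(record))
    return version

v = save_prompt("support-triage",
                "Classify this ticket: {ticket}\nLabels: billing, bug, other")
print(f"saved support-triage @ {v}")
```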
We also asked folks, across the different modalities, who is actually using these models at work, and is it actually going well? We see that image, video, and audio usage all lag text usage by significant margins. I like to call this the multimodal production gap, because I wanted an animation. And this gap still persists when we add in folks who have these models in production but have not garnered as much traction.
Okay, what's interesting here is when we add the folks who are not using models at all to this chart too. Here we can see folks who are not using text, not using image, not using audio, or not using video, and we have two categories: folks who plan to eventually use these modalities, and folks who do not currently plan to. You can roughly see this ratio of no-plan-to-adopt versus plan-to-adopt. Audio has the highest intent to adopt: 37% of the folks not using audio today have a plan to eventually adopt it. So get ready to see an audio wave. Of course, as models get better and more accessible, I imagine some of these adoption numbers will go up even further.
All right, so we have to talk about agents. One question I almost put in the survey was, "How do you define an AI agent?" But I figured I would still be reading through the different responses. So, for the sake of clarity, we defined an AI agent as a system where an LLM controls the core decision-making or workflow.
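To make that definition concrete: in an agent, the model picks the next action, rather than the control flow being fixed in code. A minimal sketch with the LLM call stubbed out and a hypothetical search tool:

```python
# The LLM call is stubbed out; a real agent would send `state` to a model
# and parse a structured action (e.g. a JSON tool call) from its response.
def llm_decide(state: list[str]) -> dict:
    if not any(s.startswith("result:") for s in state):
        return {"tool": "search", "args": {"query": "weather in SF"}}
    return {"tool": "finish", "args": {"answer": state[-1]}}

TOOLS = {"search": lambda query: f"result: 18C and foggy ({query})"}

def run_agent(task: str, max_steps: int = 5) -> str:
    state = [f"task: {task}"]
    for _ in range(max_steps):
        action = llm_decide(state)       # the LLM controls the workflow
        if action["tool"] == "finish":
            return action["args"]["answer"]
        state.append(TOOLS[action["tool"]](**action["args"]))
    return "step budget exhausted"

print(run_agent("what's the weather in SF?"))
```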
80% of respondents say LLMs are working well at work, but less than 20% say the same about agents. Agents aren't everywhere yet, but they're coming. The majority of folks may not be using agents, but most at least plan to; fewer than one in ten say that they will never use agents. All to say that people want their agents. And I'm probably preaching to the choir.
The majority of agents already in production do have write access, typically with a human in the loop, and some can even take actions independently. So, as more agents are adopted, I'm excited to learn more about the tool permissioning that folks grant them.
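One common permissioning pattern consistent with that finding: read-only tools run freely, while write-capable tools pause for human approval. A minimal sketch with hypothetical tool names:

```python
# Hypothetical tool names; the split between read and write tools is the
# part that matters. Write-capable tools require explicit human approval.
WRITE_TOOLS = {"send_email", "update_record"}

def execute_tool(name: str, args: dict, approver=input) -> str:
    if name in WRITE_TOOLS:
        reply = approver(f"Agent wants to run {name}({args}). Approve? [y/N] ")
        if reply.strip().lower() != "y":
            return "denied by human reviewer"
    # ... dispatch to the real tool implementation here ...
    return f"executed {name}"

# Read-only tools pass straight through; writes pause for a human.
print(execute_tool("search_docs", {"query": "refunds"}))
```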
If we want AI in production, of course, we need strong monitoring and observability. So we asked: do you manage and monitor your AI systems? This was a multi-select question, and most folks are using multiple methods to monitor their systems. 60% are using standard observability, and over 50% rely on offline evals.
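An offline eval, in the sense used here, is roughly this: replay a fixed test set through the system and score the outputs before shipping a model or prompt change. A minimal sketch with a stubbed pipeline and hypothetical golden examples:

```python
# Stub for the LLM pipeline under test; swap in the real call.
def system_under_test(question: str) -> str:
    return "Paris" if "France" in question else "unknown"

eval_set = [  # hypothetical golden examples, checked into the repo
    {"input": "Capital of France?", "expected": "Paris"},
    {"input": "Capital of Atlantis?", "expected": "unknown"},
]

def run_offline_eval() -> float:
    # Replay the fixed set and compute exact-match accuracy. Gate deploys
    # on this number dropping below a threshold.
    passed = sum(system_under_test(c["input"]) == c["expected"]
                 for c in eval_set)
    return passed / len(eval_set)

print(f"offline eval accuracy: {run_offline_eval():.0%}")
```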
And we asked the same thing for how you evaluate your model and system accuracy and quality. Folks are using a combination of methods, including data collection from users, benchmarks, etc., but the most popular at the end of the day is still human review. And for monitoring their own model usage, most respondents rely on internal metrics.
Storage is important too. Where does the context live? How do we get it when we need it? 65% of respondents are using a dedicated vector database, which suggests that for many use cases, specialized vector databases are providing enough value over general-purpose databases with vector extensions. Among that group, 35% said that they primarily self-host, and 30% primarily use a third-party provider.
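For context on what that dedicated piece of infrastructure does at its core: store embeddings and answer nearest-neighbor queries. A minimal brute-force sketch; a real vector database replaces the linear scan with an approximate index such as HNSW or IVF, plus persistence:

```python
import numpy as np

class TinyVectorStore:
    """Brute-force cosine search; dedicated vector databases replace this
    scan with an approximate index (HNSW, IVF) and durable storage."""
    def __init__(self, dim: int):
        self.vecs = np.empty((0, dim))
        self.payloads: list[str] = []

    def add(self, vec: np.ndarray, payload: str) -> None:
        self.vecs = np.vstack([self.vecs, vec])
        self.payloads.append(payload)

    def query(self, vec: np.ndarray, k: int = 3) -> list[str]:
        # Cosine similarity against every stored vector, top-k results.
        sims = self.vecs @ vec / (
            np.linalg.norm(self.vecs, axis=1) * np.linalg.norm(vec))
        return [self.payloads[i] for i in np.argsort(sims)[::-1][:k]]

store = TinyVectorStore(dim=4)
store.add(np.array([1.0, 0.0, 0.0, 0.0]), "chunk about refunds")
store.add(np.array([0.0, 1.0, 0.0, 0.0]), "chunk about shipping")
print(store.query(np.array([0.9, 0.1, 0.0, 0.0]), k=1))
```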
All right, I think we've been having fun this whole time, but we're entering a section I like to formally call "other fun stuff." I spent hours workshopping the name. So, we asked AI engineers: should agents be required to disclose when they're AI and not human? Most folks think yes, agents should disclose that they're AI. We asked folks if they'd pay more for inference-time compute, and the answer was yes, but not by a wide margin. And we asked folks if transformer-based models will be dominant in 2030, and it seems like people do believe that attention is all we'll need in 2030.
The majority of respondents also think open-source and closed-source models are going to converge, so I will let you debate that after. No commentary needed here: the mean guess for the percentage of the US Gen Z population that will have AI girlfriends or boyfriends is 26%. I don't really know what to say or expect here, but we'll see what happens, in a world where folks don't know if they're being left on read or just facing latency issues, or, of course, the dreaded "it's not you, it's my algorithm."
And finally, we asked folks: what is the number one most painful thing about AI engineering today? And evaluation topped that list. So it's a good thing this conference and the talk before me have been so focused on evals, because clearly they're causing some serious pain.

Okay, and now to bring us home, I'm going to show you what's popular. We asked folks to pick all the podcasts and newsletters that they actively learn something from at least once a month, and these were the top ten of each. So, if you're looking for new content to follow and learn from, this is your guide. Many of the creators are in this room, so keep up the great work. And I'll just shout out that swyx is listed as both a popular newsletter and a popular podcast for Latent Space. So I will just leave this here.
I think that's enough bar charts and Barr time, but if you want to geek out about AI trends, you can come find me online or in the hallways. We're going to be publishing a full report next week. I'll let Elon Musk have Twitter today, but it's going to include more juicy details, including everyone's favorite models and tools across the stack. Thank you for the time. Enjoy the afternoon.