Are 95% of Businesses Really Getting No Return on AI Investment? — With Aaron Levie

Channel: Alex Kantrowitz

Published at: 2025-09-18

YouTube video id: lWJ9getD9dg

Source: https://www.youtube.com/watch?v=lWJ9getD9dg

Why are the headlines telling us that businesses are getting no return on AI investment? And are AI agents finally ready to get to work? We'll cover it all with Box CEO Aaron Levie right after this. Welcome to Big Technology Podcast, a show for cool-headed and nuanced conversation about the tech world and beyond. Today we're going to talk about AI and its application in business, whether it's actually making a difference, and whether AI agents are a real thing. We have the perfect guest to do it because we have Aaron Levie back with us, fresh off the BoxWorks AI event. Aaron, it's great to see you as always.
>> Thank you, Alex. Good to be here.
>> So, did I just add "AI event" to BoxWorks, or is it just called BoxWorks?
>> I actually like you calling it an AI event. It is just called BoxWorks, but anytime you want to jam AI in there, we're good.
>> Okay, sounds good. You had a lot of AI news; we'll get into that in a moment. But since you are talking with a lot of folks about AI applications in business, I want to run this MIT study by you and get your perspective on what's real and what's not. This is from Axios a couple weeks ago: "MIT study on AI profits rattles tech investors. Wall Street's biggest fear was validated by a recent MIT study indicating that 95% of organizations studied get zero return on their AI investment." They studied 300 public AI initiatives, trying to suss out the no-hype reality of AI's impact on business. 95% of organizations said they found zero return, despite enterprise investment of $30 billion to $40 billion in generative AI. This is a study that everybody in the business world is talking about. Do you think there's any validity to it? You're already shaking your head.
>> I'm shaking my head on about seven dimensions. We can parse each one.
>> Let's do it.
>> Maybe the first one, which is the funniest, is the Wall Street element. Wall Street is completely schizophrenic on this dimension. Obviously a report like that scares them on one dimension, but there's an equal amount of frenetic Wall Street energy around the idea that AI will be so good that all software is dead. So it's this very bipolar state: where are we in AI adoption, versus AI is going to be so powerful that there won't even be software business models, because everything will just be delivered by AI. And as with most things that have these kinds of extreme polarization elements, I think the reality is just way more nuanced. We are still early in the adoption curve of AI. In the early curve of all of these types of technologies, you have lots and lots of proofs of concept, you have lots of trials of different technologies, and people are trying to figure out which tool works for which use case. So by definition you're in the Wild West, where there are lots of attempts at trying these technologies with various vendors and technology stacks, and many of those projects and pilots will absolutely fail, because by definition they're pilots, and we're still in the early phases.
One interesting thing about the study was that they saw a significant delta between companies that tried to effectively DIY their AI stack versus those going with applied solutions and use cases. And this is what we tend to find in our customer base. There was maybe an initial theory of: AI will be relatively easy to get our arms around. We can build our own AI application. We'll do all the vector embeddings of our data ourselves, we'll put it into a vector database, we'll manage the security and permissions of data access ourselves. And before you know it, a company that wanted to deploy AI in a particular workflow in their enterprise might have 10 or 15 different pieces of software to run and manage before a single user could actually interact with AI within that organization. That's probably an architecture that's not going to work. You need purpose-built solutions that solve tailored use cases. Those can be very big use cases, like all of AI coding, but you probably don't want to be in a position where you have to bootstrap this or build it all out yourselves. And that was one of the recognitions in the survey. So I wholeheartedly disagree with any conclusion other than: you have to get your use cases right, you have to target the most effective areas for AI, and you probably shouldn't be building this technology yourselves.
And it's empirical on our end. We get to talk to customers every single day who are seeing the immediate gains. We've talked to customers who have had colleagues that can't present the actual expected ROI savings to their board, because the board won't believe the numbers based on how good they are. So they actually have to water them down to something more pragmatic and believable than what they're seeing.
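The "DIY stack" Levie describes can be made concrete with a sketch. This is a toy, self-contained Python stand-in for the pieces a team would have to own: chunk-level embeddings, a vector store, permission-aware retrieval. The hash-based "embedding" and everything else here are illustrative stand-ins; a real build swaps in an embedding model, a vector database, and an access-control system, which is exactly the operational burden being described.

```python
# Toy sketch of a DIY retrieval stack: embed documents, store vectors,
# filter by permissions, then rank by similarity. All components here
# (especially the hash-based "embedding") are stand-ins for the real
# systems a team would have to run and manage.
import hashlib
import math

def embed(text: str, dims: int = 8) -> list[float]:
    """Toy embedding: hash character trigrams into a fixed-size vector."""
    vec = [0.0] * dims
    for i in range(len(text) - 2):
        h = int(hashlib.md5(text[i:i + 3].lower().encode()).hexdigest(), 16)
        vec[h % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

class VectorStore:
    """Stand-in for a vector database with per-document permissions."""
    def __init__(self):
        self.rows = []  # (doc_id, vector, text, allowed_users)

    def add(self, doc_id, text, allowed_users):
        self.rows.append((doc_id, embed(text), text, set(allowed_users)))

    def search(self, query, user, k=1):
        qv = embed(query)
        # Permission check happens before ranking, so a user never
        # retrieves content they cannot access.
        visible = [r for r in self.rows if user in r[3]]
        ranked = sorted(visible, key=lambda r: cosine(qv, r[1]), reverse=True)
        return [(r[0], r[2]) for r in ranked[:k]]

store = VectorStore()
store.add("hr-1", "Vacation policy: employees accrue 15 days per year.", {"alice", "bob"})
store.add("fin-1", "Q3 invoice totals and payment terms for vendors.", {"alice"})

print(store.search("invoice payment terms", user="alice"))
print(store.search("invoice payment terms", user="bob"))  # fin-1 is filtered out for bob
```

Even this toy has an embedder, a store, and an authorization layer to maintain; the point of the passage is that multiplying that by 10 or 15 production-grade components is what sinks DIY projects.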
>> Isn't that a terrible board? I mean, if the board can't hear the truth, get a new board.
>> But the truth is so good that it doesn't sound credible. That's the situation: the ROI is so good that you aren't going to be believed when you explain how the thing is going to work. So we're seeing examples all across the board, at least for our customers. We have the benefit of a very applied use case: we take documents and unstructured data, and then we have AI agents that can operate on that data to do things like extract structured data from your documents. Give us 100,000 contracts and we'll pull out the structured data fields in those contracts. Or give us invoices and we'll pull out the key details so we can help automate a workflow. Those use cases tend to be very high ROI, because either you weren't getting that data before or it used to be very expensive to get. And AI is getting increasingly good at executing that kind of task. So there's immediate benefit to customers: you can automate workflows much more easily, and as a result you can lower the cost of operations in some areas. So we tend to see a different set of outcomes within our customer base. But if you zoom out and think about all projects across the past couple of years, I do think you're going to get a mixed bag, just as a reality of how early we are in the space.
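The extraction use case above, turning free-form invoices into structured fields a workflow can consume, can be sketched as follows. In Levie's description an AI agent does the extraction; the regexes below are a crude stand-in used only to show the shape of the structured output, and the field names are illustrative.

```python
# Sketch of document extraction: pull structured fields out of
# unstructured invoice text. Regexes stand in for the AI model here;
# the point is the structured record a downstream workflow consumes.
import re

def extract_invoice_fields(text: str) -> dict:
    """Return a structured record from free-form invoice text."""
    patterns = {
        "invoice_number": r"Invoice\s*#?\s*(\w[\w-]*)",
        "total": r"\bTotal[:\s]*\$?([\d,]+\.\d{2})",   # \b avoids matching "Subtotal"
        "due_date": r"Due\s*(?:date)?[:\s]*(\d{4}-\d{2}-\d{2})",
    }
    record = {}
    for name, pattern in patterns.items():
        m = re.search(pattern, text, re.IGNORECASE)
        record[name] = m.group(1) if m else None  # None marks a missing field
    return record

invoice = """ACME Corp
Invoice #INV-2041
Due date: 2025-10-01
Subtotal: $4,500.00
Total: $4,725.00"""

print(extract_invoice_fields(invoice))
```

A workflow engine would then route on these fields (e.g., auto-approve invoices under a threshold), which is where the ROI described in the passage comes from.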
>> Yeah. And it says internal builds fail at double the rate of external partnerships. So, spot on there. People trying to piece this together on their own versus going external are having a tough time, which sort of flies in the face of the conventional wisdom. I think the conventional wisdom was that you wanted to build internally, maybe with open source, so you could customize to your use case, but it turns out some of the off-the-shelf stuff is actually working quite well.
>> Yeah. A lot of the challenge with these types of surveys, or even with talking about architectures, is that you have to separate the tech industry from the non-tech industry: the non-tech industry being the consumers of these types of technologies, and the tech industry being the builders. Open source is insanely valuable, but not in the sense that a law firm should go off and build its own AI project using an open source model. That is just a recipe for disaster if we think that every single company on the planet is going to build its own technology to automate its workflows. And that has actually been the case for a lot of pilots, because we've been early in the technology and you haven't had applied solutions you could deploy. But open source is extremely valuable for a company like Box, because we're powering technology for 120,000 customers, and we actually do have the expertise internally to leverage those kinds of capabilities. So I would say the conclusion from the open source dimension is: you probably shouldn't expect that every company on the planet is going to DIY its own AI strategy, and that's a recipe for not getting the returns and gains from AI adoption.
And maybe the final thing I'd point out is that there really is a decent amount of change management required to get real gains from AI. This is not a panacea type of solution where you can take an existing workflow, drop AI directly into it, and all of a sudden that workflow will be 3x better. You usually do have to re-engineer the work to take advantage of AI. The conclusion I've come to more and more recently is this: I think we had a feeling maybe two or three years ago that AI was going to learn everything about how we work, that it would be able to adapt to our workflows and then bring automation to them. Realistically, I increasingly think we will have to modify our work, hopefully incrementally, but in some cases meaningfully, to fully take advantage of AI. That sounds hard on one hand, but for the companies that do it, the ROI is going to be fairly massive.
If you think about AI coding as maybe the most obvious example right now where you're seeing productivity gains, the way AI-first engineers tend to work is pretty different from how you engineered two or three years ago. The engineer really becomes more of a manager. You're deploying agents to go off and work on large parts of the codebase, and then they come back with a bunch of work that you go and review. So as an engineer you have to change your workflow to take advantage of background agents: give them the right kinds of prompts to execute their tasks, think about your codebase in new ways, and handle the specifications and rules of what the AI agent should do. If you don't do all of that work, you're probably not going to get a 2x or 5x gain from AI. We will actually have to re-engineer some of our business processes to make agents effective, as opposed to thinking agents will just drop into our processes and automate everything we're doing.
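The re-engineered workflow described here, where the engineer writes specs, dispatches background agents, and then reviews what comes back, can be sketched in a few lines. Everything below is hypothetical scaffolding: `run_agent` is a stub standing in for a real coding agent, and `review` is a placeholder for the human review step.

```python
# Sketch of the spec -> dispatch -> review workflow: the engineer acts
# as a manager of agents rather than writing every change by hand.
from dataclasses import dataclass, field

@dataclass
class Spec:
    task: str
    rules: list[str] = field(default_factory=list)  # constraints the agent must follow

@dataclass
class AgentResult:
    spec: Spec
    output: str
    approved: bool = False

def run_agent(spec: Spec) -> AgentResult:
    """Stub for a background agent executing one spec against a codebase."""
    return AgentResult(spec, output=f"[draft implementation for: {spec.task}]")

def review(result: AgentResult) -> AgentResult:
    """The human step: inspect the agent's output before merging.
    Toy placeholder: approve anything non-empty; a person does this in reality."""
    result.approved = bool(result.output.strip())
    return result

specs = [
    Spec("add retry logic to the upload client", rules=["no new dependencies"]),
    Spec("migrate config parsing to TOML"),
]
# Dispatch all the agents, then spend the saved time on review.
reviewed = [review(run_agent(s)) for s in specs]
print([r.approved for r in reviewed])
```

The design point matches the transcript: the human effort shifts from writing the implementation to writing good specs and reviewing results.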
>> By the way, you've brought up pilots a couple times, and I think it's important to talk about, because this study was not just pilots. It was 95% of organizations get zero return on AI investment. I think the pilot thing is interesting because it's natural that pilots are going to fail. In fact, some listeners have given me feedback, because I talk often about how only 10 to 20% of AI pilots get out the door into production, saying that might be a good number, because you're obviously going to have some trial and error in the early days.
>> Yeah. And to be clear, I'm using "pilots" colloquially, in the sense that we're just so early in the technology that when we talk to customers, a lot of what they have deployed so far is the equivalent of a pilot.
>> Organization-wide.
>> Well, organization-wide, it's hard for one centralized survey taker to represent an organization. That's why, again, the survey is great as an interesting conversation starter, but if you actually tried to assess how the respondent is answering the question, what their way of measuring productivity is, and whether they've actually surveyed all the end users who are just using ChatGPT in an unsanctioned way, it's not possible to capture all of that. So it tends to represent the centralized, more likely pilot-oriented type of projects, because of, again, how early we are. The word "agents" just came onto the scene less than a year ago. So we're just early in a lot of these spaces. But again, I think it's a fantastic survey, because it gets a conversation going.
But if the takeaway were to slow down on using AI, or to do anything other than realize what you should mitigate from a risk standpoint, then the problem is that all it's going to do is cause some companies to move even more slowly, and you'll have other companies just outrun them. So the risk is now on the listener to decide what they want to do about that survey.
>> Yeah. And I can tell you one more thing I found super interesting about this study, which has sort of been underappreciated. It says official LLM purchases cover only 40% of firms, yet 90% of employees use personal AI daily, at least among those surveyed. That's so interesting, because it means there's more personal use and more interest among individuals than among companies to get this stuff into production. You obviously have a reaction here, so let's hear it.
>> Well, I just think that's empirical revealed preference. You don't even have to survey once you know that. Why are people going off and using AI in a personal-productivity sense at that rate? Because they're getting value from it. That is now in the baseline of how people are working. It's unquestionable that if you just eliminated AI today, you would notice: wow, I actually have to go and do the three hours of research that I used to be able to kick off as a deep research project and check back in on after five minutes. So it's empirical that we're choosing to use these technologies on a daily basis, because they're adding that productivity. I would argue that what we've seen with AI thus far is barely scratching the surface of what's going to start to happen as you deploy these technologies.
>> But do you think the use in business could potentially just be individuals using, let's say, ChatGPT on their own, versus scaled enterprise use of large language models? Or do you think it will be some blend in the future? You're obviously watching this happen on the other side of things.
>> I think we are in the earliest phases of even the diffusion of the technology itself, of the basic use cases. Hey, when you're going to research a customer, why don't you get a full account plan, instead of just saying, okay, this person works at this company, they're interested in these things, and these are the trends in that industry? Why not ask an AI system to generate the full account plan? That's super powerful, but also relatively basic if you think about how people work and the full scope of workflows people do.
One really interesting example of how early we are: Claude this week announced a new capability that will generate files for you. Even though we're nearly three years into the ChatGPT moment, it's the first time, I believe, that an AI system can reliably generate a high-quality document in the form of a Word document or PowerPoint presentation. So we're nearly three years in, and it's the first time ever you could generate something you would look at and say, "Oh, that looks like a good presentation." So we are only at the very beginning stages. It'll still take a couple of years, but now imagine a technology like that begins to ripple through corporations. In the future, before you go and present whatever product you're selling to a customer, instead of spending an hour or two doing a bunch of research and making the PowerPoint file that's your presentation, you go to an AI agent and say, "I'm about to go sell to this customer, generate this presentation for me." You kick that off, and three minutes later it's done for you. This is going to show up in all of our workflows every single day, in almost everything we're doing.
Coders are getting the first lens into what the future looks like, earliest, because they're wired to take advantage of these tools, and AI coding has been the first breakout use case. But that same dynamic, where you go to an interface, talk to an agent, and it executes multiple steps of work for you, will start to emerge within all of knowledge work over the coming years. I'm probably a pragmatist in the sense that it will not be an instant, overnight transformation of work. It will take years of change management. We just hosted our conference this week, as you noted, and it happens to be a crowd that is, by definition, forward-leaning and full of early adopters of technology, but that represents a small fraction of the total economy. It will take years before all of the banks, all of the pharma companies, all of the law firms get wired up in this AI-first way. But unequivocally it's going to happen, and there's nothing that will slow that train down.
>> All right, let's talk a little bit more about this using Claude to generate documents use case. The example you gave was using one of these to sell into a client. Now, I would imagine most organizations have their PowerPoint templates with the data baked in. So even if I were to go into Claude and upload my pricing spreadsheet, my inventory spreadsheet, and a document about positioning, and say, "Make a PowerPoint based off of this," I'm sure it would do a good job. But how practical is it to say this is going to be the way people do their work, versus something that might look like a party trick, where you're going to use the documents you already have when you actually go out into the market?
>> Oh, I know the way this will actually show up, and I can't represent the exact date it will happen, but you'll just go to Box and say, "Here's my sales presentation template, here's the new client information, please generate a PowerPoint presentation with that." You'll just do that with your existing data. This is not some kind of one-off, vibe-coded document. You will use your existing assets as the source material for the next document you generate, and you'll go and review its work. That will take you three minutes, but it will have saved you an hour or two of all the time it took to do the customer research, move around all the graphics, and put the relevant information in place. That will just be done for you. Multiply that over a million people doing it per day in some sector of the economy, and that's how you get tens of millions of hours of productivity gained within the economy.
>> And how are you feeling about the trustworthiness of these models? You've talked a couple times now about how you could use deep research to prepare you for something, or use these models to generate a PowerPoint, and then spend a couple minutes checking them over. Are you at the point now where you think the outputs of these models are trustworthy enough that that's all it takes?
>> I think so, as long as, and this is where I get very excited about what's now in the zeitgeist, context engineering, you are really good about what context you're giving the AI, how you are grounding the AI in trustworthy data, with the right kinds of prompts and a high enough quality model. Then you can nearly eradicate, if not all, then the vast majority of hallucinations or accuracy issues. In our case, everything we do at Box treats your existing data as the source material, the source context, for the AI agent to be effective. So if I take an existing PowerPoint document that's our sales presentation and say, "Modify this for a new customer," and you do that with a frontier reasoning model with some degree of thinking mode, I would posit that 99% of the time it's going to make only infinitesimally small errors or failures. That's just a solved problem at this point. And it is still easily worth the five-minute trade-off, for the couple of hours you save, to go and review its work.
We actually have this incredible front-row seat in watching what the future looks like with coding. Talk to the brand-new startups. I don't know if you do this, but go talk to a five-person startup that's brand new, and what's exciting is they are working in the craziest ways I've ever seen in my entire life. I was talking to a nine-person startup the other day that estimates they're executing, at a minimum, at the size of about a 100-person company, and that was probably conservative when you do the underlying math. It's because each of their engineers now has the output capacity of 5 or 10 or 20 engineers' worth of work, but they are working in a completely different way. They are managers of AI agents. They spend their time writing really good specs for what they want to build, they spend good time on the design architecture of their software, and then they spend a lot of time reviewing the output of the agents.
Not every area of knowledge work will look exactly like that. But imagine in sales, in marketing, in legal work, your role is to manage agents that are doing a lot of the underlying data preparation, research, and creation work, and your job is to review that work and put it together in a broader business process. That will actually be what a lot of work looks like in the future. And this idea of hallucinations or errors will be no different from the fact that I sometimes have to review other people's work, and other people review my work. I have errors in the presentations I create that somebody catches: they see a misspelling, or they see that I changed the name of a customer in the wrong way, and they fix it. We will be doing that for AI agents. It's this flip of the model: we thought AI agents were going to review our work and incrementally make us more productive. Instead, we will be the reviewers of the AI agents' work. We will be the editors, the managers, the orchestrators. And that's actually how you then get the productivity gains. So I'd say watch the AI coding space, watch what startups are doing to get leverage, and then think about that against the broader economy.
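The "context engineering" idea Levie describes, grounding the model in existing, trusted documents rather than asking it cold, amounts to careful prompt assembly. Here is a generic sketch of that pattern; the prompt layout, section labels, and all the example content are hypothetical, not any specific product's format.

```python
# Sketch of context engineering: pin the model to trusted source
# material and explicit rules before it generates anything. The layout
# is a generic pattern, not a specific product's prompt format.
def build_grounded_prompt(task: str, sources: dict[str, str],
                          rules: list[str]) -> str:
    """Assemble a prompt that grounds the model in named source documents."""
    parts = ["You are drafting a document. Use ONLY the sources below."]
    for name, text in sources.items():
        parts.append(f"--- SOURCE: {name} ---\n{text}")
    parts.append("RULES:\n" + "\n".join(f"- {r}" for r in rules))
    parts.append(f"TASK: {task}")
    return "\n\n".join(parts)

prompt = build_grounded_prompt(
    task="Adapt the sales deck for Northwind Traders.",
    sources={
        "sales_deck.pptx": "Slide 1: Why Box. Slide 2: Pricing tiers.",
        "account_notes.md": "Northwind cares about compliance and EU hosting.",
    },
    rules=["Do not invent pricing.", "Flag anything not supported by a source."],
)
print(prompt)
```

The accuracy claim in the passage rests on exactly this: the smaller and more trustworthy the context, and the stricter the rules, the less room the model has to hallucinate.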
>> You know, it's really interesting, Aaron, because the last time we spoke, you told me about this person you knew who was basically building a company on their own using AI coding tools. I was in the process of writing this profile of Dario at Anthropic, which you're quoted in, and I went out and found a developer doing something quite similar, using Claude Code to build on their own. So this is clearly, I mean, to the point where Anthropic now has to put rate limits on, this is clearly a thing that's happening.
>> Well, and this is the thing, again: I love the MIT survey. I think it's great, it's a fun conversation topic. But the one travesty would be if people miss that what you just said is actually happening on the ground, and don't start paying attention to what that's going to mean as it ripples through corporations, and to how people should probably start thinking about re-engineering workflows for a world of AI agents. This happens in every single technology wave, which is actually why you have early adopters and early innovators and why you have laggards. The early adopters and innovators are going to read your Anthropic piece and see, oh, this actually is a real trend. And the laggards are going to read the MIT piece and say, "Oh, I've been vindicated." Some companies will then get those early returns at a much faster rate, and other companies can wait. Sometimes that means your company gets disrupted, and sometimes it doesn't, because you actually have some proprietary capability as an organization. If Pfizer or Eli Lilly took a little bit longer to adopt AI as a result of being more pragmatic, that'll be totally fine. They're not going to get disrupted; they have enough market position and enough distribution that they can afford to wait for this technology to be more baked. But if I'm a startup right now, I'm probably going to use that as my advantage as much as possible, to try to run circles around a larger incumbent. And this is what creates this nice tension in the market, the creative destruction in every wave of technological change.
>> Okay. I definitely want to talk a little more about what the definition of an agent is, how you're rolling them out at Box, and also get your reaction to GPT-5. So let's do that right after this. And we're back here on Big Technology Podcast with Box CEO Aaron Levie. Aaron, before we get into agents and before we get into GPT-5, let me just start with a basic question. If this is already happening in business, basically finding ways to get the AI to do work on its own, pull information from different data sources, and present it coherently, why do you think it's been so difficult for consumer companies, like Amazon with Alexa Plus and Apple with Apple Intelligence, to put this together as something on-device, a consumer product that does similar activities? Because they've all promised it, but it's not quite there yet.
>> Yeah. I think the fact that the technology can exist is different from the execution requirements to bring it to life. We all get to have a front-row seat on what the frontier models can do, and you have companies that can package those up for applied use cases. But if you're a company with tens of millions or hundreds of millions of users of your product, and consumers that have a certain expectation, there is a lot of execution gap required to go from the frontier model to delivering it to your end customer in a way that is reliable, trustworthy, and affordable. So I think the bigger companies are all going through their own version of that motion. I'd also imagine that, given the space is moving so fast, I can sympathize with probably some degree of indecision: one day a model is on top, the next day a different model is on top, and another day yet another model breaks through. So you probably want to make sure that by the time you land on a final architecture, it's the sustainable, long-term architecture. To some extent, time is on your side, up to a point, because you might want to wait to see who falls out and who keeps going. But, as an example, I don't think the spaces of the companies you just mentioned have been so utterly disrupted that they can't catch up once they land on a final architecture. We'll have to see how they execute through this.
>> And so for business, is it more that there are more prescribed use cases? I think with a phone, maybe if you're trying to get these proactive notifications, you're looking at a massive universe of data, whereas in business you're more concentrated. Or what's the difference?
>> Well, actually, I wouldn't say there's a difference. I would say even in business we're insanely early. We have to process how early we are. The breakouts so far have been ChatGPT for consumers, coding agents for very wired-in engineers who are very online and paying attention to everything going on, and then early adopters across the economy. Most of the agents being deployed in the enterprise are being done by, maybe you can flash it up or something, Geoffrey Moore came up with this idea of the technology adoption curve, or at least popularized it. It has multiple categories of where a company or a group of individuals will be. You have these early innovators and early adopters. Then you have a chasm. Then you basically have the pragmatists and the early majority, and then you have the laggards. And we are in the early adopter phase, the earliest phase of jumping over the chasm, on some use cases. But we have to remember there's this chasm, where what happens is the early adopters, the people that we all hang out with and talk to all day, will try everything. We'll try the crazy goggles, we'll put magnets on our heads, we'll do the craziest things. We'll wear Google Glass. And that actually tells you almost nothing about whether the thing will jump over the chasm. You have to see what makes it to the early majority, the pragmatists who really adopt things at scale. The kinds of technologies that have clearly broken through are ChatGPT, products like Cursor, and a bunch of these next-gen research agent type things; Perplexity has done well in that early majority. But we are so early in terms of AI agents jumping over the chasm. Some won't make it, some will. I would say that business is not particularly moving faster than the examples you just gave. We can see lots of examples of it, but they're usually in that early adopter category.
>> Right. And so the week we're talking, you at Box are releasing a number of different agents. But let me start this discussion by just asking you: what is an agent? Because it does seem like an overused term, and even myself, who's in this all the time, I don't fully have clarity on what that word actually means.
>> I think we should anticipate that it's fully overused. It is now the new term of art for talking to an AI system that is doing work for you. This will be the main term that we use going forward as an industry, and not because it's a buzzword; it's actually a useful term. It's a definable object that is doing automated work for you. That could be, in some cases, as simple as answering a question, but I think most people in the tech industry would generally argue that it should be doing some degree of work and looping through the AI model multiple times to do that work. So that could be everything from, very clearly, something like Claude Code or Cursor has an agent, or Replit has an agent, where you give it a task like "build me a website that has these qualities" and it will go off and do weeks' worth of human work in 10 minutes. That's an agent: it's managing that whole process, looping through the model multiple times, keeping track of what it's doing, updating its memory as it goes. So that's an agent in coding, and we're going to see that same kind of agent architecture emerge in law, in healthcare, in finance, and in education, where you can deploy agents to go off and do work for you. And there will be a critical axis, which is how much work the agent can do before you have to intervene, modify, and repoint it in the right direction. A lot of that work right now can be maybe a couple minutes long, but we're seeing examples where agents can run for tens of minutes or maybe even hours and effectively drive better and higher-quality output. So I think that's a way to think about agents, and these are going to be very pervasive in the coming years, but 2025 is really the first year where we could even be talking about it seriously. I think Andrej Karpathy probably phrased it as: we shouldn't think about this as the year of agents, we should think about it as the decade of agents. That's probably the right way to think about it.
>> The year of mobile became the decade of mobile. But then eventually we started using mobile.
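The loop Aaron describes, where an agent takes a task, calls the model repeatedly, uses tools, and carries a running memory until the work is done, can be sketched in Python. Everything below (the `call_model` stub, the tool names, the stop condition) is a hypothetical illustration of the pattern, not any vendor's actual API:

```python
# Minimal sketch of an agent loop: task in, repeated model calls with
# tool use and a running memory, until the model signals it is done.
# The "model" here is a hard-coded stub standing in for a real LLM API.

def call_model(task, memory):
    """Stub model: plans one step per call, then finishes.

    A real agent would send `task` plus `memory` to an LLM and parse
    its reply into an action; here the replies are scripted."""
    if not memory:
        return {"action": "use_tool", "tool": "search", "args": "site layout ideas"}
    if len(memory) == 1:
        return {"action": "use_tool", "tool": "write_file", "args": "index.html"}
    return {"action": "done", "result": "website scaffold created"}

TOOLS = {
    "search": lambda query: f"found 3 references for '{query}'",
    "write_file": lambda name: f"wrote {name}",
}

def run_agent(task, max_steps=10):
    memory = []                      # running record of everything the agent did
    for _ in range(max_steps):       # cap the loop so a confused agent halts
        step = call_model(task, memory)
        if step["action"] == "done":
            return step["result"], memory
        observation = TOOLS[step["tool"]](step["args"])
        memory.append((step["tool"], observation))   # fed back on the next turn

    return "step budget exhausted", memory

result, memory = run_agent("build me a website")
print(result)        # -> website scaffold created
print(len(memory))   # -> 2 tool calls recorded
```

The `max_steps` cap is the "critical axis" in miniature: it bounds how much work the agent does before a human has to look at the memory and repoint it.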
>> Yeah. But again, when did people say it was the year of mobile? Some people said that in 2002, but it wasn't really realistic until 2006 and 2007, when you had the iPhone. I think, and fairly many other people are convinced of this, we already have our iPhone for agents. We don't need any kind of new breakthrough architecture. We have an architecture that already works as the core scaffolding for agents. So we can start the decade clock now. But it will be a full self-driving type problem. Waymo got kicked off, I don't know, a decade, decade and a half ago, and only this year is it accessible in suburban Silicon Valley. What took a decade or a decade and a half was just lots of engineering work, lots of miles on the road, lots of improving every single dimension of the accuracy and intelligence of the system. We're going to see the same thing for knowledge work. It's going to take years. The early adopters will get the early returns. The pragmatists will use it once it works without a lot of handholding, and everybody will land somewhere in the middle of that spectrum.
>> Okay. And so I watched a chunk of your presentation this week, and some of the agents that you're talking about enabling companies to deploy will do things like take a look at an application to, say, rent an apartment, or look at some property records and then do tasks there, or create reports looking at clinical tests and trying to pull out issues. So talk a little bit about how the process to create these works. Is this still in the demo phase, or is this actually real?
>> Maybe the second question first. We made a number of big announcements this week. Some of the products and capabilities we announced are fully GA right now, so customers can already start using them. For some of it, we gave a bit of a crystal-ball view into the next couple of quarters of product that we're getting out there. As an example, we have an AI agent right now that any customer can go and use, which is a data extraction agent. You can give us contracts or invoices or medical data, and we have an AI agent that works through that content, pulls out the critical data from those documents, and then lets you automate a workflow around that. What we announced at Box Works was a new capability called Box Automate. The idea behind Box Automate is that it's very powerful to have one-off agents that can help you review a document, or generate a proposal, or generate a sales plan for a client based on data. That's super powerful. But what's even more powerful is if I can drop many of those agents into a full business process. What Box Automate lets you do is actually define your business process within Box. It could be a client onboarding workflow, an M&A due diligence review process, a healthcare patient review process. You define that workflow within Box Automate, which is a drag-and-drop workflow builder, and then at any point in the process you can bring in an AI agent to do work within that process. One thing that is very important with AI agents is that they need the right context to be effective. Our system allows you to get that context to agents from your enterprise content: your marketing assets, your research data, your contracts, your invoices all become very important context for agents. So Box Automate lets you build these agents on demand, or on the fly, in a workflow that leverages your existing content, and then we can start to help you automate a bunch of knowledge work tasks around
the enterprise.
>> Now, a lot of the early reviews of GPT-5 said it was sort of built to do these types of things, like a foundational layer for this type of work, right? The reviews we read early on said it just does stuff, and people have noticed that when you're in ChatGPT using GPT-5, you literally can't get an answer where it doesn't ask, "can I do something for you?" So I'm actually curious what your response has been. The last time we spoke was pre-GPT-5. What has your feeling been about this new set of models, and it really is a set of models? And I'm curious what you make of the fact that so many people were disappointed early on.
>> Well, on the disappointment, or the online zeitgeist, which interestingly has already shifted quite a bit, a lot of folks have updated their views on GPT-5, and I think Codex has come out very strong recently on the coding agent side. I think we have gotten used to, and been hooked on, these incredible jumps and breakthroughs over the past year or so. If you think about it, we went from GPT-4 to GPT-4o to o1 and o3 and then GPT-4.1, and each of those, on a different axis, was actually a pretty meaningful step function. So if you had just taken GPT-4 and then jumped to GPT-5, it would have looked insanely exponential. But we got these points along the way that effectively gave us an early preview of what GPT-5 would ultimately become, which is a thinking model with chain of thought, with much higher-quality coding skills, and a bunch of capability improvements on critical dimensions of work. So I think the reaction was mostly driven by the fact that we got lots of incremental steps, or step-function steps, on the path to GPT-5, and GPT-5 was just the culmination of a lot of those breakthroughs. So again, I think it's probably more psychological than empirical. If we had gone from three to four to five, it would be the most vertical axis we've ever seen, but it was really those steps along the way that maybe caused a bit of that reaction. In our world, we test every single model on a number of evaluations where we give the model different types of enterprise data: contracts, financial documents, research materials, internal memos, those types of things. And we ask the model a series of questions about that document or data. We saw meaningful improvements from GPT-5 versus GPT-4.1, as an example, on our evals. For us it was multiple points of improvement on a number of our key tests. And those improvements translate into real-life improvements for customers: all of a sudden, when you're a healthcare provider using GPT-5 on unstructured healthcare data, you're going to get better results than you got before, or when you're using it on your contracts, you're going to get better results, and so on. In a number of spaces where expert analysis was required, in healthcare or law or financial services, we saw improvements; and in a more general sense, if you needed logic or reasoning or math, it was also an improvement on those dimensions as well.
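The kind of eval Aaron describes, feeding a model enterprise documents, asking a fixed set of questions, and scoring the answers, can be sketched as follows. The documents, questions, stub model, and containment grader are all invented for illustration; Box's actual eval suite and grading criteria are not described in detail here:

```python
# Sketch of a document-QA eval: for each (document, question, expected answer)
# case, ask the model and score the reply. `ask_model` is a stub; a real
# harness would call an LLM API and likely use a stricter grader.
import re

CASES = [
    {"doc": "Term: 24 months. Renewal: automatic unless cancelled.",
     "question": "What is the contract term?",
     "expected": "24 months"},
    {"doc": "Q3 revenue was $4.2M, up 12% year over year.",
     "question": "What was Q3 revenue?",
     "expected": "$4.2M"},
]

def _words(text):
    """Lowercase word set, keeping digits and $ . % so figures survive."""
    return set(re.findall(r"[a-z0-9$.%]+", text.lower()))

def ask_model(doc, question):
    """Stub model: returns the sentence with most word overlap with the question."""
    sentences = doc.split(". ")
    return max(sentences, key=lambda s: len(_words(s) & _words(question)))

def run_eval(cases):
    passed = 0
    for case in cases:
        answer = ask_model(case["doc"], case["question"])
        if case["expected"].lower() in answer.lower():   # simple containment grader
            passed += 1
    return passed / len(cases)   # fraction of questions answered acceptably

print(f"score: {run_eval(CASES):.0%}")
```

Swapping in a new model means replacing only `ask_model`, which is what makes "multiple points of improvement on our key tests" a comparable number across model generations.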
>> Can I get a quick gut check from you on the economics of the AI industry right now? We are talking at a moment, and we just discussed this on the Friday show with Ranjan, where OpenAI's losses, sorry, its cash burn, is now going to total $115 billion through 2029, $80 billion higher than it previously expected. It's expected to make something like $10 billion this year, but it just signed a $300 billion deal with Oracle that turned Oracle into a nearly $1 trillion company almost overnight and made Larry Ellison the richest person in the world, above Elon Musk. How does this make sense?
>> Well, I think it
makes sense if you believe, like I do, and certainly others, Jensen, Sam, even Elon, I think, would believe, that this is the single biggest technology we've probably ever had access to. So think about this as sort of a third industrial revolution, where for the first time ever, we can bring automation to knowledge work. Just think about that for a second: we are bringing automation to knowledge work. Everything about the world of knowledge work was always basically limited by how fast we as humans could work. We could type into a computer, put data into a system, somebody else reads that data, it moves along in some kind of process. The speed of knowledge work was how quickly we could type or read information and then do something in the real world with that data. That was the pace knowledge work could happen at. And so every field of knowledge work that we know of: healthcare experts reading medical diagnoses, life sciences experts doing research on clinical studies, lawyers trying to find facts about a case or working through intellectual property, an engineer trying to generate code and read product specifications. All of that work has always been constrained by how fast we as individuals can do it ourselves. For the first time ever, with AI, we can bring automation to effectively all of that work. And that automation can be tuned based on just how much compute we throw at the problem, and then, of course, how good our data is and how effective our systems are at getting that data to the AI. In a world where you can toggle compute and get different levels of automation and effective output, with work done at a far lower cost than what people can do, that is the biggest breakthrough we've ever had in the economy, in the post-industrial world. And so a hundred billion dollars of loss, let's say, to get to that point of saturation where the technology is out there: that's actually a very small number when you think about the size of the economy across all of healthcare, all of law, all of life sciences, all of financial services, all of engineering. So I think that's how these technology companies are underwriting this. And the losses are a choice, to be clear. That's very obvious: they are choosing to lose that money. They're doing it for a strategic reason, at least that's their decision. The strategic reason is that this is such a valuable market to own and dominate that they would rather build up capacity and, in many cases, subsidize usage, say in free consumer tiers of ChatGPT, than charge for everything at today's cost and make sure everything is profitable. That's a choice. They could decide to charge for everything; they would get less adoption today, and it would instantly be a more sustainable business. But enough people believe the prize is big enough that it's worth doing all of the research spending, all of the data center spending, and the subsidization where necessary to drive that adoption and demand. It's a go-big-or-go-home type of bet. Clearly very smart, very economically rational firms, individuals, and sovereign wealth funds believe that bet is worth it. I'm probably on the side that the bet is worth it, because of how material an economic impact this technology can have, and then we'll see how it plays out with any individual player in the space.
>> Folks, you
can learn more about Box's offerings at box.com. There's a video playing on the homepage right now that talks a lot more about the things Aaron and I have discussed here today. Aaron, so great to see you. Thanks again for coming on the show.
>> Thanks, Alex.
>> All right, everybody, thank you so much for watching. We'll see you next time on Big Technology Podcast.