How agents will unlock the $500B promise of AI - Donald Hruska, Retool

Channel: aiDotEngineer

Published at: 2025-07-23

YouTube video id: Lqq_LcBaJCc

Source: https://www.youtube.com/watch?v=Lqq_LcBaJCc

Yes, my name is Donald. I lead the new product teams at Retool. Retool made its name in the earlier days working on internal tools, making it really easy for any business out there to build internal applications. We've been making it easy to connect with AI providers for a couple of years now, but we're now breaking into agentic AI with the release of Retool Agents, which we announced last week and made available to our customers.
So half a trillion dollars has been
spent on AI infrastructure and yet most
large companies are really just still
stuck with toy chat bots and messing
around with code generation. So, let's
talk about why that changes this year
with enterprises finally being able to
build agents with guardrails that plug
into real production systems.
Reuters reported last week that Anthropic hit $3 billion in annualized revenue at the end of May, so just a couple of days ago. That's up from $2 billion at the end of March and $1 billion in December. That's 3x-ing their annualized revenue in five months, which is some staggering growth. And that's not to mention OpenAI, which is slated to end 2025 at $12 billion in revenue, over 3x where they were at the end of last year.
These growth rates are massive, and this
largely is fueled by enterprise AI
spend.
And coding is growing. Teams love using Cursor and Windsurf, including my own; I think every engineer on my team is using one of these tools. Engineers are now becoming experts in prompting and in code review, letting LLMs do the heavy lifting of a lot of day-to-day coding. Their workflows really are completely transformed right now, and their productivity is through the roof.
If you look at OpenRouter, which gives access to a unified API that exposes hundreds of AI models, their top apps list is really dominated by code generation use cases, as you can see here.
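To make the "unified API" point concrete, here is a minimal sketch of how a single client could target different models through OpenRouter's OpenAI-compatible chat completions endpoint. The model names and placeholder API key are illustrative, not taken from the talk:

```python
import json
import urllib.request

# OpenRouter exposes an OpenAI-compatible chat completions endpoint, so one
# client can target many models just by changing the `model` string.
OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(model, prompt, api_key):
    """Build an HTTP request for a chat completion against any model
    OpenRouter exposes. Model names here are illustrative."""
    payload = {
        "model": model,  # e.g. "openai/gpt-4.1" or "google/gemini-2.5-pro"
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_request("openai/gpt-4.1", "Write a haiku about agents.", "sk-...")
print(req.full_url)
```

Swapping models is then a one-string change, which is exactly why a leaderboard of top apps on one endpoint is a useful signal of where usage is going.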
And the LLM providers are taking note. SWE-bench Verified is a benchmark that measures an AI model's ability to perform real-world coding tasks. If you look at GPT-4.1, it's up 21 percentage points from GPT-4o, really showing the investment that OpenAI is putting behind making their models work well for coding use cases. And Gemini 2.5 Pro is up another 9 percentage points from GPT-4.1. Devs are raving about Gemini 2.5 Pro; I think nearly every developer I know using Cursor is talking about how well it works.
And finally, the term "vibe coding" has firmly planted itself in the zeitgeist. Last week on the Andreessen Horowitz podcast, Rick Rubin, the legendary music producer, said vibe coding is the punk rock of software: in the same way that punk rock, with its simplicity, made it really easy for anyone who had something to say to go make a song, vibe coding is doing that now for anyone with an idea.
And vibe coding is so powerful because you just tell Cursor or Windsurf the gist of what you want, and it thinks and acts and writes that code for you. This is a lot different than basic text completions or copying code from ChatGPT into your code editor. This is agentic AI.
So vibe coding needs agents to work. But
why should we stop with this idea at
just code? Code is testable. It has
semantics. It's easy to validate and
understand if the LLM is generating it
correctly. But could we apply the same
idea to any problem in our business? And
to do that, we would need general
purpose agents.
And building the agent, believe it or not, is actually the easy part. You could build a really basic agent in about 100 lines of JavaScript or Python; I have the start of one right here. What I'm talking about here is using the ReAct framework, a framework for building agents that instructs the agent to reason, act, reason, act until it determines that it's come up with a final answer. The agent has access to tools, which are basically a set of functions. These could be external services, or code in your codebase that it's running. So effectively, an agent is just an LLM wrapped in an execution loop that can read, decide, call tools, and self-verify.
So here you see, like I said, I have the start of a basic agent. I'm defining a set of tools for that agent; in this case it has just one, a calculator, along with a function to actually calculate something. I initialize a system prompt here for the agent using the ReAct framework.
And I know there's a lot here, but basically what this is is me defining that agent loop. It's a for loop, like I'm sure many of us learned in CS 101, with a maximum number of iterations so our agent can't get stuck in a loop thinking forever, burning up our OpenAI costs. The LLM tells our logic when it decides that a tool needs to be invoked. We call that tool, pass the result back to the LLM, and it decides when a final answer has been reached. We detect that and spit it back out to the user.
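The loop just described can be sketched in a few dozen lines of Python. This is a minimal illustration of the ReAct pattern, not Retool's implementation; the scripted `call_llm` stand-in replaces a real model API call so the loop runs offline:

```python
import re

# Toy tool registry: the agent's only tool is a calculator.
# Never eval() untrusted input in production; this is illustrative only.
TOOLS = {"calculator": lambda expr: str(eval(expr))}

SYSTEM_PROMPT = (
    "Answer by interleaving lines of the form:\n"
    "Thought: your reasoning\n"
    "Action: tool_name[input]\n"
    "and, when you are done:\n"
    "Final Answer: the answer"
)

def run_agent(question, call_llm, max_iterations=5):
    """ReAct loop: reason, act, observe, with an iteration cap so a
    confused model can't spin forever burning tokens."""
    transcript = f"{SYSTEM_PROMPT}\nQuestion: {question}\n"
    for _ in range(max_iterations):
        reply = call_llm(transcript)
        transcript += reply + "\n"
        if "Final Answer:" in reply:  # the model says it's done
            return reply.split("Final Answer:")[1].strip()
        match = re.search(r"Action: (\w+)\[(.*)\]", reply)
        if match:  # the model asked for a tool; run it, feed back the result
            tool, arg = match.groups()
            transcript += f"Observation: {TOOLS[tool](arg)}\n"
    return None  # hit the iteration cap without a final answer

# Scripted stand-in for a real LLM call, so the loop runs offline.
scripted = iter([
    "Thought: I need to multiply.\nAction: calculator[6 * 7]",
    "Thought: The observation says 42.\nFinal Answer: 42",
])
print(run_agent("What is 6 * 7?", lambda _: next(scripted)))  # prints 42
```

The iteration cap is the cost guardrail mentioned above: even a confused or looping model can only spend a bounded number of calls per run.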
So building agents is easy, right? We can all just go build agents at our company and problem solved, right? Not so fast. Just like vibe coding, agents are tough to get into production, in the same way that, say, a web app you build in Cursor really quickly is tough to get into production. There are a lot of things a real enterprise company is probably concerned with here: things like single sign-on, role-based access control, and integrating with external services in a secure way. Maybe you care about audit logs. Maybe you care about compliance like SOC 2. Maybe you use AWS Secrets Manager. Maybe you are a multinational corporation and it needs to be internationalized. The list goes on, and you can't always safely vibe code these things. The Information published an article last week on the high risks of using vibe-coded logic in production, with a couple of real-world cases of vulnerabilities that were put into production by developers not carefully vetting AI-generated code.
We've also learned firsthand at Retool that there's a lot you really have to get right when you build agents. Models can hallucinate or give you unpredictable, inaccurate, or made-up results. You have to be mindful of security and conscious of the things you're giving your agent access to. You have to be cognizant of cost overruns; it can be really easy to accidentally burn up a bunch of tokens. And overall, evals are a really important safeguard here in making your non-deterministic agent as deterministic as you can.
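As a minimal illustration of that safeguard, an eval harness can be as simple as scoring an agent against a fixed set of question/answer pairs before changes ship. The `toy_agent` here is a hypothetical, deterministic stand-in for a real agent:

```python
# Toy deterministic agent standing in for a real, non-deterministic one.
def toy_agent(question):
    return {"2 + 2": "4", "capital of France": "Paris"}.get(question, "")

def evaluate(agent, cases):
    """Score an agent against (question, expected_answer) pairs and
    return the fraction it gets exactly right."""
    passed = sum(1 for question, expected in cases if agent(question) == expected)
    return passed / len(cases)

cases = [
    ("2 + 2", "4"),
    ("capital of France", "Paris"),
    ("3 * 3", "9"),  # toy_agent will miss this one
]
print(f"pass rate: {evaluate(toy_agent, cases):.0%}")  # prints pass rate: 67%
```

Real evals would use richer scoring than exact match, but even a harness this simple turns "did my prompt change break anything?" into a number you can track.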
So, how do you solve that problem? I would group the options into approximately four buckets. The first is to build your agent from scratch: you write every line of code by hand, maybe you're fine-tuning LLMs, maybe you have AI/ML engineers on your team. You have full control, but it's a high lift, because you're building all those ancillary pieces. What you get is something purpose-built. It's not outsourced; you have maximal control.
Then there's more of a middle ground: using a framework like, say, LangGraph. You still have a high level of control, for example over different memory modes. It's a medium lift, but you're tied to a pretty flexible framework. Then there are agent platforms like Retool Agents, where you get opinionated defaults and a low lift to production. Of course, you're tied to the platform, but it's useful for that long tail of business agents. The hosting is abstracted for you, and connectors to external services come out of the box, along with observability for your fleet.
Or the fourth bucket is the verticalized agent bucket. These are offerings where the agent is really dialed in for one use case: it can do one thing really well, but you have minimal flexibility to go beyond that one core use case.
So, how do you decide? Everyone wants
agents, but you have to be really
thoughtful about where you spend those
precious engineering cycles. You know,
when should you hand roll an agent
versus when would you want to consider a
managed agent platform? Ultimately, I
would say the decision boils down to an
engineering decision of trade-offs. If
you're working on something that's part
of your core product or gives your
business its competitive edge, then you
probably want to build it yourself.
If you're working with, say, regulated or sensitive data, or maybe you have hard SLAs of some sort, you might want to consider both options. But if you're building some kind of commodity workflow and you need it in days, not quarters, then I would probably buy it. As part of this, I would also do a risk assessment of either option. You know, do you want your engineers debugging business logic, or do you want them up at 2 a.m. trying to figure out why OAuth isn't working right?
As part of this decision, if you go the managed platform route, I would evaluate the breadth of connectors the offering supports. You know, are you pulling data from Salesforce and Databricks and Snowflake? Is that going to come out of the box, or do you have to build that? Is permissioning built in? Is it compliant? Does it come with audit trails? Is observability built in? Are evals built in? Or is that another vendor that you're going to have to go pay for?
And I think overall, on the build-versus-buy decision, I would think about the token costs, the infrastructure costs, and the engineering costs that come into play for building or buying. On observability, this is how we think about it at Retool for agents: it's important, with whatever platform you go with, to understand token usage, estimated costs, and runtime information for your agent. And with whatever platform you choose, you should also be able to drill into any specific agent and agent run to make sure that your fleet of agents is doing what you would expect it to.
So looking ahead, there's an analogy here, I would say, to how businesses today think about building versus buying software. Stripe, for example, is always going to have its core billing logic and its critical user-facing apps built by hand. But Stripe uses external platforms for that long tail of software. And I would expect the same for agents: I would expect businesses, as time goes on, to have a few hand-built agents purpose-built for certain use cases, and then a long tail of business use cases hosted on some kind of platform. To look again at Stripe: they use React for much of their critical customer-facing software, and they use Retool for much of their internal tooling.
Or look at Cursor. Cursor would never use a managed platform for their core product; this is their core product we're talking about. It would be slow to use a different provider, they wouldn't own it, they really need as much control as possible, and they have a lot of really smart engineers poring over every edge of that thing. But you could imagine that as Cursor the company grows, which they are, they may eventually be dealing with a high volume of, say, fighting chargebacks against their billing provider, or many customer support requests. I could imagine Cursor the company moving towards using an agent platform as they get quite large.
I've been working closely with customers like AWS on initiatives to automate mundane business processes with AI, and I've really seen the impact here. Another Retool customer, ClickUp, built their AI tooling on Retool. They saved over $200,000 in vendor costs and hundreds of thousands of dollars on additional headcount. Descript estimated that they're saving hundreds of hours of work weekly with the 50 apps they built. And in fact, we recently announced at Retool, on the topic of work automation, that our customers have automated over 100 million hours of work to date.
By doing this, we're freeing human potential for more creative and strategic endeavors. You know, people thought the printing press was going to lead to the decline of traditional knowledge, and in fact, it democratized access to information. I really do think that AI and agents are going to enable businesses to enhance the capabilities of their people and their teams. This is going to unlock limitless potential and, I would say, overall just increase the GDP of the world.
Last week, Mary Meeker's AI trends report came out, and it reported that inference cost is dropping dramatically: from 2022 to 2024, cost per token dropped 99.7%. And spend is huge, as we saw with Anthropic 3x-ing their annualized revenue in five months and OpenAI's $12 billion by the end of this year, while the marginal cost, as we can see here, is completely bottoming out.
For example, at Retool, for our cheapest
agent, we charge $3 an hour. You can
imagine that cost is going to keep
dropping.
The Meeker report also showed that Google searches for AI agents 11x-ed in the last 16 months. So, you can expect to keep hearing about agents. In closing, I would say the question isn't "what is the single golden ticket way to put everything in my business on autopilot?" It's "where can I help my engineers create the most leverage, and what's the right tool for the job?"
Thank you.
I think we have two or three minutes for questions.
Yeah.
First of all, thank you for the talk. That was really good. I was curious about this paradigm of building your own tools for core business logic, whereas for more ancillary stuff you'd look at things like Retool Agents. Was this a philosophy you basically figured out while working on this stuff internally at Retool? And if so, what's an example of Retool's core internal logic that you'd want to build yourselves, and what's something you might look to use your own product, your own agent, for?
That's a really good question. I think this is generally a philosophy at Retool; we build a lot of our own internal software on Retool. Of course, we're dogfooding as much as we can. In terms of your second question, it's a great question. Agents released last week, like I said, so we're building as much as we can on it. I think it remains to be seen what we'll do on the platform and what we'll build by hand. Our philosophy is to do as much as we possibly can using our own platform, and if we can't do something, then we should go figure out why and go build it. So for us specifically, I would say we're just going to use the platform itself for everything we possibly can.
Thanks for the question.
Hey Donald, Lance from IO. We build applications for government and NGOs and such, and I'm curious about your AI agents: do you allow your on-prem offering to include the AI agents as well?
We do. We do. So we launched cloud-only, but on-prem support is coming in the next week or two, maybe three. So yes, it is definitely going to be supported on-prem, and also eventually for our air-gapped customers as well.
Thank you.
Any other questions?
Cool. Well thank you everyone.