How agents will unlock the $500B promise of AI - Donald Hruska, Retool
Channel: aiDotEngineer
Published at: 2025-07-23
YouTube video id: Lqq_LcBaJCc
Source: https://www.youtube.com/watch?v=Lqq_LcBaJCc
Yes, my name is Donald. I lead the new product teams at Retool. Retool made its name in the earlier days working on internal tools, making it really easy for any business out there to build internal applications. We've been making it easy to connect with AI providers for a couple of years now, but we're now breaking into agentic AI with the release of Retool Agents, which we announced last week and made available to our customers. So half a trillion dollars has been spent on AI infrastructure, and yet most large companies are still stuck with toy chatbots and messing around with code generation. So let's talk about why that changes this year, with enterprises finally being able to build agents with guardrails that plug into real production systems.

Reuters shared last week that Anthropic hit $3 billion in annualized revenue at the end of May, just a couple of days ago. That's up from $2 billion at the end of March and $1 billion in December. So that's 3x annualized revenue in five months, which is some staggering growth. That's not to mention OpenAI is slated to end 2025 at $12 billion in revenue, over 3x where they were at the end of last year. These growth rates are massive, and they're largely fueled by enterprise AI spend. And coding is growing. Teams love using Cursor and Windsurf, including my own; I think every engineer on my team is using one of these tools. Engineers are now becoming experts in prompting and in code review, letting LLMs do the heavy lifting of a lot of day-to-day coding. Their workflows really are completely transformed right now, and their productivity is through the roof. If you look at OpenRouter, which gives access to a unified API that exposes hundreds of AI models, their top apps list is dominated by code generation use cases, as you can see here. And the LLM providers are taking note.
SWE-bench Verified is a benchmark that measures an AI model's ability to perform real-world coding tasks. If you look at GPT-4.1, it's up 21 percentage points from GPT-4o, really showing the investment that OpenAI is putting behind making their models work well for coding use cases. And Gemini 2.5 Pro is up another 9 percentage points from GPT-4.1. Devs are raving about Gemini 2.5 Pro; I think nearly every developer I know using Cursor is talking about how well it works. And finally, the term vibe coding has firmly planted itself in the zeitgeist. Last week on the Andreessen Horowitz podcast, Rick Rubin, the legendary music producer, said vibe coding is the punk rock of software: in the same way that punk rock, with its simplicity, made it easy for anyone who had something to say to go make a song, vibe coding is doing that now for anyone with an idea. And vibe coding is so powerful because you just tell Cursor or Windsurf the gist of what you want, and it goes, and it thinks, and it acts, and it writes that code for you. This is a lot different than basic text completions or copying code from ChatGPT into your code editor. This is agentic AI. So vibe coding needs agents to work. But why should we stop with this idea at just code? Code is testable. It has semantics. It's easy to validate and understand whether the LLM is generating it correctly. But could we apply the same idea to any problem in our business? To do that, we would need general-purpose agents. And building the agent, believe it or not, is actually the easy part. You could build a really basic agent in about 100 lines of JavaScript or Python. I have the start of one right here. What I'm talking about is using the ReAct framework, which is basically a framework for building agents that instructs the agent to reason, act, reason, act, until it determines that it's come up with a final answer.
The agent has access to tools, which are basically a set of functions. These could be external services it's calling, or code in your codebase that it's running. So effectively, an agent is just an LLM wrapped in an execution loop that can read, decide, call tools, and self-verify. So here, like I said, I have the start of a basic agent. I'm defining a set of tools for that agent; in this case, it has one, a calculator, along with a function to actually calculate something. I initialize a system prompt for the agent using the ReAct framework. I know there's a lot here, but basically what I'm defining is that agent loop. It's a for loop, like I'm sure many of us learned in CS 101, with a maximum number of iterations so our agent can't get stuck in a loop thinking forever, burning up our OpenAI costs. The LLM tells our logic when it decides that a tool needs to be invoked. We call that tool, we pass the result back to the LLM, and it decides when a final answer has been reached. We detect that and spit it back out to the user. So building agents is easy, right? We can all just go build agents at our company, and problem solved, right? Not so fast. Just like vibe coding, agents are tough to get into production, in the same way that a web app you build in Cursor really quickly is tough to get into production. There are a lot of things a real enterprise company is probably concerned with here: things like single sign-on, role-based access control, and integrating with external services in a secure way. Maybe you care about audit logs. Maybe you care about compliance like SOC 2. Maybe you use AWS Secrets Manager. Maybe you are a multinational corporation, and it needs to be internationalized. The list goes on, and you can't always safely vibe code these things.
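The loop described above can be sketched in a few dozen lines of Python. This is a minimal illustration, not Retool's actual code: `fake_llm` is a scripted stand-in for a real model call, and the `Action:`/`Final Answer:` text format is one common ReAct-style convention, chosen here for readability.

```python
import re

# Tools the agent may call; here just a calculator, as in the talk's example.
def calculate(expression: str) -> str:
    # eval with empty builtins is fine for a demo; a real agent
    # should use a safe expression parser instead.
    return str(eval(expression, {"__builtins__": {}}, {}))

TOOLS = {"calculator": calculate}

# Stand-in for a real LLM call. A scripted fake lets the loop run offline:
# it first "decides" to use the calculator, then emits a final answer.
def fake_llm(messages):
    last = messages[-1]["content"]
    if "Observation:" not in last:
        return "Thought: I should compute this.\nAction: calculator: 17 * 24"
    observation = last.split("Observation:", 1)[1].strip()
    return f"Final Answer: {observation}"

def run_agent(question: str, llm=fake_llm, max_iterations: int = 5) -> str:
    """ReAct loop: reason, act, observe, repeat until a final answer."""
    messages = [{"role": "user", "content": question}]
    for _ in range(max_iterations):  # cap iterations so we can't loop forever
        reply = llm(messages)
        if "Final Answer:" in reply:
            return reply.split("Final Answer:", 1)[1].strip()
        match = re.search(r"Action: (\w+): (.+)", reply)
        if match:
            tool_name, tool_input = match.group(1), match.group(2)
            result = TOOLS[tool_name](tool_input)  # invoke the chosen tool
            messages.append({"role": "assistant", "content": reply})
            messages.append({"role": "user", "content": f"Observation: {result}"})
    return "Agent hit the iteration limit without an answer."

print(run_agent("What is 17 * 24?"))  # → 408
```

Swapping `fake_llm` for a real chat-completion call is the only change needed to make this a live agent; everything else, the tool registry, the iteration cap, the observation hand-off, is the loop the talk describes.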
The Information released an article last week on the high risks of using vibe-coded logic in production, with a couple of real-world examples of vulnerabilities that were put into production by developers not carefully vetting AI-generated code. We've also learned firsthand at Retool that there's a lot you really have to get right when you build agents. Models can hallucinate or give you unpredictable, inaccurate, made-up results. You have to be mindful of security. You have to be conscious of the things you're giving your agent access to. You have to be cognizant of cost overruns; it can be really easy to accidentally burn up a bunch of tokens. And overall, evals are a really important safeguard here, making your non-deterministic agent as deterministic as you can. So, how do you solve that problem? I would group the options into approximately four buckets. The first is to build your agent from scratch. You write every line of code by hand. Maybe you're fine-tuning LLMs. Maybe you have AI/ML engineers on your team. You have full control, but it's a high lift; you're building all those ancillary pieces. What you get is something purpose-built. It's not outsourced. You have maximal control. Then there's more of a middle ground: using a framework like, say, LangGraph. You still have a high level of control, for example over different memory modes. It's a medium lift, but you're tied to a pretty flexible framework. There are agent platforms like Retool Agents, where you get opinionated defaults and a low lift to production. Of course, you're tied to the platform, but it's useful for that long tail of business agents: the hosting is abstracted for you, connectors to external services come out of the box, and you get observability for your fleet. Or, the fourth bucket is the verticalized agent bucket. These are offerings where the agent is really dialed in for one use case.
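To make the point about evals concrete, here is a hypothetical minimal eval harness: run the agent over a fixed suite of cases and track the pass rate, so regressions from prompt or model changes are caught before production. The `toy_agent` and the substring-match pass criterion are illustrative assumptions, not a real eval framework.

```python
# Hypothetical eval harness: score an agent against a fixed test suite.
def run_evals(agent, cases):
    results = []
    for case in cases:
        try:
            answer = agent(case["input"])
            passed = case["expected"] in answer  # crude pass criterion for the demo
        except Exception:
            passed = False  # a crashing agent counts as a failure, not an error
        results.append({"input": case["input"], "passed": passed})
    pass_rate = sum(r["passed"] for r in results) / len(results)
    return pass_rate, results

# A deterministic stand-in agent so the harness itself runs offline.
def toy_agent(question: str) -> str:
    return str(eval(question, {"__builtins__": {}}, {}))

cases = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "10 * 5", "expected": "50"},
]
pass_rate, results = run_evals(toy_agent, cases)
print(pass_rate)  # → 1.0
```

In practice you would run a suite like this in CI on every prompt, tool, or model change, and alert when the pass rate drops below a threshold.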
It can do one thing really well, but you have minimal flexibility to go beyond that one core use case. So, how do you decide? Everyone wants agents, but you have to be really thoughtful about where you spend those precious engineering cycles. When should you hand-roll an agent, versus when would you want to consider a managed agent platform? Ultimately, the decision boils down to an engineering decision of trade-offs. If you're working on something that's part of your core product or gives your business its competitive edge, then you probably want to build it yourself. If you're working with, say, regulated or sensitive data, or maybe you have hard SLAs of some sort, you might want to consider both options. But if you're building some kind of commodity workflow and you need it in days, not quarters, then I would probably buy it. I would also, as part of this, do a risk assessment of either option: do you want your engineers debugging business logic, or do you want them up at 2 a.m. trying to figure out why OAuth isn't working right? As part of this decision, if you go the managed platform route, I would evaluate the breadth of connectors the offering supports. Are you pulling data from Salesforce and Databricks and Snowflake? Is that going to come out of the box, or do you have to build that? Is permissioning built in? Is it compliant? Does it come with audit trails? Is observability built in? Are evals built in? Or is that another vendor you're going to have to go pay for? And overall, on the build-versus-buy decision, I would think about the token costs, the infrastructure costs, and the engineering costs that come into play for building or buying, including observability. This is how we think about it at Retool for agents. With whatever platform you go with, it's important to understand token usage, estimated costs, and runtime information for your agent.
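A back-of-the-envelope model can make the token/infrastructure/engineering trade-off tangible. Every number below is an illustrative assumption, not real vendor pricing; the point is only that ongoing engineering time usually dominates raw token spend.

```python
# Rough monthly build-vs-buy cost sketch. All rates are made-up placeholders;
# plug in your own token prices, infra bills, and loaded engineering rates.
def monthly_agent_cost(runs_per_month, tokens_per_run,
                       usd_per_1k_tokens, infra_usd,
                       eng_hours, eng_usd_per_hour):
    token_cost = runs_per_month * tokens_per_run / 1000 * usd_per_1k_tokens
    eng_cost = eng_hours * eng_usd_per_hour
    return token_cost + infra_usd + eng_cost

# Hand-rolled: raw token prices and own infra, but heavy ongoing engineering.
build = monthly_agent_cost(10_000, 5_000, 0.01, 500, 80, 150)
# Managed platform: assume a 2x platform margin on tokens, minimal engineering.
buy = monthly_agent_cost(10_000, 5_000, 0.02, 0, 8, 150)
print(round(build), round(buy))  # → 13000 2200
```

Under these (arbitrary) assumptions the managed option wins on total cost even with a token markup, which matches the talk's "commodity workflow, needed in days" guidance; for a core-product agent the control premium can easily justify the build column.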
And with whatever platform you choose, you should also be able to dial into any specific agent and agent run to make sure your fleet of agents is doing what you would expect. So, looking ahead, there's an analogy here to how businesses today think about building versus buying software. Stripe, for example, is always going to have its core billing logic and its critical user-facing apps built by hand. But Stripe uses external platforms for that long tail of software. And I would expect the same for agents: I would expect businesses, as time goes on, to have a few hand-built agents purpose-built for certain use cases, and then a long tail of business use cases hosted on some kind of platform. To look again at Stripe: they use React for much of their critical customer-facing software, and they use Retool for much of their internal tooling. Or look at Cursor. Cursor would never use a managed platform for their core product; this is their core product we're talking about. It would be slow to use a different provider, they wouldn't own it, they really need as much control as possible, and they have a lot of really smart engineers poring over every edge of that thing. But you could imagine that as Cursor the company grows, which they are, they may eventually be dealing with a high volume of, say, chargeback disputes with their billing provider, or many customer support requests. I could imagine Cursor the company moving towards using an agent platform as they get quite large. I've been working closely with customers like AWS on initiatives to automate mundane business processes with AI, and I've really seen the impact here. Another Retool customer, ClickUp, built their AI tooling on Retool. They saved over $200,000 in vendor costs and hundreds of thousands of dollars on additional headcount. Descript estimated that they're saving hundreds of hours of work weekly with the 50 apps they built.
And in fact, we recently announced at Retool, on the topic of work automation, that our customers have automated over 100 million hours of work to date. By doing this, we're freeing human potential for more creative and strategic endeavors. People thought the printing press was going to lead to the decline of traditional knowledge, and in fact, it democratized access to information. I really do think that AI and agents are going to enable businesses to enhance the capabilities of their people and their teams. This is just going to unlock limitless potential and, I would say, overall just increase the GDP of the world. Last week, Mary Meeker's AI trends report came out, and it reported that inference cost is dropping dramatically: from 2022 to 2024, cost per token dropped 99.7%. And spend is huge, as we saw with Anthropic 3x-ing their annualized revenue in five months and OpenAI's $12 billion by the end of this year, while the marginal cost, as we can see here, is completely bottoming out. For example, at Retool, for our cheapest agent, we charge $3 an hour. You can imagine that cost is going to keep dropping. The Meeker report also showed that Google searches for AI agents 11x-ed in the last 16 months. So, you can expect to keep hearing about agents. So, in closing, I would say the question isn't "what is the single golden-ticket way to put everything in my business on autopilot?" It's "where can I help my engineers create the most leverage, and what's the right tool for the job?" Thank you. I think we have two or three minutes for questions.

First of all, thank you for the talk. That was really good. I was curious: this paradigm of, for core business logic, build your own tools,
whereas for more ancillary stuff you'd look at things like Retool Agents. Was this a philosophy that you had basically figured out while working on this stuff internally at Retool? And if so, what's an example of Retool's core internal logic that you'd want to build yourselves, and what's something you might look to use your own product, your own agents, for?

That's a really good question. This is generally a philosophy at Retool: we build a lot of our own internal software on Retool. Of course, we're dogfooding as much as we can. In terms of your second question, Agents released last week, like I said, so we're building as much as we can on it. I think it remains to be seen what we'll do on the platform and what we'll build by hand. Our philosophy is to do as much as we possibly can using our own platform, and if we can't do something, then we should go figure out why and go build it. So for us specifically, I would say we're just going to use the platform itself for everything we possibly can. Thanks for the question.

Hey Donald, Lance from IO. We build applications for government and NGOs and such, and I'm curious about your AI agents: does your on-prem offering include the AI agents as well?

We do. We do. So we launched cloud-only, but on-prem support is coming in the next week or two, maybe three. So yes, it is definitely going to be supported on-prem, and also eventually for our air-gapped customers as well.

Thank you. Any other questions? Cool. Well, thank you everyone.