Why Getting Data Right Could Be The Key To Effective AI Projects — With Charles Sansbury
Channel: Alex Kantrowitz
Published at: 2025-11-05
YouTube video id: riVo7VI1BTY
Source: https://www.youtube.com/watch?v=riVo7VI1BTY
What does AI need to do to deliver real economic value? Let's talk about it with Charles Sansbury, the CEO of Cloudera, who is here with us in studio for a video brought to you by Cloudera. Charles, great to see you. How are you? >> Great to see you, and thanks for having me. >> Thanks for being here. We've been talking on the show so much about the economic value of artificial intelligence, whether or not there will be an ROI on this technology. I'm so happy to be speaking with you today because you have a fascinating background. You were the principal for technology investment banking at Morgan Stanley. >> 1996 to 2000, yes. >> So you've seen the euphoria around technology in the past, around the dot-com moment. How does it compare to this moment? >> Well, even better: I left in 2000 to actually join a company that was one of the definitive dot-com darlings. So I lived it both as an adviser and as a principal. >> What company? >> A company called Vignette Corporation, the original maker of web content management systems. If you were building a complex website, you went to Vignette and bought the software. It was the fastest-growing software company in history up to that point because of the demand: it went from $18 million of revenue in '98 to about $96 million in '99 to almost $400 million in 2000. And then the trajectory changed. >> So talk a little bit about what you saw then and today. Is it the same thing? People are saying it's another bubble. >> It's hard for me to say. When we were in the moment, there was euphoria around the opportunity, but what we actually said was that at that point every idea could get funded, and the market wasn't differentiating between an idea and a good idea for funding. And so the financial investors were being rewarded early on for betting on everything.
So their solution, if I'm doing the math out loud, was: we're playing roulette, the payoff is 100 to 1, and there are 33 spots, so let's just put money on every spot. We'd have losers, but the winners would pay for that. As time went on, it became more important to discern, and at that point you saw some spectacular failures of businesses. There was a company called Webvan that was going to automate the shopping experience. They spent $600 or $700 million building out an automated shopping experience. It turned out the way to shop was to go and take stuff off the rack and put it in your cart. Now, 15 years later, we have home delivery that's kind of gotten there, with humans actually doing the work, not robots. But you look back and you say it should seem clear. So in the moment there were certain ideas where you could say, that doesn't make sense. But generally most of the businesses had business value attached to them. And so I think probably the same is true right now, except that the financial dollars have gotten so much bigger. If you think about a venture capital round to fund a software startup in 1998, $20 million was the first round, maybe $40 or $50 million to build out your sales force, and those were astronomical dollars. What's different now is that the dollars required to be relevant in some of these markets are beyond anything we've ever seen. I read an article recently, somewhat provocative, that said the AI leaders and the funding they require are a systemic risk to our financial system. But if you think about it, we are spending so much money right now to buy hardware, electricity, electricity generation. And we just don't know.
What I will say, though, is that my sense is the end goals and outcomes are more tangible than, you know, selling beach balls or socks on the internet. So I do think they're tangible. Certainly there's a lot of money that's going to be lost in these investments, but I think what's going to happen is that a lot of companies will turn out not to be successful, while a very small number will be massively successful, and the gains will outweigh the losses, but the gains will be concentrated in a very, very small group of winners. That's what I see as the end outcome here. >> As I'm hearing you talk, there are some patterns emerging, some comparisons and contrasts. In the dot-com boom, the money was spread over many companies, and if you had an idea, you would get money. Now it's much more concentrated. But maybe the difference is that the companies getting these $40 billion investments, like OpenAI, or the hundred billion they just announced with Nvidia, people are actually using their products. It's almost as if you're in the dot-com era but everybody's ordering from one website. >> From one person, yes. >> You have a fascinating study that just came out. Cloudera does the State of Enterprise AI and Data Architecture, and you asked IT leaders whether they're using AI and whether they're using generative AI. On AI first: 96% said yes. >> Yes. >> What's the significance of that? >> Well, there are two pieces to it. One, I think there's a rush to show that you're using AI within corporate IT, because there's been so much focus on very high-profile successes: we improved our code quality, we improved our ability to respond to customer calls. Those gains are real.
But the other thing that's happening is that if 96% of companies are trying AI, only about 30% of IT organizations have approved what they're doing. So the business is running ahead of IT governance, IT rules, IT structures, which means we have this kind of wild west going on right now, where the business user, the non-IT person, is pushing an AI initiative because he or she is being pushed by their boss to show some value from this technology. And the early use cases that were successful, around code completion or content generation or customer-support-driven applications, are very real and easy to get. I think the question is not that, but what happens next, and what are the killer apps that evolve for AI. We talk about this breathlessly, and I get excited about it, but it's only been here for about two and a half years, and if you think about the pace it's running at, I still think we're in the very early stages of which applications will be the game changers for companies. What we've said to our team internally is, we want you to think about using AI in every function across the business. From a legal perspective, it's pretty straightforward. From marketing, it's pretty straightforward. Finance, operations, and then obviously software development. But the software developers were actually already ahead of us. They'd already been using a handful of code completion tools in our environments before the IT folks had approved a specific set of tools to use, which points to the fact that the business is moving faster than IT: massive adoption, real business value, but the technology is moving faster than the people who are trying to control its adoption. That's the tension we're hearing at our event in New York here today. >> So let me ask you, do all those divisions need to run through IT?
For instance, does marketing, which is using AI for content generation, need to run through IT? Because I imagine you're going to be more successful if IT is involved in some circumstances, but where's the balance? >> So the concern, especially as you move from generative to agentic AI, where you have autonomous agents moving through your systems doing stuff without checking back in with a human, is that I think it creates risks we haven't got our hands around yet. I was having a conversation today with one of the folks presenting at our conference, who runs IT governance for a large global financial services organization, and that person said that right now the business is pushing back on IT-based governance and regulation, but that once it's in place, governance will be an accelerant, not a detractor, because if you have a set of approved tools and processes, you can then allow the business to go deploy them. But I am very uncomfortable, personally and from a governance model perspective, with not having some oversight over technology that's going to be deployed inside an enterprise infrastructure. That is my concern, and I know there are people in the organization who think I'm being overly cautious, but that's how I'm thinking about it. >> But it's also an effectiveness question. I'm going to run this by you. Many organizations are getting some value when people use ChatGPT in their roles. >> Yes. >> But when it comes to making change in the way they work, that's when you need that governance, you need that buy-in. And your study is fascinating. Something really struck me, right from the study: just 9% of respondents said that all of their data was available, and only 38% said that most of the organization's data was available.
So you have this technology that runs on data, but only 9% of people are able to access all of it. What's going on there? >> Well, at the risk of putting in a Cloudera commercial, that is the value proposition our new products are designed to address. But taking a big step back, what AI needs to run is accelerated compute and high-fidelity data. The accelerated compute and the models are being deployed, but large corporations' data estates are like a dusty old closet with things shoved in drawers. In forums where people get together and talk about it, people are kind of embarrassed, but it turns out everybody has the same issue: we don't have a clean and pristine set of data across our various enterprise applications. But the AI initiatives are rolling out, so everyone is running very quickly to try to maintain or improve the quality of the data. Our perspective has been that the answer can't be to take all that data and move it to the cloud so it can run very neatly on cloud-based models, because then you lose control over the enterprise context you've built over years: the transactions with your customers, the unique insights that you have. But you also can't wait a year or two to get the data in shape and put it in one place so that you can bring the models to that data. So what we're doing, through a combination of research and development and the new iteration of our product, is giving customers the ability to create an orchestration layer that overlays both their on-premise data stores, whether in Cloudera or other applications, and cloud-based applications and hyperscaler-based data.
So you can effectively create an open data lake from those component parts without having to move everything to one place. And we're also able to do what we call data wrangling, the data cleansing, so you can have a pool of data in one place without having to rebuild from the ground up. What that means is you can get up and running on a high-fidelity set of data much more quickly. And it addresses the issue we've talked about: if AI is built on good data and your data is not good, you're not going to get quality answers from the models you deploy. >> So can you give me an example of what happens when all this goes right? >> I actually have a really interesting example that one of our customers gave us a couple of days ago. It's a large global financial services institution, non-US-based, and they have transactions happening around the world for their customers that get flagged as suspicious. As a regulated bank, with global know-your-customer, anti-fraud, and anti-money-laundering rules, they have to evaluate each one of those. And it's a bank that's come together through acquisition. They have a business they bought in geography A that's on different systems with different repositories, and those systems could be securities trading systems, cash machine systems, and all of these are not tightly integrated because they've been separate over time. So you have an issue that's flagged, and you have to have a human go investigate it. What were their credit card transactions? Did they happen to buy a plane ticket to this place? And it would take a thousand people, as a full-time job, to go through these types of issues on a daily basis.
And now they've created an agent where, when an incident comes up, they score it based on the agent going and looking: oh my gosh, and I'm making this up, there was a cash deposit in Malta. That's very odd. Have they been in Malta? Well, yes, actually: they bought a plane ticket to Malta, and they were at a Starbucks in Malta and bought a coffee. So okay, that makes sense. Whereas if it doesn't make sense, it gets a different score. So they create this scoring system for these incidents, and they found they can draw a waterline. Let's say if the score is 40 out of 100: everything at 40 and below is no issue. And above 40, they know the top 10% are highly likely fraudulent. So they point their human investigators, already equipped with the data file the agent has created, at those cases so they can resolve them more quickly. And it's allowed them to take a team of a thousand and repurpose the majority of those people to other functions within the bank. That's a savings of tens of millions of dollars, a more efficient process, and better for the customer, who no longer gets his or her credit card cut off in Malta just because they happened to buy a Starbucks coffee and then make a cash deposit. It's one of those things where, when they walked me through it, I thought, that makes total sense. But it wasn't really possible without the advent not just of generative but of agentic technology. So I think that's an early use case, but we see lots of instances of people rethinking business processes and overlaying the technology on those processes, which is going to give you outcomes that are orders of magnitude better. I think that's a pretty exciting use case. >> Just a couple of questions about that. They can do that today? >> They've been doing it in production for a couple of months now. >> And they trust the bot? >> They trust the bot.
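The triage pattern Sansbury describes — score each flagged incident, auto-clear everything at or below a waterline, and route the riskiest cases to human investigators — can be sketched in a few lines. The 40-point waterline and the top-10% cutoff are from the conversation; the data structures, field names, and function are hypothetical illustrations, not the bank's actual system.

```python
from dataclasses import dataclass

@dataclass
class Incident:
    incident_id: str
    score: int      # 0-100, assigned by the investigating agent
    evidence: str   # the data file the agent assembled for a human

WATERLINE = 40      # at or below this score, auto-clear as no issue
TOP_FRACTION = 0.10 # top 10% above the waterline: highly likely fraudulent

def triage(incidents):
    """Split incidents into auto-cleared, review queue, and high-priority."""
    cleared = [i for i in incidents if i.score <= WATERLINE]
    flagged = sorted((i for i in incidents if i.score > WATERLINE),
                     key=lambda i: i.score, reverse=True)
    n_priority = max(1, round(len(flagged) * TOP_FRACTION)) if flagged else 0
    priority = flagged[:n_priority]  # goes straight to human investigators
    review = flagged[n_priority:]    # worked through with the agent's file
    return cleared, review, priority
```

The "art" he mentions next is exactly the choice of `WATERLINE`: moving it decides how late a human enters the loop.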
Obviously there were a bunch of iterations of testing, but the different question, the magic, is setting the waterline right: at what point do you get the human involved? That's one of the things a lot of the AI strategists we've talked to have said: there are different situations where you get a human involved at a different point. If there are medical-related things, say the reading of an X-ray, humans get involved pretty soon. Think about the automated trading that happens today on most of our exchanges: humans don't get involved in that. So somewhere in that hierarchy from automated trading to healthcare, there's a different level of involvement for the human. And the art in what I've just described is that they decided where to draw the waterline by deciding the point at which to get the human involved. That's really one of the complex issues we're going to face: at some point, for many of these processes, we're going to want a human involved, and somehow a human responsible for the outcome, so that someone worries about it, as opposed to releasing these agents into the wild to see what happens. >> Yeah, and it's kind of interesting, because if I'm getting it right, it's using the large language model to take all these inputs and make sense of them. >> Yes. >> But because it's operating on good data, it's actually able to make logical conclusions. >> Right. Well, and the other thing that's important there is the concept we talk about of private AI. Private AI is basically: you buy a pre-trained model and you do some fine-tuning, but you do the fine-tuning on your unique data. The concept of private is using your unique data and your enterprise context in a way that informs your models to optimum effect without allowing that data to escape outside your security perimeter. You don't want your most proprietary data outside.
And the other point this makes is that they've done this internally, on their own systems, because they wouldn't want all of that information about specific transactions and customers in some repository where they didn't have full command and control. That's an operational concern, but this is also a place where they have huge data sovereignty concerns, and those are increasingly impactful in how customers think about data security, data privacy, and ultimately data sovereignty. >> Do you think that line of where it makes sense for a human to get involved will just keep going up over time, or is this about as good as it's going to get? >> I will tell you, if the improvement in the large language models I use is any indication, the line's going to keep going up. The technology is iterating very fast and getting better much more quickly than I would have thought. >> I'm sure you've seen the studies, the MIT study for instance, that's cited everywhere and that we've talked about a lot on the show. >> The 95% study. >> That's right: 95% of companies see no ROI on AI. This seems like a pretty clear-cut case of a company getting ROI on AI. >> Yes. >> Do you believe those numbers? How should we read the study? >> So I think a lot of things are tried and fail quickly. And I think it's also very hard for the business user right now to identify the use case that matters. Maybe an example of that is that it requires not just facility with the technology but also a real, deep understanding of the business, and historically a lot of our IT folks have not been experts in the business, and our business folks have not been experts in IT.
So I think it takes a pretty unique individual right now to put those two things together, and a lot of the use cases, I believe, are being driven by business users who don't have as much technology experience, or IT users who don't have as much business experience, driven by the urgency of, oh my gosh, we've got to do something. I think that's what we're seeing right now: we've got to do something, so we're going to run a prototype, and even if the report up to management is that we ran three prototypes and they failed, that's still better than we didn't do anything. >> Yeah, you can't go to your CEO and say, "I don't have an AI strategy." >> Do nothing is not an option. >> Exactly. All right, speaking of CEOs, I'm going to ask you a CEO question. >> Well, if we had one in the room, we could ask him or her. >> I was just speaking with the people running search at Google and some of the products there, and I said, look, AI is a product with a lot of potential, but it's really hard to know where to invest, because some days it's working well, some days it's not, some use cases make sense, some don't. So from where you sit, I'd love to hear your perspective on how Cloudera decides where to go forward and where not to, because you have a pretty good business that's not generative-AI related. You're in this always-day-one situation: do you want to reinvent, or do you stay with the flagship? How do you think that through as a CEO? >> It's a really tough question. Using that day-one analogy, a pure day-one perspective would say, ignore the history; from this point going forward, what's the right decision? This is the first day we're in business. And that's theoretically true.
But we have more than a billion dollars of revenue and 700 or 800 of the world's largest customers who depend on us to manage their enterprise data and make it safe, secure, and accessible for all their analytics initiatives. I can't ignore that. So the answer right now is we have to do both. We are investing in that core data platform, the Cloudera data platform that is the data foundation for eight or nine of the top 10 companies in basically every global industry. But those same customers are saying: and also, because you are the repository within my organization with the most data gravity, what are you doing in terms of AI tools and capabilities, so I can use that data in a data lakehouse fashion to train and build the models my business users want to deploy? And so I think from our perspective we can straddle both worlds. It wouldn't be prudent for me to shift all of our resources to our new initiatives, because we have large organizations saying, look, I need this product that runs in my data center to run for the next 20 years. On the other hand, I have innovation requests being driven by every one of those same customers. So right now we're doing both. We continue to invest in and evolve our platform, but we're also spending, both in terms of internal R&D and through three recent acquisitions, to add functionality to our data management capabilities and to add containerization capabilities, which allow us to deliver a cloud-like user experience regardless of computing platform. That's more forward, innovation-driven.
So right now I don't have the luxury of shifting everything to the innovation, but I've also got a foundation of knowledge and expertise, built up over the last 15 years of being probably the unquestioned leader in the early stages of big data, that we can leverage to help make better decisions as we think about how to solve people's AI needs. So that's a CEO answer, where I settled in and kind of filibustered. >> That was not a filibuster, that was a legitimate answer. You talked about how you had to do both. >> We had to do both. >> So I want to hear about your journey, actually, because I can't even imagine what it's been like for you to see ChatGPT come out, and then, hearing what you've been talking about today: I think corporations were saying, we need to put this into place in our companies, probably using off-the-shelf models, and then realizing that maybe those models aren't going to be as valuable if they're just using public data, that they'll be valuable when they're able to connect with internal data. And then your phone probably started ringing. So talk a little bit about what it's been like for you. >> So in the early stages, look, AI was born in the cloud, and all the models were trained on all the publicly available data, and they got better and better. But at that point all the models had been trained on the same data, which is not differentiated. Then we had customers coming and saying, look, I've got all this customer interaction data or transaction data, how do I feed that in? And it became clear to us about two or two and a half years ago that we had to also incorporate that into fine-tuning models, and fine-tuning wasn't even a term back then, to basically get better outcomes.
The example I use is my legal department. We have thousands of contracts that we've signed over time, and there are certain terms we've agreed to, certain terms we negotiate, and certain terms we don't negotiate. We can feed that into a model so that when a contract comes in, we apply our rules: here's our markup based on what we've agreed to in the past. That's fantastic, but it's only valuable if, a, it's trained on our own internal context, and, b, that model does not have to have read War and Peace to deliver that to us. It doesn't have to be generally trained on everything that's available; it has to be trained on our content. So the point I make there is that people are getting better about large and small language models and where each is appropriate, and more targeted in training models on the right data, which gives you better outcomes. Whereas the early days were boil the ocean, right? >> And so I think the progress we've seen in the large language models, and in the sophistication of customers in understanding which models to use for which use cases, is dramatically different over the last 12 or 18 months, and we're still in the very early innings of this adoption. So we have, I'd say, a ringside seat; arguably we're in the ring for a lot of this, and it's pretty exciting. But I think the answer is, don't get too attached to what you're doing today, because something better is coming around the corner. >> I think in the early days of all this, when companies were rushing to roll it out, we and the public saw some hilarious examples of them getting it wrong. For instance, car dealerships offering discounts because their chatbot said, "I read the policy," and just hallucinated the discount. And there's a question of whether they should be held to it or not.
And this, I think, is an important thing when we consider the ROI question and where this is going: until that gets fixed, those questions will follow generative AI. We also talk a lot about whether the models are getting better. Well, the model can get good up until a certain point, but once you start bringing in clean data and not subjecting the models to these types of hallucinations, that's when you can start putting it into production as a product that works. >> Yeah, I think models are getting better, but maybe they're approaching a kind of asymptotic barrier: are 13 billion parameters really that much better than 12 billion? Maybe nine is good, maybe five. Someone has mathematically done the work, but my guess is that we're approaching a point where the models themselves aren't getting much better, which means the quality of the data you train on becomes increasingly important. And so we're hearing customers talk a lot more about data fidelity, data quality, and data lineage, understanding where the data that trains your models comes from. And so I believe that what's happening is that enterprise data is being revalued as a corporate asset, and people are now willing to spend more money to get it right.
Whereas historically, if I'm an IT leader and projects came up, one was cybersecurity: we're funding that. One was analytics: well, we're funding that. And one was data governance and data fidelity: I don't even know what that means; that goes to the bottom of the list. So I think a lot of companies did neglect their data stores, and as recently as 18 months ago that was still the case, but now it's a very high priority for customers, because again, the components are accelerated compute and high-fidelity data, and if the models are nearly optimized, data is your next chance to improve your quality of outcome. >> All right, Charles, if people want to learn more, where can they go? >> Cloudera.com. We have a tremendous group of both written and video materials online. We've just introduced a bunch of new products that help us get to this vision, so I'd encourage people to learn more about the company. Obviously we're excited about what's going on. >> Awesome. Charles, thank you for the conversation. >> Thank you. Appreciate it. >> All right, everybody. Thank you for watching. We'll be back on the feed soon.