Critical AI Inference your CIO can Trust — Sahil Yadav, Hariharan Ganesan, Telemetrak
Channel: aiDotEngineer
Published at: 2025-07-22
YouTube video id: 6Tpm4m1YxHk
Source: https://www.youtube.com/watch?v=6Tpm4m1YxHk
Hey guys, thanks for being here. I'm Sahil, and I'm here with Hari. We're presenting on AI, and specifically on trust, but let me give you a little background about us first. Over the past 10 years we have deployed AI in various industries, from health monitoring to industrial IoT to network automation in telecom networks, and one question has been asked all along: can we trust AI? Some of these systems are used in mission-critical applications, and the real question is whether we can trust the inferences of these AI systems, because they are impacting businesses, business decisions, and the bottom line at the end of the day. That's the topic we're going to explore today. With that, let me get us started.

Just like any presentation, we'll start with some stats. McKinsey says 78% of companies are adopting AI; other research, from EVI, says 95% are investing in AI. But here's the problem: only 11% of companies are focused on AI governance, on ensuring safe practices with AI. That 67-point gap is going to be a huge problem, because it's not just about implementing AI the right way, it's also about understanding the impact of that AI in the long run.

Let me quantify that with some examples. You see a couple of them here. Telecom disruption: AI made a decision, and based on that decision there was a network disruption. If you look at the AT&Ts and Verizons of the world, they're losing millions of dollars for every minute the network is down. Another example: a gas sensor misinterpreted data, and that put human lives at risk. A third company lost millions of dollars because its supply chain AI scrambled SKUs and ended up in losses. What we're trying to say is that these are silent failures. You cannot quantify their impact ahead of time, and you cannot see them coming, but they are worth millions and billions of dollars over time. So this is extremely impactful as AI is getting adopted.

Taking a view of what trustworthy AI looks like, there are three main pillars. The first is explainability. The most important thing here is having a view of what's really under the hood; otherwise, you're flying blind. You have to understand why those inferences are being made, and on what basis. The second is traceability. It's like a flight recorder: it captures the full audit trail and ensures you can retrace the steps, understand a particular situation, recreate it, and solve it again. The third is guardrails, which are extremely important to ensure you don't end up with millions of dollars of losses. There is some threshold at which the AI has got to stop. Together, all of these build trust in a real system. More importantly, when you talk about real-world scenarios where you're implementing this at scale, these are the pillars to think about.
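To make the guardrails pillar concrete, here is a minimal sketch of an inference wrapper that writes an audit record for every decision and stops to escalate below a confidence threshold. All names, the threshold value, and the `model.predict` interface are illustrative assumptions, not details from the talk:

```python
import json
import time
import uuid

CONFIDENCE_FLOOR = 0.85  # illustrative threshold; tune per application

def guarded_inference(model, features, audit_log):
    """Run one inference with a flight-recorder audit trail and a hard stop."""
    # Assumed interface: the model returns a prediction and a confidence score.
    prediction, confidence = model.predict(features)

    record = {
        "id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "features": features,
        "prediction": prediction,
        "confidence": confidence,
    }
    audit_log.append(json.dumps(record))  # traceability: every decision is logged

    if confidence < CONFIDENCE_FLOOR:
        # Guardrail: below the threshold the AI stops and hands off to a human.
        return {"action": "escalate_to_human", "record": record}
    return {"action": "execute", "record": record}
```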
I'll have Hari talk through the pillars of trust.

>> Hey, thanks, Sahil. So every mission-critical system we rely on today, be it aircraft, energy grids, or even simple banking and financial systems, is built on principles of safety and understanding, right? Our AI systems should be no different.

Looking at the first pillar: AI has to show its work. No important decision should be a mystery. It should come with a plain-English explanation so that an end user, a decision maker, or somebody auditing the system can act on the information without needing a data scientist to translate what the system actually means. That's the first pillar.

The second pillar is adaptive control. What do we mean by that? It's about building smart guardrails. If the AI system starts to veer off and makes a wrong decision, the system should be able to slow down, change course, or at least call a human for help. Think of it as lane assist for your AI.

The third pillar is to always have a human in the loop. What do I mean by that? It's about setting up the roles and the playbooks so that the right experts get pinged at the right time with the right information, without creating overhead for either the system or the person.

But all of these pillars are built on the bedrock foundation of traceability. Every piece of data, every change, is digitally signed and trackable. Think of it like a software bill of materials, or even simpler, like your FedEx package: from the time it leaves the warehouse until it reaches your doorstep, you can track every single step (a sketch of this idea follows below). So those are the three pillars of trustworthy AI.

But with these pillars in place, the larger question is how we actually weave them into the AI systems we are building and running today. Let's look at that journey. How do we make these pillars a reality in day-to-day AI operations? This is where we move beyond standard MLOps to what we call XTOps. Think of it as MLOps, but with a built-in conscience and a direct line of human oversight. This diagram isn't just a flowchart; it's the blueprint for the entire life cycle of AI. It begins with verifiable traceability, right from the data stage: know where your data comes from, and understand what is changing and how. No more guesswork. When we train the models, we don't train them just for accuracy; we embed actionable intelligibility, meaning the model also learns to explain itself so that we can spot when its reasoning starts to drift. When we deploy, we put in the adaptive cruise control we talked about. This is where the guardrails kick in, automatically adjusting to new situations and new data, and pausing to take a closer look if things drift. And once the model is live, human-AI teaming comes in. This is where real-world feedback kicks in, so we can quickly improve the system and humans can step in when needed. XTOps is about creating a system where every AI decision has a clear why, when, and who attached to it. It is about moving from just launching an AI system to launching an AI we can truly trust.

Let's pause here. You all might be thinking: we do MLOps day in and day out, and most of the modules we just talked about already exist. So what is unique? What is the big difference? The challenge is that adopting XTOps is a journey, not the flip of a switch. XTOps takes all those foundational pieces we already have and gives them a serious, integrated upgrade, especially for trust.
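As one way to picture the "digitally signed and trackable" traceability described above, here is a minimal sketch of a hash-chained lineage log, where every entry commits to the one before it, so tampering anywhere breaks the chain. This is illustrative, not the speakers' implementation; every function and field name is assumed:

```python
import hashlib
import json
import time

def _digest(entry: dict) -> str:
    """Stable SHA-256 digest of a lineage entry."""
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()

def append_lineage(chain: list, stage: str, detail: dict) -> None:
    """Append a tamper-evident record; each entry commits to its predecessor."""
    prev_hash = chain[-1]["hash"] if chain else "genesis"
    entry = {"stage": stage, "detail": detail, "ts": time.time(), "prev": prev_hash}
    entry["hash"] = _digest({k: v for k, v in entry.items() if k != "hash"})
    chain.append(entry)

def verify_lineage(chain: list) -> bool:
    """Re-walk the chain like tracking a package: every hop must check out."""
    prev_hash = "genesis"
    for entry in chain:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True

# Example: track data from ingestion through training to deployment.
chain: list = []
append_lineage(chain, "ingest", {"source": "sensor-feed-7"})
append_lineage(chain, "train", {"model": "risk-v3", "accuracy": 0.94})
append_lineage(chain, "deploy", {"site": "plant-a"})
assert verify_lineage(chain)
```

In a production system the digest would be replaced or supplemented by a real digital signature, but the FedEx-style property is the same: every hop is recorded, and every hop can be re-verified.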
I'm not going to go through all of this, but let me touch on a couple of things. Take guardrails and policies. We have IAM policies, we have security policies; MLOps gives us all of that. But XTOps gives you dynamic, AI-aware guardrails that actually understand the context and can block a risky AI decision. Or take monitoring and metrics. We have standard MLOps metrics, but XTOps gives you dedicated, trust-specific dashboards that both your leadership and your board can understand. Or human feedback: we do have human-in-the-loop, but in MLOps it is mostly ad hoc. XTOps creates a fast lane, click-to-fix workflows where a human can review one of these quick changes and go back and fix it. The larger point is that XTOps is not reinventing the wheel; it's adding the advanced safety and transparency features needed for high-stakes enterprise AI. And what's in it for us? We spend less time firefighting unpredictable AI behavior and more time actually building innovative products.

If you are serious about managing AI trust, you also need to measure what matters. So we talk about two metrics here: MTTRE and trust-adjusted risk in dollars. MTTRE stands for mean time to resolve explainable errors. Fancy name, very simple idea: it's the time it takes to fix something unexpected when it happens, how quickly we can understand the why and respond with a fix. The faster your MTTRE, the more agile the team, the fewer defects in the product, and the quicker problems get solved. Second, trust-adjusted risk in dollars. The idea is to put a price tag on what happens when trust breaks. What is the actual business cost? Is it fines? Lost customers? Damaged reputation? If your AI system keeps failing or remains a black box, this metric makes the value of trust visible.

Let me pause again. We already have plenty of metrics in MLOps, so why obsess over these? The challenge is this. Look at the first table: on average, some of these incidents take several months even to reach a resolution. Now imagine your AI making a biased decision for months. The damage escalates quickly, and sometimes exponentially. Now look at the second table. It shows the fallout, and it is not just one parameter. It starts with direct fines, then engineering effort, regulatory scrutiny, and above all the loss of trust and brand value in the products we stand behind day in and day out. These are not small figures: a serious incident, like a privacy bug or bias in a credit card system, could quickly escalate to as much as $700 million. So these metrics are not just about defense; they are about building resilient, reliable, and ultimately trusted AI-powered products. All that said, we are not talking out of thin air, so Sahil is going to present a case study of a real incident and how we went about building this whole framework.
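To make the two metrics concrete, here is a small back-of-the-envelope sketch. The formulas follow the definitions above, but every number, weight, and field name below is made up for illustration:

```python
from datetime import datetime

def mttre_hours(incidents):
    """Mean time to resolve explainable errors: detection to explained fix."""
    spans = [(i["resolved"] - i["detected"]).total_seconds() / 3600 for i in incidents]
    return sum(spans) / len(spans)

def trust_adjusted_risk_usd(p_failure, fines, lost_customers, remediation, brand):
    """Expected yearly cost of broken trust: failure probability times fallout."""
    return p_failure * (fines + lost_customers + remediation + brand)

# Hypothetical incident log: when each issue was detected vs. explained and fixed.
incidents = [
    {"detected": datetime(2025, 1, 3), "resolved": datetime(2025, 1, 10)},
    {"detected": datetime(2025, 2, 1), "resolved": datetime(2025, 2, 4)},
]
print(f"MTTRE: {mttre_hours(incidents):.0f} hours")  # -> 120 hours
print(f"Risk:  ${trust_adjusted_risk_usd(0.10, 5e5, 1e6, 2e5, 8e5):,.0f}")  # -> $250,000
```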
>> All right, perfect, that sets the stage to bring it all together. I'm going to talk about a company called Guardhat, a company I used to work for. It's focused on worker safety; more specifically, it has an AI-driven platform geared toward solving worker safety problems in hazardous environments. What we did there is build IoT wearable devices worn by the workers. Once deployed and activated, these devices collect data, both health data and environmental data, and that data is sent to the backend system, where the AI analyzes it in real time. Based on that, it can predict when an incident is about to happen, so you can prevent that incident. A very mission-critical application.

It was great, because we were saving lives in a way, but there were enormous challenges. One of the inputs to the AI platform was GPS, and as a result, 70% of the cases were false positives. It's easier to say this now, after the fact, but back then we didn't know it. The effect on user behavior was that the users stopped reacting to the alerts; they started ignoring them. That created a huge safety risk, not just for the people, and of course there were lives at stake, but also for the company from a liability point of view, because workers were not reacting to alerts.

So we went back to the drawing board and started identifying the issues. If we were to do this without the XTOps framework, look at the MTTR, the mean time to resolution: roughly 70% of the time is spent identifying the problem, another 20% is spent finding a solution, and then you deploy it. The most critical part is that there was no system to identify that GPS drift. We wouldn't know about it, and because the model and code are so complicated, it's really hard to identify what's causing the problem. But if you apply this framework: day zero, you get an alert that was ignored during an incident; day two, attribution telemetry flags the anomaly; day seven, you have a solution deployed that fixes the GPS drift, or at least routes around it.

Now, to be real, this problem did not get solved in 7 days. It took us 8 months. But it was the model problem that helped us build this framework, and once the framework was in place, these kinds of problems can be solved in seven days. We tested it across our enterprise, and it eventually became an enterprise standard.

So all of this is great: you have the impact, you can see the value. But here's the big question: how do you convince the CIOs? How do they look at all of this and find value in it? What is the language they speak? The answer is money.

>> You've got to convince the CIOs that this is saving money. If you look at the left side of the slide, you'll see that the risk exposure we're looking at is approximately $2.5 million per site per year. One direct impact: with this structure, we were able to address the fines, and we saved $500K in fines every year per site.
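As a sketch of how that CIO math might be laid out: the $2.5M exposure and $500K fine savings are the talk's figures, while the incident-prevention split, which anticipates the indirect benefit described next, is purely an illustrative assumption:

```python
# Illustrative per-site, per-year ROI sketch. The $2.5M exposure and $500K
# fine savings come from the talk; every other number here is assumed.
risk_exposure = 2_500_000   # total risk exposure per site per year (from talk)
fines_saved   = 500_000     # direct: fines avoided once alerts were trusted (from talk)

# Assumption: before the fix the system prevented only a fraction of incidents;
# after the fix it prevented nearly all of them.
incident_exposure = 1_500_000          # hypothetical incident-related share
prevented_before, prevented_after = 0.30, 0.95
indirect_savings = incident_exposure * (prevented_after - prevented_before)

total_recovered = fines_saved + indirect_savings
print(f"Recovered ${total_recovered:,.0f} of ${risk_exposure:,.0f} exposure")
# -> Recovered $1,475,000 of $2,500,000 exposure
```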
Beyond the direct savings on fines, there was an indirect benefit. If the system had been working correctly, it was supposed to prevent all incidents; because it wasn't working correctly, it was only preventing some percentage of them. With this structure it did work correctly, and after that you recovered the remaining value as well.

I just want to wrap up real quick. In the outcomes you can see a lot of value: false alerts came down, the trust score went up, which means people started using the alerts and seeing value in them. Even more important were the things related to the telemetry itself: understanding why a particular inference was made; having the control to switch over when there's GPS drift; and, most importantly, the human in the loop. When these things happen, someone is notified, through a dashboard we created, and that person is able to take action and retrain the model. With that, this slide is just a high-level overview of what we presented to you today. Thank you for being here; we'll leave it at that.

>> Thank you so much for the fantastic talk on the trust gap. I actually had a question for you, because we have a minute or so. On your phrasing around the trust-adjusted risk cost premium: how do you advise people to think about reputational damage? Is that something you have thought about measuring or investigating at all?

>> You want to take it?

>> Sure. As I was saying at the beginning, these are silent failures; it's really hard to quantify their impact, and reputational damage is right at the top of that list. It's really hard to measure, to be honest. All you can do in a case like this, and people are creative, is find ways to attach some dollar figure to it, but it's really hard to predict. There's no short answer to it; let me just say that.