Critical AI Inference your CIO can Trust — Sahil Yadav, Hariharan Ganesan, Telemetrak

Channel: aiDotEngineer

Published at: 2025-07-22

YouTube video id: 6Tpm4m1YxHk

Source: https://www.youtube.com/watch?v=6Tpm4m1YxHk

Hey guys, thanks for being here. I'm Sahil, and I'm here with Hari. We're presenting on AI, and obviously we're talking about trust, but let me give you a little background about us. Over the past 10 years we have deployed AI in various industries, from health monitoring to industrial IoT to network automation in telecom networks, and there has been one question asked all along: can we trust AI? Some of these systems are used in mission-critical applications, but the question really is, can we trust the inferences of these AI systems? Because they are impacting businesses, business decisions, and the bottom line at the end of the day. So we're going to explore that topic today. With that, let me get us started.
All right. So, just like any presentation, we'll start with some stats. McKinsey says 78% of companies are adopting AI. Another study, from EY, says 95% are investing in AI. But here's the problem: only 11% of companies are focused on AI governance, on ensuring safe practices with AI. That 67-point gap is going to be a huge problem, because it's not just about implementing AI the right way; it's also about understanding the impact of that AI in the long run. Let me quantify that with some examples.
You see a couple of examples here. First, a telecom disruption: AI made a decision, and based on that decision there was a network outage. If you look at the AT&Ts and Verizons of the world, they are losing millions of dollars for every minute the network is down. Another example: a gas sensor misinterpreted data, and that put human lives at risk. A third company lost millions of dollars because its supply-chain AI mixed up SKUs, and the errors ended up as losses. What we're trying to say is that these are silent failures. You cannot quantify their impact ahead of time, and you cannot see them coming, but they are worth millions and billions of dollars over time. So this is extremely impactful as AI gets adopted.
So, taking a view of what trustworthy AI looks like, there are three main pillars. First, explainability. When you're talking about explainability, the most important thing is having a view of what's really under the hood; otherwise, you're flying blind. You have to understand why those inferences are being made, and on what basis. The second is traceability. It's like a flight recorder: it captures all the audit trails and ensures you can retrace the steps, so you can understand a particular situation, recreate it, and solve it again. Third, guardrails, which are extremely important to ensure you don't end up with millions of dollars of losses. There is some threshold where the AI has got to stop. Together, all of these build trust in a real system. More importantly, when you talk about real-world scenarios where you're implementing this, you're talking about scalability; these are the pillars to think about when you're scaling in the real world.
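To make that last point concrete, here is a minimal sketch of a hard-stop guardrail; the dollar threshold and function name are hypothetical, purely for illustration, since the talk doesn't prescribe an implementation:

```python
def guarded_action(potential_loss_usd: float, hard_stop_usd: float = 1_000_000) -> str:
    """Hard-stop guardrail: past a certain potential loss, the AI stops
    and a human takes over, no matter how confident the model is."""
    if potential_loss_usd >= hard_stop_usd:
        return "halt_and_page_human"  # the threshold where the AI has got to stop
    return "proceed"

# A decision whose downside exceeds the threshold never executes automatically.
print(guarded_action(potential_loss_usd=2_500_000))  # halt_and_page_human
```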
I'll have Hari talk about the pillars of trust.

>> Hey, thanks, Sahil. Every mission-critical system that we rely on today, be it aircraft, energy grids, or even simple banking and financial systems, is built on principles of safety and understanding, right? Our AI systems should be no different. So, looking at the first pillar: AI has to show its work. Every important decision shouldn't be a mystery. It should come with a plain-English explanation, so that an end user, a decision maker, or somebody auditing the system is able to act on the information and not look for a data scientist to explain or translate what the system actually means. That's the first pillar.
The second pillar is adaptive control. What do we mean by that? It's about building smart guardrails. If the AI system starts to veer off and makes a wrong decision, the system should be able to slow down, change course, or at least call a human for help. Think of it as lane assist for your AI. The third pillar is to always have a human in the loop. What do I mean by that? This is about setting up the roles and the playbooks so that the right experts get pinged at the right time with the right information, without creating overhead for either the system or the person.
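As a rough illustration of such a playbook, here is a minimal sketch; the roles, severity levels, and deadlines are all hypothetical, not taken from any specific framework:

```python
from dataclasses import dataclass

# Hypothetical escalation playbook: route each flagged decision to the
# right expert based on severity, instead of paging everyone for everything.
ROUTES = {
    "low":      {"role": "ml_engineer",    "deadline_hours": 72},
    "medium":   {"role": "domain_expert",  "deadline_hours": 24},
    "critical": {"role": "on_call_safety", "deadline_hours": 1},
}

@dataclass
class FlaggedDecision:
    decision_id: str
    severity: str     # "low" | "medium" | "critical"
    explanation: str  # plain-English "why" from the model

def route_to_human(flag: FlaggedDecision) -> dict:
    """Pick the right expert and attach the context they need to act."""
    route = ROUTES[flag.severity]
    return {
        "assignee_role": route["role"],
        "respond_within_hours": route["deadline_hours"],
        "decision_id": flag.decision_id,
        "why": flag.explanation,  # the expert sees the reason, not raw logits
    }

ticket = route_to_human(FlaggedDecision("d-102", "critical", "Gas reading 5x baseline"))
print(ticket)
```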
But all of these pillars are built on the bedrock foundation of traceability.
Every piece of data and every change is digitally signed and trackable. Think of it like a software bill of materials, or even simpler, like your FedEx package: from the time it leaves the warehouse till it reaches your doorstep, you can track every single step.
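A minimal sketch of what that FedEx-style tracking could look like for data, assuming an HMAC signature and an invented event schema; a production system would use proper key management rather than the placeholder key here:

```python
import hashlib
import hmac
import json
import time

SIGNING_KEY = b"replace-with-a-real-secret"  # placeholder, for illustration only

def sign_lineage_event(prev_signature: str, event: dict) -> dict:
    """Append one tracking 'scan' to the data's journey, chained to the last one."""
    payload = json.dumps({"prev": prev_signature, "event": event}, sort_keys=True)
    signature = hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return {"event": event, "prev": prev_signature, "signature": signature}

# Each step of the data's journey gets its own signed record, chained to the
# previous one, so any later tampering breaks the chain.
step1 = sign_lineage_event("genesis", {"stage": "ingest", "source": "sensor-17", "ts": time.time()})
step2 = sign_lineage_event(step1["signature"], {"stage": "clean", "dropped_rows": 42, "ts": time.time()})
print(step2["signature"])
```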
So those are the three pillars of trustworthy AI. But with these pillars in place, the larger question is: how do we actually weave them into the AI systems we are building and running today? Let's look into that journey.
So, like I said, how do we make these pillars a reality in day-to-day AI operations? This is where we move beyond standard MLOps to what we call XTOps. Think of it as MLOps, but with a built-in conscience and a direct line of human oversight. This diagram isn't just a flowchart; it's the blueprint for the entire life cycle of AI. Let's begin with verifiable traceability. Right from the data stage, know where your data comes from, and understand what is changing and how. No more guesswork. When we train the models, we don't just train them for accuracy; we are embedding actionable intelligibility, which means the model also learns to explain itself, so that we can spot when its reasoning starts to drift. When we deploy, we put in the adaptive cruise control we talked about. This is where the guardrails kick in, automatically adjusting to new situations and new data, and pausing to look at things if they drift. And once the model is running, this is where human-AI teaming comes in. This is where real-world feedback kicks in, so that we can quickly improve the system and humans can step in when needed.

XTOps is about creating a system where every AI decision has a clear why, a when, and a who attached to it. It is about moving from just launching an AI system to launching an AI which we can truly trust.
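As a sketch of that "why, when, who" idea, here is one hypothetical shape such a decision record could take; the field and function names are illustrative, not part of any named XTOps API:

```python
from datetime import datetime, timezone
from typing import Any, Optional

def record_decision(model_id: str, inputs: dict, output: Any,
                    explanation: str, approver: Optional[str] = None) -> dict:
    """Attach a clear why, when, and who to a single AI decision."""
    return {
        "why": explanation,                              # plain-English reason
        "when": datetime.now(timezone.utc).isoformat(),  # UTC timestamp
        "who": approver or f"model:{model_id}",          # human approver, or the model itself
        "inputs": inputs,
        "output": output,
    }

rec = record_decision("reroute-v3", {"link_utilization": 0.97}, "reroute",
                      "Utilization above the 95% threshold")
print(rec["why"], "|", rec["when"], "|", rec["who"])
```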
So, let's pause here for a moment. You all might be thinking: hey, we do MLOps day in and day out, and most of these modules you spoke about already exist. So what is unique, right? What is the big difference? The challenge is that adopting XTOps is a journey; it's not the flip of a switch. XTOps takes all those foundational pieces we already have and gives them a serious, integrated upgrade, especially for trust. I'm not going to go through all of this, but let me touch on a couple of things. Think guardrails and policies. We have IAM policies, we have security policies; MLOps provides all of that, right? But XTOps gives you dynamic, AI-aware guardrails that actually understand the context and can block a risky AI decision. Let's talk about monitoring and metrics. We do have standard MLOps metrics, but XTOps gives you dedicated, trust-specific dashboards that both your leadership and your board can understand. Human-in-the-loop feedback: we do have humans in the loop, but in MLOps it is mostly ad hoc. XTOps, think of it, creates a fast lane: click-to-fix workflows where a human can look at some of these quick changes and go fix them. But the larger point is that XTOps is not reinventing the wheel, right? It's about adding the advanced safety and transparency features needed for high-stakes enterprise AI.
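To show the contrast with a static policy, here is a minimal sketch of a context-aware guardrail; the signals and thresholds are invented for illustration, not prescriptive:

```python
def dynamic_guardrail(decision: dict, context: dict) -> str:
    """Context-aware guardrail sketch: unlike a static IAM rule, the allowed
    action depends on live context (confidence, drift, blast radius)."""
    if context["data_drift_score"] > 0.3:
        return "pause"                       # inputs look unfamiliar: stop and review
    if decision["confidence"] < 0.7 and decision["impact_usd"] > 100_000:
        return "escalate_to_human"           # low confidence plus high stakes
    if decision["confidence"] < 0.7:
        return "apply_conservative_default"  # low stakes: degrade gracefully
    return "allow"

# Low-confidence, high-impact decision under normal drift: routed to a human.
print(dynamic_guardrail({"confidence": 0.62, "impact_usd": 250_000},
                        {"data_drift_score": 0.12}))
```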
And what's in it for us? We spend less time firefighting unpredictable AI behaviors and more time actually building innovative products.
So if you are serious about managing AI trust, you also need to measure what matters, right? We talk about two metrics here: MTRE, and trust-adjusted risk in dollars. MTRE stands for mean time to resolve explainable errors. Fancy name, but a very simple idea: it's the time it takes to fix something unexpected when it happens, how quickly you can understand the why and respond with the fix. The faster your MTRE, the more agile the team, the fewer defects in the product, and the quicker the problems get solved. Second, trust-adjusted risk in dollars.
This idea is basically to put a price tag on what happens when trust breaks. What is the actual business cost? Is it fines? Is it lost customers? Is it damaged reputation? If your AI system keeps failing or remains a black box, this metric makes the value of trust visible.
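A minimal sketch of how these two metrics could be computed, assuming hypothetical incident records and cost inputs; the probability and dollar figures below are made up for the example:

```python
from datetime import datetime

def mtre_hours(incidents):
    """Mean time to resolve explainable errors (MTRE), in hours."""
    durations = [(i["resolved_at"] - i["detected_at"]).total_seconds() / 3600
                 for i in incidents]
    return sum(durations) / len(durations)

def trust_adjusted_risk_usd(p_failure, fines_usd, churn_usd, reputation_usd):
    """Expected cost of a trust failure: probability times business fallout."""
    return p_failure * (fines_usd + churn_usd + reputation_usd)

incidents = [
    {"detected_at": datetime(2025, 1, 1, 9), "resolved_at": datetime(2025, 1, 1, 15)},
    {"detected_at": datetime(2025, 1, 3, 8), "resolved_at": datetime(2025, 1, 3, 10)},
]
print(mtre_hours(incidents))                                         # 4.0 hours
print(trust_adjusted_risk_usd(0.05, 500_000, 1_000_000, 2_000_000))  # 175000.0
```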
So let me pause here again. We spoke about metrics. Why obsess over all these metrics, right? We already have plenty of metrics in MLOps. The challenge is this: look at the first table. On average, the mean time to resolution in some of these cases runs to several months. Now imagine your AI making a biased decision for months. The damage escalates quickly, and sometimes it escalates exponentially. Now look at the second table. It shows the fallout, and it is not just one parameter. It starts with direct fines, then engineering effort, regulatory scrutiny, and above all the loss of trust and brand value in the products we stand behind day in and day out. And these are no small figures: a serious incident, like a privacy bug or bias in a credit card system, could quickly escalate up to $700 million. So this is why these metrics are not just about defense. They are about building resilient, reliable, and ultimately trustworthy AI-powered products that end users can trust. And we are not talking out of thin air: Sahil is going to present a case study on a real incident and how we went about building this whole framework.
>> All right, perfect. That sets the stage, so let's bring it all together. I'm going to talk about a company called Guardhat. This is a company that I used to work for, focused on worker safety. More specifically, it has an AI-driven platform geared towards solving worker-safety problems in hazardous environments. We built IoT devices, wearable devices worn by the workers. Once deployed and activated, these devices collect data: health data as well as environmental data. That data is sent to the backend system, where the AI analyzes it in real time and, based on that, is able to identify when an incident happens, predict when an incident is about to happen, and prevent that incident from happening. So it's a very mission-critical application.

It was great because we were saving lives, in a way, but there were enormous challenges. One of the inputs to the AI platform was GPS, and as a result, 70% of the cases were false positives. It's easier to say this now, after the fact, but back then we didn't know that. The behavior of the users was that, at that point, they stopped reacting to the alerts. They started ignoring the alerts, and that caused a huge safety risk, not just for the people, with lives at stake, but also for the company, from a liability point of view, because workers were not reacting to alerts.
So we went back to the drawing board and started identifying the issues. If we were to do this without the XTOps framework, look at the mean time to resolution: roughly 70% of the time is spent identifying the problem, another 20% is spent finding a solution, and then you deploy it. But the most critical part is that there was no system to identify that GPS drift. We wouldn't know about it, and because the model and code are so complicated, it's really hard to identify what's causing the problem.

If you apply this model instead: on day zero, you get an alert that was ignored during an incident. By day two, attribution telemetry flags the anomaly, and by day seven you have a solution deployed that fixes the GPS drift, or at least finds a way to route around it.
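As a rough sketch of what such attribution telemetry could look like, here is a simple z-score check on one input channel; the numbers and threshold are illustrative, not Guardhat's actual method:

```python
from statistics import mean, pstdev

def flag_input_anomaly(history, latest, z_limit=3.0):
    """Attribution telemetry sketch: flag when one input channel (e.g. GPS
    position error) drifts far outside its own recent history."""
    mu, sigma = mean(history), pstdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > z_limit

# GPS error (meters) has hovered near 3 m; a 40 m reading gets flagged so the
# alert can be attributed to the GPS input rather than to a real incident.
recent_gps_error = [2.8, 3.1, 2.9, 3.3, 3.0, 2.7]
print(flag_input_anomaly(recent_gps_error, 40.0))  # True: route to a human for review
```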
>> Now, having said that, to be real: this problem did not get solved in seven days. It took us eight months, but it was the model problem that actually helped us build this framework. And once we had built the framework, these kinds of problems could be solved in seven days. We tested it across our enterprise, and it eventually became an enterprise standard. So all of this is great: you have the impact, you can see the value in it. But here's the big question. How do you convince the CIOs? How do they look at all of this and find value in it? What is the language they talk? The answer is:
>> You've got to convince the CIOs that this is saving money. If you look at the left side of the slide, you'll see that the risk exposure we're looking at is approximately $2.5 million per site per year. One direct impact: with this structure we were able to eliminate the fines, saving $500K in fines every year per site. Beyond this, an indirect benefit: this system, if it were working correctly, was supposed to prevent all incidents, but because it wasn't working correctly it was only preventing a certain percentage of them. With this structure it did work correctly, and you captured the remaining value as well.
I just want to wrap up real quick. In the outcomes, you can see a lot of value: false alerts came down, and the trust score went up, which means people started using the alerts again and seeing value in them. The more important gains were related to the telemetry itself: understanding why a particular inference was made; having the control to switch over when there's a GPS drift; and, most important, the human in the loop. When these things happen, someone is notified; we created a dashboard where someone is notified, can take action, and can retrain the model.
So with that, this slide is just a high-level overview of what we presented to you today. Thank you for being here. We'll leave it at that.
>> Thank you so much for the fantastic talk on the trust gap. I actually have a question for you, because we have a minute or so. The phrasing you had around the trust-adjusted risk premium: how do you advise people to think about reputational damage? Is that something you have thought about measuring or investigating at all?
>> You want to take it? Sure. Like I was saying in the beginning, these are silent failures; it's really hard to quantify their impact, and reputational damage is right at the top of that list. So honestly, it's really hard to measure. All you can do in this kind of case is get creative: you can find ways to attach some dollar figure to it, but it's really hard to predict. There's no short answer to it; let me just say that.