The Billable Hour is Dead; Long Live the Billable Hour — Kevin Madura + Mo Bhasin, Alix Partners

Channel: aiDotEngineer

Published at: 2025-07-23

YouTube video id: Wv1tAxKYLeE

Source: https://www.youtube.com/watch?v=Wv1tAxKYLeE

I'm Mo. I'm director of AI products at AlixPartners. Prior to this, I was a co-founder of an anomaly detection startup, and before that I was a data scientist at Google. Together we co-lead the development of an internal GenAI platform. We've been working on it for the last two years, we have 20 engineers, and we've scaled it to 50 deployments and hundreds of users. We're excited to tell you everything we've learned on that journey.
And I'm Kevin Madura. I help companies, courts, and regulators understand new technologies like AI and LLMs. As Mo mentioned, both of us work at a company called AlixPartners, a global management consulting firm. I realize lots of you in this room might be rolling your eyes at that, rightfully so, but I like to think our firm does a little more than deliver PowerPoints. We actually roll up our sleeves and solve problems, whether that's coding or getting into the weeds of things. We're here to talk about three things: how we see AI reshaping knowledge work today, in particular professional services and advisory work; three real-life use cases showing how we've concretely deployed it in the way we work; and finally, what doesn't work and where we see things going.
Some of you might recognize this chart from an organization called METR, which evaluates the ability of LLMs to complete a certain set of tasks. It very specifically measures the length of task that LLMs can complete with at least a 50% success rate, and the takeoff is pretty significant. Now, we think that's mostly because software is a verifiable domain, and as we all know, model capabilities are jagged: they perform very well in software development, maybe not so well in non-verifiable or messier domains like knowledge work. So we treat the chart as a rough proxy for the coming disruption of professional services and knowledge work more broadly. Do we think the takeoff will be as steep as in software engineering? Probably not, just because of the messiness of the real world.

For those of you not familiar, there are typically two main models for professional services. One is the junior-led model, where very senior individuals direct and more junior individuals provide the leverage. It's a lot of "okay, do this": you throw 50 people at a problem, they figure it out, and they probably waste some time doing so. There's also the senior-led model, where folks with 15 to 20 years of experience are much more involved in the day-to-day, actually doing and delivering the work. That's the AlixPartners model: a little less leverage, but we can deliver results faster and more impactfully because it's senior-led. We think the future is probably somewhat of a hybrid, but given how quickly model capabilities are advancing, it really favors the more experienced folks, people who have been in a particular domain or industry for 15 to 20 years. If you've listened to Dwarkesh Patel's podcast (a fantastic podcast), he has this concept of an AI-first firm where you can basically take someone's knowledge and replicate it out, so you can have 50 copies of the CEO, as an example. We think the future is something like that: you replicate the knowledge and experience of the more senior individuals, and you scale out the leverage below them using AI.
The way we think about typical engagements, they roughly fall into three buckets; not always, but for demonstration purposes. First, there's a lot of upfront work. Whether it's an M&A transaction, a corporate investigation, or some type of due diligence, you're often left with a bunch of PDFs, databases, Excels, whatever it might be, and there's a lot of work just to understand what you've got: ingest the data, normalize it, categorize it, and put it into a framework you can then use to do what you do best. Whether you're a private equity expert or an investigator, you typically have some type of playbook, and that's phase two: the analysis and hypothesis generation, where you take all that data in a usable format and derive some type of insight from it. All of that, of course, is in support of the last piece, which is what clients actually care about: you solving their business problem. That's the recommendation, the deliverable, the output; that's the reason they hired you in the first place. We're seeing AI today significantly compress, at a minimum, that first part. If it was 50% of the effort, maybe it's 10 to 20% today in terms of what's required from a human just to get up to speed on the contents of a data room or whatever it might be.
And it's not only that, because to date you've largely been limited by the throughput of human beings. Think of doc review as an example; Box is a perfect precursor to this talk, because that's exactly what they do. If you have 5,000 contracts, it takes 30 minutes to review each one, and you want to extract some type of information from them, think of how many people that would take. You're inherently limited by either time or cost, so inevitably some type of prioritization occurs: you focus only on the top 20% or so, the most valuable pieces of the data. With AI, that's completely changed. You can now look at 100% of the corpus and apply your same methodology, your analysis, your insights, across all of the data. You can look at 100% of the vendor contracts, 100% of the customer base, and start to derive insights to identify savings opportunities, or free up time to do more interviews. You're freed up to do much more high-value work, and because the analysis is done across 100% of the data instead of just the first 20 or so percent, the output is just that much better. To bring this to life a little, I'll turn it over to Mo to talk through some real-life examples.
Thanks, Kevin. To motivate our use cases, I want to start with a paradox we face. Everyone's investing in AI: 89% of CEOs said they're planning to implement agentic AI, according to Deloitte. But we find ourselves in this paradox. The National Bureau of Economic Research says there's been no significant impact on earnings or recorded hours. BCG says three-quarters of companies struggle to achieve and scale value with their GenAI initiatives. And S&P Global says almost half of companies are abandoning their AI initiatives this year. So how is it that everyone's spending but no one's seeing the value? We think there's a difference between employee productivity and enterprise productivity, and we want to talk about the use cases we've found that drive enterprise productivity.
The first example is categorization: maybe trying to put a square peg in a round hole. How does this show up for us? Think of IT support tickets: "your laptop keeps restarting" needs to be triaged to the hardware department, so you need to categorize those tickets accordingly. Something closer to home: we analyze companies a lot, so we look at accounts payable or spend data across companies, and we need to say that United Airlines falls under travel.

How was this done before? Does anyone remember word clouds? You'd have to build a machine learning model: stem your data, remove stop words, build a classifier, support vector machines, naive Bayes. It's a lot of work.

Enter the new way: structured outputs. With structured outputs, you can get the answer a lot more easily, without labeled training data. This is literally what it looks like. Say you have a list of companies, such as JD Factors, and you have to categorize them into a taxonomy. Here the taxonomy is the North American Industry Classification System, the NAICS codes, where each code has a description; in this case it would land under other cash management. Now, JD Factors is probably not part of the foundation model's knowledge, so how do we ensure the classification works? Enter tool calls: you can run a web query to append information to each of these companies, and then categorize enormous volumes. This is what we've been doing, and we've had huge wins from it. It has democratized access to text classification for us.
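As a rough sketch of the approach: the mini-taxonomy, vendor names, and the `llm_call` / `web_enrich` callables below are illustrative stand-ins (stubbed so the example is self-contained), not our actual platform code. The key move is constraining the model's answer to the taxonomy and retrying on invalid output.

```python
import json

# Hypothetical mini-taxonomy; the real NAICS has hundreds of codes,
# each with a description the model can match against.
TAXONOMY = {
    "522220": "Sales Financing",
    "522320": "Financial Transactions Processing",
    "481111": "Scheduled Passenger Air Transportation",
}

def classify_vendor(vendor: str, llm_call, web_enrich=None) -> str:
    """Classify a vendor name into a NAICS-style taxonomy code.

    llm_call: callable prompt -> JSON string (an LLM with structured
    outputs; stubbed here). web_enrich: optional callable that fetches
    web context for vendors outside the model's knowledge.
    """
    context = web_enrich(vendor) if web_enrich else ""
    prompt = (
        f"Vendor: {vendor}\nContext: {context}\n"
        f"Choose exactly one code from: {json.dumps(TAXONOMY)}\n"
        'Reply as JSON: {"code": "<code>"}'
    )
    for _ in range(3):  # stochastic output, so validate and retry
        code = json.loads(llm_call(prompt)).get("code")
        if code in TAXONOMY:
            return code
    raise ValueError(f"no valid category for {vendor!r}")
```

Categorizing thousands of vendors is then just a loop (or parallel map) over this function.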
I want to talk about the learnings from deploying this surgically at our company. We've seen enormous wins in speed and accuracy, but those accuracy gains have not come cheaply. This may be unsupervised, but it's not unchecked: we've had to build the right relationships with business partners, who've worked hand in hand with us to get to the accuracy we wanted. What this does is convert skeptics into champions. We don't become snake oil salesmen pushing and peddling AI; it becomes a pull from the firm, which asks us, "Hey, can you apply GenAI for us in these other initiatives?" That's really powerful. It's also important to have business context, which for us gets embedded in the taxonomies used for classification. Everyone's talking about agents, but you need to get the individual steps right first, and this work builds each individual step to a level of robustness and accuracy that we can daisy-chain into the agentic workflows we want. And finally, a callout: these results are stochastic, not deterministic, and that comes with some risks. Kevin will talk more about those.
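That "unsupervised but not unchecked" discipline can be made concrete. A minimal sketch (the function and categories here are hypothetical, not our internal tooling): score the model's labels against a human-labeled sample and surface the most common confusions to review with business partners.

```python
from collections import Counter

def accuracy_report(predictions: dict, gold: dict) -> dict:
    """Compare model categories against a human-labeled sample.

    predictions and gold both map vendor -> category. Returns overall
    accuracy plus the most common (expected, got) confusions, which
    are the first things to walk through with domain experts.
    """
    hits = sum(predictions[v] == c for v, c in gold.items())
    confusions = Counter(
        (gold[v], predictions[v]) for v in gold if predictions[v] != gold[v]
    )
    return {
        "accuracy": hits / len(gold),
        "top_confusions": confusions.most_common(3),
    }
```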
The punch line: we've been able to achieve 95% accuracy categorizing 10,000 vendors, doing in minutes what would have taken days, at an order of magnitude less cost.
All right, next use case. This wouldn't be an AI conference if we didn't talk about RAG. How does RAG show up at our firm? You get dumped with a bunch of data: here's 80 gigs of internal documents, what did Acme release in 2020? Or say you have a court filing to submit on Monday and it's Friday, and you get asked: what are Acme's escalation procedures for reporting safety violations?
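Answering a question like that last one comes down to retrieval: find the passages most relevant to the query and hand them to the LLM as context. A toy, self-contained sketch of that step, with bag-of-words cosine similarity standing in for a real embedding model and illustrative chunks:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a bag-of-words vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Rank document chunks by similarity to the query.

    In a production pipeline, the chunks come from parsing PowerPoints,
    PDFs, Excels, and so on; the top-k chunks are passed to the LLM.
    """
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

A real system swaps `embed` for an embedding model and stores the vectors in an index, but the shape of the step is the same.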
How did we do this in the past? You'd have an index, a literal index: someone would record in an Excel file which documents had been received, which hadn't, and where they were. Or, hopefully not, you'd use something like SharePoint search, which probably wouldn't find what you're looking for. What do we do now? We have an enterprise-scale RAG app. It has to handle hundreds of gigabytes of data: PowerPoints, documents, Excels, CSVs, all sorts of formats, at huge volumes.

What can you append to that? Tool calls to third-party proprietary databases. Let me talk about that for a second, and about the trade-offs, the wins and the losses (sorry, I'm going really fast; we're short on time). RAG is invaluable at consulting companies, because you get dropped onto a project quickly and have to get up to speed, so it ends up being really valuable. But I want to call out the part about teaching LLMs APIs. Typically, certain data sources were siloed behind organizations that held licenses: they would pull information from a web UI, email it to a certain team, and that team would analyze the Excel. What we did was take the API spec, embed it, and teach the LLM how to call the API. We've democratized access to information that would otherwise have taken people days to get, condensing the time, as Kevin said, so projects can focus on the high-value work. The last thing to call out about RAG is that it serves as a substrate onto which you can tack a number of GenAI features, and that's proven really valuable at our firm. A few more callouts: people have high expectations of what they can get from a prompt box. If you ask it to reason across all documents, that's just not how RAG works, so we have to build those solutions step by step. It's a long journey, and we're excited to be on it. With that, over to Kevin for the third use case.
Thanks. It's a good thing Box went before us, because they covered a lot of the advantages of the fundamental ability to take unstructured data and create structure from it. It is an unbelievably powerful concept. It's very simple on its face, but incredibly powerful in an enterprise context, because you can take something like this credit agreement, a 50-or-so-page PDF, and very quickly extract useful information: contract parties, maturity date, senior lenders, whoever that might be. You see folks like Jason Liu saying Pydantic is all you need. It is still true; it is still all you need. Fundamentally, and Box went through a lot of this, it's combining a document with a schema with an LLM, with some validation and scaffolding around it to make sure you're pulling out the values that you need.
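A hedged sketch of that document-plus-schema-plus-LLM combination, with stdlib dataclasses standing in for Pydantic and the LLM call stubbed so the example is self-contained. The field names are illustrative, not a real extraction schema:

```python
import json
from dataclasses import dataclass, fields

@dataclass
class CreditAgreement:
    # The business value lives in this schema: what you extract and why.
    borrower: str
    lender: str
    maturity_date: str   # e.g. "2027-06-30"
    interest_rate: str   # e.g. "LIBOR + 1% per annum"

def extract(document: str, llm_call) -> CreditAgreement:
    """Ask an LLM for JSON matching the schema, then validate it.

    llm_call: callable prompt -> JSON string (stubbed here). The
    validation scaffolding rejects missing fields rather than
    trusting raw model output.
    """
    keys = [f.name for f in fields(CreditAgreement)]
    prompt = f"Extract {keys} as JSON from:\n{document}"
    data = json.loads(llm_call(prompt))
    missing = [k for k in keys if k not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return CreditAgreement(**{k: data[k] for k in keys})
```

Swapping the schema is what lets the same capability serve an investigation one week and an M&A transaction the next.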
And the business value really is in the schema: what you're extracting and why you're extracting that information. The flexibility is what's powerful here, because you can reapply it across different types of engagements. An investigation might be looking at something entirely different from an M&A transaction, but this fundamental capability spans all of them. And the power is at the bottom, where you can do this repeatedly across multiple documents: thousands, tens of thousands, hundreds of thousands. A human review might take days or weeks; using an LLM, you can get it down to minutes. It's incredibly powerful.

In terms of user trust: we're not only using external sources like Box and others, we've also rolled our own internally. To expose some of the model internals to users, and give them an off-ramp to understand where the model is more or less confident, we use the log probs returned from the OpenAI API and align them with the output schema from structured outputs. We ignore the JSON syntax, we ignore the field names themselves, and we home in only on the values. In this case, the green box above, the interest rate of LIBOR plus 1% per annum, is the field we want. We take the geometric mean of the log probs associated with those particular tokens and use it as a rough proxy for the model's confidence in producing that output. So the green and yellow boxes you saw at the beginning are a direct reflection of the confidence level. It's a relatively intuitive way for users to get a sense of the model's confidence, again for human review to the extent that's needed. I won't go through all of these, but fundamentally, like I said, it is magic when it works, and it works at scale. It is a total unlock, particularly for non-technical folks who are not up to speed with the capabilities of LLMs.
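The confidence heuristic just described can be sketched as follows. This is a simplified, self-contained version: it assumes you already have the response's token strings and their logprobs (as returned when logprobs are enabled on the chat completions API), and it uses exact-match concatenation to find the tokens for one extracted value; real alignment against the structured-output schema is fussier.

```python
import math

def value_confidence(tokens: list, logprobs: list, value: str) -> float:
    """Geometric-mean confidence for one extracted value.

    Scans for the contiguous token span whose concatenation equals the
    value, ignoring everything else (JSON punctuation, field names),
    and returns exp(mean logprob) over that span, which is the
    geometric mean of the token probabilities. Returns 0.0 if the
    value can't be aligned to the token stream.
    """
    for i in range(len(tokens)):
        joined, lps = "", []
        for j in range(i, len(tokens)):
            joined += tokens[j]
            lps.append(logprobs[j])
            if joined == value:
                return math.exp(sum(lps) / len(lps))
            if len(joined) > len(value):
                break
    return 0.0
```

Thresholding this score is what drives the green/yellow highlighting shown to users.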
To be able to do this is a light-bulb moment for them, and it really is a game-changer. Now, that being said, there's a lot of work to be done in terms of validation. You saw all the work that Box and others have done to get it to a level of rigor that users can trust, and that's really a key tenet of all this. And so, finally, I'll turn it to Mo for the must-haves.

So, just a couple of quick callouts. I know this is a tech conference, but a lot of this, to get it to work at the enterprise, requires people skills and working closely with the organization. There are a couple of things I want to call out that have been really important for scaling our GenAI initiatives at our firm. The first one is demos. We prototype in Streamlit, but we build in React, and we have a constant cadence, once a month, where we show the latest and greatest of what we're building. This inspires the firm about what we're able to build and keeps the investment in our initiatives going. The second thing: there's always the next shiny thing, agents, MCP, the latest model. NPS is our metric, ROI is our metric, and that is hard-earned one bug fix at a time. I'll skip the other one; partnerships are really important, it's a shared journey. I think we're out of time, but I'll leave you with this: once Excel-powered LLMs actually work, we will be at AGI. I'm looking forward to that next talk. Thank you.