The Billable Hour is Dead; Long Live the Billable Hour — Kevin Madura + Mo Bhasin, Alix Partners
Channel: aiDotEngineer
Published at: 2025-07-23
YouTube video id: Wv1tAxKYLeE
Source: https://www.youtube.com/watch?v=Wv1tAxKYLeE
I'm Mo. I'm director of AI products at AlixPartners. Prior to this, I was a co-founder of an anomaly detection startup, and before that I was a data scientist at Google. Together we co-lead the development of an internal GenAI platform. We've been working on it for the last two years; we have 20 engineers, and we've scaled it to 50 deployments and hundreds of users. We're excited to tell you everything we've learned on that journey. Great. And I'm Kevin Madura. I help companies, courts, and regulators understand new technologies like AI and LLMs. As Mo mentioned, both of us work at a company called AlixPartners. It's a global management consulting firm. I realize lots of you in this room might be rolling your eyes at that, rightfully so, but I like to think our firm does a little bit more than deliver PowerPoints. We actually roll up our sleeves and solve problems, whether that's coding or actually getting into the weeds of things. So we're here to talk to you today about three different things: how we see AI reshaping knowledge work as we see it today, so a lot of how it's impacting professional services, advisory services, that sort of thing; three real-life use cases that we'll walk through in terms of how we've actually deployed it, realistically and concretely, within the way we work in our business; and then we'll wrap up with what doesn't work and where we see things going. Some of you here might recognize this chart from an organization called METR, which evaluates the ability of LLMs to complete a certain set of tasks. It very specifically measures the length of task that LLMs can complete with at least a 50% success rate. The takeoff rate is pretty significant here. Now, we think that's mostly because it's a verifiable domain, and as we all know, model capabilities are a little bit jagged.
So they perform very, very well in software development, but maybe not so well in non-verifiable or messier domains like knowledge work. We think it's a rough proxy for the coming disruption of professional services and knowledge work more broadly. Do we think the takeoff will be as steep as it has been in software engineering? Probably not, just because of the messiness of the real world, if you will. And for those of you not familiar, there are typically two main models for professional services. One is the junior-led model. This is where you have very senior individuals, and more junior individuals provide that leverage. So it's a lot of directing: okay, do this, and you throw 50 people at a problem, and they kind of figure it out, and probably waste some time in doing so. There's also the senior-led model: more senior folks who have 15 or 20 years of experience and are much more involved in the day-to-day. They're actually doing the work, delivering the work. This is the AlixPartners model, where there's a little bit less leverage, but you can deliver results a lot faster and more impactfully because it's senior-led. We think the future is probably somewhat of a hybrid, but because of model capabilities and how quickly they're advancing, it really favors those more experienced folks, people who have been in a particular domain or industry for 15 or 20 years. If you've listened to Dwarkesh Patel and his podcast, a fantastic podcast, he has this concept of an AI-first firm, where you can basically take the knowledge and start to replicate it out, so you can have 50 copies of the CEO, as an example. We think the future is something like that: you're basically replicating the knowledge and experience of more senior individuals, and you scale out that leverage below them using AI.
And the way we think about typical engagements, it roughly falls into these three different buckets. Not always, but just for demonstration purposes: there's a lot of upfront work initially. Whether it's an M&A transaction, a corporate investigation, or some type of due diligence, oftentimes you're left with a bunch of PDFs, databases, Excels, whatever it might be. There's just a lot of upfront work to understand what you've got: ingest the data, normalize it, categorize things, put it into a framework that you can then use to do what you do best, whatever that might be. If you're a private equity expert or an investigator, you typically have some type of playbook, and that's phase two, the black part of the chart: the analysis, the hypothesis generation. You're getting all that data into a format that you can take and use and derive some type of insight from. And all of that, of course, is in support of the last piece, which is what clients actually care about: you solving their business problem. That's the recommendation, the deliverable, the output, whatever it might be. That's the reason they hired you in the first place. We're seeing AI today significantly compressing, at minimum, that first part. So if it was 50% of the effort, maybe it's 10 to 20% today in terms of what's required from a human perspective just to get up to speed on the contents of a data room or whatever it might be. And it's not only that, because to date you've largely been limited by the throughput of human beings. Think of doc review as an example. Box is a perfect precursor to this talk, because that's exactly what they do. If you have 5,000 different contracts, think of how many people it would take if it takes 30 minutes to review each and every contract.
You have 5,000 of them, and you want to extract some type of information from them. You're inherently limited by either time or cost. So, inevitably, some type of prioritization occurs: you focus only on the top 20% or whatever it might be, the most valuable pieces of the data. With AI, that's completely changed. You can now look at 100% of the corpus of data, whatever that might be, and start to derive insights. You can apply your same methodology, your analysis, your insights to all of the data, because you're able to extract information from across 100% of the data set. So now you can look at 100% of the vendor contracts, 100% of the customer base. You can start to derive insights to identify savings opportunities, free up time to do more interviews, whatever it might be. You're freed up to do much more high-value work, and because it's done across 100% of the data instead of just the first 20 or so percent, the output is just that much better. So, to bring this to life a little bit, I'll turn it over to Mo to talk through some real-life examples. Thanks, Kevin. To motivate the use cases that we have, I want to start with the paradox that we face. Everyone's investing in AI: 89% of CEOs said they're planning to implement agentic AI, according to Deloitte. But we find ourselves in this paradox. The National Bureau of Economic Research says there's been no significant impact on earnings or recorded hours. BCG says three-quarters of companies have struggled to achieve and scale value with their GenAI initiatives. And finally, S&P Global said almost half of companies were abandoning their AI initiatives this year. So how is it that everyone's spending, but no one's seeing the value? We think there's a difference between employee productivity and enterprise productivity.
And so we want to talk about the use cases that we've found help drive enterprise productivity. The first example I want to start with is categorization, maybe trying to put a square peg in a round hole. How does this show up for us? Think IT support tickets: your laptop keeps restarting, and that needs to be triaged to the hardware department, so you need to categorize those tickets accordingly. Something closer to home: we analyze companies a lot, so we want to look at accounts payable or spend data across companies, and we need to say, for example, that United Airlines falls under travel. How was this done before? Does anyone remember word clouds? You'd have to build a machine learning model: stem your data, remove stop words, build a classifier, support vector machines, naive Bayes. It's a lot of work. Enter the new way: structured outputs. With structured outputs, you can get the answer a lot more easily. This is unsupervised learning. This is literally what that would look like: say you have a list of companies, like JD Factors, and you have to categorize them into a taxonomy. Here the taxonomy would be the North American Industry Classification System, the NAICS codes. Each code has a description, and in this case it would be something like other cash management. Now, JD Factors is probably not part of the foundation model's knowledge, so how do we ensure the classification works? Enter tool calls: you can run a web query to append information to each of these companies, and then categorize enormous volumes. This is what we've been doing, and we've had huge wins from it. What this has done is democratized access to text classification for us. I want to talk about the learnings we've had from deploying this surgically at our company. Enormous wins in speed and accuracy, but those accuracy gains have not come cheaply.
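To make the structured-outputs classification concrete, here is a minimal sketch in Python. The names (`VendorCategory`, `parse_classification`) and the two NAICS-style taxonomy entries are illustrative, not the firm's production code; in practice the JSON would come from a chat-completions call with a structured-output schema, where here it is hard-coded as an example reply.

```python
# Sketch: constrain a model's classification to a fixed taxonomy, then
# validate the structured-output JSON before trusting it.
import json
from dataclasses import dataclass

# Illustrative NAICS-style taxonomy (code -> description).
TAXONOMY = {
    "481111": "Scheduled Passenger Air Transportation",
    "522320": "Financial Transactions Processing and Clearing",
}

@dataclass
class VendorCategory:
    vendor: str
    naics_code: str
    naics_title: str

def parse_classification(raw: str) -> VendorCategory:
    """Validate one structured-output response against the taxonomy."""
    data = json.loads(raw)
    code = data["naics_code"]
    if code not in TAXONOMY:  # reject any code outside the allowed taxonomy
        raise ValueError(f"unknown NAICS code: {code}")
    return VendorCategory(data["vendor"], code, TAXONOMY[code])

# In production this JSON would be the model's structured output;
# here we hard-code a plausible reply for one vendor.
reply = '{"vendor": "United Airlines", "naics_code": "481111"}'
result = parse_classification(reply)
print(result.naics_code, result.naics_title)
```

Running the same validated call over a list of 10,000 vendors is what turns this from a demo into the batch-categorization workflow described above.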
This might be unsupervised learning, but it's not unchecked. We've had to build the right relationships with the business partners, who've worked hand-in-hand with us to ensure we get to the accuracy we wanted. What this does is convert skeptics into champions. We don't become snake-oil salesmen pushing and peddling AI; it becomes a pull from the firm, which asks us, hey, can you apply GenAI for us in these other initiatives? Which is really powerful. It's important to have business context embedded, and for us that lives in the taxonomies being used for classification. Everyone's talking about agents; well, you need to get the individual steps right first. What this does is build each individual step to a level of robustness and accuracy that we can daisy-chain into the agentic workflows we want. And finally, a callout: these results are stochastic, not deterministic. That comes with some risks; Kevin will talk more about those. Punch line here: we've been able to achieve 95% accuracy categorizing 10,000 vendors, doing in minutes what would have taken days, at an order of magnitude less cost. All right, next use case. This wouldn't be an AI conference if we didn't talk about RAG. So how do we see RAG at our firm? You get dumped with a bunch of data: here's 80 gigs of internal documents; what did Acme release in 2020? Or let's say you've got a court filing that you have to submit on Monday, and it's Friday. You might get asked a question: what are Acme's escalation procedures for reporting safety violations? How did we do this in the past? You'd have an index, a literal index. Someone would record in an Excel file which documents have been received, which haven't, and where they are.
Or, hopefully not, you'd use search, SharePoint search or something like that, which probably wouldn't find what you're looking for. What do we do now? We have an enterprise-scale RAG app. It has to handle hundreds of gigabytes of data: PowerPoints, documents, Excel files, CSVs, all sorts of formats, and huge volumes. What can you append to that? Tool calls to third-party proprietary databases. Let me talk about that for a second. What are the trade-offs we've seen? Sorry, I'm going really fast; we're short on time. The wins and the losses: RAG is invaluable at consulting companies, because you get dropped onto a project really quickly and have to get up to speed, so it ends up being really valuable. But I want to call out the part about teaching LLMs APIs. Typically, certain data sources would be siloed behind organizations that held the licenses: information would have to be pulled from a web UI, then emailed to a certain team, and then that team would analyze the Excel. What we did was take the API spec, embed it, and teach the LLM how to call the API. We have democratized access to information that would otherwise have taken days for people to get, really condensing the time, as Kevin said before, so projects can focus on the high-value work. The last thing to call out about RAG is that it serves as a substrate onto which you can tack a number of GenAI features, and that's proven really valuable at our firm. A few callouts: people have high expectations of what they'll get from a prompt box. If you say "reason across all documents," that's just not how RAG works. So we have to build those solutions step by step, and it's a long journey that we have to go on, and we're excited to be on it. With that, over to Kevin for the third use case. Yeah. Oh, thanks.
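The "teach the LLM an API" pattern above can be sketched as turning one endpoint from the provider's API spec into an OpenAI-style tool definition and dispatching the model's tool call to a real client. The endpoint name (`lookup_company`), its fields, and the stubbed client below are all invented for illustration; the real code would call the licensed provider's HTTP API.

```python
# Sketch: one endpoint from an API spec, expressed as a tool the model
# can call, plus a dispatcher that routes the model's tool call.
import json

TOOL_DEF = {
    "type": "function",
    "function": {
        "name": "lookup_company",
        "description": "Fetch company financials from the licensed database.",
        "parameters": {
            "type": "object",
            "properties": {
                "ticker": {"type": "string", "description": "Company ticker symbol"},
                "year": {"type": "integer", "description": "Fiscal year"},
            },
            "required": ["ticker", "year"],
        },
    },
}

def lookup_company(ticker: str, year: int) -> dict:
    # Stub for the licensed data source; real code would hit its HTTP API.
    return {"ticker": ticker, "year": year, "revenue_usd_m": 1234}

def dispatch(tool_call: dict) -> str:
    """Route a model-emitted tool call to the matching Python function."""
    args = json.loads(tool_call["arguments"])
    if tool_call["name"] == "lookup_company":
        return json.dumps(lookup_company(**args))
    raise ValueError(f"unknown tool: {tool_call['name']}")

# A tool call shaped the way the model would emit it:
call = {"name": "lookup_company", "arguments": '{"ticker": "ACME", "year": 2020}'}
print(dispatch(call))
```

The point of the pattern is that `TOOL_DEF` is derived mechanically from the API spec, so the web-UI-to-email-to-Excel loop collapses into a single model turn.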
So it's a good thing Box went before us, because they covered a lot of the advantages of the fundamental ability to take unstructured data and create structure from it. It is an unbelievably powerful concept. It's very simple on its face, but it is incredibly powerful in an enterprise context, because you can take something like this credit agreement, 50 or so pages of PDF, and very quickly extract useful information: contract parties, maturity date, senior lenders, whoever that might be. You see folks like Jason Liu: "Pydantic is all you need." It is still true. It is still all you need. Fundamentally, and Box went through a lot of this, it's combining a document with a schema, with an LLM, with some validation and scaffolding around it, to make sure you're pulling out the values you need. The business value really is in the schema: what you're extracting and why you're extracting that information. It's the flexibility that is really powerful here, because you can reapply it across different types of engagements. An investigation might be looking at something entirely different than an M&A transaction, and this fundamental capability can span across all of those. And the power is there at the bottom: you can do this repeatedly across multiple documents, up to thousands, tens of thousands, hundreds of thousands of documents, where a human review might take days or weeks. Using an LLM, you can get it down to minutes. It's incredibly powerful. In terms of user trust, we're not only using external sources like Box and others; we've also rolled our own internally.
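The document-plus-schema-plus-validation recipe can be sketched as follows. The field names and the example model output are illustrative, not the firm's actual extraction schema; the key idea is that typed validation sits between the LLM's JSON and anything a user sees.

```python
# Sketch: a typed extraction schema for a credit agreement, with
# validation of the LLM's JSON output before it reaches users.
import json
from dataclasses import dataclass
from datetime import date, datetime

@dataclass
class CreditAgreementFields:
    borrower: str
    senior_lender: str
    maturity_date: date
    interest_rate: str

def validate_extraction(raw: str) -> CreditAgreementFields:
    """Parse one model response and enforce types (e.g. a real date)."""
    d = json.loads(raw)
    return CreditAgreementFields(
        borrower=d["borrower"],
        senior_lender=d["senior_lender"],
        maturity_date=datetime.strptime(d["maturity_date"], "%Y-%m-%d").date(),
        interest_rate=d["interest_rate"],
    )

# Example model output for one document in a large batch:
raw = ('{"borrower": "Acme Corp", "senior_lender": "First Bank", '
       '"maturity_date": "2027-06-30", "interest_rate": "LIBOR + 1%"}')
fields = validate_extraction(raw)
print(fields.maturity_date.year)
```

In production this would typically be a Pydantic model rather than a plain dataclass, and the same `validate_extraction` call would run across thousands of documents.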
And in terms of exposing some of the model internals to users, to give them an off-ramp for understanding where the model is more or less confident, we use the log probs returned from the OpenAI API and align them with the output schema from structured outputs. So we ignore all the JSON syntax, we ignore the field names themselves, and we home in on the values alone. In this case, the green box above: the interest rate of LIBOR plus 1% per annum. That's the field we want. We basically take the geometric mean of the log probs associated with those tokens in particular and use that as a rough proxy for the model's confidence in producing that output. So the green and yellow boxes you saw at the beginning are a direct reflection of that confidence level. It's a relatively intuitive way for users to get an understanding of the model's confidence, again for human review to the extent that's needed. I won't go through all of these, but fundamentally, like I said, it is magic when it works, and it works at scale. It is a total unlock, particularly for non-technical folks who are not up to speed on the capabilities of LLMs. Being able to do this is a light-bulb moment for them, and it really is a game-changer. Now, that being said, there's a lot of work to be done in terms of validation. You saw all the work that Box and others have done to get it to a level of rigor that users can trust, and that's really a key tenet of all this. And so, finally, I'll turn it to Mo for the must-haves. Just a couple of quick callouts. I know this is a tech conference, but a lot of this, to get it to work at the enterprise level, requires people skills and working closely with the organization. There are a couple of things I want to call out that have been really important for scaling our GenAI initiatives at our firm. The first one is demos.
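The confidence heuristic Kevin describes, the geometric mean of the token probabilities for just the value tokens of a field, reduces to one line of math: the geometric mean of probabilities is the exponential of the mean of the log probs. The example log-prob values below are made up for illustration.

```python
# Sketch: per-field confidence from token log probabilities.
import math

def field_confidence(logprobs: list[float]) -> float:
    """Geometric mean of token probabilities = exp(mean of log probs)."""
    if not logprobs:
        return 0.0
    return math.exp(sum(logprobs) / len(logprobs))

# Hypothetical log probs for the tokens spelling "LIBOR + 1% per annum":
value_token_logprobs = [-0.01, -0.05, -0.2, -0.02]
conf = field_confidence(value_token_logprobs)
print(round(conf, 3))  # -> 0.932; could render as a green highlight
```

A threshold on this score (say, green above 0.9, yellow below) is enough to drive the color-coded review UI described in the talk.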
We prototype in Streamlit, but we build in React. And we have a constant cadence, once a month, where we show the latest and greatest of what we're building. This inspires the firm about what we're able to build, and it keeps investment in our initiatives going. The second thing is, you know, there's always the next shiny thing: agents, MCP, the latest model. NPS is our metric, ROI is our metric, and that is hard-earned, one bug fix at a time. I'll skip the other one; partnerships are really important, it's a shared journey. And I think we're out of time, but I'll leave you with this: once Excel-powered LLMs actually work, we will be at AGI. So I'm looking forward to that next talk. Thank you. Thank you.