Resolve AI CEO Spiros Xanthos: AI for Prod, Multi-agent Architectures, Engineering's Future
Channel: Alex Kantrowitz
Published at: 2025-12-23
YouTube video id: eyexdmlJUk4
Source: https://www.youtube.com/watch?v=eyexdmlJUk4
Why has AI generated so much code but so few products running in production? Let's cover it with Spiros Xanthos, the founder and CEO of Resolve AI, who's here with us in a conversation brought to you by Resolve AI. And Spiros, it's great to see you. >> Great to see you, Alex. Thanks for hosting me. >> My pleasure. So, look, we're going to have a conversation about the missing element for AI coding. There's a lot of AI code out there, but what happens once that code comes out? You have a company that starts to handle that with AI. I want to start broad. Can you give me your perspective on the state of AI technology today? Where is it? Where is it heading? And what are the key building blocks it needs to get better? >> First of all, I think that AI is real in a big way. I do think it is probably the technological wave, at least in our lifetimes, that might have the most economic impact, in a very positive way, for humanity. I think that's going to happen by creating productivity gains and by creating a lot more technology, and all that technology, in my opinion, is going to make things that were impossible before, or very expensive, or very hard, a lot more accessible. Now, of course, when you have a technology evolution of this kind, you end up having a lot of hype as well, and maybe not all the ideas being funded or created are great ideas. But I don't think that's a terrible thing either. In the long run, I think it's real, the impact is real, and the good solutions and good ideas will prevail. As for where we are, I do believe we're still on an exponential improvement curve with AI.
I think the models keep improving quite a bit. There was a concern, maybe a year ago, about whether the improvement in models was stalling. That hasn't happened; in fact, the models kept accelerating and getting better. I also think that over the past 12 months we saw the development of very effective agentic solutions, definitely in software and coding in particular, where the impact is real and very visible. Two years ago, effectively nobody was using AI in coding. Then, with GitHub Copilot in particular, everybody started doing it, and for the last 12 months nobody is really writing code without an agent assisting them. What we will start seeing in 2026 is that paradigm showing up in other parts of software and in other industries. We already see it in customer service and a lot of other business process automation. And of course the impact on our own personal lives as consumers of AI is going to be significant as well. >> Let's talk about why, because code is text, it has specific answers, and for an AI system that's probabilistic, you can try some things and reinforce on what works. When someone pushes something, that's a pretty good indication you did a good job and should do more of that. All these other disciplines are more open-ended. So is it just going to take a little longer? Maybe that's why code was first, but how do you see that evolving? >> That's a very valid point. One of the ways we can make models and agents improve at how they perform tasks in a certain domain is by having a very clear reward function,
and enough cases on which we train the model or the agents. With code, we have created the ability to have a very clear signal when somebody uses the product. But in my opinion, the other thing that happened with code, in addition to training the models on code because there's a lot of code available, is that we built products, we built IDEs, that users started using even when the answer was not perfect. That started generating a lot more data on how users actually respond to these things: when do they accept the change and when do they not accept it? That itself is very valuable data, and it creates a data flywheel that then goes back and improves the way we generate code. I do think that paradigm is applicable to other domains. You have to start with something that is useful, where humans are in the driver's seat, engineers in our case, and as long as it's a better, faster way of working than without the tool, you can also create this data flywheel where we get some sort of signal on whether what the AI did was accepted by a human. That creates an improvement in the next cycle and a lot more automation over time. The way we see it at Resolve AI, the primary way engineers use Resolve AI is to put us on call instead of them, so they don't wake up in the middle of the night when something goes wrong with software. Resolve does the triaging and troubleshooting and suggests remediation.
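The accept/reject flywheel Spiros describes can be sketched in miniature. Everything below is hypothetical for illustration (the class name, the event shape); it is not Resolve's actual system, just the idea that each human accept/reject decision on an AI suggestion becomes a coarse reward signal:

```python
from dataclasses import dataclass, field

@dataclass
class FeedbackLog:
    """Collects accept/reject decisions on AI suggestions to drive a data flywheel."""
    events: list = field(default_factory=list)

    def record(self, suggestion_id: str, accepted: bool) -> None:
        # Each time a human accepts or rejects an AI suggestion, log it.
        self.events.append({"id": suggestion_id, "accepted": accepted})

    def acceptance_rate(self) -> float:
        # Fraction of suggestions accepted: a crude reward signal that
        # could feed back into training or evaluation.
        if not self.events:
            return 0.0
        return sum(e["accepted"] for e in self.events) / len(self.events)

log = FeedbackLog()
log.record("fix-123", accepted=True)
log.record("fix-124", accepted=False)
log.record("fix-125", accepted=True)
print(log.acceptance_rate())  # 2 of 3 accepted
```

In a real system the signal would be richer (which part of a diff was kept, how long the fix survived), but the loop is the same: ship something useful, observe the human's verdict, and fold that back into the next model iteration.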
So we have created the ability to have this kind of reward function, or ground truth, that helps us become better and better in the general sense, but maybe more importantly within a specific organization, where we learn patterns and extract knowledge that humans might otherwise hold. >> Right. You mentioned this on-call situation that happens for engineers, and we should talk about it. For those who don't know: if you're at a tech company, somebody is typically on call at night or over the weekend, and your job, in case something goes down, is to go fix it. These people have oftentimes saved services, so that when people wake up, or even when they're using things on the weekend, even if there's a problem, they're able to solve it. Now, the problem is that it's costly for companies and it ruins the quality of life of engineers. I know many who have suffered through this. And that really raises a question about what happens with AI code, because we know AI is good at generating code, as we've talked about already. The issue is that when you put it into production, you're creating more work for the people who have to go monitor it. When we talk about finding an ROI from AI, that really diminishes the possibility, because you've reduced the amount of work needed for one thing but created multiples of new work for others. So talk a little bit about your view on this and how it might be solved. >> Yeah. Starting with your intro to the question: the world runs on software, and there are people behind all this software who oftentimes have to spend nights and weekends troubleshooting and maintaining it to ensure the reliability and availability of that software.
To your question, though: it is very clear that AI has a lot of impact in generating code, but at the same time, a lot more code without addressing the subsequent steps is almost a liability, not an asset, for a company. Because at the end of the day, what you want to do is create technology faster, but also create it in a way that you can reliably deliver to your users, whoever those might be. And now, in addition to having a lot more code, because this code is generated by AI, it is becoming harder for software engineers to troubleshoot, maintain, and improve that code once it gets to production. There are studies out there showing that incidents per new change have increased quite a bit as AI is being used, and on top of that, we're probably less familiar with the code that AI created. So to me, we're not going to be able to really move faster in technology just by producing more code. In fact, it might get in the way of moving faster, because we're going to have less confidence in how to maintain and run the system reliably. The answer, obviously, is not to go back and stop generating all this code. The answer for me is more AI, but now applied to the next step: all this code is generated, and we need models and agents that can monitor, maintain, improve, and troubleshoot it when something goes wrong, so that the whole thing can move maybe 10 or 100 times faster. Because if half the process gets five times faster but the next step doesn't move five times faster, you're really not improving velocity that much. That's exactly the area where Resolve is focusing, and we do think it's going to be very impactful,
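The velocity point here is essentially Amdahl's law: speeding up only one stage of the pipeline caps the overall gain. A quick back-of-the-envelope calculation, using the interview's own numbers (half the work gets five times faster, the rest stays the same), makes it concrete:

```python
def overall_speedup(fraction_accelerated: float, stage_speedup: float) -> float:
    """Amdahl's law: total speedup when only a fraction of the work is accelerated.

    fraction_accelerated: share of total work that gets faster (0..1)
    stage_speedup: how much faster that share becomes
    """
    return 1.0 / ((1.0 - fraction_accelerated) + fraction_accelerated / stage_speedup)

# Code generation is half the effort and becomes 5x faster,
# but running and maintaining the code stays the same speed:
print(round(overall_speedup(0.5, 5.0), 2))  # ~1.67x overall, not 5x
```

The 50/50 split is an assumption for illustration, but the shape of the result is the point Spiros is making: until the operations side also accelerates, overall velocity barely moves.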
both because of the current state, but more so because of the state we're moving into, with a lot of the code being generated by AI. >> Okay. So you're effectively an AI solution that handles all those things about keeping the product running, and it doesn't have to be AI-generated code. You've built a product that can go investigate the codebase and look for errors. Does it then have authorization to go out and fix them? >> That's a very good question, because the other thing that happens when you deploy code and run it in your production system, as we call it, is that there is a lot of sensitivity, both around the data and also around something going wrong. Humans can make mistakes, of course, and they can cause challenges and outages and take down a service, but AI can potentially do the same. In my view, the way we do this is by building AI that focuses a lot on trust: trust for software engineers, or operators in any domain for that matter. The way I think about it is almost like self-driving cars. We don't let self-driving cars on the street unless they have proven, with data, that they can drive better than human drivers, and there are also different levels of automation in driving. I think the same thing is going to happen with AI in many domains, and definitely in this one. Initially, most people allow the AI to go do the work, do the investigation, report back the findings and the proposed solution, and then a human engineer has to decide. In the next step, the AI is going to be able to take some of these actions on its own, as long as they are not too risky or they're reversible.
And eventually we're going to get to a level-five self-driving kind of situation, where AI should be able to solve problems that humans cannot solve, make changes, and move a lot faster. That's really what's going to allow us to build a lot more technology a lot more quickly. >> How far away do you think that level five is for your clients? Right now, if I'm reading it right, your software will investigate errors when they come up, push its findings to a person, and say, hey, I think this is probably what it is. And what do they do, press a button and it gets fixed? I'm curious to hear that, and also how far away the AI is from saying, I'll go in and do it myself. >> I think there are two aspects to this question. There is the capability, the reasoning ability of AI, to be on par with software engineers, or better for that matter. And then there is the whole compliance framework you have to have in place for when you let an AI do all that work, and what happens when something goes wrong. I think we need to figure out both, but I'll answer more from the capability perspective. It is very hard for us to imagine the future when we're still on an exponential curve, because that's very unusual for how we think and how things improve. I do believe we are probably a year away from AI becoming the driver of operating software, the same way agents are the primary producer of code today.
I think we're going to be at the same place a year from now, where humans are operating at a higher level of abstraction, still overseeing that AI and making most of the final decisions. But probably in two to three years, we're going to be at a place where AI is making most of these decisions and humans are delegating high-level decision frameworks or tasks to the AI. So level five, maybe, is difficult to predict, but the level below is probably going to happen in the next two years. >> Okay. So you said we're on an exponential, and clearly things have moved fast. "Exponential" might be the word everybody in AI loves the most, and how can you blame them, given the way things have gone recently. But there's been this theory that the models are all commoditizing, leveling out, saturating. So where do the gains come from next? I think you're the perfect person to ask, because you are tackling a really tough, meaty problem. I doubt you can just throw a model at a codebase and get a standard instant response that says, hey, it's probably this. I imagine you're deep into things like orchestration, where you've got multiple models running and checking each other's work. So I'd love to hear your firsthand experience of how that's working, and whether you think the exponential is going to continue this way: not bigger models, though maybe that's part of it, but taking multiple models and multiple programs that work with them and getting them to check their work and build on top of each other.
>> Yeah, I do believe the foundational models are going to keep improving at quite a fast pace. I think there is probably more we can do there with more data and probably algorithmically. But that's not the only way, even if that's how things have improved the most so far. I think the application layer matters quite a bit here, and the application layer does not mean simply building applications, but taking the domain and adding it to the model as well. Generally speaking, one of the reasons we sometimes say AI did not have as much impact in businesses is that models are accessible to everybody, but there are not that many applications that have gone deep to understand the domain or the business specifics. Most products are a bit thin in how they deal with the last mile. I do believe, and I've seen many examples of this, that to be very successful you need to incorporate a lot of the specifics, the tribal or institutional knowledge, into the product or the model. I think that's what needs to happen next. The way we do it in our domain is, first of all, we had to build this multi-agent system that can use the tools software engineers use themselves, because that's how the world looks today. The world was not built for agents, and until we transition to a new set of tools that are built for agents, we need to be able to use the human tools. Then we need to be able to collaborate with humans very well, so the agents have to have the right interface towards humans. And then we need to be able to discover a lot of the knowledge that exists in those environments, and that knowledge is oftentimes written down in documents.
Sometimes the documents are outdated, so they're wrong, but the knowledge is also oftentimes in human minds, and the AI has to collaborate with humans so that over time it learns from them. That requires this multi-agent system where different agents have different responsibilities: how they do the work, how they use the tools, how they coordinate with each other, but also how they communicate with engineers. That's what makes it very hard, because it's not simply a model problem. It is a multi-agent orchestration, planning, and reasoning problem, and oftentimes an agent has to take many, many steps, sometimes hundreds, so you run out of context and start hitting a lot of the model problems as well. To step back: what's going to happen, at least in our domain, and I think it generalizes to other domains, is that we need to build deep agentic applications that understand both the domain and the customer context very well. And a lot of that means pushing more innovation into the model, to deal with much more data and much more context, and to plan and reason over longer-horizon tasks, as we call them, where sometimes models lose track of what they're trying to do. >> Can I ask you about this? I get what multi-agent systems are supposed to be doing, but I imagine that if one agent breaks, the whole thing goes down. Trying to get a chatbot to do the one thing you want is sometimes a challenge. So how are you getting multi-agent processes and workflows to work? It seems to me like a very difficult problem. >> Yes. Maybe to use an example: let's say you have two agents that have to collaborate.
Maybe one agent goes and understands documents, and maybe there's another agent that goes and takes actions in a codebase, to be able to perform a task. If you have a situation with these two agents, and oftentimes you have more than two, maybe five or ten, you have to figure out the plan you want to execute. How do you coordinate the work across them? Then they have to do the work and communicate back to each other what they learned. To do that reliably, you have to have many layers of guardrails. Not guardrails in the sense of preventing the models or agents from doing something wrong; oftentimes what we do is have one agent do some work, have another agent review that work and provide feedback, and have the first agent iterate. And this expands across multiple agents: they all do the work and produce an outcome. In our case, an agent goes and investigates an incident and produces an analysis of what happened and how to fix it, then another agent reviews that work and forces the first agent to go back and redo it if it finds a hole in its reasoning. So you have to stack many things together, controls and checks and validation, to produce a very reliable system. The other thing that happens over time is that you gather a lot more data about what works: the sequence in which you do something and it works. This goes back to the earlier discussion we had about ground truth and a reward function.
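The worker-plus-reviewer pattern described here can be sketched as a small loop. The function names and the toy agents below are made up for illustration (in a real system both roles would be LLM calls, and "approval" would be a structured critique), but the control flow is the one Spiros describes: draft, review, iterate until approved or a round budget runs out:

```python
def run_with_review(worker, reviewer, task, max_rounds=3):
    """One agent produces work; a second critiques it; the first iterates
    until the reviewer approves or the round budget is exhausted."""
    draft = worker(task, feedback=None)
    for _ in range(max_rounds):
        approved, feedback = reviewer(task, draft)
        if approved:
            return draft
        draft = worker(task, feedback=feedback)  # redo with the critique
    return draft  # best effort after max_rounds

# Toy stand-ins: the reviewer insists the analysis name a root cause.
def worker(task, feedback):
    if feedback is None:
        return f"{task}: analysis"
    return f"{task}: analysis + root cause"

def reviewer(task, draft):
    ok = "root cause" in draft
    return ok, (None if ok else "identify the root cause")

print(run_with_review(worker, reviewer, "incident-42"))
# → incident-42: analysis + root cause
```

The round budget matters: without it, two disagreeing agents can loop forever, which is one way a multi-agent system "breaks" in practice.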
The more of these trajectories, as we call them, that we have, showing how the work was done successfully, the more we can go back and improve the model in the first place, and that gives you a lot more room at the application layer to go solve harder problems. >> So you have an orchestrator in the middle that's dispatching these different bots, and then they come back and show their work, and it checks and does the necessary things to make sure everything is in good shape? >> Effectively, you have an agent that manages the other agents. It makes decisions, decides what to tell them and what not to tell them, depending on the task at hand, so as not to confuse them. That agent checks its own work, but then maybe there is another supervisor, if you wish, that also does spot-checking and validation of the work. So there is a lot of that kind of delegation and checking. >> It's amazing what you guys have come up with, what people come up with in these systems. Let me ask you: for the orchestrator, do you need a smarter model? Do you go with the smartest foundational model as the orchestrator and then send more specialized, or a little dumber, open-source models to go out and do some of these other things? >> Yeah, that's valid, because if you think about it, for a specific task that takes one or two steps, a model that doesn't have to reason a lot, or over a long set of tasks, is sufficient. But if you have something that has to plan across many agents and many steps, you usually want the most capable model in terms of reasoning, which usually means a very big model that is also most likely a closed, proprietary model. >> Interesting. So that's the mix that you have. >> Yes.
In some sense, to generalize: at the top level you usually have the most capable model, usually also the most expensive and the biggest, and then the underlying tasks can be performed by either fast closed-source models or even open-source models that you post-train for the task. If you think about it, the long-run question becomes: for a given domain, is there going to be a specialized large model that can reason very well, maybe owned by a company that just does that? Or are we going to have the current situation, where horizontal models are applied across domains? To be honest, what do you think? My intuition is that for large domains like software, or customer service, because there is a lot of economic impact, it does make sense to invest in a model that's domain-specific and very good at what it does. I don't think you have to start from zero; you take all the capabilities that exist in a larger model. But I do think we're going to end up in a situation where the most capable model in a domain is going to be a specialized model. >> That's really interesting. All right, let's talk a little bit about culture. These tools have come fast and furious, and a thing we hear often is that there's this capability overhang: the models can do more than what people are taking advantage of, and people are asking GPT-5.2 the same questions they were asking GPT-3. It's interesting from your perspective. You're working with engineers, and these people generally tend to appreciate what technology can do. What has the interest in adopting this stuff been like?
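The routing policy described here, the most capable model for long-horizon planning and cheaper or post-trained models for short subtasks, can be sketched as a simple dispatcher. The model names, step counts, and task list below are placeholders invented for illustration, not real endpoints or Resolve's actual plan format:

```python
def pick_model(task_steps: int) -> str:
    """Route long, planning-heavy work to a big proprietary reasoner and
    short, well-scoped tasks to a cheaper specialized model.
    Both names are placeholders, not real model endpoints."""
    if task_steps > 10:
        return "large-proprietary-reasoner"
    return "small-specialized-model"

class Orchestrator:
    """Top-level agent: itself backed by the most capable model,
    it plans subtasks and assigns each one an appropriately sized model."""

    def plan(self, incident: str) -> list:
        # A real planner would be an LLM call; these are hypothetical subtasks
        # with rough step-count estimates.
        return [("read-docs", 2), ("query-metrics", 3), ("propose-fix", 15)]

    def dispatch(self, incident: str) -> dict:
        return {name: pick_model(steps) for name, steps in self.plan(incident)}

print(Orchestrator().dispatch("checkout-latency"))
# → {'read-docs': 'small-specialized-model',
#    'query-metrics': 'small-specialized-model',
#    'propose-fix': 'large-proprietary-reasoner'}
```

The design choice mirrors the interview: pay for heavy reasoning only where the plan actually needs it, and let cheap models handle the one-or-two-step tool calls.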
And has there been any sort of: oh, if we use this AI system that makes sure we stay up when there's a problem, then what are we going to do? >> This is a valid question and concern. I do think engineers are maybe the earliest adopters you can find. That generally helped in getting AI widely adopted for coding, and now for the subsequent steps of software engineering, as we see with the adoption of Resolve AI. But there are still two things happening. There is some natural resistance to change; it's difficult for humans to change their habits very quickly, and different people with different personalities are sometimes more resistant to change. So that is definitely an issue. And I do think there is sometimes a concern about people's jobs. My perspective is the following, especially in software engineering, where we have some of the most highly paid professionals in the world: I don't think the end state is going to be one where we have fewer software engineers, but that's also not the optimization formula, in my opinion. It almost doesn't matter whether we have all these highly paid people or fewer of them. What we should be optimizing for is: can we produce technology a lot faster, in a way that in the end benefits the entire world, like I said, by solving harder problems, or problems that are maybe too expensive today? In that sense, yes, we have to push through this resistance, because I do think the end goal is beneficial to everybody. >> When I speak with engineers, we have great conversations about this stuff. It's almost like you can speak with an engineer and get a sense of what the future's going to be like, because they have the most tooling right at their fingertips, and
they can tell you: oh, it's doing well here, or it's having this effect on me, or we're doing this in our office, or we couldn't get this done. One of the things I've heard is people thinking: well, if coding gets handed to the AIs, then people will kind of lose the ability to code, and it's fine until something breaks. I always thought, maybe engineers will effectively become the auditors and the fixers of the AI-produced code. But again, you have much more code because of AI, and you've built this solution that says: we'll help with the auditing and the fixing with technology itself. So if this works the way you anticipate, where do you think that leaves the end state of engineering skills? Do you think engineers risk having some of those skills atrophy if the AI does a good enough job? >> Yeah. As with every technological evolution, we get a lot of leverage as humans to produce outcomes a lot faster. We don't resist using machines in other domains, and I think the same applies to software. Software over the last 50 years moved from very low-level coding for the machine itself to different layers of abstraction, with operating systems and high-level languages. I think AI is just another abstraction, and I don't think the concern is whether engineers will atrophy in the skills to produce or run code. I think the real answer is that we should produce agents that do both parts very well.
They should produce code very quickly, but they should also be able to run, maintain, improve, and troubleshoot that code the same way. And engineers will, very quickly in my opinion, be operating at a higher level of abstraction, where they won't have to worry about the low-level, bespoke specifics of the tools at their disposal, like query languages, or exactly how to call this API or this CLI to get to an answer. All this heavy lifting and stressful work is going to be done by AI, and we're going to be operating at a level above. I think that's desirable, and I don't think it is a risk or a concern. >> Okay, here's my last question for you. We've talked a lot about how your AI is able to monitor codebases and alert companies when a fix needs to happen, and maybe propose that fix. Can you give me one concrete example? I'd love to hear an everyday use case where this has really worked well, maybe where it's woken up an engineer but they've just been able to hit a button and fix the thing. Can you tell us an example with some specific products and names of companies? >> Yes, 100%. I will tell you that our own engineers share many stories internally at Resolve of how they wake up in the middle of the night and don't have to actually go to their laptop. They look at their phone, see what Resolve says, and then go back to sleep.
And I've heard this from many customers. In the real world, we have customers like Coinbase, DoorDash, and Salesforce, and the first two, as consumer companies, are products everybody's using on a daily basis. So it is not uncommon that a change happens: maybe somebody develops a new feature and pushes some new code to production that has unintended consequences, which end up, many layers down, producing an error for an end user. What Resolve will do is go identify it. Maybe it starts with an error a user is seeing, or some problem we're having. Resolve will traverse and walk the path of the infrastructure, figure out that this is happening in a particular application or service, review all the errors, connect those errors to the new changes, go see the code that was created that is causing the problems, and essentially describe to you the entire cycle. Maybe it tells you: hey, you have to undo this code change, and everything's going to go back to normal. >> Okay, Spiros. At the end of these interviews, I typically ask our guests to tell folks where they can find them, but it's been on your sweatshirt here. resolve.ai. That's the URL, right? >> Correct. It's resolve.ai. Yes. >> Okay. All right, folks. If you want to learn more, check out resolve.ai. And Spiros, hey, this was our first conversation. It was great to get a chance to know you, and I hope to talk more. >> Thanks, Alex. Same. >> All right, everybody. Thanks for watching, and we'll see you next time.