Resolve AI CEO Spiros Xanthos: AI for Prod, Multi-agent Architectures, Engineering's Future

Channel: Alex Kantrowitz

Published at: 2025-12-23

YouTube video id: eyexdmlJUk4

Source: https://www.youtube.com/watch?v=eyexdmlJUk4

Why has AI generated so much code but fewer products running in production? Let's cover it with Spiros Xanthos, the founder and CEO of Resolve AI, who's here with us in a conversation brought to you by Resolve AI. Spiros, it's great to see you.
>> Great to see you, Alex. Thanks for
hosting me.
>> My pleasure. So, look, we're going to have a conversation about the missing element for AI coding. There's a lot of AI code out there, but what happens once that code comes out? You have a company that starts to handle that with AI. I want to start broad: can you give me your perspective on the state of AI technology today? Where is it? Where is it heading? And what are the key building blocks it needs to get better?
>> First of all, I think AI is real in a big way. It is probably the technological wave in our lifetimes that will have the most positive economic impact for humanity, and that's going to happen by creating productivity gains and a lot more technology. All that technology, in my opinion, is going to make things that were impossible, very expensive, or very hard before a lot more accessible. Of course, when you have a technology wave of this kind you also get a lot of hype, and maybe not all the ideas being funded or created are great ideas. But I don't think that's a terrible thing; in the long run the impact is real, and the good solutions and good ideas will prevail.

As for where we are, I do believe we're still on an exponential improvement curve. The models keep improving quite a bit. There was a concern, maybe a year ago, about whether the improvement in models was stalling; that hasn't happened. In fact, the models kept accelerating and getting better. Over the past 12 months we also saw the development of very effective agentic solutions, and in software and coding in particular the impact is real and very visible. Two years ago effectively nobody was using AI for coding. Then, with GitHub Copilot in particular, everybody started, and for the last 12 months nobody has really been writing code without an agent assisting them. I think in 2026 that paradigm is going to show up in other parts of software and in other industries. We already see it in customer service and a lot of other business-process automation, and the impact on our own personal lives as consumers of AI is going to be significant as well.
>> Let's talk about why code was first. Code is text, it has specific answers, and for an AI system that's probabilistic you can try things and reinforce on what works. When someone pushes something, that's a pretty good indication you did a good job and should do more of that. All these other disciplines are more open-ended. So is it just going to take a little longer, and maybe that's why code was first? How do you see that evolving?
>> That's a very valid point. One of the ways we can make models and agents improve at tasks in a certain domain is by having a very clear reward function, and enough cases on which to train the model or the agents. With code, we've created the ability to have a very clear signal when somebody uses the product. But in my opinion the other thing that happened with code, in addition to training the models on code because there's a lot of code available, is that we built products, we built IDEs, that users started using even when the answer wasn't perfect. That started generating a lot more data about how users actually respond: when do they accept the change and when do they not? That itself is very valuable data, and it creates a data flywheel that then goes back and improves the way we generate code.

I do think that paradigm is applicable to other domains. You have to start with something useful where humans are in the driver's seat, engineers in our case, and as long as it's a better, faster way of working than not having the tool, you can create this data flywheel: you get some sort of signal about whether what the AI did was accepted by a human, and that creates an improvement in the next cycle and a lot more automation over time. The way we see it at Resolve AI, the primary way engineers use Resolve is to put us on call instead of them, waking up in the middle of the night when something goes wrong with software. Resolve does the triaging and troubleshooting and suggests remediation. So we've created the ability to have this kind of reward function, or ground truth, that helps us become better and better in the general sense, but maybe more importantly within a specific organization, where we learn patterns and extract knowledge that otherwise only humans might have.
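To make the flywheel concrete, here is a minimal Python sketch of the accept/reject loop described above: each suggestion the agent makes gets a human accept or reject, and those decisions become labeled examples for the next training or evaluation cycle. The `Suggestion` and `FeedbackStore` types are hypothetical illustrations, not Resolve AI's actual data model.

```python
# Hypothetical sketch of the accept/reject data flywheel described above.
# Names (Suggestion, FeedbackStore) are illustrative, not a real Resolve AI API.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Suggestion:
    incident_id: str
    diagnosis: str                     # what the agent thinks went wrong
    remediation: str                   # what the agent proposes to do
    accepted: Optional[bool] = None    # filled in once a human responds


@dataclass
class FeedbackStore:
    records: list = field(default_factory=list)

    def log(self, suggestion: Suggestion) -> None:
        self.records.append(suggestion)

    def reward_dataset(self) -> list:
        """Turn human accept/reject decisions into labeled examples
        that can later feed training or evaluation."""
        return [
            (s.diagnosis, s.remediation, 1.0 if s.accepted else 0.0)
            for s in self.records
            if s.accepted is not None
        ]


store = FeedbackStore()
s = Suggestion("inc-42", "connection pool exhausted", "raise pool size, restart service")
s.accepted = True   # the on-call engineer approved the remediation
store.log(s)
print(store.reward_dataset())
```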
>> Right. You mentioned this on-call situation that engineers deal with, and we should talk about it, because for those who don't know: at a tech company, somebody is typically on call at night or over the weekend, and your job, in case something goes down, is to go fix it. These people have often saved services so that when people wake up, or are using things over the weekend, even if there's a problem it gets solved. Now, the problem is it's costly for companies and it ruins the quality of life of engineers; I know many who have suffered through it. And that really is the question about what happens with AI code, because we know AI is good at generating code, as we've talked about already. The issue is that when you put it into production, you're just creating more work for the people who have to monitor it. When we talk about finding an ROI from AI, that really diminishes the possibility, because you've reduced the amount of work needed for one thing but created multiples of new work for others. So talk a little bit about your view on this and how it might be solved.
>> Yeah. Starting with your intro to the question: the world runs on software, and there are people behind all this software who often have to spend nights and weekends troubleshooting and maintaining it to ensure its reliability and availability. To your question, it's very clear that AI has a lot of impact in generating code, but a lot more code, without addressing the subsequent steps, is almost a liability for a company, not an asset. At the end of the day what you want is to create technology faster, but also to deliver that technology reliably to your users, whoever they might be. And in addition to having a lot more code now, because this code is generated by AI it's becoming harder for software engineers to troubleshoot, maintain, and improve it once it gets to production. There are studies showing that the rate of incidents per new change has increased quite a bit as AI is being used, and on top of that we're probably less familiar with the code that AI created. So to me, we're not going to really move faster in technology just by producing more code. In fact, it might get in the way of moving faster, because we'll have less confidence in how to maintain and run the system reliably.

The answer is obviously not to go back and stop generating all this code. The answer, for me, is more AI, applied now to the next step: all this code is generated, so we need models and agents that can monitor, maintain, improve, and troubleshoot when something goes wrong, so that the whole thing can move maybe 10 or 100 times faster. Because if half the process gets five times faster but the next step doesn't, you're not actually improving velocity that much. That's exactly the area Resolve is focusing on, and we think it's going to be very impactful, both because of the current state and even more so because of the state we're moving into, with a lot of the code being generated by AI.
>> Okay. So you're effectively an AI solution that handles all those things about keeping the product running, and it doesn't have to be AI-generated code. You've built a product that investigates the codebase and looks for errors. Does it then have authorization to go out and fix them?
>> That's a very good question, because the other thing that happens when you deploy code and run it in your production system, as we call it, is that there's a lot of sensitivity, both around the data and around something going wrong. Humans can make mistakes, of course; they can cause challenges and outages and take down a service. AI can potentially do the same. In my view, the way we handle this is by building AI that focuses a lot on trust: trust for software engineers, or for operators in any domain, for that matter. The way I think about it is almost like self-driving cars. We don't let self-driving cars on the street unless they have proven, with data, that they can drive better than human drivers, and there are also different levels of automation in driving. I think the same thing is going to happen with AI in many domains, but definitely in this one. Initially, most people allow the AI to go do the work, do the investigation, and report back the findings and the proposed solution, and then a human engineer has to decide. In the next step, the AI will be able to take some of these actions on its own, as long as they're not too risky or they're reversible. And eventually we'll get to a level-five self-driving kind of situation, where AI should be able to solve problems that humans cannot solve, make changes, and move a lot faster. That's really what's going to allow us to build a lot more technology a lot more quickly.
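The self-driving analogy maps naturally onto a graduated-autonomy policy. Below is an illustrative Python sketch, assuming a simple risk/reversibility check; it is not Resolve AI's actual policy engine, just one way to encode "report only, then reversible actions, then full autonomy."

```python
# Illustrative sketch of graduated autonomy (not Resolve AI's actual policy engine):
# an action is auto-executed only when the configured autonomy level permits it;
# otherwise it is routed to a human for approval.
from enum import Enum


class AutonomyLevel(Enum):
    REPORT_ONLY = 1         # investigate and report; humans act
    REVERSIBLE_ACTIONS = 2  # auto-execute only low-risk, reversible actions
    FULL = 3                # "level five": AI acts on its own


def handle_action(action: dict, level: AutonomyLevel) -> str:
    risky = action.get("risky", True)
    reversible = action.get("reversible", False)

    if level is AutonomyLevel.FULL:
        return "execute"
    if level is AutonomyLevel.REVERSIBLE_ACTIONS and reversible and not risky:
        return "execute"
    return "ask_human"  # default: surface findings and wait for approval


print(handle_action({"name": "rollback_deploy", "risky": False, "reversible": True},
                    AutonomyLevel.REVERSIBLE_ACTIONS))  # -> execute
print(handle_action({"name": "drop_table", "risky": True, "reversible": False},
                    AutonomyLevel.REVERSIBLE_ACTIONS))  # -> ask_human
```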
>> How far away do you think that level five is for your clients? Right now, if I'm reading it right, your software will investigate errors when they come up and push the findings to a person: hey, I think this is probably what it is. Do they just press a button and it gets fixed? I'm curious to hear that, and also how far we are from the AI saying, I will go in and do it myself.
>> I think there are two aspects to this question. There's the capability, the reasoning ability of AI, to be on par with software engineers or better, and then there's the whole compliance framework you have to have in place for when you let an AI do all that work and what happens when something goes wrong. We need to figure out both, but I'll answer more from the capability perspective. It's very hard for us to imagine the future when we're still on an exponential curve, because that's very unusual for how we think and how things improve. I do believe we're probably a year away from AI becoming the driver of running software, the same way agents are the primary producer of code today. In a year from now, I think we'll be in the same place, with humans operating at a higher level of abstraction, still overseeing the AI and making most of the final decisions. But probably in two to three years we'll be at a place where AI is making most of these decisions and humans are delegating high-level decision frameworks or tasks to the AI. So level five is maybe difficult to predict, but the level below it is probably going to happen in the next two years.
>> Okay. You said we're on an exponential, and clearly things have moved fast. "Exponential" might be the word everybody in AI loves the most, and how can you blame them, given the way things have gone recently. But there's been this theory that the models are all commoditizing, leveling out, saturating. So where do the gains come from next? I think you're the perfect person to ask, because you're tackling a really tough, really meaty problem. I doubt you can just throw a model at a codebase and get a standard instant response that says, hey, it's probably this. I imagine you're deep into things like orchestration, where you've got multiple models running and checking each other's work. So I'd love to hear your firsthand experience of how that's working, and whether you think the exponential is going to continue this way: not just bigger models, though maybe that's part of it, but taking multiple models and multiple programs that work with them, getting them to check their work and build on top of each other.
>> Yeah, I do believe the foundational models are going to keep improving at quite a fast pace. There's probably more we can do there with more data and probably algorithmically. But that's not the only way, even if it's the way things have improved the most so far. I think the application layer matters quite a bit here, and by application layer I don't mean simply building applications, but taking the domain and adding it to the model as well. Generally speaking, one of the reasons we sometimes say AI hasn't had as much impact in businesses is that the models are accessible to everybody, but there are not that many applications that have gone deep enough to understand the domain or the specifics of the business. Most products are a bit thin in how they deal with the last mile. To be very successful, and I've seen many examples of this, you need to incorporate a lot of the specifics, the tribal or institutional knowledge, into the product or the model to be highly effective. That's what needs to happen next.

The way we do it in our domain is, first of all, we had to build a multi-agent system that can use the tools software engineers use themselves, because that's how the world looks today. The world was not built for agents, and until we transition to a new set of tools built for agents, we need to be able to use the human tools. Then we need to collaborate with humans very well, so the agents have to have the right interface towards humans. And then we need to discover a lot of the knowledge that exists in those environments. That knowledge is sometimes written down in documents, sometimes the documents are outdated so they're wrong, and often it lives in human minds, so the AI has to collaborate with humans and learn from them over time. That requires a multi-agent system where different agents have different responsibilities: how they do the work, how they use the tools, how they coordinate with each other, and also how they communicate with engineers. That's what makes it very hard; it's not simply a model problem. It's a multi-agent orchestration, planning, and reasoning problem, and often an agent has to take many steps, hundreds sometimes, so you run out of context and start hitting a lot of the model problems as well. To step back, what's going to happen, at least in our domain, and I think this generalizes to other domains, is that we need to build deep agentic applications that understand both the domain and the customer context very well. And a lot of that means pushing more innovation into the model, so it can deal with much more data and context and can plan and reason over much longer-horizon tasks, as we call them, where models sometimes lose track of what they're trying to do.
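One of the model problems mentioned here, running out of context on long-horizon tasks, is often handled by compacting older steps into a running summary. The toy sketch below illustrates that common technique under that assumption; the summarizer is a stub standing in for what would normally be a model call, and this is not a description of Resolve AI's actual implementation.

```python
# Toy illustration of context compaction for long-horizon agent runs:
# older steps are collapsed into a summary so the working context stays
# within a fixed budget. summarize() is a stub; a real system would
# ask a model to compress the older steps.
def summarize(steps: list) -> str:
    return f"[summary of {len(steps)} earlier steps]"


def compact_context(history: list, max_recent: int = 5) -> list:
    if len(history) <= max_recent:
        return history
    older, recent = history[:-max_recent], history[-max_recent:]
    return [summarize(older)] + recent


history = [f"step {i}: ran tool, observed output" for i in range(1, 13)]
print(compact_context(history))
```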
>> Can I ask you about this? I get what multi-agent systems are supposed to be doing, but I imagine that if one agent breaks, the whole thing goes down. So can you talk a little bit about how you get them to work? Getting a chatbot to do the one thing you want is sometimes a challenge. So how are you getting multi-agent processes and workflows to work? It seems to me like a very difficult problem.
>> Yes. Maybe to use an example: let's say you have two agents that have to collaborate. One agent goes and understands documents, and another agent goes and takes actions in a codebase, in order to perform a task. If you have a situation with these two agents, and often you have more than two, maybe five or ten, you have to figure out the plan you want to execute: how do you coordinate the work across them? Then they have to do the work and communicate back to each other what they learned. To do that reliably, I think you have to have many layers of guardrails, if you wish. Not guardrails in the sense of not letting the models or the agents do something wrong, but rather: often what we do is have one agent do some work, another agent review that work and provide feedback, and have the first agent iterate. When this expands across multiple agents, they all do the work and produce an outcome. In our case, an agent investigates an incident and produces an analysis of what happened and how to fix it; then we have another agent review that work and force the first agent to go back and redo the work if it finds a hole in the reasoning. So you have to stack many things together, controls and checks and validation, to produce a very reliable system. The other thing that happens over time is you gather a lot more data about what works: the sequences in which you do something and it works. That goes back to the earlier discussion about ground truth and a reward function. The more of these trajectories, as we call them, we have of how the work was done successfully, the more we can go back and improve the model in the first place, and that gives you a lot more room at the application layer to go solve harder problems.
>> So you have an orchestrator in the middle that's dispatching these different bots, and then they come back and show their work, and it helps check and do the necessary things to make sure everything is in good shape?
>> Effectively, you have an agent that manages the other agents. It makes decisions, decides what to tell them and what not to tell them depending on the task at hand, so as not to confuse them. That agent also checks its own work, but then maybe there's another one, a supervisor if you wish, that does spot-checking and validation of the work. So there's a lot of that kind of delegation and checking.
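Here is a hedged Python sketch of the produce/review/iterate loop just described: one stub agent investigates, a second reviews and can force a redo, and an orchestrator runs the loop and stops after a bounded number of rounds. The agent functions are placeholders for model-backed agents, not anyone's actual implementation.

```python
# Sketch of the worker/reviewer/orchestrator pattern described above.
# The agent functions are stubs standing in for model-backed agents;
# the structure (produce, review, iterate, bounded retries) is the point.
def investigator(incident: str, feedback: str = "") -> str:
    note = f" (revised after: {feedback})" if feedback else ""
    return f"analysis of {incident}{note}"


def reviewer(analysis: str):
    # Return a critique if the reasoning has a hole, otherwise None to approve.
    return None if "revised" in analysis else "missing evidence for root cause"


def orchestrator(incident: str, max_rounds: int = 3) -> str:
    analysis = investigator(incident)
    for _ in range(max_rounds):
        critique = reviewer(analysis)
        if critique is None:                 # reviewer is satisfied
            return analysis
        analysis = investigator(incident, feedback=critique)  # force a redo
    return analysis                          # escalate to a human after max_rounds


print(orchestrator("checkout latency spike"))
```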
>> It's amazing what you guys have come up with, what people have come up with in these systems. Let me ask you: for the orchestrator, do you need a smarter model? Do you go with the smartest foundation model as the orchestrator and then send more specialized, or a little dumber, open-source models out to do some of these other things?
>> Yeah, that's valid, because if you think about it, if it's a specific task that takes one or two steps, a model that doesn't have to reason a lot, or over a long set of tasks, is sufficient. But if you have something that has to plan across many agents and many steps, usually you want the most capable model in terms of reasoning, which usually means a very big model that is also most likely a closed, proprietary model.
>> Interesting. So that's the mix that you have.
>> Yes. If we're to generalize: at the top level you usually have the most capable model, usually also the most expensive and the biggest, and then the underlying tasks can be performed by either fast closed-source models or even open-source models that you post-train for the task. And if you think about it, the long-run question for a given domain becomes: is there going to be a specialized large model that can reason very well, maybe owned by a company that just does that, or are we going to have the current situation, basically, where horizontal models are applied across domains? To be honest, I'm not sure; what do you think? My intuition is that for large domains like software, or customer service, because there's so much economic impact, it does make sense to invest in a model that's domain-specific and very good at what it does. I don't think you have to start from zero; you take in all the capabilities that exist in a larger model. But I do think we're going to end up in a situation where the most capable model in a domain is a specialized model.
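A small illustration of that model mix: a routing table that sends the orchestration role to the most capable model and narrow one-or-two-step tasks to smaller or post-trained models. The role and model names below are placeholders, not the actual models or roles any specific product uses.

```python
# Illustrative routing for the model mix described above: planning/orchestration
# gets the most capable (and most expensive) model, narrow subtasks get smaller
# or post-trained open models. All names are placeholders.
MODEL_ROUTES = {
    "orchestrator":   "large-proprietary-reasoning-model",
    "doc_reader":     "small-fast-model",
    "code_executor":  "open-model-post-trained-for-code-tasks",
    "log_summarizer": "small-fast-model",
}


def pick_model(role: str) -> str:
    # Fall back to the most capable model if the role is unrecognized.
    return MODEL_ROUTES.get(role, MODEL_ROUTES["orchestrator"])


print(pick_model("doc_reader"))
print(pick_model("unknown_role"))
```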
>> That's really interesting. All right, so let's talk a little bit about culture. These tools have come fast and furious, and a thing we hear often is that there's this capability overhang: the models can do more than what people are taking advantage of them for, and people are asking GPT-5.2 the same questions they were asking GPT-3. Talk a little bit about that, because it's interesting from your perspective. You're working with engineers, and these people generally tend to appreciate what technology can do. What has the interest in adopting this stuff been like? And has there been any reaction of: oh, if we use this AI system that makes sure we stay up when there's a problem, then what are we going to do?
>> This is a valid question and concern. I do think engineers are the earliest adopters you can find, and that generally helped in getting AI widely adopted for coding and now for the subsequent steps of software engineering, as we see with the adoption of Resolve AI. But there are still two things happening. There is some natural resistance to change: it's difficult for humans to change their habits very quickly, and people with different personalities are sometimes more resistant to change. That is definitely an issue. And I do think there's sometimes a concern about people's jobs. My perspective is the following, especially in software engineering, where we have some of the most highly paid professionals in the world: I don't think the end state is going to look like one where we have fewer software engineers, but that also isn't the optimization formula, in my opinion. It almost doesn't matter whether we have more or fewer of these highly paid people. What we should be optimizing for is whether we can produce technology a lot faster, which in the end benefits the entire world, like I said, by solving harder problems, or problems that are too expensive to solve today. So in that sense, yes, we have to push through this resistance, because I do think the end goal is beneficial to everybody.
>> So when I speak with engineers, we have great conversations about this stuff. It's almost like you can speak with an engineer and get a sense of what the future is going to be like, because they have the most tooling right at their fingertips, and they can tell you, oh, it's doing well here, or it's having this effect on me, or we're doing this in our office, or we couldn't get this done. One of the things I've heard is people thinking: well, if coding gets handed to the AIs, then people will kind of lose the ability to code, and that's fine until something breaks. So I always thought maybe engineers will become effectively the auditors and the fixers of the AI-produced code. But again, you have much more code because of AI, and you've built this solution that says, we'll help with that auditing and fixing with technology itself. So what do you think? If this works the way you anticipate, where does that leave the end state of engineering skills? Do you think engineers risk having some of those skills atrophy if the AI does a good enough job?
>> Yeah. As with every technological evolution, we get a lot of leverage as humans to produce outcomes a lot faster. We're not resisting using machines in other domains, and I think the same applies to software. Software over the last 50 years moved from very low-level coding for the machine itself to different layers of abstraction, with operating systems and high-level languages. I think AI is just another abstraction, and I don't think the right question is whether engineers' skills to produce or run code will atrophy. I think the real answer is that we should produce agents that do both parts very well: they should produce code very quickly, but they should also be able to run, maintain, improve, and troubleshoot that code the same way. And engineers should be, and will be very quickly, in my opinion, operating at a higher level of abstraction, where they won't have to worry about the low-level, bespoke specifics of the tools at their disposal: query languages, exactly how to call this API or this CLI to get to an answer. All that heavy lifting and stressful work is going to be done by AI, and we're going to be operating at a level above. I think that's desirable, and I don't think it's a risk or a concern.
>> Okay, here's my last question for you. We've talked a lot about how your AI is able to monitor codebases, alert companies when a fix needs to happen, and maybe propose that fix. Can you give me one concrete example, an everyday use case, where this has really worked well? Maybe it woke up an engineer, but they were able to just hit a button and fix the thing. Can you tell us an example with specific products and names of companies?
>> Yes, 100%. I will tell you that our own engineers share many stories internally at Resolve about how they wake up in the middle of the night, don't have to actually go to their laptop, look at their phone, see what Resolve says, and go back to sleep. And I've heard this from many customers. In the real world we have customers like Coinbase, DoorDash, and Salesforce, and the first two are consumer companies whose products everybody uses on a daily basis. So it's not uncommon that a change happens: maybe somebody develops a new feature and pushes some new code to production that has unintended consequences, which, many layers down, end up producing an error for an end user. What Resolve will do is start from, say, a user seeing an error or some other problem, traverse and walk the path of the infrastructure, figure out that it's coming from a particular application or service, review all the errors, connect those errors to the new changes, go look at the code that was created and is causing the problems, and essentially describe the entire cycle to you, and maybe tell you: hey, you have to undo this code change, and everything will go back to normal.
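Below is a simplified, hypothetical Python version of that investigation flow: start from a user-facing error, walk the service dependency path, flag the service with an abnormal error rate, correlate it with a recent change, and recommend undoing it. The telemetry and deploy data are toy stand-ins for real signals.

```python
# Hypothetical, simplified version of the investigation flow described above.
# The dependency path, deploy history, and error rates are toy stand-ins.
DEPENDENCY_PATH = ["edge-gateway", "checkout-api", "payments-service"]
RECENT_DEPLOYS = {"payments-service": "deploy-7841 (new retry logic)"}
ERROR_RATES = {"edge-gateway": 0.01, "checkout-api": 0.02, "payments-service": 0.35}


def investigate(user_error: str) -> str:
    # Walk the path and flag the first service with an abnormal error rate.
    for service in DEPENDENCY_PATH:
        if ERROR_RATES.get(service, 0.0) > 0.1:
            deploy = RECENT_DEPLOYS.get(service)
            if deploy:
                return (f"{user_error}: errors trace to {service}; "
                        f"likely caused by {deploy}; recommend rolling it back")
            return f"{user_error}: errors trace to {service}; no recent change found"
    return f"{user_error}: no clear culprit on the dependency path"


print(investigate("users see 500s at checkout"))
```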
>> Okay, Spiros. I typically, at the end of these interviews, ask our guests to tell folks where they can find them, but it's been on your sweatshirt here: resolve.ai. That's the URL, right?
>> Correct. It's resolve.ai. Yes.
>> Okay. All right, folks. Well, if you want to learn more, check out resolve.ai. And Spiros, hey, this was our first conversation. It was great to get a chance to know you, and I hope to talk more.
>> Thanks, Alex. Same.
>> All right, everybody. Thanks for watching, and we'll see you next time.