The State of AI Code Quality: Hype vs Reality — Itamar Friedman, Qodo

Channel: aiDotEngineer

Published at: 2025-12-11

YouTube video id: rgjF5o2Qjsc

Source: https://www.youtube.com/watch?v=rgjF5o2Qjsc

I'm really excited to be here. So much pragmatic insight, so many suggestions; I was sitting there just before this. I'm Itamar Friedman, the CEO and co-founder of Qodo. Qodo stands for quality of development, and I'm going to share our reports, and other companies' reports, on the state of AI code quality, trying to talk about the hype versus the reality, which was one of the points discussed here quite a lot, which is awesome. In the last three or four weeks we saw three cloud outages, unfortunately, right? And these are coming from companies that really care about moving fast. They themselves say they're using AI to generate 10%, 30%, 50% of their code, and at the same time they care about quality. So how did that happen? Is it related? I don't know, but I'm going to share some guesses. So
by the way, 60% of developers say that about a quarter of their code is either generated by AI or shaped by AI, and 15% say that more than 80% of their code is basically generated or shaped by AI. Now, people are using AI for vibe coding, but they're actually even doing it for vibe checking and vibe reviewing. This is the prompt behind the Claude Code command for security review; it was hyped about two months ago, you know what I'm talking about. It says there, I don't know if you can see it: you are a senior security engineer. Good. And then somewhere down the line it says: please exclude denial of service. Don't catch denial-of-service issues. Maybe that's part of the reason we're having cloud outages. Probably not just that, but you get the point: we need to be rigorous about how we deal with quality. It can't be just vibe quality the way we sometimes do vibe coding. Let's go to another
example. With Cursor, or Copilot, most of you use rules, right? We're going to talk about it. You invest in code generation; after a while, you understand that if you invest, you get more out of it. We asked a bunch of developers, and I'm asking you as well; think for a second, all the developers in the audience: when you write Cursor rules or Copilot rules, do you feel they're completely followed, or just mostly followed? Do you know how much they're followed, and to what extent? How rigorously, how technically deeply are they followed? The answer we got back, what you see here on the screen, is mostly B, C, and D: the rules are followed, but not completely. So that means we are generating code and trying to push it toward our standards, but it's still not necessarily getting to the quality we wanted.
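One way to know how much a rule is actually followed, instead of guessing, is to measure it with a static check. A minimal, hypothetical sketch for a "no nested ifs" rule, using Python's ast module (the rule and the code sample are illustrative, not from the talk):

```python
# Hypothetical sketch: measure adherence to a "no nested ifs" rule
# with a static check, instead of trusting that the model followed it.
import ast

def count_nested_ifs(source: str) -> int:
    """Count `if` statements sitting directly inside another `if`.

    Note: `elif` parses as an If inside the outer If's orelse,
    so it is counted as nesting too.
    """
    tree = ast.parse(source)
    violations = 0
    for node in ast.walk(tree):
        if isinstance(node, ast.If):
            for child in ast.iter_child_nodes(node):
                if isinstance(child, ast.If):
                    violations += 1
    return violations

generated = """
if a:
    if b:
        do_thing()
"""
print(count_nested_ifs(generated))  # 1: `if b` is nested inside `if a`
```

Running checks like this over generated diffs gives you the "how much are my rules followed" number, rather than a feeling.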
I'm going to share a bit more statistics and some insights from three reports: one done by Qodo, another by Sonar, and a third by another company. All of them are focused on code quality, code review, and so on. The sample size is thousands of developers, in some cases even more; millions of pull requests; and a billion lines of code that were checked. For example, think about Sonar: it's a company that comes a bit from the pre-AI era, but they see code at scale. They run a lot of checks on code that are not necessarily AI-focused, but are necessary in order to check your software from every possible direction, and that's why the scale of the code they're seeing is immense. So, for example, we took information from their report, and eventually my purpose here is to break down the different dimensions of what code quality means and share some stats and insights. I want to
start with the end. This is the takeaway I want you all to take from the next 13 minutes that I have. We started with code generation: we use it out of the box, autocomplete and so on, and if you invest in it you can get more out of it. But there's a glass ceiling on how much productivity you can get from code generation. Then we moved to agentic code generation; let's call it gen 2.0. That's a higher glass ceiling: it can deliver much more productivity, especially if you invest in it, with rules and so on. Then, with AI breaking out of the IDE, we can start using AI for agentic quality workflows too. They could run inside the IDE, but the truth is that if you think about all the workflows in your organization, especially if you're more than 100 developers or so, you probably have a lot of quality-related workflows that you need to automate. That's where you start breaking through the glass ceiling of productivity, if you invest in it. And finally, I claim that those agentic workflows need to keep learning; we might touch on that a little later, because quality is something dynamic. You'll only finally break the glass ceiling if those quality workflows, rules, and standards are dynamic. Then you will see the promised 2x, let alone the hyped 10x you were promised. You heard from McKinsey and from Stanford that you're not getting that; I don't need to tell you. That 2x or 10x is for the entire software development life cycle. Now, a bit more about
market adoption. One of the reports says 82% adoption already, with AI dev tools being used daily or weekly; 59% report that they're using more than three code generation tools, and 20% say they're using more than five. If you think about it for a second, don't count only Cursor, Copilot, Codex, Claude Code, and so on (sorry if I'm insulting anyone whose tool I forgot); there's also Lovable and the like, and they also generate code. And by the way, you're going to get to 10. Count on me: you're going to get to 10 tools that generate code for you within two or three years. Come talk to me about it later and I'll try to convince you. And the thing is, it's coming from the bottom up: about 50% of the usage comes from teams of fewer than 10 developers, but it is propagating to the enterprise too, and at scale, not just five developers; in the last year we're seeing more and more enterprises using code generation. So on average, across the reports, we saw 82 to 92% using code generation tools weekly to monthly, and in some cases, maybe extreme, maybe not, we saw a 3x productivity boost in writing code. But a 3x productivity boost in writing code doesn't guarantee any quality, as I showed before. In fact, 67% of the developers we asked have serious quality concerns about code generated or influenced by AI, and they say they're missing a framework for how to deal with quality and how to measure it. It's a big question: what is quality? I'm going to talk about it in the next few slides. Think about it for a second before I break it down: what is quality?
So what we're actually seeing, the crisis with vibe coding as it shifts and evolves, is that more tasks get done; some report 20% more tasks, more velocity, and something like 97% more PRs being opened. And eventually it takes more time to review a PR, about 90% more time. By the way, there's a lot of statistics about AI-generated code showing at least as many bugs per line of code; I'm not claiming there are more, but even if the rate per line is the same, you have many more bugs in total, because there are many more PRs and much more code being generated. So that's a problem for the reviewer, and it's nobody's surprise that reviews take more time, especially in the age of agents: five minutes after calling Claude Code, I have 1,000 lines of code, when once upon a time it took me hours to write 10 proper lines of code, right?
Now, let's zoom out for a second. Code generation is magnificent. It's a game changer when you're talking about greenfield; you saw people talk about that a few slides, a few minutes, before me. It has revolutionized how we do proofs of concept, projects, and so on. But when you're dealing with heavy-duty software, then, like it or not, we are dealing with a lot of things. When you serve millions of clients, you have financial transactions; when you're doing transportation, you're dealing with code integrity, or if you like, code governance, review standards, testing, reliability, and so on. That's what we need to deal with. Now let's break the part of the glacier below the surface into two
dimensions. This is one dimension: you can look at quality issues throughout the software development life cycle, like planning, then development, writing code, then code review (code review is a bit of a process, but checking quality is part of that process), then testing, which is another part of quality, and deployment. I know I didn't cover the entire software development life cycle, but just to give you an example: each of these stages introduces new problems that come from using more and more AI-generated code. Now another
dimension to look at is code-level problems versus process-level problems. I'm not even opening the list of functional issues, just the non-functional ones: we're talking about security and efficiency, which are not necessarily functional issues. I'll show you some statistics about that. Process level is, for example, learning. Hey, if you have a bad outage because of AI-generated code, who is responsible: the AI, or the team that owns that code? You need to learn from it and own the code eventually; that's a process that needs to happen. Also verification, guardrails, standards, and so on. So when all of those issues were put to the thousands of developers we asked, did AI actually help reduce these problems, or did it make them more challenging? Developers reported that they spend 42% more of their development time on solving issues and fixing bugs, and they saw 35% project delays. We're not talking about gains here; they're talking about delays. There's some bias: we told them we were talking about quality problems and their impact. But that's what they presented when answering about mass use of AI-generated code. And some of the reports talk about 3x more security incidents. By the way, it makes sense: remember we had a slide saying 3x more code written, so 3x more security incidents is the same number of problems per line of code; a correlation. So what to do with that? I talked about problems and problems and problems; okay, help me deal with it. Let's spend a few minutes on that. So one
few minutes on on that. So one one
suspect of course is testing and
actually really interesting we asked a
couple of question about testing and one
really relevant saying that people said
that when they heavily [clears throat]
use AI to on testing use AI to do
testing they actually double their trust
in the AI generated code. Okay, that's
one thing. The ne next suspect to help
us with quality is code review. What's really interesting about code review is that it's a process that helps with almost all of the process-level and code-level issues. For example, you can set your AI code review tool to block a PR if it doesn't reach a certain level of test coverage; through the PR, you take care of the testing-process problem.
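A minimal sketch of that kind of coverage gate, with a hypothetical data shape (illustrative only, not Qodo's actual API or configuration):

```python
# Minimal sketch of a PR quality gate: block the merge when test
# coverage for the changed files drops below a threshold.
# The data shape here is hypothetical, for illustration.

def gate_pr(changed_files: dict[str, float], threshold: float = 80.0) -> tuple[bool, list[str]]:
    """Return (should_block, reasons) for a PR.

    changed_files maps file path -> line coverage percent for that file.
    """
    reasons = [
        f"{path}: {cov:.1f}% coverage is below the {threshold:.0f}% gate"
        for path, cov in changed_files.items()
        if cov < threshold
    ]
    return (len(reasons) > 0, reasons)

blocked, why = gate_pr({"billing.py": 62.0, "api.py": 91.5})
print(blocked)  # True: billing.py is under the 80% threshold
for reason in why:
    print(reason)
```

In a real setup this check would run in CI on every PR, fed by the coverage report, and the review tool would refuse to merge while it fails.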
Code review with AI is actually one of the major things you can do, and developers who use an AI code review tool say they're seeing double the quality gains, and that it actually helps them improve their productivity in writing code by 47%. Now,
here's a bit of statistics from our own AI code review tool. We scan a million PRs a month, and we took one million of those PRs and noticed that 17% include high-severity issues. By the way, we're now analyzing before and after using AI; I don't have those statistics yet, because most of the companies we serve were already using AI-generated code when we started with them, so we'd need to go scan backwards. But that's a really big number. Another thing I want
to talk to you about, when you're trying to improve quality, is the foundation of having the right context brought to the code generation tool and brought to the AI code review tool. Better context, better quality, across the board, wherever you're using AI. When we asked developers about when they don't trust AI-generated code (remember the 67% who are really worried about that), they said that 80% of the time they don't trust the context the LLM has. And when we asked developers what they would like improved in their AI code generation and AI code review tools, the number one answer, at 33%, was context, and they could choose among many things to improve. So context is extremely important. I can tell you that at Qodo, one of our technology moats is around context, and when you connect our context engine, we see it as the number one tool being used: when code generation or code review tools call MCPs, about 60% of those calls go to the context MCP. And just to be clear, the context doesn't necessarily include only your code. It can also include your standards and your best practices: we're seeing in our AI code review that 8% of the context usage actually comes from files related to standards and best practices. Okay, as the
CEO of Qodo, I have to brag a little bit, or marketing will be mad at me, right? So this is our context engine being presented by Jensen in the GTC keynote, and notice that he didn't talk about our code review capabilities or our testing capabilities; he talked about our context engine, which Nvidia checked, because there's a realization that AI quality, whether for generation, review, or testing, will come from bringing the right context. To invest in that, you need to build your context: buy a solution and invest in it, or build your own solution. And the context needs to include code, versioning, PR history, organization logs, and so on. That's where all the context sits. It's not just in the last branch of your codebase.
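As a sketch of that idea, a review-context bundle might pull from several of those sources rather than just the latest code. All names and shapes here are hypothetical, for illustration; this is not Qodo's context engine:

```python
# Hypothetical sketch: assemble review context from several sources,
# not just the latest branch of the code. Names and shapes are
# illustrative only, including the sample PR reference below.
from dataclasses import dataclass, field

@dataclass
class ContextBundle:
    diff: str                                            # the change under review
    standards: list[str] = field(default_factory=list)   # team standards / best-practice docs
    pr_history: list[str] = field(default_factory=list)  # summaries of related past PRs
    org_logs: list[str] = field(default_factory=list)    # e.g. incident notes for this area

    def to_prompt(self) -> str:
        """Flatten the bundle into one prompt section for the reviewer model."""
        parts = ["## Diff", self.diff]
        for title, items in [("## Standards", self.standards),
                             ("## Related PR history", self.pr_history),
                             ("## Org logs", self.org_logs)]:
            if items:  # skip empty sections entirely
                parts.append(title)
                parts.extend(f"- {item}" for item in items)
        return "\n".join(parts)

bundle = ContextBundle(
    diff="- if a:\n-     if b:\n+ if a and b:",
    standards=["Avoid nested ifs; prefer guard clauses."],
    pr_history=["Example PR: flattened nested conditionals in billing."],
)
print(bundle.to_prompt())
```

The point of the sketch is the shape: the reviewer sees standards and history alongside the diff, not the diff alone.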
So I'm zooming out and starting to talk about recommendations and takeaways. What's next? Automated quality gates: invest in those. People talked throughout the morning about parallel agents, background agents, you know what I'm talking about; you can use a lot of those tools and capabilities to build your quality gates. Use intelligent code review and testing, and you need living and breathing documentation (and what documentation means is a story by itself; I'm not going to double-click on it). And this is how I've presented for three years now, and I think I'm going to go all the way to age 60 with this slide of how I think the future of software development looks. So basically, you have
your specification and you have your code, and you have multiple parallel agents that are helping you improve your spec, write your spec, improve your code, and transfer from your spec to your code, and that make tests, which are executable specs.
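To illustrate "tests as executable specs" with a generic example (not from the talk): a spec sentence like "a discount never drops the price below zero" can be written directly as a runnable check.

```python
# A spec sentence turned into an executable test (illustrative example).
# Spec: "Applying a discount never produces a negative price."

def apply_discount(price: float, percent: float) -> float:
    """Discount `price` by `percent`, clamped so it never goes negative."""
    return max(0.0, price * (1.0 - percent / 100.0))

# The executable spec: these assertions ARE the requirement.
assert abs(apply_discount(100.0, 30.0) - 70.0) < 1e-9
assert apply_discount(10.0, 150.0) == 0.0  # over-discounting clamps at zero
print("spec holds")
```

Because the spec runs, an agent regenerating the implementation can be held to it automatically instead of being trusted to remember it.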
And then you're going to have your context engine, the software development database, and you will build your tools, especially MCPs, around quality and verification. And you'll make sure you have environments, stable, secured sandboxes, where those agents can run validation and quality workflows. So don't forget:
the path forward is that quality is your competitive edge over your competition. AI is a tool; it's not a solution. And don't think about code generation as the only thing: look at the entire SDLC, or the product development life cycle, as one of the speakers said, and iterate with everything we talked about today. I want to tell you that you will gain value from it. In the reports, we're seeing people report security issues reduced, faster code review (we just got a hit on that because of AI-generated code), and test coverage that can triple in a month, depending on the project. With the last minute, I want to
show a really small piece of what you can do with Qodo. You can go into Qodo and define your own rule, for example almost the same rule you'd put in Cursor: I don't like nested ifs, if that's a problem you have. But then Qodo will look at your context, build the good example and the bad example, and then start building a workflow specifically to catch that issue, and give you statistics over time on when the suggestion is accepted and when it's not, so you can adjust the rule and really know, and have visibility into, your standards.
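As a generic illustration of the kind of good/bad pair such a rule might produce (not Qodo's actual output), here is a nested-if version and a guard-clause version of the same logic:

```python
# Illustrative good/bad pair for a "no nested ifs" rule.
# Bad: nesting buries the main logic three levels deep.
def ship_order_nested(order):
    if order is not None:
        if order["paid"]:
            if order["in_stock"]:
                return "shipped"
    return "rejected"

# Good: guard clauses exit early, keeping one level of indentation.
def ship_order_guarded(order):
    if order is None:
        return "rejected"
    if not order["paid"]:
        return "rejected"
    if not order["in_stock"]:
        return "rejected"
    return "shipped"

order = {"paid": True, "in_stock": True}
print(ship_order_nested(order), ship_order_guarded(order))  # shipped shipped
```

Both functions behave identically; the rule's suggestion would be the mechanical rewrite from the first shape to the second.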
So when a PR is written with a few ifs and elses, even though it was written with Cursor or Copilot under a rule saying do not use nested ifs, eventually when you open the PR, Qodo will catch it and give a suggestion according to the good and the bad example. Qodo will also build a graph, give you CLI checks that check each of the rules and eventually flag the nested if, and then record and learn what you did or did not do with that suggestion, in order to adapt the standard and the quality. There will also be automated suggestions: you don't need to write your own; it learns your standards and quality and offers them to you. And that's it.
I'm really, really excited about breaking the glass ceiling with what we did with code generation, and then agentic code generation. Now we're entering the era of putting AI to work across the entire SDLC, and the most important part is related to quality. You will need to invest in it; it's not out of the box. And then you will eventually see the promised 2x, the one that was probably promised to the CEO or someone like that when they gave you the budget for the relevant tools. Thank you so much.