OpenAI President Greg Brockman on GPT-5.5 “Spud,” AI Model Moats, and Cybersecurity Risks

Channel: Alex Kantrowitz

Published at: 2026-04-23

YouTube video id: YnoQ8RJbALw

Source: https://www.youtube.com/watch?v=YnoQ8RJbALw

OpenAI president and co-founder Greg
Brockman joins us to discuss OpenAI's
newest model Spud, aka GPT-5.5,
and where it leaves OpenAI
competitively. That's coming up right
after this. Open the big technology
podcast, today we have an emergency
episode with OpenAI president and
co-founder Greg Brockman all about
GPT-5.5,
the famous Spud model. Looking at what
it does and what it means for OpenAI.
Greg, great to see you. Welcome back to
the show.
Thank you for having me. Hope it's not
too much of an emergency.
Well, I am definitely recording in a
Vegas hotel room, so it's more of an
emergency than our last conversation,
but we had some time to prepare. So
it's great to be on with you. Let's
just start with this:
Can you confirm GPT-5.5 is Spud?
Yes.
Okay.
What is GPT-5.5?
Well, it's an amazing model. I
think in many ways it is a
step towards a new way of getting work
done with a computer. It's a new class
of intelligence. It's extremely useful
at things like programming, right? And
all the different aspects of debugging
and
solving very hard and gnarly problems,
just being very proactive and really
being able to solve problems end-to-end
with little instruction. But the thing
that's to me most remarkable is not
necessarily the fact that it got better
at coding. Like that I think is is what
everyone kind of expects. But the fact
that it's now really crossed the
threshold of usefulness for general
kinds of applications. And so, it's much
better at creating slides, spreadsheets,
much better at computer use, using your
browser, being able to kind of click
through applications that are
otherwise hard for an AI to operate.
And so, I think that we're really seeing
the emergence of this new way of using a
computer, and it starts with this kind
of intelligence at the core.
When we spoke last, you mentioned that
this was effectively the culmination of
a two-year research process. So, was
this planned two years ago? Is that how
far back OpenAI plans?
I would say that yes, we do have very
long horizons for how we plan. Now, one
note is that we stack together many
research ideas and bets on a variety of
time scales. And so, the way to think
about it is that we are making constant
progress across every single part of the
stack. And so, what GPT-5.5 represents
is not an end point. In many ways, it's
a beginning point. It's really a step
towards the kinds of models that we see
coming over even just upcoming months.
And I think that you should expect that
we are going to have even larger
improvements in the capability across a
wide variety of these
of these aspects of what the model can
do. And that's something I think will be
very exciting, and we're just always
thinking about how can we make what
we're producing more useful for
real-world use for real users and real
applications.
Can you share specifically what those
aspects are that we should be looking
out for over the next few months?
If this is the beginning, what is it the
beginning of?
Well, I think the big vision we
have, and you can see it reflected in
many things, not just the models. You
can think about the models as the
brain, and you can think about the
systems and the
harnesses like Codex and the
applications like the super app as
almost the body around it to make it
into a useful AI. And that's really
what's happening is a shift from
language models being the thing that is
produced by labs like ourselves to an AI
that's actually useful. It's actually an
assistant that's out there trying to
solve your goal, really operating
according to your instructions.
And
you can see right now, Codex is becoming
this app that's not just for the coders.
It's really for anyone using a computer.
And it's not perfect, right? There are
still some tasks it should be able to
do, and it doesn't quite get them
right. Sometimes the personality isn't
quite what you want. It's extremely
powerful and out there doing a lot of
really amazing things, but with the way
it communicates back to you, you still
have to spend some time reading through
exactly how it solved a problem. And
these aspects, we know exactly how to
make them much better.
And we've already had a
pretty remarkable improvement from 5.4
to 5.5. And I think we're going to have
even more remarkable improvements across
every single aspect of what makes these
models useful. And one thing to know
internally is that we think a lot about
the end application. Like that is one
thing that changed for us over the past
12, you know, 18 months, something like
that, is that we used to really just be
focused on: let's improve on the
benchmarks, let's make these models
more cerebrally capable. But now we are
really focused on let's bring them to
real-world applications. Let's think
about finance, sales, marketing, every
single function that someone uses a
computer, how can we help with their
computer work? How can we actually make
the model have not just the theoretical
capability to help, but have actually
experienced those kinds of tasks, have
actually been able to see what good
looks like. And I think that the place
we're going is one where you, as a
person doing work, are the overseer.
You are the CEO of almost this
autonomous corporation, or of this
fleet of agents, perhaps, is the better
way to say it. And
that they are operating according to
your goals. Now, you are still
accountable, right? You're still in the
driver's seat. You're still the person
who thinks about, well, is this what I
actually wanted? Was this work up to
standard? But the details of
exactly what buttons were clicked and
exactly the kind of code that was
written or exactly how the formula in
the spreadsheet works, that you can
abstract yourself from those if they're
not important to the evaluation of
whether or not something was what you
wanted. And so, I think it's like
increasing leverage for every worker.
Okay. Let me take my best guess as to
what's happening, and you tell me how
close I am. I mean, I'm thinking about
this. This is like a like you mentioned,
a culmination of two years of work.
There are two different types of AI
training, not to tell you, since you
know this, but for our audience.
There's the pre-training, or
at least the ones that have been
pertinent for these models. The
pre-training, where you just make the
model generally smart by having it
predict the next word, and the
reinforcement learning, where you have
it go out and actually try to
accomplish different tasks, and you
reward it when it does a good job with
those tasks, and effectively it sort of
learns how to do those tasks.
Is what you're saying basically that
like this is the first result that we're
seeing where OpenAI has just loaded a
ton of reinforcement learning on
task-specific stuff into this model, and
that's what's producing the results
you're talking about.
Well, I would actually say it a little
differently. I would say that there's
many steps in the pipeline, right? That
there's pre-training, mid-training,
reinforcement learning. There's, you
know, the data collection. There's like
a lot of these different things that all
come together to produce the end result.
And the way in which it's connected to
the world, that's also very key to
making it useful. And the thing that I'm
really saying is we have been investing
in every single one of these, and have
a repeatable process. We have a team,
right? It's not just about individuals
working on these pieces, but a team that
really comes together and looks across
the whole stack to say, how do we make
this more useful for real-world
applications? And so, it's not really
any one thing that we do. It's really
about the overall effort. Like, if you
think about building a car, right? It's
not just about whether you have a
better engine, right? You can build a great
engine, but if the rest of the car is
not up to the quality level of the
engine, it's not going to matter. And
so, I think that that is the real
innovation. It's really the end-to-end
co-design, and all coming together in a
repeatable fashion to make these models
better and better for our users.
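The two training stages sketched in the question above, next-word pre-training followed by reward-driven reinforcement learning, can be illustrated with a deliberately toy example. This is purely a conceptual sketch, not anything resembling OpenAI's pipeline; all names and numbers are hypothetical, and real pipelines also include the mid-training and data-collection stages Greg mentions.

```python
# Toy sketch of the two training stages discussed above.
# Hypothetical example only; real LLM training is vastly more complex.
from collections import defaultdict

def pretrain(corpus):
    """Next-token prediction: count which token follows which."""
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in corpus:
        tokens = sentence.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    # "Policy": for each token, prefer the most frequent successor.
    return {prev: max(nxts, key=nxts.get) for prev, nxts in counts.items()}

def reinforce(policy, tasks, reward_fn, alternatives):
    """Reward-driven update: keep an alternative answer when it scores higher."""
    for task in tasks:
        current = policy.get(task)
        for candidate in alternatives.get(task, []):
            if reward_fn(task, candidate) > reward_fn(task, current):
                policy[task] = candidate  # reinforce the better behavior
    return policy

corpus = ["the cat sat", "the cat ran", "the dog sat"]
policy = pretrain(corpus)  # generally "smart": imitates the training data
# After "the", the corpus most often continues with "cat".
reward_fn = lambda task, ans: 1.0 if ans == "dog" else 0.0
policy = reinforce(policy, ["the"], reward_fn, {"the": ["dog"]})
# After the RL step, the task-specific reward has overridden the imitation default.
```

The point of the sketch is the division of labor: pre-training only imitates data, while the reinforcement step changes behavior wherever a reward signal says a different answer is better.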
You were on a media call earlier today
with me and a number of members of
the press, and
one of the interesting things that you
said, or basically I think you said this
right off the bat, is that the model
more intuitively knows
what you want, and you don't have to
spell it out exactly as you would in
the past. Here's a tweet from
Rune. There are early signs of 5.5 being
a competent AI research partner. Several
researchers let 5.5 run variations of
experiments overnight given only a
high-level algorithmic idea, waking up
to find a completed sweep, dashboards,
and samples, never having touched the
code or terminal at all.
Just, if you can answer briefly on
this, a two-parter. How do you do that?
And does that mean prompt engineering is
dead?
Um number one,
I think it really comes down to when we
say there's a new class of capability,
new class of intelligence. That's really
what we mean, right? The models are
becoming much more intuitive to use
because they have deeper understanding
of what it is you're asking of them,
right? That they really look at the
context, try to understand and puzzle
out, what am I being asked to do? And it
really makes you realize, you know, to
the second part, is prompt engineering
dead? I actually think that prompt
engineering in some ways may be even
more vibrant than before.
But you spend so much time right now
trying to explain to your computer what
you even want. You try to like pack in
this context and be like, well, here's
what's going on, here's the situation,
here's the thing I want from you. And
you're like, why do I have to explain
this to my computer? Right? Like the
whole thing is the computer should be
doing the work to help me. Like I don't
want to have to be sort of, you know,
breaking down the task, trying to
explain to it step by step how to do
things. I want to point it in a
direction, and I want it to be able to
take care of the details, and to get me
the result. Again, in a way that I can
observe and kind of provide feedback
along the way, but I want it to be the
driver of that low-level execution. And
so, I think where prompt engineering is
going to go, in some ways: you can get
so much more out of these models with
so much less effort, but with the same
amount of effort,
you still have a multiplier. Think about
how much more you could even get. And I
think that we're just at the leading
edge right now of seeing the ceiling of
what even today's models are capable of.
Okay. Let me briefly speak with you
about the economics of building a model
like this. Now, you're not saying how
much money or compute you've used to
train this, but I think we can be safe
in assuming it was a lot. And there's been
this pattern where these massive models
come out, they get distilled by
open-source model makers, and then
open-source is just a couple months
behind the leading foundational models.
And I guess when the investment was
smaller, being a couple months ahead
mattered a lot. But I'm curious, now
that the investment is so big, and the
capabilities are increasing fairly
dramatically as you go,
how is this defensible in the long term
if you're just going to have that
pattern repeat over and over?
Well, I look at it a little differently.
Like I think that the real
investment that we are making is into
that end-to-end co-design, right? Of
having a system, a system of people,
right? Who are producing this
technology, right? A way of working
together, and some of this is about how
you leverage these massive
supercomputers to produce these models.
Now, it is also the case that
it's not as simple as taking the
outputs of these models, distilling,
and getting exactly a model of the same
capability that's just smaller and runs
faster. If that were the case, we would
just do that ourselves, and then we
would also have a model that would be
much easier to serve in many ways. And of
course, there's a lot of art behind
distillation. There's a lot of great
things there. But the point that I'm
getting at is that the real thing that
we are investing in is the machine that
makes the machine. Now, on the
deployment side, we think a lot about
safeguards. We think a lot about
mitigations. And we do that for many,
many different aspects of how these
models could be misused
in real situations. And that's something
that we have been investing in for many
years, and we think about that across
areas like cyber, or think about that in
areas like bio. We have a long-standing
effort that you can see in our
preparedness framework, which is
public, about how we approach these
kinds of uses of the model, and how we
try to maximize the benefits and
mitigate the risks. And so I think it's
a real motion that every piece of what
we do needs to connect to the question
of how do we
continue to make progress, but also how
do we make these models broadly
available, because that's something that
we really believe in, that we believe
this technology empowers people, and
that we want it to benefit people and
lift everyone up.
Yeah, but just to go back on that, um
the pricing on this model is, I think,
double the last model, GPT-5.4.
And so from an economics or a
business standpoint, the question would
be, you know, let's say you keep on
progressing, but because there's been
all this infrastructure that's been put
towards training the models, if
open-source can deliver not quite as
good performance, but almost as good,
and do it cheaper,
how do you handle that threat?
Well, again, I look at it a little
differently. So first of all, if you
look at our history, which really is
not driven by competition, just our own
progress and desire, we have dropped
prices on the same level of intelligence
year over year, sometimes by literally a
factor of a hundred. Right? It's like at
least an order of magnitude year over
year, sometimes literally a hundred. But
the thing that keeps happening, it's
a real Jevons paradox, where you
lower the cost of something, way more
activity happens, right? And I think
that what we keep seeing is that there
are returns to intelligence, right? That
for the kinds of tasks that these models
are now capable of doing, that a little
bit more intelligence goes a long way.
And I think that is the story of 5.5,
that in some ways, you can almost look
at it as like, oh, there's just an
incremental improvement in intelligence,
but I think there's going to be a
massive improvement in terms of what
people use it for. And by the way, I
actually think that incremental is very
much an understatement for this model
relative to 5.4. You know, it's a
point-one improvement in some ways, but
I think that really undersells the
magic that we see within this model,
and that our early testers have really
seen in their practical work. So
if people see these numbers and they
say, there's IPO pressure on OpenAI,
and therefore, you know, we've been
getting a great deal on intelligence,
and the free ride is over.
You would argue against that.
Yeah,
look, the way I think about this is that
we have a very simple business in some
ways, right? We rent, build, buy
compute, and we resell it with some
positive margin. And as long as it's,
you know, positive operating margin,
and as long as there's scalable demand
for intelligence, which I think is true
as long as there's problems to solve,
like no one's going to run out of
problems to solve, and we've seen this
at every step that the demand outstrips
our supply, then we can scale that
compute all day. And in my mind, that's
the main directive that I ask of the
team: just think about how we add value
on top of the raw compute, and make
sure that we are at positive operating
margin on it. And that is
something where it's actually not even
about the different competition in the
marketplace. It's just a question of:
can you have compute that gets turned
into intelligence, with slightly more
value coming out relative to the cost
going in? And I think that is
something where again,
we're always trying to make more
efficient models, but then we just want
more of them, and then we want the more
intelligent models. And regardless of
where they're coming from, it's kind of
all the same compute that's going in.
And so I think competition in this
marketplace has been great for
innovation, but I
think that it's actually something where
it's driving more usage and more overall
spend in the ecosystem, and you can see
that in the revenue numbers of us and
you know, others in this industry.
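The "simple business" framing above, buy compute and resell it at a positive operating margin while Jevons-style demand growth outpaces price cuts, can be put into a tiny worked example. All numbers here are hypothetical illustrations, not OpenAI figures:

```python
# Toy unit-economics sketch of "buy compute, resell intelligence at a
# positive operating margin." All numbers are hypothetical.

def operating_margin(revenue, compute_cost):
    """Operating margin as a fraction of revenue."""
    return (revenue - compute_cost) / revenue

# Suppose a GPU-hour costs $2.00 and the tokens served from it sell for $2.50.
margin = operating_margin(revenue=2.50, compute_cost=2.00)
assert margin > 0  # positive margin: adding compute adds profit

# Jevons-style dynamic: cut the price per unit of intelligence each year,
# but if usage grows faster than the price falls, total revenue still rises.
price_per_m_tokens = [8.0, 2.0, 0.5]   # falling price (hypothetical)
tokens_m = [1_000, 10_000, 100_000]    # demand growing even faster
revenue_by_year = [p * t for p, t in zip(price_per_m_tokens, tokens_m)]
# Revenue rises year over year despite a 16x cumulative price cut.
```

The design point matches the argument in the transcript: as long as the margin per unit of compute stays positive and demand scales with cheaper intelligence, scaling compute scales the business.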
Okay, I want to take a quick break and
come back and talk with you
about cybersecurity, trust, and whatever
else we can get to in our time in this
emergency show. We'll be back right
after this. And we're back here on Big
Technology Podcast with OpenAI President
and Co-founder Greg Brockman.
Greg, let me ask you about the
cybersecurity implications here. Two
very different approaches between
OpenAI and Anthropic. Anthropic's
latest massive model, Mythos, is not
released to the public. This one, you
know, Spud or 5.5, is released to the
public.
I mean, let me just ask you straight up,
is there a chance that releasing this
powerful model into the public without
this step-by-step practice could lead
to some major cyberattacks?
Well, I
actually have a different view on the
premise of the question. So the thing to
understand is that we have been
investing in cyber safeguards and
cybersecurity as a part of our
preparedness framework for years, right?
That this is something we have invested
in far ahead of having the kinds of
capabilities we see coming. And so we
have been taking a very deliberate
step-by-step approach. You can see even
just over the past couple weeks where
we've expanded our trusted access for
cyber program,
and in general, we believe in ecosystem
resilience, right? That we think that
you do want to go step-by-step, that
these models are going to continuously
get better. We have line of sight to
even more capable ones, and that you
want to put these models in the
hands of defenders to make sure that
you're able to protect critical
infrastructure. And we believe in that
resilience: as you bring these models
into people's hands, they're able to
explore in ways that you would not be
able to
without that kind of access. And so you
kind of want this graduated approach,
and to make sure that you are moving
down that pipeline as you can bring in
additional safeguards in order to make
sure that you can maximize the benefits
and mitigate the risks. And so we've
really taken a deliberate approach. I
think our team has been working
incredibly hard to think through the
cyber implications of this model.
Um we also believe in iterative
deployment. Uh that's part of this
really bringing the models as they
continuously get better, and we believe
in democratic access. That we believe
that ultimately the goal of creating
this technology is to empower people, to
ensure that it does benefit all of
humanity. And so we are constantly
trying to solve for how do we safely and
responsibly bring this technology to
bear in the world in a broad way.
Right. And I think suffice it to say
that your team hasn't been fans of the
way that Anthropic's deployed Mythos.
This is a quote from Sam: "It's clearly
incredible marketing to say, 'We have
built a bomb. We're about to drop it on
your head. We will sell you a bomb
shelter for a hundred million to run
across all your stuff, but only if we
pick you as a customer.'" Let me talk
through the
other case, and then get your response.
The other case would be: you can't
account for everything, and there are
clearly going to be some
vulnerabilities that will only be found
by people or entities deploying this
and looking for them. So maybe it makes
sense to start with a trusted group of
testers before you deploy it broadly.
What do
you think?
Well, I believe the correct answer here
is subtle, and I think it is rooted in
the technical specifics of what you have
in front of you and many, many factors,
right? You need to think about how are
the models progressing, right? Not just
your own capabilities, but others in the
ecosystem. You need to think about
what kind of benefit do you get from
having a small group that has access:
are they able to have high leverage by
being able to find and produce patches?
But
then how do you actually coordinate the
disclosure of those across an industry?
And so there's a lot of factors that go
into it. I think that the true answer is
that either extreme is not quite
right. There are tools that can be
applied to a specific situation. And I
think that this is not the first time
we've had to think about this problem.
It's not the last time we will have to
think about it. But one thing to note is
that we have had our model in the hands
of defenders for some time as we've
been building up our trusted access
program. And the model that we're
releasing is actually not cyber
permissive, right? That it actually has
a number of safeguards built into it.
And that you can then have a gap between
what you're privately sharing, testing,
those kinds of things. And so I think my
my short answer is: there are
definitely these different schools of
thought in terms of values. Is the
value that you want to get these models
into people's hands and empower them? Or
is the value that you kind of want them
to be centralized and controlled and
that you don't want them in people's
hands? That is maybe an underlying
tension in some of these debates. But I
think that the tactics, right, those
almost flow from the details, and they
can
be informed by these values. But either
extreme reflexively, I don't think will
yield the best outcome for the world.
Okay, I want to ask you about agents,
back to agents if we could. These
agents work best if you let them have a
high degree of autonomy. I
mean, sort of makes sense. Um so I'm
just kind of curious to hear your
perspective. As we get more agents that
can do more things and access more files
and work across programs, what is the
proper amount of trust to put into
agents right now?
So, I think that right now actually
agents tend to be quite reliable. Um and
even things like prompt injections, I
think that there's still holes there,
but that we're patching them. And that
the models are becoming much more
resilient. But I also think that the
flip side is that as these models are
given increasing responsibility and
access to more important context, that
you need to have some answer. It's just
like employees: if you have a team of
five employees who are all kind of
trustworthy, fine. But if you have
500,000 of the same employees, somehow
those numbers, right, the law of large
numbers, make you start to worry about,
okay, how do I have good governance and
oversight?
>> And so this is something where, as
we're investing in these capabilities
and making the super app more
accessible, not just to coders but to
any person doing work with a computer,
we're also investing in governance and
oversight.
And you can see this very concretely in
workspace agents, which we released
recently. Within your enterprise, you
can now define agents.
So you get a hosted Codex harness in the
cloud, you can hook up tools, you can
hook it up to your Slack, and it's doing
work. It's like awesome. A lot of people
use it. It's been very cool to see how
sort of viral it goes within an
organization: when you use someone
else's agent, you're like, wait, I
could build one of these too. And you
can just fork it and do your own
thing.
And that's an opportunity to have great
governance, and you can see that baked
into the product: your IT organization
can see all the agents that have been
created; for an agent, you can see the
conversations it's had; and you can
think about exactly what the guardrails
are around
it. So I think that the short answer is
like you want to ramp the
responsibility entrusted to the agent,
and the diversity of things that agents
are doing, together with security,
safety, observability, and oversight.
And if you're not doing those hand in
hand, then I think that's a little bit
out of balance, and I think it's
important to think about both sides.
Yeah, basically go ahead, but be
careful.
And really lean in, right? You can
prototype freely, but it's just the
nature of scale that starts to bring in
the question: do you still have the
ability to oversee what's going on? So
you need to make sure at each step: do
you feel like you're calibrated, do you
understand what your teams are up to?
Greg, let's end with this. Um you've
called this a compute-powered economy.
What does that mean?
Well, I think we are heading to a world
where the more compute is poured into a
problem, the faster that problem will be
solved. And that the ceiling of problem
that can be solved depends on how much
compute is available. And you think
about things like drug discovery, right?
Being able to solve complex diseases.
Solving a complex disease like
Alzheimer's is kind of outside of
humanity's reach right now. We've never
really done it. But
imagine a world where you can take a
gigawatt data center and have it just
think about how to solve Alzheimer's.
For a month, for a year, however long it
takes. And it may not be literally just
cerebrally solving this problem, but it
may have to consult with world experts.
Maybe it has to
suggest experiments that get run in a
wet lab. But if you can actually solve
such a problem, that would be such a
transformatively positive thing for
humanity. And I think we're heading to a
world where that is how important
problems get solved. And that is how
tasks in your daily life can also be
solved. Whether it's having an agent
that knows you, that has your personal
context, that is trustworthy, that you
can ask for
advice on health and you get back
trustworthy information.
And it's just a thing, a smartphone in
your pocket, right? You can just talk
to it, and it'll be
out there doing things. It proactively
knows what are your goals, what are your
interests, and how it can help you. And
I think that big and small compute is
going to be the resource that determines how
much computers can be used to help
people, to do work on behalf of people.
And I think we're heading to that world
and it's one that we're all building
collectively.
Yeah, and that I think would explain
the massive investments that you've
been making, these big infrastructure
bets.
Still not enough. We're going to feel
the scarcity. We're going to feel it.
We're feeling it already. You can sense
it right now in people who are trying
to use these agents and simply cannot,
you know, hitting the rate limits. So
we're working on behalf of our
customers, on behalf of everyone who
wants to use these agents, to ensure
that there is enough. And I don't think
we're going to get there. We're going
to do our best, but I think that we are
headed to a world of compute scarcity.
And again, I think this is something
where we can all contribute to trying
to help there just be more availability
of this in the world.
Greg, busy day. Always appreciate your
time. Always great to speak with you.
Thanks again for coming on.
Likewise. Great chatting.