OpenAI President Greg Brockman on GPT-5.5 “Spud,” AI Model Moats, and Cybersecurity Risks
Channel: Alex Kantrowitz
Published at: 2026-04-23
YouTube video id: YnoQ8RJbALw
Source: https://www.youtube.com/watch?v=YnoQ8RJbALw
OpenAI president and co-founder Greg Brockman joins us to discuss OpenAI's newest model Spud, aka GPT-5.5, and where it leaves OpenAI competitively. That's coming up right after this.

Welcome to the Big Technology Podcast. Today we have an emergency episode with OpenAI president and co-founder Greg Brockman, all about GPT-5.5, the famous Spud model: what it does and what it means for OpenAI. Greg, great to see you. Welcome back to the show.

Thank you for having me. Hope it's not too much of an emergency.

Well, I am definitely recording in a Vegas hotel room, so more of an emergency than our last conversation, but we had some time to prepare. It's great to be on with you. So let's just start with this: can you confirm GPT-5.5 is Spud?

Yes.

Okay. What is GPT-5.5?

Well, it's an amazing model. I think in many ways it is a step towards a new way of getting work done with a computer. It's a new class of intelligence. It's extremely useful at things like programming, all the different aspects of debugging and solving very hard and gnarly problems, being very proactive and really able to solve problems end-to-end with little instruction. But the thing that's most remarkable to me is not that it got better at coding; that is what everyone expects. It's that it has now really crossed the threshold of usefulness for general kinds of applications. It's much better at creating slides and spreadsheets, much better at computer use: using your browser, clicking through applications that are otherwise hard for an AI to operate. So I think we're really seeing the emergence of this new way of using a computer, and it starts with this kind of intelligence at the core.

When we spoke last, you mentioned that this was effectively the culmination of a two-year research process. So was this planned two years ago? Is that how far back OpenAI plans?
I would say that yes, we do have very long horizons for how we plan. One note is that we stack together many research ideas and bets on a variety of time scales. The way to think about it is that we are making constant progress across every single part of the stack. What GPT-5.5 represents is not an end point; in many ways, it's a beginning. It's a step towards the kinds of models we see coming over even just the upcoming months. You should expect even larger improvements in capability across a wide variety of these aspects of what the model can do. That's something I think will be very exciting, and we're always thinking about how to make what we're producing more useful for real-world use, for real users and real applications.

Can you share specifically what those aspects are that we should be looking out for over the next few months? If this is the beginning, what is it the beginning of?

Well, there's a big vision we have, and you can see it reflected in many things, not just the models. You can think about the models as the brain, and the systems and harnesses like Codex and applications like the super app as almost the body around it that makes it into a useful AI. That's really what's happening: a shift from language models being the thing produced by labs like ourselves to an AI that's actually useful, an assistant that's out there trying to solve your goal, really operating according to your instruction. And you can see right now that Codex is becoming an app that's not just for coders; it's really for anyone using a computer. It's not perfect, right? There are still some tasks where it should be able to do it and doesn't quite get it right. Sometimes the personality isn't quite what you want.
It's extremely powerful and out there doing a lot of really amazing things, but you still have to spend some time reading through the way it communicates back to you to work out exactly how it solved the problem. These aspects, we know exactly how to make much better. We've already had a pretty remarkable improvement from 5.4 to 5.5, and I think we're going to have even more remarkable improvements across every single aspect of what makes these models useful. One thing to know is that internally, we think a lot about the end application. That is one thing that changed for us over the past 12 to 18 months: we used to be focused on improving benchmarks and making the models more cerebrally capable, but we are now really focused on bringing them to real-world applications. Think about finance, sales, marketing, every single function where someone uses a computer: how can we help with their computer work? How can we make the model not just theoretically capable of helping, but actually experienced in those kinds of tasks, having actually seen what good looks like? I think the place we're going is one where you, as a person doing work, are the overseer, the CEO of this almost autonomous corporation, or perhaps better put, of this fleet of agents. They operate according to your goals. Now, you are still accountable. You're still in the driver's seat. You're still the person who asks: is this what I actually wanted? Was this work up to standard?
But the details of exactly what buttons were clicked, exactly what code was written, or exactly how the formula in the spreadsheet works, you can abstract yourself from those if they're not important to evaluating whether something was what you wanted. So I think it's increasing leverage for every worker.

Okay. Let me take my best guess as to what's happening, and you tell me how close I am. Like you mentioned, this is a culmination of two years of work. Not to tell you what you already know, but for our audience: there are two types of AI training that have been pertinent for these models. There's pre-training, where you make the model generally smart by having it predict the next word, and reinforcement learning, where it actually goes out and tries to accomplish different tasks, and you reward it when it does a good job, so it effectively learns how to do those tasks. Is what you're saying basically that this is the first result where OpenAI has loaded a ton of task-specific reinforcement learning into this model, and that's what's producing the results you're talking about?

Well, I would actually say it a little differently. There are many steps in the pipeline: pre-training, mid-training, reinforcement learning, data collection. A lot of different things all come together to produce the end result. And the way in which it's connected to the world is also very key to making it useful. The thing I'm really saying is that we have been investing in every single one of these, and we have a repeatable process. We have a team, right?
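The two training signals the host contrasts, next-word prediction and task reward, can be sketched in miniature. This is a toy illustration of the general ideas, not OpenAI's pipeline; the corpus, update rule, and all numbers are made up for the sketch:

```python
import math
import random

random.seed(0)

# --- Pre-training signal: next-token prediction ---
# A toy bigram "language model": count word pairs in a tiny corpus, then
# score a continuation by its negative log-likelihood (lower is better).
corpus = "the model reads text and the model predicts the next word".split()
counts = {}
for prev, nxt in zip(corpus, corpus[1:]):
    counts.setdefault(prev, {}).setdefault(nxt, 0)
    counts[prev][nxt] += 1

def nll(prev, nxt):
    """Negative log-probability of `nxt` following `prev` under the bigram model."""
    row = counts.get(prev, {})
    total = sum(row.values())
    p = row.get(nxt, 0) / total if total else 0.0
    return -math.log(p) if p > 0 else float("inf")

# "model" follows "the" twice out of three times, so it is the cheaper prediction.
assert nll("the", "model") < nll("the", "next")

# --- Reinforcement signal: reward for task success ---
# A toy policy: the probability of completing a task correctly. Rewarded
# attempts nudge that probability upward, policy-gradient style.
p_correct = 0.5
lr = 0.1
for _ in range(200):
    success = random.random() < p_correct  # attempt the task
    reward = 1.0 if success else 0.0
    p_correct = min(0.99, p_correct + lr * reward * (1 - p_correct))

print(round(p_correct, 2))  # drifts toward 1 as rewards accumulate
```

The point of the contrast: the first signal is dense and task-agnostic (every token in the corpus teaches something), while the second is sparse and task-specific (only completed attempts produce learning signal), which is why labs layer them rather than choosing one.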
It's not just about individuals working on these pieces, but a team that really comes together and looks across the whole stack to ask: how do we make this more useful for real-world applications? So it's not really any one thing we do. Think about building a car: it's not just about having a better engine. You can build a great engine, but if the rest of the car is not up to the quality level of the engine, it's not going to matter. I think that is the real innovation: the end-to-end co-design, all coming together in a repeatable fashion to make these models better and better for our users.

You were on a media call earlier today with myself and a number of members of the press, and one of the interesting things you said, basically right off the bat, is that the model more intuitively knows what you want, and you don't have to spell it out exactly as you would have in the past. Here's a tweet from Rune: "There are early signs of 5.5 being a competent AI research partner. Several researchers let 5.5 run variations of experiments overnight given only a high-level algorithmic idea, waking up to find a completed sweep, dashboards, and samples, never having touched the code or terminal at all." If you can answer briefly, a two-parter: how do you do that, and does that mean prompt engineering is dead?

Number one, I think it really comes down to this: when we say there's a new class of capability, a new class of intelligence, that's really what we mean. The models are becoming much more intuitive to use because they have a deeper understanding of what you're asking of them. They really look at the context and try to puzzle out: what am I being asked to do?
And to the second part, is prompt engineering dead? I actually think prompt engineering in some ways may be even more vibrant than before. Right now you spend so much time trying to explain to your computer what you even want. You try to pack in this context: here's what's going on, here's the situation, here's the thing I want from you. And you think, why do I have to explain this to my computer? The whole point is that the computer should be doing the work to help me. I don't want to have to break down the task and explain step by step how to do things. I want to point it in a direction, have it take care of the details, and get me the result, in a way that I can observe and provide feedback along the way, but with the model as the driver of the low-level execution. So I think where prompt engineering is going is this: you can get so much more out of these models with so much less effort, but with the same amount of effort, you still have a multiplier. Think about how much more you could get. I think we're just at the leading edge of seeing the ceiling of what even today's models are capable of.

Okay. Let me briefly speak with you about the economics of building a model like this. Now, you're not saying how much compute you used to train this, but I think we can safely assume it was a lot. And there's been this pattern where these big, massive models come out, they get distilled by open-source model makers, and then open-source is just a couple months behind the leading foundation models. When the investment was smaller, being a couple months ahead mattered a lot.
But I'm curious, now that the investment is so big, and the capabilities are increasing fairly dramatically as you go, how is this defensible in the long term if that pattern is just going to repeat over and over?

Well, I look at it a little differently. I think the real investment we are making is in that end-to-end co-design: a system of people producing this technology, a way of working together, some of which is about how you leverage these massive supercomputers to produce these models. Now, it's also not as simple as taking the outputs of these models and distilling them into a model of exactly the same capability, just smaller and faster. If that were the case, we would just do that ourselves, and we would also have a model that would be much easier to serve in many ways. Of course, there's a lot of art behind distillation; there are a lot of great things there. But the point I'm getting at is that the real thing we are investing in is the machine that makes the machine. Now, on the deployment side, we think a lot about safeguards and mitigations, and we do that for many different ways these models could be misused in real situations. That's something we have been investing in for many years, across areas like cyber and bio. We have a long-standing effort, which you can see in our preparedness framework, which is public, about how we approach these kinds of uses of the model, and how we try to maximize the benefits and mitigate the risks.
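Distillation, which Brockman notes is not a free lunch, conventionally means training a small "student" model to match a large "teacher's" output distribution. A minimal sketch of the standard objective (temperature-softened KL divergence); the logits here are illustrative, not from any real model:

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw scores into a probability distribution."""
    exps = [math.exp(x / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q): how poorly q approximates p, in nats (0 iff identical)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Illustrative logits over a 4-token vocabulary.
teacher_logits = [4.0, 2.0, 1.0, 0.5]
student_logits = [3.0, 2.5, 0.5, 0.2]

# A higher temperature softens both distributions, exposing the teacher's
# relative preferences among unlikely tokens ("dark knowledge").
T = 2.0
teacher_p = softmax(teacher_logits, T)
student_p = softmax(student_logits, T)

# The student is trained to drive this loss toward zero.
distill_loss = kl_divergence(teacher_p, student_p)
print(f"distillation loss: {distill_loss:.4f}")
```

This also illustrates his caveat: the student only sees the teacher's outputs, not the data pipeline, training process, or system around the model, which is the "machine that makes the machine" he describes as the real investment.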
So I think it's a real motion: every piece of what we do needs to connect to the question of how we continue to make progress, but also how we make these models broadly available, because that's something we really believe in. We believe this technology empowers people, and we want it to benefit people and lift everyone up.

Yeah, but just to go back on that: the pricing on this model is, I think, double the last model, GPT-5.4. So from an economics or business standpoint, the question would be: let's say you keep progressing, but given all the infrastructure that's been put towards training the models, if open-source can deliver not-quite-as-good performance, but almost as good, and do it cheaper, how do you handle that threat?

Well, again, I look at it a little differently. First of all, if you look at our history, which really is not driven by competition, just our own progress and desire, we have dropped prices on the same level of intelligence year over year, sometimes by literally a factor of a hundred. At least an order of magnitude year over year, sometimes literally a hundred. But the thing that keeps happening is a real Jevons paradox: you lower the cost of something, and way more activity happens. And I think what we keep seeing is that there are returns to intelligence. For the kinds of tasks these models are now capable of doing, a little more intelligence goes a long way. I think that is the story of 5.5. In some ways you can look at it as just an incremental improvement in intelligence, but I think there's going to be a massive improvement in terms of what people use it for. And by the way, I actually think incremental is very much an understatement for this model relative to 5.4.
It's a point-one improvement in some ways, but I think that really undersells the magic we see within this model, and that our early testers have seen in their practical work.

So if people see these numbers and say there's IPO pressure on OpenAI, and therefore we've been getting a great deal on intelligence and the free ride is over, you would argue against that?

Yeah, look, the way I think about this is that we have a very simple business in some ways. We rent, build, or buy compute, and we resell it with some positive margin. As long as it's positive operating margin, and as long as there's scalable demand for intelligence, which I think is true as long as there are problems to solve, and no one's going to run out of problems to solve, and we've seen at every step that demand outstrips our supply, then we can scale that compute all day. In my mind, that's the main directive I ask of the team: we need to add value on top of the raw compute, and make sure we are at positive operating margin on it. It's not even about the competition in the marketplace. It's just a question of whether you can have compute that gets turned into intelligence, with slightly improved value coming out relative to the cost going in. We're always trying to make more efficient models, but then we just want more of them, and then we want the more intelligent models. And regardless of where they're coming from, it's kind of all the same compute going in.
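The unit economics Brockman sketches, reselling compute at a positive operating margin while the price of a fixed level of intelligence falls roughly an order of magnitude per year, can be made concrete with arithmetic. All figures below are invented for illustration; none are OpenAI's actual costs or prices:

```python
# Hypothetical unit economics for the "simple business" described above.
compute_cost_per_hour = 2.00   # made-up cost of an hour of compute
price_per_hour = 2.50          # made-up resale price

operating_margin = (price_per_hour - compute_cost_per_hour) / price_per_hour
assert operating_margin > 0    # the condition Brockman says must hold
print(f"operating margin: {operating_margin:.0%}")  # 20%

# Price of a fixed level of intelligence dropping 10x per year:
price = 100.0  # hypothetical $ per unit of "intelligence", year 0
for year in range(1, 3):
    price /= 10
print(f"after 2 years: ${price:.2f}")  # a factor-of-100 drop, as in the interview
```

The Jevons-paradox claim is that the demand curve is elastic enough that this 100x price drop produces far more than 100x the usage, so total spend grows even as unit prices collapse.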
So I think competition in this marketplace has been great for innovation, but it's also driving more usage and more overall spend in the ecosystem, and you can see that in the revenue numbers of us and others in this industry.

Okay, I want to take a quick break and come back and talk with you about cybersecurity, trust, and whatever else we can get to in our time in this emergency show. We'll be back right after this.

And we're back here on Big Technology Podcast with OpenAI president and co-founder Greg Brockman. Greg, let me ask you about the cybersecurity implications here. Two very different approaches between OpenAI and Anthropic: Anthropic's latest massive model, Mythos, is not released to the public, while this one, Spud or 5.5, is. Let me just ask you straight up: is there a chance that releasing this powerful model to the public, without that step-by-step practice, could lead to some major cyberattacks?

Well, I actually have a different view on the premise of the question. The thing to understand is that we have been investing in cyber safeguards and cybersecurity as part of our preparedness framework for years, far ahead of having the kinds of capabilities we see coming. So we have been taking a very deliberate, step-by-step approach. You can see even just over the past couple weeks that we've expanded our trusted access for cyber program. And in general, we believe in ecosystem resilience: you do want to go step by step, and these models are going to continuously get better.
We have line of sight to even more capable ones, and you want to put these models in the hands of defenders to make sure you're able to protect critical infrastructure. We believe in the resilience that comes from bringing these models into people's hands: they're able to explore in ways you would not be able to without that kind of access. So you want this graduated approach, moving down that pipeline as you bring in additional safeguards, so that you can maximize the benefits and mitigate the risks. We've really taken a deliberate approach; our team has been working incredibly hard to think through the cyber implications of this model. We also believe in iterative deployment, bringing the models out as they continuously get better, and we believe in democratic access: ultimately, the goal of creating this technology is to empower people, to ensure that it does benefit all of humanity. So we are constantly trying to solve for how to safely and responsibly bring this technology to bear in the world in a broad way.

Right. And I think suffice it to say that your team hasn't been fans of the way Anthropic's deployed Mythos. This is a quote from Sam: "It's clearly incredible marketing to say, 'We have built a bomb. We're about to drop it on your head. We will sell you a bomb shelter for a hundred million to run across all your stuff, but only if we pick you as a customer.'" Let me talk through the other case, and then get your response. The other case would be: you can't account for everything, and there are clearly going to be some vulnerabilities that will only be found by people or entities deploying this and looking for them.
So maybe it makes sense to start with a trusted group of testers before you deploy it broadly. What do you think?

Well, I believe the correct answer here is subtle, and it's rooted in the technical specifics of what you have in front of you and many, many factors. You need to think about how the models are progressing, not just your own capabilities but others in the ecosystem. You need to think about what kind of benefit you get from having a small group with access: are they able to have high leverage by finding and producing patches, and then how do you actually coordinate the disclosure of those across an industry? A lot of factors go into it. I think the true answer is that either extreme is not quite right. There are tools that can be applied to a specific situation. This is not the first time we've had to think about this problem, and it's not the last time we will. But one thing to note is that we have had our model in the hands of defenders for some time; we've been building up our trusted access program. And the model we're releasing is actually not cyber-permissive: it has a number of safeguards built into it. You can then have a gap between what you're privately sharing and testing and what you release. So my short answer is that there are definitely different schools of thought in terms of values: is the value that you want to get these models into people's hands and empower them, or is it that you want them centralized and controlled, and not in people's hands? That is maybe the underlying tension in some of these debates. But the tactics flow from the details, and they can be informed by these values.
But either extreme, applied reflexively, I don't think will yield the best outcome for the world.

Okay, I want to ask you about agents, back to agents if we could. These agents work best if you let them have a high degree of autonomy, which makes sense. So I'm curious to hear your perspective: as we get more agents that can do more things, access more files, and work across programs, what is the proper amount of trust to put in agents right now?

So I think that right now, agents actually tend to be quite reliable. Even with things like prompt injections, there are still holes, but we're patching them, and the models are becoming much more resilient. But the flip side is that as these models are given increasing responsibility and access to more important context, you need to have some answer for oversight. If you have a team of five employees, they're all kind of trustworthy, fine. But if you have 500,000 of the same employee, the law of large numbers kicks in and you start to worry: how do I have good governance and oversight? So as we're investing in these capabilities, and making the super app accessible not just to coders but to any person doing work with a computer, we're also investing in governance and oversight. You can see this very concretely in workspace agents, which we released recently. Within your enterprise, you can now define agents: you get a hosted Codex harness in the cloud, you can hook up tools, you can hook it up to your Slack, and it's doing work. It's awesome; a lot of people use it. It's been very cool to see how viral it goes within an organization: when you use someone else's agent, you think, wait, I could build one of these too.
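The governance pattern described here, agents with an explicitly scoped toolset and an auditable activity log that IT can review, can be sketched generically. This is an illustrative design, not OpenAI's actual workspace-agents product or API; all names are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class GovernedAgent:
    """A hypothetical agent whose tool access is scoped and whose actions are logged."""
    name: str
    allowed_tools: set[str]
    audit_log: list[str] = field(default_factory=list)

    def call_tool(self, tool: str, action: Callable[[], str]) -> str:
        # Guardrail: refuse any tool outside the agent's declared scope.
        if tool not in self.allowed_tools:
            self.audit_log.append(f"DENIED {tool}")
            raise PermissionError(f"{self.name} may not use {tool}")
        result = action()
        # Oversight: every successful action is recorded for admin review.
        self.audit_log.append(f"OK {tool}: {result}")
        return result

# An agent scoped to two tools; a third is denied and the denial is logged.
agent = GovernedAgent(name="sales-helper", allowed_tools={"slack", "spreadsheet"})
agent.call_tool("slack", lambda: "posted summary")
try:
    agent.call_tool("payments", lambda: "sent wire")  # out of scope
except PermissionError:
    pass
print(agent.audit_log)  # admins can review both the action and the denial
```

The design choice this illustrates is the one Brockman names: responsibility is ramped together with observability, so scaling from five agents to 500,000 changes the size of the audit log, not the trust model.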
And you can just fork it and do your own thing. That's an opportunity for great governance, and you can see it baked into the product: your IT organization can see all the agents that have been created, and for any agent, you can see the conversations it's had and think about exactly what the guardrails around it are. So the short answer is that you want to ramp the responsibility entrusted to the agent, and the diversity of things agents are doing, together with security, safety, observability, and oversight. If you're not doing those hand in hand, that's a little out of balance, and I think it's important to think about both sides.

Yeah, basically: go ahead, but be careful.

And really lean in, right? As you scale, you can prototype, but it's just the nature of scale that starts to raise the question: do you still have the ability to oversee what's going on? So at each step you need to make sure you're calibrated, that you understand what your teams are up to.

Greg, let's end with this. You've called this a compute-powered economy. What does that mean?

Well, I think we are heading to a world where the more compute is poured into a problem, the faster that problem will be solved, and where the ceiling of the problems that can be solved depends on how much compute is available. Think about things like drug discovery, being able to solve complex diseases. Solving complex diseases like Alzheimer's is kind of outside humanity's reach right now; we've never really done it. But imagine a world where you can take a gigawatt data center and have it just think about how to solve Alzheimer's, for a month, for a year, however long it takes. And it may not be literally just cerebrally solving the problem; it may have to consult with world experts.
Maybe it has to suggest experiments that get run in a wet lab. But if you can actually solve such a problem, that would be such a transformatively positive thing for humanity. I think we're heading to a world where that is how important problems get solved, and it's also how tasks in your daily life can be solved: an agent that knows you, that has your personal context, that is trustworthy, that you can ask for advice on health and get back trustworthy information. And that's just a thing on the smartphone in your pocket that you can talk to, and it'll be out there doing things. It proactively knows your goals, your interests, and how it can help you. Big and small, compute is going to be the resource that determines how much computers can be used to help people and do work on their behalf. I think we're heading to that world, and it's one we're all building collectively.

Yeah, and I think that would explain the massive investments you've been making, these big infrastructure bets.

Still not enough. We're going to feel the scarcity. We're feeling it already. You can sense it right now in people who are trying to use these agents and simply cannot, hitting the rate limits. So we're working on behalf of our customers, on behalf of everyone who wants to use these agents, to ensure that there is enough. And I don't think we're going to get there; we're going to do our best, but I think we are headed to a world of compute scarcity. Again, this is something where we can all contribute to helping there be more availability of this in the world.

Greg, busy day. Always appreciate your time. Always great to speak with you. Thanks again for coming on.

Likewise. Great chatting.