OpenAI Codex Masterclass — Vaibhav Srivastav & Katia Gil Guzman

Channel: aiDotEngineer

Published at: 2026-04-29

YouTube video id: MhHEGMFCEB0

Source: https://www.youtube.com/watch?v=MhHEGMFCEB0

Hi everyone. Thank you for being here.
So today we're going to talk about
Codex. My name is Katia Gil Guzman
and I'm with VB. We are both
working in the developer experience team
at OpenAI, based in London. And so our
role is really to help developers build
and get the most out of our products,
including Codex. And so today we're
going to start with a quick Codex
overview. Just so we know, how many of
you here are using Codex? Can you raise
your hands?
>> Yay.
>> Okay, cool. So,
we're not gonna stay too long on that
overview part, and then
we're gonna do some demos. So, we're
going to show you plugins and
automations. VB is going to talk
about sub agents and then about the
bleeding edge. So, hopefully for those
who already know Codex and use it,
you'll learn something that you
didn't know about. And then we'll have
some time at the end for Q&A. So,
feel free to ask anything. Also, this is
a workshop format. So, you know, if
you have a pressing question,
feel free to ask. And I see you all have
your laptops. So, also feel free to
follow along with us. We're going to
show you how to do some things and you
can try it at the same time. And
during the Q&A, that's also
the perfect time to try things on
your side. Okay. So
to start, just for those who don't
know Codex, or even if you know it, maybe
you don't know it that well: Codex is
our, OpenAI's, software
engineering agent. So it's not just a
coding agent. It's not just, you know,
an agent that writes code. It can do
much more than that. It can run
commands. It can run tests. It can
explore code bases. It can really do
everything that a software engineer
would do. And so it's based on our
models as a foundation. So for example,
GPT-5.3 Codex was our previous one.
We also have the Spark version, which is
the super fast model that
we have. The state-of-the-art model
right now is GPT-5.4. And we also have a
mini version that came out last week.
And you know, every time we make
improvements, every time we have better
models, Codex benefits from it. But
it's not just the models. On top of that
foundation, we have what we call a
unified agent harness that manages
and evaluates the agent's behavior and
that is a wrapper for tool execution, for
environment setup, for everything that
lets the agent do its work and run
smoothly. There's also safety
embedded in that harness. So all
of that is Codex and then you can
interact with it through different
surfaces. So you have the Codex app that
we're going to talk about in a in a few
minutes. Uh you can also interact with
it through your IDEs with the
extension. You can interact with it
through the CLI and also through other
services like Slack. For example, at
OpenAI we all the time just ping
Codex in Slack and ask it to fix things,
or in GitHub as well. And on top of
all of that, you can also integrate it
with your preferred tools so that it can
really work with everything that
you're already using. So, you know, you
can integrate it with Figma, with Linear,
with Notion. All of that combined can
let Codex do everything that a software
engineer colleague would do. And so, as
I mentioned, this is based on uh our
models. And so I'm going to let VB tell
you a little bit more about that.
>> All right. Um good morning everyone. Uh
so as we've been talking about the
Codex app, the IDE extension, the CLI
and so on and so forth, all of these
harnesses as well as all of these
services would not be nearly as good
without the models powering them, right?
And just to take a step
back, back when I joined OpenAI,
which is not really that far back, it
was in December, our leading model at
that time was GPT-5.2, and from
there we went on to release GPT-5.2 Codex,
which was a specialized
Codex variant of GPT-5.2 wherein
we pushed how far you can
take the model and run
it on long-running tasks, how far you can
let it just continue chugging along. And
then shortly after we followed up with
GPT-5.3 Codex. Shortly after that, in
partnership with Cerebras, we followed up
with GPT-5.3 Codex Spark, and
most recently we released
GPT-5.4, and you can already see how
we're pushing this
whole model and harness flywheel
as fast as possible, trying to bring the
next frontier as fast
as possible to you all. Right,
something to note, and
something that's not on the screen, is that
at the same time, whilst we were
pushing for larger models, which are
really good for long-running tasks as
well as really complex tasks and so on
and so forth, we also released
GPT-5.4 mini and GPT-5.4 nano, which you
can use for short-running tasks and
sub agents, which we'll talk about in
a bit.
And something that we haven't
really emphasized over
here is two things. One, that as we
pushed on making these
models better, we also worked quite a
bit on making sure that these models
can be served to you as fast as
possible. What that means in principle
is we introduced something
called WebSockets, which allow us to
create a connection between your
device and where the
API resides, to be able to
give you roughly 1.75x faster
tokens without really
passing the cost over to you. At the
same time, we also released a fast
mode which allows you to, on top of the
1.75x, get 2x faster tokens, and
this is something which the team is
continuously hammering on.
There's lots more speed
improvements coming over there.
And so to bring this all together,
at the start of this year we
brought together the Codex app. How many of
you have used the Codex app? All right.
That's a fair good chunk of
people. To be honest, back in
December, and even before, I
was a hardcore CLI user, and at some
point during the app launch, while I
was beta testing it, dogfooding it,
the Codex app became
a really important part of my
workflow. And the reason for that is
it brings together a really nice way
to, number one, work across projects and,
number two, within the same project,
work on multiple features at the same
time. The way you can do that is you
can have individual projects, like
you can see on the left side: you
can have the Codex project, ChatGPT,
Sora and so on and so forth. But also
within those you can use
worktrees to work on individual
feature requests or bug fixes
or just Q&A, all at the same time,
without really interfering with
individual tasks. This is something
which we're quite proud of. And
providing native worktree
support helps you
do multiple tasks at the same time
without really having to context switch
as much. At the same time, through
the launch we've been trying
to increase the net benefit
you can get out of the Codex app. And
some of these features have been, like,
having better automation
support. Automations is also
something we're going to talk about just
in a bit, but the short summary of
automations is that you can essentially
have a rough process that you
want Codex to run, let's say, every day
at 9:00 a.m., or let's say every evening,
or let's say you want Codex to look
through your calendar and
create a briefing for you. And that
is all possible within the
native Codex app setup with
automations. And then of course with
the worktrees and more
native Git support, you can
work across projects and
just be able to push changes as
you want, with whatever Git persona you
want to do it with. Last but not
least, based on which surface
you use the Codex app on: at the
start of the year we released it just on
macOS, but now we have native Windows
support, which comes along with a native
Windows sandbox. Is there anyone here
who's using Windows today?
I'm cheering for you man.
And so for the one
gentleman over there, we have native
sandbox support in Windows. We're the
first of our kind. There is no other
competing harness which
supports a native sandbox for
Windows.
Cool.
And so I've been talking on and on about
the Codex app, been talking
about all the models that we've
been shipping, but what's new in
terms of all the features that we've
shipped?
This is, I think, if I'm not wrong, in
descending order. So
most recently we launched plugins.
Plugins are a way that you can
bring together skills, MCPs, as well as
prompts and any other thing,
really, together in one bundle and
allow the model to do more nuanced
matching whilst it's building. We
also recently released mini models,
which tie in quite well with sub agents,
which allow you to parallelize a
particular feature or bug request or Q&A,
whatever it may be, at a faster
rate, all whilst making sure that you
don't pay as much for
your particular models. And then we
have a bunch of other stuff which
we're going to talk about as we go
through. Some of this is
how Codex is so good at
code review, how Codex is really good at
security and so on. Whilst we talk about
all of this, I
want to emphasize the fact
that we at OpenAI are quite lucky
that the community has really embraced
Codex. In fact, just last
night we crossed the milestone of
3 million weekly active users,
and this is a pretty big deal
for us, and we want to continue
supporting the developer community,
the enterprises and startups
building on Codex. So throughout the
session, if you have any questions, please
feel free to throw them at both Katia and
myself, or even afterwards, or just
ping us. With this, I'll pass it
over to Katia.
>> Thank you. And yeah, the three
million weekly active users thing is
really cool to see, and it's
crazy to think that it's also more than
tripled since January. So just in a few
months we've seen huge adoption,
and yeah, it's really
really cool to see. Okay, so
plugins. I don't know if
you've heard about it. It's quite
new on Codex, the native support
for plugins. The idea of plugins (I'm
going to show you what it looks like in
practice and how you can use them) is
that they bundle a bunch of things
together, so skills, apps,
integrations,
MCP servers, and they bundle
that into reusable workflows. And so
what skills, apps and MCP servers are,
again, I'm going to show you, but just to
introduce that a little bit: skills
are essentially reusable instructions
packaged for specific processes. So if
you have something that you're doing
quite a bit, you can actually create a
skill for it so that Codex knows about
it. You can give it instructions, you
can give it scripts as well, resources,
and all of that will save you from
just repeating yourself over and over.
So every time you have a
neat workflow that is always the same,
you can package that into a skill. You
can actually ask Codex to create the
skill for you as well. And then apps are
connections to other services. So,
again, we'll see a quick demo,
but the tools that you use every day,
like Notion, Linear, all of that, you
can let Codex connect to them. And MCP
servers, you might be familiar with
this already, but they basically
expose tools from external systems for
Codex, to just
extend its capabilities further. And so all
of these three things are already very
useful on their own. And what plugins do
is that they bundle that so that you
don't have to, you know, set up
everything manually. You don't have to
install multiple skills. You don't have
to connect multiple apps. You don't have
to connect multiple MCP servers. You can
just add a plugin.
And another thing I wanted to talk
about in the Codex app, and that
we'll show a quick demo for, is
automations. Personally, this is
one of my favorite things to do with
Codex, because you can set up
automations that run in the background,
so like a cron job, and you can connect
apps, you know, you can use plugins
there too, and just set it to run at a
scheduled time. So for example, you
know, you can set an automation to
run every day at a certain time, and
it's just an instruction that Codex will
run in the background.
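
To make the cron-job comparison concrete, here is a minimal Python sketch of what "run every day at a certain time" means for a scheduler. It is not how Codex implements automations: the helper name `next_daily_run` and the 9:00 a.m. default are assumptions made purely for illustration, and it expects naive (timezone-free) datetimes.

```python
from datetime import datetime, time, timedelta


def next_daily_run(now: datetime, run_at: time = time(9, 0)) -> datetime:
    """Return the next moment a once-a-day job should fire.

    Hypothetical helper for illustration only; not part of Codex.
    Expects a naive datetime (no timezone attached).
    """
    candidate = datetime.combine(now.date(), run_at)
    # If today's slot has already passed (or is exactly now), use tomorrow's.
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    print(next_daily_run(datetime(2026, 4, 29, 8, 30)))  # before the slot: same day
    print(next_daily_run(datetime(2026, 4, 29, 9, 30)))  # after the slot: next day
```

A scheduler built on this would sleep until the returned time, run the stored instruction, and then compute the next slot again.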
And the last thing I wanted to show you
with the demo right after is specific
skills for web app and game development,
because we've heard a
lot from developers who want to use
Codex to build these things, to build
apps, to build games, and every time
they kind of repeat themselves,
every time they kind of use the same
skills. So, we actually packaged that
into specific plugins. And there are
two skills that I want to highlight
that are super useful and honestly
are a game changer when you're
developing something visual. The first is
Playwright interactive. And so for those
who don't know, Playwright is essentially
like a headless browser, like a
sandbox browser that
Codex can just run and use
to see what it's doing. So you can open
your app in a browser, and with the
interactive version you can actually
click things and just
navigate your app and take
screenshots and analyze
those screenshots. And then image gen
is a great way to just generate
visual assets for your apps and games.
So enough talking. I'll show you a demo.
Um I'm going to start by actually
running this uh this one because this
one is pretty long. Uh when I ran it
yesterday, it took like an hour to
build. So I also have like the final
version, but I wanted to show you,
with this prompt, how Codex is going
through it. And so what I'm doing here
is I'm using the game studio plugin,
which is again a bundle of a bunch of
skills that are helpful for game
development.
And I'm asking it to use image gen to
create visual assets, so sprites for the
game, and to use Playwright interactive
to also debug the game and make sure
that it works well. So we're going to
let that run and uh then we're going to
talk about plugins for a little bit. So
let me switch to another project here.
Uh
so this developers website one. Okay. So
this one is uh the repo for our
developers website which is
here. Sorry, I'm going to put that in
full screen. And so on our developers
website, we have this page with all of
the Codex meetups we have. So, you know,
there's a lot, and all of that is
actually in our repo, in our
codebase, in YAML files. And so what
I did is I actually added this
Google Drive plugin here. You know, we
have a lot of featured plugins built
by us that you can choose from. You can
also of course add your own plugins. But
I connected this Google Drive plugin
that lets Codex access my Google Drive.
And so what I did is that I prepared
this spreadsheet
called Codex events with the event name,
date, and city. And I'm gonna ask
Codex to just update this sheet with
the current Codex meetups listed in
the codebase. So I'm going to start this.
Again, it's going to take a little
while. And so let's check in on... okay,
the game task is still running.
I'm going to show you when it's doing
some more interesting
things. But the last thing that I
mentioned is automations. And so
automations
are again something that you can just set
up using apps. So you can just ask Codex
anything, but instead of it being
interactive, like you're actually using
the Codex app, you can set it up to run
in the background. So for example, some
of the automations that I set up, that are
honestly helping me a ton in my
day-to-day life: there is one for Slack
messages. So, I connected Codex to Slack
and I'm asking, "Hey, Codex, can you
check every day at 9:00 a.m. the
messages that I should reply to and flag
if it's time sensitive or waiting for an
urgent response? Can you also do a
summary of all the things that have
happened since yesterday on Slack?" And
I'm asking it to bucket that
per topic. And then
important information to be aware of: so
we have important channels where
company information, generally the
things that leak in like one day, so
important company information, is in
there, and so I just want to make sure
that I don't miss anything here. So
that's the kind of stuff that I ask
Codex to just summarize for me. Another
one that is pretty cool is
connecting Gmail, and same thing: I
receive honestly an ungodly amount of
emails per day, and so I'm just asking
Codex to check if there are emails that
I should actually reply to, and to
check, you know, if it's time-sensitive or
if it looks legit or not, because I do
get a lot of requests that are not
necessarily something that
I would reply to. But this is
saving me hours per day. And so
the way you can create automations is
you can create it from here, or you can
also just, you know, say something like:
"Hey Codex, can you create an
automation that will look at Slack
and look for anything that mentions
Codex use cases and then list all of
the important use cases that
I should put on our website?"
So, I'm gonna let Codex think about this
for a second. I should have used Spark.
And it's going to come up with this, you
know, it's going to create the
automation for me basically. And I
didn't specify
um when I wanted to run it, but I can
actually like Oh,
interesting. It's doing something
different because this is a live demo.
So, obviously it wouldn't uh Okay,
normally it will it should like do a
little popup. Uh so I can just like
click on the Oh, it's doing it. Perfect.
It was just very chatty this morning.
Okay.
Interesting.
Interesting. Okay. So, please create the
automation.
So, this it should show a little popup
if everything goes well. But if not, you
can still like create it manually. Uh
let's just see if it is doing it.
Okay.
I don't know what's going on, but okay.
Let's just do it manually. So,
you can also create it from here. And
basically all you have to do is just
call the plugins you want to
use, you know, like "use Slack", and
then choose the frequency
at which the automation should run, which
project it should run in, etc. Okay, so
let's check on our other tasks.
Uh this one is still running. Okay. It
generated some pretty cool sprites.
We'll look at this after. Uh let's check
on our task to update the
spreadsheet. So here Codex took two
minutes to actually analyze the
codebase. It found the source for all of
the Codex events, where we have our YAML
files, and then it wrote the 57 event
rows. So we have 57 events currently
listed on the website. And so let's
see our spreadsheet, and yeah,
we can see that it was updated. Nice. So
this is a
simple example, but every time you have
something that's very time
consuming, anything that has
anything to do with data, data review for
example, you can actually ask Codex to do
it for you. It has access to everything
in your codebase and you can also
feed it other inputs, you know, like other
CSV files, and then you can just ask
Codex to do that type of work for you.
Okay, now last thing, let's check on our
game. So as you can see, Codex is
actually using image gen to generate...
I'm going to zoom out a little bit, so...
oh, nice. So, it's generating all
the sprites, all the game assets that I
asked it to do. And this looks pretty
nice. So it's going to
take a while. What I'm going to do is
I'm actually going to show you the final
results. But as you can see,
Codex is just
generating all of these assets, and
then it's going to use the Playwright
skill to see what that looks like in the
app. So unfortunately we don't have an
hour to wait for the final results. So
let me just show you the one that it did
yesterday. So, this is untouched,
like I haven't touched it. It's
literally just Codex that built this, and
for all of that I gave zero
input. I was just like, "do a platformer
game with platforms made of bricks."
That's it. And yeah, it generated
everything. So granted, the overall
UI is not, you know, I would probably
iterate on that, but I think the
platformer itself is pretty cool, and
what is really cool here is that
literally all the sprites, like here,
you know, I'm just moving all around,
and that's at least
five different sprites of the little
character, and I didn't have to do any of
that. You can also, you know, do a
custom game with your face as input and
have image gen just create a
2D version of you. So that's a way
that you can leverage the image
gen skill, the Playwright interactive
skill and that game studio plugin.
And just to show you what's inside like
we have also the same thing for web
apps, but it's a bundle of like all of
these skills together. Um, so yeah,
that's uh that's it for me. Uh, I'm
gonna
pass it back to VB.
>> Thank you.
>> Thank you. Got
>> All right. Perfect. So just to do
a very quick checkpoint and
a recap on what we've spoken about so
far: we went through all the
models that power the Codex
ecosystem. Then we went through all the
surfaces you can consume Codex from.
And then we went through plugins, how
to use them and what are some of the
plugins that you can use. You can also
create your own plugins using plugin
creator. And then we went
on to speak about automations
and image gen and so on and so
forth. Now something to note is,
as we continue delegating
more and more work to
these agents (it could be any of your
favorite agents, Codex or not),
one thing that you want to be
sure of is whatever it is that your
agent produces is of the utmost quality.
Which means that as we start
working on multiple features at
the same time, multiple projects at the
same time, it's going to be quite
likely that it's impossible for you to
go and look through each and every
line of code, which means that at least
for the first pass you want to have a
way which you can rely on to
review your code, and this is where code
review comes in. By
no means am I bragging about this, but
in my own biased way, Codex code
review is one of the best in the
industry right now. This is something
which people on Twitter and
LinkedIn, on our own
platforms, Discord and so on and so
forth, keep raving about: how is
Codex code review so good? So I
wanted to spend a quick hot minute
on what it does. So first of all,
it is available on the surfaces that
you work on, which means, number one, you
are able to use Codex code review on
GitHub. So you can connect your ChatGPT
account with GitHub, and for each and
every pull request that you create,
you can set it up such that Codex
automatically reviews each and every pull
request, and it would typically give you
some sort
of a callout
like this on the pull request
itself, saying that, hey, this is
something that is missing. Hey, maybe,
you know, P0 fix this, P1 fix that, P2,
you know, this is something that would
be good to have, and so on and so
forth. At the same time, you can use
/review on the Codex CLI
or the Codex app, and Codex will spin up
a large review
process and so on and so forth. And
very recently, last week, with my
colleague Dom, we shipped a Claude Code
plugin for Codex which allows you to
essentially invoke Codex
within your Claude Code sessions to be
able to get the same sort of
state-of-the-art code review, but in your
Claude Code sessions. So something
to see here is, let's say that I
am working on a project like this. By
the way, this is my actual
working setup at work.
Everything that you see here, all
of these threads, all of these projects,
is something which I work on day-to-day.
So if you see something which you
shouldn't, just close your eyes. And
so typically what I would do is I would
go through, you know, a
feature request, or I would go
through some sort of ask
from someone. And let's say
over here I asked Codex to
do a bunch of things. So I'm just going
to ask it to review its changes.
And so then you get an option to
either choose a base branch: if
you have multiple branches in the Git
repo, you can review against a feature
branch, against an eval branch, whatever
it may be, and so on and so forth. In
this case I'm just going to ask it to
review uncommitted changes. And
what it does is if you see
um here,
what it does is it spins off a totally
new thread. And what that thread
would do is it would essentially
spin up a totally new Codex process which
has our own review
system prompt. And it would continue
looking through not just the
diff, or the list of all the changes,
but it would also contextualize it with
everything that is there on
the model repo itself, right? And so a lot
of the time Codex code review
will find changes which
would have second-order effects, which
are not limited to just the diff
or whatever changes you've made, but extend
to some other modules which you
haven't even touched in the pull request
itself or in the changes themselves. And
this is so effective that 100% of
pull requests across all OpenAI repos
made by all employees, including Greg,
are reviewed by Codex code review by
default, and
that's the first pass that you take.
Cool. And so as you can see over here,
Codex worked for a minute and it came up
with these
updates, like P1, you know, localize
whatever revenue detail, P2,
translate this to this, and so on
and so forth. And what you can do
after this is essentially ask
Codex to either take a pass at
fixing this or open another
PR on whichever branch you're at
and then go on from there.
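
The P0/P1/P2 labels are priority buckets, so a first pass over review output usually means ordering findings by label. Here is a minimal Python sketch of that triage step. It assumes findings arrive as text lines that start with a label such as `P1`; that line format and the function name `sort_findings` are assumptions for illustration, not Codex's actual output format.

```python
import re

# Assumed format for illustration: "P<digit> <message>", e.g. "P1 localize the revenue detail".
_FINDING = re.compile(r"^\s*P(\d)\b[\s:.-]*(.*)$")


def sort_findings(lines: list[str]) -> list[tuple[int, str]]:
    """Parse priority-labelled findings and return them most urgent first."""
    findings = []
    for line in lines:
        match = _FINDING.match(line)
        if match:  # lines without a P-label are skipped
            findings.append((int(match.group(1)), match.group(2).strip()))
    # sorted() is stable, so findings with equal priority keep their input order.
    return sorted(findings, key=lambda finding: finding[0])


if __name__ == "__main__":
    review = [
        "P2 translate this label",
        "P1: localize the revenue detail",
    ]
    for priority, message in sort_findings(review):
        print(f"P{priority}: {message}")
```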
Cool.
Now we get to sub agents, which is
something which I'm personally quite
excited about. So first and foremost,
what are sub agents? Sub agents are
essentially the ability wherein
you can split a master task into
decomposable, parallel and independent
tasks which you can hand off to agents,
which allows these
agents to work independently and
then at the end of their run get back to
you and give you a response.
And over here, the sky is literally
the limit: you can spin up as many
agents as you want, of course as long
as your API key or
whatever ChatGPT Pro, Plus or Go
subscription you're on can
take it. You can do a lot of
interesting things with sub agents.
For example, what I'm doing on the
screenshot on the left is: I have a
Codex agents repo, which we're going to
look at in a sec (it's not public yet but
I hope that we'll be able to make it
public very soon), which has a lot of
personas for sub agents that you can
use. So, it's kind of meta. It's
essentially sub agent personas like doc
reviewers or test case
creator or test case runner and so
on and so forth. And every now
and then we would change the
spec. This is from before, when we wanted to
change the spec of how sub agents
work. So what I wanted it to do is to
go through all of these 40 or 50 different
sub agent personas, review them
and make sure that they are up to spec.
And of course doing it without sub agents
would have meant that Codex would
open each and every file and then review
it and then give me a summary and
continue doing it for like 50 different
sub agents. In this case, it
essentially created review slices, which
means it said, you know, these
are the two files that
sub agent Poly or sub agent
Plato should
review, and then they would spin up a new
Codex environment. They would review
those, and then at the end Codex will
collate all of these and
give me back a response. So let's
give this a shot.
Um
So the repo in question is this. It's
just a Codex agents repo which
has a bunch of personas. You can see
that we have quite a few
personas over here. We've got
an accessibility reviewer, architect and
so on and so forth. And this is
actually something
which you can create yourself, and we're
going to touch on that in
just a minute: you can
define your own custom sub agents, right?
But think of this repo as a
collection of these sub agents. And
this is typically what you would have
for each and every sub agent: you
would have a name, you would have a
description, you would have a
sandbox mode,
whether you want it to be write only,
whether you want it to be read only,
and then you would have some sort of
instructions and so on.
And so now what I'm going to do is I'm
going to ask Codex
to, um, I'm going to go over to my Codex
agents. I'm going to switch to
let's do medium over here. Let's close
this.
Can I make this full screen?
All right. Um so let's give it give it a
task. um spin up 20 sub agents to review
all the sub agents.
So this is a very simple task. All I'm
asking Codex to do is the same task
which I was showing before, wherein I
wanted to review all the different sub
agent personas in this repo. And you can
see that it
already figured out that there are
agents and skills and it's looking into
it. There are 45 curated persona files,
and what it's going to do is
it's going to create 20 reviewers
and it's going to give them all
of those TOML files and then they're
going to review those. And you can see
that there's two things which are
quite interesting over here. Number one,
Codex automatically decided that this is
potentially a complex task. So it
automatically kickstarted plan mode,
which is what's active over here. So
you can see that it essentially
came up with five tasks to solve this
particular problem. You can
explicitly invoke plan mode as well, but
in this case it decided to do it on
its own. It's then partitioning
all of these persona files, and then
it's going to spawn 20 sub agents very
soon.
I swear it's faster. Uh,
but um, so now what it's doing is it's
um Oh.
Uh so for some reason on my on my
particular setup I have a cap on six
like six concurrent agent threads that
can be run at the same time. Um we can
fix that. Um but to go back up
what we can see over here is that uh it
at least spun up six agents which is my
limit uh for now. And I can see all of
those agents um you know working over
here. I can quickly see like what Jason
the agent over here is doing uh or Hume
and so on and so forth. And you can see
that uh something to note here is that
the the main codex model over here, hi.
The main codex model over here um
essentially created a persona, right? Um
and and not just that, it doubled down
and it it gave the exact files that this
particular sub agent should review,
right? Um a and and uh additionally it
also gave it some some insight on um
there's there's repo guidance in
README.md, in CONTRIBUTING.md, in skills
and so on and so forth and um it will
sort of continue going down this this
route for all the different sub agents
right um and what it does towards the
end stage is that um it will tear down
all of these sub agents when when they
have gone through um when they have gone
through their whole process of looking
through all the TOML files and so on.
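
[Editor's sketch] The flow described above (partition 45 persona files across 20 reviewers, run at most six at a time, then tear the workers down and collate) can be sketched with a thread pool. This is only an analogy for the orchestration pattern, not how Codex schedules sub agents; `review` is a stand-in for a real reviewer and the file names are made up.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items: list, n_parts: int) -> list:
    """Split items into up to n_parts round-robin chunks, dropping empty ones."""
    chunks = [items[i::n_parts] for i in range(n_parts)]
    return [chunk for chunk in chunks if chunk]


def review(chunk: list) -> list:
    """Stand-in for one reviewer sub agent working through its assigned files."""
    return [f"reviewed {name}" for name in chunk]


def fan_out(files: list, n_reviewers: int = 20, max_concurrent: int = 6) -> list:
    """Run one review per chunk with at most max_concurrent running at once."""
    chunks = partition(files, n_reviewers)
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        results = list(pool.map(review, chunks))
    # Leaving the with-block waits for the workers and shuts them down,
    # which plays the role of the tear-down step.
    return [finding for chunk_result in results for finding in chunk_result]


files = [f"persona_{i:02d}.toml" for i in range(45)]
findings = fan_out(files)
print(len(findings))  # 45, one finding per file
```

With 45 files and 20 reviewers, five reviewers get three files and fifteen get two; the cap of six only limits how many run at the same time, not how many reviewers exist.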
And if I go back to
uh my main thread, um you can see that
two of the agents are are still working.
Um
but eventually like it would collate all
of this feedback that it that it has
gotten from um all of these individual
sub agents and you know proceed. Um now
you can you can think of this this is
like a very simple sort of explorer use
case right but you can think of this
from for example a cyber security
perspective wherein you have um a git
commit or you have a a particular git
repo and you want Codex to spin up and
run multiple uh you know vulnerability
um
sorry one sec you wanted to create
multiple sort of you know vulnerability
analysis from different points of views
or from different hypotheses and you
wanted to sort of tackle the same diff
or the same GitHub repo and try and come
up with like a vulnerability map, right?
And this is something we actually use um
um quite a bit or I personally use quite
a bit when I'm brainstorming a
particular feature. I would just spin up
multiple Codex sub agents to sort of look
through how I would approach a problem.
Right? So let's say I want to add a
feature. I would ask Codex to create a
plan for what are say five or six or 10
different ways that uh that a model um
that a particular feature could be
implemented and then I would quickly
double down on that and ask
Codex to um then create multiple sub
agents to get me some sort of
understanding for um for these tasks.
Sorry, my watch was constantly
vibrating.
Um and and so that's like um that's like
a quick high-level overview of how sub agents
work. Uh by default we ship three sub
agent personas. Um
let me quickly open.
So by default we ship um three personas.
One is like a default general purpose
fallback agent. Another is a worker
which is sort of execution focused. So
this is something that you would use for
um you know when you want Codex to
write a particular feature request uh or
work on a particular feature. Then
there's explorer which is the same one
which we used uh before and and then for
for each of these you can double down
and create your own Codex um sub agent
personas like we saw before and we will
create one right now.
Um something to note is um is that these
particular sub agents um they
like for each of these you can define
what model you want to use. You can
define what reasoning effort do you want
to use you can define what sandbox mode
do you want to use and so on and so
forth. Um the reason why this is
important is for a review agent you
would almost always 100% want to use the
review agent in read-only mode. You would
never want your review agent to execute
anything, right? Um for same reason for
like a cyber security vulnerability u
assignment, you would want your um your
sub agent to always be in read-only mode.
But for a for a um for like a docs
writer or for something which like you
know creates um docs for a particular
feature that you've created or a bug
report and so on you do want to give it
write access so that it can execute
stuff and also create a um create a bug
report for it as well. Um something to
note is that you can also double down
and give these um sub agents you know
more capabilities by giving them uh MCP
access. So you you can just give um you
know let's say you can give a sub agent
MCP access to Sentry so that it can look
through all of your um um all of your
reports over there or like one sub agent
access to your linear um you know
backlog so that it can um it can
interact with linear it can uh read
through all the um uh all the issues
added to you triage them and so on and
so forth. You can also give them skills.
Um so really you can um um if you really
want to you can quite heavily customize
this entire setup for your own um for
your own
use case.
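
[Editor's sketch] The rule of thumb above (review and security agents stay read-only, a docs writer gets write access) can be expressed as a small policy check over a set of personas. The role names, mode strings, and dictionary layout are assumptions for illustration and are not part of Codex.

```python
READ_ONLY = "read-only"
WRITE = "workspace-write"  # assumed name for a write-enabled sandbox

# Hypothetical policy: which sandbox mode each kind of sub agent should use.
REQUIRED_MODE = {
    "review": READ_ONLY,
    "security": READ_ONLY,
    "docs": WRITE,
}


def find_mismatches(personas: list) -> list:
    """Return one message per persona whose sandbox mode breaks the policy."""
    problems = []
    for persona in personas:
        expected = REQUIRED_MODE.get(persona["role"])
        if expected is not None and persona["sandbox_mode"] != expected:
            problems.append(
                f'{persona["name"]}: expected {expected}, '
                f'found {persona["sandbox_mode"]}'
            )
    return problems


personas = [
    {"name": "reviewer", "role": "review", "sandbox_mode": READ_ONLY},
    {"name": "vuln-scanner", "role": "security", "sandbox_mode": WRITE},
    {"name": "docs-writer", "role": "docs", "sandbox_mode": WRITE},
]
for problem in find_mismatches(personas):
    print(problem)
```

Here only the made-up `vuln-scanner` persona is flagged, because it asks for write access while the policy says security agents should be read-only.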
So let's open um our Codex app again.
You can see that it went through all of
these sub agents. It created a bunch of
uh other sub agents just to go through
all of these and uh it came up with
these findings. Um it's like based on
readme based on contributing uh
performance investigator um is
overprivileged um P1 has a sandbox mix
uh sorry verifier has a sandbox mismatch
same for writer and so on and so forth
and so you can see this is already quite
useful um and it saves you quite a bit
of time versus going through all
of these uh individually or sequentially
and so on and so forth. Um now let's go
back and see a bit more about custom sub
agents.
Um so as I mentioned that we ship three
um sub agent personas but at the same
time you can create your own custom sub
agents. In fact we do recommend creating
your own sub agents or just ask your
Codex to look through your past
sessions and create sub agents for you.
Um both of these scenarios work and um
work quite well. So in in this
particular case uh you can see that we
have a PR explorer sub agent which um
reads your um your codebase, uses GPT-5.3
Codex Spark which is our um research
preview model, text only, um deployed on
Cerebras, um and is blazingly fast, so it's
quite fit for this particular use case
and we set sandbox to read only because we
don't want the model to sort of execute
and we give it certain you know
instructions. So in this case we say hey
stay in the exploration mode uh trace
the execution path you know um don't
propose any fixes and just like
you know search through and
figure out like what exactly
do you want us to do? Um
now let's quickly try and
try and um create a sub agent.
So let's say we want to do um docs
researcher.
In this case, what I what I typically do
is to just go
and ask um
hey Codex, can you create this sub agent
um for me? Uh here's here's its persona.
Um, and then let's see.
And so what Codex is going to do because
Codex is aware about um about how it
works and you know uh what it's supposed
to um do and where it's supposed to
place uh all of these things. Uh what
it's going to do is it's going to create
a TOML file for this docs reviewer. And
in this particular case, this is this
uses the docs MCP server which we
created um um from the DX team um which
packages all the API references, all the
docs, all the guides, all the you know
toolkits and so on and so forth and uh
it will add that as an MCP server so
that every time we ask it um ask it a
question about hey like what's the best
way to use GPT-5.4 with
WebSockets or what's the best way to use
GPT Realtime with um um with I don't
know pick your favorite way of using GPT
Realtime and um and can you create a
react plug-in for this and so on and so
forth. Um uh it would be able to
reference all of these things.
So, I'm gonna let it do its thing and in
the meantime,
um, head back over to the slides.
And so, just to go back, sorry, one
second.
Um what you can do just to sort of
invoke um you know a particular sub
agent is you can say um hey can you use the
reviewer sub agent and review each and every
persona based on the developers docs.
So, uh in this case you can you
can essentially like use the same
particular um sub agent uh leverage it
again and then ask it to do the
particular task that you want to do.
Now, what are some like interesting ways
that you can use this is um imagine like
you have like a long build process or
you have a test process. You can have a
sub agent which can run your test case
locally. You can have a sub agent which
can uh always make sure to um oh I'm I'm
being told that I don't have as much
time. Uh um you can have a sub agent
which can pull the latest from uh from
GitHub as soon as you do a pull. You can
have a sub agent which can uh you know
quickly um pull all of the context from
a linear issue and so on and so forth.
So really like you can you can you can
do this for you can leverage this for a
lot of um things and the best thing that
I like to do is to just ask Codex to
look through my past sessions and
recommend me certain automations certain
sub agents and so on and so forth that I
can use.
Cool. So now we're at the at the
bleeding edge. Um this is a bunch of stuff
which we have shipped in the past and um
we haven't really made as much of a
splash about. Um so um what we're going
to do is we're just going to quickly go
uh around and see like what each and
every one of these um uh do and and how
you can leverage them. So first and
foremost is guardian approvals. This is
an experimental feature. You can
activate it today uh by just going on
/experimental.
Um so it would be something like
um
codex
hopefully it works
and then you can look at um experimental
and you can um in my case I already
use guardian approvals and you can
activate it this way. Um what guardian
approval does is um all of us including
myself at some point were um guilty of
using yolo mode all the time which means
that you by default give unfettered um
access to your coding agent to do
literally whatever the hell it wants
right and this by all means and measure
is not safe. Um hence we came up with
something called guardian approval which
for each and every time Codex
needs to run a privileged
task. Let's say it is uh can I remove
this particular directory? Can I run a
server? Can I expose a particular file
to um um to the internet. Whenever all
of these things sort of pop up, what
Codex will do is it will spin up a new
sub agent, right? Which will based on a
particular prompt try and verify whether
or not this is something which needs my
human interruption or not. Um and in
most cases it doesn't need you know
human interruption. So it will just say
hey go on run this particular you know
privileged tool or privileged task and
so on and so forth. And um this way what
we what we hope to do is we hope to
reduce the human fatigue uh that comes
by just you know always sort of having
to approve you know do this task do this
run this particular bash script or run
this and and so on and so forth. In um
in principle, how would that look is
um trying to see if there was
um
Uh,
okay. It doesn't show show it to me
right now, but if I just in the interest
of time, I'm going to ask uh, hey, can
you run the dev server? And I'm going to
instead of full access mode,
uh, which for some reason again I'm not
able to,
uh, click on.
Let's let's try and see um if it if it
invokes um guardian approvals.
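
[Editor's sketch] The guardian approval flow described above can be pictured as a three-step decision: unprivileged commands run, privileged commands go to a guardian check, and only the cases the guardian is unsure about reach the human. This is a conceptual sketch and not Codex's implementation; in the talk the guardian is a sub agent driven by a prompt, whereas `toy_guardian` here is a hard-coded stand-in, and the prefix list and rules are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical list of command prefixes treated as privileged.
PRIVILEGED_PREFIXES = ("rm ", "sudo ", "curl ", "npm run ")


@dataclass
class Verdict:
    needs_human: bool
    reason: str


def is_privileged(command: str) -> bool:
    """True when the command starts with one of the privileged prefixes."""
    return command.startswith(PRIVILEGED_PREFIXES)


def toy_guardian(command: str) -> Verdict:
    """Stand-in for the guardian sub agent, using two made-up rules."""
    if command.startswith("sudo ") or " /" in command:
        return Verdict(True, "needs root or touches an absolute path")
    return Verdict(False, "looks contained to the workspace")


def decide(
    command: str,
    guardian: Callable[[str], Verdict],
    ask_human: Callable[[str, str], bool],
) -> str:
    """Return 'run' or 'blocked' for a command."""
    if not is_privileged(command):
        return "run"
    verdict = guardian(command)
    if not verdict.needs_human:
        return "run"  # guardian approved, no human prompt needed
    return "run" if ask_human(command, verdict.reason) else "blocked"


def always_deny(command: str, reason: str) -> bool:
    """Simulates a human who rejects every escalated request."""
    return False


for cmd in ("ls", "npm run dev", "rm -rf ./build", "rm -rf /etc"):
    print(cmd, "->", decide(cmd, toy_guardian, always_deny))
```

The point of the pattern is that the human callback is reached only for the last command; the other three proceed without an approval prompt, which is the fatigue reduction the talk describes.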
Whilst this this works, I'm going to
head over uh to the next step which is
hooks. Hooks is also something which is
experimental right now. We're we're
trying 24/7 to try and make this uh a
better experience. Uh currently Codex
supports three hooks. One is after each
tool use, one is at the start of a
session and third is at the uh when you
stop a session. What hooks allow you to
do is it allows you to programmatically
ask Codex to do a thing x uh based on a
particular event. So let's say that when
you start your um your Codex session you
want Codex to pull the latest from your
GitHub repo. So in that in in that
particular case you would want to set up
a start hook. Um if you want Codex to do
something after each tool use let's say
um for a lot of researchers who want to
document each and every tool use they
might have like a per tool use hook
wherein they document what Codex has
done uh per session and so on and so
forth. So you can do that with that. Um
and um last but not the least something
which I personally use is the stop hook
which is when I'm running long running
tasks I would um at the end of each turn
uh of Codex I would ask it to keep
going so that like it just continuously
uh you know continuously keeps running a
particular task and um
in in theory how this would look like is
um
is Where is it?
Is
sorry one second.
Wow. I was really prepared for having
more time. Um I have to say but um in
theory how this would look like is um is
that you have some sort of a Python
script um and you have you define like a
hooks.json. So in this particular case
you can see over here that you have a
pre-tool use um you have some sort of a
you know matcher you say like on startup
or resume run this particular session
dot session start py and so on uh and
you can define how you want to uh in
this particular case um so what I did
for for example uh for the sales
dashboard example that I've been showing
you so far is I created a hook for stop
which runs this Python script which is
keep going UI um which is every time it
encounters the stop um um hook, it would
just ask Codex um to keep going, do one
more pass, run one one solid validating
command, type in one more thing, and
then stop and give the result. And so
for really long-running tasks, you can
just set it up and like ask it to
continue doing its own thing. Um
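
[Editor's sketch] The three hooks described above (session start, after each tool use, stop) can be pictured as handlers keyed by event name. The talk does not show the real payload or response format, so the event names, dictionary keys, and the cap on extra passes below are all assumptions; this only illustrates the dispatch idea behind a "keep going" stop hook.

```python
from typing import Optional

# Every name here is hypothetical; the real hook payload format is not shown.
MAX_EXTRA_PASSES = 3  # assumed safety cap so a stop hook cannot loop forever

KEEP_GOING = (
    "Keep going: do one more pass, run one solid validating command, "
    "then stop and give the result."
)

tool_log: list = []  # filled by the after-tool-use handler


def on_session_start(event: dict) -> Optional[str]:
    """Ask for a fresh pull when a session starts or resumes."""
    return "Pull the latest changes from the remote before starting."


def on_after_tool_use(event: dict) -> Optional[str]:
    """Record which tool ran; send nothing back to the agent."""
    tool_log.append(event.get("tool", "unknown"))
    return None


def on_stop(event: dict) -> Optional[str]:
    """Nudge the agent to continue until the assumed cap is reached."""
    if event.get("extra_passes", 0) >= MAX_EXTRA_PASSES:
        return None
    return KEEP_GOING


HANDLERS = {
    "session_start": on_session_start,
    "after_tool_use": on_after_tool_use,
    "stop": on_stop,
}


def dispatch(event: dict) -> Optional[str]:
    """Route an event to its handler; unknown events produce no message."""
    handler = HANDLERS.get(event.get("event", ""))
    return handler(event) if handler else None


print(dispatch({"event": "stop", "extra_passes": 0}))
print(dispatch({"event": "stop", "extra_passes": 3}))
```

The cap is an addition of this sketch rather than something from the talk: without some exit condition, a stop hook that always answers "keep going" would never let the session finish.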
last but not the least um we have
personality changes which means that you
can go on Codex and you can ask it to
um
quickly look at personalization. You can
set up different personalities. You can
set up a more uh friendly personality or
a pragmatic personality based on
whatever you want to do. You can also
add custom instructions. So you can ask
it to always cite whatever it is it is
doing and so on.
Um
right and then last two things um is we
released something called Codex
Security. This is our state-of-the-art
uh model which allows you to find and
fix vulnerabilities in in your GitHub
projects and um you know essentially
what it does is it would go through
commit by commit and um it would create
a vulnerability patch uh and then and it
would use Codex to then sort of patch
the said changes as well.
Um lastly um as I mentioned before we
released a Claude Code plug-in uh which
allows you to use Codex in uh in Claude
Code. Uh this is something which was uh
surprisingly used quite a bit by the
community. Um and this is something
which allows you to sort of ask Codex to
review whatever it is that you've done
so far. Run an adversarial review
or just like ask Codex to rescue
whatever changes you've done so far as
well. Um that's it. Thank you so much
for for joining us and feel free to ask
any questions that you might have.
>> Hi.
>> So, we don't have a lot of time for Q&A.
Unfortunately, we should have started
maybe a little bit earlier, but uh happy
to take maybe a couple questions in the
room and then we'll stay here anyway.
So, if you have questions and you don't
have anywhere to be, you can come to us.
Yeah.
>> Thank you so much. I have a question.
you said a couple of times that there's
like a way to uh scan uh let's scan the
past sessions and basically give your
recommendations for that. How exactly do
I do that like a project with like 20
threads or something like that? How you
want to scan that? Yeah. So what you
typically do is like all of the sessions
within Codex are put in uh dot sessions
within a particular within the same
codex folder and Codex has the ability to
just like scan through all your sessions
and then you know
>> this using the CLI but not
>> I can do this using CLI but not using
the Codex
>> you can use it uh you can use Codex app,
you can use Codex CLI, anything. Um, you
just have to ask it to look through the
sessions and yeah, do whatever you want
to do.
>> Nice.
>> There's another Oh, okay. Maybe a couple
more.
>> Yeah.
In the back here.
>> Hi.
>> Hi.
>> Is there a way to hand off a task to a
cloud agent? So, let's say I'm here
working on a task and I'm I have to
close my laptop. So, I hand off to a cloud
agent.
>> Yes, definitely. We didn't touch on that
but actually uh you can do that from the
Codex app directly like um maybe you can
you can show your screen but you can
either work locally and as you mentioned
you can do it like we support Git
worktrees as well uh but you can also just
select cloud here and you can select the
number of uh times this task should run
like you can parallelize we call that
like best of n so you can like run it
four times in the cloud and then just
pick the best output uh so that's
something that is like built in in the
the Codex app in the IDE extension and
you can also like access it directly
from the the web interface
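
[Editor's sketch] The "best of n" idea mentioned above (run the same task several times in parallel, then pick the best output) can be sketched as follows. In the talk the person picks the winner; here a scoring function stands in for that choice. `run_attempt` is a made-up placeholder that fakes a result from a seed, not a call to any Codex cloud API.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_attempt(seed: int) -> dict:
    """Placeholder for one independent run of the same task."""
    rng = random.Random(seed)  # seeded so each attempt is reproducible
    return {"attempt": seed, "tests_passed": rng.randint(0, 10)}


def best_of_n(n: int, score: Callable[[dict], float]) -> dict:
    """Run n attempts in parallel and return the highest-scoring one."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        attempts = list(pool.map(run_attempt, range(n)))
    return max(attempts, key=score)


def by_tests(attempt: dict) -> float:
    """Score an attempt by how many tests it passed."""
    return attempt["tests_passed"]


best = best_of_n(4, by_tests)
print(best)
```

Running four attempts costs four times the compute of one, so the pattern pays off mainly when attempts vary a lot in quality and the score (or a quick human look) is cheap.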
>> and there's more cool stuff coming on
that very soon.
>> More what?
>> There's more cool stuff coming on that
very very soon.
>> I think there was one right here. Yeah,
>> thank you so much. Um my question was
actually about the cloud UI as well
because um today sub agents aren't
supported if I'm not wrong and uh
especially the thing that bothers me is
it doesn't use the the skills that are
in the repo is that coming soon or
>> so um there's like a at the risk of uh
uh you know talking about the whole road
map uh we we we we definitely have a lot
more changes coming up on that
particular front. Um I'm not sure if
skills within cloud is going to be as
soon as I say that it it's going to be
but u it's definitely at the top of the
mind and we do want to sort of add uh
you know give you the ability to sort of
like have your own trusted MCP servers
to be able to run there or CLIs and so
on. Um and also the ability to just like
have SSH agents u that you can just
spawn off uh a particular task to on a
VM and so on. So lots of work on that.
Like
>> it can use skills in the repo, right?
That that is checked in. It's
>> not on cloud tasks.
>> Yeah.
>> But like if you like it it reads
instructions and stuff and you can like
find it and like still see it since it's
in the codebase. It's more like the the
skills that you have locally that work
the same. The reason why we don't allow
it on on cloud is because there's no way
for um the sandbox to know whether or
not a skill is trusted or not,
>> right? And so that's why we we we don't
and like skill can package like a Python
script or or an execution.
>> It won't execute things, but like if you
have, you know, like things like
resources, it can access it technically
because it is like in the repo. It's
just Yeah, it's not as good.
>> So I have to request it.
>> Yeah.
>> Thank you. Thank you.
>> Were there any other questions?
>> Cool. Have a great day. Enjoy the day.
And uh if you have any other questions,
we're going to be around um today,
tomorrow, and also maybe on Friday. U
feel free to reach out or just like drop
a DM. And um enjoy.
>> Thank you.