Building your own software factory — Eric Zakariasson, Cursor

Channel: aiDotEngineer

Published at: 2026-04-28

YouTube video id: rnDm57Py54A

Source: https://www.youtube.com/watch?v=rnDm57Py54A

Okay, so we're starting five minutes early. Hey everyone, I'm Eric. I'm an engineer at Cursor, and I mostly work on developer experience and product. Today I want to talk about my experience working at Cursor, dogfooding the product, and getting to a place where you can build your own software factory: what that takes, and the practical steps to get there. To be honest, I don't think we're really there yet. Sub-parts of the product and sub-parts of the company are running fairly autonomously, but building a software factory takes a lot of work. Look at real-life factories producing hardware: there are assembly lines, a lot of people that go into it, a lot of managing, observability, and all that. There are a lot of concepts we can borrow from that world and bring into the software world. So anyway, here are my observations from doing this.

First, the agenda. I want to talk about levels of autonomy as a pre-Cursor to the factory (pun intended), then building the factory, running the factory, and scaling the factory. And I want to finish with some Q&A for any questions.
Okay, so for the levels of autonomy: Dan Shapiro put out a blog post, I think in January or February, explaining six different stages of autonomy in automating software. Karpathy has also previously used Cursor as an example of going from tab completion to agents, but I think this framing encapsulates it really well. We have "spicy autocomplete" at the start, which is roughly where Cursor started in '22, '23, ages ago at this point. And we've gradually moved up the ladder, making software creation more autonomous and letting the agents do more work.
I think most people adopting AI tools are somewhere between level two and level three, where you have a pair programmer: you're essentially going back and forth with the agent, asking questions, getting suggestions, asking it to do work, and eventually finishing your tasks. The step above that is having the AI generate the majority of the code, which is developer level three, where you as a human mostly review it, staying in the loop and following traces. As you progress further you become more and more of a manager, and we'll talk about that more later, but level four is where I'm at at this point for most software projects: I delegate as much work as possible to agents and mostly review the outputs before I review the code, because I do still look at the code sometimes.

And lastly we have the software factory, which is essentially a black box. Dan Shapiro calls it the dark factory: you don't really have insight into it, it's just agents going around doing their thing, shipping the code, testing the code, building the code, all that. You as a manager just provide the intent, the instructions, and the goal of what you want out of the factory.
Okay.
So why would you even want to create a factory? First of all, throughput: you probably want to create more code with fewer resources. You can run agents 24/7; you don't have to rely on humans that need sleep and food. You can just have more agents. Another thing about a factory is that it has assembly lines, and assembly lines produce consistent outputs. So if you build your factory right, you can probably get very consistent output. Initially, though, if you don't have the right setup, you might feel like the agents are getting more and more probabilistic and that you're losing a lot of determinism, because they just go off and do random things. That's probably a sign that you need to build more guardrails for the factory. And I think this is a function of model capabilities as well: as the models get better, they follow instructions better and just execute on whatever you want them to do.

And thirdly, you might want a factory because you can leverage your taste better. You can get more of your creativity out, instead of waiting for yourself as a human to produce the software you're creating. And then the obligatory then-and-now: this is what it used to look like, and this is a Tesla factory from a couple of years ago, which is roughly what we're getting after here.
Okay, let's get straight into it. To build a factory, what do you actually need? I like to think of this as primitives and patterns. So: how do you structure the code? Is it a modularized codebase? Is the code scattered all over the place, or is it co-located? This matters because of the distance involved in locating things: if an agent can ls a folder and discover all the relevant files at once, instead of having to grep and search the whole codebase, it can work in a very isolated way within one part of the codebase. The same goes for humans: if you have an easy time onboarding yourself to a new codebase, an agent probably will too.

The second thing is usage patterns. Do you have specific methods and services for authenticating a user? Do you have startup scripts? Do you have a way to write tests, and so on? Do you have this boilerplate in place? Because if you do, you can point the agent at existing references and ask it to reproduce them over time. So those are some of the primitives and structures of the codebase.
The second one is guardrails. You want to let the agents run free, but not too free, so you want some rules and checks and hooks in place. For example, one hook you might want is on touching a specific part of the codebase: maybe the agent should not be able to change the encryption of sensitive data, or authentication, or anything like that, where a mistake could be very, very costly for the company or for you as a human.
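As a rough illustration of that kind of guardrail (a minimal sketch, not Cursor's actual hook API): a check like this could run after an agent turn or before a commit and fail if protected paths were touched. The protected paths and wiring are assumptions.

```typescript
// Hypothetical guardrail: block agent edits to sensitive paths.
// How this is wired into your harness (pre-commit, CI, agent hook) is up to you.
import { execSync } from "node:child_process";

const PROTECTED_PATHS = ["src/auth/", "src/crypto/", "src/billing/"]; // assumed examples

function changedFiles(): string[] {
  return execSync("git diff --name-only HEAD", { encoding: "utf8" })
    .split("\n")
    .filter(Boolean);
}

const violations = changedFiles().filter((file) =>
  PROTECTED_PATHS.some((prefix) => file.startsWith(prefix))
);

if (violations.length > 0) {
  console.error(`Blocked: protected files were touched:\n${violations.join("\n")}`);
  process.exit(1); // non-zero exit fails the check and stops the run
}
```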
Rules are probably the most misunderstood concept since we launched Cursor rules. There's the Cursor Directory, which launched with a good collection of different rules, and the assumption was usually that you should install every rule you can, depending on what software stack you're using: for example, if you're using Next.js, maybe you should have Next.js rules. But what I've found, and what I'm seeing among our users and internally, is that rules should emerge dynamically. If you find agents going off the rails, you should probably create a rule for that, and it should act sort of like an SOP showing the agents what they can and cannot do. And again, the models are getting so good at following specific rules that they usually don't go off the rails anymore, and I think that will just keep extrapolating over time.
And of course tests: can the agent verify its own work? Can it run tests to know "oh, I messed something up," or "I made a change in this specific area of the code, but the tests still pass, I can still run the code, and the checks look good"?
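To make that concrete, a minimal sketch of the kind of unit test an agent could run to check its own work, here with Vitest since the demo project is a JS/TS codebase; the module and function (quantizeToGrid in src/timeline) are hypothetical examples, not from the actual project.

```typescript
// Hypothetical self-check: the agent runs `vitest` after each change.
import { describe, expect, it } from "vitest";
import { quantizeToGrid } from "../src/timeline"; // assumed module under test

describe("quantizeToGrid", () => {
  it("snaps a note to the nearest 16th-note step", () => {
    expect(quantizeToGrid(0.13, 0.25)).toBe(0.25);
  });

  it("leaves already-aligned notes untouched", () => {
    expect(quantizeToGrid(0.5, 0.25)).toBe(0.5);
  });
});
```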
And lastly, which I think is probably the most exciting, are the enablers: what can you allow the agents to do to actually let them be free? Skills are good for this: giving the agents more capabilities, skills and MCPs, access to external context, an understanding of how to implement a certain thing. I'll show you some later in the Cursor codebase. One thing we're doing, for example, is feature flagging: can we give the agents a skill to add a feature flag? Then when we launch them autonomously, they can put the actual changes behind a flag, merge the PR, and come back to us like, "Hey, if you want to try this, just turn on this flag. If you don't like it, we'll just revert the PR. If you like it, we can expand it to more users."
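A minimal sketch of what that flag-gated change could look like; the flag client and names (isFlagEnabled, "new-mixer-panel") are hypothetical placeholders, the point is simply that the agent's change ships dark behind a flag and is trivial to revert or roll out.

```typescript
import { isFlagEnabled } from "./flags"; // assumed internal flag client

function renderLegacyMixerPanel(): string {
  return "legacy-mixer"; // existing behavior
}

function renderNewMixerPanel(): string {
  return "new-mixer"; // the agent's change, off by default
}

export function renderMixerPanel(userId: string): string {
  return isFlagEnabled("new-mixer-panel", userId)
    ? renderNewMixerPanel()
    : renderLegacyMixerPanel();
}
```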
And then, what kind of environment are you letting the agents run in? Can your agents start your dev environment? Can you just ask them, "hey, start my project," and let them do that without any human in the loop? Because if that's the case, you can scale them up essentially infinitely on separate VMs.
And this checklist is what I usually follow when thinking about building the actual factory. Part of that is: is it runnable? (There's a typo in here; I blame my Swedish.) Is the context the agents need accessible? Can they interface with Linear or Notion or Datadog or Slack, and so on, just to understand the broader context and the intent that the user has? And lastly, where I think people should be spending a lot more time, is building verifiable systems: how can the agents themselves verify their own work, whether that's through unit tests or integration tests or UI tests, actually clicking around in the DOM and trying to reproduce what's actually happening for the end user? This is arguably easier for backend systems, where there's no real UI and you can have clear contracts and boundaries for what should work and what shouldn't, whereas for web and UI you actually need to click around and make sure things work, that the buttons actually have a loading spinner, and so on.
Okay, so that's part of building the factory. If we switch over to Cursor here: I'm not sure if you've seen this, but this is Cursor 3. We launched it a couple of weeks ago, and it's a complete rewrite of Cursor; there's no VS Code anymore. Most of you are probably familiar with the older type of Cursor, where you have files and sidebars and a lot of different things, whereas this is a bit more streamlined for an agent-first workflow. We'll get to why we created this at a later point, but I wanted to show you some of the rules and so on. Let's see where I put them.
So for example, I built this music agent project, and if you've used Ableton before you probably recognize this.

>> Yeah.
>> Yeah, I'll expand it more. Good. Okay.

So, if you've used Ableton or any music production software, you probably recognize this interface. Oops, it's not really rendering at this size. What I essentially asked the agent to do here is: can you start a local dev server? And we can see that it worked for a while: it explored some files, read package.json, and found a start script based on that. package.json and all these dependency files are so in-distribution for the models that they know to immediately go to package.json, if it exists in a JS project, to look for a start script. This is a good example of having a predefined pattern and making your codebase more in-distribution in that way, because now it's super easy for the agent to understand: I should just go in here and start the server. So it started a server, and it's running on localhost:3000.
And let's see here: we have this AGENTS.md file. AGENTS.md is like Cursor rules, but it works across many different harnesses. What I wanted to accomplish with this project is essentially building a factory around this idea of an online music creation tool. To do that, I forced myself to never write any code myself, to try not to look at the code that much either, and just figure out what systems and structures I need around it. And it immediately became pretty clear that we need a way to start the project, and we need a way for the agent to verify its own work. So the agent created these end-to-end tests using Playwright, so it can spawn browsers, go to the root, click around using test IDs, and make sure that for every change I make, the play button still works, or I can still add notes to this project, without anything breaking.

So these are some examples of how you can create verifiable outputs like that.
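As a rough sketch of what such an agent-written end-to-end test could look like (the test IDs "play-button", "transport", and "note-grid" are hypothetical, not the actual ones in the project):

```typescript
// Hypothetical Playwright checks: spawn a browser, click around, verify behavior.
import { expect, test } from "@playwright/test";

test("play button still works after changes", async ({ page }) => {
  await page.goto("http://localhost:3000");
  await page.getByTestId("play-button").click();
  await expect(page.getByTestId("transport")).toHaveAttribute("data-playing", "true");
});

test("notes can be added to the grid", async ({ page }) => {
  await page.goto("http://localhost:3000");
  await page.getByTestId("note-grid").click({ position: { x: 40, y: 40 } });
  await expect(page.getByTestId("note")).toHaveCount(1);
});
```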
Okay. We have Vitest, we have this, etc. So, let's see here. If you go back... oh yeah, another option here, casual scrolling on Twitter.

A different way to verify the work is using automated code review. You can ask the agent to just review the changes it made, or you can use a more integrated tool like Bugbot, which we have in Cursor, that looks at different PRs in GitHub, reviews them, and comes back. And this is also one piece of the whole factory: you should have multiple different stages where you plan it, produce it, and review it, essentially following the whole software development lifecycle, but automating and codifying that work.
Let's see here. Yes, I did want to show you this as well. We launched updated cloud agents in the last couple of weeks, where we gave each agent its own separate VM, and you can have them create a very reproducible environment in the cloud, which essentially allows you to scale infinitely. But we also gave the agent a tool to test its own work by controlling the computer. For example, we have Glass here, which is the interface, and I asked the agent something like: "Glass agents are still rough with keyboard control, Tab, etc.; improve accessibility and keyboard navigation of the agents." And I asked it to make the change and then record it with the full editor, because the first recording was just a sidebar. So what we got back here is a video of the agent actually testing its own work. We can see that it has this highlighted row (I'm not sure if you can see that), just some context for me as a human to verify the work, and then it's actually clicking around and using the keyboard to navigate.

So with this we're getting pretty far along in the factory: a lot of things are automated. Review is automated, the testing is automated, and we have some rules to steer the agents.
But there's still a lot more to do. I think once you have this in place, the most important thing you can do is shift your mindset. You are going to look way less at code, so you're going to go from worker to manager: instead of doing the work yourself, you're overseeing a lot of agents doing the work for you. This also means going from sync to async, because most of the work is going to happen in the background. You can still tap in and see what's going on with different agents, but the more agents you spawn over time, the harder it becomes to understand what's going on in each of them, so you need a way to aggregate these changes upwards. And I think it's so interesting that it's just the same as in a human organization; all the same principles apply. You start with a very small team, then you add more and more people because you need more throughput, and all of a sudden you need a manager to oversee things, and then you add more managers, and then you need a manager of the managers. That's essentially what's going to happen with agents too; you're just going to keep moving up the levels of abstraction.
So when you're a manager, you need to start thinking about how you scope and parallelize the work, because you want higher throughput. But some things shouldn't all change at once: for example, if you have two different tasks working on the same part of the codebase, you're going to get merge conflicts. So you still need to plan out, scope, and parallelize the work. One unit of work can always be one agent, so the question becomes: how do you take a long list of things you want to do, make the most of it, and run as many agents as you can? To do this, I think it's important that you preserve tribal knowledge of the codebase: you still understand what's going on in the different systems, you know how data flows, what the users want, which parts are critical and which aren't. So don't outsource too much to the agents; be very direct, and manage them pretty well.

And when you're going from sync to async, you're going to need to trust the agents a lot more, because you're going to send them off on longer and longer tasks. When you do that, you need to give them more context up front, so you front-load the context, either through a plan or a long spec, and then you send them off and let them go. Once you start doing this regularly, you're going to start to get a feel for the agents. You're going to understand the models, see their weaknesses and their strengths, and build an alignment with them, so you know how to prompt them and what intent to give them. And again, as the models keep getting better, you can give them shorter prompts than you used to, but you still have to provide the intent and be very clear about the change you want the agents to make.

And there's no shortcut to this, from what I've found and from what the team has found. You just have to spawn a shitload of agents, let them do the work, and see what happens. As long as you have good safety guardrails, you can just let them do that; you probably shouldn't let them push to prod straight away, though.
>> Sorry, one question. Do you multiply the working environments as well, or do you let all the agents work in parallel on the same development environment?
>> Yeah, so this comes down to... personally, I'm always using isolated environments, in different VMs. I actually just tweeted about this. On one hand, if you're sharing the workspace, you can use git worktrees, where you essentially have different shallow copies of the codebase on the same machine and you can reuse services, but you're still going to have to branch every database or cache or user-management service to get reproducible, separate environments. If you're going to make a lot of changes at once, you want to know that they are pure and don't have side effects on the other branches. That's why I've found it much better to just use cloud agents, where I spawn a VM, and that VM can run a database, internal tooling, other stuff, and the Cursor app itself, and the agent works in that isolated environment. It is more expensive, and it's going to take a lot more work to set up your factory or your environment to support it, but once you have it set up properly, you can scale this to a hundred or a thousand agents. I'm not sure how many we are running today, but I bet it's multiple thousands a day, just agents running in copies of the codebase.
So that's what I would recommend.
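For the shared-workspace alternative mentioned above, a minimal sketch of spinning up one git worktree per agent task on the same machine; the paths and task names are hypothetical, and as noted, shared services like databases still need to be branched separately.

```typescript
// Hypothetical helper: one worktree (own checkout + branch) per agent task.
import { execSync } from "node:child_process";

function createWorktree(task: string): string {
  const branch = `agent/${task}`;
  const dir = `../worktrees/${task}`;
  // Each agent gets its own working copy; databases/caches are NOT isolated by this.
  execSync(`git worktree add ${dir} -b ${branch}`, { stdio: "inherit" });
  return dir;
}

["fix-login-redirect", "add-note-velocity"].forEach((task) => {
  console.log(`agent workspace ready at ${createWorktree(task)}`);
});
```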
Yeah, so when you're a manager, your job changes quite a bit. You have to look at your system as a whole and think about where the human in the loop is needed. For example, do you have a log service like Datadog, where you need to copy-paste the logs into the codebase and run the agents to identify and trace down issues? Do you have user feedback that you need to copy-paste from Twitter into somewhere else and let the agents do something with? Do you have a Notion setup with all your specs, where you need to copy-paste from Notion or export the specs into markdown and feed them to the agents? There's probably a way to automate all of these different things, whether it's skills, MCPs, or separate automations. So think about where a human in the loop is needed and try to automate that away.
The second thing is catching agents going off course, not doing what you actually wanted them to do. This is the perfect flywheel for improving your factory. If you see agents creating wrong schemas in the database because they're not following naming conventions, that's probably a rule somewhere. Or if they're producing really ugly UI, there's probably a way for you to create a design system and make the agents aware of it, so they can incorporate it and use it for the next iteration you do. And then you take all these learnings and use them to actually improve the factory.
And thirdly, it comes to scaling the factory. Now you have your environment set up, you know how to be a manager and manage a fleet of agents, you scope the tasks, and you do all this. So how do you actually take it from five agents to 10 agents to 50 to 100 agents?

The thing is, again, not looking at code is going to be a real thing if the models get better, and they are getting better. So observe the outcomes, much like before: where do they go off the rails, what are they producing, what are the artifacts? And how can you make it so the agents can also verify their own work and the outcomes they produce?

You should set up automations, and you should look again at the things you're doing repetitively. One thing we could do here, for example: if we go to Cursor and open this music agent again, I can ask, "looking at my chat history, what repetitive tasks am I doing?" So we can ask the agent to look at this and identify potential opportunities. It's searching the agent transcripts and producing some kind of artifact from them. We'll see how this goes; I actually built this into a plugin. Let's see here: plan-execution loops, restarting the product, Ableton-like UI iteration (I should probably put that in a rule saying "make it look like Ableton"), tooling, housekeeping, etc. This project is very short-lived, but if you look at an actual production codebase where you've prompted a lot over time, you're probably going to find things you do recurringly.

And I want to show you some of the things we are automating at Cursor. Some of these are not that obvious all the time, but one, for example... let's see here. Oh, not this one.
Let's see here. For example, daily review. I have this automation for my own daily review. It looks at Slack, it looks at GitHub, and it sends me a summary of the things I've done over the last day. Previously I would have done this by writing down notes, thinking about what I got done today, or running an agent with access to MCP, but now I can just put this on a schedule and it does it automatically for me.
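A minimal sketch of that kind of scheduled daily-review automation: pull yesterday's merged PRs from GitHub and post a summary to Slack. The repo, author, webhook URL, and the scheduling itself are assumptions, and the real automation also reads Slack activity.

```typescript
// Hypothetical daily review: merged PRs in the last day -> Slack summary.
const since = new Date(Date.now() - 24 * 60 * 60 * 1000).toISOString().slice(0, 10);
const query = encodeURIComponent(
  `repo:acme/app is:pr is:merged author:eric merged:>=${since}` // assumed repo/author
);

const res = await fetch(`https://api.github.com/search/issues?q=${query}`, {
  headers: { Authorization: `Bearer ${process.env.GITHUB_TOKEN}` },
});
const { items } = (await res.json()) as { items: { title: string; html_url: string }[] };

const summary =
  items.map((pr) => `- ${pr.title} (${pr.html_url})`).join("\n") || "No merged PRs.";

await fetch(process.env.SLACK_WEBHOOK_URL!, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ text: `Daily review:\n${summary}` }),
});
```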
I want to show you a different one, for example: read merged PR comments. This is also a way for you to learn over time. For all the PRs we merge in our main repository, we can look at the comments: what did humans actually review here, and what did they say about the changes I made? Because if a human actually goes in, reviews a PR, and leaves a comment, there's probably high value, high signal, and high intent in that comment, and we can store it for later so the agents can actually learn over time.
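A minimal sketch of that idea: collect the human review comments from a merged PR so they can be stored as context for future agent runs. The repo, PR number, and the storage step are hypothetical placeholders.

```typescript
// Hypothetical collector for human review comments on a PR.
async function collectReviewComments(repo: string, prNumber: number) {
  const res = await fetch(
    `https://api.github.com/repos/${repo}/pulls/${prNumber}/comments`,
    { headers: { Authorization: `Bearer ${process.env.GITHUB_TOKEN}` } }
  );
  const comments = (await res.json()) as {
    user: { login: string };
    path: string;
    body: string;
  }[];

  // Human review comments are high-signal; keep them for future agent context.
  return comments.map((c) => ({ reviewer: c.user.login, file: c.path, note: c.body }));
}

console.log(await collectReviewComments("acme/app", 1234)); // assumed repo and PR
```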
We have another one, which I can show you here... this one. Yeah, the code owners. This one addresses a problem we had: we had code owners in our codebase, and they were right most of the time, like 80% of the time, but for the other 20% they caused a lot of bottlenecks for us internally. We were blocked on merging a PR, we needed someone else to review it for us, and maybe they were in a different time zone. So we started building this agentic code owner, and what it essentially does is look at PRs and check, first of all, what's the risk level? Is it just changing a variable name, or is it changing a constant that controls how long a trial subscription is, or something like that? If it's low risk, it can just approve the PR, because we don't want to block our own engineers on these things. But if we can see that it's a high-risk PR, then we can find out who made changes to this area previously and pull in their feedback, making the most of it: first of all keeping the code safe and not breaking systems, but also keeping the person who made the initial change in the loop, keeping them up to date and refreshing their context on what's going on here. So it goes both ways, and yeah, there are multiple value-adds from doing this.
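A minimal sketch of the triage logic behind that agentic code-owner idea: classify a PR's risk, auto-approve the low-risk ones, and request review from the people who last touched the files for the high-risk ones. The risk rules, paths, and reviewer lookup here are simplified, hypothetical placeholders.

```typescript
type Pr = { files: string[]; additions: number; deletions: number };

const SENSITIVE = [/^src\/billing\//, /^src\/auth\//, /migrations\//]; // assumed examples

function riskLevel(pr: Pr): "low" | "high" {
  const touchesSensitive = pr.files.some((f) => SENSITIVE.some((re) => re.test(f)));
  const isLarge = pr.additions + pr.deletions > 300;
  return touchesSensitive || isLarge ? "high" : "low";
}

// lastAuthors would come from git blame / history in a real setup.
function route(pr: Pr, lastAuthors: (file: string) => string[]): string {
  if (riskLevel(pr) === "low") return "auto-approve"; // don't block engineers
  const reviewers = new Set(pr.files.flatMap(lastAuthors));
  return `request review from: ${[...reviewers].join(", ")}`;
}

console.log(route({ files: ["src/billing/trial.ts"], additions: 12, deletions: 3 }, () => ["alice"]));
```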
Let's see if there's one more... no, I think that was pretty much it. Oh, I do have one more thing, called continual learning. Continual learning is another type of automation that I created a couple of weeks ago as well, and it essentially does what we just did with the agent: we look at the previous transcripts we have and extract memories and learnings from what we said. If I'm correcting the agent to do a certain thing, like "use this component instead of that component," or "always write very verbose descriptions of what you're doing," then instead of going in every time and asking the agent to do this, I could create a rule. But I'm kind of lazy, so I don't always remember to create the rule. Instead, this continual learning plugin looks through the transcripts and stores these as rules for you. So these are all examples of systems for automating yourself away and automating the things the agent can do for you. And I think that's the important part of building these factories: how can you identify the flywheels and loops where you can automate yourself away by building systems?
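A minimal sketch of the continual-learning idea, assuming a simple transcript shape: scan user turns for corrections ("use X instead of Y", "always...", "never...") and persist them as rules. The transcript format, correction patterns, and rules file are assumptions, not the actual plugin.

```typescript
import { appendFileSync } from "node:fs";

type Turn = { role: "user" | "assistant"; text: string };

const CORRECTION_PATTERNS = [/\buse (.+) instead of (.+)/i, /\balways (.+)/i, /\bnever (.+)/i];

function extractLearnings(transcript: Turn[]): string[] {
  return transcript
    .filter((t) => t.role === "user")
    .filter((t) => CORRECTION_PATTERNS.some((re) => re.test(t.text)))
    .map((t) => t.text.trim());
}

const learnings = extractLearnings([
  { role: "user", text: "Use the shared Button component instead of raw <button>." },
  { role: "assistant", text: "Done." },
]);

for (const rule of learnings) {
  appendFileSync("learned-rules.md", `- ${rule}\n`); // persisted for future runs
}
```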
Okay, and yeah, you are going to move up abstractions. Right now you're managing five to ten agents, but tomorrow you might be managing an agent that manages other agents, and that's just going to grow: you're going to have a lot of sub-agents under you, working for you. Cool.
So yeah, what I want you to take away from this: be very clear about the intent, and really think about what the actual problem to solve is and what you want to get out of it. Don't outsource important decisions; make sure you're staying in the loop for those, whether that's safety, security, databases, payments, or authentication. Some things are really important and should not be decided by agents, but by humans.

Build tools and systems: try to find these flywheels, codify them, get them into your systems, and let the agents have access to them. Store context for later, whether that's agent transcripts or artifacts of things you think look good, because that's going to help the agents know what good and bad look like over time, and this is going to change. Storing the context, building the tools, and keeping them up to date is more important than actually doing the work, because it provides the framework and the guardrails for the agents.

And lastly, let the agents be free; think about what they need. I have a friend at Lovable, and he mentioned that he gave the agent a vent tool, hooked up to a Slack channel, so the agents could complain about things while they were running. And the agent started complaining: "Hey, I can't access this image, I'm very frustrated about this," and posted it straight into the Slack channel. They set it up as a joke, but then they started scrolling through it and realized it was actually very valuable: they should probably give the agent access to reading images. They did, and then the agent started complaining about something else that was a problem with the harness. So find ways to let the agents be free; I think that's a very important thing.

Okay, that's kind of it. That's the direction, and the things we've found building Cursor and taking Cursor towards a software factory. I hope you learned a thing or two and can take some of this away. I'm happy to take any questions about anything Cursor. Yeah, and actually now we have the microphones coming here.
>> Thank you very much. I have a question about code quality, or architecture quality. When agents ship tons of code and you can barely review it, how do you ensure the code is extensible and so on? I mean, you can establish hooks or guardrails for measurable things, like "the number of lines in a file should not exceed some limit," but architecture isn't measured that way. And agents have this completion bias: they want to finish the task as soon as possible, and they don't think ahead. They don't have a picture of how the code will evolve in the future; they just want to finish the task now. So, yeah.
>> Thank you.
>> Yeah, it's a good question. I think we as humans have the same problem; it just takes a lot more time for us to discover it. One pattern: the good thing about agents and models being essentially completion machines is that they will look at existing references and just continue down the same path. So if you have existing things you can point them to, I think that's very important. If you don't, there's a case where you let the agents do one-off implementations here and there, and then eventually you have another agent refactoring, like we do as humans: one that generalizes, builds abstractions, and so on. So: how can you build a system to detect this and verify that the abstractions being built are good and in line with what you want to do? But I think it's going to mean a lot more architectural review for humans, and more scoping and planning of what the architecture and system design should look like. But yeah, it's a tough problem.
>> Thank you.
>> Hello Eric, thank you for the talk. When it comes to the activities of building the factory, one thing I observe, for example when building things like rules in a team, is that because it's so new, almost everybody feels "this is a rule for me, and I don't want to inflict it on other people,"
>> and I notice this creates silos where each engineer ends up having their own separate factory. Do you have any advice on how to get to the point where the whole team is contributing to the creation of the factory?
>> It's a great question. I think it's hard, and I think it's very cultural as well. I mean, we developers have always created our own tools, and we want our own custom setups, but at some point we have to unify on a certain structure. Historically we've had PR reviews and those kinds of ceremonies to align on the code being produced and make sure it's consistent. I think we have to take the same principles and apply them to the tools we're building as well: the guardrails and enablers and primitives. So, I don't know, establish some kind of forum where you can discuss these things and plan what you want the factory to look like: what components do we need, what integrations do we need? Do you have any examples of specific things? Is it more about flavor, or bigger changes that the agents are doing?
>> What I notice with rules is that, for example, one person wants to write the test first, so they create a rule to write the test first, but they know that somebody else doesn't want to work that way.
>> So they keep the rule only on their machine, and they don't share it because it's too specific to them. So the whole team is collaborating on creating the codebase,
>> but the collaboration on creating the factory, deciding whether the factory writes the test first or not,
>> that's a big decision that is hard to align everybody on, and you have to accept that not everybody is going to be completely on board with all of these rules, and in most cases it doesn't matter
>> if you defer a little bit, but it is hard.
>> Yeah, I guess it's a human problem and a human change that needs to be made.
>> Yeah.
>> But it's a good question. I'll think a little about it. Thank you.
>> Thanks for the talk. A lot of the patterns resonate. I was wondering what's needed, what kind of patterns you can suggest, to take this to the next level if you work on enterprise, brownfield, mission-critical systems
>> that cannot fail and cannot be insecure. If you look at the recent supply chain attacks, even giving your agents sandboxes is maybe not enough.
>> So the humans remain accountable.
>> Yeah.
>> And we can't say, "Oh, it's not my fault, my agent did that." So do you have any extra patterns, or is it just inherently that we have to keep reading the code, which may feel like reading assembly in the 80s or something?
>> I think if you can spend a lot of compute and tokens up front, before you as a human actually need to be involved, that's a pattern we've found to be pretty successful. So one thing is manually writing tests for very critical parts of the system and then just letting the agents run them a lot. The second part is building automations: our security team built this security sentinel, which is an automation that looks specifically for very specific invariants of the system, and they run about ten of these on certain PRs that change certain files. And then, I think it's a bit contextual as well, but yeah: spend a lot of tokens up front, try to find different variants, and almost red-team it.
>> So one thing I did is, instead of focusing on velocity and throughput,
>> I focused on quality.
>> Sorry, what?
>> I used AI to focus on quality
>> and just improve the tests, to make it completely AI-ready.
>> Yeah, I think that's very good, because if you as a human trust the tests, you can probably trust the output even without looking at the code. And that's kind of where we're going, I think.
>> Hi. So, thanks for a great presentation. I find myself lacking and slacking in using guardrails, especially rules and hooks,
partly because historically the knowledge of how to do that properly was very scattered and decentralized across the whole web. You would have these exotic GitHub repos that tried to centralize the knowledge, or maybe some Medium articles, or maybe Cursor the company would do a blog post on it, but it was always evolving, and the capabilities of the models themselves, especially on instruction following, are also evolving and getting better. It always felt like duct-taping to me. So I'm wondering: can we have AI help us with that? Meaning, could Cursor, for example, give us proactive agents, or maybe some new setup or wizard-style flows, where we identify our workflow and then the AI helps us build the rules and guardrails and all those rule artifacts
for us? So maybe just a proactive agent.
So maybe we would have an agent that scans our workflow globally
>> and then helps us build those artifacts. What do you think about it? Do you think about this in the company? Maybe you're working on it?
>> Yeah, totally. I think right now there are two places where you can do this. One is in the product itself, with the continual learning plugin. Let's see here... oh, I don't have it installed; we can go to the marketplace. Yeah, with the continual learning plugin you can actually look at your transcripts and extract rules and memories and all that. That's one way to do it. Then there's another world where you change the weights of the model depending on what your codebase looks like and what your engineers in a specific team are doing, and you reconcile them. That's true continual learning, not this hacky plugin, where you actually bake it into the model so it knows what your preferences are. But totally: memory and rules and all that, I think that's going to become more and more important over time, because that's kind of what's lacking. That's what's preventing me from having a lot of trust in the agents sometimes: I say something and they forget about it, because they're just stateless machines. So how do we capture this knowledge? I think we should put a lot more time and effort into building these systems.
>> If I might just follow up on that. You seem to first start a project, or dive into a project that already has an existing codebase, and then build rules on top of that. What about when we first have rules and want to start a new codebase, a new project?
>> How do we actually get
>> good rules for that? Do you think humans should still do that, or can we automate that too? Are there any new best workflows for that?
>> I think it's hard, because my perspective on rules is that they're the bridge between the model behavior and the human behavior: how do we steer the models so they follow what I as a human want to do? And in a new product, I'm not really sure what I want to do. I kind of want to outsource that to the model: see what it does, run different models, decide whether I want to combine them or scrap everything. So I think it's hard. The best example of a rule that I can think of internally is for Bugbot. When we're doing database migrations, we're not really using foreign keys in the database, for performance reasons. And for the models, the "right" way to do this is to use foreign keys, right? So they will always add a foreign key. But when it hits GitHub and there's a PR created, we have Bugbot looking at it as a reviewer, saying, "oh, I have this rule saying we should never use foreign keys," and it flags it. So that's the gap between the human and the model, the desired intent we have versus what they have. So I think rules should emerge dynamically over time, and before that you should probably just use ephemeral specs and plans. Oh yeah, there's one over there. Oh yeah.
>> It doesn't work... it worked. So thank you, Eric, for the talk. Since evaluation and trust is a big point, I'd like to know how you effectively do GUI testing and automated user acceptance testing.
>> If you can show something of your workflow.
>> Totally. The main way I do it is using... let's see here. Oh yeah, I have this one, for example. The main way I do it is using the Cursor cloud agent with the computer use that we have. So I'm going to publish this... oh no, that's bad. I guess we're not doing that. I have this website where I have seven components, like a button, a dropdown, etc., and I'm generating each of these components with a different model, because I want to compare what the Composer dropdown looks like versus the GPT-5 dropdown, and I put them in a grid. But when I created this website there was an error: I had this "view code" button so I could actually see the generated code, and it wasn't working because the model didn't bundle the actual code. So I went to Cursor and wrote: when clicking "view code" on a component, it says it cannot load. That's a very short description. What the agent did, as you can see here, is spawn my local server, start clicking around and pressing enter (we can see the cursor up here), and it creates this Screen Studio-esque recording where it's chopping and speeding up and zooming in. It takes a while here because computer use is fairly slow and it consumes a lot of tokens, but we can see we have this view code button, and now we can actually see it's working too. Since this is very much a side project for me, I'm not really going to look at the code; I'm just going to see that this works and merge it. But you can keep prompting the model to do very specific things for you, like following specific instructions, for example a login flow: you should click the button, you should log in. Login steps are probably so much in distribution that you can just prompt the model with "go to this URL and log in" and it will understand which steps it needs to take. But then you can also ask the model to input a wrong password or a wrong email and see what the results from the website are, and maybe the website gives a "wrong credentials" error, and then the agent understands it needs to put in the right credentials. So just like you would hire a QA consultant and give them instructions, you give the same instructions to the agent. That's one way to do it. The other way would be more Playwright or Puppeteer, just automating a browser, which is a bit more deterministic, since you can review it, check it in, and have other people reuse it. Does that answer the question?
My question was going more into user acceptance testing, checking whether this thing actually looks right, because testing a login you can automate; you don't need an agent for that.
>> But does the website look right? Is it consistent across all the pages that are generated, stuff like that?
>> Yeah, I use cloud agents for that a lot. There was one, I can't remember now, but I think I made some changes in the docs and I just asked it to open every single instance where a certain word is referenced, take a screenshot, and give it back to me. Then I could just look at all the different screenshots, see that everything looked good, and merge the code. So letting the agents do the navigation, the clicking around, and the testing for you works surprisingly well. This was very much an AGI moment for me when we launched it internally sometime last year. Have you had a chance to try cloud agents in Cursor?
>> You should. I'd be curious to get your feedback.
>> I know you have...
Uh, which one? This one.
>> The agent, like, spawning all of this and giving us the...
>> What was the initial question? How long it took, or...
>> Yeah. No, I see. But how expensive would it be for, like...
>> Ah, very straightforward. For this one I did no specific setup. For our own repository, when running Cursor, we can actually reproduce everything: this demo here is running all the backend services for Cursor and all the frontend things, which is a lot of different stuff, so the VM is quite beefy. But as long as you give the right instructions it works really well. What we did was create an internal CLI that the agent can use; we call it something like "cursor dev tool": cursor dev tool backend start, cursor dev tool frontend start. That abstracts away everything that actually needs to happen, like going through OrbStack to run ClickHouse and Postgres and Redis, and the frontend running Electron and then Glass here, with the two processes coexisting. The agent has access to everything, just as a human would, and you can have the agent be authenticated if you store a snapshot where you're authenticated, etc.
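A minimal sketch of that internal dev-tool CLI idea: one command that hides how the backend and frontend actually come up, so an agent can just run "backend start". The service list and underlying commands are hypothetical, not Cursor's actual tooling.

```typescript
// Hypothetical "devtool" wrapper: `devtool backend start`, `devtool frontend start`.
import { spawn } from "node:child_process";

const COMMANDS: Record<string, string[]> = {
  "backend:start": ["docker", "compose", "up", "postgres", "clickhouse", "redis", "api"],
  "frontend:start": ["npm", "run", "dev"],
};

const target = `${process.argv[2]}:${process.argv[3]}`; // e.g. "backend start"
const cmd = COMMANDS[target];
if (!cmd) {
  console.error(`Unknown command: ${target}`);
  process.exit(1);
}
spawn(cmd[0], cmd.slice(1), { stdio: "inherit" }); // agent sees the same output a human would
```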
>> Yeah, I meant expensive in dollars, like...
>> Oh, sorry, sorry. Okay, okay.
>> My bad. Yeah, for this one I don't really have the number; I could probably look it up. I would guess it's like $1, something like that.
>> Just for one turn?
>> For one turn, this initial one would probably be around $1. The other ones, I just asked it to re-record a bunch of different things.
>> Something like that; I can look it up later. Totally.
>> I jump back and forth between...
Yeah, and I guess it depends on which model you're using too.
>> Okay.
>> Okay, my question is about handoff between humans and agents when you're using different tools. In my current setup, I have a product owner and a functional analyst who work in Claude Code and prototype very fast, basically without thinking much about the backend, the architectural choices, or whatever, and then they pass control down to the delivery team, which uses Cursor and has to make that stuff actually work. Which best practices do you suggest for enforcing a proper workflow between people who basically don't know what they're doing
>> from a technical point of view, of course, and the people who need to take that thing, which maybe has some poor choices (say, use that database, or then Claude changed its mind and moved from Supabase to Turso to some other fancy database that happens to be in that environment), and bring it into some sound architectural choices, moving from Claude Code to
>> Cursor.
I think what we're doing internally is that we have one or two PMs,
>> and they are building a lot of different prototypes. Sometimes it's actually in the real product itself: they're using maybe cloud agents, just prompting them, and they get back a video like this of the changes, and they see "oh, it kind of looks like what I wanted," and then they tweak the designs a bit. But the code might be really bad, or not following best practices (which, if we had a good factory, it probably would). If that's the case, we hand off a link to the cloud agent: we just copy the link and send it to the engineers, like, "hey, this is something we want to build; does this make sense? Can we do this?"
And then you already have a lot of intent expressed. The other case is the PMs having a separate repo called "prototypes," and it's just an HTML file, like a mega HTML file, reproducing the Cursor UI or the dashboard.
>> Yeah, the problem is the migration. Just as a practical use case: I had my PO and functional team build a very fancy demo using Prisma and Turso and
>> whatever database, storing data on Vercel blob storage, and then my delivery team had to migrate that to SQL Server and C# and Aspire for the backend, and the migration was really painful. Also, when they used the agent freely with no constraints, the agent sometimes decided to use Next.js, other times Vite, another time something else, and putting constraints in the form of rules within that agent shaped that down the path. But the problem is that we need to write a lot of rules and make them consistent, and it's not easy to manage the whole workflow. So we're shifting a lot of effort from having people write code to having people write guardrails and rules and whatnot, and making all the pieces talk to each other.
>> I see. Yeah. I guess if the POs and PMs can't have access to the actual codebase, then handing off an artifact is the minimum viable intent. That could be something interactive: back in the day it used to be Figma prototypes, right? You could click around and get a feel for it. Now you can go even higher fidelity, with an interactive prototype using web technology, without touching anything on the backend. It doesn't have to be a real working thing if it's just an internal prototype, but it should be enough that your engineers can understand, "oh, this is the intended behavior: if I click this thing, that should happen, or if I enter some text here and click send, a row should show up here." And I think all of that can just be done in the frontend, kind of like a hackathon.
>> So you don't need to migrate the prototype into something that becomes production, but rather rewrite it.
>> Yeah, I think so. I think rewriting, and setting clear expectations from the engineers to the PMs and POs about what engineers actually want from the product organization and what's most helpful for them. So maybe vibe coding complete SaaS products is not the most efficient thing.
>> Eric, thank you for that presentation. My question: as we're building more and more agents and they become part of our time-critical processes, how do you see brownouts and blackouts as a new risk, and what's your view on how that can be mitigated and the impact reduced?
>> Yeah, it's a really good question. I think it comes down to what we talked about: the humans are still accountable for the things being shipped, so the humans need to build systems, observability, and monitoring around the changes being made. And I think it still comes down to understanding which areas of the codebase are system-critical, and making sure you have good observability and understanding of everything that goes on. Maybe every line should be written by a human in those critical areas, or at least always reviewed by one or two people. And yeah, it's easy to vibe code your way too close to the sun. So I think it's also a cultural thing, where you have to make sure the humans are still accountable for the things getting shipped. But yeah: set up good systems to understand the changes being made. I think that's important. And tests.
>> Hi Eric, thanks a lot for the talk. I'm assuming you're probably one of the people in the world with the best understanding of how to use these technologies, so this question takes a step back from the technology and asks about process: how do you manage yourself in your work day? How long are these tasks, or how long do you get to be away from your agents without babysitting them? And how do you actually invest that time: say you have five, ten, fifteen minutes, how do you make the best of it? And maybe, how many agents do you have in parallel, like mental processes, and how do you manage yourself? Thanks.
>> Yeah, it's a great question. There are two levers to pull. One is the scope of the change: the larger the scope, the longer the agents are going to run, and if you want them to run for a really long time, you want a verifiable system so they can check their own work. The other is how much you can parallelize: how many of these agents can you spawn off? And I think the sad reality, in some sense, is that there's going to be a lot of context switching. I probably work in four different repos, or four different areas of the codebase, at the same time, whether that's a single feature that requires frontend, backend, database, testing, yada yada, or five completely different things: it could be docs, side projects I'm exploring, fixing a bug from a Twitter user. But I usually have somewhere around five to ten agents, say five agents, running asynchronously in the cloud at all times. And while I'm waiting for these, I'm either scrolling Twitter (it's true; we also have the browser in Cursor now, so I can just stay in here and do it), or I have a synchronous task going where I'm going back and forth a bit. Maybe that's fixing a small thing in the codebase, or maybe it's planning the next thing: maybe I'm sourcing Notion and Slack and creating a spec in Cursor using a model. I love to plan synchronously and then execute the plans asynchronously, and once that's done, one of my cloud agents is probably done as well, so I can come back, review it, keep prompting it a bit, maybe merge. And some parts I still need to test manually: maybe I need to download a copy of Glass or Cursor 3, test it manually, decide this looks good to me, and go ahead and merge.
A quick question: this factory building leaves us with a scattered ecosystem of a lot of markdown files. Is there an easy way to organize these files and keep an overview of the factory you have actually built? Maintaining a factory requires an overview of the processes you want your coding agents to go through. What tools do you use, what methods do you recommend, how do you keep a mental map of the factory you have built, and how do you maintain it?
>> Yeah, it's a really good question. I think it's somewhat unsolved as well. One of the reasons we rebuilt Cursor to look like this, instead of like the traditional IDE, is the fact that we are using more agents and we need a better control panel where you can see all the agents, manage them, spawn them, and so on. So what's going to happen with Cursor 3? This is the first stab at multi-agent orchestration. What's going to happen is that these are going to be nested agents, so you're going to open this one up and have ten agents in here, and you can still introspect them, see what's going on, and follow the traces. But you're probably also going to have, somewhere here, some kind of project view where you can see an aggregated status update: here's what everyone is working on, here's the latest, here's what you as a human need to review. So I think these are product things that we're going to build into Cursor. But to set the spec for the factory, I would probably have a folder in your codebase where you outline how certain things should work. Maybe that's just markdown files saying here are some best practices, maybe it's rules. And establish some kind of council to decide on what goes into the factory and what doesn't, and what we are lacking to improve the factory. So as long as it's something the agent can understand and read, which is files, that's probably what I would do, and just store them in your codebase, checked in somewhere.
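As a rough illustration of that idea, a checked-in factory-spec folder could look something like the sketch below; the folder and file names here are hypothetical examples, not Cursor's actual conventions.

```
factory/
  README.md                  # what the factory is for and who the "council" is
  best-practices.md          # general guidance agents should follow
  rules/
    frontend.md              # e.g. "use the shared design-system components"
    backend.md               # e.g. "every new endpoint needs an integration test"
  assembly-lines/
    stale-feature-flags.md   # spec for the flag-cleanup automation
    slack-triage.md          # spec for triaging bug reports from Slack
```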
>> Thank you. Um, I'm just thinking about teams of the future. So, you know, a year or two ago it was very reasonable to have an engineering team of several hundred people, several thousand people. What does this do to that, and what roles are there in an engineering team? This is kind of akin to becoming somewhere between a product manager and an architect. So what roles do engineers have?
>> Yeah, I think that's very accurate. It's hard to predict the second, third, fourth order effects of this happening. It's definitely writing less code, looking at less code, spawning more agents. And then there's the question of: we're still building software for humans, mostly, so how do we know what other humans want? How do we talk to our customers? How do we market what we're building? How do we do all these things and bring them into the actual factory? Who sets the direction? What's the intent? All these things come from somewhere, either creativity from someone's head or actual user demand. So having someone doing that is going to be very important. Having someone aligning that between the different humans in the org is going to be important. And having people building the scaffolding for the other agents, building the assembly lines where the agents can actually run, I think is also going to be important. But to what magnitude, and how many people that's going to be, I don't know; it's really hard. You can do a lot with the models right now with a very small team if you have the right setup in place, depending on the domain you're working in. I don't know, do you have any predictions?
I see issues from a labor perspective. If you're working in an incredibly agentic environment, what happens to training new grads, hiring new grads, and the future from that perspective? What happens with office politics and land grabbing, right? Because
>> basically your value now becomes your ability to configure and set up your own agentic team, not your ability to program and be productive anymore.
>> The 10x engineer is no longer about, you know, words per minute. It's like prompting.
>> Yeah.
>> Yeah, token usage.
>> Yeah. Am I paid in tokens? Am I
>> Yeah, a leaderboard, you know.
>> Um
>> Gotta be token maxing.
>> Am I paid an amount, and then my token usage takes away from that?
>> You know, how do you optimize for
>> We've got to train the models to be more political. I think that's the solution, right?
>> We need more water cooler talk.
>> I guess we're going to get more of that if the agents are doing our work.
>> Hi, Eric.
>> Hey.
>> Thanks for your talk. Um, I was wondering: you are probably using some kind of issue tracking tool at Cursor, like Atlassian or Jira. Are you using agents to automatically check and read tasks directly from Jira, for example, and spawn sub-agents to perform the work, or is there always a human that starts the work using Cursor?
>> So we're using Linear for issue management, and we have a first-party integration as well. For every ticket that gets created in Linear, we spawn a cloud agent. One place where I interface with this the most is feature flags: if a flag has been rolled out at 100% for two weeks, the system signals us, hey, this is a stale feature flag, you can remove it. It then creates an issue in Linear automatically, and since Linear is hooked up with Cursor, that triggers a cloud agent to remove the feature flag. So it's completely automatic once the system knows it's rolled out to everyone, and I can just look at the code and say, okay, we can merge this, the feature is no longer active. And we do this for everything. Once you post something in Slack, we either have a Linear Slack agent look at it, or we have a Cursor automation look at the message that was posted, triage it, look for duplicates, or, if it's determined to be easy, start implementing a fix for it immediately. And this is an example of where a human is in the loop where it might not have to be. It could be me going on Twitter, seeing a tweet that something is broken with the plan mode button dropdown, copying that into Slack, and having the agent perform the work. But there's probably a way we can source this feedback immediately without me having to scan it, triage it, and copy-paste it. So that's a bit of how we work with Linear and issue management.
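A minimal sketch of the stale-flag part of that pipeline, just to make the shape concrete: the flag store, the threshold, and the create_issue helper below are hypothetical stand-ins, not Cursor's or Linear's actual APIs. The idea is simply to detect flags that have been fully rolled out past a threshold, file a tracker issue, and let the issue-to-agent integration pick it up from there.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(days=14)  # "rolled out at 100% for two weeks"

def find_stale_flags(flags):
    """Yield flags that have been at 100% rollout for longer than STALE_AFTER.

    `flags` is assumed to be an iterable of dicts like
    {"name": str, "rollout": float, "fully_rolled_out_at": tz-aware datetime}.
    """
    now = datetime.now(timezone.utc)
    for flag in flags:
        rolled_out_at = flag.get("fully_rolled_out_at")
        if flag["rollout"] >= 1.0 and rolled_out_at and now - rolled_out_at > STALE_AFTER:
            yield flag

def file_cleanup_issues(flags, create_issue):
    """File one tracker issue per stale flag via the injected create_issue callable.

    In the setup described above, creating the issue is what triggers a cloud
    agent (through the tracker integration) to open a change removing the flag.
    """
    for flag in find_stale_flags(flags):
        create_issue(
            title=f"Remove stale feature flag: {flag['name']}",
            description=(
                f"`{flag['name']}` has been rolled out at 100% since "
                f"{flag['fully_rolled_out_at']:%Y-%m-%d}. It can be removed from the codebase."
            ),
        )
```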
But yeah, since we're spawning a cloud agent for every single thing, it provides a good way for us to dogfood the product and test it out. I'm not sure I would recommend that for everyone, though, because it can be quite costly.
>> As cloud agents are a little expensive, do you have something on the roadmap to run things locally? I'm just thinking of an alternative like dev containers and opening in that. Do you have something planned on the roadmap for that?
>> Um, I think the closest thing you can do is probably just prompt the agent to run for a really long time. It's kind of like the same thing with running local models. I've tried it, probably once a month, running the best open source local model and seeing how it works in Cursor, but it's never the same experience as running, like, GPT or Claude or Composer. And it's the same thing with running really long things locally. I found it doesn't work that well, because if it's running for a long time it's probably going to use your local database and your other local stuff, and it's going to prevent you from doing other work locally unless you create a VM on your own machine. And if you do, you could probably...
Wait, never mind, just ignore everything I said. We launched Cursor workers. So a Cursor worker, which we launched yesterday, is a way for you to run the same infrastructure and orchestration layer as we do for cloud agents, but on any machine you might have. Um, let me see if we can do a quick demo. So we have the agent CLI, and there's now a worker, and you can call worker start. So from here we have a worker running, and this worker is going to show up in here. Let's see, so we can do self-hosted. Let's see here. Oh, I don't think it's hooked up yet, it's a different account I'm running it on, but essentially you can run this on any kind of machine and you can get access to it from Cursor cloud. So you can spawn multiple of these on your own machine, or you can run a Mac mini, or you can have a VM with any cloud platform provider.
>> Right. Um, just to follow up on that. So you are saying that we can have isolated environments locally using this command.
>> Yeah. So it's
>> It still calls the frontier models or the Composer models?
>> Yes, exactly. So this is going to leverage the Cursor harness, but it's going to run wherever you're spawning this daemon.
>> Yeah. That's interesting. Thank you.
>> So I built this Cursor claw thing, where I have one running on my Mac mini, and that has access to iMessage and calendar and all these kinds of other things. And yesterday we launched automations as well, so I can get a daily report or weekly report of everything that's going on on my machine that I might want to know about, on a specific cadence. And since it's running the agent daemon, you'll get access to this in Slack, on the web, and in the mobile app that's coming at some point not too far out.
>> Sorry, what?
>> For iPads. A lot of the time people wanted to
>> It's going to use SwiftUI, so it's probably going to be compatible with iPad as well.
>> I think the two, iOS and iPadOS, are two different things actually.
>> Got it. Yeah. Nice. Yeah.
>> I just want to ask quite a simple question. When you obviously have more than one developer in your company and you're spawning hundreds and hundreds of agents to do a lot of different kinds of work, how do you ensure you don't step on each other's toes doing the same kind of work? And how are you running things internally: do you still use Scrum or agile ways of working, or has even that already gone out of the window?
>> Um, yeah, what are we doing? We're not really following any traditional methodologies in that sense. We do have monthly goals of things we want to get shipped. But I think since everyone has so much power at their fingertips with agents, it causes people to take extreme ownership over certain things. For the longest time there was one guy building MCP and rules and all kinds of extensibility by himself. Now we have maybe one person focusing on MCP, but they can own everything around MCP and they don't really need to interact that much with other teams. At some point that's going to break too. So far in the history of Cursor we have found ways to work around this. The agentic code owner thing was probably one place where we stepped on each other's toes: the code owners were misconfigured, so instead of having a deterministic thing, can we just pull in the relevant people at the relevant time? Something like that is probably going to happen with other problems that we're going to surface in the future.
>> Thank you.
>> One question about the self-hosted agents. Do we get all the goodies there that we get with the cloud ones?
>> I think computer use is the one thing that's still in early access. I don't think we've shipped the GA yet, but it's coming for sure. So this should be completely on par with the cloud.
Yeah.
>> Can you describe the profile of these people that take this ownership, kind of like a mix between product managers and engineers?
>> Yeah. So I guess the archetypes we have: there's the PM. They talk a lot internally at Cursor, they talk with go-to-market, with sales, they talk with engineers, they talk with users. They just product manage and keep everything together, and also shield engineers from various things. Then we have designers. Designers work, I would say, 50/50 in Figma and code at this point. All of them code, all of them push to production. But it's a lot of exploratory work, like what does it look like when you have ten nested sub-agents? You can't really feel that in Figma; you've got to actually develop and prototype it. And they work with PMs. Then we have engineers, of course. I think Cursor is very fortunate to be building a developer product, so developers are building a developer product, and they have good taste. They know what good and bad look like, they know what developers want and don't want. And I think because of that they can take such ownership, run with a concept, and go really, really far. Whereas the PM might be setting more of the business and the overall overarching direction, the engineers and designers collaborate on what this actually looks like in code, but also how it should feel and how it should look for a developer.
>> Makes sense. Are there analysts in this mix as well, or is that done by the product managers?
>> Oh yeah, that's a good one. We have a data team as well, data scientists and analysts. They're also working closely with PMs, of course, understanding how users are using the product and where the bottlenecks are, but also with engineers, instrumenting the code in the right way, understanding feature flags and why certain users hit certain paths and some don't. So everyone is just working together. I think the way we've structured teams is pretty much by domain: extensibility might be one team, cloud might be one team, and cloud should still be extensible, so then they have to work together too. But we try to keep it modularized and not ship our organization too much.
>> Thanks.
>> Cool. I guess one final question, if there's one.
>> So, from time to time I've messed up and started a cloud agent in the wrong repo or something, where it just went out on a tangent, and I came back an hour later and it was desperately trying to get access to that repo. Is there any way to catch these agents that just don't provide any value? They just continue doing stuff but they're not really making progress.
>> Yeah, I think that's on us, for sure. Over the last year we have made a lot of improvements to the cloud agents. Initially, when they worked, they were extremely useful, but most of the time they weren't. Again, cloud agents also come from this internal need of wanting to run things asynchronously, and because of that we have put a lot of effort into making our own codebase work really well with cloud agents. So we sometimes have to create new projects, jump into other products, and talk to our customers to understand where these things fall short. And we try to have instrumentation: does the agent run for x amount of hours or minutes, does it touch any files at all, is it going in circles, loop detection, these kinds of things. This is part of the observability I was talking about before.
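A minimal sketch of what such run-health checks could look like, under stated assumptions: the signals and thresholds below are hypothetical illustrations, not Cursor's actual detection logic. It flags a run that has gone a long time without touching any files, and a run whose recent tool calls keep repeating.

```python
from collections import Counter
from datetime import timedelta

# hypothetical thresholds, not Cursor's actual values
MAX_IDLE = timedelta(minutes=30)   # time without touching a file before we flag the run
LOOP_WINDOW = 20                   # how many recent tool calls to inspect
LOOP_REPEAT_RATIO = 0.5            # if one call dominates the window, suspect a loop

def run_health(elapsed, files_touched, recent_tool_calls):
    """Return a list of warnings for a single agent run.

    elapsed: timedelta since the run started
    files_touched: number of files the agent has edited so far
    recent_tool_calls: list of (tool_name, args_fingerprint) tuples, newest last
    """
    warnings = []
    # stalled: running for a while but never touched a file (e.g. missing repo access)
    if files_touched == 0 and elapsed > MAX_IDLE:
        warnings.append(f"no files touched after {elapsed}; the agent may be stuck")
    # looping: one tool call dominating the recent window
    window = recent_tool_calls[-LOOP_WINDOW:]
    if len(window) == LOOP_WINDOW:
        call, count = Counter(window).most_common(1)[0]
        if count / LOOP_WINDOW >= LOOP_REPEAT_RATIO:
            warnings.append(
                f"possible loop: {call[0]} repeated {count} times in the last {LOOP_WINDOW} calls"
            )
    return warnings
```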
Most of that should happen on our side, but there are always going to be very specific contextual things where you, as the codebase owner, need to set certain things up. But yeah, we're working on improving it, and if you have any examples, please come to me and I'll try to take a look.
>> I think the worst was when I started it on one repo and it just called out to the Slack MCP and tried to get access in ten different ways, and it failed.
>> Yeah. Yeah. We could make that better.
>> Good that you're working on it.
>> All right. Thanks, everyone, for coming. I'll be around for the next two days as well, so please grab me if you want to discuss anything Cursor, or anything at all, actually.