Building your own software factory — Eric Zakariasson, Cursor
Channel: aiDotEngineer
Published at: 2026-04-28
YouTube video id: rnDm57Py54A
Source: https://www.youtube.com/watch?v=rnDm57Py54A
Okay, so we're starting five minutes early. Hey everyone, I'm Eric. I'm an engineer at Cursor and I mostly work on developer experience and product. Today I want to talk about my experiences working at Cursor, dogfooding the product, and getting to a place where you can build your own software factory — what that takes and the practical steps for getting there. To be honest, I don't think we're really there yet. Some parts of the product and parts of the company are running fairly autonomously, but building a software factory takes a lot of work. Look at real-life factories producing hardware: there are assembly lines, a lot of people involved, a lot of management and observability. There are a lot of concepts we can borrow from that world and bring into the software world. So here are my observations from doing this.

First, the agenda: I want to talk about levels of autonomy as a precursor to the factory (pun intended), then building the factory, running the factory, and scaling the factory, and I want to finish with some Q&A.

Okay, so for the levels of autonomy: Dan Shapiro put out a blog post, I think in January or February, explaining six different stages of autonomy in automating software. Karpathy has also previously used Cursor as an example of going from tab completion to agents, but I think this captures it really well. We have "spicy autocomplete" at the start, which is roughly where Cursor started in 2022–2023 — ages ago at this point. We gradually moved up the ladder, making software creation more autonomous and letting the agents do more work. Most people adopting AI tools today are somewhere between level two and level three, where you have a pair programmer: you're essentially going back and forth with the agent, asking questions, getting suggestions, asking it to do work, and eventually finishing your tasks. The step above that is having the AI generate the majority of the code — developer level three here — where you as a human mostly review it, stay in the loop, follow traces, and so on. As you progress further you become more and more of a manager, and we'll talk about that later. Level four is where I'm at at this point for most software projects: I delegate as much work as possible to agents, and I review the outputs before I review the code — because I still look at the code sometimes. And lastly we have the software factory, which is essentially a black box. Dan Shapiro calls it the dark factory: you don't really have insight into it, it's just agents going around doing their thing — shipping the code, testing the code, building the code — and you as a manager just provide the intent, the instructions, and the goal of what you want out of the factory.

Okay, so why would you even want to create a factory? First of all, throughput: you probably want to create more code with fewer resources. You can run agents 24/7; you don't have to rely on humans that need sleep and food.
You can just have more agents. Another thing about a factory is that you have assembly lines, and assembly lines produce consistent outputs. So if you build your factory right, you can probably get very consistent output. Initially, if you don't have the right setup, you might feel like the agents are getting more and more probabilistic and you're losing a lot of determinism, because they just go off and do random things — which is probably a sign that you need to build more guardrails for the factory. I think this is a function of model capabilities as well: as the models get better, they follow instructions better and just execute on whatever you want them to do. And thirdly, you might want a factory because you can leverage your taste better — you can get more of your creativity out, instead of being limited by what you as a human can produce yourself. And then the obligatory then-and-now: this is what it used to look like, this is a Tesla factory from a couple of years ago, and this is roughly what we're going after here.

Okay, let's get straight into it. To build a factory, what do you actually need? I like to think of this as primitives and patterns. First, how do you structure the code? Is it a modularized codebase? Is it scattered all over the place, or is the code co-located? Because proximity matters: if an agent can list a single folder and discover all the relevant files at once, instead of having to grep and search the whole codebase, it can stay isolated and work within one part of the codebase. The same goes for humans — if you'd have an easy time onboarding yourself to a new codebase, an agent probably will too. The second thing is usage patterns. Do you have specific methods and services for authenticating a user? Do you have startup scripts? Do you have a standard way to write tests? Do you have this boilerplate in place? Because if you do, you can point the agent at existing references and ask it to reproduce them over time. So those are some of the primitives and structures of the codebase.

The second piece is guardrails. You want to let the agents be free, but not too free, so you want some rules and checks and hooks in place. For example, one hook you might want is around touching specific parts of the codebase: maybe the agent should not be able to change the most sensitive areas — encryption of sensitive data, authentication, anything like that — where a mistake could be very, very costly for the company or for you as a human.
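To make that concrete, here's a minimal sketch of the kind of check I mean — not our actual setup, just an illustration of a pre-merge gate or hook that blocks changes to sensitive paths (the paths here are made up):

```typescript
// check-sensitive-paths.ts — hypothetical guardrail, run as a pre-merge check or hook.
// Fails if the current change touches paths that a human must own.
import { execSync } from "node:child_process";

// Illustrative only; list whatever is genuinely high-risk for your codebase.
const SENSITIVE_PATHS = ["src/auth/", "src/crypto/", "src/billing/"];

// Files changed relative to the main branch.
const changedFiles = execSync("git diff --name-only origin/main...HEAD", {
  encoding: "utf8",
})
  .split("\n")
  .filter(Boolean);

const violations = changedFiles.filter((file) =>
  SENSITIVE_PATHS.some((prefix) => file.startsWith(prefix))
);

if (violations.length > 0) {
  console.error("Change touches sensitive paths; a human needs to own this:");
  for (const file of violations) console.error(`  - ${file}`);
  process.exit(1); // non-zero exit blocks the merge / stops the agent
}
```

Because a check like this is deterministic, the agent can't talk its way past it — which is the whole point of a guardrail.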
Then rules. Rules are probably the most misunderstood concept since we launched Cursor rules. There's the Cursor directory, which has a good collection of different rules, and the assumption was usually that you should install every rule you can depending on your software stack — if you're using Next.js, maybe you should have Next.js rules. But what I've found, and what I'm seeing among our users and internally, is that rules should emerge dynamically: if you're finding agents going off the rails, you should probably create a rule for that. A rule should be sort of an SOP showing the agents what they can and cannot do. And again, the models are getting so good at following specific rules that they usually don't go off the rails anymore, and I think that will just extrapolate over time. And of course tests: can the agent verify its own work? Can it run tests and know "I messed something up," or "I made a change in this specific area of the code, but it still passes — I can still run the code and the checks look good"?

And lastly, which I think is the most exciting part, the enablers: what can you give the agents to actually let them be free? Skills are good for this — giving the agents more capabilities, skills, and MCPs: accessing external context, understanding how to implement a certain thing. I'm going to show you some later from the Cursor codebase. What we're doing, for example, with feature flagging: can we give the agents a skill to add a feature flag? Then, when we launch them autonomously, they can flag the changes they made, merge the PR, and come back to us: "Hey, if you want to try this, just turn on this flag. If you don't like it, we'll revert the PR. If you like it, we can expand it to more users." And lastly, what kind of environment are you letting the agents run in? Can your agents start your dev environment? Can you just ask them, "hey, start my project," and let them do that without any human in the loop? Because if so, you can scale it up almost infinitely on separate VMs.

This checklist is what I usually follow when thinking about building the actual factory. Part of it is: is it runnable? (There's a typo in here — I blame my Swedish.) Is the context the agents need accessible — can they interface with Linear or Notion or Datadog or Slack, etc., to understand the broader context and the intent the user has? And lastly, where I think people should be spending a lot more time: building verifiable systems. How can the agents verify their own work, whether that's through unit tests, integration tests, or UI tests — actually clicking around in the DOM and trying to reproduce what happens for the end user? This is arguably easier for backend systems, where there's no UI and you can have clear contracts and boundaries for what should and shouldn't work, whereas for web and UI you actually need to click around and make sure things work — that the buttons actually have a loading spinner, etc.
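As a rough illustration of what that kind of UI check can look like — a minimal Playwright sketch, not the actual tests from the demo project; the test IDs and route are hypothetical:

```typescript
// e2e/playback.spec.ts — hypothetical end-to-end check an agent can run on its own work.
import { test, expect } from "@playwright/test";

test("play button shows a loading spinner and then starts playback", async ({ page }) => {
  await page.goto("http://localhost:3000/");

  // Click the control the way a user would.
  await page.getByTestId("play-button").click();

  // The button should give immediate feedback...
  await expect(page.getByTestId("loading-spinner")).toBeVisible();

  // ...and eventually the app should report that it's playing.
  await expect(page.getByTestId("transport-status")).toHaveText("Playing");
});
```

The specific assertions matter less than the fact that the agent can run this after every change and find out on its own when it broke something.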
Okay, so that's part of building the factory. If we switch over to Cursor here — I'm not sure if you've seen this, but this is Cursor 3. We launched it a couple of weeks ago and it's a complete rewrite of Cursor; there's no VS Code anymore. Most of you are probably familiar with the older type of Cursor, with files and sidebars and a lot of different things, whereas this is more streamlined for an agent-first workflow — we'll get to why we created it later — but I wanted to show you some rules and related pieces.

Let's see where I put them. For example, I built this music agent project, and if you've used Ableton before you'll probably recognize this.

>> Yeah.

>> I'll expand it more. Okay. So if you've used Ableton or any music production software, you probably recognize this interface — oops, it's not really working at this size. What I essentially asked the agent to do here was "can you start a local dev server?", and we can see it worked for a while: it explored some files, read package.json, and based on that found a start script. package.json and all these dependency files are so in-distribution for the models that they know to go straight to package.json, if it exists in a JS project, to look for a start script. This is a good example of a predefined pattern — making your codebase more in-distribution in that way — because now it's super easy for the agent to understand, "I should just go in here and start the server." So it started a server, and it's running on localhost:3000.

And we can see we have this AGENTS.md file. AGENTS.md is like Cursor rules, but it works across many different harnesses. What I wanted to accomplish with this project was essentially to build a factory around the idea of an online music creation tool. To do that, I forced myself to never write any code myself, to try not to look at the code much either, and just figure out what systems and structures I need around it. It immediately became pretty clear that we need a way to start the project, and a way for the agent to verify its own work. So the agent created these end-to-end tests using Playwright: it can spawn browsers, go to the root, click around using getByTestId, and make sure that for every change I make, for example, the play button still works, or I can still add notes to the project, without anything breaking. So these are some examples of how you can create verifiable outputs. We have Vitest, we have this, and so on.

So let's see here. If we go back — oh yeah, another option here: casual scrolling on Twitter. A different way to verify the work is using an automation to do code review. You can ask the agent to review the changes it made, or you can use a more integrated tool like Bugbot, which we have in Cursor: it looks at different PRs in GitHub, reviews them, and comes back. This is also one piece of the whole factory — you should have multiple stages where you plan it, produce it, and review it, essentially following the whole SDLC, but you automate and codify that work.

Let's see here. Yes — I also wanted to show you this. We launched updated cloud agents in the last couple of weeks, where we gave each agent its own separate VM. You can have them create a very reproducible environment in the cloud, which essentially allows you to scale infinitely. But we also gave the agent a tool to test its own work by controlling the computer.
So for example, we have Glass here, which is the interface, and I asked the agent — let's see — "Glass agents are still rough with keyboard control, tab, etc.; add better accessibility and make it possible to navigate the agents with the keyboard." And I asked it to make the change and then record it with the full editor, because the first recording was just the sidebar. What we got back is a video of the agent actually testing its own work. We can see it has this highlighted row — I'm not sure if you can see that — which gives me, as a human, some context to verify the work, and then it's actually clicking around and using the keyboard to navigate.

So with this we're getting fairly far into the factory: a lot of things are automated — review is automated, testing is automated, we have rules to steer the agents, and so on. But there's still a lot more to do. I think once you have this in place, the most important thing you can do is shift your mindset. You are going to look way less at code. You go from worker to manager: instead of doing the work yourself, you're overseeing a lot of agents doing the work for you. That also means going from sync to async, because most of the work happens in the background. You can still tap in and see what's going on in each agent, but the more agents you spawn, the harder it gets to understand what's going on in each of them — so you need a way to aggregate the changes upwards. And I think it's interesting that it's the same as a human organization; all the same principles apply. You start with a very small team, then you add more people because you need more throughput, and suddenly you need a manager to oversee things, then more managers, then a manager of the managers. That's essentially what's going to happen with agents too — you just keep moving up the levels of abstraction.

When you're a manager, you need to start thinking about how you scope and parallelize the work, because you want higher throughput — but it's not always good to make all the changes at once. For example, if you have two different tasks working on the same part of the codebase, you're going to get merge conflicts. So you still need to plan, scope, and parallelize the work, and one unit of work can be one agent. So how do you take a long list of things you want to do, make the most of it, and run as many agents as you can? To do this, I think it's important to preserve tribal knowledge of the codebase: you still understand what's going on in the different systems, how data flows, what the users want, which parts are critical and which aren't — so you're not outsourcing too much to the agents, and you're being very direct and managing them well. And when you go from sync to async, you're going to need to trust the agents a lot more, because you're sending them off on longer and longer tasks, and when you do that, you need to give them more context up front.
So you frontload the context to the agents, either through a plan or a long spec, and then you send them off and let them go. Once you start doing this regularly, you start to get a feel for the agents. You understand the models, you see their weaknesses and strengths, and you build an alignment with them, so you know how to prompt them and what intent to give them. And again, as the models keep getting better, you can give them shorter prompts than you used to — but you still have to provide the intent and be very clear about the change you want the agents to make. There's no shortcut to this, from what I've found and from what the team has found: you just have to spawn a shitload of agents, let them do the work, and see what happens. As long as you have good safety guardrails, you can let them do that — you probably shouldn't let them push to prod straight away.

>> Sorry, one question. Do you multiply the working environments as well, or do you let all the agents work in parallel on the same development environment?

>> Yeah, so this comes down to — personally I'm always using isolated environments, in different VMs. I actually just tweeted about this. On one hand, if you're sharing the workspace, you can use git worktrees, where you essentially have shallow copies of the codebase on the same machine and can reuse services — but you're still going to have to branch every database, cache, or user-management service to get reproducible, separate environments. If you're going to make a lot of changes at once, you want to know they're pure and don't have side effects on the other branches. That's why I've found it much better to just use cloud agents, where I spawn a VM that can run a database, internal tooling, other stuff, and the Cursor app itself, and the agent works in that isolated environment. It is more expensive, and it takes more work to set up your factory or environment to support this, but once you have it set up properly you can scale it to a hundred or a thousand agents. I'm not sure how many we're running today, but I bet it's multiple thousands a day, just agents running in copies of the codebase. So that's what I would recommend.

Yeah, so when you're a manager, your job changes quite a bit. You have to look at your system as a whole and think about where the human in the loop is actually needed. For example, do you have a log service like Datadog, where you copy-paste logs into the codebase and run the agents to identify and trace down issues? Do you have user feedback you copy-paste from Twitter into somewhere else so the agents can do something with it? Do you have a Notion setup with all your specs that you copy-paste or export into markdown and then feed to the agents? There's probably a way to automate all of these things — either skills, MCPs, or separate automations. So think about where the human in the loop is needed and try to automate that away; here's a small sketch of the kind of thing I mean.
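Purely as an illustration — assuming Datadog's v2 log search endpoint and a made-up service name — an automation for that first example could pull the latest errors and drop them somewhere an agent can read, instead of me copy-pasting them:

```typescript
// pull-error-logs.ts — hypothetical automation: fetch recent errors from Datadog
// and write them to a file an agent can be pointed at.
import { writeFileSync } from "node:fs";

async function pullRecentErrors() {
  const res = await fetch("https://api.datadoghq.com/api/v2/logs/events/search", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "DD-API-KEY": process.env.DD_API_KEY!,
      "DD-APPLICATION-KEY": process.env.DD_APP_KEY!,
    },
    body: JSON.stringify({
      filter: { query: "status:error service:my-backend", from: "now-1h", to: "now" },
      page: { limit: 50 },
    }),
  });
  const { data } = await res.json();

  // Keep just the messages; the agent's prompt points at this file.
  const messages = (data ?? []).map((event: any) => event.attributes?.message);
  writeFileSync("latest-errors.txt", messages.join("\n"));
}

pullRecentErrors();
```

Run it on a schedule, point an agent at the output file, and that's one copy-paste loop gone.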
The second thing is: how can you catch agents going off the rails — not doing what you actually wanted them to do? This is the perfect flywheel for improving your factory. If you see agents creating wrong schemas in the database because they're not following naming conventions, that's probably a rule somewhere. Or if they're producing really ugly UI, there's probably a way for you to create a design system and make the agents aware of it, so they can incorporate it in the next iteration. And then you take all these learnings and use them to actually improve the factory.

Thirdly, scaling the factory. Now you have your environment set up, you know how to be a manager managing a fleet of agents, you scope the tasks, you do all of this — so how do you actually take it from five agents to ten, to fifty, to a hundred? Again, not looking at code is going to be a real thing if the models get better — and they are getting better. So observe the outcomes: the same as before, where do they go off the rails, what are they producing, what are the artifacts? How can you make it so the agents can also verify their own work and the outcomes they produce? You should set up automations, and again look at the things you're doing repetitively. One thing we could do, for example: if we go to Cursor and this music agent again, I can ask, "Looking at my chat history, what repetitive tasks am I doing?" So we ask the agent to look at this and identify potential opportunities — it's searching the agent transcripts and producing some kind of artifact from that. We'll see how this goes; I actually built this into a plugin. Let's see: plan-execution loops, restarting the dev server, Ableton-like UI iteration — I should probably put that in a rule saying "make it look like Ableton" — tooling, housekeeping, etc. This project is very short-lived, but if you look at an actual production codebase you've prompted in a lot over time, you're probably going to find things you do recurringly.

And I want to show you some things we're automating at Cursor. Some of these aren't obvious at first. One, for example — let's see here — is the daily review. I have an automation for my own daily review: it looks at Slack, it looks at GitHub, and it sends me a summary of the things I've done over the last day. Previously I would have done this by writing down notes and thinking about what I got done today, or by running an agent with access to MCPs, but now I can just put it on a schedule and have it happen automatically.

I want to show you a different one: read merged PR comments. This is also a way to learn over time. For all the PRs we merge in our main repository, we can look at the comments — what did humans actually review here, and what did they say about the changes? Because if a human actually goes in, reviews a PR, and leaves a comment, there's probably high value, high signal, and high intent in that comment, and we can store it so the agents can actually learn from it over time.
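A rough sketch of what that collection step could look like — this isn't our internal automation, just an illustration using the GitHub API via Octokit; the repo details are whatever you pass in:

```typescript
// collect-pr-comments.ts — hypothetical: gather review comments from recently
// merged PRs so agents can learn from what humans actually flagged.
import { Octokit } from "octokit";

const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

async function collectMergedPrComments(owner: string, repo: string) {
  const { data: prs } = await octokit.rest.pulls.list({
    owner, repo, state: "closed", per_page: 20,
  });

  const learnings: { pr: number; path: string; body: string }[] = [];
  for (const pr of prs.filter((p) => p.merged_at)) {
    const { data: comments } = await octokit.rest.pulls.listReviewComments({
      owner, repo, pull_number: pr.number,
    });
    for (const c of comments) {
      learnings.push({ pr: pr.number, path: c.path, body: c.body });
    }
  }
  return learnings; // store these somewhere the agents can read later
}
```

The interesting part is what you do with the output — turning those comments into rules or memories — but the plumbing really is about this simple.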
We have another one, which I can show you here — this one, yeah: the code owners. We essentially had this problem where we had code owners in our codebase, and they were right most of the time, maybe 80% of the time, but the other 20% they caused a lot of bottlenecks for us internally: we were blocked on merging a PR, we needed someone else to review it, and maybe they were in a different time zone. So we started building this agentic code owner. What it essentially does is look at a PR and first check the risk level: is it just changing a variable name, or is it changing a constant that controls how long a trial subscription is, or something like that? If it's low risk, it can just approve the PR, because we don't want to block our own engineers on these things. But if we can see it's a high-risk PR, then we find out who made changes to that area previously and pull in their feedback — making the most of it. So first of all it keeps the code safe and avoids breaking systems, but it also keeps the person who made the original change in the loop and refreshes their context on what's going on. So it goes both ways — there are multiple value-adds from doing this.
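As a loose sketch of the shape of that automation — again via Octokit, with a purely hypothetical path-based risk heuristic standing in for whatever the real classifier does:

```typescript
// agentic-code-owner.ts — hypothetical triage: approve low-risk PRs,
// pull in previous authors for high-risk ones.
import { Octokit } from "octokit";

const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });
const HIGH_RISK_PREFIXES = ["src/billing/", "src/auth/", "migrations/"]; // made up

async function triagePullRequest(owner: string, repo: string, pull_number: number) {
  const { data: files } = await octokit.rest.pulls.listFiles({ owner, repo, pull_number });
  const highRisk = files.filter((f) =>
    HIGH_RISK_PREFIXES.some((p) => f.filename.startsWith(p))
  );

  if (highRisk.length === 0) {
    // Low risk: don't block engineers, just approve.
    await octokit.rest.pulls.createReview({
      owner, repo, pull_number,
      event: "APPROVE",
      body: "Auto-approved: no high-risk paths touched.",
    });
    return;
  }

  // High risk: find who touched these files recently and request their review.
  const reviewers = new Set<string>();
  for (const file of highRisk) {
    const { data: commits } = await octokit.rest.repos.listCommits({
      owner, repo, path: file.filename, per_page: 5,
    });
    for (const c of commits) if (c.author?.login) reviewers.add(c.author.login);
  }
  await octokit.rest.pulls.requestReviewers({
    owner, repo, pull_number, reviewers: [...reviewers].slice(0, 2),
  });
}
```

The real version is obviously smarter about risk, but the loop — classify, approve or escalate, pull in the right humans — is the part that removes the bottleneck.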
Let's see if there's one more... no, I think that was pretty much it — or yeah, I have one more thing, called continual learning. Continual learning is another type of automation I created a couple of weeks ago, and it essentially does what we just did with the agent: we look at the previous transcripts we have and extract memories and learnings from what we said — for example if we keep correcting the agent to do a certain thing, like "use this component instead of that component," or "always write very verbose descriptions of what you're doing." Instead of going in every time and asking the agent for this, I could create a rule — but I'm kind of lazy and don't remember to create rules. So instead, this continual learning plugin looks through the transcripts and stores these as rules for you.

These are all examples of systems for automating yourself away, and for automating things the agent can do for you. And I think that's the important part of building these factories: identifying the flywheels and loops where you can automate yourself away by building systems. And yeah, you are going to keep moving up abstractions. Today you're managing five to ten agents, but tomorrow you might be managing an agent that manages other agents — and that's just going to grow; you're going to have a lot of sub-agents under you, working for you.

Cool. So what I want you to take away from this: be very clear about the intent, and really think about what the actual problem to solve is — what do we want to get out of this? Don't outsource important decisions; make sure you stay in the loop for the important ones, whether that's safety, security, databases, payments, or authentication. Some things are really important and should be decided by humans, not agents. Build tools and systems: try to find these flywheels, codify them, get them into your systems, and give the agents access to them. Store context for later, whether that's agent transcripts or artifacts of things you think look good, because that helps the agent learn what good and bad look like over time — and that will change. Storing the context, building the tools, and keeping them up to date is more important than actually doing the work, because it provides the framework and the guardrails for the agents. And lastly, let the agents be free — think about what they need. I have a friend at Lovable who mentioned they gave the agent a vent tool, so the agents could complain about things while running, posting straight into a Slack channel. The agent started complaining, "Hey, I can't access this image, I'm very frustrated about this." They set it up as a joke, but then they started scrolling through it and realized it was actually very valuable — "we should probably give the agent access to reading images" — and they did, and then the agent started complaining about something else that was a problem with the harness. So find ways to let the agents be free; I think that's a very important thing.

Okay, that's kind of it — that's the direction and the things we've found building Cursor and taking Cursor towards a software factory. I hope you learned a thing or two and can take some of this away. I'm happy to take questions about anything Cursor. We have the microphones coming here.

>> Thank you very much. I have a question about code quality, or architecture quality. When agents ship tons of code and you can barely review it, how do you ensure the code is extensible and so on? You can establish hooks or guardrails for measurable things — say, the number of lines in a file should not exceed some limit — but architecture isn't measured that way. And agents have this completion bias: they want to finish the task as soon as possible, and they don't think ahead. They don't have a picture of how the code will evolve in the future; they just want to finish the task now.

>> Yeah, it's a good question. I think we as humans have the same problem — it just takes a lot more time for us to discover it. One pattern — the good thing about agents and models being essentially completion machines — is that they'll look at existing references and just continue down that same path. So if you have existing things you can point them to, that's very important. If you don't, there's a case for letting the agents do one-off implementations here and there, and then eventually having another agent refactor — like we do as humans — one to generalize and build abstractions and all those things.
So how can you build a system to detect this, and to verify that the abstractions being built are good and in line with what you want? But I think it's going to be a lot more architectural review by humans, and scoping and planning of what the architecture and system design should look like. It's a tough problem.

>> Thank you.

>> Hello Eric, thank you for the talk. When it comes to the activity of building the factory, one thing I observe — for example with building things like rules in a team — is that because it's so new, almost everybody feels "this is a rule for me, and I don't want to inflict it on other people."

>> Yeah.

>> And I notice this creation of silos, where each engineer ends up having their own separate factory. Do you have any advice on how to get to the point where the whole team is contributing to the creation of the factory?

>> It's a great question. I think it's hard, and I think it's very cultural as well. Developers have always created our own tools and wanted our own custom setups, but at some point we have to unify on a certain structure. Historically we've had PR reviews and similar ceremonies to align on the code being produced and make sure it's consistent. I think we have to take the same principles and apply them to the tools we're building — the guardrails and enablers and primitives. So, establishing some kind of forum where you can discuss these things and plan what you want the factory to look like: what components do we need, what integrations do we need? Do you have examples of specific things — is it more about flavor, or bigger changes the agents are doing?

>> What I notice with rules: one person wants to write the test first, so they create a rule to write the test first, but they know somebody else doesn't want to work that way. So they keep the rule only on their machine and don't share it, because it's too specific to how they work. The whole team is collaborating on creating the codebase, but collaborating on creating the factory — deciding whether the factory writes tests first or not — that's a big decision that's hard to align everybody on. With all of these rules, not everybody's going to be completely on board; in most cases it doesn't matter much if you defer a little, but it is hard.

>> Yeah, I guess it's a human problem and a human change that needs to be made. But it's a good question — I'll think about it. Thank you.

>> Thanks for the talk. A lot of these patterns resonate. I was wondering what's needed — what kind of patterns can you suggest — to take this to the next level if you work on enterprise, brownfield, mission-critical systems that cannot fail and cannot be insecure? If you look at the recent supply chain attacks, even giving your agent sandboxes may not be enough. The humans remain accountable — we can't say "it's not my fault, my agent did that."
So do you have any extra patterns, or is it just inherently that we have to keep reading the code — which may feel like reading assembly lines in the '80s or something?

>> I think if you can spend a lot of compute and tokens up front, before you as a human actually need to be involved — that's a pattern we've found to be pretty successful. One thing is manually writing tests for very critical parts of the system, and then just letting the agents run them a lot. The second part is building automations: our security team built this security sentinel, which is an automation that looks specifically for very specific invariants of the system, and they run ten of these on certain PRs that change certain files. And yeah, I think it's a bit contextual as well, but basically: spend a lot of tokens up front, try to find the different invariants, and do something close to red teaming.

>> So one thing I did is, instead of focusing on velocity and throughput, I focus on quality.

>> Sorry, what?

>> I use AI to focus on quality — just improving the tests and making the codebase completely AI-ready.

>> Yeah, I think that's very good, because if you as a human trust the tests, you can probably trust the output even though you don't look at the code. And that's kind of where we're going, I think.

>> Hi. Thanks for a great presentation. I find myself lacking and slacking in using guardrails, especially rules and hooks, partly because historically the knowledge of how to do that properly was very scattered and decentralized across the whole web. You'd have these exotic GitHub repos trying to centralize the knowledge, or Medium articles, or maybe Cursor the company would do a blog post on it — but it was all evolving, and the capabilities of the models themselves, especially on instruction following, are also evolving and getting better. It always felt like duct-taping to me. So I'm wondering: can we have AI help us with that? Could Cursor, for example, give us proactive agents, or some new wizard-like setups, where we identify our workflow and AI helps build the rules, guardrails, and all those rule artifacts for us? Maybe a proactive agent that scans our workflow globally and then helps us build those artifacts. What do you think about it, and do you think about this in the company — maybe you're working on it?

>> Yeah, totally. I think right now there are two places where you can do this. One is in the product itself, with the continual learning plugin — let's see here... oh, I don't have it installed; we can go to the marketplace — which looks at your transcripts and extracts rules and memories and all that. That's one way to do it. Then there's another world where you change the weights of the model depending on what your codebase looks like and what your engineers in a specific team are doing,
where you reconcile them — that's true continual learning, not this hacky plugin — and you actually bake it into the model, so it knows what your preferences are. But totally: memory and rules and all that, I think that's going to become more and more important over time, because that's what's lacking. That's what sometimes prevents me from having a lot of trust in the agents: I say something and they forget about it, because they're stateless machines. So how do we capture this knowledge? I think we should put a lot more time and effort into building these systems.

>> If I might follow on that: you seem to first start from a project, or dive into an existing codebase, and then build rules on top of that. What about when we want rules first, for a new codebase, a new project — how do we get good rules for that? Do you think humans should still do it, or can we automate that too? Are there new best workflows for it?

>> I think it's hard, because my perspective on rules is that they're the bridge between the model's behavior and the human's behavior — how do we steer the models so they follow what I as a human want to do? In a new product, I'm not really sure what I want to do. I kind of want to outsource that to the model: see what it does, maybe run different models, decide whether I want to combine them or scrap everything. So I think it's hard. The best example of a rule I can think of internally is for Bugbot. When we're doing database migrations, we're not actually using foreign keys in the database, for performance reasons. For the models, the "right" way to do this is to use foreign keys, so they will always add one. But when it hits GitHub and a PR is created, we have Bugbot reviewing it, and it has a rule saying we should never use foreign keys, so it flags it. That's the gap between the human and the model — the intent we have versus what they have. So I think rules should emerge dynamically over time, and before that you should probably use those ephemeral specs and plans instead.

Oh yeah, there's one over there — oh yeah.

>> It doesn't work... it worked. So thank you, Eric, for the talk. Since evaluation and trust is a big point, I'd like to know how you effectively do GUI testing and automated user acceptance testing — if you can show something of your workflow.

>> Totally. The main way I do it is using — let's see here, oh yeah, I have this one for example — the Cursor cloud agent with the computer use that we have. So I'm going to publish this... oh no, that's bad. I guess we're not doing that. I have this website where I have seven different
components — a button, a dropdown, and so on — and I'm generating each of these components with a different model, because I want to compare what, say, the Composer dropdown looks like versus the GPT-5 dropdown, and I put them in a grid. When I created this website there was an error: I had this "view code" button so I could actually see the generated code, and it wasn't working because the model didn't bundle the actual code. So I went to Cursor and wrote, "when clicking view code on a component it says it cannot load" — a very short description. What the agent did, as you can see here, was spawn my local server, start clicking around and pressing enter — you can see the cursor up here — and it creates this Screen Studio-esque recording where it's chopping and speeding up and zooming in. It takes a while, because computer use is fairly slow and consumes a lot of tokens, but we can see the view code button, and now we can actually see it's working too. Since this is very much a side project for me, I'm not really going to look at the code — I'm just going to see that this works and merge it. But you can keep prompting the model to do very specific things for you, like following specific instructions — a login flow, for example: click the button, log in. Login steps are probably so in-distribution that you can just prompt the model with "go to this URL and log in" and it will understand which steps to take. Then you can ask the model to input a wrong password or a wrong email and see what the website does — maybe the website gives the wrong error — and the agent understands, "oh, I need to put in the right credentials." So just like you would hire a QA consultant and give them instructions, you give the same instructions to the agent. That's one way to do it. The other way is more Playwright or Puppeteer — just automating a browser — which is a bit more deterministic: you can review it, check it in, and have other people reuse it. Does that answer the question?

>> My question was going more into user acceptance testing: checking whether this thing actually looks right. Testing a login you can automate — you don't need an agent for that — but does the website look right, is it consistent across all the pages that are generated, things like that?

>> Yeah, then I use cloud agents for that a lot. There was one — I can't remember exactly now — where I did some changes in the docs and I just asked it to open every single instance where this word is referenced, take a screenshot, and give it back to me. Then I could just look at all the screenshots, see that everything looked good, and merge the code. So you let the agents do the navigation, the clicking around, and the testing for you. I think it works surprisingly well — this was very much an AGI moment for me when we launched it internally sometime last year. Have you had a chance to try cloud agents in Cursor? You should — I'm curious to get your feedback.

>> I know you have... Uh, which one? This one.
>> The agent spawning all of that and giving us the—

>> What was the initial question — how long it took, or...?

>> Yeah — no, I see. But how expensive would it be, like—

>> Ah, very straightforward. For this one I did no specific setup. For our own repository, when we're running Cursor — this demo here is running all the backend services for Cursor and all the frontend things, which is a lot of different things, so the VM is quite beefy — but as long as you give the right instructions it works really well. What we did was create an internal CLI that the agent can use — we call it something like "cursor dev tool": cursor dev tool backend start, cursor dev tool frontend start — and that abstracts everything away. It actually goes to OrbStack, runs ClickHouse and Postgres and Redis, and then the frontend runs Electron and Glass here, and the two different processes just coexist. The agent has access to everything, just as a human would. And you can have the agent be authenticated if you store a snapshot where you're already authenticated.

>> Yeah — what I meant was how expensive in dollars, like—

>> Oh, sorry, sorry. My bad. Yeah, this one I don't have offhand — I could probably look it up. I would guess it's around $1, something like that.

>> Just one turn?

>> For one turn, this initial one would probably be around $1; for the others I just asked it to re-record a bunch of different things. I can look it up later. And I guess it depends on which model you're using too.

>> Okay. My question is about handoff between humans and agents when you're using different tools. In my current setup I have a product owner and a functional analyst, and they work in Claude Code and prototype very fast, basically without thinking much about the backend, the architectural choices, or whatever — and then they pass control down to the delivery team, which uses Cursor and has to make that stuff actually work. Which best practices do you suggest to enforce a proper workflow between the people who basically don't know what they're doing from a technical point of view, and the people who need to take that thing — which may have some poor choices, like "use that database," or Claude changed its mind and moved from Supabase to Turso to some other fancy database — and bring it into sound architectural choices, moving from Claude Code to Cursor?

>> I think what we're doing internally is: we have one or two PMs, and they're building a lot of different prototypes. Sometimes it's actually in the real product itself — they're using cloud agents and just prompting them, getting a video like this back of the changes, and they see, "oh, it kind of looks like I wanted," and then they tweak the designs a bit. But the code might be really bad, or not follow best practices — which, if we had a good factory, it probably would. But if that's the case, we hand off a link to the cloud agent.
We just copy the link and send it to the engineers: "Hey, this is something we want to build — does this make sense? Can we do this?" And then you already have a lot of intent expressed. The other case is that the PMs have a separate repo called prototypes, and it's just an HTML file — a mega HTML file reproducing the Cursor UI or the dashboard.

>> Yeah, the problem is the migration. As a practical use case: I had my PO and functional team build a very fancy demo using Prisma and Turso and whatever database, storing data in Vercel blob storage, and then my delivery team had to migrate that to SQL Server and C# and Aspire for the backend, and the migration was really painful. Also because when they used the agent freely, with no constraints, the agent sometimes decided to use Next.js, other times Vite, other times something else. Putting constraints in the form of rules on that agent shapes it down the path, but the problem is we need to write a lot of rules and keep them consistent, and it's not easy to manage the whole workflow. So we're shifting a lot of effort from having people write code to having people write guardrails and rules and whatnot, and making all the pieces talk to each other.

>> I see. Yeah. I guess if the POs and PMs can't have access to the actual codebase, handing off an artifact is the minimum viable intent. That artifact could be interactive — back in the day it used to be Figma prototypes, right? You could click around and get a feel for it. Now you can go even higher fidelity with an interactive prototype built on web technology, without touching any of the backend stuff. It doesn't have to be a working thing for real if it's just an internal prototype — just enough that your engineers can understand, "oh, this is the intended thing: if I click this, that should happen; if I enter some text here and click send, a row should show up here." I think all of that can be done in the frontend, kind of like a hackathon.

>> So you don't need to migrate the prototype into something that becomes production, but rather rewrite it.

>> Yeah, I think so. I think rewriting — and setting clear expectations from the engineers to the PMs and POs about what engineers actually want from the product organization and what's most helpful for them. So maybe vibe-coding complete SaaS products isn't the most efficient thing.

>> Eric, thank you for the presentation. My question: as we're building more and more agents and they become part of our time-critical processes, how do you see brownouts and blackouts as a new risk, and what's your view on how that can be mitigated and the impact reduced?

>> Yeah, it's a really good question. I think it comes down to what we talked about: the humans are still accountable for the things being shipped. So the humans need to build systems, observability, and monitoring around the changes being made. And I think that still comes down to understanding which areas of the codebase are system-critical, and making sure you have good observability and understanding of everything that goes on.
Maybe every line should be written by a human in those critical areas, or at least always reviewed by one or two people. And yeah — it's easy to vibe-code too close to the sun. So I think it's also a cultural thing: you have to make sure the humans still feel accountable for the things getting shipped. But yes, setting up good systems to understand the changes being made — I think that's important. And tests.

>> Hi Eric, thanks a lot for the talk. I'm assuming you're probably one of the people in the world with the best understanding of how to use these technologies, so this question steps back from the technology and asks about process: how do you manage yourself in your work day? How long are these tasks — how long do you get to be away from your agents without babysitting them — and how do you actually invest that time? Say you have five, ten, fifteen minutes: how do you make the best of it? How many agents do you have in parallel, as mental processes, and how do you manage yourself? Thanks.

>> Yeah, it's a great question. There are two levers to pull. One is the scope of the change: the larger the scope, the longer the agents are going to run, and if you want them to run for a really long time you want a verifiable system so they can check their own work. The other is how much you can parallelize — how many of these agents you can spawn. And I think the sad reality, in some sense, is that there's going to be a lot of context switching. I probably work in four different repos, or four different areas of the codebase, at the same time — whether that's a single feature that requires frontend, backend, database, and testing, or five completely different things: docs, side projects I'm exploring, fixing a bug reported by a Twitter user. Usually it's somewhere between five and ten agents — say five agents running asynchronously in the cloud at all times. And while I'm waiting for those, I'm either scrolling Twitter — it's true; we also have the browser in Cursor now, so I can just stay in here and do it — or I have a synchronous task going where I'm going back and forth a bit. Maybe that's fixing a small thing in the codebase, or planning the next thing: sourcing Notion and Slack and creating a spec in Cursor using a model. So I like to plan synchronously and then execute the plans asynchronously, and by the time that's done, one of my cloud agents is probably done as well, so I can come back, review it, keep prompting it a bit, and maybe merge. Some parts I still need to test manually — maybe I need to download a copy of Glass or Cursor 3, test it by hand, decide it looks good to me, and go ahead and merge.

>> A quick question: this factory building leaves us with a scattered ecosystem of a lot of markdown files. Is there an easy way to organize these files and keep an overview of the factory you've actually built? Maintaining a factory would require an overview of the processes you want your coding agents to go through. What tools do you use, and what methods do you recommend?
How do you keep a mental map of the factory you have built, and how do you maintain it?

>> Yeah, it's a really good question, and I think it's somewhat unsolved. One of the reasons we rebuilt Cursor to look like this, instead of the traditional IDE, is that we're using more agents and we need a better control panel where you can see all the agents, manage them, spawn them, and so on. So what's going to happen with Cursor 3 — this is the first stab at multi-agent orchestration — is that these become nested agents: you'll open one up and there will be ten agents inside it, and you can still introspect them, see what's going on, and follow the traces. You're probably also going to have some kind of project view somewhere here with an aggregated status update: here's what everyone is working on, here's the latest, here's what you as a human need to review. I think those are product things we're going to build into Cursor. But to set the spec for the factory itself, I would probably have a folder in your codebase where you outline how certain things should work — maybe markdown files with best practices, probably rules — and establish some kind of council to decide what goes into the factory, what doesn't, and what's missing to improve it. As long as it's something the agent can understand and read — which is files — that's probably what I'd do: store them in your codebase, checked in somewhere.

>> Thank you. I'm just thinking about the teams of the future. A year or two ago it was very reasonable to have an engineering team of several hundred or several thousand people. What does this do to that, and what roles exist in an engineering team? This seems akin to becoming something between a product manager and an architect — so what roles do engineers have?

>> Yeah, I think that's very accurate. It's hard to predict the second-, third-, fourth-order effects of this happening. It's definitely writing less code, looking at less code, spawning more agents. And we're still mostly building software for humans, so: how do we know what other humans want? How do we talk to our customers? How do we market what we're building? How do we bring all of that into the factory? Who sets the direction, what's the intent? All of these things come from somewhere — either creativity from someone's head or actual user demand. So having someone do that is going to be very important, and having someone align it between the different humans in the org, and having people build the scaffolding for the agents — building the assembly lines the agents actually run on.
Building that scaffolding is also going to be important, but to what magnitude and with how many people, yeah, I don't know, it's really hard. You can do a lot with the models right now with a very small team if you have the right setups in place, depending on the domain you're working in. I don't know, do you have any predictions? >> I see issues from a labor perspective. If you're working in an incredibly agentic environment, what happens to training new grads, hiring new grads, and the future from that perspective? What happens with office politics and land grabbing, right? Because basically your value now becomes your ability to configure and set up your own agentic team, not your ability to program and be productive anymore. >> The 10x engineer is no longer about, you know, words per minute. It's like prompting. >> Yeah. >> Yeah, token usage. >> Yeah. Am I paid in tokens? >> Yeah, there's a leaderboard, you know. >> Gotta be token maxing. >> Am I paid an amount and then my token usage takes away from that? How do you optimize for that? >> We've got to train the models to be more political, I think that's the solution, right? >> We need more water cooler talk. >> I guess we're going to get more of that if the agents are doing our work. >> Hi Eric, thanks for your talk. I was wondering, you are probably using some kind of issue tracking tool at Cursor, like Atlassian Jira. Are you using agents to automatically check and read tasks directly from Jira, for example, and spawn sub-agents to perform the work, or is there always a human that starts the work using Cursor? >> So we're using Linear for issue management, and we have a first-party integration as well, so for every ticket that gets created in Linear we spawn a cloud agent. One place where I interface with this the most is feature flags: if a flag has been rolled out at 100% for two weeks, the system signals us, hey, this is a stale feature flag, you can remove it. That creates an automatic issue in Linear, and since Linear is hooked up with Cursor, it triggers a cloud agent to remove the feature flag. So it's completely automatic once the system knows it's rolled out to everyone, and I can just look at the code and say, okay, we can merge this, the feature flag is no longer needed. And we do this for everything. Once you post something in Slack, we either have a Slack agent look at it or we have a Cursor automation look at the message, triage it, and look for duplicates, and if it's determined to be easy, start implementing a fix immediately. And this is an example of where a human is in the loop where it might not have to be. It could be me going on Twitter and seeing a tweet that something is broken with the plan mode button dropdown; I can copy that into Slack and have the agent perform the work. But there's probably a way we can source this feedback immediately, without me having to scan it, triage it, and copy-paste it.
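As an illustration of the stale-feature-flag automation described above, here is a rough sketch under stated assumptions: the flag store, the two-week threshold, and the createLinearIssue helper are all invented for the example, since the real detection logic and the Linear/Cursor integration are internal to Cursor. The only job of this code is to file an issue; the integration then spawns a cloud agent on the new ticket to actually remove the flag.

```typescript
// Illustrative sketch: find feature flags fully rolled out for two weeks
// and file a Linear issue for each, so the Linear/Cursor integration can
// spawn a cloud agent to remove them. All names here are hypothetical.

interface FeatureFlag {
  key: string;
  rolloutPercent: number;      // current rollout, 0-100
  fullyRolledOutSince?: Date;  // when the flag first hit 100%
}

const STALE_AFTER_DAYS = 14; // "rolled out at 100% for two weeks"

function isStale(flag: FeatureFlag, now: Date = new Date()): boolean {
  if (flag.rolloutPercent < 100 || !flag.fullyRolledOutSince) return false;
  const ageDays = (now.getTime() - flag.fullyRolledOutSince.getTime()) / 86_400_000;
  return ageDays >= STALE_AFTER_DAYS;
}

// Hypothetical helper; in practice this would call the Linear API.
async function createLinearIssue(title: string, description: string): Promise<void> {
  console.log(`Would create Linear issue: ${title}\n${description}`);
}

async function sweepStaleFlags(flags: FeatureFlag[]): Promise<void> {
  for (const flag of flags) {
    if (!isStale(flag)) continue;
    await createLinearIssue(
      `Remove stale feature flag: ${flag.key}`,
      `"${flag.key}" has been at 100% rollout for at least ${STALE_AFTER_DAYS} days. ` +
        `Remove the flag and keep the enabled code path.`,
    );
  }
}

sweepStaleFlags([
  { key: "plan-mode-dropdown", rolloutPercent: 100, fullyRolledOutSince: new Date("2026-04-01") },
]).catch(console.error);
```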
So that's a bit of how we work with Linear and issue management. But yeah, since we're spawning a cloud agent for every single thing, it provides a good way for us to dogfood the product and test it out. I'm not sure I would recommend that for everyone, because it can be quite costly; cloud agents are a little expensive. >> Do you have something on the roadmap to run things locally? I'm just thinking of an alternative like dev containers and opening the agent in that. Is something like that planned? >> The closest thing you can do is probably just prompt the agent to run for a really long time. It's kind of like the same thing with running local models. I've tried it, probably once a month, running the best open-source local model and seeing how it works in Cursor, but it's never the same experience as using GPT or Claude or Composer. And the same thing with running really long tasks locally: I found it doesn't work that well, because if it's running for a long time it's probably going to use your local database and your other local stuff, and it's going to prevent you from doing other work locally unless you create a VM on your own machine. And if you do, you could probably... wait, never mind, ignore everything I said. We launched Cursor workers. So Cursor workers, which we launched yesterday, is a way for you to run the same infrastructure and orchestration layer as we do for cloud agents, but on any machine you might have. Actually, we can do a quick demo. So we have the agent CLI, and there's now a worker, and you can call worker start. So from here we have a worker running, and this worker is going to show up in here. Let's see, so we can do self-hosted. Let's see here. Oh, I don't think it's hooked up yet, it's a different account I'm running it on, but essentially you can run this on any kind of machine and get access to it from Cursor cloud. So you can spawn multiple of these on your own machine, or you can run a Mac mini, or you can have a VM with any cloud platform provider. >> Right, just to follow up on that. You're saying we can have isolated environments locally using this command? >> Yeah. >> And does it still call the frontier models or the Composer models? >> Yes, exactly. This is going to leverage the Cursor harness, but it's going to run wherever you spawn this daemon. >> Yeah, that's interesting. Thank you. >> So I built this Cursor claw thing, where I have one running on my Mac mini with access to iMessage and calendar and all these other things, and yesterday we launched automations as well, so I can get a daily or weekly report of everything going on on my machine that I might want to know about, on a specific cadence. And since it's running the agent daemon, you'll get access to this in Slack and the web and the mobile app that's coming at some point not too far out. >> Sorry, what about iPads? A lot of people have wanted that. >> It's going to use SwiftUI, so it's probably going to be compatible with iPad as well. >> I think iOS and iPadOS are two different things, actually. >> Got it. Yeah. Nice. Yeah.
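For anyone wanting to script that self-hosted setup, a very small sketch follows. The "worker start" subcommand on the agent CLI comes from the talk, but the binary name and everything else in this snippet are assumptions rather than documented usage, so treat it as pseudocode to adapt against the real CLI.

```typescript
// Sketch: contribute a few self-hosted workers from a spare machine (e.g. a Mac mini).
// The "cursor-agent" binary name is an assumption; only "worker start" is from the talk.
import { spawn } from "node:child_process";

const WORKER_COUNT = 3; // how many workers this machine should run

for (let i = 0; i < WORKER_COUNT; i++) {
  const worker = spawn("cursor-agent", ["worker", "start"], { stdio: "inherit" });
  worker.on("exit", (code) => {
    console.log(`worker ${i} exited with code ${code}`);
  });
}
```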
>> I just want to ask quite a simple question. When you have more than one developer in your company and you're spawning hundreds of agents to do a lot of different kinds of work, how do you ensure you don't step on each other's toes doing the same work? And how are you running things internally? Do you still use scrum or agile ways of working, or has even that gone out the window already? >> Um, yeah, what are we doing? We're not really following any traditional methodologies in that sense. We do have monthly goals for things we want to get shipped. But I think since everyone has so much power at their fingertips with agents, this causes people to take extreme ownership over certain things. So for the longest time there was one guy building MCP and rules and all kinds of extensibility by himself. And now we have maybe one person focusing on MCP, but they can own everything around MCP and they don't really need to interact that much with other teams. At some point that's going to break too, but so far in the history of Cursor we have found ways to work around this. The agentic code owner thing was probably one place where we stepped on each other's toes: the code owners were misconfigured, so instead of having a deterministic thing, can we just pull in the relevant people at the relevant time? Something like that is probably going to happen with other problems we surface in the future. >> Thank you. One question about the self-hosted agents: do we get all the goodies that we get with the cloud ones? >> I think computer use is the one thing that's still in early access; I don't think we've shipped it to GA yet, but it's coming for sure. So this should be completely on par with the cloud. Yeah. >> Can you describe the profile of these people, kind of a mix between product managers and engineers, who take this ownership? >> Yeah. So I guess the archetypes we have: there's the PM; they talk a lot internally at Cursor, they talk with go-to-market, with sales, they talk with engineers, they talk with users, they product manage and keep everything together, and they also shield engineers from various things. Then we have designers. Designers work, I would say, 50/50 in Figma and code at this point. All of them write code, all of them push to production, but it's a lot of exploratory work: what does it look like when you have ten nested sub-agents? You can't really feel that in Figma, you've got to actually develop and prototype it. And they work with PMs. And then we have engineers, of course, and I think Cursor is very fortunate to be building a developer product: developers building a developer product tend to have good taste, they know what good and bad look like, they know what developers want and don't want. And because of that they can take such ownership, run with a concept, and go really, really far.
Whereas the PM might be setting more of the business and overarching direction, the engineers and designers collaborate on what this actually looks like in code, but also on how it should feel and look for a developer. >> Makes sense. Are there analysts in this mix as well, or is that done by the product managers? >> Oh yeah, that's a good one. We have a data team as well, data scientists and analysts, and they also work closely with PMs, of course, understanding how users are using the product and where the bottlenecks are, but also with engineers, instrumenting the code in the right way, understanding feature flags and why certain users hit certain paths and some don't. So everyone is just working together. And I think the way we've structured teams is pretty much by domain: extensibility might be one team, cloud might be one team, and cloud should still be extensible, so they also have to work together, but we try to keep it modularized and try not to ship our org chart too much. >> Thanks. >> Cool, I guess one final question if there's one. >> So from time to time I messed up and started a cloud agent in the wrong repo, or somewhere it just went out on a tangent, and I came back an hour later and it was desperately trying to get access to that repo. Is there any way to catch these agents that just don't provide any value, that keep doing stuff but aren't really making progress? >> Yeah, I think that's on us, for sure. Over the last year we have made a lot of improvements to the cloud agents; initially, when they worked they were extremely useful, but most of the time they didn't. Again, cloud agents also come from this internal need of ours to run things asynchronously, and because of that we have put a lot of effort into making our own codebase work really well in cloud agents. So we sometimes have to create new projects, jump into other products, and talk to our customers to understand where these things fall short. And we try to have instrumentation: does the agent run for x amount of hours or minutes, does it touch any files at all, is it going in circles, loop detection, these kinds of things. This is part of the observability I was talking about before. Most of that should happen on our side, but there are always going to be very specific contextual things where you, as the codebase owner, need to set up certain things. But yeah, we're working on improving it, and if you have any examples, please come to me and I'll take a look. >> I think the worst was when I started it on one repo and it just called out to the Slack MCP and tried to get access in ten different ways, and it failed. >> Yeah. Yeah. We could make that better. >> Good that you're working on it. >> All right, thanks everyone for coming. I'll be around for the next two days as well, so please grab me if you want to discuss anything Cursor, or anything at all actually.