How we hacked YC Spring 2025 batch’s AI agents — Rene Brandel, Casco
Channel: aiDotEngineer
Published at: 2025-07-30
YouTube video id: kv-QAuKWllQ
Source: https://www.youtube.com/watch?v=kv-QAuKWllQ
[Music] So, yeah. Who's ready to hack some agents? Yeah. Oh, wow. All right. So, let me first introduce myself a little bit. I'm Rene. I'm the CEO of Casco. We're a YC company, and we specialize in red teaming AI agents and apps. I spent my previous time at AWS working on AI agents, but I've always really loved working on AI. In fact, there's a video of me 10 years ago building voice-to-code, and I won Europe's largest hackathon doing that. I would talk to it, say "build me a blog post," and it would generate the site. It was actually kind of fun. It did things like load in pictures from San Francisco, and you can see how horribly slow the APIs were back then. I'm about to give you a nightmare by showing you the architecture diagram of that thing, but it kind of did the job. And this was 10 years ago. Obviously, there was no generative AI back then, and these things were extremely difficult to do. But it really gave me a glimpse, even back then, of what the future could look like as technology got better, right? So, obviously, many things have changed. Two months ago, I quit AWS, worked out of the garage with my co-founder, and we got into Y Combinator. So, yay. That's awesome. From there, we also looked into how else things have evolved. Well, this was my architecture diagram from back then. You can see there were three different cloud providers, including IBM Watson, which was at the forefront at the time. There was also Microsoft LUIS, which handled natural language understanding. You can see it was a lot of piecing things together, and that was already kind of difficult to do. But nowadays we see the stacks normalized significantly more, right? I think this is probably what the average agent stack looks like these days. You've got some front end.
You talk to an API server that talks to an LLM, connects up with tools, and then you have a bunch of data sources associated with it. This kind of normalization of the agent stack is actually really good. It makes many things easier. Definitely better than my hacker project 10 years ago. But we need to think about the security posture around these systems, and my general impression over the last few years is that the primary discussions around LLM security are really like: hey, can you do prompt injection? Can you get it to produce harmful content? That's all really important, but the reality with security is that you need to look at all the different arrows in your system, and that is typically where the real damage happens, right? So this is really agent security, and that is what I want to talk about today. Now, why did we even hack a bunch of agents? It's kind of a weird thing to do. The answer, quite frankly, is that we wanted to launch internally at Y Combinator and we wanted a splashy headline, so we were like, uh oh, what do we do? Fun fact: we have the second-highest upvoted launch post inside Y Combinator of all time, so higher than Rippling. Yes, okay. So we basically took this approach at the time: look at which agents are already live, set a timer for 30 minutes, because we don't want to waste too much time on this, and then figure out what their system prompts are and understand how they're working. I had a feeling when I was creating this meme that this could be true, and it turns out it is true. Then we looked at what kind of tool definitions they have, right? What is it supposed to do? Is it supposed to access data? Is it supposed to run code? And then we just tried to exploit them and see what's going on. And it was really fun, because out of 16 agents that were launched, spending 30 minutes on each, we hacked seven of them.
And there are three common issues we see across all of them. So I hope we will all learn today what the most common issues are, so you don't make the same mistakes. Also, if you're a VC, this batch is going to be the best investment, because they're all secure now. So, first issue: cross-user data access. I mean, you were just here at the OAuth talk, so you know where this is heading, right? We first leaked this company's system prompt, and we saw, huh, it has a bunch of interesting tools attached to it, including looking up user info by ID (suspicious), looking up a document by ID, and a bunch of other things. When you see this, you just think, oh yeah, there's this thing called IDOR, insecure direct object reference. It's basically when you make a request, you validate that, hey, the token is valid, and you just let the request through, right? You're betting on the fact that the ID cannot be guessed. Well, that's obviously not good. So we looked up a product demo video they had recorded, found the user ID in the URL bar, and just tried to plug it in. This is a different ID, by the way. Don't worry, guys, this is my co-founder's ID now. And yeah, we were able to find their personal information, including their email, nickname, whatever. But it gets better, because these things are also interconnected. You had not only their user ID, but also the chat ID and their document ID, and these things ultimately linked up together and allowed you to traverse the entire system, right? Not good. So what's the fix for that? There was a really comprehensive talk literally right before this, sorry for the folks who missed it, but this is the basic fix, right? You need to think about how you authenticate but also authorize the request. It's really two checks. First, make sure your token is valid. Good job, team.
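The two checks can be sketched in Python. This is a minimal illustration under stated assumptions, not the affected company's code: the in-memory `USERS`, `DOCUMENTS`, and `SESSIONS` stores and the function names are hypothetical stand-ins for a real database and session layer. The point is that every tool taking an ID re-checks ownership against the authenticated caller instead of betting that the ID can't be guessed.

```python
# Hypothetical in-memory stores standing in for a real database.
USERS = {
    "u_alice": {"email": "alice@example.com", "nickname": "alice"},
    "u_bob": {"email": "bob@example.com", "nickname": "bob"},
}
DOCUMENTS = {"d_1": {"owner": "u_alice", "title": "Alice's notes"}}
SESSIONS = {"tok_alice": "u_alice"}  # session token -> user id


class AuthError(Exception):
    pass


def authenticate(token: str) -> str:
    """Check 1 (authentication): is the token valid? Returns the caller's id."""
    user_id = SESSIONS.get(token)
    if user_id is None:
        raise AuthError("invalid token")
    return user_id


def get_user_info(token: str, user_id: str) -> dict:
    caller = authenticate(token)
    # Check 2 (authorization): may this caller read this record?
    if caller != user_id:
        raise AuthError("forbidden")
    return USERS[user_id]


def get_document(token: str, doc_id: str) -> dict:
    caller = authenticate(token)
    doc = DOCUMENTS.get(doc_id)
    # Same error for "missing" and "not yours", so valid IDs can't be probed.
    if doc is None or doc["owner"] != caller:
        raise AuthError("not found")
    return doc
```

A leaked ID from a demo video is then useless on its own: `get_user_info("tok_alice", "u_bob")` raises `AuthError` even though the token is valid.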
Yeah, I got that. The second check is what we see in this Supabase era with row-level security: make sure you have some sort of access control matrix somewhere that checks that the resource matches up with whoever is making the request. Okay? Super, super important: authenticate and authorize. Now, you can see this issue wasn't really around the LLM and the API server. It's really about what is happening downstream. And yeah, there are a lot of arrows in this diagram; we're going to look at all of them. The next thing to remember, as you're thinking about these tools and how you're building them, is that agents actually act like users, not API servers. When we were debugging this issue, we actually asked a bunch of Y Combinator companies why they built it this way, because clearly they can build a web app properly, right? I think as developers we have this natural pattern matching in our heads: oh yeah, this thing runs on a server, so it should be a service, and then I'm going to give it service-level permissions. But agents are actually like users, right? So everything that applies to users applies to agents too. Make sure your LLM does not make authorization decisions. That's bad; that's a red flag. Second, it should probably not act with service-level permissions. Listen to the previous talk on OAuth; that's great. And just like with users, make sure you don't just accept any input. You should sanitize it. Same with outputs, right? A lot of these are the traditional web application security things that you just need to really, really internalize for this new world. Now, that was interesting, and the second one was even better. It's not as common, but the damage is bigger.
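One way to make an agent act like a user rather than a service is to have the server, not the model, supply the caller's identity to every tool call. The sketch below is an assumption-laden illustration: `list_orders`, `ORDERS`, and `dispatch_tool_call` are hypothetical names. The LLM picks the tool and its arguments, but any identity argument it emits is discarded and replaced with the logged-in session's user, so authorization never depends on what the model says.

```python
from typing import Any, Callable

# Hypothetical per-user data that an agent tool might read.
ORDERS = {"u_alice": ["order-1"], "u_bob": ["order-2"]}


def list_orders(acting_user: str) -> list[str]:
    # The tool only sees data for the user it is acting on behalf of.
    return ORDERS.get(acting_user, [])


TOOLS: dict[str, Callable[..., Any]] = {"list_orders": list_orders}


def dispatch_tool_call(session_user: str, tool_name: str, llm_args: dict) -> Any:
    """Run a tool the LLM asked for, with the *session's* identity.

    The model chooses which tool to call, never who the caller is:
    an "acting_user" argument coming from the model is dropped and
    replaced by the authenticated session user.
    """
    if tool_name not in TOOLS:
        raise ValueError(f"unknown tool: {tool_name}")
    args = {k: v for k, v in llm_args.items() if k != "acting_user"}
    return TOOLS[tool_name](acting_user=session_user, **args)
```

A prompt-injected call such as `dispatch_tool_call("u_alice", "list_orders", {"acting_user": "u_bob"})` still returns only Alice's orders. Unexpected extra arguments raise `TypeError` rather than being silently accepted, which is a crude form of input validation.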
It's a pattern we see: there are a lot of code tools that agents use. There's an Anthropic paper here that basically talks about the distribution of which industries use Claude and how much, and there's this one outlier here. I'll zoom in for you. Yeah, so us nerds, we make up 3.4% of the workforce, but we're 37% of Claude's usage. Oh, why is that? Because we love computers and we love coding, right? So we found the value immediately. But it's not just us who use agents with coding tools. In fact, many agents generate code on demand to do things, right? Some agents just generate a calculator on demand to make a calculation. So there are a lot of these code execution sandboxes out there that are interesting. If you think about that, there's actually a critical path in your system, because you've got a tool that talks to another container. A container is arbitrary compute, and when you have arbitrary compute, many things can happen: many bad things, many good things. But let's talk about the bad things today.
So we followed the same script and got the system prompt again. The system prompt itself, great, doesn't cause any damage. But as an attacker, you always look for the things that are like, huh, that's kind of suspicious, right? Like, oh wait, it runs code and never outputs it to the user? Okay, let's output it to the user. Oh, and it should mostly run it at most once? Let's run it all the time. You basically try to invert what the system prompt is saying, because that is exactly what the developer didn't want you to do, and that is how bad actors think, right? So we figured out, oh, this thing does have a code tool. We tried running something, and it only allows me to write Python, and, you know, I love JavaScript. It also doesn't allow me to run these really dangerous function calls, okay. And it restricts which Python files it runs. That's also not good for us. But we looked at what it could do, and it had two kind of innocent permissions: write a Python file and read some files. You can do a lot with that. This is great, because what if we just looked around the file system now, right? We can read files. So we built a little tree function and had it return the entire file system tree to see what's going on. Oh my god, there's an app.py file. That's probably important. Then we looked at it: oh, it has two endpoints, write file and execute file. Ah, okay. These endpoints are hidden behind the VPC, so we cannot hit them directly. That's okay. But huh, we can write files. There's an app.py file. Huh? Let's look into that. Oh wait, that's where all the protections for their code are. So we can just overwrite the app.py file, replacing all the security checks with empty strings. And whoopsie, we got in. So now we can Bitcoin mine all day. That's great, right? Yeah. No, it gets much worse.
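The exploit worked because the file-writing tool could reach the server's own `app.py`. A path check like the one below at least confines the agent's writes to a workspace directory. It is a sketch under assumptions: `safe_write` and `PathEscapeError` are hypothetical names, and it needs Python 3.9+ for `Path.is_relative_to`. It is defense in depth only; as the talk argues, the real fix is running untrusted code in an isolated sandbox where the control plane does not share a filesystem with it at all.

```python
from pathlib import Path


class PathEscapeError(Exception):
    """Raised when a requested path would land outside the workspace."""


def safe_write(workspace: Path, relative_path: str, content: str) -> Path:
    """Write a file on the agent's behalf, refusing paths outside `workspace`."""
    root = workspace.resolve()
    # Joining then resolving collapses ".." segments and follows symlinks;
    # an absolute path such as "/srv/app.py" replaces `root` on join and
    # therefore also fails the containment check below.
    target = (root / relative_path).resolve()
    if not target.is_relative_to(root):
        raise PathEscapeError(f"refusing to write outside workspace: {relative_path!r}")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content)
    return target
```

With this check, a request like `safe_write(workspace, "../app.py", "")` is rejected before anything is written, while ordinary scratch files inside the workspace still work.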
The thing with arbitrary code execution, once you're inside a container, is that you can do many things. There's this thing called service endpoint discovery, or metadata discovery. You all heard of that? No? Okay. It basically allows you to discover what other devices and resources are on the network. You can also just fetch the service token and see what's going on, what the project name is, and you start looking around: oh okay, I can also fetch the scopes, so I can do many things with this token. That's awesome. Who has really, really spent time configuring service-level tokens and their permissions in a granular manner, does it all the time, and never gets something wrong? Okay, one guy. One guy there. Okay. Whoops, we have access to all their customer data. We just queried BigQuery, which has a great interface for that. Isn't that cool? So yeah, setting up code sandboxes correctly is very hard, because you can move laterally across the infrastructure, and that is just very, very dangerous. So, kind of like "don't roll your own auth" in the web world: don't roll your own code sandboxes, please. It's very, very hard, so use an out-of-the-box solution. There are many of them. E2B is, I think, a very popular one; some folks have probably heard of it. There's one in our YC batch that I personally just genuinely really love. They have observability built in, they boot up super quickly, and what I love about them is that they have an MCP server that's easy to plug into, right? So it's just easier for your agents to work with. So please do that. Don't build, you know, your own Python app. It's not good. Trust me. That leads into a third attack vector: server-side request forgery.
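Part of what makes this lateral movement possible is that code inside the container can reach the cloud metadata endpoint (commonly `169.254.169.254`) and other private addresses on the network. As a hedged, defense-in-depth sketch, an egress check could refuse any URL that resolves to a non-public address before a tool fetches it. The helper names are hypothetical, and resolving and then connecting in two separate steps leaves a DNS-rebinding window, so this does not replace the network-level isolation a managed sandbox provides.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_blocked_address(ip: str) -> bool:
    """True for addresses a sandboxed tool should never reach."""
    addr = ipaddress.ip_address(ip)
    # Non-global covers loopback, RFC 1918 private ranges (VPC neighbours)
    # and link-local 169.254.0.0/16, where cloud metadata services live.
    # Multicast is rejected explicitly as well, to be safe.
    return not addr.is_global or addr.is_multicast


def check_egress(url: str) -> None:
    """Raise PermissionError unless every address the host resolves to is public."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise PermissionError(f"unsupported URL: {url}")
    # Check every resolved address, not just the first one.
    for _family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(parsed.hostname, None):
        ip = sockaddr[0]
        if is_blocked_address(ip):
            raise PermissionError(f"blocked non-public address {ip} for {url}")
```

Scoping the service token tightly still matters: even if a check like this is bypassed, a token with minimal scopes limits what an attacker can query.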
It's a very long word, and it really bugs me that "SSRF" didn't fit on the previous line. This really triggers me. Yeah, I know. So this is what happens when you can get a tool to call an endpoint that the service itself didn't intend you to call, and you can pull out a lot of information just through that workflow. Let me give you an example. Here's the extracted system prompt. Great. Oh, this thing can create databases. That sounds exciting. Then you look into it, and it's like, huh, it pulls the database schema from a private GitHub repository. Isn't that great? That means whatever request goes to that private GitHub repository must carry the Git credentials, right? Otherwise, how could it pull from a private repository? And the repository is just a string. So I guess I can put in whatever string I want and coerce it into providing the credentials. So let's set up a badactor.com/test.git repo and just see what credentials come through. And yep, it comes across with the Git credentials. Now you can take those Git credentials and just download their entire codebase that was behind a private repo. Isn't that crazy? Yeah. I mean, it's awesome for me to do this, right? You get paid to do this. Come on. It's amazing. Now, we told our batchmates immediately, and they told us, "Don't worry, bro, it's already fixed." It's okay, guys: that company is secure, if you're a VC listening in. But with that, it is really important to think about the implications of what your system is doing, right? I love vibe coding, not going to lie, but you've got to really think about where all these arrows are and whether you've configured those things correctly. So always sanitize your inputs and outputs. This could be a web dev conference from 20 years ago, but it applies to agents too, right?
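For the schema-import case, the underlying rule is that credentials are attached only after the target has been checked against an allowlist, never to whatever string the model supplies. The sketch below assumes a hypothetical allowlist entry (`github.com/acme-corp/db-schemas`) and hypothetical helper names, and needs Python 3.9+ for `str.removesuffix`. It rejects non-HTTPS URLs, embedded userinfo tricks such as `github.com@badactor.com`, unexpected ports, and any repository not explicitly listed.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: the only repositories schemas may be pulled from.
ALLOWED_REPOS = {("github.com", "acme-corp", "db-schemas")}


def repo_allowed(repo_url: str) -> bool:
    parsed = urlparse(repo_url)
    # Require HTTPS and no "user@host" or "user:pass@host" component.
    if parsed.scheme != "https" or parsed.username or parsed.password:
        return False
    if parsed.port not in (None, 443):
        return False
    parts = parsed.path.strip("/").removesuffix(".git").split("/")
    if len(parts) != 2:
        return False
    owner, repo = parts
    return ((parsed.hostname or "").lower(), owner, repo) in ALLOWED_REPOS


def credentials_for(repo_url: str, token: str) -> str:
    """Hand out the Git token only for allowlisted repositories."""
    if not repo_allowed(repo_url):
        raise PermissionError(f"repository not allowed: {repo_url}")
    return token
```

An attacker-supplied `https://badactor.com/test.git` never receives the token, because the allowlist check runs before any credentials are attached to a request.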
Like, we just need to make sure we carry the good security practices that we have, hopefully, learned to love over the years forward into a new technology paradigm. Ultimately, I want you to take away three things. First, agent security is bigger than just LLM security. Make sure you understand how these threat vectors apply inside your overall system. Second, treat agents as users, and that applies to authentication, to sanitization of user inputs, and to many other things. And last, definitely don't roll your own code sandbox. That is just so dangerous, and it very quickly turns from an intern project into a nightmare. So be very, very careful with that. These are the most basic issues that we've seen come across, right? There are obviously many more security issues. If you don't know exactly what your agent security posture is, you can go to casco.com and book a demo with us. We built an AI agent that actively attacks other AI agents and tells you where they break. Isn't that great? And yeah, feel free to connect with me on LinkedIn or on Twitter; every now and then I have some good stuff to post. Yeah. [Applause] Awesome. Thanks, Rene. Does anyone have any questions? We have time for one or two quick questions if you're game for it. Sure. How do you get the system prompts? There are a lot of openly available techniques. The best one that I've seen is from hiddenlayer.com. Have you guys checked those guys out? They have a great blog post on the Policy Puppetry attack. Yeah, it's great. Very cool. Cool. Awesome. Oh yeah. How do you make sure, when there are so many creative ways? Yeah. Are you talking about it locally or server side? Yeah. Yeah. Yeah. No, very much so.
So locally, I think right now the industry is either you go full YOLO mode or you ask every time, right? I'm not joking: Cursor's setting is literally called YOLO mode, right? And on the server side, use a code sandbox, because ultimately they have constraints around the internal networks, but also constraints around how long they can live as a sandbox. Yeah. Okay. Sandboxes that use... yeah, so they typically use something called Firecracker under the hood, which is a better isolation layer. Yeah. If you just use containers, by the way, that's not an isolation layer, in case anybody's wondering. Yeah. Yeah. Don't use containers for isolation. Yeah. [Music]