How we hacked YC Spring 2025 batch’s AI agents — Rene Brandel, Casco

Channel: aiDotEngineer
Published at: 2025-07-30
YouTube video id: kv-QAuKWllQ
Source: https://www.youtube.com/watch?v=kv-QAuKWllQ
[Music]
So, yeah. Who's ready to hack some
agents? Yeah. Oh, wow. All right. So,
let me first introduce myself a little
bit. I'm Rene. I'm the CEO of Casco.
We're a YC company, and we specialize in
red teaming AI agents and apps.
I spent my previous time at
AWS working on AI agents, but I've
always really loved working on AI. In
fact, there's a video of me 10 years ago
building voice to code and I won
Europe's largest hackathon by doing
that. And so I would talk to it, say,
build me a blog post and it would
generate the sites. And it was actually
it was kind of fun. Like it did uh
things like um yeah, load in pictures
from San Francisco. And you can see how
horribly slow the APIs were back then.
And I'm about to give you a
nightmare by showing you the
architecture diagram of that thing. Um,
but yeah, it kind of did the job. And
this was like 10 years ago. Obviously,
back then there was no generative AI and these
things were extremely difficult to do.
Um, but it really gave me a
glimpse of what the future could look
like even back then as technology gets
better, right? So, obviously many things
have changed. Two months ago, I quit AWS
and worked out of uh the garage with my
co-founder and uh we got into Y
Combinator. So, yay. That's awesome. And
so from there, we also looked into how
else have things evolved. Well, this was
my um architecture diagram from back
then. You can see there were three
different cloud providers including IBM
Watson, which was like at the forefront at the
time. That's true. And uh before it was
like uh Microsoft LUIS, which was like
some natural language understanding
things. And you can see it was just a
lot of like piecing things together and
that was already kind of difficult to
do. But nowadays we see the stacks
normalized significantly more. Right? I
think this is probably what the average
agent stack looks like these days. Got
some server front end. You talk to an
API server that talks to an LLM, connects
up with tools and then you have a bunch
of data sources associated to it. So
this kind of normalization of agent
stack is actually really good. It like
makes many things easier. Definitely
better than my hackathon project 10 years
ago. Um but we need to think about the
security posture around these systems
and my general impression over the
last few years is that the primary
discussions around LLM security are really
like, hey, can you do prompt
injection? Can you get it to produce harmful
content? um which is all really
important but the reality with security
is you need to look at all the different
arrows in your system and that is
typically where real damage happens
right and so this is really agent
security and that is what I want to talk
about today now one thing is like why
did we even hack a bunch of agents it's
kind of a weird thing to do um the
answer is quite frankly you know we
wanted to launch internally at Y
Combinator and we wanted a splashy
headline and so we're like uh oh what do
we do and fun fact we have the second
highest upvoted launch post inside Y
Combinator of all time so higher than
Rippling yes okay um so we basically
took this approach at the time:
we looked at, oh, which agents are
already live and then let's just set a
timer for 30 minutes we don't want to
waste too much time on this and then you
know let's let's figure out what their
system prompts are and just kind of
understand how they're working and I
had a feeling when I was creating this
meme that this could be true, and it
turns out it is true. And then we looked
at, oh, what kind of tool definitions do
they have, right? Like, you know, what
is it supposed to do? Is it supposed to
access data, supposed to run code,
right? And then we just uh tried to
exploit them and see what's going
on. Uh, and it was really fun because
out of 16 agents that were
launched, with 30 minutes each, we
hacked seven of
them. And there are three common issues
we see across all of them. So I
hope that we will all learn today what
the most common issues are so you don't
make the same mistakes and also this is
going to be the best investment if
you're a VC: this batch, because they're all
secure now. So first issue: cross-user data
access. I mean you guys were just here at
the OAuth talk, you know where this is going
to head, right? um so we first leaked
this company's system prompt and we saw
huh, it has a bunch of interesting tools
attached to it including looking up user
info by ID (suspicious), uh, document by ID
and a bunch of other things and then you
know like when you see this you just
want to like oh yeah there's this thing
called IDOR like insecure direct object
reference. It's basically when you make
a request and you validate that, hey,
the token is valid and you just let the
request through, right? And you're kind
of betting on the fact that the ID
cannot be guessed. Well, that's
obviously not good. Um, so yeah, we
looked up a product demo video that they
recorded and we found the user ID in the
URL bar and just like tried to plug it
in. Uh, this is a different ID, by the
way. Don't worry guys, this is my
co-founder's ID now. And uh yeah, we
were able to find their personal
information including their email,
nickname, whatever. Um but it gets
better because these things are also
interconnected. So you had not only
their user ID, but you also had like oh
the chat ID. Oh, and their document ID
and then these things ultimately linked
up together and allowed you to traverse
the entire system, right?
It's not good. So what's the fix for
that? There was a really comprehensive
talk literally right before this. Sorry
for the folks that missed it, but this
is the basic fix for it, right? You need
to think about how do you authenticate
but also authorize the request. It's
really two checks, right? Make sure
your token is valid. Good job team.
Yeah, I got that. And then the second
thing is like this is what we see in
this Supabase era with row-level
security. Just make sure that you have
some sort of access control matrix
somewhere that checks that it matches up
with whoever is making the request.
Okay, super super important.
authenticate and authorize.
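
The two checks described here can be sketched roughly as follows. This is a minimal illustration, not the code of any company mentioned in the talk: the `SESSIONS` and `DOCUMENTS` stores, the `Document` type, and the function names are all hypothetical stand-ins for a real session store and database. The point is only that a valid token (authentication) is checked separately from whether that caller may read that specific object (authorization).

```python
from dataclasses import dataclass


class AuthError(Exception):
    """Raised when a request fails authentication or authorization."""


@dataclass(frozen=True)
class Document:
    doc_id: str
    owner_id: str
    body: str


# Hypothetical in-memory stores standing in for a session store and a database.
SESSIONS = {"token-alice": "alice", "token-bob": "bob"}
DOCUMENTS = {
    "doc-1": Document("doc-1", "alice", "Alice's notes"),
    "doc-2": Document("doc-2", "bob", "Bob's notes"),
}


def authenticate(token: str) -> str:
    """Check 1: is the token valid? Returns the caller's user ID."""
    user_id = SESSIONS.get(token)
    if user_id is None:
        raise AuthError("invalid token")
    return user_id


def authorize(user_id: str, doc: Document) -> None:
    """Check 2: may this caller access this specific object?"""
    if doc.owner_id != user_id:
        raise AuthError("not allowed to access this document")


def get_document(token: str, doc_id: str) -> Document:
    user_id = authenticate(token)
    doc = DOCUMENTS.get(doc_id)
    if doc is None:
        # Same error as a failed ownership check, so the response
        # does not reveal which IDs exist.
        raise AuthError("not allowed to access this document")
    authorize(user_id, doc)
    return doc
```

With only the first check, Bob's valid token plus a guessed `doc-1` would return Alice's document; with both checks, that request raises `AuthError`.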
Now you can see this was actually, you
know, an issue that was kind of there,
right? It's it's not like around the LLM
and the API server. It's really what is
happening downstream. And um yeah,
there's a lot of arrows in this diagram.
We're going to look at all of them. So
the next thing is to remember as you're
thinking about these tools and how
you're building it, like agents actually
act like users um not API servers. When
we were like debugging this issue like
we actually asked a bunch of Y
combinator companies like why did you
build it this way because clearly they
can build a web app properly right but
it's just like I think as developers we
have this natural pattern matching in
our heads it's like oh yeah this thing
runs on a server so it should be like a
service and then I'm going to give it
service level permissions but actually
agents are like users right so
everything that applies to users applies
to agents too so make sure that you know
your LLM should probably not determine
the authorization pattern. That's
bad. That's a red flag. Uh second thing
is it should probably not act with
service level permissions. Listen to the
previous talk on OAuth. That's great. Um
and then just like users, you should
make sure you uh don't just accept any
input. Should sanitize them. Same with
outputs, right? A lot of these are like
the traditional web application security
things that you just need to like really
really internalize for this new world.
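
One way to read "agents are like users" in code is sketched below, under stated assumptions: the `DOCS` store, `make_read_document_tool`, and `run_tool_call` are hypothetical names, not part of any agent framework. The tool is bound to the user ID from the authenticated session, so the model never chooses whose data is read, and the model's arguments are validated like any other untrusted input.

```python
from typing import Callable

# Hypothetical data: document ID -> (owner user ID, text).
DOCS = {"doc-1": ("alice", "Alice's notes"), "doc-2": ("bob", "Bob's notes")}


def make_read_document_tool(session_user_id: str) -> Callable[[str], str]:
    """Build a tool bound to the logged-in user.

    The user ID comes from the authenticated session, never from
    arguments the model generates.
    """

    def read_document(doc_id: str) -> str:
        record = DOCS.get(doc_id)
        # Same response for "missing" and "not yours", so the tool
        # does not confirm which IDs exist.
        if record is None or record[0] != session_user_id:
            return "error: document not found"
        return record[1]

    return read_document


def run_tool_call(tool: Callable[[str], str], model_args: dict) -> str:
    """Treat model-generated arguments like untrusted user input."""
    doc_id = model_args.get("doc_id")
    if not isinstance(doc_id, str) or not doc_id.startswith("doc-"):
        return "error: invalid arguments"
    # Any other keys the model supplies (for example a user_id) are ignored.
    return tool(doc_id)
```

Even if a prompt injection convinces the model to pass `{"doc_id": "doc-2"}` or to add a `user_id` argument, a tool built for Alice's session still only returns Alice's documents.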
Now that was interesting. And so the
second one was even better. Um so this
is not as common but the damage is
bigger. So the pattern we see is this:
there are a lot of code tools that
agents use and there's
an Anthropic paper here. It
basically talks about what's the
distribution of which industry and how
much do they use Claude and there's like
this one outlier here. I'll zoom it in
for you. Um yeah so us nerds we make up
3.4% of the world but we're 37% of
Claude's usage. Oh, why is that? Because
we love computers and we love coding,
right? And so we found immediately the
value of it. But it's not just us that
use agents with coding tools. In fact,
many agents create code on demand to do
some things, right? Like some agents
just generate a calculator on demand to
make a calculation, right? And so
there's a lot of these code execution
sandboxes out there that are
interesting. And so if you if you think
about that there's actually a critical
path in your system because you've got a
tool that talks to another container. A
container is arbitrary compute and when
you have arbitrary compute many things
can happen many bad things many good
things right but let's talk about the
bad things today. So we ran the same
script, extracted the system prompt again. The
system prompt itself, great, I mean
doesn't cause any damage but as an
attacker you always think about
the things that are like huh that's
kind of suspicious right it's like oh
wait it runs code and never
outputs it to the user okay let's
output it to the user oh yeah and
run it at most once?
let's run it all the time and so you try
to basically invert what the system
prompt is saying because that is exactly
what the developer didn't want you to do
and that is how bad actors think right
so we figured out oh this thing does
have a code tool and so you know we
tried we tried running something it's
like ah it only allows me to write
Python and you know I love JavaScript
and um yeah and doesn't allow me to run
these really dangerous you know function
calls okay and it restricts like which
Python files to run that's also not good
so yeah but we looked at what it could
do and it had two kind of innocent
permissions
write a Python file and read some files.
You can do a lot with that. This is
great because what if we just looked
around the file system now, right? We
can read files. So, we looked at build
me a little tree functionality and you
know, return me the entire file system
tree to see what's going on. Oh my god,
there's an app.py file. That's probably
important. Um, and then we looked at,
oh, it has two endpoints, write file and
execute file. Ah, okay. These endpoints
are hidden behind the VPC. So, we cannot
hit it directly. That's okay. Um, but
huh, we can write files. Huh,
there's an app.py file. Huh?
Let's look into that. Oh, wait. That's
where all the protections are for their
code. Uh, and so we can just overwrite
the app.py file with empty strings
around all the security checks. And
whoopsie, we got in. So now we can
Bitcoin mine all day. That's great,
right? Yeah. No, it gets much worse. So
the thing with arbitrary code execution
once you're inside a container is that
you can do many things like um there's
this thing called service endpoint
discovery, metadata discovery. You all
heard of that? No. Okay. It basically
allows you to discover what other
devices are
on the network, what other resources are
there on the network. And uh you can
also just you know fetch the user token
uh sorry the service token you know just
see what's going on what's the project
name yeah you know and you start looking
around it's like oh okay yeah okay I
can also fetch the scopes so I can
do many things with this token that's
awesome um who has really really spent
time configuring service level tokens
and their permissions in a granular
manner and does it all the time and
never gets something wrong?
Okay, one guy. One guy there. Okay.
Whoopsie, we have access to all their
customer data. So uh we just
queried BigQuery which has a great
interface for that. Isn't that cool?
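
The app.py overwrite described earlier depended on a write-file capability that could reach files outside the job's own directory. As a rough sketch only, here is what confining a write tool to a workspace directory can look like in Python 3.9+; `resolve_in_workspace`, `write_file`, and `PathNotAllowed` are hypothetical names. This is one narrow check and not a sandbox: code that is executed inside the same container can still write wherever the process is allowed to, which is why the check does not replace real isolation.

```python
from pathlib import Path


class PathNotAllowed(Exception):
    """Raised when a requested path escapes the workspace."""


def resolve_in_workspace(workspace: Path, requested: str) -> Path:
    """Resolve `requested` and refuse anything outside `workspace`."""
    root = workspace.resolve()
    # resolve() collapses ".." and follows symlinks before the check.
    # An absolute `requested` replaces `root` entirely and is then rejected.
    target = (root / requested).resolve()
    if not target.is_relative_to(root):  # Python 3.9+
        raise PathNotAllowed(requested)
    return target


def write_file(workspace: Path, requested: str, content: str) -> Path:
    """Write `content` to a path that must stay inside `workspace`."""
    target = resolve_in_workspace(workspace, requested)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content)
    return target
```

A request for `job/main.py` lands inside the workspace, while `../app.py` or an absolute path raises `PathNotAllowed` before anything is written.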
Yeah. So yeah, getting code
sandboxes right is very hard because
you can move laterally across the
infrastructure and that is just very
very dangerous. Okay. And so kind of
like don't roll your own auth in the web
world, don't roll your own code
sandboxes please. Like it's just
very hard. It's very very hard and so
use an out-of-the-box solution. There are
many of them. E2B is I think a very
popular one. Some some folks probably
heard of it. Uh there's one in our YC
batch that I personally just genuinely
really love. They have observability
built in. They boot up super quickly.
And what I love about them is they have
an MCP server that just is easy to plug
into, right? So just easier for your
agents to work with. So please do that.
Don't do, you know, your own Python app.
Um it's not good. Trust me. Um, so that
leads into the third attack
vector: server-side request
forgery.
It's a very long word and it really bugs
me that the SSRF didn't fit on the
previous line. This really triggers me.
Um, yeah, I know. So um this is what
happens when you can kind
of coerce a tool to call another endpoint
that the
service itself didn't intend you to call
and you can pull out a lot of
information just through that workflow.
So let me give you an example. So,
same as before: extracted the system prompt.
Great. Oh this thing can create
databases. That sounds exciting. Um, and
then you look into it, it's like, huh,
it pulls the database schema from a
private GitHub repository.
Isn't that great? That means whatever
request goes to that private GitHub
repository must have the Git
credentials, right? Otherwise, how can
it pull that from a private repository?
So, um, yeah, and it's just a string.
So, I guess I can just put in whatever
string I want and coerce it into
providing that. So, let's set up a
badactor.com/test.git repo and just
see what credentials come through and
yep it comes across with the git
credentials and so now you can actually
take those git credentials and just
download their entire codebase that was
behind a private repo. Isn't that crazy?
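
A common mitigation for this kind of request forgery is to check the destination against an allowlist before any credentials are attached. The sketch below is an assumption-laden illustration, not the affected company's fix: `ALLOWED_HOST`, `ALLOWED_PATH_PREFIX`, and `is_allowed_repo_url` are hypothetical, and `example-org` is a placeholder organization name.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: the only host and repository path prefix
# this tool is ever supposed to fetch from.
ALLOWED_HOST = "github.com"
ALLOWED_PATH_PREFIX = "/example-org/"


def is_allowed_repo_url(url: str) -> bool:
    """Return True only for HTTPS URLs on the allowlisted host and org."""
    try:
        parts = urlsplit(url)
        port = parts.port  # raises ValueError on a malformed port
    except ValueError:
        return False
    if parts.scheme != "https":
        return False
    # Reject embedded credentials such as https://user:pass@host/...
    if parts.username is not None or parts.password is not None:
        return False
    if port not in (None, 443):
        return False
    # Exact match on the parsed hostname, not a substring test.
    if parts.hostname != ALLOWED_HOST:
        return False
    # Refuse ".." segments that could step out of the allowed prefix.
    if ".." in parts.path.split("/"):
        return False
    return parts.path.startswith(ALLOWED_PATH_PREFIX)
```

The caller would attach Git credentials only after this returns `True`, and would also need to refuse redirects to other hosts, which this sketch does not cover. A URL like `https://badactor.com/test.git` fails the hostname check, so no credentials would be sent to it.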
Isn't that crazy? Yeah. This is I mean
it's awesome for me to do this, right?
It's like you get paid to do this. Come
on. It's amazing. Now um we told our
batchmates immediately and they told us
don't worry bro it's already fixed. It's
okay guys. That company's secure if
you're a VC listening in. Um, so but
with that though it is really important
to think about the implications of what
your system is doing, right? I I love
vibe coding, not going to lie, but like
you got to really think about where all
these arrows are and if you've
configured those things
correctly. So with that, always sanitize
your inputs and outputs. This could be
like a webdev conference from 20 years
ago. Um, but but it applies to agents
too, right? Like we just need to make
sure we keep those good security
practices that we have learned
to love hopefully over the years and take
them forward to a new technology paradigm
and then ultimately I want you to take
away three things. So first thing is
agent security is bigger than just LLM
security. Make sure you understand how
these threat vectors apply inside your
overall system. Second thing is treat
agents as users and that applies to
authentication to sanitization of user
inputs and many of the other things. And
last thing definitely don't roll your
own code sandbox. That is just so
dangerous and you know it very
quickly turns from like an intern
project into like a nightmare. So be
very very careful with that. And these
are the most basic ones that we've seen
come across, right? There's obviously
many more security issues. And if you
don't know exactly how your agent
security posture is, you can go to
casco.com. You can book a demo with us.
We built an AI agent that actively
attacks other AI agents and tells you
where they break. Isn't that great? Um,
and yeah, feel free to connect with me
on LinkedIn or on Twitter and I have uh
every now and then some good stuff to
post. Yeah.
[Applause]
Awesome. Thanks, Rene. Does anyone have
any questions? We have time for like
one or two quick questions if you're if
you're game for it. Sure.
Um how do I look at system prompts?
There's a lot of just like open
techniques. The best one that I've
seen is uh from hiddenlayer.com. Have
you guys checked those guys out?
They have a great blog post on like um
it's the Policy Puppetry attack. Yeah,
it's great.
Very cool. Cool. Awesome. Oh yeah.
How do you make sure
because
how do you make sure that
there's so many creative ways?
Yeah. Are you talking about it locally
or server side?
Yeah.
Yeah.
Yeah. No, very much so. So locally uh I
think right now the industry is either
you go full yolo mode or you ask every
time right um I mean I'm not joking
Cursor's thing is called YOLO mode right
um and then on server side use a code
sandbox because ultimately they have
constraints uh around the internal
networks but also they have constraints
around um how long they can live as a
sandbox. Yeah. Okay. Sandboxes that use
um yeah so they typically use
something called Firecracker under the
hood which is a better isolation layer.
Yeah. Uh, if you just use containers, by
the way, that's not an isolation layer
in case anybody's wondering. Yeah. Yeah.
Don't use containers for isolation.
Yeah.
[Music]