How to Secure Agents using OAuth — Jared Hanson (Keycard, Passport.js)
Channel: aiDotEngineer
Published at: 2025-07-30
YouTube video id: blmAkayzE8M
Source: https://www.youtube.com/watch?v=blmAkayzE8M
Thanks a lot everyone. Thanks for coming out. We're going to talk about a topic that I consider one of the most important for what we're doing with AI and agents, which is how to secure agents using OAuth. I'm Jared Hanson. I'm the co-founder of a new company called Keycard, where we're building an identity and access management platform for AI and agents. I'm also the creator of Passport.js (for any of the Node developers in the audience, a very popular auth framework), and previously I was at Auth0, where I built a lot of their core identity infrastructure, and then at Okta. Let's get into it. I think we're all super excited about what's happening with LLMs and AI-powered applications. We can bring these things into our daily lives and they automate a lot of tasks for us, and simply put, agents that are more connected are more useful. So let's connect these agents to more systems. But hold on a second, because today we face an impossible choice: we can give agents broad-based access and accept the security risks, or we can limit their capabilities and sacrifice business value. This is exemplified pretty well in how we set up MCP servers today, which is that we go get API keys that are typically long-lived and broadly scoped, we paste them into configuration files and environment variables, and we let our agents run with them. If we continue this pattern for hundreds or thousands of agents, we've got a pretty big security problem on our hands. Luckily, we know how to fix this. We know how to transition away from static secrets to dynamic access using OAuth. Now, show of hands, how many people are familiar with OAuth? Quite a bit, so I'll move through this quickly. But as a quick introduction, I'm not going to lie to anyone: OAuth is a relatively complicated protocol, especially when you consider all the extensions.
But the principles behind it are fairly straightforward and easy to understand. OAuth is a protocol for applications, which we call clients, to request access to APIs, which we call resource servers, and these requests are mediated by what's known as an authorization server. If you've ever used something like Calendly and connected it to your Google Calendar, you've experienced OAuth in the real world. What's happening there is that Calendly sends a request over to Google saying, "Hey, I'd like access to this person's Google Calendar." Google's authorization server then ensures that you're logged in and prompts you for consent that you want this access to occur. If you agree, Google sends what's known as an access token over to Calendly, and Calendly can take that access token and go about accessing your calendar. There are a few other interesting bits going on here, like refresh tokens, which basically allow these access tokens to be short-lived and rotated pretty quickly while still maintaining the authorized connection. In OAuth, we call these flows that involve user delegation authorization code flows, and they typically happen via the browser-based interfaces you've seen when you've used these types of applications. Now, one thing that gets kind of confusing for people is that OAuth is oftentimes used to implement things like sign-in with Google or sign-in with Facebook. This is confusing because we refer to OAuth as an authorization protocol, or a delegated authorization protocol specifically. So what's going on when we use it for sign-in? Well, this is really just a special case where the API gets replaced with a user info API that returns claims about the user who logged in: their ID, their name, their email address, etc. We kind of use authorization to back our way into authentication.
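To make the first step of the authorization code flow concrete, here's a minimal sketch of the consent URL a client like Calendly would redirect the user to. The endpoint, client ID, redirect URI, and scope are hypothetical placeholders, not real Google values.

```python
# Build the consent URL for the redirect step of the authorization code flow.
# All endpoints and identifiers below are hypothetical placeholders.
import secrets
import urllib.parse

AUTHORIZE_ENDPOINT = "https://auth.example.com/authorize"  # assumed endpoint

def build_authorization_url(client_id: str, redirect_uri: str, scope: str):
    """Return (url, state) for the authorization code flow redirect."""
    state = secrets.token_urlsafe(16)  # anti-CSRF value; check it on the callback
    query = urllib.parse.urlencode({
        "response_type": "code",   # ask for an authorization code
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
    })
    return f"{AUTHORIZE_ENDPOINT}?{query}", state

url, state = build_authorization_url(
    "calendly-client", "https://calendly.example.com/callback", "calendar.read")
print(url)
```

After the user consents, the authorization server redirects back with a short-lived `code`, which the client posts to the token endpoint in exchange for the access token and, typically, a refresh token.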
Using OAuth for sign-in became such a common pattern that it got formally standardized as OpenID Connect, which is just an identity layer on top of OAuth that standardizes the response format of that user info API. It also does a couple of things that are kind of confusing, like introducing more terminology, which identity people are prone to do. In the scope of OpenID Connect, we call the authorization server an identity provider, and applications are known as relying parties. Don't get hung up on the terminology; it's all the same thing. One other thing OpenID Connect does is introduce an ID token. This is simply a JSON Web Token, a cryptographically signed statement about who the user is. It overlaps a lot with the user info API; you can think of it as an optimization, in that the application can verify it itself without making API requests. It also serves some functions in ongoing session management between applications and authorization servers, but that's beyond the scope of this introductory material. In the real world, these things get deployed together. We'll typically run authorization and authentication flows inline, so that we know who the user is that logged in as well as get access to things like their Google Calendar. One thing that's important to call out here is that there are three roles in OAuth. The client and the resource server are relatively straightforward; we understand them from client-server architectures. The client requests resources and the resource server responds with the data. What's different is that we introduce this authorization server in the middle that mediates access, and it mediates access by issuing tokens: it issues tokens back to the client, which holds them and presents them to a resource server, and the resource server's job is to verify those tokens.
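To make the ID token idea concrete, here's a minimal sketch of minting and verifying a JWT. Real OpenID Connect providers almost always sign with asymmetric keys (RS256/ES256) published as a JWKS, so the relying party only needs the public key; HS256 with a shared secret is used here purely to keep the sketch self-contained.

```python
# Minimal JWT mint-and-verify sketch. HS256 with a shared secret is a
# stand-in; real ID tokens are usually RS256/ES256 signed by the provider.
import base64, hashlib, hmac, json

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_id_token(claims: dict, secret: bytes) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

def verify_id_token(token: str, secret: bytes) -> dict:
    header, payload, sig = token.split(".")
    signing_input = f"{header}.{payload}".encode()
    expected = b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    return json.loads(base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4)))

secret = b"shared-secret"
token = sign_id_token({"sub": "user-123", "email": "a@example.com"}, secret)
print(verify_id_token(token, secret)["sub"])  # prints user-123
```

This is exactly the division of labor described above: the authorization server does the signing, and the application's only job is signature verification plus reading claims.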
Now, what's the benefit of this model? The main benefit flows to the APIs. They don't have to care about anything to do with authentication anymore: verifying user passwords, doing step-up authentication, running consent flows. They hand all of that off to the authorization server, and it gets abstracted away by the token, which the API can verify to know what has happened. There are also benefits like centralizing policy and deploying ecosystems of apps and APIs all protected from a central location, building out the ecosystems we all know today. So how do we apply this to MCP and agents in particular? Well, it should be pretty simple. Our applications get replaced by a chatbot or agent like Claude that we want to connect to MCP servers. The MCP clients and MCP servers should get authorized via OAuth by a controlling authorization server in the middle. This should be pretty simple, right? Well, nothing with OAuth is ever so simple. So let's take a look at the state of authorization in MCP: where it started, where it is now, and where it's going in the future. The first version of MCP (it's a pretty young protocol, about seven months old to the day, I think) I like to call the no-auth version. It didn't have any authorization in it at all, which the spec admitted. It was really a way to get something out there, primarily for local MCP servers. There was some notion of remote MCP servers, but again, no authorization. But this spurred discussion; people saw the promise of MCP and started discussing how to add authorization to it. Now we have the latest draft of the specification, published in late March. I like to refer to this as OAuth, the first attempt, and for anyone who has ever done OAuth implementations, the first attempt is always pretty poor.
And that is the case with this version of the MCP specification. I don't actually recommend anyone read the authorization part of the MCP specification as it stands today, because you'll walk away with a pretty misinformed view of what OAuth is. But as a quick recap, what it says is that MCP clients have to implement the client side of OAuth. That all makes sense. Then it also says MCP servers need to implement all of OAuth 2, including authentication, token issuance, etc. Now, OAuth has three roles. Where's the third role here? What happened to the authorization server? Well, it got collapsed into the MCP server, which is a bit odd. And people started noticing this. Five days after the specification was released, a blog post went viral, this one from Christian Posta, saying the MCP authorization spec is a mess for the enterprise. He states that the problem is that it treats the MCP server as both a resource server and an authorization server. Aaron Parecki, who does a lot of great OAuth standards work, followed this up with another blog post that went viral, titled "Let's fix OAuth in MCP," where he noted that a bunch of the confusion was happening because the diagrams show the MCP server itself handling authorization. This culminated in a PR to the specification where people proposed fixing the problem: just shift the MCP server to be an OAuth resource server and everything will be good. It's a super interesting PR to read. There are some 400 comments on it, and it's not even the only PR on the topic, but it's an example of how people picked up on this problem and ran with it. Now, I'm not usually one to say I told you so, but all the way back in January of this year, I commented in a review of the specification: "Hey, I recommend we model MCP servers as resource servers from an OAuth perspective."
I'm not quite sure where that got lost; it didn't get picked up. But in any case, we fixed this problem, and one of the reasons I'm here is to tell us all more about the OAuth things we need to pay attention to in order to avoid this kind of problem in the future. So, in the next draft, all this feedback has been incorporated and the MCP spec is fixing its issues. The draft version of the specification models all of OAuth pretty cleanly and nicely. The OAuth authorization server is a totally separate entity, and this is really beneficial for all of you building MCP servers, because your job gets a whole lot easier: all you have to do is verify the tokens that come in over HTTP and hand off all the other responsibility to the OAuth server. So we're back to a pretty good place with respect to OAuth and MCP, in particular how we authorize connections between MCP clients and MCP servers. So let's talk about the future. If this is all we do with OAuth, we're not even scratching the surface of what we need to fully secure AI and AI interactions. So what else are we going to need? We'll go through this pretty quickly. The first is agent-to-agent communication. What we've seen with OAuth so far, as applied to MCP, is the authorization code flow, which is particularly relevant when we want end-user delegation. But there's a whole bunch of other flows in OAuth that are relevant, in particular client credentials, which applies when we want agents to communicate with other agents or other MCP servers on their own behalf, not on behalf of a user. So that's one thing to pay attention to. This begs the question of agent identity: what should we do about it? Well, if anyone's ever done OAuth development, you're probably familiar with this type of flow: you want to build an application,
you want to integrate with an API, so you go to some developer portal, create a new application, get a client ID and secret, and then somehow configure your application with those credentials. This is a bunch of friction, and it obviously won't work well for MCP, which is trying to be a standard protocol that brings together tools and agents that may not be aware of each other. You can't do that if you presuppose some sort of registration process. So what does MCP do? Well, it picks up what's known as dynamic client registration, which allows applications and agents to request credentials at runtime rather than ahead of time through manual registration. An agent says, "Hey, this is who I am, give me a client ID and secret," the server does it, and the agent goes about the rest of its OAuth flow. Now, this specification has been around for about 10 years and in practice has seen no meaningful adoption, and one of its implications is that it makes all agents anonymous, because the registration request itself is uncredentialed. That makes it hard to build trust in agents, and it's probably not super viable in my opinion. So what should we look at instead? Well, there are many cases where we just want public clients, where we don't really care about verifying identity. For those, there's an emerging specification called pushed client registration, which introduces a well-known string that identifies a public client. We can use this well-known string, skip the whole registration song and dance, and avoid the need to store the resulting state. It's a lot simpler, and it can also carry certain client metadata in the request if that's necessary. So that's something we should look at for cases where public clients apply. But what about clients that we actually want to authenticate and verify the identity of?
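The dynamic registration request mentioned above is just an uncredentialed JSON POST to the server's registration endpoint, which is exactly why every registered agent is effectively anonymous. A sketch of the exchange, with hypothetical metadata and credentials:

```python
# Shape of a dynamic client registration (RFC 7591) exchange. All values are
# hypothetical placeholders. Note that nothing in the request proves who the
# registering agent actually is: every field is self-asserted.
import json

registration_request = {
    "client_name": "Example Agent",            # self-asserted, unverified
    "redirect_uris": ["https://agent.example.com/callback"],
    "grant_types": ["authorization_code", "refresh_token"],
    "token_endpoint_auth_method": "client_secret_basic",
}

# The authorization server mints credentials at runtime and returns them
# alongside the metadata it accepted:
registration_response = {
    "client_id": "generated-client-id",
    "client_secret": "generated-client-secret",
    **registration_request,
}

print(json.dumps(registration_response, indent=2))
```

The server also has to store this state for every anonymous registrant, which is the bookkeeping that the well-known-identifier approach avoids.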
Well, my proposal here is that we should start looking at using URLs and PKI for identity. This lets us reuse the existing identifiers that people already associate with the apps they're using and repurpose them for the agent world. In practice, this looks like having a URL such as agent.com used as a client identity in OAuth flows, and then, through the magic of cryptography and key sets, we can authenticate these agents by having them sign JWT assertions or HTTP message signatures that we verify with the corresponding public keys. All right, this dovetails into agent attestation. We've connected our agents to the resources we're using, but then the agent turns around and sends all that information up to an LLM. That seems like something we should have awareness of and control over. In protected environments, we can get by treating the LLM as just another API, which often it is. That's a technique we can apply, but it has limited capabilities when we look at edge-deployed agents, such as on desktops or mobile devices, where we don't control the software environment. There's a bunch of interesting work going on in the IETF now on remote attestation and supply chain security, where we can start to attest to the state of the device and the software running on it, know what LLM our data is going to wind up in, and incorporate that into OAuth authorization flows. Next up: transactional authorization. What we've done to date in OAuth is introduce scopes. This is a whole lot better than the passwords OAuth replaced back in the day, in the sense that we can now do more fine-grained permissions, such as read versus write access.
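A sketch of what a URL-identified agent's signed assertion could look like. In the URL-plus-PKI proposal, the assertion would be signed with the agent's private key (RS256/ES256) and verified against a key set published at the agent's URL; HMAC stands in here only so the sketch runs with the standard library, and all URLs are hypothetical.

```python
# Client assertion sketch for an agent identified by a URL. HMAC is a
# stand-in for the asymmetric signature the proposal actually calls for.
import base64, hashlib, hmac, json, time, uuid

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def make_client_assertion(agent_url: str, token_endpoint: str, key: bytes) -> str:
    now = int(time.time())
    claims = {
        "iss": agent_url,          # the agent's URL is its identity
        "sub": agent_url,
        "aud": token_endpoint,     # bound to one authorization server
        "iat": now,
        "exp": now + 60,           # short-lived by design
        "jti": str(uuid.uuid4()),  # unique ID to prevent replay
    }
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    sig = b64url(hmac.new(key, f"{header}.{payload}".encode(), hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

assertion = make_client_assertion(
    "https://agent.example.com", "https://auth.example.com/token", b"demo-key")
print(assertion.count("."))  # prints 2: three dot-separated JWT segments
```

The server resolves `iss` to the agent's published keys, verifies the signature, and now has an authenticated client identity without any prior registration step.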
But in practice, scopes end up being a bit too coarse-grained for a lot of use cases, and oftentimes a bit longer-lived than we might like. Agent interactions are going to have to be increasingly transactional. Imagine use cases where you want agents to do financial or commercial transactions: we're going to want to authorize things on a per-transaction basis, potentially with specific amounts or financial budgets. So we're going to have to move toward more dynamic access. There's a proposal, actually a specification at this point, called rich authorization requests, which is worth looking into and something we can take inspiration from or adopt directly for these use cases. Next up, we have chain of custody. This is particularly interesting to me. What we've talked about with MCP really covers the first leg of this: on the left-hand side, we have authorized connections between agents and MCP servers. But what happens on the right side is completely unspecified in terms of the security profile. So how do we protect an MCP server that calls another API within the same domain? In particular, there's a technique called OAuth token exchange that I recommend everyone look into. A special case of this is MCP servers calling third-party APIs. For that, we should look into identity chaining across domains and its corresponding specification, the identity assertion grant, which lets us do cross-domain authorization in the back end. Somewhat outside the scope of OAuth is other internal infrastructure that people should be aware of as they look to deploy these agents. The culmination of this is really agent-to-agent flows. I don't know how much of this is happening in practice today, but people see the promise of it: imagine big graphs of agents talking to other agents on other servers.
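The token exchange mentioned above has a standardized shape (RFC 8693): the MCP server presents the token it received and asks the authorization server for a new token scoped to the downstream API. The parameter names and URNs below come from that spec; the token values and audience are hypothetical placeholders.

```python
# Shape of an RFC 8693 token exchange request, the chain-of-custody step an
# MCP server could use before calling a downstream API. Token values and
# the audience URL are hypothetical placeholders.
import urllib.parse

def build_token_exchange_body(subject_token: str, audience: str) -> str:
    """Form-encoded body POSTed to the authorization server's token endpoint."""
    return urllib.parse.urlencode({
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "subject_token": subject_token,   # the token the MCP server received
        "subject_token_type": "urn:ietf:params:oauth:token-type:access_token",
        "audience": audience,             # the downstream API being called
        "requested_token_type": "urn:ietf:params:oauth:token-type:access_token",
    })

body = build_token_exchange_body(
    "incoming-access-token", "https://api.internal.example.com")
print(body)
```

The newly issued token carries the downstream audience while preserving who the original subject was, which is what gives each hop in the chain an auditable record of custody.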
We're going to need end-to-end visibility as authorization flows along these graphs. Finally, async interaction. One of the key things to look at here is that OAuth typically assumes a user sitting in front of a browser, relatively static. But as we kick off flows, users might walk away while agents do work in the background. Those agents are going to need a way to reach out to the user and say, "Hey, I need a bit more access than I've been permissioned." How do we bring in more real-time interactions via channels like SMS or push notifications, rather than just browser-based flows? And then a hot topic today: there's a bunch of interesting work going on in the voice track at the conference. As AI starts to interact with us via voice and video, or completely in the background, how do we think about security there? This is really the frontier of security and interaction, but there's a lot of prior art in the various real-time communities around SIP, XMPP, and WebRTC that I think is very interesting for us all to look at. So there's a lot here. Let's go build this stuff. It's all important for us to achieve a safe and secure AI future. This is what we're building at Keycard: an identity and access management platform that lets you connect your copilots, custom agents, and third-party agents to all your apps, services, and infrastructure, all using standards-compliant protocols: A2A, MCP, and OAuth. If building this stuff is interesting to you, we are hiring, so get in touch with me. And if it's not interesting to you but you want to secure your agents, get in touch with us too; we're looking for partners that are building, so that we can work with you to secure your agents. The website is keycard.ai, and I will be around for the rest of the conference. Thanks.