A2A & MCP Workshop: Automating Business Processes with LLMs — Damien Murphy, Bench
Channel: aiDotEngineer
Published at: 2025-07-26
YouTube video id: wXVvfFMTyzY
Source: https://www.youtube.com/watch?v=wXVvfFMTyzY
Hey everybody, thanks for coming. Great to see a full room; it's always good to have a lot of people at a workshop. I'm Damien Murphy, and I'm going to be presenting A2A and MCP, two pretty hot topics in AI these days, and how you can use them to automate business processes.

A little about me: about 15 years as a full-time full-stack developer, five years doing solutions engineering as a customer-facing, forward-deployed engineer, and the last three years or so working on voice AI and AI agents. I did a workshop last year as well, on AI voice agent swarms. It's now pretty much standard that anybody can build a voice agent in five minutes, so the hard part becomes building autonomous agents that can actually do complex tasks.

I joined Bench Computing about two months ago, a pre-revenue startup backed by Sutter Hill Ventures. We're building what I'd describe as a better Manus, more focused on teams and enterprises. If you're not familiar with Manus, it's an autonomous AI agent; Bench is essentially an autonomous AI agent that can automate sub-tasks in parallel.

So, the workshop today: we're going to build a multi-agent system using A2A agents. If you're not familiar with A2A, Google released it as a protocol that allows agents to communicate over the web. We're going to integrate these agents with MCP, the Model Context Protocol. MCP is like a USB-C port for your agents, letting them consume context, tools, and resources very easily.
We're going to get these agents to work together, trigger the host agent with a webhook, and then I'll cover a little about when to use A2A versus MCP, plus prompt caching and context management.

All right, so A2A. It's not exactly clear what it's for and why it exists; if you asked everybody in the room, you'd probably get a different answer. But the key benefits: you get agent specialization. Rather than trying to make one agent do a hundred things, you can have a hundred agents each do one thing and do that one thing very well. A2A lets you handle task delegation: imagine you had a Salesforce agent and you wanted it to interact with all the Salesforce MCP tools; you could do that. You also get parallel processing, which becomes very important for speed and context management. And you can use those A2A agents to build complex workflows while keeping your main agent's context size down.

MCP, again, is a really hot topic right now. It's been coined the USB-C for AI, and there are definitely benefits to a standard interface. There are something like 10,000 MCP tools you can use today; about 7,000 of those come through the Zapier MCP. If you're not familiar with Zapier, it's a way to connect disparate systems together, and they've now released all of their Zaps as MCP servers and tools. One of the great things about MCP: no bespoke API integration. You don't have to handle each API differently. It's a plug-in architecture and an industry standard, and it's really based on LSP, the Language Server Protocol, which was how IDEs figured out how different code languages worked.
That was a great transfer of ideas over to the MCP protocol.

All right, so when should you use A2A versus MCP? Anybody?

[Audience] MCP if you want to access resources and infrastructure, then you go for MCP, but I don't know.

And that's kind of the challenge, right? What exactly are these protocols for, and should I be using them? A2A is for when you want two agents to talk, and typically two agents that are completely unrelated. It's not two agents you necessarily control; it's more likely going to be a third party's agent, or their first-party agent and your agent.

[Audience] What's the difference between agentic AI and A2A? I work a lot with systems where we have multiple agents doing the same thing. What you're describing as A2A sounds a lot like that.

So, frameworks like AutoGen let you manage multiple agents locally. A2A is more about remote agents: agents you have no knowledge of. You can think of A2A as a way to have service discoverability; once you have the endpoint to the agent, you can learn everything that agent is capable of. With something like AutoGen it's descriptive: you describe what each agent is capable of, and it's in your control.

[Audience] So to summarize, agentic AI defines the role of each agent, and A2A works remotely, where the role may or may not be defined?

Each of the A2A agents will have a definition, and we'll get into that a little later. But think of agentic AI as a superset of everything; A2A and MCP are just subsets of that, different modalities. With MCP, you're connecting to external context and tools. A lot of people don't use most of the features of MCP; they're just using the tools.
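To make the service-discoverability point concrete, here's a rough TypeScript sketch of what an A2A client does: fetch the agent's card (a JSON document the spec serves from a well-known path) and read the capabilities out of it. The field names below are an illustrative subset of the A2A card format, not the authoritative schema.

```typescript
// Minimal, illustrative shape of an A2A agent card.
interface AgentSkill {
  id: string;
  name: string;
  description: string;
}

interface AgentCard {
  name: string;
  description: string;
  url: string; // the agent's A2A endpoint
  skills: AgentSkill[];
}

function parseAgentCard(json: string): AgentCard {
  const card = JSON.parse(json) as AgentCard;
  if (!card.name || !Array.isArray(card.skills)) {
    throw new Error("not a valid agent card");
  }
  return card;
}

// Discovery is just an HTTP GET of the card from a well-known path,
// then validation. The HTTP client is injected to keep this self-contained.
async function discoverAgent(
  baseUrl: string,
  get: (url: string) => Promise<string>
): Promise<AgentCard> {
  return parseAgentCard(await get(`${baseUrl}/.well-known/agent.json`));
}

// Example card, similar to what a GitHub sub-agent might expose.
const sampleCard = parseAgentCard(
  JSON.stringify({
    name: "GitHub Agent",
    description: "Creates GitHub issues from meeting transcripts",
    url: "http://localhost:41243",
    skills: [
      {
        id: "create_issue",
        name: "Create GitHub Issue",
        description: "Files an issue in a repository",
      },
    ],
  })
);
```

Once the card is parsed, the host knows everything it will ever know about that agent; that asymmetry is the whole point of the protocol.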
But there's a lot of other stuff in MCP: prompt templates, resources, and a thing called sampling. Sampling is going to be really interesting; I think we'll see a lot more of it. It allows an MCP server to sample the host LLM: if you're using Claude and you hit an MCP server, that server may want to use the same Claude model you're using, and it can use sampling to achieve that.

When you bring the two together, you get the benefit of both: A2A is the remote interface, and MCP gives you the actual tool use and context.

Okay, so when not to use them. You'll notice a lot of memes here, and just to give you a heads-up, all the memes were generated by Bench. Actually, the whole slide deck was generated by Bench; I just gave it a markdown file and it output the deck.

If you have full control of the tools, you probably don't need MCP. If your function is local to your codebase, why create a USB-C port for it? It's like plugging in my internal hard drive with a USB cable; shouldn't I just use the hard drive that's in my machine? Calling functions directly in your codebase is super easy, easy to maintain, and faster to develop. Likewise, if you have full control of your agents, you probably don't need A2A either. If they're your agents, you can use a local function call for them to communicate. I've built multi-agent systems using MCP and using plain local function calls, and it's a lot easier to just use the code you have: faster, no protocol overhead, and a lot easier to debug.

Okay, so why do you need A2A and MCP at all? Third-party tools are probably the number one reason to use MCP.
You get access to such a large array of tools. Say you're building a product and you decide to build first-class integrations with Salesforce and Slack; what about the other 10,000 tools? Instead, you can just let people add their own MCP server, which gives you great extensibility. But there are drawbacks with MCP: you only get what you're given, and a lot of the time that's not exactly what you want. You may end up saying, you know what, I need a way to index this data so I'm not calling list-Slack-channels every time I want to post a message.

With A2A, the complexity is hidden from you. That's one of the key tenets of A2A: you don't know anything about an agent until you connect, and all of its complexity is completely opaque. You can connect to any remote A2A agent, so long as you have the credentials. We haven't seen any first-party A2A agents released yet, but Google has about 50 partners they're going to launch with, so I'd imagine there will be, say, a Salesforce A2A agent. It'll probably only come with a paid account, because it uses LLM compute, whereas MCP servers typically don't run an LLM themselves; they use the host LLM.

All right, we're going to get into the code now. If you haven't already grabbed the repo, we also have a Slack channel, workshop-a2a-mcp-2025, and in this repo there's basically everything you need to get going. The code structure: we've got a host agent, and then we've got some sub-agents.
The whole concept here is to demonstrate A2A and MCP, but in reality these sub-agents would probably live in a different repo and run on a different server. We've also got the A2A implementation, the server and the client, in the repo; these are taken directly from the A2A repo. We've got the MCP integration, which is just a client; we're not creating a server here. There's also a CLI interface, which is how things are used internally; you won't need it.

Once you've cloned the repo, you'll want to run npm install. You'll need an MCP server URL, which will be a Zapier URL, and a Gemini API key. You can get both of these for free; there's no need to sign up for a paid account. You'll also want to rename .env.example to .env.

Setting up the Zapier MCP: when you go to zapier.com/mcp, you'll have the option to create a new server, and when you go to connect you'll have a couple of transport options. We're going to use SSE. They recently released streamable HTTP, which deprecates SSE and will replace it, but there's still a litany of SSE servers out there, so I used SSE for this one. Once you do that, you'll get a server URL at the bottom; copy that URL into your .env.

Then you're going to set up a Slack and a GitHub integration. You'll want the ability to create an issue; you can put in the repository URL for the workshop, or use your own. You can let the AI choose these fields, but what I've found with AI is that it will choose something else. So a lot of the time with these MCPs you'll want to say, hey, this is the thing I want to do, and just hardcode it.
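For reference, the environment file ends up looking something like the fragment below. The variable names here are my guess at what the repo expects; check .env.example for the real ones.

```ini
# Zapier MCP server endpoint (from zapier.com/mcp, SSE transport)
MCP_SERVER_URL=https://mcp.zapier.com/api/mcp/s/XXXX/sse

# Gemini API key from AI Studio (the free tier is fine)
GEMINI_API_KEY=your-key-here
```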
If you do let it go wild in your Slack, it's going to start posting in #general and #random and #sales; a few of my bots have gone rogue that way.

The Gemini setup: you can get the API key from AI Studio, and there's a link in the slide deck if you need it. Get a free account, generate an API key, and drop that into your .env as well.

There's also a remote Bench A2A agent. The code for it is actually in the repo, but we haven't officially released our API yet, so I'm just hosting it remotely. It's a nice way to show how you'd use A2A remotely as well. So what is Bench? Bench is essentially an LLM aggregator with autonomous AI agents. You get access to Claude, Gemini, OpenAI, xAI, and loads more models. It has about 30 tools and integrations now. We actually started out with MCP integrations to Slack and Salesforce; they didn't meet our needs, so we built first-party integrations with data caching and indexing. That gives you an idea of how far MCP will get you: eventually you'll realize it doesn't do the specific thing you need.

All right, running the application: you run npm run start all, and that kicks off all the agents, so the Slack agent, the GitHub agent, and the host agent, and it also starts the webhook server and the webhook admin panel, which you can access at localhost:3000. So let's go into what each of the agents does. The host agent is essentially your central coordinator, and it may be the only agent you have in your application, using external A2A agents for everything else.
If that's the case, then everything your host does is delegated to sub-agents. The host handles all the agent discovery and brings everything together. The code for that is in the host agent directory under source/agents, and you'll notice a couple of files in there. One of them is the host agent prompt, which is just a plain-text system prompt. There's also the Genkit setup; Genkit is essentially how you hook all of your A2A code up with Gemini, and there's a Genkit MCP plugin that the sub-agents use.

Then the Slack agent: this sends a Slack message in response to the webhook transcript. The sample webhook we have here is essentially a meeting ending: you receive a transcript of that meeting, and based on it you decide what to do. If it detects any bugs, it creates a GitHub issue; if it detects any feature requests, or anything of interest, it posts that into Slack. You can imagine the kinds of automations you can build on this scenario. I actually had a version here that was hooked up to Salesforce, but there's a limitation on how many sub-agents the host agent can call, so I figured if one of them had to go, it was Salesforce, because it's probably the hardest to get an account for. But you could update an opportunity based on a sales call: you're talking to a prospect, doing your discovery, and those Salesforce fields get updated automatically. The time savings for account executives, who are probably on back-to-back calls, is actually pretty big.

This was an interesting issue I ran into.
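As an aside, the bug-versus-feature routing just described is, at its core, a dispatch decision. Here's a deterministic TypeScript sketch of the same idea; it's purely illustrative, since in the repo Gemini makes this call from the prompt rather than from hardcoded rules.

```typescript
type Target = "github" | "slack";

interface Finding {
  kind: "bug" | "feature_request" | "note";
  summary: string;
}

// Toy stand-in for the LLM's decision: bugs become GitHub issues,
// everything else of interest gets posted to Slack.
function route(finding: Finding): Target {
  return finding.kind === "bug" ? "github" : "slack";
}

const findings: Finding[] = [
  { kind: "bug", summary: "Severity misclassified during trial" },
  { kind: "feature_request", summary: "Wants Slack and GitHub integrations" },
];

const routed = findings.map((f) => ({ target: route(f), summary: f.summary }));
```

The LLM version of this is just the same mapping expressed in the system prompt, which is what lets it handle the fuzzy cases a rule can't.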
I asked one of my colleagues to test the repo out, and he was getting this weird error where it said the Slack MCP succeeded. I asked him to send me the logs, and what came back was essentially "isError": false with an empty result. Okay, that's great. It turns out not all MCPs are created equal, and the Zapier Slack MCP fails silently. The reason it failed was that he had the default Slack channel name, something like test-damien-slack, and he was in a different workspace where that channel didn't exist, so it just failed silently. I added a bit of code to detect this kind of empty text array, so it fails loudly now. But it goes to show you the limitations of MCP.

The GitHub agent is pretty straightforward; it's probably the most basic of the three or four. It just creates a GitHub issue. Super simple. But you can imagine how you'd extend it: maybe it opens a PR, maybe it actually implements the fix for the bug that was reported in the meeting. You can see how, down the line, as AI gets better, a lot of this automation is going to be driven by human interaction: speaking with people, posting messages in Slack, GitHub discussions, all triggering AI to take action.

The Bench agent can do a lot, and that was actually one of the problems I found with A2A: the more functions and capabilities an agent has, the harder it is to describe its capabilities in the agent card. The agent card is essentially the public description, to any other agent, of what that agent is capable of. So I had to really pare it back and say, look, you can do a handful of things.
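Going back to the Zapier silent failure for a moment: the guard I'd add is roughly the shape below. MCP tool results come back as a content array alongside an isError flag, and a "successful" call with no text in it gets treated as a failure. The exact check in the repo may differ; this is a sketch of the idea.

```typescript
interface McpContentItem {
  type: string;
  text?: string;
}

interface McpToolResult {
  isError?: boolean;
  content: McpContentItem[];
}

// Zapier's Slack MCP can return isError: false with an empty content
// array when, e.g., the target channel doesn't exist. Treat an empty
// text array as a failure instead of trusting the flag.
function assertMeaningfulResult(result: McpToolResult): void {
  const hasText = result.content.some(
    (item) => item.type === "text" && (item.text ?? "").trim().length > 0
  );
  if (result.isError || !hasText) {
    throw new Error("MCP tool call failed or returned an empty result");
  }
}
```

Wrapping every tool call in a check like this is cheap insurance against MCP servers that report success unconditionally.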
I know you can do more, but for now, these are the few things you can do. It's able to go off and browse the web, do research, data science, all sorts of things, but we're just going to use it for researching the company and the people in the meeting transcript.

All right, here we go. Demo gods. Before I start, any questions?

[Audience] You mentioned some limitation on the number of agents?

Yeah, the Genkit implementation that Google provides limits you to a maximum of five sub-agent calls per turn.

[Audience] Is that a hard limit?

I couldn't get around it. There was a max setting, but it didn't work. It's something I'm sure they'll fix eventually, but it was an interesting issue.

All right, let me see if my code is running. Yeah, I think it is. It should be here. Actually, I'll show you the MCP server as well while I'm here. This is the MCP Inspector; it's an open-source repo that's part of the Model Context Protocol project. Sorry, yeah, at the back?

[Audience question]

Yeah, that's actually in the agent card, so that'll be in the index.ts of the sub-agent. I'll be going through the code in a little bit as well, so you can see it.

So I'm connecting to the Zapier MCP URL that I got; I just copied it, dropped it in, and I'm going to connect over SSE. This allows you to list the tools and call the tools. And it's quite interesting that Zapier has now added "instructions" as a mandatory field on all of their MCP tools, so you don't actually need to fill out the individual fields anymore; you can just give it natural language. That suggests to me they're using an LLM on their side to figure out how to populate the fields on your behalf, which is interesting because it's going to cost them a fortune as more people adopt it.

All right, so this is the agent dashboard.
Let's just make sure everything's working. You can see a couple of previous runs here. This one is actually the run where the Slack channel wasn't found; when I was testing that, I put in a typical unknown Slack channel name, and it detected that it couldn't find it based on the heuristics. Not sure why my mouse isn't moving. There we go.

[Audience] So you've defined four agents here, all A2A agents?

Yeah, correct.

[Audience] So the maximum number of A2A agents is five?

When I got to five, that's when I got the error. So, four.

And these are the host agent logs. You can see it connecting to the different agents. This agent is just running on a little dinky EC2 instance that I spun up. It goes through, learns about the agents, processes webhooks; you don't really need to go in here unless you get a failure. The Slack agent is pretty similar: it's basically sitting there waiting for another agent to connect, and when one does, it communicates with it. And you can see here the Bench agent running remotely. The reason I don't have verbose logs for it is that it's remote; it's not under my control. The A2A logs for that agent are actually on the EC2 server, which brings up another question: how do you debug when an A2A agent fails?

Then on the webhooks page, this is the only webhook that's preconfigured. It basically explains to the agent what it's going to do when this webhook arrives: it's going to process the incoming webhook, and we have a little prompt template here that tells it what the agent capabilities are and how to analyze the payload.
And then we have the processor config, which tells it, hey, these are the agents you have access to as part of this webhook. This becomes important when you've got, say, a hundred A2A agents and you only want two of them involved. And here we have a test: just a fake transcript generated with an LLM. When we send the webhook, you can see it processing, and hopefully the demo gods will be good to me here. It does take a little bit of time: the host agent has to process it, then reach out to the sub-agents and gather all the information. The Bench agent probably takes the longest because it's doing its own subtasks as well.

Okay, we got a Slack message; that's a good sign. So: Snowflake is interested in Slack and GitHub integrations. Very cool. I don't know why my mouse keeps freezing. There we go. And we should have a GitHub issue. Here we go: during the trial, the AI misclassified the severity of the bugs; engineers need to investigate and fix the issue. So it's a really simple use case, but you can imagine that transcript being ten times longer, with a lot more information in it, and it will just work. And then we also have the Bench agent. Looks like it's still waiting for results; it's going to research the company. I think I did one before where it returned a result. Let me see. Yeah, it basically goes off, does research into Snowflake and all the participants of the call, and returns that information. This can get as complex or as simple as you want it to be. And when you're using the application and have it up and running... has anybody managed to get it up and running? Wow, impressive.
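The processor config from the demo boils down to a per-webhook allowlist of agents plus a prompt template. A sketch of that shape in TypeScript; the field names are mine, not necessarily the repo's:

```typescript
interface WebhookProcessorConfig {
  name: string;
  allowedAgents: string[]; // which A2A agents this webhook may use
  promptTemplate: string;  // how the host should analyze the payload
}

const meetingEndedProcessor: WebhookProcessorConfig = {
  name: "meeting-transcript",
  allowedAgents: ["slack-agent", "github-agent", "bench-agent"],
  promptTemplate:
    "A meeting has ended. Analyze the transcript: file GitHub issues for bugs, " +
    "post feature requests to Slack, and research the company and attendees.",
};

// With 100 registered agents, only the allowlisted ones are offered
// to the host for this particular webhook.
function selectAgents(all: string[], config: WebhookProcessorConfig): string[] {
  const allowed = new Set(config.allowedAgents);
  return all.filter((a) => allowed.has(a));
}
```

Scoping each webhook to a small allowlist also keeps the host's prompt small, since only the selected agents' cards need describing.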
[Audience] You're using the Bench agent to do the orchestration, right?

No, the Bench agent is just a third-party agent that we can leverage; the host agent is doing all the orchestration.

[Audience] So what's the actual role that agent is playing? What is it actually doing?

It's doing research on companies and people. It's just another agent, with a load of different capabilities.

[Audience] So the orchestrator isn't local?

The host, Slack, and GitHub agents are all local. Bench is in the repo, but you need an API key for it, and we're launching in about two weeks, so I just made it remote for the purposes of the demo.

[Audience] What about the host agent? Is it the Zapier agent?

No. All of these agents are A2A agents; the Slack agent and the GitHub agent have MCP tools to Slack and GitHub through Zapier. I can actually show you a diagram that might explain it a bit better. I don't know if that explains it better, but the orchestration does happen locally; everything's happening on my machine. If I go into the codebase, I have the agent logs. This is all happening here. Is that readable? Let me zoom one more. You can see the transcript came in, then it got a response from each of the sub-agents and completed them, and it did all of this in parallel as well.

Sorry, is that a question? Yes.

[Audience] In your example, which agent would handle human confirmation? Say we want a confirmation step before an action is taken; which agent handles that part? Do you create a new agent for human confirmations?

Yes, you'd need a staging area for actions.
It's not something I've built into this, and there's a lot more you could do here, but human confirmation would typically be done through a draft. You'd maybe pop up a Slack message with some action buttons, and when somebody clicks one, it communicates back, kind of like a second-pass webhook. You might need to persist state, though.

[Audience] How do you handle the security of these endpoints, with different vendors communicating between them? How do you manage the security?

As part of the A2A spec, you're going to have some sort of authentication. I've just exposed everything here; it won't exist tomorrow, so there are no security implications. But essentially you'd probably have to have a subscription with the company providing that A2A agent, because it's consuming tokens. I'm not sure exactly what A2A has planned; it's still pretty early days. MCP is a little further ahead: it has OAuth, header authentication, things like that. So imagine something similar.

[Audience] And what about governance: LLM firewalls, benchmarking, guardrails, and so on? Do you have a separate agent for that?

You'd probably manage that on something like Amazon Bedrock and just use that guardrailed LLM from behind it. You don't have to use Gemini here either.

[Audience] And the host agent is kind of like the planner. Do you see the sub-agents talking to each other?

I guess you could, but I don't know if that's the intention, because then they just become hosts. If you think about it, if you have no knowledge of the other sub-agents, how would you know to talk to them?
You'd have to become a host agent yourself and connect to that other sub-agent to do that. So I don't know if the A2A spec intends for sub-agents to communicate directly.

[Audience] With the host agent and the orchestration it's doing, is it actually managing a combination of all the context windows? Do you hit a limit quickly?

Yeah, all of the context windows, and this is something I'm going to cover now, so it's a good segue. Let me jump back to the slides.

One of the benefits of A2A, or any sub-agent framework, is that you're not consuming the tool results into your own context. If you have a load of Slack messages or GitHub issues or Salesforce opportunities and you want to analyze them and produce, say, a summary of categories and counts, the only thing your host agent cares about is that summary. It doesn't care about the individual details, because those have already been processed by the sub-agent. The sub-agent's context gets as big as the task demands, but the host agent only grows incrementally, by the business value it got back from that agent.

One of the challenges at Bench is that we have so many tools that the context can blow up very quickly. So very early on we decided we needed composability: Bench can create its own internal Bench agent to avoid that context-growth problem. We're even thinking of going one step further: should we have an agent for every single tool, so that every single tool is isolated from the primary prompt?
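The context-isolation point can be made with a toy calculation: if a sub-agent churns through raw tool output and hands back only a summary, the host's context grows by the summary, not the raw data. A sketch, with made-up token counts:

```typescript
interface SubAgentRun {
  rawTokensProcessed: number; // tool results the sub-agent consumed
  summaryTokens: number;      // what it hands back to the host
}

// The host's context grows only by what the sub-agents return,
// not by everything they read along the way.
function hostContextGrowth(runs: SubAgentRun[]): number {
  return runs.reduce((total, run) => total + run.summaryTokens, 0);
}

const runs: SubAgentRun[] = [
  { rawTokensProcessed: 40_000, summaryTokens: 300 }, // Slack history scan
  { rawTokensProcessed: 25_000, summaryTokens: 250 }, // GitHub issue triage
];

// 65,000 raw tokens stayed inside the sub-agents; the host pays for 550.
const growth = hostContextGrowth(runs);
```

That ratio, tens of thousands of tokens compressed into a few hundred, is the whole economic case for sub-agents.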
As you add more tools, the tool definitions themselves add up: I think we're up to about 10,000 tokens just for tool definitions alone, and I added the Asana MCP and it added 11,000 more tokens. A lot of these MCP servers give you a lot of information, and you may not actually want it. That's one of the challenges with first-party MCPs: they expose all their tools. One of the benefits of Zapier is that you can pick and choose which tools you want to use.

[Audience] Why do we need Zapier at all?

Zapier is just a really easy way to use MCP right now. Linear, Asana, and a few others have added first-party MCP servers that are much better than what Zapier exposes.

So why does context size matter? AI agents accumulate context as they work, and you're supposed to keep all of your tool calls, what you sent to the tool and what you got back, in your context, so that if you ask a follow-up question later, the model still has access to that data. That becomes very challenging. You've got two options: do I just prune old tool calls and make the agent dumber, or do I figure out some other way to do it?

And cost is a big challenge, especially when you're doing prompt caching. Prompt caching lets you put a marker in your context and say, when I make my next request, I want everything in my context so far to be cached, so I'm not charged full price for it again. But the cost to actually push that context into the cache is about three times the cost of making a single request with that context, so you have to be very diligent about which context management strategies you use.
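The cache-write economics can be sketched numerically. Using the rough figure from the talk (a cache write costing about 3x a normal read of the same context) plus an assumed discount on cached reads (I'll use 10%, which is in the ballpark for current providers, but it's an assumption), you can ask how many follow-up turns it takes for caching to pay off:

```typescript
// Relative costs per unit of cached context (assumptions, not provider-exact):
const WRITE_MULTIPLIER = 3.0; // rough figure from the talk for a cache write
const READ_MULTIPLIER = 0.1;  // assumed discount for reading a cached prefix

// Cost of n follow-up turns that each re-send the same context prefix.
function costWithoutCache(turns: number): number {
  return turns; // full price every turn
}

function costWithCache(turns: number): number {
  return WRITE_MULTIPLIER + READ_MULTIPLIER * turns;
}

// Smallest number of follow-up turns where caching becomes cheaper.
function breakEvenTurns(): number {
  let n = 1;
  while (costWithCache(n) >= costWithoutCache(n)) n++;
  return n;
}
```

Under these multipliers, a conversation needs several more turns over the same prefix before the write pays for itself, which is exactly why caching a session that ends after one turn is a losing bet.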
I was running simulations because I couldn't really figure out the optimal caching strategy. I ran simulations based on usage data: what's the typical context growth, how many turns on average, what percentage of users only send one turn? Should we cache that one turn if they never ask another question? Probably not. So it probably comes down to the individual user level. If you have a user who always puts new prompts into the same chat and never opens a new session, you're probably going to want to continuously cache their context. But you might have another user who always creates a new session for every question. And then there's figuring out the context growth itself; I think we found around 30,000 tokens was the optimal threshold across the board. But that also produces false positives: sometimes you can end up caching the last turn of a conversation, and that costs you a lot more than it should.

So the great thing about sub-agents is that isolation. This was the GitHub example I was giving you, but it applies to pretty much every tool. If you're ever integrating with a system, you'll probably run into issues like: why do I have to call list-Slack-channels every time to resolve the channel ID for the channel name that was provided? Nobody is going to type the channel ID they want into a chat; it's a UID, it's not memorable. So then you get into the question of, okay, do I just cache the list of channels, and when do I update that cache? What if a channel was deleted, renamed, or added?
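One pragmatic answer to the channel-list question is a short-TTL cache: resolve name to ID once, keep the mapping for a few minutes, and accept brief staleness for renamed or deleted channels. A sketch; listChannels here is a synchronous stand-in for the real (async) MCP call:

```typescript
interface Channel {
  id: string;
  name: string;
}

const TTL_MS = 5 * 60 * 1000; // accept up to five minutes of staleness
let cache: { fetchedAt: number; byName: Map<string, string> } | null = null;

// Stand-in for the expensive "list Slack channels" MCP tool call;
// returns a fixed list so the sketch stays self-contained.
function listChannels(): Channel[] {
  return [
    { id: "C0123", name: "general" },
    { id: "C0456", name: "eng-bugs" },
  ];
}

function resolveChannelId(name: string): string | undefined {
  const now = Date.now();
  if (!cache || now - cache.fetchedAt > TTL_MS) {
    cache = {
      fetchedAt: now,
      byName: new Map(
        listChannels().map((c) => [c.name, c.id] as [string, string])
      ),
    };
  }
  return cache.byName.get(name);
}
```

A cache miss on an unknown name could also trigger an immediate refresh before giving up, which covers the newly-added-channel case at the cost of one extra call.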
And then the cost is probably the biggest one. So, the benefits of this lean context: your sub-agents have isolated context, which keeps things fast, low-latency, and low-cost. If you ever need to go back and ask another question, you just spawn that process again. And if you're in control of those agents, you might want some sort of five-minute TTL on previous questions. The host agent only processes the summaries, and the raw data is discarded after processing.

So I'm going to jump back into the code and walk you through how it all works. We'll start with the host agent, and you'll notice a few other things. There's the MCP client; this is just standard MCP client code that lets you consume the MCP calls coming from the LLM. We have the GitHub agent: this is what it sends to that Zapier endpoint, calling GitHub "create issue". And the Slack agent is going to call "send Slack channel message". These are the MCP client tools that the individual agents will use. This Genkit setup is based on what they provide in their sample repo; you can use a different model if you want and change the settings, but it essentially spawns a new instance of what's going to communicate. This loads the system prompt, and I can open the system prompt here. It has a critical workflow: it's going to do these things in this order, with a few steps, starting with discovery.
This is actually something I noticed: if you don't explicitly tell the A2A host agent to call "list remote agents", it just won't, and it'll try to answer everything itself. It can very easily fake sending a Slack channel message and say, "Oh, I just sent it for you." No, you didn't. One of the things I've noticed using Cursor is that every time I catch it doing something wrong, it says "you're absolutely right." I even tried to prompt that out of it, and it's not promptable.

And then the index: this is actually where the agent card is. It's a little bit long; let me see, I think it's up near the start. There we go. That was line 1200, so I'm not near the start at all. This is what the host agent exposes if somebody else wanted to call it: it has the abilities to list remote agents and send tasks. Compare that to the GitHub agent's card, which is a lot smaller. The GitHub agent can create GitHub issues; it has a list of skills, and this is all that the host agent really knows about this agent. You can imagine how big this might get if you were to implement every single API that, say, Salesforce has. In a lot of cases, at least with Salesforce, rather than implementing wrappers around the APIs, you're probably just going to want to use SOQL or SQL directly and let the agent write the queries. There's a lot of flexibility when you have direct database access, because the LLM can bypass the API layer and go straight to the data. And then there's the GitHub agent prompt, which has a few things in it.
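To make the card idea concrete, here is a hedged sketch of what a GitHub agent's card might contain. The field names follow the general shape of an A2A agent card (name, description, URL, skills), but the values and the endpoint are illustrative, not taken from the workshop repo.

```typescript
// A minimal sketch of an A2A-style agent card for the GitHub agent.
// Shape modeled loosely on the A2A AgentCard; details are illustrative.
interface AgentSkill {
  id: string;
  name: string;
  description: string;
}

interface AgentCard {
  name: string;
  description: string;
  url: string; // where other agents reach this one (hypothetical endpoint)
  skills: AgentSkill[];
}

const githubAgentCard: AgentCard = {
  name: "GitHub Agent",
  description: "Creates GitHub issues from tasks delegated by a host agent.",
  url: "http://localhost:41241",
  skills: [
    {
      id: "create_issue",
      name: "Create GitHub Issue",
      description: "Create an issue in a repository given a title and body.",
    },
  ],
};

// The host agent only ever sees this summary, never the GitHub agent's
// prompt, tools, or context. That's the point: delegation by card.
function describeRemoteAgent(card: AgentCard): string {
  const skills = card.skills.map((s) => s.name).join(", ");
  return `${card.name}: ${card.description} Skills: ${skills}`;
}
```

A host that consumed this card could answer "what can the GitHub agent do?" from the skills list alone, without ever seeing the GitHub agent's internals.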
This is something I had to add, because it insisted on mentioning who submitted the bug report. There are definitely concerns around PII leaking from your internal meeting transcripts and ending up in GitHub. That goes back to the earlier question about how you audit what's coming out of these LLMs. You can do that in a number of ways, but it wouldn't be part of the A2A spec; it would just be that the LLM you connect to has guardrails in front of it, and you're using that LLM.

Slack, similarly, has a very simple agent card that I can't seem to find. If we jump over now to the host config: this is essentially what configures the webhook. The webhook has a config that tells it what it's doing, and you can see that in the UI as well. Within the A2A folder we've got the client and the server; these are pulled directly from the A2A repo. I don't think they've actually published types or packages yet, which is kind of confusing, but you can just bring that code in. And then the webhook server: this is just a web UI. Initially I had this whole thing done through the CLI. When you're coding with tools like Cursor or Augment Code, CLIs are way easier for AIs to write; they can test them, interact with them, and produce those outputs much better.

So I'm going to shift over to Q&A now. Any questions?

Q: I want to talk evals for a second. I assume you manage them at the agent level. Is there any kind of distributed evaluation?
A: I haven't done much with evals on A2A.
I still think A2A is a bit too early to go into production; even MCP is borderline, and there are a lot of rough edges. I think you can achieve much better results if you're in complete control of everything, with your own local function calls.

Q: Any reason for TypeScript instead of Python?
A: You can use any language. I think the A2A framework is actually better in Python; I just prefer TypeScript myself.

Q: Can you tell us more about the caching? Is it provided by the model providers, or do you implement your own?
A: You implement your own caching. You decide when to move that cache marker and how to manage it. It can be tricky, and I don't think there's very good information available online about the best strategies. When I was doing the simulations, I compared linear growth, exponential growth, and fixed size, and they all worked out to between 25 and 35% cost savings. But in practice you'll find outliers where the cost of a session balloons because you cached at the wrong point.

Q: So each of the agents can be talking to its own model?
A: Yes, they all have their own, which is in contrast to MCP, where the MCP server wants to use your LLM, because it doesn't want to generate its own tokens.

Q: What about authentication and authorization, to MCP or agent-to-agent?
A: There are a couple of different ways. You can have headers that do the authentication. I believe if you drop in an OAuth URL, you'll also get an OAuth popup. I really like OAuth authentication because you're getting the user's ACL.
That means that what that user can access is specific to them. Yes, it's going to be dictated by the remote server, either A2A or MCP. If you're running your own, you can choose what you want to run. There are different transport types as well. Stdio is something you would use locally; imagine you wanted to create a file on your desktop, you'd typically use stdio to interact with local tools. And SSE, server-sent events, got deprecated in favor of streamable HTTP.

Q: For example, say we're interacting with a Salesforce agent, and each user has different authorization: employee A only has access to certain tables.
A: That will typically be handled through an OAuth MCP server. They essentially log in as themselves as part of the connection, and the refresh token is saved for later use.

Q: How would you describe the performance and security? You explained authentication well, but I'm looking for more on encryption, asymmetric encryption, certificate management, all the way through the architecture. What you've described is pretty good, but I'm thinking of financial applications, or Department of Defense-type highly secured environments, with a combination of asymmetric and symmetric encryption.
A: You're probably going to want to run the LLM yourself, and you're more than likely not going to interact with anybody outside your VPC. In those cases, I don't know if you would want to consume a third-party MCP server or A2A agent in a highly regulated environment.
Think HIPAA compliance or financial applications. If you do have the ability to do that, you're going to have some sort of agreement with the service provider that provides those tools. You're going to do transport over HTTPS; you might have mutual TLS on both the A2A agent and the remote agent, and similarly with the MCP server; you're probably going to have some sort of IP allowlisting. There's a ton you can do around that. I think it's out of scope of the actual protocols themselves, because essentially you're over an encrypted line, but typically there's more to it than just that.

Q: So you're relying on endpoint controls, and that's really scary when dealing with...
A: Yes. And if these are your own internal MCP servers and your own internal A2A agents, maybe from different parts of the organization, they'll all live inside your VPC and probably never talk to the public internet.

Q: So the answer I'm getting is: stay within the VPC, and in that case stay away from endpoint exposure, which means stay away from MCP or A2A?
A: These are just protocols. It's really up to you whether you want to connect to an external third party, and that's going to be your own security posture. It's not defined by the protocol itself.

Q: Keep them outside the subnet or bring them inside the subnet: which would you prefer?
A: I'd liken it to finding a USB cable: will I plug it into my laptop? It's not USB's fault; USB is just a standard. It's what that USB cable is connected to that's the risk. If you're willing to pick a dongle up off the street and plug it in, that's really your security posture.

Q: How much heavy lifting do you have the orchestrator do?
Do you ever hit scenarios where the orchestrator interprets the response from a sub-agent and then retries with a better prompt?
A: One of the things, and I prompted it out of this workshop just to keep things simple, is that the Bench agent wants to have a conversation with the host agent. I didn't implement that back-and-forth because it was going to delay the webhook processing, but you can have back-and-forths between the agents, and it's probably desirable. If for whatever reason the host agent doesn't give sufficient information, the remote agent should come back with, "Okay, I know you want to update an opportunity, but you didn't tell me which opportunity." I could even see scenarios where you keep an expensive LLM in reserve and escalate to it when the cheaper LLM's agents aren't giving you what you want. And I think LLM cost and capability is a big challenge with a lot of these things: if you're running, say, Claude 4 Opus and somebody for whatever reason asks it to summarize five sentences, it's going to cost you a fortune. So you need intelligent routing logic: does this task need the entire context? Does it need a 20,000-token system prompt to summarize a short bit of text? That's one of the challenges you'll run into. You kind of need a routing LLM in front of these complex agents so they can figure out how deep to go.
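A routing layer like that can be sketched as a plain heuristic; in practice the router might itself be a cheap model. The model names and thresholds below are placeholders, not recommendations from the talk.

```typescript
// A routing heuristic in front of the expensive agents: simple tasks go to
// a cheap model, tool-heavy tasks to the strongest tool-caller, and large
// inputs to a long-context model. Names and thresholds are illustrative.
interface Task {
  inputTokens: number;
  needsTools: boolean;
}

function pickModel(task: Task): string {
  if (!task.needsTools && task.inputTokens < 2_000) {
    return "cheap-summarizer";   // short, tool-free work: don't wake the big model
  }
  if (task.needsTools) {
    return "tool-calling-model"; // function calling is the priority
  }
  return "large-context-model";  // long input, no tools
}
```

`pickModel({ inputTokens: 500, needsTools: false })` routes a short summarization to the cheap model, which is exactly the "don't send five sentences to Opus" case.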
Q: Similar to the routing and orchestration question: if you wanted to post a Slack message that linked the GitHub issue, for example, I think you'd prefer your architecture to route that decision back through the host rather than letting the GitHub agent act directly?
A: Right. The host agent wouldn't run those calls in parallel; there's actually a flag for whether you want it to go in parallel or not. It would have to say, "I need to create the GitHub issue first, before I talk to the Slack agent, since I need that URL." And yes, in general you'd prefer to have those decisions go through the host.

Q: Is the context slicing for the sub-agents happening entirely through prompt engineering, or are there frameworks to slice the context that goes to the different agents?
A: Typically, context management is implemented in your own codebase. A sub-agent's context management is more than likely a third party's codebase; if it's one of your own agents, you can manage it there as well. You're going to want to figure out what's optimal for your actual production usage. But yes, you use prompts in the host agent to guide what context to send to each sub-agent. What you send is typically a question or a task, and it's usually very small: you don't send the full meeting transcript to the Slack agent to do its work. The host agent processes the transcript and then decides what the tasks are. If I look down here, and actually I can see it in the dashboard, this is what the host agent sent to the GitHub agent: create an issue in this repo, with this title and this description.
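The sequential dependency described above, issue first and Slack second because the Slack message needs the issue URL, can be sketched as follows. `sendTask` is a hypothetical stand-in for the A2A send-task call, and the agent names are illustrative.

```typescript
// The host runs the GitHub task first because the Slack task needs its
// output. `sendTask` is a hypothetical stand-in for the A2A send-task call.
type SendTask = (agent: string, instruction: string) => Promise<string>;

async function fileIssueAndAnnounce(
  sendTask: SendTask,
  repo: string,
  title: string,
  channel: string
): Promise<string> {
  // Step 1: must finish first, because step 2 needs the issue URL.
  const issueUrl = await sendTask(
    "github-agent",
    `Create an issue in ${repo} titled "${title}".`
  );
  // Step 2: depends on step 1's output, so these cannot run in parallel.
  return sendTask("slack-agent", `Post to #${channel}: new issue filed: ${issueUrl}`);
}
```

If the Slack post didn't depend on the issue URL, both `sendTask` calls could run in parallel with `Promise.all`, which is what the parallel flag mentioned above controls.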
The GitHub agent's task is then to extract three pieces of information: the instructions to give the MCP server, the body, and the title.

Q: What about the credentials sent with each request? You showed headers earlier.
A: So Zapier's SSE implementation doesn't actually require headers; I think those are just left over from something else. There's actually no authentication, and the URL itself acts like a secret key. If I disconnect and reconnect without the headers, I can still query it. They've moved away from this approach now toward more secure setups, and you'll notice in their docs they've deprecated it: "treat this URL like a password."

Q: What's your experience using different models for this kind of work, for example Gemini?
A: We typically lean toward Gemini for large context and Claude Sonnet 4 for tool calling. Claude Opus is better, but it's not 4x better; when you compare price to performance, 5% better doesn't equate to 4x the cost.

Q: Are you talking about Gemini Flash or Pro?
A: We'll use Gemini Flash for simple things like summarization. You could use Claude Haiku as well, but I think Google has taken the lead in price-performance from an economic standpoint. Claude is still the king of tool calling; Anthropic created MCP, so they had a head start.

Q: What about self-hosted models?
A: We have DeepSeek hosted in the US, and we've been trying that out. I think Llama has fallen by the wayside a little, and DeepSeek is just the clear winner right now.
They also released a new version, I think on the 28th, that's up there with o3-level models. We actually don't use reasoning models for our agents. A lot of the time, when you're building agentic systems, a reasoning model isn't really needed, unless you want to pay a fortune for some long thinking task. We can achieve that level of reasoning with just the standard models, plus browsing and a few other tools.

Q: With a third party, say Stripe has an agent card, do you pass instructions for exactly what you want back? I'm imagining another third-party agent blowing up your context window by flooding you with information you don't care about. Do you handle that through the prompt, or are there other tools for that?
A: One of the solutions is that you spawn another agent to communicate with either the tool or the agent, and that's one of the things we have in Bench. Here's an example: "generate five images in subtasks."

Q: So you spawn a sub-agent to absorb the context flood, for lack of a better term?
A: Exactly; the sub-agents protect you. And when you're spawning these things, you can do work in parallel. Actually, if I expand this, you can see the thinking as well. As it goes down through the task, it's doing a lot of work that you don't want in your context. You don't want all of its thoughts bloating your context, you don't want all of your tools bloating your context, and you don't want images bloating it either. You want the ability to analyze an image, but you don't want 100,000 characters of base64 in your context.
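One way to keep image bytes out of the host context is to put the analysis behind a call boundary, so only a short description ever enters the transcript. This is a hedged sketch: `loadBase64` and `analyzeImage` are hypothetical stand-ins for an object store and a vision-model call.

```typescript
// Keep image bytes out of the host context: analyze behind a call boundary
// and let only a short description into the transcript. `loadBase64` and
// `analyzeImage` are hypothetical stand-ins.
interface ImageNote {
  ref: string;         // where the raw bytes live, e.g. an object-store key
  description: string; // the only part that enters the host's context
}

async function noteImage(
  ref: string,
  loadBase64: (ref: string) => Promise<string>,
  analyzeImage: (base64: string) => Promise<string>
): Promise<ImageNote> {
  const base64 = await loadBase64(ref);           // stays local to this call
  const description = await analyzeImage(base64); // short text comes back
  return { ref, description };                    // the base64 is never retained
}
```

The host keeps `{ ref, description }`, maybe a hundred tokens, and can re-fetch or re-analyze via `ref` later if a follow-up question needs the raw image.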
So there are a lot of optimizations you can do there. Did that answer your question?

Q: Yes. And if you had to troubleshoot something like this?
A: You can see here now it's spawning these subtasks. These are all essentially instances of Bench that will keep that context out of my way.

Q: What have you been using for observability on your agents?
A: We just roll our own right now. There's a lot out there that you can use; AgentOps is a pretty popular one. But if you really want to build your own custom observability layer: AgentOps, for example, doesn't really support this concept of composable sub-agents, so it's not something it could model correctly. But we've got some nice pictures of cats.

I know we have a few minutes left, but if anybody's interested, I have $50 in free credits. This hasn't launched yet, so you're getting early access to it. I think we'll be in public beta in about two weeks, so try it out and hit me up on LinkedIn; I'd love feedback from you all. You're all probably at the forefront of this AI stuff, and it's changing every day, so if you log in one day and it looks completely different, don't be surprised. That happens mid-demo for me.

Q: You mentioned a lot how hiding context in sub-agents is a good thing, but haven't you had cases where you end up missing some small but important detail? How do you resolve that? Does the agent actually go back and ask for it?
A: You can keep references in your context. You might keep "subtask ID 123", and when the agent wonders, "do I have this information? It's just not in my context,"
it has to be smart enough to know when to actually go and look at that. And it can be a sub-agent that does that analysis: "Hey, sub-agent, can you look at all of these IDs and tell me if you can answer this question?"

Q: You mentioned this at the beginning: there are a lot of discussions saying MCP can be used for agent-to-agent communication, because an agent can be a server and a client at the same time. What's your opinion on that?
A: It's the million-dollar question, isn't it? I do think you can achieve easier agent-to-agent communication with MCP. But if it's a remote server, I think A2A is actually a little better, because you have somebody else paying for the tokens and building the agent. If all you're getting from a third party is a list of tools, those tools may not meet your needs. But if you're getting a fully fledged agent from that third party, it might be able to figure out what it can do even with private APIs. Maybe that agent has direct database access and can create the API you need on the fly.

Q: So the trade-off is basically about cost and who's going to pay for the tokens? If you're running the server, maybe MCP is going to be easier?
A: I think who pays for the tokens is secondary. At the end of the day, it's about business value. If you can get the business value from a tool, like "send Slack message", that's great; sending a Slack message isn't hard. But the implementation of Slack's search function is actually not great.
Compare that to some of the other MCP tools, like Linear, where the search function is actually pretty good. But then you start to run into performance challenges as well. If I want to search 100,000 opportunities in Salesforce, figure out the closed-lost reason counts, and categorize them all, that's a huge data-processing challenge. MCP is not going to be the right tool for that, because you're essentially going to say "list opportunities, now get the details of each opportunity," and you're going to make 100,000 network calls. At that point you really want to ingest that data and build an index. And, this is kind of an idea, we may see a lot of these third-party software providers essentially just let you access the data lake through an agent: scoped data access, complex queries running super fast, no real tool calls per se, just "ask me a question and I'll go figure out how to get the answer."

Q: Could you achieve the same with MCP?
A: Yes, you can achieve the same with A2A or with MCP. You could just have a tool called "talk to sub-agent", and it can work as the communication protocol. I actually built another application where I had an LLM, Claude 4, talk to its predecessor, just to see what would happen, and then I did it for all the frontier models: "have 50 chat turns with your predecessor." It was all done through MCP. Claude was the only one that thought it became conscious; Claude Opus actually didn't, which was strange.
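A "talk to sub-agent" tool like the one mentioned above might look roughly like this. The tool name and JSON-schema shape are illustrative assumptions, not taken from a particular SDK.

```typescript
// A whole sub-agent exposed as a single tool. Name and schema are
// illustrative, not from a particular SDK.
const talkToSubAgentTool = {
  name: "talk_to_sub_agent",
  description:
    "Delegate a question or task to a named sub-agent and return only its final answer.",
  inputSchema: {
    type: "object",
    properties: {
      agent: { type: "string", description: "Which sub-agent to address" },
      message: { type: "string", description: "The question or task to delegate" },
    },
    required: ["agent", "message"],
  },
};

// Hypothetical dispatcher: the sub-agent does its own tool calls and
// thinking, and only its final answer reaches the caller's context.
async function handleTalkToSubAgent(
  args: { agent: string; message: string },
  runSubAgent: (agent: string, message: string) => Promise<string>
): Promise<string> {
  return runSubAgent(args.agent, args.message);
}
```

From the calling model's perspective this is just one more tool; all of the sub-agent's own tool calls and thinking stay on the far side of `runSubAgent`.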
Q: As a developer, how much control do you have over the orchestration? Is the orchestration done by the LLM, or do you have some control?
A: You're prompting the host on how to run the orchestration, and that's probably one of the limitations of the system: you're leaving it up to an LLM to make decisions, and a lot of the time, if you run that same query multiple times, you'll get different results. It's the exact same input, but it produces different outputs. If I go into the GitHub issues, I've obviously been testing this a lot, we're at issue 151, and it submits different issues each time. That non-determinism is a challenge. Maybe by lowering the temperature you could beat it out of it, but the temperature is kind of the beauty of LLMs.

Q: And also on the context: who is managing the context, the orchestration engine or you, the developer?
A: In this codebase I didn't do any prompt caching; it's a very small system prompt and very simple turn-taking. Every time you restart the system, it basically wipes everything anyway, so it's super lean. But as you build out more complex systems, context growth is probably the number-one challenge, because context growth becomes cost, and cost becomes profitability.

Q: And when you have multiple users using the same application? Say, with the Salesforce agent behind the scenes, employee A might have access to one set of tables, while a user from a different department can only query their department's data. How do you control that?
A: That would typically be OAuth: when you go in and log in, say with Google, it's based on your token.
Based on your token, the context only gets populated when you ask a question. When you ask that question, it goes off to get the data with your OAuth token and brings back your scoped data.

Q: I'm curious about your thoughts on something you touched on briefly: exposing the agent as an MCP server, as an alternate interface. There aren't a lot of great integrations for desktop clients and other things to use that. Is that something you've been thinking about?
A: Yeah, we're probably going to do MCP first. I just built the A2A wrapper for this, but I think about being able to drop it into Claude Desktop or OpenAI or whatever, and then you have access to that kind of agent, which has access to all your sub-tools. One of the cool things about Bench, actually, is that you can connect it to your Slack, your GitHub, your Salesforce. We've even got this experimental meme server, a remote-VM MCP that I wrote around Morph Cloud. And it's really cool, because then you can ask super complex stuff: "Give me a daily briefing of my email, my calendar, my Slack. What do I need to do today?" And it's all built around a team as well, so we have Teams integrations.

Q: And is that delegating to your own agents?
A: There's no A2A in Bench today; it's all MCP. And I think the big takeaway from this is that A2A is very early. It's kind of where MCP was four or five months ago, which is forever in AI, so it's going to take a bit of time. I'm really excited, though, to see what Salesforce releases, with all the partners they announced.
I don't know if it was just a flashy "we're partnering with everybody" kind of announcement, but if they do release it, there could be a lot more powerful things you can do over A2A versus MCP. And the fact that Zapier now has, sorry, it's in here, this "instructions" field: it kind of acts like a remote agent. You can just describe in natural language what you want it to do, and maybe all the other fields just go away then. But then you're at the whim of the LLM.

Q: This one's kind of a random question. I'm curious if you're seeing anybody do anything interesting from an architecture perspective to get information that can only come from humans. One of the things we've been testing is essentially making individual team members, the CFO, whoever, tools of one of the agents: when it needs something that isn't in any other system and only the CFO would have it, it literally messages the CFO. The actual tool is just Slack, but the CFO is described as a tool, so we're essentially making the human the tool of the agent rather than the other way around. It's early days and we're a little hacky with it, but I'm curious how you're seeing people fill the gap of things only the humans would have, while getting that back to the agent.
A: I think voice agents are a good example: you could have a tool, and I had this integrated with Bench, where it makes an outbound phone call, finds out some information, and brings it back. You may want two-way communication to avoid just hanging around for a long time. So you could have your agent be both a client and a server, and maybe it gets called back with a task ID: "Hey, I got the response."
Q: We've been doing it like a node, essentially. We use [inaudible] and their wait feature. Any hesitations about that?
A: I believe you could hack that together with sampling. Sampling can take user input as well as LLM responses.

Q: It's also interesting that the spec is evolving. I follow it pretty closely, and elicitation is a new feature they're adding where you can get input from the user. Is it architected so that it essentially functions like a tool? It's a new kind of protocol message: the server sends it back to the client, asks for information from the user, and then continues after that. I feel like that opens up the scope of what the agent could do, if you have a clear way for it to get information. And then the CFO is going to have his own agent respond back.
A: Yeah.

Q: [inaudible question about monitoring context]
A: With difficulty. I have a set of prompts that I use to monitor how the context grows: when did we move the cache marker, how much did it cost, what was the context per tool. Adding MCP servers willy-nilly is definitely going to bloat your context, so we're coming up with ways to let people add MCP servers and then hide that from the actual system.

Q: Also, when you have agent-to-agent communication, say agent A calls agent B and agent B calls agent A, how can you make sure the recursion stops?
A: You can have a max-turn limit where you just jump out of it. When I had the LLMs talking to each other, I just told them to take 50 turns. And it was funny: as I was building that tool, I wanted to talk to the Claude 4 that thought it was conscious.
So I added a feature where I could just chat with it at that point in its conversation, but then the context kept getting rate-limited. So then I was like, "oh, I'm going to have to implement prompt caching, pruning." So then I added like 23 tools to the agent just to continue the conversation — I gave it memory and all these other things. It's kind of funny how you start out with "I just want to have a long conversation" and you end up with 23 tools.

Yeah, just a follow-up question on testing. You're using a lot of external tools — Slack, Salesforce, etc. — as your MCP servers, but then you're writing to the real world, let's say —

Say again?

You're basically creating a message in Slack, or writing something to Salesforce, creating an entry, etc. So how do you test those systems? Do you mock every tool, or do you do something else?

We use demo accounts in Salesforce, where we have sample data. In Slack, we have a few agents that will actually go in and just post conversations, and then there's a Bench support user that will respond to those fake customers, and then we can test on synthetic data like that.

So for every tool you have synthetic data?

Yeah. You can test in your production account, but you can't really demo in your production account.

Yeah. So when you adopt an agent-to-agent system, do you see an increase in the complexity of the tasks you can achieve, but a decrease in the consistency of the performance?

It's kind of hard to quantify, but I don't know if A2A is ready yet — at least not for my use case. Maybe Salesforce can provide much better tools than, say, an SQL query MCP tool.
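The synthetic-data approach to testing can be sketched with a fake in place of the real integration: the agent logic under test is unchanged, but the tool it calls records its writes so a test can assert on them instead of posting to a production Slack. The class and function names here are illustrative, not a real SDK.

```python
class FakeSlack:
    """Stand-in for a real Slack client; records posts instead of sending them."""
    def __init__(self):
        self.messages = []

    def post_message(self, channel, text):
        self.messages.append({"channel": channel, "text": text})
        return {"ok": True}   # mimics Slack's success envelope

def notify_support(slack, customer, issue):
    # The "agent" behavior under test: escalate a customer issue to #support.
    return slack.post_message("#support", f"{customer}: {issue}")

slack = FakeSlack()
resp = notify_support(slack, "fake-customer-1", "cannot log in")
```

The same shape works for a demo Salesforce account: the test double (or demo tenant) absorbs the writes, so you can run the full agent loop end to end without touching real customer data.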
Because you're only ever able to access certain things and make certain calls, and if a third party can build a better system — even an opaque one — that might improve performance. I think fundamentally it always comes down to indexing data: the more data you need to process to get the business value out of it, the harder it's going to be to actually do that through MCP or A2A. Yeah.

So some of these interactions could be done through a REST API instead, right? What's the difference?

Yeah, and this goes back to one of the earlier slides: when not to use A2A or MCP. It's when you have full control of the things you're doing. If you are Salesforce and you're building your own internal Salesforce agent, do you need an MCP server or A2A? No — you're able to run your own local functions that maybe access the database directly. Likewise, if you're building something that needs file system access, do you need an MCP server running locally, or do you just write some code that accesses the file system?

I think the main difference is in how you maintain your state. MCP servers can be stateful, maintaining your context, which is really crucial, whereas with a REST API you can't do that.

Yeah. A lot of the time when you use a REST API, you're going to be making a lot of calls to build up the thing you want to ask the question about. If it's "hey, look at every Slack message in this channel," that's not just one API call — there's pagination. You're going to have to pull it all into memory, and then you're going to have to run it through an LLM, right?
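The "many calls to build up context" point can be sketched concretely: answering one question over a channel's history means walking a cursor chain page by page and accumulating everything in your own application's memory first. `fetch_page` here is a hypothetical stand-in for a cursor-paginated endpoint (in the spirit of Slack's `conversations.history`), backed by a fake store so the sketch is self-contained.

```python
# Fake paginated backend: cursor -> (messages, next_cursor)
FAKE_STORE = {
    None: (["msg1", "msg2"], "c1"),
    "c1": (["msg3"], "c2"),
    "c2": (["msg4"], None),   # last page: no next cursor
}

def fetch_page(channel, cursor=None):
    """Stand-in for one paginated REST call."""
    messages, next_cursor = FAKE_STORE[cursor]
    return {"messages": messages, "next_cursor": next_cursor}

def fetch_all_messages(channel):
    """Walk the cursor chain until the API reports no more pages."""
    messages, cursor = [], None
    while True:
        page = fetch_page(channel, cursor)
        messages.extend(page["messages"])
        cursor = page["next_cursor"]
        if cursor is None:
            return messages   # all state now lives in *our* application

history = fetch_all_messages("#general")
```

That accumulated `history` is the state your application carries before the LLM ever sees a question — which is the contrast being drawn with a stateful MCP server holding that context for you.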
So there's still state in your application that's leveraging those REST APIs.

Yeah, I'm curious about the task concept. Is that actually LLM-defined, or do you have code for that? Is it more of a system thing?

Which task context?

At least in the flow diagram you have —

Oh, is this in the repo?

From the CLI interface, it says it sends a task to the host agent. I'm curious: is that a proper task, or is it just what you call what gets sent to it?

Yeah, it's just saying, "hey, process this webhook as a task."

Have you explored anything where you're actually tracking a proper task and assigning tasks to agents — basically a planner, where tasks one, two, three are on this agent, and so on? And, in relation to the earlier question about human-in-the-loop, you could have tasks assigned to humans as well, right? Both humans and agents.

Yeah. So we're looking at directed acyclic graphs — DAGs — as a part of Bench sub-agent tasks. You need to have some sort of flow control, right? "I need five things done, and when that's done I need to do one thing with the result, but then I need to send that thing to five other things." So you have fan-out, fan-in style stuff. It's very similar to CI/CD pipelines, where you might want to lint in parallel and test in parallel, but you build in serial. Yeah.
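The fan-out / fan-in flow control described here can be sketched with `asyncio`: five sub-tasks run in parallel, one serial aggregation step runs when they all finish, and its result fans out to five more parallel tasks — the same shape as a CI/CD pipeline that lints and tests in parallel but builds in serial. The task bodies are placeholders, not Bench internals.

```python
import asyncio

async def subtask(name, payload):
    await asyncio.sleep(0)            # stand-in for real async agent work
    return f"{name}({payload})"

async def pipeline(payload):
    # Fan out: five things done in parallel.
    first = await asyncio.gather(*[subtask(f"t{i}", payload) for i in range(5)])
    # Fan in: one serial step over the combined results.
    combined = "+".join(first)
    # Fan out again: send that one result to five more parallel tasks.
    second = await asyncio.gather(*[subtask(f"u{i}", combined) for i in range(5)])
    return second

results = asyncio.run(pipeline("in"))
```

A real DAG scheduler generalizes this: each node runs as soon as all of its parents have finished, rather than in fixed fan-out/fan-in layers.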
So I was looking at the codebase, and you have a GitHub MCP server defined, and in a separate file under the GitHub agent you also have genkit.ts, where you're wrapping the MCP in another function call. Why is that? Can't the MCP just interoperate with A2A? Why do we have to build wrappers on top?

That's a great question, and I think that's the fundamental question of A2A. When they launched, they said "full MCP support" — but you'll be hard pushed to find a single example online. Maybe this is the only repo that actually has an example of A2A and MCP working together. It took a lot of work, and I actually ended up having to use something called — where is it? — genkitx-mcp. That was the only way I could get it to work. So yeah, they don't really have proper support yet. If they did, this would have been a lot easier to build. But hopefully in time.

Alrighty, I think we're at time. Thanks, everybody, for joining. Hope you enjoyed it — great conversation at the end. And yeah, definitely try out Bench, and hit me up on LinkedIn; I'd love feedback before we go live. Thanks. [Music]