How Building with AI Can Double the Throughput of Your Engineering Team — Brian Scanlan, Intercom
Channel: aiDotEngineer
Published at: 2026-05-15
YouTube video id: 4_VQBbs2iQA
Source: https://www.youtube.com/watch?v=4_VQBbs2iQA
[music] >> Hey, I'm Brian from Intercom. This has been such a great conference so far — I've picked up so much alpha and inspiration from the talks and the chats with people. Intercom is a 15-year-old, privately held, Irish-American B2B SaaS startup that pivoted to being an AI company the weekend ChatGPT came out. We've got about 1,400 people across Dublin, London, Berlin, SF, Chicago, and Sydney. R&D is led from Dublin, and engineering is almost entirely in Europe — forward-deployed engineers have changed that up a bit. This graph compares our revenue growth to the growth rates of publicly traded SaaS companies over the last few years. Publicly traded SaaS companies are kind of on the way down; Intercom has this amazing curve. We're bucking the trends that SaaS companies have been suffering from recently. And now I'm going to shut Intercom down live on stage — I once did a live deployment during a talk and thought that was impressive. Intercom has become the poster child for companies redefining themselves in the age of AI; the New York Times recently ran an article about SaaS companies reinventing themselves that prominently featured Intercom. Being an AI company means a lot more than slapping on lightweight wrappers or auto-completing some text field. Our AI agent for customer support, Fin, has over 8,000 customers, industry-leading average resolution rates, and revenue approaching 100 million. It launched the day GPT-4 came out — the first product actually released on GPT-4 — and we've been building AI features since about 2018. Modern LLMs have unlocked huge capabilities for dealing with customer support questions, which is completely obvious now. Companies like Anthropic, Snowflake, Linear, Glean, and LaunchDarkly use Fin for their customer support. So maybe SaaS isn't dead — and it works well for businesses of all sizes.
We also recently announced that we have our own model serving 100% of Fin's English text-based conversations, outperforming frontier models — cheaper, faster, better. We're at about 2 million resolutions a week, and we're also happy to sell direct access to our suite of models. I'm not talking about any of that today, though. I'm a senior principal engineer at Intercom — I've been there 12 years — on our platform group. We take care of Intercom's uptime, performance, security, cost management, observability, our majestic monolith applications that we love (mostly Ruby on Rails), and all internal developer productivity. Another thing about Intercom: we are obsessed with shipping. Shipping fast and iteratively is the best way to build high-quality products that customers love to use, so developer productivity is something we've always invested in. "Shipping is the heartbeat of your company" is a great blog post we published many years ago, and Honeycomb made cool stickers about it. For the last few years I've been spending a lot of time enabling the use of AI in our software development life cycle, and that's what I'm going to talk about. Unsurprisingly, we've been very excited about AI in general — we changed the whole company to build customer support with AI agents — and we've been impatient about its adoption and about changing how we build across Intercom. We went down some familiar routes: we were all using GitHub Copilot, then everyone started adopting Cursor, and we looked at Augment and a few other things. But by the middle of last year we were dissatisfied with the results. There were some good signs — some tasks and some work made marginally better and more fun — but that was about it.
But we were pretty aware of where the models and harnesses were going, and we've had a strong conviction from many years back that AI is going to change all knowledge work. So in the middle of last year we set a simple goal: let's double the throughput of engineering in a year. We measure a lot of things at Intercom — we run a lot of developer surveys, we use tools like DX — but we picked code changes per R&D person as the primary way we measure productivity. Every measure is bad; once a measure becomes a target it ceases to be a good measure, and all that. But we also expect overall throughput to increase: if we're actually adopting new ways of working and putting AI into all the different places, we should see a large throughput increase. So: 2x. We made that the name of the project and the team and everything. This was wildly ambitious when we published it last June — doubling productivity without doubling team size — but also kind of wildly unambitious if you connect the dots and see where the models and coding harnesses are going. In this talk I'll cover how we went about this, how we think about productivity, and a sneak peek at some of our internal data and skills. This work also coincided with the most notable shift in model and coding capability we've seen. One of our principal engineers was posting, just like everyone else around Christmas break last year, "oh my god, things have changed massively" — and that contributed a lot to our success with 2x. Now for the engineering-leadership part of the talk. You need to be decisive, give clear executive guidance, and do organizational change. We've done a lot of things.
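The headline metric — code changes per R&D person over a period — can be sketched as a simple ratio. This is a minimal illustration, not Intercom's actual pipeline: the record shape and field names are hypothetical stand-ins for whatever your source-control export provides.

```python
from collections import Counter

def changes_per_person(merged_prs, headcount):
    """Throughput metric from the talk: code changes per R&D person.

    merged_prs: list of dicts with an 'author' key (hypothetical shape,
    e.g. exported from your Git hosting API for one time window).
    headcount: number of R&D people over the same window.
    """
    if headcount <= 0:
        raise ValueError("headcount must be positive")
    by_author = Counter(pr["author"] for pr in merged_prs)
    return {
        "total_changes": len(merged_prs),
        "changes_per_person": len(merged_prs) / headcount,
        "by_author": dict(by_author),  # useful for spotting uneven adoption
    }

# Example: 12 merged changes across a 4-person team -> 3.0 changes/person
stats = changes_per_person([{"author": a} for a in "aabbbcccdddd"], 4)
```

Tracking the same ratio week over week is what makes a "2x" goal checkable rather than vibes-based.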
We updated job descriptions: if you're not adopting AI at Intercom — whether you're a designer, product manager, engineer, whatever — you are not meeting expectations. Binary. You have to say the same message over and over, a hundred times, in every different forum; you've got to stay on message and constantly talk about the urgency of doing this. You've got to reward it as well: when people do good stuff — update skills, automate something — it gets posted into Slack channels, we celebrate it, and people show each other techniques and what's working for them. We've done hackathons and AI immersion days. All of these things are necessary to bring people along. We also staffed this full-time: we have a team, 2x, that just keeps growing and growing. We're not just saying "hey, you've got to AI everything, best of luck" — we're trying to bring everyone, the hundreds of engineers and hundreds of people in R&D, along with us. If you're in a medium or large organization, you absolutely need people — your best people — on this full-time. And we chose Claude Code as our platform. Prior to this we let people choose their favorite editor and tooling, and there were loads of people adopting Claude Code, loads using Cursor, loads using Augment. But we're believers in platforms in general, and to a certain extent it doesn't matter what you choose — choosing one is what's important. You need to get away from model anxiety. It's like being multi-cloud:
you don't get the compounding benefits of a well-designed platform if you're spreading your work across different cloud providers. You're way better off going all in on one and optimizing it, unless there are very specific and impactful reasons to be spread across multiple agents. Our vision was to get Claude to act like a senior engineer on any technical task across Intercom — to connect Claude to everything. Anything I can do on my laptop, Claude should be able to do. And that means everything. Of course we're not reckless; we're not letting the thing go off and delete all our databases. But we're a mature company with plenty of controls, permissions, and audits, which gives us the confidence to unleash Claude in our environments the same way we unleash our engineers. We had to onboard it — teach it all the stuff we teach people when they join Intercom: our Rails conventions, our architecture, React patterns (we've built a lot of software in 15 years), testing standards, security rules, all of it. Claude absolutely has to know the Intercom-specific information to be able to do the job. And most importantly: start using the platform for all technical work. When it doesn't get things right first time, hits an issue, or goes down the wrong path, update the guidance. This is a flywheel we're all contributing to. We've encapsulated a lot of this knowledge and context in skills and guidance, with hooks to enforce things, and we spend a lot of time cajoling Claude Code to work well.
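A hook that enforces a house rule can be sketched roughly like this. Claude Code hooks receive a JSON payload on stdin (with fields such as `tool_name` and `tool_input`), and in the real script a blocking exit code feeds the stderr message back to the agent; the force-push rule below is a made-up example, not Intercom's actual policy.

```python
# Sketch of a PreToolUse-style hook check: block raw force pushes in
# Bash tool calls. The payload shape mirrors Claude Code's hook input
# as documented; the policy itself is an illustrative assumption.

def check(payload: dict) -> tuple[bool, str]:
    """Return (allowed, reason). Only inspects Bash commands."""
    if payload.get("tool_name") != "Bash":
        return True, ""  # other tools pass through untouched
    command = payload.get("tool_input", {}).get("command", "")
    if "push --force" in command or "push -f" in command:
        return False, "Force pushes are blocked; use --force-with-lease."
    return True, ""

# In the actual hook script you would read json.load(sys.stdin), print the
# reason to stderr, and exit with the blocking exit code when not allowed.
```

The point of putting rules in hooks rather than prose guidance is that they are enforced mechanically on every tool call, so the agent cannot drift past them.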
We do things like pushing our internal Claude plugins out to everyone's laptops, bypassing Claude Code's own update mechanisms, because otherwise you spend a lot of time debugging Claude Code installs on hundreds of laptops — it's like trying to manage Python installs. Ultimately, though, this covers every single part of technical work. It's not just code production, not just fancier autocomplete — it's everything: debugging, testing, planning, all of it. It should just be you driving Claude — ideally driving it less and less and moving higher up the food chain — while it delivers real value, code, and product to customers. Everything is in scope. We think that even if the models and harnesses didn't improve at all — which is definitely not happening; if anything the capability curve is accelerating — we have the building blocks today to move vast amounts of work in our software development life cycle to being agent-first. They could pause everything and we'd still have this flywheel, going through every single piece of work; the tools are good enough today to do this. I wrote some principles to help guide us along the way. When you're trying to get hundreds of people to change how they work, or to understand what you're trying to achieve, you need to write things down and help them out. Different principles apply in different places, but we believe all of engineering is changing. Everything that you can do, the agent must be able to do — and that can feel weird when you're first connecting it to production systems. Our job as engineers and product builders is moving up the stack.
A long time ago I was a Unix sysadmin: you'd go into data centers, rack servers, cable things, configure networks. Then the cloud came along and I moved up the stack — people transitioned from being sysadmins to SREs, and the work became more automation-oriented, more impactful, and higher paid. I think we're speed-running that transition a hundred times faster, at full industry scale, but I feel like I've been through this before. At Intercom we're technically conservative: we like picking single tools and using them extremely well — hence the Ruby on Rails monoliths. We're applying the same thought process to where our focus and attention should be. Do we want everyone writing their own multi-agent orchestrators and opinionated workflows? We want to build durable, testable, high-quality components, and we want people to consider the lifetime value of what they produce. The tools and the specific implementations will change over time, but I'm pretty sure that writing down how to do work at Intercom will be valuable no matter what happens — maybe it'll even be easier to discover in the future, which is a problem at the moment. What this really means in practice is that we spend our time on small, high-quality, durable, testable skills that do the job extremely well, and we use data and backtesting: we have a huge body of work — changes, code, incidents — and we use all of it to inform us and prove that these skills operate at extremely high quality. We also practice continuous improvement here: we get these things to be self-updating and keep them very high quality.
We also don't want to get stuck behind the curve because we've implemented too many of our own things; we're eager to get the advantage of somebody else building and shipping great software and capabilities rather than building everything ourselves, and we may not stay on this platform forever. Another thing we guide people towards: give agents problems, not tasks. A lot of the time people — even at Intercom — prompt agents with "run this skill to do a thing", which is mostly fine and sometimes still necessary; I still do it a lot. But we're moving towards just describing the problem and letting the agent figure out which skills to invoke and what to do. A fun story: recently I was brought into a security incident — we had accidentally published some Snowflake table metadata to a public GitHub repository. I just habitually opened Claude Code and told it to join the Slack channel and take a look. I didn't even know a skill existed that perfectly encapsulated our data-breach policies and criteria — what to do and how to analyze it. Claude automatically downloaded the files, did a full analysis, concluded the leak was innocuous, and told me all the next steps. I didn't tell it to do any of this; it just figured it out, and it was done in about 2 minutes. That would have been a 20-minute task of fairly boring work — where's that policy, let me look at this, that, and the other. It's a small example, but I just gave it the problem of looking at a security incident, and it figured out the intent and used a well-written internal skill that did the job for me.
As I mentioned, even at Intercom AI adoption is unevenly distributed. I think we're ahead of the vast majority of companies, but you still need to help people understand where they're at and grow towards being highly effective with agents in their work. I saw a talk recently about maturity ratings for engineers, and our internal one is similar: you work through the levels, ultimately mastering all the skills and knowing the tool inside out. What we want people to do is use Claude Code for everything; automate your work; then move that automation into a skill; then get really good at writing and improving skills; then optimize the environment for agents — which could be anything from software architecture to documentation or other ways of doing things that let the agents be even more effective at what they're great at today. So here's where we're at. You can see the wild inflection point after going all in on one tool — that decision was made in December, we started rolling it out in January, and we reached double PR throughput in under a year. Here's more data from our internal dashboards, and there's some interesting stuff in it. The share of pull requests coming out of Claude Code is in the ninety-somethings. You can also see that our current bottleneck is shifting to code review, and that we have a 17.6% approval rate from our automatic code approver — which is a lot more in-depth than just "hey Claude, can you approve this?"
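The backtesting loop behind an automatic approver can be sketched as follows: replay the approver over historical pull requests that humans already judged, and only trust it where its precision against those labels is high. This is a minimal illustration with a hypothetical data shape, not Intercom's actual evaluation harness.

```python
def backtest_approver(history):
    """Evaluate an auto-approver against human-labeled history.

    history: list of (auto_decision, human_label) pairs, where
    auto_decision is True if the agent would auto-approve the PR and
    human_label is True if a human judged it safe (hypothetical shape).
    """
    approved = [human for auto, human in history if auto]
    return {
        # Share of PRs the agent takes off humans' plates.
        "approval_rate": len(approved) / len(history),
        # Of those, how often did humans agree the PR was safe?
        "precision": sum(approved) / len(approved) if approved else 1.0,
    }

# Agent auto-approves 2 of 10 historical PRs, and humans judged both safe.
history = [(True, True), (True, True)] + [(False, h) for h in (True, False) * 4]
report = backtest_approver(history)
```

Shaping pull requests to be small and simple, as the talk describes, is what pushes the approval rate up without sacrificing precision.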
We've done a lot of detailed work — again using backtesting and historical data, then getting humans to label the outputs — to establish the confidence level of the automatic approvers, and to shape pull requests towards being very safe and simple, which they probably always should have been. Now those are just approved automatically. We've also worked with our auditors to ensure we remain fully SOC 2, ISO 27001, and HIPAA compliant. You do not need humans in the loop to meet these certifications — but you do need to know exactly what you're doing and have auditing controls in place. By moving approvals to an extremely well-organized, tested, and competent suite of agents — including Codex for code reviews; I think multi-model code reviews are okay, so I just completely went back on my platform thing >> [snorts] >> — we have high confidence that this isn't degrading the environment or adding risk. In fact, I think it's removing risk, because humans aren't actually as good as well-defined agents. Here's skill-invocation data — I actually think the earlier numbers were a bit wonky. We hook everything up to Honeycomb: we've got hooks all over the place emitting basic information about which skills are being invoked, it's internally available, there's no private information in it, and everyone can use it to get an idea of what's being used and where. We also pull all session transcripts into S3 for data mining, writing reports, and checking whether our skills are effective. So we've got a feedback loop using the session data — we can get more out of it, but we're already doing some interesting stuff with it.
This wasn't a goal, and I'm not particularly proud that defects kept increasing until recently, but defects are now getting closed faster than ever. Some teams have been inspired by the move to AI to think about things like backlog zero, or crunching through hundreds or thousands of defects. Some of that was deliberate and planned, but at the same time there's a natural deflation, because we get through defects so much faster these days — we're just seeing it happen. We've also been working with a research group at Stanford: we give them all our code, and our code quality per their metrics has been increasing over the last while. Okay, I'm running out of time at this point. We have hundreds of contributors and tens of thousands of lines of code in our Claude Code plugins — it's very active, and Claude itself loves us. Here's an example skill. We've got base plugins that handle things like session transcript syncing and safety hooks, and here's a skill I built that fixes flaky specs. We have hundreds of thousands of tests, and they get a bit flaky over time; we ship a lot, so we tend to barge through the flakes. But this skill was not built by me sitting down and figuring out everything needed to fix flaky specs — I worked in a feedback loop, gave the agent a goal, guided it to the right place, and worked with it to fix a lot of flaky specs.
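The first step such a skill has to encode is telling a flaky failure from a deterministic one, which can be sketched as a rerun-and-classify loop. The runner callable here is a hypothetical stand-in for shelling out to a single RSpec example; it is not the actual skill from the talk.

```python
def classify_failure(run_test, retries: int = 5) -> str:
    """Rerun a failing test to separate flaky from deterministic failures.

    run_test: zero-argument callable returning True on pass — a hypothetical
    stand-in for re-executing one spec (e.g. via `rspec path:line`).
    """
    results = [run_test() for _ in range(retries)]
    if not any(results):
        return "deterministic-failure"   # fails every time: a real bug
    if all(results):
        return "probably-not-flaky"      # original failure didn't reproduce
    return "flaky"                       # mixed results: timing/order dependent

# Example with a deterministic stub: always failing -> a real bug.
assert classify_failure(lambda: False, retries=3) == "deterministic-failure"
```

Only the "flaky" bucket would then be handed to the fixing skill's lookup tables of known flake patterns; deterministic failures are bugs and go back to the owning team.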
It's written this pretty decent thing, with cheat codes and lookup tables, relatively well organized using progressive disclosure — and it's fixing stuff where, if our most senior Rails engineers were doing it, I'd say they were amazing. There's a lot of other stuff going on too: our CI melted and we had to fix that. Claude Code is widely used across Intercom outside of software — it's gone completely viral, and people are banging down our doors to use it. We're thinking a lot about the future of engineering and product: should we just merge product management, design, everything? The single-person-team product experiments have been pretty interesting. I've even been shipping things people can use in their agents to sign up to Intercom — using our skills to act as a product manager, which is pretty wild. So that's it. I wish you all the best of luck: if you're not doing pretty much all of this today, you're going to be doing it in the very near future. My contact details are at brian.scanlan.ie; you can interact with Fin in the Messenger, configure it with our CLI, and check out ideas.fin.ai for a lot more information about Intercom and our agents. Thank you. >> [applause] [music]