What We Learned Deploying AI within Bloomberg’s Engineering Organization – Lei Zhang, Bloomberg
Channel: aiDotEngineer
Published at: 2025-12-16
YouTube video id: Q81AzlA-VE8
Source: https://www.youtube.com/watch?v=Q81AzlA-VE8
[music] I don't have a joke about the dot. I don't have a joke about the hot dog either, so I'll just jump right into the topic. My name is Lei. I lead the Technology Infrastructure department at Bloomberg. We're basically a group of technologists focused on global infrastructure: think data centers and connectivity; developer productivity, think SDLC tooling; and reliability solutions, think telemetry and incident response. Depending on the audience, sometimes you're familiar with what Bloomberg is and sometimes you're not, so I thought it might be a good idea to talk a little bit about our company, and there's no better way to do that than by sharing some numbers. I want to highlight a few. We have more than 9,000 engineers, and most of them are software engineers. We handle a lot of market ticks, in the billions (600 billion, I believe). And we have tons of folks focused on AI research and engineering: today, more than 500 employees focused on AI products for our customers. The takeaway is that we build a lot of software and use a lot of data to power our flagship product, the Bloomberg Terminal, and to support our users in making the most important financial decisions in their jobs. Through a technical lens, the way I often explain it is that we run one of the largest private networks in the world, and we have one of the largest JavaScript codebases in the world. In my domain, the Bloomberg Terminal is really a piece of software that supports thousands of different applications.
We call them functions: email is a function, news is a group of functions, a fixed-income price-to-yield or yield-to-spread calculation is another function, trading workflows are another group of functions. With so many different types of functions, as you can imagine, we have to utilize many different technologies to support them. We also increasingly not only use but contribute to open source communities. For this audience, I want to call out that we helped with the creation of KServe and the Envoy AI Gateway, among many other things that we deploy in-house while supporting the communities. Again, in summary: there's a lot of software, there's a lot of data, and we have to figure out how to make the best use of AI tooling to support our engineering work. All right, so on to AI for coding. We started about two years ago, maybe a little more than that. Like the rest of the world, we looked at the tooling out there, and I apologize if your logos aren't on the slide. As you can imagine, it's overwhelming: there are so many things, and every day there's news that this one is great, that one is great. At the time we didn't actually know which of these AI solutions could boost our productivity and stability. But one thing we knew was that unless we deployed and tried them, we wouldn't know the best way to benefit from all the awesome work so many folks are contributing. So we quickly formed a team and released a set of capabilities so people could start iterating with the tooling. And of course, we're a data company, so we wanted a sense of how to measure the impact of the capabilities we provide. We looked at typical developer productivity measurements and ran a few surveys.
It was very obvious that people felt proofs of concept got much quicker, people rolled out tests, and a lot of one-time-use scripts were being generated. But the measurements dropped pretty quickly once you went beyond greenfield work. So we started thinking: what should we really be doing with all these wonderful tools to make a real dent in this space? We also wanted to be thoughtful about unleashing a very powerful tool: the benefit is that it's very fast, and the challenge is also that it's very fast. For any of you who have actually dealt with hundreds of millions of lines of code, you understand that system complexity is at least a polynomial, maybe exponential, function of your live code and software assets. At some point you want to be very careful about what you do with those assets. So we thought maybe we should look at some of the basics. One idea we had: AI for coding has a narrow definition of what coding is, but there's also a broader definition of software engineering, so maybe we could look into some of the work our developers don't really prefer to do, for instance maintenance work, migration work, and other toil. I want to give some examples of things we've been trying where we think there's a pretty good return on investment.
The question we asked ourselves was: how do we evolve our codebase? The first example: wouldn't it be cool if, the day you get a ticket saying this piece of software needs a patch, you also get a pull request with the fix applied, along with the reasoning for why the patch was done that way? We're broadly deploying something called uplift agents, which scan through our codebase, figure out where a patch is applicable, and apply it. To step back a little: we did have a regex-based refactoring tool before. It worked to some extent, but it was limited; now, with LLMs and other tooling, we're seeing much better results from the uplift agents. There are a few challenges, in case you also plan to deploy such capabilities. The first, as with any AI or ML system, is that it would be really nice to have some deterministic verification capability. Often that's not easy: if you don't have test cases, a good linter, or good verification, the patch can be difficult to apply with confidence. One thing we also realized when we deployed AI tooling is that the average number of open pull requests increased, and time to merge increased too, because you generate a lot of new code but humans still have to review and merge it. So time to merge sometimes becomes a challenge. The last one, which I think applies to any generative AI, is that the shift is toward what we want to achieve rather than how we want to achieve it. The second example I want to share is the other area that can really hurt our productivity or our stability: how we handle incidents. So we're developing and deploying incident response agents.
Now, the important thing here is that if you think about AI tools, they're really fast and they're unbiased. In the middle of an incident, an agent can go through your codebase very quickly, your telemetry system very quickly, your feature flags very quickly, your call traces very quickly, and with an unbiased lens. When we troubleshoot, we sometimes have a biased view ("it must be this"), and it turns out not to be the case. So there are many interesting benefits to deploying agents from this perspective. Then a second question becomes interesting: imagine an organization of 9,000 people, as I described, with a lot of them trying to fix these problems. You have ten teams that want to build pull request review bots and too many teams that want to build incident response agents. It very quickly becomes chaotic, and there can be duplication. Before I talk about the paved path, I'll give an example of the incident response agent. This is basically what an incident response agent looks like: the key part is that we need to build a lot of MCP servers to connect to the metrics and logs dashboards you have, to your topology, whether that's network topology or your service dependency topology, and to your alarms, your triggers, your SLOs. We don't want people to just start building MCP servers without a paved path, so we created one in partnership with our AI organization; I'll talk a bit about what that means. Before that, I want to explain some of our platform principles. Some companies allow teams a lot of freedom, and at the same time responsibility, in the sense that a business unit can build whatever infrastructure or platform it wants.
Other organizations have a very strong, tight abstraction over their service infrastructure, and you typically have to use their platforms. Bloomberg is somewhere in the middle: we believe in providing a golden path with enablement teams. My team is really an enabling team, and one of our guiding principles is that we want to make the right thing extremely easy to do and the wrong thing ridiculously hard to do. Now, moving on: what is the paved path here? First, we have a gateway so that teams can easily figure out which model works best. They can run quick experiments, we get visibility into which models are being used, and we can guide teams toward the model that is a better fit for the problem they want to solve. Second, we have tool discovery, basically an MCP directory via a hub, so that when team A wants to build something, they go to the hub, see that someone is already building that MCP server, and can partner with them to build it together. Third, tool creation and deployment happen via a PaaS, a standard platform service where you run your SDLC and we provide the runtime environment, taking care of auth and related concerns, which reduces the friction for teams to deploy their MCP servers. And then, interestingly, we want to make demos, or really proofs of concept, very easy so that people can try things and generate ideas, because we believe creativity comes from some freedom to try new things. But we also want to make sure production requires quality control, because at the end of the day, stability and system reliability are at the core of our business.
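The gateway part of the paved path can be sketched as a single choke point for model calls. This is a hypothetical toy, not Bloomberg's gateway: `ModelGateway`, its backends, and the model names are invented for illustration. The design point it shows is that routing every call through one place gives the platform team usage visibility and a natural spot to steer teams toward a better-fitting model.

```python
from collections import Counter

class ModelGateway:
    # A toy gateway: one entry point for all model calls, giving the
    # platform team visibility into which models teams actually use.
    def __init__(self, backends):
        self.backends = backends   # model name -> callable backend
        self.usage = Counter()     # model name -> call count (visibility)

    def complete(self, model: str, prompt: str) -> str:
        if model not in self.backends:
            raise ValueError(f"unknown model: {model}")
        self.usage[model] += 1
        return self.backends[model](prompt)

# Fake backends standing in for real model endpoints.
gateway = ModelGateway({
    "small-fast": lambda p: f"[small-fast] {p[:20]}",
    "large-precise": lambda p: f"[large-precise] {p[:20]}",
})
```

Because experiments also flow through the gateway, a team can trial both backends on the same prompts and compare, while the `usage` counter tells the enablement team which models are actually pulling their weight.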
So this is roughly the paved path we deployed to enable the rest of engineering, really all 9,000 software engineers, to do their jobs. Okay. With all this in place, we had a paved path and some good ideas about how to evolve our codebase and help our people. Now, this is where I find that adopting anything new gives you an opportunity to leverage the strengths you have and to identify some of the weaknesses you may have. At Bloomberg we have a well-established training program, more than 20 years old, with onboarding training tailored to entry level and senior level, to prepare folks before they join a team. What we did was incorporate AI coding into the onboarding training program and show people how to best utilize it with our principles and our technologies. There's a huge benefit here: if any of you have run into the adoption challenge of crossing a chasm, where the rest of the org doesn't adopt as quickly as you'd like, then whenever folks join the company and learn how to do things the new way, they go back to their teams and say, hey, why don't we do that? They'll challenge some of the senior folks too: there's a new way to do this kind of thing, why don't we do it? We've found this program extremely effective as a change agent for anything we want to push out, and it has produced a bunch of results: a lot more familiarity and comfort with the tooling.
Also important: there's a lot more nuanced insight into where the value is. The second mechanism: we often run organization-wide pushes for new initiatives. Within Bloomberg we have something called a champs program and a guild program, basically cross-organization tech communities where people with similar interests and passions get together and get stuff done. We've had this for more than ten years now. We bootstrapped an engineering AI productivity community two years back, leveraging the communities we already had, and saw a few results. Because pretty much everyone passionate about this is in that community, it organically deduplicates efforts, shared learning happens, and it helps boost inner-source contributions and visiting-engineer ideas. Often team A wants to do something that team B, say a platform team, has prioritized differently; the way we solve this is via inner source or via a visiting engineer: we just move someone over to the other team to work for six months or a year, get it done, and then move on. The last one is interesting. Our data shows individual contributors have much stronger adoption than our leadership team. If you think about it, a lot of software TLs and managers in the age of AI don't really have enough experience to truly guide their teams in building software. The things they learned before might not be exactly applicable anymore, still very valuable, but there's a missing piece if they're going to keep guiding their teams to do the right thing. So we're rolling out leadership workshops to make sure our leaders are equipped with the knowledge they need to drive innovation. I'm going to close my part by sharing what I feel most excited about.
The part I feel most excited about is that, with all the creativity and innovation in the generative AI space, the cost function of software engineering has actually changed. Meaning: the trade-off decision of whether we do something or don't do it has changed, because some work has become a lot cheaper to do and some work has become a lot more expensive. I tend to think this is a great opportunity for engineers and engineering leaders to get back to basic principles and ask a soul-searching question: what is high-quality software engineering, and how can we use these tools for that purpose? That's it. Thank you very much. [applause] [music]