What We Learned Deploying AI within Bloomberg’s Engineering Organization – Lei Zhang, Bloomberg

Channel: aiDotEngineer

Published at: 2025-12-16

YouTube video id: Q81AzlA-VE8

Source: https://www.youtube.com/watch?v=Q81AzlA-VE8

I don't have a joke about the dot, and I don't have a joke about the hot dog either, so I'll jump right into the topic. My name is Lei. I lead the technology infrastructure department at Bloomberg. We're basically a group of technologists focused on global infrastructure: think data centers and connectivity, developer productivity (think SCS tooling), and reliability solutions (think telemetry and incident response). Depending on the audience, sometimes you're familiar with what Bloomberg is and sometimes you're not, so I thought it might be a good idea to talk a little bit about our company. And there's no better way to talk about our company than by sharing some numbers.
I want to highlight a few numbers. We have more than 9,000 engineers, and most of them are software engineers. We handle a lot of market ticks, in the billions: 600 billion, I believe. And we also have tons of folks focused on AI research and engineering; today more than 500 employees focus on AI products for our customers.
The takeaway here is that we build a lot of software and use a lot of data to power our flagship product, the Bloomberg Terminal, and to support our users in making the most important financial decisions so they can do their jobs at their best. Through a technical lens, I often explain it this way: we have one of the largest private networks in the world, and we also have one of the largest JavaScript codebases in the world. In the domain where I am, the Bloomberg Terminal is really a piece of software that supports thousands of different applications; we call them functions. Email is a function, news is a group of functions, a fixed-income price-to-yield or spread calculation is another function, trading workflows are another group of functions. There are many, many different types of functions, and as you can imagine, we have to utilize different technologies to support all of those functionalities.
We also increasingly not only use but contribute to open source communities. For this audience, I want to call out that we helped create the Envoy AI Gateway, among many other things that we deploy in-house and support in the communities. Again, in summary: there's a lot of software and a lot of data, and we have to figure out how to make the best of AI tooling to support our engineering work.
All right, so on to AI for coding. We started about two years ago, maybe a little more than that. Like the rest of the world, we looked at the tooling out there, and I apologize if your logos are not here. As you can imagine, it's kind of overwhelming: there are so many things, and every day there's news that this is great, that is great. At the time, we actually didn't know which AI solutions could help us boost our productivity as well as stability. But one thing we knew: unless we deployed and tried them, we wouldn't know the best way to benefit from all the awesome work so many folks are contributing. So we quickly formed a team and released a set of capabilities so that people could start iterating with the tooling. And of course, we're a data company, so we wanted a sense of how to measure the impact and what we could do with the capabilities we provide. So we looked at typical developer productivity measurements.
We ran a few surveys. It was very obvious that people felt proofs of concept came together much more quickly, people rolled out tests, and a lot of one-time-use scripts were being generated. But the measurements dropped pretty quickly once you go beyond greenfield work. So we started thinking: what are the things we should really be doing with all those wonderful tools so we can really make a dent in the space? At this point we also had to be thoughtful about unleashing a very powerful tool: the benefit is that it's very fast, and the challenge is also that it's very fast.
Right. For any of you who have actually dealt with hundreds of millions of lines of code, you probably understand that system complexity is at least a polynomial, if not exponential, function of your live code and software assets. So at some point you want to be very careful about what you do with your software assets. We thought maybe we should look at some of the basics. One idea we had: for AI for coding, there's a narrow definition of what coding is, but there's also a broader definition of what software engineering is. Maybe we could look into some of the work our developers don't really prefer to do, for instance some of the menial work and some of the migration work. So I want to give some examples of the things we've been trying where we think there's a pretty good return on investment.
So the question we asked ourselves: how do we evolve our codebase? The first idea: wouldn't it be cool if, the day you get a ticket saying this piece of software needs to be patched, you also get a pull request with the fix, the patch, and the reasoning for why the patch was done that way? So we're broadly deploying something called uplift agents, which scan through our codebase, figure out where a patch would be applicable, and apply those patches. To step back a little: we did have a regex-based refactoring tool. It worked to some extent, but it was limited. Now, with LLMs and other tooling, we're able to see much better results from the uplift agents.

There are a few challenges, in case you also plan to deploy such capabilities. The first: as with any AI or ML, it would be really nice to have some deterministic verification capability, and oftentimes that's not easy. If you don't have test cases, if you don't have a good linter, if you don't have good verification, the patch can sometimes be difficult to apply. The second thing we realized when we deployed AI tooling is that the average number of open pull requests increased, and time to merge increased too, because you spin up a lot of new code that we still have to review and merge. So time to merge becomes a challenge sometimes. And the last one, which I think applies to any gen AI: the shift is toward what we want to achieve rather than how we want to achieve it.
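The deterministic-verification gate described above can be sketched in a few lines. This is a minimal illustration, not Bloomberg's actual tooling: the `Patch` shape, the `lint`/`test` command names, and the `run` callable are all hypothetical stand-ins for whatever verification a codebase actually has.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

@dataclass
class Patch:
    file: str
    diff: str
    rationale: str  # the "why" attached to the generated pull request

def verify(patch: Patch, run: Callable[[List[str]], int]) -> bool:
    # Deterministic gate: only open a pull request if the linter and
    # the test suite both exit cleanly with the patch applied.
    return run(["lint", patch.file]) == 0 and run(["test", patch.file]) == 0

def uplift(candidates: Sequence[Patch],
           run: Callable[[List[str]], int]) -> Tuple[List[Patch], List[Patch]]:
    # Scan candidate patches: keep the verifiable ones, and flag the
    # rest for human review instead of merging them blindly.
    approved: List[Patch] = []
    needs_review: List[Patch] = []
    for p in candidates:
        (approved if verify(p, run) else needs_review).append(p)
    return approved, needs_review
```

The point of the sketch is the split: without a linter or test suite behind `run`, every patch lands in `needs_review`, which is exactly the "hard to apply" challenge the talk describes.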
The second example I want to share: another area that can really impact our productivity or our stability in a negative way is how we handle incidents. So we're developing and deploying incident response agents. The important thing here is that if you really think about gen AI tools, they're really fast and they're also unbiased. In the midst of an incident, an agent can go through your codebase really quickly, through your telemetry system very quickly, through your feature flags very quickly, through your call traces very quickly, and with an unbiased lens. When we troubleshoot, we sometimes have a biased view: it must be this. And it turns out not to be the case. So there are many, many interesting benefits to deploying agents from this perspective.
Then a second question becomes interesting. Imagine you have an organization of 9,000 people, as I described, with a lot of people trying to fix those problems. You have ten teams who want to build pull request review bots; you have as many teams who want to build incident response agents. It quickly becomes chaotic, and sometimes there's duplication.
Before I talk about the paved path, I'll give an example from the incident response agent. This is basically what an incident response agent looks like. The key part is that we need to build a lot of MCP servers: to connect to the metrics and logs dashboards you have, to your topology (whether that's network topology or your service dependency topology), to your alarms, your triggers, your SLOs. And we don't want people to just start building MCP servers without a paved path, so we created one in partnership with our AI organization, and I'll talk a little bit about what that means.
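The sweep-every-source behavior described above can be illustrated without any MCP SDK. In this sketch, each MCP-style server is reduced to a named callable; the names (`metrics`, `topology`) and the `IncidentAgent` class are invented for illustration and are not Bloomberg's implementation.

```python
from typing import Callable, Dict

class IncidentAgent:
    """Toy incident-response agent: each data source (metrics, logs,
    topology, feature flags, SLOs, call traces) is registered as a
    named tool, and the agent queries every one of them, so no source
    is skipped by a responder's "it must be this" bias."""

    def __init__(self) -> None:
        self.tools: Dict[str, Callable[[str], str]] = {}

    def register(self, name: str, tool: Callable[[str], str]) -> None:
        # In a real deployment this would be an MCP server connection.
        self.tools[name] = tool

    def investigate(self, incident_id: str) -> Dict[str, str]:
        # Query all registered sources for the incident and collect
        # their findings into one report for the model to reason over.
        return {name: tool(incident_id) for name, tool in self.tools.items()}
```

Usage would look like registering one tool per MCP server and calling `investigate("INC-123")` to get a findings dictionary back.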
Before that, I do want to explain some of our platform principles. Some companies allow teams a lot of freedom, and with it responsibility, in the sense that a business unit can build whatever infrastructure, whatever platform. Some organizations have a very strong, tight abstraction over the service infrastructure, and teams typically have to use those platforms. Bloomberg is kind of in the middle: we believe in providing a golden path with enablement teams. My team is really an enabling team, and one of our guiding principles is that we want to make the right thing extremely easy to do, and we want to make the wrong thing ridiculously hard to do. That's the guiding principle here.
Now, moving on: what is the paved path here? We have a gateway so that teams can easily figure out which model works best. They can run quick experiments, we get visibility into what kinds of models are being used, and we can guide teams toward the model that's a better fit for the problem they want to solve. We have tool discovery, basically an MCP directory via a hub: if team A wants to do something, they go to the hub, see that someone is building that MCP server already, and maybe partner with them to build it together. And tool creation and deployment happen via a PaaS, basically a standard platform service where you can run your SDLC. We provide the runtime environment as well, taking care of auth and the surrounding concerns, which reduces the friction for teams deploying their MCP servers.
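The gateway's "guide teams toward the model that fits" role can be sketched as a routing rule over a model catalog. Everything here, the model names, the metadata fields, the latency numbers, is made up for illustration; a real gateway would also handle auth, quotas, and usage visibility.

```python
# Hypothetical model catalog sitting behind a single gateway endpoint.
CATALOG = {
    "small-fast":  {"latency_ms": 50,  "reasoning": False},
    "large-smart": {"latency_ms": 800, "reasoning": True},
}

def route(task: dict) -> str:
    """Pick the lowest-latency model that satisfies the task's
    requirements, mimicking the guidance a gateway team can encode
    once instead of every consuming team rediscovering it."""
    eligible = [
        name for name, meta in CATALOG.items()
        if meta["latency_ms"] <= task["max_latency_ms"]
        and (meta["reasoning"] or not task["needs_reasoning"])
    ]
    return min(eligible, key=lambda n: CATALOG[n]["latency_ms"])
```

A latency-sensitive autocomplete task would route to `small-fast`, while an incident investigation that needs multi-step reasoning would route to `large-smart`.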
And then this is the interesting part: we want to make demos, or really proofs of concept, very easy, so that people can try things and generate ideas, because we believe creativity comes from some freedom to try new things. But we also want production to require quality control, because at the end of the day, stability and system reliability are at the core of our business. So this is the paved path we deployed, enabling the rest of engineering, really the 9,000 software engineers, to do their jobs.
With all this in place, we had a paved path and some good ideas about how to evolve our codebase and help our people. Now, this is where I find that any adoption of new things is an opportunity to leverage the strengths you have and also identify some of the weaknesses you may have. At Bloomberg we have a well-established training program, more than 20 years old: there's onboarding training tailored to entry level and to senior level, a whole program to prepare folks before they join a team. What we did is simply incorporate AI coding into the onboarding training program and show people how to best utilize it with our principles and our technologies. There's a huge benefit here. If any of you have run into the adoption challenge, where you somehow hit a chasm and the rest of the org doesn't adopt as quickly as you'd like: whenever new folks join the company, they learn how to do things the new way, and when they go back to their teams, they say, hey, why don't we do that? They'll challenge some of the senior folks too: there's a new way to do this type of thing, why don't we do it? We've actually found this program extremely effective as a change agent for anything we want to push out. As for results: there's a lot more familiarity and comfort with the tooling, and, importantly, a lot more nuanced insight into where it adds value.
The second one: oftentimes we run cross-organization efforts to push new initiatives. Within Bloomberg we have a champions program and a guild program, basically cross-organization tech communities where people with similar interests and similar passions get together and get stuff done. We've had this for more than ten years now. We bootstrapped an engineering AI productivity community two years back, leveraging the communities we already had, and saw a few results. Because pretty much everyone passionate about this is in that community, it organically deduplicates effort, and shared learning happens. It also helps boost inner-source contributions and visiting-engineer ideas. Oftentimes team A wants to do something and team B, say a platform team, has different priorities. The way we solve this is via inner source or via a visiting engineer: we move someone over to the team to work for six months or a year, get it done, and then move on. The last one is
interesting. Our data shows individual contributors have much stronger adoption than our leadership team. If you think about it, in the age of AI a lot of software TLs and managers don't really have enough experience to truly guide their teams in building software. Oftentimes the things they learned before may not be exactly applicable anymore; still very valuable, but there's a missing piece for them to continue guiding the team to do the right thing. So we're rolling out leadership workshops to make sure our leaders are equipped with whatever knowledge they need to drive technical innovation.
I'm going to close my part by sharing what I feel most excited about. The part I feel most excited about is that, with all the creativity and innovation in the gen AI space, it actually changes the cost function of software engineering. Meaning, the trade-off decision of whether we do something or we don't actually changes, because some of the work becomes a lot cheaper to do and some work becomes a lot more expensive to do. I tend to think this is a great opportunity for engineers and engineering leaders to get back to some of the basic principles and ask a soul-searching question: what is high-quality software engineering, and how can we use these tools for that purpose? So that's it. Thank you very much.
[applause]