Why your product needs an AI product manager, and why it should be you — James Lowe, i.AI

Channel: aiDotEngineer

Published at: 2025-07-28

YouTube video id: xzJdSi2Tsqw

Source: https://www.youtube.com/watch?v=xzJdSi2Tsqw

Hi everyone. Thanks for that welcome. As you just heard, my name is James Lowe. I'm head of AI engineering at the Incubator for AI. We're a small team of experts in the UK government, created by 10 Downing Street to deliver public good using AI, and we do that through experimentation and product building.
The UK government delivers for its citizens. It spends over a trillion pounds delivering for its more than 70 million citizens, so there's a lot to play for. At the Incubator for AI, we deliver a wide range of products, all the way from frontline services up to the Prime Minister's meetings. This remit is very wide, and so we've had to get quite good at deciding what we should build. That's what I'm here to talk to you about today.
I'm going to start with a post from Andrew Ng. He says, "Writing software, especially prototypes, is becoming cheaper. This is not just because of AI coding agents and assistants, but also because AI features in products make the previously impossible possible." He says this will lead to increased demand for people who can decide what to build: AI product management has a bright future.
In this talk, I'm going to build on that post and make the case for the AI product manager. I'm going to argue that AI expertise is really important for this role, and I'm going to share three hard-earned lessons from the Incubator for AI. So whether you're a product manager, an AI engineer, or a founder, I hope there's a lot for you to learn from these lessons to help you build great AI products.
Before I talk about AI product management, I'm going to quickly recap product management itself. This is an extremely rich field, so I'm only going to skim the surface here.
Product management can be thought of as the intersection of three important areas. There's the business: is your product viable? For example, is it going to be profitable? There's the technology: is your product feasible? For example, do we have the right skills on the team? And, most importantly of all, there are the users: is your product desirable? What problem are you solving for them? A product manager sits at the intersection of these three areas and has to balance them all to find the right path forward for the product.
Then AI comes along and makes the whole process a bit more complicated and a bit messier. It intersects with each of these areas in slightly different ways. For the business: is your business happy that AI products need a higher amount of experimentation and carry a higher chance of failure? For the technology: how do you evaluate and monitor the performance of your AI? And for the users: how should you handle the probabilistic nature of AI? In particular, will it work for your users? What guardrails do you need? And how do you build in a human in the loop? Sitting right in the middle of all of this, for AI products, is a big question: is what you're doing even possible? An AI product manager has to resolve all of these areas to find the right path forward.
A lot of the existing product manager skill set is still very important, but there's now increased importance on things like data and AI proficiency. AI product managers need to understand the importance of data, the necessity of evaluation, and how to deal with the probabilistic nature of AI. What that means, if you're a product manager in this room, is the importance of upskilling in AI. And if you're an AI engineer or someone more technical, it means yours is also a good background from which to move into the product manager space. To be clear, when I talk about the product manager space, I think of it as a mindset more than a specific role you need on your team. What's really important is that you have someone on your team grappling with these four areas in order to find the path forward.
As Bret Taylor said on a recent episode of the Latent Space podcast, there is a lot of power in combining product and engineering into as few people as possible. Few great things have been created by committee, and that's exactly the point we're stressing here. So I hope you feel excited by the prospect of adopting that AI product manager mindset. The question now is: what lessons can you learn from the Incubator for AI?
The first lesson comes from our project called Consult, and it's all about evaluating AI early. Every time the government wants to undertake a really big policy change, it needs and wants to get input from the public; in fact, it has a legal duty to do so. It does this via consultations, which are essentially large surveys with free-text responses. The government runs hundreds of these a year, and some attract hundreds of thousands of responses. Analyzing those responses can take months and cost millions of pounds.
This is a prototypical use case for AI, but when we started this project 18 months ago, we weren't sure exactly what path to take. You see, there was already precedent for using natural language processing techniques such as BERTopic to analyze consultations, and we were under a lot of pressure to start delivering. So we made the mistake of going straight into product-building mode. We built a product around those existing techniques, but once we started testing with real users, we found that the results were inaccurate and inconsistent. Not only did they fail to meet user needs, they wouldn't have passed the very high legal threshold we needed to pass. So we went back to the drawing board and instead prioritized the AI capability first.
We got data from real users and generated synthetic data to create evals, which we optimized against. We then started testing the outputs with real users too, and we developed that into a package called Themefinder, which has now been open-sourced so that other people can benefit from it. What we found was that the output of this package was not only comparable to what humans were doing, but a thousand times faster and 400 times cheaper.
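The eval loop described here, labelled examples (real and synthetic), a scored comparison against references, and iteration, can be sketched roughly like this. The data, theme labels, and the keyword "model" standing in for an LLM call are all illustrative assumptions; the open-source Themefinder package has its own API.

```python
# Minimal sketch of an eval harness for consultation theme assignment.
# Everything here (data, themes, keyword_model) is hypothetical.

def jaccard(a: set, b: set) -> float:
    """Overlap between predicted and reference theme sets (1.0 = identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def run_eval(model, eval_set) -> float:
    """Score a theme-assignment model against labelled responses."""
    scores = []
    for example in eval_set:
        predicted = set(model(example["response"]))
        reference = set(example["themes"])
        scores.append(jaccard(predicted, reference))
    return sum(scores) / len(scores)

# A stub "model" standing in for an LLM call.
def keyword_model(text: str) -> list[str]:
    text = text.lower()
    themes = []
    if "bus" in text or "train" in text:
        themes.append("public transport")
    if "cost" in text or "fare" in text:
        themes.append("affordability")
    return themes

# A tiny labelled eval set, mixing real-user and synthetic examples
# as described in the talk.
eval_set = [
    {"response": "Bus fares are too high for daily commuters.",
     "themes": ["public transport", "affordability"]},
    {"response": "The train timetable is unreliable and tickets cost too much.",
     "themes": ["public transport", "reliability"]},
]

print(f"mean theme agreement: {run_eval(keyword_model, eval_set):.2f}")
# → mean theme agreement: 0.67
```

The point of optimizing against a metric like this, rather than eyeballing outputs, is that every prompt or model change gets the same scored comparison, which is what made it possible to claim comparability with human analysts.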
Most importantly of all, by prioritizing the AI capability, we found the key points in the pipeline where human input, human in the loop, was really valuable. That meant the product we then went on to build was actually different from the one we originally envisioned. Starting with the AI capability and getting that right means you don't waste time building something that's not possible, and you also don't waste time building the wrong product.
We've since been evaluating this on live consultations, which leads us very nicely to our first lesson: resolve AI uncertainties early on with evaluations and tests with real users. With those live consultations, we've been creating evaluations which we've published, and the first of those even made its way onto the BBC front page.
I'm going to move on to another product now for our second lesson. That product is our AI transcription tool, Minute, and the lesson is all about going wide with features. There are many use cases in government where secure AI transcription and summarization could be transformational. In many places, frontline staff are spending time away from the job they want to do on administration, filling in paperwork and forms. There are also very good off-the-shelf solutions already, such as the AWS and Azure transcription services. So for this product, the question was more about how to create a streamlined, frictionless experience that gives users this capability.
When we explored this space, we thought there were lots of ways AI features could help users get to that experience, but there were many different ways to do it, and a lot was quite uncertain. We also knew that AI coding assistants and tools could help us build those features really quickly. So we ended up going extremely wide, trying a lot of features with different groups of users, and seeing what worked and what didn't. The important thing is that after that point, we stripped back and focused on what actually worked. One benefit of using AI coding assistants to build those features is that you don't have any sentimental attachment to them, which makes it much easier to strip them out again afterwards.
I'm going to illustrate that point by showing what the tool looked like when it had lots of features, and then after we streamlined it down. Here, a user has already recorded their meeting and has been taken to this page to generate a summary of it. At the top, there's the ability to choose from lots of different templates, because we were experimenting with different users. Some of our users seemed to want the output to follow an agenda from the meeting, so we gave them the option of inputting that agenda information. At the bottom, we had two different AI features: an AI edit button, so they could use free text to edit the output of the meeting, and AI chat, so they could ask questions of the meeting. And that doesn't touch on some of the AI happening behind the scenes, such as automatically predicting the speaker names and adding citations back to the original transcript.
It's no surprise that when we tested this with users, many found it a little overwhelming and a little complicated, and in fact many of them weren't even using these features. Because we were testing with different groups, we also found a specific use case that was well worth pursuing: the probation services.
So what we did next was focus in on that use case and streamline the app down. What we ended up with is Justice Transcribe, built in collaboration with Justice AI, an AI team in the Ministry of Justice. As you can see, it's a lot simpler, because we're focusing on one set of users. We didn't need the template-picking option, and these users didn't need the agenda option, so we could strip it out entirely. With the AI edit and AI chat features, we found overwhelming pressure to merge them into one feature, so we've taken them out and are experimenting heavily to remove that confusion.
We've been getting extremely positive feedback from users, and we're currently taking part in an evaluation comparing us to other tools in the space to work out which are the most impactful. But I hope this illustrates the lesson: experiment hard and go wide with lots of features, lean into the current uncertainty about what makes a good AI feature, but then cut back and streamline the app down.
I thought you might be interested in another use case we've been exploring: prime ministerial meetings. This was actually from a recent meeting, the first ever prime ministerial meeting where AI was used to transcribe and summarize, and it was done using our tool.
For the final lesson, I'm going to tell you a little bit about Redbox, and the lesson is all about being ready to pivot. For those of you who don't know, all of our government ministers carry around a big red box full of paperwork, submissions, and important decisions they have to make. Their private offices do a lot of work to collect and summarize all that information to put it in the red box. Again, this is a prototypical use case for AI, so it was no surprise that the idea to digitize the red box was the winning idea at a hackathon run by one of our sister teams, Evidence House. That became the first incarnation of Redbox: digitizing the ministerial red box.
We took that winning idea from the hackathon and built it into a full product. However, when we tested it with real users, we found that the feature they wanted above all else, and the only one they really cared about, was simply the ability to securely chat with a large language model. You see, this was over a year ago, when the ability for enterprises to chat with large language models was definitely rarer, particularly in the civil service. People were familiar with the value they could get from things like ChatGPT, but they couldn't put their work information into it. This led to the second incarnation of Redbox: to be the easiest and cheapest way for civil servants to securely chat with a large language model.
This also gave us an opportunity. In a lot of our other tools, we were experimenting with ways of making government-specific data more accessible and easier to navigate. For example, we had a product called Parlex, which was all about making parliamentary and legislative data more available. But we were developing these as independent products with their own user interfaces. The opportunity we saw was to use Redbox as that interface: why not bring these tools and products into the chat interface we were already creating and that lots of people already had access to? That's why our third incarnation was to be the client for accessing the Incubator for AI's tools and data.
It's worth saying that the second incarnation validated itself as a useful use case: we launched it within the Cabinet Office, and within just a matter of weeks we had thousands of users. That's why we knew it would be a useful front end for some of our other tools as well.
Then two important things happened. The first is that the commercial landscape changed: Microsoft announced that Copilot Chat, their enterprise version of ChatGPT, was going to be free for enterprise Microsoft users, and much of the government is an enterprise Microsoft user. The second is that Anthropic's Model Context Protocol exploded onto the scene, providing a standardized way to bring tools and data to models.
This meant we had to pivot again. It no longer made sense to bank on Redbox being the main way civil servants would access secure chat with large language models, and it no longer made sense for it to be the only way for people to access our tools and data. So instead, we've been investing hard in using the Model Context Protocol to bring our tools and data to any client, whether that's Redbox, Copilot Chat, or enterprise versions of other tools like Anthropic's or ChatGPT.
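To give a feel for the pattern MCP standardizes, a server advertises tools with machine-readable descriptions, and any client can discover and call them, here's a deliberately protocol-free sketch in plain Python. The tool name and registry are invented for illustration (loosely Parlex-flavoured); real MCP servers use the official SDKs and JSON-RPC transport rather than anything shown here.

```python
# Toy illustration of the tool-exposure pattern the Model Context Protocol
# standardizes: a registry advertises tools with descriptions and input
# schemas, and a client can list and invoke them. Everything is hypothetical.

TOOLS = {}

def tool(name: str, description: str, schema: dict):
    """Register a function as a discoverable tool."""
    def register(fn):
        TOOLS[name] = {"description": description, "schema": schema, "fn": fn}
        return fn
    return register

@tool(
    name="search_legislation",  # hypothetical tool name
    description="Search UK legislation titles by keyword.",
    schema={"type": "object", "properties": {"query": {"type": "string"}}},
)
def search_legislation(query: str) -> list[str]:
    corpus = ["Data Protection Act 2018", "Environment Act 2021"]
    return [title for title in corpus if query.lower() in title.lower()]

def list_tools() -> list[dict]:
    """What a client sees when it asks the server for its capabilities."""
    return [{"name": n, "description": t["description"], "schema": t["schema"]}
            for n, t in TOOLS.items()]

def call_tool(name: str, arguments: dict):
    """Dispatch a client's tool call to the registered function."""
    return TOOLS[name]["fn"](**arguments)

print(list_tools()[0]["name"])                           # search_legislation
print(call_tool("search_legislation", {"query": "data"}))
```

The appeal of the pattern is exactly what the pivot exploited: once tools and data are described this way, the same server can back Redbox, Copilot Chat, or any other MCP-capable client without a bespoke interface for each.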
It's worth stressing that throughout that time, Redbox has been, and still is, valuable. It's still the main way a lot of people in the Cabinet Office access secure chat with a large language model. But I hope this lesson shows that things are moving really quickly, and it's really important to evolve and change with them; otherwise, you get stuck on the wrong path. That's why our third lesson is: you'll have to pivot harder and faster than ever before.
So, let's recap the four lessons we've covered today. And yes, there were four. Lesson zero: the importance of AI product managers, a vital role which requires AI expertise. Lesson one: evaluate AI early. Resolve AI uncertainties early on with evaluations and tests with users. Lesson two: go wide with features. Experiment hard with new features on real users, then cut back. And lesson three: be ready to pivot. You'll have to pivot harder and faster than ever before.
Now, some of you are probably sitting there thinking: how much of this is really new? That's a fair question to ask. There's a lot of wisdom in existing product management that feels very familiar to what we're covering here. For example, the principle of resolving your biggest uncertainties first has been around for a long time, as has putting your users first, listening to them, and testing features with them. However, I hope these lessons have emphasized that AI really does make things different. Resolving AI's uncertainties really is something you have to do, and it's a bit more challenging given the extra need for experimentation and evaluation. For lesson two, going wide with features, AI really does change the landscape: it makes it easier to build features faster with less attachment to them, so you should be doing exactly that, testing those features and scaling back, especially since AI features are new and there's more uncertainty about exactly what makes a good one. And finally, the AI landscape is changing extremely quickly, which is why pivoting is more necessary now than ever before.
So I hope those lessons have been useful, and I hope you feel ready to step up into that AI product manager mindset that your product needs. Thank you so much for listening, and please do check us out. We're currently hiring. Thank you.