Minimax M2: Building the #1 Open Model – Olive Song, MiniMax

Channel: aiDotEngineer

Published at: 2025-12-13

YouTube video id: lY1iFbDPRlw

Source: https://www.youtube.com/watch?v=lY1iFbDPRlw

[music]
Hi everyone. I'm Olive. It's my great honor to be here today to present our new model, MiniMax M2. I actually lived in New York City for six years, so it feels great to come back, though in a different role. I currently work on reinforcement learning and model evaluation at MiniMax. Let me get a quick sense of the room: who here has heard of or tried MiniMax before? A couple of you. Not everybody, but that's the value of me standing here today.

We are a global company that works on both foundation models and applications. We develop multimodal models, including text and vision-language models, our video generation model Hailuo, and speech and music generation models, and we also build many applications in-house, including agents. That's what sets us apart from other labs: we develop both foundation models and applications, so researchers and developers sit side by side working on things. Our advantage is the firsthand experience from our in-house developers feeding into models that developers in the community really need.

Here I want to introduce MiniMax M2, an open-weight model, very small with only 10 billion active parameters, designed specifically for coding and agentic workplace tasks. It's very cost-efficient.
Let me go over the benchmark performance first, because people care about it. We rank near the top on both intelligence benchmarks and agent benchmarks; I believe we're at the top among the open-source models. But numbers don't tell everything, because sometimes you take a model with super high numbers, plug it into your environment, and it just doesn't work. So we really care about the dynamics in the community, and in our first week we had the most downloads, and we climbed to top three in token usage on OpenRouter. We're very glad that people in the community are really bringing our model into their development cycles.
Today I want to share how we actually shaped the model characteristics that make M2 so good for your coding experience. I'll walk you through the training behind each of them: from coding experience, to long-horizon state-tracking tasks, to robust generalization across different scaffolds, to multi-agent scalability.
First, let's talk about coding experience, which we supported with scaled environments and scaled experts.

Developers need a model that can actually work in the languages they use and across the workflows they deal with every day. That means we need to use real data from the internet and then scale the number of environments, so that during training, for example during reinforcement learning, the model actually reacts to the environment, targets verifiable coding goals, and learns from them. That's why we scaled both the number of environments and our infrastructure, so that we can run that training very efficiently.
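To make "verifiable coding goals" concrete, here is a minimal sketch of what a verifiable reward might look like: run the environment's tests against the model's code and return a binary signal an RL trainer can optimize. The function name and the file layout are illustrative assumptions, not MiniMax's actual pipeline.

```python
import os
import subprocess
import sys
import tempfile

def verifiable_coding_reward(candidate_code: str, test_code: str) -> float:
    """Run the environment's tests against model-generated code.

    Returns 1.0 if the tests pass, 0.0 otherwise -- an automatically
    checkable signal, unlike a human preference score.
    """
    with tempfile.TemporaryDirectory() as tmp:
        # Write the model's solution and the environment's tests to disk.
        with open(os.path.join(tmp, "solution.py"), "w") as f:
            f.write(candidate_code)
        test_path = os.path.join(tmp, "test_solution.py")
        with open(test_path, "w") as f:
            f.write(test_code)
        # Execute the tests in a subprocess; a non-zero exit means failure.
        result = subprocess.run(
            [sys.executable, test_path],
            cwd=tmp, capture_output=True, timeout=60,
        )
        return 1.0 if result.returncode == 0 else 0.0
```

Scaling environments then means scaling the number of such test-backed tasks the model can be trained against.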
With that data construction and reinforcement learning, we were able to train a model that is very strong, full-stack, and multilingual. And what I want to mention here is that besides scaling environments, which everybody talks about, we also scaled something we call expert developers as reward models. As I mentioned, we have a ton of expert developers in house who can give feedback on the model's performance. They participated closely in the model development and training cycle, including problem definition, for example bug fixing and repo refactoring. They also identified the model behaviors that developers enjoy, what's reliable, and what developers would trust, and they gave precise rewards and evaluations on the model's behaviors and final deliverables, so that it's a model developers really want to work with, one that adds to developers' efficiency.

With that, we were able to lead in many languages in real use.
The second characteristic of MiniMax M2 is that it performs well on long-horizon tasks: tasks that require interacting with complex environments and using multiple tools with reasoning. We supported that with an interleaved thinking pattern and reinforcement learning.
So what is interleaved thinking? A normal reasoning model that can use tools typically works like this: it gets the tool information, the system prompt, and the user prompt; the model thinks, then calls tools, possibly several at the same time; it gets the tool responses from the environment; then it does a final round of thinking and delivers the final content. But here's the truth: in the real world, environments are often noisy and dynamic. You can't complete a task in a single pass. You can get tool errors, unexpected results from the environment, and so on. So we looked at how humans interact with the world: we act, we get feedback, we think about whether the feedback is good, and then we take other actions and make other decisions. We did the same thing with M2. If you look at the diagram on the right, instead of stopping after one round of tool calling, the model thinks again and reacts to the environment to check whether the information is enough to get what it wants. We call this interleaved thinking, because the model interleaves thinking with tool calling, and that can run for tens to a hundred turns of tool calling within a single user interaction turn.
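The loop just described can be sketched as follows. This is a hedged outline, not M2's actual inference code: `call_model` and `run_tool` are hypothetical stand-ins for a model endpoint and a tool executor, and the message schema is illustrative.

```python
def run_agent(call_model, run_tool, user_prompt, max_turns=100):
    """Alternate thinking and tool use until the model emits a final answer.

    Instead of one think -> tool -> answer pass, the model re-thinks after
    every tool response, so it can react to errors and noisy environments
    over tens of turns within a single user request.
    """
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(max_turns):
        step = call_model(messages)  # thinking plus tool calls or a final answer
        messages.append({
            "role": "assistant",
            "thinking": step["thinking"],
            "content": step.get("answer"),
            "tool_calls": step.get("tool_calls", []),
        })
        if "answer" in step:          # final content: we're done
            return step["answer"]
        for call in step["tool_calls"]:  # several tools may be called at once
            result = run_tool(call)       # may fail or return noisy output
            messages.append({"role": "tool", "content": result})
    raise RuntimeError("turn budget exhausted without a final answer")
```

The key difference from the single-pass pattern is that the model sees every tool response before deciding its next move, rather than committing to a plan up front.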
This helps with adaptation to environment noise. As I mentioned, the environment isn't stable all the time; when something is suboptimal, the model can choose other tools or make other decisions. It can focus on long-horizon tasks and automate your workflow, using for example Gmail, Notion, and a terminal all at the same time. You may only need to make one model call, and with minimal human intervention it can do it all by itself. And here's a fitting illustration on the right, since it's New York City and I feel the trading-and-markets vibe: there were some perturbations in the stock market last week, I believe, and our model was able to stay stable. Just as I said, there's environment noise, new information, news, other trading policies and so on, but our model performed quite stably in these kinds of environments.
The third characteristic is robust generalization to many agent scaffolds, which we supported with perturbations in the data pipeline.

We want our agent to generalize. But what is agent generalization? At first, we thought it was just tool scaling: train the model with enough tools, varied tools, new tools, even invented tools, and it will perform well on unseen tools. That was partly true, and it worked at first. But we soon realized that if we perturb the environment a little, for example by switching to another agent scaffold, the model doesn't generalize. So what is agent generalization? We concluded that it's adaptation to perturbations across the model's entire operational space. Think back to what that operational space contains: the tool information, the system prompt, the user prompt, the chat template, the environment, the tool responses. All of these can differ. So we designed and maintained perturbation pipelines for our data, so that our model can actually generalize to a lot of agent scaffolds.
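As a toy illustration of this idea, a perturbation pipeline might re-render each training sample under randomized scaffold components: a different system prompt, renamed tool schemas, and so on, while keeping the task itself fixed. Everything here (the prompt texts, the tool aliases, the sample layout) is a made-up example of the concept, not MiniMax's actual pipeline.

```python
import random

# Hypothetical pool of scaffold-level system prompts to sample from.
SYSTEM_PROMPTS = [
    "You are a coding agent. Use the tools provided.",
    "Act as an autonomous software engineer.",
]

# Hypothetical surface-form aliases: same semantics, different names.
TOOL_ALIASES = {
    "read_file": ["open_file", "cat", "view"],
    "run_shell": ["bash", "exec", "terminal"],
}

def rename_tool(tool, rng):
    """Rewrite a tool's surface form without changing what it does."""
    new = dict(tool)
    new["name"] = rng.choice(TOOL_ALIASES.get(tool["name"], [tool["name"]]))
    return new

def perturb_sample(sample, seed=0):
    """Produce a scaffold-perturbed copy of one training sample."""
    rng = random.Random(seed)
    return {
        "system": rng.choice(SYSTEM_PROMPTS),   # vary the system prompt
        "tools": [rename_tool(t, rng) for t in sample["tools"]],
        "user": sample["user"],                 # the task itself is unchanged
    }
```

Training on many such perturbed copies pushes the model to bind to a tool's function rather than its exact name or the scaffold's exact prompt wording.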
The fourth characteristic I want to mention is multi-agent scalability, which is very practical with M2 because it's so small and cost-effective.

I have a couple of videos here. This is M2 powering our own MiniMax Agent app. We have a QR code below, so if you want it, you can just scan and try it. Here you can see different copies of M2: one can do research, one can write up and analyze the research results into a report, one can turn it into some kind of front-end illustration, and they can all work in parallel. Because the model is so small and so cost-effective, it can really support long-running agentic tasks and tasks that require this kind of parallelism.
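The parallel pattern in the demo can be sketched with a simple fan-out: several copies of a small, cheap model each take a subtask, run concurrently, and return their results. `query_m2` below is a hypothetical placeholder for a real call to an M2 endpoint.

```python
import asyncio

async def query_m2(subtask: str) -> str:
    """Placeholder for one model call; real code would hit an API endpoint."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"result for: {subtask}"

async def run_in_parallel(subtasks):
    """Fan out one copy of the model per subtask and gather the results.

    Because each call only activates 10B parameters, running many copies
    at once stays affordable, which is what makes this pattern practical.
    """
    return await asyncio.gather(*(query_m2(t) for t in subtasks))

results = asyncio.run(
    run_in_parallel(["research", "write report", "build front end"])
)
```

The same shape applies whether the subtasks are research, report writing, or front-end generation, as in the demo.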
So what's next for MiniMax M2? From what I've introduced, we gathered environments, algorithms, data, expert values, model architecture, inference, and evaluation, all of it, to build a model that is fast, intelligent, able to use tools, and able to generalize. What's next, for M2.1 and M3 and beyond? We're thinking about better coding, maybe memory and context management, proactive AI for the workplace, and vertical experts. And because we have those great audio generation and video generation models, maybe we can integrate them. But above all, our mission is that we're committed to bringing all these resources, whatever is on the screen and maybe more, and our values, together to develop models for the community to use. We really need feedback from the community, because we want to build this together. This is a race that everyone needs to participate in, and we are committed to sharing it with the community.
That's all the insight for today. Again, we really hope you'll try the model, because it's pretty good. You can contact us up there, and you can try the models by scanning the QR code. That's basically it. Thank you all for listening.
[music]