Minimax M2: Building the #1 Open Model – Olive Song, MiniMax
Channel: aiDotEngineer
Published at: 2025-12-13
YouTube video id: lY1iFbDPRlw
Source: https://www.youtube.com/watch?v=lY1iFbDPRlw
[music] Hi, everyone. I'm Olive. It's my great honor to be here today to present our new model, MiniMax M2. I actually lived in New York City for six years, so it feels great to come back, but in a different role. I currently study reinforcement learning and model evaluation at MiniMax. Let me just get a quick sense of the room. Who here has heard of or tried MiniMax before? Oh, a couple of you. Not everybody, but I guess that's the value of me standing here today. So we are a global company that works on both foundation models and applications. We develop multimodal models, including text and vision-language models, our video generation model Hailuo, and speech and music generation, and we also have many applications in-house, including agents. That's the specific thing that's different from other labs and companies: we develop both foundation models and applications, so we have researchers and developers sitting side by side working on things. Our difference is that we get firsthand experience from our in-house developers, which goes into building models that developers in the community really need. And here I want to introduce MiniMax M2, an open-weight model, very small with only 10 billion active parameters, that was designed specifically for coding and workplace agentic tasks. It's very cost-efficient. Let me just go over the benchmark performance, because people care about it. We rank at the very top on both intelligence benchmarks and agent benchmarks. I think we're at the top of the open-source models. But numbers don't tell everything, because sometimes you get those models with super high numbers, you plug them into your environment, and they suck, right?
So we really care about the dynamics in the community. In our first week we had the most downloads, and we also climbed to the top three in token usage on OpenRouter. So we're very glad that people in the community are really loving our model in their development cycle. So today what I want to share is how we actually shaped the main model characteristics that make M2 so good in your coding experience. I'm going to present the training behind each of them: from coding experience, to long-horizon state-tracking tasks, to robust generalization across different scaffolds, to multi-agent scalability. So first let's talk about the coding experience, which we supported with scaled environments and scaled experts. Developers need a model that can actually work in the languages they use and across the workflows they deal with every day. That means we need to use real data from the internet and scale the number of environments, so that during training, for example during reinforcement learning, the model can actually react to the environment, target verifiable coding goals, and learn from them. That's why we scaled both the number of environments and our infrastructure, so that we can run that training very efficiently. With data construction and reinforcement learning, we were able to train a model that is very strong, full-stack, and multilingual. And what I want to mention here is that besides scaling environments, which everybody talks about, we actually scaled something else: expert developers as reward models. As I mentioned before, we have a ton of super expert developers in-house who can give us feedback on our model's performance.
So they participated closely in the model development and training cycle, including problem definition, for example bug fixing, repo refactoring, and things like that. They also identify the model behaviors that developers enjoy, what's reliable, and what developers would trust, and they give precise rewards and evaluations of the model's behaviors and final deliverables, so that it is a model developers really want to work with and that adds efficiency for developers. With that, we were able to lead in many languages in real use. And the second characteristic of MiniMax M2 is that it performs well on long-horizon tasks: long tasks that require interacting with complex environments and using multiple tools with reasoning. We supported that with the interleaved thinking pattern and reinforcement learning. So what is interleaved thinking? A normal reasoning model that can use tools usually works like this: it is given the tool information, the system prompt, and the user prompt, and then the model thinks and calls tools. It can be a couple of tools at the same time. Then it gets the tool responses from the environment, performs a final round of thinking, and delivers the final content. But here's the truth, right? In the real world, environments are often noisy and dynamic. You can't really complete a task in just one pass. You can get tool errors, for example, or unexpected results from the environment. So what we did is imagine how humans interact with the world. We look at something, we get feedback, and then we think about it. We think about whether the feedback is good or not, and then we take other actions and make other decisions. And that's why we did the same thing with our M2 model. So let's look at this chart, or diagram, on the right.
So instead of just stopping after one round of tool calling, it actually thinks again and reacts to the environment to see whether the information is enough for it to get what it wants. We call it interleaved thinking, or people call it interleaved thinking, because it interleaves thinking with tool calling. It can do this many times, from tens up to 100 turns [clears throat] of tool calling within just one user interaction turn. This helps it adapt to environment noise. Just like I mentioned, the environment is not stable all the time, and when something is suboptimal it can choose to use other tools or make other decisions. It can stay focused on long-horizon tasks and automate your workflow using, for example, Gmail, Notion, and the terminal, all at the same time. You just need to make maybe one model call, and with minimal human intervention it can do it all by itself. And here's a cool illustration on the right. Because it's New York City, I feel the vibe of, you know, trading and marketing. You can see that there were some perturbations in the stock market, I think last week, and our model was able to keep it stable. So just like I said, there's environment noise, there's new information, there's news, there are other trading policies and things like that, but our model was able to perform pretty stably in these kinds of environments. And the third characteristic is our robust generalization to many agent scaffolds, which was supported by our perturbations in the data pipeline. We want our agent to generalize. But what is agent generalization? At first we thought it was just tool scaling: we train the model with enough tools, various tools, new kinds of tools, even tools we invent, and then it will just perform well on unseen tools. Well, that was kind of the truth. It worked at first.
But then we soon realized that if we perturb the environment a little bit, for example by switching to another agent scaffold, it doesn't generalize. So what is agent generalization? We concluded that it's adaptation to perturbations across the model's entire operational space. If we think back, what's the model's operational space that we talked about? It can be the tool information, the system prompt, the user prompt, and they can all be different. It can be the chat template, the environment, the tool responses. So what we did is design and maintain perturbation pipelines for our data, so that our model can actually generalize to a lot of agent scaffolds. And the fourth characteristic I want to mention is multi-agent scalability, which is very feasible with M2 because it's so small and cost-effective. I have a couple of videos here. This is M2 running in our own MiniMax Agent app. We actually have a QR code down below, so if you want, you can just scan it and try it. It's an agent app that we developed. And here we can see different copies of M2, right? It can do research, it can write up the research results, analyze them, and put them in a report, it can put them into some kind of front-end illustration, and the copies can work in parallel. So because it is so small and so cost-effective, it can really support those long-running agentic tasks and tasks that maybe require some kind of parallelism.
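The talk does not describe how the perturbation pipelines are built. As a rough illustration of the idea only, the sketch below perturbs the parts of the operational space listed above (system prompt, tool information, chat template, tool responses) for a single training sample. The sample layout, field names, prompt variants, template names, and error string are all hypothetical and are not taken from MiniMax's pipeline.

```python
import copy
import random

# Hypothetical sample layout; the field names are illustrative only.
SAMPLE = {
    "system_prompt": "You are a coding agent.",
    "user_prompt": "Fix the failing test in utils.py.",
    "tools": [
        {"name": "read_file", "description": "Read a file from disk."},
        {"name": "run_tests", "description": "Run the test suite."},
    ],
    "tool_responses": ["def add(a, b): return a - b", "1 failed, 3 passed"],
    "chat_template": "chatml",
}

# Assumed variants, made up for this sketch.
SYSTEM_PROMPT_VARIANTS = [
    "You are a coding agent.",
    "You are a helpful assistant that can call tools to edit code.",
    "Act as a software engineer. Use the available tools when needed.",
]
CHAT_TEMPLATES = ["chatml", "xml_tags", "plain_roles"]


def vary_system_prompt(sample, rng):
    """Swap in a differently worded system prompt."""
    sample["system_prompt"] = rng.choice(SYSTEM_PROMPT_VARIANTS)


def shuffle_tool_order(sample, rng):
    """Present the same tools in a different order."""
    rng.shuffle(sample["tools"])


def swap_chat_template(sample, rng):
    """Render the conversation with a different chat template label."""
    sample["chat_template"] = rng.choice(CHAT_TEMPLATES)


def inject_tool_error(sample, rng):
    """Replace one tool response with an error to simulate a noisy environment."""
    i = rng.randrange(len(sample["tool_responses"]))
    sample["tool_responses"][i] = "ERROR: tool call timed out"


PERTURBATIONS = [
    vary_system_prompt,
    shuffle_tool_order,
    swap_chat_template,
    inject_tool_error,
]


def perturb(sample, rng, p=0.5):
    """Return a perturbed copy of `sample`; each perturbation fires with probability p."""
    out = copy.deepcopy(sample)  # never mutate the original sample
    for fn in PERTURBATIONS:
        if rng.random() < p:
            fn(out, rng)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    for variant in (perturb(SAMPLE, rng) for _ in range(3)):
        print(variant["chat_template"], [t["name"] for t in variant["tools"]])
```

Passing a seeded `random.Random` keeps the variants reproducible, and each perturbation touches only one dimension, so new dimensions (for example a different agent scaffold's tool schema) could be added as extra functions in the list.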
So what's next for MiniMax M2? From what I've introduced, we gathered environments, algorithms, data, expert values, model architecture, inference, and evaluation, all of this, to build a model that is fast, that is intelligent, that can use tools, and that generalizes. As for what's next, for M2.1 and M3 and further in the future, we're thinking of better coding, maybe memory work, context management, proactive AI for the workplace, and vertical experts. And because we have those great audio generation and video generation models, maybe we can integrate them. But our overall mission is that we're committed to bringing all these resources, whatever is on the screen and maybe more, and values, and putting them all together to develop models for the community to use. So we really need feedback from the community if possible, because we want to build this together. This is kind of a race that everyone needs to participate in, and we are committed to sharing it with the community. And that's all the insight for today. Again, we really hope you try the model, because it's pretty good. You can contact us up there, and you can try the models by scanning the QR code. Basically that's it. Thank you all for listening. [music]