
Xiaomi MiMo V2.5: The 310B model that just ate Claude Opus on token efficiency.

2026-05-06
ai.cc — model_review.md
~/posts/2026/   ENTRY 0427
Open-Source AI · Filed Apr 2026


Xiaomi's MiMo V2.5 is the most consequential open-weight release of Q2 2026 — a 310B sparse Mixture-of-Experts model with native multimodal understanding, a 1M-token context window, and benchmark numbers that put it neck-and-neck with Claude Opus and Gemini 3 Pro while burning 40–60% fewer tokens. Here is the full breakdown: architecture, benchmarks, real-world tasks, pricing, and how it stacks up against the closed-source frontier.

Model     MiMo-V2.5 / V2.5-Pro
Params    310B / 1.02T
Context   1,048,576 tokens
License   Open Weights · MIT

FIG.01  MiMo V2.5 — Xiaomi's flagship open-weight release, April 2026

What is Xiaomi MiMo V2.5?

MiMo V2.5 is the latest model family from Xiaomi's MiMo team, released in late April 2026 and pushed straight to Hugging Face as open weights. There are actually two flagship models in the drop, plus a TTS suite and an ASR model — and that distinction matters because most of the hype online conflates them.

The line splits like this:

  • MiMo-V2.5 — The "Omni" multimodal generalist. 310B total params, 15B active, sparse MoE architecture, trained on 48T tokens. Native vision and audio understanding. The all-rounder.
  • MiMo-V2.5-Pro — The "Agent" specialist. 1.02T total params, 42B active. Same hybrid attention backbone but tuned hard for long-horizon coding and trajectories spanning thousands of tool calls.
  • MiMo-V2.5-TTS — A three-model voice suite (TTS, VoiceDesign, VoiceClone) for production speech generation, with style-instruction control over speed, emotion, and tone.
  • MiMo-V2.5-ASR — End-to-end speech recognition that handles Chinese dialects (Wu, Cantonese, Hokkien, Sichuanese), code-switched speech, song lyrics, and noisy acoustic environments.

Both flagship models share an in-house hybrid sliding-window attention architecture inherited from MiMo-V2-Flash, with dedicated visual and audio encoders connected through lightweight projectors. Both ship with a native 1,048,576-token context window. Neither charges a context-length multiplier — Xiaomi removed that on launch day.
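The token-efficiency story has a structural root: in a sparse MoE, only the active parameters participate in each forward pass. A minimal sketch of what that buys, using the common ~2 × N_active FLOPs-per-token rule of thumb (the rule of thumb is an approximation, not a figure Xiaomi published; parameter counts are from the spec table above):

```python
# Rough per-token forward-pass compute for the two MiMo variants versus a
# hypothetical dense 310B model, using the ~2 * N_active FLOPs rule of thumb.
# Parameter counts are from the spec table above; the rule of thumb is an
# approximation, not a published figure.

def flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs per generated token."""
    return 2 * active_params

mimo_v25 = flops_per_token(15e9)      # MiMo-V2.5: 15B active of 310B total
mimo_pro = flops_per_token(42e9)      # MiMo-V2.5-Pro: 42B active of 1.02T total
dense_310b = flops_per_token(310e9)   # hypothetical dense model at full size

print(f"MiMo-V2.5:     {mimo_v25:.1e} FLOPs/token")
print(f"MiMo-V2.5-Pro: {mimo_pro:.1e} FLOPs/token")
print(f"Dense 310B:    {dense_310b:.1e} FLOPs/token, "
      f"~{dense_310b / mimo_v25:.0f}x the sparse base model")
```

This is compute only, not memory: every expert must still be loaded, which is why self-hosting still calls for a multi-GPU node.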

Xiaomi did not release a frontier model that matches Claude on intelligence. They released a frontier model that matches Claude on intelligence at roughly half the token cost — which is the only number that matters once you stop talking and start shipping.

MiMo V2.5 vs Claude Opus, Gemini 3 Pro, GPT-5.4

The headline benchmark — and the one Xiaomi led the launch with — is ClawEval, a multi-turn agentic task suite where the model has to plan, call tools, and iterate over long horizons. This is the benchmark that maps to actual production agentic workloads, and it is where MiMo V2.5 looks the strongest.

Model            | ClawEval Pass³ | Tokens / Trajectory | Cost-Adjusted Rank
MiMo V2.5-Pro    | 63.8–64.0%     | ~70K                | #1 (Pareto frontier)
MiMo V2.5 (base) | 62.3%          | ~75K                | Tied frontier
Claude Opus 4.6  | ~65.4%         | ~120–175K           | Higher cost
Gemini 3.1 Pro   | ~63%           | ~115K               | Higher cost
GPT-5.4          | ~62%           | ~110K               | Higher cost

The takeaway: Claude Opus 4.6 still has a slight edge on raw capability, but MiMo V2.5-Pro hits the same neighborhood while spending roughly 40–60% fewer tokens to get there. On a pricing-per-trajectory basis, this is not a rounding error. As VentureBeat noted, in a world where GitHub Copilot and most agent platforms are moving to usage-based billing, this token efficiency translates directly into real money for any team running agents at scale.
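Those trajectory numbers translate into dollars directly. A back-of-envelope sketch combining the token counts above with the API prices quoted later in this post, assuming an illustrative 80/20 input/output split (the split is my assumption; real trajectories vary):

```python
# Cost per ClawEval-style trajectory at the per-1M-token prices quoted in
# this article ($1/$3 for V2.5-Pro, $5/$25 for Claude Opus). The 80/20
# input/output split is an illustrative assumption.

def trajectory_cost(tokens: int, input_price: float, output_price: float,
                    input_frac: float = 0.8) -> float:
    """USD for one trajectory, prices given per 1M tokens."""
    inp = tokens * input_frac
    out = tokens * (1 - input_frac)
    return (inp * input_price + out * output_price) / 1e6

mimo_pro = trajectory_cost(70_000, 1.00, 3.00)    # ~70K tokens/trajectory
opus = trajectory_cost(150_000, 5.00, 25.00)      # ~150K, mid-range of 120-175K

print(f"MiMo V2.5-Pro: ${mimo_pro:.3f} per trajectory")
print(f"Claude Opus:   ${opus:.3f} per trajectory, ~{opus / mimo_pro:.0f}x more")
```

Cents versus dollars per trajectory: at thousands of trajectories a month, the split is the unit-economics story, not a benchmark footnote.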

On other benchmarks, the picture is that of a coding-first specialist:

  • SWE-bench Pro: 57.2% — within half a point of Claude Opus 4.6 and GPT-5.4.
  • Terminal-Bench 2.0: Leads Opus 4.6 and Gemini 3.1 Pro outright.
  • Video-MME: 87.7 — on par with Gemini 3 Pro on video understanding.
  • GDPVal-AA (Elo): 1581 — surpasses Kimi K2.6 and GLM 5.1.
  • Long-context recall (1M): 0.37 BFS / 0.62 Parents — where most competitors collapse to near-zero past 512K.

Where it lags: HLE (Humanity's Last Exam) and GDPVal-AA broad reasoning — both reward general-purpose breadth over coding-specialist depth. If you need a tutor or a polymath, this is not your model. If you need an agent that ships code, it absolutely is.

FIG.02 Hybrid sparse MoE architecture — V2.5's structural cost advantage

What can MiMo V2.5-Pro actually do?

Benchmarks are one thing. Xiaomi went further and published four multi-hour autonomous task runs — the kind of work where the agent can't be hand-held. These are the demos worth taking seriously, because they include the full tool-call trace.

TASK / 01 · SOFTWARE ENG · SysY Compiler in Rust · 233 / 233
Built a complete compiler from scratch — lexer, parser, AST, Koopa IR codegen, RISC-V backend. 4.3 hours, 672 tool calls. Perfect score against Peking University's hidden test suite (a project that takes a CS major several weeks).

TASK / 02 · APPLICATION · Desktop Video Editor · 8,192 LOC
From a few prompts: multi-track timeline, clip trimming, cross-fades, audio mixing, export pipeline. 11.5 hours, 1,868 tool calls. AI voice-over driven by MiMo-V2-TTS.

TASK / 03 · HARDWARE EDA · FVF-LDO Analog Circuit · ~1 hour
Designed and optimized a low-dropout regulator in TSMC 180nm CMOS using ngspice in a closed loop. Six metrics simultaneously hit spec; four improved by an order of magnitude over the model's first attempt.

TASK / 04 · HARNESS AWARENESS · Self-Managed Context · 1M tokens
Across all four runs, V2.5-Pro demonstrated "harness awareness" — actively managing its own memory, shaping its own context window, and steering toward final objectives across thousands of sequential tool calls.

The Rust compiler run is the one to internalize. It is not a toy. It is a real PKU course project with a real hidden test suite, and a frontier closed-source model would have struggled to do it in one shot at that token budget. This is what the phrase "long-horizon coherence" actually looks like in production.

MiMo V2.5 pricing — and why it's the real story

Here is where the open-source positioning gets interesting. MiMo V2.5 ships under open weights on Hugging Face for self-hosting, but Xiaomi also runs a hosted API with aggressive pricing — and a "Token Plan" subscription model that mirrors Claude Code and OpenAI's flat-rate offerings.

API Pricing — per 1M tokens (overseas) · UPDATED 2026-04

Model            | Input | Output | vs Opus 4.7
MiMo V2.5 (base) | $0.40 | $2.00  | ~13× cheaper
MiMo V2.5-Pro    | $1.00 | $3.00  | ~5–8× cheaper
Claude Opus 4.7  | $5.00 | $25.00 | baseline
GPT-5.5          | $5.00 | $30.00 | baseline

Two things to flag: cache hits drop input cost as low as $0.20–0.40 per million tokens, and Xiaomi made cache writing free of charge for a limited launch window. The 1M-context multiplier is also gone. If you are running long-horizon agents, the real cost gap versus closed-source frontier models is closer to 10× than the headline 5–8×.
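To see what the cache discount does in practice, here is a minimal blended-price sketch, taking the $0.20–0.40 cache-hit range quoted above against V2.5-Pro's $1.00 list input price; the 70% hit ratio is an illustrative assumption, not a measured figure:

```python
# Blended input price per 1M tokens when a fraction of input tokens are
# cache hits. Prices are the ones quoted in this article; the hit ratio
# is an illustrative assumption.

def blended_input_price(list_price: float, cache_price: float,
                        hit_ratio: float) -> float:
    """Effective USD per 1M input tokens."""
    return hit_ratio * cache_price + (1 - hit_ratio) * list_price

effective = blended_input_price(list_price=1.00, cache_price=0.30, hit_ratio=0.70)
print(f"V2.5-Pro effective input price: ${effective:.2f} / 1M tokens")
```

Long-running agents with stable system prompts and repo context tend to sit at high hit ratios, which is exactly where the gap to list-price competitors widens most.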

For teams that prefer flat-rate, the four-tier Token Plan goes from $63.36/yr (Lite, 720M credits) to $1,056/yr (Max, 19.2B credits) — and is compatible with Claude Code, OpenCode, and Kilo as drop-in scaffolds.
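For comparison shopping between the tiers, the per-credit math is simple. A sketch over the two endpoints quoted above; note the article does not state whether one credit equals one token, so treat these as plan-relative prices only:

```python
# Annual price per 1M credits for the lowest and highest Token Plan tiers
# quoted above. Credit-to-token mapping is not specified in this article.

plans = {
    "Lite": (63.36, 720),      # USD per year, millions of credits
    "Max":  (1056.00, 19_200),
}

for name, (usd_per_year, credits_m) in plans.items():
    print(f"{name}: ${usd_per_year / credits_m:.3f} per 1M credits")
```

The Max tier works out to roughly 40% cheaper per credit than Lite, the usual bulk-tier shape.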

Should you use MiMo V2.5? Pros, cons, and who it's for.

Strengths

  • Best-in-class token efficiency on agentic tasks (40–60% fewer tokens than Claude Opus 4.6).
  • Genuine 1M-token usable context — doesn't collapse past 512K like most rivals.
  • Native multimodal in a single model (image, video, audio, text).
  • Open weights on Hugging Face — self-hostable, fine-tunable.
  • "Harness awareness" — actively manages its own context across thousands of tool calls.
  • Drop-in compatible with Claude Code, OpenCode, Kilo.

Weaknesses

  • Trails on broad-reasoning benchmarks (HLE, GDPVal-AA) — coding-first by design.
  • Self-reported figures on token efficiency need independent replication.
  • Hosted infrastructure outside China is still maturing — latency varies.
  • Tool-call ecosystem and harness integrations less battle-tested than Claude or GPT.
  • Documentation and community support still catching up to Western providers.

Who should use MiMo V2.5

If you are building agentic coding workflows — long-horizon, multi-tool, repo-scale — and your unit economics depend on token cost, MiMo V2.5-Pro is now on the shortlist. Same goes for any team running multimodal agents with heavy video or document understanding.

Who should stick with Claude or GPT

If your primary workload is broad-reasoning chat, research synthesis, or general knowledge work, Claude Opus 4.7 and GPT-5.5 still hold the edge. The Western models also have more mature tool ecosystems, longer track records of stability under production load, and stronger guarantees around enterprise data handling.

Frequently asked questions

Is MiMo V2.5 actually open source?
Yes. The full V2.5 series — including V2.5, V2.5-Pro, the TTS suite, and the ASR model — is published on Hugging Face under open weights. The base V2.5 release includes weights, tokenizer, and a complete model card. Self-hosting is supported via vLLM, with an official deployment cookbook from Xiaomi.
Is MiMo V2.5 better than Claude Opus 4.7?
It depends on the task. On agentic coding benchmarks like ClawEval and Terminal-Bench, V2.5-Pro is competitive or slightly ahead while using 40–60% fewer tokens. On broad reasoning (HLE) and general intelligence indices, Claude Opus 4.7 retains a clear lead. For production agent workloads, V2.5-Pro is often the better cost-adjusted choice.
How much does MiMo V2.5 cost via API?
MiMo V2.5 (base) is $0.40 per million input tokens and $2.00 per million output tokens. MiMo V2.5-Pro is $1.00 input / $3.00 output. Cache hits can drop input cost to $0.20–0.40. There is no longer a multiplier for using the full 1M context window. For comparison, Claude Opus 4.7 is $5/$25 and GPT-5.5 is $5/$30.
Can I use MiMo V2.5 with Claude Code or OpenCode?
Yes. Xiaomi explicitly supports drop-in compatibility with Claude Code, OpenCode, OpenClaw, and Kilo as agentic scaffolds. You can swap the model endpoint and continue using the same harness. This is one of the most pragmatic adoption paths for existing Claude Code users.
What hardware do I need to self-host MiMo V2.5?
The base V2.5 model has 310B total / 15B active parameters: per-token compute scales with the 15B active set, but all 310B weights must still be resident in memory for expert routing. A reasonable self-hosting setup uses 8× H100 or H200 GPUs with vLLM and tensor parallelism. V2.5-Pro is heavier (1.02T / 42B active) and typically requires multi-node inference. Most production teams will start with the hosted API and migrate selectively.
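A first-order way to sanity-check that hardware recommendation: all 310B weights have to fit somewhere, split across GPUs by tensor parallelism, with KV cache and activations on top. A minimal sizing sketch, treating the result as a floor rather than a full capacity plan:

```python
# Weight-memory floor per GPU for self-hosting the base 310B model under
# even tensor-parallel sharding. KV cache, activations, and runtime
# overhead come on top of this, so real requirements are higher.

def weights_per_gpu_gib(total_params: float, bytes_per_param: int,
                        num_gpus: int) -> float:
    """GiB of model weights each GPU holds under even sharding."""
    return total_params * bytes_per_param / num_gpus / 2**30

for dtype, nbytes in [("BF16", 2), ("FP8", 1)]:
    gib = weights_per_gpu_gib(310e9, nbytes, num_gpus=8)
    print(f"{dtype}: ~{gib:.0f} GiB of weights per GPU on an 8-GPU node")
```

At BF16 that is roughly 72 GiB per GPU, which leaves almost no headroom on 80 GB H100s; FP8 weights or 141 GB H200s are the more comfortable fit once a 1M-token KV cache enters the picture.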
What is "harness awareness" and why does it matter?
Harness awareness is Xiaomi's term for the model's ability to actively reason about its own runtime environment — managing its memory budget, shaping its own context window, and steering its tool-call sequences toward end objectives. In long-horizon tasks (thousands of tool calls), this is the difference between a model that drifts and one that ships. It is the single most underrated capability in the V2.5-Pro release.

The open-source frontier just moved.

MiMo V2.5 is not a Claude Opus replacement for every workload — but for agentic coding at scale, it is the new cost-adjusted leader, and the gap to closed-source frontier is officially within rounding distance. We will be tracking real-world replication, third-party benchmarks, and ecosystem adoption as it evolves.

// END OF FILE ai.cc · model_review · v2.5 · 2026
