The AI model landscape shifted again on March 18, 2026, when Chinese AI company MiniMax released M2.7 — a self-evolving large language model that managed 30 to 50 percent of its own development workflow.

At a Glance
  • 4 models: the frontier AI contenders of March 2026
  • 16x cheaper: M2.7 vs. Opus on input pricing
  • 80.8%: Claude Opus leads SWE-bench Verified
  • $0.30/M: MiniMax M2.7's input token price

MiniMax M2.7: The Self-Evolving Challenger

MiniMax M2.7 is the first major commercial model to publicly document recursive self-improvement: earlier versions built the research-agent harness that managed the data pipelines, training environments, and evaluation infrastructure for M2.7 itself, which is the 30-50% share of the development workflow cited above.

Key Facts
  • Released March 18, 2026 by MiniMax (China)
  • Self-evolving: model participated in its own training pipeline
  • 205K context window, 131K max output
  • Pricing: $0.30/M input, $1.20/M output
  • PinchBench: 86.2% (within 1.2 points of Claude Opus 4.6)

On the SWE-bench Pro benchmark, M2.7 scored 56.22%, matching GPT-5.3-Codex. On PinchBench it hit 86.2%, placing fifth overall; on Terminal-Bench 2.0 it scored 57.0%, and on SWE Multilingual 76.5%.

The hallucination improvement is dramatic: M2.7 scored +1 on the AA-Omniscience Index, up from M2.5's -40, with a hallucination rate of 34% — lower than Claude Sonnet 4.6 (46%) and Gemini 3.1 Pro (50%).
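
For teams that want to kick the tires, access would presumably go through MiniMax's API. Here is a minimal sketch using the OpenAI Python SDK's base_url override, a common pattern for OpenAI-compatible providers; the endpoint URL and model slug below are assumptions, not published values:

```python
from openai import OpenAI

# Assumed endpoint and model name; check MiniMax's docs for the real values.
client = OpenAI(
    base_url="https://api.minimax.example/v1",  # hypothetical URL
    api_key="YOUR_MINIMAX_API_KEY",
)

response = client.chat.completions.create(
    model="minimax-m2.7",  # hypothetical model slug
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain recursive self-improvement in two sentences."},
    ],
)
print(response.choices[0].message.content)
```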

Claude Opus 4.6: The Coding Leader

Anthropic released Claude Opus 4.6 on February 5, 2026. It leads SWE-bench Verified at 80.8% and holds the #1 Chatbot Arena ELO at 1503.

On ARC-AGI-2, Opus scores 68.8% — ahead of GPT-5.4's 52.9% but behind Gemini 3.1 Pro's 77.1%. It also leads on Humanity's Last Exam and DeepSearchQA.

The tradeoff is cost: $5.00/M input and $25.00/M output — the most expensive in this comparison.

GPT-5.4: The Agentic Workhorse

OpenAI released GPT-5.4 on March 6, 2026. It carries a General Intelligence Index score of 57 (vs Opus's 53) and dominates agentic execution.

GPT-5.4 leads Terminal-Bench 2.0 at 75.1% vs Opus's 65.4%, and SWE-bench Pro at 57.7%. Its native Computer Use surpasses human performance on OSWorld at 75.0%.

Gemini 3.1 Pro: The Reasoning Powerhouse

Google's Gemini 3.1 Pro (Feb 19, 2026) ranks #1 on the Artificial Analysis Intelligence Index across 115 models, leading 13 of 16 benchmarks.

ARC-AGI-2: 77.1%, more than double its predecessor's score. GPQA Diamond: 94.3%, an all-time record. But SWE-bench Verified trails at 68.5%, and latency averages 29 seconds to first token.

Benchmark Comparison

| Benchmark | Claude Opus 4.6 | GPT-5.4 | Gemini 3.1 Pro | MiniMax M2.7 |
|---|---|---|---|---|
| SWE-bench Verified | 80.8% | ~78% | 68.5% | TBD |
| SWE-bench Pro | ~45% | 57.7% | — | 56.2% |
| ARC-AGI-2 | 68.8% | 52.9% | 77.1% | — |
| GPQA Diamond | 91.3% | 93.2% | 94.3% | — |
| PinchBench | ~87% | — | — | 86.2% |
| Terminal-Bench 2.0 | 65.4% | 75.1% | — | 57.0% |
| Chatbot Arena ELO | 1503 | 1463 | TBD | — |
| MMLU | 91.1% | 89.6% | — | — |
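
The spread of leaders jumps out if you compute the top model per benchmark. A quick sketch using the reported values from the table above (approximate figures kept as plain numbers, unreported cells as None):

```python
# Scores transcribed from the benchmark table; None marks unreported results.
scores = {
    "SWE-bench Verified": {"Opus 4.6": 80.8, "GPT-5.4": 78.0, "Gemini 3.1 Pro": 68.5, "M2.7": None},
    "SWE-bench Pro":      {"Opus 4.6": 45.0, "GPT-5.4": 57.7, "Gemini 3.1 Pro": None, "M2.7": 56.2},
    "ARC-AGI-2":          {"Opus 4.6": 68.8, "GPT-5.4": 52.9, "Gemini 3.1 Pro": 77.1, "M2.7": None},
    "GPQA Diamond":       {"Opus 4.6": 91.3, "GPT-5.4": 93.2, "Gemini 3.1 Pro": 94.3, "M2.7": None},
    "Terminal-Bench 2.0": {"Opus 4.6": 65.4, "GPT-5.4": 75.1, "Gemini 3.1 Pro": None, "M2.7": 57.0},
}

for bench, results in scores.items():
    # Rank only the models that actually reported a score.
    reported = {model: s for model, s in results.items() if s is not None}
    leader = max(reported, key=reported.get)
    print(f"{bench}: {leader} ({reported[leader]}%)")
```

Five benchmarks, three different leaders: exactly the fragmented picture the rest of this comparison describes.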

Pricing Comparison

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context Window |
|---|---|---|---|
| MiniMax M2.7 | $0.30 | $1.20 | 205K |
| GPT-5.4 | $2.50 | $15.00 | 128K |
| Claude Opus 4.6 | $5.00 | $25.00 | 200K (1M beta) |
| Gemini 3.1 Pro | ~$1.25 | ~$5.00 | 1M |

MiniMax M2.7 is roughly 8x cheaper than GPT-5.4 and more than 16x cheaper than Claude Opus 4.6 on input tokens.
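
Those ratios follow directly from the list prices above. A minimal cost sketch, using list prices only and ignoring caching, batching, or volume discounts (Gemini's figures are the approximate ones from the table):

```python
# Per-million-token list prices from the pricing table: (input $/M, output $/M).
PRICING = {
    "MiniMax M2.7":    (0.30, 1.20),
    "GPT-5.4":         (2.50, 15.00),
    "Claude Opus 4.6": (5.00, 25.00),
    "Gemini 3.1 Pro":  (1.25, 5.00),  # approximate
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request at list prices."""
    inp, out = PRICING[model]
    return input_tokens / 1e6 * inp + output_tokens / 1e6 * out

# Example: a 50K-token prompt with a 2K-token reply.
for model in PRICING:
    print(f"{model}: ${request_cost(model, 50_000, 2_000):.4f}")

# The headline ratios, on input price alone:
print(PRICING["GPT-5.4"][0] / PRICING["MiniMax M2.7"][0])          # ~8.3x
print(PRICING["Claude Opus 4.6"][0] / PRICING["MiniMax M2.7"][0])  # ~16.7x
```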

Which Model Should You Choose?

Key Facts
  • **Budget production:** MiniMax M2.7 — frontier performance at 1/10th the price
  • **Complex coding:** Claude Opus 4.6 — 80.8% SWE-bench, #1 Chatbot Arena
  • **Agentic automation:** GPT-5.4 — Terminal-Bench leader, Computer Use built in
  • **Research/reasoning:** Gemini 3.1 Pro — ARC-AGI-2 and GPQA Diamond champion

As of March 2026, no single model dominates every category; the era of one clear best model is over. The smartest strategy is matching the model to the task, as the sketch below illustrates.
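
A minimal routing sketch based on this guide; the task categories and the mapping are this article's recommendations, not any provider's API:

```python
# Task-to-model routing per the decision guide above.
ROUTES = {
    "budget":    "MiniMax M2.7",     # cheapest frontier-class option
    "coding":    "Claude Opus 4.6",  # SWE-bench Verified leader
    "agentic":   "GPT-5.4",          # Terminal-Bench / Computer Use leader
    "reasoning": "Gemini 3.1 Pro",   # ARC-AGI-2 / GPQA Diamond leader
}

def pick_model(task_type: str) -> str:
    """Return the recommended model for a task category."""
    try:
        return ROUTES[task_type]
    except KeyError:
        raise ValueError(f"Unknown task type {task_type!r}; expected one of {sorted(ROUTES)}")

print(pick_model("coding"))  # Claude Opus 4.6
```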

Release Timeline
  • Feb 5, 2026: Anthropic releases Claude Opus 4.6
  • Feb 17, 2026: Claude Sonnet 4.6 launches
  • Feb 19, 2026: Google releases Gemini 3.1 Pro
  • Mar 6, 2026: OpenAI releases GPT-5.4
  • Mar 18, 2026: MiniMax releases M2.7