DeepSeek V4 is the most significant AI model launch of 2026 — and as of April 23, 2026, it still hasn't shipped. A trillion-parameter mixture-of-experts model, open-source under Apache 2.0, priced at an estimated $0.30/MTok — roughly 50x cheaper than GPT-5. The official release window is "late April," and it's been delayed four times. Here's the current status, every confirmed spec, leaked benchmarks that put it at 83.7% SWE-bench and 99.4% AIME 2026, and exactly how to access it when it drops.

**~1 trillion** total parameters (Mixture-of-Experts architecture)
**~37 billion** active parameters per token (efficient inference)
**1 million** token context window (Engram conditional memory)
**~$0.30/MTok** estimated API input pricing (vs GPT-5 at ~$15/MTok)
**~83.7%** SWE-bench Verified (leaked benchmark — up from ~80-81% in earlier leaks)
**~99.4%** AIME 2026 math benchmark (leaked)
**Apache 2.0** open-source license — full commercial use permitted
**4** delays since the original Q4 2025 target

Current Status: 4 Delays, Still "Late April"

DeepSeek founder Liang Wenfeng's original target was Q4 2025. The model slipped to February 2026 (tied to Lunar New Year), then March, then early April, and now the confirmed window is "late April 2026" — running through April 30.

As of April 23, 2026, the full V4 model has not appeared in DeepSeek's API. The primary bottleneck: hardware issues during training on Huawei Ascend 910B and 950PR chips. US export restrictions on NVIDIA have forced DeepSeek onto domestic Chinese semiconductor infrastructure — Huawei's chips are capable, but optimizing training at this scale on non-NVIDIA hardware takes time.

What is available now: V4-Lite has been deployed to API infrastructure since early April. Developers report 30% faster inference speeds than V3.2 and 94% context recall at 128K tokens (up from 45% previously). This isn't the full model — it's an optimized preview variant — but it confirms the architectural improvements are real.
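If you want to kick the tires, a smoke test takes a few lines of Python. A minimal sketch, assuming DeepSeek keeps its OpenAI-compatible endpoint; no V4-Lite model identifier has been published, so the `model` value below is a placeholder to swap once DeepSeek lists it:

```python
# Minimal smoke test against DeepSeek's OpenAI-compatible API.
# ASSUMPTION: "deepseek-chat" currently aliases the newest chat model;
# replace it with the official V4-Lite / V4 model ID once announced.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",      # from platform.deepseek.com
    base_url="https://api.deepseek.com",  # DeepSeek's documented endpoint
)

resp = client.chat.completions.create(
    model="deepseek-chat",  # placeholder: swap in the V4-Lite ID
    messages=[{"role": "user", "content": "Recall test: summarize our earlier context."}],
)
print(resp.choices[0].message.content)
```

When V4 proper lands, the same client works; upgrading should be the one-line `model` change described in the access section below.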

Watch for the announcement: Follow @deepseek_ai on X. No specific date within the April window has been announced. Access opens immediately at launch, with no waiting list.

Why This Delay Is Different

The previous three delays (Q4 2025, February, March) were training-related. The current delay is hardware-related — specifically, optimizing the V4 training run on Huawei Ascend 950PR chips.

This is geopolitically significant: DeepSeek V4 will be the first frontier-class AI model built entirely on Chinese domestic semiconductor infrastructure. Reuters confirmed on April 3 that DeepSeek gave Huawei's Ascend 950PR exclusive early hardware access while denying the same to NVIDIA. China's frontier AI development is no longer dependent on NVIDIA, and V4 is the proof point, whether or not it ships on schedule.

The delay is not a signal that V4 is in trouble. V4-Lite's strong performance numbers (30% speed improvement, 94% context recall) indicate the model quality is there. This is an infrastructure problem, not a capability problem.

Updated Benchmarks: Leaked Numbers

DeepSeek hasn't published official V4 benchmarks. These come from leaks and third-party reporting. Treat them as directional — official results will follow release, and the final numbers may differ.

| Benchmark | V4 (Latest Leak) | DeepSeek V3.2 | GPT-5 | Claude Opus 4.6 |
| --- | --- | --- | --- | --- |
| SWE-bench Verified | ~83.7% | ~71% | ~72% | ~68% |
| AIME 2026 | ~99.4% | ~92% | ~93% | ~91% |
| MMLU | ~92–93% | ~88% | ~90% | ~89% |
| HumanEval | ~96% | ~91% | ~92% | ~90% |

All V4 numbers are pre-release leaks. Official benchmarks pending full release.

The SWE-bench number — 83.7% on coding tasks — is the figure that matters most for enterprise buyers. If it holds, V4 would lead every publicly benchmarked model on code generation by a significant margin, at 1/50th the API cost of GPT-5. The AIME 2026 score (99.4%) indicates near-complete mastery of competition-level mathematics.

Architecture: Three Innovations That Matter

Engram Memory System

Engram is the enabling technology behind V4's 1-million-token context window. Unlike naive attention, whose cost grows quadratically with context length, Engram uses conditional memory activation: it selectively engages relevant long-context memory rather than attending to everything. The result: 1M context at inference costs closer to those of a 32K-context model.

This is why the pricing can be so low. Long-context processing is normally the expensive part. Engram neutralizes that cost.
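DeepSeek hasn't published Engram's internals, so treat the following as a toy illustration of the described behavior, not the real mechanism: score stored memory chunks against the current query and activate only the top few. The function name, shapes, and top-k rule are all assumptions made for the sketch.

```python
import numpy as np

def engram_gate(query, memory_keys, memory_values, k=4):
    """Toy conditional memory activation: attend to the k most relevant
    memory chunks instead of the full history. (Illustrative only --
    DeepSeek has not published Engram's actual design.)"""
    scores = memory_keys @ query                # relevance of each chunk, shape (N,)
    top = np.argsort(scores)[-k:]               # indices of the k best chunks
    w = np.exp(scores[top] - scores[top].max()) # softmax over the survivors only
    w /= w.sum()
    return w @ memory_values[top]               # blend of the activated chunks

rng = np.random.default_rng(0)
keys = rng.normal(size=(1_000, 64))   # 1,000 stored chunks, 64-dim keys
vals = rng.normal(size=(1_000, 64))
out = engram_gate(rng.normal(size=64), keys, vals)  # touches 4 chunks, not 1,000
```

The cost intuition follows directly: compute scales with the handful of chunks you activate, not with everything you stored.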

Dynamic Sparse Attention (DSA)

DSA reduces attention computation by skipping token pairs with low relevance scores. In practice, this means faster inference on long documents without losing information from sparsified regions — because the sparsification is learned, not random. V4-Lite's 30% speed improvement is attributed primarily to DSA.
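As a rough mental model (in V4 the keep/drop decision is learned; the percentile cut below is a stand-in), dynamic sparse attention masks low-relevance query-key pairs before the softmax:

```python
import numpy as np

def sparse_attention(q, k_mat, v, keep=0.25):
    """Toy dynamic sparse attention: keep only the top-scoring 25% of
    query-key pairs per row. Illustrative stand-in for a learned mask."""
    scores = (q @ k_mat.T) / np.sqrt(q.shape[-1])            # full score matrix
    cut = np.quantile(scores, 1 - keep, axis=-1, keepdims=True)
    scores = np.where(scores >= cut, scores, -np.inf)        # mask weak pairs
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))  # masked entries -> 0
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v
```

Masked entries contribute exactly zero weight after the softmax; in a real kernel those score blocks are never computed at all, which is where the speedup comes from.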

Multi-Head Compression (mHC)

DeepSeek claims a 35x processing speedup and a 40% energy reduction from its mHC implementation, which compresses attention head outputs before aggregation. The efficiency math is consistent with the $0.30/MTok target pricing.
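DeepSeek hasn't documented mHC either, but "compresses head outputs before aggregation" suggests something like a per-head down-projection that shrinks the final aggregation matmul. A toy sketch under that assumption (names and dimensions invented for illustration):

```python
import numpy as np

def compress_heads(head_outputs, compress_to, rng):
    """Toy multi-head compression: down-project each head's output before
    concatenation, so aggregation works on a smaller matrix.
    (A hypothetical reading of mHC -- not a published implementation.)"""
    n_heads, seq_len, d_head = head_outputs.shape
    # Real models learn these projections; random matrices here, for shape only.
    proj = rng.normal(size=(n_heads, d_head, compress_to)) / np.sqrt(d_head)
    small = np.einsum("hsd,hdc->hsc", head_outputs, proj)   # (heads, seq, c)
    return small.transpose(1, 0, 2).reshape(seq_len, n_heads * compress_to)

rng = np.random.default_rng(0)
heads = rng.normal(size=(16, 128, 64))                # 16 heads, 128 tokens, 64-dim
out = compress_heads(heads, compress_to=16, rng=rng)  # 1024 -> 256 dims per token
```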

Key Facts
  • Engram: Conditional long-term memory enables 1M token context at low cost
  • DSA (Dynamic Sparse Attention): Skips irrelevant token pairs for faster inference
  • mHC (Multi-Head Compression): 35x processing speedup claimed, 40% energy reduction
  • MoE split: 1T total / 37B active parameters — runs like a 37B model, thinks like a 1T

The Pricing Disruption

The leaked $0.30/MTok input pricing is the number that should concern every AI company charging $10–15/MTok.

| | DeepSeek V4 (estimated) | GPT-5 (current) |
| --- | --- | --- |
| Input | ~$0.30/MTok | ~$15/MTok |
| Output | ~$0.50/MTok | ~$60/MTok |
| Cached input | ~$0.03/MTok | ~$1.50/MTok |
| Context | 1M tokens | 128K |
| License | Apache 2.0 open weights | Closed model |
| Availability | Late April 2026 (delayed) | Available now |

For a mid-size enterprise sending 10 billion input tokens per month to GPT-5, that's $150,000/month. At V4's estimated pricing, the same workload runs ~$3,000/month. That's not an incremental cost reduction; it changes the ROI math for entire business cases.

The cached input pricing (~$0.03/MTok — a 90% discount) makes long-context workflows even cheaper. Repeatedly querying a large codebase or document set becomes nearly free.
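The arithmetic is easy to verify. A quick sanity check of the figures above, using the leaked/estimated rates (so the outputs are estimates too):

```python
# Monthly cost for 10B input tokens (= 10,000 MTok) at the cited rates.
monthly_mtok = 10_000
rates = {                      # $/MTok input; V4 figures are leaked estimates
    "GPT-5":           15.00,
    "V4 (estimated)":   0.30,
    "V4 cached input":  0.03,
}
for name, rate in rates.items():
    print(f"{name:>16}: ${monthly_mtok * rate:>10,.0f}/month")
# GPT-5: $150,000 / V4: $3,000 / V4 cached: $300
```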

Open Source: Apache 2.0 Means Full Commercial Freedom

Apache 2.0 is among the most permissive open-source licenses in common use. For enterprises, this means:

  • Commercial deployment: No royalties, no licensing fees, no usage restrictions
  • Fine-tuning: Train V4 on proprietary data, keep the resulting model private
  • Self-hosting: Run on your own infrastructure — no API costs, no data leaving your environment
  • Minimal attribution: No CC-BY-style credit requirement in your product; redistributing the weights only requires preserving the license and copyright notices
  • No viral clauses: Use V4 in closed-source products without releasing your code

For organizations with data privacy requirements, self-hosted V4 eliminates both the cost and the compliance concern.

Delay History: From Target to "Late April"

Q4 2025: Original internal target — slipped due to training complexity
February 17, 2026: Lunar New Year window — no launch, training delays
March 2026: Third expected window — Huawei chip bottleneck identified; architecture leaked
Early April: V4-Lite deployed to API; 30% speed gains confirmed by developers
April 3: Reuters confirms the late April window (the fourth delay); Huawei Ascend 950PR chip partnership confirmed
April 23: Full model still not in the API; "late April" window runs through April 30
April 24–30 (expected): Official announcement via @deepseek_ai on X
Post-launch: Apache 2.0 weights on Hugging Face; quantized variants within days
May 2026 (expected): OpenRouter and Together AI integrations, which typically land within 1–2 weeks of a DeepSeek release

How to Access V4 at Launch

Try V4-Lite now: If you have a DeepSeek API key, test V4-Lite immediately. It's already deployed and shows the new architecture's performance characteristics before the full model lands.

When V4 drops, here's how to get full access:

1. DeepSeek API (api.deepseek.com) Same endpoint pattern as V3. If you're already integrated with DeepSeek, upgrading is a single model parameter change. Expect high demand and possible rate limits in the first 48 hours.

2. Hugging Face model weights V4 weights will appear on the DeepSeek Hugging Face organization page. The full MoE model is heavy: roughly 2TB in BF16 (1T parameters at two bytes each), with ~500GB as the floor for 4-bit quantized variants. Most teams will wait for quantized GGUF/AWQ versions, which typically appear within days.

3. DeepSeek web interface (chat.deepseek.com) Open access from day one, consistent with previous DeepSeek releases. Expect traffic-related slowdowns at launch.

4. Third-party APIs OpenRouter and Together AI typically integrate new DeepSeek models within days. If direct DeepSeek API is congested at launch, these are reliable alternatives with better uptime guarantees.

5. Self-hosted infrastructure The full 1T MoE model requires a serious multi-GPU cluster (H100-class hardware, likely across multiple nodes for unquantized weights). Monitor the DeepSeek GitHub repo for official quantization guidance; a hypothetical serving sketch follows below.
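For reference, here's what day-one self-hosting could look like with vLLM. Everything model-specific is a guess: the Hugging Face repo name follows DeepSeek's usual naming pattern but is unconfirmed, and the parallelism settings are placeholders to size for your cluster.

```python
# HYPOTHETICAL self-hosting sketch with vLLM. The repo id is a guess based
# on DeepSeek's naming convention; confirm the real one after release.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-V4",  # placeholder repo id
    tensor_parallel_size=8,           # a 1T MoE will need aggressive sharding
    trust_remote_code=True,           # DeepSeek releases ship custom model code
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain mixture-of-experts routing in two sentences."], params)
print(outputs[0].outputs[0].text)
```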

How to get notified at launch: Follow @deepseek_ai on X and watch their GitHub repository. There is no announced waiting list — access opens immediately at launch. V4-Lite is available via the API right now if you want to start testing.

Who Benefits Most From V4

AI developers and engineers: SWE-bench numbers at ~83.7% would put V4 ahead of every other model on coding benchmarks. If confirmed, it becomes the clear choice for AI-assisted development at any price.

Enterprise IT and procurement: The pricing math is unavoidable. Any current GPT-5 deployment needs a V4 cost comparison before the next budget cycle. The 50x input cost gap is too large to ignore.

Open-source AI community: Apache 2.0 + frontier performance will trigger immediate fine-tuning. Expect specialized V4 derivatives within weeks of release.

Data-sensitive organizations: Apache 2.0 self-hosting is the clearest path to frontier-model AI with full data control — no cloud API, no compliance exposure.

Frequently Asked Questions

Will DeepSeek V4 be free to use? The API will be paid (estimated $0.30/MTok input), but the model weights are Apache 2.0 — free to download, self-host, and use commercially. The web interface at chat.deepseek.com will likely have a free tier consistent with past releases.

Is V4-Lite the same as V4? No. V4-Lite is a preview variant already deployed to API infrastructure. It demonstrates the new architecture's speed improvements (30% faster inference) but isn't the full 1T parameter model.

How many times has DeepSeek V4 been delayed? Four times: from the original Q4 2025 target to February 2026 (the Lunar New Year window), then to March, then to early April, and now to late April. The consistent reason: hardware optimization challenges on Huawei Ascend chips under US export restrictions.

What's the difference between V4 and V3.2? V4 introduces three new architectural components (Engram, DSA, mHC), expands context from 128K to 1M tokens, and jumps ~12 percentage points on coding benchmarks. It's a full-generation jump, not an incremental update.

When exactly will V4 launch? "Late April 2026" is the confirmed window (through April 30). No specific date has been announced. As of April 23, it remains unreleased. Follow @deepseek_ai on X for the announcement.

Can I use V4 for commercial projects? Yes. Apache 2.0 permits commercial use, fine-tuning, and redistribution without fees; the only obligation is preserving the license and copyright notices when you redistribute.

Why is DeepSeek building on Huawei chips? US export restrictions prevent DeepSeek from using NVIDIA's latest GPUs for large-scale training. Huawei's Ascend 950PR chips are China's domestic alternative. V4 becoming the first frontier model on Chinese chips is as strategically significant as the model's performance numbers.

The Bottom Line

If the "late April" window holds, DeepSeek V4 is at most seven days from launch; the window closes April 30. Four delays haven't changed the underlying specs: 1T-parameter MoE, 1M token context, Apache 2.0 open weights, ~$0.30/MTok pricing, and leaked benchmarks showing 83.7% SWE-bench and 99.4% AIME 2026. Whether it ships this week or slips once more, V4-Lite is live for developers right now. Start testing at api.deepseek.com, and watch @deepseek_ai on X for the launch announcement.