Fourteen Months of Silence—And a Model That Still Hasn’t Shipped
In January 2025, DeepSeek’s R1 reasoning model briefly topped global app charts and triggered a sell-off in US semiconductor stocks. The company had pulled off a surprise: a model competitive with GPT-4o, trained at a fraction of the cost reported for comparable Western models. Now, 14 months later, DeepSeek is preparing its next flagship—V4—and the industry is watching with a mix of anticipation and fatigue. Multiple release windows have come and gone. March didn’t happen. The current best estimate, from Chinese tech outlet Whale Lab, is April 2026.
What actually ships will matter. V4 is DeepSeek’s first genuinely multimodal model, its first to target the 1-million-token context range, and its first explicit attempt to compete on agentic software engineering tasks. Whether it delivers on any of those promises—and whether it can recreate R1’s market shock in a landscape that now includes GPT-5.2, Gemini 3.1 Pro, and a dozen Chinese competitors—is a real question.
What the Architecture Looks Like
DeepSeek V4 is a Mixture-of-Experts (MoE) model at roughly 1 trillion total parameters, with approximately 32 billion active per token: about 50% more total parameters than V3, while the active-parameter count actually falls from V3’s 37B. That ratio is the whole point: more specialized knowledge stored in the model, less compute consumed per inference call.
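The economics of that ratio are easy to see in miniature. The sketch below, using the parameter counts reported above and a standard top-k gating scheme (the expert count and k are illustrative, not disclosed figures), shows how MoE routing activates only a small fraction of the network per token:

```python
import numpy as np

# Reported figures: ~1T total parameters, ~32B active per token.
TOTAL_PARAMS = 1.0e12
ACTIVE_PARAMS = 32e9

# Fraction of the network exercised per token under MoE routing.
sparsity = ACTIVE_PARAMS / TOTAL_PARAMS  # 0.032, i.e. ~3.2%

def route_topk(gate_logits: np.ndarray, k: int = 8) -> np.ndarray:
    """Standard top-k MoE gating: pick the k highest-scoring experts
    for one token. Expert count and k here are hypothetical."""
    return np.argsort(gate_logits)[-k:][::-1]

rng = np.random.default_rng(0)
logits = rng.normal(size=64)       # gate scores over 64 hypothetical experts
chosen = route_topk(logits, k=8)   # only these 8 experts run for this token
print(f"active fraction: {sparsity:.1%}, experts used: {len(chosen)}/64")
```

The point is that inference cost scales with the active count, not the total, which is how a 1T-parameter model can serve tokens at roughly the cost of a 32B dense model.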
The attention system has been upgraded to what DeepSeek is calling DeepSeek Sparse Attention (DSA). Analysis of code repositories suggests it uses a lightweight indexer—internally called Lightning Indexer—to identify the 2,048 most relevant tokens from a full million-token context window before applying full attention. This is a different approach to long-context efficiency than the brute-force 1M-token windows Google is deploying in Gemini 3.1 Pro, and it trades some recall completeness for significantly lower latency.
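Mechanically, the indexer-then-attend pattern described above looks something like the following sketch. Everything here is an assumption built from the public description: the scoring function, dimensions, and selection size stand in for whatever the actual Lightning Indexer learns, with only the "top 2,048 tokens from a long context" shape taken from the reporting:

```python
import numpy as np

def sparse_attention(q, keys, values, k=2048):
    """Two-stage sketch: a cheap relevance score selects the top-k
    tokens, then full softmax attention runs only over that subset."""
    # Stage 1: lightweight indexer. A plain dot product here; the real
    # indexer is presumably a small learned module, not this.
    scores = keys @ q
    top = np.argsort(scores)[-k:]
    # Stage 2: full scaled-dot-product attention over k tokens, not n.
    sel = keys[top] @ q / np.sqrt(q.shape[0])
    weights = np.exp(sel - sel.max())
    weights /= weights.sum()
    return weights @ values[top]

rng = np.random.default_rng(0)
n, d = 100_000, 64                    # stand-in for a million-token context
keys = rng.normal(size=(n, d))
values = rng.normal(size=(n, d))
q = rng.normal(size=d)
out = sparse_attention(q, keys, values, k=2048)  # attends to 2,048 of 100k
```

The trade-off in the prose is visible here: stage 2 costs O(k·d) instead of O(n·d) per query, but any token the indexer mis-scores in stage 1 is invisible to the attention that follows.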
The other key architectural addition is Engram Conditional Memory: a static knowledge store that offloads factual retrieval from the neural network itself. The idea is O(1) lookup for established facts, freeing the model’s attention budget for actual reasoning. It’s a pragmatic acknowledgment that transformers are inefficient at pure factual recall—a known problem that most labs have addressed through retrieval augmentation rather than architectural change.
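As a toy illustration of the lookup-before-reasoning idea, consider the sketch below. The class name, API, and fallback logic are all hypothetical; only the core claim, that a static store answers factual queries in O(1) without touching the network, comes from the description above:

```python
class EngramMemory:
    """Toy sketch of a static fact store consulted before the model.
    Illustrative only; not DeepSeek's actual interface."""

    def __init__(self, facts: dict):
        self._facts = facts  # frozen at build time; no gradient updates

    def recall(self, key: str):
        return self._facts.get(key)  # hash lookup: O(1), no attention spent

def run_model(query: str) -> str:
    # Stand-in for a full transformer forward pass (the expensive path).
    return f"<model output for {query!r}>"

def answer(query: str, memory: EngramMemory) -> str:
    cached = memory.recall(query)
    if cached is not None:
        return cached            # factual recall, essentially free
    return run_model(query)      # miss: fall through to actual reasoning

mem = EngramMemory({"capital of France": "Paris"})
print(answer("capital of France", mem))  # hits the store, no model call
```

Retrieval-augmented generation does something similar at the prompt level; the architectural version bakes the lookup into the model's forward path instead.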
Multimodal Natively, Not Bolted On
Unlike V3, which was text-only, V4 handles text, image, and video generation from a single unified model. DeepSeek is calling this native multimodality rather than a pipeline of separate models stitched together. On March 9, 2026, a “V4 Lite” variant briefly appeared on DeepSeek’s website before being pulled—a leak suggesting the broader model family is close to complete. The full V4 presumably handles heavier workloads at higher parameter counts.
The competitive comparison here is complicated. OpenAI’s GPT-5.2 and Google’s Gemini 3.1 Pro both offer text-image-audio-video capabilities. What DeepSeek is betting on is doing this at lower inference cost through MoE sparsity, with optimization for both Nvidia Blackwell hardware and domestic Chinese chips—Huawei Ascend and Cambricon. The hardware localization matters for Chinese enterprise customers subject to US export controls, even if training itself still required Nvidia GPUs.
The Benchmark DeepSeek Is Racing Toward
Internal leaks and third-party analyses point to a claimed 83.7% on SWE-bench Verified—the benchmark that measures real-world GitHub issue resolution across multi-file codebases. That score would place V4 ahead of Claude 3.7 Sonnet (which sits around 70%) and competitive with the top agentic coding systems as of early 2026. Reuters and The Information have both reported that DeepSeek internally describes V4 as optimized for “long-context software engineering tasks,” with coding as the primary showcase domain.
SWE-bench numbers should always be read carefully. The benchmark has known contamination risks—models trained on GitHub data that overlaps with the test set—and lab-reported scores frequently don’t translate to production performance on real codebases. That said, 83.7% would be a meaningful result if independently verified, particularly given DeepSeek’s track record of publishing conservative claims.
Why the Delay, and Why It Matters
DeepSeek’s V4 has missed at least four publicly anticipated release windows: mid-February, the Lunar New Year window, late February, and early March. The original Financial Times reporting pointed to a March launch. None materialized. One likely reason is training instability at trillion-parameter scale—the Manifold-Constrained Hyper-Connections innovation referenced in code analysis is specifically described as addressing gradient instability during training, which suggests this was a real problem encountered during development.
Another factor is hardware. DeepSeek began V4 training on Huawei Ascend chips as a demonstration of domestic independence, encountered stability issues, and reverted to Nvidia hardware for training—using Chinese chips for inference only. That detour cost time. For a company that has positioned itself as a proof point for efficient AI development outside the US hardware ecosystem, the acknowledgment that Ascend couldn’t handle flagship training is awkward.
The delay also matters for competitive reasons. In January 2025, DeepSeek launched into a gap: GPT-4o was the dominant model, and nothing competitive existed at V3’s price point. By April 2026, the landscape includes GPT-5.2 with a 400K-token context and claimed 6.2% hallucination rates, Gemini 3.1 Pro at 1M tokens, and a cluster of Chinese models—GLM-5 from Zhipu AI and Qwen 3.5 from Alibaba—that have already claimed their share of the open-weights frontier. DeepSeek V4 is no longer arriving into a vacuum.
What to Watch When It Arrives
The most important question isn’t whether V4 hits its SWE-bench number—it’s whether the inference cost structure holds. DeepSeek’s competitive advantage with V3 was delivering GPT-4o-class performance at roughly 10% of OpenAI’s API price. If V4 achieves multimodal frontier performance at similarly disruptive pricing, it changes procurement calculations for every enterprise AI team. If it prices at or near Western frontier rates, the strategic story is considerably weaker.
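The procurement math is simple but worth making concrete. The prices below are placeholders, not actual published rates; only the "roughly 10%" ratio comes from the V3 comparison above:

```python
def monthly_cost(tokens_per_month: float, price_per_mtok: float) -> float:
    """API spend for a monthly token volume at a per-million-token price."""
    return tokens_per_month / 1e6 * price_per_mtok

volume = 5e9  # 5B tokens/month: a plausible mid-size enterprise workload
frontier = monthly_cost(volume, 10.00)  # hypothetical frontier rate, $/Mtok
tenth = monthly_cost(volume, 1.00)      # the ~10% price point V3 set
print(f"frontier: ${frontier:,.0f}/mo vs disruptor: ${tenth:,.0f}/mo")
```

At that volume the gap is tens of thousands of dollars per month per workload, which is why the pricing question matters more than a few benchmark points.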
The second question is open weights. DeepSeek released V3 under a permissive license, which triggered a wave of fine-tunes, quantizations, and derivative research globally. Whether V4 follows the same path—or whether a model at this capability level gets held back commercially—will shape how much impact it has outside China.
For now, V4 represents the most technically ambitious thing DeepSeek has attempted: trillion-parameter scale, native multimodality, novel sparse attention, and a million-token context, all targeting production software engineering. The wait has been long enough that the bar has risen. April will tell us whether they cleared it.
Further Reading
- TechNode — DeepSeek Plans V4 Multimodal Model Release — The original sourced reporting that first confirmed V4’s imminent launch and its multimodal scope.
- ReCode China AI — DeepSeek’s Next Move: What V4 Will Look Like — The most technically detailed analysis of V4’s expected architecture, based on code repository examination.
- Dataconomy — DeepSeek V4 and Tencent’s Hunyuan to Launch in April — Covers the updated April timeline and places V4 within China’s broader model release race.
