
DeepSeek V4 Is Here: 1T Parameters at $0.30/MTok


Why DeepSeek V4 Matters Right Now

DeepSeek V4 has arrived, and it changes the economics of frontier AI. The Chinese lab’s latest model packs roughly 1 trillion parameters into a Mixture-of-Experts architecture that activates only 37 billion parameters per token — and prices API access at $0.30 per million input tokens. For context, that’s roughly 3–5× cheaper than GPT-5.4 and Claude Opus 4.6 for equivalent-class capability.

The model shipped in stages: a V4 Lite variant (~200B parameters) appeared on March 9 to validate the core architecture, with the full model rolling out through late March and into April 2026. What makes V4 technically interesting isn’t just scale — it’s three architectural innovations that solve real problems transformers face above 671B parameters.

What’s New in the Architecture

DeepSeek V4 introduces three key technical changes over its predecessor, V3. Each targets a specific bottleneck that emerged as models scaled past the 671B-parameter mark.

Engram Conditional Memory

Engram is the headline innovation. Named after the neuroscience term for a memory trace, it separates static knowledge retrieval from dynamic reasoning. Instead of running everything through attention layers, Engram stores static patterns — syntax rules, entity names, library function signatures — in a hash-based lookup table in DRAM, retrievable in O(1) time.

The practical result: V4 handles a 1-million-token context window without the quadratic attention cost that normally makes long contexts prohibitively expensive. DeepSeek’s testing shows 97% accuracy on Needle-in-a-Haystack evaluations at the full 1M-token length, up from 84.2% on their previous architecture. The team found a sweet spot allocating roughly 20–25% of the model’s sparse parameter budget to Engram, with the remaining 75–80% going to MoE compute.
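To make the mechanism concrete, here is a toy sketch of the idea behind an Engram-style conditional memory. Everything here — the class name, the hashing scheme, the fallback interface — is illustrative; DeepSeek has not published this API. The point is only the shape of the trade: static patterns resolve via an O(1) hash lookup, and only novel inputs pay for the expensive attention/MoE path.

```python
import hashlib

def pattern_key(tokens):
    """Stable hash key for a token n-gram."""
    joined = "\x1f".join(tokens)
    return hashlib.blake2b(joined.encode(), digest_size=8).hexdigest()

class EngramMemory:
    """Toy conditional memory: O(1) lookup for static patterns,
    fallback to the (expensive) compute path on a miss."""

    def __init__(self):
        self.table = {}              # hash -> stored representation
        self.hits = self.misses = 0

    def store(self, tokens, representation):
        self.table[pattern_key(tokens)] = representation

    def lookup(self, tokens, fallback):
        key = pattern_key(tokens)
        if key in self.table:
            self.hits += 1
            return self.table[key]
        self.misses += 1
        result = fallback(tokens)    # stand-in for attention/MoE compute
        self.table[key] = result     # cache the pattern for next time
        return result

mem = EngramMemory()
mem.store(["def", "main", "("], "signature:entry_point")
out = mem.lookup(["def", "main", "("], fallback=lambda t: "computed")
print(out)  # signature:entry_point
```

The design choice the sketch illustrates is the split itself: retrieval cost stays flat no matter how much static knowledge is stored, which is why the attention budget can be reserved for genuine reasoning.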

DeepSeek Sparse Attention

The second innovation is DeepSeek Sparse Attention (DSA) with a “Lightning Indexer” that cuts long-context compute roughly in half. Combined with Engram’s O(1) memory system, this is how V4 makes million-token contexts economically viable at $0.30/MTok input pricing.
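The general pattern behind an indexer-guided sparse attention can be sketched in a few lines. This is not DeepSeek’s published kernel — the two-stage structure below (a cheap low-dimensional scorer selects top-k keys per query, then exact attention runs only over those keys) is a generic illustration of why the approach saves compute when k is much smaller than the context length.

```python
import numpy as np

def sparse_attention(q, k, v, idx_q, idx_k, top_k=4):
    """Two-stage sparse attention: cheap indexer, then exact top-k attention."""
    # Stage 1: score all keys in a small projected dimension (cheap).
    cheap_scores = idx_q @ idx_k.T                    # (n_q, n_k)
    sel = np.argsort(-cheap_scores, axis=-1)[:, :top_k]

    # Stage 2: full-precision attention over the selected keys only.
    out = np.empty_like(q)
    d = q.shape[-1]
    for i in range(q.shape[0]):
        ks, vs = k[sel[i]], v[sel[i]]                 # (top_k, d)
        scores = ks @ q[i] / np.sqrt(d)
        w = np.exp(scores - scores.max())
        w /= w.sum()
        out[i] = w @ vs
    return out

rng = np.random.default_rng(0)
n, d, d_idx = 16, 32, 4
q, k, v = rng.normal(size=(3, n, d))
idx_q, idx_k = rng.normal(size=(2, n, d_idx))         # cheap indexer projections
out = sparse_attention(q, k, v, idx_q, idx_k, top_k=4)
print(out.shape)  # (16, 32)
```

With a 1M-token context, the quadratic stage shrinks from n² score computations to n·k, which is where the claimed factor-of-two (or better) savings would come from.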

Manifold-Constrained Hyper-Connections

The third change, Manifold-Constrained Hyper-Connections (mHC), improves gradient flow during training at trillion-parameter scale. This is less visible to end users but critical for training stability — a problem that has historically forced labs to restart training runs at enormous cost.
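DeepSeek has not published the mHC construction, but the family it belongs to — hyper-connections — keeps several parallel residual streams and mixes them with a learnable matrix; a "manifold constraint" then restricts that matrix so the mixed signal’s scale stays stable across many layers. The toy below is one possible interpretation, using Sinkhorn normalization to push the mixing matrix toward the doubly stochastic manifold; treat every detail as an assumption, not DeepSeek’s method.

```python
import numpy as np

def sinkhorn(mat, iters=50):
    """Project a matrix toward the doubly stochastic manifold
    (rows and columns each summing to 1) via alternating normalization."""
    m = np.exp(mat)                          # ensure strictly positive entries
    for _ in range(iters):
        m /= m.sum(axis=1, keepdims=True)    # normalize rows
        m /= m.sum(axis=0, keepdims=True)    # normalize columns
    return m

rng = np.random.default_rng(0)
n_streams, width = 4, 8
streams = rng.normal(size=(n_streams, width))      # parallel residual streams

raw_mix = rng.normal(size=(n_streams, n_streams))  # unconstrained weights
mix = sinkhorn(raw_mix)                            # constrained mixing matrix

mixed = mix @ streams
# Rows summing to ~1 make each output stream a weighted average of the
# inputs, so repeated mixing neither explodes nor collapses the signal —
# the kind of property that helps gradients flow at extreme depth.
print(np.allclose(mix.sum(axis=1), 1.0, atol=1e-3))  # True
```

The design intuition: an unconstrained mixing matrix can drift toward amplifying or attenuating the residual stream over hundreds of layers, while a manifold-constrained one preserves signal magnitude by construction.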

Benchmarks: Promising but Unverified

DeepSeek’s internal benchmarks claim V4 scores 80–85% on SWE-bench Verified and around 90% on HumanEval. If accurate, that would put V4 in the same tier as Claude Opus 4.6 and GPT-5.4 on coding tasks — at a fraction of the price.

The caveat matters: these numbers come from DeepSeek’s own testing, and independent evaluations are still catching up. Early third-party reports suggest V4 is genuinely strong on code generation and long-context retrieval, but the exact benchmark numbers remain contested. The aggregator NxCode, for example, cites 81% on SWE-bench Verified — solid, but a few points below the upper range of DeepSeek’s claims.

On multimodal tasks, V4 handles text, image, and video generation natively. This makes it one of the few models offering genuine multimodal generation (not just understanding) at this price point.

The Huawei Factor

V4 is optimized for Huawei Ascend and Cambricon chips — domestic Chinese silicon, not NVIDIA GPUs. This is a strategic shift with real technical implications. Running V4 on NVIDIA hardware at launch may not achieve the same performance or cost profile as the reported numbers, which were tuned for Huawei’s architecture.

This matters for two reasons. First, it demonstrates that frontier-class models can be trained and served on non-NVIDIA hardware, which was still an open question 18 months ago. Second, it signals that DeepSeek is building for a world where US export controls on advanced chips continue indefinitely. Chinese tech giants including Alibaba, ByteDance, and Tencent have placed bulk orders for Huawei’s chips totalling hundreds of thousands of units.

For developers outside China, the practical question is whether DeepSeek’s API — served from Chinese data centers — delivers the same latency and reliability as US-based alternatives. As we noted in our earlier piece on what to expect from DeepSeek V4, data residency and latency remain the main friction points for Western enterprise adoption.

Who Should Care — and Who Should Wait

If you’re building cost-sensitive AI applications — high-volume summarization, code generation pipelines, or document processing at scale — V4’s pricing is hard to ignore. The $0.30/MTok input rate, with cached prefixes dropping to $0.03/MTok, makes it the cheapest frontier-class model available by a significant margin.
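A quick back-of-envelope shows what those rates mean at volume. The calculation below uses only the input-token prices stated above ($0.30/MTok uncached, $0.03/MTok cached); output-token pricing isn’t stated in this piece, so it is excluded, and the traffic numbers are purely illustrative.

```python
RATE_INPUT = 0.30 / 1_000_000    # $ per uncached input token
RATE_CACHED = 0.03 / 1_000_000   # $ per cached-prefix input token

def monthly_input_cost(requests, tokens_per_request, cache_hit_ratio):
    """Input-token spend for a month of traffic, split by cache status."""
    total = requests * tokens_per_request
    cached = total * cache_hit_ratio
    return cached * RATE_CACHED + (total - cached) * RATE_INPUT

# Example: 1M requests/month at 4k input tokens each, with 70% of tokens
# served from a cached prefix (e.g. a long shared system prompt).
cost = monthly_input_cost(1_000_000, 4_000, 0.70)
print(f"${cost:,.2f}")  # $444.00
```

Four billion input tokens a month for under $500 is the kind of number that makes high-volume summarization and document pipelines pencil out.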

If you need verified, reproducible benchmark results before committing — especially for regulated industries or production systems where model provenance matters — it’s worth waiting for independent evaluations to stabilize. The model is genuinely capable, but the gap between internal claims and third-party verification hasn’t fully closed yet.

The broader pattern is clear: the gap between Chinese and Western frontier models has effectively closed on most benchmarks, and the price competition is intensifying. DeepSeek V4 is the latest — and strongest — evidence that the days of paying premium prices simply because there were no alternatives are ending. As Alibaba’s Qwen 3.5 showed in February, Chinese labs are not just catching up — they’re competing on architecture innovation, not just scale.


