DeepSeek V4 Flash
Trending · DeepSeek · Active · Updated May 20, 2026
DeepSeek's fastest and most cost-efficient model, optimized for high-volume, low-latency applications with strong reasoning capabilities.
Input Price: $0.14 per million tokens
Output Price: $0.28 per million tokens
Context Window: 262,144 tokens
Max Output: 8,192 tokens
Technical Specifications
| Specification | Value |
| --- | --- |
| Provider | DeepSeek |
| Release Date | March 1, 2026 |
| Pricing Type | per token |
| Input Price | $0.14 / 1M tokens |
| Output Price | $0.28 / 1M tokens |
| Cached Input | $0.03 / 1M tokens |
| Context Window | 262,144 tokens |
| Max Output | 8,192 tokens |
| Input Modalities | text, image |
| Output Modalities | text |
| Status | active |
| Availability | api, web_app |
| Latency | very fast |
| Rate Limit | 10,000 RPM |
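The per-token prices above translate into a per-request cost with simple arithmetic. The following is a minimal sketch, not an official billing formula: the function name `estimate_cost` is hypothetical, and it assumes that cached input tokens are a subset of the input tokens and are billed at the cached rate instead of the regular input rate.

```python
from decimal import Decimal

# Prices in USD per one million tokens, copied from the specification table.
INPUT_PRICE_PER_M = Decimal("0.14")
OUTPUT_PRICE_PER_M = Decimal("0.28")
CACHED_INPUT_PRICE_PER_M = Decimal("0.03")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int,
                  cached_input_tokens: int = 0) -> Decimal:
    """Estimate the USD cost of a single request.

    Assumption (not stated in the listing): cached_input_tokens is the
    portion of input_tokens billed at the cached rate; the remainder is
    billed at the regular input rate.
    """
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")

    uncached_tokens = input_tokens - cached_input_tokens
    total = (
        uncached_tokens * INPUT_PRICE_PER_M
        + cached_input_tokens * CACHED_INPUT_PRICE_PER_M
        + output_tokens * OUTPUT_PRICE_PER_M
    )
    return total / TOKENS_PER_M


if __name__ == "__main__":
    # 10,000 input tokens and 500 output tokens, nothing cached.
    print(estimate_cost(10_000, 500))
    # Same request with 8,000 of the input tokens served from cache.
    print(estimate_cost(10_000, 500, cached_input_tokens=8_000))
```

Under these assumptions, a request with 10,000 input tokens and 500 output tokens costs $0.00154, and $0.00066 if 8,000 of the input tokens are cached. `Decimal` is used so that sub-cent amounts are not distorted by binary floating point.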
Capability Scores
Coding: 82/100
Reasoning: 78/100
Math: 80/100
Speed: 95/100
Overview
DeepSeek V4 Flash balances speed against reasoning quality. It offers reasoning close to that of frontier models at a fraction of the cost of premium models, with inference speeds comparable to the fastest small models. With a 256K-token (262,144) context window and image input support, it suits production workloads that need both quality and throughput.
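The "256K" figure corresponds to 256 × 1,024 = 262,144 tokens. A rough way to check whether a prompt fits is sketched below. The helper names are hypothetical, token counts are supplied by the caller (no tokenizer is assumed), and the sketch assumes that generated output shares the context window with the prompt, which the listing does not state.

```python
# Limits copied from the specification table.
CONTEXT_WINDOW = 262_144  # 256 * 1024 tokens
MAX_OUTPUT = 8_192        # maximum tokens per response


def max_prompt_tokens(reserved_output: int = MAX_OUTPUT) -> int:
    """Largest prompt that still leaves room for reserved_output tokens.

    Assumption: prompt and output tokens count against the same window.
    """
    if not 0 < reserved_output <= MAX_OUTPUT:
        raise ValueError(f"reserved_output must be in 1..{MAX_OUTPUT}")
    return CONTEXT_WINDOW - reserved_output


def fits_in_context(prompt_tokens: int, reserved_output: int = MAX_OUTPUT) -> bool:
    """Return True if the prompt plus the reserved output fits in the window."""
    return 0 <= prompt_tokens <= max_prompt_tokens(reserved_output)


if __name__ == "__main__":
    print(max_prompt_tokens())       # room left when reserving the full 8,192
    print(fits_in_context(250_000))  # a prompt below that limit
    print(fits_in_context(260_000))  # a prompt above that limit
```

With the full 8,192-token output reserved, this leaves 253,952 tokens for the prompt under the stated assumption.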
Pros
- Very fast inference (95/100 speed score), suited to real-time applications
- Competitive reasoning at $0.14/M input tokens
- 256K context window exceeds many competitors
- Image input support for multimodal use cases
Cons
- Coding performance trails frontier models (82 vs. 90+)
- Lower raw intelligence ceiling than the V4 Pro variant
- Text-only output: no audio or image generation
Use Cases
- Real-time chat and customer service at scale
- High-throughput content generation and classification
- Cost-sensitive production AI pipelines
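For high-volume jobs such as classification, the listed rate limit (10,000 RPM) and token prices give a quick back-of-the-envelope estimate of duration and cost. This is a sketch under simplifying assumptions: the function `batch_estimate` is hypothetical, every request is assumed to use the same number of tokens, and latency, retries, and caching are ignored, so the duration is a lower bound only.

```python
from decimal import Decimal

# Figures copied from the specification table.
RATE_LIMIT_RPM = 10_000               # requests per minute
INPUT_PRICE_PER_M = Decimal("0.14")   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = Decimal("0.28")  # USD per 1M output tokens


def batch_estimate(num_requests: int, input_tokens_each: int,
                   output_tokens_each: int) -> tuple[int, Decimal]:
    """Return (minimum whole minutes, estimated USD cost) for a batch job.

    Assumptions: uniform request size, no cached input, and the rate
    limit is the only constraint on throughput.
    """
    if min(num_requests, input_tokens_each, output_tokens_each) < 0:
        raise ValueError("all arguments must be non-negative")

    # Integer ceiling division: minutes needed at the maximum request rate.
    minutes = -(-num_requests // RATE_LIMIT_RPM)

    cost = (
        num_requests * input_tokens_each * INPUT_PRICE_PER_M
        + num_requests * output_tokens_each * OUTPUT_PRICE_PER_M
    ) / Decimal(1_000_000)
    return minutes, cost


if __name__ == "__main__":
    # Hypothetical job: classify 1,000,000 documents,
    # 400 input tokens and 5 output tokens per request.
    minutes, cost = batch_estimate(1_000_000, 400, 5)
    print(f"at least {minutes} minutes, about ${cost:.2f}")
```

For the hypothetical job in the example, the estimate is at least 100 minutes at the rate limit and about $57.40 in token charges.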