The training cost for DeepSeek, like other large language models, depends on several factors including model size, training duration, hardware used, and efficiency. Here's a structured estimate based on comparable models and industry standards:
Key Factors Influencing Cost
1. Model Size
- Assume a scale comparable to GPT-3 (175B parameters) or slightly smaller (e.g., 100–200B parameters).
2. Training Data
- Training on hundreds of billions of tokens (e.g., 300B–500B); compute scales roughly linearly with token count.
3. Hardware
- Using A100 GPUs (common for AI training). Cloud pricing for A100s is ~$3.5 per GPU-hour.
- Cluster size: 1,024–2,048 GPUs, depending on parallelism and speed requirements.
4. Training Duration
- Roughly 3–4 weeks (e.g., 21–30 days) of continuous training.
5. Efficiency
- Accounting for ~30–50% of peak hardware utilization due to communication overhead and other bottlenecks (see the sketch below).
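
To make the utilization factor concrete, here is a minimal sketch of how it scales down a cluster's effective throughput. The GPU count, peak figure, and utilization are the assumptions listed above, not published DeepSeek numbers:

```python
# Sketch: effective sustained throughput of an assumed A100 cluster.
A100_PEAK_TFLOPS = 312   # A100 dense BF16/FP16 peak, per GPU
gpus = 2048              # assumed cluster size (see factor 3)
utilization = 0.4        # midpoint of the ~30-50% range above

effective_pflops = gpus * A100_PEAK_TFLOPS * utilization / 1000
print(f"Effective throughput: ~{effective_pflops:.0f} PFLOPS")
# -> Effective throughput: ~256 PFLOPS (vs. ~639 PFLOPS peak)
```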
Cost Estimation
Example Calculation
- 2,048 A100 GPUs running for 30 days at $3.5/GPU-hour:
Total Cost = 2,048 GPUs × 720 hours × $3.5/hour ≈ $5.2 million.
- Smaller clusters (e.g., 1,024 GPUs for 21 days):
Total Cost = 1,024 GPUs × 504 hours × $3.5/hour ≈ $1.8 million.
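
The same GPU-hour arithmetic as a small script; the cluster sizes and the $3.5/GPU-hour rate are the assumptions from this section, not actuals:

```python
def training_cost(gpus: int, days: float, rate_per_gpu_hour: float) -> float:
    """Total cloud cost in dollars for a cluster running continuously."""
    return gpus * days * 24 * rate_per_gpu_hour

print(f"${training_cost(2048, 30, 3.5):,.0f}")  # $5,160,960  (~$5.2M)
print(f"${training_cost(1024, 21, 3.5):,.0f}")  # $1,806,336  (~$1.8M)
```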
FLOPs-Based Estimate
For a 200B-parameter model trained on 500B tokens:
FLOPs ≈ 6 × parameters × tokens = 6 × 200B × 500B = 6e23 FLOPs.
- At 2,048 A100s (312 TFLOPS/GPU peak) and ~50% utilization, training time ≈ 21 days.
- Total cost aligns with the $1.8–$5.2 million range.
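
For completeness, here is a sketch of that FLOPs-based estimate, using the standard 6 × parameters × tokens approximation for transformer training compute; the ~50% utilization is the assumption needed to land near the 21-day figure:

```python
params = 200e9        # assumed model size (N)
tokens = 500e9        # assumed training tokens (D)
gpus = 2048
peak_flops = 312e12   # A100 dense BF16 peak, per GPU
utilization = 0.5     # assumed; at 100% the run would take ~11 days

total_flops = 6 * params * tokens                       # = 6e23 FLOPs
seconds = total_flops / (gpus * peak_flops * utilization)
print(f"Training time: ~{seconds / 86400:.1f} days")    # -> ~21.7 days
```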
Comparison to Known Models
- GPT-3 (175B parameters): ~$4.6 million (a widely cited estimate, based on older V100 GPUs).
- DeepSeek: Likely in the $1–$5 million range, depending on optimizations and hardware choices.
Conclusion
While exact figures for DeepSeek are not public, a reasonable estimate for training a state-of-the-art model of similar scale would be between $1 million and $5 million, with higher costs for larger clusters or longer training times. This aligns with industry benchmarks for models trained on thousands of GPUs over several weeks.