The Valley’s Going Crazy: How DeepSeek Achieved State-of-the-Art AI with a $6 Million Budget
Over the past weekend, more than 20 people approached me, buzzing about DeepSeek — the new AI contender that’s shaking up Silicon Valley.
“DeepSeek R1 made things even scarier.”
These were the chilling words of a Meta insider as they grappled with a harsh truth:
Did you know that every single leader in Meta’s GenAI division earns more than the entire $5.6 million it cost to train DeepSeek-V3?
That’s the model that has set the AI world ablaze over the past few days, with the Nasdaq Composite plunging 3.1% and Nvidia 11%.
And by the way, there are dozens of such “leaders” on Meta’s payroll.
So how did DeepSeek do it?
DeepSeek’s Secret Sauce: High Performance on a Shoestring Budget
DeepSeek-V3 is a Mixture-of-Experts (MoE) language model comprising a staggering 671 billion total parameters, with 37 billion activated for each token. Despite its massive scale, the model achieves performance comparable to leading closed-source models while requiring only a fraction of the training resources. As stated in the technical report:
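To make the “671 billion total parameters, 37 billion active per token” idea concrete, here is a minimal Mixture-of-Experts sketch in PyTorch. The sizes, expert count, and top-k value below are illustrative placeholders, not DeepSeek-V3’s actual configuration (which uses fine-grained and shared experts with its own routing scheme); the point is only that a router sends each token to a small subset of experts, so most parameters sit idle on any given token.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoELayer(nn.Module):
    """Toy MoE feed-forward layer: many experts exist, but each token
    is routed to only top_k of them, so per-token compute stays small."""

    def __init__(self, d_model=64, d_ff=256, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # one score per expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                               # x: (num_tokens, d_model)
        scores = self.router(x)                         # (num_tokens, num_experts)
        weights, chosen = scores.topk(self.top_k, -1)   # pick k experts per token
        weights = F.softmax(weights, dim=-1)            # normalize chosen experts' weights
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e             # tokens whose slot-th pick is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(5, 64)        # 5 tokens, embedding size 64
layer = TinyMoELayer()
print(layer(tokens).shape)         # torch.Size([5, 64])
```

Scaled up, this is how total capacity (number of experts) can grow into the hundreds of billions of parameters while per-token compute grows only with the number of experts actually activated.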
“DeepSeek-V3 costs only 2.788M GPU hours for its full training. Assuming the rental price of the H800 GPU is $2 per GPU hour, our total training costs amount to only…
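Taking the report’s numbers at face value, the headline figure is simple arithmetic; here is a quick sanity check, assuming the $2-per-GPU-hour rental rate quoted above:

```python
# Sanity check of the headline training-cost figure
gpu_hours = 2_788_000           # reported total H800 GPU hours for full training
price_per_gpu_hour = 2.0        # assumed rental price in USD, as stated in the report
print(gpu_hours * price_per_gpu_hour)   # 5,576,000 USD, i.e. the ~$5.6M cited above
```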