The Valley’s Going Crazy: How DeepSeek Achieved State-of-the-Art AI with a $6 Million Budget

Angelina Yang
12 min read · Feb 4, 2025

Over the past weekend, more than 20 people approached me, buzzing about DeepSeek — the new AI contender that’s shaking up Silicon Valley.

“DeepSeek R1 made things even scarier.”

These were the chilling words of a Meta insider as they grappled with a harsh truth:

Did you know that every single leader in Meta’s GenAI division earns more than the entire $5.6 million it cost to train DeepSeek v3?

That’s the model now setting the AI world ablaze: over the past few days it has sent the Nasdaq Composite plunging 3.1% and Nvidia 11%.

And btw, there are dozens of such “leaders” on Meta’s payroll.

So how did DeepSeek do it?

DeepSeek’s Secret Sauce: High Performance on a Shoestring Budget

DeepSeek-V3 is a Mixture-of-Experts (MoE) language model comprising a staggering 671 billion total parameters, with 37 billion activated for each token. Despite its massive scale, the model achieves performance comparable to leading closed-source models while requiring only a fraction of the training resources. As stated in the technical report:

“DeepSeek-V3 costs only 2.788M H800 GPU hours for its full training. Assuming the rental price of the H800 GPU is $2 per GPU hour, our total training costs amount to only $5.576M.”
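The math behind that headline number is straightforward: GPU hours times the hourly rental price. Here’s a quick sanity check in Python, using only the figures quoted above (the $2/hour rate is the report’s own assumption):

```python
# Sanity check of DeepSeek-V3's reported training cost,
# using the figures quoted from the technical report.
gpu_hours = 2.788e6   # 2.788M H800 GPU hours for full training
rate_usd = 2.00       # assumed rental price: $2 per H800 GPU hour

print(f"${gpu_hours * rate_usd:,.0f}")  # -> $5,576,000, i.e. ~$5.6M
```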
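As for the “37 billion activated for each token” part, that’s what the MoE architecture buys you: a small router network picks a handful of expert sub-networks per token, so most of the model’s parameters sit idle on any given forward pass. Below is a minimal, illustrative top-k routing layer in PyTorch. To be clear, this is a sketch of the general MoE idea, not DeepSeek’s actual code; the expert count, layer sizes, and names here are invented for the example.

```python
import torch
import torch.nn as nn

class TinyMoELayer(nn.Module):
    """Illustrative Mixture-of-Experts layer: only the top-k experts
    chosen by the router run for each token, so activated parameters
    are a small fraction of total parameters."""

    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )
        self.router = nn.Linear(d_model, n_experts)  # gating network
        self.top_k = top_k

    def forward(self, x):                       # x: (tokens, d_model)
        scores = self.router(x)                 # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)       # normalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e        # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

layer = TinyMoELayer()
tokens = torch.randn(16, 64)   # 16 tokens, d_model=64
print(layer(tokens).shape)     # -> torch.Size([16, 64])
```

With 8 experts and top-2 routing, each token touches only a quarter of the expert parameters. DeepSeek-V3 applies the same principle at a vastly larger scale, which is how a 671-billion-parameter model ends up activating only 37 billion parameters per token.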
