DeepSeek's surprisingly inexpensive AI chatbot is challenging the industry's giants. Introducing itself with the line "Ask anything, get a surprising answer," DeepSeek's AI has become a major market competitor, even causing significant drops in NVIDIA's stock price. Its success stems from a combination of innovative technology and substantial, albeit undisclosed, investment.
Image: ensigame.com
Key technological advancements include:
- Multi-token Prediction (MTP): Predicts several upcoming tokens at once instead of only the next one, boosting accuracy and efficiency.
- Mixture of Experts (MoE): Employs 256 expert neural networks but activates only eight of them for each token, which accelerates training and improves performance.
- Multi-head Latent Attention (MLA): Compresses the attention keys and values into a compact latent representation, reducing memory use while still letting the model repeatedly pick out key information from the text and minimizing the risk of overlooking crucial details.
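The Mixture of Experts idea in the list above can be illustrated with a small routing sketch. It is a toy under stated assumptions, not DeepSeek's implementation: the expert functions, the random router scores, and the softmax weighting over the selected experts are all hypothetical stand-ins. Only the 256-expert and eight-per-token figures come from the article.

```python
import math
import random

NUM_EXPERTS = 256   # number of experts, per the article
TOP_K = 8           # experts activated per token, per the article


def route_token(scores, k=TOP_K):
    """Pick the k highest-scoring experts and normalise their weights.

    `scores` holds one router score per expert. Returns a list of
    (expert_index, weight) pairs whose weights sum to 1.
    """
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the selected experts only (assumed gating rule);
    # subtracting the maximum keeps the exponentials numerically stable.
    peak = max(scores[i] for i in top)
    exps = [math.exp(scores[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, experts, scores, k=TOP_K):
    """Weighted sum of the outputs of the selected experts for one token."""
    return sum(w * experts[i](x) for i, w in route_token(scores, k))


# Hypothetical toy experts: each one scales its input by a fixed factor.
rng = random.Random(0)
experts = [(lambda x, s=rng.uniform(0.5, 1.5): s * x) for _ in range(NUM_EXPERTS)]
scores = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

chosen = route_token(scores)
output = moe_forward(2.0, experts, scores)
print(len(chosen), "of", NUM_EXPERTS, "experts active")
```

Only the eight selected experts are ever called for a given token, so the compute per token stays small even though the full model holds 256 experts.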
DeepSeek initially claimed that training its DeepSeek V3 model cost a mere $6 million and used 2,048 GPUs. However, SemiAnalysis reported a far more extensive infrastructure: approximately 50,000 NVIDIA Hopper GPUs (including H800, H100, and H20 units) spread across multiple data centers. This infrastructure represents a total server investment of roughly $1.6 billion, with operating expenses estimated at $944 million.
DeepSeek is a subsidiary of High-Flyer, a Chinese hedge fund, and owns its data centers, which gives it control over optimization and lets it implement innovations faster. Being self-funded also makes it more agile. The company attracts top talent, primarily from Chinese universities, with some researchers earning over $1.3 million annually.
The initial $6 million figure likely reflects only the GPU cost of pre-training and excludes research, refinement, data processing, and overall infrastructure expenses. DeepSeek's total investment in AI development exceeds $500 million. Its streamlined structure lets it innovate more efficiently than larger, more bureaucratic competitors.
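A rough back-of-the-envelope calculation shows how a headline figure of about $6 million can arise from GPU time alone. The GPU-hour count and the hourly rental rate below are not from this article. They are assumed here from DeepSeek's public V3 technical report and should be treated as illustrative inputs. The GPU count, total investment, and server spending are the article's figures.

```python
# Assumed inputs, not stated in the article (taken from DeepSeek's public
# V3 technical report as the author of this sketch understands it).
GPU_HOURS = 2_788_000          # total H800 GPU-hours for the training run
RATE_USD_PER_GPU_HOUR = 2.0    # assumed rental price per GPU-hour

# Figures quoted in the article.
GPUS = 2048
TOTAL_INVESTMENT_USD = 500_000_000   # lower bound ("exceeds $500 million")
SERVER_CAPEX_USD = 1_600_000_000     # SemiAnalysis estimate

# GPU rental cost of the run, and how long it would occupy the cluster.
training_cost = GPU_HOURS * RATE_USD_PER_GPU_HOUR
wall_clock_days = GPU_HOURS / GPUS / 24

print(f"GPU rental cost: ${training_cost / 1e6:.2f}M")
print(f"Wall-clock time on {GPUS} GPUs: {wall_clock_days:.0f} days")
print(f"Share of total investment: {training_cost / TOTAL_INVESTMENT_USD:.1%}")
print(f"Share of server spending: {training_cost / SERVER_CAPEX_USD:.2%}")
```

Under these assumptions the run costs a little under $6 million in GPU time, which is only about one percent of the total investment the article cites.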
While DeepSeek's success showcases the competitive potential of a well-funded independent AI company, the "revolutionary budget" claim is misleading. Its success is better attributed to substantial investment, technological breakthroughs, and a strong team. Even with these significant expenditures, however, DeepSeek's costs remain considerably lower than its competitors'. For example, DeepSeek's R1 model cost $5 million to train, compared to ChatGPT4's $100 million.