Chinese AI startup DeepSeek panics US and the rest of the world
Tiny Chinese company DeepSeek claims to be competing with the likes of OpenAI at a fraction of the cost, leading to fears that the AI bubble is set to burst.
January 27, 2025
DeepSeek released the ‘R1’ version of its AI a week ago, claiming its performance is ‘on par with OpenAI-o1’, which was unveiled late last year. The novel thing about both models is how they approach the inference side of AI, which is the process of tapping expensively trained large language models, in response to queries, to yield the best outcomes.
It has taken the rest of the world a week to assess open-source DeepSeek-R1 and it seems to have come to the conclusion that its claims are legitimate. The main reason it's so disruptive is that the US state has gone to great lengths to deny China the hardware used by the likes of OpenAI – especially Nvidia GPUs. So the latest news from DeepSeek suggests that effort has failed in its stated aim of preventing Chinese AI development.
But that’s not all. Not only is it assumed that, thanks to the US sanctions, this technological parity has been achieved with inferior hardware, it also seems to be able to produce competitive outcomes at a fraction of the price. Since the US approach to AI hinges on throwing zillions of dollars of computing grunt at it, the appearance of not only a much cheaper alternative, but a Chinese one at that, has caused considerable concern.
Precisely how DeepSeek has managed this is complicated, of course, but the good news is that there’s plenty of informed commentary out there. “DeepSeek R1 scores extremely well against almost all of the benchmarks (see here) and comfortably rubs shoulders with the likes of OpenAI o1, Claude 3.5-Sonnet and GPT4o,” wrote Radio Free Mobile soon after its release. “This comes as a big surprise as China is supposed to be cut off from advanced semiconductors that are needed to train a model of this size and complexity.” Here’s a widely shared chart comparing DeepSeek R1 and OpenAI o1 benchmarks.
The FT offers a good compilation of equity analyst opinion, while the WSJ provides a Silicon Valley perspective. TechCrunch offers some more points of view, including the sceptical angle that this is all a Chinese Communist Party wind-up, designed to troll the aforementioned US efforts to suppress China’s AI sector. Indeed, the CCP-controlled Global Times had a good gloat over the weekend, following up today by noting the DeepSeek app is currently top of the US iOS chart.
Perhaps the most instructive piece we’ve read is from tech investor and former Microsoft senior exec Steven Sinofsky on X, headlined ‘DeepSeek Has Been Inevitable and Here's Why (History tells us)’. It seems a lot of this breakthrough comes down to a form of AI that ‘thinks’ much more efficiently than the current paradigm. It remains to be seen whether US AI giants are able to course-correct in response.
If Nvidia’s share price is anything to go by, there’s currently a fair bit of panic flying around. NVDA was down 12% in pre-market trading at time of writing and Japanese chip stocks were also down when the Japanese market opened today. If this is all just a CCP trolling exercise, it’s an incredibly effective one, which has given the rest of the world plenty to think about. Not least of those concerns will be the fact that US consumers seem to be rapidly migrating to Chinese apps, in part as a direct result of incoherent US foreign policy.
Of course we had to give it a go ourselves, so we installed the Android app. We asked it ‘how does deepseekR1 work’ and you can see the full response pasted below. Not included in the answer is an interesting feature in which it takes you through its ‘thought process’ before delivering the answer. You can see the start of that part in this phone screenshot.
DeepSeek-R1 is an AI assistant developed by DeepSeek, leveraging a proprietary large language model (LLM) tailored for natural language understanding and generation. Here's a structured breakdown of its functionality:
### Core Architecture & Training
- **Transformer-Based Model**: Built on a transformer architecture, optimized for processing sequential data with attention mechanisms, enabling robust context handling.
- **Multilingual Training**: Trained on diverse datasets, with a focus on Chinese language optimization, involving specialized tokenization and character handling.
- **Knowledge Cutoff**: Integrates broad, domain-specific knowledge up to its training cutoff, periodically updated through retraining rather than real-time retrieval.
### Key Features
1. **Natural Language Understanding (NLU)**:
   - Excels in parsing complex queries, including idiomatic and context-rich Chinese language, due to tailored tokenization and training data.
2. **Multi-Turn Dialogue**:
   - Utilizes a large context window to retain conversation history, enabling coherent and context-aware interactions over extended dialogues.
3. **Task-Specific Optimization**:
   - **Fine-Tuning**: Adapted for domains like customer service or education using targeted datasets to refine responses and workflows.
   - **Adapter Modules**: Potentially employs parameter-efficient techniques (e.g., adapters) to switch between tasks without full retraining.
4. **Ethical & Safety Measures**:
   - **Reinforcement Learning with Human Feedback (RLHF)**: Aligns outputs with human values by incorporating feedback from trainers.
   - **Content Filtering**: Automated systems block harmful or inappropriate content pre- and post-generation.
### Workflow Overview
1. **Input Processing**: Receives user input, encoding it with context from prior interactions.
2. **Contextual Generation**: The transformer model generates responses using attention mechanisms to weigh relevant dialogue history.
3. **Safety Checks**: Outputs undergo filtering to ensure compliance with ethical guidelines.
4. **Response Delivery**: Finalized text is returned to the user, maintaining conversational flow.
### Proprietary Enhancements
- **Efficiency Tweaks**: Likely includes architectural optimizations for faster inference or reduced computational costs.
- **Domain-Specific Versions**: Customized deployments for industries like healthcare or finance, trained on specialized corpora.
### Limitations
- **Static Knowledge**: Lacks real-time internet access; relies on pre-trained data updated periodically.
- **Context Window Constraints**: Despite large capacity, very long conversations may truncate earlier context.
In summary, DeepSeek-R1 combines advanced LLM architecture with targeted fine-tuning and safety protocols to deliver a versatile, context-aware AI assistant, particularly optimized for Chinese-language applications and ethical interactions.
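That is where the app's response ends. To make its description of its own workflow a little more concrete, here is a toy Python sketch of the four steps it lists (input processing, contextual generation, safety checks, response delivery), along with the context-truncation limitation it mentions. This is purely illustrative and says nothing about how DeepSeek actually built R1: every name in it (`ChatSession`, `toy_model`, `is_safe`, `BLOCKLIST`, `MAX_TURNS`) is our own invention, the 'model' just echoes the last message, and the 'context window' is counted in turns rather than tokens.

```python
from collections import deque

# Hypothetical stand-ins for illustration only; none of this is DeepSeek code.
BLOCKLIST = {"forbidden"}   # toy content filter vocabulary
MAX_TURNS = 3               # toy "context window", measured in user turns


def toy_model(context):
    """Placeholder for the transformer: simply echoes the latest turn."""
    return f"(reply to: {context[-1]})"


def is_safe(text):
    """Toy safety check: reject text containing any blocklisted word."""
    return not any(word in text.lower() for word in BLOCKLIST)


class ChatSession:
    def __init__(self):
        # A deque with maxlen drops the oldest turns once it is full,
        # mimicking how a long conversation loses its earliest context.
        self.history = deque(maxlen=MAX_TURNS)

    def ask(self, user_input):
        # 1. Input processing: add the new turn to the retained history.
        self.history.append(user_input)
        # 2. Contextual generation: the model sees the retained history.
        draft = toy_model(list(self.history))
        # 3. Safety checks: filter the draft after generation.
        if not is_safe(draft):
            draft = "Sorry, I can't help with that."
        # 4. Response delivery: return the final text to the user.
        return draft
```

Calling `ChatSession().ask(...)` four times in a row leaves only the last three turns in `history`, which is the truncation behaviour the response lists under 'Limitations'.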