What Is DeepSeek-R1?

This high-profile AI model from the Chinese startup DeepSeek achieves comparable results to its American counterparts — at a fraction of the operating cost.

Written by Ellen Glover
Published on Feb. 18, 2025

DeepSeek-R1 is an AI model developed by Chinese artificial intelligence startup DeepSeek. Released in January 2025, R1 holds its own against (and in some cases surpasses) the reasoning capabilities of some of the world’s most advanced foundation models — but at a fraction of the operating cost, according to the company. R1 is also open sourced under an MIT license, allowing free commercial and academic use.

What is DeepSeek-R1?

DeepSeek-R1, or R1, is an open source language model made by Chinese AI startup DeepSeek that can perform the same text-based tasks as other advanced models, but at a lower cost. It also powers the company’s namesake chatbot, a direct competitor to ChatGPT.

DeepSeek-R1 is one of several highly advanced AI models to come out of China, joining those developed by labs like Alibaba and Moonshot AI. R1 also powers DeepSeek’s eponymous chatbot, which soared to the number one spot on the Apple App Store after its release, dethroning ChatGPT.

DeepSeek’s leap into the international spotlight has led some to question Silicon Valley tech companies’ decision to sink tens of billions of dollars into building their AI infrastructure, and the news caused stocks of AI chip manufacturers like Nvidia and Broadcom to nosedive. Still, some of the company’s biggest U.S. rivals have called its latest model “impressive” and “an excellent AI advancement,” and are reportedly scrambling to figure out how it was accomplished. Even President Donald Trump — who has made it his mission to come out ahead against China in AI — called DeepSeek’s success a “positive development,” describing it as a “wake-up call” for American industries to sharpen their competitive edge.

Indeed, the launch of DeepSeek-R1 appears to be taking the generative AI industry into a new era of brinkmanship, where the wealthiest companies with the largest models may no longer win by default. 

 

What Is DeepSeek-R1?

DeepSeek-R1 is an open source language model developed by DeepSeek, a Chinese startup founded in 2023 by Liang Wenfeng, who also co-founded quantitative hedge fund High-Flyer. The company reportedly grew out of High-Flyer’s AI research unit to focus on developing large language models that achieve artificial general intelligence (AGI) — a benchmark where AI is able to match human intellect, which OpenAI and other top AI companies are also working towards. But unlike many of those companies, all of DeepSeek’s models are open source, meaning their weights and training methods are freely available for the public to examine, use and build upon.

R1 is the latest of several AI models DeepSeek has made public. Its first product was the coding tool DeepSeek Coder, followed by the V2 model series, which gained attention for its strong performance and low cost, triggering a price war in the Chinese AI model market. Its V3 model — the foundation on which R1 is built — captured some interest as well, but its restrictions around sensitive topics related to the Chinese government drew questions about its viability as a true industry competitor. Then the company unveiled its new model, R1, claiming it matches the performance of the world’s top AI models while relying on comparatively modest hardware. 

All told, analysts at Jefferies have reportedly estimated that DeepSeek spent $5.6 million to train R1 — a drop in the bucket compared to the hundreds of millions, or even billions, of dollars many U.S. companies pour into their AI models. However, that figure has since come under scrutiny from other analysts claiming that it only accounts for training the chatbot, not additional expenses like early-stage research and experiments.


 

What Can DeepSeek-R1 Do?

According to DeepSeek, R1 excels at a wide range of text-based tasks in both English and Chinese, including:

  • Creative writing
  • General question answering
  • Editing 
  • Summarization

More specifically, the company says the model does particularly well at “reasoning-intensive” tasks that involve “well-defined problems with clear solutions.” Namely:

  • Generating and debugging code
  • Performing mathematical computations
  • Explaining complex scientific concepts

Plus, because it is an open source model, R1 enables users to freely access, modify and build upon its capabilities, as well as integrate them into proprietary systems.

 

DeepSeek-R1 Use Cases

DeepSeek-R1 has not experienced widespread industry adoption yet, but judging from its capabilities it could be used in a variety of ways, including:

  • Software Development: R1 could assist developers by generating code snippets, debugging existing code and providing explanations for complex coding concepts.
  • Mathematics: R1’s ability to solve and explain complex math problems could be used to provide research and education support in mathematical fields.
  • Content Creation, Editing and Summarization: R1 is good at generating high-quality written content, as well as editing and summarizing existing content, which could be useful in industries ranging from marketing to law.
  • Customer Service: R1 could be used to power a customer service chatbot, where it can engage in conversation with users and answer their questions in lieu of a human agent. 
  • Data Analysis: R1 can analyze large datasets, extract meaningful insights and generate comprehensive reports based on what it finds, which could be used to help businesses make more informed decisions.
  • Education: R1 could be used as a sort of digital tutor, breaking down complex subjects into clear explanations, answering questions and offering personalized lessons across various subjects.

 

DeepSeek-R1 Limitations

DeepSeek-R1 is subject to the same limitations as any other language model: it can make mistakes, generate biased results and be difficult to fully understand — even if it is technically open source.

DeepSeek also says the model has a tendency to “mix languages,” especially when prompts are in languages other than Chinese and English. For example, R1 might use English in its reasoning and response, even if the prompt is in a completely different language. And the model struggles with few-shot prompting, which involves providing a few examples to guide its response. Instead, users are advised to use simpler zero-shot prompts — directly specifying their intended output without examples — for better results.
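To make the difference concrete, here is a minimal sketch contrasting the two prompting styles. The task and wording below are invented for illustration; only the general guidance (prefer zero-shot prompts with R1) comes from DeepSeek.

```python
# Illustrative only: the same summarization task phrased two ways.

# Few-shot: worked examples precede the actual task. DeepSeek reports
# that this style tends to degrade R1's output.
few_shot_prompt = """Summarize each sentence in three words.

Sentence: The quick brown fox jumps over the lazy dog.
Summary: Fox jumps dog.

Sentence: A steady drizzle fell over the quiet harbor town.
Summary: Drizzle soaks town.

Sentence: {user_sentence}
Summary:"""

# Zero-shot: state the task and desired output directly, no examples.
# This is the style DeepSeek recommends for R1.
zero_shot_prompt = """Summarize the following sentence in three words.

Sentence: {user_sentence}
Summary:"""

print(zero_shot_prompt.format(user_sentence="The model finished training ahead of schedule."))
```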


 

How Does DeepSeek-R1 Work?

Like other AI models, DeepSeek-R1 was trained on a massive corpus of data, relying on algorithms to identify patterns and perform all kinds of natural language processing tasks. However, its inner workings set it apart — specifically its mixture of experts architecture and its use of reinforcement learning and fine-tuning — which enable the model to operate more efficiently while still producing accurate, readable outputs.

Mixture of Experts Architecture

DeepSeek-R1 accomplishes its computational efficiency by employing a mixture of experts (MoE) architecture built upon the DeepSeek-V3 base model, which laid the groundwork for R1’s multi-domain language understanding. 

Essentially, MoE models use multiple smaller models (called “experts”) that are only active when they are needed, optimizing performance and reducing computational costs. Because only a fraction of the parameters are active for any given input, MoE models tend to be cheaper to train and run than dense models of comparable size, yet they can perform just as well, if not better, making them an attractive option in AI development.

R1 specifically has 671 billion parameters across multiple expert networks, but only 37 billion of those parameters are required in a single “forward pass,” which is when an input is passed through the model to generate an output. 
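The core routing idea can be sketched in a few lines of PyTorch. This toy layer is purely illustrative: the dimensions, expert count and gating below are made-up placeholders, not DeepSeek’s actual architecture, which uses hundreds of experts plus refinements like shared experts and load balancing.

```python
# Toy top-k mixture-of-experts layer: each token is routed to only
# top_k of num_experts expert networks, so most parameters stay idle.
import torch
import torch.nn as nn

class ToyMoELayer(nn.Module):
    def __init__(self, dim=64, num_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)  # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.top_k = top_k

    def forward(self, x):  # x: (num_tokens, dim)
        scores = self.router(x).softmax(dim=-1)
        weights, idx = scores.topk(self.top_k, dim=-1)  # pick k experts per token
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e  # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out  # only top_k of num_experts experts ran for each token

layer = ToyMoELayer()
print(layer(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```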

Reinforcement Learning and Supervised Fine-Tuning

A distinctive aspect of DeepSeek-R1’s training process is its use of reinforcement learning, a technique that helps enhance its reasoning capabilities. The model also undergoes supervised fine-tuning, where it is taught to perform well on a specific task by training it on a labeled dataset. This encourages the model to eventually learn how to verify its answers, correct any errors it makes and follow “chain-of-thought” (CoT) reasoning, where it systematically breaks down complex problems into smaller, more manageable steps.

DeepSeek breaks down this entire training process in a 22-page paper, laying out training methods that are typically closely guarded by the tech companies it’s competing with.

It all begins with a “cold start” phase, where the underlying V3 model is fine-tuned on a small set of carefully crafted CoT reasoning examples to improve clarity and readability. From there, the model goes through several iterative reinforcement learning and refinement phases, where accurate and properly formatted responses are incentivized with a reward system. In addition to reasoning and logic-focused data, the model is trained on data from other domains to enhance its capabilities in writing, role-playing and more general-purpose tasks. During the final reinforcement learning phase, the model’s “helpfulness and harmlessness” is assessed in an effort to remove any inaccuracies, biases and harmful content.
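The reward signal driving those reinforcement learning phases can be illustrated with simple rule-based checks. The sketch below is a simplified stand-in, not DeepSeek’s code: the paper describes rewarding both accuracy and a think-then-answer format, and the <think> tag and \boxed{} conventions shown here follow the format it describes.

```python
# Simplified rule-based rewards in the spirit of R1's training recipe:
# responses score points for correctness and for following the format.
import re

def format_reward(response: str) -> float:
    """Reward responses that put their reasoning inside <think> tags
    before giving a final answer."""
    return 1.0 if re.search(r"<think>.*</think>", response, re.DOTALL) else 0.0

def accuracy_reward(response: str, reference_answer: str) -> float:
    """For well-defined problems (math, code), correctness can be checked
    mechanically; here we just compare a boxed final answer string."""
    match = re.search(r"\\boxed\{(.+?)\}", response)
    return 1.0 if match and match.group(1).strip() == reference_answer else 0.0

def total_reward(response: str, reference_answer: str) -> float:
    return accuracy_reward(response, reference_answer) + format_reward(response)

sample = "<think>2 + 2 = 4.</think> The answer is \\boxed{4}."
print(total_reward(sample, "4"))  # 2.0
```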

 

How Is DeepSeek-R1 Different From Other Models?

DeepSeek has compared its R1 model to some of the most advanced language models in the industry — namely OpenAI’s GPT-4o and o1 models, Meta’s Llama 3.1, Anthropic’s Claude 3.5 Sonnet and Alibaba’s Qwen2.5. Here’s how R1 stacks up:

Capabilities

DeepSeek-R1 comes close to matching all of the capabilities of these other models across various industry benchmarks. It performed especially well in coding and math, beating out its rivals on almost every test. Unsurprisingly, it also outperformed the American models on all of the Chinese exams, and even scored higher than Qwen2.5 on two of the three tests. R1’s biggest weakness seemed to be its English proficiency, yet it still performed better than others in areas like discrete reasoning and handling long contexts.

R1 is also designed to explain its reasoning, meaning it can articulate the thought process behind the answers it generates — a level of transparency and explainability that many other advanced AI models do not offer.

Cost

DeepSeek-R1’s biggest advantage over the other AI models in its class is that it appears to be substantially cheaper to develop and run. This is largely because R1 was reportedly trained on just a couple thousand H800 chips — a cheaper, less powerful version of Nvidia’s $40,000 H100 GPU, which many top AI developers are investing billions of dollars in and stockpiling. And because R1 activates only a fraction of its parameters at a time, it requires less computational power to run, yet it is trained in a way that allows it to match or even exceed the performance of much larger models.

Accessibility

DeepSeek-R1, Llama 3.1 and Qwen2.5 are all open source to some degree and free to access, while GPT-4o and Claude 3.5 Sonnet are not. Users have more flexibility with the open source models, as they can modify, integrate and build upon them without having to deal with the same licensing or subscription barriers that come with closed models.

Nationality

Besides Qwen2.5, which was also developed by a Chinese company, all of the models that are comparable to R1 were made in the United States. And as a product of China, DeepSeek-R1 is subject to benchmarking by the government’s internet regulator to ensure its responses embody so-called “core socialist values.” Users have noticed that the model won’t respond to questions about the Tiananmen Square massacre, for example, or the Uyghur detention camps. And, like the Chinese government, it does not acknowledge Taiwan as a sovereign nation.

Models developed by American companies will avoid answering certain questions too, but for the most part this is in the interest of safety and fairness rather than outright censorship. They often won’t purposefully generate content that is racist or sexist, for example, and they will refrain from offering advice relating to dangerous or illegal activities. While the U.S. government has attempted to regulate the AI industry as a whole, it has little to no oversight over what specific AI models actually generate.

Privacy Risks

All AI models pose a privacy risk, with the potential to leak or misuse users’ personal information, but DeepSeek-R1 poses an even greater threat. A Chinese company taking the lead on AI could put millions of Americans’ data in the hands of adversarial groups or even the Chinese government — something that is already a concern for private companies and the federal government alike.

The United States has worked for years to restrict China’s supply of high-powered AI chips, citing national security concerns, but R1’s results show these efforts may have been in vain. What’s more, the DeepSeek chatbot’s overnight popularity indicates Americans aren’t too worried about the risks.


 

How Is DeepSeek-R1 Affecting the AI Industry?

DeepSeek’s announcement of an AI model rivaling the likes of OpenAI and Meta, developed using a relatively small number of outdated chips, has been met with skepticism and panic, in addition to awe. Many are speculating that DeepSeek actually used a stash of illicit Nvidia H100 GPUs, which cannot legally be sold to China under U.S. export controls, instead of the H800s. And OpenAI seems convinced that the company used its model to train R1, in violation of OpenAI’s terms and conditions. Other, more outlandish, claims include that DeepSeek is part of an elaborate plot by the Chinese government to destroy the American tech industry.

Nevertheless, if R1 has managed to do what DeepSeek says it has, then it will have a massive impact on the broader artificial intelligence industry — especially in the United States, where AI investment is highest. AI has long been considered among the most power-hungry and cost-intensive technologies — so much so that major players are buying up nuclear power companies and partnering with governments to secure the electricity needed for their models. The prospect of a similar model being developed for a fraction of the price (and on less capable chips) is reshaping the industry’s understanding of how much money is actually needed.

Going forward, AI’s biggest proponents believe artificial intelligence (and eventually AGI and superintelligence) will change the world, paving the way for profound advancements in healthcare, education, scientific discovery and much more. If these advancements can be achieved at a lower cost, it opens up entire new possibilities — and threats.

Frequently Asked Questions

How many parameters does DeepSeek-R1 have?

DeepSeek-R1 has 671 billion parameters in total. But DeepSeek also released six “distilled” versions of R1, ranging in size from 1.5 billion to 70 billion parameters. While the smallest can run on a laptop with a consumer GPU, the full R1 requires far more substantial hardware.
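As a rough sketch, the smallest distilled checkpoint can be loaded locally with the Hugging Face transformers library. The repository id below follows DeepSeek’s published naming, but verify it on huggingface.co before use; memory requirements grow quickly with model size.

```python
# Load a distilled R1 variant locally (assumed repo id; verify on the Hub).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # smallest distilled variant
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

inputs = tokenizer("What is 17 * 24? Think step by step.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```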

Is DeepSeek open source?

Yes, DeepSeek is open source in that its model weights and training methods are freely available for the public to examine, use and build upon. However, its source code and any specifics about its underlying training data are not available to the public.

How can I access DeepSeek-R1?

DeepSeek’s chatbot (which is powered by R1) is free to use on the company’s website and is available for download on the Apple App Store. R1 itself can be downloaded from Hugging Face or accessed through DeepSeek’s API.
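For API access, here is a minimal sketch assuming DeepSeek’s OpenAI-compatible endpoint and its documented deepseek-reasoner model name; check the current API docs before relying on either.

```python
# Call R1 through DeepSeek's API (OpenAI-compatible; endpoint and model
# name assumed from DeepSeek's public docs -- verify before use).
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",  # placeholder, not a real key
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-reasoner",  # the R1 model
    messages=[{"role": "user", "content": "Explain mixture of experts routing in two sentences."}],
)
print(response.choices[0].message.content)
```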

What can DeepSeek be used for?

DeepSeek can be used for a variety of text-based tasks, including creative writing, general question answering, editing and summarization. It is especially good at tasks related to coding, mathematics and science.

Is DeepSeek safe to use?

DeepSeek should be used with caution, as the company’s privacy policy says it may collect users’ “uploaded files, feedback, chat history and any other content they provide to its model and services.” This can include personal information like names, dates of birth and contact details. Once this information is out there, users have no control over who gets hold of it or how it is used.

Is DeepSeek better than ChatGPT?

DeepSeek’s underlying model, R1, outperformed GPT-4o (which powers ChatGPT’s free version) across several industry benchmarks, particularly in coding, math and Chinese. It is also quite a bit cheaper to run. That said, DeepSeek’s unique issues around privacy and censorship may make it a less appealing option than ChatGPT.
