What Is DeepSeek-R1?

This high-profile AI model from the Chinese startup DeepSeek achieves comparable results to its American counterparts — at a fraction of the operating cost.

Written by Ellen Glover
Published on Feb. 18, 2025

DeepSeek-R1 is an AI model developed by Chinese artificial intelligence startup DeepSeek. Released in January 2025, R1 holds its own against (and in some cases surpasses) the reasoning capabilities of some of the world’s most advanced foundation models — but at a fraction of the operating cost, according to the company. R1 is also open sourced under an MIT license, allowing free commercial and academic use.

What is DeepSeek-R1?

DeepSeek-R1, or R1, is an open source language model made by Chinese AI startup DeepSeek that can perform the same text-based tasks as other advanced models, but at a lower cost. It also powers the company’s namesake chatbot, a direct competitor to ChatGPT.

DeepSeek-R1 is one of several highly advanced AI models to come out of China, joining those developed by labs like Alibaba and Moonshot AI. R1 also powers DeepSeek’s eponymous chatbot, which soared to the number one spot on the Apple App Store after its release, dethroning ChatGPT.

DeepSeek’s leap into the international spotlight has led some to question Silicon Valley tech companies’ decision to sink tens of billions of dollars into building their AI infrastructure, and the news caused stocks of AI chip manufacturers like Nvidia and Broadcom to nosedive. Still, some of the company’s biggest U.S. rivals have called its latest model “impressive” and “an excellent AI advancement,” and are reportedly scrambling to figure out how it was accomplished. Even President Donald Trump — who has made it his mission to come out ahead against China in AI — called DeepSeek’s success a “positive development,” describing it as a “wake-up call” for American industries to sharpen their competitive edge.

Indeed, the launch of DeepSeek-R1 appears to be taking the generative AI industry into a new era of brinkmanship, where the wealthiest companies with the largest models may no longer win by default. 

 

What Is DeepSeek-R1?

DeepSeek-R1 is an open source language model developed by DeepSeek, a Chinese startup founded in 2023 by Liang Wenfeng, who also co-founded quantitative hedge fund High-Flyer. The company reportedly grew out of High-Flyer’s AI research unit to focus on developing large language models that achieve artificial general intelligence (AGI) — a benchmark where AI is able to match human intellect, which OpenAI and other top AI companies are also working towards. But unlike many of those companies, all of DeepSeek’s models are open source, meaning their weights and training methods are freely available for the public to examine, use and build upon.

R1 is the latest of several AI models DeepSeek has made public. Its first product was the coding tool DeepSeek Coder, followed by the V2 model series, which gained attention for its strong performance and low cost, triggering a price war in the Chinese AI model market. Its V3 model — the foundation on which R1 is built — captured some interest as well, but its restrictions around sensitive topics related to the Chinese government drew questions about its viability as a true industry competitor. Then the company unveiled its new model, R1, claiming it matches the performance of the world’s top AI models while relying on comparatively modest hardware. 

All told, analysts at Jefferies have reportedly estimated that DeepSeek spent $5.6 million to train R1 — a drop in the bucket compared to the hundreds of millions, or even billions, of dollars many U.S. companies pour into their AI models. However, that figure has since come under scrutiny from other analysts claiming that it only accounts for training the chatbot, not additional expenses like early-stage research and experiments.


 

What Can DeepSeek-R1 Do?

According to DeepSeek, R1 excels at a wide range of text-based tasks in both English and Chinese, including:

  • Creative writing
  • General question answering
  • Editing 
  • Summarization

More specifically, the company says the model does particularly well at “reasoning-intensive” tasks that involve “well-defined problems with clear solutions.” Namely:

  • Generating and debugging code
  • Performing mathematical computations
  • Explaining complex scientific concepts

Plus, because it is an open source model, R1 enables users to freely access, modify and build upon its capabilities, as well as integrate them into proprietary systems.

 

DeepSeek-R1 Use Cases

DeepSeek-R1 has not experienced widespread industry adoption yet, but judging from its capabilities it could be used in a variety of ways, including:

  • Software Development: R1 could assist developers by generating code snippets, debugging existing code and providing explanations for complex coding concepts.
  • Mathematics: R1’s ability to solve and explain complex math problems could be used to provide research and education support in mathematical fields.
  • Content Creation, Editing and Summarization: R1 is good at generating high-quality written content, as well as editing and summarizing existing content, which could be useful in industries ranging from marketing to law.
  • Customer Service: R1 could be used to power a customer service chatbot, where it can engage in conversation with users and answer their questions in lieu of a human agent. 
  • Data Analysis: R1 can analyze large datasets, extract meaningful insights and generate comprehensive reports based on what it finds, which could be used to help businesses make more informed decisions.
  • Education: R1 could be used as a sort of digital tutor, breaking down complex subjects into clear explanations, answering questions and offering personalized lessons across various subjects.

 

DeepSeek-R1 Limitations

DeepSeek-R1 is subject to the same limitations as any other language model: it can make mistakes, generate biased results and be difficult to fully understand — even if it is technically open source.

DeepSeek also says the model has a tendency to “mix languages,” especially when prompts are in languages other than Chinese and English. For example, R1 might use English in its reasoning and response, even if the prompt is in a completely different language. And the model struggles with few-shot prompting, which involves providing a few examples to guide its response. Instead, users are advised to use simpler zero-shot prompts — directly specifying their intended output without examples — for better results.
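To make the difference concrete, here is a minimal sketch contrasting the two prompting styles. The task and wording below are invented for illustration; only the general guidance (prefer zero-shot prompts with R1) comes from DeepSeek.

```python
# Illustrative only: the same summarization task phrased two ways.

# Few-shot: worked examples precede the actual task. DeepSeek reports
# that this style tends to degrade R1's output.
few_shot_prompt = """Summarize each sentence in three words.

Sentence: The quick brown fox jumps over the lazy dog.
Summary: Fox jumps dog.

Sentence: A steady drizzle fell over the quiet harbor town.
Summary: Drizzle soaks town.

Sentence: {user_sentence}
Summary:"""

# Zero-shot: state the task and desired output directly, no examples.
# This is the style DeepSeek recommends for R1.
zero_shot_prompt = """Summarize the following sentence in three words.

Sentence: {user_sentence}
Summary:"""

print(zero_shot_prompt.format(user_sentence="The model finished training ahead of schedule."))
```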


 

How Does DeepSeek-R1 Work?

Like other AI models, DeepSeek-R1 was trained on a massive corpus of data, relying on algorithms to identify patterns and perform all kinds of natural language processing tasks. However, its inner workings set it apart — specifically its mixture of experts architecture and its use of reinforcement learning and fine-tuning — which enable the model to operate more efficiently while still producing accurate, readable outputs.

Mixture of Experts Architecture

DeepSeek-R1 accomplishes its computational efficiency by employing a mixture of experts (MoE) architecture built upon the DeepSeek-V3 base model, which laid the groundwork for R1’s multi-domain language understanding. 

Essentially, MoE models use multiple smaller models (called “experts”) that are only active when they are needed, optimizing performance and reducing computational costs. Because only a fraction of the parameters are active for any given input, MoE models tend to be cheaper to train and run than dense models of comparable size, yet they can perform just as well, if not better, making them an attractive option in AI development.

R1 specifically has 671 billion parameters across multiple expert networks, but only 37 billion of those parameters are required in a single “forward pass,” which is when an input is passed through the model to generate an output. 
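The core routing idea can be sketched in a few lines of PyTorch. This toy layer is purely illustrative: the dimensions, expert count and gating below are made-up placeholders, not DeepSeek’s actual architecture, which uses hundreds of experts plus refinements like shared experts and load balancing.

```python
# Toy top-k mixture-of-experts layer: each token is routed to only
# top_k of num_experts expert networks, so most parameters stay idle.
import torch
import torch.nn as nn

class ToyMoELayer(nn.Module):
    def __init__(self, dim=64, num_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)  # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.top_k = top_k

    def forward(self, x):  # x: (num_tokens, dim)
        scores = self.router(x).softmax(dim=-1)
        weights, idx = scores.topk(self.top_k, dim=-1)  # pick k experts per token
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e  # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out  # only top_k of num_experts experts ran for each token

layer = ToyMoELayer()
print(layer(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```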

Reinforcement Learning and Supervised Fine-Tuning

A distinctive aspect of DeepSeek-R1’s training process is its use of reinforcement learning, a technique that helps enhance its reasoning capabilities. The model also undergoes supervised fine-tuning, where it is taught to perform well on a specific task by training it on a labeled dataset. This encourages the model to eventually learn how to verify its answers, correct any errors it makes and follow “chain-of-thought” (CoT) reasoning, where it systematically breaks down complex problems into smaller, more manageable steps.

DeepSeek breaks down this entire training process in a 22-page paper, laying out training methods that are typically closely guarded by the tech companies it’s competing with.

It all begins with a “cold start” phase, where the underlying V3 model is fine-tuned on a small set of carefully crafted CoT reasoning examples to improve clarity and readability. From there, the model goes through several iterative reinforcement learning and refinement phases, where accurate and properly formatted responses are incentivized with a reward system. In addition to reasoning and logic-focused data, the model is trained on data from other domains to enhance its capabilities in writing, role-playing and more general-purpose tasks. During the final reinforcement learning phase, the model’s “helpfulness and harmlessness” is assessed in an effort to remove any inaccuracies, biases and harmful content.
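The reward signal driving those reinforcement learning phases can be illustrated with simple rule-based checks. The sketch below is a simplified stand-in, not DeepSeek’s code: the paper describes rewarding both accuracy and a think-then-answer format, and the <think> tag and \boxed{} conventions shown here follow the format it describes.

```python
# Simplified rule-based rewards in the spirit of R1's training recipe:
# responses score points for correctness and for following the format.
import re

def format_reward(response: str) -> float:
    """Reward responses that put their reasoning inside <think> tags
    before giving a final answer."""
    return 1.0 if re.search(r"<think>.*</think>", response, re.DOTALL) else 0.0

def accuracy_reward(response: str, reference_answer: str) -> float:
    """For well-defined problems (math, code), correctness can be checked
    mechanically; here we just compare a boxed final answer string."""
    match = re.search(r"\\boxed\{(.+?)\}", response)
    return 1.0 if match and match.group(1).strip() == reference_answer else 0.0

def total_reward(response: str, reference_answer: str) -> float:
    return accuracy_reward(response, reference_answer) + format_reward(response)

sample = "<think>2 + 2 = 4.</think> The answer is \\boxed{4}."
print(total_reward(sample, "4"))  # 2.0
```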

 

How Is DeepSeek-R1 Different From Other Models?

DeepSeek has compared its R1 model to some of the most advanced language models in the industry — namely OpenAI’s GPT-4o and o1 models, Meta’s Llama 3.1, Anthropic’s Claude 3.5 Sonnet and Alibaba’s Qwen2.5. Here’s how R1 stacks up:

Capabilities

DeepSeek-R1 comes close to matching all of the capabilities of these other models across various industry benchmarks. It performed especially well in coding and math, beating out its rivals on almost every test. Unsurprisingly, it also outperformed the American models on all of the Chinese exams, and even scored higher than Qwen2.5 on two of the three tests. R1’s biggest weakness seemed to be its English proficiency, yet it still performed better than others in areas like discrete reasoning and handling long contexts.

R1 is also designed to explain its reasoning, meaning it can articulate the thought process behind the answers it generates — a level of transparency and explainability that many other advanced AI models do not offer.

Cost

DeepSeek-R1’s biggest advantage over the other AI models in its class is that it appears to be substantially cheaper to develop and run. This is largely because R1 was reportedly trained on just a couple thousand H800 chips — a cheaper, less powerful version of Nvidia’s $40,000 H100 GPU, which many top AI developers are investing billions of dollars in and stockpiling. And because R1 activates only a fraction of its parameters at a time, it requires less computational power to run, yet it is trained in a way that allows it to match or even exceed the performance of much larger models.

Accessibility

DeepSeek-R1, Llama 3.1 and Qwen2.5 are all open source to some degree and free to access, while GPT-4o and Claude 3.5 Sonnet are not. Users have more flexibility with the open source models, as they can modify, integrate and build upon them without having to deal with the same licensing or subscription barriers that come with closed models.

Nationality

Besides Qwen2.5, which was also developed by a Chinese company, all of the models that are comparable to R1 were made in the United States. And as a product of China, DeepSeek-R1 is subject to benchmarking by the government’s internet regulator to ensure its responses embody so-called “core socialist values.” Users have noticed that the model won’t respond to questions about the Tiananmen Square massacre, for example, or the Uyghur detention camps. And, like the Chinese government, it does not acknowledge Taiwan as a sovereign nation.

Models developed by American companies will avoid answering certain questions too, but for the most part this is in the interest of safety and fairness rather than outright censorship. They often won’t purposefully generate content that is racist or sexist, for example, and they will refrain from offering advice relating to dangerous or illegal activities. While the U.S. government has attempted to regulate the AI industry as a whole, it has little to no oversight over what specific AI models actually generate.

Privacy Risks

All AI models pose a privacy risk, with the potential to leak or misuse users’ personal information, but DeepSeek-R1 poses an even greater threat. A Chinese company taking the lead on AI could put millions of Americans’ data in the hands of adversarial groups or even the Chinese government — something that is already a concern for private companies and the federal government alike.

The United States has worked for years to restrict China’s supply of high-powered AI chips, citing national security concerns, but R1’s results show these efforts may have been in vain. What’s more, the DeepSeek chatbot’s overnight popularity indicates Americans aren’t too worried about the risks.


 

How Is DeepSeek-R1 Affecting the AI Industry?

DeepSeek’s announcement of an AI model rivaling the likes of OpenAI and Meta, developed using a relatively small number of outdated chips, has been met with skepticism and panic, in addition to awe. Many are speculating that DeepSeek actually used a stash of illicit Nvidia H100 GPUs, which cannot legally be sold to China under U.S. export controls, instead of the H800s. And OpenAI seems convinced that the company used its model to train R1, in violation of OpenAI’s terms and conditions. Other, more outlandish, claims include that DeepSeek is part of an elaborate plot by the Chinese government to destroy the American tech industry.

Nevertheless, if R1 has managed to do what DeepSeek says it has, then it will have a massive impact on the broader artificial intelligence industry — especially in the United States, where AI investment is highest. AI has long been considered among the most power-hungry and cost-intensive technologies — so much so that major players are buying up nuclear power companies and partnering with governments to secure the electricity needed for their models. The prospect of a similar model being developed for a fraction of the price (and on less capable chips) is reshaping the industry’s understanding of how much money is actually needed.

Going forward, AI’s biggest proponents believe artificial intelligence (and eventually AGI and superintelligence) will change the world, paving the way for profound advancements in healthcare, education, scientific discovery and much more. If these advancements can be achieved at a lower cost, it opens up entire new possibilities — and threats.

Frequently Asked Questions

How many parameters does DeepSeek-R1 have?

DeepSeek-R1 has 671 billion parameters in total. But DeepSeek also released six “distilled” versions of R1, ranging in size from 1.5 billion to 70 billion parameters. While the smallest can run on a laptop with a consumer GPU, the full R1 requires far more substantial hardware.
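As a rough sketch, the smallest distilled checkpoint can be loaded locally with the Hugging Face transformers library. The repository id below follows DeepSeek’s published naming, but verify it on huggingface.co before use; memory requirements grow quickly with model size.

```python
# Load a distilled R1 variant locally (assumed repo id; verify on the Hub).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # smallest distilled variant
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

inputs = tokenizer("What is 17 * 24? Think step by step.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```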

Is DeepSeek open source?

Yes, DeepSeek is open source in that its model weights and training methods are freely available for the public to examine, use and build upon. However, its source code and any specifics about its underlying training data are not available to the public.

How can I access DeepSeek-R1?

DeepSeek’s chatbot (which is powered by R1) is free to use on the company’s website and is available for download on the Apple App Store. R1 itself can be downloaded from Hugging Face or accessed through DeepSeek’s API.
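For API access, here is a minimal sketch assuming DeepSeek’s OpenAI-compatible endpoint and its documented deepseek-reasoner model name; check the current API docs before relying on either.

```python
# Call R1 through DeepSeek's API (OpenAI-compatible; endpoint and model
# name assumed from DeepSeek's public docs -- verify before use).
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",  # placeholder, not a real key
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-reasoner",  # the R1 model
    messages=[{"role": "user", "content": "Explain mixture of experts routing in two sentences."}],
)
print(response.choices[0].message.content)
```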

What can DeepSeek be used for?

DeepSeek can be used for a variety of text-based tasks, including creative writing, general question answering, editing and summarization. It is especially good at tasks related to coding, mathematics and science.

Is DeepSeek safe to use?

DeepSeek should be used with caution, as the company’s privacy policy says it may collect users’ “uploaded files, feedback, chat history and any other content they provide to its model and services.” This can include personal information like names, dates of birth and contact details. Once this information is out there, users have no control over who gets hold of it or how it is used.

Is DeepSeek better than ChatGPT?

DeepSeek’s underlying model, R1, outperformed GPT-4o (which powers ChatGPT’s free version) across several industry benchmarks, particularly in coding, math and Chinese. It is also quite a bit cheaper to run. That said, DeepSeek’s unique issues around privacy and censorship may make it a less appealing option than ChatGPT.
