Baidu’s New ERNIE X1 and 4.5 Models Are Escalating the US-China AI Arms Race

Baidu says ERNIE 4.5 has advanced native multimodal capabilities, while ERNIE X1 specializes in “deep-thinking reasoning,” directly challenging DeepSeek’s R1 and the wave of other reasoning models that have followed.

Written by Ellen Glover
Published on Mar. 19, 2025
Baidu's homepage on a desktop computer, with Baidu's ERNIE Bot pulled up on a smartphone in the foreground
Image: Ascannio / Shutterstock

Chinese artificial intelligence company Baidu has released two new versions of its ERNIE foundation model: a “deep-thinking reasoning” model called ERNIE X1 and a multimodal model called ERNIE 4.5. The company claims they perform on par with, or even better than, some of the world’s top AI models. Both are now available for free through the ERNIE Bot chatbot and will be gradually incorporated into the company’s broader product ecosystem, including Baidu Search, China’s most popular search engine.

This launch comes at a particularly competitive moment in generative AI development, especially between China and the United States. In early 2025, Chinese AI startup DeepSeek sent shockwaves through the industry with the release of R1, an open source reasoning model that reportedly outperformed some of the most advanced AI models while operating at a fraction of the cost. With that, DeepSeek quickly overtook its competitors in both America and China. That includes Baidu, which had been one of the first Chinese companies to launch a ChatGPT rival, ERNIE Bot.

ERNIE X1 vs. ERNIE 4.5

ERNIE X1 and ERNIE 4.5 are both foundation models developed by Baidu, offering different capabilities for slightly different use cases.

ERNIE X1 is an advanced, highly efficient reasoning model competing with the likes of DeepSeek R1 and OpenAI’s o3-mini.

ERNIE 4.5 is a large multimodal AI model competing with models like GPT-4o and Gemini.

Since R1’s debut, tech giants like Google, OpenAI, Anthropic and xAI have released reasoning models of their own, shifting their priorities to efficiency and affordability rather than just scale. Now, with the introduction of these new models — particularly ERNIE X1 — Baidu is positioning itself as a strong competitor in the global AI arms race, offering similar performance to R1 and other models at an even lower price.

“2025 is set to be an important year for the development and iteration of large language models and technologies,” Baidu said in a press release. “With the launch of ERNIE 4.5 and ERNIE X1, Baidu will continue to invest in artificial intelligence, data centers and cloud infrastructure to advance our AI capabilities, and develop smarter and more powerful next-generation models.”

What Is ERNIE X1?

ERNIE X1 is a language model with “deep-thinking reasoning” capabilities, according to its maker, Baidu. Unlike traditional language models that generate quick, pattern-based answers, reasoning models are designed to break down complex problems into logical steps, evaluate different possibilities and refine their responses before delivering a final output. This makes them particularly good at tasks that require multi-step planning, logical reasoning and problem solving.

Baidu claims that ERNIE X1’s reasoning abilities are built on several advanced techniques, including “progressive reinforcement learning,” “end-to-end training,” “chains of thought and action,” and a “unified multi-faceted reward system.” The company has not disclosed any more technical details beyond that, but these methods suggest an emphasis on iterative learning, contextual understanding and structured reasoning — strengths seen in other reasoning models as well.

In practice, Baidu says ERNIE X1 possesses “enhanced capabilities in understanding, planning, reflection and evolution,” and that it “excels” in literary creation, manuscript writing, dialogue, logical reasoning, complex calculations and “Chinese knowledge” (the company did not explain what this means). As such, the model can be used to power a wide range of applications.

How Does ERNIE X1 Compare to Other Models?

Baidu has not shared any specific benchmarks or evaluations for ERNIE X1, but it claims the model performs “on par with” DeepSeek R1 at “only half the price.” So far, the company has not provided comparisons to any other reasoning models on the market.

What Is ERNIE 4.5?

ERNIE 4.5 is what Baidu calls a “native multimodal model,” meaning it can both integrate and understand text, image, audio and video content within a single framework. Many AI systems have to process different types of media separately, but ERNIE 4.5 is designed to combine them and convert them across categories, such as text to audio and vice versa.

“It achieves collaborative optimization through joint modeling of multiple modalities, demonstrating exceptional multimodal comprehension capabilities,” Baidu explained in its press release. With “refined language skills,” ERNIE 4.5 has not only enhanced understanding and generative capabilities, but also improved logical reasoning, memory and coding capabilities. The company also touts the model’s “strong intelligence” and “contextual awareness” in recognizing internet memes, satirical cartoons and other nuanced content.

Additionally, Baidu says ERNIE 4.5 is less prone to hallucinations — a common issue in AI where a model generates false or misleading information that, at first glance, often appears correct.

Baidu credits these capabilities to several key technologies, including “spatiotemporal representation compression,” “knowledge-centric training data construction,” “self-feedback enhanced post-training” and “heterogeneous multimodal mixture-of-experts.” Essentially, mixture of experts (MoE) models use smaller, specialized “experts” that activate only when needed, optimizing performance and cutting computational costs. Because only a fraction of a model’s parameters is active for any given input, MoE models are generally cheaper to run than dense models of comparable size, yet they can perform just as well, if not better, making them an attractive option in AI development.
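To make the "activate only when needed" idea concrete, here is a minimal Python sketch of generic top-k expert routing. It is not Baidu's "heterogeneous multimodal mixture-of-experts," whose design has not been disclosed; the toy experts, the linear gate and every name below are assumptions chosen for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    """Hypothetical stand-in for an expert network: it just scales its input."""
    return lambda x: [scale * v for v in x]


def moe_forward(x, gate_weights, experts, top_k=2):
    """Route x to the top_k highest-scoring experts and mix their outputs.

    gate_weights holds one weight vector per expert (a simple linear gate).
    Returns (output, indices_of_chosen_experts).
    """
    # Gate score per expert: dot product of x with that expert's gate vector.
    scores = [sum(w * v for w, v in zip(ws, x)) for ws in gate_weights]
    # Keep only the top_k experts; all others are skipped entirely.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Renormalize the gate scores over the chosen experts.
    weights = softmax([scores[i] for i in chosen])
    output = [0.0] * len(x)
    for weight, i in zip(weights, chosen):
        y = experts[i](x)  # only the selected experts do any work
        output = [o + weight * v for o, v in zip(output, y)]
    return output, chosen


if __name__ == "__main__":
    experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
    gate = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
    out, chosen = moe_forward([2.0, 1.0], gate, experts, top_k=2)
    print("experts used:", chosen)  # experts used: [0, 1]
    print("output:", out)
```

In this toy run, only two of the four experts are evaluated for the input, which is where the compute savings come from: the model can hold many experts while paying for only a few per input.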

Looking ahead, CNBC reported that Baidu plans to release ERNIE 5 later in 2025, promising “big enhancements” in its multimodal capabilities.

How Does ERNIE 4.5 Compare to Other Models?

Baidu compared ERNIE 4.5’s multimodal capabilities against OpenAI’s GPT-4o, claiming that it came out ahead in almost every benchmark except MMMU, which evaluates models on “massive multi-discipline tasks” that demand “college-level subject knowledge” and “deliberate reasoning.”

Baidu also says ERNIE 4.5 beat OpenAI’s GPT-4o and GPT-4.5 models, as well as DeepSeek’s V3 model, across several other benchmarks, including:

  • C-Eval: assesses advanced knowledge and reasoning abilities across disciplines ranging from the humanities to science and engineering.
  • CMMLU: evaluates knowledge and reasoning abilities within the context of Chinese language and culture.
  • GSM8K: evaluates multi-step reasoning using grade school math problems.
  • DROP: measures an LLM’s reading comprehension.
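For a sense of how a score on a math benchmark like GSM8K is typically produced, the Python sketch below computes exact-match accuracy by comparing the last number in each model response with the reference answer. Baidu has not said how it ran its evaluations, so the extraction rule, the function names and the sample data here are illustrative assumptions; the "####" prefix mirrors how GSM8K marks the final answer in its reference solutions.

```python
import re

# Matches integers and decimals, optionally negative, with thousands separators.
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def final_number(text):
    """Return the last number found in text as a float, or None if there is none."""
    matches = NUMBER.findall(text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def exact_match_accuracy(predictions, references):
    """Fraction of predictions whose final number equals the reference's final number."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    correct = 0
    for pred, ref in zip(predictions, references):
        answer = final_number(pred)
        if answer is not None and answer == final_number(ref):
            correct += 1
    return correct / len(references)


if __name__ == "__main__":
    # Made-up responses and references, for illustration only.
    predictions = [
        "She has 3 + 4 = 7 apples.",
        "The total is 1,200 dollars",
        "I am not sure.",
    ]
    references = ["#### 7", "#### 1200", "#### 5"]
    print(f"accuracy: {exact_match_accuracy(predictions, references):.2f}")
```

Scoring rules like this are one reason published numbers can differ between labs: a stricter or looser answer-extraction rule changes which responses count as correct.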

It’s worth noting, however, that a number of the benchmarks where ERNIE 4.5 outperformed others were China-specific, which may explain why GPT-4o and GPT-4.5 — models developed by an American company — didn’t do as well. That being said, ERNIE 4.5 still performed better than DeepSeek-V3, which was made by a Chinese company, on many of those benchmarks.

Meanwhile, ERNIE 4.5 apparently did not perform as well on benchmarks like:

  • MMLU-Pro: evaluates language understanding across broader and more challenging tasks (beaten by GPT-4.5).
  • GPQA: comprises a dataset of 448 multiple-choice questions written by experts in subjects like biology, physics and chemistry (beaten by GPT-4.5).
  • MATH-500: tests the ability to solve 500 challenging, high-school-level math problems (beaten by DeepSeek-V3 and GPT-4.5).
  • LiveCodeBench: measures coding capabilities (beaten by GPT-4.5).

Although GPT-4.5 outperformed ERNIE 4.5 on several benchmarks, Baidu says its model is priced at just 1 percent of OpenAI’s.

How to Access ERNIE X1 and ERNIE 4.5

ERNIE 4.5 is now accessible via its API and on Baidu AI Cloud’s MaaS platform Qianfan, with input prices starting at RMB 0.004 per thousand tokens and output prices starting at RMB 0.016 per thousand tokens. ERNIE X1 will be available on the platform “soon,” according to Baidu, with input prices starting at RMB 0.002 per thousand tokens and output prices starting at RMB 0.008 per thousand tokens.

Users can also interact with the models via Baidu’s chatbot, ERNIE Bot.
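As a rough illustration of what the per-token prices quoted above mean in practice, the following Python sketch estimates the cost of a workload for each model. It uses only the starting prices Baidu announced; actual billing on Qianfan may involve tiers or discounts not covered here, and the helper names and sample token counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Prices in RMB per 1,000 tokens."""
    input_per_1k: float
    output_per_1k: float


# Starting prices as quoted in Baidu's announcement; real billing may differ.
PRICES = {
    "ERNIE 4.5": Pricing(input_per_1k=0.004, output_per_1k=0.016),
    "ERNIE X1": Pricing(input_per_1k=0.002, output_per_1k=0.008),
}


def estimate_cost_rmb(model, input_tokens, output_tokens):
    """Estimate the cost in RMB of a request or batch of requests."""
    price = PRICES[model]
    return (
        (input_tokens / 1000) * price.input_per_1k
        + (output_tokens / 1000) * price.output_per_1k
    )


if __name__ == "__main__":
    # Hypothetical workload: 1 million input tokens, 250,000 output tokens.
    for name in PRICES:
        cost = estimate_cost_rmb(name, 1_000_000, 250_000)
        print(f"{name}: RMB {cost:.2f}")
```

For this sample workload, the estimate comes to about RMB 8 for ERNIE 4.5 and RMB 4 for ERNIE X1, reflecting that X1's quoted starting prices are half those of ERNIE 4.5.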

Frequently Asked Questions

ERNIE X1 is a language model with advanced reasoning capabilities. It can perform a variety of text-based tasks, including literary creation, document summarization and code interpretation, as well as image understanding and generation.

ERNIE 4.5 is an AI model developed by Baidu that has native multimodal capabilities, meaning it can both integrate and understand text, image, audio and video content within a single framework.

According to Baidu, ERNIE X1 can perform a variety of tasks, including:

  • Literary creation
  • Manuscript writing
  • Logical reasoning
  • Complex calculations
  • Document summarization
  • Code interpretation
  • Academic research
  • Image understanding and generation

Because ERNIE 4.5 is a “native multimodal” model, Baidu says it can perform a wide range of tasks involving text, image, audio and video content. The company also touts ERNIE 4.5’s “strong intelligence” and “contextual awareness” in recognizing things like internet memes and satirical cartoons.

Baidu says ERNIE X1 will be available on its platform “soon.” For now, users can interact with it for free via the company’s ERNIE Bot chatbot.

ERNIE 4.5 is now available via Baidu’s API and on Baidu AI Cloud’s MaaS platform Qianfan. It is also available via the company’s ERNIE Bot chatbot.

Baidu did not respond to requests for comment for this story.
