OpenAI Declared Code Red. That’s Everybody’s Problem.

The AI giant’s recent moves may spell trouble for the systems built on ChatGPT. Our expert explains what’s going on.

Written by Yvette Schmitter
Published on Dec. 24, 2025
REVIEWED BY
Seth Wilson | Dec 22, 2025
Summary: OpenAI has declared a “code red,” prioritizing speed over safety to counter Google’s Gemini 3. Internal memos reveal GPT-5.2 was rushed despite known biases and risks in automated systems. Market position now outweighs safety.

OpenAI recently declared “code red,” and Sam Altman sent an internal memo telling employees to drop everything and fix ChatGPT. Not because the AI failed.

They did it because Google’s Gemini 3 topped ChatGPT on major benchmarks, including a test for doctoral-level reasoning, and because Gemini’s monthly active users jumped while ChatGPT’s dropped. Even Salesforce CEO Marc Benioff publicly ditched ChatGPT for Gemini 3.

OpenAI is under immense pressure and needs $200 billion in revenue by 2030 to turn a profit. When users complained GPT-5 felt “clinical” and performed worse at basic math, OpenAI declared “code orange” and relaxed safety restrictions.

Now there’s a code red, but not because safety guardrails degrade during long user conversations, because of mental health episodes linked to ChatGPT or because of hallucinations. It’s because OpenAI is losing market position, which matters more to the company than safety.

Why Did OpenAI Declare a Code Red?

OpenAI declared a code red in response to intense market pressure from Google’s Gemini 3, which recently surpassed ChatGPT on major benchmarks, including doctoral-level reasoning. To maintain its market position, OpenAI has shifted its focus to speed and reliability over safety, leading to the rapid release of GPT-5.2. This has raised concerns about:

  • Safety Trade-Offs: Relaxing guardrails after users complained GPT-5 felt “clinical.”

  • Systemic Bias: Continued failure to address racial and gender biases in hiring and lending.

  • Automation Risks: Rapid deployment in sensitive sectors like healthcare (Oscar Health) and customer service (Intercom) before accuracy can be verified.

More From Yvette Schmitter: OpenAI vs. Anthropic: A Fight to Define How You’ll Work

 

Speed Beats Safety Every Time

OpenAI released its 2025 Enterprise AI report the following week. The foreword states its mission: “to ensure that artificial intelligence benefits all humanity.” The report concludes that “organizational readiness,” or an organization’s internal capacity to implement AI tools, is now the primary constraint. A code red makes that obsolete. Speed is the constraint now. User safety isn’t in the equation.

OpenAI is prioritizing speed, reliability and personalization. And they seriously mean speed. Since calling a code red, OpenAI has already released the newest version of ChatGPT: GPT-5.2. But speed without accuracy is expensive. Life-altering mistakes happen quickly, and AI struggles with context and nuance and makes terrible judgment calls.

These skills are the core requirements of white-collar work. When companies automate tasks that require understanding situational context and exercising judgment while racing to ship features faster than they can verify those features work correctly, they are not scaling intelligence. They’re scaling failure modes. The code red emergency isn’t asking, “How do we build AI that can handle these complexities?” It’s asking, “How do we ship faster than Google?”

Companies are already building on the foundation of OpenAI’s technology. NLPearl promises to reshape the $500 billion call center industry with AI voice agents that replace human workers using “a single prompt.” Their documentation admits AI pushes human labor “toward the edge cases: judgment, empathy and accountability.” But that’s the whole job.

When a customer is upset about a billing error, angry about a service failure or confused about a policy, they need someone who can read the situation, exercise judgment about when to bend rules and take accountability for making it right. AI can route calls and recite policies. It can’t recognize when a loyal customer deserves an exception, when someone’s anger masks fear or when a unique situation requires creative problem-solving.

This is the common thread running through every AI deployment pitched as efficiency. Companies are optimizing away the humans who stand between them and individuals’ money. Call centers aren’t failing because representatives cost too much. They’re failing because corporations already cut training, scripts became rigid and workers got measured on call duration instead of problem resolution. Now AI promises to eliminate even that thin layer of human judgment.

OpenAI isn’t prioritizing eliminating hallucinations, fixing the selection biases baked into its models or preventing hiring algorithms from screening out qualified candidates based on names and ZIP codes. Companies are still learning to use current tools while OpenAI races to ship new ones. The median enterprise hasn’t figured out how to verify AI outputs, audit for bias or prevent discrimination. They haven’t built the infrastructure to catch these issues.
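What would auditing for bias even look like in practice? Here is a minimal sketch of one of the simplest possible checks, a counterfactual paired test that scores the same resume twice with only the candidate’s name swapped and flags any decision that changes. The screen_candidate function, the names and the resume text below are hypothetical placeholders, not anything from OpenAI’s tooling or a real screening pipeline.

```python
# Minimal counterfactual "paired audit" sketch: score the same resume twice,
# swapping only the candidate's name, and flag any decision that flips.
# screen_candidate is a hypothetical stand-in for whatever model call an
# organization actually uses; names and resume text are illustrative only.
from typing import Callable


def paired_name_audit(
    resume_body: str,
    name_pairs: list[tuple[str, str]],
    screen_candidate: Callable[[str], str],  # returns e.g. "advance" or "reject"
) -> list[dict]:
    """Return every name pair where swapping the name changed the decision."""
    flagged = []
    for name_a, name_b in name_pairs:
        decision_a = screen_candidate(f"Name: {name_a}\n{resume_body}")
        decision_b = screen_candidate(f"Name: {name_b}\n{resume_body}")
        if decision_a != decision_b:
            flagged.append({
                "name_a": name_a, "decision_a": decision_a,
                "name_b": name_b, "decision_b": decision_b,
            })
    return flagged


if __name__ == "__main__":
    # Toy screener that (badly) keys on the name, to show what a flag looks like.
    def toy_screener(prompt: str) -> str:
        return "reject" if "Lakisha" in prompt else "advance"

    resume = "Experience: 8 years in accounts receivable. Education: B.S., Accounting."
    pairs = [("Emily Walsh", "Lakisha Washington")]
    print(paired_name_audit(resume, pairs, toy_screener))
```

A real audit needs far more pairs, the actual decision pipeline and statistical testing, but even this toy version shows the kind of check most organizations haven’t operationalized.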

So companies are building HR systems, customer service platforms and financial tools on a foundation with two fatal problems. First, the technology itself fails at the tasks it’s automating. Second, most organizations cannot catch those failures before they harm people. The code red makes both problems worse. It accelerates deployment before either the technology or the safeguards are ready.

 

The Focus Race

OpenAI has committed $1.4 trillion to AI data centers over eight years. Wall Street calls the circular deals between OpenAI and its suppliers a warning sign of an AI bubble. All of this money funds a race where safety is an afterthought.

The code red forces every competitor to choose: consolidate around flagship products or risk marginalization. Here’s why. When OpenAI declares an emergency and redirects all resources to ChatGPT, it signals to every other AI company that the game just changed. The competitive bar isn’t about building interesting AI capabilities anymore; it’s now about matching the pace and intensity of a company burning billions to dominate the consumer AI interface.

That level of competition requires focus. Companies can’t spread engineering talent, compute resources and capital across multiple products when their primary competitor just went all-in on one. Companies doubling down on dominant interfaces gain compounding advantages through network effects, data accumulation and brand recognition. Companies spreading teams thin lose ground with every release cycle they can’t match.

OpenAI’s massive capital commitments amplify this pressure. Smaller firms can’t compete on compute or speed, so they’re forced toward one of three positions. They can partner with a foundation model provider, specialize in a narrow vertical or build applications on top of major platforms. This creates a layered market with heavy compute foundation providers at the base and application companies competing on domain expertise on top.

And the safety problem? Faster release cycles mean less time for testing, auditing and catching failures before deployment. When competition forces everyone to accelerate, the odds of visible failures and misuse increase across the entire market. One company’s code red becomes everyone’s problem.

Companies in OpenAI’s report built systems assuming measured deployment. Intercom’s Fin Voice handles 53 percent of customer calls. Oscar Health’s chatbot handles 39 percent of benefits inquiries without human escalation. Those aren’t small numbers. Intercom is processing millions of customer service interactions monthly through AI. Oscar Health is making decisions about people’s healthcare coverage, whether they’re covered for a procedure, what their deductible is and whether a claim gets approved, without a human reviewing 39 percent of those conversations.

These companies calibrated their systems on the assumption that OpenAI would maintain stability. Fifty-three percent automation means Intercom determined that level was safe and reliable, given the technology’s capabilities. They didn’t go to 80 percent or 100 percent because they built in safety margins. They tested, validated and committed to a specific level of AI autonomy.
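Neither Intercom nor Oscar Health publishes its gating logic, but here is a hypothetical sketch, with invented categories and thresholds, of what committing to a specific level of AI autonomy can look like mechanically: the AI only auto-resolves a request when it clears every safety margin, and everything else goes to a human.

```python
# Hypothetical escalation gate sketching a fixed level of AI autonomy.
# It does not reflect Intercom's or Oscar Health's actual systems;
# the categories and threshold below are invented for illustration.
from dataclasses import dataclass

ALWAYS_ESCALATE = {"billing_dispute", "coverage_denial", "legal_threat"}
CONFIDENCE_FLOOR = 0.85  # below this, a human takes over


@dataclass
class DraftReply:
    intent: str        # classifier's label for the customer's request
    confidence: float  # classifier's confidence in that label
    text: str          # the model's proposed reply


def route(draft: DraftReply) -> str:
    """Auto-send only when the draft clears every safety margin."""
    if draft.intent in ALWAYS_ESCALATE:
        return "escalate_to_human"
    if draft.confidence < CONFIDENCE_FLOOR:
        return "escalate_to_human"
    return "auto_send"


print(route(DraftReply("password_reset", 0.97, "Here's how to reset it...")))   # auto_send
print(route(DraftReply("coverage_denial", 0.99, "Your claim was denied...")))   # escalate_to_human
```

The thresholds in a gate like this are the safety margin. When the model underneath changes faster than those thresholds can be revalidated, the margin quietly disappears.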

But code red changes the foundation those decisions were built on. When OpenAI declares an emergency, every system built on that foundation inherits the new risk profile. Intercom’s 53 percent and Oscar’s 39 percent were calculated on the assumption of a vendor prioritizing measured deployment. Now that same vendor is in survival mode, relaxing safety restrictions and accelerating release cycles.

The companies using the technology don’t get to recalibrate. Their customers are already in the system.

 

What the Code Red Actually Means

When companies declare emergencies over market position, everyone loses.

Medical AI misses heart attacks in women and minorities because training data assumes chest pain looks the same across demographics. A 2021 Nature Medicine study found that AI algorithms trained on chest X-rays consistently underdiagnosed disease in historically underserved patient populations, with the highest underdiagnosis rates occurring in intersectional groups. A 2024 study in the Journal of Medical Internet Research tested GPT-4’s assessment of coronary artery disease risk using identical clinical vignettes and found systematic gender bias in cardiovascular risk assessment.

Hiring AI rejects brilliance because some names “don’t fit company culture.” University of Washington researchers tested three leading AI models on over 550 resumes and found the systems favored white-associated names in 85 percent of cases, female-associated names in only 11 percent of cases and never favored Black male-associated names over white male-associated names. An estimated 99 percent of Fortune 500 companies now use AI in hiring, and a 2024 survey found 70 percent allow AI to reject candidates without human oversight.

When it comes to loans, Lehigh University researchers found that leading large language models consistently recommended denying more loans and charging higher interest rates to Black applicants than to otherwise identical white applicants. A 2022 UC Berkeley study found that African American and Latinx borrowers pay interest rates nearly five basis points higher than credit-equivalent white counterparts, amounting to $450 million in extra interest annually.

When a company feels it is losing its position, leadership doesn’t slow down to fix bias. They declare emergencies, relax safety restrictions and ship faster.

Survival mode means the company optimizing loan applications cares more about beating competitors than about whether its system charges some individuals extra because of their ZIP code. The company screening resumes with AI prioritizes shipping new features over testing whether the system discriminates against a specific name. The medical AI diagnosing a chest X-ray was trained on data that assumes everyone presents symptoms the same way, and nobody has time to retrain it when users are abandoning the platform.

These aren’t future concerns. These systems are deployed now.

More on the Tech Economy: What Past Tech Bubbles Can Teach Us About This One

 

The Guardrails That Aren’t Coming

OpenAI published research on making models “confess” when they take shortcuts, a technique where the AI admits in a separate report when it violated instructions, hacked reward systems or cut corners. The research worked. Models confessed to misbehavior with 95.6 percent accuracy in stress tests. But the study explicitly describes this as a “proof of concept,” not a deployed feature. And there’s no regulatory requirement for OpenAI or any AI company to actually implement these safety mechanisms in production systems in the U.S. 

Enforcement actions against OpenAI, like the €15 million fine from Italy, FTC investigations and ongoing lawsuits, all target data privacy violations, copyright infringement and unauthorized data collection, not failure to deploy available safety techniques. Companies can publish research showing how to detect AI deception, then never use it.

The situation is clear: The safety and dignity of individuals are up to us. OpenAI’s code red tells us that when the choice is between human safety and market position, we know which one wins.
