The Australian Department of Employment and Workplace Relations wanted an analysis of welfare programs. It hired one of the Big Four consulting firms, the kind of name that looks good on a government contract. Six figures later, it got a report citing academic papers that don't exist and a fabricated Federal Court quote.
A welfare researcher caught it almost immediately. The firm refunded $290,000. The official response? The errors "did not change the final recommendations." As if hallucinated evidence is fine as long as you reach the right conclusion anyway. Then it happened again, with the same firm, on a different continent: a healthcare report for Newfoundland and Labrador was riddled with errors about the province's hospitals. Two major scandals in two months aren't bad luck. They're a pattern.
The Consulting Model Was Already Fragile
Look, we should be honest about what the Big Four have been selling. It's not always revolutionary insights. A lot of the time, it's credibility by association. When a politician needs to make a tough call, being able to say that one of the Big Four studied it is worth a lot. These firms serve as expensive insurance policies against being second-guessed.
The people doing the actual work are often smart kids fresh out of college, working brutal hours and applying frameworks that have been used hundreds of times before. The real product was always the letterhead. That model was profitable but fragile, because it depended entirely on maintaining a reputation for rigor.
Then came AI. Firms realized that if the work is mostly templates and synthesis, AI can do it faster. Keep charging the same rates, cut expensive senior-consultant time, and protect margins. Except automation without accountability destroys the industry's entire value proposition.
The Expertise Paradox
Here’s what gets me: The mistakes made it into the final deliverables. Either nobody read the reports before submission, or people did read them but didn’t have the expertise necessary to recognize fabricated citations.
And there’s your paradox. The better AI gets at generating authoritative-sounding content, the more expertise you need to verify it’s not nonsense. But if you’re using AI specifically to make up for having fewer expensive senior personnel, you’ve just eliminated your quality control mechanism.
This problem isn't just affecting consulting, either. Legal services, medical diagnostics, financial advice: everyone's chasing the same promise of expertise at scale and analysis at speed. The risk nobody wants to talk about? We're replacing genuine expertise with convincing approximations, and most clients can't tell the difference until something breaks.
Why Government Work Makes This Worse
When AI's mistakes show up in government reports, the stakes get higher fast. Healthcare planning determines which communities get hospitals. Welfare policy shapes how we support vulnerable people. And government procurement is especially vulnerable: how many civil servants can catch a fabricated Federal Court quote? Agencies hire consultants precisely because they lack in-house expertise, which means they're the least equipped to verify the work they're buying.
The old defense against errors was reputation: a firm that got caught cutting corners would lose future business, so it paid to be careful. But if AI lets firms scale revenue without scaling expertise proportionally, that calculus changes. You only get caught when mistakes are obvious enough that someone notices publicly.
What This Crisis Means for Knowledge Workers
If you work in a knowledge industry, these scandals are a preview of what's coming to your field. As AI makes professional-looking analysis trivially easy to produce, we're splitting into two groups: people who can generate convincing content, and people who can evaluate whether it's any good. A junior analyst with ChatGPT can create a report that looks exactly like what a senior expert would produce. The difference only becomes visible when someone with real expertise spots the logical gaps.
This creates a market for lemons. If buyers can't distinguish real quality from AI approximations, they'll default to whatever comes at the lowest price, and the firms still paying for genuine expertise get undercut. The race to the bottom has started. For those trying to stay valuable, simply being able to generate stuff with AI won't be enough. The skills that matter are evaluating what AI generates, catching hallucinations, and recognizing when something sounds authoritative but is wrong. In short, domain expertise.
How Can We Stop the Slide?
So, what do we do about this? I’m skeptical of easy fixes, but some practical steps might actually work if anyone has the incentive to implement them.
The most straightforward approach is transparency requirements. Government contracts could mandate disclosure when AI is used in deliverables, along with details about the review processes in place. It's not perfect; firms could check a box and move on, but at least it creates accountability. When a consulting firm has to formally state, "We used AI for this analysis and here's how we verified it," that's a paper trail. It forces them to think through their quality control instead of hoping nobody notices.
Governments could also build internal AI evaluation capacity. Instead of outsourcing everything, hire a small team of people who actually understand these tools, including their capabilities and limitations. The goal isn't to replace consultants entirely, but to have informed buyers who can spot when a report seems off. It's like having a building inspector when you're constructing a house. The contractor might be reputable, but you still want someone who knows what kinds of problems to look for.
For the consulting firms themselves, the answer is unappealing but necessary: Reinvest in human expertise. That means more senior reviewers per project, mandatory review protocols for AI-generated content, and, honestly, just paying people to read the damn reports before submitting them. I know this cuts into margins. That’s kind of the point. If your business model only works when nobody’s checking your work, maybe it’s not a business model worth preserving.
Professional associations could help by establishing standards. What does responsible AI use in consulting actually look like? When should human review be mandatory? What qualifies as adequate verification? These sound like boring committee questions, but standards matter. They give ethical firms something to point to when clients pressure them to cut corners, and they give clients a baseline for what good practice looks like.
Will any of this happen? I’m not holding my breath. Individual firms have strong incentives to keep using AI aggressively until the reputational damage outweighs the cost savings. Governments are stretched thin, and hiring specialized staff is hard. Industry standards take years to develop and even longer to enforce. But the alternative, waiting until a truly catastrophic policy failure forces action, seems worse.
Where Will This Go?
Every consulting firm faces a dilemma now: keep using AI aggressively and risk more public failures, or invest in human oversight and watch costs rise while competitors undercut you. Maybe this forces a flight to quality, with clients paying premiums for firms that demonstrably maintain real expertise. Maybe. The pessimistic take is that most clients won't know the difference until they're implementing policy based on hallucinations.
Canadian authorities are discussing tighter oversight for contractors using AI. But how do you write regulations that distinguish legitimate AI assistance from automation that produces fabricated quotes? The technology moves faster than policy can adapt.
The firm involved will probably be fine, because big consulting always weathers these things. But something has shifted. We've learned that a $290,000 report from one of the most prestigious firms might contain obvious fabrications. AI didn't create this problem; it just revealed it. Turns out the emperor's new clothes were being woven by an algorithm that occasionally hallucinates entire wardrobes.
