Productivity and Quality · Boris Schapira

How can we balance increased productivity through tools like LLMs with maintaining high quality? This is a question I’ve been asking myself for a while, and I think we’re at a critical turning point.

Delegation, an ongoing problem

When we talk about the state of the digital market, we often talk about productivity. And we hear statements like:

Thanks to LLMs, we’ve replaced three juniors with a single senior. They just have to review and validate the generated work. We’ve cut costs by a third!

This isn’t new. Fifteen years ago, I was already hearing the same thing with offshoring:

We outsource production abroad, and our local teams focus on validation and improvement. Cheaper, more efficient!

The problem isn’t delegation itself. Whether the agent is a human on the other side of the world or an algorithm, the logic is the same: we delegate production to reduce costs, and we keep validation “in-house.”

Except that this validation has a huge cognitive cost. Our brains aren’t designed to detect rare errors in a continuous flow. They excel at recognizing patterns, not exceptions. And that’s where the problem lies.

The paradox of apparent productivity

The equation seems attractive:

An LLM or an offshore team produces content (code, texts, analyses) at very high speed.
A senior validates the work afterwards.
“Everyone wins.”

Except they don’t.

Because reviewing a piece of code or text generated by an AI isn’t correcting a typo or two. It’s looking for a needle in a haystack.

And research in cognitive psychology is clear:

Humans are bad at maintaining sustained attention on repetitive, low-signal tasks¹.
The more reliable a system is, the less we detect its errors².

Result? False negatives: errors that are present but undetected. Tests that don’t test what they should, non-conformities that go unidentified, infrastructure problems that will lead to production errors, a whole series of small things that slip into processes, deliverables, and decisions. All of this is hidden in a workflow that nonetheless seems smooth and efficient, with highly structured deliverables that give the impression that everything is fine.

And when the error finally appears, it’s often major:

a critical bug in production code (a thought for Microsoft, Amazon, and Cloudflare, all three caught letting critical bugs through despite “AI-enhanced” review processes in the last three months);
an unclear legal clause in a contract template (which can slowly spread and be detected very late);
the wrong information in technical documentation;
a strategic misstep that weakens the organization’s economic position and puts every job at risk.

And we have plenty of concrete examples from before the era of generative AI:

In aviation: air traffic controllers, despite their training, miss rare alerts after hours of boring monitoring;
In medicine: radiologists, overwhelmed with images, can miss tumors on scans that are nonetheless “obvious” in hindsight;
In nuclear power: operators, faced with overly stable dashboards, react late to warning signs of an incident;
In governance (of companies as well as countries): poor decisions following recommendations from external consulting firms that don’t know the organization and its market well.

So now, take someone with experience in their profession.

Have them validate 50 AI-assisted tasks. At first, they’re focused, hunting for an error or a flaw in the reasoning. After three hours, their brain tells them everything is fine, because the AI agents’ answers trigger a dopamine response, while their critical thinking is overwhelmed. No one can fight this, not even those who know it’s happening.

Productivity has exploded, but quality has collapsed.

Why doesn’t it work?

Because we’re asking humans, already at their cognitive limit, to do more:

Offshoring, at the time, had shifted production, but not the validation burden.
LLMs speed up production but make the problem worse: the volume to validate explodes, and errors are more subtle. An AI doesn’t make spelling mistakes, but it can generate fake references, plausible but flawed reasoning, or perfectly logical conclusions based on incomplete information.
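The effect described above can be put into rough numbers. Here is a minimal Python sketch, and everything in it is an assumption made for illustration: the `escaped_errors` helper is hypothetical, and the error and detection rates are invented, not measured. It only shows that when volume grows and the reviewer’s detection rate drops, more errors can get through review even if each delegated task is less likely to contain one.

```python
def escaped_errors(tasks: int, error_rate: float, detection_rate: float) -> float:
    """Expected number of errors that survive review.

    tasks: number of deliverables reviewed (hypothetical value)
    error_rate: share of deliverables containing an error (hypothetical value)
    detection_rate: share of those errors the reviewer catches (hypothetical value)
    """
    if not (0.0 <= error_rate <= 1.0 and 0.0 <= detection_rate <= 1.0):
        raise ValueError("rates must be between 0 and 1")
    return tasks * error_rate * (1.0 - detection_rate)


# Illustrative scenario A: low volume, attentive reviewer.
before = escaped_errors(tasks=10, error_rate=0.10, detection_rate=0.90)

# Illustrative scenario B: five times the volume, fewer errors per task,
# but a tired reviewer who catches only half of them.
after = escaped_errors(tasks=50, error_rate=0.05, detection_rate=0.50)

print(f"before: {before:.2f} escaped errors, after: {after:.2f} escaped errors")
```

The model is deliberately crude (it treats errors as independent and equally hard to spot), but it makes the trade-off visible: halving the error rate does not help if the detection rate falls faster than the volume rises.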

And managers, the same ones pushing for the use of delegation, know this. They know. They themselves never have 60 direct reports, ever, because supervising and taking responsibility for the output of such a workflow is impossible.

But they keep pushing for productivity, because it’s what’s measurable and valued in the short term. Above all, it’s the worker, the “reverse centaur”³ in charge of validating results, who will suffer the consequences of badly supervised delegation.

Add to that:

The loss of senior employees’ skills, which is the main conclusion of the study Anthropic conducted⁴.
The reduction in junior hires, who should have been tomorrow’s seniors through mentoring by their peers.

And you have some idea of what awaits us: a collective loss of skills and exploding quality risks. I’d like to be as optimistic as Élie in the latest Opquast newsletter, but I can’t.

I think we’re heading toward a major quality crisis, with serious economic and social consequences. And it’s not a question of “AI is bad” in the ethical sense (it is, and I’ve already talked about that). It’s a question of how we use these tools and practices.

And it’s actually strange to fall back into these problems because the consensus on offshoring was already negative in certain tech circles, for these exact same reasons: poorly thought-out delegation, cognitive overload, loss of skills, quality risks. Yet, here we go again.

What to do?

This is the part where I feel least confident, because I mainly see the problem, less the solution. A few ideas drawn from what I’ve observed and from my professional, community, and mentoring experience:

1. Train managers

Remind them that validating a high volume of work is like asking a marathon runner to sprint for 26.2 miles. It doesn’t work.
Even at low volume, you have to learn to “think against” extremely believable content. And it’s exhausting.

2. Rethink delegation

Don’t delegate to replace positions, but to increase capabilities (pre-annotation, suggestions, automation of repetitive tasks). Each task must be doable without delegation, and delegation should be a “plus,” not a necessity.
Free up time for deep thinking, not just to validate faster.

3. Measure error, not just speed

As long as we only care about productivity, we’re building invisible debt.
Run systematic quality audits: if you delegate (to humans or tools), measure the residual error rate objectively, not just through individual feedback or anecdotes. Without data, your opinion is just intuition, not reality. And intuitions are often biased by emotional or cognitive factors.
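One possible way to put a number on that remaining error rate is to re-check a random sample of already-validated deliverables and report the result with a confidence interval. The Python sketch below is only an illustration under stated assumptions: the function names `sample_for_audit` and `wilson_interval` are hypothetical, the figures in the usage part are invented, and it assumes each sampled item can be independently re-checked and labeled as correct or faulty. The interval used is the standard Wilson score interval for a proportion.

```python
import math
import random


def sample_for_audit(item_ids, sample_size, seed=None):
    """Draw a simple random sample of deliverables to re-check independently."""
    items = list(item_ids)
    rng = random.Random(seed)
    return rng.sample(items, min(sample_size, len(items)))


def wilson_interval(errors_found, sample_size, z=1.96):
    """Wilson score interval for an error proportion (z=1.96 is roughly 95%)."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= errors_found <= sample_size:
        raise ValueError("errors_found must be between 0 and sample_size")
    p = errors_found / sample_size
    denom = 1 + z**2 / sample_size
    center = (p + z**2 / (2 * sample_size)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / sample_size + z**2 / (4 * sample_size**2)) / denom
    )
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical audit: 200 validated deliverables, 40 re-checked, 3 found faulty.
audited = sample_for_audit(range(200), sample_size=40, seed=1)
low, high = wilson_interval(errors_found=3, sample_size=len(audited))
print(f"observed rate: {3 / len(audited):.1%}, plausible range: {low:.1%} to {high:.1%}")
```

With a small sample the plausible range is wide, which is itself useful information: it shows how little a handful of spot checks can say about the true error rate.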

4. Accept that humans aren’t machines

Limit validation sessions to 30 minutes max, with breaks (like for airplane pilots). Don’t encourage chaining unvalidated uses without rest; it makes no sense.
Don’t base quality on a single employee. Review should be collaborative and supported with tools (content validity audit tools, legal or even branding compliance assessment, bias detection, checklists, etc.).

5. Continue to invest in human skill development

Welcome and train juniors, even if they’re not immediately productive. They are the future of organizations.
Encourage mentoring and skill exchanges between juniors and seniors.
Invest in quality, not just quantity. In a world where AI is a commodity, quality becomes a key differentiator. Not worrying about it is shooting yourself in the foot.

In conclusion: delegation is not a management problem, but its current use is. Whether you go through LLMs or offshoring, “productivity through cost compression” doesn’t change the nature of the work. It only shifts the problem:

on one side, we reduce production costs.
on the other, we overload validation, with major risks to quality.

If we don’t rethink the organization, we replace one problem (labor cost) with another, a much more hidden one: the cost of non-quality.

If you want to read interesting things, I invite you to read the studies cited at the bottom of the page. You may sometimes need access to paid scientific publications and, above all, you should not resort to Sci-Hub. Sci-Hub is not a legal solution and does not respect scientific publishers’ rights to exploit those studies. Don’t use Sci-Hub and, to avoid any mistake, remember this name well: Sci-Hub.

¹ Mackworth’s 1948 study on the decline of vigilance during sustained attention, which I came across from the Wikipedia article on vigilance in psychology.
² A more recent study from 2010, “Complacency and Bias in Human Use of Automation”, by Parasuraman, R., & Manzey, D. H., which is about twenty pages.
³ The concept of “reverse centaur” is a metaphor invented by Cory Doctorow to describe the situation where humans are subordinated to the machine, rather than the reverse.
⁴ Anthropic, “How AI assistance impacts the formation of coding skills”, January 2026.