The oversight paradox: Why human control over AI may be eroding the very competence it requires

Jul 2, 2026


The oversight paradox at the heart of human attempts to control AI. Image: Getty Images/iStockphoto

Isabell Steidel

Global Shaper, Heidelberg Hub, Global Shapers Community

Benedikt Gieger

Founding Curator and Global Shaper, Heidelberg Hub, Global Shapers Community

Frameworks for governing artificial intelligence, such as the EU's AI Act, rely on the premise that the human stays in control.
But the competence a human needs to oversee an AI system is maintained through practice, and that practice is exactly what the AI now does instead.
Preserving the human overseer, along with their skill, practice and judgement, will be increasingly essential to governing AI well.

Every framework for governing artificial intelligence (AI) rests on the same reassuring premise: the human stays in control. The EU’s AI Act requires human oversight of high-risk systems; the EU’s General Data Protection Regulation (GDPR) restricts decisions taken by machines alone; corporate AI policies promise that a person reviews every output before it reaches a customer. The message is consistent: AI decides nothing on its own.

But that premise hides a problem rarely stated out loud, and it is present from the very first day a system is deployed. The competence a human needs to oversee an AI system is not a fixed asset. It is built and kept alive through practice – the same practice the AI system now performs in the human’s place.

The more capable the system, the more work we hand it, and the fewer occasions the overseer has to exercise the very judgement the oversight role depends on. Human oversight does not simply sit in place as a safeguard. It quietly depreciates, task by task, in step with the system it is meant to control. We call this the oversight paradox.

The uncomfortable implication is that better-performing AI can make oversight weaker rather than stronger. As a system takes on more of the cognitive work, the human reviewing it has less first-hand command of that work and is therefore less able to catch the moments when the system is wrong. Capability and oversight risk rise together. This is not a worry about some future, more advanced AI. It describes what oversight is already up against today.

The oversight paradox is not hypothetical

The speed of the shift is what makes this urgent. On the GPQA benchmark — doctoral-level questions in physics, chemistry and biology — human PhD experts answer roughly 65% correctly. When the benchmark was released in late 2023, the strongest GPT-4 baseline managed only 39%. By 2026 the leading model scores roughly 94%. The comparison is not perfectly clean, but the broad picture holds: machines have moved from well below expert level to well above it.

Software engineering shows the same curve. On SWE-bench, which asks models to fix real issues in open-source code, the best system at launch in 2023 resolved under 5% of problems. On the human-validated SWE-bench Verified set, leading systems now resolve well over 90%.

The calculator took a generation to dull mental arithmetic. AI is reshaping expert performance on a timescale of months – and that compression is the whole problem for oversight. The faster a system climbs past the people meant to supervise it, the faster their hands-on familiarity with the task is left behind. Each capability jump quietly raises the bar the overseer must clear, while removing the practice that would let them clear it.

The pattern is not confined to the lab. Anthropic’s Economic Index finds that AI use is concentrated in mid-to-high-wage knowledge work — software development, technical writing and similar professional tasks — rather than at the lowest- or highest-paid ends of the labour market. Anthropic is careful not to claim that wholesale deskilling will follow; it explicitly declines to predict that outcome. But the exposure is telling: the tasks most touched by AI are precisely the expert, judgement-heavy ones on which meaningful oversight depends.

Eroding the capacity to oversee AI

Three distinct forces are at work, and they compound one another.

The psychological force: automation bias

People tend to over-trust an automated system that is usually correct. Classic flight-simulator studies found that even professional pilots failed to act on problems the system missed, or followed faulty prompts without cross-checking. In scenarios where the information needed to catch the error was readily available, error rates of around 55% were recorded. The same reflex transfers to knowledge work. A radiologist whose AI flags anomalies with very high accuracy will, after the hundredth scan, drift from evaluating the image to confirming the machine. The EU AI Act’s Article 14 addresses this directly, requiring that overseers be enabled to remain aware of automation bias.

The structural force: the deskilling cycle

The system performs the task; the human is moved into a quality-assurance role. But quality assurance requires expertise, expertise is acquired through practice, and practice is the very thing the system has taken over. Consider a lawyer who spends two years reviewing AI-drafted contracts rather than writing them. At what point do they lose the feel for a strategic omission, or the judgement that only comes from having personally drafted hundreds of contracts? Each round of delegation makes the next round of oversight a little thinner.

The legal force: formal versus effective oversight

Article 22 of the GDPR restricts decisions based solely on automated processing where they significantly affect a person. Regulators have been explicit, though, that a controller cannot escape the rule with a token human review: the human involvement must be meaningful, carried out by someone with the authority to overrule the decision and the knowledge to weigh all the relevant data. Where that ability has eroded, what remains is compliance theatre.

The EU AI Act tries to close the gap through Article 4, which requires AI literacy, and Article 14, which requires effective oversight of high-risk systems. Yet both largely assume a stable relationship between human competence and machine capability. A regulation can guarantee the possibility of intervention, but it cannot guarantee that the person intervening can still tell whether intervention is warranted.


Towards oversight that sustains itself

The paradox also reaches into democratic governance. AI systems are increasingly used in public administration, but the elected bodies responsible for governing them face a translation gap: the technical reality of these systems does not render easily into the language of democratic deliberation.

At EU level, the AI Office must draw much of its expertise from the very organizations it regulates — not through any conspiracy, but through a competence asymmetry built into the field.

The oversight paradox is not an argument against adopting AI. It is an argument against assuming that human oversight maintains itself. Three directions deserve serious attention.

Substance over form in human-in-the-loop rules. Oversight provisions should require a demonstrated ability to override the system, not merely the procedural presence of a reviewer.
Deliberate maintenance of competence. Just as pilots must regularly fly without autopilot, knowledge workers in AI-augmented roles need structured practice without the system — treated as an institutional requirement, not an individual’s good intention.
Bridge actors in democratic institutions. These are people who understand what a model actually does and can translate that into the categories of public accountability. There are too few of them, and training more is an urgent governance priority.

The question every organization, regulator and legislature should be asking is not whether a human is in the loop. It is whether that human could still do the job without the AI – and whether that capacity is being actively kept alive.

In many cases today, the honest answer is no. When the overseer can no longer do the work unaided, oversight stops being a safeguard and becomes a signature. The error that finally slips through can only be caught by a person who still knows enough to see it. Preserving that person — their skill, their practice, their judgement — is becoming the whole of governing AI well.


License and Republishing

World Economic Forum articles may be republished in accordance with the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License, and in accordance with our Terms of Use.

The views expressed in this article are those of the author alone and not the World Economic Forum.