The AI Engineering Paradox: Faster Code, More Bugs

More Output, Worse Outcomes

Ninety-seven percent of engineering teams have adopted AI coding tools. Individual throughput is visibly up. Yet incidents per pull request have tripled, bugs per developer are rising faster than code volume, and deployment frequency is actually down. This is the central finding of Faros AI’s AI Engineering Report 2026: The Acceleration Whiplash. It is the largest quantitative study of AI’s impact on software delivery to date, drawing on two years of telemetry from 22,000 developers across 4,000 teams.

The pattern is consistent and statistically significant. AI makes individuals faster. It does not make teams better. And in many cases, it actively degrades the systems those teams are paid to keep running.

What makes this report harder to dismiss than most is its methodology. The data comes from real workflow telemetry (commits, pull requests, review timelines, incident logs) captured before and after AI adoption within the same organizations. That before-and-after design removes much of the selection bias that plagues most industry benchmarks. You are not comparing AI-forward teams against laggards. You are watching what happens to the same team as it moves from low to high AI adoption over two years.

What the Numbers Say

The throughput numbers are real and worth acknowledging. Epics completed per developer are up 66.2%. Task throughput per developer is up 33.7%. PR merge rate per developer is up 16.2%. Developers are unambiguously writing and closing more work items than before. If your KPI is tickets resolved, AI is working.

The problem shows up the moment you look downstream. Pull request size has grown 51%. Bugs per PR are up 28%. Bugs per developer are up 54%. Median time in code review has increased fivefold. Incidents per PR have tripled, with monthly incidents up 57.9%. Code churn — code written and then discarded or rewritten shortly after — is up 861%. And deployments per week are down 11.7%.
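The churn figure is the easiest of these to misread. The report's exact definition is not given here, so the sketch below assumes a common one: the share of newly added lines that are deleted or rewritten within a fixed window (the 21-day window is a hypothetical choice). The names `LineRecord`, `churn_rate`, and `percent_change` are illustrative only. The percent-change helper also makes the headline arithmetic explicit: an 861% increase means churn is roughly 9.6 times its baseline, not 8.6 times.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class LineRecord:
    """One added line of code and, if it was later deleted or rewritten, when."""
    added_at: datetime
    removed_at: Optional[datetime] = None


def churn_rate(lines: list[LineRecord], window: timedelta = timedelta(days=21)) -> float:
    """Fraction of added lines removed or rewritten within `window` (hypothetical definition)."""
    if not lines:
        return 0.0
    churned = sum(
        1
        for line in lines
        if line.removed_at is not None and line.removed_at - line.added_at <= window
    )
    return churned / len(lines)


def percent_change(before: float, after: float) -> float:
    """Percent change from `before` to `after`; +861 means after is about 9.61x before."""
    return (after - before) / before * 100


t0 = datetime(2025, 1, 1)
lines = [
    LineRecord(t0, t0 + timedelta(days=3)),   # rewritten within the window: counts as churn
    LineRecord(t0),                           # never removed
    LineRecord(t0, t0 + timedelta(days=60)),  # removed much later: not churn
    LineRecord(t0),
]
print(churn_rate(lines))  # 0.25
```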

More code is entering the system. Less of it is surviving contact with production intact. The AI-generated output that does make it to production is arriving in larger, harder-to-review chunks, carrying more defects, and destabilizing systems at a rate that no throughput gain can justify if your actual goal is shipping software that works.

Why Review Time Has Exploded

The fivefold increase in median review time is the most operationally damaging finding in the Acceleration Whiplash report. It explains precisely how individual speed gains vanish at the team level: one developer ships faster, but their pull request now waits five times longer before anyone can merge it. The net result is that calendar time to production gets longer, not shorter.
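The arithmetic behind that claim is easy to check. The sketch below treats lead time as coding time plus time waiting in review, a deliberate simplification that ignores CI and deployment. The 10-hour and 4-hour figures and the 30% coding speedup are hypothetical; only the fivefold review multiplier comes from the report.

```python
def lead_time_hours(coding: float, review_wait: float) -> float:
    """Calendar hours from starting a change to merging it (CI and deploy time ignored)."""
    return coding + review_wait


def breakeven_review_multiplier(coding: float, review_wait: float, coding_factor: float) -> float:
    """Largest multiplier on review wait that still leaves lead time no worse than before.

    coding_factor is the fraction of the original coding time that remains with AI
    assistance (0.7 means coding takes 30% less time).
    """
    hours_saved = coding * (1 - coding_factor)
    return (review_wait + hours_saved) / review_wait


# Hypothetical baseline: 10 h to write a change, 4 h waiting in review.
before = lead_time_hours(coding=10.0, review_wait=4.0)
# Hypothetical AI-assisted case: coding 30% faster, review wait five times longer.
after = lead_time_hours(coding=10.0 * 0.7, review_wait=4.0 * 5)

print(f"before: {before:.1f} h, after: {after:.1f} h")  # before: 14.0 h, after: 27.0 h
print(f"break-even multiplier: {breakeven_review_multiplier(10.0, 4.0, 0.7):.2f}")  # 1.75
```

Under these assumptions, the coding gain only pays off while review wait grows by less than 1.75 times; a fivefold increase nearly doubles lead time.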

The mechanics are not mysterious. AI-generated code arrives in larger batches. Its bugs are also distributed differently from those in human-written code: they are less likely to be obvious typos or missing null checks and more likely to be architecturally plausible but behaviorally wrong, in ways that take deeper reading to catch. Review capacity has not grown to match, so reviewers cannot keep pace. The queue grows. Senior engineers are pulled into review cycles that consume the same hours they once used for design, mentoring, and high-leverage technical work.
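The queue does not just lengthen slightly; it keeps growing. Once pull requests arrive faster than reviewers can clear them, the backlog climbs every day. The sketch below is a deterministic fluid model with hypothetical rates (a team able to review 10 PRs a day), not data from the report. In a first-in, first-out queue, a newly arrived PR waits roughly the current backlog divided by daily review capacity.

```python
def review_backlog(days: int, prs_per_day: float, reviews_per_day: float,
                   start: float = 0.0) -> list[float]:
    """Open review backlog at the end of each day under a simple fluid model.

    Each day adds prs_per_day new pull requests and clears up to reviews_per_day.
    """
    backlog = start
    history = []
    for _ in range(days):
        backlog = max(0.0, backlog + prs_per_day - reviews_per_day)
        history.append(backlog)
    return history


def wait_days_for_new_pr(backlog: float, reviews_per_day: float) -> float:
    """Approximate days a newly arrived PR waits in a first-in, first-out queue."""
    return backlog / reviews_per_day


# Hypothetical team with capacity for 10 reviews per day.
steady = review_backlog(days=20, prs_per_day=9, reviews_per_day=10)
overloaded = review_backlog(days=20, prs_per_day=12, reviews_per_day=10)

print(steady[-1], overloaded[-1])                # 0.0 40.0
print(wait_days_for_new_pr(overloaded[-1], 10))  # 4.0
```

Larger PRs hurt twice in this model: AI assistance raises `prs_per_day`, while bigger changes take longer to read, which lowers `reviews_per_day`.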

The Faros data shows 31.3% of pull requests now merge without any human review at all. That number is not a sign of confidence in AI-generated code quality. It is a sign of queue collapse — teams skipping a quality gate they once relied on because the alternative is blocking delivery entirely. The result shows up immediately in production: a 242.7% increase in incidents per PR.

LinearB’s separate analysis of 8.1 million pull requests across 4,800 teams reaches a compatible conclusion: AI-generated code waits 4.6 times longer for review than human-written code. The bottleneck is not generation speed. It never was. It is the human capacity to validate what the generator produces.

Senior Engineers Aren’t the Solution

One intuitive response to the review overload problem is that senior engineers — who understand the codebase more deeply — can absorb the burden. The Faros data directly contradicts this. Engineering maturity is not a shield. High-performing teams show the same quality degradation curve as lower-maturity organizations as AI adoption deepens.

METR’s separate productivity study points the same way: experienced developers were 18% slower when using AI tools on large, complex codebases they already knew well. Reviewing plausible-looking but subtly incorrect AI output appears to be harder, not easier, for engineers who care deeply about correctness. One likely reason is that they slow down to catch problems a less experienced reviewer would approve.

The Faros report frames this as a systems problem, not a talent problem. The engineering process was designed around human-paced development and human-quality code. AI has flooded that process with a volume and type of output it was never built to absorb. You cannot fix a process mismatch by assigning better people to it. This is also the core argument behind what vortx has previously called absorption capacity — the organizational ceiling on how much AI output a team can validate and integrate before quality collapses.

What Engineering Leaders Should Do

The Acceleration Whiplash is not an argument against AI coding tools. It is an argument against adopting them without redesigning the processes around them. The throughput gains are real and worth keeping. The quality degradation is also real, is accelerating, and is currently being ignored by most teams tracking only velocity metrics.

Three interventions emerge from the research. First, move quality control upstream: fix AI-generated code at the authoring stage (through better prompting conventions, automated linting, test generation pipelines, and pre-commit checks) rather than pushing the cost onto reviewers downstream. Second, restructure review workflows explicitly for AI-generated code: smaller logical units, automated pre-screening for common AI failure modes, and clear policies on when human review is required and when AI-on-AI review is acceptable. Third, measure outcomes that matter. The DORA 2025 report already showed AI adoption correlating with flat or declining delivery performance, and organizations still tracking only tickets closed and PRs merged are optimizing the wrong metric.
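The second intervention can be made concrete as an explicit routing rule that runs before anyone is asked to review. The sketch below is one hypothetical policy, not one prescribed by the report: the `PullRequest` fields, the 400-line and 100-line limits, and the list of sensitive paths are all placeholders to be tuned against a team's own defect and incident data.

```python
from dataclasses import dataclass, field

# Hypothetical thresholds; calibrate against your own escaped-defect data.
MAX_REVIEWABLE_LINES = 400   # above this, the author is asked to split the PR
MAX_AI_REVIEW_LINES = 100    # above this, AI-generated code needs a human reviewer
SENSITIVE_PREFIXES = ("auth/", "billing/", "migrations/")


@dataclass
class PullRequest:
    lines_changed: int
    ai_generated: bool
    touched_paths: list[str] = field(default_factory=list)
    prescreen_passed: bool = True  # linting, tests, and AI-failure-mode checks


def review_route(pr: PullRequest) -> str:
    """Return 'split', 'fix_first', 'human', or 'ai_ok' for a pull request."""
    if pr.lines_changed > MAX_REVIEWABLE_LINES:
        return "split"       # break into smaller logical units before review
    if not pr.prescreen_passed:
        return "fix_first"   # author fixes automated findings; no reviewer time spent yet
    if any(path.startswith(SENSITIVE_PREFIXES) for path in pr.touched_paths):
        return "human"       # high-risk areas always get a person
    if pr.ai_generated and pr.lines_changed > MAX_AI_REVIEW_LINES:
        return "human"
    return "ai_ok"           # small, pre-screened, low-risk change: AI review allowed


print(review_route(PullRequest(lines_changed=40, ai_generated=True,
                               touched_paths=["docs/setup.md"])))  # ai_ok
```

The ordering is the point of the design: size limits and automated pre-screening run first, so no reviewer, human or AI, spends time on a change that automation could have bounced back to its author.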

That 25% of pull requests are now reviewed by an AI agent is an early signal that teams are trying to solve a volume problem with more volume. Whether AI-on-AI review closes the quality gap or compounds it is one of the genuinely open empirical questions of 2026. The Acceleration Whiplash data suggests the answer will depend entirely on implementation quality, and that any team betting on AI review without measuring its defect escape rate is repeating the same mistake that got them here.
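Measuring defect escape rate does not require special tooling, provided each merged PR can be linked to any production bug or incident it later caused. The sketch below assumes that link already exists and uses hypothetical reviewer labels ("human", "ai", "none"). Comparing the rate across labels is one way to tell whether AI review is catching problems or merely clearing the queue.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class MergedPR:
    reviewer: str          # hypothetical labels: "human", "ai", or "none"
    escaped_defect: bool   # later linked to a production bug or incident


def escape_rate_by_reviewer(prs: list[MergedPR]) -> dict[str, float]:
    """Share of merged PRs, per reviewer type, that later caused a production defect."""
    merged: dict[str, int] = defaultdict(int)
    escaped: dict[str, int] = defaultdict(int)
    for pr in prs:
        merged[pr.reviewer] += 1
        if pr.escaped_defect:
            escaped[pr.reviewer] += 1
    return {kind: escaped[kind] / merged[kind] for kind in merged}


# Hypothetical sample: 1 of 10 human-reviewed PRs and 3 of 10 AI-reviewed PRs escaped.
sample = [MergedPR("human", i < 1) for i in range(10)] + [MergedPR("ai", i < 3) for i in range(10)]
print(escape_rate_by_reviewer(sample))  # {'human': 0.1, 'ai': 0.3}
```

With small samples these rates are noisy, so they are more useful tracked over months than compared sprint to sprint.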
