The Numbers That Don’t Add Up
Roughly 90 to 95 percent of software developers now use AI coding assistants at work, according to the 2025 DORA State of AI-Assisted Software Development report. The average developer spends around two hours a day with these tools — about a quarter of their workday. By every adoption metric, the transition is complete. Yet when you look at the four DORA metrics that actually measure software delivery — deployment frequency, lead time for changes, change failure rate, and mean time to recovery — the numbers are flat.
This is the central finding of a year’s worth of converging research: AI coding agents have become nearly universal, but organizational delivery performance has not moved in lockstep. More than 50 percent of survey respondents still deploy less than once per week. When deployments fail, 15 percent of teams need more than a week to recover. Those numbers look similar to 2023.
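To make the four delivery metrics concrete, here is a minimal sketch of how they can be computed from deployment records. The record shape and field layout are illustrative assumptions, not taken from any DORA tooling:

```python
from datetime import datetime, timedelta

# Hypothetical deployment records for a four-week window:
# (commit_time, deploy_time, failed, recovered_time)
deploys = [
    (datetime(2025, 6, 2, 9), datetime(2025, 6, 2, 15), False, None),
    (datetime(2025, 6, 9, 10), datetime(2025, 6, 11, 16), True,
     datetime(2025, 6, 12, 16)),
    (datetime(2025, 6, 16, 8), datetime(2025, 6, 17, 12), False, None),
    (datetime(2025, 6, 23, 9), datetime(2025, 6, 24, 9), False, None),
]
days_observed = 28

# Deployment frequency: deploys per week over the window.
deploy_frequency = len(deploys) / (days_observed / 7)

# Lead time for changes: mean commit-to-deploy duration.
lead_times = [d[1] - d[0] for d in deploys]
mean_lead_time = sum(lead_times, timedelta()) / len(lead_times)

# Change failure rate: share of deploys that needed remediation.
failures = [d for d in deploys if d[2]]
change_failure_rate = len(failures) / len(deploys)

# Mean time to recovery: mean failure-to-recovery duration.
recoveries = [d[3] - d[1] for d in failures]
mttr = sum(recoveries, timedelta()) / len(recoveries)

print(deploy_frequency)     # 1.0 deploy per week
print(mean_lead_time)       # mean of 6h, 54h, 28h, 24h
print(change_failure_rate)  # 0.25
print(mttr)                 # one day from failure to recovery
```

This toy team deploys exactly once a week, which puts it right at the boundary of the "less than once per week" majority the survey describes.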
What AI Actually Improved
To be clear, AI tools do produce measurable gains — they just land at the individual level, not the system level. Developers using AI complete 21 percent more tasks and merge 98 percent more pull requests, according to Faros AI’s analysis of the 2025 DORA findings. Individual velocity is genuinely up.
Specific task categories show the clearest wins. Code restructuring, boilerplate generation, and test scaffolding can run 60 to 90 percent faster with AI assistance. Zylos Research tracking from February 2026 puts the average routine-task time savings at 30 to 60 percent. Developers who use AI also perceive themselves as more productive — the subjective experience matches the task-level data.
AI now writes roughly 41 percent of all code across the industry, up from under 20 percent two years ago. The raw output numbers are real, and no serious analyst is disputing them. The question is what happens to that output once it leaves the developer’s machine.
Where the Gains Disappear
The problem is downstream. More code, faster, does not automatically mean faster delivery. Faros AI’s breakdown puts code review time up 91 percent and average pull request size up 154 percent. Bug rates have climbed 9 percent per developer. Every hour saved writing code is at least partially consumed by the expanded surface area that code creates.
The PR Size Problem
When AI writes large blocks of plausible-looking code quickly, developers tend to submit larger pull requests — often without the careful decomposition that makes review tractable. Reviewers face denser diffs that are harder to reason about, which slows the entire merge pipeline. The net effect: more PRs in the queue, each taking longer to clear. For teams already close to their review capacity, this translates directly into longer lead times.
Code Quality Degradation
Zylos Research’s 2026 metrics tracking found duplicated code blocks up fourfold since AI coding tools became widespread, with the share of cloned code in the average codebase rising from 8.3 percent to 12.3 percent. Refactoring dropped from 25 percent to under 10 percent of changed lines between 2021 and 2024. Teams are shipping more, but the structural health of codebases is declining. That decline compounds over time: technical debt from AI-assisted output takes longer to pay down than debt from hand-written code, precisely because the volume is higher and the ownership is less clear.
AI Is an Amplifier, Not a Fixer
The 2025 DORA report frames this clearly: AI magnifies the strengths of high-performing organizations and the dysfunctions of struggling ones. A team with tight CI/CD feedback loops, small batch discipline, and solid test coverage gets faster with AI. A team with flaky pipelines, bloated PRs, and deferred maintenance finds that AI floods the queue with more of what it was already bad at handling.
This is why the headline metric — deployment frequency — hasn’t shifted across the industry. The teams already deploying daily or multiple times a day are mostly getting faster. The teams deploying weekly or less frequently are not catching up; if anything, they’re generating more work-in-progress without the organizational capacity to flow it through.
The DORA research team moved away from the classic elite/high/medium/low tier classification this year, replacing it with seven team archetypes based on throughput and instability patterns. Some teams show high throughput with low instability — that’s the goal. Others show high throughput with correspondingly high instability: AI-accelerated output without the guardrails. That second pattern is the one to watch, because it looks like progress on the wrong metrics. DORA 2025: More AI, More Code, Flatter Delivery covers the archetype breakdown in detail.
What Separates Teams That Do See Improvement
The DORA report identifies seven organizational capabilities that determine whether AI investment translates into delivery improvement: a clear AI policy, a healthy data ecosystem, strong version control practices, small batch discipline, user-centric focus, quality internal platforms, and deliberate change management. Organizations that score high on these prerequisites show real DORA gains. Those that skip the foundation and go straight to tool adoption do not.
Practically, this means a few concrete investments matter more than the choice of which coding agent to deploy:
- Reducing PR size through stricter review norms and automated PR splitting tools.
- Strengthening CI/CD feedback so that pipelines can absorb higher commit frequency without degrading.
- Tracking rework rate, which the 2025 DORA report introduced as a fifth key metric measuring how often teams push unplanned fixes to production. Without tracking it, teams don’t notice quality erosion until it’s expensive to reverse.
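Rework rate itself is a simple computation over deployment records. A minimal sketch, assuming each deploy is labeled as planned work or an unplanned fix — the labeling scheme is an assumption here, and classifying deploys honestly is the hard part in practice:

```python
def rework_rate(deploys: list) -> float:
    """Fraction of deployments that are unplanned fixes (hotfixes,
    rollbacks, patches) rather than planned feature or maintenance
    work. The 'unplanned_fix' boolean on each record is an
    illustrative convention, not prescribed by the DORA report."""
    if not deploys:
        return 0.0
    unplanned = sum(1 for d in deploys if d["unplanned_fix"])
    return unplanned / len(deploys)

history = [
    {"id": 101, "unplanned_fix": False},
    {"id": 102, "unplanned_fix": False},
    {"id": 103, "unplanned_fix": True},   # hotfix after a bad merge
    {"id": 104, "unplanned_fix": False},
    {"id": 105, "unplanned_fix": True},   # rollback patch
]

print(f"rework rate: {rework_rate(history):.0%}")  # rework rate: 40%
```

Trended over time, a rising rework rate is the early signal that throughput gains are being paid for in unplanned production fixes.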
Around 30 percent of developers still say they don’t fully trust AI-generated output. That skepticism is not unfounded — it correlates with the teams most likely to catch quality issues before they reach production. Blind trust in AI output, combined with high PR merge velocity, is the pattern most strongly associated with degraded change failure rates.
The insight from this year’s data is uncomfortable but precise: AI coding agents are now table stakes, and they do accelerate individual work. But the organizations treating adoption as the finish line are discovering that tool penetration is only the starting condition. The delivery ceiling is set by the organizational system, not the AI tool. As covered in AI Velocity Paradox: More Code, More Bottlenecks, the bottleneck has simply moved downstream — and moving it again requires process work, not another model upgrade.
Further Reading
- 2025 DORA State of AI-Assisted Software Development — the primary research source, with the full archetype analysis and capability framework for translating AI adoption into delivery improvement
- DORA Report 2025 Key Takeaways: AI Impact on Dev Metrics | Faros AI — a concise breakdown of the PR size, review time, and bug rate statistics that explain where individual productivity gains go
- Developer Productivity Metrics 2026: From DORA to DevEx and Beyond | Zylos Research — February 2026 data on code duplication, churn, and the expanding gap between DORA metrics and the full picture of engineering effectiveness

