
Stripe Minions vs Cursor Agents: Two Paths to Autonomous PRs



Two Different Bets on Autonomous Code

Last week, two very different approaches to autonomous software development landed in the spotlight within days of each other. Stripe revealed that its internal AI coding agents — nicknamed Minions — are now shipping over 1,300 pull requests per week, all containing zero human-written code. On April 2, 2026, Cursor launched Cursor 3, its biggest architectural overhaul yet, centering the product around a new Agents Window designed to run multiple autonomous coding agents in parallel — cloud, local, or self-hosted.

Both approaches want the same outcome: AI that ships code without a human writing every line. But the architectures, assumptions, and target users are fundamentally different. Stripe built a bespoke system tuned to the specific needs of a 4,000-engineer fintech company processing over $1 trillion in annual payments. Cursor built a general-purpose platform any team can start using this week.

This piece breaks down how each system works, where each architecture excels, and — critically — which one makes sense for your team.

How Stripe’s Minions Work

Stripe’s Minions are one-shot, end-to-end coding agents. They receive a task, work through it unattended, and deliver a finished pull request. The trigger is deliberately mundane: a Slack emoji reaction on a message describing a bug or a task. That message becomes the agent’s core prompt. From there, no human touches a keyboard until the PR review.
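The trigger flow above can be sketched in a few lines. This is purely illustrative: the emoji name, event shape (loosely following Slack's `reaction_added` event), and dispatch function are assumptions, not Stripe's actual implementation.

```python
# Illustrative sketch: a Slack emoji reaction on a task message becomes the
# agent's core prompt. The emoji name and helper functions are hypothetical.
TRIGGER_EMOJI = "robot_face"  # assumption; the article doesn't name the emoji

def on_reaction(event: dict, fetch_message, dispatch_minion) -> bool:
    """Dispatch a Minion when the trigger emoji lands on a task message."""
    if event.get("reaction") != TRIGGER_EMOJI:
        return False
    # The reacted-to message text becomes the agent's core prompt.
    prompt = fetch_message(event["item"]["channel"], event["item"]["ts"])
    dispatch_minion(prompt)
    return True

dispatched = []
on_reaction(
    {"reaction": "robot_face", "item": {"channel": "C1", "ts": "123.45"}},
    fetch_message=lambda ch, ts: "Fix the flaky retry test in billing",
    dispatch_minion=dispatched.append,
)
```

From this point on, exactly as the article says, no human touches a keyboard until PR review.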

The Blueprint: Deterministic and Agentic Nodes

The core architectural insight behind Minions is what Stripe calls blueprints: orchestration flows that alternate between fixed, deterministic steps and open-ended AI agent loops. Stripe engineer Steve Kaliski described this in detail on the How I AI podcast: the deterministic nodes handle things that should never vary — setting up the container, cloning the repo, parsing the task — while the agentic nodes handle the parts that require reasoning, like identifying which files to modify or what approach to take.

This hybrid approach outperforms fully agentic designs in two practical ways. First, deterministic nodes create explicit failure points: if the container doesn’t spin up, you know exactly why. Second, they reduce the cognitive load on the model — it only needs to reason about the genuinely ambiguous parts of the task, not the boilerplate scaffolding around it.
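The blueprint pattern can be sketched as a pipeline of typed nodes, where deterministic nodes are plain functions with explicit failure points and agentic nodes wrap an open-ended model loop. All names here are illustrative, assuming a shared context dict; this is not Stripe's actual API.

```python
# Minimal sketch of the blueprint pattern: deterministic scaffolding steps
# alternate with agentic reasoning steps. Names are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Node:
    name: str
    run: Callable[[dict], dict]  # takes and returns a shared context dict
    deterministic: bool          # fixed step (explicit failure point) vs. agent loop

def run_blueprint(nodes: list[Node], ctx: dict) -> dict:
    for node in nodes:
        try:
            ctx = node.run(ctx)
        except Exception as exc:
            # A deterministic failure pinpoints exactly which fixed step broke;
            # an agentic failure hands the task back to a human.
            kind = "deterministic" if node.deterministic else "agentic"
            raise RuntimeError(f"{kind} node '{node.name}' failed: {exc}")
    return ctx

blueprint = [
    # Fixed scaffolding that should never vary:
    Node("setup_container", lambda c: {**c, "container": "ready"}, True),
    Node("clone_repo",      lambda c: {**c, "checkout": c["repo"]}, True),
    Node("parse_task",      lambda c: {**c, "task": c["prompt"].strip()}, True),
    # Open-ended agent loop, stubbed here:
    Node("plan_and_edit",   lambda c: {**c, "diff": "<model-generated>"}, False),
]

result = run_blueprint(blueprint, {"repo": "acme/api", "prompt": " Fix flaky test "})
```

The design choice worth noting is that the model never sees the scaffolding steps at all; it only enters the picture where reasoning is genuinely required.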

Stripe forked Goose, the open-source agent harness developed by Block, and customized it for their own LLM infrastructure. Each Minion runs in an isolated container with a fresh checkout of the relevant codebase. The agent searches the code, makes changes, runs tests, reads the output, fixes failures, and iterates — up to two attempts on failing tests before handing the task back to a human engineer.
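The iterate-until-green loop with a two-attempt cap can be sketched as follows. The `run_tests` and `apply_fix` functions are stand-ins for the real harness, not Goose's API.

```python
# Sketch of the test-and-retry loop: up to two attempts at fixing failing
# tests before the task is handed back to a human. Stand-in functions only.
MAX_ATTEMPTS = 2

def run_tests(code: str) -> list[str]:
    # Stand-in: reports one failing test until a fix marker is present.
    return [] if "fixed" in code else ["test_payment_flow"]

def apply_fix(code: str, failures: list[str]) -> str:
    # Stand-in for the agentic step: read the failures, edit the code.
    return code + "  # fixed: " + ", ".join(failures)

def minion_iterate(code: str) -> tuple[str, bool]:
    for _ in range(MAX_ATTEMPTS):
        failures = run_tests(code)
        if not failures:
            return code, True          # satisfied: open the PR automatically
        code = apply_fix(code, failures)
    # Retry limit hit: escalate to a human engineer with whatever state exists.
    return code, not run_tests(code)

code, ok = minion_iterate("def charge(): ...")
```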

From Slack Emoji to Merged PR

Stripe’s test suite has over 3 million tests. CI is configured to run only the subset relevant to the changed files, with autofixes applied where possible. When the agent is satisfied — or has hit its retry limit — it opens a PR automatically. That PR goes through standard code review. Minions have submission authority but not merge authority: a human engineer always approves before code lands in main.
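Test-subset selection of the kind described above can be sketched with a simple lookup from changed files to affected tests. The mapping here is a toy; real systems at this scale derive it from build graphs or coverage data.

```python
# Sketch of running only the tests relevant to the changed files, rather
# than the full multi-million-test suite. TEST_MAP is a toy assumption.
TEST_MAP = {
    "payments/charge.py": ["tests/test_charge.py", "tests/test_refund.py"],
    "billing/invoice.py": ["tests/test_invoice.py"],
}

def relevant_tests(changed_files: list[str]) -> list[str]:
    selected: set[str] = set()
    for path in changed_files:
        selected.update(TEST_MAP.get(path, []))
    return sorted(selected)

# A change to the charge module selects two tests, not the whole suite:
subset = relevant_tests(["payments/charge.py"])
```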

The results are striking. Stripe grew from 1,000 to 1,300 AI-generated PRs per week in under two weeks, roughly 30% growth over that period at the time of writing. According to InfoQ’s coverage, Minions perform best on well-defined, bounded tasks: configuration adjustments, dependency upgrades, migration scripts, straightforward refactors. They are explicitly not a replacement for complex feature development where requirements are ambiguous.

How Cursor 3 Background Agents Work

Cursor 3 takes the opposite design philosophy: instead of a bespoke internal system, it is a commercial platform any team can deploy, configured through a UI rather than custom code. The April 2 release replaced the Composer pane with a full-screen Agents Window — a dedicated workspace for running and managing multiple agents simultaneously across different execution environments.

The Agents Window

Each agent in Cursor 3 can run in one of four environments: local (your machine), cloud (Cursor’s infrastructure), worktree (an isolated git working tree), or remote SSH (a server you control). Agents can be started in parallel, and the system supports seamless handoff — you can start a task locally, push it to a cloud environment for overnight execution, and pull the results back when you return.
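The four environments and the handoff described above can be modeled as a small state machine. The enum values mirror the article's list, but the `AgentTask` class and `hand_off` method are illustrative assumptions, not Cursor's actual API.

```python
# Sketch of the four execution environments and a local-to-cloud handoff.
# Class and method names are hypothetical, not Cursor's public interface.
from enum import Enum

class Env(Enum):
    LOCAL = "local"            # your machine
    CLOUD = "cloud"            # Cursor's infrastructure
    WORKTREE = "worktree"      # isolated git working tree
    REMOTE_SSH = "remote-ssh"  # a server you control

class AgentTask:
    def __init__(self, prompt: str, env: Env = Env.LOCAL):
        self.prompt = prompt
        self.env = env
        self.log = [env.value]  # where the task has executed so far

    def hand_off(self, target: Env) -> "AgentTask":
        # Seamless handoff: task state moves with the task, so work started
        # locally can finish overnight in a cloud environment.
        self.env = target
        self.log.append(target.value)
        return self

task = AgentTask("migrate logging to structlog").hand_off(Env.CLOUD)
```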

The practical workflow looks different from Stripe’s Slack-driven model. In Cursor 3, a developer queues a task from within the IDE, sets it running in the background (or in a cloud environment), and returns when the agent has a draft. Cursor’s BugBot Autofix, already generally available, extends this to PR-level: the agent can propose fixes directly on open pull requests without the developer initiating anything manually.

Cursor 3 also ships Design Mode and native best-of-N model comparison — two features aimed at reducing the “which model should I use for this?” overhead that teams currently manage manually.

Self-Hosted for Enterprises

On March 31, 2026 — one day before the Cursor 3 announcement — Cursor unveiled self-hosted cloud agents for enterprise customers. This is the feature that opens the door for highly regulated industries: financial services, healthcare, government. The architecture is deliberately simple: workers establish outbound HTTPS connections to Cursor’s cloud. No inbound ports, no VPN, no firewall changes required.

Self-hosted deployments support up to 10 workers per user and 50 per team by default. For large-scale deployments, Cursor provides a Helm chart and Kubernetes operator, plus a fleet management API for monitoring utilization and building custom autoscaling. Source code and build artifacts never leave the company’s environment. Cursor handles inference orchestration and the user experience; the customer controls execution.

According to The New Stack, this is a direct response to enterprise security teams who had blocked Cursor adoption specifically because cloud-hosted execution meant code leaving company infrastructure. Much of the Fortune 500 was already using Cursor (the company claims more than half), but regulated subsidiaries and sensitive codebases were off-limits until self-hosting landed.

Head-to-Head: Architecture, Scale, and Fit

The two approaches differ across almost every meaningful dimension. Here is a direct comparison across the criteria that matter most for adoption decisions:

| Criterion | Stripe Minions | Cursor 3 Background Agents |
| --- | --- | --- |
| Trigger / interface | Slack emoji reaction on a task message | IDE Agents Window, CLI, or BugBot on PRs |
| Architecture | Blueprint: deterministic nodes + agentic loops; built on Goose harness | Managed platform; parallel agents across local/cloud/SSH environments |
| Scale model | Centralized org-level fleet; 1,300+ PRs/week across thousands of engineers | Per-developer or per-team; up to 50 workers/team by default |
| Infrastructure | Isolated containers, internal CI integration, Stripe’s own LLM stack | Hybrid cloud/local with self-hosted option; outbound HTTPS workers |
| Human oversight | Submission authority only; mandatory human PR review before merge | Developer reviews agent output; human approves PR before merge |
| Task type fit | Well-defined, bounded tasks: migrations, dependency bumps, refactors | Broader range; handles exploratory tasks and complex feature work |
| Access | Internal only; not available externally | Commercial product; ~$40/month per developer (Pro plan) |
| Security model | Internal infra, no third-party data exposure | Self-hosted option keeps code on company infrastructure |
| Build effort | Significant: custom harness, blueprint design, CI integration | Minimal: install, configure, use within hours |

One architectural difference that doesn’t show up in this table: ambiguity tolerance. Stripe’s blueprint model is explicitly designed for tasks where the scope is clear. The deterministic framing reduces what the model has to decide, which improves reliability. Cursor’s agents are more exploratory by design — they are built to handle tasks where the developer hasn’t fully specified the solution path.

Neither is superior in absolute terms. They optimize for different failure modes. Stripe optimizes for reliability at scale (1,300 PRs is only valuable if most of them are correct). Cursor optimizes for flexibility — the kind of open-ended assistance where a developer is working through a problem, not just dispatching well-understood work.

Who Should Use Which

Use Cursor 3 Background Agents if: your team has fewer than ~200 engineers, you need something running this week, your tasks span a wide range of complexity, or you don’t have the engineering bandwidth to build and maintain a custom harness. The self-hosted option now satisfies most enterprise security requirements. For teams already using Cursor as their primary IDE — and increasingly that includes large-scale refactoring work — upgrading to Cursor 3 is a natural path.

Invest in a Stripe-style custom architecture if: you have 200+ engineers generating a predictable high volume of similar, bounded tasks (dependency upgrades, test fixes, configuration migrations), you want centralized control over the agent fleet at org level, and you have senior engineers available to build and iterate on the harness. The blueprint model delivers reliability that general-purpose tools struggle to match at this scale — but it requires ongoing investment to maintain.

There is a third option that deserves mention: open-source harnesses like Goose, OpenHands, or SWE-agent. These sit between Cursor’s commercial product and a fully bespoke system. They give engineering teams a starting point for custom agent workflows without building the scaffolding from scratch. Stripe’s choice of Goose as a foundation suggests this is a viable intermediate path.

The Real Question: Build vs. Buy

The deeper question isn’t really Stripe Minions vs. Cursor — it’s whether your organization should build a custom agent system or buy one off the shelf. The honest answer depends on three factors.

First, task homogeneity. The more your autonomous agent use case looks like a narrow, repeatable workflow — the same kind of task, run thousands of times — the more a custom blueprint architecture pays off. Custom blueprints can be tuned precisely to that workflow in ways a general-purpose tool cannot match. If your use case is diverse and unpredictable, a general-purpose tool wins.

Second, engineering capacity. Stripe’s Minions didn’t appear overnight. Steve Kaliski’s team iterated through multiple architectures before landing on the blueprint model. That’s an ongoing maintenance commitment, not a one-time project. Most teams are better served deploying Cursor and saving their engineering capacity for product work.

Third, scale threshold. At Stripe’s numbers — 1,300+ PRs per week — the economics of a custom system likely justify the build cost. At 50 PRs per week, they almost certainly don’t. Where the crossover point sits depends on your labor costs, the cost of commercial tooling, and how much the custom system outperforms the off-the-shelf alternative. Most teams haven’t run that calculation.
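A back-of-the-envelope version of that calculation is easy to run. Every number below is an illustrative assumption (seat price approximated from the article's Pro-plan figure; maintenance cost invented), not data from Stripe or Cursor.

```python
# Back-of-the-envelope build-vs-buy break-even sketch.
# All figures are illustrative assumptions, not reported numbers.
def monthly_cost_buy(devs: int, seat_price: float = 40.0) -> float:
    """Commercial tooling: roughly $40/month per developer (Pro plan)."""
    return devs * seat_price

def monthly_cost_build(maintainers: float, loaded_cost: float = 20_000.0) -> float:
    """Ongoing harness upkeep, in fractional senior engineers (assumed cost)."""
    return maintainers * loaded_cost

# A 200-developer org with half a senior engineer on harness maintenance:
buy = monthly_cost_buy(200)      # 200 seats
build = monthly_cost_build(0.5)  # 0.5 FTE of upkeep
# Build only wins if the custom system's extra throughput and control
# are worth more than the gap between these two numbers.
```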

What’s clear is that 90% of engineering teams now use AI coding tools, but most aren’t yet operating at the autonomous agent layer Stripe has reached. Cursor 3 makes it easier than ever to start. Stripe’s architecture shows what the ceiling looks like once you get there.


