Build Persistent AI Agents with Cloudflare’s SDK

Why Most AI Agents Disappear When You Need Them Most

You start a long research task. The agent is mid-loop — 30 seconds into an LLM call, 10 tool-use rounds deep — when your laptop sleeps, a deploy fires, or the process hits a memory limit. State gone. Context gone. Start over. This is not a fringe failure mode; it’s the default behavior of nearly every AI agent in production today.

Cloudflare’s Agents SDK, specifically its Project Think update released April 15, 2026, addresses this at the infrastructure level rather than patching it in application code. The result is agents that persist across restarts, survive crashes, coordinate with child agents, and cost nothing when idle. Here’s how to actually build one.

What You’ll Need

Before you start: a Cloudflare account (free tier works), Node.js 18+, and basic familiarity with TypeScript. The Agents SDK uses Durable Objects under the hood, but you don’t need to understand Durable Objects to follow this guide — the SDK abstracts them away entirely.

Install the required packages:

npm install @cloudflare/think agents ai @cloudflare/shell zod workers-ai-provider
npx wrangler@latest init my-agent --type worker

Step 1: Understand the Agent Model

Each agent in the SDK is a Durable Object — an addressable micro-server with its own SQLite database, WebSocket connections, and scheduling. The key economic fact: it consumes zero compute when hibernated. When a request arrives (HTTP, WebSocket, alarm, or inbound event), the platform wakes the agent, loads its state, and hands it the event. When the work is done, it sleeps again.

This is the actor model applied to AI agents. A hundred million users, each running a modest agent workload at 1% concurrency, means about 1 million agents active at any moment, not a hundred million idle containers burning money. The SDK’s comparison is stark:

Concern	VMs / Containers	Durable Objects (Agents SDK)
Idle cost	Full compute cost, always	Zero (hibernated)
Persistent state	External database required	Built-in SQLite
Crash recovery	You build it	Platform restarts, state survives
Routing	You build it	Built-in (name → agent)
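To make the wake, handle, hibernate cycle concrete, here is a toy model in plain TypeScript. It is not the Agents SDK API: ToyPlatform, deliver, and hibernate are hypothetical names, and a Map stands in for the per-agent SQLite database. The sketch only illustrates that state lives in storage rather than in memory, so dropping the in-memory copy loses nothing.

```typescript
// Toy model of the wake / handle / hibernate cycle. Not the Agents SDK API.
type AgentState = { messageCount: number };

class ToyPlatform {
  // Durable storage: survives hibernation (stands in for per-agent SQLite).
  private storage = new Map<string, AgentState>();
  // Agents currently held in memory; only these would consume compute.
  private awake = new Map<string, AgentState>();

  // Route an event to an agent by name, waking it if needed.
  deliver(name: string, _event: string): number {
    let state = this.awake.get(name);
    if (!state) {
      // Wake: load persisted state, or start fresh for a new name.
      state = { ...(this.storage.get(name) ?? { messageCount: 0 }) };
      this.awake.set(name, state);
    }
    state.messageCount += 1;
    // Persist after handling so eviction or a crash loses nothing.
    this.storage.set(name, { ...state });
    return state.messageCount;
  }

  // Hibernate: drop the in-memory copy; storage is untouched.
  hibernate(name: string): void {
    this.awake.delete(name);
  }

  awakeCount(): number {
    return this.awake.size;
  }
}

const platform = new ToyPlatform();
platform.deliver("user-42", "hello");
platform.hibernate("user-42");
// A later event wakes the same named agent with its state intact.
const countAfterWake = platform.deliver("user-42", "still there?");
platform.hibernate("user-42");
```

After the second delivery the counter reads 2 even though the agent was evicted in between, and no agent remains in memory once it hibernates again.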

Step 2: Create Your First Persistent Agent

The minimal working agent using the Think base class looks like this. Think handles streaming, message persistence, the agentic loop, and error recovery — you just tell it which model to use:

// src/server.ts
import { Think } from "@cloudflare/think";
import { createWorkersAI } from "workers-ai-provider";
import { routeAgentRequest } from "agents";

export class ResearchAgent extends Think<Env> {
  getModel() {
    return createWorkersAI({ binding: this.env.AI })(
      "@cf/moonshotai/kimi-k2.5"
    );
  }
}

export default {
  async fetch(request: Request, env: Env) {
    return (
      (await routeAgentRequest(request, env)) ||
      new Response("Not found", { status: 404 })
    );
  }
} satisfies ExportedHandler<Env>;

Deploy with npx wrangler deploy. You now have a chat agent with streaming, conversation history in SQLite, resumable streams after crashes, and a built-in workspace filesystem. Every message is persisted after each turn, so a restart mid-conversation loses nothing.

What you should see: A working Worker URL from Wrangler. Hitting it with a POST containing a message body should return a streaming response from the model.
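When testing the deployed Worker, it helps to build the agent URL in one place. The sketch below assumes the SDK's default routing convention, in which routeAgentRequest matches paths of the form /agents/<kebab-cased class name>/<instance name>. That convention, the helper names, and the example hostname are assumptions to verify against the current SDK documentation.

```typescript
// Convert a class name such as "ResearchAgent" to "research-agent".
function toKebabCase(className: string): string {
  return className.replace(/([a-z0-9])([A-Z])/g, "$1-$2").toLowerCase();
}

// Build the URL for one named agent instance.
// ASSUMPTION: the /agents/<class>/<instance> path layout; check the SDK docs.
function agentUrl(
  workerOrigin: string,
  className: string,
  instanceName: string
): string {
  const origin = workerOrigin.replace(/\/+$/, ""); // drop trailing slashes
  const agent = toKebabCase(className);
  return `${origin}/agents/${agent}/${encodeURIComponent(instanceName)}`;
}

// Hypothetical hostname; use the URL that Wrangler prints after deploy.
const exampleUrl = agentUrl(
  "https://my-agent.example.workers.dev",
  "ResearchAgent",
  "user-42"
);
```

Each distinct instance name addresses a separate agent with its own state, which is what makes per-user agents a matter of picking a name.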

Step 3: Add Crash Recovery with Fibers

Persistence at the message level is not enough for long-running tasks. A research loop that takes 10 minutes and crashes at minute 8 needs to resume from minute 8 — not restart. That’s what runFiber() provides.

A fiber is a durable function invocation: registered in SQLite before execution begins, checkpointable via ctx.stash(), and recoverable via onFiberRecovered. The SDK keeps the agent alive automatically during fiber execution — no configuration needed:

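import { Agent } from "agents";

// Shape of the checkpoint written by ctx.stash() in this example.
type ResearchSnapshot = { findings: string[]; step: number; topic: string };

export class ResearchAgent extends Agent {
  async startResearch(topic: string) {
    // Fire and forget: the fiber is registered in SQLite before it runs.
    void this.runFiber("research", async (ctx) => {
      const findings: string[] = [];

      for (let i = 0; i < 10; i++) {
        // callLLM is a placeholder for your own model call.
        const result = await this.callLLM(`Research step ${i}: ${topic}`);
        findings.push(result);

        // Checkpoint: if evicted here, we resume from this snapshot
        ctx.stash({ findings, step: i, topic });

        // Serialize the progress event before sending it to connected clients.
        this.broadcast(JSON.stringify({ type: "progress", step: i }));
      }

      return { findings };
    });
  }

  async onFiberRecovered(ctx) {
    if (ctx.name === "research" && ctx.snapshot) {
      const { topic, step, findings } = ctx.snapshot as ResearchSnapshot;
      // Step `step` already completed, so resume at step + 1 with the saved findings.
      // continueResearch is a placeholder you implement to re-enter the loop.
      await this.continueResearch(topic, step + 1, findings);
    }
  }
}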

For tasks longer than a few minutes — CI pipelines, video generation, batch data processing — the pattern shifts slightly: start the job, persist the external job ID to SQLite, hibernate, and wake on the callback. The fiber handles all the bookkeeping.
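The start, persist, hibernate, wake-on-callback pattern can be sketched without the SDK. Everything here is hypothetical: LongJobAgent, startJob, and onCallback are illustrative names, the submit function stands in for whatever external system runs the job, and a Map stands in for the agent's SQLite storage.

```typescript
type JobRecord = {
  jobId: string;
  topic: string;
  status: "pending" | "done";
  result?: string;
};

class LongJobAgent {
  // Stands in for durable storage shared across wake-ups in this sketch.
  private storage: Map<string, JobRecord>;

  constructor(storage: Map<string, JobRecord>) {
    this.storage = storage;
  }

  // Kick off external work and persist the job ID before going idle.
  startJob(topic: string, submit: (topic: string) => string): string {
    const jobId = submit(topic);
    this.storage.set(jobId, { jobId, topic, status: "pending" });
    return jobId;
  }

  // Called when the external system reports completion (the wake-up event).
  onCallback(jobId: string, result: string): JobRecord {
    const record = this.storage.get(jobId);
    if (!record) throw new Error(`Unknown job: ${jobId}`);
    const finished: JobRecord = { ...record, status: "done", result };
    this.storage.set(jobId, finished);
    return finished;
  }
}

const durable = new Map<string, JobRecord>();
const startedJobId = new LongJobAgent(durable).startJob(
  "render video",
  () => "job-123" // hypothetical external job ID
);
// The agent hibernates here. A fresh instance handles the callback later
// and recovers the job's context from storage alone.
const finishedJob = new LongJobAgent(durable).onCallback(startedJobId, "out.mp4");
```

The second instance knows nothing except what was persisted, which is the point: no process has to stay alive while the external job runs.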

What you should see: If you kill the Worker process mid-fiber (simulating a crash) and restart, onFiberRecovered fires with the last stashed snapshot. No data loss, no restart from zero.
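The checkpoint-and-resume behavior can be simulated in plain TypeScript to see why no work is repeated. This is a toy, not the SDK's fiber implementation: runResearch and the checkpoints Map are hypothetical, the crash is a thrown error, and each step is a string instead of an LLM call.

```typescript
type Snapshot = { findings: string[]; step: number };

// Stands in for SQLite: the only thing that survives a simulated crash.
const checkpoints = new Map<string, Snapshot>();

// Run the remaining steps, stashing after each one.
// Throws at step `crashAt` to simulate eviction (null means no crash).
function runResearch(
  name: string,
  totalSteps: number,
  crashAt: number | null
): string[] {
  const saved = checkpoints.get(name);
  const findings: string[] = saved ? [...saved.findings] : [];
  // The stashed step already completed, so continue with the next one.
  const startStep = saved ? saved.step + 1 : 0;

  for (let i = startStep; i < totalSteps; i++) {
    if (i === crashAt) throw new Error(`simulated crash at step ${i}`);
    findings.push(`finding-${i}`);
    checkpoints.set(name, { findings: [...findings], step: i });
  }
  checkpoints.delete(name); // finished: nothing left to recover
  return findings;
}

let crashed = false;
try {
  runResearch("research", 10, 8); // dies before step 8 runs
} catch {
  crashed = true;
}
const savedBeforeRecovery = checkpoints.get("research")?.findings.length;

// Recovery: a second call picks up at step 8 with the stashed findings.
const finalFindings = runResearch("research", 10, null);
```

Eight findings are in the checkpoint when the crash hits, and the recovery run adds only the last two, for ten in total.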

Step 4: Coordinate Work with Sub-Agents

A single agent doing everything is a bottleneck. The SDK’s sub-agent API lets you spawn isolated child agents colocated with the parent. Each gets its own SQLite database, execution context, and tools. Communication is typed RPC — TypeScript catches misuse at compile time:

import { Agent } from "agents";

// ResearchAgent and ReviewAgent are child agent classes defined elsewhere;
// search(), analyze(), and synthesize() are placeholders for your own methods.
export class Orchestrator extends Agent {
  async handleTask(task: string) {
    // Each sub-agent is its own Durable Object, isolated at storage level
    const researcher = await this.subAgent(ResearchAgent, "research");
    const reviewer = await this.subAgent(ReviewAgent, "review");

    // Run in parallel
    const [research, review] = await Promise.all([
      researcher.search(task),
      reviewer.analyze(task)
    ]);

    return this.synthesize(research, review);
  }
}

Sub-agents are not threads. They are genuinely isolated actors — different storage, different compute, potentially different models. The parent orchestrates; the children execute. This pattern also works with Think’s sub-agent RPC, where each child gets its own full conversation tree and memory.
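The compile-time safety of typed RPC is easier to see in a self-contained sketch. The interfaces and toy classes below are hypothetical stand-ins for sub-agent stubs; they run in one process and do not use the SDK, but the type checking works the same way on any typed interface.

```typescript
// The contract each child exposes to the parent.
interface Researcher {
  search(task: string): Promise<string[]>;
}
interface Reviewer {
  analyze(task: string): Promise<{ risk: "low" | "high" }>;
}

class ToyResearcher implements Researcher {
  private notes: string[] = []; // state private to this child
  async search(task: string): Promise<string[]> {
    this.notes.push(`source for ${task}`);
    return [...this.notes];
  }
}

class ToyReviewer implements Reviewer {
  async analyze(task: string): Promise<{ risk: "low" | "high" }> {
    // Arbitrary rule for the demo: long task descriptions count as risky.
    return { risk: task.length > 40 ? "high" : "low" };
  }
}

async function orchestrate(
  task: string,
  researcher: Researcher,
  reviewer: Reviewer
): Promise<string> {
  // Fan out to both children and wait for both results.
  const [sources, review] = await Promise.all([
    researcher.search(task),
    reviewer.analyze(task)
  ]);
  // Calling researcher.analyze(task) here would fail to compile,
  // because analyze is not part of the Researcher interface.
  return `${sources.length} source(s), risk=${review.risk}`;
}

const summaryPromise = orchestrate(
  "compare vector DBs",
  new ToyResearcher(),
  new ToyReviewer()
);
```

Because the parent depends only on the interfaces, a child can change its model or storage without the orchestrator changing at all.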

Step 5: Handle Long Conversations Without Context Blowup

The Session API, available on both Agent and Think, stores conversations as trees rather than flat lists. Each message has a parent_id, enabling branching (try a different approach without losing history), non-destructive compaction (summarize old messages rather than truncating them), and full-text search via SQLite’s FTS5 extension:

// A method on your Think subclass
configureSession(session: Session) {
  return session
    .withContext("memory", {
      description: "Important facts learned during conversation.",
      maxTokens: 2000
    })
    .withCachedPrompt();
}

When context fills up, Think compacts by summarizing — the full history stays in SQLite, the model sees a digest. The agent can also search its own past using the built-in search_context tool, which is useful for week-long or month-long task threads where exact message recall matters.
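A tree of messages with parent links is a small data structure, and a toy version shows how branching and non-destructive compaction fall out of it. This sketch does not use the Session API: ConversationTree and its methods are hypothetical, the field is named parentId rather than parent_id, and the digest is a placeholder string where a real system would ask a model for a summary.

```typescript
type Message = {
  id: number;
  parentId: number | null;
  role: "user" | "assistant";
  text: string;
};

class ConversationTree {
  private messages = new Map<number, Message>();
  private nextId = 1;

  append(parentId: number | null, role: Message["role"], text: string): number {
    const id = this.nextId++;
    this.messages.set(id, { id, parentId, role, text });
    return id;
  }

  // Walk parent links from a leaf to the root, returned oldest first.
  pathTo(leafId: number): Message[] {
    const path: Message[] = [];
    let current = this.messages.get(leafId);
    while (current) {
      path.unshift(current);
      current =
        current.parentId === null
          ? undefined
          : this.messages.get(current.parentId);
    }
    return path;
  }

  // Non-destructive compaction: the model's view gets a digest of older
  // turns, while nothing is deleted from the stored tree.
  modelView(leafId: number, keepRecent: number): string[] {
    const path = this.pathTo(leafId);
    const older = path.slice(0, Math.max(0, path.length - keepRecent));
    const recent = path.slice(older.length);
    const digest: string[] =
      older.length > 0 ? [`[summary of ${older.length} earlier message(s)]`] : [];
    return [...digest, ...recent.map((m) => `${m.role}: ${m.text}`)];
  }

  size(): number {
    return this.messages.size;
  }
}

const tree = new ConversationTree();
const q1 = tree.append(null, "user", "Plan a migration");
const a1 = tree.append(q1, "assistant", "Option A: big bang");
const q2 = tree.append(a1, "user", "Estimate the risk");
// Branch: answer q1 a different way; a1 and q2 stay stored.
const a1b = tree.append(q1, "assistant", "Option B: incremental");
const compactedView = tree.modelView(q2, 1);
```

The compacted view for the first branch holds a one-line digest plus the latest message, while all four messages remain in the tree and the second branch has its own two-message path.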

What This Looks Like at Scale

In May 2026, Cloudflare also shipped Dynamic Workflows, extending durable execution to multi-tenant scenarios. Concurrent workflow capacity increased from 4,500 to 50,000 instances, roughly an 11x jump. The library is MIT-licensed and lets each tenant, agent, or request carry different workflow code, dispatched at runtime rather than at deploy time.

The practical implication: you can give every user their own isolated agent without a pricing cliff. An agent that runs for 20 minutes a day and hibernates the rest costs approximately zero when idle. Twenty minutes out of 1,440 is a duty cycle of about 1.4%, so 10,000 such agents average roughly 140 simultaneously active, not 10,000 containers billed around the clock.
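The idle-cost arithmetic is simple enough to check directly. The helper below is a hypothetical back-of-the-envelope calculation, assuming activity is spread evenly across the day; real traffic has peaks, so treat the result as an average rather than a capacity plan.

```typescript
// Average number of agents active at once, given how many minutes per day
// each one runs. Assumes usage is spread evenly over 24 hours.
function expectedActive(totalAgents: number, activeMinutesPerDay: number): number {
  const minutesPerDay = 24 * 60; // 1,440
  const dutyCycle = activeMinutesPerDay / minutesPerDay;
  return Math.round(totalAgents * dutyCycle);
}

// 20 minutes a day is a duty cycle of about 1.4%.
const activeAt20Minutes = expectedActive(10000, 20);

// A 1% duty cycle corresponds to 14.4 minutes a day.
const activeAtOnePercent = expectedActive(10000, 14.4);
```

With 10,000 agents, 20 minutes a day averages 139 active at once, and a 1% duty cycle averages 100.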

For teams already using AI coding agents, the operational model aligns well with how agentic workflows are moving to production in 2026 — long-running, task-oriented, and increasingly dependent on state that outlasts a single session.

Where the Tradeoffs Sit

The SDK is still in preview. Project Think’s API surface is marked stable but evolving — Cloudflare explicitly says it will change. That’s a fair warning: don’t build a customer-facing product on @cloudflare/think today expecting the interface to freeze.

The execution model also has a real constraint: Durable Objects are single-threaded per instance. An agent doing CPU-heavy work will queue other requests to that instance. For IO-bound tasks (LLM calls, tool use, API calls) this is rarely a problem. For compute-heavy tasks, sub-agents become the right answer.

Finally, vendor lock-in is genuine. Durable Objects, Dynamic Workers, and the Agents SDK are Cloudflare-specific primitives. The persistence, hibernation, and routing guarantees don’t port to AWS Lambda or GCP Cloud Run without significant re-architecture.
