Andrej Karpathy’s Take on Agentic AI and Its Impact on Software Engineering

Keynote from Y Combinator AI Startup School • San Francisco • 17 June 2025


Why this talk matters

In “Software Is Changing (Again)” Andrej Karpathy argues that large-language-model agents mark a third great epoch of software:

| Wave | Interface | What engineers write | Compute target |
| --- | --- | --- | --- |
| 1.0 | Compilers / CPUs | Imperative code (C++, Java) | Deterministic machines |
| 2.0 | ML frameworks | Data + model architecture | Neural nets |
| 3.0 | LLMs-as-OS | Prompts & feedback loops | Stochastic “people spirits” running in the cloud |

Karpathy’s core claim: The hottest new programming language is English.  


What “agentic AI” means in practice

| Property of LLM agents | Engineering consequence |
| --- | --- |
| Jagged intelligence – super-human recall, sub-human arithmetic | Add verify loops & type-checkers around every call (see the sketch below) |
| Utility / fab / OS triple role – expensive to build, cheap to consume | Treat model labs like cloud vendors; architect for fail-over & rate limits |
| Programmed in natural language | Shift effort from syntax to prompt design & feedback heuristics |
| Partial-autonomy sliders | Design UIs that expose diff views and “confidence knobs” (Cursor, Perplexity, etc.) |
| Agents as first-class users | Ship llms.txt, structured markdown, and self-describing APIs so bots, not just humans, can consume your product |
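
The first row is the most immediately actionable: because model output is only statistically reliable, every call deserves a verification wrapper. Below is a minimal sketch, assuming a hypothetical `call_llm(prompt) -> str` helper (any real client would slot in the same way) and using pydantic for the type check; the `Invoice` schema and field names are illustrative.

```python
# Minimal "verify loop" around a model call: validate, and on failure feed the
# error back into the prompt and retry. call_llm is a hypothetical helper.
from pydantic import BaseModel, ValidationError


class Invoice(BaseModel):
    vendor: str
    total_cents: int  # the type check catches outputs like "42.50 USD"


def extract_invoice(text: str, max_attempts: int = 3) -> Invoice:
    prompt = f"Return JSON with keys vendor (string) and total_cents (integer):\n{text}"
    last_error = None
    for _ in range(max_attempts):
        raw = call_llm(prompt)                        # hypothetical LLM call
        try:
            return Invoice.model_validate_json(raw)   # pydantic v2 validation
        except ValidationError as err:
            last_error = err
            prompt += f"\nYour previous answer failed validation: {err}. Try again."
    raise RuntimeError(f"Model never produced valid JSON: {last_error}")
```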

Five new pillars of Software 3.0 engineering

  1. Prompt-oriented architecture
    • Store prompts under version control and test them like code (see the sketch after this list).
  2. Guard-railed execution
    • Wrap every model call in validators (regex, unit tests, type checks).
  3. Tight generate/verify cycles
    • Latency budgets must include detours for automatic critique or human review.
  4. Agent-friendly infrastructure
    • Documentation readable by both humans and parsers; think OpenAPI plus llms.txt.
  5. Observability for cognition
    • Log token streams & system messages; inspect why an agent acted, not just what it returned.
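
To make pillar 1 concrete, here is a sketch of a prompt that lives in the repository and is exercised by an ordinary test. The `prompts/summarise.txt` path and `call_llm` helper are assumptions for illustration; in practice you might also record and replay model outputs to keep the test deterministic.

```python
# Pillar 1 sketch: the prompt template is a versioned file, and editing it is a
# code change that must keep passing the same checks.
from pathlib import Path

PROMPT_PATH = Path("prompts/summarise.txt")   # checked into version control


def build_prompt(document: str) -> str:
    template = PROMPT_PATH.read_text()
    return template.replace("{{document}}", document)


def test_summary_fits_length_budget():
    prompt = build_prompt("The quarterly report shows revenue grew 12%...")
    summary = call_llm(prompt)                # hypothetical LLM call
    assert len(summary.split()) <= 50
    assert "revenue" in summary.lower()
```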

These practices operationalise Karpathy’s warning that LLMs are “people spirits”: creative but error-prone. They demand DevOps-for-cognition, not just DevOps-for-code.
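
Pillar 5 is the part teams most often skip. A minimal sketch of “observability for cognition”: persist the full message history and the action the agent chose, not just its final answer, so failures can be inspected after the fact. The trace layout and field names below are assumptions, not a standard.

```python
# Append one reasoning step (the messages the agent actually saw plus the tool
# call it chose) to a JSONL trace for post-hoc inspection.
import json
import time
from pathlib import Path

TRACE_DIR = Path("traces")
TRACE_DIR.mkdir(exist_ok=True)


def log_agent_step(run_id: str, messages: list[dict], action: dict) -> None:
    record = {
        "ts": time.time(),
        "run_id": run_id,
        "messages": messages,   # includes the system prompt in effect at this step
        "action": action,       # e.g. {"tool": "search", "args": {...}}
    }
    with (TRACE_DIR / f"{run_id}.jsonl").open("a") as fh:
        fh.write(json.dumps(record) + "\n")
```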


Opportunities unlocked by agentic AI

| Opportunity | Example today | Why it’s viable now |
| --- | --- | --- |
| One-person MVPs (“vibe coding”) | MenuGen demo: full UI via prompts | LLM handles boilerplate; human supplies vision |
| Autonomous research & data pull | Perplexity Deep Research agents | Cheap in-context search + summarisation |
| Continuous code-base refactoring | Internal GPT-powered linters deleting dead paths at Tesla | Model can reason over millions of LOC |
| Agent-to-agent protocols | DeepWiki, GitIngest | Structured docs allow bots to traverse knowledge (see the llms.txt sketch below) |
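
The last row is easy to prototype yourself. The llms.txt conventions are still informal, so treat the layout below (an H1 title plus sections of markdown links) as a sketch rather than a spec; the URLs and doc names are made up.

```python
# Generate a minimal llms.txt-style index so agents can discover your docs
# without scraping HTML. Illustrative content only.
from pathlib import Path

DOCS = {
    "Quickstart": "https://example.com/docs/quickstart.md",
    "API reference": "https://example.com/docs/api.md",
}

lines = ["# Example Project", "", "## Docs"]
lines += [f"- [{title}]({url})" for title, url in DOCS.items()]
Path("llms.txt").write_text("\n".join(lines) + "\n")
```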

Risks & open questions

  • Reliability debt – Hallucinations trade raw speed for downstream debugging overhead.
  • Platform centralisation – Cloud LLMs resemble 1960s mainframes; open-weight models may rebalance power later.
  • Skill displacement – Traditional “middleware” coding shrinks, while prompt engineering, evaluation, and agent safety grow.
  • Security surface – Agents that read and write code can exfiltrate secrets or mount prompt-injection supply-chain attacks.

Karpathy advocates a “keep AI on the leash” stance—tight scopes, incremental release, human-in-the-loop—mirroring comments he reiterated in press interviews.  
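
One way to express that leash in code is a tool allowlist with a human approval gate for anything destructive. The tool names, the `run_tool` dispatcher, and the `require_approval` callback below are assumptions for the sketch.

```python
# "Keep it on the leash": the agent may propose any action, but only
# allowlisted tools run, and risky ones need a human sign-off. Refusing
# unknown tools also narrows the blast radius of prompt injection.
SAFE_TOOLS = {"read_file", "search_docs", "run_tests"}
GATED_TOOLS = {"write_file", "run_shell", "open_pull_request"}


def execute_action(action: dict, require_approval) -> str:
    tool, args = action["tool"], action["args"]
    if tool in SAFE_TOOLS:
        return run_tool(tool, args)          # hypothetical dispatcher
    if tool in GATED_TOOLS:
        if require_approval(tool, args):     # human in the loop
            return run_tool(tool, args)
        return "rejected by reviewer"
    raise PermissionError(f"tool {tool!r} is not allowlisted")
```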


How to prepare your team

  1. Inventory high-leverage prompts – Identify workflows where English beats code.
  2. Layer tests – Write property-based tests that re-prompt until they pass (see the sketch after this list).
  3. Instrument everything – Token-level logs + embeddings let you analyse failures post-hoc.
  4. Upskill in evaluation – Learn metrics like graded self-consistency (GSC) or LM confidence via entropy.
  5. Pilot “agent-first” modules – Start with side-cars (doc summariser, migration assistant) before core logic.
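
Item 2 in practice: a property-style check with a re-prompt loop. The property, prompt, and `call_llm` helper are illustrative assumptions; the point is that the test asserts an invariant of the output rather than an exact string.

```python
# Check an invariant of the model's output and feed failures back into the
# prompt before giving up. call_llm is the same hypothetical helper as above.
def is_read_only(sql: str) -> bool:
    return "DROP" not in sql.upper() and "DELETE" not in sql.upper()


def test_rewritten_query_stays_read_only():
    prompt = "Rewrite this query to use a CTE: SELECT id, total FROM orders;"
    for attempt in range(3):
        output = call_llm(prompt)            # hypothetical LLM call
        if is_read_only(output):
            break
        prompt += f"\nAttempt {attempt + 1} violated the read-only rule; fix it."
    assert is_read_only(output)
```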

Verdict on the keynote

Strengths

  • Clear, memorable mental model (Software 1.0-2.0-3.0) that frames LLMs as a new compute substrate.
  • Concrete engineering anecdotes (Autopilot diff-slider, MenuGen) that ground the hype.
  • Honest about limitations—“jagged intelligence” demands guardrails.

Gaps

  • Little discussion of on-device LLM inference and its impact on edge privacy.
  • Tooling section glosses over non-code disciplines (design, legal) that will also face agentic disruption.

Overall: A must-watch thesis for technologists. It’s less a product roadmap than a design brief for building robust, human-aligned AI agents—and a wake-up call that the next decade of software will be written as much in English as in Python.
