| Characteristic | Engineering implication |
| --- | --- |
| Fallible "people spirits" with jagged intelligence | Add verify loops & type-checkers around every call |
| Utility / fab / OS triple role: expensive to build, cheap to consume | Treat model labs like cloud vendors; architect for fail-over & rate limits |
| Programmed in natural language | Shift effort from syntax to prompt design & feedback heuristics |
| Partial-autonomy sliders | Design UIs that expose diff views and "confidence knobs" (Cursor, Perplexity, etc.) |
| Agents as first-class users | Ship llms.txt, structured markdown, and self-describing APIs so bots, not just humans, can consume your product |
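One concrete way to become agent-consumable is the proposed llms.txt file: a short, curated markdown map of your documentation served from the site root. A minimal hypothetical example, following the community llms.txt proposal; the product name (borrowed from the MenuGen demo) and all URLs are made up for illustration:

```markdown
# MenuGen
> Generates illustrated restaurant menus from a photo. This file gives
> LLM agents a compact, curated map of the documentation.

## Docs
- [Quickstart](https://menugen.example.com/docs/quickstart.md): create a menu from a single prompt
- [API reference](https://menugen.example.com/docs/api.md): REST endpoints and authentication

## Optional
- [Changelog](https://menugen.example.com/changelog.md): release history
```

The point is curation: an agent gets a few hundred tokens of reliable pointers instead of crawling rendered HTML.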
Five new pillars of Software 3.0 engineering
Prompt-oriented architecture
Store prompts under version control and test them like code.
Guard-railed execution
Wrap every model call in validators (regex, unit tests, type checks).
Tight generate/verify cycles
Latency budgets must include detours for automatic critique or human review.
Agent-friendly infrastructure
Documentation readable by both humans and parsers; think OpenAPI plus llms.txt.
Observability for cognition
Log token streams & system messages; inspect why an agent acted, not just what it returned.
These practices operationalise Karpathy’s warning that LLMs are “people spirits”: creative but error-prone; they demand DevOps-for-cognition, not just DevOps-for-code.
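The guard-railed-execution and generate/verify pillars can be sketched in a few lines. This is a minimal sketch, assuming a hypothetical `call_llm(prompt)` stand-in for whatever model API is in use; the validators and retry budget are illustrative:

```python
import re

MAX_ATTEMPTS = 3

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model API call."""
    raise NotImplementedError

def validate_summary(text: str) -> list[str]:
    """Guard rails: return a list of validation errors (empty means it passed)."""
    errors = []
    if len(text) > 500:
        errors.append("summary exceeds 500 characters")
    if re.search(r"(?i)as an ai language model", text):
        errors.append("contains boilerplate disclaimer")
    return errors

def guarded_generate(prompt: str, llm=call_llm) -> str:
    """Tight generate/verify cycle: re-prompt with the errors until checks pass."""
    current = prompt
    errors: list[str] = []
    for _ in range(MAX_ATTEMPTS):
        output = llm(current)
        errors = validate_summary(output)
        if not errors:
            return output
        # Feed the failures back so the next attempt can self-correct.
        current = (f"{prompt}\n\nYour previous answer failed these checks: "
                   f"{'; '.join(errors)}. Please fix them.")
    raise RuntimeError(f"no valid output after {MAX_ATTEMPTS} attempts: {errors}")
```

In a real system the validators would be the regexes, unit tests, and type checks named above, and the retry budget would count against the latency budget of pillar 3.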
Opportunities unlocked by agentic AI
| Opportunity | Example today | Why it’s viable now |
| --- | --- | --- |
| One-person MVPs (“vibe coding”) | MenuGen demo: full UI via prompts | LLM handles boilerplate; human supplies vision |
| Autonomous research & data pull | Perplexity Deep Research agents | Cheap in-context search + summarisation |
| Continuous code-base refactoring | Internal GPT-powered linters deleting dead paths at Tesla | Model can reason over millions of LOC |
| Agent-to-agent protocols | DeepWiki, GitIngest | Structured docs allow bots to traverse knowledge |
Risks & open questions
Reliability debt – Hallucinations trade raw speed for debugging overhead.
Platform centralisation – Cloud LLMs resemble 1960s mainframes; open-weight models may rebalance power later.
Skill displacement – Traditional “middleware” coding shrinks, but prompt engineering, evaluation and agent-safety grow.
Security surface – Agents that read/write code can exfiltrate secrets or commit “prompt injection” supply-chain attacks.
Karpathy advocates a “keep AI on the leash” stance—tight scopes, incremental release, human-in-the-loop—mirroring comments he reiterated in press interviews.
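To make the security-surface risk concrete, one cheap mitigation is to scan an agent-written patch for credential-shaped strings before it is committed. The sketch below is illustrative, not an exhaustive secret detector; the pattern set and `scan_patch` helper are assumptions of this example:

```python
import re

# Credential-shaped patterns (illustrative, not exhaustive).
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    "hardcoded_token": re.compile(
        r"(?i)\b(?:api[_-]?key|token|secret)\s*[:=]\s*['\"][^'\"]{12,}['\"]"
    ),
}

def scan_patch(diff_text: str) -> list[tuple[str, str]]:
    """Return (pattern_name, offending_line) pairs for added lines in a unified diff."""
    findings = []
    for line in diff_text.splitlines():
        if not line.startswith("+") or line.startswith("+++"):
            continue  # only inspect lines the agent is adding
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((name, line))
    return findings
```

A hook like this belongs in CI alongside the prompt-injection review of anything the agent reads, not as a replacement for it.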
How to prepare your team
Inventory high-leverage prompts – Identify workflows where English beats code.
Layer tests – Write property-based tests that re-prompt until they pass.
Instrument everything – Token-level logs + embeddings let you analyse failures post-hoc.
Upskill in evaluation – Learn metrics like graded self-consistency (GSC) or LM-confidence via entropy.
Pilot “agent-first” modules – Start with side-cars (doc summariser, migration assistant) before core logic.
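Points 3 and 4 can be sketched together. Assuming the model API exposes per-token log-probabilities (many do, though response shapes vary), mean token entropy gives a cheap confidence signal, and appending each exchange to a JSONL log enables the post-hoc failure analysis described above. The helpers here are illustrative:

```python
import json
import math
import time

def token_entropy(top_logprobs: dict[str, float]) -> float:
    """Shannon entropy (nats) of one token's candidate distribution.

    `top_logprobs` maps candidate tokens to log-probabilities; we renormalise
    because APIs usually return only the top-k candidates.
    """
    probs = [math.exp(lp) for lp in top_logprobs.values()]
    total = sum(probs)
    probs = [p / total for p in probs]
    return -sum(p * math.log(p) for p in probs if p > 0)

def confidence(logprobs_per_token: list[dict[str, float]]) -> float:
    """Mean per-token entropy: lower means the model was more certain."""
    if not logprobs_per_token:
        return float("inf")
    return sum(token_entropy(t) for t in logprobs_per_token) / len(logprobs_per_token)

def log_exchange(path: str, prompt: str, output: str, conf: float) -> None:
    """Append one JSONL record so failures can be analysed post-hoc."""
    record = {"ts": time.time(), "prompt": prompt, "output": output, "entropy": conf}
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```

Thresholding on the entropy score is a pragmatic first cut at “confidence knobs”; graded self-consistency needs multiple samples and is a natural second step.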
Verdict on the keynote
Strengths
Clear, memorable mental model (Software 1.0-2.0-3.0) that frames LLMs as a new compute substrate.
Concrete engineering anecdotes (Autopilot diff-slider, MenuGen) that ground the hype.
Honest about limitations—“jagged intelligence” demands guardrails.
Gaps
Little discussion of on-device LLM inference and its impact on edge privacy.
Tooling section glosses over non-code disciplines (design, legal) that will also face agentic disruption.
Overall: A must-watch thesis for technologists. It’s less a product roadmap than a design brief for building robust, human-aligned AI agents—and a wake-up call that the next decade of software will be written as much in English as in Python.
Andrej Karpathy’s Take on Agentic AI and Its Impact on Software Engineering
Keynote from Y Combinator AI Startup School • San Francisco • 17 June 2025
Why this talk matters
In “Software Is Changing (Again)” Andrej Karpathy argues that large-language-model agents mark a third great epoch of software:
Karpathy’s core claim: “The hottest new programming language is English.”
What “agentic AI” means in practice