Andrej Karpathy’s Take on Agentic AI and Its Impact on Software Engineering

Keynote from Y Combinator AI Startup School • San Francisco • 17 June 2025


Why this talk matters

In “Software Is Changing (Again)” Andrej Karpathy argues that large-language-model agents mark a third great epoch of software:

| Wave | Interface | What engineers write | Compute target |
| --- | --- | --- | --- |
| 1.0 | Compilers / CPUs | Imperative code (C++, Java) | Deterministic machines |
| 2.0 | ML frameworks | Data + model architecture | Neural nets |
| 3.0 | LLMs-as-OS | Prompts & feedback loops | Stochastic “people spirits” running in the cloud |

Karpathy’s core claim: The hottest new programming language is English.  


What “agentic AI” means in practice

| Property of LLM agents | Engineering consequence |
| --- | --- |
| Jagged intelligence – super-human recall, sub-human arithmetic | Add verify loops & type-checkers around every call (see the sketch below) |
| Utility / fab / OS triple role – expensive to build, cheap to consume | Treat model labs like cloud vendors; architect for fail-over & rate limits |
| Programmed in natural language | Shift effort from syntax to prompt design & feedback heuristics |
| Partial-autonomy sliders | Design UIs that expose diff views and “confidence knobs” (Cursor, Perplexity, etc.) |
| Agents as first-class users | Ship llms.txt, structured markdown, and self-describing APIs so bots, not just humans, can consume your product |
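
The first row is the most immediately actionable: because model output is only statistically reliable, every call deserves a verification wrapper. Below is a minimal sketch, assuming a hypothetical `call_llm(prompt) -> str` helper (any real client would slot in the same way) and using pydantic for the type check; the `Invoice` schema and field names are illustrative.

```python
# Minimal "verify loop" around a model call: validate, and on failure feed the
# error back into the prompt and retry. call_llm is a hypothetical helper.
from pydantic import BaseModel, ValidationError


class Invoice(BaseModel):
    vendor: str
    total_cents: int  # the type check catches outputs like "42.50 USD"


def extract_invoice(text: str, max_attempts: int = 3) -> Invoice:
    prompt = f"Return JSON with keys vendor (string) and total_cents (integer):\n{text}"
    last_error = None
    for _ in range(max_attempts):
        raw = call_llm(prompt)                        # hypothetical LLM call
        try:
            return Invoice.model_validate_json(raw)   # pydantic v2 validation
        except ValidationError as err:
            last_error = err
            prompt += f"\nYour previous answer failed validation: {err}. Try again."
    raise RuntimeError(f"Model never produced valid JSON: {last_error}")
```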

Five new pillars of Software 3.0 engineering

  1. Prompt-oriented architecture
    • Store prompts under version control and test them like code (see the sketch after this list).
  2. Guard-railed execution
    • Wrap every model call in validators (regex, unit tests, type checks).
  3. Tight generate/verify cycles
    • Latency budgets must include detours for automatic critique or human review.
  4. Agent-friendly infrastructure
    • Documentation readable by both humans and parsers; think OpenAPI plus llms.txt.
  5. Observability for cognition
    • Log token streams & system messages; inspect why an agent acted, not just what it returned.
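
To make pillar 1 concrete, here is a sketch of a prompt that lives in the repository and is exercised by an ordinary test. The `prompts/summarise.txt` path and `call_llm` helper are assumptions for illustration; in practice you might also record and replay model outputs to keep the test deterministic.

```python
# Pillar 1 sketch: the prompt template is a versioned file, and editing it is a
# code change that must keep passing the same checks.
from pathlib import Path

PROMPT_PATH = Path("prompts/summarise.txt")   # checked into version control


def build_prompt(document: str) -> str:
    template = PROMPT_PATH.read_text()
    return template.replace("{{document}}", document)


def test_summary_fits_length_budget():
    prompt = build_prompt("The quarterly report shows revenue grew 12%...")
    summary = call_llm(prompt)                # hypothetical LLM call
    assert len(summary.split()) <= 50
    assert "revenue" in summary.lower()
```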

These practices operationalise Karpathy’s warning that LLMs are “people spirits”: creative but error-prone. They demand DevOps-for-cognition, not just DevOps-for-code.
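
Pillar 5 is the part teams most often skip. A minimal sketch of “observability for cognition”: persist the full message history and the action the agent chose, not just its final answer, so failures can be inspected after the fact. The trace layout and field names below are assumptions, not a standard.

```python
# Append one reasoning step (the messages the agent actually saw plus the tool
# call it chose) to a JSONL trace for post-hoc inspection.
import json
import time
from pathlib import Path

TRACE_DIR = Path("traces")
TRACE_DIR.mkdir(exist_ok=True)


def log_agent_step(run_id: str, messages: list[dict], action: dict) -> None:
    record = {
        "ts": time.time(),
        "run_id": run_id,
        "messages": messages,   # includes the system prompt in effect at this step
        "action": action,       # e.g. {"tool": "search", "args": {...}}
    }
    with (TRACE_DIR / f"{run_id}.jsonl").open("a") as fh:
        fh.write(json.dumps(record) + "\n")
```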


Opportunities unlocked by agentic AI

| Opportunity | Example today | Why it’s viable now |
| --- | --- | --- |
| One-person MVPs (“vibe coding”) | MenuGen demo: full UI via prompts | LLM handles boilerplate; human supplies vision |
| Autonomous research & data pull | Perplexity Deep Research agents | Cheap in-context search + summarisation |
| Continuous code-base refactoring | Internal GPT-powered linters deleting dead paths at Tesla | Model can reason over millions of LOC |
| Agent-to-agent protocols | DeepWiki, GitIngest | Structured docs allow bots to traverse knowledge (see the llms.txt sketch below) |
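
The last row is easy to prototype yourself. The llms.txt conventions are still informal, so treat the layout below (an H1 title plus sections of markdown links) as a sketch rather than a spec; the URLs and doc names are made up.

```python
# Generate a minimal llms.txt-style index so agents can discover your docs
# without scraping HTML. Illustrative content only.
from pathlib import Path

DOCS = {
    "Quickstart": "https://example.com/docs/quickstart.md",
    "API reference": "https://example.com/docs/api.md",
}

lines = ["# Example Project", "", "## Docs"]
lines += [f"- [{title}]({url})" for title, url in DOCS.items()]
Path("llms.txt").write_text("\n".join(lines) + "\n")
```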

Risks & open questions

  • Reliability debt – Hallucinations trade raw speed for downstream debugging overhead.
  • Platform centralisation – Cloud LLMs resemble 1960s mainframes; open-weight models may rebalance power later.
  • Skill displacement – Traditional “middleware” coding shrinks, while prompt engineering, evaluation, and agent safety grow.
  • Security surface – Agents that read and write code can exfiltrate secrets or mount prompt-injection supply-chain attacks.

Karpathy advocates a “keep AI on the leash” stance—tight scopes, incremental release, human-in-the-loop—mirroring comments he reiterated in press interviews.  
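
One way to express that leash in code is a tool allowlist with a human approval gate for anything destructive. The tool names, the `run_tool` dispatcher, and the `require_approval` callback below are assumptions for the sketch.

```python
# "Keep it on the leash": the agent may propose any action, but only
# allowlisted tools run, and risky ones need a human sign-off. Refusing
# unknown tools also narrows the blast radius of prompt injection.
SAFE_TOOLS = {"read_file", "search_docs", "run_tests"}
GATED_TOOLS = {"write_file", "run_shell", "open_pull_request"}


def execute_action(action: dict, require_approval) -> str:
    tool, args = action["tool"], action["args"]
    if tool in SAFE_TOOLS:
        return run_tool(tool, args)          # hypothetical dispatcher
    if tool in GATED_TOOLS:
        if require_approval(tool, args):     # human in the loop
            return run_tool(tool, args)
        return "rejected by reviewer"
    raise PermissionError(f"tool {tool!r} is not allowlisted")
```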


How to prepare your team

  1. Inventory high-leverage prompts – Identify workflows where English beats code.
  2. Layer tests – Write property-based tests that re-prompt until they pass (see the sketch after this list).
  3. Instrument everything – Token-level logs + embeddings let you analyse failures post-hoc.
  4. Upskill in evaluation – Learn metrics like graded self-consistency (GSC) or LM confidence via entropy.
  5. Pilot “agent-first” modules – Start with side-cars (doc summariser, migration assistant) before core logic.
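
Item 2 in practice: a property-style check with a re-prompt loop. The property, prompt, and `call_llm` helper are illustrative assumptions; the point is that the test asserts an invariant of the output rather than an exact string.

```python
# Check an invariant of the model's output and feed failures back into the
# prompt before giving up. call_llm is the same hypothetical helper as above.
def is_read_only(sql: str) -> bool:
    return "DROP" not in sql.upper() and "DELETE" not in sql.upper()


def test_rewritten_query_stays_read_only():
    prompt = "Rewrite this query to use a CTE: SELECT id, total FROM orders;"
    for attempt in range(3):
        output = call_llm(prompt)            # hypothetical LLM call
        if is_read_only(output):
            break
        prompt += f"\nAttempt {attempt + 1} violated the read-only rule; fix it."
    assert is_read_only(output)
```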

Verdict on the keynote

Strengths

  • Clear, memorable mental model (Software 1.0-2.0-3.0) that frames LLMs as a new compute substrate.
  • Concrete engineering anecdotes (Autopilot diff-slider, MenuGen) that ground the hype.
  • Honest about limitations—“jagged intelligence” demands guardrails.

Gaps

  • Little discussion of on-device LLM inference and its impact on edge privacy.
  • Tooling section glosses over non-code disciplines (design, legal) that will also face agentic disruption.

Overall: A must-watch thesis for technologists. It’s less a product roadmap than a design brief for building robust, human-aligned AI agents—and a wake-up call that the next decade of software will be written as much in English as in Python.
