Anthropic Says AI Is Building AI — Humans May Be the Bottleneck

Highlights

Anthropic reports that Claude now writes over 80% of the code merged into the company’s repositories and that engineer output has increased roughly eightfold since 2024. The company argues that AI is already contributing substantially to developing future AI systems through coding, experiment execution, and research support. Anthropic warns that, while recursive self-improvement is not guaranteed, continued trends could allow AI to design its own successors, shifting human roles toward oversight and validation.

Sentiment Analysis

The overall tone of the article is cautiously optimistic with elements of concern. It highlights strong progress — increased productivity and deeper agentic capabilities — while underscoring uncertainty and risk around autonomous AI-driven development. The narrative balances enthusiasm for rapid technical gains with sober reminders that recursive self-improvement is not inevitable and that human judgment remains essential.

The positive framing rests on concrete metrics (Claude's share of merged code and the growth in engineer output). Yet the piece also carries a cautionary note about oversight, governance, and the limits of current AI research judgment. The result is a mixed but positive-leaning sentiment: recognition of clear progress tempered by responsible warnings.

Article Text

Anthropic reports that its Claude family of models has become a central contributor to the company’s software development process, now producing more than 80% of the code merged into its main codebase. According to the company’s analysis, the introduction of Claude Code and the model’s ability to run code rather than merely propose snippets have coincided with a sharp rise in productivity: engineers are shipping roughly eight times more merged code than before 2025. This shift is visible in per-engineer output, which remained steady during the company’s early years but increased markedly after Claude began executing code and automating parts of the development workflow.
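
To make the two headline figures easier to read together, here is a minimal arithmetic sketch. It assumes, purely for illustration, that the 80% share and the eightfold multiplier describe the same codebase, the same unit of merged code, and the same period; the article does not confirm this. The function name `split_output` is hypothetical and is not something Anthropic published.

```python
def split_output(multiplier: float, ai_share: float) -> tuple[float, float]:
    """Split a per-engineer output multiplier into AI- and human-written parts.

    Illustrative assumption: `multiplier` and `ai_share` are measured on the
    same codebase and in the same unit, which the article does not state.
    """
    if not 0.0 <= ai_share <= 1.0:
        raise ValueError("ai_share must be between 0 and 1")
    ai_part = multiplier * ai_share            # portion attributed to the model
    human_part = multiplier * (1.0 - ai_share)  # portion still written by people
    return ai_part, human_part


# Figures taken from the article: roughly 8x output, over 80% written by Claude.
ai_part, human_part = split_output(multiplier=8.0, ai_share=0.80)
print(f"AI-written: {ai_part:.1f}x baseline, human-written: {human_part:.1f}x baseline")
```

Under those assumptions, the human-written portion would be at most about 1.6 times the earlier per-engineer baseline, with the rest of the increase attributed to code written by Claude. Because the reported share is "more than 80%", the human figure is an upper bound rather than an estimate.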

Anthropic frames these changes as part of a broader transformation in how AI systems participate in research and engineering activities. Beyond generating code, Claude is described as helping run experiments, triage issues, and assist with research tasks — functions that collectively speed iteration and lower manual friction. The company suggests that when AI systems take on more of the routine and exploratory work in development cycles, the remaining human role shifts toward oversight: validating results, verifying safety, and setting high-level direction for research agendas.

Importantly, Anthropic cautions that current metrics like lines of code are imperfect proxies for productivity or scientific progress. Code quantity does not directly equate to long-term quality, design insight, or the capacity to select genuinely valuable research directions. The company explicitly states that while trends point toward greater automation, recursive self-improvement — the idea that an AI could autonomously design and build its own successor — is not yet a given. It remains uncertain whether models possess the research judgment to choose the right problems or reliably pursue productive lines of inquiry without human guidance.

Still, the trajectory raises plausible scenarios where AI systems shoulder increasing responsibility for their own advancement. Anthropic outlines multiple possible futures: a slowdown in progress, continued human-led oversight with extensive automation of routine tasks, or a more transformative outcome in which systems gradually develop capabilities that let them autonomously design improved successors. The company does not claim inevitability for the last outcome but warns that it could arrive sooner than many organizations expect, particularly if compute and algorithmic improvements continue apace.

The report arrives amid industry-wide shifts in how models are positioned. Companies are marketing advanced models as collaborators and agents rather than simple conversational tools. Anthropic itself has continued to iterate on Claude, releasing successive versions aimed at stronger coding, reasoning, and agentic performance, while other firms have launched their own generative and agentic offerings. These developments have fueled conversation about the proper balance between automation and human oversight in research and product development.

Operationally, Anthropic anticipates that human involvement will evolve toward oversight, validation, and verification of an expanding "virtual lab" where AI systems execute and test ideas at scale. The company highlights potential spillover benefits: systems that automate AI research tasks might transfer those skills to other scientific domains, accelerating progress outside machine learning. At the same time, Anthropic stresses the need for careful governance, safety review, and attention to alignment, since increased autonomy could amplify both benefits and risks.

In sum, Anthropic’s report documents measurable changes in development workflows driven by capable models like Claude and explores the implications of those changes. While the evidence points to significant productivity gains and deeper integration of AI into research and engineering, the company underscores that important capability gaps and governance challenges remain. Crucially, Claude’s research judgment remains unproven, so humans are still essential for setting priorities and ensuring the soundness of scientific direction. The future may hold more autonomous AI-driven development, but it will require careful management, robust oversight, and ongoing evaluation of what automated systems can and should be allowed to do.

Key Insights Table

Aspect	Description
Code Contribution	Claude authors over 80% of merged code, reflecting deep integration into development workflows.
Productivity Change	Engineers are merging roughly eight times more code than in 2024, linked to Claude executing code and automating tasks.
Potential Trajectories	Outcomes range from slowed progress to human-supervised automation to possible recursive self-improvement, though the last is not guaranteed.
Human Role	Expected to shift toward oversight, validation, and verification of AI-run experiments and virtual labs.
Uncertainties	Key unknowns include models' research judgment, long-term quality of automated outputs, and governance needs for increased autonomy.

Last edited at: 2026/6/5