AI Forecast Tracker

Your Company's Next Senior Engineer Won't Be Human

Will AI autonomously ship production features — from spec to deployment — without human code review at major tech companies by end of 2027?

If you're a software engineer, this determines whether AI is your best tool or your replacement.

Target: Dec 2027 (664 days until resolution)
Assessed Probability
55%
More likely than not
Based on 5 expert predictions, 4 evidence items

Assessment Rationale

Boris Cherny, Head of Claude Code at Anthropic, ships 22-27 PRs per day with 100% AI-written code. Cursor built a $2B ARR business with 12 employees. SWE-bench went from 4% to 80% in 26 months. The acceleration since Opus 4.5 (November 2025) has been qualitatively different: not just faster code generation, but understanding codebases, making architectural decisions, and running multi-step agent workflows.

The power-law productivity distribution is key: the top 10% of engineers using AI are 10x more productive, and that gap widens every month. The remaining question isn't whether AI can ship features (it demonstrably can, for the best practitioners) but whether companies will trust AI to ship without human review. The answer is increasingly yes: internal tools and non-critical paths first, then expanding outward. Dev freelance rates have already dropped 36%, from $75/hr to $48/hr; the market is pricing in this shift.

Scenarios

Current value: 46% of code AI-generated (Copilot active users); SWE-bench at 80%; Boris Cherny at 100% AI code, 22-27 PRs/day; Cursor at $2B ARR with 12 employees

S-curve position: Steep mid-curve — code generation nearly solved, autonomous shipping emerging rapidly
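
To make "steep mid-curve" concrete, here is a rough logistic fit to the SWE-bench trajectory cited in the evidence below (4% to 80% in 26 months). Two data points cannot pin down an S-curve, so treat the fitted rate and midpoint as illustrative; the function names are ours:

```python
import math

def logit(p: float) -> float:
    return math.log(p / (1 - p))

# Anchor points from the evidence trail: SWE-bench ~4% at month 0, ~80% at month 26.
p0, p1, months = 0.04, 0.80, 26

# Fit the logistic p(t) = 1 / (1 + exp(-k * (t - t0))) through both points.
k = (logit(p1) - logit(p0)) / months   # growth rate, ~0.18 per month
t0 = -logit(p0) / k                    # curve midpoint, ~month 18

def swe_bench_projection(t: float) -> float:
    return 1 / (1 + math.exp(-k * (t - t0)))

# The slope k * p * (1 - p) peaks at p = 0.5; at p = 0.8 it is still 64% of
# peak, i.e. past the midpoint but still climbing fast.
print(f"midpoint: month {t0:.0f}; projection at month 32: {swe_bench_projection(32):.0%}")
```

On this fit, benchmark code generation is already past its midpoint, consistent with "nearly solved" above, while autonomous shipping would sit earlier on its own, separate curve.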

Bear Case

All production code still requires human review (liability, quality, cultural resistance)

Base Case

Routine and internal features shipped autonomously at 5+ major tech companies; critical features still human-reviewed

Bull Case

Autonomous shipping is standard practice by Q3 2027 (agent frameworks mature, CI/CD integration lands, liability frameworks emerge)

How We'll Know

What we measure
Whether AI systems autonomously ship production features (spec → code → test → deploy) without human code review at companies with 1000+ engineers
Confirmed if
At least 3 major tech companies publicly confirm AI autonomously ships production features without human code review as standard practice
Refuted if
All major AI coding tools still require human review for production deployment by end 2027
Data sources
  • GitHub Copilot / Agent HQ metrics
  • Company engineering blog posts
  • SWE-bench / METR evaluations
  • Developer surveys (Stack Overflow, JetBrains)
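
The "Confirmed if" clause is mechanical enough to encode directly. Below is a minimal sketch of the resolution check; the record schema and all sample data are hypothetical illustrations, not tracker internals:

```python
from dataclasses import dataclass

@dataclass
class Confirmation:
    company: str
    engineers: int           # company size; the question requires 1000+
    standard_practice: bool  # autonomous shipping is standard, not a pilot
    no_human_review: bool    # no human code review in the loop

def resolves_yes(confirmations: list[Confirmation], threshold: int = 3) -> bool:
    """YES if at least `threshold` companies with 1000+ engineers publicly
    confirm AI ships production features without review as standard practice."""
    qualifying = [
        c for c in confirmations
        if c.engineers >= 1000 and c.standard_practice and c.no_human_review
    ]
    return len(qualifying) >= threshold

# Hypothetical example: two qualifying confirmations are not enough.
sample = [
    Confirmation("ExampleCorp", 4000, True, True),
    Confirmation("DemoSoft", 2500, True, True),
    Confirmation("PilotOnly Inc", 9000, False, True),  # still a pilot program
]
print(resolves_yes(sample))  # False: only 2 of the required 3 qualify
```

Encoding the criteria this way makes the ambiguous cases (pilots, partial rollouts, sub-1000-engineer shops) explicit fields rather than judgment calls at resolution time.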

Evidence Trail

Evidence For

  • Mar 7, 2026

    GitHub Copilot writes 46% of code for active users. Claude Sonnet 4.6 at 80.8% SWE-bench Verified. METR task horizon at 14.5 hours. 57% of enterprises running multi-step agent workflows. GitHub Agent HQ runs multiple AI models on the same codebase.
    → Probability: 35%

  • Mar 7, 2026

    Boris Cherny: 100% AI code, 22-27 PRs/day (practitioner proof). Cursor: $2B ARR with 12 employees (market proof). SWE-bench: 4% → 80% in 26 months (benchmark proof). Dev freelance rates: down 36% (price proof). Power law: the top 10% of engineers are 10x more productive with AI. Solo founders are building complex products. Inference costs falling ~200x/year make AI coding nearly free.
    → Probability: 55%

Evidence Against

  • Mar 7, 2026

    SWE-bench measures isolated tasks, not production complexity. Enterprise legacy codebases resist AI. Code review exists for liability and quality, not just correctness. Architectural decisions, cross-team coordination remain human. No major company has publicly shipped production features without human review yet.

  • Mar 8, 2026

    Benchmark-to-production gap remains large: Opus 4.5 scores 80% on SWE-bench Verified but only 18% on private codebases (SWE-bench Pro). Sonar (Feb 2026): Opus 4.6 has 21% MORE issue density and 55% MORE vulnerability density than Opus 4.5 — smarter model ≠ safer code. Code review bottleneck: 21% more tasks completed but review times up 91% (Faros AI, Jan 2026). Cursor BugBot: only 35% of AI-generated fixes merge unmodified. Technical debt accumulating: 14 different DB connection patterns in one AI-generated codebase, code duplication up 4x. The quality gap has shifted from 'can it write code' to 'can you trust the code it writes.'
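
One way to see how these dated evidence items move the headline number: assessments compose additively in log-odds space. The sketch below reproduces the jump from the 35% assessment to the 55% assessment with a single Bayes factor; the factor's value is our illustration, not a weight the tracker publishes:

```python
import math

def prob_to_logodds(p: float) -> float:
    return math.log(p / (1 - p))

def logodds_to_prob(lo: float) -> float:
    return 1 / (1 + math.exp(-lo))

# Earlier assessment, after the first evidence bundle.
prior = 0.35

# Illustrative Bayes factor for the second bundle (practitioner, market,
# benchmark, and price signals all pointing the same direction).
bayes_factor = 2.3

posterior = logodds_to_prob(prob_to_logodds(prior) + math.log(bayes_factor))
print(f"{posterior:.0%}")  # 55%
```

Evidence-against items work the same way with Bayes factors below 1, which is why the headline sits at 55% rather than higher despite the strong entries in the "for" column.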

What Experts Say

Dario Amodei

CEO, Anthropic

Track record: 8/10
AI models will handle most aspects of software engineering tasks from start to finish within 6-12 months
Jan 2026 | interview
We assess this claim as 65% likely

Andrej Karpathy

AI Researcher, former Tesla AI Director, educator

Track record: 8/10
Agentic engineering (AI agents writing 99% of code, humans as oversight) becomes the default professional workflow
Feb 2026 | blog
We assess this claim as 50% (even odds)

Mustafa Suleyman

CEO of Microsoft AI

Track record: 6/10
Most professional tasks involving sitting at a computer will be fully automated by AI within 12-18 months
Feb 2026 | interview
We assess this claim as 5% very unlikely

Boris Cherny

Head of Claude Code, Anthropic

Track record: 8/10
AI can already write 100% of production code; top engineers using AI are 10x more productive
Feb 2026 | interview
We assess this claim as 70% likely

Cursor (Anysphere)

AI Code Editor ($2B ARR, 12 employees)

Track record: 8/10
AI-native companies can achieve billion-dollar revenue with teams of <20 people
Feb 2026 | product
We assess this claim as 95% near certain
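
The headline 55% is described above as resting on these five expert predictions plus the evidence items. One plausible aggregation, shown purely as our illustration (the tracker does not document its method), is a track-record-weighted average of the per-claim assessments in log-odds space:

```python
import math

def logit(p: float) -> float:
    return math.log(p / (1 - p))

def sigmoid(x: float) -> float:
    return 1 / (1 + math.exp(-x))

# (assessed probability, track record out of 10) for each expert claim above.
claims = [
    (0.65, 8),  # Dario Amodei
    (0.50, 8),  # Andrej Karpathy
    (0.05, 6),  # Mustafa Suleyman
    (0.70, 8),  # Boris Cherny
    (0.95, 8),  # Cursor (Anysphere)
]

# Track-record-weighted mean in log-odds space, mapped back to a probability.
total_weight = sum(w for _, w in claims)
pooled = sigmoid(sum(w * logit(p) for p, w in claims) / total_weight)
print(f"{pooled:.0%}")  # ~61%
```

This naive pool lands near 61%, a bit above the headline 55%, consistent with the evidence-against items pulling the final number down. It is also a simplification: the five claims are not the same proposition (Cursor's is about team size, not autonomous shipping), so direct pooling overstates their agreement.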

What Could Go Wrong

Liability and trust prevent 'no human review' even when AI is technically capable. Companies continue requiring human sign-off for legal and cultural reasons. The gap between 'AI can write the code' and 'we trust AI to ship the code' proves wider than expected. The Sonar finding — Opus 4.6 writes buggier code than Opus 4.5 despite being smarter — suggests the quality problem is structural, not just a matter of model capability.
