AI Forecast Tracker

Your Company's Next Senior Engineer Won't Be Human

Will AI autonomously ship production features — from spec to deployment — without human code review at major tech companies by end of 2027?

If you're a software engineer, this determines whether AI is your best tool or your replacement.

Target: Dec 2027 (664 days until resolution)
Assessed Probability
55%
More likely than not
Based on 5 expert predictions, 4 evidence items

Assessment Rationale

Boris Cherny, Head of Claude Code at Anthropic, ships 22-27 PRs per day with 100% AI-written code. Cursor built a $2B ARR business with 12 employees. SWE-bench went from 4% to 80% in 26 months. The acceleration since Opus 4.5 (November 2025) has been qualitatively different: not just faster code generation, but understanding codebases, making architectural decisions, and running multi-step agent workflows.

The power-law productivity distribution is key: the top 10% of engineers using AI are 10x more productive, and that gap widens every month. The remaining question isn't whether AI can ship features (it demonstrably can, for the best practitioners) but whether companies will trust AI to ship without human review. The answer is increasingly yes: internal tools and non-critical paths first, then expanding outward. Dev freelance rates have already dropped 36%, from $75/hr to $48/hr; the market is pricing in this shift.

Scenarios

Current value: 46% of code AI-generated (Copilot active users); SWE-bench at 80%; Boris Cherny at 100% AI code, 22-27 PRs/day; Cursor at $2B ARR with 12 employees

S-curve position: Steep mid-curve — code generation nearly solved, autonomous shipping emerging rapidly
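
To make "steep mid-curve" concrete, here is a rough logistic fit to the SWE-bench trajectory cited in the evidence below (4% to 80% in 26 months). Two data points cannot pin down an S-curve, so treat the fitted rate and midpoint as illustrative; the function names are ours:

```python
import math

def logit(p: float) -> float:
    return math.log(p / (1 - p))

# Anchor points from the evidence trail: SWE-bench ~4% at month 0, ~80% at month 26.
p0, p1, months = 0.04, 0.80, 26

# Fit the logistic p(t) = 1 / (1 + exp(-k * (t - t0))) through both points.
k = (logit(p1) - logit(p0)) / months   # growth rate, ~0.18 per month
t0 = -logit(p0) / k                    # curve midpoint, ~month 18

def swe_bench_projection(t: float) -> float:
    return 1 / (1 + math.exp(-k * (t - t0)))

# The slope k * p * (1 - p) peaks at p = 0.5; at p = 0.8 it is still 64% of
# peak, i.e. past the midpoint but still climbing fast.
print(f"midpoint: month {t0:.0f}; projection at month 32: {swe_bench_projection(32):.0%}")
```

On this fit, benchmark code generation is already past its midpoint, consistent with "nearly solved" above, while autonomous shipping would sit earlier on its own, separate curve.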

Bear Case

All production code still requires human review (liability, quality, cultural resistance)

Base Case

Routine and internal features shipped autonomously at 5+ major tech companies; critical features still human-reviewed

Bull Case

Autonomous shipping is standard practice by Q3 2027 (agent frameworks mature, CI/CD integration lands, liability frameworks emerge)

How We'll Know

What we measure
Whether AI systems autonomously ship production features (spec → code → test → deploy) without human code review at companies with 1000+ engineers
Confirmed if
At least 3 major tech companies publicly confirm AI autonomously ships production features without human code review as standard practice
Refuted if
All major AI coding tools still require human review for production deployment by end 2027
Data sources
  • GitHub Copilot / Agent HQ metrics
  • Company engineering blog posts
  • SWE-bench / METR evaluations
  • Developer surveys (Stack Overflow, JetBrains)
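
The "Confirmed if" clause is mechanical enough to encode directly. Below is a minimal sketch of the resolution check; the record schema and all sample data are hypothetical illustrations, not tracker internals:

```python
from dataclasses import dataclass

@dataclass
class Confirmation:
    company: str
    engineers: int           # company size; the question requires 1000+
    standard_practice: bool  # autonomous shipping is standard, not a pilot
    no_human_review: bool    # no human code review in the loop

def resolves_yes(confirmations: list[Confirmation], threshold: int = 3) -> bool:
    """YES if at least `threshold` companies with 1000+ engineers publicly
    confirm AI ships production features without review as standard practice."""
    qualifying = [
        c for c in confirmations
        if c.engineers >= 1000 and c.standard_practice and c.no_human_review
    ]
    return len(qualifying) >= threshold

# Hypothetical example: two qualifying confirmations are not enough.
sample = [
    Confirmation("ExampleCorp", 4000, True, True),
    Confirmation("DemoSoft", 2500, True, True),
    Confirmation("PilotOnly Inc", 9000, False, True),  # still a pilot program
]
print(resolves_yes(sample))  # False: only 2 of the required 3 qualify
```

Encoding the criteria this way makes the ambiguous cases (pilots, partial rollouts, sub-1000-engineer shops) explicit fields rather than judgment calls at resolution time.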

Evidence Trail

Evidence For

  • Mar 7, 2026

    GitHub Copilot writes 46% of code for active users. Claude Sonnet 4.6 at 80.8% SWE-bench Verified. METR task horizon at 14.5 hours. 57% of enterprises running multi-step agent workflows. GitHub Agent HQ runs multiple AI models on the same codebase.
    → Probability: 35%

  • Mar 7, 2026

    Boris Cherny: 100% AI code, 22-27 PRs/day (practitioner proof). Cursor: $2B ARR with 12 employees (market proof). SWE-bench: 4% → 80% in 26 months (benchmark proof). Dev freelance rates: down 36% (price proof). Power law: the top 10% of engineers are 10x more productive with AI. Solo founders are building complex products. Inference costs falling ~200x/year make AI coding nearly free.
    → Probability: 55%

Evidence Against

  • Mar 7, 2026

    SWE-bench measures isolated tasks, not production complexity. Enterprise legacy codebases resist AI. Code review exists for liability and quality, not just correctness. Architectural decisions, cross-team coordination remain human. No major company has publicly shipped production features without human review yet.

  • Mar 8, 2026

    Benchmark-to-production gap remains large: Opus 4.5 scores 80% on SWE-bench Verified but only 18% on private codebases (SWE-bench Pro). Sonar (Feb 2026): Opus 4.6 has 21% MORE issue density and 55% MORE vulnerability density than Opus 4.5 — smarter model ≠ safer code. Code review bottleneck: 21% more tasks completed but review times up 91% (Faros AI, Jan 2026). Cursor BugBot: only 35% of AI-generated fixes merge unmodified. Technical debt accumulating: 14 different DB connection patterns in one AI-generated codebase, code duplication up 4x. The quality gap has shifted from 'can it write code' to 'can you trust the code it writes.'
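
One way to see how these dated evidence items move the headline number: assessments compose additively in log-odds space. The sketch below reproduces the jump from the 35% assessment to the 55% assessment with a single Bayes factor; the factor's value is our illustration, not a weight the tracker publishes:

```python
import math

def prob_to_logodds(p: float) -> float:
    return math.log(p / (1 - p))

def logodds_to_prob(lo: float) -> float:
    return 1 / (1 + math.exp(-lo))

# Earlier assessment, after the first evidence bundle.
prior = 0.35

# Illustrative Bayes factor for the second bundle (practitioner, market,
# benchmark, and price signals all pointing the same direction).
bayes_factor = 2.3

posterior = logodds_to_prob(prob_to_logodds(prior) + math.log(bayes_factor))
print(f"{posterior:.0%}")  # 55%
```

Evidence-against items work the same way with Bayes factors below 1, which is why the headline sits at 55% rather than higher despite the strong entries in the "for" column.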

What Experts Say

Dario Amodei

CEO, Anthropic

Track record: 8/10
AI models will handle most aspects of software engineering tasks from start to finish within 6-12 months
Jan 2026 | interview
We assess this claim as 65% likely

Andrej Karpathy

AI Researcher, former Tesla AI Director, educator

Track record: 8/10
Agentic engineering (AI agents writing 99% of code, humans as oversight) becomes the default professional workflow
Feb 2026 | blog
We assess this claim as 50% (even odds)

Mustafa Suleyman

CEO of Microsoft AI

Track record: 6/10
Most professional tasks involving sitting at a computer will be fully automated by AI within 12-18 months
Feb 2026 | interview
We assess this claim as 5% very unlikely

Boris Cherny

Head of Claude Code, Anthropic

Track record: 8/10
AI can already write 100% of production code; top engineers using AI are 10x more productive
Feb 2026 | interview
We assess this claim as 70% likely

Cursor (Anysphere)

AI Code Editor ($2B ARR, 12 employees)

Track record: 8/10
AI-native companies can achieve billion-dollar revenue with teams of <20 people
Feb 2026 | product
We assess this claim as 95% near certain
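
The headline 55% is described above as resting on these five expert predictions plus the evidence items. One plausible aggregation, shown purely as our illustration (the tracker does not document its method), is a track-record-weighted average of the per-claim assessments in log-odds space:

```python
import math

def logit(p: float) -> float:
    return math.log(p / (1 - p))

def sigmoid(x: float) -> float:
    return 1 / (1 + math.exp(-x))

# (assessed probability, track record out of 10) for each expert claim above.
claims = [
    (0.65, 8),  # Dario Amodei
    (0.50, 8),  # Andrej Karpathy
    (0.05, 6),  # Mustafa Suleyman
    (0.70, 8),  # Boris Cherny
    (0.95, 8),  # Cursor (Anysphere)
]

# Track-record-weighted mean in log-odds space, mapped back to a probability.
total_weight = sum(w for _, w in claims)
pooled = sigmoid(sum(w * logit(p) for p, w in claims) / total_weight)
print(f"{pooled:.0%}")  # ~61%
```

This naive pool lands near 61%, a bit above the headline 55%, consistent with the evidence-against items pulling the final number down. It is also a simplification: the five claims are not the same proposition (Cursor's is about team size, not autonomous shipping), so direct pooling overstates their agreement.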

What Could Go Wrong

Liability and trust prevent 'no human review' even when AI is technically capable. Companies continue requiring human sign-off for legal and cultural reasons. The gap between 'AI can write the code' and 'we trust AI to ship the code' proves wider than expected. The Sonar finding — Opus 4.6 writes buggier code than Opus 4.5 despite being smarter — suggests the quality problem is structural, not just a matter of model capability.
