AI Forecast Tracker

AI Is Writing Code That Will Break the Internet

Will a major publicly disclosed security breach or outage be directly attributed to AI-generated code by end of 2027?

If your company deploys AI-generated code, you're running a security experiment whether you know it or not.

Target: Dec 2027 (629 days until resolution)
Assessed Probability
75%
Likely
Based on 1 expert prediction and 4 evidence items
Everyone celebrates that AI writes code faster; far less attention goes to what happens when that code blows up in production. Veracode's 2026 analysis found only 55% of AI-generated code secure, and the best model on BaxBench produced safe code just 56% of the time. Opsera's benchmark of 250,000+ developers shows AI code introducing 15-18% more security vulnerabilities than human-written code, and Cortex.io's Engineering Benchmark puts the incident rate per PR up 23.5% and the change failure rate up 30% at AI-heavy teams. Already, 69% of developers have found AI-introduced vulnerabilities in their systems, and 1 in 5 report material business impact.

Meanwhile, 'vibe coding' has gone mainstream: non-programmers are shipping full-stack apps they can't read or audit through Vercel v0, Replit, and Lovable ($300M valuation). The New Stack predicts 'big explosions coming'; fast.ai calls it 'gambling addiction — losses disguised as wins.' The question isn't whether AI code has vulnerabilities; it demonstrably does. The question is how long before one of them causes a headline-making disaster.

Scenarios

Current value: 69% of developers have found AI-introduced vulnerabilities; 1 in 5 report material business impact. No headline-grabbing incident yet.

S-curve position: Pre-incident — vulnerabilities are accumulating but no catastrophic event yet

Bear Case

No major incident (security scanning improves faster, AI code stays in non-critical paths)

Base Case

1-2 significant incidents, at least one making mainstream news, by end 2027

Bull Case

Multiple major breaches by mid-2027 (vibe-coded apps in production, supply chain attack via AI-generated dependency)

How We'll Know

What we measure
Whether a major security breach, data leak, or service outage is publicly attributed primarily to AI-generated or AI-assisted code
Confirmed if
A publicly disclosed security incident affecting 1M+ users or causing $100M+ in damages is attributed primarily to AI-generated code
Refuted if
No major incident is attributed to AI code through end 2027, despite widespread AI code deployment
Data sources
  • CVE database
  • NIST National Vulnerability Database
  • Major breach disclosure reports
  • Veracode / Opsera annual security reports
  • News coverage of AI-attributed incidents
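The resolution logic above can be sketched as a simple predicate. This is an illustrative sketch only: the field names are hypothetical, and the thresholds are taken directly from the "Confirmed if" clause.

```python
from dataclasses import dataclass

@dataclass
class Incident:
    users_affected: int            # e.g. user count from a breach disclosure
    damages_usd: float             # publicly reported damages estimate
    publicly_disclosed: bool
    primarily_ai_attributed: bool  # attribution is the hard part in practice

def resolves_yes(incident: Incident) -> bool:
    # Mirrors the "Confirmed if" clause: public disclosure, primary
    # attribution to AI-generated code, and either impact threshold.
    return (
        incident.publicly_disclosed
        and incident.primarily_ai_attributed
        and (incident.users_affected >= 1_000_000
             or incident.damages_usd >= 100_000_000)
    )
```

Note that both gating conditions are conjunctive: a large outage with only partial or disputed AI attribution would not resolve YES, which is one reason the assessed probability sits at 75% rather than near certainty.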

Evidence Trail

Evidence For

  • Mar 9, 2026

    Veracode 2026: only 55% of AI code secure. BaxBench: best model (Claude Opus 4.5) secure only 56% of the time. Opsera (250K+ devs): 15-18% more vulnerabilities. Cortex.io: incident rate per PR up 23.5%, change failure rate +30%. 69% of developers found AI vulnerabilities, 1 in 5 had material business impact. Sonar: Opus 4.6 has 21% more issue density than Opus 4.5. 'Architecture by Autocomplete' producing unnecessary micro-abstractions and N+1 query bugs. → Probability: 60%

  • Mar 9, 2026

    Vibe coding going mainstream — VibeKode conference in Munich (June 2026). Lovable ($300M), Vercel v0, Replit enabling non-programmers to ship full-stack apps. 57% of orgs using AI for multi-step engineering workflows. Claude Code authors 4% of all GitHub commits (~135K/day). AI-authored production code at 26.9% and growing. The attack surface is expanding faster than security tooling can keep up. → Probability: 65%

  • Apr 10, 2026

    AI-generated code now implicated in 1 in 5 enterprise security breaches. CVE entries attributed to AI-generated code grew from 6 in January to 15 in February to 35+ in March 2026 — exponential trajectory. Amazon's 6-hour outage in March 2026 affecting 6.3M orders was linked to AI code defects. Claude Code CVE disclosed in April 2026 enabling remote code execution. Independent research shows AI code carries 2.74x more XSS vulnerabilities and 86% fails injection defense. The 'major incident' threshold may already be met by the Amazon outage alone; the CVE trajectory suggests more are coming in 2026. Maximum allowed 30-day probability delta (+0.10) justified by the combination of incident severity, CVE exponential growth, and disclosed CVE against the tool writing the most AI code. → Probability: 75%
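The XSS statistic cited above refers to a well-known class of bug. As a hedged illustration (a hypothetical snippet, not drawn from any cited report), this is the kind of reflected-XSS pattern security scanners flag in generated code, alongside the escaped fix:

```python
import html

def render_greeting_unsafe(name: str) -> str:
    # Vulnerable pattern: user input interpolated directly into HTML.
    # A payload like <script>...</script> runs in the victim's browser.
    return f"<h1>Hello, {name}!</h1>"

def render_greeting_safe(name: str) -> str:
    # Fix: escape user-controlled data before embedding it in HTML.
    return f"<h1>Hello, {html.escape(name)}!</h1>"

payload = "<script>alert(1)</script>"
print(render_greeting_unsafe(payload))  # script tag survives intact
print(render_greeting_safe(payload))    # angle brackets neutralized
```

Both functions are three lines long and superficially identical, which is part of the problem: nothing in the unsafe version looks wrong to a reviewer skimming AI-generated output.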

Evidence Against

  • Mar 9, 2026

    Major platforms adding security scanning before deployment. AI security tools (BugBot, Snyk AI) improving. Companies may keep AI code out of critical paths. The 'major incident' threshold ($100M+ or 1M+ users) is high — many smaller incidents may happen without crossing it.

How Our View Evolved

  • Apr 10, 2026: 65% → 75%

    CVE trajectory 6→15→35 over Jan/Feb/Mar 2026 (exponential). Amazon 6.3M-order outage linked to AI code defects. Claude Code CVE disclosed April 2026 (remote code execution). 1 in 5 enterprise breaches now involve AI-generated code. Applied maximum +0.10 delta for 30-day window — the Amazon incident alone may already meet the major-incident threshold.

  • Mar 9, 2026: Initial assessment 65%

    Baseline — initial published assessment. Based on Veracode, Cortex.io, Opsera security data and vibe coding trend.
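The "exponential trajectory" behind the April update can be sanity-checked from the cited monthly CVE counts (6, 15, 35+). A minimal sketch, assuming geometric growth purely for illustration — a naive extrapolation, not a forecast:

```python
# Monthly CVE counts attributed to AI-generated code, per the Apr 10 update.
counts = {"Jan": 6, "Feb": 15, "Mar": 35}

values = list(counts.values())
# Month-over-month growth factors: 15/6 = 2.5x, 35/15 ≈ 2.33x.
factors = [later / earlier for earlier, later in zip(values, values[1:])]
avg_factor = sum(factors) / len(factors)

# Naive extrapolation: assumes the average factor holds, which it may not.
april_estimate = values[-1] * avg_factor
print(f"growth factors: {factors}")
print(f"average factor: {avg_factor:.2f}")
print(f"naive April estimate: {april_estimate:.0f}")
```

A roughly constant growth factor near 2.4x is what "exponential" means here; whether it persists past March is exactly what the forecast is uncertain about.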

What Experts Say

Cortex.io (Engineering Benchmark Report 2026)

Engineering Intelligence Platform

Track record: 6/10
AI-heavy engineering teams experience 23.5% higher incident rates per pull request and approximately 30% higher change failure rates
Feb 2026 | industry_report
We assess this claim as 65% likely

What Could Go Wrong

Security scanning tools improve fast enough to catch AI-generated vulnerabilities before production. The explosion happens at a company too small to make headlines. AI code stays concentrated in internal tools and non-critical features where breaches don't matter.
