AI Forecast Tracker

Open-Source AI Will Break Big Tech's Grip on Intelligence

Will open-weight models consistently match proprietary frontier performance — making AI effectively free — by end of 2027?

If intelligence becomes free, every AI business model built on charging for it needs to be rethought — including the ones your retirement portfolio is betting on.

Target: Dec 2027 (664 days until resolution)
Assessed Probability
60%
More likely than not
Based on 2 expert predictions, 3 evidence items
The most underappreciated story of 2026 isn't a new model: it's that intelligence is becoming free. DeepSeek V3.2 delivers frontier-class performance at $0.14 per million input tokens, 27-35x cheaper than leading proprietary models. A RAND report found Chinese models run at one-sixth to one-quarter the cost of comparable American systems. Chinese open-source models now account for 30% of all global AI downloads, ahead of the US at 15.7%. Meta's Llama 4 Scout offers a 10-million-token context window with open weights.

On capability, the MMLU gap between open and proprietary models closed from 17.5 to 0.3 percentage points in a single year. All major models now score 90%+ on traditional benchmarks (MMLU, GSM8K), so open and proprietary models are indistinguishable on these tests.

The implication is profound: if open-weight models match proprietary ones on capability while costing 10-35x less, the pricing power of OpenAI ($730B), Anthropic ($380B), and Google rests on a narrowing moat. The open question is whether proprietary models keep a meaningful lead on harder tasks (FrontierMath, ARC-AGI-2), or whether open-source catches up there too.
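To make the pricing claim concrete, the arithmetic behind "27-35x cheaper" can be sketched as follows. The $0.14 price and the 27-35x range come from the text above; the function names and the one-billion-token workload are hypothetical, chosen only for illustration.

```python
def implied_price(open_price: float, times_cheaper: float) -> float:
    """Proprietary price per million tokens implied by an 'N x cheaper' claim."""
    return open_price * times_cheaper


def workload_cost(price_per_million: float, tokens: int) -> float:
    """Total cost in USD for a given number of input tokens."""
    return price_per_million * tokens / 1_000_000


DEEPSEEK_PRICE = 0.14                # USD per million input tokens (from the text)
CHEAPER_LOW, CHEAPER_HIGH = 27, 35   # "27-35x cheaper" (from the text)
HYPOTHETICAL_TOKENS = 1_000_000_000  # assumed workload, not from the source

low = implied_price(DEEPSEEK_PRICE, CHEAPER_LOW)
high = implied_price(DEEPSEEK_PRICE, CHEAPER_HIGH)

print(f"Implied proprietary price: ${low:.2f} to ${high:.2f} per million input tokens")
print(f"Open-weight workload cost: ${workload_cost(DEEPSEEK_PRICE, HYPOTHETICAL_TOKENS):,.2f}")
print(f"Proprietary workload cost: ${workload_cost(low, HYPOTHETICAL_TOKENS):,.2f} "
      f"to ${workload_cost(high, HYPOTHETICAL_TOKENS):,.2f}")
```

Under these figures, the implied proprietary price is roughly $3.78 to $4.90 per million input tokens. This covers input tokens only; output-token pricing is not given in the text and is left out.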

Scenarios

Current value: MMLU gap of 0.3pp (effectively zero). DeepSeek V3.2 at $0.14/M input tokens. FrontierMath gap still large (GPT-5.4 at 47.6%, open-source much lower). ARC-AGI-2: best open model at ~40% vs. best proprietary at ~83%.

S-curve position: Mid-curve on easy benchmarks (saturated), early curve on hard benchmarks (still large gap)

Bear Case

Persistent 20%+ gap on hard tasks (proprietary compute advantage at frontier is too large for open-source to match)

Base Case

Parity on most practical tasks, 10-15% gap on frontier reasoning — but the practical gap is irrelevant for 90% of use cases

Bull Case

Full parity by mid-2027 (Chinese investment + Llama 5 + DeepSeek V5 close the gap on hard tasks)

How We'll Know

What we measure
Whether open-weight models match proprietary frontier models on hard benchmarks (FrontierMath, ARC-AGI-2, SWE-bench) while costing 10x+ less
Confirmed if
Top open-weight model scores within 5% of best proprietary model on 3+ hard benchmarks (FrontierMath, ARC-AGI-2, SWE-bench) at <10% of the cost
Refuted if
Proprietary models maintain >15% lead on hard benchmarks through end 2027, or the gap widens
Data sources
  • LMSYS Chatbot Arena rankings
  • FrontierMath leaderboard
  • ARC-AGI-2 results
  • SWE-bench Verified
  • Model pricing databases (Artificial Analysis)
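The resolution criteria above can be sketched as a small snapshot check. This is an illustration under stated assumptions, not the tracker's actual method: "within 5%" and ">15% lead" are read as percentage-point gaps in benchmark score, the "refuted" test is read as the lead exceeding 15 points on every tracked hard benchmark, and the "or the gap widens" clause is not modeled because it needs a baseline. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    CONFIRMED = "confirmed"
    REFUTED = "refuted"
    UNRESOLVED = "unresolved"


@dataclass(frozen=True)
class BenchmarkResult:
    name: str
    open_score: float         # best open-weight score, in percent
    proprietary_score: float  # best proprietary score, in percent

    @property
    def gap(self) -> float:
        """Proprietary lead in percentage points (negative if open leads)."""
        return self.proprietary_score - self.open_score


def snapshot_status(
    results: list[BenchmarkResult],
    cost_ratio: float,             # open cost divided by proprietary cost
    parity_pts: float = 5.0,       # "within 5%" (assumed to mean points)
    lead_pts: float = 15.0,        # ">15% lead" (assumed to mean points)
    min_benchmarks: int = 3,       # "3+ hard benchmarks"
    max_cost_ratio: float = 0.10,  # "<10% of the cost"
) -> Status:
    """Classify one snapshot of results against the stated criteria."""
    at_parity = sum(1 for r in results if r.gap <= parity_pts)
    if at_parity >= min_benchmarks and cost_ratio < max_cost_ratio:
        return Status.CONFIRMED
    # Assumption: "refuted" requires the lead on every tracked benchmark.
    if results and all(r.gap > lead_pts for r in results):
        return Status.REFUTED
    return Status.UNRESOLVED


# ARC-AGI-2 scores (~40% vs ~83%) and the 27x cost figure are from the text.
arc_only = [BenchmarkResult("ARC-AGI-2", 40.0, 83.0)]
print(snapshot_status(arc_only, cost_ratio=1 / 27).value)
```

The check describes a single point in time. The forecast itself resolves only on results at the end of 2027, so a "refuted" snapshot today says where things stand, not how the question resolves.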

Evidence Trail

Evidence For

  • Mar 9, 2026

    DeepSeek V3.2: $0.14/M tokens (27-35x cheaper). MMLU gap: 17.5pp to 0.3pp in one year. Chinese open-source: 30% of global AI downloads (US: 15.7%). RAND: Chinese models at 1/6 to 1/4 cost. Llama 4 Scout: 10M token context (open weights). 81% of enterprises use 3+ model families, so the multi-model reality is established.
    → Probability: 55%

  • Mar 9, 2026

    Gemini 3.1 Pro at $2/$12 per M tokens (lowest price for frontier reasoning). DeepSeek V4 expected with 1T parameters at 32B active, implying near-zero marginal cost. Traditional benchmarks (MMLU, GSM8K) now 90%+ for all major models. Open-source models can be run on-premise, avoiding API costs entirely. The economic case for proprietary is eroding for all but the hardest tasks.
    → Probability: 60%

Evidence Against

  • Mar 9, 2026

    FrontierMath: GPT-5.4 at 47.6%, open-source far behind. ARC-AGI-2: best systems at 83% but require massive compute. Proprietary models maintain lead on the hardest reasoning tasks. Enterprise buyers prefer proprietary for compliance, support, liability. The 0.3pp MMLU gap is on saturated benchmarks — hard tasks show much larger gaps.

What Experts Say

Penn Wharton Budget Model

Nonpartisan economic research, University of Pennsylvania

Track record: 8/10
AI will increase US GDP by approximately 1.5% by 2035 and roughly 3% by 2055
Jun 2025 | academic_research
We assess this claim at 55% (more likely than not)

Mark Zuckerberg

CEO, Meta

Track record: 7/10
AI agents will write most of Meta's code in the near future
Jan 2026 | corporate_statement
We assess this claim at 55% (more likely than not)

What Could Go Wrong

Proprietary models maintain a meaningful capability lead on hard reasoning tasks that matter for enterprise use cases. Enterprise buyers choose proprietary for compliance and support even when open-source is technically equivalent. The cost gap narrows as proprietary models get cheaper too.
