Open-Source AI Will Break Big Tech's Grip on Intelligence

Will open-weight models consistently match proprietary frontier performance — making AI effectively free — by end of 2027?

If intelligence becomes free, every AI business model built on charging for it needs to be rethought — including the ones your retirement portfolio is betting on.

Target: Dec 2027 (629 days until resolution)

Assessed Probability

68%

Likely

Based on 2 expert predictions, 4 evidence items

The most underappreciated story of 2026 isn't a new model. It's that intelligence is becoming free. DeepSeek V3.2 delivers frontier-class performance at $0.14 per million input tokens, 27-35x cheaper than leading proprietary models. Chinese open-source models now account for 30% of all global AI downloads, ahead of the US at 15.7%. A RAND report found that Chinese models run at one-sixth to one-quarter the cost of comparable American systems. Meta's Llama 4 Scout offers a 10-million-token context window with open weights.

On traditional benchmarks, open and proprietary models are now indistinguishable. The MMLU gap between them narrowed from 17.5 to 0.3 percentage points in a single year, and all major models score 90%+ on MMLU and GSM8K.

The implication is profound: if open-weight models match proprietary ones on capability while costing 10-35x less, the pricing power of OpenAI ($730B), Anthropic ($380B), and Google rests on a narrowing moat. The open question is whether harder tasks (FrontierMath, ARC-AGI-2) preserve a meaningful gap, or whether open-source catches up there too.
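The price multiples quoted above can be sanity-checked with a few lines of arithmetic. The sketch below takes only the $0.14 price and the 27-35x range from the text; the implied comparison prices are derived rather than reported, and the one-billion-token monthly workload is a hypothetical chosen purely for illustration.

```python
# Figures quoted in the text. Everything computed from them is derived,
# not reported by any vendor.
OPEN_PRICE_PER_M = 0.14      # USD per million input tokens (DeepSeek V3.2)
MULTIPLE_RANGE = (27, 35)    # the "27-35x cheaper" claim


def implied_price(open_price: float, multiple: float) -> float:
    """Input price a comparison model must charge for the multiple to hold."""
    return round(open_price * multiple, 2)


def monthly_cost(price_per_m: float, tokens_per_month: int) -> float:
    """Input-token spend in USD for a given monthly volume."""
    return round(price_per_m * tokens_per_month / 1_000_000, 2)


low, high = (implied_price(OPEN_PRICE_PER_M, m) for m in MULTIPLE_RANGE)

# Hypothetical workload (an assumption, not from the source): 1B input tokens/month.
WORKLOAD = 1_000_000_000
open_bill = monthly_cost(OPEN_PRICE_PER_M, WORKLOAD)
proprietary_bill = (monthly_cost(low, WORKLOAD), monthly_cost(high, WORKLOAD))

print(f"Implied proprietary input price: ${low}-${high} per million tokens")
print(f"Monthly input spend: ${open_bill} open vs ${proprietary_bill} proprietary")
```

This covers input tokens only. Output-token pricing, hosting costs for self-run models, and volume discounts would all shift the comparison.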

Scenarios

Current value: MMLU gap: 0.3pp (effectively zero). DeepSeek V3.2 at $0.14/M tokens. FrontierMath gap still large (GPT-5.4 at 47.6%, open-source much lower). ARC-AGI-2: best open at ~40% vs proprietary ~83%.

S-curve position: Mid-curve on easy benchmarks (saturated), early curve on hard benchmarks (still large gap)

Bear Case

Persistent 20%+ gap on hard tasks (proprietary compute advantage at frontier is too large for open-source to match)

Base Case

Parity on most practical tasks, 10-15% gap on frontier reasoning — but the practical gap is irrelevant for 90% of use cases

Bull Case

Full parity by mid-2027 (Chinese investment + Llama 5 + DeepSeek V5 close the gap on hard tasks)

How We'll Know

What we measure: Whether open-weight models match proprietary frontier models on hard benchmarks (FrontierMath, ARC-AGI-2, SWE-bench) while costing 10x+ less
Confirmed if: Top open-weight model scores within 5% of best proprietary model on 3+ hard benchmarks (FrontierMath, ARC-AGI-2, SWE-bench) at <10% of the cost
Refuted if: Proprietary models maintain >15% lead on hard benchmarks through end 2027, or the gap widens
Data sources: LMSYS Chatbot Arena rankings
FrontierMath leaderboard
ARC-AGI-2 results
SWE-bench Verified
Model pricing databases (Artificial Analysis)
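The resolution criteria above can be expressed as a small check. This is a minimal sketch rather than the site's actual scoring method, and it rests on two assumptions: "within 5%" and ">15% lead" are read as percentage-point gaps, and a benchmark with no published open-weight figure is treated as unknown rather than guessed. The scores are the April 10, 2026 figures quoted in the evidence trail; the names `BenchmarkResult` and `evaluate` are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BenchmarkResult:
    name: str
    best_proprietary: float     # score in percent
    best_open: Optional[float]  # score in percent; None if no figure is given


def evaluate(results, cost_ratio, confirm_gap=5.0, refute_gap=15.0,
             min_converged=3, max_cost_ratio=0.10):
    """Classify each benchmark by its gap (assumed to be percentage points)."""
    converged, wide, unknown = [], [], []
    for r in results:
        if r.best_open is None:
            unknown.append(r.name)      # no open-weight score to compare
            continue
        gap = r.best_proprietary - r.best_open
        if gap <= confirm_gap:
            converged.append(r.name)    # meets the "within 5%" test
        elif gap > refute_gap:
            wide.append(r.name)         # would count toward refutation if it persists
    cost_ok = cost_ratio < max_cost_ratio
    return {
        "converged": converged,
        "wide": wide,
        "unknown": unknown,
        "cost_ok": cost_ok,
        "confirmed": len(converged) >= min_converged and cost_ok,
    }


# Snapshot built from figures in the evidence trail (approximate where the text says "~").
snapshot = [
    BenchmarkResult("SWE-bench Verified", 80.8, 80.2),
    BenchmarkResult("ARC-AGI-2", 83.0, 52.0),
    BenchmarkResult("FrontierMath", 47.6, None),
]

# "27-35x cheaper" implies the open model costs at most 1/27 of the proprietary price.
status = evaluate(snapshot, cost_ratio=1 / 27)
print(status)
```

The sketch does not implement the "or the gap widens" clause, which would need a history of scores rather than a single snapshot. Whether a persistent lead on one benchmark or on all of them is enough to refute is also left open, since the criteria do not say.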

Evidence Trail

Evidence For

Mar 9, 2026
DeepSeek V3.2: $0.14/M tokens (27-35x cheaper). MMLU gap: 17.5pp to 0.3pp in one year. Chinese open-source: 30% of global AI downloads (US: 15.7%). RAND: Chinese models at 1/6 to 1/4 cost. Llama 4 Scout: 10M token context (open weights). 81% of enterprises use 3+ model families, so a multi-model reality is established. → Probability: 55%
Mar 9, 2026
Gemini 3.1 Pro at $2/$12 per M tokens (lowest price for frontier reasoning). DeepSeek V4 is expected to have 1T parameters with 32B active, implying near-zero marginal cost. Traditional benchmarks (MMLU, GSM8K) are now at 90%+ for all major models. Open-source models can be run on-premise, avoiding API costs entirely. The economic case for proprietary models is eroding for all but the hardest tasks. → Probability: 60%
Apr 10, 2026
MiniMax M2.5 hits 80.2% on SWE-bench Verified, within 0.6 points of the leading proprietary model, Claude Opus 4.6 (80.8%). GLM-5 scores 77.8%. MiniMax M2.5 is the first open-weight model to cross the 5%-of-best threshold on one of the three resolution-criteria hard benchmarks (SWE-bench). The cost advantage is sustained at 27-35x per DeepSeek V3.2 April 2026 data. The ARC-AGI-2 gap is still meaningful (GPT-5.4 ~83% vs best open ~52%) and FrontierMath remains proprietary-favored, so two of three benchmarks are still wide. → Probability: 68%

Evidence Against

Mar 9, 2026
FrontierMath: GPT-5.4 at 47.6%, open-source far behind. ARC-AGI-2: best systems at 83% but require massive compute. Proprietary models maintain lead on the hardest reasoning tasks. Enterprise buyers prefer proprietary for compliance, support, liability. The 0.3pp MMLU gap is on saturated benchmarks — hard tasks show much larger gaps.

How Our View Evolved

Apr 10, 2026 | 60% ↑ 68%
MiniMax M2.5 at 80.2% on SWE-bench Verified reaches near-parity with Claude Opus 4.6 (80.8%), making it the first open-weight model to come within 5% on one of the three resolution-criteria hard benchmarks. GLM-5 at 77.8% is also competitive. The 27-35x cost advantage is sustained. ARC-AGI-2 and FrontierMath gaps persist. The update is a conservative +8 percentage points, since only one of three benchmarks has converged.
Mar 9, 2026 | Initial assessment: 60%
Baseline — initial published assessment. Based on DeepSeek pricing, MMLU gap closure, Chinese open-source adoption data.
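One way to read the size of the April update is to convert the two probabilities into odds. The sketch below is not the method used to produce the assessment, only an illustrative lens: it computes the percentage-point change and the likelihood ratio (Bayes factor) that a move from 60% to 68% would imply if it were a single Bayesian update.

```python
import math


def odds(p: float) -> float:
    """Convert a probability into odds in favor."""
    return p / (1.0 - p)


# Probabilities taken from the assessment history above.
prior, posterior = 0.60, 0.68

# Change in percentage points.
delta_pp = round((posterior - prior) * 100, 1)

# Likelihood ratio that would carry the prior to the posterior in one update.
implied_bayes_factor = odds(posterior) / odds(prior)

# The same shift on the log-odds scale, where successive updates add.
log_odds_shift = math.log(implied_bayes_factor)

print(f"Change: +{delta_pp} percentage points")
print(f"Implied Bayes factor: {implied_bayes_factor:.2f}")
print(f"Log-odds shift: {log_odds_shift:.3f}")
```

Read this way, the SWE-bench result was treated as modest evidence, roughly 1.4 times more likely if parity arrives by the target date than if it does not. That is consistent with the stated caution that only one of three benchmarks has converged.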

What Experts Say

Penn Wharton Budget Model

Nonpartisan economic research, University of Pennsylvania

Track record: 8/10

“AI will increase US GDP by approximately 1.5% by 2035 and roughly 3% by 2055”

Jun 2025 | academic_research

We assess this claim at 55% (more likely than not)

Mark Zuckerberg

CEO, Meta

Track record: 7/10

“AI agents will write most of Meta's code in the near future”

Jan 2026 | corporate_statement