Can AI Do Your Entire Workday Without You?

Will AI systems autonomously complete a full 8-hour professional workday — multiple tasks, context switching, decision-making — without human intervention by end of 2027?

This isn't about whether AI takes your job tomorrow — it's about how fast the 'AI can't do that' list is shrinking.

Target: Dec 2027 (629 days until resolution)

Assessed Probability

65%

Likely

Based on 6 expert predictions, 5 evidence items

METR confirmed that Opus 4.6 handles individual tasks that take human experts 14+ hours. The task horizon rose from ~5.3 to 14.5 hours in about 4 months, roughly one and a half doublings. But the real signal isn't benchmarks; it's practitioners. Boris Cherny ships 22-27 PRs per day with 100% AI-written code, effectively running an AI workday for coding tasks. Claude agent teams coordinate multiple agents on different parts of a codebase simultaneously. The Opus 4.5/4.6 leap (beginning November 2025) was qualitatively different from prior improvements: not just faster, but able to handle the kind of multi-step reasoning, context management, and decision-making that workdays require. If the 89-day doubling rate holds through 2027, the extrapolated task horizon comfortably exceeds a full workday. The power law applies here too: for the top 10% of AI-fluent professionals, the autonomous workday is already approaching reality in specific domains. For the average knowledge worker, it's further out.
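
The doubling arithmetic behind this paragraph can be sketched in a few lines. This is an illustrative calculation under stated assumptions, not METR's methodology: it assumes constant exponential growth, treats "4 months" as 120 days, and both function names are hypothetical.

```python
import math

def implied_doubling_days(start_hours: float, end_hours: float, elapsed_days: float) -> float:
    """Doubling time implied by growth from start_hours to end_hours."""
    doublings = math.log2(end_hours / start_hours)
    return elapsed_days / doublings

def projected_horizon_hours(current_hours: float, doubling_days: float, days_ahead: float) -> float:
    """Horizon after days_ahead, assuming the doubling time stays constant."""
    return current_hours * 2 ** (days_ahead / doubling_days)

# Figures from the text: ~5.3 h to 14.5 h in about 4 months (assumed to be 120 days).
print(f"implied doubling time: {implied_doubling_days(5.3, 14.5, 120):.0f} days")

# One year ahead at the 89-day doubling rate quoted in the text.
print(f"projected horizon: {projected_horizon_hours(14.5, 89, 365):.0f} hours")
```

Because the projection is exponential in the assumed doubling time, small changes to that one input dominate the result, which is why the scenarios below turn on whether the trend holds at all.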

Scenarios

Current value: 14.5 hours on single METR tasks (Opus 4.6, Feb 2026); Boris Cherny running what amounts to an AI workday for coding; Claude agent teams coordinating multi-agent workflows

S-curve position: Steep mid-curve — single-task autonomy nearly solved, multi-task coordination emerging rapidly

Bear Case

AI remains limited to single tasks through 2028 (multi-task coordination, real-world messiness, and interpersonal judgment prove too hard)

Base Case

6-8 hour semi-autonomous work sessions for structured professional work; full autonomy for coding/analysis domains

Bull Case

Full autonomous workday by Q3 2027 (Opus 4.5/4.6 leap suggests nonlinear progress in planning + memory)

How We'll Know

What we measure: Whether AI systems can autonomously complete a realistic 8-hour professional workday simulation involving multiple diverse tasks, context switching, and decision-making
Confirmed if: Frontier AI models demonstrate autonomous completion of multi-task 8-hour workday simulations, OR multiple companies publicly deploy AI for full-day autonomous work
Refuted if: Best frontier models remain limited to single-task autonomy below 4 hours on realistic workday simulations
Data sources: METR autonomous task evaluations; SWE-bench Pro; RE-bench (ML research); company-reported agent evaluations; third-party autonomous work benchmarks

Evidence Trail

Evidence For

Mar 7, 2026
METR Opus 4.6: 14.5-hour task horizon (50% success). Task horizon rose from ~5.3 hr to 14.5 hr in ~4 months. Claude agent teams mode is in production. 57% of enterprises are running multi-step agent workflows. → Probability: 40%
Mar 7, 2026
Boris Cherny: 22-27 PRs/day with 100% AI-written code, effectively an AI coding workday. Opus 4.5/4.6 qualitative leap in multi-step reasoning. 89-day doubling rate projects a 40+ hour task horizon by late 2027. Inference cost collapse (200x/year) makes longer autonomous sessions economically viable. Power law: the top 10% are already approaching an AI workday in specific domains. → Probability: 55%
Mar 9, 2026
GPT-5.4 (March 2026) scored 75% on OSWorld desktop automation, exceeding the human expert baseline of 72.4%. It is the first frontier model to beat humans on full desktop workflow automation. It also achieved an 83% GDPval score, matching industry professionals across 44 occupations. Gartner predicts 40% of enterprise apps will embed AI agents by end of 2026. → Probability: 60%
Apr 10, 2026
METR added GPT-5.4 to its time horizon benchmark on April 10, 2026. With the doubling trend now estimated at 7 months (vs. the earlier 89-day figure), METR's own projection shows autonomous 8-hour workday capability arriving by end of 2026, ahead of the timeline implied by our prior probability. Opus 4.6 sits at 719 min (50% success) and 70 min (80% success), consistent with the trajectory. The incremental shift is small, but the direction of the evidence is unambiguous. → Probability: 65%
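
To see how sensitive the timeline is to the doubling estimate, the sketch below computes how long a time horizon takes to reach 8 hours under each doubling figure quoted above. It is a back-of-the-envelope illustration, not METR's projection method: it assumes constant exponential growth, approximates 7 months as 213 days, and `days_to_reach` is a hypothetical helper.

```python
import math

def days_to_reach(current_minutes: float, target_minutes: float, doubling_days: float) -> float:
    """Days until the horizon reaches target_minutes at a constant doubling time."""
    if current_minutes >= target_minutes:
        return 0.0  # target already met
    return doubling_days * math.log2(target_minutes / current_minutes)

WORKDAY_MINUTES = 8 * 60

# Opus 4.6 horizons quoted in the text, in minutes.
horizons = {"50% success": 719, "80% success": 70}

# Doubling times quoted in the text; 7 months is approximated as 213 days (assumption).
doubling_times = {"89-day": 89, "7-month": 213}

for label, minutes in horizons.items():
    for name, days in doubling_times.items():
        wait = days_to_reach(minutes, WORKDAY_MINUTES, days)
        print(f"{label}, {name} doubling: {wait:.0f} days to an 8-hour horizon")
```

In this sketch the 50% horizon is already past 8 hours on single tasks, so the result that varies with the doubling estimate is the 80% reliability horizon. Whether single-task horizons transfer to a multi-task workday is a separate question that this arithmetic does not address.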

Evidence Against

Mar 7, 2026
METR notes its task suite is 'nearly saturated' — unclear if results transfer to new task types. A workday involves context switching, interpersonal judgment, exception handling — qualitatively different from benchmark tasks. Diminishing returns likely as tasks become more open-ended.

How Our View Evolved

Apr 10, 2026 | 60% ↑ 65%
METR added GPT-5.4 to its time horizon benchmark (April 10). A doubling rate of 7 months projects an 8-hour autonomous workday by end of 2026, per METR's own published trajectory. This is ahead of our prior timeline, so we applied a conservative +5 percentage points.
Mar 9, 2026 | 55% ↑ 60%
GPT-5.4 exceeded the human expert baseline on OSWorld desktop automation (75% vs 72.4%). It is the first model to beat humans on full desktop workflow automation, a significant milestone for the autonomous workday thesis.
Mar 8, 2026 | Initial assessment: 55%
Baseline: initial published assessment

What Experts Say

Dario Amodei

CEO, Anthropic

Track record: 8/10

“AI models will handle most aspects of software engineering tasks from start to finish within 6-12 months”

Jan 2026 | interview

We assess this claim as 70% likely

Dario Amodei

CEO, Anthropic

Track record: 8/10

“Systems capable of outperforming Nobel laureates across most fields could arrive by 2027-2028”

Oct 2025 | blog

We assess this claim as 15% very unlikely

Demis Hassabis

CEO, Google DeepMind; Nobel Laureate

Track record: 9/10

“AGI is 3-5 years away; current systems lack reasoning, hierarchical planning, and long-term memory”

Feb 2026 | interview

We assess this claim as 35% roughly even odds

Andrej Karpathy

AI Researcher, former Tesla AI Director, educator

Track record: 8/10

“Agentic engineering (AI agents writing 99% of code, humans as oversight) becomes the default professional workflow”

Feb 2026 | blog

We assess this claim as 50% more likely than not

Gary Marcus

AI Researcher, NYU Professor Emeritus, AI critic

Track record: 7/10

“AGI will not arrive in 2026 or 2027”

Dec 2025 | blog

We assess this claim as 85% very likely

Boris Cherny

Head of Claude Code, Anthropic

Track record: 8/10

“AI can already write 100% of production code; top engineers using AI are 10x more productive”

Feb 2026 | interview

We assess this claim as 70% likely

What Could Go Wrong

Benchmark saturation could create an illusion of general capability. Real workdays involve ambiguity, social interaction, and judgment calls that don't appear in standardized evaluations. The doubling trend could break down above 16 hours as tasks begin to require fundamentally different capabilities.
