Can AI Do Your Entire Workday Without You?
Will AI systems autonomously complete a full 8-hour professional workday — multiple tasks, context switching, decision-making — without human intervention by end of 2027?
This isn't about whether AI takes your job tomorrow — it's about how fast the 'AI can't do that' list is shrinking.
Scenarios
Current value: 14.5-hour task horizon on single METR tasks (Opus 4.6, Feb 2026); Boris Cherny running a near-full AI workday for coding; Claude agent teams coordinating multi-agent workflows
S-curve position: Steep mid-curve — single-task autonomy nearly solved, multi-task coordination emerging rapidly
Single tasks only through 2028 (multi-task coordination, real-world messiness, interpersonal judgment too hard)
6-8 hour semi-autonomous work sessions for structured professional work; full autonomy for coding/analysis domains
Full autonomous workday by Q3 2027 (Opus 4.5/4.6 leap suggests nonlinear progress in planning + memory)
How We'll Know
- What we measure: Whether AI systems can autonomously complete a realistic 8-hour professional workday simulation involving multiple diverse tasks, context switching, and decision-making
- Confirmed if: Frontier AI models demonstrate autonomous completion of multi-task 8-hour workday simulations, OR multiple companies publicly deploy AI for full-day autonomous work
- Refuted if: Best frontier models remain limited to single-task autonomy below 4 hours on realistic workday simulations
- Data sources: METR autonomous task evaluations; SWE-bench Pro; RE-bench (ML research); company-reported agent evaluations; third-party autonomous work benchmarks
Evidence Trail
Evidence For
- Mar 7, 2026
METR Opus 4.6: 14.5-hour task horizon (50% success). Task horizon grew from ~5.3hr to 14.5hr (about 2.7x) in ~4 months. Claude agent teams mode in production. 57% of enterprises running multi-step agent workflows. → Probability: 40%
- Mar 7, 2026
Boris Cherny: 22-27 PRs/day with 100% AI code, effectively an AI coding workday. Opus 4.5/4.6 qualitative leap in multi-step reasoning. 89-day doubling rate projects 40+ hour task horizon by late 2027. Inference cost collapse (200x/year) enables longer autonomous sessions economically. Power-law: top 10% already approaching AI workday for specific domains. → Probability: 55%
- Mar 9, 2026
GPT-5.4 (March 2026) scored 75% on OSWorld desktop automation, exceeding the human expert baseline of 72.4%. First frontier model to beat humans on full desktop workflow automation. Also achieved 83% GDPval score matching industry professionals across 44 occupations. Gartner predicts 40% of enterprise apps will embed AI agents by end of 2026. → Probability: 60%
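The doubling-rate figures in the entries above can be sanity-checked with a few lines of arithmetic. This is a rough sketch only: it assumes constant exponential growth in task horizon, and it treats "~4 months" as 120 days. The function names are illustrative and are not part of METR's methodology.

```python
import math


def implied_doubling_days(h_start: float, h_end: float, elapsed_days: float) -> float:
    """Doubling time (days) implied by growth from h_start to h_end hours."""
    doublings = math.log2(h_end / h_start)
    return elapsed_days / doublings


def projected_horizon(h_now: float, days_ahead: float, doubling_days: float) -> float:
    """Task horizon (hours) after days_ahead, assuming a fixed doubling time."""
    return h_now * 2 ** (days_ahead / doubling_days)


def days_to_reach(h_now: float, h_target: float, doubling_days: float) -> float:
    """Days until the horizon reaches h_target, assuming a fixed doubling time."""
    return doubling_days * math.log2(h_target / h_now)


if __name__ == "__main__":
    # Assumption: "~4 months" is taken as 120 days.
    print(implied_doubling_days(5.3, 14.5, 120))
    # Assumption: the 89-day doubling rate quoted above holds unchanged.
    print(days_to_reach(14.5, 40.0, 89))
    print(projected_horizon(14.5, 365, 89))
```

Under these assumptions, the 5.3hr to 14.5hr jump implies a doubling time a little under 89 days, and a 40-hour horizon would be crossed roughly 130 days after the 14.5-hour measurement. The "40+ hour by late 2027" figure therefore reads as a floor, not a point estimate, and it depends on the trend continuing.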
Evidence Against
- Mar 7, 2026
METR notes its task suite is 'nearly saturated' — unclear if results transfer to new task types. A workday involves context switching, interpersonal judgment, exception handling — qualitatively different from benchmark tasks. Diminishing returns likely as tasks become more open-ended.
How Our View Evolved
- Mar 9, 2026: 55% ↑ 60%
GPT-5.4 exceeded human expert baseline on OSWorld desktop automation (75% vs 72.4%). First model to beat humans on full desktop workflow automation. Significant milestone for the autonomous workday thesis.
- Mar 8, 2026: Initial assessment, 55%
Baseline: initial published assessment
What Experts Say
Dario Amodei
CEO, Anthropic
“AI models will handle most aspects of software engineering tasks from start to finish within 6-12 months”
Dario Amodei
CEO, Anthropic
“Systems capable of outperforming Nobel laureates across most fields could arrive by 2027-2028”
Demis Hassabis
CEO, Google DeepMind; Nobel Laureate
“AGI is 3-5 years away; current systems lack reasoning, hierarchical planning, and long-term memory”
Andrej Karpathy
AI Researcher, former Tesla AI Director, educator
“Agentic engineering (AI agents writing 99% of code, humans as oversight) becomes the default professional workflow”
Gary Marcus
AI Researcher, NYU Professor Emeritus, AI critic
“AGI will not arrive in 2026 or 2027”
Boris Cherny
Head of Claude Code, Anthropic
“AI can already write 100% of production code; top engineers using AI are 10x more productive”