.jpg)
We’re upgrading our smartest model. The new Claude Opus 4.6 improves on its predecessor’s coding skills. It plans more carefully, sustains agentic tasks for longer, and features a 1M token context window.


.jpg)
We’re upgrading our smartest model. The new Claude Opus 4.6 improves on its predecessor’s coding skills. It plans more carefully, sustains agentic tasks for longer, and features a 1M token context window.
We’re upgrading our smartest model. The new Claude Opus 4.6 improves on its predecessor’s coding skills. It plans more carefully, sustains agentic tasks for longer, and features a 1M token context window.
We’re upgrading our smartest model. The new Claude Opus 4.6 improves on its predecessor’s coding skills. It plans more carefully, sustains agentic tasks for longer, and features a 1M token context window.
.jpg)
We’re upgrading our smartest model. The new Claude Opus 4.6 improves on its predecessor’s coding skills. It plans more carefully, sustains agentic tasks for longer, and features a 1M token context window.
We’re upgrading our smartest model. The new Claude Opus 4.6 improves on its predecessor’s coding skills. It plans more carefully, sustains agentic tasks for longer, and features a 1M token context window.
We’re upgrading our smartest model. The new Claude Opus 4.6 improves on its predecessor’s coding skills. It plans more carefully, sustains agentic tasks for longer, and features a 1M token context window.
Fundamental Research Labs is a two-year-old research lab focused on building more human-like AI. Their first commercial product, Shortcut, is an AI-powered spreadsheet tool that works across Excel, Google Sheets, and a standalone web and desktop app.
Spreadsheet tasks are deceptively complex for AI agents. A single financial model can contain hundreds of thousands of related cells across multiple sheets, and lacks the typical scaffolding afforded to coding agents to do the job. Nico Christie, who leads Shortcut at Fundamental Research Labs, previously worked in financial consulting where a team could spend weeks iterating on a single model before getting client sign-off. “AI is coming for spreadsheets now the same way it did for coding,” Christie said.
The work involves data extraction from documents, formula creation across interconnected cells, error detection, and model auditing. Tasks that involve finishing or auditing these models require both accuracy and an understanding of how sheets relate to each other.
Fundamental Research Labs built a benchmarking infrastructure to measure AI performance on these tasks: realistic, difficult Excel problems with verifiably correct answers. When they first launched their product Shortcut, the best models they tested scored in the 4-to-5.5 range out of 10. With other model providers, Christie reported that tasks failed roughly 70% of the time.
After testing multiple model providers against its benchmark pipeline, Claude became the only model in production for Shortcut.
A key factor in the selection was how little adaptation Claude required. Other models required rounds of prompt engineering and benchmarking to work around model-specific behavior. Swapping in Opus required none. “There's very little Claude-specific prompting we have to do," Christie said.
Each subsequent Claude release has reinforced that decision. When Anthropic released Opus 4.6 in February, the score went from 7.29 to 8.08 out of 10.
“It was a step change in improvement,” Christie said. “Hard tasks that were impossible became doable. Medium tasks became easy. Easy tasks were just completely saturated. It was a total change, in the same way it was for coding."
Shortcut's architecture uses Claude in a multi-agent pattern. When a user asks Shortcut to audit a complex workbook, the system spins off multiple Claude sub-agents to explore each sheet in parallel, similar to how Claude Code operates. For a ten-sheet financial model, that might mean six to ten agents running simultaneously, each analyzing a different tab for errors, structural issues, and missing data. These agents gather context and feed findings back to a main agent.
Before executing changes, the system enters planning mode where Claude reviews the workbook, identifies issues, and asks clarifying questions. Once the plan is approved, Shortcut hands execution to a fresh Claude agent, which keeps the execution context clean and focused.
Each Claude model upgrade has delivered measurable gains for Shortcut without requiring engineering work. For Christie, that pattern shapes how the team thinks about its roadmap: as Claude's capabilities improve, so does what Shortcut can offer its users.
“Excel is used by a billion or so people, spreadsheets by two billion,” Christie said. “Shortcut’s mission is to give the feeling we get when using Claude to a billion Spreadsheet users around the world."