You should build evaluation engineers instead of evaluation sets

How to design evals for long-running agent harnesses that Claude Code can hill-climb. Current evaluation approaches are insufficient for agents that work for hours on complex tasks.

Details

Speaker(s)

Anker Bach Ryhl

CEO & Co-founder, Parahelp

Anthropic's developer conference, recorded

Keynotes, demos, and conversations with the teams behind Claude. Recorded at Code w/ Claude 2026 San Francisco and ready to replay.