You should build evaluation engineers instead of evaluation sets

How to design evals for long-running agent harnesses that Claude Code can hill-climb. Current evaluation approaches are insufficient for agents that work for hours on complex tasks.

Details

City
Date
Time
Speaker(s)
Anker Bach Ryhl
CEO & Co-founder, Parahelp