You should build evaluation engineers instead of evaluation sets

How to design evals for long-running agent harnesses that Claude Code can hill climb. Current evaluation approaches fall short for agents that work for hours on complex tasks.

Details

City
San Francisco
Date
May 7, 2026
Time
10:45AM – 11:15AM
Speakers
Anker Bach Ryhl
CEO & Co-founder, Parahelp

Agenda

Demos and office hours run all day. Drop by between sessions. All times in Pacific Time (PT).
08:00AM – 09:00AM
Check in and breakfast
09:00AM – 09:45AM
Keynote
Claude Code · Boris Cherny
10:00AM – 10:30AM
Morning break
Morning sessions
10:00AM – 10:30AM
Do agents dream of data models?
Founder stage · Caitlin Colgrove
Claude and Sol the trophy tomato
Builder stage · Martin DeVido
10:00AM – 10:45AM
How we Claude Code (Workshop)
Workshops · Thariq Shihipar
11:00AM – 11:45AM
Ship your first Managed Agent (Workshop)
Workshops · Gagan Bhat
11:30AM – 12:00PM
Evals for subjective, stateful agents
Builder stage · Yikai Zhu, Ajay Arasanipalai
12:30PM – 02:00PM
Lunch
Afternoon sessions
12:00PM – 12:45PM
Agents that remember (Workshop)
Workshops · Tina Vachovsky
01:00PM – 01:45PM
Eval-driven agent development (Workshop)
Workshops · Felix Becker
02:00PM – 02:45PM
Compose multi-agent systems with Skills and MCP (Workshop)
Workshops · Tanveer Mittal
02:50PM – 03:20PM
03:00PM – 03:45PM
03:35PM – 04:05PM
Evening
06:00PM – 08:00PM
Closing reception