You should build evaluation engineers instead of evaluation sets
How to design evals for long-running agent harnesses that Claude Code can hill-climb. Current evaluation approaches are insufficient for agents that work for hours on complex tasks.
Details
City
San Francisco
Date
May 7, 2026
Time
10:45AM – 11:15AM
Speakers
Anker Bach Ryhl
Parahelp
Agenda
Demos and office hours run all day. Drop by between sessions. All times in Pacific Time (PT).
08:00AM – 09:00AM
Check in and breakfast
09:00AM – 09:45AM
10:00AM – 10:30AM
Morning break
Morning sessions
10:00AM – 10:30AM
10:00AM – 10:45AM
10:45AM – 11:15AM
11:00AM – 11:45AM
11:30AM – 12:00PM
12:30PM – 02:00PM
Lunch
Afternoon sessions
12:00PM – 12:45PM
01:00PM – 01:45PM
01:20PM – 01:50PM
postvisit.ai - How a practicing cardiologist built a working patient followup platform in 7 days
Builder stage · Michal Nedoszytko
02:00PM – 02:45PM
02:05PM – 02:35PM
13 years to 24 hours: How I used Claude to solve a problem I had no business solving
Builder stage · Philip Parkinson
02:50PM – 03:20PM
Claude can code. Experts still matter: Building BioKEA's biodiversity stack
Founder stage · Sean Jungbluth
03:00PM – 03:45PM
03:35PM – 04:05PM
Evening
06:00PM – 08:00PM
Closing reception
Anthropic's developer conference
Join us for a day of hands-on workshops, live demos of new capabilities and conversations with the teams behind Claude. Watch live from anywhere, or apply for an in-person seat in San Francisco, London, or Tokyo.