Evals for Taste: Hill-Climbing a Slide-Generation Agent

How a rubric-driven, replayable eval system was built from real user projects, returning quality, cost, latency, error, and token signals in under 6 hours per model change, and how it evolved into a development flywheel powered by real user dissatisfaction signals.

Details

City
London, UK
Date
20 May 2026
Time
1:00PM – 1:45PM
Speaker(s)
Jiri De Jonghe
Member of Technical Staff, Anthropic

Agenda

08:00AM – 09:00AM
Check-in and breakfast
9:30AM – 9:45AM
Morning sessions
10:00AM – 10:30AM
10:00AM – 10:45AM
How we Claude Code
Workshops · Arnaud Doko
11:00AM – 11:45AM
Ship your first Managed Agent
Workshops · Isabella He
12:00PM – 12:45PM
Agents that Remember
Workshops · Kevin Chen
12:00PM – 01:20PM
Lunch
Afternoon sessions
Evening
03:45PM – 06:00PM
Closing reception