Evals for subjective, stateful agents

The speakers describe building a rubric-driven, replayable eval system from real user projects that reports quality, cost, latency, error, and token signals in under 6 hours per model change. The system evolved into a development flywheel powered by real user dissatisfaction signals.

Details

City
San Francisco
Date
May 7, 2026
Time
11:30AM – 12:00PM
Speakers
Yikai Zhu
Software Engineer, Descript
Ajay Arasanipalai
AI Researcher, Descript

Agenda

Demos and office hours run all day. Drop by between sessions. All times in Pacific Time (PT).
08:00AM – 09:00AM
Check in and breakfast
09:00AM – 09:45AM
Keynote
Claude Code
·
Boris Cherny
10:00AM – 10:30AM
Morning break
Morning sessions
10:00AM – 10:30AM
Do agents dream of data models?
Founder stage
·
Caitlin Colgrove
Claude and Sol the trophy tomato
Builder stage
·
Martin DeVido
10:00AM – 10:45AM
How we Claude Code
(Workshop)
Workshops
·
Thariq Shihipar
11:00AM – 11:45AM
Ship your first Managed Agent
(Workshop)
Workshops
·
Gagan Bhat
11:30AM – 12:00PM
Evals for subjective, stateful agents
Builder stage
·
Yikai Zhu
Ajay Arasanipalai
12:30PM – 02:00PM
Lunch
Afternoon sessions
12:00PM – 12:45PM
Agents that remember
(Workshop)
Workshops
·
Tina Vachovsky
01:00PM – 01:45PM
Eval-driven agent development
(Workshop)
Workshops
·
Felix Becker
02:00PM – 02:45PM
Compose multi-agent systems with Skills and MCP
(Workshop)
Workshops
·
Tanveer Mittal
02:50PM – 03:20PM
03:00PM – 03:45PM
03:35PM – 04:05PM
Evening
06:00PM – 08:00PM
Closing reception