Evals for subjective, stateful agents

Built a rubric-driven, replayable eval system from real user projects that delivers quality, cost, latency, error, and token signals in under 6 hours per model change. It has since evolved into a development flywheel powered by real user dissatisfaction signals.

Details

City
San Francisco, USA
Date
May 7, 2026
Time
11:30AM – 12:00PM
Speaker(s)
Yikai Zhu
Software Engineer, Descript
Ajay Arasanipalai
AI Researcher, Descript