Evals for Taste: Hill-Climbing a Slide-Generation Agent

"Build better evals" is the most repeated advice in AI engineering. The hard part is doing it when the output is a slide deck. In 45 minutes you'll wire up a Managed Agent that generates decks, score it against SlidesBench, and iterate the prompt based on what fails. You'll leave knowing how to turn "this looks bad" into a number you can move, with a working eval loop to prove it.

Details

City
San Francisco, USA
Date
May 7, 2026
Time
1:00 PM – 1:45 PM
Session type
Workshop
Speaker(s)
Felix Becker
Member of Technical Staff, Anthropic