Evals for Taste: Hill-Climbing a Slide-Generation Agent
"Build better evals" is the most repeated advice in AI engineering. The hard part is doing it when the output is a slide deck. In 45 minutes you'll wire up a Managed Agent that generates decks, score its output against SlidesBench, and iterate on the prompt based on what fails. You'll leave knowing how to turn "this looks bad" into a number you can move, with a working eval loop to prove it.
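The loop described above has four steps: generate, score, fix what fails, and keep a change only if the number goes up. It can be sketched in a few lines. This is a minimal illustration under stated assumptions. `fake_agent` is a hypothetical stand-in for the Managed Agent. The two rubric checks (`has_title`, `not_crowded`) and their prompt fixes are invented for the example. Nothing here reproduces SlidesBench's actual criteria or API.

```python
from typing import Callable, Dict, List, Tuple

# A deck is modeled as a list of slides; each slide is (title, bullets).
Slide = Tuple[str, List[str]]
Deck = List[Slide]


def fake_agent(prompt: str, topic: str) -> Deck:
    """Hypothetical stand-in for the deck-generating agent.

    It reacts to phrases in the prompt so the loop has something to move.
    """
    n_bullets = 3 if "max 3 bullets" in prompt else 7
    slides: Deck = []
    for i in range(4):
        title = f"{topic} part {i + 1}" if "title every slide" in prompt else ""
        slides.append((title, [f"point {j}" for j in range(n_bullets)]))
    return slides


# Hypothetical rubric: each "this looks bad" complaint becomes a per-slide pass/fail check.
CHECKS: Dict[str, Callable[[Slide], bool]] = {
    "has_title": lambda s: bool(s[0]),
    "not_crowded": lambda s: len(s[1]) <= 5,
}

# Hypothetical prompt edit to try when a given check fails most often.
FIXES: Dict[str, str] = {
    "has_title": "title every slide",
    "not_crowded": "max 3 bullets",
}


def evaluate(prompt: str, topics: List[str], agent) -> Tuple[float, Dict[str, int]]:
    """Return the fraction of passing checks and the failure count per check."""
    passed, total = 0, 0
    fails = {name: 0 for name in CHECKS}
    for topic in topics:
        for slide in agent(prompt, topic):
            for name, check in CHECKS.items():
                total += 1
                if check(slide):
                    passed += 1
                else:
                    fails[name] += 1
    return passed / total, fails


def hill_climb(prompt: str, topics: List[str], agent, rounds: int = 5):
    """Greedily apply the fix for the most frequent failure; keep it only if the score rises."""
    best_score, fails = evaluate(prompt, topics, agent)
    history = [best_score]
    for _ in range(rounds):
        worst = max(fails, key=fails.get)
        if fails[worst] == 0:
            break  # nothing left to fix
        candidate = prompt + "\n" + FIXES[worst]
        score, cand_fails = evaluate(candidate, topics, agent)
        if score <= best_score:
            break  # the edit did not move the number, so stop
        prompt, best_score, fails = candidate, score, cand_fails
        history.append(best_score)
    return prompt, history


final_prompt, history = hill_climb("Make a slide deck.", ["evals", "agents"], fake_agent)
print(history)  # [0.0, 0.5, 1.0]
```

Each vague complaint ("too crowded", "no titles") becomes a named pass/fail check. The per-check failure counts then point to the next prompt edit to try, and the aggregate score decides whether that edit is kept.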
Details
City
San Francisco, USA
Date
May 7, 2026
Time
01:00PM – 01:45PM
Session type
Workshop
Speaker(s)
Felix Becker
Anthropic
Anthropic's developer conference, recorded
Keynotes, demos, and conversations with the teams behind Claude. Recorded at Code w/ Claude 2026 San Francisco and ready to replay.