Evals for Taste: Hill-Climbing a Slide-Generation Agent

"Build better evals" is the most repeated advice in AI engineering. The hard part is doing it when the output is a slide deck. In 45 minutes you'll wire up a Managed Agent that generates decks, score it against SlidesBench, and iterate the prompt based on what fails. You'll leave knowing how to turn "this looks bad" into a number you can move, with a working eval loop to prove it.

Details

City

San Francisco, USA

Date

May 7, 2026

Time

01:00PM – 01:45PM

Session type

Workshop

Speaker(s)

Felix Becker

Member of Technical Staff, Anthropic

Anthropic's developer conference, recorded

Keynotes, demos, and conversations with the teams behind Claude. Recorded at Code w/ Claude 2026 San Francisco and ready to replay.