Eval-driven agent development

Start from a working-but-flaky agent. Write a 10-case eval set, run it, watch it fail, iterate on the system prompt, run it again, and watch the score move. The session ends with the eval wired in as a CI gate so regressions can't ship.
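As a rough illustration of the loop described above, here is a minimal sketch of an eval set, a scoring function, and a pass/fail gate. It is not the session's actual material: `EvalCase`, `run_evals`, `gate`, `stub_agent`, the substring check, and the 0.9 threshold are all hypothetical names and choices made for this example, and only two of the ten cases are shown.

```python
from typing import Callable, NamedTuple


class EvalCase(NamedTuple):
    prompt: str
    expected: str  # substring the agent's answer must contain (assumed grading rule)


def run_evals(agent: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Return the fraction of cases whose output contains the expected text."""
    passed = sum(
        1 for case in cases
        if case.expected.lower() in agent(case.prompt).lower()
    )
    return passed / len(cases)


def gate(score: float, threshold: float) -> int:
    """Exit code for CI: 0 if the score meets the threshold, 1 otherwise."""
    return 0 if score >= threshold else 1


def stub_agent(prompt: str) -> str:
    # Hypothetical stand-in for the flaky agent; a real one would call a model.
    return "4" if "2 + 2" in prompt else "I am not sure."


# Two illustrative cases; a real set would hold all ten.
CASES = [
    EvalCase("What is 2 + 2?", "4"),
    EvalCase("What is the capital of France?", "Paris"),
]


def main() -> int:
    score = run_evals(stub_agent, CASES)
    print(f"pass rate: {score:.0%}")
    return gate(score, threshold=0.9)  # assumed threshold
```

In a CI job, the script would finish with `sys.exit(main())`, so a score below the threshold fails the build. Improving the system prompt and rerunning should then move the pass rate toward the threshold.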

Agenda

08:00AM – 09:00AM

Check in and breakfast

10:00AM – 10:30AM

Morning break

Morning sessions

12:30PM – 02:00PM

Lunch

Afternoon sessions

Evening

06:00PM – 08:00PM

Anthropic's developer conference

Join us for a day of hands-on workshops, live demos of new capabilities and conversations with the teams behind Claude. Watch live from anywhere, or apply for an in-person seat in San Francisco, London, or Tokyo.