Eval-driven agent development

Start from a working-but-flaky agent. Write a 10-case eval set, run it, watch it fail, iterate on the system prompt, run again, and watch the score move. The session ends with the eval wired in as a CI gate so regressions can't ship.
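
As a rough illustration of that loop, here is a minimal sketch in Python. It is not the session's actual material: the names EvalCase, run_evals, ci_gate and stub_agent, the substring-match scoring rule, and the 0.9 threshold are all assumptions chosen for the example, and the two cases stand in for a full 10-case set.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    prompt: str    # input sent to the agent
    expected: str  # substring the reply must contain to count as a pass


def run_evals(agent: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Return the fraction of cases whose reply contains the expected substring."""
    if not cases:
        raise ValueError("eval set is empty")
    passed = sum(
        1 for case in cases
        if case.expected.lower() in agent(case.prompt).lower()
    )
    return passed / len(cases)


def ci_gate(score: float, threshold: float) -> int:
    """Map a score to a process exit code: 0 passes the build, 1 fails it."""
    return 0 if score >= threshold else 1


def stub_agent(prompt: str) -> str:
    # Hypothetical stand-in for the real agent call (model plus system prompt).
    return "4" if "2 + 2" in prompt else "I am not sure."


# Illustrative cases only; a real set would cover the agent's actual tasks.
CASES = [
    EvalCase("What is 2 + 2?", "4"),
    EvalCase("What is the capital of France?", "Paris"),
]


def main() -> int:
    score = run_evals(stub_agent, CASES)
    print(f"eval score: {score:.0%}")
    return ci_gate(score, threshold=0.9)  # assumed threshold

# In a CI job, the script would end with: raise SystemExit(main())
```

With the stub agent the score stays below the assumed threshold, so the gate returns a failing exit code. Changing the system prompt behind the agent and re-running is what moves the score.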


Agenda

08:00AM – 09:00AM
Check in and breakfast
10:00AM – 10:30AM
Morning break
Morning sessions
12:30PM – 02:00PM
Lunch
Afternoon sessions
Evening
06:00PM – 08:00PM