You should build evaluation engineers instead of evaluation sets

How to design evals for long-running agent harnesses that Claude Code can hill-climb. Current evaluation approaches fall short for agents that work for hours on complex tasks.

Details

City

San Francisco

Date

May 7, 2026

Time

10:45AM – 11:15AM

Speakers

Anker Bach Ryhl

CEO & Co-founder, Parahelp

Agenda

Demos and office hours run all day, so drop by between sessions. All times are in Pacific Time (PT).

08:00AM – 09:00AM

Check in and breakfast

09:00AM – 09:45AM

Keynote

Claude Code

Boris Cherny

10:00AM – 10:30AM

Morning break

Morning sessions

10:00AM – 10:30AM

Do agents dream of data models?

Founder stage

Caitlin Colgrove

Claude and Sol the trophy tomato

Builder stage

Martin DeVido

10:00AM – 10:45AM

How we Claude Code

(Workshop)

Workshops

Thariq Shihipar

10:45AM – 11:15AM

You should build evaluation engineers instead of evaluation sets

Founder stage

Anker Bach Ryhl

Designing multi-agent systems: When to split, when to sandbox, what to ship

Builder stage

Nick Khami

11:00AM – 11:45AM

Ship your first Managed Agent

(Workshop)

Workshops

Gagan Bhat

11:30AM – 12:00PM

How we got here: Founders on AI bets, builds, and predictions

Founder stage

Clay

Silvia

Evals for subjective, stateful agents

Builder stage

Yikai Zhu

Ajay Arasanipalai

12:30PM – 02:00PM

Lunch

Afternoon sessions

12:00PM – 12:45PM

Agents that remember

(Workshop)

Workshops

Tina Vachovsky

01:00PM – 01:45PM

Eval-driven agent development

(Workshop)

Workshops

Felix Becker

01:20PM – 01:50PM

Measure twice, cut once: Closing the gap between AI intent and execution

Founder stage

David Loker

postvisit.ai - How a practicing cardiologist built a working patient followup platform in 7 days

Builder stage

Michal Nedoszytko

02:00PM – 02:45PM

Compose multi-agent systems with Skills and MCP

(Workshop)

Workshops

Tanveer Mittal

02:05PM – 02:35PM

Listen first. Then, keep thinking

Founder stage

Mike Brown

13 years to 24 hours: How I used Claude to solve a problem I had no business solving

Builder stage

Philip Parkinson

02:50PM – 03:20PM

Claude can code. Experts still matter: Building BioKEA's biodiversity stack

Founder stage

Sean Jungbluth

Coherence at Claude Code speed

Builder stage

Jon McBee

03:00PM – 03:45PM

Agent Battle: Mine the most diamonds in 45 minutes

(Workshop)

Workshops

Matt Roknich

03:35PM – 04:05PM

Claude & the self-driving company

Founder stage

Nicolai Ouporov

Era Online: Resurrecting a 1999 MMORPG with Claude Code

Builder stage

Kyle Easterly

Evening

06:00PM – 08:00PM

Closing reception

Anthropic's developer conference

Join us for a day of hands-on workshops, live demos of new capabilities and conversations with the teams behind Claude. Watch live from anywhere, or apply for an in-person seat in San Francisco, London, or Tokyo.