Evals for taste: Hill-climbing a slide-generation agent

A rubric-driven, replayable eval system built from real user projects, reporting quality, cost, latency, error, and token signals in under 6 hours per model change, and how it evolved into a development flywheel powered by real user dissatisfaction signals.

Details

City

Tokyo, JP

Date

11 June 2026

Time

13:00 – 13:45

Speaker(s)

Jonah Dueck

Member of Technical Staff, Anthropic

Language

English

Agenda

Demos and office hours run all day. Drop by for a demo between sessions. Sign up in advance for office hours. All times in Japan Standard Time (JST).

A note on language. Sessions run primarily in English with some in Japanese (marked on the agenda), and live simultaneous interpretation is available in both directions throughout the event. Office hours are held in English.

08:00 – 09:00

Check-in and breakfast

09:30 – 09:45

Community general session

(Founder stage)

Boris Cherny

Anthropic

Morning sessions

10:00 – 10:30

What happens when domain experts can finally build

(Builder stage)

Jason Tangen

University of Queensland

Building AI-native across industries with NTT, Mizuho and Mercari

(Founder stage)

Carlos Donderis

Mercari

Takashi Ebihara

NTT Inc

Tatsuto Fujii

Mizuho Financial Group, Inc.

10:00 – 10:45

How we Claude Code

(Workshop)

Jason Schwartz

Anthropic

10:45 – 11:15

From Claude prototype to production: How Myrealtrip builds and ships AI workflows

(Founder stage)

Wonjin Hur

Myrealtrip

Code for loved ones: Building customized software to bypass language barriers with my girlfriend

(Builder stage)

Siyuan Yan (Matt)

Kyushu University

11:00 – 11:45

Ship your first Managed Agent

(Workshop)

Koki Yoshida

Anthropic

11:30 – 12:00

The 1% problem: How domain expertise + Claude let a 2-person team hit #1 on a global classification benchmark

(Founder stage)

Gahee Seo

Federation

How to integrate Claude into your daily life

(Builder stage, presented in Japanese)

Yuta Hayashi

Determinant, Inc.

12:00 – 12:45

Agents that remember

(Workshop)

Sam Jiang

Anthropic

11:45 – 13:30

Lunch

Afternoon sessions

13:00 – 13:45

Evals for taste: Hill-climbing a slide-generation agent

(Workshop)

Jonah Dueck

Anthropic

13:15 – 13:45

The last mile is the spec

(Founder stage, presented in Japanese)

Hitoshi Tsuyuki

Tsukumo Labs Inc.

Sumiki Hori

Tsukumo Labs Inc.

Same model, three different worlds: Japan, India, Australia

(Builder stage)

Jason Bigman

Anthropic

Rye Smith

Spruik Co.

Sumeet Doshi

HaemoLink

Yuta Hayashi

Determinant, Inc.

14:00 – 14:30

Code less, query more with sub-agents and data

(Founder stage, presented in Japanese)

Kenta Yamamoto

primeNumber Inc.

How I built a legal platform for 280 million people at the Claude Code Hackathon

(Builder stage)

Ilham Firdausi Putra

Pasal.id

14:00 – 14:45

Tool, skill, or subagent? Decomposing an agent that outgrew its prompt

(Workshop)

Karan Sampath

Anthropic

14:45 – 15:15

Manager 11: How I finally shipped my dream football management game with Claude

(Founder stage)

Rye Smith

Spruik Co.

How I shipped a life-saving app with Claude Code

(Builder stage)

Sumeet Doshi

HaemoLink

15:00 – 15:45

Agent Battle: Mine the most diamonds in 45 minutes

(Workshop)

Liam Plambeck

Anthropic

15:30 – 16:00

Rewriting Bun in Rust

(Founder stage)

Jarred Sumner

Anthropic

Evening

15:45 – 18:00

Closing reception

Anthropic's developer conference

Join us for a day of hands-on workshops, live demos of new capabilities and conversations with the teams behind Claude. Watch live from anywhere.