Articles

Articles on AI, coaching, product strategy, and building a business that matters.

7 posts tagged "Evals"

Inside a Claude Skill: The Eight Steps That Build the Red Dot skill

AI · Jun 9, 2026

I open the hood on the Claude Skill that builds mirror assessments — the eight-step pipeline, the single spec it compiles, and the checker that won't ship broken.

AI · Product Work · Claude Skills · Evals

How to Tell If Your Scoreboard Is Lying to You

AI · May 30, 2026

A pizza shop's flyer contest shows why your AI scoreboard can climb while the work gets worse, and why a measure that can't be gamed is the real job.

AI · Product Work · Evals

The Key Comes Before the Song

AI · May 22, 2026

When you build with AI doing the content generation, the obvious move is QA at the end. But if that's your whole strategy, you sat down at the piano without picking a key. Here's how I'm thinking about evals for a new product — and why guardrails belong upstream, not downstream.

AI · Product Work · Evals

The 4 scans I run before I'm done with any AI-assisted project

AI · May 14, 2026

There are no magic prompts. But there are four scans I run after every first pass on AI-assisted code: race conditions, concurrency, idempotency, and dead code. Each one catches issues you'd otherwise debug months later.

AI · Product Work · Evals

The harness is the craft.

AI · May 10, 2026

Most engineers pick a model they trust, eyeball a couple of runs, and ship. I don't ship that way. Here's what eval-driven development actually looks like, and the seven principles I'd hand to anyone shipping LLM systems.

AI · Product Work · Evals

The 4-Part Loop That Eliminates AI Slop (in Your Apps and Your Content)

AI · Apr 17, 2026

If your AI output keeps hitting a ceiling, the fix isn't better prompting; it's a scoring loop with an independent judge. Here's the 4-part pattern, with the prompt I use.

AI · Evals

Why I Never Let AI Grade Its Own Work

AI · Apr 4, 2026

Most people accept their first AI draft because it exceeded their expectations. But your expectations of AI aren't your standards. Here's what happened when I made Claude Code and Codex evaluate each other using my own voice profile and audience segments as the rubric.

AI · Claude Code · Evals
