Articles

Articles on AI, coaching, product strategy, and building a business that matters.

9 posts tagged "Evals"

I Created a Coding Harness Without Realizing It

AI · Jul 7, 2026

I sat down to learn Mastra with my Claude Code habits and my Cloudflare opinions in my back pocket. A few agents and a workflow later, I'd accidentally built a full coding harness. Here are the seven rules I refused to bend — and what each one looks like in running, open-source code.

AI · Agentic Software · Evals · Agents

The AI Interviewed a Woman Who Doesn't Exist. That Was the Point.

AI · Jul 5, 2026

I spent a weekend building a sales executive who doesn't exist, handed her fifteen secrets, and let an AI try to pull them out. The fake expert was the point: it's the first time an interview had an answer key, so it could earn a score instead of a compliment. Here's what testing an AI honestly actually costs.

AI · Agentic Software · Encoding Expertise · Evals

Inside a Claude Skill: The Eight Steps That Build the Red Dot skill

AI · Jun 9, 2026

I open the hood on the Claude Skill that builds mirror assessments — the eight-step pipeline, the single spec it compiles, and the checker that won't ship broken.

AI · Product Work · Claude Skills · Evals

How to Tell If Your Scoreboard Is Lying to You

AI · May 30, 2026

A pizza shop's flyer contest shows why your AI scoreboard can climb while the work gets worse - and why a measure that can't be gamed is the real job.

AI · Product Work · Evals

The Key Comes Before the Song

AI · May 22, 2026

When you build with AI doing the content generation, the obvious move is QA at the end. But if that's your whole strategy, you sat down at the piano without picking a key. Here's how I'm thinking about evals for a new product — and why guardrails belong upstream, not downstream.

AI · Product Work · Evals

The 4 scans I run before I'm done with any AI-assisted project

AI · May 14, 2026

There are no magic prompts. But there are four scans I run after every first pass on AI-assisted code: race conditions, concurrency, idempotency, and dead code. Each one catches issues you'd otherwise debug months later.

AI · Product Work · Evals

The harness is the craft.

AI · May 10, 2026

Most engineers pick a model they trust, eyeball a couple of runs, and ship. I don't ship that way. Here's what eval-driven development actually looks like, and the seven principles I'd hand to anyone shipping LLM systems.

AI · Product Work · Evals

The 4-Part Loop That Eliminates AI Slop (in Your Apps and Your Content)

AI · Apr 17, 2026

If your AI output keeps hitting a ceiling, the fix isn't better prompting; it's a scoring loop with an independent judge. Here's the 4-part pattern, with the prompt I use.

AI · Evals

Why I Never Let AI Grade Its Own Work

AI · Apr 4, 2026

Most people accept their first AI draft because it exceeded their expectations. But your expectations of AI aren't your standards. Here's what happened when I made Claude Code and Codex evaluate each other using my own voice profile and audience segments as the rubric.

AI · Claude Code · Evals
