March 30, 2026

The Hardest Thing in AI Right Now Isn't AI

AI feedback loops can't improve without rubrics and quality criteria extracted from human expertise. The real bottleneck isn't technical.

Back when this all started, we were just thrilled that AI could produce text better than what we'd scribbled down.

We weren't trying for consistency. We weren't setting standards. We were marveling at the fact that you could type a few sentences and get back something that sounded like a real person wrote it. Maybe even better than what we would have written ourselves.

That was fine. For a while.

The Shift Nobody Noticed

Somewhere along the way, the conversation changed. People started talking about agents. Agent collaboration. Harness decisions. Model selection. Which orchestration framework. Which feedback loop architecture.

And all of that matters. I'm not saying it doesn't.

But here's what I keep noticing: teams are building increasingly sophisticated systems that have no idea what "good" looks like.

They've got agents that can iterate on their own work. They've got feedback loops that can score output and try again. They've got harness architectures that route tasks to the right model at the right time.

And the output is still inconsistent. Still mediocre. Still missing something that the team can feel but can't name.

The feedback loop spins. But it spins empty.

The Wrong Bottleneck

Here's the thing everyone in AI wants to avoid saying: the hardest problem right now isn't technical.

It's not agents. It's not model selection. It's not which harness you pick or how you configure the collaboration between multiple models. There are new tools coming every week that handle those things. Hermes Agent, for example, builds the feedback loop right into the agent harness automatically. The plumbing is getting solved.

What's not getting solved is the question the plumbing depends on: what are we evaluating against?

Think about it. A feedback loop needs criteria. A scoring function needs a rubric. An agent that's supposed to improve its own output needs to know what "improved" means. Without quality criteria, AI agent evaluation is just a loop chasing its own tail. And right now, in most organizations, that knowledge exists in exactly one place.

People's heads.

Your head. My head. The heads of the people on our teams who've spent years developing judgment about what good work looks like in their domain.

We've internalized it. We know a good design when we see it. We know a solid architecture when we review it. We know effective copy when we read it. We know a clean data pipeline when we trace through it.

But we've never had to write it down. We never had to externalize it. Because before AI, we were the evaluation function. We looked at the work, applied our judgment, and said "this needs to be better" or "this is ready."

AI can write. It just can't think. And it definitely can't evaluate without criteria that came from someone who knows the domain.

Now the agent needs to be that evaluation function. And it can't read our minds.

Where the Difficulty Actually Lives

The real bottleneck in AI right now sits outside the code and outside the models.

It's in us.

I've actually been thinking about this problem for a long time. One of my first startups, back in the late 1990s, was a company called CKO, named for "chief knowledge officer." We were predicting that every company would need someone in that role, and an internal platform to capture corporate knowledge and expertise. We were early. Really early. But the core problem we were trying to solve is the same one that's stalling AI adoption right now: the knowledge that matters most lives inside people, and nobody's built a good system for getting it out.

That sounds like a criticism. It's not. It's actually the most encouraging thing I can tell you if your expertise has been making you nervous about where AI is heading.

Because here's what it means: the thing that makes AI work well isn't better prompting or smarter agents or more expensive models. It's the knowledge extraction that humans have to do, pulling their expertise out of their own heads so the system has something to work with.

Rubrics. Frameworks. Quality criteria. Scoring definitions. The difference between a 6 and an 8, written down clearly enough that a machine can act on it.

That's the hard work. And it's work that only someone with deep domain expertise can do.

You can't build a rubric for evaluating code quality if you've never written and reviewed thousands of lines of code. You can't define what "good" technical writing looks like if you haven't spent a decade learning what makes documentation actually useful. You can't articulate the difference between a solid data architecture and a fragile one if you haven't built and broken both.

The AI doesn't have that knowledge. It never will. It needs someone to extract it and put it into a form the feedback loop can use.

That someone is the person who's spent 10 or 15 years building deep expertise in their domain.

What Extraction Looks Like

So what do you actually do with this?

Start with what you reject. This is the trick most people miss when externalizing expertise. Articulating what "good" looks like from scratch is brutal. Your brain doesn't work that way. But show someone a piece of work and ask them what's wrong with it? That's easy. We do that all day.

So start there. Collect the things you reject. The output that isn't good enough. Then ask yourself: why? What specifically is wrong? Write that down. Every time.

After a while, patterns emerge. The reasons you reject work start to cluster. Those clusters are your quality criteria.

Then flip it. If you reject work that lacks specificity, then specificity is a criterion. If you reject work that sounds generic, then distinctiveness is a criterion. If you reject work that misses the audience, then audience alignment is a criterion.

Now you've got a rubric. Not a perfect one. But a real one, built from your actual expertise, not from some template someone downloaded.
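The rejection-to-rubric process above can be sketched in a few lines of code. Everything here is hypothetical: the rejection notes, the keyword patterns, and the criterion names are stand-ins for whatever your own rejection log actually contains.

```python
# A minimal sketch of turning logged rejection reasons into rubric criteria.
# All notes, keywords, and criterion names below are illustrative examples.
from collections import Counter

# Each entry: a note you wrote down when you rejected a piece of work.
rejection_log = [
    "too vague, no concrete example",
    "reads like anything anyone could have written",
    "ignores who the reader actually is",
    "no specifics, just abstractions",
    "generic phrasing, no point of view",
    "wrong level of detail for this audience",
]

# The "flip": map recurring complaint patterns to candidate criteria.
patterns = {
    "specificity": ["vague", "no specifics", "abstractions"],
    "distinctiveness": ["generic", "anyone could", "point of view"],
    "audience_alignment": ["reader", "audience"],
}

def cluster_rejections(log, patterns):
    """Count how often each candidate criterion explains a rejection."""
    counts = Counter()
    for note in log:
        for criterion, keywords in patterns.items():
            if any(k in note for k in keywords):
                counts[criterion] += 1
    return counts

clusters = cluster_rejections(rejection_log, patterns)
# The reasons that keep showing up are your rubric.
rubric = [criterion for criterion, n in clusters.most_common() if n >= 2]
print(rubric)
```

A real version would replace the keyword matching with your own judgment, or with a model asked to group your notes. The structure is the point: rejections in, clustered criteria out.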

And here's what happens when you feed that rubric into your AI feedback loop: the loop actually works. The agent has something to evaluate against. The scoring function produces meaningful scores. The iterations improve in directions that matter.

The output goes from "technically correct but somehow wrong" to "this is actually what I was looking for."

That's the difference between a feedback loop that spins empty and one that produces real improvement.
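What "feeding the rubric into the loop" might look like, as a sketch. The scorer and reviser here are faked with string checks; in a real harness each would be a model call, with the criterion text in the prompt. The rubric entries, threshold, and function names are all assumptions for illustration.

```python
# A minimal sketch of a rubric-driven feedback loop. score() and revise()
# are hypothetical stand-ins for model calls in a real agent harness.

RUBRIC = {
    "specificity": "Does it use concrete examples, not abstractions?",
    "distinctiveness": "Does it carry a clear point of view?",
    "audience_alignment": "Is the level of detail right for the reader?",
}

def score(draft, rubric):
    """Stand-in for an LLM judge scoring each criterion 0-10."""
    # Faked with string checks for the sketch; a real loop would prompt
    # a model with the criterion text and the draft.
    return {
        "specificity": 8 if "for example" in draft.lower() else 4,
        "distinctiveness": 8 if "i think" in draft.lower() else 4,
        "audience_alignment": 7,  # fixed score, for the sketch only
    }

def revise(draft, weak_criteria):
    """Stand-in for the model's revision step on the weak criteria."""
    if "specificity" in weak_criteria:
        draft += " For example, ship one rubric this week."
    if "distinctiveness" in weak_criteria:
        draft += " I think most teams skip this step."
    return draft

def feedback_loop(draft, rubric, threshold=6, max_iters=3):
    """Score the draft, revise the weak criteria, repeat."""
    for _ in range(max_iters):
        scores = score(draft, rubric)
        weak = [c for c, s in scores.items() if s < threshold]
        if not weak:
            break
        draft = revise(draft, weak)
    return draft, score(draft, rubric)

final, scores = feedback_loop("Write down what you know.", RUBRIC)
print(scores)
```

Notice that the loop's machinery is trivial. All of the value is in RUBRIC and in the judge that interprets it, which is exactly the part only a domain expert can supply.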

Your Evolved Role

Here's what I think this means for people who've spent years building expertise in a specific domain.

The conversation about AI keeps framing expertise as the thing that's getting replaced. The QA engineer who can be replaced by AI testing tools. The technical writer who can be replaced by AI documentation generators. The data engineer who can be replaced by AI pipeline builders.

But that framing misses the real story.

Those tools can do the execution. Increasingly well, actually. But they can't define what good execution means. They can't build the rubric. They can't set the criteria. They can't look at output and say "this is a 4 and it needs to be an 8, and here's specifically why."

That's what domain expertise does. And that's the work that doesn't get automated, because it requires exactly the kind of judgment that only comes from years of doing the work and developing a feel for what quality means.

The evolved role isn't "person who does the work." It's "person who defines what good work looks like." I've written before about the four levels of AI work, and this sits at the highest level: not doing, not delegating, not orchestrating, but defining. It's the person who extracts their internalized standards into something an agent can evaluate against. The person who builds and maintains the rubrics. The person who notices when the criteria need to evolve because the domain is changing.

That's not a smaller role. That's a bigger one.

I've been working through this myself. The content system I use, built around tools like YourVoiceProfile.com, YourAudienceSegments.com, and YourContentAgent.com, exists because I had to externalize what I know about creating content that sounds like me. Voice profiles, audience definitions, quality criteria, scoring rubrics. All of it extracted from my head into forms that an AI can actually use.

It was hard. Not technically hard. The tools are getting easier every day. It was hard because I had to articulate things I'd never articulated before. Standards I'd always just felt. Patterns I'd always just recognized.

But once I did that extraction, the feedback loop had something to work with. The quality went up. The consistency went up. And I was free to focus on the things only I can do: form opinions, tell stories, connect ideas nobody else is connecting.

The Real Work Starts Now

The hardest thing in AI right now isn't in AI.

It's in the space between what you know and what you've written down. The gap between your internalized standards and the rubrics that would let a system act on them.

Every week, another harness gets smarter. Another agent framework gets released. Another feedback loop architecture gets published. The plumbing is getting solved.

But the plumbing only works when there's something to flow through it. And that something is your expertise, externalized.

So the most valuable thing you can do right now isn't pick another model or learn another framework or evaluate another agent harness. It's sit down and start writing down what you know. What you accept. What you reject. Why.

That's the hard work. And you're the only one who can do it.


About the Author

Chris Lema has spent twenty-five years in tech leadership, product development, and coaching. He builds AI-powered tools that help experts package what they know, build authority, and create programs people pay for. He writes about AI, leadership, and motivation.
