There Are Two Kinds of AI Work. What If You’re Missing One?


Twenty years ago, we shut down a company. Not because the technology was wrong — because the interface was.

We'd built something we called a “conceptual compiler.” The idea was elegant: teach customers to express software as rules, constraints, and context using predicate logic. Our system would transform those abstractions into domain-specific languages, then compile them into running code. You didn't write software. You described what you wanted, and the machine figured out how to build it.

It was a European tech startup, one of six I'd been involved with. The other five either grew or got acquired. This one I had to close. And the reason had nothing to do with whether the approach worked. It worked beautifully — when someone could actually use it.

The problem was the abstraction step. We were asking business people to think in predicate logic. To formalize their intent into structured rules and constraints. To take the messy, intuitive thing they knew about their domain and express it with the precision of a mathematician.

Most people can't do that. They think in stories. They think in examples. They think in “I want it to feel like this” and “here's something similar to what I mean.” Asking them to go from that to formal logic was like asking someone to write sheet music when all they can do is hum the melody. The gap was too wide, and no amount of training or tooling closed it.

I've thought about that failure a lot over the past two decades. And I've been thinking about it even more lately, because the same architectural pattern is suddenly viable again — and most people building with AI are missing it entirely.

The Expensive Mistake Everyone Is Making

Here's what I see happening across the industry right now. Someone decides to build an AI-powered feature. In my world, maybe it's a chatbot that adapts its communication style to different personality types. Maybe it's a content tool that rewrites marketing copy for different audiences. Maybe it's an assessment platform that analyzes narrative responses and maps them to psychological frameworks.

The default approach looks like this: take the user's input, stuff it into a prompt with some instructions, send it to a powerful model at runtime, and hope the response is good enough. Every single interaction hits the API. Every single response requires the model to figure out the rules, understand the context, apply the framework, and generate the output — all in one shot, all in real time.

This works. Sort of. The way a Rube Goldberg machine works. It gets you from A to B, but it's slow, expensive, fragile, and the quality varies wildly from one run to the next.

A single API call to a frontier model like Claude might take six to eight seconds and cost a meaningful amount per interaction. Multiply that across thousands of users, and you're burning money while your users stare at loading spinners.

Worse, you're asking the model to do cognitive work that doesn't need to happen in real time. Every call is re-deriving the same frameworks, re-analyzing the same domain knowledge, re-discovering the same patterns — from scratch.

It's as if you hired a brilliant architect, and instead of having them design the building once, you asked them to redesign it from memory every time someone walked through the front door.

Two Kinds of Intelligence

The insight that changes everything is deceptively simple: there are two fundamentally different kinds of AI work, and they belong in two fundamentally different phases of your system.

Design-time intelligence is the heavy cognitive lifting. Analysis, strategy, framework construction, algorithm design, prompt engineering. This is where you need a model that can hold complex ideas in tension, reason through edge cases, synthesize research into structured outputs, and produce artifacts that encode deep understanding. It's asynchronous. It's cost-tolerant. And it happens before a single user ever touches your product.

Runtime intelligence is execution. The model doesn't need to think deeply — it needs to follow a well-crafted playbook quickly and cheaply. It's synchronous, cost-sensitive, and latency-critical. The quality of its output depends almost entirely on the quality of what it was given at design time.

These are not the same job. They don't require the same model. And conflating them is the architectural sin that's making most AI products slower, more expensive, and less reliable than they need to be.

I use Claude for design-time work. When I need to analyze sixty years of motivation research and produce a structured taxonomy of intrinsic drives, that's a Claude conversation. When I need to design an algorithm that maps narrative language patterns to psychological dimensions, that's a Claude conversation. When I need to create the rules that govern how a communication should be rewritten for someone with a specific motivational profile, that's a Claude conversation.

None of that happens at runtime. None of it needs to.

What happens at runtime is a fast, cheap model — something like Llama 4 Scout — executing against the artifacts that Claude already produced. Scout doesn't need to understand motivation theory. It doesn't need to derive NLP translation rules from first principles. It just needs to follow precise instructions and apply pre-built resources to the input it receives. It can do that in under a second, for a fraction of the cost.

The Two Artifacts

The design-time phase produces two distinct outputs, and understanding the difference between them is critical.

The first is the instruction set — a prompt or directive that tells the runtime model what to do and how to do it. This is the algorithm expressed as natural language. It's the step-by-step logic the model should follow, the decision trees it should navigate, the quality criteria it should apply. Think of it as the program.

The second is the resource file — the context the runtime model needs to execute well. Domain knowledge, taxonomies, examples, reference frameworks, linguistic patterns, vocabulary lists. Everything the fast model would need to “know” but can't figure out on its own in 200 milliseconds. Think of it as the data.

Here's a concrete example. I work with a motivation assessment framework built on decades of research — the kind of research that identifies intrinsic motivational drives through narrative analysis. One application is translating communications to resonate with people who have different motivational profiles.

An “Achiever” — someone driven by personal excellence and recognition — responds to completely different language than a “Relator” — someone driven by connection and collaboration. Same message, same facts, different framing. The Achiever needs to hear about distinction, mastery, and individual impact. The Relator needs to hear about partnership, team success, and shared experience.

At design time, Claude helps me build the resource file. For the Achiever dimension alone, it includes the core motivation (personal excellence, recognition, measurable success), the meta programs (internal reference, options orientation, toward motivation), eight specific translation rules, curated power verbs and linguistic patterns, four framing strategies, and explicit anti-patterns to avoid. The Relator dimension has its own parallel structure with completely different content. So does every other dimension.

This isn't a prompt. It's a comprehensive reference document — structured, indexed, and ready for consumption. The design-time intelligence that went into creating it was substantial. Understanding which linguistic patterns trigger which psychological responses, mapping NLP techniques to motivational dimensions, testing and refining the rules across real communications — that's months of analytical work compressed into a structured artifact.
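
To make the shape concrete, here is a stripped-down sketch of how one dimension might be encoded. Every field name, rule, and word in it is an illustrative placeholder, not the actual framework content.

```python
# Illustrative sketch of one dimension in a resource file.
# All field names, rules, and vocabulary are placeholders,
# not the research-derived content of the real artifact.
ACHIEVER_DIMENSION = {
    "core_motivation": [
        "personal excellence",
        "recognition",
        "measurable success",
    ],
    "meta_programs": {
        "reference": "internal",   # validates against their own standards
        "orientation": "options",  # prefers possibilities over procedures
        "direction": "toward",     # moves toward goals, not away from problems
    },
    "translation_rules": [
        "Frame outcomes as individual accomplishments with measurable results.",
        "Emphasize distinction: what sets this person or result apart.",
        # ...six more rules in the real artifact
    ],
    "power_verbs": ["achieve", "master", "outperform", "excel"],
    "framing_strategies": [
        "benchmark against a visible standard",
        "highlight personal ownership of the win",
        # ...
    ],
    "anti_patterns": [
        "diluting individual credit into vague team language",
        "framing success as merely good enough",
    ],
}
```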

At runtime, the instruction set tells Scout: “Read the resource file. Identify the target dimension. Apply the translation rules. Use the specified linguistic patterns. Avoid the listed anti-patterns. Return the rewritten communication.” Scout doesn't need to understand why these rules work. It just needs to follow them. And it does — quickly, consistently, and cheaply.
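
The instruction set itself can be as plain as a carefully structured system prompt. Here is a simplified sketch of what it might say; the wording is a placeholder, not the production directive.

```python
# Illustrative sketch of the instruction set, expressed as a system prompt.
# Simplified placeholder wording, not the production directive.
INSTRUCTION_SET = """
You rewrite communications for a specific motivational profile.

Steps:
1. Read the resource file provided in context.
2. Identify the target dimension named in the request.
3. Apply every translation rule listed for that dimension.
4. Prefer the dimension's power verbs and linguistic patterns.
5. Never use the listed anti-patterns.
6. Preserve all facts; change only framing and language.

Return only the rewritten communication, with no commentary.
"""
```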

The Compiler Metaphor Isn't a Metaphor

This is where the twenty-year-old failure becomes relevant again. What I'm describing isn't just a useful architectural pattern. It's the same conceptual compiler I tried to build in 2004, with one critical difference.

The original system had three layers:

Intent → Abstraction → Execution

A human expressed their intent. That intent got formalized into abstract rules and constraints. Those abstractions got compiled into running code. The architecture was sound. The failure point was the first arrow — the one between intent and abstraction. Humans had to do the formalization themselves, and they couldn't.

What most people building with AI do today is skip the abstraction layer entirely. They throw raw intent at a runtime model and hope it figures everything out on the fly. Sometimes it works. Often it's mediocre. Always it's more expensive and slower than it needs to be.

The approach I'm describing reintroduces the abstraction layer, but replaces the human formalizer with a model that's actually good at abstraction:

Intent → Claude does the abstraction → Structured artifacts → Fast model executes

Claude is the compiler. The instruction set and resource file are the compiled output. Scout is the runtime. And the human never has to think in predicate logic — they just have conversations with Claude about what they want, and Claude produces the formal artifacts that make it work.

The twenty-year-old architecture was right. The interface was wrong. AI fixed the interface.

What This Looks Like in Practice

Let me walk through the actual workflow, because the tactical details matter as much as the theory.

Step 1: The Design-Time Conversation. This is an extended dialogue with Claude. I'm not writing prompts — I'm thinking out loud. I'll describe what I need the system to do, share examples of inputs and desired outputs, discuss edge cases, push back on Claude's suggestions, refine the approach. This conversation might take an hour. It might take several sessions. The goal isn't a single output — it's a thorough exploration of the problem space.

For the motivation translation system, these conversations covered the research foundations, the differences between dimensions, how NLP techniques map to motivational profiles, what linguistic markers distinguish good translations from bad ones, and dozens of specific examples. Claude was doing real analytical work here — synthesizing research, identifying patterns, proposing rules, and stress-testing them against edge cases.

Step 2: Artifact Production. Once the thinking is solid, Claude produces the two artifacts. The instruction set gets refined through iteration — I'll test it with sample inputs, identify where the runtime model goes wrong, and adjust the instructions until the outputs are consistently good. The resource file gets built out comprehensively, with enough detail that the runtime model has everything it needs and nothing it doesn't.

This is where Claude's strength in structured output really matters. The resource file isn't a wall of text — it's organized into sections that the runtime model can navigate efficiently. Rules are explicit and actionable. Examples are concrete. Anti-patterns are clearly stated. The structure itself is part of the intelligence.
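
One way to make that refinement loop tangible is a small design-time check that scans a candidate rewrite against the resource file's own vocabulary and anti-patterns before the artifacts ship. The sketch below is only illustrative: it treats anti-patterns as literal phrases, where the real rules are richer and need human judgment.

```python
# Rough design-time check: scan a candidate rewrite against a dimension's
# vocabulary and anti-patterns. The dimension dict mirrors the illustrative
# structure sketched earlier; a real check would be more nuanced.

def check_rewrite(rewrite: str, dimension: dict) -> list[str]:
    """Return a list of human-readable problems found in a candidate rewrite."""
    problems = []
    text = rewrite.lower()

    # At least some of the dimension's preferred vocabulary should appear.
    if not any(verb in text for verb in dimension["power_verbs"]):
        problems.append("no power verbs from the target dimension")

    # Anti-patterns are treated here as literal phrases to avoid.
    for phrase in dimension["anti_patterns"]:
        if phrase.lower() in text:
            problems.append(f"contains anti-pattern: {phrase!r}")

    return problems


if __name__ == "__main__":
    achiever = {
        "power_verbs": ["achieve", "master", "outperform", "excel"],
        "anti_patterns": ["good enough"],
    }
    sample = "This quarter you can outperform last year's record."
    print(check_rewrite(sample, achiever) or "consistent with the dimension")
```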

Step 3: Runtime Integration. The instruction set becomes the system prompt. The resource file gets loaded into context. The runtime model receives user input and executes against both artifacts. Response time drops from eight seconds to under one. Cost per interaction drops by an order of magnitude or more. And quality? Quality actually goes up, because the runtime model is following a playbook refined through extensive design-time analysis rather than winging it from a generic prompt.
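
Wired together, the runtime path is only a handful of lines. The sketch below assumes an OpenAI-compatible endpoint serving a fast instruction-following model; the base URL, model name, and artifact file names are placeholders, not my production setup.

```python
# Minimal runtime sketch: instruction set as the system prompt, resource file
# loaded into context, user input executed against both. Assumes an
# OpenAI-compatible endpoint; base_url, model, and paths are placeholders.
from pathlib import Path
from openai import OpenAI

INSTRUCTION_SET = Path("artifacts/instruction_set.md").read_text()
RESOURCE_FILE = Path("artifacts/motivation_dimensions.md").read_text()

client = OpenAI(base_url="https://example-inference-host/v1", api_key="...")

def translate(message: str, target_dimension: str) -> str:
    response = client.chat.completions.create(
        model="fast-instruction-following-model",  # e.g. a Scout-class model
        temperature=0.2,  # favor consistency over creativity at runtime
        messages=[
            {"role": "system", "content": INSTRUCTION_SET},
            {"role": "user", "content": (
                f"RESOURCE FILE:\n{RESOURCE_FILE}\n\n"
                f"TARGET DIMENSION: {target_dimension}\n\n"
                f"COMMUNICATION TO REWRITE:\n{message}"
            )},
        ],
    )
    return response.choices[0].message.content

# Example usage:
# print(translate("We hit our quarterly target.", "Achiever"))
```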

Step 4: Iteration. This is where the pattern pays compound dividends. When something doesn't work right at runtime, I don't try to fix it by tweaking the runtime prompt. I go back to the design-time conversation with Claude. We analyze why the output was wrong, identify what the instruction set or resource file is missing, and produce updated artifacts. Each iteration makes the runtime execution more reliable.

Over time, the artifacts accumulate intelligence. The resource file gets richer. The instruction set gets more precise. The runtime model's outputs get more consistent. And all of that intelligence is durable — it persists in the artifacts, not in any single API call.

Why This Is Hard to See

If the pattern is this effective, why isn't everyone doing it?

Three reasons.

First, it requires thinking about AI as two different tools rather than one. Most people's mental model of AI is “I type something, the AI responds.” That's a runtime-only model. The idea that you'd spend hours in conversation with one model to produce artifacts for a different model to execute — that's a different way of thinking about the technology entirely. It requires treating AI as infrastructure, not just an interface.

Second, the design-time work doesn't feel productive in the moment. You're having long conversations. You're iterating on documents. You're testing and refining rules. There's no immediate user-facing output. It feels like preparation, and our industry has a strong bias toward shipping over preparing. But the preparation is where the leverage lives. An hour of design-time work might save thousands of dollars in runtime costs and produce consistently better outputs for months.

Third, it requires understanding that different models have different strengths. Claude is exceptional at analysis, synthesis, and structured output. It can hold complex frameworks in context and reason about edge cases. But it costs more and takes longer per call.

A model like Llama 4 Scout is fast and cheap, follows instructions well, and produces solid output when given good direction — but it won't derive a novel analytical framework from first principles. Most people default to using one model for everything, either overpaying for runtime execution or under-investing in design-time thinking.

The Deeper Pattern

There's something bigger happening here than a cost optimization trick.

What I'm describing is really about where intelligence should live in a system.

The traditional software answer is “in the code.” The naive AI answer is “in the model.” But the right answer, increasingly, is “in the artifacts.”

The instruction sets and resource files I'm producing aren't code and they aren't model weights. They're a third thing — structured knowledge artifacts that encode domain expertise in a format that any capable model can execute against. They're portable. They're version-controlled. They're human-readable and human-editable. And they represent the accumulated intelligence of every design-time conversation that produced them.

This is what the conceptual compiler was trying to achieve twenty years ago. A layer of formalized knowledge between human intent and machine execution. The difference is that now, the formalization itself is automated. The brilliant architect designs the building once, and the construction crew builds it reliably every time.

The people who figure this out early will build AI products that are faster, cheaper, and more reliable than their competitors. Not because they have access to better models — everyone has access to the same models. But because they understand that the intelligence doesn't belong at runtime. It belongs in the preparation.

The hard work should happen before anyone clicks a button.