Right Model for the Task

Monday morning, November 24, I was deep into clear/concise/precise refinement.

Looking at agent files. Reviewing instructions. Cutting verbosity. Adding precision.

And I started thinking about the agents themselves. Some were doing complex analysis. Some were just running commands.

Did they all need the same model?

The Question

I had agents doing complex work. Analyzing output, categorizing problems, identifying patterns, making decisions about what needed fixing.

I had agents doing simple work. Invoking commands, capturing output, returning formatted results.

They were both using Sonnet. Same capability. Same cost. Same speed.

Did they both need that capability?

I wasn’t sure. I knew Claude came in three models: Opus, Sonnet, and Haiku. But I didn’t know when to use each one, or how to decide which agents needed which.

Getting Help

I asked Claude to help me understand the different models.

Claude explained the trade-offs:

Opus: Most capable. Best at complex reasoning, nuanced analysis, difficult decisions. Slowest. Most expensive.

Sonnet: Balanced. Good reasoning, fast enough, reasonable cost. The default choice for most work.

Haiku: Fastest. Cheapest. Great at straightforward tasks. Not designed for complex reasoning.

Three models. Different strengths. Different trade-offs.

That made sense. But which agents needed which models?

Analyzing the Agents

I asked Claude to analyze every agent and recommend model assignments.

Claude went through each agent, categorized the work, and made recommendations:

Complex work requiring Opus:

Analyzing output and categorizing problems
Evaluating quality and making subjective judgments
Finding patterns across different contexts

Simple work suitable for Haiku:

Running commands and capturing output
Reading files and returning content
Combining lists and sorting data

Middle-ground work good for Sonnet:

Applying patterns with some reasoning required
Making fixes with understanding of context

Clear divisions. Some agents needed complex reasoning. Some agents just needed to execute instructions. Some agents fell in between.
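One way to picture that categorization is as a lookup from kind of work to model. This is only a sketch: the category names are hypothetical labels paraphrased from the lists above, and the rule of picking the most capable model an agent's work demands is an assumption, not something Claude or the agent files prescribe.

```python
from enum import Enum


class Model(Enum):
    OPUS = "opus"
    SONNET = "sonnet"
    HAIKU = "haiku"


# Hypothetical category names, paraphrased from the recommendations above.
WORK_TO_MODEL = {
    "analyze_and_categorize": Model.OPUS,
    "subjective_judgment": Model.OPUS,
    "cross_context_patterns": Model.OPUS,
    "apply_patterns": Model.SONNET,
    "contextual_fixes": Model.SONNET,
    "run_commands": Model.HAIKU,
    "read_files": Model.HAIKU,
    "combine_and_sort": Model.HAIKU,
}

# Capability ordering used to pick the strongest model an agent needs.
_RANK = {Model.HAIKU: 0, Model.SONNET: 1, Model.OPUS: 2}


def recommend_model(work_kinds):
    """Return the most capable model required by any of the agent's work kinds.

    Unknown or missing work kinds fall back to Sonnet, the balanced default.
    """
    needed = [WORK_TO_MODEL.get(kind, Model.SONNET) for kind in work_kinds]
    if not needed:
        return Model.SONNET
    return max(needed, key=_RANK.__getitem__)
```

An agent that only runs commands would come out as Haiku. One that runs commands and also categorizes problems would come out as Opus, because the hardest part of the job sets the requirement.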

The Realization

I’d been treating all agents the same. Give them all the same capability. Let them all use the same model.

But Claude’s analysis showed that was inefficient. Different tasks need different capabilities.

The complex agents needed Opus. They were doing hard work. Analysis. Decision making. Pattern recognition. That’s what Opus was designed for.

The simple agents could use Haiku. They were running commands. Formatting output. Following straightforward instructions. Haiku should be perfect for that. Faster. Cheaper.

The middle agents could stay on Sonnet. Balanced capability for balanced complexity.

Match the model to the work.

Updating All Agents

I went through every agent file and updated the frontmatter based on Claude’s recommendations.

Complex agents got Opus. Simple agents got Haiku. Middle agents kept Sonnet. All agents had appropriate tool restrictions.
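I made these edits by hand, but the change itself is mechanical. A minimal sketch of it, assuming each agent file opens with a frontmatter block delimited by `---` lines and that the model is set by a top-level `model:` key. The function name `set_model` and the sample file contents are hypothetical.

```python
def set_model(agent_text: str, model: str) -> str:
    """Return the agent file text with its frontmatter `model:` key set.

    Assumes the file starts with a frontmatter block delimited by '---' lines.
    Replaces an existing top-level `model:` line, or adds one if it is missing.
    """
    lines = agent_text.split("\n")
    if not lines or lines[0].strip() != "---":
        raise ValueError("no frontmatter block at top of file")

    # Find the line that closes the frontmatter block.
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        raise ValueError("frontmatter block is not closed") from None

    for i in range(1, end):
        if lines[i].startswith("model:"):
            lines[i] = f"model: {model}"
            break
    else:
        # No model key yet: add it just before the closing delimiter.
        lines.insert(end, f"model: {model}")

    return "\n".join(lines)


example = "---\nname: command-runner\nmodel: sonnet\n---\nRun the command and return its output.\n"
updated = set_model(example, "haiku")
```

Only the `model:` line changes. The rest of the frontmatter and the instructions below it pass through untouched.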

Then I tested the workflow.

Testing Revealed Reality

I ran the workflow with the new model assignments.

Complex agents using Opus were more thorough. Better pattern recognition. More nuanced categorization. They’d occasionally catch edge cases that Sonnet had missed. Those assignments worked well.

But the results from the agents I’d switched to Haiku were mixed.

Some agents ran perfectly with Haiku. Commands executed. Output formatted. Results returned. Fast and consistent.

But other agents didn’t run as consistently as expected. They’d follow instructions most of the time. But occasionally they’d miss something or format output slightly wrong. Small inconsistencies that added up.

Those agents went back to Sonnet.
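I judged consistency by watching runs, but the same check can be made repeatable. A minimal sketch, assuming you can wrap an agent run in a function that returns its output as a string and write a validator for the expected format. `run_agent`, `is_valid`, the run count, and the 0.95 threshold are all hypothetical placeholders, not values from my workflow.

```python
from typing import Callable


def pass_rate(run_agent: Callable[[], str],
              is_valid: Callable[[str], bool],
              runs: int = 20) -> float:
    """Run the agent `runs` times and return the fraction of valid outputs."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    passed = sum(1 for _ in range(runs) if is_valid(run_agent()))
    return passed / runs


def choose_model(rates: dict[str, float],
                 preference: list[str],
                 threshold: float = 0.95) -> str:
    """Return the first model in `preference` whose pass rate meets the threshold.

    `preference` is ordered cheapest first. If no model qualifies, fall back
    to the last (most capable) entry.
    """
    for model in preference:
        if rates.get(model, 0.0) >= threshold:
            return model
    return preference[-1]
```

With pass rates measured per model, `choose_model({"haiku": 0.75, "sonnet": 1.0}, ["haiku", "sonnet"])` picks Sonnet, which matches what I ended up doing for the inconsistent agents.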

The Real Lesson

Theory versus practice.

Claude’s recommendations made sense theoretically. Simple tasks should work with Haiku. Complex tasks need Opus.

But in practice, some agents that seemed simple actually needed Sonnet’s capabilities for consistency.

The lesson wasn’t just “match model to task complexity.” It was “test and validate your assumptions.”

Some agents worked perfectly with Haiku. Those stayed on Haiku. Faster and cheaper with no downsides.

Some agents needed Sonnet for consistency. Those went back to Sonnet. Worth the cost for reliable results.

Complex agents stayed on Opus. They needed the reasoning capability.

Why This Matters

Model selection isn’t just theoretical analysis. It’s empirical testing.

You can analyze complexity. Make recommendations. Update configuration.

But then you have to test. Run the workflow. Watch for inconsistencies. Validate that the model actually works for that agent’s job.

Some agents will work with lighter models. Some won’t. The only way to know is to try and measure the results.

Optimization requires both analysis and validation.

The Pattern

This is a pattern I’d see repeatedly: test your assumptions.

You can reason about what should work. But you have to test what actually works.

Sometimes the simpler solution works perfectly. Sometimes it introduces subtle problems. Sometimes the problems are acceptable trade-offs. Sometimes they’re not.

Analyze first. Configure based on analysis. Test the configuration. Adjust based on results.

Getting Help to Optimize

I couldn’t have optimized the models without help. I didn’t know when to use Opus versus Haiku. I didn’t know which agents needed which capabilities.

Claude analyzed every agent. Made recommendations. Explained the trade-offs. Showed me how to configure frontmatter.

That’s the value of working with AI on AI systems. Claude understood its own capabilities better than I could. It could evaluate agent complexity and recommend appropriate models.

But even Claude couldn’t predict which agents would work consistently with Haiku. That required testing.

Sometimes optimization requires expertise you don’t have. Ask for help. Then validate the recommendations.

The Result

By Monday afternoon, every agent had an appropriate model configuration. Complex agents using Opus. Some simple agents using Haiku where they ran consistently. Other agents using Sonnet where they needed it for reliability.

Not the clean division I’d expected. But optimized based on actual performance.

Faster, cheaper runs where Haiku worked. Consistent results where Sonnet was needed. Better analysis where Opus mattered.

The lesson: match capability to complexity. Test your model assignments. Keep what works. Adjust what doesn’t.

Optimization isn’t just configuration. It’s validation.

What I’d Find Next

Model optimization was one dimension of configuration. But as I continued clear/concise/precise refinement, I’d discover another problem.

I’d built an architecture weeks earlier. Workers execute. Standards provide context. Clean delegation.

But reading through agent files revealed the architecture had eroded. Context had crept back into workers. Duplication everywhere.

Time to enforce the separation I’d already designed.

Next: Workers Delegate to Standards