By Saturday evening, November 22, I’d been testing workflows for a week.

Every test run revealed issues. Workers interpreting instructions differently. Same instruction producing different results. Edge cases handled inconsistently.

The pattern was clear: instructions weren’t precise enough.

The Real Problem

Sunday morning, November 23, I reviewed what I’d learned from testing.

Workers making different decisions from the same instructions. Context appearing in some files but not others. Inconsistency everywhere.

Testing found problems. But I needed a systematic way to improve instruction quality.

How do you make instructions better when you have dozens of files across multiple workflows?

A Methodology From Elsewhere

I’d been working with Claude on personal projects for months. Writing documentation. Building systems. Creating instructions.

At some point, I’d started asking Claude a simple question about every file: “Is this clear, concise, and precise?”

Not “is this good?” Not “can you improve this?” Those questions got vague answers.

But “clear, concise, precise” gave Claude three concrete criteria to check. Each one produced a specific answer. Actionable feedback.

This methodology had improved documentation on personal projects. Time to see if it worked on workflow instructions.

The First File

Sunday morning, I tested the approach on a single instruction file.

Asked Claude: “Is this file clear, concise, and precise?”

The response came back with specific issues. Ambiguous instructions. Verbose explanations. Vague references without clear definitions.

Not vague feedback. Specific problems, each tied to the criterion it violated.

I fixed the issues. Asked again. Got more feedback. Iterated until Claude said: “This file is clear, concise, and precise.”

The Big Decision

One file done. I had dozens more.

I could keep running workflow tests. Keep gathering data. Keep analyzing patterns.

Or I could stop testing and focus entirely on fixing the instructions.

Testing had done its job. It had identified the problems. Running more tests would just generate more examples of the same issues.

Time to stop gathering evidence and start fixing what was broken.

I stopped running workflow tests. Completely.

The Scale Problem

But I had a lot of repetitive work ahead. Same questions. Same criteria. Same iteration pattern.

I’d been building automation for weeks. This was exactly the kind of repetitive work that should be automated.

By late morning, I’d automated the evaluation process.

Now instead of manually asking Claude about each file, I could run one command and get systematic evaluation.
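The command itself isn’t shown here, so what follows is only a minimal sketch of the idea, not the actual implementation. It assumes the instruction files are plain text in one directory, and it takes a hypothetical `ask` callable that sends a prompt to the model and returns the reply. The names `build_prompt` and `evaluate_files` and the `*.md` pattern are illustrative assumptions.

```python
from pathlib import Path
from typing import Callable, Dict

# The one fixed question asked about every file.
QUESTION = "Is this file clear, concise, and precise?"


def build_prompt(name: str, text: str) -> str:
    """Combine the fixed question with one file's name and contents."""
    return f"{QUESTION}\n\nFile: {name}\n\n{text}"


def evaluate_files(directory: str,
                   ask: Callable[[str], str],
                   pattern: str = "*.md") -> Dict[str, str]:
    """Ask the same question about every matching file.

    `ask` is a hypothetical stand-in for whatever sends a prompt to the
    model and returns its reply. Returns feedback keyed by file name.
    """
    feedback = {}
    for path in sorted(Path(directory).glob(pattern)):
        text = path.read_text(encoding="utf-8")
        feedback[path.name] = ask(build_prompt(path.name, text))
    return feedback
```

Keeping the question in one constant is the point of the sketch: every file is judged against the same wording, so feedback is comparable from file to file.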

What The Command Found

I ran the command against all my instruction files.

The results were illuminating:

Verbosity everywhere. Introductions that explained context in hundreds of words when a fraction would do.

Bold formatting abuse. So much bolding that nothing stood out anymore. When everything is important, nothing is important.

Imprecise instructions. “Check for issues” without defining which issues. “Format correctly” without stating according to what.

Massive duplication. The same explanations appearing in multiple files. The same criteria repeated. The same examples duplicated.

I’d created reference files specifically to avoid duplication. But context had crept back into instructions anyway.

The Path Forward

By noon, the methodology was clear: Run evaluation. Read feedback. Fix issues. Iterate until the file passed all three criteria.

Work through the files one by one until every file had passed.
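The loop for a single file can be sketched as follows. This is an illustration under stated assumptions, not the actual tooling: `evaluate` is a hypothetical function that returns a list of issues (empty when the file passes), `fix` stands in for the editing step, which in practice was done by hand, and `max_fixes` is an added safety limit so the loop cannot run forever.

```python
from typing import Callable, List, Tuple


def refine(text: str,
           evaluate: Callable[[str], List[str]],
           fix: Callable[[str, List[str]], str],
           max_fixes: int = 5) -> Tuple[str, int, bool]:
    """Evaluate, fix, and re-evaluate until no issues remain.

    Stops early if `max_fixes` edits have been applied.
    Returns (final_text, fixes_applied, passed).
    """
    fixes_applied = 0
    issues = evaluate(text)
    while issues and fixes_applied < max_fixes:
        text = fix(text, issues)       # the human (or tool) edits the file
        fixes_applied += 1
        issues = evaluate(text)        # ask the same question again
    return text, fixes_applied, not issues
```

The limit matters because an evaluator can keep finding new things to criticize. When the loop stops without passing, that is a signal to look at the file, or at the criteria, by hand.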

The Duplication Problem

Sunday afternoon, as I started examining files more closely, the duplication problem became impossible to ignore.

I’d created reference files weeks earlier specifically so instructions wouldn’t need to duplicate context about standards, formatting rules, or examples.

But over time, context had crept back in. Instructions explaining standards instead of referencing them. Examples appearing in instructions when they should be in reference files.

The architecture was right. Reference files existed. Instructions were supposed to delegate to them.

But in practice, instructions had gotten bloated with context that belonged elsewhere.

This wasn’t just about verbosity. This was architectural drift. The clear separation between instructions and references had eroded.
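Drift like this can also be checked mechanically. The sketch below is one possible approach, not something described in this project: it flags paragraphs that appear in more than one file after lowercasing and collapsing whitespace. The function names, the `min_words` threshold, and the in-memory `files` mapping are all assumptions made for illustration, and it only catches near-verbatim copies, not paraphrased ones.

```python
from collections import defaultdict
from typing import Dict, List


def normalize(paragraph: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't hide a copy."""
    return " ".join(paragraph.lower().split())


def find_duplicates(files: Dict[str, str],
                    min_words: int = 8) -> Dict[str, List[str]]:
    """Map each repeated paragraph to the sorted list of files containing it.

    `files` maps a file name to its full text. Paragraphs are separated by
    blank lines; paragraphs shorter than `min_words` are ignored.
    """
    seen = defaultdict(set)  # normalized paragraph -> names of files containing it
    for name, text in files.items():
        for paragraph in text.split("\n\n"):
            norm = normalize(paragraph)
            if len(norm.split()) >= min_words:
                seen[norm].add(name)
    return {p: sorted(names) for p, names in seen.items() if len(names) > 1}
```

Any paragraph this reports is a candidate to move into a reference file and replace with a pointer to it.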

What I Learned

First, objective criteria beat subjective judgment. “Is this good?” gets vague answers. “Is this clear, concise, and precise?” gets specific feedback.

Second, methodology matters more than tools. The three criteria provided a framework for systematic improvement.

Third, automation makes methodology scalable. Manual evaluation works for a few files. Commands make it work for dozens.

Fourth, sometimes you have to stop execution to improve instructions. More data doesn’t help when you already know what’s wrong.

Fifth, architectural drift happens naturally. Good design doesn’t maintain itself. Context creeps. Duplication emerges. Keeping the separation intact requires enforcement, not just initial design.

The Work Complete

By Sunday evening, I’d worked through all the files.

Each one evaluated. Feedback reviewed. Issues fixed. Iterations complete.

Not production-ready. But much better than they had been. Clearer. More concise. More precise.

But as I worked through the day, I’d realized something about the tasks themselves. Not all work was the same. Some required deep reasoning. Others were mechanical pattern matching. That distinction would matter.

Next: Right Model for the Task