Defining Success Criteria

On November 10th, I had a problem. My workers were supposed to use skills I’d built: wrapped tools with specific parameters and error handling for our codebase. But they were using them half the time at best.

The other half? They’d either skip the skill entirely or start using it, hit an issue, and bail to call the tool directly.

The Problem With “Use This Skill”

The worker instructions were clear: “use this skill to perform the action.” Not ambiguous. Not vague. Use the skill.

But here’s what actually happened:

Workers used the skill correctly about a quarter to half the time. The rest of the time, one of two things happened. When the skill hit any issue (an error, an unexpected output, anything that required thinking), workers would abandon the skill and try running the tool directly themselves. Or they’d skip the skill from the start and go straight to the tool.
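To make those three outcomes concrete, here is a minimal Python sketch of how a usage rate like this could be tallied. It assumes a hypothetical log format in which each worker run is an ordered list of "skill" and "tool" entries; the function names and the log shape are illustrative assumptions, not the tooling actually used here.

```python
from collections import Counter


def classify_run(calls: list[str]) -> str:
    """Classify one worker run from its ordered call log.

    Assumption: each entry is either "skill" (went through the skill)
    or "tool" (called the underlying tool directly).
    """
    if "tool" not in calls:
        # No direct tool call at all: the worker stayed inside the skill.
        return "used_skill" if "skill" in calls else "no_action"
    if "skill" in calls and calls.index("skill") < calls.index("tool"):
        # Started with the skill, then bailed to the tool.
        return "abandoned_skill"
    # Went straight to the tool.
    return "skipped_skill"


def compliance_rate(runs: list[list[str]]) -> float:
    """Fraction of runs that used the skill without ever bypassing it."""
    outcomes = Counter(classify_run(calls) for calls in runs)
    total = sum(outcomes.values())
    return outcomes["used_skill"] / total if total else 0.0
```

With a log like this, "abandoned" and "skipped" stay separate categories, which matters because they point at different causes.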

Making Skills Reliable

First, I needed to make sure the skills themselves weren’t the problem. If workers tried to use a skill and it failed, of course they’d bail and try something else.

So I improved error handling in the skills. Made them more robust. Handled edge cases better. The skills became reliable enough that using them should always work.
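As a rough illustration of what "a wrapped tool with error handling" can mean, here is a small Python sketch. The names SkillResult and run_tool_skill, the timeout default, and the use of subprocess are all assumptions made for the example; the real skills may look quite different.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class SkillResult:
    ok: bool
    output: str
    error: str = ""


def run_tool_skill(command: list[str], timeout_seconds: float = 60.0) -> SkillResult:
    """Hypothetical skill: run a tool with fixed settings and never raise.

    Failures come back as a structured result, so the caller gets
    something to reason about instead of an unhandled exception.
    """
    try:
        completed = subprocess.run(
            command,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
            check=False,
        )
    except FileNotFoundError:
        return SkillResult(False, "", f"tool not found: {command[0]}")
    except subprocess.TimeoutExpired:
        return SkillResult(False, "", f"tool timed out after {timeout_seconds}s")

    if completed.returncode != 0:
        message = completed.stderr.strip() or f"exit code {completed.returncode}"
        return SkillResult(False, completed.stdout, message)
    return SkillResult(True, completed.stdout)


if __name__ == "__main__":
    # Stand-in command; a real skill would pin the actual tool and its flags.
    print(run_tool_skill([sys.executable, "-c", "print('tests passed')"]))
```

The point of returning a result object rather than raising is that a failure inside the skill still looks like a skill outcome, not a reason to go around it.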

But skill usage didn’t improve. Workers still bypassed them at least half the time.

What Success Means

I’d already learned that clarity makes AI more reliable. So it didn’t take me long to ask the obvious question: what does the worker think success means?

The answer came back immediately. Success meant accomplishing the goal. Run the tests. Check the code. Generate the report. However it could be done.

This made sense. Its training on vast amounts of human data had taught it that success meant accomplishing the goal. That’s what success means in most contexts.

But my definition of success deviated from that. To me, success meant using the skill. That’s why I built it: to ensure the tool was used correctly for our codebase. The skill encapsulated the right parameters, handled edge cases, and provided useful output.

When a skill hit an issue, the worker faced a choice: debug why the skill failed, or just call the tool directly and accomplish the goal. The goal won every time, because that’s what its training had taught it.

I hadn’t told it what I considered success. So it optimized for what its training said success was.

Making Success Explicit

I added explicit success and failure criteria to each worker:

“Success: Used the skill to perform the task. Failure: Bypassed the skill and called the tool directly.”

Not just “use the skill.” Define what success IS. Using the skill wasn’t a step toward success; it WAS success.

This changed everything. Workers now understood that calling the tool directly, even if it accomplished the task, was failure. Using the skill, even if it hit an error, was the path to success. Debug the skill, fix the skill, but use the skill.
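One way to keep criteria like these from being forgotten is to make them a required part of how worker instructions are assembled. The sketch below assumes worker instructions are plain text built in Python; SuccessCriteria and build_worker_prompt are hypothetical names, and the layout of the prompt is only an illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuccessCriteria:
    success: str
    failure: str


def build_worker_prompt(task: str, skill_name: str, criteria: SuccessCriteria) -> str:
    """Assemble worker instructions that state the criteria explicitly.

    The instruction line says what to do; the Success and Failure lines
    say how the outcome will be judged.
    """
    return "\n".join(
        [
            f"Task: {task}",
            f"Use the {skill_name} skill to perform the task.",
            f"Success: {criteria.success}",
            f"Failure: {criteria.failure}",
        ]
    )


criteria = SuccessCriteria(
    success="Used the skill to perform the task.",
    failure="Bypassed the skill and called the tool directly.",
)
prompt = build_worker_prompt("Run the tests", "run-tests", criteria)
```

Because the criteria are a required argument, a worker cannot be defined with an instruction alone.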

From a Quarter to Complete Compliance

The results were immediate. Skill usage went from somewhere between a quarter and half the time to 100%.

Not because the instructions got clearer. Not because the skills got better (though they did). But because workers finally understood what I meant by success.

Before: “Use this skill” (but success = accomplish the goal, so bail if needed)

After: “Success = used the skill. Failure = bypassed the skill” (now success is crystal clear)

Why This Matters

This isn’t just about skills and tools. It’s about AI understanding your definition of success.

Without explicit criteria, AI optimizes for what it thinks you want. Usually that’s “accomplish the goal by any means.” That’s reasonable; it’s often what we want.

But sometimes, how you accomplish something matters as much as accomplishing it. Sometimes the process IS the goal. Sometimes using the right tool the right way is more important than just getting an answer.

When that’s true, you have to say it explicitly. Define what you consider success. Make it unambiguous.

What I Learned

First, AI needs to know what you consider success, not just what task to do.

Second, without explicit success criteria, AI optimizes for the goal by any means. That’s often fine. But when it’s not, you have to be explicit.

Third, success criteria change behavior more than instructions do. “Use this tool” is an instruction. “Success = used this tool” is a criterion. The criterion worked where the instruction didn’t.

Fourth, when you define what success means, AI can focus on meeting your definition instead of guessing what you want.

The shift from a quarter to complete compliance came from one thing: making it clear what I considered success. Not what to do. Not how to do it. What success meant.

Sometimes the most important thing you can tell AI isn’t what to do, it’s how to know when it’s done it right.

Another Problem Surfaces

By Monday evening, November 10, explicit success criteria had fixed the compliance problem. Workers now used skills consistently.

But as I reviewed the worker files to add those criteria, I noticed something I hadn’t seen before. Every worker that checked coding standards included the same WordPress standards explanation. Every worker that parsed output included the same validation rules. Every worker that handled errors included the same error context.

The same information repeated across multiple files. Context mixed with execution steps. Changes would need to be made in five places instead of one.

I’d seen this pattern before. Just not in prompts.

Next: Creating Standards Files