Testing Tracker

By early December 2025, I was ready to test a new workflow.

This one was different. More complex than anything I’d tested before.

And I’d spend three weeks completely stuck.

The Custom Framework Problem

The workflow’s goal: migrate projects from custom framework classes to standard functionality.

The custom framework was a library we’d built years ago. It abstracted common patterns. But it was custom. Non-standard. A maintenance burden.

The workflow would identify custom framework usage and replace it with equivalent standard code. Automated migration.

I had 16 test repositories. Different patterns. Different complexity levels.

The Grouping Strategy

Testing all 16 at once would be chaos. Too many variables. Too much noise.

So I grouped them by the type of custom code they used:

Group 1 (4 repos): Projects using Pattern A. Replace with standard implementation A.

Group 2 (4 repos): Projects using Pattern B. Replace with standard implementation B.

Group 3 (4 repos): Projects using Pattern C. Replace with standard implementation C.

Group 4 (4 repos): Projects using Pattern D. Replace with standard implementation D.

Four groups. Four different patterns to replace. Test one group at a time. Refine the workflow based on each group’s results before moving to the next.

Systematic testing. Made sense on paper.

The Testing Iteration Structure

For each group, I needed five steps per iteration:

Run workflow – Execute on all 4 repositories in the group
Review results – Check PRs, examine code changes, verify correctness
Identify issues – What worked? What failed? What needs fixing?
Refine workflow – Update instructions, improve patterns, fix issues
Reset – Close PRs, delete branches, clean up for next iteration

Then iterate. Run again. Review. Identify. Refine. Reset. Until the group worked correctly.

Then move to the next group. Same five steps. Different pattern to replace.

This seemed manageable. Structured. Methodical.

The Detail Overwhelm

By early January 2026, I’d been testing for three weeks.

Still on Group 1. Maybe Group 2. I wasn’t entirely sure. Multiple iterations per group. Multiple issues per iteration. Multiple fixes attempted.

Where was I? What was working? What still needed fixing?

I’d look at a change. Was this from the iteration where I fixed the registration? Or before? Did I test the updated pattern yet?

The details overwhelmed me. Not the workflow development. The testing coordination. Too many details to hold in memory. Too much context to track manually.

Tuesday, January 13: The Breakthrough

Tuesday morning, I sat down frustrated.

I’d built workflows to automate complex tasks. I’d built commands to manage repetitive interactions. But I couldn’t manage testing a workflow across 4 groups with 5 steps per iteration?

The problem wasn’t complexity. The problem was detail management. I was trying to be the tracker. Remembering where I was. Tracking what was done. Managing state.

That’s what computers do well. Why was I doing it manually?

Then the realization: I don’t need to track the details. Claude can track the details.

Same principle as commands. Define the structure once. Let Claude maintain the details.

The Testing Tracker Structure

Tuesday afternoon, I worked with Claude to define the complete testing structure.

Not just “here’s what I’m testing.” But the complete structure:

Groups defined: 4 groups, which repos in each, what pattern each group replaces.

Iteration steps: 5 steps per iteration, what each step involves, what “done” means for each.

Current state: Which group am I on? Which iteration? Which step in that iteration?

Issue tracking: What issues found? In which iteration? Which are fixed? Which remain open?

Results tracking: What’s working? What’s not? What patterns are successful?

Everything. The complete testing process defined once. All the structure. All the details. All in one place.

Then Claude maintains it. Updates the state after each step. Tracks issues as they’re found. Records fixes as they’re made. Prompts me when I need to do something.

I don’t remember where I am. The tracker knows.

How It Works

The tracker maintains a structured document with all testing state.

After each step, Claude updates the tracker:

“Completed: Iteration 2 – Step 1 – Run workflow on Group 1 (Pattern A migration)
Next: Step 2 – Review results from 4 repositories
Would you like me to help review the changes?”

I don’t say “I need to review changes for iteration 2 of group 1.” Claude knows what’s next. Claude prompts me. Claude guides the process.

When issues are found:

“Issue identified in repo project-alpha: Pattern A implementation missing required configuration
Tracking as Issue #3 for Group 1, Iteration 2
Should we continue reviewing or address this issue now?”

Claude tracks the issue. Records which iteration found it. Asks what to do next.

When I make a fix:

“Issue #3 marked as resolved: Added required configuration to Pattern A implementation
Ready for next iteration? Or review remaining issues first?”

Claude updates the tracker. Knows what issues remain. Guides next steps.

Three Days vs Three Weeks

Tuesday afternoon through Friday evening, I completed all testing.

All 4 groups. Multiple iterations per group. All issues identified. All fixes made. All results verified.

Three days. Complete.

Not because the work was easier. Because I wasn’t managing details anymore. Claude maintained where I was. What was done. What needed doing next. What issues were tracked.

My job: respond when prompted. Review when asked. Make decisions when needed.

Not: remember where I am. Track what’s done. Manage iteration state. Coordinate details.

What Changed

The work was the same. Run workflow. Review results. Fix issues. Iterate.

But the cognitive overhead disappeared.

Before: “Wait, which group am I on? Did I test this fix already? What issues are still open? Let me check my notes.”

After: Claude tells me what’s next. Claude knows what’s done. Claude tracks open issues. Claude prompts action.

I stopped being the detail manager. Claude handled detail management. I focused on decisions and fixes.

The Pattern

This is the same pattern as commands. Same as standards files. Same as evaluation frameworks.

Define the structure once. Completely. Upfront.

Then let the system maintain the details. Don’t track state manually. Don’t remember context yourself. Don’t manage complexity in your head.

Specify the process structure. Let Claude track where you are. Focus on decisions, not detail management.

When This Pattern Helps

Testing tracker works when:

1. Multiple iterations – Not one-time, but repeated cycles

2. Complex state – Multiple dimensions to track (groups, iterations, steps, issues)

3. Manual intervention needed – Not fully automated, you make decisions along the way

4. Detail overwhelm – Too many details to hold in memory reliably

When all four apply, define complete structure upfront and let Claude maintain state.

What I Learned

First, detail management is separate from decision making. You need to do one. Claude can do the other.

Second, define the complete process structure once upfront. Don’t define incrementally. Complete structure enables complete tracking.

Third, let your AI maintain state. Where you are. What’s done. What’s next. What’s tracked. You respond to prompts, not remember state.

Fourth, cognitive overhead is often the real bottleneck. Not the work itself. The coordination. The tracking. The remembering.

Fifth, three weeks to three days came from one thing: stopping manual detail management. Same work. Different approach to tracking.

The Complete Structure

Friday evening, January 16, testing was complete.

All 4 groups tested. All iterations run. All issues found and fixed. Workflow working correctly across all 16 repositories.

Three days. Because the tracker maintained details. Prompted me when needed. Guided the process. Removed cognitive overhead.

The lesson: when process complexity overwhelms you, don’t work harder at tracking. Define the structure once. Let Claude maintain the details. Focus on decisions, not detail management.

That’s what automation should do. Not just execute work. Manage complexity so you don’t have to.

End of series for now.