
October 30th was refactor day. I’d take everything I’d learned and rebuild the entire system with proper architectural foundations.
By the end of the day, I’d have 12 agents and comprehensive testing documentation, and I’d have deleted 6,008 lines of the old implementation.
The Plan: Build All, Then Test Bottom-Up
The night before, I created a 10-phase implementation plan. The key insight:
Don’t test while building. Build complete layers, then test from the bottom up.
Phases 2-4: Build all agents (no testing)
Phases 5-7: Test bottom-up (workers → managers → orchestrator)
Why this works: You validate each layer before testing the layer that sits on top of it. If workers have bugs, you find them in isolation, before a manager test fails for reasons that have nothing to do with the manager.
Three-Tier Hierarchy
The new architecture had clear separation of concerns:
Orchestrator (Entry Layer)
- Workflow sequencing
- Verification loops
- Overall coordination
Managers (Coordination Layer)
- Worker selection
- Result aggregation
- Routing logic
Workers (Execution Layer)
- Domain expertise
- Standards knowledge
- Actual work
Eight workers (specialists in their domain)
Three managers (coordinators that don’t do work)
One orchestrator (workflow-level sequencing)
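To make the shape of the hierarchy concrete, here is a minimal Python sketch. It is a model of the structure only, not how the agents were actually defined: the class names, the `echo` worker, and the 3/3/2 split of workers across managers are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Worker:
    """Execution layer: does the actual work in one domain."""
    name: str
    run: Callable[[str], str]


@dataclass
class Manager:
    """Coordination layer: calls workers and aggregates results, does no work itself."""
    name: str
    workers: list[Worker] = field(default_factory=list)

    def handle(self, item: str) -> list[str]:
        # Aggregate one result per worker.
        return [worker.run(item) for worker in self.workers]


@dataclass
class Orchestrator:
    """Entry layer: owns the managers and the overall workflow."""
    managers: list[Manager] = field(default_factory=list)

    def agent_count(self) -> int:
        # One orchestrator + its managers + every manager's workers.
        return 1 + len(self.managers) + sum(len(m.workers) for m in self.managers)


def echo(item: str) -> str:
    # Placeholder for real domain work (hypothetical).
    return item


workers = [Worker(f"worker-{i}", echo) for i in range(8)]
# The 3/3/2 grouping is an assumption; only the totals (8, 3, 1) come from the text.
managers = [
    Manager("manager-a", workers[0:3]),
    Manager("manager-b", workers[3:6]),
    Manager("manager-c", workers[6:8]),
]
system = Orchestrator(managers)
```

With eight workers, three managers, and one orchestrator, `system.agent_count()` comes to 12.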
Intelligence Distribution
Where should knowledge live?
Workers contain standards. Not documentation. Not the orchestrator. The workers themselves.
Why? Because standards are coupled to the work. The agent that fixes issues should know what “correct” looks like.
Managers contain routing logic. They know which worker to call for which type of problem.
Orchestrator contains workflow logic. It knows the sequence: prepare → process → verify → fix → loop.
This distribution made each layer maintainable. Want to update a standard? Change one worker. Want to change workflow sequence? Change the orchestrator. Clean boundaries.
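A small Python sketch of that distribution, under stated assumptions: the two rules (no trailing whitespace, no tab characters) and every class and key name here are hypothetical stand-ins, chosen only to show where each kind of knowledge sits.

```python
class TrailingWhitespaceWorker:
    """Worker: owns its standard (hypothetical rule: no trailing whitespace)."""

    def is_compliant(self, line: str) -> bool:
        return line == line.rstrip()

    def fix(self, line: str) -> str:
        return line.rstrip()


class TabWorker:
    """Worker: owns its standard (hypothetical rule: no tab characters)."""

    def is_compliant(self, line: str) -> bool:
        return "\t" not in line

    def fix(self, line: str) -> str:
        return line.replace("\t", "    ")


class FormattingManager:
    """Manager: owns routing only. It knows who handles what, not what 'correct' means."""

    def __init__(self) -> None:
        self.routes = {
            "trailing_whitespace": TrailingWhitespaceWorker(),
            "tab": TabWorker(),
        }

    def fix(self, issue_type: str, line: str) -> str:
        return self.routes[issue_type].fix(line)


# Orchestrator-level knowledge: just the sequence of workflow stages.
WORKFLOW = ("prepare", "process", "verify", "fix")
```

Changing what counts as compliant touches one worker class; changing who handles an issue type touches only `routes`; changing the order of stages touches only `WORKFLOW`.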
Job Naming Reflects Architecture
I established a naming convention based on architectural role:
Managers: Proactive/Reactive
- Proactive: “Process everything comprehensively”
- Reactive: “Fix these specific issues”
Workers: Comprehensive/Targeted
- Comprehensive: “Handle all instances in this file”
- Targeted: “Fix this specific instance”
Tool Runners: Automatic/Diagnostic
- Automatic: “Fix automatically”
- Diagnostic: “Report issues”
The names tell you what the agent does and what layer it operates at.
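One way to see why the convention is useful: because each pair of mode words belongs to exactly one role, the mode alone identifies the layer. The lookup below is an illustrative Python sketch; the dictionary and function names are assumptions, and only the six mode words and three roles come from the convention above.

```python
# Mode words per architectural role, as listed in the convention.
MODES_BY_ROLE = {
    "manager": ("proactive", "reactive"),
    "worker": ("comprehensive", "targeted"),
    "tool_runner": ("automatic", "diagnostic"),
}


def role_for_mode(mode: str) -> str:
    """Return the architectural role implied by a job's mode word."""
    for role, modes in MODES_BY_ROLE.items():
        if mode in modes:
            return role
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, `role_for_mode("reactive")` returns `"manager"`, and an unrecognized mode word raises `ValueError` rather than silently landing in the wrong layer.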
The Testing Strategy
Phase 5: Test each worker individually
- Invoke with specific inputs
- Verify outputs match expectations
- Document results
Phase 6: Test each manager with its workers
- Invoke manager
- Verify it calls correct workers
- Verify it aggregates results correctly
Phase 7: Test the orchestrator end-to-end
- Full workflow execution
- Verify verification loops work
- Confirm final state is correct
This bottom-up approach caught integration issues at the right level. Worker bugs surfaced in Phase 5. Coordination bugs surfaced in Phase 6. Workflow bugs surfaced in Phase 7.
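The first two phases might look roughly like this in Python. This is a sketch under assumptions: `strip_worker`, `manager`, and `recorded` are hypothetical, and wrapping a worker to record calls is just one way to check routing, not necessarily how these tests were actually run.

```python
def strip_worker(text: str) -> str:
    """Hypothetical worker: removes trailing whitespace from every line."""
    return "\n".join(line.rstrip() for line in text.split("\n"))


def manager(issue_type: str, text: str, workers: dict) -> str:
    """Hypothetical manager: routes to the worker registered for the issue type."""
    return workers[issue_type](text)


def recorded(name: str, worker, calls: list):
    """Wrap a worker so each invocation is logged before the real work runs."""
    def wrapper(text: str) -> str:
        calls.append(name)
        return worker(text)
    return wrapper


def test_worker_in_isolation():
    # Phase 5: specific input, expected output, no manager involved.
    assert strip_worker("a  \nb\t") == "a\nb"


def test_manager_calls_correct_worker():
    # Phase 6: the worker is already trusted, so only routing is under test.
    calls = []
    workers = {
        "whitespace": recorded("whitespace", strip_worker, calls),
        "other": recorded("other", lambda text: text, calls),
    }
    result = manager("whitespace", "x  ", workers)
    assert calls == ["whitespace"]
    assert result == "x"
```

If the Phase 6 test fails after Phase 5 has passed, the fault is in the routing, because the worker’s behavior has already been pinned down.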
The Verification Loop in Practice
The orchestrator’s core logic:
1. Run automated fixes
2. Run comprehensive processing
3. Check for issues
4. If issues found:
   - Run reactive fixes
   - Go to step 3
5. Report: “All files compliant”
Step 4 is the loop. It keeps running until step 3 returns zero issues.
In practice: multiple files, dozens of issues, multiple iterations → zero issues remaining.
The system knew when it was done. I didn’t have to.
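The check-fix-recheck part of that logic (steps 3 to 5) can be sketched in Python as follows. The function names, the trailing-whitespace check, and the sample files are hypothetical. The `max_iterations` cap is also an addition of this sketch, included as a guard against a fix that never converges; the original workflow is not described as having one.

```python
def verification_loop(files: dict, check, fix, max_iterations: int = 10) -> int:
    """Re-check after every round of fixes; return how many fix rounds ran."""
    iterations = 0
    issues = check(files)                      # step 3: check for issues
    while issues:                              # step 4: issues found
        if iterations >= max_iterations:       # safety cap (assumption of this sketch)
            raise RuntimeError(f"{len(issues)} issues left after {iterations} rounds")
        fix(files, issues)                     # run reactive fixes
        iterations += 1
        issues = check(files)                  # go back to step 3
    return iterations                          # step 5: all files compliant


def find_trailing_whitespace(files: dict) -> list:
    """Hypothetical check: report (filename, line index) for each offending line."""
    return [
        (name, index)
        for name, text in files.items()
        for index, line in enumerate(text.split("\n"))
        if line != line.rstrip()
    ]


def strip_reported_lines(files: dict, issues: list) -> None:
    """Hypothetical reactive fix: repair only the reported lines, in place."""
    for name, index in issues:
        lines = files[name].split("\n")
        lines[index] = lines[index].rstrip()
        files[name] = "\n".join(lines)


files = {"a.md": "x  \ny", "b.md": "z\t"}
rounds = verification_loop(files, find_trailing_whitespace, strip_reported_lines)
```

The loop’s exit condition is the check itself returning nothing, so “done” is decided by the same code that finds problems rather than by a person eyeballing the output.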
Deleting Fearlessly
At the end of Day 4, I deleted the old implementation: 33 files, 6,008 lines.
All that work from Days 1-3? Gone.
But the lessons from Days 1-3 were baked into the new architecture. I didn’t lose anything; I refined everything.
This is what good refactoring feels like. The old code taught you what the new code should be.
What I’d Built
By the end of Day 4, I had a complete system:
- 12 agents organized in a three-tier hierarchy
- Clear separation of concerns at every level
- Comprehensive testing documentation
- A workflow that could process dozens of files and loop until clean
The architecture felt solid. The patterns felt right. The 10-step rule, the intelligence distribution, the verification loops – everything clicked into place.
I’d gone from “Can agents coordinate other agents?” on Monday morning to a fully tested hierarchical system by Thursday afternoon.
What I didn’t know yet was that I’d built this entire architecture on an assumption that was about to be proven wrong.
Next: The Long Weekend