On Tuesday, November 19, I was still fighting the testing bottleneck.
Parallel execution made workflows faster. But watching them run in the CLI was becoming its own problem.
The Interactive Execution Problems
When you run workflows interactively in the chat, you see everything Claude does. Every tool call. Every worker invocation. Every result.
This creates several problems:
Conversation compacting: Long workflows generate so much context that Claude automatically compacts the conversation mid-execution. The CLI flickers. You lose track of where you are.
Display issues: The CLI tool struggles with massive amounts of output. Screen refreshes get laggy. Text renders incorrectly.
Context bloat: Every output line adds to the context. By the end of a workflow, the conversation is enormous.
But there was another hypothesis forming: What if the verbosity itself affected timing?
The Verbosity Hypothesis
Claude outputs text during execution. Explanations. Status updates. Results.
What if generating that output wasn’t free? What if the time spent formatting and displaying text was part of why timing was so inconsistent?
Interactive mode requires Claude to:
- Decide what to output
- Format it for readability
- Stream it to the display
- Update the context with it
All of that happens during execution. What if it affected performance?
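One way to check a hypothesis like this is to compare how much the run times vary in each mode, not just how long they are on average. The following is a minimal sketch, assuming you have already collected a handful of durations (in seconds) per mode; the function names `timing_spread` and `more_consistent` are made up for illustration, and no real measurements are included.

```python
import statistics


def timing_spread(durations):
    """Summarize run durations in seconds: mean, sample stdev, and
    coefficient of variation (stdev divided by mean)."""
    if len(durations) < 2:
        raise ValueError("need at least two runs to estimate spread")
    mean = statistics.mean(durations)
    stdev = statistics.stdev(durations)
    return {"mean": mean, "stdev": stdev, "cv": stdev / mean}


def more_consistent(interactive, background):
    """Return the name of the mode whose timings vary less relative
    to their mean, or "tie" if the relative spread is identical."""
    cv_interactive = timing_spread(interactive)["cv"]
    cv_background = timing_spread(background)["cv"]
    if cv_interactive == cv_background:
        return "tie"
    return "interactive" if cv_interactive < cv_background else "background"
```

The coefficient of variation is used so that a mode that is simply slower overall is not mistaken for one that is less consistent. With only a few runs per mode the result is a hint, not proof.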
The Discovery
I found it in the documentation: background execution mode.
Claude can run in the background. No interactive output during execution. Results logged to a file instead of displayed in chat. You start it, it runs, you check the log when it’s done.
The execution model changes completely:
Interactive mode:
You: Run workflow
Claude: Starting workflow... [output streams]
Claude: Invoking worker A... [more output]
Claude: Processing results... [more output]
Claude: Done [final output]
Background mode:
You: Run workflow in background
Claude: Started [returns immediately]
[Workflow runs with no output]
You: [later] Check log file
Log: [Complete results]
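The same "start it, let it run, read the log later" pattern can be sketched at the operating-system level. This is only an analogy for the execution model, not a description of how Claude's background mode is implemented: the helper names `start_in_background` and `read_log_when_done` are hypothetical, and `cmd` stands in for whatever command you actually run.

```python
import subprocess


def start_in_background(cmd, log_path):
    """Start cmd without waiting for it to finish.

    stdout and stderr are both written to log_path instead of the
    terminal, so nothing streams to the display while it runs.
    """
    log_file = open(log_path, "w")
    proc = subprocess.Popen(cmd, stdout=log_file, stderr=subprocess.STDOUT)
    log_file.close()  # the child process keeps its own handle to the file
    return proc


def read_log_when_done(proc, log_path):
    """Block until the process exits, then return the full log text."""
    proc.wait()
    with open(log_path) as f:
        return f.read()
```

Calling `start_in_background` returns as soon as the process is launched, which corresponds to the "Started [returns immediately]" step above. `read_log_when_done` corresponds to checking the log file later.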
The Benefits
Background execution could solve multiple problems simultaneously:
No compacting: Background runs don’t add to chat context. No conversation compacting mid-execution.
No CLI issues: Nothing streaming to the display means no flicker, no lag, no weird rendering.
Cleaner logs: Output goes to a file. Clean, structured, easy to read later.
Potentially faster: If verbosity affects timing, background runs might be more consistent because they’re not generating interactive output.
“Start and forget”: Launch a workflow, go do other work, come back later for results.
The Problem
Background execution looked promising. But using it for real testing revealed how much manual work was still required.
For each repository, I needed to:
- Record the timing with a stopwatch
- Manually commit and push the changes
- Create pull requests
- Analyze the log files
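The stopwatch step is the easiest of these to hand to a script. Here is a minimal sketch, assuming the workflow can be started as an ordinary command from each repository's directory; `time_command` and `time_across_repos` are hypothetical helpers, and `cmd` is a placeholder for the real command.

```python
import subprocess
import time


def time_command(cmd, cwd=None):
    """Run cmd to completion and return (returncode, seconds, stdout).

    time.perf_counter() measures wall-clock time, which is what a
    stopwatch would have measured.
    """
    start = time.perf_counter()
    result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    elapsed = time.perf_counter() - start
    return result.returncode, elapsed, result.stdout


def time_across_repos(cmd, repo_dirs):
    """Run the same command in each repository directory and collect
    (returncode, seconds) per repository."""
    return {repo: time_command(cmd, cwd=repo)[:2] for repo in repo_dirs}
```

The commit, push, and pull request steps could be chained in the same way, one `subprocess.run` call per command, but they depend on each repository's branch and remote setup, so they are left out of this sketch.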
The logs were the real problem. Incredibly long. JSON format. Basically no documentation. I didn’t even try to parse them manually. I could tell immediately I’d need AI help for that.
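With an undocumented JSON log, a reasonable first step is to survey which fields appear at all before trying to interpret any of them. The sketch below makes one assumption about the format: that the file is either a single JSON document or one JSON value per line. It assumes nothing about field names. `load_entries` and `survey_keys` are hypothetical helpers.

```python
import json
from collections import Counter


def load_entries(text):
    """Parse log text into a list of entries.

    Tries the whole text as one JSON document first; if that fails,
    falls back to treating each non-empty line as its own JSON value.
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    return data if isinstance(data, list) else [data]


def survey_keys(entries):
    """Count how often each top-level key appears across object entries."""
    counts = Counter()
    for entry in entries:
        if isinstance(entry, dict):
            counts.update(entry.keys())
    return counts
```

Calling `survey_keys(load_entries(log_text)).most_common()` lists the most frequent fields first, which gives a rough map of the log to work from.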
The command syntax itself was cumbersome. But that was just the beginning. To test workflows across multiple repositories, I needed all of this automated.
So I asked Claude to help me build something better.