On Tuesday, November 19, I was still fighting the testing bottleneck.

Parallel execution made workflows faster. But watching them run in the CLI was becoming its own problem.

The Interactive Execution Problems

When you run workflows interactively in the chat, you see everything Claude does. Every tool call. Every worker invocation. Every result.

This creates several problems:

Conversation compacting: Long workflows generate so much context that Claude automatically compacts the conversation mid-execution. The CLI flickers. You lose track of where you are.

Display issues: The CLI tool struggles with massive amounts of output. Screen refreshes lag, and text rendering glitches.

Context bloat: Every output line adds to the context. By the end of a workflow, the conversation is enormous.

But there was another hypothesis forming: What if the verbosity itself affected timing?

The Verbosity Hypothesis

Claude outputs text during execution. Explanations. Status updates. Results.

What if generating that output wasn’t free? What if the time spent formatting and displaying text was part of why timing was so inconsistent?

Interactive mode requires Claude to:

  • Decide what to output
  • Format it for readability
  • Stream it to the display
  • Update the context with it

All of that happens during execution. What if it affected performance?

The Discovery

I found it in the documentation: background execution mode.

Claude can run in the background. No interactive output during execution. Results logged to a file instead of displayed in chat. You start it, it runs, you check the log when it’s done.

The execution model changes completely:

Interactive mode:

  You: Run workflow
  Claude: Starting workflow... [output streams]
  Claude: Invoking worker A... [more output]
  Claude: Processing results... [more output]
  Claude: Done [final output]

Background mode:

  You: Run workflow in background
  Claude: Started [returns immediately]
  [Workflow runs with no output]
  You: [later] Check log file
  Log: [Complete results]

The Benefits

Background execution could solve multiple problems simultaneously:

No compacting: Background runs don’t add to chat context. No conversation compacting mid-execution.

No CLI issues: Nothing streaming to the display means no flicker, no lag, no weird rendering.

Cleaner logs: Output goes to a file. Clean, structured, easy to read later.

Potentially faster: If verbosity affects timing, background runs might be more consistent because they’re not generating interactive output.

“Start and forget”: Launch a workflow, go do other work, come back later for results.
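
To make the "start and forget" idea concrete, here is a minimal Python sketch of the general pattern: start a process, send everything it prints to a log file, return immediately, and read the log later. It is not the actual Claude command. The function names and the placeholder command are assumptions for illustration only.

```python
import subprocess
import sys
from pathlib import Path


def start_in_background(command, log_path):
    """Start command without waiting; send stdout and stderr to log_path."""
    log_file = open(log_path, "w", encoding="utf-8")
    process = subprocess.Popen(command, stdout=log_file, stderr=subprocess.STDOUT)
    # The child process holds its own handle to the file, so ours can be closed.
    log_file.close()
    return process


def read_log_when_done(process, log_path):
    """Block until the run has finished, then return the whole log as text."""
    process.wait()
    return Path(log_path).read_text(encoding="utf-8")


# Placeholder command (hypothetical): a tiny script standing in for a workflow.
demo_command = [sys.executable, "-c", "print('workflow finished')"]
```

Calling start_in_background(demo_command, "run.log") returns right away, so nothing streams to the terminal. read_log_when_done is the "come back later" half.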

The Problem

Background execution looked promising. But using it for real testing revealed how much manual work was still required.

For each repository, I needed to:

  • Record the timing with a stopwatch
  • Manually commit and push the changes
  • Create pull requests
  • Analyze the log files
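
The stopwatch step, at least, is simple to script. The sketch below is one possible approach, not the tooling from this series: it wraps a run in a timer and writes the output to a log file. timed_run and the placeholder command are hypothetical names chosen for the example.

```python
import subprocess
import sys
import time


def timed_run(command, log_path):
    """Run command to completion, logging its output; return (seconds, exit_code)."""
    start = time.perf_counter()
    with open(log_path, "w", encoding="utf-8") as log_file:
        result = subprocess.run(command, stdout=log_file, stderr=subprocess.STDOUT)
    elapsed = time.perf_counter() - start
    return elapsed, result.returncode


# Placeholder command (hypothetical): sleeps briefly in place of a real workflow.
demo_command = [sys.executable, "-c", "import time; time.sleep(0.2); print('done')"]
```

Running timed_run(demo_command, "timed.log") gives the elapsed seconds and the exit code, which could then be recorded per repository instead of read off a stopwatch.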

The logs were the real problem. Incredibly long. JSON format. Basically no documentation. I didn’t even try to parse them manually. I could tell immediately I’d need AI help for that.
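
With an undocumented JSON log, a reasonable first step is to survey which fields appear at all. The following Python sketch assumes the log is either a single JSON document or one JSON object per line. That is a guess, since the real schema is not documented here, and the sample records and field names are made up for illustration.

```python
import json
from collections import Counter


def load_records(text):
    """Parse text as one JSON document, falling back to one JSON value per line."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        data = [json.loads(line) for line in text.splitlines() if line.strip()]
    return data if isinstance(data, list) else [data]


def count_top_level_keys(records):
    """Count how often each top-level key appears across the dict records."""
    counts = Counter()
    for record in records:
        if isinstance(record, dict):
            counts.update(record.keys())
    return counts


# Made-up sample lines; the field names are illustrative, not the real schema.
sample_log = '{"type": "start", "id": 1}\n{"type": "result", "id": 2, "text": "ok"}\n'
```

count_top_level_keys(load_records(sample_log)) reports how many records carry each key, which is enough to decide which fields are worth a closer look.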

The command syntax itself was cumbersome. But that was just the beginning. To test workflows across multiple repositories, I needed all of this automated.

So I asked Claude to help me build something better.

Next: Batch Scripts for Testing