Background execution was fast and consistent. But I had no idea what had actually happened in a run until I went and checked the logs. I needed systematic logging, timing data, automated commits, and PR creation: the full testing workflow. Claude wrote me a shell script. That script became the foundation for testing at scale.
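To give a sense of the shape, here is a minimal sketch of just the logging and timing part. It is not the script Claude wrote; `run_logged`, `LOG_DIR`, and the `summary.log` line format are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: run one command, keep its full output in a log file,
# and append a one-line summary with the exit code and elapsed seconds.
set -u

LOG_DIR="${LOG_DIR:-./logs}"   # assumed location; override via the environment

run_logged() {
  local name="$1"; shift          # first argument is a label, the rest is the command
  mkdir -p "$LOG_DIR"
  local log_file="$LOG_DIR/$name.log"
  local start end status

  start=$(date +%s)
  "$@" >"$log_file" 2>&1          # capture stdout and stderr together
  status=$?
  end=$(date +%s)

  # One summary line per run, so results can be scanned without opening each log.
  printf '%s exit=%s seconds=%s\n' "$name" "$status" "$((end - start))" \
    >>"$LOG_DIR/summary.log"
  return "$status"
}
```

Calling `run_logged demo echo hello` would write the command's output to `logs/demo.log` and add a `demo exit=0 seconds=...` line to `logs/summary.log`. The commit and PR steps could be wrapped the same way, using whatever git and PR commands the project already relies on.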