
By early October 2025, I had spent 35 days immersed in AI safety discourse. I understood the concerns: alignment is hard, models might deceive, verification is nearly impossible. Then I started building the most complex AI systems I’d ever attempted.
October 27-30: The Four-Day Intensive
On October 27, I asked Claude a simple question: “Can you help me create my first agent?”
That question changed everything. I suddenly understood that agents execute instructions autonomously. I wasn’t directing every step anymore. I was defining the process once and letting it run.
What followed was four intensive days of building. I built orchestration patterns. I discovered the 10-step rule for workflow complexity. I created reusable components. By October 30, I had a working multi-agent system with twelve agents across three tiers.
But I also had problems.
The Problems
The agents were inconsistent. They took shortcuts. They bypassed processes I’d defined. Sometimes they worked perfectly. Sometimes they improvised.
My success rate was around 75-80%. That meant 20-25% of the time, something went wrong. Not catastrophically wrong. Just not what I’d specified.
What I Expected vs. What I Saw
Based on my 35-day education, I expected certain things. The discourse had warned about models scheming and deceiving. “Playing the training game.” Appearing aligned while pursuing different goals. Sophisticated deception to avoid detection.
That’s not what I saw.
The behaviors looked different. Not sophisticated scheming. More like confusion or misunderstanding. The agents took the path of least resistance. They were trying to accomplish the goal. They just had different ideas about how.
Workers would sometimes follow instructions perfectly. Sometimes bypass the skills I’d built for them. Sometimes take shortcuts. No malicious intent. Just different interpretations of what “success” meant.
It looked like they were improvising based on incomplete understanding.
A Familiar Pattern
This felt familiar.
In June, I had discovered that structure and clarity improve AI performance. Clear instructions lead to good results. Vague instructions lead to poor results.
Now in October, I was seeing the same pattern. Agents with unclear success criteria were unreliable. This felt like a communication problem.
The Disconnect
I’d just spent 35 days learning why alignment is fundamentally hard. The discourse said it might require breakthroughs we don’t have yet.
But my hands-on experience kept suggesting something different. These looked like symptoms of unclear instructions. Better instructions might fix them.
Maybe I was wrong. Maybe this WAS the alignment difficulty they meant. But the behaviors didn’t look like scheming. They looked like confusion.
I was building successfully. Mostly. But the 20-25% failure rate frustrated me. I wanted 100% reliability. And I didn’t know what to make of the disconnect between what I’d learned and what I was seeing.
What Came Next
For two weeks, I worked with these inconsistent behaviors. Trying different approaches. Refining instructions. Getting better results, but not quite there.
Then, on November 10th, I tried something different. And everything changed.
This is Part 4 of a 9-part series. Continue to Part 5: The Breakthrough.