I made a classic mistake when building my voice journal cleanup system: I tried to orchestrate everything at once.
The idea was elegant. Spawn multiple subagents, process all unprocessed journal entries in parallel, and let them coordinate the work. It failed. The subagents produced inconsistent results. Instructions conflicted. Arguments didn’t pass through correctly. The subagents were all reading the same files, building overlapping context, and stepping on each other’s toes. The system burned tokens like crazy for work that should have been straightforward.
I stepped back and asked myself: what if I just processed one entry at a time?
Context Clarity Over Parallelism
Processing individual journal entries with clean context eliminated competing instructions, overlapping context, argument-passing confusion, and the decision paralysis that kicks in when an agent is trying to coordinate multiple tasks at once.
One file. One agent. One clear job. That’s how you get reliable, repeatable results.
A Skill-Based Architecture
I rebuilt the system as a single, focused skill: voice-journal-cleanup. Here’s how it flows:
User: /voice-journal-cleanup
↓
Skill runs inline (clean context, no subagent overhead)
↓
1. Strip timestamps (Python script)
2. Apply name corrections (Python script)
3. Generate first-person summary + bullets (LLM)
4. Append raw journal content (shell command)
5. Add YAML frontmatter (LLM)
↓
Output: YYYYMMDD-HHMMSS_slug.md
Not all work requires an LLM. Stripping timestamps is a regex. Appending file content is cat. Moving files is mv. Separating LLM work from mechanical work cut token usage significantly.
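As a rough illustration of the mechanical steps, here is a minimal Python sketch of steps 1 and 2. The timestamp format (a leading `[HH:MM:SS]` or `[MM:SS]` marker on each line), the `NAME_CORRECTIONS` table, and all function names are assumptions made for the example, not the actual scripts.

```python
import re

# Assumed format: a leading "[HH:MM:SS]" or "[MM:SS]" marker on each line.
TIMESTAMP_RE = re.compile(r"^\[\d{1,2}:\d{2}(?::\d{2})?\][ \t]*", re.MULTILINE)

# Hypothetical corrections for names a transcriber tends to mishear.
NAME_CORRECTIONS = {"Jon": "John", "Sara": "Sarah"}


def strip_timestamps(text: str) -> str:
    """Remove the leading timestamp marker from every line."""
    return TIMESTAMP_RE.sub("", text)


def apply_name_corrections(text: str, corrections: dict[str, str]) -> str:
    """Replace whole-word occurrences of each misheard name."""
    for wrong, right in corrections.items():
        text = re.sub(rf"\b{re.escape(wrong)}\b", right, text)
    return text


def clean_transcript(text: str) -> str:
    """Run both mechanical steps; no LLM call is involved."""
    return apply_name_corrections(strip_timestamps(text), NAME_CORRECTIONS)
```

Both steps are deterministic, so the same input always yields the same cleaned text, and neither costs a token.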
Where the Real Win Was
Before: The LLM wrote the entire output file in one operation: YAML frontmatter, first-person summary, and verbatim raw journal content. The raw journal section can be 500 lines or more. The LLM was rewriting content that already existed, burning tokens for no value.
After: I split the work:
- LLM generates frontmatter + summary + `# Raw Journal` header to a temp file
- Shell appends the cleaned file content via `cat` redirection
- Shell moves the temp file to its final location
The raw journal section (often 500+ lines) is no longer written by the LLM. It’s just copied via a native OS command. Same output. A fraction of the tokens.
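The split can be sketched in Python as a stand-in for the shell steps. This is an illustration under assumptions: `assemble_entry` and its parameters are hypothetical names, `shutil.copyfileobj` plays the role of the `cat` append, and `shutil.move` plays the role of `mv`.

```python
import shutil
from pathlib import Path


def assemble_entry(header_tmp: Path, cleaned: Path, final: Path) -> Path:
    """Append the cleaned transcript to the LLM-written header, then move it.

    header_tmp: temp file already holding frontmatter, summary, and the
                "# Raw Journal" header (the only part the LLM wrote).
    cleaned:    transcript after timestamp stripping and name corrections.
    final:      destination path for the finished entry.
    """
    # Rough equivalent of `cat cleaned >> header_tmp`: bytes are copied, not regenerated.
    with header_tmp.open("ab") as out, cleaned.open("rb") as src:
        shutil.copyfileobj(src, out)
    # Rough equivalent of `mv header_tmp final`.
    final.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(header_tmp), str(final))
    return final
```

However long the raw transcript is, the model's output stays the size of the frontmatter and summary; the copy happens entirely outside the LLM.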
That raised a second question: if the LLM is only writing frontmatter and a short summary, does it need to be Sonnet? I was running Claude Sonnet for this work. After the redesign, I switched to SWE-1.6, which is free. It handles frontmatter generation and first-person summaries without missing a beat. The output is clean, consistent, and easy to hand off to more capable models later, whether that’s drafting a blog post, building a talk outline, or thinking through a new feature.
Right tool. Right model. Right cost.
Intentional Tool Selection
Every token costs something. Not just money, but compute, energy, and opportunity cost. If you’re using an LLM to do work that a shell command can do, you’re being wasteful. The rule:
- Use LLMs for judgment, synthesis, and creativity
- Use native tools for mechanical, deterministic work
- Measure the output, not the effort
The LLM’s job was to understand the journal content and summarize it in first person. The shell’s job was to copy bytes from one file to another. Each did its job. I got better results with fewer tokens.
What the Output Looks Like
Each cleaned entry has a summary at the top for quick scanning and the raw journal at the bottom for downstream use. The YAML frontmatter is uniform, the voice is first-person, and no tokens are wasted rewriting content that already exists.
And the system is reliable. No more inconsistent results. No more competing instructions. Just one file, one skill, one clear outcome.
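One way to keep an eye on that consistency is a cheap structural check on each output file. The sketch below is an assumption about how such a check might look, not part of the skill itself: `check_entry` is a hypothetical name, and it assumes the frontmatter is delimited by `---` lines and that the raw section starts with a `# Raw Journal` header.

```python
def check_entry(text: str) -> list[str]:
    """Return a list of structural problems; an empty list means the entry passes."""
    problems = []
    lines = text.splitlines()
    # Frontmatter is assumed to open on the first line and close on a later "---" line.
    if not lines or lines[0].strip() != "---":
        problems.append("missing opening frontmatter delimiter")
    elif "---" not in (line.strip() for line in lines[1:]):
        problems.append("missing closing frontmatter delimiter")
    # The raw transcript is assumed to follow a "# Raw Journal" header line.
    if "# Raw Journal" not in (line.strip() for line in lines):
        problems.append("missing '# Raw Journal' header")
    return problems
```

Like the other mechanical steps, this check is plain string handling and needs no model call.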
The Takeaway
- Start simple. Batch processing sounded elegant but failed. Individual processing was boring but worked.
- Measure what matters. Token usage, consistency, and downstream value, not just “did the agent finish?”
- Respect tool boundaries. LLMs are powerful, but they’re not the right tool for everything. Know when to use `cat` instead of Claude.
- Iterate on architecture, not just prompts. The real win came from rethinking the system design, not tweaking the LLM instructions.
- Match the model to the task. Frontmatter and summaries don’t need the most powerful model on the shelf. Free is fine when the task is well-defined.
Token economics isn’t about squeezing every last byte. It’s about being intentional with your tools and respecting the cost of computation.
What I’m learning is that the most effective AI workflows are often the least glamorous ones: one file, one task, the right tool for the job.




