We write user stories to communicate intent. But intent without examples is just a polite way to misunderstand each other.
Consider a story like:
“As an accounts payable clerk, I want to see the outstanding balance for each vendor so that I know what we still owe.”
Everyone nods. Then the developer builds it. The stakeholder reviews it. The numbers don’t match what anyone expected. Nobody was wrong — there were just no concrete examples to verify against.
That’s the problem specification by example solves.
The Story: Vendor Outstanding Balance Ledger
Here’s a story from an accounts payable feature:
User Story: View Vendor Outstanding Balance
In order to know exactly how much my business owes each vendor at any point in time,
As an accounts payable clerk,
I want to see a ledger of open invoices, applied payments, and remaining balances for a vendor.
Acceptance Criteria
- The ledger shows all vendor invoices with their original amounts
- Each invoice shows how much has been paid and what remains open
- Payments are shown with which invoice(s) they were applied to
- The summary shows total amount owed, total open invoices, and total unapplied payments
- A partial payment shows the correct remaining balance on the invoice
- A payment applied to multiple invoices reduces each invoice’s balance correctly
Scenario: Single invoice, fully paid
Given I have recorded a vendor invoice and applied a payment in full,
When I view the outstanding balance ledger for that vendor,
Then the invoice and payment both show a balance of zero,
And the summary shows nothing outstanding.
Scenario: Partial payment on a large invoice
Given I have recorded a vendor invoice and applied a payment that covers only part of it,
When I view the outstanding balance ledger for that vendor,
Then the invoice shows the remaining unpaid balance,
And the summary reflects the amount still owed.
Scenario: Multiple invoices, some paid, some open
Given I have several vendor invoices — some paid in full, some not yet paid,
When I view the outstanding balance ledger for that vendor,
Then each invoice shows its correct current balance,
And the summary totals reflect only what is genuinely still owed.
That story is clean. It communicates clearly. But here’s the question: how much is “part of it”? What exactly appears in the summary?
The scenarios work for a quick read, but they’re not verifiable yet.
Enhancing the Story with Specification by Example
The enhance-story-with-spec-by-example skill closes that gap: it turns readable scenarios into verifiable ones.
It prompts the AI to look at the story, identify what entities are involved, what state changes and calculations happen, and then generate concrete scenarios using markdown tables with real numbers that show the math.
After running the skill, the same story looks like this:
Scenario: Single invoice — payment applied in full
Given the following account activity:
| Date | Document Type | Document # | Initial Amount | Applied To | Applied Amount |
|---|---|---|---|---|---|
| 2026-06-01 | Invoice | INV-2026-112 | 14750.00 | — | — |
| 2026-06-05 | Payment | PAY-4481 | 14750.00 | INV-2026-112 | 14750.00 |
When I view the outstanding balance ledger for that vendor,
Then the entries list contains:
| Date | Document Type | Document # | Initial Amount | Current Balance |
|---|---|---|---|---|
| 2026-06-01 | Invoice | INV-2026-112 | 14750.00 | 0.00 |
| 2026-06-05 | Payment | PAY-4481 | 14750.00 | 0.00 |
And the summary shows:
| Total Owed | Open Invoices | Unapplied Payments |
|---|---|---|
| 0.00 | 0.00 | 0.00 |
Scenario: Partial payment on a large invoice
Given the following account activity:
| Date | Document Type | Document # | Initial Amount | Applied To | Applied Amount |
|---|---|---|---|---|---|
| 2026-06-03 | Invoice | INV-2026-119 | 22300.00 | — | — |
| 2026-06-07 | Payment | PAY-4502 | 9000.00 | INV-2026-119 | 9000.00 |
When I view the outstanding balance ledger for that vendor,
Then the entries list contains:
| Date | Document Type | Document # | Initial Amount | Current Balance |
|---|---|---|---|---|
| 2026-06-03 | Invoice | INV-2026-119 | 22300.00 | 13300.00 |
| 2026-06-07 | Payment | PAY-4502 | 9000.00 | 0.00 |
And the summary shows:
| Total Owed | Open Invoices | Unapplied Payments |
|---|---|---|
| 13300.00 | 13300.00 | 0.00 |
(22300.00 − 9000.00 = 13300.00. The math is right there in the table.)
Scenario: Multiple invoices — one paid, one open
Given the following account activity:
| Date | Document Type | Document # | Initial Amount | Applied To | Applied Amount |
|---|---|---|---|---|---|
| 2026-06-01 | Invoice | INV-2026-112 | 14750.00 | — | — |
| 2026-06-03 | Invoice | INV-2026-119 | 22300.00 | — | — |
| 2026-06-05 | Payment | PAY-4481 | 14750.00 | INV-2026-112 | 14750.00 |
When I view the outstanding balance ledger for that vendor,
Then the entries list contains:
| Date | Document Type | Document # | Initial Amount | Current Balance |
|---|---|---|---|---|
| 2026-06-01 | Invoice | INV-2026-112 | 14750.00 | 0.00 |
| 2026-06-03 | Invoice | INV-2026-119 | 22300.00 | 22300.00 |
| 2026-06-05 | Payment | PAY-4481 | 14750.00 | 0.00 |
And the summary shows:
| Total Owed | Open Invoices | Unapplied Payments |
|---|---|---|
| 22300.00 | 22300.00 | 0.00 |
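The arithmetic behind these tables is small enough to sketch in Python. This is only an illustration, not the feature's actual implementation: the `Activity` class and `build_ledger` function are hypothetical names, each payment row is assumed to carry a single application, and I am assuming Total Owed means open invoices minus unapplied payments (the scenarios above only show cases where unapplied payments are zero).

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

ZERO = Decimal("0.00")


@dataclass
class Activity:
    """One row of a Given table (hypothetical shape mirroring its columns)."""
    date: str
    doc_type: str                      # "Invoice" or "Payment"
    doc_number: str
    initial_amount: Decimal
    applied_to: Optional[str] = None   # invoice number, payments only
    applied_amount: Decimal = ZERO


def build_ledger(activity):
    """Return (entries, summary) for one vendor's activity rows."""
    # Sum what has been applied against each invoice.
    applied = {}
    for row in activity:
        if row.doc_type == "Payment" and row.applied_to:
            applied[row.applied_to] = applied.get(row.applied_to, ZERO) + row.applied_amount

    entries = []
    open_invoices = ZERO
    unapplied = ZERO
    for row in activity:
        if row.doc_type == "Invoice":
            # Invoice balance: original amount minus everything applied to it.
            balance = row.initial_amount - applied.get(row.doc_number, ZERO)
            open_invoices += balance
        else:
            # Payment balance: whatever part of the payment is not yet applied.
            balance = row.initial_amount - row.applied_amount
            unapplied += balance
        entries.append((row.date, row.doc_type, row.doc_number, row.initial_amount, balance))

    summary = {
        # Assumption: total owed nets unapplied payments against open invoices.
        "Total Owed": open_invoices - unapplied,
        "Open Invoices": open_invoices,
        "Unapplied Payments": unapplied,
    }
    return entries, summary


# The partial payment scenario from above.
partial = [
    Activity("2026-06-03", "Invoice", "INV-2026-119", Decimal("22300.00")),
    Activity("2026-06-07", "Payment", "PAY-4502", Decimal("9000.00"),
             "INV-2026-119", Decimal("9000.00")),
]
entries, summary = build_ledger(partial)
```

Run against the partial payment scenario, the invoice entry comes out with a current balance of 13300.00 and the payment with 0.00, which is what the Then table says. I use `Decimal` rather than floats so the amounts compare exactly.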
How the Skill Works
It starts by reading the story to understand the domain: which entities are involved (invoices, payments, credits), what state changes occur (recorded, applied, settled), and which calculations matter (running balances, totals, partial applications).
From there, it designs three table structures. A Given table captures the sequence of input events: what happened and in what order. A Then table captures what the ledger should look like after those events, with initial amounts and current balances side by side. A Summary table shows the aggregate totals.
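Because every table uses the same pipe-delimited layout and the same column names, turning one into data is mechanical. Here is a minimal sketch, assuming well-formed tables with a header row and a separator row; `parse_table` is a hypothetical helper, not part of the skill.

```python
def parse_table(markdown: str) -> list[dict[str, str]]:
    """Parse a pipe-delimited markdown table into one dict per data row."""
    lines = [line.strip() for line in markdown.strip().splitlines() if line.strip()]
    # The first line holds the column names, which become the dict keys.
    header = [cell.strip() for cell in lines[0].strip("|").split("|")]
    rows = []
    for line in lines[2:]:  # skip the header and the |---| separator row
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


# The Then table from the partial payment scenario.
then_table = """
| Date | Document Type | Document # | Initial Amount | Current Balance |
|---|---|---|---|---|
| 2026-06-03 | Invoice | INV-2026-119 | 22300.00 | 13300.00 |
| 2026-06-07 | Payment | PAY-4502 | 9000.00 | 0.00 |
"""
rows = parse_table(then_table)
```

Each row is keyed by the column headers, so `rows[0]["Current Balance"]` is the string `"13300.00"`. The values stay as strings here; converting them to amounts is left to whatever consumes the rows.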
The numbers are realistic. Rather than 100, 200, 300, it uses amounts like 14750.00, 22300.00, 9000.00: numbers where the subtraction is visible and verifiable.
It also covers the cases that break things. Partial payments. Multiple applications. Unapplied payments sitting on the account. Empty ledgers. These aren’t afterthoughts. They’re first-class scenarios.
What used to take me hours of careful thinking (designing the table structure, choosing numbers that demonstrate each case, checking the math, writing it all in a consistent format) now takes minutes. More importantly, the output is consistent. Every story enhanced this way follows the same structure, uses the same table keys, and generates the same kind of test output when the internal DSL test skill picks it up downstream.
Why This Matters
The spec-by-example table becomes a contract. Stakeholders can read it and say “yes, that’s exactly what I mean” or “wait, that’s not right” before a single line of code is written.
And when the developer (er, AI agent) implements the feature and runs the tests, the test output mirrors those tables. The numbers in the test output match the numbers in the story. There’s no ambiguity about whether the feature is correct.
That’s the feedback loop I wanted. AI helped me get there faster and more consistently than I ever could by hand.
What’s Next
The scenarios in these tables connect directly to how the specs/tests are written: the Given table becomes the test setup, the Then table becomes assertions, and the test output is readable enough for a stakeholder (or an AI agent) to review.
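As a rough preview of that mapping, here is a sketch in Python. It is not the internal DSL mentioned earlier. `current_balances` is a hypothetical stand-in for the feature under test, `check_scenario` is a hypothetical helper, and the rows are plain dicts keyed by the table column names, with an empty string where a cell has no value.

```python
from decimal import Decimal


def current_balances(given):
    """Hypothetical stand-in for the feature under test."""
    # Every document starts at its initial amount.
    balances = {row["Document #"]: Decimal(row["Initial Amount"]) for row in given}
    for row in given:
        if row["Document Type"] == "Payment" and row["Applied To"]:
            amount = Decimal(row["Applied Amount"])
            balances[row["Applied To"]] -= amount   # reduce the invoice
            balances[row["Document #"]] -= amount   # reduce the payment's unapplied part
    return balances


def check_scenario(given, then):
    """Given rows are the setup; Then rows are the assertions.

    Returns a list of readable mismatch messages. An empty list means pass.
    """
    actual = current_balances(given)
    failures = []
    for row in then:
        doc = row["Document #"]
        expected = Decimal(row["Current Balance"])
        if actual.get(doc) != expected:
            failures.append(f"{doc}: expected {expected}, got {actual.get(doc)}")
    return failures


# The "one paid, one open" scenario from above.
given = [
    {"Document Type": "Invoice", "Document #": "INV-2026-112",
     "Initial Amount": "14750.00", "Applied To": "", "Applied Amount": ""},
    {"Document Type": "Invoice", "Document #": "INV-2026-119",
     "Initial Amount": "22300.00", "Applied To": "", "Applied Amount": ""},
    {"Document Type": "Payment", "Document #": "PAY-4481",
     "Initial Amount": "14750.00", "Applied To": "INV-2026-112", "Applied Amount": "14750.00"},
]
then = [
    {"Document #": "INV-2026-112", "Current Balance": "0.00"},
    {"Document #": "INV-2026-119", "Current Balance": "22300.00"},
    {"Document #": "PAY-4481", "Current Balance": "0.00"},
]
failures = check_scenario(given, then)
```

For this scenario `failures` is empty. If a Then row disagreed with the computed balance, the message would name the document and show both numbers, which is the kind of output a stakeholder can compare against the story.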
That’s a post for another day.




