We write user stories to communicate intent. But intent without examples is just a polite way to misunderstand each other.

Consider a story like:

“As an accounts payable clerk, I want to see the outstanding balance for each vendor so that I know what we still owe.”

Everyone nods. Then the developer builds it. The stakeholder reviews it. The numbers don’t match what anyone expected. Nobody was wrong — there were just no concrete examples to verify against.

That’s the problem specification by example solves.


The Story: Vendor Outstanding Balance Ledger

Here’s a story from an accounts payable feature:


User Story: View Vendor Outstanding Balance

In order to know exactly how much my business owes each vendor at any point in time,
As an accounts payable clerk,
I want to see a ledger of open invoices, applied payments, and remaining balances for a vendor.

Acceptance Criteria

  • The ledger shows all vendor invoices with their original amounts
  • Each invoice shows how much has been paid and what remains open
  • Payments are shown with which invoice(s) they were applied to
  • The summary shows total amount owed, total open invoices, and total unapplied payments
  • A partial payment shows the correct remaining balance on the invoice
  • A payment applied to multiple invoices reduces each invoice’s balance correctly
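
The last two criteria are where the arithmetic lives. Here is a minimal sketch of how an invoice's remaining balance could be derived from payment applications, using Python only for illustration. `Invoice`, `Payment`, `remaining_balance`, and `unapplied_amount` are hypothetical names, not the feature's real model. Amounts use `Decimal` to avoid floating-point rounding.

```python
from dataclasses import dataclass, field
from decimal import Decimal

ZERO = Decimal("0")


@dataclass
class Invoice:
    number: str
    amount: Decimal


@dataclass
class Payment:
    number: str
    amount: Decimal
    # Hypothetical model: invoice number -> portion of this payment applied to it.
    applications: dict[str, Decimal] = field(default_factory=dict)


def remaining_balance(invoice: Invoice, payments: list[Payment]) -> Decimal:
    """Original invoice amount minus every payment portion applied to it."""
    applied = sum((p.applications.get(invoice.number, ZERO) for p in payments), ZERO)
    return invoice.amount - applied


def unapplied_amount(payment: Payment) -> Decimal:
    """Portion of a payment not yet applied to any invoice."""
    return payment.amount - sum(payment.applications.values(), ZERO)


# Illustrative data (not from the post): one payment split across two invoices.
inv_a = Invoice("INV-A", Decimal("5000.00"))
inv_b = Invoice("INV-B", Decimal("8000.00"))
pay = Payment("PAY-1", Decimal("10000.00"),
              {"INV-A": Decimal("5000.00"), "INV-B": Decimal("5000.00")})

print(remaining_balance(inv_a, [pay]))  # 0.00
print(remaining_balance(inv_b, [pay]))  # 3000.00
print(unapplied_amount(pay))            # 0.00
```

Storing the split on the payment lets one payment reduce several invoices. A partial payment is then just an application smaller than the invoice amount.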

Scenario: Single invoice, fully paid

Given I have recorded a vendor invoice and applied a payment in full,
When I view the outstanding balance ledger for that vendor,
Then the invoice and payment both show a balance of zero,
And the summary shows nothing outstanding.

Scenario: Partial payment on a large invoice

Given I have recorded a vendor invoice and applied a payment that covers only part of it,
When I view the outstanding balance ledger for that vendor,
Then the invoice shows the remaining unpaid balance,
And the summary reflects the amount still owed.

Scenario: Multiple invoices, some paid, some open

Given I have several vendor invoices — some paid in full, some not yet paid,
When I view the outstanding balance ledger for that vendor,
Then each invoice shows its correct current balance,
And the summary totals reflect only what is genuinely still owed.


That story is clean. It communicates clearly. But how much is “part of it”? And what exactly appears in the summary?

The scenarios work for a quick read, but they’re not verifiable yet.


Enhancing the Story with Specification by Example

The enhance-story-with-spec-by-example skill handles exactly that.

It prompts the AI to read the story and identify the entities involved, the state changes they go through, and the calculations that happen. The AI then generates concrete scenarios as markdown tables, with real numbers that show the math.

After running the skill, the same story looks like this:


Scenario: Single invoice — payment applied in full

Given the following account activity:

| Date | Document Type | Document # | Initial Amount | Applied To | Applied Amount |
|---|---|---|---|---|---|
| 2026-06-01 | Invoice | INV-2026-112 | 14750.00 | | |
| 2026-06-05 | Payment | PAY-4481 | 14750.00 | INV-2026-112 | 14750.00 |

When I view the outstanding balance ledger for that vendor,

Then the entries list contains:

| Date | Document Type | Document # | Initial Amount | Current Balance |
|---|---|---|---|---|
| 2026-06-01 | Invoice | INV-2026-112 | 14750.00 | 0.00 |
| 2026-06-05 | Payment | PAY-4481 | 14750.00 | 0.00 |

And the summary shows:

| Total Owed | Open Invoices | Unapplied Payments |
|---|---|---|
| 0.00 | 0.00 | 0.00 |

Scenario: Partial payment on a large invoice

Given the following account activity:

| Date | Document Type | Document # | Initial Amount | Applied To | Applied Amount |
|---|---|---|---|---|---|
| 2026-06-03 | Invoice | INV-2026-119 | 22300.00 | | |
| 2026-06-07 | Payment | PAY-4502 | 9000.00 | INV-2026-119 | 9000.00 |

When I view the outstanding balance ledger for that vendor,

Then the entries list contains:

| Date | Document Type | Document # | Initial Amount | Current Balance |
|---|---|---|---|---|
| 2026-06-03 | Invoice | INV-2026-119 | 22300.00 | 13300.00 |
| 2026-06-07 | Payment | PAY-4502 | 9000.00 | 0.00 |

And the summary shows:

| Total Owed | Open Invoices | Unapplied Payments |
|---|---|---|
| 13300.00 | 13300.00 | 0.00 |

(22300.00 − 9000.00 = 13300.00. The math is right there in the table.)


Scenario: Multiple invoices — one paid, one open

Given the following account activity:

| Date | Document Type | Document # | Initial Amount | Applied To | Applied Amount |
|---|---|---|---|---|---|
| 2026-06-01 | Invoice | INV-2026-112 | 14750.00 | | |
| 2026-06-03 | Invoice | INV-2026-119 | 22300.00 | | |
| 2026-06-05 | Payment | PAY-4481 | 14750.00 | INV-2026-112 | 14750.00 |

When I view the outstanding balance ledger for that vendor,

Then the entries list contains:

| Date | Document Type | Document # | Initial Amount | Current Balance |
|---|---|---|---|---|
| 2026-06-01 | Invoice | INV-2026-112 | 14750.00 | 0.00 |
| 2026-06-03 | Invoice | INV-2026-119 | 22300.00 | 22300.00 |
| 2026-06-05 | Payment | PAY-4481 | 14750.00 | 0.00 |

And the summary shows:

| Total Owed | Open Invoices | Unapplied Payments |
|---|---|---|
| 22300.00 | 22300.00 | 0.00 |

How the Skill Works

It starts by reading the story to understand the domain: which entities are involved (invoices, payments, credits), what state changes occur (recorded, applied, settled), and which calculations matter (running balances, totals, partial applications).

From there, it designs three table structures. A Given table captures the sequence of input events: what happened and in what order. A Then table captures what the ledger should look like after those events, with initial amounts and current balances side by side. A Summary table shows the aggregate totals.
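
To make the Given-to-Then relationship concrete, here is a rough Python sketch that replays Given rows in order and derives each document's current balance. `Activity`, `Entry`, and `build_entries` are hypothetical names. Two assumptions are built in. First, a payment's current balance is its unapplied remainder, which matches the 0.00 shown for fully applied payments above. Second, a payment split across several invoices appears as one Given row per application.

```python
from collections import defaultdict
from decimal import Decimal
from typing import NamedTuple, Optional


class Activity(NamedTuple):
    """One row of a Given table."""
    date: str
    doc_type: str  # "Invoice" or "Payment"
    doc_number: str
    initial_amount: Decimal
    applied_to: Optional[str] = None
    applied_amount: Decimal = Decimal("0")


class Entry(NamedTuple):
    """One row of a Then table."""
    date: str
    doc_type: str
    doc_number: str
    initial_amount: Decimal
    current_balance: Decimal


def build_entries(activity: list[Activity]) -> list[Entry]:
    """Replay Given rows in order and derive each document's current balance."""
    applied_to_invoice: dict[str, Decimal] = defaultdict(Decimal)
    applied_from_payment: dict[str, Decimal] = defaultdict(Decimal)
    first_rows: dict[str, Activity] = {}  # one ledger entry per document
    for row in activity:
        first_rows.setdefault(row.doc_number, row)
        if row.doc_type == "Payment" and row.applied_to:
            applied_to_invoice[row.applied_to] += row.applied_amount
            applied_from_payment[row.doc_number] += row.applied_amount

    entries = []
    for row in first_rows.values():
        applied = applied_to_invoice if row.doc_type == "Invoice" else applied_from_payment
        balance = row.initial_amount - applied[row.doc_number]
        entries.append(Entry(row.date, row.doc_type, row.doc_number,
                             row.initial_amount, balance))
    return entries


# Given table from the "one paid, one open" scenario above.
d = Decimal
given = [
    Activity("2026-06-01", "Invoice", "INV-2026-112", d("14750.00")),
    Activity("2026-06-03", "Invoice", "INV-2026-119", d("22300.00")),
    Activity("2026-06-05", "Payment", "PAY-4481", d("14750.00"), "INV-2026-112", d("14750.00")),
]
for entry in build_entries(given):
    print(entry.doc_number, entry.initial_amount, entry.current_balance)
# INV-2026-112 14750.00 0.00
# INV-2026-119 22300.00 22300.00
# PAY-4481 14750.00 0.00
```

The output reproduces the Then table for that scenario row for row.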

The numbers are realistic. Rather than 100, 200, 300, it uses amounts like 14750.00, 22300.00, 9000.00: numbers where the subtraction is visible and verifiable.

It also covers the cases that break things. Partial payments. Multiple applications. Unapplied payments sitting on the account. Empty ledgers. These aren’t afterthoughts. They’re first-class scenarios.
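
The Summary table and two of those edge cases (unapplied payments and an empty ledger) can be sketched like this. Every example above has zero unapplied payments, so the tables alone don't say whether Total Owed nets unapplied payments against open invoices. This sketch assumes it does. `Summary` and `summarize` are hypothetical names.

```python
from decimal import Decimal
from typing import NamedTuple

ZERO = Decimal("0.00")


class Summary(NamedTuple):
    total_owed: Decimal
    open_invoices: Decimal
    unapplied_payments: Decimal


def summarize(entries: list[tuple[str, Decimal]]) -> Summary:
    """Aggregate (document type, current balance) pairs taken from a Then table."""
    open_invoices = sum((bal for kind, bal in entries if kind == "Invoice"), ZERO)
    unapplied = sum((bal for kind, bal in entries if kind == "Payment"), ZERO)
    # ASSUMPTION: unapplied payments are netted against open invoices.
    return Summary(open_invoices - unapplied, open_invoices, unapplied)


# Empty ledger: every total is zero rather than an error or a blank.
print(summarize([]))
# Hypothetical: a 9000.00 payment recorded but not yet applied to a 22300.00 invoice.
print(summarize([("Invoice", Decimal("22300.00")), ("Payment", Decimal("9000.00"))]))
```

Under that assumption the second call reports 13300.00 owed, 22300.00 in open invoices, and 9000.00 unapplied. A scenario like that is exactly what would settle the netting question with a stakeholder.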

What used to take me hours of careful thinking (designing the table structure, choosing numbers that demonstrate each case, checking the math, writing it all in a consistent format) now takes minutes. More importantly, the output is consistent. Every story enhanced this way follows the same structure, uses the same table keys, and generates the same kind of test output when the internal DSL test skill picks it up downstream.


Why This Matters

The spec-by-example table becomes a contract. Stakeholders can read it and say “yes, that’s exactly what I mean” or “wait, that’s not right” before a single line of code is written.

And when the developer (er, AI agent) implements the feature and runs the tests, the test output mirrors those tables. The numbers in the test output match the numbers in the story, so there’s no ambiguity about whether the feature does what the story says.

That’s the feedback loop I wanted. AI helped me get there faster and more consistently than I ever could by hand.


What’s Next

The scenarios in these tables connect directly to how the specs/tests are written: the Given table becomes the test setup, the Then table becomes assertions, and the test output is readable enough for a stakeholder (or an AI agent) to review.

That’s a post for another day.

Discover more from Claudio Lassala's Blog