Specification by Example – How AI Turns Vague Stories Into Verifiable Behavior

We write user stories to communicate intent. But intent without examples is just a polite way to misunderstand each other.

Consider a story like:

“As an accounts payable clerk, I want to see the outstanding balance for each vendor so that I know what we still owe.”

Everyone nods. Then the developer builds it. The stakeholder reviews it. The numbers don’t match what anyone expected. Nobody was wrong — there were just no concrete examples to verify against.

That’s the problem specification by example solves.

The Story: Vendor Outstanding Balance Ledger

Here’s a story from an accounts payable feature:

User Story: View Vendor Outstanding Balance

In order to know exactly how much my business owes each vendor at any point in time,
As an accounts payable clerk,
I want to see a ledger of open invoices, applied payments, and remaining balances for a vendor.

Acceptance Criteria

The ledger shows all vendor invoices with their original amounts
Each invoice shows how much has been paid and what remains open
Payments are shown with which invoice(s) they were applied to
The summary shows total amount owed, total open invoices, and total unapplied payments
A partial payment shows the correct remaining balance on the invoice
A payment applied to multiple invoices reduces each invoice’s balance correctly

Scenario: Single invoice, fully paid

Given I have recorded a vendor invoice and applied a payment in full,
When I view the outstanding balance ledger for that vendor,
Then the invoice and payment both show a balance of zero,
And the summary shows nothing outstanding.

Scenario: Partial payment on a large invoice

Given I have recorded a vendor invoice and applied a payment that covers only part of it,
When I view the outstanding balance ledger for that vendor,
Then the invoice shows the remaining unpaid balance,
And the summary reflects the amount still owed.

Scenario: Multiple invoices, some paid, some open

Given I have several vendor invoices — some paid in full, some not yet paid,
When I view the outstanding balance ledger for that vendor,
Then each invoice shows its correct current balance,
And the summary totals reflect only what is genuinely still owed.

That story is clean. It communicates clearly. But here’s the question: how much is “part of it”? What exactly appears in the summary?

The scenarios work for a quick read, but they’re not verifiable yet.

Enhancing the Story with Specification by Example

The enhance-story-with-spec-by-example skill closes that gap: it turns readable scenarios into verifiable ones.

It prompts the AI to read the story, identify the entities involved and the state changes and calculations that occur, and then generate concrete scenarios as markdown tables with real numbers that show the math.

After running the skill, the same story looks like this:

Scenario: Single invoice — payment applied in full

Given the following account activity:

| Date | Document Type | Document # | Initial Amount | Applied To | Applied Amount |
|---|---|---|---|---|---|
| 2026-06-01 | Invoice | INV-2026-112 | 14750.00 | — | — |
| 2026-06-05 | Payment | PAY-4481 | 14750.00 | INV-2026-112 | 14750.00 |

When I view the outstanding balance ledger for that vendor,

Then the entries list contains:

| Date | Document Type | Document # | Initial Amount | Current Balance |
|---|---|---|---|---|
| 2026-06-01 | Invoice | INV-2026-112 | 14750.00 | 0.00 |
| 2026-06-05 | Payment | PAY-4481 | 14750.00 | 0.00 |

And the summary shows:

| Total Owed | Open Invoices | Unapplied Payments |
|---|---|---|
| 0.00 | 0.00 | 0.00 |

Scenario: Partial payment on a large invoice

Given the following account activity:

| Date | Document Type | Document # | Initial Amount | Applied To | Applied Amount |
|---|---|---|---|---|---|
| 2026-06-03 | Invoice | INV-2026-119 | 22300.00 | — | — |
| 2026-06-07 | Payment | PAY-4502 | 9000.00 | INV-2026-119 | 9000.00 |

When I view the outstanding balance ledger for that vendor,

Then the entries list contains:

| Date | Document Type | Document # | Initial Amount | Current Balance |
|---|---|---|---|---|
| 2026-06-03 | Invoice | INV-2026-119 | 22300.00 | 13300.00 |
| 2026-06-07 | Payment | PAY-4502 | 9000.00 | 0.00 |

And the summary shows:

| Total Owed | Open Invoices | Unapplied Payments |
|---|---|---|
| 13300.00 | 13300.00 | 0.00 |

(22300.00 − 9000.00 = 13300.00. The math is right there in the table.)
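The same subtraction can be written as a small check. This is only a minimal Python sketch, not part of the skill's output: the helper name `remaining_balance` is hypothetical, and it assumes a document's current balance is its initial amount minus everything applied to it. It uses `Decimal` rather than `float` so the cents compare exactly.

```python
from decimal import Decimal


def remaining_balance(initial_amount, applied_amounts):
    """Hypothetical helper: initial amount minus everything applied so far."""
    return initial_amount - sum(applied_amounts, Decimal("0.00"))


# Numbers taken from the partial-payment scenario above.
invoice_balance = remaining_balance(Decimal("22300.00"), [Decimal("9000.00")])
payment_balance = remaining_balance(Decimal("9000.00"), [Decimal("9000.00")])

print(invoice_balance)  # 13300.00
print(payment_balance)  # 0.00
```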

Scenario: Multiple invoices — one paid, one open

Given the following account activity:

| Date | Document Type | Document # | Initial Amount | Applied To | Applied Amount |
|---|---|---|---|---|---|
| 2026-06-01 | Invoice | INV-2026-112 | 14750.00 | — | — |
| 2026-06-03 | Invoice | INV-2026-119 | 22300.00 | — | — |
| 2026-06-05 | Payment | PAY-4481 | 14750.00 | INV-2026-112 | 14750.00 |

When I view the outstanding balance ledger for that vendor,

Then the entries list contains:

| Date | Document Type | Document # | Initial Amount | Current Balance |
|---|---|---|---|---|
| 2026-06-01 | Invoice | INV-2026-112 | 14750.00 | 0.00 |
| 2026-06-03 | Invoice | INV-2026-119 | 22300.00 | 22300.00 |
| 2026-06-05 | Payment | PAY-4481 | 14750.00 | 0.00 |

And the summary shows:

| Total Owed | Open Invoices | Unapplied Payments |
|---|---|---|
| 22300.00 | 22300.00 | 0.00 |
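The summary row should follow from the entries list, and a few lines of Python make that roll-up explicit. Treat this as a sketch under assumptions the story does not spell out: that Open Invoices is the sum of invoice balances, Unapplied Payments is the sum of payment balances, and Total Owed is the difference between the two. Unapplied payments are zero in all three scenarios, so the examples cannot confirm that last rule. The `summarize` function and the tuple layout are hypothetical.

```python
from decimal import Decimal

ZERO = Decimal("0.00")

# (document type, document number, current balance), copied from the
# Then table of the "Multiple invoices" scenario.
entries = [
    ("Invoice", "INV-2026-112", Decimal("0.00")),
    ("Invoice", "INV-2026-119", Decimal("22300.00")),
    ("Payment", "PAY-4481", Decimal("0.00")),
]


def summarize(entries):
    """Roll the entry balances up into the three summary columns."""
    open_invoices = sum((bal for kind, _, bal in entries if kind == "Invoice"), ZERO)
    unapplied = sum((bal for kind, _, bal in entries if kind == "Payment"), ZERO)
    return {
        # Assumption: unapplied payments offset what is owed.
        "Total Owed": open_invoices - unapplied,
        "Open Invoices": open_invoices,
        "Unapplied Payments": unapplied,
    }


summary = summarize(entries)
print(summary)
```

If stakeholders define Total Owed differently, that is exactly the kind of disagreement a scenario with a non-zero unapplied payment would surface.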

How the Skill Works

It starts by reading the story to understand the domain: which entities are involved (invoices, payments, credits), what state changes occur (recorded, applied, settled), and which calculations matter (running balances, totals, partial applications).

From there, it designs three table structures. A Given table captures the sequence of input events: what happened and in what order. A Then table captures what the ledger should look like after those events, with initial amounts and current balances side by side. A Summary table shows the aggregate totals.
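One way to picture those table structures is as plain record types, one per table. The Python sketch below is illustrative only: the class and field names are hypothetical, chosen to mirror the column headers in the scenarios, and the skill itself produces markdown tables, not code.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class GivenRow:
    """One input event: an invoice recorded or a payment applied."""
    date: str
    document_type: str
    document_number: str
    initial_amount: Decimal
    applied_to: Optional[str] = None         # None where the table leaves the cell empty
    applied_amount: Optional[Decimal] = None


@dataclass(frozen=True)
class ThenRow:
    """One expected ledger entry after all events are processed."""
    date: str
    document_type: str
    document_number: str
    initial_amount: Decimal
    current_balance: Decimal


@dataclass(frozen=True)
class Summary:
    total_owed: Decimal
    open_invoices: Decimal
    unapplied_payments: Decimal


@dataclass(frozen=True)
class Scenario:
    name: str
    given: tuple
    then: tuple
    summary: Summary


# The "Single invoice, payment applied in full" scenario, expressed as data.
scenario = Scenario(
    name="Single invoice, payment applied in full",
    given=(
        GivenRow("2026-06-01", "Invoice", "INV-2026-112", Decimal("14750.00")),
        GivenRow("2026-06-05", "Payment", "PAY-4481", Decimal("14750.00"),
                 "INV-2026-112", Decimal("14750.00")),
    ),
    then=(
        ThenRow("2026-06-01", "Invoice", "INV-2026-112", Decimal("14750.00"), Decimal("0.00")),
        ThenRow("2026-06-05", "Payment", "PAY-4481", Decimal("14750.00"), Decimal("0.00")),
    ),
    summary=Summary(Decimal("0.00"), Decimal("0.00"), Decimal("0.00")),
)
```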

The numbers are realistic. Rather than 100, 200, 300, it uses amounts like 14750.00, 22300.00, 9000.00: numbers where the subtraction is visible and verifiable.

It also covers the cases that break things. Partial payments. Multiple applications. Unapplied payments sitting on the account. Empty ledgers. These aren’t afterthoughts. They’re first-class scenarios.
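To make a few of those cases concrete, here is a minimal Python sketch that runs three of them through one balance calculation. The document numbers and amounts (`INV-A`, `PAY-X`, and so on) are made up for illustration and do not come from the story, and `current_balances` is a hypothetical helper that assumes each application reduces both the payment's and the invoice's balance by the applied amount.

```python
from decimal import Decimal


def current_balances(documents, applications):
    """documents: {number: (type, initial amount)}
    applications: [(payment number, invoice number, applied amount)]"""
    balances = {number: amount for number, (_kind, amount) in documents.items()}
    for payment, invoice, amount in applications:
        balances[payment] -= amount  # less of the payment is left to apply
        balances[invoice] -= amount  # less of the invoice is left to pay
    return balances


# Empty ledger: no documents, no applications.
empty = current_balances({}, [])

# Unapplied payment: recorded on the account but not applied to any invoice.
unapplied = current_balances({"PAY-Y": ("Payment", Decimal("400.00"))}, [])

# Multiple applications: one payment split across two invoices.
split = current_balances(
    {
        "INV-A": ("Invoice", Decimal("1200.00")),
        "INV-B": ("Invoice", Decimal("800.00")),
        "PAY-X": ("Payment", Decimal("1500.00")),
    },
    [
        ("PAY-X", "INV-A", Decimal("1200.00")),
        ("PAY-X", "INV-B", Decimal("300.00")),
    ],
)

print(empty)      # {}
print(unapplied)  # PAY-Y keeps its full 400.00
print(split)      # INV-A 0.00, INV-B 500.00, PAY-X 0.00
```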

What used to take me hours of careful thinking (designing the table structure, choosing numbers that demonstrate each case, checking the math, writing it all in a consistent format) now takes minutes. More importantly, the output is consistent. Every story enhanced this way follows the same structure, uses the same table keys, and generates the same kind of test output when the internal DSL test skill picks it up downstream.

Why This Matters

The spec-by-example table becomes a contract. Stakeholders can read it and say “yes, that’s exactly what I mean” or “wait, that’s not right” before a single line of code is written.

And when the developer (er, AI agent) implements the feature and runs the tests, the test output mirrors those tables. The numbers in the test output match the numbers in the story. There’s no ambiguity about whether the feature is correct.

That’s the feedback loop I wanted. AI helped me get there faster and more consistently than I ever could by hand.

What’s Next

The scenarios in these tables connect directly to how the specs/tests are written: the Given table becomes the test setup, the Then table becomes assertions, and the test output is readable enough for a stakeholder (or an AI agent) to review.
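Until that post exists, here is a rough Python sketch of the idea, assuming the tables are stored as markdown pipe tables. It is not the internal DSL mentioned earlier: `parse_table` is a hypothetical helper, and `ledger_under_test` is a stand-in for whatever the real feature exposes.

```python
from decimal import Decimal

GIVEN = """
| Date | Document Type | Document # | Initial Amount | Applied To | Applied Amount |
|---|---|---|---|---|---|
| 2026-06-03 | Invoice | INV-2026-119 | 22300.00 | — | — |
| 2026-06-07 | Payment | PAY-4502 | 9000.00 | INV-2026-119 | 9000.00 |
"""

THEN = """
| Date | Document Type | Document # | Initial Amount | Current Balance |
|---|---|---|---|---|
| 2026-06-03 | Invoice | INV-2026-119 | 22300.00 | 13300.00 |
| 2026-06-07 | Payment | PAY-4502 | 9000.00 | 0.00 |
"""


def parse_table(text):
    """Turn a markdown pipe table into a list of {header: cell} dicts."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.strip().splitlines()
    ]
    header, body = rows[0], rows[1:]
    # Drop the separator row, whose cells contain only dashes and colons.
    body = [r for r in body if not all(set(c) <= set("-:") for c in r)]
    return [dict(zip(header, r)) for r in body]


def ledger_under_test(given_rows):
    """Hypothetical stand-in for the real feature: returns {document #: balance}."""
    balances = {r["Document #"]: Decimal(r["Initial Amount"]) for r in given_rows}
    for r in given_rows:
        # Rows whose "Applied To" cell is a placeholder match no document and are skipped.
        if r["Document Type"] == "Payment" and r["Applied To"] in balances:
            amount = Decimal(r["Applied Amount"])
            balances[r["Document #"]] -= amount
            balances[r["Applied To"]] -= amount
    return balances


# Given table -> test setup; Then table -> assertions.
actual = ledger_under_test(parse_table(GIVEN))
for expected in parse_table(THEN):
    number = expected["Document #"]
    assert actual[number] == Decimal(expected["Current Balance"]), number
```

Because the expected values are read from the same table a stakeholder approved, a failing assertion points straight at the row that disagrees.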

That’s a post for another day.

Claudio Lassala's Blog
