There’s a moment in software development where things start to feel like they’re actually fitting together. Not just working, but fitting. I had one of those moments recently while wiring up end-to-end tests for our Engage platform at Improving, using Playwright for the first time, and watching a workflow I’d been building for months finally close its loop.

Let me back up a bit.

User Stories as the Starting Point

For a while now, I’ve been particular about how user stories are written, and it starts with the format.

Most teams learn “As a [role], I want to [action], so that [value].” My problem with that: it buries the most important part, the why, at the very end. I’ve sat through far too many planning meetings where people spend 45 minutes debating buttons and technical decisions without ever getting clear on why they need the feature.

Years ago, I came across an alternative on the Cucumber website: flip it around and start with “In order to…”. Instead of “As a marketing manager, I want to send direct mail with no duplicate addresses so that I save on mailing costs”, I write “In order to save on mailing costs, as the marketing manager, I want to send direct mail with no duplicate addresses.”

My standing recommendation to teams: if we can’t finish that “In order to…” sentence in plain English, we have no business writing code. When developers understand the real motivation, they can suggest better approaches rather than blindly implementing a prescribed requirement.

From there, each story expands into acceptance criteria and Given-When-Then scenarios, which drive the tests. I have a workflow that shapes problem statements and conversation transcripts into stories in exactly this format.
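As a sketch, a story file in this shape might look like the following. The headline and first scenario are built from the mailing-cost example above; the scenario details are illustrative, not taken from Engage:

```markdown
# Story: Send direct mail with no duplicate addresses

In order to save on mailing costs,
as the marketing manager,
I want to send direct mail with no duplicate addresses.

## Scenario: Duplicate addresses are collapsed
- **Given** two contacts share the same mailing address
- **When** I generate the direct mail list
- **Then** that address appears on the list only once
```

The "In order to" line comes first, so anyone reading the file, human or AI, hits the motivation before the mechanics.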

This approach works great with AI tools, too!

What’s changed recently is what comes after those stories get written.

engage-stories.png

Stories That Live Next to the Code

The user stories for the Engage features I’ve been building don’t live in Azure DevOps. They’re Markdown files sitting right alongside the code. No API calls to pull them, no MCP server needed. Just a file the AI can read directly.

When I asked the AI to write end-to-end tests for a feature, it could read the stories right there in context, understand the intended behavior, and generate Playwright tests that covered those scenarios. It didn’t have to guess what the feature was supposed to do.
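A test generated from the duplicate-address story might look roughly like this. Playwright's `test`, `expect`, and routing APIs are real, but the route path, selectors, and page structure here are invented for illustration, not Engage's actual UI:

```typescript
import { test, expect } from '@playwright/test';

test('direct mail list contains no duplicate addresses', async ({ page }) => {
  // Given two contacts share the same mailing address (stubbed, no back end)
  await page.route('**/api/contacts', (route) =>
    route.fulfill({
      status: 200,
      contentType: 'application/json',
      body: JSON.stringify([
        { name: 'Ada', address: '1 Main St' },
        { name: 'Grace', address: '1 Main St' },
      ]),
    })
  );

  // When I generate the direct mail list
  await page.goto('/direct-mail');
  await page.getByRole('button', { name: 'Generate list' }).click();

  // Then that address appears on the list only once
  await expect(page.getByText('1 Main St')).toHaveCount(1);
});
```

Notice how the Given-When-Then lines survive as comments: the scenario is the test's outline, which is exactly the shared-vocabulary point.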

That’s where BDD earns its keep, not as a process formality, but as a shared vocabulary that spans design, implementation, and testing. From humans to AI.

Making the Codebase Testable

The Engage codebase wasn’t set up for the kind of end-to-end tests I write from stories, tests that verify a user’s whole journey through the UI. There was work required to structure things so that the tests could stub data properly and run the front end in isolation, without needing the back end at all.
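One piece of that setup can live in Playwright's own configuration: the `webServer` option starts just the front end for the test run. The options below are real Playwright config, but the dev-server command and port are assumptions about a typical project, not Engage's actual setup:

```typescript
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Start only the front end; API calls are stubbed inside the tests,
  // so no back-end process is needed.
  webServer: {
    command: 'npm run dev',        // hypothetical dev-server command
    url: 'http://localhost:3000',  // hypothetical port
    reuseExistingServer: true,
  },
  use: {
    baseURL: 'http://localhost:3000',
    video: 'on', // record a video of each test; useful for documentation later
  },
});
```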

That setup work is real. It doesn’t happen automatically. But once it’s done, you get something valuable: tests that run fast, run reliably, and don’t require a full stack just to verify that the UI carries a person through their tasks correctly.

engage-e2e-tests.png

Switching from Cypress to Playwright

I’d used Cypress before. It works. But while watching Playwright run the tests I’d just generated, I noticed something: the tests ran faster, and the AI seemed to write them more naturally, too.

That last part is just my read on things, not a controlled experiment. I’ll keep paying attention to it.

I’ll admit: I’m enjoying Playwright. I can see why so many people have moved to it.

Feature Overviews as Living Documentation

After the tests were in place, I ran a separate workflow to generate a feature overview: a short document describing the feature, what it does, and screenshots pulled from frames of the test run. Something a teammate, a stakeholder, or a customer support person could read to understand what was shipped.

The screenshots are captured from the video produced by the tests. The AI selects frames that illustrate the key moments in each story, focusing on the meaningful interactions rather than the edge cases. The result is a readable document that describes the feature visually and narratively.

I’m planning to put these alongside the code in the repository. The files are small, around 20–37 KB per image. And having them near the stories and the tests makes the whole thing more coherent. Anyone coming to that part of the codebase later can see the stories, find the tests, and read the overview.

engage-feature-overview.png

What This Closes

What I’ve built here is a loop: a problem is described, stories are written from it, features are implemented, end-to-end tests cover the behavior, and a feature overview documents what was built. Each piece references the one before it.

That loop doesn’t require a sophisticated tool stack. It requires writing stories in a consistent format and keeping them where AI agents and workflows can reach them.

The BDD Given-When-Then format isn’t the point in itself. It’s what it enables: a shared language that a developer, an AI assistant, and a stakeholder can all read without translation.

That’s the version of the process I find worth maintaining.

Discover more from Claudio Lassala's Blog