Specs, Not Tests: A Better Way to Think About Automated Verification

During a training session recently, a developer looked at one of our Cypress test files and asked: “Why does our code look nothing like what you see in any tutorial?”

They were right to notice. If you pull up any Cypress getting-started guide, you’ll see something like this:

cy.get('[data-cy="form-field"]').type('some value')
cy.get('#submit-button').click()
cy.url().should('include', '/success')

What you’ll see in our codebase looks closer to this:

given_I_have_permission_to_adjust_inventory()
when_I_select_a_lot_and_decrease_the_quantity()
then_the_updated_count_is_visible_in_the_list()

The difference goes deeper than style.

The Word “Test” Gets in the Way

When we call something a “test,” we’re describing what it does mechanically: it runs, it asserts, it passes or fails. That framing quietly puts the emphasis on verification, on the outcome of something we’ve already built.

“Spec” is an entirely different frame. A spec describes what should be true, independently of how it’s implemented. It’s a specification of behavior. The fact that running it also verifies the implementation is a byproduct.

I’ve started saying this in sessions: some people don’t call these things tests anymore. They call them specs. And once you say it out loud, it’s hard to unsay, because the word carries the intent. You’re not checking whether code works; you’re describing how a feature should behave.

The Fragility Problem

Traditional automated browser tests, the Selenium-era kind, tend to be tightly coupled to implementation. They say: this page, this button, this dropdown, this CSS selector. Everything is tied to how the feature was built at a specific moment in time.

That creates a fragility problem. If a feature is refactored, say a single page becomes two pages, or a modal becomes an inline form, the test breaks, even if the behavior hasn’t changed at all. The user still does the same thing. The outcome is still the same. But the test now fails because the internals shifted.

This is a real cost. It trains teams to dread refactoring. It makes every UI change feel riskier than it should be. And it contributes to test suites that people quietly stop trusting.

What Changes When You Couple to Behavior

When a spec describes behavior instead of implementation, refactoring becomes less threatening.

The spec says: given a warehouse manager with permission to adjust inventory, when they select a lot and decrease the quantity, then the updated count is visible. Nothing in there says how many pages are involved, what the button is labeled, or what class the form input has. All of that lives one level down, inside a context class or harness, abstracted away from what you read at the scenario level.

If the UI is redesigned tomorrow, there’s one place to update: the helpers that translate English-like method calls into Cypress interactions. The spec itself doesn’t move.
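Here is a rough sketch of that layering, not our actual harness. To keep it runnable without a browser, it assumes a hypothetical UiDriver interface standing in for Cypress, and every selector and class name below is made up for illustration. The point is only that the scenario function stays the same while the context class behind it changes.

```typescript
// Hypothetical stand-in for Cypress so this sketch runs without a browser.
// In a real suite these methods would wrap commands such as cy.get(selector).click().
interface UiDriver {
  click(selector: string): void;
  type(selector: string, text: string): void;
  shouldContain(selector: string, text: string): void;
}

// Records each interaction instead of driving a real page.
class RecordingDriver implements UiDriver {
  readonly log: string[] = [];
  click(selector: string): void {
    this.log.push(`click ${selector}`);
  }
  type(selector: string, text: string): void {
    this.log.push(`type ${selector} "${text}"`);
  }
  shouldContain(selector: string, text: string): void {
    this.log.push(`expect ${selector} to contain "${text}"`);
  }
}

// The behavior-level vocabulary the scenario is written in.
interface InventoryContext {
  given_I_have_permission_to_adjust_inventory(): void;
  when_I_select_a_lot_and_decrease_the_quantity(): void;
  then_the_updated_count_is_visible_in_the_list(): void;
}

// Assumed original design: the quantity is edited in a modal dialog.
class ModalInventoryContext implements InventoryContext {
  private readonly ui: UiDriver;
  constructor(ui: UiDriver) {
    this.ui = ui;
  }
  given_I_have_permission_to_adjust_inventory(): void {
    this.ui.click('[data-cy="login-as-warehouse-manager"]');
  }
  when_I_select_a_lot_and_decrease_the_quantity(): void {
    this.ui.click('[data-cy="lot-row"]');
    this.ui.click('[data-cy="open-adjust-modal"]');
    this.ui.type('[data-cy="modal-quantity"]', "9");
    this.ui.click('[data-cy="modal-save"]');
  }
  then_the_updated_count_is_visible_in_the_list(): void {
    this.ui.shouldContain('[data-cy="lot-row"]', "9");
  }
}

// Assumed redesign: the modal became an inline form in the list row.
class InlineInventoryContext implements InventoryContext {
  private readonly ui: UiDriver;
  constructor(ui: UiDriver) {
    this.ui = ui;
  }
  given_I_have_permission_to_adjust_inventory(): void {
    this.ui.click('[data-cy="login-as-warehouse-manager"]');
  }
  when_I_select_a_lot_and_decrease_the_quantity(): void {
    this.ui.click('[data-cy="lot-row"]');
    this.ui.type('[data-cy="inline-quantity"]', "9");
    this.ui.click('[data-cy="inline-save"]');
  }
  then_the_updated_count_is_visible_in_the_list(): void {
    this.ui.shouldContain('[data-cy="lot-row"]', "9");
  }
}

// The scenario: identical for both designs, with no selectors in sight.
function adjustInventoryScenario(ctx: InventoryContext): void {
  ctx.given_I_have_permission_to_adjust_inventory();
  ctx.when_I_select_a_lot_and_decrease_the_quantity();
  ctx.then_the_updated_count_is_visible_in_the_list();
}

// Usage: swap the context, keep the scenario.
const driver = new RecordingDriver();
adjustInventoryScenario(new InlineInventoryContext(driver));
console.log(driver.log.join("\n"));
```

When the modal turned into an inline form in this sketch, only one step inside the context class changed. The scenario function did not need to be touched.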

This isn’t theoretical. Over the life of a long-running project, the specs that read like behavior descriptions have stayed stable across multiple redesigns. The ones that started tightly coupled to the UI needed to be updated every time anything on the screen shifted.

The Given-When-Then Connection

This framing lines up naturally with Given-When-Then (GWT), not because GWT is a rule you follow, but because the language of GWT already describes behavior. “Given a state of the world, when something happens, then these are the observable results.” That’s a specification. It’s not a description of code.

When I can read a scenario like a sentence, and follow what the user is doing and why, without needing to know anything about CSS selectors or DOM structure, then I know the spec is at the right level.
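One way to make the “reads like a sentence” check concrete is to turn the step names themselves into a sentence. The helper below, describeScenario, is a hypothetical illustration and not part of Cypress or of our harness. The step bodies are left empty, because only the names matter here.

```typescript
// Step functions named the way the scenario reads.
// Bodies are empty in this sketch; in a real suite they would delegate to helpers.
function given_I_have_permission_to_adjust_inventory(): void {}
function when_I_select_a_lot_and_decrease_the_quantity(): void {}
function then_the_updated_count_is_visible_in_the_list(): void {}

// Hypothetical helper: builds a plain-English sentence from the step names.
function describeScenario(steps: Array<() => void>): string {
  const phrases = steps.map((step) => step.name.split("_").join(" "));
  const sentence = phrases.join(", ");
  return sentence.charAt(0).toUpperCase() + sentence.slice(1) + ".";
}

console.log(
  describeScenario([
    given_I_have_permission_to_adjust_inventory,
    when_I_select_a_lot_and_decrease_the_quantity,
    then_the_updated_count_is_visible_in_the_list,
  ])
);
// prints: Given I have permission to adjust inventory, when I select a lot and decrease the quantity, then the updated count is visible in the list.
```

If the printed sentence mentions a selector, a page name, or a button label, that is a hint the step has drifted down toward implementation.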

What This Looks Like in Practice

The technical details don’t disappear. Cypress still needs to know how to find a button, fill a form, and verify a response. That code exists in context classes and helper methods. But it’s abstracted behind a meaningful name.

The division matters: the spec is for understanding. The helpers are for implementation. You can read the spec to know whether you’re capturing the right behavior. You only need the helpers when something breaks at the Cypress level.

The question shifts, too. It used to be “What do I test?” Now it becomes “What is this feature supposed to allow a person to do, and what’s the plainest way to say it?”

The Byproduct That Matters

When a spec passes, that’s a byproduct. The spec’s job is to describe the behavior. The verification is what you get for free when it runs.

That’s not a minor distinction. It changes how you write the spec, how you name things, and what you notice when something fails.

A failing test says: something broke. A failing spec says: this behavior is no longer happening as intended. The second framing is more useful. It draws your attention to what the user experiences and away from how the code is structured.

Which is probably where your attention should be in the first place.
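A small sketch of what that difference can look like in a failure message. The runStep wrapper is hypothetical, and the failure is simulated with a made-up selector rather than a real Cypress assertion. The idea is just that the behavior comes first in the message and the low-level cause comes second.

```typescript
// Hypothetical wrapper: runs a step and, if it throws, reports which behavior stopped holding.
function runStep(step: () => void): void {
  try {
    step();
  } catch (cause) {
    const behavior = step.name.split("_").join(" ");
    const detail = cause instanceof Error ? cause.message : String(cause);
    throw new Error(`Behavior no longer holds: "${behavior}" (low-level cause: ${detail})`);
  }
}

// Simulated failing step; the selector and message are invented for illustration.
function then_the_updated_count_is_visible_in_the_list(): void {
  throw new Error('expected [data-cy="lot-row"] to contain "9"');
}

try {
  runStep(then_the_updated_count_is_visible_in_the_list);
} catch (error) {
  console.log((error as Error).message);
}
```

The selector is still in the message for whoever has to debug it, but the first thing you read is what the user can no longer do.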

Claudio Lassala's Blog