Testing Behavior, Not Implementation

I started writing end-to-end tests in Cypress back in 2020. When I decided to use the tool, I did what most developers do: I looked for best practices. From previous experience, I expected to find recommendations for the page object pattern, a way to abstract test code from browser automation commands, such as visiting pages, clicking buttons, and verifying on-screen information.

What I found instead were recommendations to use custom Cypress commands: one to log in, one to visit a specific page, etc. But then I realized something: since I write my tests using Given/When/Then statements and make a point of writing them without mentioning anything technical (no UI elements, no page references), the page object pattern didn’t really buy me anything. Neither would custom Cypress commands.

The Problem with Page-Centric Patterns

The core issue is that most of those recommendations focus on the page and the elements on it. But I don’t think people using a system think in those terms. They don’t think: “I’m going to go to the ABC page, click the whatever button, then select an item from the dropdown.” That’s not a natural way to think. We think in terms of tasks we need to accomplish. Whether we go to one page or multiple pages doesn’t matter.

When we write tests that use page objects, we’re focusing on the website or application’s implementation. We’re focused on the page, the dropdowns, the buttons. We’re shaping the tests based on the implementation. Today we have one page; tomorrow we have two. We need to update the tests whenever the implementation changes.

If we write our Given/When/Then statements while also mentioning dropdowns, text boxes, and buttons, we’re still shaping our tests based on the implementation. What is a dropdown today might be a regular list tomorrow. At that point, it doesn’t matter whether you’re writing GWT statements or not. It’s what goes into those statements that determines whether the language feels natural from a person’s perspective.

What “Natural Language” Means

When I saw the Cypress AI tool, cy.prompt(), which takes statements written in plain English and promotes itself as natural language testing, I had to think about what makes something natural language.

Here’s an example from their documentation:


cy.prompt([
  "Visit https://aicotravel.co",
  "Type 'Paris' in the destination field",
  "Click on the first search result",
  "Select 4 days from the duration dropdown",
  "Press the **Create Itinerary** button"
])

Compare that to the equivalent regular Cypress code:


cy.visit('https://aicotravel.co')
cy.get('[data-testid="destination-field"]').type('Paris')
cy.get('.search-results').first().click()
cy.get('#duration-dropdown').select('4')
cy.get('button').contains('Create Itinerary').click()

The only thing that makes cy.prompt() “natural” is that English sentences take the place of Cypress commands like cy.visit() and cy.get().click(). But that’s not people’s natural language. The structure changed; the technical framing didn’t.

Look at the statements: “destination field,” “first search result,” “duration dropdown,” “Create Itinerary button.” These are all implementation details. They’re UI elements. If tomorrow the dropdown becomes a slider, or the button moves to a different part of the page, or the flow splits across multiple pages, these statements break.
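One rough way to spot this kind of coupling is to scan each statement for UI vocabulary. The sketch below is only a heuristic I'm assuming for illustration: the `UI_TERMS` list and the `findUiTerms` function are hypothetical names, not part of Cypress or any other tool, and the word list would need tuning for a real product.

```javascript
// Hypothetical vocabulary of implementation words; extend or trim as needed.
const UI_TERMS = ['field', 'dropdown', 'button', 'checkbox', 'text box', 'slider', 'click', 'press'];

// Returns the UI terms that appear as whole words in a test statement.
function findUiTerms(statement) {
  const lower = statement.toLowerCase();
  return UI_TERMS.filter((term) => new RegExp(`\\b${term}\\b`).test(lower));
}

// Statements taken from the cy.prompt() example above.
findUiTerms('Select 4 days from the duration dropdown'); // ['dropdown']
findUiTerms('Press the **Create Itinerary** button');    // ['button', 'press']

// A statement written from the person's point of view has nothing to flag.
findUiTerms('I want to plan a trip to Paris for 4 days'); // []
```

A check like this can't tell you whether a statement is well written, but any hit is a hint that the statement describes the screen rather than the task.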

Focusing on People, Not Systems

What I prefer is not to focus on technical things at all. A process I’ve followed for years is to write user stories and Given/When/Then statements that focus on people, not on the system. I avoid technical concerns and instead focus on the actions someone is trying to perform and the outcomes they are trying to achieve.

This way, if in the future a user goes to multiple pages instead of one, issues a voice command instead of clicking a button, or selects from a different type of list instead of a dropdown, the actions and outcomes people are trying to achieve don’t change. What changes is the internal implementation.

This maps directly to how we think about refactoring: changing the internal implementation without changing the externally observable behavior. That principle applies to end-to-end tests as well. Whether it’s one page or multiple pages, a dropdown or a regular list, that’s an internal implementation detail, even if the user interacts with it. It’s not the external outcome they’re trying to achieve.

When we focus on what the person is trying to achieve, we can change the internal implementation as much as we want without changing the externally observable outcome or behavior.
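Here is a minimal, framework-free sketch of that idea. Everything in it is hypothetical: `SinglePageUi` and `WizardUi` are in-memory stand-ins for two versions of the same application, and the step text is my own wording rather than anything from a real project. The point is that the Given/When/Then steps stay identical while the mechanics behind them change.

```javascript
// Two hypothetical implementations of the same capability.
class SinglePageUi {
  planTrip(destination, days) {
    // Everything happens on one screen: fill in the form and submit.
    this.visited = ['trip-form'];
    return { destination, days };
  }
}

class WizardUi {
  planTrip(destination, days) {
    // The same task is now split across two screens.
    this.visited = ['choose-destination', 'choose-duration'];
    return { destination, days };
  }
}

// Steps describe what the person does and gets, never how the UI is built.
function buildSteps(ui) {
  const world = {};
  return {
    'Given I want to visit Paris': () => {
      world.destination = 'Paris';
    },
    'When I plan a 4 day trip': () => {
      world.itinerary = ui.planTrip(world.destination, 4);
    },
    'Then I have a 4 day itinerary for Paris': () => {
      const { destination, days } = world.itinerary;
      if (destination !== 'Paris' || days !== 4) {
        throw new Error('Unexpected itinerary');
      }
    },
  };
}

// Runs the steps in the order they were declared; throws if the outcome is wrong.
function runScenario(ui) {
  for (const run of Object.values(buildSteps(ui))) run();
  return true;
}

runScenario(new SinglePageUi());
runScenario(new WizardUi());
```

Only the `planTrip` internals differ between the two runs. In a real suite, that is where the framework-specific commands would live, and it is the only place that needs to change when a page is split or a dropdown is replaced.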

AI as an Accelerator, Not a Shortcut

The promise of cy.prompt() is compelling: write tests in seconds by describing the user journey in plain language. And it is fast: it translates those statements into real Cypress commands that execute in the browser with full visibility in the Command Log.

But speed isn’t the same as sustainability. If the tests are still coupled to implementation details (fields, dropdowns, buttons), they’ll still break when the implementation changes. You’ve just automated the creation of brittle tests.

What AI tools like this can do well is handle the plumbing once you’ve defined the behavior correctly. In my previous post about moving from Cypress to Playwright, I described how I used AI to analyze my established test patterns from one framework and implement them in another. The AI didn’t decide what to test or how to structure the tests. It implemented the patterns I’d already defined, focused on user outcomes rather than UI mechanics.

That’s where AI earns its keep: not by guessing at what makes a good test, but by accelerating the implementation of tests that are already well-designed.

Transferring Patterns Between Frameworks

Because I take this approach, whenever I learn a new test framework, the first thing I figure out is how to write tests the way I prefer (Given/When/Then focused on user outcomes). Then I learn the basics of that framework at my own pace.

After several years of using Cypress, when I started working on a different project that uses Playwright, I took the same approach. In this age of AI, it was even easier to focus on how I want my tests written from the outside and let the AI figure out how to actually implement the plumbing. This lets me get feature coverage while learning the test framework at a slower, more deliberate pace, with a focus on delivering outcomes rather than mastering the intricacies of a new framework.

To make the jump from Cypress to Playwright, I followed these steps. I went to my Cypress project, where I have a very established set of patterns, and prompted AI to analyze those patterns and create a comprehensive guide: one that excluded anything project-specific and focused purely on the patterns and the reasoning behind them. That produced a markdown file, which I then took to the new project and prompted AI to implement those same patterns using Playwright.

One specific thing about the Cypress project was that I had wired up an authorization bypass, so I could say “Given I have permission to [accomplish X]” and, internally, that would skip the login screen and API calls, dropping me directly into the relevant part of the application. The Playwright project didn’t have that. So I prompted an AI to document that pattern from the Cypress project, then analyze the Playwright project to implement something similar. I was able to stay focused on externally observable behavior, explain why I wanted the pattern, and implement it. In places where the AI didn’t know what to do, it asked me questions, and I got what I needed.
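The shape of that bypass can be sketched without any framework. This is not the actual implementation from either project: `createApp`, `givenIHavePermissionTo`, and the `manage-invoices` area are hypothetical names, and the fake app stands in for whatever session mechanism a real application uses. In a real Cypress or Playwright project the step would seed or stub authentication in a way specific to that application.

```javascript
// Hypothetical fake application: a session normally exists only after login.
function createApp() {
  return {
    session: null,
    loginCalls: 0,
    login(user) {
      this.loginCalls += 1;
      this.session = { user, permissions: [] };
    },
    // Returns the screen the person lands on when opening an area.
    open(area) {
      const allowed = this.session && this.session.permissions.includes(area);
      return allowed ? area : 'login-screen';
    },
  };
}

// "Given I have permission to <area>": seed the session directly
// instead of driving the login screen.
function givenIHavePermissionTo(app, area) {
  app.session = { user: 'test-user', permissions: [area] };
}

const app = createApp();
app.open('manage-invoices');                 // 'login-screen' (no session yet)
givenIHavePermissionTo(app, 'manage-invoices');
app.open('manage-invoices');                 // 'manage-invoices'
app.loginCalls;                              // 0, the login flow was never exercised
```

The statement says nothing about usernames, passwords, or login screens, so the bypass can be reimplemented in another framework without touching the scenario text.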

The Real Test of Natural Language

The real test of whether something is “natural language” isn’t whether it uses English words instead of code syntax. It’s whether the language reflects how people actually think about the task they’re trying to accomplish.

“I want to plan a trip to Paris for 4 days” is natural language.

“Type ‘Paris’ in the destination field, select 4 days from the duration dropdown, and press the Create Itinerary button” is implementation language dressed up in English.

The difference matters because one stays stable when the implementation changes, and the other doesn’t.

Claudio Lassala's Blog