Early in most projects, someone sets up a quick script to populate the database with test data. It works fine at first. You run it, you get a handful of records, you start building. Then the project grows. New features need different amounts of data. Some features need consistent, stable records. Others need volume. The sprint review is tomorrow and the demo tenant should look like a real seafood distributor.

That’s when the original “just seed a few records” script starts to buckle.

I’ve dealt with this problem on several projects. In the one I want to talk about here, we found ourselves reaching for a more intentional solution, and it’s one I’m glad we took the time to build.

The Problem Was Bigger Than It Looked

On the surface, the problem was “we need data in the database.” But the actual requirements were more nuanced:

  • We needed small, fast seeds for day-to-day development and integration tests. If the database gets reset every time the backend restarts, re-seeding should take seconds, not minutes.

  • We needed feature-specific seeds tuned to whatever story we were actively building.

  • We needed demo-ready seeds for sprint reviews, trade show demos, and client presentations, each with data that resonates with the audience.

  • We needed stable, repeatable seeds for end-to-end tests, where the same IDs and names need to be present every time.

Four different needs. One mechanism to serve them all.

No Direct Database Inserts

Here’s the constraint that shaped everything: we couldn’t just insert rows into the database.

The system uses event sourcing for many of its aggregates. Every state change is the result of an event being appended to a stream. Projections rebuild read models from those events. If we bypassed that and wrote directly to the database tables, we’d get records with no event history, projections out of sync, and domain logic left untested.

So the seeder doesn’t use SQL inserts. It creates real aggregate instances in C# and saves them through their repositories, just as the application would at runtime. That means events get written, projections get rebuilt, and the whole flow runs. We’ve caught real bugs this way, before they ever reached a feature branch.

Slower than raw inserts? Yes. Worth it? Absolutely.
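
To make that flow concrete, here is a minimal sketch of seeding through an aggregate and its repository instead of writing rows. The real seeder is written in C#; Python is used here only to keep the illustration short. Every name in it (Item, ItemRepository, seed_items, the in-memory event store and the projection) is hypothetical and stands in for whatever the actual domain model provides.

```python
class Item:
    """Hypothetical event-sourced aggregate: state changes only by recording events."""

    def __init__(self, item_id):
        self.item_id = item_id
        self.name = None
        self.pending_events = []  # events recorded but not yet saved

    @classmethod
    def create(cls, item_id, name):
        # Domain validation runs during seeding, just as it would at runtime.
        if not name.strip():
            raise ValueError("item name is required")
        item = cls(item_id)
        item._record({"type": "ItemCreated", "item_id": item_id, "name": name})
        return item

    def _record(self, event):
        self._apply(event)
        self.pending_events.append(event)

    def _apply(self, event):
        if event["type"] == "ItemCreated":
            self.name = event["name"]


class ItemRepository:
    """Hypothetical repository: appends events to a stream and notifies projections."""

    def __init__(self, event_store, projections):
        self.event_store = event_store  # assumed shape: stream id -> list of events
        self.projections = projections  # callables that update read models

    def save(self, item):
        stream = self.event_store.setdefault(f"item-{item.item_id}", [])
        for event in item.pending_events:
            stream.append(event)
            for project in self.projections:
                project(event)
        item.pending_events.clear()


def seed_items(records, repository):
    # No SQL here: each record becomes a real aggregate saved through the repository.
    for record in records:
        repository.save(Item.create(record["id"], record["name"]))


event_store = {}
item_list_view = {}  # a read model kept up to date by the projection below


def project_item_list(event):
    if event["type"] == "ItemCreated":
        item_list_view[event["item_id"]] = {"name": event["name"]}


repository = ItemRepository(event_store, [project_item_list])
seed_items([{"id": "1001", "name": "Gala Apple"}], repository)
```

After the call, the event stream and the read model both contain the seeded item, and a record with a blank name fails in the aggregate instead of slipping into a table unnoticed.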

YAML That Humans Can Actually Read

The seed data lives in YAML files. That was a deliberate choice, borrowed from my time working with Ruby on Rails projects going back to 2011.

The goal was for someone to open a seed file and understand it without being a database expert. Instead of referencing a category by its GUID, you write its name, as in “category: fruit”, and the seeder looks up the ID. That’s the kind of thing that makes maintaining seed data sustainable over time.
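
As a rough illustration of that lookup, the sketch below resolves category names to IDs before anything is saved. It is written in Python for brevity, not in the seeder’s C#, and the field names, the resolve_category_ids helper, and the ID values are all assumptions made for the example.

```python
# Parsed seed data, shaped the way a YAML loader might return it for:
#   items:
#     - name: Gala Apple
#       category: fruit
seed_records = [
    {"name": "Gala Apple", "category": "fruit"},
    {"name": "Romaine Lettuce", "category": "vegetable"},
]

# Hypothetical lookup table; a real seeder would build it from categories
# that already exist in the system.
category_ids = {
    "fruit": "c0a8f1d2-0000-4000-8000-000000000001",
    "vegetable": "c0a8f1d2-0000-4000-8000-000000000002",
}


def resolve_category_ids(records, ids_by_name):
    """Swap each human-readable category name for its ID, failing loudly on typos."""
    resolved = []
    for record in records:
        name = record["category"]
        if name not in ids_by_name:
            raise ValueError(f"unknown category {name!r} for item {record['name']!r}")
        resolved.append({"name": record["name"], "category_id": ids_by_name[name]})
    return resolved


resolved_records = resolve_category_ids(seed_records, category_ids)
```

Failing on an unknown name matters as much as the lookup itself: a typo in a seed file should stop the run with a readable message rather than produce an item with no category.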

Each YAML file holds data for a specific domain area (accounting, catalog, operations, inventory) within a specific dataset. Different datasets represent different scenarios:

  • Templates — minimal, used by integration and E2E tests

  • Feature datasets — tailored to whatever story is being built

  • Demo datasets — industry-specific data for client presentations

  • Trade show datasets — tuned for dashboards and cockpits

When a sprint review is coming and the demo tenant should look like a real produce distributor, we load the produce dataset. When an integration test runs, it uses the template dataset with stable IDs it can rely on.
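
This post doesn’t spell out how the template dataset keeps its IDs stable. One common way to get that property, shown here purely as an assumption, is to derive each ID from the record’s dataset, kind, and name with a name-based (version 5) UUID. The namespace URL and the stable_id helper are hypothetical, and the sketch is in Python for brevity even though the seeder itself is C#.

```python
import uuid

# Hypothetical namespace for seed data; any fixed UUID would work.
SEED_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/seed-data")


def stable_id(dataset, kind, name):
    """Derive the same UUID on every run from the dataset, record kind, and name."""
    return uuid.uuid5(SEED_NAMESPACE, f"{dataset}/{kind}/{name}")


first_run = stable_id("template", "category", "fruit")
second_run = stable_id("template", "category", "fruit")
other_dataset = stable_id("produce", "category", "fruit")
```

Here first_run and second_run are equal, so an end-to-end test can hard-code the value, while other_dataset differs because the dataset name is part of the input.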

More Than Development

Once stakeholders saw what the seeder could do, the question came quickly: could we use this to onboard real clients?

The seeder was originally designed as a development tool. But we had already built an integration with a client’s legacy system to export their item catalog, customers, and vendors into our YAML format. That same YAML could flow straight through the seeder. Suddenly we had a path to load a demo environment with a client’s own data before a presentation. They see their own items, their own vendors. That changes the conversation.

That path eventually pointed toward production onboarding: letting a new tenant import their data, practice with it, and go live with sanitized data. We’re not fully there yet, but having the seeder in place made the distance much shorter.

A Tool That Grew With the Project

What started as “we need some records in the database to develop against” grew into a system that supports development, testing, sprint reviews, industry demos, client onboarding, and multi-tenant environments.

What I’m learning is that infrastructure decisions made early in a project tend to compound, for better or worse. In this case, treating the data seeder as a first-class concern, with real thought put into human-readable formats, event sourcing compatibility, and flexible dataset sizes, paid off in ways we didn’t anticipate. Not because we predicted those needs, but because we built something honest enough to grow with us.
