DB "snapshots" and sample data on staging servers

We’ve been chatting both on Slack and delivery circle on requirements for the best and cost effective way to save and restore the DB state on a staging server. This could serve as a good starting point to manage new sample data.

In short,
testers would like to have the possibility to:

  • take “snapshots” of the DB in different states, in staging servers, and be able to restore them after making changes on the server
  • use sample data as an alternative to set up complex test scenarios
  • do this via Github Actions

The longer version

Describing the use case

A tester wishes to test a PR on invoices, for example. To do this, the tester chooses a staging server, stages the PR and encounters several, pre-existing enterprises on that server. Currently, these mostly come from anonymized production data, and from previous manual tests. At this stage it’s not possible to know, which enterprise has the settings required to test invoices and related features.

This means, chances are, the test setup may have to be created from scratch, (i.e., manually create and set up an enterprise and, in some cases, change the tax settings of the server, as superadmin, etc). After this is done, the DB will eventually be changed due to test execution and server use. So, it is challenging for a tester to repeat the tests at a later stage, and assure reproducibility.

Saving/restoring DB “snapshots”

It would be great to be able to roll back the DB to a previous state, after some changes were made on the server. This way, tests could be repeated without needing to set up everything from scratch. So, ideally we’d come up with a process to:

  • save a “snapshot” of the DB in a given state
  • roll back any changes

Sample data

Manually setting up test data has the advantage of being very granular: we can create and set up the app exactly the way it is needed. However, it can be very time consuming, and with limited reproducibility (with manual set up it’s fairly easy to forget minor details).

As an alternative, generating sample data (and server setup) by running scripts to manual setup, has been implemented for some time now, but it’s only i) being used on the development environment and ii) only covers very basic test scenarios.

Extending sample data

To extend the existing sample data, we’d need to code specific test setups, basically, extending or creating files similar to this one.

Bringing it all together

Having a process to control the DB state on staging servers would allow:

  • saving the DB state, after setting up complex test scenarios
  • restoring to a previous DB state, would save setup time and improve reproducibility, before starting manual testing
  • serve as a process to manage (new) sample data

Open questions

  • what is the best tech approach to implement DB “snapshots” in staging? (@maikel @dcook seemed to have a pretty good idea on this one already)
  • what is the time estimate?
  • which priority?
1 Like

There are two separate tasks here. The first one is to provide a way to take and restore database snapshots. The second is to expand the sample data set.

More sample data

It’s relatively easy to add more sample data. But we can’t estimate this work unless we know what kind of data we need to program. As a rule of thumb, I would guess half a day to a day of work for a data set supporting a new kind of test scenario.

Testers need to agree on what kind of setups they need regularly. Writing the code for a setup that is only used for one pull request is not worth the effort. But we kind of do that already when writing specs for a pull request. So it’s good to do it at the same time.

One idea could be: when you encounter a pull request and recognise that you need to create certain test data which will be useful in the future, too, then you go back to the author and request test data to be added to the sample data. It’s good to be specific here. For example: “I need two enterprises, one allows order changes and one doesn’t.”

Database snapshots

During deployment, we already take a database backup. We can add a parameter to the deploy action to copy that backup in a separate folder under a certain name. For security, we should limit the name to the following characters: A-Za-z0-9_.-

We can also offer an input field to restore a database backup by name. It’s not easy to list the existing backups in the Github Actions UI. So maybe testers would keep track of the snapshots in the wiki? The list should include the date, staging server name and purpose.

I would estimate 2 days for this work.

Listing the actual database backups in the admin UI on the server can be done in a couple of hours.

1 Like