OFN Australia adopting the common release cycle

First paragraph:

In the majority of failed deployment situations, it probably makes more sense to revert the bad code and redeploy, rather than running deploy:rollback. Capistrano provides basic rollback support, but as each application and system handles rollbacks differently, it is up to the individual to test and validate that rollback behaves correctly for their use case. For example, capistrano-rails will run special tasks on rollback to fix the assets, but does nothing special with database migrations.

That’s exactly what I meant.

Probably @Rachel would be better placed than me to answer… I like the idea of continuous deployment, and in that sense release testing doesn’t make much sense. When we do a production upgrade, usually (we didn’t last time…) we do a sanity check, which is:
1- I can login
2- I can create a product and it appears correctly in the shop as expected (so check the OC quickly)
3- I can checkout
4- I see the order on the order view

I think it takes 5 minutes to check that, so we could do it for each PR if we move in that direction. And as you say, we should try to test some of the more “intuitive” potential impacts a PR could have, on top of the pure “what to test” description… intuition can only come with experience and time dedicated to testing (a tester in a hurry may not take the time to pause and think about what else could be impacted…)
So for new testers it would be good to have a Ha or Ri tester review the test, because she would think of potential impacts to test that the Shu tester would not think about… @Rachel I’m not sure we did that so much with you, maybe we should have! How do you feel about this “intuition” on what can be impacted? I think I remember some conversations where you disagreed with the idea that we should try to think about every case a PR could affect… but it is also a fact that more bugs have come through recently, and even if the causes are probably multifactorial, I think Sally was doing more extensive “intuition-based” use case testing…

I don’t know what is best: spend more time on testing, find more of what is broken, and so release less quickly but more safely; or do less “intuition-based” testing, release without being afraid of bugs, and then fix them quickly… this “testing design principle” is essential to me in this conversation about continuous deployment…

But I don’t have much experience in software release pipeline management, so your experience will be precious @Rachel, and @danielle can probably give good input as well from her own experience (and her suggestion to hire a professional tester).

Nope, that’s not what I said. Of course new stuff should be tested, and the rest of the platform should remain the same, and we need to make sure of that. But just like the Ri dev working on the code, there will never be a Ri tester able to think of everything. This is human nature :slight_smile: So we should have detailed testing processes and run them every time. By detailed I mean with key info such as steps, URLs, profiles… But even with this there will still be mistakes, because it is too much for a human tester to do on each PR (let’s not forget tests should be run on different browsers, let alone different devices). So we need automated tests for that. I don’t see any other way to improve testing quality.

Sally confirmed on Slack that there never was a sanity check per PR (sorry, I can’t find the post anymore :frowning:). But you were tagged in that discussion I think @maikel :slight_smile: We introduced regular sanity checks on staging with release stagings AFAIK. The rest of the sanity checks were done in production, but I’m not even sure every instance is aware that they should do that :confused:

Well… about that… we had a pretty annoying week, true. Is this still the case? I would love to improve what I’m testing, but I have no fact-based info showing that a specific bug was introduced by a specific PR that was not extensively tested, which would tell me what I was missing in testing. The only one I can think of is the Instagram links, because yes, I went too quickly on that one and forgot the front office.
So unless there is such info, the only solution to me is documenting testing processes so that A. every tester knows what to do in each case (you don’t need to be Ri to do that; if there is a process, anyone can follow it) and B. we can cover our app with more automated tests.
And yes, like I said on Slack, a professional tester could be very helpful with that.

Thank you @Rachel and @MyriamBoure for all that input. We do have a lot of automated tests. Login, checkout, orders page, all that is tested automatically. A detailed testing manual can just be converted into an automated test. Computers are good at following instructions. But there are some things that our tests don’t cover:

  • Layout issues. The page looks ugly, maybe even unusable, but all the text is still there and the feature works. The solution to this would be visual acceptance testing. One tool for that is https://percy.io/.
  • Real communication with external services like Stripe. The solution to this would be to create automated tests for a staging server. A good tool for that is Selenium. @Rachel Have you worked with it? You can actually record a testing session and replay your clicks and form fills for any pull request (see the sketch after this list).
  • Complex real world data. The space of possible data is too huge to be generated in an automated way or for a test environment. We could test with every production database and would still not be able to catch a bug that appears with data entered by a user tomorrow. I don’t know of any reasonable solution to this problem. We can just try to make our test data more and more complex and we can run database migrations with production data. But there will always be bugs in production, we have to live with that.
  • Using context of recent events to try different edge cases: intuition based testing. Computers are not smart enough to compete with humans on this level.
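To make the Selenium point concrete, here is a tiny sketch of the kind of script you end up with when you record a session and export it to code. Everything here is a placeholder (URL, field names, credentials), not OFN’s real markup:

```ruby
require "selenium-webdriver"

# Replay a recorded login against a staged pull request.
driver = Selenium::WebDriver.for :chrome
driver.get "https://staging.example.org/login"   # placeholder staging URL
driver.find_element(name: "email").send_keys("tester@example.com")
driver.find_element(name: "password").send_keys("secret")
driver.find_element(css: "button[type='submit']").click
puts driver.title   # quick signal that the login redirected somewhere
driver.quit
```

Once a session like this is recorded, the same clicks and form fills can be replayed against any staging URL.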

These are some ideas, but I think our testing process is pretty good already. I agree with Rachel that most of the recent bugs were not introduced through bad testing. They appeared due to more complex production data, which we will never be able to cover completely. These are growing pains.

@Rachel How long does release testing take at the moment? Do you agree with the 5 minute assessment from Myriam?


Great summary @maikel! It expresses what I was trying to say in a more comprehensive way :slight_smile: I haven’t worked with Selenium yet, but I’ve seen demos :heart_eyes:

About release testing: what Myriam was referring to was production testing. Release testing is heavier, with more use cases; see the example here: https://docs.google.com/document/d/1NjxmE11lA2z_JJ0kwSZvfVudliik3kEoMMUw-tU_j7g/edit#heading=h.4ctypvfojpgm

It took me at least one hour, and I didn’t test everything; Myriam and Sally took over some parts that I didn’t know well (e.g. reports).

Wow, that is huge. I would like to make two suggestions.

  1. We should review the release testing template and find the scenarios that are covered by automated testing. We don’t need to test those manually.
  2. I would like to avoid the double-checking of all the merged pull requests, but that needs a small modification of our process. After a tester gives their okay, we just merge the pull request into master. We developers do that anyway; there is no further assessment, we just merge it. So we can automate that, or give testers permission to merge. And if every successfully tested pull request is in master, we can merge master into every pull request when staging it, and it will be up to date. It is then very unlikely that a pull request breaks an existing feature.
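For what it’s worth, the staging half of point 2 could be as small as a task along these lines. This is a hypothetical sketch: the task and variable names are made up, and the real version would live in whatever tooling stages our PRs:

```ruby
# Hypothetical Capistrano task: merge current master into the PR branch
# before deploying it, so every staged PR is tested together with
# everything that has already been merged.
namespace :staging do
  desc "Stage a PR branch merged with the latest master"
  task :stage_pr do
    run_locally do
      execute :git, "fetch", "origin"
      execute :git, "checkout", fetch(:branch)
      execute :git, "merge", "--no-edit", "origin/master"
    end
    invoke "deploy"
  end
end
```

If the merge fails, the task aborts, which is exactly what we want: a PR that conflicts with master should go back to the developer before testing.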

I really like those two suggestions @maikel :slight_smile: let’s see what others think about this!

I was talking about production testing, yes. So if every PR became a release, we would do a similar production-like sanity check IMO.

One thing I don’t understand @maikel: if there is continuous deployment (every merged PR is a release), there is no release testing anymore, only PR testing, so I don’t understand your point 1. We will always want to do a sanity check of the checkout even if there are automated tests… so we would still do a manual sanity check for each PR.
I agree that double-checking all PRs is not ideal (and doesn’t make sense if every PR is a release!), but @Rachel’s idea was that testing them all together can reveal bugs.

I’m wondering one thing: rather than continuous deployment (every PR is a release), shouldn’t we in our case keep regular releases like we do now, but test each release on a “pre-production server”, as is usual in many software projects? That could be a copy of the UK/Aus or FR production, for instance, so we have real data and can reveal bugs when we test the release.

If we do continuous deployment, then I don’t understand why we wouldn’t all be directly on master? If we want every new PR to be directly a release, it would avoid the upgrade process… like Aus actually was. If there is an issue with a PR, we revert the last PR… to me, the move toward ofn-install was to enable quick releases and upgrades, but it seems contrary to continuous deployment (if I understand correctly). Is it adapted to continuous deployment as you suggest, Maikel, where each PR becomes a release?

Maybe I misunderstood some bits, so please correct me, but what I understood from the last hangout discussion was that we could have quick and regular releases as we do now, with 2/3 groups of simultaneous upgrades (by time zone). I think it might be too much of a mess to go toward “each PR is a release”… and adding release testing on a pre-production server could avoid the “guinea pig” effect…

@MyriamBoure there are a few things that I don’t understand in your proposition:

That is exactly what we are doing now: we test the release on a staging server. What are you suggesting we should change or add to that?

Maikel’s suggestion number 2 avoids this problem: the proposal is to stage each PR with the most up-to-date master. So, except for the last PR of a release, we would have already tested them together. That would save us a great deal of time and also increase our testing quality.

IMO you have to separate the discussion on automated tests (Maikel’s point 1) from continuous deployment. Here, point 1 is about increasing testing quality. You can have automated tests and continuous deployment, or automated tests and releases.

Agree with @Rachel. I feel we’re talking about several things here, and although we might want them all, we can’t have them all at the same time.

So to me, continuous deployment is something I’d like to aim for next year, but first we need to improve the current process. So let’s put continuous deployment aside for now.

I think what @MyriamBoure means by a pre-production server is a server that uses production’s DB, so that we test with the same complexity and amount of data.

I broadly agree with @maikel’s ideas so let’s turn this discussion into actionable issues that allow us to iterate.

I was wondering if we can make our PR testing good enough for releases. The process would be to make release testing more efficient and fold more of it into PR testing until they are the same.

Very well. It sounds reasonable to me to test the checkout as the most business-critical feature. For example, the production and staging environments communicate with Stripe, while the tests simulate the communication with Stripe. But we don’t need to test five different enterprise fees manually, because that should be tested automatically, and the communication with Stripe is not affected by the type of enterprise fee.
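To illustrate what “the tests simulate Stripe” means, here is a minimal sketch using WebMock (just one way to do it; the endpoint is Stripe’s real charges URL, everything else is made up):

```ruby
require "webmock/rspec"

RSpec.describe "checkout with a simulated Stripe", type: :feature do
  before do
    # Intercept the charge request so no real payment is ever made.
    stub_request(:post, "https://api.stripe.com/v1/charges")
      .to_return(status: 200, body: { id: "ch_test", paid: true }.to_json)
  end

  it "completes checkout regardless of the enterprise fee type" do
    # ... drive the checkout flow here; the fee calculation runs for
    # real, while the Stripe call above hits the stub ...
  end
end
```

The fee logic is exercised fully; only the network boundary is faked, which is why manual testing can focus on the real Stripe communication instead.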

I think it’s good to review the manual testing from time to time and identify parts that can be automated so that they don’t need to be tested manually. And I agree that there will always be some parts of integration testing that can’t be covered by automated tests.

As Rachel asked: what is the difference from a staging server? I think you are saying that our staging servers don’t have that complex data, and you would like production data on a staging server. That is almost possible. The process of copying the data from production to staging involves some changes though. Secrets like Stripe keys need to be replaced or removed so that we are not transferring money into real accounts. The email setup needs to be changed so that we are not sending emails to real people. We need to put the application in a sandbox to test with production data without affecting the real world. That’s a medium-sized project.
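The sandboxing could start as small as a rake task along these lines. This is a rough sketch, and the model names are assumptions about our Spree-based schema rather than the real thing:

```ruby
namespace :staging do
  desc "Sanitise a production dump after restoring it onto staging"
  task sanitise: :environment do
    # Rewrite every customer email so no real person receives mail
    # ('||' is PostgreSQL string concatenation, keeping each address
    # unique per user id).
    Spree::User.update_all("email = 'user' || id || '@staging.example.com'")

    # Delete stored cards so no real money can move.
    Spree::CreditCard.delete_all
  end
end
```

Replacing the Stripe keys and pointing the mailer at a catch-all inbox would be further steps in the same task.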

We would. With continuous deployment, master is always the latest release. We can give it a name or leave it.

Ofn-install and our custom deploy script are just two ways to update a server to a specific version of the code. Both work and are efficient enough for continuous deployment. With either of them you can deploy master or a release. We invested work in ofn-install so that it works for all instances, is more reliable, and lets us share the sysadmin work globally.

Talking to the other core developers, we agree that we would like to work towards continuous deployment. But that’s a slow process, because we need to clarify the impacts on the whole process and adapt our testing and deployment methods. We changed a lot within the last year and it still feels like we are experimenting with our processes. We need to run the current model for a few cycles to identify what works and what doesn’t, and then iterate. We are trying to become more efficient and make more and smaller releases. And one day the smallest release will contain only one pull request.

Maikel, your two suggestions above are awesome! Especially 2. I think we should go for it. I agree with everything you are saying in this thread; there’s not much I can add.

Re point 1: I wouldn’t worry about sanity checks in staging. I don’t see much value in repeating the automated checkout test in staging… how probable is it to have a green build with a broken checkout in staging? Otherwise, I think testing integrations in staging and sanity checks in production are important.

Re point 2: we should stop staging PRs without merging master into them (AUS: Semaphore staging is merging master into the PR; UK: I am not sure the UK staging process is merging master; ES/FR: not merging master). We can do this as we move all staging processes to Semaphore.

Matt’s original implementation didn’t merge master into the PR. He changed his pull request so that it’s now using a merge commit, but I’m not sure if he updated UK staging.

Pre-production is “same as production”, which is not the case when we test on staging: staging servers have staging data, not production data. With pre-production you copy production and test on the copy, so you have the real data and the real problems… I learnt that in discussions with the GLP IT architect when they were thinking about using OFN.
https://www.ibm.com/support/knowledgecenter/en/SSWT7D_1.0.0/com.ibm.commercecloud.overview.doc/concepts/cov_overview_environment_preprod.htm
From what I understood, any reasonably sized project has that. @sauloperez, didn’t you have that at Red Hat?
I don’t know if we would need that if we move toward continuous deployment… maybe not.

I think I was confused between release and upgrade. In my mind, with continuous deployment, as soon as a PR is merged a new release happens and a new upgrade happens, but that’s probably not the case; we still have a periodic upgrade process from what I understand now.

Anyway, it’s a bit clearer to me and everything you said makes sense. It seems you have a plan that you are all happy with, so let’s action it!

Thank you for the clarification @MyriamBoure. Pre-production could be useful, but it’s not as simple as copying the data. We need to make sure that we are not emailing real customers, charging real cards, etc. Automating the database changes could be a large project, or maybe only a medium one.

@MyriamBoure @maikel we are actually doing both. UK staging, for example, has production data. I agree that, concerning people’s data, this is not at all good practice right now (it is not anonymised), but it really helped me on bulk invoice printing to have real live cases on a large database.

So maybe the first step here is to ensure our staging data gets better/larger (we already started that), and then see in a second step whether we need to set up production clones?

I’ve been thinking lately about continuous delivery and continuous deployment, and I wanted to study the topic a bit more. Via our beloved Martin Fowler, I came across https://www.continuousdelivery.com/ and I think it’s one of the best sources on the matter. Make sure you check it out!

Also, I remember thinking there might be better things we could do than investing in pre-production environments, but I don’t remember what the alternative I had in mind was, or why :trollface: I’ll try to remember.

Hello!

Re point 2, staging PRs with master merged in: I think we are doing it right now, is that correct? Ping @maikel @luisramos0 @sauloperez @kristinalim @Matt-Yorkley

If we are doing this, a tester can run a 5-minute sanity check on each PR. The current sanity check involves:
1- You can login
2- You can create a product and it appears correctly in the shop as expected
3- You can checkout
4- You see the order on the order view

More than that on each PR and testing will take too much time. If we do this and remove release testing, it would mean that features like BOM, Inventory, and Subs won’t be manually tested regularly unless a PR touches them.

I’m ok with that, because I would prefer us to work on improving our automated tests (including an acceptance test framework, see “Seed data [development] [provisioning] [deployment]”, or visual testing, see “Automated visual testing”).

Any thoughts? Pinging @lin_d_hop as well

Those four tests seem very reasonable. And we can automate them as well. Let’s have a session about Selenium at some point, @Rachel.
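As a starting point, the four steps could translate into a single Capybara spec driven by Selenium, roughly like this (paths, labels, and credentials are assumptions, not our real markup):

```ruby
require "capybara/rspec"

# Run the spec in a real browser, headless, via Selenium.
Capybara.default_driver = :selenium_chrome_headless

RSpec.describe "PR sanity check", type: :feature do
  it "logs in, creates a product, checks out, and sees the order" do
    # 1- You can login
    visit "/login"
    fill_in "Email", with: "admin@example.com"
    fill_in "Password", with: "secret"
    click_button "Login"

    # 2- You can create a product and it appears in the shop
    visit "/admin/products/new"
    fill_in "Name", with: "Sanity check apples"
    click_button "Create"

    # 3- You can checkout, and 4- the order shows on the order view:
    # ... add the product to the cart, complete checkout, then
    # assert the order appears, e.g.
    # expect(page).to have_content "Sanity check apples"
  end
end
```

Pointing Capybara at a staged PR (via `Capybara.app_host`) would let the same spec double as the per-PR sanity check.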

In addition, we should always look beyond the scope of the current pull request and think about other areas that could be affected. That may be a guess. Or, if we can’t think of anything, we can just choose something we haven’t tested for a while. The goal is that we test features like BOM or inventories from time to time, just not for every pull request. If we cycle through them, we can still have really good test coverage at any given point in time. And I don’t think these tests should be standardised; they have to be creative to increase our chance of catching uncommon bugs.