Following some discussion on Slack, we reviewed our deployment practice. We recognised that a lot has changed within the last year and decided to deploy releases like everybody else.
Where we come from
When the Australian team was the only one, we were on our own journey to find the delivery process that suited us best. In the beginning we would merge pull requests and then do a big deploy from time to time. There was always the question: Should we push it? Do we risk it? And then we would play Salt-N-Pepa’s Push It while running the deploy process to the production server. Sometimes we did that on a Friday afternoon and regretted it, because we had to quickly fix things after hours or at the weekend while having other plans. So we came up with a rule: no deploys close to the end of the working day.
Finding out what went wrong was particularly difficult with big pushes. Which change broke it? So we moved to continuous delivery and set up a delivery pipeline in Buildkite:
- Run all automated tests. We had our own server at first and then switched to Travis.
- Review the pull request. One review by a peer developer was enough. It was Rohan, Rob and me at the time.
- Keep the pull request in sync with master. We rebased them ourselves, but the CI server would also merge master into the branch before staging.
- Stage the pull request.
- Let Sally thoroughly test it. She would not just test the feature, but also all areas that could be affected, and she would always test login and checkout to prevent any S1 (severity one) issue from reaching production.
- Merge the pull request into master. The script would sanity-check that the pull request was still up to date and that nothing else had been merged into master since staging (see the sketch below).
- The new master would then be deployed to production automatically.
Since this was modelled in the Buildkite pipeline, there was no way around it. It was the only way of doing things, and the CI scripts enforced the process.
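For illustration, here is a minimal sketch of what that merge sanity check could look like. It is a hypothetical reconstruction, not our actual Buildkite script: it assumes the git CLI is available and that the commit that was deployed to staging gets passed in as an argument; the branch and file names are made up.

```python
#!/usr/bin/env python3
"""Pre-merge sanity check, as described in the pipeline step above.

Hypothetical reconstruction, not our actual Buildkite script. Assumes
the git CLI is on the PATH and that the commit deployed to staging is
passed in as an argument.
"""
import subprocess
import sys


def git(*args: str) -> str:
    """Run a git command and return its trimmed stdout."""
    result = subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def main(pr_branch: str, staged_sha: str) -> None:
    git("fetch", "origin", "master", pr_branch)

    # The pull request must not have changed since it was staged.
    pr_tip = git("rev-parse", f"origin/{pr_branch}")
    if pr_tip != staged_sha:
        sys.exit(f"{pr_branch} moved since staging ({staged_sha} -> {pr_tip}).")

    # Nothing else may have been merged into master since staging:
    # master's tip must already be an ancestor of the staged branch.
    master_tip = git("rev-parse", "origin/master")
    if git("merge-base", master_tip, pr_tip) != master_tip:
        sys.exit("master moved since staging; please re-stage the pull request.")

    print("Sanity check passed, merging is safe.")


if __name__ == "__main__":
    main(pr_branch=sys.argv[1], staged_sha=sys.argv[2])
```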
We would test, merge and deploy several times per day. Sally had become an excellent tester by that time and we were super confident. New bugs in production were very rare.
Once there was an international community, we started creating releases. They were really just a way to broadcast the changes and remind people to update their servers; we had no direct use for them ourselves.
The changes within the last year
After our gathering last year we changed a few things to open the pipeline up to the new international team, to random contributions and to quick delivery for everybody.
- We abandoned the strict use of our Buildkite pipeline to enable other people to merge without being coupled to our deploys.
- Anyone in the core dev team can merge a pull request via GitHub.
- More people started testing and we tried to formalise the process and transfer testing knowledge.
- Sally, our testing guru, our only local tester, is leaving.
- The number of pull requests increased.
- Releases are created by the dev team and have become more frequent.
- The Aus dev team is smaller (just me).
- I now deploy the master branch to production several times a week, usually including multiple pull requests (not at the end of the day).
- We introduced release testing.
- We no longer do basic release-level testing on every pull request. (Right?)
I’m not sure about the extent of the pull request testing at the moment. Was it a deliberate decision to skip more general testing for every pull request, was that just lost in the transfer of testing knowledge, or do we still do it? Can you answer that, @MyriamBoure?
Our planned changes in Aus
Beginning in 2019, we would like to deploy releases like everybody else. This means that we won’t be the guinea pig any more and will behave like the rest of the global team. While we still believe that frequent, small deploys minimise the risk of introducing multiple bugs at once and simplify finding the one that slipped through, we also recognise the new risks of doing things differently from the rest of the team and of no longer having everything tested by Sally. The fact that there are release tests implies that master is usually no longer tested for production use. It is also not feasible to deploy one pull request at a time without new automation.
This means that we will all be on the same page and pulling in the same direction to improve our release and deployment process.
As you can hopefully see from this post, our practice of continuous delivery was a lesson learned the hard way. I still think it is the best way to minimise risk, and we should try to find a new way to make it possible. We need to compromise now to build a team, but I hope we won’t repeat the mistakes of the past.
Having a global team is the problem that made our old process infeasible, but it’s also the solution. When pull requests are merged while I’m asleep, Europeans can deploy them to our server and minimise the risk of affecting prime shopping time.
I would like us to speed up the release cycle to the point where every new pull request merged into master is a new release. Pull request testing and release testing would then come together again. We would need a new way of deploying this to several servers iteratively to minimise risk.
Just an idea: imagine a pull request is merged in Europe. You can deploy it to Australia while everybody there is asleep. If something like a database migration fails, you have plenty of time to fix it. If everything is fine, maybe after an hour, you can deploy to an American server, where people may just be waking up. We can work our way through the time zones, deploying one server at a time, one pull request at a time.
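To make the idea concrete, here is a rough sketch of such a follow-the-timezones rollout. Everything in it is an assumption: the host names, the off-peak window and the deploy.sh helper are placeholders for whatever tooling we would actually build.

```python
"""Sketch of a follow-the-timezones rollout, one server at a time.

Made up for illustration: the host names, off-peak window and
deploy.sh helper are placeholders, not existing tooling.
"""
import subprocess
import time
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

# Ordered so that each deploy happens while that server's users sleep.
SERVERS = [
    ("au.example.org", "Australia/Melbourne"),  # hypothetical hosts
    ("uk.example.org", "Europe/London"),
    ("us.example.org", "America/New_York"),
]

QUIET_HOURS = range(1, 5)        # 01:00-04:59 local time, assumed off-peak
SOAK_TIME = timedelta(hours=1)   # observe each server before moving on


def is_quiet(tz: str) -> bool:
    """True if it is currently off-peak in the given timezone."""
    return datetime.now(ZoneInfo(tz)).hour in QUIET_HOURS


def deploy(host: str, release: str) -> None:
    """Placeholder for the real deploy mechanism."""
    subprocess.run(["./deploy.sh", host, release], check=True)


def rollout(release: str) -> None:
    for host, tz in SERVERS:
        # Wait for the server's quiet window before touching it.
        while not is_quiet(tz):
            time.sleep(600)
        deploy(host, release)
        # Give migrations and monitoring time to surface problems
        # before the release reaches the next timezone.
        time.sleep(SOAK_TIME.total_seconds())
```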