Upgrades, rollouts and potential blockers

We’ve been doing some thinking about how to proceed with a number of big sysadmin tasks that are currently on our plate:

  • Canada Production needs to be upgraded from Ubuntu 14 and provisioned with ofn-install.
  • We need to undertake a managed rollout of v2.
  • We’re ready to start testing and upgrading all servers to Ubuntu 18.

If we try to do all of these things at once it will be absolute chaos, so we have a proposed roadmap for accomplishing all of these jobs (in order):

  1. Upgrade Canada Production to Ubuntu 16 (with ofn-install) ASAP.
  2. Roll out v2. See: OFN v2 rollout plan
  3. Postpone testing and upgrading servers to Ubuntu 18 until Q3/Q4. This will not be a quick process, and we shouldn’t do it until the v2 rollout is finished and the dust has settled. The plan here would also include keeping half our staging servers on 16 and half on 18 so we can compare/diagnose builds on both versions whilst the upgrades are still in progress.

So, the top priority now is the upgrade and migration of Canada Production, which we can plan to do over the next month. This will need some co-ordination between the instance and sysadmins. The plan for the upgrade looks like this:

  • We’ll need a second server for Canada so we can do: Current Server -> New Server. This will need some co-ordination in terms of web hosting accounts and access etc.
  • Start setting up the new server with ofn-install (making sure to deploy the same release), and make sure it’s working. Copy any custom files like logos, TOCs, etc.
  • Plan a time for the switchover and announce scheduled downtime to Canada’s users (we should be able to use our timezone differences to do it while Canada is asleep).
  • At the scheduled time we shut down the current server, export the database, import it into the new server, then switch the DNS so the domain points to the new server’s address. In theory this shouldn’t take more than 10 minutes, but we can announce an hour of downtime to leave some room for the unexpected.
  • After it’s all done and the new setup is working, the old server can then be cancelled.
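
For the DNS step of the switchover above, something like the following could be used to confirm the domain has moved before declaring the downtime over. This is only a rough sketch: the function name `wait_for_dns`, the domain and the IP addresses are placeholders I made up (the addresses are from the documentation range), and it only checks what the machine running it resolves, not what every user’s resolver sees.

```python
import socket
import time


def wait_for_dns(domain, expected_ip, resolve=socket.gethostbyname,
                 attempts=30, delay_seconds=10.0, sleep=time.sleep):
    """Poll until `domain` resolves to `expected_ip`.

    Returns True as soon as the lookup matches, False if it never does
    within `attempts` tries. `resolve` and `sleep` are injectable so the
    function can be exercised without touching the network.
    """
    for attempt in range(1, attempts + 1):
        try:
            current_ip = resolve(domain)
        except OSError:
            # Lookup failed (socket.gaierror is an OSError); treat as "not switched yet".
            current_ip = None
        if current_ip == expected_ip:
            return True
        if attempt < attempts:
            sleep(delay_seconds)
    return False


if __name__ == "__main__":
    # Hypothetical values: replace with the real domain and the new server's address.
    switched = wait_for_dns("example.org", "203.0.113.20")
    print("DNS points at the new server" if switched else "DNS still not updated")
```

With the defaults it gives up after roughly five minutes, which is in line with the "a few minutes" people usually see in practice.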

How does that sound?

The DNS might take longer, but in any case the timezones couldn’t be better aligned: 8 AM CEST is 2 AM EDT (that is your timezone @tschumilas, right?). So I would highlight to users that they should not be affected at all by the downtime.
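
As a quick sanity check of that conversion, here is a small sketch using fixed UTC offsets (CEST is UTC+2, EDT is UTC-4). The date is purely hypothetical, since no switchover day has been agreed yet, and fixed offsets only hold while both sides are on summer time.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets, valid only while both regions are on daylight saving time.
CEST = timezone(timedelta(hours=2), "CEST")
EDT = timezone(timedelta(hours=-4), "EDT")


def to_edt(moment):
    """Convert a timezone-aware datetime to EDT."""
    return moment.astimezone(EDT)


# Hypothetical switchover start: 08:00 CEST on a made-up summer date.
start_cest = datetime(2019, 6, 1, 8, 0, tzinfo=CEST)
start_edt = to_edt(start_cest)

print(start_cest.strftime("%H:%M %Z"), "->", start_edt.strftime("%H:%M %Z"))
# prints: 08:00 CEST -> 02:00 EDT
```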

I think traditionally they used to say allow “up to” 24 hours for DNS records to update, but I think these days it’s generally a few minutes. You’re right though, it’s a factor we don’t really control.

Depending on how much control the host allows, it might be possible to adjust the TTL value of the DNS record beforehand so its caching time is reduced…?
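
To make the TTL idea concrete, here is a rough sketch of the timing involved. The reasoning is that the TTL has to be lowered at least one full old TTL before the switchover, so that caches holding the long-lived record have expired by then; after the switch, a well-behaved cache should be stale for at most the new TTL. All the numbers (a 24-hour old TTL, a 5-minute new TTL, the switchover date) are assumptions for illustration, and some resolvers may not honour very short TTLs.

```python
from datetime import datetime, timedelta


def latest_ttl_change(switchover, old_ttl_seconds):
    """Latest moment to lower the TTL so the old long-lived record has expired everywhere by `switchover`."""
    return switchover - timedelta(seconds=old_ttl_seconds)


def worst_case_stale(new_ttl_seconds):
    """Longest a TTL-respecting cache can keep serving the old address after the switch."""
    return timedelta(seconds=new_ttl_seconds)


# Hypothetical values, not taken from the real DNS setup.
switchover_utc = datetime(2019, 6, 1, 6, 0)   # 08:00 CEST expressed in UTC
old_ttl = 24 * 60 * 60                        # assumed current TTL: 24 hours
new_ttl = 5 * 60                              # assumed lowered TTL: 5 minutes

deadline = latest_ttl_change(switchover_utc, old_ttl)
stale = worst_case_stale(new_ttl)

print("Lower the TTL no later than:", deadline.isoformat())
print("Worst-case stale window after the switch:", stale)
```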

I would not worry about the DNS switch. I have done quite a few of them and, yes, it can take hours, but in reality it has always taken less than 5 minutes.

Yeah, it’s just that the 24h window feels scary. I agree with you all. Let’s do it.

Sorry - missed this post before. This all sounds fine to me. Just let me know the exact days/schedule once you know it, and I’ll let our users know.

A lot has happened since we opened this post. Everyone is on v2, and Canada has had a new server for some time. We’re missing

And I would add that Katuma needs to be upgraded as we did with Canada. I think it’s fair that global deals with it now that it’s a fully automated process (I don’t remember whether this was already agreed or not).

I think we should definitely prioritise upgrading Katuma. It should be quick and painless now that it’s automated.

On the Ubuntu 18 front it looks like the Italian instance has deployed to Ubuntu 18 already, and is using it (for staging), so I guess the compatibility work we did previously is sufficient. :tada: