OFN v2 rollout plan

OFN v2 is almost ready to go.

We need to define our roll-out plan.
In the last upgrade catch-up meeting we agreed that:

  • We will send an email to Katuma users to let them know about the deploy of v2
  • We will clearly define how to reach Katuma’s support for that
  • We will schedule downtime for the deploy
  • Deploy v2 to Katuma on April 29th
  • Deploy to another small instance (likely Canada) on May 6th
  • Deploy the 3 big instances after the gathering
  • Deploy the rest (still not clear exactly when)

Re Katuma’s date, @sauloperez wrote this plan himself, so I assume he agrees with it :smile:

re CAN’s date, @tschumilas said on slack: “May 6 is actually a really good day for this in Canada. Our 2 larger more complex hubs (ie: using inventories…) look like they are closing their cycles on May 5, 8:00 pm our time. So big users won’t be ‘up’ again until May 8. Other users right now are smaller single farm stores -so not very complex. I can reserve May 6 & 7 for issues, and I’ll email users to let them know - what exactly? OFN is down? Or just altering they might see changes? So happy to help support this anyway I can.”

We can use this thread to discuss the plan; for the specific details of the release we can use the normal release issue. Here’s the v2 release GH issue.

An update: I discussed it with Guida and she says Mondays are not such a good day, as many hubs close their OCs that day, so we’d better deploy on a Tuesday. I want to check the reality in Matomo though.

I’ll keep you posted.


Hello, here’s my suggested approach.

Some time before April 30th:

  • prepare both v1.31.0 and v2.0.0 (details about code branches here)

April 30th:

  • release v1.31.0 to all instances except Katuma, which gets v2.0.0.

May 6th:

  • release v2.0.0 to CAN

Maybe the 30th of April is too early; shall we move Katuma to the 6th as well, Pau? Or maybe start with CAN on May 6th?
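Since the plan above means shipping two releases from two lines of history at once, here is a minimal sketch of what that could look like in git. The branch and tag names are invented for illustration; the actual layout is in the code-branches details linked above.

```shell
# Sketch of the two-release layout (names are assumptions, not the real repo).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.email=ci@example.com -c user.name=ci commit -q --allow-empty -m "shared history"

# Cut a v1 maintenance branch before v2-only work lands
git branch v1-stable
git tag v1.31.0 v1-stable        # goes to all instances except Katuma

# v2 work continues on the default branch
git -c user.email=ci@example.com -c user.name=ci commit -q --allow-empty -m "v2 changes"
git tag v2.0.0                   # goes to Katuma

git tag --list
```

This keeps v1.31.0 patchable (for any critical bug found on the other instances) without blocking v2 work on the default branch.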

Just one concern on my side:

April 30th: release v1.31.0 to all instances except Katuma, which gets v2.0.0.

To make this work on the testing side, I need to be able to start testing by next Wednesday (April 24th) at the latest. Is that doable from a dev perspective?

I would then need to give you the results of the tests by Thursday at the latest, so you have time to get the release published or do a spike on the remaining bugs I found.

I can make room for two release testings, but if you have room to take one of them, @lin_d_hop, that would be a good thing. I’m really afraid of missing a bug because I’d be doing the same thing over and over again during a short period of time :frowning:

I’ll take one of them on, @Rachel. I’ll wait for your direction as to which.

I’m not a super tester or anything so grand, but I can also help test v2. Realistically, I won’t do much until after Easter (so April 23/24 ish), and I’ll let @Rachel know if I find things, because frankly, I don’t know the process of logging bugs into GH so well (unless anyone has other directions for me).

Second, about pushing through v2 to Canada on May 6: this turns out to be a problem. There is a large hub that closes May 6 at 10:00 am our time, which is May 6 afternoon in Europe. Is it possible to wait until after 2:00 pm ish in Europe to do the downtime and the v2 deploy to Canada? Or if you need to do it in the morning, can it be May 7 am?

I’m glad I’m not the only one considering moving the deploy to the 6th. The closer we get to the date, the less possible I see it. Checking the calendar, I see that I’ll be working just two days between today and the initial deploy date (the 29th): tomorrow and the 22nd (I’ll spend 3 days off that week).

Then, as I mentioned above, Guida suggests moving the deploy to a Tuesday, but other than that she thinks users won’t notice anything. We are just the ones feeling a bit nervous :sweat_smile:. I will check the numbers tomorrow.

So that being said, and similar to what @luisramos0 suggests, I think it’s better to deploy on May 7th. It gives us plenty of time to prepare releases and have the server provisioned with the latest changes, as well as time to communicate the date to Katuma’s user base. We just need to commit to a date so we can clearly communicate it to them.

Lastly, while chatting with @Matt-Yorkley he suggested we’d better replace CAN with BE as the 2nd instance to deploy to, because it is a server configured with ofn-install. That will give us much more confidence at this still early stage of the roll-out process. Besides, we don’t want to tie any server upgrades to v2’s roll-out, as that would make it a lot longer.

It turns out that in the last Matomo weekly report BE had a bit more traffic than CAN, so the impact will be similar. I’d then leave CAN for a later stage. We will need to speak to @Theodore if we decide so.

This is my 2 cents. I’ll wrap my head around all this, as well as the steps to communicate it to our local community, tomorrow. Let me know what you think.

I’m fine if you hold off on the CAN deploy. Maybe until after we get our server upgrades? (If that makes sense.)
There are 3 hubs trading on OFN CAN now, so when we figure out a date, I can let them know. Tuesday-Wednesday is generally good timing for me to be ‘on hand’ for support.


Some updates on the communication with Katuma users.

It turns out Guida was right: Monday is our busiest day (see Matomo’s graph below). So deploying v2 on a Tuesday seemed the most appropriate. This way we have enough time to solve any incidents during working hours throughout the week.

(Screenshot: Matomo graph of visits per weekday, showing Monday as the busiest day)

So, we went ahead and sent an email to all users and shared the announcement on our Discourse. Tomorrow we might publish it on Twitter as well. For the record, this is the message we wrote:

Next Tuesday, May 7, 2019, Katuma will receive scheduled maintenance. We expect this to take about 2 hours (9:00 am - 11:00 am), although it could take an hour longer. During this period, Katuma will not be operational.

The purpose of this maintenance is to take a great step forward in the goal, shared among all Open Food Network instances, of improving service quality by upgrading our infrastructure. At the same time, this will allow us to reduce the software development costs we have to face.

We appreciate your understanding and hope that all of this does not pose a big problem.

If you have any questions, you can ask the community at https://community.coopdevs.org/c/katuma or send an e-mail to info@katuma.org, which is attended from 10:00 am to 6:00 pm, Monday to Friday.

Hope it helps, and that we improve it as we move along. Now, let’s make it to the 7th :muscle:
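For anyone preparing a similar maintenance window, one common way to implement it at the web-server level is a static maintenance page. This is only a hedged sketch assuming an nginx front end; the server name and paths are placeholders, and the actual Katuma/ofn-install configuration may differ:

```nginx
# Hypothetical maintenance-mode server block (placeholders throughout)
server {
    listen 80;
    server_name katuma.example.org;   # placeholder, not the real vhost

    # While the window is open, answer everything with 503...
    location / {
        return 503;
    }

    # ...and serve a static "back soon" page for that 503
    error_page 503 @maintenance;
    location @maintenance {
        root /var/www/maintenance;    # assumed path
        rewrite ^ /index.html break;
    }
}
```

Returning 503 (rather than 200) also tells crawlers and monitoring that the downtime is temporary and intentional.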

For the record, I’m sharing the message I wrote on the Slack channel:

A little update about the things we talked about with @rachel. Let us know if you have any concerns:
• PRs that must make it to v2: #3734, #3747, #3717. The others are not critical and could wait for a v2.1.0 soon after
• We need to prepare a v1.31.0, just missing #3763, and do mainly sysadmin testing. It’s about getting the infrastructure 100% ready.
• We still need to enable the latest Datadog monitoring and backups on Katuma production. v1.31.0 is needed for that as well.
• v2 release needs to be prepared and tested.

Agenda we suggest:

  • Today: prepare v1.31.0, get Katuma’s sysadmin stuff ready, get the 3 required v2 PRs merged
  • Tomorrow: do quick testing of v1.31.0, prepare v2 release
  • Monday: deploy v1.31.0 to Katuma, make sure we have production backups available (test rollbacks), v2 release testing and compare Katuma production (v1.31.0) with Katuma staging (v2)
  • Tuesday: Deploy v2 to Katuma production :tada:, spot any issues that might need patching and prepare v2.0.1 if needed
  • Wednesday: relax and enjoy. Hopefully, everything will go smoothly and there’ll be no :fire:
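For Monday’s step of comparing Katuma production (v1.31.0) with Katuma staging (v2), a minimal sketch of what such a consistency check could look like. Everything here is an invented stand-in: in practice each counts file would come from a query against the respective database.

```shell
# Hypothetical smoke check: dump per-table row counts from each
# environment and diff them. All table names and counts are made up.
set -e
work=$(mktemp -d)

# In reality these would come from e.g. a psql query on each server
printf 'orders 1042\ncustomers 317\nproducts 88\n' > "$work/katuma_prod_v1.counts"
printf 'orders 1042\ncustomers 317\nproducts 88\n' > "$work/katuma_staging_v2.counts"

if diff -u "$work/katuma_prod_v1.counts" "$work/katuma_staging_v2.counts"; then
  echo "row counts match: staging (v2) is consistent with production (v1.31.0)"
fi
```

`diff` exits non-zero on any mismatch, so under `set -e` a drifted table aborts the check loudly instead of letting the deploy proceed quietly.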

Next instance to deploy v2

We also briefly discussed which should be the next instance to deploy v2 to. I suggested BE, but DE could also be a candidate. They are the only ones among the small instances that use ofn-install, which is essential to deploy v2 confidently at this stage.

Both have pros and cons though, one of them being that they don’t have actual customers, so it’ll be a bit harder to spot issues.

On the other hand, the lack of customers makes it unnecessary to communicate the scheduled maintenance to all users; it’s enough to tell the appropriate people.

Thoughts, @Kirsten @Matt-Yorkley @luisramos0?


Awesome @sauloperez!
Yes, we can do DE and BE afterwards.

So, after one week we can say we haven’t had major issues. We found https://github.com/openfoodfoundation/openfoodnetwork/issues/3830, but we still need to assess it and find a proper fix.

The plan is to continue the roll-out with the “small” instances: BE and DE next. Meanwhile, we’ll work on moving CAN to ofn-install so that it gets v2 after these.

From now on, we’ll prepare v2.x releases from master, which means that release v2.1.0 is going to come from master and get deployed to Katuma. Then, any new instances adopting v2 are going to get the latest release available. This way they’ll get all the improvements on top of v2.0. Therefore, the performance improvements we’ve made so far will be included in v2.1.0.

Then, we will only maintain v1.31.0 if we encounter an s1 bug. So don’t expect a v1.32.0 unless we find an s1.

All this will make the roll-out more manageable and quicker.

Hello! I’m posting here so we can figure out the best next steps. I selfishly want to know when I need to test stuff :grin:

I’m proposing the following plan:

  • Prepare the next release (I feel like we have a LOT to release).
  • Update Canada on Thursday the 13th, in the same time slot as Belgium, with this new release? Is this smart?
  • Update BE and Katuma the same day?
  • Plan for the German update: when are you back, @sauloperez?

Then the 3 big instances… France is going to be the last one. I’ve started to talk to our users. From what I see, it would be great if we could upgrade France after June 30th, but BEFORE August 15th.

Would that be manageable? What are everyone’s plans for the holidays?

That sounds good to me. As per my comment above, we agreed to continue the roll-out based on master. AFAIK we said we would prepare that big v2.1.0 release this week so we could test next week.

So, yes, on Thursday the 13th we deploy CAN, BE and Katuma with v2.1.0 :+1:. Someone else can deal with Germany the week after.

I’ll be back in action on the 3rd of July, so I can focus on “The Big Ones”.

Sounds good to me :slight_smile:

If you deploy first thing Thursday, then I’ll have some time to check things out before our hub has to set up their new order cycle. It’s tight, but as long as we don’t run into any problems, it should be OK. They set up at noon Friday our time (your Friday evening).

Another step forward! We deployed v2.1.0 to Katuma, Belgium and Canada :tada:! Welcome to the v2 club, @tschumilas :joy:.

I’ll take the chance to share the documentation I wrote to make it easy for anyone to replace me on the roll-out while I’m away :point_right: https://github.com/openfoodfoundation/ofn-install/wiki/Deployment-of-v2. Feel free to fix any typos. Anything worth adding?

The next step is Germany. @luisramos0, will you deal with it? They might not need to email customers, but they still need to be aware of the maintenance window.


@sauloperez @luisramos0 I was speaking with Brandy this morning and we agreed that it would be great if the US upgrades were indeed managed by the global team. So the info here: https://github.com/openfoodfoundation/ofn-install/wiki/Current-deployment-status is correct.

@lauriewayne1 for v2, we need to schedule a downtime for the upgrade. You will need to warn your users. When would be the best date for you to have that upgrade?

There’s a cost to this we cannot ignore: we are managing 7/8 servers right now, and it’s all global pot money that is not going into dev time. The investment in ofn-install is exactly to allow people to manage their own instances easily.
I think we should do the upgrade to v2 in the US, as it’s a large upgrade, but I’d like to explore the possibility that all server upgrades from then on are managed by local teams, on all servers. Maybe we need a different Discourse thread for this.

@luisramos0 I understand what you are saying, but to my knowledge we also did this standardization so it was easier for a dev to upgrade several instances at once and thus save time globally.

In the case of the US, I think we should have the global team doing it, even after this upgrade. Brandy has a Windows machine and, for now, very little time to give to this side of the project. So to me the US is a bit in the same case as France: we have someone who could do the upgrade, but who isn’t around all the time. So what differentiates the two cases? Why is France updated by the global team and not the US?