OFN deployment with Docker

Hi everyone,

This is my first post on this forum, so please tell me if I am not in the right category or if I am doing something wrong.

TL;DR: the instability of ofn-install runs comes from the web application code being rebuilt separately, and slightly differently, in ofn and ofn-install. The best way to solve it, IMHO, is to use Docker to build, install and deploy OFN.

The problem:
My last PR (unlock the ofn-install installation process by heroinedor · Pull Request #10002 · openfoodfoundation/openfoodnetwork · GitHub) made me realize that we have a stability issue with ofn-install runs: each time a new version of the Openfoodnetwork application is released, the release is materialized only by a tag on the code (and the corresponding zip archive); no package or artifact is built.
Ex: Release v4.2.21.1 Revert Address Caching · openfoodfoundation/openfoodnetwork · GitHub
So when ofn-install runs on this release it always has to download the code and rebuild it, instead of downloading a package/artifact and simply installing it.

This leads to the following problems with ofn-install:

  • no separation of concerns: problems in the openfoodnetwork application build are detected during the ofn-install CI process
  • the openfoodnetwork code is rebuilt in a slightly different environment from the one used by openfoodnetwork itself, which introduces a small risk of undetected bugs
  • the ofn-install code is quite complex, partly because it has to rebuild the OFN application
  • the installation procedure for servers not maintained by OFN is quite heavy.

Proposal:
I see two very different ways to solve these problems:

  1. Basic way:

We review and compare the ofn and ofn-install build commands and ensure that the two environments are strictly identical (which can be tricky). Then we automate a run of the ofn-install CI each time a release of openfoodnetwork is made, and manually fix it at the same time as ofn in case of problems.

  • Pros:
    • very cheap to develop
    • the changes needed in the code will probably be minor
  • Cons:
    • still no separation of concerns: code problems in the OFN app will still be able to block ofn-install
    • ofn-install remains complex to understand and maintain
    • this way of deploying applications is no longer standard (it is the old way)
  • Estimate: 4-5 days of work
  2. Docker way:
    We change the way the OFN application is built and deployed:
  • In openfoodnetwork: use Docker to build the server image (which we can consider as a “package”)
  • store this image on Docker Hub
  • In ofn-install: deploy this image (plus docker-compose scripts) on production servers instead of a native installation
  • Pros:
    • separation of concerns between the build and deploy processes
    • reduced complexity of ofn-install: easier to understand and maintain
    • reduced complexity of the installation procedure for servers not maintained by OFN: this may make adoption of OFN by admin users easier
    • this is now the standard way to deploy applications
  • Cons:
    • costly in terms of development
    • the installation documentation needs to be updated
    • changes the way some developers install their local environment
    • introduces another technology that developers and admins have to know and understand
  • Estimates:
    This work is too big to get a precise and reliable estimate. It has to be split into several tasks:
    • dockerisation of openfoodnetwork: 5-10 days?
    • publication on Docker Hub with ofn CI: 5 days (a rough workflow sketch follows this list)
    • ofn-install adaptation to deploy the docker-compose stack: 5-10 days?
    • deployment on staging and then on production servers maintained by OFN: 7 days (2 for staging, 5 for production)
    • documentation for administrators: 1-2 days
      => Total: 23-34 days, but this is a rough estimate that must be detailed further and can then vary.
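
To make the “publication on Docker Hub with ofn CI” step more concrete, here is a minimal sketch of what a GitHub Actions workflow could look like. It is only an assumption for discussion: the Dockerfile, the Docker Hub repository name and the secret names are hypothetical, nothing here exists today.

```yaml
# Hypothetical workflow: build the OFN image on each release tag and push it
# to Docker Hub. Repository, Dockerfile and secret names are assumptions.
name: Publish Docker image

on:
  push:
    tags:
      - "v*"

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Log in to Docker Hub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

      - name: Build and push the release image
        uses: docker/build-push-action@v4
        with:
          context: .
          push: true
          tags: openfoodfoundation/openfoodnetwork:${{ github.ref_name }}
```

With something like this, a release tag such as v4.2.21.1 would also produce an image with the same version tag, which is what ofn-install would then deploy.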

What is your opinion about it ?
Do you think Docker is the best option or should we avoid it ?

Looking forward to reading your answers.


Thanks @heroinedor for bringing such a discussion to the OFN community :pray:

As you know, I’m also a bit worried about this instability (due to the fact that we build in different environments and don’t deliver a package/artifact), even if we don’t know exactly what risk is being taken; it is difficult to estimate.

I’m also concerned about the ofn-install code complexity and its maintainability, and you mention that solution 2 (i.e. the Docker way) could simplify the code base a lot. To me, that’s great news. Also, as you mentioned in the delivery circle, Docker tends to be the standard, and it’s much easier to find documentation, developers, …

That’s why I’m in favor of the Docker solution.


@heroinedor ,

the installation procedure for servers not maintained by OFN is quite heavy

totally agree with you

this way of deploying applications is no longer standard (it is the old way)

I also agree on this, especially with such frequent releases; compiling every time is not the best idea

In the current conditions, using Docker is probably a more suitable solution.

A very interesting discussion in delivery circle today.

We left without a decision on the above, but with a few options on the table.

  1. Firstly, deciding to make a full switch to Docker for all prod envs is a big decision to make without @maikel. We’d like to include Maikel in this decision as much of the responsibility now and in the future for maintaining the deployment scripts and processes is in his hands.

  2. In terms of @heroinedor’s next steps there are some options:
    a) We can go ahead with fixing the critical aspects of the Ansible build. This work might end up being time wasted if we decide to move to Docker, but it addresses our current critical problem without committing to months of work.
    b) We put energy into improving the dev Docker environment. This will make it easier to onboard devs and contributors, and would go some distance toward the bigger Docker task. However, it won’t address our current critical build issues.

I will put my opinions on this decision in a separate comment :slight_smile:

With regard to what @heroinedor should take on as the next steps…
If there is a general desire among the existing community that Docker is a strong move, I would suggest we move forward with looking at the Docker dev env. This will gain us a big win in either case and moves us in the direction of switching over to Docker without committing just yet.

If the community seems to support sticking with Ansible (which currently doesn’t seem strongly the case… but we’ve only had a couple of comments so far) then I would suggest diving in and fixing the critical Ansible packaging issue so that we remove this vulnerability from our build and deployment process asap.

I know that @heroinedor is keen to continue, so let’s aim to make a decision on next steps in 24hrs (meaning all timezones have a work day to think about it) :slight_smile:


Thanks heroinedor. I’ve tried to summarise this to understand it better, can you please confirm if I have this right?

Servers:
Currently, we manage production servers with Ansible, to provision (occasional system setup) and deploy (weekly app releases). Each server varies a bit with system specs and OS version (Ubuntu 16, 18 or 20). The system also includes packages that are built for the OS version (eg postgres 9.5, 10 or 12 respectively).
The Ansible configuration/scripts are version controlled in GitHub in ofn-install, and tested with GH Actions on Ubuntu 18.

Application:
Updates to our Rails app are controlled in GitHub in openfoodnetwork. They are first tested with GH Actions before a release, using a separate set of config commands, using Ubuntu 20 and Postgres 10 for example.

It’s been identified that we have a small risk of an undetected bug appearing in deployment to production.
From my understanding, this could occur due to a difference in the OS or system packages used to run the app.

Assuming this is correct, to avoid this risk we either:

  1. Update everything to use the same OS and package versions (eg upgrade everything to Ubuntu 20), and automatically test each application release with ofn-install.
    OR
  2. Create a single docker image (containing a single OS and package version), and use this as a base to deploy everywhere.
    Changes to the OS, packages, other setup, AND the app would all be packaged together as a complete version (eg “ofn-docker v1.1” includes Ubuntu 20, Postgres 10, OFN 4.2.22), tested and released.

In theory, options 1 and 2 have almost the same effect, but option 1 is harder to roll back and offers fewer guarantees (because there’s no full system image stored).

My questions are:

  1. What is the risk? It sounds unlikely.
  2. What is the impact? It sounds potentially very large, for example the system is unusable for all managed instances. Our capacity to handle this in a timely manner is quite low.
  3. What is the overall benefit then? Improved guarantee of availability.
  4. What is the cost? You have started to estimate this, thank you!

I also have a specific question: how does using Docker “reduce the complexity of ofn-install”?
I’m guessing we could move some of the Ansible scripts into Docker config (eg installing packages). Is that right?

And my opinion for the short term: I agree that it makes sense to improve the Docker setup for development, if it can make it easier for new developers to get started. This also seems like a good way for existing developers to start using it and get familiar with it.

In the long term, it’s a big change and will require the careful consideration that Maikel can bring.

Just a comment, regarding our ofn-install repo.

We should take into consideration that this repo uses an old version of Ansible (2.9, end of life May 2022). This doesn’t allow us (for example) to upgrade to the latest LTS version of nodejs (the one we use in the ofn app, see Upgrade nodejs to latest LTS, ie. 18.x · Issue #838 · openfoodfoundation/ofn-install · GitHub). For this particular case, I’m not 100% sure that this will not cause any trouble in the near future. We have other issues as well (because of a very old version of Python, see Unable to deploy with latest version of Jinja2 · Issue #842 · openfoodfoundation/ofn-install · GitHub).

To sum up: whatever path we choose, we should be aware that we must make some effort to upgrade and keep ofn-install up to date.


Everything you said @dcook is correct, and I would like to give you some answers:

Changes to the OS, packages, other setup, AND the app would all be packaged together as a complete version (eg “ofn-docker v1.1” includes Ubuntu 20, Postgres 10, OFN 4.2.22), tested and released.

Yes, it is the right approach.
But I would like to add a small nuance:
It will be exactly as you say if we decide to stay with a monolithic approach.
But maybe we can start being a little less monolithic, by trying to split the stack into several containers.
We could use, for example, a first docker-compose stack with:

  • App container for OFN
  • db container for postgres

As a second step, going further, I would also add to the docker-compose stack (a rough compose sketch follows this list):

  • nginx container for proxy/loadbalancing
  • nodejs container for the frontend (but I don’t know how tightly the backend and frontend are coupled)
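
Here is a rough sketch of what such a stack could look like. It is only an illustration under assumptions: the image name, tags, ports and variables are hypothetical and would be defined properly when we actually do the work.

```yaml
# Hypothetical production docker-compose stack.
# Image names, tags, ports and env vars are assumptions for illustration.
version: "3.8"

services:
  app:
    image: openfoodfoundation/openfoodnetwork:v4.2.22
    env_file: .env.production
    depends_on:
      - db
    expose:
      - "3000"

  db:
    image: postgres:10
    environment:
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - pgdata:/var/lib/postgresql/data

  nginx:
    image: nginx:stable
    volumes:
      - ./nginx.conf:/etc/nginx/conf.d/default.conf:ro
    ports:
      - "80:80"
      - "443:443"
    depends_on:
      - app

volumes:
  pgdata:
```
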
What is the impact? It sounds potentially very large, for example the system is unusable for all managed instances. Our capacity to handle this in a timely manner is quite low.

  • Large impact because:
    • it will change the way the application is deployed on every server. BUT it will still be managed by ofn-install, so we just have to find a way for ofn-install to temporarily handle both the ‘old native way’ and the ‘new Docker way’ of deploying the application.
    • server admins will have to improve their Docker knowledge
  • Today with Ansible or tomorrow with Ansible+Docker, a buggy version deployed will impact managed servers the same way: Docker will not magically bring them all and in the darkness bind them to the SegFault crash :ring: :smiley:. Each server keeps its independence.
What is the overall benefit then? Improved guarantee of availability.

  • simplified maintenance, because the infrastructure is identical on all managed servers (and on unmanaged servers too)
  • simplified deployment for managed and unmanaged servers.
  • simplified rollback when a deployment fails.

I also have a specific question: how does using Docker “reduce the complexity of ofn-install”?
I’m guessing we could move some of the Ansible scripts into Docker config (eg installing packages). Is that right?

=> Yes, that is right, and it will be more than just a few scripts.
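
To illustrate (assumptions only: the module choices, variable names such as ofn_image_tag and use_docker_deploy, and the paths are hypothetical, and these modules need a newer Ansible than the 2.9 we run today), a deploy play could shrink to something like this once the build lives in the image, with a flag letting the Docker path coexist with the current native deployment during the transition:

```yaml
# Hypothetical ofn-install deploy tasks once the app ships as a Docker image.
# Variable names and the use_docker_deploy flag are assumptions, not existing
# ofn-install variables; the modules require a recent Ansible + collections.
- name: Render the docker-compose file for this release
  ansible.builtin.template:
    src: docker-compose.yml.j2
    dest: "{{ ofn_compose_dir }}/docker-compose.yml"
  when: use_docker_deploy | default(false)

- name: Pull the released image from Docker Hub
  community.docker.docker_image:
    name: "openfoodfoundation/openfoodnetwork:{{ ofn_image_tag }}"
    source: pull
  when: use_docker_deploy | default(false)

- name: (Re)start the stack with the new image
  community.docker.docker_compose_v2:
    project_src: "{{ ofn_compose_dir }}"
    state: present
  when: use_docker_deploy | default(false)
```

Rolling back would then mostly be a matter of setting ofn_image_tag back to the previous version and re-running the play.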


Hey @heroinedor ,
Thanks for starting this discussion.

If we go ahead with docker to manage deployments in production, do you have any idea how it could affect our process to deploy in staging?

Currently, we’re still using an old version of Semaphore to do this, although we’ve moved the CI/build to run in Github Actions. See also this comment for additional context.

We keep the Semaphore process to enable non-devs/testers to be able to easily stage and test PRs, with a few clicks - this is part of our delivery cycle. However, eventually we’d like to unify this process and trigger PR deployments with GH-Actions as well, instead of relying on Semaphore for it (we’re not sure when it will be deactivated).

So, do you think the Docker approach would provide a straightforward way for testers/non-devs to stage PRs? If so, then we might not need to set this up in GH-Actions.

Grateful for your thoughts on this.

Good point, this sounds like a good approach (although we should consider whether any management overhead it adds is worth it). I can see this is already the case in the dev docker-compose stack: openfoodnetwork/docker-compose.yml at master · openfoodfoundation/openfoodnetwork · GitHub