OFN deployment with Docker

Hi everyone,

This is my first post on this forum, so please tell me if I am not in the right category or if I am doing something wrong.

TL;DR: the instability of ofn-install runs comes from the web application code being rebuilt separately in ofn and ofn-install. The best way to solve this, imho, is to use Docker to build, package and deploy OFN.

The problem:
My last PR (Unlock the ofn-install installation process by heroinedor · Pull Request #10002 · openfoodfoundation/openfoodnetwork · GitHub) made me realize that we have a stability issue with ofn-install runs: each time a new version of the Openfoodnetwork application is released, the release is materialized only by a tag on the code (and the corresponding zip archive); no package or artifact is built.
Ex: Release v4.2.21.1 Revert Address Caching · openfoodfoundation/openfoodnetwork · GitHub
So when ofn-install runs on such a release, it always has to download the code and rebuild it, instead of downloading a package/artifact and simply installing it.

This leads to the following problems with ofn-install:

  • no separation of concerns: problems in the openfoodnetwork application build are detected during the ofn-install CI process
  • the openfoodnetwork code is rebuilt in a slightly different environment from the one used by openfoodnetwork, which introduces a small risk of undetected bugs
  • the ofn-install code is quite complex, partly because it has to rebuild the ofn application
  • the installation procedure for servers not maintained by OFN is quite heavy.

Proposal:
I see 2 very different ways to solve these problems:

  1. Basic way:

We review and compare the ofn and ofn-install build commands and ensure that the 2 environments are strictly identical (which can be tricky). Then we automate the launch of the ofn-install CI each time an openfoodnetwork release is made, and fix it manually, in step with ofn, whenever a problem appears.

  • Pros:
    • very cheap to develop
    • the required code changes will probably be minor
  • Cons:
    • still no separation of concerns:
      a code problem in the ofn app will still be able to block ofn-install
    • ofn-install remains complex to understand and maintain
    • this way of deploying applications is no longer the standard (it is the old way)
  • Estimate: 4-5 days of work
  2. Docker way:
    We change the way the ofn application is built and deployed:
  • In openfoodnetwork: use Docker to build the server image (which we can consider a “package”)
  • store this image on Docker Hub
  • In ofn-install: on production servers, deploy this image (plus docker-compose scripts) instead of a native installation
  • Pros:
    • separation of concerns between the build and deploy processes
    • reduces the complexity of ofn-install: easier to understand and maintain
    • reduces the complexity of the installation procedure for servers not maintained by OFN: this may simplify the adoption of OFN by admin users
    • this is now the standard way to deploy applications
  • Cons:
    • costly in terms of development
    • the installation documentation must be updated
    • changes the way some developers install their local environment
    • introduces another technology to be known and understood by developers and admins
  • Estimates:
    This work is too big to get a precise and reliable estimate. It has to be split into several tasks:
    • dockerisation of openfoodnetwork: 5-10 days?
    • publication on Docker Hub from the ofn CI: 5 days (see the workflow sketch after this list)
    • adaptation of ofn-install to deploy a docker-compose stack: 5-10 days?
    • deployment on staging and then on the production servers maintained by OFN: 7 days (2 for staging, 5 for production)
    • documentation for administrators: 1-2 days
      => Total: 23-34 days, but this is a rough estimate that must be refined and may then vary.
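
To make the “publication on Docker Hub” step above more concrete, here is a minimal sketch of a GitHub Actions workflow that builds and pushes the server image whenever a release tag is created. The image name, the secret names and the existence of a Dockerfile at the repo root are all assumptions to be adapted:

```yaml
# .github/workflows/docker-release.yml (a sketch, not a final design)
name: Build and publish Docker image

on:
  push:
    tags:
      - "v*" # e.g. v4.2.21.1

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      # Check out the tagged release
      - uses: actions/checkout@v3

      # Authenticate against Docker Hub (secret names are assumptions)
      - name: Log in to Docker Hub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

      # Build the server image and push it, tagged with the release tag
      - name: Build and push
        uses: docker/build-push-action@v4
        with:
          context: .
          push: true
          tags: openfoodnetwork/openfoodnetwork:${{ github.ref_name }}
```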

What is your opinion about this?
Do you think Docker is the best option, or should we avoid it?

Looking forward to reading your answers.


Thanks @heroinedor for bringing such a discussion to the OFN community :pray:

As you know, I’m also a bit worried about this instability (due to the fact that we build in different environments and don’t deliver a package/artifact), even though we don’t know exactly what risk is being taken; it is difficult to estimate.

I’m also concerned about the complexity of the ofn-install code and its maintainability, and you mention that solution 2 (i.e. the Docker way) could simplify the code base a lot. To me, that’s great news. Also, as you mentioned in the delivery circle, Docker tends to be the standard, and it’s much easier to find documentation, developers, …

That’s why I’m in favor of the Docker solution.


@heroinedor,

the installation procedure for server not maintained by ofn is quite heavy

totally agree with you

this way to deploy application is not anymore a standard (it is the old way)

I also agree on this: with such frequent releases of updates, compiling every time is not the best idea.

In the current conditions, using Docker is probably the more suitable solution.

A very interesting discussion in delivery circle today.

We left without a decision on the above, but with a few options on the table.

  1. Firstly, deciding to make a full switch to Docker for all prod envs is a big decision to make without @maikel. We’d like to include Maikel in this decision, as much of the responsibility, now and in the future, for maintaining the deployment scripts and processes is in his hands.

  2. In terms of @heroinedor’s next steps there are some options:
    a) We can go ahead with fixing the critical aspects of the Ansible build. This work might end up being time wasted if we decide to move to Docker, but it addresses our current critical problem without committing to months of work.
    b) We put energy into improving the dev Docker environment. This will make it easier to onboard devs and contributors, and would go some distance toward the bigger Docker task. However, it won’t address our current critical build issues.

I will put my opinions on this decision in a separate comment :slight_smile:

With regard to what @heroinedor should take on as the next steps…
If there is a general desire among the existing community that Docker is a strong move, I would suggest we move forward with looking at the Docker dev env. This will gain us a big win in either case and moves us in the direction of switching over to Docker without committing just yet.

If the community seems to support sticking with Ansible (which currently doesn’t seem strongly the case… but we’ve only had a couple of comments so far), then I would suggest diving in and fixing the critical Ansible packaging issue so that we remove this vulnerability from our build and deployment process asap.

I know that @heroinedor is keen to continue, so let’s aim to make a decision on next steps in 24hrs (meaning all timezones have a work day to think about it) :slight_smile:


Thanks heroinedor. I’ve tried to summarise this to understand it better; can you please confirm if I have this right?

Servers:
Currently, we manage production servers with Ansible, to provision (occasional system setup) and deploy (weekly app releases). Each server varies a bit with system specs and OS version (Ubuntu 16, 18 or 20). The system also includes packages that are built for the OS version (eg postgres 9.5, 10 or 12 respectively).
The Ansible configuration/scripts are version controlled in GitHub in ofn-install, and tested with GH Actions on Ubuntu 18.

Application:
Updates to our Rails app are controlled in GitHub in openfoodnetwork. They are first tested with GH Actions before a release, using a separate set of config commands (for example, Ubuntu 20 and Postgres 10).

It’s been identified that we have a small risk of an undetected bug appearing in deployment to production.
From my understanding, this could occur due to a difference in the OS or system packages used to run the app.

Assuming this is correct, to avoid this risk we either:

  1. Update everything to use the same OS and package versions (eg upgrade everything to Ubuntu 20), and automatically test each application release with ofn-install.
    OR
  2. Create a single docker image (containing a single OS and package version), and use this as a base to deploy everywhere.
    Changes to the OS, packages, other setup, AND the app would all be packaged together as a complete version (eg “ofn-docker v1.1” includes Ubuntu 20, Postgres 10, OFN 4.2.22), tested and released.

In theory, options 1 and 2 have almost the same effect, but option 1 is harder to roll back and offers weaker guarantees (because no full system image is stored).

My questions are:

  1. What is the risk? It sounds unlikely.
  2. What is the impact? It sounds potentially very large; for example, the system could be unusable for all managed instances. Our capacity to handle this in a timely manner is quite low.
  3. What is the overall benefit then? Improved guarantee of availability.
  4. What is the cost? You have started to estimate this, thank you!

I also have a specific question: how does using Docker “reduce the complexity of ofn-install”?
I’m guessing we could move some of the Ansible scripts into Docker config (eg installing packages). Is that right?

And my opinion for the short term: I agree that it makes sense to improve the Docker setup for development, if it can make it easier for new developers to get started. It also seems like a good way for existing developers to begin using Docker and get familiar with it.

In the long term, it’s a big change and will require the careful consideration that Maikel can bring.

Just a comment, regarding our ofn-install repo.

We should take into consideration that this repo uses an old version of Ansible (2.9, end of life May 2022). This doesn’t allow us (for example) to upgrade to the latest LTS version of nodejs (the one we use in the ofn app, Upgrade nodejs to latest LTS, ie. 18.x · Issue #838 · openfoodfoundation/ofn-install · GitHub). For this particular case, I’m not 100% sure that this won’t cause any trouble in the near future. We have other issues as well, because of a very old version of Python (Unable to deploy with latest version of Jinja2 · Issue #842 · openfoodfoundation/ofn-install · GitHub).

To sum up: whatever path we choose, we should be aware that we must make some effort to upgrade and keep the ofn-install application up to date.


Everything you said @dcook is correct, and I would like to give you some answers:

Changes to the OS, packages, other setup, AND the app would all be packaged together as a complete version (eg “ofn-docker v1.1” includes Ubuntu 20, Postgres 10, OFN 4.2.22), tested and released.

Yes, it is the right approach.
But I would like to add a little nuance:
it will be exactly as you say if we decide to stay with a monolithic approach.
But maybe we can start being a little less monolithic and try to split the stack into several containers.
We could use, for example, a first docker-compose stack with:

  • App container for OFN
  • db container for postgres

Then, going further, I would also add to the docker-compose stack (both steps are sketched below):

  • nginx container for proxying/load balancing
  • nodejs container for the frontend (but I don’t know how tightly the backend and frontend are coupled)
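
As an illustration, here is a minimal docker-compose sketch of that first stack (app + db); image names, tags, ports and credentials are placeholders, not decisions:

```yaml
# docker-compose.yml (a sketch of the first stack: app + db)
version: "3.8"

services:
  # The OFN server image built by the ofn CI: the "package"
  app:
    image: openfoodnetwork/openfoodnetwork:v4.2.22 # placeholder image name/tag
    environment:
      DATABASE_URL: postgres://ofn:secret@db:5432/openfoodnetwork
    ports:
      - "3000:3000"
    depends_on:
      - db

  # Postgres pinned to a single version for every server
  db:
    image: postgres:10
    environment:
      POSTGRES_USER: ofn
      POSTGRES_PASSWORD: secret # placeholder credentials
      POSTGRES_DB: openfoodnetwork
    volumes:
      - pgdata:/var/lib/postgresql/data # data survives container replacement

volumes:
  pgdata:
```

The nginx and nodejs containers mentioned above would then be added as extra services in the same file.
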
What is the impact? It sounds potentially very large, for example the system is unusable for all managed instances. Our capacity to handle this in a timely manner is quite low.

  • Large impact because:
    • it will change the way the application is deployed on every server. BUT it will still be managed by ofn-install, so we just have to find a way for ofn-install to temporarily manage both the ‘old native way’ and the ‘new docker way’ of deploying the application.
    • server admins will have to improve their Docker knowledge
  • Today with Ansible or tomorrow with Ansible+Docker, a buggy deployed version will impact the managed servers in the same way: Docker will not magically bring them all and in the darkness bind them to the SegFault crash :ring: :smiley: . Each server keeps its independence.
What is the overall benefit then? Improved guarantee of availability.

  • simplified maintenance, because the infrastructure is identical on all managed servers (and on non-managed servers too)
  • simplified deployment for managed and non-managed servers
  • simplified rollback when a deployment fails

I also have a specific question: how does using Docker “reduce the complexity of ofn-install”?
I’m guessing we could move some of the Ansible scripts into Docker config (eg installing packages). Is that right?

=> Yes, that is right, and it will be more than just a few scripts (see the sketch below).
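
To illustrate with a hypothetical sketch (not actual ofn-install code): the many Ansible tasks that today install Ruby, Node, system packages and so on could collapse into a short play that only ships a compose file and starts the pre-built stack:

```yaml
# Hypothetical ofn-install playbook sketch: the package-installation tasks
# are replaced by copying a compose file and starting the pre-built images.
- name: Deploy the OFN docker-compose stack
  hosts: production
  become: true
  tasks:
    - name: Copy the compose file to the server
      ansible.builtin.copy:
        src: docker-compose.yml
        dest: /opt/ofn/docker-compose.yml

    - name: Pull the released images
      ansible.builtin.command:
        cmd: docker compose pull
        chdir: /opt/ofn

    - name: (Re)start the stack
      ansible.builtin.command:
        cmd: docker compose up -d
        chdir: /opt/ofn
```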


Hey @heroinedor,
Thanks for starting this discussion.

If we go ahead with docker to manage deployments in production, do you have any idea how it could affect our process to deploy in staging?

Currently, we’re still using an old version of Semaphore to do this, although we’ve moved the CI/build to run in GitHub Actions. See also this comment for additional context.

We keep the Semaphore process to enable non-devs/testers to easily stage and test PRs, with a few clicks - this is part of our delivery cycle. However, eventually we’d like to unify this process and trigger PR deployments with GH Actions as well, instead of relying on Semaphore for it (we’re not sure when it will be deactivated).

So, do you think the Docker approach would provide a straightforward way for testers/non-devs to stage PRs? If so, then we might not need to set this up in GH Actions.

Grateful for your thoughts on this.

Good point, this sounds like a good approach (although we should consider whether the management overhead it adds is worth it). I can see this is already the case in the dev docker-compose stack: openfoodnetwork/docker-compose.yml at master · openfoodfoundation/openfoodnetwork · GitHub

Just a quick reply: Docker may be a good move but doesn’t magically solve everything. There are also lots of other ways to simplify.

I need to think about this more and discuss it with you. My preference for not using Docker is mainly about simplicity (no Docker middle man) and performance. Docker has its own strengths: especially now that we have many instances and the host servers vary, Docker can be a common interface to standardise our environment.

Our Ansible scripts in ofn-install set up the server environment. All that would still be needed with Docker but instead of being able to set up several versions of Ubuntu, we can simplify and script only for one version in the Docker image. In theory, Ansible is supposed to be that abstraction layer that can install a database no matter which version you use but in practice Ansible doesn’t live up to its promise and different application versions have incompatibilities beyond the setup by Ansible.

We used to support only one or two Ubuntu versions. Within that Ubuntu system, we would use only the default database and nginx packages, which meant that we had only two versions in total to support. But then we started installing custom Postgresql and custom Nginx and custom Certbot, and now the number of combinations to support has exploded. We need to reduce that variety.

Docker can solve that because we would have only one version of it all and the host system doesn’t matter much. I have three remaining concerns though:

  1. Performance. We are struggling with resources and I guess that Docker will add to the problem.
  2. Host systems still need to be updated. And Docker versions change, too. This is additional maintenance.
  3. The added Docker layer can make debugging harder. Maybe it’s just my inexperience with Docker but I always found it harder when something went wrong with Discourse, for example.

And then there’s a lot of undocumented functionality in ofn-install, some of which may still be useful and some can be removed. There’s lots we can simplify and we would need to do that anyway if we wanted to rewrite it for Docker.

This is a very large project.

Hi @maikel, thank you for the questions.
I totally agree with everything you wrote.
To answer your questions:

  1. Performance:
    Since Docker is just a way to isolate processes, there are (almost) no additional resources needed to run the application on Docker. So no worries about resource consumption.
  2. Docker versions:
    Yes, we will have to update Docker versions. But if we use containerized versions for the whole stack (OFN app + database + redis + nginx + …), the result will be a single stack deployed on all servers, which means drastically less maintenance. In that case the overhead of Docker maintenance will be negligible.
  3. Debugging layer:
    Yes, you are right: adding the Docker layer will make debugging the application a little more tricky, but I think it’s no big deal. Just a quick example: Debugging Rails App With Docker Compose | by Ravic Poon | GOGOX Technology | Medium. I have to dig a little deeper into this question since I am not a Rails developer, but I am quite confident that it is not so hard to find a solution (a sketch follows below).
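
For example, one common technique with Compose, sketched here under the assumption that the Rails service is named `web`, is to keep STDIN open and allocate a TTY so that one can attach to the container and interact with a byebug/pry breakpoint:

```yaml
# docker-compose.override.yml (a sketch; the service name "web" is an assumption)
services:
  web:
    stdin_open: true # keep STDIN open so the debugger can read input
    tty: true        # allocate a pseudo-TTY for the interactive session
```

Running `docker attach $(docker compose ps -q web)` then gives an interactive session in which a breakpoint stops execution as usual.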

Okay, I’m happy to give it a go on my dev environment and see about that.

I should have been more precise:

  • on development machines running Windows or macOS, a Linux VM must be launched (with Hyper-V or WSL) to work with Docker, so more resources are needed.
  • on our servers and Linux machines, there are (almost) no additional resources needed to run the application on Docker.
    So there won’t be any additional resource problems on our servers.

Docker Meeting notes

Setting the scope of the meeting

  • production → main goal of the discussion
  • development
  • CI

What’s the problem we’re trying to solve? There are two:

When we deliver code:

  • we build the code on the CI
  • we rebuild it on the ofn-install CI
  1. This may introduce discrepancies between servers, as settings may differ.

The solution:
Have a Docker image (thus preventing different Postgres versions, for example)

  2. Environment control / maintainability:
    We build the code on one OS and then, when we deploy to production, we can deploy without changes; this is easier to maintain, because we reduce the combinations and potential differences.

Discussion

  • Should we use specialized Docker hosting?

Probably not the cheapest option. Standard Docker is perhaps a bit more challenging for maintaining the whole system, but probably cheaper. We just want to deploy; we should not need orchestration.

  • Our goal is to deploy a full docker-compose stack (with redis, postgres, puma, etc.). However, we’ll still need to address some differences between servers, like environment variables such as secrets. How is this best handled?

Server-specific assets can in principle be handled by two different Docker images.
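
One common alternative pattern, sketched here as an assumption rather than a decision: keep a single generic image per release and inject the per-server differences at deploy time through an env file that ofn-install renders on each host:

```yaml
# Sketch: one generic image per release; per-server settings injected at
# runtime. /opt/ofn/.env would be rendered by ofn-install on each server
# and hold the secrets (SECRET_KEY_BASE, SMTP credentials, ...).
services:
  app:
    image: openfoodnetwork/openfoodnetwork:v4.2.22 # placeholder name/tag
    env_file:
      - /opt/ofn/.env
```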

Round of comments

There is general consensus from the team that this is a good move. Some side remarks:

  • Docker for the dev setup will make it easier to onboard new developers.

  • Having Docker on the different environments (dev, CI, staging, production). There are some specific settings which should/could affect how tests are run. Reference on dockerizing system tests here; recent change to consider Docker on the CI here.

  • Concerns around skills and maintenance. There are few people on the team who are knowledgeable about Docker, so maintaining/debugging any issues could be a challenge in the future.

    • Some Ansible code will no longer be required, as it will move into the container. The prospect is therefore easier maintainability. Everything related to installation will be taken out → this is to be replaced by the installation of the binaries on the server.
  • Puma (with its several processes) should be within one container.

Workload split

The current workload split was shared on Slack, but is to be updated [link here: Slack]

Proposal: first dockerize PostgreSQL, to have the same version across servers.
Counterproposal: instead, dockerize lower-risk services first → it was suggested to start off with Redis and then go from there (a minimal sketch follows below).
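
A minimal sketch of that lower-risk first step (assuming the app reads its Redis location from configuration, e.g. a `REDIS_URL` setting): only Redis moves into a container, while everything else stays natively installed and connects through the published local port:

```yaml
# Sketch for the counterproposal: containerize Redis only; the app stays
# natively installed and connects via the published local port.
services:
  redis:
    image: redis:6 # placeholder version
    ports:
      - "127.0.0.1:6379:6379" # expose only to the host, like a native install
    volumes:
      - redisdata:/data # persistence across container restarts

volumes:
  redisdata:
```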

Worth underlining that these steps would go through a preliminary test phase on the staging servers.