Build Pipelines

maikel · March 19, 2016, 7:31am

Once a developer has some code ready, it should be tested, reviewed, merged and deployed to staging and production servers. The build pipeline is the composition of all these steps, a chain of actions and states. I would like to explain how the Australian setup works and open it up for discussion.

The short summary of events

push code to Github
Travis tests code
Buildkite is notified if the Travis tests passed

one click deploys to our staging server
another click merges into master and deploys to production

The more detailed version

Whenever we push code to the openfoodnetwork repository, Github sends out notifications to several services. Travis is notified and runs all the automated tests. Once Travis finishes testing, it tells Github if the code passed or failed. If it passed, Travis also triggers a deploy event.
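The deploy events here are presumably GitHub deployments, created through GitHub's REST API. Below is a minimal sketch of such a request. The helper name and token are placeholders, and the real Travis setup may create the event differently. The `auto_merge` field matters for the discussion further down:

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_deploy_request(owner: str, repo: str, ref: str,
                         token: str) -> urllib.request.Request:
    """Build (but do not send) a request that creates a GitHub deployment.

    Hypothetical helper. GitHub records the owner of `token` as the
    deployment's creator.
    """
    body = {
        "ref": ref,          # the branch or commit that passed the tests
        "auto_merge": True,  # GitHub merges the default branch into `ref` if it is behind
    }
    return urllib.request.Request(
        url=f"{GITHUB_API}/repos/{owner}/{repo}/deployments",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"token {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it would be a matter of `urllib.request.urlopen(req)`.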

Buildkite listens to deploy events. So whenever some code passes the tests in Travis, we trigger a build in Buildkite. Buildkite doesn’t really do anything just yet. The build just sits there and waits for someone to click “Stage it!”. It then deploys that code to our staging server.

If we are happy with the results on the staging server, we hit another button and the code is merged into the master branch and deployed to the Australian production server.
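The flow described so far can be summarised as a small state machine. This is only a sketch. The state and event names are made up for illustration and are not Travis or Buildkite terminology:

```python
from enum import Enum


class State(Enum):
    PUSHED = "pushed"                  # code pushed to Github
    FAILED = "failed"                  # Travis tests failed
    AWAITING_STAGE = "awaiting_stage"  # Buildkite build waiting for "Stage it!"
    STAGED = "staged"                  # deployed to the staging server
    IN_PRODUCTION = "in_production"    # merged into master and deployed


# (current state, event) -> next state; any other combination is rejected
TRANSITIONS = {
    (State.PUSHED, "tests_failed"): State.FAILED,
    (State.PUSHED, "tests_passed"): State.AWAITING_STAGE,  # deploy event creates the build
    (State.AWAITING_STAGE, "stage_clicked"): State.STAGED,
    (State.STAGED, "release_clicked"): State.IN_PRODUCTION,
}


def advance(state: State, event: str) -> State:
    """Return the next state, or raise if the event is not allowed here."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not allowed in state {state.value!r}") from None
```

The table makes it explicit that production can only be reached via staging.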

A slightly different path

If somebody opens a pull request on Github, Travis tests that code as well. Additionally, Codeclimate checks the code style. If the branch in question is within the main repository, it can go through Buildkite as described above. If it is in another repository, a developer has to import that branch into the main repository in order to trigger Buildkite and deploy it.
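GitHub's `pull_request` webhook payload shows whether a branch lives in the main repository or somewhere else. You compare the head and base repository names. Below is a minimal sketch. The function name is hypothetical:

```python
def is_from_main_repo(pull_request: dict) -> bool:
    """True if the PR branch lives in the same repository it targets.

    Uses `head.repo.full_name` and `base.repo.full_name` from the
    `pull_request` object of the webhook payload. `head.repo` can be null
    (e.g. when the fork was deleted); that counts as "not the main repo".
    """
    head_repo = pull_request["head"].get("repo")
    if head_repo is None:
        return False
    return head_repo["full_name"] == pull_request["base"]["repo"]["full_name"]
```

Only pull requests for which this returns True would go through Buildkite directly. The rest need the manual import.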

maikel · March 19, 2016, 7:57am

I wrote the description above because I would like to change the pipeline, and I'm interested in your thoughts.

What I don’t like about the current build pipeline

Every passed build ends up in Buildkite, but most of them are never staged. That means there is a lot of noise when looking at Buildkite builds. You have to search for the last build that actually made it to staging or even production. I find myself scrolling and hoping that I don't miss the last staged build.
The deploy events are triggered with my Github API key, which means that a lot of branches are associated with me even though I never worked on them. That makes the "my branches" overview on Github useless, because mine lists far too many branches and yours lists too few.
If Codeclimate tests fail, the Buildkite builds fail as well. Codeclimate only looks at pull requests, and we end up with a lot of failed builds in Buildkite too.
We can't stage code from other repositories. That was a conscious decision in the beginning, because the scripts executed on our Buildkite agent live within the repository. An attacker could modify these scripts, open a pull request and have possibly harmful code executed on our machine. But it means that we have to push each branch to the main repository and wait for the test results, even though the same code was already tested.
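The first complaint is really a search problem. Out of many passed builds, you want the most recent one that was actually staged. Below is a sketch over a hypothetical list of build records. The `number` and `staged` fields are assumptions for illustration, not Buildkite's API:

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Build:
    number: int
    branch: str
    staged: bool  # hypothetical flag: did someone click "Stage it!"?


def last_staged(builds: Iterable[Build]) -> Optional[Build]:
    """Return the highest-numbered staged build, or None if nothing was staged."""
    staged = [b for b in builds if b.staged]
    return max(staged, key=lambda b: b.number, default=None)
```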

What I would like to change

I would like Buildkite to listen to pull requests instead of deploy events. That means we would have to open a pull request for every branch that should be staged. At the moment, the Australian team doesn't need to open pull requests, because we can merge ourselves, but it would be good practice. We could then get rid of the deploy events. We would lose the automatic merge Github performs on deploy events, but we can merge in Buildkite instead.
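Here is roughly what listening to pull requests could look like. On each `pull_request` webhook, you decide whether to create a build for the PR's head commit. This is a sketch only. The `opened`, `reopened` and `synchronize` action names and the payload fields are GitHub's. The returned build description is hypothetical:

```python
from typing import Optional

# PR actions after which the PR needs a (new) build; "synchronize" means
# new commits were pushed to the branch.
BUILD_ACTIONS = {"opened", "reopened", "synchronize"}


def build_for_pull_request(event: dict) -> Optional[dict]:
    """Map a GitHub `pull_request` webhook payload to a build request, or None."""
    if event.get("action") not in BUILD_ACTIONS:
        return None                       # e.g. "closed", "labeled": nothing to build
    pr = event["pull_request"]
    return {
        "commit": pr["head"]["sha"],      # the exact commit to stage
        "branch": pr["head"]["ref"],
        "pull_request": pr["number"],
        "merge_into": pr["base"]["ref"],  # merge target, now merged in Buildkite
    }
```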

If we use a separate set of scripts stored on our Buildkite agent instead of in the repository, we no longer have that security vulnerability and can process third-party pull requests as well.
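The key point is where the agent gets its deploy scripts from. Below is a minimal sketch. It assumes a hypothetical directory `/opt/ofn-ci/scripts` on the agent that only admins can write to. The step name picks a script from that directory, so the checked-out pull request never decides which script runs:

```python
from pathlib import Path

# Hypothetical location on the agent, writable by admins only, not by PR authors.
TRUSTED_SCRIPTS = Path("/opt/ofn-ci/scripts")
ALLOWED_STEPS = {"stage", "merge_and_deploy"}  # hypothetical step names


def script_for(step: str, scripts_dir: Path = TRUSTED_SCRIPTS) -> Path:
    """Resolve the script for a pipeline step from the trusted directory.

    The repository checkout is never consulted, so editing scripts in a
    pull request cannot change what the agent runs.
    """
    if step not in ALLOWED_STEPS:
        raise ValueError(f"unknown step: {step!r}")
    return scripts_dir / f"{step}.sh"
```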

Further thoughts for the future

Instead of Buildkite, we could have some other interface to deploy to staging and production. It could be a very lean server that just displays the currently open pull requests. You could click a stage button to create a staging deploy event on Github. Our staging server could actually listen to that and deploy the commit itself. And if a production deploy is triggered, both the staging and production servers listen to that: the staging server saves its baseline data and the production server does the deploy. I wonder if there is a tool for this already…
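In this idea, each server reacts to deployment events itself. That could be sketched as a function every server runs on GitHub's `deployment` webhook. The `deployment.environment` and `deployment.sha` fields come from GitHub's payload. The server roles and action names are hypothetical:

```python
from typing import List, Tuple


def actions_for(server_role: str, event: dict) -> List[Tuple[str, str]]:
    """Decide what this server should do for a GitHub `deployment` event.

    `server_role` is "staging" or "production" (hypothetical configuration).
    """
    deployment = event["deployment"]
    env, sha = deployment["environment"], deployment["sha"]
    if env == "staging" and server_role == "staging":
        return [("deploy", sha)]
    if env == "production":
        if server_role == "staging":
            return [("save_baseline_data", sha)]  # staging keeps its baseline
        if server_role == "production":
            return [("deploy", sha)]
    return []  # event not addressed to this server
```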

pmackay · March 22, 2016, 10:57am

@maikel This is very useful, thanks! A few things:

Should CodeClimate be run on all commits, not just PRs? I suppose just using PRs would make this irrelevant anyway…

Is there a greater level of automation that could be aimed for? Why have deploying to staging as manual? Is that because there is only one staging server? How feasible might it be to build staging environments for each branch (perhaps only for significant branches/PRs; they vary)? Nick often asks for a branch to be pushed to UK staging, but it's another manual step.

Could the deploy scripts be moved into the rest of the deploy tools? This would hopefully improve consistency, as well as put more focus on them being the primary tool for workflow, etc. Also I'd really like to rethink ofn_deployment and build something more flexible and effective for a growing number of global servers, which I hope to write about soon.

I also wonder what would it take to increase the frequency of releases? Is this desirable or possible? I can already see that once per month presents some challenges, but I totally appreciate it takes work and time is in short supply.

maikel · March 23, 2016, 7:24am

Why have deploying to staging as manual?
How feasible might it be to build staging environments for each branch

Good question. I never thought of that. We would need multiple apps to run on the same server. They could listen to the domain name commit-id.staging.example.org. They would need a database each. And we need to configure AWS S3 to work with this. If they get installed automatically, they have to be deleted after some time as well.
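Here is a more concrete version of the per-branch idea. Each environment needs a hostname, a database and an expiry date. The sketch below makes several assumptions. `staging.example.org` is the placeholder domain from above. The commit id is shortened to 7 characters. The `ofn_staging_` database prefix and the 7-day lifetime are hypothetical. S3 configuration is left out:

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta

STAGING_DOMAIN = "staging.example.org"  # placeholder domain from the post
LIFETIME = timedelta(days=7)            # hypothetical lifetime before deletion


@dataclass
class StagingEnv:
    commit: str
    created: datetime

    @property
    def short_id(self) -> str:
        # Only accept lowercase hex, which is safe in DNS labels and DB names
        if not re.fullmatch(r"[0-9a-f]{7,40}", self.commit):
            raise ValueError(f"not a commit id: {self.commit!r}")
        return self.commit[:7]

    @property
    def hostname(self) -> str:
        return f"{self.short_id}.{STAGING_DOMAIN}"

    @property
    def database(self) -> str:
        return f"ofn_staging_{self.short_id}"  # one database per environment

    def expired(self, now: datetime) -> bool:
        """True once the environment is older than LIFETIME and can be deleted."""
        return now - self.created > LIFETIME
```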
We would not be able to stage third-party pull requests though. That could be exploited quite easily. And that means there would still be a manual step of verifying or importing code of a pull request.

Could the deploy scripts be moved into the rest of the deploy tools?

The current deploy scripts still support that, I guess. It was just more convenient to use simple post-receive hooks instead of Ansible. We were thinking more in the other direction. What if you could click Upgrade in the admin panel? Maybe you could have a tag selection for advanced users. Since our servers already know how to upgrade themselves via their post-receive hook, we would just need a UI component for that. Nick could do the same on the staging server.

I also wonder what would it take to increase the frequency of releases?

The Australian server always runs with the master branch. Pushes to the master branch are the tiniest steps of releases. What is in between? Weekly or fortnightly? There are some weeks in which we don’t push anything. In others we push a lot. We could tag each version we deploy. But that would be a lot of tags…
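Tagging each deploy would not have to be laborious if the tag name is generated automatically. Below is a sketch with a hypothetical `deploy-YYYY-MM-DD.N` naming scheme, where N counts deploys on the same day:

```python
from datetime import date
from typing import Iterable


def next_deploy_tag(existing_tags: Iterable[str], day: date) -> str:
    """Return the next free tag for a deploy on `day`, e.g. deploy-2016-03-23.2."""
    prefix = f"deploy-{day.isoformat()}."
    used = [
        int(tag[len(prefix):])
        for tag in existing_tags
        if tag.startswith(prefix) and tag[len(prefix):].isdigit()
    ]
    return f"{prefix}{max(used, default=0) + 1}"
```

A common prefix keeps the many tags easy to filter, e.g. with `git tag -l 'deploy-*'`.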

pmackay · March 25, 2016, 9:10am

We would not be able to stage third-party pull requests though.

How many of those are there currently compared with PRs inside the project (because many of the core committers have access)? Would it be an acceptable burden to create a shadow branch inside the project for the external branch, which is what Rohan seemed to be doing last time I was pushing a PR?

The current deploy scripts still support that, I guess. It was just more convenient to use simple post-receive hooks instead of Ansible.

Agree with your other comment that it would be good if the Ansible scripts used the post-receive hooks rather than duplicating code.

What if you could click Upgrade in the admin panel?

Not sure about this. What if there is a problem? Would the non-dev know how to roll back or deal with issues? It might be useful on staging perhaps, but it's building tool functionality into the core app. Are there any security implications?

The Australian server always runs with the master branch.

The production server? So this is another difference between Aus and the rest of the world. Generally we are taking the tagged releases, as they are likely to be better tested, etc. Not necessarily a problem, just a difference worth being aware of!

maikel · March 26, 2016, 7:26am

We would not be able to stage third-party pull requests though.

How many of those are there currently compared with PRs inside the project (because many of the core committers have access)?

We don’t tend to open pull requests within the core team at the moment, but we can do that.

But my point was that we need to review a third-party pull request before that code is executed on our staging server. We can’t just run foreign code on a publicly available server. There must be a manual action to stage these. Creating a shadow branch is fine if we need to. We do that all the time.
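You can create a shadow branch from the main repository without adding the contributor's fork as a remote. GitHub exposes each pull request's head as `pull/<number>/head` on the base repository. The sketch below only builds the git commands. It refuses unless someone has confirmed the review. The `reviewed` flag and the `pr-<number>` branch naming are hypothetical:

```python
from typing import List


def shadow_branch_commands(pr_number: int, reviewed: bool,
                           remote: str = "origin") -> List[List[str]]:
    """Git commands that copy a third-party PR into a branch of the main repo.

    Foreign code must be reviewed before it is staged, so the caller has
    to confirm the review explicitly.
    """
    if not reviewed:
        raise PermissionError("review the pull request before staging it")
    branch = f"pr-{pr_number}"  # hypothetical naming for shadow branches
    return [
        ["git", "fetch", remote, f"pull/{pr_number}/head:{branch}"],
        ["git", "push", remote, branch],
    ]
```

Each command could then be run with `subprocess.run(cmd, check=True)`.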

What if you could click Upgrade in the admin panel?

What if there is a problem?

The in-app upgrade button should be only one of the ways you can upgrade. I think it should just be a user interface, a trigger for a shell script or rake task. A developer should be able to log in and do the same thing, independent of the app.
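Here is a sketch of the "button as a thin trigger" idea. The app only invokes the same command a developer would run by hand, optionally passing the selected tag. The script path is hypothetical. The runner can be swapped out, so the sketch can be exercised without touching a server:

```python
import subprocess
from typing import Callable, Optional

# Hypothetical path: the same command a developer would run after logging in.
UPGRADE_SCRIPT = "/home/openfoodnetwork/bin/upgrade"


def trigger_upgrade(tag: Optional[str] = None,
                    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
                    script: str = UPGRADE_SCRIPT) -> subprocess.CompletedProcess:
    """Run the upgrade script, optionally for a tag chosen by an advanced user."""
    argv = [script] if tag is None else [script, tag]
    # A list without shell=True passes the tag as a single argument,
    # so it cannot smuggle in extra shell commands.
    return runner(argv, capture_output=True, text=True, check=False)
```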

The Australian server always runs with the master branch.

The production server?

Yes, it's in our build pipeline. Merging to master and deploying to production happen in the same step. That's why we are so protective of the master branch and merging. One day, the master branch might be managed by a more global community and the Australian server will have its own repository. But in the past, the code was mostly driven by Australian feature requests, and people wanted things as soon as possible. And we do make sure that the master branch is well tested.