Delivery process 2.0: feature toggles

Hello folks, @Jana and I had a call yesterday to talk about feature toggles and thought it was worth sharing the ideas we have and want to discuss in the next delivery meeting, as part of the review of the process, so we are all on the same page. It looks like this won’t fit into said meeting, so let’s go through it async.

Upside of feature development under a feature toggle: more agile development. We get feedback as early as possible (product team, core team, user testing, % of user base, etc.) and integrate it into the development process, lowering the cost of releasing the feature and hopefully getting it into users’ hands faster.

This impacts the current delivery process, and so we suggest the following (there’s a small code sketch after the list):

  1. No manual testing while the feature is still in development, at least not in the early stages. There’s no point in manually testing something that might change along the way. The automated tests should cover it, and the feature toggle prevents it from being released to the masses. Remember, tests are mandatory.
  2. We would manually test the feature at the very end of the process, when the team thinks the feature is stable and ready for prime time.
  3. Product validates (together with the tech lead?) iteratively but asynchronously, so misunderstandings or invalid hypotheses are caught early on.
  4. Any issues that come up are not labeled as bugs but are simply new issues appended to the epic.

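For anyone who hasn’t bumped into one in the codebase yet, a toggle guard is just a conditional around the new code path. A minimal sketch in Ruby - the feature? helper is the one we already use for the unit prices work, while :my_feature and the two render methods are made up for illustration:

if feature? :my_feature, spree_current_user
  render_new_flow     # hypothetical new code path, still in development
else
  render_current_flow # hypothetical existing behaviour, unchanged
end

While :my_feature is off for a user, the else branch is all they ever see, which is what lets us merge work in progress to master safely.
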
Thoughts?

2 Likes

I have to say I’m pretty wary of this being integrated into the delivery process, if we’re saying it’s the default way features are delivered. It has the potential to make debugging very difficult if there are different states the code could be in depending on whether a user has a feature toggled on or off.

It also raises the question of how each instance decides when they’re ready for a feature, and who’s managing the process of toggling various users or groups on or off for each instance, communicating with the instance manager about it, etc.

To take a concrete example, the unit prices front-end work is now behind a feature toggle - but only in development and staging. I agree that there is real benefit to getting feedback early in the process from design, but couldn’t that have been achieved simply by deploying the PR to a staging server? I don’t think that things like:

# Placeholder while the real calculation doesn't exist yet:
# returns a random price rounded to two decimals.
def unit_price_price
  (rand * 10).round(2)
end

should be sitting in production on a regular basis.

In summary, I think it’s a powerful tool to be used wisely but sparingly :smiley:

Thank you @sauloperez for the post :heart:

I think I share @apb’s concerns.

In the proposal, if I understand correctly, manual testing (ie testing done by testers?) would happen at the epic level only. Therefore I suspect we might end up with the same problems we had 3 years ago when testing huge code changes (we moved to smaller PRs to help not only code review, but also testing and delivery).

But maybe I misunderstood the proposal? Can you clarify when testers have a role in this process?

To feed a counter-proposal, I’m sharing here some thoughts @filipefurtado and I had on how to introduce product and/or design feedback into our processes (we initially planned to chat about it with Eriol - and possibly Jana - at the next testing catchup on March 9th).
The main conclusion of our earlier chat was that there are 3 moments in the delivery process where testing / feedback can occur:

  • Before the development phase (something mentioned before by @Erioldoesdesign - to be confirmed)

  • After the development phase and before merging - what we are currently doing.

  • In production - possibly making use of feature toggle

What’s interesting about presenting it this way is that we can see the feature toggle is really only useful for testing/hiding in production. But here in this proposal, I understand we’d use its potential to change processes before production. Yet I don’t understand the link… why would we do that? To solve what type of problem?

Thank you for raising this well-needed thread!

One aspect we discussed (w/ @Rachel) when considering the 3 different moments in the current process is that it is beneficial to keep the time between committing and merging as short as possible (the 2nd moment).

As I understand the proposal you describe, Pau (I’ll try to add the actors and types of testing below):

  • we would rely on code review and our build between committing a PR and merging it → these are automated tests; actors: CI-tool/devs.

  • after merge, the toggled PR(s) would then be tested in production → these tests comprise visual and usability testing, among others; actors: product and design teams. External/volunteer users can be involved at this stage as well to provide early insights.

The resulting feedback will drive new issues/requirements, which go back to the dev team. Within this feedback loop, the time between committing and merging is minimal and no testing from the testing team is involved - so it looks to me as if this might have a positive impact here.

  • After the feedback loop is concluded and after a green light from product/design: the complete feature (epic) is tested in production → functional and exploratory testing; actors: the testing team.

Would this sum it up correctly @sauloperez @Jana ? I guess none of this is meant to be a static process; teams can step into the different moments when needed.

My thoughts:

I think this new process improves the feedback loop between devs/product/design: it speeds it up and separates the different types of feedback coming from the different teams/actors.

I can imagine it might be very challenging to provide and interpret feedback when wearing all hats at once, in one single moment of the pipe - maybe this was an issue in the past?
As the team grows, we can afford to provide more detailed feedback, from different actors and at designated moments of the development process.

Additional advantages of the ability to selectively toggle features seem to me to be:

  • the ability to involve users at an early stage of a feature’s development
  • slowly shifting the testing paradigm: from staging to production

Looks like we are in a good position to give it a go on Unit Prices, as a first trial? Would be great to hear from @jibees on this as well.

1 Like

How does this work for instances which deploy the code themselves? Can we really release something that hasn’t been tested?

That’s a good point Rachel, we need to define a process for instances which deploy the code themselves.

I guess the ability to toggle features will evolve as we encounter new use-cases and requirements add up.

Can we really release something that hasn’t been tested?

I’m wondering if we aren’t doing this already, to some extent, as we are currently releasing features which are still “work in progress” - under the feature toggle, that is.

Surely something to address in a broader discussion. I agree @Rachel: as you pointed out before, it needs its own space/meeting.

Thanks for opening the discussion here. I’m very happy to create a new way of working together, by shortening cycles of work between the dev, product and design teams.

If we go this way, one thing I would like to underline is the importance of code review and of the development process itself. From my experience with the unit price feature, I know it’s possible to forget the conditional toggle line (like: - if feature? :unit_price, spree_current_user). It happened to me once, but I realized just before the PR was tested (and I had time to correct my mistake).

I don’t have a proposal to make at the moment, but we must be aware that pushing something to production should always be done with great caution. I don’t know if automated testing could avoid this kind of mistake…
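
Just thinking out loud, here is a sketch of the kind of spec that could catch it - the spec type, path, page content and the way feature? is exposed are all assumptions on my side, not how it necessarily works in our app:

require "rails_helper"

# Assumes feature? is a controller helper we can stub from a system spec.
describe "products page", type: :system do
  context "when the unit_price toggle is off" do
    before do
      # Force the toggle off for everyone in this example.
      allow_any_instance_of(ApplicationController)
        .to receive(:feature?).and_return(false)
    end

    it "does not show unit prices" do
      visit products_path # hypothetical page rendered behind the toggle
      expect(page).not_to have_content "Unit price"
    end
  end
end

One such spec per toggled screen wouldn’t catch every forgotten guard, but it seems like cheap insurance.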

2 Likes

Thanks everyone for this discussion. It’s really good that this is taking place and we see the potential for improvement!

I think what follows covers all the doubts and ideas shared so far.

Shortening cycle time

I think you @filipefurtado summarized it perfectly :clap: : we aim to shorten our delivery cycle time (the time it takes from deciding to make a change to having it available to users) and hence to have an effective feedback loop. As the toggles enable even smaller PRs, manually testing those adds even more overhead and causes PRs to wait in Test Ready, defeating the purpose of the toggles. Manually testing the smallest, most narrowly scoped PRs doesn’t make sense either. It’s at a high, functional level that manual testing shines.

Manual testing

So what we’re suggesting is a bit of pragmatism about what gets manually tested before merging. We’ve been merging small refactor PRs for which our test suite is enough for quite some time now. Now we’ll do the same for PRs that introduce new functionality under a toggle.

We mustn’t forget that we’re trying to fully automate release testing while also trying to improve the test coverage of all PRs. Investing in this while keeping the same level of manual testing leads to redundancy that increases costs.

So, from the bullet points @filipefurtado listed above, we wouldn’t completely rule out manual testing before merging a PR. So:

  • After code review and before merging a PR → type of testing: automated tests, actors: CI build/devs + testers when required.

Our idea is to leave the manual testing for the final part of the development of a new feature, when the feature can already be used and a high-level test will provide useful feedback and catch bugs. Does this sound good?

Team agreement

We don’t have a rule yet about when to manually test and when not, other than our own judgment. I guess we need to agree on some shared judgment at this point? We’ll surely need to experiment and adapt things along the way, though.

Could we leave it up to the tech lead and PO to start with? I think code reviewers are the ones who first consider whether or not manual testing is needed. So when the tech lead reviews, they can make the call.

I think we all agree already on what @apb pointed out regarding feature toggles:

it’s a powerful tool to be used wisely but sparingly

1 Like

Thanks @sauloperez for this summary :muscle:

I’m still lost on what it means operationally and who is in charge of what. Can we try with an example?

  1. Let’s say we want to toggle feature A entirely. And let’s take the simple case of a feature that requires users to log in.
  2. PRs end up in code review and once they get 2 approvals, they get merged.
  3. A release happens and I’m assuming only super admins get the toggle activated. Then what happens? Is it the responsibility of the instance manager and support team to test the feature and then decide for which users to turn it on?

Do we really have this problem? :slight_smile: And if so, isn’t the bottleneck just moved to production (ie onto the support team)? How can we avoid that?

delivery cycle time (the time it takes from deciding to make a change to having it available to users)

I think this definition is misleading the conversation. Toggles do not make the delivery cycle shorter. They can make merging happen faster. But delivering to all users will still take as much time as before…

As the toggles enable even smaller PRs

Can you explain how and why?

Thanks for summarizing!!

Totally, this is iterative and not set in stone. But, as with the introduction of every new process, it’s important to get really clear upfront (even in V1) about how this can be “operationalized”: how to communicate which stage of the testing/feedback loop a feature is in, and from whom that feedback or green light is expected. Otherwise it could get messy.

Could we leave it up to the tech lead and PO to start with? I think code reviewers are the ones who first consider whether or not manual testing is needed. So when the tech lead reviews, they can make the call.

@sauloperez , by this are you referring to deciding when the “3rd moment” that Filipe described has come?

After the feedback loop is concluded and after a green light from product/design: the complete feature (epic) is tested in production → functional and exploratory testing; actors: the testing team.

Related to this, it’s also important to specify at which stage which group of users gets “early access” to a feature. For example, that helps everyone have a clear understanding of who to ping for review/feedback at which point in time.

Ok, I think the summary didn’t have the intended effect. I see we’re not all on the same page :sweat_smile: That’s totally fine. Let’s see if we can align all our ideas and definitions around feature toggles while not making this a wall of text.

EDIT OMG this is huge… shall we change the format of this discussion? It’d be very hard for me to read something like this…

First of all, I want to highlight that this only applies in the very few cases where we use a feature toggle. I don’t want anyone to think that everything will now come hidden behind a toggle and that we’ll change our whole delivery process.

Who is in charge of what

To answer you @Rachel

  1. A release happens and I’m assuming only super admins get the toggle activated.

We started enabling customer balance for super admins because it’s the simplest way to get someone to use customer balance and gather production metrics while not revealing the change to regular users. It doesn’t mean it should always be like this. To me, this is a case-by-case decision to be taken at product curation, led by the PO and tech lead.
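
To make the super-admin case concrete, here is a hypothetical sketch - our real toggle plumbing may well differ, and FeatureToggle.enabled? plus the role check are assumptions, not our actual API:

# Enable a feature for a whole role first, widen the audience later.
def feature?(name, user)
  return true if user&.admin?        # super admins get early access
  FeatureToggle.enabled?(name, user) # then per-user / per-instance roll-out
end

The point is that “who gets the toggle” is just code, so the roll-out decision taken at product curation can be revisited cheaply.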

Then what happens? Is it the responsibility of the instance manager and support team to test the feature and then decide for which users to turn it on?

Nope, it’s still the responsibility of the core team, particularly the PO and tech lead, to test the feature in production. That is, to gather feedback from users and instance managers, check performance metrics, etc. Then it’s still the responsibility of the core team to decide how to proceed with the roll-out based on this feedback.

Do we really have this problem?

Yes, I’ve watched the pipe closely and PRs tend to pile up in Test Ready (we can check GitHub’s event feed in the PRs themselves). I believe manual testing is one of the causes, but not the only one.

And if so, isn’t the bottleneck just moved to production (ie onto the support team)?

We might still need to work on the deployed change to actually release it to customers (it’s still disabled for all users), but it’s not blocking the pipe: the code got merged to master, which means the PR won’t diverge from the rest of the codebase and won’t need merge-conflict resolution; we know that up to this point things work; and risks are more manageable because smaller PRs get shipped.

I think this definition is misleading the conversation. Toggles do not make the delivery cycle shorter.

Yes, they do, because of what I mentioned above. Smaller changesets get to master faster because their risk is smaller and they are much quicker to review, and so they get to production faster.

They can make merging happen faster. But delivering to all users will still take as much time as before…

I agree. We’re adding steps that didn’t exist in the process. However, my hypothesis is that, thanks to the feedback loop, by the time we deliver to all users the team will be fully confident that what’s delivered solves users’ needs, and with increased quality. I think that dividing the development into smaller chunks and acting upon this early feedback will shorten the time it takes to have a new feature available to all users and stable (no known bugs or edge cases). The earlier we catch defects, the cheaper they are to fix (this statement is not mine; it’s popular wisdom). It’s like we can adjust more often, before it’s too late. There’s no magic, though; we’re still a small team.

Can you explain how and why?

Yes: with toggles, a PR no longer has to be the smallest change that users can understand and that looks consistent to them. We can go smaller than that and ship bits of frontend without any backend ready yet, like we’ve been doing with unit prices.

Also, think about customer balance. The goal in the first place was to fix the performance hit while also fixing the calculation everywhere in the app. It would have been unthinkable to open a PR fixing it all in one shot; no one would review it.
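
To illustrate with a hypothetical sketch (class and method names made up, not our actual code), the toggle lets the old and new calculations coexist, so each small PR only has to move one piece behind the flag:

def outstanding_balance(customer)
  if feature? :customer_balance, spree_current_user
    CustomerBalance.new(customer).call   # new calculation, shipped bit by bit
  else
    legacy_outstanding_balance(customer) # current behaviour, untouched
  end
end

Each PR then only needs to be reviewable, not user-complete.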

To answer you @Jana

by this are you referring to deciding when the “3rd moment” that Filipe described has come?

No, I meant after the “1st moment”: once code review has passed, we decide whether the PR goes to Test Ready or not. Then, for example, if I’m reviewing unit prices and I see the PR leaves the feature almost entirely usable, I could move it to Test Ready and let you as PO, plus JB and Andy, know. That means @filipefurtado would test it while still having the toggle in place.