Live integration tests / setup verification


#1

When Google Maps stopped working, we wished we had an automated test for it. We have lots of automated tests for our code, but this kind of problem is not in our code: Google changed their API, and the OFN depends on it.

This kind of integration can only be tested in a live setup, using the real configuration of the app, testing the real APIs of other services. And in some cases it involves doing real payments using real money. That’s why these tests are usually not automated and not very popular. They are also very complex.

Nevertheless, it would be awesome to easily test the following scenarios with the push of a button:

  • Sending and receiving emails that include the right confirmation links.
  • Payments work and the money is received in the right account.
  • Enterprises are placed on the map correctly.
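A suite like this could be wired together with a small runner that collects results instead of aborting on the first failure. A minimal sketch, assuming hypothetical check names and stubbed blocks where the real services (SMTP, payment gateway, Maps API) would be called:

```ruby
# Minimal smoke-test harness sketch (all names are hypothetical).
# Each check is a block returning true/false; the runner collects
# results so one failing integration does not hide the others.
class SmokeTest
  Result = Struct.new(:name, :ok, :error)

  def initialize
    @checks = {}
  end

  def check(name, &block)
    @checks[name] = block
  end

  def run
    @checks.map do |name, block|
      begin
        Result.new(name, !!block.call, nil)
      rescue StandardError => e
        Result.new(name, false, e.message)
      end
    end
  end
end

# In a real run these blocks would hit the live services
# (send an email, geocode an address, make a test payment).
suite = SmokeTest.new
suite.check("geocoding") { true }                  # stub: Maps API reachable
suite.check("email")     { raise "SMTP timeout" }  # stub: simulated outage
results = suite.run
results.each { |r| puts "#{r.name}: #{r.ok ? 'OK' : "FAILED (#{r.error})"}" }
```

The payment check is the hard one, as noted above: the block would need credentials for a real account, which is exactly the part worth discussing.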

I would like to know your thoughts on the importance of these tests and whether you have any ideas on how we could integrate them into the current process. Does anyone have experience with this kind of test? Do we actually want a test script to have access to a bank account to verify that a payment was made?

Another strategy is to improve error handling. The checkout is used all the time. When something fails, we need to get notified and customers need to see a nice message explaining what’s going on. I think this is already happening for the checkout. When Google Maps stopped working, however, the only symptom was a missing pointer on the map. There was no notification. If we make sure that failures like this are reported properly, we don’t need to run an additional test: the code is exercised in production all the time.
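To make the idea concrete, here is a hedged sketch of wrapping an external-service call so that a failure gets reported and the user sees a fallback, instead of the page silently degrading. The helper and reporter names are made up for illustration; in the OFN the reporter would be Bugsnag:

```ruby
# Hypothetical wrapper: report failures of an external service, then
# degrade gracefully instead of failing silently.
def with_error_reporting(service_name, fallback:, reporter:)
  yield
rescue StandardError => e
  reporter.call(service_name, e)  # alert developers (e.g. via Bugsnag)
  fallback                        # show the user something sensible
end

reported = []
reporter = ->(service, error) { reported << [service, error.message] }

# Simulate the Google Maps outage: the geocoder raises, we get notified,
# and the caller receives a fallback instead of a silently missing pin.
coords = with_error_reporting("geocoder", fallback: nil, reporter: reporter) do
  raise "Google Maps API: request denied"
end

puts coords.inspect      # the fallback value
puts reported.inspect    # the recorded failure
```

With this in place, the missing map pointer would have come with a notification instead of going unnoticed.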

@luisramos0 This is a response to your feedback on my pull request.
@sauloperez @MyriamBoure @Matt-Yorkley @kristinalim I would love to know your thoughts on this. I would love everybody’s thoughts, but it wouldn’t be good to ping everybody here. :wink:


#2

Yeah, thanks so much for creating the thread with details @mkllnk
It’s a timeless topic, these things keep breaking and keep annoying users.
It’s good we have a place to discuss how to improve them.

I don’t have a specific solution/approach in mind. But it’s certainly something we should and can improve.

My first thoughts are that for payments we need good alerts on user actions: when a payment fails in production, we need to be alerted, and all such alerts should be investigated with priority.
In terms of emails, and probably maps, the approach could be to have a test script in production validating the process.

This needs some investigation.


#3

I don’t think I’m tech-savvy enough to contribute to this discussion, but I would of course vote first for good alerts when something goes wrong, so we can fix it asap. If tests are super heavy to set up, that would at least be a start… But I’ll let you discuss the tech things :slight_smile:


#4

Great topic @maikel.

To me, adding a test script to check these things should be considered very carefully. If you ask me, I would not do it now, for two reasons.

First, the higher the level of the test, the more likely it is to report false positives. That means extra maintenance overhead. I don’t need to tell you why, just think of our feature tests :trollface:

Second, I feel there are easier things we can do first, such as release testing (we’re starting that, yay! :tada:), iterating on our learnings there, and improving our error handling as you suggested.

Related to the latter, I think the time has come to start considering proper app logging/monitoring. Not only should we get an error on Bugsnag when a payment fails, but we should also have logs reporting events such as empty geolocation resolutions, payments that go through, payments that fail for some expected reason, etc.
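As a sketch of what that could look like with nothing more than the Ruby standard library, here is structured event logging via `Logger` with a JSON formatter. The event names are made up for illustration; the point is that successes are logged alongside failures, so a monitoring tool can compute rates later:

```ruby
require "logger"
require "json"
require "time"
require "stringio"

# A StringIO stands in for the real log destination in this sketch.
buffer = StringIO.new
logger = Logger.new(buffer)

# Emit one JSON object per line, easy for a log pipeline to ingest.
logger.formatter = proc do |severity, time, _progname, payload|
  JSON.generate({ severity: severity, at: time.utc.iso8601 }.merge(payload)) + "\n"
end

# Log successes as well as failures, so rates can be derived downstream.
logger.info(event: "payment.succeeded", amount_cents: 1500)
logger.warn(event: "payment.declined", reason: "insufficient_funds")
logger.warn(event: "geocoding.empty_result", address_id: 42)

puts buffer.string
```

One JSON line per event is exactly the shape a stack like ELK expects, which ties in with the dashboard idea below.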

Then on top of this we can have alarms. With geolocation resolution, for instance, a drop in the number of addresses that are successfully resolved could have raised an alarm, and we could have noticed way earlier.
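The kind of alarm described here can be sketched as a success-rate threshold over a sliding window of recent outcomes. All the names and numbers below are illustrative, not tuned values:

```ruby
# Hypothetical sketch: alarm when the success rate over the last
# `window` outcomes falls below `threshold`.
class RateAlarm
  def initialize(window:, threshold:)
    @window = window        # how many recent outcomes to consider
    @threshold = threshold  # alarm when success rate falls below this
    @outcomes = []
  end

  def record(success)
    @outcomes << success
    @outcomes.shift if @outcomes.size > @window
  end

  def firing?
    return false if @outcomes.size < @window  # not enough data yet
    rate = @outcomes.count(true).fdiv(@outcomes.size)
    rate < @threshold
  end
end

alarm = RateAlarm.new(window: 10, threshold: 0.8)
10.times { alarm.record(true) }   # healthy traffic
puts alarm.firing?                # healthy: no alarm
5.times { alarm.record(false) }   # Maps API starts failing
puts alarm.firing?                # 5/10 successes < 80%: alarm fires
```

In practice this would run over the event logs rather than in-process, but the logic is the same.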

At Coopdevs we’ve started setting up a logging server for one of our customers using an ELK stack, which gives us a dashboard of these events.


#5

I agree. I think good error reporting is much better than trying to run automated tests on live systems. Our users are testing it all the time. We just need to get the feedback. The test results (logs) are all we need.