What’s the problem
We currently don’t have an automated process to track API changes: if the code changes and the corresponding specs are updated along with it (i.e., the build stays green), we may still end up shipping changes that break integrations without anyone noticing.
Some background
We’ve had some unnoticed changes to endpoints which were not announced to instance managers. This was discussed at the latest delivery-circle meeting, which followed up on the recent v0-integration outages that happened here and here.
First measure
The first agreed change was to better flag API changes within the release process. This was done here, and introduces a dedicated section when creating PRs, so that we can automatically generate release notes and signal these changes in a timely manner.
How to automate?
We’ve briefly discussed how to automate and monitor these changes, not only for the unsupported v0 API, but also for the v1 and DFC APIs.
Two main ideas came about:
i) @maikel proposed to have a script which screens for spec changes when drafting releases. This way, even if a developer forgets to flag a PR as API-changing, the script could catch it and signal this to the release manager, who would in turn signal it to instance managers. This would happen before the release is deployed to production (see the sketch after this list).
ii) @lin_d_hop proposed to use Postman or some other tool to assess which endpoints are used the most, and to have a process for actively monitoring endpoints. This could happen either on staging (before deployment) or on production servers (after deployment to production).
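To make proposal i) a bit more concrete, here is a minimal sketch (not a finished implementation) of what such a screening script could look like. It assumes the API specs are versioned under a swagger/ directory and that the previous release tag is passed in as an argument; both are assumptions, and the paths and wiring into the release drafting process would need to be adapted.

```javascript
// check_api_spec_changes.js - sketch of a release-time spec screen (proposal i)
// Assumption: API spec files live under swagger/ (hypothetical path).
const { execSync } = require("child_process");

// Previous release tag, e.g. passed in by the release drafting tooling.
const previousTag = process.argv[2];
if (!previousTag) {
  console.error("Usage: node check_api_spec_changes.js <previous-release-tag>");
  process.exit(2);
}

// List spec files that changed since the last release.
const changed = execSync(
  `git diff --name-only ${previousTag}..HEAD -- swagger/`,
  { encoding: "utf8" }
).trim();

if (changed) {
  // Non-zero exit so the release drafting step can flag this to the release manager,
  // even if no PR was marked as API-changing.
  console.log(`API spec changes detected since ${previousTag}:\n${changed}`);
  process.exit(1);
} else {
  console.log(`No API spec changes detected since ${previousTag}.`);
}
```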
On proposal ii)
I’ve explored using Postman to monitor a given endpoint, and found that:
- it’s fairly easy to set up monitors, which make real requests and check the payloads with a JavaScript test (a sketch of such a test follows this list). As a proof of concept, one can see the endpoint monitoring go from green to red when different releases are staged on a staging server.
- monitors can be set up on collections which (as far as I understood) can be run as part of a GitHub Action, by using Newman, a CLI tool to run Postman collections. I have not tried this.
- we’re currently in the process of replacing Datadog with New Relic. It seems possible to integrate Postman API monitoring with New Relic, which could be a nice-to-have to see all monitoring in one place. I have not tried this either.
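As an illustration of the first point, here is a minimal sketch of the kind of JavaScript test a Postman monitor (or a collection run by Newman) could attach to a request; the endpoint and the expected fields are hypothetical examples, not our actual contract, and would need to be adapted to the v0/v1/DFC endpoints we care about.

```javascript
// Postman test script attached to e.g. GET {{base_url}}/api/v0/products
// (endpoint and fields below are hypothetical, for illustration only)
pm.test("endpoint responds with 200", function () {
  pm.response.to.have.status(200);
});

pm.test("payload still exposes the fields integrations rely on", function () {
  const body = pm.response.json();
  pm.expect(body).to.have.property("products");
  pm.expect(body.products[0]).to.have.property("id");
  pm.expect(body.products[0]).to.have.property("name");
});
```

The same collection could, in principle, be run from a GitHub Action with something like `newman run collection.json --environment staging.json`, but as said above I have not tried this end to end.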
Summary and Open questions
Proposal i) sounds like a quick win that might not take too much time to implement.
Proposal ii) might be something to aim for, as a process to back up all API work and ensure integrations keep working. There might be several ways to achieve this. The Postman-Newman-New Relic process may have some pitfalls though:
- cost? I think there might be a limit to the number of monitored endpoints/test runs we can use on the free plan
- tests are in JavaScript; I could not find an easy way to use RSpec while monitoring with Postman
- how seamless is this really? I’m wondering what others think of this approach
Other more general questions
- monitor production or staging? Monitoring production has obvious wins, but may have some downsides as well:
  - production monitoring: we only see endpoints breaking after shipping, which may be too late; it may also impact performance?
  - staging monitoring: the downsides above don’t apply, but tests on staging are always less valuable than production tests, as they may fail to account for production configuration (timezone, traffic, etc.)
- is there a tool/process to know which integrations use which endpoints the most?
- what other processes/tools could be used instead to actively monitor our APIs?
- how should we prioritize this exploration?
Plenty of questions. Thoughts?