What’s the problem
We currently don’t have an automated process to track API changes: even when code changes and the corresponding specs are updated (i.e., the build stays green), we may end up shipping changes that break integrations.
We’ve had some unnoticed changes to endpoints which were not announced to instance managers. This was discussed in the latest delivery-circle meeting, which followed up on the recent v0-integration outages that happened here and here.
The first agreed change was to better flag API changes within the release process. This was done here, and introduces a dedicated section when creating PRs, so that we can automatically generate release notes and signal these changes in a timely manner.
How to automate?
We briefly discussed how to automate and monitor these changes, for the unsupported v0 API but also for the v1 and DFC APIs.
Two main ideas came about:
i) @maikel proposed a script that screens for spec changes when drafting releases. This way, even if a developer forgets to flag a PR as API-changing, the script could catch it and signal this to the release manager, who would in turn signal it to instance managers. This would happen before the release is deployed to production.
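To make proposal i) concrete, here is a minimal sketch of the spec-screening idea: diff the `paths` section of the API spec between the last release and the release candidate, and report added, removed, or changed endpoints. The spec structure is assumed to be OpenAPI-style; the sample paths are hypothetical, not our actual spec layout.

```python
def diff_endpoints(old_spec: dict, new_spec: dict) -> dict:
    """Compare the `paths` sections of two OpenAPI-style specs and
    report which endpoints were added, removed, or changed."""
    old_paths = set(old_spec.get("paths", {}))
    new_paths = set(new_spec.get("paths", {}))
    changed = {
        p for p in old_paths & new_paths
        if old_spec["paths"][p] != new_spec["paths"][p]
    }
    return {
        "added": sorted(new_paths - old_paths),
        "removed": sorted(old_paths - new_paths),
        "changed": sorted(changed),
    }

if __name__ == "__main__":
    # Hypothetical before/after specs for a release candidate
    old = {"paths": {"/api/v0/products": {"get": {}},
                     "/api/v0/orders": {"get": {}}}}
    new = {"paths": {"/api/v0/products": {"get": {}, "post": {}},
                     "/api/v1/orders": {"get": {}}}}
    print(diff_endpoints(old, new))
```

A release-drafting script could load the two spec versions from git refs, run a check like this, and post the report to the release manager whenever the result is non-empty.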
ii) @lin_d_hop proposed using Postman or some other tool to assess which endpoints are used the most, and to have a process for actively monitoring endpoints. This could happen either on staging (before deployment) or on production servers (after deployment).
On proposal ii)
I’ve done an initial exploration of Postman to monitor a given endpoint, and found that:
- monitors can be set up on collections, which can (as far as I understood) be run as part of a GitHub Action (using Newman, a CLI tool that runs Postman collections) - I have not tried this
- we’re currently in the process of replacing Datadog with New Relic. It seems possible to integrate Postman API monitoring with New Relic, which could be a nice-to-have, to see all monitoring in one place. I have not tried this either.
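For context, the kind of check a Postman monitor (or a Newman run in CI) performs boils down to asserting the status code and the shape of the response. A minimal sketch, with a hypothetical response shape (the `id`/`name` fields are assumptions, not our actual API contract):

```python
import json
import urllib.request

REQUIRED_KEYS = {"id", "name"}  # assumed response fields, for illustration

def check_payload(status: int, body: str, required=REQUIRED_KEYS) -> list:
    """Return a list of problems found in a response; empty means the check passed."""
    if status != 200:
        return [f"unexpected status {status}"]
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return ["response is not valid JSON"]
    missing = required - set(data)
    return [f"missing keys: {sorted(missing)}"] if missing else []

def check_endpoint(url: str) -> list:
    """Fetch a (hypothetical) endpoint and validate its response."""
    with urllib.request.urlopen(url) as resp:
        return check_payload(resp.status, resp.read().decode())

if __name__ == "__main__":
    print(check_payload(200, '{"id": 1, "name": "Apples"}'))  # passes
    print(check_payload(200, '{"id": 1}'))  # flags the missing field
```

Postman adds scheduling, dashboards and alerting on top of this; the question is whether that packaging is worth the lock-in versus a small script like the above running on a schedule.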
Summary and Open questions
Proposal i) sounds like a quick win which might not take too much time to implement.
Proposal ii) might be something to aim for, as a process to back all API work and assure integrations keep working. There might be several ways to achieve this. The Postman-Newman-New Relic pipeline may have some pitfalls, though:
- cost? I think there might be a limit to the number of test endpoints we can use on the free tier
- how seamless is this really? I’m wondering what others think of this approach
Other more general questions
- monitor production or staging? Monitoring production has obvious wins, but may have some downsides as well:
  - production monitoring: we see endpoints breaking only after shipping, which may be too late; it may also impact performance
  - staging monitoring: the disadvantages above don’t apply, but tests in staging are less valuable than production tests, as they may fail to account for production configuration (timezone, traffic, etc.)
- is there a tool/process to know which integrations use which endpoints the most?
- what other process/tools could be used instead, to actively monitor our APIs?
- how should we prioritize this exploration?
Plenty of questions. Thoughts?