UK's downtime retro

Matt-Yorkley · August 1, 2019, 3:22pm

It was an exciting and eventful week! We should have some new things in place soon that will allow us to make more #datadriven decisions on this front, as well as some greatly improved proactive security measures.

On the monitoring strategy side of things, I’m not sure about throwing more money at Datadog. Does their APM work with Rails 3?

Pending this issue being resolved: https://github.com/elastic/beats/issues/13103 …

…I have a proposal for a vastly improved monitoring system using a self-hosted ElasticStack server. It would give us full metrics, unlimited log collection, security monitoring, custom alerting, and (when we get to Rails 4) APM across all instances, for the price of one server ($20?) instead of $50-$100+ per instance (if you include APM and full log collection). And we have ~10 instances now?
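A rough sketch of the arithmetic behind that comparison, using only the figures quoted above. It assumes all prices cover the same billing period (taken here to be monthly, which the post does not state) and treats "~10" as exactly 10; the variable names are purely illustrative.

```python
# Back-of-the-envelope cost comparison using the figures from the post.
# Assumption: every price covers the same billing period (e.g. per month).

INSTANCES = 10                      # "~10 instances"
ELASTIC_SERVER_COST = 20            # "$20?" for one self-hosted server
DATADOG_PER_INSTANCE = (50, 100)    # "$50-$100+" per instance, incl. APM and full logs


def per_instance_total(instances, price):
    """Total cost when every instance is billed separately."""
    return instances * price


low = per_instance_total(INSTANCES, DATADOG_PER_INSTANCE[0])
high = per_instance_total(INSTANCES, DATADOG_PER_INSTANCE[1])

print(f"Per-instance pricing: ${low}-${high}+")
print(f"Single self-hosted server: ${ELASTIC_SERVER_COST}")
print(f"Ratio: {low // ELASTIC_SERVER_COST}x-{high // ELASTIC_SERVER_COST}x")
```

The $20 figure is itself a guess in the post, and it leaves out the time needed to maintain the server, so the ratio is only indicative.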

As soon as the above issue is closed/merged (the ElasticStack Postgres dashboard isn’t fully implemented), I’ll be strongly recommending we switch from Datadog. We would have all the features we have now, plus much more, on all servers, for a fraction of the price. Also, the data visualization and custom dashboard building is incredible.

https://www.elastic.co/products/kibana

P.S. I finished writing some Ansible playbooks/roles that would allow us to implement this setup about a month ago…

P.P.S. If we start using the Intrusion Detection System I just added in the ofn-security repo, we could ingest its logs with this monitoring setup as well, with custom alerting and high-level visibility of security issues, for all servers and at no additional cost. And it would be really easy…