Hello! I’d like to propose switching our monitoring from Datadog (SaaS) to a self-hosted ElasticStack setup.
Datadog vs ElasticStack
We currently have limited monitoring on the “big three” instances (aka the Triumvirate) and no monitoring on the others, and this costs us around $180/month. These costs would increase if we added monitoring on additional servers (like the US and Canada) or additional monitoring features, like improved log ingestion. The price-to-value ratio is already poor, and it gets worse at scale.
If we switch to a self-hosted ElasticStack setup, we can have world-class monitoring on all servers, and the cost will be the price of a decent server (maybe $40/month). That cost would stay at around $40/month no matter how many additional servers or monitoring features we added. I think we could easily get the equivalent monitoring value of a $2000/month Datadog bill out of that $40/month.
With the Metabase work moving forward there is a growing need for custom monitoring to be added to the Metabase server. If we do it with Datadog it will be expensive…
I think Aus production currently has some custom alerting on metrics like disk space (via Wormly), which would be great to have on all servers. With ElasticStack we can set up any kind of alerts we want, like server uptime or alerts when the disk is dangerously full, on all servers. As OFN continues to grow I think we will increasingly need this kind of setup for managing an expanding infrastructure with a small team.
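To make the "disk is dangerously full" idea concrete, here is a minimal sketch of the check itself in plain Python. This is not how ElasticStack or Wormly implement it, and it is not taken from the Ansible repo; the function names and the 90% threshold are assumptions I picked purely for illustration.

```python
import shutil


def disk_usage_percent(path: str = "/") -> float:
    """Return the percentage of disk space used on the filesystem containing `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return usage.used / usage.total * 100


def is_disk_dangerously_full(path: str = "/", threshold_percent: float = 90.0) -> bool:
    """Return True when usage is at or above the threshold.

    The 90% default is an illustrative assumption, not an agreed OFN value.
    """
    return disk_usage_percent(path) >= threshold_percent


if __name__ == "__main__":
    percent = disk_usage_percent("/")
    status = "ALERT" if is_disk_dangerously_full("/") else "ok"
    print(f"{status}: disk is {percent:.1f}% full")
```

In a real setup the monitoring agent would ship the metric to the central server and the alert rule would live there, so the same threshold could apply to every instance without touching each box.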
Cost of setup
Setting up a nice Ansible repo for automatically provisioning and configuring the ElasticStack server and configuring the new monitoring agents takes quite a bit of work… which I’ve already done. I pushed the new (private) repo this morning for devs to look at and can do a live demo at some point. It’s been sat on my hard drive for a while because it required Rails 4, but now that we’re there I’ve retested it and it’s basically production-ready.