
David Martin commented on

The only reason I can think of for using TimescaleDB is so that Prometheus can use it for long-term storage. Or is there a benefit for the App Metrics service?

The Prometheus docs mention this adapter: https://github.com/timescale/prometheus-postgresql-adapter
That page also mentions TimescaleDB is optional: "TimescaleDB (optional for better performance and scalability)"
So this task could look at using the Postgres adapter for Prometheus, initially without TimescaleDB, so that we solve the long-term storage problem for Prometheus.
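
For reference, a rough sketch of what the Prometheus side of this could look like in prometheus.yml. The remote_write and remote_read settings are standard Prometheus configuration; the host name "postgres-adapter", the port 9201, and the /write and /read paths are assumptions based on my reading of the adapter README and would need to be checked against whatever we actually deploy.

```yaml
# Sketch only: host, port, and paths below are assumptions, not a tested setup.
remote_write:
  # Prometheus forwards incoming samples to the adapter, which writes them to Postgres.
  - url: "http://postgres-adapter:9201/write"
remote_read:
  # Queries for data outside local retention can be served back through the adapter.
  - url: "http://postgres-adapter:9201/read"
```

Local storage in the PV would still be used for recent data; the adapter would only add a longer-lived copy in Postgres.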

What's not clear at this point is the priority of having long-term storage for Prometheus vs. relying on the current storage in a PV.
John Frizelle, maybe you can comment on that. There is no commitment this sprint to solving the long-term storage problem for Prometheus.
The Prometheus docs explain the scalability and recoverability issues with 'local storage':
https://prometheus.io/docs/prometheus/latest/storage/#local-storage

Two pieces of interest from the docs:

Note that a limitation of the local storage is that it is not clustered or replicated. Thus, it is not arbitrarily scalable or durable in the face of disk or node outages and should thus be treated as more of an ephemeral sliding window of recent data. However, if your durability requirements are not strict, you may still succeed in storing up to years of data in the local storage.

If your local storage becomes corrupted for whatever reason, your best bet is to shut down Prometheus and remove the entire storage directory. However, you can also try removing individual block directories to resolve the problem. This means losing a time window of around two hours worth of data per block directory. Again, Prometheus's local storage is not meant as durable long-term storage.

I don't see solving this problem as a high priority at this time. One potential operational workaround is to document that the PV should be backed up, replicated, or clustered.


This message was sent by Atlassian JIRA