Gnocchi or Prometheus?

What if you had to choose

Wednesday 30 August 2017 Gnocchi, Prometheus

The realm of time series databases has kept expanding over the last few years. Every now and then a new contender emerges from the fog. People keep asking me about the differences between Gnocchi and Prometheus, so it's time to satisfy their curiosity.

Gnocchi and Prometheus are two open source projects evolving in the same area of expertise: time series handling. Both are licensed under the Apache 2.0 license (see the Gnocchi license file and the Prometheus license file). And that's a good thing!

Both Gnocchi and Prometheus offer a bunch of features. Here's a table summarizing which features each of them offers, or not.

Feature                    Prometheus              Gnocchi
Multi-tenant               No                      Yes
User auth & ACL            No                      Yes
Resource history           No                      Yes
Metric polling             Yes                     No
Highly available           No                      Yes
Horizontal scalability     No                      Yes
Alerting engine            Yes                     No
Data compression           Yes                     Yes
Pre-computed aggregation   Yes (recording rules)   Yes
Grafana support            Yes                     Yes
collectd support           Yes                     Yes

There's a lot of overlap between the two projects, but there are also some major differences.

First, Gnocchi does not try to solve the metric retrieval problem. Prometheus provides a pull mechanism and takes charge of collecting the measurements itself. Gnocchi developers consider that there are already plenty of tools that do this job well, such as collectd.
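Because Gnocchi does not poll, something else has to push measures to it. A tool such as collectd would normally do this for you, but the minimal Python sketch below shows what such a push looks like against Gnocchi's REST API (`POST /v1/metric/<metric_id>/measures` with a JSON list of timestamp/value pairs). It assumes a Gnocchi API on its default port 8041, leaves authentication out, and uses a placeholder metric ID; it only builds the request, it does not send it.

```python
import json
import urllib.request

# Assumption: a Gnocchi API reachable on its default port (8041), with
# authentication left out for brevity.
GNOCCHI_URL = "http://localhost:8041"


def build_push_request(metric_id, measures):
    """Build (but do not send) a POST pushing measures to one Gnocchi metric.

    `measures` is a list of (ISO 8601 timestamp, value) pairs: the sender,
    not Gnocchi, decides when each point happened.
    """
    body = json.dumps(
        [{"timestamp": ts, "value": value} for ts, value in measures]
    ).encode("utf-8")
    return urllib.request.Request(
        f"{GNOCCHI_URL}/v1/metric/{metric_id}/measures",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "example-metric-id" is a hypothetical placeholder; real metric IDs are UUIDs.
req = build_push_request(
    "example-metric-id",
    [("2017-08-30T12:00:00", 4.5), ("2017-08-30T12:01:00", 5.0)],
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would actually send it. Note that the timestamps travel with the data, which is what makes a push model able to accept points from the past.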

Secondly, Prometheus offers an alerting engine, statically configured with a YAML file. That is way ahead of Gnocchi, which offers nothing comparable, for now. Gnocchi developers are discussing the feature, and while it's not on the roadmap yet, it will happen. It will, however, be controlled through a REST API, as being able to define alerts programmatically seems important to us.

Then there is a set of features where Gnocchi shines compared to Prometheus, and they concern the core of its function: storing metrics. Gnocchi has a great storage engine that supports many storage backends (plain files, OpenStack Swift, Ceph…). This lets Gnocchi scale horizontally and provide native high availability, whereas a Prometheus server remains a single point of failure.

Multi-tenancy and authentication are also supported by Gnocchi, allowing a single instance to be shared by multiple accounts. System administrators do not commonly need these features, but application developers usually do.

That brings me to using and querying Prometheus and Gnocchi. Prometheus has its own small query language (PromQL), whereas Gnocchi has a fully featured REST API that aims to expose clean semantics. In terms of features, there do not seem to be major differences between the two.
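To make the difference in style concrete, here is a hedged Python sketch that builds the HTTP request each project would receive for roughly the same question: the minimum value over 5-minute windows across one day. It assumes default ports (9090 for Prometheus, 8041 for Gnocchi), uses `node_load1` and `example-metric-id` as hypothetical metric names, and only constructs URLs. On the Prometheus side the aggregation is expressed in PromQL (`min_over_time` evaluated every 300 seconds via `/api/v1/query_range`); on the Gnocchi side it is selected with the `aggregation` and `granularity` query parameters.

```python
from urllib.parse import urlencode

# Assumptions: both services run locally on their default ports.
PROMETHEUS_URL = "http://localhost:9090"
GNOCCHI_URL = "http://localhost:8041"


def prometheus_min_query(metric_name, start, end):
    """Prometheus: the aggregation is written in PromQL and evaluated at query time."""
    params = {
        "query": f"min_over_time({metric_name}[5m])",
        "start": start,
        "end": end,
        "step": "300",  # evaluate the expression every 5 minutes
    }
    return f"{PROMETHEUS_URL}/api/v1/query_range?{urlencode(params)}"


def gnocchi_min_query(metric_id, start, stop):
    """Gnocchi: the aggregation is chosen through REST query parameters."""
    params = {
        "aggregation": "min",
        "granularity": "300",  # should match a granularity of the metric's archive policy
        "start": start,
        "stop": stop,
    }
    return f"{GNOCCHI_URL}/v1/metric/{metric_id}/measures?{urlencode(params)}"


# Hypothetical metric names; one day of data in 5-minute steps.
print(prometheus_min_query("node_load1", "2017-08-29T12:00:00Z", "2017-08-30T12:00:00Z"))
print(gnocchi_min_query("example-metric-id", "2017-08-29T12:00:00Z", "2017-08-30T12:00:00Z"))
```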

Both Prometheus and Gnocchi support aggregating values over time ranges ("give me the minimum value for every 5-minute range over the last day"). Prometheus computes these aggregates at query time, whereas Gnocchi always aggregates metrics at write time and never at query time (unless aggregating across several metrics). This implies that Gnocchi needs a bit of CPU time at write time to pre-compute those aggregates, but it is blazingly fast at read time as it has nothing left to compute. Prometheus can achieve the same thing using recording rules.
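The write-time versus read-time trade-off can be illustrated with a toy model in Python. It assumes integer Unix timestamps and two hypothetical granularities (1 and 5 minutes), and it is not how Gnocchi stores data internally; in Gnocchi the granularities and aggregation methods come from the metric's archive policy. Each write updates running aggregates for every granularity, so a read only walks the already-computed buckets.

```python
from collections import defaultdict


def _empty_aggregate():
    return {"min": float("inf"), "max": float("-inf"), "sum": 0.0, "count": 0}


class PreAggregatedSeries:
    """Toy model of write-time aggregation (not Gnocchi's real storage code).

    Every incoming point updates running aggregates for each configured
    granularity, so reading is a lookup rather than a scan of raw points.
    """

    def __init__(self, granularities=(60, 300)):
        self.granularities = granularities
        # granularity (seconds) -> {bucket start timestamp -> aggregates}
        self.buckets = {g: defaultdict(_empty_aggregate) for g in granularities}

    def add(self, timestamp, value):
        """Write path: the CPU cost is paid here, once per point and granularity."""
        for g in self.granularities:
            agg = self.buckets[g][timestamp - timestamp % g]
            agg["min"] = min(agg["min"], value)
            agg["max"] = max(agg["max"], value)
            agg["sum"] += value
            agg["count"] += 1

    def read(self, granularity, aggregation):
        """Read path: return (bucket start, aggregate) pairs without touching raw data."""
        points = []
        for start, agg in sorted(self.buckets[granularity].items()):
            if aggregation == "mean":
                points.append((start, agg["sum"] / agg["count"]))
            else:
                points.append((start, agg[aggregation]))
        return points


series = PreAggregatedSeries()
for ts, value in [(0, 3.0), (30, 1.0), (90, 4.0), (310, 2.0)]:
    series.add(ts, value)
print(series.read(300, "min"))  # [(0, 1.0), (300, 2.0)]
```

The cost of each write grows with the number of granularities and aggregation methods, while a read costs the same whether a bucket absorbed one point or a million.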

Prometheus has some limitations inherent to time series databases designed around the notion of "monitoring": they tend to compute everything relative to $NOW. For example, it seems impossible to inject data from the past: the timestamp of a value is the time at which Prometheus read it. If Prometheus misses values for a few hours, don't count on importing them back afterwards.

I'm mentioning this because it makes Prometheus harder to benchmark for ingestion: you need tons of fake metrics for it to poll in order to build up data. I did not find any published reference on Prometheus performance, though it is advertised to ingest "millions of measures from thousands of sources".
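As a starting point for such a benchmark, here is a minimal Python sketch of a fake target that Prometheus could poll. It renders random gauges in the Prometheus text exposition format and wraps them in a standard-library HTTP handler. The metric names, the `instance` label values and the sizes are all hypothetical choices for illustration, and a real benchmark would need many such targets.

```python
import random
from http.server import BaseHTTPRequestHandler


def fake_exposition(num_metrics, series_per_metric=10, seed=None):
    """Render random gauges in the Prometheus text exposition format."""
    rng = random.Random(seed)
    lines = []
    for m in range(num_metrics):
        name = f"fake_metric_{m}"
        lines.append(f"# TYPE {name} gauge")
        for s in range(series_per_metric):
            # One time series per hypothetical "instance" label value.
            lines.append(f'{name}{{instance="fake-{s}"}} {rng.random():.6f}')
    return "\n".join(lines) + "\n"


class FakeMetricsHandler(BaseHTTPRequestHandler):
    """Answers every GET with fresh random values, so each scrape sees new data."""

    num_metrics = 1000  # 1000 metrics x 10 series = 10,000 series per scrape

    def do_GET(self):
        body = fake_exposition(self.num_metrics).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

Running `HTTPServer(("127.0.0.1", 8000), FakeMetricsHandler).serve_forever()` (from `http.server`) and adding that address as a scrape target would give Prometheus something to poll; the port is arbitrary. Even so, data only accumulates at the pace of the scrape interval, which is exactly the benchmarking difficulty described above.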

Query performance seems to vary with Prometheus, and I did not find any benchmark on that either. Gnocchi leverages a standard RDBMS (MySQL and PostgreSQL are supported) to query indexed data, and metric retrieval is always O(1), which keeps it fast.

Conclusion

If you look at other, older areas, there has never been only one HTTP server. Many people use the Apache HTTP Server, but you'll find plenty of users of nginx, Tomcat, HAProxy, Node.js or uWSGI, which are also common options nowadays. The same goes for relational databases, with PostgreSQL, MySQL and many other solutions. No single project will ever win the entire market.

It seems to me that time series storage and management is heading the same way. There will probably be various projects that enjoy some popularity and growth. Every project addresses the time series problem space with a different view and different trade-offs. There might never be a single project solving all problems at once.

Prometheus seems to be oriented toward monitoring live systems. Gnocchi is oriented toward highly available time series storage at massive scale. Leaving performance aside (I was not able to compare them anyway), the two make different trade-offs in terms of features, philosophy and orientation. Depending on your use cases, one might be a better fit than the other.