It has been a long time since I last benchmarked Gnocchi's performance. That was two years ago, on version 2. The current version of Gnocchi is 4.0, released a couple of months ago. It adds a lot of new features, such as a Redis incoming driver and a new job distribution method.
Many of the features and improvements implemented over the last couple of years were made with performance in mind. It is time to check whether they live up to our expectations.
I dusted off the servers I used a couple of years ago, updated them to the latest RHEL 7, and installed Gnocchi 4.0.1 and Redis 4.0.1 on one of them. I used the other server as the benchmark client, in charge of generating load.
The hardware configuration for each server is:
- 2 × Intel(R) Xeon(R) CPU E5-2609 v3 @ 1.90GHz (6 cores each)
- 32 GB RAM
- SanDisk Extreme Pro 240GB SSD
I have installed Gnocchi using `pip install gnocchi[postgresql,file,redis]`, created a PostgreSQL database, and wrote the following configuration file:

```ini
[indexer]
url = postgresql://root:@localhost/gnocchi

# Uncomment when testing with Redis
# [incoming]
# driver = redis

[storage]
file_basepath = /root/gnocchi-venv/data
```
The perk of having good default values: you only need to write a couple of configuration lines to get it working.
I have used uWSGI as the Web server, using the configuration file provided in Gnocchi's documentation, configured with 64 processes and 16 threads.
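For reference, a minimal uWSGI configuration along those lines might look like the sketch below. This is an assumption based on uWSGI's standard options and Gnocchi's default port (8041), not the exact file from the documentation; in particular, the module name is a placeholder you would adapt to your deployment.

```ini
[uwsgi]
# Listen on Gnocchi's default API port
http = 0.0.0.0:8041
# WSGI entry point -- placeholder, adapt to your installation
module = gnocchi.rest.wsgi
processes = 64
threads = 16
# Avoid per-request logging I/O while benchmarking
disable-logging = true
```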
Since the hardware configurations are identical, I allow myself in this article to compare the performance of Gnocchi 2 and Gnocchi 4 directly.
For generating load, I have reused the benchmark code that I wrote and merged into python-gnocchiclient. It is still not that easy to generate a lot of parallel load in Python, but it remains the best tool I found that was not too complicated to set up for things like CRUD operations.
To benchmark measure ingestion, I needed something very fast at generating requests on the client side, to be sure I could overload the server. I leveraged wrk, which is written in C and is fast. It is scriptable using Lua, which made it easy to generate fake batches of data.
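The wrk Lua script itself is not reproduced here, but the idea behind it is simple: build a JSON batch of timestamp/value pairs and POST it repeatedly. A hedged Python equivalent of that payload generator (the function name and defaults are mine, not from the actual script):

```python
import json
import random
from datetime import datetime, timedelta, timezone

def fake_batch(size=5000, step=timedelta(seconds=1)):
    """Build a JSON payload of `size` fake measures, one per `step`."""
    start = datetime(2017, 1, 1, tzinfo=timezone.utc)
    return json.dumps([
        {"timestamp": (start + i * step).isoformat(),
         "value": random.random() * 100}
        for i in range(size)
    ])

# This is the kind of body the benchmark POSTs to the measures endpoint.
payload = fake_batch(5000)
```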
Metric CRUD operations
The first step is to benchmark the CRUD operations for metrics. Here are the results, compared to the benchmarks I ran against Gnocchi 2.
Without surprise (but with great pleasure), everything is between 13% and 26% faster. Those operations mostly consist of SQL operations on the backend and serialization in the API – nothing heavy.
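For readers unfamiliar with the API, a sketch of the three CRUD requests being exercised. The payload shape follows Gnocchi's REST API, using one of its default archive policies ("low"); the metric id in the comments is a placeholder:

```python
import json

# Body for metric creation; "low" is one of Gnocchi's default
# archive policies.
create_body = json.dumps({"archive_policy_name": "low"})

# The requests the benchmark loops over:
# POST   /v1/metric       with create_body  -> create a metric
# GET    /v1/metric/<id>                    -> read it back
# DELETE /v1/metric/<id>                    -> delete it
```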
Sending and getting measures
Writing measures is still the hottest topic! How fast can you push measures into the time series database, and how efficient is it at retrieving them?
Gnocchi has supported various batching methods for a while; the case tested here is the simplest one, i.e., batching measures for one metric at a time.
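Concretely, this simplest case is one POST per metric to the measures endpoint. A hedged sketch of what such a request looks like – the host, port, and metric id are placeholders, and the actual HTTP call is shown as a comment since it needs a running server:

```python
import json

# Two fake measures for a single (placeholder) metric.
measures = [
    {"timestamp": "2017-08-01T00:00:00+00:00", "value": 42.0},
    {"timestamp": "2017-08-01T00:00:01+00:00", "value": 43.5},
]
body = json.dumps(measures)

# With the `requests` library installed, the call would be roughly:
# requests.post("http://localhost:8041/v1/metric/<metric-id>/measures",
#               data=body, headers={"Content-Type": "application/json"})
```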
I think the chart speaks for itself. With Redis as the incoming driver, I reached almost 1 million measures per second. I did not find a suitable tool to report performance with a payload bigger than 5000 points, so I stopped there. Those results are in line with what Gordon Chung measured recently on Gnocchi 4 – though he achieved 1.3 million measures per second on his bigger hardware!
These are performances using HTTP as the protocol – with all its overhead and JSON serialization. Gnocchi does not implement any custom protocol so far because we never had a requirement for more performance. However, that would certainly be a good path to follow for anyone wanting to go even faster.
Reading metrics is, here again, 54% faster. You can retrieve up to 400 000 measures per second (around 150 Mbit/s of data). That means you can retrieve a metric with a whole year of measures at a one-minute aggregation in 1.3 seconds. More realistically, you can retrieve the last 24 hours of data at one-minute precision for 280 metrics in just one second. That is more data than you could ever fit on your dashboard!
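Those two figures follow directly from the 400 000 measures/s rate; the arithmetic, spelled out:

```python
# Sanity-checking the retrieval figures at 400 000 measures per second.
rate = 400_000

year_of_minutes = 365 * 24 * 60           # 525 600 one-minute aggregates
seconds_for_a_year = year_of_minutes / rate

day_of_minutes = 24 * 60                  # 1 440 points per metric per day
seconds_for_280_metrics = day_of_minutes * 280 / rate

print(seconds_for_a_year)       # ~1.3 seconds
print(seconds_for_280_metrics)  # ~1.0 second
```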
Most of the time is spent serializing points into JSON – again, a different retrieval mechanism could be envisioned to achieve even higher performance.
I did not benchmark metricd speed myself, as Gordon wrote a complete report in the meantime. Gnocchi 4 doubles the processing speed of Gnocchi 2.
This speed is quite impressive and allows Gnocchi to ingest and pre-compute a considerable amount of data in a short time span. Some of the changes Gordon tested here are not yet released and will be part of the next minor release (4.1).
Being that efficient means that with only 1 CPU, Gnocchi can aggregate roughly 700 measures per second. If you have 70 servers and gather 10 metrics per server every second, Gnocchi can process them without any delay.
If you scale back your polling interval to one minute instead of one second (the most common scenario), a single computer with 12 cores can aggregate the metrics from 50 400 servers.
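The capacity math behind that figure, spelled out:

```python
# Aggregation capacity with one-minute polling on a 12-core machine.
per_core = 700                     # measures aggregated per second per CPU
cores = 12
per_minute = per_core * cores * 60         # 504 000 measures per minute
metrics_per_server = 10                    # each server polled once a minute
servers = per_minute // metrics_per_server
print(servers)  # 50400
```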
Not that bad.
Our processing engine is now really mature. Hundreds of deployments are using it in production to gather metrics. The recent improvements made in Gnocchi 4 are a compelling argument for users to upgrade, and we are pretty proud of our work! We still have a few ideas on how to improve some corner cases, but the general use case is well covered. Adding to that the native horizontal scalability that Gnocchi has provided since day one, it is getting hard to find a time series database that offers those features at this level of performance (but of course I'm biased, haha).
And if you have any questions, feel free to shoot them in the comment section. 😉