/ OpenStack

Lessons from OpenStack Telemetry: Deflation

This post is the second and final episode of Lessons from OpenStack Telemetry. If you have missed the first post, you can read it here.

Splitting

At some point, the rules relaxed on new projects addition with the Big Tent initiative, allowing us to rename ourselves to the OpenStack Telemetry team and splitting Ceilometer into several subprojects: Aodh (alarm evaluation functionality) and Panko (events storage). Gnocchi was able to join the OpenStack Telemetry party for its first anniversary.

Finally being able to split Ceilometer into several independent pieces of software allowed us to tackle technical debt more rapidly. We built autonomous teams for each project and gave them the same liberty they had in Ceilometer. The cost of migrating the code base to several projects was higher than we wanted it to be, but we managed to build a clear migration path nonetheless.

Gnocchi Shamble

With Gnocchi in town, we stopped all efforts on Ceilometer storage and API and expected people to adopt Gnocchi. What we underestimated is the unwillingness of many operators to think about telemetry. They did not want to deploy anything to have telemetry features in the first place, so adding yet a new component (a timeseries database) to have proper metric features was seen a burden – and sometimes not seen at all.
Indeed, we also did not communicate enough on our vision for that transition. After two years of existence, many operators were asking what Gnocchi was and what they needed it for. They deployed Ceilometer and its bogus storage and API and were confused about needing yet another piece of software.

It took us more than two years to deprecate the Ceilometer storage and API, which is way too long.

Deflation

In the meantime, people were leaving the OpenStack boat. Soon enough, we started to feel the shortage of human resources. Smartly, we never followed the OpenStack trend of imposing blueprints, specs, bug reports or any process to contributors, obeying my list of open source best practice. This flexibility allowed us to iterate more rapidly; compared to other OpenStack projects; we were going faster proportionately to the size of our contributor base.

Capturer Le moment

Nonetheless, we felt like bailing out a sinking ship. Our contributors were disappearing while we were swamped with technical debt: half-baked feature, unfinished migration, legacy choices and temporary hacks. After the big party that happened, we had to wash the dishes and sweep the floor.

Being part of OpenStack started to feel like a burden in many ways. The inertia of OpenStack being a big project was beginning to surface, so we put up a lot of efforts to dodge most of its implications. Consequently, the team was perceived as an outlier, which does not help, especially when you have to interact with a lot your neighbors.

The OpenStack Foundation never understood the organization of our team. They would refer to us as "Ceilometer" whereas we formally renamed ourselves to "Telemetry" since we were englobing four server projects and a few libraries. For example, while Gnocchi has been an OpenStack project for two years before leaving, it has never been listed on the project navigator maintained by the foundation.

That's a funny anecdote that demonstrates the peculiarity of our team, and how it has been both a strength and a weakness.

Competition

Nobody was trying to do what we were doing when we started Ceilometer. We filled the space of metering OpenStack. However, as the number of companies involved increased and the friction with it along, some people grew unhappy. The race to have a seat at the table of the feast and becoming a Project Team Leader was strong, so some people preferred to create their project rather than trying to play the contribution game. In many areas, including our, that divided the effort up to a ridiculous point where several teams where doing the exact the same thing, or were trying to step on each other toes to kill the competitors.

We spent a significant amount of time trying to bring other teams in the Telemetry scope, to unify our efforts, without much success. Some companies were not embracing open-source because of their cultural differences, while some others had no interest to join a project where they would not be seen as the leader.

That fragmentation did not help us, but also did not do much harm in the end. Most of those projects are now either dead or becoming irrelevant as the rest of the world caught up on what they were trying to do.

Epilogue

As of 2018, I'm the PTL for Telemetry – because nobody else ran. The official list of maintainer for the telemetry projects is five people: two are inactive, and three are part-time. During the latest development cycle (Queens), 48 people committed in Ceilometer, though only three developers made impactful contributions. The code size has been divided by two since the peak: Ceilometer is now 25k lines of code long.

Panko and Aodh have no active developer. A RedΒ Hat colleague and I are maintaining the projects afloat to keep it working.

Gnocchi has humbly thriven since it left OpenStack. The stains from having been part of OpenStack are not yet all gone. It has a small community, but users see its real value and enjoy using it.

Those last six years have been intense, and riding the OpenStack train has been amazing. As I concluded in the first blog post of this series, most of us had a great time overall; the point of those writings is not to complain, but to reflect.

I find it fascinating to see how the evolution of a piece of software and the metamorphosis of its community are entangled. The amount of politics that a corporately-backed project of this size generates is majestic and has a prominent influence on the outcome of software development.

So, what's next? Well, as far as Ceilometer is concerned, we still have ideas and plans to keep shrinking its footprint to a minimum. We hope that one-day Ceilometer will become irrelevant – at least that's what we're trying to achieve so we don't have anything to maintain. That mainly depends on how the myriad of OpenStack projects will chose to address their metering.

We don't see any future for Panko nor Aodh.

Gnocchi, now blooming outside of OpenStack, is still young and promising. We've plenty of ideas and every new release brings new fancy features. The storage of timeseries at large scale is exciting. Users are happy, and the ecosystem is growing.

We'll see how all of that concludes, but I'm sure it'll be new lessons to learn and write about in six years!

Lessons from OpenStack Telemetry: Deflation
Share this