Sunday, August 14, 2016

Release management in Open Source projects

By: Martin Michlmayr (@MartinMichlmayr)
Associate editor: Stefano Zacchiroli (@zacchiro)

Open source software is widely used today. While there is not a single development method for open source, many successful open source projects are based on widely distributed development models with many independent contributors working together. Traditionally, distributed software development has often been seen as inefficient due to the high level of communication and coordination required during the software development process. Open source has clearly shown that successful software can be developed in a distributed manner.

The open source community has over time introduced many collaboration systems, such as version control systems and mailing lists, and processes that foster this collaborative development style and improve coordination. In addition to implementing efficient collaboration systems and processes, it has been argued that open source development works because it aims to reduce the level of coordination needed. This is because development is done in parallel streams by independent contributors who work on self-selected tasks. Contributors can work independently and coordination is only required to integrate their work with others.

Relatively little attention has been paid to release management in open source projects in the literature. Release management, which involves the planning and coordination of software releases and the overall management of releases throughout the life cycle, can be studied from many different aspects. I investigated release management as part of my PhD from the point of view of coordination theory. If open source works so well because of various mechanism to reduce the level of coordination required, what implications does this have on release management which is a time in the development process when everyone needs to come together to align their work?

Complexity of releases

As it turns out, my study on quality problems has highlighted that release management can be a very problematic part in the production of open source. Several projects described constant delays with their releases, leading to software which is out-of-date or which has other problems. Fundamentally, release management relies on trust. Contributors have to trust release managers and deadlines imposed by them, otherwise the deadlines are simply ignored. This leads to a self-fulfilling prophecy: because developers don't believe a release will occur on time, they continue to make changes and add new code, leading to delays with the release. It's very hard to break such a vicious circle.

It's important to consider why creating alignment in open source projects can be a challenge. I identified three factors that made coordination difficult:
  1. Complexity: alignment is harder to achieve in large projects and many successful open source projects have hundreds of developers.
  2. Decentralization: many open source projects are distributed, which can create communication challenges.
  3. Voluntary nature: it's important to emphasize that this does not mean that contributors are unpaid. While some open source contributors are unpaid, increasingly open source development is done by developers employed or sponsored by corporations. The Linux kernel is a good example with developers from hundreds of companies, such as Intel and Red Hat. What I mean by voluntary nature is that the project itself has no control over the contributors. The companies (in the case of paid developers) define what work gets done and unpaid contributors generally "scratch their own itch". What this means is that it's difficult for a release manager or a project leader to tell everyone to align their work at the same time.

Time-based releases

While my research has shown problems with release management in many projects, it has also identified a novel approach to release management employed by an increasing number of projects. Instead of doing releases based on new features and functionality, which has historically been the way releases are done, releases are made based on time. Time-based releases are like a release train: there is a clear timetable by which developers can orient themselves to plan their work. If they want to make the next train (release), they know when they have to be ready.

Time-based releases work particularly well if the releases are predictable (for example, every X months) and frequent. If a release is predictable, developers can plan accordingly. If a release is fairly frequent, it doesn't matter if you miss a release because you can simply target the next one. You can think of a train station with a train that is leaving. Should you run? If you know the next train will leave soon and you can trust that the next train will leave on time, there is no need to hurry — you can avoid a dangerous sprint across the platform. Similarly, if the next release is near and predictable, you don't need to disrupt the project by making last minute changes.

Additionally, frequent releases give developers practice making releases. If a project does releases only every few years, it's very hard to make the release process an integral process of the development cycle. When releases are done on a regular basis (say every three or six months), the release process can work like a machine — it becomes part of the process.

Time-based releases are a good mechanism for release management in large open source projects. Speaking in terms of coordination theory, time-based releases decrease the level of coordination required because the predictable timetable allows developers to plan for themselves. The timetable is an important coordination mechanism in its own right.

Looking at various open source projects, time-based release management is implemented in different ways. For example, GNOME and Ubuntu follow a relatively strict frequency of six months. This is frequent, predictable and easy to understand. The Linux kernel employs a 2 week "merge window" in which new features are accepted. This is followed by a number of release candidates with bug fixes (but no new features) until the software is ready for release, typically after 7 or 8 release candidates. Debian follows a model where the freeze date is announced in advance. The time between freeze and release depends on how fast defects get fixed. Debian's freeze dates are more than 2 years apart, which in my opinion leads to challenges because the release process is not performed often enough to become a routine.

Recent changes

There have been many changes recently in the industry and development community that have influenced release management. One change relates to the release frequency, which has been going up in several projects (such as U-Boot, which moved from a release every three months to one every two months in early 2016). This may stem from the need to serve updates to users more frequently because the technology is changing so rapidly. It could also be because the release train is working so well and the cost of doing releases has gone down. Frequent releases lead to a number of questions, though. For example, do users prefer small, incremental updates over larger updates? Furthermore, how can you support old releases or are users expect to upgrade to the latest version immediately (a model employed by various app stores for mobile devices)?

In addition to more frequent releases, I also observe that some projects have stopped making releases altogether. In a world of Continuous Integration (CI) and Continuous Deployment (CD), does it make sense to offer the latest changes to users because you can assure that every change has been sufficiently tested? Is there still a value of performing separate releases when you have CI/CD?

I believe more research is needed to understand release management in our rapidly changing world, but one thing is clear to me: studying how contributors with different interests come together to produce and release software is fascinating!


1 comment: