Sunday, June 12, 2016

There Ain't No Such Thing as a Free Build


By: Shane McIntosh (@shane_mcintosh)
Associate Editor: Abram Hindle (@abramh)

Modern software is developed at a rapid pace. In May 2015, Mozilla processed 8,363 updates to the codebase (roughly 270 updates per day!). The widespread adoption of techniques like Continuous Delivery (CD) accelerates the rate at which these changes become visible to users. Google, LinkedIn, and Facebook release several times daily. In May 2011, Amazon engineers deployed new releases to production every 11.6 seconds. Indeed, CD appears to be here to stay.

While it's easy to sing CD's praises, plenty of hard work goes into producing a smooth CD pipeline. At the heart of CD is the build system, i.e., the scripts, specifications, and tools that define and automate the complex build process of large software systems. Build systems orchestrate hundreds (or thousands!) of tool invocations, preserving the finicky order in which build commands must be executed. Rapid release cycles would be too error-prone and risky without a reliable build system in place.

In our research, we analyze the dark side of CD, namely the overhead that build systems introduce on development and release teams and their infrastructure, with a particular focus on how that overhead can be mitigated.

Build Systems Require Maintenance!

Really, they do. While that statement may seem obvious to some, it's important that we're on the same page here. CD does not come for free. Indeed, our prior work shows that up to 27% of source code changes (and 44% of test code changes) are accompanied by changes to the build system.

In recent work, we asked ourselves "what can be done to mitigate build maintenance overhead?" We began by analyzing the impact of build technology choice. For example, do projects that adopt more modern build technologies like Maven incur less maintenance activity than projects that adopt older build technologies like Ant?

Surprisingly, the answer is no. In fact, our analyses of a large sample of open source repositories (177,039!) suggest that more modern build technologies are accompanied by greater quantities of build maintenance activity than older technologies are! In a follow-up study, we also found that more modern technologies like Maven tend to be more prone to copy-pasting than older technologies are. While there are several reasons for migrating to a more modern build technology, our analyses suggest that lower maintenance activity is not one of them.

On the other hand, we observed that some open source projects keep maintenance activity and cloning rates much lower than their counterparts. A deeper analysis of these projects revealed a couple of commonly adopted patterns of creative build system abstraction.

Pattern 1: XML Entity Expansion

Rather than duplicating repetitive XML in their build.xml files, creative build engineers store common logic in a single file and load it as an external XML entity, which acts like a macro, using the following snippet:

<!-- Define references to files containing common targets -->
<!DOCTYPE project [
  <!ENTITY modules-common SYSTEM "../modules-common.ent">
]>

Later, the macro can be expanded in various locations:

<project name="bea" default="all">
  <!-- Include the file containing common targets. -->
  &modules-common;
</project>

Pattern 2: On-the-fly Build Spec Generation

Although less egregious, copy-pasting still occurs quite frequently in the build systems of C/C++ projects. In our work, we have observed that the studied C/C++ systems with low rates of copy-pasting avoid duplication by filling in template build specs during an initial step of the build process. This also helps to keep build maintenance activity localized, avoiding painful duplicate effort when maintenance is required.
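To make the idea concrete, here is a minimal sketch of template-based build spec generation. It is not taken from any of the studied projects: the template contents, the render_makefile helper, and the module names are all hypothetical, and Python's standard string.Template stands in for whatever generator a real project uses.

```python
from string import Template

# Hypothetical template build spec shared by every module.
# "$name" marks a placeholder; "$$" is string.Template's escape for a
# literal "$", so make's own variables survive the substitution step.
MAKEFILE_TEMPLATE = Template("""\
CC = $cc
CFLAGS = $cflags

$target: $objects
\t$$(CC) $$(CFLAGS) -o $$@ $$^
""")


def render_makefile(target, sources, cc="cc", cflags="-O2 -Wall"):
    """Fill in the shared template for one module (illustrative helper)."""
    # Derive object file names from the source file names.
    objects = " ".join(src.rsplit(".", 1)[0] + ".o" for src in sources)
    return MAKEFILE_TEMPLATE.substitute(
        cc=cc, cflags=cflags, target=target, objects=objects
    )


if __name__ == "__main__":
    # Two hypothetical modules generated from the same template.
    print(render_makefile("parser", ["lexer.c", "parser.c"]))
    print(render_makefile("server", ["net.c", "main.c"], cflags="-O0 -g"))
```

Because the compile rule lives only in the template, a change to it is made once and reaches every generated build spec the next time the generation step runs.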

Conclusions

To keep up with the pace of modern software development, a robust and reliable build system is required. While recent advances have been made, build systems still require a considerable investment.

In our research, we evaluate means of reducing the overhead that's introduced by the build system. In this blog post, I've presented two interesting patterns that we observed in the build systems of projects that have low build maintenance activity. More detail can be found in our papers.



If you liked this post, you may also like to read the IEEE Software Special Issue on Release Engineering. [magazine]
