
Sunday, November 27, 2016

Performance and the Pipeline

- How can performance analysis keep up with ever faster and more frequent release cycles in the DevOps world?


by Felix Willnecker (@Floix), fortiss GmbH, Germany, Johannes Kroß, fortiss GmbH, Germany, and André van Hoorn (@andrevanhoorn), University of Stuttgart, Germany
Associate Editor: Zhen Ming (Jack) Jiang, York University, Canada

Back in the “good old days”, a release occurred every month, quarter, or year, leaving enough time for a thorough quality analysis and extensive performance/load tests. Those times are almost over: deploying every day, every minute, or every couple of seconds is becoming the new normal [1]. Agile development, test automation, consistent automation of the delivery pipeline, and the DevOps movement drive this trend across the IT world [2]. In this world, performance analysis is left behind. Tasks such as load tests take too long and place many requirements on the test and delivery environment. As a result, performance analysis tasks are nowadays often skipped, and performance bugs are only detected and fixed in production. This is not a willful decision but an act of necessity [3]. The rest of this blog post is organized as follows: first, we outline three strategies for including performance analysis in an automated delivery pipeline without slowing down release cycles; then we introduce an accompanying survey that investigates how performance concerns are currently addressed in industrial DevOps practice; finally, we conclude.


Strategy #1: Rolling back and forward


The usual response we get when talking about performance analysis in a continuous delivery pipeline is: “Well, we just roll back if something goes wrong”. This is a great plan, in theory. In practice, it often fails in emergency situations. First of all, this strategy requires not only a continuous delivery pipeline but also an automatic rollback mechanism. This is fairly easy at the level of an application server (just install release n-1), harder with databases (e.g., legacy table views for every change), and almost impossible if multiple applications and service dependencies are involved. Instead of rolling back, rolling forward is then applied: we deploy as many fixes as needed until the issue is resolved. Such emergency fixes are often developed in a hurry or in war-room sessions. When your company introduced its continuous delivery pipeline, it probably promised that these war-room sessions would come to an end, simply by releasing smaller incremental artifacts. The truth is that in case of emergency Murphy’s Law applies: your rollback mechanism fails and you spend the rest of the day (or night) resolving the issue.
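To make the precondition concrete, the "install release n-1" idea can be sketched as a release pointer that is repointed on rollback. This is a minimal, hypothetical model (class and method names are invented, not from any real deployment tool); it also shows the failure mode the text describes, namely that rollback is only possible when an earlier release is still deployable:

```python
# Minimal sketch of a pointer-style rollback: each release is kept
# available and a "current" index selects the active one. Rolling back
# means repointing to release n-1. All names here are hypothetical.
class ReleaseSwitcher:
    def __init__(self):
        self.releases = []   # ordered list of deployed release IDs
        self.current = None  # index of the active release

    def deploy(self, release_id):
        self.releases.append(release_id)
        self.current = len(self.releases) - 1

    def rollback(self):
        # Only possible if a previous release is still available --
        # exactly the precondition that often fails once databases
        # and multi-service dependencies are involved.
        if self.current is None or self.current == 0:
            raise RuntimeError("no earlier release to roll back to")
        self.current -= 1
        return self.releases[self.current]

switcher = ReleaseSwitcher()
switcher.deploy("v1.0")
switcher.deploy("v1.1")
print(switcher.rollback())  # back to "v1.0"
```

The sketch works for a stateless application server; the hard part the post points at is that databases and dependent services have no equivalent of simply decrementing the pointer.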


Strategy #2: Functional tests applied to performance


Another common strategy is to use functional tests and derive metrics that act as indicators for performance bugs. Measuring the number of exceptions or SQL statements during a functional test and comparing these numbers with a former release or baseline is common practice. Tool support such as PerfSig, which utilizes Dynatrace AM, exists to automate this analysis on the Jenkins build server [4]. This approach acts proactively, so issues can be detected before release, and it requires no additional tests, just some tooling and analysis software in your delivery pipeline. However, the impact on the performance of your application remains vague. Resource utilization or response time measurements conducted during short functional tests usually deliver no meaningful values, especially if the delivery pipeline runs in a virtualized environment. Exception and SQL statement counts act as indicators and may reduce the number of performance issues in production, but they won’t identify a poorly implemented algorithm.
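The baseline comparison described above can be sketched in a few lines. This is an illustrative sketch, not PerfSig's actual logic: the metric names and the 20% tolerance are assumptions chosen for the example.

```python
# Hedged sketch: compare per-run metric counts (e.g. SQL statements,
# exceptions) from a functional test run against a stored baseline and
# flag anything that grew beyond a tolerance. Names and the 20%
# threshold are illustrative assumptions, not taken from PerfSig.
def find_regressions(baseline, current, tolerance=0.20):
    regressions = []
    for metric, base_value in baseline.items():
        new_value = current.get(metric, 0)
        if new_value > base_value * (1 + tolerance):
            regressions.append((metric, base_value, new_value))
    return regressions

baseline = {"sql_statements": 120, "exceptions": 3}
current  = {"sql_statements": 190, "exceptions": 3}
print(find_regressions(baseline, current))
# -> [('sql_statements', 120, 190)]
```

A check like this is cheap enough to run on every build, which is exactly why count-based indicators are popular even though they say nothing about actual response times.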


Strategy #3: Model-based performance analysis


Performance models have their origin in academia and are still only rarely adopted by practitioners. However, such models can help to identify performance bugs in your software without adding new tests. Performance model generators now exist that derive the performance characteristics of an application directly from the build system [5]. These approaches rely on measurements at the operation and component level and require good test coverage. A complete functional test run should execute each operation multiple times so that the generators can derive resource demands per operation. Changes in the resource demands indicate a performance change, either for the better (decreased resource demand) or for the worse (increased resource demand). The main advantage over simple functional test analysis is that a complete set of tests is analyzed and multiple runs of the same test set are supported. However, major changes in the test set may require a new baseline for the model-based analysis.
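The resource-demand idea can be illustrated with a small sketch: aggregate measured cost per operation across many test executions, then compare the mean demand against a baseline. This is not the cited generators' implementation [5]; the minimum-sample rule, the 10% threshold, and all names are assumptions made for the example.

```python
# Illustrative sketch of resource-demand derivation: collect per-operation
# CPU-time samples from functional test runs, average them, and compare
# against a baseline model. Thresholds and names are assumptions.
from collections import defaultdict
from statistics import mean

def derive_demands(samples, min_runs=3):
    """samples: list of (operation, cpu_ms) tuples from test executions."""
    per_op = defaultdict(list)
    for op, cpu_ms in samples:
        per_op[op].append(cpu_ms)
    # an operation needs several executions before its demand is trusted
    return {op: mean(v) for op, v in per_op.items() if len(v) >= min_runs}

def demand_changes(baseline, current, threshold=0.10):
    changes = {}
    for op, base in baseline.items():
        if op in current:
            delta = (current[op] - base) / base
            if abs(delta) > threshold:
                changes[op] = delta  # > 0: got worse, < 0: improved
    return changes

baseline = {"checkout": 10.0}  # mean CPU ms per call, from a prior release
current = derive_demands([("checkout", 14.0), ("checkout", 15.0), ("checkout", 16.0)])
print(demand_changes(baseline, current))  # -> {'checkout': 0.5}
```

Because the comparison is per operation rather than per test, renaming or reshuffling tests does not disturb it; only genuinely new or removed operations require re-baselining, which matches the caveat above.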


Survey 



To identify and capture the current state of the art in performance practices, as well as present problems and issues, we have launched a survey that we would like to promote and encourage you or your organization to participate in. We would like to find out how performance concerns are currently addressed in industrial DevOps practice and plan to integrate the impressions and results into a blueprint for performance-aware DevOps. Furthermore, we would like to know whether classical paradigms still dominate in your organization, at what stages performance evaluations are conducted, which metrics are relevant to you, and what actions are taken after a performance evaluation.



Our long-term aim is not only to conduct this survey once, but to benchmark the state of the art continuously, compare the results over a longer period, and regularly incorporate the outcomes into our blueprint. The results of this survey will feed into our larger project of building a reference infrastructure for performance-aware DevOps and will help us understand DevOps in industry today.


Conclusions





Classical performance and load test phases may vanish and never come back. However, current strategies for reducing the risk of performance issues in production have a number of disadvantages: rollback mechanisms might fail, functional tests only deliver indicators, and model-based evaluations lack industrial tool support. Most of the time, performance does not receive enough, or even any, attention. In our opinion, this is primarily because present performance management practices are not integrated into and adapted to typical DevOps processes, especially in terms of automation and holistic tool support.

References



  1. J. Seiden, Amazon Deploys to Production Every 11.6 Seconds, http://joshuaseiden.com/blog/2013/12/amazon-deploys-to-production-every-11-6-seconds/.
  2. A. Brunnert, A. van Hoorn, F. Willnecker, A. Danciu, W. Hasselbring, C. Heger, N. Herbst, P. Jamshidi, R. Jung, J. von Kistowski, A. Koziolek. Performance-oriented devops: A research agenda. arXiv preprint arXiv:1508.04752. 2015.
  4. T-Systems MMS, PerfSig-jenkins. https://github.com/T-Systems-MMS/perfsig-jenkins.
  5. M. Dlugi, A. Brunnert, H. Krcmar. Model-based performance evaluations in continuous delivery pipelines. In Proceedings of the 1st International Workshop on Quality-Aware DevOps. 2015.

If you like this article, you might also enjoy reading:


    1. L. Zhu, L. Bass, G. Champlin-Scharff. DevOps and Its Practices. IEEE Software 33(3): 32-34. 2016.
    2. M. Callanan, A. Spillane. DevOps: Making It Easy to Do the Right Thing. IEEE Software 33(3): 53-59. 2016.
    3. D. Spinellis. Being a DevOps Developer. IEEE Software 33(3): 4-5. 2016.

    Sunday, June 12, 2016

    There Ain't No Such Thing as a Free Build


    By: Shane McIntosh (@shane_mcintosh)
    Associate Editor: Abram Hindle (@abramh)

    Modern software is developed at a rapid pace. Last May (2015), Mozilla processed 8,363 updates to its codebase (roughly 270 updates per day!). The widespread adoption of techniques like Continuous Delivery (CD) accelerates the rate at which these changes become visible to users. Google, LinkedIn, and Facebook release several times daily. In May 2011, Amazon engineers deployed new releases to production every 11.6 seconds. Indeed, CD appears to be here to stay.

    While it's easy to sing CD's praises, plenty of hard work goes into producing a smooth CD pipeline. At the heart of CD is the build system, i.e., the scripts, specifications, and tools that define and automate the complex build process of large software systems. Build systems orchestrate hundreds (or thousands!) of tool invocations, preserving the finicky order in which build commands must be executed. Rapid release cycles would be too error-prone and risky without a reliable build system in place.

    In our research, we analyze the dark side of CD, namely the overhead that build systems introduce for development and release teams and their infrastructure, with a particular focus on how this overhead can be mitigated.

    Build Systems Require Maintenance!

    Really, they do. While that statement may seem obvious to some, it's important that we're on the same page here. CD does not come for free. Indeed, our prior work shows that up to 27% of source code changes (and 44% of test code changes) are accompanied by changes to the build system.

    In recent work, we asked ourselves "what can be done to mitigate build maintenance overhead?" We began by analyzing the impact of build technology choice. For example, do projects that adopt more modern build technologies like Maven incur less maintenance activity than projects that adopt older build technologies like Ant?

    Surprisingly, the answer is no. In fact, our analysis of a large sample of open source repositories (177,039!) suggests that more modern build technologies are accompanied by greater quantities of build maintenance activity than older technologies are! In a follow-up study, we also found that more modern technologies like Maven tend to be more prone to copy-pasting than older technologies are. While there are several good reasons for migrating to a more modern build technology, our analyses suggest that lower maintenance activity is not one of them.

    On the other hand, we observed that there are open source projects that keep maintenance activity and cloning rates much lower than their counterparts. A deeper analysis of these projects revealed a couple of commonly-adopted patterns of creative build system abstraction.

    Pattern 1: XML Entity Expansion

    Rather than duplicating repetitive XML in their build.xml files, creative build engineers store common logic in a single file, and load it as a macro using the following snippet:

    <!-- Define references to files containing common targets -->
    <!DOCTYPE project [
      <!ENTITY modules-common SYSTEM "../modules-common.ent">
    ]>

    Later, the macro can be expanded in various locations:

    <project name="bea" default="all">
      <!-- Include the file containing common targets. -->
      &modules-common;
    </project>

    Pattern 2: On-the-fly Build Spec Generation

    Copy-pasting, although less egregious there, is still a frequently occurring phenomenon in the build systems of C/C++ projects. In our work, we observed that the studied C/C++ systems with low rates of copy-pasting avoid duplication by filling in template build specs during an initial step of the build process. This also keeps build maintenance activity localized, avoiding painful duplicate effort when maintenance is required.
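    The spec-generation step can be sketched as follows. This is a hypothetical illustration, not taken from any of the studied projects: the template text, placeholder name, and module list are all invented, and real projects typically do this with configure scripts or build-tool plugins rather than Python.

```python
# Hedged sketch of on-the-fly build spec generation: a single template
# with placeholders is filled in once per module at the start of the
# build, so common logic lives in one file instead of being copy-pasted.
# Template text and module names are invented for illustration.
from string import Template

MAKEFILE_TEMPLATE = Template("""\
# generated for $module -- do not edit by hand
$module.o: $module.c
\tcc -c -o $module.o $module.c
""")

def generate_specs(modules):
    """Return a generated build spec per module."""
    return {m: MAKEFILE_TEMPLATE.substitute(module=m) for m in modules}

specs = generate_specs(["parser", "lexer"])
print(specs["parser"])
```

    When a compiler flag or rule changes, only the one template is edited and every generated spec picks it up on the next build, which is precisely the localization benefit described above.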

    Conclusions

    To keep up with the pace of modern software development, a robust and reliable build system is required. While recent advances have been made, build systems still require a considerable investment.

    In our research, we evaluate means of reducing the overhead that's introduced by the build system. In this blog, I've presented two interesting patterns that we observed in the build systems of projects that have low build maintenance activity. More detail can be found in our papers.



    If you liked this post, you may also like to read the IEEE Software Special Issue on Release Engineering. [magazine]