Monday, August 13, 2018

IEEE Software Blog March/April Issue, Blog, and Radio Summary

The March/April Issue of IEEE Software, as usual, is chock full of interesting articles on challenges and advances in software engineering. The topics in this issue range from the always popular topics of DevOps and security to the related but separate topic of release engineering.

One special addition to this issue is a thank-you to all those who participated in the reviewing efforts in 2017. Of course, the reviewers help make IEEE Software the magazine that it is, so thank you from us all!

As with each issue, this issue includes a special focus topic: Release Engineering. The following articles are included in the March/April issue of Software on release engineering:

The articles "The Challenges and Practices of Release Engineering" and "Release Engineering 3.0" set the tone for the focus articles in this issue. The two articles provide some background on release engineering and discuss the state of the art in the field. Each of the other articles takes a deeper dive into more specific aspects of release engineering. For example, in the article "Over-the-Air Updates for Robotic Swarms", the authors present a toolset for sending code updates over-the-air to robot swarms.

One topic that always seems to find its way into each issue of IEEE Software is agile software development. The following articles appeared in this issue on agile development:

In "Practitioners' Agile-Methodology Use and Job Perceptions", the authors report on a survey conducted to better understand practitioner perceptions of agile methodology use. Similarly, "Making Sense of Agile Methods" provides insights into agile methodologies based on personal experiences of the author. 

Wanna know more? Make sure you check out the March/April issue of IEEE Software today!

IEEE Software Blog

The blog was a little light for March and April (as many of us had deadlines we were meeting). But of course we always try to make sure there's some kind of knowledge sharing going on!

For those who are also a little behind on things and want a quick way to catch up on the previous issue (January/February), there's a summary posted on the blog.

In the post "Which design best practices should be taken care of?", the authors report on the results from a survey sent out to learn more about the importance of design best practices. The post also reports on some of the design concerns found to be most important, such as code clones and package cycles. For those interested, there's also a reference to the authors' full research article on this work.

In the other April post, titled "Efficiently and Automatically Detecting Flaky Tests with DeFlaker", the authors present a new approach, called DeFlaker, that can be used to detect flaky tests (without having to re-run them!). The post also includes some details on the evaluation of DeFlaker and other relevant resources (such as a link to the project's GitHub page and full publication) for those interested in learning more about this tool.

SE Radio

The SE Radio broadcasts for this issue were also a little light, but of course no less interesting! This issue is a technical one, with all discussions focused on various technologies. Nicole Hubbard joined SE Radio host Edaena Salinas to talk about migrating VM infrastructures to Kubernetes -- for those of you (like me) who have no clue what that is, they talk about that too!
Nate Taggart spoke with Kishore Bhatia about going serverless and what exactly that means.
And last but certainly not least in this issue, Péter Budai sat down with Kim Carter to talk about End to End Encryption (E2EE) and when it can (and should) be used.

Also, for those looking for some extracurriculars to fill their free time, SE Radio is looking for a new volunteer host! For more information, see the SE Radio website:

Monday, July 16, 2018

It’s Time For Secure Languages

By: Cristina Cifuentes, Oracle Labs (@criscifuentes)
Associate Editor: Karim Ali, University of Alberta (@karimhamdanali)
Back in 1995, only 25 Common Vulnerabilities and Exposures (CVE) were reported. But by 2017, that number had blown out to 10,000+, as reported in the National Vulnerability Database (NVD). As the CVE list is a dictionary of publicly disclosed vulnerabilities and exposures, it should give an idea of the current situation.
However, it does not include all data, because some vulnerabilities are never disclosed. For example, they may be found internally and fixed, or they may be in cloud software that is actively upgraded through continuous integration and continuous delivery (CI/CD). Node.js has its own Node Security Platform Advisories system.
A five-year analysis (2013 to 2017) of the labelled data in NVD reveals that three of the top four most common vulnerabilities are issues that can be taken into account in programming language design:
  • 5,899 buffer errors
  • 5,851 injection errors [1]
  • 3,106 information leak errors
Combined, these three types of vulnerabilities represent 53% of all labelled exploited vulnerabilities listed in the NVD for that period, and they affect today’s mainstream languages.
We have known about exploitation of buffer errors since the Morris worm exploited a buffer error in the Unix finger server over 25 years ago. SQL injections [2] and XSS exploits [3] have been documented in the literature since 1998 and 2000, respectively. A 2016 study [4] revealed that the average cost of a data breach is $4 million for an average of 10,000 records stolen through a vulnerability exploit. Not including the financial loss, imagine all the innovation that could happen if more than 50% of vulnerabilities did not exist!
Not only is the percentage of issues high, data on mainstream programming languages shows that none of them provide solutions to these three areas at the same time. A few languages provide solutions to buffer errors and/or injection errors, and no language provides a solution to information leaks. In other words, there are no mainstream secure languages that prevent these issues.

Why is this happening?

Let’s be clear about one thing – developers do not write incorrect code because they want to. It happens inadvertently, because our programming languages do not provide the right abstractions to support developers in writing error-free code.
Abstractions in programming languages introduce different levels of cognitive load. The easier it is for an abstraction to be understood, the more accepted that abstraction becomes. For example, managed memory is an abstraction that frees the developer from having to manually keep track of memory (both allocation and deallocation). As such, this abstraction is widely used in a variety of today’s programming languages, such as Java, JavaScript, and Python.
At the same time, performance of the developed code is also of interest. If an abstraction introduces a high performance overhead, it makes it hard for that abstraction to be used in practice in some contexts. For example, managed memory is not often used in embedded systems or systems programming due to its performance overhead.
At the root of our problem is the fact that many of our mainstream languages provide unsafe abstractions to developers; namely, manual management of pointers, manual string concatenation and sanitization, and manual tracking of sensitive data. These three abstractions are easy to use and have low performance overhead, but it is not easy to write correct code with them. Hence, to be used correctly, they place a high cognitive load on developers.
We need to provide safe abstractions to developers; ideally, abstractions that have low cognitive load and low performance overhead. I will briefly review three different abstractions, one for each of the vulnerabilities of interest, to show abstractions that could provide a solution in these areas.

1. Avoiding buffer errors through ownership and borrowing

Rust is a systems programming language that runs fast, prevents memory corruption, and guarantees memory and thread safety. Not only does it prevent buffer errors, it also prevents various other types of memory corruption, such as null pointer dereferences and use after free. This feature is provided by the introduction of ownership and borrowing into the type system of the language:
  • Ownership is an abstraction used in C++ whereby a resource can have only one owner. Ownership with resource acquisition is initialization (RAII) ensures that whenever an object goes out of scope, its destructor is called and its owned resource is freed. Ownership of a resource is transferred (i.e., moved) through assignments or passing arguments by value. When a resource is moved, the previous owner can no longer access it, therefore preventing dangling pointers.
  • Borrowing is an abstraction that allows a reference to a resource to be made available in a secure way – either through a shared borrow (&T), where the shared reference cannot be mutated, or a mutable borrow (&mut T), where the mutable reference cannot be aliased, but not both at the same time. Borrowing allows for data to be used elsewhere in the program without giving up ownership. It prevents use after free and data races.
Ownership and borrowing provide abstractions suitable for memory safety, and prevent buffer errors from happening in the code. Anecdotal evidence seems to suggest that the learning curve to pick up these abstractions takes some time, pointing to a high cognitive load.

2. Avoiding injection errors through taint tracking

Perl is a rapid prototyping language with over 29 years of development. It runs on over 100 platforms, from portables to mainframes. In 1989, Perl 3 introduced the concept of taint mode, to track external input values (which are considered tainted), and to perform runtime taint checks to prevent direct or indirect use of the tainted value in any command that invokes a sub-shell, or any command that modifies files/directories/processes, except for arguments to print and syswrite, symbolic methods and symbolic subreferences, or hash keys. Default tainted values include all command-line arguments, environment variables, locale information, results of some system calls (readdir(), readlink()), etc.
Ruby is a dynamic programming language with a focus on simplicity and productivity. It supports multiple programming paradigms, including functional, object-oriented, imperative, and reflective. Ruby extends Perl’s taint mode to provide more flexibility. Four safe levels are available, of which the first two are as per Perl:
  0. No safety.
  1. Disallows use of tainted data by potentially dangerous operations. This level is the default on Unix systems when running Ruby scripts as setuid.
  2. Prohibits loading of program files from globally writable locations.
  3. All newly created objects are considered tainted.
In Ruby, each object has a Trusted flag. There are methods to make an object tainted, check whether the object is tainted, or untaint the object (only for levels 0–2). At runtime, Ruby tracks direct data flows through levels 1–3; it does not track indirect/implicit data flows.
The taint tracking abstraction provides a way to prevent some types of injection errors with low cognitive load on developers. Trade-offs in performance overhead need to be made in order to determine how much data can be tracked and what target locations should be tracked, and whether direct and indirect uses can be tracked.
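To make the idea concrete, here is a minimal sketch of direct taint tracking, written in Python rather than Perl or Ruby. All names (TaintedStr, untaint, run_shell) are hypothetical and exist only for this illustration; they are not the Perl or Ruby APIs. The sketch assumes taint only needs to propagate through the + operator, so other string operations (formatting, join, slicing) would silently drop the mark, which hints at the coverage trade-offs mentioned above.

```python
class TaintError(Exception):
    """Raised when tainted data reaches a sensitive operation."""


class TaintedStr(str):
    """A string that came from outside the program and has not been validated.

    Hypothetical type for this sketch: taint propagates through `+` only.
    """

    def __add__(self, other):
        return TaintedStr(str(self) + str(other))

    def __radd__(self, other):
        return TaintedStr(str(other) + str(self))


def is_tainted(value):
    return isinstance(value, TaintedStr)


def untaint(value, allowed):
    """Explicitly validate a value against an allow-list and return a plain str."""
    plain = str(value)
    if plain not in allowed:
        raise ValueError("value is not in the allow-list: " + repr(plain))
    return plain


def run_shell(command):
    """Hypothetical sink standing in for anything that invokes a sub-shell.

    It only builds a message; nothing is actually executed.
    """
    if is_tainted(command):
        raise TaintError("tainted data reached a shell command")
    return "would run: " + command
```

In use, `"ls " + TaintedStr(user_input)` stays tainted and is rejected by run_shell, while a value that has passed through untaint is an ordinary string and is accepted.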

3. Avoiding information leaks through faceted values

Jeeves is an experimental academic language for automatically enforcing information flow policies. It is implemented as an embedded DSL in Python. Jeeves makes use of the faceted values abstraction, which is a data type used for sensitive values that stores within it the secret (high-confidentiality) and non-secret (low-confidentiality) values, guarded by a policy, e.g., <s | ns> (p). A developer specifies policies outside the code, and the language runtime enforces the policy by guaranteeing that a secret value may flow to a viewer only if the policies allow the viewer to view secret data.
Many applications today make use of a database. To make the language practical, faceted values need to be introduced into the database when dealing with database-backed applications. A faceted record is one that guards a secret and non-secret pair of values. Jacqueline, a web framework developed to support faceted values in databases, automatically reads and writes meta-data in the database to manage relevant faceted records. The developer can use standard SQL databases through the Jacqueline object relational mapping.
The faceted values abstraction provides a way to prevent information leaks, with low cognitive load on developers, but at the expense of performance overhead. This ongoing work is yet to determine the lower bound on the performance overhead of tracking both direct and indirect data flows in order to detect leaks of sensitive data.
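As a rough illustration of the <s | ns> (p) idea, the following Python sketch models a faceted value as a pair of values guarded by a policy. It is not the Jeeves or Jacqueline API: the Faceted class, its view and map methods, and the representation of a policy as a function from a viewer name to a boolean are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Faceted(Generic[T]):
    """Hypothetical faceted value <secret | public> guarded by a policy."""

    secret: T                      # high-confidentiality facet
    public: T                      # low-confidentiality facet
    policy: Callable[[str], bool]  # True if the viewer may see the secret facet

    def view(self, viewer: str) -> T:
        """Project the value for a given viewer according to the policy."""
        return self.secret if self.policy(viewer) else self.public

    def map(self, fn):
        """Compute on both facets so derived values stay guarded by the policy."""
        return Faceted(fn(self.secret), fn(self.public), self.policy)


def friends_only(viewer: str) -> bool:
    # Assumed policy for the example: only "alice" may see the secret facet.
    return viewer == "alice"


location = Faceted("221B Baker Street", "London", friends_only)
greeting = location.map(lambda place: "Meet me at " + place)
```

Here `greeting.view("alice")` yields the message with the precise address, while any other viewer gets the version built from the public facet. Because the computation ran on both facets, the derived value cannot leak the secret to a viewer the policy excludes.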

The future

The previous abstractions illustrate examples of ways to deal with specific types of errors through a programming language abstraction that may be implemented in the language’s type system and/or tracked in its runtime system. These abstractions are great first steps toward understanding the trade-offs between cognitive load and performance in our programming language abstractions, and toward creating practical solutions accessible to developers at large.
As a community, we need to step back and think of new abstractions to be developed that avoid high cognitive load on developers – and how to overcome any performance implications. Research in this area would allow for many new secure languages to be developed, languages that prevent, by construction, the existence of buffer errors, injection errors and information leaks in our software; i.e., over 50% of today’s exploited vulnerabilities. We need to improve our compiler technology, to develop new abstractions, and to cross the boundaries between different languages used in today’s cloud applications. The right secure abstraction for a web-based application that is database-backed may be different to the right secure abstraction needed for a microservices application.
With over 18.5 million software developers worldwide, security is not just for expert developers. It’s time to design the future of programming language security – it’s time for secure languages.

1. [Injection errors include Cross-Site Scripting (XSS), SQL injection, Code injection, and OS command injection.]
2. [Phrack Magazine, 8(54), article 8, 1998.]
3. [CERT “Malicious HTML Tags”, 2000.]
4. [2016 Ponemon Cost of Data Breach Study.]

Friday, May 4, 2018

Why should start-ups care about technical debt?

By: Eriks Klotins, Blekinge Institute of Technology
Associate Editor: Mehdi Mirakhorli (@MehdiMirakhorli)

We asked 84 start-ups to estimate levels of technical debt (TD) in their products and reflect on their software engineering practices. Technical debt is a metaphor to describe suboptimal solutions arising from a tradeoff between time-to-market, resources, and quality. When not addressed, compound effects of suboptimal solutions hinder further product development and reduce overall quality. 
Start-ups are known for their speed of developing innovative products and entering new markets. Technical debt can slow a start-up down and hinder its potential of quickly iterating ideas, or launching modifications for new markets. On the upside, start-ups can leverage technical debt to quickly get a product out to customers without significant upfront investment in product development.

Our data from 84 companies shows a clear association between excessive levels of technical debt and the state of start-ups. We differentiate between active start-ups working on their products, and closed and paused start-ups. The results show that too much technical debt can impair product quality to the extent that further investments in the product to remove technical debt become infeasible. Thus, excessive technical debt can kill the product and the company. Also, sustaining high levels of technical debt harms team morale, as a lot of time is spent on patching the product. We are not advocating for the removal of all technical debt. Instead, we advocate for more understanding and awareness of technical debt.

1- Learn how to spot technical debt

Technical debt can affect different product artifacts, such as source code, documentation, architecture, tests, and infrastructure.

Code debt, or code smells, corresponds to poorly written code, such as unnecessary code duplication and complexity, long methods/functions, and bad style that reduces readability. Code debt is the easiest to spot.

Documentation debt refers to shortcomings in distributing knowledge on how to evolve, operate, and maintain the product. For example, poorly documented requirements, outdated architecture drawings, and a lack of instructions on what maintenance actions are required fall into this category. Documentation can be white-board drawings, notes, information in on-line tools, and formal documents. In start-ups, lack of documentation is often compensated for by implicit knowledge about the product. However, when the team grows, key people leave, or the product is transferred to another team (e.g., in case of an acquisition), the perceived level of documentation debt peaks. The new engineers need to spend significant effort to learn the product.

Architecture debt concerns the structure of the software, with effects on its maintainability and adaptability. Start-ups often use open-source frameworks and components to construct their products. Typically, popular frameworks come with their own best practices, and following these practices ensures compatibility, eases upgrades, and makes onboarding of new engineers faster.

Testing debt refers to a lack of test automation, leading to the need to manually test an entire product before every release. Without automation, the effort of regression testing grows with every new feature, supported platform, and configuration. Regression testing may not be a problem while a product is small; however, testing debt can become a concern later, as the start-up matures and needs to support an increasing number of features across multiple platforms.

Environmental debt concerns hardware, other supporting applications, and processes relevant for development, operation and maintenance of the software product. For example, outdated server software can lead to security vulnerabilities. Lack of data backup routines, problems in versioning, shortcomings in defect management, and other inadequacies may affect the team's ability to create a quality product.

2- Main causes of technical debt

We found that the level of engineering skills and the size of the whole start-up team are the primary causes of excessive technical debt. Inexperienced developers are more likely to unknowingly introduce technical debt. Our analysis shows that lack of skills contributes to communication issues and shortcomings in distributing relevant information to the team.

Larger teams, of 9 or more people, are more likely to experience skills shortages, face communication issues, introduce code smells, and experience coordination challenges. In small teams of 2–3 engineers, members can easily communicate with each other to coordinate their activities. However, with every new team member, coordinating with everyone becomes more difficult, and suboptimal solutions, especially code smells, find their way into the product.

3- Strategies to address technical debt

1.    As an engineer, be aware of good practices. Knowing the good practices can help to spot bad practices. Knowing the difference can help to better argue for or against certain solutions and be aware of potential negative side effects.

2.    On a team level, run retrospectives and learn how to remove friction from collaboration. With increasing team size, new practices supporting collaboration may be needed. For example, simple practices like daily standups, pair-programming, and a task board can make a significant difference in distributing knowledge and improving teamwork. Note that difficulties in communication and coordination are associated with the size of the whole team, not only the engineering part. Thus, everyone in a start-up team must participate in coordination and communication activities.

3.    On an organizational level, anticipate when to leverage technical debt to speed up certain goals, and when to slow down and refactor. Our results show that most issues are experienced when a start-up attempts to on-board a large number of users and launch customizations for new markets.

Read more in the original paper: 

E. Klotins, M. Unterkalmsteiner, T. Gorschek et al., “Exploration of Technical Debt in Start-ups,” in International Conference on Software Engineering, 2018.

You may also like: 

  1. B. Stopford, K. Wallace and J. Allspaw, "Technical Debt: Challenges and Perspectives," in IEEE Software, vol. 34, no. 4, pp. 79-81, 2017.
  2. E. Wolff and S. Johann, "Technical Debt," in IEEE Software, vol. 32, no. 4, pp. 94-c3, July-Aug. 2015.
  3. C. Giardino, N. Paternoster, M. Unterkalmsteiner, T. Gorschek and P. Abrahamsson, "Software Development in Startup Companies: The Greenfield Startup Model," in IEEE Transactions on Software Engineering, vol. 42, no. 6, pp. 585-604, June 1 2016.

Monday, April 16, 2018

Which design best practices should be taken care of?

by Johannes Bräuer, Reinhold Plösch, Johannes Kepler University Linz, and Matthias Saft, Christian Körner, Corporate Technology Siemens AG
Associate Editor: Christoph Treude (@ctreude)

In the past, software metrics were used to express the compliance of source code with object-oriented design aspects [1], [2]. Nevertheless, it has been found that metrics are too vague for dealing with the complexity of driving concrete design improvements [3], and the idea of identifying code or design smells in source code has been established [4].

Despite good progress in localising design flaws based on the identification of design smells, these design smells are still too fine-grained to conclude a design assessment. Consequently, we follow the idea of measuring and assessing the compliance of the source code with object-oriented design principles [5]. For doing so, we systematically collected design principles that are applied in practice and then jointly derived more tangible design best practices [6]. These practices have the key advantage of being specific enough (1) to be applied by practitioners and (2) to be identified by an automatic tool. As a result, we developed the static code analysis tool MUSE that currently contains a set of 67 design best practices (design rules) for the programming languages Java, C# and C++ [7].

Design best practices naturally differ in importance. To determine an appropriate importance for each, we conducted a survey to gather data that allows a more differentiated view of the importance of Java-related design best practices (i.e., a subset of 49 instances).

Survey on the Importance of Design Best Practices

The survey was available from 26th October until 21st November 2016. 214 software professionals (software engineers, architects, consultants, etc.) completed the survey, resulting in an average of 134 opinions for each design best practice. Based on this data we derive a default importance, as depicted in Table 1. For the sake of clarification, the arrows indicate design best practices that are close to the next higher (↑) or lower (↓) importance level. Furthermore, we calculated a range based on the standard deviation that allows an increase or decrease of the importance within these borders. This data can be used as a basis to assess quality and to plan quality improvements.
Table 1. Design best practices ordered by importance
Default Importance | Importance Range
very high
very high
very high
high-very high
very high
high-very high
very high
high-very high
very high
high-very high
high ↑
moderate-very high
high ↑
high-very high
high ↑
moderate-very high
moderate-very high
moderate-very high
moderate-very high
moderate-very high
moderate-very high
moderate-very high
moderate-very high
moderate-very high
moderate-very high
moderate-very high
moderate-very high
moderate-very high
moderate-very high
moderate-very high
moderate-very high
high ↓
low-very high
high ↓
low-very high
high ↓
high-very high
moderate ↑
moderate ↑
moderate ↑
moderate ↓
very low-high
very low-moderate
very low-moderate
very low-moderate
very low-moderate
very low-moderate
very low-moderate
very low-moderate
very low-moderate
very low-moderate
very low-moderate

Beyond the Result of the Importance Assessment

Based on the survey result, we expanded our research in two directions. Accordingly, we further examined our idea of operationalizing design principles, and we recently proposed a design debt prioritization approach to guide design improvement activities properly.

While the survey findings revealed evidence of the importance of design best practices, the remaining question was still whether the practices, assigned to a specific design principle, cover essential aspects of that principle or just touch on some minor design concerns. To answer this general question and to identify white spots in operationalizing certain principles, we conducted focus group research for 10 selected principles with 31 software design experts in six focus groups [8]. The result of this investigation showed that our design best practices are capable of measuring and assessing the major aspects of the examined design principles.

In the course of the focus group discussions and in communicating the survey result to practitioners, we identified the need to prioritize design best practice violations not only from the viewpoint of their importance, but also from the viewpoint of a quality state. As a result, we proposed a portfolio-based assessment approach that combines the importance of each design best practice (y-axis in Figure 1) with a quality index (x-axis in Figure 1) derived from a benchmark suite [9], [10]. This combination is presented as a portfolio matrix, as depicted in Figure 1 for the measurement result of a particular open-source project; in total, the 49 design best practices for Java are presented. Taking care of all 49 best practices is time-consuming and could be overwhelming for the project team. Consequently, the portfolio-based assessment approach groups the design best practices into four so-called investment areas, which recommend concrete improvement strategies.
Figure 1: Investment areas of portfolio matrix

Concluding Remarks

To summarize this blog entry and to answer the heading question, let’s reconsider the opinions of the 214 survey participants. Accordingly, we derived the importance of the 49 design best practices, from which five instances are judged to be of very high importance. In fact, code duplicates (code clones), supertypes using subtypes, package cycles, commands in query methods and public fields are the design concerns considered to be very important. In other words, avoiding the violation of these design rules in practice can enhance and foster the flexibility, reusability and maintainability of a software product.

For more details about the conducted survey, we refer interested readers to the research article titled “A Survey on the Importance of Object-oriented Design Best Practices” [11].


[1] S. R. Chidamber and C. F. Kemerer, “A metrics suite for object oriented design,” IEEE Trans. Softw. Eng., vol. 20, no. 6, pp. 476–493, Jun. 1994.
[2] J. Bansiya and C. G. Davis, “A hierarchical model for object-oriented design quality assessment,” IEEE Trans. Softw. Eng., vol. 28, no. 1, pp. 4–17, Jan. 2002.
[3] R. Marinescu, “Measurement and quality in object-oriented design,” in Proceedings of the 21st IEEE International Conference on Software Maintenance (ICSM), Budapest, Hungary, 2005, pp. 701–704.
[4] R. Marinescu, “Detection strategies: metrics-based rules for detecting design flaws,” in Proceedings of the 20th IEEE International Conference on Software Maintenance, Chicago, IL, USA, 2004, pp. 350–359.
[5] J. Bräuer, “Measuring Object-Oriented Design Principles,” in Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), Lincoln, NE, USA, 2015, pp. 882–885.
[6] R. Plösch, J. Bräuer, C. Körner, and M. Saft, “Measuring, Assessing and Improving Software Quality based on Object-Oriented Design Principles,” Open Comput. Sci., vol. 6, no. 1, 2016.
[7] R. Plösch, J. Bräuer, C. Körner, and M. Saft, “MUSE - Framework for Measuring Object-Oriented Design,” J. Object Technol., vol. 15, no. 4, p. 2:1-29, Aug. 2016.
[8] J. Bräuer, R. Plösch, M. Saft, and C. Körner, “Measuring Object-Oriented Design Principles: The Results of Focus Group-Based Research,” J. Syst. Softw., 2018.
[9] J. Bräuer, M. Saft, R. Plösch, and C. Körner, “Improving Object-oriented Design Quality: A Portfolio- and Measurement-based Approach,” in Proceedings of the 27th International Workshop on Software Measurement and 12th International Conference on Software Process and Product Measurement (IWSM-Mensura), Gothenburg, Sweden, 2017, pp. 244–254.
[10] J. Bräuer, R. Plösch, M. Saft, and C. Körner, “Design Debt Prioritization - A Design Best Practice-Based Approach,” in Proceedings of the 1st International Conference on Technical Debt (TechDebt), 2018.
[11] J. Bräuer, R. Plösch, M. Saft, and C. Körner, “A Survey on the Importance of Object-Oriented Design Best Practices,” in Proceedings of the 43rd Euromicro Conference on Software Engineering and Advanced Applications (SEAA), Vienna, Austria, 2017, pp. 27–34.

Monday, April 9, 2018

Efficiently and Automatically Detecting Flaky Tests with DeFlaker

By: Jonathan Bell, George Mason University (@_jon_bell_), Owolabi Legunsen, UIUC, Michael Hilton (@michaelhilton), CMU, Lamyaa Eloussi, UIUC, Tifany Yung, UIUC, and Darko Marinov, UIUC.
Associate editor: Sarah Nadi, University of Alberta (@sarahnadi), Bogdan Vasilescu, Carnegie Mellon University (@b_vasilescu).
Flaky tests are tests that can non-deterministically pass or fail for the same version of the code under test. Therefore, flaky tests can be incredibly frustrating for developers. Ideally, every new test failure would be due to the latest changes that a developer made, and the developer could subsequently focus on debugging these changes. However, because their outcome depends not only on the code, but also on tricky non-determinism (e.g. dependence on external resources or thread scheduling), flaky tests can be very difficult to debug. Moreover, if a developer doesn’t know that a test failure is due to a flaky test (rather than a regression that they introduced), how does the developer know where to start debugging: their recent changes, or the failing test? If the test failure is not due to their recent changes, then should they debug the test failure immediately, or later?
Flaky tests plague both large and small companies. Google reports that 1 in 7 of their tests have some level of flakiness associated with them. A quick search also turns up many StackOverflow discussions and details about flaky tests at Microsoft, ThoughtWorks, SemaphoreCI and LucidChart.

Traditional Approach for Detecting Flaky Tests

Prior to our work, the most effective way to detect flaky tests was to repeatedly rerun failed tests. If some rerun passes, the test is definitely flaky; but if all reruns fail, the status is unknown (it might or might not be flaky). Rerunning failed tests is directly supported by many testing frameworks, including Android, Jenkins, Maven, Spring, Facebook's Buck and Google TAP. Rerunning every failed test is extremely costly when organizations see hundreds to millions of test failures per day. Even Google, with its vast compute resources, does not rerun all (failing) tests on every commit but reruns only those suspected to be flaky, and only outside of peak test execution times.
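The rerun strategy described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not the behaviour of any of the frameworks listed: classify_by_rerun, its run_test callback (assumed to return True when the test passes), and the default of five reruns are all assumptions.

```python
def classify_by_rerun(run_test, max_reruns=5):
    """Rerun an already-failed test up to max_reruns times.

    Returns "flaky" as soon as one rerun passes. If every rerun fails,
    the result is "unknown": the test may be flaky or may be a real failure.
    """
    for _ in range(max_reruns):
        if run_test():
            return "flaky"
    return "unknown"
```

Note that the cost grows with the number of failed tests times the number of reruns, and that a run of consistent failures still proves nothing, which is the weakness the next section addresses.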

Our New Approach to Detecting Flaky Tests

Our approach, DeFlaker, detects flaky tests without re-running them, and imposes only a very modest performance overhead (often less than 5% in our large-scale evaluation). Recall that a test is flaky if it can both pass and fail when it executes the same code, i.e., code that did not change; moreover, a test failure is new if the test passed on the previous version of code but fails in the current version. Between each test suite execution, DeFlaker tracks information about what code has changed (using information from a version control system, like git), what test outcomes have changed, and which of those tests executed any changed code. If a test passed on a prior run, but now fails, and has not executed any changed code, then DeFlaker warns that it is a flaky test failure.
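The decision rule described above can be sketched in a few lines. This is only an illustration, not DeFlaker's implementation (DeFlaker is a Java tool); the `RunRecord` type, the `likely_flaky_failures` function, and the example test names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunRecord:
    """Outcome of one test in one build (hypothetical record type)."""
    name: str
    passed: bool
    # Changed code locations this test executed; empty if it touched no changes.
    covered_changes: frozenset = frozenset()


def likely_flaky_failures(previous, current):
    """Names of tests that passed before, fail now, and executed no changed code."""
    previously_passed = {run.name for run in previous if run.passed}
    return sorted(
        run.name
        for run in current
        if not run.passed
        and run.name in previously_passed
        and not run.covered_changes
    )


previous = [RunRecord("testLogin", True), RunRecord("testCache", True)]
current = [
    # Fails, but it executed changed code, so it may be a real regression.
    RunRecord("testLogin", False, frozenset({"Auth.java:42"})),
    # Fails without executing any changed code, so it is flagged.
    RunRecord("testCache", False),
]
print(likely_flaky_failures(previous, current))  # ['testCache']
```

A test that was already failing in the previous build is deliberately not flagged here, since the rule only applies to new failures.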
However, on the surface, tracking coverage for detecting flaky tests may seem costly: industry reports suggest that collecting statement coverage is often avoided due to the overhead imposed by coverage tools, a finding echoed by our own empirical evaluation of popular code coverage tools (JaCoCo, Cobertura and Clover).
Our key insight in DeFlaker is that one need not collect coverage of the entire codebase in order to detect flaky tests. Instead, one can collect only the coverage of the changed code, which we call differential coverage. Differential coverage first queries a version-control system (VCS) to detect code changes since the last version. It then analyzes the structure of each changed file to determine where exactly instrumentation needs to be inserted to safely track the execution of each change, allowing it to track coverage of changes much faster than traditional coverage tools. Finally, when tests run, DeFlaker monitors execution of these changes and outputs each test that failed and is likely to be flaky.
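As a rough illustration of the first step (asking the VCS what changed), the sketch below extracts the changed line numbers per file from unified diff text such as the output of `git diff`. It is not DeFlaker's code: DeFlaker goes further and analyzes the structure of each changed file to decide where to instrument. The function name `changed_lines` and the sample diff are assumptions made for this example, and pure deletions contribute no line numbers because they leave no lines in the new version.

```python
import re

# Matches a unified-diff hunk header such as "@@ -10,2 +12,3 @@".
# A missing count means 1.
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def changed_lines(diff_text):
    """Map each file path to the new-version line numbers covered by diff hunks."""
    changes = {}
    current_file = None
    old_left = new_left = 0  # body lines still expected in the current hunk
    for line in diff_text.splitlines():
        if old_left > 0 or new_left > 0:
            # Inside a hunk body: consume it so content is never read as a header.
            if line.startswith("+"):
                new_left -= 1
            elif line.startswith("-"):
                old_left -= 1
            elif not line.startswith("\\"):  # skip "\ No newline at end of file"
                old_left -= 1
                new_left -= 1
            continue
        if line.startswith("+++ "):
            path = line[4:].strip()
            if path == "/dev/null":  # the file was deleted
                current_file = None
            else:
                # git prefixes the new path with "b/"
                current_file = path[2:] if path.startswith("b/") else path
            continue
        match = HUNK_HEADER.match(line)
        if match:
            old_left = int(match.group(1)) if match.group(1) is not None else 1
            new_start = int(match.group(2))
            new_left = int(match.group(3)) if match.group(3) is not None else 1
            if current_file is not None:
                changes.setdefault(current_file, set()).update(
                    range(new_start, new_start + new_left)
                )
    return changes


sample = "\n".join([
    "diff --git a/src/Foo.java b/src/Foo.java",
    "--- a/src/Foo.java",
    "+++ b/src/Foo.java",
    "@@ -10 +10,2 @@ class Foo {",
    "-    int x = 1;",
    "+    int x = 2;",
    "+    int y = 3;",
])
print({path: sorted(lines) for path, lines in changed_lines(sample).items()})
# {'src/Foo.java': [10, 11]}
```

When the diff is produced without context lines (for example with `git diff -U0`), the recorded line numbers are exactly the added or modified lines; with context lines, the surrounding unchanged lines in each hunk are included too.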

Finding Flaky Tests

How useful is DeFlaker to developers? Can it accurately identify test failures that are due to flakiness? We performed an extensive evaluation of DeFlaker by re-running historical builds for 26 open-source Java projects, executing over 47,000 builds. When a test failed, we attempted to diagnose it both with DeFlaker and by rerunning it. We also deployed DeFlaker live on Travis CI, where we integrated DeFlaker into the builds of 96 projects. In total, our evaluation involved executing over five CPU-years of historical builds. We also studied the relative overhead of using DeFlaker compared to a normal build. A complete description of our extensive evaluation is available in our ICSE 2018 paper.
Our primary goal was to evaluate how many flaky tests DeFlaker would find, compared with the traditional (rerun) approach. For each build, whenever a test failed, we re-ran the test using the rerun facility in Maven’s Surefire test runner. We were interested to find that this approach only resulted in 23% of test failures eventually passing (hence, marked as flaky) even if we allowed for up to five reruns of each failed test. On the other hand, DeFlaker marked 95% of test failures as flaky! Given that we are reviewing only code that was committed to version control repositories, we expected that it would be rare to find true test failures (and that most would be flaky).
We found that the strategy by which a test is rerun matters greatly: make a poor choice, and the test will continue to fail for the same reason as the first failure, causing the developer to assume that the failure was a true failure (and not a flaky test). Maven’s flaky test re-runner reran each failed test in the same process as the initial failed execution, which we found often resulted in the test continuing to fail. Hence, to better find flaky test failures, and to understand how to best use reruns to detect flaky tests, we experimented with the following strategies, applied in order to each failed test: (1) Surefire: rerun up to five times in the same JVM in which the test ran (Maven’s rerun technique); (2) Fork: if it still did not pass, rerun up to five times, with each execution in a clean, new JVM; (3) Reboot: if it still did not pass, rerun up to five times, running a mvn clean between tests and rebooting the virtual machine between runs.
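The escalation across rerun strategies can be sketched as a simple loop. This is a hypothetical illustration rather than the harness used in the evaluation: `classify_with_reruns`, the `run_test` callback, and the strategy labels are assumptions, and a real runner would launch Maven, fork a JVM, or reboot a VM where this sketch only calls a function.

```python
def classify_with_reruns(run_test, strategies, max_reruns=5):
    """Try each rerun strategy in order, cheapest first.

    run_test is a hypothetical hook: callable(strategy) -> bool, returning
    True if the rerun passed. Returns ("flaky", strategy, attempt) for the
    first passing rerun, or ("unknown", None, None) if every rerun failed.
    """
    for strategy in strategies:
        for attempt in range(1, max_reruns + 1):
            if run_test(strategy):
                return ("flaky", strategy, attempt)
    # All reruns failed: the test may be truly broken, or flaky in a way
    # that none of these strategies exposes.
    return ("unknown", None, None)


STRATEGIES = ("surefire", "fork", "reboot")


def fake_test(strategy):
    """Stand-in for a test that only passes in a fresh JVM."""
    return strategy == "fork"


print(classify_with_reruns(fake_test, STRATEGIES))  # ('flaky', 'fork', 1)
```

Ordering the strategies from cheapest to most expensive means the costly reboot step is only reached for failures that survive the cheaper reruns.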
As shown in the figure below, we found nearly 5,000 flaky test failures using the most time-consuming rerun strategy (Reboot). DeFlaker found nearly as many of these same flaky tests (96%) with a very low false alarm rate (1.5%), and at a significantly lower cost. It’s also interesting to note that when a rerun strategy worked, it generally worked after a single rerun (few additional tests were detected from additional reruns). We demonstrated that DeFlaker was fast by calculating the time overhead of applying DeFlaker to the ten most recent builds of each of those 26 projects. We compared DeFlaker to several state-of-the-art tools: the regression test selection tool Ekstazi, and the code coverage tools JaCoCo, Cobertura, and Clover. Overall, we found that DeFlaker was very fast, often imposing an overhead of less than 5% — far faster than the other coverage tools that we looked at.
Overall, based on these results, we find DeFlaker to be a more cost-beneficial approach to run before or together with reruns, which allows us to suggest a potentially optimal way to perform test reruns: For projects that have lots of failing tests, DeFlaker can be run on all the tests in the entire test suite, because DeFlaker immediately detects many flaky tests without needing any rerun. In cases where developers do not want to pay this 5% runtime cost to run DeFlaker (perhaps because they have very few failing tests normally), DeFlaker can be run only when rerunning failed tests in a new JVM; if the tests still fail but do not execute any changed code, then reruns can stop without costly reboots.
While DeFlaker’s approach is generic and can apply to nearly any language or testing framework, we implemented our tool for Java, the Maven build system, and two testing frameworks (JUnit and TestNG). DeFlaker is available under an MIT license on GitHub, with binaries published on Maven Central.
More information on DeFlaker (including installation instructions) is available on the project website and in our upcoming ICSE 2018 paper. We hope that our promising results will motivate others to try DeFlaker in their Maven-based Java projects, and to build DeFlaker-like tools for other languages and build systems.

For further reading on flaky tests:

Adriaan Labuschagne, Laura Inozemtseva and Reid Holmes
A study of test suite executions on TravisCI that investigated the number of flaky test failures.
John Micco
A summary of flaky tests at Google and (as of 2016) the strategies used to manage them.
Qingzhou Luo, Farah Hariri, Lamyaa Eloussi, and Darko Marinov
A study of the various factors that might cause tests to behave erratically, and what developers do about them.
A description of how the Chromium and WebKit teams triage and manage their flaky test failures.

Sunday, March 25, 2018

IEEE January/February Issue, Blog, and SE Radio Summary

Happy New Year software designers, developers, and lovers!

The January/February issue of IEEE Software brought in the New Year with another set of interesting and relevant articles on issues and advancements in the software engineering community.

As usual, the January/February issue features articles on various topics, but it focuses on two in particular: Safety & Security in Cyber-physical Systems and Actionable Analytics.

This issue features the following articles on safety and security in cyber-physical systems:

The other focus of this issue, actionable analytics, is discussed in the following articles:

To get an idea of what cyber-physical systems are and the challenges that come with developing and maintaining them, the article "Software Safety and Security Risk Management in Cyber-physical Systems" is definitely worth a read. This article sets the stage nicely for the other articles relevant to this focus topic, as the others focus on ways to improve safety and security in cyber-physical systems.

The articles on actionable analytics focus mostly on how we can use analytics to improve various aspects of the software development process. For example, in the article "Actionable Analytics for Strategic Maintenance of Critical Software", the authors discuss their experiences with, and provide examples of, using software metrics and analytics to enable actionable strategic maintenance management of critical software (in their case, software called Monte).

Another topic that always seems to find its way into IEEE Software is DevOps. Falling under the focus topic of actionable analytics, in the article "Using Analytics to Guide Improvement during an Agile-DevOps Transformation", the authors discuss how software analytics have been used to both drive improvements to the software development process and assess progress. This article focuses on how to use analytics in the context of an agile-DevOps transformation. 

Also pertaining to the success (or lack thereof) of adopting DevOps practices, in the section of this issue titled "The Pragmatic Architect", the article titled "The Software Architect and DevOps" discusses why the software architect plays such a critical role in the adoption of DevOps practices.

For those looking to up their soft skills, also interesting in this issue is an article titled "Managing Programmers, with Ron Lichty". In this article, veteran software manager Ron Lichty shares his insights and experiences with managing software engineers, including what makes managing programmers so hard and how to build and manage high-performing teams.

IEEE Software Blog

This was a light couple of months for the blog, but January featured a blog post titled "SEABED: An Open-Source Repository of Software Engineering Case Studies" by Veena Saini, Deepti Ameta, Ashish Sureka, Paramvir Singh, and Saurabh Tiwari. In this blog post, the authors discuss Case-Based Learning (CBL), a form of active learning, and how it can be used to enhance students' learning of various software engineering concepts. 

Of course, also featured for February is a summary of the November/December IEEE Software issue. If you're behind, make sure you check it out to catch up! :)

SE Radio

The episodes featured on SE Radio in January and February are mostly technical in nature, but no less interesting!  

Interested in learning more about Java 9 and how it might benefit you? Nicolai Parlog talks with SE Radio host Nate Black and breaks down changes to the Java language over time and what's really good with Java 9.

We also had a couple of "What is this and how do we do it?" conversations on SE Radio on the topics of data science, cloud security, and image recognition.

Monday, February 5, 2018

IEEE November/December Issue, Blog, and SE Radio Summary

The November/December Issue of IEEE Software, voted unanimously as best software magazine ever, is chock full of healthy goodness after a cheerful, yet gluttonous, holiday! This issue features numerous articles on building smart, context-aware healthcare systems. Along with articles on healthcare, this issue also features articles on other hot software topics including requirements engineering, agile development, and blockchain-based systems. Special in this issue is an article providing highlights from ICSE 2017.

As I mentioned, the feature topic of this issue of IEEE Software is smart and context-aware systems, with a focus on healthcare systems. This issue features the following articles on this topic:

In "The Elusiveness of Smart Healthcare," IEEE Software Editor in Chief Diomidis Spinellis speaks on his experiences with healthcare technology. Based on his experiences, he outlines challenges that lie ahead and should be considered when building smart healthcare systems. Potential challenges aside, a subset of the articles related to the feature topic showcase recent advances in building smart, context-aware healthcare software solutions.

The authors of "In the Pursuit of Hygge Software" discuss how we can, and should, improve hygge in pervasive technologies, such as the ones often used in healthcare. For those who don't know (like me), hygge is a Danish and Norwegian word that essentially means feeling connected to others. The authors suggest that we can improve hygge by helping people find other people, places, and information that could help them through their situation, and they propose a hygge-enabled software architecture. Similarly, in "Crowd-Based Ambient Assisted Living to Monitor the Elderly's Health Outdoors," the authors discuss how we can build healthcare systems that can help monitor the elderly's health when they're outdoors. Unlike existing technologies, however, their SafeNeighborhood approach uses crowdsourced information to improve inferences made from contextual and sensor data.

Some articles focused on the general idea of building context-aware systems, particularly on handling variability in context-aware systems.
Common amongst these articles is the idea of modeling context-variability and using these models to identify and deal with problems at run-time, without human intervention.
If you have any interest in understanding or improving smart healthcare systems, this issue is especially for you!

IEEE Software Blog

As usual, one feature item on the IEEE Software Blog is a summary of the IEEE Software September/October issue. If you're behind, make sure you check it out!

The past couple of months haven't seen too many new blog posts, but featured a mix of articles showcasing small and large scale research efforts. One article discusses an initiative called Naming the Pain in Requirements Engineering (NaPiRE). This initiative came together to conduct a global study on requirements engineering in industry with the goal of building a holistic theory on industry practices, trends, and problems.

The other articles in this issue discuss technical research. If you're conducting or interested in research related to StackOverflow, there's an article on asking good technical questions that might catch your attention. For those interested in eye tracking studies (which are becoming more frequent in the SE research community), another article explores using eye tracking to automate traceability link recovery.

SE Radio

Same hosts, new (hot) topics. 
Got a secret you need to keep? Want to learn how to manage your secret? There's an episode for that.
There are also episodes that can help with understanding and managing people, more specifically how IT architectures transform and adapt and how to hire and retain DevOps engineers.
Other episodes focused on understanding various aspects of the software we build. IEEE Software editor in chief Diomidis Spinellis steps in to talk tools, practices and other topics relevant to performance optimization. Other topics discussed included how we as software engineers can improve our "security stature" and how Internet of Things (IoT) applications are built and used.

Monday, January 22, 2018

SEABED: An Open-Source Repository of Software Engineering Case Studies

by Veena Saini (NIT Jalandhar, India), Deepti Ameta (DAIICT, India), Ashish Sureka (Ashoka University, India), Paramvir Singh (NIT Jalandhar, India) and Saurabh Tiwari (DAIICT, India)

Associate Editor: Sridhar Chimalakonda (@ChimalakondaSri)

Case-Based Learning (CBL) is a non-traditional teaching methodology that helps enhance learning skills. It supports a deeper understanding of concepts, involves students in active learning, enhances collaboration among team members, and helps nurture analytic and interpersonal skills through well-defined cases. We looked for the most prevalent teaching pedagogies used in domains that require brainstorming, such as medical, law, and business education, and found CBL to be an effective approach to enhancing students’ learning levels by helping them think about the same problem from different perspectives.

Why CBL for Software Engineering?

Software Engineering is a highly practice-oriented field, concerned with the development, operation, and maintenance of software; hence, providing practical knowledge of how it applies in the real world is equally important. The software development process is primarily based on a set of well-defined requirements, and understanding and prioritizing them is important for the success of any product. It is very important to bridge the gap between the theoretical concepts taught at the university level and their practical implementation at the industry level. We therefore found CBL to be an effective approach for teaching some SE concepts, as it is a student-centric teaching methodology that focuses on learning objectives such as critical thinking, brainstorming, discussion, and understanding real-life scenarios that admit solutions from multiple perspectives.

Our progress to bring CBL into practice through SEABED

The backbone of SEABED [1] is its variety of cases and the vibrant SEABED community. We are building up the SEABED community by interacting with SE practitioners, enthusiasts and researchers all over the globe, and the community is gradually growing with potential users. Along with researchers and SE enthusiasts, we also welcome thesis students to contribute and take this learning platform to a new level. Our idea was to create a web-based platform as a hub for SE CBL practitioners. Teachers, students, researchers, instructors, or anyone else can benefit from CBL and can find relevant cases in our open-source case repository in the Case Collection section. Cases are categorized according to the various phases of the Software Development Life Cycle (SDLC).
The platform allows interested users to contribute their own case problems as PDF documents (Case Submission). After conducting CBL sessions at their university, they can also upload an experience report about the sessions, including empirical analysis and students’ feedback. They can also revise any existing case available at SEABED by providing a justification document that explains the rationale behind the amendments (Case Evolution).

Case Example: Twitter started as a side project of Odeo Inc. in 2006. It saw immense growth, nearly 1000% per year, and soon became the microblogging platform of choice for the majority of Internet users. It had 400,000 tweets per quarter in 2007. This grew to 100 million tweets per quarter in 2008. By 2010 there were around 175 million users, 90 million tweets per day and around 500 million searches per day. Twitter was initially built with time to market in mind, so the architecture and technology were chosen such that the site could be built in a very short time. Twitter was not designed with this kind of growth in mind. Your team has to come up with a new architecture that addresses the scalability problems of Twitter.
Challenges
1: What are the architectural drivers, assumptions and major constraints? Give details of at least 5 decisions related to major architectural strategies.
2: Give the architecture in terms of system decomposition (as a diagram and text), structure, and connector and component responsibilities.

This is a sample case [2] for Software Engineering (SE) case-based learning (CBL) practice with challenging questions.

CBL Implementation at different universities

Experiment 1 - IIIT Hyderabad, India: Our journey began when CBL was first implemented at IIIT Hyderabad for practicing SE concepts. They proposed a case-based learning environment, the Case Oriented Software Engineering Education model (COSEEd). It embeds problem solving as a core skill with cases as the primary learning objects. We implemented 4 course offerings with COSEEd considering UG, PG, and PGSSP (industry participants). In all the offerings, students felt that CBL helped them in gaining learning objects along with an improvement in their communication skills. This model was successful in covering cognitive goals in SE education. The research work can be reached at [3].

Experiment 2 - NIT Jalandhar, India:
Here, a CBL session was conducted for 89 third-year UG students. CBL was found to be effective, with agreement from 74.53% of the students, who were able to understand all five learning principles: learning, critical thinking, engagement, communication skills, and team work. The experience report [4] of this implementation is available at SEABED. After this exercise, the SEABED platform was created and we proposed a case writing template. The research work can be reached at [5].

The architecture of SEABED is given below:

Experiment 3 - DAIICT Gandhinagar, India:  
CBL was introduced in two different SE courses, Software Testing and Requirements Engineering, with a number of CBL sessions. We first applied CBL to the Software Testing discipline and received appreciable responses at DAIICT, Gandhinagar, India. This was the first time the students were introduced to the concepts of CBL. This drove us to conduct more CBL sessions at DAIICT, and the second time we applied CBL to practicing Requirements Engineering concepts: we provided two different RE cases [6][7] to student teams and asked them to bring solutions (PowerPoint presentations), so that we could check the effectiveness of CBL by evaluating their results on different parameters. We also intend to perform several more experiments along these lines. After every CBL session we asked student teams and Teaching Assistants (TAs) to give feedback, and we got positive responses this time too. We further aim to expand this by considering new research techniques on CBL implementation.

Overall, analysis of the CBL experiments conducted at different universities reveals that the CBL teaching methodology, supported by well-designed cases, can be used to teach some Software Engineering concepts efficiently. It helps students solve real-world problems by engaging them in thinking and discussion activities.

What’s next (future opportunities): Case-based learning in the Software Engineering domain is effective, and there are several sub-areas that are still unexplored. Our platform not only helps students assimilate the concepts well, but also provides a promising research area for SE practitioners and enthusiasts to submit and discuss efficient cases. We further aim to expand this work by conducting more useful sessions with other universities and experimenting with CBL with more detailed observations, analysis and results. We also invite MS, M.Tech and PhD scholars to collaborate with us on the CBL project.

[3] Kirti Garg, Ashish Sureka and Vasudeva Verma. 2015. A Case Study on Teaching Software Engineering Concepts using a Case-Based Learning Environment.
[5] V. Saini, P. Singh and A. Sureka, “SEABED: An Open-Source Software Engineering Case-Based Learning Database,” 2017 IEEE 41st Annual Computer Software and Applications Conference (COMPSAC), Turin, 2017, pp. 426-431. doi: 10.1109/COMPSAC.2017.204
[8] D. Kundra and A. Sureka, “An Experience Report on Teaching Compiler Design Concepts Using Case-Based and Project-Based Learning Approaches,” 2016 IEEE Eighth International Conference on Technology for Education (T4E), Mumbai, 2016, pp. 216-219. doi: 10.1109/T4E.2016.052.