Wednesday, August 31, 2016

EVOSS: simulating FOSS distribution upgrades

Associate editor: Stefano Zacchiroli (@zacchiro)


Upgrading software is a critical and error-prone task that can have dramatic consequences. A case in point is the Japanese flagship astronomical satellite Hitomi, successfully launched on 17 February 2016 but declared lost on 26 March 2016, most probably because of a faulty software upgrade. The satellite cost $286 million.

In this article we focus on how to manage the upgrade of Free and Open Source Software (FOSS) distributions. Distributions are complex systems, composed of thousands of components (software packages) that evolve rapidly and independently from each other. A FOSS distribution can be seen as a consistent and functional collection of software packages comprising a complete operating system. Managing the evolution of FOSS distributions is very challenging due to their community-centric nature and the frequent releases of components [1].

Typically, distributions offer automated tools, called package managers, for managing the components they are made of and for managing system upgrades. State-of-the-art package managers successfully manage only a limited set of upgrade types: specifically, they are only aware of static dependencies among packages that can influence upgrades. The most important information available to package managers concerns the specification of inter-package relationships such as dependencies (i.e., what a package needs in order to be correctly installed and to function correctly) and conflicts (i.e., which other packages should not be present on the system in order to avoid malfunctioning). Package managers completely ignore relevant dynamic aspects, such as potential faults of configuration scripts that are executed during upgrade deployment. Thus, it is not surprising that an apparently innocuous package upgrade can end up with a broken system state [2].
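
To make this static information concrete, here is a minimal Python sketch, not taken from any real package manager, of the dependency and conflict check described above; the metadata layout and package names are invented for illustration.

```python
# Minimal sketch of the static installability check described above.
# The metadata layout and package names are illustrative only.

PACKAGES = {
    "webapp":   {"depends": {"httpd", "libssl"}, "conflicts": set()},
    "httpd":    {"depends": {"libssl"},          "conflicts": {"httpd-ng"}},
    "libssl":   {"depends": set(),               "conflicts": set()},
    "httpd-ng": {"depends": {"libssl"},          "conflicts": {"httpd"}},
}

def check_install(candidate, installed):
    """Return the problems that installing `candidate` on top of
    `installed` would cause, based purely on static metadata."""
    meta = PACKAGES[candidate]
    missing = meta["depends"] - installed
    clashes = {p for p in installed
               if p in meta["conflicts"]
               or candidate in PACKAGES[p]["conflicts"]}
    return missing, clashes

missing, clashes = check_install("httpd-ng", {"httpd", "libssl"})
print(missing, clashes)   # set() {'httpd'} -- conflict detected
```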

EVOSS


EVOSS (EVolution of free and Open Source Software), proposed within the context of the Mancoosi EU project, is an approach to support the upgrade of FOSS systems (see Figure 1). The approach is based on Model-Driven Engineering (MDE) [3], which refers to the systematic use of models as first-class entities throughout the software engineering life cycle. In MDE, Domain-Specific Modeling Languages (DSMLs) are used to describe the system. DSMLs use metamodels to define the main language concepts, such as the relations among domain concepts and their semantics. More precisely, a metamodel is an abstraction that highlights the properties of the models that conform to it, much as a program conforms to the grammar of its programming language. DSMLs are used to build a system model according to the semantics and constraints of its metamodel. In MDE it is common to have a set of transformation engines and generators that produce various types of artefacts. There are many model-to-model transformation approaches, as surveyed in [4]. It is important to note that model transformations are defined once and for all at the metamodeling level; therefore, practitioners act as users of model transformations, which are defined by MDE and domain experts.
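
As a toy illustration of these MDE concepts, the Python sketch below encodes a hypothetical metamodel as classes, a model as conforming instances, and a model-to-model transformation defined once at the metamodel level; none of this is EVOSS's actual metamodel.

```python
from dataclasses import dataclass, field

# A tiny "metamodel": the concepts and relations models must conform to.
@dataclass
class Package:
    name: str
    depends: list = field(default_factory=list)

@dataclass
class SystemModel:
    packages: list = field(default_factory=list)

# A model conforming to the metamodel.
model = SystemModel(packages=[Package("httpd", depends=["libssl"]),
                              Package("libssl")])

# A model-to-model transformation, defined once at the metamodel level:
# it maps any SystemModel to a dependency-graph model.
def to_dependency_graph(m: SystemModel) -> dict:
    return {p.name: list(p.depends) for p in m.packages}

print(to_dependency_graph(model))  # {'httpd': ['libssl'], 'libssl': []}
```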

Figure 1: Overview of the EVOSS approach working in a real system
In order to make upgrade prediction more accurate, EVOSS considers both static and dynamic aspects of packages and their upgrades. The main dynamic aspects considered are those related to the behaviour of package configuration scripts (also known as maintainer scripts), which are executed during upgrade deployment.

Maintainer scripts are executed during upgrades and are full-fledged programs, usually written in the POSIX shell language. Moreover, they run with system administrator rights and may therefore perform arbitrary changes to the whole system. They are expected to complete without errors: their failures, usually signalled by non-zero exit codes, automatically trigger upgrade failures and may easily lead the system into an incoherent state.

EVOSS defines a DSL to specify the behaviour of maintainer scripts, which is a way to make the effect of the scripts predictable. The DSL also limits the expressive power of the language used in maintainer scripts, without however reducing important functionality of the scripts themselves. It includes a set of high-level clauses with a well-defined transformational semantics, expressed in terms of system state modifications: each system state is given as a model and the script behaviour is represented by corresponding model transformations.
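
The sketch below conveys the flavour of this idea in Python; the statement names and system-model layout are hypothetical and much simpler than the actual EVOSS DSL. Each clause is a transformation from one system-state model to the next, so a script can be simulated by folding its statements over a copy of the model.

```python
import copy

# Hypothetical system-state model and DSL statements; each statement's
# semantics is a transformation from one system state to the next.

def add_user(state, name):
    state["users"].add(name)
    return state

def install_alternative(state, link, target):
    state["alternatives"][link] = target
    return state

def simulate(script, state):
    """Apply a maintainer script (a list of DSL statements) to a *copy*
    of the system model, leaving the real system untouched."""
    state = copy.deepcopy(state)
    for stmt, args in script:
        state = stmt(state, *args)
    return state

system = {"users": {"root"}, "alternatives": {}}
script = [(add_user, ("www-data",)),
          (install_alternative, ("/usr/bin/editor", "/usr/bin/vim"))]
print(simulate(script, system))
print(system)   # unchanged: the upgrade was only simulated
```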

The idea of EVOSS is to exploit the DSL semantics to better understand how the system evolves into a new configuration. This is obtained by simulating system upgrades via an upgrade simulator (see Figure 1) that allows system users to discover upgrade failures due to faulty maintainer scripts before performing the upgrade on the real system. The simulator takes into account both fine-grained static aspects (e.g., configuration incoherencies) and dynamic aspects (e.g., the execution of maintainer scripts). It is based on model-driven techniques and makes use of a model-based description of the system to be upgraded. More specifically, the configuration model represents the system configuration, and package models represent the packages to be installed, removed, or updated. The simulator is also able to simulate the pre- and post-installation scripts that come with distribution packages. In order to apply the proposed simulation approach, system configuration and package models have to be extracted automatically from existing artefacts.

In order to build the system's configuration and package models, EVOSS makes use of model injectors that extract models from existing artefacts. The outcome of system injection is a model that represents, in a homogeneous form, different aspects of a running system, such as installed packages, users and groups, MIME type handlers, file alternatives, implicit dependencies, etc. The outcome of package injection contains modelling elements encoding both the considered package and its scripts (as DSL statements).

The fault detector is then used to check system configurations for incoherencies. The coherence of a configuration model is evaluated by means of queries which are embodied in the fault detector. In particular, for each detectable fault, a corresponding Object Constraint Language (OCL) expression is defined and used to query models and search for model elements denoting faults. OCL is a declarative language that provides constraint and object query expressions on models and metamodels. Obviously, it is not possible to define a complete catalogue of faults once and for all, because fault definitions are based on experience and acquired know-how. Therefore, the fault detector has been designed to be open and extensible, so that new queries can be added whenever new classes of faults are identified.
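
As an illustration of the kind of query the fault detector runs, the following Python stand-in for an OCL expression searches a hypothetical configuration model for one fault class: file alternatives whose target does not exist on the system.

```python
# Illustrative fault query in the style of the OCL checks described
# above. The configuration-model layout is hypothetical.

config_model = {
    "files": {"/usr/bin/vim", "/usr/bin/nano"},
    "alternatives": {"/usr/bin/editor": "/usr/bin/emacs"},
}

def dangling_alternatives(model):
    """Model elements denoting a 'dangling alternative' fault:
    an alternatives link whose target file does not exist."""
    return {link: target
            for link, target in model["alternatives"].items()
            if target not in model["files"]}

print(dangling_alternatives(config_model))
# {'/usr/bin/editor': '/usr/bin/emacs'} -- incoherent configuration
```
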
Implications

EVOSS represents an advancement over state-of-the-art package managers in the following respects: (i) it provides a homogeneous representation of the whole system's configuration in terms of models, including relevant system elements that are currently not explicitly represented; (ii) it supports upgrade simulation with the aim of discovering failures before they can affect the real system; (iii) it proposes a fault detector module able to discover problems in the configuration reached by the simulation.

The EVOSS approach has been investigated and exploited by the Caixa Magica GNU/Linux distribution to support selective roll-back; a demonstration video can be found on YouTube.

More information about EVOSS as well as its source code can be found on the EVOSS website.

If you like this article, you might also enjoy reading:


Overall description of the EVOSS approach:
  • Roberto Di Cosmo, Davide Di Ruscio, Patrizio Pelliccione, Alfonso Pierantonio, Stefano Zacchiroli. Supporting software evolution in component-based FOSS systems. Science of Computer Programming, 76(12):1144-1160, 2011. http://dx.doi.org/10.1016/j.scico.2010.11.001
Details about the simulator component:
  • Davide Di Ruscio, Patrizio Pelliccione. Simulating upgrades of complex systems: The case of Free and Open Source Software. Information and Software Technology, 56(4):438-462, April 2014. http://dx.doi.org/10.1016/j.infsof.2014.01.006
Details about the fault detector component:
  • Davide Di Ruscio, Patrizio Pelliccione. A model-driven approach to detect faults in FOSS systems. Journal of Software: Evolution and Process, 27(4):294-318, April 2015. http://dx.doi.org/10.1002/smr.1716
Details about the tool:
  • Davide Di Ruscio, Patrizio Pelliccione, Alfonso Pierantonio. EVOSS: a tool for managing the Evolution of Free and Open Source Software systems. In Proceedings of the 34th International Conference on Software Engineering (ICSE 2012), Zurich, Switzerland, demo paper, pp. 1415-1418. http://dx.doi.org/10.1109/ICSE.2012.6227234

References


[1] Eric S. Raymond. The cathedral and the bazaar. O'Reilly, 2001.
[2] Olivier Crameri, Nikola Knezevic, Dejan Kostic, Ricardo Bianchini, and Willy Zwaenepoel. Staged deployment in mirage, an integrated software upgrade testing and distribution system. SIGOPS Oper. Syst. Rev., 41(6):221-236, 2007.
[3] D. C. Schmidt. Guest Editor's Introduction: Model-Driven Engineering. Computer, 39(2):25-31, 2006.
[4] K. Czarnecki and S. Helsen. Feature-based survey of model transformation approaches. IBM Syst. J., 45:621-645, July 2006.

Sunday, August 28, 2016

Usage, Costs, and Benefits of Continuous Integration in Open-Source Projects

by Michael Hilton, Oregon State University, USA (@michaelhilton)
Associate Editor: Sarah Nadi, University of Alberta, Canada (@sarahnadi)

Continuous integration (CI) systems automate the compilation, building, and testing of software. A recent large survey of software development professionals found that 50% of respondents use CI [1]. Despite this widespread usage, CI has received almost no attention from the research community, leaving many questions about it unanswered. For example, how widely is CI used in practice, and what are some costs and benefits associated with it? To answer these questions, we studied the usage, costs, and benefits of CI in open-source projects. We examined the 34,000 most popular open-source GitHub projects to determine which CI service(s) they use. From that list, we collected over 1.5 million builds from over 600 of the most popular projects and then surveyed 442 of the developers of these projects. In the following, we answer some of your burning questions about CI.

Usage of CI

Is CI actually used?  We found that over 40% of all the open-source projects we examined use CI.  Note that since we cannot identify private CI servers that may be used by some projects, 40% should be considered a lower bound on CI usage.

What is the most popular CI for open source?  The most commonly used CI service in our data was Travis CI. Over 90% of all open-source projects using CI use Travis CI.

Are the "big names" in open source using CI?  We sorted the projects by popularity and found that 70% of the 500 most popular projects use CI.  As projects become less popular, the percentage that use CI goes down.

Is CI a passing fad?  We asked developers in our survey if they plan on using CI for their next project.  The top two options, 'Definitely' and 'Most Likely', account for 94% of all our survey respondents.  Even among respondents who are not currently using CI, over 50% said they would use CI for their next project.

Costs of CI

Why then do some projects not use CI? The most common reason reported in our survey is that the other developers on a project are not familiar with CI.  The second most common reason was that the project does not have automated tests.

Will I have to be continuously fixing the configuration files? We found that the median project changes its CI configuration file only 12 times over the lifetime of the project, so there is not a lot of churn in CI configurations.

Benefits of CI

So what exactly do developers like about CI? When we asked developers why they use CI, the most common answer was “CI helps us worry less about breaking our builds”. This was reported by 87% of our respondents. The second most common answer was that “CI helps catch bugs earlier”.

Can CI help me release more often? We found that projects with CI do in fact release faster than projects that do not use CI (0.54 releases per month with CI versus 0.24 releases per month without CI).  To control for project type, we also looked only at projects that use CI and compared how often they released before and after introducing CI.  Before CI, they released at a rate of 0.34 releases per month; after introducing CI, the release rate rose to 0.54 releases per month.
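
For reference, such release rates are straightforward to compute from a project's release history; the sketch below uses invented dates purely for illustration.

```python
from datetime import date

def releases_per_month(release_dates):
    """Average release rate over the observed span, in releases/month."""
    dates = sorted(release_dates)
    span_months = (dates[-1] - dates[0]).days / 30.44  # mean month length
    return len(dates) / span_months

# Invented release histories for one hypothetical project.
before_ci = [date(2013, 1, 10), date(2013, 4, 2), date(2013, 10, 20)]
after_ci = [date(2014, 1, 5), date(2014, 3, 1), date(2014, 4, 28),
            date(2014, 6, 15), date(2014, 8, 9), date(2014, 10, 2)]
print(releases_per_month(before_ci), releases_per_month(after_ci))
```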

Can CI help me spot problems before they happen?  Projects that use CI accept fewer pull requests than projects that do not use CI.  This could be because CI helps identify subtle issues that a quick human review would not catch.

Will CI help me save time? We found that projects with CI accept pull requests (PRs) on average 1.6 hours faster than projects that do not use CI.  This could be because of the time saved by letting the CI system check the PR, as opposed to reviewing it manually.
 

Conclusion

CI can save you time, help you release more often, and help you worry less about breaking your builds.  We hope these results motivate developers to continue to adopt CI, and also serve as a call to action for the research community to continue investigating this important aspect of the development process.

If you are interested in more details about our results, please read our full paper [2].




[1] Version One. 10th annual state of Agile development survey. https://versionone.com/pdf/VersionOne-10th-Annual-State-of-Agile-Report.pdf, 2016.

[2] Michael Hilton, Timothy Tunnell, Kai Huang, Darko Marinov, and Danny Dig. Usage, Costs, and Benefits of Continuous Integration in Open-Source Projects. In Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering (ASE 2016). To appear.

Sunday, August 21, 2016

From Aristotle to Ringelmann: Using data science to understand the productivity of software development teams

By: Ingo Scholtes, Chair of Systems Design, ETH Zürich, Switzerland (@ingo_S)
Associate Editor: Bogdan Vasilescu, University of California, Davis, USA (@b_vasilescu)

I am sure that the title of this blog post raises a number of questions: What has the Greek philosopher Aristotle to do with the 19th-century French agricultural engineer Maximilien Ringelmann? And what, if anything, do the two of them have to do with software?

The answers to both questions are related to Aristotle's famous reference to systems where the "whole is greater than the sum of its parts", i.e. complex systems where interactions between elements give rise to emergent or synergetic effects. Teams of software developers who communicate and coordinate to jointly solve complex development tasks can be seen as one example of such a complex (social) system. But what are the emergent effects in such teams? How do they affect productivity, and how can we quantify them?

This is where Maximilien Ringelmann, a French professor of agricultural engineering who would later - astonishingly - become one of the founders of social psychology, enters the story. Around 1913, Ringelmann became interested in the question of how the collective power of draught animals, like horses or oxen pulling carts or plows, changes as increasingly large teams of them are harnessed. He answered the question with a data-driven study: he asked increasingly large teams of his students to jointly pull on a rope, measuring the collective force they were able to exert. He found that the collective force of a team of students was less than the sum of the forces exerted by each team member alone, an effect that later became known as the "Ringelmann effect". One possible explanation for this finding is coordination issues that make it increasingly difficult to tightly synchronize actions in an increasingly large team. Moreover, social psychology has generally emphasized motivational effects due to shared responsibility for the collective output of a team, a phenomenon known by the rather unflattering term "social loafing".

Changing our focus from draught oxen to developers, let us now consider how all of this is related to software engineering. Naturally, the question of how the cost and time of software projects scale with the number of developers involved is of major interest in software project management and software economics.

In his 1975 book The Mythical Man-Month, Fred Brooks formulated his famous law of software project management, stating that "adding manpower to a late project makes it later". Just as for the Ringelmann effect, different causes for this have been discussed. First, software development naturally comes with inherently indivisible tasks which cannot easily be distributed among a larger number of developers. Illustrating this issue, Brooks stated that "nine women can't make a baby in one month". Secondly, larger team sizes give rise to an increasing need for coordination and communication that can limit the productivity of team members. And finally, for developers added to a team later, Brooks discussed the "ramp-up" time that is due to the integration and education of new team members. The result of these combined effects is that the individual productivities of developers in smaller teams cannot simply be multiplied to estimate the productivity of a larger team.

While the (empirical) software engineering community has been rather unanimous about the fact that the productivity of individual developers is likely to decrease as teams grow larger, a number of recent works in management science and data science (referenced and summarized in [1]) have questioned this finding in the context of Open Source communities. The argument is that larger team sizes increase the motivation of individuals in Open Source communities, giving rise to an "Aristotelian regime" where the whole team indeed produces more than expected from the sum of its parts. The striking consequence would be that Open Source projects are instances of economies of scale, where the effort of production (in terms of team members involved) scales sublinearly with the size of the project. In contrast, traditional software projects represent diseconomies of scale, i.e. the cost of production increases superlinearly as projects become larger and more complex.

The goal of our study [1] was to contribute to this discussion by means of a large-scale data-driven analysis. Specifically, we studied a set of 58 major Open Source projects on GitHub with a history of more than ten years and a total of more than half a million commits contributed by more than 30,000 developers. The question that we wanted to answer is simple: how does the productivity of software development teams scale with team size? In particular, we are interested in whether Open Source projects indeed represent exceptions from basic software engineering economics, as argued by recent works.

Regarding methodology, answering this question involves two important challenges:
  1. We need a quantitative measure to assess the productivity of a team
  2. We must be able to calculate the size of a development team at any given point in time
In our work, we addressed these challenges as follows. First, in line with a large body of earlier studies, and notwithstanding the fact that it necessarily gives rise to a rather limited notion of productivity, we use a proxy measure for productivity that is based on the amount of source code committed to the project's repository. There are different ways to define such a measure. While a number of previous studies have simply used the number of commits to proxy the amount of code produced, we find that the code contributions per commit are so broadly distributed that the commit count cannot simply be used as a measure of productivity; doing so would substantially bias our analysis. To avoid this problem, we use the Levenshtein distance between the code versions in consecutive commits, which quantifies the number of characters edited between consecutive versions of the source code.
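
For illustration, here is a minimal Levenshtein implementation; the actual study tooling computes this distance between consecutive versions of the touched source files, which this sketch does not attempt.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits (insertions, deletions,
    substitutions) turning a into b; classic O(len(a)*len(b)) DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```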

A second non-trivial challenge is to assess the size of a development team in Open Source communities. Most of the time there is no formal notion of a team, so who should be counted as a team member at a given point in time? Again, we address this problem by means of an extensive statistical analysis. Specifically, we analyze the inter-commit times of developers in the commit log. This allows us to define a reasonably sized time window based on the prediction of whether team members are likely to commit again after a given period of inactivity. This time window can then be used to estimate team sizes in a way that is substantiated by the temporal activity distribution in the data set (see [1] for details).
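
A minimal sketch of this estimation is shown below; the 90-day window is a placeholder, whereas the paper derives the window length from the inter-commit time statistics.

```python
from datetime import datetime, timedelta

def team_size(commit_log, at_time, window_days=90):
    """Developers considered active at `at_time`: all authors with at
    least one commit in the preceding window. `commit_log` is a list
    of (author, timestamp) pairs."""
    window_start = at_time - timedelta(days=window_days)
    return len({author for author, ts in commit_log
                if window_start <= ts <= at_time})

log = [("alice", datetime(2016, 1, 5)), ("bob", datetime(2016, 2, 20)),
       ("carol", datetime(2015, 6, 1))]
print(team_size(log, datetime(2016, 3, 1)))  # 2 (carol is inactive)
```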

We now have all that we need. To answer our question we only need to plot the average code contribution per team member (measured in terms of the Levenshtein distance) against the size of the development team (calculated as described above). If Ringelmann and Brooks are right, we expect a decreasing trend, indicating that developers in larger development teams tend to produce less. If, on the other hand, the studies highlighting synergetic effects in OSS communities are right, we expect an increasing trend, indicating that developers in larger development teams tend to produce more (because they are more motivated). The results across all 58 projects are shown in the following plot.


The clear decreasing trend that can be observed visually shows that Ringelmann and Brooks are seemingly right. We can further use a log-linear regression model to quantify the scaling factors and to assess the robustness of our result. This analysis confirms a strong negative relation, spanning several orders of magnitude in terms of code contributions. Notably, this negative relation holds both at the aggregate level and for each of the studied projects individually.
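
A bare-bones version of such a fit, assuming a simple least-squares regression on log-transformed data rather than the paper's exact model, could look as follows.

```python
import numpy as np

def ringelmann_exponent(team_sizes, mean_contributions):
    """Fit log(contribution) = alpha * log(team_size) + c; a negative
    alpha quantifies how per-developer output shrinks with team size."""
    alpha, _ = np.polyfit(np.log(team_sizes), np.log(mean_contributions), 1)
    return alpha

sizes = [1, 2, 4, 8, 16]
output = [1000, 820, 650, 540, 430]   # illustrative numbers only
print(ringelmann_exponent(sizes, output))  # roughly -0.3
```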

While our analysis quantitatively confirms the Ringelmann effect in all of the studied projects, we have not yet addressed why it holds. Unfortunately, it is non-trivial to quantify the potential motivational factors that have been discussed in social psychology. What we can do, however, is study potential effects due to increasing coordination efforts. For this, we again use the time-stamped commit log of projects to infer simple proxies for the coordination structures of a project. Specifically, we construct complex networks based on the co-editing of source code regions by multiple developers: whenever we detect that a developer A changed a line of source code that was previously edited by a developer B, we draw a link from A to B. The meaning of such a link is that we assume a potential need for developer A to coordinate his or her change with developer B.
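
A minimal construction of such a network might look as follows (using networkx; the edit-log format is invented, and the real analysis of course works on actual commit diffs).

```python
import networkx as nx

def co_editing_network(line_edits):
    """Build a directed co-editing network from a log of line edits.
    `line_edits` maps a (file, line) pair to the chronological list of
    authors who edited it; an edge A -> B means A changed a line last
    touched by B."""
    g = nx.DiGraph()
    for authors in line_edits.values():
        for prev, curr in zip(authors, authors[1:]):
            if prev != curr:
                g.add_edge(curr, prev)
    return g

edits = {("main.c", 42): ["bob", "alice"],      # alice edited bob's line
         ("main.c", 7):  ["alice", "carol", "alice"]}
g = co_editing_network(edits)
print(list(g.edges()))
# [('alice', 'bob'), ('carol', 'alice'), ('alice', 'carol')]
```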

The result of this procedure is a set of co-editing networks that can be constructed for different time ranges and projects. We can now study how these networks change as teams increase in size. What we find is that, in line with Brooks' argument on increasing coordination and communication effort, the number of links in the co-editing networks tends to grow super-linearly as teams grow larger. Consequently, the coordination overhead for each team member is likely to increase as the team grows, providing an explanation for the decreasing code production. Moreover, by fitting a model that estimates the speed at which co-editing networks grow in different projects, we find a statistically significant relation between the growth dynamics of co-editing links and the scaling factor for the decrease of productivity. This finding indicates that the management of a project and the resulting coordination structures can significantly influence the productivity of team members, reinforcing or mitigating the Ringelmann effect as the team grows.

So, do developers really become more productive as teams grow larger? Does the whole team really produce more than the sum of its team members? Or do we find evidence for Brooks' law and the Ringelmann effect? Based on our large-scale data analysis of more than 580,000 commits by more than 30,000 developers in 58 Open Source Software projects, we can safely conclude that there is a strong Ringelmann effect in all of the studied Open Source projects. As expected based on basic software engineering wisdom, our findings show that developers in larger teams indeed tend to produce less code than developers in smaller teams. Our analysis of time-evolving co-editing networks constructed from the commit log history further suggests that the increasing coordination overhead imposed by larger teams is a factor that drives the decrease of developer productivity as teams grow larger.

In summary, Open Source projects seem to be no magical exceptions from the basic principles of collaborative software engineering. Our study demonstrates how data science and network analysis techniques can provide actionable insights into software engineering processes and project management. It further shows how the application of computational techniques to large amounts of publicly available data on social organizations allows us to study hypotheses relevant to social psychology. As such, it highlights interesting relations between empirical software engineering and computational social science, which hold great potential for future work.

[1] Ingo Scholtes, Pavlin Mavrodiev, Frank Schweitzer. From Aristotle to Ringelmann: a large-scale analysis of productivity and coordination in Open Source Software projects. Empirical Software Engineering, 21(2):642-683, April 2016. Available online.

Sunday, August 14, 2016

Release management in Open Source projects

By: Martin Michlmayr (@MartinMichlmayr)
Associate editor: Stefano Zacchiroli (@zacchiro)

Open source software is widely used today. While there is not a single development method for open source, many successful open source projects are based on widely distributed development models with many independent contributors working together. Traditionally, distributed software development has often been seen as inefficient due to the high level of communication and coordination required during the software development process. Open source has clearly shown that successful software can be developed in a distributed manner.

The open source community has over time introduced many collaboration systems, such as version control systems and mailing lists, and processes that foster this collaborative development style and improve coordination. In addition to implementing efficient collaboration systems and processes, it has been argued that open source development works because it aims to reduce the level of coordination needed. This is because development is done in parallel streams by independent contributors who work on self-selected tasks. Contributors can work independently and coordination is only required to integrate their work with others.

The literature has paid relatively little attention to release management in open source projects. Release management, which involves the planning and coordination of software releases and the overall management of releases throughout the life cycle, can be studied from many different angles. I investigated release management as part of my PhD from the point of view of coordination theory. If open source works so well because of various mechanisms that reduce the level of coordination required, what implications does this have for release management, a time in the development process when everyone needs to come together to align their work?

Complexity of releases

As it turns out, my study on quality problems highlighted that release management can be a very problematic part of producing open source software. Several projects described constant delays with their releases, leading to software which is out-of-date or has other problems. Fundamentally, release management relies on trust. Contributors have to trust release managers and the deadlines they impose, otherwise the deadlines are simply ignored. This leads to a self-fulfilling prophecy: because developers don't believe a release will occur on time, they continue to make changes and add new code, leading to delays with the release. It's very hard to break such a vicious circle.

It's important to consider why creating alignment in open source projects can be a challenge. I identified three factors that made coordination difficult:
  1. Complexity: alignment is harder to achieve in large projects and many successful open source projects have hundreds of developers.
  2. Decentralization: many open source projects are distributed, which can create communication challenges.
  3. Voluntary nature: it's important to emphasize that this does not mean that contributors are unpaid. While some open source contributors are unpaid, increasingly open source development is done by developers employed or sponsored by corporations. The Linux kernel is a good example with developers from hundreds of companies, such as Intel and Red Hat. What I mean by voluntary nature is that the project itself has no control over the contributors. The companies (in the case of paid developers) define what work gets done and unpaid contributors generally "scratch their own itch". What this means is that it's difficult for a release manager or a project leader to tell everyone to align their work at the same time.

Time-based releases

While my research has shown problems with release management in many projects, it has also identified a novel approach to release management employed by an increasing number of projects. Instead of making a release when a set of new features and functionality is ready, which has historically been the dominant approach, releases are made based on time. Time-based releases are like a release train: there is a clear timetable by which developers can orient themselves to plan their work. If they want to make the next train (release), they know when they have to be ready.

Time-based releases work particularly well if the releases are predictable (for example, every X months) and frequent. If a release is predictable, developers can plan accordingly. If a release is fairly frequent, it doesn't matter if you miss a release because you can simply target the next one. You can think of a train station with a train that is leaving. Should you run? If you know the next train will leave soon and you can trust that the next train will leave on time, there is no need to hurry — you can avoid a dangerous sprint across the platform. Similarly, if the next release is near and predictable, you don't need to disrupt the project by making last minute changes.

Additionally, frequent releases give developers practice in making releases. If a project releases only every few years, it's very hard to make the release process an integral part of the development cycle. When releases are done on a regular basis (say every three or six months), the release process can work like a machine — it becomes part of the routine.

Time-based releases are a good mechanism for release management in large open source projects. Speaking in terms of coordination theory, time-based releases decrease the level of coordination required because the predictable timetable allows developers to plan for themselves. The timetable is an important coordination mechanism in its own right.

Looking at various open source projects, time-based release management is implemented in different ways. For example, GNOME and Ubuntu follow a relatively strict frequency of six months. This is frequent, predictable and easy to understand. The Linux kernel employs a two-week "merge window" in which new features are accepted. This is followed by a number of release candidates with bug fixes (but no new features) until the software is ready for release, typically after 7 or 8 release candidates. Debian follows a model where the freeze date is announced in advance; the time between freeze and release depends on how fast defects get fixed. Debian's freeze dates are more than two years apart, which in my opinion leads to challenges because the release process is not performed often enough to become a routine.
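
To illustrate how a fixed timetable lets every contributor compute the schedule for themselves, here is a small sketch; the cadence and dates are examples, and release days are assumed to fall on or before the 28th to sidestep month-length corner cases.

```python
from datetime import date

def next_release(first_release: date, today: date, cadence_months: int = 6):
    """Next release date on a strict time-based cadence, e.g. a
    six-month cycle like GNOME's or Ubuntu's. Assumes day <= 28."""
    elapsed = (today.year - first_release.year) * 12 \
              + (today.month - first_release.month)
    if today.day < first_release.day:
        elapsed -= 1
    k = elapsed // cadence_months + 1          # next cycle after today
    total = (first_release.month - 1) + k * cadence_months
    return date(first_release.year + total // 12, total % 12 + 1,
                first_release.day)

print(next_release(date(2014, 4, 17), date(2016, 8, 14)))  # 2016-10-17
```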

Recent changes

There have been many recent changes in the industry and development community that have influenced release management. One change relates to the release frequency, which has been going up in several projects (such as U-Boot, which moved from a release every three months to one every two months in early 2016). This may stem from the need to serve updates to users more frequently because the technology is changing so rapidly. It could also be because the release train is working so well and the cost of doing releases has gone down. Frequent releases raise a number of questions, though. For example, do users prefer small, incremental updates over larger updates? Furthermore, how can you support old releases, or are users expected to upgrade to the latest version immediately (a model employed by various app stores for mobile devices)?

In addition to more frequent releases, I also observe that some projects have stopped making releases altogether. In a world of Continuous Integration (CI) and Continuous Deployment (CD), does it make sense to offer the latest changes to users directly, given that you can ensure every change has been sufficiently tested? Is there still value in performing separate releases when you have CI/CD?

I believe more research is needed to understand release management in our rapidly changing world, but one thing is clear to me: studying how contributors with different interests come together to produce and release software is fascinating!


Sunday, August 7, 2016

Architecture-Based Self-Protecting Software Systems

By: Eric Yuan and Sam Malek, Software Engineering and Analysis Lab, University of California, Irvine

Associate Editor: Mehdi Mirakhorli (@MehdiMirakhorli)

Security remains one of the principal concerns for modern software systems. In spite of the significant progress over the past few decades, the challenges posed by security are more prevalent than ever before. As awareness grows of the limitations of traditional, static security models, current research is shifting to dynamic and adaptive approaches, where security threats are detected and mitigated at runtime, namely self-protection. Self-protection has been identified by Kephart and Chess [1] as one of the essential traits of self-management for autonomic computing systems. From a "reactive" perspective, the system automatically defends against malicious attacks or cascading failures, while from a "proactive" perspective, the system anticipates security problems in the future and takes steps to mitigate them. Our systematic survey of this research area [2] shows that although existing research has made significant progress towards autonomic and adaptive security, gaps and challenges remain. Most prominently, self-protection research to date has primarily focused on specific lines of defense (e.g., network, host, or middleware) within a software system. Such approaches tend to focus on a specific type or category of threats, implement a single strategy or technique, and/or protect a particular component or layer of the system. In contrast, little research has provided a holistic understanding of overall security posture and concerted defense strategies and tactics.

In this research project, we are making a case for an architecture-based self-protection (ABSP) approach to address the aforementioned challenges. In ABSP, detection and mitigation of security threats are informed by an architectural representation of the software that is kept in sync with the running system. An architectural focus enables the approach to assess the overall security posture of the system and to achieve defense in depth, as opposed to point solutions that operate at the perimeters. By representing the internal dependencies among the system's constituents, ABSP provides a better platform to address challenging threats such as insider attacks. The architectural representation also allows the system to reason about the impact of a security breach on the system, which would inform the recovery process. 

To prove the feasibility of the ABSP approach, we have designed and implemented an architecture-based, use case-driven framework, dubbed ARchitectural-level Mining Of Undesired behavioR (ARMOUR), that involves mining software component interactions from system execution history and applying the mined architecture model to autonomously identify and mitigate potential malicious behavior.  

The first step towards ABSP is the timely and accurate detection of security compromises and software vulnerabilities at runtime, which is a daunting task in its own right. To that end, the ARMOUR framework starts by monitoring component-based interaction events at runtime and using machine learning methods to capture a set of probabilistic association rules or patterns that serve as a model of normal system behavior. The framework then applies the model with an adaptive detection algorithm to efficiently identify potential malicious events. From the machine learning perspective, we identified and tailored two closely related algorithms, Association Rule Mining and Generalized Sequential Pattern Mining, as the core data mining methods for the ARMOUR framework. Our evaluation of both techniques against a real Emergency Deployment System (EDS) has demonstrated very promising results [3,4,5]. In addition to threat detection, the ARMOUR framework also calls for the autonomic assessment of the impact of potential threats on the target system and the mitigation of such threats at runtime. In earlier work [6], we have shown how this can be achieved through (a) modeling the system using machine-understandable representations, (b) incorporating security objectives as part of the system's architectural properties that can be monitored and reasoned with, and (c) making use of autonomic computing principles and techniques to dynamically adapt the system at runtime in response to security threats, without necessarily modifying any of the individual components. Specifically, we illustrated several architecture-level self-protection patterns that provide reusable detection and mitigation strategies against well-known web application security threats.
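
To give a flavour of the first step, the toy sketch below mines pairwise association rules from normal component-interaction sessions and flags sessions that violate them; it is far simpler than ARMOUR's actual algorithms, and the event names are invented.

```python
from collections import Counter
from itertools import combinations

def mine_rules(sessions, min_support=0.5, min_confidence=0.9):
    """Mine simple pairwise association rules x -> y over the
    component-interaction events seen in normal execution sessions."""
    n = len(sessions)
    pair_counts, item_counts = Counter(), Counter()
    for events in sessions:
        unique = set(events)
        item_counts.update(unique)
        pair_counts.update(combinations(sorted(unique), 2))
    rules = {}
    for (x, y), c in pair_counts.items():
        if c / n < min_support:
            continue
        if c / item_counts[x] >= min_confidence:
            rules.setdefault(x, set()).add(y)   # x implies y
        if c / item_counts[y] >= min_confidence:
            rules.setdefault(y, set()).add(x)   # y implies x
    return rules

def flag_anomalies(session, rules):
    """Rule violations (antecedent present, expected consequent
    missing) that mark a session as potentially malicious."""
    present = set(session)
    return {(x, y) for x in present.intersection(rules)
            for y in rules[x] - present}

normal = [["login", "auth", "db"], ["login", "auth", "db"],
          ["login", "auth", "db"], ["browse", "db"]]
rules = mine_rules(normal)
print(flag_anomalies(["login", "db"], rules))  # {('login', 'auth')}
```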

The high-level architecture of the framework is depicted in the diagram below:




Our work in this project makes a convincing case for the hitherto overlooked role of software architecture in software security, especially software self-protection. The ABSP approach complements existing security mechanisms and provides additional defense-in-depth for software systems against ever-increasing security threats. By implementing self-protection as an orthogonal architectural concern, separate from the application logic (as shown in the diagram), this approach also allows self-protection mechanisms to evolve independently and to quickly adapt to emerging threats.

References:
  1. Kephart, J., and Chess, D. The vision of autonomic computing. Computer 36, 1 (Jan. 2003), 41–50.
  2. Yuan, E., Esfahani, N., and Malek, S. A Systematic Survey of Self-Protecting Software Systems. ACM Trans. Auton. Adapt. Syst. (TAAS) 8, 4 (Jan. 2014), 17:1–17:41.
  3. Esfahani, N., Yuan, E., Canavera, K. R., and Malek, S. Inferring software component interaction dependencies for adaptation support. ACM Trans. Auton. Adapt. Syst. (TAAS) 10, 4 (2016), 26.
  4. Yuan, E., Esfahani, N., and Malek, S. Automated Mining of Software Component Interactions for Self-adaptation. In Proceedings of the 9th International Symposium on Software Engineering for Adaptive and Self-Managing Systems (New York, NY, USA, 2014), SEAMS 2014, ACM, pp. 27–36.
  5. Yuan, E., and Malek, S. Mining software component interactions to detect security threats at the architectural level. In Proceedings of the 13th Working IEEE/IFIP Conference on Software Architecture (Venice, Italy, Apr. 2016), WICSA 2016.
  6. Yuan, E., Malek, S., Schmerl, B., Garlan, D., and Gennari, J. Architecture-based Self-protecting Software Systems. In Proceedings of the 9th International ACM Sigsoft Conference on Quality of Software Architectures (New York, NY, USA, 2013), QoSA '13, ACM, pp. 33–42.