IEEE Software Blog: April 2016

Monday, April 25, 2016

Common Architecture Weakness Enumeration (CAWE)

By Mehdi Mirakhorli (@MehdiMirakhorli), Associate Editor.

Software architecture design is the first and most fundamental step in addressing quality goals surrounding attributes such as security, privacy, safety, reliability, dependability, and performance. Design flaws in the architecture of a software system mean that successful attacks could have enormous consequences. To satisfy a security concern, an architect must consider alternative design solutions, evaluate their trade-offs, identify the risks, and select the best solution. Such design decisions are often based on well-known architectural patterns, defined as reusable techniques for achieving specific quality concerns.

Security patterns come in many different shapes and sizes and provide solutions for enforcing the data integrity, privacy, accountability, availability, safety and non-repudiation requirements, even when the system is under attack.

Previous estimates indicate that roughly 50% of security problems are the result of software design flaws such as misunderstanding architecturally significant requirements, poor architectural implementation, violation of design principles in the source code, and degradation of the security architecture. Flaws in the architecture of a software system can have a greater impact on various security concerns in the system and, as a result, give malicious users more space and flexibility.

Fundamentally, design flaws (or simply "flaws") are different from bugs: the latter are code-level, while the former sit at a deeper level and are much more subtle than bugs such as buffer overflows. Although a software system will always have bugs, recent studies show that the security of many software applications is breached due to flaws in the architecture.

Architectural flaws are the result of inappropriate design choices in the early stages of software development, incorrect implementation of security patterns, or degradation of the security architecture over time.

An example of such an architectural flaw is the Use of Client-Side Authentication, in which a client/server product performs authentication within client code, but not in server code, allowing server-side authentication to be bypassed via a modified client that omits the authentication check.

This design decision to implement authentication creates a flaw in the security architecture. It can be successfully exploited by an intruder with reverse-engineering skills.
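The difference between the flawed and the sound design can be sketched in a few lines of Python. Everything here is hypothetical (the USERS table, the request dictionaries, and both server functions); the sketch only illustrates that a check living in client code can be skipped by a modified client, while a check performed by the server cannot.

```python
import hmac

# Hypothetical credential table, for illustration only. A real system
# would store salted password hashes rather than plaintext.
USERS = {"alice": "correct horse"}


def flawed_server(request):
    """Assumes the client already authenticated the user; checks nothing."""
    return f"{request['action']} performed for {request['user']}"


def sound_server(request):
    """Verifies the credentials itself on every request."""
    expected = USERS.get(request.get("user"))
    supplied = request.get("password", "")
    # compare_digest avoids leaking information through timing differences.
    if expected is None or not hmac.compare_digest(
        expected.encode(), supplied.encode()
    ):
        raise PermissionError("authentication failed")
    return f"{request['action']} performed for {request['user']}"


# A modified client leaves out its local login step and sends no password.
forged_request = {"user": "alice", "action": "delete account"}
```

With these definitions, flawed_server(forged_request) carries out the action, whereas sound_server(forged_request) raises PermissionError because the server never sees a valid password.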

Many techniques and practices, such as threat modeling, static and dynamic code analysis, and penetration testing, help in developing a secure software system, yet few research papers in the literature approach security from the architecture perspective. A recent effort is the IEEE Center for Secure Design, launched by the IEEE Computer Society. However, as of today, few examples or catalogs of design flaws have been published that can help architects and developers learn about and avoid such flaws.

Therefore, in our research team, we are working on establishing a catalog of Common Architecture Weakness Enumeration, containing architectural weaknesses that may create security breaches within the software.

This catalog is built on top of the existing library of the Common Weakness Enumeration (CWE), which documents about 1000 software weaknesses. These weaknesses, however, are not categorized based on their architectural impacts and do not clearly distinguish between architectural weaknesses (security issues rooted in software architecture) and programming issues. We categorize these weaknesses into architectural and non-architectural and release the resulting catalog to the public. In addition, in a series of real case studies, we demonstrate instances of architectural weaknesses in four systems. These case studies indicate that the catalog of architectural weaknesses will help architects and designers adopt a proactive approach to architecture-based security.

Designing for Security

To ensure an application is secure, security principles need to be implemented from the ground up. During requirements analysis, malicious practices are assumed as a given, and requirements engineers identify all the use cases that are of interest to an attacker. During architecture design, architects carefully analyze these requirements and adopt appropriate security patterns to resist, detect, and recover from attacks.

Weaknesses in a Security Architecture

A software architecture can be flawed for many reasons, resulting in fundamental breaches in the system. Such flaws occur because of bad design decisions (flaws of commission), lack of design decisions (flaws of omission), or incorrect implementation of architectural patterns used to make the system secure (flaws of realization). These types of flaws are discussed below:

Flaws of Omission. Such design flaws result from ignoring a security requirement or a potential threat; they identify decisions that were never made. A common design flaw is to store a password in a file without encryption. Here the architect assumes that attackers would never have access to the file, and therefore that storing the password in plaintext would not lead to a compromise of the system. However, such a design decision can open the system to attacks, because anyone who has been granted read access to the file will be able to read all the stored passwords.
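One way to make the missing decision explicit is sketched below. The text above speaks of encryption; this sketch swaps in salted one-way hashing (PBKDF2 from Python's standard library), a common alternative for stored passwords. The function names and the iteration count are assumptions chosen for illustration, not a recommendation taken from the catalog.

```python
import hashlib
import hmac
import os

# Assumed work factor for this sketch; tune it for the target deployment.
ITERATIONS = 200_000


def hash_password(password, salt=None):
    """Return (salt, digest); only these values would be written to the file."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, digest):
    """Recompute the hash with the stored salt and compare in constant time."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, digest)
```

With this design, a reader of the password file obtains salts and digests rather than the passwords themselves.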

Flaws of Commission. Such design flaws refer to design decisions that were made and could lead to undesirable consequences. Examples of such flaws are “client-side authentication” or “using a weak encryption algorithm” to achieve better performance while maintaining data confidentiality.

Flaws of Realization. The design decision is correct, but its implementation suffers from a coding mistake. For instance, suppose the system was designed to use the Chroot Jail pattern. In this pattern, a controlled environment (“jail”) is created to limit access to system files, so that attackers are prevented from exploiting files and directories outside a specific directory. A common way to implement this pattern in Unix environments is to invoke the chroot() system function, which creates the jail but does not change the current working directory. Consequently, a developer may implement the pattern incorrectly by creating the chroot jail without changing the working directory, which allows relative paths to still point to files outside the jail. Thus, attackers would still be able to access files and directories outside the jail even after chroot() has been invoked.
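The realization flaw described above comes down to one missing call. A minimal sketch in Python, whose os module wraps the same Unix calls, might look as follows; enter_jail is a hypothetical helper name, and the call needs root privileges on a Unix system to actually run.

```python
import os


def enter_jail(jail_dir):
    """Confine the current process to jail_dir (Unix only, requires root)."""
    os.chroot(jail_dir)
    # chroot() leaves the working directory untouched. Without this chdir,
    # the process could still sit outside the jail and resolve relative
    # paths against directories the jail was meant to hide.
    os.chdir("/")
```

Omitting the final os.chdir("/") reproduces the flaw: the jail is created, but the process never steps inside it.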

Examples of Weaknesses in Security Architecture

The Secure Session Management pattern is concerned with keeping track of sessions, each of which is a set of activities performed over a limited time period by a certain user. The main goal of this pattern is to keep track of who is using the system at a given time by managing a session object that contains all relevant data associated with the user and the session. In this pattern, every user is assigned an exclusive identifier (session ID), which is used both for identifying users and for retrieving the user-related data. Since session IDs are sensitive information, this pattern may be affected by two main types of attack: session hijacking (an attacker impersonates a legitimate user by stealing or predicting a valid session ID) and session fixation (an attacker has a valid session ID and forces the victim to use it).

Session hijacking can be facilitated by the architectural flaw of not securing the storage of session identifiers. Such a flaw can be observed in the “session” module of the PHP language:

Per this description we note that PHP was designed to store each session's data in plain text files in a temporary directory, without using a security mechanism (such as encryption) for storing these session files. When closely inspecting the source code of PHP version 4.0, we observe that mod_files.c names every session file “sess_xyz” (where “xyz” is the session ID), as shown in the code snippet presented above (where buf is a variable later used when creating the session files).

Figure 1(a) shows a scenario in which the flaw could be exploited. First, a legitimate user successfully identifies him/herself to the application. This causes the Web application written in PHP to start a session for the user by invoking session_start() from PHP’s session module. Then, the session module assigns a session ID to the user and creates a new file named “sess_qEr1bqv1q4V2FGX9C7mvb0” to store the data about the user’s session. At this point, the security of the application is compromised when an attacker observes the session file name and realizes that the user’s session ID is “qEr1bqv1q4V2FGX9C7mvb0”. Subsequently, the attacker is able to impersonate the user by sending a cookie (PHPSESSID) in an HTTP request with this stolen session ID. The Web application, after calling functions from PHP’s session module, verifies that the session ID provided matches the user’s data, so the application considers that the requests are being made by a legitimate user.
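The leak in this scenario can be reproduced in a few lines. The sketch below is not PHP's code: exposed_file_name mirrors the naming scheme described above, while opaque_file_name is a hypothetical alternative (naming the file after a SHA-256 hash of the ID) shown only to contrast the two designs.

```python
import hashlib

PREFIX = "sess_"


def exposed_file_name(session_id):
    """Naming scheme described above: the ID is readable from the name."""
    return PREFIX + session_id


def opaque_file_name(session_id):
    """Hypothetical alternative: name the file after a hash of the ID."""
    return PREFIX + hashlib.sha256(session_id.encode()).hexdigest()


def id_from_listing(file_name):
    """What someone who can list the session directory learns from a name."""
    return file_name[len(PREFIX):]


session_id = "qEr1bqv1q4V2FGX9C7mvb0"
leaked = id_from_listing(exposed_file_name(session_id))  # the session ID itself
hidden = id_from_listing(opaque_file_name(session_id))   # only a digest
```

Hashing the name hides the ID from a directory listing but does nothing for the file's contents, which would still need their own protection.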

From this scenario we can observe that such an architectural weakness can have many consequences. First, if the user has an administrative role in the application, the attacker will be able to perform all the administrative tasks. Second, the attacker may be able to read the contents of the session file, thereby accessing possibly sensitive data about the user that the attacker is not supposed to see. It is important to highlight that such a flaw affects not only Secure Session Management but also other security patterns (e.g., Authentication and Authorization) that use Secure Session Management to perform authentication and access control of users.

An example from PHP of an architectural weakness that facilitates session fixation is shown below:

When inspecting the session implementation in the source code of PHP version 5, we note an incorrect implementation (i.e., a realization flaw) in PHP’s session module, which accepts uninitialized session IDs before using them for authentication/authorization purposes. In fact, in line 158 shown above, the function ps_files_valid_key() does not properly validate the session ID: it only checks whether the ID contains a valid charset and has a valid length, but does not verify whether the ID actually exists and is associated with the client performing the HTTP request.

Figure 1(a) shows how this architectural vulnerability is exploited. The attack starts with the attacker establishing a valid session ID (steps 1 to 4). Next, the attacker induces the user to authenticate him/herself in the system using the attacker’s session ID (steps 5 and 6).
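The gap between checking an ID's format and checking that the ID was ever issued can be sketched as follows. This is an illustrative Python model, not PHP's implementation: the SessionStore class, its method names, and the exact character set and length bounds are all assumptions.

```python
import re
import secrets

# Assumed format rule: allowed characters and length only.
WELL_FORMED = re.compile(r"[A-Za-z0-9_-]{16,128}")


def is_well_formed(session_id):
    """Format check alone, comparable to validating charset and length."""
    return WELL_FORMED.fullmatch(session_id) is not None


class SessionStore:
    """Hypothetical in-memory session store."""

    def __init__(self):
        self._sessions = {}

    def create(self):
        """Issue a fresh, unpredictable session ID."""
        session_id = secrets.token_urlsafe(24)
        self._sessions[session_id] = {}
        return session_id

    def resolve_permissive(self, session_id):
        """Flawed: adopts any well-formed ID that the client sends."""
        if is_well_formed(session_id):
            self._sessions.setdefault(session_id, {})
            return session_id
        return self.create()

    def resolve_strict(self, session_id):
        """Accepts an ID only if this server issued it earlier."""
        if is_well_formed(session_id) and session_id in self._sessions:
            return session_id
        return self.create()
```

With resolve_permissive, an ID invented by an attacker becomes a live session as soon as a victim presents it; resolve_strict discards it and issues a new one. The strict variant covers only the uninitialized-ID case discussed here, not an attacker who first obtains a server-issued ID.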

Application of CAWE Catalog

Given that the CAWE catalog provides detailed information about architectural weaknesses, it can be used to guide architects and developers in making appropriate design and implementation decisions that preserve security concerns at the architectural level throughout the software development lifecycle.

For example, code reviews usually focus on finding bugs through technical discussions and analysis of the source code and other related artifacts (such as a portion of the requirements document, the architecture, etc.). Using the CAWE catalog, however, the reviewers responsible for inspecting the code can also check for common security issues in their software. Past experience in industry led to the creation of security-driven software development processes that emphasize security concerns early in the software development lifecycle, such as CLASP (Comprehensive, Lightweight Application Security Process) and Microsoft’s SDL (Security Development Lifecycle). A common aspect of these processes and practices is the recommendation to train employees properly in order to promote a common background in software security. In this respect, our catalog could be used to aid such training and promote awareness of the potential architectural issues that a system may be exposed to. Moreover, those security-driven processes include two activities for modeling potential threats in the software: threat modeling and design of misuse cases. These two activities are usually done through brainstorming sessions, and such brainstorming could draw on the CAWE catalog for insights. In fact, practitioners in the security domain support the use of threat libraries, built from MITRE’s catalog, to aid this threat modeling process. Architectural risk analysis, which is a systematic approach to evaluating design decisions against quality requirements, could also benefit from our catalog as guidance for the evaluation.

You may also like:

Iván Arce, Kathleen Clark-Fisher, Neil Daswani, Jim DelGrosso, Danny Dhillon, Christoph Kern, Tadayoshi Kohno, Carl Landwehr, Gary McGraw, Brook Schoenfield, Margo Seltzer, Diomidis Spinellis, Izar Tarandach, and Jacob West, Avoiding the Top 10 Software Security Design Flaws, IEEE Cybersecurity, 2015.

Hall, Anthony & Chapman, Roderick. “Correctness by Construction: Developing a Commercial Secure System.” IEEE Software 19, 1 (Jan./Feb. 2002): 18-25.

Linger, R. C. “Cleanroom Process Model.” IEEE Software 11, 2 (March 1994): 50-58.

Acknowledgement:
This post includes joint work with Joanna Santos and Jairo Pavel Veloz Vidal, graduate students at RIT.

Sunday, April 10, 2016

Dissecting The Myth That Open Source Software Is Not Commercial

By: Karl Fogel (@kfogel)
Associate editor: Stefano Zacchiroli (@zacchiro)

Writing a myth-debunking piece for such an informed audience poses a certain risk. The readers of the IEEE Software Blog already know what open source software is, and many have probably written some. How can I be sure that anyone reading this even holds the belief about to be debunked?

Well, I can't be completely sure, but can at least say that this myth is one I still encounter frequently among software specialists, including people who themselves use free software on a daily basis. (By the way, I will use the terms "free" — as in "free software" — and "open source" interchangeably here, because they are synonyms in the sense that they refer to the same set of software and the same set of pro-sharing software licenses.) The continued prevalence of this myth in many organizations is an obstacle to the adoption and production of open source software.
First, to state it clearly:

Myth: Open source is not commercial, or is even anti-commercial, and is driven mostly by volunteerism.

That's really two myths, but they're closely related and it's best to address them together.

In mainstream journalism, open source is still almost always portrayed as a spare-time activity pursued by irrepressible programmers who band together for the love of coding and for the satisfaction they get from releasing free tools to the world. (See the second letter here for one example, but there are many, many more examples like that.) Surprisingly, this portrayal is widespread within the software industry too, and in tech journalism. There is, to be fair, a grain of truth to the legend of the volunteers: especially in the early days of open source — from the mid 1980s until the late 1990s (a period when it wasn't even called "open source" yet, just "free software") — a higher proportion of open source development could legitimately have been called volunteer than is the case today.

But still not as high a proportion as one might think. Much free software activity was institutionally funded even then, although the institutions in question weren't always aware of it. Programmers and sysadmins frequently launched shared collaborative projects simply to make their day jobs easier. Why should each person have to write the same network log analyzer by themselves, when a few people could just write it once, together, and then maintain it as a common codebase? That's cheaper for everyone, and a lot more enjoyable.

In any case, intentional investment in open source by for-profit outfits started quite early on, and such investment has only been growing since (indeed, to the point now where meta-investment is happening: for example, my company, Open Tech Strategies, flourishes commercially by doing exclusively open source development and by advising other organizations on how to run open source projects). For a long time now, a great deal of widely-used open source software has been written by salaried developers who are paid specifically for their work on that software, and usually paid by for-profit companies. There is not space here to discuss all their business models in depth, nor how erstwhile competitors manage to collaborate successfully on matters of shared concern (though note that no one ever seems to wonder how they manage this when it comes to political lobbying). Suffice it to say that there are many commercial organizations in whose interests it is to have this growing body of code be actively maintained, and who have no need to "own" or otherwise exercise monopolistic control over the results.

A key ingredient in this growth has been the fact that all open source licenses are commercial licenses. That is, they allow anyone to use the software for any purpose, including commercial purposes. This has always been part of the very definition of a "free software" or "open source" license, and that's why there is no such thing as software that is "open source for non-commercial use only", or "open source for academic use only", etc.

An important corollary of this is that open source software automatically meets the standard government and industry definition of "Commercial Off-The-Shelf" (COTS) software: software that is commercially available to the general public. COTS doesn't mean you must pay money — though you might choose to purchase a support contract, which is a fee for service and is very different from a licensing fee. COTS essentially just means something that is equally available to all in the marketplace, and open source software certainly fits that bill.

So: open source is inherently commercial, and the people who write it are often paid for their work via normal market dynamics.

Why, then, is there a persistent belief that open source is somehow non-commercial or anti-commercial, and that it's developed mainly by volunteers?

I think this myth is maintained by several mutually reinforcing factors:

Open source's roots are as an ideologically-driven movement (under the name "free software"), opposed to monopoly control over the distribution and modification of code. Although that movement has turned out to be successful in technical and business terms as well, it has not shed its philosophical roots. Indeed, I would argue, though will not here due to space limitations, that its philosophical basis is an inextricable part of its technical and business success. (It is worth considering deeply the fact that merely being anti-monopoly is enough to get a movement a reputation for being anti-commercial; perhaps it is the roots of modern capitalism as actually practiced that need closer examination, not the roots of open source.)
For a time, various large tech companies whose revenues depend mainly on selling proprietary software on a fee-for-copying basis made it a standard part of their marketing rhetoric to portray open source as being either anti-commercial or at any rate unconcerned with commercial viability. In other words: don't trust this stuff, because there's no one whose earnings depend on making sure your deployment is successful. This tactic has become less popular in recent years, as many of those companies start to have open-source offerings themselves. I hope to see it gradually fade away entirely, but its legacy lives on in the many corporate and government procurement managers who were led to believe that open source is the opposite of commercial.
Many companies now offer software-as-a-service based on open source packages with certain proprietary additions — those additions being their "value-add" (or, less formally, their "secret sauce"), the thing that distinguishes their SaaS offering from you just deploying the open source package on which it is based, and the thing that not coincidentally has the potential to lock you in to that provider.
Unfortunately, companies with such offerings almost always refer to the open source base package as the "community edition", and their proprietized version as the "commercial edition" or sometimes "enterprise edition". A more accurate way to label the two editions would be "open source" and "proprietary", of course. But, from a marketing point of view, that has the disadvantage of making it plain what is going on.
Software developers have multiple motivations, and it's true that in open source, some of their motivation is intrinsic and not just driven by salary. It's actually quite common for open source developers to move from company to company, being paid to work on the same project the whole time; their résumé and work product are fully portable, and they take advantage of that. Open source means that one cannot be alienated from the fruits of one's labor, even when one changes employers. There is nothing anti-commercial about this — indeed, it could be viewed as the un-distortion of a market — but one can certainly see how observers with particular concerns about the mobility of labor might be inclined to fudge that distinction.

Finally, I think people also want to believe in a semi-secret worldwide army of happy paladins acting for the good of humanity. It would be so comforting to know they're out there. But what's actually going on with open source is much more complex and more interesting, and is firmly connected to commerce.

Sunday, April 3, 2016

The Descartes Modeling Language for Self-Aware Performance and Resource Management

Samuel Kounev, University of Würzburg, Würzburg, Germany
Associate Editor: Zhen Ming (Jack) Jiang, York University, Toronto, Canada

Modern software systems have increasingly distributed architectures composed of loosely-coupled services that are typically deployed on virtualized infrastructures. Such system architectures provide increased flexibility by abstracting from the physical infrastructure, which can be leveraged to improve system efficiency. However, these benefits come at the cost of higher system complexity and dynamics. The inherent semantic gap between application-level metrics, on the one hand, and resource allocations at the physical and virtual layers, on the other hand, significantly increases the complexity of managing end-to-end application performance.

To address this challenge, techniques for online performance prediction are needed. Such techniques should make it possible to continuously predict at runtime: a) changes in the application workloads [3], b) the effect of such changes on the system performance, and c) the expected impact of system adaptation actions [1]. Online performance prediction can be leveraged to design systems that proactively adapt to changing operating conditions, thus enabling what we refer to as self-aware performance and resource management [4, 7]. Existing approaches to performance and resource management in the research community are mostly based on coarse-grained performance models that typically abstract systems and applications at a high level (e.g., [2, 5, 8]). Such models do not explicitly model the software architecture and execution environment, distinguishing performance-relevant behavior at the virtualization level vs. at the level of applications hosted inside the running VMs. Thus, their online prediction capabilities are limited and do not support complex scenarios such as predicting how changes in application workloads propagate through the layers and tiers of the system architecture down to the physical resource layer, or predicting the effect on the response times of different services if a VM in a given application tier is to be replicated or migrated to another host, possibly of a different type.

To enable online performance prediction in scenarios such as the above, architecture-level modeling techniques are needed, specifically designed for use in online settings. The Descartes Modeling Language (DML) provides such a language for performance and resource management of modern dynamic IT systems and infrastructures. DML is designed to serve as a basis for self-aware systems management during operation, ensuring that system performance requirements are continuously satisfied while infrastructure resources are utilized as efficiently as possible. DML provides appropriate modeling abstractions to describe the resource landscape, the application architecture, the adaptation space, and the adaptation processes of a software system and its IT infrastructure [1, 4, 6]. An overview of the different constituent parts of DML and how they can be leveraged to enable online performance prediction and proactive model-based system adaptation can be found in [6]. A set of related tools and libraries is available from the DML website at: http://descartes.tools/dml.

References

[1] F. Brosig, N. Huber, and S. Kounev. Architecture-Level Software Performance Abstractions for Online Performance Prediction. Elsevier Science of Computer Programming Journal (SciCo), Vol. 90, Part B:71–92, 2014.

[2] I. Cunha, J. Almeida, V. Almeida, and M. Santos. Self-Adaptive Capacity Management for Multi-Tier Virtualized Environments. In IFIP/IEEE Int. Symposium on Integrated Network Management, pages 129–138, 2007.

[3] N. Herbst, N. Huber, S. Kounev, and E. Amrehn. Self-Adaptive Workload Classification and Forecasting for Proactive Resource Provisioning. Concurrency and Computation - Practice and Experience, John Wiley and Sons, 26(12):2053–2078, 2014.

[4] N. Huber, A. van Hoorn, A. Koziolek, F. Brosig, and S. Kounev. Modeling Run-Time Adaptation at the System Architecture Level in Dynamic Service-Oriented Environments. Service Oriented Computing and Applications Journal, 8(1):73–89, 2014.

[5] G. Jung, M.A. Hiltunen, K.R. Joshi, R.D. Schlichting, and C. Pu. Mistral: Dynamically Managing Power, Performance, and Adaptation Cost in Cloud Infrastructures. In IEEE Int. Conf. on Distributed Computing Systems, pages 62–73, 2010.

[6] S. Kounev, N. Huber, F. Brosig, and X. Zhu. Model-Based Approach to Designing Self-Aware IT Systems and Infrastructures. IEEE Computer Magazine, 2016, IEEE. To appear. http://se2.informatik.uni-wuerzburg.de/pa/uploads/papers/paper-926.pdf

[7] S. Kounev, X. Zhu, J. O. Kephart, and M. Kwiatkowska, editors. Model-driven Algorithms and Architectures for Self-Aware Computing Systems. Dagstuhl Reports. Dagstuhl, Germany, January 2015. http://drops.dagstuhl.de/opus/volltexte/2015/5038/

[8] Qi Zhang, Ludmila Cherkasova, and Evgenia Smirni. A Regression-Based Analytic Model for Dynamic Resource Provisioning of Multi-Tier Applications. In Proceedings of the 4th International Conference on Autonomic Computing, 2007.

If you like this article, you might also enjoy reading:

A. Avritzer, J. P. Ros and E. J. Weyuker, "Reliability testing of rule-based systems," IEEE Software, vol. 13, no. 5, pp. 76-82, Sep 1996.
E. Dimitrov, A. Schmietendorf, R. Dumke, "UML-Based Performance Engineering Possibilities and Techniques," IEEE Software, vol. 19, no. 1, pp. 74-83, Jan-Feb, 2002.
J. Happe, H. Koziolek and R. Reussner, "Facilitating Performance Predictions Using Software Components," in IEEE Software, vol. 28, no. 3, pp. 27-33, May-June 2011.
