Monday, March 20, 2017

IEEE January/February Issue, Blog, and SE Radio Summary

Associate Editor: Brittany Johnson (@brittjaydlf)

The January/February issue of IEEE Software continues to deliver news on new and exciting things happening in the software world. This issue covers a few different topics, such as global software engineering and software requirements, but focuses on software engineering for the Internet of Things, which is the subject of a large subset of the articles in this issue.

This issue features an article on requirements engineering titled "Guidelines for Managing Requirements Rationale" by Anil Kumar Thurimella, Mathias Schubanz, Andreas Pleuss, and Goetz Botterweck. In this article, the authors discuss the challenges involved in requirements rationales and provide detailed guidelines for creating them. According to the article, the guidelines presented are best suited for organizations that require both transparency and well-thought-out decisions.

Most of the papers in this issue discuss challenges and concerns that need to be addressed when engineering software for the Internet of Things (IoT). Aside from the introduction to this issue's focus, titled "Software Engineering for the Internet of Things" by Xabier Larrucea, Annie Combelles, John Favaro, and Kunal Taneja, this issue features the following articles on the IoT:

Some articles provide general suggestions for IoT software engineering. For example, in "Scalable-Application Design for the IoT", Venkatesh and colleagues propose a modular approach to designing and implementing context-aware IoT applications. Given the dynamic nature of IoT applications and the large amounts of raw data they produce, scalability becomes an important issue to address early on in the design of IoT applications. The authors propose and analyze an approach that uses context engines, or small and simple functional units, to improve scalability in context-aware IoT applications.
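The article itself presents context engines at the design level rather than in code; as a hedged illustration, one might picture a context engine as a small typed function. All names below are invented for this sketch and are not from the article.

// A minimal sketch: a context engine is a small, simple functional unit
// that maps raw context data to a derived output, so engines can be
// composed and scaled independently.
interface ContextEngine<I, O> {
  O process(I rawContext);
}

// A hypothetical engine that derives room occupancy from motion events.
class OccupancyEngine implements ContextEngine<Integer, Boolean> {
  @Override
  public Boolean process(Integer motionEventsPerMinute) {
    return motionEventsPerMinute > 0; // occupied if any motion was observed
  }
}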

In "Key Abstractions for IoT-Oriented Software Engineering", Zambonelli presents common feature across IoT software and applications to determine the important abstractions for engineering IoT software. The common features he discusses come from existing IoT research on designing and developing IoT systems. Based on the common features, such as "things" and software infrastructure "glue", the author identified the following key abstractions to consider during analysis, design, and development of IoT systems: Stakeholders and Users (Analysis), Requirements (Analysis), Groups and coalitions (Design), Avatars (Design and Development), and Smart Things (Development).

With any growth and change come challenges that must be overcome. In the article "A Roadmap to the Programmable World: Software Challenges in the IoT Era", Taivalsaari and Mikkonen provide a forward-looking roadmap to a programmable world and discuss some of the challenges software developers may face along the way. The authors highlight the differences between IoT and typical software development, such as the system of devices that comprises an IoT application, and the implications of these differences, along with possible challenges. Some of the challenges highlighted in this article include considerations for multi-device programming, the security of IoT systems, and dealing with the distributed, dynamic, and potentially migratory nature of IoT software.

IEEE Software Blog

The blog posts from the past couple of months continue to reflect on and discuss the state of software development and provide insights into new technologies and approaches relevant to the work we do. January's blog posts focus on recent innovations in new and old technologies. Most interesting is a newly introduced tool called the Unified ASAT Visualizer (UAV), which compares multiple static analysis tools on a given project. It aims to help developers decide whether the cost of using more than one tool (e.g., FindBugs + Checkstyle) is worth it. February touches on the theme of the IEEE Software focus topic, the Internet of Things, but also discusses other topics such as rethinking the role of developers in OSS and sustainable software design.


SE Radio

SE Radio featured a range of topics this issue. Most of the episodes focus on software integrity, from testing and dealing with bugs and failures to the challenges of debugging distributed systems. Invited guests include Florian Gilcher, Gerald Weinberg, James Whittaker, Donny Nadolny, Alexander Tarlinder, John Allspaw, and James Cowling. Also, SE Radio welcomed a new member to the SE Radio team, Edaena Salinas.

Sunday, March 12, 2017

The Spartanizer

Authors: Yossi Gil and Matteo Orrù, Technion, Israel

In this post, we would like to tell you about the Spartanizer, an automatic software-engineering tool developed at the Technion that helps you code in the “Spartan Programming Style”: a programmer phrases the code’s statements the way a person of Laconia phrases statements of speech, saying the most in few, clear words.
For example, the Spartanizer takes some verbose piece of code such as
public static String getCurrentLanguage()
{
  String language;
  if (getBooleanProperty("lang.usedefaultlocale"))
  {
    language = Locale.getDefault().getLanguage();
  }
  else
  {
    language = getProperty("lang.current", "en");
  }
  return language;
}
Figure 1: An example taken from jEdit
and converts it into the concise spartan form:
public static String getCurrentLanguage() {
  return getBooleanProperty("lang.usedefaultlocale")
      ? Locale.getDefault().getLanguage()
      : getProperty("lang.current", "en");
}
Figure 2: The spartanized code of Figure 1
Transformations made by the Spartanizer are too many (currently over 150) to enumerate here. Their main kinds are compile-time evaluation where possible, elimination of dead code, simplification of boolean expressions, removal of syntactic baggage such as superfluous calls to super(), short-circuiting and early returns, identification of cases in which a for should have been used instead of a while, and use of standard short names whenever this makes sense. A frequently recurring theme is refactorings inspired by the distributive rules of arithmetic, ab+ac⇒a(b+c) and ba+ca⇒(b+c)a, i.e., identifying a common factor of two syntactical elements and rewriting them in an equivalent way in which this common factor occurs only once. The Spartanizer is thus able to unite two adjacent conditionals, or to factor the commonalities out of the two branches of a conditional. Yet another kind of refactoring carried out by the Spartanizer is the application of programming idioms whenever appropriate, e.g., following the fluent-API programming style.
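To make the distributive analogy concrete, here is a small hand-written example (not actual Spartanizer output, with names of our own invention): the assignment is the common factor of the two branches of the conditional, and the rewrite makes it occur only once.

class FactoringExample {
  // Before: the common factor (the assignment to home) appears twice,
  // once in each branch.
  static String homeVerbose(boolean isAdmin, String adminHome, String userHome) {
    String home;
    if (isAdmin)
      home = adminHome;
    else
      home = userHome;
    return home;
  }

  // After: the common factor occurs only once, and the intermediate
  // variable disappears.
  static String homeSpartan(boolean isAdmin, String adminHome, String userHome) {
    return isAdmin ? adminHome : userHome;
  }
}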
The code generated by the Spartanizer is concise, often looks more functional than imperative, and may require getting used to its conventions. Yet our experience in teaching spartanization and applying it in student and non-student projects, including several projects with a couple of hundred classes each, shows that the adjustment period is short, and that programmers tend to prefer minimal ink to verbosity, as long as the minimal ink is regular, i.e., follows a small set of recurring patterns. An empirical study [1] shows that the spartanization of code makes it measurably more predictable.
Spartanized code has a unique lean appearance: few statements, a minimal number of intermediates, adherence to idioms, and use of conditional expressions (and, in recent versions, Java streams) rather than explicit, general-purpose control such as if, else, while, and for. In fact, spartanized code often assumes the concise and elegant looks of functional programs, even when the original version of the code was purely imperative. For example, the following function for comparing two lists of nodes (drawn from the Spartanizer’s implementation) was created automatically from its imperative version.


<N extends ASTNode> boolean same(final List<N> ns1, final List<N> ns2) {
  return ns1 == ns2 ||
    ns1.size() == ns2.size() &&
      range.from(0).to(ns1.size())
        .stream()
        .allMatch(λ -> same(ns1.get(λ), ns2.get(λ)));
}
Figure 3: An example of spartanized code taken from the Spartanizer project


Conversion can be done either interactively or in batch form. The Spartanizer features an Eclipse plugin, which offers “spartanization” tips to the programmer. The programmer reaches the concise form of Figure 2 by following the six tips the Spartanizer offers for the jEdit function in Figure 1. Unlike many code-analysis and code-generation tools, the conversion from the verbose to the concise is continuous: one little change at a time. An experimental feature of the Spartanizer, available from version 2.8, is capable of the opposite conversion, from the concise to the verbose, making it possible for the programmer to use the mouse wheel for both zooming in and zooming out of the code.
Alternatively, spartanization can be done as a batch application. Unlike many deep-analysis tools, automatic spartanization is fast, producing more than five thousand tips per minute on a contemporary computer. Like other refactoring tools, the Spartanizer is optimistic, making assumptions, e.g., about the use of overloading, or that the order of evaluation of arguments to a function is immaterial to correctness.
Internally, the Spartanizer follows an extensible modular structure: spartanization tips are generated by small software modules, called tippers, each of which generates its own kind of spartanization tips. Tippers are applied automatically or interactively by tip applicators, which apply spartanization to user-selected portions of the code (function, class, file, or project).
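As a rough sketch of this structure (the names and signatures here are ours, not the Spartanizer's actual API), a tipper can be pictured as a function from a code element to an optional tip:

import java.util.Optional;

// Hypothetical tipper interface, sketched from the description above.
// Each tipper inspects one kind of code element and, when it recognizes
// its pattern, offers a tip: a description plus an executable rewrite.
interface Tipper<N> {
  Optional<Tip> tip(N node);
}

final class Tip {
  final String description; // shown to the programmer in interactive mode
  final Runnable rewrite;   // executed by a tip applicator

  Tip(String description, Runnable rewrite) {
    this.description = description;
    this.rewrite = rewrite;
  }
}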
The Spartanizer is available at the Eclipse Marketplace and is open source on GitHub.


References
[1] Y. Gil and M. Orrù, “Code Spartanization,” in Proc. of SAC’17, the 32nd ACM Symposium on Applied Computing, Marrakesh, Morocco, April 3–7, 2017.
[2] Y. Gil and M. Orrù, “The Spartanizer: Massive Automatic Refactoring,” in Proc. of the 24th IEEE International Conference on Software Analysis, Evolution, and Reengineering, Klagenfurt, Austria, February 20–24, 2017.
[3] The Spartanizer at the Eclipse Marketplace. https://marketplace.eclipse.org/content/spartan-refactoring-0
[4] The Spartanizer website: http://spartan.org.il/

Monday, March 6, 2017

Helping Developers Configure Application-Level Caches for Web Applications Through DevOps

by Peter (Tse-Hsun) Chen, Queen's University (@petertsehsun)
Associate Editor: Sarah Nadi, University of Alberta (@sarahnadi)

Developers nowadays use application-level caching frameworks, such as Ehcache [1], to cache objects of object-oriented languages in memory. Once objects are cached, application servers do not need to retrieve the object data from external machines such as a database server, which can significantly improve application performance.
However, application-level caching frameworks are only useful if you know what you want to cache, and determining the best way to use such a framework is not always easy. A large-scale web application may have hundreds of different types of objects, so the challenge is knowing which types of objects should be cached.
Developers need to manually configure the caching framework to enable caching for each type of object (e.g., enable caching for Student objects). It is therefore difficult to know what the best cache configuration is, especially given that the benefit of caching is directly related to how users are using the application. For example, enabling caching on frequently modified objects will not improve application performance; it will actually slow the application down due to frequent cache renewal.
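For example, with Hibernate and a cache provider such as Ehcache, caching is typically enabled per entity class through annotations; the sketch below is minimal, and the exact configuration varies across framework versions.

import javax.persistence.Cacheable;
import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;

// Enabling Hibernate's second-level cache for one type of object.
// Whether this actually helps depends on how often Student objects
// are read versus modified in production.
@Entity
@Cacheable
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
public class Student {
  @Id
  private Long id;
  private String name;
  // getters and setters omitted
}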
How can we help improve the performance of web applications by finding better cache configurations? 
CacheOptimizer - finding optimal cache configurations by leveraging DevOps. DevOps tries to combine the developer and administrator worlds to further improve software development. Therefore, one main concept of DevOps is to assist software development by understanding user behaviour (such as knowing how the application is used in production). But what kind of data can we analyze to recover user behaviour? One potential way is to instrument the application to record the information we need. However, this solution is not acceptable for applications running in production, since instrumentation adds too much overhead to the application.
Instead, we propose a framework, called CacheOptimizer, that analyzes readily-available application runtime logs to automatically help developers find the optimal cache configurations when using application-level caching frameworks. 
Understanding user behaviour by combining static code analysis and runtime logs. CacheOptimizer analyzes web access logs to understand how users are accessing the application. Such access logs are automatically generated by web servers like Tomcat, so we do not add any extra overhead to the application. CacheOptimizer then links the logs to data accesses using static code analysis. For example, consider the following log line:
200 127.0.0.1 /user/getDetails/peter
The log line contains information about a user request. Here, 200 represents the HTTP status code and 127.0.0.1 represents the IP of the user. Through static code analysis, we can find the method in the code that handles this particular user request. We can also uncover the type of data access that is called in this request-handler method (e.g., accessing user data in the database). Then, by combining the information in the log and the information that we obtained from the code, we know that the user request is reading detailed information about the user Peter from the database. In this simple example, since we are only reading user objects from the database, the optimal cache configuration would be to enable caching on the User class. 
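As a hedged illustration of what the static analysis links together, the request-handler method for this log line might look like the following. The controller, repository, and Spring-style annotations are invented for this sketch; the applications studied may be structured differently.

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

// A hypothetical handler matching /user/getDetails/peter. Static analysis
// maps the URL pattern to this method, then finds the read-only User data
// access inside it.
@RestController
public class UserController {
  interface UserRepository { User findByName(String name); }
  public static class User { public String name; }

  private final UserRepository users;

  public UserController(UserRepository users) { this.users = users; }

  @GetMapping("/user/getDetails/{name}")
  public User getDetails(@PathVariable String name) {
    return users.findByName(name); // read-only access: a caching candidate
  }
}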
In short, given the logs, CacheOptimizer can recover how the users are actually using the application and can suggest the optimal cache configuration for the application. 
CacheOptimizer brings significant performance improvement. We applied CacheOptimizer to three open-source applications and found that it can significantly improve application throughput (the number of requests processed per second)!
We compare CacheOptimizer with several different cache configurations: CacheAll, where we simply enable caching on all objects; DefaultCache, where we use the default cache configuration that already exists in the studied application; and NoCache, where we disable all caches. The figure below shows the cumulative throughput when using different cache configurations in one of the studied applications. By using CacheOptimizer, we achieve a much better throughput than with all other configurations.
Assisting development using production data. Finding the optimal cache configuration is just one way that we can assist software development by leveraging production data. Due to the agile nature of modern software development, many software design and development decisions are directly affected by how users are actually using the application. The idea of DevOps brings a new set of research challenges and interesting problems. As software practitioners and researchers, can we think of new ways to assist software development by embracing DevOps?
You can find more details about CacheOptimizer in our research paper [2].
[1] Ehcache. http://www.ehcache.org/. Last accessed Feb. 6, 2017.
[2] Tse-Hsun Chen, Weiyi Shang, Ahmed E. Hassan, Mohamed Nasser, and Parminder Flora. 2016. CacheOptimizer: helping developers configure caching frameworks for Hibernate-based database-centric web applications. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2016).

Sunday, February 26, 2017

Sustainable Software Design

by Martin Robillard, McGill University (@mp_robillard)
Associate Editor: Christoph Treude (@ctreude)

cross-posted from Martin Robillard's blog

There's a lot of interest in understanding software design as an activity. But what happens to the outcome of this activity? At one extreme, what happens on the whiteboard stays on the whiteboard: design is never explicitly captured. At the other extreme, design information is meticulously captured and archived, and then invalidated by the first decent refactoring. So, without explicit effort, design knowledge just disappears, and the consequence is that developers have to perform ignorant surgery: modifying code without knowledge of its design. The temptation of ignorant surgery has to be related to the cost of maintaining accessible design information. When a particular design is expensive to describe and maintain, its description and rationale are at risk of being lost, no matter how awesome the underlying design ideas are.

So why don't we explicitly consider how expensive a particular design decision will be to capture and maintain consistent with the code, before adopting it?

There exist quality models for software design. They include attributes like reusability, flexibility, and understandability. But, as far as I can tell, we don't yet have an attribute that captures how cost-effective it is to describe a set of design decisions over time. That attribute is what I would call sustainability.

There's no techno-fix to make design more sustainable. It's a complex, multi-faceted problem. In this paper I review the areas of software development research and practice that relate to design sustainability and explain why they are not silver bullets. These include:
  • Modularity can contribute to design sustainability when design decisions map to code organization concerns. Unfortunately, many design decisions don't have much to do with modular decomposition. There's also the issue that many concerns cannot easily be modularized.
  • The relation between documentation and sustainable design is paradoxical. On one hand, good documentation can help sustain a design over time. On the other, documentation is expensive, which is a direct factor of unsustainability. The idea of fully self-documenting systems is appealing, but impractical; Parnas and Clements compared it to the Philosopher's Stone.
  • Programming language constructs are another tool for managing sustainability. For example, assert statements are a cheap and relatively user-friendly way to capture simple design rules and assumptions (see the sketch after this list). There's also research on developing language support for stating and verifying more complex design-level properties, such as immutability. The challenge for design sustainability is that language-supported specifiable properties form a closed set of low-level concerns, whereas the set of possible design decisions is open and ranges over different abstraction levels.
  • Design patterns offer a natural map between parts of a system and a set of design rules, and even their rationale. The major limitation here is that, by definition, patterns are solutions to common problems, whereas there are many idiosyncratic design problems in software projects.
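As a generic illustration of the assert point above (not an example from the paper), a single assert can record a design rule right next to the code it governs:

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// The assert captures the design rule that the index lock must be held
// while mutating the shared index, cheaply and checkably.
class SharedIndex {
  private final ReentrantLock lock = new ReentrantLock();
  private final Map<String, String> index = new HashMap<>();

  void update(String key, String value) {
    assert lock.isHeldByCurrentThread() : "update requires the index lock";
    index.put(key, value);
  }
}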
So, how do we move towards sustainable design? Maybe we can draw a lesson from gardening. At this point in our history we have the technology to grow anything anywhere. But the best way for a garden to stay alive with minimum effort is to select plants that are a good match for a specific environment. Likewise with design decisions. Different projects and systems have characteristics that are the equivalent of different types of soil, luminosity conditions, and humidity. So we have to figure out how to select and nurture the design decisions that will thrive in these conditions.


Monday, February 20, 2017

Anticipating Cross-Layer Attacks in Software Systems

By: Eunsuk Kang (eunsuk.kang@berkeley.edu), Aleksandar Milicevic (almili@microsoft.com), and Daniel Jackson (dnj@mit.edu)

Associate Editor: Mehdi Mirakhorli (@MehdiMirakhorli)

Abstraction is one of the most fundamental techniques in software design, but can be a double-edged sword, especially in systems where security is a major concern. Most systems are too complex to reason about all at once, and so developers tend to focus on one aspect of a system at a time, and ignore irrelevant details. However, an attacker need not respect the same abstraction boundaries, and may exploit details across multiple layers of a system.

One well-known example of this type of security risk can be found in OAuth, a popular authorization protocol. Many web-based implementations of OAuth have been shown to be vulnerable to attacks [4], despite the fact that the protocol itself was subjected to rigorous analyses (including formal verification [2][5]). In hindsight, this is not too surprising, since the protocol is a set of abstract rules that omits details about an underlying platform---deliberately so, since it makes the protocol reusable across multiple platforms! At the same time, it is exactly some of these details (e.g., various browser features and interaction with malicious agents on the web) that allowed an attacker to compromise the security guarantees that the protocol is designed to provide.

In general, security is a cross-cutting concern that cannot be easily contained within a single abstraction of a system. A security guarantee established at one layer may no longer hold once the system is elaborated with details during an implementation phase. Currently, it remains the developer’s burden to ensure that the high-level guarantee is preserved in the implementation---a challenging task even for those with security expertise, due to the complexity of modern platforms such as web browsers.

How do we reason about potential attacks across multiple abstractions of a system? Can we anticipate and address such attacks proactively, before building an implementation? How much detail about the underlying platform do we need to include to perform this analysis?

Poirot is a security analysis tool designed to help developers proactively detect what we call cross-layer vulnerabilities [3]. The tool takes three types of inputs: (1) a pair of models that describe a high-level design and a low-level platform, (2) a desired security property (typically expressed over the high-level model), and (3) a representation mapping, which describes how entities from the high-level model are to be represented in terms of their low-level counterparts. Given these inputs, Poirot exhaustively analyzes potential interactions between the two layers and produces scenarios that describe how an attacker may exploit some of these interactions to undermine the security property. The analysis can be carried out incrementally: Starting with an abstract model that represents an initial design of the system, the designer can elaborate a part of the model with a choice of representation, transforming the model into a more detailed one.

Figure 1: Partial mapping from the abstract Add operation to an HTTP request.
A key aspect of Poirot is the representation mapping, which allows the developer to specify decisions about how an abstract design is to be implemented using concrete primitives. For example, when designing an online shopping cart, one may define an operation named Add, which corresponds to the action of adding a new item to a customer’s shopping cart. At the abstract design level, this operation contains two arguments, as shown in Figure 1: the identifier of the item to be added (i), and a token that represents a customer’s credential (t). In order to deploy the shopping application onto the HTTP protocol, our developer must eventually decide how the two parameters from Add are to be mapped to its counterparts in a concrete HTTP request.

In an early design stage, however, the developer may possess only partial knowledge about the system, and some of these decisions may be unknown. Poirot allows the representation mapping to be only partially specified, allowing the developer to express her uncertainty about design decisions and systematically explore different candidate mappings. For instance, Figure 1 depicts a partial mapping specification that lists only the origin and path of the Add URL, leaving unspecified how the item ID and token will be transmitted as part of a request; this, naturally, yields a space of possible mappings, each leading to a different implementation of the shopping cart (and each with its own security vulnerabilities).
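To illustrate the space that such a partial mapping leaves open (purely a sketch: Poirot's models are written in a formal modeling language rather than Java, and the origin below is made up), here are two candidate concretizations of the abstract Add(i, t) operation:

// Illustrative only. Both candidates are consistent with a partial mapping
// that fixes only the origin and path of the request.
final class AddMappings {
  // Candidate 1: item ID and token both travel in the query string.
  static String tokenInQuery(String itemId, String token) {
    return "GET https://shop.example.com/add?item=" + itemId + "&token=" + token;
  }

  // Candidate 2: the token travels in a cookie header instead, which
  // exposes a different attack surface (e.g., cross-site request forgery).
  static String[] tokenInCookie(String itemId, String token) {
    return new String[] {
      "GET https://shop.example.com/add?item=" + itemId,
      "Cookie: token=" + token
    };
  }
}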

Another important part of Poirot is the library of domain models that together describe a platform---a collection of generic components, data structures, and libraries that are used to implement an application. In our example, this library would include generic models of various components of the Web, such as a web server, the HTTP protocol, a browser, and its various features (cookie handling, page rendering, scripts, etc.). Once constructed by domain experts, these models should be reusable for the analysis of multiple systems in the same domain. Reusability is crucial for Poirot: if each developer had to write these models for every system to be analyzed, it would simply take too much effort! Thanks to our flexible composition mechanism [3], this library is also easily extensible, in that fresh knowledge about a feature or a newly discovered vulnerability can be encoded as a separate model and inserted into the library for later use.

Poirot is most effective when applied early in a development process. As a case study, we had an opportunity to work with a startup called HandMe.In, which was building an online system for tracking personal items. In collaboration with the lead developer, we applied Poirot to discover a number of potential security vulnerabilities in the system, resulting in significant changes to the design. In another study, we used Poirot to model and analyze the security of IFTTT, an application that allows an end user to automate web services, and discovered a previously unknown attack that exploited an interaction between the IFTTT protocol and a browser vulnerability called login CSRF. Many of the attacks generated by Poirot exploited system details at multiple levels of abstraction, and would not have been found if the analysis were confined to a single layer.

There are examples of cross-layer vulnerabilities in a number of other domains besides web security. For instance, a program in a high-level language (e.g., Java) may inadvertently expose private data when translated into a low-level representation (bytecode) [1]. We are currently investigating whether Poirot can be applied to these types of domains as well. In addition, we are building an extension that will allow Poirot to not only produce potential vulnerabilities, but also generate a representation mapping that preserves a desired security property across multiple layers, by leveraging techniques from program synthesis.

Read more about Poirot in our FSE paper [3], or try out the tool at https://eskang.github.io/poirot/


References

[1] M. Abadi. Protection in programming language translations. In International Colloquium on Automata, Languages and Programming (ICALP), 1998.
[2] S. Chari, C. S. Jutla, and A. Roy. Universally Composable Security Analysis of OAuth v2.0. IACR Cryptology ePrint Archive, 2011:526, 2011.
[3] E. Kang, A. Milicevic, and D. Jackson. Multi-representational security analysis. In International Symposium on the Foundations of Software Engineering (FSE), 2016.
[4] S. Sun and K. Beznosov. The devil is in the (implementation) details: an empirical analysis of OAuth SSO systems. In ACM Conference on Computer and Communications Security (CCS), 2012.
[5] X. Xu, L. Niu, and B. Meng. Automatic verification of security properties of OAuth 2.0 protocol with cryptoverif in computational model. Information Technology Journal, 12(12):2273, 2013.