Sunday, April 2, 2017

Scaling up the software development process

by Mark Basham, Diamond Light Source (@basham_mark)
Associate Editor: Christoph Treude (@ctreude)

 
What should I talk about at the first RSE conference in the UK?

New conferences don't start every day, so when a community you have been part of for a while decides to run one and asks for abstracts, it's important to take part. The community I’m talking about is the UK’s RSE, or Research Software Engineer, community, which has been gathering strength over the last few years. Originally started by the SSI (Software Sustainability Institute, www.software.ac.uk), the group's initial aim was to create a job title and career path for the oft-underappreciated researchers who support their research groups by writing and maintaining software, which is often critical. Given this audience, the real question was: what should I talk about? I contribute to several open source software packages that could always do with more publicity (www.dawnsci.org, https://github.com/DiamondLightSource/Savu), but that felt like a shameless plug that would interest only a very small group, and I wanted something that would appeal to a larger audience.

Project Management?

One more general area in which I have been working is project managing a large cross-group software project at Diamond Light Source (www.diamond.ac.uk). When I started working on the project about two years ago, I had little experience of managing such a complex project, although Diamond runs projects routinely. I cannot overstate how much I have learnt from my peers at Diamond whilst making the transition from developer to manager during this project, and it wasn't long before I was directed towards a selection of texts in the field. The wealth of information available on the topic is astounding, and it almost immediately influenced my management decisions, but because my background is in science, this body of knowledge had passed me by almost entirely. It seemed like exactly the kind of thing on which to base a presentation.

The Books?

There are many good books about project management for software, but the ones I discovered and found particularly relevant were The Mythical Man-Month, Peopleware, and User Story Mapping.

The Mythical Man-Month

This book is from the 1970s but still has many points relevant to the modern developer. The key takeaway for me was that writing software well, as in tested, integrated and ready for others to use, takes about 10 times longer than just hacking it up.

Peopleware

This is a great book, and I have enjoyed other books by the Atlantic Systems Guild, such as “Adrenaline Junkies to Template Zombies” and “Slack”.  The thing that resonated with me here was how damaging context switching was, and it is something I see regularly in our predictably overstretched team.

User Story Mapping

The last book I wanted to talk about was User Story Mapping, simply because getting requirements right is always difficult, and the book suggested a clear and concise method for doing this, as well as a nice discussion on the merits and pitfalls of user stories.

So what are you going to talk about?

OK, so three books do not make a presentation, but they do lay the foundations for a talk based around work we conducted at Diamond to try to get the aforementioned project back on track. There was one deliverable which, for many valid reasons, was falling behind the rest of the project and putting the whole initiative in jeopardy. Diamond took action. First, we identified a team and freed up their time to focus on the deliverable, so that they no longer had to multitask. We moved them out of their usual open-plan environment and into a new office for the duration of the intervention, increasing their workspace by around 70% and reducing off-topic distractions. Finally, we trialled a new method for identifying requirements and planning the work based on the Story Mapping approach. The cost of these interventions was high, but the result was significant, bringing the deliverable back on track with a generally high level of design, code quality and tests in place. This journey seemed like a perfect topic for a talk to a growing community which may meet similar challenges.

Final Title?

“Scaling up the software development process, a case study highlighting the complexities of large team software development” (https://arxiv.org/abs/1703.00958).  Personally I enjoyed preparing it, and the associated paper is available if you are interested in the results of the experiment.

Monday, March 20, 2017

IEEE January/February Issue, Blog, and SE Radio Summary

Associate Editor: Brittany Johnson (@brittjaydlf)

The January/February issue of IEEE Software continues to report on new and exciting developments in the software world. This issue covers a few different topics, such as global software engineering and software requirements, but focuses on software engineering for the Internet of Things.

This issue features an article on requirements engineering titled "Guidelines for Managing Requirements Rationale" by Anil Kumar Thurimella, Mathias Schubanz, Andreas Pleuss, and Goetz Botterweck. In this article, the authors discuss the challenges involved in managing requirements rationale and present detailed guidelines for doing so. According to the article, the guidelines are best suited for organizations that require both transparency and well-thought-out decisions.

Most of the papers in this issue discuss challenges and concerns that need to be addressed when engineering software for the Internet of Things (IoT). Aside from the introduction to this issue's focus, titled "Software Engineering for the Internet of Things" by Xabier Larrucea, Annie Combelles, John Favaro, and Kunal Taneja, this issue features the following articles on the IoT:

Some articles provide general suggestions for IoT software engineering. For example, in "Scalable-Application Design for the IoT", Venkatesh and colleagues propose a modular approach to designing and implementing context-aware IoT applications. Given the dynamic nature of IoT applications and the large amounts of raw data they produce, scalability becomes an important issue to address early in the design of IoT applications. The authors propose and analyze an approach that uses context engines, small and simple functional units, to improve scalability in context-aware IoT applications.

In "Key Abstractions for IoT-Oriented Software Engineering", Zambonelli presents common features across IoT software and applications to determine the important abstractions for engineering IoT software. The common features he discusses come from existing research on designing and developing IoT systems. Based on these common features, such as "things" and the software infrastructure "glue", the author identifies the following key abstractions to consider during analysis, design, and development of IoT systems: Stakeholders and Users (Analysis), Requirements (Analysis), Groups and coalitions (Design), Avatars (Design and Development), and Smart Things (Development).

With any growth and change come challenges that must be overcome. In the article "A Roadmap to the Programmable World: Software Challenges in the IoT Era", Taivalsaari and Mikkonen provide a forward-looking roadmap to a programmable world and discuss some of the challenges software developers may face along the way. The authors highlight the differences between IoT and typical software development, such as the system of devices that make up an IoT application, and discuss the implications of these differences and the challenges they may bring. Some of the challenges highlighted in this article include considerations for multi-device programming, security of IoT systems, and dealing with the distributed, dynamic, and potentially migratory nature of IoT software.

IEEE Software Blog

The blog posts from the past couple of months continue to reflect on the state of software development and provide insights into new technologies and approaches relevant to the work we do. January's blog posts focus on recent innovations in new and old technologies. The most interesting is a newly introduced tool, the Unified ASAT Visualizer (UAV), which compares multiple static analysis tools on a given project. It aims to help developers decide whether the costs of using more than one tool (e.g., FindBugs + Checkstyle) are worth it. February touches on the theme of the IEEE Software focus topic, the Internet of Things, but also discusses other topics, such as rethinking the role of developers in OSS and sustainable software design.


SE Radio

SE Radio featured a range of topics during this period. Most of the episodes focus on software integrity, from testing and dealing with bugs and failures to the challenges of debugging distributed systems. Invited guests include Florian Gilcher, Gerald Weinberg, James Whittaker, Donny Nadolny, Alexander Tarlinder, John Allspaw, and James Cowling. SE Radio also welcomed a new member to the team, Edaena Salinas.

Sunday, March 12, 2017

The Spartanizer

Authors: Yossi Gil and Matteo Orrù, Technion, Israel

In this post, we would like to tell you about the Spartanizer, an automatic software-engineering tool developed at the Technion that helps you code in the “Spartan programming style”. In this style, programmers phrase their code’s statements the way a person from Laconia phrased speech:

saying the most in a few, clear words.
For example, the Spartanizer takes some verbose piece of code such as
public static String getCurrentLanguage()
{
  String language;
  if (getBooleanProperty("lang.usedefaultlocale"))
  {
    language = Locale.getDefault().getLanguage();
  }
  else
  {
    language = getProperty("lang.current", "en");
  }
  return language;
}
Figure 1: An example taken from jEdit
(found in the jEdit project) and converts it into the concise Spartan form:
public static String getCurrentLanguage() {
  return getBooleanProperty("lang.usedefaultlocale")
      ? Locale.getDefault().getLanguage()
      : getProperty("lang.current", "en");
}
Figure 2: The spartanized code of Figure 1
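To check that the forms in Figures 1 and 2 behave the same, here is a small self-contained harness. jEdit's getBooleanProperty and getProperty are replaced by hypothetical map-backed stubs (an assumption for illustration; jEdit's real implementations differ), so only the structure of the two methods is taken from the figures.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class LanguageDemo {
  // Hypothetical stand-ins for jEdit's property lookups, backed by a plain map.
  static final Map<String, String> props = new HashMap<>();

  static boolean getBooleanProperty(String name) {
    return Boolean.parseBoolean(props.get(name)); // a missing key parses as false
  }

  static String getProperty(String name, String def) {
    return props.getOrDefault(name, def);
  }

  // Figure 1: the verbose, imperative form.
  static String verbose() {
    String language;
    if (getBooleanProperty("lang.usedefaultlocale")) {
      language = Locale.getDefault().getLanguage();
    } else {
      language = getProperty("lang.current", "en");
    }
    return language;
  }

  // Figure 2: the spartanized form, a single conditional expression.
  static String spartan() {
    return getBooleanProperty("lang.usedefaultlocale")
        ? Locale.getDefault().getLanguage()
        : getProperty("lang.current", "en");
  }

  public static void main(String[] args) {
    props.put("lang.current", "fr");
    System.out.println(verbose() + " " + spartan()); // prints "fr fr"
    props.put("lang.usedefaultlocale", "true");
    System.out.println(verbose().equals(spartan())); // prints "true"
  }
}
```

The local variable and the if/else disappear because both branches only assign to the same variable, which is then returned.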
The Spartanizer makes too many kinds of transformations (currently over 150) to enumerate here. The main kinds are compile-time evaluation when possible, elimination of dead code, simplification of boolean expressions, removal of syntactic baggage such as superfluous calls to super(), short-circuiting and early returns, identifying cases in which a for should have been used instead of a while, and using standard short names whenever this makes sense. A frequently recurring theme is refactorings inspired by the distributive rules of arithmetic, ab+ac⇒a(b+c) and ba+ca⇒(b+c)a, i.e., identifying a common factor of two syntactical elements and rewriting them equivalently so that this common factor occurs only once. The Spartanizer is thus able to unite two adjacent conditionals and to factor out commonalities of the two branches of a conditional. Yet another kind of refactoring carried out by the Spartanizer is applying programming idioms whenever appropriate, e.g., following the fluent API programming style.
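The factoring idea can be illustrated with two hand-written before/after pairs. These are our own illustrations of the kinds of rewrite described above, not actual Spartanizer output, and the method names are hypothetical.

```java
class FactoringDemo {
  // Before: both branches call String.format; only one argument differs.
  static String labelBefore(boolean urgent) {
    if (urgent)
      return String.format("[%s] %s", "HIGH", "task");
    else
      return String.format("[%s] %s", "LOW", "task");
  }

  // After: the common factor String.format(...) occurs once, like ab + ac => a(b + c).
  static String labelAfter(boolean urgent) {
    return String.format("[%s] %s", urgent ? "HIGH" : "LOW", "task");
  }

  // Before: two adjacent conditionals with identical bodies.
  static int checkBefore(int x, int y) {
    if (x < 0)
      return -1;
    if (y < 0)
      return -1;
    return 1;
  }

  // After: the two conditionals are united, and the remaining if becomes a conditional expression.
  static int checkAfter(int x, int y) {
    return x < 0 || y < 0 ? -1 : 1;
  }

  public static void main(String[] args) {
    System.out.println(labelAfter(true));  // prints "[HIGH] task"
    System.out.println(checkAfter(3, -2)); // prints "-1"
  }
}
```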
The code generated by the Spartanizer is concise, often looks more functional than imperative, and its conventions may take some getting used to. Yet our experience in teaching spartanization and applying it in student and non-student projects, including several projects with a couple of hundred classes each, shows that the adjustment period is short, and that programmers tend to prefer minimal ink to verbosity, as long as the minimal ink is regular, i.e., follows a small set of recurring patterns. An empirical study [1] shows that spartanization makes code measurably more predictable.
Spartanized code has a unique, lean appearance: few statements, a minimal number of intermediates, adherence to idioms, and use of conditional expressions (and, in recent versions, Java streams) rather than explicit, general-purpose control constructs such as if, else, while, and for. In fact, spartanized code often assumes the concise and elegant look of functional programs, even when the original version of the code was purely imperative. For example, this function for comparing two lists of nodes (drawn from the Spartanizer’s implementation) was created automatically from its imperative version:


<N extends ASTNode> boolean same(final List<N> ns1, final List<N> ns2) {
  return ns1 == ns2 ||
    ns1.size() == ns2.size() &&
      range.from(0).to(ns1.size())
        .stream()
        .allMatch(λ -> same(ns1.get(λ), ns2.get(λ)));
}
Figure 3: An example of spartanized code taken from the Spartanizer project
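Figure 3 relies on the Spartanizer's own range utility and on an overload of same for AST nodes. A self-contained version of the same idiom, sketched with the JDK's IntStream.range and with Objects.equals standing in for node comparison (both substitutions are ours), looks like this:

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.IntStream;

class SameDemo {
  // Two lists are "the same" if they are the same object, or have equal sizes
  // and pairwise-equal elements (&& binds tighter than ||).
  static <N> boolean same(final List<N> ns1, final List<N> ns2) {
    return ns1 == ns2
        || ns1.size() == ns2.size()
            && IntStream.range(0, ns1.size()).allMatch(i -> Objects.equals(ns1.get(i), ns2.get(i)));
  }

  public static void main(String[] args) {
    System.out.println(same(List.of("a", "b"), List.of("a", "b"))); // prints "true"
    System.out.println(same(List.of("a", "b"), List.of("a", "c"))); // prints "false"
  }
}
```

As in Figure 3, the imperative loop with an early return becomes a single boolean expression: the reference check short-circuits, and allMatch stops at the first mismatch.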


Conversion can be done either interactively or in batch form. The Spartanizer features an Eclipse plugin, which offers “spartanization” tips to the programmer. The programmer reaches the concise form of the jEdit function in Figure 1 by following the six tips the Spartanizer offers for it. Unlike with many code analysis and code generation tools, the conversion from the verbose to the concise is gradual: one little change at a time. An experimental feature in the Spartanizer, available from version 2.8, is capable of the opposite conversion, from the concise to the verbose, making it possible for the programmer to use the mouse wheel both to zoom in on and to zoom out of the code.
Alternatively, spartanization can be applied in batch mode. Unlike many deep analysis tools, automatic spartanization is fast, with more than five thousand tips per minute on a contemporary computer. Like other refactoring tools, the Spartanizer is optimistic: it makes some assumptions, e.g., about the use of overloading, or that the order of evaluation of arguments to a function is immaterial to correctness.
Internally, the Spartanizer follows an extensible, modular structure: spartanization tips are generated by small software modules called tippers, each of which generates its own kind of spartanization tip. Tippers are applied, automatically or interactively, by tip applicators, which apply spartanization to user-selected portions of the code (function, class, file, or project).
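The tipper/applicator structure can be modeled in miniature. The sketch below is a hypothetical toy, not the Spartanizer's real API (which works on Eclipse JDT AST nodes): each tipper recognizes one pattern in a tiny boolean-expression tree and proposes a rewrite, and the applicator applies tippers bottom-up until none fires. It assumes Java 16 or later for records and pattern matching.

```java
import java.util.List;
import java.util.Optional;

class TipperDemo {
  // A tiny expression tree standing in for a real Java AST.
  interface Expr {}
  record Var(String name) implements Expr {}
  record Lit(boolean value) implements Expr {}
  record Not(Expr e) implements Expr {}
  record And(Expr l, Expr r) implements Expr {}

  // A tipper handles one kind of simplification; it returns empty when it does not apply.
  interface Tipper {
    Optional<Expr> tip(Expr e);
  }

  // !!x => x
  static final Tipper DOUBLE_NEGATION = e -> {
    if (e instanceof Not outer && outer.e() instanceof Not inner) return Optional.of(inner.e());
    return Optional.empty();
  };

  // x && true => x, and true && x => x
  static final Tipper AND_TRUE = e -> {
    if (e instanceof And a) {
      if (a.r() instanceof Lit right && right.value()) return Optional.of(a.l());
      if (a.l() instanceof Lit left && left.value()) return Optional.of(a.r());
    }
    return Optional.empty();
  };

  static final List<Tipper> ALL = List.of(DOUBLE_NEGATION, AND_TRUE);

  // Tip applicator: simplify children first, then apply the first tipper that fires,
  // repeating until no tipper applies. Each tip shrinks the tree, so this terminates.
  static Expr apply(Expr e, List<Tipper> tippers) {
    if (e instanceof Not n) e = new Not(apply(n.e(), tippers));
    else if (e instanceof And a) e = new And(apply(a.l(), tippers), apply(a.r(), tippers));
    for (Tipper t : tippers) {
      Optional<Expr> tipped = t.tip(e);
      if (tipped.isPresent()) return apply(tipped.get(), tippers);
    }
    return e;
  }

  public static void main(String[] args) {
    Expr in = new Not(new Not(new And(new Var("x"), new Lit(true))));
    System.out.println(apply(in, ALL)); // prints "Var[name=x]"
  }
}
```

Adding a new kind of simplification means adding one more small tipper to the list; the applicator does not change.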
The Spartanizer is available at the Eclipse marketplace, and open source on GitHub.


References
[1] Y. Gil and M. Orrù, “Code Spartanization,” in Proc. of SAC’17, the 32nd ACM Symposium on Applied Computing, Marrakesh, Morocco, April 3–7, 2017.
[2] Y. Gil and M. Orrù, “The Spartanizer: Massive Automatic Refactoring,” in Proc. of the 24th IEEE International Conference on Software Analysis, Evolution, and Reengineering, Klagenfurt, Austria, February 20–24, 2017.
[3] The Spartanizer at the Eclipse Marketplace. https://marketplace.eclipse.org/content/spartan-refactoring-0
[4] The Spartanizer website. http://spartan.org.il/

Monday, March 6, 2017

Helping Developers Configure Application-Level Caches for Web Applications Through DevOps

by Peter (Tse-Hsun) Chen, Queen's University (@petertsehsun)
Associate Editor: Sarah Nadi, University of Alberta (@sarahnadi)

Developers nowadays use application-level caching frameworks, such as Ehcache [1], to cache the objects of object-oriented programs in memory. After the objects are cached, application servers do not need to retrieve the object data from external machines such as a database server, which can significantly improve application performance.
Alert: Application-level caching frameworks are only useful if you know what you want to cache. However, determining the best way to use such a caching framework is not always easy. A large-scale web application may have hundreds of different types of objects, so the challenge is knowing which types of objects should be cached.
Developers need to manually configure the caching frameworks to enable caching for each type of object (e.g., enable caching for Student objects). However, it is difficult to know what the best cache configuration is, especially because the benefit of caching depends directly on how users use the application. For example, enabling caching on frequently modified objects will not improve the application's performance; it will actually slow the application down due to frequent cache renewal.
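Why frequently modified objects are poor caching candidates can be seen in a toy read-through cache that drops its copy on every write. This is a hypothetical simulation written for illustration (it is not Ehcache's API); it reports the fraction of reads served from the cache for a sequence of reads (R) and writes (W) to one object.

```java
class InvalidationDemo {
  // Fraction of reads served from a read-through cache for one object.
  // 'R' = read, 'W' = write; every write invalidates the cached copy.
  static double hitRatio(String ops) {
    boolean cached = false;
    int reads = 0, hits = 0;
    for (char op : ops.toCharArray()) {
      if (op == 'R') {
        reads++;
        if (cached) hits++;   // served from memory
        else cached = true;   // miss: load from the database and fill the cache
      } else if (op == 'W') {
        cached = false;       // write: the cached copy is stale and must be renewed
      }
    }
    return reads == 0 ? 0.0 : (double) hits / reads;
  }

  public static void main(String[] args) {
    System.out.println(hitRatio("RRRRRRRRRR")); // read-mostly: prints "0.9"
    System.out.println(hitRatio("RWRWRWRWRW")); // frequently modified: prints "0.0"
  }
}
```

In the write-heavy case every read still pays for filling the cache, yet no read is ever served from it, so caching adds work without any benefit.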
How can we help improve the performance of web applications by finding better cache configurations? 
CacheOptimizer - finding optimal cache configurations by leveraging DevOps. DevOps tries to combine the developer and administrator worlds to further improve software development. One main concept of DevOps is therefore to assist software development by understanding user behaviour (such as knowing how the application is used in production). But what kind of data can we analyze to recover user behaviour? One potential way is to instrument the application to record the information that we need. However, this solution is not acceptable for applications running in production, since instrumentation adds too much overhead to the application.
Instead, we propose a framework, called CacheOptimizer, that analyzes readily-available application runtime logs to automatically help developers find the optimal cache configurations when using application-level caching frameworks. 
Understanding user behaviour by combining static code analysis and runtime logs. CacheOptimizer analyzes web access logs to understand how users are accessing the application. Such access logs are automatically generated by web servers like Tomcat, so we do not add any extra overhead to the application. CacheOptimizer then links the logs to data accesses using static code analysis. For example, consider the following log line:
200 127.0.0.1 /user/getDetails/peter
The log line contains information about a user request. Here, 200 represents the HTTP status code and 127.0.0.1 represents the IP address of the user. Through static code analysis, we can find the method in the code that handles this particular user request. We can also uncover the type of data access that is called in this request-handler method (e.g., accessing user data in the database). Then, by combining the information in the log with the information obtained from the code, we know that the user request is reading detailed information about the user Peter from the database. In this simple example, since we are only reading user objects from the database, the optimal cache configuration would be to enable caching on the User class.
In short, given the logs, CacheOptimizer can recover how the users are actually using the application and can suggest the optimal cache configuration for the application. 
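A much-simplified sketch of this idea: parse access-log lines, map each request path to the entity its handler reads or writes, and suggest caching entities that are read far more often than they are written. The path-to-entity table stands in for what CacheOptimizer recovers through static analysis, and the read/write-ratio threshold is a hypothetical heuristic of ours; CacheOptimizer's actual analysis is described in the paper [2]. It assumes Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CacheAdvisorSketch {
  // Hypothetical output of static analysis: which entity each request handler reads or writes.
  record Access(String entity, boolean write) {}

  static final Map<String, Access> HANDLERS = Map.of(
      "/user/getDetails", new Access("User", false),
      "/user/update", new Access("User", true),
      "/order/view", new Access("Order", false),
      "/order/create", new Access("Order", true));

  // Parses "<status> <ip> <path>" and links the request to a data access; null if unknown or failed.
  static Access resolve(String line) {
    String[] parts = line.trim().split("\\s+");
    if (parts.length < 3 || !parts[0].equals("200")) return null;
    String path = parts[2];
    for (Map.Entry<String, Access> h : HANDLERS.entrySet())
      if (path.equals(h.getKey()) || path.startsWith(h.getKey() + "/")) return h.getValue();
    return null;
  }

  // Suggests entities with at least one read and at least minRatio reads per write.
  static List<String> suggest(List<String> log, double minRatio) {
    Map<String, int[]> counts = new TreeMap<>(); // entity -> {reads, writes}
    for (String line : log) {
      Access a = resolve(line);
      if (a != null) counts.computeIfAbsent(a.entity(), k -> new int[2])[a.write() ? 1 : 0]++;
    }
    List<String> result = new ArrayList<>();
    counts.forEach((entity, c) -> {
      if (c[0] > 0 && c[0] >= minRatio * c[1]) result.add(entity);
    });
    return result;
  }

  public static void main(String[] args) {
    List<String> log = List.of(
        "200 127.0.0.1 /user/getDetails/peter",
        "200 127.0.0.1 /user/getDetails/peter",
        "200 10.0.0.2 /order/create/42",
        "200 10.0.0.2 /order/view/42",
        "200 10.0.0.3 /order/create/43");
    System.out.println(suggest(log, 2.0)); // prints "[User]"
  }
}
```

Nothing is added to the running application: the only inputs are logs the web server already writes and a mapping derived offline from the code.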
CacheOptimizer brings significant performance improvement. We apply CacheOptimizer to three open source applications. We find that CacheOptimizer can significantly improve the throughput of the application (it can process more requests per second)! 
We compare CacheOptimizer with several different cache configurations: CacheAll, where we simply enable caching on all objects; DefaultCache, where we use the default cache configuration that already exists in the studied application; and NoCache, where we disable all caches. The figure below shows the cumulative throughput when using different cache configurations in one of the studied applications. We can see that by using CacheOptimizer, we achieve much better throughput than with all the other configurations.
Assisting development using production data. Finding the optimal cache configuration is just one way that we can assist software development by leveraging production data. Due to the agile nature of modern software development, many software design and development decisions are directly affected by how users actually use the application. The idea of DevOps brings a new set of research challenges and interesting problems. As software practitioners and researchers, can we think of new ways to assist software development by embracing DevOps?
You can find more details about CacheOptimizer in our research paper [2].
[1] Ehcache. http://www.ehcache.org/. Last accessed Feb. 6, 2017.
[2] Tse-Hsun Chen, Weiyi Shang, Ahmed E. Hassan, Mohamed Nasser, and Parminder Flora. 2016. CacheOptimizer: helping developers configure caching frameworks for hibernate-based database-centric web applications. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2016).

Sunday, February 26, 2017

Sustainable Software Design

by Martin Robillard, McGill University (@mp_robillard)
Associate Editor: Christoph Treude (@ctreude)

cross-posted from Martin Robillard's blog

There's a lot of interest in understanding software design as an activity. But what happens to the outcome of this activity? At one extreme, what happens on the whiteboard stays on the whiteboard (design is never explicitly captured). At the other extreme, design information is meticulously captured and archived, and then invalidated by the first decent refactoring. So without explicit effort, design knowledge just disappears. The consequence is that, without design information, developers will have to perform ignorant surgery. The temptation for ignorant surgery has to be related to the cost of maintaining accessible design information. When a particular design is expensive to describe and maintain, its description and rationale are at risk of being lost, no matter how awesome the underlying design ideas are.

So why don't we explicitly consider how expensive a particular design decision will be to capture and maintain consistent with the code, before adopting it?

There exist quality models for software design. They include attributes like reusability, flexibility, and understandability. But, as far as I can tell, we don't yet have an attribute that captures how cost-effective it is to describe a set of design decisions over time. That attribute is what I would call sustainability.

There's no techno-fix to make design more sustainable. It's a complex, multi-faceted problem. In this paper I review the areas of software development research and practice that relate to design sustainability and explain why they are not silver bullets. These include:
  • Modularity can contribute to design sustainability when design decisions map to code organization concerns. Unfortunately many design decisions don't have much to do with modular decomposition. There's also the issue that many concerns cannot easily be modularized.
  • The relation between documentation and sustainable design is paradoxical. On one hand, good documentation can help sustain a design over time. On the other, documentation is expensive, which is a direct factor of unsustainability. The idea of fully self-documenting systems is appealing, but impractical. Parnas and Clements compared it to the Philosopher's Stone.
  • Programming language constructs are another tool to manage sustainability. For example, assert statements are a cheap and relatively user-friendly way to capture simple design rules and assumptions. There's also research to develop language support for stating and verifying more complex design-level properties, such as immutability. The related challenge for design sustainability is that language-supported specifiable properties form a closed set of low-level concerns, whereas the set of possible design decisions is open and ranges over different abstraction levels.
  • Design patterns offer a natural map between parts of a system and a set of design rules and even their rationale. The major limitation here is that by definition, patterns are solutions to common problems, whereas there are many idiosyncratic design problems in software projects.
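As a small illustration of the point about assert statements, a simple design rule can be recorded right next to the code that must honor it. The Schedule class below is hypothetical; the assertion is only checked when the JVM runs with assertions enabled (java -ea), which is part of what keeps it cheap.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class Schedule {
  private final List<Integer> times = new ArrayList<>();

  // Inserts a time at its sorted position.
  void add(int time) {
    int i = Collections.binarySearch(times, time);
    times.add(i < 0 ? -i - 1 : i, time);
    // Design rule, stated where it must hold: times are always in ascending order.
    assert isSorted() : "design rule violated: schedule must stay sorted";
  }

  List<Integer> view() {
    return Collections.unmodifiableList(times);
  }

  private boolean isSorted() {
    for (int k = 1; k < times.size(); k++)
      if (times.get(k - 1) > times.get(k)) return false;
    return true;
  }

  public static void main(String[] args) {
    Schedule s = new Schedule();
    s.add(5);
    s.add(1);
    s.add(3);
    System.out.println(s.view()); // prints "[1, 3, 5]"
  }
}
```

The assertion documents the rule and fails loudly if a later change breaks it, but it can only express a low-level property of this one class, which is exactly the limitation noted above.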
So, how do we move towards sustainable design? Maybe we can draw a lesson from gardening. At this point in our history we have the technology to grow anything anywhere. But the best way for a garden to stay alive with minimum effort is to select plants that are a good match for a specific environment. Likewise with design decisions. Different projects and systems have characteristics that are the equivalent of different types of soil, luminosity conditions, and humidity. So we have to figure out how to select and nurture the design decisions that will thrive in these conditions.

References