Monday, January 30, 2017

The Unified ASAT Visualizer (UAV), a tool for comparing multiple ASATs on your Java projects


Authors: Moritz Beller (@Inventitech), Andy Zaidman (@azaidman), Tim Buckers, Clinton Cao, Michiel Doesburg, Boning Gong, Sunwei Wang
Editors: Nikolaos Tsantalis (@NikosTsantalis), Mei Nagappan (@meinagappan)

TL;DR: You want to compare whether using multiple Automated Static Analysis Tools (ASATs) makes sense for your projects, say whether the additional warnings of using Checkstyle on top of FindBugs are worth the increased maintenance costs. We have implemented a prototypical tool to visualize this in an intuitive manner, called UAV. Go check it out on an example project. Here follows the full story.

UAV in action

When we look at the software engineering landscape today, we see that state-of-the-art projects already make use of Automated Static Analysis Tools (ASATs) in some form or another: either as traditional tools such as JSLint, PVS-Studio, FindBugs, and Google's Error Prone, as web services such as Coverity Scan, or as compiler-embedded analyses like the ones you get from clang. In fact, ASATs have become so numerous for any given programming language that it is difficult for us as project managers to decide which ones benefit our projects the most. While deciding on one tool might still be simple, deciding on an effective combination of tools is even more difficult, even though we know that combining multiple ASATs would unleash their potential [1]. The complementary warnings that different ASATs find often combine nicely: think, for example, of FindBugs' bug-finding capabilities and the readability-centered Checkstyle.

However, many developers understandably refrain from running more than one ASAT, mainly for two reasons:
1. It is difficult to compare and understand the strengths of multiple ASATs on your own project. Currently, in most cases, you would have to go through the (possibly) lengthy list of findings which each tool emits. This is tedious work since the warnings are not standardized, so it is difficult to tell if the two tools indeed find different warnings or not. 
2. Sifting through ASAT warnings in general is hard work. You don't want to make your life harder by including more tools than necessary. We know that many ASAT warnings are not important to developers, so finding (or configuring) an additional ASAT that reports only interesting warnings for your project is crucial. However, this is again made difficult because there is no common and automatically applicable classification between the warnings that different ASATs emit. As an end result of these complications, many projects still only employ one ASAT with practically no further customization [1].
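To make the comparison problem concrete, here is a minimal sketch of what a common classification would buy us. It is not UAV's implementation: the mapping table, the category names, and the helper functions are assumptions made purely for illustration. The idea is to map each tool-specific rule to a shared category and then check which findings at the same source location are reported by several tools and which are unique to one.

```python
from collections import defaultdict

# Hypothetical mapping from tool-specific rules to shared categories.
# The pairing of rules and the category names are illustrative assumptions,
# not an official classification.
COMMON_CATEGORY = {
    ("Checkstyle", "UnusedImports"): "unused-code",
    ("PMD", "UnusedPrivateField"): "unused-code",
    ("PMD", "EmptyCatchBlock"): "error-handling",
    ("FindBugs", "DE_MIGHT_IGNORE"): "error-handling",
    ("FindBugs", "NP_NULL_ON_SOME_PATH"): "null-dereference",
}


def normalize(warnings):
    """Group (tool, rule, file, line) warnings by (file, line, category)."""
    grouped = defaultdict(set)
    for tool, rule, path, line in warnings:
        category = COMMON_CATEGORY.get((tool, rule), "uncategorized")
        grouped[(path, line, category)].add(tool)
    return grouped


def overlap_report(warnings):
    """Split findings into those reported by several tools and those unique to one."""
    shared, unique = {}, {}
    for key, tools in normalize(warnings).items():
        target = shared if len(tools) > 1 else unique
        target[key] = sorted(tools)
    return shared, unique


# Made-up example warnings, one tuple per finding.
warnings = [
    ("PMD", "EmptyCatchBlock", "src/Foo.java", 42),
    ("FindBugs", "DE_MIGHT_IGNORE", "src/Foo.java", 42),
    ("Checkstyle", "UnusedImports", "src/Foo.java", 3),
    ("FindBugs", "NP_NULL_ON_SOME_PATH", "src/Bar.java", 17),
]
shared, unique = overlap_report(warnings)
for location, tools in shared.items():
    print("reported by several tools:", location, tools)
for location, tools in unique.items():
    print("reported by one tool only:", location, tools)
```

In this toy input, PMD and FindBugs overlap on the swallowed exception in Foo.java, while the other two findings are complementary. Without a shared classification, that overlap would have to be spotted by reading both reports side by side.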

With UAV, the Unified ASAT Visualizer, we created an ASAT-comparison tool with an intuitive visualization that enables developers, researchers, and tool creators to compare the complementary strengths and overlaps of different Java ASATs. UAV’s enriched treemap and source code views provide its users with a seamless exploration of the warning distribution from a high-level overview down to the source code. We have evaluated our UAV prototype in a user study with ten second-year Computer Science (CS) students and a visualization expert, and tested it on large Java repositories with several thousand PMD, FindBugs, and Checkstyle warnings.

TS;WM (Too Short; Want more): We have a tool paper that goes into many of UAV's implementation details [2].


[1] Moritz Beller, Radjino Bholanath, Shane McIntosh, Andy Zaidman: Analyzing the State of Static Analysis: A Large-Scale Evaluation in Open Source Software. In 23rd IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER), Osaka (Japan), 2016.

[2] Tim Buckers, Clinton Cao, Michiel Doesburg, Boning Gong, Sunwei Wang, Moritz Beller and Andy Zaidman: UAV: Warnings from Multiple Automated Static Analysis Tools at a Glance. In 24th IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER), Klagenfurt (Austria), 2017.

Monday, January 16, 2017

IEEE November/December Issue, Blog, SE Radio Summary

November/December Issue

IEEE Software Magazine

The November/December issue of IEEE Software offers a variety of relevant and interesting topics in the software world. From hot topics like crowdsourcing and agile to thought-provoking discussions on how research translates to practice, this issue spans a wide range of topics. Tying together all the articles in this issue is an article on telling the story of computing and the role computers play in the art of storytelling. We as software engineers are artists, specializing in the art of technology and "using our software and our hardware as our brush and our canvas".

Featured in this issue are two articles on changes in the software world that affect the developer and the end user:
  • "A Paradigm Shift for the CAPTCHA Race: Adding Uncertainty to the Process" by Shinil Kwon and Sungdeok Cha, where the authors propose ways to improve CAPTCHA (Completely Automated Public Turing Test to Tell Computers and Humans Apart) challenges for increased human ability and decreased bot ability to solve these challenges; and 
  • "Examining the Rating System Used in Mobile-App Stores" by Israel J. Mojica Ruiz, Meiyappan Nagappan, Bram Adams, Thorsten Berger, Steffen Dienst, and Ahmed E. Hassan, in which the authors explore how accurately user ratings in app stores map to actual user satisfaction levels with mobile apps.

A large portion of the papers in this issue discuss the artistry of the software architect and how the role of software architect has been changing, and will continue to change, with the changes in technology:


On one side, as technology changes, the importance of the role of the software architect increases. In "The Changing Role of the Software Architect," Editor in Chief Diomidis Spinellis discusses this phenomenon in some detail. As software evolves to play a more ubiquitous role in our lives and store more critical and personal information, the design of our software and systems becomes even more vital to the potential for quality, secure transactions. For example, software architecture plays a direct role in the ability of attackers to find and manipulate attack surfaces, or the places where enemies can target their attacks on a given system. This is such an important topic that research has been devoted to approximating and minimizing attack surfaces [1, 2, 3]. Although approximating an attack surface isn't necessarily an architecture problem, minimizing it is. The power to determine the design of a system, especially a critical system, is one that should not be taken lightly.

But as Benjamin Parker warned Spider-Man, "with great power comes great responsibility". If software architects are becoming more important to the software development and maintenance process, it naturally follows that their responsibilities can, and probably should, change. But how? Articles in this issue make some suggestions. For example, Rainer Weinreich and Iris Groher propose one change to the responsibilities of the software architect in their article "The Architect's Role in Practice: From Decision Maker to Knowledge Manager?". The authors interviewed practitioners to learn how the role of the architect has transformed. Architects are typically tasked primarily, if not solely, with making decisions regarding the design of the target system. However, the authors discovered that being a software architect comes with additional responsibilities, such as acting as advisor and knowledge manager. All the practitioners the authors interviewed agreed that, when it comes to knowledge management, it is particularly important to document project-specific decisions. With the changes to the software architect role, there is a growing need for tools and guidelines to support architects' daily activities. Are we up for the challenge?

IEEE Software Blog

In the past couple of months, the IEEE Software Blog covered some interesting and practically relevant topics. New to the blog are postmortems, modeled after the postmortems on Gamasutra.com, where we give companies an opportunity to discuss what is working and what challenges remain for software developers. December features the company Deducely. Along the same lines, there are blog posts regarding various aspects of the software development process, including using creativity in requirements engineering and how to identify and avoid code smells. Also featured in the November/December blog entries is a post on the panel titled "The State of Software Engineering Research", which was held last year at FSE 2016.

SE Radio

Featured for this issue of IEEE Software on SE Radio are topics ranging from soft skills, such as salary negotiation, to hard skills, like site reliability engineering and software estimation. Invited guests include Steve McConnell, Sam Aaron, Josh Doody, Björn Rabenstein, Gil Tene, and Peter Hilton. Also, SE Radio welcomed two new members to the SE Radio team: Marcus Blankenship and Felienne Hermans.


References
[1] Theisen, C., Herzig, K., Morrison, P., Murphy, B., & Williams, L. (2015, May). Approximating attack surfaces with stack traces. In Proceedings of the 37th International Conference on Software Engineering-Volume 2 (pp. 199-208). IEEE Press.
[2] Bartel, A., Klein, J., Le Traon, Y., & Monperrus, M. (2012, September). Automatically securing permission-based software by reducing the attack surface: An application to Android. In Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering (pp. 274-277). ACM.
[3] Manadhata, P. K., & Wing, J. M. (2011). An attack surface metric. IEEE Transactions on Software Engineering, 37(3), 371-386.

Sunday, January 8, 2017

BarraCUDA in the Cloud

by W.B. Langdon, UCL and Bob Davidson, Microsoft
Associate Editor: Federica Sarro (@f_sarro), University College London, UK 

What is this, flying fishes? Well, no. BarraCUDA is the name of a Bioinformatics program and the cloud in question is Microsoft’s Azure, which is in the process of being upgraded with copious nVidia Tesla K80 GPUs, which support CUDA inside virtual machine instances. BarraCUDA has been around for a few years [1]. It is a port of BWA [2] which takes advantage of the massive parallelism available on graphics hardware (GPUs) to greatly speed up approximate matching of millions of short DNA strings against a reference genome, for example the human reference genome [3]. Approximate matching is necessary partly because of noise, but primarily because the medical purpose of many DNA scans is to reveal differences between them and “normal” (i.e. reference) DNA. A typical difference is the substitution of one character for another, but tools like BarraCUDA also find matches where a character is inserted and where one is deleted. Although there are many sources of DNA data, BarraCUDA and similar programs are targeted at strings generated by “Next Generation Sequencing” (NGS) machines. These are amazing devices. A top-end NGS machine is now capable of generating more than a billion DNA strings, sequences of A, C, G or T letters. Part of the trade-off for this speed is that the strings are short (typically a hundred letters long) and noisy. The first step is to find where the short fragments of DNA came from by aligning the strings against a reference genome. To account for the various sources of noise, NGS is usually run with threefold redundancy, and sometimes a particularly important part of a person’s genome may be scanned ten or more times. Given multiple alignments to the same part of the reference genome, it becomes possible to look for consistent variations.
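To illustrate what approximate matching with substitutions, insertions and deletions means, here is a small textbook sketch. It is not how BarraCUDA or BWA work internally (they search a Burrows-Wheeler index and run in parallel); it is a plain dynamic-programming edit distance in which the match may start and end anywhere in the reference. The function name and the toy sequences are our own assumptions.

```python
def min_edits_in_reference(read, reference):
    """Smallest number of substitutions, insertions and deletions needed
    to turn `read` into some substring of `reference`."""
    # Row 0 is all zeros: a match may start at any reference position for free.
    previous = [0] * (len(reference) + 1)
    for i, base in enumerate(read, start=1):
        # Column 0: the first i read letters matched against nothing.
        current = [i]
        for j, ref_base in enumerate(reference, start=1):
            current.append(min(
                previous[j - 1] + (base != ref_base),  # match or substitution
                previous[j] + 1,                       # read letter missing from reference
                current[j - 1] + 1,                    # reference letter missing from read
            ))
        previous = current
    # The match may also end at any reference position.
    return min(previous)


reference = "TTACGTTT"
print(min_edits_in_reference("ACGT", reference))  # exact occurrence
print(min_edits_in_reference("AGGT", reference))  # one substituted letter
print(min_edits_in_reference("ACT", reference))   # one letter missing from the read
```

The cost of this sketch grows with the product of the read and reference lengths, which is exactly why tools aimed at a billion reads and a whole genome need a compact index and massive parallelism instead.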

BWA, BarraCUDA and Bowtie are members of a family of Bioinformatics tools which have proved successful because they are able to compress the human reference genome into less than 4 gigabytes of RAM, making it possible to run an important part of the DNA analysis tool chain on widely available computers. Indeed, in the case of BarraCUDA, GPUs with 4GB are also widely available. Recently BarraCUDA was optimised using genetic improvement [4,5] (see blog posting February 3, 2016). This update prompted the question of whether BarraCUDA could also be used with epigenetics data.
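As the title of [2] indicates, BWA builds on the Burrows-Wheeler transform (BWT). The toy sketch below shows the principle under simplifying assumptions: it builds the transform naively by sorting all rotations (fine for a few letters, hopeless for a genome) and then counts exact occurrences of a pattern using only the transformed string. The function names are ours, and real aligners use far more memory-efficient constructions and extend the search to tolerate mismatches.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (toy-sized inputs only)."""
    text += "$"  # end marker; sorts before A, C, G, T in ASCII
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(pattern, transformed):
    """Count exact matches of `pattern` by backward search over the BWT."""
    # first_row[c] = number of letters in the text that sort before c.
    first_row, total = {}, 0
    for letter in sorted(set(transformed)):
        first_row[letter] = total
        total += transformed.count(letter)

    lo, hi = 0, len(transformed)  # current range of sorted rotations
    for letter in reversed(pattern):
        if letter not in first_row:
            return 0
        # Narrow the range to rotations that start with the letters seen so far.
        lo = first_row[letter] + transformed[:lo].count(letter)
        hi = first_row[letter] + transformed[:hi].count(letter)
        if lo >= hi:
            return 0
    return hi - lo


index = bwt("ACGTACGTTACG")
print(count_occurrences("ACG", index))
print(count_occurrences("GG", index))
```

Note that the search never looks at the original text again. Together with compact ways of answering the `count` queries, this is what lets an index of a whole reference genome fit into a few gigabytes.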

To grossly oversimplify, whilst (to a good approximation) all the cells in your body contain the same DNA, what makes your cells different from each other is how that DNA is used. It is thought that to a large extent how DNA is enabled and disabled is controlled by epigenetic markers on the DNA itself. These epigenetic markers differ between cells. Indeed the markers change not only between cells but also with the person’s age and with factors outside the cell. Since this is not fully understood, the study of epigenetics, particularly how it relates to disease, is a very active topic. Much of the Next Generation Sequencing technology can be reused by epigenetics. However, when matching epigenetic sequences against a reference, the reference is twice the size of the DNA reference. Fortunately this need has coincided with the launch of GPUs with larger memory (e.g. the Tesla K40 has 12GB), which in turn has coincided with the introduction of Azure cloud nodes with multiple K40s or K80s. Recently we have been benchmarking [6] BarraCUDA on Azure nodes, using epigenetics data supplied by Cambridge Epigenetics.

 
Data from Nvidia

As of 30 Nov 2016 there were 1519 GPU articles in the USA National Library of Medicine (PubMed), 1221 (80%) of them published since the end of 2009.


References
[1]  P. Klus et al., BarraCUDA. BMC Research Notes, 5(27), 2012.
[2]  Heng Li, R. Durbin, Fast and accurate short read alignment with Burrows-Wheeler transform.  Bioinformatics 2009, 25(14):1754-1760.
[3]  Initial sequencing and analysis of the human genome. Nature 409, 6822, (15 Feb 2001), 860–921.
[4]  W.B. Langdon. Genetically improved software. In Amir H. Gandomi et al., editors, Handbook of Genetic Programming Applications, chapter 8, pages 181–220. Springer, 2015.
[5]  W.B. Langdon, Brian Yee Hong Lam, M. Modat, J. Petke, M. Harman. Genetic Improvement of GPU Software. Genetic Programming and Evolvable Machines. Online first.
[6] W.B. Langdon, A. Vilella, Brian Yee Hong Lam, J. Petke and M. Harman, Benchmarking Genetically Improved BarraCUDA on Epigenetic Methylation NGS datasets and nVidia GPUs. In Genetic Improvement 2016, (GECCO 2016) workshop, pages 1131-1132, 20-24 July, Denver.

You might also enjoy reading
Genetic Improvement, IEEE Software blog, February 3, 2016
GPGPUs for bioinformatics, Oxford Protein Informatics Group, April 17, 2013
Genome Sequencing in a Nutshell, Deborah Siegel and Denny Lee, May 24, 2016