Sunday, October 23, 2016

Tabs or Spaces? What can we find by looking at white space?

By: Dale Fletter, California State University, Sacramento. USA (
Associate Editor: Bogdan Vasilescu, Carnegie Mellon University. USA (@b_vasilescu)

The TV show Silicon Valley had a scene that satirized a common cultural rift among coders: spaces or tabs? [1] What difference does it make? [2] I argue that the question is important, because it shows that the software engineering community has not yet matured its understanding of the processes and artifacts at the core of its work.

Compilers for most languages, even those that use block-structured statements, ignore most white space. To them, a tab and a run of spaces are the same. Many decades ago, software engineers found value in "pretty printing", that is, inserting white space with the intent of visually communicating the underlying blocks in the structure. [3] To the person writing the code, both choices render identically on their own screen. Those who prefer the tab cite the efficiency with which a coder can transfer their logic into the source code. Those who prefer spaces cite the control they have over the presentation of the code.

The difference between them becomes clear as soon as one cuts and pastes from an editor or IDE into a word processing document. Word processors were designed to carry forward the metaphor of typewriter tab settings, where the typist set the tab stops manually, and pasting text is simply a shortcut for typing it in. To get the same presentation in the word processing document as in the text editor used for composition, the software engineer must set the tab stops to match the IDE and choose a monospaced font that mimics the text editor. Failing to do either will typically render the code in a way that differs significantly from what the coder intended, including the visual indication of blocks.
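To make the dependence on tab stops concrete, here is a minimal Java sketch. The `TabStopDemo` class and its `expandTabs` helper are hypothetical names introduced for illustration, not part of any editor or word processor, and the helper assumes evenly spaced tab stops every `tabWidth` columns.

```java
// Hypothetical helper: renders a line the way a viewer with evenly spaced
// tab stops would, by replacing each tab with spaces up to the next stop.
class TabStopDemo {
    static String expandTabs(String line, int tabWidth) {
        StringBuilder out = new StringBuilder();
        for (char c : line.toCharArray()) {
            if (c == '\t') {
                // Pad up to the next multiple of tabWidth.
                int pad = tabWidth - (out.length() % tabWidth);
                for (int i = 0; i < pad; i++) {
                    out.append(' ');
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String tabIndented = "\tif (ready) {";
        // The same characters, shown with two different tab-stop settings.
        System.out.println("[" + expandTabs(tabIndented, 4) + "]");
        System.out.println("[" + expandTabs(tabIndented, 8) + "]");
    }
}
```

The tab-indented line occupies a different number of columns under each setting, whereas a line indented with spaces passes through the helper unchanged.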

If the only process that depended upon the visual presentation of the code were the compiler, for most languages it would be irrelevant. But the software engineer is a reader of code and the speed, accuracy and completeness with which the program is communicated is a key contributor to code comprehension. To change that presentation is to ignore the contribution the coder made to ensuring a particular layout on their device. This more expansive way of looking at code not just as a means to instruct the machine but as a human utterance worthy of consideration on its own merits is an exciting new direction of research. [4]

It is also possible that this is not a transitory effect of personal preference. It parallels an equally divisive style issue in legal writing. Due to the limitations of the typewriter and the need for comprehension in legal writing, the convention of a double space after a period was adopted. This artifact has continued into the present day despite the objections of typographers, who point out that it is not needed with a proportionally spaced font. Nor is it just the local choice of each law office: the double space following a period is found among different legal publishers as well.

One approach to the problem is seen in the many IDEs that automatically format code to conform to style rules for the presentation of blocks. There are problems with this approach. First, it leaves the choice of whether to format to the coder. Given the choice, some coders will use the tool and convert to the standard for that IDE, but as we know, code is often edited with different editors by different people on different platforms. The result is source code that has a mixture of tabs and spaces. Another problem is that coders who invest a great deal of time inserting spaces to achieve a visual effect can have their work undone by the well-intentioned rules. And for the coder who believes they achieve greater efficiency by using a tab instead of 4 spaces, this is an imperfect compromise.
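A file that has drifted into a mixture of tabs and spaces can be detected mechanically. The sketch below is one simple way to do it in Java; the `IndentationAudit` class, its `Style` enum, and both methods are hypothetical names made up for this example, and the definition of "mixed" used here is an assumption rather than any IDE's actual rule.

```java
import java.util.Arrays;
import java.util.List;

class IndentationAudit {
    enum Style { NONE, TABS, SPACES, MIXED }

    // Classifies the leading white space of a single line.
    static Style classify(String line) {
        boolean tabs = false;
        boolean spaces = false;
        for (char c : line.toCharArray()) {
            if (c == '\t') {
                tabs = true;
            } else if (c == ' ') {
                spaces = true;
            } else {
                break; // first non-white-space character ends the indent
            }
        }
        if (tabs && spaces) return Style.MIXED;
        if (tabs) return Style.TABS;
        if (spaces) return Style.SPACES;
        return Style.NONE;
    }

    // Assumption: a file counts as mixed if any single line mixes tabs and
    // spaces, or if some lines indent with tabs and others with spaces.
    static boolean hasMixedIndentation(List<String> lines) {
        boolean sawTabs = false;
        boolean sawSpaces = false;
        for (String line : lines) {
            Style style = classify(line);
            if (style == Style.MIXED) return true;
            if (style == Style.TABS) sawTabs = true;
            if (style == Style.SPACES) sawSpaces = true;
        }
        return sawTabs && sawSpaces;
    }

    public static void main(String[] args) {
        List<String> file = Arrays.asList("class A {", "\tint x;", "    int y;", "}");
        System.out.println(hasMixedIndentation(file));
    }
}
```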

Is there any difference in coder productivity when the tab key is used? Or is this a way of expressing a personal choice in a task that is limited in its humanity and expressiveness? Is there a real time savings for those who insert a tab character compared with those who have developed the nervous tic of hitting the space bar four times instead? This may be worthy of analysis, since a coder may need to do this many times in the course of writing and editing code.

If the informative use of white space were limited to marking off blocks of code, this might be too insignificant an issue to justify the attention. But we all know those coders who find very creative ways to draw attention to parallel structures in their code with only the use of spaces and line feeds. This visual information is as important to retain as is the code itself. This suggests that the creation of a code file is not merely the recording of a string of text but instead a form of visual communication as well.

I believe this bit of comic relief informs the lack of understanding that is still pervasive in the community. Writing code that is understandable to a human is more important than merely marking blocks of code. A more complete look at how coders present their code can illustrate what they have found to be important clues to increased readability. Those who still prefer tabs seem to be blasé about the loss of control over presentation depending on the editor that is used to look at the text string.  

[3] Programming with small blocks, Mark K. Joseph. SIGSOFT Softw. Eng. Notes 9, 5 (October 1984), 28-42.
[4] New initiative: the naturalness of software. In Proceedings of the 37th International Conference on Software Engineering - Volume 2 (ICSE '15), Vol. 2. IEEE Press, Piscataway, NJ, USA, 543-546.
[5] A metric for software readability. Raymond P.L. Buse and Westley R. Weimer. In Proceedings of the 2008 International Symposium on Software Testing and Analysis (ISSTA '08). ACM, New York, NY, USA, 121-130.

Sunday, October 16, 2016

JDeodorant: Clone Refactoring Support beyond IDEs

By: Nikolaos Tsantalis, Concordia University, Montreal, Canada (@NikosTsantalis)
Associate Editor: Sonia Haiduc, Florida State University, USA (@soniahaiduc)

Why should I bother to refactor duplicated code in my project?
Duplicated code (also known as code clones) can be a big source of trouble for the maintenance of your project. Not all clones are harmful [1] or equally harmful [2]. However, if you find yourself repeatedly applying the same changes (e.g., bug fixes) in multiple places in your code, you should definitely consider refactoring [3] this duplication. Merging such clones into a single copy will make future maintenance faster and less error-prone.
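As a minimal illustration of why merging helps, consider the following Java sketch. The class and method names are invented for this example and do not come from JDeodorant or any real project. The rounding logic appears twice in the "before" methods, so a fix to it would have to be applied in both places; in the "after" methods it lives in one extracted method.

```java
class PricingClones {
    // Before: the rounding step is duplicated in both methods.
    static long netPriceBefore(long grossCents, int discountPercent) {
        long scaled = grossCents * (100 - discountPercent);
        return (scaled + 50) / 100; // round to the nearest cent (non-negative input)
    }

    static long taxBefore(long netCents, int ratePercent) {
        long scaled = netCents * ratePercent;
        return (scaled + 50) / 100; // same rounding step, copied
    }

    // After: the duplicated step is extracted into a single method.
    static long divideByHundredRounded(long scaled) {
        return (scaled + 50) / 100;
    }

    static long netPrice(long grossCents, int discountPercent) {
        return divideByHundredRounded(grossCents * (100 - discountPercent));
    }

    static long tax(long netCents, int ratePercent) {
        return divideByHundredRounded(netCents * ratePercent);
    }

    public static void main(String[] args) {
        // Both versions should agree on every input.
        System.out.println(netPrice(1999, 25) == netPriceBefore(1999, 25));
        System.out.println(tax(1499, 20) == taxBefore(1499, 20));
    }
}
```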

Why is tool support necessary for the refactoring of clones?
Software clones tend to diverge after the initial copy-and-paste. This happens because developers have to adjust the copied code to a new context and/or different requirements. Therefore, clones tend to have non-trivial differences as the project evolves (e.g., different methods being called, or completely different logic in parts of the implemented algorithm). Unfortunately, the current state-of-the-art IDEs support the refactoring of clones with only trivial differences (i.e., identical clones with variations in whitespace, layout, comments, and variable identifiers).

JDeodorant [4] is an Eclipse plugin that can help developers to inspect the differences between clones in Java projects, and refactor them safely, if possible. More specifically, JDeodorant applies a sophisticated static source code analysis to examine whether the parameterization of the differences appearing in the clones is free of side effects. We refer to this feature as refactorability analysis [5]. If all examined preconditions pass successfully, JDeodorant can proceed with the refactoring of the clones.
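To see why side effects matter when differences are parameterized, here is a hand-written Java sketch. It is only an illustration of the general concern, under the assumption that a difference is turned into a method parameter; it is not JDeodorant's analysis or its generated code, and all names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

class ParameterizationDemo {
    // Two clones whose only difference is the expression assigned to "label".
    static String reportA(String name) {
        String header = "Report";
        String label = name.toUpperCase();   // difference
        return header + ": " + label;
    }

    static String reportB(String name) {
        String header = "Report";
        String label = name.trim();          // difference
        return header + ": " + label;
    }

    // Merged version: the differing expression becomes a parameter. Both
    // expressions are free of side effects, so evaluating them at the call
    // site, before the body runs, does not change behavior.
    static String report(String label) {
        String header = "Report";
        return header + ": " + label;
    }

    // A clone whose differing expression has a side effect on the queue.
    static String takeOriginal(Deque<String> queue) {
        queue.addFirst("start");             // statement common to the clones
        return queue.pollFirst();            // difference: modifies the queue
    }

    // Naive parameterization: the caller now evaluates pollFirst() before
    // addFirst("start") runs, so the returned item changes.
    static String takeExtracted(Deque<String> queue, String item) {
        queue.addFirst("start");
        return item;
    }

    public static void main(String[] args) {
        System.out.println(report("ada".toUpperCase()).equals(reportA("ada")));
        Deque<String> q1 = new ArrayDeque<>(Arrays.asList("a", "b"));
        Deque<String> q2 = new ArrayDeque<>(Arrays.asList("a", "b"));
        System.out.println(takeOriginal(q1));
        System.out.println(takeExtracted(q2, q2.pollFirst()));
    }
}
```

In the second pair, the original and the naively extracted version return different items for the same queue, which is the kind of behavior change a precondition check is meant to rule out.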

Feature 1: Clone import and clone group exploration
  • JDeodorant can import results from 5 popular clone detection tools, namely CCFinder, ConQAT, NiCad, Deckard, and CloneDR. In addition, JDeodorant can analyze any pair of methods selected by the developer from the Eclipse Package Explorer view.
  • While importing the results, JDeodorant automatically checks the syntactic correctness of the clone fragments, and fixes any discrepancies by removing incomplete statements and adding the missing closing brackets from incomplete blocks of code. Additionally, the tool filters out the clones that extend beyond the body of a method (i.e., class-level clones).
  • The imported results are presented to the user in a tree-like view, as shown in Figure 1. The clones are organized into groups based on their similarity (i.e., a clone group contains two or more clone instances).
  • The clone groups are also analyzed to discover subclone relationships between them. Group A is a subclone of group B, if every clone instance in A is a sub-clone (i.e., a partial code fragment) of an instance in B. The subclone information appears as a link in the last column of the clone group table to help the user navigate between clone groups having such a relationship.
  • By clicking on the “Show only clone groups for the files opened in the editor” checkbox, the user can filter the clone group table to display only the clones relevant to the context (i.e., appearing in the files) they are currently working on.
  • All clones are constantly monitored for modifications. If the developer refactors or updates some code associated with the imported clones, the clone group table is automatically updated by disabling the clones affected by the modification (disabled clones appear with strikethrough text), and by re-computing the offsets of other clones belonging to the same modified Java files (shifted clones appear with text highlighted in green). In this way, the user can continue with the inspection and refactoring of other clone groups without having to import new results from external tools. 
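The subclone relationship described in the list above can be sketched in a few lines of Java. This is a simplified model under stated assumptions: a clone instance is represented only by a file name and a character range, containment is non-strict, and the `SubcloneCheck` and `CloneInstance` names are hypothetical rather than JDeodorant's own data structures.

```java
import java.util.Arrays;
import java.util.List;

class SubcloneCheck {
    // Assumed representation: a clone instance is a range of offsets in a file.
    static final class CloneInstance {
        final String file;
        final int start;
        final int end;

        CloneInstance(String file, int start, int end) {
            this.file = file;
            this.start = start;
            this.end = end;
        }

        // True if this fragment lies entirely inside the other fragment.
        boolean isContainedIn(CloneInstance other) {
            return file.equals(other.file) && start >= other.start && end <= other.end;
        }
    }

    // Group A is a subclone of group B if every instance in A is a
    // fragment of some instance in B.
    static boolean isSubcloneGroup(List<CloneInstance> a, List<CloneInstance> b) {
        for (CloneInstance inner : a) {
            boolean covered = false;
            for (CloneInstance outer : b) {
                if (inner.isContainedIn(outer)) {
                    covered = true;
                    break;
                }
            }
            if (!covered) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<CloneInstance> groupB = Arrays.asList(
                new CloneInstance("A.java", 0, 100), new CloneInstance("B.java", 50, 150));
        List<CloneInstance> groupA = Arrays.asList(
                new CloneInstance("A.java", 10, 40), new CloneInstance("B.java", 60, 90));
        System.out.println(isSubcloneGroup(groupA, groupB));
    }
}
```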

Figure 1: Presentation of the imported clone detection results to the user

Feature 2: Clone visualization and refactorability analysis
The user can right-click on any pair of clones from the same clone group, or any pair of methods from the Eclipse Package Explorer and select “Refactor Duplicated Code…” from the popup menu. The outcome of the clone pair analysis is presented to the user as shown in Figure 2.

Figure 2: Clone pair visualization and refactorability analysis

The analyzed clone fragments appear as two side-by-side trees, where each pair of tree nodes in the same row represents a pair of mapped statements in the first and second clone fragment, respectively. The user can inspect the clones in a synchronized manner, by expanding and collapsing the nodes corresponding to control statements (i.e., loops, conditionals, try blocks). The code is highlighted in 3 different colors to help the developer inspect and understand the differences between the clones.
  • Yellow: Represents differences in expressions between matched statements. These expressions are evaluated to the same type, but have a different syntactic structure or identifier.
  • Red: Represents unmapped statements that do not have a matching statement in the other clone fragment (also known as clone gaps).
  • Green: Represents semantically equivalent statements, i.e., statements of different AST types performing exactly the same functionality. In Figure 2, we can see a for loop in the left clone matched with a while loop in the right clone having the same initializer and updater as separate statements.
By hovering over a pair of statements highlighted in yellow, a tooltip appears providing semantic information about the type of each difference based on the program elements (e.g., variables, method calls, literals, class instantiations) appearing in the difference. Currently, JDeodorant supports over 20 difference types, including some more advanced ones, such as the replacement of a direct field access with the corresponding getter method call, and the replacement of a direct field assignment with the corresponding setter method call. In addition, the tooltip may also include information about precondition violations, if the expressions appearing in the differences cannot be safely parameterized.
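The following pair of methods is a hypothetical illustration of the kinds of differences described above; it is not the code shown in Figure 2. The left clone uses a for loop and a direct field access, while the right clone uses a while loop, with the initializer and updater as separate statements, and the corresponding getter.

```java
class EquivalentClones {
    static final class Counter {
        int total;                        // accessed directly by the left clone

        int getTotal() { return total; }  // used by the right clone
    }

    // Left clone: for loop, direct field access in the return statement.
    static int sumLeft(int[] values, Counter counter) {
        for (int i = 0; i < values.length; i++) {
            counter.total += values[i];
        }
        return counter.total;
    }

    // Right clone: while loop with a separate initializer and updater,
    // and a getter call in place of the field access.
    static int sumRight(int[] values, Counter counter) {
        int i = 0;
        while (i < values.length) {
            counter.total += values[i];
            i++;
        }
        return counter.getTotal();
    }

    public static void main(String[] args) {
        int[] values = {1, 2, 3};
        System.out.println(sumLeft(values, new Counter()) == sumRight(values, new Counter()));
    }
}
```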

Semantically equivalent differences and renamed variables are not examined against preconditions, since they should not be parameterized. JDeodorant automatically detects the local variables that have been consistently renamed between the clone fragments (as shown in the bottom-right side of Figure 2).

Feature 3: Clone refactoring
Based on the location of the clones, JDeodorant determines automatically the best refactoring strategy:

1.  Extract Method (clones belong to the same Java file)
2.  Extract and Pull Up Method (clones have a common superclass)
    a) Introduce Template Method (clones call local methods from the subclasses)
    b) Extract Superclass (clones have an external common superclass, or the common superclass has additional subclasses)
3.  Introduce Utility Method (clones access/call only static fields/methods)
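One of the strategies above, Introduce Template Method, could look roughly like this after the refactoring has been applied. This is a hand-written sketch with hypothetical class names; it assumes two subclasses that originally contained the same export code and differed only in the local method they called, and it does not reproduce the code JDeodorant generates.

```java
class TemplateMethodDemo {
    static abstract class Exporter {
        // Pulled-up method: the code that used to be duplicated in the subclasses.
        String export(String body) {
            return header() + "\n" + body + "\n" + "-- end --";
        }

        // The call that differed between the clones becomes an abstract hook.
        abstract String header();
    }

    static final class CsvExporter extends Exporter {
        @Override
        String header() { return "# csv"; }
    }

    static final class XmlExporter extends Exporter {
        @Override
        String header() { return "<!-- xml -->"; }
    }

    public static void main(String[] args) {
        System.out.println(new CsvExporter().export("a,b"));
        System.out.println(new XmlExporter().export("<row/>"));
    }
}
```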

As shown in Figure 3, JDeodorant can generate a detailed preview of the refactoring to be applied, where the developer can inspect all the changes that will take place at a fine-grained level. Finally, the user can undo and redo the applied refactorings, since they are recorded in the change history of Eclipse.

Figure 3: Clone refactoring preview

JDeodorant is an open-source project hosted on GitHub.
Videos demonstrating the use and features of JDeodorant can be found on YouTube.

[1] Cory J. Kapser and Michael W. Godfrey, "'Cloning considered harmful' considered harmful: patterns of cloning in software," Empirical Software Engineering, vol. 13, no. 6, pp. 645-692, December 2008.
[2] Foyzur Rahman, Christian Bird, and Premkumar Devanbu, "Clones: what is that smell?," Empirical Software Engineering, vol. 17, no. 4-5, pp. 503-530, August 2012.
[3] Emerson Murphy-Hill, Don Roberts, Peter Sommerlad, and William F. Opdyke, "Refactoring [Guest editors' introduction]," IEEE Software, vol. 32, no. 6, pp. 27-29, November-December 2015.
[4] Davood Mazinanian, Nikolaos Tsantalis, Raphael Stein, and Zackary Valenta, "JDeodorant: Clone Refactoring," 38th International Conference on Software Engineering (ICSE'2016), Formal Tool Demonstration Session, Austin, Texas, USA, May 14-22, pp. 613-616, 2016.
[5] Nikolaos Tsantalis, Davood Mazinanian, and Giri Panamoottil Krishnan, "Assessing the Refactorability of Software Clones," IEEE Transactions on Software Engineering, vol. 41, no. 11, pp. 1055-1090, November 2015.

Sunday, October 9, 2016

What's in Repeated Requirements Research for Practitioners?

By: Nan Niu, University of Cincinnati
Associate Editor: Mehdi Mirakhorli (@MehdiMirakhorli)

For many things said in requirements engineering (RE) research, practitioners may or may not know whether they can apply the research in their current or future projects. For the things said and done (i.e., evaluated by the RE researchers), practitioners can gain a more detailed understanding of the research's scope of applicability. For the same research done repeatedly, practitioners are much better informed: they know more about the conditions under which the research can and cannot be applied, what benefits to expect and how large they are, what limitations there are and how to overcome them, and more importantly, what stays the same and what varies when the research is repeated.

In a replication study, we re-tested the work by Easterbrook and his colleagues [1]. They reported that, when approaching a conceptual modeling problem, it was better to build many fragmentary models representing different perspectives than to attempt to construct a single coherent model. Their case study, illustrated by the following figure, was carried out by two teams using different processes to build i* models for the Kids Help Phone (KHP) organization [1]: The global (G) team worked together whereas the viewpoints (V) team worked individually on separate, loosely coupled, yet overlapping models before explicitly merging their viewpoints together.

The results? The V team gained a richer domain understanding than the G team. The take-away for practitioners? Adopting viewpoints in requirements modeling, especially for multi-stakeholder, socio-technical, large-scale, distributed projects. Well, not that fast. The V team's richer domain understanding was gained, according to [1], at the cost of slowness. That is, the viewpoints process was so slow that no merged model was ever produced. That's why only the model slices (shown in the above figure) were presented to the KHP stakeholders. Viewpoints or not? To practitioners, the results in [1] were mixed at best.

How have things changed since [1]? We took advantage of theoretical replication to improve the study design. Among the improvements, we paid specific attention to the i* modeling tools developed in the past decade. We asked our G and V teams to use OpenOME [2] in constructing their models for the Scholar@UC project [3]. Our results? Not only did our study confirm the deeper domain understanding achieved by the V teams, but viewpoints modeling was also no longer slower. In fact, with OpenOME, the 2 V teams in our study spent less time generating the final, integrated models than the 2 G teams.

The take-away from our repeated research? Viewpoints-based requirements modeling is a valuable approach for practitioners in many domains, such as IoT and smart cities, because the process leads to a better understanding of hidden assumptions, stakeholder disagreements, and new requirements. With the tech transfer of research tools like OpenOME, the more valuable process also becomes faster and more practical.

[1] S. Easterbrook, E. Yu, J. Aranda, Y. Fan, J. Horkoff, M. Leica, and R. Qadir, "Do Viewpoints Lead to Better Conceptual Models? An Exploratory Case Study," in 13th IEEE International Requirements Engineering Conference (RE) Paris, France: IEEE Computer Society, 2005, pp. 199-208.
Interested in our study? We welcome your feedback and invite you to replicate.
  • N. Niu, A. Koshoffer, L. Newman, C. Khatwani, C. Samarasinghe, and J. Savolainen, "Advancing Repeated Research in Requirements Engineering: A Theoretical Replication of Viewpoints Merging," in 24th IEEE International Requirements Engineering Conference (RE) Beijing, China: IEEE Computer Society, 2016, pp. 186-195. (pre-print)
  • Our replication packet hosted by Scholar@UC: