Sunday, November 6, 2016

When and Why Your Code Starts to Smell Bad

By: Michele Tufano, College of William and Mary, USA (@tufanomichele)
Associate Editor: Sonia Haiduc, Florida State University, USA (@soniahaiduc)

Have you ever tried to modify a large class with too many methods? And what about those unnecessarily complicated, multi-level nested loops? How did you feel about them? These are code smells, and they can make the evolution of your system a nightmare.

More formally, code smells are clues that indicate violations of fundamental design principles, and they negatively impact design quality [1]. Several studies have demonstrated the negative impact of code smells on change- and fault-proneness [2], software understandability [3], and maintainability [4] [5].

The question is: when and why are code smells introduced? Common wisdom suggests that they are introduced during maintenance and evolution activities on software artifacts. However, this conjecture has never been empirically verified. In this work, we answer these questions empirically by analyzing the complete change history of 200 Java software systems belonging to three ecosystems: Apache, Android, and Eclipse. We considered five types of code smells: Blob, Complex Class, Class Data Should Be Private, Functional Decomposition, and Spaghetti Code [1].


When? - To answer this question, we checked out every single commit of the analyzed systems and ran a code smell detector (DECOR [6]) on the Java classes introduced or modified in each commit. We also computed the values of quality metrics on those classes to obtain their evolutionary metric trends. These steps allowed us to (i) understand after how many modifications to a software artifact code smells are usually introduced, and (ii) compare the metric trends of clean and smelly software artifacts, looking for significant differences in how fast their metric values increase or decrease.
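Step (ii) can be illustrated with a minimal sketch. It is not the study's tooling: it assumes each class's metric history is already available as a list of values, one per commit that touched the class, and it summarizes a trend as the slope of a least-squares line over commit order, which is our assumption rather than the study's documented model. The helper names `trend_slope` and `compare_trends` and the sample histories are hypothetical.

```python
from statistics import mean, median


def trend_slope(values):
    """Least-squares slope of a metric over successive commits (x = 0, 1, 2, ...).

    Assumption: a trend is summarized by a straight-line fit. Histories with
    fewer than two points have no trend, so 0.0 is returned.
    """
    n = len(values)
    if n < 2:
        return 0.0
    xs = range(n)
    x_bar = mean(xs)
    y_bar = mean(values)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def compare_trends(clean_histories, smelly_histories):
    """Median slope of clean vs. smelly classes for a single metric."""
    clean = median(trend_slope(h) for h in clean_histories)
    smelly = median(trend_slope(h) for h in smelly_histories)
    return clean, smelly


# Hypothetical metric histories, for illustration only.
clean_median, smelly_median = compare_trends(
    [[10, 11, 11, 12], [5, 5, 6, 6]],
    [[10, 20, 35, 50], [8, 15, 30, 41]],
)
print(clean_median, smelly_median)
```

Because the study looked for *significant* differences, a median comparison like this would in practice be paired with an appropriate statistical test.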

Curiosity - How long did it take? Eight weeks on a Linux server with seven quad-core 2.67 GHz CPUs (28 cores) and 24 GB of RAM.

Why? - Here we wanted to understand why developers introduce code smells. In particular, does their workload influence the probability of introducing a code smell? What about deadline pressure for releases? Which tasks (implementation of new features, bug fixing, refactoring, etc.) are developers performing when they introduce code smells? To this end, we tagged the commits that introduced the smells. This required identifying the commits responsible for introducing each code smell. When a code smell is introduced at the creation of a software artifact, we trivially analyzed just the first commit. But what about smell instances that appear only after several commits? Which commits should we analyze? If we analyzed only the commit in which the smell is first detected, we would discard all the change history that led to the smelly artifact! For this reason, we defined smell-introducing commits as the commits that might have pushed a software artifact in a smelly direction, as indicated by the trends of discriminating metrics. For example, in the following figure, commits c3, c5, and c7 are identified as smell-introducing commits and tagged as such.
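The tagging idea can be sketched in a deliberately simplified form. The rule below is our assumption, not necessarily the study's exact criterion: a class's history is a chronological list of (commit id, discriminating-metric value) pairs ending at the commit where the detector first reports the smell; if the class is smelly in its very first commit, only that commit is tagged; otherwise every commit that increases the metric relative to the previous one is tagged. The function name `smell_introducing_commits` and the sample values are hypothetical and are not the data behind the figure.

```python
def smell_introducing_commits(history, smelly_at_creation):
    """Return the ids of commits that pushed a class toward a smell.

    history: chronological list of (commit_id, metric_value) pairs, ending at
        the commit in which the smell is first detected.
    smelly_at_creation: True if the detector flags the class in its first commit.
    """
    if not history:
        return []
    if smelly_at_creation:
        # Smell present from the start: only the creating commit is responsible.
        return [history[0][0]]
    tagged = []
    for (_, prev_value), (commit_id, value) in zip(history, history[1:]):
        # Simplified rule (assumption): any commit that increases the
        # discriminating metric moves the class in the "smelly" direction.
        if value > prev_value:
            tagged.append(commit_id)
    return tagged


# Hypothetical metric values for commits c1..c7.
history = [("c1", 10), ("c2", 10), ("c3", 14), ("c4", 13),
           ("c5", 20), ("c6", 20), ("c7", 27)]
print(smell_introducing_commits(history, smelly_at_creation=False))
# ['c3', 'c5', 'c7']
```

A fuller version would track several discriminating metrics per smell type rather than a single value.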


When? - While common wisdom suggests that smells are introduced after several activities performed on a code component, we found instead that a smelly component is generally affected by the smell from its creation. Thus, developers introduce smells when they work on a code component for the very first time.

However, there are also cases where smells manifest themselves only after several changes have been performed on the code component. In these cases, files that will become smelly exhibit trends in some quality metric values that are significantly different from those of clean (non-smelly) files.

For example, the Weighted Methods per Class (WMC) of classes that eventually become Blobs increases more than 230 times faster than that of clean classes, considering the same initial development time.
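For readers less familiar with the metric: WMC, from the Chidamber and Kemerer suite, is the sum of the complexity weights of a class's methods. The sketch below assumes the per-method weights are already computed (McCabe cyclomatic complexity is a common choice; with unit weights WMC reduces to the method count) and shows how a "times faster" figure could be read as a ratio of two growth slopes. The names `wmc` and `growth_ratio` and all numbers are hypothetical, not the study's data.

```python
def wmc(method_complexities):
    """Weighted Methods per Class: sum of the per-method complexity weights."""
    return sum(method_complexities)


def growth_ratio(smelly_slope, clean_slope):
    """How many times faster the smelly trend grows than the clean one.

    Assumption: both arguments are growth slopes of the same metric measured
    over the same initial development period.
    """
    if clean_slope <= 0:
        raise ValueError("clean slope must be positive to form a ratio")
    return smelly_slope / clean_slope


print(wmc([1, 3, 2, 5]))          # 11
print(growth_ratio(12.0, 0.5))    # 24.0
```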

Why? - Smells are generally introduced by developers when enhancing existing features or implementing new ones. As expected, smells are often introduced in the last month before a deadline; however, a considerable number of instances are also introduced in the first year after project startup. Finally, developers who introduce smells are generally the owners of the file (i.e., they are responsible for at least 75% of the changes made to the file), and they are more prone to introducing smells when they have higher workloads.
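The ownership criterion is simple to compute. A minimal sketch, assuming the file's change history is available as a list of author identifiers, one per commit touching the file (which window of commits to count is left open here and is our assumption); `is_owner` and the author names are hypothetical, while the 75% threshold is the one stated above.

```python
from collections import Counter

OWNERSHIP_THRESHOLD = 0.75  # owner = author of at least 75% of the file's changes


def is_owner(developer, commit_authors):
    """True if `developer` authored at least 75% of the changes to a file.

    commit_authors: one author id per commit that modified the file.
    """
    if not commit_authors:
        return False
    counts = Counter(commit_authors)
    return counts[developer] / len(commit_authors) >= OWNERSHIP_THRESHOLD


authors = ["alice", "alice", "bob", "alice"]
print(is_owner("alice", authors))  # True (3 of 4 changes)
print(is_owner("bob", authors))    # False
```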


[1] M. Fowler, K. Beck, J. Brant, W. Opdyke, and D. Roberts, Refactoring: Improving the Design of Existing Code. Addison-Wesley, 1999.
[2] F. Khomh, M. Di Penta, Y.-G. Gueheneuc, and G. Antoniol, “An exploratory study of the impact of antipatterns on class change- and fault-proneness,” Empirical Software Engineering, vol. 17, no. 3, pp. 243–275, 2012.
[3] M. Abbes, F. Khomh, Y.-G. Gueheneuc, and G. Antoniol, “An empirical study of the impact of two antipatterns, Blob and Spaghetti Code, on program comprehension,” in 15th European Conference on Software Maintenance and Reengineering, CSMR 2011, 1-4 March 2011, Oldenburg, Germany. IEEE Computer Society, 2011, pp. 181–190.
[4] D. I. K. Sjøberg, A. F. Yamashita, B. C. D. Anda, A. Mockus, and T. Dyba, “Quantifying the effect of code smells on maintenance effort,” IEEE Trans. Software Eng., vol. 39, no. 8, pp. 1144–1156, 2013.
[5] M. Tufano, F. Palomba, G. Bavota, R. Oliveto, M. Di Penta, A. De Lucia, and D. Poshyvanyk, “When and why your code starts to smell bad,” in Proceedings of the 37th International Conference on Software Engineering - Volume 1 (ICSE '15). IEEE Press, Piscataway, NJ, USA, 2015, pp. 403–414. Preprint available at:
[6] N. Moha, Y.-G. Gueheneuc, L. Duchien, and A.-F. Le Meur, “DECOR: A method for the specification and detection of code and design smells,” IEEE Transactions on Software Engineering, vol. 36, pp. 20–36, 2010.
