The author of this blog post is a lecturer teaching different subjects related to software engineering. While I mostly focus on studying and teaching software evolution and related topics, I have also been supervising the capstone projects of our third-year undergraduate students. Back in 2011 we were struggling with a peculiar problem: the students were supposed to develop software according to the European Space Agency guidelines [1], and while interviews with the students suggested that some groups adhered to the guidelines better than others, we had no idea how frequently these guidelines were violated. We needed a way to gain insight into the process the students followed when creating software. Moreover, unlike the interviews, this had to be objective information about the process the students actually followed rather than the process they believed they had followed. This is why we decided to mine the students' software repositories, including their version control system, issue tracker and mail archive.
To understand the software development process we (1) combined information from different repositories into one log, and (2) applied process mining, a collection of log analysis techniques originally developed for business process analysis, to this log. The first results turned out to be promising: we discovered, e.g., that one of the six student groups had reused the prototype implementation in direct violation of the European Space Agency guidelines, and that, due to similar violations, the teacher should have considered intervening and correcting the process for three groups out of six [2]. The discrepancy between the prescribed model and the ongoing software development process turned out not to be limited to student projects: e.g., the bug resolution process of GCC, the GNU Compiler Collection, as reflected in its Bugzilla, turned out to be more complex [3] than the bug resolution process prescribed by the Bugzilla Guide [4].
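The two steps above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the FRASR/ProM tooling used in the study: each repository is assumed to have already been turned into a list of `Event` records sharing a common case identifier (for instance, a student group), and the hypothetical helpers `merge_logs` and `directly_follows` build one time-ordered log and count the directly-follows relation between activities, a basic building block of many process discovery algorithms.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from itertools import groupby


@dataclass(frozen=True)
class Event:
    """One entry of the combined log (field names are illustrative assumptions)."""
    case_id: str        # what a trace belongs to, e.g. a student group or a bug
    activity: str       # e.g. "commit", "ticket_opened", "mail_sent"
    timestamp: datetime
    source: str         # repository the event was extracted from


def merge_logs(*sources: list[Event]) -> list[Event]:
    """Combine events from several repositories into one log, ordered by case and time."""
    merged = [event for source in sources for event in source]
    return sorted(merged, key=lambda e: (e.case_id, e.timestamp))


def directly_follows(log: list[Event]) -> Counter:
    """Count how often activity a is immediately followed by activity b in the same case.

    Assumes `log` is sorted by case and time, as returned by merge_logs.
    """
    counts: Counter = Counter()
    for _, events in groupby(log, key=lambda e: e.case_id):
        activities = [e.activity for e in events]
        counts.update(zip(activities, activities[1:]))
    return counts


if __name__ == "__main__":
    # Made-up events for one hypothetical student group.
    vcs = [Event("group1", "commit", datetime(2011, 3, 2, 10), "vcs")]
    tracker = [
        Event("group1", "ticket_opened", datetime(2011, 3, 1, 9), "tracker"),
        Event("group1", "ticket_closed", datetime(2011, 3, 3, 9), "tracker"),
    ]
    mail = [Event("group1", "mail_sent", datetime(2011, 3, 1, 12), "mail")]
    log = merge_logs(vcs, tracker, mail)
    for (a, b), n in directly_follows(log).items():
        print(f"{a} -> {b}: {n}")
```

The sketch takes the shared case identifier for granted; mapping commits, tickets and e-mails onto the same case is left out here.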
The next step consisted of applying our approach to commercial software development. At the request of a large multinational company, we analyzed the way their engineers resolve bugs. In contrast to our observations for the GCC Bugzilla, the process followed by the company engineers mostly agrees with the prescriptions of the bug tracking software.
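Checking whether observed bug histories agree with a prescribed lifecycle can be illustrated, in a much simplified form, by comparing every status transition against the set of transitions the model allows. The sketch below is a plain transition-based check, not the conformance-checking techniques offered by process mining tools, and the `ALLOWED` set is a hypothetical, reduced lifecycle using Bugzilla-style status names rather than the exact model from the Bugzilla Guide.

```python
from collections import Counter

# Hypothetical, simplified prescribed lifecycle: the status transitions it allows.
# Status names follow Bugzilla conventions; the transition set is an assumption.
ALLOWED = {
    ("NEW", "ASSIGNED"),
    ("ASSIGNED", "RESOLVED"),
    ("RESOLVED", "VERIFIED"),
    ("RESOLVED", "REOPENED"),
    ("VERIFIED", "CLOSED"),
    ("REOPENED", "ASSIGNED"),
}


def deviations(trace: list[str], allowed: set[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return the consecutive status pairs in `trace` that the model does not allow."""
    return [step for step in zip(trace, trace[1:]) if step not in allowed]


def conformance_report(
    traces: dict[str, list[str]], allowed: set[tuple[str, str]] = ALLOWED
) -> tuple[float, Counter]:
    """Return the share of traces without deviations and a tally of disallowed transitions."""
    if not traces:
        raise ValueError("no traces to check")
    disallowed: Counter = Counter()
    fitting = 0
    for trace in traces.values():
        found = deviations(trace, allowed)
        if not found:
            fitting += 1
        disallowed.update(found)
    return fitting / len(traces), disallowed


if __name__ == "__main__":
    # Made-up status histories of two bugs.
    traces = {
        "bug-1": ["NEW", "ASSIGNED", "RESOLVED", "VERIFIED", "CLOSED"],
        "bug-2": ["NEW", "RESOLVED", "CLOSED"],
    }
    share, disallowed = conformance_report(traces)
    print(f"fitting traces: {share:.0%}")
    for (src, dst), n in disallowed.items():
        print(f"{src} -> {dst}: {n}")
```

A high share of fitting traces corresponds to a process that "mostly agrees" with the model; the tally of disallowed transitions points to where it does not.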
Moreover, we observed that the steps related to the organization of the Defect Management Process contribute significantly to the duration of the whole defect resolution process. For instance, the decision about who should resolve a bug is taken by a special board outside the development team, and the median time associated with this decision is 21 days. For comparison, the median time of resolution by the developers is 36.4 hours. We also saw that when rework is needed, i.e., when the bug has not been solved correctly on the first attempt, the rework activities are time-consuming: the median time associated with the first rework investigation is 9.9 days. Based on these observations, the company has implemented a number of improvement actions. In the coming years we plan to assess how those improvement actions affect the company's bug resolution process.
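A figure such as the median time associated with the assignment decision can be computed, roughly, as the median over all bugs of the time between two events in each bug's history. The sketch below shows one way to do this; the activity names (`submitted`, `assigned`), the hypothetical helpers `first_time` and `median_hours`, and the toy data are all assumptions, and using the first occurrence of each activity (mirroring "first rework investigation") is our choice, not necessarily how the figures above were obtained.

```python
from datetime import datetime
from statistics import median
from typing import Optional

# A case is a bug; its history is a list of (activity, timestamp) pairs.
Log = dict[str, list[tuple[str, datetime]]]


def first_time(history: list[tuple[str, datetime]], activity: str) -> Optional[datetime]:
    """Timestamp of the first occurrence of `activity` in a bug history, or None."""
    times = [t for a, t in history if a == activity]
    return min(times) if times else None


def median_hours(log: Log, start: str, end: str) -> Optional[float]:
    """Median time in hours from the first `start` to the first `end` event per bug.

    Bugs lacking either event, or where `end` precedes `start`, are skipped.
    """
    durations = []
    for history in log.values():
        t0, t1 = first_time(history, start), first_time(history, end)
        if t0 is not None and t1 is not None and t1 >= t0:
            durations.append((t1 - t0).total_seconds() / 3600)
    return median(durations) if durations else None


if __name__ == "__main__":
    # Made-up histories; these are not the company's data.
    log: Log = {
        "bug-1": [("submitted", datetime(2014, 1, 1, 9)), ("assigned", datetime(2014, 1, 2, 9))],
        "bug-2": [("submitted", datetime(2014, 1, 1, 9)), ("assigned", datetime(2014, 1, 4, 9))],
        "bug-3": [("submitted", datetime(2014, 1, 1, 9))],  # never assigned: skipped
    }
    hours = median_hours(log, "submitted", "assigned")
    print(f"median time to assignment: {hours} hours")
```

Reporting the median rather than the mean keeps a few very long-lived bugs from dominating the figure.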
By conducting a series of increasingly complex case studies, ranging from student projects to large commercial software systems, we have seen that applying process mining techniques to software repository data yields actionable insights into the ongoing software development process.
[1] The latest Software Standard of ESA is ECSS-E-ST-40C, the software part of a set of systems engineering standards for Space Engineering by ESA-ESTEC, Requirements & Standards Division, Noordwijk, The Netherlands (2009). It is available at http://wwwis.win.tue.nl/2R690/doc/ECSS-E-ST-40C(6March2009).pdf
[2] Wouter Poncin, Alexander Serebrenik, Mark van den Brand: Mining student capstone projects with FRASR and ProM. OOPSLA Companion 2011: 87-96.
[3] Wouter Poncin, Alexander Serebrenik, Mark van den Brand: Process Mining Software Repositories. CSMR 2011: 5-14.
[4] The bug resolution process is described in Figure 6.1 of The Bugzilla Guide, 2.18.6 Release, Chapter 6 (Using Bugzilla), Section 6.4 (Life Cycle of a Bug) (https://www.bugzilla.org/docs/2.18/html/lifecycle.html). Since then the bug lifecycle has been (a) simplified and (b) made customizable: https://bugzilla.readthedocs.org/en/latest/using/editing.html#life-cycle-of-abug
About the author: Alexander Serebrenik is an associate professor of software evolution at Eindhoven University of Technology. He obtained his Ph.D. in Computer Science from Katholieke Universiteit Leuven, Belgium (2003) and his M.Sc. in Computer Science from the Hebrew University, Jerusalem, Israel. Dr. Serebrenik’s areas of expertise include software evolution, maintainability and reverse engineering, program analysis and transformation, and process modeling and verification. Dr. Serebrenik recently served as the General Chair of the IEEE International Conference on Software Maintenance 2013, as the Program Chair of the IEEE International Conference on Software Analysis, Evolution, and Reengineering, and as a Program Committee member of a number of software engineering conferences. He currently chairs the ICSME Steering Committee and is a member of the steering committee of SCAM; he also acts as the online-presence manager of IEEE Software. Serebrenik can be reached via a.serebrenik@tue.nl or @aserebrenik; he also tweets as @ieeesoftware.