by Bonita Sharif, Youngstown State University (@shbonita), and Huzefa Kagdi, Wichita State University
There is no denying that large-scale software systems evolve. Software developers are routinely faced with new features and bugs that drive essential changes. Often, they are not even the original authors of the code they need to change. They must piece together many aspects of the system to realize these changes while juggling project parameters such as quality assurance and deadlines. The knowledge space for this fundamental activity typically spans several artifacts from the problem domain (e.g., requirements) and the solution domain (design and implementation). Explicit connections among these artifacts are often missing in practice [1, 2], which adds to the developer's plight. Open questions include “Where is the code relevant to this feature/bug?” and “What would be the design impact of a change request?” The promise of a software traceability tool is to establish (recover) and maintain (evolve) links among artifacts as software evolves . Traceability is expected to benefit several key software development tasks that address these questions, such as program comprehension [4, 5] and impact analysis [6, 7].
Over the years, software engineering researchers have proposed techniques for automatically recovering and maintaining (explicit) traceability links among software artifacts. Applying Information Retrieval (IR) techniques is a popular and heavily studied choice. IR is an artifact-centric approach whose underlying model is the textual similarity between software artifacts. Its expressiveness, effectiveness, and usefulness therefore depend on the artifacts themselves, and on their availability and quality during the software's lifecycle.
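To illustrate what an artifact-centric IR approach computes, here is a minimal Python sketch of VSM-style link recovery. It assumes TF-IDF weighting, cosine similarity, and a simple tokenizer that splits camelCase identifiers. These are our illustrative choices and not necessarily the configuration used in the studies discussed here. The artifact names and texts are hypothetical. The sketch also shows why IR depends on artifact quality: an artifact that shares no distinctive vocabulary with the bug report scores zero, however relevant it may actually be.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Split camelCase identifiers, lowercase, and keep alphabetic tokens."""
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)  # "storeSource" -> "store Source"
    return [t for t in re.split(r"[^a-zA-Z]+", text.lower()) if t]


def tfidf_vectors(texts: list[str]) -> list[dict[str, float]]:
    """Build a sparse TF-IDF vector (term -> weight) for each text."""
    counts = [Counter(tokenize(t)) for t in texts]
    n = len(texts)
    df = Counter(term for c in counts for term in c)  # number of texts containing each term
    return [{term: tf * math.log(n / df[term]) for term, tf in c.items()} for c in counts]


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either vector is all zeros."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_links(query: str, artifacts: dict[str, str]) -> list[tuple[str, float]]:
    """Rank artifacts by textual similarity to a query such as a bug report."""
    names = list(artifacts)
    # The query is included in the corpus so that its terms also receive IDF values.
    vectors = tfidf_vectors([query] + [artifacts[name] for name in names])
    q, docs = vectors[0], vectors[1:]
    scores = [(name, cosine(q, d)) for name, d in zip(names, docs)]
    return sorted(scores, key=lambda s: s[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical class-level artifacts and bug report text.
    artifacts = {
        "BibtexParser": "parse bibtex entry fields braces",
        "EntryEditor": "edit entry fields editor panel",
        "PreviewPanel": "render preview entry layout",
    }
    report = "crash when parsing bibtex entry with unbalanced braces"
    for name, score in rank_links(report, artifacts):
        print(f"{name}: {score:.3f}")
```

In this toy corpus, "entry" occurs in every text and therefore gets an IDF of zero. Only "BibtexParser" shares distinctive terms ("bibtex", "braces") with the report, so it is the only artifact with a nonzero score.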
What We Did
Our research direction was to investigate a human-centric approach to traceability. The approach is based on what developers look at while they perform software engineering tasks, such as fixing bugs and implementing new features. A prerequisite for such an approach is that it be unobtrusive to developers and blend into the background. We use an eye tracker to collect developers' gazes on software artifacts while they work on their tasks within the IDE (we use Eclipse, but the same concept applies to other IDEs). Our eye tracking infrastructure is called iTrace [8, 9]. It works seamlessly within the IDE. It maps eye gazes to source code elements in large files, even when developers scroll and switch context between files. A preliminary version of iTrace is available at https://github.com/Sereslab/iTrace-Archive. An enhanced release with additional support for IDEs and Web browsers is planned for the near future.
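The core gaze-to-code mapping step can be pictured with a simplified sketch. The Python below is not iTrace's implementation, which works through the IDE's own editor components. It assumes a hypothetical editor snapshot with a fixed line height, a monospaced font, and a known top visible line, which is how scrolling is accounted for. A gaze is attributed to the innermost code element whose line span contains it. The names `EditorView`, `CodeElement`, `top_line`, and the JabRef-like element names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EditorView:
    """Hypothetical snapshot of an editor pane when a gaze sample arrives."""
    origin_x: int     # screen x of the text area's left edge (pixels)
    origin_y: int     # screen y of the text area's top edge (pixels)
    width: int        # text area width (pixels)
    height: int       # text area height (pixels)
    line_height: int  # pixels per line; assumes a fixed line height
    char_width: int   # pixels per character; assumes a monospaced font
    top_line: int     # 1-based line shown at the top; changes as the user scrolls


@dataclass(frozen=True)
class CodeElement:
    name: str
    start_line: int   # 1-based, inclusive
    end_line: int     # 1-based, inclusive


def gaze_to_position(view: EditorView, x: int, y: int) -> Optional[tuple[int, int]]:
    """Map a screen gaze point to a 1-based (line, column), or None if off the pane."""
    dx, dy = x - view.origin_x, y - view.origin_y
    if not (0 <= dx < view.width and 0 <= dy < view.height):
        return None
    line = view.top_line + dy // view.line_height  # scrolling shifts the first visible line
    column = 1 + dx // view.char_width
    return line, column


def element_at(elements: list[CodeElement], line: int) -> Optional[str]:
    """Return the innermost (shortest) element whose line span contains `line`."""
    hits = [e for e in elements if e.start_line <= line <= e.end_line]
    if not hits:
        return None
    return min(hits, key=lambda e: e.end_line - e.start_line).name


if __name__ == "__main__":
    # Hypothetical pane scrolled so that line 120 is at the top.
    view = EditorView(origin_x=100, origin_y=50, width=800, height=600,
                      line_height=20, char_width=8, top_line=120)
    elements = [
        CodeElement("EntryEditor", 1, 400),
        CodeElement("EntryEditor.storeSource", 110, 130),
        CodeElement("EntryEditor.close", 131, 140),
    ]
    position = gaze_to_position(view, 140, 95)
    if position is not None:
        print(position, element_at(elements, position[0]))
```

Choosing the innermost element means a gaze inside a method is credited to that method rather than only to its enclosing class. Class-level links can still be derived by rolling method scores up to their classes.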
We first conducted a pilot study  to determine the feasibility of using eye gaze for traceability link recovery. After seeing promising results, we conducted a larger, realistic study  with thirteen software developers. They were asked to perform bug-localization tasks for eight submitted bug reports in JabRef (an open-source reference management system). The gaze-link algorithm computes links from the developers' gaze data during a session, using certain heuristics and weights. In a bug-fixing task, for example, the main premise is that as developers get closer to fixing a bug, they focus on the part(s) of the code most related to the bug report, so those parts receive more weight. We compared the trace links produced by our gaze-link algorithm with those from IR methods such as Latent Semantic Indexing (LSI) and the Vector Space Model (VSM).
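The weighting premise can be sketched as follows. This is a hypothetical illustration, not the published gaze-link algorithm. We assume each gaze has already been mapped to a code element. Each gaze's duration is then scaled by a linear recency factor, controlled by an assumed `recency_boost` parameter, so that gazes late in the session, presumably closer to the fix, count more. Element names in the demo are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Gaze(NamedTuple):
    time: float      # seconds since the session started
    element: str     # code element the gaze was mapped to
    duration: float  # fixation duration in seconds


def weighted_links(gazes: list[Gaze], recency_boost: float = 1.0) -> list[tuple[str, float]]:
    """Rank code elements by recency-weighted gaze time, normalized to sum to 1.

    A gaze at the start of the session counts with factor 1.0 and one at the
    end with factor 1.0 + recency_boost, interpolating linearly in between.
    """
    if not gazes:
        return []
    start = min(g.time for g in gazes)
    span = (max(g.time for g in gazes) - start) or 1.0  # avoid dividing by zero
    scores = defaultdict(float)
    for g in gazes:
        progress = (g.time - start) / span  # 0.0 at the first gaze, 1.0 at the last
        scores[g.element] += g.duration * (1.0 + recency_boost * progress)
    total = sum(scores.values()) or 1.0
    ranked = [(element, score / total) for element, score in scores.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical session: equal total gaze time on two elements, but the
    # parser method is examined later, closer to the (presumed) fix.
    session = [
        Gaze(0.0, "EntryEditor.storeSource", 2.0),
        Gaze(50.0, "BibtexParser.parse", 1.0),
        Gaze(100.0, "BibtexParser.parse", 1.0),
    ]
    for element, score in weighted_links(session):
        print(f"{element}: {score:.2f}")
```

With `recency_boost=0.0` the score reduces to plain accumulated gaze time. Larger values shift the ranking toward elements examined near the end of the session.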
What We Found
The gaze-link algorithm outperforms both LSI and VSM in terms of precision and recall with respect to the commit oracle (how the JabRef developers actually fixed the bug). We recorded an average precision of 55% and an average recall of 67% across all tasks, and the gaze-link algorithm outperformed the IR methods in 6 of the 8 tasks. A separate set of developers found the links generated with iTrace to be significantly more useful than the IR links in a majority of the tasks. The gaze-link algorithm underperforms when a developer prematurely attempts to fix the bug without adequate understanding of the bug or its solution. Interestingly, and perhaps surprisingly, some developers did not quite fix a bug but came close. Their captured gazes still proved helpful to another developer as a starting point toward eventually fixing it.
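For readers less familiar with these metrics, the sketch below shows how precision and recall against a commit oracle can be computed. It assumes links are compared as sets of code element names. It also assumes per-task scores are macro-averaged, meaning each task's precision and recall are computed first and then averaged over tasks. Whether the study averaged in exactly this way is an assumption here. The element names are hypothetical, not taken from the study's data.

```python
def precision_recall(predicted: set[str], oracle: set[str]) -> tuple[float, float]:
    """Precision and recall of predicted links against the oracle's true links."""
    hits = len(predicted & oracle)  # predicted links that the oracle confirms
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(oracle) if oracle else 0.0
    return precision, recall


def macro_average(tasks: list[tuple[set[str], set[str]]]) -> tuple[float, float]:
    """Average per-task precision and recall over all tasks (macro-averaging)."""
    if not tasks:
        return 0.0, 0.0
    pairs = [precision_recall(predicted, oracle) for predicted, oracle in tasks]
    return (sum(p for p, _ in pairs) / len(pairs),
            sum(r for _, r in pairs) / len(pairs))


if __name__ == "__main__":
    # Hypothetical task: 2 of the 4 predicted methods were changed in the fix,
    # and the fix touched 3 methods in total.
    predicted = {"BibtexParser.parse", "BibtexParser.parseField",
                 "EntryEditor.close", "Util.trim"}
    oracle = {"BibtexParser.parse", "BibtexParser.parseField",
              "BibtexParser.skipBraces"}
    print(precision_recall(predicted, oracle))  # precision 0.5, recall 2/3
```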
See Table 1 for an example of the links generated for a bug report by the different approaches, at both class-level and method-level granularity. The gaze-link algorithm is crucial for weeding out irrelevant and stray glances (shown in ETraw). Results from the gaze-link algorithm (ETweighted) are more specific than the rankings from current IR methods (LSI and VSM).
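The step from raw gazes (ETraw) to a filtered set of candidate links can be illustrated with a simple thresholding rule. The rule and its parameters (`min_total` seconds of gaze time, `min_visits` separate visits) are hypothetical stand-ins of ours, not the criteria used by the gaze-link algorithm. The point is only that a single brief glance at an element is weak evidence of relevance.

```python
from collections import Counter, defaultdict


def filter_stray_glances(gazes: list[tuple[str, float]],
                         min_total: float = 1.0,
                         min_visits: int = 2) -> set[str]:
    """Keep elements with enough total gaze time and enough separate visits.

    gazes: (element, duration_seconds) pairs in the order they occurred.
    A visit is a maximal run of consecutive gazes on the same element.
    """
    totals = defaultdict(float)
    visits = Counter()
    previous = None
    for element, duration in gazes:
        totals[element] += duration
        if element != previous:  # a new run starts, so count a new visit
            visits[element] += 1
        previous = element
    return {e for e in totals if totals[e] >= min_total and visits[e] >= min_visits}


if __name__ == "__main__":
    # Hypothetical raw gaze sequence at method-level granularity.
    raw = [
        ("BibtexParser.parse", 0.5),
        ("BibtexParser.parse", 0.6),
        ("PreviewPanel.update", 0.1),  # a stray glance
        ("BibtexParser.parse", 0.4),
        ("EntryEditor.close", 2.0),    # one long look, visited only once
    ]
    print(sorted(filter_stray_glances(raw)))
```

Raising either threshold trades recall for precision. In this toy sequence, requiring two visits also drops the single long look at `EntryEditor.close`, which may or may not be desirable in practice.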
The eye tracker captures gaze data without any additional effort on the part of developers. This makes it about as effortless as possible to provide traceability under the hood while developers work. An added benefit of using the eye tracker is that we also learn about hidden links that IR methods have a hard time finding, because these links reflect tacit developer knowledge spanning various related code entities. Imagine a software development world where your gazes could inform the things you do. The transparency and the minimal effort required of developers make gaze tracking an attractive possibility. It is reasonable to imagine a future in which eye trackers capture information while we work to help us with many more tasks than software traceability alone.
We refer interested readers to the research article titled “Eye Movements in Software Traceability Link Recovery” .
[1] B. Ramesh, "Factors Influencing Requirements Traceability Practice," Communications of the ACM, vol. 41, no. 12, pp. 37-44, December 1998.
[2] P. Mäder, O. Gotel, and I. Philippow, "Motivation Matters in the Traceability Trenches," in IEEE International Requirements Engineering Conference (RE), 2009, pp. 143-148.
[3] D. J. Palmer, "Traceability," in Software Engineering, M. Dorfman and R. H. Thayer, Eds. Los Alamitos, CA, USA: Wiley-IEEE Computer Society Press, 1996, pp. 266-276.
[4] A. De Lucia, M. Di Penta, and R. Oliveto, "Improving Source Code Lexicon via Traceability and Information Retrieval," IEEE Transactions on Software Engineering, pp. 205-227, 2011.
[5] A. De Lucia, M. Di Penta, R. Oliveto, and F. Zurolo, "Improving Comprehensibility of Source Code via Traceability Information: A Controlled Experiment," in 14th IEEE International Conference on Program Comprehension (ICPC'06), Athens, Greece, 2006, pp. 317-326.
[6] G. Antoniol, G. Canfora, G. Casazza, and A. De Lucia, "Identifying the Starting Impact Set of a Maintenance Request," in 4th European Conference on Software Maintenance and Reengineering (CSMR), 2000, pp. 227-230.
[7] A. von Knethen, "Change-Oriented Requirements Traceability. Support for Evolution of Embedded Systems," in International Conference on Software Maintenance (ICSM'02), 2002.
[8] B. Sharif and J. I. Maletic, "iTrace: Overcoming the Limitations of Short Code Examples in Eye Tracking Experiments," in 32nd IEEE International Conference on Software Maintenance and Evolution (ICSME), Technical Briefing, Raleigh, NC, USA, 2016, p. 647.
[9] T. Shaffer, J. Wise, B. Walters, S. Müller, M. Falcone, and B. Sharif, "iTrace: Enabling Eye Tracking on Software Artifacts Within the IDE to Support Software Engineering Tasks," in ESEC/FSE 2015, Bergamo, Italy, 2015, pp. 954-957.
[10] B. Walters, T. Shaffer, B. Sharif, and H. Kagdi, "Capturing Software Traceability Links from Developers' Eye Gazes," in 22nd International Conference on Program Comprehension (ICPC), Hyderabad, India, 2014, pp. 201-204.
[11] B. Sharif, J. Meinken, T. Shaffer, and H. Kagdi, "Eye Movements in Software Traceability Link Recovery," Empirical Software Engineering, vol. 22, pp. 1063-1102, 2017.