Sunday, September 25, 2016

The Value of Applied Research in Software Engineering


By: David C. Shepherd, ABB Corporate Research, USA (@davidcshepherd)

Associate Editor: Sonia Haiduc, Florida State University, USA (@soniahaiduc)


Jean Yang’s thoughtful post “Why It’s Not Academia’s Job to Produce Code that Ships” posits that academics should (and must) be free to explore ideas with no (immediate) practical relevance. Agreed, as these long-term, deep, seemingly impractical research projects often end up causing major advances. However, she goes further. In her defense of fundamental research she claims, “In reality, most research--and much of the research worth doing--is far from being immediately practical.” This subtle disdain for applied research, all too common in academic circles, has had deep implications for our field’s impact and ultimately widens the already cavernous gap between academic research and industrial practice. In this example-driven article I’ll discuss why applied research is a key enabler for impact, identify common vehicles for applied research, and argue for better metrics for scientific evaluation that measure both fundamental and applied contributions. Impacting the practice of software engineering is a huge challenge; it will require the best from our fundamental AND applied researchers.


Applied Research Reduces Risk

Software engineering research has produced thousands of novel solutions in the past decade. In many cases these approaches are rigorously evaluated, but only in the lab; that is, not by real developers and not in the field, since field evaluation would require the approach to be implemented in a production-quality tool. Thus, companies looking to adopt more effective software engineering techniques face a huge amount of risk. Should they adopt technique A, which had great success in lab evaluations but has never been tested in the field? The answer to this question is almost always “No”. Fortunately, applied research can reduce this risk dramatically, leading to more adoption and impact on industrial practice. Below I briefly discuss two examples.

In 2004, Andrian Marcus first published the concept of searching source code using information retrieval techniques. It is tempting to think that this work should have been immediately transferred to modern development environments. However, when my colleagues and others implemented prototypes of the concept and released them to the public, it became evident that many research challenges remained. We discovered that the query/term mismatch problem is even more pronounced in source code, and thus we developed autocomplete functionality tailored to source code. We (and others) discovered how difficult it is to split identifiers into words (e.g., ‘OpenFile’ into ‘open’ and ‘file’) due to the many different naming conventions. We found that indexing speed is a major issue, as most searches occur within about 15s of opening a project. By discovering and solving these applied research challenges, we have made this work ready to transfer at a much lower level of risk.
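To give a flavor of why identifier splitting is harder than it sounds, here is a minimal sketch, purely illustrative and not the splitter from any of the tools mentioned above. A simple rule-based splitter handles camelCase, PascalCase, and snake_case, but falls apart as soon as identifiers lack separators or mix abbreviations and digits:

```python
import re

def split_identifier(identifier):
    """Naively split a source-code identifier into lowercase words.

    Handles common conventions (camelCase, PascalCase, snake_case),
    but real identifiers are messier: all-caps abbreviations,
    digits, and words concatenated with no separator at all
    defeat simple rules like these.
    """
    # First break on underscores, then on case transitions within each part.
    parts = re.split(r'_+', identifier)
    words = []
    for part in parts:
        words.extend(re.findall(r'[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+', part))
    return [w.lower() for w in words if w]

print(split_identifier("OpenFile"))    # ['open', 'file']
print(split_identifier("parse_xml"))   # ['parse', 'xml']
print(split_identifier("HTMLParser"))  # ['html', 'parser']
print(split_identifier("openfile"))    # ['openfile'] -- no separator to exploit
```

Even this toy version would need a word list or a statistical model to handle the last case, which is exactly the kind of problem that only surfaces once real code hits the tool.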

The Microsoft Research team behind Pex, an automated test generation tool, can also attest to the importance of refining and revising research contributions through real-world usage. Their work actually began as the Spec Explorer project, a model-based testing tool with an embedded model checker, which they quickly discovered was too complex for the average user. They noted, “After observing the difficulties of “[training]” the target tool users, the first author moved on to propose and focus on a more lightweight formal-testing methodology, parameterized unit testing.” Later, when they began releasing Pex as an independent extension for Visual Studio, they soon discovered that users had “…strong needs of tool support for mocking (independent of whether Pex.. is used), [and so] the Pex team further invested efforts to develop Moles.” Having refined their approach via user feedback and solved practical blockers by creating Moles (i.e., a mocking framework), they have since shipped a simplified version of Pex with Visual Studio, called IntelliTest, which has received many positive reviews.
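For readers unfamiliar with the term, a parameterized unit test is simply a test that takes its inputs as parameters and asserts a property that should hold for all of them; Pex-style tools then generate the concrete inputs automatically (via dynamic symbolic execution, in Pex’s case). The sketch below uses Python and pytest rather than .NET, and the function under test is hypothetical, but it shows the shape of the idea:

```python
import pytest

def normalize_path(path: str) -> str:
    """Toy function under test: collapse runs of duplicate slashes."""
    while "//" in path:
        path = path.replace("//", "/")
    return path

# A conventional unit test hard-codes one input and one expected output...
def test_normalize_single_case():
    assert normalize_path("a//b") == "a/b"

# ...while a parameterized unit test states a property over any input.
# A Pex-style tool would generate the concrete values; here we list a few by hand.
@pytest.mark.parametrize("path", ["a//b", "a///b//c", "/", "no/slashes"])
def test_normalize_has_no_double_slash(path):
    assert "//" not in normalize_path(path)
```

With a conventional test you pick the inputs; with a parameterized test you state the property and let the tool supply the inputs, which is what made the approach so much lighter-weight than full model-based testing.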

Vehicles for Applied Research

While I may have convinced you that applied research is a key step in the innovation pipeline, it can be difficult to imagine how applied research can be conducted in today’s academic and economic climate. Below I will detail ways in which researchers have used open source software, startups, and product groups as vehicles for driving applied research. Their approaches can serve as a pattern for applied researchers to follow.

Creating or contributing to open source software can be a great vehicle for applied research. It allows you to gather contributions from others and get your work out to users with a minimum of hassle. Perhaps one of the best examples of using open source as a successful means of performing applied research is Terence Parr and his work on ANTLR, a project he founded in 1988. In the course of his work on this parser generator he has balanced acquiring users with contributing to theory. In the past few years he has both published at competitive venues (e.g., OOPSLA and PLDI) and supported major products at Twitter, Google, Oracle, and IBM. Impressive. Unfortunately, it appears that his achievements are not appreciated in academia, as he recently tweeted, “I wish tenure/promotion committees knew what software was…and valued it for CS professors.”

Startups can be another great vehicle for applied research, fleshing out ideas started during a thesis or project into a full-fledged technology. The team at AnswerDash provides a great example. Parmit Chilana’s thesis work focused on providing contextual help to website visitors. The concept appeared promising when first published but was untested in the “real world”. The authors, Parmit along with professors Andrew Ko and Jacob Wobbrock, spent significant effort on applied research tasks such as deploying the system on live websites and gathering feedback prior to creating a spinoff. This applied research brought up (and answered) crucial questions, reducing risk and ultimately creating enough value that AnswerDash has received over $5M in funding to date.

Finally, for those working in, or even collaborating with, a large company, product improvements can be a great vehicle for applied research. In this space there is no shortage of examples. From Pex and Moles (which became IntelliTest), to Excel’s Flash Fill, to refactorings in NetBeans, researchers have found it can be very effective to convince an existing product team that your improvements can help their product and then leverage their development power. As Sumit Gulwani of Flash Fill fame says, “If we want to do more, and receive more funding, then we need a system in which we give back to society in a shorter time-frame by solving real problems, and putting innovation into tangible products.”

Rewards for Applied Research

“One of the major reasons I dropped out of my PhD was because I didn’t believe academia could properly value software contributions” – Wes McKinney, author of the Pandas package for data analysis

Unfortunately, in spite of the importance of applied research for increasing our field’s impact, there are few, if any, rewards for undertaking this type of work. Tenure committees, conferences, professional societies, and even other researchers do not properly value software contributions. In this section I discuss how we as a community can continue to reward fundamental researchers while also rewarding applied researchers.

One of the most important issues to address, for both academic and industrial researchers, is which metrics they are evaluated on. Here I hold up ABB Corporate Research as an example of an enlightened approach. When I applied for ABB’s global “Senior Principal Scientist” position, which is awarded annually and is maintained at 5-10% of the overall researcher population, I had to create an application packet similar to a tenure packet. However, unlike in an academic tenure packet, I was encouraged to list applied research metrics such as tool downloads, talks at developer conferences, tool usage rates, blog post hits, and ABB-internal users. While I did not use this metric at the time, I would also add the amount of collected telemetry data to this list (e.g., collected 1000 hours of activity from 100 users working in the IDE). At ABB Corporate Research these applied metrics are weighted equally with, if not more heavily than, traditional metrics such as citation count.

The CRA also provides guidelines on how to evaluate researchers for promotion and tenure. In 1999 the CRA recommended that, when measuring the impact of a researcher’s work, factors like “…the number of downloads of a (software) artifact, number of users, number of hits on a Web page, etc…” may be used, which, especially for the time, was progressive. Unfortunately, after listing these potential metrics they spend two paragraphs discrediting them, stating that “…popularity is not equivalent to impact…” and that “it is possible to write a valuable, widely used piece of software inducing a large number of downloads and not make any academically significant contribution.” In brief, they stop short of recommending the type of metrics that would reward applied researchers.

Second to how researchers are evaluated is how their outputs are evaluated. In this area there have been small changes in how grants are evaluated, yet the impact of these changes remains to be seen. The NSF, the US’s primary funding agency for computer science research, started allowing researchers to list “Products” instead of “Publications” in their biographical sketches in 2012. This allows researchers to list contributions such as software and data sets in addition to relevant publications. Unfortunately, I have no evidence as to how seriously these non-publication “Products” are considered when evaluating grants (I would love to hear from those who review NSF grants).

While some companies like ABB and institutions like the NSF, and to a lesser extent the CRA, have begun to consider applied metrics, the adoption is not yet widespread enough to effect sweeping change. In order to increase the impact our field has on practice, I estimate we would need at least one quarter of our field to be composed of applied researchers; as of today we have less than five percent. The adoption and acceptance of applied metrics may be the single biggest change we could make as a community to increase impact.

Conclusion

Academia’s not-so-subtle disdain for applied research does more than damage a few promising careers; it renders our field’s output useless, destined to collect dust on the shelves of Elsevier. We cannot and should not accept this fate. In this article I have outlined why applied research is valuable (it reduces risk), how applied research can be undertaken in the current climate (e.g., via open source, startups, and product contributions), and how to measure applied research’s value (e.g., downloads, hits, and usage data). I hope these discussions can serve as a starting point for chairs, deans, and industrial department managers wishing to encourage applied research.

Wondering how to publish your applied research work at a competitive venue? Consider applying to one of the following… I know one of the PC co-chairs and can assure you that he values applied work. Publish at an industry track: the SANER Industrial Track, the ICPC Industry Track, the FSE Industry Track, or ICSE SEIP.

