Monday, August 19, 2019

Studying human values in software engineering

Authors: Emily Winter and Maria Angela Ferrario
Associate Editor: Jinghui Cheng (@JinghuiCheng)

Why do human values matter for software engineering? Recent years have seen high-profile software scandals and malpractices in which individual privacy and social democracy have been undermined, as in the case of Cambridge Analytica’s use of Facebook data [1], and even human lives lost, as in the case of the Boeing 737 Max [2]. As another recent IEEE Software post puts it, we are heading into an age of considerable ‘values debt’ [3], as the negative societal consequences, both intended and unintended, of our software systems mount up.

There is a pressing need then to understand how human values operate, to develop methods and tools to study them in a software engineering context, and to build on this understanding to consider how SE research might contribute to a more socially responsible software industry.

How do values operate?

We use values research from social psychology as our framework. In particular, we draw on Schwartz’s values model, which is based on extensive empirical research spanning the last three decades. Through survey research, Schwartz’s work has identified a range of values that are recognised across cultures and that operate relationally. The model is organised along two key oppositional axes: self-enhancement vs. self-transcendence; and openness vs. conservation [4].

We also use Maio’s work, which considers values as mental constructs that can be studied at three different levels: the system level (the relationships outlined by Schwartz); the personal level (the different interpretations of values held by individuals); and the instantiation level (how values are expressed through behaviours) [5]. At the system level, for example, a software engineer who is highly concerned about their personal career development (achievement) is – according to Schwartz’s model – less likely to be concerned about the environmental sustainability (universalism) of the systems they are building. At the personal level, software engineers may have different interpretations of high quality code (achievement) - e.g. ‘code that does the job’ vs. ‘elegant code’. At the instantiation level, a concern with privacy may manifest in a development decision not to track user queries. 

Understanding values in software engineering

In order to study human values in a software engineering context, we required methods that were relatable and relevant to the software engineering community. We used Q-methodology as our starting point [6]. Q-methodology is a well-established method designed to systematically study subjectivity. Participants take part in a card ranking exercise and are interviewed about their decisions; multiple participants’ ‘sorts’ can then be statistically analysed. The structured nature of the sorting helped with the systematic articulation and analysis of the qualitative data elicited from participants’ narratives. It was also important that the card statements were specific to the software engineering community. We used the newly revised ACM Code of Ethics [7] as a basis, choosing principles that corresponded with Schwartz’s value types and filling in any gaps by creating statements in accordance with the missing values. To fully understand values trade-offs within complex industry contexts, it was important to consider a wide range of values, not just those considered ethical. Power and profit were as important for our study as honesty and the social good.

The role of the researcher in promoting a socially responsible software industry

One of our key findings is that people interpret and act out values in very different ways. Two of our study participants, for example, both placed the statement ‘it is important for me that the public good is the central concern of all professional computing work’ in their ‘top 3’, yet showed almost opposite understandings of this value. For Laura, the public good was about optimising the user experience: she explained they would ‘analyse the data once the user hits our website; we would then optimise off that behaviour’. By contrast, Stuart didn’t want to overly ‘structure’ the experience of users. He explained that an e-commerce site could ask users ‘do you want us to try and automate your offers? Yes or no’. He viewed an overly structured web experience as being in opposition to users’ freedom of choice.

By simply introducing the Q-Sort to software engineers, we have already encouraged articulation of these differences of interpretation, things that are often taken for granted and rarely explained. Maio and Olson, for example, argue that values often act as truisms, ‘widely shared, rarely questioned, and, therefore, relatively bereft of cognitive support’ [8]. Carrying out this kind of research may be the first step in encouraging a more values-aware and values-reflective technology industry, in which the taken-for-granted may begin to be reflected upon and articulated. Avenues for future work include identifying opportunities for light-weight interventions that enable values reflection as an integral part of the agile process.

As well as encouraging discussion of values within industry, we (SE researchers and academics) need to foster reflective, critical skills within our students. For example, we used the Q-Sort as a teaching tool in our Software Studio, a 20-week long module for second year Software Engineering undergraduate students. Within this module, students work in small teams to ideate, design and develop a software application. We introduced the Q-Sort to teams early on in the process as a way of encouraging values articulation and prioritisation that would underpin the entire software engineering decision making process. As well as generating discussion, reflection and critical thinking, this led to concrete future design decisions. One team, for example, went on to adopt a ‘most-vulnerable-first’ approach to system design and development for their train journey planning app, prioritizing search needs for people with disabilities, people with young children, and the elderly. In contrast to standalone ethics courses, the Software Studio embedded values and ethical considerations into the module; they were integrated with technical skills, not an optional add-on.

This is one example of teaching practice that supports the Values in Computing mission: that the next generation of computing professionals will be equipped with the technical tools, foundational knowledge, and critical skills necessary to distinguish responsible software engineering decisions from those that are potentially harmful to self and society.

References
  1. The Guardian, Cambridge Analytica Files. Retrieved on 12 August 2019 from https://www.theguardian.com/news/series/cambridge-analytica-files
  2. Helmore, E. (2019) ‘Profit over safety? Boeing under fire over 737 Max crashes as families demand answers’, The Guardian. Retrieved on 12 August 2019 from https://www.theguardian.com/business/2019/jun/17/boeing-737-max-ethiopian-airlines-crash
  3. Hussain, W. (2019) ‘Values debt is eating software’, IEEE Software blog. Retrieved on 12 August 2019 from http://blog.ieeesoftware.org/2019/07/values-debt-is-eating-software.html
  4. Schwartz, S. H. et al. (2012) ‘Refining the theory of basic individual values’. Journal of personality and social psychology 103(4): 663-688
  5. Maio, G. R. (2010) ‘Mental representations of social values’. In Advances in Experimental Social Psychology (Vol 42). Academic Press, pp. 1–43.
  6. Winter, E. et al. (2019) ‘Advancing the study of human values in software engineering’. In Proceedings of the 12th International Workshop on Cooperative and Human Aspects of Software Engineering (CHASE '19). IEEE Press, pp. 19-26.
  7. ACM (2019) Code of Ethics and Professional Conduct. Retrieved on 12 August 2019 from https://www.acm.org/code-of-ethics
  8. Maio, G. R. and J. M. Olson (1998) ‘Values as truisms: Evidence and implications’. Journal of Personality and Social Psychology, 74(2): 294-311.

Wednesday, August 7, 2019

Design and Evolution of C-Reduce (Part 2)

Associate Editor: Karim Ali (@karimhamdanali)

Part 1 of this series introduced C-Reduce and showed how it combines a domain-independent core with a large collection of domain-specific passes in order to create a highly effective test-case reducer for C and C++ code. This part tells the rest of the story and concludes.

Parallel Test-Case Reduction


C-Reduce's second research contribution is to perform test-case reduction in parallel using multiple cores. The parallelization strategy, based on the observation that most variants are uninteresting, is to speculate along the uninteresting branch of the search tree. Whenever C-Reduce discovers an interesting variant, all outstanding interestingness tests are killed and a new line of speculation is launched. This is the same approach that was subsequently rediscovered by the creator of halfempty.
C-Reduce has a policy choice between taking the first interesting variant returned by any CPU, which provides a bit of extra speedup but makes parallel reduction non-deterministic, or only taking an interesting variant when all interestingness tests earlier in the search tree have reported that their variants are uninteresting. We tested both alternatives and found that the speedup due to non-deterministic choice of variants was minor. Therefore, C-Reduce currently employs the second option, which always follows the same path through the search tree that non-parallel C-Reduce would take. The observed speedup due to parallel C-Reduce is variable, and is highest when the interestingness test is relatively slow. Speedups of 2–3x vs. sequential reduction are common in practice.
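
The deterministic variant of this strategy can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not C-Reduce's actual implementation: the single line-removal pass, the function names (remove_line, reduce_sequential, reduce_parallel) and the thread pool are all made up for the example. Where C-Reduce kills outstanding interestingness tests, this sketch can only cancel tests that have not started yet.

```python
from concurrent.futures import ThreadPoolExecutor


def remove_line(lines, i):
    """Return a variant of the test case with line i removed."""
    return lines[:i] + lines[i + 1:]


def reduce_sequential(lines, interesting):
    """Greedy, non-parallel reference: try removing each line in turn."""
    i = 0
    while i < len(lines):
        variant = remove_line(lines, i)
        if interesting(variant):
            lines = variant      # keep the smaller variant, retry the same index
        else:
            i += 1               # variant is uninteresting, move on
    return lines


def reduce_parallel(lines, interesting, workers=4):
    """Speculate that the next `workers` variants are all uninteresting."""
    i = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while i < len(lines):
            batch = range(i, min(i + workers, len(lines)))
            futures = [(j, pool.submit(interesting, remove_line(lines, j)))
                       for j in batch]
            accepted = None
            for j, future in futures:
                if accepted is None:
                    # Results are consumed in search-tree order, so a variant
                    # is accepted only once every earlier one has reported
                    # that it is uninteresting.
                    if future.result():
                        accepted = j
                else:
                    future.cancel()  # best effort; the speculation was wrong
            if accepted is None:
                i = batch.stop       # the whole speculated branch held
            else:
                lines = remove_line(lines, accepted)
                i = accepted         # start a new line of speculation
    return lines
```

Because results are consumed in order, reduce_parallel follows the same path as reduce_sequential; taking whichever future finishes first would instead correspond to the non-deterministic policy described above.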

Applicability and Moving Towards Domain-Independence

Something that surprised us is how broadly applicable C-Reduce is to test cases in languages other than C and C++. Our users have found it effective in reducing Java, Rust, Julia, Haskell, Racket, Python, SMT-LIB, and a number of other languages. When used in this mode, the highly C/C++-specific C-Reduce passes provide no benefit, but since they fail quickly they don't do much harm. (C-Reduce also has a --not-c command line option that avoids running these passes in the first place.) One might ask why C-Reduce is able to produce good results for languages other than C and C++; the answer appears to be based on the common structural elements found across programming languages in the Algol and Lisp families. In contrast, in our limited experience, C-Reduce does a very poor job reducing test cases in binary formats such as PDF and JPEG. Other reducers, such as the afl-tmin tool that ships with afl-fuzz, work well for binary test cases.

A consequence of the modular structure of C-Reduce is that while its transformation passes are aimed at reducing C and C++ code, the C-Reduce core is completely domain-independent. C-Reduce has been forked to create a highly effective reducer for OpenCL programs. We believe it would be relatively straightforward to do something similar for almost any other programming language simply by tailoring the passes that C-Reduce uses.
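
To make the split between core and passes concrete, here is a rough Python sketch. It is an assumption-laden illustration rather than C-Reduce's real code: run_reducer, remove_each_line and remove_each_char are hypothetical names, and the two toy passes stand in for the much richer transformations described in Part 1. The core knows nothing about the language being reduced; all it needs is a list of passes, each of which proposes variants, and an interestingness test.

```python
def remove_each_line(text):
    """Pass: propose every variant obtained by deleting one line."""
    lines = text.splitlines()
    for i in range(len(lines)):
        yield "\n".join(lines[:i] + lines[i + 1:])


def remove_each_char(text):
    """Pass: propose every variant obtained by deleting one character."""
    for i in range(len(text)):
        yield text[:i] + text[i + 1:]


def run_reducer(test_case, passes, interesting):
    """Domain-independent core: run the passes until a fixpoint is reached."""
    changed = True
    while changed:
        changed = False
        for make_variants in passes:
            progress = True
            while progress:
                progress = False
                for variant in make_variants(test_case):
                    # Only strictly smaller variants are accepted, which
                    # guarantees that the loop terminates.
                    if len(variant) < len(test_case) and interesting(variant):
                        test_case = variant
                        progress = changed = True
                        break  # regenerate variants from the new test case
    return test_case
```

Retargeting this sketch to another language would mean swapping the list of passes while leaving run_reducer untouched, which mirrors the tailoring described above.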

Limitations


When used for its intended purpose, when does C-Reduce work badly? We have seen two main classes of failure. First, C-Reduce can be annoyingly slow. This typically happens when the passes early in the phase ordering, which are intended to remove a lot of code quickly, fail to do this. Second, highly templated C++ sometimes forces C-Reduce to terminate with an unacceptably large (say, >1 KB) final result. Of course this is better than nothing, and subsequent manual reduction is usually not too difficult, but it is frustrating to have written 69 different Clang-based passes and to still find effectively irreducible elements in test cases. The only solution—as far as we know—is to strengthen our existing transformation passes and to create more such passes.

A small minority of compiler bugs appears to be impossible to trigger using small test cases. These bugs are exceptions to the small scope hypothesis. They typically stem from resource-full bugs in the compiler. For example, a bug in register spilling code requires the test case to use enough registers that spilling is triggered; a bug in logic for emitting long-offset jumps requires the test case to contain enough code that a long offset is required; etc. These test cases are just difficult to work with, and it is not clear to us that there's anything we can do to make it easier to debug the issues that they trigger.

C-Reduce Design Principles


In summary, C-Reduce was designed and implemented according to the following principles:

  1. Be aggressive: make the final reduced test case as small as possible.
  2. Make the reducer fast, for example using parallelism, careful phase ordering of passes, and avoiding unnecessary I/O traffic, when this can be done without compromising the quality of the final output.
  3. Make it as easy as possible to implement new passes, so that many domain-specific passes can be created.
  4. Keep the C-Reduce core domain-independent.
  5. Focus only on producing potentially-interesting variants, delegating all other criteria to the user-supplied interestingness test.

Directions for Future Test-Case Reduction Research

Although perhaps a few dozen papers have been written about test-case reduction since Hildebrandt and Zeller's initial paper 19 years ago, I believe that this area is under-studied relative to its practical importance. I'll wrap up this article with a collection of research questions suggested by our experience in over a decade of work on creating a highly aggressive reducer for C and C++.


What is the role of domain-specificity in test case reduction? Researchers who cite C-Reduce appear to enjoy pointing out that it is highly domain-specific (nobody seems to notice that the C-Reduce core is domain-independent, and that the pass schedule is easy to modify). The implication is that domain-specific hacks are undesirable and, of course, an argument against such hacks would be forceful if backed up by a test-case reducer that produced smaller final C and C++ code than C-Reduce does, without using domain knowledge. So far, such an argument has not been made.


Is domain knowledge necessary, or can a domain-independent test-case reducer beat C-Reduce at its own game? The most impressive such effort that we are aware of is David MacIver's structureshrink, which uses relatively expensive search techniques to infer structural elements of test cases that can be used to create variants. Anecdotally, we have seen structureshrink produce reduced versions of C++ files that are smaller than we would have guessed was possible without using domain knowledge. Even so, some useful transformations such as function inlining and partial template instantiation seem likely to remain out-of-reach of domain-independent reduction techniques.


What is the role of non-greedy search in test-case reduction? In many cases, the order in which C-Reduce runs its passes has little or no effect on the final, reduced test case. In other words, the search is often diamond-shaped, terminating at the same point regardless of the path taken through the search space. On the other hand, this is not always the case, and when the search is not diamond-shaped, a greedy algorithm like C-Reduce's risks getting stopped at a local minimum that is worse than some other, reachable minimum. The research question is how to get the benefit of non-greedy search algorithms without making test-case reduction too much slower.


What other parallelization methods are there? C-Reduce's parallelization strategy is simple, gives a modest speedup in practice, and always returns the same result as sequential reduction. There must be other parallel test-case reduction strategies that hit other useful points in the design space. This is, of course, related to the previous research question. That is, if certain branches in the search tree can be identified as being worth exploring in both directions, this could be done in parallel.


What is the role of canonicalization in test-case reduction? A perfectly canonicalizing reducer would reduce every program triggering a given bug to the same final test case. This is a very difficult goal, but there are many relatively simple strategies that can be employed to increase the degree of canonicalization, such as putting arithmetic expressions into a canonical form, assigning canonical names to identifiers, etc. C-Reduce has a number of transformations that are aimed at canonicalization rather than reduction. For example, the reduced test case at the top of Part 1 of this piece has four variables a, b, c, and d, which first appear in that order. I believe that more work in this direction would be useful.
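
As a small illustration of one such strategy, the Python sketch below assigns canonical names to identifiers in order of first appearance. It is a toy under stated assumptions, not how C-Reduce does it: the function name canonicalize_names is hypothetical, the caller must supply the list of identifiers (a real tool would get them from a parser), at most 26 identifiers are supported, and nothing checks that the new names are not already in use elsewhere in the program.

```python
import re
from string import ascii_lowercase


def canonicalize_names(source, identifiers):
    """Rename the given identifiers to a, b, c, ... by first appearance."""
    if not identifiers:
        return source
    # Match any of the identifiers as a whole word.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in identifiers) + r")\b"
    )
    mapping = {}

    def rename(match):
        name = match.group(1)
        if name not in mapping:
            # The next unused letter becomes this identifier's canonical name.
            mapping[name] = ascii_lowercase[len(mapping)]
        return mapping[name]

    # A single left-to-right substitution, so renamed text is never rescanned.
    return pattern.sub(rename, source)
```

Two reduced programs that differ only in their choice of variable names would come out textually identical after this step, which is the point of canonicalization.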

Can we avoid bug hijacking? Test reduction sometimes goes awry when the bug targeted by the reduction is "hijacked" by a different bug. In other words, the reduced test case triggers a different bug than the one triggered by the original. During a compiler fuzzing campaign this may not matter since one fuzzer-generated bug is as good as another, but hijacking can be a problem when the original bug is, for example, blocking compilation of an application of interest. Hijacking is particularly common when the interestingness test looks for a non-specific behavior such as a null pointer dereference. C-Reduce pushes the problem of avoiding hijacking onto the user who can, for example, add code to the interestingness test looking for specific elements in a stack trace. The research question here is whether there are better, more automated ways to prevent bug hijacking.
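
The manual workaround mentioned above might look roughly like the following Python sketch. The function names and the stack-frame strings in the usage note are hypothetical, chosen only for illustration; a real interestingness test would capture the compiler's actual output and, by C-Reduce's convention, exit with status 0 when the variant is interesting.

```python
def crashed(stderr_text):
    """Non-specific check: any segmentation fault counts as interesting."""
    return "Segmentation fault" in stderr_text


def crashed_in_original_bug(stderr_text, required_frames):
    """Specific check: the crash must also mention every expected frame."""
    return crashed(stderr_text) and all(
        frame in stderr_text for frame in required_frames
    )
```

With required_frames set to, say, ["fold_binary_op"], a variant that still segfaults but whose stack trace no longer contains that frame is rejected, whereas the non-specific check would have accepted it and let a different bug take over the reduction.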

Obtaining C-Reduce

Binary C-Reduce packages are available as part of many software distributions including Ubuntu, Fedora, FreeBSD, OpenBSD, MacPorts, and Homebrew. Source code can be found at:
https://github.com/csmith-project/creduce
Acknowledgments: C-Reduce was initially my own project, but by lines of code the largest contributor by a factor of two is my former student Yang Chen, now a software engineer at Microsoft. Yang wrote effectively all of the Clang-based source-to-source transformation code, more than 50,000 lines in total. Eric Eide, a research professor in computer science at the University of Utah, is the other major C-Reduce contributor. Our colleagues Pascal Cuoq, Chucky Ellison, and Xuejun Yang also contributed to the project, and we have gratefully received patches from a number of external contributors. Someone created a fun visualization of the part of C-Reduce's history that happened on GitHub.


Finally, I’d like to thank Eric Eide and Yang Chen for reviewing and suggesting improvements to this piece.