Tuesday, July 23, 2019

Values Debt is Eating Software

By: Waqar Hussain (@infinity4upk)
Associate Editor: Muneera Bano (@DrMuneeraBano)

The growing diffusion of software into human societies, and its profound impact on individuals, calls for a reflective and responsible attitude from its creators. Software engineers often do well when it comes to delivering software functionality, thus creating economic or business value. However, one aspect that has not received sufficient attention is human values – something we would all claim to care about. On the positive side, the shift from value-neutral towards value-based software engineering is a sign of growing maturity for software engineering as a discipline [1].

Values such as fairness, sustainability and pleasure are ideals and aspirations that are important to humans. Values transcend situations, guide human decision making and influence how people evaluate worldly things [2]. When values are not designed for and delivered in software, there are ramifications. I call these value deficiencies or omissions in software Values Debt (VD). Unlike other forms of software debt, which can often be remedied by spending additional cost or effort, high VD renders software unsalvageable. It poses an existential threat to the ‘owner’ organization - something that happened to Volkswagen and Facebook [3].

VD is present in all kinds of software. For example, even mundane software like a flight reservation system has an inherent VD, as it is built on the economic drivers of supply and demand rather than on the values of empathy and compassion. Delta Air Lines’ reservation system manifested its VD when it started price gouging people trying to escape the devastation of Hurricane Irma [4].

More alarmingly, VD is also present in Machine Learning (ML) and Artificial Intelligence (AI) based systems, which can easily scale and impact masses of people. These ‘intelligent’ ML/AI-based systems are often trained on biased data and hence produce biased results, such as longer sentences for offenders of a particular skin color, or denial of loans and jobs based on ethnicity or gender. With the prevalence of VD in all software, the point of emphasis is: while software may be eating the world, VD is dining on software!

Contrary to common (practitioners’) belief, not all ills of software can be put to rest by adding ‘more’ code. For example, the tech giant Facebook had to undergo a social audit, rather than a technical overhaul of its platform, to fix privacy, discrimination and other human rights issues.

But why is it so hard to embed even a small subset of human values in software? Firstly, because of the subjective nature of values and their imprecise definitions. Software engineers, especially those with little or no background in the social sciences, can hardly relate values to their day-to-day work. Secondly, because of the interconnectedness of values: implementing one value requires deliberately addressing the set of values connected to it.

Poor implementation of one value, e.g. privacy, might impact other connected values such as perceived autonomy and freedom. Thirdly, prioritizing stakeholder values (individual vs. corporate vs. societal) and aligning them in software delivery is extremely challenging.

But how do software engineers deal with values in practice? Our research shows a general lack of awareness among the creators of software about human values and how these relate to their work [4]. Some avoid dealing with the murky and philosophical area of values and divorce themselves from taking up any responsibility. Others are deterred by the cost and effort required to achieve the desired level of maturity in implementing values, and are content with covering enough regulatory requirements just to be compliant. Yet others consider that the delivery of software itself takes care of human values - the attitude being: ‘after all, we are solving people’s problems and keeping ourselves from doing any harm’.

What can be done to address this? Besides raising awareness of the importance of human values in SE and providing practitioners with the right tools and techniques, more accountability is needed. Codes of professional conduct for software practitioners, and standards to support the development of ethical AI and ML such as the IEEE P7000 suite, are useful. Appealing to the good conscience of people alone, however, will not achieve the desired results. Regulations like the General Data Protection Regulation (GDPR) and others are helpful [5]. Regulations deter technological advancements from deliberately breaching societal values, and hold organizations (and even individuals) accountable when software negatively impacts people or the environment.

Where to now? The Values Debt metaphor is an ice breaker to discuss, understand and thereby avoid the implications of human-values-deficient software. Imagine what the world would start to look like if we could engineer in values from the get-go. Could we avoid human rights deficiencies in software? Could we prevent failures like the Delta Air Lines ticketing system, labelled as one with “no built-in ethical values”? Could we ‘educate’ and restrain intelligent algorithms in time to stop them sending pictures of self-harm to children, and so avoid future suicides [6]? I believe we can, if we reduce software Values Debt.


[1] Boehm, B. Value-based software engineering, ACM SIGSOFT Software Engineering Notes, 28(2), 4., 2003
[2] Schwartz, S.H., et al., Refining the theory of basic individual values. Journal of personality and social psychology, 2012. 103(4): p. 663.
[3] Smith D. Mark Zuckerberg vows to fight election meddling in marathon Senate grilling. The Guardian. 2018 Apr;11.
[4] Justin Sablich. 2017. ’Price Gouging’ and Hurricane Irma: What Happened and What to Do. https://www.nytimes.com/2017/09/17/travel/price-gouginghurricane-irma-airlines.html.
[5] Voigt, P. and A. Von dem Bussche, The EU General Data Protection Regulation (GDPR). A Practical Guide, 1st Ed., Cham: Springer International Publishing, 2017.
[6] Crawford, A. Instagram 'helped kill my daughter'. BBC News. Retrieved on 24 July 2019 from https://www.bbc.com/news/av/uk-46966009/instagram-helped-kill-my-daughter, 2019

Tuesday, July 9, 2019

Design and Evolution of C-Reduce (Part 1)

Associate Editor: Karim Ali (@karimhamdanali)

Since 2008, my colleagues and I have developed and maintained C-Reduce, a tool for programmatically reducing the size of C and C++ files that trigger compiler bugs. C-Reduce also usually does a credible job reducing test cases in languages other than C and C++; we'll return to that topic in Part 2.

Why Reduce Test Cases?

Here's a typical C-Reduce output, as found in the LLVM bug database:

int a[1];
int b;
void c() {
  void *d = b = 0;
  for (;; b++)
    a[b] = (int) d++;
}

LLVM crashes when compiling this code at the -O2 optimization level. The bug report does not contain the original, unreduced test case, but most likely it was larger.

A reduced test case is preferable because:
  • it usually gets the compiler to misbehave quickly, reducing the number of function calls, memory allocations, etc. that the compiler developer has to step through and reason about while debugging
  • the reduced file contains little code not directly related to triggering the bug, making it less likely that compiler developers will be distracted by extraneous features of the test case
  • reduced test cases for the same bug often look similar to each other, whereas this is not normally true for unreduced files that trigger the same bug
  • there is often little or no discernible similarity between an unreduced source file and its reduced version, making it easier for compiler bugs triggered by proprietary code to be reported externally

The minimum-sized input triggering any particular compiler bug can be found using a trivial algorithm: exhaustive search of text strings of increasing size. This method is, of course, almost always intractable. In practice, test case reduction proceeds in the other direction: starting with a large, failure-inducing test case, incrementally making it smaller until a local minimum is reached.

A Bit of Background

The history of automated test case reduction does not seem to be well documented, but several examples can be found in software testing papers from the 1990s such as Differential Testing for Software and Massive Stochastic Testing of SQL. Test-case reduction was first studied in its own right in 2000 when Hildebrandt and Zeller introduced Delta Debugging: a general-purpose technique for test case reduction. Their algorithm uses a greedy search where a series of variants (our term, not theirs, for partially-reduced test case candidates) is produced by removing chunks of the test case. As reduction progresses, the chunk size is reduced, until it reaches some minimum-sized unit, such as a single line, token, or character. When no minimum-sized chunk can be removed from the test case without breaking the property that it triggers the bug, the Delta Debugger terminates. Almost all subsequent test-case reduction work, including ours, builds upon this work.
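The chunk-removal idea can be sketched in a few lines of Python. This is a simplified illustration, not Hildebrandt and Zeller's full ddmin algorithm; `is_interesting` stands in for the user's bug-triggering check:

```python
def reduce_by_chunks(items, is_interesting):
    """Greedy, Delta-Debugging-style reduction: repeatedly try to
    delete chunks of the test case, halving the chunk size whenever
    no chunk of the current size can be removed."""
    chunk = max(1, len(items) // 2)
    while chunk >= 1:
        i = 0
        while i < len(items):
            variant = items[:i] + items[i + chunk:]
            if is_interesting(variant):
                items = variant   # keep the smaller, still-failing input
            else:
                i += chunk        # this chunk is needed; skip past it
        chunk //= 2
    return items

# Toy property: the "bug" triggers whenever both 'a' and 'z' survive.
result = reduce_by_chunks(list("abcdefz"),
                          lambda v: "a" in v and "z" in v)
```

Real reducers operate at the granularity of lines, tokens, or characters and re-run until no further removal succeeds; this sketch stops after one sweep down to single-item chunks.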

Towards C-Reduce

I became interested in test case reduction when my colleagues and I began to find a lot of bugs in C compilers using random testing. We found so many bugs that reporting them became bottlenecked on reducing the bug-triggering programs. Since I was the one reporting the bugs we found, I was the one who felt the pain of manual test-case reduction, and it quickly got old. I eventually reported around 500 compiler bugs and I could not have done this without first creating C-Reduce.

At the time, the best open-source implementation of Delta Debugging, from UC Berkeley, was line-based and contained a significant innovation over the original algorithm: it could reorganize a file in such a way that all nested curly braces deeper than a configurable level would appear on a single line. Thus, at level zero, entire functions would be placed on the same line, enabling the line-based reducer to remove an entire function at once. At higher nesting levels, functions would be split across lines, enabling finer-grained reduction. This method worked well but the tool ended up being inadequate for my needs: it got stuck at local minima that were often orders of magnitude larger than what could be achieved when reducing by hand.
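The brace-nesting reformatting step can be sketched as follows. This is a much-simplified, hypothetical version of the Berkeley tool's behavior; a real implementation must also handle string literals, comments, and preprocessor directives:

```python
def flatten(source, level):
    """Reflow C-like code so that everything nested more deeply than
    `level` curly braces ends up on a single line; at level 0, a whole
    function body becomes one line that a line-based reducer can
    delete in a single step."""
    out, depth = [], 0
    for ch in source:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch == "\n" and depth > level:
            out.append(" ")   # join lines that live inside deep nesting
        else:
            out.append(ch)
    return "".join(out)

code = "int f(int x) {\n  if (x) {\n    return 1;\n  }\n  return 0;\n}\n"
flat = flatten(code, 0)   # entire function collapses onto one line
```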

The limiting factor in the Berkeley Delta tool ("Delta" from now on) was obvious: it was not able to exploit enough of the structure of the file being reduced. For example, it could usually not do much to simplify arithmetic expressions. These sorts of simplifications tend to have a cascading effect: eliminating the last use of a variable allows its definition to be eliminated, etc. The obvious path forward was to write a new tool that solved a reduction problem that Delta could not solve, and then to alternate running this tool and Delta until a global fixpoint was reached. I did this, adding more and more reduction techniques over time. I eventually wrote a line-elimination pass in my new reducer, at which point Delta was subsumed and could be dropped.

We ended up keeping two elements of Delta's design. First, the configurable hierarchical reformatting of a test case based on curly brace nesting. This technique, followed by removing contiguous lines of code, is still one of C-Reduce's most useful first lines of attack on a test case. Second, Delta's mechanism for determining whether a given variant is "interesting." An interesting variant is used as the basis for further reduction steps; an uninteresting variant is a dead end, and is discarded. Delta determined interestingness by invoking a user-supplied program—typically a shell script—whose process exit code determines the interestingness of the current variant. The flexibility afforded by this small element of user extensibility ends up being extremely useful. For example, the interestingness test can discard test cases that trigger certain compiler warnings, it can attempt to disambiguate different crash bugs, etc.

It is more challenging to reduce test cases that cause the compiler to emit incorrect object code than it is to reduce test cases that merely cause the compiler to crash. C-Reduce itself is agnostic about the character of the bug of interest: we push all of the difficulties in reducing miscompilation triggers into the interestingness test, which should try to answer questions such as:
  • is the variant well-defined by the C or C++ standard?
  • does the variant avoid depending on behaviors that are unspecified by the C or C++ standard?
  • does the buggy compiler turn the variant into an executable?
  • does this executable terminate within a specified time?
  • does the reference compiler (assumed to not contain the bug of interest) turn the variant into an executable?
  • does this executable also terminate within a specified time?
  • does the behavior of the two executables differ in a way that indicates that a miscompilation occurred?

The variant is interesting if the answer to all of these questions is "yes."
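Structurally, such an interestingness test is a short-circuiting conjunction of the checks above, ordered cheapest-first. The sketch below uses stub predicates; a real test would invoke the two compilers, run the generated executables under a timeout, and call undefined-behavior detection tools:

```python
def miscompilation_interesting(variant, checks):
    """A variant is an interesting miscompilation trigger only if every
    check answers "yes"; `all` short-circuits, so putting the cheapest
    and most-likely-to-fail checks first keeps the test fast."""
    return all(check(variant) for check in checks)

# Stub predicates standing in for the real questions: validity, both
# compilers succeeding, both executables terminating, outputs differing.
checks = [
    lambda v: "undefined" not in v,  # stand-in for UB/unspecified checks
    lambda v: v.startswith("int"),   # stand-in for "both compilers accept it"
    lambda v: True,                  # stand-in for "both executables finish"
    lambda v: "bug" in v,            # stand-in for "the outputs differ"
]
good = miscompilation_interesting("int main; /* bug */", checks)
bad = miscompilation_interesting("int x; undefined", checks)
```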

The hardest part of reducing programs that trigger miscompilation bugs is ensuring that variants avoid undefined behaviors (such as invalid pointer operations) and do not rely on unspecified behaviors (such as the order of evaluation of function arguments). A test case doing one of these things is ill-formed and can accomplish nothing beyond annoying compiler developers. Empirically, if undefined behavior is not actively avoided during test-case reduction, C-Reduce will almost certainly introduce it. The practical solution is to use suitable static and dynamic analysis tools to rule out ill-formed variants. Since no single tool that detects all undefined and unspecified behaviors in C and C++ programs exists, a hybrid approach involving multiple tools is typically used in practice. This approach is not completely satisfying, but it works well enough that C-Reduce can reliably produce useful reduced test cases for miscompilation bugs in C and C++ compilers.

Writing good interestingness tests for miscompilations takes a bit of practice. First, when there are many criteria that must be satisfied for a variant to be interesting, it is useful to minimize the test’s expected-case runtime by asking the quickest and most-likely-to-fail questions first. Second, it is easy to write buggy tests. More than one user has described C-Reduce as being something like the sorcerer's apprentice: it does an excellent job reducing according to the criteria it is given, but if these criteria contain any kind of loophole, C-Reduce is likely to find it. For example, it is easy to accidentally write a test that claims that the empty file is interesting.

From the start, C-Reduce’s main goal was to produce a very small final reduced test case, even when this would take longer than we liked. This is based on the premise that we should burn cycles instead of human time, and that reporting a compiler bug is not usually on the critical path; we can often afford to wait for a better result. The consequences of this decision can be seen in Tables 1 and 2 of this paper that evaluates several test-case reduction methods: C-Reduce produces the smallest final output, but takes more time to do so.

A Modular, Domain-Independent Reducer Core

Although C-Reduce started out as a pet project solving a specific problem, it evolved into a research project involving a number of my colleagues, whose top-level goal was to produce an effective and usable reducer for C and C++ code as found in the wild. The first research contribution to come out of this effort was a way to achieve a clean mechanism/policy separation in a test case reducer. Previous reduction techniques had all baked specific transformations into the overall search strategy. That approach impedes extensibility, which we found to be crucial. The structure that we ended up with is a small core that invokes a collection of pluggable transformation passes until a global fixpoint is reached.

The API for C-Reduce passes is simple but—like many simple things—required a lot of iterations before it felt finished. It is based on the ideas that transformation passes should be stateless and that every pass should implement a linear sequence of transformations, each of which results in a variant that may or may not be interesting. The interface is as follows:

state new(filename, option)

Return a fresh state object. Each pass uses this state to keep track of where it is in the sequence of transformations that it is capable of performing. These states may contain arbitrary data items; the C-Reduce core treats them as opaque. A typical pass stores some kind of cursor—usually a byte offset, token offset, line number, or position in a tree traversal—in the state object.

The file referred to by filename is logically part of the state object even though it resides in the filesystem instead of memory. Of course it would not be difficult to pass the contents of the file around as a memory object but this approach would be slow when these objects are large: C-Reduce is frequently invoked on multi-megabyte preprocessed C++ files.

The "option" is used to select among different behaviors implemented by a composite pass.

state advance(filename, option, state)

Return a new state object referring to the next transformation opportunity following the one referenced by the state object passed as a parameter.
result transform(filename, option, state)

Modify the file in-place, selecting the transformation instance referred to by the state object. The result takes one of three values:
  • OK : the transformation succeeded
  • STOP : no more transformation instances remain for this pass
  • ERROR : something went wrong; for example, an external tool crashed, a working file or directory could not be created, etc.

(The API contains one additional method, which checks whether a pass's external dependencies are satisfied, that doesn't matter here.)

Our experience has been that every transformation pass that we wanted has been easy to implement behind this API.
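As an illustration of the API's shape (in Python rather than C-Reduce's actual Perl or C++, and with hypothetical names), here is a toy pass whose transformations each delete one line of the file; its entire state is a line index:

```python
import os
import tempfile

OK, STOP, ERROR = "OK", "STOP", "ERROR"

class LineRemovePass:
    """Toy pass behind the new/advance/transform interface: each
    transformation deletes one line of the file, and the opaque state
    is simply the index of the next line to try.  The pass itself
    keeps no state, as the API requires."""

    @staticmethod
    def new(filename, option):
        return 0

    @staticmethod
    def advance(filename, option, state):
        return state + 1

    @staticmethod
    def transform(filename, option, state):
        with open(filename) as f:
            lines = f.readlines()
        if state >= len(lines):
            return STOP                  # no transformation opportunities left
        del lines[state]
        with open(filename, "w") as f:   # modify the file in place
            f.writelines(lines)
        return OK

path = os.path.join(tempfile.mkdtemp(), "t.c")
with open(path, "w") as f:
    f.write("int a;\nint b;\nint c;\n")
state = LineRemovePass.new(path, None)
res = LineRemovePass.transform(path, None, state)   # deletes "int a;"
```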

The C-Reduce core implements this algorithm:

current = original_test_case
do
  size_at_start = size(current)
  foreach (p, option) in pass_list
    state = p::new(current, option)
    do
      variant = current         // this is a file copy operation
      result = p::transform(variant, option, state)
      if result == ERROR
        report_problem_in_pass(p, option)
      if result == OK
        if is_interesting(variant)
          current = variant     // also a file copy
        else
          state = p::advance(current, option, state)
    while result == OK
while size(current) < size_at_start

The termination argument for C-Reduce is:
  1. Since the outermost loop requires the size of the test case to decrease monotonically, it can only execute as many times as the size (in bytes) of the unreduced test case. In practice, it executes many fewer times than this.
  2. The loop over passes terminates because the pass list is immutable after C-Reduce’s initialization phase.
  3. Each iteration of the innermost loop either advances the state object or else (by selecting an interesting variant) removes one transformation opportunity. Either way, the number of transformations remaining in the current pass is decreased by one.
  4. The interestingness test is, at worst, terminated (using OS support for killing a process group) after a configurable timeout.

In practice, the weak link in this argument is the third item, which is vulnerable to bugs in passes. C-Reduce terminates robustly by abandoning passes when they appear to be behaving unreasonably.
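Ignoring ERROR handling and the pass-abandonment machinery, the core fixpoint loop can be rendered in Python over in-memory strings (real C-Reduce operates on files, and passes take the filename/option parameters described above):

```python
OK, STOP = "OK", "STOP"

class RemoveCharPass:
    """Toy pass whose n-th transformation deletes the n-th character."""
    @staticmethod
    def new(text):
        return 0
    @staticmethod
    def advance(text, state):
        return state + 1
    @staticmethod
    def transform(text, state):
        if state >= len(text):
            return STOP, text
        return OK, text[:state] + text[state + 1:]

def reducer_core(current, passes, is_interesting):
    """Run every pass to completion, over and over, until a whole
    round of passes fails to shrink the test case."""
    while True:
        size_at_start = len(current)
        for p in passes:
            state = p.new(current)
            while True:
                result, variant = p.transform(current, state)
                if result != OK:
                    break
                if is_interesting(variant):
                    current = variant   # opportunity consumed; keep the state
                else:
                    state = p.advance(current, state)
        if not len(current) < size_at_start:
            return current

# Toy property: the "bug" needs the substring "ab" to survive.
out = reducer_core("xxabyy", [RemoveCharPass], lambda v: "ab" in v)
```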

The C-Reduce core does not insist that transformations make the test case smaller, and in fact quite a few of its passes potentially increase the size of the test case, with the goal of eliminating sources of coupling within the test case, unblocking progress in other passes.

The sequence of transformation passes is carefully orchestrated such that passes that are likely to give the biggest wins—such as those that remove entire functions—run first; otherwise the tool would end up spending days or weeks doing silly things such as trying to shrink numeric constants in a huge source file. Shrinking numbers is useful, and it should be done, but only after many other reduction mechanisms have run to completion.

C-Reduce's collection of cooperating passes, with heavy phase-ordering constraints, is highly reminiscent of how a modern optimizing compiler works. However, only a small proportion of the transformation passes is intended to be semantics-preserving in the sense that a compiler's optimization passes must be. In this domain, we only want to preserve enough semantics that we can probabilistically avoid breaking whatever property makes a test case interesting.

A consequence of writing a modular reducer is that once we came up with the right API for writing passes, we were free to write a lot of passes. My colleagues and I spent several years doing this and we ended up with:
  • 35 passes, implemented in Perl, that include heuristics such as removing lines, removing various kinds of matched delimiters (and perhaps also the text between them), and shrinking integer values
  • 6 passes that invoke external utilities such as unifdef, a partial evaluator for the C preprocessor language, a lexer for C and C++ that supports various token-level reduction transformations, and pretty-printing utilities that make the reduced test case more pleasant to look at
  • 69 passes, implemented in C++, that use LLVM's Clang front end as a library for source-to-source transformation of C and C++ code; these include function inlining, partial template instantiation, scalar replacement of aggregates, copy propagation, and eliminating levels of a class hierarchy.

The actual number of dynamic passes is larger than the total of these numbers since some passes can be invoked in different modes using the "option" parameter mentioned above.

In this piece, we looked at why we had to create C-Reduce and at the modular structure that was key to making it solve the problems that we wanted to solve. In Part 2, I'll describe how C-Reduce improves reduction times using multiple cores and why C-Reduce usually does a good job reducing test cases in languages other than C and C++; finally, I'll discuss a few open research problems in test case reduction.

Tuesday, July 2, 2019

Autonomous Computing Systems: The Convergence of Control Theory and Computing Systems

Associate editor: Danilo Pianini (@DanySK86)

Computing systems are becoming increasingly complex, both in scale and in functionality. Applications need to process large amounts of data under time requirements that are becoming smaller and smaller. The distribution of computation is growing at an unprecedented rate. It is sufficient to think of the number of Internet of Things (IoT) devices, which is increasing exponentially and is expected to reach 31.4 billion by 2023.
Such increasing complexity cannot be handled with static policies or human operators; sound mathematical approaches are needed, as also highlighted in the seminal paper by A.-L. Barabási, 2005. In particular, novel applications like smart houses, smart grids and cities, Industry 4.0, and robotics pose a number of challenges for the efficient management of computational resources and for the design of scalable and efficient autonomous solutions.

The need for self-adaptation

The design of decision-making strategies for computing systems is known in the literature as self-adaptation, i.e., computing systems dynamically adapt to changes. Several disciplines have been identified as potential contributors. Among these, machine learning provides additional knowledge of the system; see for example the work by Esfahani et al., 2013. Control theory, in turn, provides a vast array of tools for designing robust adaptive physical systems with formally assured behavior; see for example Cheng et al., 2009 or Filieri et al., 2017. The combination of knowledge, robustness and formal guarantees has led to increased interest in developing control-based approaches to various computing systems problems, as highlighted in the surveys by Patikirikorala et al., 2012 and by Shevtsov et al., 2018.

Modeling computing systems

In almost every field of science, there is an effort to build mathematical representations of the physical world. Computer Science, on the other hand, seems to be a bit different. Linus Torvalds wrote:

I'm personally convinced that computer science has a lot in common with physics. Both are about how the world works at a rather fundamental level.
The difference, of course, is that while in physics you're supposed to figure out how the world is made up, in computer science you create the world.
Within the confines of the computer, you're the creator. You get to ultimately control everything that happens. If you're good enough, you can be God. On a small scale. 
Linus Torvalds, Just for Fun

Even though we are the ones creating our computing systems, it seems that we have little understanding of how to model their runtime behavior. This may be due to the fact that the way we conceive algorithms is fundamentally different from the way we describe dynamic models. The gap between the two approaches is very difficult to fill, and several attempts have been made. Techniques ranging from Markov Decision Processes to Neural Networks have been widely explored in the scientific literature in different fields.

One model to rule them all?

A very important distinction should be made about the nature of models in science. There are (at least) three types of models:

  • Simulation models, where the objective is to represent in the most accurate way the behavior of the modeled system, and to simulate it completely in silico, i.e., in a computer simulation. They are the most common models in almost every field of science, including physics, chemistry, and computer science.
  • Prediction models, where the objective is to provide forecasts of the future behavior of a system, based on its known past behavior. The main difference with respect to simulation models is that at every time step the model is fed with the most up-to-date measurements of the system's behavior. This kind of model is very common in statistics, finance, and weather forecasting, where we make our best guesses about the future behavior of some financial index, or about what the weather will look like in the next few days, based on the current knowledge of the system.
  • Control models, where the objective is to describe the main dynamics of the system with simple models, in order to capture basic properties, i.e., how the system responds to external stimuli in terms of timing properties during transients.

The need for control models

A famous quote, often attributed to the statistician George Box, says:
All models are wrong but some are useful
In fact, all three types of models are approximations of reality, and each is meant for a specific purpose. Using a prediction model to simulate the behavior of a complex system is not a good idea.

As a simple example, Gulisano et al., 2018 provided a very accurate, yet complex, mathematical description of how to model the performance of a join operator in data streaming applications. The results show that it is possible to model the latency and throughput of the operator even in the presence of overload, and to obtain an error on the order of fractions of milliseconds. However, adopting such a model for designing a control strategy is extremely challenging.

More generally, several - though not all - of the models proposed in computer science are closer to simulation models than to control models. They are extremely accurate, but they fall short when it comes to model-based self-adaptation design. To make a physics parallel, much of the time the proposed models focus on describing the motion of single particles in order to infer the temperature of a room: the model itself becomes extremely complicated, but it is of little use when a controller needs to be designed to regulate that temperature. The concept of a higher level of abstraction is key to enabling control models for computing systems.

Control-based Computing System Design

The importance of scalable methods for the analysis and synthesis of large-scale interconnected systems is widely recognized. Resource-efficient solutions will not be possible without a firm mathematical foundation. Being able to program and manage computing systems in an efficient way is of fundamental importance. By combining feedback-based computing with suitable programming models, it is possible to design predictable applications with guarantees on response time, resource allocation, and power consumption.

Control theory seems to be the most promising approach to tame such complexity. Machine learning approaches represent a valid alternative, but with the disadvantage of limiting both the ability to formally assess properties of the controlled system and the generality of the proposed solutions. Such limitations have generated the new trend of explainable Artificial Intelligence (AI), which aims to describe the way several machine learning strategies make their decisions.
Control-based approaches, on the other hand, are typically designed according to well-understood policies, with deterministic algorithms. Model-based design is in fact the most common approach, since it allows the control engineer to provide formal guarantees on the robustness and performance that can be obtained with the designed solutions.
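As a small illustration of what such a design looks like, the sketch below uses a discrete-time PI controller to drive a toy first-order "plant" to a setpoint; the plant coefficients are made up for illustration (x could stand for a server's utilization and u for the admitted request rate), and the integral term is what removes the steady-state error:

```python
def simulate(setpoint, steps, kp=0.3, ki=0.1):
    """Discrete-time PI control of a toy first-order plant.  The plant
    coefficients (0.8, 0.5) and the gains kp, ki are illustrative
    assumptions, not taken from any of the cited works."""
    x = 0.0          # measured output, e.g. utilization
    u = 0.0          # control input, e.g. admitted request rate
    integral = 0.0   # accumulated error
    for _ in range(steps):
        x = 0.8 * x + 0.5 * u   # the plant's simple control model
        error = setpoint - x
        integral += error       # integral action removes steady-state error
        u = kp * error + ki * integral
    return x

final = simulate(setpoint=0.7, steps=200)
```

With these gains the closed loop is stable and the output settles at the setpoint; the point of the exercise is that such guarantees can be checked analytically from the simple control model, which a detailed simulation model would not permit.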

Wrapping up

The main challenge in applying control theory to computing systems lies in the modeling step. Nonetheless, there have been several control-based solutions in the past that combine model-free or black-box modeling approaches to compensate for the lack of suitable control models. Even in such cases, control-based solutions have proven more effective than classical machine learning approaches, e.g., see Maggio et al., 2012, and than other heuristic approaches, e.g., see Terraneo et al., 2014.

The area of designing autonomous computing systems is extremely broad, with many opportunities to advance science and engineering. The innovation of the next generation of computing systems lies at the intersection of different scientific communities, including software engineering, computer engineering, computer science, control theory, statistics, and mathematics.