Sunday, May 21, 2017

How Cross-stack Configuration Errors can Ruin a 360 Degree Panorama Website

by Mohammed Sayagh, Ecole Polytechnique de Montréal.

Associate Editor: Sarah Nadi, University of Alberta (@sarahnadi)


Wordpress user “msayagh” could not upload his latest 360 degree panorama of Buenos Aires’ Plaza de Mayo to his Wordpress site (Figure 1). That was strange, since he was using the well-known “NextGen” Wordpress plugin. Being a software engineer at heart, and a PhD student at work, he delved into the plugin’s source code, but found no clue. Little did he know that the NextGen problem was caused not by the source code, but by an incorrect configuration option.

Basically, software configuration is the mechanism used to adapt a software system to different situations. For example, one only has to change an option like “file_uploads” to enable or disable file uploads in a particular system. While this gives people a lot of flexibility to customize their system, it also means that assigning a wrong value to a single option can make the software behave incorrectly. Researchers have found that such errors are frustrating, as they are severe and require a lot of effort to fix.
Figure 1: Configuration error encountered by msayagh’s 360 degree panorama.

Unfortunately, msayagh’s configuration error was no ordinary one: only much later did the poor student discover that the misconfigured option was not an option of “NextGen”, but of the PHP interpreter. That’s right! Although the error message popped up in NextGen’s GUI layer, the actual configuration option causing the trouble had to be fixed much deeper inside the Wordpress LAMP stack (Figure 2). In other words, a system’s configuration is determined by the combined configurations of all layers in its stack, i.e., the layers’ configurations are linked to one another (more on that later).
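To make the idea of linked layer configurations concrete, here is a minimal Python sketch. The post does not say which PHP option was at fault, so the option names and values below are assumptions chosen purely for illustration: “upload_max_filesize” and “post_max_size” are real PHP directives (shown with their default values), while the plugin option “max_upload_size” is hypothetical.

```python
MB = 1024 * 1024

# Upload size limits configured independently in each layer of the stack.
# All names and values are illustrative assumptions, not taken from the study.
layer_limits = {
    "plugin: max_upload_size (hypothetical)": 64 * MB,
    "PHP: upload_max_filesize": 2 * MB,   # PHP's default value
    "PHP: post_max_size": 8 * MB,         # PHP's default value
}


def effective_upload_limit(limits):
    """Return (option, limit) for the layer imposing the tightest limit."""
    option = min(limits, key=limits.get)
    return option, limits[option]


option, limit = effective_upload_limit(layer_limits)
print(f"Effective limit: {limit // MB} MB, imposed by {option}")
```

Under these assumed values, the plugin happily advertises 64 MB, yet any file larger than 2 MB would be rejected because of an option that lives two layers further down.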

Figure 2: LAMP Stack Architecture


Was msayagh just using the wrong Wordpress plugin on the wrong day, or are cross-stack configuration errors a fundamental problem? By manually studying 1,082 configuration errors from StackOverflow and StackExchange, he found that cross-stack configuration errors are intrinsically more severe than single-layer errors and often occur during devops operations [1]. Single-layer errors, on the other hand, mostly occur during the setup of an environment and the maintenance of a website, where they tend to be less severe.

Heartbroken by the 360 degree panorama incident, msayagh started working on an approach to help other Wordpress users like himself debug cross-stack configuration errors using modern source code analysis techniques. He observed that existing analysis techniques only targeted single-layer errors, yet were quite good at that. For example, consider Figure 3, which shows three stack layers, each written in a given programming language (PHP or C) and each with its own configuration options.

Figure 3: Simplified source code of three LAMP stack layers (left) with their corresponding dependency graphs (right).

To analyze this example, a single-layer approach for a given layer would rely on a graph model of the layer’s data-flow [2] and control-flow [3] dependencies, as shown on the right side of Figure 3. Suppose the misconfigured option (i.e., the option that msayagh would like to find using his approach) is “Option4” and the error message is printed by the print statement circled in red. A single-layer approach would very efficiently perform a breadth-first search starting from the circled node, looking for nodes that use an option, and would yield only “Option2” (i.e., a false alarm). If only there were a way to “link” the middle layer’s dependency graph to the graphs of the upper and lower layers, the code accessing Option4 (i.e., the culprit option) would have been found!
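The single-layer search described above can be sketched in a few lines of Python. The graph below is a hypothetical stand-in for the middle layer, not a transcription of Figure 3; the statement names are invented for illustration.

```python
from collections import deque

# Hypothetical dependency graph of the middle layer only: each node maps to
# the nodes it depends on (data flow or control flow).
middle_layer = {
    "print(error)": ["ok = bar(size)"],
    "ok = bar(size)": ["size = read(Option2)"],  # bar() lives in another layer
    "size = read(Option2)": ["Option2"],
}
LAYER_OPTIONS = {"Option2"}


def single_layer_suspects(graph, start, options):
    """Breadth-first search from the error statement; list the options reached."""
    seen = {start}
    queue = deque([start])
    found = []
    while queue:
        node = queue.popleft()
        if node in options:
            found.append(node)
        for dep in graph.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return found


suspects = single_layer_suspects(middle_layer, "print(error)", LAYER_OPTIONS)
print(suspects)
```

In this sketch the call to bar() is a dead end, because its implementation is outside the layer’s graph, so the search can only ever blame Option2.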

At this point, msayagh suddenly realized that the missing links in the graphs are in fact the “Physical Links” between layers of a stack, such as the calling conventions from a plugin to Wordpress, or from a PHP program to the language primitives implemented in C. Since such links are documented (or in the worst case can be found in the stack layers’ code), they are straightforward to find and add to the dependency graph (see Figure 4). For example, the function “bar()” in the top layer is connected to its implementation “php_bar()” in the lower layer by following a common naming convention in PHP. Similarly, the call to the function “mysql_bar()” is connected to its implementation in the lower layer. By performing a simple breadth-first traversal of the unified stack dependency graph, the approach now yields (in that order) Option4, Option1/Option2 and Option3, with the first recommendation being the right answer. Since this new approach basically reuses the results of existing single-layer approaches, it is easy to generate the dependency graphs of additional layers or to swap a single-layer approach for an equivalent one.
Figure 4: Model with “Physical links”

As output, the approach reports a ranked list of the configuration options suspected of having an incorrect value. The first reported option is the most likely to be misconfigured, and the last one the least likely.
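As a rough illustration of the unified graph and its ranked output, consider the Python sketch below. It is not the implementation from the paper [1]: the three layer graphs, the statement names and helpers such as “plugin_hook()” and “php_mysql_bar()” are hypothetical, and ranking by breadth-first distance is an assumption based on the description above.

```python
from collections import deque

# Hypothetical per-layer dependency graphs (node -> nodes it depends on),
# as a single-layer analysis of each layer might produce them.
layers = {
    "plugin (PHP)": {
        "plugin_hook()": ["Option1"],
    },
    "Wordpress (PHP)": {
        "print(error)": ["ok = bar(size)"],
        "ok = bar(size)": ["size = compute()"],
        "size = compute()": ["a = read(Option2)", "rows = mysql_bar()"],
        "a = read(Option2)": ["Option2"],
    },
    "PHP interpreter (C)": {
        "php_bar()": ["Option4"],
        "php_mysql_bar()": ["Option3"],
    },
}

# "Physical links" between layers: a call site and the code that implements it.
physical_links = [
    ("ok = bar(size)", "php_bar()"),
    ("rows = mysql_bar()", "php_mysql_bar()"),
    ("size = compute()", "plugin_hook()"),
]

OPTIONS = {"Option1", "Option2", "Option3", "Option4"}


def unify(layers, links):
    """Merge all layer graphs and add one edge per physical link."""
    graph = {}
    for layer_graph in layers.values():
        for node, deps in layer_graph.items():
            graph.setdefault(node, []).extend(deps)
    for call_site, implementation in links:
        graph.setdefault(call_site, []).append(implementation)
    return graph


def rank_options(graph, start, options):
    """Breadth-first search; return (option, distance) pairs, closest first."""
    distance = {start: 0}
    queue = deque([start])
    ranked = []
    while queue:
        node = queue.popleft()
        if node in options:
            ranked.append((node, distance[node]))
        for dep in graph.get(node, []):
            if dep not in distance:
                distance[dep] = distance[node] + 1
                queue.append(dep)
    return ranked


with_links = rank_options(unify(layers, physical_links), "print(error)", OPTIONS)
without_links = rank_options(unify(layers, []), "print(error)", OPTIONS)
print("with physical links:   ", with_links)
print("without physical links:", without_links)
```

With the physical links in place, Option4 is reached first, Option1 and Option2 tie at the next distance, and Option3 comes last. Without them, the search never leaves the middle layer and only Option2 is reported.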

Being an empirical researcher, msayagh evaluated the effectiveness of his approach on 36 real cross-stack configuration errors. The approach identified the misconfigured option in 32 of the 36 cases (89%), and in 23 of them the misconfigured option was reported as the top recommendation. The analysis finished in a median time of 230 seconds (asking a question on StackOverflow would have taken a median of 2.33 hours, without a strong guarantee of correctness).

In other words, in less than 4 minutes, our approach would enable any Wordpress user to upload their favourite 360 degree panoramas, or configure any other plugin in their Wordpress system! For more details, please see our ICSE 2017 paper [1] or, if you happen to be in Buenos Aires next week, come see our presentation!

References:
[1] Mohammed Sayagh, Noureddine Kerzazi, Bram Adams, “On Cross-stack Configuration Errors”, in Proc. of the 39th ICSE, 2017. Link: http://mcis.polymtl.ca/publications/2017/icse.pdf
[2] Data flow: https://en.wikipedia.org/wiki/Data_flow_diagram
[3] Control flow: https://en.wikipedia.org/wiki/Control_flow_graph