Monday, February 13, 2017

Network Science Offers New Perspective on Developer Roles in Open Source Software

By: Mitchell JoblinSiemens AG. Erlangen, Germany (@mitchelljoblin)
Sven ApelUniversity of Passau, Germany (@SvenApel) 
Associate Editor: Bogdan Vasilescu, Carnegie Mellon University. USA (@b_vasilescu)


Software development is a distributed, collaborative effort, wherein different contributors play different roles, e.g., depending on their ability, experience, history, and position within a project. Research at the University of Passau, Siemens Corporate Technology, and the Technical University of Applied Sciences Regensburg provides novel quantitative insights into the properties and relevance of developer roles and their change over time. The insights could provide guidance on creating optimal governance and coordination structures for distributed software projects.

In open source software in particular, two main groups stand out: a small fraction of developers (the core group) responsible for performing the vast majority of work, assisted by a larger group of less active contributors (the peripheral group), involved in smaller or shorter activities. The two groups should be in balance. A large group of peripheral contributors is critical for ensuring software quality (recall Linus’s law "given enough eyeballs, all bugs are shallow"), but this imposes increased coordination effort on the core developers.

Our software analytics framework Codeface has substantially lowered the technical barrier to performing sophisticated in-depth analyses on big software repository data. As part of a recent empirical study, we provide developer role classification based on data automatically extracted from software repositories (e.g., version-control systems, mailing lists). Our techniques substantially extend rudimentary indicators based on counting lines of code or number of commits. These simple measures—although they are common in research and practice—carry significant uncertainty regarding their validity when it comes to capturing nuances of developer roles, and they are extremely limited with respect to inter-developer relationships. For example, the bare number of lines of code a developer has contributed tells us very little about how regularly they make contributions, how they are organizationally embedded within the project, or the degree of influence they may have on other developers.

Developer Networks

A developer network is a relational abstraction of the social and technical activities performed by developers. An example developer network for the open-source project QEMU is shown in Figure 1. Such networks are accurate in reflecting developer perception, and they reveal important functional substructure, or communities, with related tasks and goals [1, 3]. It has been shown that they evolve over time according to a number of fundamental changes in the network’s structural properties [2].

In a recent study [4], we explored the use of developer networks for classifying developer roles. In a nutshell, we found that they are superior in obtaining developer classification information compared to basic counts of individual developer activities, which we explain next.

Figure 1. The QEMU developer network emphasizing the community structure. Each node represents a developer and edges indicate interrelated development activity. The uniquely colored boxes enclosing developers represent communities of developers that work on strongly interrelated tasks. The pie chart for each node represents the amount of development activity associate with each community. The expected level of influence for each developer is reflected by the node size.

Empirical Study

As a first step, we performed a large-scale empirical study to obtain a ground truth on classifications for roughly 1000 developers. We used the data to validate (statistically and empirically) that using information from developer networks leads to more accurate results and greater practical insights regarding the differences between core and peripheral developers than simpler approaches (e.g., based on counting commits). As a key result, we found that core developers are positioned in the organizational structure distinctly from peripheral developers. Furthermore, the nature of the core developer group in the organizational structure is distinct from peripheral developers, in particular, in two different ways: (1) hierarchy and (2) stability.

Figure 2: Clustering coefficient versus node degree for developers of QEMU. Each point on the scatter plot represents one developer. The linear dependence is indicative of an organizational hierarchy with core developers at the top and peripheral developers underneath.


To mathematically detect if hierarchy exists in a developer network, we need to inspect the dependence between the so-called clustering coefficient and the degree of nodes in the network [5]. The hierarchical relationship for QEMU is shown in Figure 2; there is an obvious linear dependence between the log node degree and log clustering coefficient. Core developers that exhibit a high degree (i.e., who coordinate with many other developers) are seen to exclusively have a very low clustering coefficient (i.e., have neighbors that are only loosely interconnected) and are indicative of developers in a leadership role. In comparison, peripheral developers are seen as low degree nodes having consistently higher clustering coefficients. A developer’s position in the developer network is an organizational manifestation of their particular role. This and similar results may hold great potential for practice, for instance, when the agreement between prescribed and actual social organization needs to be ascertained and optimized, or when the organizational structure of projects is devised along the lines of established, successful projects.


A stable developer is one that maintains consistent participation in the project over a substantial period of time. The result of examining developer stability over one year of development for QEMU are shown in Figure 3. In this figure, the transition probabilities between developer states are shown in the form of a Markov chain. The primary observation is that developers in a core role are substantially less likely to transition to the absent state (i.e., leave the project) or isolated state (i.e., have no neighbors in the developer network by working exclusively on isolated tasks), which is in contrast to developers in a peripheral role. Based on this result, we can conclude that the core developers represent a more stable group than peripheral developers.

Figure 3: The developer-group stability for QEMU shown in the form of a Markov Chain. Edges are labeled with the probability that a developer in one state transitions to the next state in the following development cycle. A few less important edges have been omitted for visual clarity.

Final Remarks

Developer networks present a unique opportunity to capture insights about developer roles that are not visible in simpler, yet ubiquitously used representations. Inferring and modeling the inter-developer relationships leads to greater accuracy in capturing the real world, and it delivers deeper insights into the organizational structure of a software project. Our data and analysis suggest that peripheral developers are extremely reliant on core developers, especially because peripheral developers tend not to associate with other peripheral developers. If a core developer becomes overwhelmed by the need to oversee the activities of too many peripheral developers, a likely consequence is that effective coordination will not occur. If this is a pervasive phenomenon in a project's organizational structure, it may lead to severe degradation of the software architecture and have a negative impact on the overall source code quality. Our results—which can be immediately used by practitioners thanks to the associated freely available software framework Codeface—provides novel means for avoiding these traps.


[1] C. Bird, D. Pattison, R. D’Souza, V. Filkov, and P. Devanbu. Latent social structure in open source projects. In Proc. International Symposium on Foundations of Software Engineering, pages 24–35. ACM, 2008.
[2] M. Joblin, S. Apel, and W. Mauerer. Evolutionary trends of developer coordination: A network approach. Empirical Software Engineering, 2017. To appear.
[3] M. Joblin, W. Mauerer, S. Apel, J. Siegmund, and D. Riehle. From developer networks to verified communities: A fine-grained approach. In Proc. International Conference on Software Engineering, pages 563– 573. IEEE, 2015.
[4] M. Joblin, S. Apel, C. Hunsen, and W. Mauerer. Classifying Developers into Core and Peripheral: An Empirical Study on Count and Network Metrics. In Proc. International Conference on Software Engineering, 2017. To appear.
[5] E. Ravasz and A.-L. Barabasi. Hierarchical organization in complex networks. Physical Review E, 67(2), 2003.

Sunday, February 5, 2017

Empowering Users to Build IoT Software with a Puzzle-like Environment

by Yijun Yu, Pierre A. Akiki, Arosha K. Bandara
Associate Editor: Christoph Treude (@ctreude)

Jigsaw puzzle pieces serve as building blocks that allow children as well as adults to put together impressive pictures. Software engineers follow a similar approach to build complex programs. Since David Parnas' modularization conceptualization (Parnas, 1972), many kinds of software modules in the form of "building blocks" have been proposed and adopted. These include functions, objects, remote procedures, web services, cloud services, and microservices, to name a few. However, from a developer’s perspective, it is hard to achieve the exact requirements of end-users. Therefore, it would be useful to empower end-users with the capability to adapt a software product to their individual needs (Lapouchnian et al., 2006).

Internet of Things (IoT) devices and services can be configured to work together in many different ways. This configuration can be performed by end-users and is considered an end-user development challenge. Millions of school kids nowadays are taught programming concepts using end-user development environments such as Scratch, which has also been adapted into the Sense environment used to teach entry-level computing at The Open University (Kortuem et al., 2013). Can't we use a similar environment to teach end-users to develop software for IoT?

Akiki's recent research on Visual Simple Transformations (ViSiT) solves this problem by empowering end-users to wire IoT devices and services (Akiki et al., 2017). For example, end-users can use puzzle pieces to implement a transformation (Figure 1) that allows a Microsoft Xbox controller to communicate with a Lego Mindstorms robot. This paradigm is familiar to anyone who is used to programming with an environment like Scratch. Each puzzle piece is a visual block that connects to its neighbouring pieces in a similar way. Intuitively, puzzle pieces bind to each other through predefined sockets. The parameters of these building blocks provide the required concretization that matches with the configuration parameters of IoT devices and services.

Figure 1. An example transformation that connects an Xbox controller to a Lego Mindstorms robot
ViSiT’s underlying service-oriented code implements the transformations as executable workflows and composes them into a holistic application for IoT. Although this work primarily targets end-users, software developers can modify the executable workflows using Cedar Studio. This tool was originally developed as part of Akiki's work on adaptive user interfaces (Akiki et al., 2016), (Akiki et al., 2014).

By empowering end-users and lowering the barrier for software development, the creation of real-life IoT applications such as the robot shown in Figure 2 comes within the reach of a wider audience.

Figure 2. A shooting robot can be controlled using an Xbox controller as a result of applying visual simple transformations
Testing ViSiT with a number of end-users showed that it lowers the acceptance barrier and facilitates connecting IoT devices and services. A video demonstrating ViSiT and its supporting development environment can be viewed at:

With the evolution of such development paradigms and their supporting tools, end-users will have an ever-growing role to play in software development. As the adoption of end-user development environments grows, someday the data learnt from these environments could be used to train artificial intelligence to automatically compose software systems.