Sunday, August 21, 2016

From Aristotle to Ringelmann: Using data science to understand the productivity of software development teams

By: Ingo Scholtes, Chair of Systems Design, ETH Zürich, Switzerland (@ingo_S)
Associate Editor: Bogdan Vasilescu, University of California, Davis. USA (@b_vasilescu)

I am sure that the title of this blog post raises a number of questions: What has the Greek philosopher Aristotle to do with the 19th century French agricultural engineer Maximilien Ringelmann? And what, if anything, do the two of them have to do with software?

The answers to both questions are related to Aristotle's famous reference to systems where the "whole is greater than the sum of its parts", i.e. complex systems where interactions between elements give rise to emergent or synergetic effects. Teams of software developers who communicate and coordinate to jointly solve complex development tasks can be seen as one example of such a complex (social) system. But what are the emergent effects in such teams? How do they affect productivity, and how can we quantify them?

This is where Maximilien Ringelmann, a French professor of agricultural engineering who would later - astonishingly - become one of the founders of social psychology, enters the story. Around 1913, Ringelmann became interested in how the collective power of draught animals, like horses or oxen pulling carts or plows, changes as increasingly large teams of them are harnessed. He answered the question with a data-driven study: he asked increasingly large teams of his students to jointly pull on a rope, measuring the collective force they were able to exert. He found that the collective force of a team of students was less than the sum of the forces exerted by each team member alone, an effect that later became known as the "Ringelmann effect". One possible explanation for this finding is coordination loss: it becomes increasingly difficult to tightly synchronize actions in a larger team. Moreover, social psychology has generally emphasized motivational effects due to shared responsibility for the collective output of a team, a phenomenon known by the rather unflattering term "social loafing".

Changing our focus from draught oxen to developers, let us now consider how all of this relates to software engineering. Naturally, the question of how the cost and duration of software projects scale with the number of developers involved is of major interest in software project management and software economics.

In his 1975 book The Mythical Man-Month, Fred Brooks formulated his famous law of software project management: "adding manpower to a late software project makes it later". As with the Ringelmann effect, different causes have been discussed. First, software development naturally comes with inherently indivisible tasks that cannot easily be distributed among a larger number of developers; illustrating this issue, Brooks stated that "nine women can't make a baby in one month". Secondly, larger team sizes give rise to an increasing need for coordination and communication that can limit the productivity of team members. And finally, for developers added to a team later, Brooks discussed the "ramp-up" time due to the integration and training of new team members. The combined result of these effects is that the individual productivity of developers in smaller teams cannot simply be multiplied up to estimate their output in a larger team.

While the (empirical) software engineering community has been rather unanimous that the productivity of individual developers is likely to decrease as teams grow larger, a number of recent works in management science and data science (referenced and summarized in [1]) have questioned this finding in the context of Open Source communities. The argument is that larger team sizes increase the motivation of individuals in Open Source communities, giving rise to an "Aristotelian regime" in which the whole team indeed produces more than expected from the sum of its parts. The striking consequence would be that Open Source projects are instances of economies of scale, where the effort of production (in terms of team members involved) scales sublinearly with the size of the project. In contrast, traditional software projects represent diseconomies of scale, i.e. the cost of production increases superlinearly as projects become larger and more complex.

The goal of our study [1] was to contribute to this discussion by means of a large-scale data-driven analysis. Specifically, we studied a set of 58 major Open Source projects on GitHub with a history of more than ten years and a total of more than half a million commits contributed by more than 30,000 developers. The question we wanted to answer is simple: How does the productivity of software development teams scale with team size? In particular, we were interested in whether Open Source projects indeed represent exceptions to basic software engineering economics, as argued by recent works.

Regarding methodology, answering this question involves two important challenges:
  1. We need a quantitative measure to assess the productivity of a team.
  2. We must be able to calculate the size of a development team at any given point in time.
In our work, we addressed these challenges as follows. First, in line with a large body of earlier studies, and notwithstanding the fact that it necessarily yields a rather limited notion of productivity, we use a proxy measure for productivity based on the amount of source code committed to the project's repository. There are different ways to define such a measure. While a number of previous studies have simply used the number of commits as a proxy for the amount of code produced, we find that the code contributions per commit are so broadly distributed that the commit count cannot serve as a measure of productivity; using it would substantially bias our analysis. To avoid this problem, we use the Levenshtein distance between the code versions in consecutive commits, which quantifies the number of characters edited between consecutive versions of the source code.
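To make this proxy concrete, the Levenshtein distance between two versions of a file can be computed with the standard dynamic-programming algorithm. This is a minimal sketch for illustration, not the tooling used in the study:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string as b to save memory
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

# a commit that appends a short comment to one line of a file
before = "def add(a, b):\n    return a + b\n"
after  = "def add(a, b):\n    return a + b  # sum\n"
print(levenshtein(before, after))  # 7 characters edited
```

Summing this distance over a developer's commits in a time window then yields the code contribution used as the productivity proxy.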

A second non-trivial challenge is to assess the size of a development team in Open Source communities. Most of the time there is no formal notion of a team, so who should be counted as a team member at a given point in time? Again, we address this problem by means of an extensive statistical analysis. Specifically, we analyze the inter-commit times of developers in the commit log. This allows us to define a reasonably-sized time window based on the prediction of whether team members are likely to commit again after a given period of inactivity. This time window can then be used to estimate team sizes in a way that is substantiated by the temporal activity distribution in the data set (see [1] for details).
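A minimal sketch of such an activity-window estimate of team size is shown below. The window length here is an arbitrary placeholder and the commit log is made up; in the study, the window is derived from the inter-commit time distribution of the data set:

```python
from datetime import datetime, timedelta

def team_size(commits, at, window_days=300):
    """Count developers with at least one commit in the `window_days`
    before time `at` -- a simple activity-window notion of team size.
    The default window is a placeholder, not the value from the paper."""
    window = timedelta(days=window_days)
    return len({dev for dev, ts in commits if at - window <= ts <= at})

# hypothetical commit log: (developer, timestamp) pairs
log = [
    ("alice", datetime(2015, 1, 10)),
    ("bob",   datetime(2015, 3, 2)),
    ("carol", datetime(2013, 6, 1)),   # long inactive -> not counted
    ("alice", datetime(2015, 6, 20)),
]
print(team_size(log, datetime(2015, 7, 1)))  # -> 2 (alice and bob)
```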

We now have all that we need. To answer our question we only need to plot the average code contribution per team member (measured in terms of the Levenshtein distance) against the size of the development team (calculated as described above). If Ringelmann and Brooks are right, we expect a decreasing trend, indicating that developers in larger development teams tend to produce less. If, on the other hand, studies highlighting synergetic effects in OSS communities are right, we expect an increasing trend, indicating that developers in larger development teams tend to produce more (because they are more motivated). The results across all 58 projects are shown in the following plot.

The clear decreasing trend that can be observed visually shows that Ringelmann and Brooks are seemingly right. We can further use a log-linear regression model to quantify the scaling factors and to assess the robustness of our result. This analysis confirms a strong negative relation, spanning several orders of magnitude in terms of the code contributions. Notably, this negative relation holds both at the aggregate level as well as for each of the studied projects individually.
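Such a scaling factor can be estimated by an ordinary least-squares fit in log-log space. Here is a sketch on made-up numbers (not the data from the study), fitting log(contribution) = alpha * log(team size) + c, i.e. a power law:

```python
import math

# hypothetical (team size, mean code contribution per member) pairs
data = [(1, 5000), (2, 3400), (4, 2300), (8, 1500), (16, 1000), (32, 700)]

# least-squares slope of log(contribution) vs. log(team size)
xs = [math.log(s) for s, _ in data]
ys = [math.log(c) for _, c in data]
n = len(data)
mx, my = sum(xs) / n, sum(ys) / n
alpha = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
print(alpha < 0)  # a negative exponent indicates a Ringelmann effect
```

A negative exponent alpha means per-member output shrinks as the team grows; alpha = 0 would mean constant per-member output.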

While our analysis quantitatively confirms the Ringelmann effect in all of the studied projects, we have not yet addressed why it holds. Unfortunately, it is non-trivial to quantify the potential motivational factors discussed in social psychology. What we can do, however, is study potential effects due to increasing coordination efforts. For this, we again use the time-stamped commit log of projects to infer simple proxies for a project's coordination structures. Specifically, we construct complex networks based on the co-editing of source code regions by multiple developers: whenever we detect that a developer A changed a line of source code that was previously edited by a developer B, we draw a link from A to B. Such a link indicates a potential need for developer A to coordinate the change with developer B.
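The co-editing network construction described above can be sketched as follows, assuming a chronological stream of hypothetical line-level edits (the study works on real commit logs; this is only an illustration of the link rule):

```python
def coediting_edges(line_edits):
    """Build directed co-editing links from a sequence of line edits.
    `line_edits` is a chronological list of (developer, file, line_no)
    tuples; whenever a developer touches a line last edited by a
    *different* developer, we record a link editor -> previous editor."""
    last_editor = {}  # (file, line_no) -> developer who last edited it
    edges = set()
    for dev, fname, line in line_edits:
        key = (fname, line)
        prev = last_editor.get(key)
        if prev is not None and prev != dev:
            edges.add((dev, prev))
        last_editor[key] = dev
    return edges

edits = [
    ("bob",   "main.c", 10),
    ("alice", "main.c", 10),   # alice edits bob's line -> alice -> bob
    ("alice", "main.c", 11),
    ("bob",   "main.c", 11),   # bob edits alice's line -> bob -> alice
]
print(coediting_edges(edits))  # two links: alice->bob and bob->alice
```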

The result of this procedure is a set of co-editing networks that can be constructed for different time ranges and projects. We can now study how these networks change as teams grow. What we find is that, in line with Brooks' argument on increasing coordination and communication effort, the number of links in the co-editing networks tends to grow super-linearly as teams grow larger. Consequently, the coordination overhead for each team member is likely to increase with team size, providing an explanation for the decreasing code production. Moreover, by fitting a model that estimates the speed at which co-editing networks grow in different projects, we find a statistically significant relation between the growth dynamics of co-editing links and the scaling factor for the decrease of productivity. This indicates that the management of a project and the resulting coordination structures can significantly influence the productivity of team members, amplifying or mitigating the strength of the Ringelmann effect as the team grows.
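Super-linearity of this kind can again be checked with a log-log fit, this time of link count against team size. As before, the numbers here are made up for illustration:

```python
import math

# hypothetical (team size, number of co-editing links) observations
data = [(2, 3), (4, 11), (8, 40), (16, 150), (32, 560)]

# least-squares slope of log(links) vs. log(team size):
# links ~ size**gamma, so gamma > 1 means super-linear growth,
# i.e. per-member coordination overhead (links/size) rises with size
xs = [math.log(s) for s, _ in data]
ys = [math.log(l) for _, l in data]
n = len(data)
mx, my = sum(xs) / n, sum(ys) / n
gamma = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
print(gamma > 1)  # True for this sample: links grow super-linearly
```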

So, do developers really become more productive as teams grow larger? Does the whole team really produce more than the sum of its team members? Or do we find evidence for Brooks' law and the Ringelmann effect? Based on our large-scale data analysis of more than 580,000 commits by more than 30,000 developers in 58 Open Source Software projects, we can safely conclude that there is a strong Ringelmann effect in all of the studied Open Source projects. As expected based on basic software engineering wisdom, our findings show that developers in larger teams indeed tend to produce less code than developers in smaller teams. Our analysis of time-evolving co-editing networks constructed from the commit log history further suggests that the increasing coordination overhead imposed by larger teams is a factor that drives the decrease of developer productivity as teams grow larger.

In summary, Open Source projects seem to be no magical exceptions to the basic principles of collaborative software engineering. Our study demonstrates how data science and network analysis techniques can provide actionable insights into software engineering processes and project management. It further shows how applying computational techniques to large amounts of publicly available data on social organizations allows us to test hypotheses relevant to social psychology. As such, it highlights interesting relations between empirical software engineering and computational social science that hold great potential for future work.

[1] Ingo Scholtes, Pavlin Mavrodiev, Frank Schweitzer: From Aristotle to Ringelmann: a large-scale analysis of productivity and coordination in Open Source Software projects. Empirical Software Engineering 21(2):642-683, April 2016. Available online.

