Sunday, January 24, 2016

Diversity in software teams: buzzword or necessity?

Associate Editor: Bogdan Vasilescu, University of California, Davis, USA (@b_vasilescu)
"I've had some bad experiences, particularly based on my gender. [...] a lot of time I'll join an online team and sometimes the other team members are very rude. In the past I have used a fake GitHub handle (my normal GitHub handle is my first name, which is a distinctly female name) so that people would assume I was male. I wouldn't lie if asked directly about my gender, but I know a lot of people assume by default others on the internet are male, and so I use a unisex handle [...] so people assume I am male without asking. I'm not happy with this solution, but sometimes I really want to participate [...] and I don't have the energy to defend myself for being female." [open source developer, USA]
This is not a singular occurrence. Despite its core meritocratic values, the open source "hacker" culture is known to be male-dominated and unfriendly to women [1, p.194]. Some authors go as far as to say that sexist behavior in open source is "as constant as it is extreme" [2]. In open source, women represent only around 10% of all contributors [3], which makes them considerably more underrepresented than in the big tech companies: Google, for example, reports 18% women in technical jobs; Microsoft is close, with 16.6%.

Yet, open source software is well recognised for its high quality, and is becoming increasingly popular. According to a recent survey, 78% of companies run their operations on open source software; 66% create software for their customers on top of open source software. Surely, open source software teams know what they’re doing. By staying homogeneous, teams limit their members’ potential differences in values, norms, and communication styles; homogeneity also better shields them from stereotyping, cliquishness, and conflict. It becomes much less likely for team members to disagree if they all think (and look) alike, with the majority class being the Western young white male.

"Clones" by Nick Royer (CC BY-SA)
However, teams that inadvertently act like this may be missing out. Increased team diversity results in more varied backgrounds and ideas, which provide a team with access to broader information and enhanced creativity, adaptability, and problem solving skills. This effect has been documented countless times in "traditional" teams that interact face-to-face. Does it also hold online, in the meritocratic open source, where "code sees no color or gender" [4]? After all, on the Internet nobody knows you're a dog.

We set out to investigate how diversity affects team performance on GitHub, the nexus of open source development, using a mixture of qualitative and quantitative research methods. First, through a survey with more than 800 respondents [4], we found that individual demographic attributes are surprisingly salient: while developers are, above all, aware of one another’s programming skills, they are also well aware of each other’s gender, real name, and country of residence. Their opinions on diversity are, however, split. Most respondents (62.5%) seem to view diversity as always positive. They acknowledge that diversity in terms of demographics and technical background is often associated with new ideas and approaches to solve problems, access to different networks, lively discussions around issues and pull requests, and ultimately better code. Still, some respondents (30%) report on occasional negative effects due to diversity, such as the developer quoted at the beginning of this blog post, referring to a gender-related incident.

Second, we extracted data from a large sample of 23,000 active projects on GitHub, spanning six years of activity and including a multitude of variables. We focused on two facets of diversity: gender (having a more balanced male-female team; we used people’s names to infer their gender) and tenure (having a mixture of seniors and juniors; we estimated a person’s programming experience from across all their GitHub contributions). We then used regression analysis to model effects associated with diversity on the outputs produced by teams per unit time (we counted the number of commits to a project by team members during a quarter), as a measure of the teams' effectiveness [5]. After controlling for team size, project age, social activity, and other confounds, our models show that both gender and tenure diversity are positive and significant predictors of productivity, together explaining a small but significant fraction of the data variability.
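
To make "diversity" concrete as a number that a regression can use, here is a minimal Python sketch. This post does not say which measures were used, so treat the choices below as assumptions for illustration: the Blau index is one common measure for a categorical attribute such as gender, and the coefficient of variation is one common measure for a numeric attribute such as tenure. The function names and the example team are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def blau_index(categories):
    """Blau index, 1 - sum(p_k^2), over the category shares p_k.

    It is 0 when every member falls in the same category and grows
    as the members spread more evenly across categories.
    """
    n = len(categories)
    if n == 0:
        raise ValueError("team must have at least one member")
    counts = Counter(categories)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean.

    It is 0 when all members have the same value (e.g. equal tenure).
    """
    m = mean(values)
    if m == 0:
        return 0.0
    return pstdev(values) / m


# Hypothetical four-person team (made-up data, not from the study).
team_genders = ["female", "male", "male", "male"]
team_tenure_years = [1.0, 2.0, 6.0, 7.0]

gender_diversity = blau_index(team_genders)
tenure_diversity = coefficient_of_variation(team_tenure_years)

print(f"gender diversity (Blau index): {gender_diversity:.3f}")
print(f"tenure diversity (coefficient of variation): {tenure_diversity:.3f}")
```

Per-team, per-quarter scores of this kind could then enter a regression as predictors next to controls such as team size and project age, with the quarterly commit count as the outcome.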

Together, the two analyses paint a complete picture: diversity is more than a buzzword, it’s a necessity! More diverse teams perform better. On a larger, economic and societal scale, these findings also suggest that added investments in educational and professional training efforts and outreach for female programmers will likely result in added overall value.

[1] Turkle, S. The Second Self: Computers and the Human Spirit. MIT Press, 2005.
[2] Nafus, D. ‘Patches don’t have gender’: What is not open in open source software. New Media & Society 14, 4 (2012), 669–683.
[3] FLOSS 2013 survey.
[4] Vasilescu, B., Filkov, V., and Serebrenik, A. Perceptions of Diversity on GitHub: A User Survey. In 8th International Workshop on Cooperative and Human Aspects of Software Engineering, CHASE, IEEE (2015), 50–56.
[5] Vasilescu, B., Posnett, D., Ray, B., Brand, M.G.J. van den, Serebrenik, A., Devanbu, P., and Filkov, V. Gender and tenure diversity in GitHub teams. In ACM SIGCHI Conference on Human Factors in Computing Systems, CHI, ACM (2015), 3789–3798.

  1. Is there a way to get more details on what it means that

    "gender and tenure diversity are positive and significant predictors of productivity".

    For example by how much?

    1. Hi Adrian,

      "positive predictors" means that other things being equal, there is an association between higher gender diversity / higher tenure diversity and increased output per unit time at the team level.

      "significant" means that the effect is statistically significant, p < 0.05.

      The effect sizes are, however, small. The two predictors explain only a small fraction of the data variance. You can find more technical details in the CHI'15 paper.
