Sunday, February 28, 2016

Do commercial software teams use GitHub? How?

By: Eirini Kalliamvakou, University of Victoria, Canada (@irina_kAl)
Associate Editor: Bogdan Vasilescu, University of California, Davis, USA (@b_vasilescu)

GitHub is really popular. Right now it has more than 31 million repositories and over 12 million users. It is also growing at an impressive rate and is becoming the tool of choice for many development teams. Much empirical research in software engineering currently focuses on GitHub: the transparent development environment it affords, together with its pull-based workflow, provides a lightweight mechanism for managing code changes. GitHub’s features shape how it is used and the benefits it brings to teams’ development and collaboration. While most of the evidence we have comes from GitHub’s use in open source software (OSS) projects, GitHub is also used in an increasing number of commercial projects.

It is difficult to pin down what “good collaboration” is and which tools and practices make it up. So when something as popular as GitHub comes along in the service of “good collaboration”, we want to know how it works in practice. In a qualitative study, we investigated GitHub and collaboration by looking at the practices of commercial software teams, that is, teams that develop proprietary software in commercial organizations and host it on private repositories. Our study looked both at how these teams use GitHub and at how they think about collaboration.

We surveyed and interviewed professional developers who use GitHub in their workplace. The practices we heard about from the commercial software teams fall into three categories: the teams’ workflow, their communication and coordination, and their self-organization.

Workflow
We asked participants to describe every step of the process that takes them from a task list all the way through to a merge. We found that commercial teams follow a “branch & pull” workflow (Figure 1), which is neither the fork & pull nor the shared repository workflow, the two main workflows recognized by GitHub.

Figure 1: Branch & pull workflow

In the fork & pull model, there is a main repository for the project or the team, and developers isolate their work by creating a copy of the repository (a fork) and making their changes there. When they are done, they submit a pull request, which triggers a code review, any resulting changes, and a merge. The fork & pull model therefore has a distinct team phase, the code review step, that is triggered by the pull request. This is in line with the tradition of the open source projects that inspired this workflow: pull requests act as a screening mechanism for code that comes from unknown contributors. Branch & pull works like fork & pull, except that work is isolated through branches rather than forks. Instead of making a copy of the main repository under their own GitHub account, users make a branch inside the main repository. It is an appropriation of an open source-style workflow for a commercial team environment.
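
The difference between the two workflows can be sketched in a few lines of Python. This is only an illustration under our own assumptions, not GitHub's API: the `PullRequest` class, its fields, and the repository names are hypothetical. It shows that the two models differ only in where the proposed commits live, and that in both the review step sits between the pull request and the merge.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PullRequest:
    """Hypothetical model of a pull request (not GitHub's data model)."""
    base_repo: str    # repository the change will be merged into
    head_repo: str    # repository that holds the proposed commits
    head_branch: str  # branch that isolates the work
    approvals: List[str] = field(default_factory=list)
    merged: bool = False

    def workflow(self) -> str:
        # Branch & pull: the work lives on a branch inside the main repository.
        # Fork & pull: the work lives in a copy under the contributor's account.
        if self.head_repo == self.base_repo:
            return "branch & pull"
        return "fork & pull"

    def approve(self, reviewer: str) -> None:
        # Record that a teammate has reviewed the change.
        self.approvals.append(reviewer)

    def merge(self) -> None:
        # Assumption for this sketch: at least one review is required,
        # which makes code review a step of the workflow, not an afterthought.
        if not self.approvals:
            raise RuntimeError("code review must happen before the merge")
        self.merged = True


# Example with made-up repository names.
team_pr = PullRequest("acme/app", "acme/app", "feature/login")
outside_pr = PullRequest("acme/app", "alice/app", "feature/login")
print(team_pr.workflow())     # branch & pull
print(outside_pr.workflow())  # fork & pull
```

In this sketch, calling `merge()` before `approve()` raises an error in both workflows, which mirrors the point that the pull request triggers the team's review phase in either model.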

We found the branch & pull workflow to be very popular: 23 of our 24 interviewees reported using it. The reason was that it made code reviews part of the workflow instead of an afterthought.

Communication & Coordination
We asked participants to give us examples of circumstances when they found it essential to communicate and coordinate with their team and the mechanisms they used. The overall observation was that although GitHub is not a communication tool, communication is happening on GitHub.

Developers preferred awareness to direct communication: they kept up with the team by looking at the issue list and the commit list, by getting notification emails from GitHub, or by using a chat client that integrates with GitHub. The preference among these mechanisms was roughly equal. Most developers said that GitHub lacks the space and synchronicity that are essential when discussing ideas, and in those cases they moved their conversations to a communication tool external to GitHub. For code-centric discussions, however, they found communication through GitHub comments to be great. Why? Because all the information related to an artifact is attached to it and stays attached, essentially becoming a record of decisions.

Self-organization
The primary way self-organization showed up was as self task-assignment. Developers would choose what to work on based on their expertise and availability, picking tasks off GitHub’s issue list (or another issue tracker, if that is what the team was using). This is not a practice typically associated with commercial projects. However, the manager is still part of the process: their role is to define bite-size tasks (tasks that a single developer can work on), and they remain involved in prioritization and estimation.
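
To make the idea concrete, here is a minimal Python sketch of self task-assignment. Everything in it is an assumption made for illustration: the `Issue` class, the use of labels to stand in for expertise, and the numeric priority are hypothetical, and real teams will weigh these factors in their own way. The manager defines and prioritizes the tasks; the developer picks one that matches their skills when they are available.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Issue:
    """Hypothetical bite-size task on the team's issue list."""
    title: str
    priority: int                   # set by the manager; lower means more urgent
    labels: Set[str]                # assumed stand-in for the expertise needed
    assignee: Optional[str] = None  # None means nobody has picked it yet


def pick_task(developer: str, skills: Set[str], available: bool,
              issues: List[Issue]) -> Optional[Issue]:
    """Let a developer assign themselves the most urgent matching issue."""
    if not available:
        return None
    # Only unassigned issues that overlap with the developer's skills qualify.
    candidates = [i for i in issues if i.assignee is None and i.labels & skills]
    if not candidates:
        return None
    chosen = min(candidates, key=lambda i: i.priority)
    chosen.assignee = developer
    return chosen


# Example with made-up issues and names.
issues = [
    Issue("Fix login timeout", priority=2, labels={"backend"}),
    Issue("Restyle settings page", priority=1, labels={"frontend"}),
    Issue("Cache slow queries", priority=1, labels={"backend"}, assignee="ana"),
]
task = pick_task("ben", {"backend"}, available=True, issues=issues)
print(task.title)  # Fix login timeout
```

The more urgent backend issue is skipped because it is already taken, so the developer ends up with the next backend task the manager has prioritized.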

Does it sound familiar?
All of the work practices that we heard about from commercial software teams using GitHub are also known open source practices. We know that open source projects on GitHub use pull requests to screen contributions. What is unexpected here is that the commercial teams do not have the same need for screening, since there is trust built into the team, and yet they still prefer to use pull requests as an opportunity to review the code. Open source projects on GitHub also use comments for providing direct feedback and as part of code reviews. This is true of open source projects in general: lightweight, text-based communication that is automatically archived is the preferred way of communicating. Finally, self-organization is a long-known practice in OSS projects.

What do these results mean?
One thing to take away is that we saw GitHub acting as a vehicle for commercial software teams to adopt best practices styled after open source ones. Our results indicate that, when commercial teams choose to use GitHub, it gives them the chance to adopt practices that are tried and true in open source projects.

What is more, GitHub seems to act not only as a toolkit but also as a process kit. This is based on how consistently we saw GitHub being used: 23 of the 24 commercial interviewees described the same workflow to us. And that workflow was not the one expected for commercial projects, but the one that open source projects use. GitHub seems to come with a “way to use it” that travels together with the tool, very visibly in the open source world but quite organically in the commercial world too.

How? This takes us to the third takeaway: GitHub users advocate for it in the workplace. They are the ones who bring the bundle of tool, process, and best practice into their organizations. It is a bottom-up approach rather than a top-down one.

Where to go from here?
Given the overlap between GitHub and Git, which of them is responsible for the trend we saw? Would Git by itself have the same effect? The same question applies to other GitHub-like tools. How much of GitHub can we strip away before the effect disappears?

