Sunday, February 28, 2016

Do commercial software teams use GitHub? How?

By: Eirini Kalliamvakou, University of Victoria, Canada (@irina_kAl)
Associate Editor: Bogdan Vasilescu, University of California, Davis. USA (@b_vasilescu) 


GitHub is really popular. Right now it has more than 31 million repositories and over 12 million users, and it is growing at an impressive rate, becoming the tool of choice for many development teams. Much empirical research in software engineering currently focuses on GitHub; the transparent development environment it affords, together with its pull-based workflow, provides a lightweight mechanism for managing code changes. GitHub's features shape how it is used and the benefits it provides to teams' development and collaboration. While most of the evidence we have comes from GitHub's use in open source software (OSS) projects, GitHub is also used in an increasing number of commercial projects.


It is difficult to pin down what "good collaboration" is and which tools and practices make it up. So when something as popular as GitHub comes along in the service of "good collaboration", we want to know how it works in practice. In a qualitative study, we investigated GitHub and collaboration by looking at the practices of commercial software teams: teams that develop proprietary software, built in commercial organizations and hosted in private repositories. Our study looked both at how these teams use GitHub and at how they think about collaboration.


We surveyed and interviewed professional developers who use GitHub in their workplace. The practices we heard about from the commercial software teams fall into three categories: the teams' workflow, their communication and coordination, and their self-organization.


Workflow
We asked participants to describe every step of the process that takes them from a task list all the way through to a merge. We found that commercial teams follow a "branch & pull" workflow (Figure 1), which is neither the fork & pull nor the shared repository workflow, the two main workflows recognized by GitHub.


Figure 1: Branch & pull workflow


In the fork & pull model, there is a main repository for the project or the team, and developers isolate their work by creating a copy of the repository and making their changes there. When they are done, they submit a pull request, which triggers a code review, follow-up changes, and a merge. The fork & pull model has a distinct team phase, the code review step, that is triggered by the pull request. This is in line with the tradition of the open source projects that inspired the workflow: pull requests act as a screening mechanism for code coming from unknown contributors. Branch & pull works like fork & pull, with the difference that work is isolated through branches rather than forks. Instead of making a copy of the main repository under their own GitHub account, developers create a branch inside the main repository. This is an appropriation of an open-source-style workflow for a commercial team environment.
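To make the steps concrete, here is a minimal Python sketch of one pass through the branch & pull cycle. The repository, branch name, token, and issue reference are hypothetical placeholders; the pull request is opened through GitHub's REST API, and a real team would usually do this from the command line and the web UI rather than a script.

```python
# A minimal sketch of the "branch & pull" workflow; names and token are hypothetical.
import subprocess
import requests

OWNER, REPO = "example-org", "example-app"   # hypothetical private repository
BRANCH = "fix-login-timeout"                 # hypothetical task branch
TOKEN = "..."                                # a GitHub personal access token

def git(*args):
    """Run a git command in the current working copy."""
    subprocess.run(["git", *args], check=True)

# 1. Isolate the work in a branch inside the main repository (no fork needed).
git("checkout", "-b", BRANCH)
# ... edit files for the task picked off the issue list ...
git("commit", "-am", "Fix login timeout handling")
git("push", "-u", "origin", BRANCH)

# 2. Open a pull request against the main line of development; this triggers
#    the team phase: code review, follow-up changes, and finally a merge.
resp = requests.post(
    f"https://api.github.com/repos/{OWNER}/{REPO}/pulls",
    headers={"Authorization": f"token {TOKEN}"},
    json={"title": "Fix login timeout handling",
          "head": BRANCH,          # the task branch
          "base": "master",        # the main branch (assumed name)
          "body": "Closes #42"},   # hypothetical issue reference
)
resp.raise_for_status()
print("Review at:", resp.json()["html_url"])
```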


The branch & pull workflow was very popular, reported by 23 of our 24 interviewees, and the reason was that it made code reviews part of the workflow instead of an afterthought.


Communication & Coordination
We asked participants to give us examples of circumstances when they found it essential to communicate and coordinate with their team and the mechanisms they used. The overall observation was that although GitHub is not a communication tool, communication is happening on GitHub.


Developers preferred awareness over direct communication: they watched the issue list and the commit list, received notification emails from GitHub, or used a chat client that integrates with GitHub, with no clear favorite among these mechanisms. GitHub itself was preferred for code-centric discussions. Most developers said that GitHub lacks the space and synchronicity essential for discussing ideas, and in those cases they moved their conversations to a communication tool external to GitHub. For code-centric discussions, however, they found communication through comments to work well. Why? Because all the information related to an artifact is attached to it and stays there, essentially becoming a record of decisions.


Self-organization
The primary way self-organization showed up was as self task-assignment. A developer would choose what to work on based on their expertise and availability, and would pick tasks off GitHub's issue list (or another issue tracker, if that is what the team used). This is not a practice typically associated with commercial projects. The manager, however, is still part of the process: their role is to define bite-size tasks (small enough to be worked on by a single developer), and they still take part in prioritization and estimation.
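As an illustration of self task-assignment, here is a small sketch against GitHub's REST API. The organization, repository, and username are hypothetical, and in practice developers would typically just assign themselves in the web UI; the point is only that the open issue list is the shared pool of tasks.

```python
# Sketch of self task-assignment from the issue list via GitHub's REST API.
# Owner, repo, and username are hypothetical placeholders.
import requests

OWNER, REPO, ME = "example-org", "example-app", "dev-alice"
TOKEN = "..."  # a GitHub personal access token
headers = {"Authorization": f"token {TOKEN}"}

# List open, unassigned issues: the pool of bite-size tasks to pick from.
issues = requests.get(
    f"https://api.github.com/repos/{OWNER}/{REPO}/issues",
    headers=headers,
    params={"state": "open", "assignee": "none"},
).json()

# Pick a task based on expertise and availability (here: simply the oldest one)
# and self-assign it so the rest of the team can see who is working on what.
if issues:
    task = issues[-1]
    requests.post(
        f"https://api.github.com/repos/{OWNER}/{REPO}/issues/{task['number']}/assignees",
        headers=headers,
        json={"assignees": [ME]},
    )
    print(f"Picked up #{task['number']}: {task['title']}")
```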


Does it sound familiar?
All of the work practices we heard about from commercial software teams using GitHub are also known open source practices. We know that open source projects on GitHub use pull requests to screen contributions. What is unexpected here is that commercial teams do not have the same need for screening (there is trust built into the team), and yet they still prefer to use pull requests as an opportunity to review the code. Open source projects on GitHub also use comments to provide direct feedback and as part of code reviews. This is true of open source projects in general: lightweight, text-based communication that is automatically archived is the preferred way of communicating. Finally, self-organization is a long-known practice in OSS projects.


What do these results mean?
One takeaway is that GitHub acts as a vehicle for commercial software teams to adopt best practices styled after open source ones. Our results indicate that, when commercial teams choose GitHub, they get the chance to adopt practices that are tried and true in open source projects.


What is more, GitHub seems to act not only as a toolkit but also as a process kit. This is based on how consistently we saw GitHub being used: 23 of the 24 commercial interviewees described the same workflow to us, and it was not the one expected for commercial projects but the one that open source projects use. GitHub seems to come with a "way to use it" that travels together with the tool, very visibly in the open source world but quite organically in the commercial world too.


How? This takes us to the third takeaway: GitHub users advocate for it in the workplace. They are the ones who bring the bundle of tool, process, and best practice into their organizations. It is a bottom-up approach rather than a top-down one.

Where to go from here?
Given the overlap between GitHub and Git, which of them is responsible for the trend we saw? Would Git by itself have the same effect? The same question applies to other GitHub-like tools. How much of GitHub can we strip away before the effect disappears?


Sunday, February 21, 2016

Self-writing Software

By Rishabh Singh (@rishabhs), Associate Editor.

In this world of ever-increasing automation, from self-driving cars to personal robots, how far are we in automating the art of writing software? Surprisingly, the problem of automatically learning programs, also known as "program synthesis", has been an intriguing research area dating back to the 1970s [1]. A lot of work then focused on using deductive reasoning in automated theorem provers to derive programs by systematically transforming the program specifications. These techniques were able to automatically synthesize simple programs involving numerical computations and data structure operations, but they didn't quite catch on, since oftentimes writing a complete specification of the program proved to be more daunting than writing the program itself in the first place. As a result, interest in program synthesis research started to die down in the late 1980s, but the area has seen a recent resurgence and is now considered one of the hottest topics in the software engineering and programming languages communities.

What changed since the 1980s?
In my opinion, the three biggest advances that have led to this resurgence are: 1) exceptional advances in constraint solving algorithms (such as SAT and SMT solvers), 2) a slightly different formulation of the problem (more on this next), and 3) Moore's law giving us faster and faster compute power. While the traditional techniques focused on learning arbitrary programs from complete specifications, the recent techniques formulate the problem differently in two key ways. First, instead of relying on complete specifications, which are hard to write, they allow for incomplete and more natural specification mechanisms such as input-output examples, demonstrations of program behavior, partial programs with holes, natural language, etc., and their combinations. Second, instead of considering Turing-complete languages for the hypothesis space of possible programs, they restrict the hypothesis space using a domain-specific language (DSL). This not only results in faster search algorithms, since the hypothesis space is more structured, but also yields learnt programs that are more understandable and readable.
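To make the second point concrete, here is a toy Python sketch (not any real system): the specification is a handful of input-output examples, and the hypothesis space is a tiny, depth-bounded arithmetic DSL searched by brute-force enumeration.

```python
# Toy enumerative synthesis: the spec is a few input-output examples and the
# hypothesis space is a small, depth-bounded arithmetic DSL (not Turing-complete).
from itertools import product

def programs(depth):
    """Enumerate DSL expressions: expr ::= x | 1 | 2 | 3 | (expr + expr) | (expr * expr)."""
    if depth == 0:
        yield from ["x", "1", "2", "3"]
    else:
        yield from programs(depth - 1)
        for op in ["+", "*"]:
            for left, right in product(list(programs(depth - 1)), repeat=2):
                yield f"({left} {op} {right})"

def synthesize(examples, max_depth=2):
    """Return the first DSL program consistent with all the examples."""
    for prog in programs(max_depth):
        if all(eval(prog, {"x": x}) == y for x, y in examples):  # eval is fine for a toy DSL
            return prog
    return None

# Specification by example: f(1)=3, f(2)=5, f(4)=9, i.e. something equivalent to 2*x + 1.
print(synthesize([(1, 3), (2, 5), (4, 9)]))
```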

Isn’t this an undecidable problem?
Yes, in general it is an undecidable problem. Even answering whether a program always terminates is undecidable, and synthesis techniques are solving the even harder problem of learning a program that not only terminates but also satisfies the properties defined by the specification. For some fragments of the language hierarchy, such as restricted regular expressions, there are decidable and efficient synthesis algorithms. Even for more expressive languages, techniques based on bounded reasoning and abstract interpretation have proven quite successful. Instead of providing guarantees over the complete input space (an undecidable problem), these techniques only consider a finite, bounded set of inputs, with the assumption that if the learnt program behaves correctly on the bounded set of inputs, it is very likely to also work for all possible inputs.
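A minimal sketch of the bounded-reasoning idea, assuming we already have a learnt candidate and an executable reference to compare against: check every input up to a small bound and treat agreement as (likely) correctness.

```python
from itertools import product

def bounded_check(candidate, reference, bound=64, arity=1):
    """Return a counterexample, or None if candidate agrees with the reference
    on every input tuple drawn from {0, ..., bound-1}^arity."""
    for inputs in product(range(bound), repeat=arity):
        if candidate(*inputs) != reference(*inputs):
            return inputs
    return None

learnt = lambda x: x + x + 1        # a learnt program
reference = lambda x: 2 * x + 1     # the behaviour it should match
print(bounded_check(learnt, reference))   # None: no counterexample within the bound
```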

What can we synthesize currently?
Even with all the recent advances, the ideal applications for synthesis techniques have been in learning programs that are small (a few tens of lines) but complex and tricky to write manually. Some of the recent applications include synthesizing programs for data extraction and transformation from examples, parsers from interactive labeling of sub-expressions, program refactoring sequences given an input and output program, network policies from scenarios consisting of packet traces and the corresponding actions on the packets, optimal platform-tuned implementations of signal processing transforms from their mathematical descriptions, compilers for low-power spatial architectures that partition programs into optimal fragments, efficient synchronization in concurrent programs, low-level bitvector algorithms, type completions, hints and feedback for programming assignments, and many other cool applications [2]. Let me expand a bit more on how synthesis techniques are being applied concretely in two interesting domains: data wrangling and computer-aided education.

Data Wrangling & Computer-aided Education
Data wrangling refers to the process of converting data from a raw format into a format suitable for subsequent analysis. Writing such data extraction/transformation scripts manually is cumbersome and sometimes beyond the programming expertise of the users; some studies estimate that data wrangling takes 80% of the total analysis time. The FlashFill system [3,4], often quoted as one of the top new features in Microsoft Excel 2013, allows users to perform regular-expression-based data transformations, such as splitting and merging, by providing only a few input-output examples, without the need to write complex Excel macros or VBA. These techniques have been extended to learn semantic data type transformations (such as dates, addresses, etc.) and table join/lookup programs by example. The FlashExtract system [5] allows users to perform data extraction from semi-structured text files using a few examples. The key idea behind these systems is to first design a domain-specific language that is expressive enough to encode the majority of data wrangling tasks but at the same time concise enough to keep learning tractable. These systems then use version-space-algebra-based techniques to efficiently search the large space of programs in the DSL for the programs that are consistent with the user-provided examples.
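The sketch below gives a flavour of programming by example with a deliberately tiny "extract the k-th field" DSL. The real FlashFill/FlashExtract DSLs and their version-space algebras are far richer, so treat this only as an illustration of the learn-from-examples idea; the names and examples are made up.

```python
# Toy, FlashFill-flavoured learning of a string extraction program from examples.
DELIMITERS = [" ", ",", ";", "@", "-", "/"]

def candidate_programs():
    # DSL: "split on delimiter d and take the k-th piece"
    for delim in DELIMITERS:
        for k in range(4):
            yield (delim, k)

def run(program, text):
    delim, k = program
    parts = text.split(delim)
    return parts[k] if k < len(parts) else None

def learn(examples):
    """Return all DSL programs consistent with every example
    (a crude stand-in for a version space)."""
    return [p for p in candidate_programs()
            if all(run(p, inp) == out for inp, out in examples)]

examples = [("Gulwani, Sumit", "Gulwani"),
            ("Singh, Rishabh", "Singh")]
consistent = learn(examples)
print(consistent)                      # [(',', 0)]
print(run(consistent[0], "Le, Vu"))    # applies to new data: "Le"
```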

Computer-aided education is another domain where synthesis techniques are finding an interesting application. There has been a lot of interest recently in making quality education accessible to students worldwide through education initiatives such as edX, Coursera, and Udacity. The massive open online courses (MOOCs) on these platforms are typically taken by hundreds of thousands of students, which makes it a massive challenge to provide quality feedback to these students in the same way teachers do in traditional classrooms. Since there are several different ways to solve a given problem, especially in programming courses, we cannot simply use a syntactic approach to provide feedback. The AutoProf system [6] uses synthesis techniques to automatically generate feedback on introductory programming assignments. Given an incorrect student submission, a reference solution, and an error model capturing the set of common mistakes students typically make for a given problem, AutoProf uses the Sketch synthesis system to compute minimal changes to the student program such that it becomes functionally equivalent to the reference solution. It has been used by TAs to grade homework assignments, and is also being integrated into the MITx platform. A similar auto-grading system, CPSGrader [7], was developed for grading laboratory problems in the area of cyber-physical systems using parameter synthesis techniques, and was successfully used for grading thousands of submissions for the online edX class. AutomataTutor [8] is another such system that uses synthesis techniques for automated grading and feedback generation for finite automata constructions, and is currently being used by thousands of students worldwide.

Isn’t Program Synthesis just Machine Learning?
There has been a long, ongoing debate on how machine learning is similar to or different from program synthesis, since program synthesis techniques (machines) are essentially learning a program from some specification (training data). However, there are some key differences. First, machine learning techniques typically learn a complex higher-order function over a set of finite feature values, whereas program synthesis techniques learn complex, structured programs in a domain-specific language (sometimes recursive and even Turing-complete). This also makes the learnt programs human-understandable: they can be manually inspected, in contrast to the complex, high-dimensional functions learnt by machine learning techniques. Second, synthesis techniques learn from very few examples (often one), whereas machine learning techniques require large amounts of data. Finally, synthesis techniques aren't very robust to noise in the datasets, whereas machine learning techniques are able to handle noise quite effectively. A more comprehensive comparison can be found in this nice survey article [9].

What are the current challenges and trends in Program Synthesis research?
The scalability of synthesis algorithms to larger and larger pieces of code has been a perpetual challenge. A recent trend has been to devise new ways of combining machine learning techniques with program synthesis techniques, both to scale the inference process and to make it more robust to noise in the specification. Another important challenge has been developing the right user interaction model, where users can provide the specification iteratively, in an interactive fashion, instead of providing it in a one-shot, black-box manner.

So, are our programming jobs going to be taken up by robots?
Well, not really, at least not in the foreseeable future. With the current synthesis algorithms and computational resources, I believe that soon enough we will be able to leverage synthesis technology to automatically write small, complex functions from high-level specifications in our day-to-day programming tasks, but the creative burden of coming up with the right design and algorithms for composing these functions to build larger software artifacts will still lie with the programmer. One day, though, we might reach the stage where even the complex algorithms and designs can be generated automatically by synthesis algorithms. From one of the famous quotes of President John F. Kennedy (slightly rephrased): "If we have the talent to invent new machines that put people out of work, we have the talent to put them back to work." Then, we will reinvent programming.


References
  1. Zohar Manna, Richard J. Waldinger: Synthesis: Dreams - Programs. IEEE Trans. Software Eng. 5(4): 294-328 (1979)
  2. Rastislav Bodík, Barbara Jobstmann. Algorithmic program synthesis: introduction. STTT 15(5-6): 397-411 (2013)
  3. Sumit Gulwani. Automating string processing in spreadsheets using input-output examples. POPL 2011: 317-330
  4. Sumit Gulwani, William R. Harris, Rishabh Singh: Spreadsheet data manipulation using examples. Communications of the ACM 55(8): 97-105 (2012)
  5. Vu Le, Sumit Gulwani: FlashExtract: a framework for data extraction by examples. PLDI 2014: 542-553
  6. Rishabh Singh, Sumit Gulwani, Armando Solar-Lezama: Automated feedback generation for introductory programming assignments. PLDI 2013: 15-26
  7. Garvit Juniwal, Alexandre Donzé, Jeff C. Jensen, Sanjit A. Seshia: CPSGrader: Synthesizing temporal logic testers for auto-grading an embedded systems laboratory. EMSOFT 2014: 24:1-24:10
  8. Rajeev Alur, Loris D'Antoni, Sumit Gulwani, Dileep Kini, Mahesh Viswanathan: Automated Grading of DFA Constructions. IJCAI 2013
  9. Sumit Gulwani, José Hernández-Orallo, Emanuel Kitzelmann, Stephen H. Muggleton, Ute Schmid, Benjamin G. Zorn: Inductive programming meets the real world. Communications of the ACM 58(11): 90-99 (2015)

If you liked this, you may also like:

Julig, R.K., "Applying formal software synthesis", in Software, IEEE , vol.10, no.3, pp.11-22, 1993. 
Kant, E., "Synthesis of mathematical-modeling software", in Software, IEEE , vol.10, no.3, pp.30-41, 1993. 
Mueller, R.A.; Duda, M.R., "Formal Methods of Microcode Verification and Synthesis", in Software, IEEE , vol.3, no.4, pp.38-48, 1986. 
Abbott, B.; Bapty, T.; Biegl, C.; Karsai, G.; Sztipanovits, J., "Model-based software synthesis", in Software, IEEE , vol.10, no.3, pp.42-52, 1993. 

Sunday, February 14, 2016

Why Should Software Architects Write Code?


By Mehdi Mirakhorli (@MehdiMirakhorli), Associate Editor.

In the software engineering community there is a divide on who needs architects, what responsibilities architects should have, and whether architects should code. We have all heard these questions in one form or another, motivated by a wide range of pragmatic opinions. Some practitioners argue that the architect's responsibility is the integrity of the entire system and satisfying business goals while mitigating risks; therefore, architects can leave other, lower-level decisions, including coding decisions, to developers, who typically have more limited responsibilities. On the other side of this spectrum, a group of practitioners argue that architects should code. Some critics take the criticism of architects not practicing coding further and argue that "PowerPoint architects" are ineffective yet expensive. These architects join the project kickoff meetings, draw all sorts of diagrams, and leave before implementation starts. They are ineffective because of their absence and the lack of ongoing feedback during the development cycle. We should also keep in mind that such a disconnect creates more problems, considering that the requirements the architecture addressed are likely to change later on.

Other practitioners choose a moderate perspective. In a recent keynote talk, Martin Fowler explores different ways of stimulating collaboration and communication between programmers and architects, and he advocates the idea of architects pair-programming with developers. Ward Cunningham, in an anti-pattern called "Architects Don't Code", addresses the same issue: "The Architect responsible for designing your system hasn't written a line of code in two years". Ward recommends getting architects involved at the implementation level.
In his new book, "Software Architecture for Developers", Simon Brown advocates a transition of the architect's role from "ivory tower" to one that is about coding, coaching, and collaboration. All of these practitioners agree on the necessity of having some form of architecture design. Simon Brown argues that most software developers are not architects or do not have extensive design skills; therefore, he advocates increasing the design knowledge of developers as a way to bridge the gap.

These discussions are just a few samples of pragmatic perspectives. Following any of these opinions largely depends on having the necessary trust in the person advocating it.

While there have been numerous studies about the impact of architecture on a software system, there has not been an empirical study examining the influence that software architects can have during the coding activities of a software system. In our research we used mining software repositories techniques to answer the question: why should architects write code?

We studied the architecture and implementation of six software systems. In these case studies we looked at the involvement of developers in implementing architectural patterns/tactics (e.g. authentication, audit trail, thread pooling, scheduling).


Observation 1: Developers have more difficulty implementing the architectural choices (e.g. patterns/tactics) than functional features.  

The developers contributing to these projects were observed to have difficulties implementing architectural tactics. The tactical files in all of these projects underwent more refactoring effort than the non-tactical files and contained more defects. In fact, from November 2008 through November 2011, 2.8 times as many defects were found in tactical files as in non-tactical files for the Hadoop project, while from January 2009 through November 2011, 2.0 times as many defects were found in these files for the OFBiz project.

Following this observation, we formulated the following hypothesis: When software architects write code, the number of defects in the tactical fragments of the systems will be reduced.

To examine this hypothesis, we conducted a second study on the impact that architecture-savvy and non-architecture-savvy developers have on defects in the design fragments of a system.
We divided the source code commits into architecturally significant commits and functional commits. Code changes impacting the architectural patterns/tactics were classified as architecturally significant, while commits only impacting the functional features of the system were labeled as functional. Then we created a profile for each developer involved in the implementation of tactics, using a persona-based human modeling approach to extract the architecture and design experience of the developers in the six software projects and to document each developer's design expertise. Next, these profiles were categorized into architecture-savvy and non-architecture-savvy personas. Using these profiles, we created a design contribution matrix, examining the relationship between developers' design backgrounds and defects in design fragments. In this post, I report some of the results of our study and what we observed across these six projects.
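As a rough illustration of the commit-classification step, here is a Python sketch that labels a commit as architecturally significant if it touches any file on a list of tactical files. The file names and repository are hypothetical, and the actual study's identification of tactical files and developer profiles was considerably more involved; this is only a sketch of the idea.

```python
# Sketch: label commits as "architecturally significant" vs. "functional"
# based on whether they touch (hypothetical) tactical files.
import subprocess

TACTICAL_FILES = {                            # hypothetical tactic implementations
    "security/Authenticator.java",            # authentication
    "audit/AuditTrail.java",                  # audit trail
    "concurrent/ThreadPoolManager.java",      # thread pooling
}

def files_changed(commit, repo="."):
    out = subprocess.run(
        ["git", "-C", repo, "show", "--name-only", "--pretty=format:", commit],
        capture_output=True, text=True, check=True).stdout
    return {line for line in out.splitlines() if line}

def classify(commit, repo="."):
    touched = files_changed(commit, repo)
    return "architecturally significant" if touched & TACTICAL_FILES else "functional"

# Label every commit reachable from HEAD.
shas = subprocess.run(["git", "rev-list", "HEAD"],
                      capture_output=True, text=True, check=True).stdout.split()
labels = {sha: classify(sha) for sha in shas}
```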


Observation 2: Non-architecture savvy developers introduce more defects into architecturally-significant code snippets than architecture-savvy developers.  

Overall, in four out of six projects we have strong statistical results validating our research hypothesis. This provides empirical evidence on why software architects should write code or, at least, be engaged in the development of the tactical fragments of a system. Developers with little or no background in architecture and design struggled to implement architectural patterns/tactics. The majority of the defects in the tactical files were introduced because developers misinterpreted the design concept, were unaware of the rationale for the tactic, or simply implemented it incorrectly.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I am looking forward to reading your comments and feedback on this post. I would also like to encourage researchers and practitioners to conduct similar empirical studies to examine the controversial pragmatic opinions related to architecture and coding.

The next post will summarize a few practices to bridge the gap between design and implementation. Stay tuned!
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You may also like:

Buschmann, F.; Bartholdt, J., "Code Matters!," in Software, IEEE , vol.29, no.2, pp.81-83, March-April 2012.

Frank Buschmann, "Introducing the Pragmatic Architect," IEEE Software, vol. 26, no. 5, pp. 10-11, September/October, 2009.

Fowler, M., "Design - Who needs an architect?," in Software, IEEE , vol.20, no.5, pp.11-13, Sept.-Oct. 2003.

Paul Clements, Mary Shaw, "'The Golden Age of Software Architecture' Revisited," IEEE Software, vol. 26, no. 4, pp. 70-72, July/August, 2009.

Acknowledgement:
This post includes joint work with Inayat Rehman, Matthew Thornton and Azat Aralbay Uulu, graduate students at RIT. I would like to thank Mei Nagappan for crunching the numbers for the experiment reported in this blog.


Sunday, February 7, 2016

License compliance for off-the-shelf free/open source software components

Associate Editor: Stefano Zacchiroli

One of the major accomplishments of Free and Open Source Software (FOSS) has been the materialization of the ideas behind components off-the-shelf (COTS) software engineering. COTS encouraged reuse: any time developers needed certain functionality, they would integrate a ready-to-use component that implemented it. There exist innumerable FOSS components, and today a necessary skill for any software developer is the ability to find, select, and integrate FOSS components for reuse.

FOSS components are usually available for download at zero cost. Many of them are highly reliable and widely used in industry. One of the best examples is OpenSSL. As reported by The Guardian: "There is a very high chance that at least one service that you use [uses OpenSSL]" [1].

While FOSS components are usually free (as in gratis) to download, they do have a cost when the final product in which they are integrated is distributed ("shipped" to the customer, so to speak). This cost is compliance with the license under which the component is made available. There exist many different FOSS licenses, but these licenses have several important characteristics in common: the source code must be made available, the code should be allowed to be modified, and, perhaps most important of all, the source code (including modifications) should be redistributable in source and/or binary form. Each FOSS license can also set a series of conditions that must be satisfied before the right to such redistribution is granted.

Therefore, when a developer is planning to reuse a given FOSS component, the second and third questions should be "what is its license?" and "is this license compatible with the license of the product where it is being integrated?" (the first, of course, is "does it do what is needed?"). For example, the license of OpenSSL has only one major condition: it requires the acknowledgment "this product includes software developed by the OpenSSL Project for use in the OpenSSL Toolkit" to appear in advertising materials and redistributions. This condition may be impossible for some to satisfy; for example, software licensed under the GPL-2.0 or GPL-3.0 cannot satisfy it and, hence, cannot use the library.

Identifying the license of a component is not always easy, primarily because there is no agreed format to document it. SPDX (Software Package Data Exchange), a consortium of for-profit and non-profit organizations, has been working towards a standardized format for documenting licensing information, including standardized license names, and tools to maintain this information.

The question "is this license compatible with the license of the product where it is being integrated?" is not always easy to answer. Whether a system can reuse software under a given license is affected by the architecture of the system, the way the component is connected to the rest of the system, and how the component is redistributed.
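As a toy illustration (and emphatically not legal advice), a first-pass check might simply look up component licenses, by SPDX identifier, in a table of what a product under a given license is generally considered able to incorporate; anything not in the table gets flagged for the compliance team. The table below is a deliberately small, illustrative subset: as noted above, real compliance decisions also depend on linking, architecture, and how the product is distributed.

```python
# Toy first-pass compatibility check using SPDX license identifiers.
# Illustrative subset only; NOT legal advice.

# Component licenses generally considered safe to incorporate into a
# product distributed under the given license.
CAN_REUSE = {
    "GPL-3.0-only": {"MIT", "BSD-3-Clause", "Apache-2.0", "LGPL-2.1-only", "GPL-3.0-only"},
    "GPL-2.0-only": {"MIT", "BSD-3-Clause", "LGPL-2.1-only", "GPL-2.0-only"},
    "Proprietary":  {"MIT", "BSD-3-Clause", "Apache-2.0"},
}

def flag_component(product_license, component_license):
    ok = component_license in CAN_REUSE.get(product_license, set())
    return "ok" if ok else "needs review by the compliance team"

# The OpenSSL advertising clause is the kind of condition that trips up a
# GPL-2.0 product, as discussed above.
print(flag_component("GPL-2.0-only", "OpenSSL"))   # needs review by the compliance team
print(flag_component("Proprietary", "MIT"))        # ok
```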

In a survey, Sojer and Henkel found that developers do not usually have the training to understand software licenses or their impact [2]. I am not advocating that they should be legal experts, but rather that they should be able to identify potential issues and, when necessary, ask for help. Ideally, any organization that creates and distributes software should have a license compliance team that is responsible for approving the reuse of external components (including FOSS) and for assessing that, when a product is released, all the necessary requirements of the licenses of the reused components are satisfied.

FOSS is an enormous resource, and the Internet makes it easy to search and find suitable FOSS components. The real challenge is to properly reuse FOSS by satisfying the requirements of its licenses.

Further reading

  • Bain [3] describes, from a legal point of view, the difficulties and challenges of integrating components licensed under the GPL with other licenses.
  • For more information about SPDX, please visit https://spdx.org/
  • In [4] we describe the challenges of license identification in FOSS.
  • In [5] we describe the problem of license mismatch when reusing FOSS components and different methods to address it.

References

  • [1] Samuel Gibbs, "Heartbleed bug: what do you actually need to do to stay secure?", The Guardian, April 10, 2014.
  • [2] Manuel Sojer and Joachim Henkel. "License risks from ad hoc reuse of code from the internet." Communications of the ACM, 54(12):74–81, December 2011.
  • [3] Malcolm Bain. "Software interactions and the GPL", International Free and Open Source Software Law Review, 2(2), 2011.
  • [4] Daniel M. German, Yuki Manabe, Katsuro Inoue. "A sentence-matching method for automatic license identification of source code files". 25th IEEE/ACM International Conference on Automated Software Engineering (ASE 2010), Antwerp, Belgium, September 20-24, 2010.
  • [5] Daniel M. German, Ahmed E. Hassan. "License integration patterns: Addressing license mismatches in component-based development". Proceedings of the 31st International Conference on Software Engineering (ICSE 2009), Vancouver, Canada, 2009, pp. 188-198.

Wednesday, February 3, 2016

Genetic Improvement

Associate Editor: Federica Sarro (@f_sarro)

Genetic improvement (GI) [1] has been demonstrated to automatically repair real bugs in real programs [2] and to give considerable speed-ups to real programs [3], e.g. by specialising them [4] or adapting them to new hardware. There is now interest in using GI to improve non-functional properties, such as extending battery life or reducing memory consumption, and indeed in providing software designers with a range of trade-offs between different objectives, such as speed vs. accuracy or speed vs. memory.
Typically, in the case of bug fixing, a test case is needed which demonstrates the bug, plus a handful of tests which ensure the putative fix has not repaired the bug at the expense of destroying other parts of the program. Usually these tests come from regression test suites developed as the program was written, but they could also be automatically generated. In GI it is common to run both the original code and the mutated code on the tests and compare their answers and performance. Effectively, the old code becomes its own specification, and even for new tests it can be used as the test oracle.



The figure shows the basics of genetic improvement. On the left-hand side is the system to be improved and its test suite. On the right is the generational cycle of artificial evolution which optimises patches. Typically a patch deletes, copies or inserts an existing line of human-written code; GI does not need to invent entirely new code. In each generation, mutation and crossover create new patches. The patches are applied via the grammar to create patched versions of the software. These variants are tested on a small, randomly selected part of the test suite, and their answers and resource consumption are compared to those of the original system. Only patches responsible for better programs get children in the next generation. After a small number of generations, the best patch in the last generation is cleaned up by removing unneeded changes and then validated.
The grammar describes the original code and legal variations from it. In the case of automated bug repair, its role is typically replaced by abstract syntax trees (ASTs). With ASTs, perhaps about half the mutants compile; similarly, the grammar ensures that between 40% and 100% of patches compile. Typically, most patches that compile also run and produce answers. CPU limits, timeouts or loop iteration limits are imposed to ensure termination, and typically some form of sandboxing is used to ensure badly behaved mutants cannot damage the GI system itself.
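A heavily simplified Python sketch of this loop is shown below, assuming a toy target function and only line-level delete/copy patches. Real GI systems work on much larger programs via a grammar or ASTs, sandbox the execution, impose resource limits, and also use crossover; none of that is modelled here.

```python
# Toy GI loop: evolve line-level patches against a small target function,
# using the original program as the test oracle.
import random

ORIGINAL = [
    "def target(xs):",
    "    total = 0",
    "    for x in xs:",
    "        total = total + x",
    "        total = total + 0",   # useless work a good patch can delete
    "    return total",
]
TESTS = [[random.randrange(100) for _ in range(20)] for _ in range(50)]

def compile_variant(lines):
    env = {}
    try:
        exec("\n".join(lines), env)      # many variants will fail to compile or run
        return env.get("target")
    except Exception:
        return None

ORACLE = compile_variant(ORIGINAL)       # the old code acts as its own oracle

def apply_patch(patch):
    lines = list(ORIGINAL)
    for op, i, j in patch:               # a patch is a list of simple line edits
        if op == "delete" and 0 < i < len(lines):
            del lines[i]
        elif op == "copy" and 0 < i < len(lines) and 0 < j <= len(lines):
            lines.insert(j, lines[i])
    return lines

def fitness(patch):
    lines = apply_patch(patch)
    fn = compile_variant(lines)
    if fn is None:
        return -1.0                      # variant did not compile or define target
    sample = random.sample(TESTS, 5)     # a small, random part of the test suite
    try:
        passed = sum(fn(t) == ORACLE(t) for t in sample)
    except Exception:
        return -1.0
    return passed - 0.01 * len(lines)    # prefer variants that pass and are shorter

def mutate(patch):
    op = random.choice(["delete", "copy"])
    i = random.randrange(1, len(ORIGINAL))
    j = random.randrange(1, len(ORIGINAL) + 1)
    return patch + [(op, i, j)]

population = [[] for _ in range(20)]     # generation 0: empty patches
for generation in range(10):
    parents = sorted(population, key=fitness, reverse=True)[:10]
    population = parents + [mutate(random.choice(parents)) for _ in range(10)]

best = max(population, key=fitness)      # fitter patches got children; print the winner
print("\n".join(apply_patch(best)))
```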
Whilst much GI work has been on source code, Darwinian evolution has been shown to be effective on Java bytecode and even machine code. GI has been shown to offer Pareto trade-offs in graphics applications and in improving bioinformatics and other applications which use graphics cards as parallel computing accelerators (known as GPGPU).



Presenters at the First Genetic Improvement workshop, Madrid 2015 
Interest in genetic improvement is increasing, along with the number of papers and successful applications. Last year saw the first international workshop and a GI special issue of the Genetic Programming and Evolvable Machines journal (publication due later this year). This year the workshop will be repeated in Denver, and there will be a special session on GI at the IEEE World Congress on Computational Intelligence in Vancouver.


References
  1. William B. Langdon. Genetically improved software. In Amir H. Gandomi et al., editors, Handbook of Genetic Programming Applications, chapter 8, pages 181–220. Springer, 2015.
  2. Westley Weimer, Stephanie Forrest, Claire Le Goues, and ThanhVu Nguyen. Automatic program repair with evolutionary computation. Communications of the ACM, 53(5):109–116, June 2010.
  3. William B. Langdon and Mark Harman. Optimising existing software with genetic programming. IEEE Transactions on Evolutionary Computation, 19(1):118–135, February 2015.
  4. Justyna Petke, Mark Harman, William B. Langdon, and Westley Weimer. Using genetic improvement and code transplants to specialise a C++ program to a problem class. In Miguel Nicolau et al., editors, 17th European Conference on Genetic Programming, LNCS vol. 8599. Springer.