Tuesday, April 16, 2019

Microservice API Patterns - How to Structure Data Transfer Representations and Endpoints

Authors: Olaf Zimmermann, Uwe Zdun (@uwe_zdun), Mirko Stocker (@m_st), Cesare Pautasso (@pautasso), Daniel Lübke (@dluebke)

Associate Editor: Niko Mäkitalo (@nikkis)


The Microservice API Patterns at www.microservice-api-patterns.org distill proven solutions to recurring service interface design and specification problems such as finding well-fitting service granularities, promoting independence among services, or managing the evolution of a microservice API.


Motivation
It is hard to escape the term microservices these days. Much has been said about this rather advanced approach to system decomposition since James Lewis’ and Martin Fowler’s Microservices Blog Post from April 2014. For instance, IEEE Software devoted a magazine article, a two-part Insights interview (part 1, part 2) and even an entire special theme issue to the topic.

Early adopters’ experiences suggest that service design requires particular attention if microservices are supposed to deliver on their promises:
  • How many service interfaces should be exposed?
  • Which service cuts let services and their clients deliver user value jointly, but couple them loosely?
  • How often do services and their clients interact to exchange data? How much and which data should be exchanged?
  • What are suitable message representation structures, and how do they change throughout service lifecycles?
  • How to agree on the meaning of message representations – and stick to these contracts in the long run?

The Microservice API Patterns (MAP) at www.microservice-api-patterns.org cover and organize this design space providing valuable guidance distilled from the experience of API design experts.

What makes service design hard (and interesting)?
An initial microservice API design and implementation for systems with a few API clients often seem easy at first glance. But a lot of interesting problems surface as systems grow larger, evolve, and get new or more clients:
  • Requirements diversity: The wants and needs of API clients differ from one another, and keep on changing. Providers have to decide whether they offer good-enough compromises or try to satisfy all clients’ requirements individually.
  • Design mismatches: What backend systems can do and how they are structured, might be different from what clients expect. These differences have to be dealt with during the API design.
  • Freedom to innovate: The desire to innovate and market dynamics such as competing API providers trying to catch up on each other lead to the need to change and evolve the API. However, publishing an API means giving up some control and thus limiting the freedom to change it.
  • Risk of change: Introducing changes may result in possibly incompatible evolution strategies going beyond what clients expect and are willing to accept.
  • Information hiding: Any data exposed in an API can be used by the clients, sometimes in unexpected ways. Poorly designed APIs leak service implementation secrets and let the provider lose its information advantage.

Such conflicting requirements and stakeholder concerns must be balanced at the API design level; here, many design trade-offs can be observed. For instance, data can be transferred in a few calls that carry lots of data back and forth, or alternatively, many chatty, fine-grained interactions can be used. Which choice is better in terms of performance, scalability, bandwidth consumption and evolvability? Should the API design focus on stable and standardized interfaces or rather focus on fast-changing and more specialized interfaces? Should state changes be reported via API calls or event streaming? Should commands and queries be separated?

All of these – and many related – design issues are hard to get right. It is also hard to oversee all relevant consequences of a design decision, for instance regarding trade-offs and interdependencies of different decisions.


Enter Microservice API Patterns (MAP)
Our Microservice API Patterns (MAP) focus – in contrast to existing design heuristics and patterns related to microservices – solely on microservice API design and evolution. The patterns have been mined from numerous public Web APIs as well as many application development and software integration projects the authors and their industry partners have been involved in.

MAP addresses the following questions, which also define several pattern categories:
  • The structure of messages and the message elements that play critical roles in the design of APIs. What is an adequate number of representation elements for request and response messages? How are these elements structured? How can they be grouped and annotated with supplemental usage information (metadata)?
  • The impact of message content on the quality of the API. How can an API provider achieve a certain level of quality of the offered API, while at the same time using its available resources in a cost-effective way? How can the quality tradeoffs be communicated and accounted for?
  • The responsibilities of API operations. What is the architectural role played by each API endpoint and its operations? How do these roles and the resulting responsibilities impact microservice size and granularity?
  • API descriptions as a means for API governance and evolution over time. How to deal with lifecycle management concerns such as support periods and versioning? How to promote backward compatibility and communicate breaking changes? 

So far, we have presented ten patterns at EuroPLoP 2017 and EuroPLoP 2018; about 35 more candidate patterns are currently being worked on. The published patterns and supporting material are available on the MAP website that went live recently. The papers are available via this page.

Sample Patterns for Communicating and Improving Interface Quality
To illustrate MAP a bit further, we summarize five patterns on communicating and improving API qualities below. We also outline their main relationships.

Figure: Relationships between Selected Patterns for Communicating and Improving Interface Quality.

  • API Key: An API provider needs to identify the communication participant it receives a message from to decide whether that message actually originates from a registered, valid customer or from some unknown client. A unique, provider-allocated API Key per client, to be included in each request, allows the provider to identify and authenticate its clients (the request sketch after this list shows one common encoding). This pattern is mainly concerned with the quality attribute security.
  • Wish List: Performance requirements and bandwidth limitations might dictate a parsimonious conversation between the provider and the client. Providers may offer rather rich data sets in their response messages, but not all clients might need all of this information all the time. A Wish List allows the client to request only the attributes in a response data set that it is interested in. This pattern addresses qualities such as accuracy of the information needed by the consumer, response time, and performance, i.e., the processing power required to answer a request.
  • Rate Limit: Even once clients are identified, an authenticated client could use excessively many resources, thus negatively impacting the service for other clients. To limit such abuse, a Rate Limit can be employed to restrain certain clients. A client can stick to its Rate Limit by avoiding unnecessary calls to the API. This pattern is concerned with the quality attributes of reliability, performance, and economic viability.
  • Rate Plan: If the service is paid for or follows a freemium model, the provider needs to come up with one or more pricing schemes. The most common variations are a simple flat-rate subscription or a more elaborate consumption-based pricing scheme, explored in the Rate Plan pattern. This pattern mainly addresses the commercialization aspect of an API.
  • Service Level Agreement: API providers want to deliver high-quality services while at the same time using their available resources economically. The resulting compromise is expressed in a provider’s Service Level Agreement (SLA) by the targeted service level objectives and associated penalties (including reporting procedures). This pattern is concerned with the communication of any quality attribute between API providers and clients. Availability is an example of a quality that is often expressed in such an SLA.
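To make these patterns more tangible, here is a small Python sketch of a client request that combines three of them. The endpoint, header names, and parameter names are illustrative assumptions only, not part of the MAP patterns themselves; real providers differ (some pass the key as a query parameter, and rate-limit header names vary):

    import requests  # widely used third-party HTTP client

    API_KEY = "provider-issued-key"  # API Key: identifies and authenticates this client

    response = requests.get(
        "https://api.example.com/customers/42",   # hypothetical endpoint
        headers={"X-API-Key": API_KEY},           # API Key pattern (header name varies)
        params={"fields": "name,email"},          # Wish List: only the attributes we need
        timeout=5,
    )

    if response.status_code == 429:               # 429 Too Many Requests: Rate Limit hit
        print("Rate limited; retry after",
              response.headers.get("Retry-After"), "seconds")
    else:
        print(response.json(),
              "| calls left:", response.headers.get("X-RateLimit-Remaining"))

A client that avoids unnecessary calls and honors such headers stays within its Rate Limit and helps the provider meet the service level objectives stated in its SLA.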

More patterns and pattern relationships can be explored at www.microservice-api-patterns.org. In addition to the patterns, you will find there further entry points such as a cheat sheet, as well as various pattern filters such as patterns by force and patterns by scope (phase/role).


Wrapping Up

Microservice API Patterns (MAP) is a volunteer project focused on the design and evolution of Microservice APIs. We hope you find the intermediate results of our ongoing efforts useful. They are available at www.microservice-api-patterns.org – we will be glad to hear about your feedback and constructive criticism. We also welcome contributions such as pointers to known uses or war stories in which you have seen some of the patterns in action.

The patterns in MAP aim at sharing timeless knowledge on distributed system APIs. While trends like microservices come and go, the fundamental design problems of exposing remote APIs will not go out of fashion any time soon!




Tuesday, April 9, 2019

A Survey of Sustainability in the Workplace of ICT Professionals using the Transtheoretical Model of Behavior Change

By: Juan M. Carrillo de Gea, José Alberto Garcia-Berna, José L. Fernández-Alemán, Joaquín Nicolás, Begoña Moros, Ambrosio Toval, Ali Idri

Associate Editor: Sofia Ouhbi

We live in a finite world, with limited resources. The idea of sustainable development arises to counteract the overexploitation of natural and environmental resources. The World Commission on Environment and Development (a.k.a. the Brundtland Commission) defined sustainable development as development that meets the needs of the present without compromising the ability of future generations to meet their own needs (World Commission on Environment and Development, 1987). At the same time, Information and Communication Technologies (ICT) represent an important driver of innovation, competitiveness and sustained long-term growth for modern knowledge-based societies (Cardona, Kretschmer, & Strobel, 2013). ICT are also recognized catalysts of sustainable development and they can boost the impact of sustainable development efforts (Zelenika & Pearce, 2013). However, positive and negative impacts of ICT on sustainability tend to cancel each other out, and it is crucial to actively design policies that encourage ICT applications that result in a positive outcome for the environment (Hilty et al., 2006).

In an organizational context, behavior change is an important tool to improve compliance with business processes and policies (Gelles, 2016). There are many theories of behavior change: diffusion of innovations, hierarchy of effects, steps to behavior change, stages of change or transtheoretical model (TTM), social learning theory and social cognitive theory, theory of reasoned action and theory of planned behavior, health belief model, operant conditioning, value-belief-norm theory, Fogg behavior model, and DO IT process, just to name a few. In particular, the TTM of behavior change includes five stages, ranging from having no intention to change to maintaining the new behavior: precontemplation, contemplation, preparation, action, and maintenance. Behavior change takes place when people progress—or move back—towards a desired behavior. While the TTM of behavior change has been mostly applied in health research, there are similarities between health behavior and environmental behavior (Nisbet & Gick, 2008).

In this work, we specifically address the stage construct of the TTM with the purpose of characterizing behavior change among ICT professionals in four key sustainability areas: (1) electric consumption, (2) waste treatment, (3) water consumption, and (4) transport and mobility. A total of 141 participants from ICT companies in the Region of Murcia (Spain) took part in an industry survey through an on-line questionnaire. We gathered information from all the respondents about their individual behavior at the workplace regarding the four sustainability areas. There were 26 individual behavior questions designed to be rated on a 3-point unipolar Likert-type scale (i.e., No, In some cases, Yes), and we also included a Not applicable (N/A) response option. An additional question, whose wording depends on the previous answers (i.e., a filter or contingency question), was included at the end of each block to assign the respondents to a specific stage of change in each sustainability dimension under study.

Figures 1-4 show the results for each sustainability dimension under study. Our findings suggest that the ICT professionals are generally respectful of the environment, especially in relation to electric consumption, waste treatment, and water consumption. With regard to transport and mobility, the situation is not so good. All this is more evident in Figure 5, where we show the percentage of respondents in each stage of the TTM for each sustainability dimension.

Figure 1. What sustainable habits do you have at work in relation to electric consumption?

Figure 2. What sustainable habits do you have at work regarding waste treatment?

Figure 3. What sustainable habits do you have at work with respect to water consumption?

Figure 4. What sustainable habits do you have at work in terms of transport and mobility?
Figure 5. Percentage of respondents in each stage of the TTM

Our study is still under development and we are currently working on expanding the results posted here. All in all, we hope that the information obtained will help ICT professionals to become aware that it is also possible to contribute to sustainable development through our behaviors in the workplace.

References
Cardona, M., Kretschmer, T., & Strobel, T. (2013). ICT and productivity: conclusions from the empirical literature. Information Economics and Policy, 25(3), 109-125.
Gelles, M. G. (2016). Chapter 2 - common challenges to maturing an insider threat program. In M. G. Gelles (Ed.), Insider threat (pp. 19-37). Boston: Butterworth-Heinemann.
Hilty, L. M., Arnfalk, P., Erdmann, L., Goodman, J., Lehmann, M., & Wäger, P. A. (2006). The relevance of information and communication technologies for environmental sustainability - a prospective simulation study. Environmental Modelling and Software, 21(11), 1618-1629.
Nisbet, E. K. L., & Gick, M. L. (2008). Can health psychology help the planet? applying theory and models of health behaviour to environmental actions. Canadian Psychology/Psychologie canadienne, 49(4), 296-303.
World Commission on Environment and Development. (1987). Our common future. Oxford, UK: Oxford University Press.
Zelenika, I., & Pearce, J. M. (2013). The internet and other ICTs as tools and catalysts for sustainable development: innovation for 21st century. Information Development, 29(3), 217-232.

Monday, April 1, 2019

Designing for E-commerce User-Experience in Complex Scenarios

By: Catherine Hills (@daughterofbev)
Associate Editor: Muneera Bano (@DrMuneeraBano)

Designing ‘good’ e-commerce and m-commerce usability and user-experience is more important than ever. Issues such as trust, security and service quality, as well as website and mobile application quality, continue to be critical and have an impact on how consumers feel about purchasing in digital, non-physical settings.

E-commerce purchases and online shopping transactions are increasingly becoming the norm in consumer shopping behaviour. In mature economies like Australia and the US this might seem unsurprising; however, worldwide internet adoption of around 50% still means that, statistically, half the world’s population does not have access to these technologies. Worldwide e-commerce sales are expected to rise substantially year on year, with continued growth projected beyond 2021. And while mobile device usage for e-commerce purchasing is ever more popular, the desktop computer remains the most popular device type for e-commerce transactions and browsing.

Furthermore, with the early adoption of technologies such as Alexa and Google Home, voice commands, voice control and artificial intelligence now integrate with visual and non-visual interactions, and e-commerce ecosystems and services affect consumers in new ways. The ways consumers interact with these e-commerce ecosystems will continue to diversify, becoming more ubiquitous and posing persistent challenges for the designers of these user experiences. As the user-experience and engineering communities learn more about e-commerce ecosystems, and about how they co-exist with service delivery, fulfilment and logistics strategies that are themselves shaped by increasing technology adoption and device ubiquity, the terrain for both user-experience designers and consumers will become more complex.

In recent research, designers of e-commerce user experiences and consumers were asked the same questions about the same e-commerce experiences, and their perceptions were compared. The research demonstrated that the views of designer-consumers and consumer-consumers on what constitutes good or bad e-commerce user-experience design are in fact quite complementary when the user role, as the consumer, is considered as the primary focus.

This research also demonstrated that there is little dissimilarity between designers’ perceptions of ‘good’ e-commerce user-experiences and those of consumers conveniently sampled in a qualitative study. Most participants, regardless of their designer or non-designer background, agreed that factors such as error-free flows, little advertising, security, clear information design, and representative information about the products and services for purchase were highly important to their experience of a ‘good’ e-commerce website.

Key differences between the views of designer-consumers and consumer end-users (who are not involved in the design of e-commerce user-experiences) stemmed from the domain and technical knowledge of the individuals in each group. These differences manifested in the language designers used when describing their perceptions of good and bad e-commerce experiences. In addition, the domain-specific biases that designer-consumers expressed in their responses reflected their greater technical knowledge, indicating to the researchers that they held more cutting-edge and technology-specific expectations of functionality than the consumer-end-user participants.

In addition to these aforementioned considerations, the research also considered that as user-experience designers and researchers, the design of the user-experience must be accessible and inclusive.  As the globalisation of e-commerce continues, how might issues such as diversity, gender and culture be addressed in the way we design these shopping interfaces? Moreover, as designers and engineers of e-commerce user-experiences, how might we provide user-experiences that cater for users of diverse backgrounds, networked access as well as devices that are broadly reliant on the socio-economic circumstances of our users?

Given that website and service quality frameworks informed both the data collection and the analysis of the findings in this research, the researchers were curious how gender, culture and personality might relate to these quality dimensions. Although we had a limited sample size, with a small population of participants located in Melbourne, Australia, we were still able to collect data from a broad range of ages and ethnicities and from both male and female participants.

A limitation of the data collection was the binary sampling of participant genders; to be truly representative, a wider diversity of gender participation would be required. Additionally, in order to fully analyse the differences between countries, wider data collection is required. Responses relating to gender in this research were broadly stereotyped, occasionally amusing, and sometimes biased, but not significant enough to draw conclusions from and would require a further, focused phase of inquiry. Overall, this research indicated that e-commerce is still a widening terrain, and as users adapt further to new technologies, we must consider service and user-experience conditions with fresh eyes and attention to the basic needs of consumer end-users.



Tuesday, March 5, 2019

What should practitioners know about user involvement in software development?

By: Didar Zowghi, Muneera Bano (@DidarZowghi, @DrMuneeraBano)
Associate Editor: Muneera Bano (@DrMuneeraBano)


In the neo-humanist approach, software is designed to support and improve the working environment of its intended users. In practice, this means involving them in the development process and not treating them as mere consumers of the software.

But is it really important to involve users in software development? To what extent, and at what stages of software development, should they participate? How much power should the users be given in software development decision making?  These questions have been asked by researchers and practitioners for decades and together with many related subjects, they have been the topic of a great deal of research.  

Users’ involvement (UI) during software development has been claimed to be linked to users’ satisfaction with the resulting system, which in turn leads to the assertion of system success. However, much of the empirical evidence to date shows that this connection between UI and system success is not ubiquitous. Although much of the research in this area has revealed that involving users in software development contributes positively to system success, it has also been observed that UI is indeed a double-edged sword and can equally create problems rather than benefits.

In theory, ‘users’ involvement in software development and system success’ is a complex combination of four different concepts that have been studied and analyzed separately and in combination. Over five decades of this investigation, an enormous amount of debate and disagreement has been published in both the software engineering and the information systems research literature.

Like others, we have been curious about the intense interest in, and conflicting results on, the topic of users’ involvement in software development. In the last six years, we have conducted empirical longitudinal studies to explore: What has been reported in the research literature? What are the problems and challenges of UI? How does users’ satisfaction with their involvement evolve? And what useful theories could be developed about the link between UI and system success? Furthermore, we have been wondering what impact Agile software development has had on addressing the problems and challenges of UI, so we also explored the alignment of stakeholder expectations about UI in agile development. Finally, being convinced that UI is clearly a multi-faceted and communication-rich phenomenon, we conducted a case study of organizational power and politics in the context of UI in software development.

The overall findings from our longitudinal study have resulted in several key observations and the development of a theory.  First, we observed that system success is achievable even when there are problems and challenges in involving users. User satisfaction significantly contributes to the system success even when schedule and budget goals are not met. Second, there are additional factors that contribute to the evolution of user satisfaction throughout the project. Users’ satisfaction with their involvement and the resulting system are mutually constituted while the level of user satisfaction evolves throughout the stages of the development process. Third, organizational and project politics are significant factors often used to exert power and influence in decision-making processes. Communication channels are frequently exploited for political purposes. These contribute to the users’ dissatisfaction with their involvement thus impacting on the project outcome. Fourth, varying degrees of expectation misalignments exist between the development team and users in Agile software development projects.

Our findings enabled us to provide useful hints for practitioners. Understanding the nature of the problems related to UI helps the project managers to develop appropriate strategies for increasing the effectiveness of user involvement. These management strategies and appropriate level and extent of user representation are essential elements of maintaining an acceptable level of user satisfaction throughout the software development process. When there are multiple teams of stakeholders with different levels of power in decision-making, politics is inevitable and inescapable. Without careful attention, the political aspect of user involvement in software development can contribute to an unsuccessful project.


Tuesday, February 26, 2019

How Python Tutor Uses Debugger Hooks to Help Novices Learn Programming

By: Philip Guo (@pgbovine)
Associate Editor: Karim Ali (@karimhamdanali)


Since 2010 I've been working on Python Tutor, an educational tool that helps novices overcome a fundamental barrier to learning programming: understanding what happens as the computer runs each line of code. Python Tutor allows anyone to write code in their web browser, see it visualized step by step, and get live real-time help from volunteers. Despite its now-outdated name, this tool actually supports seven languages: Python, JavaScript, TypeScript, Ruby, Java, C, and C++. So far, over 3.5 million people in over 180 countries have used it to visualize over 50 million pieces of code. You can find research papers related to this project on my publications webpage. But in this blog post, I want to dive into some implementation details that I haven't gotten to highlight in my papers.


Let's start with a simple Python example (run it live here):


This code creates instances of three basic data structures: an ordered collection (called a list in Python), a key-value mapping (called a dict or dictionary in Python), and an unordered set. Note how elements within these data structures can point to other data structures; for instance, the second element of the top list (accessible via the global variable x) points to the bottom list. Using Python Tutor, novices can easily see pointer and aliasing relationships by following the arrows in these diagrams. Without this tool, they would need to print out serialized string values to the terminal, which obscures these critical details.
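A minimal stand-in for the example makes the aliasing relationship concrete (the exact listing is behind the “run it live” link above, so the names and values here are merely illustrative):

    bottom = [4, 5]                 # the "bottom" list
    x = [1, bottom, 3]              # second element of the top list points to the bottom list
    y = {'a': 10, 'b': bottom}      # a dict value aliasing the same list
    z = {7, 8, 9}                   # an unordered set

    bottom[0] = 99                  # the change is visible via both x[1] and y['b']
    print(x[1] is y['b'])           # True: two arrows to one object, not two copies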

How is Python Tutor implemented? By hooking into Python's built-in debugger protocol (bdb in its standard library). This tool runs the user's inputted code, single-steps through execution one line at a time, and traverses the object graph starting from globals and stack-local variables. It records a full trace of the stack and heap state at all execution steps and then sends the trace to the web frontend to render as interactive diagrams.
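The following is a minimal sketch of that idea (illustrative only, not Python Tutor's actual code): a bdb.Bdb subclass whose user_line hook snapshots state before every line of the traced program executes, with a step cap so runaway code fails transparently.

    import bdb

    class MiniTracer(bdb.Bdb):
        """Record (line number, globals snapshot) at every execution step."""
        MAX_STEPS = 1000  # cap the trace so long-running code fails transparently

        def __init__(self):
            super().__init__()
            self.steps = []

        def user_line(self, frame):
            # bdb calls this just before each line of the traced code runs.
            if frame.f_code.co_filename != '<string>':
                return                      # skip frames outside the user's code
            if len(self.steps) >= self.MAX_STEPS:
                self.set_quit()             # stop tracing; bdb unwinds cleanly
                return
            # The real tool traverses the full object graph; here we just
            # take a shallow, printable snapshot of the globals.
            snapshot = {k: repr(v) for k, v in frame.f_globals.items()
                        if not k.startswith('__')}
            self.steps.append((frame.f_lineno, snapshot))

    user_globals = {}
    tracer = MiniTracer()
    tracer.run("x = [1, 2, 3]\ny = {'a': x}\nx[0] = 99\n", user_globals)
    for lineno, snapshot in tracer.steps:
        print(lineno, snapshot)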

The main limitation of this "trace-everything" approach is scalability: it's clearly not suitable for code which runs for millions of steps or creates millions of objects. But code written by instructors and students in educational settings is usually small -- running for dozens of steps and creating around a dozen data structures -- so this simple approach works well in practice.


Now here's the same code example ported to JavaScript (run it live here):


This heap object diagram looks exactly the same as the Python one, albeit with different labels: in JavaScript, an ordered collection is called an array, and a key-value mapping is called an object (note that there's also a Map type). The JavaScript implementation works in the same way as the Python one: by hooking into the debugger protocol of the Node.js JavaScript runtime.

Here's what this example looks like in Ruby, once again implemented by hooking into the interpreter's built-in debugger protocol (run it live here):


These three identical-looking examples show how the diagrams generated by Python Tutor are designed to be fairly language-independent. Novice programmers need to learn about concepts such as stack frames, scope, control flow, primitive data types, collections, and pointers. To facilitate this learning, Python Tutor implements a graphical abstraction layer that takes the details of each language's low-level trace data and turns them into higher-level diagrams that capture the essence of the associated programming concepts. This abstraction makes it straightforward to expand the tool to work on additional languages as demand arises. It also makes it possible to scaffold learning of one language when someone already knows another one, such as teaching Python programmers how to quickly get up to speed on JavaScript.
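One execution step in such a language-independent trace might be shaped like this (a hypothetical encoding for illustration; the tool's actual schema may differ in names and detail):

    step = {
        "line": 2,                           # source line about to execute
        "event": "step_line",
        "stack": [                           # one entry per active frame
            {"func": "<module>", "locals": {}},
        ],
        "heap": {                            # object id -> language-neutral encoding
            1: ["LIST", 10, ["REF", 2]],     # a list whose 2nd element points to object 2
            2: ["LIST", 4, 5],
        },
        "globals": {"x": ["REF", 1]},        # variables hold primitives or REFs into the heap
    }

Because frames, heap objects, and REF pointers form the common vocabulary, a renderer that draws this structure works unchanged whether the trace came from Python, JavaScript, Ruby, or Java.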

This tool visualizes Java code in a similar way, but I'll skip that illustration to save space. Let's now turn to a more challenging pair of languages: C and C++. Unlike code in the above languages, C and C++ programs are not necessarily type- or memory-safe. This means that hooking into a debugger such as gdb isn't enough, since it's not clear which chunks of memory correspond to valid data. Here's a C example to show what I mean (run it live here):


This code allocates a 6-element integer array on the stack (accessible via localArray) and a 10-element integer array allocated on the heap via malloc (accessible via b). It then populates the elements of both arrays at indices 1, 3, and 5. The resulting visualization shows those initialized values and ? symbols next to the remaining uninitialized values. In addition, it knows that the heap array has exactly 10 elements and does not try to read unallocated elements beyond that bound, which risks crashing the program. Readers familiar with C and C++ will recognize that such memory allocation and initialization data is not available to debuggers such as gdb. Python Tutor hooks into Valgrind Memcheck to get this vital data. Without something like Memcheck, it would be impossible to build a safe and accurate visualizer for C and C++.

Finally, let's end with a C++ example (run it live here):


This visualization shows object-oriented programming concepts such as calling an instance method Date::set(), its this pointer referring to a Date object on the stack (accessible via stackDay), and another Date object on the heap (allocated with the new keyword and accessible via heapDay). Just like it does for C programs, Valgrind Memcheck ensures that Python Tutor reads only memory that has been both allocated (here recognizing that there is only one Date object on the heap) and initialized (so that it doesn't show stale junk values).


That was my quick tour of how Python Tutor works for a variety of languages that are frequently used to teach programming. The underlying principle that drives my implementation decisions is that authenticity is key: experts can work around bugs or other quirks in debugging tools, but novices will get confused and demoralized if a tool renders inaccurate diagrams. Thus, this tool needs to be able to take whatever code users put into it and do something reasonable, or at least fail in a transparent way (e.g., stopping after 1,000 execution steps and suggesting that the user shorten their code). I've tried out alternative language interpreters and experimental runtimes to get more detailed tracing (e.g., finer-grained stepping into subexpressions rather than line-level stepping), but time and time again I've gone back to built-in debugger hooks and widely-deployed tools such as Valgrind since they are far more robust than experimental alternatives. Try it out today at http://pythontutor.com/ and let me know what you think!

Tuesday, February 19, 2019

Software Improvement by Data Improvement

By: William B. Langdon, CREST Department of Computer Science, University College, London

Associate Editor: Federica Sarro, University College London (@f_sarro)


In previous blogs [1, 2] we summarised genetic improvement [3] and described the result of applying it to BarraCUDA [4]. Rather than giving a comprehensive update (see [5]), I describe a new twist: evolving software via the constants buried within it to give better results. The next section describes updating 50,000 free energy parameters used by dynamic programming to find the lowest energy state of RNA molecules, and hence predict their secondary structure (i.e. how they fold), by fitting data in silico to known true structures. The last section describes converting a GNU C library square root function into a cube root function by data changes.

Better RNA structure prediction via data changes only

RNAfold is approximately 7,000 lines of code within the open source ViennaRNA package. Almost all the constants within the C source code are provided via 21 multi-dimensional (1 to 6 dimensions) int arrays [6, Tab. 2]. We used a population of 2000 variable-length lists of operators to mutate these integers. The problem-dependent operators can invert values, replace them, or update them with nearby values. They can be applied to individual values or, using wild cards (*), to sub-slices or even whole arrays. From these, a population of mutated RNAfold variants is created. Each member of the population is tested on 681 small RNA molecules, and the mutant's predictions are compared with their known structures [6, Tab. 1]. At the end of each generation the members of the population are sorted by their average fitness on the 681 training examples and the top 1000 are selected to be parents of the next generation. Half the children are created by mutating one parent and the other 1000 by randomly combining two parents. After one hundred generations, the best mutant in the last generation is tidied (i.e. ineffective bloated parts of it are discarded) and used to give a new set of 50,000 integer parameters (29% of them changed).
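Schematically, the generational loop looks like this. This Python sketch uses stand-ins for the problem-specific parts (the operator semantics and the scoring of a mutant's predictions against the 681 known structures), so only the selection scheme mirrors the description above:

    import random

    POP_SIZE, PARENTS, GENERATIONS = 2000, 1000, 100
    training_set = list(range(681))          # stand-in for the 681 known RNA structures

    def score(edits, molecule):              # stand-in: compare a mutant's prediction
        return random.random()               # with the molecule's known structure

    def fitness(edits):                      # average quality over all training examples
        return sum(score(edits, m) for m in training_set) / len(training_set)

    def mutate(parent):                      # stand-in: invert/replace/perturb array entries
        return parent + [random.choice(("invert", "replace", "nearby"))]

    def crossover(a, b):                     # stand-in: randomly combine two edit lists
        return [random.choice(pair) for pair in zip(a, b)] or list(a)

    population = [[] for _ in range(POP_SIZE)]   # empty edit lists = unmodified RNAfold
    for generation in range(GENERATIONS):
        parents = sorted(population, key=fitness, reverse=True)[:PARENTS]
        population = ([mutate(random.choice(parents)) for _ in range(PARENTS)]
                      + [crossover(*random.sample(parents, 2)) for _ in range(PARENTS)])
    best = max(population, key=fitness)      # tidied and applied to the 50,000 integers

With the real fitness function, almost all the run time goes into evaluating the 2000 mutants on the 681 training molecules each generation.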
On average, on both big and small molecules of known structure (not used in training), the new version of RNAfold does better than the original. (In many cases it gives the same prediction, in some it is worse but in more it is better.)
Figure 1 shows RNAfold’s original prediction of the secondary structure of an example RNA molecule and then the new prediction using the updated free energy parameters.

Figure 1: Secondary structure (i.e. folding patterns) for RNA molecule PDB 01001. 1) The prediction made by the original RNAfold does not match the true structure (right) well; for example, the highlighted hairpin loop (red) is not in the true structure. 2) The prediction made with the GI parameters is almost identical to the true structure. 3) The true structure. (Figure 2 tries to show the three-dimensional structure of two PDB 01001 RNA molecules in a larger complex.)

A new Cube Root Function

The GNU C library contains more than a million constants. Most of these are related to internationalisation and non-ascii character sets [8]. However, one implementation of the double precision square root function uses a table of 512 pairs of real numbers. (Most implementations of sqrt(x) simply call low-level machine-specific routines.) The table-driven implementation is written in C and essentially uses three iterations of Newton-Raphson's method. To guarantee convergence on the correct sqrt(x) to double precision accuracy, Newton-Raphson is given a very good start point for both the target value x^(1/2) and the derivative 0.5x^(−1/2), and these are held as pairs in the table.
Unlike the much larger RNAfold (previous section), with cbrt(x) some code changes were made by hand. These were to deal with: x being negative, normalising x to lie in the range 1.0 to 2, reversing the normalisation so that the answer has the right exponent, and replacing the Newton-Raphson constant 1/2 by 1/3 [8, Sec. 2.1]. Given a suitable objective function (how close cbrt(x)×cbrt(x)×cbrt(x) is to x), and starting with each of the pairs of real numbers for sqrt(x), CMA-ES [9] could evolve all 512 pairs of values for the cube root function in less than five minutes.
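The Newton-Raphson update itself is compact. A Python sketch (illustrative only, not glibc's table-driven C code) shows the 1/3 constant in place of square root's 1/2; the crude seed below is a stand-in for the evolved table entries:

    def cbrt(x):
        """Cube root of x > 0 by Newton-Raphson on f(y) = y**3 - x (sketch)."""
        y = x if x < 1.0 else x / 3.0    # crude seed; the glibc-style code looks one up in a table
        for _ in range(30):              # many steps for a crude seed; three suffice with a good one
            y = (2.0 * y + x / (y * y)) / 3.0   # y - (y**3 - x)/(3*y**2): here 1/3 replaces sqrt's 1/2
        return y

    print(cbrt(27.0))                    # approximately 3.0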
The GNU C library contains many math functions which follow similar implementations. For fun, we used the same template to generate the log2(x) function [10].

Figure 2: Three dimensional structure of two PDB 01001 RNA molecules (blue, orange) in a Yeast protein complex (green, yellow) [7, Fig 2. A].

References

  1. W. B. Langdon and Justyna Petke. Genetic improvement. IEEE Software Blog, February 3 2016.
  2. W. B. Langdon and Bob Davidson. BarraCUDA in the cloud. IEEE Software Blog, 8 January 2017.
  3. W. B. Langdon and Mark Harman. Optimising existing software with genetic programming. IEEE Transactions on Evolutionary Computation, 19(1):118–135, 2015.
  4. W. B. Langdon and Brian Yee Hong Lam. Genetically improved BarraCUDA. BioData Mining, 20(28), 2 August 2017.
  5. Justyna Petke et al. Genetic improvement of software: a comprehensive survey. IEEE Transactions on Evolutionary Computation, 22(3):415–432, 2018.
  6. W. B. Langdon, Justyna Petke, and Ronny Lorenz. Evolving better RNAfold structure prediction. In Mauro Castelli et al., editors, EuroGP 2018, pages 220–236, Parma, Italy, 2018. Springer Verlag.
  7. Masaru Tsunoda et al. Structural basis for recognition of cognate tRNA by tyrosyl-tRNA synthetase from three kingdoms. Nucleic Acids Research, 35(13):4289–4300, 2007.
  8. W. B. Langdon and Justyna Petke. Evolving better software parameters. In Thelma Elita Colanzi and Phil McMinn, editors, SSBSE 2018, pages 363–369, Montpellier, France, 2018. Springer.
  9. Nikolaus Hansen and Andreas Ostermeier. Completely derandomized self-adaptation in evolution strategies. Evolutionary Computation, 9(2):159–195, 2001.
  10. W. B. Langdon. Evolving square root into binary logarithm. Technical Report RN/18/05, University College, London, UK, 2018.

You might also enjoy reading

  • James Temperton. Code ’transplant’ could revolutionise programming. Wired.co.uk, 30 July 2015. Online.
  • John R. Woodward, Justyna Petke, and William Langdon. How computers are learning to make human software work more efficiently. The Conversation, page 10.08am BST, June 25 2015.
  • Justyna Petke. Revolutionising the process of software development. DAASE project blog, August 7 2015.