Tuesday, February 19, 2019

Software Improvement by Data Improvement

By: William B. Langdon, CREST, Department of Computer Science, University College London

Associate Editor: Federica Sarro, University College London (@f_sarro)

In previous blogs [1, 2] we summarised genetic improvement [3] and described the result of applying it to BarraCUDA [4]. Rather than giving a comprehensive update (see [5]), I describe a new twist: evolving software via the constants buried within it to give better results. The next section describes updating 50 000 free energy parameters, which are used by dynamic programming to find the lowest energy state of RNA molecules and hence predict their secondary structure (i.e. how they fold), by fitting them in silico to known true structures. The last section describes converting a GNU C library square root function into a cube root function by data changes.

Better RNA structure prediction via data changes only

RNAfold is approximately 7 000 lines of code within the open source ViennaRNA package. Almost all the constants within the C source code are provided via 21 multi-dimensional (1–6 dimensions) int arrays [6, Tab. 2]. We used a population of 2000 variable-length lists of operators to mutate these integers. The problem-dependent operators can invert values, replace them, or update them with nearby values. They can be applied to individual values or, using wild cards (*), to sub-slices or even whole arrays. From these, a population of mutated RNAfold is created. Each member of the population is tested on 681 small RNA molecules, and the mutant's predictions are compared with their known structures [6, Tab. 1]. At the end of each generation, the members of the population are sorted by their average fitness on the 681 training examples and the top 1000 are selected to be parents of the next generation. Half the children (1000) are created by mutating one parent and the other 1000 by randomly combining two parents. After one hundred generations, the best mutant in the last generation is tidied (i.e. ineffective, bloated parts of it are discarded) and used to give a new set of 50 000 integer parameters (29% of them are changed).
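The evolutionary loop above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the study's code: two tiny int arrays stand in for RNAfold's 21 parameter arrays, and a hypothetical TARGET with a distance-based fitness stands in for prediction accuracy on the 681 training molecules. The structure follows the description: variable-length lists of invert/replace/nudge edits (with "*" as a whole-array wild card), truncation selection of the top half, half the children by mutation and half by crossover, and a final tidy step that drops ineffective edits.

```python
import random

# Hypothetical stand-ins for RNAfold's int parameter arrays and for the
# values a perfect fit would recover (the study used prediction accuracy).
BASE = {"stack": [0] * 8, "hairpin": [0] * 6}
TARGET = {"stack": [3, -2, 5, 1, 0, 4, -1, 2], "hairpin": [6, 6, -3, 2, 0, 1]}


def apply(ops):
    """Apply a list of (kind, array, index, arg) edits to a copy of BASE."""
    params = {name: list(vals) for name, vals in BASE.items()}
    for kind, name, idx, arg in ops:
        arr = params[name]
        targets = range(len(arr)) if idx == "*" else [idx]  # "*" = whole array
        for i in targets:
            if kind == "invert":
                arr[i] = -arr[i]
            elif kind == "replace":
                arr[i] = arg
            else:  # "nudge": move to a nearby value
                arr[i] += arg
    return params


def fitness(ops):
    """Higher is better: negative total distance from TARGET (toy objective)."""
    params = apply(ops)
    return -sum(abs(p - t) for name in TARGET
                for p, t in zip(params[name], TARGET[name]))


def random_op(rng):
    name = rng.choice(sorted(BASE))
    idx = "*" if rng.random() < 0.1 else rng.randrange(len(BASE[name]))
    kind = rng.choice(["invert", "replace", "nudge"])
    if kind == "replace":
        arg = rng.randint(-6, 6)
    elif kind == "nudge":
        arg = rng.choice([-1, 1])
    else:
        arg = None  # invert needs no argument
    return (kind, name, idx, arg)


def evolve(pop_size=40, generations=30, seed=1):
    rng = random.Random(seed)
    pop = [[random_op(rng)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)      # truncation selection
        parents = pop[: pop_size // 2]
        children = []
        for _ in range(pop_size // 2):           # half: mutate one parent
            children.append(rng.choice(parents) + [random_op(rng)])
        for _ in range(pop_size - pop_size // 2):  # other half: crossover
            a, b = rng.choice(parents), rng.choice(parents)
            children.append(a[: rng.randint(0, len(a))] + b[rng.randint(0, len(b)):])
        pop = children
    return max(pop, key=fitness)                 # best of the last generation


def tidy(ops):
    """Drop edits whose removal does not make fitness worse (remove bloat)."""
    ops = list(ops)
    i = 0
    while i < len(ops):
        trial = ops[:i] + ops[i + 1:]
        if fitness(trial) >= fitness(ops):
            ops = trial
        else:
            i += 1
    return ops


best = tidy(evolve())
print(fitness([]), fitness(best), apply(best))
```

In the real setting each fitness call runs RNAfold on the training set, which is why population size and generation count dominate the cost.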
On average, on both big and small molecules of known structure (not used in training), the new version of RNAfold does better than the original. (In many cases it gives the same prediction, in some it is worse but in more it is better.)
Figure 1 shows RNAfold’s original prediction of the secondary structure of an example RNA molecule and then the new prediction using the updated free energy parameters.

Figure 1: Secondary structure (i.e. folding pattern) of RNA molecule PDB 01001. 1) The prediction made by the original RNAfold does not match the true structure (right) well; for example, the highlighted hairpin loop (red) is not in the true structure. 2) The prediction made with the GI parameters is almost identical to the true structure. 3) True structure. (Figure 2 shows the three-dimensional structure of two PDB 01001 RNA molecules in a larger complex.)

A new Cube Root Function

The GNU C library contains more than a million constants. Most of these are related to internationalisation and non-ASCII character sets [8]. However, one implementation of the double precision square root function uses a table of 512 pairs of real numbers. (Most implementations of sqrt(x) simply call low-level, machine-specific routines.) The table-driven implementation is written in C and essentially uses three iterations of Newton-Raphson's method. To guarantee convergence on the correct sqrt(x) to double precision accuracy, Newton-Raphson is given a very good start point for both the target value x^(1/2) and the derivative 0.5x^(−1/2), and these are held as pairs in the table.
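The start-point idea can be sketched in Python. This is not glibc's code: the table layout (512 cells over a normalised range [0.5, 2)), the cell indexing, and the coupled update of the derivative term are assumptions chosen for illustration. Each hypothetical entry holds y0 ≈ sqrt(c) and r0 ≈ 0.5c^(−1/2) = 1/(2·y0) for the cell midpoint c; three Newton-Raphson steps then refine both values.

```python
import math

SQRT_TABLE_SIZE = 512
LO, HI = 0.5, 2.0                    # assumed range after normalisation
WIDTH = (HI - LO) / SQRT_TABLE_SIZE

# Hypothetical table of (start value, derivative term) pairs per cell.
SQRT_TABLE = []
for i in range(SQRT_TABLE_SIZE):
    s = math.sqrt(LO + (i + 0.5) * WIDTH)   # sqrt at the cell midpoint
    SQRT_TABLE.append((s, 0.5 / s))         # (x^(1/2), 0.5 * x^(-1/2))


def sqrt_table(x):
    """Square root of a finite x >= 0 via table start point + 3 Newton steps."""
    if x < 0 or not math.isfinite(x):
        raise ValueError("expected a finite, non-negative argument")
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)             # x = m * 2**e with 0.5 <= m < 1
    if e % 2:                        # make the exponent even so it halves exactly
        m, e = m * 2.0, e - 1        # now 0.5 <= m < 2
    i = min(int((m - LO) / WIDTH), SQRT_TABLE_SIZE - 1)
    y, r = SQRT_TABLE[i]
    for _ in range(3):
        y = y + r * (m - y * y)            # Newton step for y*y = m, r ~ 1/(2y)
        r = r + r * (1.0 - 2.0 * y * r)    # Newton step refining r toward 1/(2y)
    return math.ldexp(y, e // 2)           # undo the normalisation


print(sqrt_table(2.0), math.sqrt(2.0))
```

Because the start point is already accurate to about three decimal digits, three quadratically convergent steps are enough to reach double precision.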
Unlike the much larger RNAfold (previous section), with cbrt(x) some code changes were made by hand. These dealt with x being negative, normalising x to lie in the range 1.0 to 2.0, reversing the normalisation so that the answer has the right exponent, and replacing the Newton-Raphson constant 1/2 by 1/3 [8, Sec. 2.1]. Given a suitable objective function (how close cbrt(x)×cbrt(x)×cbrt(x) is to x), and starting from each of the pairs of real numbers for sqrt(x), CMA-ES [9] could evolve all 512 pairs of values for the cube root function in less than five minutes.
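The hand changes listed above can be mapped onto a hypothetical cube root template in the same style as the square root sketch. Here the 512 (y0, r0) pairs are computed directly from the cube root at each cell midpoint, as a placeholder. In the study these pairs were instead evolved by CMA-ES from the sqrt pairs, using an objective like the one below. Restoring the exponent with cbrt(2) and cbrt(4) factors is our assumption about how the normalisation can be reversed.

```python
import math

CBRT_TABLE_SIZE = 512
CELL = 1.0 / CBRT_TABLE_SIZE          # cells cover the normalised range [1, 2)
CBRT2 = 2.0 ** (1.0 / 3.0)            # factors used to undo the normalisation
CBRT4 = 4.0 ** (1.0 / 3.0)

# Placeholder table: y0 ~ cbrt(c), r0 ~ 1/(3*y0**2) at each cell midpoint c.
CBRT_TABLE = []
for i in range(CBRT_TABLE_SIZE):
    y0 = (1.0 + (i + 0.5) * CELL) ** (1.0 / 3.0)
    CBRT_TABLE.append((y0, 1.0 / (3.0 * y0 * y0)))


def cbrt_table(x, table=CBRT_TABLE):
    if x == 0.0:
        return 0.0
    if x < 0.0:                        # hand change: negative x
        return -cbrt_table(-x, table)
    m, e = math.frexp(x)               # x = m * 2**e with 0.5 <= m < 1
    m, e = 2.0 * m, e - 1              # hand change: normalise to 1 <= m < 2
    q, rem = divmod(e, 3)              # 2**e = 2**(3*q) * 2**rem
    i = min(int((m - 1.0) / CELL), CBRT_TABLE_SIZE - 1)
    y, r = table[i]
    for _ in range(3):
        y = y + r * (m - y * y * y)            # Newton for y**3 = m (1/2 became 1/3)
        r = r + r * (1.0 - 3.0 * y * y * r)    # refine r toward 1/(3*y**2)
    y *= (1.0, CBRT2, CBRT4)[rem]      # hand change: restore the exponent
    return math.ldexp(y, q)


def objective(table, samples):
    """Mean relative error |cbrt(x)**3 - x| / x over positive samples (lower is better)."""
    total = 0.0
    for x in samples:
        y = cbrt_table(x, table)
        total += abs(y * y * y - x) / x
    return total / len(samples)


samples = [1.0 + k / 1000.0 for k in range(1000)]
print(cbrt_table(27.0), objective(CBRT_TABLE, samples))
```

An optimiser such as CMA-ES would treat the 1024 numbers in the table as its search vector and minimise this objective.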
The GNU C library contains many math functions which follow similar implementations. For fun, we used the same template to generate the log2(x) function [10].

Figure 2: Three-dimensional structure of two PDB 01001 RNA molecules (blue, orange) in a yeast protein complex (green, yellow) [7, Fig. 2A].


  1. W. B. Langdon and Justyna Petke. Genetic improvement. IEEE Software Blog, 3 February 2016.
  2. W. B. Langdon and Bob Davidson. BarraCUDA in the cloud. IEEE Software Blog, 8 January 2017.
  3. W. B. Langdon and Mark Harman. Optimising existing software with genetic programming. IEEE Transactions on Evolutionary Computation, 19(1):118–135, 2015.
  4. W. B. Langdon and Brian Yee Hong Lam. Genetically improved BarraCUDA. BioData Mining, 20(28), 2 August 2017.
  5. Justyna Petke et al. Genetic improvement of software: a comprehensive survey. IEEE Transactions on Evolutionary Computation, 22(3):415–432, 2018.
  6. W. B. Langdon, Justyna Petke, and Ronny Lorenz. Evolving better RNAfold structure prediction. In Mauro Castelli et al., editors, EuroGP 2018, pages 220–236, Parma, Italy, 2018. Springer Verlag.
  7. Masaru Tsunoda et al. Structural basis for recognition of cognate tRNA by tyrosyl-tRNA synthetase from three kingdoms. Nucleic Acids Research, 35(13):4289–4300, 2007.
  8. W. B. Langdon and Justyna Petke. Evolving better software parameters. In Thelma Elita Colanzi and Phil McMinn, editors, SSBSE 2018, pages 363–369, Montpellier, France, 2018. Springer.
  9. Nikolaus Hansen and Andreas Ostermeier. Completely derandomized self-adaptation in evolution strategies. Evolutionary Computation, 9(2):159–195, 2001.
  10. W. B. Langdon. Evolving square root into binary logarithm. Technical Report RN/18/05, University College, London, UK, 2018.

You might also enjoy reading

  • James Temperton. Code ‘transplant’ could revolutionise programming. Wired.co.uk, 30 July 2015. Online.
  • John R. Woodward, Justyna Petke, and William Langdon. How computers are learning to make human software work more efficiently. The Conversation, 25 June 2015, 10.08am BST.
  • Justyna Petke. Revolutionising the process of software development. DAASE project blog, 7 August 2015.

Monday, February 11, 2019

Learning from Failure: Modeling User Goals in the App Market

By: Grant Williams (@g_will74), Nishant Jha (@nishant_vodoo), and Anas Mahmoud (@nash_mahmoud), Louisiana State University

Associate Editor: Federica Sarro, University College London (@f_sarro)

App success has been broadly studied in the literature. This line of research is mainly driven by significant business interest in the performance of mobile apps. In general, app success can be quantified based on market performance: apps that get more downloads, appear frequently on the top charts, have steady user acquisition and retention rates, or generate more revenue are more successful. Failure, on the other hand, is a less well-understood phenomenon. Naturally, success is more appealing as an object of analysis than failure. However, market research has shown that failure should be studied in isolation from total prior experience. According to Madsen and Desai [4], organizations learn more effectively from failures than from successes. Analyzing individual cases of failure is a unique opportunity to learn what went wrong, what the key lessons are, and how to prevent such failures in the future.

To shed more light on app failure, in this article we describe a case study dedicated to analyzing the main reasons that led to the failure of Yik Yak, one of the most popular social networking apps of 2015. Our objective is to model users' main goals in the domain of Yik Yak, along with their relations to the core features of the domain. Models can help to translate tacit domain knowledge into an explicit form. Once domain knowledge is explicit, it can be preserved, communicated, and passed on to others.

A Case Study on App Failure

Fig. 1: The number of monthly downloads after anonymity was removed and then restored.

Launched in 2013, Yik Yak was a location-based social networking app that allowed users to post and vote on short messages (known as yaks). Yik Yak distinguished itself from its competitors by two main features: anonymous communication and geographical locality; users could anonymously post and engage in discussions within a 1.5 mile radius around their location. The combination of anonymity and locality proved successful, and by the end of 2014, Yik Yak had become one of the most popular social networking platforms on college campuses, with a market valuation close to $400 million. 
Despite its popularity among users, the anonymity provided by Yik Yak had a downside. Because posts were not personally identifiable, Yik Yak was an easy vector for cyber-bullies to anonymously harass other users at their schools or universities. The problem became significant when the app was used to make threats credible enough for institutions to request police assistance. To curb cyberbullying, Yik Yak rolled out mandatory handles, forcing users to choose a screen name. The community's response to this feature update was overwhelmingly negative, and the number of downloads and active users dropped sharply. Yik Yak's attempt to reverse course by reintroducing anonymous posting never recaptured its previous popularity, and the company eventually shut down the app and suspended its operations in May 2017.

The story of Yik Yak represents a unique opportunity to analyze one of the most recent major failures in the app market. To get a systematic, in-depth look into this case, we use feature-softgoal interdependency graphs (F-SIGs) [2]. F-SIGs enable comprehensive qualitative reasoning about the complex interplay between the functional features of an application domain and its users' goals. To generate our model, we first identified Yik Yak's main rivals in the domain of anonymous social networking apps: Kik, Jodel, Firechat, Whisper, Swiflie, and Spout. To identify end-users' goals in the domain, we analyzed user feedback in app store reviews and on the Twitter feeds of these apps [3]. Our analysis focused on identifying user sentiments toward the core features of the domain [1]. The generated F-SIG model is shown in Figure 2.
Fig. 2: A model of the common user goals (concerns) and their relationships to the core features of the domain of anonymous social networking apps.
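One way to make such a model queryable is to store it as a small graph of features, softgoals, and labelled contribution links. The sketch below is hypothetical and is not the authors' tooling. The ++/+/−/−− labels follow the usual softgoal-interdependency convention. The example edges are illustrative only, loosely based on the Yik Yak story above, and are not read from Figure 2.

```python
from collections import defaultdict

# Contribution strengths, as commonly used in softgoal interdependency graphs.
WEIGHT = {"++": 2, "+": 1, "-": -1, "--": -2}


class FSIG:
    """Minimal feature-softgoal graph: feature -> {softgoal: contribution}."""

    def __init__(self):
        self.edges = defaultdict(dict)

    def contribute(self, feature, softgoal, label):
        if label not in WEIGHT:
            raise ValueError(f"unknown contribution {label!r}")
        self.edges[feature][softgoal] = label

    def effect_on(self, softgoal):
        """Features that help (> 0) or hurt (< 0) a softgoal, with weights."""
        return {f: WEIGHT[goals[softgoal]]
                for f, goals in self.edges.items() if softgoal in goals}

    def trade_offs(self, feature):
        """Split a feature's softgoals into those it helps and those it hurts."""
        goals = self.edges.get(feature, {})
        helps = sorted(s for s, lab in goals.items() if WEIGHT[lab] > 0)
        hurts = sorted(s for s, lab in goals.items() if WEIGHT[lab] < 0)
        return helps, hurts


# Hypothetical edges for illustration only.
g = FSIG()
g.contribute("anonymity", "privacy", "++")
g.contribute("anonymity", "safety", "-")
g.contribute("mandatory handles", "safety", "+")
g.contribute("mandatory handles", "privacy", "--")
g.contribute("locality", "community", "+")

print(g.trade_offs("anonymity"))
print(g.effect_on("safety"))
```

Queries like trade_offs expose the kind of synergy and conflict between features and goals discussed in the next section.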

Significance and Future Work

The F-SIG diagram in Figure 2 shows that several in-depth insights can be gleaned from analyzing user goals and their relations to the core features of the domain. Such information can serve as input for the Next Release Problem (NRP), which is mainly concerned with maximizing customer value by optimizing the subset of requirements to be included in the coming release. Furthermore, through the model's dependency relations, developers can get a sense of the synergies and trade-offs between features and user goals, and thus adjust their release strategies to focus on features that enhance the most desirable goals. This information can be particularly useful for smaller businesses and startups trying to break into the app market. Specifically, the proposed model will serve as a core asset that helps startup companies quickly gain a comprehensive understanding of the history of the domain, providing information about how specific user goals and features of the domain have emerged and evolved to their current state.
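In its simplest single-objective form, the NRP reduces to a 0/1 knapsack problem: choose the subset of candidate features with the highest total customer value whose total cost fits a release budget. The dynamic-programming sketch below assumes integer costs. The feature names, costs, and values are hypothetical; in practice the values could be informed by how strongly each feature supports desirable goals in a model such as Figure 2.

```python
def next_release(features, budget):
    """0/1 knapsack: pick features maximising total value within an integer budget.

    features: list of (name, cost, value) with integer costs.
    Returns (best_value, sorted list of chosen feature names).
    """
    # best[c] = (value, names) of the best subset with total cost <= c
    best = [(0, ())] * (budget + 1)
    for name, cost, value in features:
        # Iterate capacities downwards so each feature is used at most once.
        for c in range(budget, cost - 1, -1):
            candidate = best[c - cost][0] + value
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - cost][1] + (name,))
    value, names = best[budget]
    return value, sorted(names)


# Hypothetical (name, cost, value) triples for illustration only.
FEATURES = [
    ("anonymous posting", 5, 9),
    ("moderation tools", 4, 6),
    ("screen names", 3, 2),
    ("local feed", 2, 5),
]

print(next_release(FEATURES, 9))
```

Richer NRP variants weight value by customer importance or optimise several objectives at once. The knapsack form is only the basic case.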
Our future work in this direction will be focused on automating the model generation process, utilizing automatic app store and social media mining methods as well as automated domain modeling techniques. Our goal is to devise automated methods for data collection, domain feature analysis, and model generation. These methods will be evaluated over large datasets of app data to ensure their practicality and accuracy.


[1] G. Williams and A. Mahmoud, “Modeling User Concerns in the App Store: A Case Study on the Rise and Fall of Yik Yak,” in IEEE International Requirements Engineering Conference, 2018.

[2] S. Jarzabek, B. Yang, and S. Yoeun. Addressing quality attributes in domain analysis for product lines. IEE Proceedings - Software, 153(2):6-73, 2006.

[3] W. Martin, F. Sarro, Y. Jia, Y. Zhang, and M. Harman, A survey of app store analysis for software engineering, in IEEE Transactions on Software Engineering, vol. 43, no. 9, pp. 817-847, 1 Sept. 2017.

[4] P. Madsen and V. Desai, “Failing to learn? The effects of failure and success on organizational learning in the global orbital launch vehicle industry,” Academy of Management Journal, vol. 53, no. 3, pp. 451–476, 2010.

Thursday, January 31, 2019

Architectural Security Weaknesses in Industrial Control Systems (ICS)

By: Mehdi Mirakhorli (@MehdiMirakhorli), Associate Editor, and Danielle Gonzalez (@dngonza)
Center for Cybersecurity at RIT

Industrial control systems (ICS) are computers that control automation systems used in industrial environments, including critical infrastructure. They allow operators to monitor and control industrial processes and support the day-to-day operations of manufacturing, oil and gas production, chemical processing, electrical power grids, transportation, pharmaceuticals, and many other critical infrastructures. Ever since the Stuxnet attack on modern supervisory control and data acquisition (SCADA) and PLC systems, many vulnerabilities have been discovered and reported for these systems. Recent studies have shown that documented attacks on ICS infrastructure have increased exponentially, from a few incidents to hundreds per year. In this blog post, we discuss common architectural security weaknesses in ICS.

We conducted a study over several months in which we collected and analyzed 988 security advisories that disclosed ICS vulnerabilities. These advisories were obtained from the Industrial Control Systems Cyber Emergency Response Team (ICS-CERT). We performed a detailed analysis of the vulnerability reports to measure which ICS components have been affected most by known vulnerabilities, which security tactics were affected most often, and what the common architectural security weaknesses in these systems are.
Figure 1 shows a typical architecture of ICS. Programmable Logic Controllers (PLCs) and Remote Terminal Units (RTUs) connect to sensors and actuators through a Fieldbus to gather real-time information and provide operational control of the equipment. These controllers also communicate with a data acquisition server (SCADA) using various industrial communications protocols.

ICS relies on Human-Machine Interfaces (HMI) to configure the industrial plant, troubleshoot the operation and equipment, run, stop, install and update programs, and recover from failures. The day to day operational data is logged by the Data Historian component for further analysis and diagnosis. Each of these components has its own interfaces and can be the target of attacks that impact industrial equipment and operations. 

Figure 1. Overview of an ICS Architecture

Our analysis detected 17 ICS components in 544 ICS advisory reports (55.06% of the entire corpus). Figure 2 is a heat map showing the most common vulnerable components in these ICS advisory reports. The size of each block in the heatmap represents the frequency of a component. 

Human-Machine Interfaces (HMIs), Supervisory Control And Data Acquisition (SCADA) Software, and Programmable Logic Controllers (PLCs) are the most vulnerable ICS components.
The remaining components were significantly less vulnerable. For instance, the fourth most frequently affected component is OPC (reported 57 times), a software protocol based on Microsoft's Distributed Component Object Model (DCOM) that manufacturers use for real-time interoperability of industrial processes.

We examined how many of the vulnerabilities had an architectural root cause by identifying the Common Architectural Weakness Enumeration (CAWE) assigned to each vulnerability and noting how many were associated with architectural tactics:
62.86% of ICS vulnerability disclosures had an architectural root cause, while 37.14% of the vulnerabilities were due to coding defects.

The most common architectural root causes for ICS vulnerabilities:

Improper Input Validation affects ICS the most, followed by Cross-site Scripting, Cross-Site Request Forgery, SQL Injection, and Improper Authentication.

Table 2 shows the top 20 architectural CAWEs affecting ICS.

Table 2. Top CAWEs in Industrial Control Systems

1) Improper Input Validation: This weakness occurs when an ICS component either does not implement, or incorrectly implements, the Validate Inputs tactic. Our analysis indicates that this weakness can be present in every ICS component, and its consequences often include denial of service. For instance, ICSA-14-079-01 reports two vulnerabilities affecting Siemens SIMATIC S7-1200 Programmable Logic Controllers (PLCs). Crafted packets sent to two specific ports could put the PLC into a special “defect mode”. Because the subsystem receiving the packets fails to validate them, the vulnerability can be exploited remotely to cause a denial of service. These vulnerabilities were exploitable because the devices were accessible through the internet; Siemens mitigated them in new versions of these PLCs.
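The Validate Inputs tactic amounts to checking every field of an incoming message before acting on it, and rejecting bad input without changing device state. The sketch below uses a hypothetical two-byte header (function code, payload length) and a hypothetical whitelist. It is not the S7-1200 protocol.

```python
ALLOWED_FUNCTIONS = {0x01, 0x03, 0x05}   # hypothetical whitelist of function codes
MAX_PAYLOAD = 64                          # hypothetical payload limit in bytes


def parse_request(packet: bytes):
    """Validate a hypothetical [function][length][payload] packet before use."""
    if len(packet) < 2:
        raise ValueError("packet too short")
    func, length = packet[0], packet[1]
    if func not in ALLOWED_FUNCTIONS:
        raise ValueError(f"unsupported function code {func:#04x}")
    if length > MAX_PAYLOAD or len(packet) != 2 + length:
        raise ValueError("declared length does not match payload")
    return func, packet[2:]


def handle(packet: bytes) -> bytes:
    """Reject malformed input with an error reply instead of entering a fault state."""
    try:
        func, payload = parse_request(packet)
    except ValueError:
        return b"\x80"                    # hypothetical error reply; device keeps running
    return bytes([func]) + payload        # placeholder: echo a valid request


print(handle(b"\x03\x02ok"), handle(b"\xff\x00"))
```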

2) Improper Neutralization of Input During Web Page Generation (‘Cross-site Scripting’): This weakness occurs when an ICS component with a web-accessible UI does not “neutralize”, or incorrectly neutralizes, input provided by one user (commonly in a web request) that is subsequently used to generate web pages served to other users. With this flaw, attackers can inject client-side scripts into web pages viewed by other users. This weakness is related to the Validate Inputs tactic; however, it is limited to ICS components with web-accessible UIs. In contrast, the previous weakness (CWE-20) covered cases in which attackers could mount a denial-of-service attack using crafted input that did not necessarily involve web applications. ICSA-15-342-01 describes a cross-site scripting vulnerability affecting XZERES 442SR Wind Turbines. The web-based operating system of the turbine generator did not properly neutralize incoming web request content. This vulnerability could be exploited remotely and could cause a loss of power for the entire system.
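In an HTML page generator, neutralizing input usually means escaping HTML metacharacters before the input is embedded in a page. A minimal sketch using Python's standard html.escape follows. The render_comment function is hypothetical.

```python
import html


def render_comment(author: str, text: str) -> str:
    """Build an HTML fragment from user input, escaping <, >, &, and quotes first."""
    return f"<p><b>{html.escape(author)}</b>: {html.escape(text)}</p>"


print(render_comment("operator", "<script>alert(1)</script>"))
```

Escaping at output time keeps stored data unchanged while ensuring that the browser renders the input as text rather than executing it as script.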

3) Improper Authentication: A common design weakness discovered in ICS is that a component fails to verify, or incorrectly verifies, the identity claims of an actor interacting with it. For instance, ICSA-17-264-04 reports that iniNet Solutions GmbH's SCADA Web server assumed the SCADA system would be used in a protected network and did not implement the “Authenticate Actors” tactic at all, yet the system was connected to the Internet. In another case, ICSA-16-308-01 describes vulnerabilities in multiple versions and models of Moxa's OnCell cellular IP gateways, which connect devices to cell networks. Access to a specific URL is not restricted, and any user who accesses it may download log files. This can be exploited as an authentication bypass, in which the malicious actor accesses the URL without prior authentication.
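The log-download case shows what happens when a resource is reachable without an authenticated session. The sketch below shows the missing check. It assumes a hypothetical in-memory session store and handler names, and it assumes that password verification happens before login issues a token.

```python
import secrets

SESSIONS = {}  # hypothetical in-memory store: session token -> user name


def login(user: str) -> str:
    """Issue an unguessable session token (credential checks assumed done already)."""
    token = secrets.token_hex(16)
    SESSIONS[token] = user
    return token


def download_logs(token):
    """Serve log files only to a request that carries a valid session token."""
    user = SESSIONS.get(token)
    if user is None:
        return 401, b""                   # Unauthorized: no authenticated session
    return 200, b"log contents for " + user.encode()


t = login("engineer")
print(download_logs(None)[0], download_logs(t)[0])
```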

4) Improper Access Control: This is a weakness in the Authorize Actors tactic that occurs when an ICS component fails to restrict, or incorrectly restricts, an unauthorized actor's access to resources or equipment. This weakness can result in privilege escalation, disclosure of confidential or sensitive data, or execution of malicious code. ICSA-13-100-01 discusses a case in which MiCOM S1 Studio software, which is used to configure the parameters of electronic protective relays, fails to limit access to its executables. As a result, users without administrative privileges can replace the executables with malicious code or make unauthorized modifications to the relay parameters. A malicious actor could also exploit this to let other users escalate their own privileges. Exploiting this vulnerability requires physical access to the device; it cannot be exploited remotely. The company provides mitigation strategies for users, but the vulnerability has not been patched or fixed by adding appropriate security tactics.
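The Authorize Actors tactic places a permission check in front of every sensitive operation. Below is a minimal role-based sketch. The roles, action names, and AccessDenied exception are hypothetical.

```python
class AccessDenied(Exception):
    """Raised when an actor lacks permission for an action."""


# Hypothetical role -> permitted actions mapping.
PERMISSIONS = {
    "admin": {"read_settings", "write_settings", "replace_executable"},
    "operator": {"read_settings"},
}


def authorize(role: str, action: str) -> None:
    if action not in PERMISSIONS.get(role, set()):
        raise AccessDenied(f"{role!r} may not {action}")


def set_relay_parameter(role, params, key, value):
    authorize(role, "write_settings")   # check before any state is changed
    params[key] = value


relay = {"pickup_current": 1.0}
set_relay_parameter("admin", relay, "pickup_current", 1.2)
print(relay)
```

Unknown roles get an empty permission set, so the default outcome is denial.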

5) Cross-Site Request Forgery (CSRF): ICS components may not verify that an expected and/or valid request to a web server was sent intentionally by the client (i.e. that it is not a “forged” request). A malicious entity could take advantage of this weakness to force or trick a client into sending an unintended request to an ICS component or equipment that appears valid to the SCADA server or process controllers. ICSA-15-239-02 describes a CSRF vulnerability in the integrated web server of Siemens SIMATIC S7-1200 CPUs, which are used in Programmable Logic Controllers (PLCs). A user with an active session on these web servers could be tricked into sending a malicious request, which the CPU's web server would accept without verifying intent. This can be exploited remotely and would allow the malicious entity to perform actions on the PLC with the tricked user's permissions.
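A common defence is a synchronizer token. The server embeds a secret, per-session value in its own forms and rejects state-changing requests that do not return that value. A forged cross-site request cannot read the value. The sketch below derives the token with an HMAC over the session ID. The key handling and function names are hypothetical.

```python
import hashlib
import hmac
import secrets

SERVER_KEY = secrets.token_bytes(32)   # hypothetical per-device secret key


def csrf_token(session_id: str) -> str:
    """Token embedded in forms served to this session."""
    return hmac.new(SERVER_KEY, session_id.encode(), hashlib.sha256).hexdigest()


def accept_request(session_id: str, form_token: str) -> bool:
    """Accept a state-changing request only if it carries this session's token."""
    return hmac.compare_digest(csrf_token(session_id), form_token)


print(accept_request("s1", csrf_token("s1")), accept_request("s1", ""))
```

hmac.compare_digest compares the tokens in constant time, which avoids leaking the expected token through timing differences.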

6) Use of Hard-coded Credentials: This weakness occurs when ICS components store any type of credential used for authentication, external communication, or data encryption in readable formats (e.g. plain text) in discoverable locations. It compromises the Authenticate Actors tactic in ICS and enables malicious actors to bypass authentication and make ICS components or equipment perform actions that require authentication or elevated privileges. This weakness was prevalent across all ICS components. For instance, ICSA-15-309-01 discloses a hard-coded SSH key in Advantech's EKI-122X Modbus gateways, which integrate Modbus/RTU and Modbus/ASCII devices with TCP/IP networked devices. The firmware of these gateways contains unmodifiable, hard-coded SSH keys. Because these keys cannot be changed but are discoverable, a malicious external actor could exploit them remotely to intercept communication.
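The core fix is to keep secrets out of the shipped code or firmware and provision them per device at install time, so they can be rotated. The sketch below assumes one such mechanism: reading the secret from the process environment. The variable name is hypothetical. For SSH host keys specifically, generating a unique key per device is the analogous measure.

```python
import os


def load_credential(name: str) -> str:
    """Read a secret provisioned at deployment time instead of a hard-coded constant."""
    value = os.environ.get(name)
    if not value:
        # Fail closed: refuse to run with a missing secret rather than fall back to a default.
        raise RuntimeError(f"credential {name} not provisioned")
    return value
```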

7) Improper Neutralization of Special Elements used in an SQL Command (‘SQL Injection’): This is another weakness in ICS that affects implementations of the Validate Inputs tactic. It describes an ICS component's failure to “neutralize”, or incorrect neutralization of, user-provided inputs that are inserted as parameters into SQL queries, which can be exploited in SQL injection attacks. In such cases the unhandled input is executed as part of the query itself instead of being treated as a parameter. This can cause the database to return sensitive data unintentionally or to modify existing data. This weakness was mostly associated with components such as Human-Machine Interfaces (HMIs), supervisory control software (SCADA), and variants of SCADA (e.g. Process Control Systems (PCS)). ICSA-14-135-01 discusses a SQL injection vulnerability in several versions of the web-based CSWorks framework, which is used to build process control software. The framework fails to sanitize or validate (“neutralize”) user-provided inputs that are intended to be used to read and write paths.
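The standard fix is parameter binding: the query text is fixed, and user input is passed separately as data. A minimal sketch with Python's built-in sqlite3 module follows. The tags table and find_tag helper are hypothetical.

```python
import sqlite3


def find_tag(conn: sqlite3.Connection, name: str):
    # The "?" placeholder makes sqlite3 bind `name` as a value, so quotes or
    # SQL keywords inside it cannot change the structure of the query.
    cur = conn.execute("SELECT name, value FROM tags WHERE name = ?", (name,))
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tags (name TEXT, value REAL)")
conn.executemany("INSERT INTO tags VALUES (?, ?)", [("pump1", 3.5), ("valve2", 0.0)])

print(find_tag(conn, "pump1"))
print(find_tag(conn, "x' OR '1'='1"))   # classic injection string returns nothing
```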

8) Use of Hard-coded Password: Hard-coded passwords are passwords stored in plain text or easily decryptable formats in locations that malicious actors can discover and exploit to bypass authentication or authorization checks. This weakness affects implementations of the Authenticate Actors tactic in ICS products. For instance, the ICSA-12-243-01 report describes a privilege escalation vulnerability in multiple versions of two varieties of GarrettCom Magnum MNS-6K Management Software, which is used for device management on managed Ethernet switches. An undocumented but discoverable hard-coded password could allow a malicious actor with access to a preexisting account on a device to elevate that account's privileges to admin level. With these elevated privileges, a denial-of-service attack could be launched or sensitive equipment settings could be changed.

9) Insufficiently Protected Credentials: This weakness occurs when ICS components use an insecure method for transmitting (e.g. over an insecure network) or storing (e.g. in plain text) credentials. This insecurity can be exploited for malicious interception or discovery. This weakness affects implementations of the Encrypt Data tactic. ICSA-16-336-05B discusses unprotected credentials that can be exploited in multiple versions of GE Proficy Human Machine Interface (HMI), Supervisory Control and Data Acquisition (SCADA), and Data Historian products. In these products, a malicious actor with access to an authenticated session may be able to retrieve the passwords of users other than the account they are using. However, the vulnerability cannot be exploited remotely.
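For stored passwords, the usual protection is to keep only a salted, deliberately slow hash. Even an attacker who can read the stored records then cannot recover other users' passwords directly. The sketch below uses PBKDF2 from Python's hashlib. The iteration count is an assumed example value and should be tuned to the target hardware.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed example work factor


def hash_password(password: str):
    """Return (salt, digest); store these instead of the password itself."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)   # constant-time comparison
```

Credentials in transit need a separate measure, such as an encrypted channel (e.g. TLS), which this sketch does not cover.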

10) Improper Control of Generation of Code (‘Code Injection’): This weakness occurs when ICS components fail to “neutralize”, or incorrectly neutralize, user-provided input to remove code syntax. As a result, that code can be executed when the input is used to generate a code segment in the software. This weakness affects the Validate Inputs tactic. ICSA-14-198-01 discusses a code injection vulnerability affecting some versions of Cogent Real-Time Systems Inc.'s DataHub, middleware used to interface with various control systems. If a malicious actor has access to the local file system and creates a ‘Gamma’ script there, he/she can craft a specially formatted user name and password to perform a code injection attack via an ASP page that executes that script file.
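Two standard defences are to whitelist the characters allowed in fields such as user names, and never to pass input to an evaluator that runs code. In Python, ast.literal_eval parses literals only and rejects calls. The user-name pattern and helper names below are hypothetical.

```python
import ast
import re

USER_NAME = re.compile(r"[A-Za-z0-9_.-]{1,32}")   # hypothetical allowed characters


def validate_user_name(name: str) -> str:
    """Reject user names containing anything outside the whitelist."""
    if not USER_NAME.fullmatch(name):
        raise ValueError("user name contains disallowed characters")
    return name


def parse_setting(text: str):
    """Parse a literal value (number, string, list, ...) without executing code."""
    return ast.literal_eval(text)   # raises ValueError/SyntaxError for code


print(parse_setting("[1, 2, 3]"), validate_user_name("op_1"))
```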

Finally, we looked at which tactics are most often compromised in ICS.

The top 3 most compromised architectural tactics in ICS are Validate Inputs, Authenticate Actors, and Authorize Actors.

Furthermore, our findings indicate that:

The OWASP Top 10 covers the critical web application security risks; however, it does not adequately represent the risks in ICS. Table 2 lists the top 20 security risks in ICS.


  • Many modern HMIs are now web-based, and there are cloud-based SCADA systems; therefore, common web vulnerabilities affect these components of ICS.
  • The most common architectural weaknesses in ICS are different from those in web applications (OWASP Top 10).
  • ICS are most vulnerable to CWE-20 Improper Input Validation, which can result in denial-of-service attacks that crash ICS processes and equipment.
  • ICS components are not secured by design. Many vendors have indicated that their products are designed to be used in a protected environment and therefore lack the mitigation techniques (e.g. authentication and input validation tactics) required to work in an untrusted environment.
  • Industrial control systems rely on many vendors for PLCs, RTUs, IEDs, and other controllers and equipment. This increases their attack surface and makes enforcing the Input Validation tactic difficult.

In a paper to appear at ICSA 2019 [1], we report these findings in greater detail and present advice for practitioners, such as isolating control system devices and/or systems from untrusted networks.

You may also like:

Joanna C. S. Santos, Anthony Peruma, Mehdi Mirakhorli, Matthias Galster, Jairo Veloz Vidal, Adriana Sejfia: Understanding Software Vulnerabilities Related to Architectural Security Tactics: An Empirical Investigation of Chromium, PHP and Thunderbird. ICSA 2017

Joanna C. S. Santos, Katy Tarrit, Mehdi Mirakhorli: A Catalog of Security Architecture Weaknesses. ICSA Workshops 2017: 220-223

M. Krotofil and D. Gollmann, Industrial control systems security: What is happening? 2013 11th IEEE International Conference on Industrial Informatics (INDIN), Bochum, 2013.

S. McLaughlin et al., The Cybersecurity Landscape in Industrial Control Systems, in Proceedings of the IEEE, vol. 104, no. 5, pp. 1039-1057, May 2016.

[1] Danielle Gonzalez, Fawaz Alhenaki and Mehdi Mirakhorli, “Architectural Security Weaknesses in Industrial Control Systems (ICS): An Empirical Study based on Disclosed Software Vulnerabilities”, International Conference on Software Architecture, Hamburg, Germany, March 2019.