Monday, March 6, 2017

Helping Developers Configure Application-Level Caches for Web Applications Through DevOps

by Peter (Tse-Hsun) Chen, Queen's University (@petertsehsun)
Associate Editor: Sarah Nadi, University of Alberta (@sarahnadi)

Developers nowadays use application-level caching frameworks, such as Ehcache [1], to keep objects of object-oriented languages in memory. Once an object is cached, the application server no longer needs to retrieve its data from an external machine such as a database server, which can significantly improve application performance.
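A minimal sketch of the idea, assuming a plain Python dictionary as the in-memory cache and a hypothetical `load_from_db` function standing in for the database server. This is an illustration of the cache-aside pattern only, not the Ehcache API; real frameworks add eviction, expiry, and per-class configuration on top of it.

```python
from typing import Any, Callable, Dict, Hashable, Tuple


class ObjectCache:
    """Minimal in-memory object cache keyed by (type name, object id).

    Illustrative cache-aside sketch only; this is not the Ehcache API.
    """

    def __init__(self, load_from_db: Callable[[str, Hashable], Any]) -> None:
        self._load_from_db = load_from_db  # hypothetical loader standing in for a database query
        self._store: Dict[Tuple[str, Hashable], Any] = {}
        self.db_reads = 0  # how many times we had to go to the database

    def get(self, type_name: str, obj_id: Hashable) -> Any:
        key = (type_name, obj_id)
        if key in self._store:  # cache hit: served from memory, no database round trip
            return self._store[key]
        self.db_reads += 1  # cache miss: load from the database and remember the object
        obj = self._load_from_db(type_name, obj_id)
        self._store[key] = obj
        return obj

    def invalidate(self, type_name: str, obj_id: Hashable) -> None:
        # Drop a stale copy, e.g. after the object was modified in the database.
        self._store.pop((type_name, obj_id), None)


# Usage with a dictionary standing in for the database server.
fake_db = {("User", "peter"): {"name": "Peter"}}
cache = ObjectCache(lambda t, i: fake_db[(t, i)])
for _ in range(3):
    cache.get("User", "peter")
print(cache.db_reads)  # 1: only the first lookup reached the "database"
```

Only the first of the three lookups reaches the stand-in database; the other two are served from memory, which is exactly the saving application-level caching aims for.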
Alert: Application-level caching frameworks are only useful if you know what you want to cache
However, determining how best to use such a framework is not always easy. A large-scale web application may have hundreds of different types of objects, so the challenge is knowing which types of objects should be cached.
Developers must manually configure the caching framework to enable caching for each type of object (e.g., enable caching for Student objects). Finding the best cache configuration is difficult, because the benefit of caching depends directly on how users use the application. For example, enabling caching on frequently modified objects will not improve application performance; it will actually slow the application down, because the cached copies must be renewed every time the objects change.
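To see why, consider a deliberately simplified model (our assumption, not how any particular framework or CacheOptimizer itself works): a single cached object whose copy is discarded on every write and reloaded from the database on the next read. The sketch replays a sequence of reads ("R") and writes ("W") and counts what the cache does.

```python
from typing import Dict


def simulate(ops: str) -> Dict[str, float]:
    """Replay 'R' (read) and 'W' (write) operations on one object under a
    simple invalidate-on-write cache and count hits, misses, and renewals."""
    cached = False
    hits = misses = invalidations = 0
    for op in ops:
        if op == "R":
            if cached:
                hits += 1  # served from the cache
            else:
                misses += 1  # read from the database, then (re)cached
                cached = True
        elif op == "W":
            if cached:
                invalidations += 1  # cached copy is now stale and must be renewed
            cached = False
        else:
            raise ValueError(f"unknown operation {op!r}")
    reads = hits + misses
    return {
        "hits": hits,
        "misses": misses,
        "invalidations": invalidations,
        "hit_ratio": hits / reads if reads else 0.0,
    }


read_mostly = simulate("RRRRWRRR")  # e.g. a rarely edited object type
write_heavy = simulate("RWRWRW")    # e.g. a frequently modified object type
print(read_mostly)
print(write_heavy)
```

In the read-mostly sequence, 5 of the 7 reads are served from memory. In the write-heavy sequence, no read ever hits the cache and every write forces a renewal, so the cache only adds work.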
How can we help improve the performance of web applications by finding better cache configurations? 
CacheOptimizer: finding optimal cache configurations by leveraging DevOps
DevOps tries to bring the worlds of developers and administrators together to further improve software development. One main concept of DevOps is therefore to assist software development by understanding user behaviour, such as knowing how the application is used in production. But what kind of data can we analyze to recover user behaviour? One option is to instrument the application to record the information we need. However, this solution is not acceptable for applications running in production, since instrumentation would add too much overhead to the application.
Instead, we propose CacheOptimizer, a framework that analyzes readily available application runtime logs to automatically help developers find the optimal cache configurations for application-level caching frameworks.
Understanding user behaviour by combining static code analysis and runtime logs
CacheOptimizer analyzes web access logs to understand how users are accessing the application. Such access logs are generated automatically by web servers such as Tomcat, so analyzing them adds no extra overhead to the application. CacheOptimizer then links the logs to data accesses using static code analysis. For example, consider the following log line:
200 127.0.0.1 /user/getDetails/peter
The log line contains information about a user request. Here, 200 is the HTTP status code, 127.0.0.1 is the IP address of the user, and /user/getDetails/peter is the requested URL. Through static code analysis, we can find the method in the code that handles this particular request. We can also uncover the type of data access performed in this request-handler method (e.g., reading user data from the database). By combining the information in the log with the information we obtained from the code, we know that this request reads detailed information about the user Peter from the database. In this simple example, since we are only reading user objects from the database, the optimal cache configuration would be to enable caching on the User class.
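The linking step can be sketched as follows, under loudly stated assumptions: the log line follows the simplified "<status> <IP> <path>" shape of the example above (real Tomcat access logs use a configurable pattern with more fields), and the `ROUTES` table, including the handler names `UserController.getDetails` and `UserController.update`, is a hypothetical stand-in for what static code analysis would recover from the application.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Request:
    status: int
    client_ip: str
    path: str


# Simplified format of the example line: "<status> <client IP> <request path>".
LOG_LINE = re.compile(r"^(?P<status>\d{3})\s+(?P<ip>\S+)\s+(?P<path>/\S*)$")


def parse_log_line(line: str) -> Optional[Request]:
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None  # not in the assumed format
    return Request(int(m.group("status")), m.group("ip"), m.group("path"))


# Hypothetical static-analysis output: URL pattern -> (handler method, (entity, access kind)).
ROUTES: List[Tuple[re.Pattern, str, Tuple[str, str]]] = [
    (re.compile(r"^/user/getDetails/(?P<name>[^/]+)$"),
     "UserController.getDetails", ("User", "read")),
    (re.compile(r"^/user/update/(?P<name>[^/]+)$"),
     "UserController.update", ("User", "write")),
]


def link_to_data_access(
    req: Request,
) -> Optional[Tuple[str, Tuple[str, str], Dict[str, str]]]:
    # Find the handler whose URL pattern matches, and return its data access.
    for pattern, handler, access in ROUTES:
        m = pattern.match(req.path)
        if m:
            return handler, access, m.groupdict()
    return None


req = parse_log_line("200 127.0.0.1 /user/getDetails/peter")
linked = link_to_data_access(req)
print(linked)  # ('UserController.getDetails', ('User', 'read'), {'name': 'peter'})
```

The regular-expression route table is only a convenient way to represent "which handler serves this URL"; in a real web framework that mapping would come from annotations or configuration found during static analysis.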
In short, given the logs, CacheOptimizer can recover how users actually use the application and suggest the optimal cache configuration for it.
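As a rough illustration of turning recovered usage into a configuration, assume each logged request has already been linked to an (object type, read/write) pair. The sketch below applies a simple read-ratio heuristic with a hypothetical `min_read_ratio` threshold; it is our simplification for illustration, not the analysis CacheOptimizer actually performs, which is described in the paper [2]. The access counts are made up for the example.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def suggest_cache_config(accesses: Iterable[Tuple[str, str]],
                         min_read_ratio: float = 0.9) -> Dict[str, bool]:
    """Enable caching for an object type only if reads make up at least
    min_read_ratio of its accesses (min_read_ratio is a hypothetical knob)."""
    reads: Counter = Counter()
    total: Counter = Counter()
    for entity, kind in accesses:
        total[entity] += 1
        if kind == "read":
            reads[entity] += 1
    return {entity: reads[entity] / total[entity] >= min_read_ratio
            for entity in total}


# Made-up accesses, as if recovered from a day of access logs.
recovered = ([("User", "read")] * 95 + [("User", "write")] * 5
             + [("Order", "read")] * 40 + [("Order", "write")] * 60)
config = suggest_cache_config(recovered)
print(config)  # {'User': True, 'Order': False}
```

A fixed threshold ignores how expensive each miss or renewal is, which is why a realistic tool would need a finer-grained model of cache behaviour than this.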
CacheOptimizer brings significant performance improvements
We apply CacheOptimizer to three open source applications and find that it can significantly improve their throughput, i.e., the applications can process more requests per second!
We compare CacheOptimizer with three other cache configurations: CacheAll, where we simply enable caching on all objects; DefaultCache, where we use the default cache configuration that already exists in the studied application; and NoCache, where we disable all caches. The figure below shows the cumulative throughput under each cache configuration for one of the studied applications. CacheOptimizer achieves much better throughput than all the other configurations.
Assisting development using production data
Finding the optimal cache configuration is just one way that we can assist software development by leveraging production data. Due to the agile nature of modern software development, many software design and development decisions are directly affected by how users actually use the application. The idea of DevOps brings a new set of research challenges and interesting problems. As software practitioners and researchers, can we think of new ways to assist software development by embracing DevOps?
You can find more details about CacheOptimizer in our research paper [2].
[1] Ehcache. http://www.ehcache.org/. Last accessed Feb. 6, 2017.
[2] Tse-Hsun Chen, Weiyi Shang, Ahmed E. Hassan, Mohamed Nasser, and Parminder Flora. 2016. CacheOptimizer: helping developers configure caching frameworks for hibernate-based database-centric web applications. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2016).