Detecting Data Corruption Hang Bugs in Cloud Server Systems

02/2017-05/2018 NC State

Cloud server systems such as Hadoop and Cassandra have enabled many real-world data-intensive applications to run inside computing clouds. However, those systems suffer from many data-corruption and performance problems that are notoriously difficult to debug because little diagnostic information is available.
We present DScope, a tool that statically detects data-corruption-related software hang bugs in cloud server systems. DScope statically analyzes I/O operations and loops in a software package and identifies loops whose exit conditions can be affected by I/O operations through returned data, returned error codes, or I/O exception handling. After identifying the loops that are prone to hanging under data corruption, DScope conducts loop bound and loop stride analysis to prune false positives. We have implemented DScope and evaluated it on 9 common cloud server systems. Our results show that DScope can detect 42 real software hang bugs, including 29 new bugs. In contrast, existing bug detection tools miss most of those bugs.
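To make the bug class concrete, here is a minimal sketch of a loop whose stride comes from I/O data. It is not DScope's code, and it is written in Python for brevity even though the evaluated systems are mostly Java. The record layout (a 2-byte big-endian length that counts the header itself), the function names, and the `max_iters` guard are all assumptions made for illustration.

```python
import struct

HEADER = 2  # assumed layout: 2-byte big-endian length that includes the header


def count_records_unchecked(buf: bytes, max_iters: int = 1000) -> int:
    """Walk length-prefixed records, trusting the length field as the stride.

    max_iters exists only so this demonstration terminates; the loop itself
    has no such bound and would spin forever on a zero length.
    """
    offset = 0
    count = 0
    iters = 0
    while offset < len(buf):
        if iters >= max_iters:
            raise RuntimeError("loop made no progress (would hang)")
        iters += 1
        (length,) = struct.unpack_from(">H", buf, offset)
        offset += length  # stride read from the data; 0 means no progress
        count += 1
    return count


def count_records_checked(buf: bytes) -> int:
    """Same walk, but reject any length that cannot advance the offset."""
    offset = 0
    count = 0
    while offset < len(buf):
        if len(buf) - offset < HEADER:
            raise ValueError("truncated record header")
        (length,) = struct.unpack_from(">H", buf, offset)
        if length < HEADER or offset + length > len(buf):
            raise ValueError(f"corrupt record length {length} at offset {offset}")
        offset += length
        count += 1
    return count


good = b"\x00\x04ab" + b"\x00\x03c"      # two well-formed records
corrupt = b"\x00\x00ab" + b"\x00\x03c"   # first length field zeroed out
```

On `good`, both functions count two records. On `corrupt`, the unchecked loop never advances and only stops because of the demonstration guard, while the checked version fails fast with a `ValueError`. A stride that depends on untrusted input is the kind of property a loop stride analysis has to reason about.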

Cassandra Compress Hadoop Hive

Detecting Timeout Bugs in Cloud Server Systems

02/2017-04/2018 NC State

Timeouts are commonly used to handle unexpected failures in server systems. However, improper use of timeouts can cause server systems to hang or suffer performance degradation. We conduct a comprehensive study to characterize 156 real-world timeout problems in 11 commonly used cloud server systems. Our study reveals that timeout problems are widespread among cloud server systems.
We further present TScope, which identifies timeout bugs by leveraging kernel-level system call tracing together with machine-learning-based anomaly detection and feature extraction. We conducted experiments using 19 real-world server performance bugs, including 12 timeout and 7 non-timeout performance bugs. The results show that TScope correctly classifies 18 out of 19 bugs with a false positive rate of 0.8%.
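The idea of turning a system call trace into features and flagging unusual ones can be sketched as follows. This is not TScope's pipeline: a plain per-feature z-score is swapped in for the machine learning model, and the function names, the window representation (a list of system call names), and the threshold are assumptions made for illustration.

```python
from collections import Counter
from statistics import mean, pstdev


def syscall_vector(window, vocabulary):
    """Count how often each system call in the vocabulary appears in a window."""
    counts = Counter(window)
    return [counts[name] for name in vocabulary]


def anomalous_syscalls(baseline_windows, window, vocabulary, threshold=3.0):
    """Return the system calls whose frequency deviates from the baseline.

    A z-score stands in for the learned model here; it is an illustrative
    substitute, not the detector used in the project.
    """
    base = [syscall_vector(w, vocabulary) for w in baseline_windows]
    current = syscall_vector(window, vocabulary)
    flagged = []
    for i, name in enumerate(vocabulary):
        column = [vec[i] for vec in base]
        mu = mean(column)
        sigma = pstdev(column)
        deviation = abs(current[i] - mu)
        if sigma > 0:
            score = deviation / sigma
        else:
            # No variation in the baseline: any change at all is suspicious.
            score = float("inf") if deviation > 0 else 0.0
        if score > threshold:
            flagged.append(name)
    return flagged


vocabulary = ["read", "write", "futex", "epoll_wait"]
baseline = [
    ["read", "write", "futex"],
    ["read", "write", "futex", "futex"],
    ["read", "write", "futex"],
]
# Hypothetical window from a stalled process that keeps waiting on a lock.
stalled = ["read", "write"] + ["futex"] * 50
print(anomalous_syscalls(baseline, stalled, vocabulary))
```

In this toy input only `futex` is flagged, because its count jumps far above the baseline while the other calls stay at their usual frequency. Deciding whether such a spike points to a timeout bug is the classification step that a sketch this small does not attempt.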

Apache Cassandra Flume Hadoop HBase MySQL Phoenix Qpid Tomcat ZooKeeper
LTTng SOM Java

Hytrace: A Hybrid Approach to Performance Bug Diagnosis in Production Cloud Infrastructures

08/2015-05/2018 NC State

Server applications running inside production cloud infrastructures are prone to various performance problems (e.g., software hangs and performance slowdowns). When those problems occur, developers often have few clues for diagnosing them. In this project, we present Hytrace, a novel hybrid approach to diagnosing performance problems in production cloud infrastructures.
Hytrace combines rule-based static analysis and runtime inference techniques to achieve higher bug localization accuracy than pure-static and pure-dynamic approaches for performance bugs. Hytrace does not require source code and can be applied to both compiled and interpreted programs such as C/C++ and Java. We conduct experiments using real performance bugs from seven commonly used server applications in production cloud infrastructures. The results show that our approach can significantly improve the performance bug diagnosis accuracy compared to existing diagnosis techniques.
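One way to picture a hybrid of static and runtime evidence is shown below. The combination rule used here (keep only functions flagged by both sides, then rank by runtime score) is an assumption chosen for illustration, not necessarily how Hytrace merges its results, and every function name and score is hypothetical.

```python
def hybrid_rank(static_candidates, runtime_scores):
    """Rank functions that both analyses consider suspicious.

    static_candidates: set of function names flagged by static rules.
    runtime_scores: dict mapping function name to a runtime anomaly score
                    (higher means more suspicious).
    Functions flagged by only one side are dropped, which is where a hybrid
    can cut false positives relative to either analysis alone.
    """
    both = [name for name in static_candidates if name in runtime_scores]
    # Highest runtime score first; break ties by name for a stable order.
    return sorted(both, key=lambda name: (-runtime_scores[name], name))


# Hypothetical inputs.
static_candidates = {"readBlock", "retryConnect", "parseHeader"}
runtime_scores = {"retryConnect": 0.9, "readBlock": 0.4, "logMessage": 0.7}

print(hybrid_rank(static_candidates, runtime_scores))
```

Here `parseHeader` is dropped because nothing unusual was observed at run time, and `logMessage` is dropped because no static rule matched it, leaving `retryConnect` ahead of `readBlock`.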

Apache Cassandra Lighttpd Hadoop MySQL Squid Tomcat
LTTng Clustering LLVM Findbugs Java Shell Script

CCM: Cloud Configuration Management System for Elastic Application Deployment in Private Clouds

01/2015-08/2015 Credit Suisse NC State

Cloud infrastructures and distributed applications are becoming increasingly complex, which creates a need for an easy-to-use cloud application deployment tool. Such a tool must be elastic enough for dynamic environments: it should support geographically distributed hosts, handle cloud system anomalies gracefully, and balance workloads dynamically.
The CCM project aims to provide such a tool. It supports automatic component composition and instantiation. More importantly, it includes an elastic auto-scaling mechanism to handle overload conditions, resource contention, and system anomalies.

OpenStack Docker ZooKeeper cAdvisor
Java Python Shell Script
