Authors: Ke Xu, Yun Wang, Leni Yang, Yifang Wang, Bo Qiao, Si Qin, Yong Xu, Haidong Zhang, Huamin Qu
Abstract: Cloud computing is now pervasively used by businesses and individuals for big data analysis and services. Detecting and analyzing potential performance anomalies in cloud computing systems is essential for avoiding losses to customers and for ensuring the efficient operation of the system. To this end, recent research has developed a variety of automated techniques for identifying anomalies in cloud computing performance. These techniques usually track the performance metrics of the system (e.g., CPU, memory, and disk I/O), represented as a multivariate time series. However, the complex characteristics of these performance data, such as the huge data volume and time variation, limit the effectiveness of these automated methods. Substantial human judgment on the automated analysis results is therefore required for anomaly interpretation. In this paper, we present a unified visual analytics system named CloudDet to interactively detect, inspect, and diagnose anomalies in cloud computing. A novel unsupervised anomaly detection algorithm is developed to identify anomalies based on the specific temporal pattern of the given metrics data (e.g., the periodic pattern), and its results are visualized in our system to indicate the occurrences of anomalies. Moreover, rich visualization and interaction designs are used to help understand the anomalies in both their spatial and temporal context. We demonstrate the effectiveness of CloudDet through a quantitative evaluation, two case studies with real-world data, and interviews with domain experts.