Authors: Yuanzhe Chen, Panpan Xu, Liu Ren
Abstract: Event sequences analysis plays an important role in many application domains such as customer behavior analysis, electronic health record analysis and vehicle fault diagnosis. Real-world event sequence data is often noisy and complex with high event cardinality, making it a challenging task to construct concise yet comprehensive overviews for such data. In this paper, we propose a novel visualization technique based on the minimum description length (MDL) principle to construct a coarse-level overview of event sequence data while balancing the information loss in it. The method addresses a fundamental trade-off in visualization design: reducing visual clutter vs. increasing the information content in a visualization. The method enables simultaneous sequence clustering and pattern extraction and is highly tolerant to noises such as missing or additional events in the data. Based on this approach we propose a visual analytics framework with multiple levels-of-detail to facilitate interactive data exploration. We demonstrate the usability and effectiveness of our approach through case studies with two real-world datasets. One dataset showcases a new application domain for event sequence visualization, i.e., fault development path analysis in vehicles for predictive maintenance. We also discuss the strengths and limitations of the proposed method based on user feedback.