Authors: Salman Mahmood, Klaus Mueller
Abstract: Organizing multivariate data spaces by their dimensions or attributes can be a rather difficult task. Most of the work in this area focuses on the statistical aspects such as correlation clustering, dimension reduction, and the like. These methods typically produce hierarchies in which the leaf nodes are labeled by the attribute names while the inner nodes are often represented by just a statistical measure and criterion, such as a threshold. This makes them difficult to understand for mainstream users. Taxonomies in science, biology, engineering, etc. on the other hand, are easy to comprehend since they provide meaningful labels at the inner nodes as well. Labeling inner nodes of taxonomies automatically requires the identification of hypernyms. Our proposed framework, called Taxonomizer, takes a visual analytics approach to meet this challenge. It appeals to the wisdom of humans to liaise with state of the art data analytics, neural word embeddings, and lexical databases. It consists of a set of visual tools that starts out with an automatically computed hierarchy where the leaf nodes are the original data attributes, and it then allows users to sculpt high-quality taxonomies for any multivariate dataset.