Authors: Jose Gustavo S. Paiva, William Robson Schwartz, Helio Pedrini, Rosane Minghim
Abstract: Automatic data classification is a computationally intensive task that presents variable precision and is considerably sensitive to the classifier configuration and to data representation, particularly for evolving data sets. Some of these issues can best be handled by methods that support users’ control over the classification steps. In this paper, we propose a visual data classification methodology that supports users in tasks related to categorization such as training set selection; model creation, application and verification; and classifier tuning. The approach is then well suited for incremental classification, present in many applications with evolving data sets. Data set visualization is accomplished by means of point placement strategies, and we exemplify the method through multidimensional projections and Neighbor Joining trees. The same methodology can be employed by a user who wishes to create his or her own ground truth (or perspective) from a previously unlabeled data set. We validate the methodology through its application to categorization scenarios of image and text data sets, involving the creation, application, verification, and adjustment of classification models.