Authors: Xi Ye, Shouxing Xiang, Jiazhi Xia, Jing Wu, Yang Chen, Shixia Lu
Abstract: In this paper, we develop a visual analysis method for interactively improving the quality of labeled data, which is essential to the success of supervised and semi-supervised learning. This improvement is achieved through user-selected trusted items. We employ a bi-level optimization model to accurately match the labels of the trusted items and to minimize the training loss. Based on this model, a scalable data correction algorithm is developed to handle tens of thousands of labeled data items efficiently. The selection of the trusted items is facilitated by an incremental t-SNE with improved computational efficiency and layout stability, which ensures a smooth transition between different levels. To prioritize the display of erroneous data, we give specific consideration in the sampling process to outliers, that is, items whose labels differ from those of their neighboring items, and use a density map to reflect outlier ratios. We evaluated our method on real-world datasets through quantitative evaluation and case studies, and the results were generally favorable.
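
The outlier notion used in the abstract (an item whose label differs from those of its neighboring items) can be illustrated with a minimal sketch. This is not the authors' implementation: the use of k nearest neighbors in Euclidean space, the majority-vote rule, and the names `outlier_mask`, `outlier_ratio`, and `k` are all assumptions made for illustration only.

```python
import numpy as np

def outlier_mask(points, labels, k=3):
    """Flag items whose label differs from the majority label of their
    k nearest neighbors (an assumed, simplified outlier criterion)."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    # Pairwise squared Euclidean distances; fine for small illustrative inputs.
    diff = points[:, None, :] - points[None, :, :]
    dist = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(dist, np.inf)  # an item is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]
    mask = np.empty(len(points), dtype=bool)
    for i, idx in enumerate(neighbors):
        values, counts = np.unique(labels[idx], return_counts=True)
        majority = values[np.argmax(counts)]
        mask[i] = labels[i] != majority
    return mask

def outlier_ratio(points, labels, k=3):
    """Fraction of items flagged as outliers, e.g. within one region of a layout."""
    return float(outlier_mask(points, labels, k).mean())

# Two well-separated clusters; one item in the first cluster disagrees
# with its neighbors.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
labels = [0, 0, 0, 1, 1, 1, 1, 1]
mask = outlier_mask(points, labels, k=3)
ratio = outlier_ratio(points, labels, k=3)
```

In this toy input only the item at (1, 1) is flagged, because all three of its nearest neighbors carry a different label. A ratio computed per region in this way is the kind of quantity a density map could encode, although how the method actually computes and renders it is not specified in the abstract.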