To understand how the immune system works, one needs a clear picture of its cellular composition and of the cells’ corresponding properties and functionality. Mass cytometry is a novel technique to determine the properties of single cells in unprecedented detail. This amount of detail allows for much finer differentiation but comes at the cost of more complex analysis. In this work, we present Cytosplore, which implements an interactive workflow to analyze mass cytometry data in an integrated system, providing multiple linked views that show different levels of detail and enable the rapid definition of known and unknown cell types. Cytosplore handles millions of cells, each represented as a high-dimensional data point, facilitates hypothesis generation and confirmation, and provides a significant speedup over the current workflow. We show the effectiveness of Cytosplore in a case study evaluation.
Mass cytometry allows high-resolution dissection of the cellular composition of the immune system. However, the high dimensionality, large size, and non-linear structure of the data pose considerable challenges for data analysis. Here, we introduce Hierarchical Stochastic Neighbor Embedding (HSNE), a computational approach that constructs a hierarchy of non-linear similarities. We integrated HSNE into the Cytosplore+HSNE framework to facilitate interactive exploration and analysis of the hierarchy through a set of corresponding two-dimensional plots with a stepwise increase in detail up to the single-cell level. We validated its discovery potential by re-analyzing a study on gastrointestinal disorders and two other publicly available mass cytometry datasets. We found that Cytosplore+HSNE efficiently identifies rare cell populations that were missed in a previous analysis, without the need for downsampling and in a very short time span. Thus, Cytosplore+HSNE offers single-cell resolution while exploring mass cytometry datasets of tens of millions of cells on a standard computer.