Authors: Anamaria Crisan, Tamara Munzner
Abstract: Domain experts are inundated with new and heterogeneous types of data and require better and more specific types of data visualization systems to help them. In this paper, we consider the data landscape that domain experts seek to understand, namely the set of datasets that are either currently available or could be obtained. Experts need to understand this landscape to triage which data analysis projects might be viable, out of the many possible research questions that they could pursue. We identify data reconnaissance and task wrangling as processes that experts undertake to discover and identify sources of data that could be valuable for some specific analysis goal. These processes have thus far not been formally named or defined by the research community. We provide formal definitions of data reconnaissance and task wrangling and describe how they relate to the data landscape that domain experts must uncover. We propose a conceptual framework with a four-phase cycle of acquire, view, assess, and pursue that occurs within three distinct chronological stages, which we call fog and friction, informed data ideation, and demarcation of final data. Collectively, these four phases embedded within three temporal stages delineate an expert's progressively evolving understanding of the data landscape. We describe and provide concrete examples of these processes within the visualization community through an initial systematic analysis of previous design studies, identifying situations where there is evidence that they were at play. We also comment on the response of domain experts to this framework, and suggest design implications stemming from these processes to motivate future research directions. As technological changes will only keep adding unknown terrain to the data landscape, data reconnaissance and task wrangling are important processes that need to be more widely understood and supported by data visualization tools. By articulating a concrete understanding of this challenge and its implications, our work impacts the design and evaluation of data visualization systems.