Sensors, Networks, and Massive Data
Michael W. Mahoney, Department of Mathematics, Stanford University
Massive quantities of data are routinely generated in many scientific and non-scientific domains, and developing tools to deal with these data leads to fundamental algorithmic and statistical challenges. At root, these data are generated in such quantities because technological developments permit us to measure, monitor, or sense the world very inexpensively and at unprecedented levels of granularity. Relatedly, a common theme in many of these applications is that, since the data are generated in relatively uncontrolled ways, "noise" is often a dominant property of the data, with interesting "signal" being a second-order effect. A good "hydrogen atom" for addressing these issues, and for developing algorithmic and statistical methods for massive data more generally, can be found in large social and information networks, since nearly every "niceness" assumption, e.g., about how the data are generated or structured, is severely violated there. Using this as a case study, I will describe some of the challenges that scientists will face as they deal more and more with increasingly massive data.