The Efficient and Effective Transmission, Storage, and Retrieval of Information on a Large Scale are among the Core Technical Problems in the Modern Digital Revolution
Anna Gilbert, University of Michigan
Even areas of science and technology that traditionally generated and analyzed small ``analog'' data sets, such as biology, now routinely generate and process much larger, discrete data sets with sophisticated algorithms. This massive volume of data demands mathematical and algorithmic methods for efficiently describing, summarizing, and synthesizing data and, increasingly critically, for deciding when and how to discard data before storing or transmitting it. The mathematical and algorithmic techniques used to describe data, to capture its inherent information, and to encode it for transmission and analysis are fundamentally different from those used in the analysis of small data sets. They include taking random snapshots of a data set and running algorithms whose answers are approximate and correct only with high probability. I will discuss several of these techniques, the implications they have for scientific analysis, and how they are changing not only how scientists collect data but also the devices with which they measure physical or biological phenomena.
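To make the idea of a randomized, approximate summary concrete, the following is a minimal sketch of one representative technique: a Count-Min sketch, which hashes a stream of items into a small table of counters and answers frequency queries that never undercount and, with high probability, overcount by only a small additive error. The choice of this particular technique, along with the class name, default parameters, and Python implementation, is an illustrative assumption, not an example drawn from the talk itself.

\begin{verbatim}
import random

class CountMinSketch:
    """Illustrative Count-Min sketch.

    Summarizes a stream of items in a small table of counters and
    answers frequency queries that never undercount and, with high
    probability, overcount by only a small additive error.  The
    default width/depth below are illustrative assumptions.
    """

    def __init__(self, width=272, depth=5, seed=0):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]
        rng = random.Random(seed)
        # One random seed per row yields `depth` different hash functions.
        self.row_seeds = [rng.randrange(2**31) for _ in range(depth)]

    def _col(self, row, item):
        # Hash the item into a column of the given row.
        return hash((self.row_seeds[row], item)) % self.width

    def update(self, item, count=1):
        # Fold one streamed observation into the summary.
        for row in range(self.depth):
            self.table[row][self._col(row, item)] += count

    def query(self, item):
        # Minimum over rows: collisions only inflate counters, so the
        # smallest counter is the best (over)estimate of the frequency.
        return min(self.table[row][self._col(row, item)]
                   for row in range(self.depth))


# Usage: summarize a stream too large to store item by item, then
# query approximate frequencies from the small table of counters.
cms = CountMinSketch()
stream = ["gene_a"] * 1000 + ["gene_b"] * 10 + ["gene_c"] * 3
random.shuffle(stream)
for item in stream:
    cms.update(item)
print(cms.query("gene_a"))  # ~1000; never less than the true count
\end{verbatim}

The design choice is characteristic of this class of methods: the space used and the time per update depend only on the desired accuracy and failure probability, not on the length of the stream, which is precisely the property that makes such summaries viable for massive data.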