Big Data in the Cloud: A Case for Cloud Storage Diversity
Hakim Weatherspoon, Cornell University
Cloud computing and storage (big data) are taking the world by storm: yesterday, we stored photos, email, and other materials on our personal computers and accessed remote content through browsers; today, the balance has shifted. The content is on the web ("in the cloud"), and increasingly, applications run there as well. On the positive side, this new architecture is igniting a wave of innovation and development. It promises to catalyze the technology economy and to revolutionize health care (by providing a convenient and reliable means for processing and storing medical records), financial systems (by creating powerful new opportunities for data analysis), scientific research (through access to farms of computers and shared data), and of course society (Facebook, Twitter, etc.). Yet many of these uses demand properties that today's cloud platforms either struggle to provide efficiently or lack altogether: provider independence and mobility (i.e., freedom from vendor lock-in), robustness and availability despite failure or attack, and security of data and integrity of computation. Software developers and users no longer have physical control over the security and integrity of their computation or data. In this talk, I will discuss cloud computing and big data trends and make a case for applying RAID (redundant array of inexpensive disks)-like techniques, used today by disks and file systems, at the cloud storage level. I will demonstrate that striping user data across multiple providers can return control of data to the user, allow customers to avoid vendor lock-in, reduce the cost of switching providers, and better tolerate provider outages or failures.
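To make the striping idea concrete, here is a minimal sketch in Python of single-parity (RAID-5-like) striping. It is an illustration under stated assumptions, not the scheme presented in the talk or the RACS implementation: the function names `stripe` and `recover` are hypothetical, each share is assumed to be stored with a different provider, and only one lost share can be rebuilt.

```python
from functools import reduce
from typing import List, Optional, Tuple


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def stripe(data: bytes, n_providers: int) -> Tuple[List[bytes], int]:
    """Split data into n_providers - 1 data shares plus one XOR parity share.

    Returns the shares (parity last) and the original length, which is
    needed to strip the zero padding on recovery.
    """
    if n_providers < 2:
        raise ValueError("need at least two providers")
    k = n_providers - 1                      # number of data shares
    share_len = -(-len(data) // k)           # ceiling division
    padded = data.ljust(share_len * k, b"\0")
    shares = [padded[i * share_len:(i + 1) * share_len] for i in range(k)]
    parity = reduce(xor_bytes, shares)
    return shares + [parity], len(data)


def recover(shares: List[Optional[bytes]], original_len: int) -> bytes:
    """Reassemble the data; a share that is None marks an unavailable provider."""
    missing = [i for i, s in enumerate(shares) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one missing share")
    shares = list(shares)
    if missing:
        # XOR of all surviving shares reproduces the missing one.
        present = [s for s in shares if s is not None]
        shares[missing[0]] = reduce(xor_bytes, present)
    return b"".join(shares[:-1])[:original_len]


# Hypothetical usage: four providers, one of which becomes unavailable.
shares, length = stripe(b"user data to protect", 4)
shares[2] = None                             # simulate a provider outage
restored = recover(shares, length)
```

In this sketch no single provider holds the whole object, and replacing a provider only requires rebuilding and re-uploading the one share it held, which is the intuition behind the lower switching cost claimed above.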
Background Review Article:
RACS: A Case for Cloud Storage Diversity. Hussam Abu-Libdeh, Lonnie Princehouse, and Hakim Weatherspoon. In Proceedings of the First ACM Symposium on Cloud Computing (SoCC), June 2010.