Disco is a Python-based MapReduce framework that offers a refreshing alternative to the Hadoop hegemony. In this presentation, Chris will introduce Disco and the Disco Distributed File System (DDFS) and demonstrate how to deploy a basic Disco installation on Amazon EC2 using StarCluster. Using examples inspired by real projects, he will show how to use Disco to work with large collections of binary data, and he will also discuss the strengths and weaknesses of using MapReduce for large data problems.
This talk was presented at PyData NYC 2012: nyc2012.pydata.org/. If you are interested in this topic, be sure to check out PyData Silicon Valley in March of 2013: sv2013.pydata.org/