Disco is a Python-based MapReduce framework that provides a refreshing alternative to the Hadoop hegemony. In this presentation, Chris will introduce Disco and the Disco Distributed File System and demonstrate how to deploy a basic Disco installation on Amazon EC2 using StarCluster. Using examples inspired by real projects, he will show how to use Disco to work with large collections of binary data and will also discuss the strengths and weaknesses of using MapReduce for large data problems.
IPython for Teaching and Collaboration: a discussion of the strengths and weaknesses of IPython for teaching statistical machine learning, both as a medium for lecture notes and as a medium for student collaboration. This talk will be based on the speaker's experiences as the instructor for General Assembly's course on data science.