In this talk we will introduce the typical predictive modeling tasks on "not-so-big-data-but-not- quite-small-either" that benefit from distributed the work on several cores or nodes in a small cluster (e.g. 20 * 8 cores).

We will talk about cross validation, grid search, ensemble learning, model averaging, numpy memory mapping, Hadoop or Disco MapReduce, MPI AllReduce and disk & memory locality.

We will also feature some quick demos using scikit-learn and IPython.parallel from the notebook on an spot-instance EC2 cluster managed by StarCluster.

Loading more stuff…

Hmm…it looks like things are taking a while to load. Try again?

Loading videos…