Analyzing a causal or time-series relationship between two data sets (or functions) is important in fields ranging from yield optimization and signal processing to stock market analysis and functional genomics, among many other applications.
This talk describes an algorithm developed by Karmasphere Labs for performing the entire family of cross-correlation algorithms on arbitrarily large data sets. The algorithm supports wide or even unbounded windowing functions. When we reduce the algorithm to one of the degenerate cases, such as autocorrelation or the Fourier transform, its time bound improves.
The example is interesting because it is not just a parallelization of a classical algorithm; it is structurally quite different. We designed the algorithm according to our four tenets of map-reduce optimization, which we present as a secondary theme.
Fear not: the talk is light on the mathematics, and we have the most fun with the data flow design.