In this talk (which I gave at Rice University), I explain my VLDB paper 'Online Aggregation for Large MapReduce Jobs'.
Link to my slides:
Abstract of the paper:
In online aggregation, a database system processes a user's aggregation query in an online fashion. At all times during processing, the system gives the user an estimate of the final query result, with confidence bounds that become tighter over time. In this paper, we consider how online aggregation can be built into a MapReduce system for large-scale data processing. Given the MapReduce paradigm's close relationship with cloud computing (in that one might expect a large fraction of MapReduce jobs to be run in the cloud), online aggregation is a very attractive technology. Since large-scale cloud computations are typically pay-as-you-go, a user can monitor the accuracy obtained in an online fashion, and then save money by killing the computation early once sufficient accuracy has been obtained.
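To make the idea concrete, here is a minimal single-machine sketch of online aggregation for an AVG query. It is not the estimator from the paper and ignores everything specific to running inside a MapReduce job. It assumes records are processed in uniformly random order and uses a textbook CLT-based confidence interval with a finite-population correction. The function `online_average` and its parameters (`confidence_z`, `rel_tolerance`, `min_samples`, `seed`) are hypothetical names chosen for illustration.

```python
import math
import random


def online_average(values, confidence_z=1.96, rel_tolerance=0.01,
                   min_samples=30, seed=None):
    """Estimate AVG(values) online; return (estimate, half_width, rows_seen).

    Hypothetical illustration, not the paper's estimator. Stops early once the
    confidence-interval half-width is within rel_tolerance of the estimate.
    """
    rows = list(values)
    total = len(rows)
    if total == 0:
        raise ValueError("online_average needs at least one value")
    # Assumption: records are processed in random order; shuffle a copy.
    random.Random(seed).shuffle(rows)

    n, mean, m2 = 0, 0.0, 0.0  # rows seen, running mean, sum of squared deviations
    half_width = math.inf
    for x in rows:
        n += 1
        # Welford's update: numerically stable running mean and variance.
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n < 2:
            continue  # need two rows before a variance estimate exists
        sample_var = m2 / (n - 1)
        # CLT interval with finite-population correction (1 - n/total):
        # the bounds tighten as n grows and reach 0 once every row is seen.
        half_width = confidence_z * math.sqrt(sample_var / n * (1 - n / total))
        # "Kill the job early" once the answer is accurate enough.
        if n >= min_samples and half_width <= rel_tolerance * abs(mean):
            break
    if n == total:
        half_width = 0.0  # all rows processed: the answer is exact
    return mean, half_width, n


if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.gauss(100, 15) for _ in range(100_000)]
    est, hw, seen = online_average(data, seed=42)
    print(f"AVG ~ {est:.2f} +/- {hw:.2f} after {seen} of {len(data)} rows")
    print(f"exact AVG = {sum(data) / len(data):.2f}")
```

The `rel_tolerance` check stands in for the user deciding the estimate is accurate enough and killing the job. With `rel_tolerance=0.0` the loop runs to completion and the interval collapses to zero, because every record has been seen.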
This video is for educational purposes only.