Stratosphere is a joint project by TU Berlin, HU Berlin, and HPI Potsdam and researches "Information Management on the Cloud". In the course of the project, a massively parallel data processing system is built. The current version of the system consists of the parallel PACT programming model, a database inspired optimizer, and the parallel dataflow processing engine, Nephele. Stratosphere has been released as open source. This talk will focus on the PACT programming model, which is a generalization of Map/Reduce, and show how PACT eases the specification of complex data analysis tasks. At the end of the talk, an overview of Stratosphere's upcoming release will be given.
Fabian has been a research associate at the Database Systems and Information Management (DIMA) group at the Technische Universität Berlin since June 2008. He is working in the Stratosphere research project, focusing on parallel programming models, parallel data processing, and query optimization.
Fabian started his studies at the University of Cooperative Education, Stuttgart, in cooperation with IBM Germany in 2003. During that course, he visited the IBM Almaden Research Center in San Jose, USA, twice and finished in 2006. Fabian undertook his studies at Universität Ulm and earned a master's degree in 2008. His research interests include distributed information management, query processing, and query optimization.