Hadoop is an Apache project that provides a framework for running applications that process vast amounts of data (hundreds of terabytes) on large clusters (thousands of computers) of commodity hardware. The framework transparently provides applications with both reliability and data motion. Hadoop implements a distributed file system and MapReduce. This presentation covers the motivation and approach for Hadoop, gives an overview of its components and architecture, and surveys some of the tools built on top of Hadoop, such as HBase, Pig, and Hive.
Mahalo to ThinkTech Hawaii (http://www.thinktechhawaii.com) and Panopto (http://www.panopto.com) for their video recording services.