Hadoop provides a powerful tool for batch-analyzing large amounts of multi-structured data. It can tap into raw data at rest from a variety of sources and parallelize computations across hundreds of nodes.
Yet Hadoop's ecosystem leaves something to be desired for Haskell programmers. Complex data formats such as custom-format logs, proprietary data stores, and deeply nested JSON documents must be decoded into something the Hive/Pig ecosystem will understand, introducing many redundancies along the way. The alternative is to drop down to Hadoop Streaming and use the available Java/Python/Ruby/Scala toolchain, abandoning the advantages of Haskell.
In this talk, we will introduce Hadron: a Haskell library/toolkit that makes it possible to construct Map-Reduce programs in Haskell and run them on Hadoop as smoothly as possible. We will discuss what motivated Hadron, how its design materialized within a very limited time frame, how a "free monad" was used to kill two birds with one stone, and some of the API pain points that remain as future work.
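To make the setting concrete, here is a minimal sketch of the kind of Map-Reduce program involved, written in plain Haskell against the Hadoop Streaming convention (tab-separated key/value lines on stdin/stdout). This is an illustrative word-count example, not Hadron's actual API; the function names `mapper` and `reducer` are our own.

```haskell
-- Word count in the Hadoop Streaming style: the mapper emits (word, 1)
-- pairs, and the reducer sums counts per key. In a real Streaming job
-- Hadoop sorts the mapper output by key before the reducer sees it;
-- here we sort locally so the example is self-contained.
import Data.Function (on)
import Data.List (groupBy, sortOn)

-- Emit a (word, 1) pair for every word in the input.
mapper :: String -> [(String, Int)]
mapper = map (\w -> (w, 1)) . words

-- Group pairs by key and sum the counts for each key.
reducer :: [(String, Int)] -> [(String, Int)]
reducer =
    map (\grp -> (fst (head grp), sum (map snd grp)))
  . groupBy ((==) `on` fst)
  . sortOn fst

-- Wire the pipeline to stdin/stdout as a Streaming job would.
main :: IO ()
main = interact $
  unlines . map (\(k, v) -> k ++ "\t" ++ show v) . reducer . mapper
```

A library like Hadron aims to abstract away exactly this kind of boilerplate: the hand-rolled serialization to tab-separated lines and the manual grouping logic in the reducer.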
Hadron was originally developed at Soostone for use in client projects and will be formally open sourced following this month's NY Haskell Meetup.