We use a combination of Hadoop, Neo4j, and browser-based visualization and interactive tools to explore graphs, search large graphs for known interesting patterns, and run ad hoc queries against them.
The graph database enables ad hoc querying and visualization, which has proven very valuable when working with domain experts to identify interesting patterns and paths. Using Hadoop for the heavy lifting, we can run traversals against the graph without having to limit the number of features (attributes) on each node or edge used during traversal. Together, the two make for a very productive network-analysis workflow.
In this talk, I will demonstrate our workflow: using Hadoop to build a graph from raw data and bulk load the result into Neo4j for efficient ad hoc querying and visualization, optionally partitioning the graph in Hadoop first to produce partitions of a volume the database can handle. I will also look at basic graph traversal in Hadoop MapReduce, implemented with Cascading.
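To give a flavour of what "graph traversal in MapReduce" means, here is a minimal pure-Python sketch of one iteration of frontier-style breadth-first expansion, expressed as map and reduce phases. This is an illustrative stand-in for the actual Cascading flow, not the talk's code; the record layout (node, adjacency list, hop distance) and all function names are my own assumptions.

```python
from itertools import groupby
from operator import itemgetter

# One BFS iteration as map/reduce steps. The graph is a list of
# (node, neighbors, distance) records, where distance is the hop
# count from the seed node, or None if the node is not yet reached.

def map_phase(node, neighbors, distance):
    """Re-emit the node's own record, plus candidate distances for neighbors."""
    yield node, (neighbors, distance)
    if distance is not None:
        for n in neighbors:
            yield n, ([], distance + 1)  # candidate distance, no adjacency

def reduce_phase(node, values):
    """Merge per-node records: keep the adjacency list and the smallest distance."""
    neighbors = []
    for adj, _ in values:
        if adj:
            neighbors = adj
    distances = [d for _, d in values if d is not None]
    return node, neighbors, (min(distances) if distances else None)

def bfs_iteration(records):
    emitted = [kv for rec in records for kv in map_phase(*rec)]
    emitted.sort(key=itemgetter(0))  # stands in for the shuffle/sort step
    return [reduce_phase(node, [v for _, v in group])
            for node, group in groupby(emitted, key=itemgetter(0))]

# Tiny example: seed node 'a' at distance 0; each call to
# bfs_iteration advances the reachable frontier by one hop.
graph = [('a', ['b', 'c'], 0), ('b', ['d'], None),
         ('c', [], None), ('d', [], None)]
after_one = bfs_iteration(graph)      # 'b' and 'c' reached at distance 1
after_two = bfs_iteration(after_one)  # 'd' reached at distance 2
```

On a real cluster each iteration is one MapReduce job, so a traversal of depth k costs k jobs over the full edge set; that per-iteration cost is exactly why pre-partitioning the graph into manageable pieces pays off.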
Note to commission: I marked this as an intermediate talk because I do not want to dive too deeply into graph algorithms or theory, but intend to focus on the engineering side of things. We will look at the tools, libraries, and frameworks we used to quickly build a workflow for network analysis.
To give an idea, here are two posts I have already published on graph partitioning: waredingen.nl/graph-partitioning-in-mapreduce-with-cascadin and waredingen.nl/graph-partitioning-part-2-connected-graphs
More Info and Slides: berlinbuzzwords.de/sessions/serious-network-analysis-using-hadoop-and-neo4j