In the previous episode (vimeo.com/73849021), we saw how to transfer file data into Hadoop. To make that data easy to query, the next step is to create some Hive tables. This enables quick interaction through high-level languages like SQL and Pig.
We first experiment with the SQL queries, then parameterize them and insert them into a workflow so they can run together in parallel. Including Hive queries in an Oozie workflow is a common use case with recurrent pitfalls, as seen on the user group. With Hue, we can do it in a few clicks.
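As a sketch of the table-creation step, an external Hive table over tab-delimited data already sitting in HDFS might look like the following (the path, table name, and columns here are hypothetical, not from the episode):

```sql
-- Hypothetical example: an external Hive table over tab-delimited
-- log data previously uploaded to HDFS (path and columns made up).
CREATE EXTERNAL TABLE logs (
  ts      STRING,
  level   STRING,
  message STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION '/user/demo/logs';

-- Once the table exists, ad-hoc SQL works immediately:
SELECT level, COUNT(*)
FROM logs
GROUP BY level;
```

Because the table is EXTERNAL, dropping it removes only the metadata, not the underlying files in HDFS, which is usually what you want when the data was loaded by a separate ingestion step.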
In this talk I'll show how a number of tools from the pandas library can be used to quickly wrangle raw data into shape for analysis. The main focus will be techniques for manipulating, cleaning, preparing, and reshaping structured and semi-structured data, along with other common tasks.
Sandro Hawke, June 8, 2010 - MIT Cambridge, MA
World Wide Web Consortium w3.org
Although the first Semantic Web standards are more than ten years old, only recently have we actually begun to see machines sharing data on the Web. The key turning point was the acceptance of the core Linked Data principle: that object identifiers should also work with Web protocols for accessing useful information. This talk will cover the basic concepts and techniques of publishing and using Linked Data, assuming some familiarity with programming and the Web. No prior knowledge of Semantic Web technologies is required.
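The core principle above can be sketched in a few lines: a Linked Data identifier is an ordinary URI, so a client can dereference it over HTTP and use content negotiation to ask for machine-readable RDF. The sketch below builds such a request with Python's standard library but does not send it; the URI is DBpedia's identifier for Tim Berners-Lee, used purely for illustration:

```python
# Minimal sketch of the Linked Data principle: the same URI that
# names a thing is also a Web address from which data about it can
# be retrieved. The Accept header asks the server for RDF in Turtle
# syntax via content negotiation. (Request is built, not sent.)
from urllib.request import Request

uri = "http://dbpedia.org/resource/Tim_Berners-Lee"
req = Request(uri, headers={"Accept": "text/turtle"})

print(req.full_url)              # the identifier being dereferenced
print(req.get_header("Accept"))  # content type requested: RDF (Turtle)
```

Sending this request with `urllib.request.urlopen(req)` would return a Turtle document of triples about the resource, which is exactly the "useful information" the Linked Data principle promises.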