• Using HBase Co-Processors to Build a Distributed, Transactional RDBMS


    from Chicago Hadoop User Group / Added

    100 Plays / 0 Comments

    John Leach, Co-Founder and CTO of Splice Machine, with 15+ years of software development and machine learning experience, will discuss how to use HBase co-processors to build an ANSI SQL-99 database with 1) parallelization of SQL execution plans, 2) ACID transactions with snapshot isolation, and 3) consistent secondary indexing.

    Transactions are critical in traditional RDBMSs because they ensure reliable updates across multiple rows and tables. Most operational applications require transactions, and even analytics systems use transactions to reliably update secondary indexes after a record insert or update. In the Hadoop ecosystem, HBase is a key-value store with real-time updates, but it lacks multi-row, multi-table transactions, secondary indexes, and a robust query language like SQL. Combining SQL with a full transactional model over HBase opens a whole new set of OLTP and OLAP use cases for Hadoop that were traditionally reserved for RDBMSs like MySQL or Oracle. A transactional HBase system, however, has the advantage of scaling out on commodity servers, leading to a 5x-10x cost savings over traditional databases like MySQL or Oracle.

    HBase co-processors, introduced in release 0.92, provide a flexible and high-performance framework for extending HBase. In this talk, we show how we used HBase co-processors to support a full ANSI SQL RDBMS without modifying the core HBase source. We will discuss how endpoint co-processors are used to serialize SQL execution plans over to regions so that computation is local to where the data is stored. Additionally, we will show how observer co-processors simultaneously support both transactions and secondary indexing. The talk will also discuss how Splice Machine extended the work of Google Percolator, Yahoo Labs' OMID, and the University of Waterloo on distributed snapshot isolation for transactions.

    Lastly, performance benchmarks will be provided, including full TPC-C and TPC-H results that show how Hadoop/HBase can be a replacement for traditional RDBMS solutions. To view the accompanying slide deck: http://www.slideshare.net/ChicagoHUG/splice-machine-chicagohug
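
The snapshot isolation scheme the talk builds on (in the lineage of Google Percolator and Yahoo Labs' OMID) can be illustrated with a toy multi-version store: every committed write is tagged with a timestamp, a reader sees only versions committed before its transaction began, and the first committer wins on a write-write conflict. This is a minimal sketch in plain Python with invented names, not Splice Machine's actual implementation:

```python
import itertools

class SIStore:
    """Toy multi-version store with snapshot isolation (illustrative only)."""

    def __init__(self):
        self._clock = itertools.count(1)   # global logical timestamp source
        self._versions = {}                # key -> [(commit_ts, value), ...]

    def begin(self):
        # A transaction is just a begin timestamp plus a private write set.
        return {"ts": next(self._clock), "writes": {}}

    def read(self, txn, key):
        # Snapshot read: newest version committed before this txn began.
        visible = [v for ts, v in self._versions.get(key, []) if ts < txn["ts"]]
        return visible[-1] if visible else None

    def write(self, txn, key, value):
        # Writes are buffered privately; nothing is visible until commit.
        txn["writes"][key] = value

    def commit(self, txn):
        # First-committer-wins: abort on a write-write conflict, i.e. a key
        # we wrote gained a committed version after our begin timestamp.
        for key in txn["writes"]:
            if any(ts > txn["ts"] for ts, _ in self._versions.get(key, [])):
                raise RuntimeError("write-write conflict; abort")
        commit_ts = next(self._clock)
        for key, value in txn["writes"].items():
            self._versions.setdefault(key, []).append((commit_ts, value))
```

A transaction that began before another's commit keeps reading its own consistent snapshot, which is the property that lets secondary indexes be updated consistently alongside base-table writes.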

    • Oozie in Practice - Big Data Workflow Scheduler - Oozie Case Study


      from Teng / Added

      108 Plays / 0 Comments

      Oozie introduction, case study, and tips, plus an introduction to integrating Kettle and Oozie using Spoon. PDF download: http://user.cs.tu-berlin.de/~tqiu/Oozie_BigData_Workflow_Scheduler_Case_Study.pdf During the past three years Oozie has become the de-facto workflow scheduling system for Hadoop. Oozie has proven itself as a scalable, secure and multi-tenant service. Topics covered: an Oozie overview; scenarios where Oozie is a good fit; Oozie's implementation principles and characteristics; Oozie's core components (the flow control nodes and action nodes); a hands-on Oozie case study and tips; the Oozie programming API; a first look at Kettle, a graphical open-source ETL tool that supports Oozie; and a summary and outlook (including a comparison with Azkaban). More: http://www.chinahadoop.net/thread-6659-1-1.html Online open course: http://chinahadoop.edusoho.cn/course/19

      • HCatalog: Table Management for Hadoop


        from Chicago Hadoop User Group / Added

        358 Plays / 0 Comments

        Alan Gates gives us an introduction to HCatalog, which he helped design. This presentation was given on September 17th, 2012 at Allstate in Northbrook, IL. To view this presentation on slideshare: http://www.slideshare.net/ChicagoHUG/hcatalog-chug-20120917

        • Using Apache Drill


          from Chicago Hadoop User Group / Added

          239 Plays / 0 Comments

          Jim Scott, CHUG co-founder and Director, Enterprise Strategy and Architecture for MapR, presents "Using Apache Drill". This presentation was given on August 13th, 2014 at the Nokia office in Chicago, IL. Jim has held positions running Operations, Engineering, Architecture and QA teams. He has worked in the Consumer Packaged Goods, Digital Advertising, Digital Mapping, Chemical and Pharmaceutical industries. His work with high-throughput computing at Dow Chemical was a precursor to more standardized big data concepts like Hadoop. Apache Drill brings the power of standard ANSI SQL:2003 to your desktop and your clusters. It is like AWK for Hadoop. Drill supports querying schemaless systems like HBase, Cassandra and MongoDB. Use the standard JDBC and ODBC APIs to call Drill from your custom applications. Leveraging an efficient columnar storage format, an optimistic execution engine and a cache-conscious memory layout, Apache Drill is blazing fast. Coordination, query planning, optimization, scheduling, and execution are all distributed throughout the nodes in a system to maximize parallelization. This presentation contains live demonstrations. To view this presentation on slideshare: http://www.slideshare.net/ChicagoHUG/drill-chug-20140813 The data that was demonstrated and the queries that were run have been posted to the drill-users mailing list: http://mail-archives.apache.org/mod_mbox/incubator-drill-user/201408.mbox/browser

          • So You want to be a Beekeeper?


            from Bob Fleischer / Added

            31 Plays / 0 Comments

            Learn some basics of what is involved in the adventure of backyard beekeeping and decide if this hobby is for you. Or do you simply want to be a good friend to bees? There are tips on this as well. Jennifer Reed, of the Middlesex Beekeepers Association and beekeeper extraordinaire, will provide guidance on such topics as hive placement, cost, time commitment, and equipment; as well as an overview of the seasonal activities involved with beekeeping.

            • Apache Spark: An Introduction with Use Cases


              from Chicago Hadoop User Group / Added

              314 Plays / 0 Comments

              Mike Emerick, Midwest Sales Architect for MapR, presents "Hello Hadoop, meet Apache Spark". The Spark software stack includes a core data-processing engine, an interface for interactive querying, Spark Streaming for streaming data analysis, and growing libraries for machine learning and graph analysis. Spark is quickly establishing itself as a leading environment for fast, iterative in-memory and streaming analysis. This talk will give an introduction to the Spark stack, explain how Spark delivers lightning-fast results, and show how it complements Apache Hadoop. To view the accompanying slide deck: http://www.slideshare.net/ChicagoHUG/meet-spark
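
The lazy, chained transformation model at the heart of Spark's core engine can be mimicked in a few lines of plain Python. This toy sketch (invented names, no distribution, caching, or fault tolerance) illustrates only the build-a-plan-then-run-an-action idea:

```python
class ToyRDD:
    """Toy stand-in for a Spark RDD: transformations are deferred,
    and an action runs the whole pipeline (illustrative only)."""

    def __init__(self, data, ops=()):
        self._data = data
        self._ops = ops                      # deferred transformation plan

    def map(self, f):
        # Transformation: record the step, do no work yet.
        return ToyRDD(self._data, self._ops + (("map", f),))

    def filter(self, pred):
        return ToyRDD(self._data, self._ops + (("filter", pred),))

    def collect(self):
        # Action: only now does the recorded pipeline actually run.
        out = iter(self._data)
        for kind, f in self._ops:
            out = map(f, out) if kind == "map" else filter(f, out)
        return list(out)

squares = ToyRDD(range(10)).filter(lambda x: x % 2 == 0).map(lambda x: x * x)
print(squares.collect())  # the even numbers 0..8, squared
```

Deferring work until an action is requested is what lets the real engine plan, pipeline, and distribute the steps across a cluster.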

              • Copious Data, the “Killer App” for Functional Programming


                from Chicago Hadoop User Group / Added

                176 Plays / 0 Comments

                The world of Copious Data (permit me to avoid the overexposed term Big Data) is currently dominated by Apache Hadoop, a clean-room version of the MapReduce computing model and a distributed, (mostly) reliable file system invented at Google. But the MapReduce computing model is hard to use. It’s very coarse-grained and relatively inflexible. Translating many otherwise intuitive algorithms to MapReduce requires specialized expertise. The industry is already starting to look elsewhere… However, the very name MapReduce tells us its roots: the core concepts of mapping and reducing, familiar from Functional Programming (FP). We’ll discuss how to return MapReduce, and Copious Data in general, to their ideal place, rooted in FP. We’ll discuss the core operations (“combinators”) of FP that meet our requirements, finding the right granularity for modularity, myths of mutability and performance, and trends that are already moving us in the right direction. We’ll see why the dominance of Java in Hadoop is harming progress. In fact, FP has a long tradition in data systems already, but we’ve been calling it SQL… To download the accompanying presentation: http://polyglotprogramming.com/papers/CopiousData_TheKillerAppForFP.pdf
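
The talk's point that MapReduce is functional programming at heart is easy to make concrete with the classic word count, written directly with the map and reduce combinators (plain Python here; a distributed engine would shard the same two phases across machines):

```python
from collections import Counter
from functools import reduce

def map_phase(lines):
    # map: each input line -> a bag of per-word counts
    return (Counter(line.split()) for line in lines)

def reduce_phase(partials):
    # reduce: fold the partial counts together with an associative merge;
    # associativity is what makes the fold safe to parallelize
    return reduce(lambda a, b: a + b, partials, Counter())

lines = ["the quick brown fox", "the lazy dog", "the fox"]
print(reduce_phase(map_phase(lines)))  # 'the' occurs 3 times, 'fox' twice
```

No mutation is shared between the two phases, which is exactly the property MapReduce engines exploit.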

                • [Webinar] The Impact of Intelligence on Consumer Mobile Experiences


                  from Upsight (formerly Kontagent) / Added

                  3 Plays / 0 Comments

                  Is your mobile experience hurting or helping your business? Do you know how to leverage your mobile user data to drive customer lifetime value, engagement and profitability? Attend this webinar to learn how mobile intelligence is answering these questions and changing the way businesses think about analytics. Join Julie Ask, Vice President and Principal Analyst at Forrester Research, Inc., and Josh Williams, President and Chief Science Officer at Kontagent, as they discuss: - Why “Mobile First” organizations are using mobile intelligence to lead the market - How to craft impactful mobile strategies and KPIs based on user location, multiscreen, and more Moderator: Michael Becker, Managing Director, North America, Mobile Marketing Association Speakers: Julie Ask, Vice President and Principal Analyst, Forrester Research, Inc. Josh Williams, President and Chief Science Officer, Kontagent

                  • Association of Southern Maryland Beekeepers (ASMB) 2011 Short Course Lecture - Hive Components


                    from SoMDBeekeeper / Added

                    196 Plays / 0 Comments


                    • Qubole + Forrester Webinar: Big Data in the Cloud


                      from Qubole / Added

                      23 Plays / 0 Comments

                      Over the last few years, cloud computing has ushered in new ways for enterprises to consume software applications and infrastructure. In today's world of fast-moving technology, the cloud, with its focus on agility and flexibility, gives companies a mechanism to keep up with change. For these reasons of enhanced agility, flexibility and lower TCO, big data infrastructure, with its ever-changing technology landscape and its need for a lot of compute power, is a perfect candidate to take advantage of these distinguishing attributes of the cloud. In this webinar, Qubole & guest presenter Forrester will: - Compare and contrast the benefits of operating big data infrastructure in the cloud vs. in on-prem data centers - Cover how the cloud helps companies derive faster time to value from big data - Talk about how the agility and flexibility of the cloud benefit big data infrastructure and completely change the model for how this infrastructure is operated, with a reduced TCO - Cover how new advances in cloud security and compliance, and progressively changing perceptions around those topics in the enterprise, are causing more and more enterprises to consider the cloud option for their big data infrastructure. About the Speakers: Noel Yuhanna is Principal Analyst of Enterprise Architecture at Forrester Research. Noel has more than 25 years of experience in IT and has held various technical and management positions. Prior to Forrester, Noel spent several years at Exodus Communications, where he led a group responsible for planning and implementing mission-critical enterprise applications including ERP, CRM, and other internal apps. Ashish Thusoo is co-founder and CEO of Qubole. Before co-founding Qubole, Ashish ran Facebook’s Data Infrastructure team; under his leadership the team built one of the largest data processing and analytics platforms in the world. This platform achieved not just the bold aim of making data accessible to analysts, engineers and data scientists, but drove the big data revolution. At Facebook, Ashish helped drive the creation of a host of tools, technologies and templates that are used industry-wide today.

