1. Daniel Templeton (Software Engineer, Cloudera), Hadoop & Beyond Track, Sep 8: Common Pitfalls in Developing MapReduce Applications and How to Avoid Them VOD

   25:42

   from Global Big Data Conference

   Topic: Common pitfalls in developing MapReduce applications and how to avoid them
   Abstract: The intention of this session is to take a light-hearted look at a very technical topic. Following in the footsteps of the highly popular Java Puzzlers talks from Josh Bloch, Neal Gafter, and Bill Pugh, the talk dissects a series of code samples that look innocuous but whose behavior is anything but obvious. After presenting the code and explaining its apparent function, we will present a multiple-choice list of possible results and ask the audience to vote for the right answer by show of hands. After the audience has ruled, we'll reveal the actual behavior, talk through why it happens, and draw from the example lessons that can be put to practical use. The target audience is Hadoop developers who have at least a basic understanding of HDFS, MapReduce, and how to develop Hadoop jobs. Attendees will learn a series of best practices that will hopefully save them hours of debugging time and frustration down the road.
   Profile: Daniel works in the Cloudera training team building Cloudera's developer and data science Cloudera Certified Professional certifications. Daniel also has a long history as a software engineer in the high-performance computing space and has been kicking around big data since about 2009. Prior to Cloudera, Daniel spent more than a decade at Sun in various engineering and product management roles and speaking at conferences. Daniel has a BE in EE/CS from Vanderbilt and an MSCS from Stanford.

• Connor Johnson (Data Scientist, Halliburton), Data Science & Machine Learning Track, Sep 8: Machine Learning with Open Source Tools VOD

  35:24

  Topic: Machine Learning with Open Source Tools
  Abstract: Using Python, one can leverage mature open source machine learning tools. First we will discuss how to pose a machine learning problem; then we will discuss two example cases. The first example will use decision tree classifiers from scikit-learn to train on data and predict geological formations from log curves. The second will use Tkinter and PyBrain to quickly generate training data and then build a neural network for prediction. The advantage of open source tools is that they can be picked apart, modified, and recombined in ways that more user-friendly analysis software and dashboards cannot.
  Profile: Connor Johnson is a Senior Data Manager at Halliburton, where he performs various data-related tasks, from data collection and cleaning to modeling and analysis. He has a master's in mathematics and several years of experience writing code to interpret and exploit typically messy, unstructured data.

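The decision-tree approach described in the abstract above can be sketched with scikit-learn on synthetic data. The feature values, class labels, and thresholds here are invented for illustration; this is not the speaker's code or dataset:

```python
# Hypothetical sketch: classify geological formations from well-log curves
# with a scikit-learn decision tree. All data below is synthetic.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Fake "log curves": gamma ray and resistivity readings for two formations.
shale = rng.normal(loc=[120.0, 5.0], scale=[10.0, 1.0], size=(50, 2))
sand = rng.normal(loc=[40.0, 50.0], scale=[10.0, 5.0], size=(50, 2))

X = np.vstack([shale, sand])
y = np.array(["shale"] * 50 + ["sandstone"] * 50)

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Predict the formation for a new pair of log readings.
prediction = clf.predict([[118.0, 4.5]])[0]
print(prediction)
```

Because the fitted tree is a plain Python object, it can be inspected, pruned, or exported, which is exactly the "pick apart and recombine" flexibility the abstract attributes to open source tools.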
• Chris Fregly (Founder, Flux Capacitor), Hadoop & Beyond Track, Sep 8: Spark Streaming VOD

  54:44

  Topic: Spark Streaming
  Abstract: Spark Streaming
  Profile: Author of Spark in Action, Manning Publications. Former Netflix streaming and Big Data platform engineer.

• Barry Zane (Founder & CEO, SPARQL City), NoSQL Track, Sep 9: Breaking the Analytic Barriers of Big Data with a Scalable High Performance Graph Analytical Engine VOD

  25:40

  Topic: Breaking the Analytic Barriers of Big Data with a Scalable High Performance Graph Analytical Engine
  Abstract: Big Data and the Internet are creating a significant expansion in data collection and analytic possibilities, and have led to large investments in new data storage and processing capabilities over the past few years. What is missing, however, are the foundational systems that can remove the high barriers end users face when working with these technologies and the new data sets to which they allow access, which limits their overall business impact. SPARQL City has built its product to bridge this gap. SPARQL City has developed its Hadoop-based graph analytic solution, SPARQLverse, over the past several years and has recently launched it to market. In this session, we will discuss why graph analytics is becoming a critical component of the modern analytic toolkit in leading organizations and, more specifically, how a scalable SPARQL-based solution to graph analytics opens up new possibilities in the world of Big Data for the first time.
  Profile: Barry was Founder and CTO of ParAccel, Inc. Previously Barry was part of the founding team and VP of Technology and Architecture at Netezza, Inc. (now IBM) and CTO of Applix (now IBM). www.linkedin.com/pub/barry-zane/2/173/b70

• Ankur Gupta (IT Director, Sears), Hadoop & Beyond Track, Sep 10: Hadoop Use Cases: Speeding Up Data Workloads VOD

  13:11

  Topic: Hadoop Use Cases: Speeding Up Data Workloads
  Abstract: One reason organizations struggle to achieve value from big data is the lack of a compelling business use case. Gain insight into practical uses for Hadoop by looking at specific ways big data technologies can be leveraged to enable business analytics by speeding up data workloads: speeding up ETL processing, mainframe batch processing, pricing optimization, fraud detection, network analytics, and more. These solutions, developed within the retail, finance, and healthcare industries, can be used by any organization facing similar data processing challenges to identify its own big data use case and create business value.
  Profile: Ankur Gupta is an IT Director at Sears Holdings Corporation. Ankur leads efforts to accelerate Big Data initiatives at other enterprises, leveraging lessons from implementing Hadoop and other Big Data technologies at Sears. Before moving into this role, Ankur led several other major monetization initiatives at Sears across various businesses. Prior to Sears, Ankur worked with IBM Global Services in India and the US. Ankur received an MBA from Duke University and a degree in Mechanical Engineering from the Indian Institute of Technology, Roorkee (India). Ankur has participated as a keynote or general session speaker or panelist at a number of events, including The CIO Event North America, Hadoop Summit North America (2013 and 2014), IT Roadmap Conference & Expo, Big Data Business Forum, The Escape Big Data Analytics, Data360, Big Data for Executives, Big Data Week Chicago, Big Data Innovation Summit London, and the Netherlands' Customer Festival.

• Bikas Saha (Software Engineer, Hortonworks), Hadoop & Beyond Track, Sep 8: Apache Tez VOD

  36:16

  Topic: Apache Tez
  Profile: Bikas has been working in the Apache Hadoop ecosystem since 2011 and is a committer/PMC member of the Apache Hadoop and Tez projects. He has been a key contributor in making Hadoop run natively on Windows and has focused on YARN and the Hadoop compute stack, with special interest in Tez. Prior to Hadoop, he worked extensively on the Dryad distributed data processing framework, which runs on some of the world's largest clusters as part of Microsoft Bing infrastructure. @bikassaha

• Amit Nithianandan (Software Engineer, WibiData), Hadoop & Beyond Track, Sep 8: Building Next Generation, Personalized Search Applications VOD

  18:31

  Topic: Building Next Generation, Personalized Search Applications
  Abstract: Search engines help people wade through mountains of information quickly and efficiently. Apache Solr is one of the most popular open source search engines, enabling people to build large-scale search applications customized for their domains. For example, e-commerce companies rely on search technology that allows users to find exactly what they want to buy. As search becomes more pervasive, the next generation of search applications will require results that are not only relevant to the query terms specified but also informed by external factors such as geolocation, personal preference, past purchase history, and viewing history. This becomes increasingly important as applications move to smaller form factors, such as mobile and wearable devices (e.g. Google Glass), where the penalty for a bad result is significantly higher, justifying the need for systems capable of delivering highly relevant, personalized results. Hadoop and HBase allow developers to store and process large amounts of information that can be used to aid decision making, and Solr has long been used to power user-facing search applications. However, tying these pieces of infrastructure together isn't trivial, especially if your goal is to offer each user a customized search experience. In this presentation we'll discuss how developers can leverage the power of Kiji, WibiData's open source framework built on top of Hadoop and HBase, to store and process user-specific data both in batch and in real time. We'll demonstrate (with a live prototype application using the MovieLens dataset) the use of machine learning techniques within the Kiji framework to help augment user queries with preference information, and finally demonstrate how this information can be combined with Apache Solr to produce relevant, personalized search results.
  Profile: Amit Nithianandan is a Member of the Technical Staff on the platform engineering team at WibiData and is a contributor to the Kiji Project. Prior to WibiData, Amit was the lead search and analytics engineer at Zvents.com (now a part of eBay), where he worked on search infrastructure, relevance, and the integration of access-log analytics into the search engine.

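The core idea in the abstract above, blending base search relevance with per-user preference signals, can be sketched in plain Python. The scoring scheme, weights, and data are invented for illustration; Kiji and Solr are not involved here:

```python
# Hypothetical sketch of personalized re-ranking: blend each document's
# base relevance score with a boost from the user's genre preferences.
# The boost weight and all data below are made up for illustration.

def personalize(results, preferences, boost_weight=0.5):
    """Re-rank (doc, base_score, genre) tuples using per-user genre weights."""
    def score(item):
        doc, base_score, genre = item
        return base_score + boost_weight * preferences.get(genre, 0.0)
    return sorted(results, key=score, reverse=True)

# Base results as a search engine might return them, best first.
results = [
    ("Movie A", 2.0, "action"),
    ("Movie B", 1.8, "sci-fi"),
    ("Movie C", 1.5, "comedy"),
]

# This user strongly prefers sci-fi, so Movie B should rise to the top.
user_prefs = {"sci-fi": 1.0, "comedy": 0.2}

ranked = personalize(results, user_prefs)
print([doc for doc, _, _ in ranked])
```

In a real system of the kind the talk describes, the preference profile would come from a per-user store (e.g. HBase) and the boost would be applied at query time rather than by re-sorting a result list.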
• Ajaykumar Gupte (IBM), NoSQL Track, Sep 9: NoSQL Analytics: JSON Data Analysis and Acceleration in the MongoDB World VOD

  31:57

  Topic: NoSQL Analytics: JSON Data Analysis and Acceleration in the MongoDB World
  Abstract: MongoDB and other NoSQL solutions are designed for big data processing. In the analytics world, you may need to process many millions or billions of documents to generate a single report. Novel techniques have been developed for exploiting modern processor architectures (larger on-chip caches, SIMD processing, compression, vector processing, and columnar approaches), and this technology is now available for processing large JSON data sets. This talk will discuss analysis of JSON data using advanced data warehousing techniques, made simple and seamless for the application/tool developer. Agenda: 1. Basic overview of JSON data management. 2. Overview of the IBM in-memory accelerator. 3. Performance using the in-memory accelerator for JSON data.
  Profile: Ajaykumar Gupte is a developer at IBM, working on the NoSQL hybrid model and query processing for IBM Informix database R&D. He has 18 years of product development experience with the IBM Informix database. He has developed features in the NoSQL hybrid technology with the MongoDB API, NoSQL sharding and aggregation techniques, distributed databases, parallel data query, hierarchical data models, derived tables, view optimization, and data fragmentation. He holds a Bachelor's degree in Computer Engineering from the University of Mumbai, India.

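The workload the abstract above targets, aggregating many JSON documents into one report, can be sketched in pure Python as a stand-in for a MongoDB-style group-and-sum pipeline. This does not use MongoDB or IBM's accelerator, and the documents are synthetic:

```python
# Hypothetical sketch: a group-by-and-sum report over JSON documents,
# analogous to a MongoDB $group aggregation stage. Data is synthetic.
import json
from collections import defaultdict

docs_json = """
[{"region": "east", "sales": 100},
 {"region": "west", "sales": 250},
 {"region": "east", "sales": 50}]
"""

def total_sales_by_region(docs):
    """Sum the 'sales' field per 'region', like {"$group": {"_id": "$region",
    "total": {"$sum": "$sales"}}} in a MongoDB aggregation pipeline."""
    totals = defaultdict(int)
    for doc in docs:
        totals[doc["region"]] += doc["sales"]
    return dict(totals)

report = total_sales_by_region(json.loads(docs_json))
print(report)
```

The accelerator techniques the talk covers (columnar layout, SIMD, compression) speed up exactly this kind of scan-and-aggregate loop when the document count reaches the billions.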
• Vitaly Gordon (Data Scientist, LinkedIn), Sep 8: How LinkedIn Leveraged Its Data to Become the World's Largest Professional Network VOD

  38:53

  Topic: How LinkedIn Leveraged Its Data to Become the World's Largest Professional Network
  Abstract: Data science can be very powerful, but data science can also be very hard. In this talk, Vitaly Gordon, a Senior Data Scientist at LinkedIn, will walk you through the history of achievements, challenges, and lessons of how LinkedIn transformed itself from a small data startup into a big data enterprise.
  Profile: Vitaly Gordon is a senior data scientist on the LinkedIn Product Data Science team, where he develops data products that most of you use every day. Prior to LinkedIn, Vitaly founded the data science team at LivePerson and worked in the elite 8200 unit (the Israeli equivalent of the NSA), leading a team of researchers in developing algorithms to fight terrorism. His contributions have been recognized through a number of awards, including the "Life Source" award, given each year to the work deemed most high-impact in saving lives. Vitaly holds a B.Sc. in Computer Science and an MBA from the Israel Institute of Technology.

• Sriram Subramanian (Staff Software Engineer, LinkedIn), Sep 9: Samza: Reliable Stream Processing atop Apache Kafka and Hadoop YARN VOD

  37:18

  Topic: Samza: Reliable Stream Processing atop Apache Kafka and Hadoop YARN
  Abstract: Samza is a new distributed stream processing framework developed at LinkedIn and recently incubated into the Apache Software Foundation. Built atop YARN, it provides fault tolerance, durability, scalability, and even local state with a simple, MapReduce-like interface.
  Profile: Staff Software Engineer at LinkedIn.

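The "simple interface with local state" mentioned in the abstract above refers to Samza's per-message callback model: the framework invokes a task's process() method for each incoming message, and the task may keep local state between calls. A toy Python analogue of that model (Samza's real API is Java, and the class and field names here are invented):

```python
# Hypothetical sketch of a Samza-style stream task: the framework calls
# process() once per message, and the task keeps local state between calls.
# This is a plain-Python analogue, not Samza's actual (Java) API.

class PageViewCounterTask:
    """Counts page views per page, emitting the running count downstream."""

    def __init__(self):
        self.counts = {}  # local state; per-task and durable in a real Samza job

    def process(self, message, collector):
        page = message["page"]
        self.counts[page] = self.counts.get(page, 0) + 1
        # Emit an updated count, as a task would send to an output stream.
        collector.append({"page": page, "count": self.counts[page]})

# Drive the task with a small in-memory "stream" standing in for a Kafka topic.
stream = [{"page": "/home"}, {"page": "/jobs"}, {"page": "/home"}]
out = []
task = PageViewCounterTask()
for msg in stream:
    task.process(msg, out)
print(out[-1])
```

In Samza itself, the input and output streams would be Kafka topics, YARN would schedule and restart the task containers, and the local state would be checkpointed so it survives failures.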
