1. I/Opener to the Big I/O - Oleg Zhurakousky

    53:29

    from JavaZone

    There are many mechanisms for storing and processing collections of data sets so large and complex that we collectively refer to them as Big Data. From NoSQL data stores to distributed file systems and computation engines, to columnar stores and flat files, it's all about capture, storage, analysis, search, etc. We want it all and we want it fast, and traditional data processing applications can no longer meet our demands. While technologies such as Hadoop and its ecosystem derivatives paved an initial path to solving Big Data problems, the approaches and assumptions they are built on are starting to show limitations that can only be overcome by radically changing the way we think about storing and accessing data in general. In the end, it's all about I/O and how to make it more efficient.

    The following small subset of questions will help set the scope and drive this presentation:
    - How do we deal with capturing high data volumes (1+ million events per second)?
    - How do we store and organize the data? Unstructured doesn't mean unorganized.
    - Compress, encode, or pack? What are the differences, pros, and cons?
    - Data-type patterns: what are they, how do we spot them during data capture, and what are the benefits?
    - Loss of analytical data that is available (for free) during capture: what, why, the implications, and how to deal with it?
    - Is disk speed the limit for how fast data can be captured and accessed?
    - What role can CPU and RAM play in I/O-intensive environments?
    Using live demos and code, we'll show you how simple yet well-known and very powerful techniques can help you optimize:
    - CAPTURE of data in high-volume environments (1+ million events per second)
    - STORAGE of captured data, making it much smaller (10:1 to 20:1) and thus more efficient for general reads and writes
    - ACCESS of stored data, based on optimization techniques applied during its capture and storage, further increasing I/O read speeds when accessing such data (e.g., searching 1B records in just a few seconds on a single laptop)

    Oleg Zhurakousky

    Oleg is a Principal Architect with Hortonworks, responsible for architecting scalable Big Data solutions using various open source technologies available within and outside the Hadoop ecosystem. Before Hortonworks, Oleg was part of SpringSource/VMware, where he was a core engineer working on the Spring Integration framework, leading the Spring Integration Scala DSL and contributing to other projects in the Spring portfolio. He has 18+ years of experience in software engineering across multiple disciplines, including software architecture and design, consulting, business analysis, and application development. Oleg has been focusing on professional Java development since 1999. Since 2004 he has been heavily involved in using several open source technologies and platforms across a number of projects around the world, spanning industries such as telecom, banking, law enforcement, the US DoD, and others. As a speaker, Oleg has presented seminars at dozens of conferences worldwide (e.g., SpringOne, JavaOne, JavaZone, Jazoon, Java2Days, Scala Days, Øredev, ÜberConf, and others).

    • Go Beyond "Debug": Wire Tap your App for Knowledge with Hadoop

      47:31

      from Øredev Conference

      Today, application developers devote roughly 80% of their code to persisting roughly 20% of the total data flowing through their applications. The other 80% of the data is "Event Data," which can no longer be ignored if you want to stay competitive. Changes to application state are already stored as a sequence of events in application and middleware logs. However, because this data has historically held value to no one but the developer, a lot of potentially valuable information is never collected. In this talk, we will demonstrate how capturing all event data can dramatically simplify data collection and management within the enterprise.

      • High Speed Continuous & Reliable Data Ingest into Hadoop

        01:03:46

        from JavaZone

        10M events per second into HDFS, and sub-second queries over 20 GB of HDFS data... All of this and more will be demonstrated live during this talk.

        This talk explores real-time data ingest into Hadoop, presents the architectural trade-offs, and demonstrates alternative implementations that strike the appropriate balance across the following common challenges:
        * Decentralized writes (multiple data centers and collectors)
        * Continuous availability, high reliability
        * No loss of data
        * Elasticity of introducing more writers
        * Bursts in speed per syslog emitter
        * Continuous, real-time collection
        * Flexible write targets (local FS, HDFS, etc.)

        Intended audience: developers and architects who are currently working, or planning to work, with the Hadoop platform. Although the speaker works for a company that provides a packaged distribution of the Hadoop platform, everything shown and used during the presentation will be based on the raw open source Hadoop platform.

        • Enterprise Integration and Batch Processing on Cloud Foundry

          01:00:50

          from JavaZone

          Cloud Foundry, the open source PaaS from VMware, and, to some extent, the cloud in general provide today's developers with a unique opportunity: scale! Unlimited scale! In this talk, Spring Integration committer Oleg Zhurakousky and Spring Developer Advocate Josh Long show how to use Cloud Foundry and RabbitMQ, together with Spring Integration and Spring Batch, to build integration and batch processing solutions that can scale to meet any challenge.