Neha Narkhede (LinkedIn)
July 27, 2011
Kafka is a distributed publish-subscribe messaging system built to provide a scalable, high-throughput, low-latency solution for log aggregation and activity stream processing at LinkedIn. Written in Scala and built on Apache ZooKeeper, Kafka aims to provide a unified stream for both real-time and offline consumption. We provide a mechanism for parallel data load into Hadoop as well as the ability to partition real-time consumption over a cluster of machines. Kafka combines the benefits of traditional log aggregators and messaging systems and has been used successfully in production for 8 months. It provides an API similar to that of a messaging system and allows applications to consume log events in real time.
Written by the Search, Network and Analytics (SNA) team at LinkedIn, Kafka is open sourced under the Apache 2.0 License and is an Apache Incubator project. In this presentation, we will highlight the core design principles of the system, how it fits into LinkedIn's data ecosystem, and some of the products and monitoring applications it supports at LinkedIn.
Neha Narkhede is a Senior Software Engineer on the Search, Network and Analytics team at LinkedIn, focusing on distributed systems. She is one of the initial contributors to Project Kafka. In the past, she has worked on search systems in large-scale databases and has been an active contributor to several projects LinkedIn has open sourced, including Voldemort, Bobo and Zoie.