Twitter generates tens of billions of events per hour when users interact with it. Analyzing these events to surface relevant content and to derive insights in real-time is a challenge. To address this, we developed and open sourced Heron, a new real time distributed streaming engine. In this presentation, we first describe the design goals of Heron and show how the Heron architecture achieves task isolation and resource reservation to ease debugging, troubleshooting, and seamless use of shared cluster infrastructure with other critical Twitter services. We subsequently explore how a topology self adjusts using back pressure so that the pace of the topology goes as its slowest component. Finally, we outline how Heron implements at-most-once and at-least-once semantics and we describe a few operational stories based on running Heron in production.
Karthik Ramasamy is the engineering manager and technical lead for real-time analytics at Twitter. He has two decades of experience working in parallel databases, big data infrastructure, and networking. He cofounded Locomatix, a company that specializes in real-time streaming processing... Read More →
Wednesday November 9, 2016 9:50am - 10:30am PST
Aspen