Part I: Heron Fundamentals
1. Stream Processing
2. Heron Basics
3. Study Heron Code
Part II: Write Heron Topologies
4. Migrate Storm Topology to Heron
5. Write Topology Code
6. Heron Topology Features
7. Heron Streamlet API
Part III: Operate Heron Clus.
8. Manage a Topology
9. Manage Multiple Topologies
Part IV: Heron Insights
10. Explore Heron
11. Extending the Heron Metrics Sink
12. Extending Heron Scheduler
13. Extending Heron Scheduler
Huijun Wu is an engineer at Twitter, Inc. He has been working on the Heron project since the summer of 2016, when Twitter open-sourced the Heron code, and is a founding member of Apache Heron (Incubating). Before that, he worked for Microsoft, ARRIS, and Alcatel-Lucent. He received a Ph.D. from the School of Computing, Informatics, and Decision Systems Engineering at Arizona State University.
Maosong Fu is the engineering manager for the Real-Time Compute Team at Twitter. Previously, he was the technical lead for Heron at Twitter and worked on various Heron components, including Heron Instance, Metrics Manager, and Heron Scheduler. He is the author of several publications on distributed systems and holds a master’s degree from Carnegie Mellon University and a bachelor’s degree from Huazhong University of Science and Technology.
This book provides both a basic understanding of stream processing in general and practical guidance for development and research with Apache Heron in particular. It gives developers of streaming applications basic, systematic knowledge of Heron that is currently scattered across project documents, technical blogs, and code snippets on the Web.
The book is organized into four parts. Part I covers the basics of stream processing, Apache Storm, and Apache Heron (Incubating), and introduces the Heron source repository. Part II goes into detail, describing two data models for writing Heron topologies as well as frequently used topology features, including stateful processing. This part is especially targeted at software developers who write topologies using the Heron APIs. Part III describes the Heron tools, including the command-line interface and the user interface, needed to manage a single topology or multiple topologies in a data center. This part is aimed particularly at operators who deploy and manage running jobs. Finally, Part IV walks through the Heron source code and explains how to customize or extend Heron. This part is especially recommended for software engineers who would like to contribute code to the Heron repository or who are curious about how Heron works internally.
Overall, this book is aimed at professionals who want to process streaming data with Apache Heron. Basic knowledge of Java and of Bash commands on Linux is assumed.