SDI/ISTC Seminar

Thursday, April 14, 2016 - 12:00pm to 1:00pm

Location:

ISTC Panther Hollow Conference Room, 4th Floor Robert Mehrabian Collaborative Innovation Center

Speaker:

YI PAN, Software Engineer https://www.linkedin.com/in/yi-pan-23885a9

Event Website:

http://www.pdl.cmu.edu/SDI/2016/041416.html

For More Information, Contact:

msakr @ cs.cmu.edu, karen @ ece.cmu.edu

This talk will provide an overview of LinkedIn's distributed stream processing platform, including Samza, Kafka, and Databus. It will first cover the high-level scenarios for stream processing at LinkedIn, followed by detailed requirements around scalability, re-processing, accuracy of results, and ease of programmability. We will then focus on the requirements of stateful stream processing applications and explain how Samza's state management allows us to build applications that meet all of the above requirements. The key concepts, architecture, and usage of LinkedIn's stream processing pipeline will be explained, including state management in Samza and the use and configuration of Kafka and Databus as input/output and as a change log.
We will also discuss in detail how we leverage a reliable, replayable messaging system (i.e., Kafka) together with durable state management in Samza to build a Lambda-less stream processing platform. The key mechanism for achieving a unified processing model across batch and real-time streams is windowing. We will also dive into the requirements for windowing a real-time stream and our solutions to them.

Yi Pan graduated from UCI with a Ph.D. in Computer Science in 2008. Since then, he has worked on distributed platforms for Internet applications for 8 years. He started at Yahoo! working on Yahoo!'s NoSQL database project, leading the development of multiple features, such as real-time notification of database updates, secondary indexes, and live migration from legacy systems to NoSQL databases. Later, he joined and led the development of the Cloud Messaging System, which is used heavily as a pub-sub service and transaction log for distributed databases at Yahoo!. In 2014, he joined LinkedIn and quickly became the lead of the Apache Samza team, which provides a scalable stream processing service for the whole company.
Faculty Hosts: Majd Sakr, Garth Gibson
Partially funded by Yahoo! Labs

Keywords:

Seminar Series