Special Systems Design and Implementation / Intel Science & Technology Center Seminar

Friday, February 3, 2017 - 10:30am to 11:30am

Location:

Panther Hollow Conference Room, 4th Floor Robert Mehrabian Collaborative Innovation Center

Speaker:

EHSAN TOTONI, Research Scientist https://www.linkedin.com/in/ehsan-totoni-44928286

Event Website:

http://www.pdl.cmu.edu/SDI/2017/020317.html

For More Information, Contact:

karen@ece.cmu.edu

Big data analytics requires high programmer productivity and high performance simultaneously on large-scale clusters. However, current big data analytics frameworks (e.g., Apache Spark) have prohibitive runtime overheads because they are library-based. We introduce an auto-parallelizing compiler approach that exploits the characteristics of the data analytics domain and is accurate, unlike previous auto-parallelization methods. We build the High Performance Analytics Toolkit (HPAT), which parallelizes high-level scripting (Julia) programs automatically, generates efficient MPI/C++ code, and provides resiliency. Furthermore, HPAT provides automatic optimizations for scripting programs, such as fusion of array operations. As a result, HPAT is 369x to 2033x faster than Spark on the Cori supercomputer at LBL/NERSC and 20x to 256x faster on Amazon AWS for machine learning benchmarks.

We also propose a compiler-based approach for integrating data frames into HPAT to build HiFrames. It automatically parallelizes and compiles relational operations along with other array computations in end-to-end data analytics programs, and generates efficient MPI/C++ code. HiFrames is 3.6x to 70x faster than Spark SQL for basic relational operations and can be several orders of magnitude faster for advanced operations.

Ehsan Totoni is a Research Scientist at Intel Labs. He develops programming systems for large-scale HPC and big data analytics applications with a focus on productivity and performance. He received his Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign in 2014. During his Ph.D. studies, he was a member of the Charm++/AMPI team, working on the performance and energy efficiency of HPC applications using adaptive runtime techniques.

Faculty Host: Mike Kozuch

Keywords:

Seminar Series