PLDI11-Tutorial

Compiler Challenges for Task-Parallel Languages

PLDI 2011 Tutorial, Sunday, June 5, 2011, 1:30 pm - 5:00 pm

Tutorial Slides: 4-up, 2-up

Summary

The computer industry is at a major inflection point in its hardware roadmap with an industry-wide shift to homogeneous and heterogeneous multicore processors. Unlike previous generations of hardware evolution, this shift will have a profound impact on software. There has been a growing trend towards task parallelism in programming language products such as OpenMP 3.0 and Cilk, and in more recent research languages such as X10, Chapel, and Habanero-Java (HJ). While task parallelism increases productivity by allowing the programmer to express multiple levels of logical parallelism for portability and convenience, it can also lead to severe performance degradation due to increased overheads. In some cases, the performance gap between unoptimized and hand-optimized task-parallel programs can be very significant, with slowdown factors as high as 10x or 100x. Unfortunately, traditional compilers are unable to bridge this gap automatically because the historical foundations of compiler optimization including intermediate representations, data flow analyses, and code transformations are all deeply entrenched in the von Neumann model of sequential computing, and have to be reworked for parallelism.

In this tutorial, we briefly summarize the similarities and differences among the five task-parallel languages mentioned above, and use the summary to motivate a common set of primitives that are suitable for use in parallel intermediate representations (PIRs) for task-parallel programs. We focus on compilation challenges for task-parallel programs at the PIR level, including optimization of task creation, termination, synchronization, scheduling and communication operations. We show how these challenges can be addressed by a PIR-level transformation framework that formalizes data dependence constraints for explicitly parallel programs. Specific examples will be provided based on our experience with optimization of heap access, task creation, termination, synchronization and scheduling operations for Habanero-Java (HJ) on a range of multicore SMP's; the HJ compiler uses a PIR implemented as an extension to the open-source Soot framework. Examples will also be provided of communication optimizations of X10 programs on distributed-memory parallel machines. We conclude by identifying co-design opportunities for runtime software and hardware extensions that can assist compilers in addressing the challenges discussed in the tutorial.

The specific topics to be covered in the tutorial are as follows:

1) Identification of common primitives in task-parallel programs, using five task-parallel languages as motivation (OpenMP 3.0, Cilk, X10, Chapel, Habanero-Java [HJ])

2) Incorporation of these primitives in parallel intermediate representations (PIRs) for task-parallel programs

3) Formalization of dependence constraints for PIR-level code transformations

4) Summaries of PIR-level compiler optimizations for optimization of heap access [BS09], task creation/termination [ZSNS10], synchronization [SZNS09] and scheduling [GBRS09] operations for Habanero-Java (HJ) on a range of multicore SMP's, and for communication optimizations of X10 programs [BZGPBS10] on distributed-memory parallel machines

5) Comparison with related work

6)Open problems and interesting areas for future research

Target Audience

The target audience for this tutorial includes practitioners and researchers interested in understanding the fundamental primitives of task-parallel programs and their implementation challenges on modern multicore processors. All attendees will leave the tutorial with an understanding of the limitations of current compilers in optimizing these languages, and of the technical foundations of the compiler frameworks necessary for task-parallel programs. An understanding of the compiler implications of these primitives will be useful for language designers and implementers (who may choose to add task parallelism to their language), to application and library programmers (who may choose to use task parallelism as a foundation for their software), and to hardware designers and implementers (who will need to pay more attention to task-parallel primitives in the future). There is a set of open research problems that arise at the language, compiler, runtime, and hardware levels that should be of interest to researchers in the field. The prerequisite knowledge assumed is familiarity with the foundations of compiler analysis and optimization of sequential programs.

References

•[BS09] Interprocedural Load Elimination for Dynamic Optimization of Parallel Programs. Rajkishore Barik, Vivek Sarkar. The Eighteenth International Conference on Parallel Architectures and Compilation Techniques (PACT), September 2009.
•[BZGPBS10] Communication Optimizations for Distributed-Memory X10 Programs. Rajkishore Barik, Jisheng Zhao, David Grove, Igor Peshansky, Zoran Budimlić, Vivek Sarkar. 25th IEEE International Parallel and Distributed Processing Symposium (IPDPS), April 2011.
•[GBRS09] Work-First and Help-First Scheduling Policies for Terminally Strict Parallel Programs. Yi Guo, Rajkishore Barik, Raghavan Raman, Vivek Sarkar. 23rd IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2009.
•[HJ] Habanero-Java web site, http://habanero.rice.edu/hj.
•[SZNS09] Chunking Parallel Loops in the Presence of Synchronization. Jun Shirako, Jisheng Zhao, Krishna Nandivada, Vivek Sarkar. Proceedings of the 2009 ACM International Conference on Supercomputing (ICS), June 2009.
•[ZSNS10] Reducing Task Creation and Termination Overhead in Explicitly Parallel Programs. Jisheng Zhao, Jun Shirako, Krishna Nandivada, Vivek Sarkar. The Nineteenth International Conference on Parallel Architectures and Compilation Techniques (PACT), September 2010.

Bio

Vivek Sarkar conducts research in multiple aspects of parallel software including programming languages, program analysis, compiler optimizations and runtimes for parallel and high performance computer systems. He holds the E.D. Butcher Chair in Engineering at Rice University, where he is a professor of Computer Science and of Electrical and Computer Engineering and leads the Habanero Multicore Software Research project. He also serves as Associate Director of the NSF Expeditions Center for Domain-Specific Computing (CDSC), and as co-PI on the DARPA-funded project on Platform-Aware Compilation Environment (PACE). Prior to joining Rice in July 2007, Vivek was Senior Manager of Programming Technologies at IBM Research. His responsibilities at IBM included leading IBM's research efforts in programming model, tools, and productivity in the PERCS project during 2002- 2007 as part of the DARPA High Productivity Computing System program. His past projects include the X10 programming language, the Jikes Research Virtual Machine, the ASTI optimizer used in IBM's XL Fortran product compilers, the PTRAN automatic parallelization system, and profile-directed partitioning and scheduling of Sisal programs. Vivek became a member of the IBM Academy of Technology in 1995, the E.D. Butcher Professor in Engineering at Rice University in 2007, and was inducted as an ACM Fellow in 2008. He holds a B.Tech. degree from the Indian Institute of Technology, Kanpur, an M.S. degree from University of Wisconsin-Madison, and a Ph.D. from Stanford University. In 1997, he was on sabbatical as a visiting associate professor at MIT, where he was a founding member of the MIT RAW multicore project. Vivek has given several tutorials on a wide range of topics related to programming models and compilers at past conferences including PLDI 1993, POPL 1996, ASPLOS 1996, PLDI 2000, Java Grande 2001, OOPSLA 2003, ECOOP 2004, OOPSLA 2006, PPoPP 2007, PLDI 2007, PLDI 2008, and PLDI 2009.