Techniques for constructing compilers for parallel computers have been in academic and commercial environments over the past 30 years. Some of these techniques are not quite mature and in common use, but there is still much active research in this area. Most of the reference material is scattered over many conference proceedings and journals. This book is intended to serve as coherent presentation of the important basic ideas used in modern compilers for parallel computer systems. It can be used as a reference or as a text for second or third course on compilers at the senior undergraduate or graduate level.
This book differs from previous collections in that its focus is not the automatic discovery of parallelism from sequential programs, though that is also included. Instead, its focus is techniques to generate optimized parallel code, given the source program and target architecture. The optimizer in high performance compilers is organized as a deep analysis phase followed by a code generation, or synthesis, phase. This book follows that organization.
The first chapter introduces the material and takes one example program through several stages of optimization for a variety of target machines. Each target architecture is presented at a high level, and is modeled after current commercial machines.
Chapter 2 discusses programming languages issues. Of particular interest are the parallel language extensions proposed for various languages, such as array assignments in Fortran 90, the forall statement in High Performance Fortran, and other parallel loop constructs.
Chapter 3 and 4 introduce basic analysis algorithms that are in common use in compilers. Chapter 3 focuses on algorithms for graphs, which are used in compilers to represent control flow, interprocedural calls, and dependence. Chapter 4 discusses various aspects of linear algebra, which is becoming more important in compilers, including subjects such as solving linear and integer systems of equations and inequalities.
Chapter 5 through 8 cover aspects of the analysis phase of the optimizer. Chapter 5 presents the basic ideas behind data dependence relations, as used in compilers. In order to allow the most freedom in reordering and optimizing a program, compilers find dependence relations between program statements that cannot be violated without changing the meaning of the program.
Chapter 6 discussed various aspects of scalar analysis that are important for parallel computing, such as constant propagation and precise data dependence analysis for scalars. In particular, induction variable detection is important for array analysis.
Chapter 7 shows how to use the linear algebra techniques from Chapter 4 to find data dependence relations between array references. Because the linear algebra appears separately, this chapter is somewhat shorter than such a chapter might be in other books on the subject.
Chapter 8 discusses other problems related to dependence analysis, such as summarizing array accesses across procedural boundaries, solving data dependence analysis problems in the presence of pointers and I/O, and so forth.
Chapter 9 details the techniques used in the restructuring phase of creating the optimizer, focusing on loop structuring techniques. A catalogue of loop optimizations is presented, along with examples to show effects on performance.
Chapter 10 through 14 show how to tailor the code generation for various target architectures. Chapter 10 discusses a sequential target machine in which the compiler restructures the program to take advantage of a memory hierarchy (typically one or more levels of processor cache memory).
Chapter 11 presents methods to generate code and optimize for shared-memory parallel computers, which are now becoming common even at the workstation level. Automatic discovery of parallelism is also discussed.
Chapter 12 shows how to apply similar techniques for vector instruction sets, including automatic vectorization and supervector code generation.
Chapter 13 presents code generation methods for massively parallel message-passing computer systems, of both the SIMD and MIMD variety.
Chapter 14 shows techniques for massively parallel shared-memory systems. Different methods are used for the three varieties of machines in this category, depending on whether processor cache memories are kept consistent using global information or local information, or are absent altogether.
Each chapter includes a section titled "In the Pit," which includes hints and other anecdotal material that may be of some use when applying the information covered in the chapter. A "Further Reading" section contains citations to the original material in the reference list, and the exercises can be used as assignments or to test comprehension of the material.
Additional material that has proved useful for teaching is available via anonymous FTP form the machine bc.aw.com in the directory bc/wolfe/highperform. This material includes Postscript copies of the figures, programs implementing many of the algorithms, and the Tiny loop restructuring tool. A more complete bibliography, in Bibtex format, along with citations by chapter, can also be found. There is a README file containing useful information about the rest of the directory.
Acknowledgments
This book grew out of a series of short courses that I offered over the past three years. Little material in this book is original or invented by the author; I owe a debt of gratitude to the many developers of the techniques used here. My introduction to this topic was working with the Parafrase group at the University of Illinois from 1976 to 1980. Many ideas now crucial in modern restructuring compilers were developed during the time. During my tenure at Kuck and Associates, Inc., I learned the important distinction between science and engineering , and the good engineering in a compiler is critically important. After joining the Oregon Graduate Institute, I have had more contacts with compiler researchers and developers around the globe; I have learned more during this period than ever before.
I especially thank the reviewers, who helped maintain consistency and made numerous important and helpful suggestions: Ron K. Cytron of Washington University, James Larus of the University of Wisconsin, Carl Offner of Digital Equipment Corp., J. (Ram) Ramanujam of Louisiana State University, David Stotts of the University of North Carolina, and Alan Sussman of the University of Maryland. Any errors contained herein are, of course, entirely my own fault. Several students at OGI reviewed selected chapters: Tito Autrey, Michael Gerlek, Priyadarshan Kolte, and Eric Stoltz. The editors and staff at the Benjamin/Cummings Publishing Co. were very encouraging, and to them I owe a great deal of gratitude.
Michael Wolfe
0805327304P04062001