This tutorial presents a scientific approach to benchmarking and
follows the methodology of the first "Parkbench" committee
report. It defines a clear set of units and symbols, followed by
a carefully defined set of performance parameters and metrics, and
finally a hierarchy of parallel benchmarks to measure them. A new
theory of performance scaling is presented through the concept of
"Computational Similarity" which allows the scaling of an application
for all computers and all problem sizes to be represented in a single
dimensionless DUSD diagram. Benchmarking practice covers the general
principles of properly setting up benchmarks and how to assess their
results and relate them to other techniques like simulation and machine
modelling. Results on current machines like the Cray T90, Cray T3E,
Hitachi SR2201, HP/Convex SPP-1600, Fujitsu VPP700, and NEC
SX4 will be discussed.
High Performance Fortran in Practice
Charles Koelbel
Center for Research on Parallel Computation, Rice University
High Performance Fortran (HPF) was defined in 1993 to provide a portable syntax for expressing data-parallel computations in Fortran. A major revision of HPF (termed HPF 2.0) will be completed by Supercomputing '96. Since the appearance of the High Performance Fortran Language Specification (available as an issue of Scientific Programming and by ftp, gopher, and WWW), several commercial compilers have appeared. There has also been great interest in HPF as a language for efficient parallel computation. The purpose of this tutorial is three-fold. Topics covered include the ALIGN and DISTRIBUTE directives and the data-parallel FORALL statement and INDEPENDENT assertion.
Hugh M. Caffey
Hewlett-Packard Company
In this tutorial, the principles of parallel programming in a message-passing
environment will be introduced in terms that make sense to
non-computer scientists. Emphasis will be on practical information,
with a series of example programs being used to guide newcomers through
the important stages of writing and tuning message-passing codes.
The tutorial will not address details of parallel architectures, algorithms
or theoretical models. Instead, it will offer a minimal-trauma introduction
to the issues at stake in deciding whether or not to parallelize an
application, basic approaches to adding parallelism, and
techniques for debugging, evaluating, and tuning parallel programs.
Understanding and Developing Virtual Reality Systems
Henry Sowizral
Sun Microsystems
This tutorial provides an in-depth look at virtual reality systems and their
construction. It introduces virtual reality with an overview of how a
VR system operates, a brief history, and videos showing a collection
of VR applications in operation. It continues with a survey discussing
the component parts of a VR system, both hardware and software. It
includes information on how those components operate and pointers
to suppliers of various products. Next, the tutorial delves into the
many topics involved in making the VR experience more "real", such as correcting
for errors introduced by the display's optical pathway, correcting for
tracker errors and lag, understanding how to use the graphics
hardware most effectively, handling scene complexity, and inserting
an ego-centric human model (avatar) into the scene. The tutorial
concludes with a description of augmented environments and their operation.
A Practical Guide to Java
Ian G. Angus
Boeing Information and Support Services
In the span of just one year Java has risen from an experimental
language to become the Internet programming language du jour. The question
naturally arises: what is Java, and how is it used? In this tutorial we
will go beyond the hype. We will: 1) Introduce the Java programming
language and explain the concepts (and buzzwords) that define it with careful
coverage given to both its strengths and weaknesses. 2) Show with simple
examples how Java can be used on the WWW and how to take advantage of it.
3) Demonstrate Java's greater potential with a discussion of selected non-WWW
applications for which Java provides unique capabilities.
Performance Programming for Scientific Computation
Larry Carter
UCSD and SDSC
Bowen Alpern
IBM's Watson Research Center
Performance programming is the design, writing, and tuning of
programs to sustain near-peak performance. This tutorial
will present a unified framework for understanding and overcoming
the bottlenecks to high performance. The goal of the course is
to make performance programming a science rather than a craft.
Development of high performance programs has always required an
acute sensitivity to details of processor and memory hierarchy
architecture. The advent of modern workstations and supercomputers
brings to the fore another concern: parallelism. The tutorial will
identify four requirements for attaining high performance at any
level of computation. General techniques for satisfying these
requirements can be applied to improve performance of the processor,
of the memory hierarchy, and of parallel processors.
Application of the techniques is illustrated with a variety of examples.
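As a flavor of the memory-hierarchy techniques such a course typically treats, the following sketch shows loop tiling (cache blocking) applied to matrix multiplication in C. It is an illustration only, not taken from the presenters' materials; the tile size BS is an assumed tuning parameter that would be chosen to match the target cache.

```c
/* Illustrative only: loop tiling (cache blocking) for matrix multiply.
 * BS is a hypothetical tile size; in practice it is tuned per cache level. */
#include <stddef.h>

#define BS 64

void matmul_tiled(size_t n, const double *A, const double *B, double *C)
{
    for (size_t ii = 0; ii < n; ii += BS)
        for (size_t kk = 0; kk < n; kk += BS)
            for (size_t jj = 0; jj < n; jj += BS)
                /* Work on one BS x BS tile so the operands stay cache-resident. */
                for (size_t i = ii; i < ii + BS && i < n; i++)
                    for (size_t k = kk; k < kk + BS && k < n; k++) {
                        double a = A[i * n + k];
                        for (size_t j = jj; j < jj + BS && j < n; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
}
```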
Memory Consistency Models for Shared-Memory Multiprocessors
Sarita V. Adve
Department of Electrical and Computer Engineering, Rice University
Kourosh Gharachorloo
Western Research Lab, Digital Equipment Corporation
A memory consistency model for a shared-memory system indicates how the
memory operations of a program will appear to execute to the programmer.
The most commonly assumed memory model is Lamport's sequential
consistency (SC) which requires a multiprocessor to appear like a
multiprogrammed uniprocessor. While SC provides a familiar interface for
programmers, it restricts the use of several common uniprocessor
hardware and compiler optimizations, thereby limiting performance. For
higher performance, alternate memory models have been proposed. These models,
however, present a more complex programming interface. Thus, the choice
of the memory model involves a difficult, but important tradeoff between
performance and ease-of-use. This tutorial will survey several currently
proposed memory models, place them within a common framework, and assess
them on the basis of their performance potential and ease-of-use.
The following specific topics will be covered:
- The problem of memory consistency models (including the difference from the cache coherence problem).
- Implementing sequential consistency.
- Alternative memory models (descriptions and performance benefits): (a) the system-centric approach, e.g., weak ordering, processor consistency, release consistency, and the models adopted by Digital, IBM, and Sun; (b) the programmer-centric approach, e.g., the data-race-free and properly-labeled methods for describing memory models.
- Interaction with other latency-hiding techniques (e.g., prefetching).
- More aggressive implementations of memory consistency models.
- Relaxed consistency models for software DSM systems (e.g., Munin, Treadmarks, Midway).
The tutorial will assume rudimentary knowledge of shared-memory multiprocessor
organization. It will cover both the basic problem in detail and advanced
issues that represent ongoing research.
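To make the basic problem concrete, the following sketch (not from the tutorial) shows the classic flag-and-data idiom. C11 atomics postdate this material and are assumed here purely as compact notation: under sequential consistency the two plain stores would already appear in program order, whereas under a relaxed model the release/acquire pair is what guarantees the consumer sees the data.

```c
/* Illustrative sketch: the flag/data publication idiom.
 * Under SC, plain stores suffice; under a relaxed model the
 * release/acquire pair below supplies the required ordering. */
#include <stdatomic.h>

int data;                 /* payload written by the producer */
atomic_int flag = 0;      /* synchronization flag */

void producer(void)
{
    data = 42;                                              /* ordinary store */
    atomic_store_explicit(&flag, 1, memory_order_release);  /* publish */
}

int consumer(void)
{
    while (atomic_load_explicit(&flag, memory_order_acquire) == 0)
        ;                    /* spin until the producer has published */
    return data;             /* guaranteed to observe 42 */
}
```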
An Intensive and Practical Introduction to the Message Passing Interface (MPI)
William C. Saphir
NASA Ames Research Center/MRJ
MPI has taken hold as the library of choice for portable high-performance message-passing applications. The complexity of MPI presents a short but steep learning curve. This tutorial provides a rapid introduction to MPI, motivating and describing its core functionality. It is suitable both for beginning users who want a rapid introduction to MPI and for intermediate users who want more than just the "how to". The approach is practical rather than comprehensive, explaining what is important, what is not, and why. The emphasis will be on obtaining high performance, differentiating between theoretical performance advantages and ones that make sense in the real world, and on avoiding common mistakes. There will be an opinionated discussion of datatypes, communication modes, topologies, and other advanced MPI features. The tutorial will describe techniques for obtaining high performance, illustrated with numerous examples. For users trying to choose between MPI and PVM, the tutorial will include a comparison of the two libraries and a discussion of porting. Finally, the tutorial will cover the most recent developments in MPI-2, including dynamic process management, one-sided communication, and I/O.
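For readers new to MPI, the core point-to-point functionality looks like the following minimal C example; it is a generic sketch rather than an excerpt from the tutorial.

```c
/* Minimal MPI sketch: rank 0 sends one integer to rank 1. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, value;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* dest 1, tag 0 */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}
```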
Applications of Web Technology and HPCC
Geoffrey Fox,
Wojtek Furmanski,
Nancy McCracken
Syracuse University
We discuss the role of HPCC and Web technologies in several applications
including health care, finance, education and the delivery of computing
services. We assume a knowledge of base Web concepts and summarize key
features of Java, VRML, and JavaScript, but do not give a tutorial on
these base technologies. We will illustrate the possibilities of HPCC
Web integration in these real world applications and the role of base
technologies and services.
Designing and Building Parallel Programs: An Introduction to Parallel Programming
Ian Foster
Argonne National Laboratory
Carl Kesselman
California Institute of Technology
Charles Koelbel
Rice University
In this tutorial, we provide a comprehensive introduction to the
techniques and tools used to write parallel programs. Our
goal is to communicate the practical information required by
scientists, engineers, and educators who need to write parallel
programs or to teach parallel programming. First, we introduce
principles of parallel program design, touching upon relevant topics
in architecture, algorithms, and performance modeling. Then,
we describe the parallel programming standards High Performance
Fortran and Message Passing Interface and the modern parallel language
Compositional C++. Finally, we introduce techniques for coupling
HPF and MPI, and the parallel Standard Template Library proposed
for HPC++. The tutorial is based on the textbook "Designing and
Building Parallel Programs" (Addison-Wesley, 1995), which is also
available in HTML format on the World Wide Web at
http://www.mcs.anl.gov/dbpp.
Interactive Visualization of Supercomputer Simulations
Michael E. Papka,
Terrence L. Disz,
Rick Stevens
Mathematics and Computer Science Division, Argonne National
Laboratory
Matthew Szymanski
University of Illinois at Chicago
This tutorial discusses the integration of interactive visualization
environments with supercomputers used for simulation of scientific
applications. The topics include an introduction to interactive
visualization technology (tracking, display systems, sound,
modeling), communication mechanisms (software and hardware) needed
for system integration, system performance, and the use of
multiple visualization systems. The experience of the presenters in
using the CAVE Automatic Virtual Environment (CAVE)
connected to an IBM SP machine will be used to illustrate the concepts.
The participants will gain knowledge of how to link massively
parallel supercomputing simulations with virtual environments for
display, interaction, and control. The tutorial will conclude
with a discussion of the critical performance points in the coupled
supercomputer and virtual environment experience.
Introduction to Effective Parallel Computing
Quentin F. Stout
University of Michigan
Marilynn Livingston
University of Oregon
This tutorial provides a comprehensive overview of parallel computing,
focusing on the aspects most relevant to the user. Throughout, the emphasis
is on the iterative process of converting a serial program into an
increasingly efficient, and correct, parallel program. The tutorial will help
people make intelligent planning decisions concerning parallel
computers, and help them develop efficient application codes for such
systems. It discusses hardware and software, with an emphasis on systems
that are now (or soon will be) commercially available. Program design
principles such as load balancing, communication reduction, and efficient
use of cache are illustrated through examples selected from
engineering, scientific, and database applications.
Parallel I/O on Highly Parallel Systems
Samuel Fineberg
Bill Nitzberg
NASA Ames Research Center
Typical scientific applications require vast amounts of processing
power coupled with significant I/O capacity. Highly parallel computer
systems provide floating point processing power at low cost, but efficiently
supporting a scientific workload also requires commensurate I/O performance.
In order to achieve high I/O performance, these systems utilize parallelism
in their I/O subsystems, supporting concurrent access to files by multiple
nodes of a parallel application and striping files across multiple disks.
However, obtaining maximum I/O performance can require significant programming
effort. This tutorial presents a comprehensive survey of the state-of-the-art
in parallel I/O from basic concepts to recent advances in the research community.
Requirements, interfaces, architectures, and performance are all illustrated
using concrete examples from commercial offerings (Convex Exemplar, Cray T3E,
IBM SP2, Intel Paragon, Meiko CS-2, and high-end workstation
clusters) as well as academic and research projects (CHARISMA, Panda,
PASSION/VIP-FS, PIOUS, PPFS, and Vesta) and the emerging MPI-IO standard.
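As an indication of what the MPI-IO interface mentioned above looks like in practice, here is a hedged C sketch in which every rank collectively writes one contiguous block of a shared file; the file name and block size are assumptions made only for the illustration.

```c
/* Sketch of collective parallel I/O with MPI-IO: each rank writes its own
 * contiguous block of a shared file.  "out.dat" and N are assumed values. */
#include <mpi.h>

#define N 1024   /* doubles per rank, chosen only for illustration */

int main(int argc, char **argv)
{
    int rank;
    double buf[N];
    MPI_File fh;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int i = 0; i < N; i++)
        buf[i] = rank;   /* fill the local block with recognizable data */

    MPI_File_open(MPI_COMM_WORLD, "out.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Each rank writes at its own byte offset; the call is collective. */
    MPI_File_write_at_all(fh, (MPI_Offset)rank * N * sizeof(double),
                          buf, N, MPI_DOUBLE, &status);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
```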
Hot Chips for High Performance Computing
Subhash Saini
David H. Bailey
NASA Ames Research Center
We will discuss several current CMOS based processors: the DEC Alpha 21164,
which is used in the Cray T3E; the MIPS R10000, which is used in the SGI Power
Challenge; the Intel Pentium Pro Processor (P6), which is used in the
first DOE ASCI system; the PowerPC 604, which is used in an IBM SMP
system; the HP PA-RISC 7200/8000, which is used in the Convex Exemplar
SPP1600; an NEC proprietary processor, which is used in the NEC SX-4; a
Fujitsu proprietary processor, which is used in the Fujitsu VPP700; a
Hitachi proprietary processor, which is used in the Hitachi SR2201. The
architectures of these microprocessors will first be presented, followed
by a description of supercomputers based on these processors. The
performance of various hardware/programming model (HPF and MPI) combinations
will then be compared, based on the latest NAS Parallel Benchmark results,
thus providing a cross-machine and cross-model comparison. The tutorial
will conclude with a discussion of general trends in the field of
high performance computing, including future directions in hardware and
software technology as we achieve Tflop/s performance levels and press on
to Pflop/s levels in the next decade.
Tuning MPI Applications for Peak Performance
William Gropp
Rusty Lusk
Argonne National Laboratory
MPI is now widely accepted as a standard for message-passing parallel
computing libraries. Both applications and important benchmarks are
being ported from other message-passing libraries to MPI. In most
cases it is possible to make a translation in a fairly straightforward
way, preserving the semantics of the original program. On the other
hand, MPI provides many opportunities for increasing the performance
of parallel applications by the use of some of its more advanced
features, and straightforward translations of existing programs might
not take advantage of these features. New parallel applications are also
being written in MPI, and an understanding of performance-critical
issues for message-passing programs, along with an explanation of how
to address these using MPI, can help the application programmer deliver
a greater percentage of the hardware's peak performance
to the application. This tutorial will discuss
performance-critical issues in message passing programs, explain how to examine
the performance of an application using MPI-oriented tools, and show how
the features of MPI can be used to attain peak application performance.
It will be assumed that attendees have an understanding of the basic elements
of the MPI specification. Experience with message-passing parallel applications
will be helpful but not required.
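One representative tuning technique of the kind this tutorial addresses is overlapping communication with computation via nonblocking operations. The halo-exchange sketch below is a generic illustration, not an example from the presenters; the neighbor ranks and buffer lengths are assumed parameters.

```c
/* Sketch: overlap a halo exchange with interior computation by using
 * nonblocking MPI calls instead of blocking MPI_Send/MPI_Recv. */
#include <mpi.h>

void exchange_and_compute(double *halo_in, double *halo_out, int n,
                          int left, int right, MPI_Comm comm)
{
    MPI_Request reqs[2];
    MPI_Status  stats[2];

    /* Post the receive and the send as early as possible. */
    MPI_Irecv(halo_in,  n, MPI_DOUBLE, left,  0, comm, &reqs[0]);
    MPI_Isend(halo_out, n, MPI_DOUBLE, right, 0, comm, &reqs[1]);

    /* ... interior work that does not need the halo proceeds here ... */

    /* Wait only when the boundary values are actually required. */
    MPI_Waitall(2, reqs, stats);
}
```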
Reinventing The Supercomputer Center
William Kramer
William McCurdy
Horst Simon
Lawrence Berkeley National Laboratory
This is a time of change for the nation's large-scale computing centers. Supercomputing sites have to deal with industry consolidations, decreasing budgets, changing and expanding missions, site consolidation, and new technical challenges. This tutorial covers the experiences and many practical lessons from large-scale computing sites that have gone through dramatic changes. These include understanding the reasons requiring change; the changing role of supercomputing; organizational decisions; recruiting and staffing; managing client expectations and interactions; setting and measuring new goals and expectations; and all the details of creating a new facility.