This tutorial presents a scientific approach to benchmarking and
follows the methodology of the first "Parkbench" committee
report. It defines a clear set of units and symbols, followed by
a carefully defined set of performance parameters and metrics, and
finally a hierarchy of parallel benchmarks to measure them. A new
theory of performance scaling is presented through the concept of
"Computational Similarity" which allows the scaling of an application
for all computers and all problem sizes to be represented in a single
dimensionless DUSD diagram. Benchmarking practice covers the general
principles of properly setting up benchmarks and how to assess their
results and relate them to other techniques like simulation and machine
modelling. Results on current machines like the Cray T90, Cray T3E,
Hitachi SR2201, HP/Convex SPP-1600, Fujitsu VPP700, and NEC
SX4 will be discussed.
High Performance Fortran in Practice
Charles Koelbel
Center for Research on Parallel Computation, Rice University
High Performance Fortran (HPF) was defined in 1993 to provide a portable syntax for expressing data-parallel computations in Fortran. A major revision of HPF (termed HPF 2.0) will be completed by Supercomputing '96. Since the appearance of the High Performance Fortran Language Specification (available as an issue of Scientific Programming and by ftp, gopher, and WWW), several commercial compilers have appeared. There has also been great interest in HPF as a language for efficient parallel computation. The purpose of this tutorial is three-fold. Topics covered include the ALIGN and DISTRIBUTE directives and the data-parallel FORALL statement and INDEPENDENT assertion.
Hugh M. Caffey
Hewlett-Packard Company
In this tutorial, the principles of parallel programming in a message-passing
environment will be introduced in terms that make sense to
non-computer scientists. Emphasis will be on practical information,
with a series of example programs being used to guide newcomers through
the important stages of writing and tuning message-passing codes.
The tutorial will not address details of parallel architectures, algorithms
or theoretical models. Instead, it will offer a minimal-trauma introduction
to the issues at stake in deciding whether or not to parallelize an
application, basic approaches to adding parallelism, and
techniques for debugging, evaluating, and tuning parallel programs.
Understanding and Developing Virtual Reality Systems
Henry Sowizral
Sun Microsystems
This tutorial provides an in-depth look at virtual reality systems and their
construction. It introduces virtual reality with an overview of how a
VR system operates, a brief history, and videos showing a collection
of VR applications in operation. It continues with a survey discussing
the component parts of a VR system, both hardware and software. It
includes information on how those components operate and pointers
to suppliers of various products. Next, the tutorial delves into the
many topics involved in making the VR experience more "real", such as correcting
for errors introduced by the display's optical pathway, correcting for
tracker errors and lag, understanding how to use the graphics
hardware most effectively, handling scene complexity, and inserting
an ego-centric human model (avatar) into the scene. The tutorial
concludes with a description of augmented environments and their operation.
A Practical Guide to Java
Ian G. Angus
Boeing Information and Support Services
In the span of just one year Java has risen from an experimental
language to become the Internet programming language du jour. The question
naturally arises: what is Java, and how is it used? In this tutorial we
will go beyond the hype. We will: 1) Introduce the Java programming
language and explain the concepts (and buzzwords) that define it with careful
coverage given to both its strengths and weaknesses. 2) Show with simple
examples how Java can be used on the WWW and how to take advantage of it.
3) Demonstrate Java's greater potential with a discussion of selected non-WWW
applications for which Java provides unique capabilities.
Performance Programming for Scientific Computation
Larry Carter
UCSD and SDSC
Bowen Alpern
IBM's Watson Research Center
Performance programming is the design, writing, and tuning of
programs to sustain near-peak performance. This tutorial
will present a unified framework for understanding and overcoming
the bottlenecks to high performance. The goal of the course is
to make performance programming a science rather than a craft.
Development of high performance programs has always required an
acute sensitivity to details of processor and memory hierarchy
architecture. The advent of modern workstations and supercomputers
brings to the fore another concern: parallelism. The tutorial will
identify four requirements for attaining high performance at any
level of computation. General techniques for satisfying these
requirements can be applied to improve performance of the processor,
of the memory hierarchy, and of parallel processors.
Application of the techniques is illustrated with a variety of examples.
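As a flavor of the memory-hierarchy techniques such a course typically treats, the following sketch shows loop tiling (cache blocking) applied to matrix multiplication in C. It is an illustration only, not taken from the presenters' materials; the tile size BS is an assumed tuning parameter that would be chosen to match the target cache.

```c
/* Illustrative only: loop tiling (cache blocking) for matrix multiply.
 * BS is a hypothetical tile size; in practice it is tuned per cache level. */
#include <stddef.h>

#define BS 64

void matmul_tiled(size_t n, const double *A, const double *B, double *C)
{
    for (size_t ii = 0; ii < n; ii += BS)
        for (size_t kk = 0; kk < n; kk += BS)
            for (size_t jj = 0; jj < n; jj += BS)
                /* Work on one BS x BS tile so the operands stay cache-resident. */
                for (size_t i = ii; i < ii + BS && i < n; i++)
                    for (size_t k = kk; k < kk + BS && k < n; k++) {
                        double a = A[i * n + k];
                        for (size_t j = jj; j < jj + BS && j < n; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
}
```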
Memory Consistency Models for Shared-Memory Multiprocessors
Sarita V. Adve
Department of Electrical and Computer Engineering, Rice University
Kourosh Gharachorloo
Western Research Lab, Digital Equipment Corporation
A memory consistency model for a shared-memory system indicates how the
memory operations of a program will appear to execute to the programmer.
The most commonly assumed memory model is Lamport's sequential
consistency (SC) which requires a multiprocessor to appear like a
multiprogrammed uniprocessor. While SC provides a familiar interface for
programmers, it restricts the use of several common uniprocessor
hardware and compiler optimizations, thereby limiting performance. For
higher performance, alternate memory models have been proposed. These models,
however, present a more complex programming interface. Thus, the choice
of the memory model involves a difficult, but important tradeoff between
performance and ease-of-use. This tutorial will survey several currently
proposed memory models, place them within a common framework, and assess
them on the basis of their performance potential and ease-of-use.
The following specific topics will be covered:
- The problem of memory consistency models (including the difference from the cache coherence problem).
- Implementing sequential consistency.
- Alternative memory models (descriptions and performance benefits): (a) the system-centric approach, e.g., weak ordering, processor consistency, release consistency, and the models adopted by Digital, IBM, and Sun; (b) the programmer-centric approach, e.g., the data-race-free and properly-labeled methods for describing memory models.
- Interaction with other latency-hiding techniques (e.g., prefetching).
- More aggressive implementations of memory consistency models.
- Relaxed consistency models for software DSM systems (e.g., Munin, Treadmarks, Midway).
The tutorial will assume rudimentary knowledge of shared-memory multiprocessor
organization. It will cover both the basic problem in detail and advanced
issues that represent ongoing research.
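To make the basic problem concrete, the following sketch (not from the tutorial) shows the classic flag-and-data idiom. C11 atomics postdate this material and are assumed here purely as compact notation: under sequential consistency the two plain stores would already appear in program order, whereas under a relaxed model the release/acquire pair is what guarantees the consumer sees the data.

```c
/* Illustrative sketch: the flag/data publication idiom.
 * Under SC, plain stores suffice; under a relaxed model the
 * release/acquire pair below supplies the required ordering. */
#include <stdatomic.h>

int data;                 /* payload written by the producer */
atomic_int flag = 0;      /* synchronization flag */

void producer(void)
{
    data = 42;                                              /* ordinary store */
    atomic_store_explicit(&flag, 1, memory_order_release);  /* publish */
}

int consumer(void)
{
    while (atomic_load_explicit(&flag, memory_order_acquire) == 0)
        ;                    /* spin until the producer has published */
    return data;             /* guaranteed to observe 42 */
}
```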
An Intensive and Practical Introduction to the Message Passing Interface (MPI)
William C. Saphir
NASA Ames Research Center/MRJ
MPI has taken hold as the library of choice for portable high-performance message-passing applications. The complexity of MPI presents a short but steep learning curve. This tutorial provides a rapid introduction to MPI, motivating and describing its core functionality. It is suitable both for beginning users who want a rapid introduction to MPI and for intermediate users who want more than just the "how to". The approach is practical rather than comprehensive, explaining what is important, what is not, and why. The emphasis will be on obtaining high performance, differentiating between theoretical performance advantages and ones that make sense in the real world, and on avoiding common mistakes. There will be an opinionated discussion of datatypes, communication modes, topologies, and other advanced MPI features. The tutorial will describe techniques for obtaining high performance, illustrated with numerous examples. For users trying to choose between MPI and PVM, the tutorial will include a comparison of the two libraries and a discussion of porting. Finally, the tutorial will cover the most recent developments in MPI-2, including dynamic process management, one-sided communication, and I/O.
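For readers new to MPI, the core point-to-point functionality looks like the following minimal C example; it is a generic sketch rather than an excerpt from the tutorial.

```c
/* Minimal MPI sketch: rank 0 sends one integer to rank 1. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, value;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* dest 1, tag 0 */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}
```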
Applications of Web Technology and HPCC
Geoffrey Fox,
Wojtek Furmanski,
Nancy McCracken
Syracuse University
We discuss the role of HPCC and Web technologies in several applications
including health care, finance, education and the delivery of computing
services. We assume a knowledge of base Web concepts and summarize key
features of Java, VRML, and JavaScript, but do not give a tutorial on
these base technologies. We will illustrate the possibilities of HPCC
Web integration in these real world applications and the role of base
technologies and services.
Designing and Building Parallel Programs: An Introduction to Parallel Programming
Ian Foster
Argonne National Laboratory
Carl Kesselman
California Institute of Technology
Charles Koelbel
Rice University
In this tutorial, we provide a comprehensive introduction to the
techniques and tools used to write parallel programs. Our
goal is to communicate the practical information required by
scientists, engineers, and educators who need to write parallel
programs or to teach parallel programming. First, we introduce
principles of parallel program design, touching upon relevant topics
in architecture, algorithms, and performance modeling. Then,
we describe the parallel programming standards High Performance
Fortran and Message Passing Interface and the modern parallel language
Compositional C++. Finally, we introduce techniques for coupling
HPF and MPI, and the parallel Standard Template Library proposed
for HPC++. The tutorial is based on the textbook "Designing and
Building Parallel Programs" (Addison-Wesley, 1995), which is also
available in HTML format on the World Wide Web at
http://www.mcs.anl.gov/dbpp.
Interactive Visualization of Supercomputer Simulations
Michael E. Papka,
Terrence L. Disz,
Rick Stevens
Mathematics and Computer Science Division, Argonne National
Laboratory
Matthew Szymanski
University of Illinois at Chicago
This tutorial discusses the integration of interactive visualization
environments with supercomputers used for simulation of scientific
applications. The topics include an introduction to interactive
visualization technology (tracking, display systems, sound,
modeling), communication mechanisms (software and hardware) needed
for system integration, system performance, and the use of
multiple visualization systems. The experience of the presenters in
using the CAVE Automatic Virtual Environment (CAVE)
connected to an IBM SP machine will be used to illustrate the concepts.
The participants will gain knowledge of how to link massively
parallel supercomputing simulations with virtual environments for
display, interaction, and control. The tutorial will conclude
with a discussion of the critical performance points in the coupled
supercomputer and virtual environment experience.
Introduction to Effective Parallel Computing
Quentin F. Stout
University of Michigan
Marilynn Livingston
University of Oregon
This tutorial provides a comprehensive overview of parallel computing,
focusing on the aspects most relevant to the user. Throughout, the emphasis
is on the iterative process of converting a serial program into an
increasingly efficient, and correct, parallel program. The tutorial will help
people make intelligent planning decisions concerning parallel
computers, and help them develop efficient application codes for such
systems. It discusses hardware and software, with an emphasis on systems
that are now (or soon will be) commercially available. Program design
principles such as load balancing, communication reduction, and efficient
use of cache are illustrated through examples selected from
engineering, scientific, and database applications.
Parallel I/O on Highly Parallel Systems
Samuel Fineberg
Bill Nitzberg
NASA Ames Research Center
Typical scientific applications require vast amounts of processing
power coupled with significant I/O capacity. Highly parallel computer
systems provide floating point processing power at low cost, but efficiently
supporting a scientific workload also requires commensurate I/O performance.
In order to achieve high I/O performance, these systems utilize parallelism
in their I/O subsystems, supporting concurrent access to files by multiple
nodes of a parallel application and striping files across multiple disks.
However, obtaining maximum I/O performance can require significant programming
effort. This tutorial presents a comprehensive survey of the state-of-the-art
in parallel I/O from basic concepts to recent advances in the research community.
Requirements, interfaces, architectures, and performance are all illustrated
using concrete examples from commercial offerings (Convex Exemplar, Cray T3E,
IBM SP2, Intel Paragon, Meiko CS-2, and high-end workstation
clusters) as well as academic and research projects (CHARISMA, Panda,
PASSION/VIP-FS, PIOUS, PPFS, and Vesta) and the emerging MPI-IO standard.
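As an indication of what the MPI-IO interface mentioned above looks like in practice, here is a hedged C sketch in which every rank collectively writes one contiguous block of a shared file; the file name and block size are assumptions made only for the illustration.

```c
/* Sketch of collective parallel I/O with MPI-IO: each rank writes its own
 * contiguous block of a shared file.  "out.dat" and N are assumed values. */
#include <mpi.h>

#define N 1024   /* doubles per rank, chosen only for illustration */

int main(int argc, char **argv)
{
    int rank;
    double buf[N];
    MPI_File fh;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int i = 0; i < N; i++)
        buf[i] = rank;   /* fill the local block with recognizable data */

    MPI_File_open(MPI_COMM_WORLD, "out.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Each rank writes at its own byte offset; the call is collective. */
    MPI_File_write_at_all(fh, (MPI_Offset)rank * N * sizeof(double),
                          buf, N, MPI_DOUBLE, &status);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
```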
Hot Chips for High Performance Computing
Subhash Saini
David H. Bailey
NASA Ames Research Center
We will discuss several current CMOS based processors: the DEC Alpha 21164,
which is used in the Cray T3E; the MIPS R10000, which is used in the SGI Power
Challenge; the Intel Pentium Pro Processor (P6), which is used in the
first DOE ASCI system; the PowerPC 604, which is used in an IBM SMP
system; the HP PA-RISC 7200/8000, which is used in the Convex Exemplar
SPP1600; an NEC proprietary processor, which is used in the NEC SX-4; a
Fujitsu proprietary processor, which is used in the Fujitsu VPP700; a
Hitachi proprietary processor, which is used in the Hitachi SR2201. The
architectures of these microprocessors will first be presented, followed
by a description of supercomputers based on these processors. The
performance of various hardware/programming model (HPF and MPI) combinations
will then be compared, based on the latest NAS Parallel Benchmark results,
thus providing a cross-machine and cross-model comparison. The tutorial
will conclude with a discussion of general trends in the field of
high performance computing, including future directions in hardware and
software technology as we achieve Tflop/s performance levels and press on
to Pflop/s levels in the next decade.
Tuning MPI Applications for Peak Performance
William Gropp
Rusty Lusk
Argonne National Laboratory
MPI is now widely accepted as a standard for message-passing parallel
computing libraries. Both applications and important benchmarks are
being ported from other message-passing libraries to MPI. In most
cases it is possible to make a translation in a fairly straightforward
way, preserving the semantics of the original program. On the other
hand, MPI provides many opportunities for increasing the performance
of parallel applications by the use of some of its more advanced
features, and straightforward translations of existing programs might
not take advantage of these features. New parallel applications are also
being written in MPI, and an understanding of performance-critical
issues for message-passing programs, along with an explanation of how
to address these using MPI, can help the application programmer deliver
a greater percentage of the hardware's peak performance
to the application. This tutorial will discuss
performance-critical issues in message passing programs, explain how to examine
the performance of an application using MPI-oriented tools, and show how
the features of MPI can be used to attain peak application performance.
It will be assumed that attendees have an understanding of the basic elements
of the MPI specification. Experience with message-passing parallel applications
will be helpful but not required.
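One representative tuning technique of the kind this tutorial addresses is overlapping communication with computation via nonblocking operations. The halo-exchange sketch below is a generic illustration, not an example from the presenters; the neighbor ranks and buffer lengths are assumed parameters.

```c
/* Sketch: overlap a halo exchange with interior computation by using
 * nonblocking MPI calls instead of blocking MPI_Send/MPI_Recv. */
#include <mpi.h>

void exchange_and_compute(double *halo_in, double *halo_out, int n,
                          int left, int right, MPI_Comm comm)
{
    MPI_Request reqs[2];
    MPI_Status  stats[2];

    /* Post the receive and the send as early as possible. */
    MPI_Irecv(halo_in,  n, MPI_DOUBLE, left,  0, comm, &reqs[0]);
    MPI_Isend(halo_out, n, MPI_DOUBLE, right, 0, comm, &reqs[1]);

    /* ... interior work that does not need the halo proceeds here ... */

    /* Wait only when the boundary values are actually required. */
    MPI_Waitall(2, reqs, stats);
}
```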
Reinventing The Supercomputer Center
William Kramer
William McCurdy
Horst Simon
Lawrence Berkeley National Laboratory
This is a time of change for the nation's large-scale computing centers. Supercomputing sites have to deal with industry consolidations, decreasing budgets, changing and expanding missions, site consolidation, and new technical challenges. This tutorial covers the experiences and many practical lessons from large-scale computing sites that have gone through dramatic changes. These include understanding the reasons requiring change; the changing role of supercomputing; organizational decisions; recruiting and staffing; managing client expectations and interactions; setting and measuring new goals and expectations; and all the details of creating a new facility.