S1 A Practical Guide to Java
Ian G. Angus, Boeing Information and Support Services
Level: 50% Beginner, 25% Intermediate, 25% Advanced
In the span of just one year, Java has risen from an experimental language to become the Internet programming language du jour. The question naturally arises: what is Java and how is it used? In this tutorial we will go beyond the hype. We will: introduce the Java programming language and explain the concepts (and buzzwords) that define it, with careful coverage of both its strengths and weaknesses; show with simple examples how Java can be used on the WWW and how you can take advantage of it; and demonstrate Java's greater potential with a discussion of selected non-WWW applications for which Java provides unique capabilities.

S2 Understanding and Developing Virtual Reality Systems
Henry A. Sowizral, Sun Microsystems
Level: 50% Beginner, 30% Intermediate, 20% Advanced
This tutorial provides an in-depth look at virtual reality (VR) systems and their construction. It introduces virtual reality with an overview of how a VR system operates, a brief history, and videos showing a collection of VR applications in operation. It continues with a survey of the component parts of a VR system, both hardware and software, including information on how those components operate and pointers to suppliers of various products. The tutorial then delves into the many topics involved in making the VR experience more "real," such as correcting for errors introduced by the display's optical pathway, correcting for tracker errors and lag, using the graphics hardware most effectively, handling scene complexity and inserting an egocentric human model (avatar) into the scene. The tutorial concludes with a description of augmented environments and their operation.

S3 An Intensive and Practical Introduction to the Message Passing Interface (MPI)
William Saphir, NASA Ames Research Center/MRJ
Level: 15% Beginner, 55% Intermediate, 30% Advanced
MPI has taken hold as the library of choice for portable, high-performance message-passing applications. The complexity of MPI presents a short but steep learning curve. This tutorial provides a rapid introduction to MPI, motivating and describing its core functionality. It is suitable both for beginning users who want a rapid introduction to MPI and for intermediate users who want more than just the "how to." The approach is practical rather than comprehensive, explaining what is important, what is not and why. The emphasis will be on obtaining high performance, on differentiating between theoretical performance advantages and those that matter in the real world, and on avoiding common mistakes. There will be an opinionated discussion of datatypes, communication modes, topologies and other advanced MPI features. The tutorial will describe techniques for obtaining high performance, illustrated with numerous examples. For users trying to choose between MPI and PVM, the tutorial will include a comparison of the two libraries and a discussion of porting. It will also cover the most recent developments in MPI-2, including dynamic process management, one-sided communication and I/O.
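
To give a concrete flavor of the core functionality such an introduction covers, here is a minimal sketch of MPI point-to-point messaging in C (illustrative only, not drawn from the tutorial materials): rank 0 sends one integer to rank 1.

    /* Minimal MPI point-to-point example.  Compile with an MPI
       wrapper such as mpicc and run with at least two processes. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size, value;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0 && size > 1) {
            value = 42;
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
            printf("rank 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }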

S4 Message-Passing Programming for Scientists and Engineers
Cherri M. Pancake, Oregon State University
Hugh M. Caffey, Hewlett-Packard Company
Level: 70% Beginner, 30% Intermediate
In this tutorial, the principles of parallel programming in a message-passing environment will be introduced in terms that make sense to non-computer scientists. Emphasis will be on practical information, with a series of example programs used to guide newcomers through the important stages in writing and tuning message-passing codes. The tutorial will not address details of parallel architectures, algorithms or theoretical models. Instead, it will offer a minimal-trauma introduction to the issues at stake in deciding whether or not to parallelize an application, basic approaches to adding parallelism, and techniques for debugging, evaluating and tuning parallel programs.

S5 High Performance Fortran in Practice
Charles Koelbel, Rice University
Level: 30% Beginner, 50% Intermediate, 20% Advanced
High Performance Fortran (HPF) was defined in 1993 to provide a portable syntax for expressing data-parallel computations in Fortran. A major revision of HPF (termed HPF 2.0) will be completed by SC'96. Since the appearance of the High Performance Fortran Language Specification (available as an issue of Scientific Programming and by ftp, gopher and WWW), several commercial compilers have appeared. There has also been great interest in HPF as a language for efficient parallel computation. The purpose of this tutorial is three-fold:
1. To introduce programmers to the most important features of HPF 2.0.
2. To illustrate how these features can be used in practice on algorithms for scientific computation.
3. To inform users of the future direction of HPF, including recommended extensions to HPF 2.0 in the areas of advanced data mapping, task parallelism and external interfaces.
The tutorial will both broaden the appeal of HPF and help users achieve its maximum potential.

S6 The Science and Practice of Supercomputing Benchmarking
Aad J. van der Steen, University of Utrecht
Level: 35% Beginner, 50% Intermediate, 15% Advanced
This tutorial presents a scientific approach to benchmarking and follows the methodology of the first "Parkbench" committee report. It defines a clear set of units and symbols, followed by a carefully defined set of performance parameters and metrics and, finally, a hierarchy of parallel benchmarks to measure them. A new theory of performance scaling is presented through the concept of "computational similarity," which allows the scaling of an application for all computers and all problem sizes to be represented in a single dimensionless diagram. Benchmarking practice covers the general principles of properly setting up benchmarks, assessing their results and relating them to other techniques such as simulation and machine modeling. Results on current machines such as the Cray T90, Cray T3E, Hitachi SR2201, HP/Convex SPP-1600, Fujitsu VPP700 and NEC SX-4 will be discussed.

S7 Performance Programming for Scientific Computation
Bowen Alpern, IBM T. J. Watson Research Center
Larry Carter, University of California at San Diego
Level: 30% Beginner, 50% Intermediate, 20% Advanced
Performance programming is the design, writing and tuning of programs to sustain near-peak performance. This tutorial will present a unified framework for understanding and overcoming the bottlenecks to high performance. The goal of the course is to make performance programming a science rather than a craft. Development of high-performance programs has always required an acute sensitivity to details of processor and memory hierarchy architecture. The advent of modern workstations and supercomputers brings to the fore another concern: parallelism. The tutorial will identify four requirements for attaining high performance at any level of computation. General techniques for satisfying these requirements can be applied to improve performance of the processor, of the memory hierarchy and of parallel processors. Applications of these techniques are illustrated with a variety of examples.
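
As one illustration of the kind of memory-hierarchy technique such a framework addresses, consider loop blocking (tiling). The sketch below is ours, not the presenters', and N and B are illustrative values; it restructures a matrix multiply in C so that B x B sub-blocks stay cache-resident while they are reused:

    #include <stdio.h>

    #define N 512
    #define B 64                    /* block size; tune to the cache */

    static double a[N][N], b[N][N], c[N][N];

    int main(void)
    {
        /* fill the operands with sample data */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) {
                a[i][j] = 1.0;
                b[i][j] = 2.0;
            }

        /* blocked multiply: each B x B sub-block is reused while it
           is still resident in cache, instead of streaming whole rows
           and columns through memory on every pass */
        for (int ii = 0; ii < N; ii += B)
            for (int kk = 0; kk < N; kk += B)
                for (int jj = 0; jj < N; jj += B)
                    for (int i = ii; i < ii + B; i++)
                        for (int k = kk; k < kk + B; k++)
                            for (int j = jj; j < jj + B; j++)
                                c[i][j] += a[i][k] * b[k][j];

        printf("c[0][0] = %.1f\n", c[0][0]);  /* expect 2.0 * N = 1024.0 */
        return 0;
    }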

S8 Memory Consistency Models for Shared-Memory Multiprocessors
Sarita V. Adve, Rice University
Kourosh Gharachorloo, Digital Equipment Corporation
Level: 50% Beginner, 30% Intermediate, 20% Advanced
A memory consistency model for a shared-memory system specifies how the memory operations of a program will appear to execute to the programmer. The most commonly assumed memory model is Lamport's sequential consistency (SC), which requires a multiprocessor to appear like a multiprogrammed uniprocessor. While SC provides a familiar interface for programmers, it restricts the use of several common uniprocessor hardware and compiler optimizations, thereby limiting performance. For higher performance, alternative memory models have been proposed. These models, however, present a more complex programming interface. Thus, the choice of memory model involves a difficult but important tradeoff between performance and ease of use. This tutorial will survey several currently proposed memory models, place them within a common framework and assess them on the basis of their performance potential and ease of use. We will cover: the problem of memory consistency models; implementing sequential consistency; alternative memory models, including models adopted by Digital, IBM and Sun; interaction with other latency-hiding techniques; more aggressive implementations of memory consistency models; and relaxed consistency models for software DSM systems (e.g., Munin, TreadMarks, Midway). The tutorial will assume rudimentary knowledge of shared-memory multiprocessor organization. It will cover both the basic problem in detail and advanced issues that represent ongoing research.
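
The classic "store buffering" test gives a flavor of what is at stake. The sketch below is ours, written in later C11 atomics notation purely for concreteness, not material from the tutorial: under sequential consistency at most one of r1 and r2 can end up 0, while under the relaxed ordering shown both can.

    /* Store-buffering litmus test in C11 (requires a C11 compiler
       and library with <threads.h> support). */
    #include <stdio.h>
    #include <threads.h>
    #include <stdatomic.h>

    atomic_int x = 0, y = 0;
    int r1, r2;   /* each written by one thread, read after join */

    int thread_a(void *arg)
    {
        (void)arg;
        atomic_store_explicit(&x, 1, memory_order_relaxed);
        r1 = atomic_load_explicit(&y, memory_order_relaxed);
        return 0;
    }

    int thread_b(void *arg)
    {
        (void)arg;
        atomic_store_explicit(&y, 1, memory_order_relaxed);
        r2 = atomic_load_explicit(&x, memory_order_relaxed);
        return 0;
    }

    int main(void)
    {
        thrd_t t1, t2;
        thrd_create(&t1, thread_a, NULL);
        thrd_create(&t2, thread_b, NULL);
        thrd_join(t1, NULL);
        thrd_join(t2, NULL);
        /* r1 == 0 && r2 == 0 is possible here; it would be forbidden
           if every access used memory_order_seq_cst instead. */
        printf("r1=%d r2=%d\n", r1, r2);
        return 0;
    }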

M1 Interactive Visualization of Supercomputer Simulations
Terry Disz, Michael Papka, Rick Stevens, Argonne National Laboratory; Matthew Szymanski, University of Illinois at Chicago
Level: 100% Intermediate
This tutorial discusses the integration of interactive visualization environments with supercomputers used for simulation of scientific applications. The topics include an introduction to interactive visualization technology (tracking, display systems, sound, modeling), communication mechanisms (software and hardware) needed for system integration, system performance and the use of multiple visualization systems. The presenters' experience using the CAVE Automatic Virtual Environment (CAVE) connected to an IBM SP machine will be used to illustrate concepts. Participants will gain knowledge of how to link massively parallel supercomputing simulations with virtual environments for display, interaction and control. The tutorial will conclude with a discussion of the critical performance points in the coupled supercomputer/virtual environment experience.

M2 Tuning MPI Applications for Peak Performance
William Gropp, Rusty Lusk, Argonne National Laboratory
Level: 50% Intermediate, 50% Advanced
MPI is now widely accepted as a standard for message-passing parallel computing libraries. Both applications and important benchmarks are being ported from other message-passing libraries to MPI. In most cases the translation can be made in a fairly straightforward way, preserving the semantics of the original program. On the other hand, MPI provides many opportunities for increasing the performance of parallel applications through its more advanced features, and straightforward translations of existing programs might not take advantage of them. New parallel applications are also being written in MPI, and an understanding of performance-critical issues for message-passing programs, along with an explanation of how to address them using MPI, can enable the applications programmer to deliver a greater percentage of the hardware's peak performance to the application. This tutorial will discuss performance-critical issues in message-passing programs, explain how to examine the performance of an application using MPI-oriented tools and show how the features of MPI can be used to attain peak application performance. We assume attendees will have an understanding of the basic elements of MPI. Experience with message-passing parallel applications will be helpful but not required.
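
One example of such a feature is nonblocking communication. In this illustrative ring-shift sketch (ours, not the presenters'), MPI_Irecv and MPI_Isend are posted early so that independent computation can overlap the transfer:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size, left, right, sendval, recvval;
        MPI_Request reqs[2];
        MPI_Status stats[2];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        left  = (rank + size - 1) % size;      /* ring neighbors */
        right = (rank + 1) % size;

        sendval = rank;
        /* post the receive and send immediately, without blocking */
        MPI_Irecv(&recvval, 1, MPI_INT, left,  0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(&sendval, 1, MPI_INT, right, 0, MPI_COMM_WORLD, &reqs[1]);

        /* ... computation that does not need recvval can run here,
           overlapping the communication ... */

        MPI_Waitall(2, reqs, stats);
        printf("rank %d received %d from rank %d\n", rank, recvval, left);
        MPI_Finalize();
        return 0;
    }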

M3 Reinventing the Supercomputer Center
William Kramer, William McCurdy, Horst Simon, Lawrence Berkeley National Laboratory
Level: 50% Intermediate, 50% Advanced
This is a time of change for the nation's large-scale computing centers. Supercomputing sites have to deal with industry consolidations, decreasing budgets, changing and expanding missions, consolidation of sites and new technical challenges. This tutorial covers the experience of, and many practical lessons from, large-scale computing sites that have gone through dramatic changes. These include understanding the reasons requiring change, the changing roles of supercomputing, organizational decisions, recruiting and staffing, managing client expectations and interactions, setting and measuring new goals, and all the details of creating a new facility.

M4 Introduction to Effective Parallel Computing
Marilynn Livingston, Southern Illinois University; Quentin F. Stout, University of Michigan
Level: 50% Beginner, 50% Intermediate
This tutorial provides a comprehensive overview of parallel computing, focusing on the aspects most relevant to the user. Throughout, the emphasis is on the iterative process of converting a serial program into a correct and increasingly efficient parallel program. The tutorial will help people make intelligent planning decisions concerning parallel computers and help them develop efficient application codes for such systems. It discusses hardware and software, with an emphasis on systems that are now (or soon will be) commercially available. Program design principles such as load balancing, communication reduction and efficient use of cache are illustrated through examples selected from engineering, scientific and database applications.

M5 Parallel I/O on Highly Parallel Systems
Samuel Fineberg, Bill Nitzberg, NASA Ames Research Center
Level: 30% Beginner, 60% Intermediate, 10% Advanced
Typical scientific applications require vast amounts of processing power coupled with significant I/O capacity. Highly parallel computer systems provide floating-point processing power at low cost, but efficiently supporting a scientific workload also requires commensurate I/O performance. To achieve high I/O performance, these systems exploit parallelism in their I/O subsystems, supporting concurrent access to files by multiple nodes of a parallel application and striping files across multiple disks. Obtaining maximum I/O performance can, however, require significant programming effort. This tutorial presents a comprehensive survey of the state of the art in parallel I/O, from basic concepts to recent advances in the research community. Requirements, interfaces, architectures and performance are all illustrated using concrete examples from commercial offerings (Convex Exemplar, Cray T3E, IBM SP2, Intel Paragon, Meiko CS-2 and high-end workstation clusters), as well as academic and research projects (CHARISMA, Panda, PASSION/VIP-FS, PIOUS, PPFS and Vesta) and the emerging MPI-IO standard.
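
As a flavor of that emerging MPI-IO interface, here is a minimal sketch (ours and illustrative only; the file name and sizes are invented) in which each rank writes its own block of a shared file at a disjoint offset, so that all ranks perform I/O in parallel:

    #include <mpi.h>

    #define COUNT 1024

    int main(int argc, char **argv)
    {
        int rank;
        double buf[COUNT];
        MPI_File fh;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        for (int i = 0; i < COUNT; i++)
            buf[i] = rank;                    /* fill with sample data */

        /* all ranks open the same file collectively */
        MPI_File_open(MPI_COMM_WORLD, "out.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY,
                      MPI_INFO_NULL, &fh);
        /* each rank writes COUNT doubles at its own disjoint offset */
        MPI_File_write_at(fh, (MPI_Offset)rank * COUNT * sizeof(double),
                          buf, COUNT, MPI_DOUBLE, &status);
        MPI_File_close(&fh);

        MPI_Finalize();
        return 0;
    }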

M6 Hot Chips for High Performance Computing
Subhash Saini, David H. Bailey, NASA Ames Research Center
Level: 25% Beginner, 50% Intermediate, 25% Advanced
We will discuss several current CMOS-based processors: the DEC Alpha 21164 (used in the Cray T3E); the MIPS R10000 (used in the SGI Power Challenge); the Intel Pentium Pro processor (used in the first DOE ASCI system); the PowerPC 604 (used in an IBM SMP system); the HP PA-RISC 7200 (used in the Convex Exemplar SPP1600); an NEC proprietary processor (used in the NEC SX-4); a Fujitsu proprietary processor (used in the new Fujitsu VPP700); and a Hitachi proprietary processor (used in the Hitachi SR2201). The architecture of these microprocessors will be presented, followed by a description of supercomputers based on them. The performance of the various hardware/programming-model combinations supported on these systems (e.g., HPF and MPI) will then be compared, based on the latest NAS Parallel Benchmark results, providing a cross-machine and cross-model comparison. The tutorial will conclude with a discussion of general trends in high performance computing, including future directions in hardware and software technology as we achieve Tflop/s performance levels and press on to Pflop/s levels in the next decade.

M7 Designing and Building Parallel Programs: An Introduction to Parallel Programming
Ian Foster, Argonne National Laboratory; Carl Kesselman, California Institute of Technology; Charles Koelbel, Rice University
Level: 60% Introductory, 40% Intermediate
In this tutorial, we provide a comprehensive introduction to the techniques and tools used to write parallel programs. Our goal is to communicate the practical information required by scientists, engineers and educators who need to write parallel programs or to teach parallel programming. First, we introduce principles of parallel program design, touching upon relevant topics in architecture, algorithms and performance modeling. Then, we describe the parallel programming standards High Performance Fortran and Message Passing Interface and the modern parallel language Compositional C++. Finally, we introduce techniques for coupling HPF and MPI, and the parallel Standard Template Library proposed for HPC++. The tutorial is based on the textbook Designing and Building Parallel Programs (Addison-Wesley, 1995), also available in HTML format on the WWW at http://www.mcs.anl.gov/dbpp.

M8 Applications of Web Technology and HPCC
Geoffrey Fox, Wojtek Furmanski, Nancy McCracken, Syracuse University
Level: 100% Intermediate
We discuss the role of HPCC and Web technologies in several applications, including health care, finance, education and the delivery of computing services. We assume a knowledge of base Web concepts and summarize key features of Java, VRML and JavaScript, but do not give a tutorial on these base technologies. We will illustrate the possibilities of HPCC/Web integration in these real-world applications and the role of base technologies and services.