A Spectral Element Ocean Model on the Cray T3D:
the Mediterranean Sea General Circulation Case

Roberto Ansaloni (1), Anne Molcard(2)
(1)Cray Research S.r.l., (2) Institute for the study of Geophysical Environmental Methodologies (IMGA-CNR)

A new numerical model, SEOM (Spectral Element Ocean Model), is used to study the general circulation of the Mediterranean Sea. Spectral element methods combine the geometric flexibility of finite element techniques with the rapid convergence rate of spectral schemes. The current version solves the shallow water equations. The domain decomposition approach makes it possible to exploit the power of parallel machines, owing to the large inter-element computational complexity. The original MIMD master/slave version of SEOM, written in Fortran 90 and PVM, has been ported to the Cray T3D. Where critical for performance, Cray-specific high-performance one-sided communication routines (SHMEM) have been adopted to fully exploit the Cray T3D interprocessor network. Tests performed with highly unstructured and irregular grids, on up to 128 processors, show almost linear scalability even with unoptimized domain decomposition techniques. Results from one-year simulations of the Mediterranean Sea are shown for realistic bottom and coastline geometry.

Roberto Ansaloni (Cray Research)
e-mail: roberto.ansaloni@cray.com
Phone: +39-51-6170132

Anne Molcard (IMGA-CNR)
e-mail: anne@carmen.bo.cnr.it
Phone: +39-59-362388

Message-passing Implementation of a 3-D Thin-Layer Navier-Stokes Program on the Cray J90 and T3E

by Dennis Morrow and Veer Vatsa     

The program solves the three-dimensional, time-dependent, thin-layer Navier-Stokes equations on a structured grid. A typical use is modeling an aircraft wing. The code includes a serial version and two message-passing versions, one written in PVM and the other in MPI. An overview of the program developed to date by Dr. Vatsa is documented on the web for the IBM SP2 and for a heterogeneous cluster of workstations:

                    http://hpccp-www.larc.nasa.gov:80/~dana/t1.html

This poster describes a third message-passing implementation using SHMEM in a homogeneous environment on the Cray J932 and Cray T3E. Inter-processor communication and its impact on load balance are presented for both machines.

Both the parallel-vector and message-passing implementations of the code are compared on the J932 in dedicated and production environments. Plots of performance vs machine size, speedup, and solution time are presented for both the J932 and the T3E for various problem sizes.

Ninf: A Network based Information Library for Global World-Wide Computing Infrastructure

Mitsuhisa Sato
Real World Computing Partnership
Hidemoto Nakada
Electrotechnical Laboratory
Satoshi Sekiguchi
Electrotechnical Laboratory
Satoshi Matsuoka
The University of Tokyo
Umpei Nagashima
Ochanomizu University
Hiromitsu Takagi
Nagoya Institute of Technology

Ninf is an ongoing global network-wide computing infrastructure project that gives users access, through an easy-to-use interface, to computational resources distributed across a wide area network, including hardware, software and scientific data. Ninf is intended not only to exploit high performance in network parallel computing, but also to provide high-quality numerical computation services and access to scientific databases published by other researchers. Computational resources are shared as Ninf remote libraries executable at a remote Ninf server. Users can build an application by calling the libraries with the Ninf Remote Procedure Call, which is designed to provide a programming interface similar to conventional function calls in existing languages, and is tailored for scientific computation. To facilitate location transparency and network-wide parallelism, the Ninf metaserver maintains global resource information about computational servers and databases, allocating and scheduling coarse-grained computation to achieve good global load balancing. Ninf also interfaces with existing network services such as the WWW for easy accessibility.

Mitsuhisa Sato
Real World Computing Partnership
Mitsui Blg 16F, 1-6-1 Takezono, Tsukuba, Ibaraki 305, Japan
(TEL: +81-298-53-1663, FAX: +81-298-53-1652)
msato@trc.rwcp.or.jp
http://www.rwcp.or.jp/people/msato

WebSubmit - Running Supercomputer Applications Via the Web

Robert R. Lipman, Judith E. Devaney
National Institute of Standards and Technology
Information Technology Laboratory
Gaithersburg, Maryland 20899

WebSubmit enables users to run applications via the Web. The initial goal of WebSubmit is to make it easier for users to run applications on supercomputers. This is accomplished by creating a web page interface to the application on the supercomputer. The first implementation of WebSubmit is for running Gaussian 94, a molecular dynamics program, on an IBM SP2. The user enters input on a web page form to submit a Gaussian 94 job to the SP2. The status of the job may be monitored from the web page in addition to using other utility functions. Additional implementations of WebSubmit will cover using the LoadLeveler on the SP2 and other applications and hardware platforms. All of the web pages use CGI scripts written in Tcl. For more information, see: http://www.nist.gov/itl/div887/sasg/gauss/

Robert Lipman
NIST
Bldg 225, Rm B122
Gaithersburg, MD 20899

(301) 975-3829, robert.lipman@nist.gov

SecureNet -- A Wide Area/Local Area Network for ASCI

Pete Dean
Sandia National Laboratories

This poster exhibit presents work being done to provide a wide/ local area network that will enable uniform, transparent, and efficient distributed classified and unclassified computing among the three defense program laboratories. This network, which has received accreditation for transporting classified information, uses the Energy Systems Network (ESNet) as the wide area network, together with end-to-end encryptors and Kerberos authentication to provide the classified services. The technical challenges stem from the levels of performance, security, and services that will be required from the network to support the ASCI effort.

Pete Dean
Information Processes Center
Advanced Network Integration Department
Sandia National Laboratories
Dept. 4616 MS 0806
Albuquerque NM 87185-5800
email: pwdean@sandia.gov

Scalable ATM Encryption

Lyndon Pierson
Sandia National Laboratories

This poster exhibit presents work being done to assure that super-high speed encryption can be implemented to satisfy ASCI objectives, and to assure that these high speed implementations can be made to interoperate with slower speed, less expensive encryption implementations through the rate adaptation provided by ATM "Variable Bit Rate" (VBR) services. ATM end-to-end encryption devices must maintain separate encryption contexts and keys for each encrypted virtual circuit. The first few ATM encryption prototypes have demonstrated the feasibility of this concept, and have explored some of the difficulties of key management and crypto synchronization in this key-agile environment. Following these prototypes, the first few ATM encryption products are now beginning to be marketed. Integration of innovative methods of scaling encryption speed with techniques for key management, crypto synchronization, and key agility is required to make ATM encryption viable at OC-48 (2.4 Gb/s) and higher. Development of this technology will enable national defense applications requiring the secure exchange of massive amounts of data between widely separated sites.

Lyndon Pierson
Information Processes Center
Advanced Network Integration Department
Sandia National Laboratories
Dept. 4616 MS 0806
Albuquerque NM 87185-5800
email: lgpiers@sandia.gov

ISIS++: Iterative Scalable Implicit Solver (in C++)

Robert L. Clay and Alan B. Williams
Sandia National Laboratories

ISIS++ is an object-oriented framework for solving sparse linear systems of equations. Though it was developed to solve systems of equations originating from large-scale, 3-D finite element analyses, it has applications in many other fields. A key feature of ISIS++ is the simple interchangeability of components, both from within the ISIS++ system and from other packages. This framework facilitates integrating components from various libraries, in particular the matrix-vector functional units and data structures. The advantages of this approach include the ability to leverage existing work in the field. The library can be built using the matrix-vector implementation best suited to the task and compute system at hand, with no changes to the solver or preconditioner source. As a result, ISIS++ is transparently portable across a wide range of computer architectures, from desktop PCs to MPP supercomputers.
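The idea of swapping the matrix-vector implementation without touching the solver source can be illustrated with a minimal Python sketch. This is not the ISIS++ API (ISIS++ is a C++ framework); the class names DenseMatrix and TridiagonalMatrix, the matvec method and the conjugate_gradient function are all hypothetical names chosen for illustration, and conjugate gradients is used only as a familiar example of an iterative solver.

```python
class DenseMatrix:
    """Hypothetical matrix-vector implementation storing every entry."""

    def __init__(self, rows):
        self.rows = [[float(v) for v in row] for row in rows]

    def matvec(self, x):
        return [sum(a * xi for a, xi in zip(row, x)) for row in self.rows]


class TridiagonalMatrix:
    """Second hypothetical implementation storing only three diagonals."""

    def __init__(self, lower, diag, upper):
        self.lower, self.diag, self.upper = list(lower), list(diag), list(upper)

    def matvec(self, x):
        n = len(self.diag)
        y = [self.diag[i] * x[i] for i in range(n)]
        for i in range(n - 1):
            y[i + 1] += self.lower[i] * x[i]
            y[i] += self.upper[i] * x[i + 1]
        return y


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(matrix, b, tol=1e-12, max_iter=1000):
    """Solve matrix * x = b for a symmetric positive definite matrix.

    The solver only ever calls matrix.matvec, so any object providing
    that method can be plugged in without changing this function.
    """
    x = [0.0] * len(b)
    r = [float(v) for v in b]  # residual b - A*x for the initial guess x = 0
    p = list(r)
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 <= tol:
            break
        ap = matrix.matvec(p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# The same solver source runs with either storage scheme.
dense = DenseMatrix([[2, -1, 0], [-1, 2, -1], [0, -1, 2]])
tri = TridiagonalMatrix([-1, -1], [2, 2, 2], [-1, -1])
x_dense = conjugate_gradient(dense, [1, 0, 1])
x_tri = conjugate_gradient(tri, [1, 0, 1])
```

Both calls solve the same 3 by 3 system; only the object passed as the matrix differs.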

Robert L. Clay
MS 9011
Sandia National Laboratories
Livermore, CA 94550
Phone: (510) 294 2193
email: rlclay@ca.sandia.gov

Modeling Groundwater Flow Through Heterogeneous Porous Media on Massively Parallel Computers

S. Ashby, C. Baldwin, W. Bosl, R. Falgout, R. Maxwell, J. Murphy,
N. Rosenberg, D. Shumaker, C. San Soucie, S. Smith, A. Tompson
Lawrence Livermore National Laboratory
International Technology Corporation

This poster and video presentation will describe a multidisciplinary effort to develop a sophisticated simulation code for modeling multiphase flow and multicomponent transport through three-dimensional heterogeneous porous media. The simulator includes scalable subsurface modeling capabilities, a fast flow solver, and accurate component transport schemes. In particular, we employ grid-independent conceptual models and use geostatistical techniques to reproduce fine-scale heterogeneities. Fluid flow velocities are calculated via a scalable and fast multigrid algorithm. We offer the user the choice of a highly accurate Godunov procedure for advective transport or a PIC code for advective-diffusive transport coupled with reactive effects. The simulator runs on a variety of computing platforms, ranging from a single workstation to massively parallel computers. We will show a video highlighting our efforts to model several complex real-world sites that present many computational challenges. These include site-scale modeling to analyze various pumping strategies for remediation efforts and regional-scale modeling to study water resource management issues. The sites to be modeled are large and/or need to be highly resolved, resulting in problems having 8M computational zones. The sites also have complex geometries and boundary conditions, varying degrees of subsurface heterogeneity, and need sophisticated pumping strategies. We will demonstrate the scalability of the ParFlow simulator on a 256-node CRAY T3D.

Steven Ashby
Center for Applied Scientific Computing
Lawrence Livermore National Laboratory
P.O. Box 808, L-561
Livermore, CA 94551 USA
phone: 510-423-2462
fax: 510-422-6675
email: sfashby@llnl.gov

PCP: A Parallel Programming Paradigm that Spans Uniprocessor, Shared Memory and Distributed Memory Architectures.

Eugene D. Brooks III and Karen H. Warren

Lawrence Livermore National Laboratory
Livermore, California 94550

Practitioners of high performance computing have faced the rather simple problem of effectively utilizing vector processing architecture for the solution of scientific problems. We currently have a much more diverse set of architectures to exploit simultaneously. The problem of efficiently targeting this dissimilar set of computer architectures has had poor solutions if any at all.

We have assembled ideas from several sources to create a parallel extension of ANSI C that can be used efficiently on a wide range of architectures. The design goal of the parallel programming model is to achieve reducibility on simpler architectural targets as we move up the evolutionary chain of architecture complexity. PCP is a relatively simple programming language that allows the user explicit control of both data placement and communication in a shared address space. It offers both loop parallelism and task parallelism via processor teams.

Eugene D. Brooks III
Lawrence Livermore National Laboratory
(510) 423-7341, brooks3@llnl.gov

Parallelization Issues in a Multi-Physics, Multi-Platform, 3D Finite Element ALE Code

Rob Neely, Bob Corey, Evi Dube, Scott Futral, Juliana Hsu, Jim Maltby
Rose McCallen, Al Nichols, Ivan Otero, Tim Pierce, Richard Sharp
Lawrence Livermore National Laboratory

This poster describes our approach to the parallelization of ALE3D, a general purpose 3D finite element code incorporating Arbitrary Lagrangian-Eulerian (ALE) continuum mechanics, explicit and implicit time integration, slide surfaces, coupled heat transfer, and chemical transport. The parallel design is driven by the requirement of having a single portable code that runs efficiently on architectures ranging from workstations to SMPs to MPPs, and combinations thereof. We discuss various issues encountered during the parallelization of each of the major packages listed above, and describe a design that allows us to efficiently take advantage of architectures which combine distributed and shared memory environments. A scheme to parallelize slide surfaces is presented that provides good dynamic load balancing while the problem geometry changes continuously. Preliminary performance results showing both execution speedups and solvable problem sizes on several parallel architectures will also be presented.

Rob Neely
L-170
PO Box 808
Lawrence Livermore National Laboratory
Livermore, CA 94550
rneely@llnl.gov
phone: (510)423-4243
fax: (510)422-3389

Parallelization of a Three-Dimensional,
Unstructured-Mesh Astrophysics Code

R.J. Procassini, P.A.K. Amala, D.S. Miller, P.F. Nowak and W.G. Eme
Lawrence Livermore National Laboratory
University of California

The computational capability required for production modeling of complex astrophysical systems in three spatial dimensions dictates the use of parallel computing methods. This paper describes the plan for parallelization of a three-dimensional, unstructured-mesh, multi-physics-package astrophysics code. The initial parallel implementation of this code will employ a spatial-domain-decomposition and message-passing programming paradigm for use on distributed-memory multiprocessors. Techniques for the minimization of parallel overheads, such as load imbalance and communications, will be discussed in the context of a multi-physics package code. For instance, it is anticipated that different domain decompositions may be utilized for several of the physics packages in order to minimize both the parallel overheads and the memory requirements per processor. Plans for future, hybrid parallel computations, which use both message-passing and shared-memory programming techniques on clusters of symmetric multiprocessors will also be discussed.

Dr. Richard Procassini
Mail Stop L-18
Lawrence Livermore National Laboratory
P.O. Box 808
Livermore, CA 94551
spike@llnl.gov
Phone: (510) 424-4095
FAX: (510) 423-5112

A ScaLAPACK Out-of-Core Dense Linear Solver

Ed D'Azevedo

Oak Ridge National Laboratory, Mathematical Science Section

Jack Dongarra

Distinguished Professor, University of Tennessee, Department of Computer Science
Distinguished Scientist, Oak Ridge National Laboratory, Mathematical Science Section

We describe the initial design and implementation of a dense out-of-core solver as an extension to the ScaLAPACK library. The current implementation includes LU factorization with partial pivoting, Cholesky factorization for symmetric positive definite matrices and QR factorization for general rectangular matrices. Work on band solvers is also currently underway. We implemented a left-looking, column-panel-oriented algorithm with a panel size that varies during the factorization to fully utilize all available memory. ScaLAPACK and PBLAS routines are reused to achieve high performance for in-core computations.
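The left-looking, column-panel ordering can be sketched in serial Python for the Cholesky case. This is an illustrative sketch only, not the ScaLAPACK extension: the function name left_looking_cholesky is hypothetical, the panel width is fixed rather than varying with available memory, and the "disk" is just an in-memory list. Each panel is first updated with all previously factored columns (the part an out-of-core code would read back from disk) and only then factored.

```python
import math


def left_looking_cholesky(a, panel):
    """Return lower-triangular L with L * L^T = a, one column panel at a time.

    Assumes `a` is symmetric positive definite (list of lists) and that
    `panel` is a fixed panel width; both are simplifying assumptions.
    """
    n = len(a)
    l = [[0.0] * n for _ in range(n)]  # stands in for the factor kept on disk
    for start in range(0, n, panel):
        end = min(start + panel, n)
        # Bring the current panel (rows start..n-1) "in core".
        work = [[float(a[i][j]) for j in range(start, end)]
                for i in range(start, n)]
        # Left-looking step: apply every previously factored column.
        for i in range(start, n):
            for j in range(start, end):
                work[i - start][j - start] -= sum(
                    l[i][k] * l[j][k] for k in range(start))
        # Factor the updated panel column by column.
        for j in range(start, end):
            jj = j - start
            for k in range(start, j):
                kk = k - start
                for i in range(j, n):
                    work[i - start][jj] -= work[i - start][kk] * work[jj][kk]
            pivot = math.sqrt(work[jj][jj])
            for i in range(j, n):
                work[i - start][jj] /= pivot
        # Write the finished panel back (lower triangle only).
        for i in range(start, n):
            for j in range(start, min(end, i + 1)):
                l[i][j] = work[i - start][j - start]
    return l


spd = [[4, 2, 2], [2, 5, 3], [2, 3, 6]]
factor = left_looking_cholesky(spd, panel=2)
```

The result does not depend on the panel width; only the amount of data held in the working panel at one time changes.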

I/O is performed in high-level routines that read or write general sub-sections of ScaLAPACK 2D block-cyclic distributed arrays to disk. These routines support a shared file on the Intel Paragon (all data reside in a single file) and distributed files on a PVM cluster (data distributed on local disks).
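For readers unfamiliar with the 2D block-cyclic layout, the following Python sketch maps a global matrix index to its owning process and local index. It assumes 0-based indices and that the first block lives on process (0, 0); the function name block_cyclic_owner and its parameter names are illustrative and are not ScaLAPACK routines.

```python
def block_cyclic_owner(i, j, mb, nb, prow, pcol):
    """Map global entry (i, j) to ((process row, process col), (local i, local j)).

    mb x nb is the block size and prow x pcol is the process grid.
    Indices are 0-based and the first block is assumed to sit on process (0, 0).
    """

    def one_dim(g, block_size, nproc):
        block = g // block_size            # which block the index falls in
        proc = block % nproc               # blocks are dealt out cyclically
        local = (block // nproc) * block_size + g % block_size
        return proc, local

    pr, li = one_dim(i, mb, prow)
    pc, lj = one_dim(j, nb, pcol)
    return (pr, pc), (li, lj)


# Entry (5, 2) of a matrix with 2 x 2 blocks on a 2 x 2 process grid.
owner, local = block_cyclic_owner(5, 2, mb=2, nb=2, prow=2, pcol=2)
```

An I/O routine that writes a sub-section of such an array has to perform this mapping for every block it touches, whether the data end up in one shared file or in per-process files.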

Preliminary results with double precision solvers on 64 nodes of the xps35 Intel Paragon at the Center for Computational Sciences, Oak Ridge National Laboratory, show that out-of-core factorization requires approximately 20% extra overhead compared to in-core solvers.

Contact information:

Ed D'Azevedo
Mathematical Sciences Section,
Oak Ridge National Laboratory,
Oak Ridge, TN 37831-6367,
tele: (423) 576-7925, fax: (423) 574-0680
email: e6d@ornl.gov

ESnet's Internet Monitoring Activities

Les Cottrell, Gary Haney, Terry Healy, Connie Logg, David Martin, Bill Wing, Lois White
Stanford Linear Accelerator Center, Oak Ridge National Laboratory, Brookhaven National Laboratory, Fermi National Accelerator Laboratory

As the explosive growth of the Internet continues, the continued use of the Internet as a vehicle for conducting scientific research is being questioned. Presented herein are the results of detailed work undertaken by a focal group of the U.S. Dept. of Energy's Energy Sciences Network (ESnet) chartered to address the impacts of Internet growth. This work will show the impact of Internet growth on the consistency and stability of Internet connections to some of ESnet's primary national and international research partners and will also provide some recommendations for Internet monitoring.

Gary Haney, Oak Ridge National Laboratory, (423) 574-4629, hny@ornl.gov

Performance Results for the NAS NPB 2

William Saphir, Alex Woo, Maurice Yarrow NASA Ames Research Center

We present performance results for version 2.1 of the NAS Parallel Benchmarks (NPB) on the following architectures:

IBM SP2/66 MHz
SGI Power Challenge Array/90 MHz
Cray Research T3D
Cray Research T3E
Intel Paragon

NPB 2 is an implementation, based on Fortran 77 and the MPI message passing standard, of the original NAS Parallel Benchmark specifications. The NPB 2 suite is intended to be run with little or no tuning, in contrast to NPB vendor implementations, which are highly optimized for specific architectures. NPB 2 results complement, rather than replace, previously reported NPB results. Because they have not been optimized by vendors, NPB 2 implementations approximate the performance a typical user can expect for a portable parallel program on a distributed memory parallel computer. Together these results provide a well-calibrated comparison of the real-world performance of several parallel computers. By comparing these results to NPB 1 results, we draw conclusions about what optimization must be done to obtain high performance on these systems.

William Saphir
MS T27-A NASA Ames Research Center
Moffett Field, CA 94035
wcs@nas.nasa.gov
Phone: 415-604-4427
Fax: 415-604-3957

Performance Evaluation of Piecewise Parabolic Method on Convex Exemplar SPP1000

Udaya A. Ranawake, University of Maryland Baltimore County
Bruce Fryxell, George Mason University
John E. Dorband, NASA Goddard Space Flight Center

We consider the parallel implementation of an Euler equation solver using the piecewise parabolic method (PPM) on an HP-Convex Exemplar SPP1000. The performance of a message passing implementation based on PVM is compared against several implementations based on the shared memory programming model. The different versions based on the shared memory paradigm utilize different memory class addressing schemes in order to determine the best memory layout for the shared data structures. A calculation on a 450 by 1800 grid using the shared memory version of the program delivers 56 Mflops per node on all 15 processors of an Exemplar. We also discuss the programming effort involved in optimizing this code.

Udaya A. Ranawake
USRA/CESDIS, NASA GSFC, Code 930.5, Greenbelt, MD 20771
email: udaya@neumann.gsfc.nasa.gov
Phone: 301-286-3046, Fax: 301-286-1634

A Quantitative Vector-Space Model for Architecture-Independent Parallel Workload Characterization

Abdullah I. Meajil (1), Tarek El-Ghazawi (1), and Thomas Sterling (2)
(1) Department of Electrical Engineering and Computer Science, The George Washington University
(2) Center of Excellence in Space Data & Information Sciences, NASA/Goddard Space Flight Center

Experimental design of parallel computers calls for quantifiable methods to compare and evaluate the requirements of different workloads within an application domain. Such methods can establish the basis for scientific design of parallel computers driven by application needs, to optimize performance to cost. In this research, we introduce a new workload representation and workload similarity model that can contribute to important applications such as: Parallel Benchmark Design, Parallel Computer Architecture Design, and Performance Prediction on real parallel machines. This parallel workload characterization is based on our parallel instruction centroid and parallel workload similarity models. The centroid is a workload approximation which captures the type and amount of parallel work generated by the workload on the average. When captured with abstracted information about communication requirements, the result is a powerful tool in understanding the architectural requirements of workloads and their potential performance on target parallel machines. Experimental results using the NASA/NAS Parallel Benchmark suite are used to demonstrate the use of our models.
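The abstract does not give the formulas behind the centroid and similarity models, so the following Python sketch is only one plausible reading. It assumes each parallel instruction is recorded as a vector of operation counts (the two-component vectors here are made up), takes the centroid as the component-wise mean, and uses cosine similarity as a stand-in similarity measure; the function names are hypothetical.

```python
import math


def centroid(parallel_instructions):
    """Component-wise mean of a workload's parallel-instruction vectors.

    Each vector is assumed to hold operation counts per instruction type.
    """
    n = len(parallel_instructions)
    dims = len(parallel_instructions[0])
    return [sum(v[d] for v in parallel_instructions) / n for d in range(dims)]


def cosine_similarity(u, v):
    """Assumed similarity measure: 1.0 for parallel vectors, 0.0 for orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


# Two made-up workloads, each a list of (arithmetic ops, memory ops) vectors.
workload_a = [[2, 0], [4, 2]]
workload_b = [[3, 1], [3, 1]]
similarity = cosine_similarity(centroid(workload_a), centroid(workload_b))
```

Under these assumptions the two workloads have the same centroid, so the measure treats them as equivalent even though their individual instructions differ.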

Abdullah I. Meajil
5612 Castlebury Court
Burke, VA 22015
Email: abdullah@seas.gwu.edu
Phone: (703) 764-8270
Fax: (703) 222-4343

Performance of the Parallel Multijoin Algorithm Using a Sampling Technique on the Cray C90.

Zahira S. Khan
Dept. of Mathematics and Computer Science, Bloomsburg University

This paper discusses the design and performance of a parallel multijoin algorithm for executing the multijoin operation of relational databases. The performance of the algorithm is improved by joining the relations in ascending order of join size. The multijoin algorithm consists of three phases. In the first phase, a sampling technique without replacement is used to determine the join selectivity ratios for each of the participating relations. These join selectivities determine the order in which the relations are to be joined. The second phase consists of hashing the relations to be joined. During the third phase the partitioned relations are actually joined. The performance of the algorithm depends on various factors including the size of the relations, join selectivities, data distribution, number of processors used, and the size of the sample.
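The three phases can be sketched serially in Python. This is a simplified illustration under stated assumptions, not the author's implementation: relations are non-empty lists of tuples whose first field is a common join key, selectivity is estimated against a single base relation, the partitions are processed one after another rather than on separate processors, and all function names are hypothetical.

```python
import random
from collections import defaultdict


def estimate_selectivity(relation, base_keys, sample_size, rng):
    """Phase 1: sample without replacement and count tuples that find a match."""
    sample = rng.sample(relation, min(sample_size, len(relation)))
    return sum(1 for row in sample if row[0] in base_keys) / len(sample)


def hash_partition(relation, n_parts):
    """Phase 2: split a relation on the hash of its join key."""
    parts = [[] for _ in range(n_parts)]
    for row in relation:
        parts[hash(row[0]) % n_parts].append(row)
    return parts


def hash_join(left, right, n_parts):
    """Phase 3: join matching partitions (one pair per processor in parallel)."""
    out = []
    for lp, rp in zip(hash_partition(left, n_parts),
                      hash_partition(right, n_parts)):
        index = defaultdict(list)
        for row in lp:
            index[row[0]].append(row)
        for row in rp:
            for match in index.get(row[0], []):
                out.append(match + row[1:])
    return out


def multijoin(base, relations, sample_size=100, n_parts=4, seed=0):
    """Join `base` with every relation, smallest estimated join first."""
    rng = random.Random(seed)
    base_keys = {row[0] for row in base}
    estimated = [
        (estimate_selectivity(r, base_keys, sample_size, rng) * len(r), i)
        for i, r in enumerate(relations)
    ]
    order = [i for _, i in sorted(estimated)]
    result = base
    for i in order:
        result = hash_join(result, relations[i], n_parts)
    return order, result


base = [(1, "a"), (2, "b"), (3, "c")]
r0 = [(1, "x"), (2, "y"), (3, "z"), (3, "w")]   # every tuple matches
r1 = [(1, "p"), (9, "q")]                       # half the tuples match
order, joined = multijoin(base, [r0, r1])
```

Because r1 has the smaller estimated join size it is joined first, which keeps the intermediate result small before the larger relation is brought in.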

Zahira S. Khan
Dept. of Mathematics and Computer Science
Bloomsburg University
Bloomsburg, PA 17815
e-mail: khan@planetx.bloomu.edu

Comparison of Teaching Parallel Processing Using Supercomputers Versus Using Simulation Languages

Zahira S. Khan
Dept. of Mathematics and Computer Science, Bloomsburg University

An undergraduate course entitled "Introduction to Parallel Processing" was taught at Bloomsburg University in the Fall semester of 1995. The goals of this course included making the students proficient in parallel programming techniques, providing experience working on state-of-the-art parallel architectures, and motivating students to conduct and present their research work at departmental seminars. An academic grant from the Pittsburgh Supercomputing Center (PSC) was obtained to provide students with access to the Cray C90. The students attended a three-day workshop at PSC and received training on executing and debugging programs on the T3D. The textbook for the course was "The Art of Parallel Programming," by Bruce Lester. This text uses MultiPascal, a language that simulates shared memory and multicomputer architectures and provides performance statistics for the programs. The paper compares the advantages and disadvantages of using the Cray with those of using MultiPascal in the classroom environment.

Zahira S. Khan
Dept. of Mathematics and Computer Science
Bloomsburg University
Bloomsburg, PA 17815
khan@planetx.bloomu.edu

Theoretical Studies of Fescdipine, a New Organic Compound With Antiarrhythmic and Antihypertensive Activity

E. Angeles, I. Menconi, A. Ramírez, L. Martínez, R. Martínez, M.E. Posada, A. Romero, E. Chavez, L. Favari
Instituto de Química-UNAM, Facultad de Estudios Superiores Cuautitlán-UNAM, Instituto Nacional de Cardiología

This work presents the results of a theoretical study of a new 1,4-dihydropyridine analogous to nifedipine. The compound is structurally similar to nifedipine, but biological activity studies showed important advantages for the experimental compound: in toxicology and immunology studies, its undesirable effects were smaller than those of the commercial product. The theoretical studies were carried out on a Silicon Graphics workstation and a CRAY YMP 4/432 supercomputer, using UNICHEM as software and PIMMS, from the Oxford Molecular series, as the molecular mechanics program. Acknowledgment: CRAY RESEARCH INC.

E. Angeles
Instituto de Química-UNAM
Facultad de Estudios Superiores Cuautitlán-UNAM
Instituto Nacional de Cardiología
CINVESTAV-IPN
Apartado Postal 70-213
Coyoacán CP 04810
Mexico
email: angeles@servidor.unam.mx

HOMO and LUMO Molecular Orbital-Antibacterial and Antihelmintic Activity Relationships of Ethylphenylcarbamates, Calculated on a CRAY YMP 4/432 Supercomputer

E. Moreno, R. Martinez, P. Martinez, S. Reginensi, R. Castillo, N. Soto, S. Bernal, C. Ortega, R. Montoya, C. Quezada, R. Vasconcelos, E. Angeles
FESC, Instituto de Quimica, Facultad de Quimica Universidad Nacional Autonoma de Mexico

The antibacterial and antihelmintic activity of the compound 4-hydroxy phenylethylcarbamate has been demonstrated. We therefore decided to design new compounds with potential antibacterial and antihelmintic activity. We carried out a comparative theoretical study of how the substituents present in antihelmintic drugs such as albendazole, mebendazole, fenbendazole and oxfendazole influence the electronic density and the HOMO and LUMO molecular orbitals of the ethyl phenylcarbamate core. The study used semiempirical molecular orbital calculations with the AM1 and MNDO Hamiltonians contained in the UNICHEM program, run on a CRAY YMP 4/432 supercomputer, in order to determine the structural requirements for adequate antibacterial and antihelmintic activity.

Acknowledgment : CRAY RESEARCH INC, CONACYT

C. Moreno
FESC
Instituto de Química
Facultad de Química
Universidad Nacional Autónoma de México
Apartado Postal 70-213
Coyoacán CP 04810
Mexico DF
email: emoreno@servidor.unam.mx

A Parallel Molecular Simulation Environment with Dynamic Load Balancing

David F. Hegarty and M. Tahar Kechadi
Advanced Computational Research Group,
University College Dublin, Ireland.

In this poster we present a parallel simulation environment whose aim is to automatically parallelize computer simulations of complex polymers, DNA and proteins. The environment aims to mask the heterogeneity of the available hardware and communication resources from the user. We used three criteria in the design: ease of use, exploiting hardware heterogeneity, and achieving high performance through parallelism. This leads to three linked system components: a graphical user interface, a virtual machine and a runtime system. Together these handle the specification of the problem, the placement of tasks onto the parallel machine, and the dynamic adaptation of the decomposition to maintain efficiency and react to a changing environment. We present an algorithm which balances the workload while maintaining the locality of the original decomposition. The algorithm is analysed using a theoretical model and experimental results obtained from implementations on a Cray T3D and a workstation cluster.

David Hegarty,
Advanced Computational Research Group,
Department of Chemistry,
University College Dublin,
Belfield, Dublin 4,
Ireland.
email: david@fiachra.ucd.ie
phone: +353-1-7062418

A Stability Confirmation of Two Analogous Molecules

Shandya Bhat
Eastern Michigan University

The object of this study is to determine the most stable conformation of two analogous molecules: trans-azobenzene and trans-stilbene. No conclusive structural information was available for these two widely known organic compounds before the present study; both the theoretical and the experimental information available to date was inconclusive. The present study employs ab initio molecular orbital calculations with a 6-31G** basis set. Calculations were carried out using the GAUSSIAN package of programs on the PSC Supercluster (GAUSSIAN92) and a Decterm Alpha AXP system at Eastern Michigan University (GAUSSIAN94). For both molecules the energy minimum was found to be very shallow. Electronic and steric factors determining the relative stability of planar and nonplanar conformations are also analyzed.

Shandya Bhat
Eastern Michigan University
milletti@emuvax.emich.edu
(313) 487-1183
(313) 487-1496

Algebraic Multigrid Methods on a Shared Memory Vector Machine, Cray J90

Cazier J.-B., Gaertner K., Fichtner W.
Integrated Systems Lab, ETH Zurich

In semiconductor device simulation, solving three-dimensional problems is expensive. The reasons are the huge number of unknowns to be considered and the large condition numbers involved. The use of multigrid methods can dramatically reduce the size of the problem to be solved by a direct process. The aim is to extend the algebraic multigrid methods for the continuity equations from regular to irregular grids, and from a sequential to a parallel vector machine. Results of a first implementation on a Cray J90 (within the framework of the cooperation between Cray Research and the ETHZ (Eidgenössische Technische Hochschule Zürich)) will be given and discussed with respect to the absolute performance limits and parallel efficiency. A sketch of the algorithm will also be presented.

K. Gaertner, IIS, ETZ, ETH Zurich, Gloriastr. 35, CH-8092 Zurich, Switzerland, e-mail: gaertner@iis.ee.ethz.ch

Queueing Statistics from a User's Perspective

Ann-Marie Pendrill

Job statistics from a supercomputing center can be monitored for many different purposes. Monitoring is obviously important for system tuning and problem detection. From the point of view of the user, and perhaps of the funding agents, the turnaround time obtained may be very important and can be described with various parameters. Unless monitored, turnaround times may approach equilibrium with workstations accessible locally to the users (IJSCA-HPC 9:4, 312-4, 1995). The poster will show different ways of displaying queueing data from the Supercomputing Centers supported by the Swedish Council for High-Performance Computing (HPDR). The relevance of different choices of parameters to be monitored will be discussed and related to analytical queueing theory. Possible implications of queueing theory for national computing policies will also be discussed.
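As a reminder of the kind of relation analytical queueing theory provides, the Python sketch below evaluates the textbook M/M/1 model (a single server with Poisson arrivals and exponential service times). The choice of M/M/1 is an assumption made here for illustration; the abstract does not say which queueing model the poster uses, and the function name is hypothetical.

```python
def mm1_turnaround(arrival_rate, service_rate):
    """Return (utilization, mean queue wait, mean turnaround) for an M/M/1 queue.

    Rates are in jobs per unit time; the queue is only stable when the
    arrival rate is below the service rate.
    """
    if not 0 <= arrival_rate < service_rate:
        raise ValueError("need 0 <= arrival_rate < service_rate")
    utilization = arrival_rate / service_rate
    mean_wait = utilization / (service_rate - arrival_rate)
    mean_turnaround = 1.0 / (service_rate - arrival_rate)
    return utilization, mean_wait, mean_turnaround


# 8 jobs/hour arriving at a system that can complete 10 jobs/hour.
utilization, wait, turnaround = mm1_turnaround(8, 10)
```

In this model the mean turnaround grows without bound as utilization approaches 1, which is one reason turnaround time, and not only utilization, is worth monitoring.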

Ann-Marie Pendrill
Dept of Physics, Göteborg University and Chalmers University of Technology
S-412 96 Göteborg, SWEDEN
Ann-Marie.Pendrill@fy.chalmers.se

APE: A Supercomputer Family for Numerical Simulation

Bartoloni Alessandro
INFN, Rome University "La Sapienza"

The poster will show the evolution of the APE supercomputer family during the last 10 years.

APE1, the first project, started in 1985 with the aim of developing a SIMD array processor capable of 1 GFlop of peak performance. It was concluded in 1989, after two systems had been produced.

APE100, the second generation of APE supercomputers, started in 1990 with the target of gaining a factor of 10 in peak performance. So far, more than 20 systems of different sizes have been produced, for a total of more than 300 GFlops.

APEmille, the third generation of machines, is currently under development and is targeted at producing computer systems in the TeraFlops range.

The poster will illustrate the APE architecture and its evolution. Further it will describe the state of the current APEmille project.

Bartoloni Alessandro
I.N.F.N (Italian Institute of Nuclear Physics)
APE Group c/o Rome University "La Sapienza"
Ple ALDO MORO n.2 00185-Roma
phone +39-6-49914423
fax +39-6-4957697
bartoloni@roma1.infn.it

Augmint: An Execution-driven Multiprocessor Simulation Toolkit for Intel x86 Architecture

Anthony-Trung Nguyen, Univ. of Illinois, Champaign-Urbana
Maged Michael, Univ. of Rochester

Most publicly available multiprocessor simulation tools only simulate RISC architectures. Therefore, they cannot capture the instruction mix and memory reference patterns of popular architectures like Intel's x86. Augmint, an execution-driven simulation toolkit, fills this gap by supporting Intel's x86 architecture. Augmint takes as input a thread-based parallel application written with m4 macros, such as those in the SPLASH and SPLASH-2 benchmark suites. Augmint runs on an x86-based uniprocessor PC under UNIX or Windows NT and can simulate multiple processors with very little overhead. It supports a thread-based programming model with a shared global address space and a private stack space per processor. Users can plug in their own architecture simulators. Augmint supports a simulator interface compatible with that of the MINT simulation toolkit for MIPS architectures, thus allowing the reuse of most architecture simulators written for MINT. The source code of Augmint is publicly available from http://www.csrd.uiuc.edu/iacoma/augmint.

Anthony-Trung Nguyen
anguyen@cs.uiuc.edu
1304 W. Springfield Ave
Urbana, IL 61801
Phone: 217-244-5979
Fax: 217-244-1351

Maged Michael
michael@cs.rochester.edu
Dept. of Computer Science
Univ. of Rochester
Rochester, NY 14627
Phone: 716-275-8479
Fax: 716-461-2018

Computation of Rotational and Vibrational Bound States of H-O-O Complex on Cray T3D


Xudong Troy Wu and Edward F. Hayes
Department of Chemistry, Ohio State University


The reaction H + O2 --> OH + O is one of the key steps in the combustion of hydrocarbons (e.g., natural gas, gasoline, diesel fuel, coal, etc.). The H-O-O complex is a stable molecule that is energetically accessible from both the reactants and products of this overall reaction. In this study, the rotational and vibrational bound states of the H-O-O complex have been calculated on a Cray T3D. The algorithms that have been developed for this program show good scalability using up to 128 processors, and the maximum performance achieved is 3.2 GFlops.

The computational method involves three key elements: 1) Use of Implicitly Restarted Lanczos Method (IRLM) to obtain the bound state eigenvectors and eigenvalues. 2) Transformation of the Hamiltonian for the problem with an efficient Sequential Diagonalization Truncation (SDT) algorithm. 3) Acceleration of the convergence of the IRLM method using Chebychev Preconditioning.

QueryDesigner: A Web-based Tool for Interactively Building Query Interfaces to Remote SQL Databases

Mark Newsome (1), Cherri Pancake (1), and Joe Hanus (2)
(1) Department of Computer Science
(2) Department of Botany and Plant Pathology
Oregon State University

QueryDesigner is a Web-based tool for constructing query interfaces directly on Netscape Web browsers. The tool is meant for users who are not computer experts to set up their own forms and hypertext-based query interfaces to remote SQL databases. No experience in SQL and HTML programming is necessary. After choosing a target SQL database on the Internet, the user can build a personalized query interface by making menu selections and filling out forms---the tool automatically establishes network connections, and composes HTML and SQL code. The generated query form can be used immediately to issue a query, customized, or saved for later use. Results returned from the database are dynamically formatted into hypertext for navigating related information in the database. Our tool has been used successfully to implement query interfaces for several biological databases.

Mark Newsome
Department of Computer Science
Oregon State University
Corvallis, OR 97330
email: newsome@research.cs.orst.edu
work: 541.737.5300
fax: 541.737.3573

The MGAP-2 Processor Array and System

T. P. Kelliher, R. M. Owens, M. J. Irwin
MicroSystems Research Laboratory
Department of Computer Science and Engineering
The Pennsylvania State University
University Park, PA 16802
(T. P. Kelliher is with the Department of Mathematics and Computer Science,
Westminster College, New Wilmington, PA 16172.)

The Micro-Grain Array Processor (MGAP-2) is an array of 49,152 micro-grain processors, implemented as a planar mesh, operating at 50 MHz, and capable of computing 4.9 teraops per second. Each processor has 32 bits of local dual-port RAM, computes two three-input boolean functions per clock, and has a dynamically reconfigurable interconnect to each of its four neighbors. This communication flexibility allows algorithms to be mapped onto the array in an efficient manner and the processors to be dynamically grouped into larger computational units. The entire MGAP-2 system fits onto a single 9Ux400 mm VME board.
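The quoted peak rate can be checked from the other figures in the paragraph above. The short sketch below does that arithmetic, under the assumption (not stated in the abstract) that each three-input boolean function evaluation counts as one operation.

```python
# Peak-throughput check for the MGAP-2 figures quoted in the abstract.
# Assumption: one boolean function evaluation counts as one "op".
processors = 49_152
clock_hz = 50_000_000      # 50 MHz
ops_per_clock = 2          # two boolean functions per processor per clock

peak_ops_per_second = processors * clock_hz * ops_per_clock
peak_teraops = peak_ops_per_second / 1e12
print(f"{peak_teraops:.2f} teraops")
```

Under that reading the product comes to roughly 4.9 x 10^12 operations per second, consistent with the stated figure.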

We have developed a high-level language, *C++, for programming the MGAP-2 and have targeted efficient systolic, low communication complexity algorithms for applications such as basic arithmetic and image processing operations, motion estimation, speech recognition, computational molecular biology, simulation of physical phenomena using a cellular automaton model, Hough Transform, Discrete Wavelet Transform, Discrete Cosine Transform, and Singular Value Decomposition.

Thomas P. Kelliher
Department of Mathematics and Computer Science
Westminster College
New Wilmington, PA 16172
Phone: (412) 946-7290 Fax: (412) 946-7158
E-mail: kelliher@abacus.westminster.edu

VISA - the Interactive Visual Language for Parallel and Distributed Programming

M.B.Ignatiev, Y.E.Sheinin, D.E.Tatkov

The State Academy for Aerospace Instrumentation, St.-Petersburg, Russia

Full-scale parallel programming for general-purpose massively parallel computers and distributed systems demands new paradigms and means.

A new interactive visual language for parallel programming, called VISA, was developed. VISA is not a WYSIWYG-style visual language, but a true programming language for specifying parallel algorithms. The graphical clauses of a parallel program are constructed of icons according to the syntactic and semantic rules of the language. The semantics of the control operators of the language is based on the Developing Asynchronous Processes model of parallel computations. A program in VISA is presented in graphical form as a dynamic network of operators and data objects. A full-scale parallel program is a complex multicomponent, multilinked structure. It is impossible either to construct ("write") it or to understand ("read") it outside a CASE system. A practicable visual parallel programming language can only be an interactive language.

A prototype integrated programming tool set for VISA is presented. It is written in C++ and runs on a PC under Windows 3.11.

State Academy for Aerospace Instrumentation
Bolshaya Morskaya 67
190000 St.-Petersburg
RUSSIA
ysh@mt.spb.su
tel: +7(812) 210-7094
Contact author name: Y.E.Sheinin

A High Performance Vision System for Photo Interpretation

Yongwha Chung and Viktor K. Prasanna
University of Southern California

The goal of the research is to develop scalable and portable parallel algorithms for intermediate and high level vision problems. Parallelizing intermediate and high level vision applications is challenging due to the irregular computation and communication features of these algorithms. In this work, we parallelize a system to detect and describe buildings from monocular views of aerial scenes. The computational tasks of this system include image feature extraction, perceptual grouping, shadow analysis, and hypothesis selection/verification. To our knowledge, our system is the first one to provide interactive performance for intermediate and high level vision tasks on general-purpose HPC platforms. We first define a realistic model of distributed memory machines to estimate communication cost. Based on this, we design an algorithmic framework which enhances processing node utilization and overlaps communication with computation by maintaining algorithmic threads in each processing node. For example, given a 1024 x 1024 image, the image feature extraction and one of the perceptual grouping steps can be performed in 0.717 seconds on a 64-node T3D. A serial implementation takes 29.643 seconds on a single-node T3D. We use C and MPI for our implementations to make them portable to other HPC platforms. By using our system, the execution time to produce a 3D description of buildings can be reduced from a few hours to a few seconds. This research was supported in part by NSF under grant CCR-9317301 and in part by DARPA under grant F49620-93-1-0620.
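The two timings quoted above imply a speedup and a parallel efficiency that the abstract does not state explicitly. The sketch below derives them using the usual definitions (speedup = serial time / parallel time, efficiency = speedup / node count). The function name is illustrative, and the figures cover only the feature extraction and the one grouping step that were timed.

```python
def speedup_and_efficiency(serial_seconds, parallel_seconds, nodes):
    """Return (speedup, parallel efficiency) for one timed run."""
    speedup = serial_seconds / parallel_seconds
    return speedup, speedup / nodes

# Timings quoted in the abstract: 1 node versus 64 nodes of a T3D.
speedup, efficiency = speedup_and_efficiency(29.643, 0.717, 64)
print(f"speedup = {speedup:.1f}x, efficiency = {efficiency:.0%}")
```

With these inputs the speedup is a little over 41 and the efficiency is close to two thirds.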

Viktor K. Prasanna
Department of EE-Systems
EEB-244
University of Southern California
Los Angeles, CA 90089-2562
email: prasanna@ganges.usc.edu
TEL: (213) 740-4483
FAX: (213) 740-4418

Scalable and Portable Implementations of Real-time FFT Benchmarks

Jin-Woo Suh and Viktor K. Prasanna
University of Southern California

Recently, HPC technology has been employed to realize real-time embedded signal processing applications such as Space-Time Adaptive Processing (STAP), Synthetic Aperture Radar (SAR), and Sonar systems. For the evaluation of these systems, many real-time benchmarks have been proposed by the DoD HPC community. These include Hartstone, Rhealstone, TPC, MITRE+Rome Lab. and PARKBENCH benchmarks. These real-time benchmarks differ from traditional HPC benchmarks in many ways: 1) time is considered as the most critical factor, 2) real-time performance is measured rather than off-line performance, 3) benchmarks are usually executed many times to evaluate fluctuations in run time, and 4) throughput as opposed to latency is a very important measure of a system's ability to meet time constraints. For scalable implementations of real-time benchmarks, we have developed scalable communication primitives for an N-to-M processor pipeline. In this algorithm, the number of communication steps is reduced to ceiling(lg(M/N+1))+N-1. Previous algorithms take MN steps. Using our algorithm, we have implemented the 2D real-time FFT benchmark that has been recently defined by MITRE and Rome Lab. on the SP2 and T3D. The results have been very encouraging. The number of processors needed is reduced by 25% compared with earlier implementations, for the FFT operations needed in SeaSAT SAR processing. Our code, written in C and MPI, is portable to other HPC platforms.
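To give a feel for the two step counts quoted above, the sketch below evaluates both for a few pipeline sizes. It assumes that "lg" means the base-2 logarithm and that M/N+1 is read as (M/N)+1. The function names and the sample sizes are illustrative and do not come from the poster.

```python
import math

def pipeline_steps(n_senders, m_receivers):
    """Step count from the abstract, read as ceil(lg(M/N + 1)) + N - 1.

    Assumptions: lg is log base 2, and M/N + 1 parses as (M/N) + 1.
    """
    return math.ceil(math.log2(m_receivers / n_senders + 1)) + n_senders - 1

def baseline_steps(n_senders, m_receivers):
    """Step count the abstract quotes for previous algorithms: M * N."""
    return m_receivers * n_senders

# Hypothetical pipeline sizes, chosen only for illustration.
for n, m in [(2, 8), (4, 16), (8, 64)]:
    print(f"N={n:2d} M={m:2d}  new={pipeline_steps(n, m):3d}  previous={baseline_steps(n, m):4d}")
```

Under this reading the new count grows roughly linearly in N, while the previous one grows with the product of M and N.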

Viktor K. Prasanna
Department of EE-Systems
EEB 244
University of Southern California
Los Angeles, CA 90089-2562
Email: prasanna@usc.edu
Vox: (213) 740-4483
Fax: (213) 740-4418
www page: http://www.usc.edu/dept/ceng/prasanna/home.html

Molecular Dynamics Engine

Takashi Amisaki
Shimane University
Shinjiro Toyoda
Fuji Xerox Co., Ltd.
Hiroo Miyagawa
Taisho Pharmaceutical Co., Ltd.
Akihiro Kusumi
The University of Tokyo
Eiri Hashimoto, Hitoshi Ikeda, Nobuaki Miyakawa
Fuji Xerox Co. Ltd.
Kunihiro Kitamura
Taisho Pharmaceutical Co., Ltd.

Molecular dynamics (MD) simulation presents a challenging problem to computer technology, i.e., simulation of the behavior of large and complex systems such as biomolecules requires a very long time. This is due to a large number of pairwise, long-range, non-bonded interactions between constituent particles, which increases as O(N^2) as the number of particles N in the system increases. To overcome this problem, we developed MD Engine, which is a hardware accelerator designed to be plugged into a workstation. MD Engine is composed of a homogeneous array of custom processor chips, which calculate pairwise forces exerted on each particle by all other particles in the system. It accommodates periodic boundary conditions and the Ewald method to evaluate Coulombic forces. With an MD Engine consisting of 24 processors plugged into a SPARCstation 10, an MD simulation of a biomembrane system (22,264 atoms) proceeds faster than an R8000 workstation by a factor of 48.
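The O(N^2) cost comes from visiting every pair of particles. The sketch below is a minimal software illustration of such a direct sum, with unit constants and no cutoff, periodic boundaries, or Ewald summation. It is not the MD Engine's algorithm, and it uses Newton's third law to visit each unordered pair once, which a hardware pipeline may not do. The positions and charges are made-up values.

```python
import math

def pairwise_forces(positions, charges):
    """Direct-sum Coulomb forces (unit constants) on each particle.

    Every unordered pair is visited once, so the work is N*(N-1)/2
    pair evaluations, which grows as O(N^2).
    """
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    pair_count = 0
    for i in range(n):
        for j in range(i + 1, n):
            delta = [positions[i][k] - positions[j][k] for k in range(3)]
            r2 = sum(d * d for d in delta)
            scale = charges[i] * charges[j] / (r2 * math.sqrt(r2))
            for k in range(3):
                forces[i][k] += scale * delta[k]  # force on i due to j
                forces[j][k] -= scale * delta[k]  # equal and opposite on j
            pair_count += 1
    return forces, pair_count

# Illustrative four-particle system.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
charges = [1.0, -1.0, 1.0, 1.0]
forces, pair_count = pairwise_forces(positions, charges)
print(pair_count, forces[0])
```

Doubling the particle count roughly quadruples `pair_count`, which is the scaling a dedicated accelerator is meant to absorb.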

Takashi Amisaki
Fac. of Science and Engineering
Shimane University
+81-852-32-6472
ami@cis.shimane-u.ac.jp

A Skeleton Data-Parallel Particle-In-Cell Code Using HPF/MPI

DongSheng Cai
Department of Physics, University of California at Los Angeles
Institute of Information Sciences and Electronics, University of Tsukuba

The present poster discusses data-parallel algorithms suitable for parallel skeleton Particle-In-Cell (PIC) codes using HPF/MPI. The algorithms are based on a vector model of computations, i.e., the scan model. The purpose of this paper is to show how the model can be applied to a set of vector algorithms in parallel PIC codes using HPF/MPI. A skeleton PIC code is a cycle consisting of four steps: (1) solving fields on a grid; (2) interpolating fields to particle positions; (3) advancing particle positions and velocities with the fields; and (4) interpolating particle charge and current densities to the grid. The cycle is the essential part of the PIC code, and the skeleton code is developed in order to analyze the performance of PIC codes on various platforms. The code is written in HPF/MPI for portability. The code is developed on the CRAY T3D and on PC clusters running Linux.
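The four-step cycle can be illustrated with a minimal serial sketch. The version below assumes a one-dimensional, periodic, electrostatic system with unit permittivity, linear weighting, and a uniform neutralizing background. All of these choices, and every function name, are illustrative assumptions; the poster's code is written in HPF/MPI and is not reproduced here.

```python
def deposit_charge(xs, q, n_cells, dx):
    """Step 4: linear weighting of particle charge onto the grid."""
    rho = [0.0] * n_cells
    for x in xs:
        s = x / dx
        cell = int(s)
        frac = s - cell
        rho[cell % n_cells] += q * (1.0 - frac) / dx
        rho[(cell + 1) % n_cells] += q * frac / dx
    return rho

def solve_field(rho, dx):
    """Step 1: integrate dE/dx = rho on a periodic grid (unit permittivity).

    Assumes a uniform neutralizing background, so the mean of rho is
    removed; the mean of E is then set to zero.
    """
    n = len(rho)
    mean_rho = sum(rho) / n
    field = [0.0] * n
    for i in range(1, n):
        field[i] = field[i - 1] + (rho[i - 1] - mean_rho) * dx
    mean_field = sum(field) / n
    return [e - mean_field for e in field]

def gather_field(field, x, dx):
    """Step 2: linear interpolation of the grid field to a particle."""
    n = len(field)
    s = x / dx
    cell = int(s)
    frac = s - cell
    return field[cell % n] * (1.0 - frac) + field[(cell + 1) % n] * frac

def pic_cycle(xs, vs, q, m, n_cells, length, dt, n_steps):
    """Run the four-step PIC cycle n_steps times."""
    dx = length / n_cells
    rho = deposit_charge(xs, q, n_cells, dx)
    for _ in range(n_steps):
        field = solve_field(rho, dx)                               # step 1
        accel = [q / m * gather_field(field, x, dx) for x in xs]   # step 2
        vs = [v + a * dt for v, a in zip(vs, accel)]               # step 3
        xs = [(x + v * dt) % length for x, v in zip(xs, vs)]
        rho = deposit_charge(xs, q, n_cells, dx)                   # step 4
    return xs, vs, rho

# Illustrative run: 20 particles on a 16-cell periodic grid.
xs0 = [0.1 + 0.8 * i / 19 for i in range(20)]
vs0 = [0.0] * 20
xs, vs, rho = pic_cycle(xs0, vs0, q=-1.0, m=1.0, n_cells=16,
                        length=1.0, dt=0.01, n_steps=10)
total_charge = sum(rho) * (1.0 / 16)
print(total_charge)
```

The deposit and gather steps are where data-parallel formulations typically need scan or scatter-add style operations, since many particles can map to the same grid cell.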

DongSheng Cai
Department of Physics, University of California at Los Angeles
Box 951547, Los Angeles, California 90095-1547
email: cai@physics.ucal.edu
TEL: (310) 206-6166
FAX:(310)825-4057
Institute of Information Sciences and Electronics
University of Tsukuba
Tsukuba, 305, Japan
email: cai@is.tsukuba.ac.jp
TEL: +81-298-53-5541
FAX: +81-298-53-5206

A Portable SPMD Code For Adaptive Finite Element Methods in Parallel Computing Environments

Don Morton

Academic year
Department of Mathematical Sciences
Cameron University
Lawton, OK 73505

Summers
Arctic Region Supercomputing Center
University of Alaska
Fairbanks, AK 99775

A heterogeneous, distributed, adaptive finite element code, originally developed by the author for the Cray Y-MP/T3D system, has been modified to take advantage of the Single-Program Multiple-Data (SPMD) paradigm. The original program utilized a Cray Y-MP process for addressing issues of global mesh modifications, and actual finite element computations were distributed to Cray T3D processes. Packaging this work into an SPMD environment provides us greater flexibility in choosing an architecture, and removes a communications bottleneck that was encountered in the Y-MP/T3D implementation. The SPMD version of the program has been successfully implemented on a Cray T3D in standalone mode and on a cluster of Pentium PCs running the Linux operating system. Timing data for both architectures will be provided.

Dr. Don Morton
Department of Mathematical Sciences
Cameron University
Lawton, OK 73505
Phone: (405) 581-2396
Email: morton@grizzly.cameron.edu

The Hawaii Connection: Anytime, Anyplace, for Everyone!

Marsha Mooradian, Maui High Performance Computing Center
Vicki Kajioka, Hawaii State Department of Education

The Hawaii State Department of Education (HSDOE) in collaboration with the Maui High Performance Computing Center will provide a multimedia tour focusing on the innovative N.I.I. Technology Telecommunications for Teacher staff development programs (T3). Utilizing community resources, planning and collaboration, Hawaii has implemented a variety of significant advancements in the integration of technology across the curriculum throughout the 245 schools statewide.

The exhibit will highlight the relationship between the Maui High Performance Computing Center (MHPCC), the HSDOE and over 150 community businesses through Tech Corps Hawaii, a new non-profit corporation. The poster exhibit will present the technical infrastructure for connecting to the N.I.I. through a project called "Let's Get Wired" and the extensive training programs which have been successfully completed over the past three years.

Three successful Computer Integration projects will also be featured: the Hawaii Super Computing Challenge Competition, The Electronic School - A Virtual Education Community, and the T3 - Technology Telecommunications Teacher professional development project. These programs have enhanced student learning in Hawaii by utilizing the Internet as a resource for students, teachers and parents. Web pages created by students and teachers will provide examples of the innovative cross-curricular activities.

Contact Information:

Marsha Mooradian: Maui High Performance Computing Center
Vicki Kajioka: Hawaii State Department of Education

Runtime Library for Parallel Unstructured Grid Generation

Nikos P. Chrisochoides and Florian Sukup
Cornell Theory Center

The unpredictable and irregular nature of parallel algorithms for unstructured computations makes them difficult to implement efficiently on top of synchronous communication primitives such as blocking sends/recvs. As an alternative, one can use asynchronous communication mechanisms such as Active Messages that do not require a rendezvous between the sender and receiver. Processors don't have to busy-wait for requested data or remote service requests, and they can proceed with their remaining work. Thus asynchronous communication can improve a program's performance by masking communication and synchronization overheads. Unfortunately, programmers have to address a number of difficult problems that are inherent to the asynchronous programming paradigm. In this project we attempt to help the programmer by providing a runtime library that makes asynchronous programming easier and more intuitive. In addition, the runtime library provides sophisticated data transfers that improve on the performance of naive implementations. We demonstrate the effectiveness of our runtime library by implementing a kernel, the Bowyer-Watson algorithm, that is very useful to parallel Delaunay triangulation methods for unstructured grid generation. The efficient implementation of the Bowyer-Watson algorithm on multiprocessors is a challenging problem: its computation and communication patterns are variable and unpredictable. The results are quite impressive: we eliminate 66% of the communication for small to medium size messages.
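For readers unfamiliar with the kernel, the sketch below is a minimal serial, two-dimensional Bowyer-Watson insertion loop. It is not the poster's parallel implementation or its runtime library. The super-triangle size and all names are illustrative assumptions, and degenerate inputs (cocircular or duplicate points) are not handled. The size of the cavity of "bad" triangles depends on where each point lands, which hints at why the parallel access pattern is hard to predict.

```python
def circumcircle_contains(tri, p, pts):
    """True if p lies strictly inside the circumcircle of triangle tri."""
    (ax, ay), (bx, by), (cx, cy) = (pts[i] for i in tri)
    adx, ady = ax - p[0], ay - p[1]
    bdx, bdy = bx - p[0], by - p[1]
    cdx, cdy = cx - p[0], cy - p[1]
    det = ((adx * adx + ady * ady) * (bdx * cdy - cdx * bdy)
           - (bdx * bdx + bdy * bdy) * (adx * cdy - cdx * ady)
           + (cdx * cdx + cdy * cdy) * (adx * bdy - bdx * ady))
    # Multiply by the orientation so the test works for either winding.
    orient = (bx - ax) * (cy - ay) - (by - ay) * (cx - ax)
    return det * orient > 0

def bowyer_watson(points):
    """Return Delaunay triangles as index triples into points."""
    pts = list(points)
    n = len(pts)
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    span = max(max(xs) - min(xs), max(ys) - min(ys), 1.0)
    mx, my = (max(xs) + min(xs)) / 2, (max(ys) + min(ys)) / 2
    # Super-triangle enclosing all input points (size is a heuristic).
    pts += [(mx - 20 * span, my - span), (mx + 20 * span, my - span),
            (mx, my + 20 * span)]
    triangles = {(n, n + 1, n + 2)}
    for i in range(n):
        # Triangles whose circumcircle contains the new point form the cavity.
        bad = {t for t in triangles if circumcircle_contains(t, pts[i], pts)}
        edge_count = {}
        for t in bad:
            for e in ((t[0], t[1]), (t[1], t[2]), (t[2], t[0])):
                key = tuple(sorted(e))
                edge_count[key] = edge_count.get(key, 0) + 1
        triangles -= bad
        # Cavity boundary edges belong to exactly one bad triangle.
        for (u, v), count in edge_count.items():
            if count == 1:
                triangles.add((u, v, i))
    # Drop triangles that still touch the super-triangle.
    return [t for t in triangles if max(t) < n]

points = [(0.0, 0.0), (5.0, 1.0), (1.0, 4.0), (2.0, 2.0)]
triangles = bowyer_watson(points)
print(sorted(tuple(sorted(t)) for t in triangles))
```

In this small example the fourth point lies inside the triangle formed by the first three, so the result is three triangles that all share it.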

Nikos P. Chrisochoides and Florian Sukup
Cornell Theory Center
H. T. Frank Rhodes Hall
Cornell University
Ithaca, NY 14853-3801
nikosc@cornell.edu and sukup@cs.cornell.edu
(607) 254-8839
(607) 254-8888

Parallel I/O on the SP2 Helps to Make Big Science Possible

Jerry Gerner, Steven Hotovy, and David Schneider
Cornell Theory Center

Computational scientists and engineers and other would-be users of high-performance computers usually turn to parallel computing because their problems are "big" -- and in several different dimensions, including wall-clock time necessary to solve the problem ("mean time to publication"), memory requirements, and data storage and I/O bandwidth. For researchers with applications that involve hundreds of MB to tens of GB (or more) of problem data, the availability of parallel I/O facilities on the Cornell Theory Center's 512-node IBM SP2 has proven to be an important factor in their ability to "do 'big' science". We provide some background on and motivation for parallel I/O, a description of the current PIOFS (Parallel I/O File System) configuration for the SP2, and a description of several scientific applications (using the SP2 and PIOFS) and the results they have obtained.

Jerry Gerner
Cornell Theory Center
730 Frank H.T. Rhodes Hall
Ithaca, NY 14853-3801
voicemail: 607-254-8852, email: jeg@tc.cornell.edu

Parallel Simulation for Neurobiology

Nigel Goddard and Greg Hood
Pittsburgh Supercomputing Center

Parallel computing platforms are becoming ubiquitous, providing computational power up to three orders of magnitude beyond desktop machines. We extended the Genesis simulator to run on these platforms, enabling effective investigation of much larger problems. Portability of simulations across serial and parallel platforms, and optimization of data communication are key goals. Extensions to the Genesis scripting language hide much of the complexity from the user while allowing explicit control over partitioning of the simulation and inter-processor synchronization. We envision multi-cell network models and parameter searching applications as those most likely to benefit from this work. Cray T3D experiments demonstrate superlinear speedup for low processor count, due to n-squared complexity in the serial algorithm. Speedup decreases to linear at 16 processors and sub-linear beyond, although the exact cutoffs are highly model dependent. The package is now in production at PSC and being ported to other MPPs.

Nigel Goddard
Pittsburgh Supercomputing Center
4400 Fifth Avenue
Pittsburgh, PA 15213

ngoddard@psc.edu

(412) 268 8858 voice
(412) 268 5832 fax

Using a Parallel-vector Approach to Create Characteristic Sequence Data Sets

Alexander J. Ropelewski, Joseph Geigel,
Hugh B. Nicholas Jr., David W. Deerfield II
Pittsburgh Supercomputing Center

Characteristic sequence data sets must be unbiased and representative of the entire range of biological molecules in the data. For instance, a model that accurately describes a family of proteins needs to represent the entire phylogenetic range in which the protein is found and not be biased toward some subset of the known proteins. We describe a scalable parallel-vector technique for selecting a characteristic set of protein sequences. This approach is based on rigorous comparisons of every pair of sequences, an approach that requires a computation that is proportional to the product of the lengths of the sequences compared. This computation can be both vectorized and parallelized, allowing the characteristic data set to be selected using semi-empirical statistical techniques. Characteristic data sets selected in this manner are appropriate for multiple sequence alignments and for the study of common biochemical properties. This research was funded by NIH-NCRR grant 1 P41 RR06009.
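The abstract does not name the comparison algorithm. As an illustration of why the cost is proportional to the product of the sequence lengths, the sketch below uses a Smith-Waterman-style local alignment score, which fills one dynamic-programming cell per pair of residues. The scoring values, the function names, and the sample sequences are all made-up assumptions.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score via a Smith-Waterman-style recurrence.

    One cell is computed per (residue of a, residue of b) pair, so the
    work is proportional to len(a) * len(b). Scores are illustrative.
    """
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            current[j] = max(0,
                             previous[j - 1] + step,  # align the two residues
                             previous[j] + gap,       # gap in b
                             current[j - 1] + gap)    # gap in a
            best = max(best, current[j])
        previous = current
    return best

def all_pairs_scores(sequences):
    """Score every unordered pair of sequences."""
    return {(i, j): local_alignment_score(sequences[i], sequences[j])
            for i in range(len(sequences))
            for j in range(i + 1, len(sequences))}

# Made-up sequences for illustration only.
sequences = ["MKTAYIAK", "MKTAYLAK", "GGGSGGGS"]
scores = all_pairs_scores(sequences)
print(scores)
```

Each pair is independent of the others, which makes the outer all-pairs loop a natural target for parallelism, while the inner recurrence is the part a vector machine would need to restructure.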

Alexander J. Ropelewski
Pittsburgh Supercomputing Center
4400 Fifth Avenue
Pittsburgh PA 15213
Phone: 412-268-4960
FAX: 412-268-8200
Email: ropelews@psc.edu

Content Based Pathology Image Classification and Retrieval

Arthur W. Wetzel, PhD (1)
Philip L. Andrews, PhD (1)
Michael J. Becich, MD, PhD (2)

(1) Pittsburgh Supercomputing Center
4400 Fifth Avenue
Pittsburgh, PA 15213

(2) University of Pittsburgh Medical Center
Department of Pathology
200 Lothrop Street
Pittsburgh, PA 15213

A joint effort between the Pittsburgh Supercomputing Center (PSC) and the University of Pittsburgh Medical Center (UPMC) is producing a large archive of pathology images with associated search and display software. Image sets contain standard magnifications of microscope slides tagged with pathologists' evaluations for use as known examples for training and comparison with unknown images. Methods for classifying, comparing, and retrieving images by content are the primary focus of the PSC portion of the project.

Initial tests of classification methods have been incorporated into an automated tool for grading severity of prostate cancer images. Results correlate well with grades assigned by UPMC pathologists. We are extending the classification methods to construct image signatures for the entire archive which can be used to identify images matching content query patterns. The poster exhibit outlines the role of PSC supercomputers in constructing effective image signatures and providing high speed image retrieval.

Arthur W. Wetzel
403 Mellon Institute
Pittsburgh Supercomputing Center
4400 Fifth Avenue
Pittsburgh, PA 15213
E-mail: awetzel@psc.edu
Phone: 412-268-3912
Fax: 412-268-5832
http://www.psc.edu/~awetzel

A Study of Quark-Antiquark Bound States on a Light Front

Armen Ezekielian
Ohio Supercomputer Center

A description of strongly-interacting bound states of quarks and antiquarks based on the theory of Quantum Chromodynamics (QCD) is one of the primary goals of theoretical elementary particle physics. Whereas the high-energy regime of QCD is believed to be well understood, theoretical calculations in the low-energy part of the theory, in which bound states of quarks and antiquarks live, have been difficult to obtain from QCD. Such calculations are critical to obtaining a more complete knowledge of elementary particles and their interactions. The field of lattice gauge theory has been responsible for the majority of numerical simulations which probe the bound-state structure of strongly-interacting systems. An alternative method proposed more recently has been the use of light-front quantization to obtain an effective QCD Hamiltonian. This Hamiltonian is then diagonalized numerically to obtain the energy eigenvalues and eigenvectors of the bound state in question. In the current study an effective Hamiltonian is derived from a quantization of QCD on the light front. Energy eigenvalues and wave functions (eigenvectors) for quark-antiquark bound states are obtained numerically. A comparison of the numerical results with experimental data is performed.

Armen Ezekielian
Ohio Supercomputer Center
1224 Kinnear Road
Columbus, OH 43212-1163
email: abe@osc.edu
fax: (614) 292-7168

Numerical Simulation of a Coal Flame

This numerical simulation provides the trajectories and reaction histories of coal particles in passage through a pulverized coal flame. Direct numerical simulation (DNS), instead of heuristic models with empirical parameters, is used to determine the flow field in the combustor geometry. Particles injected at various radial locations in the entering jet are transported by the unsteady fluid flow with the drag force due to the local fluid velocity. Particle reaction histories, modeled with a single-step, first-order rate equation for the volatile release and the Extended Resistance Equation for the char reaction, are calculated by including experimentally-based temperature and oxygen concentration fields, which provide the effect of a flame. As the particle reactions (devolatilization and combustion) proceed, there are changes in particle density and size, which also affect the particle motion and energy.

This study shows the potential for developing computational histories of reacting particles in pulverized coal flames at a level of detail evidently exceeding what has been possible in the past.

Contact: Charlie Bender, director
bender@osc.edu
Moti Mittal, Senior Scientist
moti@osc.edu
phone: 614-292-9248
fax: 614-292-7168
mailing address:
Ohio Supercomputer Center
1224 Kinnear Road
Columbus, OH 43212-1163

A Numerical Investigation of the Mixing of Unsteady, High-Speed Jets

Tim Rozmajzl
Ohio Supercomputer Center

The mixing of high-speed, imperfectly-expanded, turbulent jets with surrounding air is an important consideration in the design of high-speed aircraft. In particular, the mixing characteristics of these jets play an important role in the production and propagation of jet noise. Numerical simulation of such jets provides an effective means of investigating the unsteady flow mechanisms that contribute to the mixing process. A detailed understanding of the critical flow features involved in the mixing process and their effect on jet noise is necessary for implementing design modifications to enhance mixing and reduce jet noise. In the current study the time-dependent Navier-Stokes equations are solved numerically for an underexpanded rectangular jet and for an overexpanded round jet with a convergent-divergent nozzle. The rectangular jet operates at a fully-expanded Mach number of approximately 1.44, and the round jet has an exit Mach number of 1.4. Numerical results include the time-varying distribution of flow variables such as density, pressure, temperature, Mach number and vorticity. In addition, a Fourier analysis of fluctuations in the flow variables is presented. Where possible, numerical results are compared with experiment.

Tim Rozmajzl
Ohio Supercomputer Center
1224 Kinnear Rd.
Columbus, OH 43212
Email: tim@osc.edu
FAX: (614) 292-7168