
NODES, MODES AND FLOW CODES

Massively parallel supercomputers seem the best hope for achieving progress on 'grand challenge' problems such as understanding high-Reynolds-number turbulent flows.

George Em Karniadakis and Steven A. Orszag

Understanding turbulent flows is a "grand challenge"1 comparable to other prominent scientific problems such as the large-scale structure of the universe and the nature of subatomic particles. In contrast to many of the other grand challenges, progress on the basic theory of turbulence translates nearly immediately into a wide range of engineering applications and technological advances that affect many aspects of everyday life.

Numerical prediction of fluid flows is at the heart of understanding and modeling turbulence. However, such computational fluid dynamics simulations challenge the capabilities of both algorithms and the fastest available supercomputers. In 1970 Howard Emmons2 reviewed the possibilities for numerical modeling of fluid dynamics and concluded: "The problem of turbulent flows is still the big holdout. This straightforward calculation of turbulent flows—necessarily three-dimensional and nonsteady—requires a number of numerical operations too great for the foreseeable future." However, within a year of the publication of his article, the field of direct numerical simulation (DNS) of turbulence was initiated with the achievement of accurate simulations of wind-tunnel flows at moderate Reynolds numbers.3 (The Reynolds number, a dimensionless measure of the degree of nonlinearity of a flow, is defined as R = v_rms L/ν, where v_rms is the rms velocity, ν is the kinematic viscosity of the fluid, and L is a typical length scale at which the energy maintaining the flow is input. At sufficiently high Reynolds numbers, flows become turbulent.)

In the last 20 years, the field of turbulence simulation has developed in two directions. First, turbulence simulations are now regularly performed in simple geometries, and extensive databases of flow fields have been constructed for the analysis of turbulent and even laminar-turbulent transitional interactions.4 Second, simulations of turbulent flows in prototype complex geometries are now emerging.5 (See figure 1 and the cover of this issue.)

George Karniadakis is an assistant professor of mechanical and aerospace engineering at Princeton University. Steven Orszag is Forrest C. Hamrick '31 Professor of Engineering at Princeton.

Incompressible fluid flows are governed by the Navier-Stokes equations,

    ∂v/∂t = v × ω - ∇Π + ν∇²v    (1)

    ∇·v = 0    (2)

where v is the velocity field, ω = ∇ × v is the vorticity field, and Π = p + ½v² is the pressure head, where p is the pressure. In direct numerical simulation, the Navier-Stokes equations are solved at all scales for which there is appreciable kinetic energy. At large Reynolds numbers, the Kolmogorov theory of small scales in turbulence shows that eddies are appreciably excited at scales ranging in size from L, at which energy input takes place, down to η = L/R^{3/4}, at which viscous dissipation becomes significant. (See the article by Uriel Frisch and Orszag in PHYSICS TODAY, January 1990, page 24.) Since turbulent flows are necessarily time dependent and three-dimensional and since each excited eddy requires at least one grid point (or mode) to describe it, as R increases, the spatial resolution, or number of modes, required to describe the flow increases at least as fast as (R^{3/4})³.

With conventional DNS methods, the time step must be no larger than η/v_rms in order to resolve the motion of small eddies as they are swept around by large ones with rms velocity v_rms. Because large-scale turbulence evolves on a time scale of order L/v_rms, on the order of R^{3/4} time steps are required. Thus the computational work requirement (embodied in the number of modes times the number of time steps) for DNS of turbulence scales roughly as R³ and increases by an order of magnitude if R is doubled. This type of rapid increase in resolution and corresponding increase in computational work requirements is the challenge of DNS at high Reynolds numbers and necessitates the use of theory to remove degrees of freedom and simplify the computations.
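To make these growth rates concrete, here is a minimal sketch (ours, not from the article) that evaluates the estimates above with all prefactors set to one, so only the scaling with R is meaningful:

```python
# Back-of-the-envelope DNS cost scaling with Reynolds number R (prefactors omitted).
def dns_cost(R):
    """Rough DNS cost estimates as functions of R.

    modes ~ (R**(3/4))**3 = R**(9/4)   spatial degrees of freedom
    steps ~ R**(3/4)                   time steps per large-eddy evolution time
    work  ~ modes * steps = R**3       total operations
    Only ratios between different Reynolds numbers are meaningful here.
    """
    modes = R ** 2.25
    steps = R ** 0.75
    return modes, steps, modes * steps

for R in (1e3, 2e3, 1e4):
    modes, steps, work = dns_cost(R)
    print(f"R = {R:8.0f}: modes ~ {modes:.2e}, steps ~ {steps:.2e}, work ~ {work:.2e}")

# Doubling R multiplies the work by 2**3 = 8, roughly an order of magnitude,
# consistent with the estimate in the text.
```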

Two alternative approaches aim to alleviate the computational requirements of DNS of turbulence: Large-eddy simulations6 use a fixed spatial resolution, and the effects of eddies that are not resolved are modeled using gradient transport ideas such as eddy viscosity. (See the article by Frisch and Orszag.) Reynolds-averaged Navier-Stokes simulations model all turbulent fluctuations theoretically or empirically—not just the ones smaller than the grid spacing.


Effects of riblets on turbulence as simulated by a spectral-element method on the Delta Touchstone computer. Colors indicate the instantaneous magnitude of the streamwise component of the velocity; the highest values occur in the middle of the channel. Values are shown at three different cross-flow planes. The mean flow is from left to right, and the turbulence is fully developed and statistically steady at a Reynolds number (based on flow rate) of 3500. Computed turbulence intensities indicate that the reduction of fluctuations near the wall with riblets (bottom) results in a 6% drag reduction in this geometry. (Courtesy of Douglas Chu, Catherine H. Crawford and Ronald D. Henderson, Princeton University.) Figure 1

Recently we have studied a variant of Reynolds-averaged Navier-Stokes modeling called very-large-eddy simulation, which has some features of large-eddy simulation: All statistically isotropic eddies are modeled, while large-scale anisotropic eddies are simulated explicitly.7

The four images on the cover of this issue illustrate the effect of increasing Reynolds number on flow past a sphere. The top three images, at R = 300 (top image), 500 and 1000, are direct numerical simulations. The bottom image, at R = 20 000, is a large-eddy simulation. Each image shows the surface at which the axial velocity is 90% of the free-stream velocity, colored according to the local vorticity magnitude. Red indicates high vorticity; white, low vorticity. These simulations were performed on an Intel iPSC/860 32-node hypercube using a parallel spectral-element Fourier code, as discussed later. The large-scale flow pattern is present at all these Reynolds numbers, but for R ≳ 1000 the excitation of small scales (indicated by vorticity) increases rapidly, making DNS impractical at current capabilities.

The need for parallel processing

There is now a broad consensus that major discoveries in key applications of turbulent flows would be within grasp if computers 1000 times faster than today's conventional supercomputers were available, assuming equal progress in algorithms and software to exploit that computer power and effective visualization techniques to use the results of the computations. This consensus has been realized in the High Performance Computing and Communications Initiative, whose goal is the development and application of teraflop (10^12 floating-point operations per second) computers in the second half of the 1990s. This thousandfold improvement in useful computing capability will be accompanied by a hundredfold improvement in available computer networking capability.

It is estimated that a teraflop computer could perform Reynolds-averaged Navier-Stokes calculations of flow past a complete aircraft, large-eddy simulation of flow past a wing and DNS of flow past an airfoil, all at moderate Reynolds number (R on the order of 10^8). Following Andrei Kolmogorov's scaling arguments, similar estimates show that DNS of a complete aircraft will require at least an exaflop (10^18 flops) computer.8 This example of computing flow past an aircraft is typical: Even with teraflop computing power, progress on real engineering applications will require synergism among computing, theory (to describe the effects of small-scale motions) and prototype experiments9 (to elucidate fundamental physical phenomena).
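As a rough consistency check of the exaflop estimate (our arithmetic, not the article's, using the R³ work scaling derived earlier): at R of order 10^8 the work per simulation is of order

    W ~ R³ = (10^8)³ = 10^24 operations,

so even at 10^18 flops a single such run would occupy the machine for about 10^24 / 10^18 = 10^6 seconds, roughly the two-week dedicated runs contemplated below.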

We will be able to achieve teraflop speeds in this decade only by using massively parallel supercomputers.


We believe that one should approach the design of a computer system to solve physical problems much as one approaches the design of a laboratory to perform an experiment. One must take into account all resolution and computational requirements, including the balance among memory size, processing speed and the bandwidths of various components. However, it is nearly impossible to address these issues in a generic way because of the large variety of existing computer architectures. Here we will try to make some progress by first addressing the issues of programming model and parallel efficiency, and then, in order to address other issues, focusing on the "Prototype Parallel Computer," a system that has many components in common with existing and proposed parallel computers.

A popular taxonomy for parallel computers, introduced by Michael Flynn, divides the programming models into two classes: single instruction, multiple data stream (SIMD) and multiple instruction, multiple data stream (MIMD). In an SIMD computer, such as the Thinking Machines CM-2 or an NCUBE Inc computer, each processor performs the same arithmetic operation (or stays idle) during each computer clock cycle, as controlled by a central control unit. (See figure 2a.) Programs in this model, also referred to as data parallel programs,11 use high-level languages (for example, parallel extensions of FORTRAN and C), and computation and communication among processors is synchronized automatically at every clock period.

On a multiple-instruction, multiple-data-stream computer (see figure 2b) each of the parallel processing units executes operations independently of the others, subject to synchronization by the passing of messages among processors at specified time intervals. The parallel data distribution and the message-passing are both under user control. Examples of MIMD systems include the Intel Gamma, the Delta Touchstone computers and, with fewer but more powerful processors, the Cray C-90. (See the box on this page for a prescient 1922 description of an MIMD computer.)

While it is often easier to design compilers and programs for SIMD multiprocessors because of the uniformity among processors, such systems may be subject to great computational inefficiencies because of their inflexibility at stages of a computation in which there are relatively few identical operations. There has been a natural evolution of multiprocessor systems toward the more flexible MIMD models, especially the merged-programming model, in which there is a single program (perhaps executing distinct instructions) on each node. The merged-programming model is a hybrid between the data parallel model and the message-passing model and is exemplified in the newest Connection Machine, the CM-5. In this single-program, multiple-data model, data parallel programs can enable or disable the message-passing mode. Thus one can take advantage of the best features of both models.
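As a present-day illustration of the MIMD message-passing style (our sketch using mpi4py, which postdates the article; codes of that era used Fortran or C with vendor-specific message-passing libraries):

```python
# Minimal MIMD-style message passing: every process runs this same program
# ("single program, multiple data") but holds its own data and communicates
# only where the program explicitly says so.
# Run with, e.g.:  mpiexec -n 4 python ring.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Each process owns its own piece of data ...
local_value = rank ** 2

# ... and passes it one step around a ring of processes.
dest = (rank + 1) % size
source = (rank - 1) % size
received = comm.sendrecv(local_value, dest=dest, source=source)

print(f"process {rank} of {size}: sent {local_value}, received {received} from {source}")
```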

There is no universal yardstick with which to measure performance of computer systems, and the use of a single number, such as the peak performance quoted by the manufacturer, to characterize performance is often misleading. So that different aspects of the computer system are measured, performance is commonly evaluated in terms of benchmark runs consisting of small code segments ("kernels") and prototype applications. This approach, however, is still dependent on the quality of software rather than just on hardware characteristics. The computer science community has recognized the controversy over performance evaluation methods and has made several recent attempts to provide more objective performance metrics for parallel computers.

'A Myriad Computers at Work'

In his landmark treatise Weather Prediction by Numerical Process (Cambridge University Press, 1922), the British meteorologist Lewis Fry Richardson demonstrated remarkable prescience in his description of a futuristic multiple-instruction, multiple-data-stream parallel computing facility for weather forecasting, albeit with human "computers":

"Imagine a large hall like a theatre, except that the circles and galleries go right round through the space usually occupied by the stage. The walls of this chamber are painted to form a map of the globe. . . . A myriad computers are at work upon the weather of the part of the map where each sits, but each computer attends only to one equation or part of an equation. The work of each region is coordinated by an official of higher rank. . . . From the floor of the pit a tall pillar rises to half the height of the hall. It carries a large pulpit on its top. In this sits the man in charge of the whole theatre; he is surrounded by several assistants and messengers. One of his duties is to maintain a uniform speed of progress in all parts of the globe. In this respect he is like the conductor of an orchestra in which the instruments are slide rules and calculating machines. But instead of waving a baton he turns a beam of rosy light upon any region that is running ahead of the rest, and a beam of blue light upon those who are behindhand."

Gene Amdahl noticed long ago that the efficiency of a parallel computer system depends critically on the fraction m of the total number of arithmetic operations that can be done in parallel.12 Consider a computation that requires time T on a single processor. If there are P such processors executing in parallel, the parallelizable operations require time mT/P, while the remaining fraction (1 - m) of computations done on a single processor requires time (1 - m)T. Thus the total time is reduced to [(1 - m) + m/P]T, giving a scalar performance measure

    ξ = 1 / [(1 - m) + m/P]    (3)

which is the effective number of processors used. For example, if m = 1, then ξ = P, implying that all the processors are used effectively; if m = 0, then ξ = 1. Equation 3, called Amdahl's law, shows that massively parallel computers with large P require massively parallelizable computations. For example, if P is large and m = 1 - 1/P, then ξ is approximately P/2: Only half of the computer is used effectively. The effective performance of the system can be measured by the parallel efficiency E_P = ξ/P, which is about 1/(k + 1) when m = 1 - k/P.

The scalar performance measure ξ can sometimes be misleading, since it may favor inefficient but highly parallelizable algorithms over more efficient algorithms that may be more difficult to map onto a parallel multiprocessor computer.13 There are several industry-standard benchmark programs such as Whetstone, Dhrystone and Linpack that are for nonparallel systems but have parallel extensions.


While these benchmarks have been used extensively in all advanced computer system evaluations, specific benchmarks have been developed for evaluating shared- and distributed-memory parallel computers. These vary from simple parallel loops, which measure the abilities of parallelizing compilers, to the PERFECT benchmark, which consists of 13 programs (including several fluid dynamics programs), and MIMD benchmarks such as Genesis, which consists of programs for fast Fourier transforms, molecular dynamics, linear algebra and numerical solutions of elliptic partial differential equations.

Measures of performance based on Amdahl's law are particularly effective for small programs that do not require extensive and intensive use of computer memory. Most programs used as computer benchmarks are of this sort, but they do not represent many of the requirements for the solution of grand challenge problems like turbulence simulation. For example, we can now simulate a field of homogeneous turbulence at Reynolds numbers comparable to those of low-turbulence-level laboratory wind tunnels in one day on a 50-megaflop, 32-megaword desk-side superworkstation using 128³ modes. In 1970 such a computation would have required many months on the CDC 7600 supercomputer even though the peak CPU speed of the CDC 7600 was also roughly 50 megaflops. This marked difference in throughput is due mainly to the limited memory size of the CDC 7600, which would have made necessary many slow data transfers to disk.

We believe the issues of balancing memory, network speed and processing speed in computer design are best addressed by examining the Prototype Parallel Computer, depicted in figure 3, which we designed to solve a three-dimensional fluid dynamics problem. The key components of the PPC are an interconnecting set of P processing elements with distributed local memories, a shared global memory and a fast disk system. To avoid computational bottlenecks, data must be transferable among these components in roughly comparable times. We start by considering memory size, because we envision that grand challenge problems will have the computer fully dedicated to them for periods of 10^6 seconds or so (roughly two weeks) per run. This situation is quite different from that of running a shared resource at a computer center, in which many jobs contend for resources simultaneously.

Let us assume that N³ modes are used to resolve the flow field (N = 1024, for example, will be possible within the next two years). The total memory required (including all three velocity components, pressure and various history data) is then K_D N³ for some constant K_D of order 10, so we require the disk system to have memory size M_D ≥ K_D N³. The shared memory is assumed to be large enough to hold several dozen two-dimensional planes of data, so that its size M_S ≥ K_S N², where K_S is at least 3-10 times the number of planes of data stored in the shared memory. Finally, the local memories must be large enough to hold several "pencils" of one-dimensional data, so their size M_L ≥ K_L N, where K_L is 3-10 times the number of pencils stored in each local memory. (The values of these K factors depend on the number of variables needed at each mode for the most memory-intensive steps of the computation and on the latency time of the storage device at the next higher level.) If we assume that the size of the shared memory is P times that of the local memories, that is, M_S ≈ P M_L, then we can avoid discussions of the detailed architectural interconnections among processors of the PPC.

Next we assume that a total of γN³ computations are required per time step, where γ is the number of operations per mode (or grid point) per time step. In fluid dynamics computations γ is usually of order 250-5000, depending on the algorithm. Here γ is a measure of the computational complexity of the numerical method used to solve the flow equations (see the discussion in the next section). We assume that the code is highly parallelizable and does not suffer from inefficiencies due to parallelization; that is, we assume E_P ≈ 1.

To proceed with the design of the PPC for our turbulence problem we first choose M_D and M_S as described above. Next we choose the number of processors P so that the computations can be accomplished in 10^6 seconds. That is, we choose P so that N_t γN³ ≲ 10^6 P S, where N_t is the number of time steps required and S is the speed of each processor in flops. Typically N_t ≈ 100N. For example, in the immediate future we can envisage S = 100 megaflops and N = 1024, so that more than 1000 processors will be required.

Each time step of the computation takes γN³/PS seconds, and in an efficient design all data transfers must also be completed in that time. If data are transferred between each processor and its local memory at speed μ_PL words per second, between each local memory and shared memory at a speed μ_LS, and between shared memory and fast disk at speed μ_SD; and if at each time step there are Q_SD N³ words transferred between disk and shared memory, and a total of Q_LS N³ words transferred between all local memories and shared memory, then we require

    Q_SD N³ / μ_SD ≈ Q_LS N³ / (P μ_LS) ≈ γN³ / (σ P μ_PL) ≈ γN³ / (P S)    (4)

where σ ≈ 1-2 is the typical number of operations that a processing element performs on each word of data that is transferred to it from a local memory. Thus, with S = 100 megaflops, P = 1000, γ = 1000, Q_SD = 20 and Q_LS = 50 (typical values for a spectral turbulence simulation), we must have μ_SD ≈ 15 gigabytes/sec, μ_LS ≈ 40 megabytes/sec and μ_PL ≈ 800 megabytes/sec. If K_D = 10 and N = 1024, then the disk size must be at least M_D = 100 gigabytes, while M_S and P M_L may be an order of magnitude or more smaller.

The principal conclusion from this analysis using the PPC model is that the solution of these large DNS problems requires a correspondingly large storage device (a fast disk in the case of the PPC) with a high transfer rate between the corresponding storage components. One must scale up the numbers given in this example to estimate performance requirements for an efficient and effective teraflop multiprocessor computer.

Modes: Discrete approximations to flows

Just as supercomputer architectures have undergone significant changes roughly every 20 years, so too have the numerical methods that solve incompressible- and compressible-flow problems. Early work was based almost exclusively on finite-difference methods, which approximate derivatives by discrete differences. Then in the 1960s, finite-element methods (based on variational formulations in terms of piecewise polynomial representations of the solution) came to the fore. Spectral methods, discussed below, underwent significant development through the 1970s and '80s, and most current work on the direct numerical simulation of turbulence uses them. Today the emphasis is on combining the best features of all the previous methods to yield efficient and accurate hybrid flow solvers.

We distinguish between methods that have been used primarily for simulations of incompressible turbulence and methods that have been used for simulations of compressible turbulence containing shock waves, which typically require special treatment. For incompressible flows we discuss spectral, spectral-element and particle methods, while for compressible flows we discuss hybrid finite-difference methods, including flux-corrected transport and piecewise parabolic methods.



Andrei Kolmogorov's scaling law is verified by direct numerical simulations of turbulence using a 512³ spectral code on the CM-200 at Los Alamos. Isotropic energy spectra at various Taylor-microscale Reynolds numbers R_λ were rescaled by the maximum dissipation wavenumber k_p and E(k_p) to give the plot shown here (horizontal axis: normalized wavenumber, log(k/k_p)). The R_λ ≈ 150 line was obtained using a different (time-independent) forcing term. The inset at the lower left expands the vertical scale of the area in the rectangle to show the close agreement of the data with the slope (k^{-5/3}) predicted by Kolmogorov's law. (Courtesy of Zhen-Su She, University of Arizona; Shiyi Chen and Gary D. Doolen, Los Alamos National Laboratory; and Robert H. Kraichnan.) Figure 4

All these methods have been used for direct numerical simulation and large-eddy simulation of turbulence.

Spectral methods. In spectral methods the Navier-Stokes equations are solved using series expansions in terms of smooth functions such as complex exponentials and orthogonal polynomials. The first direct numerical simulation of homogeneous, isotropic turbulence3 used a Fourier series representation of solutions of the incompressible Navier-Stokes equation in a periodic box with 32³ modes. Fast transform techniques were employed to move freely between Fourier and physical space representations of fields. The computational complexity of this spectral algorithm is relatively low; for N ≲ 1000 we obtain γ ≈ 500, and more than 80% of the CPU time is spent on fast Fourier transforms. The key computational kernels (or code segments) are the fast Fourier transforms and the array transposes necessary to access different spatial directions.

Let us illustrate these points by outlining how such a spectral computer code is designed to solve the time-discretized Navier-Stokes equations,

    (v^{n+1} - v^{n-1}) / (2Δt) = v^n × ω^n - ∇Π + (ν/2) ∇²(v^{n+1} + v^{n-1})    (5)

    ∇·v^{n+1} = 0    (6)

where Δt is the time step, v^n is the velocity field at time step n, and ω^n = ∇ × v^n is the vorticity field. The time-stepping scheme used in equation 5 leads to errors of order (Δt)². At the start of a time step we assume that v^n and v^{n-1} are stored on the disk of the PPC in terms of their complex Fourier coefficients v^n(k,p,q) and v^{n-1}(k,p,q). The momenta (k,p,q) are the (x,y,z) wavenumbers. The stages of the computation are given in the box on page 40. (See also figure 3.)

By optimizing memory allocations in the algorithm shown in the box it is possible to achieve a parallel implementation with K_D ≈ 6 and Q_SD = 18. Such a spectral code with N = 512 currently runs at 20 seconds per time step in 32-bit precision on a 512-processor Intel Delta computer14 (30 times faster than on a single-processor Cray YMP) and at 30 seconds per time step on a 64K CM-200.15 These speeds, however, are less than one-third of the code's theoretical peak speeds on these computers because of interprocessor communication and memory access delays, so that these machine resources are not quite balanced according to the criteria developed for the PPC.

Similar spectral codes are now routinely used4 to study boundary-layer flows and flows in channels using Fourier representations parallel to the boundary but using Chebyshev or Jacobi polynomials in the inhomogeneous directions. For these problems the required computational kernels include fast Fourier transforms, direct matrix-vector multiplications, and inversions of tridiagonal matrices (matrices whose only nonzero elements are on the diagonal and adjacent to it). The corresponding complexity measure is γ ≈ 800.

In the past decade spectral methods have been extended to problems in complex geometries, such as flow past a sphere. (See the cover of this issue.)

Spectral-element methods16 combine some of the best features of spectral methods with those of finite-element methods by decomposing the domain into subdomains within which the variables and geometry are represented as high-order tensor products of spectral polynomials.


In this approach there is only a weak coupling between the dependent variables of adjacent subdomains, resulting in relatively sparse matrices that must be solved. The latter feature is critical to keeping the memory requirements and the processing time, and hence the computational complexity, of the method within reasonable bounds. In addition, the intrinsic coarse granularity (the "domain decomposition") of spectral-element methods leads naturally to a geometry-based distribution of work among processors that allows a high degree of parallelism.17 The key computational kernels are scalar products, matrix-vector multiplications and matrix-matrix multiplications. The corresponding value for γ is approximately 2500.

Particle methods have been used for simulating a variety of incompressible and compressible flows and for plasma simulations (see the article by John M. Dawson, Victor Decyk, Richard Sydora and Paulett Liewer on page 64). For incompressible flows two types of particle methods are popular: vortex methods and lattice gases. Random vortex methods have been used to simulate high-Reynolds-number, mostly incompressible, turbulent flows, including shear flows of chemically reacting species.18 In methods of this sort vorticity is approximated by a collection of particles (or "vortex blobs") that carry discrete quantities of vorticity. The corresponding velocity field is obtained from the vorticity field by the Biot-Savart law (by analogy with the deduction of a magnetic field from underlying current loops). The computational kernels involve the solution of N-body problems for the interior of the domain and on the boundary of the flow, and the solution of a potential-flow problem to guarantee that the induced vorticity does not cause flow across the boundary.

Steps in Typical Parallel Spectral Program

1. Import x-y planes of v^n from disk storage (DS) to shared memory (SM).
2. Import x-pencils from SM to local memory (LM) and compute the x fast Fourier transform (FFT) of v^n and ω^n. Result: v^n(x,p,q), ω^n(x,p,q).
3. Export results of step 2 from LM to SM.
4. Import y-pencils from SM to LM and compute the y-FFT. Result: v^n(x,y,q), ω^n(x,y,q).
5. Export results of step 4 from LM to SM to DS.
6. Import x-z planes from DS to SM.
7. Import z-pencils from SM to LM and compute the z-FFT. Result: v^n(x,y,z), ω^n(x,y,z). Then compute r = v^n × ω^n in physical space and perform the inverse z-FFT of r. Result: r(x,y,q).
8. Export r from LM to SM to DS.
9. Import x-y planes of r(x,y,q) and v^{n-1}(k,p,q) from DS to SM.
10. Import x-pencils of r(x,y,q) from SM to LM and compute the inverse x-FFT. Result: r(k,y,q).
11. Export results of step 10 from LM to SM.
12. Import y-pencils of r(k,y,q) from SM to LM and compute the inverse y-FFT of r. Result: r(k,p,q).
13. Solve for Π algebraically to impose incompressibility: Π(k,p,q) = -i(k r_1 + p r_2 + q r_3)/(k² + p² + q²). (This equation is derived by applying equation 6 to equation 5 and Fourier-transforming.)
14. Import v^{n-1}(k,p,q) from SM to LM and evaluate v^{n+1}(k,p,q) using the Fourier transform of equation 5. Result: v^{n+1}(k,p,q).
15. Export v^{n+1} from LM to SM to DS, completing the time-step cycle.

In addition, viscous effects require the dynamic generation of vortex elements at the boundary to impose the condition that fluid does not slip along the wall at the boundary.
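A minimal two-dimensional illustration of the Biot-Savart evaluation at the heart of such vortex methods (our sketch; the core radius delta and the particular smoothing are arbitrary choices, and real vortex codes add boundary terms, viscous blob generation and fast summation):

```python
# 2-D regularized Biot-Savart evaluation for a set of vortex blobs (the N-body kernel).
import numpy as np

def induced_velocity(targets, blobs, gammas, delta=0.05):
    """Velocity at `targets` induced by vortex blobs at `blobs` with circulations `gammas`.

    Uses the smoothed 2-D Biot-Savart kernel
        u(x) = sum_j Gamma_j / (2*pi) * (-(y - y_j), (x - x_j)) / (|x - x_j|**2 + delta**2),
    where delta is the (arbitrary) blob core radius that regularizes the singularity.
    """
    dx = targets[:, None, 0] - blobs[None, :, 0]    # pairwise separations, shape (M, N)
    dy = targets[:, None, 1] - blobs[None, :, 1]
    r2 = dx**2 + dy**2 + delta**2
    u = np.sum(-gammas * dy / r2, axis=1) / (2 * np.pi)
    v = np.sum( gammas * dx / r2, axis=1) / (2 * np.pi)
    return np.stack([u, v], axis=1)

# Example: a single blob of unit circulation induces a counterclockwise swirl around itself.
blobs = np.array([[0.0, 0.0]])
gammas = np.array([1.0])
targets = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
print(induced_velocity(targets, blobs, gammas))
```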

Lattice methods, including lattice gases and lattice versions of the Boltzmann and Bhatnagar-Gross-Krook (BGK) kinetic equations,19 are intrinsically parallelizable due to local interactions and communications. They involve a novel statistical mechanics of discrete particles with discrete velocities whose average coarse-grained behavior follows the Navier-Stokes equations. These methods are particularly effective in treating highly complex flows, such as porous media flows, multiphase flows and flows over rough boundaries. Recently there has been interest in the possibility of extending these techniques to perform large-eddy simulation of turbulence.

Finally, it is possible to combine the application of these lattice or other low-order finite-difference descriptions in local regions with high-order spectral-element descriptions applied in the remainder of the region.16

Hybrid difference methods. Flux-corrected transport methods were originally developed to treat problems involving strong shocks, blast waves and chemically reactive flows. More recently they have been used in simulating compressible turbulent flows. They enforce the physical principles of positivity and causality on the numerical solution of problems involving sharp discontinuities.20 These methods modify relatively conventional difference methods for incorporating hyperbolic conservation laws by using solution-dependent flux limiters that prevent the appearance of artificial extrema and hence artificial oscillations in the solution. Three-dimensional compressible codes are developed using one-dimensional subroutines; this is justified mathematically by factoring evolution operators ("directional splitting"). A three-dimensional computation requires roughly 30 calls to one-dimensional subroutines. The computational complexity is γ ≈ 2500.
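A stripped-down one-dimensional illustration of the flux-limiting idea (our sketch using a simple minmod limiter for linear advection; production flux-corrected transport combines a low-order transported solution with limited antidiffusive fluxes, which is more elaborate than this):

```python
# One-dimensional flux-limited advection step: the limiter suppresses the
# higher-order correction near sharp gradients so that no new extrema appear.
import numpy as np

def minmod(a, b):
    """Return the argument of smaller magnitude when a and b share a sign, else 0."""
    return np.where(a * b > 0, np.where(np.abs(a) < np.abs(b), a, b), 0.0)

def advect_step(u, a, dt, dx):
    """One step of u_t + a u_x = 0 (a > 0, periodic grid) with a limited second-order flux."""
    c = a * dt / dx                                            # Courant number, must be <= 1
    slope = minmod(np.roll(u, -1) - u, u - np.roll(u, 1))
    flux = a * (u + 0.5 * (1.0 - c) * slope)                   # flux through each right cell face
    return u - (dt / dx) * (flux - np.roll(flux, 1))

# Advect a square pulse; the limited scheme keeps it between 0 and 1 (no over/undershoots).
nx, a, dt = 200, 1.0, 0.004
dx = 1.0 / nx
xc = np.arange(nx) * dx
u = np.where((xc > 0.3) & (xc < 0.5), 1.0, 0.0)
for _ in range(100):
    u = advect_step(u, a, dt, dx)
print("min:", u.min(), "max:", u.max())       # stays within [0, 1] up to round-off
```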

The piecewise parabolic method21 is a hybrid scheme that combines classical difference methods and high-order interpolation techniques constrained so that sharp flow features are resolved using only about two computational cells. In this method there is no explicit incorporation of viscous dissipation; instead dissipation is introduced at high wavenumbers by discretization errors that arise in approximating the inviscid Euler equations.21 The scheme also uses directional splitting. Subdomains, typically three-dimensional bricks that constitute a part of a three-dimensional uniform grid, are assigned to individual nodes. The computational and data-communication complexity of the piecewise parabolic method is due to local finite-difference arithmetic and transfer of the five primitive variables residing along edges of the subdomains. The computational complexity is γ ≈ 2500.

Flow codes: Parallel simulations of turbulence

We now briefly describe four applications of parallel computers to turbulent flow problems; the first two involve incompressible flows, while the latter two involve compressible supersonic flows.

Homogeneous turbulence. A 512³ spectral simulation15 has been performed on the 64K CM-200 SIMD parallel computer to verify Kolmogorov's theory of small eddies. (See the article by Frisch and Orszag.) With this high resolution it was possible to simulate homogeneous turbulence with confidence up to Taylor-microscale Reynolds numbers R_λ ≈ 200. In figure 4 we give a log-log plot of the energy spectra, rescaled by a characteristic dissipation wavenumber k_p, for several Reynolds numbers R_λ.


The results plotted in figure 4 show that Kolmogorov's universal scaling theory collapses all the data to a single curve and thereby gives an accurate description of turbulence energetics.

Decaying supersonic turbulence simulated using a three-dimensional piecewise parabolic method on the CM-5 (panels at t = 0.3τ, 1.0τ and 2.0τ). Colors indicate normalized pressure, with values increasing from red to yellow to green to blue; τ is the time it takes a sound wave to propagate across the periodic computational box. The volume rendering is based on an opacity proportional to the negative velocity divergence, so that regions near shock waves are most opaque. (Courtesy of David H. Porter and Paul R. Woodward, University of Minnesota; and Annick Pouquet, Observatoire de la Cote d'Azur.) Figure 5

Drag reduction by riblets. One of the more interesting methods for reducing boundary-layer drag uses "riblets"—microgrooves aligned with the mean flow direction. The skins of some species of fast-swimming sharks have riblets. Riblets were successfully employed in the 1987 America's Cup competition and have already been tested at flight conditions. It has been found that riblets can reduce drag by 4-12% for flow over a flat plate.

However, no clear explanation of the mechanism of turbulent drag reduction by riblets has yet been confirmed. To advance the understanding and expedite the design, placement and shape of riblets, direct numerical simulations of flows with riblets have been performed using a hybrid spectral-element-Fourier spectral method on the Intel Gamma and Delta Touchstone parallel computers.22 With 512 processors, speeds in excess of 3 gigaflops are obtained (3 seconds per time step for 100 elements of resolution 10 × 10 × 256). Figure 1 shows the instantaneous streamwise velocity component of the three-dimensional flow field at three different cross-flow planes. The simultaneous visualization of flow structures on the upper (smooth) wall and the lower (riblet) wall leads to quantitative predictions and to a qualitative model of the turbulence production and associated shear stress.

Supersonic, compressible homogeneous turbulence. High-resolution (up to 512³) simulations of supersonic homogeneous turbulence have been carried out on the parallel CM-5 computer using the piecewise parabolic method21 and on the Intel Touchstone prototype using a sixth-order finite-difference method.23 For the CM-5 code the data are partitioned into 512 blocks mapped onto 512 nodes; the code runs at approximately 1.5 gigaflops using only the scalar CM-5 chips. Figure 5 shows a perspective volume rendering of the pressure field of a turbulence decay run. The simulation begins with a field of homogeneous turbulence with rms Mach number 1.1; the goal is to see how shock waves develop as the turbulence dissipates. The figure shows the pressure at times 0.3τ, 1.0τ and 2.0τ, where τ is the time that it takes a sound wave to propagate across the computational box. Apparently the number of shocks increases and the typical shock strength decreases with time, although there are still some fairly large pressure jumps even at later times. Such simulations show that in a supersonic flow vorticity is produced by shock curvature and shock intersections rather than by the random vortex stretching mechanism that is dominant in subsonic and incompressible flows.

Supersonic reacting shear layer. Parallel flux-corrected transport computations of supersonic, multispecies, chemically reacting, exothermic turbulent flows have run at 800 megaflops on the CM-200 with 16K processors and have been used to evaluate new concepts for high-speed propulsion.24 Figure 6 shows the hydrogen mole fraction at an advanced stage in the mixing of two counterflowing supersonic streams of hydrogen and air in a small (1 cm × 1 cm) region. Such conditions might be found in the engine of the proposed National Aerospace Plane. Because the computations involve nine species undergoing physicochemical processes (including convection, thermal conduction and chemical reactions), they tax the capabilities of the most powerful parallel computers.

Perspective

Experience has shown that each time a new supercomputer is introduced, it takes several years for software to mature on the new architecture, and usually by the time the software has matured, new versions of the computer system are available. Nevertheless, it has been possible to make effective use of the new architectures at an early date for computational fluid dynamics (CFD), even without effective, general-purpose software. In fact, it is in the early years of new architectures that many of the most important scientific discoveries occur. To achieve such results, one must understand the basic computer architecture and its optimal use, which may require using low-level (even assembly) languages. The knowledge gained in these leading-edge CFD applications has been of direct benefit to developers of compilers and higher-level languages. Effective collaborations between CFD scientists and computer hardware and software experts will be critical to the development of the new teraflop computer environments.

Electronic component speeds and densities have improved by a factor of more than 10^5 in the last half-century.


Supersonic streams of counterflowing, chemically reacting hydrogen and air simulated by flux-corrected transport on the CM-2. Color indicates hydrogen mole fraction: Deep red represents pure hydrogen; deep purple, pure air. The lower flow is moving from left to right. (Courtesy of Patrick Vuillermoz, ONERA; and Elaine Oran, Naval Research Laboratory.) Figure 6

This development is unrivaled in other fields of human endeavor; if automobiles had undergone similar improvements, today a Cadillac would sell for less than a penny, or it would be capable of a peak speed in excess of 1% of the speed of light, or one gallon of gas would suffice for about ten trips to the Moon. Despite these remarkable advances in computer electronics, the motivating force behind computer developments has been (and will likely continue to be) the grand challenge applications. Indeed, it was the application of numerical weather forecasting that inspired the British meteorologist Lewis Fry Richardson in 1922 to foresee the use of MIMD parallel computers. (See the box on page 37.)

In the same way, the foresight of CFD scientists following in Richardson's tradition will likely drive many of the most significant future computer developments. We expect that continued development of hybrid numerical methods, in conjunction with the development of physical models (based on fundamental theory and integrated with the results of prototype experiments) and the consideration of computer architectures like the Prototype Parallel Computer, will form the basis for breakthroughs on the grand challenges in fluid mechanics.

We would like to acknowledge our colleagues, too numerous to mention here, who have provided us with up-to-date information in this rapidly developing field.

References

1. "Grand Challenges 1993: High Performance Computing and Communications," report by the Committee on Physical, Mathematical and Engineering Sciences, Federal Coordinating Council for Science, Engineering and Technology, Washington, D.C. (1992).
2. H. W. Emmons, Annu. Rev. Fluid Mech. 2, 15 (1970).
3. S. A. Orszag, G. S. Patterson, Phys. Rev. Lett. 28, 76 (1972).
4. M. Y. Hussaini, R. G. Voigt, eds., Instability and Transition, vols. I and II, Springer-Verlag, New York (1990). J. Kim, P. Moin, R. Moser, J. Fluid Mech. 177, 133 (1987). P. Spalart, J. Fluid Mech. 187, 61 (1988).
5. G. E. Karniadakis, Appl. Num. Math. 6, 85 (1989). L. Kaiktsis, G. E. Karniadakis, S. A. Orszag, J. Fluid Mech. 231, 501 (1991). A. G. Tomboulides, S. A. Orszag, G. E. Karniadakis, preprint AIAA-93-0546, Am. Inst. of Aeronautics and Astronautics, New York (January 1993).
6. B. Galperin, S. A. Orszag, eds., Large Eddy Simulations of Complex Engineering and Geophysical Flows, Cambridge U. P., New York (1993).
7. S. A. Orszag, V. Yakhot, W. S. Flannery, F. Boysan, D. Choudhury, J. Maruzewski, B. Patel, in Near-Wall Turbulent Flows, R. M. So, C. G. Speziale, B. E. Launder, eds., Elsevier, New York (1993).
8. A. Jameson, Science 245, 361 (1989); Aerospace America 30, 42 (1992). R. K. Agarwal, J. C. Lewis, in Symp. on High Performance Computing for Flight Vehicles, Washington, D.C., 7-9 December 1992, in press. M. Y. Hussaini, in 11th Int. Conf. on Numerical Methods in Fluid Dynamics, D. L. Dwoyer, M. Y. Hussaini, R. G. Voigt, eds., Springer-Verlag, New York (1989), p. 3.
9. A. J. Smits, Exp. Thermal Fluid Sci. 5, 579 (1992).
10. F. T. Leighton, Introduction to Parallel Algorithms and Architectures, Morgan Kaufmann, San Mateo, Calif. (1992). See also IEEE Spectrum, September 1992.
11. S. L. Johnsson, in Topics in Atmospheric and Oceanic Sciences, Springer-Verlag, New York (1990), p. 231.
12. G. E. Amdahl, in Proc. AFIPS Spring Joint Computer Conf., Atlantic City, N.J., 18-20 April 1967, Thompson, Washington, D.C. (1967), p. 483.
13. J. Dongarra, W. Gentzsch, Parallel Computing 17, 1067 (1991).
14. A. Wray, R. Rogallo, "Simulation of Turbulence on the Intel Delta Gamma," NASA Technical Memorandum, April 1992.
15. S. Chen, G. D. Doolen, R. H. Kraichnan, Z.-S. She, Phys. Fluids A 5, 458 (1993). Z.-S. She, S. Chen, G. D. Doolen, R. H. Kraichnan, S. A. Orszag, submitted to Phys. Rev. Lett. (1993).
16. G. E. Karniadakis, S. A. Orszag, E. M. Ronquist, A. T. Patera, in Incompressible Fluid Dynamics, M. D. Gunzburger, R. A. Nicolaides, eds., Cambridge U. P., New York (1993), in press. G. E. Karniadakis, S. A. Orszag, in Algorithmic Trends for Computational Fluid Dynamics, M. Y. Hussaini, A. Kumar, M. Salas, eds., Springer-Verlag, New York (1993), in press.
17. P. Fischer, A. T. Patera, J. Comput. Phys. 92, 380 (1991).
18. A. J. Chorin, J. Comput. Phys. 27, 428 (1978). A. Leonard, J. Comput. Phys. 37, 289 (1980). J. A. Sethian, A. F. Ghoniem, J. Comput. Phys. 74, 283 (1988).
19. G. Doolen, ed., Lattice Gas Methods for Partial Differential Equations, Addison-Wesley, Redwood City, Calif. (1989). F. Higuera, S. Succi, R. Benzi, Europhys. Lett. 9, 663 (1989). Y. H. Qian, D. d'Humieres, P. Lallemand, Europhys. Lett. 17, 479 (1992).
20. E. S. Oran, J. P. Boris, Numerical Simulation of Reactive Flow, Elsevier, New York (1987).
21. D. H. Porter, A. Pouquet, P. R. Woodward, Theor. Comput. Fluid Dynamics 4, 13 (1992).
22. D. Chu, R. D. Henderson, G. E. Karniadakis, Theor. Comput. Fluid Dynamics 3, 219 (1992). R. D. Henderson, PhD thesis, Princeton U., Princeton, N.J. (1993).
23. G. Erlebacher, private communication (1992).
24. E. S. Oran, J. P. Boris, C. R. Devore, J. Fluid Dynamics Res. 10, 251 (1992).

