  • A 2589 Line Topology Optimization Code Written for the Graphics Card

    Stephan Schmidt, Volker Schulz

    July 23rd, 2009

    S. Schmidt, V. Schulz (University of Trier) GPU Topology Optimization July 23rd, 2009 1 / 23

  • Outline

    1 The Graphics Card
        Processing Unit
        Memory Management
        Example Applications

    2 Linear Elasticity and Topology Optimization
        Displacements and Compliance
        Topology Optimization Problem
        SIMP Method

    3 GPU Implementation

    4 Results

  • Literature

    S. Ananiev. On equivalence between optimality criteria and projected gradient methods with application to topology optimization problem. Multibody System Dynamics, 13(1):25–38, 2003.

    M. P. Bendsøe and O. Sigmund. Topology Optimization – Theory, Methods and Applications. Springer, Berlin, Heidelberg, New York, 2nd edition, 2004.

    M. Giles. Using NVIDIA GPUs for computational finance. http://people.maths.ox.ac.uk/~gilesm/hpc/.

    O. Sigmund. A 99 line topology optimization code written in matlab. Structural and Multidisciplinary Optimization, 21(2):120–127, 2001.

  • The 3D Accelerator Graphics Card

    Traditionally:
      Highly specialized processor (triangular data types, sub-float precision, no integers)

    Today:
      Unified shader: autonomous compute device
      SIMD / stream architecture
      Highly parallel, up to 512 threads, in-order execution
      Ideal for vector processing
      Special programming extensions (CUDA, OpenCL)
      Peta-Flop supercomputers have GPU-like architecture

  • CPU vs GPU

    CPU:
      Cores independent
      Memory accesses hidden by caches automatically
      10.6 GB/s to RAM (PC2-5300 DDR2), optimized for latency

    GPU:
      Cores execute same instructions on different memory address (warp = 32)
      Memory access hidden by coalescence and parallelism by programmer
      141.7 GB/s to RAM, optimized for bandwidth

  • GPU Memory Hierarchy

    Device Memory (RAM)
      Access time: up to 600 clock cycles (= 150 float add/mult)
      Remedy: coalescence: channel load/store instructions (zero padding, pitch)!
      Unknowns per thread should be a multiple of 4 or 2, but not 3

    Shared Memory
      Delivers 32 bit per clock cycle
      16 KB in 16 banks: bank conflict when too much data is needed at the same time or on unstructured access (stride)
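
    The two effects named above, padding a row to an aligned length (pitch) and bank conflicts under strided access, can be illustrated with a few lines of plain Python. This is only a sketch: the function names are hypothetical, and the defaults of 16 banks and 16 threads accessing shared memory together are assumptions chosen to match the 16-bank layout on this slide.

```python
def padded_pitch(row_length, alignment=16):
    """Smallest multiple of `alignment` that is >= row_length (zero padding)."""
    return -(-row_length // alignment) * alignment


def bank_conflict_degree(stride, banks=16, threads=16):
    """Largest number of threads hitting one bank when thread t reads
    the 32-bit word with index t * stride (1 means conflict-free)."""
    hits = [0] * banks
    for t in range(threads):
        hits[(t * stride) % banks] += 1
    return max(hits)


if __name__ == "__main__":
    print(padded_pitch(180))          # row of 180 unknowns, padded
    print(bank_conflict_degree(1))    # unit stride
    print(bank_conflict_degree(2))    # stride of two words
```

    Under these assumptions a row of 180 unknowns is padded to 192, unit stride is conflict-free, and a stride of two words makes pairs of threads share a bank.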

  • GPU Memory Hierarchy

    Constant Memory
      Read only
      Cached, with broadcast if all threads access the same address

    Registers
      8192 per block, access: 0 cycles
      Shared with shared memory, potential bank conflicts

  • Programming Paradigm

    [Figure: a grid consists of blocks, each block consists of threads]
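
    The grid, block and thread hierarchy determines which data item each thread works on. In CUDA the usual 1D mapping is `blockIdx.x * blockDim.x + threadIdx.x`; the plain Python sketch below emulates that mapping. The function names are hypothetical and only serve as illustration.

```python
def global_indices(grid_dim, block_dim):
    """Yield (block, thread, global index) for a 1D grid of 1D blocks."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            yield block_idx, thread_idx, block_idx * block_dim + thread_idx


def blocks_needed(n, block_dim):
    """Number of blocks so that grid_dim * block_dim covers n items."""
    return (n + block_dim - 1) // block_dim


if __name__ == "__main__":
    for block, thread, index in global_indices(grid_dim=2, block_dim=3):
        print(block, thread, index)
```

    With a block size equal to the warp size of 32, covering 180 items takes `blocks_needed(180, 32)` blocks; threads whose global index exceeds the data size have to be masked out.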

  • Summary GPU Computing

    Good Problems:
      Dense matvec, sparse banded matvec
      Fractals
      FFT
      Compute-intensive PDE solvers (high-order FVM, spectral elements, Lattice Boltzmann)
      Structured meshes (cartesian)

    Bad Problems:
      Unstructured sparse matvec
      Unstructured mixed-element PDE schemes
      Data-intensive tasks

    ⇒ Numerical schemes should be designed with computer architecture in mind.

  • GPU Computing in Trier (with M. Siebenborn)

    Finite Volume solver for shallow water and Euler equations, JST scheme, scalar dissipation, structured body-fitted mesh

  • Linear Elasticity

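    Deformation of a solid body under forces: displacement vector $u \in \mathbb{R}^3$.

    Linear strain tensor

    \[
    \epsilon_{ij} := \frac{1}{2}\left(\frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i}\right)
    \]

    Voigt notation for symmetric strain tensor

    \[
    \tilde{\epsilon} := (\epsilon_{11}, \epsilon_{22}, \epsilon_{33}, \epsilon_{12}, \epsilon_{13}, \epsilon_{23})^T =: Bu
    \]

    Cauchy stress tensor: Young's modulus $E$, Poisson's ratio $\nu$

    \[
    \sigma = \frac{E}{(1+\nu)(1-2\nu)}
    \begin{bmatrix}
    1-\nu & \nu & \nu & & & \\
    \nu & 1-\nu & \nu & & & \\
    \nu & \nu & 1-\nu & & & \\
     & & & 1-2\nu & & \\
     & & & & 1-2\nu & \\
     & & & & & 1-2\nu
    \end{bmatrix}
    \tilde{\epsilon} =: CBu
    \]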
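
    The material matrix $C$ is small enough to write down directly. The following plain Python sketch builds it for given $E$ and $\nu$, using the strain ordering $(\epsilon_{11}, \epsilon_{22}, \epsilon_{33}, \epsilon_{12}, \epsilon_{13}, \epsilon_{23})$ from this slide. The function name and argument names are hypothetical.

```python
def elasticity_matrix(young, poisson):
    """6x6 matrix C with sigma = C * (e11, e22, e33, e12, e13, e23)^T."""
    factor = young / ((1.0 + poisson) * (1.0 - 2.0 * poisson))
    c = [[0.0] * 6 for _ in range(6)]
    # Normal components: 1 - nu on the diagonal, nu off the diagonal.
    for i in range(3):
        for j in range(3):
            c[i][j] = factor * ((1.0 - poisson) if i == j else poisson)
    # Shear components: diagonal entries 1 - 2 nu.
    for i in range(3, 6):
        c[i][i] = factor * (1.0 - 2.0 * poisson)
    return c


if __name__ == "__main__":
    for row in elasticity_matrix(young=210.0, poisson=0.3):
        print(row)
```

    The shear entries reduce to $E/(1+\nu)$, which is twice the shear modulus; this matches the use of tensor shear strains $\epsilon_{12}, \epsilon_{13}, \epsilon_{23}$ rather than engineering shear strains.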

  • Linear Elasticity and Finite Elements

    Weak formulation

    \[
    a(u, v) = \int_\Omega (Bv)^T C B u \, dS = L(v) \quad \forall v \in V.
    \]

    Matrix notation

    \[
    K(\Omega) u = f
    \]

    Forces, loads, supports: $f$

    Compliance

    \[
    c(u) = u^T f = u^T K u
    \]
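
    The identity $u^T f = u^T K u$ holds because $u$ solves $K u = f$. A tiny example makes this concrete; the system below (two springs in series with stiffnesses 2 and 1, fixed at one end, unit load on the free end) is an assumed toy problem, not taken from the talk.

```python
def solve_2x2(k, f):
    """Solve K u = f for a 2x2 matrix by Cramer's rule."""
    det = k[0][0] * k[1][1] - k[0][1] * k[1][0]
    return [(f[0] * k[1][1] - k[0][1] * f[1]) / det,
            (k[0][0] * f[1] - k[1][0] * f[0]) / det]


def compliance(k, u):
    """c(u) = u^T K u."""
    return sum(u[i] * k[i][j] * u[j] for i in range(2) for j in range(2))


K = [[3.0, -1.0], [-1.0, 1.0]]   # springs in series: k1 = 2, k2 = 1
f = [0.0, 1.0]                   # unit load on the free end
u = solve_2x2(K, f)

if __name__ == "__main__":
    print(u, compliance(K, u), u[0] * f[0] + u[1] * f[1])
```

    A stiffer structure deflects less under the same load, so minimizing compliance is the same as maximizing stiffness for the given load.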

  • Topology Optimization Problem

    Mathematical Problem

    \[
    \min_{(u,\Omega)} J(u,\Omega) := u^T K(\Omega) u
    \]

    subject to

    \[
    K(\Omega) u = f, \qquad \mathrm{Vol}(\Omega) = V_0
    \]

    How to deal with the unknown $\Omega$?

      Level-Set method: $\Omega$ is zero level of function $\theta$
        Extract zero-level curve of $\theta$ ⇒ unstructured curve ⇒ unstructured discretization of $\Omega$
        X-FEM
        Special treatment of bisected elements

      Solid Isotropic Material with Penalization (SIMP), aka homogenization approach

  • Solid Isotropic Material with Penalization (SIMP)

    Overlay $\Omega$ with cartesian grid

    Pseudo-density in each finite element, $\rho = (\rho_1, \dots, \rho_N)^T$:

    \[
    \min_{(u,\rho)} J(u,\rho) := u^T K(\rho) u
    \]

    subject to

    \[
    K(\rho) u = f, \qquad \sum_{e=1}^{N} \rho_e = V_0, \qquad \rho_e \in \{0,1\}
    \]

    Replace $\rho_e \in \{0,1\}$ by $\rho_e \in [0,1]$

    \[
    a(u, v) = \sum_{e=1}^{N} \int_{\Omega_e} (Bv)^T \rho_e^p \, C B u \, dS = L(v) \quad \forall v \in V
    \]

    Penalty parameter $p$
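
    The effect of the penalty parameter is easiest to see numerically: an element's stiffness scales with the density raised to the power $p$, while its cost in the volume constraint scales with the density itself. A short Python sketch follows, with hypothetical function names and $p = 3$ assumed as the default (a common choice in the SIMP literature, not a value stated on the slide).

```python
def simp_stiffness_factor(rho, p=3.0):
    """Factor rho^p that scales the element stiffness in SIMP."""
    return rho ** p


def stiffness_per_volume(rho, p=3.0):
    """Stiffness gained per unit of material spent, rho^p / rho."""
    return simp_stiffness_factor(rho, p) / rho


if __name__ == "__main__":
    for rho in (0.25, 0.5, 0.75, 1.0):
        print(rho, simp_stiffness_factor(rho), stiffness_per_volume(rho))
```

    For $p > 1$ an intermediate density buys less stiffness per unit of material than a full element, which pushes the relaxed problem back toward $\rho_e \in \{0,1\}$.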

  • Optimality Criteria

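    Lagrangian:

    \[
    L = u^T K(\rho) u + \lambda \left( \sum_{e=1}^{N} \rho_e - V_0 \right) + \mu^T \left( K(\rho) u - f \right) + \sum_{e=1}^{N} \alpha_e (-\rho_e) + \sum_{e=1}^{N} \beta_e (\rho_e - 1)
    \]

    Optimality condition (self-adjoint in $\mu$):

    \[
    -u_e^T \frac{\partial K_e}{\partial \rho_e} u_e + \lambda - \alpha_e + \beta_e = 0
    \]

    \[
    \sum_{e=1}^{N} \rho_e - V_0 = 0
    \]

    \[
    -\rho_e = 0 \quad \text{or} \quad \rho_e - 1 = 0
    \]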

  • Optimality Criteria Method

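    Gradient of Lagrangian without constraint $\rho_e \in \{0,1\}$:

    \[
    B_e := \frac{1}{\lambda} u_e^T \frac{\partial K_e}{\partial \rho_e} u_e = 1
    \]

    Update for $\rho_e$ (case distinction given in the standard form of Sigmund's 99 line code, with the bounds $[0,1]$ of the relaxed problem):

    \[
    \rho_e \leftarrow
    \begin{cases}
    \max(0, \rho_e - m) & \text{if } \rho_e B_e^{\eta} \le \max(0, \rho_e - m) \\
    \rho_e B_e^{\eta} & \text{if } \max(0, \rho_e - m) < \rho_e B_e^{\eta} < \min(1, \rho_e + m) \\
    \min(1, \rho_e + m) & \text{if } \min(1, \rho_e + m) \le \rho_e B_e^{\eta}
    \end{cases}
    \]

    Move limit $m > 0$, damping $\eta = 0.5$
    Bisection for $\lambda$
    OC-Update can be interpreted as special projected gradient method for $\rho_e \in [0,1]$ constraint
    Implemented in One-Shot, i.e. inexact gradient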
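
    A damped, move-limited OC step with bisection for $\lambda$ can be sketched in plain Python as below. This follows the structure of the update in Sigmund's 99 line code rather than the authors' GPU implementation; the function name, the move limit of 0.2, the lower bound `rho_min`, and the bisection bracket are all assumptions made for the example.

```python
def oc_update(rho, sensitivity, volume, move=0.2, eta=0.5, rho_min=1e-3):
    """One optimality-criteria step.

    sensitivity[e] stands for u_e^T (dK_e/drho_e) u_e >= 0, and
    volume is the target value V0 of sum(rho).
    """
    lo, hi = 1e-9, 1e9                        # assumed bracket for lambda
    while (hi - lo) / (hi + lo) > 1e-6:
        lam = 0.5 * (lo + hi)
        new = []
        for r, s in zip(rho, sensitivity):
            candidate = r * (s / lam) ** eta  # rho_e * B_e^eta
            lower = max(rho_min, r - move)    # move limit and lower bound
            upper = min(1.0, r + move)        # move limit and upper bound
            new.append(min(upper, max(lower, candidate)))
        if sum(new) > volume:
            lo = lam                          # too much material: raise lambda
        else:
            hi = lam                          # too little material: lower lambda
    return new


if __name__ == "__main__":
    print(oc_update([0.5] * 4, [4.0, 1.0, 1.0, 0.25], volume=2.0))
```

    Elements with a large sensitivity gain material and elements with a small one lose it, while the bisection keeps the total at the prescribed volume.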

  • Finite Elements: CPU Code

    Compute RefStiff[i][j] ∈ R^{24×24}
    u_{k+1} = 0
    for all Finite Elements T do
        for all vertices i of T do
            t = 0
            for all vertices j of T do
                ig = Global-Index i
                jg = Global-Index j
                t = t + ρ_T RefStiff[i][j] u_k[jg]
            end for
            u_{k+1}[ig] = u_{k+1}[ig] + t
        end for
    end for

    Cons:
      Requires 32 global load operations per element
      Requires 8 global store operations per element
      Final store must be atomic!
      Prohibitive for GPU!

  • Memory Coalescence

    Memory access is expensive!
    Strategy: matrix-free FEM with CG
    Cartesian mesh: nx × ny × nz tensor mesh
    Parallelism: process matvec per 2D slice and stream in k-plane
    Partition 2D slice in warpsize × n blocks, where n is determined from available shared memory
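
    Matrix-free CG needs the operator only as a function that returns the product of the matrix with a vector; the matrix itself is never stored. A minimal sketch in plain Python, assuming a symmetric positive definite operator. The names `conjugate_gradient` and `apply_laplacian` are illustrative, and the 1D stencil is a stand-in for the elasticity operator.

```python
def conjugate_gradient(apply_a, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for SPD A, given only the map x -> A x."""
    x = [0.0] * len(b)
    r = list(b)                        # residual b - A x for x = 0
    p = list(r)                        # first search direction
    rr = sum(v * v for v in r)
    for _ in range(max_iter):
        if rr ** 0.5 <= tol:
            break
        ap = apply_a(p)                # the only place the operator is used
        alpha = rr / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = sum(v * v for v in r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


def apply_laplacian(u):
    """Matrix-free 1D stencil (-1, 2, -1) with zero boundary values."""
    n = len(u)
    return [2.0 * u[i]
            - (u[i - 1] if i > 0 else 0.0)
            - (u[i + 1] if i < n - 1 else 0.0) for i in range(n)]


if __name__ == "__main__":
    print(conjugate_gradient(apply_laplacian, [1.0] * 5))
```

    Each iteration costs one operator application plus a few vector updates, so the speed of the matvec dominates the run time.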

  • Finite Elements: GPU Code

    Compute RefStiff[i][j] ∈ R^{24×24} and copy to constant memory
    Partition x-y-plane in warpsize × n patches, launch GPU blocks
    Init shared memory, synchronize threads
    for all k-planes do
        (i, j) = Thread-ID, Res = 0 in thread register
        Discard slice, load new one, synchronize
        for all Elements T that have (i, j, k) as a vertex do
            (i2, j2, k2) = local index (i, j, k) has in T
            u_thread = u(i, j, k) from shared memory
            for all (i1, j1, k1) vertex of T do
                Res = Res + ρ_T RefStiff[(i1, j1, k1)][(i2, j2, k2)] u_thread
            end for
        end for
        Synch threads
        Upload Res from shared to global memory
    end for
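
    The difference between the element loop of the CPU listing and a node-wise loop with one thread per node can be tried out on a 1D bar. This is a minimal Python sketch, not the authors' CUDA code: the two-node reference stiffness `REF_STIFF` and both function names are assumptions standing in for the 24 × 24 `RefStiff` and the 3D mesh, and the gather is written in the spirit of the GPU listing rather than copied from it.

```python
# Reference stiffness of a 1D two-node bar element (hypothetical stand-in
# for the 24x24 RefStiff of the 3D element).
REF_STIFF = [[1.0, -1.0], [-1.0, 1.0]]


def matvec_scatter(rho, u):
    """Element loop: every element adds into the entries of its vertices."""
    result = [0.0] * len(u)
    for e, rho_e in enumerate(rho):           # element e joins nodes e, e + 1
        for i in range(2):
            t = 0.0
            for j in range(2):
                t += rho_e * REF_STIFF[i][j] * u[e + j]
            result[e + i] += t                # shared write: atomic on a GPU
    return result


def matvec_gather(rho, u):
    """Node loop: every node (one thread) sums over its adjacent elements."""
    n_elem = len(rho)
    result = []
    for node in range(len(u)):
        res = 0.0                             # would live in a thread register
        for e in (node - 1, node):            # elements that contain this node
            if 0 <= e < n_elem:
                i_local = node - e            # local index of the node in e
                for j in range(2):
                    res += rho[e] * REF_STIFF[i_local][j] * u[e + j]
        result.append(res)                    # exactly one write per node
    return result


if __name__ == "__main__":
    rho, u = [1.0, 0.5, 0.25], [0.0, 1.0, 3.0, 6.0]
    print(matvec_scatter(rho, u))
    print(matvec_gather(rho, u))
```

    Both loops give the same product, but in the gather form each output entry has a single writer, so no atomic stores are needed.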

  • 3D Cantilever

    180 × 180 × 360 mesh
    46.5 · 10^6 unknowns

    [Video: cant_opt.mp4]

  • 3D Cantilever

    [3D model: cantsmooth.u3d]

    80 × 80 × 160 mesh
    Full load in k-direction

  • Speed-Up

    Time for 1000 CG iterations on 180 × 180 × 360 mesh

    Device                   Time (hh:mm:ss.cc)   Factor vs. GTX280
    GeForce GTX280           00:05:27.60           1
    Core2Duo E6600 1 Core    05:18:27.38          58.33
    Core2Duo E6600 2 Core    02:51:28.55          31.41
    Core2Duo T9600 1 Core    04:37:29.92          50.82
    Core2Duo T9600 2 Core    01:58:50.87          21.77
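
    The listed factors are the CPU times divided by the GeForce GTX280 time, which can be checked with a few lines of Python. The helper `to_seconds` is a hypothetical name; the times are those given on this slide.

```python
def to_seconds(stamp):
    """Convert 'hh:mm:ss.cc' (decimal point or decimal comma) to seconds."""
    hours, minutes, seconds = stamp.split(":")
    return (int(hours) * 3600 + int(minutes) * 60
            + float(seconds.replace(",", ".")))


TIMES = {
    "GeForce GTX280": "00:05:27.60",
    "Core2Duo E6600 1 Core": "05:18:27.38",
    "Core2Duo E6600 2 Core": "02:51:28.55",
    "Core2Duo T9600 1 Core": "04:37:29.92",
    "Core2Duo T9600 2 Core": "01:58:50.87",
}

gpu_seconds = to_seconds(TIMES["GeForce GTX280"])
factors = {name: to_seconds(t) / gpu_seconds for name, t in TIMES.items()}

if __name__ == "__main__":
    for name, factor in factors.items():
        print(f"{name:24s} {factor:6.2f}")
```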

    [Bar chart: 1000 CG iterations on the 180 × 180 × 360 mesh; time axis from 00:00:00 to 06:00:00; one bar each for GeForce GTX280, Core2Duo E6600 1 Core, Core2Duo E6600 2 Core, Core2Duo T9600 1 Core, Core2Duo T9600 2 Core]

  • Conclusions and Future Work

    Conclusions
      GPU very fast for problems with specific structure
      Programming: easy to pick up, hard to master

    Future Work
      Multigrid
      Fluid / structure interaction
      Multi-GPU
      Heterogeneous CPU / GPU parallelism
      Adaptive load balancing

    Code available: http://www.mathematik.uni-trier.de/~schmidt/gputop
