New Features in ML
2004 Trilinos Users Group Meeting
November 2-4, 2004
Jonathan Hu, Ray Tuminaro, Marzio Sala, Michael Gee, Haim Waisman
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy under contract DE-AC04-94AL85000.
Overview
• Multigrid Options
– ParMETIS
– Zoltan
– Repartitioning
• Analysis Tools
– GGB method
– Memory usage
– Visualization
• Documentation
Traditional Coarsening
• Coarsening rate fixed: h/H ≈ 3^n in n-d problem
• What can go wrong?
• AMG complexity goes up: ∑[nnz(A(j))] / nnz(A(1))
• Result: more time per iteration
• In parallel, each coarse grid has latency penalty
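The complexity ratio on this slide can be evaluated directly once the nonzero count of each level's operator is known. The following is a minimal sketch, not ML code: the function name and the sample nonzero counts are hypothetical, and level 1 is assumed to be the finest grid.

```python
def operator_complexity(nnz_per_level):
    """Return sum_j nnz(A(j)) / nnz(A(1)), with level 1 the finest grid.

    nnz_per_level is a list of nonzero counts, finest level first.
    """
    if not nnz_per_level or nnz_per_level[0] <= 0:
        raise ValueError("need a positive nonzero count on the finest level")
    return sum(nnz_per_level) / nnz_per_level[0]


# Hypothetical nonzero counts for a 3-level hierarchy (illustrative only).
example_nnz = [1000, 400, 100]
example_complexity = operator_complexity(example_nnz)
```

A value near 1 means the coarse operators add little work per iteration; a larger value means each V-cycle touches many more nonzeros than the fine matrix alone.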
Aggressive Coarsening
● Idea: use graph partitioner to make larger aggregates
– METIS / ParMETIS
● Coarsening rate: user-determined
● Fewer levels: mitigates coarse grid latency
● Smaller + fewer coarse grids → lower complexity
● Convergence rate could suffer
Configure options: --with-ml_metis, --with-ml_parmetis3x
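To see why a larger coarsening rate gives fewer levels, one can count how many times the unknowns must be divided by the rate before the coarsest grid is small enough. This is only a rough sketch under stated assumptions: the function name, the uniform rate on every level, and the coarse-size threshold are all hypothetical and are not taken from ML.

```python
def estimate_levels(fine_unknowns, coarsening_rate, max_coarse_size):
    """Estimate the number of multigrid levels for a uniform coarsening rate.

    Assumes every level shrinks by the same integer factor (an idealization)
    and that coarsening stops once a level has at most max_coarse_size unknowns.
    """
    if coarsening_rate <= 1:
        raise ValueError("coarsening rate must be an integer greater than 1")
    levels = 1
    unknowns = fine_unknowns
    while unknowns > max_coarse_size:
        unknowns = -(-unknowns // coarsening_rate)  # integer ceiling division
        levels += 1
    return levels


# Rate 27 corresponds to 3^n with n = 3; rate 200 stands in for a
# user-chosen aggressive rate. Both numbers are illustrative.
traditional_levels = estimate_levels(1_000_000, 27, 1000)
aggressive_levels = estimate_levels(1_000_000, 200, 1000)
```

With these illustrative inputs the aggressive rate needs one level fewer, which is the latency saving the slide refers to.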
method     | smoothers      | coarse DOFs | medium DOFs | avg its | avg time
1-level DD | ilu            | -           | -           | 113     | 150
2-L geom   | ilu-gmres/ilu  | 32336       | -           | 24      | 255
3-L AMG    | gs-ilu-superlu | 1292        | 129444      | 31      | 38
3D transient LES (13M DOFs/1K node Cplant)
App: MPSalsa Airport Simulation
Aggressive coarsening
Coarsening with Zoltan
• Main idea
– App provides coordinates on fine level (only)
– Call to Zoltan for coarsening (RCB algorithm)
• ML internally creates coordinates for coarser levels
– Centers of mass
• Status: still in testing phase
Configure option: --with-ml_zoltan
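The step of creating coarse-level coordinates from centers of mass can be illustrated as follows. This is a sketch only: the slides do not show how ML computes these internally, so the unweighted mean used here, the function name, and the sample data are assumptions.

```python
def aggregate_centers(coords, aggregate_of):
    """Return {aggregate id: center of mass} for fine-level points.

    coords[i] is the coordinate tuple of fine node i and aggregate_of[i]
    is the aggregate that node i belongs to. All nodes are weighted
    equally, which is an assumption of this sketch.
    """
    sums = {}
    counts = {}
    for point, agg in zip(coords, aggregate_of):
        if agg not in sums:
            sums[agg] = [0.0] * len(point)
            counts[agg] = 0
        for d, x in enumerate(point):
            sums[agg][d] += x
        counts[agg] += 1
    return {agg: tuple(s / counts[agg] for s in sums[agg]) for agg in sums}


# Hypothetical 2-d fine grid: four nodes in aggregate 0, one in aggregate 1.
fine_coords = [(0, 0), (2, 0), (0, 2), (2, 2), (10, 10)]
fine_aggregates = [0, 0, 0, 0, 1]
coarse_coords = aggregate_centers(fine_coords, fine_aggregates)
```

The resulting coarse coordinates could then be passed to the partitioner again on the next level, so the application only ever supplies fine-level coordinates.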
Repartitioning to Improve Parallel Performance
• Load balances operators in multigrid hierarchy
• Motivation
– App load balancing may be non-optimal for linear solver
– App may take large % of memory (e.g., multiphysics)
  • Linear solver gets remaining memory
  • Result: low parallel efficiency
– Coarsening rate may slow as coarse levels reach few unknowns / proc
• Main idea
– Determine “good” partitioning with ParMETIS
– Construct permutation matrix P based on partitioning
– Apply to multigrid coarse grid operators
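The three steps under "Main idea" can be sketched on a small dense matrix. This is an illustration under assumptions, not ML's implementation: the partition vector is made up (in practice it would come from ParMETIS), the function names are hypothetical, and applying P symmetrically as P A Pᵀ is an assumption about how the permutation is used.

```python
def permutation_from_partition(part):
    """Return the new row ordering implied by a partition vector.

    part[i] is the processor assigned to row i. Rows are grouped by
    processor, keeping their original relative order within each group.
    The returned list holds old row indices in their new order.
    """
    return sorted(range(len(part)), key=lambda i: (part[i], i))


def permute_operator(A, order):
    """Return P A P^T for a dense matrix A given as a list of lists.

    Row k of P selects old row order[k], so entry (k, l) of the result
    is A[order[k]][order[l]].
    """
    return [[A[i][j] for j in order] for i in order]


# Hypothetical 4x4 coarse operator and a made-up 2-processor partition.
A = [
    [4, -1, 0, 0],
    [-1, 4, -1, 0],
    [0, -1, 4, -1],
    [0, 0, -1, 4],
]
part = [1, 0, 1, 0]
order = permutation_from_partition(part)
A_repartitioned = permute_operator(A, order)
```

After the permutation, rows owned by the same processor are contiguous, which is what a row-wise distributed matrix needs.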
[Figure labels: A, P; Proc. 1, Proc. 2, Proc. 3]
Repartitioning applied to Zpinch simulation
processors        | 210        | 450        | 600        | 3600
No repartitioning | X          | X          | X          | X
Repartitioning    | 310 / 492s | 284 / 479s | 257 / 530s | X*
Before repartitioning on Janus…
210+ processor simulations failed
App-supplied linear system already imbalanced
Find modes not captured by MG
[Figure labels: adaptive filter, extra coarse grid, MG, GGB, GMRES \ QMR, Adaptive AMG. Plot legend: GMRES(20) + GGB/ML, GMRES(150) + ML]
Analysis / Profiling Tools
• Aggregate visualization
– Assess aggregate quality
– User provides fine-level coordinates
– CoM used as coordinates on coarser levels
– Stats calculated on avg size, diameters
– Currently using 3rd party package, OpenDX
• Error visualization
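The aggregate statistics mentioned above (average size and diameters) can be computed from fine-level coordinates and an aggregate assignment. This is a sketch, not ML's own routine: the function name is hypothetical, and "diameter" is assumed to mean the largest Euclidean distance between two nodes of the same aggregate.

```python
import math
from itertools import combinations


def aggregate_stats(coords, aggregate_of):
    """Return average aggregate size and the diameter of each aggregate.

    coords[i] is the coordinate tuple of fine node i and aggregate_of[i]
    is its aggregate id. A single-node aggregate has diameter 0.0.
    """
    members = {}
    for point, agg in zip(coords, aggregate_of):
        members.setdefault(agg, []).append(point)
    sizes = [len(points) for points in members.values()]
    diameters = {
        agg: max((math.dist(p, q) for p, q in combinations(points, 2)), default=0.0)
        for agg, points in members.items()
    }
    return {"avg_size": sum(sizes) / len(sizes), "diameters": diameters}


# Hypothetical data: two aggregates of two nodes each.
stats = aggregate_stats([(0, 0), (3, 4), (1, 1), (1, 1)], [0, 0, 1, 1])
```

Very uneven sizes or unusually large diameters are the kind of signal one would look for when assessing aggregate quality.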
Analysis/Profiling Tools (cont’d)
• Matrix performance
– Matrix statistics
– Eigen analysis
– Detailed operator profiling
• Apply & communication time
MultilevelPreconditioner::AnalyzeMatrixCheap()
ML_Operator_Profile()
• Internal memory profiling
– Lightweight
– Highwater mark, largest free block
– Postprocessing for plotting
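The idea of a high-water mark can be shown with a few lines that replay a log of allocations and frees. This is a conceptual sketch only: ML's internal profiler is not shown in these slides, and the function name and the event log format (signed byte counts) are assumptions.

```python
def high_water_mark(deltas):
    """Return the peak of the running total of signed memory deltas.

    Each entry is a byte count: positive for an allocation, negative
    for a free. The running total must never go below zero.
    """
    current = 0
    peak = 0
    for delta in deltas:
        current += delta
        if current < 0:
            raise ValueError("log frees more memory than was allocated")
        peak = max(peak, current)
    return peak


# Hypothetical event log in bytes (illustrative only).
events = [100, 50, -120, 200, -230]
peak_bytes = high_water_mark(events)
```

Recording only the running total and its peak keeps the bookkeeping cheap, which fits the "lightweight" goal; the per-event totals could also be written out for plotting.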
Updated Documentation
• ML User’s Guide, version 3.0
– Configure & build information
– MultilevelPreconditioner() class intro
– Exhaustive options list
• ML Developer’s Guide
– Configuration, building, testing details
– Suggested practices
– Intro to tools on software.sandia.gov
• Updated web pages
– Now built automatically each night
– Incorporates doxygen comments
– http://software.sandia.gov/trilinos/packages/ml