Femap and NX Nastran
Performance Optimization
Unrestricted © Siemens AG 2016
Siemens PLM Software
A Little History
1970’s Launch Vehicle Analysis
Saturn V Dynamic Loads
400 DOF for “refined model”
Current Computational Expectations
Launch Vehicle Analysis, Dynamic Loads
• 2016 SLS; >>10 Million DOF
• 1 Load Cycle = 20 TB of data to process
Improving NX Nastran Performance
Increased problem size
• 1970 – (300) DOF (large model)
• 2004 – (1.2 million) DOF (large model)
• 2011 – (10 – 20 million) DOF (typical models)
• 2016+ – (30 – 50 million) DOF (expected)
Solutions
• Selecting the right hardware and OS
• Utilizing hardware efficiently - Tuning OS settings
• Defining appropriate NX Nastran keywords and parameters for the solve
• Taking advantage of NX Nastran parallel processing
• Selecting appropriate solution methods to reduce elapsed time
Hardware and OS Selection
• Processors
  • Prefer faster processors
  • Choose processors with a large L2 or L3 cache; larger caches improve performance
  • Prefer multi-core processors
• Memory
  • Install as much memory as possible. Unallocated memory will be used by the OS for I/O cache.
• Disk
  • Increase disk performance by using SSDs. Faster I/O leads to reduced elapsed time.
  • PCIe SSDs are a newer option and outperform SATA- or SCSI-hosted SSDs
  • Prefer multiple disks (1 + 4): one for the OS and the remaining disks in a RAID0 configuration for Nastran scratch
Hardware and OS Selection
• GPU and Intel MIC
  • GPU processing requires an expensive ($3000) high-end card (FirePro W9100 with 16 GB)
  • The GPU card requires enough memory to hold Nastran module data in core
  • GPU processing only helps for special problems (frequency response with 5000+ modes)
  • The technology is changing rapidly
Hardware Selection
• Priorities for getting the most performance for the least money
• Maximum number of fast cores with large cache
• Add as much RAM as possible
• Maximize I/O bandwidth and disk speed using multiple disks
• Add GPU processing for some large dynamics problems
OS Settings
• Prefer an Operating System that does efficient I/O operations
• On Windows OS, exclude a list of files/folders from active monitoring
• I/O cache: don’t write to disk, use RAM
  • Application I/O cache (NX Nastran: smem and buffpool)
  • OS I/O cache
  • Device driver I/O cache
• Cache performance depends on the hardware and on the operating system
• With efficient disks and OS cache, the performance gain from the NX Nastran I/O cache (smem, buffpool) is expected to be marginal
I/O Cache and Paging - Windows
• Reasons for an unresponsive system
  • As file size becomes larger than system memory, the OS runs out of memory
  • The OS cache manager will page out the least recently used memory
  • The Windows default I/O cache limit is 1 TB
  • Pages from Nastran can be paged out to accommodate the I/O cache
• Prevention
  • Limit the Windows I/O cache to 25% to 50% of physical memory using “cache_tool” (available on request)
  • Turn off the file cache by adding the command line option “sysfield=buffio=yes,raw=yes”
[Figure: total physical memory divided among O/S, Other, NX Nastran, and I/O Cache]
NX Nastran Settings:
Use the default LP-64 version until Nastran says otherwise
• How do you know?
  • Fatal message in the F06 file
  • Inspect the F04 file
• 16 GB of RAM is the minimum to take advantage of ILP-64
There are two 64-bit versions of NX Nastran:
LP-64
• 4-byte words
• 8 GB RAM limit
• Default version when running through FEMAP
ILP-64
• 8-byte words
• 20 TB RAM limit, which in practice is the hardware RAM limit
• Optional version
NX Nastran Settings: Memory
• Starting with NX Nastran 10, the rcf file has new default settings:
  • buffsize=32769
  • memory=.45*physical
  • smem=20.0X
  • buffpool=20.0X
• These are robust settings that are more appropriate for large models and machines with more memory
• Inspect the F04 file to see if you have optimum settings for your model
Note: SMEM is a “ramdisk”; if it is large enough for all scratch requirements, there is essentially no I/O to disk. Check the F04 file summary to see the details for each run.
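To get a feel for what memory=.45*physical means on a given machine, the following minimal Python sketch computes the implied allocation and prints the default keywords listed on this slide. The helper names (`default_memory_gb`, `render_keywords`) are hypothetical, and the one-keyword-per-line `keyword=value` layout is an assumption about how these settings would appear in an rcf file; check your installation's rcf file for the actual format.

```python
# Keyword values copied from the slide; nothing here is read from a real rcf file.
NXN10_DEFAULTS = {
    "buffsize": "32769",
    "memory": ".45*physical",
    "smem": "20.0X",
    "buffpool": "20.0X",
}


def default_memory_gb(physical_gb: float, fraction: float = 0.45) -> float:
    """Return the memory implied by memory=.45*physical for a machine
    with physical_gb of installed RAM."""
    if physical_gb <= 0:
        raise ValueError("physical_gb must be positive")
    return fraction * physical_gb


def render_keywords(keywords: dict) -> str:
    """Render keywords one per line as keyword=value (assumed layout)."""
    return "\n".join(f"{key}={value}" for key, value in keywords.items())


if __name__ == "__main__":
    print(render_keywords(NXN10_DEFAULTS))
    for ram_gb in (16, 64, 128):
        print(f"{ram_gb} GB RAM -> about {default_memory_gb(ram_gb):.1f} GB for NX Nastran")
```

The remaining physical memory stays available to the OS, which uses it for I/O cache.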
NX Nastran: Memory Management
The F04 file reports the allocation details:
** MASTER DIRECTORIES ARE LOADED IN MEMORY.
USER OPENCORE (HICORE) = 804910800 WORDS
EXECUTIVE SYSTEM WORK AREA = 316925 WORDS
MASTER(RAM) = 78676 WORDS
SCRATCH(MEM) AREA = 268443648 WORDS ( 8192 BUFFERS)
BUFFER POOL AREA (GINO/EXEC) = 268427231 WORDS ( 8189 BUFFERS)
TOTAL NX NASTRAN MEMORY LIMIT = 1342177280 WORDS
[Figure: memory layout. Regions: Scratch (RAM), Master (RAM), Buffer Pool Area, User Open Core, Executive System Work Area. Brace labels: Memory (from “mem” keyword); Memory for File and Executive Tables]
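The F04 memory summary is reported in words, so its size in bytes depends on the word size (4 bytes for LP-64, 8 bytes for ILP-64). The sketch below, with hypothetical helper names, parses the summary lines shown on this slide and converts them to GiB. It assumes the `NAME = N WORDS` layout of this excerpt; other NX Nastran versions may format the summary differently.

```python
import re

# Memory summary copied from the F04 excerpt on this slide.
F04_SUMMARY = """\
USER OPENCORE (HICORE) = 804910800 WORDS
EXECUTIVE SYSTEM WORK AREA = 316925 WORDS
MASTER(RAM) = 78676 WORDS
SCRATCH(MEM) AREA = 268443648 WORDS ( 8192 BUFFERS)
BUFFER POOL AREA (GINO/EXEC) = 268427231 WORDS ( 8189 BUFFERS)
TOTAL NX NASTRAN MEMORY LIMIT = 1342177280 WORDS
"""

# One "NAME = <integer> WORDS" entry per line.
LINE_RE = re.compile(r"^\s*(?P<name>.+?)\s*=\s*(?P<words>\d+)\s+WORDS", re.MULTILINE)


def parse_memory_summary(text: str) -> dict:
    """Map each region name to its size in words."""
    return {m["name"].strip(): int(m["words"]) for m in LINE_RE.finditer(text)}


def words_to_gib(words: int, bytes_per_word: int) -> float:
    """Convert words to GiB; use 4 bytes for LP-64 and 8 bytes for ILP-64."""
    return words * bytes_per_word / 2**30


if __name__ == "__main__":
    regions = parse_memory_summary(F04_SUMMARY)
    for name, words in regions.items():
        print(f"{name:32s} {words_to_gib(words, 4):8.3f} GiB (LP-64)"
              f" {words_to_gib(words, 8):8.3f} GiB (ILP-64)")
```

In this excerpt the five regions add up exactly to the total memory limit, which is a quick consistency check when reading an F04 file.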
NX Nastran Settings: Memory Guidelines
*** USER INFORMATION MESSAGE 4157 (DFMSYN)
PARAMETERS FOR SPARSE DECOMPOSITION OF DATA BLOCK KLL ( TYPE=RDP ) FOLLOW
MATRIX SIZE = 70345 ROWS NUMBER OF NONZEROES = 2701957 TERMS
NUMBER OF ZERO COLUMNS = 0 NUMBER OF ZERO DIAGONAL TERMS = 0
CPU TIME ESTIMATE = 78216 SEC I/O TIME ESTIMATE = 25 SEC
MINIMUM MEMORY REQUIREMENT = 1364 K WORDS MEMORY AVAILABLE = 32615 K WORDS
MEMORY REQR'D TO AVOID SPILL = 12305 K WORDS MEMORY USED BY BEND = 3651 K WORDS
EST. INTEGER WORDS IN FACTOR = 87006 K WORDS EST. NONZERO TERMS = 174758 K TERMS
Word Size = 8 bytes (ILP-64 – long integers)
Word Size = 4 bytes (LP-64 – short integers)
• Specify enough memory to avoid disk spillover
  • At least 1.2 to 1.3 times the memory required to avoid spill
• Do not give NX Nastran more than 50% of physical memory. This leaves the OS more room for I/O cache (unless SMEM can hold all of scratch)
• Insufficient memory can affect the re-ordering method, leading to very slow matrix decomposition. Make sure either the BEND or METIS method is selected
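The 1.2 to 1.3 guideline can be checked directly against the USER INFORMATION MESSAGE 4157 block. The following sketch, using hypothetical helper names, pulls the two relevant values out of the excerpt above and compares them. It works in the same "K WORDS" units the message uses, so no assumption about the size of a K word is needed.

```python
import re

# Two lines copied from the UIM 4157 excerpt on this slide.
UIM_4157 = """\
MINIMUM MEMORY REQUIREMENT = 1364 K WORDS MEMORY AVAILABLE = 32615 K WORDS
MEMORY REQR'D TO AVOID SPILL = 12305 K WORDS MEMORY USED BY BEND = 3651 K WORDS
"""


def field_kwords(text: str, label: str) -> int:
    """Return the integer K WORDS value that follows 'label ='."""
    match = re.search(re.escape(label) + r"\s*=\s*(\d+)\s+K WORDS", text)
    if match is None:
        raise ValueError(f"{label!r} not found")
    return int(match.group(1))


def spill_report(text: str, low: float = 1.2, high: float = 1.3) -> dict:
    """Compare available memory with the 1.2x to 1.3x no-spill guideline."""
    available = field_kwords(text, "MEMORY AVAILABLE")
    spill = field_kwords(text, "MEMORY REQR'D TO AVOID SPILL")
    return {
        "available_kwords": available,
        "spill_kwords": spill,
        "target_low_kwords": low * spill,
        "target_high_kwords": high * spill,
        "meets_guideline": available >= high * spill,
    }


if __name__ == "__main__":
    for key, value in spill_report(UIM_4157).items():
        print(f"{key}: {value}")
```

For this excerpt the available memory is well above 1.3 times the no-spill requirement, so the decomposition is not expected to spill to disk.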
NX Nastran Settings: Memory Guidelines
*** USER INFORMATION MESSAGE 4157 (DFMSYN)
PARAMETERS FOR SPARSE DECOMPOSITION OF DATA BLOCK KXX ( TYPE=RDP ) FOLLOW
MATRIX SIZE = 396090 ROWS NUMBER OF NONZEROES = 13353788 TERMS
NUMBER OF ZERO COLUMNS = 0 NUMBER OF ZERO DIAGONAL TERMS = 0
CPU TIME ESTIMATE = 388 SEC I/O TIME ESTIMATE = 0 SEC
MINIMUM MEMORY REQUIREMENT = 6045 K WORDS MEMORY AVAILABLE = 784888 K WORDS
MEMORY REQR'D TO AVOID SPILL = 28981 K WORDS MEMORY USED BY BEND = 13951 K WORDS
EST. INTEGER WORDS IN FACTOR = 94086 K WORDS EST. NONZERO TERMS = 195026 K TERMS
ESTIMATED MAXIMUM FRONT SIZE = 2280 TERMS RANK OF UPDATE = 32
*** TOTAL MEMORY AND DISK USAGE STATISTICS ***
+---------- SPARSE SOLUTION MODULES -----------+ +------------- MAXIMUM DISK USAGE -------------+
HIWATER SUB_DMAP DMAP HIWATER SUB_DMAP DMAP
(WORDS) DAY_TIME NAME MODULE (MB) DAY_TIME NAME MODULE
30524546 14:20:46 XREAD 251 READ 3292.938 14:21:48 XREAD 251 READ
Compare to the HIWATER usage toward the end of the f04 file:
** MASTER DIRECTORIES ARE LOADED IN MEMORY.
USER OPENCORE (HICORE) = 784899390 WORDS
EXECUTIVE SYSTEM WORK AREA = 218621 WORDS
MASTER(RAM) = 76340 WORDS
SCRATCH(MEM) AREA = 819300 WORDS ( 100 BUFFERS)
BUFFER POOL AREA (GINO/EXEC) = 418353 WORDS ( 51 BUFFERS)
TOTAL NX NASTRAN MEMORY LIMIT = 786432004 WORDS
NX Nastran Settings: Memory …
[Figure: three cases compared]
• Memory Available > Memory Required to Avoid Spill
• Memory Available < Memory Required to Avoid Spill
• Memory Available >> Memory Required to Avoid Spill
NX Nastran Settings: Memory
• Even when memory is sufficient for matrix decomposition, other modules such as MPYAD might make multiple passes when memory is insufficient. Multiple passes translate to more I/O
12:09:45 143:59 5182.9G 0.0 17602.1 0.0 DISPRS 293 SMPYAD BEGN
METHOD 1 NT, STORAGE 2, NBR PASSES= 4, EST. CPU= 409.3, I/O= 82.3, TOTAL= 491.6
12:09:45 143:59 5182.9G 4.0 17602.1 0.0 MPYAD BGN P=4
12:12:13 146:27 5206.2G 23821.0 17817.4 215.3 MPYAD PASS= 1
12:14:43 148:57 5228.8G 23199.0 18031.6 214.2 MPYAD PASS= 2
12:17:13 151:27 5251.5G 23190.0 18246.0 214.4 MPYAD PASS= 3
12:19:43 153:57 5274.1G 93414.0 18460.5 858.4 MPYAD END
Number of passes (NBR PASSES= 4 in the excerpt above): increase memory to eliminate the extra passes
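Multi-pass operations are easy to miss in a long F04 file. The sketch below, with a hypothetical helper name, scans text for the "NBR PASSES=" field shown in the excerpt above and reports any line where more than one pass was used. It assumes the field is always written in that form.

```python
import re

# Lines copied from the F04 excerpt on this slide.
F04_LINES = """\
12:09:45 143:59 5182.9G 0.0 17602.1 0.0 DISPRS 293 SMPYAD BEGN
METHOD 1 NT, STORAGE 2, NBR PASSES= 4, EST. CPU= 409.3, I/O= 82.3, TOTAL= 491.6
12:09:45 143:59 5182.9G 4.0 17602.1 0.0 MPYAD BGN P=4
12:19:43 153:57 5274.1G 93414.0 18460.5 858.4 MPYAD END
"""

PASS_RE = re.compile(r"NBR PASSES=\s*(\d+)")


def multi_pass_lines(text: str) -> list:
    """Return (line_number, passes) for every line reporting more than one pass."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = PASS_RE.search(line)
        if match and int(match.group(1)) > 1:
            hits.append((lineno, int(match.group(1))))
    return hits


if __name__ == "__main__":
    for lineno, passes in multi_pass_lines(F04_LINES):
        print(f"line {lineno}: {passes} passes, consider increasing memory")
```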
Settings: Scratch Directory
It is important to specify the scratch file folder for both Femap and Nastran
• The scratch folder should point to a fast disk or to disks configured in a RAID array (RAID0)
• Prefer local disks over network-mounted disks
• A scratch folder on a generic network file system (NFS) carries a significant performance penalty because I/O goes over a slow, general-purpose shared network
• For Nastran, set the “sdir” keyword in the rcf file
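Before pointing “sdir” at a folder, it can help to confirm that the folder exists, is writable, and has enough free space for the run. The following is a minimal sketch using only the Python standard library; `scratch_report` and the `needed_gb` estimate are hypothetical, and the check cannot tell a local disk from a network mount, which still has to be verified by hand.

```python
import os
import shutil
import tempfile


def scratch_report(path: str, needed_gb: float) -> dict:
    """Report whether 'path' looks usable as a scratch folder.

    needed_gb is the caller's own estimate of scratch usage, for example
    taken from the maximum disk usage reported in an earlier F04 file.
    """
    if not os.path.isdir(path):
        raise NotADirectoryError(path)
    free_gb = shutil.disk_usage(path).free / 1e9
    return {
        "path": os.path.abspath(path),
        "writable": os.access(path, os.W_OK),
        "free_gb": free_gb,
        "enough_space": free_gb >= needed_gb,
    }


if __name__ == "__main__":
    # The system temp folder is used here only so the example runs anywhere.
    print(scratch_report(tempfile.gettempdir(), needed_gb=10.0))
```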
Settings: Scratch Directory
In Femap, set the scratch folders under File/Preferences:
• Use the Interfaces tab to set the Nastran scratch
• Use the Database tab to set the Femap scratch
NX Nastran: Parallel Processing
Types of Parallelism:
Shared memory (SMP)
• Enabled with standard installation
• No extra licensing required
Distributed memory (DMP)
• Extra installation steps required; admin privilege
• DMP License Required
                  SMP                                 DMP
Hardware          Desktop                             Desktop/Cluster
Operation level   Low-level operations are threaded   Higher level; matrix partitioned at a higher level
Software          OpenMP and Intel MKL                Message Passing Interface (MPI)
Scalability       Tapers off at 8 to 12 processors    Highly scalable
NX Nastran SMP
• Easy to use
  • Specify smp=n or parallel=n in the nastran command line
  • Femap Executive and Solution Options
• Available on all NX Nastran supported platforms
• Available in all solution types
• Modules parallelized
  • Matrix decomposition (DCMP)
  • Multiply Add (MPYAD)
  • Forward-Backward Substitution (FBS)
  • Frequency response (FRRD1)
  • Driver module for Sol 401 (NLTRD3)
  • Other modules that indirectly call DCMP, MPYAD, FBS
NX Nastran DMP
• Available in Sol 101, Sol 103, Sol 105, Sol 108, Sol 111, Sol 112, and Sol 200
• Partitioning by geometry, frequency, or loads
• Partitioning the problem appropriately is critical to maximizing performance
• Available on Linux x86_64 and on Windows
• Requires an experienced user familiar with the hardware resources to realize the potential benefits
NX Nastran Linear Contact Solutions
[Figure: search distances of 2.0 mm, 1.0 mm, and 0.5 mm]
• Select the element iterative solver
  • When 3D elements are > 90% of the total number of elements
  • When the solution is linear statics
• Specify a proper search distance. Large search distances typically involve more active contacts for the first few iterations
• Adjust the global contact parameters MAXF and/or CTOL to reduce the number of iterations
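The solver-selection rule of thumb can be written as a small check. This is a sketch only: the function name is hypothetical, and it assumes both conditions from the slide must hold together (more than 90% 3D elements and a linear statics solution), which the slide does not state explicitly.

```python
def prefer_element_iterative(n_solid: int, n_total: int,
                             is_linear_statics: bool,
                             solid_fraction: float = 0.90) -> bool:
    """Return True when the slide's guideline suggests the element iterative solver.

    Assumption: both conditions are required, i.e. 3D (solid) elements make up
    more than 90% of all elements AND the solution is linear statics.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_solid <= n_total:
        raise ValueError("n_solid must be between 0 and n_total")
    return is_linear_statics and (n_solid / n_total) > solid_fraction


if __name__ == "__main__":
    # Hypothetical element counts, for illustration only.
    print(prefer_element_iterative(950_000, 1_000_000, is_linear_statics=True))
    print(prefer_element_iterative(600_000, 1_000_000, is_linear_statics=True))
```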
NX Nastran Linear Contact Solutions
[Figure: number of contact status changes (0 to 350,000) versus iterations (1 to 10) for search distances of 2 mm, 1 mm, and 0.5 mm]
NX Nastran Modal Solution
Use RDMODES (Recursive modes). Partitions the model into “nrec” partitions
• No big triangular solves
• No orthogonalization
• Reduced I/O
• Approximate solution
• Used when a large number of modes are to be computed
• Can be used with SMP, DMP, or in hybrid mode
Use system cell 462=1
• When a large amount of memory is available
• Frequency response runs in-core
RDMODES Performance
[Figure: elapsed time (mins, 0 to 600) versus number of processors (1, 2, 4, 8) for SMP and DMP]
Hardware
Processor          Intel Xeon 5690 (3.47 GHz)
L1, L2, L3 cache   32 KB, 256 KB, 12 MB
Cores              6 per socket, 2 sockets
Memory             96 GB
Disks              6 x 585 GB disks in RAID0
Engine Block Model
DOF                21945096
CTETRA             2233552
Femap Performance
Preferences/Database
• New for FEMAP 11.0: performs the “cleanup” portion of “File, Rebuild” when the model is saved, so you don’t have to do it manually to make model files smaller after deleting results
• FEMAP’s memory cache: it is usually best to leave this setting alone, although a lower value is sometimes better
• Critical for maximum performance: set to a number higher than your highest node or element ID
• Makes File Open/Save significantly faster. Use the “Read/Write Test” button to determine the proper setting for each machine
Femap Performance
Graphics Options that Affect Graphics Performance
Performance Graphics – Introduced in FEMAP v11.1,
Performance Graphics can significantly improve graphics
performance on graphics cards supporting OpenGL 4.2+.
Recommended cards are Nvidia Quadro and AMD
FirePro
Vertex Buffer Objects – Only turn this option on if you
have a graphics card which supports OpenGL 2.0 or
above. Using VBOs can greatly improve performance of
dynamic rotation
Max VBO MB – FEMAP will determine how much RAM is
available on the graphics card, then allow you to choose
how much you want to allow FEMAP to use. Typically,
half of available is a safe value, but using more may
improve performance without causing any issues.
Min VBO MB – By default, this value is set to 1024. This
value should work for a large majority of models. That
said, increasing or decreasing the value may benefit
certain graphics cards and/or models
Miscellaneous
More Info
Siemens PLM FEMAP Community - Official Site
http://community.plm.automation.siemens.com/t5/Femap/ct-p/Femap
Q and A