Cray IO COE
Performance of MPIIO on DVS+GPFS
Yushu Yao
In collaboration with: Mike Aamodt, Katie Antypas, Tina Butler, Mark Cruciani, Jason Hick, David Knaak, Rei Lee, Rose Olson, Mike Welcome
1
Wednesday, July 25, 12
WHY
[Diagram: NERSC file-system layout. Scratch (35 GB/s) is served by OSS servers backed by LSI 7900 arrays. The PROJECT and GSCRATCH file systems sit on DDN arrays and are exported to Hopper's compute nodes (CMP) through DVS and PNSD; they are also mounted on the Carver and Euclid data-analysis/visualization systems.]
Reason 1. Users Love Global File Systems
FAST MPIIO!!!
[Chart: IOR performance (Jan 2012), 20 nodes with 4 PE/node, bandwidth in GB/s for read and write, comparing Carver FilePerProc, Hopper FilePerProc, Carver MPIIO, and Hopper MPIIO. File-per-process and Carver MPIIO reach roughly 7-12 GB/s, while Hopper MPIIO manages only 0.8 GB/s read and 1.8 GB/s write.]
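For reference, the two IOR modes being compared might be launched roughly as follows. This is a sketch, not the exact command from the study: the block size, transfer size, and file paths are illustrative, while the flag names are standard IOR options and aprun is Hopper's launcher.

```shell
# File-per-process (-F): each of the 80 tasks (20 nodes x 4 PE/node)
# writes and reads its own file through POSIX I/O.
aprun -n 80 -N 4 ./ior -a POSIX -F -w -r -b 1g -t 4m -o $SCRATCH/ior_fpp

# Shared-file MPIIO: all 80 tasks write and read one file through
# the MPI-IO layer (the path that was slow over DVS at baseline).
aprun -n 80 -N 4 ./ior -a MPIIO -w -r -b 1g -t 4m -o $SCRATCH/ior_shared
```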
Reason 2. Well, DVS+MPIIO Was Super SLOW
Main Difficulty
• For users: setting the right parameters
• For DVS/MPIIO developers: not knowing what GPFS is doing underneath
Method
A feedback loop: Yushu runs IOR jobs; a board of Cray IO experts reviews the results and suggests changes; the job setup or the MPIIO library is changed; repeat.
• Experts can quickly point out setup problems
• Feedback lets developers quickly implement library changes
[Chart: Progress over time. Hopper MPIIO read/write bandwidth (GB/s) for runs at Baseline (Jan), Jan 26, Feb 8, Feb 10, Feb 22, and Mar 7. Annotated fixes along the way: DVS_MAXNODES raised from 1 to 16 and then set to 14; a wrong DVS block size corrected to match the 4 MB stripe; a wrong cb_nodes count fixed; collective buffering turned off; and new, correct MPIIO default settings. Bandwidth rose from 0.8 GB/s read / 1.8 GB/s write at baseline to 7.8 GB/s read / 5 GB/s write, the best so far; the File/Process reference is about 11 GB/s read / 12 GB/s write.]
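One of the fixes above was matching the DVS block size to the MPIIO stripe. As a quick arithmetic check (nothing here is Cray-specific), the 4 MB value used in the tuned settings is exactly 4 MiB, and a 24 GB-per-node transfer divides into whole blocks:

```python
# DVS_BLOCKSIZE and the MPIIO striping_unit hint use the same value,
# so MPI-IO stripes line up with DVS transfer blocks.
DVS_BLOCKSIZE = 4 * 1024 * 1024          # 4 MiB
assert DVS_BLOCKSIZE == 4194304          # the value set in the hints

# In the best-performance runs each node reads/writes 24 GB;
# that is a whole number of 4 MiB blocks, so accesses stay aligned.
blocks_per_node = (24 * 1024**3) // DVS_BLOCKSIZE
print(blocks_per_node)  # 6144
```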
Best Performance
• 24 PE/node, each node reads/writes 24 GB; DVS_MAXNODES=14; custom MPIIO hints:
  DVS_MAXNODES=14
  DVS_BLOCKSIZE=4194304
  IOR_HINT__MPI__romio_cb_read=disable
  IOR_HINT__MPI__romio_cb_write=enable
  IOR_HINT__MPI__romio_ds_read=disable
  IOR_HINT__MPI__romio_ds_write=disable
  IOR_HINT__MPI__striping_unit=4194304
  IOR_HINT__MPI__cb_nodes=14
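IOR does not take MPI-IO hints on its command line; it reads them from environment variables named IOR_HINT__MPI__<hint>, which is where the variable names above come from. A small sketch that generates those export lines from plain hint names (just string formatting, shown to make the naming convention explicit):

```python
# The tuned MPI-IO hints from this slide, keyed by plain hint name.
hints = {
    "romio_cb_read":  "disable",
    "romio_cb_write": "enable",
    "romio_ds_read":  "disable",
    "romio_ds_write": "disable",
    "striping_unit":  "4194304",  # 4 MiB, matches DVS_BLOCKSIZE
    "cb_nodes":       "14",       # matches DVS_MAXNODES
}

# IOR picks each hint up from an IOR_HINT__MPI__<name> variable.
for name, value in hints.items():
    print(f"export IOR_HINT__MPI__{name}={value}")
```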
[Chart: best-case bandwidth (GB/s) for read and write, comparing FilePerProc, Carver MPIIO, and Hopper MPIIO. With the tuned settings, Hopper MPIIO reaches 7.8 GB/s read and 5 GB/s write, against roughly 11-12 GB/s for file-per-process.]
Still, this is too complicated for a user to set by hand; a naive user is almost certain to guess them wrong.
Best Solution: Setting Defaults for All Users
• For all users we set default environment variables: MPICH_MPIIO_DVS_MAXNODES=14 and DVS_BLOCKSIZE=4194304
• Non-intrusive: this does not affect anything else
• Work-less: a user doesn't need to set any MPIIO hints to get (relatively) good performance
• Will be on Hopper starting with MPT 5.5.0
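With the defaults in place, a user's job script needs nothing MPIIO-specific. To double-check what the library actually uses, Cray MPT can be asked to print the effective hints at file-open time. A sketch: MPICH_MPIIO_HINTS_DISPLAY is recalled from Cray MPT documentation rather than from these slides, and ./my_app and the task counts are placeholders.

```shell
# Nothing to set by hand: MPICH_MPIIO_DVS_MAXNODES=14 and
# DVS_BLOCKSIZE=4194304 are system defaults from MPT 5.5.0 on.

# Optional sanity check: have MPT print the effective MPI-IO
# hints for every file the application opens.
export MPICH_MPIIO_HINTS_DISPLAY=1
aprun -n 480 -N 24 ./my_app
```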
Conclusion
[Chart: before vs. after, Hopper MPIIO bandwidth (GB/s): read improved from 0.8 to 7.8 GB/s, write from 1.8 to 5 GB/s.]
• 10x performance improvement on read and 3x on write, after changing both the run setup and the MPIIO library
• DEFAULT values are now set for users, so they get the best performance (in most cases) automatically
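The headline speedups follow directly from the before/after bandwidths in the chart:

```python
# Hopper MPIIO bandwidth in GB/s, from the before/after chart.
read_before, read_after = 0.8, 7.8
write_before, write_after = 1.8, 5.0

print(f"read speedup:  {read_after / read_before:.1f}x")   # ~10x
print(f"write speedup: {write_after / write_before:.1f}x")  # ~3x
```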
Next Step
• For DVS/MPIIO developers: it is still unclear what GPFS is doing underneath
• Run IO benchmarks on the DVS nodes to find where the bottlenecks are
• Maybe carried out in a less formal way?