MUG - Mid April - June Period
MINERVA USER GROUP MEETING
3 July 2012
Minerva Operational Statistics
Maintenance Stats – Mid-April through June
Number of Planned PMs 6
Number of Cancelled PMs 4
Number of Unplanned Outages 1
Total Time in Period 1,800 Hours
Total Planned Downtime 40 Hours
Total Unplanned Downtime 12 Hours
Total Time Available 97%
Time in Period 14M CPU Hours
Total Compute Time Used 5.3M CPU Hours
Average Utilization 37%
Accounting Statistics
Number of Users 91
Number of Completed Jobs 313,000
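The "Total Time Available" figure can be reproduced from the maintenance numbers above. The sketch below assumes availability is defined as the share of the period not lost to planned or unplanned downtime; the slide does not state the formula, and the variable and function names are illustrative.

```python
# Figures copied from the maintenance statistics; names are illustrative.
total_hours = 1800        # Total Time in Period
planned_down = 40         # Total Planned Downtime
unplanned_down = 12       # Total Unplanned Downtime


def availability(total, planned, unplanned):
    """Fraction of the period during which the system was up (assumed definition)."""
    return (total - planned - unplanned) / total


availability_pct = 100 * availability(total_hours, planned_down, unplanned_down)
print(f"Time available: {availability_pct:.1f}%")
```

Under that assumption the result rounds to the 97% reported on the slide.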
Minerva Operational Statistics
Mid-April
Number of Completed Jobs 17,726
Time in Period ~2.5M CPU Hours
Total Compute Time Used 160K CPU Hours
Average Utilization 6.4%
May
Number of Completed Jobs 69,677
Time in Period 5.3M CPU Hours
Total Compute Time Used 1.75M CPU Hours
Average Utilization 33%
June
Number of Completed Jobs 230,795
Time in Period 5.3M CPU Hours
Total Compute Time Used 3.3M CPU Hours
Average Utilization 62%
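The monthly utilization figures appear to be compute time used divided by CPU hours in the period. A minimal sketch under that assumption follows; the dictionary layout and function name are illustrative, and the mid-April capacity is the approximate "~2.5M" value from the slide.

```python
# (CPU hours used, CPU hours in period) per the slide.
# The mid-April capacity is approximate (~2.5M on the slide).
periods = {
    "Mid-April": (160_000, 2_500_000),
    "May": (1_750_000, 5_300_000),
    "June": (3_300_000, 5_300_000),
}


def utilization(used_cpu_hours, cpu_hours_in_period):
    """Share of the available CPU hours that jobs actually consumed (assumed definition)."""
    return used_cpu_hours / cpu_hours_in_period


results = {name: 100 * utilization(used, cap) for name, (used, cap) in periods.items()}
for name, pct in results.items():
    print(f"{name}: {pct:.1f}%")
```

With the rounded inputs shown on the slide, the three results agree with the reported 6.4%, 33%, and 62% after rounding.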
Minerva Usage By User
User               CPU Hours   N Jobs  Avg CPU H/Job  Min Size  Avg Size  Max Size
Mihaly Mezei         816,880  132,035           6.18         1       1.5     2,048
Menachem Fromer      773,833   57,731             13         1      15.3        25
Ernesto Borrero      768,744      274          2,805         1     693.4     4,096
Yacob Gomez          688,090    2,107            326         1      92.1     1,024
Vladimir Makarov     397,200   24,709             16         1       7.1        64
Ana Negri            266,359      277            961        32     175.6       256
Michael Linderman    196,460   11,670             17         1      17.2        64
Hardik Shah          190,146      470            404        16      47.1        64
Elena Parkhomenko    168,407      946            178         1       3.9         4
Bojan Losic          142,900    3,643             39         1      26.1        64
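The "Avg CPU H / Job" column looks like CPU hours divided by job count. The sketch below checks that reading against three rows of the table; the function name and row selection are illustrative, and the definition is inferred from the numbers rather than stated on the slide.

```python
# (user, CPU hours, number of jobs) copied from three rows of the usage table.
rows = [
    ("Mihaly Mezei", 816_880, 132_035),
    ("Ernesto Borrero", 768_744, 274),
    ("Yacob Gomez", 688_090, 2_107),
]


def cpu_hours_per_job(cpu_hours, n_jobs):
    """Average CPU hours consumed per job (assumed definition of the column)."""
    return cpu_hours / n_jobs


per_job = {user: cpu_hours_per_job(hours, jobs) for user, hours, jobs in rows}
for user, value in per_job.items():
    print(f"{user}: {value:,.2f} CPU hours per job")
```

The computed values fall within rounding or truncation of the table entries (about 6.2, 2,805, and 326), which supports the assumed definition.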
Minerva Hours By User
Mihaly Mezei; 816,880
Menachem Fromer; 773,834
Ernesto Borrero; 768,744
Yacob Gomez; 688,091
Vladimir Makarov; 397,201
Ana Negri; 266,359
Michael Linderman; 196,461
Hardik Shah; 190,146
Elena Parkhomenko; 168,407
Bojan Losic; 142,901
Zachary Giles; 129,745
Hyung min Cho; 110,500
Ariella Cohain; 110,210
Harm Van bakel; 110,172
Roberto Sanchez; 87,468
Sonali Arora; 86,950
Dalila Pinto; 82,309
Other; 213,183
Remaining Users CPU Hours
User      Core Hours  N Jobs    User      Core Hours  N Jobs
gilesz01     129,744  45,250    changr04      13,385   1,024
choh07       110,499     497    pendlm02       8,081      49
cohaia01     110,210  13,758    bongeg01       6,446      83
vanbah01     110,172   1,264    kouy01         5,094      66
sanchr05      87,468     280    yoos01         3,911      56
aroras03      86,949   2,726    jabado01       2,540   4,545
pintod02      82,309   4,079    ruderd02       1,500      72
zhuj05        35,151     380    bashia02       1,404      30
fludee01      27,196      42    purces04       1,401      70
osmanr01      24,737      29    holeha01         949       5
johnsj12      21,694      22    yangy10          405     313
gargp01       21,618   6,852    brandt02         341      11
schade01      18,654     190    ludtka01         187       6
goldba06      18,464   1,064    provad01          14      12
caig01 2
Minerva Usage By Group
P.I.                CPU Hours   N Jobs  Avg CPU H/Job  Min Size  Avg Size  Max Size
Marta Filizola      1,056,812      586          1,803         1       409     4,096
Mihaly Mezei          816,876  132,031           6.18         1       1.5     2,048
Pamela Sklar          776,361   62,164             12         1        14        64
Ivan Ubarrechena      688,090    2,107            326         1        92     1,024
Joseph Buxbaum        584,072   25,845             22         1         7        64
HPC Staff             267,440   46,127           5.79                        17,680
Rui Chang             210,545   17,548             12         1         6        64
Michael Linderman     197,597   11,751           16.8              1617.2        64
Milind Mahajan        189,941      502            378         1      47.9         4
Bojan Losic           142,900    3,643             39         1      26.1        64
Minerva Hours by Group
Marta Filizola; 1,056,812
Mihaly Mezei; 816,876
Pamela Sklar; 776,362
Ivan Ubarrechena; 688,091
Joseph Buxbaum; 584,073
HPC Staff; 267,441
Rui Chang; 210,546
Michael Linderman; 197,598
Milind Mahajan; 189,942
Bojan Losic; 142,901
Harm Van bakel; 110,172
Roberto Sanchez; 87,468
Dalila Pinto; 82,309
Jun Zhu; 39,062
Minerva Job Mix
[Figure: CPU Hours (0 to 900,000) by Job Size, broken down by Wall Time]
Minerva Utilization
Mid-April - June
Minerva Utilization
May - June
Minerva Scratch Usage
/projects
Group               Size
Joseph Buxbaum      134T
Dalila Pinto         27T
Genomic Core         22T
Shaun Purcell        12T
Next Gen Seq        3.9T
Genomic Core II     2.0T
Jun Zhu             1.8T
Harm Van bakel      907G
Milind Mahajan      126G
Zhidong Tu           75G
/scratch
User                Size
Ernesto Borrero      12T
Ana Negri           4.5T
Bojan Losic         4.1T
Hyung min Cho       1.3T
Temp Folders        1.3T
Harm Van bakel      1.2T
Yan Kou             979G
Mihaly Mezei        941G
Yacob Gomez         927G
Zhidong Tu          755G
Other Plans/Projects
• Archival Storage
  – Ordered: Tape Library with 4 Tape transports
    • 350TB tape capacity
  – Anticipated 1 Sep 2012 start of service
• GPGPU
  – Chassis w/2 Fermi-based Tesla cards ordered
  – Target availability date is 1 Aug 2012
• Checkpoint/Restart (BLCR)
  – Partially installed; needs reboot of systems and testing.
• Monthly Training Meetings
  – Third Tuesday of Month
  – Alternate between basic and advanced
Hiccups
Scheduler Failure:
Problem: June tripled the previous job count. The scheduler database table overflowed.
Resolution: We put limits on the number of jobs per user in Torque and Moab.
Long Term: Newer version of Torque and Moab. Move to a SQL database.
Infiniband / MPI Issues:
Problem: Mellanox driver buffer overflowing because of 64-core systems.
Resolution: We built a custom version of the Mellanox driver.
Long Term: Working with Mellanox to add changes to mainline code.
AMD 64-core understanding + performance:
Problem: Misunderstanding of the number of FPUs in a system: 32, not 64. Also, the ACML Library is not tuned for the FFTW Library.
Resolution: Changed scheduling to allow blocks of 32 and job-exclusive nodes.
Long Term: AMD is creating a new ACML library with tuned FFT sizes.
Open Forum
Requested/Suggested Topics
• Bioconductor R site-library
  – Should we put all Bioconductor R packages in one library? (module load bioconductor)
• Epilogue report
  – Report job resource usage to syserr?
• PM Schedule
  – Can we reduce PMs to monthly?
• Fairshare
  – Comments? Feedback?