9-Sept-2003 CAS2003, Annecy, France, WFS 1
Distributed Data Management at DKRZ

Wolfgang Sell
Hartmut Fichtel
Deutsches Klimarechenzentrum GmbH
[email protected], [email protected]
9-Sept-2003 CAS2003, Annecy, France, WFS 2
Table of Contents
• DKRZ - a German HPC Center
• HPC System Architecture suited for Earth System Modeling
• The HLRE Implementation at DKRZ
• Implementing IA64/Linux based Distributed Data Management
• Some Results
• Summary
9-Sept-2003 CAS2003, Annecy, France, WFS 3
DKRZ - a German HPC Center
• Mission of DKRZ
• DKRZ and its Organization
• DKRZ Services
• Model and Data Services
9-Sept-2003 CAS2003, Annecy, France, WFS Page 4
Mission of DKRZ

In 1987 DKRZ was founded with the mission to
• provide state-of-the-art supercomputing and data services to the German scientific community to conduct top-of-the-line Earth System and Climate Modelling, and
• provide associated services including high-level visualization.
9-Sept-2003 CAS2003, Annecy, France, WFS Page 5
DKRZ and its Organization (1)

Deutsches KlimaRechenZentrum (DKRZ) = German Climate Computer Center
• organised under private law (GmbH) with 4 shareholders
• investments funded by the federal government, operations funded by the shareholders
• usage: 50 % shareholders, 50 % community
9-Sept-2003 CAS2003, Annecy, France, WFS Page 6
DKRZ and its Organization (2)

DKRZ internal structure:
• 3 departments for
  • systems and networks
  • visualisation and consulting
  • administration
• 20 staff in total
• until a restructuring at the end of 1999, a fourth department supported climate model applications and climate data management
9-Sept-2003 CAS2003, Annecy, France, WFS Page 7
DKRZ Services

Operations center: DKRZ
• technical organization of computational resources (compute, data and network services, infrastructure)
• advanced visualisation
• assistance for parallel architectures (consulting and training)
9-Sept-2003 CAS2003, Annecy, France, WFS Page 8
Model & Data Services

Competence center: Model & Data
• professional handling of community models
• specific scenario runs
• scientific data handling

The Model & Data group is external to DKRZ, administered by the MPI for Meteorology and funded by the BMBF.
9-Sept-2003 CAS2003, Annecy, France, WFS 9
HPC System Architecture suited for Earth System Modeling
• Principal HPC System Configuration
• Links between Different Services
• The Data Problem
9-Sept-2003 CAS2003, Annecy, France, WFS Page 10
Principal HPC System Configuration
9-Sept-2003 CAS2003, Annecy, France, WFS Page 11
Link between Compute Power and Non-Computing Services

Functionality and performance requirements for the data service:
• Transparent Access to Migrated Data (illustrated in the sketch below)
• High Bandwidth for Data Transfer
• Shared Filesystem
• Possibility for Adaptation in Upgrade Steps due to Changes in Usage Profile
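Transparent access is what distinguishes an HSM-managed or shared filesystem from explicit staging. A minimal sketch of what it means for user code, with a hypothetical archive path; the point is that ordinary POSIX I/O works whether the file contents are currently on disk or migrated to tape:

```python
# Minimal sketch: "transparent access to migrated data" from the user's side.
# The path is hypothetical; under HSM, a plain open()/read() works even when
# the file contents have been migrated to tape. The HSM recalls the data and
# the read simply blocks until it is back on disk.

def read_header(path, nbytes=4096):
    """Read the first nbytes of a possibly migrated file."""
    with open(path, "rb") as f:      # may trigger a tape recall under HSM
        return f.read(nbytes)        # blocks until the data is staged in

# header = read_header("/dkrz/archive/run42/output.grb")  # hypothetical path
# Without transparent access, the user must first stage explicitly
# (e.g. via ftp/scp) and then work on the local copy.
```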
9-Sept-2003 CAS2003, Annecy, France, WFS Page 12
Compute Server Power

[Figure: installed compute power (peak) at DKRZ over time, on a logarithmic scale from 0.1 to 10,000 GFlops]
9-Sept-2003 CAS2003, Annecy, France, WFS Page 13
Adaptation Problem for Data Server

[Figure: "Data problem in HPC": data generation rate in TByte/year (axis 0-3,000) versus effective compute power P in GFlops (axis 0-500), for data increase proportional to P (linear), P^(3/4), and P^(2/3)]
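The curves can be read as a simple power-law sketch. The baseline values below are assumptions chosen only to roughly match the plotted axis ranges, not figures from the talk:

```python
# Minimal sketch of the scaling relation plotted above: from a baseline
# (P0, R0), the data generation rate scales as R = R0 * (P / P0)**a with
# exponent a = 1 (linear), 3/4, or 2/3. Baseline values are hypothetical.

P0, R0 = 50.0, 300.0  # assumed baseline: 50 GFlops producing 300 TByte/year

def data_rate(p, a, p0=P0, r0=R0):
    """Data generation rate in TByte/year at effective compute power p [GFlops]."""
    return r0 * (p / p0) ** a

for a in (1.0, 3 / 4, 2 / 3):
    print(f"exponent {a:.2f}: R(500 GFlops) = {data_rate(500.0, a):6.0f} TByte/year")
# exponent 1.00 -> ~3000, exponent 0.75 -> ~1687, exponent 0.67 -> ~1392
```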
9-Sept-2003 CAS2003, Annecy, France, WFS Page 14
Pros of Shared Filesystem Coupling
• High Bandwidth between the Coupled Servers
• Scalability supported by the Operating System
• No Need for Multiple Copies
• Record-Level Access to Data with High Performance
• Minimized Data Transfers
9-Sept-2003 CAS2003, Annecy, France, WFS Page 15
Cons of Shared Filesystem Coupling
• Proprietary Software needed
• Standardisation still missing
• Limited Number of Vendors whose Systems can be connected
9-Sept-2003 CAS2003, Annecy, France, WFS 16
HLRE Implementation at DKRZ

HöchstLeistungsRechnersystem für die Erdsystemforschung (HLRE) = High Performance Computer System for Earth System Research
• Principal HLRE System Configuration
• HLRE Installation Phases
• IA64/Linux based Data Services
• Final HLRE Configuration
9-Sept-2003 CAS2003, Annecy, France, WFS Page 17
Principal HLRE System Configuration
9-Sept-2003 CAS2003, Annecy, France, WFS Page 18
HLRE Phases

                                              Phase 1    Phase 2    Phase 3
Date                                          Feb 2002   4Q 2002    3Q 2003
Nodes                                         8          16         24
CPUs                                          64         128        192
Expected Sustained Performance [GFlops]       ca. 200    ca. 350    ca. 500
Expected Throughput Increase vs. CRAY C916    ca. 40     ca. 75     ca. 100
Main Memory [TBytes]                          0.5        1.0        1.5
Disk Capacity [TBytes]                        ca. 30     ca. 50     ca. 60
Mass Storage Capacity [TBytes]                >720       >1,400     >3,400
9-Sept-2003 CAS2003, Annecy, France, WFS Page 19
DS Phase 1: Basic Structure

• CS performance increase: f = 37, hence F = f^(3/4) = 15 (see the sketch following the diagram); minimal component performance indicated in the diagram
• explicit user access: ftp, scp, ...
• CS disks with local copies, DS disks for cache
• physically distributed DS
• NAS architecture

[Diagram: CS client(s) with 11 TB local disks and other clients attached via Gigabit Ethernet to the physically distributed DS with a 16.5 TB disk cache and an archive of ~PB size; indicated component bandwidths: 45, 150, 180, and 375 MB/s]
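The factor F is the data-service scaling implied by the compute-power increase f under the P^(3/4) data-increase assumption from the data-problem slide. A minimal check of the numbers quoted here and on the phase 2/3 slide:

```python
# Sizing rule used on this and the phase 2/3 slide: when compute-server
# power increases by a factor f, the data service is scaled by F = f**(3/4).

def ds_scale_factor(f):
    """Data-service scaling factor F for a compute-power increase f."""
    return f ** 0.75

for f in (37, 63, 100):        # phase 1, 2, and 3 increases from the slides
    print(f"f = {f:3d} -> F = {ds_scale_factor(f):.1f}")
# f =  37 -> F = 15.0
# f =  63 -> F = 22.4
# f = 100 -> F = 31.6
```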
9-Sept-2003 CAS2003, Annecy, France, WFS Page 20
Adaptation Option for Data Server

[Figure: the same "Data problem in HPC" plot as on page 13: data generation rate in TByte/year versus effective compute power P in GFlops, for data increase proportional to P (linear), P^(3/4), and P^(2/3)]
9-Sept-2003 CAS2003, Annecy, France, WFS Page 21
DS Phases 2, 3: Basic Structure

• CS performance increase: f = 63/100, hence F = f^(3/4) = 22.4/31.6; minimal component performance indicated in the diagram
• implicit user access: local UFS commands
• CS disks with local copies
• shared disks (GFS)
• DS disks for IO buffer cache
• Intel/Linux platforms, homogeneous HW
• technological challenge

[Diagram: CS client(s) with 11 TB local disks, 25/30 TB shared GFS disks attached via Fibre Channel, and the DS with a 16.5 TB cache and ~PB archive, coupled via Gigabit Ethernet; indicated bandwidths: 70/80, 225/270, 270/325, and 560/675 MB/s]
9-Sept-2003 CAS2003, Annecy, France, WFS 22
Implementing IA64/Linux based Distributed Data Management
• Overall Phase 1 Configurations
• Introducing Linux based Distributed HSM
• Introducing Linux based Distributed DBMS
• Final Overall Phase 3 Configuration
9-Sept-2003 CAS2003, Annecy, France, WFS Page 23
Proposed Final Phase 3 Configuration

[Diagram: 24 SX-6 compute nodes on an IXS interconnect, coupled via Fibre Channel (Brocade Silkworm 12000) and Gigabit Ethernet (HS/MS LAN) to the data services: AsAmA 16-way GFS servers running UVDM, an AzusA 16-way GFS server for the post-processing system (UCFM/UDSN), and AsAmA 4-way GFS clients running Oracle (UDSN/UDNL). Storage: GFS disks (Polestar, 0.28 TB x 53 = 14.8 TB), disk caches (Polestar 0.57 TB x 15 = 8.5 TB; DDN 0.69 TB x 12 = 8.3 TB), local FC-RAID disks (0.28 TB x 20 = 5.6 TB), Oracle DB on DDN (2 TB x 4 = 8 TB), and tape drives (9940B x 20, 9840B x 0, 9840C x 5). An Oracle application server (Sun 4-CPU, SQLNET) connects to the Internet. Migration upon market availability of components.]
9-Sept-2003 CAS2003, Annecy, France, WFS 24
Some Results
• Growth of the Data Archive
• Growth of Transfer Rates
• Observed Transfer Rates for HLRE
• FLOPS Rates
9-Sept-2003 CAS2003, Annecy, France, WFS Page 25
DS Archive Capacity [TB]

[Figure: archive capacity 1992-2002 (axis 0-600 TB), split into originals and duplicates]
9-Sept-2003 CAS2003, Annecy, France, WFS Page 26
DS Archive Capacity (2001-2003)

[Figure: archive capacity from Sep 2001 to Jun 2003 (axis 0-1,000 TB), split into originals and duplicates]
9-Sept-2003 CAS2003, Annecy, France, WFS Page 27
DS Transfer Rates [GB/day]

[Figure: daily transfer volume 1992-2002 (axis 0-3,000 GB), split into fetch and store]
9-Sept-2003 CAS2003, Annecy, France, WFS Page 28
DS Transfer Rates (2001-2003)

[Figure: daily transfer volume from Sep 2001 to Jun 2003 (axis 0-4,000 GB), split into fetch and store]
9-Sept-2003 CAS2003, Annecy, France, WFS Page 29
DS Transfer Rates (2001-2003)

[Figure: daily transfer volume from Sep 2001 to Jun 2003 (axis 0-10,000 GB): minimum, average, and maximum]
9-Sept-2003 CAS2003, Annecy, France, WFS Page 30
Observed Transfer Rates for HLRE

Link                                 Single-Stream Rate [MB/s]   Aggregate Rate [MB/s]
CS -> DS via ftp (12.1 SUPER-UX)     13                          100
CS -> DS via ftp (12.2 SUPER-UX)     25                          200
CS -> local disk (12.1 SUPER-UX)     40 - 50                     > 2,000
CS -> GFS disk (13.1 SUPER-UX)       up to 90                    3,900
DS -> GFS disk (Linux)               up to 80                    500 per node
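These aggregate rates matter because of the daily transfer volumes shown on the earlier slides. A rough illustration, assuming a round daily volume of about 3,000 GB (the order of magnitude from the transfer-rate charts, not a per-link measurement):

```python
# Rough illustration: time to move an assumed 3,000 GB/day of archive traffic
# through the measured aggregate rates from the table above.

RATES_MB_S = {                       # aggregate rates from the table
    "ftp (12.1 SUPER-UX)": 100,
    "ftp (12.2 SUPER-UX)": 200,
    "GFS disk (13.1 SUPER-UX)": 3900,
}

VOLUME_GB = 3000.0                   # assumed daily volume

for link, rate in RATES_MB_S.items():
    hours = VOLUME_GB * 1000 / rate / 3600
    print(f"{link:26s}: {hours:4.1f} h per day")
# ftp at 100 MB/s occupies ~8.3 h of the day; GFS coupling cuts it to minutes.
```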
9-Sept-2003 CAS2003, Annecy, France, WFS Page 31
Observed FLOPS Rates for HLRE
• 4-node performance of approx. 100 GFLOPS or more (about 40 % efficiency) for
  • ECHAM (70-75)
  • MOM
  • Radar Reflection on Sea Ice
• 24-node performance for a turbulence code: about 470 GFLOPS (30+ % efficiency)
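A minimal check of how the quoted efficiencies follow from peak performance, assuming the SX-6's nominal peak of 8 GFLOPS per CPU and 8 CPUs per node (64 GFLOPS per node); these peak figures are an assumption here, not stated on the slide:

```python
# Sketch: sustained fraction of peak, assuming 8 GFLOPS/CPU x 8 CPUs/node
# for the SX-6 (assumed, consistent with 192 CPUs on 24 nodes).

PEAK_PER_NODE = 8 * 8.0  # GFLOPS, assumed SX-6 node peak

def efficiency(sustained_gflops, nodes):
    """Sustained fraction of peak for a run on `nodes` SX-6 nodes."""
    return sustained_gflops / (nodes * PEAK_PER_NODE)

print(f"{efficiency(100, 4):.0%}")   # ~39%, i.e. 'about 40 % efficiency'
print(f"{efficiency(470, 24):.0%}")  # ~31%, i.e. '30+ % efficiency'
```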
9-Sept-2003 CAS2003, Annecy, France, WFS 32
SummarySummarySummarySummary
• DKRZ provides Computing Resources for Climate Research in Germany on an competitive international level
• The HLRE System Architecture is suited to cope with a data-intensive Usage Profile
• Shared Filesystems today are operational in Heterogenous System Environments
• Standardisation-Efforts for Shared Filesystems needed
• DKRZ provides Computing Resources for Climate Research in Germany on an competitive international level
• The HLRE System Architecture is suited to cope with a data-intensive Usage Profile
• Shared Filesystems today are operational in Heterogenous System Environments
• Standardisation-Efforts for Shared Filesystems needed
9-Sept-2003 CAS2003, Annecy, France, WFS 33
Thank you for your attention!
9-Sept-2003 CAS2003, Annecy, France, WFS Page 34
Tape Transfer Rates (2001-2003)

[Figure: daily tape transfer volume from Sep 2001 to Jun 2003 (axis 0-8,000 GB), split into repack and client traffic]
9-Sept-2003 CAS2003, Annecy, France, WFS Page 35
DS Transfer Requests (2001-2003)

[Figure: daily transfer requests from Sep 2001 to Jun 2003 (axis 0-40,000), split into fetch and store]
9-Sept-2003 CAS2003, Annecy, France, WFS Page 36
DS Archive Capacity (2001-2003)

[Figure: archive capacity from Sep 2001 to Jun 2003 (axis 0-1,000 TB), broken down by tape media: 9940, 9840, SD3, VHS, 9490]
9-Sept-2003 CAS2003, Annecy, France, WFS Page 37
DS Archive Capacity (2001-2003)

[Figure: number of files stored from Sep 2001 to Jun 2003 (axis 0-12 million), broken down by tape media: 9940, 9840, SD3, VHS, 9490]