Distributed RAID Architectures for Cluster I/O Computing
Kai Hwang
Internet and Cluster Computing Lab.
University of Southern California
Presentation Outline:
- Scalable Cluster I/O
- The RAID-x Architecture
- Cooperative Disk Drivers
- Benchmark Experiments
- Security and Fault Tolerance
- Conclusions
K. Hwang, March 15, 2001 in Beijing
Scalable clusters providing SSI services are gradually replacing the SMP, cc-NUMA, and MPP in Servers,
Web Sites, and Database Centers
Issues in Cluster Design
- Size Scalability (physical & application)
- Enhanced Availability (failure management)
- Single System Image (middleware, OS extensions)
- Fast Communication (networks & protocols)
- Load Balancing (CPU, Net, Memory, Disk)
- Security and Encryption (clusters of clusters)
- Distributed Environment (user friendly)
- Manageability (jobs and resources)
- Programmability (simple API required)
- Applicability (cluster- and grid-awareness)
The USC Trojans Cluster Project
Internet and Cluster Computing Lab, EEB Rm. 104
http://andy.usc.edu/trojan/
- Sixteen Pentium PCs are housed in two 9-ft computer racks.
- All PCs run Red Hat Linux v. 6.0 (kernel v. 2.2.5).
- All nodes are connected by a 100 Mbps Fast Ethernet.
- DQS, LSF, MPI, PVM, TreadMarks, Elias, the NAS benchmarks, etc. have been ported to the cluster.
- Scalable to a future system with hundreds of PC nodes interconnected by Gigabit networks.
Trojans Linux Cluster with Middleware for Security and Checkpoint Recovery
Figure: layered cluster architecture. Components shown: Programming Environments (Java, EDI, HTML, XML); Web Windows User Interface; Other Subsystems (Database, OLTP, etc.); Single-System Image and Availability Infrastructure; Security and Checkpointing Middleware; Linux on each Pentium PC; Gigabit Network Interconnect.
An I/O-centric cluster architecture
Figure: labels are Internet/Intranet Client, Entry Partition, Service Partition, Database Partition, and Fast Ethernet; the arrows distinguish Service Flow from Data Flow.
Distributed RAID Embedded in Clusters or Storage-Area Networks:
- I/O Bottleneck in Scalable Cluster Computing
  - The gap between CPU/memory speed and disk I/O speed widens as the microprocessor (µP) doubles in speed every year.
  - Cluster applications are often I/O-bound.
- Disks attached to hosts become unavailable when the hosts themselves fail. Distributed RAID achieves much higher availability through fault isolation, rollback recovery, and automatic file migration.
Distributed RAID with a single I/O space embedded in a cluster
Figure: workstations or PCs attached to a cluster network (SAN or LAN), sharing the ds-RAID.
Research Projects on Parallel and Distributed RAID

RAID architecture and environment
- USC Trojans RAID-x: Orthogonal striping and mirroring in a Linux cluster
- Princeton TickerTAIP: RAID-5 with multiple controllers
- Digital Petal: Chained declustering in a Unix cluster
- Berkeley Tertiary Disk: RAID-5 built with a PC cluster
- HP AutoRAID: Hierarchical with RAID-1 and RAID-5

Enabling mechanism for SIOS
- USC Trojans RAID-x: Cooperative device drivers in the Linux kernel
- Princeton TickerTAIP: Single RAID server implementation
- Digital Petal: Petal device drivers at user level
- Berkeley Tertiary Disk: xFS storage servers at file level
- HP AutoRAID: Disk array within a single controller

Data consistency checking
- USC Trojans RAID-x: Locks at device driver level
- Princeton TickerTAIP: Sequencing of user requests
- Digital Petal: Lamport's Paxos algorithm
- Berkeley Tertiary Disk: Modified DASH protocol in the xFS file system
- HP AutoRAID: Use mark to update the parity disk

Reliability and fault tolerance
- USC Trojans RAID-x: Orthogonal striping and mirroring
- Princeton TickerTAIP: Parity checks in RAID-5
- Digital Petal: Chained declustering
- Berkeley Tertiary Disk: SCSI disks with parity in RAID-5
- HP AutoRAID: Mirroring and parity checks
Four RAID architectures using different mirroring and parity checking schemes
(B = data blocks, M = mirrored blocks, P = parity blocks)

(a) Striped mirroring in RAID-10
Disk 1: B0 B2 B4 B6 B8 B10
Disk 2: B1 B3 B5 B7 B9 B11
Disk 3: M0 M2 M4 M6 M8 M10
Disk 4: M1 M3 M5 M7 M9 M11

(b) Parity checking in RAID-5
Disk 1: B0 B4 B8 P4 B12 B16
Disk 2: B1 B5 P3 B9 B13 B17
Disk 3: B2 P2 B6 B10 B14 P6
Disk 4: P1 B3 B7 B11 P5 B15

(c) Orthogonal striping and mirroring (OSM) in the RAID-x
Disk 0: B0 B4 B8 M9 M10 M11
Disk 1: B1 B5 B9 M6 M7 M8
Disk 2: B2 B6 B10 M3 M4 M5
Disk 3: B3 B7 B11 M0 M1 M2

(d) Skewed striping in a chained declustering RAID
Disk 0: B0 M3 B4 M7 B8 M11
Disk 1: B1 M0 B5 M4 B9 M8
Disk 2: B2 M1 B6 M5 B10 M9
Disk 3: B3 M2 B7 M6 B11 M10
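The block placements in panels (c) and (d) can be expressed as small mapping functions. The sketch below is inferred from the four-disk drawings only: data block i goes to disk i mod n, RAID-x clusters the images of n-1 consecutive blocks on the one disk that holds none of those blocks, and chained declustering puts each image on the next disk. The function names are illustrative and are not taken from the RAID-x implementation.

```python
def data_disk(i: int, n: int) -> int:
    """Disk holding data block B_i when striping across n disks."""
    return i % n


def raidx_mirror_disk(i: int, n: int) -> int:
    """Disk holding image M_i under orthogonal striping and mirroring.

    Assumption inferred from panel (c): images are grouped n-1 at a time,
    and group g is stored on disk (n - 1 - g) mod n.
    """
    group = i // (n - 1)
    return (n - 1 - group) % n


def chained_mirror_disk(i: int, n: int) -> int:
    """Disk holding image M_i under chained declustering (next disk over)."""
    return (data_disk(i, n) + 1) % n


def raidx_layout(num_blocks: int, n: int) -> dict:
    """Per-disk block lists: data blocks first, then mirrored images."""
    data = {d: [] for d in range(n)}
    mirrors = {d: [] for d in range(n)}
    for i in range(num_blocks):
        data[data_disk(i, n)].append(f"B{i}")
        mirrors[raidx_mirror_disk(i, n)].append(f"M{i}")
    return {d: data[d] + mirrors[d] for d in range(n)}


if __name__ == "__main__":
    for disk, blocks in raidx_layout(12, 4).items():
        print(f"Disk {disk}:", " ".join(blocks))
```

Under this mapping a block and its image never share a disk, and the images of a group sit contiguously on a single disk, which is the property the OSM panel illustrates.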
Theoretical Peak Performance of Four RAID Architectures

Performance indicator          RAID-10              RAID-5                Chained Declustering  RAID-x
Max. I/O bandwidth
  Read                         nB                   nB                    nB                    nB
  Large write                  nB                   (n-1)B                nB                    nB
  Small write                  nB                   nB/2                  nB                    nB
Parallel read or write time
  Large read                   mR/n                 mR/n                  mR/n                  mR/n
  Small read                   R                    R                     R                     R
  Large write                  2mW/n                mW/(n-1)              2mW/n                 mW/n + mW/n(n-1)
  Small write                  2W                   R+W                   2W                    W
Max. fault coverage            n/2 disk failures    Single disk failure   n/2 disk failures     Single disk failure
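The write-time formulas in the peak-performance table can be evaluated directly to compare the four architectures. The sketch below makes several assumptions, because the slide does not define its symbols: n is taken as the number of disks, m as the number of blocks in a large access, and R and W as the per-block read and write times. The RAID-x term "mW/n(n-1)" is read as mW divided by n(n-1). The dictionary and function names are illustrative.

```python
# Large-write time formulas copied from the table (symbols are assumptions:
# n = number of disks, m = blocks per large write, W = per-block write time).
LARGE_WRITE_TIME = {
    "RAID-10": lambda n, m, W: 2 * m * W / n,
    "RAID-5": lambda n, m, W: m * W / (n - 1),
    "Chained Declustering": lambda n, m, W: 2 * m * W / n,
    # "mW/n(n-1)" is interpreted here as m*W / (n*(n-1)).
    "RAID-x": lambda n, m, W: m * W / n + m * W / (n * (n - 1)),
}

# Small-write time formulas copied from the table (R = per-block read time).
SMALL_WRITE_TIME = {
    "RAID-10": lambda R, W: 2 * W,
    "RAID-5": lambda R, W: R + W,
    "Chained Declustering": lambda R, W: 2 * W,
    "RAID-x": lambda R, W: W,
}


def large_write_time(arch: str, n: int, m: int, W: float) -> float:
    """Peak parallel large-write time for one architecture."""
    if n < 2:
        raise ValueError("the formulas need at least two disks")
    return LARGE_WRITE_TIME[arch](n, m, W)


def small_write_time(arch: str, R: float, W: float) -> float:
    """Peak small-write time for one architecture."""
    return SMALL_WRITE_TIME[arch](R, W)


if __name__ == "__main__":
    for arch in LARGE_WRITE_TIME:
        print(arch, large_write_time(arch, n=4, m=12, W=1.0))
```

With these formulas the second RAID-x term shrinks as n grows, so its large-write time approaches mW/n, roughly half of the 2mW/n listed for RAID-10 and chained declustering.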
Distributed RAID-x architecture
Figure: four cluster nodes (Node 0 to Node 3), each with a P/M unit and a CDD, connected by the cluster network. Twelve disks are drawn in three rows (D0 to D3, D4 to D7, D8 to D11) beneath Node 0 through Node 3, holding data blocks (B) and mirrored blocks (M):
D0: B0 B12 B24 M25 M26 M27
D1: B1 B13 B25 M14 M15 M24
D2: B2 B14 B26 M3 M12 M13
D3: B3 B15 B27 M0 M1 M2
D4: B4 B16 B28 M29 M30 M31
D5: B5 B17 B29 M18 M19 M28
D6: B6 B18 B30 M7 M16 M17
D7: B7 B19 B31 M4 M5 M6
D8: B8 B20 B32 M33 M34 M35
D9: B9 B21 B33 M22 M23 M32
D10: B10 B22 B34 M11 M20 M21
D11: B11 B23 B35 M8 M9 M10
Single I/O space in a Distributed RAID enabled by CDDs at Linux kernel level
Figure, two panels:
(a) Separate disks driven by independent disk drivers (IDDs): cluster nodes and a central NFS server on an interconnection network.
(b) A global virtual disk with a SIOS formed by cooperative disks: cluster nodes, each with a CDD, on an interconnection network.
Remote disk access using central NFS server versus using cooperative disk drivers in the RAID-x cluster
Figure, two panels, each divided into a client side and a server side with a user level and a kernel level; numbered steps mark the request path in each panel:
(a) Parallel disk I/O using the NFS in a server/client cluster: User Application, NFS Client, NFS Server, Traditional Device Driver.
(b) Using CDDs to achieve a SIOS in a serverless cluster: User Application and a CDD on each side; the NFS server is bypassed.
Architectural design of cooperative device drivers
Figure, two panels:
(a) Device masquerading: Node 1 and Node 2, each with a CDD, physical disks, and virtual disks, communicating through the network.
(b) CDD architecture: the Cooperative Disk Driver (CDD) contains a Storage Manager, a CDD Client Module, and a Data Consistency Module, with communications through the network.
Maintaining consistency of the global directory /sios by all CDDs in the distributed RAID-x
Figure: cluster nodes 1 to 4 each run an application that sees the same /sios tree (dir1, dir2; file1, file2, file3); the CDDs on the four nodes are connected by the cluster network.
Elapsed Time in Executing the Andrew Benchmark on the Linux Cluster at USC
Figure: two panels (NFS results, RAID-x results) plotting elapsed time (sec) against the number of clients (1, 4, 8, 12, 16), with one series per benchmark phase: Make Dir, Copy Files, Scan Dir, Read File, Compile.
Parallel Write Performance of Four RAID Architectures against Traffic Rate
Figure: parallel writes (20 MB per client); aggregate bandwidth (MB/s) against the number of clients (1, 4, 8, 12, 16) for RAID-x, Chained Declustering, RAID-10, RAID-5, and NFS.
Parallel Write Performance of Four RAID Architectures vs. Disk Array Size
Figure: parallel write; aggregate bandwidth (MB/s) against the number of disks (2, 4, 8, 12, 16) for RAID-x, Chained Declustering, RAID-10, and RAID-5.
Achievable I/O Bandwidth and Improvement Factor on Trojans Cluster

                 NFS                                  RAID-x
I/O operation    1 client    16 clients   Improve     1 client    16 clients   Improve
Large read       2.58 MB/s   2.3 MB/s     0.89        2.59 MB/s   15.63 MB/s   6.03
Large write      2.11 MB/s   2.77 MB/s    1.31        2.92 MB/s   15.29 MB/s   5.24
Small write      2.47 MB/s   2.81 MB/s    1.34        2.35 MB/s   15.1 MB/s    6.43

                 Chained Declustering                 RAID-10
I/O operation    1 client    16 clients   Improve     1 client    16 clients   Improve
Large read       2.46 MB/s   15.8 MB/s    6.42        2.37 MB/s   10.76 MB/s   4.54
Large write      2.62 MB/s   12.63 MB/s   4.82        2.31 MB/s   9.96 MB/s    4.31
Small write      2.31 MB/s   12.54 MB/s   5.43        2.27 MB/s   9.98 MB/s    4.39
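The slide does not define the "Improve" column. A plausible reading, assumed here, is the ratio of the 16-client bandwidth to the 1-client bandwidth. The short sketch below recomputes that ratio for the RAID-x figures reported above; the names are illustrative.

```python
# RAID-x bandwidths in MB/s as reported in the table: (1 client, 16 clients).
RAIDX_BANDWIDTH = {
    "Large read": (2.59, 15.63),
    "Large write": (2.92, 15.29),
    "Small write": (2.35, 15.1),
}


def improvement_factor(one_client: float, sixteen_clients: float) -> float:
    """Assumed definition: 16-client bandwidth over 1-client bandwidth."""
    return round(sixteen_clients / one_client, 2)


if __name__ == "__main__":
    for operation, (bw1, bw16) in RAIDX_BANDWIDTH.items():
        print(f"{operation}: {improvement_factor(bw1, bw16)}")
```

Under this reading the computed ratios reproduce the RAID-x "Improve" entries (6.03, 5.24, 6.43).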
Effects of Stripe Unit Size on I/O Bandwidth of RAID Architectures
Figure: large write (320 MB for 16 clients); aggregate bandwidth (MB/s) against stripe unit size (16, 32, 64, 128 KB) for RAID-x, Chained Declustering, RAID-10, and RAID-5.
Bonnie Benchmark Results on Trojans Cluster
Figure: file rewrite; output rate (MB/s) against the number of disks (2, 4, 8, 12, 16) for RAID-x, chained declustering, RAID-10, and RAID-5.
Securing Networks, Intranets, Clusters, or Grid Resources with intrusion control and automatic recovery from malicious attacks
Figure: axes labeled Increasing Reliability (with "No data protection" and "Fault tolerance" marked) and Increasing Scalability (SMP, cluster, Intranet, Grid). Three levels are shown:
- Cluster with no security protection
- Gateway firewall to screen traffic flow between networks
- Highly secured Intranet with intrusion detection and response, automatic recovery from malicious attacks, and fault tolerance with distributed storage for reliable I/O
Distributed Micro-Firewalls
Source: Murali and Hwang, USC
Distributed Checkpointing on the RAID-x in Trojans Cluster
Figure: timeline of Process 0 through Process 11 writing checkpoints in three stripe groups (Stripe 0, Stripe 1, Stripe 2).
C: Checkpointing overhead
S: Synchronization overhead
Security Component Technologies
- Firewalls and Cryptography
- Cluster Middleware for Security
- Anti-virus and Immune Systems
- Intrusion Detection and Response
- Distributed Software RAIDs
- Security & Assurance Policies
Distributed Intrusion Detection and Responses

Security threats and the effectiveness of using micro-firewalls:
- Insider attacks: Protect hosts against attacks from insiders
- Denial-of-service attacks: Protect against denial-of-service attacks from any source
- Trojan programs: Protect hosts from trapdoors planted by any source
- IP address spoofing: Can be reconfigured to prevent IP spoofing at the client host level
- Probes and scans: Use with an IDS to block probes and scans close to their sources
- Unauthorized external access: Can prevent unauthorized access to external networks at the source
- Attacks on Intranet infrastructure: Resist both internal and external attacks and provide fine-grained access control
Checkpointing overhead on distributed RAIDs
Figure: checkpoint overhead (sec) against checkpoint file size (1.21 to 8.11 MB) for the NFS, Vaidya, and Striped schemes.
Advantages and Shortcomings of Distributed Checkpointing

Simultaneous writing to a central storage (the NFS scheme)
- Advantages: Simple; no inconsistent state
- Shortcomings: Network and I/O contention; NFS is a single point of failure
- Suitable applications: Small checkpoints, small number of nodes, few I/O operations

Staggered writing to a central storage (Vaidya scheme)
- Advantages: Eliminates network and I/O contention
- Shortcomings: Network bandwidth is wasted; NFS is a single point of failure
- Suitable applications: Small checkpoints, small number of nodes, few I/O operations

Striped staggered checkpointing on any distributed RAID (our scheme)
- Advantages: Eliminates network and I/O contention; low checkpoint overhead; fully utilizes network bandwidth; tolerates multiple failures among stripe groups
- Shortcomings: Cannot tolerate multiple node failures within a single stripe group
- Suitable applications: Large checkpoints, large number of nodes, low communication, I/O-intensive applications
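The three checkpointing schemes compared above differ mainly in which processes write at the same time. The sketch below models each scheme as a list of abstract time slots, each holding the processes that write in that slot. This is a simplified illustration only: the slot model, the contiguous assignment of processes to stripe groups, and the function names are all assumptions, and the checkpointing (C) and synchronization (S) overheads are not modeled.

```python
def simultaneous(num_procs: int) -> list:
    """NFS scheme: every process writes to central storage in the same slot."""
    return [list(range(num_procs))]


def staggered(num_procs: int) -> list:
    """Vaidya scheme: processes write to central storage one after another."""
    return [[p] for p in range(num_procs)]


def striped_staggered(num_procs: int, group_size: int) -> list:
    """Striped staggered scheme: a stripe group writes in parallel (one
    process per disk of the group), and successive groups are staggered.

    Assumption: processes are assigned to groups contiguously by rank.
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return [
        list(range(start, min(start + group_size, num_procs)))
        for start in range(0, num_procs, group_size)
    ]


if __name__ == "__main__":
    for name, slots in [
        ("simultaneous", simultaneous(12)),
        ("staggered", staggered(12)),
        ("striped staggered", striped_staggered(12, 4)),
    ]:
        print(f"{name}: {len(slots)} slot(s), "
              f"at most {max(len(s) for s in slots)} writer(s) per slot")
```

With 12 processes and groups of four, the striped staggered schedule uses three slots with four concurrent writers each, sitting between the single contended slot of the simultaneous scheme and the twelve serial slots of the staggered scheme.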
Conclusions:
- Distributed storage-area networks demand hardware or software support for a single I/O space, not only in clusters but also in pervasive information grids.
- Hierarchical checkpointing with striping and staggered mirroring helps build fault-tolerant clusters that provide continuous network services.
- Hacker-proof clusters are in great demand for securing e-business, distributed computing, and metacomputing grid applications.
- New applications are being explored in multiserver consolidation, collaborative design, and pervasive network services.
Call for Participation
IEEE Third International Conference on Cluster Computing
CLUSTER 2001
Sutton Place Hotel, Newport Beach, California
October 8 - 11, 2001