Oracle 10g Database Storage Demystified
Jeff Browning, O.C.P., R.H.C.A.
Senior Manager, Network Appliance, Inc.
OracleWorld 2003
San Francisco
Agenda
A little history
The notion of storage networking
SAN and NAS
– Current-technology SAN: FCP
– Current-technology NAS: IP over GbE
RAID: The “packaging” of hard disks
– RAID0
– RAID1
– RAID4
– RAID5
– Combinations of RAID levels
Emerging storage technologies
– ATA RAID
– Serial ATA (SATA)
– iSCSI
– NFS v. 4 (NFS RDMA)
Conclusion and wrap up
A Little History
IDE/ATA: The beginning
SCSI: A proliferation of standards
– SCSI-1
– SCSI-2: The proliferation begins
– SCSI-3: A new approach
In the Beginning There Was IDE/ATA
Introduced by IBM with the PC/AT in 1984
Supported a master/slave concept
Enhanced and adopted by Compaq in 1986 with the Deskpro 386 as the IDE interface
– ATA and IDE are now interchangeable terms
What You Could Do with an IDE/ATA Device: Not Much
IDE/ATA was slow (4 MB/s to start)
It didn’t support many devices (usually 2 hard drives)
It wasn’t reliable
But it was, and remains, very, very cheap
It was never used widely for databases
[Diagram: a host with two IDE/ATA devices, one master and one slave]
SCSI: A Proliferation of Standards
Invented by Alan Shugart (founder of Seagate) in 1979
Adopted as an ANSI standard in 1986
First version was referred to as SCSI-1
What You Could Do with a SCSI-1 Device: A Bit More
SCSI-1 was still pretty slow (5 MB/s)
It supported 7 peripheral devices
It was more reliable than IDE/ATA
It was also more expensive
This was the first choice for Sun, HP and other open systems vendors and, notably, the Macintosh
[Diagram: a SCSI-1 bus with devices at SCSI IDs 0 through 6, plus SCSI ID 7]
SCSI-2: The Proliferation Begins
Fast SCSI: Higher transfer speed (10 MB/s or higher)
Wide SCSI: Width of the bus was increased from 8 to 16 bits
More devices per bus (from 7 to 15)
Other improvements
– Improved cables and connectors
– Improved signaling
– Active termination
SCSI-3: A New Approach
With SCSI-3 the approach changed
– Cabling and connection layer no longer defined in the basic spec, but in so-called “interconnect” or “physical layer” standards
SCSI-3 basic spec only defines a command set and a communication protocol
SCSI-3: The Physical Layer Standards
Serial Bus SCSI: This is the form of SCSI-3 found in many hosts today
Serial Storage Architecture (SSA): Used by IBM on its larger systems; not common
Fibre Channel Protocol (FCP): Defines a standard for SCSI-3 traffic over Fibre Channel networks; by far the most popular form of SCSI-3 today for databases
iSCSI: Emerging standard for SCSI-3 traffic over IP networks
The Notion of Storage Networking
SCSI provided a way to attach disks to a host
The need for sharing of disk and tape backup resources led to the idea of “shared SCSI”
[Diagram: a tape library shared over shared SCSI]
Storage Networking for Applications
Certain applications required shared disk
Shared SCSI evolved as a way to solve this problem
[Diagram: UNIX host A and UNIX host B, each running Oracle w/ Parallel Extension, connected by shared SCSI to a shared disk array]
Storage Networking Evolves
Storage networking evolved along two paths
– SAN: With FCP being the dominant protocol
– NAS: With Gigabit Ethernet (GbE), NAS became a viable alternative to FCP for many applications
The next section discusses the tradeoffs between these approaches
SAN and NAS
Storage Area Networks (SAN) take the approach of making SCSI sharable
Network Attached Storage (NAS) uses existing file sharing protocols to connect databases to storage
Both approaches have their place: They are different
Fibre Channel Emerges as Dominant SAN
Fibre Channel was designed as a SAN protocol
It was adopted as an ANSI standard in 1994
It has emerged as the de facto standard for creating a SAN
Typical Fibre Channel SAN
[Diagram: Windows hosts A and B and UNIX hosts A and B connect through Brocade switches A and B to FCP targets A and B. Windows and UNIX hosts each have an FCP network and a redundant FCP network. The targets each hold a Vol0 (target OS), plus the LUNs for Windows hosts A and B and for UNIX hosts A and B.]
Fibre Channel SAN Tradeoffs
Advantages
– Bandwidth is good: 2 Gb FC is now common
– Host CPU cost per I/O is comparable to SCSI
– Latency is low and performance is good
– Scalability is good
Disadvantages
– More expensive than a comparable IP network
– Interoperability is poor but improving
– Highly complex to set up and administer
– Difficult to share disk capacity
NAS Emerges as Alternative to SAN
NFS was created by Sun in the early 1980s
Version 1 of NFS was widely regarded as inappropriate as a file sharing protocol for databases
Version 2 improved enough that Oracle certified NFS for Oracle datafiles in 1997
Version 3 builds upon those improvements
Version 4 is emerging (more on this later)
Typical IP/GbE NAS
[Diagram: Windows hosts A and B and UNIX hosts A and B connect through Cisco switches A and B to Filers A and B. Windows and UNIX hosts each have an IP network and a redundant IP network. Each filer holds a Vol0 (filer OS) and a multi-protocol shared volume.]
IP/GbE NAS Tradeoffs
Advantages
– Bandwidth is pretty good using GbE
– Switches/NICs are very inexpensive compared to FC switches/HBAs
– Simple and easy to set up and administer
– Interoperability is excellent
– Disk capacity can be easily shared, even across platforms
Disadvantages
– Host CPU cost may be higher than FC, depending on load, but not if the load is spindle-bound (NFS v. 4 fixes this in spades)
– CPU scalability (in the sense of CPU count) can be lower than FC (again, NFS v. 4 addresses this)
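To put the GbE and 2 Gb FC bandwidth claims side by side, here is a minimal sketch. It assumes nominal payload rates of about 125 MB/s for GbE and about 200 MB/s for 2 Gb FC, and it ignores protocol overhead, disk speed, and host CPU. The function name, the `efficiency` parameter, and the 10 GB example are hypothetical, chosen only for illustration.

```python
# Nominal one-direction payload rates in MB/s (assumed round figures).
NOMINAL_MB_PER_S = {"GbE": 125.0, "2Gb FC": 200.0}


def transfer_seconds(size_gb: float, link: str, efficiency: float = 1.0) -> float:
    """Estimate seconds to move size_gb gigabytes over the named link.

    efficiency is a hypothetical fudge factor (0 < efficiency <= 1) for
    protocol overhead; 1.0 means the full nominal rate is achieved.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    size_mb = size_gb * 1000  # decimal units: 1 GB = 1000 MB
    return size_mb / (NOMINAL_MB_PER_S[link] * efficiency)


if __name__ == "__main__":
    for link in NOMINAL_MB_PER_S:
        print(link, transfer_seconds(10, link), "seconds for 10 GB")
```

Under these assumptions the wire itself is rarely the whole story: a load limited by the disks rather than the link would see little difference between the two.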
SAN vs. NAS Suitability
SAN
– Suitable for high-end environments where latency, performance, or CPU cost per I/O are critical
– Required by some applications where NAS is not supported
NAS
– Suitable for low- or mid-end environments where performance or CPU cost is less important than dollar cost
– Also suitable for some high-end environments where the workload is compute intensive, not I/O intensive
SAN and NAS are converging
RAID: Redundant Array of Inexpensive Disks
The problem:
– Disks are fragile; they fail
– Data is precious and must be protected
– Tape or disk backup is too slow or too expensive
RAID provides a way to combine disks together with redundancy so that a single disk failure will not lose data
Hot spares and auto-promotion make this a viable long-term solution
Software RAID vs. hardware RAID
RAID and Its Variants
RAID0: Simple striping; not truly RAID
RAID1: Disk-to-disk mirroring
RAID4: Striping with a parity disk
RAID5: Striping with striped parity
RAID1+0, RAID0+1, RAID5+1, etc.: Combinations of RAID protection; can get complex
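As a rough illustration of the disk overhead each variant implies, the sketch below computes usable capacity for an array of equal-sized disks. The function name `usable_disks` and the 14-disk example are assumptions for illustration; real arrays also set aside hot spares and vendor metadata, which this simplified model ignores.

```python
def usable_disks(level: str, total_disks: int) -> int:
    """Return how many disks' worth of capacity hold user data.

    Simplified model: equal-sized disks, one RAID group, no hot spares.
    """
    if level == "RAID0":
        return total_disks            # pure striping, no redundancy
    if level in ("RAID1", "RAID0+1", "RAID1+0"):
        if total_disks % 2:
            raise ValueError("mirroring needs an even number of disks")
        return total_disks // 2       # every block is stored twice
    if level in ("RAID4", "RAID5"):
        if total_disks < 3:
            raise ValueError("parity RAID needs at least 3 disks")
        return total_disks - 1        # one disk's worth of parity
    raise ValueError(f"unknown RAID level: {level}")


if __name__ == "__main__":
    for level in ("RAID0", "RAID1+0", "RAID4", "RAID5"):
        print(level, usable_disks(level, 14), "of 14 disks usable")
```

With 14 disks, this model gives 14 usable disks for RAID0, 7 for the mirrored combinations, and 13 for RAID4 or RAID5.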
RAID0: Striping
[Diagram: file /vol/silly/foo.txt on volume /vol/silly, striped across a RAID0 array of data disks 1 through 14]
RAID0 Tradeoffs
Advantages:
– Fastest type of RAID; leverages disks well
– No disk overhead
Disadvantage:
– A single disk loss is critical
Suitability:
– Any environment where performance is important and you do not care about losing the data, e.g. data marts
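A minimal sketch of how striping spreads a file across disks, assuming simple round-robin placement with a stripe unit of one block. The function name `locate_block` is hypothetical, and real arrays use larger, configurable stripe units.

```python
def locate_block(logical_block: int, num_disks: int) -> tuple[int, int]:
    """Map a logical block number to (disk index, block offset on that disk).

    Consecutive logical blocks land on consecutive disks, so a large
    sequential read keeps every disk in the array busy at once.
    """
    if num_disks < 1 or logical_block < 0:
        raise ValueError("need at least one disk and a non-negative block")
    return logical_block % num_disks, logical_block // num_disks


if __name__ == "__main__":
    # Show where the first 16 blocks of a file land on a 14-disk array.
    for block in range(16):
        disk, offset = locate_block(block, 14)
        print(f"logical block {block:2d} -> disk {disk:2d}, offset {offset}")
```

Because every disk holds a slice of every large file, losing any one disk damages the whole volume, which is why a single disk loss is critical.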
RAID1: Simple Mirroring
[Diagram: file /vol/mirrored/foo.txt on volume /vol/mirrored, written to a data disk and copied to a mirror disk]
RAID1 Tradeoffs
Advantages:
– Read capacity is higher than a single disk (but lower than striping)
– Very fault tolerant; all data is mirrored
Disadvantages:
– Write capacity is that of a single disk
– Penalty of two physical writes per logical write
– Doubles disk cost
Suitability:
– Very commonly used for online redo logs
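The write penalty and the fault tolerance of mirroring can be seen in a toy model. This is only a sketch: the class `MirroredPair` and its in-memory "disks" are hypothetical, and it omits the resynchronization a real array performs when a failed disk is replaced.

```python
class MirroredPair:
    """Toy RAID1 volume: two in-memory 'disks' holding identical blocks."""

    def __init__(self) -> None:
        self.disks = [{}, {}]          # block number -> data, one dict per disk
        self.failed = [False, False]
        self.physical_writes = 0

    def write(self, block: int, data: bytes) -> None:
        # Every logical write becomes one physical write per surviving disk.
        for i, disk in enumerate(self.disks):
            if not self.failed[i]:
                disk[block] = data
                self.physical_writes += 1

    def read(self, block: int) -> bytes:
        # Any surviving copy can satisfy a read.
        for i, disk in enumerate(self.disks):
            if not self.failed[i]:
                return disk[block]
        raise IOError("both disks have failed; data is lost")

    def fail_disk(self, index: int) -> None:
        self.failed[index] = True


if __name__ == "__main__":
    volume = MirroredPair()
    volume.write(0, b"redo record")
    volume.fail_disk(0)
    print(volume.read(0), "after", volume.physical_writes, "physical writes")
```

One logical write costs two physical writes, and the data stays readable after either disk fails.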
RAID0+1: Striping with Mirroring
[Diagram: file /vol/spensive/foo.txt on volume /vol/spensive, stored on a RAID0+1 array: data disks 1 through 7 striped together and mirrored by mirror disks 1 through 7]
RAID1+0: Mirroring with Striping
[Diagram: file /vol/spensive/foo.txt on volume /vol/spensive, stored on a RAID1+0 array: seven mirrored pairs (data disk 1 with mirror disk 1, through data disk 7 with mirror disk 7) striped together]
RAID0+1/RAID1+0 Tradeoffs
Advantages:
– Read capacity is high; multiple disks are leveraged
– Very fault tolerant; all data is mirrored
Disadvantages:
– Penalty of two physical writes per logical write
– Doubles disk cost
Suitability:
– Very common for storing Oracle datafiles where redundancy is highly valued
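Both layouts mirror all data, but they can differ in how they handle a second disk failure. The sketch below enumerates every two-disk failure on a 14-disk array of 7 data/mirror pairs. It assumes the classic model in which a RAID0+1 stripe set goes offline as soon as any one of its disks fails; some implementations behave better than that. All names are hypothetical.

```python
from itertools import combinations

PAIRS = 7  # data/mirror pairs in the array

# Label each disk as (side, index): side "D" = data disk, "M" = mirror disk.
DISKS = [(side, i) for side in ("D", "M") for i in range(PAIRS)]


def raid10_survives(failed: set) -> bool:
    """RAID1+0: data is lost only if both disks of one mirrored pair fail."""
    for i in range(PAIRS):
        if ("D", i) in failed and ("M", i) in failed:
            return False
    return True


def raid01_survives(failed: set) -> bool:
    """RAID0+1: each side is one stripe set, and (by assumption) a stripe
    set dies with any one of its disks, so data is lost once both sides
    contain a failed disk."""
    sides_hit = {side for side, _ in failed}
    return len(sides_hit) < 2


def survival_counts() -> tuple[int, int, int]:
    """Return (total two-disk failures, RAID1+0 survivals, RAID0+1 survivals)."""
    combos = [set(c) for c in combinations(DISKS, 2)]
    r10 = sum(raid10_survives(c) for c in combos)
    r01 = sum(raid01_survives(c) for c in combos)
    return len(combos), r10, r01


if __name__ == "__main__":
    total, r10, r01 = survival_counts()
    print(f"RAID1+0 survives {r10} of {total} two-disk failures")
    print(f"RAID0+1 survives {r01} of {total} two-disk failures")
```

Under this assumption, RAID1+0 survives 84 of the 91 possible two-disk failures and RAID0+1 survives 42. Disk cost and write penalty are the same for both.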
RAID4: Striping with Parity Disk
[Diagram: file /vol/thrifty/foo.txt on volume /vol/thrifty, striped across a RAID4 array of data disks 1 through 13 plus one parity disk]
RAID4 Tradeoffs
Advantages:
– Read capacity is high; multiple disks are leveraged
– Low RAID overhead; almost as good as RAID0
– RAID protection exists
Disadvantages:
– Cannot survive the loss of two disks
– Parity disk can become a bottleneck (some vendors avoid this issue with buffering, in which case performance is similar to RAID1)
Suitability:
– Very common for storing Oracle datafiles where redundancy is needed and the cost of RAID0+1/RAID1+0 is too high
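Parity protection rests on XOR: the parity block is the XOR of the data blocks in a stripe, so any single missing block can be recomputed from the rest. A minimal sketch, assuming equal-length blocks; the function names and sample data are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_data: list[bytes], parity: bytes) -> bytes:
    """Recover the one missing data block from the survivors plus parity."""
    return xor_blocks(surviving_data + [parity])


if __name__ == "__main__":
    stripe = [b"ORCL", b"redo", b"undo"]      # one block per data disk
    parity = xor_blocks(stripe)               # stored on the parity disk
    # Simulate losing the second data disk and rebuilding its block.
    recovered = rebuild_missing([stripe[0], stripe[2]], parity)
    print(recovered)
```

The same arithmetic explains the limits listed above: every write must also update parity, and a second lost disk leaves too little information to rebuild from.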
RAID5: Striping with Striped Parity
[Diagram: file /vol/slow/foo.txt on volume /vol/slow, striped across a RAID5 array of data disks 1 through 14]
RAID5 Tradeoffs
Advantages:
– Read capacity is high; multiple disks are leveraged
– Low RAID overhead; almost as good as RAID0
– RAID protection exists
Disadvantages:
– Cannot survive the loss of two disks
– Slowest RAID; CPU cost of parity striping is high
Suitability:
– Very common for storing Oracle datafiles where redundancy is needed, performance is not critical, and the cost of RAID0+1/RAID1+0 is too high
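"Striped parity" means the parity block moves to a different disk on each stripe instead of living on one dedicated disk. The sketch below shows one possible rotation, starting on the last disk and moving left; this particular rotation and the function names are assumptions, since vendors use several layouts.

```python
def parity_disk(stripe: int, num_disks: int) -> int:
    """Disk index holding parity for a stripe, rotating from last disk to first."""
    if num_disks < 3 or stripe < 0:
        raise ValueError("need at least 3 disks and a non-negative stripe")
    return (num_disks - 1 - stripe) % num_disks


def stripe_layout(stripe: int, num_disks: int) -> list[str]:
    """Return 'P' for the parity position and 'D' for data positions."""
    p = parity_disk(stripe, num_disks)
    return ["P" if d == p else "D" for d in range(num_disks)]


if __name__ == "__main__":
    # Print the first few stripes of a small 4-disk array.
    for stripe in range(5):
        print(stripe, " ".join(stripe_layout(stripe, 4)))
```

Rotating parity spreads parity writes across all disks, avoiding the single parity-disk bottleneck of RAID4, at the cost of computing and placing parity per stripe.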
Emerging Storage Technologies
ATA RAID
Serial ATA (SATA)
iSCSI
NFS v. 4 (NFS RDMA)
ATA RAID
A repackaging of cheap ATA/IDE disks
Used as a tape backup substitute
Archive storage is on-line and accessible
Faster than tape
Almost as cheap as tape, or even cheaper if compression is used
[Diagram: UNIX hosts A and B and Windows hosts A and B connected over an FCP or IP/GbE network to a shared tape library, and the same hosts connected over an FCP or IP/GbE network to a shared ATA RAID array]
Serial ATA
An updating of the ATA/IDE spec to current technology
Intel and Dell
Targeted for desktops and next generation storage appliances
Could become a serious competitor to FCP and serial bus SCSI
iSCSI
Implements SCSI-3 protocol over IP networks
Intel is a leader
Software initiators exist for Windows and Linux
HP-UX and AIX initiators are in public beta
Targets are available from a variety of vendors
Presently immature, but will become a viable competitor to FCP
– Key is TOE (TCP offload engine) HBAs on both target and initiator, which effectively offload IP traffic from the host/target CPU
– Cost per port for switches and HBAs is vastly cheaper than FCP
– If performance becomes comparable, FCP could be toast
Typical iSCSI SAN
[Diagram: Windows hosts A and B and UNIX hosts A and B connect through Cisco switches A and B to iSCSI targets A and B. Windows and UNIX hosts each have an IP network and a redundant IP network. The targets each hold a Vol0 (target OS), plus the LUNs for Windows hosts A and B and for UNIX hosts A and B.]
NFS v. 4 (NFS RDMA)
Basically, a rewrite of NFS
Focused on “local sharing”, i.e., database customers and the like, who need to share data across a small, focused network with very good performance
Supports Remote Direct Memory Access (RDMA), a very high performance, low latency I/O protocol
Supports InfiniBand as an I/O interface
Leaders are Network Appliance and Sun
Will provide a transparent performance upgrade path for NFS database customers
Wrap Up