CIT 470: Advanced Network and System Administration Slide #1
CIT 470: Advanced Network and System Administration
Disks
Topics
1. Disk interfaces
2. Disk components
3. Performance
4. Reliability
5. Partitions
6. RAID
7. Adding a disk
8. Logical volumes
9. Filesystems
10. Storage management
Volumes
A volume is a chunk of storage as seen by the server: a disk, a partition, or a RAID set.
Logical Unit Numbers (LUNs) identify volumes.
A volume is formatted with a filesystem:
ext2 and ext3, ZFS, FAT, NTFS, ISO9660
Disk Interfaces
SCSI: standard interface for servers.
IDE: standard interface for PCs.
Fibre Channel (FC-AL): high bandwidth; can run SCSI or IP.
iSCSI: SCSI over fast (e.g., 10-gigabit) IP network equipment.
USB: fast enough for slow devices on PCs.
SCSI
Small Computer Systems Interface: fast, reliable, expensive.
A bus, not a simple PC-to-device interface.
  Each device has a target # ranging 0-7 or 0-15.
  Devices can communicate directly w/o CPU.
Many versions:
  Original: SCSI-1 (1979), 5 MB/s
  Current: SCSI-3 (2001), 320 MB/s
Serial Attached SCSI (SAS):
  Up to 128 devices.
  Up to 750 MB/s full duplex.
IDE
Integrated Drive Electronics / AT Attachment: slower, less reliable, cheap.
Only allows 2 devices per interface.
ATAPI standard added removable devices.
Many versions:
  Original: IDE / ATA (1984)
  Current: Ultra-ATA/133, 133 MB/s
Serial ATA:
  Up to 128 devices.
  150 MB/s (SATA-1) and 300 MB/s (SATA-2)
IDE vs. SCSI
SCSI offers better performance/scale:
  Faster bus.
  Faster hard drives (up to 15,000 rpm).
  Lower CPU usage.
  Better handling of multiple requests.
Cheaper IDE is often best for workstations.
Convergence: SATA-2 and SAS are converging on a single standard.
Hard Drive Components
Actuator: moves the arm across the disk to read/write data. The arm has multiple read/write heads (often 2 per platter).
Platters: rigid substrate material. A thin coating of magnetic material stores data. Coating type determines areal density (Gbits/in²).
Spindle motor: spins platters at 3,600-15,000 rpm. Speed determines disk latency.
Cache: 8-32 MB of cache memory. Reliability: write-back vs. write-through.
Disk Information: hdparm
# hdparm -i /dev/hde

/dev/hde:
 Model=WDC WD1200JB-00CRA1, FwRev=17.07W17, SerialNo=WD-WMA8C4533667
 Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq }
 RawCHS=16383/16/63, TrkSize=57600, SectSize=600, ECCbytes=40
 BuffType=DualPortCache, BuffSize=8192kB, MaxMultSect=16, MultSect=off
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=234441648
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4
 DMA modes: mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: device does not report version

* signifies the current active mode
Disk Performance
Seek time: time to move the head to the desired track (3-8 ms).
Rotational delay: time until the head is over the desired block (up to ~8 ms per rotation at 7200 rpm).
Latency: seek time + rotational delay.
Throughput: data transfer rate (20-100 MB/s).
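The latency figures above follow directly from the spindle speed; a minimal sketch (the half-rotation average model and the 5 ms seek time are illustrative assumptions, not fixed values from the slides):

```python
# Sketch: rotational delay and total latency for a given spindle speed.
# Assumes average rotational delay = half a full rotation (a common model).

def rotational_delay_ms(rpm, average=True):
    """Full (or average) rotation time in milliseconds."""
    full = 60_000 / rpm          # one rotation, in ms
    return full / 2 if average else full

def latency_ms(seek_ms, rpm):
    """Latency = seek time + average rotational delay."""
    return seek_ms + rotational_delay_ms(rpm)

print(round(rotational_delay_ms(7200, average=False), 1))  # ~8.3 ms per full rotation
print(round(latency_ms(5, 15000), 2))  # hypothetical 15k rpm drive, 5 ms seek
```

This is why a 15,000 rpm drive roughly halves rotational delay relative to a 7200 rpm drive.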
Latency vs. Throughput
Which is more important? Depends on the type of load.
  Sequential access favors throughput: multimedia on a single-user PC.
  Random access favors latency: most servers.
How to improve performance:
  Faster disks (15,000 rpm vs. 7200 rpm).
  Caching (disk, controller, server OS, client OS).
  More spindles (disks).
  More disk controllers.
Disk Performance: hdparm
# hdparm -tT /dev/sda

/dev/sda:
 Timing cached reads:        1954 MB in 2.00 seconds = 977.02 MB/sec
 Timing buffered disk reads:  268 MB in 3.02 seconds =  88.66 MB/sec
Reliability
MTBF: average time between failures (>100,000 hours).
Real failure curves:
  Early phase: high failure rate from defects.
  Constant failure rate phase: MTBF valid.
  Wearout phase: high failure rate from wear.
Failures are more likely on traumatic events, such as power on/off.
Systems often wear out before MTBF.
Partitions and the MBR
The MBR describes up to 4 primary partitions.
One can be used as an extended partition, which links to an Extended Boot Record (EBR) on the first sector of that partition.
Each logical partition is described by its own EBR, which links to the next EBR.
Extended Partitions and EBRs
There is only one extended partition.
– It is one of the primary partitions.
– It contains one or more logical partitions.
– It should contain all disk space not used by the other primary partitions.
EBRs contain two entries.
– The first entry describes a logical partition.
– The second entry points to the next EBR if there are more logical partitions after the current one.
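Each partition table entry in an MBR or EBR is a fixed 16-byte record; a minimal decoding sketch under that layout (the example bytes below are hypothetical, made up for illustration):

```python
import struct

# Sketch: decode one 16-byte MBR/EBR partition entry.
# Layout: offset 0 = status, 4 = type, 8-11 = starting LBA, 12-15 = sector count
# (little-endian); bytes 1-3 and 5-7 are legacy CHS fields, ignored here.

def parse_entry(raw):
    status, p_type = raw[0], raw[4]
    lba_start, num_sectors = struct.unpack_from("<II", raw, 8)
    return {"bootable": status == 0x80, "type": p_type,
            "lba_start": lba_start, "sectors": num_sectors}

# Hypothetical entry: bootable, type 0x83 (Linux), at LBA 2048, 1 GiB of 512-byte sectors.
entry = bytes([0x80, 0, 0, 0, 0x83, 0, 0, 0]) + struct.pack("<II", 2048, 2097152)
print(parse_entry(entry))
```

An EBR's second entry uses the same record format, with its starting LBA pointing at the next EBR in the chain.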
Why Partition?
1. Separate the OS from user files, to allow user backups and OS upgrades w/o problems.
2. Have a faster swap area for virtual memory.
3. Improve performance by keeping filesystem tables small and keeping frequently used files close together on the disk.
4. Limit the effect of disk-full issues, often caused by log or cache files.
5. Multi-boot systems with multiple OSes.
RAID
Redundant Array of Independent Disks: combine physical disks into a single logical unit.
Can be implemented in hardware or software.
Hardware RAID controllers:
  Caching.
  Automate rebuilding of arrays.
Advantages: capacity, reliability, throughput.
RAID Levels

Level   Min disks  Description
JBOD    2          Merge disks for capacity, no striping.
RAID 0  2          Striped for performance + capacity.
RAID 1  2          Mirrored for fault tolerance.
RAID 3  3          Striped set with dedicated parity disk.
RAID 4  3          Block instead of byte level striping.
RAID 5  3          Striped set with distributed parity.
RAID 6  4          Striped set with dual distributed parity.
Striping
• Distribute data across multiple disks.
• Improve I/O by accessing disks in parallel.
  – Independent requests can be serviced in parallel by separate disks.
  – Single multi-block requests can be serviced by multiple disks.
• Performance vs. reliability
  – Performance increases with # of disks.
  – Reliability decreases with # of disks.
Parity
Store an extra bit with each chunk of data.

7-bit data  even parity  odd parity
0000000     00000000     10000000
1011011     11011011     01011011
1100110     01100110     11100110
1111111     11111111     01111111

Even parity: add 0 if the # of 1s is even; add 1 if the # of 1s is odd.
Odd parity: add 0 if the # of 1s is odd; add 1 if the # of 1s is even.
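A sketch of that rule, reproducing the table's rows (the parity bit is prepended, as in the table):

```python
# Sketch: prepend a parity bit to a string of data bits so the total
# count of 1s comes out even (even parity) or odd (odd parity).

def add_parity(bits, even=True):
    ones = bits.count("1")
    if even:
        p = "0" if ones % 2 == 0 else "1"
    else:
        p = "1" if ones % 2 == 0 else "0"
    return p + bits

print(add_parity("1011011"))              # even parity -> 11011011
print(add_parity("1011011", even=False))  # odd parity  -> 01011011
```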
Error Detection with Parity
Even parity: every byte must have an even # of 1s.
What if you read a byte with an odd # of 1s?
– It’s an error.
– An odd # of bits were flipped.
What if you read a byte with an even # of 1s?
– It may be correct.
– It may be an error where an even # of bits are bad.
Error Correction
XOR the data blocks together to get the parity block.
XOR the surviving blocks with the parity block to retrieve the missing block from the bad drive.
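The two steps above can be sketched in a few lines (the block contents are made-up two-byte examples):

```python
from functools import reduce

# Sketch: the parity block is the XOR of all data blocks; a lost block
# is the XOR of the surviving data blocks with the parity block.

def xor_blocks(blocks):
    return bytes(reduce(lambda a, b: [x ^ y for x, y in zip(a, b)], blocks))

data = [b"\x0f\x0f", b"\xf0\x01", b"\x33\x44"]
parity = xor_blocks(data)

# Drive holding data[1] dies: XOR the survivors with parity to rebuild it.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])  # True
```

This works because XOR is its own inverse: XORing everything except the missing block cancels out all the known blocks, leaving the missing one.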
RAID 0: Striping, no Parity
Performance: throughput = n * disk speed.
Reliability: lower. If one disk is lost, the entire set is lost. MTBF = (avg MTBF) / # of disks.
Capacity: n * disk size.
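The RAID 0 formulas above are simple enough to check in a few lines (the disk size, speed, and MTBF values are illustrative, not from the slides):

```python
# Sketch of the RAID 0 figures above: capacity and throughput scale with n,
# while set MTBF drops as (average disk MTBF) / n.

def raid0(n, disk_gb, disk_mbps, disk_mtbf_h):
    return {"capacity_gb": n * disk_gb,
            "throughput_mbps": n * disk_mbps,
            "mtbf_h": disk_mtbf_h / n}

print(raid0(4, 500, 80, 100_000))
# {'capacity_gb': 2000, 'throughput_mbps': 320, 'mtbf_h': 25000.0}
```

Four striped disks quadruple capacity and throughput but quarter the expected time to first failure.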
RAID 1: Disk Mirroring
Performance
– Reads are faster, since a read operation returns after the first read completes.
– Writes are slower, because a write operation returns only after the second write completes.
Reliability
– System continues to work after one disk dies.
– Doesn’t protect against disk or controller failure that corrupts data instead of killing the disk.
– Doesn’t protect against human or software error.
Capacity
– (n/2) * disk size
RAID 3: Striping + Dedicated Parity
Reliability: survives failure of any 1 disk.
Performance
– Striping increases performance, but the parity disk must be accessed on every write.
– Parity calculation decreases write performance.
– Good for sequential reads (large graphics + video files).
Capacity: (n-1) * disk size
RAID 4: Stripe + Block Parity Disk
• Identical to RAID 3 except it uses block striping instead of byte striping.
RAID 5: Stripe + Distributed Parity
Reliability: survives failure of any 1 disk.
Performance
– Fast reads (like RAID 0), but slow writes.
– Like RAID 4 but without the bottleneck of a single parity disk.
– Still have to read blocks + write the parity block when altering any data blocks.
Capacity: (n-1) * disk size
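The read-modify-write the last bullet describes can be sketched directly: the new parity is old data XOR old parity XOR new data, so a small write needs only two reads and two writes rather than touching the whole stripe (the stripe contents here are made up):

```python
# Sketch: RAID 5 small-write parity update.
# new parity = old data XOR old parity XOR new data

def new_parity(old_data, old_parity, new_data):
    return bytes(d ^ p ^ n for d, p, n in zip(old_data, old_parity, new_data))

old_data, other = b"\x12\x34", b"\xab\xcd"   # two data blocks in a stripe
parity = bytes(a ^ b for a, b in zip(old_data, other))

new_data = b"\xff\x00"
updated = new_parity(old_data, parity, new_data)

# Same result as recomputing parity from scratch over the new stripe.
print(updated == bytes(a ^ b for a, b in zip(new_data, other)))  # True
```

XORing the old data into the old parity cancels its contribution, so the untouched blocks in the stripe never need to be read.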
RAID 6: Striped w/ Dual Parity
• Like RAID 5 but with two parity blocks.
• Can survive failure of two drives at once.
Nested RAID Levels
Many RAID systems can use both physical drives and RAID sets as RAID volumes:
– Allows admins to combine advantages of levels.
– Nested levels are named by the combination of levels, e.g., RAID 01 or RAID 0+1.
RAID 01 (0+1)
Mirror of stripes.
– If a disk fails in one RAID 0 array, it can be tolerated by using the other RAID 0 array.
– Cannot tolerate 2 disk failures unless both are from the same stripe.
RAID 10 (1+0)
Stripe of mirrors.
– Can tolerate failure of all but one drive in each RAID 1 set.
– Uses more disk space than RAID 5 but provides higher performance.
– Highest performance, at the highest cost.
RAID 51 (5+1)
Mirror of RAID 5s.
Capacity = (n/2 - 1) * disk size.
Min disks: 6.
RAID Failures
RAID sets work after a single disk failure:
  Except RAID 0.
  Operate in degraded mode.
RAID set rebuilds after the bad disk is replaced:
  Can take hours to rebuild parity/mirror data.
  Some hardware allows hot swapping, so the server doesn’t have to be rebooted to replace a disk.
  Some hardware supports a hot spare disk that is used immediately on disk failure for rebuild.
You still need backups
Human and software errors
– RAID won’t protect you from rm -rf / or copying over the wrong file.
System crash
– Crashes can interrupt write operations, leading to situations where data is updated but parity is not.
Correlated disk failures
– Accidents (power failures, dropping the machine) can impact all disks at once.
– Disks bought at the same time often fail at the same time.
Hardware data corruption
– If a disk controller writes bad data, all disks will have the bad data.
Logical Volumes
What are logical volumes?
  Appear to the user as a physical volume.
  But can span multiple partitions and/or disks.
Why logical volumes?
  Aggregate disks for performance/reliability.
  Grow and shrink logical volumes on the fly.
  Move logical volumes btw physical devices.
  Replace volumes w/o interrupting service.
LVM
LVM Components
Volume group (VG): a set of physical volumes (partitions or disks).
  May be divided into logical volumes (LVs).
LVs are made up of fixed-size logical extents (LEs).
  Each LE is 4 MB by default.
  Physical extents are the same size.
Mapping Modes
Linear mapping: LVs assigned to contiguous areas of PV space.
Striped mapping: LEs interleaved across PVs to improve performance.
Setting up a VG and LV
1. Initialize physical volumes:
   pvcreate /dev/hda1
   pvcreate /dev/hdb1
2. Initialize a volume group:
   vgcreate nku_proj /dev/hda1 /dev/hdb1
   Use vgextend to add more PVs later.
3. Create logical volumes:
   lvcreate -n nku1 --size 100G nku_proj
4. Create a filesystem:
   mkfs -v -t ext3 /dev/nku_proj/nku1
Extending a LV
Set an absolute size:
  lvextend -L 120G /dev/nku_proj/nku1
Or grow by a relative size:
  lvextend -L +20G /dev/nku_proj/nku1
Expand the filesystem without unmounting:
  ext2online -v /dev/nku_proj/nku1
Check the size:
  df -k
Adding a Disk
Install the new hardware: verify the disk is recognized by the BIOS.
Boot: verify the device exists in /dev.
Partition:
  fdisk /dev/sdb
Create a filesystem:
  mkfs -v -t ext3 /dev/sdb1
Add an entry to /etc/fstab:
  /dev/sdb1  /proj  ext3  defaults  0 2
Mount it:
  mount -a
Filesystems
ext3fs: the current common Linux filesystem.
  Journaling eliminates the need for regular fscking.
ext2fs: the old Linux non-fragmenting fast filesystem.
inode
• A file consists of an inode + data blocks.
• inodes are static:
  – The inode table has a fixed # of inodes.
  – Size: 128 bytes.
• An inode contains:
  – UID of owner
  – GID of group
  – Permissions
  – Timestamps
  – Reference count
  – Block pointers
When don’t you need a filesystem?
Swap space:
  mkswap -v /dev/sdb1
Server applications that use raw volumes:
  Oracle
  VMware Server
Swap
Can use a swapfile instead of a swap partition:
  dd if=/dev/zero of=/swapfile bs=1024k count=512
  mkswap /swapfile
Enable swap:
  swapon /swapfile
  swapon /dev/sda2
Disable swap:
  swapoff /swapfile
  swapoff /dev/sda2
Check swap resource usage:
  cat /proc/swaps
Mounting
To use a filesystem:
  mount /dev/sda1 /mnt
  df /mnt
Automatic mounting: add an entry in /etc/fstab.
Unmount:
  umount /dev/sda1
Cannot unmount a volume that is in use.
fstab
# /etc/fstab: static file system information.
#
# <file system> <mount point>  <type>  <options> <dump> <pass>
proc            /proc          proc    defaults  0 0
/dev/hdc1       /              ext3    defaults  0 1
/dev/hdc5       /win           vfat    user,rw   0 0
/dev/hdc7       none           swap    sw        0 0
/dev/hdc8       /var           ext3    defaults  0 2
/dev/hdc9       /home          ext3    defaults  0 2
/dev/hda        /media/cdrom0  iso9660 ro,user   0 0
/dev/fd0        /media/floppy0 auto    rw,user   0 0
UUIDs
# /etc/fstab: static file system information.
#
# <file system>                            <mount point> <type> <options> <dump> <pass>
UUID=fbdfebe2-fbde-42c9-963d-12428b642f1d /              ext3   defaults  0 1
UUID=a1858e04-78b9-460b-a6cb-3f1dfe3fa16e /home          ext3   defaults  0 2
UUID=c4f14e27-96cd-420c-9860-4bd5298e3f76 none           swap   sw        0 0

Universally Unique IDentifiers
– 128-bit numbers written as 32 hex digits.
– 3.4 × 10^38 possible UUIDs.
Used to identify devices on Linux:
– To find the UUID of a specific device: vol_id -u /dev/sda1
– All devices: ls -l /dev/disk/by-uuid
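A quick check of the numbers above, using Python's standard uuid module:

```python
import uuid

# A UUID is 128 bits, so there are 2**128 (~3.4e38) possible values.
print(f"{2**128:.2e}")

# A randomly generated UUID: 32 hex digits in the familiar 8-4-4-4-12 grouping.
print(uuid.uuid4())
```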
fsck: check + repair filesystems
Filesystem corruption sources:
  Power failure.
  System crash.
Types of corruption:
  Unreferenced inodes.
  Bad superblocks.
  Unused data blocks not recorded in block maps.
  Data blocks listed as free that are used in files.
fsck can fix these and more:
  Asks the user to make more complex decisions.
  Stores unfixable files in lost+found.
Cost of Storage
Disks are cheap: 1 TB SATA disks cost $100 in late 2008.
Storage is expensive:
  20% of cost is hard disks.
  80% of cost is overhead:
    • Servers
    • Power
    • AC
    • Support
    • Backups
Storage Management
Group-based storage
– Ideally, each group has its own fileserver.
– Group activities can interfere with each other:
  • Capacity (filling disks)
  • Performance
Storage needs assessment
– Ask customers for anticipated storage growth.
– Monitor servers to measure current growth.
Storage SLA
– Availability
– Performance (response time)
– Cost and time to add new storage.
References
1. Aeleen Frisch, Essential System Administration, 3rd edition, O’Reilly, 2002.
2. Charles M. Kozierok, “Reference Guide—Hard Disk Drives,” http://www.pcguide.com/ref/hdd/, 2005.
3. A.J. Lewis, LVM HOWTO, http://www.tldp.org/HOWTO/LVM-HOWTO/index.html, 2005.
4. H. Mauelson and M. O’Keefe, “The Linux Logical Volume Manager,” Red Hat Magazine, http://www.redhat.com/magazine/009jul05/features/lvm2/, July 2005.
5. Evi Nemeth et al., UNIX System Administration Handbook, 3rd edition, Prentice Hall, 2001.
6. Octane, “SCSI Technology Primer,” http://arstechnica.com/paedia/s/scsi-1.html, 2002.
7. Red Hat, RHEL4 System Administration Guide, http://www.redhat.com/docs/manuals/enterprise/RHEL-4-Manual/sysadmin-guide/, 2005.
8. Wikipedia, “RAID,” http://en.wikipedia.org/wiki/RAID.