20 November 2012 Sam Siewert CSE A215 Assembly Language Programming for Engineers Lecture 13 – Storage and I/O (MMIO, Devices, Reliability/Availability, Performance)

Hardware/Software Interface for I/O

Basics and Driver Concept

PCI (Peripheral Component Interconnect) System

[Figure: block diagram of a PCI system. Labeled blocks: CPU, FSB, North Bridge, AGP, Graphics Adapter, SDRAM/DDR, South Bridge, PCI 2.x Bus, Ethernet, Expansion Slots, IDE, Audio, ISA Bus, Super IO, COM-A, COM-B.]

Hardware View of Device Interfaces

Analog I/O
  – DAC analog output: servos, motors, heaters, ...
  – ADC analog input: photodiodes, thermistors, ...

Digital I/O
  – Direct TTL I/O or GPIO
  – Digital Serial (I2C, SPI, ...; chip-to-chip)
  – Bus Interfaces
      Parallel
        – PCI 2.x, PCI-X, SCSI, etc. (32-bit, 64-bit, synchronous parallel transfer)
      Differential Serial
        – USB
        – InfiniBand
        – GigE / 10GbE Ethernet
        – Fibre Channel
        – SAS/SATA

Software View of Drivers

MMIO
  – Device Buffers Decode Memory Bus Addresses (Outside of RAM address space)

Character
  – Register Control/Config, Status, Data
  – Typical of Low-Rate I/O Interfaces (RS232)
  – Linux User Space Buffer Drivers (Direct IO), e.g. SCSI Generic

Block
  – FIFOs, Dual-Port RAM and DMA
  – Typical of High-Rate I/O Interfaces (Network, Storage)
  – Only Interface for 512 Byte LBA/Sector HDDs

Network
  – Driver Stacks
  – OSI 7 Layer Model (Phy, Link, Network, Transport, Session, Presentation, Application)
  – TCP/IP/Ethernet/Cat-6e

Linux Char Driver Design

Application Interface
  – Application Policy
  – Blocking/Non-Blocking
  – Multi-thread access
  – Abstraction

Device Interface
  – SW/HW Interface
  – Immediate Buffering
  – Interrupt Service Routine

[Figure: Application(s) call open/close, read/write, creat, ioctl on the App/Device Interface and receive EAGAIN, Block, Data, or Status. An Input Ring-Buffer and an Output Ring-Buffer sit between the App/Device Interface and the ISR, which services the Hardware Device and issues SemGive.]

Ring-buffer rules shown in the figure:
  – If Output Ring-Buffer Full then {SemTake or EAGAIN} else {Process and Return}
  – If Input Ring-Buffer Empty then {SemTake or EAGAIN} else {Process and Return}
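The input-side rule above can be sketched in user-space Python. This is only a toy model, not kernel driver code: the class name `InputRing` and the methods `isr_put` and `read` are hypothetical, a `threading.Semaphore` stands in for SemGive/SemTake, and dropping data on overrun is an assumed policy.

```python
import errno
import threading
from collections import deque


class InputRing:
    """Toy model of the input ring-buffer between an ISR and read()."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buf = deque()
        self.lock = threading.Lock()
        self.data_ready = threading.Semaphore(0)  # counts unread items

    def isr_put(self, item):
        """Simulated ISR: store one item, then SemGive. Drops data when full."""
        with self.lock:
            if len(self.buf) >= self.capacity:
                return False                      # overrun (assumed policy: drop)
            self.buf.append(item)
        self.data_ready.release()                 # SemGive
        return True

    def read(self, blocking=True):
        """SemTake when blocking; raise EAGAIN when non-blocking and empty."""
        if not self.data_ready.acquire(blocking=blocking):
            raise BlockingIOError(errno.EAGAIN, "input ring-buffer empty")
        with self.lock:
            return self.buf.popleft()
```

A non-blocking caller would catch `BlockingIOError` and retry later, while a blocking caller simply sleeps on the semaphore until the simulated ISR gives it.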

Cached Memory and DMA

Cache Coherency
  – Making sure that cached data and memory are in sync
  – Can become out of sync due to DMAs and Multi-Processor Caches
  – Push Caches Allow for DMA into and out of Cache Directly
  – Cache Snooping by HW may Obviate Need for Invalidate

Drivers Must Ensure Cache Coherency
  – Invalidate Memory Locations on DMA Read Completion
  – Flush Cache Prior to DMA Write Initiation

IO Data Cache Line Alignment
  – Ensure that IO Data is Aligned on Cache Line Boundaries
  – Other Data That Shares Cache Line with IO Data Could Otherwise Be Errantly Invalidated
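The alignment rule comes down to simple address arithmetic, sketched below. The 64-byte line size is an assumption (real hardware varies), and the helper names are hypothetical.

```python
CACHE_LINE = 64  # assumed cache line size in bytes; must be a power of two


def align_down(addr, line=CACHE_LINE):
    """Round an address down to the start of its cache line."""
    return addr & ~(line - 1)


def align_up(addr, line=CACHE_LINE):
    """Round an address up to the next cache line boundary."""
    return (addr + line - 1) & ~(line - 1)


def owns_whole_lines(addr, length, line=CACHE_LINE):
    """True if the buffer starts and ends on line boundaries, so invalidating
    it cannot touch unrelated data sharing a cache line."""
    return addr % line == 0 and length % line == 0


print(align_up(100), align_down(100), owns_whole_lines(128, 512))
```

A buffer for which `owns_whole_lines` is false shares its first or last cache line with neighboring data, which is the case the slide warns about.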

How Reliable and Available are Data Center Systems?

Availability vs. Reliability

Reliability and Recovery

Redundancy
  – Dual String
      Side A, Side B
      Pilot, Co-Pilot
  – Fail-Over
      Fault Detection, Protection, Recovery
  – Backup System
      Independent Design
      e.g. Backup Flight System
  – Cross Strapping of Sides
      Dual String A & B
      3 Components C1, C2, C3
      8 Possible Configurations
      4 Component Switches
      A|B Select Switch

[Figure: sides A and B, each with components C1, C2, C3, cross-strapped through switches SW1, SW2, SW3, SW4.]

Configuration   C1   C2   C3
      1         A    A    A
      2         A    A    B
      3         A    B    A
      4         A    B    B
      5         B    A    A
      6         B    A    B
      7         B    B    A
      8         B    B    B
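The eight configurations are simply every way of picking side A or B for each of the three components, 2^3 = 8. A minimal sketch that enumerates them in the same order as the table (the function name is illustrative only):

```python
from itertools import product

SIDES = "AB"
COMPONENTS = ("C1", "C2", "C3")


def configurations():
    """Return every assignment of a side (A or B) to each component."""
    return list(product(SIDES, repeat=len(COMPONENTS)))


print("Configuration", *COMPONENTS)
for number, sides in enumerate(configurations(), start=1):
    print(number, *sides)
```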

High Availability

Service Up-time is Figure of Merit
  – Number of Times down?
  – How long down?
  – Quick recovery is key

Hot or Warm Spare Equipment Protection
  – Fault Detection and Fail Over Without Service Outage
  – Excess Capacity

E.g.
  – Diverse Routing in a Network
  – Overlapping Coverage in Cell Phone Systems
  – On-orbit spare satellites

Availability vs. Reliability

Are They the Same?
  – Are all Reliable Systems Highly Available?
  – Are all Highly Available Systems Reliable?

Reliability = Long MTTF
  – Mean Time To Failure
  – Mean Time Between Failures, MTBF = MTTF + MTTR
  – FDIR When Failures Do Occur
      Fault Detection, Isolation and Recovery
      Safing
      MTTR (Mean Time to Recover)

Availability = MTTF / (MTTF + MTTR) = % Uptime
  – MTTF = 8,766 hours (525,960 minutes)
  – MTTR = 5 minutes
  – Availability = 525,960 / (525,960 + 5) = 99.999% Uptime
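The availability arithmetic above can be checked with a few lines of Python. The function name is illustrative; the inputs are the slide's own numbers.

```python
def availability(mttf, mttr):
    """Fraction of time in service; both arguments use the same time unit."""
    return mttf / (mttf + mttr)


mttf_minutes = 8766 * 60          # 8,766 hours = 525,960 minutes
mttr_minutes = 5
a = availability(mttf_minutes, mttr_minutes)
print(f"Availability = {a * 100:.3f}% uptime")
```

Because MTTR sits in the denominator, halving the recovery time improves availability about as much as doubling MTTF, which is why quick recovery matters for high availability.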

Storage I/O

Storing Data Long Term

A Single Disk Drive

  – Read and Write 512-byte Sectors at LBA (Logical Block Address)
  – A 2TB 3.5" SATA Disk Drive has about 4 billion 512-byte sectors to manage
  – The Operating System SATA/SCSI Driver and Filesystem Layered on the Block Driver Provide Use of a Disk Drive
  – The Operating System Caches Pages (Typically 4K) that are Written Back (like CPU cache) from RAM to Disk When Needed (See slabtop)
  – Filesystem Manages Access to Sectors
  – Block I/O Can Be Done Directly as Well
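The sector arithmetic on this slide can be sketched as follows. The helper names are hypothetical, and the capacity is assumed to be the decimal 2 x 10^12 bytes that drive vendors use (a binary 2 TiB would give 2^32 sectors instead).

```python
SECTOR_BYTES = 512
PAGE_BYTES = 4096   # typical OS page size, per the slide


def sector_count(capacity_bytes):
    """Number of whole 512-byte sectors on a drive of the given capacity."""
    return capacity_bytes // SECTOR_BYTES


def lba_to_byte_offset(lba):
    """Byte offset on the device where a given LBA starts."""
    return lba * SECTOR_BYTES


def lbas_for_page(page_index):
    """The run of LBAs backing one 4K page of a raw block device."""
    per_page = PAGE_BYTES // SECTOR_BYTES
    first = page_index * per_page
    return range(first, first + per_page)


print(f"{sector_count(2 * 10**12):,} sectors on a 2 TB drive")
print("page 1 covers LBAs", list(lbas_for_page(1)))
```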

RAID-10

RAID-0 Striping Over RAID-1 Mirrors

[Figure: three RAID-1 mirror pairs with RAID-0 striping across them. Blocks A1, A2, A3, … A12 are striped over the mirrors, and each block appears twice, once on each disk of its mirror pair.]

RAID5,6 XOR Parity Encoding

MDS Encoding, Can Achieve High Storage Efficiency with N+1: N/(N+1) and N+2: N/(N+2)

[Figure: Storage Efficiency (0% to 100%) versus Number of Data Devices (1 to 20) for 1 XOR or 2 P,Q Encoded Devices, with one curve for RAID5 and one for RAID6.]
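The two efficiency formulas, N/(N+1) for RAID5 and N/(N+2) for RAID6, can be tabulated with a short sketch. The function name is illustrative only.

```python
def storage_efficiency(data_devices, parity_devices):
    """Usable fraction of raw capacity: N / (N + parity devices)."""
    return data_devices / (data_devices + parity_devices)


print(" N   RAID5   RAID6")
for n in (1, 2, 4, 8, 16, 20):
    raid5 = storage_efficiency(n, 1)   # one XOR parity device
    raid6 = storage_efficiency(n, 2)   # two (P, Q) parity devices
    print(f"{n:2d}  {raid5:6.1%}  {raid6:6.1%}")
```

Both curves approach 100% as N grows, but wider sets also put more devices inside one failure domain, so efficiency is not the only consideration.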

RAID-50

RAID-0 Striping Over RAID-5 Sets

[Figure: two RAID-5 sets of five disks each. Every stripe holds four data blocks plus one parity block whose position rotates across the disks: P(ABCD), P(EFGH), P(IJKL), P(MNOP), P(QRST). Set 1 holds blocks A1 to T1 and set 2 holds blocks A2 to T2.]

Striped block order: A1,B1,C1,D1,A2,B2,C2,D2,E1,F1,G1,H1,…, Q2,R2,S2,T2
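Each parity block in a RAID-5 stripe, such as P(ABCD), is the byte-wise XOR of the stripe's data blocks, so any one lost block can be rebuilt by XORing the survivors with the parity. A minimal sketch with made-up 4-byte blocks (the helper name and block contents are hypothetical):

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: four data blocks A, B, C, D and their parity P(ABCD).
stripe = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40",
          b"\xaa\xbb\xcc\xdd", b"\x0f\x0e\x0d\x0c"]
parity = xor_blocks(stripe)

# Simulate losing block C (index 2) and rebuild it from the rest plus parity.
survivors = stripe[:2] + stripe[3:]
rebuilt = xor_blocks(survivors + [parity])
print(rebuilt == stripe[2])
```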

RAID-60 (Reed-Solomon Encoding)

RAID-0 Striping Over RAID-6 Sets

[Figure: two RAID-6 sets of six disks each (Disk1 to Disk6). Every stripe holds four data blocks plus two parity blocks, P and Q, whose positions rotate across the disks: P(ABCD), Q(ABCD), P(EFGH), Q(EFGH), P(IJKL), Q(IJKL), P(MNOP), Q(MNOP), P(QRST), Q(QRST). Set 1 holds blocks A1 to T1 and set 2 holds blocks A2 to T2.]

Striped block order: A1,B1,C1,D1,A2,B2,C2,D2,E1,F1,G1,H1,…, Q2,R2,S2,T2

I/O Performance

Some Methods to Improve I/O on Linux

Hiding IO Latency – Overlapping with Processing

Simple Design
  – Each Thread has READ, PROCESS, WRITE-BACK Execution

Frame rate is set by READ+PROCESS+WRITE-BACK latency
  – e.g. 10 fps for 100 milliseconds
  – If READ is 70 msec, PROCESS is 10 msec, and WRITE-BACK 20 msec, predominant time is IO time, not processing
  – Disk drive with 100 MB/sec READ rate can only read 16 fps, 62.5 msec READ latency

[Timeline: READ F(1), Process F(1), Write-back F(1), READ F(2), ...]
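The frame-rate figures on this slide follow from simple arithmetic, sketched below. The function names are illustrative. The 6.25 MB frame size is not stated on the slide; it is the size implied by 16 fps at 100 MB/sec and is used here as an assumption.

```python
def serial_fps(read_ms, process_ms, writeback_ms):
    """Frame rate when one thread does READ, PROCESS, WRITE-BACK in sequence."""
    return 1000.0 / (read_ms + process_ms + writeback_ms)


def read_latency_ms(frame_mb, disk_mb_per_s):
    """Time to read one frame at a sustained disk read rate."""
    return 1000.0 * frame_mb / disk_mb_per_s


print(serial_fps(70, 10, 20), "fps with 100 msec per frame")

latency = read_latency_ms(6.25, 100)   # assumed 6.25 MB frame, 100 MB/sec disk
print(latency, "msec per read,", 1000.0 / latency, "fps read-limited")
```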

Hiding IO Latency

Schedule Multiple Overlapping Threads?
  – Requires Nthreads = Nstages x Ncores
  – 1.5 to 2x Number of Threads for SMT (Hyper-threading)
  – For IO Stage Duration Similar to Processing Time
  – More Threads if IO Time (Read+WB+Read) >> 3 x Processing Time

[Timeline, Core #1 (Start-up, then Continuous Processing), three staggered threads:
  READ F1, Process F1, Write-back F1, READ F4, Process F4, Write-back F4
  READ F2, Process F2, Write-back F2, READ F5, Process F5, …
  READ F3, Process F3, Write-back F3, Read F6, …
Core #2 shows the same three-thread timeline.]
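The thread-count rule above can be written as a small helper. This is only a sketch of the slide's rule of thumb; the function name and the `smt_factor` parameter are hypothetical, with the factor standing for the 1.5 to 2x SMT multiplier.

```python
import math


def overlapped_threads(nstages, ncores, smt_factor=1.0):
    """Nthreads = Nstages x Ncores, optionally scaled by 1.5 to 2.0 for SMT."""
    return math.ceil(nstages * ncores * smt_factor)


# Three stages (READ, PROCESS, WRITE-BACK) on a dual-core machine.
print(overlapped_threads(3, 2))        # no SMT
print(overlapped_threads(3, 2, 1.5))   # SMT, low end of the range
print(overlapped_threads(3, 2, 2.0))   # SMT, high end of the range
```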

Hiding Latency – Dedicated IO

  – Schedule Reads Ahead of Processing
  – Requires Nthreads = 2 + Ncores
  – Synchronize Frame Ready/Write-backs
  – Balance Stage Read/Write-Back Latency to Processing
  – 1.5 to 2x Threads for SMT (Hyper-threading)

[Timeline with phases Start-up, Dual-Core Concurrent Processing, Completion:
  Read:        Read F1, Read F2, Read F3, Read F4, Read F5, Read F6, Read F7, Read F8
  Process:     Wait, Process F1, Process F3, Process F5, …
  Process:     Wait, Process F2, Process F4, Process F6
  Write-back:  Wait, …, WB F1, WB F2, WB F3, WB F4, WB F5, WB F6]
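The dedicated-IO arrangement (one read thread, one write-back thread, and one processing thread per core, so Nthreads = 2 + Ncores) can be sketched with Python's `threading` and `queue` modules. All names here are hypothetical, the frames are plain in-memory values rather than disk reads, and CPython's global interpreter lock means this shows the structure rather than real multi-core speedup.

```python
import queue
import threading


def run_pipeline(frames, process, ncores=2, depth=4):
    """One reader, ncores workers, one write-back thread, linked by queues."""
    ready = queue.Queue(maxsize=depth)   # frames read ahead of processing
    done = queue.Queue(maxsize=depth)    # processed frames awaiting write-back
    written = {}                         # stands in for the output device

    def reader():
        for idx, frame in enumerate(frames):
            ready.put((idx, frame))      # blocks once read-ahead is full
        for _ in range(ncores):
            ready.put(None)              # one stop marker per worker

    def worker():
        while True:
            item = ready.get()           # wait for a frame to be ready
            if item is None:
                break
            idx, frame = item
            done.put((idx, process(frame)))
        done.put(None)                   # tell the writer this worker is done

    def writer():
        stops = 0
        while stops < ncores:
            item = done.get()
            if item is None:
                stops += 1
            else:
                written[item[0]] = item[1]

    threads = [threading.Thread(target=reader), threading.Thread(target=writer)]
    threads += [threading.Thread(target=worker) for _ in range(ncores)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [written[i] for i in range(len(written))], len(threads)


results, nthreads = run_pipeline(list(range(8)), lambda f: f * 2)
print(nthreads, results)
```

The bounded queues provide the frame-ready and write-back synchronization: the reader stalls when it gets too far ahead, and the workers stall when write-back falls behind.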

Processing Latency Alone

Write Code with Memory Resident Frames
  – Load Frames in Advance
  – Process In-Memory Frames Over and Over
  – Do No IO During Processing
  – Provides Baseline Measurement of Processing Latency per Frame Alone
  – Provides Method of Optimizing Processing Without IO Latency
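A minimal sketch of such a baseline measurement is shown below. The frames are zero-filled placeholder buffers created in memory, and `invert` is a stand-in transform; both are assumptions for illustration, as is the function name.

```python
import time


def processing_latency_ms(frames, transform, passes=10):
    """Average per-frame transform time over memory-resident frames, no IO."""
    start = time.perf_counter()
    for _ in range(passes):
        for frame in frames:
            transform(frame)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / (passes * len(frames))


def invert(frame):
    """Placeholder transform: invert every byte of the frame."""
    return bytes(255 - b for b in frame)


frames = [bytes(4096) for _ in range(8)]   # loaded in advance, held in memory
print(f"{processing_latency_ms(frames, invert):.3f} msec per frame")
```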

IO Latency Alone

Comment Out Frame Transformation Code or Call Stubbed NULL Function
  – Provides Measurement of IO Frame Rate Alone
  – Essentially Zero Latency Transform
  – No Change Between Input Frames and Output Frames
  – Allows for Tuning of IO Scheduler and Threading
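The stubbed-transform measurement might look like the sketch below. The function names, file paths, and frame size are all hypothetical, and results on a real system will be affected by the OS page cache unless it is bypassed or dropped between runs.

```python
import time


def null_transform(frame):
    """Stub standing in for the real frame transformation."""
    return frame


def io_only_fps(src_path, dst_path, frame_bytes, transform=null_transform):
    """Read, pass through the stub, write back; return (frames, frames/sec)."""
    frames = 0
    start = time.perf_counter()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            frame = src.read(frame_bytes)
            if not frame:                 # end of input
                break
            dst.write(transform(frame))
            frames += 1
    elapsed = time.perf_counter() - start
    rate = frames / elapsed if elapsed > 0 else float("inf")
    return frames, rate
```

Swapping `null_transform` for the real transform afterwards shows how much of the end-to-end frame time is processing rather than IO.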

Tips for IO Scheduling

blockdev --getra /dev/sda
  – Should return 256
  – Means that reads read-ahead up to 128K (256 sectors of 512 bytes)
  – Function calls – read, fread should request as much as possible
  – Check "actual bytes read", re-read as needed in a loop

blockdev --setra 16384 /dev/sda (8MB)

Switch CFQ to Deadline
  – Use "lsscsi" to verify your disk is /dev/sda … substitute block driver interface used for file system if not sda
  – cat /sys/block/sda/queue/scheduler
  – echo deadline > /sys/block/sda/queue/scheduler

Options are noop, cfq, deadline
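The advice to check the actual bytes read and re-read in a loop can be sketched in Python with `os.read`, which, like the C `read` call, may return fewer bytes than requested. The function name `read_fully` is hypothetical.

```python
import os


def read_fully(fd, nbytes):
    """Request as much as possible and loop until nbytes arrive or EOF."""
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = os.read(fd, remaining)   # may return a short read
        if not chunk:                    # empty result means end of file
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

A caller compares `len(result)` with the requested size to tell a complete frame from a truncated one at end of file.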

