Storage: Alternate Futures

Page 1: Storage: Alternate Futures

(Background graphic: the unit-prefix ladder Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta.)

Storage: Alternate Futures

Jim Gray
Microsoft Research
Research.Microsoft.com/~Gray/talks

NetStore ’99
Seattle WA, 14 Oct 1999

Page 2: Storage: Alternate Futures

Acknowledgments: Thank You!!

• Dave Patterson – convinced me that processors are moving to the devices.
• Kim Keeton and Erik Riedel – showed that many useful subtasks can be done by disk processors, and quantified the execution interval.
• Remzi Dusseau – re-validated Amdahl’s laws.

Page 3: Storage: Alternate Futures

Outline

• The Surprise-Free Future (5 years)
  – 500 mips cpus for 10$
  – 1 Gb RAM chips
  – MAD at 50 Gbpsi
  – 10 GBps SANs are ubiquitous
  – 1 GBps WANs are ubiquitous
• Some consequences
  – Absurd (?) consequences.
  – Auto-manage storage
  – Raid10 replaces Raid5
  – Disc-packs
  – Disk is the archive media of choice
• A surprising future?
  – Disks (and other useful things) become supercomputers.
  – Apps run “in the disk”

Page 4: Storage: Alternate Futures

The Surprise-free Storage Future

• 1 Gb RAM chips
• MAD at 50 Gbpsi
• Drives shrink one quantum
• Standard IO
• 10 GBps SANs are ubiquitous
• 1 Gbps WANs are ubiquitous
• 5 bips cpus for 1K$ and 500 mips cpus for 10$

Page 5: Storage: Alternate Futures

1 Gb RAM Chips

• Moving to 256 Mb chips now
• 1 Gb will be “standard” in 5 years; 4 Gb will be a premium product.
• Note:
  – 256 Mb = 32 MB: the smallest memory
  – 1 Gb = 128 MB: the smallest memory

Page 6: Storage: Alternate Futures

MAD at 50 Gbpsi

• MAD: Magnetic Areal Density
  – 3-10 Gbpsi in products
  – 20 Gbpsi in lab
  – 50 Gbpsi = superparamagnetic limit, but… people have ideas.
• Capacity: rise 10x in 5 years (conservative)
• Bandwidth: rise 4x in 5 years (density + rpm)
• Disk: 50 GB to 500 GB
  – 60-80 MBps
  – 1k$/TB
  – 15 minute to 3 hour scan time
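The scan-time range on this slide is capacity divided by sequential bandwidth. Here is a minimal sketch of that arithmetic. It assumes decimal units (1 GB = 10^9 bytes, 1 MBps = 10^6 bytes/s) and a single drive read end to end at a steady rate. `scan_time_seconds` is a hypothetical helper, not something from the talk.

```python
def scan_time_seconds(capacity_gb: float, bandwidth_mbps: float) -> float:
    """Seconds to read a whole drive sequentially: capacity / bandwidth (decimal units)."""
    return (capacity_gb * 1e9) / (bandwidth_mbps * 1e6)


# Ends of the slide's 5-year projection, both at 60 MBps (the low end of 60-80 MBps).
small = scan_time_seconds(50, 60)    # 50 GB drive
large = scan_time_seconds(500, 60)   # 500 GB drive
print(f"{small / 60:.0f} minutes to {large / 3600:.1f} hours")  # 14 minutes to 2.3 hours
```

At 80 MBps both times are a quarter shorter. The slide's "15 minute to 3 hour" is the same estimate, rounded up.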

Page 7: Storage: Alternate Futures

Disk vs Tape

• Disk
  – 47 GB
  – 15 MBps
  – 10 ms seek time
  – 5 ms rotate time
  – 9$/GB for drive, 3$/GB for ctlrs/cabinet
  – 4 TB/rack
• Tape
  – 40 GB
  – 5 MBps
  – 30 sec pick time
  – Many minute seek time
  – 5$/GB for media, 10$/GB for drive+library
  – 10 TB/rack

The price advantage of tape is narrowing, and the performance advantage of disk is growing.

Guesstimates: CERN: 200 TB, 3480 tapes; 2 col = 50 GB; Rack = 1 TB = 20 drives
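Adding each medium's two per-GB costs and dividing capacity by bandwidth makes the closing sentence concrete. This sketch assumes the slide's per-GB costs simply add up; the field names are illustrative. Under that assumption, disk comes to 12 $/GB against 15 $/GB for tape, and reading a full disk takes well under half the time of reading a full tape.

```python
# Per-unit figures from the "Disk vs Tape" slide (1999). Field names are illustrative.
DEVICES = {
    "disk": {"capacity_gb": 47, "mbps": 15, "unit_per_gb": 9, "support_per_gb": 3, "tb_per_rack": 4},
    "tape": {"capacity_gb": 40, "mbps": 5, "unit_per_gb": 5, "support_per_gb": 10, "tb_per_rack": 10},
}


def total_dollars_per_gb(dev: dict) -> float:
    """Unit cost plus its share of controllers/cabinet (disk) or drive+library (tape)."""
    return dev["unit_per_gb"] + dev["support_per_gb"]


def full_read_hours(dev: dict) -> float:
    """Hours to stream one full unit at its rated bandwidth (ignores seek and pick time)."""
    return dev["capacity_gb"] * 1e9 / (dev["mbps"] * 1e6) / 3600


for name, dev in DEVICES.items():
    print(f"{name}: {total_dollars_per_gb(dev)} $/GB, "
          f"{full_read_hours(dev):.2f} h to read one unit, {dev['tb_per_rack']} TB/rack")
```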

Page 8: Storage: Alternate Futures

System On A Chip

• Integrate Processing with memory on one chip
  – chip is 75% memory now
  – 1 MB cache >> 1960 supercomputers
  – 256 Mb memory chip is 32 MB!
  – IRAM, CRAM, PIM, … projects abound
• Integrate Networking with processing on one chip
  – system bus is a kind of network
  – ATM, FiberChannel, Ethernet, … logic on chip
  – Direct IO (no intermediate bus)
• Functionally specialized cards shrink to a chip.

Page 9: Storage: Alternate Futures

500 mips System On A Chip for 10$

• 486 now 7$
• 233 MHz ARM system on a chip for 10$ http://www.cirrus.com/news/products99/news-product14.html
• AMD/Celeron 266 ~ 30$
• In 5 years, today’s leading edge will be
  – System on chip (cpu, cache, mem ctlr, multiple IO)
  – Low cost
  – Low-power
  – Have integrated IO
• High end is 5 BIPS cpus

Page 10: Storage: Alternate Futures

Standard IO in 5 Years

• Probably:
  – Replace PCI with something better (will still need a mezzanine bus standard)
  – Multiple serial links directly from processor
  – Fast (10 GBps/link) for a few meters
  – System Area Networks (SANs) ubiquitous (VIA morphs to SIO?)

Page 11: Storage: Alternate Futures

Ubiquitous 10 GBps SANs in 5 years

• 1 Gbps Ethernet is a reality now.
  – Also FiberChannel, MyriNet, GigaNet, ServerNet, ATM, …
• 10 Gbps x4 WDM deployed now (OC192)
  – 3 Tbps WDM working in lab
• In 5 years, expect 10x; progress is astonishing
• Gilder’s law: Bandwidth grows 3x/year http://www.forbes.com/asap/97/0407/090.htm

(Figure: link speeds of 5 MBps, 20 MBps, 40 MBps, 80 MBps, 120 MBps (1 Gbps), and 1 GBps.)
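Gilder's law compounds. This short sketch compares 3x/year over five years with the slide's deliberately conservative 10x; the `project` helper is hypothetical. Compounding gives 243x, so "expect 10x" leaves a wide margin.

```python
def project(today: float, growth_per_year: float, years: int) -> float:
    """Compound growth: today * growth_per_year ** years."""
    return today * growth_per_year ** years


wdm_gbps = 10 * 4  # "10 Gbps x4 WDM deployed now (OC192)"

print(project(1, 3, 5))          # Gilder's 3x/year over 5 years: 243
print(project(wdm_gbps, 3, 5))   # 40 Gbps compounded the same way
print(project(wdm_gbps, 10, 1))  # the slide's conservative "expect 10x": 400
```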

Page 12: Storage: Alternate Futures

Thin Clients Mean HUGE Servers

• AOL hosting customer pictures
• Hotmail allows 5 MB/user, 50 M users
• Web sites offer electronic vaulting for SOHO.
• IntelliMirror: replicate client state on server
• Terminal server: timesharing returns
• … Many more.
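The Hotmail numbers give a sense of scale. As an upper bound, assume every user fills the 5 MB quota; real usage would be lower. The service then needs a quarter of a petabyte. The helper name is hypothetical.

```python
MB = 10**6
TB = 10**12


def quota_capacity_bytes(quota_bytes: int, users: int) -> int:
    """Upper bound on storage: every user fills the quota."""
    return quota_bytes * users


hotmail = quota_capacity_bytes(5 * MB, 50 * 10**6)
print(hotmail / TB, "TB")  # 250.0 TB
```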

Page 13: Storage: Alternate Futures

Standard Storage Metrics

• Capacity:
  – RAM: MB and $/MB: today at 512 MB and 3$/MB
  – Disk: GB and $/GB: today at 50 GB and 10$/GB
  – Tape: TB and $/TB: today at 50 GB and 12k$/TB (nearline)
• Access time (latency)
  – RAM: 100 ns
  – Disk: 10 ms
  – Tape: 30 second pick, 30 second position
• Transfer rate
  – RAM: 1 GB/s
  – Disk: 15 MB/s (arrays can go to 1 GB/s)
  – Tape: 5 MB/s (striping is problematic, but “works”)

Page 14: Storage: Alternate Futures

New Storage Metrics: Kaps, Maps, SCAN?

• Kaps: How many kilobyte objects served per second
  – The file server, transaction processing metric
  – This is the OLD metric.
• Maps: How many megabyte objects served per second
  – The Multi-Media metric
• SCAN: How long to scan all the data
  – the data mining and utility metric
• And
  – Kaps/$, Maps/$, TBscan/$
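One simple way to tie these metrics to a device's latency and bandwidth rests on two assumptions. First, each access pays the full latency plus the transfer time. Second, a scan reads all units of a terabyte in parallel. Both are assumptions of this sketch, not necessarily how the "For the Record" table below was derived. Under them, the DRAM row and the disk scan time come out close to the table's figures. All function names are hypothetical.

```python
def accesses_per_second(latency_s: float, bandwidth_bps: float, object_bytes: float) -> float:
    """One device serving objects back to back: each access pays latency + transfer."""
    return 1.0 / (latency_s + object_bytes / bandwidth_bps)


def kaps(latency_s: float, bandwidth_bps: float) -> float:
    return accesses_per_second(latency_s, bandwidth_bps, 1e3)   # kilobyte objects


def maps(latency_s: float, bandwidth_bps: float) -> float:
    return accesses_per_second(latency_s, bandwidth_bps, 1e6)   # megabyte objects


def scan_seconds_per_tb(unit_capacity_bytes: float, unit_bandwidth_bps: float) -> float:
    """A TB spread over enough units, all scanned in parallel, takes one unit's scan time."""
    return unit_capacity_bytes / unit_bandwidth_bps


# DRAM: 100 ns latency, 1 GB/s; disk: 9 GB units at 15 MB/s.
print(f"DRAM Kaps ~ {kaps(1e-7, 1e9):.1e}")                      # ~9.1e+05
print(f"disk scan ~ {scan_seconds_per_tb(9e9, 15e6):.0f} s/TB")  # 600 s/TB
```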

Page 15: Storage: Alternate Futures

For the Record (good 1999 devices packaged in system, http://www.tpc.org/results/individual_results/Compaq/compaq.5500.99050701.es.pdf)

                     DRAM      DISK      TAPE robot
Unit capacity (GB)   1         9         40
Unit price ($)       5000      900       20000
$/GB                 3300      12        12
Latency (s)          1.E-7     2.E-3     3.E+1
Bandwidth (MBps)     1000      15        20
Kaps                 9.E+5     6.E+2     3.E-2
Maps                 1.E+3     14.67     3.E-2
Scan time (s/TB)     1         600       24500
$/Kaps               6.E-11    1.E-8     6.E-3
$/Maps               5.E-8     6.E-7     6.E-3
$/TBscan             $0.05     $1        $129

(Slide callout: X 100)

Tape is 1 TB with 4 DLT readers at 5 MBps each.

Page 16: Storage: Alternate Futures

(Figure: the same DRAM, DISK, and TAPE figures plotted on a log scale from 1.E-11 to 1.E+7 for Kaps, Maps, Scan time (s/TB), $/Kaps, $/Maps, and $/TBscan. Tape is 1 TB with 4 DLT readers at 5 MBps each. Source: http://www.tpc.org/results/individual_results/Compaq/compaq.5500.99050701.es.pdf)

Page 17: Storage: Alternate Futures

The Access Time Myth

• The Myth: seek or pick time dominates
• The reality:
  (1) Queuing dominates
  (2) Transfer dominates BLOBs
  (3) Disk seeks often short
• Implication: many cheap servers better than one fast expensive server
  – shorter queues
  – parallel transfer
  – lower cost/access and cost/byte
• This is obvious for disk arrays
• This is even more obvious for tape arrays

(Figure: two access-time bars, one split into Seek, Rotate, and Transfer, the other adding a queuing Wait to Seek, Rotate, and Transfer.)
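The slide does not name a queuing model. The textbook M/M/1 formula is used here purely as an illustrative assumption. It shows how waiting time overtakes seek and rotation as a disk gets busy. The BLOB line shows transfer dominating positioning for a megabyte object. The helper name is hypothetical.

```python
def mm1_response_time(service_s: float, utilization: float) -> float:
    """M/M/1 mean response time (queue wait + service) = S / (1 - utilization)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_s / (1.0 - utilization)


# Small (KB) requests: transfer time neglected, service = seek + rotate.
positioning = 0.010 + 0.005   # 10 ms seek + 5 ms rotate (Disk vs Tape slide)
blob_transfer = 1e6 / 15e6    # 1 MB at 15 MBps: ~67 ms, over 4x the positioning time

for rho in (0.2, 0.5, 0.8):
    r = mm1_response_time(positioning, rho)
    print(f"{rho:.0%} busy: {r * 1e3:.1f} ms total, {(r - positioning) * 1e3:.1f} ms of it queuing")
```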

Page 18: Storage: Alternate Futures

Storage Ratios Changed

• 10x better access time
• 10x more bandwidth
• 4,000x lower media price

(Figures, 1980 to 2000, log scales: "Disk Performance vs Time" plotting seeks per second, bandwidth (MB/s), and capacity (GB); "Disk accesses/second vs Time"; and "Storage Price vs Time" in megabytes per kilo-dollar (MB/k$).)

• DRAM/disk media price ratio changed
  – 1970-1990: 100:1
  – 1990-1995: 10:1
  – 1995-1997: 50:1
  – today: ~0.1 $/MB disk vs 3 $/MB DRAM, 30:1

Page 19: Storage: Alternate Futures

Data on Disk Can Move to RAM in 8 years

(Figure: Storage Price vs Time, megabytes per kilo-dollar (MB/k$), 1980 to 2000, log scale, annotated with the 30:1 DRAM/disk gap and "6 years".)
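The 30:1 gap on this chart becomes a time estimate once you pick a rate of DRAM price decline. The talk gives no rate, so the factors below are hypothetical, as is the helper name. Roughly 1.76x/year closes the gap in about 6 years (the chart's annotation). At 1.5x/year it takes a little over 8 years (the slide title).

```python
import math


def years_to_close(price_ratio: float, yearly_price_drop: float) -> float:
    """Years until the dearer medium falls to the cheaper one's current price,
    assuming its price divides by a constant factor every year."""
    return math.log(price_ratio) / math.log(yearly_price_drop)


ratio = 3.0 / 0.1   # 3 $/MB DRAM vs 0.1 $/MB disk: the 30:1 gap
for drop in (1.5, 1.76, 2.0):   # hypothetical yearly DRAM price-drop factors
    print(f"DRAM {drop}x cheaper per year -> {years_to_close(ratio, drop):.1f} years")
```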

Page 20: Storage: Alternate Futures

Outline

• The Surprise-Free Future (5 years)
  – 500 mips cpus for 10$
  – 1 Gb RAM chips
  – MAD at 50 Gbpsi
  – 10 GBps SANs are ubiquitous
  – 1 GBps WANs are ubiquitous
• Some consequences
  – Absurd (?) consequences.
  – Auto-manage storage
  – Raid10 replaces Raid5
  – Disc-packs
  – Disk is the archive media of choice
• A surprising future?
  – Disks (and other useful things) become supercomputers.
  – Apps run “in the disk”.

Page 21: Storage: Alternate Futures

The (absurd?) consequences

• 256 way NUMA?
• Huge main memories:
  – now: 500 MB - 64 GB memories
  – then: 10 GB - 1 TB memories
• Huge disks:
  – now: 5-50 GB 3.5” disks
  – then: 50-500 GB disks
• Petabyte storage farms
  – (that you can’t back up or restore).
• Disks >> tapes
  – “Small” disks: one platter, one inch, 10 GB

• SAN convergence: 1 GBps point to point is easy

• 1 Gb RAM chips

• MAD at 50 Gbpsi

• Drives shrink one quantum

• 10 GBps SANs are ubiquitous

• 500 mips cpus for 10$

• 5 bips cpus at high end

Page 22: Storage: Alternate Futures

The Absurd? Consequences

• Further segregate processing from storage

• Poor locality

• Much useless data movement

• Amdahl’s laws: bus: 10 B/ips; io: 1 b/ips (10 bytes of bus bandwidth and 1 bit of IO per instruction per second)

(Figure: processors (~1 Tips) linked to RAM memory (~1 TB) at 10 TBps and to disks (~100 TB) at 100 GBps.)
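Applying the two Amdahl ratios to ~1 Tips reproduces the figure's bandwidths. The memory bus needs 10 TBps, and IO needs about 125 GBps, close to the figure's 100 GBps. In this sketch the helper name is hypothetical.

```python
def amdahl_bandwidths(ips: float) -> tuple[float, float]:
    """Bytes/s implied by the slide's ratios: bus 10 B per ips, IO 1 bit per ips."""
    bus_bytes_per_s = 10 * ips
    io_bytes_per_s = ips / 8      # 1 bit = 1/8 byte
    return bus_bytes_per_s, io_bytes_per_s


bus, io = amdahl_bandwidths(1e12)   # ~1 Tips of processors
print(f"memory bus ~{bus / 1e12:.0f} TBps, IO ~{io / 1e9:.0f} GBps")
```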

Page 23: Storage: Alternate Futures

Storage Latency: How Far Away is the Data?

Level                 Latency (relative)   Analogy      Time
Registers             1                    My Head      1 min
On Chip Cache         2                    This Room
On Board Cache        10                   This Hotel   10 min
Memory                100                  Olympia      1.5 hr
Disk                  10^6                 Pluto        2 Years
Tape /Optical Robot   10^9                 Andromeda    2,000 Years
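The analogy scales every latency so that one register access takes one minute. Redoing that arithmetic recovers the slide's times. Memory comes out at about 1.7 hours (the slide says 1.5), disk at about 2 years, and tape at about 1,900 years. The sketch assumes 365-day years; `humanize` is a hypothetical helper.

```python
MINUTES_PER_HOUR = 60
MINUTES_PER_YEAR = 60 * 24 * 365   # ignores leap years

LEVELS = [  # (level, relative latency) from the slide
    ("Registers", 1), ("On Chip Cache", 2), ("On Board Cache", 10),
    ("Memory", 100), ("Disk", 10**6), ("Tape/Optical Robot", 10**9),
]


def humanize(minutes: float) -> str:
    """Render a duration in minutes as minutes, hours, or years."""
    if minutes < MINUTES_PER_HOUR:
        return f"{minutes:g} min"
    if minutes < MINUTES_PER_YEAR:
        return f"{minutes / MINUTES_PER_HOUR:.1f} hr"
    return f"{minutes / MINUTES_PER_YEAR:,.1f} years"


for name, ticks in LEVELS:
    print(f"{name:<20}{humanize(ticks)}")   # one latency unit scaled to one minute
```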

Page 24: Storage: Alternate Futures

Consequences

• AutoManage Storage

• Sixpacks (for arm-limited apps)

• Raid5-> Raid10

• Disk-to-disk backup

• Smart disks

Page 25: Storage: Alternate Futures

Auto Manage Storage

• 1980 rule of thumb:
  – A DataAdmin per 10 GB, SysAdmin per mips
• 2000 rule of thumb:
  – A DataAdmin per 5 TB
  – SysAdmin per 100 clones (varies with app)
• Problem:
  – 5 TB is 60k$ today, 10k$ in a few years.
  – Admin cost >> storage cost???
• Challenge:
  – Automate ALL storage admin tasks
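The 60k$ figure is one administrator's 5 TB priced at roughly the 12 $/GB disk cost in the "For the Record" table. Working backwards, 10k$ implies about 2 $/GB. The talk gives no admin salaries, so this sketch leaves the comparison to admin cost qualitative; the helper name is hypothetical.

```python
def storage_cost(capacity_gb: float, dollars_per_gb: float) -> float:
    return capacity_gb * dollars_per_gb


ADMIN_SPAN_GB = 5_000   # 2000 rule of thumb: one DataAdmin per 5 TB

today = storage_cost(ADMIN_SPAN_GB, 12)   # ~12 $/GB disk, as in the "For the Record" table
implied_price = 10_000 / ADMIN_SPAN_GB    # "10k$ in a few years" implies 2 $/GB
print(today, implied_price)               # 60000 2.0
```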

Page 26: Storage: Alternate Futures

The “Absurd” Disk

• 2.5 hr scan time (poor sequential access)
• 1 aps / 5 GB: one access per second for every 5 GB (VERY cold data)
• It’s a tape!

(Figure: a single 1 TB drive delivering 100 MB/s and 200 Kaps.)

Page 27: Storage: Alternate Futures

Extreme case: 1TB disk: Alternatives

• Use all the heads in parallel
  – Scan in 30 minutes
  – Still one Kaps/5 GB
• Use one platter per arm
  – Share power/sheetmetal
  – Scan in 30 minutes
  – One Kaps per GB

(Figure: the parallel-head drive is 1 TB at 500 MB/s and 200 Kaps; the one-platter-per-arm package holds 200 GB per arm and delivers 500 MB/s and 1,000 Kaps.)
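The three 1 TB designs on the last two slides differ only in bandwidth and access rate. The class below is hypothetical. Its inputs are the figures' numbers, with one platter per arm modeled as five 200 GB arms sharing one package. The single-arm drive takes about 2.8 hours to scan (the slide says 2.5 hr), and the other two take about half an hour. Only the one-platter-per-arm package improves access density, to 1 GB per Kaps.

```python
from dataclasses import dataclass


@dataclass
class DriveConfig:
    name: str
    capacity_gb: float
    bandwidth_mbps: float
    kaps: float   # kilobyte accesses per second

    def scan_minutes(self) -> float:
        return self.capacity_gb * 1e9 / (self.bandwidth_mbps * 1e6) / 60

    def gb_per_kaps(self) -> float:
        """How much data shares each access/second: higher means colder data."""
        return self.capacity_gb / self.kaps


CONFIGS = [
    DriveConfig("one arm (the absurd disk)", 1000, 100, 200),
    DriveConfig("all heads in parallel", 1000, 500, 200),
    DriveConfig("one platter per arm, 5 x 200 GB", 1000, 500, 1000),
]

for c in CONFIGS:
    print(f"{c.name}: scan {c.scan_minutes():.0f} min, {c.gb_per_kaps():g} GB per Kaps")
```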

Page 28: Storage: Alternate Futures

Drives shrink (1.8”, 1”)

• 150 kaps for 500 GB is VERY cold data
• 3 GB/platter today, 30 GB/platter in 5 years.
• Most disks are ½ full
• TPC benchmarks use 9 GB drives (need arms or bandwidth).
• One solution: smaller form factor
  – More arms per GB
  – More arms per rack
  – More arms per Watt

Page 29: Storage: Alternate Futures

Prediction: 6-packs

• One way or another, when disks get huge
  – Will be packaged as multiple arms
  – Parallel heads give bandwidth
  – Independent arms give bandwidth & aps
• Package shares power, package, interfaces…

Page 30: Storage: Alternate Futures

Stripes, Mirrors, Parity (RAID 0, 1, 5)

• RAID 0: Stripes
  – bandwidth
• RAID 1: Mirrors, Shadows, …
  – Fault tolerance
  – Reads faster, writes 2x slower
• RAID 5: Parity
  – Fault tolerance
  – Reads faster
  – Writes 4x or 6x slower.

(Figure: block layouts, one column per disk.
RAID 0, three disks: 0,3,6,.. | 1,4,7,.. | 2,5,8,..
RAID 1, two disks: 0,1,2,.. | 0,1,2,..
RAID 5, three disks: 0,2,P2,.. | 1,P1,4,.. | P0,3,5,..)
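The three layouts in the figure can be generated mechanically. In this sketch the RAID 5 parity rotation rule is inferred from the three stripes drawn on the slide; it is one of several rotations in use. The function names are hypothetical.

```python
def raid0_layout(ndisks: int, nblocks: int) -> list[list[str]]:
    """Striping: block b goes to disk b % ndisks."""
    disks: list[list[str]] = [[] for _ in range(ndisks)]
    for b in range(nblocks):
        disks[b % ndisks].append(str(b))
    return disks


def raid1_layout(ncopies: int, nblocks: int) -> list[list[str]]:
    """Mirroring: every disk holds every block."""
    return [[str(b) for b in range(nblocks)] for _ in range(ncopies)]


def raid5_layout(ndisks: int, nstripes: int) -> list[list[str]]:
    """Rotating parity: stripe s keeps its parity P<s> on disk (ndisks - 1 - s) % ndisks;
    the remaining disks, left to right, get the next data blocks."""
    disks: list[list[str]] = [[] for _ in range(ndisks)]
    block = 0
    for s in range(nstripes):
        parity_disk = (ndisks - 1 - s) % ndisks
        for d in range(ndisks):
            if d == parity_disk:
                disks[d].append(f"P{s}")
            else:
                disks[d].append(str(block))
                block += 1
    return disks


print(raid0_layout(3, 9))   # [['0', '3', '6'], ['1', '4', '7'], ['2', '5', '8']]
print(raid1_layout(2, 3))   # [['0', '1', '2'], ['0', '1', '2']]
print(raid5_layout(3, 3))   # [['0', '2', 'P2'], ['1', 'P1', '4'], ['P0', '3', '5']]
```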

Page 31: Storage: Alternate Futures

RAID 10 (stripes of mirrors) Wins: “wastes space, saves arms”

RAID 5:
• Performance
  – 225 reads/sec
  – 70 writes/sec
  – Write: 4 logical IO, 2 seek + 1.7 rotate
• SAVES SPACE
• Performance degrades on failure

RAID 1:
• Performance
  – 250 reads/sec
  – 100 writes/sec
  – Write: 2 logical IO, 2 seek + 0.7 rotate
• SAVES ARMS
• Performance improves on failure

Page 32: Storage: Alternate Futures

The Storage Rack Today

• 140 arms
• 4 TB
• 24 racks
• Disks = 2.5 GBps IO
• Controllers = 1.2 GBps IO
• Ports 500 MBps IO

(Figure: 24 storage processors, 6+1 in rack.)

Page 33: Storage: Alternate Futures

Storage Rack in 5 years?

• 140 arms
• 50 TB
• 24 racks
• Disks = 2.5 GBps IO
• Controllers = 1.2 GBps IO
• Ports 500 MBps IO
• My suggestion: move the processors into the storage racks.

(Figure: 24 storage processors, 6+1 in rack.)
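The port limit explains the suggestion to move processors into the rack. If every byte must leave the rack through its ports, the 500 MBps port limit sets the scan time. A rack with 12.5x the capacity and the same IO then takes 12.5x longer. This sketch works under that assumption using the slides' numbers; the helper names are hypothetical.

```python
def bottleneck_mbps(disks: float, controllers: float, ports: float) -> float:
    """A full scan that leaves the rack can go no faster than its slowest stage."""
    return min(disks, controllers, ports)


def scan_hours(capacity_tb: float, mbps: float) -> float:
    return capacity_tb * 1e12 / (mbps * 1e6) / 3600


limit = bottleneck_mbps(2500, 1200, 500)   # MBps: disks, controllers, ports
print(f"today:   4 TB in {scan_hours(4, limit):.1f} h")    # 2.2 h
print(f"5 years: 50 TB in {scan_hours(50, limit):.1f} h")  # 27.8 h
```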

Page 34: Storage: Alternate Futures

It’s hard to archive a PetaByte
It takes a LONG time to restore it.

• Store it in two (or more) places online (on disk?).
• Scrub it continuously (look for errors).
• On failure, refresh lost copy from safe copy.
• Can organize the two copies differently (e.g.: one by time, one by space)
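Restore time is capacity divided by restore bandwidth, which shows how long "LONG" is. This sketch assumes the restore runs flat out at a single sustained rate. The 1 GBps case is hypothetical; 10 GBps is the SAN speed forecast earlier in the talk. Even the fast SAN needs more than a day, which is why the slide keeps a second online copy instead.

```python
SECONDS_PER_DAY = 86_400


def restore_days(nbytes: float, bytes_per_s: float) -> float:
    return nbytes / bytes_per_s / SECONDS_PER_DAY


PB = 1e15
for label, rate in [("1 GBps", 1e9), ("10 GBps SAN", 1e10)]:
    print(f"1 PB at {label}: {restore_days(PB, rate):.1f} days")
```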

Page 35: Storage: Alternate Futures

Crazy Disk Ideas

• Disk Farm on a card: surface mount disks
• Disk (magnetic store) on a chip: (micro machines in Silicon)
• Full Apps (e.g. SAP, Exchange/Notes, ..) in the disk controller (a processor with 128 MB dram)

(Figure label: ASIC.)

The Innovator's Dilemma: When New Technologies Cause Great Firms to Fail, Clayton M. Christensen. ISBN: 0875845851

Page 36: Storage: Alternate Futures

The Disk Farm On a Card

• The 500 GB disc card
• An array of discs
• Can be used as
  – 100 discs
  – 1 striped disc
  – 50 Fault Tolerant discs
  – ....etc
• LOTS of accesses/second and bandwidth

(Figure: a 14" card.)

Page 37: Storage: Alternate Futures

Functionally Specialized Cards

• Storage
• Network
• Display

(Figure: a card with a P mips processor, M MB of DRAM, and ASICs.)

Today: P = 50 mips, M = 2 MB
In a few years: P = 200 mips, M = 64 MB

Page 38: Storage: Alternate Futures

It’s Already True of Printers: Peripheral = CyberBrick

• You buy a printer
• You get
  – several network interfaces
  – a Postscript engine
    • cpu
    • memory
    • software
    • a spooler (soon)
  – and… a print engine.

Page 39: Storage: Alternate Futures

Tera Byte Backplane

• TODAY
  – Disk controller is 10 mips risc engine with 2 MB DRAM
  – NIC is similar power
• SOON
  – Will become 100 mips systems with 100 MB DRAM.
• They are nodes in a federation (can run Oracle on NT in disk controller).
• Advantages
  – Uniform programming model
  – Great tools
  – Security
  – Economics (cyberbricks)
  – Move computation to data (minimize traffic)

All Device Controllers will be Cray 1’s

(Figure label: Central Processor & Memory.)

Page 40: Storage: Alternate Futures

With Tera Byte Interconnect and Super Computer Adapters

• Processing is incidental to
  – Networking
  – Storage
  – UI
• Disk Controller/NIC is
  – faster than device
  – close to device
  – can borrow device package & power
• So use idle capacity for computation.
• Run app in device.
• Both Kim Keeton (UCB) and Erik Riedel (CMU) theses investigate this and show benefits of this approach.

(Figure label: Tera Byte Backplane.)
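Here is a toy model of "move computation to data". The same filter, run on the host or in the disk controller, returns the same answer. Only the in-device version keeps non-matching records off the interconnect. The data, the 1% selectivity, and the function names are hypothetical. This is not the design of Keeton's or Riedel's systems.

```python
from typing import Callable


def host_side_scan(records: list[bytes], keep: Callable[[bytes], bool]) -> tuple[int, list[bytes]]:
    """Conventional: ship every record across the interconnect, filter on the host."""
    shipped = sum(len(r) for r in records)
    return shipped, [r for r in records if keep(r)]


def device_side_scan(records: list[bytes], keep: Callable[[bytes], bool]) -> tuple[int, list[bytes]]:
    """Run the app in the device: filter on the controller, ship only the matches."""
    matches = [r for r in records if keep(r)]
    return sum(len(r) for r in matches), matches


def keep(record: bytes) -> bool:
    """Hypothetical query: keep records whose 4-byte key is a multiple of 100 (1%)."""
    return int.from_bytes(record[:4], "big") % 100 == 0


# Hypothetical data: 1,000 records of 100 bytes each.
records = [i.to_bytes(4, "big") + bytes(96) for i in range(1000)]

host_bytes, host_hits = host_side_scan(records, keep)
dev_bytes, dev_hits = device_side_scan(records, keep)
print(host_bytes, dev_bytes, host_hits == dev_hits)   # 100000 1000 True
```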

Page 41: Storage: Alternate Futures

Implications

Conventional:
• Offload device handling to NIC/HBA
• higher level protocols: I2O, NASD, VIA, IP, TCP…
• SMP and Cluster parallelism is important.
(Figure label: Central Processor & Memory.)

Radical (Tera Byte Backplane):
• Move app to NIC/device controller
• higher-higher level protocols: CORBA / COM+.
• Cluster parallelism is VERY important.

Page 42: Storage: Alternate Futures

How Do They Talk to Each Other?

• Each node has an OS
• Each node has local resources: A federation.
• Each node does not completely trust the others.
• Nodes use RPC to talk to each other
  – CORBA? COM+? RMI?
  – One or all of the above.
• Huge leverage in high-level interfaces.
• Same old distributed system story.

(Figure: two nodes joined by a SAN, each with the stack Applications / RPC? / streams, datagrams / SIO.)

Page 43: Storage: Alternate Futures

Outline

• The Surprise-Free Future (5 years)
  – Astonishing hardware progress.
• Some consequences
  – Absurd (?) consequences.
  – Auto-manage storage
  – Raid10 replaces Raid5
  – Disc-packs
  – Disk is the archive media of choice
• A surprising future?
  – Disks (and other useful things) become supercomputers.
  – Apps run “in the disk”

