Harnessing Petabytes of Online Storage Effectively

Harnessing Petabytes of Online Storage Effectively. 2005/09/27. Jun Nitta ([email protected]), Hitachi, Ltd. Outline: 1. Introduction: where are we today? 2. Configuring mass online storage. 3. Defining distribution of intelligence. 4. Miscellaneous topics. 5. Summary: beyond 10 petabytes.
Page 1: Harnessing Petabytes of Online Storage Effectively

Copyright © Hitachi, Ltd. 2005. All rights reserved.

HPTS2005

Harnessing Petabytes of Online Storage Effectively

Jun Nitta ([email protected])

Hitachi, Ltd.

2005/09/27

Page 2: Harnessing Petabytes of Online Storage Effectively

1. Introduction: where are we today?

2. Configuring mass online storage

3. Defining distribution of intelligence

4. Miscellaneous topics

5. Summary: beyond 10 petabytes

Page 3: Harnessing Petabytes of Online Storage Effectively

1. Introduction: where are we today?

Page 4: Harnessing Petabytes of Online Storage Effectively

1-1 Looking into the latest specifications of HDDs…

model   capacity / disks   rotational speed (seek / latency)   interface (sustained data rate)   data buffer
3.5’’   147GB / 5          15,000rpm (3.7ms / 2.0ms)           4Gb/s FC-AL (n/a-93.3MB/s)        16MB          for high performance OLTP
3.5’’   300GB / 5          10,025rpm (4.7ms / 3.0ms)           2Gb/s FC-AL (46.8-89.3MB/s)       16MB          for most other applications
3.5’’   500GB / 5          7,200rpm (8.5ms / 4.2ms)            3Gb/s SATA-II (31-64.8MB/s)       16MB          for large volume archives
2.5’’   100GB / 2          7,200rpm (10ms / 4.2ms)             1.5Gb/s SATA (n/a-n/a)            8MB           for small form factor
1.0’’   8GB / 1            3,600rpm (12ms / 8.3ms)             CE-ATA (5.1-10.0MB/s)             128KB         for portable audio player?

* based on HGST catalogues as of Sep. 2005

Page 5: Harnessing Petabytes of Online Storage Effectively

1-2 … and storage subsystems (RAID controllers)

               enterprise          midrange        workgroup       (unlabeled)
raw capacity   332TB (FC)          88.5TB (SATA)   40.5TB (SATA)   72TB (FC)
HDDs           1152 (5 cabinets)   225             105             240
FC ports       192                 4               4               48
cache          128GB               8GB             4GB             64GB
LUNs           16,384              2,048           512             16,384

* based on HDS catalogues as of Sep. 2005

roughly: 1 rack = 200 disks (3.5”) = 100TB (500GB drive)

Page 6: Harnessing Petabytes of Online Storage Effectively

1-3 Sheer number of HDDs matters practically

HDDs      capacity
O(10^0)   500GB -
O(10^1)   5TB -
O(10^2)   50TB -
O(10^3)   500TB -
O(10^4)   5PB -
O(10^5)   50PB -
O(10^6)   500PB -

practicality, from the smallest scale to the largest: a piece of cake (even possible personally); today’s enterprise mainstream; challenging but still feasible; getting impractical; almost prohibitive
marked on the scale: practical limit for most datacenters
major inhibitor besides $$$, from the smallest scale to the largest: none; storage management; disk failure*; power & cooling

* MTBF of a high-end FC HDD is 10^6 h by catalogue spec. (= 114 yrs; the actual number may vary by an order of magnitude)
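The catalogue MTBF in the footnote can be turned into a rough failure count per year for each scale on this slide. The sketch below is only an illustration: it assumes a constant failure rate and independent failures (neither is stated on the slide), uses the 500GB drive from slide 1-1, and the function names are made up for this example.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 h
DRIVE_GB = 500             # largest drive listed on slide 1-1


def expected_failures_per_year(num_disks, mtbf_hours=1e6):
    """Expected disk failures per year.

    Assumption (not from the slides): every disk fails at a constant
    rate of 1 / MTBF per hour, independently of the others.
    """
    return num_disks * HOURS_PER_YEAR / mtbf_hours


def format_capacity(gigabytes):
    """Render a capacity given in GB as GB, TB, or PB (decimal units)."""
    for unit in ("GB", "TB"):
        if gigabytes < 1000:
            return f"{gigabytes:g}{unit}"
        gigabytes /= 1000
    return f"{gigabytes:g}PB"


if __name__ == "__main__":
    print(f"catalogue MTBF of 10^6 h = {1e6 / HOURS_PER_YEAR:.0f} years")
    for exponent in range(7):
        disks = 10 ** exponent
        capacity = format_capacity(DRIVE_GB * disks)
        failures = expected_failures_per_year(disks)
        print(f"O(10^{exponent}) disks  {capacity:>6}  ~{failures:g} failures/year")
```

Under these assumptions, O(10^4) disks see roughly 90 failures a year, about one every four days. If the real MTBF is an order of magnitude worse, as the footnote allows, that becomes more than two per day.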

Page 7: Harnessing Petabytes of Online Storage Effectively

2. Configuring mass online storage: array of nodes or disks?

Page 8: Harnessing Petabytes of Online Storage Effectively

2-1 Two alternatives to configure online storage

array-of-nodes (stack of self-contained boxes)
[Figure: nodes, each containing CPU, memory, and HDD, connected by a network]
Very cost effective for some kinds of applications
- Secondary data management (especially search)
- Can utilize cheapest components

array-of-disks (separate from servers)
[Figure: server farm (diskless), storage network (FC or IP), storage farm]
Versatile for various mixes of applications
- OLTP, ERP, DWH, email, …
- Cost is steadily going down

Page 9: Harnessing Petabytes of Online Storage Effectively

2-2 Rationale for array-of-disks model

It is reasonable to separate mechanical components
- The HDD is the only mechanical component besides a cooling fan
- It makes it much easier to implement hot-swap mechanisms

It is reasonable to have external storage subsystems
- Disks can be shared among clusters of servers
- Spare disks can be shared within a storage subsystem

[Figure: HDD1 to HDD11 arranged as RAID-5 (4D+1P) group 1, RAID-5 (4D+1P) group 2, and a shared hot-spare disk, shown against ("vs.") an alternative layout]
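To make the shared hot-spare idea concrete, here is a minimal sketch of how one stripe of a RAID-5 (4D+1P) group can be rebuilt onto a spare disk after a single disk failure. It is an illustration only: it handles a single stripe, keeps parity in a fixed position (real RAID-5 rotates parity across disks), and all names are invented for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return one stripe: 4 data blocks followed by 1 parity block (4D+1P)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe, failed_index):
    """Recompute the block of the failed disk from the four survivors.

    XOR of all five blocks is zero, so the missing block equals the
    XOR of the remaining four, whether it held data or parity.
    """
    survivors = [blk for i, blk in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors)


if __name__ == "__main__":
    data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
    stripe = make_stripe(data)
    hot_spare = rebuild(stripe, failed_index=2)  # disk 2 of the group failed
    print(hot_spare == data[2])
```

Because the rebuild needs only the surviving members of the affected group, one spare can stand by for either group, which is the sharing the slide refers to.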

Page 10: Harnessing Petabytes of Online Storage Effectively

2-3 Additional discussion for array-of-disks model

It makes data management easier*
- Various data protection techniques can be employed, including third-party backup and D2D replication
- For the array-of-nodes configuration, replication between nodes is almost the only viable solution for data protection (conventional backup is difficult to employ effectively)

* Actually, backup is one of the most compelling reasons to consolidate scattered storage into an external RAID box

[Figure: backup server, application server, RAID subsystem, tape library]

Page 11: Harnessing Petabytes of Online Storage Effectively

2-4 But does this dichotomy have a meaning?

Nonetheless we need a storage “controller” for array-of-disks
- “Controller” is just another name for a special-purpose server whose restricted operating environment some users prefer
- The two configurations differ essentially in the CPU-to-HDD ratio, which is determined by the intelligence a storage farm requires

basic building block                                  petabytes configuration
general-purpose server with a couple of disks         O(10^3-4) of clustered nodes
special-purpose controller with a lot of disks        O(10^0-2) of clustered subsystems
even an HDD has CPU and memory (device controller)    O(10^3-4) of clustered disks

Which is most promising?
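The orders of magnitude in the table can be checked with a little arithmetic. The sketch below counts building blocks for 1PB and 10PB of raw capacity. It assumes the 500GB drive from slide 1-1, reads "a couple of disks" as 2 disks per node, and takes the 1152-HDD enterprise subsystem from slide 1-2 as the controller case. RAID overhead and spares are ignored, and the function names are illustrative.

```python
import math

DRIVE_GB = 500  # largest drive on slide 1-1


def disks_needed(capacity_pb):
    """Number of drives for a raw capacity given in PB (decimal units)."""
    return math.ceil(capacity_pb * 1_000_000 / DRIVE_GB)


def blocks_needed(capacity_pb, disks_per_block):
    """Number of building blocks when each block holds disks_per_block drives."""
    return math.ceil(disks_needed(capacity_pb) / disks_per_block)


if __name__ == "__main__":
    for capacity_pb in (1, 10):
        print(f"{capacity_pb}PB raw:")
        print("  clustered disks:     ", blocks_needed(capacity_pb, 1))
        print("  nodes (2 disks each):", blocks_needed(capacity_pb, 2))
        print("  subsystems (1152):   ", blocks_needed(capacity_pb, 1152))
```

The same capacity thus needs thousands of nodes or disks but only a handful of large subsystems, which is the CPU-to-HDD ratio difference the slide points at.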

Page 12: Harnessing Petabytes of Online Storage Effectively

3. Defining distribution of intelligence: protocol and interface

Page 13: Harnessing Petabytes of Online Storage Effectively

3-1 Distribution of intelligence among farms

[Figure: server farm (server side intelligence), storage network (FC or IP), storage farm (storage side intelligence)]

3 reasons some functions are better placed on the storage side
- It is naturally implemented using CPU and memory near HDDs
- It requires operations with durable state
- It lets multiple servers share data objects

3 reasons some functions are better placed on the server side
- It is better implemented using CPU and memory near applications
- It requires more powerful and economical CPU / memory
- It handles multiple controllers

Page 14: Harnessing Petabytes of Online Storage Effectively

3-2 Alternative way to place intelligence

Some intelligence could be placed on the network
- But a closer look reveals that most of those “intelligent network components” are not genuine network core components
- Rather they are placed on the boundary between network and server / storage which is not a clear-cut edge but a blurred region

[Figure: server farm, storage network (FC or IP), storage farm; the network core is flanked by two boundary regions, one holding network edge intelligence (server side) and one holding network edge intelligence (storage side). Callout: Is this a part of network or storage farm?]

Page 15: Harnessing Petabytes of Online Storage Effectively

3-3 Placement of functions: an example

Here is an example of an intelligence distribution scheme, assuming an array-of-disks configuration

storage side intelligence
- basic RAID control / LUN management
- remote filesystem
- local replication including snapshots (copy-on-write)
- volume migration transparent to servers

intelligence on both sides
- block aggregation (a.k.a. logical volume management)
- remote replication
- backup
- data encryption

server side intelligence
- local filesystem
- volume migration among multiple controllers
- multi-path management (load balancing & fail over)
- content search / indexing

Page 16: Harnessing Petabytes of Online Storage Effectively

3-4 Which interface & protocol should we adopt?

There are 3 well-established I/O interfaces: block, file, SQL
- None of them is optimal for today’s server/storage farm environment
- Though file may be the most promising for its balanced features
- But I/O interfaces are slow to change (very conservative)
- Thus multi-interface/protocol support is a practical solution

interface: block
- protocol (transport): SCSI-3 (FC or IP)
- strength: low latency; strong standard protocol
- weakness: layers away from application; not network-friendly

interface: file
- protocol (transport): NFS/CIFS-SMB (TCP/IP)
- strength: broad application; strong standard protocol
- weakness: performance and scalability (especially for DBMS)

interface: SQL
- protocol (transport): proprietary (mostly TCP/IP)
- strength: high level enough to encapsulate physical properties
- weakness: limited application; no standard protocol
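As a rough illustration of how far each interface sits from the application, the sketch below stores and reads the same record through a block-style, a file-style, and an SQL-style interface. Everything here is a local stand-in chosen for the example: a preallocated file plays the role of a LUN, the local filesystem plays the role of NFS/CIFS, and SQLite (an embedded library, not a network protocol) plays the role of a DBMS.

```python
import os
import sqlite3
import tempfile

BLOCK_SIZE = 512  # bytes per logical block in this toy example


def block_write(device_path, lba, payload):
    """Block interface: the caller addresses storage by logical block number."""
    with open(device_path, "r+b") as dev:
        dev.seek(lba * BLOCK_SIZE)
        dev.write(payload.ljust(BLOCK_SIZE, b"\0"))


def block_read(device_path, lba):
    """Read one block back and strip the zero padding."""
    with open(device_path, "rb") as dev:
        dev.seek(lba * BLOCK_SIZE)
        return dev.read(BLOCK_SIZE).rstrip(b"\0")


def demo():
    record = b"order#42"
    with tempfile.TemporaryDirectory() as root:
        # Block: the application must remember which block holds the record.
        lun = os.path.join(root, "lun0.img")
        with open(lun, "wb") as f:
            f.truncate(BLOCK_SIZE * 16)  # a 16-block stand-in for a LUN
        block_write(lun, lba=3, payload=record)
        via_block = block_read(lun, lba=3)

        # File: the record is found by name; placement is hidden.
        path = os.path.join(root, "order42.txt")
        with open(path, "wb") as f:
            f.write(record)
        with open(path, "rb") as f:
            via_file = f.read()

        # SQL: the record is found by a declarative query on its key.
        db = sqlite3.connect(os.path.join(root, "orders.db"))
        db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, body BLOB)")
        db.execute("INSERT INTO orders VALUES (?, ?)", (42, record))
        db.commit()
        row = db.execute("SELECT body FROM orders WHERE id = ?", (42,)).fetchone()
        via_sql = row[0]
        db.close()
    return via_block, via_file, via_sql


if __name__ == "__main__":
    print(demo())
```

The block caller carries the address itself, while the file and SQL callers hand over a name or a predicate, which is the "layers away from application" contrast in the comparison above.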

Page 17: Harnessing Petabytes of Online Storage Effectively

4. Miscellaneous topics for managing petabytes of online storage

Page 18: Harnessing Petabytes of Online Storage Effectively

4-1 Virtualization: simply too many mappings

“Virtualization” itself is a powerful technology to hide complexity if used properly
- But the current situation is too confusing

Operating systems and DBMSs should be aware that a storage volume is a logical network resource
- It can even expand and shrink dynamically
- There may be more than 100,000 volumes on the network (most OSs can recognize only up to about 1,000 volumes)

[Figure: a server, a switch, and several controllers. HDDs are combined by RAID/block aggregation into LUs at the controller (controller level virtualization), aggregated again at the switch (switch level virtualization), and again on the server (server level virtualization: HBA/device driver, OS/LVM, DBMS) to form the AP-recognizable volume; each layer exports LUs and the layer above recognizes them]
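The "too many mappings" point can be seen by following a single block address through the three levels named on this slide. The sketch below is a simplification under stated assumptions: every layer is modelled as a plain concatenation of extents (real controllers stripe data across RAID groups), and all volume names and sizes are invented for the example.

```python
class MappingLayer:
    """One virtualization layer: maps blocks of an upper volume to a lower one."""

    def __init__(self, name, extents):
        # extents: {upper_volume: [(length_in_blocks, lower_volume, lower_start), ...]}
        self.name = name
        self.extents = extents

    def translate(self, volume, block):
        """Return (lower_volume, lower_block) for a block of an upper volume."""
        offset = block
        for length, lower_volume, lower_start in self.extents[volume]:
            if offset < length:
                return lower_volume, lower_start + offset
            offset -= length
        raise ValueError(f"{self.name}: block {block} is beyond the end of {volume}")


def example_layers():
    """Server, switch, and controller level mappings (hypothetical layout)."""
    server = MappingLayer("server", {
        "ap_vol": [(100, "switch_lu0", 0), (100, "switch_lu1", 0)],
    })
    switch = MappingLayer("switch", {
        "switch_lu0": [(100, "ctl_a_lu5", 200)],
        "switch_lu1": [(100, "ctl_b_lu2", 0)],
    })
    controller = MappingLayer("controller", {
        "ctl_a_lu5": [(300, "hdd7", 1000)],
        "ctl_b_lu2": [(100, "hdd3", 50)],
    })
    return [server, switch, controller]


def resolve(layers, volume, block):
    """Follow an address down through every layer and record each hop."""
    trace = [(volume, block)]
    for layer in layers:
        volume, block = layer.translate(volume, block)
        trace.append((volume, block))
    return trace


if __name__ == "__main__":
    for hop in resolve(example_layers(), "ap_vol", 150):
        print(hop)
```

Each hop is a table that somebody has to maintain and keep consistent, and with three such levels a single application volume already depends on three independently administered mappings.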

Page 19: Harnessing Petabytes of Online Storage Effectively

4-2 Data protection: disk plays the protagonist

You have to go to disks at least for the first step to make backup workable for > 10TB of data
- Eventually those data may go to tape (D2D2T)

[Figure: typical backup scenario for large amount of data. Components: server1 (AP/DBMS, agent), server2, data protection manager, controller with primary volume and consistent snapshot (copy on write), VTL (MT emulation, RAID, controller)]
1) make quiescent
2) take snapshot
3) resume
4) mount
5) backup to VTL or replicate to disks
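The five steps of this backup scenario can be walked through with a toy copy-on-write snapshot. The sketch below is a single-process illustration, not how any particular controller implements snapshots: the `Volume` class, its one-snapshot limit, and the in-memory block list are all assumptions made for the example.

```python
class Volume:
    """A toy block volume that supports one copy-on-write snapshot."""

    def __init__(self, num_blocks):
        self.blocks = [b""] * num_blocks
        self.quiesced = False
        self._preserved = None  # old contents of blocks overwritten since the snapshot

    def write(self, index, data):
        if self.quiesced:
            raise RuntimeError("volume is quiesced; writes are held")
        if self._preserved is not None and index not in self._preserved:
            # Copy on write: save the old block only on its first overwrite.
            self._preserved[index] = self.blocks[index]
        self.blocks[index] = data

    def take_snapshot(self):
        """Start a snapshot; nothing is copied until a block is overwritten."""
        self._preserved = {}

    def read_snapshot(self, index):
        """Read a block as it was when the snapshot was taken."""
        if self._preserved is None:
            raise RuntimeError("no snapshot has been taken")
        return self._preserved.get(index, self.blocks[index])


def demo():
    volume = Volume(4)
    for i in range(4):
        volume.write(i, f"v1-{i}".encode())

    volume.quiesced = True    # 1) make quiescent
    volume.take_snapshot()    # 2) take snapshot
    volume.quiesced = False   # 3) resume

    volume.write(1, b"v2-1")  # the application keeps writing during the backup

    # 4) mount the snapshot and 5) copy it to the backup target
    backup_copy = [volume.read_snapshot(i) for i in range(4)]
    return backup_copy, volume.blocks


if __name__ == "__main__":
    backup_copy, live = demo()
    print("backup:", backup_copy)
    print("live:  ", live)
```

The application is paused only for steps 1 to 3, and the long-running copy in step 5 reads a frozen image, which is why going to disk first makes large backups workable.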

Page 20: Harnessing Petabytes of Online Storage Effectively

4-3 Data migration: latent cost of online storage

Since data always outlives its container, you will have to migrate data from one subsystem to another several times
- Non-disruptiveness to the upper layers is desirable, which requires some form of address mapping
- Durable address mapping for storage is not well standardized at either the block or the file level (cf. URL -[DNS]-> IP address -> MAC address)

[Figure: server1 and server2 (AP/DBMS), switches, an old controller and a new controller with their HDDs; each level holds its own mapping (storage level mapping, switch level mapping, server level mapping). Labels: invariant, data movement, path (more flexible), SCSI LUN (less flexible), scattered & long, localized & short]
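The role of a durable address mapping in non-disruptive migration can be sketched with a small name-to-location directory, in the spirit of the DNS analogy on this slide. This is an illustration only: `VolumeDirectory`, the controller names, and the dictionary used as storage are hypothetical, and writes arriving during the copy are not handled.

```python
class VolumeDirectory:
    """Maps a durable volume name to its current (controller, LUN) location."""

    def __init__(self):
        self._locations = {}

    def register(self, name, controller, lun):
        self._locations[name] = (controller, lun)

    def resolve(self, name):
        return self._locations[name]

    def migrate(self, name, new_controller, new_lun, copy_data):
        """Copy the data, then switch the mapping; the name never changes."""
        old_location = self._locations[name]
        new_location = (new_controller, new_lun)
        copy_data(old_location, new_location)
        self._locations[name] = new_location
        return old_location


def demo():
    storage = {("old_controller", 5): b"payroll records"}  # stand-in for the disks
    directory = VolumeDirectory()
    directory.register("payroll", "old_controller", 5)

    def read(name):
        # The application only ever uses the durable name.
        return storage[directory.resolve(name)]

    def copy_data(source, target):
        storage[target] = storage[source]

    before = read("payroll")
    retired = directory.migrate("payroll", "new_controller", 0, copy_data)
    del storage[retired]  # the old subsystem can now be removed
    after = read("payroll")
    return before, after, directory.resolve("payroll")


if __name__ == "__main__":
    print(demo())
```

Whichever level holds this directory (server, switch, or storage) is the level that has to stay invariant across the migration.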

Page 21: Harnessing Petabytes of Online Storage Effectively

4-4 Security: as always, it matters

And of course there are a lot of security concerns storage subsystems have to take care of
- Data-at-rest protection is much more challenging than data-in-flight protection because of long-term key management

[Figure: application server, management server, storage administrator, storage subsystem, primary site, secondary site]

[management port security]
- user authentication
- access control
- data-in-flight protection

[data port security]
- device authentication
- access control
- data-in-flight protection

[other subsystem security]
- data-at-rest protection
- audit logging

Page 22: Harnessing Petabytes of Online Storage Effectively

4-5 Storage resource management: spreadsheet?

Even basic discovery-and-reporting is still a pain in the neck for most administrators
- The most widely used management tool today is a spreadsheet
- But can administrators continue using it in a PB environment?
- The SNIA SMI-S standard seems good because of its set-oriented query capability (SNMP has already broken down for storage management)
- Yet most commercial tools are not proven at PB scale

Page 23: Harnessing Petabytes of Online Storage Effectively

4-6 Applications: will they use DBMS?

What kind of applications will use petabytes of online storage?
- email/IM, voice, video archive, …
- stream data from sensor networks (including RFID)
- geoscience, bioscience, medical, …

How will those data be managed?
- Most bulk data may not be stored in RDBMSs but in filesystems (with global name space)
- XML native stores may engulf a lot of data (structured and semi-structured) once well established

[Figure: today’s typical PB system. Components: application server (front-end application), contents server (contents manager, DBMS), metadata DB on HDDs, file server (HSM, cache), staging disk on HDDs, MT library. Size labels: O(10GB-100GB); O(1PB), e.g. 100MB * 10^7 files]
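The size labels in the figure can be checked with a few lines of arithmetic. The sketch below confirms that 10^7 files of 100MB each come to 1PB, and derives what the O(10GB-100GB) label would mean per file if it refers to the metadata DB. That pairing is an assumption read off the figure labels, not something the transcript states.

```python
MB = 10**6
GB = 10**9
PB = 10**15

FILE_COUNT = 10**7
FILE_SIZE = 100 * MB


def bulk_bytes():
    """Total size of the bulk data: 100MB * 10^7 files."""
    return FILE_SIZE * FILE_COUNT


def metadata_bytes_per_file(metadata_total):
    """Metadata budget per file if metadata_total covers every file."""
    return metadata_total // FILE_COUNT


if __name__ == "__main__":
    print("bulk data in PB:", bulk_bytes() / PB)
    for total in (10 * GB, 100 * GB):
        print(f"{total // GB}GB of metadata ->",
              metadata_bytes_per_file(total), "bytes per file")
```

On that reading the metadata is about five orders of magnitude smaller than the data it describes, small enough for a DBMS even when the bulk data itself lives in a filesystem behind HSM.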

Page 24: Harnessing Petabytes of Online Storage Effectively

5. Summary: beyond 10 petabytes

Page 25: Harnessing Petabytes of Online Storage Effectively

5-1 Beyond 10 petabytes of data

Continuing capacity growth of HDDs will bring >10PB of online storage within the reach of most IT organizations in 5 years
- HDDs with perpendicular magnetic recording technology are emerging
- The declining $/GB trend shows no sign of ending

Server farm – network – storage farm configuration will continue to dominate enterprise data centers
- It is the most cost effective and flexible way to configure online storage for a variety of applications

Protocols and interfaces between server and storage should evolve to be more network-conscious
- But the old guard will not die in the foreseeable future

XML data stores may come to play a significant role in addition to filesystems and RDBMSs
- Who knows!


