A transparently scalable metadata service for the Ursa Minor storage system
Shafeeq Sinnamohideen, Raja Sambasivan, James Hendricks, Likun Liu, Gregory R. Ganger
PARALLEL DATA LABORATORY, Carnegie Mellon University



http://www.pdl.cmu.edu/ 2 Shafeeq Sinnamohideen © June 10

Ursa Minor
• Prototype of a Self-* storage system [FAST05]
• Direct-access system model
  • Data path for bulk data
  • Metadata path for attributes
  • Similar to NASD, Panasas, PVFS, Lustre, etc.
• Research questions
  • How to automate management?
  • How to build a versatile system?
• This talk: one hard problem with a simple solution


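Ursa Minor Overview

[Figure: a Client, a Metadata Server, and Object-based Storage Devices. Metadata path: the client presents /foo and obtains an Object-ID. Data path: the client accesses data on the Object-based Storage Devices. Metadata held by the Metadata Server: Object-ID, file attributes, storage node list.]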


Desired properties
• Scalability
  • Adding servers increases capacity
  • Ideally the increase is proportional
• Transparency
  • Users don’t care which server is used
  • Always provide consistent semantics
• Atomic operations are a useful building block
  • Standard compliance
  • Difficult for programmers to do without


Maintaining semantics
Easy for the data path:
• Operations affect a single file
• Only one server involved in each op

Some metadata ops can affect two items:
• Renaming a file to a different directory
• Parent & child
• Could involve two servers


Handling multi-server ops
1. Only allow single-server ops
  • e.g.: AFS, NFS, OnTAP GX
  • Volume abstraction -> limited transparency
2. Use a distributed transaction protocol
  • e.g.: Farsite
  • Complex to implement
3. Use distributed locking & shared state
  • e.g.: GPFS
  • Push complexity into lock manager


Our approach to multi-server ops
• Use the simplest possible solution
• System can already:
  • Perform single-server atomic operations
  • Migrate items for load balancing

Reuse these features to support multi-server ops


The idea
When a request needs multiple files:
• Migrate file’s metadata to one server
• Execute the single-server code path
• Fix any load imbalance
  • Return metadata to original server
  • Move other files


Core tradeoff
• Gain simplicity through reuse
  • Unmodified single-server execution
  • Unmodified migration path
• Lose some performance
  • Migration latency added to op latency
• Expect this to be a worthwhile tradeoff


What do we expect?
Traces of large file systems show that:
• Multi-object ops are a tiny fraction
• Most multi-object ops are parent-child
  • CREATE, DELETE
  • Parent & child on same server for locality
• Other multi-object ops extremely rare
  • RENAME: 0.005% involve 2 dirs
  • LINK: 0.120% possible (0.005% actual)
  • Most of these will be close in directory tree
• Rare case doesn’t have to be fast
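
A rough way to see why the rare case need not be fast: if only a small fraction of operations pay a migration penalty, the mean latency barely moves. The sketch below is a back-of-the-envelope model, not a measurement. The base and migration latencies are hypothetical placeholders, and 0.005% is used only as an illustrative fraction borrowed from the RENAME figure above. The model also ignores side effects on other operations.

```python
def expected_latency_ms(base_ms: float, migration_ms: float,
                        multi_server_fraction: float) -> float:
    """Mean op latency when only a fraction of ops add a migration penalty."""
    if not 0.0 <= multi_server_fraction <= 1.0:
        raise ValueError("multi_server_fraction must be between 0 and 1")
    return base_ms + multi_server_fraction * migration_ms


# Hypothetical latencies: 1 ms per op, 100 ms per migration (assumptions).
BASE_MS = 1.0
MIGRATION_MS = 100.0

# 0.005% expressed as a fraction (illustrative rate taken from the slide).
rare = expected_latency_ms(BASE_MS, MIGRATION_MS, 0.00005)

# A much heavier hypothetical workload where 1% of ops are multi-server.
heavy = expected_latency_ms(BASE_MS, MIGRATION_MS, 0.01)

print(f"mean latency at 0.005%: {rare:.3f} ms")
print(f"mean latency at 1%:     {heavy:.3f} ms")
```

Under these assumed numbers the mean rises by well under one percent at the trace-derived rate, which is the intuition behind accepting a slow path for multi-server operations.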


Metadata distribution
• Distributed key-value store for “inodes”
  • Key: Object-ID
  • Value: object metadata (attributes & layout)
• Distribute by Object-ID
  • Partition into ranges
  • Assign each range to a server

[Figure: the Object-ID space partitioned into the ranges 0000-0999, 1000-1999, 2000-2999, 3000-3999, 4000-4999, 5000-7499, and 7500-9999, each assigned to Metadata Server 1 or Metadata Server 2.]
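
Range partitioning of this kind can be sketched as a sorted list of range starts plus a binary search. The range boundaries below come from the figure; the assignment of ranges to servers, the server names, and the function names are assumptions made for illustration, since the slides do not say which server holds which range.

```python
from bisect import bisect_right

# Inclusive range starts, taken from the figure (0000-0999, 1000-1999, ...).
RANGE_STARTS = [0, 1000, 2000, 3000, 4000, 5000, 7500]
MAX_OBJECT_ID = 9999

# Hypothetical assignment of each range to a metadata server.
RANGE_OWNER = ["mds1", "mds1", "mds1", "mds2", "mds2", "mds2", "mds2"]


def range_index(object_id: int) -> int:
    """Return the index of the range that contains object_id."""
    if not 0 <= object_id <= MAX_OBJECT_ID:
        raise ValueError(f"Object-ID {object_id} is outside the ID space")
    # bisect_right counts the range starts <= object_id; the last such
    # start begins the range that holds the ID.
    return bisect_right(RANGE_STARTS, object_id) - 1


def server_for(object_id: int) -> str:
    """Return the (hypothetical) server that holds object_id's range."""
    return RANGE_OWNER[range_index(object_id)]


print(server_for(2500), server_for(7499))
```

Because lookups depend only on the table of range starts, moving a range between servers changes one entry in the owner list and leaves every Object-ID untouched.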


Metadata distribution
• Delegation coordinator assigns ranges
• Range is unit of migration
• Metadata persistently stored in data path

[Figure: the same Object-ID ranges (0000-0999 through 7500-9999) held by Metadata Server 1 and Metadata Server 2, with a Delegation Coordinator.]


Multi-server operations
• When a metadata server needs a range:
  1. Borrow it from the server that has it
  2. Perform the operation
  3. Return it to the original server

[Figure: Metadata Server 1, Metadata Server 2, and the Delegation Coordinator; range 2000-2999 is shown migrating between the servers.]
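
The borrow, perform, return sequence can be sketched in a few lines. This is a toy in-memory model under stated assumptions, not the Ursa Minor implementation: the class and function names are hypothetical, migration is reduced to updating an ownership table, and the choice to execute on the server that holds the first object is an arbitrary simplification.

```python
from bisect import bisect_right


class DelegationCoordinator:
    """Toy model: tracks which server currently holds each Object-ID range."""

    def __init__(self, range_starts, owners):
        self.range_starts = list(range_starts)
        self.owners = list(owners)
        self.migrations = 0  # counts range moves, for illustration only

    def range_of(self, object_id):
        return bisect_right(self.range_starts, object_id) - 1

    def owner_of(self, object_id):
        return self.owners[self.range_of(object_id)]

    def migrate(self, range_idx, new_owner):
        self.owners[range_idx] = new_owner
        self.migrations += 1


def run_multi_object_op(coord, object_ids, single_server_op):
    """Borrow ranges so one server holds every object, run the op, return them."""
    executor = coord.owner_of(object_ids[0])  # assumption: first object's server
    borrowed = {}  # range index -> original owner
    for oid in object_ids:
        idx = coord.range_of(oid)
        if coord.owners[idx] != executor:
            borrowed[idx] = coord.owners[idx]
            coord.migrate(idx, executor)       # step 1: borrow
    try:
        return single_server_op(executor, object_ids)  # step 2: perform
    finally:
        for idx, original in borrowed.items():
            coord.migrate(idx, original)       # step 3: return


coord = DelegationCoordinator(
    [0, 1000, 2000, 3000, 4000, 5000, 7500],
    ["mds1", "mds1", "mds1", "mds2", "mds2", "mds2", "mds2"],
)
run_multi_object_op(coord, [500, 3500], lambda server, ids: None)
print(coord.owner_of(3500), coord.migrations)
```

The operation callback is the unmodified single-server code path: it never learns that a range was borrowed, which is the reuse the talk argues for.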


Object-IDs
• Object-ID determines which server to use
• Assign Object-IDs to minimize multi-server ops
  • Directory tree determines operation locality
  • Multi-file ops involve nearby directories
• Nearby files should get similar Object-IDs
  • Fall into same range
  • Served by same server - locality benefits
• Encode hierarchy into Object-ID
  • Analogous to IP address subnetting


Example tree

[Figure: a directory tree containing the nodes /dir1, /dir1/dir1, /dir1/dir2, /dir1/dir1/dir1, /dir1/dir1/dir2, /dir1/dir1/file1, /dir1/dir2/file1, and /dir1/dir1/dir2/file3.]


Object-ID assignment

[Figure: the Object-ID 01 01 02 03, with its successive fields labeled /dir1, /dir1/dir1, /dir1/dir1/dir2, and /dir1/dir1/dir2/file3.]
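
One way to read the figure is that each level of the path contributes one field of the Object-ID, so objects that share a path prefix share the high-order fields, much as hosts on one subnet share an address prefix. The sketch below illustrates that reading only. The field width, the number of fields, and both function names are assumptions, and the slides do not say how deep trees or large directories are handled; this sketch simply raises an error.

```python
FIELD_BITS = 8   # hypothetical width of each per-level field
MAX_DEPTH = 4    # hypothetical number of fields in an Object-ID
FIELD_MASK = (1 << FIELD_BITS) - 1


def encode_object_id(child_indices):
    """Pack per-level child indices into one integer, top of the tree first."""
    if len(child_indices) > MAX_DEPTH:
        raise ValueError("path is deeper than the Object-ID can encode")
    oid = 0
    for level in range(MAX_DEPTH):
        # Levels below the object's own depth are padded with zero.
        field = child_indices[level] if level < len(child_indices) else 0
        if not 0 <= field <= FIELD_MASK:
            raise ValueError("child index does not fit in one field")
        oid = (oid << FIELD_BITS) | field
    return oid


def format_object_id(oid):
    """Render an Object-ID as space-separated two-digit hex fields."""
    fields = [
        (oid >> (FIELD_BITS * (MAX_DEPTH - 1 - i))) & FIELD_MASK
        for i in range(MAX_DEPTH)
    ]
    return " ".join(f"{field:02x}" for field in fields)


# /dir1/dir1/dir2/file3 read as child indices 1, 1, 2, 3 (as in the figure).
file3 = encode_object_id([1, 1, 2, 3])
print(format_object_id(file3))
```

With this layout, two files in the same directory differ only in the lowest field, so they are numerically close and tend to fall into the same range, while files in distant subtrees differ in a high-order field.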


Evaluation
1. Is Metadata Service scalable?
2. Sensitivity to workload characteristics
3. Sensitivity to system parameters
4. Headroom for future workloads


Experimental setup
• Modified SpecSFS97 NFS benchmark
  • Applied to Ursa Minor NFS head-ends
  • NFS head-end translates to Ursa Minor
  • Configured to maximize MDS load
• 8.3 million files & directories
  • 26GB of metadata (158GB of file data)
• Vary number of metadata servers
  • Rest of system is constant


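Metadata traffic

[Figure: NFS head-ends (NFS1-NFS4), metadata servers (MDS1, MDS2), and storage nodes (OSD1-OSD3), annotated with the counts (24), (1-32), and (48). Labels: "NFS Requests", "Measured".]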


Scalability w/o multi-server ops


About multi-server ops
SpecSFS97 doesn’t produce any
• Simple directory structure
• No multi-directory ops in workload
• OID-assignment policy does perfectly


Adding multi-server ops
Artificially introduce them
• Replace CREATEs with cross-dir LINKs
  • Same work for each operation
• Use “bad” OID-assignment policy
• 1% multi-server ops
  • 100X rate from traces!


Scalability with multi-server ops


Causes of slowdown
• Latency of migration
• Side-effects on other operations
  • Migration makes a table unavailable
  • Servers flush cache on migration
• Granularity of migration is significant
  • The smaller, the better
  • Extreme case is single-object
• Encountered very rarely in practice


Implementation

• Half of our implementation is a simple lock manager
• Our 2PC implementation is not robust

Component                  Lines of C
Base metadata server       47000
Multi-server operations    820
Multi-server using 2PC     2587


Conclusion
• Feasible to reuse migration to support multi-server operations
• Almost no overhead w/ shared storage
  • Harvard, NetApp, SpecSFS97 workloads
  • Even higher multi-server operation rates
• Good choice for system designers
  • Transparent scalability made easy
