Post on 27-Dec-2015
Synchronizing Lustre file systems
Dénes Németh (nemeth.denes@iit.bme.hu)
Balázs Fülöp (fulop.balazs@ik.bme.hu)
Dr. János Török (torok@ik.bme.hu)
Dr. Imre Szeberényi (szebi@iit.bme.hu)
The current state of the art
• Partially solved
– Conventional local file systems
– Off-line operation (rsync)
• Problems
– Walk through the directory structure
– Have to know what will change (inotify)
– Does not work on distributed file systems
– Scalability problems
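The scalability problem of the off-line (rsync-style) approach above can be made concrete: every sync must walk the whole tree, so the cost is proportional to the total number of files even when nothing changed. A minimal sketch of such a walk (function and parameter names are ours, not from the talk):

```python
import os

def changed_paths(src_root, dst_mtimes):
    """Walk the entire source tree and report files whose mtime differs
    from the recorded destination state. The walk itself is O(total
    files) per sync, even for an empty change set -- the core
    scalability problem of off-line tools on huge file systems."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if dst_mtimes.get(path) != os.stat(path).st_mtime:
                changed.append(path)
    return changed
```

Event-driven approaches such as inotify avoid the walk, but as the bullets note, they require registering the watched paths in advance and do not work across a distributed file system.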
The environment - Lustre
• Distributed
– Stripes (parts of a file) on separate hosts
– ~100-1000 clients (reading and writing)
• Redundant
– File system and file metadata
• Fault tolerant
– Transaction-driven operations
– Rollback capability
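Striping as described above can be pictured as a round-robin mapping from a byte offset of a file to (object storage target, offset within that target's object). A toy model of that mapping; the stripe size and OST list are illustrative values, not Lustre defaults:

```python
def locate_stripe(offset, stripe_size, ost_indices):
    """Map a byte offset of a striped file to (OST index, offset inside
    the object stored on that OST), assuming plain round-robin striping."""
    stripe_no = offset // stripe_size                 # which stripe overall
    ost = ost_indices[stripe_no % len(ost_indices)]   # round-robin target
    # Offset inside that OST's object: full stripes already placed
    # there, plus the remainder within the current stripe.
    obj_off = (stripe_no // len(ost_indices)) * stripe_size + offset % stripe_size
    return ost, obj_off
```

With stripes spread over separate hosts like this, many clients can read and write disjoint parts of one file in parallel, which is what makes the ~100-1000-client workload feasible.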
Lustre – synchronization
• Distributed
– Hosts need absolute event sequencing
– Is the time accurate enough?
– Clients demand extreme efficiency
• Redundant, fault tolerant
– Pulling the plug during synchronization
• Moving and tracking events
– Roll synchronization back to transaction boundaries
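Tying synchronization to transactions, as the last bullet suggests, means replicated changes are applied only at transaction boundaries: pulling the plug mid-sync leaves the replica at the last committed transaction rather than in a half-applied state. A toy model of that idea (the data shapes and failure simulation are ours):

```python
def apply_transactions(state, transactions):
    """Apply change sets transaction by transaction. If any change in a
    transaction fails, keep the state as of the last transaction
    boundary instead of leaving a half-applied change set."""
    committed = dict(state)
    for txn in transactions:
        trial = dict(committed)        # work on a copy of committed state
        try:
            for key, value in txn:
                if value is None:      # simulated failure mid-transaction
                    raise ValueError("corrupt change")
                trial[key] = value
        except ValueError:
            break                      # stop at the failed transaction ...
        committed = trial              # ... keeping only complete ones
    return committed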
The basic Lustre concept
[Diagram: Lustre server side with a Metadata Server (with failover) and Object Storage Targets; Lustre client side with ~100-1000 clients; the metadata plays the role of the "inode".]
Moving the information - metadata
[Diagram: a kernel-space Lustre Metadata Access module on the Metadata Server feeds an Event Reporter and a Local Event Sequencer; events pass through an Event Multiplexer to a Global Event Sequencer and an Event Processor; Object Storage Targets and ~100-1000 clients on the Lustre client side.]
How to move the information
[Diagram: the Metadata Server hosts the Event Reporter and Local Event Sequencer; three candidate channels (a block device, the proc file system, and TCP/IP networking) carry events to the Event Multiplexer, Global Event Sequencer, and Event Processor.]
Block Device
• Asynchronous notification
• System calls:
– select (with timeout)
– read, write (blocking)
• Max ~100,000 events/sec
• Relatively complicated access
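The select-with-timeout plus blocking-read pattern above can be sketched in user space. For a self-contained example we read from an ordinary file descriptor (a socket pair stands in for the real block device node, which is an assumption of this sketch):

```python
import os
import select

def drain_events(fd, timeout=0.1, chunk=4096):
    """Read pending event records from a file descriptor using the
    select (with timeout) + blocking read pattern: wait until data is
    ready, read it, and return once the timeout expires or EOF is hit."""
    data = b""
    while True:
        ready, _, _ = select.select([fd], [], [], timeout)
        if not ready:
            return data        # timeout: no more pending events
        buf = os.read(fd, chunk)
        if not buf:
            return data        # EOF: writer side closed
        data += buf
```

The "relatively complicated access" bullet refers to exactly this kind of descriptor-level plumbing, compared with the proc-file interface described next.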
Proc File System
• Easy access from user-space
• Notifications through signals
• Possibility for multiple reporters
• Minimal network usage
• Usually not a bottleneck
• ER & EM can be deployed together or separately
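The signal-based notification listed above can be sketched as a handler that marks new event data pending; the actual proc-file interface of the reporter is not shown, and the use of SIGUSR1 on a Unix system is an assumption of this sketch:

```python
import os
import signal

pending = {"events": 0}

def on_event(signum, frame):
    """Signal handler: the kernel-side Event Reporter would raise
    SIGUSR1 when new event records are readable from the proc file."""
    pending["events"] += 1

signal.signal(signal.SIGUSR1, on_event)
os.kill(os.getpid(), signal.SIGUSR1)   # simulate one kernel notification
```

Because the notification is a signal and the data is read through an ordinary file in /proc, this channel is easy to use from user space, which is the advantage claimed in the bullets above.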
TCP/IP Network
• Just multiplexing events
• No problems
• No authorization or registration (fixed configuration)
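The multiplexer's job of "just" fanning events out over fixed TCP connections can be modeled as forwarding each incoming event to every downstream sink in a list that is fixed at start-up; a fixed configuration means no runtime registration protocol is needed. A minimal sketch (class and method names are ours):

```python
class EventMultiplexer:
    """Forward every incoming event to a fixed set of downstream sinks.
    No authorization or registration: the sink list is configuration,
    set once at start-up, mirroring the 'fixed configuration' bullet."""

    def __init__(self, sinks):
        self.sinks = list(sinks)   # fixed at start-up

    def publish(self, event):
        for sink in self.sinks:   # plain fan-out, no filtering
            sink(event)
```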
Global Event Sequencer
• Big difficulties
• Sequencing requires accurate timing
• Network delay
• Delay from FS overload
• Connections to all MDSs
• Can be a bottleneck
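The sequencing difficulty above is essentially a merge-with-watermark problem: an event is only safe to emit globally once every connected MDS has reported past its timestamp, otherwise a delayed stream could still deliver an earlier event. A simplified version, assuming each per-MDS stream is already in local timestamp order (the data format is ours):

```python
import heapq

def global_order(streams):
    """Merge per-MDS streams of (timestamp, event) pairs into one
    globally ordered sequence. An event can only be emitted once every
    stream has been consumed up to its timestamp -- which is why network
    delay or an overloaded FS on any one MDS stalls the whole global
    sequencer, making it a potential bottleneck."""
    return [event for _, event in heapq.merge(*streams)]
```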
Average sequencing performance
[Plot: as long as the server has enough threads, performance is OK (constant QoS); once the server needs more threads, performance drops linearly, a "graceful degradation" at roughly 5,000 events per thread.]
How to commit the changes
[Diagram: three synchronized file systems (SFS 1-3), each with an MDS and OST; Event Reporters and Event Multiplexers feed Event Processors and Committer Clients; events A4 and B3 can arrive at the committers out of order.]
How can event "3" be executed if event "4" has already happened? Unfortunately, there is no really good solution.
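Detecting the "3 after 4" situation only requires tracking, per object, the last applied sequence number (in the slide's notation, A4 means event 4 on object A). A sketch of that classification; the three labels are ours:

```python
def classify(applied, obj, seq):
    """Decide what an incoming event (obj, seq) is relative to what has
    already been committed for that object: the next event in order, a
    gap (earlier events still missing), or the problematic late event
    that should have been executed before an already committed one."""
    last = applied.get(obj, 0)
    if seq == last + 1:
        applied[obj] = seq     # in order: commit and advance
        return "apply"
    if seq > last + 1:
        return "gap"           # wait: earlier events still in flight
    return "late"              # the '3 after 4' case: no good solution
```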
Event sequence error resolution
1. Ostrich policy
• Drop all events with a conflicting sequence
2. Conflict detection
• Is the event applicable?
• Still in the design stage
3. Replaying the already committed events
• Currently lacks Lustre support
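Option 1 above (the ostrich policy) can be written as a filter over the event stream that silently drops anything out of per-object sequence order; option 2 would replace the unconditional drop with an applicability check, which the slides say is still at the design stage. A sketch of option 1 (the event representation is ours):

```python
def ostrich_filter(events):
    """Ostrich policy: keep only events that arrive in strictly
    increasing per-object sequence order; silently drop anything with a
    conflicting sequence instead of trying to resolve it."""
    applied = {}
    kept = []
    for obj, seq in events:
        if seq == applied.get(obj, 0) + 1:
            applied[obj] = seq
            kept.append((obj, seq))
        # else: dropped -- the 'ostrich' choice
    return kept
```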