Date post: 03-Aug-2020
Page 1: Self Stabilizing Distributed File System

Self Stabilizing Distributed File System

Shlomi Dolev and Ronen I. Kat
Department of Computer Science, Ben-Gurion University

Research Sponsored by IBM


DFS Motivation

• Performance
  – Communication: placing files closer to users
  – Load: no single bottleneck

• Fault tolerance
  – No single point of failure
  – Partitions, disconnected operations


Related Work

• File systems
  – NFS: Network File System protocol
  – AFS: Andrew File System, CMU (1988)
  – Coda: CMU (1998)
  – InterMezzo: Peter J. Braam, CMU

• Peer to peer (2000)
  – Global storage: OceanStore, Berkeley
  – Serverless: Microsoft Farsite


Talk Overview

• Self-stabilization
• A Distributed File System
• Algorithms
• File system operations
• Future work


Self Stabilization

A self-stabilizing system is a system that can automatically recover following the occurrence of (transient) faults.

The idea is to design a system that can be started in an arbitrary state and still converge to the desired behaviour.


Self Stabilization Motivation

• The combination and type of faults cannot be totally anticipated in on-going systems

• Any on-going system must be self-stabilizing (or manually monitored)

• A self-stabilizing algorithm can recover from any arbitrary state reached due to the occurrence of faults


Current Research

• Self healing
• Adaptiveness
• Automatic recovery
• Autonomic computing

Self Stabilization: Dijkstra 1974


Examples

• Token passing (mutual exclusion)
• Spanning tree
• Finding cliques in distributed systems

• From theory to practice!
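To make the token-passing example concrete, here is a minimal sketch of Dijkstra's K-state token ring (1974), simulated in a single process. The function names, the lowest-index scheduler, and the sample values are assumptions made for illustration; they are not taken from the slides or from the SRFS code.

```python
def privileged(x):
    """Indices of machines that currently hold a privilege (token)."""
    n = len(x)
    return [i for i in range(n)
            if (i == 0 and x[0] == x[n - 1]) or (i > 0 and x[i] != x[i - 1])]

def step(x, k):
    """Let one privileged machine move (assumed scheduler: lowest index first)."""
    i = privileged(x)[0]          # never empty: some machine is always privileged
    if i == 0:
        x[0] = (x[0] + 1) % k     # machine 0 advances its counter modulo k
    else:
        x[i] = x[i - 1]           # other machines copy their left neighbour

def run_ring(x, k, steps):
    """Run the ring for a number of steps; x holds values in range(k), k >= len(x)."""
    for _ in range(steps):
        step(x, k)
    return x
```

Starting from an arbitrary assignment such as `[3, 1, 4, 1, 5]` with `k = 6`, the ring reaches a configuration with exactly one privileged machine and keeps that property from then on.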


A Distributed File System

• Self-stabilizing spanning tree of the replication servers

• Self-stabilizing extension of the replication backbone tree to include caches

• File operations controlled by a self-stabilizing synchronizer


Algorithms – Self Stabilizing

→ Electing a leader (leader election)
• Collecting connectivity information
• Optimising communication costs
• Synchronizer for file consistency


Leader Election

• A single leader coordinates the tree construction

• The leader broadcasts heartbeats
• If a heartbeat is not received, the server becomes a leader
• If more than one leader exists, one survives
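The heartbeat rules above can be sketched as a small round-based simulation. This is an illustration under stated assumptions: all nodes hear all heartbeats, the timeout is three rounds, and competing leaders are resolved by keeping the largest id. The slides do not say how the surviving leader is chosen, so the tie-break and all names here are hypothetical.

```python
from dataclasses import dataclass

HEARTBEAT_TIMEOUT = 3  # assumed: rounds of silence before a node promotes itself

@dataclass
class Node:
    node_id: int
    is_leader: bool = False
    silent_rounds: int = 0  # rounds since a heartbeat was last heard

def run_round(nodes):
    """One synchronous round: leaders broadcast heartbeats, every node reacts."""
    leaders = [n.node_id for n in nodes if n.is_leader]  # snapshot at round start
    for n in nodes:
        heard = [l for l in leaders if l != n.node_id]
        if n.is_leader:
            # Assumed tie-break: a leader that hears a larger id steps down.
            if any(l > n.node_id for l in heard):
                n.is_leader = False
                n.silent_rounds = 0
        elif heard:
            n.silent_rounds = 0
        else:
            n.silent_rounds += 1
            if n.silent_rounds >= HEARTBEAT_TIMEOUT:
                n.is_leader = True   # no heartbeat received: become leader
                n.silent_rounds = 0

def leaders_after(nodes, rounds):
    """Run the given number of rounds and return the sorted ids of the leaders."""
    for _ in range(rounds):
        run_round(nodes)
    return sorted(n.node_id for n in nodes if n.is_leader)
```

Whether the run starts with no leader or with several, the simulation settles on a single leader, which is the behaviour the slide describes.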


Algorithms – Self Stabilizing

• Electing a leader (leader election)
→ Collecting connectivity information
• Optimising communication costs
• Synchronizer for file consistency


TTL of Multicast Defines Graph


Update Algorithm

• Repeatedly collect routing tables from all neighbours (in the graph)

• If leader is not in routing table then a manager (local leader) notifies the leader to increase TTL (graph connectivity)

• Routing tables define a distributed BFS spanning tree

• Stabilizes!
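A minimal sketch of how repeated table exchange can yield a BFS spanning tree from an arbitrary starting state. It assumes each routing table is reduced to a single entry, the hop distance to the leader, and that rounds are synchronous; the TTL-increase step is left out. The function names and the clamping of distances are assumptions for the illustration, not the authors' implementation.

```python
def update_round(graph, leader, dist):
    """Every node recomputes its distance and parent from its neighbours' entries."""
    n = len(graph)
    new_dist, new_parent = {}, {}
    for node, neighbours in graph.items():
        if node == leader:
            new_dist[node], new_parent[node] = 0, None
            continue
        # Pick the neighbour that claims to be closest to the leader (ties by name).
        best = min(neighbours, key=lambda v: (dist[v], v))
        # Clamp to [1, n] so corrupted entries cannot persist or grow without bound.
        new_dist[node] = max(1, min(dist[best] + 1, n))
        new_parent[node] = best
    return new_dist, new_parent

def build_tree(graph, leader, dist):
    """Repeat the update; 2 * n rounds suffice for a connected graph in this model."""
    parent = {}
    for _ in range(2 * len(graph)):
        dist, parent = update_round(graph, leader, dist)
    return dist, parent
```

For example, on the graph `{"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c"]}` with leader `"a"`, corrupted initial distances are flushed out and the parent pointers form a BFS tree rooted at the leader.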


Algorithms – Self Stabilizing

• Electing a leader (leader election)
• Collecting connectivity information
→ Optimising communication costs
• Synchronizer for file consistency


Optimising Communication Costs

• Goal: find the minimal radius ε that keeps connectivity

• If ε is too small, increase ε by a factor of 2
• Run a second instance of update with γ < ε
• Find the smallest ε by binary search
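The doubling and binary-search steps can be sketched as follows. The sketch assumes ε is an integer TTL and that connectivity is monotone in ε (once the graph is connected, a larger radius keeps it connected). The `is_connected` callback stands in for running the update algorithm with a given radius and is hypothetical.

```python
def minimal_radius(is_connected, start=1):
    """Smallest integer radius for which is_connected(radius) is true."""
    eps = start
    while not is_connected(eps):
        eps *= 2                      # too small: double the radius
    # If we doubled, eps // 2 was too small, so the answer lies in (eps // 2, eps].
    lo = 1 if eps == start else eps // 2 + 1
    hi = eps
    while lo < hi:
        gamma = (lo + hi) // 2        # trial radius, always smaller than the working one
        if is_connected(gamma):
            hi = gamma                # gamma works: try something smaller
        else:
            lo = gamma + 1            # gamma fails: the answer is larger
    return lo
```

With a predicate such as `lambda ttl: ttl >= 5`, the search tries 1, 2, 4, 8 during doubling and then narrows the interval down to 5.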


Tree Structure


Caching Tree

• Extends the replication tree
• The update algorithm constructs both
• Servers execute two instances
• Caches execute one instance


Combined Spanning Tree


Algorithms – Self Stabilizing

• Electing a leader (leader election)
• Collecting connectivity information
• Optimising communication costs
→ Synchronizer for file consistency


Synchronization Mechanism

• Provide reliable command and timing
• Propagate commands between servers
• Collect and distribute information


Synchronizer

• The leader repeatedly chooses a colour in a round-robin fashion (Dijkstra)

• The colour is propagated to the leaves
• The convergecast of the colour arrives at the leader before the leader chooses the next colour

• Distributes and collects information!
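One synchronizer pulse can be sketched as a recursive walk over the tree: the colour travels from the leader to the leaves, and the convergecast carries collected information back. The number of colours, the dictionary-based tree, and the function names are assumptions chosen for this illustration.

```python
NUM_COLOURS = 3  # assumed size of the round-robin colour set

def pulse(tree, root, colour, state, info):
    """Propagate a colour down the tree, then convergecast information back up."""
    def visit(node):
        state[node] = colour                   # broadcast: adopt the leader's colour
        collected = [info[node]]
        for child in tree.get(node, []):
            collected.extend(visit(child))     # convergecast: gather the children's reports
        return collected
    return visit(root)

def run(tree, root, state, info, rounds):
    """The leader picks the next colour only after the previous pulse has returned."""
    colour = state.get(root, 0)
    reports = []
    for _ in range(rounds):
        colour = (colour + 1) % NUM_COLOURS    # round-robin choice of the next colour
        reports = pulse(tree, root, colour, state, info)
    return reports
```

After a single pulse every node holds the leader's colour regardless of the colours it started with, and the leader holds one report per node in the tree.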


Replication Consistency

• Verifies file signatures (e.g., CRC)
• Different signatures: a conflict
• Conflict resolution (e.g., majority)
• Broadcast resolved signature and sources list
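A minimal sketch of signature comparison with majority-based conflict resolution, using CRC-32 from Python's standard `zlib` module as the signature. The function names, the replica dictionary, and the tie-break rule are assumptions; the slides only name CRC and majority as examples.

```python
import zlib
from collections import Counter

def signature(data: bytes) -> int:
    """File signature; CRC-32 is used here as one possible choice."""
    return zlib.crc32(data)

def resolve(replicas):
    """replicas maps server name -> file contents.

    Returns (resolved signature, sources list, conflict flag).
    """
    sigs = {server: signature(data) for server, data in replicas.items()}
    counts = Counter(sigs.values())
    conflict = len(counts) > 1                 # different signatures: a conflict
    # Majority wins; ties are broken by the larger signature (arbitrary, assumed).
    winner, _ = max(counts.items(), key=lambda kv: (kv[1], kv[0]))
    sources = sorted(s for s, sig in sigs.items() if sig == winner)
    return winner, sources, conflict
```

The returned signature and sources list correspond to what the leader would broadcast, so that servers holding a losing version know where to fetch the resolved copy.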


Locking Table

• A single global lock table
• Servers request a lock
• The leader resolves multiple requests
• Locks are removed by the requesting server (or on disappearance of the server from the tree)
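The lock-table rules can be sketched as a small class kept by the leader. The class and method names are hypothetical, and the rule that the smallest server id wins among competing requests is an assumption; the slides do not specify how the leader chooses.

```python
class LockTable:
    """Single global lock table, as it might be maintained by the leader."""

    def __init__(self):
        self.owner = {}  # file path -> server currently holding the lock

    def resolve(self, requests):
        """Grant locks for a batch of (server, path) requests from one round."""
        granted = []
        for server, path in sorted(requests):  # assumed tie-break: smallest server id
            if path not in self.owner:
                self.owner[path] = server
                granted.append((server, path))
        return granted

    def release(self, server, path):
        """Only the server that requested the lock may remove it."""
        if self.owner.get(path) == server:
            del self.owner[path]

    def purge(self, live_servers):
        """Drop locks held by servers that have disappeared from the tree."""
        self.owner = {p: s for p, s in self.owner.items() if s in live_servers}
```

Calling `purge` with the set of servers currently in the spanning tree keeps a crashed or partitioned server from holding a lock indefinitely.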


File System Operations


Accessing a File

[Flowchart. Decisions: "Write?", "Cached?" (each with Yes/No branches). Actions: "Lock file", "Get signature", "Get a copy", "Use cached".]


Closing a File

[Flowchart. Decision: "Updated?" (Yes/No branches). Actions: "Send new signature", "Confirm signature".]


Meta Access (e.g., ls)

• Blocked until a lock is obtained
• Globally processed
• Verify signatures on results

[Flowchart steps: "Lock file", "Execute command", "Wait confirmation".]


SRFS Linux Interface

[Diagram components:]
• Application
• System Calls (user-level Linux system calls); new implementation: open, close, lstat, mkdir, etc …
• Up calls
• SyncDaemon: cache manager & server
• Network Communication


Future Work

• Kernel VFS module
• Communication improvements:
  – Reducing update messages
  – Using timers with β-synchronizer
• Integrating disconnected operations
• Conflict resolution algorithms


Credits

Undergraduate Students: Amir Livneh, Granik, Lansky, Shmuel, Shish, Erlich, Chohen, Biran, Fridman, Bernard, Ferents, Feintuch, Shalev, Kraim, Hayuit

Faculty: Prof. Shlomi Dolev
Students: Ronen I. Kat
Engineer: Albina Budker


Visit us at

www.cs.bgu.ac.il/~srfs


MIT Press, 2000

