An Overview of the Classes and Uses of
Distributed Filesystems
Calvin Winkowski
Linux and Unix Users Group at Virginia Tech
April 24, 2014
Origins
ITS
• MIT AI Lab - Incompatible Timesharing System
• Virtual software devices
• Inter-machine FS access
Locus
• Atomic operations
• Transparent data location
• Replication
• Caching
• Fault-tolerance
AFS
Goals
• Develop the future environment of computing
• Reduction in central timesharing devices
• Networking will be pivotal
Design
• File locking
• Location independent
• Hierarchy of file systems
• Storage independent of consumers
Modern World
Data
• Large data stores, very fast access
• High availability
Compute
• Variable requirements
• High speed & small storage vs Low speed & large storage
Gains of DFSs
Capability
• Speed (Write vs Read)
• Replication
• Locking
• De-duplication
Utility
• Unified Namespace
• Access Control
• Enumeration
• Failover vs Fault Tolerance
• Load Balancing
Classes of DFSs
• Centralized
• Decentralized
• MapReduce
• Grid
Basics of Centralized DFSs
• Central Metadata server
• Distributed object storage
• Parallel Access is simplified
• Fast throughput, higher latency
• Loss of high-availability
• Favours big compute
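A minimal sketch of the split between a central metadata server and distributed object storage. Every name here (MetadataServer, write_file, read_file, the "oss" node names) is hypothetical and not taken from Lustre or any other system; storage nodes are modelled as plain dictionaries.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataServer:
    """Hypothetical central index: file name -> list of (storage node, object id)."""
    layout: dict = field(default_factory=dict)

    def place(self, name, num_chunks, nodes):
        # Stripe chunks round-robin across the storage nodes.
        self.layout[name] = [
            (nodes[i % len(nodes)], f"{name}#{i}") for i in range(num_chunks)
        ]
        return self.layout[name]

    def lookup(self, name):
        return self.layout[name]


def write_file(mds, stores, name, data, chunk_size):
    """Split data into chunks and store each on the node the metadata server picks."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    placement = mds.place(name, len(chunks), sorted(stores))
    for (node, obj_id), chunk in zip(placement, chunks):
        stores[node][obj_id] = chunk


def read_file(mds, stores, name):
    """One metadata lookup, then every chunk comes straight from a storage node."""
    return b"".join(stores[node][obj_id] for node, obj_id in mds.lookup(name))
```

The single lookup is what keeps parallel access simple, and it is also why losing the metadata server makes every file unreachable in this sketch.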
Lustre
Examples of Centralized DFSs
• Lustre ← Note the spelling
• XtreemFS
• pNFS
• AFS
Basics of Decentralized DFSs
• All data is clustered
• Increases locking complexity
• HA friendly
• Loss of throughput
• May increase latency
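One way a cluster can locate data without a central metadata server is for every client to compute placement itself. The sketch below uses rendezvous (highest-random-weight) hashing as an illustration; it is an assumption chosen for brevity, not the placement algorithm of CEPH or any system listed here, and the function names are hypothetical.

```python
import hashlib


def score(node: str, key: str) -> int:
    """Deterministic pseudo-random weight for a (node, key) pair."""
    digest = hashlib.sha256(f"{node}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def replicas_for(key, nodes, copies=2):
    """Rank all nodes by their score for this key and keep the top `copies`."""
    return sorted(nodes, key=lambda n: score(n, key), reverse=True)[:copies]
```

Any client that knows the node list gets the same answer, and removing a node that does not hold a given key leaves that key's placement unchanged, which is part of what makes this style HA friendly.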
CEPH
Examples of Decentralized DFSs
• CEPH
• GFS
• GlusterFS
• FhGFS
• Tahoe-LAFS
MapReduction
Map
• Mathematical definition
• Organizing the data for consumption
Reduce
• Produce a series of values
• Generating results
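The two phases can be sketched with the usual word-count illustration. This is a single-process toy, not the example from the talk and not any framework's API; the function names and the explicit shuffle step are assumptions made for clarity.

```python
from collections import defaultdict


def map_phase(document):
    """Map: organize raw text into (key, value) pairs."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all values that share a key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a result."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Calling word_count(["a b a", "b c"]) counts "a" twice, "b" twice, and "c" once. In a real deployment the map calls would run on many nodes at once, each over its own part of the data.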
MapReduction Example
Basics of MapReduce DFSs
• Favours big data
• Some centralized entry point
• Master maps to storage servers
• Integrate with MapReduce frameworks
• Usually provides an API
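A rough sketch of what "master maps to storage servers" can look like: the master keeps a table from block index to the servers holding a copy, and clients ask it where a byte offset lives. The function names, the round-robin placement, and the replica count are all assumptions for illustration, not the behaviour of Hadoop or any other listed system.

```python
def build_block_map(file_size, block_size, servers, replicas=2):
    """Return {block index: [servers holding a copy]} as a master might track it.

    Assumes replicas <= len(servers) so that copies land on distinct servers.
    """
    num_blocks = -(-file_size // block_size)  # ceiling division
    return {
        b: [servers[(b + r) % len(servers)] for r in range(replicas)]
        for b in range(num_blocks)
    }


def servers_for_offset(block_map, block_size, offset):
    """Which servers should a client contact for the byte at `offset`?"""
    return block_map[offset // block_size]
```

A MapReduce framework can use the same table to decide which node should process which block.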
Hadoop
Examples of MapReduce DFSs
• Hadoop
• GFS, the Google File System (Not to be confused with the Global File System GFS, GFS2, GlusterFS, GPFS)
• CloudStore
• QFS
Grid File Systems
• Take advantage of many small nodes
• Often run on workstations
• "Crowd source" storage
• Gfarm File System
• Scalable I/O
• Grid computing
Gfarm
Questions