Date posted: 30-Jun-2015
Category: Technology
Uploaded by: manfred-furuholmen
Fabrizio Manfred Furuholmen
Use Distributed File system as a Storage Tier
14/04/2023
Agenda
Introduction: Next Generation Data Center
Distributed File systems: OpenAFS, GlusterFS, HDFS, Ceph
Case Studies
Conclusion
Class Exam
What do you know about DFS?
How can you create petabyte-scale storage?
How can you build a centralized system log?
How can you allocate space for your users or systems when you have thousands of them?
How can you retrieve data from anywhere?
Introduction
Next Generation Data Center: the “FABRIC”
Key categories:
Continuous data protection and disaster recovery
File and block data migration across heterogeneous environments
Server and storage virtualization
Encryption for data in-flight and at-rest
In other words: Cloud data center
Introduction
Storage Tier in the “FABRIC”:
High Performance
Scalability
Simplified Management
Security
High Availability

Solutions:
Storage Area Network
Network Attached Storage
Distributed file system
Introduction
What is a Distributed File system ?
“A distributed file system takes advantage of the interconnected nature of the network by storing files on more than one computer in the network and making them accessible to all of them.”
Part II
Implementations
How many DFSs do you know?
OpenAFS: introduction
OpenAFS is the open source implementation of IBM's Andrew File System.

Key ideas:
Make clients do work whenever possible.
Cache whenever possible.
Exploit file usage properties and understand them: one third of Unix files are temporary.
Minimize system-wide knowledge and change; do not hardwire locations.
Trust the fewest possible entities; do not trust workstations.
Batch operations into groups where possible.
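These key ideas can be sketched in miniature. The toy classes below (illustrative names, not the real OpenAFS code) show whole-file caching with callback promises: the client serves reads from its local cache, and the server "breaks the callback" of every other client when a file is written back.

```python
# Toy sketch of AFS-style whole-file caching with callback promises.
# Class and method names are illustrative, not the OpenAFS implementation.

class Server:
    def __init__(self):
        self.files = {}
        self.callbacks = {}  # path -> set of clients holding a callback promise

    def fetch(self, path, client):
        # A client fetches a whole file and receives a callback promise.
        self.callbacks.setdefault(path, set()).add(client)
        return self.files[path]

    def store(self, path, data, writer):
        # On write-back, break the callback of every other client.
        self.files[path] = data
        for client in self.callbacks.get(path, set()) - {writer}:
            client.invalidate(path)
        self.callbacks[path] = {writer}

class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def read(self, path):
        # A cache hit is served locally: "make clients do the work".
        if path not in self.cache:
            self.cache[path] = self.server.fetch(path, self)
        return self.cache[path]

    def write(self, path, data):
        self.cache[path] = data
        self.server.store(path, data, self)

    def invalidate(self, path):
        self.cache.pop(path, None)
```

After one client writes, every other client's cached copy is invalidated, so its next read refetches; this is how AFS keeps the server cheap per client while staying consistent.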
OpenAFS: design
OpenAFS: components
[Diagram: OpenAFS components distributed across Server A, Server A+B, and Server C]
OpenAFS: performances

[Charts: write and read throughput (kB) as a function of block size and transfer size, comparing OpenAFS with OpenAFS OSD on 2 servers]
OpenAFS: features
Uniform name space: same path on all workstations
Security: based on krb4/krb5, extended ACLs, traffic encryption
Reliability: read-only replication, HA database, read/write replicas in the OSD version
Availability: maintenance tasks without stopping the service
Scalability: server aggregation
Administration: delegation of administration
Performance: client-side persistent disk cache, high client-per-server ratio
OpenAFS: who uses it?
OpenAFS: good for ...
GlusterFS
“Gluster can manage data in a single global namespace on commodity hardware.”

Keys:
Lower Storage Cost: open source software runs on commodity hardware
Scalability: linearly scales to hundreds of petabytes
Performance: no metadata server means no bottlenecks
High Availability: data mirroring and real-time self-healing
Virtual Storage for Virtual Servers: simplifies storage and keeps VMs always on
Simplicity: complete web-based management suite
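The "no metadata server" claim rests on elastic hashing: the brick responsible for a file is computed from the file path itself, so any client can locate data without consulting a central server. A minimal sketch, with made-up brick names and a plain modulo in place of Gluster's actual hash-range layout:

```python
import hashlib

# Sketch of metadata-free placement in the spirit of GlusterFS's elastic
# hashing: the brick holding a file is a pure function of the path, so no
# metadata server is consulted. Brick names are illustrative, and real
# Gluster assigns hash ranges to bricks rather than a simple modulo.

def brick_for(path, bricks):
    """Map a file path to a brick by hashing the path."""
    digest = hashlib.md5(path.encode()).hexdigest()
    return bricks[int(digest, 16) % len(bricks)]

bricks = ["server1:/export1", "server2:/export1", "server3:/export1"]
```

Because placement is pure computation, every client arrives at the same answer with no lookup traffic; the trade-off is that adding bricks changes the mapping, which is why Gluster pairs hashing with rebalancing.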
GlusterFS: design
GlusterFS: components
volume posix1
  type storage/posix
  option directory /home/export1
end-volume

volume brick1
  type features/posix-locks
  option mandatory
  subvolumes posix1
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option transport.socket.listen-port 6996
  subvolumes brick1
  option auth.addr.brick1.allow *
end-volume
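A client would reach this brick through a matching protocol/client volume. A hedged sketch (the volume name is ours, and the host and port are assumed to match the server definition above):

```
volume remote1
  type protocol/client
  option transport-type tcp
  option remote-host server1
  option transport.socket.remote-port 6996
  option remote-subvolume brick1
end-volume
```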
Gluster: components
Gluster: performance
Gluster: characteristics
Uniform name space: same path on all workstations
Reliability: synchronous replication, asynchronous replication for disaster recovery
Availability: no system downtime for maintenance (better in the next release)
Scalability: truly linear scalability
Administration: self-healing, centralized logging and reporting, appliance version
Performance: files striped across dozens of storage bricks, automatic load balancing, per-volume I/O tuning
Gluster: who uses it?
Avail TVN (USA): 400 TB for video on demand, video storage
Fido Film (Sweden): visual FX and animation studio
University of Minnesota (USA): 142 TB, supercomputing
Partners Healthcare (USA): 336 TB, integrated health system
Origo (Switzerland): open source software development and collaboration platform
Gluster: good for ...
Implementations
Old way:
Metadata and data in the same place
Single stream per file

New way:
Multiple streams are parallel channels through which data can flow
Files are striped across a set of nodes in order to facilitate parallel access
OSD: separation of file metadata management (MDS) from the storage of file data
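The striping idea can be sketched as round-robin placement of fixed-size stripes across nodes. This is a simplified illustration: the stripe size is made up, and real systems layer replication, checksums, and metadata handling on top.

```python
# Sketch of file striping across a set of storage nodes using a simple
# round-robin layout. Real DFSs use stripe sizes of 64 KB-64 MB and add
# replication and metadata on top; this only shows the parallel layout.

STRIPE_SIZE = 4  # bytes per stripe, kept tiny for illustration

def stripe(data, num_nodes):
    """Split data into stripes and assign them round-robin to nodes."""
    placement = [[] for _ in range(num_nodes)]
    for i in range(0, len(data), STRIPE_SIZE):
        placement[(i // STRIPE_SIZE) % num_nodes].append(data[i:i + STRIPE_SIZE])
    return placement

def reassemble(placement):
    """Read stripes back from the nodes in round-robin order."""
    stripes = []
    for round_idx in range(max(len(node) for node in placement)):
        for node in placement:
            if round_idx < len(node):
                stripes.append(node[round_idx])
    return b"".join(stripes)
```

Because consecutive stripes live on different nodes, a client can fetch them over parallel channels instead of a single stream per file.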
HDFS: Hadoop
HDFS is part of the Apache Hadoop project, which develops open-source software for reliable, scalable, distributed computing.
Hadoop was inspired by Google's MapReduce and the Google File System.
HDFS: Google File System
“ Design of a file systems for a different environment where assumptions of a general purpose file system do not hold—interesting to see how new assumptions lead to a different type of system…”
Key ideas:
Component failures are the norm.
Huge files (not just the occasional huge file).
Append rather than overwrite is typical.
Co-design of application and file system API (specialization); for example, consistency can be relaxed.

“Moving Computation is Cheaper than Moving Data”
HDFS: MapReduce
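The model can be illustrated in a few lines of plain Python (no Hadoop required; the function names are ours): map emits key/value pairs, the framework shuffles them into per-key groups, and reduce folds each group.

```python
from collections import defaultdict

# Minimal in-process sketch of the MapReduce model that Hadoop runs on
# top of HDFS, using word count as the classic example. In Hadoop the
# map and reduce phases run on the nodes that hold the data blocks:
# "moving computation is cheaper than moving data".

def map_fn(line):
    for word in line.split():
        yield word, 1

def reduce_fn(word, counts):
    return word, sum(counts)

def map_reduce(lines):
    groups = defaultdict(list)
    for line in lines:                 # map phase
        for key, value in map_fn(line):
            groups[key].append(value)  # shuffle: group values by key
    return dict(reduce_fn(k, v) for k, v in groups.items())  # reduce phase
```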
HDFS: goals
HDFS: design
HDFS: components
HDFS: features
Uniform name space: same path on all workstations
Reliability: read/write replication, re-balancing, copies in different locations
Availability: hot deploy
Scalability: server aggregation
Administration: HOD (Hadoop on Demand)
Performance: “grid” computation, parallel transfers
HDFS: who uses it?
Major players
Yahoo!, A9.com, AOL, Booz Allen Hamilton, EHarmony, Facebook, Freebase, Fox Interactive Media, IBM, ImageShack, ISI, Joost, Last.fm, LinkedIn, Metaweb, Meebo, Ning, Powerset (now part of Microsoft), Proteus Technologies, The New York Times, Rackspace, Veoh, Twitter, …
HDFS: good for ...
Ceph
“Ceph is designed to handle workloads in which tens of thousands of clients or more simultaneously access the same file or write to the same directory, usage scenarios that bring typical enterprise storage systems to their knees.”

Keys:
Seamless scaling: the file system can be seamlessly expanded by simply adding storage nodes (OSDs). Unlike most existing file systems, Ceph proactively migrates data onto new devices in order to maintain a balanced distribution of data.
Strong reliability and fast recovery: all data is replicated across multiple OSDs. If any OSD fails, data is automatically re-replicated to other devices.
Adaptive MDS: the Ceph metadata server (MDS) dynamically adapts its behavior to the current workload.
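Ceph's actual placement algorithm is CRUSH, which is considerably more sophisticated; but the core idea, that hash-based placement lets a cluster absorb new OSDs by migrating only a fraction of the data, can be illustrated with a simple consistent-hashing ring (OSD names and vnode count are made up):

```python
import bisect
import hashlib

# Rough illustration of why hash-based placement makes proactive data
# migration cheap. Ceph's real algorithm is CRUSH, not this ring; the
# sketch only shows that adding a node moves a small fraction of the
# objects, and that every moved object lands on the new node.

def _h(key):
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, vnodes=64):
        # Each node owns many points on the ring to smooth the distribution.
        self._points = sorted(
            (_h(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._keys = [point for point, _ in self._points]

    def locate(self, obj):
        # An object belongs to the first node point clockwise of its hash.
        i = bisect.bisect(self._keys, _h(obj)) % len(self._points)
        return self._points[i][1]
```

In the sketch, every object that moves after a node is added moves onto the new node; nothing shuffles between the old nodes, which is the property that keeps rebalancing traffic proportional to the added capacity.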
Ceph: design
Ceph: features
Ceph: good for …
Others
Part III
Case Studies
Class Exam
What can DFS do for you?
How can you create petabyte-scale storage?
How can you build a centralized system log?
How can you allocate space for your users or systems when you have thousands of them?
How can you retrieve data from anywhere?
File sharing
Web Service
Internet Disk: myS3
Log concentrator
Private cloud
Conclusion: problems
Failure: for 10 PB of storage, you will have an average of 22 consumer-grade SATA drives failing per day.
Read/write time: each 2 TB drive takes, best case, approximately 24,390 seconds to be read and written over the network.
Data replication: replication multiplies the number of disk drives required, plus the bandwidth to keep the copies in sync.
Do you have enough bandwidth?
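The read/write figure works out from simple arithmetic, assuming (our assumption; the slide does not state a rate) a sustained transfer rate of roughly 82 MB/s per 2 TB drive:

```python
# Back-of-the-envelope check of the figures above. The ~82 MB/s sustained
# transfer rate is our assumption, not stated on the slide.

DRIVE_BYTES = 2 * 10**12        # one 2 TB drive
RATE = 82 * 10**6               # ~82 MB/s sustained over the network
TOTAL_BYTES = 10 * 10**15       # 10 PB of storage

seconds_to_read = DRIVE_BYTES / RATE          # time to stream one drive once
drives_for_10pb = TOTAL_BYTES // DRIVE_BYTES  # drive count at this capacity
```

At that rate a full drive takes about 6.8 hours to stream once, and 10 PB spans about 5,000 such drives, which is why rebuild and replication bandwidth dominates at this scale.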
Conclusion
Conclusion: next step
Links
I look forward to meeting you…
XVII European AFS Meeting 2010, Pilsen, Czech Republic
September 13-15
Who should attend:
Everyone interested in deploying a globally accessible file system
Everyone interested in learning more about real-world usage of Kerberos authentication in single-realm and federated single sign-on environments
Everyone who wants to share their knowledge and experience with other members of the AFS and Kerberos communities
Everyone who wants to find out the latest developments affecting AFS and Kerberos
More Info: http://afs2010.civ.zcu.cz/