GridFS
Targeting Data Sharing in Grid Environments

Marcelo Nery dos Santos / Renato Cerqueira
PUC-Rio, Brazil

Presented by: Francisco Silva

Motivation

• User-level file system infrastructure
  – Providing access to remote file systems
  – Having a simple configuration
    • No need for super-user privileges

• Lessening problems faced by CSBase, a framework for developing grid environments
  – Reducing NFS dependency
    • Facilitating deployment
  – Enabling useful file transfer metrics

Related Work

• Distributed File Systems (e.g. NFS/AFS)
  – Configuration overhead for system administrators
  – Local access to large files is not available

• Avaki Data Grid
  – Proprietary solution, no file transfer metrics

• Globus
  – GridFTP / Reliable File Transfer Service
    • Useful, but hard for novices to install
    • Oversized solution for simpler cases

GridFS - Characteristics

• Scalability, allowing a large number of files to be shared;

• Performance;

• Interoperability through the use of CORBA for remote access;

• Federative approach.

• Historical data about data transfers, which scheduling algorithms can use to choose an execution host for a task based on the estimated time and effort for data transfer;

• Metadata support that can store (field, value) tuples;

• Object Oriented Interface.
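The slides do not show how the historical transfer data feeds a scheduler. The following is a minimal sketch under stated assumptions, written in Python for brevity rather than in the Java used by GridFS: it keeps the observed throughput for each (source, destination) pair and picks the candidate host with the smallest estimated staging time. `TransferHistory`, `estimate_seconds` and `choose_execution_host` are hypothetical names, and averaging past rates is only one possible estimator; it models transfer time only, not the "effort" component.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List, Tuple


@dataclass
class TransferHistory:
    """Hypothetical store of observed rates (bytes/s) per (source, destination) pair."""
    rates: Dict[Tuple[str, str], List[float]] = field(default_factory=dict)

    def record(self, src: str, dst: str, nbytes: int, seconds: float) -> None:
        # Keep the throughput of one completed transfer.
        self.rates.setdefault((src, dst), []).append(nbytes / seconds)

    def estimate_seconds(self, src: str, dst: str, nbytes: int) -> float:
        # Estimate from the mean of past rates; pairs with no history count as infinitely slow.
        observed = self.rates.get((src, dst))
        if not observed:
            return float("inf")
        return nbytes / mean(observed)


def choose_execution_host(history: TransferHistory, data_host: str,
                          candidates: List[str], input_bytes: int) -> str:
    """Return the candidate with the smallest estimated time to stage the input data."""
    def cost(host: str) -> float:
        if host == data_host:
            return 0.0  # the data is already local, nothing to transfer
        return history.estimate_seconds(data_host, host, input_bytes)
    return min(candidates, key=cost)
```

For example, after recording 100 MB in 10 s to nodeA and 100 MB in 2 s to nodeB, a 500 MB input is estimated at 50 s and 10 s respectively, so nodeB is chosen.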

GridFS - Features

• Remote File System Access
  – List / Create / Delete files and directories
  – Read / Write operations over files
  – Retrieve file system free space

• General Operations
  – Metadata get / set operations
  – Copy files directly between servers
  – Add / Remove mount points
    • In order to allow a GridFS federation
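Mount points are what let several GridFS servers form a federation. Below is a minimal sketch of how one server might resolve a path against its mount table, assuming each mount maps a local path prefix to a (server, remote path) pair and that the longest matching prefix wins. `MountTable`, `add_mount`, `remove_mount_point` and `resolve` are hypothetical names that loosely mirror the `addMount` and `removeMountPoint` operations; this is not the GridFS code, and a plain (server, path) pair stands in for the `RemoteFile` target.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional, Tuple

Target = Tuple[str, PurePosixPath]  # (server name, path on that server)


class MountTable:
    """Hypothetical per-server mount table: local mount point -> remote target."""

    def __init__(self, server_name: str):
        self.server_name = server_name
        self.mounts: Dict[PurePosixPath, Target] = {}

    def add_mount(self, name: str, server: str, target: str) -> None:
        self.mounts[PurePosixPath(name)] = (server, PurePosixPath(target))

    def remove_mount_point(self, name: str) -> Optional[Target]:
        return self.mounts.pop(PurePosixPath(name), None)

    def resolve(self, path: str) -> Target:
        """Map a path to (server, path) using the longest mount point that prefixes it."""
        p = PurePosixPath(path)
        best: Optional[PurePosixPath] = None
        for mount in self.mounts:
            # Compare whole components so that /shared does not match /sharedfoo.
            if p.parts[:len(mount.parts)] == mount.parts:
                if best is None or len(mount.parts) > len(best.parts):
                    best = mount
        if best is None:
            return self.server_name, p  # not under any mount point: served locally
        server, root = self.mounts[best]
        return server, root.joinpath(*p.parts[len(best.parts):])
```

Resolution here stops after one hop; in a federation the returned server could hold mounts of its own, so a real request would be forwarded and resolved again there.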

CORBA IDL – RemoteFile

    interface RemoteFile {
      RemoteFile createDirectory(in Path name);
      RemoteFile createFile(in Path name);
      RemoteFile getChild(in Path name);
      FileSequence getChildren();
      boolean remove();
      ReadChannel getReadChannel();
      WriteChannel getWriteChannel();
      RandomAccessChannel getRandomAccessChannel();

      boolean copyTo(in RemoteFile dst, in string method);
      boolean addMount(in Path name, in RemoteFile target);
      RemoteFile removeMountPoint(in Path name);

      FileServer getFileServer();
      // continues...
    };
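To make the directory-related operations concrete, here is an in-memory Python model of part of `RemoteFile`. It is a sketch, not the CORBA servant: the `Path` type is assumed to be a single name component, channels, mounts and metadata are omitted, and `Node` and `RemoteFileError` are hypothetical names. `remove` returns `False` for non-empty directories, matching the stated GridFS limitation that only files and empty directories can be removed.

```python
from typing import Dict, List, Optional


class RemoteFileError(Exception):
    """Raised for invalid operations (hypothetical; the IDL's exceptions are not shown)."""


class Node:
    """In-memory stand-in for a RemoteFile: either a file or a directory."""

    def __init__(self, name: str, is_dir: bool, parent: Optional["Node"] = None):
        self.name = name
        self.is_dir = is_dir
        self.parent = parent
        self.children: Dict[str, "Node"] = {}  # stays empty for files

    def _create(self, name: str, is_dir: bool) -> "Node":
        if not self.is_dir:
            raise RemoteFileError(f"{self.name} is not a directory")
        if name in self.children:
            raise RemoteFileError(f"{name} already exists")
        child = Node(name, is_dir, parent=self)
        self.children[name] = child
        return child

    def create_directory(self, name: str) -> "Node":
        return self._create(name, True)

    def create_file(self, name: str) -> "Node":
        return self._create(name, False)

    def get_child(self, name: str) -> Optional["Node"]:
        return self.children.get(name)

    def get_children(self) -> List["Node"]:
        return list(self.children.values())

    def remove(self) -> bool:
        # Only leaves can be removed: files or empty directories. The root cannot.
        if self.parent is None or (self.is_dir and self.children):
            return False
        del self.parent.children[self.name]
        self.parent = None
        return True
```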

GridFS – Data Accessibility

• Remote Access
  – Through CORBA remote invocations
    • Allows read/write access
  – By mounting a GridFS on the local file system using FUSE
    • Allows use of legacy applications

• File Transfer Operations
  – Several implementation methods
    • Java NIO / CORBA / FTP
  – New methods/protocols can be easily added
  – Performance evaluation
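Since `copyTo` takes the transfer method as a string, adding a protocol can be as simple as registering a new handler under a new name. The sketch below assumes such a registry and records (method, bytes, seconds) for every transfer, the kind of data that historical transfer metrics rely on. Only a hypothetical `"local"` method (a plain file copy) is implemented; NIO, CORBA or FTP handlers would be registered the same way. `register_method`, `copy_to` and `TRANSFER_LOG` are illustrative names, not part of GridFS.

```python
import os
import shutil
import time
from typing import Callable, Dict, List, Tuple

# A transfer method copies src to dst and returns the number of bytes copied.
TransferMethod = Callable[[str, str], int]

_METHODS: Dict[str, TransferMethod] = {}          # method name -> handler
TRANSFER_LOG: List[Tuple[str, int, float]] = []   # (method, bytes, seconds) per transfer


def register_method(name: str) -> Callable[[TransferMethod], TransferMethod]:
    """Decorator that makes a handler available under the given method name."""
    def wrap(fn: TransferMethod) -> TransferMethod:
        _METHODS[name] = fn
        return fn
    return wrap


@register_method("local")
def local_copy(src: str, dst: str) -> int:
    # Stand-in for NIO, CORBA or FTP: a plain copy between paths on one machine.
    shutil.copyfile(src, dst)
    return os.path.getsize(dst)


def copy_to(src: str, dst: str, method: str) -> bool:
    """Dispatch on the method name and log the transfer; False for unknown methods."""
    handler = _METHODS.get(method)
    if handler is None:
        return False
    start = time.perf_counter()
    nbytes = handler(src, dst)
    TRANSFER_LOG.append((method, nbytes, time.perf_counter() - start))
    return True
```

Keeping the dispatch keyed by name means callers only change the `method` argument to compare protocols, which is also what a performance evaluation across methods needs.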

Implementation Aspects

• CORBA
  – Interoperability
  – Scalability
    • POA policies (RootPOA, DefaultServant)

• Java
  – Portability
  – Performance issues
    • Use of NIO allows performance similar to FTP (transfer rate, CPU, load)

GridFS - Limitations

• Remove operations only on leaves
  – Files or empty directories

• No lock mechanism
  – Several writers may access the same file (Unix-like semantics)

• Single user
  – No users, groups or permissions

• Caching
  – No caching policies implemented

Limits Tested

• Simultaneous file transfer operations
  – NIO (96 - 192, independently of the method used)
  – FTP (50, PureFTPd server limit)
  – CORBA (80 - 480, 80 threads dealing with 480 operations)

• Performance
  – NIO and FTP: limited by IDE disk speed (Gigabit network)
  – CORBA: limited by disk speed and round-trip time
  – FUSE: 1.5 MB/s (naive implementation)

• Remote Access Channels
  – 1000 (operating system file descriptor limit)

CSBase

• Infrastructure for remote algorithm execution

• GridFS used to implement the CSFS Daemon

• Files are copied to the execution host or accessed remotely through NFS

• The CSBase server controls file transfer operations from the Data Repository to the Execution Hosts

• CSFS Daemons make local file systems accessible

CSBase: Algorithm Execution

1. User requests an algorithm execution
2. CSBase server creates an object to handle the request
3. This object verifies whether the selected execution host has access to the binaries and data files (using CSFS to copy files, if necessary)
4. CSBase server starts a command on that host using the Node Daemon
5. When the command finishes, the modified files are synchronized back to the repository
6. A clean-up procedure is invoked
7. Client is notified of command completion
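The execution steps above amount to a staging workflow. The sketch below models file systems as in-memory dictionaries and covers steps 3 to 7: it copies only the files missing on the execution host, runs the command, synchronizes new or modified files back to the repository, cleans up, and notifies the client. `run_algorithm` is a hypothetical name, and what the clean-up removes (here, the staged inputs and any newly created outputs) is an assumption, since the slide does not specify it.

```python
from typing import Callable, Dict, List

Files = Dict[str, bytes]  # file name -> contents (in-memory stand-in for a file system)


def run_algorithm(repository: Files, host: Files, needed: List[str],
                  command: Callable[[Files], None],
                  notify: Callable[[str], None]) -> List[str]:
    """Hypothetical sketch of steps 3-7: stage, execute, synchronize back, clean up, notify."""
    # Step 3: copy only the files the execution host does not already have.
    staged = [name for name in needed if name not in host]
    for name in staged:
        host[name] = repository[name]
    before = dict(host)

    # Step 4: run the command on the host (here, a function that edits host files).
    command(host)

    # Step 5: synchronize new or modified files back to the repository.
    changed = sorted(name for name, data in host.items() if before.get(name) != data)
    for name in changed:
        repository[name] = host[name]

    # Step 6: remove staged inputs and newly created outputs (assumed clean-up policy).
    for name in staged + [n for n in host if n not in before]:
        host.pop(name, None)

    # Step 7: notify the client.
    notify(f"done: {len(changed)} file(s) synchronized")
    return changed
```

Files already present on the host (such as a previously installed binary) are neither copied nor removed, which is the point of checking before staging.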

Main Contributions

• A file server that:
  – Is scalable, portable, interoperable
  – Has reasonable performance
  – Combines the benefits of different approaches
    • Remote File Access
    • File Staging
  – Offers special features for Grid Computing (estimated transfer cost and file copy to the local system)

Future Work

• Notification Mechanism
  – Enabling the implementation of caching policies and online remote-tree visualization for GUIs

• Users and security issues
  – In order to guarantee data integrity and confidentiality

• Index and search capabilities
  – Over the stored metadata

Questions?

[Figures not preserved. Chart legend: (x*y), where x is the number of client machines and y the number of threads.]
