Date posted: 16-Feb-2017
Category: Engineering
Uploaded by: anamika-vinod
Liquid: A Scalable Deduplication File System for Virtual Machine Images
Anamika G V (12143630), S7 CSA
College of Engineering, Cherthala
August 8, 2016
Guided by Josna Jose
Assistant Professor, Computer Science & Engineering
CONTENTS
1 INTRODUCTION
2 VIRTUAL MACHINE
3 DEDUPLICATION
4 EXISTING SYSTEM
5 ISSUES IN VM STORAGE
6 LIQUID SYSTEM ARCHITECTURE
7 DEDUPLICATION IN LIQUID
8 OPTIMIZATIONS ON FINGERPRINT CALCULATION
9 FILE SYSTEM LAYOUT
10 COMMUNICATION AMONG COMPONENTS: HEARTBEAT PROTOCOL
11 FAST CLONING FOR VM IMAGES
12 FAULT TOLERANCE
13 GARBAGE COLLECTION
14 ADVANTAGES OF LIQUID
15 CONCLUSION
16 REFERENCES
INTRODUCTION
Cloud computing is the practice of using remote servers hosted on the internet to store, manage, and process data, rather than a local server or a personal computer.
Figure 1: A sample cloud computing network
VIRTUALIZATION AND VIRTUAL MACHINE
1 Virtualization deals with extending or replacing an existing interface so as to mimic the behavior of another system.
2 A virtual machine is software that creates a virtualized environment between the computer platform and the end user, in which the end user can operate software.
3 It is a crucial component in cloud computing.
4 A virtual machine is a hypothetical computer.
5 It executes programs like a physical machine.
6 The initial state of a virtual machine is stored in a file called a virtual machine image.
VIRTUAL MACHINE
Figure 2: Virtual machine representation
DEDUPLICATION
1 Data deduplication is a data compression technology.
2 It eliminates duplicate copies of repeating data.
3 Redundant data blocks are replaced with references to a single stored copy, which avoids the storage consumption problems of a large number of VM images.
4 It improves storage utilization.
Figure 3: Deduplicated file system
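
The idea of replacing redundant blocks with references can be sketched as a toy in-memory store. This is only an illustration, not Liquid's implementation: the class name `DedupStore`, the use of SHA-1 as the fingerprint, and the sample image contents are all assumptions made for the example.

```python
import hashlib


class DedupStore:
    """Toy in-memory store that keeps one copy of each distinct block."""

    def __init__(self):
        self.blocks = {}    # fingerprint -> block contents (stored once)
        self.refcount = {}  # fingerprint -> number of references to the block

    def put(self, block: bytes) -> str:
        """Store a block if it is new and return its fingerprint."""
        fp = hashlib.sha1(block).hexdigest()
        if fp not in self.blocks:
            self.blocks[fp] = block
        self.refcount[fp] = self.refcount.get(fp, 0) + 1
        return fp

    def get(self, fp: str) -> bytes:
        return self.blocks[fp]


store = DedupStore()
# Two hypothetical VM images that share their first two blocks.
image_a = [b"boot" * 1024, b"libc" * 1024, b"app1" * 1024]
image_b = [b"boot" * 1024, b"libc" * 1024, b"app2" * 1024]

# Each image is represented only by its list of fingerprints.
meta_a = [store.put(block) for block in image_a]
meta_b = [store.put(block) for block in image_b]

logical_blocks = len(image_a) + len(image_b)  # blocks written by clients
physical_blocks = len(store.blocks)           # blocks actually stored
```

Here six logical blocks occupy only four physical blocks, because the two shared blocks are stored once and referenced twice.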
EXISTING SYSTEM
1 Hypervisors such as Xen, KVM etc.
2 Network Attached Storage (NAS)
3 Storage Area Network (SAN)
4 Direct Attached Storage (DAS)
ISSUES IN VM STORAGE
1 High demand on VM storage remains a challenging problem.
2 Existing systems have made efforts to reduce storage consumption.
3 They use a SAN cluster.
4 They cannot satisfy the increasing demand due to cost limitations.
5 Hence we propose LIQUID.
LIQUID SYSTEM ARCHITECTURE
1 Three components: a single meta server with a hot backup, multiple data servers, and multiple clients.
2 Each component runs as a user-level service process.
3 VM images are split into fixed-size data blocks.
4 The meta server maintains the namespace, fingerprints, and reference counts.
5 The meta server is mirrored to a hot backup shadow meta server.
6 Data servers are in charge of managing the data blocks in VM images.
7 They are organized in a distributed hash table.
8 A Liquid client provides a POSIX-compatible file system.
9 The client is a critical component: it provides deduplication.
10 Fault tolerance: the meta server is mirrored.
11 Replicas of data blocks are stored.
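
One way to picture data servers organized as a distributed hash table with replicated blocks is the sketch below. The placement rule (fingerprint modulo the number of servers, with replicas on the following servers) and the names `place_block` and `ds0`..`ds3` are assumptions for illustration; the slides do not specify how Liquid actually maps blocks to servers.

```python
import hashlib


def place_block(fingerprint: str, servers: list[str], replicas: int = 2) -> list[str]:
    """Pick `replicas` distinct data servers for a block from its fingerprint.

    Assumed rule: the fingerprint (a hex string) selects a starting server,
    and the replicas go to the servers that follow it in the list.
    """
    if not 1 <= replicas <= len(servers):
        raise ValueError("replicas must be between 1 and the number of servers")
    start = int(fingerprint, 16) % len(servers)
    return [servers[(start + i) % len(servers)] for i in range(replicas)]


servers = ["ds0", "ds1", "ds2", "ds3"]  # hypothetical data server names
fp = hashlib.sha1(b"some block contents").hexdigest()
targets = place_block(fp, servers)
```

Because the placement depends only on the fingerprint, any client can compute where a block lives without asking the meta server.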
LIQUID SYSTEM ARCHITECTURE (CONT.)
Figure 4: Liquid architecture
DEDUPLICATION IN LIQUID
1 Liquid chooses fixed-size chunking instead of variable-size chunking.
2 This works better because all files stored in VM images are aligned on disk block boundaries.
3 Advantage: simplicity.
4 Block size choice.
5 Block size is a balancing factor that is hard to choose; it has a great impact on both deduplication ratio and IO performance.
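
Fixed-size chunking can be sketched in a few lines. The helper name `split_fixed` and the 4096-byte demo block size are assumptions chosen to keep the example small; they are not Liquid's settings.

```python
def split_fixed(data: bytes, block_size: int) -> list[bytes]:
    """Split data into consecutive blocks of block_size bytes (last may be shorter)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


BLOCK = 4096  # small demo block size
original = b"A" * BLOCK + b"B" * BLOCK + b"C" * BLOCK

# Overwrite one byte in place, as a guest file system would when it updates
# a block-aligned file inside the image.
modified = original[:BLOCK] + b"X" + original[BLOCK + 1:]

# Indices of the blocks whose contents differ between the two images.
changed = [
    i
    for i, (old, new) in enumerate(zip(split_fixed(original, BLOCK),
                                       split_fixed(modified, BLOCK)))
    if old != new
]
```

Because the write happens in place, block boundaries do not shift and only the block containing the modified byte differs, so the other blocks still deduplicate.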
DEDUPLICATION IN LIQUID (CONT.)
1 A smaller block size causes more random seeks when accessing a VM image.
2 This is not tolerable.
3 A large block size is also not preferable, since it reduces the deduplication ratio.
4 Liquid chooses different block sizes under different situations.
5 It is advised to use a multiple of 4 KB between 256 KB and 1 MB to achieve a good balance between IO performance and deduplication ratio.
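
The block size advice above can be expressed as a small check. The function names are hypothetical, and the block count is shown only to illustrate why smaller blocks mean more pieces to locate per image.

```python
KB = 1024
MIN_BLOCK = 256 * KB   # lower end of the advised range
MAX_BLOCK = 1024 * KB  # upper end of the advised range (1 MB)


def is_recommended_block_size(size: int) -> bool:
    """True if size is a multiple of 4 KB between 256 KB and 1 MB inclusive."""
    return MIN_BLOCK <= size <= MAX_BLOCK and size % (4 * KB) == 0


def blocks_per_image(image_size: int, block_size: int) -> int:
    """Number of fixed-size blocks needed to hold an image (ceiling division)."""
    return -(-image_size // block_size)


GB = 1024 * 1024 * KB
small_block_count = blocks_per_image(1 * GB, 256 * KB)   # more, smaller blocks
large_block_count = blocks_per_image(1 * GB, 1024 * KB)  # fewer, larger blocks
```

A 1 GB image needs four times as many blocks at 256 KB as at 1 MB, which is the IO side of the trade-off; the deduplication side depends on the image contents and is not modeled here.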
DEDUPLICATION IN LIQUID (CONT.)
Figure 5: Pictorial representation of deduplication
OPTIMIZATIONS ON FINGERPRINT CALCULATION
∗ Deduplication relies on comparison of data block fingerprints to detect redundancy.
∗ A fingerprint is a collision-resistant hash value calculated from the data block contents.
∗ MD5 [26] and SHA-1 [12] are frequently used for this purpose.
∗ The probability of a fingerprint collision is very small, orders of magnitude smaller than hardware error rates.
∗ So we can safely assume that two data blocks with the same fingerprint are identical.
∗ Fingerprint calculation is expensive.
∗ Liquid delays fingerprint calculation for recently modified data blocks.
∗ It runs deduplication lazily, only when necessary.
∗ The client side maintains a shared cache, which contains recently accessed data blocks.
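
As a rough illustration of fingerprinting, the sketch below hashes block contents with Python's standard `hashlib` and estimates the collision probability with the usual birthday-bound approximation n² / 2^(b+1). The function names and the example figure of 2^40 stored blocks are assumptions, not numbers from the Liquid paper.

```python
import hashlib


def fingerprint(block: bytes, algorithm: str = "sha1") -> str:
    """Return the hex digest of a block; blocks with equal digests are treated as equal."""
    return hashlib.new(algorithm, block).hexdigest()


def collision_bound(num_blocks: int, hash_bits: int) -> float:
    """Birthday-bound approximation of the chance that any two blocks collide."""
    return num_blocks * num_blocks / 2 ** (hash_bits + 1)


block_a = b"\x00" * 4096
block_b = b"\x00" * 4096         # same contents as block_a
block_c = b"\x00" * 4095 + b"\x01"  # differs in the last byte

same = fingerprint(block_a) == fingerprint(block_b)
different = fingerprint(block_a) != fingerprint(block_c)

# Hypothetical store of 2**40 blocks with a 160-bit fingerprint (SHA-1 size).
p = collision_bound(2 ** 40, 160)
```

Under these assumptions the bound works out to 2^-81, which is why comparing fingerprints instead of full block contents is considered safe in practice.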
OPTIMIZATIONS ON FINGERPRINT CALCULATION (CONT.)
∗ A portion of memory is used by the client side of Liquid as a private cache.
∗ The private cache holds modified data blocks and delays fingerprint calculation on them.
∗ When a data block is modified, it is ejected from the shared cache and added to the private cache.
∗ Modified data blocks are ejected when the private cache becomes full.
∗ They are ejected based on an LRU policy.
∗ Only then is the modified data block's fingerprint calculated.
∗ Liquid uses multiple threads for fingerprint calculation.
∗ Multiple threads process different data blocks concurrently.
∗ This provides good IO performance.
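
The private cache behaviour described above might look roughly like the following sketch. The class `PrivateCache`, its capacity of two blocks, and the `flush_all` helper (standing in for what happens when remaining blocks must be written out) are assumptions for illustration; the shared cache and the actual write path are left out.

```python
import hashlib
from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor


class PrivateCache:
    """Holds modified blocks and fingerprints them only when they leave the cache."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.dirty = OrderedDict()  # block id -> contents, least recently used first
        self.flushed = {}           # block id -> fingerprint computed on ejection

    def write(self, block_id: int, data: bytes) -> None:
        self.dirty[block_id] = data
        self.dirty.move_to_end(block_id)  # mark as most recently used
        while len(self.dirty) > self.capacity:
            old_id, old_data = self.dirty.popitem(last=False)  # LRU victim
            self.flushed[old_id] = hashlib.sha1(old_data).hexdigest()

    def flush_all(self, workers: int = 4) -> None:
        """Fingerprint all remaining dirty blocks using several threads."""
        items = list(self.dirty.items())
        with ThreadPoolExecutor(max_workers=workers) as pool:
            digests = pool.map(lambda kv: hashlib.sha1(kv[1]).hexdigest(), items)
            for (block_id, _), digest in zip(items, digests):
                self.flushed[block_id] = digest
        self.dirty.clear()


cache = PrivateCache(capacity=2)
cache.write(1, b"a")
cache.write(2, b"b")
cache.write(1, b"a2")  # rewriting block 1 costs no hashing at all
cache.write(3, b"c")   # cache is full: block 2 (least recently used) is ejected
dirty_before_flush = list(cache.dirty)
flushed_before_flush = set(cache.flushed)
cache.flush_all()
```

Block 1 is written twice but hashed only once, on its final contents, which is the saving that delayed fingerprint calculation is meant to provide.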
FILE SYSTEM LAYOUT
1 All file system metadata is stored on the meta server.
2 It is organized in a file system tree.
3 The client side can cache portions of the file system metadata for fast access.
4 When a VM is stopped, modified metadata and data blocks are pushed back to the meta server and data servers.
5 This ensures that modifications to the VM image are visible to other client nodes.
FILE SYSTEM LAYOUT (CONT.)
Figure 6: File system structure
COMMUNICATION AMONG COMPONENTS: HEARTBEAT PROTOCOL
1 The meta server manages all data servers.
2 It exchanges a regular heartbeat message with each data server in a round-robin fashion.
3 This approach is slow to detect failed data servers when there are many data servers.
4 To speed up failure detection, data servers send an error signal to the meta server.
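
The round-robin heartbeat and the error-signal shortcut can be simulated without any networking. In this sketch the `probe` callable stands in for a real heartbeat exchange, and the class and method names are hypothetical.

```python
from itertools import cycle


class MetaServer:
    """Probes one data server per heartbeat tick, in round-robin order."""

    def __init__(self, data_servers, probe):
        self.servers = list(data_servers)
        self.probe = probe  # callable: server name -> True if it answers
        self.failed = set()
        self._ring = cycle(self.servers)

    def heartbeat_once(self) -> str:
        """Probe the next server in the ring and record it if it is down."""
        server = next(self._ring)
        if not self.probe(server):
            self.failed.add(server)
        return server

    def report_error(self, server: str) -> None:
        """Error signal from a peer: probe the suspect server immediately."""
        if not self.probe(server):
            self.failed.add(server)


alive = {"ds0": True, "ds1": True, "ds2": False}  # ds2 has crashed
meta = MetaServer(alive, lambda name: alive[name])

first_probed = meta.heartbeat_once()  # regular tick reaches ds0 only
failed_after_tick = set(meta.failed)
meta.report_error("ds2")              # a peer noticed ds2 is unreachable
failed_after_signal = set(meta.failed)
```

With only the round-robin ticks, the failure of `ds2` would go unnoticed until its turn comes around; the error signal lets the meta server find it right away.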
FAST CLONING FOR VM IMAGES
1 Copying large images may be time consuming.
2 Liquid provides an efficient solution by means of fast cloning.
3 VM images are represented by metadata files that hold references to data blocks.
4 A clone of a VM image is created by copying the metadata file and updating the reference counts.
5 Modifications to a cloned image do not affect the original image.
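
Fast cloning can be sketched as copying a list of fingerprints and incrementing reference counts. The `MetaStore` class, its method names, and the placeholder fingerprint strings are assumptions for illustration only.

```python
class MetaStore:
    """Toy metadata store: images are lists of block fingerprints."""

    def __init__(self):
        self.images = {}    # image name -> list of block fingerprints
        self.refcount = {}  # fingerprint -> number of references

    def add_image(self, name, fingerprints):
        self.images[name] = list(fingerprints)
        for fp in fingerprints:
            self.refcount[fp] = self.refcount.get(fp, 0) + 1

    def clone(self, src, dst):
        """Clone by copying metadata only; no data block is copied."""
        self.add_image(dst, self.images[src])

    def write_block(self, name, index, new_fp):
        """Point one block of an image at new contents, leaving other images alone."""
        old_fp = self.images[name][index]
        self.refcount[old_fp] -= 1
        self.images[name][index] = new_fp
        self.refcount[new_fp] = self.refcount.get(new_fp, 0) + 1


meta = MetaStore()
meta.add_image("base", ["fp-boot", "fp-libc", "fp-app"])  # placeholder fingerprints
meta.clone("base", "clone")
refs_after_clone = dict(meta.refcount)

meta.write_block("clone", 2, "fp-app-v2")  # modify only the clone
```

The cost of the clone is proportional to the size of the metadata, not the image, and the write to the clone leaves the original image's block list untouched.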
FAULT TOLERANCE
1 Data replication
2 Data migration
3 Hot backup of Meta server
GARBAGE COLLECTION
1 Removes unused garbage data blocks when running out of space.
2 Reference counts of all data blocks are maintained by the meta server.
3 Garbage collection request is issued periodically to data server.
4 Garbage collection is executed based on data block membership in the Bloom filter.
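
The sketch below shows one way Bloom-filter-based garbage collection could work, assuming the filter summarizes the fingerprints that still have references and a data server drops every stored block that is absent from it. The slides do not say who builds the filter or how it is sized, so the `BloomFilter` class, its parameters, and `collect_garbage` are all illustrative assumptions.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: str):
        # Derive several bit positions by salting the item with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha1(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def collect_garbage(stored: dict, live_filter: BloomFilter) -> list:
    """Delete stored blocks whose fingerprint is not in the filter."""
    doomed = [fp for fp in stored if fp not in live_filter]
    for fp in doomed:
        del stored[fp]
    return doomed


live = ["fp-boot", "fp-libc"]  # placeholder fingerprints still referenced
live_filter = BloomFilter()
for fp in live:
    live_filter.add(fp)

stored = {"fp-boot": b"...", "fp-libc": b"...", "fp-old": b"..."}
before = set(stored)
removed = collect_garbage(stored, live_filter)
```

A false positive only means a garbage block survives until a later collection; a referenced block is never deleted, because a Bloom filter has no false negatives.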
ADVANTAGES OF LIQUID
1 Fast virtual machine deployment with peer-to-peer data transfer.
2 Low storage consumption by means of deduplication.
3 Instant cloning for virtual machine images.
4 On-demand fetching through the network, with caching on local disks.
5 LIQUID files have no specific limit.
CONCLUSION
1 Presented LIQUID, a deduplication file system with good IO performance.
2 This is achieved by caching frequently accessed data blocks in a memory cache.
3 Avoids additional disk operations.
4 Deduplication of VM images proved to be effective.
REFERENCES
[1] Xun Zhao, Yang Zhang, Yongwei Wu, Kang Chen, Jinlei Jiang, and Keqin Li, Senior Member, IEEE, "Liquid: A Scalable Deduplication File System for Virtual Machine Images," IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 5, May 2014.
[2] Amazon Machine Image, Sept. 2001. [Online]. Available: http://en.wikipedia.org/wiki/Amazon/Machine/Image
[3] Bittorrent Protocol, Sept. 2011. [Online]. Available: http://en.wikipedia.org/wiki/BitTorrent/protocol
[4] Bloom Filter, Sept. 2011. [Online]. Available: http://en.wikipedia.org/wiki/Bloom/filter
[5] Xfs: A High Performance Journaling Filesystem, Sept. 2011. [Online]. Available: http://oss.sgi.com/projects/xfs
[6] Rabin Fingerprint, Sept. 2011. [Online]. Available: http://en.wikipedia.org/wiki/Rabin/fingerprint
[7] Data Deduplication, Sept. 2013. [Online]. Available: http://en.wikipedia.org/wiki/Data/deduplication
[8] A.T. Clements, I. Ahmad, M. Vilayannur, and J. Li, "Decentralized Deduplication in San Cluster File Systems," in Proc. Conf. USENIX Annu. Techn. Conf., 2009, p. 8, USENIX Association.
[9] K. Jin and E.L. Miller, "The Effectiveness of Deduplication on Virtual Machine Disk Images," in Proc. SYSTOR, Israeli Exp. Syst. Conf., New York, NY, USA, 2009, pp. 1-12.
[10] A. Liguori and E. Hensbergen, "Experiences with Content Addressable Storage and Virtual Disks," in Proc. WIOV08, San Diego, CA, USA, 2008, p. 5.
[11] M. McLoughlin, The qcow2 Image Format, Sept. 2011. [Online]. Available: http://people.gnome.org/markmc/qcow-image-format.html
[12] C. Tang, "Fvd: A High-Performance Virtual Machine Image Format for Cloud," in Proc. USENIX Conf. USENIX Annu. Tech. Conf., 2011, p. 18.
[13] B. Zhu, K. Li, and H. Patterson, "Avoiding the Disk Bottleneck in the Data Domain Deduplication File System," in Proc. 6th USENIX Conf. FAST, Berkeley, CA, USA, 2008, pp. 269-282.