
2013 International Conference on Computer Communication and Informatics (ICCCI -2013), Jan. 04 - 06, 2013, Coimbatore, INDIA

A Novel Indexing Scheme for Efficient Handling of Small Files in Hadoop Distributed File System

Chandrasekar S, Dakshinamurthy R, Seshakumar P G, Prabavathy B, Chitra Babu
Department of Computer Science and Engineering
SSN College of Engineering, Kalavakkam, Tamilnadu, India
Email: chandrasekar8027, sesha8095, [email protected], [email protected], [email protected]

Abstract- Hadoop Distributed File System (HDFS) is designed for the reliable storage and management of very large files. All the files in HDFS are managed by a single server, the NameNode, which stores metadata in its main memory for each file stored into HDFS. As a consequence, HDFS suffers a performance penalty as the number of small files grows: storing and managing a large number of small files imposes a heavy burden on the NameNode, and the number of files that can be stored into HDFS is constrained by the size of the NameNode's main memory. Further, HDFS does not take the correlation among files into account, nor does it provide any prefetching mechanism to improve I/O performance. To improve the efficiency of storing and accessing small files on HDFS, we propose a solution based on the work of Dong et al., namely the Extended Hadoop Distributed File System (EHDFS). In this approach, a set of correlated files, as identified by the client, is combined into a single large file to reduce the file count. An indexing mechanism is provided to access the individual files within the corresponding combined file, and index prefetching further improves I/O performance and minimizes the load on the NameNode. The experimental results indicate that EHDFS reduces the metadata footprint in the NameNode's main memory by 16% and also improves the efficiency of storing and accessing a large number of small files.

Index Terms- hadoop distributed file system, small file, extended hdfs, file correlation, indexing, prefetching.

I. INTRODUCTION

Hadoop [7][5] is an open source project that develops software for reliable and scalable distributed computing. The Hadoop framework has been widely deployed on clusters to build large scale, high performance systems. The Hadoop architecture consists of the Hadoop Distributed File System and a programming model, MapReduce, for performing data intensive computations on a cluster of commodity computers. A Hadoop cluster scales computation capacity, storage capacity and I/O bandwidth simply by adding commodity machines. The Hadoop Distributed File System (HDFS) [1][4] is the flagship file system component of Hadoop. Inspired by the design of the proprietary Google File System (GFS) [3], HDFS follows the pattern of write once and read many times [6]. HDFS has a master-slave architecture, with a single master called the NameNode and multiple slaves called DataNodes. The NameNode manages the metadata and the file system configuration data within HDFS. The metadata is maintained in the main memory of the NameNode to ensure fast access for clients on read/write requests. DataNodes store files and service read/write requests on them, as directed by the NameNode. Files stored into HDFS are replicated across a configurable number of DataNodes to ensure reliability and data availability, and these replicas are distributed across the cluster to enable rapid computation.

Storing a large number of small files into HDFS becomes an overhead in terms of the memory consumed by metadata in the NameNode. In such scenarios, the single NameNode becomes a bottleneck for handling metadata requests whenever an application accesses a large set of these small files. Further, the size of the NameNode's main memory restricts the number of files that can be stored into HDFS. This prevents HDFS from being used as a primary data store for scientific and many other applications that produce large numbers of small files and could benefit from the data processing capabilities of Hadoop.

In this paper, we modify HDFS to reduce the metadata footprint in the NameNode's main memory. This requires an efficient way of storing small files into HDFS. The basic approach is to combine correlated small files, as identified by the client, into a single large file, which reduces the file count and thereby the metadata stored. An indexing mechanism has been built to access the individual files within the combined file. Further, a prefetching mechanism for small files, based on file correlations, helps decrease the metadata request load on the NameNode. The index for correlated files is also prefetched and cached at the client to achieve better performance on read requests.

The rest of this paper is organized as follows: Section II discusses the background on HDFS, the small files problem, and the existing work that addresses this problem; Section III describes the proposed system architecture and the modules that realize the changes; Section IV discusses the experimental results and evaluation; Section V reviews related work; Section VI concludes and outlines possible future directions.

II. BACKGROUND

Hadoop is an open source framework that provides distributed storage and data processing capabilities to data intensive applications. It consists of two major components: the Hadoop Distributed File System (HDFS) for distributed storage, and MapReduce for distributed computation.

A. Hadoop Distributed File System

The Hadoop Distributed File System provides global access to files in the cluster [8]. HDFS consists of two services, the NameNode and the DataNode. The NameNode is a centralized, single server responsible for maintaining the metadata for the files inside HDFS. It also maintains configuration data such as the number of replicas for each block of a file (the replication factor), the block size, and other HDFS parameters, as well as the directory tree structure for the files in the filesystem. The DataNodes store the files in the form of blocks on behalf of the client. Every block is stored as a separate file in the node's local filesystem. As DataNodes abstract away the details of the underlying filesystem, the nodes need not be identical in their features. A DataNode is responsible for storing, retrieving and deleting blocks at the request of the NameNode. Files in HDFS are divided into blocks, with a default block size of 64 MB, and each block is replicated and stored in multiple DataNodes. The NameNode maintains, in its main memory, the metadata for each file stored into HDFS, including a mapping between stored file names, the corresponding blocks of each file, and the DataNodes that host these blocks. Hence, every request by a client to create, write, read or delete a file passes through the NameNode, which uses the stored metadata to direct each request to the appropriate set of DataNodes. The client then communicates directly with the DataNodes to perform file operations. The single NameNode, storing all metadata in its main memory, becomes a bottleneck when it has to handle a massive number of small files. A small file is any file whose size is significantly smaller than the HDFS block size. This issue, known as the small files problem [9], prevents many potential applications from reaping the benefits of the Hadoop framework.

B. Small Files Problem

The NameNode stores the entire metadata in main memory for fast and efficient servicing of client requests, and metadata is stored for each block of a file. When a file larger than the block size is stored, the amount of metadata is justified by the file size. In contrast, when a large number of small files, each smaller than the block size, are stored, each file occupies its own block, so the corresponding metadata is considerably larger. For example, assume that the in-memory metadata for each block of a file takes up about 150 bytes. For a 1 GB file divided into sixteen 64 MB blocks, 2.4 KB of metadata is stored, whereas for 10,500 files of 100 KB each (about 1 GB in total), about 1.5 MB of metadata is stored [9]. Thus, a large number of small files occupies little space in the file system but consumes a significant amount of the NameNode's main memory, resulting in inefficient use of the available cluster capacity. Further, accessing a large number of these files makes the NameNode a bottleneck. This prevents optimal usage of HDFS for various applications.
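To make the arithmetic above concrete, the following minimal sketch reproduces both estimates. The class and method names are illustrative, and the 150-bytes-per-block figure is the approximation cited in [9]; actual NameNode accounting depends on the HDFS version.

    // Rough estimate of NameNode heap consumed by per-block metadata,
    // assuming ~150 bytes per block entry as cited above.
    public class MetadataEstimate {
        static final long BYTES_PER_BLOCK_ENTRY = 150;    // assumption from [9]
        static final long BLOCK_SIZE = 64L * 1024 * 1024; // 64 MB default

        // One large file: blocks = ceil(fileSize / blockSize)
        static long forLargeFile(long fileSize) {
            long blocks = (fileSize + BLOCK_SIZE - 1) / BLOCK_SIZE;
            return blocks * BYTES_PER_BLOCK_ENTRY; // 1 GB -> 16 * 150 = 2.4 KB
        }

        // Many small files: each file occupies its own block
        static long forSmallFiles(long fileCount) {
            return fileCount * BYTES_PER_BLOCK_ENTRY; // 10,500 -> ~1.5 MB
        }

        public static void main(String[] args) {
            System.out.println(forLargeFile(1L << 30)); // prints 2400
            System.out.println(forSmallFiles(10_500));  // prints 1575000
        }
    }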

III. DESIGN

The proposed solution extends HDFS and has been named Extended HDFS (EHDFS). Although this solution is influenced by the one proposed by Dong et al., EHDFS provides an improved indexing mechanism and prefetching of index information. EHDFS employs four techniques that play an important role in improving the efficiency with which HDFS handles small files: file merging, file mapping, prefetching and file extraction. The overall system architecture, depicting the placement of the modules that handle these operations, is shown in Figure 1.

Figure 1. System Architecture

The following sections describe these techniques in detail.

A. File Merging

The main problem with small files in HDFS is the amount of memory used by the NameNode to manage these files. HDFS does not differentiate a small file from a large file, and hence stores the same amount of metadata regardless of the file size. The NameNode maintains two types of metadata for every file in HDFS: file metadata and block metadata. File metadata comprises information about the file, such as its name, location in the namespace tree, size, modification time, access time, ownership details and permissions. Block metadata comprises the list of blocks that hold the file data and the locations of these blocks. The file merging technique reduces both the file metadata and the block metadata maintained by the NameNode for small files. It capitalizes on the fact that most of the file metadata remains the same for all the files whenever a user groups them together and uploads them to HDFS. File merging ensures that the NameNode maintains metadata only for the combined file and not for all the small files contained in it. The names of the constituent files and their block information are maintained in a special data structure in the NameNode.

The file merging process is carried out on the client side while creating a file using EHDFS. While creating a combined file, the client specifies the names of the small files and the data associated with each file. This data is buffered on the client side until no more file data can be added without exceeding the HDFS block size, which ensures that no small file is split across blocks. Along with the file data, an index table is placed at the beginning of each block. This table contains an entry for each small file that is part of the block, and every entry is an (offset, length) pair: for the i-th file in the block, the i-th entry specifies the offset of the first byte of that small file from the beginning of the block, and the length of the small file in bytes. The information in the index table can thus be used to identify the offsets that mark the start and end of the corresponding file. The structure of the resulting block, called an extended block, is depicted in Figure 2.

Figure 2. Structure of an extended block
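As a minimal illustration of this layout, an extended block could be serialized as below. The class name and the fixed two-longs-per-entry encoding are assumptions for the sketch; the paper does not specify the on-disk format.

    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.util.List;

    // Sketch of an extended block: an index table of (offset, length)
    // pairs followed by the concatenated small-file payloads.
    public class ExtendedBlockWriter {
        public static byte[] build(List<byte[]> smallFiles) throws IOException {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            DataOutputStream data = new DataOutputStream(out);
            // Offsets are measured from the beginning of the block and
            // therefore include the size of the table itself.
            long tableSize = smallFiles.size() * 16L; // two 8-byte fields per entry
            long offset = tableSize;
            for (byte[] f : smallFiles) {       // write the index table
                data.writeLong(offset);         // offset of the file's first byte
                data.writeLong(f.length);       // length of the file in bytes
                offset += f.length;
            }
            for (byte[] f : smallFiles) {       // write the file payloads
                data.write(f);
            }
            return out.toByteArray();
        }
    }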

The extended blocks constitute the parts of the combined file. They are stored in the HDFS DataNodes like any other block, alongside normal file blocks. The storage of blocks in a DataNode is depicted in Figure 3.

Figure 3. Storage of extended blocks alongside normal file blocks

B. File Mapping

File mapping is the process of mapping a small file name to the block of the combined file that contains this file. It is carried out by the NameNode and comes into play when the user wants to read a small file from the combined file. The user has to explicitly specify the names of the combined file and the small file while initiating the read operation. A request carrying these two file names is sent to the NameNode to obtain the location of the desired small file. The NameNode maintains a data structure called the ConstituentFileMap for each combined file, which maps a small file name to the logical block number of the combined file that holds this small file. Along with the logical block number and DataNode information, the NameNode also provides an "index entry number", which specifies the entry in the index table stored at the beginning of the block that corresponds to the requested small file, thereby avoiding a linear search.

Figure 4. Block structure after file merging
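A minimal sketch of the ConstituentFileMap follows. All names are illustrative assumptions; the paper describes the structure but not its implementation.

    import java.util.HashMap;
    import java.util.Map;

    // Sketch of the ConstituentFileMap kept by the NameNode for one
    // combined file: small-file name -> (logical block number, index
    // entry number).
    public class ConstituentFileMap {
        public static final class Entry {
            final int logicalBlock;     // which block of the combined file
            final int indexEntryNumber; // position in that block's index table
            Entry(int logicalBlock, int indexEntryNumber) {
                this.logicalBlock = logicalBlock;
                this.indexEntryNumber = indexEntryNumber;
            }
        }
        private final Map<String, Entry> entries = new HashMap<>();

        void put(String smallFileName, int block, int indexEntry) {
            entries.put(smallFileName, new Entry(block, indexEntry));
        }
        Entry lookup(String smallFileName) {
            return entries.get(smallFileName); // O(1), no linear search
        }
    }

For the combined file temp described next, f1 and f2 would map to entries (0, 0) and (0, 1), f3 through f6 to (1, 0) through (1, 3), and f7 and f8 to (2, 0) and (2, 1).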

Page 4: [IEEE 2013 International Conference on Computer Communication and Informatics (ICCCI) - Coimbatore, Tamil Nadu, India (2013.01.4-2013.01.6)] 2013 International Conference on Computer

2013 International Conference on Computer Communication and Informatics (ICCCI -2013), Jan. 04 - 06, 2013, Coimbatore, INDIA

Figure 5 shows the ConstituentFileMap data structure for a combined file named temp. This file contains 8 small files spread over 3 blocks. The first block contains files f1 and f2, the second block holds the four files f3 to f6, and the last block stores the remaining two files. The file names are hashed into the constituent file map. Along with the name of each file, information about the ordering of files in the block is maintained using the "index entry number" field. An "index entry number" of 0 indicates that this is the first file in the given block and that its corresponding entry in the index table is present at an offset of zero from the beginning of the block. The index entry number is assigned in a similar manner to each file in a block during the file merging process. The block layout is depicted in Figure 4. The mapping technique that has been used is more scalable than that of Dong et al., since it does not encode the logical block number in the file name. This allows the solution to be domain independent.

C. Prefetching

The file merging technique only reduces the metadata footprint in the NameNode; it does not improve the performance of read operations. As mentioned before, HDFS is designed around the "write once, read many times" pattern, so improving the speed of read operations is more significant than improving the speed of write operations.

Figure 5. Constituent File Map structure

When a file is being read from HDFS, a request is first sent to the NameNode to obtain the metadata associated with the file. This metadata identifies the list of blocks containing the file and the DataNodes that hold these blocks, and is essential for reading the contents of the file. Communication with the NameNode happens via Remote Procedure Calls (RPC). Therefore, for every file that is opened, an RPC request has to be issued to obtain the metadata. This poses a problem at the NameNode when a large number of small files is accessed: the number of requests, and the frequency with which they are generated while reading small files, place a considerable load on the NameNode, making it a bottleneck in the system.

The proposed EHDFS overcomes this bottleneck by providing a framework for prefetching file metadata. Whenever the client reads a small file that is part of a combined file, the metadata for the other small files in the same block as the requested file is prefetched from the NameNode. This assumes that a client will probably access the small files in the combined file in a manner akin to sequential access. The prefetched metadata is cached on the client side; whenever file metadata is present in the cache, the client need not issue an RPC request to the NameNode. This considerably reduces the number of requests sent to the NameNode, thereby improving the performance of read operations.

D. File Extraction

File extraction is the process of extracting the desired file contents from a block, and is carried out by the DataNode. While reading a file, the client specifies both the name of the small file and the name of the combined file. This information is used to obtain, from the NameNode, the block holding the file, the DataNode holding the block, and the "index entry number". The obtained "index entry number" is then sent to the DataNode that has the block. The DataNode uses this value to seek to the desired entry in the index table placed at the beginning of the block; the entry contains the offset of the file data from the beginning of the block and the length of the file data. The DataNode then seeks to that offset, reads the requested file data, and sends it to the client. This greatly reduces the load on the network, as the entire block is not sent back to the client.

E. File Access Operations

The following sections describe how the file read and write operations are carried out in EHDFS.

1) Write Operation: A write operation is initiated whenever a combined file is created using EHDFS. The user invokes the create operation through the ExtendedHDFS module available on the client side. This sends a request to the NameNode, via RPC, for creating and initializing the data structures required to store the combined file. The NameNode creates a special inode to indicate that the file being created is a combined file and saves it into the namespace tree. It also initializes a constituent file map, which is saved as part of the inode representing the file and holds entries indicating the small files that form part of the combined file. The client is then provided with an output stream and some additional helper methods for associating file data with an entry in the combined file. The data written to the output stream is initially buffered on the client side to perform file merging; the buffer is capable of holding a block's worth of data. An index table is constructed for all the files whose data is present in the buffer, containing one tuple per file as explained in the file merging process. The buffer's content is appended to the index table and then sent to the DataNode. The offset for each entry in the table is calculated such that it also accounts for the size of the table at the beginning of the block. The purpose of this index table is to quickly identify the beginning of the required file in the block without scanning through the entire block; this is made faster still by storing in the NameNode, along with the logical block number, the index entry number at which the tuple corresponding to the required file is found. After successfully writing the block data to the DataNodes, the client sends an update to the NameNode specifying the list of file names that were written to the most recent block, along with the index entry number for each file. The NameNode uses this update to construct the constituent file map. The process is repeated until the client closes the combined file being created. An overview of the process is shown in Figure 6.

Figure 6. Write Operation in EHDFS
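The client-side buffering just described can be sketched as follows. All names are illustrative; the paper's actual client module is the ExtendedHDFS class, whose interface it does not detail.

    import java.util.ArrayList;
    import java.util.List;

    // Sketch of the EHDFS write path: small files are buffered until no
    // further file fits in one HDFS block, then the index table and the
    // payloads are flushed as one extended block and the NameNode is
    // told which files landed in it.
    public class CombinedFileWriter {
        static final long BLOCK_SIZE = 64L * 1024 * 1024;
        private final List<String> names = new ArrayList<>();
        private final List<byte[]> buffer = new ArrayList<>();
        private long buffered = 0;

        void addFile(String name, byte[] data) {
            long entryCost = 16 + data.length; // index entry plus payload
            if (buffered + entryCost > BLOCK_SIZE) {
                flushBlock(); // ensures no small file is split across blocks
            }
            names.add(name);
            buffer.add(data);
            buffered += entryCost;
        }

        private void flushBlock() {
            // 1. Serialize the index table and payloads (see the
            //    ExtendedBlockWriter sketch) and stream the block to a DataNode.
            // 2. Report the (file name, index entry number) pairs to the
            //    NameNode so it can extend the ConstituentFileMap.
            names.clear();
            buffer.clear();
            buffered = 0;
        }
    }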

2) Read Operation: The user initiates the read operation by directly addressing the small file inside the combined file. The file path is formed by using the combined file name as the name of the folder containing the small file: if "xyz" is the name of the combined file and "file1" is the name of the small file, the user specifies the path for "file1" as "location-of-combined-file/xyz/file1". The client side module extracts the combined file name and the small file name from the given path and requests the NameNode for the metadata corresponding to the small file. This is an RPC-based request, and the NameNode responds to it by looking up the small file name in the constituent file map stored for the specified combined file. The logical block number and the index entry number are sent back to the client as the response, along with the block locations (the list of DataNodes).

The client can also include a prefetch count as an additional parameter while opening a small file. This option instructs the NameNode to send, along with the response to the current request, the metadata of files around the requested file; the count indicates the number of files for which metadata must be prefetched. The NameNode preserves the ordering of files in the block in its constituent file map. Therefore, whenever a client plans to access files sequentially for processing, a large prefetch count can be specified (limited by the cache capacity). Metadata can also be prefetched for files that precede the current file, using appropriate options. These prefetched metadata entries are stored in a client side cache, which is checked whenever a small file is opened; if the metadata is already present in the cache, no request is sent to the NameNode. This considerably improves the response time of read operations. It also reduces the number of requests sent to the NameNode, lowering its load while improving the speed of consecutive read operations.

After obtaining the metadata for the requested file, from the cache or from the NameNode, a streaming connection is established between the client and the DataNode holding the block. The index entry number for the requested file is sent to the DataNode during this phase, and the DataNode reads the specified entry in the index table of the block. It uses these offset and length values for subsequent read operations carried out on this connection. The client can read through the file by specifying offsets relative to the beginning of the file, i.e., the first byte of the file is at offset zero. Each such offset is translated to the equivalent offset in the block by adding the value obtained from the index table at the DataNode. The read operation is then delegated to the existing HDFS API, which allows reading from a block starting at a specified offset. The entire read process is shown in Figure 7.

Figure 7. Read Operation in EHDFS
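The client-side cache check that precedes each open can be sketched as below. The class, the Meta holder, and the RPC stub are assumptions for illustration, not the paper's actual API.

    import java.util.HashMap;
    import java.util.Map;

    // Sketch of the EHDFS read path with metadata prefetching. The cache
    // maps "combined/small" paths to the block and index-entry metadata
    // returned by the NameNode.
    public class SmallFileReader {
        static final class Meta {
            final int logicalBlock;
            final int indexEntryNumber;
            Meta(int b, int e) { logicalBlock = b; indexEntryNumber = e; }
        }
        private final Map<String, Meta> cache = new HashMap<>();

        Meta open(String combined, String small, int prefetchCount) {
            String key = combined + "/" + small;
            Meta meta = cache.get(key);
            if (meta == null) {
                // Cache miss: one RPC fetches the requested entry plus
                // metadata for up to prefetchCount neighbouring files.
                Map<String, Meta> response =
                        rpcGetMetadata(combined, small, prefetchCount);
                cache.putAll(response);
                meta = response.get(key);
            }
            return meta; // next: send indexEntryNumber to the DataNode
        }

        private Map<String, Meta> rpcGetMetadata(
                String combined, String small, int prefetchCount) {
            throw new UnsupportedOperationException("NameNode RPC stub");
        }
    }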


IV. EVALUATION AND RESULTS

The performance of the HDFS cluster with respect to main memory usage and the time taken to complete read and write operations was first benchmarked with the original, default HDFS and then compared with the results obtained using EHDFS. This process is described in the following sections.

A. Experimental Environment

The test platform is a cluster of 3 machines, each with the following configuration:
1) Intel Pentium Core 2 Duo 2.6 GHz processor
2) 4 GB (2 x 2 GB) DDR2 RAM
3) 160 GB SATA HDD
All the machines are connected by a 1.0 Gbps Ethernet network. Each machine runs CentOS 5.4 with kernel version 2.6.18-164. Hadoop version 0.20.203 and Java version 1.6.0 have been used. The number of replicas for data blocks is set to 2 and the default block size is 64 MB.

B. Workload Overview

The workload for the main memory usage measurement contains a total of 100,000 files, with sizes ranging from 10 KB to 150 KB and a cumulative size of approximately 3.24 GB. The distribution of file sizes is shown in Figure 8.

Figure 8. Distribution of File Sizes in Workload

The workload for measuring the time taken by read and write operations is a 1000-file subset of the workload used for the memory usage experiment.

C. Measurement Parameters

The performance of the cluster was measured using the following parameters:

1) the amount of main memory used by the NameNode for storing metadata, and
2) the time taken to complete read and write operations.

The memory used by the NameNode for storing metadata was measured using jmap and the Memory Analyzer Toolkit. The jmap tool was used to obtain a heap dump of all the live objects present in the memory of the NameNode Java process, and the Memory Analyzer Toolkit was then used to analyze the dump. The retained size [10] of the instance of the class org.apache.hadoop.hdfs.server.namenode.FSNameSystem was used as the measure of the amount of metadata maintained by the NameNode. The time taken to complete the read and write operations was measured in milliseconds using the Linux date command.

D. Memory Usage Measurement

The experiment was carried out both on the original HDFS and on EHDFS. The memory usage of the NameNode was measured by analyzing heap dumps after placing each set of 1000 files into HDFS. A total of 100,000 files was placed, yielding 100 readings (one for every 1000 files). The experiment was repeated 3 times and the three sets of readings were averaged. The same steps were repeated for EHDFS; the memory usage in both cases is shown in Figure 9.
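The heap dumps referred to above were captured with jmap. As an in-process alternative (a sketch, not the paper's procedure), the HotSpot diagnostic MXBean can produce an equivalent dump of live objects for analysis in the Memory Analyzer Toolkit; the output path is a placeholder.

    import com.sun.management.HotSpotDiagnosticMXBean;
    import java.lang.management.ManagementFactory;

    // Programmatic equivalent of `jmap -dump:live,...` for the JVM the
    // code runs in; illustrative only.
    public class HeapDumper {
        public static void dump(String path) throws Exception {
            HotSpotDiagnosticMXBean bean = ManagementFactory
                    .getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            bean.dumpHeap(path, true); // true = live objects only
        }
        public static void main(String[] args) throws Exception {
            dump("namenode-heap.hprof"); // placeholder output file
        }
    }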

Figure 9. Memory Usages of HDFS and EHDFS

The graph in Figure 9 shows the memory usage patterns for the two cases. The blue bars correspond to the memory usage of the original, unmodified HDFS. There is a linear increase in memory usage as the number of files increases; this is expected, since the NameNode has to maintain file metadata as well as block metadata for each file that is stored. The green bars represent EHDFS, which maintains additional data structures in the NameNode for efficient file access and prefetching. The memory used by EHDFS is less than the memory used by HDFS to store the same number of files: EHDFS maintains only the file metadata for each small file, not the block metadata. The block metadata is maintained by the NameNode for the single combined file alone rather than for every small file, which accounts for the reduced memory usage of EHDFS.

E. Time Taken For File Access

In this experiment, the time taken to complete read and write operations was measured for both the original HDFS and EHDFS. A set of 1000 files was first copied into HDFS and then read back to the local file system, and the time taken to complete these operations was recorded. Similar readings were obtained for EHDFS at various values of the prefetch count. Each measurement was repeated three times and the average of the results was used for analysis. The time taken for the read operation in HDFS and EHDFS is depicted in Figure 10; the results for the write operation are summarized in a similar manner in Figure 11.

Figure 10. Time Taken for Read Operation

The data clearly indicates that EHDFS is faster than the original HDFS for the read operation. The time taken for the read operation with EHDFS is comparable to that of the original HDFS when the prefetch count is set to zero. As the prefetch count increases, the time taken for reading decreases. The graph in Figure 10 shows that when the prefetch count is set to 100, the time taken to read 10000 files is reduced by about 16 seconds. This follows from the fact that, as the prefetch count increases to 100, the number of requests sent to the NameNode while reading 10000 files is reduced by a factor of 100.

The write operation in EHDFS is considerably faster than the write operation in HDFS, as indicated by the results of the experimental analysis shown in Figure 11. The time taken for writing 10000 files into HDFS is very high, as the NameNode must be contacted once to create and allocate blocks for every file written into HDFS. This is avoided in EHDFS: the NameNode is contacted once for file creation and thereafter only when a new block has to be added to the combined file. Another contributing factor is the client-side buffering mechanism used for combining files. A block's worth of data is buffered on the client side before a request for adding a new block is sent to the NameNode, so a request reaches the NameNode only for every 64 MB of data rather than for every single small file.

Figure 11. Time Taken for Write Operation

V. RELATED WORK

EHDFS is an extension of the work done by Dong et al. [2] on handling small files in HDFS. The solution proposed by Dong et al. is specific to presentation files and the preview pictures of each slide in such a file; their technique was fine tuned for this special set of small files and therefore cannot be used directly in other domains that have large numbers of small files. It was primarily designed for use in BlueSky, one of the most prevalent e-Learning resource sharing systems in China, which uses HDFS to store courseware, mainly in the form of PowerPoint (PPT) files and video clips. A user interacting with this system uploads a single presentation (PPT) file or other courseware file to a web server. The user knows only about the web server, which abstracts HDFS and the way in which the presentation file is stored there. The presentation file is primarily accessed online: the web server generates a set of preview images, at various resolutions, for each slide in the presentation, and these images are sent back to the client when the PPT file is viewed online. For a standard PPT file of 50 slides with images generated at 2 or 4 resolutions, the number of image files can easily exceed 100 or 200, and storing these files individually in HDFS would only lead to a metadata overhead in the NameNode.


This was the inspiration for Dong et al., and the solution proposed by them is specific to this case. Their solution creates a combined file for each presentation file that is uploaded; this combined file contains the presentation file along with all the preview images generated by the web server. The preview images are never accessed or addressed directly by the client, who knows only the presentation file and its name. The generated preview images can therefore be named arbitrarily by the web server, and this freedom is used to reduce the metadata overhead in the NameNode: the generated image files are given names that make them easier to access. Each such image name is composed of four domains, namely a name domain, a resolution domain, a serial number domain and a block number domain. The block number domain indicates the logical block number of the block containing that image file. Maintaining the logical block number as part of the file name is not scalable and cannot be applied to an arbitrary set of small files. This can be solved by maintaining a mapping between the names of the small files and the combined file name in the NameNode; an outline of this approach is provided by Dong et al. in their paper. EHDFS takes this a step further: it not only maintains such a mapping in the NameNode but also maintains the ordering of files in the block, which makes it easier to prefetch metadata of the next file in the block. While sequentially accessing files in the combined file, this ordering can be exploited to reduce the number of metadata requests to the NameNode.

The memory usage of the NameNode in EHDFS is clearly lower than that of a normal HDFS that does not use file merging to handle the small files problem. However, it is higher than the memory usage of the solution proposed by Dong et al. This increase can be attributed to the fact that their solution does not store any special data structures in the NameNode, whereas EHDFS maintains a constituent file map per combined file. The increase is justified by the fact that EHDFS is not domain specific and can be applied to any arbitrary set of small files.

VI. CONCLUSIONS AND FUTURE DIRECTIONS

HDFS was originally designed for storing large files. When it is used to store a large number of small files, I/O performance becomes the bottleneck and the metadata footprint in the NameNode's main memory increases rapidly. The proposed EHDFS merges a large number of small files into a single combined file and provides a framework for prefetching the metadata of a specified number of files. The performance analysis shows that EHDFS improves the efficiency of access to small files and reduces the corresponding metadata footprint in the NameNode's main memory, allowing greater utilization of HDFS resources through more efficient metadata management. For 100,000 small files, memory usage was reduced by 16%; writing time was reduced by 82% and reading time by 28% (with prefetching) compared to the original HDFS.

As future work, this solution can be extended with a more advanced prefetching framework that supports better file correlation analysis, together with a file merging process that takes this correlation information into consideration. An append operation can be added to EHDFS to place new files into an existing combined file. The solution can also be ported to Federated HDFS [11], a newer version of HDFS that uses distributed namespace management.

REFERENCES

[1] K. Shvachko, H. Kuang, S. Radia, and R. Chansler, "The Hadoop Distributed File System," in Proceedings of the IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), Incline Village, Nevada, USA, May 2010.
[2] B. Dong, J. Qiu, Q. Zheng, X. Zhong, J. Li, and Y. Li, "A Novel Approach to Improving the Efficiency of Storing and Accessing Small Files on Hadoop: A Case Study by PowerPoint Files," in Proceedings of the IEEE International Conference on Services Computing, Miami, Florida, USA, July 2010, pp. 65-72.
[3] S. Ghemawat, H. Gobioff, and S. Leung, "The Google File System," in Proceedings of the ACM Symposium on Operating Systems Principles, Lake George, NY, October 2003, pp. 29-43.
[4] T. White, Hadoop: The Definitive Guide, 2nd ed. O'Reilly Media / Yahoo! Press, Jun. 2009, pp. 41-45.
[5] J. Venner, Pro Hadoop, 1st ed. Apress, Jun. 2009, pp. 4-17.
[6] C. Lam, Hadoop in Action, 1st ed. Manning, Dec. 2010, p. 8.
[7] "Apache Hadoop," http://hadoop.apache.org/, 2009.
[8] "HDFS Architecture Guide," http://hadoop.apache.org/common/docs/current/hdfs_design.html, 2009.
[9] T. White, "The Small Files Problem," http://www.cloudera.com/blog/2009/02/the-small-files-problem/, 2009.
[10] "Shallow and Retained Sizes," http://www.yourkit.com/docs/80/help/sizes.jsp.
[11] "HDFS Federation," http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/Federation.html.

