
The Design and Implementation of an Application Specific Log Structured File System for Caching Internet Data

Faramarz Rabii    Richard Morris
[email protected]    [email protected]

InfoLibria, Inc.
271 Waverley Oaks Road
Waltham MA 02452

Abstract

We have designed and implemented an application specific file system tailored to optimize performance by taking advantage of the unique file system usage characteristics of an Internet cache server. The InfoLibria Web File System, or ILWFS, is a user-space log structured file system for Web caching.

Recent work has proposed the use of in-memory meta-data [8], improved buffer cache management [9], and clustering [13] as methods to improve the performance of Web cache file systems. We demonstrate that there is significant benefit gained through the use of a log structure as the basis for a Web cache file system layout. We show that our approach is complementary to the work mentioned above.


Traditional log structured file systems such as LFS [3] provide especially good performance when handling small files [15], which is exactly the profile of Web cache files. The main drawback of the log approach is the performance cost of garbage collection. The unique characteristics of a cache allow ILWFS to handle the garbage problem at virtually no cost, making the log structure ideal for caches.

We present the design, sketch the implementation, and discuss the performance of this commercially available file system. ILWFS achieves three main objectives. First, file lookup, creation, deletion, and garbage collection require no I/O. Second, no disk seeks are required to write new files. Third, the file system recovers automatically from crashes without the need for journaling or any file system check programs.

1 Introduction

Internet data caches are devices used to accelerate Web access by keeping a local copy of the data closer to the user, thus improving latency and taking load off the origin server. An example of a widely used Web cache is the Squid freeware1. Our primary goal in this project was to improve the performance of our DynaCache2 product. DynaCache is an Internet cache appliance that uses a threaded user space application to process and cache Internet Web objects. It runs on a BSD kernel and originally used a modified version of the Fast File System (FFS) [1,2]. Web objects are stored as files in the local file system and served without contacting the origin server when possible.

1 Squid Web Proxy Cache (http://www.squid-cache.org/)
2 DynaCache, InfoLibria, Inc. (http://www.infolibria.com/products/dynacache_overview.html)

Performance tests done at InfoLibria demonstrated that the main performance bottleneck was the BSD Fast File System. Traditional general-purpose file systems available in standard operating systems, such as the BSD Fast File System [1,2] or NTFS [12], have been designed to handle a wide range of usage patterns and offer a rich set of features. Many applications exhibiting specific file system usage characteristics do not require such rich feature sets, and would instead trade features for additional performance. The following are some of the features that negatively impact the performance of general-purpose file systems:

- Directory namespaces are made up of hierarchical structures with variable length strings representing file names. Additionally, file lookups often require multiple I/O requests and long string matches.

- File data is represented using intermediate on-disk meta-data structures such as inodes involving multiple indirect blocks. This type of structure requires multiple I/O(s) to get the location of the file data, as well as multiple I/O(s) to get to the data itself.

- The use of complex meta-data structures results in greater susceptibility to corruption in case of a crash.

- The traditional buffer cache yields little throughput benefit when buffering writes for a Web cache. Furthermore, the periodic flushing of dirty buffers to disk (update in FFS) causes a severe drop in performance for the duration of the update process and thus a very uneven performance profile.

- File block allocation may require multiple I/O(s) in order to read and update bitmaps indicating free space and the file's inode structure to reflect pointers to the new blocks. This process gets slower as the file system fills up because of the need to examine multiple bitmaps in order to find the required blocks.

- Detection and deletion of files that are no longer needed is very slow and I/O intensive. It requires one or more I/O(s) to get the meta-data and thus the file's time stamp or other attributes needed to determine if the file is no longer needed. The deletion of the file also requires multiple I/O(s) in order to remove the directory entry, intermediate meta-data (e.g. inode), indirect data and meta-data blocks, and also to update the free space bitmaps.

Researchers and industrial groups have addressed many of the issues enumerated above, as described in section #5. We considered adapting one of the existing implementations, but decided against it for the following reasons. Many implementations are not freely available or are altogether proprietary.

Other available implementations such as the Log Structured File System (LFS) [3,4] do not fully take advantage of the unique nature of Web cache files. LFS provides excellent performance for reads and writes of small files (<256 KB) [15], and would therefore appear suitable for Web files. Unfortunately, it suffers from performance degradation of up to 40% [15] when it comes to reclaiming space used by deleted files. LFS also has a relatively complex directory and inode structure geared towards traditional file system usage. We therefore decided to design and implement a new file system called the InfoLibria Web File System, or ILWFS.

Section #2 describes the high level attributes of a Web cache file system, and contrasts these attributes with those of general-purpose file systems. Section #3 provides the ILWFS file system design details. The on-disk layout of both meta-data and file data is described. High-level descriptions of both ILWFS data allocation and garbage collection are also presented. Section #4 reports performance results. Section #5 presents some alternate approaches. Section #6 presents conclusions, and section #7 explores future work.

2 Attributes of a Web Cache File System

A Web cache puts a file system through a very specific type of usage. A large number of lookups are done, many of which result in failure. If a file is found, its meta-data is examined in order to determine whether it is still valid or has expired. If the file is valid, it is then read sequentially. If the file is not found, it is fetched from the origin site and then written sequentially. The file data is never modified, while the meta-data may be. The Web cache also needs to periodically purge files whose content has expired due to age.

Analysis of this type of usage led us to focus on the following attributes:

Meta-Data Access

Although efficient lookups and creations are important to all file systems, they are of particular importance to a cache. The main purpose of a cache application is to fetch and place files in a cache. It is therefore critical to make the lookup and create path as efficient as possible. ILWFS keeps all meta-data in memory, thus allowing the file system to handle file lookups as well as creation, deletion, and access to expiration dates without the need for any disk I/O.

It is common practice in Web caches such as Apache3 to map a URL into a fixed length file name, which can be represented as a numerical value. This enables ILWFS to use a flat directory namespace with fixed length file names, thus simplifying the algorithms required for file access and reducing the code paths for file lookup, creation, deletion, and meta-data access.

A general-purpose file system typically requires a complex set of meta-data structures. These structures require large amounts of space and are not easily kept in memory. They are also often scattered around the disk, thus requiring multiple disk head movements and I/O(s) in order to page them in and out of memory.

Data Access

A cache must be able to handle a large volume of file writes and reads. This requires the file system to have an efficient disk block allocation mechanism and to allow access to file data with as little disk head movement as possible. In order to do so, ILWFS takes advantage of the fact that file sizes are known at creation time, as well as the fact that files are written and read sequentially and never modified once written. The ILWFS block allocation algorithm places the data for new files contiguously in the same disk region, thus eliminating the need for disk seeks between writes. Intervening read requests often result in disk seeks, since the file being read may be located in a random part of the disk, but the number of seeks is minimal since in most cases the file data is read in a single I/O.

In a general-purpose file system, files may be modified after they are written. This makes it impossible for the file system to ever know the final size of a file. Therefore, all of the storage for a file cannot be pre-allocated at creation time, or indeed ever.

3 The Apache HTTP Server Project (http://httpd.apache.org)

Garbage Collection

Files in a Web cache are subject to expiration and need to be removed from the cache. Therefore a Web cache must periodically find and delete expired files. This process, referred to as garbage collection, frees cache space for new files. Since this process is very common, its performance is critical. While general-purpose file systems are usually slow at garbage collection, ILWFS accomplishes it at virtually no cost using a novel approach, as described in the next section.

3 Detailed Description of the ILWFS File System

User Space Implementation

ILWFS is a user level file system which accesses storage as raw partitions. Although recent work by Joubert et al. [11] has indicated that superior performance can be achieved by placing the file system and the server in the kernel, we chose to implement ILWFS in user space.

ILWFS is designed primarily for use by the caching process in DynaCache, which runs at user level. By running the file system in user space we completely avoid kernel calls when doing lookups, creations, deletions, and meta-data access. Reads and writes of file data also go through a user space buffer cache and therefore avoid kernel calls in instances where the request can be handled in the buffer cache. Virtual memory techniques combined with the user space buffer cache allow the application to access file data without the need for any data copies.
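The paper does not spell out the mechanism, but one common way to get copy-free access from user space is to map the raw partition into the address space and hand the application pointers into the mapping. The sketch below is an assumption chosen for illustration, not a description of the actual ILWFS mechanism.

#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <stddef.h>

/* Illustrative assumption, not the documented ILWFS mechanism: map a
 * raw data partition so the cache can read file extents in place,
 * without copying between kernel and user buffers. */
static void *map_partition(const char *dev, size_t len)
{
    int fd = open(dev, O_RDWR);
    if (fd < 0)
        return NULL;
    void *base = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping outlives the descriptor */
    return base == MAP_FAILED ? NULL : base;
}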

There are two other practical advantages of a user space implementation. The first involves porting to other operating systems. It is impossible to port an in-kernel implementation to a new operating system unless the source code for that operating system is available, and even when it is, the port requires far more engineering resources. The second is that debugging is much simpler in user space, which shortens time to market.

On-Disk Layout

The ILWFS file system contains a superblock, which holds global file system information, followed by the directory data. This set of information is collectively known as the meta-data and is kept on a dedicated partition called the meta-data partition, as shown in Figure #1. There are also one or more data partitions, which are used to keep file data. Details of the data partitions are described in a later section.

Figure 1: Details of the Meta Data Partition

File Meta-Data

All of the meta-data for a file is kept within its directory entry in the meta-data partition. This includes the information required to locate the file data on disk as well as information such as the file's data size, creation date, last modification date, and expiration date. A directory entry is therefore the only entity required to describe an ILWFS file.

File Data

The data in an ILWFS file is stored contiguously, using a single extent on a single data partition. File data can be accessed using only a partition number, an offset, and a size. No indirect blocks or extra extent information is necessary. This format also simplifies block allocation and deallocation algorithms as described below.


The data region for an ILWFS file has a specific format that is used at run time to verify its validity. As shown in Figure #2, the file starts with a header, is followed by data, and ends with a trailer. Both header and trailer contain information about the file name. They are checked when the file is opened to ensure validity. The header is always written before the data, while the trailer is written after all file data has been written. Therefore, if both header and trailer are valid, the data in between will also be safe from corruption due to a crash. If a file fails validation, it is deleted and the open call fails.

Figure 2: Details of the file data on disk

This enables ILWFS to replace time consuming startup validity checks (e.g. fsck) with a lightweight run time validity check that will handle any file system corruption caused by system crashes.
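As a concrete illustration, the on-disk region and its run-time check might look like the following C sketch. The field names and magic values are assumptions; the actual on-disk format is not published.

#include <stdint.h>

/* Sketch of the header/data/trailer layout of Figure #2; field names
 * and magic values are assumed, not the actual ILWFS format. */
#define ILWFS_HDR_MAGIC 0x494C5746u   /* hypothetical */
#define ILWFS_TRL_MAGIC 0x53464C49u   /* hypothetical */

struct ilwfs_header {
    uint32_t magic;     /* written before any file data */
    uint64_t name;      /* 64-bit hashed file name */
    uint64_t size;      /* length of the data that follows */
};

struct ilwfs_trailer {
    uint32_t magic;     /* written only after all data is on disk */
    uint64_t name;      /* must match the header */
};

/* Lightweight run-time check performed when the file is opened: if the
 * header (written first) and trailer (written last) are both intact and
 * agree on the name, the data between them predates any crash. */
static int ilwfs_region_valid(const struct ilwfs_header *h,
                              const struct ilwfs_trailer *t,
                              uint64_t expected_name)
{
    return h->magic == ILWFS_HDR_MAGIC && t->magic == ILWFS_TRL_MAGIC &&
           h->name == expected_name && t->name == expected_name;
}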

Data Block Allocation


ILWFS uses a file data allocation scheme similar to that of LFS [3,4]. ILWFS divides each data partition into a number of fixed length segments. Data blocks are allocated from a segment starting at the beginning and moving on until the segment is fully used up. Each segment has a start, an end, and a current position, as shown in Figure #3. For an empty segment, the current position is set to the start position. As blocks are allocated, the current position is increased until it reaches the end of the segment.

Figure 3: Data Partition and Segment Layout

ILWFS takes advantage of the fact that the user knows the size of files at their creation time. This information is supplied to ILWFS as part of the file creation interface. ILWFS then finds a segment with enough room and allocates the required blocks contiguously.
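A minimal sketch of this allocation path, under assumed structures; the real allocator must also handle block alignment, segment selection policy, multiple partitions, and concurrency.

#include <stdint.h>
#include <stddef.h>

/* Assumed in-memory bookkeeping for one data partition's segments. */
struct segment {
    uint64_t start;    /* first byte of the segment on the partition */
    uint64_t end;      /* one past the last byte */
    uint64_t current;  /* next free byte; equals start when empty */
};

/* Allocate `size` contiguous bytes for a new file whose size is known
 * at creation time. Returns the file's offset, or (uint64_t)-1 if no
 * scanned segment has enough room. */
static uint64_t alloc_extent(struct segment *segs, size_t nsegs, uint64_t size)
{
    for (size_t i = 0; i < nsegs; i++) {
        if (segs[i].end - segs[i].current >= size) {
            uint64_t off = segs[i].current;
            segs[i].current += size;   /* bump pointer: no bitmaps, no I/O */
            return off;
        }
    }
    return (uint64_t)-1;               /* full: wait for garbage collection */
}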


This approach works very well on an empty file system but can run into trouble when the file system becomes full. The problem is that when a random file is deleted, it is not easily possible to reclaim the space. The space can be reclaimed only when it falls beyond the current position of a segment. Otherwise, it becomes an unused fragment in an otherwise used region of disk.

There are two ways to handle this problem. The first approach is to have a defragmentation process move file data around such that the free space is always at the end of a segment. The second approach is to not move the files at all. When a segment is sufficiently old, the file system frees all files within it and reclaims the whole space.

A general-purpose file system may not use the second approach, as that would result in the loss of files without an explicit deletion request. ILWFS, however, is used only to cache Internet data and may therefore purge files whenever necessary. This allows ILWFS to use the second approach. If the cache server looks for a file in ILWFS and does not find it, the file will simply be fetched from the original source.

There are also two other characteristics of Web caches that help with this type of storage reclamation. Web objects tend to expire in a few days, so ILWFS need only keep a segment around for that period. Once a segment is old enough, most of its files will have expired and the segment may be freed. Also, the cost of disk storage is dropping, making it possible to supply ILWFS with so many large segments that only very old segments need reclamation.

File Naming

A URL uniquely identifies a Web object; therefore the URL may be viewed as the name of the object. In ILWFS this name is first mapped into a 64-bit value using a hashing function. The hash value is then used as the name of the object, represented as a file, when interacting with the file system. The purpose of this approach is to allow quick lookup of Web objects in the cache. ILWFS takes advantage of this flat namespace by creating a directory region as shown in Figure #4.
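The actual ILWFS hash is proprietary (see below), so the following sketch uses 64-bit FNV-1a purely as a stand-in for the idea of reducing a variable-length URL to a fixed 64-bit name.

#include <stdint.h>

/* Stand-in only: ILWFS's real hash is proprietary. FNV-1a illustrates
 * mapping a variable-length URL to a fixed 64-bit file name. */
static uint64_t url_to_name(const char *url)
{
    uint64_t h = 14695981039346656037ULL;     /* FNV-1a offset basis */
    for (const unsigned char *p = (const unsigned char *)url; *p; p++) {
        h ^= *p;
        h *= 1099511628211ULL;                /* FNV-1a prime */
    }
    return h;
}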


Figure 4: Directory Block Array and Indexing

The directory region is organized as an array of directory blocks. Each directory block is addressed by indexing into the directory array using the first N bits of the hash value of the names within it. Therefore, a given ILWFS instance will have 2^N directory blocks where 0 < N < 64. All names within a directory block start with the same values for the first N bits.

Within each directory block is a set of directory entry slots and a header. All directory entry slots within a block are equal and may hold any name that belongs to that block. The header contains information such as the number of in-use directory entries for the block. Each directory block contains M slots, where M is configurable. A directory block is shown in Figure #5.


Figure 5: Details of Directory Blocks and Directory Entries

ILWFS keeps all directory blocks in memory for performance. A name lookup is done by using the first N bits of the name as an index into the directory block array to get the block that will contain the name if present, and then performing a search within the directory block for a match of the full name. The first part of the lookup is simply a pointer de-reference, while the second part involves at most M name checks. Since the names are 64-bit values, the checks are fast and do not require string operations.
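In code, the lookup might look like the sketch below; the values of N and M and the entry fields are assumptions chosen for illustration.

#include <stdint.h>
#include <stddef.h>

#define DIR_BITS  16                /* N: assumed; fixed when the fs is made */
#define DIR_SLOTS 16                /* M: assumed slots per directory block */

struct dir_entry {
    uint64_t name;                  /* hashed name; 0 marks a free slot here */
    /* ... extent location, size, dates, generation number ... */
};

struct dir_block {
    uint32_t in_use;                /* header: occupied slots in this block */
    struct dir_entry slot[DIR_SLOTS];
};

/* All 2^N directory blocks are kept in memory. */
static struct dir_block directory[1UL << DIR_BITS];

/* One pointer dereference to pick the block (top N bits of the name),
 * then at most M 64-bit compares: no string matching, no disk I/O. */
static struct dir_entry *ilwfs_lookup(uint64_t name)
{
    struct dir_block *b = &directory[name >> (64 - DIR_BITS)];
    for (int i = 0; i < DIR_SLOTS; i++)
        if (b->slot[i].name == name)
            return &b->slot[i];
    return NULL;                    /* miss: fetch from the origin server */
}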

One drawback of this approach is that the ILWFS directory region allows at most (2^N * M) files and must be pre-configured when the file system is created. The other drawback is that when a directory block is full, no more files with names hashing to that block may be added to ILWFS. This is the case even if there are empty directory slots elsewhere. The hash algorithm is therefore very important, as it must map random URL(s) evenly into the 64-bit namespace.


Increasing the size of the directory region will address these problems. However, if the directory region is too large, it will no longer fit in memory and parts of it will have to be paged. ILWFS has been enhanced to support over 1 billion files by using paging techniques without a significant impact on performance (~12%). The details of directory paging and the hashing algorithm are proprietary and will not be discussed here.

Garbage Collection

Whenever the number of empty segments within a partition or the number of free directory entries drops below a configurable threshold, garbage collection is scheduled. Each segment contains a segment expiration date. This value is set to reflect the expiration date of the most long-lived file in the segment. A segment is eligible for garbage collection when its expiration date has been reached.

Garbage collection scans all segments, looking for a segment whose data has already expired. If one is not found, then garbage collection is deferred until a segment is due to expire. Until that time the partition will be viewed as full. Note that this algorithm runs entirely in memory and requires no I/O.

Garbage collection must also free directory entries that point to files that have been removed; otherwise ILWFS will run out of free directory entries. ILWFS frees directory entries using a lazy evaluation method. Every segment and directory entry includes a generation number. When a directory entry is created, its generation number is set to match that of the segment holding the data. When a segment undergoes garbage collection, its generation number is incremented. This effectively invalidates all active directory entries that reference that segment. During file lookup, ILWFS searches a directory block in order to locate the named file. As directory entries in that block are examined to see if they contain a particular file, their segment generation number is checked against that of the corresponding segment. If the generation numbers do not match, the directory entry is freed. Since ILWFS has to scan the directory block as part of lookup anyway, freeing stale directory entries comes for free.
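A sketch of this lazy invalidation, with assumed structure and field names:

#include <stdint.h>

/* Assumed structures: one generation counter per segment, mirrored in
 * every directory entry that references data in that segment. */
struct seg_info { uint32_t generation; };
struct dentry   { uint64_t name; uint32_t seg; uint32_t gen; };

/* Reclaiming a segment is just a counter bump (plus resetting its
 * allocation pointer, not shown): every stale directory entry is
 * invalidated at once, with no I/O. */
static void reclaim_segment(struct seg_info *s)
{
    s->generation++;
}

/* Applied to each entry examined during the directory-block scan that
 * lookup performs anyway; stale entries are freed as a side effect. */
static int entry_live(struct dentry *e, const struct seg_info *segs)
{
    if (e->name != 0 && e->gen != segs[e->seg].generation) {
        e->name = 0;                /* segment was reclaimed: free entry */
        return 0;
    }
    return e->name != 0;
}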


Crash Recovery

ILWFS takes advantage of the fact that each file is represented by only two on-disk structures to implement a unique type of crash recovery. A file has a directory entry and a data region, which are the only structures that may become inconsistent in the event of a crash. The inconsistency may take one of two forms. The first is a directory entry without the corresponding file data, and the second is data for a file without the corresponding directory entry.

In the first situation, if the directory entry is ever accessed, the on-disk file data will be read. The header and trailer of the on-disk file will be checked, found to be invalid, and the directory entry will be freed. No invalid data will be served to the user. In the second situation, there is no directory entry pointing to the file data, and it will never be accessed. In a standard file system this would result in a leakage of space, but in ILWFS the corresponding segment will eventually be aged and freed, thus reclaiming the space.

Interfaces

The ILWFS file system was designed specifically to cache URL objects. It therefore implements only a modified subset of general-purpose file system interfaces that were needed for that purpose. This subset consists of the following 12 interfaces, sketched in C after the list:

Create: Generate a new file
Save: Save a newly generated file
Open: Open an existing file
Close: Close an existing open file
Read: Read data from an open file
Write: Write data into a new file
SetMeta: Modify a file's meta-data
GetMeta: Read a file's meta-data
GetStats: Get global file system info
Remove: Delete a file using its name
Initialize: Initialize the file system
Shutdown: Shut down the file system
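Expressed as C declarations, the interface set might look as follows. These prototypes are a reconstruction from the descriptions in this section, not the product's actual header; all names, types, and the stats fields are assumptions.

#include <stdint.h>
#include <stddef.h>
#include <sys/types.h>

typedef uint64_t ilwfs_name_t;      /* 64-bit hashed URL */
typedef struct ilwfs_fd ilwfs_fd;   /* opaque open-file handle */

typedef struct {
    uint64_t size;                  /* file data size in bytes */
    int64_t  created, modified, expires;
} ilwfs_meta;

typedef struct { uint64_t files, bytes_used, bytes_free; } ilwfs_stats;

int        ilwfs_initialize(const char *meta_partition);
void       ilwfs_shutdown(void);
ilwfs_fd  *ilwfs_create(ilwfs_name_t name, size_t max_size);
int        ilwfs_save(ilwfs_fd *fd);          /* commit a new file */
ilwfs_fd  *ilwfs_open(ilwfs_name_t name);     /* read-only access */
int        ilwfs_close(ilwfs_fd *fd);
ssize_t    ilwfs_read(ilwfs_fd *fd, void *buf, size_t len);
ssize_t    ilwfs_write(ilwfs_fd *fd, const void *buf, size_t len);
int        ilwfs_setmeta(ilwfs_name_t name, const ilwfs_meta *m);
int        ilwfs_getmeta(ilwfs_name_t name, ilwfs_meta *m);
int        ilwfs_getstats(ilwfs_stats *s);
int        ilwfs_remove(ilwfs_name_t name);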

Create

This interface takes a name (which is a 64-bit hash value) and a maximum size parameter. It returns an open file descriptor similar to the FFS file descriptor. The following algorithm is followed:

- A search is made to ensure that the name is not presently in use; if it is, an error is returned indicating the file already exists.

- A directory entry is allocated and the file name inserted into it.

- The space for the file data is allocated contiguously from a single segment.

- A set of in-memory objects is allocated and a file descriptor is returned to the caller.

- If any of the above allocations fail, the allocated resources are freed and an error is returned to the user.

The user may write data to the file up to its maximum size. The file must then be closed using the special Save interface described next.

The file descriptor returned from Create allows the user to write to the file. This file descriptor may be closed using either the Save or Close interfaces, with different results, as explained below. Also, no other thread may access a file while it is being written. Any attempt to access such a file via the Open or Create interfaces will fail.

Save

This interface is called when the user has successfully written the file data. When a file is saved:

- A trailer is inserted which includes validation information such as a magic number and the file name.

- Once this is done, other threads may access the file for read using the Open interface.

- If two threads attempt to create the same file, the first one will succeed and the other will get an error indicating the file already exists.

- The appropriate in-memory objects are deallocated.

Once a file is written, it becomes read-only and the data is essentially immutable. The file may of course be deleted using the Remove interface.

Open

ILWFS files may be opened for read-only access once they have been created.

- The open call takes the 64-bit file name and uses it to perform a lookup. If the file is not found, an error is returned to the user.

- Next, ILWFS checks whether the file is already open. If so, a file descriptor is returned.

- Otherwise, ILWFS will read the header and the trailer and perform the necessary validations. If the file is found to be invalid, its name is removed from the directory namespace and an error is returned to the user.

- If all has been successful, a set of in-memory objects is instantiated and a file descriptor is returned. The user may use the file descriptor to perform operations such as Read.

Close

This interface is called to close an open file descriptor. It is called in two situations and behaves differently according to the context.

- When the file was opened using the Open interface: in this case, Close deallocates the appropriate in-memory objects.

- When the file was just created using the Create interface: in this case the user should have used Save to successfully finish the create process, so Close implies that an error occurred. ILWFS will remove the file and deallocate the related internal resources.

Write

This interface only works on a file descriptor that was generated via Create. All attempts to write using a file descriptor obtained via Open will fail with a permission error.
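To show how the calls compose, here is a usage sketch of the miss-then-hit lifecycle. The prototypes repeat the hypothetical declarations from the interface sketch above; they are assumptions, not the product API.

#include <stdint.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical prototypes, as in the interface sketch. */
typedef uint64_t ilwfs_name_t;
typedef struct ilwfs_fd ilwfs_fd;
ilwfs_fd *ilwfs_create(ilwfs_name_t name, size_t max_size);
int       ilwfs_save(ilwfs_fd *fd);
ilwfs_fd *ilwfs_open(ilwfs_name_t name);
int       ilwfs_close(ilwfs_fd *fd);
ssize_t   ilwfs_read(ilwfs_fd *fd, void *buf, size_t len);
ssize_t   ilwfs_write(ilwfs_fd *fd, const void *buf, size_t len);

/* Miss path: store a fetched object (size known up front), then serve
 * it on a later hit. */
static void store_then_serve(ilwfs_name_t name, const char *body, size_t len)
{
    ilwfs_fd *wr = ilwfs_create(name, len);
    if (wr != NULL) {
        if (ilwfs_write(wr, body, len) == (ssize_t)len)
            ilwfs_save(wr);         /* success: trailer written, now readable */
        else
            ilwfs_close(wr);        /* Close on a created fd removes the file */
    }

    char buf[4096];
    ilwfs_fd *rd = ilwfs_open(name);  /* later hit: read-only descriptor */
    if (rd != NULL) {
        ssize_t n;
        while ((n = ilwfs_read(rd, buf, sizeof buf)) > 0)
            fwrite(buf, 1, (size_t)n, stdout);
        ilwfs_close(rd);
    }
}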

4 Performance Results

We focus mainly on macro benchmarks because of our interest in the impact of ILWFS on the whole Web cache appliance. We use an industry wide benchmark called Web Polygraph4, a freely available benchmarking tool for Web caches that provides realistic traffic generation and content simulation. The benchmark measures the number of operations per second a Web cache can perform. An operation represents a request for a Web object. In a cache server this results in a request to retrieve one file from the file system. If the file is found in the cache, it is read and the content sent back; if the file is not already in the cache, it is fetched from its origin server and written to the cache.

4 Web Polygraph Benchmarking Tool (http://www.web-polygraph.org)

We compare three types of setups. As a baseline we use the freely available Apache 2.0 running on an unmodified FFS. We next measure the performance of DynaCache with a modified FFS that offers async I/O and keeps all directory information in memory.

Finally, we measure the performance of DynaCache with ILWFS. We do not measure DynaCache with unmodified FFS because DynaCache uses a single threaded process which, if run on FFS with synchronous I/O, would lose all of its parallelism. Apache, on the other hand, is a multi-process cache server and is able to maintain its parallelism while using synchronous I/O.

Data was collected for three types of workloads:

- All writes, where every request was a miss, resulting in the object being fetched from the origin server and written to the file system.

- All reads, where every request resulted in a cache hit.

- A mixture of 60% hits and 40% misses.

Since our work does not focus on in-memory caching of data, we avoided memory hits as much as possible. In all tests we used a working set that was much larger than the buffer cache, and avoided temporal locality, thereby taking the buffer cache out of the picture. The results are given below:

Polygraph Results        Apache with FFS (1)    DynaCache with Async FFS (2)    DynaCache with ILWFS (3)
All Writes               11 ops/sec             124 ops/sec                     411 ops/sec
All Reads                15 ops/sec             326 ops/sec                     490 ops/sec
60% Reads, 40% Writes    25 ops/sec             230 ops/sec                     476 ops/sec

The results of Apache with FFS are only useful as a baseline. They fall far short of DynaCache using either FFS with async I/O or ILWFS. We therefore focus on the difference between DynaCache on FFS with async I/O (column #2) and DynaCache on ILWFS (column #3).

Lookup: File lookup in ILWFS is done entirely in memory and involves no string matches. It therefore takes a constant 10 microseconds to do a lookup for any file. FFS on the same platform takes anywhere between 16 microseconds and 5 milliseconds, depending on how much information is present in system memory and on the path name. ILWFS therefore beats FFS under all conditions. This performance is critical to a cache, since every cache request results in a lookup. The effect of improved lookups is reflected in all performance results.

Creations and deletions: Creations and deletions are implemented much like lookups. The code paths are very similar, so the performance characteristics are almost identical.

Write: We observe a factor of (411/124 = 3.3) improvement in throughput when writing 10K files. The primary reasons are the minimization of disk head movement and fewer writes. FFS generates multiple I/O(s) in order to write a file. The meta-data by itself may require the directory, the directory inode, and the file inode to be written. Next, the data has to be written in fixed sized blocks. If the block size is 4K, this translates into an average of 2.5 disk writes per file just for the data. Between any of these writes, the disk may have to perform a head movement, further lowering performance.

Read: Here we see a factor of (490/326 = 1.5) improvement in throughput. In this case, ILWFS is less able to exploit locality to avoid head movements. Read traffic exhibits more random behavior than write traffic because, while ILWFS can control disk block location during writes, it cannot do so for reads. Methods such as clustering can minimize head movement during reads [13] and should be investigated.

Mixture: This test was done in order to simulate a typical run. The results are no surprise and the improvement is in line with the rest for reasons already discussed.

It is important to point out that the file system is only responsible for a fraction of the cache server's work. Therefore, performance improvements shown by the cache server will always be a fraction of those achieved by the file system.

The Hardware

All tests were run on a 550 MHz Pentium-II machine with 512 MB of RAM and two 9 GB, 1000-rpm SCSI disks. In all tests, one disk was used as the system disk, while the second was used to store the file system data.


5 Related Work

ILWFS presents a number of unique solutions, although many of the problems it addresses are not new. Below is a summary of works addressing some of the same problems as ILWFS.

Reduction of Meta-Data Overhead

Soft Updates [7] and the Phase Tree5 work eliminate the need for synchronous meta-data updates. The Soft Updates approach addresses the problem of synchronous writes for a file system containing a more complex set of meta-data structures and is therefore not applicable to ILWFS. The Phase Tree approach, while applicable, is more complicated than the approach taken in ILWFS.

5 Phase Tree design of the Tux-II file system, presentation by Daniel Phillips (2nd Annual Linux Storage Management Workshop, Oct 15-19, 2000)

Improvement of Write Performance

The Log Structured File System [3,4] has made progress towards this goal. ILWFS is an instance of a log structured file system. We did not use an existing LFS implementation because our application allowed us to use a simpler and more efficient approach.

User Level File Systems

A number of researchers and industry groups have implemented user level file systems. These include a replicated file system [6], the WPSFS [8], Hummingbird [13], and the work of Mazières [14]. The first was not applicable to our needs because it is a distributed file system. The others were not available at the time ILWFS was designed and therefore were not used as a basis for our design.

Fast Recovery from Crashes

We know of two distinct approaches aside from the one used by ILWFS. The first is the journaling approach used by a number of file systems such as XFS [5]. The second is the Phase Tree approach used in the Tux-II file system mentioned earlier. We adopted neither, since unique features of ILWFS allowed a much simpler approach.


A Flat Namespace

Wang et al. [8] use an in-memory directory structure with fixed length file names similar to ILWFS. This work was not available when ILWFS was developed and was not used as a basis for our work.

Buffer Cache Management

The work of Pei Cao [9] focused on application control of the buffer cache, while I/O-Lite [10] focused on the reduction of data copies. The work we present here does not focus on in-memory caching, and therefore we have not taken advantage of the work of Pei Cao. The ILWFS design does, however, reduce data copies through the use of a user-level buffer cache.

6 Conclusions

The ILWFS file system was designed specifically for a Web cache server. It was therefore optimized for the specific set of file system operations needed by the appliance. ILWFS design decisions were made to take advantage of these usage patterns. The outcome was a file system with the following characteristics:

- File lookup, creation, deletion, and garbage collection are done in memory without any disk I/O.

- The majority of writes require no disk head movement and only a single disk I/O.

- Crash recovery is accomplished instantly because of the unique file validation technique.

The ILWFS file system demonstrates that a log-based approach is possible, and perhaps ideal, for Web cache file systems. It succeeds in turning the main drawback of the log approach (garbage collection) into a substantial advantage: ILWFS frees a large segment of the log instantly, reducing the overhead of garbage collection to a negligible amount. ILWFS also achieves optimal write performance by writing all files to the end of a log segment.

Designing the ILWFS file system has taught us a number of lessons that are not restricted to file systems or Web caches.


- When searching for a solution, consider options beyond those that are already available or already implemented. It may be more productive to start with a clean slate and design a solution to fit the problem.

- Avoiding unnecessary features greatly simplifies the design. This in turn opens the door to architectures not normally feasible. A case in point with ILWFS is the avoidance of complex meta-data structures, which in turn allowed us to replace fsck with file validation.

- A user space implementation is advantageous during development. It simplifies debugging and greatly simplifies subsequent porting efforts. The work can always be moved into the operating system kernel at a later time.

7 Future Work

File clustering may be added to ILWFS in order to further reduce I/O and disk head movement. Data that occupies a contiguous disk region could be bunched together and written to disk in a single large write. The block allocation algorithm in ILWFS is especially suitable for clustering because it allocates file data contiguously from a segment. Intelligent on-disk placement of files [13] allows clustering of reads and helps with pre-fetching optimizations.

Paging of meta-data will allow ILWFS to handle a far greater number of files than is possible when keeping all the meta-data in memory. Work done at InfoLibria has demonstrated that adding paging to ILWFS allows it to handle a virtually unlimited number of files. We have tested an enhanced version of ILWFS with a billion files at only a 12% cost as measured using the Polygraph test.

Improved buffer cache handling [9] and pre-fetching would increase memory hits and thus performance. Since ILWFS is simple and implemented in user space, it is easy to experiment with various buffer cache policies to find the one that achieves the best results.

Finally, ILWFS could be placed inside the kernel as suggested by the work of Joubert et al. [11]. We expect the best performance if this is done in combination with the placement of the whole cache server application into the kernel.


8 Acknowledgements

Thanks to Prof. A. Haddaya and Dr. J. Nicol for their guidance and help in this project.

9 References

1. M. K. McKusick, "The Design and Implementation of the 4.4 BSD Operating System", Addison Wesley, Reading, MA, 1996.

2. M. K. McKusick, "A Fast File System for UNIX", ACM Transactions on Computer Systems, 2(3):181-197, August 1984.

3. M. Rosenblum and J. K. Ousterhout, “The Design and Implementation of a Log Structured File System”, ACM Transactions on Computer Systems, vol 10, pp. 26-52, February 1992.

4. M. Seltzer, K. Bostic, M. K. McKusick, C. Staelin, "An Implementation of a Log Structured File System for UNIX", Proceedings of the 1993 Winter USENIX Conference, January 1993.

5. A. Sweeney, D. Doucette, W. Hu, C. Anderson, M. Nishimoto, and G. Peck, "Scalability in the XFS File System", Proceedings of the USENIX 1996 Annual Technical Conference, January 1996.

6. G. S. Fowler, Y. Huang, D. G. Korn and H. Rao, “A User-level Replicated File System”, Proceedings of 1993 Summer USENIX, pp. 279-290, June 1993.

7. M. K. McKusick, "Soft Updates: A Technique for Eliminating Most Synchronous Writes in the Fast File System", Proceedings of the FREENIX Track: 1999 USENIX Annual Technical Conference, June 1999.

8. J. Wang, R. Min, Z. Wu, Y. Hu, "Boosting I/O performance of internet servers with user-level custom file systems", 2nd Workshop on Performance and Architecture of Web Servers (PAWS’2001), June 2001.

9. P. Cao, "Application-Controlled File Caching and Prefetching", Ph.D. Thesis, Princeton University, 1996.

10. V. Pai, P. Druschel, and W. Zwaenepoel, "I/O-Lite: A unified I/O buffering and caching system", Rice University CS Technical Report TR97-294, 1997.

11. P. Joubert, R. King, R. Neves, M. Russinovich, and J. Tracey, "High-Performance Memory-Based Web Servers: Kernel and User-Space Performance", 2001 USENIX Annual Technical Conference, June 2001.

12. H. Custer, “Inside Windows NT”, Microsoft Press, 1993.

13. E. Shriver, E. Gabber, L. Huang, and C. A. Stein, "Storage Management for Web Proxies", 2001 USENIX Annual Technical Conference, June 2001.

14. D. Mazières, “A Toolkit for User-Level File Systems”, 2001 USENIX Annual Technical Conference, June 2001.

15. M. Seltzer, K. Smith, H. Balakrishnan, J. Chang, S. McMains, V. Padmanabhan, "File System Logging Versus Clustering: A Performance Comparison", Proceedings of the 1995 USENIX Technical Conference, June 1995.
