File Management Chapter 12. Files and File systems File system provides the resource abstractions...


File Management

Chapter 12

Files and File systems

A file system provides the resource abstractions typically associated with secondary storage. It permits users to create data collections, called files, with the following properties:

Long-term existence: Files are stored on disk or other secondary storage and do not disappear when a user logs off.

Sharable between processes: Files have names and can have associated access permissions that permit controlled sharing.

Structure: A file can have an internal structure that is convenient for a particular application. Files can also be organized in a hierarchical structure.

A collection of functions can be performed on files:

Create: A new file is defined and positioned within the structure of files.

Delete: A file is removed from the file structure and destroyed.

Open: An existing file is declared to be "opened" by a process, allowing the process to perform functions on the file.

Close: The file is closed with respect to a process, so that the process may no longer perform functions on the file.

Read: A process reads all or a portion of the data in a file.

Write: A process updates a file, either by adding new data or by changing values.
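The six functions above map closely onto ordinary file APIs. The following is a minimal sketch using Python's built-in `open` and `os.remove`; the file name `example.txt` and the helper `demo_file_operations` are hypothetical and chosen only for illustration.

```python
import os
import tempfile

def demo_file_operations() -> str:
    """Create, write, append to, read, and delete a file; return what was read."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "example.txt")  # hypothetical file name

        # Create + Open + Write: mode "w" creates the file and opens it for writing.
        f = open(path, "w")
        f.write("first line\n")
        f.close()  # Close: this handle can no longer be used for I/O

        # Write (adding new data): mode "a" appends to the existing contents.
        with open(path, "a") as f:
            f.write("second line\n")

        # Open + Read: read the whole file back.
        with open(path, "r") as f:
            contents = f.read()

        # Delete: remove the file from the file structure.
        os.remove(path)
        return contents
```

Note that Create and Open are combined in one call here; an operating system may expose them as separate services.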

File structure

The following terms are commonly used when discussing files:

Field: The basic element of data.

An individual field contains a single value, e.g. an employee's name.

It is characterized by its length and data type.

Can be fixed or variable length, depending on the file design.

Can contain subfields.

Record: A collection of related fields that can be treated as a unit by some application program. For example, an employee record has fields such as name, social security number, and date hired. Can be fixed or variable length.

File: A collection of similar records. It is treated as a single entity by users and applications and may be referenced by name. It may be created and deleted, and access control can be applied to it.
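As an illustration of how fields, records, and files nest, here is a small sketch that packs fixed-length fields into fixed-length records with Python's `struct` module. The layout (20-byte name, 9-byte ID number, 8-byte hire date) and all helper names are assumptions made for this example, not something the slides prescribe.

```python
import struct

# Hypothetical record layout: 20-byte name, 9-byte ID number, 8-byte hire date.
RECORD = struct.Struct("20s9s8s")

def pack_record(name: str, id_number: str, hired: str) -> bytes:
    """Encode three fixed-length fields into one fixed-length record."""
    return RECORD.pack(name.encode(), id_number.encode(), hired.encode())

def unpack_record(raw: bytes) -> tuple:
    """Decode a record, stripping the NUL padding added by struct."""
    return tuple(field.rstrip(b"\x00").decode() for field in RECORD.unpack(raw))

# A "file" here is simply a sequence of similar records stored back to back.
file_bytes = (pack_record("Ada Lovelace", "123456789", "18430101")
              + pack_record("Alan Turing", "987654321", "19360528"))

def read_record(data: bytes, index: int) -> tuple:
    """Fetch record number `index` by computing its byte offset."""
    start = index * RECORD.size
    return unpack_record(data[start:start + RECORD.size])
```

Because every record has the same length, the position of record `index` follows from simple arithmetic; variable-length records would need delimiters or length prefixes instead.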

Database: A collection of related data. The essential aspects of a database are that the relationships among elements of data are explicit and that the database is designed for use by a number of different applications. It may contain all of the information related to an organization, and it consists of one or more types of files.

Operations that must be supported when using files:

Retrieve_All: Retrieve all the records of a file. This is required for an application that must process all of the information in the file at one time, e.g. an application that produces a summary of the information in the file. This operation is often equated with the term sequential processing, since all records are accessed in sequence.

Retrieve_One: Retrieve just a single record. Interactive, transaction-oriented applications need this operation.

Retrieve_Next: Retrieve the record that is "next" in some logical sequence to the most recently retrieved record. Interactive applications, such as filling in forms or performing a search, may need this operation.

Retrieve_Previous: Retrieve the record that is "previous" to the currently accessed record.

Insert_One: Insert a new record into the file.

Delete_One: Delete an existing record.

Update_One: Retrieve a record, update one or more of its fields, and rewrite the updated record back into the file.

Retrieve_Few: Retrieve a number of records.

The nature of the operations that are most commonly performed on a file will influence the way the file is organized.
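The record-level operations listed above can be sketched as methods on a small in-memory class. This is only an illustration: the class name `RecordFile`, the use of a dictionary keyed by a record key, and the choice of key order as the "logical sequence" are all assumptions.

```python
class RecordFile:
    """In-memory stand-in for a file of records kept in key order."""

    def __init__(self):
        self.records = {}    # key -> record (a dict of fields)
        self.current = None  # key of the most recently retrieved record

    def insert_one(self, key, record):
        self.records[key] = record

    def delete_one(self, key):
        del self.records[key]

    def retrieve_all(self):
        # Sequential processing: every record, in key order.
        return [self.records[k] for k in sorted(self.records)]

    def retrieve_one(self, key):
        self.current = key
        return self.records[key]

    def retrieve_next(self):
        # "Next" is defined relative to the most recently retrieved key.
        later = [k for k in sorted(self.records)
                 if self.current is None or k > self.current]
        return self.retrieve_one(later[0]) if later else None

    def retrieve_previous(self):
        earlier = [k for k in sorted(self.records)
                   if self.current is not None and k < self.current]
        return self.retrieve_one(earlier[-1]) if earlier else None

    def update_one(self, key, **changes):
        # Retrieve, modify one or more fields, and write the record back.
        record = dict(self.records[key])
        record.update(changes)
        self.records[key] = record

    def retrieve_few(self, predicate):
        return [r for r in self.retrieve_all() if predicate(r)]
```

An application dominated by `retrieve_all` favors a sequential layout, while one dominated by `retrieve_one` favors an index or hash, which is the trade-off the organizations discussed later address.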

File Management Systems (FMS)

A file management system is the set of system software that provides services to users and applications in the use of files. Users and applications typically access files through the FMS. Objectives:

To meet the data management needs and requirements of the user, which include storage of data and the ability to perform the operations required.

To guarantee, to the extent possible, that the data in the file are valid.

To optimize performance, both from the system point of view in terms of overall throughput and from the user's point of view in terms of response time.

To provide I/O support for a variety of storage device types.

To minimize or eliminate the potential for lost or destroyed data.

To provide a standardized set of I/O interface routines to user processes.

To provide I/O support for multiple users.

For the first objective, meeting user requirements: the requirements depend on the variety of applications and the environment in which the computer system will be used. For an interactive general-purpose system, the following constitute a minimal set of requirements:

Each user should be able to create, delete, read, write, and modify files.

Each user may have controlled access to other users' files.

Each user may control what types of access are allowed to the user's files.

Each user should be able to restructure the user's files in a form appropriate to the problem.

Each user should be able to move data between files.

Each user should be able to back up and recover the user's files in case of damage.

Each user should be able to access the user's files by using symbolic names.

To understand file management, we need to look at the software organization. Figure 12.1 shows the file system software architecture. Lowest level:

Device drivers communicate directly with peripheral devices. A device driver is responsible for starting I/O operations on a device and processing the completion of an I/O request. Typical devices are disk and tape. Device drivers are part of the OS.

File System Architecture

Basic file system (physical I/O): The primary interface with the environment outside of the computer system. It deals with blocks of data that are exchanged with disk or tape, and is concerned with the placement of those blocks on secondary storage and with their buffering in main memory. Part of the OS.

Basic I/O supervisor: Responsible for all file I/O initiation and termination. It maintains control structures that deal with device I/O, scheduling, and file status. Part of the OS.

Logical I/O: Enables users and applications to access records. It deals with file records, provides a general-purpose record I/O capability, and maintains basic data about files.

Access method: The level closest to the user. It provides a standard interface between applications and the file systems and devices that hold the data. Different access methods reflect different file structures and different ways of accessing and processing the data.

File Management Functions

Another way of viewing the functions of a file system is shown in Figure 12.2. Users and application programs interact with the file system by means of commands for creating and deleting files and for performing operations on files. Before performing any operation, the file system must identify and locate the selected file. This requires a directory that describes the location of all files plus their attributes.

On a shared system, the file system also enforces user access control: only authorized users are allowed to access files.

The basic operations that may be performed on a file are performed at the record level. Files are viewed as having some structure that organizes the records, e.g. a sequential structure in which employee records are stored alphabetically by last name.

Thus, to translate user commands into specific file manipulation commands, the access method appropriate to this file structure must be employed.

I/O is done on a block basis. The records of a file must be blocked for output and unblocked after input. To support block I/O of files:

Secondary storage must be managed.

Files must be allocated to free blocks.

Free storage must be managed so that the available blocks are known.

File Organization and access

File organization refers to the logical structuring of the records as determined by the way in which they are accessed. Criteria to consider when choosing a file organization:

Short access time

Ease of update

Economy of storage

Simple maintenance

Reliability

We focus on five organizations:

The pile

The sequential file

The indexed sequential file

The indexed file

The direct/hashed file

The pile

The least complicated form of file organization.

Data are collected in the order in which they arrive.

Each record consists of one burst of data.

Purpose: simply to accumulate the mass of data and save it.

Records may have different fields, or similar fields in different orders.

Each field should be self-describing, including a field name as well as a value.

The length of each field is implicitly indicated by delimiters.

There is no structure to the pile file, so record access is by exhaustive search. For example, to find a record that contains a particular field with a particular value, it is necessary to examine each record in the pile until the desired record is found or the whole file has been searched.

Pile files are encountered when data are collected and stored prior to processing, or when data are not easy to organize.

Uses space well when the stored data vary in size and structure.

Perfectly adequate for exhaustive searches, and easy to update.

Not suitable for most applications.
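A pile can be sketched as a list of self-describing, delimiter-separated records. In the example below the delimiters `;` and `=` and the helper names are assumptions chosen for illustration; values containing those characters would need escaping, which this sketch omits.

```python
FIELD_SEP = ";"  # assumed delimiter between fields
NAME_SEP = "="   # assumed delimiter between a field name and its value

def append_record(pile: list, **fields) -> None:
    """Store one record as a self-describing string, in arrival order."""
    pile.append(FIELD_SEP.join(f"{name}{NAME_SEP}{value}"
                               for name, value in fields.items()))

def parse_record(raw: str) -> dict:
    """Recover field names and values from the delimiters."""
    return dict(part.split(NAME_SEP, 1) for part in raw.split(FIELD_SEP))

def search(pile: list, name: str, value: str) -> list:
    """Exhaustive search: every record must be examined."""
    return [rec for rec in map(parse_record, pile) if rec.get(name) == value]

pile = []
append_record(pile, name="Ada", dept="R&D")
append_record(pile, dept="Sales", phone="555-0100", name="Bob")  # other fields, other order
append_record(pile, note="no name field at all")
```

Appending is trivial, but `search` always touches every record, which is why the pile is unsuitable for most applications.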

The sequential file

The most common form of file structure.

A fixed format is used for records.

All records are of the same length, consisting of the same number of fixed-length fields in a particular order.

One field, usually the first in each record, is referred to as the key field.

The key field uniquely identifies the record.

Usually used for batch applications.

Easily stored on tape as well as disk.

For interactive applications that involve queries, performance is poor.

New records are placed in a log file or transaction file.

A batch update is performed periodically to merge the log file with the master file.
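The batch update of a sequential file amounts to a merge of two key-ordered sequences. The following is a minimal sketch, assuming records are `(key, data)` tuples and that the log may arrive in any order; `batch_update` is a hypothetical helper name.

```python
import heapq

def batch_update(master: list, log: list) -> list:
    """Merge a key-ordered master file with an unordered log of new records."""
    by_key = lambda rec: rec[0]
    sorted_log = sorted(log, key=by_key)  # put the log into key sequence first
    # heapq.merge walks both sorted inputs once, like a tape-to-tape merge.
    return list(heapq.merge(master, sorted_log, key=by_key))
```

For example, merging the master `[(1, "a"), (4, "d"), (7, "g")]` with the log `[(5, "e"), (2, "b")]` yields a new master in key order containing all five records.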

Sequential File

The Indexed Sequential File

Maintains the key characteristic of the sequential file: records are organized in sequence based on a key field.

Adds two features:

An index to the file, to support random access.

An overflow file.

The index provides a lookup capability to quickly reach the vicinity of a desired record.

The overflow file is similar to the log file used with a sequential file, but it is integrated so that a record in the overflow file is located by following a pointer from its predecessor record.

Comparison of sequential and indexed sequential

Example: a file contains 1 million records.

On average, 500,000 accesses are required to find a record in a sequential file.

If an index contains 1000 entries, it takes on average 500 accesses to find the key in the index, followed by 500 accesses in the main file. The average is now 1000 accesses.
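The arithmetic in this comparison can be checked with a few lines of code. This sketch assumes a single-level index whose entries are evenly spaced over the file and that both the index and the main file are scanned linearly; the function names are hypothetical.

```python
def avg_sequential(n_records: int) -> float:
    """Average records examined in a plain sequential search."""
    return n_records / 2

def avg_indexed(n_records: int, n_index_entries: int) -> float:
    """Average accesses with a single-level index over evenly spaced keys."""
    records_per_entry = n_records / n_index_entries
    # Half the index on average, then half of the stretch the entry points to.
    return n_index_entries / 2 + records_per_entry / 2

print(avg_sequential(1_000_000))     # 500000.0
print(avg_indexed(1_000_000, 1000))  # 1000.0
```

With 1000 index entries over 1,000,000 records, each entry covers 1000 records, which is where the second figure of 500 comes from.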

Indexed Sequential File

The Indexed File

Uses multiple indexes for different key fields.

May contain an exhaustive index, which contains one entry for every record in the main file.

May contain a partial index, which contains entries only for records where the field of interest exists.

When a new record is added to the main file, all of the index files must be updated.

Used in applications where timeliness of information is critical, e.g. airline reservation systems and inventory control systems.

Indexed File

The Direct or Hashed File

Directly accesses a block at a known address.

A key field is required for each record.

Makes use of a hashing function on the key value.

Often used when very rapid access is required, where fixed-length records are used, and where records are always accessed one at a time.

Examples: directories and pricing tables.
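The idea behind a direct or hashed file, computing a block address from the key, can be sketched as follows. The block count, the use of CRC-32 as the hashing function, and the helper names are all assumptions for illustration; a real system would also need a policy for overflowing blocks.

```python
import zlib

N_BLOCKS = 8  # assumed number of blocks (buckets) in the file

def block_address(key: str) -> int:
    """Map a key to a block number with a deterministic hash (CRC-32 here)."""
    return zlib.crc32(key.encode()) % N_BLOCKS

# Each block holds the records whose keys hash to its address.
blocks = [dict() for _ in range(N_BLOCKS)]

def insert(key: str, record: dict) -> None:
    blocks[block_address(key)][key] = record

def lookup(key: str):
    # One hash computation leads straight to the block: no scan, no index.
    return blocks[block_address(key)].get(key)
```

A lookup costs one hash computation plus one block access regardless of file size, which suits the one-record-at-a-time access pattern described above.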

File Directories

A directory contains information about files:

Attributes

Location

Ownership

The directory is itself a file, owned by the operating system. It provides a mapping between file names and the files themselves.

Simple Structure for a Directory

A list of entries, one for each file.

A sequential file with the name of the file serving as the key.

Provides no help in organizing the files.

Forces users to be careful not to use the same name for two different files.

Two-level Scheme for a Directory

One directory for each user, plus a master directory.

The master directory contains an entry for each user, which provides address and access control information.

Each user directory is a simple list of the files of that user.

Still provides no help in structuring collections of files.

Hierarchical, or Tree-Structured Directory

A master directory with user directories underneath it.

Each user directory may have subdirectories and files as entries.

Files can be located by following a path from the root, or master, directory down various branches. This path is the pathname for the file.

Several files can have the same file name as long as they have unique pathnames.

The current directory is the working directory. Files are referenced relative to the working directory.
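Pathname resolution in a tree-structured directory can be sketched with nested dictionaries. The directory contents, the `/` separator, and the function `resolve` are assumptions made for this example.

```python
# Nested dicts stand in for directories; strings stand in for file contents.
root = {
    "alice": {"notes.txt": "alice's notes", "src": {"main.c": "int main..."}},
    "bob": {"notes.txt": "bob's notes"},
}

def resolve(path: str, working_dir: str = "/"):
    """Follow a pathname from the root (absolute) or the working directory (relative)."""
    if path.startswith("/"):
        full = path
    else:
        full = working_dir.rstrip("/") + "/" + path
    node = root
    for name in filter(None, full.split("/")):
        node = node[name]  # raises KeyError if a component does not exist
    return node
```

Both users own a file called `notes.txt`, yet `resolve("/alice/notes.txt")` and `resolve("/bob/notes.txt")` reach different files because the pathnames differ. With `working_dir="/alice"`, the relative name `src/main.c` resolves to the same file as `/alice/src/main.c`.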

File Sharing

In a multiuser system, files may be shared among users. Two issues arise:

Access rights

Management of simultaneous access

Access Rights

Examples of access rights:

None: The user may not even know of the existence of the file, and is not allowed to read the user directory that includes the file.

Knowledge: The user can determine that the file exists and who its owner is.

Execution: The user can load and execute a program but cannot copy it.

Reading: The user can read the file for any purpose, including copying and execution.

Appending: The user can add data to the file but cannot modify or delete any of the file's contents.

Updating: The user can modify, delete, and add to the file's data. This includes creating the file, rewriting it, and removing all or part of the data.

Changing protection: The user can change the access rights granted to other users.

Deletion: The user can delete the file.

Owner: Has all of the rights previously listed.

The owner may grant rights to others using the following classes of users:

Specific user

User groups

All (for public files)
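The access rights and user classes above can be sketched with a flag enumeration. This is an illustration under stated assumptions: the class and function names are hypothetical, and combining the rights from the "specific user", "user groups", and "all" classes by union is a design choice of this sketch, not something the slides specify.

```python
from enum import Flag, auto

class Right(Flag):
    NONE = 0
    KNOWLEDGE = auto()
    EXECUTION = auto()
    READING = auto()
    APPENDING = auto()
    UPDATING = auto()
    CHANGING_PROTECTION = auto()
    DELETION = auto()

# The owner holds every right listed above.
ALL_RIGHTS = Right.NONE
for _right in Right:
    ALL_RIGHTS |= _right

class FileEntry:
    def __init__(self, owner: str):
        self.owner = owner
        self.user_rights = {}            # specific user -> rights
        self.group_rights = {}           # user group -> rights
        self.public_rights = Right.NONE  # rights granted to all users

def rights_for(entry: FileEntry, user: str, user_groups=()) -> Right:
    """Union of the rights granted through each class the user belongs to."""
    if user == entry.owner:
        return ALL_RIGHTS
    granted = entry.public_rights | entry.user_rights.get(user, Right.NONE)
    for group in user_groups:
        granted |= entry.group_rights.get(group, Right.NONE)
    return granted

def allowed(entry: FileEntry, user: str, right: Right, user_groups=()) -> bool:
    return right in rights_for(entry, user, user_groups)
```

A file whose `public_rights` is `Right.KNOWLEDGE` is visible to everyone, while reading it still requires a grant through a group or a specific-user entry.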

Simultaneous Access

A user may lock the entire file when it is to be updated.

A user may lock individual records during the update.

Mutual exclusion and deadlock are issues for shared access.
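Record-level locking, together with one common way of avoiding deadlock, can be sketched with Python threads. The class `SharedFile`, the numeric record values, and the rule of acquiring locks in sorted key order are assumptions of this sketch; locking the entire file would correspond to a single lock guarding every operation.

```python
import threading

class SharedFile:
    """Records protected by one lock each, so updates to different records can overlap."""

    def __init__(self, records: dict):
        self.records = dict(records)
        self.record_locks = {key: threading.Lock() for key in self.records}

    def update_record(self, key, delta):
        # Mutual exclusion: only one thread at a time may update this record.
        with self.record_locks[key]:
            self.records[key] += delta

    def transfer(self, src, dst, amount):
        if src == dst:
            return  # nothing to do, and locking the same record twice would block forever
        # Always lock in sorted key order so two opposite transfers cannot deadlock.
        first, second = sorted((src, dst))
        with self.record_locks[first], self.record_locks[second]:
            self.records[src] -= amount
            self.records[dst] += amount
```

Without the fixed locking order, one thread could hold the lock on record `a` while waiting for `b`, and another could hold `b` while waiting for `a`, which is the deadlock the slide warns about.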

Record Blocking

Records are the logical unit of access of a structured file. Blocks are the unit of I/O with secondary storage. For I/O to be performed, records must be organized as blocks. Several issues need to be considered:

Should blocks be of fixed or variable length? On most systems, blocks are of fixed length.

What should the relative size of a block be compared to the average record size?

The larger the block, the more records are passed in one I/O operation. This is an advantage when the file is processed sequentially.

With random access, larger blocks result in the unnecessary transfer of unused records.

Given the size of a block, there are three methods of blocking that can be used:

Fixed blocking

Variable-length spanned blocking

Variable-length unspanned blocking

Fixed blocking: Fixed-length records are used, and an integral number of records is stored in a block. There may be unused space at the end of each block (internal fragmentation).

Fixed Blocking

Variable-length spanned blocking: Variable-length records are used and are packed into blocks with no unused space. Some records must therefore span two blocks, with the continuation indicated by a pointer to the successor block.

Variable Blocking: Spanned

Variable-length unspanned blocking: Variable-length records are used, but spanning is not employed. There is wasted space in most blocks because the remainder of a block cannot be used if the next record is larger than the remaining unused space.

Variable Blocking Unspanned
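The space behavior of the three blocking methods can be compared by counting the blocks each needs. The sketch below is simplified: it ignores the pointer overhead of spanned records and any per-record length fields, and the function names are hypothetical.

```python
def fixed_blocking(n_records: int, record_size: int, block_size: int) -> int:
    """Blocks needed when an integral number of fixed-length records fits per block."""
    per_block = block_size // record_size  # leftover bytes are internal fragmentation
    return -(-n_records // per_block)      # ceiling division

def variable_spanned(record_sizes: list, block_size: int) -> int:
    """Blocks needed when records are packed end to end and may cross block boundaries."""
    return -(-sum(record_sizes) // block_size)

def variable_unspanned(record_sizes: list, block_size: int) -> int:
    """Blocks needed when a record that does not fit starts a new block."""
    blocks, free = 0, 0
    for size in record_sizes:
        if size > block_size:
            raise ValueError("a record larger than a block cannot be stored unspanned")
        if size > free:  # the remainder of the current block is wasted
            blocks, free = blocks + 1, block_size
        free -= size
    return blocks
```

For three 60-byte records and 100-byte blocks, spanned blocking needs 2 blocks while unspanned blocking needs 3, because 40 bytes at the end of each block cannot hold the next record.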

Secondary Storage Management

Space must be allocated to files, and the system must keep track of the space available for allocation.

Preallocation vs Dynamic Allocation

Preallocation:

Requires that the maximum size of the file be declared at the time of creation.

It is difficult to reliably estimate the maximum potential size of a file.

Users tend to overestimate file size so as not to run out of space.

Dynamic allocation:

Allocates space to a file in portions as needed.

Portion size

The size of the portion allocated to a file involves several trade-offs:

Contiguity of space increases performance.

A large number of small portions increases the size of the tables needed to manage the allocation information.

Fixed-size portions simplify the reallocation of space.

Variable-size or small fixed-size portions minimize the waste of unused storage due to overallocation.

Alternatives:

Variable, large contiguous portions: These give better performance. The variable size avoids waste, and the file allocation tables are small. However, space is hard to reuse.

Blocks: Small fixed portions provide greater flexibility.

Methods of File Allocation

Contiguous allocation: A single contiguous set of blocks is allocated to a file at the time of creation.

Only a single entry is needed in the file allocation table: the starting block and the length of the file.

External fragmentation will occur, so compaction needs to be performed.

Chained allocation: Allocation is on the basis of individual blocks.

Each block contains a pointer to the next block in the chain.

Only a single entry is needed in the file allocation table: the starting block and the length of the file.

No external fragmentation.

Best for sequential files.

No accommodation of the principle of locality.

Indexed allocation: The file allocation table contains a separate one-level index for each file.

The index has one entry for each portion allocated to the file.

The file allocation table contains the block number for the index.
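The three allocation methods differ mainly in how the i-th block of a file is located. The following sketch contrasts them; the function names and the data structures standing in for the file allocation table, the block pointers, and the index block are assumptions for illustration.

```python
def contiguous_lookup(start: int, length: int, i: int) -> int:
    """The table entry is (start, length); block i is found by simple arithmetic."""
    if not 0 <= i < length:
        raise IndexError(i)
    return start + i

def chained_lookup(start: int, next_block: dict, i: int) -> int:
    """The table entry gives the first block; each block points to its successor."""
    block = start
    for _ in range(i):  # i links must be followed to reach block i
        block = next_block[block]
    return block

def indexed_lookup(index_block: list, i: int) -> int:
    """The table entry points to an index block that lists the file's blocks."""
    return index_block[i]
```

Contiguous and indexed allocation reach any block in one step, whereas chained allocation must walk the chain, which is why it suits sequential files best.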
