Fundamental File Structure Concepts

Date post: 15-Jan-2016
Upload: renee
Page 1: Fundamental File Structure Concepts

File Processing - Fundamental concepts MVNC 1

Fundamental File Structure Concepts

Chapter 4

Record and Field Structure

A record is a collection of fields.
A field is used to store information about some attribute.
The question: when we write records, how do we organize the fields in the records:
  » so that the information can be recovered
  » so that we save space
  » so that we can process efficiently
  » to maximize record structure flexibility

Field Structure issues

What if
  » Field values vary greatly
  » Fields are optional

Field Delineation methods

Fixed length fields
Include length with field
Separate fields with a delimiter
Include keyword expression to identify each field

Fixed length fields

Easy to implement - use language record structures (no parsing)
Fields must be declared at maximum length needed

Field:  last  first  address  city  state  zip
Width:  10    10     15       15    2      9

“Yeakus    Bill      123 Pine       Utica          OH43050    ”
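The fixed-length layout above can be sketched in C++ as follows. This is only an illustration: the struct name `PersonFixed` and the helpers `put_field` and `get_field` are hypothetical, and truncating over-long values is an assumption the slides do not state.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Field widths from the slide: last, first, address, city, state, zip.
struct PersonFixed {
    char last[10];
    char first[10];
    char address[15];
    char city[15];
    char state[2];
    char zip[9];
};
static_assert(sizeof(PersonFixed) == 61, "record is the sum of the field widths");

// Copy `value` into a fixed-width field, padding with blanks on the right.
// Values longer than the field are truncated (an assumption of this sketch).
void put_field(char* field, std::size_t width, const std::string& value) {
    assert(field != nullptr);
    std::memset(field, ' ', width);
    std::memcpy(field, value.data(), value.size() < width ? value.size() : width);
}

// Return the field contents with trailing blanks removed.
std::string get_field(const char* field, std::size_t width) {
    std::string s(field, width);
    std::size_t end = s.find_last_not_of(' ');
    return end == std::string::npos ? std::string() : s.substr(0, end + 1);
}
```

Because every field has a known width, reading a record needs no parsing: the 61 bytes can be read straight into the struct, at the cost of storing the padding.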

Include length with field

Begin field with length indicator
If maximum field length < 256, a byte can be used for length

Fields: last first address city state zip

Each field starts with a length byte (values in hex):

06 59 65 61 6B 75 73          length 6, “Yeakus”
04 42 69 6C 6C                length 4, “Bill”
08 31 32 33 20 50 69 6E 65    length 8, “123 Pine”
. . .
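A minimal sketch of the length-byte layout, assuming one unsigned byte per length as the slide suggests. The function names `append_field` and `parse_fields` are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Append one field to `out` as a single length byte followed by the data.
// Returns false if the field does not fit in a one-byte length (> 255).
bool append_field(std::string& out, const std::string& field) {
    if (field.size() > 255) return false;
    out.push_back(static_cast<char>(field.size()));
    out += field;
    return true;
}

// Split a buffer of length-prefixed fields back into strings.
std::vector<std::string> parse_fields(const std::string& in) {
    std::vector<std::string> fields;
    std::size_t pos = 0;
    while (pos < in.size()) {
        std::size_t len = static_cast<unsigned char>(in[pos++]);
        if (pos + len > in.size()) break;  // truncated field: stop parsing
        fields.push_back(in.substr(pos, len));
        pos += len;
    }
    return fields;
}
```

An optional field that is absent is simply written with a length byte of zero.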

Separate fields with a delimiter

Use a special character not used in data
  » space, comma, tab
  » Also special ASCII control characters, e.g. the separator FS (hex 1C)
  » Here we use “|”
Also need an end of record delimiter: “#”

“Yeakus|Bill|123 Pine|Utica|OH|43050#”
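Parsing such a record might look like the sketch below. The function name `split_record` is hypothetical, and the sketch assumes that neither delimiter ever occurs inside the data.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split one record of the form "f1|f2|...|fn#" into its fields.
// Returns an empty vector if the end-of-record delimiter is missing.
std::vector<std::string> split_record(const std::string& rec,
                                      char field_delim = '|',
                                      char record_delim = '#') {
    std::vector<std::string> fields;
    std::size_t end = rec.find(record_delim);
    if (end == std::string::npos) return fields;
    std::size_t start = 0;
    while (true) {
        std::size_t pos = rec.find(field_delim, start);
        if (pos == std::string::npos || pos > end) pos = end;
        fields.push_back(rec.substr(start, pos - start));
        if (pos == end) break;
        start = pos + 1;
    }
    return fields;
}
```

Two adjacent delimiters produce an empty string, which is how an omitted optional field shows up in this format.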

Include keyword expression

Keywords label each field
A self-describing structure
Allows LOTS of flexibility
Uses lots of space

“LAST=Yeakus|FIRST=Bill|ADDRESS=123 Pine|CITY=Utica|STATE=OH|ZIP=43050#”
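A keyword record can be read into a map from keyword to value. The sketch below uses the hypothetical name `parse_keyword_record` and assumes that "|", "#" and "=" do not appear inside keywords or values.

```cpp
#include <cassert>
#include <map>
#include <string>

// Parse "KEY=value|KEY=value|...#" into a map. Pieces without '=' are skipped.
std::map<std::string, std::string> parse_keyword_record(const std::string& rec) {
    std::map<std::string, std::string> fields;
    std::size_t end = rec.find('#');
    if (end == std::string::npos) end = rec.size();
    std::size_t start = 0;
    while (start < end) {
        std::size_t pos = rec.find('|', start);
        if (pos == std::string::npos || pos > end) pos = end;
        std::string piece = rec.substr(start, pos - start);
        std::size_t eq = piece.find('=');
        if (eq != std::string::npos)
            fields[piece.substr(0, eq)] = piece.substr(eq + 1);
        start = pos + 1;
    }
    return fields;
}
```

Since fields are found by name, they may appear in any order and a missing field is just a missing key.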

Optional Fields

Fixed length
  » Leave blank
Field length
  » Zero length field
Delimiter
  » Adjacent delimiters
Keywords
  » Just leave out

Reading a stream of fields

Need to break record into fields
Fixed length can simply be read into record structure
Others must be “parsed” with a parse algorithm

Record Structures

How do we organize records in a file?
Records can be fixed length or variable length
  » Fixed length allows simple direct access lookup
  » Fixed may waste space
  » Variable - how do we find a record's position?

Record Structures

Fixed Length Records
Fixed number of fields in records
Variable length
  » Prefix each record with a length
  » Use a second file to keep track of record start positions
  » Place delimiter between records

Fixed Length Records

All records same length
Record positions can be calculated for direct access reads.
Does not imply that the sizes or number of fields are fixed.
Variable length fields inside a fixed length record lead to unused space.
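The position calculation can be sketched as below. The names `record_offset` and `read_record` are hypothetical, records are numbered from 0 by assumption, and the optional header size anticipates files that begin with a header record.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Byte offset of record number `rrn` (counting from 0) in a file of
// fixed-length records that starts with `header_size` bytes of header.
long record_offset(long rrn, long record_size, long header_size = 0) {
    assert(rrn >= 0 && record_size > 0 && header_size >= 0);
    return header_size + rrn * record_size;
}

// Read record `rrn` from any seekable stream; returns false on a short read.
bool read_record(std::istream& in, long rrn, long record_size, std::string& out) {
    out.assign(static_cast<std::size_t>(record_size), '\0');
    in.clear();                                   // drop any earlier EOF state
    in.seekg(record_offset(rrn, record_size));
    in.read(&out[0], record_size);
    return in.gcount() == record_size;
}
```

No preceding record is touched: one multiplication and one seek locate the record.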

Fixed number of fields in records

Field size could be fixed or variable
Fixed
  » results in fixed size records
  » simply read directly into “struct”
Variable sized fields
  » delimited or field lengths
  » Simply count fields while parsing

Variable length Records

Prefix each record with a length
Use a second file to keep track of record start positions
Place delimiter between records

Prefix records with a length

Allows true variable length records
Form of prefix:
  » Character number (fixed length)
  » Binary number (write integer without conversion)
  » Must consider maximum length
No direct access (great for sequential access)
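A sketch of the binary-prefix variant, assuming a 2-byte unsigned length written in the machine's native byte order. The names `write_prefixed` and `read_prefixed` are hypothetical; the 2-byte width is what caps the maximum record length here.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Write one record as a 2-byte binary length followed by the data.
bool write_prefixed(std::ostream& out, const std::string& rec) {
    if (rec.size() > UINT16_MAX) return false;   // the prefix limits record size
    std::uint16_t len = static_cast<std::uint16_t>(rec.size());
    out.write(reinterpret_cast<const char*>(&len), sizeof len);
    out.write(rec.data(), len);
    return out.good();
}

// Read the next record; returns false at end of file or on a short read.
bool read_prefixed(std::istream& in, std::string& rec) {
    std::uint16_t len = 0;
    if (!in.read(reinterpret_cast<char*>(&len), sizeof len)) return false;
    rec.assign(len, '\0');
    if (len > 0 && !in.read(&rec[0], len)) return false;
    return true;
}
```

Reading record n still means reading the n - 1 prefixes before it, which is why this layout suits sequential rather than direct access.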

Index of record start addresses

A second file is simply a list of offsets to successive records
Since the offsets are fixed length, this file allows direct access, thereby allowing direct access to the main file.
Problem
  » Maintaining file (adding and deleting records)
  » Cost of index
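As an illustration, the sketch below builds the offset list for a file of newline-delimited records and then uses it for direct access. The names `build_index` and `fetch` are hypothetical, and the index is kept in memory here rather than written to a second file.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Scan a file of newline-delimited records once and collect the byte offset
// at which each record starts.
std::vector<std::streamoff> build_index(std::istream& data) {
    std::vector<std::streamoff> offsets;
    std::string line;
    data.clear();
    data.seekg(0);
    std::streamoff pos = 0;
    while (std::getline(data, line)) {
        offsets.push_back(pos);
        pos += static_cast<std::streamoff>(line.size()) + 1;  // +1 for '\n'
    }
    return offsets;
}

// Fetch record `n` directly by seeking to its indexed offset.
bool fetch(std::istream& data, const std::vector<std::streamoff>& index,
           std::size_t n, std::string& rec) {
    if (n >= index.size()) return false;
    data.clear();
    data.seekg(index[n]);
    return static_cast<bool>(std::getline(data, rec));
}
```

Every insertion or deletion in the data file shifts later offsets, so the index has to be rebuilt or patched; that is the maintenance cost noted on the slide.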

Place delimiter between records

Special character not used in record
Allows efficient variable size
No direct access
Bible files - use ‘\n’ as delimiter

Binary data in files

Binary reals and integers can be written to, and read from, a file:
  » Need to know byte size of variables used.
  » The “sizeof” operator returns data size

Binary data in files

int rsize;
char rec_buf[MAX];
...
strcpy(rec_buf, "this is a test record");
rsize = strlen(rec_buf);
write(my_fd, &rsize, sizeof(int)); // write the size
write(my_fd, rec_buf, rsize); // write the record
...
read(my_fd, &rsize, sizeof(int)); // read the size
read(my_fd, rec_buf, rsize); // read the record
rec_buf[rsize] = '\0'; // the terminator was not written, so restore it

Viewing Binary file data

Use the file dump utility (od - octal dump)
  » od -xc <filename>
  » x - hex output
  » c - character output
Useful for viewing what is actually in file

Using Classes to Manipulate Buffer

Three Classes
  » Delimited fields
  » Length-based fields
  » Fixed length fields

Class for Delimited fields

Consider a class to manage delimited text buffers
  » Allows reading and writing of delimited records
  » Allows packing and unpacking

Class for Delimited fields

class Person
{
  public:
    // fields
    char LastName [11];
    char FirstName [11];
    char Address [16];
    char City [16];
    char State [3];
    char ZipCode [10];
    // Methods next ...
};

Class for Delimited fields

class DelimTextBuffer
{ public:
    DelimTextBuffer (char Delim = '|', int maxBytes = 1000);
    void Clear (); // called by Read and by Person::Pack
    int Read (istream &);
    int Write (ostream &) const;
    int Pack (const char *, int size = -1);
    int Unpack (char *);
  private:
    char Delim;
    char DelimStr[2]; // zero terminated string for Delim
    char * Buffer; // character array to hold field values
    int BufferSize; // size of packed fields
    int MaxBytes; // maximum number of characters in the buffer
    int NextByte; // packing/unpacking position in buffer
};

Class for Delimited fields

Packing a buffer

Person Bill_Yeakus;
DelimTextBuffer buffer;
buffer.Pack(Bill_Yeakus.LastName);
buffer.Pack(Bill_Yeakus.FirstName);
buffer.Pack(Bill_Yeakus.ZipCode);
buffer.Write(stream);

Class for Delimited fields

int DelimTextBuffer :: Pack (const char * str, int size)
// set the value of the next field of the buffer;
// if size = -1 (default) use strlen(str) as length of field
{
    short len; // length of string to be packed
    if (size >= 0) len = size;
    else len = strlen (str);
    if (len > strlen(str)) // str is too short!
        return FALSE;
    int start = NextByte; // first character to be packed
    NextByte += len + 1;
    if (NextByte > MaxBytes) return FALSE;
    memcpy (&Buffer[start], str, len);
    Buffer [start+len] = Delim; // add delimiter
    BufferSize = NextByte;
    return TRUE;
}

Class for Delimited fields

int DelimTextBuffer :: Write (ostream & stream) const

{

stream . write ((char*)&BufferSize, sizeof(BufferSize));

stream . write (Buffer, BufferSize);

return stream . good ();

}

Class for Delimited fields

int DelimTextBuffer :: Read (istream & stream)

{

Clear ();

stream . read ((char*)&BufferSize, sizeof(BufferSize));

if (stream.fail()) return FALSE;

if (BufferSize > MaxBytes) return FALSE; // buffer overflow

stream . read (Buffer, BufferSize);

return stream . good ();

}

Class for Delimited fields

int DelimTextBuffer :: Unpack (char * str)

// extract the value of the next field of the buffer

{

int len = -1; // length of packed string

int start = NextByte; // first character to be unpacked

for (int i = start; i < BufferSize; i++)

if (Buffer[i] == Delim)

{len = i - start; break;}

if (len == -1) return FALSE; // delimiter not found

NextByte += len + 1;

if (NextByte > BufferSize) return FALSE;

strncpy (str, &Buffer[start], len);

str [len] = 0; // zero termination for string

return TRUE;

}

Class for Delimited fields

Class Person can be extended to provide specialized packing functions

Class for Delimited fields

int Person::Pack (DelimTextBuffer & Buffer) const

{// pack the fields into a DelimTextBuffer, return TRUE if all succeed, FALSE o/w

int result;

Buffer . Clear ();

result = Buffer . Pack (LastName);

result = result && Buffer . Pack (FirstName);

result = result && Buffer . Pack (Address);

result = result && Buffer . Pack (City);

result = result && Buffer . Pack (State);

result = result && Buffer . Pack (ZipCode);

return result;

}

Class for Delimited fields

int Person::Unpack (DelimTextBuffer & Buffer)

{

int result;

result = Buffer . Unpack (LastName);

result = result && Buffer . Unpack (FirstName);

result = result && Buffer . Unpack (Address);

result = result && Buffer . Unpack (City);

result = result && Buffer . Unpack (State);

result = result && Buffer . Unpack (ZipCode);

return result;

}

Class for Fixed Length fields

int FixedTextBuffer :: AddField (int fieldSize)

{

if (NumFields == MaxFields) return FALSE;

if (BufferSize + fieldSize > MaxChars) return FALSE;

FieldSize[NumFields] = fieldSize;

NumFields ++;

BufferSize += fieldSize;

return TRUE;

}

Class for Fixed Length fields

int FixedTextBuffer :: Read (istream & stream)

{

stream . read (Buffer, BufferSize);

return stream . good ();

}

Class for Fixed Length fields

int FixedTextBuffer :: Write (ostream & stream)

{

stream . write (Buffer, BufferSize);

return stream . good ();

}

Class for Fixed Length fields

int FixedTextBuffer :: Pack (const char * str)
// set the value of the next field of the buffer;
{
    if (NextField == NumFields || !Packing) // buffer is full or not packing mode
        return FALSE;
    int len = strlen (str);
    int start = NextCharacter; // first byte to be packed
    int packSize = FieldSize[NextField]; // number bytes to be packed
    strncpy (&Buffer[start], str, packSize);
    NextCharacter += packSize;
    NextField ++;
    // if len < packSize, pad the rest of the field with blanks
    for (int i = start + len; i < NextCharacter; i ++)
        Buffer[i] = ' ';
    Buffer [NextCharacter] = 0; // make buffer look like a string
    if (NextField == NumFields) // buffer is full
    {
        Packing = FALSE;
        NextField = NextCharacter = 0;
    }
    return TRUE;
}

Class for Fixed Length fields

int FixedTextBuffer :: Unpack (char * str)
// extract the value of the next field of the buffer
{
    if (NextField == NumFields || Packing) // buffer is full or not unpacking mode
        return FALSE;
    int start = NextCharacter; // first byte to be unpacked
    int packSize = FieldSize[NextField]; // number bytes to be unpacked
    strncpy (str, &Buffer[start], packSize);
    str [packSize] = 0; // terminate string with zero
    NextCharacter += packSize;
    NextField ++;
    if (NextField == NumFields) Clear (); // all fields unpacked
    return TRUE;
}

Class for Fixed Length fields

void FixedTextBuffer :: Print (ostream & stream)

{

stream << "Buffer has max fields "<<MaxFields<<" and actual "<<NumFields<<endl

<<"max bytes "<<MaxChars<<" and Buffer Size "<<BufferSize<<endl;

for (int i = 0; i < NumFields; i++)

stream <<"\tfield "<<i<<" size "<<FieldSize[i]<<endl;

if (Packing) stream <<"\tPacking\n";

else stream <<"\tnot Packing\n";

stream <<"Contents: '"<<Buffer<<"'"<<endl;

}

Class for Fixed Length fields

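class FixedTextBuffer
{ public:
    FixedTextBuffer (int maxFields, int maxChars = 1000);
    int AddField (int fieldSize);
    void Clear (); // called by Unpack and by Person::Pack
    int Read (istream &);
    int Write (ostream &);
    int Pack (const char *);
    int Unpack (char *);
    void Print (ostream &);
  private:
    char * Buffer; // character array to hold field values
    int BufferSize; // sum of the sizes of declared fields
    int * FieldSize; // array to hold field sizes
    int MaxFields; // maximum number of fields
    int MaxChars; // maximum number of characters in the buffer
    int NumFields; // number of fields declared so far
    int NextField; // index of the next field to be packed/unpacked
    int NextCharacter; // packing/unpacking position in buffer
    int Packing; // TRUE while in packing mode
};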

Class for Fixed Length fields

int Person::Pack (FixedTextBuffer & Buffer) const

{// pack the fields into a FixedTextBuffer, return TRUE if all succeed, FALSE o/w

int result;

Buffer . Clear ();

result = Buffer . Pack (LastName);

result = result && Buffer . Pack (FirstName);

result = result && Buffer . Pack (Address);

result = result && Buffer . Pack (City);

result = result && Buffer . Pack (State);

result = result && Buffer . Pack (ZipCode);

return result;

}

Class for Fixed Length fields

int Person::Unpack (FixedTextBuffer & Buffer)

{

Clear ();

int result;

result = Buffer . Unpack (LastName);

result = result && Buffer . Unpack (FirstName);

result = result && Buffer . Unpack (Address);

result = result && Buffer . Unpack (City);

result = result && Buffer . Unpack (State);

result = result && Buffer . Unpack (ZipCode);

return result;

}

Record Access - Keys

Attribute used to identify records
Often used to find records
Standard or canonical form
  » rules which keys must conform to
  » prevents missing a record because its key is in a different form
  » Example:
    – all capitals
    – Phone in form (nnn) nnn-nnnn
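Both example rules can be sketched as small normalizing functions applied to a key before it is stored or searched for. The names `canonical_name` and `canonical_phone` are hypothetical, and trimming blanks and requiring exactly ten digits are assumptions beyond what the slide states.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Canonical form for name keys: trim blanks at both ends, upper-case the rest.
std::string canonical_name(const std::string& raw) {
    std::size_t b = raw.find_first_not_of(' ');
    if (b == std::string::npos) return "";
    std::size_t e = raw.find_last_not_of(' ');
    std::string key = raw.substr(b, e - b + 1);
    for (char& c : key)
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return key;
}

// Canonical form for phone keys: keep the digits, emit "(nnn) nnn-nnnn".
// Returns an empty string unless exactly 10 digits are present.
std::string canonical_phone(const std::string& raw) {
    std::string d;
    for (char c : raw)
        if (std::isdigit(static_cast<unsigned char>(c))) d.push_back(c);
    if (d.size() != 10) return "";
    return "(" + d.substr(0, 3) + ") " + d.substr(3, 3) + "-" + d.substr(6);
}
```

As long as the same function is applied when writing and when searching, “yeakus” and “YEAKUS” find the same record.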

Record Access - Keys

Keys can be distinct - uniquely identify records
  » Primary keys
  » one-to-one relationship between key value and possible entities represented
  » SSN, Student ID
Keys can identify a collection of records
  » Secondary keys
  » one-to-many relationship
  » City, position, department

Record Access - Keys

Primary key desired characteristics
  » unique among collection of entities
  » dataless - what if some entities have no value of this type (e.g. SSN)
  » unchanging

Record access

Performance of access method
  » how do we compare techniques?
  » Must be careful what events we count.
  » “big-oh” notation gives us a way to factor out all but the most significant factors

Record Access - timing

Sequential searching
  » Consider file of 4000 records
  » What if no blocking done, and one record per block? (500 byte records, 512 byte blocks)
  » What if cluster size set to 8?
  » Always requires O(n), but search is faster by a constant factor
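One way to work the two questions is to count read operations. The sketch assumes that a cluster size of 8 means eight 512-byte blocks, and therefore eight records, arrive per read; the function name `worst_case_reads` is hypothetical.

```cpp
#include <cassert>

// Number of read operations needed to scan `records` records when each
// read brings in `records_per_read` records (rounded up).
long worst_case_reads(long records, long records_per_read) {
    assert(records >= 0 && records_per_read > 0);
    return (records + records_per_read - 1) / records_per_read;
}
```

Under these assumptions an unsuccessful search costs 4000 reads with one record per block and 500 reads with clusters of 8. A successful search examines about half the file on average, roughly 2000 versus 250 reads. Both grow linearly with the file size; clustering only divides the count by 8.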

Sequential searching

Usually NOT the best method
Sometimes it is best:
  » Searching for some ASCII pattern (grep)
  » Small files
  » Files rarely searched
  » Searching on secondary key, and a large percentage of records match (say 25%)

Unix Tools for sequential file processing

cat - display a file
wc - count lines, words, and characters
grep - find lines in file(s) which match a regular expression.

Direct Access

Move “directly” to record without scanning preceding data
Different languages/OS’s support different models:
  » Byte offset model
    – Programmer must specify offset to record, and record size to read.
    – Supports variable size records, skip sequential processing
  » Relative Record Number (RRN) model
    – File has a fixed record size (declared at creation time)
    – Records are specified by a record number
    – File modeled as a collection of components
    – Higher level of abstraction

Direct Access

Different language support
  » RRN support
    – PL/I
    – COBOL
    – Pascal (files are modeled as a collection of components (records))
    – FORTRAN
  » Byte offset
    – C

Choosing Record Sizes for Direct Access

Fixed Length Fields
  » Very easy to parse records - just read into record structure!
  » Each field must be maximum length needed!
    – Thus record must be as long as the sum of all the maximum field lengths

Field:  last  first  address  city  state  zip
Width:  10    10     15       15    2      9

“Yeakus    Bill      123 Pine       Utica          OH43050    ”

Choosing Record Sizes for Direct Access

Variable length fields
  » Each field can be any length
  » Since some can be long, others short, overall record size may be shorter.
  » This gives more flexibility to field lengths
  » Records must be parsed, space wasted for delimiter or length bytes.

Yeakus|Bill|123 Pine|Utica|OH|43050
Snivenloppinsky|Helmut|12232 Galmentary Avenue|Spotsdale|NY|11232

Header Records

The first record in a direct file may be used to store special information
  » Number of records used.
  » Location of first record in key order sequence.
  » Location of first empty record
  » File record structure (meta-data)
In languages with the RRN model, such as Pascal, the variant record facility must be used
In C, the header record can be of a different size from the rest of the file records.

Header Records

Consider “update.c” in the text.
Header record contains a 2 byte record count.
Header size is 32, record size is 64

static struct {
    short rec_count;
    char fill[30];
} head;
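The header above might be written and read back as in this sketch. It is not the code of “update.c”; the names `Header`, `write_header`, `read_header` and `data_offset` are hypothetical, and the count is stored in the machine's native byte order.

```cpp
#include <cassert>
#include <cstring>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// 32-byte header as on the slide: a 2-byte record count plus 30 filler bytes.
struct Header {
    short rec_count;
    char fill[30];
};
static_assert(sizeof(Header) == 32, "header layout assumed to be 32 bytes");

const long kRecordSize = 64;  // record size given on the slide

// Rewrite the header at the start of the file.
bool write_header(std::ostream& out, short rec_count) {
    Header h;
    std::memset(&h, 0, sizeof h);
    h.rec_count = rec_count;
    out.seekp(0);
    out.write(reinterpret_cast<const char*>(&h), sizeof h);
    return out.good();
}

// Read the header back, as must be done when the file is opened.
bool read_header(std::istream& in, Header& h) {
    in.seekg(0);
    in.read(reinterpret_cast<char*>(&h), sizeof h);
    return in.gcount() == static_cast<std::streamsize>(sizeof h);
}

// Offset of record `rrn`: the header comes first, then 64-byte records.
long data_offset(long rrn) {
    return static_cast<long>(sizeof(Header)) + rrn * kRecordSize;
}
```

Because the header is 32 bytes and each record is 64, record 0 starts at byte 32 and record 2 at byte 160.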

Header Records

Must be written when file created
Must be rewritten when file changed
Must be read when file is opened

File Access and Organization

File Organization
  » Variable Length Records
  » Fixed Length Records
  » Field Structures (size bytes, delimiters, fixed)
File Access
  » Sequential access
  » Direct access
  » Indexed access

File Access and Organization

Interaction between organization and access
  » Can the file be divided into fields?
  » Is there a higher level of organization to the file (meta-data)?
  » Do all records have to have the same number of fields, bytes?
  » How do we distinguish one record from the next?
  » How do we recognize if a fixed length record holds real data or not?

File Access and Organization

There is often a trade-off between space and time
  » Fixed length records - allow direct access, waste space
  » Variable length records - save space, require sequential search
We also must consider the typical use of the file - what are the desired access patterns
Selection of a particular organization has implications on the allowable types of access

Portability and Standardization

Differences among Languages
  » Fixed sized records versus byte addressable access
Differences among Machine Architectures
  » Byte order of binary data
  » May be high order or low order byte first

Byte order of binary data

High order first (Big Endian)
  » A long int: say 45 is stored in memory.
  » It is stored as: 00 00 00 2D
  » Suns, network protocols
Low order first (Little Endian)
  » A long int: say 45 is stored in memory.
  » It is stored as: 2D 00 00 00
  » PCs, VAXes

Byte order of binary data

If binary data is written to a file, it is written in the order stored in memory

If the data is later read by a system with a different ordering, the number will be incorrect!

For the sake of portability, files should be written in an agreed upon format (probably Big Endian)
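One way to follow this advice is to build the bytes explicitly with shifts instead of writing the integer's memory image. The sketch below assumes a 32-bit value; the names `put_u32_be` and `get_u32_be` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Append a 32-bit value to `out` high-order byte first (big endian),
// regardless of the byte order of the machine running the code.
void put_u32_be(std::string& out, std::uint32_t v) {
    out.push_back(static_cast<char>((v >> 24) & 0xFF));
    out.push_back(static_cast<char>((v >> 16) & 0xFF));
    out.push_back(static_cast<char>((v >> 8) & 0xFF));
    out.push_back(static_cast<char>(v & 0xFF));
}

// Decode 4 big-endian bytes starting at `pos`.
std::uint32_t get_u32_be(const std::string& in, std::size_t pos) {
    assert(pos + 4 <= in.size());
    std::uint32_t v = 0;
    for (std::size_t i = 0; i < 4; ++i)
        v = (v << 8) | static_cast<unsigned char>(in[pos + i]);
    return v;
}
```

With these helpers the value 45 is always stored as 00 00 00 2D, so a file written on a little-endian PC reads back correctly on a big-endian machine and vice versa.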

