
Chap2 FileOrg Indexes

Date post: 05-Apr-2018
Upload: rama-krishna

Transcript
  • 7/31/2019 Chap2 FileOrg Indexes

    1/42

    File Organization and Index Structures

    Instructor: Mr Mourad Benchikh

    Text Books: Elmasri & Navathe Chap. 5+6

    Ramakrishnan & Gehrke Chap. 7+8+9

    Oracle9i documentation

    First-Semester 1427-1428

    Databases are stored physically as files of records, typically on magnetic disks.

    This chapter deals with the organization of databases in storage and the techniques for accessing them efficiently using various algorithms, some of which require auxiliary data structures called indexes.

    The emphasis is on the search process; deletion, update, and insertion issues will not be covered.


    Storage Medium

    Primary Storage
    - Main memory and smaller but faster cache memories.
    - Fast access to data, but limited storage capacity.
    - Can be operated on directly by the CPU.

    Secondary Storage
    - Magnetic disks, optical disks, and tapes.
    - Larger capacity and lower cost.
    - Slower access to data.
    - Data cannot be processed directly by the CPU.

    Magnetic Disks
    - Secondary storage.
    - Transfer of data between main memory and disk takes place in units of disk blocks: blocks are the units of data transfer and data allocation.
    - For a read command, the block from disk is copied into the buffer.
    - For a write command, the contents of the buffer are copied into the disk block.


    Records

    Data is usually stored in the form of records.

    Each record consists of a collection of related data values or items.

    Records usually describe entities and their attributes.

    For example, an EMPLOYEE record represents an employee entity, and each field value in the record specifies some attribute of that employee, such as NAME, BIRTHDATE, SALARY.

    A collection of field names and their corresponding data types constitutes a record type or record format.

    C notation:

    struct employee {
        char name[30];
        char ssn[9];
        int salary;
        int jobCode;
        char department[20];
    };
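As a rough illustration of what "fixed-length record" means for the struct above, the following sketch packs the same five fields into a byte string of constant size using Python's standard `struct` module. The format string and helper names are assumptions made for this example; a real C compiler may also insert padding between fields, which the `=` prefix deliberately switches off here.

```python
import struct

# Hypothetical on-disk layout mirroring the C struct:
# name[30], ssn[9], salary (int), jobCode (int), department[20].
# "=" selects standard sizes with no padding, so the size is the plain sum.
EMPLOYEE_FORMAT = "=30s9sii20s"
RECORD_SIZE = struct.calcsize(EMPLOYEE_FORMAT)  # 30 + 9 + 4 + 4 + 20 bytes


def pack_employee(name, ssn, salary, job_code, department):
    """Encode one employee as a fixed-length byte string."""
    return struct.pack(
        EMPLOYEE_FORMAT,
        name.encode("ascii"),
        ssn.encode("ascii"),
        salary,
        job_code,
        department.encode("ascii"),
    )


def unpack_employee(raw):
    """Decode a record, stripping the NUL padding that struct.pack added."""
    name, ssn, salary, job_code, department = struct.unpack(EMPLOYEE_FORMAT, raw)
    return (
        name.rstrip(b"\0").decode("ascii"),
        ssn.rstrip(b"\0").decode("ascii"),
        salary,
        job_code,
        department.rstrip(b"\0").decode("ascii"),
    )
```

Because every record occupies exactly `RECORD_SIZE` bytes, the position of any record in a file can be computed by arithmetic alone.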


    File

    A file is a sequence of records.

    Usually all records in a file are of the same record type (fixed-length records).

    Variable-length records can arise in several ways:
    - The file records are of the same record type, but one or more of the fields are of varying size.
    - The file records are of the same record type, but one or more of the fields may have multiple values for individual records.
    - The file records are of the same record type, but one or more of the fields are optional.
    - The file includes records of different types; each record is preceded by a record type indicator. For example, if a relationship exists between EMPLOYEE and DEPARTMENT, their corresponding records can be stored physically contiguous (clustered) in order to minimize I/O operations.

    In general, a block contains one or more records belonging to one file only:
    - Spanned organization: records can cross block boundaries.
    - Unspanned organization: records cannot cross block boundaries.

    Blocking factor: Bfr = number of records per block.
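A small sketch of how the blocking factor is typically computed for fixed-length, unspanned records: Bfr is the block size divided by the record size, rounded down, and the number of blocks follows from the record count. The function names and the sample sizes are assumptions chosen for illustration.

```python
def blocking_factor(block_size, record_size):
    """Records per block for fixed-length, unspanned records: floor(B / R)."""
    if record_size > block_size:
        raise ValueError("an unspanned record must fit in one block")
    return block_size // record_size


def blocks_needed(num_records, bfr):
    """Number of blocks required to hold num_records, rounding up."""
    return -(-num_records // bfr)  # integer ceiling division
```

For instance, with 512-byte blocks and 67-byte records, each block holds 7 records and the remaining 43 bytes are unused.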


    Allocating File Blocks

    Contiguous allocation
    - The file blocks are allocated to consecutive disk blocks.
    - Reading the whole file is very fast (using double buffering).
    - Expanding the file is difficult.

    Linked allocation
    - Each file block contains a pointer to the next file block.
    - Easy to expand, but slow to read the whole file.

    Combination
    - Allocates clusters of consecutive disk blocks, and the clusters are linked.

    Indexed allocation
    - One or more index blocks contain pointers to the actual file blocks.


    Organization & Access Method

    File organization
    - The organization of the data of a file into records, blocks, and access structures.
    - The way records and blocks are placed on the storage medium and interlinked.
    - Example: sorted file.

    Access method
    - Provides a group of operations that can be applied to a file: Open, Find, Delete, Modify, Insert, Close, etc.
    - It is possible to apply several access methods to one file organization.
    - Some access methods can be applied only to files organized in certain ways: an indexed access method cannot be applied to a file without an index.

    Choose the file organization that efficiently implements the access methods needed by the application.


    Heap Files (Unordered Files)

    Heap File (Pile)
    - The simplest type of file organization.
    - Records are placed in the file in the order in which they are inserted.
    - New records are inserted at the end of the file (the address of the last block is kept in the file header).
    - Searching on any search condition involves a linear search, an expensive procedure.

    Relative File (Direct File)
    - Unordered fixed-length records using unspanned blocks and contiguous allocation.
    - Any record can then be accessed by its position in the file.
    - The i-th record is located in block ⌊i/Bfr⌋ (with records and blocks numbered from 0).
    - Helpful for locating a record by its position, but not for locating a record based on a search condition.
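The positional access offered by a relative file can be sketched in a few lines. The sketch assumes records and blocks are both numbered from 0; `locate_record` and `byte_offset` are hypothetical helper names, and the block and record sizes in the usage note are arbitrary.

```python
def locate_record(i, bfr):
    """Return (block number, slot within block) for record i, both 0-based."""
    return divmod(i, bfr)


def byte_offset(i, bfr, block_size, record_size):
    """Byte position of record i in a contiguously allocated, unspanned file."""
    block, slot = locate_record(i, bfr)
    return block * block_size + slot * record_size
```

With Bfr = 7, record 20 sits in block 2 at slot 6, so a single block read retrieves it; no such shortcut exists when the record is identified by a field value instead of a position.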


    Sorted Files

    An organization that physically orders the records of a file on disk based on the values of one of their fields, called the ordering field.

    If the ordering field is also a key field of the file, then the field is called the ordering key for the file. Figure 5.9 shows an ordered file with NAME as the ordering key field (assuming that employees have distinct names).

    Reading the records in order of the ordering key values becomes extremely efficient, because no sorting is required.

    A search condition based on the value of an ordering key field results in faster access when the binary search technique is used.

    Ordering does not provide any advantage for random or ordered access to the records based on the values of the other, non-ordering fields of the file. In that case, random access requires a linear search.


    Binary Search

    Algorithm 5.1: Binary search on an ordering key of a disk file

    L = 1; U = b; /* b is the number of file blocks */
    while (U >= L) do
    begin
        I = (L + U) div 2;
        read block I of the file into the buffer;
        if K < (ordering key field value of the first record in block I)
            then U = I - 1
        else if K > (ordering key field value of the last record in block I)
            then L = I + 1
        else if the record with ordering key field value = K is in the buffer
            then goto found
        else goto notFound;
    end;
    goto notFound;

    If b is the number of blocks in a sorted file, a binary search accesses about log2(b) blocks on average.
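Algorithm 5.1 can be sketched as runnable code by modelling the sorted file as a list of blocks, each block being a non-empty sorted list of key values. This in-memory model, the 0-based block numbers, and the returned read counter are assumptions of the sketch; the counter stands in for the number of block accesses.

```python
def binary_search_blocks(blocks, key):
    """Binary search over the blocks of a sorted file.

    Returns (block index, blocks read); the block index is None when
    no record with the given key exists.
    """
    low, high = 0, len(blocks) - 1
    reads = 0
    while low <= high:
        mid = (low + high) // 2
        block = blocks[mid]  # stands in for reading block `mid` into the buffer
        reads += 1
        if key < block[0]:
            high = mid - 1
        elif key > block[-1]:
            low = mid + 1
        else:
            # The key falls in this block's range: search the buffer itself.
            return (mid if key in block else None), reads
    return None, reads
```

Only the first and last key of each block steer the search; the scan inside the chosen block happens in main memory and costs no further disk access.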


    Hashing Organization

    Provides very fast access to records on certain search conditions.

    The search condition must be an equality condition on a hash field of the file.

    In most cases, the hash field is also a key field of the file (hash key).

    Hashing

    The idea is to provide a function h, called a hash function, that is applied to the hash field value of a record and yields the address of the disk block in which the record is stored.

    A search for the record within the block can be carried out in a main memory buffer.


    Internal Hashing

    Internal files

    Hashing is also used as an internal search structure within a program whenever a group of records is accessed exclusively by using the value of one field.

    Hashing is implemented as a hash table through the use of an array of records.

    Suppose that the array index range is from 0 to N-1; then we have N slots whose addresses correspond to the array indexes.

    We choose a hash function that transforms the hash field value into an integer between 0 and N-1.

    One common hash function is h(K) = K mod N; the resulting value is used as the record address.


    Internal Hashing

    (Figure: r records are mapped by the hash function H(K) = K mod N into N record slots numbered 0, 1, ..., N-1.)

    In general, r ≤ N.


    Hashing Function

    The key is the student id (six digits). Assume we have N = 100,000 record slots numbered 00000 to 99999.

    H(K) = student_id mod 100000

    085768 mod 100000 = 85768
    134281 mod 100000 = 34281
    101004 mod 100000 = 1004
    100000 mod 100000 = 0
    601004 mod 100000 = 1004 (collision)

    Collision

    A collision occurs when the hash field value of a record that is being inserted hashes to an address that already contains a different record.

    The process of finding another position (after a collision) is called collision resolution.

    Methods for collision resolution:
    - Open addressing
    - Chaining
    - Multiple hashing
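The student-id example and one of the listed resolution methods, chaining, can be tried out with the sketch below. The table is modelled as a Python dict from slot number to a list of keys, which is an assumption of this example and not the array-of-records layout described on the slides.

```python
N = 100_000  # number of record slots, as in the example above


def h(student_id):
    """Hash function from the example: student_id mod N."""
    return student_id % N


def insert_with_chaining(table, student_id):
    """Append the key to the chain of its slot; return True on a collision."""
    chain = table.setdefault(h(student_id), [])
    collided = len(chain) > 0
    chain.append(student_id)
    return collided
```

Inserting 85768, 134281, 101004, 100000 and 601004 in that order reports a collision only for the last key, which lands in slot 1004 behind 101004.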


    External Hashing

    Hashing for disk files is called external hashing.

    The target address space is made of buckets, each of which holds multiple records. A bucket is either one disk block or a cluster of contiguous blocks.

    The hashing function maps the indexing field's value into a relative bucket number.

    A table maintained in the file header converts the bucket number into the corresponding disk block address.
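The two-step mapping described above (key to relative bucket number, bucket number to block address) might look like the following sketch. The class name, the simulated disk, and the example block addresses are all hypothetical, and bucket overflow is not handled.

```python
class HashFile:
    """Toy external hash file: a header table plus a simulated disk."""

    def __init__(self, bucket_addresses):
        # Header table: relative bucket number -> disk block address.
        self.bucket_addresses = list(bucket_addresses)
        # Simulated disk: block address -> records stored in that bucket.
        self.disk = {addr: [] for addr in self.bucket_addresses}

    def bucket_number(self, key):
        """Hash the key to a relative bucket number."""
        return key % len(self.bucket_addresses)

    def insert(self, key, record):
        addr = self.bucket_addresses[self.bucket_number(key)]
        self.disk[addr].append((key, record))

    def find(self, key):
        addr = self.bucket_addresses[self.bucket_number(key)]
        for k, record in self.disk[addr]:  # scan of the bucket in the buffer
            if k == key:
                return record
        return None
```

Keeping the address table in the header means buckets can be relocated on disk without changing the hash function.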


    Dynamic Files & Hashing

    One problem with hashing so far is that the address space N is fixed.

    If the number of records grows beyond the original size, the file must be reorganized.

    How to handle dynamic files better?
    - Extendible hashing
    - Dynamic hashing
    - Linear hashing


    Indexing

    Index file (same idea as a textbook index): an auxiliary structure designed to speed up access to desired data.

    Indexing field: the field on which the index file is defined.

    The index file stores each value of the indexing field along with a pointer: either pointer(s) to the block(s) that contain record(s) with that field value, or a pointer to the record with that field value.

    In Oracle, the pointer is called a RowID; it tells the DBMS where the row (record) is located (by file, block within that file, and row within the block).

    To find a record in the data file based on a selection criterion on an indexing field, we first access the index file, which then gives access to the record in the data file.

    The index file is much smaller than the data file, so searching it is fast.

    Indexing is important for file systems and DBMSs:
    - Databases eventually map data to file structures on disk.
    - Records of each relation may be stored in a separate file.
    - Records of several different relations can be stored in the same file (i.e. a physically clustered file organization, to minimize I/O).
    - In DBMSs, the query processor accesses the index structures when processing a query (e.g., an indexed join, also called a single-loop join).


    Types of Indexes

    Indexes on ordered vs. unordered files

    Dense vs. non-dense (i.e. sparse) indexes
    - Dense: an entry in the index file for each record of the data file.
    - Sparse: only some of the data records are represented in the index, often one index entry per block of the data file.

    Primary indexes vs. secondary indexes

    Ordered indexes vs. hash indexes
    - Ordered indexes: indexing field values are stored in sorted order.
    - Hash indexes: indexing field values are stored using a hash function.

    Single-level vs. multi-level
    - A single-level index is an ordered file and is searched using binary search.
    - Multi-level indexes are tree-structured; they improve the search but require a more elaborate search algorithm.

    Index on a single indexing field vs. index on multiple indexing fields (i.e. composite index)
    - If a certain combination of fields is used frequently, set an index on multiple fields.


    Single-Level Ordered Index: Primary Index

    Physical records may be kept ordered on the primary key.

    The index is ordered, but has only one entry for each block (non-dense).

    Each index entry has the value of the primary key field for the first record (or the last record) in a block and a pointer to that block.

    This reduces the index requirements:
    - There are fewer index entries than records in the file.
    - Binary search over the index can be faster, because there are fewer index blocks to read than data blocks in the ordered-file approach.


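    Single-Level Ordered Index: Primary Index

    Data file (ordered on the primary key):

    Block 1:  10567  J. Doe     CS  3
              11589  T. Allen   BA  2
              15973  M. Smith   CS  3

    Block 2:  29579  B. Zimmer  BS  1
              34596  T. Atkins  ME  4
              75623  J. Wong    BA  3

    Block 3:  84920  S. Allen   CS  4
              96256  P. Wright  ME  2

    Index (key of the last record in each block, with a pointer to that block):

    15973 -> Block 1
    75623 -> Block 2
    96256 -> Block 3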
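A lookup through a non-dense primary index like the one in this example can be sketched as follows. The sketch assumes each index entry holds the key of the last record in its block, which matches the three index values shown; the in-memory lists and the `lookup` helper are illustrative only.

```python
from bisect import bisect_left

# Data file: three blocks, ordered on the primary key.
blocks = [
    [(10567, "J. Doe"), (11589, "T. Allen"), (15973, "M. Smith")],
    [(29579, "B. Zimmer"), (34596, "T. Atkins"), (75623, "J. Wong")],
    [(84920, "S. Allen"), (96256, "P. Wright")],
]

# Sparse index: one entry per block, anchored on the block's last key.
index_keys = [15973, 75623, 96256]


def lookup(key):
    """Binary-search the index, then scan the single block it points to."""
    pos = bisect_left(index_keys, key)  # first anchor >= key
    if pos == len(index_keys):
        return None  # key is larger than every key in the file
    for k, name in blocks[pos]:
        if k == key:
            return name
    return None
```

The index has three entries for eight records, so the binary search runs over far fewer entries than a search of the data file itself.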


    Single-Level Ordered Index: Clustering Index

    Records are physically ordered by a non-key field (the clustering field).

    Same general structure as an ordered file index.

    There is one entry in the index for each distinct value of the clustering field, with a pointer to the first block in the data file that has a record with that value for its clustering field.

    There may be many records for one index entry (non-dense).

    Sometimes entire blocks are reserved for each distinct clustering field value.


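    Single-Level Ordered Index: Clustering Index

    Data file (ordered on the non-key department field):

    11589  T. Allen   BA  2
    75623  J. Wong    BA  3
    29579  B. Zimmer  BS  1
    10567  J. Doe     CS  3
    15973  M. Smith   CS  3
    84920  S. Allen   CS  4
    34596  T. Atkins  ME  4
    96256  P. Wright  ME  2

    Index (one entry per distinct clustering field value):

    BA
    BS
    CS
    ME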


    Single-Level Ordered Index: Secondary Indexes

    A secondary index is an ordered file with two fields:
    - A non-ordering field of the data file (the indexing field).
    - A block pointer or a record pointer.

    There can be several secondary indexes for the same file, but only one primary index.

    A secondary index on a non-ordering key field is dense. See Figure 6.4.

    Several options exist for a secondary index on a non-key field:
    - Option 1: include several index entries with the same value of the indexing field, one for each record (dense index).
    - Option 2: more commonly used; keep a single entry for each index value, and create an extra level of indirection to handle the multiple pointers. See Figure 6.5.
    - Etc.
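Option 2 can be illustrated with a short sketch: the index keeps one sorted entry per distinct value, and each entry leads to a separate list of record pointers (the extra level of indirection). The sample rows reuse the records from the earlier index examples; treating list positions as record pointers, and the two helper names, are assumptions of this sketch.

```python
from bisect import bisect_left

# Data file ordered on the id, not on the department (third field).
data_file = [
    (10567, "J. Doe", "CS"),
    (11589, "T. Allen", "BA"),
    (15973, "M. Smith", "CS"),
    (29579, "B. Zimmer", "BS"),
    (34596, "T. Atkins", "ME"),
    (75623, "J. Wong", "BA"),
    (84920, "S. Allen", "CS"),
    (96256, "P. Wright", "ME"),
]


def build_secondary_index(rows, field):
    """Return (sorted distinct values, pointer lists aligned with them)."""
    pointer_lists = {}
    for position, row in enumerate(rows):
        pointer_lists.setdefault(row[field], []).append(position)
    values = sorted(pointer_lists)
    return values, [pointer_lists[v] for v in values]


def find_all(values, pointers, rows, value):
    """Binary-search the index, then follow every pointer for that value."""
    pos = bisect_left(values, value)
    if pos == len(values) or values[pos] != value:
        return []
    return [rows[p] for p in pointers[pos]]
```

The index itself stays small (four entries here) even though the matching records are scattered across the data file.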


    Types of Single-Level Ordered Indexes

                      Ordering Field      Non-ordering Field
    Key Field         Primary Index       Secondary Index (key)
    Non-key Field     Clustering Index    Secondary Index (non-key)

    Index type            Number of first-level index entries                 Dense or non-dense
    Primary               Number of blocks in data file                       Non-dense
    Clustering            Number of distinct index field values               Non-dense
    Secondary (key)       Number of records in data file                      Dense
    Secondary (non-key)   Number of distinct index field values (Option 2)    Non-dense


    Static Multilevel Indexes

    A multilevel index treats the index file (the first level) as an ordered file with a distinct value for each indexing field entry. A primary index built on the first level is called the second level of the multilevel index.

    Hence a multilevel index with r1 first-level entries will have approximately t levels, where $t = \log_{fo}(r_1)$.

    Fan-out: fo = number of index entries per first-level block.

    Indexed sequential file: a commonly used file organization. The data file is an ordered file with a multilevel primary index on its ordering key field. See Figure 6.6.

    A multilevel index speeds up record search.

    Index insertion and deletion are a problem, because they may require reorganization of the index: when the data file is modified, the index must be updated.
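The level count can be checked numerically by repeatedly dividing by the fan-out until a single block remains. This is a sketch under the assumption that every index block is completely full; `index_levels` is a hypothetical helper name.

```python
def index_levels(first_level_entries, fan_out):
    """Number of index levels needed until the top level fits in one block."""
    if first_level_entries < 1 or fan_out < 2:
        raise ValueError("need at least one entry and a fan-out of 2 or more")
    levels = 1
    blocks = -(-first_level_entries // fan_out)  # blocks at the first level
    while blocks > 1:
        # Each block of this level needs one entry at the next level up.
        levels += 1
        blocks = -(-blocks // fan_out)
    return levels
```

For 1,000,000 first-level entries and a fan-out of 100, the levels hold 10,000 blocks, 100 blocks and 1 block, giving three levels, in line with log base 100 of 1,000,000.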


    Dynamic Multilevel Indexes

    Goal: retain the benefits of multilevel indexing while reducing the index insertion and deletion problems. The index reorganizes itself automatically with small, local changes in the face of insertions and deletions.

    Some space is left in each index block for inserting new entries.

    Dynamic multilevel indexes are implemented as B-trees and often as B+-trees.

    B-tree:
    - An indexing field value appears only once, at some level in the tree.
    - Every node holds pointers to data.

    B+-tree:
    - Pointers to data are stored only at the leaf nodes of the tree.
    - Leaf nodes have an entry for every indexing field value.
    - The leaf nodes are usually linked together to provide ordered access to the records on the indexing field.
    - All the leaf nodes of the tree are at the same depth, so retrieval of any record takes the same time.
    - Oracle's documentation calls its index structure a B*-tree, which appears to correspond to the B+-tree (see next figure).

    Other types of indexes
    - Besides tree-based techniques, there are hash-based indexing techniques.
    - Hashing can be used not only for file organization, but also for index-structure creation: a hash index organizes the indexing fields, with their associated pointers, into a hash file structure.
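The B+-tree properties listed above (data pointers only in the leaves, linked leaves, uniform depth) can be seen in a search-only sketch. Insertion and deletion, which are what make the structure dynamic, are left out; the node classes, the convention that a separator key equals the smallest key of its right subtree, and the hand-built tree in the usage note are assumptions of this example.

```python
from bisect import bisect_right


class Leaf:
    def __init__(self, keys, pointers):
        self.keys = keys          # sorted indexing field values
        self.pointers = pointers  # one data pointer per key
        self.next = None          # link to the next leaf, for ordered access


class Internal:
    def __init__(self, keys, children):
        self.keys = keys          # separator keys
        self.children = children  # len(children) == len(keys) + 1


def find_leaf(node, key):
    """Descend from the root to the leaf whose key range covers `key`."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    return node


def search(root, key):
    """Return the data pointer stored for `key`, or None."""
    leaf = find_leaf(root, key)
    for k, pointer in zip(leaf.keys, leaf.pointers):
        if k == key:
            return pointer
    return None


def range_scan(root, low, high):
    """Collect the pointers for low <= key <= high by walking the leaf chain."""
    leaf = find_leaf(root, low)
    result = []
    while leaf is not None:
        for k, pointer in zip(leaf.keys, leaf.pointers):
            if k > high:
                return result
            if k >= low:
                result.append(pointer)
        leaf = leaf.next
    return result
```

A two-level tree can be built by hand from three leaves holding keys (5, 10), (20, 30) and (40, 50) under a root with separators 20 and 40; every search then visits exactly one internal node and one leaf, and a range scan needs only one descent followed by leaf-to-leaf hops.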


    3-levels B+-index


    Files of Mixed Records: Clusters in Oracle 9i

    A cluster is made up of a group of tables that share the same data blocks. These tables are grouped together because they share common columns and are often used together.

    For example, the EMP and DEPT tables share the DEPTNO column (the cluster key). When you cluster the EMP and DEPT tables (the clustered tables), Oracle physically stores all rows for each department from both the EMP and DEPT tables in the same data blocks.

    The cluster key is the column, or group of columns, that the clustered tables have in common.

    Advantages:
    - Access time improves for joins of clustered tables.
    - Each cluster key value is stored only once in the cluster and in the cluster index, no matter how many rows of different tables contain the value. Therefore, less storage might be required to store related table and index data in a cluster than is necessary in non-clustered table format. For example, each cluster key (each DEPTNO) is stored just once for the many rows that contain the same value in both the EMP and DEPT tables (see next figure).

    Hash cluster: Oracle physically stores the rows of a table in a hash cluster and retrieves them according to the results of a hash function, as a way to improve the performance of data retrieval.


    Clusters in Oracle 9i (contd)

    Steps

    Create the cluster:

    CREATE CLUSTER emp_dept (deptno NUMBER(3))
        PCTUSED 80 PCTFREE 5 SIZE 600
        TABLESPACE users
        STORAGE (INITIAL 200K NEXT 300K MINEXTENTS 2 MAXEXTENTS 20 PCTINCREASE 33);

    Create the clustered tables:

    CREATE TABLE dept (
        deptno NUMBER(3) PRIMARY KEY, . . . )
        CLUSTER emp_dept (deptno);

    CREATE TABLE emp (
        empno NUMBER(5) PRIMARY KEY,
        ename VARCHAR2(15) NOT NULL,
        . . .
        deptno NUMBER(3) REFERENCES dept)
        CLUSTER emp_dept (deptno);

    Create the cluster index. A cluster index must be created before any rows can be inserted into any clustered table:

    CREATE INDEX emp_dept_index ON CLUSTER emp_dept
        INITRANS 2 MAXTRANS 5
        TABLESPACE users
        STORAGE (INITIAL 50K NEXT 50K MINEXTENTS 2 MAXEXTENTS 10 PCTINCREASE 33)
        PCTFREE 5;


    SQL, Oracle9i and Indexes

    SQL-92 does not include statements for index structures, so there is some variation in index-related commands across different DBMSs.

    When a table is created, it is desirable to add indexes on certain attributes, especially the primary key.

    The existence of indexes can greatly speed up query processing. Consider selecting a subset of tuples from a relation based on the value of the key field, or a join like:

    $R \bowtie_{R.ATTR1 > S.ATTR2} S$

    Indexes can be created implicitly by the DBMS at table creation time, e.g. on any attribute designated as a primary key.

    Oracle automatically creates an index when a UNIQUE or PRIMARY KEY constraint clause is specified in a CREATE TABLE statement.


    SQL, Oracle9i and Indexes

    Indexes may also be created explicitly with SQL DDL commands.

    Consider the following Oracle statement:

    CREATE INDEX emp_ename ON emp(ename);

    When you create an index, Oracle fetches and sorts the columns to be indexed, and stores the RowId along with the index value for each row. Then Oracle loads the index from the bottom up.

    For this statement, Oracle sorts the EMP table on the ENAME column. It then loads the index with the ENAME and corresponding RowId values in this sorted order. When it uses the index, Oracle does a quick search through the sorted ENAME values and then uses the associated RowId values to locate the rows having the sought ENAME value.

    In Oracle you can create more than one index using the same columns, provided that you specify distinctly different combinations of the columns.

    In Oracle you cannot create an index that references only one column in a table if another such index already exists.


    SQL, Oracle9i and Indexes (contd)

    Consider the following Oracle statements (contd):

    CREATE UNIQUE INDEX pkIdx ON Staff(SIN);
    - Creates an index on the field SIN in the table Staff.
    - The UNIQUE keyword ensures the uniqueness of SIN values in the table (and index). This uniqueness is enforced even when adding an index to a table with existing data. If the SIN field is non-unique, then the index creation fails.
    - If the UNIQUE keyword is not used, then two rows of the table can have the same value.
    - Non-unique indexes are sorted by the index key and RowId.

    CREATE INDEX CInd ON Student(Fname, Lname);
    - A composite index is an index that you create on multiple columns in a table.
    - Composite indexes can speed up retrieval of data for SELECT statements in which the WHERE clause references all or the leading portion of the columns in the composite index.

    DROP INDEX clIdx;
    - Drops the index clIdx.


    SQL, Oracle9i and Indexes (contd)

    Oracle and indexes

    Table indexes:
    - Store each field value repeatedly with each stored RowId.
    - Oracle uses a B*-tree (apparently a B+-tree) as the internal structure of a table index.

    Bitmap indexes:
    - Rather than a B*-tree, bitmap indexes store the RowIds associated with a field value as a bitmap. Each bit in the bitmap corresponds to a possible RowId, and if the bit is set, it means that the row with the corresponding RowId contains the field value.
    - A mapping function converts the bit position to an actual RowId, so the bitmap index provides the same functionality as a regular index even though it uses a different representation internally.
    - Among the advantages of bitmap indexes: they speed up searches on low-cardinality columns, i.e. columns in which the number of distinct values is small compared to the number of rows in the table.

    Cluster indexes:
    - A cluster index is an index defined specifically for a cluster.
    - A cluster index contains an entry for each cluster key value.
    - To locate a row in a cluster, the cluster index is used to find the cluster key value, which points to the data block associated with that cluster key value.


    SQL, Oracle9i and Indexes (contd)

    - CREATE BITMAP INDEX Emp_M_S ON Employee(Marital_Status);

    - CREATE BITMAP INDEX Emp_R ON Employee(Region);
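To give a feel for why bitmap indexes suit low-cardinality columns such as Marital_Status and Region, the sketch below builds one bitmap per distinct value and answers a combined condition with a single bitwise AND. It is a conceptual model only, not Oracle's internal format: row positions stand in for RowIds, Python integers stand in for the bitmaps, and the sample column values are made up.

```python
def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when row i holds it."""
    bitmaps = {}
    for row, value in enumerate(values):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row)
    return bitmaps


def rows_of(bitmap):
    """Convert a bitmap back into the list of row positions it marks."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]
```

With one index per column, the rows that are both single and in the east region are obtained by AND-ing the two bitmaps and decoding the result, without touching the table until the final row fetch.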


    SQL, Oracle9i and Indexes (contd)

    Oracle and indexes (contd)

    Function-based indexes
    - You can create indexes based on Oracle functions, for example:
      CREATE INDEX name_emp ON emp(UPPER(ename));
    - Such an index can facilitate processing the query:
      SELECT * FROM emp WHERE UPPER(ename) = 'ALI';

    Index-organized table
    - The entire table is stored within an index structure:
      CREATE TABLE employee (ID CHAR(9) PRIMARY KEY, name VARCHAR2(20)) ORGANIZATION INDEX;
    - Instead of maintaining two separate storage structures for the table and the B*-tree index, the database system maintains only a single B*-tree index.
    - The table's data is sorted by the table's primary key (a primary key is mandatory).

    Each B*-tree index leaf entry contains instead of


    Index-Organized Table


    Overview of Oracle9i DB Structure and Space Management

    An Oracle DB has logical and physical structures. This separation allows logical structures to be defined identically across different hardware and operating system platforms.

    Logical DB structures represent the components that can be seen in an Oracle DB. They consist of:

    - Tablespaces: the DB is divided logically into units called tablespaces, which group together related logical structures, such as all of an application's objects. The SYSTEM tablespace is the minimum tablespace requirement at DB creation. It always contains the Data Dictionary.

    - Blocks: a block is the smallest unit of storage in Oracle.

    - Extents: an extent is a grouping of contiguous blocks.

    - Segments: a segment is a set of extents allocated for a logical structure. There are four segment types: data segments (store table or cluster data), index segments (store index data), temporary segments (for temporary work such as sorting), and undo segments (store undo information).

    - Schema objects: the logical structures that refer to the DB's data: tables, views, indexes, clusters, etc.


    Overview of Oracle9i DB Structure and Space Management (contd)

    Physical DB structures represent the method of internal storage. They consist of:

    - Datafiles: contain all the DB data. An Oracle DB has one or more data files. Each data file is associated with only one tablespace. A tablespace can consist of more than one data file.

      When a user wants to read data in a table and the requested information is not in the memory cache of the DB, it is read from the appropriate datafiles and stored in memory.

      Modified or new data is not necessarily written to a datafile immediately. It is pooled in memory and written to the appropriate datafiles all at once, as determined by the database writer process (DBWn).

    - Redo log files: record all changes made to data. These files are critical for DB operation and recovery from failure. Two or more redo log files are necessary. A redo log is made of redo entries (i.e. redo records).

    - Control files: maintain information about the physical structure of the DB (e.g. the name and location of every data file and redo log file). Every Oracle DB has at least one control file.

