Indexing

Date post: 11-May-2015
Page 1: Indexing

INDEXING

Davood Pour Yousefian Barfeh

Page 2: Indexing

INDEXING

What is an index?

Why is it needed?

When should it be used?

Types of indexes

Page 3: Indexing

What is an index?

a data structure

a way of sorting a number of records

holds the field value and a pointer to the record it relates to
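
The three points above can be sketched in a few lines of Python. This is only an illustration, not how any particular database stores its indexes: the `records` list, the `build_index` and `lookup` helpers, and the use of a list position as the "pointer" are all assumptions made for the example.

```python
from bisect import bisect_left

# Hypothetical table: a record's position in the list stands in for its pointer.
records = [
    {"id": 1, "firstName": "Maria"},
    {"id": 2, "firstName": "Ali"},
    {"id": 3, "firstName": "Chen"},
    {"id": 4, "firstName": "Ali"},
]

def build_index(rows, field):
    """Return (field value, record pointer) pairs sorted by field value."""
    return sorted((row[field], pointer) for pointer, row in enumerate(rows))

def lookup(index, value):
    """Binary-search the index and return the pointers of all matching records."""
    pos = bisect_left(index, (value, -1))  # -1 sorts before every real pointer
    pointers = []
    while pos < len(index) and index[pos][0] == value:
        pointers.append(index[pos][1])
        pos += 1
    return pointers

first_name_index = build_index(records, "firstName")
ali_pointers = lookup(first_name_index, "Ali")
ali_ids = [records[p]["id"] for p in ali_pointers]
```

The table itself stays in insertion order; only the small index is kept sorted, which is what makes the binary search possible.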

Page 4: Indexing

Why is an index needed? (Advantage)

speeds up retrieval of data

without index: Linear Search, where N = number of records

- key (unique value): N/2 on average

- non-key: N

using index: Binary Search

log2 N
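
As a rough check of these costs, the sketch below counts how many values each search examines. It assumes an in-memory list of unique, sorted integer keys; the function names are made up for this example, and a real database counts disk block accesses rather than comparisons.

```python
import math

def linear_search_steps(values, target):
    """Scan from the start; return how many values were examined."""
    steps = 0
    for value in values:
        steps += 1
        if value == target:
            break
    return steps

def binary_search_steps(sorted_values, target):
    """Halve the search range each step; return how many values were examined."""
    low, high, steps = 0, len(sorted_values) - 1, 0
    while low <= high:
        mid = (low + high) // 2
        steps += 1
        if sorted_values[mid] == target:
            break
        if sorted_values[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return steps

N = 1_000_000
keys = list(range(N))  # unique, sorted key values (an assumption for the demo)

linear_middle = linear_search_steps(keys, N // 2 - 1)  # a key in the middle: N/2
linear_missing = linear_search_steps(keys, -1)         # no early exit: N
binary_worst = max(binary_search_steps(keys, t) for t in (0, N // 2 - 1, N - 1))
log2_bound = math.ceil(math.log2(N))
```

The non-key case behaves like the missing-value case here: because duplicates are possible, the scan cannot stop at the first match and has to look at all N values.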

Page 5: Indexing

Indexing (Disadvantage)

Additional space on the disk

Slows down inserting, deleting and updating data

Page 6: Indexing

Ex. Without Indexing

Field name          Data type       Size on disk
id (Primary key)    Unsigned INT    4 bytes
firstName           Char(50)        50 bytes
lastName            Char(50)        50 bytes
emailAddress        Char(100)       100 bytes

* char was used in place of varchar to allow for an accurate size-on-disk value
* the database contains five million rows and is unindexed

r = 5,000,000 records, record length R = 204 bytes, block size B = 1,024 bytes

bfr = floor(B/R) = floor(1024/204) = 5 records per disk block

total number of blocks required: N = r/bfr = 5,000,000 / 5 = 1,000,000 blocks

Linear search for a key field: N / 2 = 500,000 blocks on average. Because the id field is also sorted, a binary search can be used instead: log2 N = log2 1,000,000 = 19.93 → 20 blocks

Linear search for a non-key field: N = 1,000,000 blocks
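
The arithmetic of this example can be reproduced with a short Python sketch. The helper name `table_blocks` is an assumption for illustration; the figures are the ones given on the slide.

```python
import math

def table_blocks(record_count, record_length, block_size):
    """Return (blocking factor, number of blocks) for fixed-length records."""
    bfr = block_size // record_length        # whole records that fit in one block
    blocks = math.ceil(record_count / bfr)   # blocks needed to hold every record
    return bfr, blocks

record_length = 4 + 50 + 50 + 100            # id + firstName + lastName + emailAddress
bfr, n_blocks = table_blocks(5_000_000, record_length, 1024)

key_linear = n_blocks // 2                   # average for a unique key field
non_key_linear = n_blocks                    # every block has to be read
sorted_key_binary = math.ceil(math.log2(n_blocks))  # only if the field is sorted
```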

Page 7: Indexing

Ex. Using Indexing

Field name          Data type       Size on disk
firstName           Char(50)        50 bytes
(record pointer)    Special         4 bytes

* Pointers in MySQL are 2, 3, 4 or 5 bytes in length depending on the size of the table

r = 5,000,000 records, index record length R = 54 bytes, block size B = 1,024 bytes

bfr = floor(B/R) = floor(1024/54) = 18 records per disk block

The total number of blocks required to hold the index is: N = r/bfr = 5,000,000 / 18 → 277,778 blocks

Binary Search:

log2 N = log2 277,778 = 18.08 → 19 blocks
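
A small Python sketch of the same calculation follows. The variable names are assumptions; the one extra block access for fetching the record itself is not on the slide and is added here only to make the comparison with an unindexed scan of 1,000,000 blocks complete.

```python
import math

BLOCK_SIZE = 1024
RECORDS = 5_000_000

index_record_length = 50 + 4                         # firstName + record pointer
index_bfr = BLOCK_SIZE // index_record_length        # index records per block
index_blocks = math.ceil(RECORDS / index_bfr)        # blocks holding the index
index_search = math.ceil(math.log2(index_blocks))    # binary search on the index

# Assumption: one more block access to follow the pointer to the actual record.
total_accesses = index_search + 1
unindexed_scan = 1_000_000                           # non-key linear search
speedup = unindexed_scan / total_accesses
```

Under these assumptions the indexed lookup touches about 20 blocks instead of a million, at the price of storing 277,778 extra blocks for the index.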

Page 8: Indexing

When should (can) indexing be used?

General Rule: Anything that limits the number of results you are trying to find.

speed up finding data

cardinality

table that references other table

Page 9: Indexing

When should indexing be used?

speed up finding data, but slow down inserting, deleting or updating data

- not only the table must be updated but the index as well

an index on bank account number is better than one on balance
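
One plausible reading of the bank example is that an account number never changes once assigned, while a balance changes with every transaction, so an index on balance would need constant maintenance. The toy Python class below sketches that idea; `AccountTable` and its fields are hypothetical and exist only for this illustration.

```python
from bisect import insort

class AccountTable:
    """Toy table with one index on account_number (all names are hypothetical)."""

    def __init__(self):
        self.rows = []            # the table itself
        self.account_index = []   # sorted (account_number, pointer) pairs
        self.index_updates = 0    # how often the index had to change

    def insert(self, account_number, balance):
        self.rows.append({"account_number": account_number, "balance": balance})
        # The extra work an index adds to every insert:
        insort(self.account_index, (account_number, len(self.rows) - 1))
        self.index_updates += 1

    def set_balance(self, pointer, balance):
        # balance is not indexed, so only the table row changes
        self.rows[pointer]["balance"] = balance

table = AccountTable()
for number, balance in [(3003, 50.0), (1001, 20.0), (2002, 75.0)]:
    table.insert(number, balance)
for _ in range(100):              # 100 deposits on the first account
    table.set_balance(0, table.rows[0]["balance"] + 1.0)
```

After three inserts and a hundred balance updates, the index on account number has been touched only three times; an index on balance would have been touched on every update as well.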

Page 10: Indexing

When should indexing be used?

Cardinality: the number of distinct values for a column

Binary Search

Linear Search

Page 11: Indexing

When should indexing be used?

Cardinality

Index Selectivity = Number of distinct values / Number of records

Ex. good selectivity: a table has 100,000 records and one of its indexed columns has 88,000 distinct values, so the selectivity of this index is 88,000 / 100,000 = 88%

Ex. bad selectivity: an indexed column in a table of 100,000 records has only 200 distinct values, so the index's selectivity is 200 / 100,000 = 0.2%

Number of records in each group = 100,000 / 200 = 500. A full table scan is more efficient than using such an index, because much more I/O is needed to repeatedly scan the index and the table.
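
The selectivity formula is easy to compute directly. In the Python sketch below, the function name and the synthetic column data are assumptions chosen to reproduce the two ratios from the examples.

```python
def index_selectivity(column_values):
    """Number of distinct values divided by number of records."""
    return len(set(column_values)) / len(column_values)

# Synthetic columns (assumed data) with 100,000 records each.
good_column = [i % 88_000 for i in range(100_000)]   # 88,000 distinct values
bad_column = [i % 200 for i in range(100_000)]       # 200 distinct values

good = index_selectivity(good_column)
bad = index_selectivity(bad_column)
records_per_value = len(bad_column) // len(set(bad_column))
```

With the low-selectivity column, every index lookup still leads to 500 records on average, which is why the index does little to narrow the search.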

Page 12: Indexing

When should indexing be used?

table that references other table - join

Ex.

CREATE TABLE newsitem (
  newsid INT PRIMARY KEY,
  newstitle VARCHAR(255),
  newscontent TEXT,
  authorid INT,
  newsdate TIMESTAMP
);

CREATE TABLE authors (
  authorid INT PRIMARY KEY,
  username VARCHAR(255),
  firstname VARCHAR(255),
  lastname VARCHAR(255)
);

SELECT newstitle, firstname, lastname
FROM newsitem n, authors a
WHERE n.authorid = a.authorid;

CREATE INDEX newsitem_authorid ON newsitem(authorid);

General Rule: any fields involved in a table join should normally be indexed
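
To try the join example without a MySQL server, the sketch below runs the same statements through Python's built-in `sqlite3` module. SQLite is a stand-in here, not the database the slides target, and the sample rows are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE newsitem (
  newsid INT PRIMARY KEY,
  newstitle VARCHAR(255),
  newscontent TEXT,
  authorid INT,
  newsdate TIMESTAMP
);
CREATE TABLE authors (
  authorid INT PRIMARY KEY,
  username VARCHAR(255),
  firstname VARCHAR(255),
  lastname VARCHAR(255)
);
-- The index on the join column from the slide
CREATE INDEX newsitem_authorid ON newsitem(authorid);

-- Hypothetical sample rows
INSERT INTO authors VALUES (1, 'author1', 'First', 'Author');
INSERT INTO authors VALUES (2, 'author2', 'Second', 'Author');
INSERT INTO newsitem VALUES (10, 'Title A', 'text', 1, '2015-05-11 09:00:00');
INSERT INTO newsitem VALUES (11, 'Title B', 'text', 2, '2015-05-11 10:00:00');
INSERT INTO newsitem VALUES (12, 'Title C', 'text', 1, '2015-05-11 11:00:00');
""")

rows = conn.execute(
    "SELECT newstitle, firstname, lastname "
    "FROM newsitem n, authors a "
    "WHERE n.authorid = a.authorid "
    "ORDER BY n.newsid"
).fetchall()

# Column 1 of PRAGMA index_list is the index name.
index_names = [row[1] for row in conn.execute("PRAGMA index_list('newsitem')")]
```

Whether the optimizer actually uses `newsitem_authorid` for a given query depends on the database and on the data, so the sketch only confirms that the index exists and that the join returns the expected rows.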

Page 13: Indexing

When should indexing be used?

Ex.

CREATE TABLE newsitem (
  newsid INT PRIMARY KEY,
  newstitle VARCHAR(255),
  newscontent TEXT,
  authorid INT,
  newsdate TIMESTAMP
);

CREATE TABLE newsitem_categories (
  newsid INT,
  categoryid INT
);

CREATE TABLE categories (
  categoryid INT PRIMARY KEY,
  categoryname VARCHAR(255)
);

SELECT n.newstitle, c.categoryname
FROM categories c, newsitem_categories nc, newsitem n
WHERE c.categoryid = nc.categoryid AND nc.newsid = n.newsid;

These fields should be indexed:
newsitem.newsid
newsitem_categories.newsid
newsitem_categories.categoryid
categories.categoryid

newsitem.newsid and categories.categoryid are primary keys and are therefore indexed already, so only two indexes have to be created:

CREATE INDEX newscat_news ON newsitem_categories(newsid);

CREATE INDEX newscat_cats ON newsitem_categories(categoryid);

Page 14: Indexing

Combination in Indexing

CREATE INDEX newscat_news ON newsitem_categories(newsid);
CREATE INDEX newscat_cats ON newsitem_categories(categoryid);

Can we do this instead?

CREATE INDEX news_cats ON newsitem_categories(newsid, categoryid);

YES, but with LIMITATIONS

Page 15: Indexing

Conjunctions in Combinations in Indexing

CREATE TABLE example (
  a int,
  b int,
  c int
);

CREATE INDEX example_index ON example(a,b,c);

• It will be used when you check against ‘a’.

• It will be used when you check against ‘a’ and ‘b’.

• It will be used when you check against ‘a’, ‘b’ and ‘c’.

• It will not be used if you check against ‘b’ and ‘c’, or if you only check ‘b’ or you only check ‘c’

• It will be used when you check against ‘a’ and ‘c’ but only for the ‘a’ column – it will not be used to check the ‘c’ column as well.

A query against ‘a’ OR ‘b’ like this:

SELECT a,b,c FROM example WHERE a=1 OR b=2;

• Can use the index only for the ‘a’ condition; it cannot use it to check the ‘b’ column. Rows that match only b=2 still have to be found some other way, so the database may well fall back to a full table scan.
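
The rules above for AND conditions amount to a "leftmost prefix" check, which the Python sketch below models. It is a simplified model of the rule as stated on this slide, not of any real query optimizer, and the function name is made up for the example.

```python
def usable_prefix(index_columns, equality_columns):
    """Return the leading index columns that the AND-ed equality checks can use."""
    prefix = []
    for column in index_columns:
        if column not in equality_columns:
            break                 # a gap ends the usable part of the index
        prefix.append(column)
    return prefix

example_index = ("a", "b", "c")

only_a = usable_prefix(example_index, {"a"})
a_and_b = usable_prefix(example_index, {"a", "b"})
all_three = usable_prefix(example_index, {"a", "b", "c"})
b_and_c = usable_prefix(example_index, {"b", "c"})
a_and_c = usable_prefix(example_index, {"a", "c"})
```

An empty result means the index cannot be used for the lookup at all; for ‘a’ and ‘c’ only the ‘a’ part is usable because ‘b’ is missing in between.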

Page 16: Indexing

Types of indexes (1)

Clustered and Non-clustered Indexes

Clustered: indexes in which the order of the rows in the data pages corresponds to the order of the rows in the index

• Only one per table: usually the primary key

• Faster to read than non-clustered, as data is physically stored in index order

Non-clustered: the order of the rows is not important

• Can be used many times per table

• Quicker for insert, delete, and update operations than a clustered index

Page 17: Indexing

Types of indexes (2)

Unique and Non-unique Indexes

Unique: help maintain data integrity by ensuring that no two rows of data in a table have identical key values

uniqueness is enforced

Non-unique: improve query performance by maintaining a sorted order of data values that are used frequently
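
The enforcement of uniqueness can be observed directly. The sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL; the table, index name and sample values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authors (authorid INT, username VARCHAR(255))")
conn.execute("CREATE UNIQUE INDEX authors_username ON authors(username)")

conn.execute("INSERT INTO authors VALUES (1, 'sample_user')")
try:
    # Same username again: the unique index must reject this row.
    conn.execute("INSERT INTO authors VALUES (2, 'sample_user')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM authors").fetchone()[0]
```

A non-unique index created with plain `CREATE INDEX` would have accepted the second row.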

Page 18: Indexing

Types of indexes (3)

Bitmap index - stores the bulk of its data as bit arrays; suited to columns whose values repeat very frequently

Dense index - An index record appears for every search key value in file. This record contains search key value and a pointer to the actual record

Sparse index - Index records are created only for some of the records

primary key

Reverse index - reverses the key value before entering it in the index; useful for sequence numbers, where new key values increase monotonically
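
The difference between a dense and a sparse index can be sketched as follows. The block capacity, the key values and the helper name are assumptions for the example; the sketch also assumes the data file is sorted on the search key, which a sparse index needs in order to work.

```python
from bisect import bisect_right

BLOCK_CAPACITY = 3                                   # assumed records per block
sorted_keys = [2, 5, 8, 11, 14, 17, 20, 23]          # data file sorted on the key
blocks = [sorted_keys[i:i + BLOCK_CAPACITY]
          for i in range(0, len(sorted_keys), BLOCK_CAPACITY)]

# Dense index: one entry for every search-key value.
dense_index = {key: block_no
               for block_no, block in enumerate(blocks)
               for key in block}

# Sparse index: one entry per block, holding the first key stored in that block.
sparse_index = [block[0] for block in blocks]

def sparse_lookup(key):
    """Pick the block whose first key is the largest one <= key, then scan it."""
    block_no = bisect_right(sparse_index, key) - 1
    if block_no < 0:
        return None
    return block_no if key in blocks[block_no] else None
```

The sparse index is much smaller (one entry per block instead of one per record), but every lookup ends with a short scan inside the chosen block.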

Page 19: Indexing

Types of indexes (4)

Fulltext - the search engine examines all of the words in every stored document as it tries to match search words supplied by the user

many other types of search:

• Two words near each other

• Any word derived from a particular root (for example run, ran, or running)

• Multiple words with distinct weightings

• A word or phrase close to the search word or phrase

Spatial - allows users to treat data within a data store as existing within a two-dimensional context

an extended index that allows you to index a spatial column. A spatial column is a table column that contains data of a spatial data type, such as geometry or geography

Page 20: Indexing

Syntax of Index (1)

Creation:

CREATE [UNIQUE|FULLTEXT|SPATIAL] INDEX index_name
    [index_type]
    ON tbl_name (index_col_name,...)
    [index_type]

index_col_name:
    col_name [(length)] [ASC | DESC]

index_type:
    USING {BTREE | HASH}

Page 21: Indexing

Access Method

BTree:

Keys have some locality of reference

They can be sorted well

Neighborhood: expect that a query for a given key will likely be followed by a query for one of its neighbors

Hash:

Dataset is extremely large
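
The "neighborhood" point can be illustrated by comparing an ordered structure with a hash table. In the Python sketch below a sorted list stands in for a B-tree and a `dict` stands in for a hash index; both are simplifications, and the key values are made up.

```python
from bisect import bisect_left, bisect_right

keys = [17, 3, 42, 8, 23, 15]                 # key of the record at each position

# Hash-style index: direct lookup by exact key, no ordering between keys.
hash_index = {key: pos for pos, key in enumerate(keys)}

# Ordered index (stand-in for a B-tree): keys kept in sorted order.
ordered_index = sorted((key, pos) for pos, key in enumerate(keys))
ordered_keys = [key for key, _ in ordered_index]

def range_lookup(low, high):
    """Return record positions for all keys in [low, high], in key order."""
    start = bisect_left(ordered_keys, low)
    end = bisect_right(ordered_keys, high)
    return [pos for _, pos in ordered_index[start:end]]

exact = hash_index[23]
neighbors = range_lookup(8, 17)
```

Neighboring keys sit next to each other in the ordered index, so a range or a follow-up query for a neighbor is cheap; the hash index answers only exact-match lookups.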

Page 22: Indexing

Syntax of Index (2)

Displaying Index Information:

SHOW INDEX FROM table_name

Deletion:

DROP INDEX index_name ON table_name

Page 23: Indexing

Summary

What is an index? - data structure - sorting a number of records

Why is it needed? - advantages & disadvantages

When should it be used? - finding

Types of indexes - clustered & non-clustered - unique & non-unique

Syntax - creation, display, deletion
