
NRI Institute of Technology Data Structures through C++ B.Tech II Yr I Sem(R09)

Prepared by A. Sharath Kumar

UNIT-4

Q1. What is a Dictionary? Explain its various operations and implementation methods.

Dictionaries

A dictionary is a container that stores objects according to a key; some implementations (such as sorted lists and search trees) keep the objects sorted by key. Most often the key is some sort of string. The main purpose of a dictionary is to give us an easy way to check which data items we have already stored. In a dictionary, you store a value with an associated key and later retrieve that value using the key. A dictionary may be realized as an associative array, a map, or a hash table.

A dictionary is a collection of elements

Each element has a field called key

– (key, value)

Every key is usually distinct

Typical dictionary operations are:

– Determine whether or not the dictionary is empty

– Determine the dictionary size (i.e., the number of pairs)

– Insert a pair into the dictionary

– Search the pair with a specified key

– Delete the pair with a specified key.

Dictionary ADT

A dictionary ADT implements the following operations:

Insert(x): puts the item x into the dictionary

Delete(x): deletes the item x from the dictionary

IsThere(x): returns true if the item x is in the dictionary

Simple container methods: size(), isEmpty(), elements()

Query methods: findElement(k), findAllElements(k)

Update methods: insertItem(k, e), removeElement(k), removeAllElements(k)
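The method list above can be sketched in C++ as a thin wrapper over std::multimap, which allows several elements per key (needed for findAllElements and removeAllElements). This is an illustrative sketch, not the notes' own code: the class name Dictionary and the use of std::optional (C++17) for "no such element" are assumptions.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical Dictionary ADT backed by std::multimap (one key may map to many elements).
template <typename K, typename E>
class Dictionary {
public:
    // Simple container methods
    std::size_t size() const { return items_.size(); }
    bool isEmpty() const { return items_.empty(); }
    std::vector<E> elements() const {
        std::vector<E> out;
        for (const auto& kv : items_) out.push_back(kv.second);
        return out;
    }

    // Query methods: one element with key k (if any), or all of them.
    std::optional<E> findElement(const K& k) const {
        auto it = items_.find(k);
        if (it == items_.end()) return std::nullopt;
        return it->second;
    }
    std::vector<E> findAllElements(const K& k) const {
        std::vector<E> out;
        auto range = items_.equal_range(k);
        for (auto it = range.first; it != range.second; ++it) out.push_back(it->second);
        return out;
    }

    // Update methods
    void insertItem(const K& k, const E& e) { items_.emplace(k, e); }
    std::optional<E> removeElement(const K& k) {  // removes one element with key k
        auto it = items_.find(k);
        if (it == items_.end()) return std::nullopt;
        E e = it->second;
        items_.erase(it);
        return e;
    }
    std::vector<E> removeAllElements(const K& k) {  // removes every element with key k
        std::vector<E> removed = findAllElements(k);
        items_.erase(k);
        return removed;
    }

private:
    std::multimap<K, E> items_;
};
```

In this sketch Insert(x), Delete(x) and IsThere(x) correspond to insertItem, removeElement and findElement(k).has_value().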

Dictionary as Ordered Linear List

A dictionary can be stored as a linear list L = (e1, e2, e3, …, en), where each ei is a pair (key, value). When the list is kept in an array, the following search time complexities are observed:

– Unsorted array: O(n) (sequential search)

– Sorted array: O(log n) (binary search)


Array or chain representation

– unsorted array: O(n) search time

– sorted array: O(log n) search time

– unsorted chain: O(n) search time

– sorted chain: O(n) search time (a chain has no random access, so binary search cannot be used).
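The sorted-array row above can be illustrated with a small C++ sketch: std::lower_bound performs the O(log n) binary search, while insertion and deletion still cost O(n) because later pairs must shift. The class name SortedArrayDict and the choice of int keys with string values are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical dictionary kept as an array of (key, value) pairs sorted by key.
class SortedArrayDict {
public:
    // Binary search: O(log n).
    std::optional<std::string> search(int k) const {
        std::size_t i = position(k);
        if (i < pairs_.size() && pairs_[i].first == k) return pairs_[i].second;
        return std::nullopt;
    }
    // Locating the slot is O(log n), but shifting later pairs makes insert O(n).
    void insert(int k, const std::string& v) {
        std::size_t i = position(k);
        if (i < pairs_.size() && pairs_[i].first == k) {
            pairs_[i].second = v;  // keys are distinct: replace the stored value
            return;
        }
        pairs_.insert(pairs_.begin() + static_cast<std::ptrdiff_t>(i), Pair{k, v});
    }
    // Removal also shifts later pairs: O(n).
    bool erase(int k) {
        std::size_t i = position(k);
        if (i >= pairs_.size() || pairs_[i].first != k) return false;
        pairs_.erase(pairs_.begin() + static_cast<std::ptrdiff_t>(i));
        return true;
    }
    std::size_t size() const { return pairs_.size(); }
    bool isEmpty() const { return pairs_.empty(); }

private:
    using Pair = std::pair<int, std::string>;
    // Index of the first pair whose key is not less than k.
    std::size_t position(int k) const {
        auto it = std::lower_bound(pairs_.begin(), pairs_.end(), k,
                                   [](const Pair& p, int key) { return p.first < key; });
        return static_cast<std::size_t>(it - pairs_.begin());
    }
    std::vector<Pair> pairs_;
};
```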

Implementation of Dictionaries

Dictionaries can be implemented using one of the following ways

Sequences

Binary Search Trees

Skip lists

Hash Tables

Q2. Explain the Skip List representation of Dictionary with an example.

Skip List Representation

Skip lists are simple, yet they have the same asymptotic (expected) efficiency as the much more complicated AVL trees and red-black trees. While many standard libraries for various programming languages provide a sorted set data structure, numerous problems require more control over the internal data structure than a sorted set exposes. The skip list is one such structure.

A skip list is a good implementation for a dictionary:

A series of lists {S0, S1, …, Sk}

Each list Si stores a sorted subset of the items of the dictionary D

Skip list:-

List S(i+1) contains items picked at random from S(i)

Each item has a probability of 50% of also being in the upper-level list

o Like flipping a coin

S0 has n elements

S1 has about n/2 elements

S2 has about n/4 elements, and so on
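The halving of list sizes above can be checked with a short C++ simulation that flips a fair coin for each item at each level. The function name levelSizes and the fixed seed are illustrative assumptions; the exact counts depend on the random generator, only their proportions (about n, n/2, n/4, …) matter.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: sizes[i] = number of items that end up in list S(i).
// Each item is in S0 and is promoted one level per "heads" (probability 1/2).
std::vector<std::size_t> levelSizes(std::size_t n, unsigned seed) {
    std::mt19937 rng(seed);
    std::bernoulli_distribution coin(0.5);
    std::vector<std::size_t> sizes;
    for (std::size_t item = 0; item < n; ++item) {
        std::size_t level = 0;
        while (true) {
            if (sizes.size() <= level) sizes.push_back(0);
            ++sizes[level];         // the item is present in S(level)
            if (!coin(rng)) break;  // tails: stop promoting this item
            ++level;                // heads: also place it in the next list up
        }
    }
    return sizes;
}
```

With n = 100000, sizes[1] comes out close to 50000 and sizes[2] close to 25000.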

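Traversing Positions in a Skip List:-

Given a position p in the skip list, the following navigation operations are available:

after(p): the position following p on the same level

before(p): the position preceding p on the same level

below(p): the position directly below p in the same tower

above(p): the position directly above p in the same tower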


Operations On Skip Lists

The basic operations on a skip list are as follows (the logarithmic bounds are expected times, since they depend on the random coin flips):

Insertion: O(log N)

Removal: O(log N)

Check if contains: O(log N)

Enumerate in order: O(N)

This makes the skip list a very useful data structure. A skip list can be used as the underlying storage for a sorted set data structure. Moreover, a skip list can be used directly to implement some operations that are not efficient on a typical sorted set:

o Find the element in the set that is closest to some given value, in O(log N) time.

o Find the k-th largest element in the set, in O(log N) time. This requires a simple augmentation of the skip list with partial counts.

o Count the number of elements in the set whose values fall into a given range, in O(log N) time. This also requires a simple augmentation of the skip list.

Use skip lists to implement dictionaries

Need to deal with

o Search

o Insert

o Remove

Searching

o Search for key K

o Start with p = the top-most, left position in the skip list, then repeat two steps:

1. If below(p) is null, stop: we are at the bottom.

2. Otherwise drop down (p ← below(p)), then move right while the key of the position after p is less than or equal to K; go back to 1.

o Search for 27


Pseudocode for Searching:-

Algorithm SkipSearch(k)

Input: Search key k

Output: Position p in the bottom list S0 such that p has the largest key less than or equal to k

p ← top-most, left position in S

while below(p) ≠ null do

    p ← below(p)

    while key(after(p)) ≤ k do

        p ← after(p)

return p

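Pseudocode for Insertion:-

Algorithm SkipInsert(k, e)

Input: Item (k, e)

Output: none

p ← SkipSearch(k)

q ← insertAfterAbove(p, null, Item(k, e))

while random() < 1/2 do

    if q is in the list just below the top-most list then

        add a new top-most list containing only the −∞ and +∞ sentinels

    while above(p) = null do

        p ← before(p)

    p ← above(p)

    q ← insertAfterAbove(p, q, Item(k, e))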
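The SkipSearch and SkipInsert pseudocode above can be sketched in C++ with quad-nodes linked by after, before, below and above. This is a hedged illustration rather than the notes' own code: keys are plain ints with no element e, a std::mt19937 coin stands in for random(), duplicate keys are not rejected (the pseudocode does not check for them), and the top-most list is kept holding only the two sentinels so that the upward walk always finds a list above.

```cpp
#include <random>
#include <vector>

// Quad-node: the -inf sentinel has before == nullptr, the +inf sentinel has after == nullptr.
struct QuadNode {
    int key;
    QuadNode* after = nullptr;
    QuadNode* before = nullptr;
    QuadNode* below = nullptr;
    QuadNode* above = nullptr;
    explicit QuadNode(int k) : key(k) {}
};

class SkipList {
public:
    explicit SkipList(unsigned seed = 1) : rng_(seed) { addLevel(); }  // S0: sentinels only
    SkipList(const SkipList&) = delete;
    SkipList& operator=(const SkipList&) = delete;
    ~SkipList() {
        for (QuadNode* row = top_; row != nullptr;) {  // free one list at a time
            QuadNode* nextRow = row->below;
            for (QuadNode* n = row; n != nullptr;) {
                QuadNode* next = n->after;
                delete n;
                n = next;
            }
            row = nextRow;
        }
    }
    // SkipSearch(k): position in S0 with the largest key <= k (the -inf sentinel if none).
    QuadNode* skipSearch(int k) const {
        QuadNode* p = top_;  // top-most, left position
        while (p->below != nullptr) {
            p = p->below;
            while (p->after->after != nullptr && p->after->key <= k) p = p->after;
        }
        return p;
    }
    // SkipInsert(k): insert into S0, then keep promoting while the coin shows heads.
    void insert(int k) {
        QuadNode* p = skipSearch(k);
        QuadNode* q = nullptr;
        for (int level = 0;; ++level) {
            if (level == height_) addLevel();  // keep the top-most list sentinel-only
            q = insertAfterAbove(p, q, k);
            if (!coin_(rng_)) break;                    // tails: stop
            while (p->above == nullptr) p = p->before;  // back up to a taller tower
            p = p->above;
        }
    }
    bool contains(int k) const {
        QuadNode* p = skipSearch(k);
        return p->before != nullptr && p->key == k;  // -inf sentinel never matches
    }
    std::vector<int> keys() const {  // contents of S0 in order, sentinels excluded
        QuadNode* p = top_;
        while (p->below != nullptr) p = p->below;
        std::vector<int> out;
        for (p = p->after; p->after != nullptr; p = p->after) out.push_back(p->key);
        return out;
    }

private:
    // insertAfterAbove(p, q, k): new node right after p on p's list, stacked on top of q.
    QuadNode* insertAfterAbove(QuadNode* p, QuadNode* q, int k) {
        QuadNode* n = new QuadNode(k);
        n->before = p;
        n->after = p->after;
        p->after->before = n;
        p->after = n;
        n->below = q;
        if (q != nullptr) q->above = n;
        return n;
    }
    // Push a new top-most list holding only the -inf and +inf sentinels.
    void addLevel() {
        QuadNode* minus = new QuadNode(0);
        QuadNode* plus = new QuadNode(0);
        minus->after = plus;
        plus->before = minus;
        if (top_ != nullptr) {
            minus->below = top_;
            top_->above = minus;
            plus->below = topPlus_;
            topPlus_->above = plus;
        }
        top_ = minus;
        topPlus_ = plus;
        ++height_;
    }
    QuadNode* top_ = nullptr;      // -inf sentinel of the top-most list
    QuadNode* topPlus_ = nullptr;  // +inf sentinel of the top-most list
    int height_ = -1;              // index of the top-most list (S0 is 0)
    std::mt19937 rng_;
    std::bernoulli_distribution coin_{0.5};
};
```

For example, after inserting 27, 5, 42, 13, 8, 99 and 31, keys() lists them in ascending order and skipSearch(28) returns the node holding 27.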

Q3. What is hashing? Explain various hashing methods for implementing Dictionary.

Hashing And Hash Techniques, Hash Table Representation

The hash table is an array of buckets; for large files it is normally stored on disk. For example, to store telephone numbers, the bucket indexes might range from 0 to 9999, with each bucket able to store multiple telephone numbers.

Hash tables implement the Dictionary ADT, namely the insert, search and delete operations on (key, value) pairs.


An element with key k is stored in slot h(k), where h is a hash function mapping the universe of keys U into the set of slot indexes of the table. Two keys may hash to the same slot; this situation is called a collision. One collision resolution technique is chaining, in which all elements that hash to the same slot are put in a linked list.

Introduction to Hashing:-

•Suppose that we want to store 10,000 student records (each with a 5-digit ID) in a given container.

· A linked list implementation would take O(n) access time.

· A height-balanced tree would give O(log n) access time.

· Using an array of size 100,000 (one cell per possible ID) would give O(1) access time but would waste a lot of space.

•Is there some way that we could get O(1) access without wasting a lot of space?

•The answer is hashing.

Hash Tables:-

•A hash table is a one-dimensional array indexed by an integer value computed by an index function called a hash function.

•Hash tables are sometimes referred to as scatter tables.

•Typical hash table operations are:

· Initialization.

· Insertion.

· Retrieval.

· Deletion.

Hash Functions

•A hash function, h, is a function which transforms a key from a set, K, into an index in a table of

size n:

h: K -> {0, 1, ..., n-2, n-1}

•A key can be a number, a string, a record etc.


•The size of the set of keys, |K|, is typically very large relative to the table size n.

•Since there can be more keys than hash table cells, different keys will hash to the same location.

•This situation is called a collision, and the colliding keys are called synonyms.

•Unfortunately, collisions cannot be avoided unless we have a priori knowledge of the keys.

Types of Hashing:-

•There are two types of hashing:

1. Static hashing: the set of keys is fixed and given in advance.

2. Dynamic hashing: the set of keys can change dynamically.

•The load factor of a hash table is the ratio of the number of keys in the table to the size of

the hash table.

•As the load factor gets closer to 1.0, the likelihood of collisions increases.

•The load factor is a typical example of a space/time trade-off.

Good Hash Functions

•A good hash function should:

· Minimize collision.

· Be easy and quick to compute.

· Distribute key values evenly in the hash table.

· Use all the information provided in the key.

· Have a high load factor for a given set of keys.

Hashing Methods

1. Prime-Number Division Remainder:-

•Computes hash value from key using the % operator.

•Table sizes that are powers of 2, such as 32 and 1024, should be avoided: k mod 2^p keeps only the low-order p bits of the key, which leads to more collisions.

•Likewise, powers of 10 are poor table sizes when the keys are decimal integers, since k mod 10^p keeps only the last p digits.

•Prime numbers not close to powers of 2 are better table size values.


•This method is best when combined with truncation or folding.
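A minimal C++ sketch of the division-remainder method, h(k) = k mod m, illustrating why a power-of-2 table size hurts: the keys 0, 32, 64 and 96 all land in slot 0 when m = 32, but spread out when m is the prime 37. The function name divisionHash and the sample keys are illustrative assumptions.

```cpp
#include <cstdint>

// Division-remainder hash: h(k) = k mod m, where m is the table size.
std::uint32_t divisionHash(std::uint64_t key, std::uint32_t tableSize) {
    return static_cast<std::uint32_t>(key % tableSize);
}
```

With m = 32 only the low 5 bits of the key survive, so every multiple of 32 collides; with m = 37 the same keys map to 0, 32, 27 and 22.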

2. Truncation or Digit/Character Extraction:-

•Works based on the distribution of digits or characters in the key.

•More evenly distributed digit positions are extracted and used for hashing purposes.

•For instance, students' IDs or ISBN codes may contain common subsequences, which may increase the likelihood of collision.

•Very fast but digits/characters distribution in keys may not be very even.
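Digit extraction can be sketched in C++ as picking chosen decimal positions out of the key and concatenating them. The function name extractDigits, the convention that positions are counted from the right (0 = units digit), and the example positions are assumptions; in practice the positions would be chosen after studying how evenly each digit is distributed in the real keys.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical digit-extraction hash: keep only the listed decimal positions of the key.
std::uint64_t extractDigits(std::uint64_t key, const std::vector<int>& positions) {
    std::uint64_t result = 0;
    for (int pos : positions) {
        std::uint64_t k = key;
        for (int i = 0; i < pos; ++i) k /= 10;  // drop the digits to the right of pos
        result = result * 10 + k % 10;         // append the digit found at pos
    }
    return result;
}
```

For the key 25936715, extracting positions {4, 2, 0} yields 375.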

3. Folding:-

•It involves splitting keys into two or more parts and then combining the parts to form the hash

addresses.

•To map the key 25936715 to a range between 0 and 9999, we can:

· split the number into two as 2593 and 6715 and

· add these two to obtain 9308 as the hash value.

•Very useful if we have keys that are very large.

•Fast and simple especially with bit patterns.

•A great advantage is ability to transform non-integer keys into integer values.
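The folding example above (25936715 split into 2593 and 6715, summed to 9308) can be sketched in C++ as follows. The function name foldHash is illustrative, and the final reduction modulo 10^digits for sums that overflow the range is an assumption, since the notes do not say how to handle a carry.

```cpp
#include <cstdint>

// Fold-shift hash: split the key into chunks of `digits` decimal digits
// (starting from the right), add the chunks, and reduce into [0, 10^digits).
std::uint64_t foldHash(std::uint64_t key, int digits) {
    std::uint64_t base = 1;
    for (int i = 0; i < digits; ++i) base *= 10;  // e.g. 10^4 = 10000
    std::uint64_t sum = 0;
    while (key > 0) {
        sum += key % base;  // add the next chunk
        key /= base;
    }
    return sum % base;  // drop any carry beyond the table range
}
```

foldHash(25936715, 4) returns 6715 + 2593 = 9308.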

4. Radix Conversion:-

•Transforms a key into another number base to obtain the hash value.

•Typically uses a number base other than base 10 and base 2 to calculate the hash addresses.

•To map the key 38652 in the range 0 to 9999 using base 11 we have:

$3 \times 11^4 + 8 \times 11^3 + 6 \times 11^2 + 5 \times 11^1 + 2 \times 11^0 = 55354$

•We may truncate the high-order 5 to yield 5354 as our hash address within 0 to 9999.
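The radix-conversion example can be checked with a short C++ sketch that reads the decimal digits of the key as base-11 digits and then keeps the low-order part that fits the table range. The function name radixHash and the use of a modulo to drop high-order digits are illustrative choices.

```cpp
#include <cstdint>

// Radix-conversion hash: reinterpret the decimal digits of `key` as digits
// in base `radix`, then keep the value modulo `range` (drops high-order digits).
std::uint64_t radixHash(std::uint64_t key, std::uint64_t radix, std::uint64_t range) {
    std::uint64_t value = 0;
    std::uint64_t place = 1;  // radix^i for the i-th digit from the right
    while (key > 0) {
        value += (key % 10) * place;
        place *= radix;
        key /= 10;
    }
    return value % range;
}
```

radixHash(38652, 11, 10000) computes 55354 and returns 5354, matching the truncation described above.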

5. Mid-Square:-

•The key is squared and the middle part of the result taken as the hash value.

•To map the key 3121 into a hash table of size 1000, we square it, $3121^2 = 9740641$, and extract the middle three digits, 406, as the hash value.

•Can be more efficient with powers of 2 as hash table size.


•Works well if the keys do not contain a lot of leading or trailing zeros.

•Non-integer keys have to be preprocessed to obtain corresponding integer values.
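A minimal C++ sketch of the mid-square method, reproducing the 3121 example. The function name midSquareHash is illustrative; it assumes key × key fits in 64 bits (key below 2^32), and when the digit count does not split evenly the window is shifted one place to the left.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Mid-square hash: square the key and take `width` digits from the middle
// of the decimal representation of the square.
std::uint64_t midSquareHash(std::uint64_t key, std::size_t width) {
    std::string sq = std::to_string(key * key);  // 3121 * 3121 -> "9740641"
    if (sq.size() <= width) return std::stoull(sq);
    std::size_t start = (sq.size() - width) / 2;  // centre the extracted window
    return std::stoull(sq.substr(start, width));
}
```

midSquareHash(3121, 3) returns 406.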

6. Use of a Random-Number Generator:-

•Given a seed derived from the key as parameter, the method generates a random number.

•The algorithm must ensure that:

• It always generates the same random value for a given key.

• It is unlikely for two keys to yield the same random value.

•The random number produced can be transformed to produce a valid hash value.
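A minimal C++ sketch of this idea: seeding a generator with the key makes the "random" value reproducible for that key, and a modulo turns it into a table index. The notes do not name a generator; std::mt19937 and the function name randomHash are assumptions.

```cpp
#include <cstdint>
#include <random>

// Random-number-generator hash: the key is the seed, so the same key always
// produces the same value; the value is then reduced into the table range.
std::uint32_t randomHash(std::uint32_t key, std::uint32_t tableSize) {
    std::mt19937 gen(key);  // deterministic sequence for a given seed
    return static_cast<std::uint32_t>(gen() % tableSize);
}
```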

Q4. What is a Collision? Explain various collision resolution methods.

Collision Resolution (Separate Chaining, Open Addressing: Linear Probing, Quadratic Probing, Double Hashing, Rehashing, Extendible Hashing)

Two main ways to resolve collisions:

Separate Chaining

Open Addressing

o linear probing

o quadratic probing

o double hashing

Hashing: Collision Resolution Schemes:-

•Collision Resolution Techniques

•Introduction to Separate Chaining

•Collision Resolution using Separate Chaining

•Introduction to Collision Resolution using Open Addressing.

Collision Resolution Techniques:-

•There are three broad ways of collision resolution:

1. Separate Chaining: A linked list-based implementation.

2. Open Addressing: Array-based implementation.

(i) Linear probing (linear search)

(ii) Quadratic probing (nonlinear search)

(iii) Random increments/decrements


(iv) Rehashing (double hashing)

3. Bucket methods: usually a combination of (1) and (2)

Introduction to Separate Chaining:-

•The hash table is implemented as an array of linked lists.

•Inserting an item, r, at index i is simply insertion into the linked list at position i.

•Synonyms are chained in the same linked list.

•Retrieval of an item, r, with hash address, i, is simply retrieval from the linked list at position i.

•Deletion of an item, r, with hash address, i, is simply deleting r from the linked list at position i.
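The separate-chaining operations just listed can be sketched in C++ as an array (std::vector) of linked lists (std::list). The class name ChainedHashTable, int keys with string values, and the division hash h(k) = k mod tableSize are assumptions for illustration.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical separate-chaining table: bucket i holds the chain of synonyms with h(k) = i.
class ChainedHashTable {
public:
    explicit ChainedHashTable(std::size_t tableSize) : buckets_(tableSize) {}

    void insert(int key, const std::string& value) {
        auto& chain = buckets_[hash(key)];
        for (auto& entry : chain)
            if (entry.first == key) { entry.second = value; return; }  // update existing key
        chain.emplace_front(key, value);  // synonyms share the same chain
    }
    const std::string* find(int key) const {  // nullptr if the key is absent
        for (const auto& entry : buckets_[hash(key)])
            if (entry.first == key) return &entry.second;
        return nullptr;
    }
    bool erase(int key) {
        auto& chain = buckets_[hash(key)];
        for (auto it = chain.begin(); it != chain.end(); ++it)
            if (it->first == key) { chain.erase(it); return true; }
        return false;
    }
    std::size_t chainLength(int key) const { return buckets_[hash(key)].size(); }

private:
    std::size_t hash(int key) const {
        // Division hash with a non-negative remainder, even for negative keys.
        long long m = static_cast<long long>(buckets_.size());
        return static_cast<std::size_t>(((key % m) + m) % m);
    }
    std::vector<std::list<std::pair<int, std::string>>> buckets_;
};
```

With a table of size 13, the keys 1, 14 and 27 are synonyms and end up in the same chain.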

Separate Chaining with String Keys:-

•Recall that search keys can be numbers, strings or some other object.

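•The following Java method implements this technique for string keys: it adds up the character codes of the key and reduces the sum modulo the table size.

```java
public static int hash(String key, int tableSize) {
    int hashVal = 0;
    for (int i = 0; i < key.length(); i++) {
        hashVal += key.charAt(i);  // add the character code of each character
    }
    return hashVal % tableSize;
}
```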

Example 1: Separate Chaining:-

•Devise an appropriate hash function and use it to load the information about the following

commodity items into a hash table of size 13 using separate chaining.


Introduction to Open Addressing:-

•In this method the entries are placed inside the array itself.

•The probe sequence is essentially a sequence of functions {h0, h1, h2, …, hn-1}

where, hi: K -> {0, 1, …, n-1 }

•To insert item r, we examine array locations h0(r), h1(r), h2(r), ... in turn until an empty slot is found.

• Similarly, to find item r, we examine the same sequence of locations in the same order.

•The most common probe sequences are of the form hi(r) = (h(r) + c(i)) mod n, i = 0, 1, …, n-1.

•The function c(i) is required to have the following two properties:

• Property 1: c(0) = 0.

• Property 2: The set of values {c(0) mod n, c(1) mod n, c(2) mod n, …, c(n-1) mod n} must contain every integer between 0 and n-1 inclusive.

Open Addressing: Linear Probing:-

•Linear Probe: Here the function c(i) is a linear function in i: c(i) = ai + b

•Property 1 requires that c(0) = 0. Therefore, b must be zero.

•For c(i) = ai to satisfy Property 2, a and n must be relatively prime.

•The linear probing sequence that is usually used is hi(r) = (h(r) + i) mod n, i = 0, 1, 2, …, n-1.

•Insert the record at the first empty slot; if no empty slot is found, the hash table is full and insertion fails.
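The claim that c(i) = ai satisfies Property 2 only when a and n are relatively prime can be checked with a small C++ sketch. The function names probeSequence and coversAllSlots are illustrative; the sketch assumes 0 ≤ home < n and a ≥ 0, and a = 1 gives the usual linear probing sequence.

```cpp
#include <vector>

// Probe sequence h_i = (home + a*i) mod n for i = 0, 1, ..., n-1.
std::vector<int> probeSequence(int home, int a, int n) {
    std::vector<int> seq;
    for (int i = 0; i < n; ++i) seq.push_back((home + a * i) % n);
    return seq;
}

// Property 2: the sequence must reach every slot 0..n-1.
bool coversAllSlots(int a, int n) {
    std::vector<bool> seen(n, false);
    for (int slot : probeSequence(0, a, n)) seen[slot] = true;
    for (bool s : seen)
        if (!s) return false;
    return true;
}
```

For n = 10, a = 3 visits every slot, while a = 2 only ever reaches the even slots.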

Linear Probing: Some Notes :-

•Notice from this table that a large cluster has already been formed.

•In general, empty cells immediately following the cluster have a higher chance of being filled next.

•The probability of taking longer probe sequences is much higher with clusters.

•This is one disadvantage of linear probing; other methods attempt to improve on this.

Introduction to Retrieval & Deletion:-

•Retrieval: To search for a record we:

• Calculate its hash value.


• Check that location of the array for the record.

· If found, return the record.

· If not, keep searching until you find the record or you reach an empty table location.

•Attempting to retrieve a non-existent record is very expensive, since the search stops only at an empty location (or after examining the whole table).

• Deletion:

• In open addressing, the location where a record is stored is not necessarily its home position.

• We cannot just set the location of a deleted record to empty.

• A special flag or key value is needed to mark deleted records locations.
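Retrieval and deletion with a special "deleted" flag can be sketched in C++ for linear probing. The class name LinearProbingTable, the three slot states, non-negative int keys, and h(k) = k mod n are assumptions; the point of the sketch is that a Deleted slot does not stop a search, while an Empty slot does.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical open-addressing table with linear probing and a Deleted marker.
class LinearProbingTable {
public:
    explicit LinearProbingTable(std::size_t n) : slots_(n, Slot{0, State::Empty}) {}

    // Insert at the first Empty or Deleted slot of the probe sequence.
    bool insert(int key) {
        if (find(key)) return true;  // already stored
        const std::size_t n = slots_.size();
        for (std::size_t i = 0; i < n; ++i) {
            Slot& s = slots_[(home(key) + i) % n];
            if (s.state != State::Occupied) {
                s.key = key;
                s.state = State::Occupied;
                return true;
            }
        }
        return false;  // no free slot: the table is full and insertion fails
    }
    bool find(int key) const { return locate(key) != npos; }
    // Mark the slot Deleted rather than Empty so later searches probe past it.
    bool erase(int key) {
        const std::size_t pos = locate(key);
        if (pos == npos) return false;
        slots_[pos].state = State::Deleted;
        return true;
    }

private:
    enum class State { Empty, Occupied, Deleted };
    struct Slot {
        int key;
        State state;
    };
    static constexpr std::size_t npos = static_cast<std::size_t>(-1);

    std::size_t home(int key) const { return static_cast<std::size_t>(key) % slots_.size(); }
    // Retrieval: probe from the home slot until the key or an Empty slot is found.
    std::size_t locate(int key) const {
        const std::size_t n = slots_.size();
        for (std::size_t i = 0; i < n; ++i) {
            const std::size_t pos = (home(key) + i) % n;
            if (slots_[pos].state == State::Empty) return npos;
            if (slots_[pos].state == State::Occupied && slots_[pos].key == key) return pos;
        }
        return npos;  // examined every slot without success
    }
    std::vector<Slot> slots_;
};
```

In a table of size 7, the keys 3, 10 and 17 all hash to slot 3 and occupy slots 3, 4 and 5; after deleting 10, key 17 is still found because the search probes past the Deleted slot 4, and a later insert of 24 reuses that slot.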

The following table describes a variety of collision resolution techniques and their mechanisms.

