
CSE332: Data Abstractions Lecture 6: Dictionaries; Binary Search Trees


Dan Grossman, Spring 2010


Where we are

Studying the absolutely essential ADTs of computer science and classic data structures for implementing them

ADTs so far:

1. Stack: push, pop, isEmpty, …
2. Queue: enqueue, dequeue, isEmpty, …
3. Priority queue: insert, deleteMin, …

Next:

4. Dictionary (a.k.a. Map): associate keys with values
– probably the most common, way more than priority queue


The Dictionary (a.k.a. Map) ADT

• Data:
– set of (key, value) pairs
– keys must be comparable

• Operations:
– insert(key,value)
– find(key)
– delete(key)
– …

Example contents (key, then stored value):
• djg: Dan Grossman, …
• trobison: Tyler Robison, …
• sandona1: Brent Sandona, …

insert(djg, ….)

find(trobison) returns Tyler, Robison, …

We will tend to emphasize the keys, but don’t forget about the stored values
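As a rough illustration of the three operations, the sketch below uses Java's standard java.util.TreeMap as a stand-in dictionary. The class name DictionaryDemo and the stored strings are made up for this example; the lecture does not prescribe any particular library.

```java
import java.util.TreeMap;

// Hypothetical demo class; TreeMap is just one ready-made dictionary in Java.
class DictionaryDemo {
    static TreeMap<String, String> sample() {
        TreeMap<String, String> dict = new TreeMap<>();
        dict.put("djg", "Dan Grossman");          // insert(key, value)
        dict.put("trobison", "Tyler Robison");
        dict.put("sandona1", "Brent Sandona");
        return dict;
    }

    public static void main(String[] args) {
        TreeMap<String, String> dict = sample();
        System.out.println(dict.get("trobison")); // find(key)
        dict.remove("djg");                       // delete(key)
        System.out.println(dict.containsKey("djg"));
    }
}
```

String keys work here because String is comparable, which is exactly the requirement the ADT places on keys.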


Comparison: The Set ADT

The Set ADT is like a Dictionary without any values
– A key is present or not (no repeats)

For find, insert, delete, there is little difference
– In dictionary, values are “just along for the ride”
– So same data-structure ideas work for dictionaries and sets

But if your Set ADT has other important operations this may not hold
– union, intersection, is_subset
– notice these are binary operators on sets


Dictionary data structures

Will spend the next 1.5-2 weeks implementing dictionaries with three different data structures

1. AVL trees
– Binary search trees with guaranteed balancing

2. B-Trees
– Also always balanced, but different and shallower

3. Hashtables
– Not tree-like at all

Skipping: Other balanced trees (red-black, splay)

But first some applications and less efficient implementations…


A Modest Few Uses

Any time you want to store information according to some key and be able to retrieve it efficiently
– Lots of programs do that!

• Networks: router tables
• Operating systems: page tables
• Compilers: symbol tables
• Databases: dictionaries with other nice properties
• Search: inverted indexes, phone directories, …
• Biology: genome maps
• …


Simple implementations

For dictionary with n key/value pairs

                          insert    find       delete
• Unsorted linked-list    O(1)      O(n)       O(n)
• Unsorted array          O(1)      O(n)       O(n)
• Sorted linked list      O(n)      O(n)       O(n)
• Sorted array            O(n)      O(log n)   O(n)

We’ll see a Binary Search Tree (BST) probably does better, but not in the worst case unless we keep it balanced
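To make the sorted-array row concrete, here is a minimal sketch that stores only int keys (values are left out to keep it short). The class SortedArrayDict and its method names are hypothetical, not from the lecture. find is a binary search, while insert and delete pay O(n) to shift elements.

```java
import java.util.Arrays;

// Hypothetical sketch: a dictionary's keys kept in a sorted, growable int array.
class SortedArrayDict {
    private int[] keys = new int[8];
    private int size = 0;

    // Binary search, O(log n). Returns the index of key,
    // or -(insertion point) - 1 when the key is absent.
    private int indexOf(int key) {
        int lo = 0, hi = size - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (keys[mid] < key) lo = mid + 1;
            else if (keys[mid] > key) hi = mid - 1;
            else return mid;
        }
        return -(lo + 1);
    }

    boolean find(int key) {
        return indexOf(key) >= 0;
    }

    // O(n): everything after the insertion point shifts right by one.
    void insert(int key) {
        int i = indexOf(key);
        if (i >= 0) return;                          // already present
        int pos = -(i + 1);
        if (size == keys.length) keys = Arrays.copyOf(keys, 2 * size);
        System.arraycopy(keys, pos, keys, pos + 1, size - pos);
        keys[pos] = key;
        size++;
    }

    // O(n): everything after the removed slot shifts left by one.
    void delete(int key) {
        int i = indexOf(key);
        if (i < 0) return;                           // not present
        System.arraycopy(keys, i + 1, keys, i, size - i - 1);
        size--;
    }
}
```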


Lazy Deletion

A general technique for making delete as fast as find:
– Instead of actually removing the item just mark it deleted

Plusses:
– Simpler
– Can do removals later in batches
– If re-added soon thereafter, just unmark the deletion

Minuses:
– Extra space for the “is-it-deleted” flag
– Data structure full of deleted nodes wastes space
– find takes O(log m) time, where m is the data-structure size (okay)
– May complicate other operations
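A minimal sketch of the idea over a sorted array of distinct int keys, assuming a fixed set of slots. The class LazySortedArray and the method undelete are names invented for this example. delete does the same binary search as find and then only sets a flag.

```java
import java.util.Arrays;

// Hypothetical sketch of lazy deletion over a fixed sorted array of distinct keys.
class LazySortedArray {
    private final int[] keys;         // sorted; slots are never physically removed here
    private final boolean[] deleted;  // the extra "is-it-deleted" flag, one per slot

    LazySortedArray(int[] sortedKeys) {
        keys = sortedKeys.clone();
        deleted = new boolean[keys.length];
    }

    // O(log m), where m counts live and deleted slots alike.
    boolean find(int key) {
        int i = Arrays.binarySearch(keys, key);
        return i >= 0 && !deleted[i];
    }

    // As fast as find: locate the slot and mark it, with no shifting.
    void delete(int key) {
        int i = Arrays.binarySearch(keys, key);
        if (i >= 0) deleted[i] = true;
    }

    // Re-adding a recently deleted key just clears the mark.
    boolean undelete(int key) {
        int i = Arrays.binarySearch(keys, key);
        if (i < 0) return false;
        deleted[i] = false;
        return true;
    }
}
```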

[Figure: a sorted array holding 10 12 24 30 41 42 44 45 50]

Some tree terms (mostly review)

• There are many kinds of trees
– Every binary tree is a tree
– Every list is kind of a tree (think of “next” as the one child)

• There are many kinds of binary trees
– Every binary min heap is a binary tree
– Every binary search tree is a binary tree

• A tree can be balanced or not
– A balanced tree with n nodes has a height of O(log n)
– Different tree data structures have different “balance conditions” to achieve this


Binary Trees

• Binary tree is empty or
– a root (with data)
– a left subtree (maybe empty)
– a right subtree (maybe empty)

• Representation: each node holds a left pointer, the data, and a right pointer

[Figure: an example binary tree with nodes A through J]

• For a dictionary, data will include a key and a value


Binary Trees: Some Numbers

Recall: height of a tree = longest path from root to leaf (count edges)

For binary tree of height h:
– max # of leaves: 2^h
– max # of nodes: 2^(h+1) – 1
– min # of leaves: 1
– min # of nodes: h + 1

For n nodes, we cannot do better than O(log n) height, and we want to avoid O(n) height
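The maxima come from filling every level: level i (with the root at level 0) holds at most 2^i nodes. A short derivation, using n for the number of nodes, shows where the O(log n) lower bound and the O(n) upper bound on height come from:

```latex
\[
\text{max nodes} \;=\; \sum_{i=0}^{h} 2^{i} \;=\; 2^{h+1} - 1,
\qquad
\text{max leaves} \;=\; 2^{h}.
\]
\[
n \le 2^{h+1} - 1 \;\Longrightarrow\; h \ge \log_2(n+1) - 1,
\qquad
n \ge h + 1 \;\Longrightarrow\; h \le n - 1.
\]
```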


Calculating height

What is the height of a tree with root r?

int treeHeight(Node root) {
  // An empty tree has height -1, so a single node has height 0
  if (root == null)
    return -1;
  return 1 + Math.max(treeHeight(root.left),
                      treeHeight(root.right));
}

Running time for tree with n nodes: O(n) – single pass over tree

Note: non-recursive is painful – need your own stack of pending nodes; much easier to use recursion’s call stack


Tree Traversals

A traversal is an order for visiting all the nodes of a tree

• Pre-order: root, left subtree, right subtree

• In-order: left subtree, root, right subtree

• Post-order: left subtree, right subtree, root

[Figure: an expression tree with root +, whose left child is * (with children 2 and 4) and whose right child is 5]
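The three orders can be compared on the expression tree. The sketch below assumes the tree has the shape (2 * 4) + 5; ExprNode and TraversalDemo are hypothetical names, and eval handles only the + and * operators that appear in this example.

```java
// Hypothetical node type; labels are operators or integer literals.
class ExprNode {
    final String label;
    final ExprNode left, right;

    ExprNode(String label, ExprNode left, ExprNode right) {
        this.label = label;
        this.left = left;
        this.right = right;
    }
}

class TraversalDemo {
    // Assumed shape of the example tree: (2 * 4) + 5.
    static ExprNode sample() {
        ExprNode times = new ExprNode("*",
                new ExprNode("2", null, null), new ExprNode("4", null, null));
        return new ExprNode("+", times, new ExprNode("5", null, null));
    }

    static String preOrder(ExprNode t) {      // root, left, right
        if (t == null) return "";
        return t.label + " " + preOrder(t.left) + preOrder(t.right);
    }

    static String inOrder(ExprNode t) {       // left, root, right
        if (t == null) return "";
        return inOrder(t.left) + t.label + " " + inOrder(t.right);
    }

    static String postOrder(ExprNode t) {     // left, right, root
        if (t == null) return "";
        return postOrder(t.left) + postOrder(t.right) + t.label + " ";
    }

    // Post-order evaluation: both operands are computed before the operator is applied.
    static int eval(ExprNode t) {
        if (t.left == null && t.right == null) return Integer.parseInt(t.label);
        int l = eval(t.left);
        int r = eval(t.right);
        return t.label.equals("+") ? l + r : l * r;
    }

    public static void main(String[] args) {
        ExprNode t = sample();
        System.out.println(preOrder(t).trim());   // + * 2 4 5
        System.out.println(inOrder(t).trim());    // 2 * 4 + 5
        System.out.println(postOrder(t).trim());  // 2 4 * 5 +
        System.out.println(eval(t));              // 13
    }
}
```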


More on traversals

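void inOrderTraversal(Node t) {
  if (t != null) {
    inOrderTraversal(t.left);   // everything in the left subtree first
    process(t.element);         // then the node itself
    inOrderTraversal(t.right);  // then everything in the right subtree
  }
}

Sometimes order doesn’t matter
• Example: sum all elements

Sometimes order matters
• Example: print tree with parent above indented children (pre-order)
• Example: evaluate an expression tree (post-order)

[Figure: a tree with root A, children B and C, where B has children D and E and C has children F and G, next to its pre-order indented printout: A B D E C F G]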


Binary Search Tree

[Figure: an example binary search tree with keys 2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]

• Structural property (“binary”)
– each node has at most 2 children
– result: keeps operations simple

• Order property
– all keys in left subtree smaller than node’s key
– all keys in right subtree larger than node’s key
– result: easy to find any given key
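The order property is about whole subtrees, not just a node and its immediate children. One way to check it, sketched here under the assumption of int keys with no duplicates, is to pass down the range each subtree's keys must fall in. BstNode and BstCheck are hypothetical names.

```java
// Hypothetical node type with int keys.
class BstNode {
    final int key;
    final BstNode left, right;

    BstNode(int key, BstNode left, BstNode right) {
        this.key = key;
        this.left = left;
        this.right = right;
    }
}

class BstCheck {
    static boolean isBst(BstNode root) {
        return isBst(root, null, null);
    }

    // Every key in t must lie strictly between lo and hi (null means unbounded).
    static boolean isBst(BstNode t, Integer lo, Integer hi) {
        if (t == null) return true;
        if (lo != null && t.key <= lo) return false;
        if (hi != null && t.key >= hi) return false;
        return isBst(t.left, lo, t.key)      // left subtree: keys below t.key
            && isBst(t.right, t.key, hi);    // right subtree: keys above t.key
    }
}
```

A tree such as 5 with left child 3, where 3 has right child 7, passes a purely local parent/child comparison but fails this check, because 7 sits in the left subtree of 5.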


Are these BSTs?

[Figure: two candidate trees; for each one, check whether every node satisfies the order property]


Find in BST, Recursive

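[Figure: example BST with keys 2, 5, 7, 9, 10, 12, 15, 17, 20, 30]

Data find(Key key, Node root) {
  // Comparisons are written with < and > as on the slide; for a
  // non-primitive Key type, Java would use key.compareTo(root.key)
  if (root == null)
    return null;                      // empty subtree: key is not present
  if (key < root.key)
    return find(key, root.left);      // smaller keys live in the left subtree
  if (key > root.key)
    return find(key, root.right);     // larger keys live in the right subtree
  return root.data;                   // keys are equal: found it
}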


Find in BST, Iterative

[Figure: example BST with keys 2, 5, 7, 9, 10, 12, 15, 17, 20, 30]

Data find(Key key, Node root) {
  // Walk down from the root until the key is found or we fall off the tree
  while (root != null && root.key != key) {
    if (key < root.key)
      root = root.left;
    else
      root = root.right;
  }
  if (root == null)
    return null;
  return root.data;
}


Other “finding operations”

• Find minimum node
– “the liberal algorithm”

• Find maximum node
– “the conservative algorithm”

• Find predecessor of a non-leaf
• Find successor of a non-leaf
• Find predecessor of a leaf
• Find successor of a leaf

[Figure: example BST with keys 2, 5, 7, 9, 10, 12, 15, 17, 20, 30]
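A minimal sketch of the first two operations: the minimum is reached by going left until there is no left child, the maximum by going right. The names MinMaxNode and MinMaxDemo are hypothetical, and the shape built in sample() is an assumption about the example tree (12 at the root, 5 and 15 below it).

```java
// Hypothetical node type with int keys.
class MinMaxNode {
    final int key;
    final MinMaxNode left, right;

    MinMaxNode(int key, MinMaxNode left, MinMaxNode right) {
        this.key = key;
        this.left = left;
        this.right = right;
    }
}

class MinMaxDemo {
    // Keep going left: the leftmost node holds the smallest key.
    static MinMaxNode findMin(MinMaxNode t) {
        if (t == null) return null;
        while (t.left != null) t = t.left;
        return t;
    }

    // Keep going right: the rightmost node holds the largest key.
    static MinMaxNode findMax(MinMaxNode t) {
        if (t == null) return null;
        while (t.right != null) t = t.right;
        return t;
    }

    static MinMaxNode leaf(int key) {
        return new MinMaxNode(key, null, null);
    }

    // Assumed shape of the example tree.
    static MinMaxNode sample() {
        MinMaxNode nine = new MinMaxNode(9, leaf(7), leaf(10));
        MinMaxNode five = new MinMaxNode(5, leaf(2), nine);
        MinMaxNode twenty = new MinMaxNode(20, leaf(17), leaf(30));
        MinMaxNode fifteen = new MinMaxNode(15, null, twenty);
        return new MinMaxNode(12, five, fifteen);
    }
}
```

When a node has a right subtree, its successor is findMin of that subtree; when it has a left subtree, its predecessor is findMax of that subtree. Nodes lacking the relevant subtree, leaves included, need a search through their ancestors instead, which this sketch does not cover.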


Insert in BST

[Figure: the example BST (keys 2, 5, 7, 9, 10, 12, 15, 17, 20, 30) with 8, 13, and 31 added as new leaves]

insert(13)
insert(8)
insert(31)

(New) insertions happen only at leaves – easy!
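One common way to write insertion, sketched with int keys only: search as in find, and attach a new leaf where the search falls off the tree. InsNode and BstInsert are hypothetical names; a full dictionary would also store a value and overwrite it when the key already exists.

```java
// Hypothetical node type with int keys.
class InsNode {
    final int key;
    InsNode left, right;

    InsNode(int key) {
        this.key = key;
    }
}

class BstInsert {
    // Returns the root of the subtree after inserting key.
    static InsNode insert(InsNode t, int key) {
        if (t == null) return new InsNode(key);   // fell off the tree: new leaf goes here
        if (key < t.key) t.left = insert(t.left, key);
        else if (key > t.key) t.right = insert(t.right, key);
        // equal keys: already present, nothing to do in this keys-only sketch
        return t;
    }

    static InsNode build(int... keys) {
        InsNode root = null;
        for (int k : keys) root = insert(root, k);
        return root;
    }
}
```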


Deletion in BST

[Figure: example BST with keys 2, 5, 7, 9, 10, 12, 15, 17, 20, 30]

Why might deletion be harder than insertion?


Deletion

• Removing an item disrupts the tree structure

• Basic idea: find the node to be removed, then “fix” the tree so that it is still a binary search tree

• Three cases:
– node has no children (leaf)
– node has one child
– node has two children


Deletion – The Leaf Case

[Figure: example BST with keys 2, 5, 7, 9, 10, 12, 15, 17, 20, 30]

delete(17)


Deletion – The One Child Case

[Figure: example BST with keys 2, 5, 7, 9, 10, 12, 15, 20, 30]

delete(15)


Deletion – The Two Child Case

[Figure: example BST with keys 2, 5, 7, 9, 10, 12, 20, 30]

delete(5)

What can we replace 5 with?


Deletion – The Two Child Case

Idea: Replace the deleted node with a value guaranteed to be between the two child subtrees

Options:
• successor from right subtree: findMin(node.right)
• predecessor from left subtree: findMax(node.left)
– These are the easy cases of predecessor/successor

Now delete the original node containing successor or predecessor
• Leaf or one child case – easy cases of delete!
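The three cases can be sketched together. This version picks the successor option (findMin of the right subtree) for the two-child case; the predecessor option would work symmetrically. DelNode and BstDelete are hypothetical names, and only int keys are stored.

```java
// Hypothetical node type with int keys.
class DelNode {
    int key;
    DelNode left, right;

    DelNode(int key) {
        this.key = key;
    }
}

class BstDelete {
    static DelNode insert(DelNode t, int key) {
        if (t == null) return new DelNode(key);
        if (key < t.key) t.left = insert(t.left, key);
        else if (key > t.key) t.right = insert(t.right, key);
        return t;
    }

    static DelNode findMin(DelNode t) {
        while (t.left != null) t = t.left;
        return t;
    }

    // Returns the root of the subtree after removing key (if present).
    static DelNode delete(DelNode t, int key) {
        if (t == null) return null;                 // key not found
        if (key < t.key) t.left = delete(t.left, key);
        else if (key > t.key) t.right = delete(t.right, key);
        else if (t.left == null) return t.right;    // leaf, or only a right child
        else if (t.right == null) return t.left;    // only a left child
        else {
            DelNode succ = findMin(t.right);        // two children: take the successor's key
            t.key = succ.key;
            t.right = delete(t.right, succ.key);    // successor is a leaf or one-child node
        }
        return t;
    }

    static String inOrder(DelNode t) {
        if (t == null) return "";
        return inOrder(t.left) + t.key + " " + inOrder(t.right);
    }
}
```

Run on the example keys, delete(17) is the leaf case, delete(15) the one-child case, and delete(5) the two-child case, where 7 (the minimum of the right subtree of 5) moves up.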


BuildTree for BST

• We had buildHeap, so let’s consider buildTree

• Insert keys 1, 2, 3, 4, 5, 6, 7, 8, 9 into an empty BST

– If inserted in given order, what is the tree?
  [Figure: a chain 1, 2, 3, … in which every node has only a right child]

– What big-O runtime for this kind of sorted input?
  O(n^2). Not a happy place

– Is inserting in the reverse order any better?


BuildTree for BST

• Insert keys 1, 2, 3, 4, 5, 6, 7, 8, 9 into an empty BST

• What if we could somehow re-arrange them
– median first, then left median, right median, etc.
– 5, 3, 7, 2, 1, 4, 8, 6, 9

– What tree does that give us?
  [Figure: BST with root 5; children 3 and 7; then 2, 4, 6, 8; then 1 below 2 and 9 below 8]

– What big-O runtime?
  O(n log n), definitely better
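The effect of insertion order can be checked directly by measuring the height of the resulting tree. In this sketch, BuildNode, BuildTreeDemo, and heightAfterInserting are hypothetical names; insertion is the plain unbalanced BST insert.

```java
// Hypothetical node type with int keys.
class BuildNode {
    final int key;
    BuildNode left, right;

    BuildNode(int key) {
        this.key = key;
    }
}

class BuildTreeDemo {
    static BuildNode insert(BuildNode t, int key) {
        if (t == null) return new BuildNode(key);
        if (key < t.key) t.left = insert(t.left, key);
        else if (key > t.key) t.right = insert(t.right, key);
        return t;
    }

    // Height counts edges; the empty tree has height -1.
    static int height(BuildNode t) {
        return t == null ? -1 : 1 + Math.max(height(t.left), height(t.right));
    }

    static int heightAfterInserting(int... keys) {
        BuildNode root = null;
        for (int k : keys) root = insert(root, k);
        return height(root);
    }

    public static void main(String[] args) {
        // Sorted order: a chain, height n - 1.
        System.out.println(heightAfterInserting(1, 2, 3, 4, 5, 6, 7, 8, 9)); // 8
        // The re-arranged order from the slide: height 3.
        System.out.println(heightAfterInserting(5, 3, 7, 2, 1, 4, 8, 6, 9)); // 3
    }
}
```

With sorted input the i-th insert walks past all i – 1 earlier nodes, so the total work is 0 + 1 + … + (n – 1) = n(n – 1)/2 steps, which is where the O(n^2) comes from. Reverse order gives the mirror-image chain and the same cost.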


Unbalanced BST

• Balancing a tree at build time is insufficient, as sequences of operations can eventually transform that carefully balanced tree into the dreaded list

• At that point, everything is O(n) and nobody is happy
– find
– insert
– delete

[Figure: a chain 1, 2, 3, …]


Balanced BST

Observation
• BST: the shallower the better!
• For a BST with n nodes inserted in arbitrary order
– Average height is O(log n) – see text for proof
– Worst case height is O(n)
• Simple cases such as inserting in key order lead to the worst-case scenario

Solution: Require a Balance Condition that
1. ensures depth is always O(log n) – strong enough!
2. is easy to maintain – not too strong!


Potential Balance Conditions

1. Left and right subtrees of the root have equal number of nodes
   Too weak! Height mismatch example: [figure not recovered]

2. Left and right subtrees of the root have equal height
   Too weak! Double chain example: [figure not recovered]


Potential Balance Conditions

3. Left and right subtrees of every node have equal number of nodes
   Too strong! Only perfect trees (2^n – 1 nodes)

4. Left and right subtrees of every node have equal height
   Too strong! Only perfect trees (2^n – 1 nodes)


The AVL Balance Condition

Left and right subtrees of every node have heights differing by at most 1

Definition: balance(node) = height(node.left) – height(node.right)

AVL property: for every node x, –1 ≤ balance(x) ≤ 1

• Ensures small depth
– Will prove this by showing that an AVL tree of height h must have a number of nodes exponential in h

• Easy (well, efficient) to maintain
– Using single and double rotations
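The definition translates almost directly into a checker. This is only a sketch of the balance condition: AvlNode and AvlCheck are hypothetical names, the nodes carry no keys (so the BST order property is not checked), and height is recomputed on every call for simplicity, whereas an AVL implementation would normally store a height in each node.

```java
// Hypothetical key-less node type: only the shape matters for the balance condition.
class AvlNode {
    final AvlNode left, right;

    AvlNode(AvlNode left, AvlNode right) {
        this.left = left;
        this.right = right;
    }
}

class AvlCheck {
    // Height counts edges; the empty tree has height -1.
    static int height(AvlNode t) {
        return t == null ? -1 : 1 + Math.max(height(t.left), height(t.right));
    }

    // balance(node) = height(node.left) - height(node.right)
    static int balance(AvlNode t) {
        return height(t.left) - height(t.right);
    }

    // AVL property: -1 <= balance(x) <= 1 for every node x.
    static boolean isAvlBalanced(AvlNode t) {
        if (t == null) return true;
        int b = balance(t);
        return -1 <= b && b <= 1
            && isAvlBalanced(t.left)
            && isAvlBalanced(t.right);
    }
}
```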
