
CSE 5350 - Fall 2007 Slide 1

Data Structures: Specification and Implementation

Textbook readings: Cormen, Part III, Chapters 10-14

Mihaela Iridon, Ph.D.


CSE 5350/7350 Introduction to Algorithms

Data Structures

Objectives

• Understand what dynamic sets are
• Learn basic techniques for
  a) representing and
  b) manipulating finite dynamic sets

• Elementary Data Structures
  – Stacks, queues, heaps, linked lists

• More Complex Data Structures
  – Hash tables, binary search trees

• Data Structures in C# (.NET 2.0)

High-Level Structure (1)

• Arrays
  – System.Collections.ArrayList
  – System.Collections.Generic.List<T>

• Queue
  – System.Collections.Generic.Queue<T>

• Stack
  – System.Collections.Generic.Stack<T>

High-Level Structure (2)

• Hashtable
  – System.Collections.Hashtable
  – System.Collections.Generic.Dictionary<TKey, TValue>

• Trees
  – Binary Trees, BST, Self-Balancing BST

• Linked Lists
  – System.Collections.Generic.LinkedList<T>

• Graphs

Dynamic Data Sets

• Definition
• Why dynamic
• General examples
• Data structures and the .NET Framework
• “An Extensive Examination of Data Structures Using C# 2.0” – Scott Mitchell
  – http://msdn2.microsoft.com/en-us/library/ms364091(VS.80).aspx

Data Structure Design

• Impact on efficiency/running time
  – The data structure used by an algorithm can greatly affect the algorithm's performance
• It is important to have a rigorous method for comparing the efficiency of various data structures

Example: file extension search

• Search is O(n)

using System;
using System.IO;

public class FileSearch // hypothetical enclosing class; the slide shows only the method
{
    // Path.GetExtension returns the extension with its leading dot (e.g. ".txt"),
    // so pass the extension in that form. The comparison ignores case.
    public bool DoesExtensionExist(string[] fileNames, string extension)
    {
        for (int i = 0; i < fileNames.Length; i++)
            if (String.Compare(Path.GetExtension(fileNames[i]), extension, true) == 0)
                return true;

        return false; // If we reach here, we didn't find the extension
    }
}
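The search above costs O(n) per query, and that cost is paid again for every extension you check. The sketch below assumes .NET 3.5 or later for HashSet<T> and C# 9 or later for top-level statements. It builds the set of extensions once, after which each membership test is O(1) on average. The file names are made up. StringComparer.OrdinalIgnoreCase is an ordinal comparison, so it is close to, but not identical to, the culture-sensitive String.Compare(…, true).

```csharp
using System;
using System.Collections.Generic;
using System.IO;

string[] fileNames = { "report.DOC", "notes.txt", "photo.jpg" };

// One O(n) pass: collect every extension, ignoring case.
var extensions = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
foreach (string name in fileNames)
    extensions.Add(Path.GetExtension(name)); // includes the dot, e.g. ".DOC"

// Each query is now an average O(1) hash lookup instead of an O(n) scan.
bool hasDoc = extensions.Contains(".doc");
bool hasPdf = extensions.Contains(".pdf");
Console.WriteLine($"{hasDoc} {hasPdf}"); // True False
```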

The Array

• Linear
• Simple
• Direct Access
• Homogeneous
• Most widely used

The Array (2)

• The contents of an array are stored in contiguous memory.
• All of the elements of an array must be of the same type or of a derived type; hence arrays are referred to as homogeneous data structures.
• Array elements can be accessed directly: if you know you want the i-th element, one line of code retrieves it: arrayName[i].

Array Operations

• Allocation
• Accessing
  – Declaring an array in C#: string[] myArray;
    (initially the myArray reference is null)
  – Creating an array in C#: myArray = new string[5];

Array Allocation

• string[] myArray = new string[someIntegerSize];
  This allocates a contiguous block of memory on the (CLR-managed) heap.

Array Accessing

• Accessing an element at index i: O(1)
• Searching through an array
  – Unsorted: O(n)
  – Sorted: O(log n)
• Array class static method:
  – Array.BinarySearch(Array array, object value)
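The two search costs can be compared side by side. This is a sketch written as a C# top-level program (C# 9 or later). `LinearSearch` is a hypothetical helper. The call uses the generic overload Array.BinarySearch<T>(T[], T) rather than the non-generic one listed above. A negative result means "not found", and its bitwise complement is the index where the value would be inserted.

```csharp
using System;

int[] data = { 42, 7, 19, 3, 88, 56 };

// Unsorted: nothing to exploit, so scan every element, O(n). Hypothetical helper.
static int LinearSearch(int[] a, int value)
{
    for (int i = 0; i < a.Length; i++)
        if (a[i] == value) return i;
    return -1;
}

int linearIndex = LinearSearch(data, 88);            // 4

// Sorted: binary search halves the remaining range each step, O(log n).
int[] sorted = (int[])data.Clone();
Array.Sort(sorted);                                   // 3, 7, 19, 42, 56, 88
int binaryIndex = Array.BinarySearch(sorted, 88);     // 5
int missing = Array.BinarySearch(sorted, 50);         // ~4 == -5: 50 belongs before 56
Console.WriteLine($"{linearIndex} {binaryIndex} {missing}"); // 4 5 -5
```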

Array Resizing

• When the size needs to change:
  – Must create a new array instance
  – Copy the old array into the new array: Array1.CopyTo(Array2, 0)
• Time-consuming: every resize copies all n elements, O(n)
• Also, inserting into an array is problematic: every element after the insertion point must be shifted
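The sketch below shows both costs in a C# top-level program (C# 9 or later). Array.Resize<T>, available since .NET 2.0, performs the same allocate-and-copy as CopyTo in a single call. `InsertAt` is a hypothetical helper that shows why inserting into the middle of an array is O(n).

```csharp
using System;

string[] names = { "Ann", "Bob", "Cid" };

// Growing: allocate a larger array and copy the old contents, O(n).
string[] bigger = new string[names.Length * 2];
names.CopyTo(bigger, 0);              // same call as Array1.CopyTo(Array2, 0)

// Array.Resize<T> allocates the new array and copies in one call.
Array.Resize(ref names, 6);

// Hypothetical helper: shift a[index .. count-1] one slot right, then write the value.
static void InsertAt(string[] a, int count, int index, string value)
{
    // Array.Copy handles overlapping source and destination ranges correctly.
    Array.Copy(a, index, a, index + 1, count - index);
    a[index] = value;
}

InsertAt(names, 3, 1, "Zed");         // names: Ann, Zed, Bob, Cid, null, null
Console.WriteLine(string.Join(",", names, 0, 4)); // Ann,Zed,Bob,Cid
```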

Multi-Dimensional Arrays

• Rectangular
  – n x n
  – n x n x n x …
  – Accessing: O(1)
  – Searching: O(n^k)
• Jagged/Ragged
  – n_1 x n_2 x n_3 x …
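C# uses different syntax for the two layouts: int[,] for rectangular arrays and int[][] for jagged arrays. The sketch is a top-level program (C# 9 or later). `Contains` is a hypothetical helper that performs the full scan that makes searching O(n^k). With k = 2, as here, that is O(n^2).

```csharp
using System;

// Rectangular: one block of rows x cols; every row has the same length.
int[,] grid = new int[3, 4];
grid[2, 3] = 7;                        // O(1) access by (row, col)

// Jagged: an array of arrays; rows may differ in length (n_1, n_2, n_3).
int[][] jagged = new int[3][];
jagged[0] = new int[1];
jagged[1] = new int[4];
jagged[2] = new int[2];
jagged[1][3] = 9;

// Hypothetical helper: searching visits every cell.
static bool Contains(int[,] a, int value)
{
    for (int r = 0; r < a.GetLength(0); r++)
        for (int c = 0; c < a.GetLength(1); c++)
            if (a[r, c] == value) return true;
    return false;
}

// grid.Length counts all cells of a rectangular array (3 * 4).
Console.WriteLine($"{grid.Length} {jagged[1].Length} {Contains(grid, 7)}"); // 12 4 True
```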

Goals

• Type-safe
• Performant
• Reusable

• Example: payroll application

System.Collections.ArrayList

• Can hold any data type (hybrid)
• Internally: an array of objects
• Automatic resizing
• Not type-safe: casting errors are detected only at run time
• Boxing/unboxing: the extra level of indirection hurts performance
• Loose homogeneity
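The type-safety problem can be reproduced in a few lines. This is a sketch written as a C# top-level program (C# 9 or later) with made-up values. The bad cast compiles but throws InvalidCastException only when it runs. List<int> rejects the same mistake at compile time.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

ArrayList mixed = new ArrayList();
mixed.Add(42);           // the int is boxed into an object
mixed.Add("forty-two");  // nothing prevents a different type from going in

int sum = 0;
bool castFailed = false;
try
{
    foreach (object o in mixed)
        sum += (int)o;   // unboxing cast: compiles, but fails at run time on the string
}
catch (InvalidCastException)
{
    castFailed = true;
}

// List<int> stores ints unboxed and rejects a string at compile time.
List<int> typed = new List<int> { 42 };
// typed.Add("forty-two");  // would not compile
Console.WriteLine($"{sum} {castFailed} {typed[0]}"); // 42 True 42
```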

Generics

• Remedy for typing and performance problems
• Type-safe collections
• Reusability

• Example:

public class MyTypeSafeList<T>
{
    T[] innerArray = new T[0];
}
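MyTypeSafeList<T> applies generics to a class, and the same idea works for methods. In this sketch, written as a C# top-level program (C# 9 or later), `Largest<T>` is a hypothetical helper. It is written once and reused for int and string, and the compiler checks each call against the IComparable<T> constraint.

```csharp
using System;

// Hypothetical helper: written once, reused for any T that can compare itself to another T.
static T Largest<T>(T[] items) where T : IComparable<T>
{
    if (items.Length == 0)
        throw new ArgumentException("items must not be empty");
    T best = items[0];
    foreach (T item in items)
        if (item.CompareTo(best) > 0) best = item;
    return best;
}

int maxInt = Largest(new[] { 3, 11, 7 });                     // T inferred as int, no boxing
string maxName = Largest(new[] { "Sam", "Scott", "Jisun" });  // T inferred as string
Console.WriteLine($"{maxInt} {maxName}"); // 11 Scott
```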

List

• Homogeneous
• Self-re-dimensioning array
• System.Collections.Generic.List<T>

List<string> studentNames = new List<string>();
studentNames.Add("John");
…
string name = studentNames[3];
studentNames[2] = "Mike";

List Methods

• Contains()
• IndexOf()
• BinarySearch()
• Find()
• FindAll()
• Sort()
  – Asymptotic running time: same as an array, but with extra overhead
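This sketch calls each of the List<T> methods listed above. It is a C# top-level program (C# 9 or later). The lambdas need C# 3 or later; .NET 2.0 code would use anonymous delegates instead. The names are made up. BinarySearch is only meaningful once the list has been sorted.

```csharp
using System;
using System.Collections.Generic;

List<string> students = new List<string> { "Mike", "Ann", "Zoe", "John", "Bea" };

bool hasJohn = students.Contains("John");                      // linear scan, O(n)
int zoeAt = students.IndexOf("Zoe");                           // 2, O(n)
string firstB = students.Find(s => s.StartsWith("B"));         // "Bea"
List<string> shortNames = students.FindAll(s => s.Length == 3); // Ann, Zoe, Bea

students.Sort();                                               // Ann, Bea, John, Mike, Zoe
int johnAt = students.BinarySearch("John");                    // 2, valid only after Sort()
Console.WriteLine($"{hasJohn} {zoeAt} {firstB} {shortNames.Count} {johnAt}"); // True 2 Bea 3 2
```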

Ordered Requests Processing

• First come, first served (FIFO)
• Priority-based processing
• Inefficient to use List<T>
  – The List will continue to grow (internally, its size is doubled every time it fills up)
• Solution: circular list/array
• Problem: what initial size to choose?
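The sketch below shows one way to build the circular array, as a C# top-level program (C# 9 or later). The fixed capacity of 4 and the helpers `TryEnqueue` and `Dequeue` are assumptions for illustration. When the buffer is full, the sketch refuses new items, which is exactly where the initial-size problem shows up. Queue<T>, by contrast, grows its internal circular buffer.

```csharp
using System;

// Fixed-capacity circular buffer: positions wrap around with %.
string[] buffer = new string[4];
int head = 0, count = 0;

bool TryEnqueue(string item)
{
    if (count == buffer.Length) return false;       // full: caller must wait or grow
    buffer[(head + count) % buffer.Length] = item;  // tail position wraps to the front
    count++;
    return true;
}

string Dequeue()
{
    if (count == 0) throw new InvalidOperationException("queue is empty");
    string item = buffer[head];
    buffer[head] = null;                            // release the reference
    head = (head + 1) % buffer.Length;
    count--;
    return item;
}

TryEnqueue("r1"); TryEnqueue("r2"); TryEnqueue("r3");
string first = Dequeue();                  // "r1": first come, first served
TryEnqueue("r4"); TryEnqueue("r5");        // r5 reuses the slot that r1 freed
bool accepted = TryEnqueue("r6");          // false: all 4 slots are in use
Console.WriteLine($"{first} {Dequeue()} {accepted}"); // r1 r2 False
```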

Queue

• System.Collections.Generic.Queue<T>
• Operations:
  – Enqueue()
  – Dequeue()
  – Contains()
  – ToArray()
  – Peek()
• Does not allow random access
• Type-safe; maximizes space utilization

Queue (continued)

• Applications:
  – Web servers
  – Print queues
• Rate of growth:
  – The initial capacity can be specified in the constructor (the non-generic System.Collections.Queue also accepts a growth factor)
  – Default: double the current size
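This sketch models the print-queue application with System.Collections.Generic.Queue<T>. It is a C# top-level program (C# 9 or later). The job names and the initial capacity of 16 are arbitrary.

```csharp
using System;
using System.Collections.Generic;

// A print queue: jobs are served in arrival order.
Queue<string> printJobs = new Queue<string>(16);   // optional initial capacity
printJobs.Enqueue("report.pdf");
printJobs.Enqueue("invoice.docx");
printJobs.Enqueue("photo.png");

string next = printJobs.Peek();          // look at the head without removing it
string printed = printJobs.Dequeue();    // remove the head
bool waiting = printJobs.Contains("photo.png");
string[] snapshot = printJobs.ToArray(); // head first: invoice.docx, photo.png
Console.WriteLine($"{next} {printed} {waiting} {snapshot.Length}"); // report.pdf report.pdf True 2
```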

Stack

• LIFO
• System.Collections.Generic.Stack<T>
• Operations:
  – Push()
  – Pop()
• Doubles in size when more space is needed
• Applications:
  – CLR call stack (function invocation)
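This sketch mimics call-stack behaviour with Stack<T>. It is a C# top-level program (C# 9 or later). The frame names are hypothetical. Peek() reads the top item without removing it.

```csharp
using System;
using System.Collections.Generic;

// Each "call" pushes a frame; each "return" pops the most recent one.
Stack<string> frames = new Stack<string>();
frames.Push("Main");
frames.Push("LoadPayroll");   // hypothetical method names
frames.Push("ParseRecord");

string returned = frames.Pop();   // "ParseRecord": last in, first out
string caller = frames.Peek();    // "LoadPayroll" is now on top
Console.WriteLine($"{returned} {caller} {frames.Count}"); // ParseRecord LoadPayroll 2
```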

Limitations of Ordinal Indexing

• Ideal access time: O(1)
• If the index is unknown:
  – O(n) if not sorted
  – O(log n) if sorted
• Example: SSN: 10^9 possible combinations
• Solution: compress the ordinal indexing domain with a hash function; e.g. use only 4 digits
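The sketch below illustrates the compression, assuming that the 4 digits kept are the last four and that SSNs are written as ###-##-####. `LastFourHash` and the second SSN are hypothetical. Both SSNs land in the same slot, which is the collision problem the next slides address. It is written as a C# top-level program (C# 9 or later).

```csharp
using System;

// Hypothetical hash: compress 10^9 possible SSNs into 10^4 slots by keeping the last 4 digits.
static int LastFourHash(string ssn)
{
    string digits = ssn.Replace("-", "");       // "111-22-3333" -> "111223333"
    return int.Parse(digits.Substring(digits.Length - 4));
}

string[] table = new string[10000];             // indices 0 .. 9999
table[LastFourHash("111-22-3333")] = "Scott";

int a = LastFourHash("111-22-3333");            // 3333
int b = LastFourHash("999-88-3333");            // also 3333: a collision
Console.WriteLine($"{a} {b} {table[a]}");       // 3333 3333 Scott
```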

Hash Table

• Hashing:
  – Mathematical transformation of one representation into another representation
• Hash table:
  – An array that uses hashing to compress the index space
• Cryptography (information security)
• Hash function:
  – Non-injective (not a one-to-one function)
  – “Fingerprint” of the initial data

Goals

• Fast access to items in large amounts of data
• As few collisions as possible
  – Collision avoidance
• Avalanche effect:
  – Minor changes to the input → major changes to the output

Collision Resolution (1)

• Probability of mapping to a given location: 1/k (k = size = number of slots)
• (1) Linear probing: is H[i] empty?
  – YES: place the item at location i
  – NO: i = (i + 1) mod k; repeat
  – Deficiency: clustering
  – Access and insertion: no longer O(1)
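This sketch implements linear probing over a tiny table of k = 7 slots. It is a C# top-level program (C# 9 or later). The table size, the key % k hash and the helper names are assumptions for illustration. Deletion is omitted because it needs extra care: simply emptying a slot would break the probe sequences of keys stored after it. Keys 10, 17 and 24 all hash to slot 3, so they form the kind of cluster the slide warns about.

```csharp
using System;

// Open addressing over k = 7 slots; null marks an empty slot.
int?[] slots = new int?[7];

int Hash(int key) => key % slots.Length;   // assumes non-negative keys

bool Insert(int key)
{
    for (int step = 0; step < slots.Length; step++)
    {
        int i = (Hash(key) + step) % slots.Length;  // i = i + 1, wrapping around
        if (slots[i] == null) { slots[i] = key; return true; }
        if (slots[i] == key) return true;           // already present
    }
    return false;                                   // table is full
}

int Find(int key)
{
    for (int step = 0; step < slots.Length; step++)
    {
        int i = (Hash(key) + step) % slots.Length;
        if (slots[i] == null) return -1;            // an empty slot ends the probe sequence
        if (slots[i] == key) return i;
    }
    return -1;
}

Insert(10); Insert(17); Insert(24);   // all hash to 3, so they occupy slots 3, 4, 5
Console.WriteLine($"{Find(10)} {Find(17)} {Find(24)} {Find(31)}"); // 3 4 5 -1
```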

Collision Resolution (2)

• (2) Quadratic probing
  – Check s + 1^2
  – Check s - 1^2
  – Check s + 2^2
  – Check s - 2^2
  – …
  – Check s ± i^2
  – Clustering is a problem as well
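The probe order can be generated directly. This sketch is a C# top-level program (C# 9 or later). `QuadraticProbes` is a hypothetical helper, and the home slot s = 5 in a table of 11 slots is made up. Every key whose home slot is s follows the same sequence, which is why clustering remains a problem.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: probe order s, s+1^2, s-1^2, s+2^2, s-2^2, ... (mod tableSize).
static List<int> QuadraticProbes(int s, int tableSize, int maxI)
{
    var probes = new List<int> { s };
    for (int i = 1; i <= maxI; i++)
    {
        int offset = i * i;
        probes.Add((s + offset) % tableSize);
        // Adding tableSize before the final % keeps s - i^2 non-negative.
        probes.Add(((s - offset) % tableSize + tableSize) % tableSize);
    }
    return probes;
}

List<int> order = QuadraticProbes(5, 11, 2);
Console.WriteLine(string.Join(" ", order)); // 5 6 4 9 1
```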

Collision Resolution (3)

• (3) Rehashing – used by Hashtable (C#)
• System.Collections.Hashtable
• Operations:
  – Add(key, value)
  – ContainsKey()
  – Keys (property)
  – ContainsValue()
  – Values (property)
• Key, Value: any type (object), so Hashtable is not type-safe

Hashtable Data Type – Example

using System;
using System.Collections;

public class HashtableDemo
{
    private static Hashtable employees = new Hashtable();

    public static void Main()
    {
        // Add some values to the Hashtable, indexed by a string key
        employees.Add("111-22-3333", "Scott");
        employees.Add("222-33-4444", "Sam");
        employees.Add("333-44-5555", "Jisun");

        // Access a particular key
        if (employees.ContainsKey("111-22-3333"))
        {
            // Values come back as object, so a cast is required
            string empName = (string) employees["111-22-3333"];
            Console.WriteLine("Employee 111-22-3333's name is: " + empName);
        }
        else
            Console.WriteLine("Employee 111-22-3333 is not in the hash table...");
    }
}

Hashtable

• Key = any type
• The key is transformed into an index via the GetHashCode() function
• The Object class defines GetHashCode()
• H(key) = [GetHash(key) + 1 + (((GetHash(key) >> 5) + 1) % (hashsize - 1))] % hashsize
  Values = 0 .. hashsize-1

Collision Resolution (3 – cont’d)

• Rehashing = double hashing
• Set of hash functions: H_1, H_2, …, H_n
• H_k(key) = [GetHash(key) + k * (1 + (((GetHash(key) >> 5) + 1) % (hashsize - 1)))] % hashsize
• hashsize must be PRIME: the step (1 + ((GetHash(key) >> 5) + 1) % (hashsize - 1)) lies in 1 .. hashsize-1, so with a prime hashsize it shares no factor with hashsize and the probe sequence visits every slot
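The H_k formula and the reason for a prime hashsize can both be checked in code. This sketch is a C# top-level program (C# 9 or later). `Probe` transcribes H_k from the slide; it is not the framework's internal source. Masking with 0x7FFFFFFF to get a non-negative hash is an assumption of the sketch. String hash codes can differ between runs on .NET Core, but the number of distinct slots visited does not.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: H_k(key) from the slide, for a non-negative hash value.
static int Probe(int hash, int k, int hashsize)
{
    // ((hash >> 5) + 1) % (hashsize - 1) is in 0 .. hashsize-2, so step is in 1 .. hashsize-1.
    long step = 1 + (((hash >> 5) + 1) % (hashsize - 1));
    return (int)((hash + k * step) % hashsize);
}

// Clearing the sign bit keeps the hash non-negative (an assumption of this sketch).
int keyHash = "111-22-3333".GetHashCode() & 0x7FFFFFFF;
int tableSize = 11; // prime

// Collect the slots probed for k = 0 .. 10.
var visited = new HashSet<int>();
for (int attempt = 0; attempt < tableSize; attempt++)
    visited.Add(Probe(keyHash, attempt, tableSize));

Console.WriteLine(visited.Count); // 11: every slot is reachable
```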

Hashtable

• Load factor = maximum ratio (# items / # slots) allowed before the table expands
• Optimal: 0.72
• Expanding the hashtable takes 2 (costly) steps:
  – Double the # of slots (from the current prime to the next prime that is roughly twice as large)
  – Rehash every item
• High load factor → dense hashtable
  – Less space
  – More probes on collision (1/(1 - LF))
  – If LF = 0.72, the expected # of probes = 1/(1 - 0.72) ≈ 3.57, i.e. O(1)
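The load-factor arithmetic and the growth rule can be sketched as follows, as a C# top-level program (C# 9 or later). `ExpectedProbes`, `IsPrime` and `NextSize` are hypothetical helpers. The real Hashtable chooses its sizes with its own internal logic, not with the simple trial division used here.

```csharp
using System;
using System.Globalization;

// Expected probes under uniform hashing: 1 / (1 - LF).
static double ExpectedProbes(double loadFactor) => 1.0 / (1.0 - loadFactor);

// Simple trial-division primality test; adequate for small sizes in a sketch.
static bool IsPrime(int n)
{
    if (n < 2) return false;
    for (int d = 2; (long)d * d <= n; d++)
        if (n % d == 0) return false;
    return true;
}

// Growth step: the smallest prime that is at least twice the current size.
static int NextSize(int current)
{
    int candidate = 2 * current;
    while (!IsPrime(candidate)) candidate++;
    return candidate;
}

Console.WriteLine(ExpectedProbes(0.72).ToString("F2", CultureInfo.InvariantCulture)); // 3.57
Console.WriteLine(NextSize(11));                                                     // 23
```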

Hashtable

• Costly to expand
• Set the size in the constructor if the size is known in advance
• Asymptotic running times (average case):
  – Access: O(1)
  – Add, Remove: O(1)
  – Search (by key): O(1)

System.Collections.Generic.Dictionary

• Type-safe
• Strongly typed KEYS + VALUES
• Operations:
  – Add(key, value)
  – ContainsKey(key)
• Collision resolution: CHAINING
  – Uses linked lists from each entry where a collision occurs

Chaining in Dictionary Data Type
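This sketch implements separate chaining as a C# top-level program (C# 9 or later). It uses LinkedList<T> chains to match the slide's description; the real Dictionary<TKey, TValue> stores its chains differently internally. The 5 buckets and the helper names are assumptions. Six keys in five buckets guarantee that at least one chain holds two entries.

```csharp
using System;
using System.Collections.Generic;

// Each bucket holds a linked list of the entries that hash to it.
var buckets = new LinkedList<KeyValuePair<string, int>>[5];

int BucketOf(string key) => (key.GetHashCode() & 0x7FFFFFFF) % buckets.Length;

void Add(string key, int value)
{
    int b = BucketOf(key);
    if (buckets[b] == null) buckets[b] = new LinkedList<KeyValuePair<string, int>>();
    foreach (var entry in buckets[b])
        if (entry.Key == key) throw new ArgumentException("duplicate key: " + key);
    buckets[b].AddLast(new KeyValuePair<string, int>(key, value)); // chain grows on collision
}

bool TryGet(string key, out int value)
{
    var chain = buckets[BucketOf(key)];
    if (chain != null)
        foreach (var entry in chain)          // walk only this bucket's chain
            if (entry.Key == key) { value = entry.Value; return true; }
    value = 0;
    return false;
}

Add("Scott", 1); Add("Sam", 2); Add("Jisun", 3); Add("Mike", 4); Add("Ann", 5); Add("Bea", 6);
bool found = TryGet("Jisun", out int jisun);
bool absent = TryGet("Zoe", out _);
Console.WriteLine($"{found} {jisun} {absent}"); // True 3 False
```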

Dictionary Example

Dictionary<keyType, valueType> variableName = new Dictionary<keyType, valueType>();

Dictionary<int, Employee> employeeData = new Dictionary<int, Employee>();

// Add some employees
employeeData.Add(455110189, new Employee("Scott Mitchell"));
employeeData.Add(455110191, new Employee("Jisun Lee"));
...
// See if employee with SSN 123-45-6789 works here
if (employeeData.ContainsKey(123456789)) ...
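The snippet above depends on an Employee class that the slides do not show. This runnable sketch, a C# top-level program (C# 9 or later), uses string values in its place, and the third SSN is made up. TryGetValue, available since .NET 2.0, performs the ContainsKey check and the lookup in a single call.

```csharp
using System;
using System.Collections.Generic;

// Keys are SSNs as ints, values are names; both types are checked at compile time.
Dictionary<int, string> employeeData = new Dictionary<int, string>();
employeeData.Add(455110189, "Scott Mitchell");
employeeData.Add(455110191, "Jisun Lee");
employeeData[455110192] = "Sam";     // the indexer adds or overwrites

bool works = employeeData.ContainsKey(123456789);                    // False
bool gotName = employeeData.TryGetValue(455110191, out string name); // one lookup instead of two
Console.WriteLine($"{works} {gotName} {name}"); // False True Jisun Lee
```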

Chaining in the Dictionary Type

• Efficiency:
  – Add: O(1)
  – Remove: O(n/m)
  – Search: O(n/m)
  where n = number of elements stored and m = number of buckets/slots
• Implemented so that n ≤ m at ALL times
  – The total # of chained elements can never exceed the number of buckets, so Remove and Search are O(1) on average

Trees

• = a set of linked nodes in which no cycle exists
• (Graph theory) a connected acyclic graph
• Nodes:
  – Root
  – Leaf
  – Internal
• |E| = ? (for a tree with |V| nodes, |E| = |V| - 1)
• Forest = { trees }

Popular Tree-Type Data Structures

• BST: Binary Search Tree
• Heap
• Self-balancing binary search trees
  – AVL
  – Red-black
• Radix tree
• …

Binary Trees

• Code example for defining a tree data object
• Tree traversal (Ro = root, L = left subtree, R = right subtree)
  – In-order: L Ro R
  – Pre-order: Ro L R
  – Post-order: L R Ro
  – Θ(n)
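The slide's own tree code example was not captured in this transcript. As a stand-in, this sketch (a C# top-level program, C# 9 or later) stores a 5-node tree in parallel arrays, a representation chosen only to keep the example short; a class-based Node type is more common. Each traversal visits every node exactly once, hence Θ(n).

```csharp
using System;
using System.Collections.Generic;

// A small binary tree in parallel arrays; -1 means "no child".
//        4
//      /   \
//     2     6
//    / \
//   1   3
int[] value = { 4, 2, 6, 1, 3 };
int[] left  = { 1, 3, -1, -1, -1 };
int[] right = { 2, 4, -1, -1, -1 };

void InOrder(int n, List<int> output)    // L, Ro, R
{
    if (n < 0) return;
    InOrder(left[n], output); output.Add(value[n]); InOrder(right[n], output);
}

void PreOrder(int n, List<int> output)   // Ro, L, R
{
    if (n < 0) return;
    output.Add(value[n]); PreOrder(left[n], output); PreOrder(right[n], output);
}

void PostOrder(int n, List<int> output)  // L, R, Ro
{
    if (n < 0) return;
    PostOrder(left[n], output); PostOrder(right[n], output); output.Add(value[n]);
}

var inOrder = new List<int>(); InOrder(0, inOrder);
var preOrder = new List<int>(); PreOrder(0, preOrder);
var postOrder = new List<int>(); PostOrder(0, postOrder);
Console.WriteLine(string.Join(",", inOrder));   // 1,2,3,4,6
Console.WriteLine(string.Join(",", preOrder));  // 4,2,1,3,6
Console.WriteLine(string.Join(",", postOrder)); // 1,3,2,6,4
```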

Binary Tree Data Structure

Tree Operations

• Search (recursive): O(h)
  – h = height of the tree
• Max & min search: follow right/left children
• Successor & predecessor search
• Insertion (easy: always adds a new leaf) & deletion (more complicated, as it may change the tree structure)
• Running time:
  – A function of the tree topology

Binary Search Tree

• Improves the search (and lookup) time over a general binary tree
• BST property:
  – For any node n, every descendant node's value in the left subtree of n is less than the value of n, and every descendant node's value in the right subtree is greater than the value of n
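This sketch implements BST insertion, search, and min/max while maintaining the property above. It is a C# top-level program (C# 9 or later). The node pool held in three List<int> instances and the helper names are assumptions for illustration. Deletion and successor/predecessor search are omitted.

```csharp
using System;
using System.Collections.Generic;

// Node pool: node i has keys[i], left[i], right[i]; -1 means "no child".
var keys = new List<int>();
var left = new List<int>();
var right = new List<int>();
int root = -1;

int NewNode(int key) { keys.Add(key); left.Add(-1); right.Add(-1); return keys.Count - 1; }

// Insertion always adds a new leaf, which preserves the BST property.
void Insert(int key)
{
    if (root < 0) { root = NewNode(key); return; }
    int n = root;
    while (true)
    {
        if (key == keys[n]) return;                     // ignore duplicates
        List<int> side = key < keys[n] ? left : right;  // smaller keys go left
        if (side[n] < 0) { side[n] = NewNode(key); return; }
        n = side[n];
    }
}

// Search follows one root-to-leaf path: O(h) comparisons.
bool Contains(int key)
{
    int n = root;
    while (n >= 0)
    {
        if (key == keys[n]) return true;
        n = key < keys[n] ? left[n] : right[n];
    }
    return false;
}

// Min / max: keep following left / right children (assumes a non-empty tree).
int Min() { int n = root; while (left[n] >= 0) n = left[n]; return keys[n]; }
int Max() { int n = root; while (right[n] >= 0) n = right[n]; return keys[n]; }

foreach (int k in new[] { 50, 30, 70, 20, 40, 60, 80 }) Insert(k);
Console.WriteLine($"{Contains(40)} {Contains(45)} {Min()} {Max()}"); // True False 20 80
```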

Non-BST vs BST

(a) Non-BST (b) BST

Linear Search Time in BST

The search time for a BST depends upon its topology.

BST continued

• Perfectly balanced BST:
  – Search: O(log n) [height = log n]
• Sub-linear search running time
• Balanced binary tree:
  – Exhibits a good breadth-to-height ratio
• Self-balancing trees
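The effect of topology can be measured by inserting the same 15 keys in two different orders. This sketch is a C# top-level program (C# 9 or later). `HeightAfterInserting` is a hypothetical helper that stores child links by key and assumes the keys are distinct. Sorted input produces a chain, so search becomes linear, while the middle-first order produces a perfectly balanced tree.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: insert keys into an empty BST and return its height
// (edges on the longest root-to-leaf path). Children are stored by key.
static int HeightAfterInserting(IEnumerable<int> keys)
{
    var leftOf = new Dictionary<int, int>();
    var rightOf = new Dictionary<int, int>();
    bool empty = true;
    int root = 0, height = 0;
    foreach (int key in keys)
    {
        if (empty) { root = key; empty = false; continue; }
        int node = root, depth = 0;
        while (true)
        {
            var side = key < node ? leftOf : rightOf;
            depth++;
            if (!side.TryGetValue(node, out int child)) { side[node] = key; break; }
            node = child;
        }
        height = Math.Max(height, depth);
    }
    return height;
}

var sorted = new List<int>();
for (int i = 1; i <= 15; i++) sorted.Add(i);           // 1, 2, ..., 15
int[] balancedOrder = { 8, 4, 12, 2, 6, 10, 14, 1, 3, 5, 7, 9, 11, 13, 15 };

Console.WriteLine(HeightAfterInserting(sorted));        // 14: a chain
Console.WriteLine(HeightAfterInserting(balancedOrder)); // 3: floor(log2 15)
```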

The Heap

• Specialized tree-based data structure that satisfies the heap property: if B is a child node of A, then key(A) ≥ key(B) [max-heap]
• Operations:
  – delete-max or delete-min: removing the root node of a max- or min-heap, respectively
  – increase-key or decrease-key: updating a key within a max- or min-heap, respectively
  – insert: adding a new key to the heap
  – merge: joining two heaps to form a valid new heap containing all the elements of both
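A common way to implement a max-heap is an array in which the children of index i sit at 2i + 1 and 2i + 2. This sketch is a C# top-level program (C# 9 or later). It implements only insert and delete-max; increase-key and merge are omitted, and the helper names are assumptions.

```csharp
using System;
using System.Collections.Generic;

// Array-backed binary max-heap stored in a List<int>.
var heap = new List<int>();

void Swap(int i, int j) { int tmp = heap[i]; heap[i] = heap[j]; heap[j] = tmp; }

void Insert(int key)
{
    heap.Add(key);                                  // the new key starts as the last leaf
    int i = heap.Count - 1;
    while (i > 0 && heap[(i - 1) / 2] < heap[i])    // sift up while the parent is smaller
    {
        Swap(i, (i - 1) / 2);
        i = (i - 1) / 2;
    }
}

int DeleteMax()
{
    int max = heap[0];                              // the root holds the largest key
    heap[0] = heap[heap.Count - 1];                 // move the last leaf to the root
    heap.RemoveAt(heap.Count - 1);
    int i = 0;
    while (true)                                    // sift down toward the larger child
    {
        int l = 2 * i + 1, r = 2 * i + 2, largest = i;
        if (l < heap.Count && heap[l] > heap[largest]) largest = l;
        if (r < heap.Count && heap[r] > heap[largest]) largest = r;
        if (largest == i) break;
        Swap(i, largest);
        i = largest;
    }
    return max;
}

foreach (int k in new[] { 3, 9, 2, 7, 5 }) Insert(k);
var drained = new List<int>();
while (heap.Count > 0) drained.Add(DeleteMax());
Console.WriteLine(string.Join(",", drained)); // 9,7,5,3,2
```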

Max Heap Example

• Example of max-heap:

Linked Lists

• No resizing necessary
• Search: O(n)
• Insertion:
  – O(1) if unsorted
  – O(n) if sorted
• Access: O(n)
• System.Collections.Generic.LinkedList<T>
  – Doubly linked; type-safe (generics)
  – Element: LinkedListNode<T>
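This sketch exercises System.Collections.Generic.LinkedList<T> as a C# top-level program (C# 9 or later). The task names are made up. Find is a linear walk, but once a LinkedListNode<T> is in hand, AddAfter runs in O(1).

```csharp
using System;
using System.Collections.Generic;

LinkedList<string> tasks = new LinkedList<string>();
tasks.AddLast("write");                          // O(1) at either end, no resizing
tasks.AddLast("test");
tasks.AddFirst("plan");

// Finding a node is a linear walk, O(n) ...
LinkedListNode<string> node = tasks.Find("write");
// ... but once you hold the node, inserting next to it is O(1).
tasks.AddAfter(node, "review");

tasks.Remove("test");                            // searches, then unlinks: O(n)
Console.WriteLine(string.Join(" -> ", tasks));   // plan -> write -> review
Console.WriteLine(node.Previous.Value);          // plan: nodes are doubly linked
```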

Skip List

• Linked list with a self-balancing, BST-like property
• The elements are sorted
• Height ≈ log n
• Problems with insert & delete
• Solution: randomized distribution
• Overall: O(log n) expected
• Worst case: O(n), but there is a very, very slim chance of reaching the worst case
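This compact skip list sketch supports insert and search only and is written as a C# top-level program (C# 9 or later). Several choices are assumptions following the usual coin-flip formulation: the node pool of parallel lists, MaxLevel = 8, and promotion to the next level with probability 1/2. The fixed Random seed only makes runs repeatable; correctness does not depend on it.

```csharp
using System;
using System.Collections.Generic;

const int MaxLevel = 8;
var rng = new Random(42);

// Node pool: node i has keys[i] and next[i][level]; node 0 is the head sentinel.
var keys = new List<int> { int.MinValue };
var next = new List<int[]> { NewLinks() };
int level = 1;                                   // number of levels currently in use

int[] NewLinks()
{
    var links = new int[MaxLevel];
    for (int i = 0; i < links.Length; i++) links[i] = -1;   // -1 means "end of this level"
    return links;
}

// A new node gets level 1, then climbs one more level with probability 1/2.
int RandomLevel()
{
    int lvl = 1;
    while (lvl < MaxLevel && rng.Next(2) == 0) lvl++;
    return lvl;
}

void Insert(int key)
{
    var update = new int[MaxLevel];              // last node visited on each level
    int x = 0;
    for (int i = level - 1; i >= 0; i--)
    {
        while (next[x][i] != -1 && keys[next[x][i]] < key) x = next[x][i];
        update[i] = x;
    }
    int lvl = RandomLevel();
    if (lvl > level)
    {
        for (int i = level; i < lvl; i++) update[i] = 0;   // new levels start at the head
        level = lvl;
    }
    keys.Add(key);
    next.Add(NewLinks());
    int node = keys.Count - 1;
    for (int i = 0; i < lvl; i++)                // splice the node into each of its levels
    {
        next[node][i] = next[update[i]][i];
        next[update[i]][i] = node;
    }
}

// Search starts on the top level and drops down when the next key is too large.
bool Contains(int key)
{
    int x = 0;
    for (int i = level - 1; i >= 0; i--)
        while (next[x][i] != -1 && keys[next[x][i]] < key) x = next[x][i];
    x = next[x][0];
    return x != -1 && keys[x] == key;
}

foreach (int k in new[] { 30, 10, 50, 20, 40 }) Insert(k);

// Level 0 is an ordinary sorted linked list.
var inOrder = new List<int>();
for (int n = next[0][0]; n != -1; n = next[n][0]) inOrder.Add(keys[n]);
Console.WriteLine(string.Join(",", inOrder));        // 10,20,30,40,50
Console.WriteLine($"{Contains(40)} {Contains(35)}"); // True False
```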

Skip List Examples

Graphs

• A collection of interconnected nodes
• A graph or undirected graph G is an ordered pair G := (V, E) that is subject to the following conditions:
  – V is a set, whose elements are called vertices or nodes,
  – E is a set of (unordered) pairs of distinct vertices, called edges or lines.
• Edges (1):
  – Directed or undirected
  – Weighted or unweighted

Graph (cont’d)

• Sparse: |E| << |E_max|, where |E_max| ≤ n^2
• Representation:
  – Adjacency list
  – Adjacency matrix
  – (Packed edge list)
• Problems applicable to graphs:
  – Minimum spanning tree (Kruskal, Prim)
  – Shortest path (Dijkstra)
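This sketch builds both representations for the same small graph as a C# top-level program (C# 9 or later). The 4-vertex graph is made up. The matrix answers "is there an edge u-v?" in O(1) but always uses n^2 cells. The list uses space proportional to |V| + |E|, which suits sparse graphs.

```csharp
using System;
using System.Collections.Generic;

// Undirected, unweighted graph with n = 4 vertices and edges 0-1, 0-2, 1-2, 2-3.
int n = 4;
int[][] edges = { new[] { 0, 1 }, new[] { 0, 2 }, new[] { 1, 2 }, new[] { 2, 3 } };

// Adjacency matrix: n x n booleans.
bool[,] matrix = new bool[n, n];
// Adjacency list: one neighbour list per vertex.
var adjacency = new List<int>[n];
for (int v = 0; v < n; v++) adjacency[v] = new List<int>();

foreach (int[] e in edges)
{
    matrix[e[0], e[1]] = matrix[e[1], e[0]] = true;  // undirected: store both directions
    adjacency[e[0]].Add(e[1]);
    adjacency[e[1]].Add(e[0]);
}

Console.WriteLine(matrix[1, 3]);                     // False: no edge 1-3
Console.WriteLine(string.Join(",", adjacency[2]));   // 0,1,3
```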

Website Navigation as a Graph

Distance Graph Example
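The distance-graph figure is not reproduced in this transcript, so this sketch runs Dijkstra's algorithm on a made-up 4-vertex weighted graph. It is the simple O(V^2) array version, not a priority-queue implementation, written as a C# top-level program (C# 9 or later). It assumes positive weights and uses 0 to mean "no edge". `ShortestDistances` is a hypothetical helper.

```csharp
using System;

// Symmetric weight matrix for an undirected graph; 0 means "no edge".
int[,] weights =
{
    { 0, 4, 1, 0 },
    { 4, 0, 2, 5 },
    { 1, 2, 0, 8 },
    { 0, 5, 8, 0 },
};

static int[] ShortestDistances(int[,] w, int source)
{
    int n = w.GetLength(0);
    var dist = new int[n];
    var done = new bool[n];
    for (int v = 0; v < n; v++) dist[v] = int.MaxValue;
    dist[source] = 0;

    for (int round = 0; round < n; round++)
    {
        // Pick the closest vertex not yet finalized (O(V) scan, O(V^2) overall).
        int u = -1;
        for (int v = 0; v < n; v++)
            if (!done[v] && dist[v] != int.MaxValue && (u < 0 || dist[v] < dist[u])) u = v;
        if (u < 0) break;                        // remaining vertices are unreachable
        done[u] = true;

        // Relax every edge leaving u.
        for (int v = 0; v < n; v++)
            if (w[u, v] > 0 && !done[v] && dist[u] + w[u, v] < dist[v])
                dist[v] = dist[u] + w[u, v];
    }
    return dist;
}

int[] d = ShortestDistances(weights, 0);
Console.WriteLine(string.Join(",", d)); // 0,3,1,8
```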

Graph Representation

Minimum Spanning Tree

• Spanning tree of a connected, undirected graph = a subset of the edges that connects all the nodes without introducing a cycle
• Minimum spanning tree: a spanning tree whose total edge weight is as small as possible

Kruskal’s Algorithm
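The Kruskal figure is not reproduced in this transcript. This sketch runs the algorithm on a made-up 4-vertex graph as a C# top-level program (C# 9 or later). It detects cycles with a union-find (disjoint-set) structure, a standard choice for Kruskal's algorithm. For this graph the minimum total weight is 7.

```csharp
using System;
using System.Collections.Generic;

// Made-up undirected graph: 4 vertices, edges as (u, v, weight).
var edges = new List<(int u, int v, int w)>
{
    (0, 1, 1), (1, 2, 2), (0, 2, 3), (2, 3, 4), (1, 3, 5),
};
int vertexCount = 4;

// Union-find: parent[i] leads toward the representative of i's component.
int[] parent = new int[vertexCount];
for (int i = 0; i < vertexCount; i++) parent[i] = i;

int Find(int x)
{
    while (parent[x] != x)
    {
        parent[x] = parent[parent[x]];   // path halving keeps the trees shallow
        x = parent[x];
    }
    return x;
}

// Scan edges by increasing weight; keep an edge unless it would close a cycle.
edges.Sort((a, b) => a.w.CompareTo(b.w));
var tree = new List<(int u, int v, int w)>();
int total = 0;
foreach (var e in edges)
{
    int ru = Find(e.u), rv = Find(e.v);
    if (ru == rv) continue;              // already connected: the edge would form a cycle
    parent[ru] = rv;                     // merge the two components
    tree.Add(e);
    total += e.w;
}
Console.WriteLine($"{tree.Count} edges, total weight {total}"); // 3 edges, total weight 7
```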

Prim’s Algorithm
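The Prim figure is not reproduced in this transcript. This sketch runs the simple O(V^2) version of Prim's algorithm on a made-up 4-vertex graph stored as a weight matrix, where 0 means "no edge". It is written as a C# top-level program (C# 9 or later). For this graph the minimum total weight is 7.

```csharp
using System;

// Grow one tree from vertex 0, always adding the cheapest edge that leaves it.
int[,] weights =
{
    { 0, 1, 3, 0 },
    { 1, 0, 2, 5 },
    { 3, 2, 0, 4 },
    { 0, 5, 4, 0 },
};
int n = weights.GetLength(0);
var inTree = new bool[n];
var best = new int[n];                   // cheapest known edge from the tree to each vertex
for (int v = 0; v < n; v++) best[v] = int.MaxValue;
best[0] = 0;
int total = 0;

for (int round = 0; round < n; round++)
{
    int u = -1;
    for (int v = 0; v < n; v++)          // closest vertex outside the tree
        if (!inTree[v] && best[v] != int.MaxValue && (u < 0 || best[v] < best[u])) u = v;
    if (u < 0) break;                    // the graph is disconnected
    inTree[u] = true;
    total += best[u];

    for (int v = 0; v < n; v++)          // update the cheapest edges leaving the grown tree
        if (weights[u, v] > 0 && !inTree[v] && weights[u, v] < best[v])
            best[v] = weights[u, v];
}
Console.WriteLine($"total weight {total}"); // total weight 7
```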

