Scientific Programming Data structures Alberto Montresor Università di Trento 2018/11/21 This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Scientific Programming

Data structures

Alberto Montresor

Università di Trento

2018/11/21

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


Table of contents

1 Abstract data types (PSADS §1.8, §1.13)
    Sequences
    Sets
    Dictionaries
2 Elementary data structures implementations
    Linked List (PSADS §3.19–§3.21)
    Dynamic vectors
3 Sets, Dictionary implementations in Python
    Direct access tables
    Hash functions
    Hashtable implementation
4 Stack and queues
    Stack – Last-in, First-out (LIFO)
    Queue (PSADS §3.10–§3.12, §3.15–§3.16)
5 Some code examples

Abstract data types (PSADS §1.8, §1.13) Definitions

Introduction

Data

In a programming language, a piece of data is a value that can be assigned to a variable.

Abstract data type (ADT)

A mathematical model, given by a collection of values and a set of operators that can be performed on them.

Primitive abstract data types

Primitive abstract data types are provided directly by the language

Type      Operators            Notes
int       +, -, *, /, //, %    32-64 bits (2^64 − 1)
long      +, -, *, /, //, %    Unlimited integer precision
float     +, -, *, /, //, %    32-64 bits
boolean   and, or, not         True, False

Alberto Montresor (UniTN) SP - Data structure 2018/11/21 1 / 69


Data types

"Specification" and "implementation" of an abstract data type

Specification: its “manual”, hides the implementation details from the user
Implementation: the actual code that realizes the abstract data type

Example

Real numbers vs IEEE-754

Don’t try this at home (or try it?):

>>> 0.1 + 0.2

0.30000000000000004
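The gap between the specification (real numbers) and the implementation (IEEE-754 doubles) has a practical consequence: equality tests on floats can fail. A minimal sketch using only the standard library; the tolerance used by `math.isclose` is its default, which may not suit every application.

```python
import math
from fractions import Fraction

total = 0.1 + 0.2

# Exact equality fails: 0.1 and 0.2 have no finite binary expansion,
# so the stored values are only approximations.
print(total == 0.3)               # False

# Comparing with a tolerance is the usual workaround.
print(math.isclose(total, 0.3))   # True

# Fraction exposes the exact rational value actually stored for 0.1.
print(Fraction(0.1))
```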


Data structure

Data structures

Data structures are collections of data, characterized more by the organization of the data than by the type of the contained data.

How to describe data structures

A systematic approach to organize the collection of data
A set of operators that enable the manipulation of the structure

Characteristics of the data structures

Linear / Non linear (sequence)
Static / Dynamic (content variation, size variation)


Data structures

Type           Java                       C++                   Python
Sequences      List, Queue, Deque,        list, forward_list,   list, tuple,
               LinkedList, ArrayList,     vector, stack,        deque
               Stack, ArrayDeque          queue, deque
Sets           Set, TreeSet, HashSet,     set, unordered_set    set, frozenset
               LinkedHashSet
Dictionaries   Map, TreeMap, HashMap,     map, unordered_map    dict
               LinkedHashMap
Trees          -                          -                     -
Graphs         -                          -                     -


Abstract data types (PSADS §1.8, §1.13) Sequences

Sequence

Sequence

A dynamic data structure representing an "ordered" group of elements

The ordering is not defined by the content, but by the relative position inside the sequence (first element, second element, etc.)
Values could appear more than once
Example: [0.1, "alberto", 0.05, 0.1] is a sequence


Sequence

Operators

It is possible to add / remove elements, by specifying their position
    s = s_1, s_2, ..., s_n
    the element s_i is in position pos_i

It is possible to access directly some of the elements of the sequence
    the beginning and/or the end of the list
    having a reference to the position

It is possible to sequentially access all the other elements


Sequence – Specification

Sequence

% Returns True if the sequence is empty
boolean isEmpty()

% Returns the position of the first element
Pos head()

% Returns the position of the last element
Pos tail()

% Returns the position of the successor of p
Pos next(Pos p)

% Returns the position of the predecessor of p
Pos prev(Pos p)


% Inserts element v of type object in position p.
% Returns the position of the new element
Pos insert(Pos p, object v)

% Removes the element contained in position p.
% Returns the position of the successor of p, which
% becomes the successor of the predecessor of p
Pos remove(Pos p)

% Reads the element contained in position p
object read(Pos p)

% Writes the element v of type object in position p
write(Pos p, object v)


Abstract data types (PSADS §1.8, §1.13) Sets

Sets

Set

A dynamic, non-linear data structure that stores an unordered collection of values without repetitions.

We can consider a total order between elements as the order defined over their abstract data type, if present.

Operators

Basic operators:
    insert
    delete
    contains
Sorting operators:
    Maximum
    Minimum
Set operators:
    union
    intersection
    difference
Iterators:
    for x in S:


Sets – Specifications

Set

% Returns the size of the set
int len()

% Returns True if x belongs to the set; Python: x in S
boolean contains(object x)

% Inserts x in the set, if not already present
add(object x)

% Removes x from the set, if present
discard(object x)

% Returns a new set which is the union of A and B
Set union(Set A, Set B)

% Returns a new set which is the intersection of A and B
Set intersection(Set A, Set B)

% Returns a new set which is the difference of A and B
Set difference(Set A, Set B)
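The specification above maps almost one-to-one onto Python's built-in `set`. A short sketch; the sample values in `A` and `B` are arbitrary and only serve the illustration.

```python
A = {1, 2, 3}
B = {3, 4}

A.add(2)         # already present: the set is unchanged
A.discard(7)     # not present: discard() raises no error

print(len(A))    # 3
print(3 in A)    # True  (contains)
print(A | B)     # union
print(A & B)     # intersection
print(A - B)     # difference

# Iteration visits every element once, in no guaranteed order.
for x in A:
    print(x)
```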


Abstract data types (PSADS §1.8, §1.13) Dictionaries

Dictionaries

Dictionary

Abstract data structure that represents the mathematical concept of partial function R : D → C, or key-value association

Set D is the domain (elements called keys)
Set C is the codomain (elements called values)

Operators

Lookup the value associated to a particular key, if present, None otherwise
Insert a new key-value association, deleting any association already present for the same key
Remove an existing key-value association


Dictionaries – Specification

Dictionary

% Returns the value associated to key k, if present; returns None otherwise
object lookup(object k)

% Associates value v to key k
insert(object k, object v)

% Removes the association of key k
remove(object k)
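In Python the three operations correspond to methods and operators of the built-in `dict`. A minimal sketch; the keys and values are made-up sample data.

```python
ages = {}                       # empty dictionary

ages["alberto"] = 40            # insert(k, v)
ages["alberto"] = 41            # inserting again replaces the old association

print(ages.get("alberto"))      # lookup(k): 41
print(ages.get("cristian"))     # lookup of a missing key: None

ages.pop("alberto", None)       # remove(k); no error if the key is missing
print("alberto" in ages)        # False
```

Note that `ages["cristian"]` would raise `KeyError`; `get()` is the call that matches the "None otherwise" behaviour of the specification.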


Comments

Concepts of sequences, sets, dictionaries are linked
    Key sets / value sets
    Iterate over the set of keys

Some realizations are "natural"
    Sequence ↔ list
    Queue ↔ queue as a list

Alternative implementations exist
    Set as boolean vector
    Queue as circular vector

The choice of the data structure has implications on the performances
    Dictionary as a hash table: lookup O(1), minimum search O(n)
    Dictionary as a tree: lookup O(log n), minimum search O(1)


Elementary data structures implementations Linked List (PSADS §3.19–§3.21)

List

List (Linked List)

A sequence of memory objects, containing arbitrary data and 1-2 pointers to the next element and/or the previous one

Note

Contiguity in the list ⇏ contiguity in memory
All the operations require O(1), but in some cases you need a lot of single operations to complete an action

Possible implementations

Bidirectional / Monodirectional
With sentinel / Without sentinel
Circular / Non-circular


List

[Figure: four linked-list layouts for a list L with elements v1 ... vn: monodirectional; bidirectional; bidirectional, circular; monodirectional, with sentinel]


Monodirectional list, Python

class Node:
    def __init__(self, initdata):
        self.data = initdata
        self.next = None

    def getData(self):
        return self.data

    def getNext(self):
        return self.next

    def setData(self, newdata):
        self.data = newdata

    def setNext(self, newnext):
        self.next = newnext


class UnorderedList:
    def __init__(self):
        self.head = None

    def add(self, item):
        # Insert the new node at the head of the list
        temp = Node(item)
        temp.setNext(self.head)
        self.head = temp


class UnorderedList:  # Continued
    def search(self, item):
        current = self.head
        found = False
        while current is not None and not found:
            if current.getData() == item:
                found = True
            else:
                current = current.getNext()
        return found


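    # class UnorderedList, continued
    def remove(self, item):
        current = self.head
        previous = None
        # Walk the list until the item is found or the list ends
        while current is not None and current.getData() != item:
            previous = current
            current = current.getNext()
        if current is None:
            return  # item not in the list: nothing to remove
        if previous is None:
            # The item is in the first node: move the head forward
            self.head = current.getNext()
        else:
            # Bypass the node that contains the item
            previous.setNext(current.getNext())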


Elementary data structures implementations Dynamic vectors

Dynamic vectors

Lists in Python are implemented through dynamic vectors

A vector of a given size (initial capacity) is allocated
When inserting an element before the end, all the following elements have to be moved - cost O(n)
When inserting an element at the end (append), the cost is O(1) (just writing the element at the first available slot)

Problem:
    It is not known how many elements have to be stored
    The initial capacity could be insufficient

Solution:
    A new (larger) vector is allocated, the content is copied in the new vector, the old vector is released


Dynamic vectors

Question

Which is the best approach?

Approach 1

If the old vector has size n, allocate a new vector of size dn. For example, d = 2

Approach 2

If the old vector has size n, allocate a new vector of size n+d, where d is a constant. For example, d = 16


Amortized analysis - doubling

Actual cost of an append() operation:

\[
c_i = \begin{cases}
i & \text{if } \exists k \in \mathbb{Z}^+_0 : i = 2^k + 1 \\
1 & \text{otherwise}
\end{cases}
\]

Assumptions:
    Initial capacity: 1
    Writing cost: Θ(1)

n     cost
1     1
2     1 + 2^0 = 2
3     1 + 2^1 = 3
4     1
5     1 + 2^2 = 5
6     1
7     1
8     1
9     1 + 2^3 = 9
10    1
11    1
12    1
13    1
14    1
15    1
16    1
17    1 + 2^4 = 17


Actual cost of n operations append():

\[
T(n) = \sum_{i=1}^{n} c_i
     = n + \sum_{j=0}^{\lfloor \log n \rfloor} 2^j
     = n + 2^{\lfloor \log n \rfloor + 1} - 1
     \le n + 2^{\log n + 1} - 1
     = n + 2n - 1 = O(n)
\]

Amortized cost of a single append():

\[
T(n)/n = \frac{O(n)}{n} = O(1)
\]


Amortized analysis - increment

Actual cost of an append() operation:

\[
c_i = \begin{cases}
i & \text{if } (i \bmod d) = 1 \\
1 & \text{otherwise}
\end{cases}
\]

Assumptions
    Increment: d
    Initial size: d
    Writing cost: Θ(1)

Example
    d = 4

n     cost
1     1
2     1
3     1
4     1
5     1 + d = 5
6     1
7     1
8     1
9     1 + 2d = 9
10    1
11    1
12    1
13    1 + 3d = 13
14    1
15    1
16    1
17    1 + 4d = 17


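Actual cost of n operations append():

\[
T(n) = \sum_{i=1}^{n} c_i
     = n + \sum_{j=1}^{\lfloor n/d \rfloor} d \cdot j
     = n + d \sum_{j=1}^{\lfloor n/d \rfloor} j
     = n + d \, \frac{(\lfloor n/d \rfloor + 1) \lfloor n/d \rfloor}{2}
     \le n + \frac{(n/d + 1)\, n}{2}
     = O(n^2)
\]

Amortized cost of a single append():

\[
T(n)/n = \frac{O(n^2)}{n} = O(n)
\]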
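The two analyses can be checked numerically. The sketch below simulates the cost model used in the slides (one unit per element written, one unit per element copied on reallocation); `total_cost` and its parameters are names introduced here for illustration and are not part of the course material.

```python
def total_cost(n, initial_capacity, grow):
    """Total cost of n appends under the cost model of the analysis:
    1 unit per write, plus 1 unit per element copied at each reallocation."""
    capacity, size, cost = initial_capacity, 0, 0
    for _ in range(n):
        if size == capacity:       # vector full: allocate a larger one and copy
            cost += size
            capacity = grow(capacity)
        cost += 1                  # write the new element
        size += 1
    return cost

D = 4  # increment used in the example table

for n in (16, 1024, 65536):
    doubling = total_cost(n, 1, lambda c: 2 * c)
    increment = total_cost(n, D, lambda c: c + D)
    # Average cost per append: bounded for doubling, growing with n for increment
    print(n, doubling / n, increment / n)
```

With doubling the average cost per append stays below 3, while with a fixed increment it grows linearly with n, in line with the O(1) and O(n) amortized bounds.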


Reality check

Language              Data structure   Expansion factor
GNU C++               std::vector      2.0
Microsoft VC++ 2003   vector           1.5
Python                list             1.125
Oracle Java           ArrayList        2.0
OpenJDK Java          ArrayList        1.5


Python – List

Operator         Description   Worst case   Worst case amortized
L.copy()         Copy          O(n)         O(n)
L.append(x)      Append        O(n)         O(1)
L.insert(i,x)    Insert        O(n)         O(n)
L.remove(x)      Remove        O(n)         O(n)
L[i]             Index         O(1)         O(1)
for x in L       Iterator      O(n)         O(n)
L[i:i+k]         Slicing       O(k)         O(k)
L.extend(s)      Extend        O(n+k)       O(k)
x in L           Contains      O(n)         O(n)
min(L), max(L)   Min, Max      O(n)         O(n)
len(L)           Get length    O(1)         O(1)


Sets, Dictionary implementations in Python

Dictionary: possible implementations

            Unordered array   Ordered array   Linked list   RB tree    Ideal impl.
insert()    O(1), O(n)        O(n)            O(1), O(n)    O(log n)   O(1)
lookup()    O(n)              O(log n)        O(n)          O(log n)   O(1)
remove()    O(n)              O(n)            O(n)          O(log n)   O(1)

Ideal implementation: hash tables

Choose a hash function h that maps each key k ∈ U to an integer h(k)

The key-value pair ⟨k, v⟩ is stored in a list at position h(k)

This vector is called a hash table


Hash table – Definitions

All the possible keys are contained in the universe set U of size u
The table is stored in list T[0 ... m−1] with size m
A hash function is defined as: h : U → {0, 1, ..., m−1}

[Figure: the keys Alberto Montresor, Cristian Consonni, Alessio Guerrieri and Edsger Dijkstra are mapped by the hash function to slots 00 ... m−1 of the hash table]


Collisions

When two or more keys in the dictionary have the same hash value, we say that a collision has happened
Ideally, we want to have hash functions with no collisions

[Figure: the keys Alberto Montresor, Cristian Consonni, Alessio Guerrieri and Edsger Dijkstra are mapped by the hash function to slots 00 ... m−1 of the hash table; two keys end up in the same slot (collision)]


Sets, Dictionary implementations in Python Direct access tables

Direct access tables

In some cases: the set U is already a (small) subset of Z+

The set of year days, numbered from 1 to 366
The set of Kanto’s Pokemons, numbered from 1 to 151

Direct access tables

We use the identity function h(k) = k as hash function
We select m = u

Problems

If u is very large, this approach may be infeasible
If u is not large but the number of keys that are actually recorded is much smaller than u = m, memory is wasted
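A direct access table can be sketched in a few lines. The class name `DirectAccessTable` and the sample entry are illustrative assumptions; `None` is used as the "empty slot" marker, so in this sketch `None` cannot be stored as a value.

```python
class DirectAccessTable:
    """Dictionary for integer keys in [0, u): slot k holds the value of key k."""

    def __init__(self, u):
        self.slots = [None] * u      # m = u slots, many of which may stay unused

    def insert(self, k, v):
        self.slots[k] = v            # identity hash function: h(k) = k

    def lookup(self, k):
        return self.slots[k]         # None if the key is absent

    def remove(self, k):
        self.slots[k] = None


events = DirectAccessTable(367)      # year days 1..366; index 0 is left unused
events.insert(325, "lecture")        # hypothetical entry
print(events.lookup(325))            # lecture
print(events.lookup(1))              # None
```

All three operations are O(1), but the table always occupies u slots, which is exactly the memory problem listed above.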


Sets, Dictionary implementations in Python Hash functions

Perfect hash functions

Definition

A hash function h is called perfect if h is injective, i.e.

\[
\forall k_1, k_2 \in U : k_1 \neq k_2 \Rightarrow h(k_1) \neq h(k_2)
\]

Examples

Students ASD 2005-2016
    Student ID number in [100090, 183864]
    h(k) = k − 100090, m = 83774

Students enrolled in 2014
    Student ID number in [173185, 183864]
    h(k) = k − 173185, m = 10679

Problems

Universe space is often large, sparse, unknown
To obtain a perfect hash function is difficult


Hash functions

If collisions cannot be avoided

Let’s try to minimize their number
We want hash functions that uniformly distribute the keys into hash indexes [0 ... m−1]

Simple uniformity

Let P(k) be the probability that key k is inserted in the table
Let Q(i) be the probability that a key ends up in the i-th entry of the table

\[
Q(i) = \sum_{k \in U : h(k) = i} P(k)
\]

A hash function h has simple uniformity if:

\[
\forall i \in [0, \ldots, m-1] : Q(i) = 1/m
\]


To obtain a hash function with simple uniformity, the probability distribution P should be known

Example

If U is given by the real numbers in [0, 1) and each key has the same probability of being selected, then H(k) = ⌊km⌋ has simple uniformity

In the real world

The key distribution may be unknown or partially known
Heuristic techniques are used to obtain an approximation of simple uniformity


How to realize a hash function

Assumption

Each key can be translated into a non-negative numerical value, by reading its internal representation as a number.

Example: string transformation

ord(c): ordinal binary value of character c in ASCII
bin(k): binary representation of key k, by concatenating the binary values of its characters
int(b): numerical value associated to the binary number b
int(k) = int(bin(k))


Division method

Division method

Let m be a prime number

H(k) = int(k) mod m

Example

m = 383

H("Alberto") = 18,415,043,350,787,183 mod 383 = 221

H("Alessio") = 18,415,056,470,632,815 mod 383 = 77

H("Cristian") = 4,860,062,892,481,405,294 mod 383 = 130
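The division method, together with the string-to-integer transformation it relies on, can be sketched as follows. The function names `key_to_int` and `division_hash` are introduced here for illustration; the conversion assumes ASCII keys read as a big-endian sequence of bytes, which reproduces the integers shown in the example.

```python
def key_to_int(key):
    """Concatenate the ASCII codes of the characters and read them as one integer."""
    return int.from_bytes(key.encode("ascii"), "big")

def division_hash(key, m=383):
    """Division method: H(k) = int(k) mod m, with m a prime number."""
    return key_to_int(key) % m

print(key_to_int("Alberto"))      # 18415043350787183
print(division_hash("Alberto"))   # 221
print(division_hash("Alessio"))   # 77
```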


Multiplication method

Multiplication method

Let m be any size, for example 2^k
Let C be a real constant, 0 < C < 1
Let i = int(k)

\[
H(k) = \lfloor m \, (C \cdot i - \lfloor C \cdot i \rfloor) \rfloor
\]

Example

m = 2^16
C = (√5 − 1) / 2

H("Alberto") = 65,536 · 0.78732161432 = 51,598

H("Alessio") = 65,536 · 0.51516739168 = 33,762

H("Cristian") = 65,536 · 0.72143641000 = 47,280
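A sketch of the multiplication method follows. Because int(k) can exceed the 53 bits of precision of a Python float, the fractional part of C · i is computed here with the `decimal` module; this choice, the precision of 60 digits and the function names are assumptions of the sketch, not something prescribed by the slides.

```python
from decimal import Decimal, getcontext

getcontext().prec = 60                   # assumed: enough digits for keys of a few bytes
C = (Decimal(5).sqrt() - 1) / 2          # (sqrt(5) - 1) / 2, about 0.618

def key_to_int(key):
    """Concatenate the ASCII codes of the characters and read them as one integer."""
    return int.from_bytes(key.encode("ascii"), "big")

def multiplication_hash(key, m=2**16):
    """Multiplication method: H(k) = floor(m * frac(C * int(k)))."""
    i = key_to_int(key)
    frac = (C * i) % 1                   # fractional part of C * i
    return int(m * frac)                 # int() truncates, i.e. floor for values >= 0

for name in ("Alberto", "Alessio", "Cristian"):
    print(name, multiplication_hash(name))
```

With plain floats, `C * i` would lose all the fractional digits for keys this large, and the resulting hash values would no longer depend on the low-order characters of the key.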


Sets, Dictionary implementations in Python Hashtable implementation

Separate chaining

Idea

The keys with the same value h are stored in a monodirectional list / dynamic vector

The H(k)-th slot in the hash table contains the list/vector associated to k

[Figure: collision resolution techniques, chaining. The elements with the same hash value h are stored in a list; a pointer to the head of the list is stored in slot A[h] of the hash table. Insert: insertion at the head of the list. Lookup, Delete: require scanning the list in search of the key. The picture shows a table with slots 0 ... m−1 whose chains hold the keys k1 ... k8.]
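Separate chaining can be sketched with one Python list (a dynamic vector) per slot. This is an illustrative implementation under stated assumptions: the class name `ChainedHashTable`, the use of the built-in `hash()`, the initial size of 8 and the resize threshold of 2 are choices made here. Unlike the "insert at the head" variant, `insert` scans the chain first so that an existing key is overwritten rather than duplicated.

```python
class ChainedHashTable:
    def __init__(self, m=8):
        self.m = m                              # number of slots
        self.n = 0                              # number of stored keys
        self.table = [[] for _ in range(m)]     # one chain per slot

    def _chain(self, key):
        return self.table[hash(key) % self.m]

    def lookup(self, key):
        for k, v in self._chain(key):           # scan the chain of slot H(key)
            if k == key:
                return v
        return None

    def insert(self, key, value):
        chain = self._chain(key)
        for idx, (k, _) in enumerate(chain):
            if k == key:                        # key already present: replace
                chain[idx] = (key, value)
                return
        chain.append((key, value))
        self.n += 1
        if self.n > 2 * self.m:                 # load factor above 2: double the table
            self._resize(2 * self.m)

    def remove(self, key):
        chain = self._chain(key)
        for idx, (k, _) in enumerate(chain):
            if k == key:
                del chain[idx]
                self.n -= 1
                return

    def _resize(self, new_m):
        items = [pair for chain in self.table for pair in chain]
        self.m = new_m
        self.table = [[] for _ in range(new_m)]
        for k, v in items:                      # re-hash every pair into the new table
            self._chain(k).append((k, v))


t = ChainedHashTable()
t.insert("Alberto", 1)
t.insert("Alessio", 2)
t.insert("Alberto", 3)                          # overwrites the previous value
print(t.lookup("Alberto"), t.lookup("Edsger"))  # 3 None
```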


Separate chaining: complexity analysis

n           Number of keys stored in the hash table
m           Size of the hash table
α = n/m     Load factor
I(α)        Average number of memory accesses to search a key that is not in the table (insuccess)
S(α)        Average number of memory accesses to search a key that is in the table (success)

Worst case analysis

All the keys are inserted in a unique list
    insert(): Θ(1)
    lookup(), remove(): Θ(n)


Average case analysis

Let’s assume the hash function has simple uniformity

Hash function computation: Θ(1), to be added to all searches

How long are the lists?

The expected length of a list is equal to α = n/m

Theorem: in a hash table with chaining, an unsuccessful search requires an expected time Θ(1 + α)

Proof:
    A key that is not in the table can be placed in any of the m slots
    An unsuccessful search touches all the keys in the corresponding list
    Hashing time: 1 + expected length of the list: α → Θ(1 + α)


Insuccess

When searching for a missing key, all the keys in the list must be read
Expected cost: Θ(1) + α

Success

When searching for a key included in the table, on average half of the keys in the list must be read.
Expected cost: Θ(1) + α/2


What is the meaning of the load factor?

The cost of every operation is influenced by the load factor

If n = O(m), α = O(1)

In such case, all operations are O(1) in expectation

If α becomes too large, the size of the hash table can be doubled through dynamic vectors


Python and hash function/tables

Python sets and dict

Are implemented through hash tables

Sets are degenerate forms of dictionaries, where there are no values, only keys

Unordered data structures

Order between keys is not preserved by the hash function; this is why you get unordered results when you print them


Python – set

Operation        Description    Average case    Worst case
x in S           Contains       O(1)            O(n)
S.add(x)         Insert         O(1)            O(n)
S.remove(x)      Remove         O(1)            O(n)
S|T              Union          O(n+m)          O(n·m)
S&T              Intersection   O(min(n,m))     O(n·m)
S-T              Difference     O(n)            O(n·m)
for x in S       Iterator       O(n)            O(n)
len(S)           Get length     O(1)            O(1)
min(S), max(S)   Min, Max       O(n)            O(n)

n = len(S), m = len(T)

https://docs.python.org/2/library/stdtypes.html#set


Python – dict

Operation     Description   Average case   Worst case
x in D        Contains      O(1)           O(n)
D[k] = v      Insert        O(1)           O(n)
v = D[k]      Lookup        O(1)           O(n)
del D[k]      Remove        O(1)           O(n)
for x in D    Iterator      O(n)           O(n)
len(D)        Get length    O(1)           O(1)

n = len(D)


Implementing hash functions [1]

Rule: If two objects are equal, then their hashes should be equal
    If you implement __eq__(), then you should implement function __hash__() as well

Rule: If two objects have the same hash, then they are likely to be equal
    You should avoid returning values that generate collisions in your hash function.

Rule: In order for an object to be hashable, it must be immutable
    The hash value of an object should not change over time

[1] http://www.asmeurer.com/blog/posts/what-happens-when-you-mess-with-hashing-in-python/


Example

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return self.x == other.x and self.y == other.y

    def __hash__(self):
        # Equal points have equal (x, y) tuples, hence equal hashes
        return hash((self.x, self.y))
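The first rule can be observed directly. In the sketch below `Point` is repeated so that the block runs on its own, and `PointNoHash` is a hypothetical variant that defines `__eq__()` only; in Python 3 such a class has its `__hash__` set to `None`, so its instances cannot be stored in a set or used as dictionary keys.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return self.x == other.x and self.y == other.y

    def __hash__(self):
        return hash((self.x, self.y))


class PointNoHash:
    """Same as Point, but __hash__() is deliberately not defined."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return self.x == other.x and self.y == other.y


seen = {Point(1, 2)}
print(Point(1, 2) in seen)        # True: equal objects, equal hashes

try:
    {PointNoHash(1, 2)}
except TypeError as err:
    print("unhashable:", err)
```

The third rule still applies to `Point`: changing `x` or `y` after the object has been inserted in a set changes its hash and makes it unreachable through lookups.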


Stack and queues Stack

Stack

Stack

A linear, dynamic data structure, in which the operation "remove" returns (and removes) a predefined element: the one that has remained in the data structure for the least time

Stack

% Returns True if the stack is empty
boolean isEmpty()

% Returns the size of the stack
int size()

% Inserts v on top of the stack
push(object v)

% Removes the top element of the stack and returns it to the caller
object pop()

% Reads the top element of the stack, without modifying it
object peek()


Stack example

Stack Operation   Stack Contents        Return Value
s.isEmpty()       []                    True
s.push(4)         [4]
s.push('dog')     [4,'dog']
s.peek()          [4,'dog']             'dog'
s.push(True)      [4,'dog',True]
s.size()          [4,'dog',True]        3
s.isEmpty()       [4,'dog',True]        False
s.push(8.4)       [4,'dog',True,8.4]
s.pop()           [4,'dog',True]        8.4
s.pop()           [4,'dog']             True
s.size()          [4,'dog']             2


Stack

Possible uses

In languages like Python:
    In the compiler: to balance parentheses
    In the interpreter: a new activation record is created for each function call
In graph analysis:
    To perform visits of the entire graph

Possible implementations

Through bidirectional lists
    reference to the top element
Through vectors
    limited size, small overhead


Stack in procedural languages

def func3(y):
    z = y
    y = y*y
    return z+y

def func2(x):
    y = x
    x = x*3
    y = x * func3(y)
    return y+x

def func1(w):
    x = w
    w = w*2
    x = w + func2(x)
    return x+w

print(func1(5))

See it in code lens!


Stack in Python

class Stack:
    def __init__(self):
        self.items = []

    def size(self):
        return len(self.items)

    def isEmpty(self):
        return self.items == []

    def pop(self):
        return self.items.pop()

    def peek(self):
        return self.items[len(self.items)-1]

    def push(self, item):
        self.items.append(item)


Example of application: balanced parenthesis

Check whether the following sets of parentheses are balanced

    { { ( [ ] [ ] ) } ( ) }
    [ [ { { ( ( ) ) } } ] ]
    [ ] [ ] [ ] ( ) { }
    ( [ ) ]
    ( ( ( ) ] ) )
    [ { ( ) ]

These parentheses could be associated to sets, lists, tuples and/or arithmetic operations


def parChecker(parString):
    s = Stack()
    for symbol in parString:
        if symbol in "([{":
            # Opening symbol: remember it until its closer arrives
            s.push(symbol)
        else:
            # Closing symbol: it must match the most recent opener
            if s.isEmpty():
                return False
            else:
                top = s.pop()
                if not matches(top, symbol):
                    return False
    # Balanced only if no opener is left unmatched
    return s.isEmpty()


def matches(openpar, closepar):
    opens = "([{"
    closers = ")]}"
    return opens.index(openpar) == closers.index(closepar)

print(parChecker('{{([][])}()}'))
print(parChecker('[{()]'))


Stack and queues Queue

Queue – First-In, First-Out (FIFO)

Queue

A linear, dynamic data structure, in which the operation "remove" returns (and removes) a predefined element: the one that has remained in the data structure for the longest time

Queue

% Returns True if the queue is empty
boolean isEmpty()

% Returns the size of the queue
int size()

% Inserts v at the end of the queue
enqueue(object v)

% Extracts the element at the beginning of the queue
object dequeue()

% Reads the element at the top of the queue
object top()


Queue example

Queue Operation    Queue Contents        Return Value
q.isEmpty()        []                    True
q.enqueue(4)       [4]
q.enqueue('dog')   ['dog',4]
q.enqueue(True)    [True,'dog',4]
q.size()           [True,'dog',4]        3
q.isEmpty()        [True,'dog',4]        False
q.enqueue(8.4)     [8.4,True,'dog',4]
q.dequeue()        [8.4,True,'dog']      4
q.dequeue()        [8.4,True]            'dog'
q.size()           [8.4,True]            2


Queue

Possible uses

To queue requests performed on a limited resource (e.g., printer)
To visit graphs

Possible implementations

Through lists
    add to the tail
    remove from the head
Through circular array
    limited size, small overhead


Queue in Python (wrong)

class Queue:
    def __init__(self):
        self.items = []

    def isEmpty(self):
        return self.items == []

    def enqueue(self, item):
        # Inserting at index 0 of a list shifts every element: O(n)
        self.items.insert(0, item)

    def dequeue(self):
        # Removing from the end of a list is O(1)
        return self.items.pop()

    def size(self):
        return len(self.items)
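The version above returns correct results; it is presumably labelled "wrong" because `insert(0, item)` on a Python list costs O(n). One possible alternative, sketched here as an assumption rather than as the course's own solution, keeps the same interface but stores the items in `collections.deque`, which supports O(1) insertion and removal at both ends.

```python
from collections import deque


class Queue:
    def __init__(self):
        self.items = deque()

    def isEmpty(self):
        return len(self.items) == 0

    def enqueue(self, item):
        self.items.append(item)        # O(1): add at the tail

    def dequeue(self):
        return self.items.popleft()    # O(1): remove from the head

    def size(self):
        return len(self.items)


q = Queue()
q.enqueue(4)
q.enqueue('dog')
q.enqueue(True)
print(q.dequeue())   # 4
print(q.size())      # 2
```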


Queue based on circular array

Implementation based on the modulus operation
Pay attention to overflow problems (full queue)

[Figure: a circular array holding 3, 6, 54, ..., 43 between head and head+n. enqueue(12) writes 12 at position head+n; dequeue() → 3 moves head forward; enqueue(15,17,33) wraps around, so that 33 is stored at the beginning of the array.]
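A queue on a fixed-size circular array can be sketched as follows. The class name `CircularQueue`, the choice of tracking `head` plus the number of stored elements `n`, and the exceptions raised on overflow and underflow are assumptions of this sketch.

```python
class CircularQueue:
    def __init__(self, capacity):
        self.buffer = [None] * capacity
        self.head = 0                  # index of the oldest element
        self.n = 0                     # number of stored elements

    def isEmpty(self):
        return self.n == 0

    def size(self):
        return self.n

    def enqueue(self, item):
        if self.n == len(self.buffer):
            raise OverflowError("queue is full")
        # The first free slot is head+n, wrapped around with the modulus
        tail = (self.head + self.n) % len(self.buffer)
        self.buffer[tail] = item
        self.n += 1

    def dequeue(self):
        if self.n == 0:
            raise IndexError("dequeue from an empty queue")
        item = self.buffer[self.head]
        self.buffer[self.head] = None
        self.head = (self.head + 1) % len(self.buffer)
        self.n -= 1
        return item


q = CircularQueue(3)
q.enqueue(3)
q.enqueue(6)
q.enqueue(54)
print(q.dequeue())    # 3
q.enqueue(43)         # wraps around: stored at index 0
print(q.buffer)       # [43, 6, 54]
```

Both operations are O(1); the price is the fixed capacity, which is why a full queue has to be handled explicitly.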


[Figure: animation of a queue based on a circular array. By MuhannadAjjan [CC BY-SA 4.0] (http://creativecommons.org/licenses/by-sa/4.0) via Wikimedia Commons]


Queue based on circular array – Pseudocode

Queue

Queue(self, dim)
    self.A ← new int[0 … dim − 1]
    self.cap ← dim        % Maximum size
    self.head ← 0         % Head of the queue
    self.size ← 0         % Current size

top()
    if size > 0 then
        return A[head]

dequeue()
    if size > 0 then
        temp ← A[head]
        head ← (head + 1) % cap
        size ← size − 1
        return temp

enqueue(v)
    if size < cap then
        A[(head + size) % cap] ← v
        size ← size + 1

size()
    return size

isEmpty()
    return size = 0
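The pseudocode translates almost line by line into Python. The sketch below is one possible rendering, not the author's code: the class name `CircularQueue` is hypothetical, the counter is stored as `n` to avoid clashing with the `size()` method, and raising exceptions on an empty or full queue is an added assumption (the pseudocode simply does nothing in those cases).

```python
class CircularQueue:
    """Fixed-capacity FIFO queue stored in a circular array."""

    def __init__(self, dim):
        self.A = [None] * dim   # storage, indices 0 .. dim-1
        self.cap = dim          # maximum size
        self.head = 0           # index of the oldest element
        self.n = 0              # current size

    def top(self):
        if self.n == 0:
            raise IndexError("top of an empty queue")      # assumption
        return self.A[self.head]

    def dequeue(self):
        if self.n == 0:
            raise IndexError("dequeue from an empty queue")  # assumption
        temp = self.A[self.head]
        self.head = (self.head + 1) % self.cap   # wrap around the array end
        self.n -= 1
        return temp

    def enqueue(self, v):
        if self.n == self.cap:
            raise OverflowError("queue is full")            # assumption
        self.A[(self.head + self.n) % self.cap] = v
        self.n += 1

    def size(self):
        return self.n

    def isEmpty(self):
        return self.n == 0
```

With capacity 3, enqueueing 3, 6 and 54, dequeueing once and then enqueueing 12 stores 12 at index 0: the modulus makes the tail wrap around into the cell freed by the dequeue.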


Python data structure - deque

>>> from collections import deque
>>> Q = deque(["Eric", "John", "Michael"])
>>> Q.append("Terry")     # Terry arrives
>>> Q.append("Graham")    # Graham arrives
>>> Q.popleft()           # The first to arrive now leaves
'Eric'
>>> Q.popleft()           # The second to arrive now leaves
'John'
>>> Q                     # Remaining queue in order of arrival
deque(['Michael', 'Terry', 'Graham'])
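As a sketch of how the list-based `Queue` class shown earlier could be rewritten on top of `collections.deque`, keeping the same method names; this is one possible arrangement, not code from the slides.

```python
from collections import deque


class Queue:
    """Same interface as the list-based class, backed by collections.deque."""

    def __init__(self):
        self.items = deque()

    def isEmpty(self):
        return len(self.items) == 0

    def enqueue(self, item):
        self.items.append(item)        # O(1) at the right end

    def dequeue(self):
        return self.items.popleft()    # O(1) at the left end

    def size(self):
        return len(self.items)
```

Both ends of a deque support constant-time insertion and removal, so neither operation has to shift the stored elements.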


Some code examples

Reverse string

def reverse(s):
    n = len(s)-1
    res = ""
    while n >= 0:
        res = res + s[n]
        n -= 1
    return res

Complexity: Θ(n²)

- n string concatenations
- Each concatenation copies all the characters into a new string


Reverse string

def reverse(s):
    res = []
    for c in s:
        res.insert(0, c)
    return "".join(res)

Complexity: Θ(n²)

- n list inserts
- Each insert moves all characters one position up in the list


Reverse string

def reverse(s):
    n = len(s)-1
    res = []
    while n >= 0:
        res.append(s[n])
        n -= 1
    return "".join(res)

Complexity: Θ(n)

- n list appends
- Each append has an amortized cost of O(1)

Better solution

def reverse(s):
    return s[::-1]


Remove duplicates

def deduplicate(L):
    res = []
    for item in L:
        if item not in res:
            res.append(item)
    return res

Complexity: Θ(n²)

- n list appends
- n checks whether an element is already present
- Each check costs O(n)


Remove duplicates

def deduplicate(L):
    res = []
    present = set()
    for item in L:
        if item not in present:
            res.append(item)
            present.add(item)
    return res

Complexity: Θ(n)

- n list appends
- n checks whether an element is already present
- Each check costs O(1) on average

Other possibility (does not preserve the original order)

def deduplicate(L):
    return list(set(L))
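A further order-preserving alternative, sketched under the assumption of Python 3.7 or later, where dictionaries are guaranteed to keep insertion order. The function name `deduplicate_ordered` is illustrative; like the set-based versions, it requires the items to be hashable.

```python
def deduplicate_ordered(L):
    # Dictionary keys are unique and keep the order of first insertion
    # (guaranteed from Python 3.7), so each item survives once, in place.
    return list(dict.fromkeys(L))
```

The cost is still linear on average, since each key insertion is a hash-table operation.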


Exercise

Queues and Priority Queues are data structures which are known to most computer scientists. The Italian Queue, however, is not so well known, though it occurs often in everyday life. At lunch time the queue in front of the cafeteria is an Italian queue, for example.

In an Italian queue each element belongs to a group. If an element enters the queue, it first searches the queue from head to tail to check if some of its group members (elements of the same group) are already in the queue.

- If yes, it enters the queue right behind them.
- If not, it enters the queue at the tail and becomes the new last element (bad luck).

Dequeuing is done like in normal queues: elements are processed from head to tail in the order they appear in the Italian queue.
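One possible starting point for the exercise, offered as a rough sketch rather than the intended solution. It assumes elements are stored as (value, group) pairs in a plain Python list, and that "right behind them" means immediately after the last queued member of the same group; the class name `ItalianQueue` and the `enqueue(value, group)` signature are hypothetical.

```python
class ItalianQueue:
    """Sketch: elements are (value, group) pairs kept in a Python list."""

    def __init__(self):
        self.items = []   # index 0 is the head of the queue

    def isEmpty(self):
        return len(self.items) == 0

    def enqueue(self, value, group):
        # Search from head to tail for the last element of the same group.
        last = -1
        for i, (_, g) in enumerate(self.items):
            if g == group:
                last = i
        if last >= 0:
            # Group already present: enter right behind its last member.
            self.items.insert(last + 1, (value, group))
        else:
            # No group member found: become the new last element.
            self.items.append((value, group))

    def dequeue(self):
        if not self.items:
            raise IndexError("dequeue from an empty queue")
        value, _ = self.items.pop(0)   # O(n) on a list
        return value

    def size(self):
        return len(self.items)
```

In this sketch both `enqueue` and `dequeue` cost Θ(n) in the worst case; finding a representation that makes them cheaper is a natural follow-up to the exercise.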
