Pf congres20110917 data-structures

SPL Data Structures and their Complexity

Jurrien Stutterheim

September 17, 2011

1. Introduction

This presentation §1

- What data structures are
- How they are represented internally
- How “fast” each one is, and why

Data structures §1

- Classes that offer the means to store and retrieve data, possibly in a particular order
- Implementation is (often) optimised for certain use cases
- array is PHP’s oldest and most frequently used data structure
- PHP 5.3 adds support for several others

Current SPL data structures §1

- SplDoublyLinkedList
- SplStack
- SplQueue
- SplHeap
- SplMaxHeap
- SplMinHeap
- SplPriorityQueue
- SplFixedArray
- SplObjectStorage

Why care? §1

- Using the right data structure in the right place can improve performance
- Already implemented and tested: saves work
- Can be used as a type hint in a function definition
- Adds semantics to your code

Algorithmic complexity §1

- We want to be able to talk about the performance of the data structure implementation
  - Running speed (time complexity)
  - Space consumption (space complexity)
- We describe complexity in terms of input size, which is independent of the machine and the programming language

Example §1

for ($i = 0; $i < $n; $i++)
    for ($j = 0; $j < $n; $j++)
        echo 'tick';

For some n, how many times is “tick” printed? In other words, what is the time complexity of this algorithm?

n² times
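
To check the count, this sketch replaces the echo with a counter. The helper name countTicks is made up for this illustration; it is not from the talk.

```php
<?php
// Count how often the inner statement of the nested loop runs,
// instead of printing 'tick' each time.
function countTicks(int $n): int
{
    $ticks = 0;
    for ($i = 0; $i < $n; $i++) {
        for ($j = 0; $j < $n; $j++) {
            $ticks++; // stands in for: echo 'tick';
        }
    }
    return $ticks;
}

foreach ([1, 10, 100] as $n) {
    printf("n = %d: %d ticks\n", $n, countTicks($n)); // always n * n
}
```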

Talking about complexity §1

- Pick a function to act as a boundary for the algorithm’s complexity
- Worst case
  - Denoted O (big-Oh)
  - “My algorithm will not be slower than this function”
- Best case
  - Denoted Ω (big-Omega)
  - “My algorithm will be at least as slow as this function”
- If they are the same, we write Θ (big-Theta)
- In the example, both bounds are n², so the algorithm is in Θ(n²)

Visualized §1

Example 2 §1

for ($i = 0; $i < $n; $i++)
    if ($myBool)
        for ($j = 0; $j < $n; $j++)
            echo 'tick';

What is the time complexity of this algorithm?

- O(n²)
- Ω(n) (if $myBool is false)
- No Θ!
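
Counting steps makes the two cases concrete. In this sketch, the hypothetical countSteps function counts one step per outer iteration (evaluating $myBool) plus one per 'tick', so the total is n + n² when $myBool is true and only n when it is false.

```php
<?php
// Count the work done by the guarded nested loop.
function countSteps(int $n, bool $myBool): int
{
    $steps = 0;
    for ($i = 0; $i < $n; $i++) {
        $steps++; // the outer loop always checks $myBool
        if ($myBool) {
            for ($j = 0; $j < $n; $j++) {
                $steps++; // stands in for: echo 'tick';
            }
        }
    }
    return $steps;
}

echo countSteps(100, true), "\n";  // 100 + 100 * 100 = 10100: quadratic
echo countSteps(100, false), "\n"; // 100: linear
```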

We can be a bit sloppy §1

for ($i = 0; $i < $n; $i++)
    if ($myBool)
        for ($j = 0; $j < $n; $j++)
            echo 'tick';

- We describe algorithmic behaviour as the input size grows to infinity
- Constant factors and lower-order terms don’t matter much
- E.g. 3n² + 4n + 1 is in O(n²)
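
To make the “sloppy” reasoning precise, here is the usual textbook definition of O (as in Cormen et al.) followed by one choice of constants that puts the slide’s example in O(n²). The values c = 8 and n₀ = 1 are just one pair that works.

```latex
f(n) \in O(g(n)) \iff \exists\, c > 0,\ n_0 \ \text{such that}\ 0 \le f(n) \le c \cdot g(n) \ \text{for all}\ n \ge n_0

3n^2 \;\le\; 3n^2 + 4n + 1 \;\le\; 3n^2 + 4n^2 + n^2 = 8n^2 \quad \text{for all } n \ge 1
```

The right-hand inequality gives O(n²) with c = 8; the left-hand one gives Ω(n²) with c = 3, so the expression is in Θ(n²).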

Other functions §1

for ($i = 0; $i < $n; $i++)
    for ($j = 0; $j < $n; $j++)
        echo 'tick';

for ($i = 0; $i < $n; $i++)
    echo 'tock';

This algorithm is still in Θ(n²): it prints n² + n words, and the linear term is dominated by the quadratic one.
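
Counting both loops, the hypothetical helper below returns n² + n. Printing its ratio to n² for growing n shows the extra linear term fading away.

```php
<?php
// Count the ticks of the nested loop plus the tocks of the single loop.
function countTicksAndTocks(int $n): int
{
    $count = 0;
    for ($i = 0; $i < $n; $i++) {
        for ($j = 0; $j < $n; $j++) {
            $count++; // echo 'tick';
        }
    }
    for ($i = 0; $i < $n; $i++) {
        $count++; // echo 'tock';
    }
    return $count; // n^2 + n
}

// The ratio to n^2 tends to 1 as n grows.
foreach ([10, 100, 1000] as $n) {
    $steps = countTicksAndTocks($n);
    printf("n = %d: %d steps, ratio to n^2 = %.3f\n", $n, $steps, $steps / ($n * $n));
}
```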

Bounds §1

Figure: Order relations¹

¹ Taken from Cormen et al. 2009

Complexity Comparison §1

[Figure: growth of logarithmic, linear, quadratic, exponential, factorial and superexponential functions]

Constant: 1, logarithmic: lg n, linear: n, quadratic: n², exponential: 2ⁿ, factorial: n!, super-exponential: nⁿ

In numbers §1

Approximate growth for n = 50:

Function   Value
1          1
lg n       5.64
n          50
n²         2500
n³         125000
2ⁿ         1125899906842624
n!         3.04 × 10⁶⁴
nⁿ         8.88 × 10⁸⁴
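
The table can be recomputed with a short script. It assumes a 64-bit PHP build, where 2⁵⁰ still fits in an integer; 50! and 50⁵⁰ overflow, so they end up as floats and are only approximate.

```php
<?php
$n = 50;

// 50! overflows a PHP int, so accumulate it in a float.
$factorial = 1.0;
for ($k = 2; $k <= $n; $k++) {
    $factorial *= $k;
}

$rows = [
    '1'    => 1,
    'lg n' => log($n, 2),
    'n'    => $n,
    'n^2'  => $n ** 2,
    'n^3'  => $n ** 3,
    '2^n'  => 2 ** $n,   // 1125899906842624, still an int on 64-bit builds
    'n!'   => $factorial,
    'n^n'  => $n ** $n,  // overflows int, so PHP returns a float
];

foreach ($rows as $name => $value) {
    // Print floats in short scientific-style notation, ints exactly.
    printf("%-5s %s\n", $name, is_float($value) ? sprintf('%.4g', $value) : $value);
}
```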

Some more notes on complexity §1

- Constant time is written as 1, but this covers any constant c
- Polynomial time covers all functions in O(nᶜ) for some constant c
- Everything in this presentation will be in polynomial time

2. SPL Data Structures

Credit where credit is due §2

The first three pictures in this section are from Wikipedia

SplDoublyLinkedList §2

[Figure: a doubly linked list with the nodes 12, 99 and 37]

- Superclass of SplStack and SplQueue
- SplDoublyLinkedList is not truly a doubly linked list; it behaves like a hashtable
- Usual doubly linked list time complexity
  - Insert before/after a node you already hold a reference to in Θ(1)
  - Lookup by scanning in O(n)
  - Access to beginning/end in Θ(1)
- SplDoublyLinkedList time complexity
  - Insert/delete by index in Θ(1)
  - Lookup by index in Θ(1)
  - Access to beginning/end in Θ(1)
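
A small sketch of the basic SplDoublyLinkedList operations, using the list from the figure (12, 99, 37). It only illustrates the API; it does not measure the complexities listed above.

```php
<?php
$list = new SplDoublyLinkedList();
$list->push(99);     // append at the end:    [99]
$list->push(37);     //                       [99, 37]
$list->unshift(12);  // prepend at the front: [12, 99, 37]

echo $list->bottom(), "\n"; // first element: 12
echo $list->top(), "\n";    // last element: 37
echo $list[1], "\n";        // access by index through ArrayAccess: 99

unset($list[1]);            // remove the element at index 1: [12, 37]
echo count($list), "\n";    // 2
```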

SplStack §2

- Subclass of SplDoublyLinkedList; adds no new operations
- Last-in, first-out (LIFO)
- Pop/push a value from/onto the top of the stack in Θ(1)

[Figure: push and pop at the top of a stack]
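
A short SplStack sketch: push() and pop() work on the top, top() peeks without removing, and foreach also walks the stack from the top down.

```php
<?php
$stack = new SplStack();
$stack->push('a');
$stack->push('b');
$stack->push('c');        // 'c' is now on top

echo $stack->top(), "\n"; // peek without removing: c
echo $stack->pop(), "\n"; // remove the top: c
echo $stack->pop(), "\n"; // b

// Iteration follows LIFO order as well.
$stack->push('d');
foreach ($stack as $item) {
    echo $item, ' ';      // d a
}
echo "\n";
```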

SplQueue §2

- Subclass of SplDoublyLinkedList; adds enqueue/dequeue operations
- First-in, first-out (FIFO)
- Read/dequeue an element from the front in Θ(1)
- Enqueue an element at the end in Θ(1)

[Figure: enqueue at the back and dequeue at the front of a queue]
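
A short SplQueue sketch: enqueue() adds at the end, dequeue() removes from the front, and bottom() (inherited from SplDoublyLinkedList) peeks at the front without removing it.

```php
<?php
$queue = new SplQueue();
$queue->enqueue('first');
$queue->enqueue('second');
$queue->enqueue('third');

echo $queue->bottom(), "\n";  // peek at the front without removing it: first
echo $queue->dequeue(), "\n"; // the oldest element leaves first: first
echo $queue->dequeue(), "\n"; // second
echo count($queue), "\n";     // 1: 'third' is still waiting
```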

Short excursion: trees §2

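[Figure: a binary tree with root 100; its children are 19 and 36; the next level holds 17 and 3 (under 19) and 25 and 1 (under 36); 2 and 7 are the children of 17]

- Consists of nodes (vertices) and directed edges
  - Each node always has in-degree 1
  - Except the root: always in-degree 0
- Previous property implies there are no cycles
- Binary tree: each node has at most two child nodes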
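
Binary trees are often stored without explicit node objects, as a plain array filled level by level. This sketch stores the tree from the figure that way; the indexing scheme (children of index i at 2i + 1 and 2i + 2) and the helper names children and parentIndex are assumptions for illustration. Every index except 0 has exactly one parent, which mirrors the in-degree property above.

```php
<?php
// The tree from the figure, stored level by level in a plain array.
// Assumed layout: the children of the node at index $i sit at 2*$i + 1 and 2*$i + 2.
$tree = [100, 19, 36, 17, 3, 25, 1, 2, 7];

// Hypothetical helper: the values of the (at most two) children of node $i.
function children(array $tree, int $i): array
{
    $result = [];
    foreach ([2 * $i + 1, 2 * $i + 2] as $child) {
        if ($child < count($tree)) {
            $result[] = $tree[$child];
        }
    }
    return $result;
}

// Hypothetical helper: the index of the single parent of node $i; the root has none.
function parentIndex(int $i): ?int
{
    return $i === 0 ? null : intdiv($i - 1, 2);
}

echo implode(', ', children($tree, 0)), "\n"; // 19, 36
echo implode(', ', children($tree, 3)), "\n"; // 2, 7
echo count(children($tree, 4)), "\n";         // 0: the node 3 is a leaf
echo $tree[parentIndex(8)], "\n";             // 17 is the parent of 7
```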

SplHeap, SplMaxHeap and SplMinHeap §2

[Figure: the same tree, which satisfies the max-heap property]

- A heap is a tree with the heap property: for all A and B, if B is a child node of A, then
  - val(A) ≥ val(B) for a max-heap: SplMaxHeap
  - val(A) ≤ val(B) for a min-heap: SplMinHeap
- Where val(A) denotes the value of node A
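
The heap property can be checked directly when the figure's tree is stored level by level in a plain array (children of index i at 2i + 1 and 2i + 2, an assumption of this sketch). The hypothetical isMaxHeap helper compares every node with its children; afterwards, SplMaxHeap and SplMinHeap are shown maintaining the property themselves.

```php
<?php
// Check the max-heap property: every parent value is >= each of its children's values.
function isMaxHeap(array $a): bool
{
    $n = count($a);
    for ($i = 0; $i < $n; $i++) {
        foreach ([2 * $i + 1, 2 * $i + 2] as $child) {
            if ($child < $n && $a[$i] < $a[$child]) {
                return false;
            }
        }
    }
    return true;
}

var_dump(isMaxHeap([100, 19, 36, 17, 3, 25, 1, 2, 7])); // the figure's tree: bool(true)
var_dump(isMaxHeap([1, 19, 36]));                       // root smaller than a child: bool(false)

// SplMaxHeap keeps the largest value at the root; repeated extraction yields descending order.
$heap = new SplMaxHeap();
foreach ([17, 100, 3, 36, 19] as $value) {
    $heap->insert($value);
}
echo $heap->top(), "\n"; // 100
$sorted = [];
while (!$heap->isEmpty()) {
    $sorted[] = $heap->extract();
}
echo implode(', ', $sorted), "\n"; // 100, 36, 19, 17, 3

// SplMinHeap keeps the smallest value at the root instead.
$min = new SplMinHeap();
foreach ([17, 100, 3] as $value) {
    $min->insert($value);
}
echo $min->top(), "\n"; // 3
```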

Heaps contd. §2

- SplHeap is an abstract superclass
- Implemented as a binary tree
- Access to the root element in Θ(1)
- Insertion and extraction of the root in O(lg n)
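
Because SplHeap is abstract, a subclass only has to implement compare(). Per the PHP manual, it returns a positive integer when the first value is greater (and so belongs closer to the root), 0 when they are equal, and a negative integer otherwise. LongestStringHeap is a made-up example that keeps the longest string at the root.

```php
<?php
// Hypothetical SplHeap subclass: the longest string sits at the root.
class LongestStringHeap extends SplHeap
{
    protected function compare($value1, $value2): int
    {
        // Positive when $value1 is longer, so it moves towards the root.
        return strlen($value1) - strlen($value2);
    }
}

$heap = new LongestStringHeap();
foreach (['heap', 'binary', 'a', 'priority'] as $word) {
    $heap->insert($word); // O(lg n) per insertion
}

echo $heap->top(), "\n";     // priority: reading the root is constant time
echo $heap->extract(), "\n"; // priority: removing the root is O(lg n)
```

The words have distinct lengths on purpose: the manual warns that equal elements end up in an arbitrary relative order.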

SplPriorityQueue §2

- Variant of SplMaxHeap: for all A and B, if B is a child node of A, then prio(A) ≥ prio(B)
- Where prio(A) denotes the priority of node A
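
A short SplPriorityQueue sketch. insert() takes a value and its priority, extract() returns the value with the highest priority, and setExtractFlags(SplPriorityQueue::EXTR_BOTH) makes extract() return both. The task names are invented for the example.

```php
<?php
$tasks = new SplPriorityQueue();
$tasks->insert('write slides', 2);         // insert($value, $priority)
$tasks->insert('fix production bug', 10);
$tasks->insert('answer e-mail', 1);

echo $tasks->extract(), "\n"; // fix production bug: highest priority first
echo $tasks->extract(), "\n"; // write slides

// Ask extract() to return the priority together with the value.
$tasks->setExtractFlags(SplPriorityQueue::EXTR_BOTH);
$last = $tasks->extract();
echo $last['data'], ' (priority ', $last['priority'], ")\n"; // answer e-mail (priority 1)
```

The priorities are distinct on purpose: the PHP manual notes that elements with equal priority are dequeued in no particular order.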

SplFixedArray §2

- Fixed-size array with numerical indices only
- Efficient OO array implementation
  - No hashing required for keys
  - Can make assumptions about array size
- Lookup, insertion, deletion in Θ(1) time
- Resize in Θ(n)
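
A short SplFixedArray sketch: the size is set up front, writes outside it throw a RuntimeException, and growing the array takes an explicit setSize() call.

```php
<?php
$array = new SplFixedArray(3); // the size is fixed up front; every slot starts as null
$array[0] = 'a';
$array[1] = 'b';
$array[2] = 'c';

$rejected = false;
try {
    $array[3] = 'd'; // index 3 lies outside the fixed size
} catch (RuntimeException $e) {
    $rejected = true;
    echo 'Rejected: ', $e->getMessage(), "\n";
}

$array->setSize(4); // explicit resize; the slide lists this as Θ(n)
$array[3] = 'd';
echo $array->getSize(), "\n"; // 4
print_r($array->toArray());   // converts back to a plain PHP array
```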

SplObjectStorage §2

- Storage container for objects
- Insertion, deletion in Θ(1)
- Verification of presence in Θ(1)
- Missing: set operations
  - Union, intersection, difference, etc.
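
A short SplObjectStorage sketch. Membership is based on object identity, so attaching the same object twice stores it once, while a different object with identical properties is a separate entry. The ArrayAccess syntax additionally associates data with an object.

```php
<?php
$storage = new SplObjectStorage();
$alice = new stdClass();
$bob = new stdClass();

$storage->attach($alice);
$storage->attach($bob);
$storage->attach($alice); // already present: no duplicate is stored

echo count($storage), "\n";         // 2
var_dump($storage->contains($bob)); // bool(true)

$storage->detach($bob);
var_dump($storage->contains($bob)); // bool(false)

// Objects can also act as keys that map to extra data.
$storage[$alice] = 'admin';
echo $storage[$alice], "\n"; // admin
```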

3. Concluding

Missing in PHP §3

- Set data structure
- Map/hashtable data structure
  - Does SplDoublyLinkedList satisfy this use case?
  - If yes: split it into two separate structures and make SplDoublyLinkedList a true doubly linked list
- Immutable data structures
  - Allow us to more easily emulate “pure” functions
  - Fewer bugs in your code due to the lack of mutable state

Closing remarks §3

- Use the SPL data structures!
- Choose them with care
- Reason about your code’s complexity

Questions §3

Questions?