Masterizing php data structure 102

Date post: 26-Jun-2015
Upload: patrick-allaert
We have all certainly learned data structures at school: arrays, lists, sets, stacks, queues (LIFO/FIFO), heaps, associative arrays, trees, … and what do we mostly use in PHP? The “array”! In most cases, we do everything and anything with it, but we stumble upon it when profiling code. During this session, we’ll learn again to use the structures appropriately, looking more closely at how to employ arrays, the SPL and other structures from PHP extensions as well.
Masterizing PHP Data Structure 102 Patrick Allaert PHPBenelux Conference Antwerp 2012
Transcript
Page 1: Masterizing php data structure 102

Masterizing PHP Data Structure 102

Patrick Allaert

PHPBenelux Conference Antwerp 2012

Page 2: Masterizing php data structure 102

About me

● Patrick Allaert

● Founder of Libereco

● Playing with PHP/Linux for 10+ years

● eZ Publish core developer

● Author of the APM PHP extension

● @patrick_allaert

● [email protected]

● http://github.com/patrickallaert/

● http://patrickallaert.blogspot.com/

Page 3: Masterizing php data structure 102

Masterizing = Mastering + Rising

Page 4: Masterizing php data structure 102

PHP native datatypes

● NULL (IS_NULL)

● Booleans (IS_BOOL)

● Integers (IS_LONG)

● Floating point numbers (IS_DOUBLE)

● Strings (IS_STRING)

● Arrays (IS_ARRAY, IS_CONSTANT_ARRAY)

● Objects (IS_OBJECT)

● Resources (IS_RESOURCE)

● Callable (IS_CALLABLE)

Page 5: Masterizing php data structure 102

Wikipedia datatypes

● 2-3-4 tree

●2-3 heap

●2-3 tree

●AA tree

●Abstract syntax tree

●(a,b)-tree

●Adaptive k-d tree

●Adjacency list

●Adjacency matrix

●AF-heap

●Alternating decision tree

●And-inverter graph

●And–or tree

●Array

●AVL tree

●Beap

●Bidirectional map

●Bin

●Binary decision diagram

●Binary heap

●Binary search tree

●Binary tree

●Binomial heap

●Bit array

●Bitboard

●Bit field

●Bitmap

●BK-tree

●Bloom filter

● Boolean

●Bounding interval hierarchy

●B sharp tree

●BSP tree

●B-tree

●B*-tree

●B+ tree

●B-trie

●Bx-tree

●Cartesian tree

●Char

●Circular buffer

●Compressed suffix array

●Container

●Control table

●Cover tree

●Ctrie

●Dancing tree

●D-ary heap

●Decision tree

●Deque

●Directed acyclic graph

●Directed graph

●Disjoint-set

●Distributed hash table

●Double

●Doubly connected edge list

●Doubly linked list

●Dynamic array

●Enfilade

●Enumerated type

●Expectiminimax tree

●Exponential tree

●Fenwick tree

●Fibonacci heap

●Finger tree

●Float

●FM-index

●Fusion tree

●Gap buffer

●Generalised suffix tree

●Graph

●Graph-structured stack

●Hash

●Hash array mapped trie

● Hashed array tree

● Hash list

● Hash table

● Hash tree

● Hash trie

● Heap

● Heightmap

● Hilbert R-tree

● Hypergraph

● Iliffe vector

● Image

● Implicit kd-tree

● Interval tree

● Int

● Judy array

● Kdb tree

● Kd-tree

● Koorde

● Leftist heap

● Lightmap

● Linear octree

● Link/cut tree

● Linked list

● Lookup table

●Map/Associative array/Dictionary

●Matrix

●Metric tree

●Minimax tree

●Min/max kd-tree

●M-tree

●Multigraph

●Multimap

●Multiset

●Octree

●Pagoda

●Pairing heap

●Parallel array

●Parse tree

●Plain old data structure

●Prefix hash tree

●Priority queue

●Propositional directed acyclic graph

●Quad-edge

●Quadtree

●Queap

●Queue

●Radix tree

●Randomized binary search tree

●Range tree

●Rapidly-exploring random tree

●Record (also called tuple or struct)

●Red-black tree

●Rope

●Routing table

●R-tree

●R* tree

●R+ tree

●Scapegoat tree

●Scene graph

●Segment tree

●Self-balancing binary search tree

●Self-organizing list

●Set

●Skew heap

●Skip list

●Soft heap

●Sorted array

●Spaghetti stack

●Sparse array

●Sparse matrix

●Splay tree

●SPQR-tree

●Stack

●String

●Suffix array

●Suffix tree

●Symbol table

●Syntax tree

●Tagged union (variant record, discriminated union, disjoint union)

●Tango tree

●Ternary heap

●Ternary search tree

●Threaded binary tree

●Top tree

●Treap

●Tree

●Trees

●Trie

●T-tree

●UB-tree

●Union

●Unrolled linked list

●Van Emde Boas tree

●Variable-length array

●VList

●VP-tree

●Weight-balanced tree

●Winged edge

●X-fast trie

●Xor linked list

●X-tree

●Y-fast trie

●Zero suppressed decision diagram

●Zipper

●Z-order

Page 6: Masterizing php data structure 102

Game: Can you recognize some structures?

Page 15: Masterizing php data structure 102

Array: PHP's untruthfulness

PHP “Arrays” are not true Arrays!

An array is typically implemented like this:

[Diagram: six data cells stored contiguously in memory]

Page 17: Masterizing php data structure 102

Array: PHP's untruthfulness

PHP “Arrays” can be iterated in both directions (reset(), next(), prev(), end()), exclusively with O(1) operations.

Implementation based on a Doubly Linked List (DLL):

[Diagram: five data nodes chained as a doubly linked list, from Head to Tail]

Enables List, Deque, Queue and Stack implementations
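As a small illustration of the bidirectional iteration described above, the following sketch moves the internal pointer of a plain PHP array with end(), prev(), reset() and next(). The variable names are made up for the example.

```php
<?php
// Walk a PHP array in both directions using its internal pointer.
$list = array('a', 'b', 'c');

$last     = end($list);   // move to the last element
$previous = prev($list);  // step backwards
$first    = reset($list); // jump back to the first element
$second   = next($list);  // step forwards

echo "$last $previous $first $second\n"; // prints "c b a b"
```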

Page 19: Masterizing php data structure 102

Array: PHP's untruthfulness

Elements of PHP “Arrays” are always accessible using a key (index).

Implementation based on a Hash Table:

[Diagram: buckets, each holding its data, chained as a doubly linked list from Head to Tail and referenced from a bucket pointers array (Bucket *) with slots 0 to nTableSize - 1]

Page 20: Masterizing php data structure 102

Array: PHP's untruthfulness

http://php.net/manual/en/language.types.array.php:

“This type is optimized for several different uses; it can be treated as an array, list (vector), hash table (an implementation of a map), dictionary, collection, stack, queue, and probably more.”

Page 21: Masterizing php data structure 102

Optimized for anything ≈ Optimized for nothing!

Page 22: Masterizing php data structure 102

Array: PHP's untruthfulness

● In C: 100 000 integers (using long on 64 bits => 8 bytes each) can be stored in 0.76 MB.

● In PHP: it will take ≅ 13.97 MB!

● A PHP variable (containing an integer) takes 48 bytes.

● The overhead of buckets for every “array” entry is about 96 bytes.

● More details: http://nikic.github.com/2011/12/12/How-big-are-PHP-arrays-really-Hint-BIG.html
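To check figures like these on your own setup, a rough measurement with memory_get_usage() may help. This is only a sketch: the numbers on the slide date from 2012, and the result depends heavily on the PHP version and platform, so no particular output is claimed here.

```php
<?php
// Rough measurement of what 100 000 integers cost in a PHP array.
// Exact numbers depend on the PHP version and platform.
$count = 100000;

$before = memory_get_usage();
$array = array();
for ($i = 0; $i < $count; $i++) {
    $array[] = $i;
}
$arrayBytes = memory_get_usage() - $before;

printf("total:     %.2f MB\n", $arrayBytes / 1048576);
printf("per entry: %.1f bytes (a C long needs 8)\n", $arrayBytes / $count);
```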

Page 23: Masterizing php data structure 102

Data Structure

Page 24: Masterizing php data structure 102

Structs (or records, tuples,...)

● A struct is a value containing other values which are typically accessed using a name.

● Example:
Person => firstName / lastName
ComplexNumber => realPart / imaginaryPart

Page 25: Masterizing php data structure 102

Structs – Using array

$person = array(
    "firstName" => "Patrick",
    "lastName" => "Allaert"
);

Page 26: Masterizing php data structure 102

Structs – Using a class

$person = new PersonStruct("Patrick", "Allaert");

Page 28: Masterizing php data structure 102

Structs – Using a class (Implementation)

class PersonStruct
{
    public $firstName;
    public $lastName;

    public function __construct($firstName, $lastName)
    {
        $this->firstName = $firstName;
        $this->lastName = $lastName;
    }

    public function __set($key, $value)
    {
        // a. Do nothing
        // b. trigger_error()
        // c. Throws an exception
    }
}
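One way option (c) could look in practice is sketched below. The class name StrictPersonStruct and the choice of LogicException are assumptions made for this example, not something the slides prescribe.

```php
<?php
// Hypothetical variant of the slide's PersonStruct using option (c):
// throw when code tries to set a property that was not declared.
class StrictPersonStruct
{
    public $firstName;
    public $lastName;

    public function __construct($firstName, $lastName)
    {
        $this->firstName = $firstName;
        $this->lastName  = $lastName;
    }

    // __set() is only invoked for inaccessible or undeclared properties.
    public function __set($key, $value)
    {
        throw new LogicException("Unknown field: $key");
    }
}

$person = new StrictPersonStruct("Patrick", "Allaert");
$person->lastName = "A."; // declared property: assigned normally

$rejected = false;
try {
    $person->lastname = "oops"; // typo in the field name
} catch (LogicException $e) {
    $rejected = true;
}
var_dump($rejected); // bool(true)
```

This is what makes the structure "rigid": a misspelled field fails loudly instead of silently creating a new property.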

Page 29: Masterizing php data structure 102

Structs – Pros and Cons

Array

+ Uses less memory (PHP < 5.4)

- Uses more memory (PHP = 5.4)

- No type hinting

- Flexible structure

+|- Less OO

+ Slightly faster

Class

- Uses more memory (PHP < 5.4)

+ Uses less memory (PHP = 5.4)

+ Type hinting possible

+ Rigid structure

+|- More OO

- Slightly slower

Page 31: Masterizing php data structure 102

“true” Arrays

● An array is a fixed size collection where elements are each identified by a numeric index.

[Diagram: six contiguous data cells, indexed 0 to 5]

Page 32: Masterizing php data structure 102

“true” Arrays – Using SplFixedArray

$array = new SplFixedArray(3);
$array[0] = 1; // or $array->offsetSet()
$array[1] = 2; // or $array->offsetSet()
$array[2] = 3; // or $array->offsetSet()
$array[0]; // gives 1
$array[1]; // gives 2
$array[2]; // gives 3
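The "fixed size" part can be seen by writing past the end. The sketch below assumes the usual SplFixedArray behaviour of throwing a RuntimeException for an out-of-range index and of requiring an explicit setSize() call to grow.

```php
<?php
$array = new SplFixedArray(3);
$array[0] = 1;

$outOfRange = false;
try {
    $array[3] = 4; // valid indexes are 0..2
} catch (RuntimeException $e) {
    $outOfRange = true;
}

$array->setSize(4); // resizing must be requested explicitly
$array[3] = 4;

var_dump($outOfRange);       // bool(true)
var_dump($array->getSize()); // int(4)
```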

Page 33: Masterizing php data structure 102

“true” Arrays – Pros and Cons

Array

- Uses more memory

+|- Less OO

+ Slightly faster

SplFixedArray

+ Uses less memory

+|- More OO

- Slightly slower

Page 35: Masterizing php data structure 102

Queues

● A queue is an ordered collection respecting First In, First Out (FIFO) order.

● Elements are inserted at one end and removed at the other.

[Diagram: a row of data cells; Enqueue adds an element at one end, Dequeue removes one from the other end]

Page 36: Masterizing php data structure 102

Queues – Using array

$queue = array();
$queue[] = 1; // or array_push()
$queue[] = 2; // or array_push()
$queue[] = 3; // or array_push()
array_shift($queue); // gives 1
array_shift($queue); // gives 2
array_shift($queue); // gives 3

Page 37: Masterizing php data structure 102

Queues – Using SplQueue

$queue = new SplQueue();
$queue[] = 1; // or $queue->enqueue()
$queue[] = 2; // or $queue->enqueue()
$queue[] = 3; // or $queue->enqueue()
$queue->dequeue(); // gives 1
$queue->dequeue(); // gives 2
$queue->dequeue(); // gives 3
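A typical use is draining a queue of pending work in arrival order. The job names below are invented for the illustration.

```php
<?php
$queue = new SplQueue();
foreach (array('first', 'second', 'third') as $job) {
    $queue->enqueue($job);
}

$processed = array();
while (!$queue->isEmpty()) {
    $processed[] = $queue->dequeue(); // removes from the front
}

echo implode(', ', $processed), "\n"; // prints "first, second, third"
```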

Page 38: Masterizing php data structure 102

Queues – Pros and Cons

Array

- Uses more memory (overhead / entry: 96 bytes)

- No type hinting

+|- Less OO

SplQueue

+ Uses less memory (overhead / entry: 48 bytes)

+ Type hinting possible

+|- More OO

Page 40: Masterizing php data structure 102

Stacks

● A stack is an ordered collection respecting Last In, First Out (LIFO) order.

● Elements are inserted and removed on the same end.

[Diagram: a row of data cells; Push and Pop both operate on the same end]

Page 41: Masterizing php data structure 102

Stacks – Using array

$stack = array();
$stack[] = 1; // or array_push()
$stack[] = 2; // or array_push()
$stack[] = 3; // or array_push()
array_pop($stack); // gives 3
array_pop($stack); // gives 2
array_pop($stack); // gives 1

Page 42: Masterizing php data structure 102

Stacks – Using SplStack

$stack = new SplStack();
$stack[] = 1; // or $stack->push()
$stack[] = 2; // or $stack->push()
$stack[] = 3; // or $stack->push()
$stack->pop(); // gives 3
$stack->pop(); // gives 2
$stack->pop(); // gives 1
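Besides pop(), an SplStack can be inspected without being emptied. The sketch below peeks with top() and relies on SplStack's default iteration mode, which walks the elements in LIFO order and keeps them in place.

```php
<?php
$stack = new SplStack();
$stack->push(1);
$stack->push(2);
$stack->push(3);

$top = $stack->top(); // peek at the last pushed value without removing it

$order = array();
foreach ($stack as $value) { // iterates in LIFO order by default
    $order[] = $value;
}

echo $top, ' / ', implode(', ', $order), "\n"; // prints "3 / 3, 2, 1"
```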

Page 43: Masterizing php data structure 102

Stacks – Pros and Cons

Array

- Uses more memory (overhead / entry: 96 bytes)

- No type hinting

+|- Less OO

SplStack

+ Uses less memory (overhead / entry: 48 bytes)

+ Type hinting possible

+|- More OO

Page 44: Masterizing php data structure 102

Sets

● A set is a collection with no particular ordering especially suited for testing the membership of a value against a collection or to perform union/intersection/complement operations between them.

Page 47: Masterizing php data structure 102

Sets – Using array

$set = array();
$set[] = 1;
$set[] = 2;
$set[] = 3;

in_array(2, $set); // true
in_array(5, $set); // false

array_merge($set1, $set2); // union
array_intersect($set1, $set2); // intersection
array_diff($set1, $set2); // complement

True performance killers!

Page 49: Masterizing php data structure 102

Sets – Using array (simple types)

● Remember that PHP Array keys can be integers or strings only!

$set = array();
$set[1] = true; // Any dummy value is good,
$set[2] = true; // except NULL: isset() returns false for NULL values!
$set[3] = true;

isset($set[2]); // true
isset($set[5]); // false

$set1 + $set2; // union
array_intersect_key($set1, $set2); // intersection
array_diff_key($set1, $set2); // complement
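The key-based operations above can be tried on two small sets. In this sketch array_fill_keys() is used to turn plain value lists into key-based sets; the concrete values are arbitrary.

```php
<?php
// Turn plain value lists into key-based sets; the value is a dummy.
$set1 = array_fill_keys(array(1, 2, 3), true);
$set2 = array_fill_keys(array(3, 4), true);

$union        = array_keys($set1 + $set2);
$intersection = array_keys(array_intersect_key($set1, $set2));
$complement   = array_keys(array_diff_key($set1, $set2));

echo implode(',', $union), "\n";        // prints "1,2,3,4"
echo implode(',', $intersection), "\n"; // prints "3"
echo implode(',', $complement), "\n";   // prints "1,2"
```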

Page 50: Masterizing php data structure 102

Sets – Using array (objects)

$set = array();
$set[spl_object_hash($object1)] = $object1;
$set[spl_object_hash($object2)] = $object2;
$set[spl_object_hash($object3)] = $object3;

isset($set[spl_object_hash($object2)]); // true
isset($set[spl_object_hash($object5)]); // false

$set1 + $set2; // union
array_intersect_key($set1, $set2); // intersection
array_diff_key($set1, $set2); // complement

Page 51: Masterizing php data structure 102

Sets – Using SplObjectStorage (objects)

$set = new SplObjectStorage();
$set->attach($object1); // or $set[$object1] = null;
$set->attach($object2); // or $set[$object2] = null;
$set->attach($object3); // or $set[$object3] = null;

isset($set[$object2]); // true
isset($set[$object5]); // false

$set1->addAll($set2); // union
$set1->removeAllExcept($set2); // intersection
$set1->removeAll($set2); // complement

Page 52: Masterizing php data structure 102

Sets – Using QuickHash (int)

● No union/intersection/complement operations (yet?)

● Yummy features like (loadFrom|saveTo)(String|File)

$set = new QuickHashIntSet(64, QuickHashIntSet::CHECK_FOR_DUPES);
$set->add(1);
$set->add(2);
$set->add(3);

$set->exists(2); // true
$set->exists(5); // false

Page 53: Masterizing php data structure 102

Sets – With finite possible values

// Values of PHP's built-in error constants, written as they would be defined:
define("E_ERROR", 1); // or 1<<0
define("E_WARNING", 2); // or 1<<1
define("E_PARSE", 4); // or 1<<2
define("E_NOTICE", 8); // or 1<<3

$set = 0;
$set |= E_ERROR;
$set |= E_WARNING;
$set |= E_PARSE;

$set & E_ERROR; // non-zero (truthy)
$set & E_NOTICE; // 0 (falsy)

$set1 | $set2; // union
$set1 & $set2; // intersection
$set1 & ~$set2; // complement ($set1 ^ $set2 would give the symmetric difference)
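The same idea with application-defined flags, including removing a member from the set. The constant names are hypothetical and chosen only for this sketch.

```php
<?php
// Hypothetical flags for illustration; each one occupies a single bit.
const CAN_READ    = 1 << 0;
const CAN_WRITE   = 1 << 1;
const CAN_EXECUTE = 1 << 2;

$set = CAN_READ | CAN_WRITE; // add two members

$hasWrite   = ($set & CAN_WRITE) !== 0;   // membership test
$hasExecute = ($set & CAN_EXECUTE) !== 0;

$set &= ~CAN_WRITE; // remove a member
$stillHasWrite = ($set & CAN_WRITE) !== 0;

var_dump($hasWrite, $hasExecute, $stillHasWrite);
// bool(true) bool(false) bool(false)
```

Comparing the masked value against 0 gives a real boolean, which avoids surprises when the result is later compared with ===.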

Page 54: Masterizing php data structure 102

Sets – With finite possible values (function features)

Instead of:

function remove($path, $files = true, $directories = true, $links = true, $executable = true)
{
    if (!$files && is_file($path))
        return false;
    if (!$directories && is_dir($path))
        return false;
    if (!$links && is_link($path))
        return false;
    if (!$executable && is_executable($path))
        return false;
    // ...
}

remove("/tmp/removeMe", true, false, true, false); // WTF ?!

Page 55: Masterizing php data structure 102

Sets – With finite possible values (function features)

Use:

define("REMOVE_FILES", 1 << 0);
define("REMOVE_DIRS", 1 << 1);
define("REMOVE_LINKS", 1 << 2);
define("REMOVE_EXEC", 1 << 3);
define("REMOVE_ALL", ~0); // Setting all bits

function remove($path, $options = REMOVE_ALL)
{
    if (~$options & REMOVE_FILES && is_file($path))
        return false;
    if (~$options & REMOVE_DIRS && is_dir($path))
        return false;
    if (~$options & REMOVE_LINKS && is_link($path))
        return false;
    if (~$options & REMOVE_EXEC && is_executable($path))
        return false;
    // ...
}

remove("/tmp/removeMe", REMOVE_FILES | REMOVE_LINKS); // Much better :)

Page 56: Masterizing php data structure 102

Sets: Conclusions

● Use the key and not the value when using PHP Arrays.

● Use QuickHash for sets of integers if possible.

● Use SplObjectStorage as soon as you are playing with objects.

● Don't use array_unique() when you need a set!

Page 57: Masterizing php data structure 102

Bloom filters

● A Bloom filter is a space-efficient probabilistic data structure used to test whether an element is a member of a set.

● False positives are possible, but false negatives are not!

Page 58: Masterizing php data structure 102

Bloom filters – Using bloomy

// BloomFilter::__construct(int capacity [, double error_rate [, int random_seed ] ])
$bloomFilter = new BloomFilter(10000, 0.001);

$bloomFilter->add("An element");

$bloomFilter->has("An element"); // true for sure
$bloomFilter->has("Foo"); // false, most probably
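For readers without the bloomy extension, the principle can be sketched in plain PHP. The class below is hypothetical and deliberately simplified: it is not how bloomy is implemented, it stores bits in a PHP array (so it is not space-efficient), and deriving positions from salted crc32() values is just one assumption among many possible hashing schemes.

```php
<?php
// Hypothetical, simplified Bloom filter; not the implementation used by bloomy.
class TinyBloomFilter
{
    private $bits;      // one boolean per bit (simple, not space-efficient)
    private $size;      // number of bits
    private $hashCount; // number of hash functions

    public function __construct($size = 1024, $hashCount = 3)
    {
        $this->size = $size;
        $this->hashCount = $hashCount;
        $this->bits = array_fill(0, $size, false);
    }

    // Derive $hashCount bit positions by salting crc32() with the hash index.
    private function positions($element)
    {
        $positions = array();
        for ($i = 0; $i < $this->hashCount; $i++) {
            // Mask the sign bit so the result is non-negative on 32-bit builds.
            $positions[] = (crc32($i . ':' . $element) & 0x7fffffff) % $this->size;
        }
        return $positions;
    }

    public function add($element)
    {
        foreach ($this->positions($element) as $position) {
            $this->bits[$position] = true;
        }
    }

    // true means "possibly present"; false means "definitely absent".
    public function has($element)
    {
        foreach ($this->positions($element) as $position) {
            if (!$this->bits[$position]) {
                return false;
            }
        }
        return true;
    }
}

$filter = new TinyBloomFilter();
$filter->add("An element");
var_dump($filter->has("An element")); // bool(true)
```

Because add() only ever sets bits, an added element can never be reported as absent, which is why false negatives are impossible while false positives remain possible.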

Page 59: Masterizing php data structure 102

Maps

● A map is a collection of key/value pairs where all keys are unique.

Page 60: Masterizing php data structure 102

Maps – Using array

● Don't use array_merge() on maps.

$map = array();
$map["ONE"] = 1;
$map["TWO"] = 2;
$map["THREE"] = 3;

// Merging maps:
array_merge($map1, $map2); // SLOW!
$map2 + $map1; // Fast :)
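Speed is not the only difference between the two: array_merge() renumbers integer keys, while the + operator preserves them. A small sketch with made-up data:

```php
<?php
$map1 = array(10 => 'ten', 'k' => 'old');
$map2 = array(10 => 'dix', 'k' => 'new');

// Integer keys are renumbered, so both values for key 10 survive:
$merged = array_merge($map1, $map2);
// array(0 => 'ten', 'k' => 'new', 1 => 'dix')

// Keys are preserved, and the left operand wins on conflicts:
$united = $map2 + $map1;
// array(10 => 'dix', 'k' => 'new')

var_dump($merged, $united);
```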

Page 61: Masterizing php data structure 102

Multikey Maps – Using array

● Don't use array_merge() on maps.

$map = array();
$map["ONE"] = 1;
$map["UN"] =& $map["ONE"];
$map["UNO"] =& $map["ONE"];
$map["TWO"] = 2;
$map["DEUX"] =& $map["TWO"];
$map["DUE"] =& $map["TWO"];

$map["UNO"] = "once";
$map["DEUX"] = "twice";

var_dump($map);
/*
array(6) {
  ["ONE"] => &string(4) "once"
  ["UN"] => &string(4) "once"
  ["UNO"] => &string(4) "once"
  ["TWO"] => &string(5) "twice"
  ["DEUX"] => &string(5) "twice"
  ["DUE"] => &string(5) "twice"
}
*/

Page 62: Masterizing php data structure 102

Heap

● A heap is a tree-based structure in which every element is ordered relative to its children: in a max-heap the largest key is at the top, and the smallest ones end up among the leaves.

Page 64: Masterizing php data structure 102

Heap – Using array

$heap = array();
$heap[] = 3;
sort($heap);
$heap[] = 1;
sort($heap);
$heap[] = 2;
sort($heap);

Page 65: Masterizing php data structure 102

Heap – Using Spl(Min|Max)Heap

$heap = new SplMinHeap;
$heap->insert(3);
$heap->insert(1);
$heap->insert(2);
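Reading the values back shows the ordering the heap maintains. In this sketch top() peeks at the current minimum and extract() removes it.

```php
<?php
$heap = new SplMinHeap();
foreach (array(3, 1, 2) as $value) {
    $heap->insert($value);
}

$smallest = $heap->top(); // peek at the minimum without removing it

$sorted = array();
while (!$heap->isEmpty()) {
    $sorted[] = $heap->extract(); // removes the current minimum
}

echo $smallest, ' / ', implode(', ', $sorted), "\n"; // prints "1 / 1, 2, 3"
```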

Page 66: Masterizing php data structure 102

Heaps: Conclusions

● MUCH faster than having to re-sort() an array at every insertion.

● If you don't need the collection to be sorted at every single step and can insert all the data at once and then sort() it, an array is a much better/faster approach.

● SplPriorityQueue is very similar: consider it the same as SplHeap, except that the ordering is based on a priority (the key) rather than on the value.
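A minimal SplPriorityQueue sketch, with invented task names and priorities, shows values coming out by descending priority regardless of insertion order:

```php
<?php
$queue = new SplPriorityQueue();
$queue->insert('write report', 1);
$queue->insert('fix outage', 10);
$queue->insert('answer mail', 5);

$order = array();
while (!$queue->isEmpty()) {
    $order[] = $queue->extract(); // highest priority first
}

echo implode(', ', $order), "\n"; // prints "fix outage, answer mail, write report"
```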

Page 69: Masterizing php data structure 102

Other related projects

● SPL Types: Various types implemented as object: SplInt, SplFloat, SplEnum, SplBool and SplString http://pecl.php.net/package/SPL_Types

● Judy: Sparse dynamic arrays implementation http://pecl.php.net/package/Judy

● Weakref: Weak references implementation. Provides a gateway to an object without preventing that object from being collected by the garbage collector.

Page 72: Masterizing php data structure 102

Conclusions

● Use appropriate data structure. It will keep your code clean and fast.

● Think about the time and space complexity involved by your algorithms.

● Name your variables accordingly: use “Map”, “Set”, “List”, “Queue”,... to describe them instead of using something like: $ordersArray.

Page 73: Masterizing php data structure 102

Questions?

Page 74: Masterizing php data structure 102

Thanks

● Don't forget to rate this talk on https://joind.in/4753

Page 75: Masterizing php data structure 102

Photo Credits

● Northstar Ski Jump: http://www.flickr.com/photos/renotahoe/5593248965

● Tuned car: http://www.flickr.com/photos/gioxxswall/5783867752

● London Eye Structure: http://www.flickr.com/photos/photographygal123/4883546484

● Cigarette: http://www.flickr.com/photos/superfantastic/166215927

● Heap structure: http://en.wikipedia.org/wiki/File:Max-Heap.svg

