Date posted: 26-Jun-2015 | Category: Technology | Uploaded by: patrick-allaert
Masterizing PHP Data Structure 102
Patrick Allaert
PHPBenelux Conference Antwerp 2012
About me
● Patrick Allaert
● Founder of Libereco
● Playing with PHP/Linux for 10+ years
● eZ Publish core developer
● Author of the APM PHP extension
● @patrick_allaert
● [email protected]
● http://github.com/patrickallaert/
● http://patrickallaert.blogspot.com/
Masterizing = Mastering + Rising
PHP native datatypes
● NULL (IS_NULL)
● Booleans (IS_BOOL)
● Integers (IS_LONG)
● Floating point numbers (IS_DOUBLE)
● Strings (IS_STRING)
● Arrays (IS_ARRAY, IS_CONSTANT_ARRAY)
● Objects (IS_OBJECT)
● Resources (IS_RESOURCE)
● Callable (IS_CALLABLE)
Wikipedia datatypes
●2-3-4 tree
●2-3 heap
●2-3 tree
●AA tree
●Abstract syntax tree
●(a,b)-tree
●Adaptive k-d tree
●Adjacency list
●Adjacency matrix
●AF-heap
●Alternating decision tree
●And-inverter graph
●And–or tree
●Array
●AVL tree
●Beap
●Bidirectional map
●Bin
●Binary decision diagram
●Binary heap
●Binary search tree
●Binary tree
●Binomial heap
●Bit array
●Bitboard
●Bit field
●Bitmap
●BK-tree
●Bloom filter
● Boolean
●Bounding interval hierarchy
●B sharp tree
●BSP tree
●B-tree
●B*-tree
●B+ tree
●B-trie
●Bx-tree
●Cartesian tree
●Char
●Circular buffer
●Compressed suffix array
●Container
●Control table
●Cover tree
●Ctrie
●Dancing tree
●D-ary heap
●Decision tree
●Deque
●Directed acyclic graph
●Directed graph
●Disjoint-set
●Distributed hash table
●Double
●Doubly connected edge list
●Doubly linked list
●Dynamic array
●Enfilade
●Enumerated type
●Expectiminimax tree
●Exponential tree
●Fenwick tree
●Fibonacci heap
●Finger tree
●Float
●FM-index
●Fusion tree
●Gap buffer
●Generalised suffix tree
●Graph
●Graph-structured stack
●Hash
●Hash array mapped trie
● Hashed array tree
● Hash list
● Hash table
● Hash tree
● Hash trie
● Heap
● Heightmap
● Hilbert R-tree
● Hypergraph
● Iliffe vector
● Image
● Implicit kd-tree
● Interval tree
● Int
● Judy array
● Kdb tree
● Kd-tree
● Koorde
● Leftist heap
● Lightmap
● Linear octree
● Link/cut tree
● Linked list
● Lookup table
●Map/Associative array/Dictionary
●Matrix
●Metric tree
●Minimax tree
●Min/max kd-tree
●M-tree
●Multigraph
●Multimap
●Multiset
●Octree
●Pagoda
●Pairing heap
●Parallel array
●Parse tree
●Plain old data structure
●Prefix hash tree
●Priority queue
●Propositional directed acyclic graph
●Quad-edge
●Quadtree
●Queap
●Queue
●Radix tree
●Randomized binary search tree
●Range tree
●Rapidly-exploring random tree
●Record (also called tuple or struct)
●Red-black tree
●Rope
●Routing table
●R-tree
●R* tree
●R+ tree
●Scapegoat tree
●Scene graph
●Segment tree
●Self-balancing binary search tree
●Self-organizing list
●Set
●Skew heap
●Skip list
●Soft heap
●Sorted array
●Spaghetti stack
●Sparse array
●Sparse matrix
●Splay tree
●SPQR-tree
●Stack
●String
●Suffix array
●Suffix tree
●Symbol table
●Syntax tree
●Tagged union (variant record, discriminated union, disjoint union)
●Tango tree
●Ternary heap
●Ternary search tree
●Threaded binary tree
●Top tree
●Treap
●Tree
●Trees
●Trie
●T-tree
●UB-tree
●Union
●Unrolled linked list
●Van Emde Boas tree
●Variable-length array
●VList
●VP-tree
●Weight-balanced tree
●Winged edge
●X-fast trie
●Xor linked list
●X-tree
●Y-fast trie
●Zero suppressed decision diagram
●Zipper
●Z-order
Game: Can you recognize some structures?
Array: PHP's untruthfulness
PHP “Arrays” are not true Arrays!
An array is typically implemented like this:
[Diagram: Data | Data | Data | Data | Data | Data, stored side by side in contiguous memory]
Array: PHP's untruthfulness
PHP “Arrays” can be iterated in both directions (reset(), next(), prev(), end()), using only O(1) operations.
Implementation based on a Doubly Linked List (DLL):
[Diagram: five Data nodes linked to each other in both directions, with Head and Tail pointers at either end]
Enables List, Deque, Queue and Stack implementations
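A small sketch of the bidirectional walk described above, using PHP's built-in internal-pointer functions; the `$list` contents are arbitrary. Testing `key()` against `null` (rather than the returned value) keeps the loops correct even if an element is itself `false`.

```php
<?php
// Walk a PHP array forwards and backwards using its internal pointer.
// reset(), next(), prev() and end() each move the pointer in O(1).
$list = ['a' => 1, 'b' => 2, 'c' => 3];

$forward = [];
for ($value = reset($list); key($list) !== null; $value = next($list)) {
    $forward[] = $value;
}

$backward = [];
for ($value = end($list); key($list) !== null; $value = prev($list)) {
    $backward[] = $value;
}

echo implode(', ', $forward), "\n";  // 1, 2, 3
echo implode(', ', $backward), "\n"; // 3, 2, 1
```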
Array: PHP's untruthfulness
Elements of PHP “Arrays” are always accessible using a key (index).
Implementation based on a Hash Table:
[Diagram: a bucket pointer array (Bucket * slots indexed 0, 1, 2, ... up to nTableSize - 1) points into Buckets; the Buckets, each holding Data, are also chained as a doubly linked list from Head to Tail]
Array: PHP's untruthfulness
http://php.net/manual/en/language.types.array.php:
“This type is optimized for several different uses; it can be treated as an array, list (vector), hash table (an implementation of a map), dictionary, collection, stack, queue, and probably more.”
Optimized for anything ≈ Optimized for nothing!
Array: PHP's untruthfulness
● In C: 100 000 integers (using long on 64-bit => 8 bytes each) can be stored in 0.76 MB.
● In PHP: it will take 13.97 MB!
● A PHP variable (containing an integer) takes 48 bytes.
● The bucket overhead for every “array” entry is about 96 bytes.
● More details: http://nikic.github.com/2011/12/12/How-big-are-PHP-arrays-really-Hint-BIG.html
Data Structure
Structs (or records, tuples,...)
● A struct is a value containing other values which are typically accessed using a name.
● Examples:
Person => firstName / lastName
ComplexNumber => realPart / imaginaryPart
Structs – Using array
$person = array(
    "firstName" => "Patrick",
    "lastName" => "Allaert"
);
Structs – Using a class
$person = new PersonStruct( "Patrick", "Allaert");
Structs – Using a class (Implementation)
class PersonStruct
{
    public $firstName;
    public $lastName;
    public function __construct($firstName, $lastName)
    {
        $this->firstName = $firstName;
        $this->lastName = $lastName;
    }
}
Structs – Using a class (Implementation)
class PersonStruct
{
    public $firstName;
    public $lastName;
    public function __construct($firstName, $lastName)
    {
        $this->firstName = $firstName;
        $this->lastName = $lastName;
    }
    public function __set($key, $value)
    {
        // a. Do nothing
        // b. trigger_error()
        // c. Throw an exception
    }
}
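As a sketch of option (c), here is a hypothetical `StrictPersonStruct` (assuming PHP 8) whose `__set()` throws a `LogicException`, so a misspelled property name fails loudly instead of silently creating a dynamic property. The class and exception message are illustrative choices, not part of the talk.

```php
<?php
// Hypothetical variant of PersonStruct implementing option (c).
class StrictPersonStruct
{
    public $firstName;
    public $lastName;

    public function __construct($firstName, $lastName)
    {
        $this->firstName = $firstName;
        $this->lastName = $lastName;
    }

    // __set() is only invoked for undeclared (or inaccessible)
    // properties, so writes to $firstName/$lastName are unaffected.
    public function __set($key, $value)
    {
        throw new LogicException("Cannot set undeclared property '$key'");
    }
}

$person = new StrictPersonStruct("Patrick", "Allaert");
$person->firstName = "Pat"; // fine: declared property

try {
    $person->firstNmae = "typo"; // misspelled property name
} catch (LogicException $e) {
    echo $e->getMessage(), "\n";
}
```

Option (b) would call `trigger_error()` in the same place instead; option (a) simply leaves the body empty so the write is ignored.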
Structs – Pros and Cons
Array
+ Uses less memory (PHP < 5.4)
- Uses more memory (PHP >= 5.4)
- No type hinting
- Flexible structure
+|- Less OO
+ Slightly faster
Class
- Uses more memory (PHP < 5.4)
+ Uses less memory (PHP >= 5.4)
+ Type hinting possible
+ Rigid structure
+|- More OO
- Slightly slower
“true” Arrays
● An array is a fixed-size collection where each element is identified by a numeric index.
[Diagram: six Data cells in contiguous memory, indexed 0 to 5]
“true” Arrays – Using SplFixedArray
$array = new SplFixedArray(3);
$array[0] = 1; // or $array->offsetSet()
$array[1] = 2; // or $array->offsetSet()
$array[2] = 3; // or $array->offsetSet()
$array[0]; // gives 1
$array[1]; // gives 2
$array[2]; // gives 3
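A short sketch of what "fixed size" means in practice with `SplFixedArray`: writing outside `[0, size - 1]` throws a `RuntimeException` rather than growing the structure, and resizing has to be explicit via `setSize()`. The values stored are arbitrary.

```php
<?php
// SplFixedArray only accepts integer indexes in [0, size - 1].
$array = new SplFixedArray(3);
$array[0] = 1;
$array[1] = 2;
$array[2] = 3;

try {
    $array[3] = 4; // out of range: rejected instead of appended
} catch (RuntimeException $e) {
    echo "Rejected: ", $e->getMessage(), "\n";
}

// Growing the array must be requested explicitly.
$array->setSize(4);
$array[3] = 4;

echo $array->getSize(), "\n"; // 4
```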
“true” Arrays – Pros and Cons
Array
- Uses more memory
+|- Less OO
+ Slightly faster
SplFixedArray
+ Uses less memory
+|- More OO
- Slightly slower
Queues
● A queue is an ordered collection respecting First In, First Out (FIFO) order.
● Elements are inserted at one end and removed at the other.
[Diagram: a row of Data cells; Enqueue adds Data at one end, Dequeue removes Data from the other end]
Queues – Using array
$queue = array();
$queue[] = 1; // or array_push()
$queue[] = 2; // or array_push()
$queue[] = 3; // or array_push()
array_shift($queue); // gives 1
array_shift($queue); // gives 2
array_shift($queue); // gives 3
Queues – Using SplQueue
$queue = new SplQueue();
$queue[] = 1; // or $queue->enqueue()
$queue[] = 2; // or $queue->enqueue()
$queue[] = 3; // or $queue->enqueue()
$queue->dequeue(); // gives 1
$queue->dequeue(); // gives 2
$queue->dequeue(); // gives 3
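A typical FIFO processing loop with `SplQueue`, sketched below. Note that `array_shift()` on a list has to renumber the remaining keys (O(n) per call), whereas `SplQueue`, being a doubly linked list, dequeues in O(1). The task names and the "follow-up task" rule are hypothetical, only there to show that newly enqueued items go to the back.

```php
<?php
// Items are handled strictly in arrival order.
$queue = new SplQueue();
$queue->enqueue('task-1');
$queue->enqueue('task-2');
$queue->enqueue('task-3');

$processed = [];
while (!$queue->isEmpty()) {
    $task = $queue->dequeue();
    $processed[] = $task;
    // Hypothetical rule: task-1 spawns a follow-up task at the back.
    if ($task === 'task-1') {
        $queue->enqueue('task-1b');
    }
}

echo implode(' ', $processed), "\n"; // task-1 task-2 task-3 task-1b
```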
Queues – Pros and Cons
Array
- Uses more memory (overhead / entry: 96 bytes)
- No type hinting
+|- Less OO
SplQueue
+ Uses less memory (overhead / entry: 48 bytes)
+ Type hinting possible
+|- More OO
Stacks
● A stack is an ordered collection respecting Last In, First Out (LIFO) order.
● Elements are inserted and removed on the same end.
[Diagram: a row of Data cells; Push adds Data and Pop removes Data at the same end]
Stacks – Using array
$stack = array();
$stack[] = 1; // or array_push()
$stack[] = 2; // or array_push()
$stack[] = 3; // or array_push()
array_pop($stack); // gives 3
array_pop($stack); // gives 2
array_pop($stack); // gives 1
Stacks – Using SplStack
$stack = new SplStack();
$stack[] = 1; // or $stack->push()
$stack[] = 2; // or $stack->push()
$stack[] = 3; // or $stack->push()
$stack->pop(); // gives 3
$stack->pop(); // gives 2
$stack->pop(); // gives 1
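A classic LIFO use case, sketched with `SplStack`: checking that brackets are balanced. Each opening bracket is pushed, and each closing bracket must match the one on top of the stack. The function name `isBalanced` is hypothetical.

```php
<?php
function isBalanced(string $text): bool
{
    $pairs = [')' => '(', ']' => '[', '}' => '{'];
    $stack = new SplStack();

    foreach (str_split($text) as $char) {
        if (in_array($char, $pairs, true)) {
            $stack->push($char);                 // opening bracket
        } elseif (isset($pairs[$char])) {        // closing bracket
            if ($stack->isEmpty() || $stack->pop() !== $pairs[$char]) {
                return false;
            }
        }
    }

    // Leftover opening brackets mean the text is unbalanced.
    return $stack->isEmpty();
}

var_dump(isBalanced('f(a[0], {b: 1})')); // bool(true)
var_dump(isBalanced('(]'));              // bool(false)
```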
Stacks – Pros and Cons
Array
- Uses more memory (overhead / entry: 96 bytes)
- No type hinting
+|- Less OO
SplStack
+ Uses less memory (overhead / entry: 48 bytes)
+ Type hinting possible
+|- More OO
Sets
● A set is a collection with no particular ordering, especially suited to testing whether a value is a member of it and to performing union/intersection/complement operations between sets.
[Diagram: unordered Data elements]
Sets – Using array
$set = array();
$set[] = 1;
$set[] = 2;
$set[] = 3;
in_array(2, $set); // true
in_array(5, $set); // false
array_merge($set1, $set2); // union (duplicates are kept!)
array_intersect($set1, $set2); // intersection
array_diff($set1, $set2); // complement
True performance killers!
Sets – Using array (simple types)
$set = array();
$set[1] = true; // Any dummy value works,
$set[2] = true; // except NULL (isset() returns false for NULL)!
$set[3] = true;
isset($set[2]); // true
isset($set[5]); // false
$set1 + $set2; // union
array_intersect_key($set1, $set2); // intersection
array_diff_key($set1, $set2); // complement
● Remember that PHP Array keys can be integers or strings only!
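A runnable sketch of the key-based set technique for integers/strings. Building the set with `array_fill_keys()` is one convenient choice (assumption: any non-NULL dummy value works, `true` is used here); duplicates in the input list collapse because they map to the same key.

```php
<?php
// Elements are stored as keys: membership is a hash lookup, and
// union/intersection/difference compare keys, not values.
$set1 = array_fill_keys([3, 1, 2, 3, 1], true); // keys 3, 1, 2
$set2 = array_fill_keys([2, 3, 4], true);

$union        = $set1 + $set2;                     // keys 3, 1, 2, 4
$intersection = array_intersect_key($set1, $set2); // keys 3, 2
$difference   = array_diff_key($set1, $set2);      // keys 1

var_dump(isset($set1[2])); // bool(true)
var_dump(isset($set1[5])); // bool(false)
echo implode(',', array_keys($union)), "\n"; // 3,1,2,4
```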
Sets – Using array (objects)
$set = array();
$set[spl_object_hash($object1)] = $object1;
$set[spl_object_hash($object2)] = $object2;
$set[spl_object_hash($object3)] = $object3;
isset($set[spl_object_hash($object2)]); // true
isset($set[spl_object_hash($object5)]); // false
$set1 + $set2; // union
array_intersect_key($set1, $set2); // intersection
array_diff_key($set1, $set2); // complement
Sets – Using SplObjectStorage (objects)
$set = new SplObjectStorage();
$set->attach($object1); // or $set[$object1] = null;
$set->attach($object2); // or $set[$object2] = null;
$set->attach($object3); // or $set[$object3] = null;
isset($set[$object2]); // true
isset($set[$object5]); // false
$set1->addAll($set2); // union
$set1->removeAllExcept($set2); // intersection
$set1->removeAll($set2); // complement
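A self-contained version of the `SplObjectStorage` set operations, using plain `stdClass` objects. Since `addAll()`, `removeAllExcept()` and `removeAll()` modify the storage they are called on, this sketch works on clones so the original sets stay intact (a design choice for the example, not a requirement).

```php
<?php
// Membership in SplObjectStorage is by object identity.
$a = new stdClass();
$b = new stdClass();
$c = new stdClass();

$set1 = new SplObjectStorage();
$set1->attach($a);
$set1->attach($b);

$set2 = new SplObjectStorage();
$set2->attach($b);
$set2->attach($c);

var_dump($set1->contains($b)); // bool(true)
var_dump($set1->contains($c)); // bool(false)

$union = clone $set1;
$union->addAll($set2);                 // {a, b, c}

$intersection = clone $set1;
$intersection->removeAllExcept($set2); // {b}

$difference = clone $set1;
$difference->removeAll($set2);         // {a}

echo count($union), ' ', count($intersection), ' ', count($difference), "\n"; // 3 1 1
```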
Sets – Using QuickHash (int)
● No union/intersection/complement operations (yet?)
● Yummy features like (loadFrom|saveTo)(String|File)
$set = new QuickHashIntSet(64, QuickHashIntSet::CHECK_FOR_DUPES);
$set->add(1);
$set->add(2);
$set->add(3);
$set->exists(2); // true
$set->exists(5); // false
Sets – With finite possible values
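// Illustration only: PHP already defines these E_* constants this way.
define("E_ERROR", 1); // or 1 << 0
define("E_WARNING", 2); // or 1 << 1
define("E_PARSE", 4); // or 1 << 2
define("E_NOTICE", 8); // or 1 << 3
$set = 0;
$set |= E_ERROR;
$set |= E_WARNING;
$set |= E_PARSE;
$set & E_ERROR; // 1 (truthy)
$set & E_NOTICE; // 0 (falsy)
$set1 | $set2; // union
$set1 & $set2; // intersection
$set1 & ~$set2; // complement (elements of $set1 not in $set2)
$set1 ^ $set2; // symmetric difference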
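A runnable sketch of a bitset over a small, fixed universe, using hypothetical `COLOR_*` constants (to avoid redefining PHP's own `E_*` constants). Bit i set means "element i is in the set".

```php
<?php
// Hypothetical constants: one bit per possible element.
const COLOR_RED   = 1 << 0; // 1
const COLOR_GREEN = 1 << 1; // 2
const COLOR_BLUE  = 1 << 2; // 4

$set1 = COLOR_RED | COLOR_GREEN;  // 3
$set2 = COLOR_GREEN | COLOR_BLUE; // 6

$union        = $set1 | $set2;  // RED, GREEN, BLUE (7)
$intersection = $set1 & $set2;  // GREEN (2)
$difference   = $set1 & ~$set2; // RED (1): in $set1 but not in $set2
$symmetric    = $set1 ^ $set2;  // RED, BLUE (5): in exactly one set

// Membership test: the bit itself (non-zero) or 0.
var_dump(($set1 & COLOR_RED) !== 0);  // bool(true)
var_dump(($set1 & COLOR_BLUE) !== 0); // bool(false)
```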
Sets – With finite possible values (function features)
Instead of:
function remove($path, $files = true, $directories = true, $links = true, $executable = true)
{
    if (!$files && is_file($path)) return false;
    if (!$directories && is_dir($path)) return false;
    if (!$links && is_link($path)) return false;
    if (!$executable && is_executable($path)) return false;
    // ...
}
remove("/tmp/removeMe", true, false, true, false); // WTF ?!
Sets – With finite possible values (function features)
Use:
define("REMOVE_FILES", 1 << 0);
define("REMOVE_DIRS", 1 << 1);
define("REMOVE_LINKS", 1 << 2);
define("REMOVE_EXEC", 1 << 3);
define("REMOVE_ALL", ~0); // Setting all bits
function remove($path, $options = REMOVE_ALL)
{
    if (~$options & REMOVE_FILES && is_file($path)) return false;
    if (~$options & REMOVE_DIRS && is_dir($path)) return false;
    if (~$options & REMOVE_LINKS && is_link($path)) return false;
    if (~$options & REMOVE_EXEC && is_executable($path)) return false;
    // ...
}
remove("/tmp/removeMe", REMOVE_FILES | REMOVE_LINKS); // Much better :)
Sets: Conclusions
● When using PHP arrays as sets, store the elements as keys, not as values.
● Use QuickHash for sets of integers if possible.
● Use SplObjectStorage as soon as you are dealing with objects.
● Don't use array_unique() when you need a set!
Bloom filters
● A Bloom filter is a space-efficient probabilistic data structure used to test whether an element is a member of a set.
● False positives are possible, but false negatives are not!
Bloom filters – Using bloomy
// BloomFilter::__construct(int capacity [, double error_rate [, int random_seed ] ])
$bloomFilter = new BloomFilter(10000, 0.001);
$bloomFilter->add("An element");
$bloomFilter->has("An element"); // true for sure
$bloomFilter->has("Foo"); // false, most probably
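To show how a Bloom filter works internally, here is a minimal pure-PHP sketch (assuming PHP 8). It is not the bloomy extension's implementation: the class `SimpleBloomFilter`, the bit-array size, the number of hash positions, and the use of double hashing over one `md5()` digest are all illustrative assumptions. Adding an element sets k bits; a lookup answers "no" as soon as one of its k bits is unset, which is why false negatives are impossible but false positives are not.

```php
<?php
class SimpleBloomFilter
{
    private string $bits; // bit array packed into a binary string

    public function __construct(private int $size, private int $hashCount)
    {
        $this->bits = str_repeat("\0", intdiv($size + 7, 8));
    }

    // Double hashing: position_i = (h1 + i * h2) mod size.
    private function positions(string $element): array
    {
        $digest = md5($element, true);                    // 16 raw bytes
        $h1 = unpack('N', substr($digest, 0, 4))[1];
        $h2 = unpack('N', substr($digest, 4, 4))[1] | 1;  // odd step
        $positions = [];
        for ($i = 0; $i < $this->hashCount; $i++) {
            $positions[] = ($h1 + $i * $h2) % $this->size;
        }
        return $positions;
    }

    public function add(string $element): void
    {
        foreach ($this->positions($element) as $p) {
            $byte = intdiv($p, 8);
            $this->bits[$byte] = chr(ord($this->bits[$byte]) | (1 << ($p % 8)));
        }
    }

    public function has(string $element): bool
    {
        foreach ($this->positions($element) as $p) {
            if ((ord($this->bits[intdiv($p, 8)]) & (1 << ($p % 8))) === 0) {
                return false; // definitely never added
            }
        }
        return true; // probably added (false positives are possible)
    }
}

$filter = new SimpleBloomFilter(10000, 7);
$filter->add("An element");
var_dump($filter->has("An element")); // bool(true)
var_dump($filter->has("Foo"));        // most probably bool(false)
```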
Maps
● A map is a collection of key/value pairs where all keys are unique.
Maps – Using array
● Don't use array_merge() on maps.
$map = array();
$map["ONE"] = 1;
$map["TWO"] = 2;
$map["THREE"] = 3;
// Merging maps:
array_merge($map1, $map2); // SLOW!
$map2 + $map1; // Fast :)
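Beyond speed, `array_merge()` and `+` differ in semantics, which this sketch makes concrete (the map contents are arbitrary). With string keys, `$map2 + $map1` yields the same key/value pairs as `array_merge($map1, $map2)`, only in a different key order. With integer keys, `array_merge()` renumbers them, which is usually wrong for a map keyed by IDs.

```php
<?php
$map1 = ["ONE" => 1, "TWO" => 2];
$map2 = ["TWO" => "deux", "THREE" => 3];

// String keys: same pairs, values from $map2 win; key order differs.
$merged = array_merge($map1, $map2); // ONE => 1, TWO => "deux", THREE => 3
$plus   = $map2 + $map1;             // TWO => "deux", THREE => 3, ONE => 1

// Integer keys: array_merge() renumbers, + keeps the keys and
// ignores right-hand duplicates.
$byId1 = [10 => "a", 20 => "b"];
$byId2 = [20 => "B", 30 => "c"];
$mergedIds = array_merge($byId1, $byId2); // [0 => "a", 1 => "b", 2 => "B", 3 => "c"]
$plusIds   = $byId2 + $byId1;             // [20 => "B", 30 => "c", 10 => "a"]
```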
Multikey Maps – Using array
$map = array();
$map["ONE"] = 1;
$map["UN"] =& $map["ONE"];
$map["UNO"] =& $map["ONE"];
$map["TWO"] = 2;
$map["DEUX"] =& $map["TWO"];
$map["DUE"] =& $map["TWO"];
$map["UNO"] = "once";
$map["DEUX"] = "twice";
var_dump($map);
/*
array(6) {
  ["ONE"] => &string(4) "once"
  ["UN"] => &string(4) "once"
  ["UNO"] => &string(4) "once"
  ["TWO"] => &string(5) "twice"
  ["DEUX"] => &string(5) "twice"
  ["DUE"] => &string(5) "twice"
}
*/
Heap
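● A heap is a tree-based structure satisfying the heap property: in a max-heap, every node's key is greater than or equal to its children's keys, so the largest key is at the top (root) and the smallest ones end up among the leaves.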
Heap – Using array
$heap = array();
$heap[] = 3;
sort($heap);
$heap[] = 1;
sort($heap);
$heap[] = 2;
sort($heap);
Heap – Using Spl(Min|Max)Heap
$heap = new SplMinHeap;
$heap->insert(3);
$heap->insert(1);
$heap->insert(2);
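A sketch of reading data back out of the heap, plus the related `SplPriorityQueue`. `SplMinHeap` keeps its smallest element on top, so repeated `extract()` calls return the values in ascending order; `SplPriorityQueue` orders by a separate priority, highest first. The job names and priorities are arbitrary.

```php
<?php
$heap = new SplMinHeap();
$heap->insert(3);
$heap->insert(1);
$heap->insert(2);

echo $heap->top(), "\n"; // 1

$sorted = [];
while (!$heap->isEmpty()) {
    $sorted[] = $heap->extract(); // always removes the current minimum
}
echo implode(',', $sorted), "\n"; // 1,2,3

// SplPriorityQueue: insert(value, priority), highest priority first.
$jobs = new SplPriorityQueue();
$jobs->insert('write report', 1);
$jobs->insert('fix outage', 10);
$jobs->insert('reply to email', 5);
echo $jobs->extract(), "\n"; // fix outage
```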
Heaps: Conclusions
● MUCH faster than having to re-sort() an array at every insertion.
● If you don't need the collection to be sorted at every single step and can insert all the data at once and then sort() it, an array is a much better/faster approach.
● SplPriorityQueue is very similar: consider it the same as SplHeap, except that the ordering is based on a separate priority rather than on the value itself.
Other related projects
● SPL Types: Various types implemented as objects: SplInt, SplFloat, SplEnum, SplBool and SplString. http://pecl.php.net/package/SPL_Types
● Judy: Sparse dynamic arrays implementation http://pecl.php.net/package/Judy
● Weakref: Weak references implementation. Provides a gateway to an object without preventing that object from being collected by the garbage collector.
Conclusions
● Use the appropriate data structure: it will keep your code clean and fast.
● Think about the time and space complexity of your algorithms.
● Name your variables accordingly: use “Map”, “Set”, “List”, “Queue”, ... to describe them instead of something like $ordersArray.
Questions?
Thanks
● Don't forget to rate this talk on https://joind.in/4753
Photo Credits
● Northstar Ski Jump: http://www.flickr.com/photos/renotahoe/5593248965
● Tuned car: http://www.flickr.com/photos/gioxxswall/5783867752
● London Eye Structure: http://www.flickr.com/photos/photographygal123/4883546484
● Cigarette: http://www.flickr.com/photos/superfantastic/166215927
● Heap structure: http://en.wikipedia.org/wiki/File:Max-Heap.svg