Date post: | 23-Jan-2018 |
Category: | Software |
Upload: | mark-baker |
Exploring DataStructures
SPL – Standard PHP Library
• SPL provides a standard set of interfaces for PHP5
• The aim of SPL is to implement some efficient data access interfaces and classes for PHP
• Introduced with PHP 5.0.0
• Included as standard with PHP since version 5.3.0
• SPL DataStructures were added for version 5.3.0
SPL DataStructures
Dictionary DataStructures (Maps)
• Fixed Arrays
Linear DataStructures
• Doubly-Linked Lists
• Stacks
• Queues
Tree DataStructures
• Heaps
SPL DataStructures – Why use them?
• Can improve performance
  • When the right structures are used in the right place
• Can reduce memory usage
  • When the right structures are used in the right place
• Already implemented and tested in PHP core
  • Saves work!
• Can be type-hinted in function/method definitions
Fixed Arrays – SplFixedArray
• Predefined Size
• Enumerated indexes only, not Associative
• Indexed from 0, Increments by 1
• Is an object
• No hashing required for keys
• Implements
  • Iterator
  • ArrayAccess
  • Countable
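As a minimal sketch of the properties listed above (the variable names and the "hours of the day" data are illustrative assumptions, not taken from the slides), a fixed array is sized up front, indexed by integers from 0, and rejects indexes outside that range:

```php
<?php
// Illustrative only: one slot per hour of the day.
$hours = new SplFixedArray(24);   // size is fixed up front; every slot starts as null
$hours[0]  = 'midnight';          // ArrayAccess: write by integer index
$hours[12] = 'noon';

$count = count($hours);           // Countable: reports the allocated size
$noon  = $hours[12];              // ArrayAccess: read by integer index

// Writing outside 0..23 does not grow the structure; it throws instead.
$outOfRange = false;
try {
    $hours[24] = 'tomorrow';
} catch (RuntimeException $e) {
    $outOfRange = true;
}
```

Unlike a standard PHP array, the out-of-range write does not silently create a new key.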
Fixed Arrays – Uses
• Returned Database resultsets, Record collections
• Hours of Day
• Days of Month/Year
• Hotel rooms, airline seats (as a 2-d fixed array of fixed arrays)
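The 2-d case can be sketched as a fixed array whose elements are themselves fixed arrays. The seat-map dimensions and passenger name below are hypothetical:

```php
<?php
// Hypothetical seat map: 3 rows of 4 seats, each row its own SplFixedArray.
$rows = 3;
$seatsPerRow = 4;

$seatMap = new SplFixedArray($rows);
for ($r = 0; $r < $rows; ++$r) {
    $seatMap[$r] = new SplFixedArray($seatsPerRow);
}

// Book row 1, seat 2. Objects are handles, so $row refers to the stored row.
$row = $seatMap[1];
$row[2] = 'Passenger A';

$booked = $seatMap[1][2];
$free   = $seatMap[0][0];   // untouched seats remain null
```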
Fixed Arrays – Big-O Complexity
• Insert an element O(1)
• Delete an element O(1)
• Lookup an element O(1)
• Resize a Fixed Array O(n)
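Resizing is the one operation in this list that is not constant time. A small sketch using setSize() (the values are illustrative):

```php
<?php
$a = new SplFixedArray(3);
$a[0] = 'a';
$a[1] = 'b';
$a[2] = 'c';

$a->setSize(5);                   // grow: existing entries are kept, new slots are null
$sizeAfterGrow = $a->getSize();
$newSlot = $a[4];

$a->setSize(2);                   // shrink: entries beyond the new size are discarded
$remaining = $a->toArray();
```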
Fixed Arrays
[Diagram: Standard Arrays vs SplFixedArray. In a standard array, each key (12345, 23456, 34567, 45678) passes through a hash function to locate the slot ([0] … [n-1]) holding its data record. In an SplFixedArray, the integer keys 0, 1, 2, 3 index the slots [0] … [n-1] directly.]
Fixed Arrays
    $size = 1000000;   // number of entries used for the timings below
    $a = array();
    for ($i = 0; $i < $size; ++$i) {
        $a[$i] = $i;
    }
    // Random/Indexed access
    for ($i = 0; $i < $size; ++$i) {
        $r = $a[$i];
    }
    // Sequential access
    foreach ($a as $v) {
    }
    // Sequential access with keys
    foreach ($a as $k => $v) {
    }
Initialise: 0.0000 s
Set 1,000,000 Entries: 0.1323 s
Random Read 1,000,000 Entries: 0.3311 s
Iterate values for 1,000,000 Entries: 0.0146 s
Iterate keys and values for 1,000,000 Entries: 0.0198 s
Total Time: 0.4979 s
Memory: 41,668.32 k
Fixed Arrays
    $size = 1000000;   // number of entries used for the timings below
    $a = new \SplFixedArray($size);
    for ($i = 0; $i < $size; ++$i) {
        $a[$i] = $i;
    }
    // Random/Indexed access
    for ($i = 0; $i < $size; ++$i) {
        $r = $a[$i];
    }
    // Sequential access
    foreach ($a as $v) {
    }
    // Sequential access with keys
    foreach ($a as $k => $v) {
    }
Initialise: 0.0011 s
Set 1,000,000 Entries: 0.1061 s
Random Read 1,000,000 Entries: 0.3144 s
Iterate values for 1,000,000 Entries: 0.0394 s
Iterate keys and values for 1,000,000 Entries: 0.0476 s
Total Time: 0.5086 s
Memory: 17,889.43 k
Fixed Arrays
[Chart: Speed, SPL Fixed Array vs Standard PHP Array. Series: Initialise (s), Set Values (s), Sequential Read (s), Sequential Read with Key (s), Random Read (s), Pop (s).]
[Chart: Memory Usage, SPL Fixed Array vs Standard PHP Array. Series: Current Memory (k), Peak Memory (k).]
Fixed Arrays
• Faster to populate
• Lower memory usage
• Faster for random/indexed access than standard PHP arrays
• Slower for sequential access than standard PHP arrays
Fixed Arrays – Gotchas
• Can be extended, but at a cost in speed
• Standard array functions won’t work with SplFixedArray, e.g. array_walk(), sort(), array_pop(), array_intersect(), etc.
• Avoid unsetting elements if possible
  • Unlike standard PHP enumerated arrays, this leaves empty nodes that trigger an Exception if accessed
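One possible workaround for the array-function gotcha, sketched here under the assumption that a temporary copy is acceptable, is to convert to a plain array with toArray(), apply the function, and convert back with fromArray():

```php
<?php
$fixed = SplFixedArray::fromArray([3, 1, 2]);

// sort() needs a real PHP array, so work on a copy.
$plain = $fixed->toArray();
sort($plain);

// Rebuild a fixed array from the sorted copy.
$sorted = SplFixedArray::fromArray($plain);
$result = $sorted->toArray();
```

The round trip costs time and memory proportional to the array size, so it offsets some of the savings the fixed array provides.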
Doubly Linked Lists – SplDoublyLinkedList
Doubly Linked Lists
• Iterable Lists
  • Top to Bottom
  • Bottom to Top
• Unindexed
• Good for sequential access
  • Not good for random/indexed access
• Implements
  • Iterator
  • ArrayAccess
  • Countable
Doubly Linked Lists – Uses
• Stacks
• Queues
• Most-recently used lists
• Undo functionality
• Trees
• Memory Allocators
• Fast dynamic, iterable arrays
• iNode maps
• Video frame queues
Doubly Linked Lists – Big-O Complexity
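• Insert an element O(1) once the node is located (locating a node by index is O(n))
• Delete an element O(1) once the node is located (locating a node by index is O(n))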
• Lookup by index O(n)
  • I have seen people saying that SplDoublyLinkedList behaves like a hash table for lookups, which would make it O(1); but timing tests prove otherwise
• Access a node at the beginning of the list O(1)
• Access a node at the end of the list O(1)
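A short sketch of the cheap operations at either end of the list, alongside an indexed lookup that has to walk the nodes (the values are illustrative):

```php
<?php
$list = new SplDoublyLinkedList();
$list->push('B');        // append at the end
$list->push('C');
$list->unshift('A');     // prepend at the beginning

$first  = $list->bottom();       // node at the beginning of the list
$last   = $list->top();          // node at the end of the list
$middle = $list->offsetGet(1);   // indexed lookup: walks the links from one end

$removedFirst = $list->shift();  // remove from the beginning
$removedLast  = $list->pop();    // remove from the end
```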
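Doubly Linked Lists
[Diagram: nodes A, B, C, D, E, each linked to its neighbours in both directions, with Head and Tail pointers marking the two ends of the list.]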
Doubly Linked Lists
    $size = 100000;   // number of entries used for the timings below
    $a = array();
    for ($i = 0; $i < $size; ++$i) {
        $a[$i] = $i;
    }
    // Random/Indexed access
    for ($i = 0; $i < $size; ++$i) {
        $r = $a[$i];
    }
    // Sequential access
    for ($i = $size-1; $i >= 0; --$i) {
        $r = array_pop($a);
    }
Initialise: 0.0000 s
Set 100,000 Entries: 0.0585 s
Random Read 100,000 Entries: 0.0378 s
Pop 100,000 Entries: 0.1383 s
Total Time: 0.2346 s
Memory: 644.55 k
Peak Memory: 8457.91 k
Doubly Linked Lists
    $size = 100000;   // number of entries used for the timings below
    $a = new \SplDoublyLinkedList();
    for ($i = 0; $i < $size; ++$i) {
        $a->push($i);
    }
    // Random/Indexed access
    for ($i = 0; $i < $size; ++$i) {
        $r = $a->offsetGet($i);
    }
    // Sequential access
    for ($i = $size-1; $i >= 0; --$i) {
        $a->pop();
    }
Initialise: 0.0000 s
Set 100,000 Entries: 0.1514 s
Random Read 100,000 Entries: 22.7068 s
Pop 100,000 Entries: 0.1465 s
Total Time: 23.0047 s
Memory: 133.67 k
Peak Memory: 5603.09 k
Doubly Linked Lists
• Fast for sequential access
• Lower memory usage
• Traversable in both directions
Use setIteratorMode() to determine direction
• Size limited only by memory
• Slow for random/indexed access
• Insert into middle of list only available from PHP 5.5.0
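The direction switch and the mid-list insert mentioned above can be sketched as follows (the values are illustrative; add() assumes PHP 5.5.0 or later):

```php
<?php
$list = new SplDoublyLinkedList();
foreach (['A', 'B', 'D'] as $value) {
    $list->push($value);
}

// Insert 'C' at index 2; the value previously there ('D') moves up.
$list->add(2, 'C');

// Default iteration runs from the beginning to the end of the list.
$forward = [];
foreach ($list as $value) {
    $forward[] = $value;
}

// Switch the iterator to run from the end back to the beginning.
$list->setIteratorMode(SplDoublyLinkedList::IT_MODE_LIFO);
$backward = [];
foreach ($list as $value) {
    $backward[] = $value;
}
```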
Stacks – SplStack
Stacks
• Implemented as a Doubly-Linked List
• LIFO
  • Last-In
  • First-Out
• Essential Operations
  • push()
  • pop()
• Optional Operations
  • count()
  • isEmpty()
  • peek()
Stack – Uses
• Undo mechanism (e.g. In text editors)
• Backtracking (e.g. Finding a route through a maze or network)
• Call Handler (e.g. Defining return location for nested calls)
• Shunting Yard Algorithm (e.g. Converting Infix to Postfix notation)
• Evaluating a Postfix Expression
• Depth-First Search
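To illustrate one of the uses above, here is a minimal sketch of postfix evaluation with SplStack. The function name evaluatePostfix() and the space-separated token format are assumptions made for this example, and only the four basic operators are handled:

```php
<?php
// Hypothetical helper: evaluates a space-separated postfix expression.
function evaluatePostfix(string $expression): float
{
    $stack = new SplStack();
    foreach (preg_split('/\s+/', trim($expression)) as $token) {
        if (is_numeric($token)) {
            $stack->push((float) $token);   // operands wait on the stack
            continue;
        }
        // An operator consumes the two most recent operands (right one first).
        $right = $stack->pop();
        $left  = $stack->pop();
        switch ($token) {
            case '+': $stack->push($left + $right); break;
            case '-': $stack->push($left - $right); break;
            case '*': $stack->push($left * $right); break;
            case '/': $stack->push($left / $right); break;
            default:
                throw new InvalidArgumentException("Unknown operator: $token");
        }
    }
    return $stack->pop();   // the final result is the only entry left
}

$result = evaluatePostfix('3 4 + 2 *');   // (3 + 4) * 2
```

Every token costs one push or a pair of pops, so the whole evaluation is linear in the number of tokens.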
Stacks – Big-O Complexity
• Push an element O(1)
• Pop an element O(1)
Stacks
    class StandardArrayStack {
        private $stack = array();

        public function count() {
            return count($this->stack);
        }

        public function push($data) {
            $this->stack[] = $data;
        }

        public function pop() {
            if (count($this->stack) > 0) {
                return array_pop($this->stack);
            }
            return NULL;
        }

        public function isEmpty() {
            return count($this->stack) == 0;
        }
    }
Stacks
    $size = 100000;   // number of entries used for the timings below
    $a = new \StandardArrayStack();
    for ($i = 1; $i <= $size; ++$i) {
        $a->push($i);
    }
    while (!$a->isEmpty()) {
        $i = $a->pop();
    }
PUSH 100,000 ENTRIES
Push Time: 0.5818 s
Current Memory: 8.75
POP 100,000 ENTRIES
Pop Time: 1.6657 s
Current Memory: 2.25
Total Time: 2.2488 s
Current Memory: 2.25
Peak Memory: 8.75
Stacks
    class StandardArrayStack2 {
        private $stack = array();
        private $count = 0;   // tracked manually to avoid calling count() on every operation

        public function count() {
            return $this->count;
        }

        public function push($data) {
            ++$this->count;
            $this->stack[] = $data;
        }

        public function pop() {
            if ($this->count > 0) {
                --$this->count;
                return array_pop($this->stack);
            }
            return NULL;
        }

        public function isEmpty() {
            return $this->count == 0;
        }
    }
Stacks
    $size = 100000;   // number of entries used for the timings below
    $a = new \StandardArrayStack2();
    for ($i = 1; $i <= $size; ++$i) {
        $a->push($i);
    }
    while (!$a->isEmpty()) {
        $i = $a->pop();
    }
PUSH 100,000 ENTRIES
Push Time: 0.5699 s
Current Memory: 8.75
POP 100,000 ENTRIES
Pop Time: 1.1005 s
Current Memory: 1.75
Total Time: 1.6713 s
Current Memory: 1.75
Peak Memory: 8.75
Stacks
    $size = 100000;   // number of entries used for the timings below
    $a = new \SplStack();
    for ($i = 1; $i <= $size; ++$i) {
        $a->push($i);
    }
    while (!$a->isEmpty()) {
        $i = $a->pop();
    }
PUSH 100,000 ENTRIES
Push Time: 0.4301 s
Current Memory: 5.50
POP 100,000 ENTRIES
Pop Time: 0.6413 s
Current Memory: 0.75
Total Time: 1.0723 s
Current Memory: 0.75
Peak Memory: 5.50
Stacks
[Chart: Stack Timings for StandardArrayStack, StandardArrayStack2 and SPLStack. Push Time (s): 0.0796, 0.0782, 0.0644. Pop Time (s): 0.1244, 0.0998, 0.0693. Memory after Push (MB): 8.75, 8.75, 5.50.]
Stacks – Gotchas
• Peek (view an entry from the middle of the stack)
• StandardArrayStack
    public function peek($n = 0) {
        // $n counts down from the top; reject depths beyond the bottom of the stack
        if ((count($this->stack) - $n) <= 0) {
            return NULL;
        }
        return $this->stack[count($this->stack) - $n - 1];
    }
• StandardArrayStack2
    public function peek($n = 0) {
        // $n counts down from the top; reject depths beyond the bottom of the stack
        if (($this->count - $n) <= 0) {
            return NULL;
        }
        return $this->stack[$this->count - $n - 1];
    }
• SplStack
    $r = $a->offsetGet($n);
Stacks – Gotchas
[Chart: Stack Timings for StandardArrayStack, StandardArrayStack2 and SPLStack. Push Time (s): 0.0075, 0.0077, 0.0064. Peek Time (s): 0.0111, 0.0078, 0.1627. Pop Time (s): 0.0124, 0.0098, 0.0066. Memory after Push (MB): 1.00, 1.00, 0.75.]
Stacks – Gotchas
• Peek
  • When looking through the stack, SplStack has to follow each link in the “chain” until it finds the nth entry: O(n)
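When only the most recent entry is needed, no traversal is required. A small sketch using top(), which reads the end of the list directly and leaves the stack unchanged (the values are illustrative):

```php
<?php
$stack = new SplStack();
foreach ([1, 2, 3] as $n) {
    $stack->push($n);
}

$top = $stack->top();             // most recently pushed entry, read without removing it
$sizeAfterPeek = count($stack);   // the stack still holds all three entries
```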
Queues – SplQueue
Queues
• Implemented as a Doubly-Linked List
• FIFO
  • First-In
  • First-Out
• Essential Operations
  • enqueue()
  • dequeue()
• Optional Operations
  • count()
  • isEmpty()
  • peek()
Queues – Uses
• Job/print/message submissions
• Breadth-First Search
• Request handling (e.g. a Web server)
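As an illustration of the breadth-first search use, here is a minimal sketch built on SplQueue. The function name breadthFirstOrder() and the adjacency-list graph are hypothetical:

```php
<?php
// Hypothetical helper: returns nodes in the order a breadth-first search visits them.
function breadthFirstOrder(array $graph, string $start): array
{
    $visited = [$start => true];
    $order = [];

    $queue = new SplQueue();
    $queue->enqueue($start);

    while (!$queue->isEmpty()) {
        $node = $queue->dequeue();    // oldest waiting node first (FIFO)
        $order[] = $node;
        foreach ($graph[$node] ?? [] as $neighbour) {
            if (!isset($visited[$neighbour])) {
                $visited[$neighbour] = true;
                $queue->enqueue($neighbour);
            }
        }
    }
    return $order;
}

// Illustrative graph as an adjacency list.
$graph = [
    'A' => ['B', 'C'],
    'B' => ['D'],
    'C' => ['D'],
    'D' => [],
];
$order = breadthFirstOrder($graph, 'A');
```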
Queues – Big-O Complexity
• Enqueue an element O(1)
• Dequeue an element O(1)
Queues
    class StandardArrayQueue2 {
        private $queue = array();
        private $count = 0;   // tracked manually to avoid calling count() on every operation

        public function count() {
            return $this->count;
        }

        public function enqueue($data) {
            ++$this->count;
            $this->queue[] = $data;
        }

        public function dequeue() {
            if ($this->count > 0) {
                --$this->count;
                return array_shift($this->queue);
            }
            return NULL;
        }

        public function isEmpty() {
            return $this->count == 0;
        }
    }
Queues
    $size = 100000;   // number of entries used for the timings below
    $a = new \StandardArrayQueue2();
    for ($i = 1; $i <= $size; ++$i) {
        $a->enqueue($i);
    }
    while (!$a->isEmpty()) {
        $i = $a->dequeue();
    }
ENQUEUE 100,000 ENTRIES
Enqueue Time: 0.6884
Current Memory: 8.75
DEQUEUE 100,000 ENTRIES
Dequeue Time: 335.8434
Current Memory: 1.75
Total Time: 336.5330
Current Memory: 1.75
Peak Memory: 8.75
Queues
    $size = 100000;   // number of entries used for the timings below
    $a = new \SplQueue();
    for ($i = 1; $i <= $size; ++$i) {
        $a->enqueue($i);
    }
    while (!$a->isEmpty()) {
        $i = $a->dequeue();
    }
ENQUEUE 100,000 ENTRIES
Enqueue Time: 0.4087
Current Memory: 5.50
DEQUEUE 100,000 ENTRIES
Dequeue Time: 0.6148
Current Memory: 0.75
Total Time: 1.0249
Current Memory: 0.75
Peak Memory: 5.50
Queues
[Chart: Queue Timings for StandardArrayQueue, StandardArrayQueue2 and SPLQueue. Enqueue Time (s): 0.0075, 0.0080, 0.0064. Peek Time (s): 0.0087, 0.0070, 0.1582. Dequeue Time (s): 0.6284, 0.6277, 0.0066. Memory after Enqueue (MB): 1.00, 1.00, 0.75.]
Queues – Gotchas
• Dequeue
  • In standard PHP enumerated arrays, array_shift() and array_unshift() are expensive operations because they re-index the entire array
  • This problem does not apply to SplQueue
• Peek
  • When looking through the queue, SplQueue has to follow each link in the “chain” until it finds the nth entry
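The re-indexing behind the dequeue gotcha is visible in the array keys. A small sketch with illustrative values, next to the SplQueue equivalent:

```php
<?php
// Standard array: removing the first entry renumbers every remaining key.
$plain = [10, 20, 30];
$firstPlain = array_shift($plain);
$keysAfterShift = array_keys($plain);   // keys start again from 0

// SplQueue: dequeue() detaches the first node; nothing is renumbered or copied.
$queue = new SplQueue();
foreach ([10, 20, 30] as $n) {
    $queue->enqueue($n);
}
$firstQueued = $queue->dequeue();
$nextInLine  = $queue->bottom();        // the entry that will be dequeued next
```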
Heaps – SplHeap
Heaps
• Ordered Lists
  • Random Input
  • Ordered Output
• Implemented as a binary tree structure
• Essential Operations
  • Insert
  • Extract
  • Ordering Rule
• Abstract that requires extending with the implementation of a compare() algorithm
  • compare() is reversed in comparison with usort compare callbacks
• Partially sorted on data entry
Heaps
• Implements
  • Iterator
  • Countable
Heaps – Uses
• Heap sort
• Selection algorithms (e.g. Max, Min, Median)
• Graph algorithms
  • Prim’s Minimal Spanning Tree (connected weighted undirected graph)
  • Dijkstra’s Shortest Path (network or traffic routing)
• Priority Queues
Heaps – Big-O Complexity
• Insert an element O(log n)
• Delete an element O(log n)
• Access root element O(1)
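A minimal sketch of the three operations, using a hypothetical ShortestFirstHeap subclass that orders strings by length. SplHeap keeps whichever element compare() ranks highest at the root, so the sketch returns a positive value when the first argument is the shorter string:

```php
<?php
// Hypothetical subclass: the shortest string sits at the root.
class ShortestFirstHeap extends SplHeap
{
    protected function compare($a, $b): int
    {
        // Positive when $a is shorter, so shorter strings rank higher.
        return strlen($b) <=> strlen($a);
    }
}

$heap = new ShortestFirstHeap();
$heap->insert('ccc');         // insert: the heap reorders itself on entry
$heap->insert('a');
$heap->insert('bb');

$root = $heap->top();         // access root element without removing it
$first = $heap->extract();    // delete: removes the root and restores heap order
$nextRoot = $heap->top();
```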
Heaps
    class DistanceSplHeap extends \SplHeap {
        protected $longitude = 0.0;
        protected $latitude = 0.0;
        private $comparator;

        public function __construct(callable $comparator, $longitude, $latitude) {
            $this->longitude = $longitude;
            $this->latitude = $latitude;
            $this->comparator = $comparator;
        }

        // Delegate the ordering rule to the injected comparator
        protected function compare($a, $b) {
            return call_user_func($this->comparator, $a, $b);
        }

        // Record the distance on each value before it enters the heap
        // (calculateDistance() is not shown on the slide)
        public function insert($value) {
            $value->distance = $this->calculateDistance($value);
            parent::insert($value);
        }
    }

    // Nearest first: the smaller distance must rank higher
    $comparator = function($a, $b) {
        if ($a->distance == $b->distance) {
            return 0;
        }
        return ($a->distance > $b->distance) ? -1 : 1;
    };

    // Longitude and latitude for Barcelona
    $speakersHeap = new \DistanceSplHeap($comparator, 2.1833, 41.3833);
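The slides do not show calculateDistance(). One plausible approach, sketched here purely as an assumption, is the haversine great-circle formula; the function name haversineMiles() and the Earth radius constant are choices made for this example, not taken from the talk:

```php
<?php
// Hypothetical helper: great-circle distance in miles between two points
// given as latitude/longitude in decimal degrees.
function haversineMiles(float $lat1, float $lon1, float $lat2, float $lon2): float
{
    $earthRadiusMiles = 3958.8;   // assumed mean Earth radius

    $dLat = deg2rad($lat2 - $lat1);
    $dLon = deg2rad($lon2 - $lon1);

    $a = sin($dLat / 2) ** 2
        + cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * sin($dLon / 2) ** 2;

    // min() guards against rounding pushing the value just above 1.
    return 2 * $earthRadiusMiles * asin(min(1.0, sqrt($a)));
}

// Barcelona to itself, and Barcelona to Udine (coordinates from speakers.csv).
$zero  = haversineMiles(41.3833, 2.1833, 41.3833, 2.1833);
$udine = haversineMiles(41.3833, 2.1833, 46.0667, 13.2333);
```

Inside DistanceSplHeap, a calculateDistance($value) method could call such a helper with the stored reference point and the value's own latitude and longitude.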
Heaps
    $file = new SplFileObject("speakers.csv");
    $file->setFlags(
        SplFileObject::DROP_NEW_LINE |
        SplFileObject::SKIP_EMPTY
    );
    while (!$file->eof()) {
        $speakerData = $file->fgetcsv();
        $speaker = new \StdClass;
        $speaker->name = $speakerData[0];
        $speaker->from = $speakerData[1];
        $speaker->latitude = $speakerData[2];
        $speaker->longitude = $speakerData[3];
        $speakersHeap->insert($speaker);
    }

"Anthony Ferrara","New Jersey, USA",40.0000,-74.5000
"Bastian Hofmann","Berlin, Germany",52.5167,13.3833
"Damien Seguy","The Hague, Netherlands",52.0833,4.3167
"Derick Rethans","London, UK",51.5072,-0.1275
"Juozas Kaziukėnas","New York, USA",40.7127,-74.0059
"Marcello Duarte","London, UK",51.5072,-0.1275
"Mark Baker","Wigan, UK",53.5448,-2.6318
"Mathias Verraes","Belgium",50.8500,4.3500
"Matthias Noback","Utrecht, Netherlands",52.0833,5.1167
"Nikita Popov","Berlin, Germany",52.5167,13.3833
"Paweł Jędrzejewski","Łódź, Poland",51.7833,19.4667
"Steve Maraspin","Udine, Italy",46.0667,13.2333
"Tudor Barbu","Barcelona, Catalonia",41.3833,2.1833
"Zeev Suraski","Israel",31.0000,35.0000
Heaps
    echo 'There are ', $speakersHeap->count(),
        ' speakers at PHPBarcelona 2015',
        PHP_EOL, PHP_EOL;
    echo 'Distance that they have travelled to reach Barcelona', PHP_EOL, PHP_EOL;
    foreach ($speakersHeap as $speaker) {
        echo sprintf(
            "%-22s from %-24s has travelled %6.1f miles" . PHP_EOL,
            $speaker->name,
            $speaker->from,
            $speaker->distance
        );
    }
    echo PHP_EOL;
There are 14 speakers at PHPBarcelona 2015
Distance that they have travelled to reach Barcelona
Tudor Barbu from Barcelona, Catalonia has travelled 0.0 miles
Steve Maraspin from Udine, Italy has travelled 638.8 miles
Mathias Verraes from Belgium has travelled 662.2 miles
Derick Rethans from London, UK has travelled 708.0 miles
Marcello Duarte from London, UK has travelled 708.0 miles
Damien Seguy from The Hague, Netherlands has travelled 746.0 miles
Matthias Noback from Utrecht, Netherlands has travelled 752.0 miles
Mark Baker from Wigan, UK has travelled 869.3 miles
Nikita Popov from Berlin, Germany has travelled 930.8 miles
Bastian Hofmann from Berlin, Germany has travelled 930.8 miles
Paweł Jędrzejewski from Łódź, Poland has travelled 1085.9 miles
Zeev Suraski from Israel has travelled 1951.0 miles
Juozas Kaziukėnas from New York, USA has travelled 3831.8 miles
Anthony Ferrara from New Jersey, USA has travelled 3877.9 miles
Heaps – Gotchas
• Compare method is reversed logic from a usort() callback
• Traversing the heap removes elements from the heap
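One way to work around the second gotcha, sketched here with SplMinHeap and illustrative values, is to iterate over a clone so that the original heap keeps its contents:

```php
<?php
$heap = new SplMinHeap();
foreach ([5, 1, 3] as $n) {
    $heap->insert($n);
}

// Iterating extracts each element, so traverse a copy instead of the original.
$snapshot = clone $heap;
$sorted = [];
foreach ($snapshot as $n) {
    $sorted[] = $n;
}

$leftInSnapshot = count($snapshot);   // the copy has been emptied by the loop
$leftInOriginal = count($heap);       // the original is untouched
```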
SPL – Standard PHP Library
Other SPL Datastructures
• SplMaxHeap
• SplMinHeap
• SplPriorityQueue
• SplObjectStorage
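As a brief, illustrative taste of two of these (the job names and the stdClass object are made up for the example): SplPriorityQueue hands back the entry with the highest priority first, and SplObjectStorage maps objects to associated data.

```php
<?php
// SplPriorityQueue: extraction order follows priority, not insertion order.
$jobs = new SplPriorityQueue();
$jobs->insert('low', 1);
$jobs->insert('urgent', 10);
$jobs->insert('normal', 5);
$firstOut  = $jobs->extract();
$secondOut = $jobs->extract();

// SplObjectStorage: objects act as keys, via the ArrayAccess interface.
$storage = new SplObjectStorage();
$user = new stdClass();
$storage[$user] = 'metadata for this object';
$known   = isset($storage[$user]);
$unknown = isset($storage[new stdClass()]);
```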
SPL – Standard PHP Library
E-Book: Mastering the SPL Library
Joshua Thijssen
Available in PDF, ePub, Mobi
http://www.phparch.com/books/mastering-the-spl-library/
SPL – Standard PHP Library
E-Book: Iterating PHP Iterators
Cal Evans
Available in PDF, ePub, Mobi
https://leanpub.com/iteratingphpiterators
SPL DataStructures
Questions?
Who am I?
Mark Baker
Design and Development Manager
InnovEd (Innovative Solutions for Education) Ltd
Coordinator and Developer of:
Open Source PHPOffice library
PHPExcel, PHPWord, PHPPowerPoint, PHPProject, PHPVisio
Minor contributor to PHP core
@Mark_Baker
https://github.com/MarkBaker
http://uk.linkedin.com/pub/mark-baker/b/572/171
SPL: The Undiscovered Library – Exploring Datastructures
http://joind.in/talk/view/15874