Date post: | 02-Aug-2015 |
Category: |
Technology |
Upload: | mark-baker |
DataStructures
A data structure is a particular way of organizing data in a computer so that it can be used efficiently.
Different kinds of data structures are suited to different kinds of applications, and some are highly specialized to specific tasks.
DataStructures in PHP
• Some basic DataStructures available in PHP’s SPL
  • Stack
  • Queue
  • Heap
  • Doubly-Linked List
  • Fixed Array
  • SPL Object Storage
• SPL is the Standard PHP Library
  • (Yet another recursive acronym)
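As a quick illustration of the SPL structures listed above, the following sketch touches each one once. The variable names and sample values are made up for the example; the classes themselves are the standard SPL ones.

```php
<?php
declare(strict_types=1);

// Stack: last in, first out.
$stack = new SplStack();
$stack->push('first');
$stack->push('second');
$top = $stack->pop();            // 'second'

// Queue: first in, first out.
$queue = new SplQueue();
$queue->enqueue('first');
$queue->enqueue('second');
$front = $queue->dequeue();      // 'first'

// Heap: SplMinHeap always extracts the smallest remaining value.
$heap = new SplMinHeap();
foreach ([5, 1, 3] as $number) {
    $heap->insert($number);
}
$smallest = $heap->extract();    // 1

// Doubly-linked list: cheap inserts at both ends.
$list = new SplDoublyLinkedList();
$list->push('middle');
$list->push('tail');
$list->unshift('head');
$head = $list->bottom();         // 'head'
$tail = $list->top();            // 'tail'

// Fixed array: size is set up front, indexes are integers only.
$fixed = new SplFixedArray(3);
$fixed[0] = 10;
$size = $fixed->getSize();       // 3

// Object storage: objects as keys, with optional data attached to each.
$storage = new SplObjectStorage();
$object = new stdClass();
$storage[$object] = 'metadata for this object';
$known = isset($storage[$object]);   // true
```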
Tries
• A Tree structure comprising a hierarchy of “indexed” nodes
• Each node can contain:
  • A series of pointers (keys) to the next node in the hierarchy
  • A bucket for data values
• This allows for multiple values with the same key
• There are three basic types of Tries:
  • Tries
  • Radix Tries
  • Suffix Tries
Tries – Purpose
• Fast lookup with a partial key
• Example implementation: https://github.com/MarkBaker/Tries
Tries – Uses
• Replacement for PHP Arrays (Hashmaps)
  • No key collisions
  • Duplicate Keys supported
  • No Hashing function required
• Partial Key Lookups
  • Predictive Text
  • Autocomplete
  • Spell-Checking
  • Hyphenation
Tries – Methods
• add($key, $value = null)
  Adds new data to a Trie
• search($prefix)
  Finds data in a Trie
• delete($key)
• isNode($key)
• isMember($key)
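To make add() and search() concrete, here is a minimal sketch of a basic Trie with one node per character and a bucket of values on each node. It is not the code from the MarkBaker/Tries library: the class names SketchTrie and SketchTrieNode are hypothetical, and only the two method names and their parameters are taken from the list above.

```php
<?php
declare(strict_types=1);

// Hypothetical node type for this sketch only.
final class SketchTrieNode
{
    /** @var array<string, SketchTrieNode> pointers to the next level, keyed by character */
    public array $children = [];
    /** @var array<int, mixed> bucket of values, so duplicate keys are allowed */
    public array $values = [];
}

final class SketchTrie
{
    private SketchTrieNode $root;

    public function __construct()
    {
        $this->root = new SketchTrieNode();
    }

    public function add(string $key, $value = null): void
    {
        $node = $this->root;
        for ($i = 0, $length = strlen($key); $i < $length; $i++) {
            $char = $key[$i];
            // Create the next node on first use, then step into it.
            $node->children[$char] ??= new SketchTrieNode();
            $node = $node->children[$char];
        }
        $node->values[] = $value;
    }

    /** Returns every stored key that begins with $prefix, mapped to its values. */
    public function search(string $prefix): array
    {
        $node = $this->root;
        for ($i = 0, $length = strlen($prefix); $i < $length; $i++) {
            $char = $prefix[$i];
            if (!isset($node->children[$char])) {
                return [];
            }
            $node = $node->children[$char];
        }
        $found = [];
        $this->collect($node, $prefix, $found);
        return $found;
    }

    // The key is rebuilt from the path walked, so it is never stored in a node.
    private function collect(SketchTrieNode $node, string $key, array &$found): void
    {
        foreach ($node->values as $value) {
            $found[$key][] = $value;
        }
        foreach ($node->children as $char => $child) {
            $this->collect($child, $key . $char, $found);
        }
    }
}

$sketch = new SketchTrie();
$sketch->add('cat', 'cat data');
$sketch->add('car', 'car data');
$sketch->add('cart', 'cart data');
$matches = $sketch->search('car');   // keys found: 'car' and 'cart'
```

The search cost depends on the length of the prefix plus the size of the matching subtree, not on the total number of keys stored.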
Tries – Basic Trie
$trie = new \Trie();
$trie->add('cat', 'cat data');
$trie->add('car', 'car data');
[Diagram: C → A, which branches to T and R]
Tries – Basic Trie
$trie = new \Trie();
$trie->add('cat', 'cat data');
$trie->add('car', 'car data');
$trie->add('cart', 'cart data');
[Diagram: C → A, which branches to T and R; R → T]
Tries – Basic Trie
$trie = new \Trie();
$trie->add('cat', 'cat data');
$trie->add('car', 'car data');
$trie->add('cart', 'cart data');
$trie->search('car');
[Diagram: the Trie for 'cat', 'car' and 'cart', illustrating the search for 'car']
Tries – Basic Trie
• The key to a data node is inherent in the path to that node, so it is not necessary to store the key
Tries – Radix Trie
• Node pointers comprise one or more characters or bytes
• This means they can be more compact and memory efficient than a basic Trie
• It can add more overhead to building the Trie
• It may be faster to search the Trie hierarchy
Tries – Radix Trie
$radixTrie = new \Trie();
$radixTrie->add('cat', 'cat data');
$radixTrie->add('car', 'car data');
[Diagram: CA, which branches to T and R]
Tries – Radix Trie
$radixTrie = new \Trie();
$radixTrie->add('cat', 'cat data');
$radixTrie->add('car', 'car data');
$radixTrie->add('cart', 'cart data');
[Diagram: CA, which branches to T and R; R → T]
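The extra build overhead of a Radix Trie comes from splitting an existing multi-character pointer when a new key only shares part of it. The sketch below shows one way that insertion could work. SketchRadixTrie, SketchRadixNode and the edges() helper are hypothetical names for this example and are not taken from the library.

```php
<?php
declare(strict_types=1);

// Hypothetical node type for this sketch only.
final class SketchRadixNode
{
    /** @var array<string, SketchRadixNode> pointer label (one or more characters) => child */
    public array $children = [];
    /** @var array<int, mixed> bucket of values */
    public array $values = [];
}

final class SketchRadixTrie
{
    private SketchRadixNode $root;

    public function __construct()
    {
        $this->root = new SketchRadixNode();
    }

    public function add(string $key, $value = null): void
    {
        $node = $this->root;
        $rest = $key;
        while ($rest !== '') {
            $next = null;
            foreach ($node->children as $label => $child) {
                $label = (string) $label;   // PHP turns numeric string keys into ints
                $shared = self::sharedPrefixLength($label, $rest);
                if ($shared === 0) {
                    continue;
                }
                if ($shared < strlen($label)) {
                    // Only part of the label matches: split it into a shared
                    // head and the old tail, e.g. 'cat' becomes 'ca' -> 't'.
                    $split = new SketchRadixNode();
                    $split->children[substr($label, $shared)] = $child;
                    unset($node->children[$label]);
                    $node->children[substr($label, 0, $shared)] = $split;
                    $child = $split;
                }
                $next = $child;
                $rest = substr($rest, $shared);
                break;
            }
            if ($next === null) {
                // Nothing shares a first character: the remainder becomes one pointer.
                $next = new SketchRadixNode();
                $node->children[$rest] = $next;
                $rest = '';
            }
            $node = $next;
        }
        $node->values[] = $value;
    }

    /** Pointer labels as nested arrays, for inspecting the shape of the tree. */
    public function edges(): array
    {
        return self::describe($this->root);
    }

    private static function describe(SketchRadixNode $node): array
    {
        $shape = [];
        foreach ($node->children as $label => $child) {
            $shape[$label] = self::describe($child);
        }
        return $shape;
    }

    private static function sharedPrefixLength(string $a, string $b): int
    {
        $max = min(strlen($a), strlen($b));
        $i = 0;
        while ($i < $max && $a[$i] === $b[$i]) {
            $i++;
        }
        return $i;
    }
}

$radix = new SketchRadixTrie();
$radix->add('cat', 'cat data');
$radix->add('car', 'car data');
$radix->add('cart', 'cart data');
$shape = $radix->edges();   // 'ca' -> ('t', 'r' -> 't')
```

For these three keys the sketch uses four nodes below the root, where a basic Trie would use five.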
Tries – Suffix Trie
$suffixTrie = new \SuffixTrie();
$suffixTrie->add('cat', 'cat data');
$suffixTrie->search('at');
[Diagram: three branches from the root: C → A → T, A → T, and T]
Tries – Suffix Tries
• Memory hungry
  • Up to n + (n-1) + (n-2) + … + 2 + 1 = n(n+1)/2 nodes (where n is key length) used for every key/value stored in a Suffix Trie
• Slow to populate
• Can be used to search for “contains” rather than simply “begins with”
Tries – Suffix Tries
• It is necessary to store the key with the data
• A search can return duplicate values
  • e.g. “banana” if we search for “a” or “n” or even “ana”
• Data should only be stored once for the “full word”, and subsequent sequences should only store a pointer to that data
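The points above can be sketched as follows: every suffix of a key is added to the tree, each suffix node stores only the full key as a pointer, and the data itself is kept once per full key. This is an illustration under those assumptions, not the library's implementation. SketchSuffixTrie, SketchSuffixNode and countNodes() are hypothetical names.

```php
<?php
declare(strict_types=1);

// Hypothetical node type for this sketch only.
final class SketchSuffixNode
{
    /** @var array<string, SketchSuffixNode> */
    public array $children = [];
    /** @var array<string, bool> full keys that have a suffix ending here (used as a set) */
    public array $keys = [];
}

final class SketchSuffixTrie
{
    private SketchSuffixNode $root;
    /** @var array<string, array> full key => values, stored once per key */
    private array $data = [];

    public function __construct()
    {
        $this->root = new SketchSuffixNode();
    }

    public function add(string $key, $value = null): void
    {
        $this->data[$key][] = $value;
        $length = strlen($key);
        // Insert every suffix: 'cat', 'at', 't'.
        for ($start = 0; $start < $length; $start++) {
            $node = $this->root;
            for ($i = $start; $i < $length; $i++) {
                $char = $key[$i];
                $node->children[$char] ??= new SketchSuffixNode();
                $node = $node->children[$char];
            }
            $node->keys[$key] = true;   // a pointer to the data, not a copy of it
        }
    }

    /** Returns every stored key that contains $fragment, mapped to its values. */
    public function search(string $fragment): array
    {
        $node = $this->root;
        for ($i = 0, $length = strlen($fragment); $i < $length; $i++) {
            $char = $fragment[$i];
            if (!isset($node->children[$char])) {
                return [];
            }
            $node = $node->children[$char];
        }
        $keys = [];
        $this->collectKeys($node, $keys);
        $results = [];
        foreach (array_keys($keys) as $key) {
            $results[$key] = $this->data[$key];
        }
        return $results;
    }

    // Keys are gathered into a set, so a key reached through several
    // suffixes (as with 'banana') is only reported once.
    private function collectKeys(SketchSuffixNode $node, array &$keys): void
    {
        $keys += $node->keys;
        foreach ($node->children as $child) {
            $this->collectKeys($child, $keys);
        }
    }

    /** Number of nodes below the root. */
    public function countNodes(): int
    {
        return self::countBelow($this->root);
    }

    private static function countBelow(SketchSuffixNode $node): int
    {
        $count = 0;
        foreach ($node->children as $child) {
            $count += 1 + self::countBelow($child);
        }
        return $count;
    }
}

$suffix = new SketchSuffixTrie();
$suffix->add('cat', 'cat data');
$nodesForCat = $suffix->countNodes();   // 6, i.e. 3 + 2 + 1
$suffix->add('banana', 'banana data');
$hits = $suffix->search('ana');         // 'banana' appears once
```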
QuadTrees
• A Tree structure that partitions a 2-Dimensional space by recursively subdividing it into quadrants (or regions)
• Each node can contain:
  • A series of pointers (keys) to the next node in the hierarchy
  • A bucket for data values
• There are different types of QuadTrees:
  • Point QuadTrees
  • Region QuadTrees
  • Edge QuadTrees
  • Polygonal Map (PM) QuadTrees
QuadTrees – Purpose
• Fast Geo-spatial or Graph lookup
• Sparse data compression
• Example implementation: https://github.com/MarkBaker/QuadTrees
QuadTrees – Uses
• Spatial Indexing
• Storing Sparse Data, e.g.
  • Spreadsheet format data
  • Pixel data in images
• Collision Detection
• Points within a field of vision
QuadTrees – Methods
• insert($xyCoordinate, $value = null)
  Adds new data to a QuadTree
• search($boundingBox)
  Finds data in a QuadTree
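A minimal sketch of how insert() and search() might behave with a bucket size follows. It is not the MarkBaker/QuadTrees code. SketchQuadTree is a hypothetical class that takes plain x/y floats (x as longitude, y as latitude) rather than coordinate and bounding-box objects, and it leaves existing points in a node when that node is subdivided, which is one of several possible designs.

```php
<?php
declare(strict_types=1);

final class SketchQuadTree
{
    private float $minX;
    private float $minY;
    private float $maxX;
    private float $maxY;
    private int $bucketSize;
    /** @var array<int, array{0: float, 1: float, 2: mixed}> points held in this node's bucket */
    private array $points = [];
    /** @var SketchQuadTree[] four quadrants, created once the bucket is full */
    private array $children = [];

    public function __construct(float $minX, float $minY, float $maxX, float $maxY, int $bucketSize = 3)
    {
        $this->minX = $minX;
        $this->minY = $minY;
        $this->maxX = $maxX;
        $this->maxY = $maxY;
        $this->bucketSize = $bucketSize;
    }

    public function insert(float $x, float $y, $value = null): bool
    {
        if ($x < $this->minX || $x > $this->maxX || $y < $this->minY || $y > $this->maxY) {
            return false;   // outside this node's region
        }
        if ($this->children === []) {
            if (count($this->points) < $this->bucketSize) {
                $this->points[] = [$x, $y, $value];
                return true;
            }
            $this->subdivide();
        }
        // Bucket is full: hand the point to the first quadrant that contains it.
        foreach ($this->children as $child) {
            if ($child->insert($x, $y, $value)) {
                return true;
            }
        }
        return false;
    }

    /** Returns the values of all points inside the given bounding box. */
    public function search(float $minX, float $minY, float $maxX, float $maxY): array
    {
        if ($maxX < $this->minX || $minX > $this->maxX || $maxY < $this->minY || $minY > $this->maxY) {
            return [];      // box does not touch this region, skip the whole subtree
        }
        $found = [];
        foreach ($this->points as [$x, $y, $value]) {
            if ($x >= $minX && $x <= $maxX && $y >= $minY && $y <= $maxY) {
                $found[] = $value;
            }
        }
        foreach ($this->children as $child) {
            $found = array_merge($found, $child->search($minX, $minY, $maxX, $maxY));
        }
        return $found;
    }

    private function subdivide(): void
    {
        $midX = ($this->minX + $this->maxX) / 2;
        $midY = ($this->minY + $this->maxY) / 2;
        $this->children = [
            new self($this->minX, $midY, $midX, $this->maxY, $this->bucketSize), // north-west
            new self($midX, $midY, $this->maxX, $this->maxY, $this->bucketSize), // north-east
            new self($this->minX, $this->minY, $midX, $midY, $this->bucketSize), // south-west
            new self($midX, $this->minY, $this->maxX, $midY, $this->bucketSize), // south-east
        ];
    }
}

$world = new SketchQuadTree(-180, -90, 180, 90, 3);
$world->insert(-0.1275, 51.5072, 'London');
$world->insert(-74.0059, 40.7127, 'New York');
$world->insert(2.3508, 48.8567, 'Paris');
$world->insert(11.5667, 48.1333, 'Munich');
$world->insert(-6.2597, 53.3478, 'Dublin');
$world->insert(12.5000, 41.9000, 'Rome');
$world->insert(23.7167, 37.9667, 'Athens');
$world->insert(4.9000, 52.3667, 'Amsterdam');

// Longitude -15 to 25, latitude 45 to 60.
$northernEurope = $world->search(-15.0, 45.0, 25.0, 60.0);
sort($northernEurope);
```

The early return in search() is where the speed comes from: a quadrant that does not overlap the bounding box is discarded without looking at any of its points.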
QuadTrees – Spatial Indexing
$quadTree = new \QuadTree(
    -180, 90, 180, -90, // Dimensions
    3 // Bucket size
);
[Diagram: empty map grid spanning -180 to 180 horizontally and -90 to 90 vertically]
QuadTrees – Spatial Indexing
$quadTree = new \QuadTree(
    -180, 90, 180, -90, // Dimensions
    3 // Bucket size
);
$quadTree->add('London', 51.5072, -0.1275);
$quadTree->add('New York', 40.7127, -74.0059);
$quadTree->add('Paris', 48.8567, 2.3508);
[Diagram: the map grid (-180 to 180 by -90 to 90) with the first three points added]
QuadTrees – Spatial Indexing
$quadTree = new \QuadTree(
    -180, 90, 180, -90, // Dimensions
    3 // Bucket size
);
$quadTree->add('London', 51.5072, -0.1275);
$quadTree->add('New York', 40.7127, -74.0059);
$quadTree->add('Paris', 48.8567, 2.3508);
$quadTree->add('Munich', 48.1333, 11.5667);
$quadTree->add('Dublin', 53.3478, -6.2597);
$quadTree->add('Rome', 41.9000, 12.5000);
$quadTree->add('Athens', 37.9667, 23.7167);
[Diagram: the map grid after the bucket overflows and the space is subdivided into quadrants]
QuadTrees – Spatial Indexing
$quadTree = new \QuadTree(
    -180, 90, 180, -90, // Dimensions
    3 // Bucket size
);
$quadTree->add('London', 51.5072, -0.1275);
$quadTree->add('New York', 40.7127, -74.0059);
$quadTree->add('Paris', 48.8567, 2.3508);
$quadTree->add('Munich', 48.1333, 11.5667);
$quadTree->add('Dublin', 53.3478, -6.2597);
$quadTree->add('Rome', 41.9000, 12.5000);
$quadTree->add('Athens', 37.9667, 23.7167);
$quadTree->add('Amsterdam', 52.3667, 4.9000);
[Diagram: the map grid with a quadrant subdivided again after a further bucket overflow]
QuadTrees – Spatial Indexing
$quadTree = new \QuadTree(
    -180, 90, 180, -90, // Dimensions
    3 // Bucket size
);
$quadTree->add('London', 51.5072, -0.1275);
$quadTree->add('New York', 40.7127, -74.0059);
$quadTree->add('Paris', 48.8567, 2.3508);
$quadTree->add('Munich', 48.1333, 11.5667);
$quadTree->add('Dublin', 53.3478, -6.2597);
$quadTree->add('Rome', 41.9000, 12.5000);
$quadTree->add('Athens', 37.9667, 23.7167);
$quadTree->add('Amsterdam', 52.3667, 4.9000);
…
// Search QuadTree for Northern Europe
$quadTree->find(
    -15.0, 60.0,
    25.0, 45.0
);
[Diagram: the map grid subdivided into nested quadrants, illustrating the search]
QuadTrees – Spatial Indexing
• The top-level node need not be limited to the maximum graph space (i.e. the whole world)
QuadTrees – Spatial Indexing
• With a larger bucket size
  • QuadTree is smaller, fewer nodes using less memory
  • More points need checking in each node
  • Faster to insert / slower to search
• With a smaller bucket size
  • The QuadTree uses more memory
  • Fewer points in each node to check
  • Slower to insert / faster to search
PHP DataStructures – Beyond SPL
A dreamscape made from random noise. Illustration: Google
Questions?
Who am I?
Mark Baker
Design and Development Manager
InnovEd (Innovative Solutions for Education) Learning Ltd
Coordinator and Developer of:
Open Source PHPOffice library
PHPExcel, PHPWord, PHPPowerPoint, PHPProject, PHPVisio
Minor contributor to PHP core
Other small open source libraries available on github
@Mark_Baker
https://github.com/MarkBaker
http://uk.linkedin.com/pub/mark-baker/b/572/171