Lecture Notes in Computer Science, Volume 709: Algorithms and Data Structures

The K-D Heap: An Efficient Multi-dimensional Priority Queue

Yuzheng Ding† and Mark Allen Weiss‡

†Computer Science Department, University of California, Los Angeles, CA 90024, U.S.A.

‡School of Computer Science, Florida International University, Miami, FL 33199, U.S.A.

Abstract. This paper presents the k-d heap, an efficient data structure that implements a multi-dimensional priority queue. The basic form of the k-d heap uses no extra space, takes linear time to construct, and supports instant access to the items carrying the minimum key of any dimension, as well as logarithmic time insertion, deletion, and modification of any item in the queue. Moreover, it can be extended to a multi-dimensional double-ended mergeable priority queue, capable of efficiently supporting all the operations linked to priority queues. The k-d heap is very easily implemented, and has direct applications.

1 Introduction

The priority queue is one of the fundamental and widely used abstract data types, and has been extensively studied. A classic priority queue consists of a collection of items, each of which has a priority drawn from a fully-ordered set. The basic operations on a priority queue are the insertion of new items and the retrieval and deletion of the item with highest priority. In some applications, it is also desirable that deletions and priority changes of arbitrary items be allowed, and that priority queues can be merged and/or split. There are also cases where access to both the item of highest priority and that of lowest priority is necessary, i.e. the priority queue is required to be double-ended.

Implementations of priority queues are usually named heaps, in which priority is represented by the keys of the data items. If the smallest key represents the highest priority, the heap is called a min-heap; if the largest key represents the highest priority, the heap is a max-heap. Heaps are usually based on certain tree structures; if pointers are used to maintain such structures, it is an explicit implementation; if pointers are not used (in which case an indexed array is usually the alternative), it is an implicit implementation. Implicit implementations are generally preferable since they use little or no extra space, unless merging is supported, in which case it seems that implicit implementations suffer a linear number of data movements.

Since the invention of the first priority queue implementation, the binary heap, by Williams [19], many papers have been published on this issue, including [1, 3, 4, 5, 6, 7, 9, 10, 11, 12, 15, 16, 17, 18], to name just a few. In [7] a data structure is introduced that efficiently supports all the priority queue operations that have been discussed in the literature, including merge and double-ended operations.

On the other hand, in practice, data objects are often associated with more than one priority category. In a database system, records can be indexed on different fields, and each index usually defines a priority relationship. Access based on any index should be supported equally efficiently. It is not difficult to give more examples. For such applications, a multi-dimensional priority queue is desired.

In all these cases, if the classic priority queue data type is used, we have to build one queue for each priority relation; moreover, we have to maintain links among the occurrences of the same object in different queues, and perform appropriate operations on all the queues every time an operation based on one priority is required. This clearly consumes extra space and time, and is inconvenient to implement.

In this paper we present a data structure that implements a multi-dimensional priority queue efficiently. This structure, named the k-d heap, can implement a priority queue on a set of k different priority relationships, and support integrated priority queue operations according to any priority relationship as efficiently as the same operations on standard one-dimensional priority queues. Since a single structure is maintained, there is no need for extra space to keep links, or for extra operations to maintain different priority relationships. Using an implicit implementation, the k-d heap does not require any extra space. Moreover, the k-d heap can be implemented to support some or all of the priority operations on any number of dimensions, and it contains previously studied heap structures, e.g. the double-ended heaps, as a special case.

A related work is the k-d tree [2], which extends the binary search tree to multiple keys. The major drawback of the k-d tree is that both deletion and balancing are difficult. The k-d heap proposed in this paper follows a similar ordering, but is automatically balanced, and simple to implement.

The rest of this paper is organized as follows. Section 2 presents the general structure of a k-d heap, and its properties. In section 3 we discuss the operations for the special case of a two-dimensional priority queue; this will be extended to the general case in section 4. Section 5 is devoted to further extensions to support non-standard operations. Section 6 discusses the improvement of the k-d heap structure for large values of k. Section 7 concludes the paper. Due to page limitations we are unable to include detailed complexity analysis; it can be found in [8].

2 The K-D Heap

Fig.1 A 2-dimensional heap of 20 items

We are given a set of items, each of which contains $k$ keys $key_1, key_2, \ldots, key_k$, where $key_i$ is drawn from a fully-ordered set $K_i$. A (min) k-d heap $H$ over this set of items is a binary tree that satisfies the following conditions:

1. H is a complete binary tree (i.e. it is full except possibly at the rightmost positions of the bottom level);

2. H maintains k-d heap order.

(a) the item at the root has the smallest $key_1$ value in the tree;

(b) for any node v other than the root, if the item at its parent w has the smallest $key_i$ value in the subtree rooted at w, then the item at v has the smallest $key_{\mathrm{mod}(i,k)+1}$ value in the subtree rooted at v.

For simplicity we will not distinguish a node and the item at the node. We call k the dimension of the heap. An example of a 2-d heap is shown in Figure 1.

Define the level of a node to be the number of nodes on the path from the root to that node, and the height of the heap to be the largest level of its nodes. In practice, the height of a heap is usually much larger than the dimension. The k-d heap order can be re-expressed as the following:

Lemma 1 In a k-d heap, a node at level $i$ has the smallest $key_{\mathrm{mod}(i-1,k)+1}$ in its subtree. □

We will refer to level $i$ as a level for the key $key_{\mathrm{mod}(i-1,k)+1}$, and to $key_{\mathrm{mod}(i-1,k)+1}$ as the key of level $i$. Also, if a node in a complete binary tree satisfies Lemma 1, we call it a heap node. A complete binary tree is a k-d heap if and only if every node is a heap node.
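As an illustration, the heap-node condition can be checked directly on an implicit array. The following is a minimal sketch, not taken from the paper: it assumes a 0-based Python list of k-tuples in which the children of index j sit at 2j+1 and 2j+2, and the function names are our own.

```python
def level_of(index):
    """Level (1-based, as in the text) of a 0-based array index."""
    return (index + 1).bit_length()


def key_of_level(level, k):
    """0-based tuple position of the key of a level, i.e. mod(level-1, k)."""
    return (level - 1) % k


def subtree(index, n):
    """Yield every array index in the subtree rooted at index."""
    stack = [index]
    while stack:
        j = stack.pop()
        if j < n:
            yield j
            stack.extend((2 * j + 1, 2 * j + 2))


def is_heap_node(heap, index, k):
    """True if heap[index] has the smallest key of its level in its subtree."""
    d = key_of_level(level_of(index), k)
    return all(heap[index][d] <= heap[j][d] for j in subtree(index, len(heap)))


def is_kd_heap(heap, k):
    """A complete binary tree is a k-d heap iff every node is a heap node."""
    return all(is_heap_node(heap, j, k) for j in range(len(heap)))


heap = [(1, 9), (5, 2), (3, 4), (6, 7), (8, 3)]
print(is_kd_heap(heap, 2))  # True
```

This brute-force check takes more than linear time and is meant only for testing small examples.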

Clearly, a k-d heap can be implemented either implicitly or explicitly. Unless otherwise specified, we will assume an implicit implementation. It is easily seen that k-d heaps have the following properties.

Lemma 2 In a k-d heap, the node that has the smallest $key_i$ value is among the highest $i$ levels of the heap. □

Lemma 3 A 1-d heap is a binary heap [19]; a 2-d heap with $key_1 = -key_2$ for all nodes is a min-max heap [1]. □

Lemma 2 shows that a k-d heap implements a priority queue, i.e. the node with the smallest key is retrieved from the top of the heap (among a constant number of nodes¹). Lemma 3 indicates that the k-d heap is a more general structure than previous ones. In the following sections, we will also show that it is as efficient as previous structures. We first demonstrate the operations on 2-d heaps, then generalize to k-d heaps.
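Lemma 2 translates into a very short retrieval routine. The sketch below is illustrative only and assumes the same implicit layout as before (a 0-based list of k-tuples); `find_min` is a hypothetical name. The first $i$ levels of a complete binary tree occupy array positions 0 to $2^i - 2$, so only that prefix needs to be scanned.

```python
def find_min(heap, i, k):
    """Return the item with the smallest key_i (1 <= i <= k) of a k-d heap.

    By Lemma 2 the item lies in the first i levels, which are the
    array positions 0 .. 2**i - 2 of the implicit representation.
    """
    if not 1 <= i <= k:
        raise ValueError("i must be between 1 and k")
    limit = min(len(heap), 2 ** i - 1)
    return min(heap[:limit], key=lambda item: item[i - 1])


heap = [(1, 9), (5, 2), (3, 4), (6, 7), (8, 3)]  # a valid 2-d heap
print(find_min(heap, 1, 2))  # (1, 9)
print(find_min(heap, 2, 2))  # (5, 2)
```

For constant $k$ the scanned prefix has constant size, which is the sense in which retrieval takes constant time.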

3 Basic Operations on 2-D Heap

The retrieval of the node with the smallest $key_1$ or $key_2$ is trivial according to Lemma 2. Insertion will add a new node to the end of the heap; however, the new node might not be in accordance with the 2-d heap order at this position. Deletion of a node with the smallest key will leave a hole at one of the top two levels, and we can move the node from the end of the heap to fill this hole. The more general case is that an arbitrary node is deleted and the last node is used to replace it. The substitution might not be in accordance with the 2-d heap order at the new position. Therefore, the most essential operation is the normalization of an abnormal node in an otherwise 2-d heap ordered complete binary tree.

3.1 Normalization

Assume that in a 2-d heap, one node is altered. If every node was a heap node, then every node except the altered node and its ancestors is still a heap node. Moreover, the restoration of the heap order property at a node relates only to the descendants of the node. In other words, if a node is a heap node, it is still a heap node after any rearrangement among its descendants. Therefore, we restore the order from top to bottom so that restoration is permanent.

The first part of the normalization procedure is to restore the order among the altered node and its ancestors. The algorithm is outlined as follows.

algorithm normalize-up(node)
    for each ancestor of node starting from root do
        if the order relationship between ancestor and node is violated
            then swap ancestor and node.
end.

To see how this works, note that the swap of an ancestor and the altered node does not affect the order property on the ancestors at smaller levels, and after the swap, the heap order at this ancestor is restored since the altered node was the only problem. Therefore, after each iteration we convert one ancestor to a heap node (if it was not). Figure 2(a) shows a 2-d heap with an abnormal node and Figure 2(b) is the result of normalize-up() on it (a crossed out arrow indicates a comparison, but no swap).
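The normalize-up procedure can be sketched as follows. This is one possible reading rather than the authors' code: it assumes a 0-based list of k-tuples, interprets "violated" as the item at the altered position having a strictly smaller key than the ancestor for the key of the ancestor's level, and swaps the items while the altered position itself stays fixed.

```python
def normalize_up(heap, node, k):
    """Restore the order between heap[node] and its ancestors, root first."""
    # Collect the ancestors of node, nearest first.
    path = []
    j = node
    while j > 0:
        j = (j - 1) // 2
        path.append(j)
    # Visit them starting from the root.
    for anc in reversed(path):
        d = ((anc + 1).bit_length() - 1) % k  # key position of anc's level
        if heap[node][d] < heap[anc][d]:
            heap[anc], heap[node] = heap[node], heap[anc]


# Insertion into a 2-d heap: append the new item, then normalize upwards.
heap = [(1, 9), (5, 2), (3, 4), (6, 7), (8, 3)]
heap.append((0, 1))
normalize_up(heap, len(heap) - 1, 2)
print(heap)  # [(0, 1), (5, 2), (3, 4), (6, 7), (8, 3), (1, 9)]
```

Here the new item displaces the root, and the old root item is then compared with the level-2 ancestor on $key_2$ without a further swap.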

¹ The cost of the search among these nodes can be easily eliminated by distributing it to the operations modifying the heap.

(a) Heap with abnormal node (b) After normalize-up() (c) After normalize-down()

Fig.2 Normalization of an abnormal node in 2-d heap

After such a procedure, all nodes except one are heap nodes, so we only need to restore the heap order in the subtree rooted at this abnormal node. The algorithm follows.

algorithm normalize-down(node)
    Let the key for the level of node be $key_i$.
    Find the node v with smallest $key_i$ among node, its children and grandchildren.
    if v = node then stop.
    swap v and node.
    if v is a grandchild of node and has smaller $key_{\mathrm{mod}(i,2)+1}$ value than its parent
        then swap v and its parent.
    call normalize-down(v).
end.

This procedure is illustrated in Figure 2(c) where the abnormal node in Figure 2(b) is normalized towards its descendants. The procedure can be regarded as a procedure of pushing the abnormal node down the heap. Clearly, recursion is not necessary in implementation.
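A non-recursive version of normalize-down for the 2-d case might look like the sketch below. It assumes the same implicit array layout (0-based list of pairs); the helper name `key_index` and the tie-breaking rule (ties keep the current node in place) are our own choices.

```python
def key_index(j):
    """0-based key position for the level of array index j in a 2-d heap."""
    return ((j + 1).bit_length() - 1) % 2


def normalize_down_2d(heap, node):
    """Push the abnormal item at position node down a 2-d heap."""
    n = len(heap)
    while True:
        d = key_index(node)
        children = [c for c in (2 * node + 1, 2 * node + 2) if c < n]
        grandchildren = [g for c in children
                         for g in (2 * c + 1, 2 * c + 2) if g < n]
        # node is listed first, so ties are resolved in its favour.
        v = min([node] + children + grandchildren, key=lambda j: heap[j][d])
        if v == node:
            return
        heap[v], heap[node] = heap[node], heap[v]
        if v in grandchildren:
            parent = (v - 1) // 2
            other = 1 - d  # key of the level between node and v
            if heap[v][other] < heap[parent][other]:
                heap[v], heap[parent] = heap[parent], heap[v]
        node = v


# A 2-d heap whose root item has just been replaced by (7, 1).
heap = [(7, 1), (9, 2), (8, 3), (2, 5), (4, 6), (3, 7), (5, 8)]
normalize_down_2d(heap, 0)
print(heap)  # [(2, 5), (7, 1), (8, 3), (9, 2), (4, 6), (3, 7), (5, 8)]
```

In this example the minimum $key_1$ is found at a grandchild, and the displaced item then also has to be swapped with that grandchild's parent because of its smaller $key_2$.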

(a) Before insertion (b) After insertion

Fig.3 Insertion of an item into a 2-d heap

(a) The original 2-d heap (b) After deletion of min $key_1$ (c) After deletion of min $key_2$

Fig.4 Deletion of the item with minimum key in a 2-d heap

Now the normalization procedure is simply the following:

algorithm normalize(node)
    call normalize-up(node).
    call normalize-down(node).
end.

3.2 2-D Heap Operations

Based on the algorithms in the last subsection, the basic operations in 2-d heaps are easy.

To insert an item, we add it to the end of the heap, and normalize it. In this case, only normalize-up() is required. Figure 3 gives an example.

To delete the minimum for $key_1$, we replace the root with the last item in the heap, and normalize it. In this case, only normalize-down() is required. An example is shown in Figure 4(b), where the original heap is in Figure 4(a).

To delete the minimum for $key_2$, we first locate it among the first three items in the heap. Then we replace it with the last item in the heap and normalize it. Figure 4(c) shows how this is applied to the original heap.

To delete an arbitrary item (with known position), we simply replace it with the last item in the heap and normalize it. To modify an arbitrary item, we just do the modification and then normalize the item.

The creation of a 2-d heap is similar to that of a binary heap, i.e. it proceeds from the bottom up. We describe it in recursive form; again, neither the recursion nor the parameter indicating the key of the level is necessary, and both are used only for simplicity of description.

algorithm create(heap, $key_i$)
    call create(left-subheap, $key_{\mathrm{mod}(i,2)+1}$).
    call create(right-subheap, $key_{\mathrm{mod}(i,2)+1}$).
    call normalize-down(root).
end.

Figure 5 shows an example.
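The operations of this subsection can be put together in a small class. The following is a sketch under our own assumptions (a 0-based Python list of pairs, hypothetical names `TwoDHeap`, `delete_at`, `min_position`), not the C implementation mentioned by the authors; creation is done iteratively, from the last internal node back to the root.

```python
class TwoDHeap:
    """Implicit 2-d min-heap over (key1, key2) pairs stored in a list."""

    def __init__(self, items=()):
        self.a = list(items)
        # Bottom-up creation: normalize every internal node, deepest first.
        for j in range(len(self.a) // 2 - 1, -1, -1):
            self._down(j)

    @staticmethod
    def _key(j):
        return ((j + 1).bit_length() - 1) % 2  # key position for j's level

    def _up(self, node):
        path, j = [], node
        while j > 0:
            j = (j - 1) // 2
            path.append(j)
        for anc in reversed(path):  # root first
            d = self._key(anc)
            if self.a[node][d] < self.a[anc][d]:
                self.a[anc], self.a[node] = self.a[node], self.a[anc]

    def _down(self, node):
        a, n = self.a, len(self.a)
        while True:
            d = self._key(node)
            kids = [c for c in (2 * node + 1, 2 * node + 2) if c < n]
            grand = [g for c in kids for g in (2 * c + 1, 2 * c + 2) if g < n]
            v = min([node] + kids + grand, key=lambda j: a[j][d])
            if v == node:
                return
            a[v], a[node] = a[node], a[v]
            if v in grand:
                p = (v - 1) // 2
                if a[v][1 - d] < a[p][1 - d]:
                    a[v], a[p] = a[p], a[v]
            node = v

    def insert(self, item):
        self.a.append(item)
        self._up(len(self.a) - 1)

    def min_position(self, d):
        # key1 (d=0) is at the root; key2 (d=1) is among the first three items.
        limit = min(len(self.a), 1 if d == 0 else 3)
        return min(range(limit), key=lambda j: self.a[j][d])

    def delete_at(self, pos):
        item, last = self.a[pos], self.a.pop()
        if pos < len(self.a):  # the deleted item was not the last one
            self.a[pos] = last
            self._up(pos)
            self._down(pos)
        return item

    def delete_min(self, d):
        return self.delete_at(self.min_position(d))


h = TwoDHeap([(5, 2), (1, 9), (8, 3), (6, 7), (3, 4)])
h.insert((0, 5))
print(h.delete_min(0))  # (0, 5)
print(h.delete_min(1))  # (5, 2)
```

Modification of an item at a known position would follow the same pattern as `delete_at`: overwrite the item, then call `_up` and `_down` on that position.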

(a) Original tree (b) Level 4 normalized (c) Level 3 normalized (d) Level 2 normalized (e) Level 1 normalized

Fig.5 Creation of a 2-d heap

3.3 Complexity

It is easily seen that the cost of normalize-up() is linear in the level of the node it is applied to, and the cost of normalize-down() is linear in the height of the subtree rooted at the node to which the operation is applied. Therefore, except for creation, the complexity of any of the operations discussed in the last subsection is linear in the height of the heap. Creation is linear in the number of items; the proof is similar to that of the binary heap, since normalize-down() is linear in the height of the heap. Thus we have

Theorem 1 A 2-d heap of $n$ items can be created in $O(n)$ time. On such a heap, finding the item with minimum value of either key takes constant time; insertion, deletion of the item with minimum value of either key, deletion of an arbitrary item whose position is known, and modification of an arbitrary item take $O(\log n)$ time each in the worst case. □

The detailed analysis of the exact number of comparisons and data movements can be found in [8].

3.4 2-D Heap as Min-Max Heap

According to Lemma 3, a 2-d heap can implement a min-max heap if we set $key_2 = -key_1$ for all nodes, where $key_1$ is the original min-key, and $key_2$ is the new max-key. However, since the operations of the 2-d heap are designed to handle the general case, where $key_1$ and $key_2$ can be totally unrelated, it is more efficient to use the min-max heap operations for this special case. For instance, when a node is normalized towards its descendants, in a general 2-d heap the minimum key can be with any of its children and grandchildren. However, in a min-max heap, if there is any grandchild, then the minimum key must be with one of the grandchildren. Therefore, normalize-down() need only examine the grandchildren unless there is no grandchild. Similarly, normalize-up() can use the bottom-up approach and go through only the max- or min-levels. This will reduce the number of comparisons by about 50% (for detailed analysis, see [8]). In general, the operations on a 2-d heap can be improved if the results of some comparisons can be implied by the results of some other comparisons based on a known relationship between the two priorities.

4 Basic Operations on K-D Heap

The results of the last section can be easily generalized to the case of a general k-d heap for any constant $k$. The retrieval of the minimum item with regard to $key_i$ is implemented by a search among the constant number of nodes in the first $i$ levels, so it takes $O(1)$ time. If retrieval is frequent, the positions can also be memorized to allow instant access. Normalize-up() remains the same (independent of $k$); normalize-down() can be done in the following way, which is a direct generalization of the 2-d case.

algorithm normalize-down(node)
    Let $key_i$ be the key for the level of node.
    Find the node v with smallest $key_i$ value in the descendants of node within k levels.
    if node has smaller $key_i$ than v
        then stop
        else swap node and v.
    call normalize-up(v) for the subheap rooted at node.
    call normalize-down(v).
end.

Note that the call of normalize-up() is within a heap of $k$ levels, so the cost is constant.
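The general procedure can be sketched in the same style as the 2-d case. The code below is illustrative only: it assumes a 0-based list of k-tuples, uses helper names of our own (`descendants_within`, `normalize_up_within`), replaces the recursion by a loop, and stops on ties as well as when the node has the strictly smaller key.

```python
def key_index(j, k):
    """0-based key position for the level of array index j."""
    return ((j + 1).bit_length() - 1) % k


def descendants_within(j, depth, n):
    """Indices of the descendants of j at most depth levels below it."""
    result, frontier = [], [j]
    for _ in range(depth):
        frontier = [c for p in frontier
                    for c in (2 * p + 1, 2 * p + 2) if c < n]
        result.extend(frontier)
    return result


def normalize_up_within(heap, v, top, k):
    """normalize-up for position v, restricted to the subheap rooted at top."""
    path, j = [], v
    while j != top:
        j = (j - 1) // 2
        path.append(j)
    for anc in reversed(path):  # starting from top
        d = key_index(anc, k)
        if heap[v][d] < heap[anc][d]:
            heap[anc], heap[v] = heap[v], heap[anc]


def normalize_down(heap, node, k):
    """Push the abnormal item at position node down a k-d heap."""
    n = len(heap)
    while True:
        d = key_index(node, k)
        candidates = descendants_within(node, k, n)
        if not candidates:
            return
        v = min(candidates, key=lambda j: heap[j][d])
        if heap[node][d] <= heap[v][d]:
            return
        heap[node], heap[v] = heap[v], heap[node]
        normalize_up_within(heap, v, node, k)
        node = v


# A 3-d heap whose root item has just been replaced by (9, 0, 9).
heap = [(9, 0, 9), (2, 1, 5), (3, 2, 6), (1, 4, 1), (5, 5, 2), (6, 3, 3), (7, 6, 4)]
normalize_down(heap, 0, 3)
print(heap)
# [(1, 4, 1), (9, 0, 9), (3, 2, 6), (2, 1, 5), (5, 5, 2), (6, 3, 3), (7, 6, 4)]
```

Each step inspects up to $2^{k+1} - 2$ descendants, which is where the exponential constant of the deletion operations comes from.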

Creation is similar to the case of 2-d heap except the key for any level is determined with modulo k instead of 2. Therefore,

Theorem 2 A k-d heap of $n$ items, where $k$ is any given constant, can be created in $O(n)$ time. It takes constant time to retrieve the item with minimum value of any of the $k$ keys, and it takes $O(\log n)$ time to do insertion, deletion of an item with minimum value of any of the $k$ keys, and deletion or modification of an arbitrary item (whose position in the heap is known) on such a heap. □

Figure 6 shows a 3-d heap and the result of the deletion of the root.

It should be pointed out that the hidden constant for the deletion operations is exponential in $k$, since each step of normalize-down() examines the descendants within $k$ levels, of which there can be up to $2^{k+1} - 2$. This makes the operations impractical when $k$ is very large. (On the other hand, insertion cost is independent of $k$, and creation cost is proportional to $k^2$ [8].) In Section 6 we will show that the exponential constant can be eliminated by a slight modification of the heap structure.

(a) Original 3-d heap (b) After deletion of min $key_1$

Fig.6 Deletion of item with minimum $key_1$ in a 3-d heap

5 Extensions of K-D Heap

The basic k-d heaps can be extended to support other operations. We present some of them briefly; details are in [8].

5.1 Double-Ended K-D Heap

As we have demonstrated in Section 3.4, 2-d heaps can be used to implement min-max heaps. In general, if each item has $k$ keys $key_1, key_2, \ldots, key_k$, we can (conceptually) include $k$ more keys $key'_1, key'_2, \ldots, key'_k$ where $key'_i = -key_i$, and where the negative sign represents the general functional mapping that reverses the order. Then, the 2k-d heap becomes a double-ended k-dimensional heap. Note that the new keys need not be stored; instead, the mapping can be encoded in the operations, so there is no need for extra space.
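One way to encode the mapping in the operations, rather than in storage, is to route every key access through a small function. The sketch below is an assumption of ours: it uses numeric keys so that arithmetic negation reverses the order, and the name `extended_key` is hypothetical.

```python
def extended_key(item, j, k):
    """Key j (0-based, 0 <= j < 2k) of an item in the conceptual 2k-d heap.

    Positions 0 .. k-1 are the stored keys; positions k .. 2k-1 are their
    order-reversed counterparts, computed on demand and never stored.
    """
    if not 0 <= j < 2 * k:
        raise IndexError("key position out of range")
    return item[j] if j < k else -item[j - k]


items = [(3, 8), (5, 1), (4, 4)]
# The smallest extended key 2 belongs to the item with the largest first key.
print(min(items, key=lambda it: extended_key(it, 2, 2)))  # (5, 1)
# The smallest extended key 3 belongs to the item with the largest second key.
print(min(items, key=lambda it: extended_key(it, 3, 2)))  # (3, 8)
```

A heap implementation that compares items only through such a function can treat the 2k keys uniformly while storing just k of them per item.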

5.2 Mergeable K-D Heap

We assume explicit implementation of binary trees when merging is considered. Merging of the basic k-d heaps is difficult. As special cases, binary heaps (1-d heaps) take $O(\log^2 n)$ time to merge [16], and min-max heaps (a special type of 2-d heap) have an $\Omega(n)$ lower bound for merging [13].

On the other hand, in [7] it is shown that by introducing a certain kind of relaxation to the min-max order, a structure similar to the binomial queue is possible, and thus merging becomes an $O(\log n)$ time operation. Specifically, the heap is decomposed into a set of units, each containing $2^i$ items for a unique $i$, in the form of a perfect binary tree plus a single item. For odd (even) $i$, the perfect binary tree is a relaxed min-max (max-min) heap, where the word relaxed means that some nodes are allowed not to obey the heap order. It was shown that two such units of the same size can be merged in constant time, therefore the entire heap is merged in logarithmic time. This technique can be generalized to the k-d heaps by a similar decomposition, and for a perfect binary tree of height $i$, the root has $key_{\mathrm{mod}(i,k)+1}$ as its key for the relaxed order. The operations are more involved than those of the relaxed min-max heap, but the complexity is of the same order, i.e. the merging of (double-ended) k-d heaps also takes logarithmic time. Details can be found in [8].

5.3 Generalized Priority Queues

By definition, a priority queue supports efficient access to the item with highest (and/or lowest) priority. There are often cases where a function on the domain of one or more priority relationships forms a new priority and access based on this priority is also desired. Examples include a student record file where the grade of each course forms a priority relationship, while GPA is also a desired priority relationship for many applications. The min-max heap is a special case. In general, all such pre-determined functionally formed priorities can be encoded into the operations, and explicit storage of the corresponding "keys" is not necessary. In this sense, k-d heaps can be used to implement such generalized priority queues. Double-ended operations are a special case of such generalization.

6 Improvement of K-D Heap for Large K Values

The k-d heap structure we have discussed so far maintains a very loose order relationship, in the sense that in any subheap, although the smallest key is at the root, the second smallest key can be anywhere among the first $k$ levels of descendants. This results in the exponential constant for the deletion operations. When $k$ is large, as is the case for double-ended implementations, this must be improved to make the k-d heap practical.

In this section we briefly describe an approach to solve this problem. The detailed analysis is given in [8].

Instead of building a complete binary tree for the heap, we allow each node at the levels $i$ with $\mathrm{mod}(i,k) \neq 0$ to have at most one child, while the nodes at the levels $i$ with $\mathrm{mod}(i,k) = 0$ still have up to two children. Figure 7(a) shows the modified structure for the 3-d heap shown in Figure 6(a). In order to allow efficient implicit implementation, every chain of $k$ nodes can be packed into a single node, so that the underlying tree is still a complete binary tree, as shown in Figure 7(b).

Clearly, the normalize-down operation now only needs to examine at most $2k$ descendants to select the minimum. On the other hand, the height of the heap is about $k$ times as large, so the constant factor for the deletion operations is reduced to $O(k^2)$. The cost of the insertion operation, however, is increased by a factor of $k$. It can be shown that the cost of heap creation is also slightly increased (but still $O(k^2 n)$).

Although Figure 7 illustrates the modification for $k = 3$, it is only preferred when $k$ is large. In practice, small values of $k$ (for instance, 2 or 3) are more usual, and the original structure is more efficient.

(a) The structure (b) The implementation

Fig.7 Improved k-d heap structure for large k

7 Conclusion

We have presented the k-d heap, a data structure that efficiently implements a multi-dimensional priority queue without using extra space. One form supports insertion, deletion of any minimum, and creation in $O(\log n)$, $O(2^k \log n)$, and $O(k^2 n)$ time, respectively, with particularly easy operations for $k = 2$, while another form gives times of $O(k \log n)$, $O(k^2 \log n)$, and $O(k^2 n)$. (Note that for typical values of $k$, the difference between $2^k$ and $k^2$ is not large.) Moreover, the k-d heap can be extended to support double-ended and merging operations.

The implementation of the k-d heap is extremely simple. We have coded the operations in C, and they take about 120 lines in total. Therefore, it is very practical. A complete implementation with performance comparison is given in [8].

Several related problems are still open. It is possible to improve the bounds for insertion and minimum-deletion to $O(k \log n)$ by using only $O(n)$ additional space. We do not know if this can be achieved with only constant extra space.

Although the k-d heaps presented in Section 4 can be merged in logarithmic time via order relaxation, the same technique does not extend to the modified structure discussed in Section 6. Another interesting problem is to exploit the inter-priority relationships to improve the efficiency of k-d heap operations. In the case of double-ended priority queues, we have shown that such relationships yield a significant reduction in complexity. The impact of other commonly occurring relationships is also worth studying.
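The parenthetical remark about $2^k$ versus $k^2$ can be checked by tabulating both factors for small $k$. This is plain arithmetic on the stated bounds and ignores the constants hidden by the O-notation; the function name is ours.

```python
def deletion_factors(k):
    """Return (2**k, k**2): the k-dependent factors of the two deletion bounds."""
    return 2 ** k, k ** 2


for k in range(2, 9):
    exponential, quadratic = deletion_factors(k)
    print(k, exponential, quadratic)
# 2 4 4
# 3 8 9
# 4 16 16
# 5 32 25
# 6 64 36
# 7 128 49
# 8 256 64
```

The two factors coincide for $k = 2$ and $k = 4$ and stay within a factor of two up to $k = 6$; beyond that the exponential term grows much faster.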

References

1. M. ATKINSON, J. SACK, N. SANTORO, T. STROTHOTTE, "Min-Max Heaps and Generalized Priority Queues," Comm. ACM, Vol.29 (1986), 996-1000.

2. J. BENTLEY, "Multidimensional Binary Search Trees Used for Associative Searching," Comm. ACM, Vol.18 (1975), 509-517.

3. M. BROWN, "Implementation and Analysis of Binomial Queue Algorithms," SIAM J. Comput., Vol.7 (1978), 298-319.

4. S. CARLSSON, "The Deap - A Double-Ended Heap to Implement Double-Ended Priority Queues," Inform. Process. Lett., Vol.26 (1987), 33-36.

5. S. CARLSSON, J. MUNRO, P. POBLETE, "An Implicit Binomial Queue with Constant Insertion Time," Proc. SWAT (1988).

6. J. DRISCOLL, H. GABOW, R. SHRAIRMAN, R. TARJAN, "Relaxed Heaps: An Alternative to Fibonacci Heaps with Applications to Parallel Computation," Comm. ACM, Vol.31 (1988), 1343-1354.

7. Y. DING, M. WEISS, "The Relaxed Min-Max Heap: A Mergeable Double-Ended Priority Queue," Acta Informatica, Vol.30 (1993), to appear.

8. Y. DING, M. WEISS, "Efficient Implementations of Multi-dimensional Priority Queues," School of Computer Science Technical Report, Florida International University, Feb. 1993.

9. R. FLOYD, "Algorithm 245: Treesort," Comm. ACM, Vol.7 (1964), 701.

10. M. FREDMAN, R. SEDGEWICK, D. SLEATOR, R. TARJAN, "The Pairing Heap: A New Form of Self-Adjusting Heap," Algorithmica, Vol.1 (1986), 111-129.

11. M. FREDMAN, R. TARJAN, "Fibonacci Heaps and Their Uses in Improved Network Optimization Algorithms," J. ACM, Vol.34 (1987), 596-615.

12. G. GAMBOSI, E. NARDELLI, M. TALAMO, "A Pointer-free Data Structure for Merging Heaps and Min-Max Heaps," Theoretical Computer Science, Vol.84 (1991), 107-126.

13. A. HASHAM, J. SACK, "Bounds for Min-Max Heaps," BIT, Vol.27 (1987), 315-323.

14. D. KNUTH, The Art of Computer Programming, Vol. 3, Addison-Wesley, Reading, MA, 1973.

15. S. OLARIU, C. OVERSTREET, Z. WEN, "A Mergeable Double-ended Priority Queue," The Computer Journal, Vol.34 (1991), 423-427.

16. J. SACK, T. STROTHOTTE, "An Algorithm for Merging Heaps," Acta Informatica, Vol.22 (1985), 171-186.

17. D. SLEATOR, R. TARJAN, "Self-Adjusting Heaps," SIAM J. Comput., Vol.15 (1986), 52-69.

18. J. VUILLEMIN, "A Data Structure for Manipulating Priority Queues," Comm. ACM, Vol.21 (1978), 309-315.

19. J. WILLIAMS, "Algorithm 232: Heapsort," Comm. ACM, Vol.7 (1964), 347-348.

