
    Data structures

Contents

Introduction: Data structure; Linked data structure; Succinct data structure; Implicit data structure; Compressed data structure; Search data structure; Persistent data structure; Concurrent data structure

Abstract data types: Abstract data type; List; Stack; Queue; Deque; Priority queue; Map; Bidirectional map; Multimap; Set; Tree

Arrays: Array data structure; Row-major order; Dope vector; Iliffe vector; Dynamic array; Hashed array tree; Gap buffer; Circular buffer; Sparse array; Bit array; Bitboard; Parallel array; Lookup table

Lists: Linked list; XOR linked list; Unrolled linked list; VList; Skip list; Self-organizing list

Binary trees: Binary tree; Binary search tree; Self-balancing binary search tree; Tree rotation; Weight-balanced tree; Threaded binary tree; AVL tree; Red-black tree; AA tree; Scapegoat tree; Splay tree; T-tree; Rope; Top tree; Tango tree; Van Emde Boas tree; Cartesian tree; Treap

B-trees: B-tree; B+ tree; Dancing tree; 2-3 tree; 2-3-4 tree; Queap; Fusion tree; Bx-tree

Heaps: Heap; Binary heap; Binomial heap; Fibonacci heap; 2-3 heap; Pairing heap; Beap; Leftist tree; Skew heap; Soft heap; d-ary heap

Tries: Trie; Radix tree; Suffix tree; Suffix array; Compressed suffix array; FM-index; Generalized suffix tree; B-trie; Judy array; Directed acyclic word graph

Multiway trees: Ternary search tree; And-or tree; (a,b)-tree; Link/cut tree; SPQR tree; Spaghetti stack; Disjoint-set data structure

Space-partitioning trees: Space partitioning; Binary space partitioning; Segment tree; Interval tree; Range tree; Bin; k-d tree; Implicit k-d tree; min/max k-d tree; Adaptive k-d tree; Quadtree; Octree; Linear octree; Z-order curve; UB-tree; R-tree; R+ tree; R* tree; Hilbert R-tree; X-tree; Metric tree; Vp-tree; BK-tree

Hashes: Hash table; Hash function; Open addressing; Lazy deletion; Linear probing; Quadratic probing; Double hashing; Cuckoo hashing; Coalesced hashing; Perfect hash function; Universal hashing; Linear hashing; Extendible hashing; 2-choice hashing; Pearson hashing; Fowler-Noll-Vo hash function; Bitstate hashing; Bloom filter; Locality-preserving hashing; Zobrist hashing; Rolling hash; Hash list; Hash tree; Prefix hash tree; Hash trie; Hash array mapped trie; Distributed hash table; Consistent hashing; Stable hashing; Koorde

Graphs: Graph; Adjacency list; Adjacency matrix; And-inverter graph; Binary decision diagram; Binary moment diagram; Zero-suppressed decision diagram; Propositional directed acyclic graph; Graph-structured stack; Scene graph

Appendix: Big O notation; Amortized analysis; Locality of reference; Standard Template Library


Introduction

    Data structure

[Figure: a hash table]

In computer science, a data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently.[1][2]

Different kinds of data structures are suited to different kinds of applications, and some are highly specialized to specific tasks. For example, B-trees are particularly well-suited for implementation of databases, while compiler implementations usually use hash tables to look up identifiers.

Data structures provide a means to manage huge amounts of data efficiently, such as large databases and internet indexing services. Usually, efficient data structures are a key to designing efficient algorithms. Some formal design methods and programming languages emphasize data structures, rather than algorithms, as the key organizing factor in software design. Data can be stored and retrieved from both main memory and secondary memory, and the choice among the available data structures depends on the needs of the application.

Overview

An array data structure stores a number of elements of the same type in a specific order. They are accessed using an integer to specify which element is required (although the elements may be of almost any type). Arrays may be fixed-length or expandable.

Record (also called tuple or struct). Records are among the simplest data structures. A record is a value that contains other values, typically in fixed number and sequence and typically indexed by names. The elements of records are usually called fields or members.

A hash or dictionary or map is a more flexible variation on a record, in which name-value pairs can be added and deleted freely.

Union. A union type definition will specify which of a number of permitted primitive types may be stored in its instances, e.g. "float or long integer". In contrast to a record, which could be defined to contain both a float and an integer, a union holds only one value at a time.

A tagged union (also called a variant, variant record, discriminated union, or disjoint union) contains an additional field indicating its current type, for enhanced type safety.
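For illustration, a minimal C sketch (hypothetical names, not from the original text) of a tagged union whose extra field records which member is currently valid:

#include <stdio.h>

enum kind { KIND_FLOAT, KIND_LONG };

struct number {
    enum kind kind;       /* the tag: which union member is in use */
    union {
        float f;
        long  l;
    } value;              /* only one of f or l is valid at a time */
};

void print_number(const struct number *n) {
    if (n->kind == KIND_FLOAT)
        printf("%f\n", n->value.f);
    else
        printf("%ld\n", n->value.l);
}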

A set is an abstract data structure that can store specific values, in no particular order and with no repeated values. Values themselves are not retrieved from sets; rather, one tests a value for membership to obtain a boolean "in" or "not in".


An object contains a number of data fields, like a record, and also a number of program code fragments for accessing or modifying them. Data structures not containing code, like those above, are called plain old data structures.

    Many others are possible, but they tend to be further variations and compounds of the above.

Basic principles

Data structures are generally based on the ability of a computer to fetch and store data at any place in its memory, specified by an address (a bit string that can itself be stored in memory and manipulated by the program). Thus the record and array data structures are based on computing the addresses of data items with arithmetic operations, while the linked data structures are based on storing addresses of data items within the structure itself. Many data structures use both principles, sometimes combined in non-trivial ways (as in XOR linking).

The implementation of a data structure usually requires writing a set of procedures that create and manipulate instances of that structure. The efficiency of a data structure cannot be analyzed separately from those operations. This observation motivates the theoretical concept of an abstract data type, a data structure that is defined indirectly by the operations that may be performed on it, and the mathematical properties of those operations (including their space and time cost).
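A minimal C sketch (hypothetical names) contrasting the two principles:

/* Address arithmetic: the address of arr[i] is computed from the base. */
int arr[10];
int *p = &arr[3];             /* base address + 3 * sizeof(int) */

/* Linked structure: each node explicitly stores its successor's address. */
struct node { int val; struct node *next; };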

Language support

Most assembly languages and some low-level languages, such as BCPL (Basic Combined Programming Language), lack support for data structures. Many high-level programming languages and some higher-level assembly languages, such as MASM, on the other hand, have special syntax or other built-in support for certain data structures, such as vectors (one-dimensional arrays) in the C language or multi-dimensional arrays in Pascal.

Most programming languages feature some sort of library mechanism that allows data structure implementations to be reused by different programs. Modern languages usually come with standard libraries that implement the most common data structures. Examples are the C++ Standard Template Library, the Java Collections Framework, and Microsoft's .NET Framework.

Modern languages also generally support modular programming, the separation between the interface of a library module and its implementation. Some provide opaque data types that allow clients to hide implementation details. Object-oriented programming languages, such as C++ and Java, may use classes for this purpose.

Many known data structures have concurrent versions that allow multiple computing threads to access the data structure simultaneously.

References

[1] Paul E. Black (ed.), entry for data structure in Dictionary of Algorithms and Data Structures. U.S. National Institute of Standards and Technology, 15 December 2004. Online version (http://www.itl.nist.gov/div897/sqg/dads/HTML/datastructur.html). Accessed May 21, 2009.
[2] Entry data structure in the Encyclopædia Britannica (2009). Online entry (http://www.britannica.com/EBchecked/topic/152190/data-structure). Accessed May 21, 2009.

Further reading

Peter Brass, Advanced Data Structures, Cambridge University Press, 2008.
Donald Knuth, The Art of Computer Programming, vol. 1. Addison-Wesley, 3rd edition, 1997.
Dinesh Mehta and Sartaj Sahni, Handbook of Data Structures and Applications, Chapman and Hall/CRC Press, 2007.
Niklaus Wirth, Algorithms and Data Structures, Prentice Hall, 1985.


External links

UC Berkeley video course on data structures (http://academicearth.org/courses/data-structures)
Descriptions from the Dictionary of Algorithms and Data Structures (http://nist.gov/dads/)
CSE.unr.edu (http://www.cse.unr.edu/~bebis/CS308/)
Data structures course with animations (http://www.cs.auckland.ac.nz/software/AlgAnim/ds_ToC.html)
Data structure tutorials with animations (http://courses.cs.vt.edu/~csonline/DataStructures/Lessons/index.html)
An Examination of Data Structures from .NET perspective (http://msdn.microsoft.com/en-us/library/aa289148(VS.71).aspx)
Schaffer, C. Data Structures and Algorithm Analysis (http://people.cs.vt.edu/~shaffer/Book/C++3e20110915.pdf)

Linked data structure

In computer science, a linked data structure is a data structure which consists of a set of data records (nodes) linked together and organized by references (links or pointers).

In linked data structures, the links are usually treated as special data types that can only be dereferenced or compared for equality. Linked data structures are thus contrasted with arrays and other data structures that require performing arithmetic operations on pointers. This distinction holds even when the nodes are actually implemented as elements of a single array, and the references are actually array indices: as long as no arithmetic is done on those indices, the data structure is essentially a linked one.

Linking can be done in two ways: using dynamic allocation, and using array index linking.

Linked data structures include linked lists, search trees, expression trees, and many other widely used data structures. They are also key building blocks for many efficient algorithms, such as topological sort[1] and set union-find.[2]

    Common types of linked data structures

Linked lists

A linked list is a collection of structures ordered not by their physical placement in memory but by logical links that are stored as part of the data in the structure itself. It is not necessary that it be stored in adjacent memory locations. Every structure has a data field and an address field; the address field contains the address of its successor. Linked lists can be singly, doubly or multiply linked, and can be either linear or circular.

Basic properties:

Objects, called nodes, are linked in a linear sequence. A reference to the first node of the list is always kept; this is called the 'head' or 'front'.[3]

[Figure: a linked list with three nodes, each containing two fields: an integer value and a link to the next node]


[Figure: a linked list with a single node]

    Example in Java

    This is an example of the node class used to store integers in a Java implementation of a linked list.

public class IntNode {
    public int value;
    public IntNode link;
    public IntNode(int v) { value = v; }
}

    Example in C

    This is an example of the node structure used for implementation of linked list in C.

struct node
{
    int val;
    struct node *next;
};

    This is an example using typedefs.

typedef struct node_s node_t;

struct node_s
{
    int val;
    node_t *next;
};

Note: A structure like this which contains a member that points to the same structure is called a self-referential structure.
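Building on the node_t type above, a brief sketch (illustrative; malloc error handling omitted) of prepending to and traversing such a list:

#include <stdio.h>
#include <stdlib.h>

/* Prepend a new node to the front of the list; returns the new head. */
node_t *prepend(node_t *head, int v)
{
    node_t *n = malloc(sizeof *n);   /* error handling omitted */
    n->val = v;
    n->next = head;
    return n;
}

/* Traverse the list by following the links until NULL. */
void print_list(const node_t *head)
{
    for (; head != NULL; head = head->next)
        printf("%d ", head->val);
    printf("\n");
}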


Search trees

A search tree is a tree data structure whose nodes store data values drawn from some ordered set, arranged so that an in-order traversal of the tree visits the nodes in ascending order of the stored values.

Basic properties:

Objects, called nodes, store values from an ordered set. An in-order traversal provides an ascending readout of the data in the tree. Subtrees of the tree are themselves trees.
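For illustration, a minimal C sketch (hypothetical types, not from the original text) of searching such a tree, where the ordering invariant lets each comparison discard one subtree:

struct tree_node {
    int value;
    struct tree_node *left, *right;
};

/* Returns the node containing v, or NULL if v is not in the tree. */
struct tree_node *search(struct tree_node *root, int v)
{
    while (root != NULL && root->value != v)
        root = (v < root->value) ? root->left : root->right;
    return root;
}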

    Advantages and disadvantages

Advantages against arrays

Compared to arrays, linked data structures allow more flexibility in organizing the data and in allocating space for it. With an array, the size must be specified precisely at the beginning, which can waste memory; a linked data structure is built dynamically and never needs to be bigger than the program requires, so no guessing about how much space to allocate is needed.

Array elements must occupy a contiguous (connected and sequential) portion of memory, whereas in a linked data structure the reference in each node tells us where to find the next one. The nodes of a linked data structure can also be moved individually to different locations without affecting the logical connections between them, unlike arrays. With due care, a process can add or delete nodes in one part of a data structure even while other processes are working on other parts.

On the other hand, access to any particular node in a linked data structure requires following a chain of references stored in it. If the structure has n nodes, and each node contains at most b links, there will be some nodes that cannot be reached in less than log_b n steps, and for many structures some nodes may require up to n - 1 steps in the worst case. In contrast, many array data structures allow access to any element with a constant number of operations, independent of the number of entries.

Linked data structures are usually implemented as dynamic data structures: memory is allocated as needed and deallocated when no longer required, so space can be reused and memory utilized more efficiently.

General disadvantages

Linked data structures may also incur substantial memory allocation overhead (if nodes are allocated individually) and frustrate memory paging and processor caching algorithms (since they generally have poor locality of reference). In some cases, linked data structures may also use more memory (for the link fields) than competing array structures, because linked data structures are not contiguous: instances of data can be found all over memory, unlike in arrays.

In an array, the nth element can be accessed immediately, while in a linked data structure we have to follow multiple pointers, so element access time varies according to where in the structure the element is.

In some theoretical models of computation that enforce the constraints of linked structures, such as the pointer machine, many problems require more steps than in the unconstrained random access machine model.


References

[1] Donald Knuth, The Art of Computer Programming.
[2] Bernard A. Galler and Michael J. Fischer. An improved equivalence algorithm. Communications of the ACM, Volume 7, Issue 5 (May 1964), pages 301-303. The paper originating disjoint-set forests. ACM Digital Library (http://portal.acm.org/citation.cfm?doid=364099.364331)
[3] http://www.cs.toronto.edu/~hojjat/148s07/lectures/week5/07linked.pdf

Succinct data structure

In computer science, a succinct data structure is a data structure which uses an amount of space that is "close" to the information-theoretic lower bound, but (unlike other compressed representations) still allows for efficient query operations. The concept was originally introduced by Jacobson[1] to encode bit vectors, (unlabeled) trees, and planar graphs. Unlike general lossless data compression algorithms, succinct data structures retain the ability to use them in-place, without decompressing them first. A related notion is that of a compressed data structure, in which the size of the data structure depends upon the particular data being represented.

Suppose that Z is the information-theoretical optimal number of bits needed to store some data. A representation of this data is called:

- implicit if it takes Z + O(1) bits of space,
- succinct if it takes Z + o(Z) bits of space, and
- compact if it takes O(Z) bits of space.

Implicit structures are thus usually reduced to storing information using some permutation of the input data; the most well-known example of this is the heap.

Succinct dictionaries

Succinct indexable dictionaries, also called rank/select dictionaries, form the basis of a number of succinct representation techniques, including binary trees, k-ary trees and multisets,[2] as well as suffix trees and arrays.[3] The basic problem is to store a subset S of a universe U = {0, ..., n - 1}, usually represented as a bit array B[0 ... n - 1] where B[i] = 1 iff i is in S. An indexable dictionary supports the usual methods on dictionaries (queries, and insertions/deletions in the dynamic case) as well as the following operations, for q in {0, 1}:

    rank_q(x) = |{ k in [0 ... x] : B[k] = q }|
    select_q(x) = min{ k : rank_q(k) = x }

There is a simple representation[4] which uses n + o(n) bits of storage space (the original bit array and an o(n) auxiliary structure) and supports rank and select in constant time. It uses an idea similar to that for range-minimum queries; there are a constant number of recursions before stopping at a subproblem of a limited size. The bit array B is partitioned into large blocks of size l = lg^2 n bits and small blocks of size s = (lg n)/2 bits. For each large block, the rank of its first bit is stored in a separate table R_l; each such entry takes lg n bits, for a total of (n/l) lg n = n/lg n bits of storage. Within a large block, another directory R_s stores the rank of each of the l/s = 2 lg n small blocks it contains. The difference here is that it only needs lg l = 2 lg lg n bits for each entry, since only the differences from the rank of the first bit in the containing large block need to be stored; thus, this table takes a total of (n/s) lg l = O(n lg lg n / lg n) bits. A lookup table R_p can then be used that stores the answer to every possible rank query on a bit string of length s; this requires O(2^s s lg s) = O(sqrt(n) lg n lg lg n) bits of storage space. Thus, since each of these auxiliary tables takes o(n) space, this data structure supports rank queries in O(1) time and n + o(n) bits of space. To answer a query rank_1(x) in constant time, a constant-time algorithm computes

    rank_1(x) = R_l[floor(x/l)] + R_s[floor(x/s)] + R_p[small block containing x, x mod s]

where the last term looks up the precomputed rank of position x mod s within the small block containing x.

In practice, the lookup table R_p can be replaced by bitwise operations and smaller tables that find the number of bits set in the small blocks. This is often beneficial, since succinct data structures find their uses in large data sets, in which case cache misses become much more frequent and the chances of the lookup table being evicted from closer CPU caches become higher.[5] Select queries can be easily supported by doing a binary search on the same auxiliary structure used for rank; however, this takes O(lg n) time in the worst case. A more complicated structure using o(n) bits of additional storage can be used to support select in constant time.[6] In practice, many of these solutions have hidden constants in the O(·) notation which dominate before any asymptotic advantage becomes apparent; implementations using broadword operations and word-aligned blocks often perform better in practice.[7]
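The following C sketch conveys the flavor of such a practical implementation (illustrative only, not the structure from [4]; 64-bit words and GCC's __builtin_popcountll are assumptions, and this simplified version stores one cumulative count per word, so its overhead is O(n) rather than o(n)):

#include <stdint.h>

/* Bits are stored in 64-bit words; block_rank[i] caches the number of
   1s strictly before word i, precomputed in one pass over the array. */
struct rank_dict {
    const uint64_t *words;
    const uint32_t *block_rank;
};

/* rank1(x): number of 1s in bit positions [0, x). */
static uint32_t rank1(const struct rank_dict *d, uint32_t x)
{
    uint32_t w = x / 64, r = d->block_rank[w];
    if (x % 64)   /* count the low (x mod 64) bits of the current word */
        r += (uint32_t)__builtin_popcountll(d->words[w] << (64 - x % 64));
    return r;
}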

Entropy-compressed dictionaries

The n + o(n) space approach can be improved by noting that there are C(n, m) distinct m-subsets of {0, ..., n - 1} (or binary strings of length n with exactly m 1s), and thus B(m, n) = ceil(lg C(n, m)) is an information-theoretic lower bound on the number of bits needed to store S. There is a succinct (static) dictionary which attains this bound, namely using B(m, n) + o(B(m, n)) space.[8] This structure can be extended to support rank and select queries and takes B(m, n) + O(n lg lg n / lg n) space.[2] This bound can be reduced to a space/time tradeoff by reducing the storage space of the dictionary, with queries then taking O(t) time.[9]

Examples

When a sequence of variable-length items needs to be encoded, the items can simply be placed one after another, with no delimiters. A separate binary string consisting of 1s in the positions where an item begins, and 0s everywhere else, is encoded along with it. Given this string, the select function can quickly determine where each item begins, given its index.[10]

Another example is the representation of a binary tree: an arbitrary binary tree on n nodes can be represented in 2n + o(n) bits while supporting a variety of operations on any node, including finding its parent, its left and right child, and returning the size of its subtree, each in constant time. The number of different binary trees on n nodes is C(2n, n)/(n + 1). For large n, this is about 4^n; thus we need at least about lg(4^n) = 2n bits to encode it. A succinct binary tree therefore would occupy only 2 bits per node.

References

[1] Jacobson, G. J. (1988). Succinct static data structures.
[2] Raman, R.; V. Raman; S. S. Rao (2002). "Succinct indexable dictionaries with applications to encoding k-ary trees and multisets" (http://www.cs.cmu.edu/afs/cs.cmu.edu/project/aladdin/wwwlocal/hash/RaRaRa02.pdf). Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms. pp. 233-242. ISBN 0-89871-513-X.
[3] Sadakane, K.; R. Grossi (2006). "Squeezing succinct data structures into entropy bounds" (http://www.dmi.unisa.it/people/cerulli/www/WSPages/WSFiles/Abs/S3/S33_abs_Grossi.pdf). Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithms. pp. 1230-1239. ISBN 0-89871-605-5.
[4] Jacobson, G. (1989). Space-efficient static trees and graphs (http://www.cs.cmu.edu/afs/cs/project/aladdin/wwwlocal/compression/00063533.pdf).
[5] González, R.; S. Grabowski; V. Mäkinen; G. Navarro (2005). "Practical implementation of rank and select queries" (http://www.dcc.uchile.cl/~gnavarro/algoritmos/ps/wea05.pdf). Poster Proceedings Volume of 4th Workshop on Efficient and Experimental Algorithms (WEA). pp. 27-38.
[6] Clark, D. (1998). Compact pat trees (https://uwspace.uwaterloo.ca/bitstream/10012/64/1/nq21335.pdf).
[7] Vigna, S. (2008). "Broadword implementation of rank/select queries" (http://sux.dsi.unimi.it/paper.pdf). Experimental Algorithms: 154-168.
[8] Brodnik, A.; J. I. Munro (1999). "Membership in constant time and almost-minimum space" (http://www.cs.cmu.edu/afs/cs.cmu.edu/project/aladdin/wwwlocal/compression/BM99.pdf). SIAM J. Comput. 28 (5): 1627-1640. doi:10.1137/S0097539795294165.


[9] Pătrașcu, M. (2008). "Succincter" (http://people.csail.mit.edu/mip/papers/succinct/succinct.pdf). Foundations of Computer Science, 2008. FOCS'08. IEEE 49th Annual Symposium on. pp. 305-313.
[10] Belazzougui, Djamal. "Hash, displace, and compress" (http://cmph.sourceforge.net/papers/esa09.pdf).

Implicit data structure

In computer science, an implicit data structure is a data structure that uses very little memory besides the actual data elements, i.e. very little information other than the main data is stored. These are storage schemes that retain no pointers and represent a file of n k-key records as a simple n by k array, and thus retrieve records faster. The only structural information given is that which allows the array to grow and shrink with n; no extra information is required. The structure is called "implicit" because most of the structure of the elements is expressed implicitly by their order. Another term used interchangeably is space efficient. Definitions of "very little" are vague and can mean anywhere from O(1) to O(log n) extra space. Everything is accessed in-place, by reading bits at various positions in the data; to achieve optimal coding, bits are used instead of bytes. Implicit data structures are frequently also succinct data structures.

Although one may argue that disk space is no longer a problem and we should not concern ourselves with improving space utilization, the issue that implicit data structures are designed to improve is main memory utilization. Hard disks, or any other means of large data capacity, are orders of magnitude slower than main memory. Hence, the higher the percentage of a task that can fit in buffers in main memory, the less dependence there is on slow I/O devices. Hence, if a larger chunk of an implicit data structure fits in main memory, the operations performed on it can be faster even if the asymptotic running time is not as good as that of its space-oblivious counterpart. Furthermore, since the CPU cache is usually much smaller than main memory, implicit data structures can improve cache efficiency and thus running speed, especially if the method used improves locality.

Implicit data structure for weighted elements

To represent elements with different weights, several data structures are required. The structure uses just one more location besides those required for the values of the elements. The first structure supports worst-case search time in terms of the rank of the weight of an element with respect to the multiset of weights; if the elements are drawn from a uniform distribution, a variation of this structure achieves better average time. The same result obtains for data structures in which the intervals between consecutive values have access probabilities.

Examples

Examples of implicit data structures include (see the index arithmetic sketch below):

- Binary heap
- Beap
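The binary heap illustrates the idea: the tree structure is implicit in the array indices, so no pointers are stored. A minimal C sketch (0-based indexing is an assumption; the text does not fix a convention):

/* In an array-backed binary heap, the structure is encoded purely by
   index arithmetic; no links are stored. */
static inline int parent(int i) { return (i - 1) / 2; }
static inline int left(int i)   { return 2 * i + 1; }
static inline int right(int i)  { return 2 * i + 2; }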

Further reading

See publications of Hervé Brönnimann,[1] J. Ian Munro,[2] and Greg Frederickson.[3]

References

[1] http://photon.poly.edu/~hbr/
[2] http://www.cs.uwaterloo.ca/~imunro/
[3] http://www.cs.purdue.edu/people/gnf


Compressed data structure

The term compressed data structure arises in the computer science subfields of algorithms, data structures, and theoretical computer science. It refers to a data structure whose operations are roughly as fast as those of a conventional data structure for the problem, but whose size can be substantially smaller. The size of the compressed data structure is typically highly dependent upon the entropy of the data being represented.

Important examples of compressed data structures include the compressed suffix array[1][2] and the FM-index,[3] both of which can represent an arbitrary text T for pattern matching. Given any input pattern P, they support the operation of finding if and where P appears in T. The search time is proportional to the sum of the length of pattern P, a very slow-growing function of the length of the text T, and the number of reported matches. The space they occupy is roughly equal to the size of the text T in entropy-compressed form, such as that obtained by Prediction by Partial Matching or gzip. Moreover, both data structures are self-indexing, in that they can reconstruct the text T in a random access manner, and thus the underlying text T can be discarded. In other words, they simultaneously provide a compressed and quickly searchable representation of the text T. They represent a substantial space improvement over the conventional suffix tree and suffix array, which occupy many times more space than the size of T. They also support searching for arbitrary patterns, as opposed to the inverted index, which can support only word-based searches. In addition, inverted indexes do not have the self-indexing feature.

An important related notion is that of a succinct data structure, which uses space roughly equal to the information-theoretic minimum, a worst-case notion of the space needed to represent the data. In contrast, the size of a compressed data structure depends upon the particular data being represented. When the data are compressible, as is often the case in practice for natural language text, the compressed data structure can occupy substantially less space than the information-theoretic minimum.

References

[1] R. Grossi and J. S. Vitter, Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching, Proceedings of the 32nd ACM Symposium on Theory of Computing, May 2000, 397-406. Journal version in SIAM Journal on Computing, 35(2), 2005, 378-407.
[2] R. Grossi, A. Gupta, and J. S. Vitter, High-Order Entropy-Compressed Text Indexes, Proceedings of the 14th Annual SIAM/ACM Symposium on Discrete Algorithms, January 2003, 841-850.
[3] P. Ferragina and G. Manzini, Opportunistic Data Structures with Applications, Proceedings of the 41st IEEE Symposium on Foundations of Computer Science, November 2000, 390-398. Journal version in Indexing Compressed Text, Journal of the ACM, 52(4), 2005, 552-581.


Search data structure

In computer science, a search data structure is any data structure that allows the efficient retrieval of specific items from a set of items, such as a specific record from a database.

The simplest, most general, and least efficient search structure is merely an unordered sequential list of all the items. Locating the desired item in such a list, by the linear search method, inevitably requires a number of operations proportional to the number n of items, in the worst case as well as in the average case. Useful search data structures allow faster retrieval; however, they are limited to queries of some specific kind. Moreover, since the cost of building such structures is at least proportional to n, they only pay off if several queries are to be performed on the same database (or on a database that changes little between queries).

Static search structures are designed for answering many queries on a fixed database; dynamic structures also allow insertion, deletion, or modification of items between successive queries. In the dynamic case, one must also consider the cost of fixing the search structure to account for the changes in the database.

Classification

The simplest kind of query is to locate a record that has a specific field (the key) equal to a specified value v. Other common kinds of query are "find the item with smallest (or largest) key value", "find the item with largest key value not exceeding v", and "find all items with key values between specified bounds vmin and vmax".

In certain databases the key values may be points in some multi-dimensional space. For example, the key may be a geographic position (latitude and longitude) on the Earth. In that case, common kinds of queries are "find the record with a key closest to a given point v", or "find all items whose key lies at a given distance from v", or "find all items within a specified region R of the space".

A common special case of the latter are simultaneous range queries on two or more simple keys, such as "find all employee records with salary between 50,000 and 100,000 and hired between 1995 and 2007".

Single ordered keys

- Array, if the key values span a moderately compact interval
- Priority-sorted list; see linear search
- Key-sorted array; see binary search (a sketch follows below)
- Self-balancing binary search tree
- Hash table

Finding the smallest element

- Heap
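As an illustration of the key-sorted array entry above, a minimal C sketch of the standard iterative binary search:

/* Returns the index of key in a[0..n-1] (sorted ascending), or -1. */
int binary_search(const int *a, int n, int key)
{
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;   /* avoids overflow of lo + hi */
        if (a[mid] == key) return mid;
        if (a[mid] < key)  lo = mid + 1;
        else               hi = mid - 1;
    }
    return -1;
}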

    Asymptotic amortized worst-case analysis

    In this table, the asymptotic notation O(f(n)) means "not exceeding some fixed multiple of f(n) in the worst case."


                            Insert     Delete      Search    Find maximum  Space usage
Unsorted array              O(1)       O(1)        O(n)      O(n)          O(n)
Value-indexed array         O(1)       O(1)        O(1)      O(n)          O(n)
Sorted array                O(n)       O(n)        O(log n)  O(1)          O(n)
Unsorted linked list        O(1)*      O(1)*       O(n)      O(n)          O(n)
Sorted linked list          O(n)*      O(1)*       O(n)      O(1)          O(n)
Self-balancing binary tree  O(log n)   O(log n)    O(log n)  O(log n)      O(n)
Heap                        O(log n)   O(log n)**  O(n)      O(1)          O(n)
Hash table                  O(1)       O(1)        O(1)      O(n)          O(n)

* The cost to add or delete an element at a known location in the list (i.e. if you have an iterator to the location) is O(1). If you don't know the location, you need to traverse the list to the location of deletion/insertion, which takes O(n) time.
** The deletion cost is O(log n) for the minimum or maximum, O(n) for an arbitrary element.

This table is only an approximate summary; for each data structure there are special situations and variants that may lead to different costs. Also, two or more data structures can be combined to obtain lower costs.


Persistent data structure

In computing, a persistent data structure is a data structure that always preserves the previous version of itself when it is modified. Such data structures are effectively immutable, as their operations do not (visibly) update the structure in-place, but instead always yield a new updated structure. (A persistent data structure is not a data structure committed to persistent storage, such as a disk; this is a different and unrelated sense of the word "persistent.")

A data structure is partially persistent if all versions can be accessed but only the newest version can be modified. The data structure is fully persistent if every version can be both accessed and modified. If there is also a meld or merge operation that can create a new version from two previous versions, the data structure is called confluently persistent. Structures that are not persistent are called ephemeral.[1]

These types of data structures are particularly common in logical and functional programming, and in a purely functional program all data is immutable, so all data structures are automatically fully persistent.[1] Persistent data structures can also be created using in-place updating of data, and these may, in general, use less time or storage space than their purely functional counterparts.

While persistence can be achieved by simple copying, this is inefficient in CPU and RAM usage, because most operations make only small changes to a data structure. A better method is to exploit the similarity between the new and old versions to share structure between them, such as using the same subtree in a number of tree structures. However, because it rapidly becomes infeasible to determine how many previous versions share which parts of the structure, and because it is often desirable to discard old versions, this necessitates an environment with garbage collection.


Partially persistent

In the partial persistence model, we may query any previous version of the data structure, but we may only update the latest version. This implies a linear ordering among the versions. Three methods for achieving this on a balanced binary search tree follow.

Fat node

The fat node method records all changes made to node fields in the nodes themselves, without erasing old values of the fields. This requires that we allow nodes to become arbitrarily fat. In other words, each fat node contains the same information and pointer fields as an ephemeral node, along with space for an arbitrary number of extra field values. Each extra field value has an associated field name and a version stamp which indicates the version in which the named field was changed to have the specified value. Besides that, each fat node has its own version stamp, indicating the version in which the node was created. The only purpose of nodes having version stamps is to make sure that each node only contains one value per field name per version. In order to navigate through the structure, each original field value in a node has a version stamp of zero.

Complexity of the fat node method

Using the fat node method requires O(1) space for every modification: just store the new data. Each modification takes O(1) additional time to store the modification at the end of the modification history. This is an amortized time bound, assuming we store the modification history in a growable array. For access time, we must find the right version at each node as we traverse the structure. If we made m modifications, then each access operation has O(log m) slowdown resulting from the cost of finding the nearest modification in the array.

Path copying

Path copying makes a copy of all nodes on the path to the node we are about to insert or delete. The change must then be cascaded back through the data structure: all nodes that pointed to the old node must be modified to point to the new node instead. These modifications cause more cascading changes, and so on, until the root is reached. An array of roots indexed by timestamp is maintained; the data structure pointed to by time t's root is exactly time t's data structure.

Complexity of path copying

With m modifications, this costs O(log m) additive lookup time. Modification time and space are bounded by the size of the structure, since a single modification may cause the entire structure to be copied. That is O(m) for one update, and thus O(n^2) preprocessing time.

A combination

Sleator, Tarjan et al. came up with a way to combine the advantages of fat nodes and path copying, getting O(1) access slowdown and O(1) modification space and time.

In each node, we store one modification box. This box can hold one modification to the node (either a modification to one of the pointers, or to the node's key, or to some other piece of node-specific data) and a timestamp for when that modification was applied. Initially, every node's modification box is empty.

Whenever we access a node, we check the modification box, and compare its timestamp against the access time. (The access time specifies the version of the data structure that we care about.) If the modification box is empty, or the access time is before the modification time, then we ignore the modification box and just deal with the normal part of the node. On the other hand, if the access time is after the modification time, then we use the value in the modification box, overriding that value in the node. (Say the modification box has a new left pointer. Then we'll use it instead of the normal left pointer, but we'll still use the normal right pointer.)

Modifying a node works like this. (We assume that each modification touches one pointer or similar field.) If the node's modification box is empty, then we fill it with the modification. Otherwise, the modification box is full. We make a copy of the node, but using only the latest values. (That is, we overwrite one of the node's fields with the value that was stored in the modification box.) Then we perform the modification directly on the new node, without using the modification box. (We overwrite one of the new node's fields, and its modification box stays empty.) Finally, we cascade this change to the node's parent, just like path copying. (This may involve filling the parent's modification box, or making a copy of the parent recursively. If the node has no parent, i.e. it is the root, we add the new root to a sorted array of roots.)

With this algorithm, given any time t, at most one modification box exists in the data structure with time t. Thus, a modification at time t splits the tree into three parts: one part contains the data from before time t, one part contains the data from after time t, and one part was unaffected by the modification.

    Complexity of the combination

Time and space for modifications require amortized analysis. A modification takes O(1) amortized space and O(1) amortized time. To see why, use a potential function Φ, where Φ(T) is the number of full live nodes in T. The live nodes of T are just the nodes that are reachable from the current root at the current time (that is, after the last modification). The full live nodes are the live nodes whose modification boxes are full.

Each modification involves some number of copies, say k, followed by 1 change to a modification box. (Well, not quite: you could add a new root, but that does not change the argument.) Consider each of the k copies. Each costs O(1) space and time, but decreases the potential function by one. (First, the node we copy must be full and live, so it contributes to the potential function. The potential function will only drop, however, if the old node isn't reachable in the new tree. But we know it isn't reachable in the new tree: the next step in the algorithm will be to modify the node's parent to point at the copy. Finally, we know the copy's modification box is empty. Thus, we've replaced a full live node with an empty live node, and Φ goes down by one.) The final step fills a modification box, which costs O(1) time and increases Φ by one.

Putting it all together, the change in Φ is ΔΦ = 1 - k. Thus, we've paid O(k + ΔΦ) = O(1) space and O(k + ΔΦ + 1) = O(1) time.

Fully persistent

In the fully persistent model, both updates and queries are allowed on any version of the data structure.

Confluently persistent

In the confluently persistent model, we use combinators to combine inputs from more than one previous version into a single new output version. Rather than a branching tree, combinations of versions induce a DAG (directed acyclic graph) structure on the version graph.

Examples of persistent data structures

Perhaps the simplest persistent data structure is the singly linked list or cons-based list, a simple list of objects formed by each carrying a reference to the next in the list. This is persistent because we can take a tail of the list, meaning the last k items for some k, and add new nodes on to the front of it. The tail will not be duplicated, instead becoming shared between both the old list and the new list. So long as the contents of the tail are immutable, this sharing will be invisible to the program.

Many common reference-based data structures, such as red-black trees[2] and stacks,[3] can easily be adapted to create a persistent version. Some others need slightly more effort, for example: queues, double-ended queues (deques), min-deques (which have an additional operation min returning the minimal element in constant time without incurring additional complexity on the standard operations of queuing and dequeuing on both ends), random access lists (with constant cons/head as a singly linked list, but with an additional operation of random access with sub-linear, most often logarithmic, complexity), random access queues, random access double-ended queues, and random access stacks (as well as random access min-lists, min-queues, min-deques, and min-stacks).

There also exist persistent data structures which use destructive operations, making them impossible to implement efficiently in purely functional languages (like Haskell), but possible in languages like C or Java. These types of data structures can often be avoided with proper design. One primary advantage to using purely persistent data structures is that they often behave better in multi-threaded environments.

Linked lists

This example is taken from Okasaki. See the bibliography.

Singly linked lists are the bread-and-butter data structure in functional languages. In ML-derived languages and Haskell, they are purely functional because once a node in the list has been allocated, it cannot be modified, only copied or destroyed. Note that ML itself is not purely functional.

Consider the two lists:

    xs = [0, 1, 2]

    ys = [3, 4, 5]

These would be represented in memory by:

[Figure: the lists xs and ys, where a circle indicates a node in the list (the arrow out representing the second element of the node, which is a pointer to another node)]

Now concatenating the two lists:

    zs = xs ++ ys

results in the following memory structure:

[Figure: the result zs: the nodes of xs have been copied, and the copy's last node points into the existing, shared list ys]


Notice that the nodes in list xs have been copied, but the nodes in ys are shared. As a result, the original lists (xs and ys) persist and have not been modified.

The reason for the copy is that the last node in xs (the node containing the original value 2) cannot be modified to point to the start of ys, because that would change the value of xs.
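The same sharing can be sketched in C (illustrative only; names are hypothetical and malloc error handling is omitted). Prepending allocates one node and points it at the existing, immutable tail:

#include <stdlib.h>

struct plist {
    int value;
    const struct plist *next;   /* tails are shared, never mutated */
};

/* cons returns a new list whose tail is the existing (immutable) list. */
const struct plist *cons(int v, const struct plist *tail)
{
    struct plist *n = malloc(sizeof *n);   /* error handling omitted */
    n->value = v;
    n->next = tail;
    return n;
}

Both the old list and the new list remain valid after a cons, because neither is ever modified; this is exactly the sharing shown above.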

Trees

This example is taken from Okasaki. See the bibliography.

Consider a binary tree used for fast searching, where every node has the recursive invariant that subnodes on the left are less than the node, and subnodes on the right are greater than the node.

For instance, the set of data

    xs = [a, b, c, d, f, g, h]

might be represented by the following binary search tree:

[Figure: a binary search tree containing the elements a, b, c, d, f, g, h]


    A function which inserts data into the binary tree and maintains the invariant is:

fun insert (x, E) = T (E, x, E)
  | insert (x, s as T (a, y, b)) =
      if x < y then T (insert (x, a), y, b)
      else if x > y then T (a, y, insert (x, b))
      else s

    After executing

    ys = insert ("e", xs)

we end up with the following:

[Figure: the new tree ys, which shares with xs all nodes not on the path to the newly inserted element e]


Notice two points: firstly, the original tree (xs) persists. Secondly, many common nodes are shared between the old tree and the new tree. Such persistence and sharing is difficult to manage without some form of garbage collection (GC) to automatically free up nodes which have no live references, and this is why GC is a feature commonly found in functional programming languages.

Reference cycles

Since every value in a purely functional computation is built up out of existing values, it would seem that it is impossible to create a cycle of references. In that case, the reference graph (the graph of the references from object to object) could only be a directed acyclic graph. However, in most functional languages, functions can be defined recursively; this capability allows recursive structures using functional suspensions. In lazy languages, such as Haskell, all data structures are represented as implicitly suspended thunks; in these languages any data structure can be recursive because a value can be defined in terms of itself. Some other languages, such as OCaml, allow the explicit definition of recursive values.

References

[1] Kaplan, Haim (2001). "Persistent data structures" (http://www.math.tau.ac.il/~haimk/papers/persistent-survey.ps). Handbook on Data Structures and Applications (CRC Press).
[2] Neil Sarnak, Robert E. Tarjan (1986). "Planar Point Location Using Persistent Search Trees" (http://www.link.cs.cmu.edu/15859-f07/papers/point-location.pdf). Communications of the ACM 29 (7): 669-679. doi:10.1145/6138.6151.
[3] Chris Okasaki. Purely Functional Data Structures (thesis) (http://www.cs.cmu.edu/~rwh/theses/okasaki.pdf).

Further reading

Persistent Data Structures and Managed References (http://www.infoq.com/presentations/Value-Identity-State-Rich-Hickey) - video presentation by Rich Hickey on Clojure's use of persistent data structures and how they support concurrency
Making Data Structures Persistent (http://www.cs.cmu.edu/~sleator/papers/Persistence.htm) by James R. Driscoll, Neil Sarnak, Daniel D. Sleator, Robert E. Tarjan
Fully persistent arrays for efficient incremental updates and voluminous reads (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.34.1317)
Real-Time Deques, Multihead Turing Machines, and Purely Functional Programming (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.51.2895)
Purely Functional Data Structures by Chris Okasaki, Cambridge University Press, 1998, ISBN 0-521-66350-4.
Purely Functional Data Structures (http://www.cs.cmu.edu/~rwh/theses/okasaki.pdf) thesis by Chris Okasaki (PDF format)
Fully Persistent Lists with Catenation (http://www.cs.cmu.edu/~sleator/papers/fully-persistent-lists.pdf) by James R. Driscoll, Daniel D. Sleator, Robert E. Tarjan (PDF)
Persistent Data Structures (http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-854j-advanced-algorithms-fall-2005/lecture-notes/persistent.pdf) from the MIT open course Advanced Algorithms (http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-854j-advanced-algorithms-fall-2005)

External links

Lightweight Java implementation of Persistent Red-Black Trees (http://wiki.edinburghhacklab.com/PersistentRedBlackTreeSet)

Concurrent data structure

In computer science, a concurrent data structure is a particular way of storing and organizing data for access by multiple computing threads (or processes) on a computer.

Historically, such data structures were used on uniprocessor machines with operating systems that supported multiple computing threads (or processes). The term concurrency captured the multiplexing/interleaving of the threads' operations on the data by the operating system, even though the processors never issued two operations that accessed the data simultaneously.

Today, as multiprocessor computer architectures that provide parallelism become the dominant computing platform (through the proliferation of multi-core processors), the term has come to stand mainly for data structures that can be accessed by multiple threads which may actually access the data simultaneously because they run on different processors that communicate with one another. The concurrent data structure (sometimes also called a shared data structure) is usually considered to reside in an abstract storage environment called shared memory, though this memory may be physically implemented as either a "tightly coupled" or a distributed collection of storage modules.

Basic principles

Concurrent data structures, intended for use in parallel or distributed computing environments, differ from "sequential" data structures, intended for use on a uniprocessor machine, in several ways.[1] Most notably, in a sequential environment one specifies the data structure's properties and checks that they are implemented correctly, by providing safety properties. In a concurrent environment, the specification must also describe liveness properties which an implementation must provide. Safety properties usually state that something bad never happens, while liveness properties state that something good keeps happening. These properties can be expressed, for example, using linear temporal logic.

The type of liveness requirements tends to define the data structure. Method calls can be blocking or non-blocking. Data structures are not restricted to one type or the other, and can allow combinations where some method calls are blocking and others are non-blocking (examples can be found in the Java concurrency software library).

The safety properties of concurrent data structures must capture their behavior given the many possible interleavings of methods called by different threads. It is quite intuitive to specify how abstract data structures behave in a sequential setting in which there are no interleavings. Therefore, many mainstream approaches for arguing the safety properties of a concurrent data structure (such as serializability, linearizability, sequential consistency, and quiescent consistency[1]) specify the structures' properties sequentially, and map their concurrent executions to a collection of sequential ones.

In order to guarantee the safety and liveness properties, concurrent data structures must typically (though not always) allow threads to reach consensus as to the results of their simultaneous data access and modification requests. To support such agreement, concurrent data structures are implemented using special primitive synchronization operations (see synchronization primitives) available on modern multiprocessor machines that allow multiple threads to reach consensus. This consensus can be achieved in a blocking manner by using locks, or without locks, in which case it is non-blocking. There is a wide body of theory on the design of concurrent data structures (see bibliographical references).
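The blocking/non-blocking distinction can be made concrete with a small C11 sketch (illustrative only, not tied to any structure above): a non-blocking increment retries a compare-and-swap instead of holding a lock.

#include <stdatomic.h>

/* Non-blocking increment: retry the compare-and-swap until no other
   thread has changed the value between our read and our write. */
void increment(atomic_int *counter)
{
    int old = atomic_load(counter);
    while (!atomic_compare_exchange_weak(counter, &old, old + 1))
        ;   /* on failure, 'old' is refreshed with the current value */
}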

Design and implementation

Concurrent data structures are significantly more difficult to design and to verify as being correct than their sequential counterparts.

The primary source of this additional difficulty is concurrency, exacerbated by the fact that threads must be thought of as being completely asynchronous: they are subject to operating system preemption, page faults, interrupts, and so on.

On today's machines, the layout of processors and memory, the layout of data in memory, and the communication load on the various elements of the multiprocessor architecture all influence performance. Furthermore, there is a tension between correctness and performance: algorithmic enhancements that seek to improve performance often make it more difficult to design and verify a correct data structure implementation.

A key measure for performance is scalability, captured by the speedup of the implementation. Speedup is a measure of how effectively the application is utilizing the machine it is running on. On a machine with P processors, the speedup is the ratio of the structure's execution time on a single processor to its execution time on P processors. Ideally, we want linear speedup: we would like to achieve a speedup of P when using P processors. Data structures whose speedup grows with P are called scalable. The extent to which one can scale the performance of a concurrent data structure is captured by a formula known as Amdahl's law and more refined versions of it such as Gustafson's law.

A key issue with the performance of concurrent data structures is the level of memory contention: the overhead in traffic to and from memory as a result of multiple threads concurrently attempting to access the same locations in memory. This issue is most acute with blocking implementations in which locks control access to memory. In order to acquire a lock, a thread must repeatedly attempt to modify that location. On a cache-coherent multiprocessor (one in which processors have local caches that are updated by hardware in order to keep them consistent with the latest values stored) this results in long waiting times for each attempt to modify the location, and is exacerbated by the additional memory traffic associated with unsuccessful attempts to acquire the lock.


References

[1] Mark Moir and Nir Shavit (2007). "Concurrent Data Structures" (http://www.cs.tau.ac.il/~shanir/concurrent-data-structures.pdf). In Dinesh Metha and Sartaj Sahni, Handbook of Data Structures and Applications (1st ed.). Chapman and Hall/CRC Press. pp. 47-14 to 47-30.

Further reading

Nancy Lynch, "Distributed Computing"
Hagit Attiya and Jennifer Welch, "Distributed Computing: Fundamentals, Simulations and Advanced Topics, 2nd Ed"
Doug Lea, "Concurrent Programming in Java: Design Principles and Patterns"
Maurice Herlihy and Nir Shavit, "The Art of Multiprocessor Programming"
Mattson, Sanders, and Massingill, "Patterns for Parallel Programming"

External links

Multithreaded data structures for parallel computing, Part 1 (http://www.ibm.com/developerworks/aix/library/au-multithreaded_structures1/index.html) (Designing concurrent data structures) by Arpan Sen
Multithreaded data structures for parallel computing, Part 2 (http://www.ibm.com/developerworks/aix/library/au-multithreaded_structures2/index.html) (Designing concurrent data structures without mutexes) by Arpan Sen
libcds (http://libcds.sourceforge.net/) - C++ library of lock-free containers and safe memory reclamation schema


    Abstract data types

Abstract data type

In computer science, an abstract data type (ADT) is a mathematical model for a certain class of data structures that have similar behavior; or for certain data types of one or more programming languages that have similar semantics. An abstract data type is defined indirectly, only by the operations that may be performed on it and by mathematical constraints on the effects (and possibly cost) of those operations.[1]

For example, an abstract stack could be defined by three operations: push, that inserts some data item onto the structure, pop, that extracts an item from it (with the constraint that each pop always returns the most recently pushed item that has not been popped yet), and peek, that allows data on top of the structure to be examined without removal. When analyzing the efficiency of algorithms that use stacks, one may also specify that all operations take the same time no matter how many items have been pushed into the stack, and that the stack uses a constant amount of storage for each element.

Abstract data types are purely theoretical entities, used (among other things) to simplify the description of abstract algorithms, to classify and evaluate data structures, and to formally describe the type systems of programming languages. However, an ADT may be implemented by specific data types or data structures, in many ways and in many programming languages; or described in a formal specification language. ADTs are often implemented as modules: the module's interface declares procedures that correspond to the ADT operations, sometimes with comments that describe the constraints. This information hiding strategy allows the implementation of the module to be changed without disturbing the client programs.

The term abstract data type can also be regarded as a generalised approach of a number of algebraic structures, such as lattices, groups, and rings.[2] This can be treated as part of the subject area of artificial intelligence. The notion of abstract data types is related to the concept of data abstraction, important in object-oriented programming and design by contract methodologies for software development.
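As an illustration of such a module interface, a minimal C sketch (hypothetical names; one of many possible realizations) of the stack ADT above, hiding the representation behind an opaque type:

/* stack.h - the ADT is defined by its operations, not its layout. */
struct stack;                       /* opaque: clients never see fields */

struct stack *stack_new(void);
void stack_push(struct stack *s, int item);
int  stack_pop(struct stack *s);    /* returns most recently pushed item */
int  stack_peek(const struct stack *s);

Any implementation behind this interface (array-backed, linked, or otherwise) that satisfies the push/pop constraint and the stated cost bounds is an equally valid realization of the ADT.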

Defining an abstract data type (ADT)
An abstract data type is defined as a mathematical model of the data objects that make up a data type, as well as the functions that operate on these objects. There are no standard conventions for defining them. A broad division may be drawn between "imperative" and "functional" definition styles.

Imperative abstract data type definitions
In the "imperative" view, which is closer to the philosophy of imperative programming languages, an abstract data structure is conceived as an entity that is mutable, meaning that it may be in different states at different times. Some operations may change the state of the ADT; therefore, the order in which operations are evaluated is important, and the same operation on the same entities may have different effects if executed at different times, just like the instructions of a computer or the commands and procedures of an imperative language. To underscore this view, it is customary to say that the operations are executed or applied, rather than evaluated. The imperative style is often used when describing abstract algorithms; see, for example, Donald E. Knuth's The Art of Computer Programming.


    Abstract variable

Imperative ADT definitions often depend on the concept of an abstract variable, which may be regarded as the simplest non-trivial ADT. An abstract variable V is a mutable entity that admits two operations:
• store(V, x), where x is a value of unspecified nature; and
• fetch(V), that yields a value;
with the constraint that
• fetch(V) always returns the value x used in the most recent store(V, x) operation on the same variable V.
As in many programming languages, the operation store(V, x) is often written V ← x (or some similar notation), and fetch(V) is implied whenever a variable V is used in a context where a value is required. Thus, for example, V ← V + 1 is commonly understood to be a shorthand for store(V, fetch(V) + 1).
In this definition, it is implicitly assumed that storing a value into a variable U has no effect on the state of a distinct variable V. To make this assumption explicit, one could add the constraint that
• if U and V are distinct variables, the sequence { store(U, x); store(V, y) } is equivalent to { store(V, y); store(U, x) }.
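As a minimal sketch in C (the type abs_var and the restriction of the range to int are assumptions made for illustration), the two operations and the V ← V + 1 shorthand look like this:

#include <stdio.h>

typedef struct { int value; } abs_var;   /* hypothetical concrete representation */

void store(abs_var *v, int x) { v->value = x; }     /* store(V, x) */
int  fetch(const abs_var *v)  { return v->value; }  /* fetch(V)    */

int main(void) {
    abs_var v;
    store(&v, 41);                /* V <- 41 */
    store(&v, fetch(&v) + 1);     /* V <- V + 1, i.e. store(V, fetch(V) + 1) */
    printf("%d\n", fetch(&v));    /* prints 42: the most recently stored value */
    return 0;
}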

More generally, ADT definitions often assume that any operation that changes the state of one ADT instance has no effect on the state of any other instance (including other instances of the same ADT), unless the ADT axioms imply that the two instances are connected (aliased) in that sense. For example, when extending the definition of abstract variable to include abstract records, the operation that selects a field from a record variable R must yield a variable V that is aliased to that part of R.
The definition of an abstract variable V may also restrict the stored values x to members of a specific set X, called the range or type of V. As in programming languages, such restrictions may simplify the description and analysis of algorithms, and improve their readability.
Note that this definition does not imply anything about the result of evaluating fetch(V) when V is un-initialized, that is, before performing any store operation on V. An algorithm that does so is usually considered invalid, because its effect is not defined. (However, there are some important algorithms whose efficiency strongly depends on the assumption that such a fetch is legal, and returns some arbitrary value in the variable's range.)

    Instance creation

Some algorithms need to create new instances of some ADT (such as new variables, or new stacks). To describe such algorithms, one usually includes in the ADT definition a create() operation that yields an instance of the ADT, usually with axioms equivalent to
• the result of create() is distinct from any instance S in use by the algorithm.
This axiom may be strengthened to exclude also partial aliasing with other instances. On the other hand, this axiom still allows implementations of create() to yield a previously created instance that has become inaccessible to the program.


    Preconditions, postconditions, and invariants

In imperative-style definitions, the axioms are often expressed by preconditions, that specify when an operation may be executed; postconditions, that relate the states of the ADT before and after the execution of each operation; and invariants, that specify properties of the ADT that are not changed by the operations.
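As an illustration (the int_stack type and the name checked_pop are hypothetical), such axioms can be made executable as assertions in C:

#include <assert.h>

typedef struct { int items[100]; int count; } int_stack;  /* hypothetical bounded stack */

/* pop with its axioms checked at run time:
   precondition:  the stack is not empty;
   postcondition: the item count decreases by exactly one;
   invariant:     0 <= count <= 100 holds before and after. */
int checked_pop(int_stack *s) {
    assert(s->count > 0);                 /* precondition  */
    int old_count = s->count;
    int top = s->items[--s->count];
    assert(s->count == old_count - 1);    /* postcondition */
    return top;
}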

    Example: abstract stack (imperative)

As another example, an imperative definition of an abstract stack could specify that the state of a stack S can be modified only by the operations
• push(S, x), where x is some value of unspecified nature; and
• pop(S), that yields a value as a result;
with the constraint that
• for any value x and any abstract variable V, the sequence of operations { push(S, x); V ← pop(S) } is equivalent to { V ← x }.
Since the assignment { V ← x }, by definition, cannot change the state of S, this condition implies that { V ← pop(S) } restores S to the state it had before the { push(S, x) }. From this condition and from the properties of abstract variables, it follows, for example, that the sequence

{ push(S, x); push(S, y); U ← pop(S); push(S, z); V ← pop(S); W ← pop(S); }

where x, y, and z are any values, and U, V, W are pairwise distinct variables, is equivalent to

{ U ← y; V ← z; W ← x }

Here it is implicitly assumed that operations on a stack instance do not modify the state of any other ADT instance, including other stacks; that is,
• for any values x, y and any distinct stacks S and T, the sequence { push(S, x); push(T, y) } is equivalent to { push(T, y); push(S, x) }.

A stack ADT definition usually includes also a Boolean-valued function empty(S) and a create() operation that returns a stack instance, with axioms equivalent to
• create() ≠ S for any stack S (a newly created stack is distinct from all previous stacks);
• empty(create()) (a newly created stack is empty);
• not empty(push(S, x)) (pushing something into a stack makes it non-empty).

    Single-instance style

Sometimes an ADT is defined as if only one instance of it existed during the execution of the algorithm, and all operations were applied to that instance, which is not explicitly notated. For example, the abstract stack above could have been defined with operations push(x) and pop(), that operate on "the" only existing stack. ADT definitions in this style can be easily rewritten to admit multiple coexisting instances of the ADT, by adding an explicit instance parameter (like S in the previous example) to every operation that uses or modifies the implicit instance.
On the other hand, some ADTs cannot be meaningfully defined without assuming multiple instances. This is the case when a single operation takes two distinct instances of the ADT as parameters. For an example, consider augmenting the definition of the stack ADT with an operation compare(S, T) that checks whether the stacks S and T contain the same items in the same order.


Functional ADT definitions
Another way to define an ADT, closer to the spirit of functional programming, is to consider each state of the structure as a separate entity. In this view, any operation that modifies the ADT is modeled as a mathematical function that takes the old state as an argument, and returns the new state as part of the result. Unlike the "imperative" operations, these functions have no side effects. Therefore, the order in which they are evaluated is immaterial, and the same operation applied to the same arguments (including the same input states) will always return the same results (and output states).
In the functional view, in particular, there is no way (or need) to define an "abstract variable" with the semantics of imperative variables (namely, with fetch and store operations). Instead of storing values into variables, one passes them as arguments to functions.

    Example: abstract stack (functional)

For example, a complete functional-style definition of a stack ADT could use the three operations:
• push: takes a stack state and an arbitrary value, returns a stack state;
• top: takes a stack state, returns a value;
• pop: takes a stack state, returns a stack state;
with the axioms
• top(push(s, x)) = x (pushing an item onto a stack leaves it at the top), and
• pop(push(s, x)) = s (pop undoes the effect of push).
In a functional-style definition there is no need for a create operation. Indeed, there is no notion of "stack instance". The stack states can be thought of as being potential states of a single stack structure, and two stack states that contain the same values in the same order are considered to be identical states. This view actually mirrors the behavior of some concrete implementations, such as linked lists with hash cons.
Instead of create(), a functional definition of a stack ADT may assume the existence of a special stack state, the empty stack, designated by a special symbol like Λ or "()"; or define a bottom() operation that takes no arguments and returns this special stack state. Note that the axioms imply that
• push(Λ, x) ≠ Λ.
In a functional definition of a stack one does not need an empty predicate: instead, one can test whether a stack is empty by testing whether it is equal to Λ.
Note that these axioms do not define the effect of top(s) or pop(s), unless s is a stack state returned by a push. Since push leaves the stack non-empty, those two operations are undefined (hence invalid) when s = Λ. On the other hand, the axioms (and the lack of side effects) imply that push(s, x) = push(t, y) if and only if x = y and s = t.
As in some other branches of mathematics, it is customary to assume also that the stack states are only those whose existence can be proved from the axioms in a finite number of steps. In the stack ADT example above, this rule means that every stack is a finite sequence of values, that becomes the empty stack (Λ) after a finite number of pops. By themselves, the axioms above do not exclude the existence of infinite stacks (that can be popped forever, each time yielding a different state) or circular stacks (that return to the same state after a finite number of pops). In particular, they do not exclude states s such that pop(s) = s or push(s, x) = s for some x. However, since one cannot obtain such stack states with the given operations, they are assumed "not to exist".

Advantages of abstract data typing
• Encapsulation
Abstraction provides a promise that any implementation of the ADT has certain properties and abilities; knowing these is all that is required to make use of an ADT object. The user does not need any technical knowledge of how the implementation works to use the ADT. In this way, the implementation may be complex but will be encapsulated in a simple interface when it is actually used.
• Localization of change
Code that uses an ADT object will not need to be edited if the implementation of the ADT is changed. Since any changes to the implementation must still comply with the interface, and since code using an ADT may only refer to properties and abilities specified in the interface, changes may be made to the implementation without requiring any changes in code where the ADT is used.
• Flexibility
Different implementations of an ADT, having all the same properties and abilities, are equivalent and may be used somewhat interchangeably in code that uses the ADT. This gives a great deal of flexibility when using ADT objects in different situations. For example, different implementations of an ADT may be more efficient in different situations; it is possible to use each in the situation where they are preferable, thus increasing overall efficiency.

Typical operations
Some operations that are often specified for ADTs (possibly under other names) are
• compare(s, t), that tests whether two structures are equivalent in some sense;
• hash(s), that computes some standard hash function from the instance's state;
• print(s) or show(s), that produces a human-readable representation of the structure's state.
In imperative-style ADT definitions, one often finds also
• create(), that yields a new instance of the ADT;
• initialize(s), that prepares a newly created instance s for further operations, or resets it to some "initial state";
• copy(s, t), that puts instance s in a state equivalent to that of t;
• clone(t), that performs s ← new(), copy(s, t), and returns s;
• free(s) or destroy(s), that reclaims the memory and other resources used by s.
The free operation is not normally relevant or meaningful, since ADTs are theoretical entities that do not "use memory". However, it may be necessary when one needs to analyze the storage used by an algorithm that uses the ADT. In that case one needs additional axioms that specify how much memory each ADT instance uses, as a function of its state, and how much of it is returned to the pool by free.

Examples
Some common ADTs, which have proved useful in a great variety of applications, are
• Container
• Deque
• List
• Map
• Multimap
• Multiset
• Priority queue
• Queue
• Set
• Stack
• String
• Tree
Each of these ADTs may be defined in many ways and variants, not necessarily equivalent. For example, a stack ADT may or may not have a count operation that tells how many items have been pushed and not yet popped. This choice makes a difference not only for its clients but also for the implementation.


Implementation
Implementing an ADT means providing one procedure or function for each abstract operation. The ADT instances are represented by some concrete data structure that is manipulated by those procedures, according to the ADT's specifications.
Usually there are many ways to implement the same ADT, using several different concrete data structures. Thus, for example, an abstract stack can be implemented by a linked list or by an array.
An ADT implementation is often packaged as one or more modules, whose interface contains only the signature (number and types of the parameters and results) of the operations. The implementation of the module (namely, the bodies of the procedures and the concrete data structure used) can then be hidden from most clients of the module. This makes it possible to change the implementation without affecting the clients.
When implementing an ADT, each instance (in imperative-style definitions) or each state (in functional-style definitions) is usually represented by a handle of some sort.[3]

Modern object-oriented languages, such as C++ and Java, support a form of abstract data types. When a class is used as a type, it is an abstract type that refers to a hidden representation. In this model an ADT is typically implemented as a class, and each instance of the ADT is an object of that class. The module's interface typically declares the constructors as ordinary procedures, and most of the other ADT operations as methods of that class. However, such an approach does not easily encapsulate multiple representational variants found in an ADT. It also can undermine the extensibility of object-oriented programs. In a pure object-oriented program that uses interfaces as types, types refer to behaviors, not representations.

Example: implementation of the stack ADT
As an example, here is an implementation of the stack ADT above in the C programming language.

Imperative-style interface

    An imperative-style interface might be:

typedef struct stack_Rep stack_Rep;           /* Type: instance representation (an opaque record). */
typedef stack_Rep *stack_T;                   /* Type: handle to a stack instance (an opaque pointer). */
typedef void *stack_Item;                     /* Type: value that can be stored in stack (arbitrary address). */

stack_T stack_create(void);                   /* Create new stack instance, initially empty. */
void stack_push(stack_T s, stack_Item e);     /* Add an item at the top of the stack. */
stack_Item stack_pop(stack_T s);              /* Remove the top item from the stack and return it. */
int stack_empty(stack_T s);                   /* Check whether stack is empty. */

This interface could be used in the following manner:

#include "stack.h"          /* Include the stack interface. */

stack_T t = stack_create(); /* Create a stack instance. */
int foo = 17;               /* An arbitrary datum. */
stack_push(t, &foo);        /* Push the address of 'foo' onto the stack. */
void *e = stack_pop(t);     /* Get the top item and delete it from the stack. */
if (stack_empty(t)) { ... } /* Do something if stack is empty. */

This interface can be implemented in many ways. The implementation may be arbitrarily inefficient, since the formal definition of the ADT, above, does not specify how much space the stack may use, nor how long each operation should take. It also does not specify whether the stack state t continues to exist after a call s ← pop(t).
In practice the formal definition should specify that the space is proportional to the number of items pushed and not yet popped, and that every one of the operations above must finish in a constant amount of time, independently of that number. To comply with these additional specifications, the implementation could use a linked list, or an array (with dynamic resizing) together with two integers (an item count and the array size).
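One way to meet those requirements is sketched below, using a dynamically resized array; the layout of stack_Rep is just one plausible choice, and error handling (allocation failure, popping an empty stack) is omitted for brevity:

#include <stdlib.h>

typedef struct stack_Rep stack_Rep;   /* as declared in the interface above */
typedef stack_Rep *stack_T;
typedef void *stack_Item;

struct stack_Rep {                    /* hidden representation: a growable array */
    stack_Item *items;
    int count;                        /* number of items currently on the stack */
    int size;                         /* current capacity of the array */
};

stack_T stack_create(void) {
    stack_T s = malloc(sizeof *s);
    s->count = 0;
    s->size = 8;
    s->items = malloc(s->size * sizeof *s->items);
    return s;
}

void stack_push(stack_T s, stack_Item e) {
    if (s->count == s->size) {        /* grow geometrically, so that push */
        s->size *= 2;                 /* runs in amortized constant time  */
        s->items = realloc(s->items, s->size * sizeof *s->items);
    }
    s->items[s->count++] = e;
}

stack_Item stack_pop(stack_T s) {
    return s->items[--s->count];      /* undefined if the stack is empty */
}

int stack_empty(stack_T s) {
    return s->count == 0;
}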

    Functional-style interface

Functional-style ADT definitions are more appropriate for functional programming languages, and vice versa. However, one can provide a functional-style interface even in an imperative language like C. For example:

typedef struct stack_Rep stack_Rep;           /* Type: stack state representation (an opaque record). */
typedef stack_Rep *stack_T;                   /* Type: handle to a stack state (an opaque pointer). */
typedef void *stack_Item;                     /* Type: item (arbitrary address). */

stack_T stack_empty(void);                    /* Returns the empty stack state. */
stack_T stack_push(stack_T s, stack_Item x);  /* Adds x at the top of s, returns the resulting state. */
stack_Item stack_top(stack_T s);              /* Returns the item currently at the top of s. */
stack_T stack_pop(stack_T s);                 /* Removes the top item from s, returns the resulting state. */

The main problem is that C lacks garbage collection, and this makes this style of programming impractical; moreover, memory allocation routines in C are slower than allocation in a typical garbage collector, thus the performance impact of so many allocations is even greater.
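To make this concrete, here is a minimal sketch of how the functional interface above could be implemented with immutable linked cells; the representation is an assumption, not part of the interface. Old states stay valid because cells are never modified, and, in the absence of a garbage collector, no cell can safely be freed while any state sharing it may still be in use:

#include <stdlib.h>

typedef struct stack_Rep stack_Rep;   /* as declared in the interface above */
typedef stack_Rep *stack_T;
typedef void *stack_Item;

struct stack_Rep {                    /* one immutable cell per push */
    stack_Item head;
    stack_T tail;                     /* previous state: shared, never mutated */
};

stack_T stack_empty(void) {
    return NULL;                      /* the empty state is the null handle */
}

stack_T stack_push(stack_T s, stack_Item x) {
    stack_T t = malloc(sizeof *t);    /* never freed here, illustrating the
                                         reliance on garbage collection */
    t->head = x;
    t->tail = s;                      /* s remains a valid state after this call */
    return t;
}

stack_Item stack_top(stack_T s) { return s->head; }  /* undefined for the empty state */
stack_T    stack_pop(stack_T s) { return s->tail; }  /* likewise undefined for the empty state */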

ADT libraries
Many modern programming languages, such as C++ and Java, come with standard libraries that implement several common ADTs, such as those listed above.

Built-in abstract data types
The specification of some programming languages is intentionally vague about the representation of certain built-in data types, defining only the operations that can be done on them. Therefore, those types can be viewed as "built-in ADTs". Examples are the arrays in many scripting languages, such as Awk, Lua, and Perl, which can be regarded as an implementation of the Map or Table ADT.


References
[1] Barbara Liskov, "Programming with Abstract Data Types", in Proceedings of the ACM SIGPLAN Symposium on Very High Level Languages, pp. 50–59, 1974, Santa Monica, California.
[2] Rudolf Lidl (2004). Abstract Algebra. Springer. ISBN 81-8128-149-7. Chapter 7, section 40.
[3] Robert Sedgewick (1998). Algorithms in C. Addison-Wesley. ISBN 0-201-31452-5. Definition 4.4.

Further reading
• Mitchell, John C.; Plotkin, Gordon (July 1988). "Abstract Types Have Existential Type" (http://theory.stanford.edu/~jcm/papers/mitch-plotkin-88.pdf). ACM Transactions on Programming Languages and Systems 10 (3).

External links
• Abstract data type (http://www.nist.gov/dads/HTML/abstractDataType.html) in the NIST Dictionary of Algorithms and Data Structures

List
In computer science, a list or sequence is an abstract data type that implements an ordered collection of values, where the same value may occur more than once. An instance of a list is a computer representation of the mathematical concept of a finite sequence. Each instance of a value in the list is usually called an item, entry, or element of the list; if the same value occurs multiple times, each occurrence is considered a distinct item.

[Figure: a singly linked list structure, implementing a list with 3 integer elements.]

The name list is also used for several concrete data structures that can be used to implement abstract lists, especially linked lists.

The so-called static list structures allow only inspection and enumeration of the values. A mutable or dynamic list may allow items to be inserted, replaced, or deleted during the list's existence.

Many programming languages provide support for list data types, and have special syntax and semantics for lists and list operations. A list can often be constructed by writing the items in sequence, separated by commas, semicolons, or spaces, within a pair of delimiters such as parentheses '()', brackets '[]', braces '{}', or angle brackets '<>'. Some languages may allow list types to be indexed or sliced like array types. In object-oriented programming languages, lists are usually provided as instances of subclasses of a generic "list" class. List data types are often implemented using arrays or linked lists of some sort, but other data structures may be more appropriate for some applications. In some contexts, such as in Lisp programming, the term list may refer specifically to a linked list rather than an array.
In type theory and functional programming, abstract lists are usually defined inductively by four operations: nil, that yields the empty list; cons, which adds an item at the beginning of a list; head, that returns the first element of a list; and tail, that returns a list minus its first element. Formally, Peano's natural numbers can be defined as abstract lists with elements of unit type.


Operations
Implementation of the list data structure may provide some of the following operations:
• a constructor for creating an empty list;
• an operation for testing whether or not a list is empty;
• an operation for prepending an entity to a list;
• an operation for appending an entity to a list;
• an operation for determining the first component (or the "head") of a list;
• an operation for referring to the list consisting of all the components of a list except for its first (this is called the "tail" of the list).

Characteristics
Lists have the following properties:
• The size of the list: how many elements it contains.
• Equality of lists:
  • In mathematics, sometimes equality of lists is defined simply in terms of object identity: two lists are equal if and only if they are the same object.
  • In modern programming languages, equality of lists is normally defined in terms of structural equality of the corresponding entries, except that if the lists are typed, then the list types may also be relevant.
• Lists may be typed. This implies that the entries in a list must have types that are compatible with the list's type. It is common that lists are typed when they are implemented using arrays.
• Each element in the list has an index. The first element commonly has index 0 or 1 (or some other predefined integer). Subsequent elements have indices that are 1 higher than the previous element. The last element has index (initial index) + (size) − 1.
  • It is possible to retrieve the element at a particular index.
  • It is possible to traverse the list in the order of increasing index.
  • It is possible to change the element at a particular index to a different value, without affecting any other elements.
  • It is possible to insert an element at a particular index. The indices of the elements at that index and above are increased by 1.
  • It is possible to remove an element at a particular index. The indices of the elements above that index are decreased by 1.

Implementations
Lists are typically implemented either as linked lists (either singly or doubly linked) or as arrays, usually variable-length or dynamic arrays.
The standard way of implementing lists, originating with the programming language Lisp, is to have each element of the list contain both its value and a pointer indicating the location of the next element in the list. This results in either a linked list or a tree, depending on whether the list has nested sublists. Some older Lisp implementations (such as the Lisp implementation of the Symbolics 3600) also supported "compressed lists" (using CDR coding) which had a special internal representation (invisible to the user). Lists can be manipulated using iteration or recursion. The former is often preferred in imperative programming languages, while the latter is the norm in functional languages. A C sketch of such a list cell is given below.
Lists can be implemented as self-balancing binary search trees holding index-value pairs, providing equal-time access to any element (e.g. all elements residing in the fringe, and internal nodes storing the right-most child's index, used to guide the search). Access then takes time logarithmic in the list's size, but as long as the list doesn't change much, such a tree provides the illusion of random access and enables swap, prefix, and append operations in logarithmic time as well.
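Here is a minimal sketch of the Lisp-style cell representation in C; the names cons and cons_new are hypothetical, and freeing of cells is omitted:

#include <stdlib.h>

/* One list element: a value plus a pointer to the next element. */
struct cons {
    int value;
    struct cons *next;                /* NULL marks the end of the list */
};

/* Allocate a new element in front of an existing list. */
struct cons *cons_new(int value, struct cons *next) {
    struct cons *c = malloc(sizeof *c);
    c->value = value;
    c->next = next;
    return c;
}

/* The Lisp list (list 2 3 5) would be built as:
   struct cons *primes = cons_new(2, cons_new(3, cons_new(5, NULL))); */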


Programming language support
Some languages do not offer a list data structure, but offer the use of associative arrays or some kind of table to emulate lists. For example, Lua provides tables. Although Lua stores lists that have numerical indices as arrays internally, they still appear as hash tables.
In Lisp, lists are the fundamental data type and can represent both program code and data. In most dialects, the list of the first three prime numbers could be written as (list 2 3 5). In several dialects of Lisp, including Scheme, a list is a collection of pairs, consisting of a value and a pointer to the next pair (or null value), making a singly linked list.

Applications
As the name implies, lists can be used to store a list of records. The items in a list can be sorted for the purpose of fast search (binary search).
Because lists are easier to realize in computing than sets, a finite set in the mathematical sense can be realized as a list with additional restrictions: duplicate elements are disallowed and order is irrelevant. If the list is sorted, it speeds up determining whether a given item is already in the set, but in order to maintain the sorted order, adding a new entry to the list takes more time. In efficient implementations, however, sets are implemented using self-balancing binary search trees or hash tables, rather than lists.

Abstract definition
The abstract list type L with elements of some type E (a monomorphic list) is defined by the following functions:

nil: () → L
cons: E × L → L
first: L → E
rest: L → L

with the axioms

first(cons(e, l)) = e
rest(cons(e, l)) = l

for any element e and any list l. It is implicit that

cons(e, l) ≠ l
cons(e, l) ≠ e
cons(e1, l1) = cons(e2, l2) if e1 = e2 and l1 = l2

Note that first(nil()) and rest(nil()) are not defined.
These axioms are equivalent to those of the abstract stack data type.
In type theory, the above definition is more simply regarded as an inductive type defined in terms of constructors: nil and cons. In algebraic terms, this can be represented as the transformation 1 + E × L → L. first and rest are then obtained by pattern matching on the cons constructor and separately handling the nil case.


The list monad
The list type forms a monad with the following functions (using E* rather than L to represent monomorphic lists with elements of type E):

return(e) = cons(e, nil)
bind(l, f) = if l = nil then nil else append(f(first(l)), bind(rest(l), f))

where append is defined as:

append(l1, l2) = if l1 = nil then l2 else cons(first(l1), append(rest(l1), l2))

Alternatively, the monad may be defined in terms of operations return, fmap and join, with:

fmap(f, l) = if l = nil then nil else cons(f(first(l)), fmap(f, rest(l)))
join(l) = if l = nil then nil else append(first(l), join(rest(l)))

Note that fmap, join, append and bind are well-defined, since they're applied to progressively deeper arguments at each recursive call.
The list type is an additive monad, with nil as the monadic zero and append as monadic sum.
Lists form a monoid under the append operation. The identity element of the monoid is the empty list, nil. In fact, this is the free monoid over the set of list elements.
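Spelled out, the monoid laws for lists, which follow by induction from the definition of append above, are:

append(nil, l) = l                                        (nil is a left identity)
append(l, nil) = l                                        (nil is a right identity)
append(append(l1, l2), l3) = append(l1, append(l2, l3))   (append is associative)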


    Stack

    Simple representation of a stack

In computer science, a stack is a data structure in which items are added and removed only at one end, the top. A familiar example is the call stack: an area of memory that holds the local variables and parameters used by each active function, and that remembers the order in which functions were called so that function returns occur correctly. Each time a function is called, its local variables and parameters are "pushed onto" the stack; when the function returns, these locals and parameters are "popped". Because of this, the size of a program's stack fluctuates constantly as the program is running, but it has some maximum size.

One way of describing the stack is as a last in, first out (LIFO) abstract data type and linear data structure. A stack can have any abstract data type as an element, but is characterized by two fundamental operations, called push and pop (or pull). The push operation adds a new item to the top of the stack, or initializes the stack if it is empty. If the stack is full and does not contain enough space to accept the given item, the stack is considered to be in an overflow state. The pop operation removes an item from the top of the stack. A pop either reveals previously concealed items or results in an empty stack; but if the stack is already empty, it goes into an underflow state (there are no items in the stack to be removed). A stack pointer is a register that holds the address of the top of the stack; it always points to the top value of the stack.

A stack is a restricted data structure, because only a small number of operations are performed on it. The nature of the pop and push operations also means that stack elements have a natural order. Elements are removed from the stack in the reverse order to the order of their addition: therefore, the lower elements are those that have been on the stack the longest.[1]

History
The stack was first proposed in 1946, in the computer design of Alan M. Turing (who used the terms "bury" and "unbury") as a means of calling and returning from subroutines. In 1957, the Germans Klaus Samelson and Friedrich L. Bauer patented the idea.[2] The same concept was developed, independently, by the Australian Charles Leonard Hamblin in the first half of 1957.[3]

Abstract definition
A stack is a basic computer science data structure and can be defined in an abstract, implementation-free manner, or it can be generally defined as a linear list of items in which all additions and deletions are restricted to one end, the top.
This is a VDM (Vienna Development Method) description of a stack:[4]

    Function signatures:

    init: -> Stack

    push: N x Stack -> Stack

    top: Stack -> (N U ERROR)

