Date post: | 13-Sep-2015 |
Category: |
Documents |
Upload: | rajendranbehappy |
Data Structures Introduction
BASIC TERMINOLOGY
Definition: A data structure is a specialized format for organizing and storing data. General data structure types include the array, the file, the record, the table, the tree, and so on. Any data structure is designed to organize data to suit a specific purpose so that it can be accessed and worked with in appropriate ways.
1.1 Elementary Data Organization
1.1.1 Data and Data Item
Data are simply collections of facts and figures; data are values or sets of values. A data item refers to a single unit of values. Data items that are divided into sub-items are called group items; those that are not are called elementary items. For example, a student's name may be divided into three sub-items (first name, middle name and last name), but the ID of a student would normally be treated as a single item.
In the above example, (ID, Age, Gender, First, Middle, Last, Street, Area) are elementary data items, whereas (Name, Address) are group data items.
1.1.2 Data Type
A data type is a classification identifying one of various types of data, such as floating-point, integer, or Boolean, that determines the possible values for that type, the operations that can be done on values of that type, and the way values of that type can be stored. Data types are of two kinds: primitive and non-primitive.
A primitive data type is a basic data type that the programming language provides with built-in support. It is native to the language and is supported directly by the machine, while a non-primitive data type is derived from primitive data types, for example an array or a structure.
1.1.3 Variable
A variable is a symbolic name given to some known or unknown quantity or information, so that the name can be used independently of the information it represents. A variable name in computer source code is usually associated with a data storage location and thus also with its contents, and these may change during the course of program execution.
1.1.4 Record
A collection of related data items is known as a record. The elements of a record are usually called fields or members. Records are distinguished from arrays by the fact that their number of fields is typically fixed, each field has a name, and each field may have a different type.
1.1.5 Program
A sequence of instructions that a computer can interpret and execute is termed a program.
1.1.6 Entity
An entity is something that has certain attributes or properties which may be assigned some values. The values themselves may be either numeric or non-numeric.
1.1.7 Entity Set
An entity set is a group or set of similar entities, for example the employees of an organization or the students of a class. Each attribute of an entity set has a range of values: the set of all possible values that could be assigned to that attribute. The term information is sometimes used for data with given attributes or, in other words, meaningful or processed data.
1.1.8 Field
A field is a single elementary unit of information representing an attribute of an entity; a record is the collection of field values of a given entity; and a file is the collection of records of the entities in a given entity set.
1.1.9 File
A file is a collection of records of the entities in a given entity set, for example a file containing the records of the students of a particular class.
1.1.10 Key
A key is one or more field(s) in a record that take(s) unique values and can be used to distinguish one record from the others.
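The terminology above can be sketched in C++ as follows. This is only an illustration under assumptions: the type names (Name, Student, StudentFile) and the helper find_by_key are hypothetical, and using the student ID as the key is an assumed choice.

```cpp
#include <string>
#include <vector>

// Group item: a name divided into three elementary sub-items.
struct Name {
    std::string first;
    std::string middle;
    std::string last;
};

// Record: a fixed set of named fields, each with its own type.
struct Student {
    int id;      // elementary item, used here as the key (assumption)
    int age;     // elementary item
    Name name;   // group item
};

// File: the collection of records for one entity set (students of a class).
using StudentFile = std::vector<Student>;

// Look up the record whose key equals `id`; returns nullptr if there is none.
const Student* find_by_key(const StudentFile& file, int id) {
    for (const Student& s : file) {
        if (s.id == id) return &s;
    }
    return nullptr;
}
```

Because the key takes unique values, at most one record can match, so the search can stop at the first hit.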
Need of data structure
It gives different levels of organization to data.
It tells how data can be stored and accessed at its elementary level.
It provides operations on groups of data, such as adding an item or looking up the highest-priority item.
It provides a means to manage huge amounts of data efficiently.
It provides fast searching and sorting of data.
Digital Data
gatcttttta tttaaacgat ctctttatta gatctcttat taggatcatg atcctctgtggataagtgat tattcacatg gcagatcata taattaagga ggatcgtttg ttgtgagtgaccggtgatcg tattgcgtat aagctgggat ctaaatggca tgttatgcac agtcactcggcagaatcaag gttgttatgt ggatatctac tggttttacc ctgcttttaa gcatagttatacacattcgt tcgcgcgatc tttgagctaa ttagagtaaa ttaatccaat ctttgaccca
Movies, Music, Photos, Maps, Protein Shapes, DNA
0010101001010101010100100100101010000010010010100....
RAM = Symbols + Pointers
01001010011110110110111010100000001000010010001110000100010100010000000000000100100101011000101010000001110000011111111111111111
16-bit words at addresses 0, 2, 4, 6, 8, 10, 12, 14
Physically, RAM is a randomly accessible array of bits (for our purposes).
ASCII table: an agreement for the meaning of bits.
We may agree to interpret bits as an address (pointer).
=> We can store and manipulate arbitrary symbols (like letters) and associations between them.
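As a small illustration of "symbols + pointers", the same eight bits can be read either as an ASCII character or as an address. The sketch below is an assumption-laden toy: the function names are hypothetical, and "memory" is just a small byte array rather than real RAM.

```cpp
#include <bitset>
#include <cstdint>
#include <string>

// Parse a string of eight '0'/'1' characters into an 8-bit value.
std::uint8_t parse_byte(const std::string& bits) {
    return static_cast<std::uint8_t>(std::bitset<8>(bits).to_ulong());
}

// Interpretation 1: the bits are an ASCII character code.
char as_ascii(std::uint8_t byte) {
    return static_cast<char>(byte);
}

// Interpretation 2: the bits are an address, i.e. an index into memory.
std::uint8_t follow_pointer(const std::uint8_t* memory, std::uint8_t address) {
    return memory[address];
}
```

Under the ASCII agreement the pattern 01001001 is the letter I (decimal 73); under the pointer agreement the same pattern would select cell 73 of memory.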
Digital Data Must Be ...
Encoded (e.g. 01001001)
Arranged
- Stored in an orderly way in memory / disk
Accessed
- Insert new data
- Remove old data
- Find data matching some condition
Processed
- Algorithms: shortest path, minimum cut, FFT, ...
}The focus of this class
Data Structures -> Data StructurING
How do we organize information so that we can find, update, add, and delete portions of it efficiently?
Data Structure Example Applications
1. How does Google quickly find web pages that contain a search term?
2. What's the fastest way to broadcast a message to a network of computers?
3. How can a subsequence of DNA be quickly found within the genome?
4. How does your operating system track which memory (disk or RAM) is free?
5. In the game Half-Life, how can the computer determine which parts of the scene are visible?
What is a Data Structure Anyway?
It's an agreement about:
- how to store a collection of objects in memory,
- what operations we can perform on that data,
- the algorithms for those operations, and
- how time and space efficient those algorithms are.
Ex. vector in C++:
- Stores objects sequentially in memory
- Can access, change, insert or delete objects
- Algorithms for insert & delete will shift items as needed
- Space: O(n), Access/change = O(1), Insert/delete = O(n)
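To make "shift items as needed" concrete, here is a minimal sketch of insertion and deletion on sequential storage. The helpers insert_at and erase_at are hypothetical names written for illustration; they only mimic what std::vector::insert and std::vector::erase have to do internally, and they assume pos is a valid position.

```cpp
#include <cstddef>
#include <vector>

// Insert `value` at position `pos` (pos <= v.size()) by shifting every later
// item one slot to the right. Worst case O(n).
void insert_at(std::vector<int>& v, std::size_t pos, int value) {
    v.push_back(value);  // grow by one slot
    for (std::size_t i = v.size() - 1; i > pos; --i) {
        v[i] = v[i - 1];  // shift right
    }
    v[pos] = value;
}

// Delete the item at `pos` (pos < v.size()) by shifting every later item one
// slot to the left. Worst case O(n).
void erase_at(std::vector<int>& v, std::size_t pos) {
    for (std::size_t i = pos; i + 1 < v.size(); ++i) {
        v[i] = v[i + 1];  // shift left
    }
    v.pop_back();
}
```

Access and change stay O(1) because v[i] is a single address computation, while both helpers touch up to n items.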
Abstract Data Types (ADT)
Data storage & operations are encapsulated by an ADT.
An ADT specifies the permitted operations as well as time and space guarantees.
The user is unconcerned with how it's implemented (but we are concerned with implementation in this class).
An ADT is a concept or convention:
- not something that directly appears in your code
- the programming language may provide support for communicating the ADT to users (e.g. classes in Java & C++)
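One way a C++ class can communicate an ADT is an abstract base class that names the permitted operations, with the implementation hidden behind it. The sketch below is illustrative only: MinQueue, VectorMinQueue and smallest_after_inserts are hypothetical names, and the unsorted-vector implementation is just one assumed choice among many.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// The ADT: which operations exist, not how they are implemented.
class MinQueue {
public:
    virtual ~MinQueue() = default;
    virtual void insert(int key) = 0;
    virtual int find_min() const = 0;  // throws std::out_of_range if empty
    virtual std::size_t size() const = 0;
};

// One possible implementation: an unsorted vector.
// insert is amortized O(1); find_min is O(n).
class VectorMinQueue : public MinQueue {
public:
    void insert(int key) override { keys_.push_back(key); }
    int find_min() const override {
        if (keys_.empty()) throw std::out_of_range("empty queue");
        int best = keys_[0];
        for (int k : keys_) {
            if (k < best) best = k;
        }
        return best;
    }
    std::size_t size() const override { return keys_.size(); }

private:
    std::vector<int> keys_;
};

// Client code depends only on the interface, not on the implementation.
int smallest_after_inserts(MinQueue& q, const std::vector<int>& keys) {
    for (int k : keys) q.insert(k);
    return q.find_min();
}
```

Swapping in a heap-based implementation would change the time guarantees but not the client code.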
Example ADT operations: insert(), delete(), find_min(), find()
int main() { D = new Dictionary() D.insert(3,10); cout
Dictionary ADT
Most basic and most useful ADT:
- insert(key, value)
- delete(key, value)
- value = find(key)
Many languages have it built in:
awk: D["AAPL"] = 130 # associative array
perl: my %D; $D{"AAPL"} = 130; # hash
python: D = {}; D["AAPL"] = 130 # dictionary
C++: map<string, int> D; D["AAPL"] = 130; // map
Insert, delete, and find are each either O(log n) [C++] or expected constant time [perl, python].
Any guesses how dictionaries are implemented?
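The dictionary operations can be tried in C++ with the two standard containers that correspond to the two cost profiles mentioned: std::map (ordered, O(log n)) and std::unordered_map (hashed, expected constant time). The function names and the "MSFT" entry below are hypothetical, added only for illustration.

```cpp
#include <map>
#include <string>
#include <unordered_map>

// Ordered dictionary: std::map gives O(log n) insert, erase and find.
std::map<std::string, int> make_ordered() {
    std::map<std::string, int> d;
    d["AAPL"] = 130;  // insert(key, value)
    d["MSFT"] = 47;   // hypothetical second entry
    d.erase("MSFT");  // delete(key)
    return d;
}

// Hash-based dictionary: expected constant time per operation.
std::unordered_map<std::string, int> make_hashed() {
    std::unordered_map<std::string, int> d;
    d["AAPL"] = 130;
    return d;
}

// find(key): returns the stored value, or `fallback` when the key is absent.
template <typename Dict>
int find_or(const Dict& d, const std::string& key, int fallback) {
    auto it = d.find(key);
    return it == d.end() ? fallback : it->second;
}
```

Both containers expose the same find/erase/operator[] interface, so find_or works unchanged with either one.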
C++ STL
Data structures = containers
Interface specifies both operations & time guarantees

Container   Element Access    Insert / Delete      Iterator Patterns
vector      const             O(n)                 Random
list        O(n)              const                Bidirectional
stack       const (limited)   O(n)                 Front
queue       const (limited)   O(n)                 Front, Back
deque       const             O(n), const @ ends   Random
map         O(log n)          O(log n)             Bidirectional
set         O(log n)          O(log n)             Bidirectional
string      const             O(n)                 Bidirectional
array       const             O(n)                 Random
valarray    const             O(n)                 Random
bitset      const             O(n)                 Random
Some STL Operations
Select operations to be orthogonal: they don't significantly duplicate each other's functionality.
Choose operations to be useful building blocks.
E.g. Data Structure Operations: push_back, find, insert, erase, size, begin, end (iterators), operator[], front, back
E.g. Algorithms: for_each, find_if, count, copy, reverse, sort, set_union, min, max
Iterators, Sequences
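A short sketch of how container operations and generic algorithms fit together through iterator ranges. The three helper functions are hypothetical; the library calls used (push_back, std::sort, std::count_if, std::find_if) are standard, with std::count_if standing in for the count family listed above.

```cpp
#include <algorithm>
#include <vector>

// Container operations build the data; algorithms then work on [begin, end).
std::vector<int> sorted_copy(const std::vector<int>& input) {
    std::vector<int> v;
    for (int x : input) v.push_back(x);  // container operation
    std::sort(v.begin(), v.end());       // algorithm on an iterator range
    return v;
}

// Count the even values with the generic count_if algorithm.
long count_even(const std::vector<int>& v) {
    return static_cast<long>(
        std::count_if(v.begin(), v.end(), [](int x) { return x % 2 == 0; }));
}

// Return the first value greater than `limit`, or `limit` itself if none.
int first_above(const std::vector<int>& v, int limit) {
    auto it = std::find_if(v.begin(), v.end(),
                           [limit](int x) { return x > limit; });
    return it == v.end() ? limit : *it;
}
```

The algorithms never mention std::vector, only iterators, which is what lets the same algorithm serve many containers.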
Suppose You're Google Maps...
You want to store data about cities (location, elevation, population)...
What kind of operations should your data structure(s) support?
Operations to support the following scenarios...
Finding addresses on map?
- Lookup city by name...
Mobile iPhone user?
- Find nearest point to me...
Car GPS system?
- Calculate shortest-path between cities...
- Show cities within a given window...
Political revolution?
- Insert, delete, rename cities
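A rough sketch of a structure supporting a few of these operations. Everything here is an assumption made for illustration: the City fields, the planar x/y coordinates, the CityIndex class, and the choice of std::map for name lookup with a plain linear scan for window queries. A real mapping system would use far more specialized structures.

```cpp
#include <map>
#include <string>
#include <vector>

struct City {
    std::string name;
    double x, y;      // location (hypothetical planar coordinates)
    long population;
};

class CityIndex {
public:
    // Insert or overwrite a city, keyed by name.
    void insert(const City& c) { by_name_[c.name] = c; }

    // Delete a city by name; returns true if it existed.
    bool erase(const std::string& name) { return by_name_.erase(name) > 0; }

    // Lookup city by name: O(log n) with std::map.
    const City* find(const std::string& name) const {
        auto it = by_name_.find(name);
        return it == by_name_.end() ? nullptr : &it->second;
    }

    // Show cities within a window: a plain O(n) scan in this sketch.
    std::vector<std::string> in_window(double x0, double y0,
                                       double x1, double y1) const {
        std::vector<std::string> out;
        for (const auto& kv : by_name_) {
            const City& c = kv.second;
            if (c.x >= x0 && c.x <= x1 && c.y >= y0 && c.y <= y1) {
                out.push_back(c.name);
            }
        }
        return out;
    }

private:
    std::map<std::string, City> by_name_;
};
```

The O(n) window scan is the weak spot: ordering by name does nothing for spatial queries, which is the kind of mismatch that motivates choosing a data structure per operation.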
Data Organizing Principles
Ordering: Put keys into some order so that we know something about where each key is relative to the other keys.
E.g. Phone books are easier to search because they are alphabetized.
Linking: Add pointers to each record so that we can find related records quickly.
E.g. The index in the back of a book provides links from words to the pages on which they appear.
Partitioning: Divide the records into 2 or more groups, each group sharing a particular property.
E.g. Multi-volume encyclopedias (Aa-Be, W-Z)
E.g. Folders on your hard drive
Ordering
Unsorted: Sequential Search O(n)
Pheasant, 10; Grouse, 89; Quail, 55; Pelican, 3; Partridge, 32; Duck, 18; Woodpecker, 50; Robin, 89; Cardinal, 102; Eagle, 43; Chicken, 7; Pigeon, 201; Swan, 57; Loon, 213; Turkey, 99; Albatross, 0; Ptarmigan, 22; Finch, 38; Bluejay, 24; Heron, 70; Egret, 88; Goose, 67
Sorted: Binary Search O(log n)
Albatross, 0; Bluejay, 24; Cardinal, 102; Chicken, 7; Duck, 18; Eagle, 43; Egret, 88; Finch, 38; Goose, 67; Grouse, 89; Heron, 70; Loon, 213; Partridge, 32; Pelican, 3; Pheasant, 10; Pigeon, 201; Ptarmigan, 22; Quail, 55; Robin, 89; Swan, 57; Turkey, 99; Woodpecker, 50
Search for Goose: steps (1), (2), (3), (4)
Every step discards half the remaining entries:
$n/2^k = 1 \Rightarrow 2^k = n \Rightarrow k = \log_2 n$
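The difference between the two searches can be checked by counting how many entries each one examines on the sorted bird list. This is a sketch with hypothetical function names; the midpoint rule (lower middle of a closed interval) is an assumed convention, and other conventions can give a slightly different probe count.

```cpp
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<std::string, int>;

// The alphabetically sorted list from the text.
std::vector<Entry> sorted_birds() {
    return {{"Albatross", 0}, {"Bluejay", 24}, {"Cardinal", 102},
            {"Chicken", 7},   {"Duck", 18},    {"Eagle", 43},
            {"Egret", 88},    {"Finch", 38},   {"Goose", 67},
            {"Grouse", 89},   {"Heron", 70},   {"Loon", 213},
            {"Partridge", 32}, {"Pelican", 3}, {"Pheasant", 10},
            {"Pigeon", 201},  {"Ptarmigan", 22}, {"Quail", 55},
            {"Robin", 89},    {"Swan", 57},    {"Turkey", 99},
            {"Woodpecker", 50}};
}

// Sequential search: returns the number of entries examined.
int sequential_probes(const std::vector<Entry>& v, const std::string& key) {
    int probes = 0;
    for (const Entry& e : v) {
        ++probes;
        if (e.first == key) break;
    }
    return probes;
}

// Binary search on a sorted list: returns the number of entries examined.
int binary_probes(const std::vector<Entry>& v, const std::string& key) {
    int probes = 0;
    int lo = 0;
    int hi = static_cast<int>(v.size()) - 1;  // closed interval [lo, hi]
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        ++probes;
        if (v[mid].first == key) break;
        if (v[mid].first < key) lo = mid + 1;  // discard the lower half
        else hi = mid - 1;                     // discard the upper half
    }
    return probes;
}
```

With this midpoint rule, finding Goose examines Heron, Duck, Finch and then Goose, four entries, while the sequential scan of the same sorted list examines nine.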
Linking
Records located anywhere in memory
Green pointers give next element
Red pointers give previous element
Insertion & deletion easy if you have a pointer to the middle of the list
Don't have to know size of data at start
Pointers let us express relationships between pieces of information.
[Figure: doubly linked list of numbered nodes]
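A minimal sketch of why insertion and deletion are easy once you hold a pointer into the middle of a doubly linked list: only the neighbouring pointers change. DNode, insert_after and unlink are hypothetical names, and memory ownership is deliberately left to the caller to keep the example short.

```cpp
// A node with "previous" and "next" pointers, as in the linked-list figure.
struct DNode {
    int key;
    DNode* prev;
    DNode* next;
};

// Insert `node` right after `pos` in O(1): no other nodes are moved.
void insert_after(DNode* pos, DNode* node) {
    node->prev = pos;
    node->next = pos->next;
    if (pos->next) pos->next->prev = node;
    pos->next = node;
}

// Unlink `node` from its list in O(1). The caller still owns the node.
void unlink(DNode* node) {
    if (node->prev) node->prev->next = node->next;
    if (node->next) node->next->prev = node->prev;
    node->prev = nullptr;
    node->next = nullptr;
}
```

Compare this with the sequential vector, where the same operations shift up to n items.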
Where's the FBI?
J. Edgar Hoover Building
935 Pennsylvania Avenue, NW
Washington, DC, 20535-0001
[Figure: map of Washington, DC partitioned into quadrants NW, NE, SW, SE]
Why is the DC partitioning bad?
Everything interesting is in the northwest quadrant.
Want a balanced partition!
Another example: an unbalanced binary search tree (becomes sequential search).
Much of the first part of this class will be techniques for guaranteeing balance of some form.
Binary search guarantees balance by always picking the median.
When using a linked structure, not as easy to find the median.
[Figure: unbalanced binary search tree]
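The effect of an unbalanced partition can be seen by measuring the height of a plain binary search tree built from different insertion orders. This is a sketch of an ordinary, non-self-balancing BST; TreeNode, bst_insert, height and height_after_inserts are hypothetical names.

```cpp
#include <algorithm>
#include <memory>
#include <vector>

struct TreeNode {
    int key;
    std::unique_ptr<TreeNode> left, right;
    explicit TreeNode(int k) : key(k) {}
};

// Plain (unbalanced) BST insertion: walk down until an empty slot is found.
void bst_insert(std::unique_ptr<TreeNode>& root, int key) {
    std::unique_ptr<TreeNode>* cur = &root;
    while (*cur) {
        cur = key < (*cur)->key ? &(*cur)->left : &(*cur)->right;
    }
    cur->reset(new TreeNode(key));
}

// Number of nodes on the longest root-to-leaf path.
int height(const TreeNode* n) {
    if (!n) return 0;
    return 1 + std::max(height(n->left.get()), height(n->right.get()));
}

// Build a tree by inserting keys in the given order and return its height.
int height_after_inserts(const std::vector<int>& keys) {
    std::unique_ptr<TreeNode> root;
    for (int k : keys) bst_insert(root, k);
    return height(root.get());
}
```

Inserting 1 through 7 in sorted order gives a chain of height 7, so a search degrades to a sequential scan; inserting the median first (4, 2, 6, 1, 3, 5, 7) gives height 3.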
Implementing Data Structures in C++
Remember: for templates to work, you should put all the code into the .h file.
Templates aren't likely to be required for the coding project, but they're a good mechanism for creating reusable data structures.
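    template <typename K>
    struct Node {
        Node<K> *next;  // link to the next node
        K key;          // user data
    };

    struct EmptyException {};  // assumed definition; not shown in the original

    template <typename K>
    class List {
    public:
        List();
        K front() {
            if (root) return root->key;
            throw EmptyException();
        }
        // ...
    protected:
        Node<K> *root;
    };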
Structure holds the user data and some data structure bookkeeping info.
Main data structure class implements the ADT.
Will work for any type K that supports the required operations (like
Any Data Type Can Be Compared:
By overloading the < operator, we can define an order on any type (e.g. MyType)
We can sort a vector of MyTypes via:
Thus, we can assume we can compare any types.
    struct MyType {
        string name;
        // ...
    };
bool operator
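The code on this slide is incomplete in this copy, so the following is only a sketch of the idea it describes: overload operator< for MyType, after which std::sort can order a vector of MyType values with no extra comparator. Comparing by name is an assumed choice of ordering, and sort_by_name is a hypothetical helper.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct MyType {
    std::string name;
    // ... other fields would go here
};

// Define an order on MyType: compare by name (an assumed choice).
bool operator<(const MyType& a, const MyType& b) {
    return a.name < b.name;
}

// std::sort uses operator< by default, so no comparator argument is needed.
void sort_by_name(std::vector<MyType>& v) {
    std::sort(v.begin(), v.end());
}
```

The same overload also makes MyType usable as a key in std::map and std::set, which rely on operator< by default.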
So, much of programming (and thinking about programming) involves deciding how to arrange information in memory. [Aka data structures.]
Choice of data structures can make a big speed difference.
- Sequential search vs. binary search means O(n) vs. O(log n).
- [log2(1 billion) < 30].
Abstract Data Types are a way to encapsulate and hide the implementation of a data structure, while presenting a clean interface to other programmers.
Data structuring principles:
- Ordering
- Linking
- (Balanced) partitioning
Review Big-O notation, if you're fuzzy on it.