Computers
Data Representation
Chapter 3, SA
Data Representation and Processing
Data and information processors must be able to:
• Recognize external data and convert it to an appropriate internal format
• Store and retrieve data internally
• Transport data among internal storage and processing components
Binary Representation of Data
• Computers represent data using binary numbers.
• Binary numbers correspond directly with values in boolean logic.
• Computers combine multiple digits to form a single data value to represent large numbers.
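As a small illustration of these points, the Python sketch below converts a number to a fixed-width binary string and back. The helper name `to_bits` and the 8-bit width are assumptions chosen for the example, not something the slides specify.

```python
def to_bits(value: int, width: int = 8) -> str:
    """Return the unsigned binary form of value, padded to width bits."""
    if value < 0 or value >= 2 ** width:
        raise ValueError(f"{value} does not fit in {width} unsigned bits")
    return format(value, f"0{width}b")

# Several binary digits combine into one data value: 41 = 32 + 8 + 1.
bits_41 = to_bits(41)        # "00101001"
back = int(bits_41, 2)       # converts the bit string back to 41

# Each binary digit corresponds to a Boolean value: 1 is true, 0 is false.
flags = [bit == "1" for bit in bits_41]
```

Widening the value (more digits) is what lets a computer represent larger numbers; with 8 bits the largest unsigned value is 255.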
Basic data types
• Integers – whole numbers
• Real numbers – with fractional components
• Exponential representation
• Character
• ASCII vs EBCDIC
• Boolean – true/false
• BLOB (Binary Large Object)
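Two of these types can be illustrated briefly in Python. This is only a sketch: `cp037` is one of several EBCDIC code pages shipped with Python and is picked here as an example, and `math.frexp` shows a base-2 mantissa and exponent rather than any particular machine format.

```python
import math

# Character data: the same letter has different codes in ASCII and EBCDIC.
# "cp037" is one EBCDIC code page available in Python (an example choice).
ascii_code = "A".encode("ascii")[0]     # 0x41
ebcdic_code = "A".encode("cp037")[0]    # 0xC1

# Real numbers: exponential representation stores a mantissa and an exponent.
mantissa, exponent = math.frexp(6.5)    # 6.5 == 0.8125 * 2**3
```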
Data structures
• Defined in software
• Arrays
• Lists
• Records
• Tables
• Files
• Indices
• Objects
Data Structures
A data structure is a related group of primitive data elements that is organized for some type of processing.
Data structures are defined and manipulated within software.
Data Structures
Virtually all data structures make extensive use of pointers and addresses.
Pointer – a data element that contains the address of another data element.
Address – the location of some data element within a storage device.
Arrays and Linked Lists
Linked List:
A linked list is a data structure that uses pointers so list elements can be scattered among nonsequential storage locations.
Records and Files
• A record is a data structure composed of other data structures or primitive data elements.
• Records are used as a unit of input and output to files or databases.
File Organization
Physical arrangement of the records of a file on secondary storage devices
• Sequential
• Linked List
• Indexed
• Hashed
Sequential File
addr
00 Ayers ACCT
01 Buckley MGT
02 Daley ACCT
03 Dejoie MGT
04 Kenderdine MKT
05 Linn FIN
06 Lusch MKT
07 Price MGT
08 Razook MKT
09 Schwarzkopf MGT
Sequential file sorted in alphabetical order. Sequential files are usually sorted in ID sequence order to facilitate batch processing.
Sequential File Processing
[Diagram: the Transaction file and the Old Master are inputs to Process, which writes the New Master]
Sequential files must be recopied from the point of any insertion or deletion to the end of the file. They are commonly used in batch processing where a new master file will be generated each time the file is updated.
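A batch update of this kind can be sketched in Python as a single pass that copies the old master into a new one while applying sorted transactions. The function name `update_master`, the `"add"`/`"delete"` operation codes, and the sample record "Chen" are hypothetical, and the sketch assumes keys are unique and both inputs are sorted by key.

```python
def update_master(old_master, transactions):
    """Merge sorted transactions into a sorted old master; return the new master.

    old_master: list of (key, data) sorted by key.
    transactions: list of (op, key, data) sorted by key; op is "add" or "delete".
    """
    new_master = []
    i = 0
    for op, key, data in transactions:
        # Copy every old record that sorts before this transaction's key.
        while i < len(old_master) and old_master[i][0] < key:
            new_master.append(old_master[i])
            i += 1
        if op == "add":
            new_master.append((key, data))
        elif op == "delete":
            if i < len(old_master) and old_master[i][0] == key:
                i += 1  # skip the record so it never reaches the new master
    # Everything after the last change is recopied unchanged.
    new_master.extend(old_master[i:])
    return new_master

old = [("Ayers", "ACCT"), ("Buckley", "MGT"), ("Daley", "ACCT"), ("Linn", "FIN")]
txns = [("add", "Chen", "FIN"), ("delete", "Daley", None)]
new = update_master(old, txns)
```

The old master is never modified in place, which mirrors the slide's point that each update run generates a new master file.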
Linked List
addr pointer
00 Price MGT 01
01 Schwarzkopf MGT 02
02 Kenderdine MKT 03
03 Lusch MKT 08
04 Buckley MGT 09
05 Ayers ACCT 06
06 Daley ACCT 07
07 Linn FIN 04
08 Razook MKT ##
09 Dejoie MGT 00
Linked list to sort data alphabetically within department. An external reference must point to the start record (05).
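The table can be followed in code by starting at the external reference (05) and chasing pointers until the end marker. In this Python sketch the dictionary layout and the use of `None` for the "##" end marker are assumptions made for illustration.

```python
# addr -> (name, department, pointer to next record); None stands for "##".
records = {
    0: ("Price", "MGT", 1),
    1: ("Schwarzkopf", "MGT", 2),
    2: ("Kenderdine", "MKT", 3),
    3: ("Lusch", "MKT", 8),
    4: ("Buckley", "MGT", 9),
    5: ("Ayers", "ACCT", 6),
    6: ("Daley", "ACCT", 7),
    7: ("Linn", "FIN", 4),
    8: ("Razook", "MKT", None),
    9: ("Dejoie", "MGT", 0),
}
START = 5  # the external reference to the first record

def traverse(records, start):
    """Follow pointers from start and return (department, name) in list order."""
    order = []
    addr = start
    while addr is not None:
        name, dept, next_addr = records[addr]
        order.append((dept, name))
        addr = next_addr
    return order

ordered = traverse(records, START)
```

Although the records sit at scattered addresses, the traversal visits them alphabetically by name within each department.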
Linked List File Processing
The next record in a linked list is found at the address stored in the current record. New records can be written at any free location on the DASD (direct access storage device), and pointers are adjusted to include them. Deleted records are not erased; pointers are changed to bypass them.
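A minimal Python sketch of these two operations follows. The three-record file, the free address 7, and the helper names are hypothetical; only the pointer adjustments themselves come from the description above.

```python
# A tiny file: addr -> [name, next_addr]; None marks the end of the list.
storage = {0: ["Ayers", 1], 1: ["Daley", 2], 2: ["Linn", None]}
head = 0

def insert_after(storage, prev_addr, new_addr, name):
    """Write the record at any free address, then splice it in after prev_addr."""
    storage[new_addr] = [name, storage[prev_addr][1]]
    storage[prev_addr][1] = new_addr

def delete_after(storage, prev_addr):
    """Bypass the record that follows prev_addr; its data stays in storage."""
    doomed = storage[prev_addr][1]
    if doomed is not None:
        storage[prev_addr][1] = storage[doomed][1]

def names(storage, head):
    """Return the names in list order by following pointers from head."""
    out, addr = [], head
    while addr is not None:
        out.append(storage[addr][0])
        addr = storage[addr][1]
    return out

insert_after(storage, 0, 7, "Buckley")   # the new record lands at address 7
after_insert = names(storage, head)
delete_after(storage, 7)                 # unlink Daley; address 1 is not erased
after_delete = names(storage, head)
```

Neither operation moves an existing record, which is the advantage over recopying a sequential file.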
Indexed File (sequential index)
addr
00 Price MGT Ayers ACCT
01 Schwarzkopf MGT Daley ACCT
02 Kenderdine MKT Linn FIN
03 Lusch MKT Razook MKT
04 Buckley MGT Dejoie MGT
Index
ACCT 00
ACCT 01
FIN 02
MGT 00
MGT 01
MGT 04
MKT 03
Index to access data by department abbreviation.
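The lookup can be sketched in Python as a search of a sorted list of (department, address) pairs. Note the assumptions: the index here is built from the data file rather than copied from the slide, so it may contain entries the slide's index does not show, and the function names are hypothetical.

```python
from bisect import bisect_left

# Data file: addr -> records stored at that address (two per address here).
data_file = {
    0: [("Price", "MGT"), ("Ayers", "ACCT")],
    1: [("Schwarzkopf", "MGT"), ("Daley", "ACCT")],
    2: [("Kenderdine", "MKT"), ("Linn", "FIN")],
    3: [("Lusch", "MKT"), ("Razook", "MKT")],
    4: [("Buckley", "MGT"), ("Dejoie", "MGT")],
}

def build_index(data_file):
    """Return a sorted list of distinct (department, addr) pairs."""
    pairs = {(dept, addr) for addr, recs in data_file.items() for _, dept in recs}
    return sorted(pairs)

def find_by_dept(index, data_file, dept):
    """Read only the addresses that the index lists for dept."""
    found = []
    i = bisect_left(index, (dept, -1))   # position of the first entry for dept
    while i < len(index) and index[i][0] == dept:
        addr = index[i][1]
        found.extend(name for name, d in data_file[addr] if d == dept)
        i += 1
    return found

index = build_index(data_file)
mgt = find_by_dept(index, data_file, "MGT")
```

Because the index is sorted, all entries for one department are adjacent, so the data file is read only at the addresses that matter.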
Indexed File Processing
[Diagram: Index and Data File]
A new record can be added at any location in the data file, and a deleted record can be removed wherever it sits. Each index must also be updated to reflect the change. For a simple sequential index, this may mean rewriting the index for each insertion.
Segmented Index
addr
00 Price MGT Ayers ACCT
01 Schwarzkopf MGT Daley ACCT
02 Kenderdine MKT Linn FIN
03 Lusch MKT Razook MKT
04 Buckley MGT Dejoie MGT
05 Van Horn MGT
Index (root: 100; nodes: 101–103; leaf: 200–206)
addr pointer pointer pointer
100 101 Kenderdine 102 Razook 103
101 200 Buckley 201 Dejoie 204
102 203 Lusch 202
103 205 Schwarzkopf 206
200 00 Ayers 04 Buckley 201
201 01 Daley 04 Dejoie 204
202 00 Price 03 Razook 205
203 02 Linn 03 Lusch 202
204 02 Kenderdine 203
205 01 Schwarzkopf 206
206 05 Van Horn
Indexed File Processing (segmented index)
[Diagram: Data File and Index]
Data can be inserted or deleted at any location in the data file. The index(es) must be updated for each change, but only the affected segments need to be rewritten.
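One way to read the segmented index table is as a small tree: the root and node segments hold alternating pointers and keys, and the leaf segments hold data addresses. The Python sketch below searches it under that reading. The search rule (follow the pointer to the left of the first key that is greater than or equal to the search name), the treatment of each leaf's last value as a pointer to the next leaf, and all identifiers are assumptions inferred from the table rather than stated on the slide.

```python
# Root and node segments alternate pointer, key, pointer, ...
internal = {
    100: [101, "Kenderdine", 102, "Razook", 103],
    101: [200, "Buckley", 201, "Dejoie", 204],
    102: [203, "Lusch", 202],
    103: [205, "Schwarzkopf", 206],
}
# Leaf segments hold (data addr, key) pairs plus a pointer to the next leaf.
leaves = {
    200: ([(0, "Ayers"), (4, "Buckley")], 201),
    201: ([(1, "Daley"), (4, "Dejoie")], 204),
    204: ([(2, "Kenderdine")], 203),
    203: ([(2, "Linn"), (3, "Lusch")], 202),
    202: ([(0, "Price"), (3, "Razook")], 205),
    205: ([(1, "Schwarzkopf")], 206),
    206: ([(5, "Van Horn")], None),
}
ROOT = 100

def lookup(name):
    """Return (data address or None, list of index segments visited)."""
    node, path = ROOT, []
    while node in internal:
        path.append(node)
        entries = internal[node]
        next_node = entries[-1]                 # default: rightmost pointer
        for i in range(1, len(entries), 2):     # keys sit at odd positions
            if name <= entries[i]:
                next_node = entries[i - 1]      # pointer left of that key
                break
        node = next_node
    path.append(node)
    for addr, key in leaves[node][0]:
        if key == name:
            return addr, path
    return None, path

linn_addr, linn_path = lookup("Linn")
```

Each lookup touches only three segments (root, one node, one leaf), and an insertion such as Van Horn affects only the segments on its own path, which is why only those need to be rewritten.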
Hashing (Prime Number Remainder Algorithm)
• Pick a prime number to define the file space
• Divide the key by the prime number
• Store the record at the location given by the remainder
Key = 41, prime number = 13
41 ÷ 13 = 3, remainder 2 (13 × 3 = 39; 41 − 39 = 2)
Location = 2
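The same calculation can be sketched in Python with the remainder operator. The slides do not say what happens when two keys hash to the same location, so the per-location lists used here are an assumption for illustration, as are the function names and the second key 54.

```python
PRIME = 13  # prime number that defines the file space: locations 0 through 12

def location(key: int, prime: int = PRIME) -> int:
    """Return the storage location for key: the remainder of key / prime."""
    return key % prime

# The worked example: 41 divided by 13 is 3 with remainder 2.
quotient, remainder = divmod(41, PRIME)

# Assumption: each location holds a small list so colliding keys can coexist.
file_space = [[] for _ in range(PRIME)]

def store(key: int, record: str) -> int:
    """Place record at the location computed from key; return that location."""
    loc = location(key)
    file_space[loc].append((key, record))
    return loc

def fetch(key: int):
    """Return the record stored under key, or None if it is absent."""
    for stored_key, record in file_space[location(key)]:
        if stored_key == key:
            return record
    return None

store(41, "record for key 41")
store(54, "record for key 54")   # 54 % 13 is also 2, so it shares location 2
```

A lookup recomputes the location from the key, so no index or sequential scan of the file is needed.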
Hashed File Processing
[Diagram: Key → Calculation → addr → Contents]
Records and Files
• A sequence of records on secondary storage is called a file.
• A sequence of records stored within main memory is called a table.
• Sequential files suffer the same problems as contiguous arrays when inserting and deleting records.
• To eliminate this problem, linked lists and indexed files are used.
Classes and Objects